Serge Chegorian's System Center Blog

Serge Chegorian's System Center Blog

SCCM 2012 R2: Setup is unable to connect to SQL Server

May 28th, 2017

This error can be very painful specifically when you’ve already checked everything:

  1. The SQL Server and instance names are entered correctly
  2. The specified SQL Server instance is not configured to use dynamic ports
  3. If a firewall is enabled on the SQL Server, inbound rules exist to allow connections to the correct ports
  4. The account used to run Setup has permissions to connect to the specified SQL server instance
  5. TCP and named pipes are enabled
  6. ODBC test succeeds.

There is one more trick. Run SQL Server Client Networking Utility (cliconfg.exe), go to Aliases tab, delete all aliases for your database name, try to connect again.

[SCCM]: How to fix SQL collation on your SCCM dedicated SQL cluster

May 17th, 2017

Sometimes in a corporate enviroment someone esle might set up SQL cluster for your SCCM and overlook the collation setting wich must be SQL_Latin1_General_CP1_CI_AS for SCCM. This is a workaround for this case assuming there is currently no database on your SQL instance.

  1. Check server General settings and Master database General settings and confirm that collaction is wrong
  2. Go to the active cluster node
  3. Bring down SQL Server resource (Server and Agent only)
  4. Start cmd.exe as Administrator
  5. Go to SQL binn folder (you’ll find it from the SQL service start string)
  6. Run command:
    sqlservr -m -T4022 -T3659 -s"<instance name>" -q"SQL_Latin1_General_CP1_CI_AS"

There will be a bunch of information messages and it should run for 2-8 minutes. After that it will display notification that it has successfully finished but it won’t exit the application. Wait for another minute, close cmd window, bring the resource up and check collation on the server and Master database.

[SCCM 2012 R2]: multiple SMS__SMS_SQL_SERVERXXX folders are created on remote SCCM SQL

April 27th, 2017

Sometimes when you have a dedicated SCCM Database server or SQL cluster you may notice that SMS_<FQDN>_SMS_SQL_SERVERXXX folder is created on the C: drive of the SQL server or a cluster every 3 minutes where FQDN is the name of your SCCM site server. This happens because SCCM Site Component Manager is not flagged that SMS_SITE_SQL_BACKUP_<FQDN> service is installed so SMS_SERVER_BOOTSTRAP_<FQDN>_SMS_SQL_SERVER creates setup folder for SMS_SITE_BACKUP… over and over again

Solution

Go to the site server SMS_<Site Code>\inboxes\certmgr.box and check for out-dated CMN files. Delete them.

[SCCM 2012 R2]: Troubleshooting database replications and service broker issue

December 24th, 2016

Last week I dealt with a very interesting and unusual SCCM failure. It has started with a link failure error between CAS and one of the primary sites. When I ran Replication Link Analyser the first error message was “SQL Server Broker login is missing for sites: <my primary site code>”. After that RLA informed me that the login is recreated but in fact it was not and the issue was still there. I was also unable to find any information on how to recreate SQL Server Broker or at least what it is.

After more rigorous search I have found the following SQL command which shows you SQL replication status in real time.

Use CM_CAS; Select * from sys.transmission_queue

The content of that table should change dynamically. In my case there was a bunch of stalled messages with ConfigMgr_Site<My Primary Site Code (PSC)> in to_service_name column and “Connection attempt failed with error: 10060” in transmission_status column. That gave me clear indication that the Service Broker transmission is broken between my CAS and PSS.

Note: when the transmission is resumed SQL should clear up stuck messages however sometimes you might need to clear them up yourself using update sys.transmission_queue Also please note that any intervention to the SCCM database is not supported by Microsoft.

After several telnet tests I have figured out that Service Broker is not responding or listening on PSS server database.

In our environment all SQL servers are shared hosts so all Service Brokers are using private ports. To identify the port used by Service Broker run the following SQL script on your SQL instance

Use CM_CAS select port from tcp_endpoints where type_desc like ‘%SERVICE_BROKER%’

Note that there could be only one Service Broker endpoint per database.

I have executed the query above and the result was nil. That gave me an understanding that somehow my Service Broker was deleted on the database.

At that stage I was about to give up. There is a script which creates Service Broker endpoint but I know that SCCM secures all internal communications with certificates had no idea which certificate to use. I’ve been thinking to either call Microsoft or reinstall the site (including several role servers) but fortunately I have found the required script on Internet.

CREATE ENDPOINT [ConfigMgrEndpoint]
STATE=STARTED
AS TCP (LISTENER_PORT = <my port>, LISTENER_IP = ALL)
FOR SERVICE_BROKER (MESSAGE_FORWARDING = ENABLED
,
MESSAGE_FORWARD_SIZE = 5
, AUTHENTICATION = CERTIFICATE
[ConfigMgrEndpointCert]
, ENCRYPTION = REQUIRED ALGORITHM AES)
GO

All good but how would I know what private port was used by my missing Service Broker? In SQL Management Studio go to CM_CAS\Service Broker\Routes\ConfigMgrDRSSiteRoute_<your PSC>, click on properties and in properties check for Address which would look like TCP://<your PSS FQDN>:<port>.

Once I’ve executed the SQL script above my telnet test has succeeded. I ran RLA again and it gave me “SQL Server Broker login is missing for sites: <my primary site code>” error again but this time it’s succeeded in fixing this issue and this error has not re-appear again.

I thought now it’s just a matter of time but in several hours I still saw no activity in rcmctrl.log. However all error messages from sys.transmission_queue have gone.

So I had another look at the link status, specifically at Initialization Detail tab. It is very important to look at it from both sides, i.e. both CAS and PSS. On PSS side I’ve noticed that one replication group has stuck at 1% replicating up to CAS.

There is a way to reset replication group. You have to create <replication group name>.pub file and place it to the rcm.box inbox. This file should disappear in 5-10 seconds. If it does not disappear at all it clearly indicates that the issue is on another end. Delete it and try from another side.

Once I’ve dropped the PUB file to the rcm.box it has pulled the plug. I have started to see replication activity in rcmctrl.log and file exchange in rcm.box. The issue has gone in hour and a half.

Several important things to remember when you have SCCM 2012 replication issue:

  • Start your troubleshooting with RLA.
  • If the primary site sits in link failure state for substantial amount of time SCCM puts the primary site in read only mode and the link in the maintenance state
  • If the issue is not fixed SCCM will also put CAS database in maintenance mode, consequently the rest of links will fail.
  • Check sys.transmission_queue for stuck transactions. The content of this table must rapidly change.
  • Check rcmctrl.log for any activity.
  • Identify your Service Broker ports and run telnet connectivity tests.
  • Check CM_<site code>\Service Broker\Queues if any queue is down.
  • The easiest way to reset replication group replication is to drop <replication group name>.pub file to the rcm.box inbox. Note that the PUB file name should be <replication group name>.pub on PSS, on CAS it should be <replication group name>-<primary site code>.pub. The file should disappear in 5-10 seconds. If it does not, the issue is on another end. Delete the file and try on another end.

SCCM Keeps Processing Package

December 4th, 2015

Sometimes you may see several hundred thousand informational messages produced by the child site distribution manager. The messages look like this:

SMS Distribution Manager successfully processed package “Java 7 Update 91” (package ID = CAS000D3).
SMS Distribution Manager is beginning to process package “Java 7 Update 91” (package ID = CAS000D3).
SMS Distribution Manager successfully processed package “Java 7 Update 91” (package ID = CAS000D3).
SMS Distribution Manager is beginning to process package “Java 7 Update 91” (package ID = CAS000D3).
SMS Distribution Manager successfully processed package “Java 7 Update 91” (package ID = CAS000D3).
SMS Distribution Manager is beginning to process package “Java 7 Update 91” (package ID = CAS000D3).

This package looping may affect both existing and non-existing packages.

Workaround:

On the affected server go to inboxes\distrmgr.box folder.

Select and delete CAS000D3.PKG and CAS000D3.PKN files

Connect to CAS DB and run the following query:

SELECT * FROM PkgServers where NALPath like ‘%<affected server name>%’ and PkgID = ‘CAS000D3′

If this query returns any result redistribute CAS000D3 package.

 

SCCM 2012 Collection Evaluation: Maximum recursion 100 has been exhausted

August 11th, 2015

Sometimes you may see the following error message in SCCM colleval.log file:

Maximum recursion 100 has been exhausted

This typically happens if for one of the collections the collection’s Limit to collection parameter is set to itself. Also if this has happened you might not be able to manage, update or delete the collection in trouble.

In order to fix it you have to go the site database via SQL Management Studio and execute the following command:

UPDATE Collections_G SET LimitToCollection =’XYZ00001′ WHERE SiteID=’XYZ00056′

where XYZ00056 is your collection in trouble and XYZ00001 is any valid collection.

Serge Chegorian's System Center Blog

Serge Chegorian's System Center Blog