Serge Chegorian's System Center Blog

[MSI]: Giving up on adding redundancy to the home PC hard drive

December 31st, 2018

After several weeks of hard work I have finally given up on building a RAID1 array on my new home PC.

Previously I had a fairly old but still good PC built on an MSI X58 motherboard with an i7 CPU. The motherboard had a built-in RAID controller, so I built a RAID1 array and forgot about it completely. As with any normal old-school RAID1, I was able to swap HDDs, replace them and even clone them by building the mirror, breaking it and replacing the drive.

Everything changed when I built a new gaming machine based on an MSI B350 motherboard. This motherboard comes with onboard RAID too; however, this RAID uses a closed proprietary format, and a “raided” disk attached to another system looks like a disk with a single partition and zero volumes. It is absolutely unreadable unless you connect it back to the MSI RAID. There is no “break mirror” option either. Moreover, when a faulty SATA cable caused one disk to drop out of the array, MSI RAID was unable to rebuild the mirror even after the disk was working again. I spent 3 weeks communicating with MSI support and their verdict was: make a full backup, delete the RAID, rebuild it, then rebuild and restore your machine. Thank you very much, MSI, but where is my data protection?

I then went with software RAID and used this workaround provided by Microsoft. The workaround works; however, after each reboot the Microsoft mirror set fails and needs to be resynchronised. It turns out this is a known problem(!!!) which has persisted since the Windows Server 2008 era and is still not fixed in Server 2019 or Windows 10.

Verdict: there is no way to provide data protection for a home PC using software RAID. Modern MSI RAID should not be used either, because it is a big problem, not a solution. Maybe RAID implementations from other vendors are better. Keep your system simple and rely only on regular full backups to a second drive or external media.

[SCCM 2012 R2]: Troubleshooting database replications and service broker issue

December 24th, 2016

Last week I dealt with a very interesting and unusual SCCM failure. It started with a link failure error between the CAS and one of the primary sites. When I ran Replication Link Analyzer (RLA), the first error message was “SQL Server Broker login is missing for sites: <my primary site code>”. RLA then informed me that the login had been recreated, but in fact it had not and the issue was still there. I was also unable to find any information on how to recreate the SQL Server Broker login, or even what it is.

After a more rigorous search I found the following SQL command, which shows the SQL replication status in real time.

USE CM_CAS;
SELECT * FROM sys.transmission_queue;

The content of that table should change dynamically. In my case there was a bunch of stalled messages with ConfigMgr_Site<My Primary Site Code (PSC)> in the to_service_name column and “Connection attempt failed with error: 10060” in the transmission_status column. That gave me a clear indication that Service Broker transmission was broken between my CAS and the primary site server (PSS).

Note: when the transmission is resumed, SQL should clear up the stuck messages; however, sometimes you might need to clear them up yourself using update sys.transmission_queue. Also, please note that any intervention in the SCCM database is not supported by Microsoft.

After several telnet tests I figured out that Service Broker was not listening or responding on the PSS database server.
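The telnet test only checks whether a TCP connection to the Service Broker port can be opened, so it can also be scripted. Below is a minimal sketch in Python, assuming Python is available on the machine you test from; the helper name can_connect is hypothetical, and the host and port are whatever your own environment uses. Windows socket error 10060, as seen in transmission_status, is a connection timeout, which is the kind of failure this check returns False for.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake,
        # which is all a telnet connectivity test really verifies.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling can_connect with the PSS FQDN and the Service Broker port should return False while the endpoint is missing and True once it is listening again.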

In our environment all SQL servers are shared hosts, so all Service Brokers use private ports. To identify the port used by Service Broker, run the following SQL script on your SQL instance:

USE CM_CAS;
SELECT port FROM sys.tcp_endpoints WHERE type_desc LIKE '%SERVICE_BROKER%';

Note that there can be only one Service Broker endpoint per SQL Server instance.

I executed the query above and it returned nothing. That told me that the Service Broker endpoint had somehow been deleted on that server.

At that stage I was about to give up. There is a script which creates a Service Broker endpoint, but I knew that SCCM secures all internal communications with certificates and I had no idea which certificate to use. I was thinking of either calling Microsoft or reinstalling the site (including several role servers), but fortunately I found the required script on the Internet.

CREATE ENDPOINT [ConfigMgrEndpoint]
STATE = STARTED
AS TCP (LISTENER_PORT = <my port>, LISTENER_IP = ALL)
FOR SERVICE_BROKER (
    MESSAGE_FORWARDING = ENABLED,
    MESSAGE_FORWARD_SIZE = 5,
    AUTHENTICATION = CERTIFICATE [ConfigMgrEndpointCert],
    ENCRYPTION = REQUIRED ALGORITHM AES)
GO

All good, but how would I know which private port was used by my missing Service Broker? In SQL Management Studio go to CM_CAS\Service Broker\Routes\ConfigMgrDRSSiteRoute_<your PSC>, open its properties and check the Address field, which looks like TCP://<your PSS FQDN>:<port>.
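If you need to pick the port out of several route addresses, the split is easy to script. This is only a sketch: parse_route_address is a hypothetical helper name, and it assumes the address has exactly the TCP://<host>:<port> shape described above.

```python
from typing import Tuple


def parse_route_address(address: str) -> Tuple[str, int]:
    """Split a route address of the form TCP://<host>:<port> into (host, port)."""
    scheme, sep, rest = address.partition("://")
    if not sep or scheme.upper() != "TCP":
        raise ValueError(f"not a TCP route address: {address!r}")
    # Split on the last colon so the host part is kept whole.
    host, colon, port = rest.rpartition(":")
    if not colon or not host or not port.isdigit():
        raise ValueError(f"missing host or port in: {address!r}")
    return host, int(port)
```

The returned port is the value to put in place of <my port> in the endpoint script, and the host and port together are what you point the telnet test at.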

Once I had executed the SQL script above, my telnet test succeeded. I ran RLA again and it gave me the “SQL Server Broker login is missing for sites: <my primary site code>” error again, but this time it succeeded in fixing the issue and the error has not reappeared.

I thought it was now just a matter of time, but several hours later I still saw no activity in rcmctrl.log. However, all the error messages in sys.transmission_queue were gone.

So I had another look at the link status, specifically at the Initialization Detail tab. It is very important to look at it from both sides, i.e. both CAS and PSS. On the PSS side I noticed that one replication group was stuck at 1% replicating up to the CAS.

There is a way to reset a replication group. You have to create a <replication group name>.pub file and place it in the rcm.box inbox. This file should disappear in 5-10 seconds. If it does not disappear at all, that clearly indicates that the issue is on the other end. Delete it and try from the other side.

Once I dropped the PUB file into rcm.box, it unblocked the replication. I started to see replication activity in rcmctrl.log and file exchange in rcm.box. The issue was gone in an hour and a half.

Several important things to remember when you have an SCCM 2012 replication issue:

  • Start your troubleshooting with RLA.
  • If the primary site sits in the link failure state for a substantial amount of time, SCCM puts the primary site in read-only mode and the link in the maintenance state.
  • If the issue is not fixed, SCCM will also put the CAS database in maintenance mode, and consequently the rest of the links will fail.
  • Check sys.transmission_queue for stuck transactions. The content of this table must rapidly change.
  • Check rcmctrl.log for any activity.
  • Identify your Service Broker ports and run telnet connectivity tests.
  • Check CM_<site code>\Service Broker\Queues to see whether any queue is down.
  • The easiest way to reset replication for a replication group is to drop a <replication group name>.pub file into the rcm.box inbox. Note that the PUB file name should be <replication group name>.pub on the PSS; on the CAS it should be <replication group name>-<primary site code>.pub. The file should disappear in 5-10 seconds. If it does not, the issue is on the other end. Delete the file and try on the other end.
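The PUB file naming rule in the last point is easy to get wrong, so here is a small Python sketch of it. All three function names are hypothetical, the rcm.box path is whatever it is on your site server, and the sketch assumes that an empty file is enough to trigger the reset. As with any manual intervention, treat it as an illustration rather than a supported procedure.

```python
import time
from pathlib import Path
from typing import Optional


def pub_file_name(group: str, on_cas: bool,
                  primary_site_code: Optional[str] = None) -> str:
    """Build the PUB file name: plain on the PSS, suffixed with the site code on the CAS."""
    if on_cas:
        if not primary_site_code:
            raise ValueError("primary_site_code is required on the CAS")
        return f"{group}-{primary_site_code}.pub"
    return f"{group}.pub"


def drop_pub_file(rcm_box: Path, file_name: str) -> Path:
    """Create the PUB file in the rcm.box folder (assumption: an empty file suffices)."""
    target = Path(rcm_box) / file_name
    target.touch()
    return target


def wait_until_gone(path: Path, timeout: float = 10.0, poll: float = 0.5) -> bool:
    """Return True if the file disappears within timeout seconds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not path.exists():
            return True
        time.sleep(poll)
    return not path.exists()
```

If wait_until_gone returns False, delete the file and repeat the same steps on the other end of the link.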

Nokia 1020 – a good gadget but mediocre as a phone

April 25th, 2014

This is a bit of a disappointment with the Nokia Lumia 1020. So much of a PDA and so little of the mobile phone it is supposed to be.

I will be a bit more specific.

1) Ringer and music volume controls are not independent. Yes, that’s right, they are not. Even the iPhone has them independent, but for whatever reason the new Nokia does not.

2) Ringer profiles are gone, and vibration is turned off and on separately. Forget about the ‘Silent’, ‘Meeting’ and ‘Outdoor’ settings. Now you have to go to several places and switch several controls to get your desired setting.

3) Alarm does not work when the phone is switched off! That’s a shame.

Overall experience – it’s a great gadget, but it lacks basic mobile phone functionality and is therefore a mediocre phone.

The blog is back

July 13th, 2013

Now, after several weeks of silence, my blog is back online, unfortunately with a new host name. My old and well-known domain name chegorian.com, which I registered 15 years ago, has been hijacked and stolen. It is currently “owned” by a company called Aplus.Net. I have raised several complaints with ICANN, but I do not really believe I will ever reclaim this domain name. However, I will keep my old logo at the top of the page.

Anyway I had to start this site from scratch. So be it…
