An Interesting DFSR Change (that probably everyone knew about but me)

First, wooo I hit 5K on ServerFault today.

I'm embarrassed to say that something I read about recently but didn't pay enough attention to at the time officially just bit me in the butt.  

A significant change occurred in January 2012 in the way that DFS Replication behaves.  Windows Server 2008 R2 SP1 post KB2663685 and Windows Server 2012 have changed the default behavior of DFSR.  Auto-recovery of DFSR replicated folders after unexpected shutdown is now disabled.  In other words, if a computer that hosts a DFSR replicated folder experiences an unexpected shutdown, DFSR will not automatically resume upon reboot.  (This includes Sysvol!

On older versions of Windows, DFSR auto-recovery was enabled.  I'm sure the reason for this change involves auto-recovery leading to unexpected rollbacks and unauthoritative conflict resolutions between replication partners, especially in wide-spread domains with high end-to-end replication latency and frequent changes... but even though the news was published, I for one didn’t pay enough attention to it and it has a very real effect on the way we manage our Windows systems that utilize DFSR going forward. 

So what if a domain controller or a file server with a DFSR share on it running 2008R2 or 2012 crashes unexpectedly, leaving the DFS database and the NTFS USN journal out of sync?  Then Sysvol no longer receives updates on that DC.  The DFSR file share no longer receives updates on that file server.  It's up to you to manually restart replication, and to resolve any conflicts with replication partners if changes took place during the time that the crashed server wasn't replicating. 

Luckily that is easy to do, and it's also possible to set the behavior back to auto-recovery if that is what you wish. 

How will I know if this effects my server?

While this first example is just a symptom of the problem, here is how it first came to my attention, triggering the investigation:

(Click on images for a better view.)

Errors in my event log

Application of Group Policy was failing, but only on DC02 and servers which were using DC02 as a domain controller.  Not DC01 or any server logged on by DC01.  As it turns out, the GPO referenced by that error event, a new GPO that I had just created on DC01, didn’t exist on DC02, hence the errors.  Sysvol did not seem to be replicating anymore.

Here is the actual event log event to let you know that DFS Replication has stopped on one or more volumes:

DFSR Error

Luckily, starting replication back up again is easy and the command to do it with your actual GUID, is right there in the event:

wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="12345678-ABCD-1234-EFGH-1A2B3C4E5F" call ResumeReplication

 You can also turn auto-recovery back on with wmic or by modifying the registry if you don’t have the time to be bothered by this:

HKLM\System\CurrentControlSet\Services\DFSR\Parameters\StopReplicationOnAutoRecovery = 0

Just be aware that auto-recovery can lead to unwanted rollbacks of DFSR data in some circumstances.

Comments (2) -

We have a very busy dfsr hub server that results in a dirty dfsr shutdown during every restart.  By default, Windows 2012 allows a service 30 seconds (the WaittoKillService value) to shutdown gracefully before it is terminated to proceed with the shutdown.  Unfortunately the dfsr service is taking longer than this to commit all its database changes.  Windows forces it to terminate and causes the 2213.  The remedy is to change the WaittoKillService value to something longer than 30 seconds.  The longest you can set it is an hour...which should be sufficient.  The setting applies to all services.  Here's the article:


Fantastic info.  Thanks so much for taking the time to comment!

Pingbacks and trackbacks (1)+

Comments are closed