What initially tipped me off that there was an issue was that, while the DNS Server service was running, attempting to open the DNS console produced a message that the DNS server wasn’t running. In addition, accessing any of the Active Directory management tools was exceptionally sluggish, and neither the sysvol nor the netlogon share had been created on the new DC.
After working on this myself for a while, I ended up contacting Microsoft support and eventually found the issue to be one that doesn’t have a publicly accessible knowledge base article but evidently is documented internally. In this article I’m going to outline the specifics of the issue, the commands that I found helpful in troubleshooting it, and finally what ultimately fixed it.
Problem description
A good starting point is to visualize the network, so please refer to the network diagram above. In my situation all domain controllers are meshed with replication connections to each other. Field office 3 is a brand new location, so a new site and subnet were set up first and then a Windows Server 2008 R2 server was spun up in that subnet. After I installed the Active Directory Domain Services role and ran dcpromo, which completed with zero errors, I began to see the issues described above. Further inspection showed that no replication connections had been created for the server in AD Sites and Services. The following errors also showed up repeatedly in the event log:
- ActiveDirectory_DomainService 1865 “The Knowledge Consistency Checker (KCC) was unable to form a complete spanning tree network topology. As a result, the following list of sites cannot be reached from the local site.”
- ActiveDirectory_DomainService 1311 “The Knowledge Consistency Checker (KCC) has detected problems with the following directory partition.”
- ActiveDirectory_DomainService 1566 “All directory servers in the following site that can replicate the directory partition over this transport are currently unavailable.”
- DNS-Server-Service 4013 “The DNS server is waiting for Active Directory Domain Services (AD DS) to signal that the initial synchronization of the directory has been completed.”
Above and beyond these issues, the portqry.exe tool showed that the server was not listening on any of the relevant domain controller ports, TCP 137-139 or UDP port 53.
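For readers without portqry.exe at hand, the TCP half of that check can be roughly approximated in Python. This is only a sketch, not the tool used above: the function names are my own, and it covers TCP only, because UDP is connectionless and a UDP 53 check would need an actual DNS query rather than a connection attempt.

```python
import socket


def tcp_port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def scan_tcp_ports(host: str, ports: list[int], timeout: float = 2.0) -> dict[int, bool]:
    """Map each port to whether something accepted a TCP connection on it."""
    return {port: tcp_port_is_listening(host, port, timeout) for port in ports}


# Example call (the host name is hypothetical; the ports are the TCP ones named above):
# for port, is_open in scan_tcp_ports("newdc.example.local", [137, 138, 139]).items():
#     print(f"TCP {port}: {'LISTENING' if is_open else 'NOT LISTENING'}")
```

A refused connection and a filtered port both come back as False here, so unlike portqry.exe this sketch cannot tell "not listening" apart from "blocked by a firewall".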
Problem troubleshooting
Once the problem was as fully defined as possible, both by me and by the Microsoft support engineers, the troubleshooting process began. Before contacting support I took the generic step of demoting and then re-promoting the domain controller, with no noticeable effect. After contacting support I was honestly surprised that this seemed to be a staple of troubleshooting for them as well: at each tier of support that I worked up through, the process was done again, and in total domain controller promotion was performed on this server 4 times. Once we got past that, a handful of commands provided quite a bit more information. These include:
- Repadmin /syncall /AdePq: Synchronizes a server with all of its replication partners; the modifiers help when performing the sync in a multisite environment.
- Repadmin /replsum: Summarizes the state of replication across the forest.
- Repadmin /kcc *: Forces a recalculation of the topology, which has the effect of rebuilding the automatically created partner connections in Sites and Services.
- Dcdiag /test:Connectivity: Dcdiag overall is great, but with the /test modifier you can run only the specific tests you need.
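If you find yourself running these repeatedly, as I did, they can be wrapped in a small script that captures the output of each one. The following is a minimal sketch under a few assumptions: it runs on a machine where repadmin and dcdiag are on the PATH, and the names DIAGNOSTIC_COMMANDS and run_diagnostics are purely illustrative. The commands themselves are exactly the four listed above.

```python
import subprocess

# The four commands listed above, as argument lists for subprocess.
DIAGNOSTIC_COMMANDS = [
    ["repadmin", "/syncall", "/AdePq"],
    ["repadmin", "/replsum"],
    ["repadmin", "/kcc", "*"],
    ["dcdiag", "/test:Connectivity"],
]


def run_diagnostics(commands=DIAGNOSTIC_COMMANDS, runner=subprocess.run):
    """Run each command in order; return (command line, exit code, output) tuples."""
    results = []
    for argv in commands:
        # capture_output/text collect stdout as a string instead of printing it.
        completed = runner(argv, capture_output=True, text=True)
        results.append((" ".join(argv), completed.returncode, completed.stdout))
    return results


# Example call:
# for line, code, output in run_diagnostics():
#     print(f"== {line} (exit code {code})\n{output}")
```

The runner parameter exists only so the function can be exercised without a domain controller; in normal use the default subprocess.run is what actually executes the tools.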
Problem solution
As stated in the introduction, the problem here ended up being one known within the Directory Services support group but, as far as they or I know, not documented publicly anywhere. Essentially, if you bring up a domain controller in a site that does not already contain a fully replicated domain controller, replication will continuously fail; as soon as the domain controller is logically placed in a site with a “good” domain controller, it will replicate. So in this case the fix was as simple as going into AD Sites and Services, choosing Move on the domain controller with the issue, and putting it in a different site.
Once that was done, I again ran repadmin /kcc * to create the correct site connections, followed by repadmin /syncall. After replication finished, I noticed that the local DNS server was functioning correctly and that the sysvol and netlogon shares had been created on the server. Next I ran repadmin /replsum again and saw that successful replication had occurred. Finally, I was able to logically move the server back to the correct site, and everything functioned normally.
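The share check in particular is easy to script if you want a quick yes/no after replication finishes. Here is a rough sketch that looks for the two shares in the text printed by the net share command. The function name is my own, and it assumes the share name is the first column of each output row, which is how net share normally lays out its table.

```python
REQUIRED_DC_SHARES = {"SYSVOL", "NETLOGON"}


def missing_dc_shares(net_share_output: str) -> set[str]:
    """Return the required shares that do not appear in `net share` output."""
    present = set()
    for line in net_share_output.splitlines():
        fields = line.split()
        if fields:
            # Assumption: the share name is the first column of each row.
            present.add(fields[0].upper())
    return REQUIRED_DC_SHARES - present


# Example call on the domain controller itself:
# import subprocess
# output = subprocess.run(["net", "share"], capture_output=True, text=True).stdout
# print(missing_dc_shares(output) or "sysvol and netlogon are both shared")
```

An empty result means both shares exist, which is the state I saw once replication had completed.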