The information provided in event logs is often not very clear, but it has definitely gotten better starting with W2K8. I recently encountered an issue where replication delays to a certain DC were reported. I immediately looked at the repadmin replication summary (repadmin /replsummary) and noticed that my largest deltas, which usually stay within an hour, had jumped to about 13 hours.
The ‘destination DSA’ section gives you a clue about which DC is having trouble pulling replicated changes in. I looked at the event logs on that DC and filtered the DS logs to around 13 hours earlier. I noticed event IDs 1393 and 1480 citing the low disk space and how it had paused the Netlogon service. Luckily, there was no “user authentication traffic” against this DC, as it was a hub site DC with no user subnet tied to its site; otherwise the impact would have been bigger, with users unable to log on to their workstations. The replication lag in this instance was reported through an internal application portal that was not reflecting the changes made in AD. Nonetheless, it was an issue that had to be fixed.
Low disk space on a DC is a serious issue in any case, but in this instance a FIM job had just run and imported thousands of new objects, causing a slight increase in the size of the NTDS database. This is one of the reasons I don’t like to put NTDS and SYSVOL on the system partition. I cleaned up some unneeded and temp files, but the long-term solution is to relocate the DB to a different drive/volume: http://technet.microsoft.com/en-us/library/cc782948%28WS.10%29.aspx.
Immediately after the cleanup, replication was back to normal.
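Since the root cause here was a volume quietly running out of space, a small scheduled check can give earlier warning than the event log does. The following is only a minimal sketch in Python, not something I ran during this incident: the function name `free_space_report` and the 15% threshold are assumptions for illustration, and the path you pass in should be whichever folder holds the NTDS database on your DC.

```python
import shutil
import sys


def free_space_report(path, min_free_fraction=0.15):
    """Return (free_fraction, is_low) for the volume that contains `path`.

    `min_free_fraction` is an illustrative threshold, not a Microsoft
    recommendation; tune it to your own environment.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction < min_free_fraction


if __name__ == "__main__":
    # Pass the folder that holds the NTDS database as the first argument;
    # the current directory is used only as a placeholder default.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    fraction, is_low = free_space_report(target)
    status = "LOW" if is_low else "ok"
    print(f"{target}: {fraction:.1%} free ({status})")
```

Run from a scheduled task, the printed line can feed whatever alerting you already have; the point is simply to notice the shrinking free space before the DC does.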