Simple Repair of AD


Restoring data in AD is flexible, and the method you choose depends on what has been lost or corrupted. In the simplest and most common case, a single DC or GC server needs to be restored. This could be caused by

  • Hardware failure causing a crash.

  • Software (including driver) failure causing a crash.

  • Network connectivity failure that, if unnoticed, could cause the DC to fail to replicate for a time period greater than the tombstonelifetime value, requiring a rebuild.

  • DC forcefully retired from the network. Surprisingly, this is probably the number 1 cause of DC failures that I have seen. Usually a result of administrative failures, DCs are simply unplugged and reused or retired without gracefully demoting them.

When a DC becomes unavailable, the decision of how to restore it depends on whether (and when) it will come back online, whether the failure is a hardware or software issue that is not AD-related, or whether the failure is due to an AD failure, such as broken replication.

If the failure is a hardware issue, Windows Server 2003, like Windows 2000, provides Safe Mode startup and the Recovery Console, which allows a low-level boot, permitting the Administrator to replace drivers, make Registry changes, or make other repairs. The Recovery Console is an option available in Windows Server 2003 Setup that can be initialized by following these steps:

1. Boot from a Windows Server 2003 CD.

2. During the text mode portion of setup, you are given an option to Repair or Recover an existing installation. Select R (to repair).

3. When prompted, choose the partition with the Windows Server 2003 installation on it.

4. You are prompted to enter the password for the local Administrator account.

5. This boots to a command-line prompt allowing you to use some basic commands to perform repair operations, including installing or removing drivers, making Registry changes, and other modifications. Type HELP at the command prompt for online help and a list of valid commands to use.

6. After this has been done, a Recovery Console option appears in the boot menu.

The next section describes how to recover DCs and GCs if AD problems exist.

Repairing and Restoring DCs and GCs

One of the things we learned supporting Windows 2000 (and that Microsoft actually perpetuated) was to use demotion of a DC as a common troubleshooting technique. I know it sounds kind of lazy, but if you have made a reasonable attempt to fix a DC that is having trouble with replication, AD database corruption, and so on, and if the problem seems isolated to one DC, the easiest and most effective repair is to demote the DC and repromote it back into the domain. The only data lost would be objects created on that DC or GC that had not replicated to any others. Those objects would have to be re-created or restored from media if they had been backed up. Obviously, if the problem is rampant, you need to spend some time and find the problem, but if the problem is a missing object, a corrupt ntds.dit file, or a missing service principal name (and a reasonable effort to repair it has failed), just repromote it. I've seen Administrators spend days on a problem when a demote-repromote action in a few hours would have solved the problem.

This rule applies to GCs as well, but GCs typically hold considerably more data and have to sync with other GCs and might take a much longer time to replicate. Demoting a GC will have other repercussions if Exchange is deployed because the Global Address List (GAL) is held in the GC. Exchange clients will find another GC to get the GAL from, but performance will be reduced if the local GAL is removed and they have to connect to a remote GC.

The strategy for recovering DCs and GC servers depends on whether other DCs exist in the domain, whether other GCs exist in the forest, and whether the failed DC or GC was the only one in the domain.

Nonauthoritative Restore

A nonauthoritative restore is simply the restoring of a DC from backup. When the DC completes the restore process, it will get updates from its peer DCs via normal AD replication. The process to perform this restore operation is as follows:

1. Boot the DC to be restored into Directory Services Restore Mode (DSRM). Press the F8 key on startup and then select Directory Services Restore Mode from the menu.

2. Log on as the local Administrator (the account you set a password for when you ran DCPromo on this machine to install AD).

3. This boots the DC into Safe Mode without AD components being active, and without any changes made to the ntds.dit database.

4. Run the Windows Server 2003 Backup utility and select the Restore tab.

note

In the Restore process, you must also go into the Advanced options and make sure you are restoring junction points. If you do not do this, the restore process will fail. Note that the Restore Junction Points . . . option is enabled by default, but it's a good idea to check it anyway and make sure.

5. In the left pane of the Backup Utility, expand the File icon to locate the backup file and the system state under it. Check the box for System State.

6. In the Restore Files To drop-down list, select Original Location.

7. Click the Start Restore button.

8. Click OK to accept the warning about restoring system state overwriting current system state (this is what you want, to replace the existing with the copy that is in the backup media).

9. In the Confirm Restore dialog box, click Advanced.

10. In the Advanced Restore Options dialog box, make sure the following are checked:

  • Restore Security

  • Restore Junction Points

  • Preserve Existing Volume Mount Points

11. Click OK.

12. Click OK to confirm restore.

13. When the restore is complete, click Yes to Restart Computer.

After rebooting, the system will replicate with its replication partner to get changes made since the backup that was just used to restore the system state.

note

A nonauthoritative restore of AD automatically executes a nonauthoritative restore of SYSVOL; therefore, no additional steps are required. SYSVOL is included in the system state backup. Restoration of SYSVOL using the Install From Media feature requires certain precautions noted in Microsoft KB article 311078, which covers using Install from Media to promote replica Windows Server 2003 DCs. Recovery using the Install From Media feature is covered later in this chapter as well.


Effects of Tombstonelifetime

As explained in Chapter 5, "Active Directory Logical Design," tombstonelifetime is a forest setting that defines how long a deleted object remains in the deleted objects container before it is purged from the AD. Within this timeframe, the deleted object (tombstone) is replicated to other DCs to inform them of the deletion. After the tombstonelifetime has expired, the garbage collector removes the deleted objects from the AD. The valid lifetime of any backup media used to restore AD is equal to the TombstoneLifetime value, which is, by default, 60 days. Chapter 5 explained that if a DC or GC came back online after 60 days, it might contain objects that had been deleted in the meantime and since purged from the AD. It then tries to replicate them again because its replication partners no longer have them in their copy of the AD. This causes orphaned objects to be propagated in the AD, which then breaks AD replication. Windows Server 2003 and Windows 2000 SP3+ provide ways to prevent and repair this situation.

However, restoring a DC or GC from backup media that is more than tombstonelifetime days old will have the same effect, causing those old purged objects to be replicated. It is not recommended to change this value. That said, I have seen statements by Microsoft recommending a Tombstone Lifetime of 120 days to mitigate the effects of a short lifetime.
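The rule above comes down to a date comparison. The following is a minimal sketch of that check, assuming the 60-day default cited in the text; the function and constant names are illustrative, and the real value should be read from your own forest rather than hard-coded.

```python
from datetime import date, timedelta

# Default cited in the text; your forest may use a different value.
DEFAULT_TOMBSTONE_LIFETIME_DAYS = 60


def backup_is_usable(backup_date, today,
                     tombstone_lifetime_days=DEFAULT_TOMBSTONE_LIFETIME_DAYS):
    """Return True if the backup is younger than the tombstone lifetime."""
    age = today - backup_date
    return timedelta(0) <= age < timedelta(days=tombstone_lifetime_days)


if __name__ == "__main__":
    today = date(2004, 2, 17)
    for backup in (date(2004, 1, 20), date(2003, 12, 1)):
        verdict = "usable" if backup_is_usable(backup, today) else "too old, do not restore"
        print(backup.isoformat(), verdict)
```

The comparison is strict on purpose: a backup that is exactly tombstonelifetime days old is treated as expired, which errs on the side of not reintroducing purged objects.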

Disaster Recovery of AD on Different Hardware

It is possible to recover AD onto servers that do not have the same hardware configuration as the servers the backup was taken from. Obviously, the closer you can get to the same hardware, the easier the restore will be, but it's possible to do it. This procedure is intended for an off-site recovery plan. Many organizations contract with a company that specializes in data storage and recovery, which in turn stores a copy of all backups in an off-site facility. In the event a disaster wipes out all of the DCs, this vendor could take the backup tapes, obtain new hardware, and restore the backups to the new hardware, which might be different than the original hardware that hosted the DC.

Microsoft has published an article on how to do this in KB article 263532, "How to perform a disaster recovery restoration of Active Directory on a computer with a different hardware configuration," which contains step-by-step instructions. At the writing of this book, this KB article specifies Windows 2000, and there are no specific instructions for Windows Server 2003, so I'd advise you to watch Microsoft's site for updates. In selecting the hardware, the new server must meet the following criteria:

  • Have a complete backup of system state and system drive. The delta between the current date and the date of the backup must be less than the TombstoneLifetime value.

  • Have the same Hardware Abstraction Layer (HAL), kernel, and number of processors.

  • Remove teaming network cards (you can re-enable teaming after recovery is complete).

  • Have the same disk drive controller and configuration.

  • Have the same number of physical disk drives and drive letters.

  • Have the same version of the OS installed (service packs, and so on). When installing the OS on the new server, specify the same drive and folder name as the system drive. For instance, if the old DC was installed to D:\Windows, then the new server should be installed to D:\Windows for the system drive.

  • Have the same video bus. Don't use a system that has a different video bus than the original (that is, AGP versus Peripheral Component Interconnect [PCI]).

  • If there are multiple network adapters in the new machine and a single one in the original, disable the extra Network Interface Cards (NICs) until the recovery is complete.

Note that you can always upgrade the hardware (add processors, disks, and so on) after you complete the recovery.
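One way to apply the criteria above is to record both machines as simple inventories and compare them before starting the restore. The sketch below is only an illustration: the dictionary keys and sample values are hypothetical and are not produced by any Microsoft tool.

```python
# Hypothetical inventory keys that mirror the matching criteria listed above.
MUST_MATCH = (
    "hal",
    "processor_count",
    "disk_controller",
    "physical_disk_count",
    "drive_letters",
    "os_version",
    "system_root",
    "video_bus",
)


def hardware_mismatches(original, replacement):
    """Return the criteria on which the replacement differs from the original."""
    return [key for key in MUST_MATCH if original.get(key) != replacement.get(key)]


if __name__ == "__main__":
    old_dc = {
        "hal": "ACPI Multiprocessor PC", "processor_count": 2,
        "disk_controller": "Smart Array 5i", "physical_disk_count": 2,
        "drive_letters": ("C", "D"), "os_version": "Windows Server 2003",
        "system_root": r"D:\Windows", "video_bus": "PCI",
    }
    new_dc = dict(old_dc, system_root=r"C:\Windows", video_bus="AGP")
    print(hardware_mismatches(old_dc, new_dc))
```

An empty list means the replacement server matches on every recorded criterion; NIC teaming and extra network adapters still have to be disabled by hand as described in the list.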

Manual Demotion of a DC/GC

One of the problems in restoring a single DC by demoting and repromoting via DCPromo is that DCPromo can fail. For instance, one of the most frequent reasons to repromote a DC is failure of replication. If replication is broken, DCPromo won't be able to replicate the changes (to remove this DC's objects from the AD on its partners), so you are stuck. Not long after Windows 2000 was released, Microsoft came up with a method to manually demote a DC. Although complex, the method worked. Microsoft never published the method because it hadn't been fully tested, but Microsoft would step you through the procedure if you logged a case and a manual demotion was required to solve the problem.

warning

Manual demotion of a DC removes the DC from the domain (deletes the computer object) and leaves it as a standalone server in a workgroup. If the DC is also an Exchange server, manually demoting it requires resetting security afterward. See the "Caveats" section (coming up) for details on how to fix this. Likewise, any application relying on domain security could be affected by manual demotion. Be sure to determine if you have apps that fall into this category and determine the necessary recovery procedures.


Manually Demoting a DC

Windows Server 2003 and Windows 2000 SP4 and later provided an official way to force a DC to be demoted to a Standalone Server:

1. At a command prompt, enter the command:

 Dcpromo /forceremoval 

2. A pop-up message warns about the consequences of continuing (see Figure 11.2). Click OK.

Figure 11.2. The Active Directory Installation Wizard warns that you are removing the DC from the domain without updating forest metadata.


3. When the Active Directory Installation Wizard finishes, you will be prompted to reboot. Accept it and reboot the machine.

4. Go to another DC in the domain, log in as an Enterprise Admin, and remove the metadata from AD using the process described in Microsoft KB article 216498: "HOW TO: Remove Data in Active Directory After an Unsuccessful Domain Controller Demotion."

5. Perform metadata cleanup using NTDSUtil.exe and delete the computer object for the DC that was forcefully demoted. This sometimes, but not always, deletes the object seen in the Active Directory Users and Computers snap-in. Delete the object from the snap-in if necessary.

6. Go to Active Directory Sites and Services, open up the site where the DC that was forcefully demoted exists, and delete that DC's server object.

7. In Active Directory Users and Computers, go to the View menu in the snap-in and select Advanced Features. This will expose the System container in the left pane of the snap-in.

8. Expand the System container.

9. Expand the File Replication Service container.

10. Expand the Domain System Volume (SYSVOL share). Under this object are objects for each server. These are the File Replication Service (FRS) member objects. If there is an object for the DC that you are forcefully demoting, delete it.

11. On the DC that was forcefully demoted, make sure there is no %windir%\sysvol directory. The DCPromo /forceremoval command should have deleted SYSVOL, but check anyway. If it's there, delete it.

This procedure deletes the domain computer account and puts the server in a Workgroup, unjoined from the domain.
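The cleanup steps above touch objects in two different naming contexts, which is easier to see when their distinguished names are written out. The sketch below builds those names from the containers named in the steps. The server, site, and domain values are hypothetical, the computer object is assumed to still be in the default Domain Controllers OU, and no escaping is done for names that contain commas.

```python
def domain_dn(dns_name):
    """Convert a DNS domain name such as 'example.com' to 'DC=example,DC=com'."""
    return ",".join("DC=" + label for label in dns_name.split("."))


def cleanup_targets(dc_name, site_name, domain, forest_root):
    """Return the DNs of the objects the manual cleanup steps look for."""
    domain_nc = domain_dn(domain)
    config_nc = "CN=Configuration," + domain_dn(forest_root)
    return {
        # Assumes the default location; the account may have been moved.
        "computer_object": f"CN={dc_name},OU=Domain Controllers,{domain_nc}",
        # Server object seen in Active Directory Sites and Services.
        "server_object": f"CN={dc_name},CN=Servers,CN={site_name},CN=Sites,{config_nc}",
        # FRS member object under System, File Replication Service.
        "frs_member": (
            f"CN={dc_name},CN=Domain System Volume (SYSVOL share),"
            f"CN=File Replication Service,CN=System,{domain_nc}"
        ),
    }


if __name__ == "__main__":
    # Hypothetical names used only for illustration.
    for label, dn in cleanup_targets("DC7", "Atlanta", "corp.example.com", "example.com").items():
        print(f"{label}: {dn}")
```

Printing the names first gives you a checklist to verify on a healthy DC after the cleanup; it does not delete anything.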

To rebuild this standalone server back to become a DC with the same name, perform the following steps:

1. After completing all the steps just listed to forcefully demote the DC, wait long enough for that information to replicate end-to-end in the forest. Hint: In Windows Server 2003's Repadmin command, you can find the approximate latency between sites with the command REPADMIN /REPLSUM /BYSRC /BYDEST /SORT:DELTA. Sample output from running this command in HP's Qtest test environment is shown here.

 Replication Summary Start Time: 2004-02-17 19:38:52

 Beginning data collection for replication summary, this may take awhile:
   .............................

 Source DC           largest delta  fails/total  %%  error
  HPQAM-DC3         11d.06h:19m:22s    3 /   5   60  (1722) The RPC server is unavailable.
  HPQEU-DC4         08d.12h:34m:06s    3 /   3  100  (1722) The RPC server is unavailable.
  HPQNET-DC2        05d.11h:36m:26s    3 /   5   60  (1722) The RPC server is unavailable.
  HPQEU-DC26        04d.17h:53m:00s    6 /   6  100  (1753) There are no more endpoints available from the endpoint mapper.
  HPQEU-DC9             01h:07m:38s    2 /  15   13  (1722) The RPC server is unavailable.
  HPQAM-DC2                 48m:50s    0 /  24    0
  HPQNET-DC1                48m:49s    0 /   8    0
  HPQNET-DC4                48m:35s    0 /  20    0
  HPQEU-DC19                48m:15s    0 /  11    0
  HPQNET-DC9                47m:57s    0 /  34    0
  HPQEU-DC1                 47m:57s    0 /  34    0
  HPQNET-DC5                47m:35s    0 /  10    0
  HPQEU-DC13                46m:50s    0 /   9    0
  HPQEU-DC18                45m:46s    0 /   6    0
  HPQAM-DC1                 45m:30s    0 /   9    0
  HPQNET-DC3                45m:09s    0 /  27    0
  HPQEU-DC7                 42m:56s    0 /   6    0
  HPQAP-DC1                 10m:21s    0 /  19    0
  HPQAP-DC2                 07m:19s    0 /   5    0
  HPQEU-DC3                 01m:49s    0 /   5    0
  HPQEU-DC16                01m:47s    0 /   3    0
  HPQEU-DC25                01m:26s    0 /   3    0
  HPQEU-DC28                01m:12s    0 /   3    0
  HPQEU-DC14                01m:03s    0 /   3    0
  HPQAM-DC13                     0s    0 /   5    0
  HPQAM-DC4                      0s    0 /   5    0

In a healthy environment, the Largest Delta value will be less than or equal to the replication frequency defined for the site link that applies to the site that the DC is in. In this example, several DCs at the top of the list are reporting error 1722 (The RPC server is unavailable) with Largest Delta times of several days. The %% column is the percentage of replication attempts that failed (the fails value divided by the total). In a perfect environment, all DCs will have a zero (0) in the %% column and a Largest Delta less than or equal to the site's replication frequency. If we performed a manual demotion in the environment shown in the sample output above, several DCs would not get the information to delete the metadata for the demoted DC (or any other changes in the AD for that matter) until they clear the error.

Figure 11.3. Modify the rights on the Exchange server object so that the local machine account has sufficient rights using the ADSIedit tool.

2. After end-to-end replication has occurred, you can run DCPromo on the machine that was forcefully demoted, using the same name.
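Deciding whether end-to-end replication has occurred means reading the replication summary from step 1 row by row. The following sketch parses rows in the format of the sample output and lists the DCs that have failures or a Largest Delta above a threshold. It assumes the column layout shown in that sample, and the 180-minute threshold is an assumed site link replication frequency, not a value taken from the text.

```python
import re

# Seconds per unit letter used in delta strings such as "11d.06h:19m:22s".
SECONDS_PER_UNIT = {"d": 86400, "h": 3600, "m": 60, "s": 1}
DELTA_PART = re.compile(r"(\d+)([dhms])")
# Columns: name, largest delta, fails / total, percent, optional error text.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s*/\s*(\d+)\s+(\d+)\s*(.*)$")


def delta_seconds(text):
    """Convert a delta such as '11d.06h:19m:22s' or '48m:50s' to seconds."""
    return sum(int(n) * SECONDS_PER_UNIT[u] for n, u in DELTA_PART.findall(text))


def parse_row(line):
    """Parse one summary row; return None for headers and other lines."""
    match = ROW.match(line)
    if match is None:
        return None
    dc, delta, fails, total, percent, error = match.groups()
    return {"dc": dc, "delta_s": delta_seconds(delta), "fails": int(fails),
            "total": int(total), "percent": int(percent), "error": error.strip()}


def not_converged(rows, max_delta_s):
    """Names of DCs with failed attempts or a delta above the threshold."""
    return [r["dc"] for r in rows if r["fails"] > 0 or r["delta_s"] > max_delta_s]


SAMPLE = """\
Source DC           largest delta  fails/total  %%  error
 HPQAM-DC3         11d.06h:19m:22s    3 /   5   60  (1722) The RPC server is unavailable.
 HPQEU-DC9             01h:07m:38s    2 /  15   13  (1722) The RPC server is unavailable.
 HPQAM-DC2                 48m:50s    0 /  24    0
 HPQAM-DC4                      0s    0 /   5    0
"""

if __name__ == "__main__":
    rows = [r for r in map(parse_row, SAMPLE.splitlines()) if r is not None]
    # 180 minutes is an assumed replication frequency for the site link.
    print(not_converged(rows, max_delta_s=180 * 60))
```

If the list is empty for the threshold that matches your slowest site link, it is reasonable to proceed with DCPromo; otherwise clear the errors on the listed DCs first.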

note

If you repromote the DC using a different computer name, you can promote the DC back into the domain without waiting for end-to-end replication. Also, if you try to promote a machine back into the domain with the same DC name before replication completes, it will eventually work. Refer to the Replication section in Chapter 5 for additional details.


Caveats

The first caveat is critical to understand before you do this. Unlike a normal DC demotion that bumps the DC back to a member server, manual demotion puts it in a workgroup. In the process, it deletes the computer account. If the DC is only a DC, that's no big deal: when DCPromo is run again, the computer account will be re-created. However, if the DC is also hosting an application that uses the computer account, it will likely break the application.

A good example is Exchange. Manually demoting a DC that is also an Exchange server removes the computer account that contains security Access Control Entries (ACEs) for Exchange to work. If you DCPromo the machine back to the domain, the account will be re-created, but the ACEs will not be there. The computer account for the Exchange server is granted Full Control on the Exchange server configuration object of the same name:

 CN=ServerName,CN=Servers,CN=AdminGroupName,CN=Administrative Groups,CN=OrgName,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=domain,DC=com 

The permissions for the computer account should also flow down to the child objects. This is accomplished by opening the ADSIedit tool, available in the Windows Server 2003 Support Tools located in the \Support directory on the Windows Server 2003 CD. Go to Start, Run, and enter ADSIedit.msc to open the snap-in. Browse the Distinguished Name (DN) path just noted. Figure 11.3 shows how we drilled down to the rights for Exchange server ALFNADRLAB5 in the ALFMSLAB.Local domain, in the Exchange Organization ALFMESSAGELAB, and a member of the FirstSite administrative group. For this example, we expanded the following folders:

 Configuration [alfnadrlab5.alfmslab.local]
   CN=Configuration,DC=alfmslab,DC=local
     CN=Services
       CN=Microsoft Exchange
         CN=ALFMESSAGELAB
           CN=Administrative Groups
             CN=FirstSite
               CN=Servers
                 CN=ALFNADRLAB5

Right-click CN=ALFNADRLAB5 to get the Properties page. Select the Security tab, and then click the Advanced button at the bottom of the page. In the Advanced Security Settings dialog box, select the machine account, in this case ALFNADRLAB5$, and then click the Edit button. In the Permission Entry for the ALFNADRLAB5$ dialog box, check the Full Control right (all boxes for all rights should then be checked). In the Apply Onto field, select This Object and All Child Objects. This is an important step, because if this is left as This Object Only, Exchange won't work (no mail received or sent).
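The path browsed in ADSIedit follows the DN template given earlier in this section, so it can be assembled from four values. The sketch below does that for the lab names used in the example; the function names are illustrative, and no escaping is done for names that contain commas.

```python
def exchange_server_dn(server, admin_group, organization, forest_root):
    """Build the DN of the Exchange server configuration object."""
    config_nc = "CN=Configuration," + ",".join(
        "DC=" + label for label in forest_root.split(".")
    )
    return (
        f"CN={server},CN=Servers,CN={admin_group},CN=Administrative Groups,"
        f"CN={organization},CN=Microsoft Exchange,CN=Services,{config_nc}"
    )


def machine_account(server):
    """Computer accounts appear in the security dialog with a trailing '$'."""
    return server + "$"


if __name__ == "__main__":
    print(exchange_server_dn("ALFNADRLAB5", "FirstSite", "ALFMESSAGELAB", "alfmslab.local"))
    print(machine_account("ALFNADRLAB5"))
```

Writing the DN out before opening ADSIedit makes it easier to confirm that you are editing the server object whose name matches the re-created computer account.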

Preventing Disaster: The Lag Site

Errors, inconsistencies, and corruption in AD can be corrected by an authoritative restore, described in the following section, which rolls the AD back to the version that existed when the last backup was completed. However, this has repercussions in that there is a possibility of losing data and changes made since the backup. In addition, you can't restore the schema other than by restoring the entire forest as described in the "Recovery of a Forest" section in this chapter. Utilizing a Lag Site can conceivably mitigate these disaster scenarios. Think of it as an almost-real-time backup.

The concept of a Lag Site is to schedule the replication frequency of one or more sites so that the site doesn't replicate for several days, purposely keeping the associated DCs several days behind in replication. Of course, you don't want to have a bunch of DCs in normal sites in this condition, so a special site is created with a DC in it and the replication frequency on the site link to that site is configured with a long time period. The lag site would have the following characteristics:

  • A new site.

  • Populated with a DC from each domain. This means you need to build DCs specifically for this purpose; in other words, more new hardware.

  • Replication is scheduled for once a week. Define the replication frequency on the site link to this site for every 168 hours.

  • DCs are not allowed to authenticate users; they just replicate data. Do this by preventing the DCs in the lag site from registering Kerberos SRV records (see the last item in the list of drawbacks that follows). Because these DCs are way out-of-date, they will have bad group membership data, old passwords, users that have been disabled or deleted, or not contain users and groups that have been created.
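A lag site only helps if the bad change is noticed before the next scheduled replication carries it into the lag DCs. The following sketch works through that timing under the simplifying assumption that the lag site replicates exactly once every 168 hours; the function names and dates are illustrative. Site link intervals are typically entered in minutes, so the sketch also converts the weekly interval.

```python
from datetime import datetime, timedelta

LAG_INTERVAL = timedelta(hours=168)  # weekly replication, as listed above


def interval_minutes(interval):
    """Express a replication interval in whole minutes."""
    return int(interval.total_seconds() // 60)


def lag_copy_is_clean(last_lag_replication, bad_change_time):
    """True if the lag DC last replicated before the bad change was made."""
    return last_lag_replication < bad_change_time


def time_left_to_react(last_lag_replication, now, interval=LAG_INTERVAL):
    """Time remaining before the next scheduled replication reaches the lag site."""
    return max(last_lag_replication + interval - now, timedelta(0))


if __name__ == "__main__":
    last_sync = datetime(2004, 2, 15, 2, 0)    # hypothetical last lag replication
    bad_change = datetime(2004, 2, 17, 9, 30)  # hypothetical accidental deletion
    now = datetime(2004, 2, 18, 8, 0)
    print("interval in minutes:", interval_minutes(LAG_INTERVAL))
    print("lag copy still clean:", lag_copy_is_clean(last_sync, bad_change))
    print("time left to react:", time_left_to_react(last_sync, now))
```

In practice the window can be shorter than the arithmetic suggests, because a change made just before the weekly replication leaves almost no time to react.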

As with everything, there are some drawbacks:

  • Requires additional hardware, which includes maintenance, support, and so on. One solution is to use a virtual computer application, such as VMWare or Microsoft's Virtual Server, and host multiple virtual DCs on one host. Of course, if you have a single domain, you need only one DC.

  • Since the Lag DC's data is a week old, it is a security risk, holding disabled accounts, passwords, old group memberships, and so on. A solution to this would be using the virtual DC idea and only power them on when they need to replicate. Because they aren't being used for anything else, it won't hurt to do this. However, your other DCs will get errors in the event logs complaining about the Lag DCs not being available so you'll have to live with that.

  • You don't want the Lag DCs to authenticate users. You can prevent this by keeping them from registering DC Locator Domain Name System (DNS) records. Create a Site Policy for the Lag Site and define the DC Locator DNS Records Not Registered by the DCs setting located in Computer Configuration, Administrative Templates, System, Net Logon, DC Locator DNS Records. You can specify which DC Locator DNS records (SRV records) are not registered by the Net Logon service. The Mnemonics field is used to specify which records are NOT registered. The Explain tab lists the Mnemonic options (quite handy). Specify the Kerberos SRV record here to prevent these DCs from authenticating users. The mnemonics to include are basically all of them except the Cname record (required for replication):

    • LdapIpAddress

    • Ldap

    • LdapAtSite

    • Pdc

    • Gc

    • GcAtSite

    • GcIpAddress

    • DcByGuid

    • Kdc

    • KdcAtSite

    • Dc

    • DcAtSite

    • Rfc1510Kdc

    • Rfc1510KdcAtSite

    • GenericGc

    • GenericGcAtSite

    • Rfc1510UdpKdc

    • Rfc1510Kpwd

    • Rfc1510UdpKpwd
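Typing nineteen mnemonics by hand invites typos, so it can help to generate the value for the Mnemonics field. The sketch below joins the mnemonics listed above into one string. It assumes the field takes a space-delimited list, which you should confirm on the setting's Explain tab, and it refuses any entry that looks like the Cname record, since that one must stay registered for replication.

```python
# Mnemonics from the list above, in the same order.
LAG_SITE_MNEMONICS = [
    "LdapIpAddress", "Ldap", "LdapAtSite", "Pdc", "Gc", "GcAtSite",
    "GcIpAddress", "DcByGuid", "Kdc", "KdcAtSite", "Dc", "DcAtSite",
    "Rfc1510Kdc", "Rfc1510KdcAtSite", "GenericGc", "GenericGcAtSite",
    "Rfc1510UdpKdc", "Rfc1510Kpwd", "Rfc1510UdpKpwd",
]


def policy_value(mnemonics):
    """Join mnemonics with spaces, dropping duplicates and keeping order."""
    ordered = []
    for name in mnemonics:
        if "cname" in name.lower():
            # The Cname record is required for replication; never suppress it.
            raise ValueError(f"{name} must remain registered")
        if name not in ordered:
            ordered.append(name)
    return " ".join(ordered)


if __name__ == "__main__":
    print(policy_value(LAG_SITE_MNEMONICS))
```

Paste the printed string into the Mnemonics field of the Lag Site policy, then confirm on a lag DC that only the record needed for replication is still registered in DNS.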

Currently, there are a number of companies, including HP, either using Lag Sites or considering their use. Of course, this should be tested and thoroughly analyzed to ensure it is a good solution for your company.



Windows Server 2003 on Proliants. Deployment Techniques and Management Tools for System Administrators
ISBN: B004C77T6A
EAN: N/A
Year: 2004
Pages: 214
