Restore of AD | Windows Server 2003 on Proliants. Deployment Techniques and Management Tools for System Administrators


On a larger scale than a single DC is the issue of restoring the entire AD or large pieces of it. Previously, we discussed nonauthoritative restore, which was a simple backup and restore from media or just letting a DC sync from other DCs. This section discusses the authoritative restore, which allows you to roll the AD back to the state it was in at a point in time in the past. Also included are special considerations for restoration of operations masters (Flexible Single Master Operations [FSMO] role holders), recovery from accidental mass deletion of objects, recovery of the NTDS.DIT database, and recovery of the entire forest.

note

When objects are authoritatively restored, the objects and their attributes overwrite the tombstoned objects created when the objects were initially deleted.

Authoritative Restore

Authoritative restoration of the AD can be done for the entire AD, a tree in the AD (such as a single domain or OU [Organizational Unit]), or an individual object that was accidentally deleted. The concept is that you have a source DC that you have identified as the source of certain objects, and you want all other DCs to replicate from this source. This source object will forcefully replace all other versions on the other DCs and should be done only when other options have failed to repair the situation. An authoritative restore from a domain with at least two replica DCs is actually a merge operation. A copy of the AD at some point in the past is restored to a DC. This restores all deleted and modified objects to the point in time of the backup. Objects created after the backup will be replicated from the other DCs. This is a good feature in that you restore the deleted objects and keep the ones created in the meantime.

This process is accomplished by identifying a well-connected DC, restoring a backup from tape or other media that has the state of the AD you want to force on the existing DCs, and then using the NTDSUtil authoritative restore feature. Authoritative restore, as well as the restore from media, has to be performed in DS Repair Mode. For example, suppose we installed an AD-enabled application. We installed it on Wednesday and it's now Friday. We want to roll back to the AD as of Tuesday. Assuming we performed a full backup on Tuesday, we could perform the following steps:

1. Identify a DC to use. Make sure it is replicating normally and has good network access, such as being in the "hub" site. The new AD will replicate from this DC so the better connectivity it has to the other DCs, the faster the restore will be.

2. Reboot the DC into Directory Services Repair Mode (DSRM). You'll have to press the F8 key during boot to get the Safe Mode options menu, and then select DSRM. This boots into Safe Mode without starting AD.

note

You must log on with the local Administrator account. The password was defined during DCPromo. If you have forgotten it, in Windows Server 2003 log on to the DC normally, start NTDSUtil, and choose the Set DSRM Password option. After defining the new password, reboot and log on in DSRM.

3. Using your backup software, restore the system state backup of the DC you are logged on to. Reboot at the conclusion if required. If you reboot, you need to restart in DSRM again. If you are using the built-in Windows Server 2003 Backup/Restore utility, see the notes in the "Nonauthoritative Restore" section of this chapter. Make sure you restore:

SYSVOL junction points
System state
Restore to Original Location

4. Boot the DC into DSRM if it isn't at this point, and from a command prompt, start the NTDSUtil program. In NTDSUtil, start the Authoritative Restore:

 Ntdsutil> Authoritative Restore

5. Select the tree or objects to restore (options shown in Figure 11.4). In this case, we want to restore the entire AD so we select Restore Database:

 Authoritative Restore> Restore Database

Figure 11.4. Authoritative Restore options shown in the NTDSUtil tool.

Figure 11.5 shows the results of this action. Note in the display, there is a line stating Increasing attribute version numbers by 100000. This means that the attribute versions are increased by a large number to make sure their version is higher than versions of objects in the current AD and thus forcing the lower number version objects to get overwritten.

Figure 11.5. The execution of authoritative restore indicates that the attribute versions in this restoration will be raised by 100,000 to force them to overwrite all other attributes on all other DCs in the domain.

tip

Authoritative restore actually increases the object version number by 100,000 * (age of backup in days). Therefore, if you use backup media that is five days old, the versions would be increased by 500,000.

6. Restart the DC. This DC is now the authoritative DC for the domain, meaning its version of objects and attributes will replace those on all other DCs in the domain. These changes will be replicated to the other DCs within the environment.

You can see the results of raising the version numbers by executing the Repadmin command:

 Repadmin /showmeta dc=company,dc=com

The output of that command is shown here with an explanation following:

 38 entries.
 Loc.USN  Originating DC                        Org.USN  Org.Time/Date       Ver  Attribute
 =======  ====================================  =======  =============       ===  =========
   65538  61c413db-23fe-414e-9d46-6c881d4eabc4    65538  2004-02-14 18:20:200001  dc
   74640  61c413db-23fe-414e-9d46-6c881d4eabc4    75554  2004-02-14 23:43:200001  msDS-PerUserTrustTombstonesQuota
   74640  61c413db-23fe-414e-9d46-6c881d4eabc4    75554  2004-02-14 23:43:200001  msDS-AllUsersTrustQuota
   74640  61c413db-23fe-414e-9d46-6c881d4eabc4    75554  2004-02-14 23:43:200001  msDS-PerUserTrustQuota
    4098  61c413db-23fe-414e-9d46-6c881d4eabc4     4098  2003-10-31 22:32:08    1 msDS-Behavior-Version
   74640  61c413db-23fe-414e-9d46-6c881d4eabc4    75554  2004-02-14 23:43:200001  ms-DS-MachineAccountQuota
   74640  61c413db-23fe-414e-9d46-6c881d4eabc4    75554  2004-02-14 23:43:200001  gPOptions
   74640  61c413db-23fe-414e-9d46-6c881d4eabc4    75554  2004-02-14 23:43:200006  gPLink
   74640  61c413db-23fe-414e-9d46-6c881d4eabc4    75554  2004-02-14 23:43:200001  isCriticalSystemObject
   74640  61c413db-23fe-414e-9d46-6c881d4eabc4    75554  2004-02-14 23:43:200001  objectCategory
   74640  61c413db-23fe-414e-9d46-6c881d4eabc4    75554  2004-02-14 23:43:200001  wellKnownObjects
   74640  61c413db-23fe-414e-9d46-6c881d4eabc4    75554  2004-02-14 23:43:200001

Note that in the Ver (version) column, there are objects with version number of 200001, indicating that these objects were authoritatively restored from media that was 2 days old (2*100000+existing version of 1 = 200,001).
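The arithmetic behind these version numbers can be sketched in a few lines of Python. This is only an illustration of the rule stated in the tip above (100,000 per day of backup age, added to the existing version). The function name and parameters are hypothetical and are not part of NTDSUtil.

```python
def restored_version(existing_version, backup_age_days, increase_per_day=100_000):
    """Version number after an authoritative restore, per the rule in the tip.

    increase_per_day is the default increase of 100,000; a different value
    models the verinc override. This is an illustrative calculation only.
    """
    if backup_age_days < 0:
        raise ValueError("backup age cannot be negative")
    return existing_version + increase_per_day * backup_age_days


# An attribute at version 1, restored from media that is 2 days old.
print(restored_version(1, 2))
# An attribute at version 6 (such as gPLink in the listing), same media.
print(restored_version(6, 2))
```

With two-day-old media, an attribute at version 1 becomes 200001 and one at version 6 becomes 200006, which matches the values in the Repadmin listing.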

The process described here restores the entire AD. Microsoft KB article 241594, "How to Perform an Authoritative Restore to a Domain Controller in Windows 2000," contains excellent information regarding what can and can't be restored and what objects, such as those in the Configuration container, should and shouldn't be restored. Note that this article applies to Windows Server 2003 as well. If you are doing an authoritative restore, you should read this KB article and associated KB articles. Remember, authoritative restore is the Big Hammer approach to repairing the AD. Don't hit your head with it!

Authoritative Restore: Subtrees and Individual Objects

This method restores specific component(s) of AD and marks them as authoritative for the directory. This method is the most commonly used because there are few occasions when the entire directory needs to be restored. The syntax to restore the Marketing OU that is in the parent NorthAmerica OU in the Company.Com domain is

 Authoritative Restore: restore subtree OU=Marketing,OU=NorthAmerica,DC=company,DC=com

Exit from NTDSUtil and reboot the DC. This server is now the authoritative AD DC for the Marketing OU and changes will be replicated.

Individual objects can be restored in the same manner using the Restore Object option and specifying the DN of the object:

 Authoritative Restore: restore object CN=olseng,OU=users,DC=company,DC=com

Authoritative Restore and Override Version Increase

Note in Figure 11.4 that each of the restore options has an associated option to override the version increase, such as Restore database verinc %d. This permits you to override the default version increase of 100,000. One use for this is when you suspect that an increase of 100,000 is insufficient, for example when you have to run the authoritative restore twice. That can happen when something goes wrong the first time (the restore doesn't finish, there is a power outage, and so on) or as a normal consequence of having to restore user accounts before groups. For instance, to set the version increase to 250,000, you could enter the command:

 Authoritative Restore: Restore database verinc 250000

Recovery from Accidental Massive Object Deletion

One of the most common reasons to have to restore all or part of an AD is error by Administrators. In fact, in a disaster recovery presentation, Microsoft indicated that it was getting a surprisingly large number of support calls due to large numbers of objects (usually users) being deleted. Microsoft also indicated that the number one way to protect your AD against this is to be careful to whom you give delete privileges. Although some of these operations ask a couple of times if you are sure, many Admins still approve the destructive operation. In all fairness to those who have done this, there isn't anyone in the IT industry who hasn't been in one directory, thought they were in another, and deleted the directory. In my days as a VMS Administrator, I actually set up security to deny myself delete privilege so I'd have to change it to perform a delete.

Some of the common ways to accidentally delete valid objects include

Delete entire OU: Admin doesn't realize there are users, computers, and so on in the OU and deletes the OU. Of course, this deletes all the objects in the OU. For instance, you might have two similarly named OUs such as EuropeOU1 and EuropeOU2, and you mistakenly select OU2 with users when OU1 was the empty one.
Search: Administrator performs a search and deletes the contents of the search without looking at all of the returns. In Windows Server 2003, you can create and save queries. You might execute the wrong query and perform a delete.
LDP and ADSIEdit: These tools are very dangerous. Microsoft makes no small effort warning of the repercussions of modifying the Registry, but that only has the potential of wiping out one computer. LDP and ADSIEdit can wipe out large pieces of the AD on every DC. You are manipulating the actual AD object. Be careful not only in deleting, but in modifying attributes, values, and so on.
Scripts: Make sure you thoroughly test these. Getting the wrong path in the script is dangerous. Tools like LDIFDE and CSVDE are intended to effect bulk creation, modification, and deletion of objects. Make sure they do what you intend. Consider testing procedures such as populating your lab environment with live data, and then run the script and test the results. This is a good idea for testing schema changes as well.
Third-party tools: Make sure you know how to use them. Watch the use of wild cards, defaults, and so on. The lab with live data will help with these tools as well.

There are some things you can do to reduce the risk of accidental object deletion, such as:

Be very restrictive of who has delete privileges. Just like when I denied delete privileges to myself, deny them to Admins, and restore them only when needed. Note that this isn't totally foolproof as some permissions in AD can be set at the object level and can't be denied by inheritance.
Be very restrictive of any privileges. Remember Windows 2000 and Windows Server 2003 have very granular delegation rights. Make sure Admins are trained for the privileges they are granted.
Thoroughly test applications, scripts, and so on before using in production. I noted earlier the incredible number of customers I deal with who do not keep valid backups; specifically, they do not routinely validate those backups. Similarly, a large number of companies I've dealt with don't have a test environment that allows valid testing operations.
Use live data in your test environment. Invest in some hardware to build a lab, and take one of them and join the production domain as a DC/GC. Pull it off production and put it in the lab. Seize the FSMO roles, and you have a lab with live data. If you have a multidomain forest, create a lab with all those domains represented. Remember to keep at least two DCs per domain so they can replicate and guard against failure of one of them.
A Lag Site could be used for testing by pulling the Lag DCs off the network into a private net, testing, and if successful, returning them to production. If the tests fail, just manually demote (using the procedure noted earlier in this chapter) and repromote back into the production domain.

warning

At the writing of this book, the Lag Site is a new concept. Using the Lag DCs for testing introduces the possibility of putting them back online and propagating bad data as a result of the test. Test this procedure and establish rigid testing procedures to ensure you don't populate the production domain with bad data. For instance, suppose testing a script shows that it inadvertently deletes the wrong set of users. Plugging the Lag DC back into the production domain will propagate those deletions. However, because the Lag Site replicates only once a week, you will have some window of safety even if this should happen.

When you plan for the recovery of user objects, there is an important concept to understand: the back link issue. This makes restoration of user objects much more complicated than it might seem. The issue and how to successfully recover deleted user objects is detailed in the next section.

Restoring Objects and the Back Link Issue

Restoring user objects isn't as simple as it might seem. In Windows 2000 forests, and in Windows Server 2003 forests whose functional level has not been raised to the Windows Server 2003 level, group membership is treated as a single attribute. The groups a user is a member of are not stored with the user; instead, each membership is stored as a link to the user object in an attribute of the group (the member attribute). However, you should understand that a group cannot store a link to a user object that doesn't exist on the DC.

This can be critical when you authoritatively restore both users and groups at the same time, because an Administrator doesn't have control over which objects get replicated to the partner DCs first. If the group happens to be replicated first, then a user's membership can be dropped on the partner DCs because they do not yet have the user object in their database. Thus, the user account must be restored before the group in order to safely restore the group membership (at least within the same domain). This becomes a problem if you are restoring a large section of the AD, such as the entire directory or an OU that contains both users and groups.

Microsoft recommends doing an authoritative restore twice to restore deleted user accounts and groups. The first time it will try to restore both users and groups, but the groups will fail if the users have not all been replicated first. By running authoritative restore a second time, it will restore the groups this time. Note that if your AD design places users and groups in separate OUs so that the user OU can be restored before the group OU, then a single restoration of each OU (first the user OU then the group OU) is sufficient.
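The ordering problem, and why a second pass helps, can be illustrated with a toy model in Python. This is a deliberately simplified sketch and not how the AD replication engine is implemented: the function name, the data layout, and the use of "Marketing" as a group name are assumptions made for illustration.

```python
def apply_restored_objects(arrival_order, restored, partner_users=None):
    """Toy model of a partner DC receiving restored objects in a given order.

    restored maps a name to ("user", None) or ("group", [member names]).
    A group keeps a member link only if that user already exists locally,
    mirroring the rule that a group cannot link to a missing user object.
    """
    users = set(partner_users or [])
    memberships = {}
    for name in arrival_order:
        kind, members = restored[name]
        if kind == "user":
            users.add(name)
        else:
            memberships[name] = [m for m in members if m in users]
    return users, memberships


restored = {"olseng": ("user", None), "Marketing": ("group", ["olseng"])}

# First restore: the group happens to arrive before the user, so the link is dropped.
users, first_pass = apply_restored_objects(["Marketing", "olseng"], restored)
print(first_pass)

# Second restore: the user is already present, so the link survives in any order.
_, second_pass = apply_restored_objects(["Marketing", "olseng"], restored, users)
print(second_pass)
```

In this model the first pass leaves the group empty and the second pass restores the membership, which is the reasoning behind running the authoritative restore twice or restoring the user OU before the group OU.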

The Problem with Nonrestored Object Links

If accidentally deleted objects are not restored with their associated cross-domain links between users and groups in a multiple domain environment, it can have some ramifications on the administration of the domain.

After authoritatively restoring accidentally deleted users, the links to groups whose memberships span multiple domains are not restored. In addition, when users that are members of other domains, or groups that are nested in groups in other domains, are restored from backup, other object links will not be restored. These include the manager/directReports (in the "Organization" tab of the user properties) or managedBy/managedObjects in objects such as computers and printers.

Loss of group membership can have serious consequences, such as

Exchange users can't get mail sent via a distribution list if it spans multiple domains.
Users won't be able to access network or Web resources protected by a group that they used to be a member of before the restore.
Even more serious, users might be able to access resources they shouldn't. For instance, a group that a user is a member of has been denied access (Deny ACE) to a resource such as a share. After the restore, if the link to the group is not in place, the user will have access to the resource.
Software management systems that deploy applications (such as Microsoft Systems Management Server [SMS] or AD Group Policy) can be configured to install or remove software based on group membership. Thus, if the user loses the group membership link, software might be uninstalled because it appears the user is not a member of that group.
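The Deny ACE case in the list above is easy to underestimate, so here is a small Python sketch of it. It uses a deliberately simplified access check (any matching deny entry wins, otherwise any matching allow entry grants access) and is not the actual Windows access-check algorithm. The group names "Staff" and "Contractors" and the function name are hypothetical.

```python
def is_access_allowed(user_groups, acl):
    """Simplified ACL check: a matching deny wins, otherwise a matching allow grants."""
    groups = set(user_groups)
    if any(kind == "deny" and group in groups for kind, group in acl):
        return False
    return any(kind == "allow" and group in groups for kind, group in acl)


# A share that allows Staff but explicitly denies Contractors (hypothetical names).
share_acl = [("deny", "Contractors"), ("allow", "Staff")]

before_delete = ["Staff", "Contractors"]  # memberships before the accidental deletion
after_restore = ["Staff"]                 # the link to Contractors was not restored

print(is_access_allowed(before_delete, share_acl))
print(is_access_allowed(after_restore, share_acl))
```

The same user object is denied before the deletion and allowed after the restore, purely because one membership link is missing.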

These circumstances are true for Windows 2000 and Windows Server 2003. Note that both Windows 2000 and upgraded Windows Server 2003 forests also have this trouble even in the same domain. That is, the user-to-group object links do not get restored when only restoring the user objects. New group-memberships that were added to a Windows Server 2003 forest after switching to the highest forest functional level (enabling Link Value Replication [LVR]), will be restored correctly within the same domain.

Administration Anomalies

The functionality of these object links even in a healthy domain has certain anomalies that make administration difficult in a multiple-domain environment. If a user is a member of a Domain Local Group (DLG) that is hosted in another domain, the Administrator of the user's home domain can't see that group in the user's Member Of properties. For example, Figure 11.6 shows the membership of the DLG Amer-DLG1, in the Qamericas.Qtest.cpqcorp.net domain. Note that there are several users in the group who are members of the root domain Qtest.cpqcorp.net, one of them being Gary Olsen. However, in Figure 11.7, viewing Gary's Member Of tab in his user account properties, the Amer-DLG1 group is not listed. Only groups in the Qtest root domain are shown.

Figure 11.6. Membership of the DLG, Amer-DLG1. Note that Gary Olsen is a member.

Figure 11.7. Listing of user Gary Olsen's group memberships. Note that the DLG Amer-DLG1 in the child domain is not in the list.

In addition, there are problems when attempting to view memberships of a user in a universal group that is hosted in another domain. If you are connected to a GC when you view the user properties, you can see the universal group listed. However, if you are connected to a DC in the user's home domain, you can't see the membership.

As a system administrator, these issues are probably frustrating to you. In an attempt to make this behavior consistent, Microsoft filtered the results in Windows Server 2003 so you couldn't see this membership even if connected to a GC. Administrators complained and Microsoft put it back in SP1.

Another object whose group object link gets lost is the managedBy object. This allows a user or contact (name, e-mail, phone, and so on) to be responsible for a resource (such as computer, printer, and so on), allowing others to know who the primary contact on that machine is. Unfortunately, you can't get a list of objects that a given user is responsible for, and it's difficult to see the managedObjects if the user is to manage objects in a different domain in the forest. These links also fail to be restored correctly with accidentally deleted objects.

The manager/directReports link, defined in the Organization tab, can't see users from another domain, and restoration of the objects won't restore these links.

To summarize, there are basically two problems. Users' group memberships are stored as links in the AD database, so these links are not restored correctly in multiple domain forests. This problem exists for other links such as manager, directReports, and managedBy objects.

Authoritative restore does not correctly restore these links in multidomain forests, and Windows 2000 even has issues restoring object links in the same domain.

Additionally, the default snap-ins, such as AD Users and Computers, will not support viewing these group memberships across domains.

Hewlett Packard recently developed a tool called Active Directory Link Recovery Manager (ADLRM) that provides a way to save and restore these links, as described in the next section.

Active Directory Link Recovery Manager (ADLRM)

Guido Grillenmeier, of HP, was one of the first to discover this relationship of the object links when performing a restore for a customer. He has published internal HP articles on the subject and, with Walter Knopf, developed the ADLRM tool, which can be used to save, restore, and manage these links through a GUI-based console.

ADLRM specifically addresses the problems noted previously in this section regarding the unrestored cross-domain links and other object links by storing them in a SQL (or Microsoft SQL Server Desktop Engine [MSDE]) database and providing a GUI-based interface for management. ADLRM's capability to view and manage inter-domain links, cross-domain links, and memberships of users and groups in a DLG makes it a powerful tool for Administrators because these links are otherwise invisible. For example, all the links for a user can be displayed, as seen in Figure 11.8.

Figure 11.8. A user's group membership links can be displayed in the ADLRM tool; this information is not otherwise viewable by a tool.

The ADLRM tool has the following capabilities:

Create a forest-wide link-catalog via a central SQL database (or MSDE)
View cross-domain group memberships and other important object links such as managedBy/managedObject or manager/directReports in a single window
Compare objects between the database and AD to view missing (deleted) objects or links
Restore missing links (for example, after deleted objects have been restored using the authoritative restore approach)

It is important to note that ADLRM is not intended to be a backup tool or to replace the system state backup process. It is an add-on tool to repair a hole in the process of authoritatively restoring AD objects.

You can see a block diagram of the processes used by ADLRM in Figure 11.9. The two core components, the collector service and the console, run on the same machine, but separately. The ADLRM collector service is the heart of the system and stores the links from all domains in the forest in a central SQL database (see Figure 11.10). Features and recommendations include

For performance reasons, run the collector service on the same server as the SQL database or at least on a server with excellent connectivity to the SQL server.
The link collection is really just a backup, and like a backup can be scheduled as desired, usually after hours.
Multiple collections can be stored in the database separately, allowing the restoration of links at various points in time.

Figure 11.9. Block diagram showing processes in the ADLRM tool.

Figure 11.10. The collector service of the ADLRM tool shows all links for all domains.

Using ADLRM in combination with proper recovery practices using authoritative restore provides the best possibility of success in restoring accidentally deleted objects in single as well as multiple domain forests. It is important to note that if your disaster recovery strategy is to reanimate tombstones, as provided by Windows Server 2003, not all of the attributes are recovered because the tombstone only contains a subset of the attributes. ADLRM does not take care of backing up or restoring these attributes; this must be done by other means.

Microsoft has recently released KB article 840001, "How to Restore Deleted User Accounts and Their Group Memberships in Active Directory." This article is quite detailed regarding the issue of restoring deleted objects. Note that issues such as this one are found and solved as a natural consequence of using AD. It is advisable, then, to monitor Microsoft's Web site for updates on this and other issues.

Microsoft KB article 840001 provides information in six important areas:

Reanimating tombstones to restore objects is a good method if you lack a valid system state backup.
Using the new Groupadd tool to target the repopulation of missing group links in AD after recovering deleted objects. Even with this tool, you still have to recover one DC of every domain in the forest to be able to recover the local group memberships in other domains, or other links such as the manager and managedBy links for recovered objects. With ADLRM, you just have to recover the objects in the domain they were deleted in; the links in any domain of the forest are then fixed by ADLRM.
Manually undeleting objects in a deleted object's container.
Determining when and where a deletion occurred.
Minimizing the impact of bulk deletions in the future.
Using tools and scripts that might help you recover from bulk deletions.

ADLRM can also be used to repair the LVR-specific attributes that are not updated correctly in a Windows 2000 to Windows Server 2003 upgrade. The tool can be used to repair the LVR status of groups as well as the manager or managedBy attributes. The ADLRM tool is available from HP by going to the Web site at http://TheADLRMWebsite.com.

Recovery of SYSVOL

Authoritative and nonauthoritative restores of SYSVOL are not related to the authoritative and nonauthoritative restore procedures described here. Refer to Chapter 5 for details on recovery of the SYSVOL tree. Remember that the system state includes the SYSVOL tree.

The restoration of the SYSVOL tree when using the IFM feature in Windows 2003 has some caveats. I know of a customer who had performed a DCPromo using the IFM operation to save bandwidth over the network. However, the customer noticed that at the end of the promotion, in spite of the fact that SYSVOL is part of the system state that was used as a source for the promotion, a full synch was performed from an existing DC to the newly promoted DC. Because SYSVOL was about 250MB, this was a problem. The customer finally resolved it by clearing the ADM cache, but the problem centers around the fact that the system state must be restored to the same volume that you specify to host SYSVOL in the DCPromo UI. For example, you can't restore system state to drive C: and then specify the C: drive for the NTDS.DIT and logs, and the D: drive for SYSVOL. In this case, SYSVOL will not be sourced from the restored system state. If you want to place the database (NTDS.DIT) and the logs on one volume and SYSVOL on another, you must follow a defined process, which is described in Microsoft KB article 311078, "Install From Media."

Recovery of Operations Masters

Operations masters, sometimes called FSMO role holders, have unique characteristics that make them more important than the ordinary vanilla DC. When one of these DCs becomes unavailable, the method of restoration depends on the FSMO role it holds, the type of role, how long it will be before the DC is restored to the network (if ever), and how long your environment can live without that role. The options for recovering these DCs are as follows:

Restore it using the methods described in the "Simple Repair of AD" section of this chapter.
If you must have the role functionality available in the domain or forest before the role holder comes back online, you can seize the role to another DC.
Windows Server 2003 is more tolerant of role seizure. While it is best to never bring the original FSMO role holder back online (as it still thinks it's an FSMO role holder), if it does come back, the conflict will be resolved, and the new role holder will keep the role.

One way to provide for recovery of FSMO role holders is to identify a DC as the Standby FSMO. This DC should be in a site with good connectivity and should have sufficient resources (memory, disk, and so on) to be a role holder, so that it is always up-to-date. Thus, when it's necessary to seize an FSMO role, you can seize it to this Standby FSMO. Repadmin, with the /showvector option, provides an excellent way to determine whether the DC that you want to seize a role to is up-to-date in replication. For instance, suppose the roles are currently held as follows:

 Schema owner            hpqnet-dc5.hpqnet.qtest.cpqcorp.net
 Domain role owner       hpqnet-dc3.hpqnet.qtest.cpqcorp.net
 PDC role                hpqnet-dc3.hpqnet.qtest.cpqcorp.net
 RID pool manager        hpqnet-dc1.hpqnet.qtest.cpqcorp.net
 Infrastructure owner    hpqnet-dc3.hpqnet.qtest.cpqcorp.net
 The command completed successfully.

HPQnet-DC3 holds the domain naming master, the PDC Emulator, and the infrastructure master roles. To find out who has the most updated replication from HPQnet-DC3, we use the Repadmin /showvector command, specifying the DN of the domain followed by the name of the DC we want to look at. In this case, there are three other DCs: HPQnet-DC9, HPQnet-DC5, and HPQnet-DC1. So, we execute the command for each DC and observe the USN for HPQnet-DC3:

 C:\>repadmin /showvector dc=hpqnet,dc=qtest,dc=cpqcorp,dc=net HPqnet-dc1.hpqnet.qtest.cpqcorp.net >fsmodc1.txt
 Dublin\HPQNET-DC9        @ USN   3745860 @ Time 2004-02-18 03:43:08
 Brussels\HPQNET-DC5      @ USN   2201360 @ Time 2004-02-18 03:38:25
 Alpharetta\HPQNET-DC3    @ USN   2871861 @ Time 2004-02-18 03:54:10
 Seattle\HPQNET-DC1       @ USN   1570576 @ Time 2004-02-18 04:25:09
 C:\>repadmin /showvector dc=hpqnet,dc=qtest,dc=cpqcorp,dc=net HPqnet-dc5.hpqnet.qtest.cpqcorp.net >fsmodc1.txt
 Dublin\HPQNET-DC9        @ USN   3746087 @ Time 2004-02-18 04:10:54
 Brussels\HPQNET-DC5      @ USN   2201478 @ Time 2004-02-18 04:24:28
 Alpharetta\HPQNET-DC3    @ USN   2871959 @ Time 2004-02-18 04:07:10
 Seattle\HPQNET-DC1       @ USN   1570573 @ Time 2004-02-18 03:58:12
 C:\>repadmin /showvector dc=hpqnet,dc=qtest,dc=cpqcorp,dc=net HPqnet-dc9.hpqnet.qtest.cpqcorp.net >fsmodc1.txt
 Dublin\HPQNET-DC9        @ USN   3746189 @ Time 2004-02-18 04:24:47
 Brussels\HPQNET-DC5      @ USN   2201478 @ Time 2004-02-18 04:23:23
 Alpharetta\HPQNET-DC3    @ USN   2872067 @ Time 2004-02-18 04:22:09
 Seattle\HPQNET-DC1       @ USN   1570573 @ Time 2004-02-18 03:58:12

Note that the USN for DC3 as it appears on DC1, DC5, and DC9 is shown in Table 11.1. HPQnet-DC9 has the highest USN and thus is most up to date with HPQnet-DC3.

Table 11.1. Comparison of HPQnet-DC3's USN on the Three Other DCs

DC Name	USN for HPQnet-DC3	Comment
HPQnet-DC1	2871861
HPQnet-DC5	2871959
HPQnet-DC9	2872067	Most up to date

Because HPQnet-DC9 has the highest USN recorded for HPQnet-DC3, HPQnet-DC9 is the best candidate to seize the FSMO roles held by HPQnet-DC3.
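When there are more than a handful of DCs, the comparison in Table 11.1 is tedious to do by eye. The following Python sketch shows one way it could be automated. It assumes the /showvector listings have the line format shown earlier (Site\DC @ USN n @ Time ...) and that each listing has been captured as text; the function names and the LISTINGS dictionary are hypothetical, not part of Repadmin.

```python
import re

# Matches lines such as:  Alpharetta\HPQNET-DC3    @ USN   2871861 @ Time 2004-02-18 03:54:10
VECTOR_LINE = re.compile(r"^[^\\\s]+\\(?P<dc>\S+)\s+@ USN\s+(?P<usn>\d+)\s+@ Time")


def usn_for(source_dc, showvector_text):
    """Return the USN recorded for source_dc in one /showvector listing, or None."""
    for line in showvector_text.splitlines():
        match = VECTOR_LINE.match(line.strip())
        if match and match.group("dc").lower() == source_dc.lower():
            return int(match.group("usn"))
    return None


def best_seizure_candidate(source_dc, listings):
    """Pick the DC whose listing shows the highest USN for source_dc."""
    usns = {}
    for dc, text in listings.items():
        usn = usn_for(source_dc, text)
        if usn is not None:
            usns[dc] = usn
    if not usns:
        raise ValueError("no listing mentions " + source_dc)
    return max(usns, key=usns.get), usns


# Listings copied from the three Repadmin runs shown earlier in this section.
LISTINGS = {
    "HPQnet-DC1": r"""
Dublin\HPQNET-DC9        @ USN   3745860 @ Time 2004-02-18 03:43:08
Brussels\HPQNET-DC5      @ USN   2201360 @ Time 2004-02-18 03:38:25
Alpharetta\HPQNET-DC3    @ USN   2871861 @ Time 2004-02-18 03:54:10
Seattle\HPQNET-DC1       @ USN   1570576 @ Time 2004-02-18 04:25:09
""",
    "HPQnet-DC5": r"""
Dublin\HPQNET-DC9        @ USN   3746087 @ Time 2004-02-18 04:10:54
Brussels\HPQNET-DC5      @ USN   2201478 @ Time 2004-02-18 04:24:28
Alpharetta\HPQNET-DC3    @ USN   2871959 @ Time 2004-02-18 04:07:10
Seattle\HPQNET-DC1       @ USN   1570573 @ Time 2004-02-18 03:58:12
""",
    "HPQnet-DC9": r"""
Dublin\HPQNET-DC9        @ USN   3746189 @ Time 2004-02-18 04:24:47
Brussels\HPQNET-DC5      @ USN   2201478 @ Time 2004-02-18 04:23:23
Alpharetta\HPQNET-DC3    @ USN   2872067 @ Time 2004-02-18 04:22:09
Seattle\HPQNET-DC1       @ USN   1570573 @ Time 2004-02-18 03:58:12
""",
}

best, usns = best_seizure_candidate("HPQNET-DC3", LISTINGS)
print(best, usns)
```

Run against the three listings, the sketch reproduces the values in Table 11.1 and selects HPQnet-DC9.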

Roles and Implications of Loss

Whether or not you immediately recover a role holder depends on the function of the role and how long you can get along without it. Here is a summary of this functionality.

Domain naming master: Can't add or remove domains from forest. No need to restore unless adding or removing a domain or renaming the domain in Windows Server 2003.
Schema master: Can't make changes to schema (including installing applications that modify the schema). No need to restore it unless schema modifications are needed.
RID master (domain level): If a DC runs out of Relative IDentifiers (RIDs), it can't get a new RID pool and in this case can't create new security objects (user, computer, group) and can't move them between domains. Usually low impact, but could be important if mass creation of new objects is required.
Infrastructure master: In multidomain environments, it provides cross-domain group membership changes, and so on. This affects Admins making changes to group memberships and indirectly users logging in to a domain with an account in another. In a single domain environment, this has little value. In a multidomain environment, it's pretty important.
PDC Emulator: This has the most impact of any of the role holders because it is responsible for tracking password changes, editing Group Policy (the Group Policy Object [GPO] edited will be on the PDC Emulator) and in Windows Server 2003, it is the authoritative time source for the domain. In the root domain, it is the authoritative time source for the forest (see the "Time Services" section of Chapter 6, "The Physical Design and Developing the Pilot"). Its loss also poses a security risk because it is the time source and could cause replication, user authentication, and other functions to fail. In mixed mode environments, it's also the PDC for Windows NT Backup Domain Controllers (BDCs).

To determine whether you need to seize the role, ask the following questions:

Will this DC be able to be repaired and come back online at some point before the tombstonelifetime period expires?
Is the data on the disk related to AD valid (that is, not corrupt)?
Can you maintain your environment without this role holder until it is repaired and comes back online (that is, you don't need to perform the operations noted in the previous list for this role holder)?

If you answer yes to all three questions, then wait for the DC to come back online and don't seize the role. If you answer no to any of them, then you should seize the role. In addition, you should never restore a DC holding the RID master.

note

Additional information on FSMO role holders and their function and replacement is available in the "FSMO Placement" section of Chapter 6.

Disaster Recovery Q&A

To reinforce some of these concepts, let's do a Q&A. See if you pass the test!

Question : Will an authoritative restore roll the AD schema back in time?

Answer : It depends. Restoring a DC in a forest with other DCs will not roll the schema back. Restoring a DC from a previous date's backup and creating a new forest around that DC will roll the schema back. (See the "Forest Recovery from Media" section later in this chapter).

Question : You need to restore a DC from a system state backup, but you notice that the backup is 75 days old, which is older than the tombstonelifetime of 60 days. Can you change the existing tombstonelifetime value to 90 days and use the tape?

Answer : Of course, you can do anything you want, but this will have no effect. The system state backup has the date of the backup and the tombstonelifetime value at the time of the backup. If the delta of the current date to the backup date is greater than the tombstonelifetime value of the backup, it will not restore. In addition, if there were objects deleted after the backup, they will have been purged after 60 days. If a restore were successful, it would reanimate those deleted objects, producing the lingering object behavior described in the "Challenges and Issues in AD Replication" section in Chapter 5.
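The age check described in this answer amounts to a date comparison. Here is a minimal sketch of it, assuming the check uses the tombstonelifetime value recorded with the backup; the function name backup_is_usable and the dates are hypothetical and only illustrate the arithmetic.

```python
from datetime import date, timedelta


def backup_is_usable(backup_date, today, tombstone_lifetime_at_backup):
    """True if the backup is no older than the tombstone lifetime it was taken under.

    The value that matters is the one in effect at backup time, so raising
    the current tombstonelifetime afterward does not change the result.
    """
    age_days = (today - backup_date).days
    return age_days <= tombstone_lifetime_at_backup


today = date(2004, 5, 1)                      # hypothetical "current" date
backup_date = today - timedelta(days=75)      # the 75-day-old tape
print(backup_is_usable(backup_date, today, 60))
```

For the 75-day-old backup taken under a 60-day tombstonelifetime, the function returns False, regardless of what the value is changed to later.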

Question : On Monday, you add 25 users to the domain, and then make a system state backup. The next day, you add another 50 users. On Thursday, you accidentally delete 10 of those 50 users added on Tuesday and all 25 users added on Monday. The following day, you decide to do an authoritative restore of the domain using the system state backup completed on Monday. After completing the authoritative restore and replication completes through the domain, how many of those users added since Monday will exist?

Answer : 65 users. 25 added Monday + 50 added Tuesday - 10 deleted Thursday. Note that this is really a "merge." You don't lose the objects added after the backup; you are just re-adding the deleted objects that the backup contains. You get the 25 back from the restore, but lose the 10 that were added Tuesday after the backup was made.
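The arithmetic can be checked with plain set operations. This is only an illustration of the merge outcome, with made-up user names; a real authoritative restore works on object versions, not on Python sets.

```python
# Users created on each day (names are placeholders).
monday = {f"mon{i:02d}" for i in range(25)}
tuesday = {f"tue{i:02d}" for i in range(50)}

backup = set(monday)  # system state backup taken Monday, after the first 25 were added

# Thursday: all 25 Monday users and 10 of the Tuesday users are deleted.
deleted = monday | {f"tue{i:02d}" for i in range(10)}

live = (monday | tuesday) - deleted   # what the other DCs still hold on Friday
after_restore = live | backup         # the restore merges the backup with the live objects

print(len(live), len(after_restore))  # 40 65
print(len(deleted - backup))          # 10 users that no backup contains
```

The 10 Tuesday users stay lost because they were created after the backup and deleted before the restore.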

Question : An Administrator mistakenly deletes 5,000 user accounts. Can you restore them by simply restoring the user accounts?

Answer : It depends:

In Windows 2000 or Windows 2003 where the forest functional level has not been raised to Windows Server 2003 level (native), the answer is no; you have to restore the groups as well.

In Windows Server 2003 where the forest functional level has been raised to Windows Server 2003 ("native") level, and if the user objects were added after the functional level was raised, and if you only have a single domain in your forest, then the answer is yes. The Windows Server 2003 forest functional level enables linked-value replication (LVR), which can revive a deleted object's links to groups, so a single user can be restored on its own. There is more information on LVR in Chapter 1 and in the "Replication Topology" section of Chapter 5.

note

Remember that cross-domain group memberships won't be recovered in the restore. This was covered earlier in this section.

Database Recovery

Another aspect of disaster recovery is the recovery and maintenance of the AD database. This section describes some basics of the database architecture, how the associated transaction logs work in a write operation, and recovery procedures. Also included are some tips on defragmenting and running integrity checks on the database and the file structure. Figure 11.11 shows a conceptual diagram of the AD architecture. A thorough description of the layers in this architecture appears in Chapter 8 of my Windows 2000 book, Windows 2000 Active Directory Design and Deployment (New Riders, 2000).

Figure 11.11. AD architecture.

The Directory Store: NTDS.DIT File

The data or directory store is the NTDS.DIT file. This should be familiar to anyone who has installed a DC. One of the questions posed during the final stages of the UI portion of DCPromo is for a desired location for the NTDS.DIT file.

This is the database of the AD in one file. It's important to plan and predict the potential size of this file to obtain adequate disk space. Experimentation with large numbers of AD objects (tens of millions) suggests that performance is enhanced by putting the NTDS.DIT file on a different disk than the logs and SYSVOL share, not only for storage reasons, but also for performance on disk access. These should all be separate from the system disk.

note

The NTDS.DIT file can be moved to a different location with the NTDSutil.exe program using the move db command when a DC is booted to the DSRM. See Microsoft KB article 257420, "HOW TO: Move the Ntds.dit File or Log Files," for details.

Other Files

As shown in Figure 11.11, there are three other files in addition to the NTDS.DIT file: Log Files, the EDB.CHK file, and the TEMP.EDB file. The EDB logs contain information not yet saved in the NTDS.DIT file. EDB.CHK is referred to as the checkpoint file and holds the current transaction. The EDB.CHK file is used to restore log files. The TEMP.EDB file holds the transactions that are in progress as a new log file is being created.

The AD Write Operation

Figure 11.12 illustrates how a write operation is processed in the AD. In step one, the client requestor performs a write operation. This could be an Administrator adding a user. When this operation occurs, shown in step 2, the LSASS process saves that data in a log buffer in memory. In step 3 (two arrows in the figure), the data is then saved in both the log file and in a memory cache. If the transaction is successful on both the log file and memory cache, then the data is saved in the NTDS.DIT file, as shown in step 4.

Figure 11.12. The AD write operation is essentially a four-step process.
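The four steps can be modeled in a few lines. The following is a toy sketch only: the class name DirectoryWriteSketch and its attributes are hypothetical, plain Python containers stand in for the log file and NTDS.DIT, and the real database engine is far more involved.

```python
class DirectoryWriteSketch:
    """Toy model of the four-step write path; all names are illustrative."""

    def __init__(self):
        self.log_buffer = []  # step 2: in-memory log buffer
        self.log_file = []    # step 3: stands in for the on-disk log file
        self.cache = {}       # step 3: in-memory cache
        self.database = {}    # step 4: stands in for NTDS.DIT

    def write(self, key, value):
        """Step 1 is the caller's request; steps 2 through 4 happen here."""
        record = (key, value)
        self.log_buffer.append(record)   # step 2: buffer the change in memory
        self.log_file.append(record)     # step 3: save to the log file...
        self.cache[key] = value          # step 3: ...and to the memory cache
        logged = record in self.log_file
        cached = self.cache.get(key) == value
        if logged and cached:            # step 4: commit only if both succeeded
            self.database[key] = value
            self.log_buffer.remove(record)
        return logged and cached


store = DirectoryWriteSketch()
store.write("cn=jdoe", "555-0100")       # e.g., an Administrator adding a user
print(store.database)
```

The point of the ordering is that the change exists in the log before it reaches the database, so the log can be used to redo the change if the last step never completes.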

With a basic understanding of the Windows 2003 system architecture, the layered AD architecture, and how the AD writes information to the AD data store, we can turn our attention to managing the AD database and issues such as predicting the database size that was noted previously in this chapter.

AD Database Management

In the previous section, we saw how physical database, cache, and log files were used during the AD write operation. We will now turn our attention to the management of the AD database to determine predictable resource demands to aid the architect in determining resource allocation.

It's important to understand the structure of the database tables. Objects are recorded in the database tables as rows and include things such as users and printers. Attributes are recorded as columns, such as a user's address and phone number.

Backing Up and Restoring the AD

The previous discussion of how the AD writes data to the data store described interaction with a number of log files that probably seemed redundant. It is redundant by design. By writing data to these logs and verifying successful transactions before it is finally written to the NTDS.DIT file, it guarantees data integrity. It also provides a method of restoring the log data after a system crash to bring the AD to the last state recorded in the logs.

The AD restoration operation is shown in Figure 11.13. If the AD terminates unexpectedly, such as with a power outage or system crash, changes still held in memory are prevented from being written to the database on disk, and a recovery is attempted on the next boot. When the system starts up again, the log files are read sequentially and the changes recorded in them are applied to the database to bring it to the state it was in when the AD terminated.

Figure 11.13. Flow chart of AD recovery after an unplanned shutdown or crash.

This information is stored in log files named edbXXXXX.log, where XXXXX is a sequential number starting at 00001. Following is a directory listing of the edb logs from a DC.

 Directory of C:\WINNT\NTDS
02/19/2000  07:49p      <DIR>          .
02/19/2000  07:49p      <DIR>          ..
12/29/1999  12:19a      <DIR>          Drop
02/16/2000  07:54p               8,192 edb.chk
02/19/2000  07:49p          10,485,760 edb.log
02/19/2000  07:49p          10,485,760 edb00001.log
02/11/2000  07:38p          10,502,144 ntds.dit
01/24/2000  10:31p          10,485,760 res1.log
01/24/2000  10:31p          10,485,760 res2.log
02/11/2000  07:38p           2,113,536 temp.edb
               7 File(s)     54,566,912 bytes
               3 Dir(s)     990,056,448 bytes free

Note that the EDB log files are sequentially numbered in hex, so that the logged changes can be restored in order. The edb.log file is the current log file being written. Note that in this listing there is an edb00001.log. This was the first edb.log file written. It was filled to the maximum size, closed, and saved as edb00001.log; then, a new edb.log was started. The two "reserve" logs shown in the directory listing, res1.log and res2.log, are simply placeholders that reserve disk space to be used by the log files for a controlled shutdown if disk space is exhausted. The EDB logs, as well as the res1.log and res2.log files, are always the same size: 10MB.

Obviously, there is a need to limit the size of the logs to fit within the physical storage limits. Windows 2003 enables circular logging, which overwrites the oldest log file after a certain number of log files are created. This is the default. If circular logging is turned off, the Administrator must delete the old EDB logs manually to manage disk space. The function of the res1.log and res2.log files is not needed when circular logging is enabled.

Edb.chk is the checkpoint file, which records where in the log files the recovery process should start. If it is missing or not accessible, the recovery has to start with the oldest log file it can find and determine where to begin writing data to the information store.
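The replay described above can be sketched as follows. This is a simplified model under stated assumptions: the functions replay_order and recover are hypothetical, each log is represented as a list of (key, value) changes, and the checkpoint is reduced to a (file name, position) pair rather than the real edb.chk format.

```python
import re

LOG_NAME = re.compile(r"^edb([0-9A-Fa-f]{5})\.log$")


def replay_order(file_names):
    """Closed logs in hex-sequence order, then the current edb.log last."""
    numbered = []
    for name in file_names:
        match = LOG_NAME.match(name)
        if match:
            numbered.append((int(match.group(1), 16), name))
    ordered = [name for _, name in sorted(numbered)]
    if "edb.log" in file_names:
        ordered.append("edb.log")
    return ordered


def recover(database, logs, checkpoint=None):
    """Apply logged changes to database, starting at the checkpoint if there is one.

    logs:       {file_name: [(key, value), ...]}
    checkpoint: (file_name, index), or None to replay from the oldest log.
    """
    names = replay_order(logs)
    if not names:
        return database
    start_file, start_index = checkpoint if checkpoint else (names[0], 0)
    started = False
    for name in names:
        if name == start_file:
            started = True
            records = logs[name][start_index:]
        elif started:
            records = logs[name]
        else:
            continue  # log is older than the checkpoint; already in the database
        for key, value in records:
            database[key] = value
    return database


logs = {"edb00001.log": [("a", 1), ("b", 2)], "edb.log": [("a", 3)]}
print(recover({"a": 1}, logs, checkpoint=("edb00001.log", 1)))
```

Without a checkpoint, the same call replays every log from the oldest one and reaches the same final state, only with more work.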

Defragmentation and Integrity Checking

Defragmentation consists of online and offline operations. Online defragmentation takes place automatically as part of the garbage-collection process that purges expired tombstones. Online defragmentation releases more space for the database to use, but doesn't release any space back to the system. You should do an offline defragmentation on a regular basis. Compacting the NTDS.DIT database can make a significant difference in the time it takes to replicate.

HP's current NTDS.DIT size of a GC is about 7.2GB compressed (depending on the domain of the GC) and more than 9GB uncompressed. This is important in backup and recovery times, or when promoting a new DC or GC. As noted previously in Chapter 1, the IFM feature considerably reduces the time required to promote a GC. Refer to Microsoft KB articles 229602, "Defragmentation of the Active Directory Database," and 232122, "Performing Offline Defragmentation of the Active Directory Database."

If you suspect that the database is corrupt, you can run the Semantic Database Analysis option in NTDSUtil. You must be booted into DSRM to do this, but I've seen a lot of problems caused by database corruption that were fixed by running this option. There is a Fixup option that fixes any problems it encounters as well as an option to just report the results. I've never seen this hurt the database; it can only help. Refer to Microsoft KB article 315136, "HOW TO: Complete a Semantic Database Analysis for the Active Directory Database by Using NTDSUtil.exe."

Recovery of a Forest

The absolute worst-case scenario in AD disaster recovery is to have to recover an entire forest from backups. This scenario should also be considered as part of your disaster recovery plan. Typically, this plan would allow for a duplicate set of backup tapes sent to an offsite facility, probably managed by a separate company that specializes in this work. The frequency of this delivery is up to you; once a day is probably too much work, so perhaps send the weekly full backup tapes. The plan should specify that this third-party company have the capability to get new hardware and restore the forest. Make sure that you not only define the procedure, but also test it to make sure it will work. Again, a backup without validation is a waste of time and an incredible risk.

You might deploy such a plan for a number of reasons:

Man-made disaster eliminates every DC in a domain or forest : Think of companies in the World Trade Center who lost their entire infrastructure on September 11, 2001. Offsite data recovery plans kept the remainder of the company in business. Also, a disgruntled Administrator could easily cause a lot of damage with the right privileges.
Natural disasters : In a large global company like HP with DCs in many sites around the world, it's highly unlikely that every DC in the forest, or even in any domain, would be eliminated by any means. However, a college campus, hospital, or business that occupies several floors of a single building could easily lose all the DCs in the forest in a fire.
Virus or hacker attacks compromise stability or security : Recovering the forest allows you to roll the forest back in time before these events took place, and lets you take corrective action before going live again.
Corrupt schema : You can rebuild the forest by restoring a DC in the forest root and each child domain using a backup taken before the corruption, and then reconstruct the DCs one by one. This will take a while, but it's faster than rebuilding the entire AD (users, groups, client accounts, and so on).

note

Restoring a forest from backup will lose all the changes in the AD since that backup was made. Make sure you include a plan to re-create the objects created between the backup date and the date of the disaster.

Disaster Recovery Site

Many companies employ a disaster recovery site (DRS) in which a DC from each domain is in a physical location separate from the company's buildings. A high-speed link connects the DRS with the hub site of the normal infrastructure, or perhaps additional links and associated replication topology are built so that this DRS is always up to date. Thus, if you do lose all your DCs in all your sites, you still have a nucleus in the DRS, which is much easier to restore from than backup tapes, and with far less potential loss of data. Figure 11.14 is the topology of one customer I worked with who employed this concept. This shows the replication topology employed to ensure the DRS site had preferential replication from the corporate hub site.

Figure 11.14. AD replication topology map for a Disaster Recovery Site (DRS).

Forest Recovery from Media

There are valid reasons for needing to recover an entire forest from media, but there will seldom be an occasion to do so. Barring a natural disaster that destroys every DC in every domain in the infrastructure, there are certainly better and safer ways of recovery. Remember you are betting the company's entire infrastructure on this operation, and there are a number of dependencies such as:

There must be valid data on the media.
The physical media itself is in good shape.
You can recover the data from the media.
You can recover the data to dissimilar hardware because you can't guarantee that the same hardware configuration will be available when you need to execute the recovery.
You have a plan that has been tested and a good backup strategy.

Preparing for a complete forest recovery involves a great deal of planning and testing to make sure the plan works. After you develop the plan, test it by taking the backup media and restoring it to new hardware. Because you can't guarantee that the hardware you restore it to will be the exact same as the hardware it's running on now, make sure you follow Microsoft's procedure for restoring to dissimilar hardware, detailed in KB article 263532, "How to Perform a Disaster Recovery Restoration of Active Directory on a Computer with a Different Hardware Configuration." Again, this is a procedure to be used in a "life or death" scenario. If there is no other possible way to recover your AD and Microsoft advises it, then use the procedure documented in Microsoft's whitepaper "Best Practice Recommendation for Recovering your Active Directory Forest" located at http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=3EDA5A79-C99B-4DF9-823C-933FEBA08CFE.

Even during the writing of this book, some processes of disaster recovery have changed per Microsoft's recommendations, so you should make sure you are acquainted with the most recent Microsoft recommendations.

warning

Be aware of the incredible impact that a recovery of the entire forest could have on a company, especially a large global enterprise with multiple domains and perhaps tens of thousands of user accounts. Of course, if all your DCs are in a building that burns down and you have no other sites, then you have no choice. But barring physical destruction of all DCs in the forest, do the following:

1. Consult with Microsoft Support before starting and during the recovery process.

2. Make sure your backup media is good. Don't tear the whole thing down and find out your backups are trash.

3. Do a dry run with the backup media by restoring the forest to a private network. If you are using a private company to store your data, the company can probably help you do this at its site; in fact, many such companies offer that service.

4. Get a good plan and carefully follow the plan.
