Architecture of the Active Directory

Chapter 13 - Working with the Active Directory

by Simon Robinson et al.
Wrox Press 2002

Before we start programming we have to know how the Active Directory works, what it is used for, and what data we can store there.

Features

The features of the Active Directory can be grouped into the following list:

The data in the Active Directory is grouped hierarchically. Objects can be stored inside other container objects. Instead of having a single, large list of users, the users can be grouped inside organizational units. An organizational unit can contain other organizational units, so we can build a tree.
The Active Directory uses multi-master replication. In Windows NT 4 domains the primary domain controller (PDC) was the master. In Windows 2000 with the Active Directory every domain controller (DC) is a master. If the PDC in a Windows NT 4 domain is down, no user can change their password; the system administrator can only update users when the PDC is up and running. With the Active Directory, updates can be applied to any DC. This model is much more scalable, as updates can happen on different servers concurrently. The disadvantage of this model is that replication is more complex. We will talk about the replication issues later in this chapter.
The replication topology is flexible, to support replication across slow links in WANs. How often data should be replicated is configurable by the domain administrators.
The Active Directory supports open standards. LDAP, the Lightweight Directory Access Protocol, is one of the standards that can be used to access the data in the Active Directory. LDAP is an Internet standard that can be used to access a lot of different directory services. LDAP also defines a programming interface, the LDAP API, which can be used to access the Active Directory from the C language. Microsoft's preferred programming interface to directory services is ADSI, the Active Directory Service Interfaces. This, of course, is not an open standard. In contrast to the LDAP API, ADSI makes it possible to access all features of the Active Directory. Another standard that's used within the Active Directory is Kerberos, which is used for authentication. The Windows 2000 Kerberos service can also be used to authenticate UNIX clients.
The Active Directory provides fine-grained security. Every object stored in the Active Directory can have an associated access-control list that defines who can do what with that object.

The objects in the directory are strongly typed, which means that the type of an object is exactly defined; no attributes that are not specified may be added to an object. In the schema, the object types as well as the parts of an object (attributes) are defined. Attributes can be mandatory or optional.

For more information about ADSI you can read Simon Robinson's Professional ADSI Programming, Wrox Press, ISBN 1-861002-26-2.

Active Directory Concepts

Before programming the Active Directory, we need to begin with some basic terms and definitions.

Objects

We store objects in the Active Directory. An object refers to something concrete such as a user, a printer, or a network share. Objects have mandatory and optional attributes that describe them. Some examples of the attributes of a user object are the first name, last name, e-mail address, phone number, and so on.

The following figure shows a container object called Wrox Press that contains some other objects; two user objects, a contact object, a printer object, and a user group object:

Schema

Every object is an instance of a class that is defined in the schema. The schema defines the types, and is itself stored within objects in the Active Directory. We have to differentiate between classSchema and attributeSchema. The classSchema defines the types of objects and details which mandatory and optional attributes an object has. The attributeSchema defines what an attribute looks like, and what the allowed syntax for a specific attribute is.
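
The split between classSchema and attributeSchema can be illustrated with a small in-memory model. This is only a sketch in Python, not ADSI and not code from this chapter: the names AttributeSchema, ClassSchema, and validate are hypothetical, a Python type stands in for the attribute syntax, and the contact definition at the end is illustrative rather than the real schema entry.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AttributeSchema:
    """Hypothetical stand-in for an attributeSchema object."""
    name: str
    syntax: type  # a Python type standing in for the allowed syntax


@dataclass
class ClassSchema:
    """Hypothetical stand-in for a classSchema object."""
    name: str
    mandatory: dict = field(default_factory=dict)  # attribute name -> AttributeSchema
    optional: dict = field(default_factory=dict)

    def validate(self, values: dict) -> list:
        """Return a list of problems; an empty list means the object is valid."""
        problems = []
        allowed = {**self.mandatory, **self.optional}
        for attr in self.mandatory:
            if attr not in values:
                problems.append(f"missing mandatory attribute: {attr}")
        for attr, value in values.items():
            if attr not in allowed:
                # Strong typing: attributes outside the schema are rejected.
                problems.append(f"attribute not in schema: {attr}")
            elif not isinstance(value, allowed[attr].syntax):
                problems.append(f"wrong syntax for attribute: {attr}")
        return problems


# Illustrative definitions only; not the real Active Directory schema.
cn = AttributeSchema("cn", str)
phone = AttributeSchema("telephoneNumber", str)
contact = ClassSchema("contact", mandatory={"cn": cn}, optional={"telephoneNumber": phone})
```

Calling contact.validate with a dictionary of attribute values reports a missing mandatory attribute, an attribute the class does not define, or a value of the wrong type.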

We can define custom types and attributes, and add these to the schema. Be aware, however, that a new schema type can never be removed from the Active Directory. It's possible to mark it as inactive so that new objects cannot be created any more, but there can be existing objects of that type, so it's not possible to remove classes or attributes that are defined in the schema. The Windows 2000 Administrator doesn't have enough rights to create new schema entries; the Windows 2000 Domain Enterprise Administrator is needed here.

Configuration

Besides objects and class definitions that are stored as objects, the configuration of the Active Directory itself is stored within the Active Directory. It holds the information about all sites, such as the replication intervals, that is set up by the system administrator. Because the configuration is stored in the directory, we can access it like any other object in the Active Directory.

Active Directory Domain

A domain is a security boundary of a Windows network. In the Active Directory domain, the objects are stored in a hierarchical order. The Active Directory itself is made up of one or more domains. The hierarchical order of objects within a domain is presented in the figure below, in which we see a domain represented by a triangle. Container objects such as Users, Computers, and Books can store other objects. Each oval in the picture represents an object, with the lines between the objects representing parent-child relationships. For instance, Books is the parent of .NET and Java, and Pro C#, Beg C#, and ASP.NET are child objects of the .NET object.
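
The parent-child relationships described above can be modeled as a simple tree. The following Python sketch is illustrative only: the class DirectoryObject and its methods are hypothetical names, and the root is labeled "domain" as a placeholder because the text does not name the domain in the figure.

```python
class DirectoryObject:
    """Hypothetical node in a domain's object hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Names from the domain root down to this object."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def descendants(self):
        """Yield every object below this one, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()


# Rebuild the hierarchy named in the text; "domain" is a placeholder root.
domain = DirectoryObject("domain")
users = DirectoryObject("Users", domain)
computers = DirectoryObject("Computers", domain)
books = DirectoryObject("Books", domain)
dotnet = DirectoryObject(".NET", books)
java = DirectoryObject("Java", books)
pro_csharp = DirectoryObject("Pro C#", dotnet)
beg_csharp = DirectoryObject("Beg C#", dotnet)
asp_net = DirectoryObject("ASP.NET", dotnet)
```

Here pro_csharp.path() walks up through .NET and Books to the root, and books.descendants() visits every object stored below the Books container.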

Domain Controller

A single domain can have multiple domain controllers, each of which stores all of the objects within the domain. There is no master server, and all DCs are treated equally; we have a multi-master model. The objects are replicated across the servers inside the domain.

Site

A site is a location in the network holding at least one DC. If we have multiple locations in the enterprise, which are connected with slow network links, we can use multiple sites for a single domain. For backup or scalability reasons, each site can have one or more DCs running. Replication between servers in a site can happen at shorter intervals due to the faster network connection. Replication is configured to occur at larger time intervals between servers across sites, depending on the speed of the network. Of course, the domain administrator can configure this.

Domain Tree

Multiple domains can be connected by trust relationships. These domains share a common schema, a common configuration, and a global catalog (we will talk about global catalogs soon). A common schema and a common configuration mean that this data is replicated across domains. Domain trees share the same class and attribute schema. The objects themselves are not replicated across domains.

Domains connected in such a way form a domain tree. Domains in a domain tree have a contiguous, hierarchical namespace. This means that the domain name of a child domain is the child's own name prepended to the name of the parent domain. Between domains, trusts that use the Kerberos protocol are established.

For example, we have the root domain wrox.com, which is the parent domain of the child domains india.wrox.com and uk.wrox.com. A trust is set up between the parent and the child domains, so that accounts from one domain can be authenticated by another domain.
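
The contiguous namespace rule is purely a matter of how names are composed, so it can be sketched with plain string handling. The function names below are hypothetical, and the code only compares names; it does not query any directory or verify that a trust exists.

```python
def child_domain_name(label: str, parent: str) -> str:
    """Compose a child domain name from its own label and the parent's name."""
    return f"{label}.{parent}"


def parent_domain(domain: str):
    """Return everything after the first label, or None for a single label."""
    _, _, rest = domain.partition(".")
    return rest or None


def is_child_of(child: str, parent: str) -> bool:
    """True if child sits exactly one label below parent in the namespace."""
    return parent_domain(child.lower()) == parent.lower()
```

For the domains in the example, is_child_of("india.wrox.com", "wrox.com") holds, while a name such as asptoday.com is not part of the wrox.com namespace.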

Forest

Multiple domain trees that are connected using a common schema, a common configuration, and a global catalog, but that do not share a contiguous namespace, are called a forest. A forest is a set of domain trees. A forest can be used if the company has a sub-company where a different domain name should be used. Let's say that asptoday.com should be relatively independent of the domain wrox.com, but it should be possible to have common management, and for users from asptoday.com to access resources from the wrox.com domain, and the other way around. With a forest we can have trusts between multiple domain trees.
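
To make the distinction between a tree and a forest concrete, the following Python sketch groups a list of domain names into trees by their contiguous namespaces. The function group_into_trees is a hypothetical helper that works on names alone, assuming that every domain whose name ends in a listed root belongs to that root's tree.

```python
def group_into_trees(domains):
    """Group domain names into trees keyed by their root domain.

    A domain becomes a root when no already-seen root is a suffix of it
    on a label boundary; shorter names are examined first.
    """
    names = sorted({d.lower() for d in domains}, key=lambda d: (d.count("."), d))
    trees = {}
    for name in names:
        root = next((r for r in trees if name.endswith("." + r)), None)
        if root is None:
            trees[name] = [name]
        else:
            trees[root].append(name)
    return trees


forest = group_into_trees(
    ["wrox.com", "india.wrox.com", "uk.wrox.com", "asptoday.com"]
)
```

With the domain names used in this section, the result contains two trees, one rooted at wrox.com and one at asptoday.com, which together would make up the forest.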

Global Catalog

A search for an object can span multiple domains. If we look for a specific user object with some attributes we have to search every domain. Starting with wrox.com, the search continues to uk.wrox.com and india.wrox.com; across slow links such a search could take a while.

To make searches faster, all objects are copied to the global catalog (GC). The GC is replicated in every domain of a forest. There's at least one server in every domain holding a GC. For performance and scalability reasons, we can have more than one GC server in a domain. Using a GC, a search through all the objects can happen on a single server.

The GC is a read-only cache of all the objects, which can only be used for searches; the domain controllers must be used to do updates.

Not all attributes of an object are stored in the GC. We can define whether or not an attribute should be stored in the GC along with an object. The decision whether to store an attribute in the GC depends on the frequency of its use in searches. A picture of a user isn't useful in the GC, because you would never search for a picture. Conversely, the phone number would be a useful addition to the store. You can also define that an attribute should be indexed so that a query for it is faster.
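
The idea of a catalog that holds every object but only some of its attributes can be sketched as follows. This is a simplified Python model, not how a GC server is implemented: the names GC_ATTRIBUTES, build_global_catalog, and search are hypothetical, and the user data is made up for illustration.

```python
# Attributes flagged for inclusion in the catalog (an illustrative choice).
GC_ATTRIBUTES = {"cn", "telephoneNumber"}


def build_global_catalog(domains):
    """Copy the flagged attributes of every object from every domain."""
    catalog = []
    for domain_name, objects in domains.items():
        for obj in objects:
            entry = {k: v for k, v in obj.items() if k in GC_ATTRIBUTES}
            entry["domain"] = domain_name  # remember where the full object lives
            catalog.append(entry)
    return catalog


def search(catalog, attribute, value):
    """Search the single catalog instead of visiting each domain in turn."""
    return [entry for entry in catalog if entry.get(attribute) == value]


# Made-up sample data; the photo attribute is deliberately left out of the GC.
domains = {
    "wrox.com": [{"cn": "John Doe", "telephoneNumber": "555-0100", "photo": b"..."}],
    "uk.wrox.com": [{"cn": "Jane Roe", "telephoneNumber": "555-0101", "photo": b"..."}],
}
catalog = build_global_catalog(domains)
```

A search by phone number finds the matching user in one pass over the catalog, and the domain field of the result tells us which domain controller to contact for the full object or for updates.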

Replication

As programmers we are unlikely to ever configure replication, but because it affects the data we store in the Active Directory, we have to know how it works. The Active Directory uses a multi-master server architecture. Updates can and will happen to every domain controller in the domain. The replication latency defines how long it takes until an update made on one domain controller has been replicated to the others.

The configurable change notification happens, by default, every 5 minutes inside a site if some attributes change. The DC where a change occurred informs one server after the other at 30-second intervals, so the fourth DC can get the change notification after 7 minutes. The default change notification across sites is set to 180 minutes. Intra- and inter-site replication can each be configured to other values.
If no changes occurred, the scheduled replication occurs every 60 minutes inside a site. This is to ensure that a change notification wasn't missed.
For security-sensitive information, such as account lockout, immediate notification can occur.
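
The 7-minute figure for the fourth DC can be checked with a little arithmetic. The Python sketch below uses the default values quoted above and assumes that every partner, including the first, is notified 30 seconds after the previous step, which is the reading that yields 7 minutes; the function name is hypothetical.

```python
def notification_time_seconds(partner_index, initial_delay=5 * 60, pause=30):
    """Seconds after a change until the n-th DC (1-based) is notified.

    Assumption: every partner, including the first, is notified one
    pause after the previous step.
    """
    if partner_index < 1:
        raise ValueError("partner_index is 1-based")
    return initial_delay + partner_index * pause
```

With the defaults, the fourth partner is reached after 300 + 4 * 30 = 420 seconds, that is, 7 minutes. Passing other values for initial_delay and pause shows how a reconfigured site would behave under the same assumption.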

With a replication, only the changes are copied to the DCs. With every change of an attribute a version number (USN, update sequence number) and a time stamp are recorded. These are used to help resolve conflicts if updates happened to the same attribute on different servers.

Let's look at one example. The mobile phone attribute of the user John Doe has the USN number 47. This value is already replicated to all DCs. One system administrator changes the phone number. The change occurs on the server DC1; the new USN of this attribute on the server DC1 is now 48, whereas the other DCs still have the USN 47. For someone still reading the attribute, the old value can be read until the replication to all domain controllers has occurred.

Now the rare case can happen that another administrator changes the phone number attribute, and here a different DC was selected because this administrator received a faster response from the server DC2. The USN of this attribute on the server DC2 is also changed to 48.

At the notification intervals, notification happens because the USN for the attribute changed, and the last time replication occurred was with a USN value of 47. The replication mechanism now detects that the servers DC1 and DC2 both have a USN of 48 for the phone number attribute. Which server wins is not really important, but one server must win. To resolve this conflict the time stamp of the change is used. Because the change happened later on DC2, the value stored in the DC2 domain controller gets replicated.
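
The conflict resolution in this example can be sketched in a few lines of Python. This is a simplified model of the rule as described here (compare USNs, then time stamps), not the replication engine itself; real domain controllers may apply further tie-breakers that this chapter does not cover. The class AttributeVersion, the function resolve, and the phone numbers and times are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AttributeVersion:
    """One server's view of a single attribute."""
    server: str
    value: str
    usn: int
    changed_at: datetime


def resolve(a, b):
    """Higher USN wins; on equal USNs the later time stamp wins."""
    if a.usn != b.usn:
        return a if a.usn > b.usn else b
    return a if a.changed_at >= b.changed_at else b


# Made-up values: both administrators change the number within minutes.
old = AttributeVersion("DC3", "555-0100", 47, datetime(2002, 1, 1, 9, 0))
dc1 = AttributeVersion("DC1", "555-0111", 48, datetime(2002, 1, 1, 10, 0))
dc2 = AttributeVersion("DC2", "555-0122", 48, datetime(2002, 1, 1, 10, 2))
winner = resolve(dc1, dc2)
```

Because both versions carry USN 48, the time stamps decide, and the value written on DC2 is the one that would be replicated.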

Important

When reading objects, we have to be aware that the data is not necessarily current. The currency of the data depends on replication latencies. When updating objects, another user can still read some old values after the update. It's also possible that different updates can happen at the same time.

Characteristics of Active Directory Data

The Active Directory doesn't replace a relational database or the Registry, so what kind of data would we store in it?

We have hierarchical data within the Active Directory. We can have containers that store further containers, and also objects. Containers themselves are objects, too.
The data should be read-mostly. Because replication occurs at certain time intervals, we cannot be sure that we will read up-to-date data. Applications must be aware that the information they read is possibly not current.
Data should be of global interest to the enterprise, because adding a new data type to the schema replicates it to all the servers in the enterprise. For data types that are only of interest to a small number of users, the Domain Enterprise Administrator wouldn't normally install new schema types.
The data stored should be of reasonable size because of replication issues. If the data size is 100K, it is fine to store this data in the directory if the data changes only once per week. However, if the data changes once per hour, then data of this size is too large. Always think about replicating the data to different servers: where the data gets transferred to, and at what intervals. If you have larger data it's possible to put a link into the Active Directory, and store the data itself in a different place.

To summarize, the data we store in the Active Directory should be hierarchically organized, of reasonable size, and important for the enterprise.
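
The size guideline above becomes clearer with a rough estimate of replication traffic. The Python sketch below assumes that the whole value is copied to every other server on each change, which is a deliberate simplification; the function name and the figure of 10 servers are hypothetical.

```python
def weekly_replication_bytes(size_bytes, changes_per_week, server_count):
    """Rough upper bound: every change is copied to every other server."""
    return size_bytes * changes_per_week * (server_count - 1)


SIZE = 100 * 1024   # the 100K value from the text
SERVERS = 10        # hypothetical number of domain controllers

weekly = weekly_replication_bytes(SIZE, 1, SERVERS)        # changes once per week
hourly = weekly_replication_bytes(SIZE, 24 * 7, SERVERS)   # changes once per hour
```

Under these assumptions the hourly case produces 168 times the traffic of the weekly case, which is why the same 100K value is acceptable in one situation and too large in the other.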

Schema

Active Directory objects are strongly typed. The schema defines the types of the objects, mandatory and optional attributes, and the syntaxes and constraints of these attributes. In the schema we can differentiate between class-schema and attribute-schema objects. A class is a collection of attributes. With the classes, single inheritance is supported. As can be seen in the following class diagram, the user class derives from the organizationalPerson class, organizationalPerson is a subclass of person, and the base class is top. The classSchema that defines a class describes the attributes with the systemMayContain attribute.

The diagram to the right shows only a few of all the systemMayContain values. Using the ADSI Edit tool, you can easily see all the values; we will look at this in the next section.
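
The single-inheritance chain from user up to top can be sketched as a lookup table. The following Python model is illustrative only: PARENT, MAY_CONTAIN, and the two functions are hypothetical names, and each class lists just the handful of attributes named in this section rather than its full systemMayContain values.

```python
# Parent of each class, following the chain described in this section.
PARENT = {
    "user": "organizationalPerson",
    "organizationalPerson": "person",
    "person": "top",
    "top": None,
}

# A few attributes per class, as named in the text; real classes define many more.
MAY_CONTAIN = {
    "top": ["cn", "displayName", "objectGUID", "whenChanged", "whenCreated"],
    "person": ["userPassword", "telephoneNumber"],
    "organizationalPerson": ["manager", "department", "company"],
    "user": [],  # logon-related attributes are not listed in this sketch
}


def inheritance_chain(class_name):
    """Return the class followed by its ancestors, ending at top."""
    chain = []
    while class_name is not None:
        chain.append(class_name)
        class_name = PARENT[class_name]
    return chain


def all_attributes(class_name):
    """Collect attributes from top down to the given class."""
    attrs = []
    for cls in reversed(inheritance_chain(class_name)):
        attrs.extend(MAY_CONTAIN[cls])
    return attrs
```

Calling all_attributes("user") returns the attributes inherited from top, person, and organizationalPerson, which mirrors how a derived class picks up everything its base classes allow.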

In the root class top we can see that every object can have common name (cn), displayName, objectGUID, whenChanged, and whenCreated attributes. The person class derives from top. A person object also has a userPassword and a telephoneNumber. The organizationalPerson class derives from person. In addition to the attributes of person it has a manager, department, and company; and a user has extra attributes needed to log on to a system: