The Architecture of Active Directory | Professional C# 2005 with .NET 3.0

Before starting to program Active Directory, you have to know how it works, what it is used for, and what data can be stored there.

Features

The features of Active Directory can be summarized as follows:

The data in Active Directory is grouped hierarchically. Objects can be stored inside other container objects. Instead of having a single, large list of users, users can be grouped inside organizational units. An organizational unit can contain other organizational units, so you can build a tree.
Active Directory uses multimaster replication. Every domain controller (DC) is a master, so updates can be applied to any DC. This model scales much better than a single-master model, because updates can be made to different servers concurrently. The disadvantage of this model is more complex replication. Replication issues are discussed later in this chapter.
The replication topology is flexible, to support replication across slow WAN links. Domain administrators can configure how often data is replicated.
Active Directory supports open standards. The Lightweight Directory Access Protocol (LDAP) is one of the standards that can be used to access the data in Active Directory. LDAP is an Internet standard that can be used to access many different directory services. LDAP also defines a programming interface, the LDAP API, which can be used to access Active Directory from the C language. Microsoft’s preferred programming interface to directory services is Active Directory Service Interfaces (ADSI), which is not an open standard. In contrast to the LDAP API, ADSI makes it possible to access all features of Active Directory. Another standard used within Active Directory is Kerberos, which is used for authentication. The Windows Server Kerberos service can also be used to authenticate Unix clients.
Active Directory offers fine-grained security. Every object stored in Active Directory can have an associated access-control list that defines who can do what with that object.

The objects in the directory are strongly typed, which means that the type of an object is exactly defined; attributes that are not specified for that type cannot be added to an object. The schema defines the object types as well as the parts of an object (its attributes). Attributes can be mandatory or optional.

Active Directory Concepts

Before programming Active Directory, you need to know some basic terms and definitions.

Objects

Active Directory stores objects. An object refers to something concrete such as a user, a printer, or a network share. Objects have mandatory and optional attributes that describe them. Some examples of the attributes of a user object are the first name, last name, e-mail address, phone number, and so on.

Figure 42-1 shows a container object called Wrox Press that contains some other objects: two user objects, a contact object, a printer object, and a user group object.

Figure 42-1

Schema

Every object is an instance of a class defined in the schema. The schema defines the types and is itself stored in objects in Active Directory. You have to differentiate between classSchema and attributeSchema. A classSchema object defines a type of object and specifies which mandatory and optional attributes an object of that type has. An attributeSchema object defines what an attribute looks like and what the allowed syntax for that attribute is.
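
The way a class definition constrains an object can be illustrated with a small, self-contained Python sketch. It does not talk to Active Directory; the ClassDef type, the validate helper, and the sample contact definition with its attribute lists are hypothetical and were chosen only to show how mandatory and optional attributes restrict what an object may contain.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClassDef:
    """Hypothetical stand-in for a classSchema entry."""
    name: str
    mandatory: frozenset
    optional: frozenset = field(default_factory=frozenset)


def validate(class_def, attributes):
    """Return a list of problems; an empty list means the object is valid."""
    present = set(attributes)
    problems = []
    # Every mandatory attribute must be present.
    for missing in sorted(class_def.mandatory - present):
        problems.append(f"missing mandatory attribute: {missing}")
    # Strong typing: nothing outside the class definition is allowed.
    allowed = class_def.mandatory | class_def.optional
    for extra in sorted(present - allowed):
        problems.append(f"attribute not defined for class: {extra}")
    return problems


# Illustrative definition only; not the real Active Directory schema.
contact = ClassDef(
    name="contact",
    mandatory=frozenset({"cn"}),
    optional=frozenset({"telephoneNumber", "mail"}),
)

print(validate(contact, {"cn": "John Doe", "telephoneNumber": "555-0100"}))
print(validate(contact, {"telephoneNumber": "555-0100", "shoeSize": "44"}))
```

The first call reports no problems; the second reports both the missing mandatory attribute and the attribute that the class does not define.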

You can define custom types and attributes and add these to the schema. Be aware, however, that a new schema type cannot be removed from Active Directory. You can mark it as inactive so that new objects cannot be created anymore, but there can be existing objects of that type, so it’s not possible to remove classes or attributes defined in the schema.

Membership in the Administrators group doesn’t give you enough rights to create new schema entries; membership in the Schema Admins group is needed here.

Configuration

In addition to objects and class definitions stored as objects, the configuration of Active Directory itself is stored in Active Directory. The configuration of Active Directory stores the information about all sites, such as the replication interval, which is set up by the system administrator. Because the configuration itself is stored in Active Directory, you can access the configuration information like all other objects in Active Directory.

The Active Directory Domain

A domain is a security boundary of a Windows network. In the Active Directory domain, the objects are stored in a hierarchical order. Active Directory itself is made up of one or more domains. Figure 42-2 shows the hierarchical order of objects in a domain; the domain is represented by a triangle. Container objects such as Users, Computers, and Books can store other objects. Each oval in the picture represents an object, with the lines between the objects representing parent-child relationships. For example, Books is the parent of .NET and Java, and Pro C#, Beg C#, and ASP.NET are child objects of the .NET object.

Figure 42-2
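
The parent-child relationships described for Figure 42-2 can be modeled as a simple tree. The following Python sketch is only an in-memory illustration: the DirectoryObject class, its add and path methods, and the root name "domain" are hypothetical, while the container and object names are the ones mentioned in the text.

```python
class DirectoryObject:
    """A node in the directory tree; a container is a node with children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add(self, child):
        """Store `child` inside this container and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self):
        """Names from the domain root down to this object."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


domain = DirectoryObject("domain")  # placeholder name for the domain root
books = domain.add(DirectoryObject("Books"))
dotnet = books.add(DirectoryObject(".NET"))
books.add(DirectoryObject("Java"))
pro_csharp = dotnet.add(DirectoryObject("Pro C#"))
for title in ("Beg C#", "ASP.NET"):
    dotnet.add(DirectoryObject(title))

print(pro_csharp.path())                  # domain/Books/.NET/Pro C#
print([c.name for c in dotnet.children])  # ['Pro C#', 'Beg C#', 'ASP.NET']
```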

Domain Controller

A single domain can have multiple domain controllers, each of which stores all of the objects in the domain. There is no master server, and all DCs are treated equally; you have a multimaster model. The objects are replicated across the servers inside the domain.

Site

A site is a location in the network that holds at least one DC. If you have multiple locations in the enterprise that are connected with slow network links, you can use multiple sites for a single domain. For backup or scalability reasons, each site can have one or more DCs running. Replication between servers in a site can happen at shorter intervals because of the faster network connection. Between servers in different sites, replication is configured to occur at longer intervals, depending on the speed of the network. The domain administrator can configure the replication intervals.

Domain Tree

Multiple domains can be connected by trust relationships. These domains share a common schema, a common configuration, and a global catalog (more on global catalogs shortly). Sharing a schema and a configuration implies that this data is replicated across domains. The objects themselves are not replicated across domains.

Domains connected in such a way form a domain tree. Domains in a domain tree have a contiguous, hierarchical namespace. This means that the domain name of a child domain consists of the child's own name followed by the name of the parent domain. Between domains, trusts using the Kerberos protocol are established.

For example, you have the root domain wrox.com, which is the parent domain of the child domains india.wrox.com and uk.wrox.com. A trust is set up between the parent and the child domains, so that accounts from one domain can be authenticated by another domain.

Forest

Multiple domain trees that share a common schema, a common configuration, and a global catalog, but do not form a contiguous namespace, are called a forest. A forest is a set of domain trees; it can be used if the company has a subcompany where a different domain name should be used. Here’s one example: wrox.com should be relatively independent of the domain wiley.com, but common management should be possible, and users from wrox.com should be able to access resources from the wiley.com domain and vice versa. With a forest you can have trusts between multiple domain trees.
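
The difference between a domain tree and a forest comes down to whether the domain names form one contiguous namespace. As a rough illustration, the following Python sketch groups the example domains from the text into trees purely by name. The helper names is_child_of and group_into_trees are hypothetical, and a real forest is of course defined by its trusts and shared schema, not by string comparison alone.

```python
def is_child_of(child, parent):
    """True if `child` lies directly or indirectly below `parent`."""
    return child.lower().endswith("." + parent.lower())


def group_into_trees(domains):
    """Map each tree root to the domains in its contiguous namespace."""
    # A root is a domain that is not below any other domain in the list.
    roots = [d for d in domains
             if not any(is_child_of(d, other) for other in domains)]
    return {
        root: sorted(d for d in domains if d == root or is_child_of(d, root))
        for root in roots
    }


forest = ["wrox.com", "india.wrox.com", "uk.wrox.com", "wiley.com"]
for root, members in group_into_trees(forest).items():
    print(root, "->", members)
```

With the sample names, wrox.com and its two child domains end up in one tree, and wiley.com forms a second tree in the same forest.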

Global Catalog

A search for an object can span multiple domains. If you look for a specific user object with some attributes, you have to search every domain. Starting with wrox.com, the search continues to uk.wrox.com and india.wrox.com; across slow links such a search could take a while.

To make searches faster, all objects are copied to the global catalog (GC). The GC is replicated in every domain of a forest. There’s at least one server in every domain holding a GC. For performance and scalability reasons, you can have more than one GC server in a domain. Using a GC, a search through all the objects can happen on a single server.

The GC is a read-only cache of all the objects that can only be used for searches; the domain controllers must be used to do updates.

Not all attributes of an object are stored in the GC. You can define whether or not an attribute should be stored with an object. The decision whether to store an attribute in the GC depends on how the attribute is used. If the attribute is frequently used in searches, putting it into the GC makes the search faster. A picture of a user isn’t useful in the GC, because you would never search for a picture. Conversely, a phone number would be a useful addition to the store. You can also define that an attribute should be indexed so that a query for it is faster.
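
The idea of a catalog that holds every object but only some of its attributes can be sketched in a few lines of Python. Everything here is an in-memory model: the sample users, their values, the GC_ATTRIBUTES set, and the helper functions are made up for illustration and are not part of any Active Directory API.

```python
# Hypothetical choice of attributes that are published to the GC.
GC_ATTRIBUTES = {"cn", "telephoneNumber"}

# Sample data: each domain holds the full objects, including a picture.
domains = {
    "wrox.com": [
        {"cn": "John Doe", "telephoneNumber": "555-0100", "photo": b"..."},
    ],
    "uk.wrox.com": [
        {"cn": "Jane Roe", "telephoneNumber": "555-0101", "photo": b"..."},
    ],
}


def build_global_catalog(domains, gc_attributes):
    """Copy every object, keeping only the attributes marked for the GC."""
    catalog = []
    for domain, objects in domains.items():
        for obj in objects:
            entry = {k: v for k, v in obj.items() if k in gc_attributes}
            entry["domain"] = domain  # remember where the full object lives
            catalog.append(entry)
    return catalog


def search(catalog, attribute, value):
    """Search all domains at once by looking only at the catalog."""
    return [e for e in catalog if e.get(attribute) == value]


catalog = build_global_catalog(domains, GC_ATTRIBUTES)
print(search(catalog, "telephoneNumber", "555-0101"))
```

The search finds the user in uk.wrox.com without visiting that domain, and the picture never reaches the catalog because it is not in the published attribute set.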

Replication

As a programmer, you are unlikely ever to configure replication, but because it affects the data you store in Active Directory, you have to know how it works. Active Directory uses a multimaster server architecture. Updates can be made on any domain controller in the domain. The replication latency defines how long it takes until replication of an update starts:

The configurable change notification happens, by default, every 5 minutes inside a site if some attributes change. The DC where a change occurred informs one server after the other at 30-second intervals, so the fourth DC can get the change notification after 7 minutes. The default change notification across sites is set to 180 minutes. Intra- and intersite replication can each be configured to other values.
If no changes have occurred, the scheduled replication occurs every 60 minutes inside a site. This is to ensure that a change notification wasn’t missed.
For security-sensitive information, such as account lockout, immediate notification can occur.
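
The 7-minute figure for the fourth DC follows from simple arithmetic, shown here as a short Python sketch. It assumes that the first partner is notified 30 seconds after the 5-minute delay has elapsed and that each further partner follows 30 seconds later; this is the reading that matches the number given above. The function name and parameters are hypothetical.

```python
def notification_delay_seconds(position, initial_delay=300, spacing=30):
    """Seconds until the DC at `position` (1-based) is notified.

    Assumes one notification every `spacing` seconds, starting one
    `spacing` after the `initial_delay` has elapsed.
    """
    return initial_delay + position * spacing


for position in range(1, 5):
    minutes = notification_delay_seconds(position) / 60
    print(f"DC {position}: notified after {minutes} minutes")
```

For the fourth DC this gives 300 + 4 * 30 = 420 seconds, that is, 7 minutes.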

During replication, only the changes are copied to the DCs. With every change of an attribute, a version number (update sequence number or USN) and a time stamp are recorded. These are used to help resolve conflicts if updates happened to the same attribute on different servers.

Here’s an example. The mobile phone attribute of the user John Doe has the USN 47. This value is already replicated to all DCs. One system administrator changes the phone number. The change occurs on the server DC1; the new USN of this attribute on the server DC1 is now 48, whereas the other DCs still have the USN 47. Someone reading the attribute from another DC still gets the old value until replication to all domain controllers has occurred.

In rare cases, another administrator changes the same phone number attribute before replication has occurred, and a different DC is selected because this administrator received a faster response from the server DC2. The USN of this attribute on the server DC2 is also changed to 48.

At the next notification interval, notification takes place because the USN for the attribute has changed since the last replication, which occurred with a USN value of 47. The replication mechanism now detects that the servers DC1 and DC2 both have a USN of 48 for the phone number attribute. Which server is the winner is not really important, but one server must definitely win. To resolve this conflict, the time stamp of the change is used. Because the change happened later on DC2, the value stored in the DC2 domain controller is replicated.
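
The conflict rule from this example can be written down as a short Python sketch. It is a simplified model that follows the description above (higher USN wins; on a tie, the later time stamp wins) and is not the actual replication code of Active Directory. The AttributeVersion type, the resolve function, the phone numbers, and the time stamps are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AttributeVersion:
    """One DC's copy of an attribute, with its version metadata."""
    value: str
    usn: int
    changed: datetime
    origin: str


def resolve(a, b):
    """Pick the winner: higher USN first, later time stamp on a tie."""
    return max(a, b, key=lambda v: (v.usn, v.changed))


# Both administrators changed the attribute from USN 47 to USN 48;
# the change on DC2 happened five seconds later (sample values).
dc1 = AttributeVersion("555-0111", 48, datetime(2007, 1, 1, 10, 0, 0), "DC1")
dc2 = AttributeVersion("555-0122", 48, datetime(2007, 1, 1, 10, 0, 5), "DC2")

winner = resolve(dc1, dc2)
print(winner.origin, winner.value)  # DC2 555-0122
```

Because both copies carry USN 48, the tuple comparison falls through to the time stamp, and the value from DC2 is the one that replicates.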

Important

When reading objects, you have to be aware that the data is not necessarily current. The currency of the data depends on replication latencies. When updating objects, another user can still read some old values after the update. It’s also possible that different updates can happen at the same time.

Characteristics of Active Directory Data

Active Directory doesn’t replace a relational database or the registry, so what kind of data would you store in it?

With Active Directory you get hierarchical data. You can have containers that store further containers and objects, too. Containers themselves are objects as well.
The data should be read-mostly. Because replication occurs at certain time intervals, you cannot be sure that the data you read is up to date, and your applications must be able to cope with that.
Data should be of global interest to the enterprise, because adding a new data type to the schema replicates it to all the servers in the enterprise. For data types of interest only to a small number of users, the enterprise administrator normally wouldn’t install new schema types.
The data stored should be of reasonable size because of replication issues. It is fine to store data with a size of 100 KB in the directory if the data changes only once a week. However, if the data changes every hour, data of this size is too large. Always think about replicating the data to different servers: where the data gets transferred to and at what intervals. If you have larger data, it’s possible to put a link into Active Directory and store the data itself in a different place.
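
A quick back-of-the-envelope calculation shows why the change frequency matters as much as the size. The Python sketch below assumes that the whole 100 KB value is sent again on every change and that 10 domain controllers receive it; both the DC count and the helper function are hypothetical, and replication overhead is ignored.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_replication_kb(size_kb, changes_per_week, receiving_dcs):
    """Rough data volume per week if every change resends the full value."""
    return size_kb * changes_per_week * receiving_dcs


weekly = weekly_replication_kb(100, 1, receiving_dcs=10)
hourly = weekly_replication_kb(100, HOURS_PER_WEEK, receiving_dcs=10)

print(f"changed once a week: {weekly} KB per week")
print(f"changed every hour:  {hourly} KB per week")
```

Under these assumptions, the weekly change moves about 1,000 KB per week, whereas the hourly change moves about 168,000 KB for the same single attribute.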

To summarize, the data you store in Active Directory should be hierarchically organized, of reasonable size, and of importance to the enterprise.

Schema

Active Directory objects are strongly typed. The schema defines the types of the objects, mandatory and optional attributes, and the syntax and constraints of these attributes. In the schema, it is necessary to differentiate between class-schema and attribute-schema objects. A class is a collection of attributes. Classes support single inheritance. As you can see in Figure 42-3, the user class derives from the organizationalPerson class, organizationalPerson is a subclass of person, and the base class is top. The classSchema object that defines a class lists the attributes of that class in its systemMayContain attribute.

Figure 42-3

Figure 42-3 shows only a few of all the systemMayContain values. Using the ADSI Edit tool, you can easily see all the values; you will look at this tool in the next section. In the root class top you can see that every object can have common name (cn), displayName, objectGUID, whenChanged, and whenCreated attributes. The person class derives from top. A person object also has a userPassword and a telephoneNumber. The organizationalPerson class is derived from person. In addition to the attributes of person, it has a manager, department, and company, and a user has extra attributes needed to log on to a system.
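
The inheritance chain can be modeled as a small table in which every class names its parent and the attributes it adds. The following Python sketch uses only the class and attribute names mentioned above; the SCHEMA dictionary and the two helper functions are hypothetical, and the list for user is left empty because the text does not name its additional logon attributes.

```python
# class name -> (parent class, attributes added by this class)
SCHEMA = {
    "top": (None, ["cn", "displayName", "objectGUID",
                   "whenChanged", "whenCreated"]),
    "person": ("top", ["userPassword", "telephoneNumber"]),
    "organizationalPerson": ("person", ["manager", "department", "company"]),
    "user": ("organizationalPerson", []),  # logon attributes not listed here
}


def inheritance_chain(name):
    """Class names from `name` up to the root class."""
    chain = []
    while name is not None:
        chain.append(name)
        name = SCHEMA[name][0]
    return chain


def all_attributes(name):
    """Attributes of a class, including those inherited from base classes."""
    attrs = []
    for cls in reversed(inheritance_chain(name)):
        attrs.extend(SCHEMA[cls][1])
    return attrs


print(inheritance_chain("user"))
print(all_attributes("organizationalPerson"))
```

Because only single inheritance is supported, each class has exactly one parent, and collecting the attributes is a simple walk up the chain.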