Directory Service Design

In this section we discuss Big State's directory design and how it was developed.

Needs

The main applications driving Big State's directory deployment were the online phone book and the bigstate.edu email service. The phone book application required that the directory be populated with up-to-date white pages information about university faculty, staff, and students. Big State wanted to provide this information to internal and external users. The end users would be able to modify some of the information about themselves; other information would come only from official university data sources. Because this application would be user-driven, response time would be important, but overall aggregate performance requirements would be minimal. For example, Big State expected the phone book application to receive on the order of a few thousand accesses per day at most.

The bigstate.edu email service was designed to bring some order to the chaotic post-mainframe email environment emerging at Big State. Ever since users began leaving the mainframe system, literally dozens of local email systems had been popping up all over the Big State campus. Big State had no authority or desire to dictate the email systems used on campus; it just wanted to provide the addressing consistency users enjoyed in the mainframe days while still maintaining the email diversity required by today's users. This was the purpose of the bigstate.edu email service, whose architecture is illustrated in Figure 24.1.

Figure 24.1 The architecture of the `bigstate.edu` email service.

The bigstate.edu email service was designed to give everyone at Big State a short, consistent, easy-to-remember email address in the form firstname.lastname@bigstate.edu or userid@bigstate.edu. This address follows a Big State email user during his or her entire association with the university, regardless of the local system on which the user might actually receive mail. Naturally, a level of indirection is needed to insulate an email address from the user's actual location; the key component of this service is the directory. To deliver mail to email addresses based on names, the system required a name collision policy. To deliver to an address based on user ID, it required a mechanism for maintaining campus-wide unique user IDs.

Another key feature of the service is the ability for end users to create groups or mailing lists that receive mail at the bigstate.edu domain, a feature users of the mainframe system enjoyed. To implement email groups with a directory service, the directory deployment team developed specialized client software for creating and updating groups, and it imposed additional schema and performance requirements.

The overall performance requirements of the directory were driven by the most demanding directory application, which turned out to be the bigstate.edu email service. The Big State directory designers estimated that two-thirds of the university's 75,000 users would be active email users, receiving an average of three messages per day. If it takes three directory operations to deliver each piece of mail, you can see that the load on the directory is substantial.
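The estimate above can be worked through as simple arithmetic. The following Python sketch uses only the figures quoted in the text; the function and constant names are illustrative, and the per-second figure is a flat daily average that ignores peak-hour skew and mailing list traffic.

```python
# Back-of-the-envelope estimate of email-driven directory load,
# using only the figures quoted in the text.
TOTAL_USERS = 75_000
MESSAGES_PER_USER_PER_DAY = 3
DIRECTORY_OPS_PER_MESSAGE = 3
SECONDS_PER_DAY = 24 * 60 * 60


def daily_directory_ops(total_users=TOTAL_USERS,
                        messages_per_user=MESSAGES_PER_USER_PER_DAY,
                        ops_per_message=DIRECTORY_OPS_PER_MESSAGE):
    """Return the estimated number of directory operations per day."""
    # Two-thirds of all users are assumed to be active email users.
    active_users = total_users * 2 // 3
    return active_users * messages_per_user * ops_per_message


ops_per_day = daily_directory_ops()
print(ops_per_day)                    # 450000
print(ops_per_day / SECONDS_PER_DAY)  # a little over 5 per second on average
```

Compared with the few thousand accesses per day expected from the phone book, the email service generates a load roughly two orders of magnitude larger.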

Throwing mailing lists into the mix increased the directory load even more. At Big State, users have created more than 40,000 mailing lists, some with thousands of members (although the average group contains approximately 150 members). Delivering mail to these lists imposes a much greater load on the directory, and their creation and maintenance impose an additional update load. This affects general directory performance as well as replication performance.

Data

The primary drivers behind the directory service revolved around the directory-enabled applications slated for deployment. These applications had well-defined schema requirements, which made for simpler identification of the necessary data elements. The broadest data requirements came from the online phone book, which aims to provide access to a wide range of the usual white pages information, such as name, title, address, phone number, organization, and so on. More-focused requirements came from the bigstate.edu email service, which needs local email addresses for users.

Data that might be useful for seeding the phone book application's white pages information was available from a number of sources within the university. In fact, Big State staff did a survey recently in which they identified 17 official university databases holding name and address information. Nevertheless, the directory designers identified two main sources of information: the university's Personnel database for faculty and staff and the Registrar's database for students. These databases were chosen for political as well as technical value.

The Personnel database had one particularly attractive quality: It was already used to publish the printed university faculty and staff directory. This made it easy to convince the keepers of this data that it should be released for the purposes of creating an online directory. The student data was also released after a similar argument was made regarding student phone information already available in the campus locator phone service.

One helpful practice was that the personnel department and the registrar both provided ways for users to request that their information be left out of any publications. This made everyone feel that enough choice had been given to users who did not want their information published.

20-20 Hindsight: Data Population

The ability for users to opt out of inclusion in the online directory was a subject of controversy for two reasons. First, the default action was to include all users, and then any user who did not want to be included had to take some action to be excluded (typically checking a box on a paper form). In the case of the student database, this box had to be checked once a year by the user, or the information would once again be published under university guidelines.

Many users believed that the default should be to omit students' information, and then anyone who wanted to be listed would have to check a box on a form. Directory service administrators did not like this scenario very well because it virtually guaranteed a decrease in the population of the directory. In fact, there was significant worry that such a scheme would cause the directory to fail to achieve a critical mass of data.

Second, many Big State users felt that electronic distribution of information, such as in the directory, is fundamentally different from the paper distribution of information they are used to, such as the printed university phone book. These users felt that signing up to be published in the printed directory should not automatically sign them up to have their information published in the online directory, especially for the Internet at large to see.

These kinds of tensions are typical when deploying an electronically available service using traditional printed data. Big State's answer to these problems was to allow users to opt out of the online directory independently from opting in or out of any other directories.

Another barrier to obtaining and deploying data from official university data sources was political. Big State's IT division was historically composed of two parts: a traditional, mainframe-based corporate computing department supporting the university's business processes such as payroll, finance, and student registration; and a more progressive Internet and desktop computing department supporting the university's research and teaching missions. The directory was deployed by the latter group, but the databases holding the required information were managed by the former group. In retrospect, the Big State designers should have involved the corporate computing staff more in the design and deployment of the directory. This would undoubtedly have made the data acquisition process smoother and less troublesome. In particular, subsequent data acquisition projects, such as adding temporary employees to the directory, would probably have been easier.

After an agreement was reached on the data to populate the directory with, procedures were developed for actually obtaining the data, reformatting it correctly, and augmenting it for inclusion in the directory. Procedures were also developed for maintaining the data through subsequent feeds from the source database. These procedures are described in more detail later in this chapter.

Another important kind of information required by the bigstate.edu email service was local email address information for users. This information proved to be much harder to obtain than the white pages information. The reason for this was simple: There is no centrally maintained database containing the required information. Instead, it is scattered around the campus in databases or applications maintained by local system administrators, and sometimes by users themselves.

To overcome this problem and populate the directory with useful information, the Big State designers took two courses of action. First, they worked directly with administrators of the larger systems on campus to develop tools to extract email addresses from their systems. Second, they developed a program with which campus administrators could submit lists of email addresses to the directory. Campus administrators had an incentive to do this because their users would be able to use the bigstate.edu email service. (A future email service developed by Big State and described later in this chapter automatically updated user email addresses at user registration time.)

Another category of data that Big State could store in the directory was administrative data. These data elements contain information used to manage other data elements. For example, Big State includes an expires attribute indicating when an entry scheduled for deletion will be removed. Another example is the noBatchUpdates attribute, which is used to indicate that a user does not wish his or her entry to be updated from official data sources.
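As a rough illustration of how maintenance procedures might consult these administrative attributes, consider the following Python sketch. It is not Big State's actual tooling: the entry representation (a dictionary mapping attribute names to lists of string values), the "TRUE" flag value, and the ISO date format for expires are all assumptions made for the example.

```python
from datetime import date


def should_apply_batch_update(entry):
    """Return False when the entry's owner has opted out of official feeds.

    Assumes noBatchUpdates holds the string "TRUE" when set (an assumption).
    """
    values = entry.get("noBatchUpdates", [])
    return not any(value.strip().upper() == "TRUE" for value in values)


def is_due_for_removal(entry, today):
    """Return True when the entry's expires date has been reached.

    Assumes expires holds an ISO date such as "1998-06-30" (an assumption).
    """
    values = entry.get("expires", [])
    if not values:
        return False  # no expiration scheduled
    return date.fromisoformat(values[0]) <= today


# Hypothetical entry used only to exercise the helpers.
entry = {
    "cn": ["Barbara J Jensen"],
    "noBatchUpdates": ["TRUE"],
    "expires": ["1998-06-30"],
}
print(should_apply_batch_update(entry))             # False
print(is_due_for_removal(entry, date(1998, 7, 1)))  # True
```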

As advised in Chapter 6, "Data Design," the Big State directory designers created a table showing the information to be contained in the directory, its source, and who owns the information. This information determined the Big State directory data source diagram, which is shown in Figure 24.2.

Figure 24.2 Big State directory data source diagram.

Schema

The schema used in the central directory is comprised of two basic sets of schema definitions, one representing people and the other groups. The schema for representing people is taken from the standard person schema definition, extended with a few extra fields required by the Big State deployment. For example, Big State added a universityID attribute to hold the university-wide unique identification number. This attribute is used as a common key with external data sources, allowing entries in the directory to be matched up with the corresponding data from an external source.
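The role of universityID as a common key can be sketched as follows. This Python fragment is illustrative only: the record layout, the sample values, and the function name are hypothetical, and a real feed would also have to handle duplicate or missing identifiers.

```python
def match_by_university_id(directory_entries, source_records):
    """Pair each external record with the directory entry sharing its ID.

    Returns (matched, unmatched): matched is a list of (entry, record)
    pairs; unmatched lists records with no corresponding directory entry.
    """
    # Index the directory once so each lookup is a dictionary access.
    by_id = {entry["universityID"]: entry for entry in directory_entries}
    matched, unmatched = [], []
    for record in source_records:
        entry = by_id.get(record["universityID"])
        if entry is None:
            unmatched.append(record)
        else:
            matched.append((entry, record))
    return matched, unmatched


# Hypothetical data: one record matches, one is new to the directory.
directory = [{"universityID": "1001", "cn": "Barbara J Jensen"}]
feed = [
    {"universityID": "1001", "title": "Professor"},
    {"universityID": "1002", "title": "Lecturer"},
]
matched, unmatched = match_by_university_id(directory, feed)
print(len(matched), len(unmatched))  # 1 1
```

Records that match can be used to refresh the existing entry; records that do not match are candidates for new entries.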

Other new attributes were added to help keep track of various data handling and other procedures. For example, attributes are used for tracking data sources, noting the expiration time of entries, controlling whether entries are updated from corporate data sources, and other purposes. Attributes were also created to facilitate Big State's directory authentication scheme based on Kerberos, as described later in this chapter, as well as its proxy access control scheme.

The schema for representing groups was created from scratch in conjunction with the design of the bigstate.edu mail routing software. This software was written and designed by Big State staff because no commercial software found at the time satisfied the requirements. The existing standard group schema definitions also proved to be inadequate. For example, the standard group definition requires every member of a group to have a directory entry, making it difficult to create mailing lists that include non-university members. The Big State group definition, on the other hand, allows for both directory and email members. The group schema definition designed by Big State is shown in Listing 24.1 (it is somewhat abbreviated and annotated for clarity).

Listing 24.1 Big State group schema definition

objectclass rfc822MailGroup
    requires
        objectClass,
        owner,                  # DN of the owner of the group
        cn                      # used to name the group entry
    allows
        associatedDomain,       # domain name associated with the group
        joinable,               # flag indicating if others can join
        mail,                   # email members
        member,                 # directory members
        memberOfGroup,          # used for nested groups
        moderator,              # moderator of the group
        requestsTo,             # DN to receive list maintenance mail
        rfc822RequestsTo,       # email to receive -request mail
        rfc822ErrorsTo,         # email address for delivery reports
        errorsTo,               # DN to receive delivery errors
        suppressNoEmailError,   # flag indicating if no members are ok
        ...                     # other attributes

20-20 Hindsight: Schema Design

Big State's schema definitions proved to be valuable. It would have been a good idea, however, to prefix the names of attributes specific to Big State with some kind of string that would ensure they would not collide with other attribute definitions. Some of Big State's definitions were general-purpose, suitable for use by other institutions, but others were specific to Big State. Making this distinction clear through a naming convention would have made Big State's schema easier to use by other organizations.

Namespace

Two future requirements led Big State to the namespace design it chose. First, Big State wanted the directory to be extensible so it could store other kinds of objects (not just users and groups) as future applications arose. This requirement led Big State to choose the high-level partition-by-object-type namespace recommended in Chapter 8, "Namespace Design."

Second, Big State imagined that at some later time it might want to partition and delegate portions of the directory to different units on campus. The medical campus and the College of Engineering were two likely candidates with the desire and necessary expertise to maintain portions of the directory. To facilitate this future possibility, Big State eschewed the advice to create a flat namespace; instead, it opted for a namespace in the people portion of the tree based on organizational hierarchy. To its credit, Big State let this hierarchy descend only one level. Because of difficulty in matching up the two data sources (one for faculty and staff, and one for students), Big State also decided to separate this data using the namespace (see Figure 24.3).

Figure 24.3 The Big State directory namespace.

To name individual people entries, Big State chose people's actual names. The names, taken from the official university staff and student databases, were constructed whenever possible to include a first name, middle initial, and last name (e.g., Barbara J Jensen). This was done in an effort to reduce the likelihood of name collisions. Recall that Big State wanted to reserve a unique email address based on name for each user.

When collisions do occur, uniqueness is guaranteed through data maintenance procedures that append a number to each name. For example, the first Barbara J. Jensen who comes to the university would be given the number 1. If another Barbara J. Jensen arrives, she would be given the number 2. The two entries would be named using the relative distinguished names cn=Barbara J Jensen 1 and cn=Barbara J Jensen 2. These data maintenance procedures turned out to be rather complicated, as described in "20-20 Hindsight: Data Population" earlier in the chapter.
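The numbering policy can be illustrated with a short Python sketch. This is not Big State's actual maintenance procedure. It assumes that names arrive as a single list in arrival order and that only names that actually collide receive a number; the text does not state whether unique names are numbered. The second name in the example is hypothetical.

```python
from collections import Counter


def assign_rdns(names_in_arrival_order):
    """Return one RDN per name, numbering only the names that collide."""
    totals = Counter(names_in_arrival_order)  # how often each name occurs
    seen = Counter()                          # arrivals so far, per name
    rdns = []
    for name in names_in_arrival_order:
        if totals[name] == 1:
            rdns.append(f"cn={name}")         # unique name: no suffix
        else:
            seen[name] += 1                   # number by order of arrival
            rdns.append(f"cn={name} {seen[name]}")
    return rdns


print(assign_rdns(["Barbara J Jensen", "Sam X Carter", "Barbara J Jensen"]))
# ['cn=Barbara J Jensen 1', 'cn=Sam X Carter', 'cn=Barbara J Jensen 2']
```

Note that a batch view like this hides the hard part. In a live directory, the arrival of a second Barbara J. Jensen may force an existing entry to be renamed, which is one reason such procedures become complicated.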

Although not part of the namespace, Big State also maintained a userid attribute that was unique across the directory. The attribute was populated and maintained from an existing database of campus-wide login names.

20-20 Hindsight: Namespace Design

Big State's decision to design an extensible namespace proved to be valuable. After the directory infrastructure was in place, many new applications were deployed and many new types of entries were added to the directory, including entries for documents and images. The extensible namespace made this possible with minimal disruption of the service.

Big State's decision to design a deeply hierarchical namespace proved to be a mistake, however. For a variety of reasons, the planned delegation of information to other units on campus never materialized. In retrospect, this possibility was not well-thought-out. With modern software, hierarchy is not necessary to delegate responsibility, and the additional hierarchy caused numerous problems in the operation of the service. For example, special tools had to be developed to handle the name changes that occur when a user moves from one department to another or from student to staff. Despite these special tools, name changes proved to be a continual plague on the directory administrators, requiring special handling.

The hierarchy also meant that users who have appointments in multiple departments are listed in only one department. The same is true for users who are both staff and student, a fairly common occurrence. This can be confusing to users. Finally, the hierarchy also led to longer names, wasting space and network bandwidth during replication. In summary, the hierarchy caused several problems and brought no perceptible benefit.

Another bad decision in hindsight was the use of names to form RDNs. Users often did not like the use of the first-middle-initial-last form of their name, and maintaining the numeric appendages for names was an administrative burden. A better choice for RDNs would have been the userid attribute, whose uniqueness is already maintained and which would have been less apt to encounter user resistance.

Topology

The topology design of the Big State directory service was driven by the requirements of the applications. These applications need to search the people and group portions of the namespace, so those portions of the directory need to be kept together to make these searches efficient. Big State's network is relatively fast and well-connected, indicating no need to partition the directory for performance reasons. Although, as mentioned earlier, Big State had thoughts of delegating portions of the directory to other units on campus, there was no immediate need to do so. Therefore, Big State decided to keep the directory together in a single server, making the topology very simple.

Replication

Two requirements drove the Big State directory replication design. The first requirement was for the service to always be up and available; the online phone book and email services depending on the directory are mission-critical and must be as available as possible. The second requirement was a certain high level of performance. The directory had to have sufficient capacity to support the directory-enabled applications using it, including the online phone book and email applications driving the directory's deployment, as well as the additional applications that would be deployed later. A replication architecture that would support this kind of incremental capacity increase was an explicit goal.

In the Big State directory replication architecture, a single-master server handles updates and feeds directory replicas serving various directory-enabled applications (see Figure 24.4). Initial deployment plans called for two small replicas to serve the online phone book application and three large replicas to serve the three directory-enabled email machines providing the bigstate.edu mail service. Partitioning directory usage based on the type of application makes it easier to track directory usage, bring down parts of the service for maintenance without affecting the rest of the service, and increase capacity when needed.

Figure 24.4 The Big State directory replication architecture.

20-20 Hindsight: Replication

The Big State replication architecture was the right idea, but two aspects of the design needed attention soon after deployment. These problems were discovered as directory traffic increased because of the overwhelming popularity of the bigstate.edu email service.

First, the email service proved to be more of a directory hog and more popular than anticipated. This necessitated the deployment of two extra email machines with corresponding directory services to handle the load. Luckily, with the change described next, the replication architecture was able to handle this increase.

Second, the double burden of servicing all updates and feeding all replicas turned out to be too much for the single-master server to handle. The solution was to split this responsibility between the master server and a new server that now acts as replication supplier. The master server handles all updates and feeds the replication supplier, which feeds all the replicas.

When more replicas are needed than the replication supplier can feed, a second level of replication suppliers can be added. In this configuration (see Figure 24.5), the replication supplier feeds two or more second-level replication suppliers, which feed the replicas servicing directory-enabled applications. In this way, the number of directory replicas can be increased indefinitely.
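To see how quickly this cascading arrangement scales, consider the following Python sketch. It assumes, purely for illustration, that every replication supplier can feed the same fixed number of consumers; the text does not give the actual limit, and the value 8 used below is hypothetical.

```python
def replicas_supported(fan_out, supplier_tiers):
    """Replicas reachable through a given number of supplier tiers.

    With one tier, the replication supplier feeds replicas directly.
    Each extra tier multiplies the capacity by the fan-out.
    """
    return fan_out ** supplier_tiers


def tiers_needed(replica_count, fan_out):
    """Smallest number of supplier tiers that can feed replica_count replicas."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2 for cascading to help")
    tiers = 1
    while replicas_supported(fan_out, tiers) < replica_count:
        tiers += 1
    return tiers


# Hypothetical limit of 8 consumers per supplier.
print(tiers_needed(5, 8))   # 1
print(tiers_needed(9, 8))   # 2
print(tiers_needed(65, 8))  # 3
```

The trade-off, not modeled here, is that each added tier puts one more hop between an update on the master and its arrival at the replicas.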

Figure 24.5 The revised Big State replication architecture.

Privacy and Security

Privacy and security were paramount concerns for the Big State directory designers. In a university, the general computing environment is relatively open, and Big State has no firewall to protect services such as the directory from the Internet at large. This means the directory service is open to access as well as attack. Unfortunately, the university population includes a large number of students, some of whom are notorious for having too much time, cleverness, and mischievous intent on their hands. These factors combine to produce an environment rife with an impressive array of threats to directory security and privacy.

Because the Big State directory provides a white pages service, it contains personal information about directory users, and that information must be protected. The directory also serves various directory-enabled applications that are considered critical to the mission of the university. Making sure the applications have secure access to accurate directory data is a requirement.

Most of the attributes held in the directory need to have their integrity protected. This means that directory clients must be assured that the information they read from the directory is authentic. A few attributes also need their privacy protected, such as the universityID attribute, which often contains a user's United States Social Security number. This attribute should be accessible only to directory administrators and select directory content administrators such as the help desk. In addition to privacy requirements, all attributes need to be protected from unauthorized tampering. The Big State directory designers constructed an access control scheme that separates the directory attributes into categories with different security requirements, meeting all these requirements. ACLs were constructed to protect each category appropriately.

Another requirement was to support delegated administration. Many faculty and staff members do not have the time or expertise necessary to update their own information in the directory, and they wanted to delegate this task to a departmental administrator or secretary. The Big State directory designers constructed a proxy access control scheme to make this possible. This scheme worked by defining an ACL allowing any distinguished name listed in the special proxy attribute of an entry to have appropriate access to the entry. This way, users can control access to their own entries simply by adding an attribute value. There is no need to modify any directory ACLs.
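The logic behind the proxy scheme can be sketched in a few lines of Python. This models only the decision that such an ACL expresses; it is not server ACL syntax. The attribute name proxy, the sample DNs, and the simplified DN comparison (which ignores escaped commas and other subtleties of DN matching) are assumptions made for the example.

```python
def normalize_dn(dn):
    """Naive DN normalization: lowercase and trim space around commas.

    Real DN matching is more involved (escaped commas, multi-valued RDNs).
    """
    return ",".join(part.strip().lower() for part in dn.split(","))


def may_modify(bind_dn, entry_dn, entry_attrs):
    """Allow the entry's owner, or any DN listed in its proxy attribute."""
    requester = normalize_dn(bind_dn)
    if requester == normalize_dn(entry_dn):
        return True  # users may update their own entries
    proxies = entry_attrs.get("proxy", [])
    return any(requester == normalize_dn(proxy) for proxy in proxies)


# Hypothetical entry that delegates updates to a departmental secretary.
entry_dn = "cn=Barbara J Jensen 1, ou=Faculty and Staff, ou=People, o=Big State"
attrs = {"proxy": ["cn=Pat Q Smith, ou=Faculty and Staff, ou=People, o=Big State"]}

print(may_modify("cn=pat q smith,ou=faculty and staff,ou=people,o=big state",
                 entry_dn, attrs))  # True
print(may_modify("cn=Someone Else,ou=People,o=Big State", entry_dn, attrs))  # False
```

Because the delegation lives in the entry itself, granting or revoking a proxy is an ordinary attribute update rather than an ACL change.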

One important security issue was that Big State wanted to be able to leverage the existing campus Kerberos authentication service for the directory. By "kerberizing" the directory, the designers avoided designing a new authentication system and distributing and maintaining new passwords. Also, using Kerberos allowed the many thousands of Kerberos users on campus to begin their directory life with a password they already knew. This proved to be a great boon to directory use on campus. The only downside was that it required special development on both directory servers and clients. Big State found that even today no directory products support Kerberos out of the box, significantly adding to the cost and difficulty of maintaining and upgrading the existing service.

Privacy is an equally important concern in the directory. In a university environment, users are accustomed to having more control over their personal information than they might have at a big corporation. Big State is no exception, so the directory designers set out to design a system that provides maximum flexibility for directory users. This included the ability for users to opt out of the directory entirely or to hide or publish various attributes such as home address information. This capability was accomplished through the use of content-based ACLs, which are similar to the targetfilter capability of Netscape Directory Server described in Chapter 11, "Privacy and Security Design."

Understanding and Deploying LDAP Directory Services, 2002 New Riders Publishing


2002, O'Reilly & Associates, Inc.
