What a Directory Is

   

Most people are familiar with various kinds of directories, whether they realize it or not. Directories are part of our everyday lives: phone books (white and yellow pages), TV Guide, shopping catalogs, library card catalogs, and others. We refer to these directories as everyday directories, or sometimes offline directories.

With these examples as a guide, it's clear that directories help people find things by describing and organizing the items to be found. Information in such directories ranges from phone numbers to television shows, from consumer goods to reference material, and more.

Directories in the computer and networking world are similar in many ways, but with some important differences. We call these directories online directories. Online directories differ from offline directories in the following ways:

  • Online directories are dynamic.

  • Online directories are flexible.

  • Online directories can be made secure.

  • Online directories can be personalized.

These differences are explored in the following sections.

It's also important to understand the different kinds of directories. We divide directories into the following categories:

  • Application-specific directories. These come bundled with or embedded into an application. In many cases, you may not think of them as directories at all because their function is so tightly integrated into the application of which they are part. Examples include the IBM/Lotus Notes Name and Address book, the Microsoft Exchange directory, and the aliases file used by the sendmail message transfer agent (MTA).

  • Network operating system (NOS)-based directories. Directories such as Novell's eDirectory (originally named Netware Directory Services, or NDS), Microsoft's Active Directory, Sun Microsystems' Network Information Service (NIS), and Banyan's StreetTalk directory were designed to meet the needs of a NOS.

  • Purpose-specific directories. These are not tied to an application but are designed for a narrowly defined purpose and are not extensible. Examples are the Internet's Domain Name System (DNS) and centralized Internet phone directories such as the Switchboard directory (http://www.switchboard.com).

  • General-purpose, standards-based directories. These are developed to serve the needs of a wide variety of applications. Examples include the LDAP directories we focus on in this book and X.500-based directories.

This chapter makes reference to all four types of directories. Our focus, however, is squarely on the general-purpose type of directory.

Directories Are Dynamic

The everyday directories you are familiar with are relatively static; that is, they do not change often. For example, the phone book is reissued only once a year; you have to call a public directory assistance service such as 411 to get more up-to-date information. A new TV Guide is produced every week, but still your favorite show is preempted without notice more often than you'd like. The shopping catalogs you receive in the mail are updated only several times a year, at most; also, they do not contain such useful information as which items are in stock and in which colors and sizes. Why? Because that information changes so often that by the time the catalog got to you, it would be out-of-date.

By contrast, online directories can be kept much more up-to-date. This feature is not always used, of course. Directories are usually only as up-to-date as their administrators choose to keep them. Sometimes administrative procedures are put in place to update the directory automatically. An online directory tends to be most current when it is itself the ultimate authority for the information it holds: as soon as information changes, it can be updated in the directory and made available to users.

It's easy to see how this online update capability can be used to make directories more accurate, resulting in a more useful directory. This kind of improvement is incremental. But online updates have the potential to produce more revolutionary improvements too. These improvements open the door to brand-new directory applications that have no offline analogy.

For example, consider a directory that contains up-to-date information on who's employed at your organization. Such a directory could be consulted by an automated card reader to authorize access to buildings and rooms at your company. In this case, you could revoke access easily and instantly simply by making a change to the directory.

As another example, consider a directory containing location information that is updated as you move from office to office, from hotel room to hotel room, and to other locations. This directory could be consulted to route your phone calls, faxes, and messages to you wherever you are. Traditional paper directories could never be used for such a purpose. The very nature of this application requires frequent updates of the information.

The superior update capacity of online directories not only tends to keep information more up-to-date; it also can be used to distribute the update responsibility. The closer information is to its source, the more accurate and timely the information is likely to be, for at least three reasons:

  1. The source of the information is, by definition, the most accurate.

  2. Extra delay and opportunity for error between the source and the directory are eliminated if the source makes the update itself.

  3. Depending on the information and the application, the source is likely to be the party most motivated to maintain the information correctly.

To illustrate, consider the location directory example described previously. The source is the user (you), and the information is your current location. Who knows better than you where you are? Which path is more accurate for receiving updates: the path directly from you or the one from your administrative assistant (your typing skills notwithstanding)? What if the update came from a directory administrator typing in information reported by your assistant, who had it relayed from you? At each step, opportunity for error is introduced, accuracy decreases, and the cost increases as more people are involved. Finally, who is most motivated to have accurate information about you in the directory? Again, it is likely to be you, the source, because you do not get your phone calls, faxes, and mail unless the information is accurate. Of course, this example assumes that you are responsible enough to want the information to be accurate and that you have the tools and expertise to make it happen.

Directories Are Flexible

Another important difference between static, everyday directories and online directories is that online directories offer far greater flexibility in two areas:

  1. The types of information they can store

  2. The ways in which that information can be organized and searched

Flexible Content

Offline directories are static in terms of their content. By that we mean that offline directories contain a restricted and seldom extended set of information. For example, if you want to know something beyond the phone number, address, and name information provided by your phone book, you are probably out of luck. But there is a whole host of other useful information that you might like to have. Fax number, mobile phone number, pager number, e-mail address, even a picture or short biographical sketch, to name a few, are all items in the same category as the traditional phone book information. But these items are seldom, if ever, included.

By contrast, online directories can easily be extended with new types of information. The cost of additions like these is huge with printed directories but relatively small with online directories. A printed directory would need to be redesigned, reprinted, and redistributed, at enormous cost. The cost of printing the previous directory cannot be leveraged much at all.

Online directories, however, are typically designed to be extended without a redesign. There is no need for reprinting because changes are reflected automatically and immediately. Nor is there a need to redistribute the directory because clients access the directory online and do not keep their own copy. Some clients may cache or replicate portions of the data, but these copies can be updated automatically.

For simple economic and practical reasons, a printed directory is usually not extended in this way unless a large majority of the directory's users are clamoring for the information. First, as a producer of a printed directory, you could not afford to double or triple the size of your directory to include more information without a compelling reason; doing so could double or triple your cost in producing the directory. Second, from a practical standpoint the directory itself could become unwieldy and inconvenient for the very customers you are trying to serve.

An online directory, on the other hand, can be extended without such costs. Adding a new data item used by only a small proportion of your users suddenly becomes cost-effective. The cost is incremental to the cost of providing the basic service. It may involve only adding some more disk space to your system and marginally increasing backup time, management, and support costs. No inconvenience is experienced by users of your service, however, because they need not even see the additional information. Customers who want the new information can easily get it. An economic incentive exists as well: You could charge extra for these premium directory services.

Flexible Organization

The second way online directories provide more flexibility is in how they let you organize data. Let's continue with the phone book example. The phone book contains name, phone number, and address information, organized to facilitate searching by name. If you wanted to search by phone number or by address, you would find it difficult, to say the least.

Other specialized directories that are organized to facilitate these kinds of searches may exist, but there is no guarantee of consistency with differently organized directories. Your phone book organized by name might be more or less up-to-date than a special phone book organized by phone number. Such directories contain duplicate information, often leading to inconsistencies and out-of-date information. Also, such directories are usually not readily available, and they are usually expensive. The types of data organization that can be supported are limited. They are also limited by the nature of the medium on which the directories are distributed (for example, paper) and by the capabilities of their end users (people without specialized training, perhaps).

By contrast, online directories can support several kinds of data organization simultaneously. The online analogy to your printed phone book can easily let you search by name, phone number, address, or other information. Furthermore, online directories can provide more advanced types of searches that would be difficult or impossible to provide in printed form.

For example, if you are not sure of the spelling of a name, an online directory can let you search for names that sound like the one you provide. It can also provide searches based on common misspellings, substrings of names, and other variations. These different kinds of searches can be performed simultaneously or in a defined order (for example, an exact search first, then a sounds-like search, then a substring search, and so on) until a match is found. This kind of power in searching is key to providing users with the kind of "do what I mean" behavior they often desire.
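The ordered search just described (exact first, then sounds-like, then substring) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample surnames, and the choice of the classic Soundex code as the sounds-like test are ours, not features of any particular directory product.

```python
def soundex(name):
    """Classic four-character Soundex code, used here as the sounds-like key."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters_only = [ch for ch in name.upper() if ch.isalpha()]
    if not letters_only:
        return ""
    result = letters_only[0]
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (result + "000")[:4]


def find_people(names, query):
    """Try each kind of search in order and stop at the first one that matches."""
    q = query.strip().lower()
    strategies = [
        ("exact", lambda n: n.lower() == q),
        ("sounds-like", lambda n: soundex(n) == soundex(query)),
        ("substring", lambda n: q in n.lower()),
    ]
    for label, matches in strategies:
        hits = [n for n in names if matches(n)]
        if hits:
            return label, hits
    return "none", []


# Hypothetical sample data: surnames only, to keep the sounds-like test simple.
NAMES = ["Smith", "Smythe", "Jensen", "Johansen"]
print(find_people(NAMES, "Smyth"))
print(find_people(NAMES, "hans"))
```

A misspelled query such as "Smyth" falls through the exact search and is caught by the sounds-like search, while a fragment such as "hans" is caught only by the final substring search.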

Directories Can Be Secure

Offline directories offer little, if any, security. The phone book, for example, is public. Your company's printed internal phone book may have "do not distribute outside the company" stamped on it in big red letters, but this kind of security is advisory at best. Such a lack of security reduces the number of applications that can be served by an offline directory. It also forces users to make difficult choices, if any choice is available to them at all. Most people are familiar with unlisted phone numbers, a service most phone companies offer for a premium fee. Opting out of the directory makes your number less available to telemarketers and other annoying callers, but it also makes your number unavailable to people you probably want to have it.

The root of this problem is the lack of any security in an offline directory. Either its information is accessible to anybody with access to the directory, or information can be left out of the directory and accessible to nobody. This limitation is a natural consequence of the methods used to distribute and access offline directories. Distribution is often wide, and everybody gets a personal copy. The access method consists of flipping through pages or calling a public directory assistance service, such as 411. None of these methods provide any way of determining who is accessing the directory and, therefore, what information they should be able to access.

Online directories can solve these problems. Online directories centralize information, allowing access to that information to be controlled. Clients accessing the directory can be identified through a process called authentication. Simply put, authentication is the process by which a directory client establishes an identity with a directory service, typically by providing some credentials, such as a password, that prove the client is who it says it is. In conjunction with access control lists (ACLs) and other information, such as time of day or the IP address of the client, the directory can use the identity established during authentication to make authorization decisions about which clients have access to what information in the directory.

Returning to our phone book analogy, consider how security features such as ACLs would change the situation. You could be listed in the directory, but your information would be accessible to only a subset of directory clients. You might be able to specify this subset as a list of friends. You might be able to specify according to certain criteria, such as "anyone who lives on my block." You could allow your information to be available to everyone except a list of people you specify. The possibilities go on, and the results are powerful.

It's important to realize that even this level of powerful and flexible security is not a panacea. For example, ACLs can be effectively, if somewhat awkwardly, defeated by a trusted employee who copies confidential information off his or her screen and distributes it outside the company. Still, online directories have security capabilities that are far more advanced than those of offline directories.
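The two steps described in this section, authentication (establishing who the client is) and authorization (consulting an ACL to decide what that client may read), can be sketched as follows. This is a minimal illustration, not how any real directory server stores credentials or evaluates ACLs; the class name TinyDirectory, its methods, and the sample users are all hypothetical.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a password hash with PBKDF2 so plain-text passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class TinyDirectory:
    """Hypothetical in-memory directory with per-entry read ACLs."""

    def __init__(self):
        self._credentials = {}  # user -> (salt, password hash)
        self._entries = {}      # owner -> {attribute: value}
        self._acl = {}          # owner -> set of users allowed to read the entry

    def add_user(self, user, password, **attributes):
        salt = os.urandom(16)
        self._credentials[user] = (salt, hash_password(password, salt))
        self._entries[user] = dict(attributes)
        self._acl[user] = {user}  # by default only the owner may read

    def authenticate(self, user, password):
        """Return the established identity, or None if the credentials fail."""
        record = self._credentials.get(user)
        if record is None:
            return None
        salt, expected = record
        if hmac.compare_digest(hash_password(password, salt), expected):
            return user
        return None

    def grant_read(self, owner, reader):
        self._acl[owner].add(reader)

    def read(self, identity, owner):
        """Authorization: release the entry only if the ACL names the caller."""
        if identity is None or identity not in self._acl.get(owner, set()):
            raise PermissionError(f"{identity!r} may not read {owner!r}")
        return dict(self._entries[owner])


directory = TinyDirectory()
directory.add_user("alice", "s3cret", phone="555-0100")
directory.add_user("bob", "hunter2", phone="555-0199")
directory.grant_read("alice", "bob")  # Alice lists Bob as a friend

bob = directory.authenticate("bob", "hunter2")
print(directory.read(bob, "alice"))   # allowed: Bob is on Alice's list
```

Alice is listed, yet only she and the friends she names can read her entry. Because Bob has granted nothing to Alice, an attempt by Alice to read Bob's entry raises PermissionError.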

Directories Can Be Personalized

Another difference between printed directories and online directories is the degree to which each can be personalized. There are two aspects to this personalization:

  1. Personalized delivery of service to users of the directory

  2. Personalized treatment of information contained in the directory

TV Guide and the phone book are personalized on a regional basis. But everyone accesses the same card catalog at the library, and everyone probably gets the same L.L.Bean catalog. Furthermore, everyone within the same region gets the same phone book or TV Guide. It would be nice to get catalogs tailored to your specific interests, a phone book organized to do searches in the way you prefer, or a card catalog that remembers the kinds of books you like. This is the first aspect of personalization: the capability to deliver information tailored to your needs as an information consumer.

The second aspect of personalization concerns your ability to determine who has access to information about you and other things. This is your ability to tailor the directory to your needs as an information provider. In offline directories, as we saw previously, you have only two broad choices about the accessibility of directory information about yourself: You can either be included in the directory or not, with no in-between. Furthermore, many directories do not even provide this choice. Trying to get yourself unlisted can be frustrating and time-consuming.

Online directories offer both of these features. The mechanism for doing so is rooted in the directory security capabilities described previously. By identifying users who access the directory and storing profile information about them, an online directory can easily provide personalized views of the directory to different users. For example, an online product catalog can show you the types of products that are most likely to interest you. This personalized service could be based on interests that you have explicitly declared. It could also be based on your previous interactions with the service.

From a user's perspective, personalization of this kind is great because it provides a more desirable service. The user does not need to wade through information that is of less interest just to get to the desired information. From a service provider's perspective, personalization of this kind is great because it provides a more desirable service to the provider's users. It also allows the service provider to better target all kinds of special services. For example, the service provider can provide information about promotions and sales, new product offerings, and advertisements, all tailored to a user's preferences. Of course, some users will voice privacy concerns related to what information is collected about them and how it may be used, so savvy providers always provide flexible privacy controls as part of their directory service.

Directory Described

So far we've been relying primarily on a commonsense understanding of the word directory in our discussion. We've used familiar, everyday printed directories to explain what online directories are and how they differ from offline directories. Now it's time to glean from our previous discussion the defining characteristics of online directories. The definition we will give is not formal or mathematical. Instead, we will expound on a list of characteristics that online directories share.

Design Center Defined

The term design center refers to the defining set of assumptions, constraints, or criteria driving the design or implementation of a system. When designing or implementing a system, you have to make a series of decisions about what's important, what's not, what the system must do well, and what it can afford to do less well. A system's design center is an expression of the focus the designer or implementer had when making these decisions. Design center is a concept that applies to software and other systems and products as well.

For example, suppose you were going to design and implement a vehicle for yourself. Aside from needing a few common characteristics that essentially boil down to a wheeled, motorized conveyance, you have a lot of flexibility. A designer who has a large family might design a station wagon or van. His design center might be focused on large passenger capacity. Another designer with a lot of stuff to haul around might design a truck. Her design center might be focused on cargo capacity. A designer who is a driving enthusiast might focus on performance and handling.

Software and service design centers work in similar ways, and often the designer considers a whole series of questions to determine the appropriate design center. For example: Does the software system or service need to serve a large community or a small one? Is the community technically sophisticated or inexperienced? Is performance a critical feature of the system? Is security? The answers to these questions and others drive the focus of the design and implementation efforts and ultimately determine the character of the system.

A directory can be thought of as a specialized database. It is interesting to compare databases and directories because the differences have more to do with environment and design center than with anything fundamental. The comparison is also interesting because most people generally have a better understanding of what a database is and does than of what a directory is and does. The differences between a general-purpose database and a general-purpose directory fall into the following broad categories:

  • Read-to-write ratio. Directories typically have a higher read-to-write ratio than databases.

  • Extensibility. Directories are typically more easily extended than databases.

  • Distribution scale. Directory data is usually more widely distributed than data held in databases.

  • Replication scale. Directories are often replicated on a larger scale than databases.

  • Performance. Directories have different performance characteristics than databases.

  • Standards. Support for standards is more important in directories than in databases.

  • Transactions and join. Directories usually do not support transactions or relational operations such as join.

Each of these points is explained in the remainder of this section.

Read-to-Write Ratio

One defining characteristic of a directory is that it is typically read or searched far more often than it is written or updated. This is often not true for a database. A database might be used to record auditing data that is read only under exceptional, or at least infrequent, conditions. For example, such data might be written thousands of times each day (one record for each database transaction) but read only once a month to produce a summary report, or once a year when an internal audit is conducted.

Information in a directory, on the other hand, is usually read many more times than it is written. In fact, it is not unusual for a piece of directory information to be read 1,000 to 10,000 times more often than it is written. If you think about the types of information usually stored in a directory, this makes sense. Information about people, for example, changes relatively infrequently, especially compared to the number of times the information needs to be accessed. How often do you change phone numbers compared with the number of times somebody calls you? How often do you change addresses compared with the number of times you receive mail?

Data with this "often read, seldom written" characteristic is not restricted to information about people. Catalog data, most location information, configuration information, network routing information, reference information, and many other types of information are all read far more often than they are written. The domain of applications that can be served by a directory is large. For some applications, the information is never updated online; instead, it is updated only periodically via a batch process initiated by an administrator.

Why is this characteristic important? It sets a design center for directory implementations. Implementers can make important, simplifying design decisions based on this characteristic. Directory implementations can be highly optimized for the types of operations that will be performed most often. If directory designers know that read and search operations are performed 10,000 times more often than update operations, they can spend more effort to make those operations perform quickly. In contrast, databases are often designed to support write and read operations equally well. The fact that directories are usually optimized for read-intensive applications has implications for other directory features, such as replication, which we will discuss later in this section.

Information Extensibility

Another important, defining characteristic of a directory is that it supports information extensibility. The term directory schema refers to the types of information that can be stored, the rules that information must obey, and the way that information behaves. Schema design for LDAP directories is discussed in detail in Chapter 8, Schema Design.

Directories are not limited to a fixed schema. The schema can be extended in response to new needs and new applications. A directory usually comes with a useful set of predefined types of information that can be stored, but many installations have special requirements that dictate the extension of this predefined set. Your organization may have special fields (attributes, in directory parlance) that you want to store, including, for example, employee status for people or the building location code for a printer. Most directories allow new attributes such as these to be added to existing directory objects without affecting the information that is already present. Although databases are used to store many kinds of information organized in all kinds of ways, they are usually constrained in the types of information that can be stored, and some databases make it difficult to add fields to existing records. It is rare to find a database that allows you to introduce a new, primitive data type with new semantics. Some directories, however, do support adding new primitive data types.
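To make the idea of extending a schema concrete, here is a small Python sketch. It assumes a toy model in which the schema is simply a table of attribute names and validation rules. The Schema class, the validator functions, and the employeeStatus attribute name are hypothetical illustrations, not the schema language of LDAP or of any directory product.

```python
class Schema:
    """A toy schema: attribute name -> function that checks a value."""

    def __init__(self, attributes):
        self.attributes = dict(attributes)

    def add_attribute(self, name, validator):
        """Extend the schema; entries already stored are left untouched."""
        if name in self.attributes:
            raise ValueError(f"attribute {name!r} is already defined")
        self.attributes[name] = validator

    def validate(self, entry):
        """Return True if every attribute is known and well formed, else raise."""
        for name, value in entry.items():
            if name not in self.attributes:
                raise KeyError(f"unknown attribute {name!r}")
            if not self.attributes[name](value):
                raise ValueError(f"bad value for {name!r}: {value!r}")
        return True


def is_text(value):
    return isinstance(value, str) and value != ""


def is_status(value):
    return value in {"active", "on-leave", "terminated"}


# The predefined schema knows only a name and a phone number.
schema = Schema({"cn": is_text, "telephoneNumber": is_text})
person = {"cn": "Pat Lee", "telephoneNumber": "555-0142"}
schema.validate(person)

# A new need arises: record employee status. The existing entry stays valid.
schema.add_attribute("employeeStatus", is_status)
schema.validate(person)

# The new attribute can now be added to the existing object.
person["employeeStatus"] = "active"
schema.validate(person)
```

The point of the sketch is the middle step: adding the attribute to the schema requires no change to entries that do not use it.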

Data Distribution

Distribution of data is another area in which directories differ from databases. Data distribution refers to the placement of information in servers throughout your network. Data can be centralized in a single server, as shown in Figure 1.2, or it can be distributed among several servers, as shown in Figure 1.3.

Figure 1.2. Centralized Directory Data Held in a Single Server

Figure 1.3. Distributed Directory Data Held in Three Servers

Although you can find databases that allow limited distribution of data, the scale of the distribution is different. A typical relational database management system allows you to store one table over here and another table over there. This distribution is usually limited to a few sites. The ability to make queries that involve both sites exists, but performance is often a problem, which causes the distribution features to be used only rarely.

Data distribution is a fundamental factor in the design of directories. Part of the directory's purpose is to allow data to be distributed across different parts of a network. This capability is aimed at addressing environments where authority and administration must be distributed. An example of an organization needing this kind of distribution is one with offices in several countries around the world. Each office wants to have authority over its own directory, but the organization wants to present a single, logical directory to the outside world.

Another example in which data distribution is important is in support of large-scale directories. As your directory data grows, at some point the tactic of buying a bigger server with more disk, memory, and CPU horsepower produces diminishing returns.

A better approach may be to construct your directory from a set of smaller machines that work together to provide the overall service. This solution is cheaper in many cases. It has the advantage of harnessing the parallel processing power of all the machines holding the directory, which can improve both read and write performance. It also has the advantage that failure of one machine does not bring down the entire service. In addition, it has certain attractive practical implications for the performance of some system administration functions, such as performing backups, recovering from disasters, and so on. Consider a directory distributed across ten small machines: Backing up or recovering one of the small machines is easier than backing up or recovering a single large machine.

Distribution of data allows information to be stored near the applications and people that need to use it. For example, consider three applications that need to use directory information: an employee lookup tool (online phone book), a private branch exchange (PBX) that stores the configuration of the phone system in a directory, and a network operating system such as Microsoft Windows NT that stores user profile information in a directory. Through distribution, the data specific to each of these three applications can be stored in a directory server close to the application, thereby improving efficiency and avoiding unnecessary duplication of data that is private to each application.
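One way to picture a distributed directory that still looks like a single logical directory is a table mapping each portion of the name space to the server that holds it. The sketch below assumes LDAP-style hierarchical names and picks the server whose portion is the longest matching suffix of the requested name. The host names and the partition layout are invented for illustration, and the simple comma split does not handle escaped commas inside names.

```python
# Hypothetical layout: each office holds its own branch; headquarters holds the rest.
PARTITIONS = {
    "dc=example,dc=com": "hq.example.com",
    "ou=europe,dc=example,dc=com": "eu.example.com",
    "ou=asia,dc=example,dc=com": "asia.example.com",
}


def normalize(dn):
    """Lowercase the name and drop spaces around commas (naive; no escaping)."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def server_for(dn):
    """Return the server holding the most specific partition that contains dn."""
    target = normalize(dn)
    best = None
    for suffix in PARTITIONS:
        if target == suffix or target.endswith("," + suffix):
            if best is None or len(suffix) > len(best):
                best = suffix
    if best is None:
        raise LookupError(f"no server holds {dn!r}")
    return PARTITIONS[best]


print(server_for("uid=anna, ou=Europe, dc=example, dc=com"))
print(server_for("uid=sam,ou=people,dc=example,dc=com"))
```

Each office administers its own branch, while a client that knows only the table (or is referred onward by a server that does) still sees one name space.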

Data Replication

Closely related to data distribution is the topic of replication. Replication is the process of maintaining multiple copies of directory data at different locations. There are several reasons to do this:

  • Reliability. If one copy of the directory is down because of a hardware or software failure, other copies can still be accessed.

  • Availability. Clients are more likely to find an available replica, even if part of the network has failed.

  • Locality. Latency and variation in performance are reduced if clients are located closer to a directory replica.

  • Performance. More queries can be handled as additional replicas are added, thereby improving the overall throughput of the directory service.

More detailed information on replication can be found in Chapter 11, Replication Design.

Databases sometimes support replication, but typically they do so for only a small number of replicas, whereas directories typically support dozens of replicas. Historically, performance has been a big problem with database replication implementations, partly because database replication is almost always strongly consistent; that is, all copies of the data must be in sync at all times. Typically, a distributed cross-network locking mechanism must be employed, and a two-phase commit algorithm must be used to achieve strong consistency for database updates.

Directory replication, on the other hand, is almost always loosely consistent. This means that temporary inconsistencies in the data contained in different replicas are acceptable. This characteristic has important implications for the number of replicas that directories can support and the physical distribution of those replicas across the network.
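Loose consistency can be illustrated with a small simulation. It assumes one simple scheme, chosen only for illustration: the server that accepts writes keeps an ordered change log, and each replica pulls and applies the records it has not yet seen whenever it is able to connect. The class names Supplier and Replica and the sample data are hypothetical, and real replication protocols are considerably more involved.

```python
class Supplier:
    """The server that accepts updates and records them in a change log."""

    def __init__(self):
        self.data = {}
        self.changelog = []  # ordered list of (key, value) updates

    def write(self, key, value):
        self.data[key] = value
        self.changelog.append((key, value))


class Replica:
    """A read-only copy that catches up by replaying the change log."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of change-log records applied so far

    def pull(self, changelog):
        for key, value in changelog[self.applied:]:
            self.data[key] = value
        self.applied = len(changelog)


supplier = Supplier()
replica = Replica()

supplier.write("pat.phone", "555-0142")
replica.pull(supplier.changelog)

supplier.write("pat.phone", "555-0199")  # the replica is now briefly out of date
stale = replica.data["pat.phone"]        # still the old number

replica.pull(supplier.changelog)         # the replica catches up
fresh = replica.data["pat.phone"]
print(stale, fresh)
```

Between the second write and the second pull, a client reading from the replica sees the old phone number. No locking or two-phase commit is needed, which is why a replica can sit behind a slow or intermittent link and still converge once it reconnects.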

As you will learn later in this section, performance is an important directory characteristic. One good way of helping to ensure great performance is to make sure that each user of the directory has a copy of it close by. There are two reasons to do this:

  1. Moving directory data close to the clients accessing it cuts down on the network latency of directory requests, which typically increases overall throughput and improves the consistency of performance for each directory operation.

  2. The total number of directory queries processed by the system as a whole can be increased by the addition of replicas. If one directory server can handle 1 million queries per day, adding another server could increase the capacity of the system to 2 million queries per day. The technique of adding more replicas to handle more load is often referred to as horizontal scalability.
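The capacity arithmetic in the second point can be written out directly. The sketch below assumes ideal, linear scaling of read capacity and a simple round-robin spread of queries across replicas. Both are simplifications made for illustration: real deployments lose some capacity to applying replicated updates, and clients are usually steered to a nearby replica rather than rotated evenly. The replica names are hypothetical.

```python
from collections import Counter
from itertools import cycle

QUERIES_PER_DAY_PER_SERVER = 1_000_000  # the per-server figure used in the text


def ideal_capacity(replica_count):
    """Upper bound on daily read capacity if scaling were perfectly linear."""
    return replica_count * QUERIES_PER_DAY_PER_SERVER


def spread(query_count, replicas):
    """Hand queries to replicas in round-robin order and count each one's share."""
    counts = Counter()
    for _, server in zip(range(query_count), cycle(replicas)):
        counts[server] += 1
    return counts


print(ideal_capacity(2))
print(spread(10, ["replica-a", "replica-b", "replica-c"]))
```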

Availability of the directory is also a key factor. Directories tend to be used by many different applications for such fundamental purposes as authentication, access control, and configuration management. The directory must always be available to these applications if they are to function at all.

Note that availability is not the same thing as reliability. A reliable directory may have redundant hardware and an uninterruptible power source. Such a directory may almost never go down, but that does not mean that it is always available to the clients that need to access it. For example, network segments that connect clients and servers might go down. From the client's perspective, this causes the same problem as the directory hardware or software going down.

You could try to solve this problem by building into your network the same kind of hardware reliability that is available for servers. Redundancy, uninterruptible power, and other techniques are all valuable, although not always practical. The other approach is to replicate your directory data to bring the data closer to the clients needing access to it. This approach helps mitigate network problems that might otherwise prevent clients from accessing the directory. Figure 1.4 shows a sample unreplicated scenario, and Figure 1.5 shows a sample replicated scenario.

Figure 1.4. An Unreplicated Directory Service with Data Held by Only One Server

Figure 1.5. A Replicated Directory Service with Data Held by Three Servers
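
From the client's side, the benefit of the replicated scenario can be sketched as a simple failover loop. The following Python is a minimal illustration, not a real directory client: the replicas are simulated by plain functions, and every name in it (ReplicaUnavailable, lookup_with_failover, the sample entry) is hypothetical.

```python
from typing import Callable, Optional, Sequence


class ReplicaUnavailable(Exception):
    """Raised by a lookup function when its replica cannot be reached."""


# A lookup function takes an entry name and returns its data (hypothetical shape).
Lookup = Callable[[str], dict]


def lookup_with_failover(name: str, replicas: Sequence[Lookup]) -> dict:
    """Try each replica in order (nearest first) until one answers."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        try:
            return replica(name)
        except ReplicaUnavailable as exc:
            last_error = exc  # remember the failure and try the next replica
    raise ReplicaUnavailable(f"no replica could answer lookup for {name!r}") from last_error


def local_replica(name: str) -> dict:
    # Simulates a broken network segment between the client and the nearby replica.
    raise ReplicaUnavailable("local subnet replica unreachable")


def remote_replica(name: str) -> dict:
    # Simulates a more distant replica that is still reachable.
    return {"cn": name, "served_by": "remote"}


result = lookup_with_failover("Babs Jensen", [local_replica, remote_replica])
print(result["served_by"])  # remote
```

With only one server, as in the unreplicated scenario, the same network failure would leave the client with no answer at all.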

These performance and availability requirements have several implications for directory replication. Directories are replicated on a far greater scale than databases. It is not unusual for a directory replica to be maintained on each subnet in a network to minimize latency and increase availability. In some cases, a replica might be maintained on each machine, potentially leading to hundreds or thousands of replicas. These replicas may be many network hops away from the central directory. They may even be connected over links that are up only intermittently. These kinds of replication requirements set directories apart from databases.

Performance

As mentioned previously, high performance is another characteristic that differentiates directories from databases. Database performance is typically measured in terms of the number of transactions that can be handled per second. This is also an important measure of directory performance, but the requirements on a directory service are different from those on most database systems.

A typical large database system might handle hundreds of transactions every second. The aggregate directory performance required by a typical large directory system may be thousands or tens of thousands of queries per second. These queries are usually simpler than the complex transactions handled by databases. As described earlier, the read-to-write ratio is typically much higher on a directory than on a database. Therefore, update performance is not as critical for directories as for databases. As you will learn later in this section, though, it is important nonetheless.

Some of the directory's increased query performance requirements are caused by the wide variety of applications that use the directory. Whereas a database may be designed and deployed with a single or a small set of driving applications in mind, directories are often deployed as an infrastructure component that will be used by an unknown but continually increasing number of applications developed across your company, and even across the Internet at large. Access to the directory is distributed, as is the development of the applications causing this access. This means that you, as the directory administrator, often do not have control over the kinds of queries your directory must answer. Therefore, it is important that your directory be flexible and capable of good performance regardless of the types of queries it must respond to.

Also driving directory performance requirements are the types of applications that typically access the directory. Applications access the directory for many different purposes. If your directory is used by your e-mail software to route e-mail, for example, one or more directory lookups are required for each piece of mail. Depending on the volume of mail that your site processes, this can be a significant load on the directory.

Many more applications require high performance. If your directory is used by Web application software as an authentication database, it is accessed each time a user launches a new application. If your directory is used by these applications to store user preference and other information needed to provide location independence, even more directory accesses are called for. If your directory is used to store configuration and access control information for your Web, mail, and other servers, the directory must potentially be accessed each time those services are accessed by clients. With a large user population, this quickly adds up to a lot of traffic. In these environments, using directory locality to minimize network latency is critical to providing adequate performance.

As you can see, directories are at the center of a lot of things that quickly increase performance requirements. Of course, client-side caching can and should be used to minimize the number of times the directory itself is accessed, but even these techniques can only slow the flow of directory queries. High performance is still one of the most important characteristics of a directory.
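
Client-side caching of the kind mentioned above can be sketched as a small time-to-live (TTL) cache wrapped around a lookup function. This is a minimal illustration under stated assumptions: the class name, the 300-second default TTL, and the fetch function are all hypothetical, and a real client would also need to consider cache size limits and stale data.

```python
import time
from typing import Callable, Dict, Tuple


class CachingDirectoryClient:
    """Wraps a directory lookup function with a simple TTL cache."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, dict]] = {}
        self.server_queries = 0  # how many lookups actually reached the server

    def lookup(self, name: str) -> dict:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: answer without contacting the server
        self.server_queries += 1
        entry = self._fetch(name)
        self._cache[name] = (now, entry)
        return entry


def fetch_from_server(name: str) -> dict:
    # Stand-in for a real directory query.
    return {"cn": name}


client = CachingDirectoryClient(fetch_from_server)
for _ in range(3):
    client.lookup("Babs Jensen")
print(client.server_queries)  # 1
```

The cache reduces the number of queries that reach the server, but each entry must still be fetched again once it expires, which is why caching slows the flow of queries rather than eliminating it.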

Earlier we stated that the read-to-write ratio for directories is high. The natural conclusion you could draw from this is that write performance is not nearly as important as read and search performance. Although this is true in a way, the scale of data handled by some directories makes write performance an important factor as well. And, as we described earlier, the capacity for online updating is one of the key enablers of some exciting new online directory applications. Clearly, the ability to update is important, and it must function at a certain level of performance.

For example, consider a directory with 1 million entries. This may seem like a lot, but it is not unreasonable for an e-commerce site or for a large corporation (after you're finished adding entries for all users, groups, network devices, external partners, customers, and other things). If each entry changes only once each month on average, that is 1 million updates per month, 250,000 updates per week, almost 36,000 updates per day, or about 1,500 updates per hour. That's quite a few updates! And the peak number of updates that must be handled within a given hour is much higher because user-initiated changes are usually made during business hours. Administrator-initiated changes may need to be saved up and applied in a batch during limited off-peak hours, further affecting performance requirements.
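
The update-rate arithmetic above can be reproduced with a short calculation. The sketch below treats a month as four weeks, as the figures in the text do. The peak estimate at the end rests on an additional assumption that is not in the text: that all updates arrive during 8 business hours on 5 days per week.

```python
def update_rates(entries: int, changes_per_entry_per_month: float = 1.0) -> dict:
    """Average update rates, treating a month as four 7-day weeks."""
    per_month = entries * changes_per_entry_per_month
    per_week = per_month / 4
    per_day = per_week / 7
    per_hour = per_day / 24
    return {"month": per_month, "week": per_week, "day": per_day, "hour": per_hour}


rates = update_rates(1_000_000)
print({period: round(rate) for period, rate in rates.items()})
# {'month': 1000000, 'week': 250000, 'day': 35714, 'hour': 1488}

# Assumed peak scenario: every update lands within business hours
# (8 hours a day, 5 days a week, 4 weeks a month).
BUSINESS_HOURS_PER_MONTH = 4 * 5 * 8
peak_per_hour = rates["month"] / BUSINESS_HOURS_PER_MONTH
print(round(peak_per_hour))  # 6250
```

Under that assumption the hourly load during business hours is roughly four times the round-the-clock average.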

Standards and Interoperability

Another important factor that sets general-purpose directories apart from general-purpose databases is standards. The database world has various pseudostandards, from the relational model itself to Structured Query Language (SQL). These pseudostandards make it easier to migrate from one database system to another. They also make it so that when you've learned the concepts behind one vendor's system, you can easily apply that knowledge to come up to speed on another's quickly. However, these standards do not provide real interoperability. In the directory world, because applications from any vendor must be able to use the directory, real interoperability standards are critical.

This is where LDAP, the Lightweight Directory Access Protocol, comes in. LDAP provides the standard models and protocols used in today's modern directories. LDAP makes it possible for a client developed by Microsoft to work with a server developed by Netscape, and vice versa. LDAP also makes it possible for you to develop applications that can be used with any directory. In the database world, an Oracle application cannot be used with an Informix database, and an Informix application cannot be used with a Sybase database. This kind of interoperability, which is lacking in databases, is important to directories for two reasons:

  1. It allows the decoupling of directory clients from directory servers.

  2. It allows the decoupling of the application development process from a decision about a particular directory vendor.

Before LDAP came along, each application that needed a directory usually came with its own directory built right in. This may seem like a convenient solution at first glance, but the unpleasantness of the situation becomes clear after you've installed your twenty-fourth application, and therefore your twenty-fourth directory. Each user in your organization who requires access to these applications needs an entry in each directory: a lot of duplicate information to maintain. All this duplication is a primary source of headaches for system administrators and increased costs for information technology (IT) organizations, as Figure 1.6 illustrates.

Figure 1.6. Application-Specific Directories Cause Duplicate Information and System Administration Headaches

With a standard such as LDAP in place, application developers everywhere can write applications using the standard directory tools of their choice. These applications will run with any LDAP-compliant directory, which essentially turns the directory into a piece of network infrastructure. The result is a dramatic increase in the number of applications that can and will be written to take advantage of the directory. In addition, you are freed from having to rely on a single vendor for your directory solution. The same advantages are what drove the success of other Internet protocols, such as HTTP (Hypertext Transfer Protocol) for the Web, IMAP (Internet Message Access Protocol) for accessing e-mail, and even TCP/IP (Transmission Control Protocol/Internet Protocol) itself. Figure 1.7 illustrates a standards-based directory infrastructure.

Figure 1.7. A Standards-Based, General-Purpose Application Directory Eliminates Information Duplication

Transactions and Join

Directories support a relatively simple transactional model. Directory transactions involve only a single operation and a single directory entry. Databases, on the other hand, are typically designed to handle large and diverse transactions that span multiple records and encompass a series of operations. Databases support a feature called rollback through which a set of operations that is part of an incomplete transaction can be completely undone, or "rolled back," to restore the original state of the data. Rollback is useful when an error occurs late in the series of operations that make up a transaction. For some kinds of applications, the comprehensive transaction support offered by databases is important. Note that some directory software supports database-style transactions through proprietary extensions to the base LDAP standard.
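
To make the idea of rollback concrete, here is a minimal sketch using Python's built-in sqlite3 module. The account table, the names, and the balances are hypothetical and chosen only for illustration. The transaction consists of two updates; the second violates a constraint, so the first is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(db: sqlite3.Connection, source: str, target: str, amount: int) -> None:
    """Move funds between two rows as one transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        db.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, target)
        )
        # This second step fails when the source balance would go negative.
        db.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, source)
        )


try:
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    pass  # the credit to bob has already been rolled back

rows = conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall()
print(rows)  # [('alice', 100), ('bob', 50)]
```

A standard directory operation, by contrast, touches a single entry, so there is no multi-step series of changes to undo.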

In addition, most databases in use today are relational databases that support data joins. A join is a type of query that brings together related data from multiple sources (data tables) into one result set by leveraging a common key. Most directories do not support joins at all. As an example where a join operation is useful, suppose that you want to search for all printer objects managed by people who work within the Product Development department. Suppose further that each person object has an employee ID field and each printer object has an owner field that also holds an employee ID, and that each person object has a department field. Then the desired result set can easily be produced by use of a join operation to match the printer objects to the Product Development department via the people entries. In SQL, such a query might look like this:

    SELECT *
    FROM PrinterTable A, EmployeeTable B
    WHERE A.OwnerID = B.EmployeeID
      AND B.Department = 'Product Development';

To produce the same result set using a directory would require several queries. One approach would be to find all people who work within Product Development (one query) and then, for each person found, search for printers that have an owner ID that matches the person's employee ID (many queries). As you can see, the RDBMS (relational database management system) join operation is powerful. Not all applications need it, though.
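
The multiple-query approach just described can be sketched with a small in-memory stand-in for a directory. Nothing here is a real LDAP API: FakeDirectory, its search method, and the attribute names (kind, employeeID, owner, department) are hypothetical and exist only to count how many queries the client has to issue.

```python
from typing import Dict, List

ENTRIES: List[Dict[str, str]] = [
    {"kind": "person", "employeeID": "1001", "cn": "Ana", "department": "Product Development"},
    {"kind": "person", "employeeID": "1002", "cn": "Ben", "department": "Sales"},
    {"kind": "person", "employeeID": "1003", "cn": "Cy", "department": "Product Development"},
    {"kind": "printer", "cn": "laser-1", "owner": "1001"},
    {"kind": "printer", "cn": "laser-2", "owner": "1002"},
    {"kind": "printer", "cn": "inkjet-3", "owner": "1003"},
]


class FakeDirectory:
    """In-memory stand-in that answers equality searches and counts them."""

    def __init__(self, entries: List[Dict[str, str]]) -> None:
        self.entries = entries
        self.query_count = 0

    def search(self, **criteria: str) -> List[Dict[str, str]]:
        self.query_count += 1
        return [
            entry
            for entry in self.entries
            if all(entry.get(attr) == value for attr, value in criteria.items())
        ]


def printers_managed_by_department(directory: FakeDirectory, department: str) -> List[Dict[str, str]]:
    # One query to find the people in the department...
    people = directory.search(kind="person", department=department)
    printers: List[Dict[str, str]] = []
    # ...then one more query per person to find that person's printers.
    for person in people:
        printers.extend(directory.search(kind="printer", owner=person["employeeID"]))
    return printers


directory = FakeDirectory(ENTRIES)
found = printers_managed_by_department(directory, "Product Development")
print([printer["cn"] for printer in found])  # ['laser-1', 'inkjet-3']
print(directory.query_count)  # 3
```

The query count grows with the number of people in the department, whereas the SQL join expresses the same request as a single statement.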

Directory Description Summary

In concise terms, a directory is a specialized database that is read or searched far more often than it is written to. A directory usually supports the storage of a wide variety of information and provides a mechanism to extend the types of information that can be stored. Directories can be centralized or distributed. They are often distributed in large scale, in terms of both how and where information is distributed. Directories are usually replicated so that they are highly available to the clients accessing them. The scale of directory replication may involve hundreds or thousands of replicas. Replication also helps increase directory performance, which is important to providing applications with a fast, reliable infrastructure component that can be used with confidence. With LDAP, directories have also become standardized. This standardization allows applications and servers from different vendors to be developed, sold, and deployed independently. Finally, unlike databases, most directories do not support sophisticated transactions or join operations.

   


Understanding and Deploying LDAP Directory Services (2nd Edition)
ISBN: 0672323168
Year: 2002
Pages: 242
