2.4 Directory services

2.4.1 Introduction to directories

Directory services emerged from pioneering work done in the early 1980s at the Xerox Palo Alto Research Center (PARC). The first real implementation was called Clearinghouse and was developed as part of a system architecture called Grapevine. Clearinghouse was used initially to enable users to roam around the network using a personalized desktop profile. As with many PARC inventions, this was a product ahead of its time. In essence, directories are specialized databases containing abstract content (i.e., a collection of objects). Directory services enable networked users or applications to locate resources or retrieve data using one or more attributes. The terms white pages and yellow pages are often used to describe how a directory is accessed (i.e., by name or by subject). For example, a directory of user information (e.g., a company's telephone directory) could be queried to find a person's job title, telephone number, and e-mail address by using his or her name and location as search attributes. A directory of network resources could be searched to find the location of the closest PostScript printer on the network, an ActiveX/CORBA/Java object, X.509 Certificate, or biometric data (for an authentication query).
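The white pages and yellow pages styles of access can be illustrated with a minimal sketch. The code below is a toy in-memory model, not a real directory service; the entries, attribute names, and the search helper are all assumptions made up for illustration.

```python
# Toy in-memory directory: each entry is a dict of attribute name -> value.
# All names and values below are invented for illustration.
ENTRIES = [
    {"cn": "Jane Wood", "location": "Berkhamsted", "title": "Marketing Manager",
     "telephone": "555-0100", "mail": "jane.wood@example.com"},
    {"cn": "Raj Patel", "location": "London", "title": "Network Engineer",
     "telephone": "555-0101", "mail": "raj.patel@example.com"},
    {"cn": "printer-2f", "location": "London", "type": "printer",
     "language": "PostScript"},
]


def search(entries, **criteria):
    """Return every entry whose attributes match all of the given criteria."""
    return [e for e in entries
            if all(e.get(attr) == value for attr, value in criteria.items())]


# White pages: look up by name (plus location), then read other attributes.
person = search(ENTRIES, cn="Jane Wood", location="Berkhamsted")[0]

# Yellow pages: look up by subject or category rather than by name.
printers = search(ENTRIES, type="printer", language="PostScript")
```

In both cases the query is simply a set of attribute values to match; the difference lies only in whether the search attributes name the object or describe it.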

Directories are increasingly being used to coordinate large-scale network activities, such as security policy management and configuration management. For example, directories enable single sign-on services as part of an enterprisewide security infrastructure. The key aim of directory services is to provide fast access to a single repository of network data in a managed and authenticated manner. Historically, much of the data would have been spread around the network or held on different servers, often with overlapping or conflicting elements. Directory services reduce the information maintenance burden, improve consistency, eliminate duplication, and thereby improve scalability.

Directories and networked file systems

A directory system is not the same as a networked file system directory. Directory systems are also not suitable repositories for storing file information, since they are not designed to hold large binary objects and do not have the granular read/write locking mechanisms required to protect file access.

Directories and name services

All directory services are accessed via a naming service. Names are used to identify unique objects within the directory. Objects are associated with one or more attributes (name-value pairs). The DNS is a very simple form of directory service that includes a name service. DNS takes a Fully Qualified Domain Name (FQDN) and returns an IP address, or vice versa. Other name services include CORBA's Common Object Services (COS) naming service and Sun's Network Information Service (NIS/NIS+).

Directories and Relational Database Management Systems (RDBMS)

A directory has several characteristics that differentiate it from general-purpose relational databases. Unlike most relational databases, directories are designed to handle largely static data. Information flow is highly asymmetrical; directories are typically queried often but seldom updated. Since directories must, therefore, be able to support high volumes of read requests, they are optimized for read access (with write access often limited to system administrators or to the specific owner of the data to be written). Unlike RDBMS, directory services do not support a two-phase commit process and cannot handle transactions that span multiple operations.

Another important difference between a directory and a general-purpose relational database is the way in which information may be accessed. Most RDBMS support the standard Structured Query Language (SQL). SQL enables powerful query and update functions to be generated—for example, a query could look as follows:

 SELECT DISTINCT TXDAT.Period, TXDAT.Rate, TXDAT.Metric_ID, TXDAT.DoW, TXDAT.From, TXDAT.To, TXDAT.RadMin, TXDAT.RadMax, TXDAT.Zone, TXDAT.SubZone FROM TXDAT WHERE (RateID Like "BTLL64K") AND (Type Like "Rate") AND (Zone Like "LLoop") AND (BitsSec =64000) ORDER BY Rate DESC

This relatively simple query retrieves the contents of ten fields from a table, where four fields meet specific criteria, and then sorts the output by the contents of one of the retrieved fields. However, a more advanced query could retrieve data from several tables where the defined search criteria are met (referred to as a table join) and then update another table depending upon the results of that query. This level of flexibility and sophistication does not come without cost (usually expressed as program size, application complexity, and performance).
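To make the idea of a table join concrete, the following is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The two tables (rates and zones), their columns, and their contents are hypothetical and are not taken from the query shown earlier.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rates (rate_id TEXT, zone TEXT, rate REAL);
    CREATE TABLE zones (zone TEXT, description TEXT);
    INSERT INTO rates VALUES ('BTLL64K', 'LLoop', 12.5),
                             ('BTLL64K', 'Trunk', 30.0);
    INSERT INTO zones VALUES ('LLoop', 'Local loop'),
                             ('Trunk', 'Trunk circuit');
""")

# Join: combine rows from both tables where the zone columns match,
# filter on a search criterion, and sort the result.
rows = conn.execute("""
    SELECT r.rate_id, z.description, r.rate
    FROM rates AS r JOIN zones AS z ON r.zone = z.zone
    WHERE r.rate_id LIKE 'BTLL64K'
    ORDER BY r.rate DESC
""").fetchall()
```

A directory query protocol generally offers no equivalent of the JOIN clause, which is part of what allows it to be so much simpler.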

By contrast, since directories are not intended to provide as many features as general-purpose databases, both the client/server application architecture and the query protocol can be streamlined, providing applications with rapid and efficient access to data in large distributed environments. A directory can, therefore, be visualized as a limited-function database (in fact, some vendors implement a veneer of directory services over a standard relational database). This distinction may blur in the future as transaction-oriented features are added to directory services.

Client/server model

Directories are generally accessed using a client/server model. Directory clients issue requests, and the process that retrieves information is called the directory server (note that a server may act as a client for other server requests). The format and content of the messages exchanged between the client and server are defined by a directory access protocol (e.g., the Lightweight Directory Access Protocol (LDAP), discussed shortly). The data within directories may be held locally on a particular server, distributed across multiple servers, or replicated across multiple servers. The three dimensions of a directory (i.e., the scope of information, location of clients, and distribution of servers) are independent of each other. As users and applications increasingly begin to rely on common directories, it is important that these directories are robust, secure, and scalable. In practice directory services are, therefore, widely replicated to increase performance and availability. This infrastructure in turn promotes much tighter management and control and enables application developers to focus more on application functionality, instead of wasting time developing custom database subsystems and resolving data conflicts. Applications that have the ability to interact with directories are said to be directory enabled.

When not to use a directory

Directory systems are clearly invaluable in a networked environment; however, given the caveats already presented, there are circumstances where RDBMS or conventional file systems are better suited, as follows:

Directories are not appropriate for transactional work (where data are stored as part of a transaction).
Directories are not good repositories for dynamic information (such as share prices, device status, etc.).
Directories are not ideal where the relationships between data objects are complex (requiring table joins).
Directories are not ideal where there is a major requirement for reporting. Conventional RDBMS typically include far superior report-generation tools.

Although the directory architecture does not impose limitations on content, in practice it tends to be restricted (by implementation) as to the type and scope of data it holds. A directory will often come with a standard schema (data templates) geared around a particular type of application (telephone directory, resource repository, etc.). This schema may or may not be extensible.

2.4.2 OSI directory service standard (X.500)

In 1988 the CCITT produced a comprehensive directory service specification called X.500. This was standardized in 1990 as ISO 9594, Data Communications Network Directory, Recommendations X.500–X.521. X.500 defines an authentication framework, powerful search capabilities, and a powerful naming/information model, which organizes objects into a hierarchical name space, capable of scaling to contain huge amounts of data. The query protocol used between the client and the server is called the Directory Access Protocol (DAP). DAP runs as an OSI Application Layer protocol (Layer 7) and therefore requires six additional layers of underlying protocol to function. DAP also mandates that the transport class be used, making it very inflexible.

Unfortunately, X.500 is big, nontrivial to implement, and contains functionality that is considered overkill for most small-scale embedded network platforms and desktop devices (especially when one considers that in 1988 most desktop PCs had very little CPU or memory by today's standards). It seemed appropriate, therefore, to develop a streamlined interface to an X.500 directory server. In 1991 the IETF produced two informational RFCs: the Directory Assistance Service (DAS) [45] and the DIXIE Protocol Specification [46]. DAS defines a way for a streamlined client to communicate with the X.500 directory server via a proxy. Because X.500 DAP implementations never really took off, a more direct approach was required; DIXIE therefore provides a more direct translation of the DAP. This work eventually led to the development of a lightweight DAP protocol in 1993 [47], which subsequently became Lightweight Directory Access Protocol (LDAP) [48].

Note that much of the early work on DIXIE and LDAP was carried out at the University of Michigan in cooperation with members of the IETF Directory Services working group. The University of Michigan provides reference implementations of LDAP and maintains related Web pages and mailing lists.

2.4.3 Lightweight Directory Access Protocol (LDAP)

As indicated, Lightweight Directory Access Protocol (LDAP) was developed as a more practical alternative to X.500 DAP. While X.500 has traditionally been deployed only in very large organizations that have the resources necessary to support it, LDAP is also appropriate for small organizations. Although LDAP still embodies many concepts introduced in X.500, and uses the same basic information and naming model, it uses a simplified functional model, streamlines many X.500 operations, and omits a number of esoteric functions. LDAP uses the LDAP Data Interchange Format (LDIF) to represent data in ASCII strings for exchange between LDAP servers [49, 50]. This may eventually be replaced by Directory Service Markup Language (DSML) (an XML-based markup language used to describe directory information). ASN.1/BER is used for representing binary data. At present LDAP normally runs over TCP/IP but does not mandate any particular transport layer (Novell has an LDAP interface to its IPX-based NDS). Running over TCP/IP enables LDAP to use security services such as SSL (TLS) and IPSec. LDAP also extends some of the services of X.500 and provides additional security features (e.g., Simple Authentication and Security Layer [SASL]).

LDAP defines a communication protocol between an LDAP client and an LDAP server; it does not define the directory service itself. The client initiates an LDAP session by calling an LDAP API function. The general interaction between an LDAP client and an LDAP server takes place in three phases, as follows:

Binding—The client must establish a session with an LDAP server before any operations are possible. The client specifies the host name or IP address and TCP/IP port number of the LDAP server. The client may be required to provide a user name/password for authentication or may establish an anonymous session using default access rights. For additional security the session may be encrypted.
Operations—Once connected, the client can perform operations on directory data (i.e., search, modify, add, delete, compare, or abandon). Note that in LDAPv3 an extension mechanism has been added, in order to allow implementers to define additional operations (e.g., digitally signing) [51].
Unbinding—When the client is finished making requests, it closes the session with the server. This is also known as unbinding.
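The three phases above can be sketched as a small state machine. The class below is a hypothetical in-memory stand-in, not a real LDAP client API: the class name, method signatures, and storage are assumptions, and no network traffic or real authentication takes place.

```python
class ToyDirectorySession:
    """Hypothetical stand-in for an LDAP client session (not a real LDAP API)."""

    def __init__(self, host, port=389):
        self.host, self.port = host, port
        self.bound_as = None   # None means no session has been established
        self._data = {}        # DN -> attribute dict (toy local storage)

    # Phase 1: binding
    def bind(self, user=None, password=None):
        # A real server would verify credentials; this toy accepts anything.
        self.bound_as = user or "anonymous"

    def _require_bound(self):
        if self.bound_as is None:
            raise RuntimeError("bind before performing operations")

    # Phase 2: operations
    def add(self, dn, attrs):
        self._require_bound()
        self._data[dn] = dict(attrs)

    def modify(self, dn, attr, value):
        self._require_bound()
        self._data[dn][attr] = value

    def search(self, attr, value):
        self._require_bound()
        return [dn for dn, a in self._data.items() if a.get(attr) == value]

    def compare(self, dn, attr, value):
        self._require_bound()
        return self._data.get(dn, {}).get(attr) == value

    def delete(self, dn):
        self._require_bound()
        self._data.pop(dn, None)

    # Phase 3: unbinding
    def unbind(self):
        self.bound_as = None
```

A typical use would be to call bind(), issue a series of operations such as add() and search(), and finish with unbind(); any operation attempted outside a bound session raises an error in this sketch.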

An X.500 directory server does not understand LDAP messages; an LDAP server either operates as a gateway (proxy) to an X.500 directory server or runs in standalone mode, providing local access to a directory service. If acting as a gateway, then the server must support the OSI and TCP/IP stacks, as well as the LDAP and DAP protocols. In standalone LDAP (SLDAP) mode the server need support only TCP and LDAP, but the server architecture is significantly more complex to accommodate access to a local directory. From the client's perspective the directory location is transparent; LDAP clients simply talk to LDAP servers [48, 51].

LDAP versions 2 and 3

Both LDAP version 2 and LDAP version 3 standards are currently defined for use by the IETF, as follows:

LDAPv2 is currently a draft standard within the IETF. Reference [48] defines the LDAP protocol itself. Several other RFCs define related aspects of LDAPv2, such as attribute syntaxes [38], distinguished names, URL formats, and search filters. Since this specification is unlikely to change significantly, many vendors have already implemented products that support LDAPv2.
LDAPv3 is a proposed standard [51]. Even though minor revisions of a proposed standard are likely, a number of vendors are implementing products that support LDAPv3 now. LDAPv3 extends LDAPv2 in several areas, as follows:
- Referrals—a server that does not hold the requested data can refer the client to another server.
- Security—extensible authentication using Simple Authentication and Security Layer (SASL) mechanism.
- Internationalization—UTF-8 support for international characters.
- Extensibility—new object types and operations may be dynamically defined with schema published in a standard manner.

LDAP has now become the de facto standard for accessing directory systems. LDAP reduces the complexity of clients so that directories can be made accessible even to thin clients (e.g., PDAs and WAP microbrowsers). Throughout this book, the term LDAP refers to LDAPv3 unless specified otherwise.

LDAP information model

LDAP uses an information model similar to that of X.500. Objects are organized within a hierarchical tree structure referred to as the Directory Information Tree (DIT). The highest element in the tree is referred to as the Top-Level Domain (TLD). Below the TLD are the subdomains (referred to as Organizational Units [OUs] or branches). A branch without any child elements is called a leaf. Tree objects are referred to as entries (similar to a relational database row). Each entry is associated with an object class, which determines the associated attributes. Each attribute comprises a type and a value.

A schema is a set of rules used to define the data structure of entries. The schema defines what object classes are allowed in the directory, what attributes they must contain, what attributes are optional, and the syntax of each attribute. For example, a schema could define a person object class. The person schema might require a surname attribute (defined as a character string) and an optional telephone number attribute (defined as a number string, with spaces, hyphens, etc.). (See Figure 2.14.)

Figure 2.14: LDAP directory information tree hierarchy.
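The schema rules described above for the person object class can be sketched as a simple validator. This is an illustrative toy, assuming a schema expressed as Python dictionaries; the attribute names sn and telephoneNumber, the syntax checks, and the validate function are assumptions and do not reflect how any particular directory server stores or enforces its schema.

```python
import re


def is_string(value):
    """Syntax check: a non-empty character string."""
    return isinstance(value, str) and value != ""


def is_phone(value):
    """Syntax check: digits with spaces, hyphens, plus signs, or parentheses."""
    return isinstance(value, str) and re.fullmatch(r"[0-9 +()-]+", value) is not None


# Hypothetical schema: object class -> required ("must") and optional ("may")
# attributes, each mapped to its syntax check.
SCHEMA = {
    "person": {
        "must": {"sn": is_string},
        "may": {"telephoneNumber": is_phone},
    }
}


def validate(entry, schema=SCHEMA):
    """Return a list of schema violations for an entry (empty list = valid)."""
    cls = schema.get(entry.get("objectClass"))
    if cls is None:
        return ["unknown object class: %r" % entry.get("objectClass")]
    errors = []
    allowed = {**cls["must"], **cls["may"]}
    for attr in cls["must"]:
        if attr not in entry:
            errors.append("missing required attribute: " + attr)
    for attr, value in entry.items():
        if attr == "objectClass":
            continue
        if attr not in allowed:
            errors.append("attribute not allowed: " + attr)
        elif not allowed[attr](value):
            errors.append("bad syntax for attribute: " + attr)
    return errors
```

An entry with a surname and a well-formed telephone number passes, whereas an entry that omits the surname, or that carries an attribute the object class does not list, is reported as a violation.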

Replication

Replication is important for scalability in large enterprises because it improves both performance and availability. The LDAP specifications do not cover replication or synchronization of multiple directories. In practice many vendors provide some form of proprietary replication model.

LDAP naming model

The LDAP naming model specifies how entries are referenced within the DIT; this is based both on a hierarchical naming model and on X.500. Each entry has a Distinguished Name (DN), determined by its position in the tree. A DN is made up of a number of components, each called a Relative Distinguished Name (RDN). X.500 naming is quite cumbersome; for example:

 dn: uid=woodj, ou=Marketing, o=Acme Ltd, l=Berkhamsted, c=UK

where

dn—Distinguished Name
c—Country
l—Location
o—Organization (e.g., company name)
ou—Organizational Unit (i.e., department or business unit)
uid—User ID (i.e., normally the login ID)
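The way a DN decomposes into RDNs can be shown with a short sketch. The parse_dn helper below is hypothetical and deliberately simplified: it assumes that values contain no escaped commas and that each RDN holds a single type and value.

```python
def parse_dn(dn):
    """Split a DN string into (attribute type, value) pairs, one per RDN.

    Simplification: assumes no escaped commas and no multi-valued RDNs.
    """
    rdns = []
    for component in dn.split(","):
        attr, _, value = component.strip().partition("=")
        rdns.append((attr.strip(), value.strip()))
    return rdns


rdns = parse_dn("uid=woodj, ou=Marketing, o=Acme Ltd, l=Berkhamsted, c=UK")

# The leftmost RDN names the entry itself; the remainder names its parent.
entry_rdn = rdns[0]
parent_dn = ", ".join("%s=%s" % pair for pair in rdns[1:])
```

Reading the list from right to left walks down the tree from the country to the individual user entry.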

Since there is already a perfectly good global naming system in operation (i.e., DNS FQDNs) the architects of LDAP decided that FQDNs should be used as the TLD within a distinguished name (with the assumption that anyone technically able to deploy LDAP would probably have a Web presence). Our example would, therefore, appear as follows:

 dn: uid=woodj, ou=Marketing, o=acme.co.uk

The IETF has also proposed a draft standard [52] recommending that DNs should be constructed from an FQDN using a domain component (dc) attribute. For example:

 dn: uid=woodj, ou=Marketing, dc=acme, dc=co, dc=uk

In practice both naming schemes are used at present.
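The mapping from an FQDN to domain component attributes is mechanical, as the following minimal sketch suggests. The function name domain_to_dn and its signature are assumptions made for illustration.

```python
def domain_to_dn(fqdn, *leading_rdns):
    """Build a DN whose suffix is derived from the labels of an FQDN.

    Each DNS label becomes one dc (domain component) RDN; any leading RDNs
    supplied by the caller are placed in front of that suffix.
    """
    dc_part = ["dc=" + label for label in fqdn.strip(".").split(".")]
    return ", ".join(list(leading_rdns) + dc_part)


dn = domain_to_dn("acme.co.uk", "uid=woodj", "ou=Marketing")
```

With the inputs shown, the result matches the dc-based example given above.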

LDAP URL

Note that if an LDAP-enabled Web browser is used on a client, directory information can be retrieved using an LDAP URL. This takes one of two forms:

 ldap://<domain>:<port>/query_string
 ldaps://<domain>:<port>/query_string

where the latter syntax is the secure format for LDAP queries. Default ports are 389 and 636, respectively. An example query might be:

 ldap://10.0.0.1:389/uid=woodj, ou=Marketing, o=acme.co.uk
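A client might take such a URL apart as in the following sketch, which uses the standard urllib.parse module. The parse_ldap_url helper is hypothetical and handles only the simple form shown here (scheme, host, optional port, and a query string in the path position).

```python
from urllib.parse import unquote, urlsplit

# Default ports for the plain and secure forms, as given in the text.
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}


def parse_ldap_url(url):
    """Split a simple LDAP URL into scheme flag, host, port, and query string."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError("not an LDAP URL: " + url)
    return {
        "secure": parts.scheme == "ldaps",
        "host": parts.hostname,
        # Fall back to the scheme's default port when none is given.
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
        "query_string": unquote(parts.path.lstrip("/")),
    }


info = parse_ldap_url("ldap://10.0.0.1:389/uid=woodj, ou=Marketing, o=acme.co.uk")
```

If the port is omitted, the sketch substitutes 389 for ldap and 636 for ldaps.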

Referrals

LDAP supports referrals in LDAPv3 [51]. If an LDAP server does not have the required data, then a referral can be provided as a pointer to another server (or set of servers) in response to a query. This is similar to the way DNS servers can refer resolvers to other DNS servers if they do not hold the correct bindings. In LDAP the client code must always handle or ignore a referral (similar to DNS iterative mode).
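Client-side referral handling can be sketched as a loop in which the client, not the server, contacts each referred server in turn. Everything below is a toy model: the server names, the SERVERS table, and the lookup and client_search functions are assumptions, and no real LDAP messages are exchanged.

```python
# Hypothetical servers: each holds some entries and may name another server
# to which clients are referred when the requested DN is not held locally.
SERVERS = {
    "ldap-a.example.com": {"entries": {}, "referral": "ldap-b.example.com"},
    "ldap-b.example.com": {
        "entries": {"uid=woodj, ou=Marketing, o=acme.co.uk": {"sn": "Wood"}},
        "referral": None,
    },
}


def lookup(server, dn):
    """Server side: answer with an entry, a referral, or a not-found result."""
    held = SERVERS[server]
    if dn in held["entries"]:
        return "entry", held["entries"][dn]
    if held["referral"]:
        return "referral", held["referral"]
    return "notfound", None


def client_search(server, dn, max_hops=5):
    """Client side: chase referrals itself, visiting at most max_hops servers."""
    path = []
    for _ in range(max_hops):
        path.append(server)
        kind, payload = lookup(server, dn)
        if kind == "entry":
            return payload, path
        if kind == "notfound":
            return None, path
        server = payload  # follow the referral to the next server
    raise RuntimeError("too many referrals")
```

The hop limit is a design choice in this sketch to guard against referral loops; a client could equally choose to ignore referrals and report the data as unavailable.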

Aliases

LDAP supports aliasing (analogous to the UNIX symbolic link). Aliases act as a shortcut to eliminate the need for a strict hierarchy when referencing objects. Since aliases are validated each time they are used, they may present performance overheads.
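The following toy sketch shows why alias resolution adds work to each lookup: the alias must be followed, and its target checked, every time it is used. The directory contents and the dereference function are assumptions for illustration; the attribute name aliasedObjectName follows common LDAP usage.

```python
# Toy directory: one real entry and one alias entry that points at it.
DIRECTORY = {
    "uid=woodj, ou=Marketing, o=acme.co.uk": {"sn": "Wood"},
    "cn=press-contact, o=acme.co.uk": {
        "aliasedObjectName": "uid=woodj, ou=Marketing, o=acme.co.uk",
    },
}


def dereference(dn, directory=DIRECTORY, max_hops=8):
    """Follow alias entries from dn; return (real_dn, entry), or None if missing."""
    for _ in range(max_hops):
        entry = directory.get(dn)
        if entry is None:
            return None           # unknown DN or dangling alias
        target = entry.get("aliasedObjectName")
        if target is None:
            return dn, entry      # reached a real (non-alias) entry
        dn = target               # one extra lookup per alias in the chain
    raise RuntimeError("alias chain too long (possible loop)")
```

Each alias in a chain costs an additional lookup, which is the overhead referred to above.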

Security

Since LDAP is connection oriented, a client must first bind to a server to create an LDAP session before performing any operations. LDAP supports a flexible security model. At the basic level a simple authentication option provides authentication via a cleartext password. In practice, in a TCP/IP environment, SSL (TLS) is commonly used to secure the session (or variations such as WTLS for wireless applications). Many LDAP implementations also support Access Control Lists (ACLs). These can be used to control user or group access to directory elements (i.e., permissions for read/write/modify/delete). The IETF is currently working on specifications for syntax associated with the ACL attribute; therefore, implementations are currently using private formats.
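Because ACL formats are implementation specific, the following is only a toy illustration of the general idea of granting operations on parts of the tree to users or groups. The rule layout, subject names, and the is_allowed function are all hypothetical and do not correspond to any vendor's ACL syntax.

```python
# Hypothetical ACL: (subject, DN suffix the rule covers, permitted operations).
ACL = [
    ("group:admins", "o=acme.co.uk", {"read", "write", "modify", "delete"}),
    ("user:woodj", "uid=woodj, ou=Marketing, o=acme.co.uk", {"read", "modify"}),
    ("anyone", "o=acme.co.uk", {"read"}),
]


def is_allowed(subjects, dn, operation, acl=ACL):
    """Return True if any rule grants `operation` on `dn` to one of `subjects`."""
    for subject, suffix, permitted in acl:
        if subject in subjects and dn.endswith(suffix) and operation in permitted:
            return True
    return False
```

Under these example rules an anonymous client may read any entry beneath o=acme.co.uk, the user woodj may additionally modify his or her own entry, and only members of the admins group may delete entries.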

LDAP implementations and APIs

Both Netscape and Microsoft Web browsers include LDAP client support as standard. Vendor implementations of LDAP servers include the following:

Novell Directory Services (NDS)—NDS is based on X.500 but runs over IPX. Later versions of NetWare enable clients to query NDS via an LDAP interface [53].
Microsoft's Active Directory Services (ADS)—Active Directory/LDAP functions are incorporated into Windows 2000 [54].
Netscape/iPlanet Directory Server—A powerful alliance currently dominating the directory service market. It includes Sun Microsystems, AOL/Netscape, and Innosoft [55].
OpenLDAP—code derived from the University of Michigan [56].

Although LDAP defines the communication protocol between the directory client and server, it does not define an Applications Programming Interface (API) for the client. Available LDAP client APIs and SDKs include the following:

Sun's JNDI
Microsoft's ADSI
Netscape's LDAP C SDK and LDAP Java SDK
University of Michigan's OpenLDAP C API

Reference [57] defines a C language API to access a directory using LDAPv2. Note that this is an informational RFC; however, it has become a de facto standard. At the time of writing, RFC 1823 is being updated to support LDAPv3.

2.4.4 Directory-Enabled Networking (DEN)

The Directory-Enabled Network (DEN) initiative is a proposal to enable information about people, applications, network equipment, protocols, topology, security policy, and so on to be stored in a unified manner (i.e., an LDAP directory). DEN specifies an information model with a schema that defines the object classes and their related attributes. DEN is a relatively new specification; its first draft was published late in 1997 through the collaboration of Cisco and Microsoft. Many other vendors and organizations have since supported this initiative. The DEN specification defines LDAPv3 as the core protocol for accessing DEN information.

The availability of this information in a common repository, accessible through standard methods, enables vendors to better cooperate at the enterprise level through a consistent database. This greatly simplifies tasks such as provisioning and security policy, reduces overall maintenance, and promotes better network management and scalability. DEN is, therefore, a key component for building intelligent networks and has particular relevance for Policy Management Systems, as discussed in section 2.4.4. For further information about the DEN initiative the interested reader is referred to [54, 58, 59].