General Characteristics of Data Elements

All data elements can be described by several general characteristics:

Format
Size of data values
Number of occurrences
Data ownership
Consumers
Dynamic versus static
Shared versus application-specific
Relationship with other data elements

You should examine each data element you plan to include in your directory service during this data design phase to determine what characteristics it shares with other data elements. By doing this you will save time during the schema and namespace design stages and avoid deployment problems.

For example, suppose a certain dynamic data element will be rewritten many times a second, but is used by only one application. It may not make sense to include the dynamic data element in the directory service at all because most directory implementations are optimized for read access, and writes are relatively expensive.

Each of the characteristics mentioned in the preceding list is discussed in more detail in the following sections.

Tip

Before you design your directory schema (a topic we will tackle in the next chapter), you should characterize each element using the guidelines included in the following sections. You should add this information to the list of data elements you created when you examined your applications' needs.

Format

Data elements can be grouped based on the natural format of the information. For example, people's names are always textual data, but telephone numbers consist primarily of digits. Table 6.3 shows some of the more common data formats and provides sample data elements for each.

Table 6.3. Data formats

General Format Elements	Common Variations	Example Data
Text string	Case sensitive, case insensitive	Person's name, printer's name, URL
Multiline text string	Case sensitive, case insensitive	Postal address, description
Phone number	Local, international	Work phone number, fax number
Numeric	Integer, floating point	Employee number, cost
Multimedia	Image, sound, movie	Photograph, musical sample
Binary	(Many variations)	Digital certificate, preferences data

If your textual data is written in more than one character set or language, be sure to note that as well. As we will see in Chapter 7, each LDAP attribute must be mapped to a syntax that precisely defines the rules for interpreting the stored values. For example, the cn (common name) attribute has the syntax caseIgnoreString, which means that the case of the letters that make up a name is not significant when comparing one cn value with another.
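To make the effect of a case-ignoring syntax concrete, the following Python sketch compares cn values roughly the way such a rule would. It is only an approximation and not the matching algorithm of any particular directory server: the helper names are hypothetical, and the whitespace handling (trimming the ends and collapsing runs of spaces) is an assumption about how such rules commonly behave.

```python
def normalize_case_ignore(value: str) -> str:
    """Approximate a case-ignoring syntax: collapse whitespace and fold case."""
    # split() with no arguments trims the ends and splits on runs of whitespace
    return " ".join(value.split()).casefold()


def case_ignore_equal(a: str, b: str) -> bool:
    """Return True if two values match under the approximate rule above."""
    return normalize_case_ignore(a) == normalize_case_ignore(b)


print(case_ignore_equal("Babs Jensen", "babs  JENSEN"))    # True
print(case_ignore_equal("Babs Jensen", "Barbara Jensen"))  # False
```

Noting during data design which elements need this kind of comparison (names, hostnames) and which do not (passwords, most identifiers in case-sensitive systems) makes the later choice of syntax straightforward.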

Size of Data Values

Size refers to the number of characters or bytes that a data element value consumes. Knowing the approximate size of each value will help with the more directory-specific design work that we will tackle in subsequent chapters. Although it is sometimes difficult to assign hard limits for the size of a data element, it is usually relatively easy to come up with a range that encompasses the typical data values that will be used. For example, Figure 6.2 shows how the elements of a North American telephone number combine to require approximately 14 characters of storage.

Figure 6.2 Size of a telephone number value

At the other end of the scale, if you choose to store images in your directory service, the size of each value will be much larger (perhaps in the 25KB to 1MB range). Be sure to check your directory server software to see if it places any arbitrary restrictions on the size of the data values that can be stored.

When considering how large text data values will be, do not forget to take into account the character set if you have any data that is not plain ASCII. In LDAP directories, international data is typically represented using the UTF-8 encoding of the Unicode character set (as explained in Chapter 3, "An Introduction to LDAP"). UTF-8 is a variable-length encoding, and each character in a UTF-8 string requires between 1 and 4 bytes.
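The difference between character count and byte count is easy to check. The short Python sketch below measures a few sample values; the helper name `utf8_size` and the sample strings are illustrative only, and the non-ASCII characters are written as escapes so the code points are unambiguous.

```python
def utf8_size(value: str) -> int:
    """Number of bytes the value occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


samples = [
    "Jensen",        # plain ASCII: 1 byte per character
    "M\u00fcller",   # U+00FC (u with diaeresis) takes 2 bytes
    "\u7530\u4e2d",  # two CJK characters, 3 bytes each
    "\U0001F600",    # a character outside the Basic Multilingual Plane, 4 bytes
]

for s in samples:
    print(f"{len(s)} characters -> {utf8_size(s)} bytes")
# 6 characters -> 6 bytes
# 6 characters -> 7 bytes
# 2 characters -> 6 bytes
# 1 characters -> 4 bytes
```

When you estimate value sizes for textual elements, record the size in bytes for representative samples of your real data rather than assuming one byte per character.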

Tip

In some cases it makes more sense to store a pointer to a data value in the directory service instead of storing the data value itself. This pointer-based approach is especially useful when the data value is large but related to another data element that you plan to store in your directory.

For example, if you are storing in the directory information about all your department's current projects, you may want to include HTTP URLs that point to the detailed project plans. The project plans themselves should remain on Web servers as they are probably large, complex documents; but users and applications will still be able to locate these documents by consulting the directory service.

Number of Occurrences

For each data element you should answer the question "How many data values will there typically be for this element?" For example, a person will usually have only one user ID but may have several phone numbers. This information will help in directory capacity planning. It may also be useful if you replicate or synchronize with other data stores which may have different characteristics, since it will be important to know if, for example, a data store can only hold one value for a person's name when your directory allows many values.
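As a rough illustration of how value sizes and occurrence counts feed capacity planning, the following Python sketch multiplies the two for each element and sums the results. All names and figures here are hypothetical, and the result covers raw value data only; it deliberately ignores indexes, per-entry overhead, and replication, which depend on the directory server software.

```python
from dataclasses import dataclass


@dataclass
class ElementEstimate:
    name: str
    avg_value_bytes: int     # typical size of one value
    values_per_entry: float  # typical number of occurrences per entry


def raw_bytes_per_entry(elements) -> float:
    """Sum of (value size x number of values) over all elements."""
    return sum(e.avg_value_bytes * e.values_per_entry for e in elements)


def raw_bytes_total(elements, entry_count: int) -> float:
    """Raw value data for the whole directory, before any server overhead."""
    return raw_bytes_per_entry(elements) * entry_count


# Hypothetical figures: one 8-byte user ID and three 14-byte phone numbers
elements = [
    ElementEstimate("user ID", 8, 1),
    ElementEstimate("phone number", 14, 3),
]

print(raw_bytes_per_entry(elements))      # 50
print(raw_bytes_total(elements, 10_000))  # 500000
```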

Data Ownership

Dealing with data ownership issues is one of the more challenging aspects of directory design. When thinking about access control, privacy, and security it is important to know exactly which people and applications should be allowed to view or modify a data element. Data ownership also affects whether you will allow a data element to be changed by directory clients and whether and how a data element is kept in sync with other data sources.

Related questions include "Who should be notified when this data element is modified?" and "If this data element is stored in more than one data source, which system has final authority over the data element?" Unfortunately, the answers to these questions may be muddy and will likely change over time. Just make a note of the muddiest areas and move on.

Consumers

The consumers of a data element are those directory-enabled applications that use it. When planning directory replication and topology, and when managing the relationships between your directory service and other data sources, it is helpful to know about the consumers of each data element.

For example, a mail transfer agent (MTA) is a piece of application software whose job is to route electronic mail messages to their correct destinations. When processing email, MTAs may look at an attribute in a user's entry called mailHost, which gives the hostname of the server that holds the user's email. This is depicted in Figure 6.3.

Figure 6.3 An MTA and the `mailHost` attribute.

The MTA is thus an important consumer of the mailHost attribute, so it may be important that the MTA always gets the most up-to-date copy of the mailHost attribute that is available. Email client software might also use the same mailHost attribute to determine where a user's mail drop is, in which case this becomes a shared attribute.

Similarly, users may be allowed to change their home telephone numbers in the directory service, changes that may be propagated to a personnel system that stores its data in an Oracle database. This scenario is shown in Figure 6.4.

Figure 6.4 Home phone number propagated to personnel database

In this scenario the personnel system is a consumer of the home telephone number, but it is also a source for other non-LDAP applications that may access the phone number directly from the personnel system. If the home phone number is not a piece of data critical to the personnel system, it may be okay for information between it and the directory service to be updated fairly infrequently. If the phone number is critical to both the personnel system and the directory service, however, a process that accomplishes frequent synchronization may need to be developed.

If you compile detailed information about all the consumers of a data element, you can aggregate all the information into an estimate of how often a given data element will be accessed. Again, this information is useful when doing capacity planning for your directory service.
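One simple way to perform this aggregation is to record an estimated access rate for each consumer and total the rates per data element. The Python sketch below does that; the consumers are taken from the examples in this section, but every number is invented for illustration.

```python
from collections import defaultdict

# (consumer, data element, estimated reads per day) -- all figures hypothetical
consumer_reads = [
    ("MTA", "mailHost", 50_000),
    ("email client", "mailHost", 8_000),
    ("personnel system", "home phone number", 200),
]


def reads_per_element(rows):
    """Total the estimated daily reads for each data element."""
    totals = defaultdict(int)
    for _consumer, element, reads in rows:
        totals[element] += reads
    return dict(totals)


print(reads_per_element(consumer_reads))
# {'mailHost': 58000, 'home phone number': 200}
```

An element that appears under more than one consumer, as mailHost does here, is also a candidate for the shared category discussed later in this section.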

Dynamic Versus Static Data Elements

It is also helpful to know which data elements are dynamic (i.e., have values that change) and which are static (i.e., have values that do not change). You will need this information when designing your directory server topology as well as for capacity planning. For example, if you use a replicated directory service that allows writes for a given entry to occur on only one server (a single master system), and you have a lot of attributes whose values change often, you may need to partition your data to avoid overwhelming any one master server with the write traffic.

One way to characterize the dynamic or static nature of attributes is by estimating the ratio of reads to writes for each data element. For example, a user ID may be written once when a student joins a university but read dozens of times each day as email is delivered; this attribute is static. In contrast, if a Web browser stores a user's personal bookmarks in the directory service, they may be changed once a day or more often; these attributes are dynamic.
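The read-to-write ratio can be turned into a rough classification, as in the Python sketch below. The function name, the sample rates, and especially the threshold of 100 reads per write are assumptions chosen for illustration; pick a cutoff that reflects what your own directory servers and topology can tolerate.

```python
def classify(reads_per_day: float, writes_per_day: float,
             threshold: float = 100.0) -> str:
    """Label an element 'static' or 'dynamic' from its read-to-write ratio.

    The threshold is an arbitrary illustrative cutoff, not a standard value.
    """
    if writes_per_day == 0:
        return "static"  # never written after creation
    ratio = reads_per_day / writes_per_day
    return "static" if ratio >= threshold else "dynamic"


# User ID: read dozens of times a day, written roughly once in four years
print(classify(reads_per_day=36, writes_per_day=1 / 1460))  # static

# Browser bookmarks: read a few times a day, changed about once a day
print(classify(reads_per_day=5, writes_per_day=1))          # dynamic
```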

Shared Versus Application-Specific Data Elements

Some data elements are used by many applications; others may be used by only one application. Data elements that are shared require careful planning so that the needs of all the applications are met adequately. On the other hand, if a data element is used by only one application and the data values are large or accessed frequently, you should consider keeping the values outside your directory service to avoid performance problems when accessing the values. You may want to include some guidelines for making this kind of decision in your data policy statement (as discussed earlier in this chapter).

Tip

One thing to watch out for if you conclude that a data element is application-specific is that, over time, new applications may come online that also use the data element. Because your directory service is a great place to put shared data, when in doubt assume that a data element will be shared.

Note that even if data elements are not shared by more than one directory-enabled application, it may make sense to store them together to ease manageability or improve the availability of the information (through directory replication). For example, it may be desirable to delete a person's email-related data elements when the person is deleted from the corporate phonebook. The easiest approach is to store the email-related elements in the directory service along with all the person's contact and other information. That way, you won't need to delete the information in both the directory and the mail system to delete a user's record.

Relationship with Other Data Elements

When selecting schema and laying out the namespace for your directory service, it is useful to know how your data elements are related. Because directory entries typically represent real-world objects, it is important to know which data elements relate to the same kind of object.

For example, if you have an entry in your directory service for each printer attached to your network, you want to make it easy for an application to find all the printer-related data elements. You can accomplish this by choosing a schema that defines an all-inclusive printer object (see Chapter 7 for more information on schema).

Some relationships between data elements are more subtle and may be easily overlooked. For example, if you will use the directory service to determine who used a printer in the last 24 hours, there is a need to relate information about some users to the printer's entry. This need could be addressed by including in the printer entries a set of data elements that refer to user entries.

A Data Element Characteristics Example

Suppose that we work at a large university with a great variety of installed electronic mail systems. (See Chapter 24, "Case Study: A Large University," for a related example.) Electronic mail is often a major factor affecting the expenditure of information technology dollars, and we would like to show our boss the value of our new directory service.

To do this we decide to develop as our first directory-enabled application a service that reroutes all email entering the university from the Internet to the correct system. The basic setup is shown in Figure 6.5.

Figure 6.5 The Business Card Email Service.

By configuring our domain name system correctly, we arrange for all electronic mail messages sent to STRING@bigu.edu to arrive on the machine called redirector.bigu.edu. This machine runs a customized copy of the sendmail software that searches the directory service running on ldap.bigu.edu for a user's entry using criteria constructed from STRING. For example, if a message is addressed to babs.jensen@bigu.edu, a search with an LDAP filter of cn=babs jensen is performed. If an entry is found, the email message is re-sent to the user's mail delivery address, which is typically a mail server in the individual's department or school within the university.
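The step that turns STRING into search criteria might look something like the following Python sketch. It is not the customized sendmail logic itself: the function names are hypothetical, and the rule of replacing dots with spaces is inferred from the babs.jensen example. The sketch emits the filter in the parenthesized string form used by LDAP tools and escapes the characters that are special inside a filter value, so that an unusual address cannot alter the search.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special in an LDAP filter string."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def filter_from_address(address: str) -> str:
    """Build a cn filter from the local part of an email address.

    Assumption: dots in the local part stand for spaces in the name.
    """
    local_part, _, _domain = address.partition("@")
    name = local_part.replace(".", " ")
    return f"(cn={escape_filter_value(name)})"


print(filter_from_address("babs.jensen@bigu.edu"))  # (cn=babs jensen)
```

Because cn uses a case-ignoring syntax, this filter matches an entry whose name is stored as "Babs Jensen" even though the address is written in lowercase.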

We christen this service the "Business Card Email Service," or BCES, because now people can safely put one centrally managed email address on their business card that will not change even if they switch departments within the university.

One of the first design questions we need to answer is "What data elements do we need and what are their essential characteristics?" Table 6.4 provides one potential answer (the real answer depends on what application software we use to develop and deploy our new service).

Table 6.4. Data element characteristics example

Element (Example)	Format	Size / # Values	Owner	Consumers	Related To
Full name (for example, John Jones)	Text	<128 chars. / one or a few values	Personnel Dept.	Users; BCES	User's entry
User ID (login; for example, `jjones`)	Text	<9 chars. / one value	IS Dept.	BCES	User's entry
Email address (for example, `jjones@bigu.edu`)	Text (Internet mail address)	Many chars. / one or a few values	IS Dept.	Users; BCES	User's entry
Delivery address (for example, `jjones@math.bigu.edu`)	Text (Internet mail address)	Many chars. / one value	User and system admins.	BCES	User's entry

Note that some characteristics are missing from Table 6.4. We don't include any information on how dynamic each data element is because we believe all the data elements hold data values that do not change very often. Also, because we are focused on just one application, we have not yet needed to think about which of these data elements will be shared with other directory-enabled applications.
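If you keep your list of data elements in machine-readable form, questions about ownership and consumers become simple queries. The Python sketch below records the rows of Table 6.4 as they appear there; the class name, field names, and helper function are hypothetical, and sizes are kept as the table's own text rather than converted to numbers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataElement:
    name: str
    data_format: str
    size: str    # as stated in the table, e.g. "<128 chars."
    values: str  # number of occurrences, as stated in the table
    owner: str
    consumers: Tuple[str, ...]
    related_to: str = "User's entry"


TABLE_6_4 = [
    DataElement("Full name", "Text", "<128 chars.", "one or a few",
                "Personnel Dept.", ("Users", "BCES")),
    DataElement("User ID", "Text", "<9 chars.", "one",
                "IS Dept.", ("BCES",)),
    DataElement("Email address", "Text (Internet mail address)", "Many chars.",
                "one or a few", "IS Dept.", ("Users", "BCES")),
    DataElement("Delivery address", "Text (Internet mail address)", "Many chars.",
                "one", "User and system admins.", ("BCES",)),
]


def owned_by(elements, owner: str):
    """Names of the elements for which the given party is the owner."""
    return [e.name for e in elements if e.owner == owner]


print(owned_by(TABLE_6_4, "IS Dept."))  # ['User ID', 'Email address']
```

Fields for the remaining characteristics, such as how dynamic or how widely shared an element is, can be added to the record once those questions become relevant.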

Understanding and Deploying LDAP Directory Services, 2002 New Riders Publishing


2002, O'Reilly & Associates, Inc.
