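General Characteristics of Data Elements

All data elements can be described by several general characteristics: format, size of data values, number of occurrences, data ownership, consumers, dynamic versus static nature, shared versus application-specific use, and relationship with other data elements.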
You should examine each data element you plan to include in your directory service during this data design phase to determine what characteristics it shares with other data elements. By doing this you will save time during the schema and namespace design stages and avoid deployment problems. For example, suppose a certain dynamic data element will be rewritten many times a second, but is used by only one application. It may not make sense to include the dynamic data element in the directory service at all because most directory implementations are optimized for read access, and writes are relatively expensive. Each of the characteristics mentioned in the preceding list is discussed in more detail in the following sections.

Tip: Before you design your directory schema (a topic we will tackle in the next chapter), you should characterize each element using the guidelines included in the following sections. You should add this information to the list of data elements you created when you examined your applications' needs.

Format

Data elements can be grouped based on the natural format of the information. For example, people's names are always textual data, but telephone numbers consist primarily of digits. Table 6.3 shows some of the more common data formats and provides sample data elements for each.

Table 6.3. Data formats
If your textual data is written in more than one character set or language, be sure to note that as well. As we will see in Chapter 7, each LDAP attribute must be mapped to a syntax that precisely defines the rules for interpreting the stored values. For example, the cn (common name) attribute is of the syntax caseIgnoreString, which means that the case of the letters that make up a name is not significant when comparing one cn value with another.

Size of Data Values

Size refers to the number of characters or bytes that a data element value consumes. Knowing the approximate size of each value will help with the more directory-specific design work that we will tackle in subsequent chapters. Although it is sometimes hard to assign hard limits for the size of a data element, it is usually relatively easy to come up with a range that encompasses the typical data values that will be used. For example, Figure 6.2 shows how the elements of a North American telephone number combine to require approximately 14 characters of storage.

Figure 6.2 Size of a telephone number value

At the other end of the scale, if you choose to store images in your directory service, the size of each value will be much larger (perhaps in the 25KB to 1MB range). Be sure to check your directory server software to see if it places any arbitrary restrictions on the size of the data values that can be stored.

When considering how large text data values will be, do not forget to take into account the character set if you have any data that is not plain ASCII. In LDAP directories, international data is typically represented using the UTF-8 encoding of the Unicode character set (as explained in Chapter 3, "An Introduction to LDAP"). UTF-8 is a variable-length encoding, and each character in a UTF-8 string requires between 1 and 4 bytes.

Tip: In some cases it makes more sense to store a pointer to a data value in the directory service instead of storing the data value itself.
This pointer-based approach is especially useful when the data value is large but related to another data element that you plan to store in your directory. For example, if you are storing in the directory information about all your department's current projects, you may want to include HTTP URLs that point to the detailed project plans. The project plans themselves should remain on Web servers because they are probably large, complex documents, but users and applications will still be able to locate these documents by consulting the directory service.

Number of Occurrences

For each data element you should answer the question "How many data values will there typically be for this element?" For example, a person will usually have only one user ID but may have several phone numbers. This information will help in directory capacity planning. It may also be useful if you replicate or synchronize with other data stores that have different characteristics; for example, it is important to know whether a data store can hold only one value for a person's name when your directory allows many values.

Data Ownership

Dealing with data ownership issues is one of the more challenging aspects of directory design. When thinking about access control, privacy, and security, it is important to know exactly which people and applications should be allowed to view or modify a data element. Data ownership also affects whether you will allow a data element to be changed by directory clients and whether and how a data element is kept in sync with other data sources. Related questions include "Who should be notified when this data element is modified?" and "If this data element is stored in more than one data source, which system has final authority over the data element?" Unfortunately, the answers to these questions may be muddy and will likely change over time; just make a note of the muddiest areas and move on.
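Returning to capacity planning: the value sizes and occurrence counts discussed above can be combined into a rough storage estimate. The following is a minimal sketch, not a sizing tool. The sample values, the occurrence counts, and the entry count are assumptions chosen for illustration, and the calculation counts only raw value bytes (it ignores indexes, attribute names, and any per-entry overhead of a particular directory server).

```python
from dataclasses import dataclass


@dataclass
class ElementEstimate:
    """Rough sizing figures for one data element (all values are assumptions)."""
    name: str
    sample_value: str        # a representative value for this element
    values_per_entry: float  # typical number of occurrences per entry


def utf8_size(value: str) -> int:
    """Return the number of bytes the value occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


def estimate_bytes(elements, entry_count: int) -> int:
    """Estimate raw value storage: size x occurrences x number of entries."""
    per_entry = sum(utf8_size(e.sample_value) * e.values_per_entry for e in elements)
    return round(per_entry * entry_count)


# Hypothetical inventory: one user ID, two phone numbers, one common name.
elements = [
    ElementEstimate("uid", "bjensen", 1),
    ElementEstimate("telephoneNumber", "(408) 555-1234", 2),
    # "\u00e4" is a-umlaut: one character, but two bytes in UTF-8.
    ElementEstimate("cn", "B\u00e4rbel Jensen", 1),
]

for e in elements:
    print(e.name, len(e.sample_value), "characters,", utf8_size(e.sample_value), "bytes")
print("raw value bytes for 10,000 entries:", estimate_bytes(elements, 10_000))
```

Note that the character count and the byte count differ for the non-ASCII name, which is why the character set matters when estimating sizes.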
Consumers

The consumers of a data element are those directory-enabled applications that use it. When planning directory replication and topology, and when managing the relationships between your directory service and other data sources, it is helpful to know about the consumers of each data element. For example, a mail transfer agent (MTA) is a piece of application software whose job is to route electronic mail messages to their correct destinations. When processing email, MTAs may look at an attribute in a user's entry called mailHost, which gives the hostname of the server that holds the user's email. This is depicted in Figure 6.3.

Figure 6.3 An MTA and the mailHost attribute.

The MTA is thus an important consumer of the mailHost attribute, so it may be important that the MTA always gets the most up-to-date copy of the mailHost attribute that is available. Email client software might also use the same mailHost attribute to determine where a user's mail drop is, in which case this becomes a shared attribute. Similarly, users may be allowed to change their home telephone numbers in the directory service, and those changes may be propagated to a personnel system that stores its data in an Oracle database. This scenario is shown in Figure 6.4.

Figure 6.4 Home phone number propagated to personnel database

In this scenario the personnel system is a consumer of the home telephone number, but it is also a source for other non-LDAP applications that may access the phone number directly from the personnel system. If the home phone number is not a piece of data critical to the personnel system, it may be okay for information between it and the directory service to be updated fairly infrequently. If the phone number is critical to both the personnel system and the directory service, however, a process that accomplishes frequent synchronization may need to be developed.
If you compile detailed information about all the consumers of a data element, you can aggregate all the information into an estimate of how often a given data element will be accessed. Again, this information is useful when doing capacity planning for your directory service.

Dynamic Versus Static Data Elements

It is also helpful to know which data elements are dynamic (i.e., have values that change frequently) and which are static (i.e., have values that rarely or never change). You will need this information when designing your directory server topology as well as for capacity planning. For example, if you use a replicated directory service that allows writes for a given entry to occur on only one server (a single-master system), and you have a lot of attributes whose values change often, you may need to partition your data to avoid overwhelming any one master server with the write traffic.

One way to characterize the dynamic or static nature of attributes is by estimating the ratio of reads to writes for each data element. For example, a user ID may be written once when a student joins a university but read dozens of times each day as email is delivered; this attribute is static. In contrast, if a Web browser stores a user's personal bookmarks in the directory service, they may be changed once a day or more often; these attributes are dynamic.

Shared Versus Application-Specific Data Elements

Some data elements are used by many applications; others may be used by only one application. Data elements that are shared require careful planning so that the needs of all the applications are met adequately. On the other hand, if a data element is used by only one application and the data values are large or accessed frequently, you should consider keeping the values outside your directory service to avoid performance problems when accessing the values. You may want to include some guidelines for making this kind of decision in your data policy statement (as discussed earlier in this chapter).
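The consumer, read-to-write, and sharing characteristics described above can be recorded and summarized per data element. The sketch below is one possible way to do that. The application names, the traffic figures, and the threshold of 10 reads per write that separates "dynamic" from "static" are all assumptions made for illustration; choose a threshold that suits your own deployment.

```python
from dataclasses import dataclass


@dataclass
class ConsumerLoad:
    """Estimated daily traffic that one application generates for one element."""
    application: str
    reads_per_day: float
    writes_per_day: float


def summarize(loads, dynamic_threshold: float = 10.0):
    """Aggregate per-consumer estimates into totals and simple labels.

    An element is labeled "dynamic" when its read-to-write ratio falls below
    dynamic_threshold (an arbitrary, hypothetical cutoff), and "shared" when
    more than one application uses it.
    """
    total_reads = sum(load.reads_per_day for load in loads)
    total_writes = sum(load.writes_per_day for load in loads)
    ratio = total_reads / total_writes if total_writes else float("inf")
    return {
        "reads_per_day": total_reads,
        "writes_per_day": total_writes,
        "read_write_ratio": ratio,
        "nature": "dynamic" if ratio < dynamic_threshold else "static",
        "shared": len({load.application for load in loads}) > 1,
    }


# Hypothetical figures for two elements mentioned in the text.
mail_host = [
    ConsumerLoad("MTA", reads_per_day=40, writes_per_day=0),
    ConsumerLoad("email client", reads_per_day=10, writes_per_day=0),
    ConsumerLoad("admin tool", reads_per_day=0, writes_per_day=0.01),
]
bookmarks = [ConsumerLoad("web browser", reads_per_day=5, writes_per_day=1)]

print("mailHost:", summarize(mail_host))
print("bookmarks:", summarize(bookmarks))
```

With these assumed figures, mailHost comes out as a static, shared element and the bookmarks as a dynamic, application-specific one, matching the way the text classifies them.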
Tip: One thing to watch out for if you conclude that a data element is application-specific is that, over time, new applications may come online that also use the data element. Because your directory service is a great place to put shared data, when in doubt assume that a data element will be shared.

Note that even if data elements are not shared by more than one directory-enabled application, it may make sense to store them together to ease manageability or improve the availability of the information (through directory replication). For example, it may be desirable to delete a person's email-related data elements when the person is deleted from the corporate phonebook. The easiest approach is to store the email-related elements in the directory service along with all the person's contact and other information. That way, you won't need to delete the information in both the directory and the mail system to delete a user's record.

Relationship with Other Data Elements

When selecting schema and laying out the namespace for your directory service, it is useful to know how your data elements are related. Because directory entries typically represent real-world objects, it is important to know which data elements relate to the same kind of object. For example, if you have an entry in your directory service for each printer attached to your network, you want to make it easy for an application to find all the printer-related data elements. You can accomplish this by choosing a schema that defines an all-inclusive printer object (see Chapter 7 for more information on schema).

Some relationships between data elements are more subtle and may be easily overlooked. For example, if you will use the directory service to determine who used a printer in the last 24 hours, there is a need to relate information about some users to the printer's entry. This need could be addressed by including in the printer entries a set of data elements that refer to user entries.
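The idea of printer entries that refer to user entries can be illustrated with a toy in-memory model. This is only a sketch of the relationship, not how any particular directory server stores it. The distinguished names, the printer name, and the recentUser attribute are hypothetical; a real deployment would use whatever attribute its chosen schema defines for this purpose.

```python
# Toy in-memory directory keyed by distinguished name (DN). All DNs, names,
# and the "recentUser" attribute are hypothetical, for illustration only.
DIRECTORY = {
    "uid=bjensen,ou=People,dc=example,dc=com": {"cn": ["Babs Jensen"]},
    "uid=tmorris,ou=People,dc=example,dc=com": {"cn": ["Ted Morris"]},
    "cn=laser1,ou=Printers,dc=example,dc=com": {
        "cn": ["laser1"],
        # Each value refers to a user entry by its DN.
        "recentUser": [
            "uid=bjensen,ou=People,dc=example,dc=com",
            "uid=tmorris,ou=People,dc=example,dc=com",
            "uid=gone,ou=People,dc=example,dc=com",  # dangling reference
        ],
    },
}


def recent_user_names(directory, printer_dn):
    """Follow the printer's DN-valued references and return the users' names."""
    printer = directory[printer_dn]
    names = []
    for user_dn in printer.get("recentUser", []):
        user = directory.get(user_dn)
        if user is None:
            continue  # the referenced entry no longer exists; skip it
        names.extend(user.get("cn", []))
    return names


print(recent_user_names(DIRECTORY, "cn=laser1,ou=Printers,dc=example,dc=com"))
```

The dangling reference shows one cost of this kind of relationship: when a user entry is deleted, something must also clean up the references that point to it.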
A Data Element Characteristics Example

Suppose that we work at a large university with a great variety of installed electronic mail systems. (See Chapter 24, "Case Study: A Large University," for a related example.) Electronic mail is often a major factor affecting the expenditure of information technology dollars, and we would like to show our boss the value of our new directory service. To do this we decide to develop as our first directory-enabled application a service that reroutes all email entering the university from the Internet to the correct system. The basic setup is shown in Figure 6.5.

Figure 6.5 The Business Card Email Service.

By configuring our domain name system correctly, we arrange for all electronic mail messages sent to STRING@bigu.edu to arrive on the machine called redirector.bigu.edu. This machine runs a customized copy of the sendmail software that searches the directory service running on ldap.bigu.edu for a user's entry using criteria constructed from STRING. For example, if a message is addressed to babs.jensen@bigu.edu, a search with an LDAP filter of cn=babs jensen is performed. If an entry is found, the email message is re-sent to the user's mail delivery address, which is typically a mail server in the individual's department or school within the university. We christen this service the "Business Card Email Service," or BCES, because now people can safely put one centrally managed email address on their business card that will not change even if they switch departments within the university.

One of the first design questions we need to answer is "What data elements do we need and what are their essential characteristics?" Table 6.4 provides one potential answer (the real answer depends on what application software we use to develop and deploy our new service).

Table 6.4. Data element characteristics example
Note that some characteristics are missing from Table 6.4. We don't include any information on how dynamic each data element is because we believe all the data elements hold data values that do not change very often. Also, because we are focused on just one application, we have not yet needed to think about which of these data elements will be shared with other directory-enabled applications.
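The lookup step in the BCES example, turning the STRING part of an address into a search filter, can be sketched as follows. This is a minimal illustration and not the customized sendmail logic itself. The function names are hypothetical, and the rule that dots in the local part become spaces is an assumption inferred from the babs.jensen example. The filter is written with enclosing parentheses, which is the standard string form of an LDAP filter, and special characters are escaped so that an address cannot inject wildcard or grouping syntax into the search.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special inside an LDAP search filter value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def address_to_filter(address: str, domain: str = "bigu.edu") -> str:
    """Build a cn filter from the local part of an address in the given domain.

    Assumption: dots in the local part stand for spaces in the person's name.
    """
    local_part, separator, address_domain = address.rpartition("@")
    if not separator or not local_part or address_domain.lower() != domain:
        raise ValueError(f"not an address in {domain}: {address!r}")
    name = local_part.replace(".", " ")
    return f"(cn={escape_filter_value(name)})"


print(address_to_filter("babs.jensen@bigu.edu"))
```

Because cn values are compared without regard to case, the sketch does not need to normalize the case of the local part before searching.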
2002, O'Reilly & Associates, Inc.