6.3 ID Requirement Pattern

Basic Details

Related patterns:	Data type, data structure
Anticipated frequency:	Between one and six requirements
Pattern classifications:	Affects database: Yes

Applicability

Use the ID pattern to define a scheme for assigning unique identifiers for some type of entity or to indicate that a data item (or combination of data items) can be used as a unique identifier.

Discussion

We give IDs to things to unambiguously identify them so that we can go directly to whatever an ID refers to. IDs also let us prevent duplicates (such as two customers with the same ID). IDs look simple, but they have a few subtle complexities. It's not good enough to say "customers shall be identified by a six-digit number"; more needs to be said.

An ID can either be added solely for the purpose of identification (such as a customer number, which tells us nothing useful about a customer per se), or it can be a piece of information that's present anyway and happens to be suitable for identification (such as a customer's email address). Write an ID requirement for each type of ID you actually need, not for values that could be used as IDs but that you have no reason to use that way.

The form of an ID can be either a simple data type (such as a number or name), or it can comprise multiple parts (for example, an order ID made up of a customer number plus a sequential order number for that customer). Multipart IDs are sometimes needed to uniquely identify an entity, or they may simply be more convenient (easier to understand, allocate or manage). Extra ID parts can be used to extend the scope of uniqueness. For example, if we tack on our company's Web site address, we can create IDs that are unique worldwide (as email addresses are). Conversely, a single part of a multipart ID is unique only within a narrower scope. For example, if we allocate order IDs as customer number plus a sequential number, each sequential number is unique within the scope of a single customer, and it will suffice on its own if we confine ourselves to that scope. The audiences for IDs needn't always be aware of every part of an ID. For example, when showing order numbers to a customer, we needn't show the customer number part.
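To make the scope argument concrete, here is a minimal Python sketch of the order ID just described: a customer number plus a sequence number that restarts at one for each customer. The class and method names are hypothetical, and it assumes customer numbers are integers and that counters are held in memory; a real system would persist them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderId:
    """Customer number plus an order number that is unique per customer."""
    customer_number: int
    order_number: int

    def full_form(self) -> str:
        # Both parts together are unique across the whole system.
        return f"{self.customer_number}-{self.order_number}"

    def customer_view(self) -> str:
        # Within one customer's scope, the order number alone suffices.
        return str(self.order_number)


@dataclass
class OrderIdAllocator:
    """Allocates order numbers starting at 1, separately for each customer."""
    last_allocated: dict[int, int] = field(default_factory=dict)

    def allocate(self, customer_number: int) -> OrderId:
        next_number = self.last_allocated.get(customer_number, 0) + 1
        self.last_allocated[customer_number] = next_number
        return OrderId(customer_number, next_number)


allocator = OrderIdAllocator()
first = allocator.allocate(10762)
second = allocator.allocate(10762)
other = allocator.allocate(555)  # order number 1 again, but a different customer
print(first.full_form(), second.customer_view(), other.full_form())  # 10762-1 2 555-1
```

Note that `first` and `other` share the order number 1; only the customer number part keeps them apart once you step outside a single customer's scope.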

The predominant data types used for IDs (or parts of IDs) are characters (strings) and numbers; dates and times of day come a distant third. If you're fortunate enough to be free to choose the most suitable form for a particular type of ID, how do you decide? The key factors to weigh up are:

Factor 1: Uniqueness. Containing enough information to pick out the single entity it refers to.
Factor 2: Meaningfulness. Making inherent sense to a person (as names do), if IDs are going to be visible to people (and they usually are).
Factor 3: Conciseness. Needing as few characters or digits as possible.
Factor 4: Memorability. Being easy for a person to remember. Meaningfulness and conciseness contribute here.
Factor 5: Simplicity. Inflicting as few distinct parts on people as possible. This isn't the same thing as avoiding multipart IDs, as the earlier discussion of multipart IDs showed. Rather, it means making as few parts visible as possible.
Factor 6: Who allocates. A person or a machine? It's easier for a person to devise a name and easier for a machine to allocate a number.
Factor 7: Connection to other IDs. Basing an ID on another ID that's already known. This is convenient when the base ID is known to the user. For example, banks don't inflict on their customers huge transaction numbers that are unique across the whole system. Rather, they allocate sequential numbers for each customer, which become unique within the bank when tacked on to the customer ID.
Factor 8: Flexibility. Being able to handle IDs of unknown type or of several different types. This might occur if we must accommodate IDs supplied by an external system or by several different external systems.

Some of these factors go hand-in-hand; others conflict with one another. In particular, the uniqueness factor tends to conflict with the others.

Content

An ID requirement should contain the following:

Owner entity name. To what are we allocating IDs of this kind? Examples might be customer and employee. In rare situations, we may want more than one type of entity to share the same ID scheme. For example, we could have transaction IDs that are used for both customer orders and defective item returns. Do this if there's a genuine business need, but avoid it otherwise.
ID name. Needed if the owner entity has more than one ID, to clearly distinguish one from another.
ID form. Restrict an ID requirement to what's actually needed; don't make decisions about the form of IDs if you don't have to. Say what you need to achieve, and no more. One reason is that allocating IDs can have a performance impact, especially if all IDs must be allocated by a single allocator (which would be a bottleneck). If you can, give developers the freedom to allocate IDs in the most efficient way.

For IDs that contain characters, state whether they're case-sensitive. Is "MAC" the same as "Mac"? IDs that aren't case-sensitive are usually better, because they avoid confusion.
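If IDs aren't case-sensitive, the system has to compare them in a case-insensitive way everywhere, not just on input. A minimal sketch, assuming IDs are held in an in-memory dictionary (the class name is hypothetical), keys each ID on its casefolded form while keeping the form the user first typed for display:

```python
from typing import Optional


class CaseInsensitiveIds:
    """Treats "MAC", "Mac" and "mac" as the same ID."""

    def __init__(self) -> None:
        # Maps each casefolded ID to the form in which it was first registered.
        self._ids: dict[str, str] = {}

    @staticmethod
    def _key(raw_id: str) -> str:
        # casefold() is a stronger lower() designed for caseless comparison.
        return raw_id.casefold()

    def register(self, raw_id: str) -> bool:
        """Register an ID; return False if it clashes with an existing one."""
        key = self._key(raw_id)
        if key in self._ids:
            return False
        self._ids[key] = raw_id
        return True

    def lookup(self, raw_id: str) -> Optional[str]:
        """Return the ID as originally entered, or None if it is unknown."""
        return self._ids.get(self._key(raw_id))


ids = CaseInsensitiveIds()
print(ids.register("MAC"))  # True
print(ids.register("Mac"))  # False: same ID, different case
print(ids.lookup("mac"))    # MAC
```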
Scope of uniqueness. What are the bounds of the context within which an ID of this kind is unique? Give this some thought, and don't make the scope of uniqueness too small, because a scope that's too narrow can cause awkward problems later on. For example, if you install multiple instances of your system that each allocate sequential customer numbers, and you later want to merge them all into a single database, you'll need something in addition to the customer numbers to distinguish the customers. You could include this something (perhaps a company ID or office ID) in the scope of uniqueness from the outset.

If the scope of uniqueness isn't stated in a requirement, assume that the ID is to be unique for entities of this type within the scope of the system but not beyond it. Take note of that "of this type" qualifier, because it means we can have the same ID for different kinds of things, which is usually no problem because they're unlikely to be confused. For example, we could use "JPY" as the ID of currency details about the Japanese Yen and also as the ID of an employee with these initials. If you want to have only one entity with the ID "JPY," you must define the scope of uniqueness such that it spans every type of entity affected.
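The difference between per-type and system-wide scope shows up clearly in a small sketch. This hypothetical registry (an illustration, not a prescribed design) keeps one pool of used IDs per entity type by default, or a single shared pool when the scope spans every type:

```python
class IdRegistry:
    """Checks ID uniqueness per entity type, or across all types if asked."""

    SHARED_SCOPE = "*all types*"

    def __init__(self, scope_spans_all_types: bool = False) -> None:
        self.scope_spans_all_types = scope_spans_all_types
        # One set of used IDs per scope.
        self._used: dict[str, set[str]] = {}

    def register(self, entity_type: str, entity_id: str) -> bool:
        """Return True if the ID was free within its scope, else False."""
        scope = self.SHARED_SCOPE if self.scope_spans_all_types else entity_type
        used = self._used.setdefault(scope, set())
        if entity_id in used:
            return False
        used.add(entity_id)
        return True


per_type = IdRegistry()
print(per_type.register("currency", "JPY"), per_type.register("employee", "JPY"))
# True True: same ID, different entity types

system_wide = IdRegistry(scope_spans_all_types=True)
print(system_wide.register("currency", "JPY"), system_wide.register("employee", "JPY"))
# True False: only one entity may hold "JPY"
```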
How allocated. IDs don't appear magically out of thin air, so where do you want them to come from? There are three main ways IDs can be allocated, each with its own advantages and drawbacks:
1. Automatically by the system. Here, you could describe how the system is to allocate IDs, or you could leave that as a design question. The most common approach is sequential numbers, in which case you should consider the first value (start at one?), whether the sequence should ever be reset (and, if so, when), how big the number must be to avoid the risk of overflow, and what should happen if it does overflow (it mightn't bear thinking about!).
2. User choice. Let the user type in the value they'd like to use as the ID. The system must cater for them entering a value that's already used as an ID. When this happens in the case of a user ID, it's common practice to ask them to choose a password at the same time and to say there's a problem with either the user ID or password, so as not to alert the user if they stumble upon someone else's user ID. (It doesn't fool an alert user, though.)
3. From an external source, such as another system. A value that a user possesses and is asked to type in (for example, a credit card number) also counts as coming from an external source, but there is the extra risk of the user entering it incorrectly. So validate it thoroughly before using it as an ID.
There are variations that look like combinations of these ways, but they usually boil down to one of the three. For example, the system could generate a suggested ID and then let the user modify it, which boils down to user choice. Or the system might create an ID based on the user's choice (if the chosen value is already in use, say), which boils down to one of the first two, depending on whether we give the user the chance to modify the ID the system created.
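For system-allocated sequential numbers, the questions in the first item (first value, size, overflow) map directly onto code. A minimal sketch, assuming IDs must fit in a fixed number of decimal digits and that overflow should be refused outright rather than wrapped around (both are assumptions; your requirement decides). The class name is hypothetical:

```python
class SequentialIdAllocator:
    """Allocates sequential numeric IDs that must fit in a fixed number of digits."""

    def __init__(self, first: int = 1, digits: int = 6) -> None:
        self.max_value = 10 ** digits - 1  # 999999 for six digits
        if not 0 < first <= self.max_value:
            raise ValueError("first value must be positive and fit in the digits")
        self._next = first

    def allocate(self) -> int:
        if self._next > self.max_value:
            # Refuse rather than wrap around to 1 and reissue existing IDs.
            raise OverflowError(f"no IDs left above {self.max_value}")
        value = self._next
        self._next += 1
        return value


allocator = SequentialIdAllocator(first=998, digits=3)
print(allocator.allocate(), allocator.allocate())  # 998 999
try:
    allocator.allocate()
except OverflowError as error:
    print(error)  # no IDs left above 999
```

Configuring a deliberately small number of digits like this is also a cheap way to exercise overflow handling, as suggested under Considerations for Testing.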

Just for good measure, there are a couple of obscure ID allocation factors to keep an eye open for. First, hidden significance: might an ID's value reveal something you'd rather it didn't? For example, if a customer is allocated customer number 0000012, you're revealing you've done little business and the customer might trust you less. Second, continuity: if an old system allocates IDs in a particular way, doing it differently in a new system might have unexpected consequences, even if it functions perfectly. For example, veteran employees in some companies take pride in having a low employee number; a new system that allocates employee numbers in a different way could be resented. Your ID requirement might respond by demanding that each ID allocated must be higher than all previous IDs.
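The continuity demand that every new ID be higher than all previous ones amounts to seeding the new allocator from the old data. A minimal sketch with a hypothetical function name; the optional `floor` argument is an assumption added to show how you might also avoid the hidden significance of a tiny starting number:

```python
def first_new_id(legacy_ids: list[int], floor: int = 1) -> int:
    """Return the first ID the new system should allocate.

    The result is higher than every ID the old system issued. The optional
    floor (an illustrative assumption) lets you start higher still, so that
    early IDs don't reveal how few entities exist.
    """
    highest_legacy = max(legacy_ids, default=0)
    return max(highest_legacy + 1, floor)


print(first_new_id([17, 4203, 988]))          # 4204
print(first_new_id([17, 4203, 988], 100000))  # 100000
print(first_new_id([], floor=100000))         # 100000
```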
Display format. As per the data type requirement pattern. This is usually warranted only for complex ID forms, but you could use it to state, say, that numbers are to be shown with leading zeroes removed (or present, up to a specified number of digits, if you're so inclined). If an ID has multiple parts, how should it be displayed to people to make each of the parts clear? For example, we could add separators such as dashes and dots. Also consider describing how a multipart ID is to be typed in on a screen: should it be divided into multiple input fields?
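A display format and an input format for a multipart ID often come as a pair. This sketch assumes a hypothetical format of customer number, a dash, and an order number padded with leading zeros to three digits, and accepts the whole ID typed into a single field:

```python
import re

# Hypothetical input format: digits, a dash, digits, with stray spaces allowed.
ORDER_ID_INPUT = re.compile(r"\s*([0-9]+)\s*-\s*([0-9]+)\s*")


def format_order_id(customer_number: int, order_number: int) -> str:
    """Display form: order part padded with leading zeros to three digits."""
    return f"{customer_number}-{order_number:03d}"


def parse_order_id(typed: str) -> tuple[int, int]:
    """Parse an order ID typed into one input field, tolerating stray spaces."""
    match = ORDER_ID_INPUT.fullmatch(typed)
    if match is None:
        raise ValueError(f"not a valid order ID: {typed!r}")
    return int(match.group(1)), int(match.group(2))


print(format_order_id(61, 1))       # 61-001
print(parse_order_id(" 61 - 001"))  # (61, 1)
```

Because the parser reads each part as a number, "61-1" and "61-001" identify the same order, which is one reasonable choice; your requirement might instead insist on the padded form.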
Sort order. State this if it differs from the obvious, or if you want to prevent the development team from doing something silly. This applies especially to multipart IDs. For example, you may want "61-001" to come before "123-001," which is likely to happen only if sorting treats the ID as two separate numbers. Avoid sort orders that aren't intuitive or that involve interpreting the meaning of any part (such as extracting the date meaning from a form like "21AUG12").
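Here is the difference between plain string sorting and part-by-part numeric sorting, using the IDs from the paragraph above plus two made-up ones. The key function is a sketch that assumes dash-separated, all-numeric parts:

```python
def multipart_sort_key(order_id: str) -> tuple[int, ...]:
    """Sort a dash-separated ID by the numeric value of each part in turn."""
    return tuple(int(part) for part in order_id.split("-"))


ids = ["123-001", "61-001", "61-010", "61-002"]
print(sorted(ids))
# ['123-001', '61-001', '61-002', '61-010']  (character by character: "1" < "6")
print(sorted(ids, key=multipart_sort_key))
# ['61-001', '61-002', '61-010', '123-001']  (61 < 123)
```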

When sorting on textual values, consider whether you need to worry about the sort order of characters used in languages other than yours. For example, if it's important that "é" and "ë" are sorted as if they are next to "e" (or the same as "e"), say so.
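One simple way to sort accented characters next to their unaccented forms is to decompose each string and drop the combining marks when building the sort key. This is a sketch only: it also ignores case (an assumption in line with case-insensitive IDs), and it ignores language-specific rules (some languages sort certain accented letters separately), for which a locale-aware collation library is the better tool. The names in the usage are made up:

```python
import unicodedata


def accent_insensitive_key(text: str) -> tuple[str, str]:
    """Sort "é" and "ë" next to "e": compare with accents stripped first,
    then fall back to the original text so the order is deterministic."""
    decomposed = unicodedata.normalize("NFD", text)  # "é" -> "e" + combining accent
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


names = ["Ebert", "Zola", "Éclair", "Eder", "Eëk"]
print(sorted(names))
# ['Ebert', 'Eder', 'Eëk', 'Zola', 'Éclair']  (code-point order puts "É" after "Z")
print(sorted(names, key=accent_insensitive_key))
# ['Ebert', 'Éclair', 'Eder', 'Eëk', 'Zola']
```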
Reuse conditions, if necessary. When an entity expires, you may want to be able to reuse its ID. For example, if an employee identified by their email address "chris@ourco.com" departs, a new employee called Chris might want this email address. Avoid reusing IDs if possible, because it opens the door to ambiguity (and consequently mistakes): the new Chris might carry the can for the mistakes of the old one. But if the system must permit reuse, say so, and impose conditions to minimize the chances of the wrong entity being identified. If no reuse conditions are stated, it's reasonable to assume that IDs of this type won't be reused.
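If reuse must be allowed, one possible condition is a quarantine period: a released ID can't be reissued until a set time has passed. The sketch below assumes that condition purely for illustration (the class name and the one-year period in the usage are hypothetical); your requirement should state whatever conditions suit your business:

```python
from datetime import date, timedelta


class ReusableIdPool:
    """Allows a released ID to be reissued only after a quarantine period."""

    def __init__(self, quarantine: timedelta) -> None:
        self.quarantine = quarantine
        self._in_use: set[str] = set()
        self._released_on: dict[str, date] = {}  # most recent release date per ID

    def can_assign(self, entity_id: str, today: date) -> bool:
        if entity_id in self._in_use:
            return False
        released = self._released_on.get(entity_id)
        return released is None or today - released >= self.quarantine

    def assign(self, entity_id: str, today: date) -> None:
        if not self.can_assign(entity_id, today):
            raise ValueError(f"{entity_id!r} is in use or still in quarantine")
        self._in_use.add(entity_id)

    def release(self, entity_id: str, today: date) -> None:
        self._in_use.discard(entity_id)
        self._released_on[entity_id] = today


pool = ReusableIdPool(quarantine=timedelta(days=365))
pool.assign("chris@ourco.com", date(2020, 1, 6))
pool.release("chris@ourco.com", date(2023, 6, 30))
print(pool.can_assign("chris@ourco.com", date(2023, 12, 1)))  # False: 154 days
print(pool.can_assign("chris@ourco.com", date(2024, 7, 1)))   # True: 367 days
```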

Template(s)


Summary	Definition
«Owner entity name» [«ID name»] ID	Each «Owner entity name» shall have a unique ID that is in the form of «ID form» allocated by «How allocated». [«Display format».] [Each «ID name» shall be unique «Scope of uniqueness».] [«Sort order statement».] [«Reuse conditions statement».]

Example(s)


Summary	Definition
Customer number, with check digit	Each customer shall be uniquely identified by a customer ID that is in the form of a number allocated sequentially plus a check digit calculated over that sequential number using the «algorithm name» algorithm (as explained at «algorithm location»).
Order ID	Each order shall be uniquely identified by an order ID that is in the form of the number of the customer that placed it plus an order number allocated sequentially for that customer, starting at one for the customer's first order. Order IDs shall be displayed in the form "«Customer number»-«Order number»" (for example, "10762-1").
Employee ID	Each employee shall have an employee ID that is a five-digit number allocated externally (and entered manually when an employee's details are first entered). Each employee ID shall be unique within the scope of the system; even if an employee departs, their employee ID shall not be reused for another employee.
Loan approval decision rule ID	Each loan approval decision rule shall have an ID by which it can be referred. Rule IDs shall be allocated in such a manner as to not become invalid or incorrect when another rule is added or removed. For example, a rule's sequence in a list of rules cannot be used because that will change if an extra rule is inserted before it.
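The customer number example leaves the check digit algorithm open. As an illustration only, here is a sketch using the Luhn (mod 10) algorithm, a widely used choice that catches any single mistyped digit and most swaps of adjacent digits; it is not necessarily the algorithm such a requirement would name, and the function names are hypothetical:

```python
def luhn_check_digit(digits: str) -> int:
    """Compute the Luhn (mod 10) check digit for a string of decimal digits."""
    total = 0
    # Working from the right, double every other digit, starting with the
    # rightmost one (the check digit will be appended to its right).
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 0:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return (10 - total % 10) % 10


def with_check_digit(sequential_number: int) -> str:
    """Build a customer number: the sequential number plus its check digit."""
    digits = str(sequential_number)
    return digits + str(luhn_check_digit(digits))


def is_valid(customer_number: str) -> bool:
    """Validate a typed-in customer number before using it to look anything up."""
    if len(customer_number) < 2 or not (customer_number.isascii() and customer_number.isdigit()):
        return False
    return luhn_check_digit(customer_number[:-1]) == int(customer_number[-1])


print(with_check_digit(10762))  # 107623
print(is_valid("107623"))       # True
print(is_valid("107263"))       # False: two adjacent digits swapped
```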

Extra Requirements

An ID requirement is usually self-contained. But extra requirements might be needed for the following couple of topics:

Rules to be followed for every "invisible" ID scheme that developers choose to add. An "invisible" ID is one that is used internally by the system in order to function but that is not normally visible to users. For example, if we wanted to give customers the option of changing their customer ID, we might assign them a second, invisible customer ID that never changes. We can't demand anything for a particular ID scheme that's not itself mentioned in requirements, but we can write requirements to apply to all invisible IDs (or more widely to all IDs).
Continuity of IDs from a system we're replacing. If our system is taking over from a previous system, we'll have to import all its old data-including all its old IDs. Once our system goes live, it needs to take over from where the old system left off (or, at least, it must start allocating new IDs sensibly, without breaking anything). What must we do to achieve this? What pitfalls do we want to avoid? This can be a significant issue if we're replacing multiple old systems that used different ID schemes.

Here's an example pervasive requirement of the first kind, but one that covers all IDs (not just invisible IDs):


Summary	Definition
All IDs viewable	Every ID that can be used to identify an entity shall be viewable by some means, regardless of whether that ID is intended to be seen by normal users. The purpose of this requirement is to assist developers, testers, and auditors in examining the workings of the system.

Considerations for Development

It's possible to add extra ID schemes beyond those specified in requirements, and it's sometimes necessary. In particular, it can be useful to add "invisible" IDs that users need never see. If you use these "invisible" IDs for most processing, less reliance is placed on "visible" IDs, which makes it easier to change them. Conversely, if it is possible to change an ID, you're likely to need unchanging "invisible" IDs to hold everything together.
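A minimal sketch of this idea, assuming an in-memory store and using a random UUID as the invisible ID (the class names are hypothetical): other records refer to the invisible ID, so changing a customer's visible ID touches only one index.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Customer:
    visible_id: str  # the customer ID users see; may be changed
    name: str
    # Invisible ID: generated once, never shown to users, never changed.
    internal_id: uuid.UUID = field(default_factory=uuid.uuid4)


class CustomerStore:
    def __init__(self) -> None:
        self._by_internal: dict[uuid.UUID, Customer] = {}
        self._internal_by_visible: dict[str, uuid.UUID] = {}

    def add(self, customer: Customer) -> None:
        if customer.visible_id in self._internal_by_visible:
            raise ValueError("visible ID already in use")
        self._by_internal[customer.internal_id] = customer
        self._internal_by_visible[customer.visible_id] = customer.internal_id

    def find_by_visible(self, visible_id: str) -> Customer:
        return self._by_internal[self._internal_by_visible[visible_id]]

    def change_visible_id(self, old: str, new: str) -> None:
        # Everything else refers to internal_id, so only this index changes.
        if new in self._internal_by_visible:
            raise ValueError("visible ID already in use")
        internal_id = self._internal_by_visible.pop(old)
        self._by_internal[internal_id].visible_id = new
        self._internal_by_visible[new] = internal_id


store = CustomerStore()
acme = Customer(visible_id="C100", name="Acme Ltd")
store.add(acme)
order_owner = acme.internal_id  # an order would refer to this, not to "C100"
store.change_visible_id("C100", "ACME")
print(store.find_by_visible("ACME").internal_id == order_owner)  # True
```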

Martin Fowler's Analysis Patterns (1996) has an identification scheme analysis pattern.

Considerations for Testing

Test the overflow of sequentially allocated numbers. That is, contrive to reach the highest possible ID and then see what happens when you attempt to go past it.

Is it possible for unusual or special values to be used as IDs, such as zero or all spaces? If so, are they properly handled? If not, what happens if you try to use these values as IDs?
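Special-value tests are easier to write when the rules live in one validation function. This sketch assumes IDs are positive numbers of up to six digits, that surrounding spaces in typed input may be stripped, and that zero is reserved; all three are illustrative assumptions, not rules from this pattern, and the function name is hypothetical:

```python
def validate_numeric_id(raw: str, max_digits: int = 6) -> int:
    """Reject empty, blank, zero and malformed values before use as an ID."""
    text = raw.strip()  # assumption: surrounding spaces in typed input are harmless
    if not text:
        raise ValueError("ID is empty or all spaces")
    if not (text.isascii() and text.isdigit()) or len(text) > max_digits:
        raise ValueError(f"ID must be 1 to {max_digits} digits: {raw!r}")
    value = int(text)
    if value == 0:
        raise ValueError("zero is reserved and cannot be used as an ID")
    return value


for candidate in ["42", "   ", "", "0", "000", "12a", "1234567"]:
    try:
        print(repr(candidate), "->", validate_numeric_id(candidate))
    except ValueError as error:
        print(repr(candidate), "-> rejected:", error)
```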

If an ID allocated to an expired entity can later be reallocated to a different entity, scrutinize this situation closely. Do the two entities ever get mixed up? Make sure information about one isn't presented as if it belonged to the other.