6.1 Data Type Requirement Pattern

Basic Details

Related patterns:	Unparochialness
Anticipated frequency:	Usually fewer than 10 requirements, but more if you wish to specify how common data types (such as dates) are to be displayed
Pattern classifications:	Pervasive: Maybe; Affects database: Yes

Applicability

Use the data type requirement pattern to define how a particular atomic item of information (a single field) for a particular business purpose is to be represented and/or displayed. Also use the data type requirement pattern to specify how a standard data type is always to be displayed (for example, all dates).

Discussion

Define a logical data type for each unit of information that serves a clear logical role in the system, such as product type code or company ID. Don't bother doing this for items of information that are so clear and simple that they need no further explanation, such as a count of transactions or plain text.

Writing requirements for logical data types isn't essential, but it's good practice. It promotes consistency across the system, saves repetition, and prevents developers arbitrarily making their own decisions on something that's very visible to users. A data type requirement's primary responsibility is to describe in detail the form of the information the data type must convey (for example, six alphabetic characters, or a date). It can also state how all occurrences of this type of data are to be displayed to the user. Sometimes describing the form is easy, and the bulk of the requirement deals with display.

Any data type defined in a requirement is a logical data type, so also define it in terms of what the customer needs: its business purpose, the role it plays. Even if two data types have the same form (say, a three-character code), if they serve different business purposes, they're still logically distinct and must be treated as such. Don't mention technical data types (as used in databases or programming languages): maintain independence from the technology, unless forced to by circumstances. In any case, a logical data type can be represented in multiple ways depending on the technology (for example, one way in the database and another in each programming language).

The table that follows lists the form of various common data types, in the self-evident terms best suited to nontechnical readers (and it's reasonable to assume you'll always have some of these), along with suggestions on what else to say about them (and what not to say), plus other things to consider. If you want to state precisely what a particular common data type means in the context of the current system, write a requirement for it. You can also use a requirement to describe extra characteristics a common data type needs in your system; a few possibilities follow.

Form	Suggestions and Extra Considerations
Characters	"Characters" on its own implies that a value can contain anything the user can type in using a keyboard. If you want to restrict this somewhat, you can instead refer to "alphabetic characters" or "alphanumeric characters." (Don't call them strings or varchars or other jargon.) Whenever a data type allows alphabetic characters, consider case: must it all be uppercase, all lowercase, or either? If either, is it case-sensitive (so "a" and "A" are treated as different) or case-insensitive ("a" and "A" are treated as the same)? When entering such values into a screen, you can usually prevent unacceptable case values from being entered. But this isn't possible for all sources of data (such as messages sent by another system, or data read in from flat files). If your system has any of these, consider what should be done if a letter arrives with the wrong case. Should it be automatically converted to the correct case, or should it be rejected? Extra characteristics to consider include which languages' character sets must be allowed: letters with accents, multibyte characters (for languages such as Chinese)? Allow special symbols to be entered? Bear in mind that some users are likely to have keyboards that contain different characters to yours (especially if they come along from anywhere in the world across the Web). It may be worth defining two "characters" data types: one narrow and one broad.
Number	"Number" on its own implies a whole number made up of the digits 0 through 9 (an integer, but don't call it that; also avoid programming terms such as float and double). If it's not a whole number, state how many decimal places you want to allow. (In commercial systems, numeric values that genuinely might have an arbitrary number of decimal places occur relatively rarely.) Some numbers are allowed to take negative values, and some aren't. Often it's clear from the context whether negatives are allowed, but it's usually a good idea to state explicitly in each place where a number is used whether it's signed or unsigned. A percentage is simply a type of number. So are numbers to bases other than ten (hexadecimal, binary), but it's unusual to find them mentioned in the requirements of commercial systems. Any value that has a type of unit associated with it is not a type of number. In particular, a monetary amount isn't a type of number, because every amount is denominated in a particular currency. But you can treat a number-with-unit as a plain number if you're sure (cross-your-heart-and-hope-to-die sure) that the unit will always, always, always have the same value in your system (for example, one currency).
List of values	If you list all the possible values a data type can have, you don't need to say how they're held (or stored). Nor should you, because that's a design decision. In any list of this kind, just give logical values. Find a word or two that sum up the meaning of that value, and then add a brief description of it. Don't invent either coded values for the system to use (that's a design matter) or the text values to show to the user (which you shouldn't tie the system to at this early stage and which will vary in a multilingual system anyway). When giving a list of allowed values in a requirement, include every value you know: never end it with "etc.," which is a cop-out that's likely to bite you later.
Yes or no	This is the simplest of all data types, but it's a little awkward because it's preferable not to inflict the word boolean on nontechnical readers and there's no clear alternative. Still, it can be helpful to list the possible values in the user's terms (yes or no, true or false, male or female, member or not, on our mailing list or not), because that's how it needs to be presented to users. Also consider whether there are more than two alternative values (such as maybe or don't know or unspecified), which means it's a list of values rather than a boolean.
Date	This indicates the value holds an actual calendar date (such as 19th March 2012). If you want to be able to support special values (such as "today" or "tomorrow"), describe what you need (for example, whether they're solely for the convenience of users entering dates, and then manipulated and stored in the form of an actual date).
Date and time	Use this for an instant in time (such as 4:13 a.m. on 19th March 2012). If you need values to record the time as well as the date, say so. If you just say "Date," it won't include the time. "Timestamp" is a good word to use to indicate the current date and time at which an event happened, but add this term to your glossary first. See the "Endless Fun with Dates and Times" sidebar later in this pattern for more things to consider.
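
The case-handling question raised in the "Characters" row (convert a wrongly cased value, or reject it?) can be sketched as follows. This is a minimal illustration only: the function name `normalize_code` and the `convert_case` switch are hypothetical, and the sketch assumes a data type restricted to unaccented uppercase letters.

```python
def normalize_code(raw: str, *, convert_case: bool) -> str:
    """Return an all-uppercase alphabetic code, or raise ValueError.

    convert_case=True  -> silently convert lowercase letters to uppercase.
    convert_case=False -> reject any value that is not already uppercase.
    """
    # Assumption: only the unaccented letters A-Z/a-z are acceptable.
    if not (raw.isascii() and raw.isalpha()):
        raise ValueError(f"not purely alphabetic: {raw!r}")
    if raw != raw.upper():
        if not convert_case:
            raise ValueError(f"wrong case: {raw!r}")
        return raw.upper()
    return raw
```

A screen might call this with `convert_case=True`, while a strict interface to another system might call it with `convert_case=False`; which behavior applies is exactly what the requirement should state.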

This table might give the impression that some of these forms are awkward to express, but when you write requirement text it's easy enough to phrase the surrounding words to make the whole description flow nicely. A few other data types that are often handled untidily are discussed in the unparochialness requirement pattern in Chapter 10, "Flexibility Requirement Patterns."

Most data types have a simple form. But a few are compound; that is, they comprise more than one part. Spell out compound data types in detail, especially the intricacies of how users are expected to type them in. Are separator characters needed to divide the different parts (such as dashes in a telephone number)? Conventions for separators can vary from one country to another (as discussed in the unparochialness requirement pattern). Entering a compound value in a single field can be quicker than having a field for each part (especially for power users), but using separate fields makes it easier to provide lists of allowed values and to report errors.

Check Digits

People can make mistakes when they type in values, especially numbers. One way to catch the majority of such mistakes (roughly 90 percent) is to tag on the end an extra one or two digits calculated from the number itself. These are called check digits, and if the check digit(s) entered by the user disagree with the calculation, ask the user to try again. The downside of check digits is that the user has more digits to enter (and perhaps to remember). Check digits are often used for things like customer numbers and account numbers, and if you need to introduce numbers like these, you could consider whether adding check digits would be worthwhile. But don't go overboard: use them sparingly.

For data types that include check digits (or wherever one part of the data is calculated from another), explain how the check digits are calculated. Either cite a document that describes the algorithm (you could put it in an informal part of the requirements specification and then refer to it, or put it in a separate requirement to save overloading the requirement), or describe the algorithm in the requirement itself. This pattern's examples include one for a number with check digits, which specifies its algorithm explicitly.

Consider whether your data type definition is parochial (that is, restricted in scope or outlook somehow). Is it tied geographically? Does it limit your system to use within only one country or state, or within one company or industry? If so, is that acceptable? Even if it is acceptable, might removing such a limitation open extra possibilities? Spell out any such limitations so that readers are aware of the possible implications. Many companies have built such a nice system for themselves that they've decided to sell it, only to find that it's a lot harder than they ever imagined, one reason being little company-specific data types that pervade the whole system. Making provision for wider use of this kind is straightforward when a system is first built, but difficult and expensive to change later, usually prohibitively so. (See the unparochialness requirement pattern for more on this subject.)

For some common data types you can find a standard that defines a suitable format (occasionally more than one acceptable format) and sometimes lists of allowed values, such as the ISO standards for currencies, countries, and languages. The comply-with-standard requirement pattern (in Chapter 5, "Fundamental Requirement Patterns") has example requirements for a few such standards. When mandating the use of a standard to define a data type, we end up with a requirement that combines the data type and comply-with-standard requirement patterns.

Endless Fun with Dates and Times

What could be more straightforward than asking a customer for their date of birth or than recording when a transaction occurred? Dealing with dates and times sounds deceptively easy, and as a result various pitfalls are often ignored and fallen into later. The most notorious case is the Y2K problem, which was a failure of requirements as much as anything else. Lest we forget! Following are a few traps to avoid by specifying suitable preventive requirements. (Some of these have representative requirements in the "Extra Requirements" subsection later in this section, and some have further complications discussed in the unparochialness requirement pattern.)

Time zones: Dates and times can be held using more complex structures that contain their own time zone. This gets around any confusion but causes extra processing: it introduces the need to keep converting between different time zones when displaying them, and with it new opportunities for errors. Time zone issues are more important for distributed systems, especially for Web-based ones that can be accessed from anywhere in the world.
Daylight saving time: Storing any timestamp (even if it's just the date part) according to a time zone that is subject to daylight saving time is asking for trouble. Any event that occurs in the hour before the clocks go back (assuming they go back by an hour) might look like it happened after events that occurred later. Avoid this sort of unpleasantness by always, always, always storing timestamps according to an unchanging time zone. The standard time zone for such purposes is UTC (Coordinated Universal Time), effectively what used to be called Greenwich Mean Time, and what pilots call Zulu.

The first "Extra Requirement" that follows covers this topic. But beware that even spelling out the reasons in detail (as that requirement does) may not be enough to get the message through to some nontechnical readers, who can still be worried about having times inflicted upon them according to a strange time zone, despite your reassuring them at every opportunity.
Date display formats: These vary around the world (dd/mm versus mm/dd, and so on) and need to be borne in mind if you have users in more than one place. People are usually reasonably tolerant when using a foreign system, but real problems could be caused if 3/12 is read as 3rd December when it's actually 12th March. If this is possible, add a note somewhere explaining the format used for dates. This is a matter of unparochialness.
External timestamps: Beware of timestamps that are allocated outside your system (by another system, or an unknown Web user's PC): don't trust them. If you want them, supplement them with timestamps generated by your own system. (Actually, this isn't strictly a data type matter, but this is a good place to mention it.)
Full year: Oh, and don't forget to store the full year for all dates. We don't want our descendants to endure a Y2100 problem. When you're storing dates in a database using its date-related data types, you should have nothing to worry about. But you might find dates stored in other places (for example, if you store copies of messages or formatted flat files, or if developers embed them in file or directory names).
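
The daylight saving trap can be made concrete with a small sketch. Everything here is an assumption for illustration: a local zone that is one hour ahead of UTC in summer, a changeover instant on 29 October 2023, and the helper name `local_wall_clock`. Two events are recorded forty minutes apart, straddling the moment the clocks go back.

```python
from datetime import datetime, timedelta, timezone

# Assumed local zone for illustration: UTC+1 in summer, UTC+0 in winter.
SUMMER_TIME = timezone(timedelta(hours=1))
WINTER_TIME = timezone(timedelta(hours=0))
# Assumed instant at which the clocks go back by one hour.
CLOCKS_GO_BACK = datetime(2023, 10, 29, 1, 0, tzinfo=timezone.utc)


def local_wall_clock(moment_utc: datetime) -> datetime:
    """Return the naive local time a wall clock would show at this instant."""
    zone = SUMMER_TIME if moment_utc < CLOCKS_GO_BACK else WINTER_TIME
    return moment_utc.astimezone(zone).replace(tzinfo=None)


first_utc = datetime(2023, 10, 29, 0, 30, tzinfo=timezone.utc)
second_utc = datetime(2023, 10, 29, 1, 10, tzinfo=timezone.utc)

first_local = local_wall_clock(first_utc)    # 01:30, before the change
second_local = local_wall_clock(second_utc)  # 01:10, after the change

print(first_utc < second_utc)      # True: UTC preserves the real order
print(first_local < second_local)  # False: local times reverse the order
```

Stored as local wall-clock times, the later event sorts first; stored in UTC, the order is preserved.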

Content

A data type requirement should contain:

Data type name: Give this data type a unique name that reflects its business role. Make it self-explanatory yet succinct, because it's likely to be referred to frequently.
Purpose: What's this data type used for? What role does it play, in business terms? Write a separate data type requirement for each purpose, even if two or more share the same form.
Form: The sort of values the data type needs to convey. Common forms are listed above. This is used to help decide the best way to manipulate (in program code) and store (in a database) values of this type, though those are design decisions to be left for later.
Display format: For both output and input. If values of this kind need to be displayed in a particular way, describe what it is. This typically means including separators to make it easier to read (dashes in credit card numbers, phone numbers, and suchlike). For some data types, the commonly accepted format varies from one place to another. If you want the system to accommodate such variations, explain what it needs to do. (See the unparochialness requirement pattern for more.)

You can also describe any other aspects of input that are relevant. A common one is presenting the user with a list of values from which to choose. If the number of options is large, ask yourself if there's a better way. For example, Web sites that ask for your address often confront you with a list of all the countries in the world, which I find unfriendly. Then the valid options might depend on another value already entered. If I've already said which country I'm from, don't present me with states for another country. Use the data type requirement to indicate how smartly you want the system to handle it, but bear in mind that the user interface technology might make it difficult (plain HTML, for example).
Constraints: Limits on values; for example, they must be positive, or within a given range, or the value's length must be within a given range. Avoid mentioning field lengths if possible, but if you do, try to stick to minimum lengths that must be accommodated (such as "email addresses at least 60 characters long must be allowed", so a system that allows more is acceptable). If your system might be used in multiple countries, be prepared to support longer values, especially for those that hold natural language text.

Constraints can specify other restrictions on allowed values, including the detailed form of any value. For example, we might want to insist that an email address has the correct form (in accordance with RFC 2822, the standard that defines it, since superseded by RFC 5322). For complex forms like that for email addresses, it's better to refer to the relevant standard than to attempt to enumerate constraints in the data type requirement, because a requirement doesn't have room to replicate the intricacies in a standard. Merely insisting that an email address contains one "@" isn't good enough, and trying to be more precise starts you on the path of duplicating the standard.

Constraints must be only those that apply to all occurrences of this data type: state constraints that apply in particular circumstances separately. For example, not allowing a savings account's balance to be negative is very different from saying that no monetary amount anywhere can be negative.
Special handling: Some data types need rules for particular ways in which they must always be treated. If so, describe what they are. For instance, a password must never be displayed and must be stored in an indecipherable form (hashed or encrypted), and a new value must be entered twice before it's accepted. A credit card number must be displayed with its last four digits hidden. Describing these kinds of special handling as part of the data type requirement saves repetition and demands that they apply systemwide.

Template(s)

Summary	Definition
«Data type name» data type	«Data type name»s, which are used for «Data type purpose», shall be of the form «Data type form». [«Display format statement».] [«Constraints statement».] [«Special handling statement».]

Example(s)

Summary	Definition
Email address length	The system shall allow email addresses at least 60 characters long.
Telephone number form	Employee telephone numbers shall be of the form:
	AA-LLLL-NNNN xEEEE
		where	AA	is the area code,
			LLLL	is the locality,
			NNNN	is the individual number,
		and	EEEE	is the extension.
Card number format	The card number of a customer membership card shall be a number of up to 16 digits, the last of which is a check digit. For a card number of the form N₁N₂N₃N₄N₅N₆N₇N₈N₉N₁₀N₁₁-N₁₂N₁₃N₁₄N₁₅C, the check digit C shall be calculated as follows:
	ODD	=	N₁ + N₃ + N₅ + N₇ + N₉ + N₁₁ + N₁₃ + N₁₅
	EVEN	=	N₂ + N₄ + N₆ + N₈ + N₁₀ + N₁₂ + N₁₄
	CHECK	=	(ODD * EVEN) + ODD + EVEN + 7
	C	=	CHECK modulo 10 (that is, the last digit of CHECK)
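
The check digit calculation in the card number example can be sketched in code as follows. The function names are hypothetical, and the sketch assumes the full 16-digit form only (15 digits N₁ to N₁₅ plus the check digit C); how shorter card numbers are to be treated is not stated in the example, so they are rejected here.

```python
def card_check_digit(digits: str) -> int:
    """Compute check digit C from the 15 leading digits N1..N15."""
    if len(digits) != 15 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 15 digits")
    values = [int(ch) for ch in digits]
    odd = sum(values[0::2])    # N1 + N3 + ... + N15
    even = sum(values[1::2])   # N2 + N4 + ... + N14
    check = (odd * even) + odd + even + 7
    return check % 10          # the last digit of CHECK


def is_valid_card_number(number: str) -> bool:
    """Return True if a 16-digit number ends with the correct check digit."""
    if len(number) != 16 or not (number.isascii() and number.isdigit()):
        return False
    return card_check_digit(number[:15]) == int(number[15])


print(card_check_digit("123456789012345"))       # 1
print(is_valid_card_number("1234567890123451"))  # True
```

For the digits 123456789012345, ODD is 34 and EVEN is 26, so CHECK is 884 + 34 + 26 + 7 = 951 and C is 1.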

Extra Requirements

Few data type requirements have a need for follow-on requirements. One situation is when you need to say something about the context of the data, when a data item's value alone doesn't tell you what it means. For example, for a date and time you need to know its time zone to know which moment in time it refers to. In a case like this, a follow-on requirement can nail down a context to apply systemwide, as the following example does. Note that the second paragraph is unusually detailed. This was a riposte to a nontechnical customer who wanted to insist on not using UTC and had difficulty understanding the trouble that could lead to.

Summary	Definition
Timestamps in UTC	All timestamps recorded by the system shall be in UTC (Coordinated Universal Time) when in some form of permanent storage. A "timestamp" is a record of the current time, as attached to transactions for the purpose of fixing when they occurred. The purpose of this is to avoid problems when switching to and from daylight saving time. (Without it, a transaction occurring just after the clocks went back will appear to have occurred before a transaction just before they went back, and, short of shutting the system down for an hour, nothing can be done to prevent any subsequent unpleasantness.) It is worth repeating that this requirement applies only to the form in which times are stored (in a database and other stores). It does not apply to the display of times to users; for that, see the following requirement. This requirement does not care what is the designated local time zone of the machine(s) and/or database product on which the system is run. It is simply saying that when timestamps finally hit the database, they must be according to UTC. The system is responsible for doing whatever it takes to achieve this. (In practice, it is anticipated to be a straightforward task.)

Here are a couple of other examples that define other systemwide aspects of date/time handling:

Summary	Definition
Show user times as per their time zone	Whenever a time (or date-and-time) value is shown to a user, it shall be according to the user's designated time zone. This requirement does not apply to (a) times obtained from an external source (for example, a data feed) from which the time zone cannot be discerned, or (b) times for events that clearly occurred (or will occur) at a remote place, for which its local time zone may be used (for example, when inquiring on times of concerts in a selected city). In circumstances where there is no known user, the system local time zone shall be used instead of the user's time zone. (For example, this might occur when the system automatically runs reports.) A user's "designated time zone" is the one they have specified as a personal preference or, in the absence of that, the system's local time zone.
Time zone of time obvious	Whenever a time is displayed, its time zone shall be obvious. This can be achieved either by displaying the time zone as part of the time, by displaying the time zone prominently elsewhere, or by using an implied time zone (provided it is obvious). A case where an implied time zone is obvious is when displaying the times of a user's previous actions: it should be clear they're according to the user's own time zone. It is acceptable to display commonly used acronyms for the time zones. (Note, however, that ambiguities might exist because there is no international standard for these acronyms, not even in ISO 8601, the standard for date/time formats. As a result, these acronyms should not be used when inputting times.)
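
The two display requirements above might be combined roughly as in the following sketch. The names `SYSTEM_ZONE`, `designated_zone`, and `display_time` are hypothetical, the system zone of five hours behind UTC is an arbitrary assumption, and fixed offsets stand in for real time zones to keep the example short. The offset is printed with every time so that its time zone is obvious without relying on acronyms.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed system local time zone (hypothetical: five hours behind UTC).
SYSTEM_ZONE = timezone(timedelta(hours=-5))


def designated_zone(user_preference: Optional[timezone]) -> timezone:
    """The user's preferred zone or, in its absence, the system zone."""
    return user_preference if user_preference is not None else SYSTEM_ZONE


def display_time(stored_utc: datetime,
                 user_preference: Optional[timezone] = None) -> str:
    """Format a stored UTC timestamp in the viewer's designated zone."""
    if stored_utc.tzinfo is None:
        raise ValueError("stored timestamps must carry their UTC zone")
    local = stored_utc.astimezone(designated_zone(user_preference))
    # %Z prints the offset (for example "UTC+09:00") next to the time.
    return local.strftime("%Y-%m-%d %H:%M %Z")


stored = datetime(2012, 3, 19, 4, 13, tzinfo=timezone.utc)
print(display_time(stored, timezone(timedelta(hours=9))))
# 2012-03-19 13:13 UTC+09:00
print(display_time(stored))
# 2012-03-18 23:13 UTC-05:00
```

The second call models the "no known user" case, such as an automatically run report, where the system's local time zone is used.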

Observe the extent to which these requirements go into detail. This is primarily to make the need for the requirement clear to nontechnical readers, who might otherwise not appreciate the difficulties likely to result. The second reason is to limit the applicability of these requirements, because there are circumstances where they aren't appropriate. Requirements relating to the context of any data type commonly deserve to be specified in such detail.

Any data type requirement that defines the structure of a data type is in effect already a pervasive requirement, so it needs no pervasive requirements of its own. But if you're writing a requirement that defines how a particular data type is to be displayed in certain circumstances, consider whether it's worthwhile to add a pervasive requirement for the structure of the underlying data type itself (if no such requirement already exists).

Considerations for Development

Database designers need to consider how to store this data type in a database: what database data type is most suitable for it? A nonsimple data type may be best stored as two (or more) columns in a database table. For example, a monetary amount that comprises the value plus the currency might warrant two columns, though any entity that has multiple monetary amounts might get by with a single currency column shared by them all. The database designer should liaise with the application developers on such matters, because what best suits the database might be awkward for the application software.

Application developers need to consider how to represent this data type in software: will a built-in data type (such as an integer or string) suffice, or is a special (object-oriented) class needed? If a special class is needed, does such a class exist already, or must one be developed?
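
As one illustration of the "special class" option, a monetary amount (a value plus its currency) might be represented along the following lines. This is a sketch under assumptions, not a recommended design: the class name `Money`, its fields, and the rule of refusing to add amounts in different currencies are all hypothetical choices.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """A monetary amount: a decimal value denominated in one currency."""
    amount: Decimal
    currency: str  # assumed to be a currency code such as "EUR"

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        # Amounts in different currencies are not plain numbers to be summed.
        if other.currency != self.currency:
            raise ValueError(
                f"cannot add {other.currency} to {self.currency}")
        return Money(self.amount + other.amount, self.currency)


total = Money(Decimal("1.50"), "EUR") + Money(Decimal("2.25"), "EUR")
print(total)  # Money(amount=Decimal('3.75'), currency='EUR')
```

A class like this maps naturally onto the two-column database representation mentioned above, which is one reason for the database designer and the application developers to agree on it together.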

Considerations for Testing

Verifying that a system properly handles individual items of data is one of the staples of testing, and one of the most fertile test areas. Any good book on testing (for example, Ash [2003]) will cover this topic in detail. Testing needs to cover the entry, interpretation (parsing), storage, manipulation, and display of all occurrences of each data type for which requirements have been specified. Occurrences can be found in requirements for, among other things, data entities, inquiries, reports, and other user functions. Further occurrences can also crop up in the delivered system, in areas that weren't specified in detail.

Testing that a data type can be entered correctly (and that invalid values are rejected) is likely to constitute the largest number of test cases. Draw up a list of values to enter. This list should include boundary conditions (values at, near, and beyond the limits of what's acceptable) and values that might be treated by the system as special (an empty value, zero, all spaces, for example). Also test values with various kinds of formatting (for example, thousand separators and decimal point in numeric amounts, or dashes and brackets in telephone numbers).
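
Drawing up the list of boundary values can be partly mechanical. The following sketch assumes a whole-number data type constrained to an inclusive range; the function name `boundary_values` and the constant `SPECIAL_TEXT_INPUTS` are hypothetical.

```python
from typing import List

# Raw inputs a system might treat specially: empty, zero, all spaces.
SPECIAL_TEXT_INPUTS = ["", "0", "   "]


def boundary_values(minimum: int, maximum: int) -> List[int]:
    """Values at, near, and beyond the limits of an inclusive range, plus zero."""
    candidates = {
        minimum - 1, minimum, minimum + 1,   # around the lower limit
        maximum - 1, maximum, maximum + 1,   # around the upper limit
        0,                                   # often handled as a special case
    }
    return sorted(candidates)


print(boundary_values(1, 100))  # [0, 1, 2, 99, 100, 101]
```

Values outside the range (here 0 and 101) are expected to be rejected; the rest are expected to be accepted.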

In addition to testing what users of the system see, testers can also study everywhere that data is stored (especially databases) to verify that defined data types are held in a correct way (for example, that dates are stored with the full year, and numbers are held to the right number of decimal places).