Chapter 7: Data Entity Requirement Patterns | Software Requirement Patterns (Best Practices)

Overview

Builders of systems tend to take a cavalier and ad hoc attitude to information: we lack rules for when data can be deleted; we're relaxed about losing data; we permit data to be modified without retaining a record of what it was before; we don't know who did what; we can't tell how amounts were calculated. It's hardly surprising that so many systems handle data in a sloppy and messy manner. This chapter aims to impose some order and consistency, by introducing a scheme for dividing all data entities into fixed categories that share many important characteristics. It then presents requirement patterns for the most important categories of data entity, namely living entity, transaction, configuration, and chronicle, and suggests demands to make on all entities in each category. These categories are shown in Figure 7-1.

Figure 7-1: Requirement patterns in the data entity domain

To satisfy information-related requirements, a system needs a supporting infrastructure for information storage, to store information persistently (most commonly in a database); sometimes it needs more than one. The requirement patterns in this chapter are organized and written with a view to using a database as the main storage mechanism, which is true for almost all business systems. These patterns do, however, work perfectly well for data stored in flat files and more specialized repositories (for example, electronic directories for finding addresses, such as those adhering to LDAP, the Lightweight Directory Access Protocol). A database provides all the operations we need to store and retrieve our data. But these operations are at too low a level to relate meaningfully to requirements. We need a stepping-stone to help bridge the gap. This chapter divides a typical business system's data into seven key categories, according to the ways in which that data should be handled. We can then define, for each category, rules that apply to all data in that category (via pervasive extra requirements), to save having to worry about such matters for each individual type of data. In a typical business system, we find the following categories of data:

Name	Summary	In requirements?
Living entity	An entity that has a lifespan. It is created, might be modified a few times, and is eventually terminated. Examples are a shop customer and a bank account. Any entity that is for configuration purposes is categorized as configuration rather than as a living entity. (See the configuration category that follows.)	Yes
Transaction	An event in the life of a living entity. In a commercial system, most transactions have a financial impact of one kind or another. Examples are a purchase from a shop, a bank account withdrawal, and a magazine subscription renewal.	Yes
Configuration	Parameters that control how the system behaves. Examples are sales tax rates and bank account types. It's reasonable to assume that anything not explicitly required to be configurable won't be, in which case, every change in behavior means a change to software. Any item of configuration of which there can be more than one (such as account types, or currencies) has a lifespan just like a living entity. The easiest way to distinguish between configuration and living entities is to imagine the organization has yet to open for business for the first time: anything that can be set up at that time is configuration (for example, account types and products); anything that can't is a living entity (for example, customers and their accounts).	Yes
Chronicle	A record of some event that happened. Examples are: an error, a record that some system process ran, a change to a living entity, and approval by a manager of a large withdrawal from a bank account. Records of this sort are often called "audit trails" or "logs," but we avoid the first term because many people associate it just with financial transactions, and the second because it conjures up specific solutions (such as using a single log as a dumping ground for everything, and storage in flat files).	Yes
Derived	Data computed from the other kinds of data. Examples are daily order totals and total balances for each account type. Derived data can be identified by asking the question, "If I lost it, would I be able to regenerate it from the other data available?" In theory, you can regenerate all derived data from the data it is based on, though in practice, it demands a sound database design (so there's some risk).	Usually not
Administrative	Data used to keep the system running smoothly. Examples are the time that month-end processing was last performed and counters such as the last-used customer number. Administrative data is either global or associated with a living entity (such as a customer's last order number). Data of this sort results from design decisions, so it doesn't appear in requirements.	Usually not
Historic	Old, inactive data that no longer affects current business, but needs to be retained for legal or future investigative reasons. The structure of historic data is dictated by the original data, which is why requirements are not needed to define it.	Usually not

Each type of information is placed in one (and only one) of these categories, each of which has been chosen to recognize the special characteristics the information has. We can then define requirements to enforce rules for the sound handling of each category. Potential requirements are discussed in the requirement patterns in this chapter. Don't worry if one or two of the category names look rather abstract: you needn't expose readers of requirements specifications to them.

In the preceding table, the In requirements column indicates whether requirements are needed for that category of data. Not all data in a database is visible or of direct use to users; some is intermediate "massaged" data, is there to improve performance, or helps in other ways to keep the system running smoothly. Structures for this sort of data are created during the design of the database and don't appear at the requirements stage. Only data of direct interest to users is reflected in requirements (indicated by a Yes in the table), and only those categories of data have had requirement patterns written for them. Data entities for which Usually not is shown don't result directly from requirements, but there could be a requirement or two that refers obliquely to or implies the presence of such data. Also "Usually not" is preferred to "No" because it would be rash to say that there will never be requirements for these categories of data.

Each entity, regardless of its category, needs a name. Make it concise, unique, and descriptive of the kind of thing it stores. A useful convention is to express each one in the singular (so "customer," not "customers"). Avoid vague collective terms, such as "inventory," that don't tell you what that entity contains.

Extra Requirements

Requirements for specific sorts of information define both the data needed to operate the business and any associated functionality. But there are several areas, common to all types of information, for which extra requirements are sometimes appropriate or for which it is worthwhile to specify supporting functionality once rather than each time. Some areas to consider are:

1. Information integrity, including database back-ups and restore/recovery: It can be dangerous to assume that all your information will magically always be there for you, intact, pristine, and available. Some judicious extra requirements can proclaim a few things that we should be able to rely on, partly for insurance and partly to remind the developers of pitfalls to avoid.
2. Multi-part information entry: Any function that involves entry of information in multiple parts (that is, split over multiple screens) needs to be able to handle failure partway through. The simplest way is to record nothing (or act as if nothing has been recorded) until everything has been entered, but expect users to find this frustrating, because a user who exits partway through then has to rekey everything entered so far. However, if information is stored after each screen is entered, the system should provide a way for the user to pick up where they left off, which can be complicated and cumbersome to implement (so it costs more).
3. Co-ordination of multiple data stores: If your system interacts with another system, and both store data, trouble could result when one of them fails: their data could be inconsistent. This is a technical matter, but point it out so it won't be forgotten, and think through possible problems (which might be significant) and how to deal with them, such as by special corrective software, for which we'd need requirements.
4. Timed and co-ordinated changes: Sometimes, changes to information in a system need to be made at a precise moment in time (such as a change in a tax rate at midnight on a specific date, mandated in legislation). Sometimes a collection of changes needs to be made at the same time (such as a revamping of a shop's pricing and discounting). Special provision must be made if you want these sorts of changes to be made while the system is available to users, with minimal risk of human error.
5. Approval of actions: A user of a computer system usually goes ahead and does what they want, but a business sometimes wants some actions approved by a second person before those actions take effect. See the approval requirement pattern in Chapter 11, "Access Control Requirement Patterns."

Areas 4 and 5 have aspects in common, especially if your system ends up having a general mechanism for storing changes before applying them (which is discussed in the "Considerations for Development" section of the approval requirement pattern).
6. Inquiries: When you're specifying a particular type of information (or a function for entering it) is a good time to ask yourself who will want to look at this information and why. For each function for entering information, there should be at least one function to view it, at the very least to let the user see whether what they entered was actually recorded, in the event of a system failure.

The subsections that follow deal with each of these areas in turn, except Areas 5 and 6, which are covered in the approval requirement pattern in Chapter 11 and the inquiry requirement pattern in Chapter 8, "User Interface Requirement Patterns." Areas 1 and 6 are likely to be relevant in all commercial systems; ask yourself whether the other areas apply to your system.

Information Integrity, Including Database Back-Ups and Restore/Recovery

First of all, what do we mean by "information integrity"? It can be summed up by the ACID properties possessed by any reliable database, which are: atomic (any change is made either completely or not at all); consistent (whichever way you look at the data, you get the same picture); isolated (no change is affected by any other change that's in progress but not yet completed); and durable (any completed change stays there). Backing up the information can be regarded as providing durability even when a storage disk is lost. Restoring is the act of copying from a back-up when the main data is lost, and then recovery is the act of bringing the restored data up to date using an update log of changes made since the back-up was taken.
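The distinction between restore and recovery can be illustrated with a small sketch. It is not how any particular database product works internally; the snapshot format, the log format, and the function name `restore_and_recover` are assumptions made purely for illustration.

```python
from copy import deepcopy


def restore_and_recover(backup, update_log):
    """Rebuild current data from a back-up plus the changes logged since.

    backup:     dict snapshot taken at back-up time (hypothetical format).
    update_log: ordered list of (operation, key, value) tuples recorded
                after the back-up; operation is "set" or "delete".
    """
    # Restore: copy the data from the back-up, leaving the back-up untouched.
    data = deepcopy(backup)
    # Recovery: replay every logged change, in the order it was made.
    for operation, key, value in update_log:
        if operation == "set":
            data[key] = value
        elif operation == "delete":
            data.pop(key, None)
        else:
            raise ValueError(f"unknown log operation: {operation}")
    return data


backup = {"cust-1": "Ann", "cust-2": "Bob"}
log = [
    ("set", "cust-3", "Cy"),
    ("delete", "cust-1", None),
    ("set", "cust-2", "Robert"),
]
recovered = restore_and_recover(backup, log)
```

Without the update log, only the restore step is possible, and every change made since the back-up is lost.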

Strictly speaking, if we want information integrity at all, we should define requirements for it. But if we're storing our data in a proper database, it's reasonable to assume that it will take good care of our data and guarantee its integrity. (If you're not happy with that, then by all means specify requirements you expect your database product to satisfy.) This isn't true for other places where information is stored: flat files provide no protection against data loss, for example. Developers like using flat files for such things as configuration parameters and imported/exported data, and they often forget about the integrity of this data, so it's a good idea to add a requirement to demand the integrity of all data, and emphasize that this includes flat files. Such a requirement can also demand (implicitly or explicitly) that any online directory (such as LDAP) or similar product that you use must store its data in a proper database.

You could draw up a list of all the types of information in your system (taking a broad interpretation of "information" for this purpose), including:

Web pages, including page templates (such as JSP or ASP pages), style sheets, and help pages.
Images, sounds, and other multimedia resources.
All flat files referenced by the system, especially configuration files. Don't forget files that belong to third party products (since they are indirectly part of your system). Including files used by the operating system is perhaps going too far.
Emails and any other types of electronic messages sent and received, including attachments.
Document templates (such as for letters generated using a mail merge facility).
Data recorded by sensors.

Then go through the list asking what would happen if you lost the disk on which those files live. Bear in mind that even if you have back-up copies of these files, you will lose all changes made since the last back-up copy. If some of the possibilities scare you, take steps to protect against them, for which you need to define some requirements. The more of this information you can store in a database, the better.

Here are some suggested pervasive requirements for this area. The first one is uncompromising, and is impractical for some systems and over-the-top for others. The second one is a gentler alternative, but its laxness still means that a disk failure could leave an unpleasant mess (though you will at least know how big a mess). It's possible to specify a compromise in between these two.

Summary	Definition
All information in database	All information updated by any mainstream function in the system shall be stored in a core database. The intent of this requirement is to ban the use of flat files to store information needed for the smooth running of the system. But it by no means limits its scope to flat files. It also mandates that any data in, say, an LDAP directory must use a database as its data store (or that its data be stored in a database as well). For any information stored both in a core database and in some other place, the database shall be regarded as the primary store. For the purposes of this requirement: A mainstream function in the system is any function needed to satisfy any business requirement (including those used by a user and automated functions). Low-level configuration activities that must store data outside core databases for technical reasons (such as configuration files used by the operating system or third party products) are excluded. A core database is one that is backed up regularly, for which transaction logs are stored on separate disks, and for which recovery procedures are in place. The prime motivation for this requirement is to avoid data loss when a disk is lost. Data stored in a core database is recoverable in such a situation; data stored in flat files is not. A secondary motivation is security: databases offer several degrees of protection (access control, protection against tampering, and logging of changes).
Record changes to information outside database	Whenever a change is made to information that is stored outside a database, the fact of the change shall be recorded in a database. The intent of this requirement is that in the event of the loss of a disk, we will at least know which files have been changed.
Recover secondary copies of information	For all data that exists in the database and also in a secondary form outside the database, the database recovery process shall cause the recovery of the secondary form. For example, if certain information is stored both in a database table and in an LDAP directory, then database recovery must bring the LDAP copy into line with the database table (after recovery of the latter), including removing entries no longer present in the database table.
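As an illustration of the second requirement ("Record changes to information outside database"), here is a minimal sketch. The table name `file_change`, its columns, and the helper functions are hypothetical, and SQLite stands in for whatever core database the system uses. The sketch records the change before touching the file, on the assumption that over-reporting a change is safer than missing one.

```python
import hashlib
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def open_change_register(conn):
    # Hypothetical register of changes made to files outside the database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_change ("
        " path TEXT NOT NULL, changed_at TEXT NOT NULL, sha256 TEXT NOT NULL)"
    )


def write_file_and_record(conn, path, content):
    """Record the fact of the change in the database, then write the file."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO file_change (path, changed_at, sha256) VALUES (?, ?, ?)",
            (str(path), datetime.now(timezone.utc).isoformat(), digest),
        )
    path.write_text(content, encoding="utf-8")


def files_changed_since(conn, since_iso):
    """List the files changed at or after the given ISO timestamp."""
    rows = conn.execute(
        "SELECT DISTINCT path FROM file_change WHERE changed_at >= ? ORDER BY path",
        (since_iso,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
open_change_register(conn)
config_file = Path(tempfile.mkdtemp()) / "settings.ini"
write_file_and_record(conn, config_file, "tax_rate=0.07\n")
changed = files_changed_since(conn, "1970-01-01T00:00:00+00:00")
```

After a disk is lost, calling `files_changed_since` with the time of the last back-up would list the files whose restored copies are out of date.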

Multi-Part Information Entry

Have you ever bought anything from a Web site that involved a longer succession of Web pages than you expected, and just when you thought you'd entered everything, they hit you with yet another page? Then you quit because one little piece of information wasn't at hand, and when you returned later, you had to start from the beginning again? Or you couldn't tell whether your order was in there somewhere, and if so, what state it was in? Systems abound that give an unpleasant experience through lack of consideration for their users, for example by not accommodating users who deviate from the expected path. The main reason is neglect that can be avoided by specifying suitable requirements. (One could argue that use cases that emphasize the primary path are partly to blame. Also, having to write a second use case to cover the completion of a half-finished process is somewhat tedious, and hard to do in an easy-to-follow way without repetition.)

Things that can be done to improve multi-part information entry include:

Allow the user to recommence entering information later on, from the point they had reached previously.
If the system assigns a transaction number (for example, an order number), tell the user what it is as soon as it's allocated. This lets them know that the system has registered at least some of the information that they've taken the trouble to enter. (You could go further and inform them of incomplete transactions-say, the next time they log in, or via email.)
Inform the user where they are in the process: first, the status of the transaction (for example, has an order been placed yet?), and second, how many more screens are yet to come. If this isn't possible (because the number of steps depends on what values the user enters), then at a minimum, tell them there's at least one more screen to come. A help page that explains the steps in the transaction (perhaps as some kind of flow chart) is also useful.
Let the user go back to the previous screen. This might sound obvious, but unless you state it as a requirement, the system has no obligation to provide it.

If you want the system to do any of these things, write requirements for them. Here are a few sample extra requirements that apply to all multi-stage data entry functions:

Summary	Definition
User can return to complete multi-stage data entry	If a user exits any multi-stage data entry function before completion, they shall be able to return later and complete it without having to reenter details that were received by the system. The system may impose a time limit, such that if the user does not return within a reasonable time, the data can be deleted. This requirement does not specify a precise time limit, but it shall be at least 48 hours.
Go back to previous multi-stage step	The second and subsequent pages in any multi-stage data entry function shall provide a way for the user to go back to the previous page. If accessing the system via a Web browser, then using the browser's "Back" button shall be a valid way of returning to the previous page.
Multi-stage data entry completion clear	The system shall make it clear to the user when any multi-stage data entry has been completed and accepted.
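To show what the first of these requirements implies for the software, here is a rough sketch of storing partly entered data after each step and letting the user resume. The class `DraftStore`, its method names, and the in-memory dictionary are all assumptions for illustration; a real system would keep drafts in its database.

```python
from datetime import datetime, timedelta

MIN_RETENTION = timedelta(hours=48)  # the minimum stated in the requirement


class DraftStore:
    """In-memory stand-in for persistent storage of partly entered data."""

    def __init__(self, retention=MIN_RETENTION):
        if retention < MIN_RETENTION:
            raise ValueError("retention must be at least 48 hours")
        self.retention = retention
        self._drafts = {}  # (user_id, function) -> draft record

    def save_step(self, user_id, function, step, fields, now):
        """Store the fields entered on one step of a multi-stage function."""
        draft = self._drafts.setdefault(
            (user_id, function), {"fields": {}, "last_step": 0, "updated": now}
        )
        draft["fields"].update(fields)
        draft["last_step"] = max(draft["last_step"], step)
        draft["updated"] = now

    def resume(self, user_id, function, now):
        """Return (next_step, fields), or None if no draft exists or it expired."""
        draft = self._drafts.get((user_id, function))
        if draft is None:
            return None
        if now - draft["updated"] > self.retention:
            del self._drafts[(user_id, function)]  # time limit passed: discard
            return None
        return draft["last_step"] + 1, dict(draft["fields"])


store = DraftStore()
start = datetime(2024, 1, 1, 9, 0)
store.save_step("u1", "order", 1, {"name": "Ann"}, start)
store.save_step("u1", "order", 2, {"address": "1 High St"}, start)
resumed = store.resume("u1", "order", start + timedelta(hours=47))
expired = store.resume("u1", "order", start + timedelta(hours=49))
```

The same stored step number could also drive a "step 3 of 5" indicator, which addresses the suggestion to tell users where they are in the process.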

You could write a version of any or all of these requirements to cover a specific function instead of all of them.

Co-ordination of Multiple Data Stores

This means co-ordinating your system with one or more other systems, when each system stores data for itself. Imagine you're specifying a system for a Web-based retail site (a Web shop). You subcontract some products to a supplier, by passing a suborder to them. What happens if either your system or the supplier's system fails while in the middle of processing an order? Systems can be built to deal with situations like this, but if all the possible failures aren't recognized and handled, you could be in trouble when one that you don't accommodate occurs.

Requirements shouldn't worry about how to co-ordinate the updating of data in multiple systems. But they should identify what each system involved must be able to do to play its part, especially in the area of resilience. This is particularly important for each system to which we're interfacing, for various reasons. First, any external system is outside our control. Second, it might already exist. Third, it might require modification, even if it otherwise has all the functionality we need of it: it might already handle a "here is an order" message, but not "did you get this order?" or "delete this order" messages. Fourth, it might not be modified quickly enough. Fifth, it might be expensive to modify. Sixth, it might not be within our power to have it modified at all. In short, the implications could be significant, and they need to be brought to light as early as possible.

For each type of action that involves two or more systems, nominate one system (usually the first one in which data storage occurs) as being in charge. This system is responsible for three things in addition to its own processing:

Keep track that other systems have fulfilled their responsibilities. This involves recording each step in the processing, including when requests are sent to other systems and when acknowledgements are received.
Detect when another system hasn't done its job.
Complete the processing once the system that failed is back. This can be initiated either manually or automatically.

If the system we're specifying is the one in charge, specify requirements to cover these three things to our satisfaction. If our system isn't in charge, write requirements to cover what the system in charge expects of us. (But don't be surprised if it expects nothing, because systems poorly built in this respect are abundant.) Usually, one system is in charge for all types of actions, but it is possible for one system to be in charge for some actions and another system to be in charge for others.

Let's reiterate that the steps described in the previous paragraph must be performed for each type of action that involves two or more systems, although in practice the number of types of action is usually very small (perhaps only one). While completing incomplete actions must be done in action-specific ways, the initiation of them can be grouped together. So, when we know that an external system that failed is now back, we could have a single function that initiates completion of all types of incomplete actions involving that system.
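The three responsibilities of the system in charge, plus the single completion function just described, might be sketched as follows. Everything here is hypothetical: the class names, the acknowledgement timeout, and the idea of registering one completion function per action type are assumptions for illustration, and a real system would persist these records rather than hold them in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Request:
    request_id: str
    system: str        # external system the request was sent to
    action_type: str   # for example "suborder"
    sent_at: datetime
    acknowledged_at: Optional[datetime] = None


class Coordinator:
    def __init__(self, ack_timeout):
        self.ack_timeout = ack_timeout
        self.requests = {}     # request_id -> Request
        self.completers = {}   # action_type -> function(Request) -> bool

    # 1. Keep track: record each request sent and each acknowledgement received.
    def record_sent(self, request_id, system, action_type, now):
        self.requests[request_id] = Request(request_id, system, action_type, now)

    def record_ack(self, request_id, now):
        self.requests[request_id].acknowledged_at = now

    # 2. Detect when another system hasn't done its job.
    def overdue(self, now):
        return [
            r for r in self.requests.values()
            if r.acknowledged_at is None and now - r.sent_at > self.ack_timeout
        ]

    # 3. Complete the processing once the failed system is back.
    def complete_incomplete(self, system, now):
        completed = []
        for r in self.requests.values():
            if r.system == system and r.acknowledged_at is None:
                # Completion is action-specific; initiation is shared.
                if self.completers[r.action_type](r):
                    r.acknowledged_at = now
                    completed.append(r.request_id)
        return completed


resent = []

def resend_suborder(request):
    resent.append(request.request_id)
    return True  # pretend the supplier acknowledged this time


coord = Coordinator(ack_timeout=timedelta(minutes=5))
coord.completers["suborder"] = resend_suborder
t0 = datetime(2024, 1, 1, 12, 0)
coord.record_sent("r1", "supplier-a", "suborder", t0)
coord.record_sent("r2", "supplier-a", "suborder", t0)
coord.record_ack("r1", t0 + timedelta(minutes=1))
late = [r.request_id for r in coord.overdue(t0 + timedelta(minutes=10))]
completed = coord.complete_incomplete("supplier-a", t0 + timedelta(minutes=11))
```

`complete_incomplete` could be called by a user function or triggered automatically when the external system is detected to be back.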

One way to provide integrity for transactions that span multiple databases (and one that avoids having to figure out all this messy co-ordination for ourselves) is to use a transaction processing monitor product (usually called a TP Monitor), but this can be impractical for technical, performance, cost, or business reasons.

A couple of further subjects you might want to consider are:

Would it be beneficial for the system to modify the way it behaves once it has detected that a particular external system is not responding? For example, we could stop accepting orders for that supplier's products, or we could send subsidiary orders to an alternative supplier instead.
What if a failed external system never comes back? This is an extreme situation, but it is worthwhile devising a fallback plan in case a company you deal with disappears.

Here are a couple of sample extra requirements for a Web shop customer order transaction:

Summary	Definition
Customer order recovery from failure	The system shall be able to recover cleanly from failure during the processing of an order received from a customer, whether the failure occurs in the system itself or in a supplier's system with which an order is placed. This shall include a user function to initiate upon request the completion of incomplete orders involving a selected supplier.
Incomplete order inquiry	There shall be an inquiry that shows a summary of orders received from customers for which processing is incomplete. For each supplier to which at least one subsidiary order has been sent without acknowledgment, this inquiry shall show the following information: supplier name; number of unacknowledged orders; total monetary value of unacknowledged orders; date and time of last acknowledged order; date and time of first acknowledged order; date and time of last unacknowledged order. This inquiry does not show a list of individual orders because such a list might be too large to view.
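The incomplete order inquiry amounts to a grouped summary, which the following sketch illustrates with SQLite. The table `suborder` and its columns are invented for the example, monetary values are held in cents, and the "date and time" of an order is assumed to mean the time it was sent. The sketch computes five of the six items listed; the remaining one follows the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of subsidiary orders sent to suppliers.
conn.execute(
    "CREATE TABLE suborder ("
    " supplier TEXT NOT NULL, value_cents INTEGER NOT NULL,"
    " sent_at TEXT NOT NULL, acknowledged_at TEXT)"
)
conn.executemany(
    "INSERT INTO suborder VALUES (?, ?, ?, ?)",
    [
        ("A", 1000, "2024-01-01T10:00", "2024-01-01T10:01"),
        ("A", 2500, "2024-01-01T11:00", None),
        ("A", 500, "2024-01-01T12:00", None),
        ("B", 700, "2024-01-01T09:00", "2024-01-01T09:02"),
    ],
)
conn.commit()

INQUIRY = """
SELECT supplier,
       SUM(CASE WHEN acknowledged_at IS NULL THEN 1 ELSE 0 END)           AS unacked_count,
       SUM(CASE WHEN acknowledged_at IS NULL THEN value_cents ELSE 0 END) AS unacked_value,
       MAX(CASE WHEN acknowledged_at IS NOT NULL THEN sent_at END)        AS last_acked,
       MAX(CASE WHEN acknowledged_at IS NULL THEN sent_at END)            AS last_unacked
FROM suborder
GROUP BY supplier
-- Only suppliers with at least one unacknowledged suborder are shown.
HAVING SUM(CASE WHEN acknowledged_at IS NULL THEN 1 ELSE 0 END) > 0
ORDER BY supplier
"""

summary = conn.execute(INQUIRY).fetchall()
```

Because the result has one row per supplier rather than one per order, its size stays manageable however many orders are outstanding.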

Timed and Co-ordinated Changes

A timed change is a change to information that needs to occur at a precise, predetermined moment in time. For example, switching to or from summer time might need to happen at precisely 2 a.m. on a designated Sunday morning. And moving the Ruthenian Dinar across to the Euro might need to be done at midnight on a published date.

Co-ordinated changes are a collection of related changes to information that all need to be applied at exactly the same time. When a retailer changes its pricing schedule (raising some prices, lowering others, modifying discount rates, introducing a range of special offers), it won't want the changes to be made in dribs and drabs; it'll want them all to happen at once.
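One way to make a set of changes "all happen at once" is to apply them inside a single database transaction, so that other users see either none of them or all of them. The sketch below assumes a hypothetical `price` table in SQLite and a function name of our own choosing; it illustrates the idea rather than prescribing a design.

```python
import sqlite3


def apply_coordinated_changes(conn, new_prices):
    """Apply every price change in one transaction, or none of them."""
    with conn:  # commit if the block succeeds, roll back if it raises
        for product, cents in new_prices.items():
            cursor = conn.execute(
                "UPDATE price SET cents = ? WHERE product = ?", (cents, product)
            )
            if cursor.rowcount != 1:
                # One bad change invalidates the whole set.
                raise LookupError(f"unknown product: {product}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price (product TEXT PRIMARY KEY, cents INTEGER NOT NULL)")
conn.executemany("INSERT INTO price VALUES (?, ?)", [("tea", 300), ("mug", 900)])
conn.commit()

try:
    # "kettle" does not exist, so the change to "tea" must be undone too.
    apply_coordinated_changes(conn, {"tea": 350, "kettle": 2500})
except LookupError:
    pass
after_failed = dict(conn.execute("SELECT product, cents FROM price"))

apply_coordinated_changes(conn, {"tea": 350, "mug": 850})
after_ok = dict(conn.execute("SELECT product, cents FROM price"))
```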

Timed changes and co-ordinated changes have much in common with each other (including the possibility that a set of co-ordinated changes could happen at a specified time), which is why they're dealt with together in this section. Both also have much in common with the approval of actions (the subject of the approval requirement pattern): first, because all involve the need to store the details of actions before acting on them, and second, because we might want to approve timed or co-ordinated changes before accepting them.

Innumerable systems have managed to get away without proper provision for timed and co-ordinated changes. After all, making a change manually at the right time, or making a set of individual changes one after the other, leaves only a tiny window during which the system doesn't behave exactly as it should. The hope is that no one will notice. In the main, that's true, and if something should go askew, it might not be tracked to its actual cause. Also, if your system doesn't need 24-hour-a-day availability, you can make these changes when nothing else is happening. But if you need to make such changes while users are active, and you don't want to risk something like this going wrong, take the trouble to see that these changes are made properly. While there is a tendency for more systems to be available at all times, the mentality of system builders has lagged behind: nearly all seem unaware of all the implications, and of the extra functionality that such systems need.

A further reason for allowing timed changes and co-ordinated changes to be entered beforehand is so they can be reviewed and checked, and any mistakes can be corrected; that is, to reduce human error. (This is a reason for having them approved, too.) In any case, changes made in the wee small hours by whoever's working that shift, based on scribbled notes left by someone else, sound more at risk of human error than most.

Even if you manually make a timed change at the right time, you're still likely to need to know what the previous value was and when it was changed. For example, if a sales tax rate is changed at midnight on a certain date, we still should be able to calculate sales tax amounts on orders placed before this date. Again, systems have managed to survive without being able to do this (by performing calculations immediately), but it's untidy: it's hard to justify (say, for audit purposes) what the system did, and if any mistake was made, it's difficult to rectify. (You'll probably have to go, cap in hand, to the developers whose omission led to the problem in the first place.)
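Keeping each value together with the moment it took effect is one way to meet this need. The following sketch shows the idea; the class `RateHistory` and the sample rates and dates are invented for illustration.

```python
from bisect import bisect_right
from datetime import datetime
from decimal import Decimal


class RateHistory:
    """Rates with the moment each took effect; old rates are never overwritten."""

    def __init__(self):
        self._starts = []  # effective-from moments, kept in ascending order
        self._rates = []   # rate in force from the matching moment

    def add(self, effective_from, rate):
        index = bisect_right(self._starts, effective_from)
        self._starts.insert(index, effective_from)
        self._rates.insert(index, rate)

    def rate_at(self, moment):
        """Return the rate that was in force at the given moment."""
        index = bisect_right(self._starts, moment)
        if index == 0:
            raise LookupError("no rate in effect at that time")
        return self._rates[index - 1]


history = RateHistory()
history.add(datetime(2020, 1, 1), Decimal("0.07"))
history.add(datetime(2024, 7, 1), Decimal("0.08"))  # changes at midnight

before = history.rate_at(datetime(2024, 6, 30, 23, 59))
after = history.rate_at(datetime(2024, 7, 1, 0, 0))
tax_on_old_order = Decimal("100.00") * before
```

Because the new rate can be added well before its effective date, the same structure also supports entering a timed change in advance.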

Timed and co-ordinated changes most commonly apply to changes in configuration, but they can happen to other types of information. Indeed, they could conceivably apply to any type of function. But worry about that eventuality only if and when you encounter it.

Timed and co-ordinated changes are another area in which lurk all sorts of easily forgotten things, including:

Don't allow backdated timed changes. If you were to enter a time in the past, in all likelihood the change wouldn't be made at all-but you can't be sure what the software would do.
What do you want to happen if a user is in the middle of entering a timed change (particularly a half-finished set of co-ordinated changes) when its time is reached? For instance, a user is halfway through entering some price changes to be applied at midnight when the silicon clock strikes twelve. Should we do nothing, or should we apply the changes that have been entered? Is Cinderella better off wearing just one glass slipper, or neither?
What happens to a timed change that needs approval but hasn't been approved by the time it's due? Should we warn someone when this situation looms?
Provide at least one inquiry that lists pending changes. Let users see both timed changes and unapproved changes (regardless of whether they are timed); they can be shown in a single inquiry or separately in two. Decide which sorts of changes each user can see.
Allow timed changes to be modified and removed before they have been applied. This includes the ability to modify the time at which a change is to be made.
Prevent entry of two separate changes to the same thing at the same time. When a user's entering a timed change, it would also be helpful to warn them of any other pending change to the same thing.
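Several of the points above lend themselves to simple checks made when a timed change is entered or altered. The sketch below illustrates rejecting backdated changes, refusing two changes to the same thing at the same time, warning about other pending changes, and allowing modification or removal only before a change is applied. The names `TimedChange` and `Schedule` are hypothetical, and passing the current time in as `now` is merely an assumption that keeps the example self-contained.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimedChange:
    target: str          # what is being changed, e.g. "price:widget"
    new_value: str
    apply_at: datetime
    applied: bool = False


class Schedule:
    def __init__(self):
        self.changes = []

    def pending_for(self, target):
        return [c for c in self.changes if c.target == target and not c.applied]

    def _check_time(self, change, apply_at, now):
        if apply_at <= now:
            raise ValueError("backdated timed changes are not allowed")
        for other in self.pending_for(change.target):
            if other is not change and other.apply_at == apply_at:
                raise ValueError("another change to this item is due at the same time")

    def add(self, change, now):
        """Store a timed change; return warnings about other pending changes."""
        self._check_time(change, change.apply_at, now)
        warnings = [
            f"{c.target} already has a change pending at {c.apply_at}"
            for c in self.pending_for(change.target)
        ]
        self.changes.append(change)
        return warnings

    def reschedule(self, change, new_time, now):
        """Move a change to a different time, but only before it is applied."""
        if change.applied:
            raise ValueError("change has already been applied")
        self._check_time(change, new_time, now)
        change.apply_at = new_time

    def remove(self, change):
        if change.applied:
            raise ValueError("change has already been applied")
        self.changes = [c for c in self.changes if c is not change]
```

The `pending_for` method could also serve as the basis for an inquiry that lists pending changes, subject to whatever rules decide which changes each user may see.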

Once you've discovered one thing for which approval, a timed change, or a co-ordinated change is needed, your eyes are open to spotting more. As soon as you have more than one of these things (or if you see prospects for introducing more in the future), it can become worthwhile to introduce a general mechanism for storing actions that aren't ready to happen. Such a mechanism can be used as the basis for approvals, timed changes, and co-ordinated changes, as well as combinations of them. This involves creating a place where you can store any kind of pending action you wish, together with a range of functions for updating and viewing it, for which you need to specify requirements. But ask for a general mechanism of this kind only if it's worthwhile, because it's a big and complicated undertaking.
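To give a feel for what such a general mechanism might involve, here is a deliberately small sketch. It assumes that each pending action records an optional due time, whether it needs approval, and an optional group name that ties co-ordinated actions together. `PendingAction`, `PendingStore`, and every field name are hypothetical. The sketch also makes one particular choice that a real specification would have to state explicitly: a group is held back in its entirety while it is still being entered or while any member is unapproved or not yet due.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class PendingAction:
    description: str
    action: Callable[[], None]           # what to do when the action is released
    due_at: Optional[datetime] = None    # None means "as soon as otherwise ready"
    needs_approval: bool = False
    approved_by: Optional[str] = None
    group: Optional[str] = None          # actions sharing a group are co-ordinated

    def ready(self, now):
        approved = not self.needs_approval or self.approved_by is not None
        due = self.due_at is None or self.due_at <= now
        return approved and due


class PendingStore:
    def __init__(self):
        self.actions = []
        self.open_groups = set()  # groups a user is still entering

    def release_due(self, now):
        """Run every ready action; a co-ordinated group runs only as a whole."""
        blocked = self.open_groups | {
            a.group for a in self.actions if a.group and not a.ready(now)
        }
        released, remaining = [], []
        for a in self.actions:
            if a.ready(now) and a.group not in blocked:
                released.append(a)
            else:
                remaining.append(a)
        for a in released:
            a.action()
        self.actions = remaining
        return released


# Usage: two price changes entered as one set, due at midnight on 1 July.
prices = {"widget": 10, "gadget": 20}
midnight = datetime(2024, 7, 1)
store = PendingStore()
store.open_groups.add("july-prices")
store.actions.append(PendingAction(
    "widget price", lambda: prices.update(widget=11),
    due_at=midnight, group="july-prices"))
store.actions.append(PendingAction(
    "gadget price", lambda: prices.update(gadget=22),
    due_at=midnight, group="july-prices"))
store.open_groups.discard("july-prices")  # the user has finished entering the set
store.release_due(midnight)
```

In a database-backed system the released actions would normally be applied within a single transaction so that a failure part-way through leaves nothing half-applied; the in-memory sketch does not attempt that.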

Here are a few requirements for timed and co-ordinated changes:

Summary	Definition
Sales tax rate change timed	When making a change to a sales tax rate, it shall be possible to nominate a date and time at which the change is to take effect. Every proposed change to a sales tax rate must be approved by a member of the finance department.
Pricing changes co-ordinated	It shall be possible to enter a set of pricing changes and to have none applied until all have been entered.
Pricing changes timed	It shall be possible to have a set of pricing changes automatically applied at a predetermined date and time.
