First Normal Form

A relation is in first normal form if the domains on which its attributes are defined are scalar. This is at once both the simplest and most difficult concept in data modeling. The principle is straightforward: each attribute of a tuple must contain a single value. But what constitutes a single value? In the relation shown in Figure 2-12, the Items attribute obviously contains multiple values and is therefore not in first normal form. But the issue is not always so clear cut.

click to view at full size.

Figure 2-12. The Items attribute in this relation is not scalar.

We saw some of the problems involved in determining whether an attribute is scalar when we looked at the modeling of names and addresses in Chapter 1. Dates are another tricky domain. They consist of three distinct components: the day, the month, and the year. Ought they be stored as three attributes or as a composite? As always, the answer can be determined only by looking to the semantics of the problem space you're modeling.

If your system uses the date exclusively, or even primarily, as a single value, it is scalar. But if your system must frequently manipulate the individual components of the date, you might be better off storing them as separate attributes. You might not care about the day, for example, but only the month and year. Or you might care only about the month and day, but not the year. This isn't often the case, but it happens.

Date arithmetic is tedious to perform, so you'll most often use an attribute defined on the DateTime data type to offload the majority of the work to the development environment. However, this can get you into trouble if you're doing comparisons on a single component. This is particularly true if you're ignoring the time component of a field. For example, if you set the value of a DateCreated field to the result of the VBA function Now, which returns both date and time, and then later attempt to compare it to the value returned by Date(), which returns the date only, you might get incorrect results. Even if you never display the time to users, it is being stored, and obviously "1/1/1999 12:30:19 AM" isn't the same as "1/1/1999".

Another place people frequently have problems with nonscalar values is with codes and flags. Many companies assign case numbers or reference numbers that are calculated values, usually something along the lines of REF0010398, which might indicate that this is the first case opened in March 1998. While it's unlikely that you'll be able to alter company policy, it's not a good idea to attempt to manipulate the individual components of the reference number in your data model.

It's far easier in the long run to store the values separately: {Ref#, Case#, Month, Year}. This way, determining the next case number or the number of cases opened in a given year becomes a simple query against an attribute and doesn't require additional manipulation. This has important performance implications, particularly in client/server environments, where extracting a value from the middle of an attribute might require that each individual record be examined locally rather than offloaded to the database server.

Another type of nonscalar attribute that causes problems for people is the bit flag. In conventional programming environments, it's common practice to store sets of Boolean values as individual bits in a word, and then to use bitwise operations to check and test them. Windows API programming relies heavily on this technique, for example. In conventional programming environments, this is a perfectly sensible thing to do. In relational data models, it is not. Not only does the practice violate first normal form, but you can't actually do it, since neither the Jet database engine nor SQL Server versions of the SQL language provide bitwise operators. You can accomplish this using custom functionality in the database application, but only in Microsoft Access (Microsoft Visual Basic doesn't support custom functions in queries), and it forces the query to be processed locally.

Unfortunately, this is the kind of constraint that frequently gets imposed on you for historical reasons, but if you've any choice in the matter, don't encode more than one piece of information in a single attribute. If you're using legacy information, you can always unpack the data and store both versions in the recordset.

There's another kind of nonscalar value to be wary of when checking a relation for first normal form: the repeating group. Figure 2-13 shows an Invoice relation. Someone, at some point, decided that customers are not allowed to buy more than 5 items. I wonder if they checked that decision with the sales manager first? Seriously, this is almost certainly an artificial constraint imposed by the system, not the business. Artificial system constraints are evil, and in this case, just plain wrong as well.

click to view at full size.

Figure 2-13. This data model restricts the number of items a customer can purchase.

Another example of a repeating group is shown in Figure 2-14. This isn't as obvious an error, and many successful systems have been implemented using a model similar to this. But this is really just a variation of the structure shown in Figure 2-13 and has the same problems. Imagine the query to determine which products exceeded target by more than 10 percent any time in the first quarter.

click to view at full size.

Figure 2-14. This is a repeating group.



Designing Relational Database Systems
Designing Relational Database Systems (Dv-Mps Designing)
ISBN: 073560634X
EAN: 2147483647
Year: 1999
Pages: 124

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net