SQL Server as a Relational Database Management System | Beginning Transact-SQL With SQL Server 2000 and 2005

This section introduces you to the concepts behind relational databases and how they are implemented from a Microsoft viewpoint. This will, by necessity, skirt the edges of database object creation, which is covered in great detail in Chapter 11, so for the purpose of this discussion I will avoid the exact mechanics and focus on the final results.

As I mentioned earlier, a relational database stores all of its data inside tables. Ideally, each table will represent a single entity or object. You would not want to create one table that contained data about both dogs and cars. That isn't to say you couldn't do this, but it wouldn't be very efficient or easy to maintain if you did.

Tables

Tables are divided up into rows and columns. Each row must be able to stand on its own, without a dependency to other rows in the table. The row must represent a single, complete instance of the entity the table was created to represent. Each column in the row contains specific attributes that help define the instance. This may sound a bit complex, but it is actually very simple. To help illustrate, consider a real-world entity, an employee. If you want to store data about an employee you would need to create a table that has the properties you need to record data about your employee. For simplicity's sake, call your table Employee.

Note

For more information on naming objects, check out the "Naming Conventions" section in Chapter 4.

When you create your employee table you also need to decide on what attributes of the employee you want to store. For the purposes of this example you have decided to store the employee's last name, first name, social security number, department, extension, and hire date. The resulting table would look something like that shown in Figure 1-1.

image from book
Figure 1-1:

The data in the table would look something like that shown in Figure 1-2.

Figure 1-2:

Primary Keys

To efficiently manage the data in your table you need to be able to uniquely identify each individual row in the table. It is much more difficult to retrieve, update, or delete a single row if there is not a single attribute that identifies each row individually. In many cases, this identifier is not a descriptive attribute of the entity. For example, the logical choice to uniquely identify your employee is the social security number attribute. However, there are a couple of reasons why you would not want to use the social security number as the primary mechanism for identifying each instance of an employee. So instead of using the social security number you will assign a non-descriptive key to each row. The key value used to uniquely identify individual rows in a table is called a primary key.

The reasons you choose not to use the social security number as your primary key column boil down to two different areas: security and efficiency.

When it comes to security, what you want to avoid is the necessity of securing the employee's social security number in multiple tables. Because you will most likely be using the key column in multiple tables to form your relationships (more on that in a moment), it makes sense to substitute a non-descriptive key. In this way you avoid the issue of duplicating private or sensitive data in multiple locations to provide the mechanism to form relationships between tables.

As far as efficiency is concerned, you can often substitute a non-data key that has a more efficient or smaller data type associated with it. For example, in your design you might have created the social security number with either a character data type or an integer. If you have fewer than 32,767 employees, you can use a double byte integer instead of a 4-byte integer or 10-byte character type; besides, integers process faster than characters.

You will still want to ensure that every social security number in your table is unique and not NULL, but you will use a different method to guarantee this behavior without making it a primary key.

Note

Keys and enforcement of uniqueness are detailed in Chapter 11.

A non-descriptive key doesn't represent anything else with the exception of being a value that uniquely identifies each row or individual instance of the entity in a table. This will simplify the joining of this table to other tables and provide the basis for a "Relation." In this example you will simply alter the table by adding an EmployeeKey column that will uniquely identify every row in the table, as shown in Figure 1-3.

image from book
Figure 1-3:

With the EmployeeKey column, you have an efficient, easy-to-manage primary key.

Each table can have only one primary key, which means that this key column is the primary method for uniquely identifying individual rows. It doesn't have to be the only mechanism for uniquely identifying individual rows; it is just the "primary" mechanism for doing so. Primary keys can never be NULL and they must be unique. I am a firm believer that primary keys should almost always be single-column keys, but this is not a requirement. Primary keys can also be combinations of columns. If you have a table where two columns in combination are unique, while either single column is not, you can combine the two columns as a single primary key, as illustrated in Figure 1-4.

image from book
Figure 1-4:

In this example the LibraryBook table is used to maintain a record of every book in the library. Because multiple copies of each book can exist, the ISBN column is not useful for uniquely identifying each book. To enable the identification of each individual book the table designer decided to combine the ISBN column with the copy number of each book. I personally avoid the practice of using multiple column keys. I prefer to create a separate column that can uniquely identify the row. This makes it much easier to write JOIN queries (covered in great detail in Chapter 5). The resulting code is cleaner and the queries are generally more efficient. For the library book example, a more efficient mechanism might be to assign each book its own number. The resulting table would look like that shown in Figure 1-5.

image from book
Figure 1-5:

A table is a set of rows and columns used to represent an entity. Each row represents an instance of the entity. Each column in the row will contain at most one value that represents an attribute, or property, of the entity. Take the employee table; each row represents a single instance of the employee entity. Each employee can have one and only one first name, last name, SSN, extension, or hire date according to your design specifications. In addition to deciding what attributes you want to maintain, you must also decide how to store those attributes. When you define columns for your tables you must, at a minimum, define three things:

The name of the column
The data type of the column
Whether or not the column can support NULL

Column Names

Keep the names simple and intuitive. For more information see Chapter 11.

Data Types

The general rule on data types is to use the smallest one you can. This conserves memory usage and disk space. Also keep in mind that SQL Server processes numbers much more efficiently than characters, so use numbers whenever practical. I have heard the argument that numbers should only be used if you plan on performing mathematical operations on the columns that contain them, but that just doesn't wash. Numbers are preferred over string data for sorting and comparison as well as mathematical computations. The exception to this rule is if the string of numbers you want to use starts with a zero. Take the social security number, for example. Other than the unfortunate fact that some social security numbers (like my daughter's) begin with a zero, the social security number would be a perfect candidate for using an integer instead of a character string. However, if you tried to store the integer 012345678 you would end up with 12345678. These two values may be numeric equivalents but the government doesn't see it that way. They are strings of numerical characters and therefore must be stored as characters rather than numbers.

When designing tables and choosing a data type for each column, try to be conservative and use the smallest, most efficient type possible. But, at the same time, carefully consider the exception, however rare, and make sure that the chosen type will always meet these requirements.

The data types available for columns in SQL Server 2000 and 2005 are specified in the following table.

Data Type	Storage	Description
Bigint	8 bytes	An 8-byte signed integer. Valid values are -9223372036854775808 through +9223372036854775807.
Int	4 bytes	A 4-byte signed integer. Valid values are -2,147,483,648 through +2,147,483,647.
SmallInt	2 bytes	A double-byte signed integer. Valid values are -32,768 through +32,767.
TinyInt	1 byte	A single-byte unsigned integer. Valid values are from 0 through 255.
Bit	1 bit	Integer data with either a 1 or 0 value.
Decimal	5 – 17 bytes	A predefined, fixed, signed decimal number ranging from -100000000000000000000000000000000000001 (-10³⁸+1) to 99999999999999999999999999999999999999 (-10^38-1). A decimal is declared with a precision and scale value that determines how many decimal places to the left and right are supported. This is expressed as decimal[(precision,[scale])]. The precision setting determines how many total digits to the left and right of the decimal point are supported. The scale setting determines how many digits to the right of the decimal point are supported. For example, to support the number 3.141592653589793 the decimal data type would have to be specified as decimal(16,15). If the data type was specified as decimal(3,2), only 3.14 would be stored. The scale defaults to zero and must be between 0 and the precision. The precision defaults to 18 and can be a maximum of 38.
Numeric	5 – 17 bytes	Numeric is identical to decimal so use decimal instead. Numeric is much less descriptive because most people think of integers as being numeric.
Money	8 bytes	The money data type can be used to store -922,337,203,685,477.5808 to +922,337,203,685,477.5807 of a monetary unit. The advantage of the money data type over a decimal data type is that developers can take advantage of automatic currency formatting for specific locales. Notice that the money data type supports figures to the fourth decimal place. Accountants like that. A few million of those ten thousandths of a penny add up after a while!
SmallMoney	4 bytes	Bill Gates needs the money data type to track his portfolio, but most of us can get by with the smallmoney data type. It consumes 4 bytes of storage and can be used to store -214,748.3648 to +214,748.3647 of a monetary unit.
Float	4 or 8 bytes	Afloat is an approximate value (SQL Server performs rounding) that supports real numbers between -1.79 x 10³⁰⁸and 1.79 x 10³⁰⁸sdff.
Real	4 bytes	Real is a synonym for a float.
DateTime	8 bytes	Datetime is used to store dates from January 1, 1753 through December 31, 9999 (which could cause a huge Y10K disaster). The accuracy of the datetime data type is 3.33 milliseconds.
SmallDatetime	4 bytes	Smalldatetime stores dates from January 1, 1900 through June 6, 2079 with an accuracy of 1 minute.
Char	1 byte per character. Maximum 8000 characters	The char data type is a fixed-length data type used to store character data. The number of possible characters is between 1 and 8000. The possible combinations of characters in a char data type are 256. The characters that are represented depend on what language, or collation, is defined. English, for example, is actually defined with a Latin collation. The Latin collation provides support for all English and western European characters.
VarChar	1 byte per character. Maximum 8000 characters	The varchar data type is identical to the char data type with the exception of it being a variable length type. If a column is defined as char(8) it will consume 8 bytes of storage even if only three characters are placed in it. Avarchar column only consumes the space it needs. Typically, char data types are more efficient when it comes to processing and varchar data types are more efficient for storage. The rule of thumb is: use char if the data will always be close to the defined length. Use varchar if it will vary widely. For example, a city name would be stored with varchar(167) if you wanted to allow for the longest city name in the world, which is Krung thep mahanakhon bovorn ratanakosin mahintharayutthaya mahadilok pop noparatratchathani burirom udomratchanivetma-hasathan amornpiman avatarnsathit sakkathat-tiyavisnukarmprasit (the poetic name of Bangkok, Thailand). Use char for data that is always the same. For example, you could use char(12) to store a domestic phone number in the United States: (123)456-7890.
Text	1 byte per character. Maximum 2,147,483,648 characters (2GB)	The text data type is similar to the varchar data type in that it is a variable-length character data type. The significant difference is the maximum length of about 2 billion characters (including spaces) and where the data is physically stored. With a varchar data type on a table column, the data is stored physically in the row with the rest of the data. With a text data type, the data is stored separately from the actual row and a pointer is stored in the row so SQLServer can find the text.
nChar	2 bytes per character. Maximum 4000 characters (8000 bytes).	The nchar data type is a fixed-length type identical to the char data type with the exception of the amount of characters supported. Char data is represented by a single byte and thus only 256 different characters can be supported. Nchar is a double-byte data type and can support 65,536 different characters. The cost of the extra character support is the double-byte length, so the maximum nchar length is 4000 characters or 8000 bytes.
nVarChar	2 bytes per character. Maximum 4000 characters (8000 bytes).	The nvarchar data type is a variable length identical to the varchar data type with the exception of the amount of characters supported. Varchar data is represented by a single byte and only 256 different characters can be supported. Nvarchar is a double-byte data type and can support 65,536 different characters. The cost of the extra character support is the double-byte length, so the maximum nchar length is 4000 characters or 8000 bytes.
nText	2 bytes per character. Maximum 1,073,741,823 characters	The ntext data type is identical to the text data type with the exception of the amount of characters supported. Text data is represented by a single byte and only 256 different characters can be supported. Ntext is a double-byte data type and can support 65,536 different characters. The cost of the extra character support is the double-byte length, so the maximum ntext length is 1,073,741,823 characters or 2GB.
Binary	1 – 8000 bytes	Fixed-length binary data. Length is fixed when created between 1 and 8000 bytes.
VarBinary	1 – 8000 bytes	Variable-length binary data type identical to the binary data type with the exception of only consuming the amount of storage that is necessary to hold the data.
Image	Up to 2,147,483,647 bytes	The image data type is similar to the varbinary data type in that it is a variable-length binary data type. The significant difference is the maximum length of about 2GB and where the data is physically stored. With a varbinary data type on a table column, the data is stored physically in the row with the rest of the data. With an image data type, the data is stored separately from the actual row and a pointer is stored in the row so SQL Server can find the data. Image data types are typically used to store actual images, binary documents, or binary objects.
TimeStamp	8 bytes	The timestamp data type has nothing to do with time. It is more accurately described as a row version data type and is, in fact, being replaced by a data type called rowversion. In SQL Server 2000, rowversion is provided as a synonym for the timestamp data type and should be used instead of timestamp. What timestamp actually provides is a database unique identifier to identify a version of a row.
UniqueIdentifier	32 bytes	Adata type used to store a Globally Unique Identifier (GUID).
Sql_Variant	Up to 8016 bytes	The sql_variant is used when the exact data type is unknown. It can be used to hold any data type with the exception of text, ntext, image, and timestamp.

SQL Server supports additional data types that can be used in queries and programming objects, but they are not used to define columns. These data types are listed in the following table.

Data Type	Description
Cursor	The cursor data is used to point to an instance of a cursor.
Table	The table data type is used to store an in-memory rowset for processing. It was developed primarily for use with the new table-valued functions introduced in SQL Server 2000.

SQL Server 2005 Data Types

SQL Server 2005 brings a significant new data type and changes to existing variable data types. New to SQL Server 2005 is the XML data type. The XML data type is a major change to SQL Server. The XML data type allows you to store complete XML documents or well-formed XML fragments in the database. Support for the XML data type includes the ability to create and register an XML schema and then bind the schema to an XML column in a table. This ensures that any XML data stored in that column will adhere to the schema. The XML data type essentially allows the storage and management of objects, as described by XML, to be stored in the database. The argument can then be made that SQL Server 2005 is really an Object-Relational Database Management System (ORDBMS).

LOBs, BLOBs, and CLOBs!

SQL Server 2005 also introduces changes to three variable data types in the form of the new (max) option that can be used with the varchar, nvarchar, and varbinary data types. The (max) option allows for the storage of character or variable-length binary data in excess of the previous 8000-byte limitation. At first glance, this seems like a redundant option because the image data type is already available to store binary data up to 2GB and the text and ntext types can be used to store character data. The difference is in how the data is treated. The classic text, ntext, and image data types are Large Object (LOB) data types and can't typically be used with parameters. The new variable data types with the (max) option are Large Value Types (LVT) and can be used with parameters just like the smaller sized types. This brings a myriad of opportunities to the developer. Large Value Types can be updated or inserted without the need of special handling through STREAM operations. STREAM operations are implemented through an application programming interface (API) such as OLE DB or ODBC and are used to handle data in the form of a Binary Large Object (BLOB). T-SQL cannot natively handle BLOBs, so it doesn't support the use of BLOBs as T-SQL parameters. SQL Server 2005's new Large Value Types are implemented as a Character Large Object (CLOB) and can be interpreted by the SQL engine.

Nullability

All rows from the same table have the same set of columns. However, not all columns will necessarily have values in them. For example, a new employee is hired, but he has not been assigned an extension yet. In this case, the extension column may not have any data in it. Instead, it may contain NULL, which means the value for that column was not initialized. Note that a NULL value for a string column is different from an empty string. An empty string is defined; a NULL is not. You should always consider a NULL as an unknown value. When you design your tables you need to decide whether or not to allow a NULL condition to exist in your columns. NULLs can be allowed or disallowed on a column-by-column basis, so your employee table design could look like that shown in Figure 1-6.

image from book
Figure 1-6:

Relationships

Relational databases are all about relations. To manage these relations you use common keys. For example, your employees sell products to customers. This process involves multiple entities:

The employee
The product
The customer
The sale

To identify which employee sold which product to a customer you need some way to link all the entities together. These links are typically managed through the use of keys, primary keys in the parent table and foreign keys in the child table.

As a practical example you can revisit the employee example. When your employee sells a product, his or her identifying information is added to the Sale table to record who the responsible employee was, as illustrated in Figure 1-7. In this case the Employee table is the parent table and the Sale table is the child table.

image from book
Figure 1-7:

Because the same employee could sell products to many customers, the relationship between the Employee table and the Sale table is called a one-to-many relationship. The fact that the employee is the unique participant in the relationship makes it the parent table. Relationships are very often parent-child relationships, which means that the record in the parent table must exist before the child record can be added. In the example, because every employee is not required to make a sale, the relationship is more accurately described as a one-to-zero-or-more relationship. In Figure 1-7 this relationship is represented by a key and infinity symbol, which doesn't adequately model the true relationship because you don't know if the EmployeeKey field is nullable. In Figure 1-8, the more traditional and informative "Crows Feet" symbols are used. The relationship symbol in this figure represents a one-to-zero-or-more relationship. Figure 1-9 shows the two tables with a one-to-one-or-more relationship symbol.

image from book
Figure 1-8:

image from book
Figure 1-9:

Relationships can be defined as follows:

One-to-zero or many
One-to-one or many
One-to-exactly-one
Many-to-many

The many-to-many relationship requires three tables because a many-to-many constraint would be unenforceable. An example of a many-to-many relationship is illustrated in Figure 1-10. The necessity for this relationship is created by the relationships between your entities: In a single sale many products can be sold, but one product can be in many sales. This creates the many-to-many relationship between the Sale table and the Product table. To uniquely identify every product and sale combination, you need to create what is called a linking table. The Order table manages your many-to-many relationship by uniquely tracking every combination of sale and product.

image from book
Figure 1-10: