Data Definition Language


You can do only three things with any database object, other than actually use it: you can create it, alter it, or drop it. Every type of database object supports at least the CREATE and DROP statements. Of course, there is a little more to this story. Depending on the object type, a number of options affect certain capabilities and settings. You've already seen a number of these statements, because it would have been difficult to cover earlier topics without first showing you how to create some objects.

Unless permissions are explicitly assigned, only members of the following roles can execute these statements:

  • sysadmin

  • dbcreator

  • db_ddladmin

  • db_owner

It's a good idea to use role assignments rather than changing permissions for individual users. In the long run, this creates a more manageable environment.
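For example, DDL rights can be granted by adding a database user to one of these roles rather than granting statement permissions directly. This is only a sketch; the user name is hypothetical:

```sql
-- Add an existing database user to the db_ddladmin role so that
-- user can create, alter, and drop objects.
-- 'MaryB' is a hypothetical user name.
EXEC sp_addrolemember 'db_ddladmin', 'MaryB'
```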

SQL Server 2005 introduces several new objects that have corresponding CREATE and DROP statements. Many of these objects also have a corresponding ALTER statement. Managing all of these special-purpose objects is beyond the scope of this book. This chapter covers the use of these statements related to tables, views, stored procedures, and user-defined functions.

Creating Objects

The basic pattern of the CREATE statement is the same for all objects. However, due to the unique characteristics of different objects, each statement may have a number of different options. To keep things simple, this chapter does not include examples of every incarnation of CREATE statements, but it does include the most common.

Altering Objects

Generally, any options or changes that can be applied with the CREATE statement can be applied to an existing object using the ALTER statement. If security permissions have been granted or denied for an object, it's a good idea to use the ALTER statement to make changes to an object rather than dropping and re-creating it. This way, the security settings are preserved and the database doesn't make a new entry into the sysobjects system table. A consequence of dropping and re-creating some objects is that this may affect related dependencies. For example, dropping a table with indexes is a cumbersome task and any dependent objects created with schema binding will cause errors to be raised. Altering tables instead wouldn't have the same impact, especially if you didn't make changes to columns that would affect other objects.
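As a simple illustration, adding a column with ALTER TABLE leaves the table's permissions and dependent objects undisturbed, where dropping and re-creating the table would not. The column name here is hypothetical:

```sql
-- Add a nullable column to an existing table without
-- disturbing permissions or dependent objects.
ALTER TABLE MyTable
ADD MyRegion nChar(3) NULL
```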

Dropping Objects

The syntax for dropping most any object is pretty much the same. You cannot drop objects that would render schema-bound dependencies invalid. For example, you cannot drop a table if it is referenced in a dependent foreign key constraint. In this case, you must either drop the related table or alter the table and remove the constraint.
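A sketch of that sequence, using hypothetical table and constraint names: the foreign key constraint must be removed (or the referencing table dropped) before the referenced table can be dropped.

```sql
-- This fails while OrderDetail's foreign key still references MyOrder:
-- DROP TABLE MyOrder

-- Remove the constraint first, then the drop succeeds.
ALTER TABLE OrderDetail DROP CONSTRAINT fk_OrderDetail_MyOrder
DROP TABLE MyOrder
```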

Names

The rules for naming objects in SQL Server are extraordinarily flexible. They are so flexible, in fact, that it becomes very easy to create objects with names that will cause pain and suffering from the moment you create them. The following sections explain two naming guidelines. The first I would consider more a set of rules than a guideline, and the second is strictly a naming convention. You can disagree with it if you like, but examine the merits of the convention and then come up with your own.

Naming Rules

SQL Server limits the maximum number of characters in any object's name to 128 with the exception of local temporary tables, which are limited to 116 characters. If you choose to create data objects in SQL Server that push this limitation, I can guarantee that you will not be very popular with anyone who must write queries against your objects. For maximum compatibility, all permanent SQL objects should begin with a letter, should not contain any embedded spaces, and should not use any reserved words. After the first letter the name can contain any combination of numbers and letters. It can also contain some special characters, but these should be avoided like the plague. SQL Server's loose naming rules actually allow you to create an object with the same name as a reserved word, but if you do you will have to use delimiter characters. If, for some reason, you actually wanted to create a table called Select with columns called From, Insert, and Delete, SQL Server would let you as long as you delimit the names so that SQL Server does not recognize the reserved words as reserved words, as the following example illustrates:

 CREATE TABLE [Select]
 ( [From]   Int
 , [Insert] nVarChar(50)
 , [Delete] nVarChar(50)
 )

There are two different delimiter characters: the square bracket and the double quote. The square bracket delimiter is the default in SQL Server, as illustrated in the previous example. The ANSI standard, however, designates the double quote as the standard delimiter. To use double quotes as delimiters, the database or connection-specific QUOTED_IDENTIFIER option must be set to ON. The .NET Native SQL Client, Microsoft OLE DB Provider for SQL Server, and the SQL Server ODBC driver set the QUOTED_IDENTIFIER option to ON by default when they connect:

 SET QUOTED_IDENTIFIER ON
 CREATE TABLE "Select"
 ( "From"   Int
 , "Insert" nVarChar(50)
 , "Delete" nVarChar(50)
 )

An object's name should be as short as practical but still identify what the object is all about. A table that is used to store data about an employee could simply be called Employee. It's simple and descriptive. The Northwind and Pubs sample databases that ship with SQL Server 7 and 2000 are actually very good examples of how not to name objects. Both Northwind and Pubs use reserved words to define objects, which should never be done. Northwind also contains a few objects with embedded spaces in their names. Order Details is a prime example. This anti-example actually uses a reserved word and an embedded space.

Here is a simple and effective way of checking your names: If you type the name in Query Analyzer and it shows up blue, don't use it as an object name. As mentioned in Chapter 3, Query Analyzer automatically changes the color of any reserved word to blue by default. Query Analyzer is a bit aggressive about what it considers a reserved word because it includes both SQL and ODBC keywords, but discretion is not just the better part of valor, it is also the better part of SQL programming. If it turns blue, choose a different name. When it comes to embedded spaces, they are not prohibited; they are just, in the opinion of this author, stupid. Remember that database object names are limited to 128 characters, so you could conceivably create a table using the following script:

 CREATE TABLE [This is my table that contains employee data]
 ( [My primary key to uniquely track employees] Int
 , [The employee's first name] nVarChar(50)
 , [The employee's last name] nVarChar(50)
 )

However, I am sure you would not want to write queries against a database that adhered to this type of naming, and, in fact, many database and database application tools do not support embedded spaces.

Sometimes even I want to give an object a name that contains more than one word so I, like many database developers, avoid the embedded space problem by using either underscores or what is known as "camel case." If you choose underscores, just replace the embedded spaces with an underscore. As a result the Order Details table in Northwind would end up as Order_Details. I prefer camel case, probably because I am not a very good typist and having to use the Shift key unnecessarily causes me discomfort. My official public reason for using "camel case" is that underscores disappear in hyperlinks. Camel case avoids both spaces and underscores by pushing multiple words together and capitalizing each individual word. In this way Order Details becomes OrderDetails. There is no hard and fast rule concerning underscores and camel case so it really comes down to personal preference or company coding guidelines. The most important aspect of naming objects is to avoid reserved words, embedded spaces, and to be consistent.

Naming Guidelines

The basic rules outlined previously will prevent a large amount of pain and frustration. The following naming guideline is just one of many. I prefer it, but I know of many developers who do not. What naming guidelines or conventions boil down to are naming rules, best practices, and personal preference. As a general rule, when I name a table or view I express it in the singular. A view and a table represent a single instance of the entity described by the table. Therefore, the table that describes an employee is called Employee, not Employees. This may seem a bit picky, but when you do a great deal of data modeling with Entity Relationship Diagram (ERD) tools, Object Relational Modeling (ORM) tools, or the Unified Modeling Language (UML), it becomes rather significant. These tools typically enforce singularity for the object and plurality with the relationship.

Most of the organizations I have worked with have had some sort of established naming convention that dictated the use of a prefix on all objects so that they were identifiable in their own right without database context. This is another one of those areas where some developers cringe and others celebrate. The following table is an example of a typical naming convention.

 Object Type              Name
 -----------------------  -------------------
 Table                    tblEmployee
 View                     vwOpsEmployee
 Stored Procedure         spInsertEmployee
 Function                 fnNewEmployees
 Trigger                  trVerifyEmployee
 Check Constraint         ckPhoneNumber
 Foreign Key Constraint   fkSalesEmployeeLink
 Primary Key Constraint   pkEmployeeID
 Default                  dfRegion
 Clustered Index          clRegionID
 Non-Clustered Index      ncLastName

As the table illustrates, data objects take on the name of a single instance of their data. Programming objects, such as stored procedures, triggers, and functions, take on the name of what they do.

Now armed with information about the name of objects, the following sections look at how to actually create, alter, and drop them.

CREATE TABLE

In its simplest form, the CREATE TABLE statement contains the name of the new table followed by its column definitions in parentheses. For each column, a name, data type specification, and NULL specification are provided, as in the following example:

 CREATE TABLE MyTable
 ( MyID Int NOT NULL
 , MyDescription nVarChar(50) NULL
 )

This example script creates a table called MyTable that is made up of two columns. The first column is called MyID, which has a data type of integer and cannot hold a NULL. The second column is called MyDescription. It has a Unicode variable character data type and can be NULL. This may be sufficient to get started, but it certainly isn't comprehensive. Tables are typically a little more sophisticated than this. Several options are available to us as this partial syntax for CREATE TABLE suggests:

 CREATE TABLE
     [ database_name.[ owner ] . | owner. ] table_name
     ( { < column_definition >
         | column_name AS computed_column_expression
         | < table_constraint > } [ ,...n ]
     )

 < table_constraint > ::=
     [ CONSTRAINT constraint_name ]
     { PRIMARY KEY | UNIQUE } [ ,...n ]

Most of the time when a table is created the options that are defined are the table name, the column names, whether or not the columns are nullable, and what column is the primary key. Chapters 1 and 2 reviewed all the possible data types that can be assigned to a column and briefly explained the concept of NULLs and primary keys, so I won't repeat them here except as it applies to the CREATE TABLE statement.
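Putting those common options together, a typical table definition supplies the table name, the column names and data types, each column's nullability, and a primary key. This is a sketch with hypothetical names:

```sql
CREATE TABLE Employee
( EmployeeID Int IDENTITY(1, 1) NOT NULL
  CONSTRAINT pkEmployeeID PRIMARY KEY
, LastName   nVarChar(50) NOT NULL
, FirstName  nVarChar(50) NOT NULL
, HireDate   DateTime NULL     -- optional value, so nulls are allowed
)
```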

Nullability

You can enable a column to optionally not require a value by designating it as a nullable column. This simply means that if no value is provided it defaults to NULL. NULL is defined as the absence of data, so it does not have any real value. To enable a column to be nullable, the NULL keyword is added immediately after the data type. To prevent nulls, the NOT NULL keywords are added. NOT NULL is actually considered a column constraint, which is covered later in this chapter. If NULL or NOT NULL is not specified, the default setting is to allow nulls, as the next two examples show. However, it is a best practice to always designate the nullability of a column when creating or altering a table.

 CREATE TABLE MyTable
 ( Category nVarChar(50) NOT NULL
 , MyDescription nVarChar(50)
 )

 CREATE TABLE MyTable
 ( Category nVarChar(50) NOT NULL
 , MyDescription nVarChar(50) NULL
 )

With either of the two preceding examples the MyTable table only requires the value for Category to be provided for an insert operation:

 INSERT MyTable (Category) VALUES ('Category1')
 SELECT * FROM MyTable

 Category                MyDescription
 ----------------------- --------------------------------------------------
 Category1               NULL

Identity

Very often the database developer will want to have a numerical value automatically assigned to a row any time a new row is added. This is the purpose of a column assigned the IDENTITY property. It is very similar to Microsoft Access's AutoNumber feature, which automatically assigns an integer to every new row. However, the IDENTITY property is much more powerful and flexible. Typically what happens is that a table is created with the default values, which is equivalent to how Access functions:

 CREATE TABLE MyTable
 ( MyID Int IDENTITY(1, 1) NOT NULL
 , MyDescription nVarChar(50) NOT NULL
 )

 CREATE TABLE MyTable
 ( MyID Int IDENTITY NOT NULL
 , MyDescription nVarChar(50) NOT NULL
 )

Both of the preceding examples cause an incremental value to be assigned starting at 1 and incrementing by 1. The actual syntax for the IDENTITY property is as follows:

 IDENTITY [ (seed , increment ) ] 

As the syntax implies, the seed and increment values are not restricted to 1, and they are not strictly limited to integers. Supported data types for the IDENTITY property are TinyInt, SmallInt, Int, BigInt, Decimal, and Numeric. However, the decimal and numeric data types are of limited usefulness because they can only be assigned a scale of 0. The scale of these data types defines how many digits are supported to the right of the decimal point. Specifying a decimal or numeric data type with a scale of 0 makes the data type behave like an integer.

The increment value of the IDENTITY property is restricted to whole numbers regardless of the data type used, but it is not limited to a value of 1 or even to positive numbers. A table could be created with an IDENTITY property that is set to start at 1,000,000 and decrement by 100 for every row added. The CREATE statement would look like this:

 CREATE TABLE MyTable
 ( MyID Int IDENTITY(1000000, -100) NOT NULL
 , MyDescription nVarChar(50) NOT NULL
 )

If you were designing a database that tracked ticket sales for a venue that could seat 3000 people, you could conceivably seed a tracking table at 3000 with an incremental value of –1 and not allow the number to go negative (through the use of a constraint). This way, every sale could also return the number of tickets remaining without writing an expression to calculate the value.
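A sketch of that design, with hypothetical names, might look like the following; the check constraint stops the remaining-ticket count from going negative:

```sql
-- Seed at the venue capacity and count down by 1 per sale.
CREATE TABLE TicketSale
( TicketsRemaining Int IDENTITY(3000, -1) NOT NULL
  CONSTRAINT ckTicketsRemaining CHECK (TicketsRemaining >= 0)
, BuyerName nVarChar(50) NOT NULL
)
```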

A table could also be created so that the seed value was negative and the increment was positive:

 CREATE TABLE MyTable
 ( MyID Int IDENTITY(-1000000, 100) NOT NULL
 , MyDescription nVarChar(50) NOT NULL
 )

A column with an IDENTITY property is probably the most common form of primary key value, but be aware that the IDENTITY property by itself does not guarantee uniqueness. If a value is explicitly inserted in the IDENTITY column, SQL Server will not prevent a duplicate unless a constraint has been added to the column to prevent duplicates. Also, keep in mind that a table can have only one IDENTITY column defined.
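To actually guarantee uniqueness, pair the IDENTITY column with a primary key or unique constraint, as in this sketch:

```sql
CREATE TABLE MyTable
( MyID Int IDENTITY(1, 1) NOT NULL
  CONSTRAINT pkMyID PRIMARY KEY   -- rejects duplicate MyID values
, MyDescription nVarChar(50) NOT NULL
)
```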

By default, once a column has been assigned the IDENTITY property, SQL Server does not allow explicit values to be inserted into it. Any attempt to manually enter a value will result in the following error:

 INSERT MyTable (MyID, MyDescription)
 VALUES (5, 'This will not work')

 Server: Msg 544, Level 16, State 1, Line 1
 Cannot insert explicit value for identity column in table 'MyTable' when
 IDENTITY_INSERT is set to OFF.

Although SQL Server by default doesn't allow explicit values to be inserted into an IDENTITY column, there may very well be times when you need to do exactly that. To do so, the IDENTITY property must be temporarily turned off. This can be done by using a SET option on the connection to the database that will be used for the explicit insert:

 SET IDENTITY_INSERT MyTable ON

 INSERT MyTable (MyID, MyDescription)
 VALUES (5, 'This will work')

 SET IDENTITY_INSERT MyTable OFF

It is very important to turn IDENTITY_INSERT off after the transaction is complete. While the option is on, any normal insertion into the table within the same connection context (one that does not specify the IDENTITY value) will fail, because the IDENTITY column must be explicitly supplied; in addition, the option can only be enabled for one table at a time. The SET IDENTITY_INSERT option is effective only on the connection on which it is used and is terminated if the connection is closed, even if the option is not reset. Best practices in database design, however, are a lot like the rules our parents tried to teach us: If you use or borrow something, put it back where you found it. So, if you alter a database or connection setting and you do not intend for the change to be permanent, put it back the way you found it. All other connections continue to work normally, with the IDENTITY value supplied automatically.

It may be a goofy way of remembering how the IDENTITY property works, but for me, I just make a mental note to remember that it is the opposite of what it sounds like. SET IDENTITY_INSERT tablename ON actually turns the IDENTITY property off. SET IDENTITY_INSERT tablename OFF turns the IDENTITY property back on.

Another aspect of explicitly entering a value into an IDENTITY column is the impact on the IDENTITY property's current value. For instance, you create a table with the following script:

 CREATE TABLE MyTable
 ( MyID Int IDENTITY(1, 10) NOT NULL
 , MyDescription nVarChar(50) NOT NULL
 )

After the table is created you add two records:

 INSERT MyTable (MyDescription) VALUES ('Auto Record 1')
 INSERT MyTable (MyDescription) VALUES ('Auto Record 2')

A query of the table reveals the following data:

 MyID        MyDescription
 ----------- --------------------------------------------------
 1           Auto Record 1
 11          Auto Record 2

 (2 row(s) affected)

Now you explicitly enter a MyID value with the following script:

 SET IDENTITY_INSERT MyTable ON

 INSERT MyTable (MyID, MyDescription)
 VALUES (5, 'Manual Record 1')

 SET IDENTITY_INSERT MyTable OFF

What is the next value that SQL Server will automatically assign for MyID? Will it be 15 (incrementing 10 from 5) or will it be 21? The answer is that SQL Server will always choose the highest number as its current seed for a positive increment value or the lowest for a negative increment value, so the results will look like this:

 INSERT MyTable (MyDescription) VALUES ('Auto Record 3')
 SELECT * FROM MyTable

 MyID        MyDescription
 ----------- --------------------------------------------------
 1           Auto Record 1
 11          Auto Record 2
 5           Manual Record 1
 21          Auto Record 3

 (4 row(s) affected)

Often, when working with IDENTITY values, you will want to know what the last value supplied was. SQL Server provides the @@IDENTITY global variable to hold that value. Retrieving the IDENTITY value is as simple as selecting the variable:

 INSERT MyTable (MyDescription) VALUES ('Auto Record 4')
 SELECT * FROM MyTable
 SELECT @@IDENTITY AS LastIdentity

 MyID        MyDescription
 ----------- --------------------------------------------------
 1           Auto Record 1
 11          Auto Record 2
 5           Manual Record 1
 21          Auto Record 3
 31          Auto Record 4

 LastIdentity
 ----------------------------------------
 31

Because @@IDENTITY is a global variable, it is available on every connection, but it only returns the last IDENTITY value issued on the connection from which the variable is retrieved. If you need to discover the last IDENTITY value for a table regardless of your current scope, you can use the IDENT_CURRENT function. So opening a different connection from the one used in the previous example would return results like this:

 SELECT @@IDENTITY AS LastIdentity
 SELECT IDENT_CURRENT('MyTable') AS CurrentIdentity

 LastIdentity
 ----------------------------------------
 NULL

 CurrentIdentity
 ----------------------------------------
 31

Defaults

In its simplest form, a default is simply a hard-coded value that is assigned to a column if it isn't specified in an INSERT statement. Chapter 1 mentioned that I have a deep-rooted prejudice against NULL. Whenever possible I avoid allowing nulls in my table designs. However, there are many times when I need to allow for a value not to be provided in a table insert. In these instances a default is very handy. For example, let's go back to my MyTable example. This time I will add a new column called Region. This column will contain a three-character region code. Because I live in the Seattle area, I want this value to be PNW, for Pacific Northwest, if no value is provided. Here is the script to create my new table:

 CREATE TABLE MyTable
 ( MyID Int IDENTITY(1, 1) NOT NULL
 , MyDescription nVarChar(50) NOT NULL
 , MyRegion nChar(3) NOT NULL DEFAULT 'PNW'
 )

Now if an insert is made to my table and a region is not provided, the value PNW will automatically be used.
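For example, the following insert omits MyRegion, so the default fills it in; the description text is only a placeholder:

```sql
INSERT MyTable (MyDescription)
VALUES ('Uses the default region')

SELECT MyDescription, MyRegion FROM MyTable

-- MyDescription             MyRegion
-- ------------------------- --------
-- Uses the default region   PNW
```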

This is the preferred method for using defaults, but it is not the only way. A default can also be created as a stand-alone object in the database and then bound to any number of columns in any number of tables in the database:

 CREATE DEFAULT df_Region AS 'PNW'
 GO

 CREATE TABLE MyTable
 ( MyID Int IDENTITY(1, 1) NOT NULL
 , MyDescription nVarChar(50) NOT NULL
 , MyRegion nChar(3) NOT NULL
 )
 GO

 EXEC sp_bindefault 'df_Region', 'MyTable.MyRegion'

UniqueIdentifier

A problem with using an auto-incrementing IDENTITY is that this system only works when any users or clients that create records are concurrently connected to the same database server. This introduces problems for replicated systems or off-line applications that need to synchronize data from other sources. For example, I was recently involved with a project where user data needed to be joined from a SQL Server database and an Active Directory domain. Microsoft's Active Directory uniquely identifies all of its resident objects with a Globally Unique Identifier (GUID, pronounced "goo-id"). As a result, a column needed to be defined in the database that could hold this GUID. To complicate matters even more, part of the application design specified that not all of the system's users would reside in Active Directory. Some would exist only in SQL Server. A method was needed to either use existing GUIDs or to generate new ones. That is where SQL Server's UniqueIdentifier data type and NEWID() function come into play. The UniqueIdentifier type stores a 128-bit integer value that is usually displayed as an alpha-numeric representation of its hexadecimal form. These values are not intended for human consumption, because they are quite large and fairly random in composition.

Note

I'm proud to say that I am responsible for creating a new acronym to describe the category for these values. It's a BUN, which stands for Big Ugly Number.

In SQL Server, UniqueIdentifier values are either explicitly provided or they are generated by the NEWID() system function. To define a new table with an ID of this type, I'll use the following script:

 CREATE TABLE MyTable
 ( MyID UniqueIdentifier NOT NULL DEFAULT NewID()
 , MyDescription nVarChar(50) NOT NULL
 )

Once the table is created I can either specify a GUID to be inserted or allow a new GUID to be generated by the NEWID() function, which has been specified as the default value. The NEWID() function only works as a default value for a table's UniqueIdentifier column when no value is provided, as demonstrated in the following:

 INSERT MyTable (MyID, MyDescription)
 VALUES ('6F9619FF-8B86-D011-B42D-00C04FC964FF', 'Explicitly provided GUID')

 INSERT MyTable (MyID, MyDescription)
 VALUES (DEFAULT, 'SQL Server generated GUID')

 SELECT * FROM MyTable

 MyID                                 MyDescription
 ------------------------------------ --------------------------------
 6F9619FF-8B86-D011-B42D-00C04FC964FF Explicitly provided GUID
 (server-generated value)             SQL Server generated GUID

Keep in mind that like the IDENTITY property, the UniqueIdentifier type does not in and of itself guarantee uniqueness. A unique or primary key constraint must be used in conjunction with the UniqueIdentifier data type. The NEWID() function, however, will never generate the same value twice on any database server in the world (at least that is the theory).
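Combining the two, a sketch of a table definition whose primary key constraint enforces uniqueness on the UniqueIdentifier column:

```sql
CREATE TABLE MyTable
( MyID UniqueIdentifier NOT NULL DEFAULT NEWID()
  CONSTRAINT pkMyID PRIMARY KEY   -- the constraint, not the type, guarantees uniqueness
, MyDescription nVarChar(50) NOT NULL
)
```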

Constraints

SQL Server constraints fall into one of five categories, as described in the following table.

Not Null
    Ensures that the column has a defined, non-null value.

Primary Key
    Enforces uniqueness for the purpose of identifying a row. Doesn't accept NULL values.

Check
    Validates a row based on the value of a column. Uses a clause, similar to the one following a WHERE keyword, to identify acceptable values.

Unique
    Requires each value in a column to be unique. The column will accept a single NULL value unless used in conjunction with Not Null.

Foreign Key
    Enforces referential integrity rules by checking the value of a column against the primary key value in a related table. NULL values are allowable unless explicitly restricted.
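As a sketch, all five constraint types can appear in a pair of table definitions like the following; the names and the phone-number pattern are hypothetical:

```sql
CREATE TABLE Region
( RegionID nChar(3) NOT NULL                  -- Not Null
  CONSTRAINT pkRegionID PRIMARY KEY           -- Primary Key
, RegionName nVarChar(50) NOT NULL
  CONSTRAINT unRegionName UNIQUE              -- Unique
)

CREATE TABLE Employee
( EmployeeID Int IDENTITY(1, 1) NOT NULL
  CONSTRAINT pkEmployeeID PRIMARY KEY
, Phone nVarChar(20) NULL
  -- Check: reject any character that isn't a digit, parenthesis, dash, dot, or space
  CONSTRAINT ckPhoneNumber CHECK (Phone NOT LIKE '%[^0-9()-. ]%')
, RegionID nChar(3) NULL
  -- Foreign Key: value must exist in Region, or be NULL
  CONSTRAINT fkEmployeeRegion FOREIGN KEY REFERENCES Region (RegionID)
)
```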




Beginning Transact-SQL with SQL Server 2000 and 2005
ISBN: 076457955X
Year: 2006
Pages: 131
Authors: Paul Turley
