Data Types | Microsoft SQL Server 2005 Stored Procedure Programming in T-SQL & .NET

Data types specify the type of information (such as number, string, picture, date) that can be stored in a column or a variable.

SQL Server recognizes 28 system-defined data types. Apart from these data types, you can create user-defined data types in T-SQL and in .NET to fulfill specific needs.

The following are the categories of system-defined data types:

Character strings
Unicode character strings

Date and time
Approximate numeric
Exact numeric
Integer numbers
Monetary
Binary
Special

Note

In some cases, you can use different identifiers to refer to a data type in T-SQL code. For example, the char data type can be referenced as character, and varchar can be referenced as character varying. Some of these synonyms are based on ANSI SQL-92 standard requirements.

Character Strings

Character data types store character strings. The four different character types vary in length and storage characteristics:

char
varchar
varchar(max)
text

The char data type is used to store strings of fixed size. As noted earlier, the maximum size of this data type is 8,000 characters, which is a significant increase over the 255-character limit in early versions. When a variable or a table column is assigned with a string that is shorter than its nominal size, it is padded with trailing spaces to fill the specified field length.

The varchar data type stores strings of variable size up to 8,000 characters long. When a character value whose length is less than the nominal size is assigned to the column or variable, SQL Server does not add trailing spaces to it, but records it as is. varchar data types occupy two additional bytes in order to record the length of the string.

Note

Maintenance of this information requires some additional computation during I/O operation, but that time is usually countered by savings in the space required. A record using such columns occupies less space, and more records fit into a single page. Therefore, SQL Server reads more records when accessing data, and it is more likely that a single page contains the information that the user is looking for.

The text data type is used to store huge amounts of data. One field can store up to 2GB (2³¹ - 1 bytes) of information. Only a 16-byte pointer to this data is stored in the table. Therefore, additional processing overhead is involved with the use of text columns. There are special functions for processing text values.

Note

You should avoid using this data type because Microsoft plans to remove it in future versions of SQL Server. You should use the new varchar (max) data type instead.

Varchar (max) looks and operates like the varchar data type, but it is by internal structure and functionality actually much more like the text data type.

It is also designed to store large strings (up to 2GB). The string is stored in special data pages, not in the data page with the rest of the record. Its biggest advantage is that most string operating functions that work with varchar also work with varchar (max); however, there are a few new functions for additional functionality. The following command creates a table with four fields using different character string data types:

      Create table Contacts(ContactId char(8),      Name varchar(50),      Note text,      Resume varchar(max))

Character constants are delimited from the rest of the Transact-SQL code with quotes. For example, the following statement inserts contact information:

      insert into Contacts (Contactld, Name, Note, Resume)      values ('CO-92-81', 'Tom Jones', 'Tom@trigon.com', 'N/a')

Unicode Character Strings

Microsoft SQL Server 2005 has four character data types for storing Unicode data—using non-ASCII character sets. They are equivalent to the char, varchar, varchar (max), and text data types and are called

nchar
nvarchar
nvarchar(max)
ntext

The main difference between these new data types and the older character data types is that the new data types can hold Unicode characters, which occupy 2 bytes per character. Therefore, the maximum string length that they can store is half that of the corresponding older data types (4,000 for nchar and nvarchar).

The following statement creates the same table as the previous example but uses Unicode data types:

 Create table Contacts_2(ContactId nchar(8),      Name nvarchar(50),      Note ntext,      Resume nvarchar(max)) go

Unicode character constants are also delimited with quotes but are prefixed with N’:

      insert into Contacts_2 (Contactld, Name, Note, Resume)      values (N'CO-92-81', N'Tom Jones', N'Tom@trigonblue.com', N'N/a'}

This N'prefix might look a little odd, but you will get used to it. Microsoft documentation is full of samples with Unicode constants. It was some time before I discovered the reason Microsoft uses N’ as a prefix. It stands for "National." In fact, acceptable alternative identifiers for these data types are

National char
National char varying
National char varying (max)
National text

Tip

Typically, it is not a problem if you omit the N’ prefix on constants. SQL Server automatically converts the string to its Unicode equivalent. Naturally, it is better to insert it whenever you are dealing with Unicode columns or variables, but it is not a big problem. The CPU will just have to perform a couple of extra cycles to make the conversion.

However, there are cases in which it becomes a problem. When your string constant is part of a query criterion, then the presence of the N’ prefix might significantly affect execution of the query. If the column is defined as a non-Unicode string and the criterion is specified with the N’ prefix, SQL Server converts every row of the table to compare it with the Unicode constant. As a result, the query performs a table scan instead of using an index.

Date and Time Data Types

SQL Server supports two T-SQL data types for storing date and time:

datetime
smalldatetime

The main difference between these two data types is in the amount of space they occupy, datetime occupies 8 bytes and smalldatetime only 4 bytes. Other differences between the two types are the precision of the date stored and the range of dates that can be used. The precision of smalldatetime is one minute, and it covers dates from January 1, 1900, through June 6, 2079, which is usually more than enough. The precision of datetime is 3.33 ms, and it covers dates from January 1, 1753, to December 31, 9999.

Date and time constants are written in Transact-SQL with quote delimiters (as are character strings):

      update Contacts_2      Set DateOfBirth = '2/21/1965 10:03 AM'      where ContactId = 'CO-92-81'

Tip

SQL Server supports many different date and time formats. The Convert!) function accepts a parameter that controls the format of date and time functions (explained in detail in Chapter 4).

If time is not specified in a constant, SQL Server automatically assigns a default value—12:00 A.M. (midnight). You should keep in mind that SQL Server always records time as a part of these data types. Thus, if you want to select all contacts born on a particular day, you should not use something like this:

      ssselect *      from Contacts_2      where DateOfBirth = '2/21/1965'

This statement would extract records with DateOfBirth set to midnight of that day. Such a solution might be acceptable if all other applications recording values in the field also make the same mistake. A proper solution would be

      select *      from Contacts_2      where DateOfBirth $gt;= '2/21/1965' and DateOfBirth $lt; '2/22/1965'

Integer Numbers

Integers are whole numbers. SQL Server supports 1-, 2-, 4-, and 8-byte integers. The bit data type is used to store 1 or 0, to represent logical true and false values. The following table lists integer data types, their storage size, and range of values.

Data Type	Storage Size	Minimum	Maximum
`int`	4 bytes	-2, 147, 483, 648 (-2G)	s2, 147, 483, 647(26–1)
`smallint`	2 bytes	-32768 (-32Ks)	32767 (32K-1)
`tinyint`	1 byte	0	255(2⁸-1)
`bigint`	8 bytes	-9,223,372,036,854,775,808 (-2⁶³)	9,223,372,036,854,775,807 (2⁶³-1)
`bit`	Ibit	0	1

The great thing about the int data types is that they can store huge numbers in a small space. For this reason, they are often used for key values. If the data type of the primary key is int, the table can store up to four billion records, which is typically enough for any purpose. Therefore, if you are not sure about the size of your table, use int as its primary key.

Tip

However, we are starting to see computers with billions of records—both OLTP and data warehousing systems are getting bigger and bigger, and there are also some implementations of distributed databases that can use integers higher than two billion. In those cases you could use bigint for primary keys.

Integer constants do not need delimiters:

      update Inventory      Set StatusId = 3,          Operational = 0      Where InventoryId = 3432

Approximate Numbers

Decimal numbers are often stored in real and float data types, also known as single and double precision. Their advantage is that they do not occupy much space but they can hold large ranges of numbers. The only trouble is that they are not exact. They store a binary representation of the number that is often approximately, but not exactly, equal to the original decimal number.

Precision is the number of significant digits in the number, and scale is the number of digits to the right of the decimal point. For example, the number 123456.789 has a precision of 9 and a scale of 3. The precision of real numbers is up to 7 digits, and the precision of float numbers is up to 15 digits. For this reason, they are ideal for science and engineering (where, for example, you may not care about a couple of meters when you are measuring the distance between the Earth and the Moon), but they are not adequate for the financial industry (where a company budget has to be exact to the last cent).

To record the number 234,000,000,000 in mathematics, you can use 234xl0⁹, and in Transact-SQL, you can use 234E9. This is known as scientific notation. The number after E is called the exponent, and the number before E is called the mantissa. This notation can be used to store small constants, too. In mathematics, 0.000000000234 can be written as 0.234x10⁹, and in Transact-SQL, it can be written as 0.234E-9.

SQL Server uses the IEEE 754 standard to store these numbers. When a float or real variable or column is assigned a number, SQL Server first converts the decimal number to its binary representation. This conversion is the reason these values are approximately, but not exactly, equal to the decimal version. This is why they are referred to as approximate numbers. Therefore, you should not rely on the equivalence of two such numbers. You should limit their use in Where clauses to < and > operators and avoid the use of the = operator.

Exact Numbers

The decimal or numeric data type does not use approximations when storing numbers. Unfortunately, it requires much more space than the real and float data types. When a decimal column or a variable is defined, you have to specify its scale and precision.

SQL Server can store decimal numbers with a maximum precision of 38. Scale can be less than or equal to the precision.

In the next example, Weight and Height columns have precision 5 and scale 2—the columns can have up to two digits after the decimal point and up to three digits before.

 Create table Patient (PatientId int,      FullName varchar(BO),      Weight decimal(5,2),      Height decimal(5,2),      ADP smallint,      BDZ tinyint) go

decimal constants do not need delimiters either:

 insert into Patient (Patientld, FullName, Weight, Height, ADP, BDZ) values (834021, 'Tom Jones', 89.5, 188.5, 450, 11)

Monetary Data Types

The money and smallmoney data types are a compromise between the precision of decimal numbers and the small size of real numbers, smallmoney occupies 4 bytes and uses the same internal structure as int numbers. The data can have up to four digits after the decimal point. For this reason, you can store numbers ranging from −214,768.3648 to 214,768.3647 in the smallmoney data type. The money data type uses the same structure for storing information as the bigint data type. It occupies 8 bytes for storage, so its values must range from −922,337,203,685,477.5808 to +922,337,203,685,477.5807.

Monetary constants can be preceded by $ or one of 18 other currency symbols (listed in SQL Server Books OnLine):

      update Inventory_2      Set Rent = $0,      LeaseCost = $119.95      Where InventoryId = 3432

Binary Data Types

Binary data types are used to store strings of bits. SQL Server supports four basic binary data types, the attributes of which are similar to character data types:

binary
varbinary
varbinary(max)
image

The binary and varbinary data types can store up to 8,000 bytes of information, and image and varbinary (max) can store up to 2GB of data. The following example creates a table that has two binary columns:

 CREATE TABLE MyTable (      Id int,      BinData varbinary(8000),      Diagram varbinary(max))      go

Binary constants are written as hexadecimal representations of bit strings and prefixed with Ox (zero and x):

      Update MyTable      Set BinData = Ox82A7210B      where Id = 121131

A cool, new feature of SQL Server 2005 is the OPENROWSET() function with the new BULK OLE DB provider to read a file and load it as a rowset to a varbinary (max) column:

      UPDATE dbo.MyTable      SET Diagram = (SELECT *                    FROM OPENROWSET(BULK 'C:\My Pictures\desktop.bmp',                                    SINGLE_BLOB) AS a }      where Id = 121131

Special Data Types

The following sections cover the special data types.

timestamp

The timestamp data type is not designed to store date or time information, but rather is a binary value that serves as a version number of the record. The value is updated every time the record is updated, and the value is unique in the database. It is used to implement optimistic locking. You can find more details about this subject in "Optimistic Locking Using timestamp Values" in Chapter 15. Only one field in a table can be defined as the timestamp value. It occupies 8 bytes.

uniqueidentifier

The uniqueidentifier data type stores 16-byte binary values. These values are often called globally unique identifiers (GUIDs). When a system generates a new GUID value, it is guaranteed that the same value cannot be produced again, neither on the same computer nor on any other computer in the world. GUIDs are generated using the identification number of the network card and a unique number obtained from the computer's clock. Manufacturers of network cards guarantee that the identification number of a network card will not be repeated in the next 100 years.

A uniqueidentifier constant is usually presented as

Character string ‘{}’
Binary constant Oxaf16a66f7f8b31d3b41d30c04fc96f46

However, you will rarely type such values. In Transact-SQL, GUIDs should be generated using the NEWID function. There is also a Win32 API function that a client application can use to produce a GUID value.

uniqueidentifier values are used relatively often for implementations of web applications and distributed database systems. In web applications, designers might use the uniqueidentifier data type to generate a unique identifier before the record is sent to the database. In distributed systems, this data type serves globally unique identifiers.

xml

The xml data type is a major new feature of SQL Server 2005. Before it was introduced, users were storing XML documents as strings or binary data. The new data type allows SQL Server to parse an XML document stored in an XML column and verify that it is compliant with the schema that the user associated with the column.

It is also useful in that a user can issue queries (in the XQuery language) against data in xml columns (for example, to find rows that contain XML documents that have specified values in specified attributes or elements). It is very exciting that it is possible to index xml columns and their attributes and elements, so that the engine does not have to do table and column scans.

However, apart from additional features, the xml data type is based on the varchar (max) data type and it is therefore limited to 2GB. The following example creates a table that has an xml column:

 Create table dbo.Eq2(                 EqId int,                 EqCompList xml)

You should use quotes as delimiters around XML constants:

 INSERT INTO dbo.Eq2(Eqld, EqCompList} VALUES(123,'<CompList><CZ101/><AZ401/><BZ407/x/CompList>'}

sql_variant

The sql_variant data type is based on the same idea as the variant data type in Visual Basic. It is designed to allow a single variable, column, or parameter to store values in different data types. Internally, variant objects record two values:

The actual value
The metadata describing the variant: base data type, maximum size, scale, precision, and collation

The following statement creates a lookup table that can store values of different types:

 Create table Lookup(    LookupGroupId tinyint,    LookupId smallint,    LookupValue sql_variant) Go

Before SQL Server 2000, more than one field was needed to store lookup values of different data types.

The following statements illustrate how you can insert different types of values in one column:

      Insert Lookup (LookupGroupId, Lookupld, LookupValue)      Values (2, 34, 'VAR')      Insert Lookup (LookupGroupId, Lookupld, LookupValue)      Values (3, 22, 2000)      Insert Lookup (LookupGroupId, Lookupld, LookupValue)      Values (4, 16, '1/12/2000')      Insert Lookup (LookupGroupId, Lookupld, LookupValue)      Values (4, 11, $50000)

A sql_variant object can store values of any data type except these:

text
ntext
image
varchar(max)
nvarchar (max)
varbinary (max)
timestamp
sql_variant
any user-defined data types

But there are more serious restrictions on their use:

sql_variant columns are limited to 8,016 bytes.

sql_variant columns can be used in indexes and unique keys if the total length of the data in the key is shorter than 900 bytes. However, this is not a limitation of the sql_variant data type. Indexes cannot be based on columns that are larger than 900 bytes in total.
sql_variant columns cannot have an identity property.
sql_variant columns cannot be part of a computed column.
You must use functions for converting data types when assigning values from sql_variant objects to objects of other data types.
The comparison of sql_variant values has complex rules and is prone to errors.
sql_variant values are automatically converted to nvarchar (4000) when accessed from client applications using OLE DB Provider for SQL Server 7.0 or the SQL Server ODBC Driver from SQL Server 7.0. If stored values are longer than 4,000 characters, SQL Server will return just the first 4,000 characters.
sql_var iant values are automatically converted to varchar (255) when accessed from client applications using the SQL Server ODBC Driver from SQL Server 6.5 or earlier, or using DB-Library. If stored values are longer than 255 characters, SQL Server will return just the first 255 characters.
sql_variant columns are not supported in the Like predicate.
sql_variant columns do not support full-text indexes.
sql_variant objects cannot be concatenated using the + operator, even if the stored values are strings or numeric. The proper solution is to convert values before concatenation.
Some functions—Avg(), Identity(), IsNumeric(), Power(), Radians(), Round(), Sign(), StDev(), StDevP(), Sum(), Var(), VarP()—do not support sql_variant parameters.

Tip

You should be very conservative in using the sql_variant data type. Its use has serious performance and design implications.

table

The table data type is used to store a recordset for later processing. In some ways, this data type is similar to a temporary table. You cannot use this type to define a column. It can only be used as a local variable to return the value of a function.

Note

You will find more information about table variables in the "Table Variables" section later in this chapter, and more information about table-valued functions in Chapters 4 and 9.

The Cursor Data Type

This is a special kind of data type that contains references to cursors. You will see in the "Cursors" section later in this chapter that cursors are programming constructs that are designed to allow operations on records one at a time. It is not possible to define a column of this type. It can be used only for variables and stored procedure output values.

Transact-SQL User-defined Data Types

You can define custom data types in a database. Traditional user-defined data types are defined in Transact-SQL. We will describe them in this segment of the book.

In SQL Server 2005, it is possible to define user-defined data types in .NET as well. We will describe the .NET user-defined data types in Chapter 13.

These new types are based on system-defined data types and are accessible only in the database in which they are defined. You can define them from Enterprise Manager or using the system stored procedure sp_addtype:

      Exec sp_addtype Phone, varchar(20), 'NOT NULL'      Exec sp_addtype typPostalCode, varchar(V), 'NULL'

The first parameter is the name of the new data type, the second parameter is the system-defined data type on which it is based, and the third parameter defines the nullability of the new data type. When the command is executed, the server adds the type to the sys.systype system view of the current database.

New types can be based on any system-defined type except timestamp.

Tip

A fascinating aspect of user-defined data types is that you can change them in one step across the database. For example, if you decide that decimal (19,6) is not big enough for your monetary values, you can replace it with decimal (2 8,13). You can simply run the script that first changed the data type and then re-create all database objects that are referencing it. This feature is very useful during the development stage of a database. Unfortunately, when a database is already in the production phase, tables contain data, and this feature becomes a lot more complicated.

The designers of Microsoft SQL Server have included one special data type with the server—sysname. It is used to control the length of Transact-SQL identifiers. When the server is working in default mode, the length of this type is set to 128 characters. When the compatibility level is set to 65 or 60, the length is shortened to 30 characters. You should use it to define columns and variables that will contain Transact-SQL identifiers.