No Strings Attached

Generally all strings could be divided into character strings (to store plain text) and binary strings which contain either machine code (computer programs) or special binary instructions for other programs.

Character strings

A character string can be defined simply as a sequence of bytes. The length of the string is the number of bytes in the sequence. A string of zero length is called an empty string. It can be an equivalent to NULL (special concept introduced at the end of this chapter) or not, depending on implementation. SQL99 specifically differentiates between empty strings and nulls.

All strings in SQL can be of fixed length or varying length. The difference is quite simple, but sometimes not very easy to understand for people with no technical background, so let us explain it in some greater detail.

Fixed-length character strings

If you define string to be of a fixed length, the system preallocates the desired number of bytes in memory and/or computer hard disk. It does not matter if the actual value to be stored in that string is exactly that many bytes, twice smaller, or just one character long — it is still going to occupy the whole allocated space (unused bytes will be padded with blank characters), so all strings will have exactly the same length. For example, imagine you defined a column DATABASE as having a character field of length 13, and now we want to store three strings in that column: ORACLE, UDB2, and MS SQL SERVER. Figure 3-1 illustrates the results of that operation. Note that all strings are exactly 13 bytes long.

click to expand
Figure 3-1: Fixed-length character string storage

Character strings of varying length

If you define DATABASE column as a varying-length string with maximum 13 characters to store, the picture will be different. The actual memory or disk space required to hold our values is allocated dynamically. As a result, all three strings will be of different length and will require different numbers of bytes to hold them. String ORACLE occupies 6 bytes, DB2 UDB — 4 bytes and MS SQL SERVER takes maximum allowed 13 bytes. (See Figure 3-2.)

click to expand
Figure 3-2: Varying-length character string storage

Here is the general platform-independent recommendation: use a fixed-length data type when your values are expected to be of the same size, and a varying-length one for values when size is expected to vary considerably. In the example above it rather looks logical to use the varying-length strings, but if we need a column to store, say, gender in form M or F, a fixed-length string is more appropriate.

National character strings

Even though English is a very popular language, it is not the only language on Earth. And in spite of the fact that practically all major software companies reside in the United States, the market dictates its own rules. For example, Oracle is a very popular database vendor around the world with customers in China, India, Korea, Germany, France, Israel, Russia, Saudi Arabia, and many other countries.

Note 

Oracle has customers in 145 countries. But would that be the case if the English language was the only option customers were able to use? Most likely, the answer is "no." Customers have to be able to store and use information in their native language — otherwise they would rather use some less-efficient and/or more expensive DBMS vendor who provides that option.

Now we have a little bit of a problem. So far we've used terms "character" and "byte" as synonyms. Byte is a computer term for a unit of information storage that consists of 8 bits. Each bit can either be 1 or 0 and the combination of 8 bits allows us to store 256 (28) distinct values (or 256 different characters represented by numbers from 0 to 255). That looks like a lot, but ... not actually. We need separate holders for uppercase and lowercase letters, punctuation marks, digits, math symbols, and so on. That barely leaves space for distinct characters used in other languages that employ the Latin alphabet. And what about the ones which don't? There are about 3,000 different languages in the world, dead and living, in addition to constructed languages like J.R.R. Tolkien's Quenya, Sindanin, or Entish, and most of them have their own distinct alphabets!

Note 

ASCII (American Standard Code for Information Interchange) was published in 1968 as a standard of ANSI. It uses the aforementioned 256 holders to store different characters, and it remains a useful standard to this day.

The solution seems to be rather intuitive — use two bytes per character instead of one. That allows 65,535 (216) distinct combinations, which is enough to store all existing characters from every major language on Earth.

SQL has two data types to store strings in national characters — national character string and national character string of varying length — that behave in exactly same way as previously described character string and character string of varying length correspondingly, but use the two-byte Unicode standard. So, if you declared your DATABASE column as a national character field of size 13, it would still hold 13 characters, but would reserve 2 bytes for each letter, for a total of 26 bytes. The difference is, now it can hold the names from previous examples spelled in practically any language, for example, in Russian. Figure 3-3 illustrates that.

click to expand
Figure 3-3: Fixed-length Unicode character storage

Figure 3-4 shows same concept for national characters of varying length.

click to expand
Figure 3-4: Varying-length Unicode character storage

start sidebar
Introducing Unicode

The ultimate successor to ASCII is Unicode. It is a standard double-byte character set that assigns a unique number to every single character, so all of them can be represented in one character set. To speak in database terminology, the Unicode character number serves as a primary key to index virtually all the world's characters. (Another good example of database use!) The Unicode standard has been adopted by such industry leaders as Apple, HP, IBM, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, and many others. Unicode is required by modern standards such as XML, Java, JavaScript, CORBA, WML, HTML, and is the official way to implement ISO standard 10646. It is supported in many operating systems, all modern browsers, major RDBMS vendors, and many other products. The emergence of the Unicode standard and the availability of tools supporting it are among the most significant recent global software technology trends.

end sidebar

Note 

In Russian Oracle is spelled with only five characters rather than six in English, so only five memory (or hard disk) holders are occupied, but now each holder is two bytes long.

Let's talk about SQL99 standards and implementation specifics for all types of character strings. These are summarized in Table 3-1.

Table 3-1: Major Vendor Implementations Character String Data Types

SQL99

Oracle 9i

DB2 UDB 8.1

MS SQL SERVER 2000

CHAR[ACTER]
[(n)]

CHAR[ACTER]
VARYING(n) OR VARYING(n)

CLOB

CHAR[ACTER]
[(n)]

CHAR[ACTER]
VARYING(n)

VARCHAR(n)

VARCHAR2(n)

LONG [VARCHAR]

CLOB

CHAR[ACTGR]
[(n)]

CHAR[ACTER]
VARCHAR(n)

VARCHAR(n)

LONG VARCHAR

CHAR[ACTER][(n)]

CHAR[ACTER]
VARYING[(n)]

VARCHAR[(n)]

TEXT

NATIONAL CHAR[ACTER] [(n)]OR NCHAR[(n)] OR CHARACTER[(n)] CHARACTER SET <char_set_name>

NATIONAL CHAR[ACTER] [(n)]
NCHAR[(n)]

GRAPHIC[(n)]

NATIONAL CHAR[ACTER][(n)] NCHAR[(n)]

NATIONAL CHAR[ACTER] VARYING(n) OR NCHAR VARYING(n) OR CHARACTER VARYING(n) CHARACTER SET <char_set_name>

NATIONAL CHAR[ACTER] VARYING[(n)] NCHAR VARYING(n) NVARCHAR2(n) NCLOB

VARGRAPHIC(n) LONG VARGRAPHIC(n) DBCLOB(n)

NATIONAL CHAR[ACTER] VARYING(n) NCHAR VARYING[(n)] NVARCHAR[(n)]

SQL99

SQL99 has two major character sets: CHARACTER and CHARACTER VARYING. In addition, there are also NATIONAL CHARACTER and NATIONAL CHARACTER VARYING.

CHARACTER can also be abbreviated with CHAR. The size can optionally be specified in the form CHARACTER(n). For example, CHARACTER(15) can hold character strings up to 15 characters long. If size is omitted, the default is 1. An error occurs if one tries to store a string that is bigger than the size declared.

CHARACTER VARYING can be abbreviated with CHAR VARYING or VARCHAR. You have to specify the maximum size for strings to be stored, for example, CHARACTER VARYING(15) holds 15-character strings, or smaller.

NATIONAL CHARACTER (NATIONAL CHAR, NCHAR, CHARACTER CHARACTER SET <char_set_name>) specifies the default data type for the country of implementation. This is a fixed-length character string data type.

NATIONAL CHARACTER VARYING (NATIONAL CHAR VARYING, NCHAR VARYING, CHARACTER VARYING CHARACTER SET < char_set_name>, CHAR VARYING CHARACTER SET < char_set_name>) is a varying-length country-specific character string data type.

CLOB is a new SQL99 data type to store large nondatabase-structured text objects of varying size and complexity, such as employees' resumes, collections of papers, books, and other similar data.

Oracle 9i

Oracle is fully compliant with SQL99 standards for character strings.

  • CHAR is used for fixed-length strings. The default length for a CHAR column is 1 byte with a maximum of 2,000 bytes.

  • VARCHAR2 is an Oracle data type to store varying-length character strings. It does not have the default length, so you have to specify a value from 1 to 4,000 (maximum number of bytes for VARCHAR2).

  • NCHAR and NVARCHAR2 are used to store fixed-length and varying-length national character strings. Beginning with Oracle9i, they were redefined to be Unicode-only data types and can hold up to 2,000 and 4,000 characters (not bytes!) correspondingly. That means if you declare a column to be CHAR(100) it will allocate 100 bytes per column, but NCHAR(100) Unicode-based column requires 200 bytes.

    Note 

    The VARCHAR data type in Oracle is currently a synonym to VARCHAR2. If you declare a column as VARCHAR(30), it will be converted it to VARCHAR2(30) automatically. Oracle does not recommend the use of VARCHAR as a data type, but rather recommends VARCHAR2 instead because keyword VARCHAR may be later used in some different way.

  • CLOB and NCLOB can store up to four gigabytes of data in Oracle. Both fixed-length and variable-length character sets are supported. CLOB uses the CHAR database character set, and NCLOB stores Unicode data using the national character set.

  • LONG is an old Oracle data type to store variable-length character strings containing up to two gigabytes. It is similar to VARCHAR2, but has many limitations. For example, you cannot use LONG in the WHERE clause of a SELECT statement (discussed in Chapter 8), a table can't have more than one LONG column, it can't be indexed, and so on. Oracle strongly recommends to discontinue the use of the LONG data type and use CLOB instead.

    Note 

    Oracle has synonyms for SQL99 compatibility. For example, you can use CHARACTER(100) rather than CHAR(100) or CHARACTER VARYING rather than VARCHAR2 to attain the same results. See Table 3-1 for more details.

DB2 UDB 8.1

DB2 has following character string data types:

  • CHARACTER is compliant with SQL99 standards. The maximum length is 254 characters. The default length is 1.

  • VARCHAR is used for varying-length strings and has a maximum of 32,672 characters.

  • LONG VARCHAR is virtually same as VARCHAR, but can hold larger values (up to 32,700) and can't be limited to a certain number of characters.

  • CLOB types are SQL99 compliant varying-length strings of up to two gigabytes. An optional maximum length can be supplied in kilobytes (K|k), megabytes (M|m), or gigabytes (G|g). For example, CLOB (10M) would allow maximum of 10,048,576 characters.

  • GRAPHIC is a rough DB2 equivalent to NATIONAL CHARACTER. It is a double-byte data type, which may range from 1 to 127 characters. If the length specification is omitted, a length of 1 is assumed.

  • VARGRAPHIC is a varying-length double-byte character string data type, comparable to SQL99 NATIONAL CHARACTER VARYING. The range is from 1 to 16,336.

  • LONG VARGRAPHIC is similar to VARGRAPHIC with a maximum length of 16,350. It does not have an optional length limit to be supplied by user.

  • DBCLOB is a double-byte equivalent to CLOB. Maximum storage is one gigabyte of character data. DBCLOB accepts a maximum length in the same way as CLOB.

    Note 

    GRAPHIC, VARGRAPHIC, and DBCLOB data types are not supported in the Personal Edition of DB2 supplied with your book.

See Table 3-1 for more information.

MS SQL Server 2000

The following character string data types are supported by MS SQL Server:

  • CHAR and VARCHAR are used for fixed-length and variable-length character data correspondingly. The maximum length is 8,000 characters. Unlike Oracle, you don't have to specify length for VARCHAR — it defaults to 1 like CHAR.

  • TEXT is similar to VARCHAR, but can hold much larger values. Its maximum length is two gigabytes or 231 – 1 (2,147,483,647) characters.

  • NCHAR and NVARCHAR, and NTEXT are Unicode equivalents to CHAR, VARCHAR, and TEXT. NCHAR and NVARCHAR can hold up to 4,000 characters; NTEXT is much larger — one gigabyte or 230 – 1 (1,073,741,823) characters.

For SQL99 compatibility synonyms see Table 3-1. If one data type has more than one name (or synonym) the most widely used name is given in italics.

Character string literals

The terms literal and constant refer to a fixed data value, for instance

'Frozen Margarita' 'ALEX'  '2003/08/07' '10101101'

are all character literals. Character literals are enclosed in single quotes. To represent one single quotation mark within a literal, you can enter two single quotation marks:

'O''Neil'

Character literals are surprisingly consistent between all our three major vendors, with only slight variations. For example, MS SQL Server allows double quotes for character literals instead of single ones if the option QUOTED_IDENTIFIER is set off for a connection. To represent a national character set literal, it has to be preceded by capital letter N (DB2 understands G in addition to N):

N'Jack Smith' N'Boris M.  Trukhnov' N'123 OAK ST.'

Text entered using this notation is translated into the national character set.

Binary strings

A binary string is a sequence of bytes in the same way that a character string is, but unlike character strings that usually contain information in the form of text, a binary string is used to hold nontraditional data such as images, audio and video files, program executables, and so on. Binary strings may be used for purposes similar to those of character strings (e.g., to store documents in MS Word format), but the two data types are not compatible; the difference being like text and a photo of the same text. Binary string data types are summarized in Table 3-2.

Table 3-2: Binary String Data Types

SQL99

Oracle 9i

DB2 UDB 8.1

MS SQL SERVER 2000

BIT

  

BINARY[(n)]

BIT VARYING

RAW(n)

 

VARBINARY[(n)]

 

LONG RAW

  

BLOB

BLOB

BLOB(n)

IMAGE

start sidebar
Character vs. Special Files

It might sound a little bit confusing — why plain text documents can be stored as character strings, and a Word document has to be treated as a binary string. The thing is, a Word file is a text document from user's point of view, but from computer storage perspective it is not. In addition to plain text characters it contains many special signs and instructions that only MS Word software can interpret. The same is true for any other special files — bitmaps, spreadsheets, audio and video files, and so forth. You can think of it in this way: a special file (e.g., of the DOC, XLS, BMP, or AVI type) is like a tape for VCR, whereas a program (MS Word, Excel, Paint, QuickTime Player) is like a VCR. You have to have a VCR to play a tape, and it has to be the right VCR — if you try to play a standard US VHS tape in NTSC format on a European video recorder (PAL format), it's not going to work. You might see some blinking on your screen, you will hear some noise, but you will definitely not be able to watch the movie. Just try to open a Word file with, say, Notepad and you will see what we are talking about.

end sidebar

SQL99

SQL99 has following data types to store binary strings: BIT, BIT VARYING, and BLOB.

  • BIT is a fixed-length binary string somewhat similar to CHAR. If you declare a column to be BIT(100), 100 bytes will be allocated in memory/disk, and if the object you store is just 60 bytes, it's still going to occupy all 100 bytes.

  • BIT VARYING is similar to VARCHAR — even if you specify BIT VARYING(100) to be the data type lasting the previous example, it will only take 60 bytes to store the object.

  • BLOB is a binary equivalent to CLOB.

Oracle 9i

Oracle doesn't have an equivalent to SQL99 BIT, but has two data types that correspond to BIT VARYINGRAW and LONG RAW. BLOB data type is also supported.

  • RAW can hold a maximum of 2,000 bytes. The size has to be specified.

  • LONG RAW can accumulate up to two gigabytes of data. This data type is obsolete, and Oracle strongly recommends converting it to BLOB.

  • BLOB can store up to four gigabytes of binary data in Oracle.

DB2 UDB 8.1

The only data type for binary strings in DB2 is BLOB, which can be up to 2 gigabytes long.

MS SQL Server 2000

MS SQL Server has three different data types for binary strings: BINARY, VARBINARY, and IMAGE.

  • BINARY is a fixed-length data type to store binary data. The size can be specified from 1 to 8,000; the actual storage volume is size + 4 bytes.

  • VARBINARY can hold variable-length binary data. The size is from 1 through 8,000. Storage size is the actual length of the data entered + 4 bytes. The data entered can be 0 bytes in length.

  • IMAGE is a variable-length binary data type that can hold from 0 through 2,147,483,647 bytes (two gigabytes) of data.

Binary string literals

MS SQL Server allows literals for binary string fields (BINARY, VARBINARY, IMAGE) either in the form of hexadecimal numbers prefixed with 0x or as binary strings. The value has to be unquoted:

0xAE  0101010010100110

MS SQL Server implicitly converts these literals into appropriate binary format. Oracle and DB2 don't have binary string literals; the values have to be converted into proper format using special functions (see Chapter 10).

Note 

Literals are barely needed for large objects that can store gigabytes of data. In most cases LOBs are not manipulated by traditional SQL statements, but rather accessed by special programs and interfaces that know how to handle such objects without reading them directly into memory.




SQL Bible
Microsoft SQL Server 2008 Bible
ISBN: 0470257040
EAN: 2147483647
Year: 2005
Pages: 208

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net