SQL: Structured Query Language

Conducting business in Japan is pretty easyonce you know the language. The same is true of SQL Server: It's pretty easy to manipulate and access data, once you know the language. In this case, the language is SQL, or Structured Query Language. Originally developed by IBM, SQL has since become a standard across the database industry. Well, kind of. As with America and England, Microsoft's SQL Server and Oracle's "Oracle" are two relational databases divided by a common language. The core parts of the SQL language are pretty consistent between vendors, but each supplier adds a lot of extra features and syntax variations designed by Edgar Codd wannabes.

This section describes those DDL and DML statements that will be most useful in our development of the Library program. You'll be glad to know that SQL isn't too picky about the formatting of the various statements. Upper- and lowercase distinctions are ignored; SELECT is the same as select is the same as SeLeCt. (Traditional SQL code is mostly uppercase. I use uppercase for all keywords, and mixed case for tables, fields, and other custom items. Whatever you choose, consistency is important.) Also, employ whitespace as you see fit. You can put statements on one gigantic line, or put every word on a separate line. The only time whitespace and case matters is in the actual data text strings; whatever you type, that's how it stays.

SQL statements normally end with a semicolon, but some tools do not require you to include the semicolon, and other tools require that you exclude it. When using the SQL Server visual client tools (Management Studio and Management Studio Express), semicolons are optional, but it's a good idea to include them when you are using multiple statements together, one after another. SQL statements used in Visual Basic code never include semicolons.

Later, when you look at a SQL script I wrote, you will see the word "GO" from time to time. In SQL Server, this command says, "For all of the other statements that appeared so far, go ahead and process them now."

DDL Statements

This may come as a shock to you, but before you can store any data in a table, you have to create the table. SQL has just the tool to do this: the CREATE TABLE statement. It's one of the many DDL statements. The basic syntax is pretty straightforward.

CREATE TABLE tableName (    fieldName1    dataType    options,    fieldName2    dataType    options,    and so on... )

Just fill in the parts, and you're ready to populate (data, that is). Table and field names are built from letters and digits; you can include spaces and some other special characters, but it makes for difficult coding later on. Each vendor has its own collection of data types; I'll stick with the SQL Server versions here. The options let you specify things such as whether the field requires data or not, whether it represents the table's primary key, and other similar constraints. Extensions to the syntax let you set up constraints that apply to the entire table, indexes (which let you sort or search a specific column more quickly), and data storage specifics.

Here's a sample CREATE TABLE statement that could be used for the table of order line items (refer to Table 4-4).

CREATE TABLE LineItems (    RecordID     bigint        IDENTITY PRIMARY KEY,    OrderID      bigint        NOT NULL,    CustomerID   varchar(20)   NOT NULL          REFERENCES Customers (CustomerID),    ProductID    varchar(20)   NOT NULL,    Quantity     smallint      NOT NULL )

The IDENTITY keyword lets SQL Server take charge of filling the RecordID field with data; it will use a sequential counter to supply a unique RecordID value with each new record. The PRIMARY KEY clause identifies the RecordID field as the unique identifying value for each record in the table. The bigint and smallint data types indicate appropriately sized integer fields, while the varchar type provides space for text, up to the maximum length specified in the parentheses (20 characters). The REFERENCES option clause identifies a relationship between this LineItems table and another table named Customers; values in the LineItems.CustomerID field match the key values from the Customers.CustomerID field. (Note the "dot" syntax to separate table and field names. It shows up everywhere in SQL.) References between tables are also known as foreign references.

If you need to make structure or option changes to a table or its fields after it is created, SQL includes an ALTER TABLE statement that can change almost everything in the table. Additionally, there is a related DROP TABLE statement used to get rid of a table and all of its data. You might want to avoid this statement on live production data as the users tend to get a bit irritable when their data suddenly disappears off the surface of the earth.

Table 4-5 summarizes the available data types used in SQL Server.

Table 4-5. SQL Server Data Types
Data Type	Description
bigint	An 8-byte (64-bit) integer field for values ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
binary	Fixed-length binary data, up to 8,000 bytes in length. You specify the length through a parameter, as in "binary(100)."
bit	Supports three possible values: 1, 0, or NULL. Generally used for Boolean values. Internally, SQL Server stores multiple bit fields from a single record in a merged integer field.
char, nchar	Fixed-length standard (char) or Unicode (nchar) strings, up to 8,000 characters in length. You specify the length through a parameter, as in "char(100)."
cursor	This data type is used within stored procedures, and cannot be used to create a column.
datetime	A general date and time field for dates ranging from January 1, 1753 AD to December 31, 9999 AD. Time accuracy for any given value is within 3.33 milliseconds.
decimal, numeric	A fixed-precision and scale decimal field. You specify the maximum number of digits to appear on both sides of the decimal point (the precision) and the maximum number of those digits that can appear on the right side of the decimal point (the scale). For instance, a setting of "decimal(10,4)" creates a field with up to ten total digits, four of which may appear after the decimal point. The maximum precision value is 38. "numeric" is a synonym for "decimal," as is "dec."
float	A floating-point decimal field with variable storage. You can specify the number of bits used to store the value, up to 53. By default, all 53 bits are used, so a setting of "float" is equivalent to "float(53)." The pseudo-data type "real" is equivalent to "float(24)." The values stored are on the order of ±1.0x10^±38; the exact range and precision varies by the bits used for storage. This data type is susceptible to minor calculation errors.
image, text, ntext	Don't use these data types as they will eventually be removed from SQL Server.
int	A 4-byte (32-bit) integer field for values ranging from -2,147,483,648 to 2,147,483,647.
money	An 8-byte (64-bit) high-accuracy field for storing currency values, with up to four digits after the decimal point. Stored data values range from -922,337,203,685,477.5808 to 922,337,203,685,477.5807.
smalldatetime	A general date and time field for dates ranging from January 1, 1900 AD to June 6, 2079 AD. Time accuracy for any given value is within 1 minute.
smallint	A 2-byte (16-bit) integer field for values ranging from -32,768 to 32,767.
smallmoney	A 4-byte (32-bit) high-accuracy field for storing currency values, with up to four digits after the decimal point. Stored data values range from -214,748.3648 to 214,748.3647.
sql_variant	A generic type that stores values from many other type-specific fields.
table	A special field that temporarily stores the results of a query in a compacted table format. Defining a table field is somewhat complex, and its use naturally carries with it certain restrictions.
timestamp	This data type is used to record modification times on records. There are restrictions on its use, and it is not guaranteed to be unique within a table.
tinyint	A 1-byte (8-bit) unsigned integer field for values ranging from 0 to 255.
uniqueidentifier	A 16-byte globally unique identifier (GUID). The related NEWID function generates values for this field.
varbinary	Variable-length binary data, up to 8,000 bytes in length. You specify the length through a parameter, as in "varbinary(100)." The field consumes space only for the actual content currently stored in the field. A special setting of "varbinary(max)" allows entry of up to about two billion bytes.
varchar, nvarchar	Variable-length standard (varchar) or Unicode (nvarchar) strings, up to 8,000 characters in length. You specify the length through a parameter, as in "varchar(100)." The field consumes space only for the actual content currently stored in the field. A special setting of "varchar(max)" allows entry of up to about two billion characters.
xml	Provides storage for XML data.

DML Statements

Although DDL statements are powerful, they aren't used that much. Once you create your database objects, there's not much call for tinkering. The DML (Data Manipulation Language) statements are more useful for everyday data surfing.

The INSERT statement adds data records to a table. Data is added to a table one record at a time. (A variation of INSERT lets you insert multiple records, but those records must come from another existing table source.) To use the INSERT statement, specify the destination table and fields, and then the individual values to put into each field. One data value corresponds to each specified data column name.

INSERT INTO LineItems   (OrderID, CustomerID, ProductID, Quantity)   VALUES (10002, 'BW3', 'BEV01COF', 1)

Assuming this statement goes with the CREATE TABLE statement written earlier, this insert action will add a new record to the LineItems table with five new fieldsfour specified fields, plus the primary key automatically added to the RecordID field (because it was marked as IDENTITY). SQL Server also does a variety of data integrity checks on your behalf. Each data field you add must be of the right data type, but you already expected that. Because we designed the CustomerID field to be a reference to the Customer table, the insert will fail if customer 'BW3' does not already exist in the Customer table.

Numeric literals can be included in your SQL statements as needed without any additional qualification. String literals are always surrounded by single quotes, as is done for the customer and product IDs in this INSERT statement. If you need to include single quotes in the literal, enter them twice.

'John O''Sullivan'

Surround literal date and time values with single quotes.

'7-Nov-2005'

Such date and time values accept any recognized format, although you should use a format that is not easy for SQL Server to misinterpret.

Many field types support an "unassigned" value, a value that indicates that the field contains no data at all. Such a value is known as the "null" value, and is specified in SQL Server using the "NULL" keyword. You cannot assign NULL to primary key fields, or to any field marked with the "NOT NULL" option.

To remove a previously added record, use the DELETE statement.

DELETE FROM LineItems WHERE RecordID = 92231

The DELETE statement includes a "WHERE" clause (the "WHERE RecordID = 92231" part). WHERE clauses let you indicate one or more records in a table by making comparisons with data fields. Your WHERE clauses can include AND and OR keywords to join multiple conditions, and parentheses for grouping.

DELETE FROM LineItems WHERE OrderID = 10001    AND ProductID = 'BRD05RYE'

Such a DELETE statement may delete zero, one, or 1,000 records, so precision in the WHERE clause is important. To delete all records in the table, exclude the WHERE clause altogether.

DELETE FROM LineItems

The UPDATE statement also uses a WHERE clause to modify values in existing table records.

UPDATE LineItems SET Quantity = 4    WHERE RecordID = 92231

Assignments are made to fields with the SET clause; put the field name (Quantity) on the left side of the equals sign, and the new value on the right (4). To assign multiple values at once, separate each assignment with a comma. You can also include formulas and calculations.

UPDATE LineItems SET Quantity = Quantity + 1,    ProductID = 'BEV02POP'    WHERE RecordID = 92231

As with the DELETE statement, the UPDATE statement may update zero, one, or many records based on which records match the WHERE clause.

The final DML statement, and the one most often used, is SELECT.

SELECT ProductID, Quantity FROM LineItems    WHERE RecordID = 92231

SELECT scans a table (LineItems), looking for all records matching a given criteria (RecordID = 92231), and returns a smaller table that contains just the indicated fields (ProductID and Quantity) for the matching records. The most basic query returns all rows and columns.

SELECT * FROM LineItems

This returns all records from the table in no particular order. The asterisk (*) means "include all fields."

The optional ORDER BY clause returns the results in a specific order.

SELECT * FROM LineItems    WHERE Quantity > 5    ORDER BY ProductID, Quantity DESC

This query returns all records that have a Quantity field value of more than five, and sorts the results first by the ProductID column (in ascending order), and then by the numeric quantity (in descending order, specified with DESC).

Aggregate functions and grouping features let you summarize results from the larger set of data. The following query documents the total ordered quantity for each product in the table.

SELECT ProductID, SUM(Quantity) FROM LineItems    GROUP BY ProductID

You can use joins to link the data from two or more distinct tables. The following query joins the LineItems and Customer tables on their matching CustomerID columns. This SELECT statement also demonstrates the use of table abbreviations (the LI and CU prefixes) added through the AS clauses; they aren't usually necessary, but they can help make a complex query more readable.

SELECT LI.OrderID, CU.CustomerName, LI.ProductID    FROM LineItems AS LI INNER JOIN Customer AS CU    ON LI.CustomerID = CU.CustomerID    ORDER BY LI.OrderID, CU.CustomerName

This table uses an "inner join," one of the five main types of joins, each of which returns different sets of records based on the relationship between the first (left) and second (right) table in the join.

Inner Join. Returns only those records where there is a match in the linked fields. This type of join uses the "INNER JOIN" keywords.
Left Outer Join. Returns every record from the left table, and only those records from the right table where there is a match in the linked fields. If a left table record doesn't have a match, it acts as if all the fields in the right table for that record contain NULL values. This type of join uses the "LEFT JOIN" keywords. One use might be to join the Product and LineItems tables. You could return a list of the full product name for all available products, plus the total quantity ordered for each one. By putting the Product table on the left of a left outer join, the query would return all product names, even if that product had never been ordered (and didn't appear in the LineItems table).
Right Outer Join. This works just like a left outer join, but all records from the right table are returned, and just the left table records that have a match. This type of join uses the "RIGHT JOIN" keywords.
Full Outer Join. Returns all records from the left and right tables, whether they have matches or not. When there is a match, it is reflected in the results. This type of join uses the "FULL JOIN" keywords.
Cross Join. Also called a Cartesian Join. Returns every possible combination of left and right records. This type of join uses the "CROSS JOIN" keywords.

Joining focuses on the relationship that two tables have. (This use of "relationship," by the way, is not the basis for the term "relational database.") Some tables exist in a "parent-child" relationship; one "parent" record has one or more dependent "child" records in another table. This is often true of orders; a single "order header" has multiple "line items." This type of relationship is known as one-to-many, because one record is tied to many records in the other table. And the relationship is unidirectional; a given child record does not tie to multiple parent records.

A one-to-one relationship ties a single record in one table to a single record in another table. It's pretty straightforward, and is often used to enhance the values found in the original record through a supplementary record in a second table.

In a many-to-many relationship, a single record in one table is associated with multiple records in a second table, and a single record in that second table is also associated with multiple records in the first table. A real-world example would be the relationship between teachers and students in a college setting. One teacher has multiple students in the classroom, but each student also has multiple teachers each semester. Practical implementations of many-to-many relationships actually require three tables: the two related tables, and a "go between" table that links them together. I will show you a sample of such a table in the "Project" section of this chapter.

Beyond Basic SQL

The sample statements I listed here scratch only the surface of the data manipulation possibilities available through SQL. But by now you should have noticed that SQL is remarkably English-like in syntax, much more than even Visual Basic. In fact, the original name for the languageSEQUELwas an acronym for "Structured English Query Language." As the SQL statements get more complex, they will look less and less like an eighth-grade essay, and more like random collections of English words.

The goal here is to introduce you to the basic structure of SQL statements. Most of the statements we will encounter in the Library Project will be no more complex than the samples included here. If you're hungry for more, the "Books Online" component installed with SQL Server (a separate download for the Express Edition) has some pretty good usage documentation. There are also several good books available on the ins and outs of SQL, including vendor-specific dialects.

DDL Statements

Table 4-5. SQL Server Data Types

DML Statements

Beyond Basic SQL