Using Data Manipulation Language to Retrieve and Modify Data

SQL performs many tasks, which we can categorize in two ways: manipulating data and modifying database objects. This chapter is mainly concerned with Data Manipulation Language (DML), the terminology that manages data by manipulating or acting upon it. You can think of DML as the data entry manager that performs the following tasks:

Retrieves data
Modifies and deletes existing data
Adds new data

The DML statements we'll review in this section are the SELECT, SELECT INTO, UPDATE SET, DELETE, and INSERT INTO statements. All the examples in this chapter ignore security restrictions and assume you have permission to retrieve, change, insert, and even delete data. (Chapter 17 covers security restrictions.)

Note

We'll use SQL Server's Query Analyzer to send SQL requests to one of the sample databases that come with SQL Server to process all our SQL statements. You can use any interface and database you like. The only stipulation is that you be somewhat familiar with the database's tables and fields and that system's unique requirements, especially in regard to delimiting data. For more information on delimiters, see the section on delimiters and concatenating later in this chapter.

Retrieving Data with a SELECT Statement

By far, the most frequently used SQL statement is the SELECT statement. It's easily the workhorse of the entire language because it retrieves data for viewing so that you can browse and even analyze data, depending on the format in which you present the data.

The SELECT statement combines with five clauses to specify and limit the data that's retrieved using the following syntax:

SELECT ALL | DISTINCT column1[As alias[, column2...]]| * FROM datasource [WHERE condition] [GROUP BY column1[, column2...]] [HAVING condition] [ORDER BY column1[, column2...] [ASC | DESC]]

The FROM clause is the only clause that's mandatory. The ALL predicate is the default and assumed if omitted, but not optional. You can include it for documentation purposes if you like. The following examples start with the simplest form of the statement and add clauses to restrict the data that's retrieved or how it's presented.

Tip

The asterisk character (*) used as an argument in the SELECT clause retrieves all the columns in the underlying datasource. Avoid using this character unless you truly need all the columns, because the more data you retrieve, the slower the query's performance.

The Simplest SELECT

In its simplest form, the SELECT statement retrieves all the columns from all the records in a table using just the mandatory FROM clause

SELECT * FROM datasource

For example, the following statement retrieves all the columns and records from the Employees table in the Northwind sample database that comes with SQL Server 2000:

SELECT * FROM Employees

The results are shown in Figure 18.1, although we can't show the complete records, since some of the columns scroll off the screen. (Notice that we chose Northwind in the database drop-down list.) You'll seldom retrieve an entire table's worth of data.

click to expand
Figure 18.1: Use SELECT to retrieve data.

Note

Throughout this chapter, we'll use the same general format conventions for SQL statements. First, all keywords will use uppercase. Second, most clauses appear on individual lines, which makes the statements more readable. SQL ignores line breaks. Third, some developers include a semicolon character (;) at the end of each statement. SQL Server doesn't require this, so we've omitted it from our examples. Check with your system's documentation to determine whether the semicolon is required. Some systems will accept multiple statements. When this is the case, you will definitely need the semicolon to separate the statements, but we won't be working with anything that complex in this chapter.

The AS Clause

You're not stuck using column names from the supporting database. Using an AS clause you can create a temporary name, or alias, for the column. The clause is optional, and when it is omitted, SQL uses the column's name. To modify a column name use the syntax

SELECT column1 [AS alias[, column2...]]

Figure 18.2 shows the concatenated first and last name entries in a column named EmployeeName:

click to expand
Figure 18.2: Use the AS clause to create an alias for any column.

SELECT EmployeeID, FirstName + ' ' LastName AS EmployeeName

If you want to include a space character in the alias, be sure to enclose the alias in brackets ([ ]) if required by your system. (We include more information on concatenation later in this chapter.)

Tip

If you're using ADO (ActiveX Data Objects) to carry on a conversation with the supporting database, you might also use the Field object. Furthermore, you might use the Field object's Name property to return that field's name. If you use the AS keyword to set an alias, the Name property will return the alias, not the field's actual name.

The FROM Clause

The FROM clause specifies the datasource from which the SELECT statement will retrieve data. This clause usually refers to one table or query, but can refer to multiple tables. When it does, use the following syntax:

FROM onetable jointype manytable ON onetable.primarykey = manytable.foreignkey

(For more information on the types of joins, see Chapter 5.)

Limit Data by Specifying Columns

The first statement (in "The Simplest SELECT" section) uses the asterisk (*) to retrieve all the data from the Employees table, but you'll seldom want to work with that much data at one time. The first step to limiting data is to limit the columns by identifying only the columns you need in the following form:

SELECT column1[, column2...] FROM datasource

When using this form, you must specify at least one column. If you include a list, separate each with a comma character (,). List the columns in the order you want them displayed in the resultset. The following statement returns the EmployeeID, LastName, and FirstName columns from the Employees table:

SELECT EmployeeID, LastName, FirstName FROM Employees

The results are shown in Figure 18.3.

click to expand
Figure 18.3: Limit the retrieved data by specifying columns.

SQL Predicates

By default, the SELECT statement returns all records because SQL assumes the ALL predicate (the default). The two following statements return the same records:

SELECT ALL * FROM Employees SELECT * FROM Employees

In contrast, the DISTINCT predicate limits the results to only unique values in the field that follows its use:

SELECT DISTINCT column1[, column2...]

This predicate returns a list of unique values in column1 based on the entire SELECT statement. Additional columns are considered, but the elimination of duplicate values takes precedence from left to right; so additional columns should have no effect on the values returned from column1. For instance, the following statement

SELECT DISTINCT City FROM Customers

returns a unique list of 69 cities from the Customers table in Northwind. The statement

SELECT DISTINCT City, Country FROM Customers

returns the same unique list of 69 cities, but also lists the country for each. The DISTINCT predicate eliminates records only if the combined values create a duplicate record.

This predicate presents a chicken or the egg scenario because the first column is already unique; so it's impossible to create a duplicate record. This doesn't seem important if you're really eliminating duplicates from left to right. However, it can produce unexpected results, depending on the order of the columns. For instance, the statement

SELECT DISTINCT Country FROM Customers

returns the unique list of 21 countries shown in Figure 18.4. If you include the City column in the recordset but add it to the right of Country instead of to the left

click to expand
Figure 18.4: The DISTINCT predicate returns a unique list of 21 countries.

SELECT DISTINCT Country, City FROM Customers

the unique list climbs to 69, as shown in Figure 18.5. As you can see, the leftmost column, Country, is not a unique list. This may or may not be what you want.

click to expand
Figure 18.5: The DISTINCT predicate returns unique records, not just values.

Warning

The DISTINCT predicate returns a recordset that cannot be updated. That means you can't modify the underlying data via the query results. For more information on updateable queries, see "Updateable Queries" later in this chapter.

Note

Access provides the DISTINCTROW predicate, which eliminates duplicates based on all the columns in the data- source. The DISTINCT predicate considers only the columns specified in the SELECT statement. You can use the asterisk character (*) with DISTINCTROW.

The TOP predicate is optional and returns the top n rows or top n percent of records, based on the SELECT clause. This predicate is useful when you want to return a subset of records that meet all the other criteria. SQL processes the TOP predicate only after it completes all other criteria-joins, predicates, grouping, and sorts.

The TOP predicate uses the form

TOP n [PERCENT]column1[,column2...]

and can be combined with other predicates in the form

SELECT [ALL | DISTINCT][TOP n [PERCENT]column1[,column2...]]

For example, let's use the TOP predicate to return the five most expensive items from the Northwind products list. To do so, run the following statement:

SELECT TOP 5 ProductID, ProductName, UnitPrice FROM Products ORDER BY UnitPrice DESC

The DESC keyword specifies a descending sort. As you can see in Figure 18.6, the query returns only five records-the five records with the largest values in the UnitPrice column.

click to expand
Figure 18.6: Use the TOP predicate to return a portion of records.

Similarly, you can use the ASC keyword to return the five least expensive items. To do so, run the following statement

SELECT TOP 5 ProductID, ProductName, UnitPrice FROM Products ORDER BY UnitPrice ASC

as we've done in Figure 18.7. By specifying ascending (ASC) order, we're telling the system to start with the lowest value and work up.

click to expand
Figure 18.7: The ASC keyword reverses the ORDER BY order.

Now let's return five percent-as opposed to just five records-of the most expensive products using the following statement:

SELECT TOP 5 PERCENT ProductID, ProductName, UnitPrice FROM Products ORDER BY UnitPrice DESC

As you can see in Figure 18.8, the results aren't the same. The first few records are the same, but instead of returning five actual records, the PERCENT version returns only four records. There are 77 product records, and 5 percent of 77 is 3.85. The TOP predicate always rounds up to the next highest integer, so we end up with four records.

click to expand
Figure 18.8: The TOP predicate also returns a percentage of records.

We jumped the gun a bit by including the ORDER BY clause, which we'll review later in this chapter. Most TOP queries simply don't make sense without the ORDER BY clause, although you can run them. Without an ORDER BY clause, SQL returns what may seem like a meaningless set of records. Figure 18.9 shows the result of our previous TOP 5 statement without the ORDER BY clause. In the absence of any sort, SQL returns the first five records in the table, which appear in entry order (as verified by the values in the Identity column, ProductID).

click to expand
Figure 18.9: Without the ORDER BY clause, the TOP predicate makes little sense.

If the TOP predicate finds duplicate records that meet the SELECT statements criteria, it returns both records and includes both in its count. For instance, if another product had a price of $81. 00, the query in Figure 18.6 would have bumped Carnarvon Tigers from the results and displayed both $81. 00 products.

If the last row in the result has an equivalent record, SQL returns both (or all) rows with equivalent values. This is another reason the ORDER BY clause is so important. Without this clause, SQL depends on all the columns in the SELECT clause to break a tie. Otherwise, SQL depends only on the columns in the ORDER BY clause. (This rule can vary from system to system, so check your system's documentation if you don't get the results you expect.)

Note

The SELECT statement is by far the most versatile of all the DML statements. For that reason, we'll introduce not just the SELECT clause, but the many clauses that work with SELECT that make it so flexible and powerful. However, these clauses aren't restricted to the SELECT statement. You can use them with many of the action clauses.

The WHERE Clause

Just as you can limit the columns that are retrieved, the WHERE clause limits the records (rows) that are retrieved by specifying a condition that a record must meet before SQL will include that record in its results. Use the clause in the following form:

SELECT [ALL | DISTINCT] column1[As alias[, column2...]]| * FROM datasource [WHERE condition]

The condition argument is stated as a conditional expression and can be as simple as a comparison to a literal value or a complex expression. Think of the WHERE clause as a filter that eliminates records from the final resultset.

Use the AND and OR operators to link expressions in the form:

WHERE expression1 AND | OR expression2 ...

The number of columns or expressions in this clause is limited by each system, so refer to your system's documentation for specifics. (Read more about SQL operators later in this chapter.)

Let's start with a simple example that compares the data to a literal value. The following statement returns only beverage products from the Northwind Products table:

SELECT * FROM Products WHERE CategoryID = 1

The results, shown in Figure 18.10, include only beverage products (the CategoryID value 1 indicates a beverage).

click to expand
Figure 18.10: Use the WHERE clause to filter records.

Adding criteria complicates the WHERE clause but gives you more control over the results. For example, let's suppose you want to return only beverages from a specific supplier, Exotic Liquids. The following statement uses an AND operator to include both conditions in one WHERE clause:

SELECT * FROM Products WHERE CategoryID = 1 AND SupplierID = 1

This time, the resulting recordset includes only two records, as shown in Figure 18.11. (The value 1 is Exotic Liquids' SupplierID value.)

click to expand
Figure 18.11: Use an AND or OR operator to include more than one condition in a WHERE clause.

The WHERE clause is flexible. You can refer to columns that aren't in the SELECT clause-as long as the column's in the underlying datasource. For instance, the previous statement includes all the columns, including the CategoryID column. You don't have to include the CategoryID column in the results. For example, the following statement

SELECT ProductName, SupplierID, UnitPrice FROM Products WHERE CategoryID = 1

return the results shown in Figure 18.12-the results correspond to the CategoryID value 1, although that column isn't included.

click to expand
Figure 18.12: The WHERE clause can handle column references that aren't included in the SELECT clause.

Tip

Nulls present a special problem when filtering records. Specifically, SQL tends to ignore Null values when filtering records. If a column contains a Null value, SQL usually eliminates it from the results. To include Null values, add the expression OR column Is Null to the WHERE clause. Your system may respond differently, so consult your system documentation for more information on Null values.

The ORDER BY Clause

The ORDER BY clause doesn't limit records; it only sorts them in ascending or descending order, with an ascending sort being the default. You can sort by text, numeric, and date/time columns. Referencing any other type of column returns an error. Similarly to the WHERE clause, the ORDER BY clause can reference a column that isn't in the SELECT clause.

The ORDER BY clause uses the following syntax:

ORDER BY column1 [ASC | DESC][,column2 [ASC | DESC][,...]]

Although that looks rather complicated, simply put, you specify the column or columns you want to sort by, in the order of preference from left to right. If you want an ascending sort, the ASC keyword isn't required, as it's the default. We'll start with a simple example that sorts customers by name:

SELECT * FROM Customers ORDER BY CompanyName

SQL performs an alphabetic, ascending sort, as shown in Figure 18.13, because CompanyName is a text column.

click to expand
Figure 18.13: The ORDER BY clause sorts a text column alphabetically.

Now let's try something more complicated. The following statement sorts by the ContactTitle first and then performs a second sort, within the first group (ContactTitle), on the ContactName column:

SELECT ContactTitle, ContactName, CompanyName FROM Customers ORDER BY ContactTitle, ContactName

Figure 18.14 shows the results. We've scrolled down just a bit so you can see both sort groups, Accounting Manager and Assistant Sales Agent, at work. Notice that the ContactName entries are alphabetically sorted within the Accounting Manager group until Marketing Assistant resets the sort for the ContactName entries.

click to expand
Figure 18.14: The ORDER BY clause can sort by more than one column.

Note

Some systems allow you to reference columns by position instead of name. For instance, the clause ORDER BY 3 performs an ascending sort on the third column in the datasource. We don't recommend it unless the column's position, within the context of the task, is more important than the column's name. Names are simply more explicit and less prone to errors; a number isn't as accurate or as self-documenting as a name.

The GROUP BY Clause

An Aggregate function evaluates an entire column, and an aggregate query works in a similar fashion. The GROUP BY clause defines groups that you might want to evaluate in some calculation as a whole. When this is the case, the result is a summary of the underlying data. The GROUP BY clause doesn't actually summarize data; it just groups the data. Any calculations for summarizing that data must be provided in the form of Aggregate functions.

You can group data without summarizing it; however, you can't summarize data without grouping it first. When summarizing, the group might not be apparent since you generally see only the results of the Aggregate function and not the data that the function evaluated.

The syntax for the GROUP BY clause is simple:

GROUP BY column1[, column2...]

The column arguments can reference actual columns by name, calculated fields (as long as the expression doesn't reference an Aggregate function), or constants.

All fields in the SELECT clause must also be present in one of the following positions:

An argument to an Aggregate function
In the GROUP BY clause

As a result of this restriction, every column in the resultset either defines a group or evaluates a group. On the other hand, a column needn't be part of the SELECT clause to be included in the GROUP BY clause, as long as that column is in the underlying datasource. Keep in mind that SQL displays only those columns specified in the SELECT clause.

The GROUP BY clause defines groups based on the columns, in the order you specify them in the clause (from left to right). The results are presented in ascending order.

Figure 18.15 shows the results of a simple grouping statement:

click to expand
Figure 18.15: A simple GROUP BY clause can generate results similar to a sort.

SELECT OrderID, OrderDate, ShippedDate FROM Orders GROUP BY OrderDate, ShippedDate, OrderID

It appears to simply sort the results by the OrderDate, then the ShippedDate, and then the OrderID value. As is, the resultset probably isn't useful and closely resembles the data entry order. (Check the Orders table to compare.)

When using the GROUP BY clause, you'll usually want to summarize at least one column. Otherwise, as you just saw, an ORDER BY clause might be a better solution because of GROUP BY's restrictions. It makes more sense to use GROUP BY when you want to evaluate a column in some way. For instance, the following statement groups the records by the OrderDate column and displays a total for the number of orders placed on that day.

SELECT OrderDate, Count(OrderID) AS Total FROM Orders GROUP BY OrderDate

Each unique date value is now a group, and instead of displaying each value, SQL displays just one along with the result of the Count() function, as shown in Figure 18.16. Also notice that the OrderID column isn't in the GROUP BY clause. As long as it's part of an Aggregate function in the SELECT clause, it isn't necessary to include it in the GROUP BY clause.

click to expand
Figure 18.16: Use GROUP BY to define a group you want to evaluate using an Aggregate function.

Note

Systems vary in the way they respond to GROUP BY. First, your system may or may not include Null values in the results. Second, your system probably won't group on a memo or image type column-there's nothing on which to base a group. Third, the number of GROUP BY fields you can include in one clause may be restricted by your specific system.

Limiting the Group's Results

You can limit the results of a grouped query in two ways:

By adding a WHERE clause to eliminate records you don't want grouped
By adding a HAVING clause to act as a filter on the group, eliminating records from the group

These two choices aren't interchangeable, although sometimes they return the same results. One eliminates records before the GROUP BY clause ever considers them, and one eliminates records after SQL applies the GROUP BY clause. (We'll discuss HAVING in the next section.)

Technically, when you add a WHERE clause to a GROUP BY statement, you're not eliminating records from each group. You're limiting the results of the SELECT clause and thereby limiting the records that get grouped by the GROUP BY clause. For instance, the previous query shown in Figure 18.16 doesn't really limit data; it just groups data. Each date in the OrderDate column appears in the results; it just appears once.

Let's limit the records that make it to the GROUP BY process using a WHERE clause to eliminate all orders that were placed before January 1, 1998. To do so, use the following statement:

SELECT OrderDate, Count(OrderID) AS Total FROM Orders WHERE OrderDate >'12-31-1997' GROUP BY OrderDate

Warning

Don't forget to add the appropriate date delimiter if you're not using SQL Server. For instance, Access users would use the statement WHERE OrderDate >#12-31-1997#.Figure 18.17 shows the results, which are still grouped. However, the results no longer display dates prior to 1998.

click to expand
Figure 18.17: The WHERE clause eliminates records before they're grouped by the GROUP BY clause.

The HAVING Clause

Using SQL's HAVING clause is another way to limit the results of a group. Just remember that the HAVING clause is applied after the data is grouped. This is especially useful when you want to filter records based on a summarized evaluation for each group. This clause uses the syntax

SELECT ALL | DISTINCT column1[As alias[, column2...]]| * FROM datasource [GROUP BY column1[, column2...]] [HAVING condition]

in which condition can be one or more expressions combined by the AND or OR operators in the form:

HAVING expression1 AND | OR expression2 ...

We can illustrate the HAVING clause by returning to an earlier example that returned order dates and the number of each placed on each date. Now, let's suppose you want to eliminate all the date groups that have two or fewer orders. To do so, add a HAVING clause as follows:

SELECT OrderDate, Count(OrderID) AS Total FROM Orders GROUP BY OrderDate HAVING Count(OrderID) > 2

The results shown in Figure 18.18 are grouped just as before but include only those dates on which three or more orders were placed.

click to expand
Figure 18.18: The HAVING clause eliminates dates that had two or fewer orders to show for the day.

The previous two sections show different ways to limit the results of a grouped recordset at different times during processing. The WHERE clause eliminates records before they're grouped, and the HAVING clause eliminates records after grouping. It's possible to include both limiting clauses in the same statement.

Our last example used the HAVING clause to eliminate dates with fewer than three orders for the day, but the results still had those pre-1998 dates. We can have both. Figure 18.19 shows the results of the following statement, which includes only dates with three or more orders for orders occurring on or after January 1, 1998:

SELECT OrderDate, Count(OrderID) AS Total FROM Orders WHERE OrderDate > '12-31-1997' GROUP BY OrderDate HAVING Count(OrderID) > 2

click to expand
Figure 18.19: You can combine the WHERE and HAVING clauses.

Modify Data with UPDATE

Websites are no longer confined to just displaying data. Users can actually modify data via your web- site. For instance, a human resources manager might update employee files via an intranet that all managers can access. Thanks to SQL, the human resources manager doesn't need to know a thing about the underlying database-they simply call up the appropriate employee record using a web page designed by the web developer. After making the appropriate changes, they submit those changes by simply clicking a button designed for that purpose, and SQL updates the underlying database behind the scenes.

You can use SQL's UPDATE statement to modify existing data, whether you're updating one record or thousands. SQL UPDATE uses the following syntax:

UPDATE datasource SET column1 = expression1[, column2 = expression2][,...]  [WHERE condition]

The datasource can be a table or an updateable query-a query that refers to a resultset that passes along changes to the underlying datasource. (There's a section on updateable queries later in this chapter.) The SET expression arguments can be constants or expressions.

Warning

We recommend you work with a copy of any table you attempt to modify using the examples in the sections on updating and deleting. Use the SELECT INTO statement discussed in a later section of this chapter to create a copy. (Be sure to give the copy a unique name.)For instance, let's change each occurrence of the title Sales Representative in the Employees table to Account Executive using the following UPDATE statement:

UPDATE Employees SET Title = 'Account Executive' WHERE Title = 'Sales Representative'

The UPDATE statement doesn't return a resultset, but your system may return a message identifying how many records were updated, as shown in Figure 18.20. If you like, run the statement

click to expand
Figure 18.20: Use UPDATE to change existing entries.

SELECT * FROM Employees

to confirm the update, which we've done in Figure 18.21.

click to expand
Figure 18.21: We changed the Title entries from Sales Representative to Account Executive.

Note

Modifying, inserting, and deleting data can violate referential integrity rules, and requests can return errors, unless you plan for the possibility in your design.

The UPDATE statement isn't quite as versatile as the SELECT statement, but it doesn't need to be. Much of what UPDATE can't do makes sense:

Your system probably can't handle an UPDATE statement that includes a GROUP BY clause. It doesn't really make sense to update just one record in a group of records, and that would be the result; so most systems simply don't support such a request.
UPDATE doesn't return a resultset, so you won't need ORDER BY.
Your system may not support subqueries in the SET clause. (We'll cover subqueries later in this chapter.)
Your system may not support joins in the UPDATE clause, but most do.

Tip

Before updating critical data, it's a good idea to create a copy of the table you're modifying.

Remove Data with DELETE

SQL provides a DELETE statement, which most systems support, even though you might choose never to use it. Some developers prefer to archive data rather than destroy it, but that's a decision you'll need to make with the database administrator. Perhaps the most important issue to hurdle, in regard to DELETE, is to remember that the statement deletes an entire record; to delete specific entries, use the UPDATE statement in the form:

UPDATE table SET column = Null

To delete entire records, use the DELETE statement in the form:

DELETE FROM datasource [WHERE condition]

You can specify a field in the DELETE clause, but doing so is useless since you can't delete data from a specific field. Nor do you need to include the asterisk character (*) to indicate that you're deleting all the columns-in fact, SQL won't accept the asterisk character in the DELETE clause.

The simplest form of this statement deletes all the records from a table:

DELETE FROM datasource

We recommend that you back up the table before running such a command.

Let's look at a simple example that deletes all the discontinued items from the Northwind Products table. First, a quick review of the Products table shows that Chef Anton's Gumbo Mix (ProductID of 5) has been discontinued-you can tell by the value 1 in that record's Discontinued column. In this context, the value 1 represents a True value, meaning the product has been discontinued. Next, run the following statement to delete that item and any others that have been discontinued:

DELETE FROM Products WHERE Discontinued = 1

Figure 18.22 lets us know that eight rows were deleted.

click to expand
Figure 18.22: The DELETE statement removes eight records from the Products table.

To browse the Products table, run the following SELECT statement:

SELECT ProductID, ProductName, Discontinued FROM Products WHERE Discontinued = 1

However, as you can see in Figure 18.23, there are no records to browse.

click to expand
Figure 18.23: No records in the Products table have been discontinued.

Add a New Table Using SELECT INTO

Copying a table is a fairly straightforward process with SQL's SELECT INTO statement. It's similar to a SELECT statement in that you specify columns and a datasource, and the statement supports the WHERE and the ORDER BY clauses. However, you also specify the name of a new table.

In a nutshell, you have a source table, which contains data you want to copy. You specify the columns you want to copy and include clauses to limit the actual data copied and the order in which it's copied using the syntax:

SELECT [column1, column2, ... | *] INTO newtable FROM sourcetable [WHERE condition] [GROUP BY column1[, column2,...]] [HAVING condition] [ORDER BY column1[, column2, ...]]

In its simplest form, you can copy an entire table using the form:

SELECT * INTO newtable FROM sourcetable

For instance, let's create a copy of the Northwind Products table using the statement:

SELECT * INTO ProductsCopy FROM Products

You can browse the new table using the statement

SELECT * FROM ProductsCopy

as shown in Figure 18.24.

click to expand
Figure 18.24: After creating a new products table, run a SELECT statement to view the records.

Or, to limit the product records copied to the new table, you might use a statement similar to the following:

SELECT * INTO ProductsCopy FROM Products WHERE ProductID = 1

This time, we copied only one record to the newly created table ProductsCopy-the one record in Products with a ProductID value of 1. Use the simple SELECT statement from Figure 18.24 to see the results of the latest SELECT * INTO. The resulting recordset, shown in Figure 18.25, contains only one record. Similarly, you can use the GROUP BY, SORT BY, and HAVING clauses to further define the copied recordset.

click to expand
Figure 18.25: We copied just one product to the new table.

To copy a table's structure, but not its data, use the form:

SELECT * INTO newtable FROM sourcetable WHERE False

SQL copies the table, but because no record can equal False, no data is copied to the new table.

Note

Your system probably won't make an exact duplicate of the source table. In particular, SELECT INTO may not set a primary key for the new table, even if the source table has one. In addition, check the new table's indexes and properties-your system may not set anything other than the defaults.

Warning

The SELECT INTO clause sometimes replaces an existing table if one exists when you execute the SELECT INTO request. Fortunately, most systems will warn you first. But that may not be enough if your system deletes the table even if you cancel the request (Access does). We recommend that you always back up any table with the same name when creating a new table using SELECT INTO.

Add New Data Using INSERT INTO

The INSERT INTO statement does double duty. It copies data from one table to another, and it inserts new data. As such, you probably won't use this statement for routine data entry. However, it can be efficient when copying data from one source to another, whether you're copying just one record or many.

To copy records into a target table, use the statement's simplest form:

INSERT INTO target SELECT source  [WHERE condition]  [ORDER BY column1[, column2, ...]]

In this statement, target is the table you're inserting data into, and source is the existing data, most likely another table. This statement copies all the records from the source table into the target table.

Warning

Your system may not support the asterisk character (*) in the INSERT INTO clause.This statement can be problematic if you're not aware of some of its requirements:

The target table must exist. This statement won't create a new table. (Use SELECT INTO if you need to also create the table with the same statement.)
You can't insert columns (data) that don't already exist in the target table. The table structures don't have to match; they just have to share the columns you're inserting.

Now, let's suppose you want to combine the employee and customer records and you run the following statement in that attempt:

INSERT INTO Customers SELECT Employees.* FROM Employees

SQL returns the error message shown in Figure 18.26-the structures must match. The error message gives us the necessary clue that the column names don't match. The error message won't tell you what columns are missing; you'll have to run a SELECT query using the asterisk character (*) to view the column names.

click to expand
Figure 18.26: SQL returns an error message if the table structures don't match.

Now, let's copy all the records from the Shippers table into the Shippers table. Yes, you read that correctly. Ordinarily, you wouldn't want to duplicate records in a table this way. However, we need an example that actually works; that is, we need a target table with the same structure as the source, and the Northwind tables are all unique. So copying into the same table is one way to accomplish that. (For the most part, you probably won't find any database with two or more tables that share the same structure; it just isn't efficient. You can learn more about table design in Chapter 5.)

First, run the statement

INSERT INTO Shippers SELECT Shippers.CompanyName, Shippers.Phone  FROM Shippers

as shown in Figure 18.27.

click to expand
Figure 18.27: Insert all the shipper records into the Shippers table.

Notice that we didn't use the asterisk character (*) in this statement. Doing so would return an error message because the primary key column contains duplicate values, which isn't allowed. Therefore, we explicitly specified all the columns except the primary key column.

Afterward, run the following statement to review those records, shown in Figure 18.28.

SELECT * FROM Shippers

click to expand
Figure 18.28: We inserted three new records in the Shippers table.

The second form of this statement inserts a single row into a table:

INSERT INTO target [column1[, column2, ...]] VALUES (value1[, value2, ...])

You must remember a few rules when using this form:

The column references in the INSERT INTO clause are optional. When omitted, you must include a value for each column in the target table. This particular instance is impossible to fulfill if the target table has an autonumber type column because you can't insert a value into such a column. (Such columns are system generated.)
The arguments in the VALUES clause must match the order of their corresponding target columns in the target table.
If you include column references, you must include values for the referenced columns, and those values must be in the same order as the referenced columns in the INSERT INTO clause.
The order of the columns does not have to match the order in which they occur in the target table. This last behavior is convenient when you want the column order to be different from that of the source table.

Now let's add a new shipper to the Shippers table using the following statement:

INSERT INTO Shippers (Companyname, Phone) VALUES ('United Shipping', '(555) 555-1234')

SQL lets us know that one row was added, as shown in Figure 18.29. Now, run the following statement to see the new shipper record shown in Figure 18.30:

SELECT * FROM Shippers

click to expand
Figure 18.29: Use INSERT INTO to add a new record to a table.

click to expand
Figure 18.30: We added a new shipper record to the Shippers table.

Keep in mind that any data you insert is still subject to the table's underlying properties, such as validation rules, constraints, properties, datatypes, and so on. If you attempt to insert data that doesn't comply with all the table's rules, SQL will return an error message and reject the data.

Use TRANSFORM to Create a Crosstab Query

Crosstab queries summarize data by categories, using an Aggregate function of some type. Their usefulness lies in the amount of data they can display because of their specialized format-there's always a least one column heading, and one summary column. Use SQL's TRANSFORM statement in the following form to generate a crosstab query:

TRANSFORM aggregate SELECT statement PIVOT column

The SELECT clause argument is any valid SELECT statement with a GROUP BY clause and specifies the row heading. The PIVOT clause identifies the column headings. In addition, the TRANSFORM statement doesn't support the HAVING clause.

Note

Not all systems support the TRANSFORM statement. SQL Server doesn't, so we'll use Access to execute our crosstab examples.

Perhaps the easiest way to create a crosstab is to start with a grouped SELECT statement. Then, insert a TRANSFORM clause before the SELECT statement and move the Aggregate function from the SELECT clause to the TRANSFORM clause. Then, add a PIVOT clause that uses one of the GROUP BY columns. Delete the PIVOT column from the GROUP BY clause.

We can illustrate this method using Access to generate the following GROUP BY query to return the number of items per each order grouped by the date:

SELECT OrderDate, CustomerID, Count(OrderID) AS ItemsPerOrder FROM Orders GROUP BY OrderDate, CustomerID

The grouped results are shown in Figure 18.31.

click to expand
Figure 18.31: We'll convert this Access grouped query to a crosstab.

Note

The CustomerID field's Caption property is "Customers." Therefore, throughout these examples, you will see Customers used instead of CustomerID, the actual field's name.

Once you have a valid grouped query, a crosstab is within easy reach. First, slip in a TRANSFORMclause before the SELECT clause, and then move the Aggregate function from the SELECT clause to the new TRANSFORM clause:

TRANSFORM Count(OrderID) AS ItemsPerOrder SELECT OrderDate, CustomerID FROM Orders GROUP BY OrderDate,CustomerID

Next, add a PIVOT clause to the end of the statement, and move at least one of the GROUP BY columns to this new clause. Knowing just which column to move may be difficult until you get the hang of this technique. Move the column that most accurately represents the crosstab's column headings. In our example, CustomerID will be the column heading, so move that column to the PIVOT clause:

TRANSFORM Count(OrderID) AS ItemsPerOrder SELECT OrderDate, CustomerID FROM Orders GROUP BY OrderDate PIVOT CustomerID

The resulting crosstab report shown in Figure 18.32 displays the number of items ordered by each customer on a particular date. In this case, SQL ignores the AS clause we tacked on to the counting field. That's because SQL displays the CustomerID value as each column's heading. You could easily omit the AS clause.

click to expand
Figure 18.32: Use TRANSFORM and PIVOT to convert a GROUP BY query to a crosstab.

Customer is the Caption property for the CustomerID field, and it takes precedence over the field name. It's also a lookup field, so it displays the corresponding Customer Name instead of the customer's primary key value. This is an Access feature and is built into the Northwind example database.

Subqueries

The more complex the question, the more complex the query, and sometimes just one query won't get the job done. That's when you need what's known as a subquery or subselect.

A subquery combines two queries by embedding one SELECT statement within another. The embedded query becomes part of the main query's conditional search, behaving similar to a filter. The embedded query is known as an inner select, and the main query is called an outer select. To create a subquery, use one of the following WHERE forms to define the inner select:

WHERE value or expression [NOT] IN (SELECT statement) WHERE column ANY | SOME | ALL (SELECT statement) WHERE [NOT] EXISTS (SELECT statement) WHERE column comparison operator (SELECT statement)

Subqueries are often the solution of choice when displaying extraneous data in a GROUP BYquery. That's because you can't include such data in a grouped query. Every column must be part of an aggregate or one of the GROUP BY clause's arguments. In either case, the addition of the column will change the group dynamics. An alternate solution is a nested query.

Note

A nested query is simply one query based upon another. We don't recommend you convert nested queries into subqueries unless you have a specific reason for doing so. As a general rule, a subquery won't perform any better than its nested counterparts.

For example, let's suppose you want to display the number of items per order, but you also want to identify the employee responsible for each order. The following GROUP BY query will return the date for the latest order:

SELECT Max(OrderDate) AS Date, EmployeeID FROM Orders GROUP BY EmployeeID

We're able to add EmployeeID because that column's part of the GROUP BY clause. Remember, in this type of query, a column must be part of an Aggregate function or in the GROUP BY clause. This query displays the results shown in Figure 18.33. The problem is, there's no way to identify the order or any order details. You also don't know how many orders were placed by each employee on that day-you only know that that was the last day each employee placed an order.

click to expand
Figure 18.33: This simple query returns the latest order date for each employee.

One way to view the information you want to see about each order is to run the following subquery:

SELECT OrderID, OrderDate, EmployeeID FROM Orders WHERE OrderDate IN (SELECT Max(OrderDate) AS Date FROM Orders GROUP BY EmployeeID) ORDER BY EmployeeID

The inner select, which is the same as the previous query, except for the omission of the EmployeeID reference in the SELECT clause, returns the same list of dates shown in Figure 18.33.

However, this query also displays the OrderID value with each record, as shown in Figure 18.34. The outer query returns data from the specified fields in the SELECT clause where the OrderDate value occurs in the result of the inner select. As you can see, the results are different from the previous query-there's a record for each order.

click to expand
Figure 18.34: The subquery can handle the OrderID column.

Note

Notice that column references include a table reference. That's because we're using a join, and the only way to avoid ambiguous reference errors is to explicitly declare the source. Throughout this chapter, we reference only the column to keep the examples as simple as possible. However, most developers include the source's entire reference in the form owner.table.column for all references, even if there's no join. Chapter 3 discusses good coding practices, like this one.

Updateable Queries

Just because you can return data doesn't mean you can modify it, because not every query is updateable. A query must be updateable in order to update the underlying data as you make changes via the query. The following is a list of general rules you can apply to create an updateable query:

Any query based on a single table is updateable if:
- The query contains no Aggregate functions.
- The query doesn't refer to another query that contains an Aggregate function.
- The query doesn't have a GROUP BY clause.
Any query based on more than one table with a one-to-one relationship is updateable (if it also meets the previous three conditions).
Any query based on a one-to-many relationship should allow you to update data on the many side as long as doing so does not violate referential integrity (and the three previous conditions are met).

Keep in mind that an updateable query doesn't ensure success. Some other factors are involved:

You must have permission to modify the underlying datasource.
The table can't be read-only.
The record you're trying to update must not be locked by another user.

In addition, a few queries aren't updateable:

Crosstab
SQL pass-through
UNION
Queries based on three or more tables based on a many-to-one-to-many relationship.

SQL pass-through queries bypass Jet, thereby allowing you to work directly with foreign data without interpretation by (or interference from, as the case may be) Jet. The UNION operator joins records from two similar tables. A crosstab query summarizes data by categories, using an Aggregate function of some type. We'll review the UNION operator later in this chapter. The other two query types are beyond the scope of this chapter. Check your documentation to see if your system supports these query types.

Note

As we've said throughout this chapter, all systems are unique. Please check your system's documentation in regard to updateable queries.