Retrieve and Filter Data Using Transact-SQL | MCSE Training Guide (70-229). Designing and Implementing Databases with MicrosoftR SQL ServerT 2000 Enterprise Edition

Retrieve and filter data using Transact-SQL.

SQL Server is a relational database management system. This section focuses on the "Relational" part. It covers how tables relate to one another. In Chapter 2, you covered how Primary Keys and Foreign Keys are used in a database model, and how they impact storing data. This chapter shows you how to retrieve data from a relational database. This involves using the various join types: INNER , RIGHT , LEFT , and CROSS . You're also going to learn about using a technique called derived tables to simplify query writing.

EXAM TIP

Joins and Derived Tables Joins and derived tables will figure prominently in your exam. Joins are the backbone of relational databases; they actually put the "relation" in relational databases. They're used in all the main SQL statements ( SELECT , INSERT , UPDATE , and DELETE ). They're very important. Also, derived tables tend to be overlooked, and they're perceived as complicated, even though they're not, so they're also likely to show up.

A common data model is used throughout this chapter for the examples. This is to make life a bit easier for you. Here are the table layouts and some sample data:

 CREATE TABLE Person (         PersonID int IDENTITY(1,1) NOT NULL,         FirstName varchar(50) NULL,         LastName varchar(50) NOT NULL, ) CREATE TABLE PersonAddress(         PersonID int NOT NULL,         AddressID int NOT NULL ) CREATE TABLE Address (         AddressID int IDENTITY(1,1) NOT NULL,         StreetAddress varchar(250) NOT NULL,         City varchar(50) NOT NULL,         State varchar(50) NOT NULL,         ZipCode char(5) NULL ) CREATE TABLE Sales (         PersonID int,         ProductID int,         QtyPurchased int,         DatePurchased datetime ) CREATE TABLE Product (         ProductID int NOT NULL,         ProductDescription varchar(15) NOT NULL )

This model is for a many-to-many relationship between the Person table and the Address table. This means that one person can have several addresses, and several people can live at the same address. In addition, there is also a one-to-many relationship between Person and Sales, which means one person may have made several purchases, which are recorded in the Sales table. You may want to go ahead and build this structure in its own database on the server on which you're doing the examples.

Now insert some data into the sample schema and try it again. Some SELECT COUNT statements are at the bottom of the script to help you make sure you've got the correct number of rows in each table.

 INSERT INTO Person VALUES ('Danny', 'Jones') INSERT INTO Person VALUES ('Melissa', 'Jones') INSERT INTO Person VALUES ('Scott', 'Smith') INSERT INTO Person VALUES ('Alex', 'Riley') INSERT INTO Person VALUES ('Chris', 'Avery') INSERT INTO Person VALUES ('Jennifer', 'Avery') INSERT INTO Person VALUES ('Bryan', 'Decker') INSERT INTO Person VALUES ('Robin', 'Decker') INSERT INTO Person VALUES ('Shelly', 'Alexander') INSERT INTO Address VALUES ('1213 NW 97th Ct', 'SQL Town', 'MO', '64131') INSERT INTO Address VALUES ('2721 SW 42nd Terr', 'Server City', 'KS', '66212') INSERT INTO Address VALUES ('1939 Overland St', 'Dell Village', 'KS', '66213') INSERT INTO Address VALUES ('9391 Nall Ave', 'Parrot Township', 'MO', '64331') INSERT INTO Address VALUES ('7737 Miner Dr', 'SQL Town', 'MO', '64132') INSERT INTO Address VALUES ('5334 Shamrock Ln', 'Orange', 'KS', '66441') INSERT INTO PersonAddress VALUES (1, 1) INSERT INTO PersonAddress VALUES (2, 1) INSERT INTO PersonAddress VALUES (3, 2) INSERT INTO PersonAddress VALUES (4, 3) INSERT INTO PersonAddress VALUES (5, 4) INSERT INTO PersonAddress VALUES (6, 4) INSERT INTO PersonAddress VALUES (7, 5) INSERT INTO PersonAddress VALUES (8, 5) INSERT INTO PersonAddress VALUES (9, 6) INSERT INTO Sales VALUES (1, 37, 4, getdate()) INSERT INTO Sales VALUES (1, 38, 3, getdate()) INSERT INTO Sales VALUES (3, 39, 1, getdate()) INSERT INTO Sales VALUES (4, 51, 1, getdate()) INSERT INTO Sales VALUES (4, 47, 1, getdate()) INSERT INTO Sales VALUES (9, 37, 10, getdate()) INSERT INTO Sales VALUES (9, 38, 5, getdate()) INSERT INTO Sales VALUES (10, 41, 6, getdate()) INSERT INTO Product VALUES (37, 'Widget') INSERT INTO Product VALUES (38, 'Grommet') INSERT INTO Product VALUES (39, 'Spackle') INSERT INTO Product VALUES (51, 'Sparkle Dust') INSERT INTO Product VALUES (47, 'Shoe Polish') INSERT INTO Product VALUES (38, 'Varnish') INSERT INTO Product VALUES (41, 'Lava') go SELECT COUNT(*) FROM person --9 rows SELECT COUNT(*) FROM address --6 rows SELECT COUNT(*) FROM PersonAddress --9 rows SELECT COUNT(*) FROM Sales --8 rows SELECT COUNT(*) FROM Product  --7 rows

That's a lot of typing, but you'll end up using that data all the way through the chapter. Notice that several tables have data in them. You're probably wondering how to make them all work together. So now it's time to talk about join mechanics.

Join Mechanics

Whenever you query data from two tables, you need to find some way to relate the two tables together. If you need to select out the name and address of everyone in the example database, how would you go about doing it? You'd want to write a query that would relate the Person table to the Address table through the PersonAddress table.

Whenever you want to use a SQL statement to relate one table to another, it's called a join . In this case, you need two joins: one from the Person table to the PersonAddress table, and another one from the PersonAddress table to the Address table. Whenever you want to see a resultset that includes columns from several tables, you need to use a join.

There are three basic join types. An inner join shows results only where there are matches between the elements. In other words, if you query the database and want to see only the people who have addresses, you use an inner join. An inner join leaves out all the records that don't have a match.

An outer join can show all the records from one side of the relationship, records that match where they are available, and NULL values for records that do not have a match. An outer join between the Person and Sales tables can show you each person and the amount of purchases they've made, and show NULL values for people who haven't made purchases yet, but are in the Person table. An outer join shows all the same records as an inner join, plus all the records that don't match.

The final type of join is a cross join . A cross join returns all possible combinations of rows between the two sides of the join. The number of records in the resultset is equal to the number of records on one side of the join multiplied by the number of records on the other side of the join. No correlation is attempted between the two records; all the records from both sides are returned. Performing a cross join on two large tables is probably not a good idea. The number of rows in the return set, or in intermediate sets used by SQL Server, can get out of hand quickly, causing server-wide performance degradation.

IN THE FIELD: HOW MANY RECORDS?

If you have 100 records in the Person table, and 100 records in the Sales table, and then perform a cross join between the two tables, you'd end up with 10,000 records in your resultset (100x100 = 10,000).

Scale that up to a 50,000 record table and a 100 record table (5 million records) and you'll see some typical bad news that can result from a cross join. If you figure that each of those returned records contains about 300 bytes, you can end up with a resultset that weighs in at 1.5 gigabytes, which would probably overwhelm any client computer at which you aimed it.

NOTE

Order of Operations Before some purist out there starts screaming and writing inflammatory post-publishing e- mails , a bit of clarification : Physically, SQL Server performs operations such as joining tables and filtering data in a WHERE clause in whatever order it decides is fastest . Logically, however, you can feel free to think of it doing things in join, then WHERE order. SQL Server doesn't re-order how it does things in such a way that will violate that rule.

In the grand scheme of SQL, joins are processed first. In other words, each join type description says, "will return all the records from one side...", which means the join returns all those records. What you put in the WHERE clause is applied after the joins are processed. So, bear in mind that when a join returns a specified set of records, the SQL statement may or may not return all those records, depending on what you've specified in the WHERE clause.

With all of that out of the way, let's look at each of the different join operators, starting with INNER JOIN .

Using INNER JOIN

The INNER JOIN statement is the easiest and most often used join statement. It's very simple; it relates one table to another table; and it returns a rowset where the two tables match up. For example:

 SELECT * FROM Person         INNER JOIN Sales                 on Person.PersonID = Sales.PersonID

What's that going to do? It returns all the rows where there is a person who has made a purchase. The columns returned are all the columns from both tables, with the columns from the Person table first.

The resultset is wider than you probably want:

[View full width]

  [View full width] 
 PersonID     FirstName     LastName     PersonID     ProductID     QtyPurchased  DatePurchased 1            Danny         Jones        1            37            4  2001-07-22 16:50 1            Danny         Jones        1            38            3  2001-07-22 16:50 3            Scott         Smith        3            39            1  2001-07-22 16:50 4            Alex          Riley        4            51            1  2001-07-22 16:50 4            Alex          Riley        4            47            1  2001-07-22 16:50 9            Shelly        Alexander    9            37            10  2001-07-22 16:50 9            Shelly        Alexander    9            38            5  2001-07-22 16:50

That's probably not what you want in a resultset. You'll get a fairly useful resultset if you write the SELECT statement like this, specifying the column names , instead:

 SELECT FirstName, LastName, ProductID, QtyPurchased FROM Person        INNER JOIN Sales               ON Person.PersonID = Sales.PersonID

This query shows the people in the database to whom you have sold things. It doesn't show all the people in the database, because not everyone has bought something. It also shows some people more than once, because they bought several things.

Now look more closely at the statement after the keyword FROM . First, data is selected from the Person table, and followed by an INNER JOIN to the Sales table. The ON clause then tells SQL Server how the tables are related . That ON clause says to return rows where the PersonID column in the two tables is the same. Notice that the column name is specified as Person.PersonID . Both tables have a PersonID column, so SQL Server gives you an error if you don't specify which table the column comes from. SQL Server also gives you an error if you write this:

 SELECT PersonID, FirstName, LastName, ProductID, QtyPurchased FROM Person        INNER JOIN Sales               ON Person.PersonID = Sales.PersonID

This results in an error of Ambiguous column name 'PersonID' . Once again, because the PersonID column appears more than once in the set of joined tables, you need to specify which PersonID you want, even though they are both going to have the same value after the join.

The code sample is also representative of one way to indent and style the SQL. It's valid to put the entire SQL statement on one line, but it's also valid to wear dark socks with sandals, striped shorts, and a plaid shirt. Use a consistent style when you are writing queries; it makes them easier for you to read and for others to read.

That's all well and good, but how can you get a list of the addresses for all the people to whom you've sold things? That goes something like this:

 SELECT Address.* FROM Person        INNER JOIN Sales               ON Person.PersonID = Sales.PersonID        INNER JOIN PersonAddress               ON PersonAddress.PersonID = Person.PersonID        INNER JOIN Address               ON PersonAddress.AddressID = Address.AddressID

The Address.* in that statement specifies that you want to return all the fields in the Address table, but you don't want the fields in the rest of the tables. This is a convenient shortcut. Here's another neat way to reduce typing:

 SELECT A.* FROM Person P        INNER JOIN Sales S               ON P.PersonID = S.PersonID        INNER JOIN PersonAddress PA               ON PA.PersonID = P.PersonID        INNER JOIN Address A               ON PA.AddressID = A.AddressID

This is an example of using table aliases. When you're doing complex joins, and you don't want to type the table name over and over again, you can alias the table to a different name. In this case, the Person table is aliased to P . After doing that, the only way you can access column names in the Person table is by using the P alias. You can't mix aliased and non-aliased names, so this does not work:

 SELECT A.* FROM Person P        INNER JOIN Sales S               ON P.PersonID = S.PersonID        INNER JOIN PersonAddress PA               ON PA.PersonID = P.PersonID        INNER JOIN Address A               ON PA.AddressID = A.AddressID

If you attempt to run that query, you get an error that says, the column prefix 'Person' does not match with a table name or alias name used in the query. That's because you aliased the table name to something else, so SQL Server can't use the real name for it anymore; it can use only the alias. Another form of table aliasing syntax is

 SELECT A.* FROM Person AS P        INNER JOIN Sales AS S               ON P.PersonID = S.PersonID        INNER JOIN PersonAddress AS PA               ON PA.PersonID = P.PersonID        INNER JOIN Address AS A                ON PA.AddressID = A.AddressID

The AS keyword is always optional, and if the point is to type less by using aliases, it should probably be left out.

NOTE

Using Aliases Effectively You should use aliases that somehow abbreviate the table name in some consistent and standard way. For example, don't alias the first table as A, the second table as B, and so on, because you'll spend more time figuring out what the aliases mean than it would have taken to type in the full names in the first place. You should carefully measure tradeoffs between readability and brevity, and if you have to guess, lean toward readability.

On the topic of optional syntax, the keyword INNER is not required in the queries, either. If you want to write the query as follows , it works the same way:

 SELECT A.* FROM Person P        JOIN Sales S               ON P.PersonID = S.PersonID        JOIN PersonAddress PA               ON PA.PersonID = P.PersonID        JOIN Address A               ON PA.AddressID = A.AddressID

Arguably, this example requires less typing, but many people feel that the keyword INNER provides a little more insight into how the query works and is easier to read.

One more variation on the theme of inner joins. Imagine that you need to join a table back onto itself. For example, you want to find all the people with the same last name in the Person table. You'd write a query like this one:

 SELECT P1.FirstName, P2.FirstName, P1.LastName FROM Person P1        INNER JOIN Person P2               ON P1.LastName = P2.LastName               AND P1.FirstName < P2.FirstName

In this example, the same table is used twice, joined to itself. The example produces the following output:

 FirstName FirstName LastName Danny     Melissa   Jones Chris     Jennifer  Avery Bryan     Robin     Decker

The join also has an extra line in it. The P1.FirstName < P2.FirstName part prevents SQL Server from producing duplicate rows. Without that line in the JOIN clause, or at least in the WHERE clause, you'd have each couple showing up twice, once with each name first.

Now that you've seen how INNER JOIN works, it's time to move on to the outer joins.

Outer Joins

You can use an outer join when you want to return the entire list of rows from one side of the join, and the matching rows from the other side of the join. There are three types of outer joins: LEFT , RIGHT , and FULL . A RIGHT OUTER JOIN , often abbreviated RIGHT JOIN , returns all the rows belonging to the table on the right side of the join, and where there is a matching row on the table on the left side, it returns the matching data. Conversely, LEFT OUTER JOIN returns all the rows from the table on the left side, and where there is a matching row on the table on the right side, it returns the matching data. A FULL OUTER JOIN returns all the rows from both sides with correlations where they exist.

Most people read that for the first time and their eyes cross. Here are some examples to help you get back on track. Remember the query you saw earlier that returned all the records from Person with a matching record in Sales? This example returns all the records in Person, and matching records in Sales where they exist. First, using a LEFT JOIN :

 SELECT FirstName, LastName, ProductID, QtyPurchased FROM Person        LEFT JOIN Sales               ON Person.PersonID = Sales.PersonID

And then a RIGHT JOIN :

 SELECT FirstName, LastName, ProductID, QtyPurchased FROM Sales        RIGHT JOIN Person               ON Person.PersonID = Sales.PersonID

These two queries both return exactly the same resultset, because the order of the tables is different. The LEFT JOIN example does Person LEFT JOIN Sales and the RIGHT JOIN example does Sales RIGHT JOIN Person .

IN THE FIELD: RIGHT JOIN AND LEFT JOIN

As you've just read, LEFT JOIN and RIGHT JOIN both do the same thing. Why does Transact-SQL include both, and which should you use?

Typically, you should use LEFT JOIN whenever possible. There's no reason for this other than convention, and most Western languages read left to right, so people in Western cultures are more trained to think in that direction. So using LEFT JOIN will make your query easier for someone else to understand.

In other order-related issues, the order that the ON clause is in does not change how the clause works. For example, saying ON P.PersonID = S.PersonID is the same as S.PersonID = P.PersonID .

Now it's time to put together everything you've learned so far. You've got an INNER JOIN and an OUTER JOIN . You can use table aliases in an OUTER JOIN the same way as you do in an INNER JOIN , by the way. Table aliasing is a feature of the FROM clause, not the join type. Now here's a list of all the addresses in the database and what they bought, if they bought anything:

 SELECT A.*, S.ProductID, S.QtyPurchased FROM Person P        LEFT JOIN Sales S               ON P.PersonID = S.PersonID        INNER JOIN PersonAddress PA               ON PA.PersonID = P.PersonID        INNER JOIN Address A               on PA.AddressID = A.AddressID

Now, the FULL OUTER JOIN . Imagine that you want to query all the records from Person and Sales , and have the tables put together where possible, but you want all records shown from both tables, regardless of whether there was a correlation or not. Here's how you can do it:

 SELECT FirstName, LastName, ProductID, QtyPurchased FROM Person        FULL JOIN Sales               ON Person.PersonID = Sales.PersonID

That returns the following recordset:

 FirstName       LastName       ProductID       QtyPurchased Danny           Jones          37              4 Danny           Jones          38              3 Melissa         Jones          NULL            NULL Scott           Smith          39              1 Alex            Riley          51              1 Chris           Avery          NULL            NULL Jennifer        Avery          NULL            NULL Bryan           Decker         NULL            NULL Robin           Decker         NULL            NULL Shelly          Alexander      37              10 Shelly          Alexander      38              5 NULL            NULL           41              6

Notice that you have examples where the data that comes from the Sales table ( ProductID and QtyPurchased ) is NULL , for Chris Avery and others, along with an example of ProductID of 41 and QtyPurchased of 6 that was purchased by someone not in the Person table.

Filtering with Outer Joins

One common use of an outer join is to find places where there aren't correlations in data. For example, if you wanted to find all the rows in the Person table that didn't have a corresponding row in Sales , which would mean a person who hadn't bought anything yet, you could run a query like this one:

 SELECT FirstName, LastName, ProductID, QtyPurchased FROM Person        LEFT JOIN Sales               ON Person.PersonID = Sales.PersonID WHERE Sales.QtyPurchased IS NULL

This query exploits the fact that when you do a LEFT JOIN , the columns that are returned for the table on the right when there is no correlation are all NULL . The fact that the QtyPurchased column in the Sales table is used is just coincidence ; any column in the Sales table that doesn't allow NULL values could be used. Whenever there is no correlation between the data rows, the columns for the table on the inside of the join are all set to NULL , making it a very easy target for filtering.

If you're using this data for a report, you probably don't want the word NULL showing up on the report. The managers and sales people who end up reading the report probably won't understand the word NULL . So, to steal some thunder from a bit later in the book, here's how you can get rid of those annoying NULL s:

 SELECT FirstName, LastName, ISNULL(ProductID, 0), ISNULL(QtyPurchased, 0) FROM Person        LEFT JOIN Sales               ON Person.PersonID = Sales.PersonID WHERE Sales.QtyPurchased IS NULL

The ISNULL() function returns the second argument if the first argument is NULL . So, for the cases where the QtyPurchased is NULL , it substitutes in a zero. ISNULL() is covered a bit later during the discussion about CASE expressions.

Cross Joins

Cross joins are joins that return all the rows from both tables associated with each other. For example, currently the Person table has 9 records and the Product table has 7 records. A CROSS JOIN of these two tables would return 63 rows: each of the 9 records in the Person table with each of the 7 records in the Product table. Whenever you use a CROSS JOIN , you should be aware that the returned resultset is likely to be very large. A CROSS JOIN does not have an ON clause; there is no correlation, so it's not needed.

Here's an example of using an ON clause to generate data for a custom order form:

 SELECT FirstName, LastName, ProductDescription FROM Person        CROSS JOIN Product ORDER BY FirstName, LastName, ProductDescription

The FROM clause of a SELECT statement is an incredibly complex piece of work. You have the different join types and their correlations all going on in there. So, how can you make it even more complex and harder to read?

Derived Tables

You can use derived tables to make your queries simpler to read, or at least simpler to write. To use a derived table, put a SELECT statement in parentheses in the FROM clause where you'd normally put a table name. An alias is required for a derived table.

Let's say that you have a query that looks something like this:

 SELECT Person.PersonID, FirstName, LastName, ProductID, QtyPurchased FROM Sales        RIGHT JOIN Person               ON Person.PersonID = Sales.PersonID

And you'd like to get the address for the people returned as well. You've already done this once in the earlier examples, but here's another way to do it:

 Select P.*, Address.StreetAddress, Address.City, Address.ZipCode       from             (SELECT Person.PersonID, FirstName, LastName, ProductID, QtyPurchased                   FROM Sales                   RIGHT JOIN Person                         ON Person.PersonID = Sales.PersonID) P             INNER JOIN PersonAddress                   ON PersonAddress.PersonID = P.PersonID             INNER JOIN Address                   ON Address.AddressID = PersonAddress. AddressID

As you can see, there's a table here aliased to P that is actually a SELECT statement, and it's the same SELECT statement that was used earlier. So you can do some interesting things here with copy-and-paste : you can take a query and write another query around it.

The problem is, as you can see from the example, the resulting query can be very difficult to format or read. Things that are difficult to format and read also tend to be difficult to optimize, modify, and debug.

That's one example of using a query inside another query. Here's another.

The IN Operator

The IN clause can be used in comparisons inside nearly every SQL statement as an operator. The IN operator takes two arguments ”a value and a set ”and checks to see whether the value is part of the set. For example:

 SELECT * FROM Person WHERE PersonID IN (1, 3, 5)

That's a good way to use the IN operator. It can also be used with select queries that return one and only one column, like this:

 SELECT * FROM Person WHERE PersonID IN (Select PersonID from PersonAddress)

In this case, the SQL statement returns every person who has an address, but it always returns each person only one time. If you run the same query with a join, you get back the same list, assuming each person has only one address. If some of the records in Person have more than one address, you end up with duplicates in the resultset.

 SELECT Person.*        from Person               INNER JOIN PersonAddress                      on PersonAddress.PersonID = Person.PersonID

This returns the same list, but SQL Server executes this differently. SQL Server is very efficient at processing joins, but it's not as efficient at processing IN clauses, so use IN clauses sparingly.

Sometimes you need a bit more flexibility in your queries to decide what data to include, based on the data in other fields.

CASE Expressions

A CASE expression works like an IF statement, but can be used in locations where an IF statement cannot. Specifically, a CASE expression returns one of a specific set of values based on the outcome of one or more expressions. Here's an example:

 Select CASE datepart(weekday, getdate())               WHEN 1 then 'Sunday'               WHEN 2 then 'Monday'               WHEN 3 then 'Tuesday'               WHEN 4 then 'Wednesday'               WHEN 5 then 'Thursday'               WHEN 6 then 'Friday'               WHEN 7 then 'Saturday'               ELSE 'Unknown' END

This example gets the day of week for today and turns it into a string that represents the text for the day of week. If, for some reason, the day of the week returned by the datepart function is invalid, it returns the string Unknown . The result is placed into the variable @Result . This is the proper syntax to use when the comparison you want to use is equality ”in this situation, datepart(weekday, getdate()) = 1 . Notice that the expression starts with the keyword CASE and ends with the keyword END . This is the only time you can use an END without a BEGIN . This is called a "simple" CASE statement, contrasted against the "searched" CASE statement, discussed later in this section.

Now, if you wanted to write code with a similar result, you can write this:

 DECLARE @result varchar(30) IF datepart(weekday, getdate()) = 1        set @Result = 'Monday' else if datepart(weekday, getdate()) = 2        set @Result = 'Tuesday' else if datepart(weekday, getdate()) = 3        set @result = 'Wednesday' else if datepart(weekday, getdate()) = 4        set @result = 'Thursday' else if datepart(weekday, getdate()) = 5        set @result = 'Friday' else if datepart(weekday, getdate()) = 6        set @result = 'Saturday' else if datepart(weekday, getdate()) = 7        set @result = 'Sunday' else set @result = 'Unknown' PRINT @Result

Although that's certainly usable in this case, it's not usable if you want to embed the logic into an INSERT or UPDATE statement; plus the CASE involves a lot less typing. Here's another way to write the same code with a different syntax of CASE statement:

 DECLARE @Result char(10) SET @Result = CASE WHEN datepart(weekday, getdate()) = 1 THEN 'Sunday'      WHEN datepart(weekday, getdate()) = 2 THEN 'Monday'      WHEN datepart(weekday, getdate()) = 3 THEN 'Tuesday'      WHEN datepart(weekday, getdate()) = 4 THEN 'Wednesday'      WHEN datepart(weekday, getdate()) = 5 THEN 'Thursday'      WHEN datepart(weekday, getdate()) = 6 THEN 'Friday'      WHEN datepart(weekday, getdate()) = 7 THEN 'Saturday'      ELSE 'Unknown' END

This code will do the exact same thing, but notice that the expression isn't split up. That allows you to do something like this:

 DECLARE @Result char(10) SET @Result = CASE WHEN datepart(weekday, getdate()) in (1,7) THEN 'Weekend'               ELSE 'Weekday' END PRINT @Result

You couldn't use the IN clause with the simple case covered previously, but the searched form of the CASE statement is allowed to use IN along with the rest of the comparison operators.

The CASE expression is most often used in SELECT statements to modify data. For example, imagine that you want to find out which objects in your database were created on a weekend:

 SELECT name,         CASE WHEN datepart(weekday,crdate) in (1,7) THEN 'Weekend'               ELSE 'Weekday'         END FROM sysobjects

CASE statements can also be used in the WHERE clause, an ORDER BY clause, or anywhere else an expression is allowed, like this:

 SELECT name FROM sysobjects WHERE CASE id % 2              WHEN 1 THEN 1              ELSE 0 END = 1

The percent sign in this example is the modulo operator: it returns the remainder of the first number divided by the second number. Basically, what this SELECT statement does is return the names of all the odd-numbered (divisible by 2 with a remainder of 1) objects in the current database. When the ID modulo 2 is 1, then it's an odd number, and the CASE statement returns 1, which the WHERE clause then compares to the number 1, and the row is included in the resultset. Otherwise, the CASE statement returns 0, which does not equal one, so the row is not included in the resultset. The keen of wit will note that a better way to write this would be:

 SELECT name FROM sysobjects WHERE id % 2 = 1

That, however, would not demonstrate the point of using CASE statements in a WHERE clause, nor would it be nearly as convoluted. It would, however, be readable and efficient.

You should be aware of a couple of shortcuts. The ISNULL function is a great way to handle NULL values without using a CASE statement. Instead of writing this:

 SELECT CASE sid WHEN null THEN 0 ELSE sid END FROM sysusers

you could write this statement, which does the same thing:

 SELECT isnull(sid, 0) FROM sysusers

Another statement that's a shortcut for a CASE statement is called COALESCE . It takes a series of values and returns the first one that's not null. You could rewrite the preceding statement with:

 SELECT coalesce(sid, 0) FROM sysusers

and get the same results. Now that you've got the CASE statement down, you can learn how to join tables together end-to-end with the UNION operator.

The UNION Operator

The UNION operator is used to join two queries together end-to-end, instead of side-by-side. A UNION operator takes the output of two or more SELECT statements and creates one recordset. Each SELECT statement in the list must return the same number of columns, and the columns must be of compatible types. So, if the first SELECT statement returns an int , a float , a varchar(15), and a char(6) , the second has to contain two pieces of data that can be converted to numbers followed by two pieces of data that can be converted to characters .

So, what happens if the columns are different types? Well, if they are compatible, meaning that the types and data can be implicitly converted, then the output set will contain as close to the types of the first SELECT as possible, with the following rules for a given column:

If any of the SELECT statements return a mix of variable-length and fixed-length fields, the output is variable length.
If the SELECT statements contain columns of different lengths, the longest length is used for the output.
If the two values are numerics, the value with the most precision is used.

So, if you have an integer and a float, the output is a float, because a float has the most precision. If you have an integer and a string that is a number (such as 42 ), then you will have an integer; but, if the string was 42.00 , you'd have a numeric type that could contain two decimal places and a full integer. These are the "rules of thumb" that'll get you through most situations. If you want to see the entire set of data type precedence rules, it is available in Books Online. Search the index for "Precedence," and choose the article on "Data Type Precedence."

Here's an example of a UNION . This example uses a different set of sample data than the rest of the chapter, because of the type conversion topics:

 CREATE TABLE Table1 (        A int,        B float,        C varchar(15),        D char(10) ) CREATE TABLE Table2 (        First numeric(5,2),        Second int,        Third int ) INSERT INTO Table1 VALUES (1, 3.14, '42', 'Bogus') INSERT INTO Table1 VALUES (2, 2.1828, '93', 'Data') INSERT INTO Table2 VALUES (123.45, 3, 16) INSERT INTO Table2 VALUES (456.78, 4, 29) SELECT A, B, C FROM Table1 UNION SELECT First, Second, Third FROM Table2

Notice that the column names are specified. If you use SELECT * with both, they have an inconsistent number of columns, and you'll get an error, All queries in an SQL statement containing a UNION operator must have an equal number of expressions in their target lists. This returns successfully, and the data types are a numeric(5,2) , a float , and an int . The numeric(5,2) is the compromise between an int and a numeric(5,2) ; the float is the compromise between a float and an int ; and the data is converted to int because all the varchar values in Table1 can convert to int . If there had been a string that couldn't convert to an int , SQL Server would have thrown an error message. If it can't convert a string to a numeric value, it throws an error.

Three additional notes on UNION . First, if you want to sort a UNION , you put the ORDER BY after the last SELECT statement, like this:

 SELECT A, B, C FROM Table1 UNION SELECT First, Second, Third FROM Table2 ORDER BY 1

Next , if you want to do a SELECT...INTO operation, you need to do it as follows:

 SELECT A, B, C INTO #UnionOutput FROM Table1 UNION SELECT First, Second, Third FROM Table2

Finally, the column names returned are taken from the first query in the set of UNION operators. So, in this example the columns would be named A , B , and C .

IN THE FIELD: UNION OPERATORS AND DATABASE DESIGN

Recall for a moment Chapter 2, "Database Modeling." Remember that the elements that make up a database are called tables, and tables should classify entities. A person is an entity; a company is an entity; a sale of a product is another entity.

If you have one table for companies and a separate table for individual people, and you use those tables to track sales, it's pretty natural to write a query that would union the table of people with the table of companies to provide a summary report or even just a list of mailing addresses for a new company catalog.

On the other hand, if you have several different tables of company information, and you're joining the tables together with a UNION , you may have something wrong with your database design. Whenever you use a UNION , ask yourself why the tables on which you are using the UNION are separate tables. Is it because they represent distinct entities, or because the database isn't designed the way it should be?

In general, an overuse of the UNION operator is indicative of bad design. Objects that are so alike in structure that they can be joined with a UNION should in all probability be in the same table in the first place.

One unique feature of the UNION is that it automatically removes duplicates from the final resultset. So, if the Table1 and Table2 tables have rows that are identical, SQL Server automatically filters them all out. Of course, this does have a great deal of overhead associated with it. To avoid that overhead, if you don't care about duplicates, you can use the UNION ALL command, like this:

 SELECT A, B, C FROM Table1 UNION ALL SELECT First, Second, Third FROM Table2

You can now write queries that join data horizontally across columns with the various JOIN clauses and that join data vertically across rows with the UNION operator. You've also packed a few additional tools such as table aliases and derived tables into your toolbox. These are all foundational components for the next piece, grouping data.