Joining Multiple Sets of Data


Joins and derived tables will figure prominently in your exam. Joins are the backbone of relational databases; they actually put the "relation" in relational databases. They're used in all the main SQL statements (SELECT, INSERT, UPDATE, and DELETE). They're very important. Also, derived tables tend to be overlooked, and they're perceived as complicated even though they're not; so they're also likely to show up.

Joining tables is pretty much a natural occurrence within a relational database system. Many of the queries performed on a regular basis will involve multiple tables. Whenever you query data from two tables, you need to find some way to relate the two tables. Connecting one table to another mandates a common element of the tables. You would use this whenever you want to see a resultset that includes columns from several tables.

There are three basic join types, as well as a union operation that can be used to connect tables. An inner join shows results only where there are matches between the elements. An inner join leaves out all the records that don't have a match. An outer join can show all the records from one side of the relationship, records that match where they are available, and NULL values for records that do not have a match. An outer join shows all the same records as an inner join, plus all the records that don't match. The final join, a cross join, returns all possible combinations of rows between the two sides of the join. The number of records in the resultset is equal to the number of records on one side of the join multiplied by the number of records on the other side of the join. No correlation is attempted between the two records; all the records from both sides are returned.

Cross joins are less frequently used than inner and outer joins. With an outer join you are guaranteed to have all records returned from one side or the other. With inner joins the only rows returned will be those that match in the joined tables. It is easier to contemplate the overall processes if you consider join operations first. What you put in the WHERE clause and other clauses is applied after the joins are processed. So bear in mind that when a join returns a specified set of records, the SQL statement may or may not return all those records, depending on what you've specified in the WHERE clause.

With all of that out of the way, let's look at each of the different join operators, starting with INNER JOIN.

Return Output Only When It MatchesInner

The INNER JOIN statement is the easiest and most often used. For this reason, when an INNER JOIN operation is performed, the INNER portion of the syntax is optional. A rowset is returned from the operation for the records that match up based on the criteria provided through the ON clause. A one-to-many relationship is handled inside of an application with the INNER JOIN. To show all orders for a particular customer, the join operation would be this:

 SELECT Orders.* FROM Orders  JOIN Customers   ON Orders.CustomerID = Customers.CustomerID 

The results from the preceding query may appear a little on the unusual side because no sorting is performed. Typically an order is specified or information is grouped to make the output more usable. In performing this type of query, each column that occurs in more than one of the tables must be prefixed with the table name or an alias. Figure 5.8 shows a couple of examples of how the join operation might be used.

Figure 5.8. Customer/Orders INNER JOIN.


Inner joins can involve the use of more than two tables and in many cases are more useful to an application. To look into this further, an order process can be considered in much more detail. A customer has many orders, an employee makes many sales, a single order can have multiple line items/order details, and a product can occur within multiple orders. To relate all of this within a single SELECT operation, you would have something like this:

 SELECT * FROM Orders AS O  JOIN Customers AS C ON O.CustomerID = O.CustomerID  JOIN Employees AS E ON O.EmployeeID = E.EmployeeID  JOIN [Order Details] AS OD ON O.OrderID = OD.OrderID  JOIN Products AS P ON OD.ProductID = P.ProductID 

Of course, you probably would not want to return all the rows from all the tables, but the preceding example was intended to show the syntax with multiple tables. You will note that each table has been aliased to make it easier to write the query. Of course, in the preceding query data is returned only where there is a match in the ID. If you wanted to see data for all customers regardless of whether they have placed an order or something similar, an outer join is necessary.

Return Output Even When It Doesn't MatchOuter

You can use an outer join when you want to return all of one entire list of rows from one side of the join. There are three types of outer joins: left, right, and full. The terms "left" and "right" are used based on the positioning of the tables within the query. The first table is the right; the second, the left. You may find it easier drawing a picture to represent the table when you are first learning outer joins.

A right outer join, often abbreviated to right join, returns all the rows belonging to the table on the right side and only the matching rows on the table on the left side. Conversely, a left outer join returns all the rows from the table on the left side. A full outer join returns all the rows from both sides with correlations where they exist.

Left and right outer joins, for all intents and purposes, are the same operations; it is simply a matter of the position of the tables within the queries. For this reason the following examples use only the left outer join syntax, which is the one typically used. The following query will produce a listing of all customers and their orders. If a customer has never placed an order, he would still be in the listing accompanied by a NULL for the OrderID.

 SELECT * FROM Customers AS C  LEFT JOIN Orders AS O   ON C.CustomerID = O.CustomerID 

Be careful in using conditions within these operations. A join operation is solved by the query analyzer before a WHERE condition. If you are looking for a listing of all customers and their orders and want to limit the listing to a single month, you must not place the condition with the ON clause in a WHERE. Placement of the condition in the WHERE would reduce the output after the join has occurred, and you would end up with a list that did not contain all customers. The correct query would be as follows:

 SELECT C.CustomerID, O.OrderID FROM Customers AS C  LEFT OUTER JOIN Orders AS O  ON C.CustomerID = O.CustomerID AND MONTH(OrderDate) = 6 

Contrast the difference in the output, as shown in Figure 5.9.

Figure 5.9. ON versus WHERE conditions in joins.


Notice that when the WHERE condition included the month restriction, not all customers were included in the results. The full outer join would include all elements from both the left and the right tables. This is helpful when you want to see all the records from both tables regardless of whether there was a correlation. Therefore, the following query would show all customers and all orders. If there was an order for which there was no customer on file, the CustomerID would be NULL. If there was a customer who had not placed an order, the OrderID would be NULL.

 SELECT C.CustomerID, O.OrderID FROM Customers AS C  FULL OUTER JOIN Orders AS O  ON C.CustomerID = O.CustomerID 

It should be identified that in the Northwind database it is not possible for an order to occur unless there is a matching customer. This is because of the restrictions that are in place through maintaining referential integrity. Although the full join allows you to see all the rows from either table, there is one more query, the cross join, that joins every row in both tables.

Joining Everything to EverythingCrossing

Performing a cross join on two large tables is probably not a good idea. The number of rows in the return set, or in intermediate sets used by SQL Server, can get out of hand quickly, causing server-wide performance degradation. If you have 100 records in the Person table, and 100 records in the Sales table, and then perform a cross join between the two tables, you'll end up with 10,000 records in your resultset (100*100 = 10,000).

Scale a cross join up to one table with 50,000 records and a second table with 100 records, and the resulting join produces five million records. This is not desired in most applications. If you figure that each of those returned records contains about 300 bytes, you can end up with a resultset that weighs in at 1.5 gigabytes, which would probably overwhelm any client computer you aimed it at.

Cross joins are rarely used except in some forms of statistical analysis. This is because most of the connections made really have nothing to do with something in common between the tables. It is simply a matter of connecting every row on one side to every row on the other regardless of content.

Multiple DatasetsUnions

Unions are similar to joins in that the resulting output contains combined information. A union, however, is the operator used to join two separate queries together end to end. A union operator takes the output of two or more SELECT statements and creates one recordset.

Each SELECT statement in the list must return the same number of columns, and the columns must be of compatible types. So if the first SELECT statement returns an int, a float, a varchar(15), and a char(6), the second SELECT has to contain two pieces of data that can be converted to numbers followed by two pieces of data that can be converted to characters.

If the columns are different types but are compatible, the output set will contain as close to the types of the first SELECT as possible. If any of the SELECT statements returns a mix of variable-length and fixed-length fields, the output will be variable length. If the SELECT statements contain columns of different lengths, the longest length is used for the output. If the two values are numerics, the value with the most precision is used.

If you have an integer and a float, the output is a float, because a float has the most precision. If you have an integer and a string that is a number (such as 42), you will have an integer; but if the string was 42.00, you'll have a numeric type that can contain two decimal places and a full integer.

Many SELECT queries can be combined using a UNION to create a single resultset. The following example creates a large set of addresses by combining the output from three separate tables:

 SELECT ContactName, Address, City, Country, Phone     FROM Customers  UNION SELECT FirstName + ' ' + LastName, Address, City, Country, HomePhone    FROM Employees  UNION SELECT ContactName, Address, City, Country, Phone    FROM Suppliers 

Notice that the FirstName and LastName columns of the Employees table have been combined and are treated as a single column. The resulting UNION uses the same number of columns in each of the three queries. When ordering the data from a UNION, the ORDER BY clause is attached to the final SELECT statement. If you want to create a new table that combines the output, you supply the INTO clause on the first SELECT. Therefore, to create a sorted address table using the preceding example, the syntax would be this:

 SELECT ContactName, Address As 'StreetAddress', City, Country, Phone    INTO Addresses FROM Customers  UNION SELECT FirstName + ' ' + LastName, Address, City, Country, HomePhone    FROM Employees  UNION SELECT ContactName, Address, City, Country, Phone    FROM Suppliers  ORDER BY Country, City 

The column names for the resulting table are taken from the first query in the set of UNION operators. If you want to change the column names, you would simply need to add aliases to the columns of the first SELECT similar to what has been done in the preceding example with the StreetAddress alias.

One unique feature of the UNION is that it automatically removes duplicates from the final resultset. To achieve a complete UNION operation that would include the duplicates, you would use the UNION ALL command instead of simply using UNION.

Adding and Modifying Data

Any business application includes many more processes than the retrieval of data. As time passes, new data will no doubt need to be inserted, redundant data will need to be deleted, and existing data will need to be updated. This is representative of a basic ACD (add, change, delete) implementation that is performed in virtually every database environment.

Data modifications are performed using UPDATE, and removal of data is achieved through the use of a DELETE command. Although there are many ways to insert data into an existing table, the primary coding method is by using the INSERT statement. The statement causes the data values to be inserted as one or more rows into an existing table.

Data inserted must meet the parameters defined by the table structure. This means that NOT NULL columns must have data provided either through input or through the use of column definitions that provide for their own values. A column can obtain its input value through a DEFAULT, IDENTITY, formula, or default object. Data must also meet all rules and constraints that have been defined in the table schema.

Data can be inserted into a table directly or through the resulting dataset from another command such as a SELECT query or an EXECUTE operation. When a query is performed, it enables a large amount of data to be extracted from a table or view and stored into another table.

A SELECT statement can be used within the INSERT statement to add values into a table from one or more other tables or views. Using a SELECT subquery is also a mechanism that enables more than one row to be inserted at one time. This type of INSERT statement is often used to insert data into a separate table from some other table or data source. In this manner the data can be copied or just separated off for handling of exceptions or specialty tasks.

For example, imagine that you would like to copy all your current employees into a customer table. This will enable your employees to make purchases and, of course, allow for an employee discount (not implemented in the Northwind example). The query to perform this operation might look similar to the following:

 INSERT INTO Customers        SELECT EmployeeID, 'Northwind',               FirstName + ' ' + LastName,               'Employee', Address, City, Region,               PostalCode, Country, HomePhone, NULL           FROM Employees 

The SELECT list of the subquery must match the column list of the INSERT statement. If no column list is specified, the SELECT list must match the columns in the table or view being inserted into, as in the example. Note that NULL has been provided for a fax number at the end of the column list, because none is included in the Employees table. The INSERT SELECT statement can be used to insert data from any viable source. This includes SQL Server tables and views, and sources outside SQL Server.

An EXECUTE statement (typically abbreviated to EXEC) that returns data with SELECT or READTEXT statements can be used to return the insert values to an INSERT operation. Each resultset must be compatible with the columns in the table or in the column list being used by the INSERT. In this manner a stored procedure can be executed and the data returned as input to a table. If an operation returns data with the READTEXT statement, each individual READTEXT statement can return a maximum of 1MB. The EXEC operation can also be used with extended procedures (not available in previous versions of SQL Server).

Using an operation in this manner enables complex logic to be saved into a stored procedure, and the resulting output can be used for the insertion. Using this facility, you can get virtually any data from anywhere and feed it to any destination. Some of these alternatives to query feeds are explored in Chapter 6, "Programming Business Logic."

Alternatively, you can specify just the VALUES keyword, which is a more convenient method. VALUES is required unless you are doing INSERT, SELECT OR INSERT, or EXECUTE. The same results are produced with the following query:

 INSERT INTO PetTable Values(30, 'May', 2002, 'male',  'black', 'cat') 

Of course, adding data is only the first aspect of ACD operations. After data is initially inputted into the system, over time it will need to be altered. Alterations to data are performed through the use of an UPDATE.

Just as with the input of data, any alterations made to the data are subject to the rules and constraints as defined in the table schema. Any data modifications made that do not meet these defined standards result in errors being generated that the front-end application may have to trap and control.

Most data modifications will be based on a conditional operation or alternatively will be an in-place UPDATE of the current record. The following represents a basic UPDATE statement that will adjust the data based on the criteria given:

 UPDATE Customers   SET Region = 'South America'   WHERE Country IN ('Argentina', 'Brazil', 'Venezuela') 

Along with data modifications, data over time will become obsolete and will need to be removed. Data removal is accomplished through the use of the DELETE statement. It is extremely important to provide a condition when performing data deletions because it is too easy to remove all data if you do not exercise caution. The following command removes all records from the specified table:

 DELETE FROM Customers 

If you would like to delete all rows, a fast, nonlogged method is already supplied: TRUNCATE TABLE. It immediately frees all space used by indexes and data by that table, as opposed to DELETE, which should be used when partial data removal is desired. Although both trUNCATE TABLE and a DELETE statement with no WHERE clause remove all rows in a table, TRUNCATE TABLE is faster and uses fewer system and log resources. The DELETE statement removes rows one at a time, recording an entry in the transaction log for each row. trUNCATE TABLE removes the data by deallocating the data pages used to store the table's data, and only the page deallocations are recorded in the log. TRUNCATE TABLE removes all rows from a table, but the table structure and its columns, constraints, and indexes remain intact. The counter used by any identity columns is reset to the seed value for the column. If you want to retain the identity counter, use DELETE instead.

Removal of data may impact other tables if cascading deletions have been specified for a relationship. This could mean the removal of other records based on a single DELETE operation. Also, a relationship definition may prevent the deletion of data and return an error condition to the operation. The DELETE statement removes one or more records from a table based on a condition in the WHERE clause. A simplified sample DELETE statement is shown here:

 DELETE FROM Products    WHERE Discontinued = 1 

Traditional applications could be completely centered around the four basic SQL commands: SELECT, INSERT, UPDATE, and DELETE. Essentially, these statements handled about every operation that needed to be performed against the data.



    EXAM CRAM 2 Designing and Implementing Databases with SQL Server 2000 Enterprise
    MCAD/MCSE/MCDBA 70-229 Exam Cram 2: Designing & Implementing Databases w/SQL Server 2000 Enterprise Edition
    ISBN: 0789731061
    EAN: 2147483647
    Year: 2005
    Pages: 154

    flylib.com © 2008-2017.
    If you may any questions please contact us: flylib@qtcs.net