Advanced Data Retrieval and Modification | MCSE Training Guide (70-229). Designing and Implementing Databases with MicrosoftR SQL ServerT 2000 Enterprise Edition

Joins

There are three basic join types. An inner join shows results only where there are matches between the elements. In other words, if you were querying the database and only wanted to see the people who have addresses, you'd use an inner join. An inner join will leave out all records that don't have a match.

An outer join can show all records from one side of the relationship, and records that match if available from the other side and NULL values for records that do not have a match. An outer join between the Person and Sales table could show you each person and the amount of purchases they've made, and also show NULL values for people who haven't made purchases yet, but are in the table.

The final type of join is a cross-join. A cross-join returns all possible combinations of rows between the two sides of the join. The number of records in the resultset is equal to the number of records on one side of the join multiplied by the number of records on the other side of a join. No correlation between the two records is attempted ”all the records from both sides are returned. Performing a cross-join on two large tables is probably not a good idea. The number of rows in the returnset, or in intermediate sets used by SQL Server, can get out of hand quickly, causing server-wide performance degradation.

Using GROUP BY

The GROUP BY operator creates subgroups within a query that can be used by aggregate functions. For example, look at the Sales table. Right now, the Sales table has one record for each sale, but you really want to know how many total items each person bought. Here's how GROUP BY will solve that problem:

 SELECT PersonID, sum(QtyPurchased) as TotalQtyPurchased FROM Sales GROUP BY PersonID

There are, of course, some rules associated with the use of GROUP BY within a query:

The items in the SELECT statement that are not part of an aggregate function have to appear in the GROUP BY clause.
You can have multiple aggregates in one SELECT statement. For example, you can find MIN and MAX values for a particular column in one SELECT statement.
The items in aggregate functions in the SELECT statement cannot appear outside aggregate functions in the SELECT statement. So, although you can do SELECT MIN(value), MAX(value) , you can't do SELECT value, MIN(value), MAX(value) , even if you use the appropriate GROUP BY .

Using GROUP BY and HAVING

The HAVING keyword appears after GROUP BY in a SELECT statement, and it is used to filter data after the GROUP BY . You can actually use HAVING without using GROUP BY , in which case HAVING will act like a part of the WHERE clause. The HAVING operator is typically used with aggregate functions to filter out rows that don't meet certain criteria.

Using COMPUTE and COMPUTE BY

So far, the operators you've looked at do not ever create data rows. That changes now. The COMPUTE and COMPUTE BY operators create summary data across your aggregates and add it into your rowset. There are the inevitable rules, however:

If you use COMPUTE without using BY , you can use the COUNT aggregate; otherwise , you can't.
If you use COMPUTE with BY , you need to have an ORDER BY , with the same columns in the same order. You can have additional items in the ORDER BY , but you at least need to have what is specified in the COMPUTE BY .
You can't use COMPUTE or COMPUTE BY with SELECT INTO .
You can't use COMPUTE or COMPUTE BY as the SELECT statement that feeds a cursor ( cursors are covered in Chapter 6).

Using OPENROWSET and OPENQUERY

The OPENROWSET function returns a rowset to SQL Server. It's used in a SELECT statement in the same place as a table. Here are the parameters for the OPENROWSET function:

provider_name . This is the name of the provider, as specified in the registry. It is not optional.
datasource . This is the name of the file, server, or whatever the OLE-DB provider needs to figure out what it should be talking to.
user_id . This is the username that the provider understands. Some providers may not need a username, so this could be blank.
password . This is the password that the provider understands. Some providers may not need a password, so this could be blank.
provider_string . This is a free text field that has everything else the provider needs to initiate the connection. Only the provider knows for sure what this is; you'll have to dig through piles of documentation.
catalog . This is the name of the database, catalog, or whatever the provider understands. It's the top level of the hierarchy of object names . It's also likely to be blank if the provider doesn't use it.
schema . This is the name of the owner of the object. It may be blank if the provider doesn't use it.
object . This is the name of the object being manipulated. Believe it or not, this could be blank if there's only one object in the datasource , such as a text file.
query . This is a string that's provider-specific, and it's passed directly to the provider as a query. It may be an SQL query, or it may be something else. It's also not processed by SQL Server, so you should make sure that the syntax is valid before you send it; otherwise, you'll get a very incomprehensible error message.

OPENQUERY allows for the execution of a pass-through query to be performed on a linked server. The linked server can be any OLE DB data source. The OPENQUERY function can be referenced in the FROM clause of a query in the same manner as the use of a table name. The OPENQUERY function can also be used as a target for an INSERT , UPDATE , or DELETE statement. The syntax for its use is

 OPENQUERY (linked_server,'query')

where linked_server is the name of the linked server and 'query' is the string executed.

Linked Server

You can think of a linked server as a prebuilt set of arguments for OPENROWSET stored in a database object. You can just create an object called a linked server, and create it with all the attributes that you'd normally use in a call to OPENROWSET . Then, instead of having to type in all those parameters over and over again, you can just use the linked server. When a linked server is used, it is important to remember the four-part naming of objects in queries: server.database.owner.object .

XML

XML is breaking into the Internet, setting a new data standard and providing another mechanism of getting data from point A to point B. SQL Server's use of XML is discussed in Chapter 5. To put data into XML format, the SELECT statement includes an operator called FOR XML . This specifies to SQL Server that instead of returning a rowset, it should return an XML document. There are three different options for generating the XML: RAW , AUTO , and EXPLICIT .

In AUTO mode, SQL Server returns the rowset in an automatically generated nested XML format. If the query has no joins, it doesn't have a nesting at all. If the query has joins, then it returns the first row from the first table, and then all the correlated rows from each joined table as a nested level.

What if you don't want XML that is elaborately formatted? You then can use the RAW mode. RAW mode returns each row as an element with the identifier "row" .

Another way to extract data with the FOR XML clause is by using the option EXPLICIT . The EXPLICIT option enables you to specify the format of the XML that will be created. To define the format, you have to alias the first column of output to the name Tag , name the second column Parent , and then each consecutive column has to be aliased to relate it to a specific Tag . The column names after Parent are named like this: [ ElementName ! TagNumber ! AttributeName ! Directive ] , complete with the square brackets around the alias. (See Table 4 for a list of directives.)

Table 4. Directives for Use in XML Column Aliases

Directive	Description
`ID`	This is used to define an element as an anchor for a referral; other objects can then refer back to this one with `IDREF` and `IDREFS` . If you're not using the `XMLDATA` option, this option doesn't do anything.
`IDREF`	This links the element with another element that is specified with the `ID` directive. Once again, if you're not using the `XMLDATA` option, this doesn't do anything.
`IDREFS`	Same thing as `IDREF` .
`hide`	This specifies that the attribute should not be displayed. This is handy for creating an element that you want to sort on, but you don't necessarily want to have in the resultset.
`element`	This will force the creation of a new element with the name specified in the alias and the data coming from the column data.
`xml`	This directive specifies that the data in the column is already XML and shouldn't be parsed. It can be used only with the `hide` directive.
`xmltext`	The column will be wrapped up into a tag that will be stuck into the rest of your document. It also can be used only with the `hide` directive.
`cdata`	This will wrap the column data in a `CDATA` section with no encoding at all. Can be used only with string and text types, and it can be used only with the `hide` directive.

The OPENXML function works similarly to how OPENROWSET and OPENQUERY work in that it returns a rowset that can be used in a SELECT statement. The full XML process is defined in the following list:

Use the sp_xml_preparedocument system stored procedure to create a document handle.
Use the OPENXML statement to define the format of the XML document and return the rowset.
Use the sp_xml_removedocument system stored procedure to destroy the document handle.

Data Transformation Services

SQL Server 2000 provides a great tool that will import and export data for you, called the Data Transformation Services Import/Export Wizard. This tool uses SQL Server DTS to copy data into and out of SQL Server using nice, easy-to-understand graphical tools.