Data Mining Extensions | Delivering Business Intelligence with Microsoft SQL Server 2005: Utilize Microsofts Data Warehousing, Mining & Reporting Tools to Provide Critical Intelligence to A

In Chapter 11, we learned about the MDX query language and its use in querying OLAP cubes. Data raining structures also have their own query language, known as Data Mining Extensions (DMX).

In this section, we look at the most prominent aspect of DMX, the prediction query. The prediction query provides a means to have a data mining model create predictions based on rows in another table. In fact, you already used DMX prediction queries when you used the Mining Model Prediction tab.

Prediction Query Syntax

The prediction query looks much like a SELECT statement in T-SQL. In fact, it has many of the same clauses as a SELECT statement. We look at each of those clauses and how they are used.

SELECT Clause

The SELECT clause in a DMX query serves the same function as the SELECT clause in a T-SQL statement. The SELECT clause specifies what fields or calculations are going to be included as columns in the result set. This is done by placing a field list following the word "SELECT". For example:

 SELECT Fname, Lname, Address1, City, State_Province, Postal_Code

The SELECT statement has two special keywords that may precede the field list: FLATTENED and TOP n. The FLATTENED keyword causes the prediction query to return a flattened result set of rows and columns. When the FLATTENED keyword is not used, the SELECT statement may return a hierarchical result set.

The TOP n keyword works the same here as it does in the T-SQL SELECT statement: it limits the number of rows in the result set to the number specified for n. We can order the result set, perhaps by a probability in descending order, and then use the TOP n keywords to include only the rows with the highest probability.

FROM Clause

The FROM clause specifies the mining model being used in the query. For example:

 FROM [Neural Network - Children At Home]

PREDICTION JOIN Clause

The PREDICTION JOIN clause lets us feed information into the inputs of the mining model. The content or absence of this clause determines the type of prediction query being run. See the section "Types of Prediction Queries" for details on the syntax of the PREDICTION JOIN clause.

WHERE Clause

The WHERE clause enables us to filter the result of the SELECT statement. The filter conditions in the WHERE clause are applied after the predictions are made by the mining model. Therefore, we can use the prediction result as part of the filter. For example, if the Neural Network - Children At Home mining model is predicting Num Children At Home, we can create a filter to give us only those records where Num Children At Home was predicted to be 0:

 WHERE [Neural Network - Children At Home].[Num Children At Home] = 0

ORDER BY Clause

The ORDER BY clause lets us sort the result set: we can sort in either ascending or descending order. Adding DESC to the clause provides a descending sort. Adding ASC to the clause provides an ascending sort. ASC is the default sort order when neither ASC nor DESC is specified.

The ORDER BY clause in T-SQL includes a comma-separated list of fields to provide the sort columns. The ORDER BY clause in DMX is a bit different. It allows only a single expression. If we want to sort by more than one column, we need to create an expression concatenating these two columns, as follows:

 ORDER BY [Lname] + [Fname] ASC

Types of Prediction Queries

Several types of prediction queries are available to us. As stated earlier, the format or absence of the PREDICTION JOIN clause determines the type of prediction query being created. Let's look at the format and operation of each type.

PREDICTION JOIN

The standard prediction join query enables us to take records from another data source and feed those results into our mining model to create predictions. For this to happen, we must specify where the joined data is coming from. We also need to specify how the columns in the data source are mapped to the input columns of the mining model.

A typical PREDICTION JOIN clause looks like this:

 PREDICTION JOIN    OPENQUERY([Max Min Sales DM],      'SELECT        [Fname],        [Lname],        [Address1],        [City],        [State_Province],        [Postal_Code],        [Num_Cars_Owned],        [Houseowner],        [Marital_Status]     FROM        [MaxMinSalesDM].[Customer]      ') AS t ON    [Neural Network - Children At Home].[Num Cars Owned] = t.[Num_Cars_Owned] AND    [Neural Network - Children At Home].[Houseowner] = t.(Houseowner] AND    [Neural Network - Children At Home].[Marital Status] = t.[Marital_Status]

Remember, the DMX query is running on an Analysis Services server. If we want to include data from a relational table in the prediction join, we need to use the appropriate syntax for utilizing data from another server. This is the OPENQUERY() function.

The OPENQUERY() function executes the specified statement on the server indicated. The OPENQUERY() function requires two parameters. The first is the name of the server where the query is to be executed. In the prediction join, this will probably be the name of a data source denned in the Analysis Services database. In the example code, we are using the Max Min Sales DM data source. This data source points to the MaxMinSalesDM relational database.

The second parameter contains the query to be executed. The query is enclosed in single quotes. In our example, we are selecting fields from the Customer table in the MaxMinSalesDM database. The result set from the OPENQUERY function is given an alias, so it can be referred to elsewhere in the prediction join query. The alias is provided by the "AS t" portion of the PREDICTION JOIN clause.

The lines following the ON keyword map columns from the joined data to the input columns of the mining model. This tells the mining model what input values it should use for each prediction. In our example, the Num_Cars_Owned field from the Customer table is used as the Num Cars Owned input column in the mining model. The Houseowner field is used as the Houseowner input and the Marital_Status field is used as the Marital Status input.

NATURAL PREDICTION JOIN

The NATURAL PREDICTION JOIN clause works exactly like the PREDICTION JOIN clause with one convenient exception: it does not include the ON portion. Instead, the NATURAL PREDICTION JOIN clause does its mapping of fields to input columns based on the names of each. This can be convenient when working with mining models with a large number of input columns. As you might guess, for this to work, the names of the fields must exactly match the names of the input columns.

We can make our PREDICTION JOIN example work as a NATURAL PREDICTION JOIN with a slight modification to the query inside the OPENQUERYQ function, as follows:

 NATURAL PREDICTION JOIN   OPENQUERY([Max Min Sales DM],     'SELECT       [Fname],       [Lname],       [Address1],       [City],       [State_Province],       [Postal_Code],       [Num_Cars_Owned] AS [Num. Cars Owned],       [Houseowner],       [Marital_Status] AS [Marital Status]     FROM       [MaxMinSalesDM].[Customer]     ') AS t

By applying aliases to the Num_Cars_Owned and Marital_Status fields, we can make the field names in the OPENQUERY() result exactly match the input column names. Now, no mapping is necessary.

Empty Prediction Join

The empty prediction join is not a join at all. With an empty prediction join, we do not pass any information to the mining model input columns. When we do this, the mining model returns the most likely prediction.

Here is a sample of an empty prediction join:

 SELECT   [Num Children At Home] From   [Neural Network - Children At Home]

Singleton Query

Where the prediction join and the natural prediction join enabled us to feed data from another data source into the mining model, the singleton query lets us enter hardcoded values for each input column. For example:

 SELECT   [Neural Network - Children At Home].[Num Children At Home] From   [Neural Network - Children At Home] NATURAL PREDICTION JOIN (SELECT 'Y' AS [Houseowner],   'N' AS [Marital Status],   '2' AS [Num Cars Owned]) AS t

To create our singleton query, we are using a NATURAL PREDICTION JOIN clause and hardcoding the content of the inner SELECT statement. Of course, to make this work with a natural prediction join, we must alias each column in the inner SELECT statement to match the names of the mining model input columns.

Learn By Doing—Creating a Query in the SQL Server Management Studio

Feature Highlighted

Creating a DMX query in the SQL Server Management Studio

Business Need One of the power users in the marketing department is familiar with DMX queries and wants to be able to create DMX queries against the mining models without the graphical interface. Entering DMX queries into the query view of the Mining Model Prediction tab is possible. However, if we switch back to the design view, we lose everything entered in the query view.

The SQL Server Management Studio provides a better alternative.

Note

Always take care when giving users, even power users who seem to know what they are doing, access to tools such as SQL Server Management Studio. Make sure security is set appropriately, so the users do not do catastrophic damage, either maliciously or by accident. SQL Server 2005 has expanded its security model. See Books Online for more information.

Steps

Open the SQL Server Management Studio.
Connect to the Analysis Services server hosting the MaxMinSalesDM database.
Expand the Databases folder in the Object Explorer window.
Right-click the MaxMinSalesDM database entry in the Object Explorer window and select New Query | DMX from the Context menu. A DMX query window appears.
Select Neural Network - Children At Home from the Mining Model drop-down list.

Enter the following in the query area to create an empty prediction query:

 SELECT   [Num Children At Home] From   [Neural Network - Children At Home]

Click the Execute button in the toolbar. The SQL Server Management Studio appears as shown in Figure 14-26.

Replace the empty prediction query with the following singleton query:

 SELECT   [Neural Network - Children At Home].[Num Children At Home] From   [Neural Network - Children At Home] NATURAL PREDICTION JOIN (SELECT 'Y' AS [Houseowner],   'N' AS [Marital Status],   '2' AS [Num Cars Owned]) AS t

Click the Execute button in the toolbar. The SQL Server Management Studio appears as shown in Figure 14-27.
If you want to give your fingers a workout, try entering and executing the prediction join query shown in Figure 14-25.

image from book
Figure 14-26: An empty prediction query in the SQL Server Management Studio

image from book
Figure 14-27: A singleton query in the SQL Server Management Studio