|< Day Day Up >|
In order to determine whether the new shipment method is a success, we will need to generate purchases for an entire year and examine the number of orders that are generated. To do this, we will need to utilize a revised copy of the GeneratePurchases routine introduced in the last chapter.
A Revised Copy of GeneratePurchases
In Chapter 5 we used the GeneratePurchases routine to simulate daily purchasing activity for each store on a yearly basis. Orders were generated when purchases brought the quantity levels at or below the minimum thresholds allowed. At the beginning of each day, the orders table was checked for unfilled orders and necessary shipments were generated.
The first overloaded GeneratePurchases routine in the LoadSampleData program for Chapter 6 does something similar. The code for this routine is seen as follows:
'Maximum # of purchases per day Dim nMaxPurchases As Int16 'Maximum # of products per purchase Dim nMaxProducts As Int16 'Maximum value of quantity per product Dim nMaxQuantity As Int16 Dim sYear As String = "2002" Dim nStoreID As Int16 'These are the Product Types in which there 'is an increased chance of product selection 'The default of 1,2 represents snack foods 'and beverages Dim sProductTypeAdj As String 'These are the Vendors in which there is an 'increased chance of product selection 'The default of 20,24,27 represents Kamp, Notch, 'and PNuts as Vendors Dim sVendorAdj As String Dim i As Int16 Try 'Process for all 5 stores For i = 1 To 5 nStoreID = i If i = 1 Then nMaxPurchases = 80 nMaxProducts = 8 nMaxQuantity = 5 sProductTypeAdj = "1,2" sVendorAdj = "20,24,27" ElseIf i = 2 Then nMaxPurchases = 60 nMaxProducts = 10 nMaxQuantity = 3 sProductTypeAdj = "2,6" sVendorAdj = "13,18" ElseIf i = 3 Then nMaxPurchases = 70 nMaxProducts = 6 nMaxQuantity = 4 sProductTypeAdj = "2" sVendorAdj = "34" ElseIf i = 4 Then nMaxPurchases = 90 nMaxProducts = 5 nMaxQuantity = 2 sProductTypeAdj = "" sVendorAdj = "34,18" ElseIf i = 5 Then nMaxPurchases = 50 nMaxProducts = 12 nMaxQuantity = 6 sProductTypeAdj = "6" sVendorAdj = "24,27,34" End If Dim params(2) As SqlParameter params(0) = New SqlParameter("@ID", SqlDbType.Int) params(1) = New _ SqlParameter("@ProdTypeAdj", SqlDbType.VarChar, 50) params(2) = New _ SqlParameter("@VendorAdj", SqlDbType.VarChar, 50) params(0).Value = nStoreID params(1).Value = sProductTypeAdj params(2).Value = sVendorAdj Dim ds As DataSet = _ SqlHelper.ExecuteDataset(sConn, _ CommandType.StoredProcedure, "GetProductIDS", params) Dim j As Int16 = 1 Dim dtDate As DateTime = _ Convert.ToDateTime("01/01/" + sYear) 'Loop through everyday of the year 'we assume the store is open everyday Randomize() Do Until j = 182 'First thing is check to see if orders 'needs to be fulfilled for this day and store 'We assume that all orders are shipped 3 days 'after the orderdate in one shipment Dim params1(1) As SqlParameter params1(0) = New _ SqlParameter("@StoreID", SqlDbType.Int) params1(1) = New _ SqlParameter("@Date", SqlDbType.SmallDateTime) params1(0).Value = nStoreID 'order was placed 3 days ago params1(1).Value = dtDate.AddDays(-3) SqlHelper.ExecuteReader(sConn, _ CommandType.StoredProcedure, _ "InsertShipment", params1) 'We will also call a new stored procedure here 'called CheckShipmentSchedules which will 'be used to see if the store needs orders 'generated for this day. Dim params1b(1) As SqlParameter params1b(0) = New _ SqlParameter("@StoreID", SqlDbType.Int) params1b(1) = New _ SqlParameter("@Date", SqlDbType.SmallDateTime) params1b(0).Value = nStoreID params1b(1).Value = dtDate SqlHelper.ExecuteNonQuery(sConn, _ CommandType.StoredProcedure, _ "CheckShipmentSchedule", params1b) Dim x As Int16 = 1 'This will be the total number of 'purchases for this day Dim nPurchases As Int16 = _ CInt(Int((nMaxPurchases * Rnd()) + 1)) Do Until x = nPurchases + 1 Dim y As Int16 = 1 Dim nEmployeePos As Int16 = _ CInt(Int((ds.Tables(1).Rows.Count * Rnd()))) Dim nEmployeeID As Integer = _ Convert.ToInt32(ds.Tables(1).Rows(nEmployeePos).Item(0)) Dim params2(2) As SqlParameter params2(0) = New _ SqlParameter("@ID1", SqlDbType.Int) params2(1) = _ New SqlParameter("@Date", SqlDbType.SmallDateTime) params2(2) = New SqlParameter("@ID2", SqlDbType.Int) params2(0).Value = nStoreID params2(1).Value = dtDate params2(2).Value = nEmployeeID Dim nPurchaseID As Integer = _ SqlHelper.ExecuteScalar(sConn, _ CommandType.StoredProcedure, _ "InsertPurchase", params2) 'This is total number of products for 'this(purchase) Dim nProducts As Int16 = _ CInt(Int((nMaxProducts * Rnd()) + 1)) Do Until y = nProducts + 1 'This is quantity for this purchase Dim nQty As Int16 = _ CInt(Int((nMaxQuantity * Rnd()) + 1)) 'This is the product for this detail record Dim nProductPos As Int16 = _ CInt(Int((ds.Tables(0).Rows.Count * Rnd()))) Dim nProductID As Integer = _ Convert.ToInt32(ds.Tables(0).Rows(nProductPos).Item(0)) 'Generate the detail record Dim params3(4) As SqlParameter params3(0) = New _ SqlParameter("@StoreID", SqlDbType.Int) params3(1) = New _ SqlParameter("@ProductID", SqlDbType.Int) params3(2) = New _ SqlParameter("@PurchaseID", SqlDbType.Int) params3(3) = New _ SqlParameter("@Qty", SqlDbType.Int) params3(4) = New _ SqlParameter("@Date", SqlDbType.SmallDateTime) params3(0).Value = nStoreID params3(1).Value = nProductID params3(2).Value = nPurchaseID params3(3).Value = nQty params3(4).Value = dtDate 'Note that the stored procedure name being 'called is different than the one from Chapter 4 SqlHelper.ExecuteScalar(sConn, _ CommandType.StoredProcedure, _ "InsertPurchaseDetailNew", params3) y += 1 Loop x += 1 Loop j += 1 ProgressBar1.Value = j dtDate = dtDate.AddDays(1) 'Go to the next day Loop Next MessageBox.Show("Purchases for all 5 stores were generated _ successfully") Catch ex As Exception MessageBox.Show(ex.Message) End Try
It too will simulate random purchase generation for each store. The difference is that we now will use the ShipmentSchedules table to determine when products should be ordered. In addition, the time frame has been lowered to six months to shorten the amount of processing time necessary.
The newly created InsertPurchaseDetailNew stored procedure is used not only to generate purchase records, but also to verify that the product quantity requested is available. If the product quantity is not available, a record is written to the newly created ProductAvailabilityTracking table. Management can use this table to measure the success of the new shipment method. Each record indicates an attempt by a customer to purchase a product not available.
In a real-world situation it would be difficult to utilize such a table. After all, most customers will not announce or even know that they are trying to purchase a product not available. More than likely a store would utilize some other method of tracking product availability. This could involve store managers routinely checking the store aisles or an automated method of querying the database.
The important thing to note is that the application of predictions by data-mining algorithms should be continually measured and evaluated. It is not good enough to simply use Analysis Services to make predictions and then assume that everything will work fine.
For demonstration purposes, you can execute the GeneratePurchases routine by clicking the Generate Purchases button on Form1. The form does not contain input parameters like the ones used in the Chapter 5 application. Instead, the values are hard-coded within the routine itself.
After running the GeneratePurchases routine using the Version 1 database file, we see some interesting results. If you were to perform the following query in Query Analyzer, SELECT COUNT(ShipmentID) FROM Shipments WHERE StoreID = 1 and ShippedDate > '01/01/2002', you might get a return value of 53. So for six months in 2002, only 53 shipments were made to Store 1. This number is significantly less than the 295 shipments made the entire year before. It would be very easy to consider these numbers alone indicative of success, but we must consider something else.
For this chapter, a new table named ProductAvailabilityTracking was created. This table is written to every time a randomly generated purchase is attempted and the product is not available. For Store 1, 1232 records were written to this table. This indicates that in the year 2002 more than 1200 customers were displeased to find that the product they wanted was not available. Any business major will tell you that missed sales opportunities and negative customer impressions can outweigh the benefits of reducing operational costs. In the next section we will examine another method of applying predictions that may lead to better results.
Tuning the Mining Model
Even though the initial run of our Generate Purchases scenario indicated success in that shipments were reduced, further analysis uncovered a flaw. It was discovered that using the mining model would result in a large number of customers not being able to purchase the products they needed.
The biggest factor is the data itself. You will train the mining model based on a set of data that was preselected. For real-world applications this will be a subset of the data that has been scrutinized or cleaned (as was discussed in Chapter 5). Chapter 5 relied on simulated data generated for all five stores in one year. The simulation was based on parameters supplied to the GeneratePurchases routine. Depending on the parameters selected, a varied number of records would be generated.
Utilizing a Version 2 Database File
To demonstrate the significance of the training data utilized, another database file has been provided. The Version 2 database files are also available for download from the book's Web site. This database version was the result of modifying the parameters used to generate purchases in Chapter 5. Table 6.3 lists the parameters along with the values used to build the Version 1 and Version 2 database files.
Version 1 is a snapshot of the data as it would look if the steps in Chapter 5 were followed. Version 2 is an alternative version of the database utilizing different input parameters. The input parameters introduced in Chapter 5 have been lowered to introduce more variability and speed up the processing time.
At this time you should attach the Version 2 file by performing the following steps:
Once the files are attached, you will need to click the Apply Shipment Schedules button so that the mining model is refreshed and new values are written to the ShipmentSchedules table. After the routine completes successfully, open up Analysis Manager and expand the SavingsMart database. Right-click the Analyze Shipments mining model and click Browse. You should instantly see that the model content is significantly different. The tree contains more branches and involves product type. The first factor affecting days between shipments is Store ID as opposed to vendor name. If you double-click the Store ID = 1 node, you will drill down farther into the tree. Figure 6.5 is a screenshot of what this view would look like.
Figure 6.5. Screenshot of the Content Detail in Analysis Manager once the model has been refreshed with data from the Version 2 backup file. The next split is based on product type and finally vendor name. This is in stark contrast to the results seen when utilizing the Version 1 backup file.
Since product type is now considered a factor when determining days between shipments, we will have to modify the query used to retrieve predictions from the mining model. The PopulateShipmentSchedulesTable procedure contains the code used to generate this query; this is the commented code that was discussed earlier. The string shown below is an example of what the new query string would look like.
SELECT Predict([Days Since Last Shipped]),Predict(Quantity) FROM [Analyze Shipments] PREDICTION JOIN (SELECT 'Ables' as VendorName, 1 as StoreID, 'Beverages' as Type) as t ON [Analyze Shipments].[Vendor Name] = t.VendorName AND [Analyze Shipments].[Product Type] = t.Type AND [Analyze Shipments].[Store ID] = t.StoreID
Note that the query now includes product type as a filter. It also joins the mining model based on product type so that more granularity is achieved.
If you were to execute the GeneratePurchases routine by clicking Generate Purchases, you would attain different results. For instance, the query, SELECT COUNT(ShipmentID) FROM Shipments WHERE StoreID = 1 and ShippedDate > '01/01/2002', might return a result of 4. This result is a definite improvement over the amount of 53 attained when processing the Version 1 database file. The best part is that the entries to the ProductAvailabilityTracking table have also been reduced. They went from 1232 for Store 1 to 118. Thus we have not only reduced the number of shipments, but we have also ensured greater product availability.
Other factors that can affect the predictions of a mining model are based on the number of splits that occur within the decision tree. Some of these factors can be controlled through the use of parameters assigned to the model. Table 6.4 lists two parameters available with the decision trees mining model.
|< Day Day Up >|