CASE STUDY Overworking Functions

   

CASE STUDY
Overworking Functions

You've taken on The Sisyphus Corporation as a client to help it solve a data tracking problem. Sisyphus wants to track information about purchase orders (POs) that it has prepared for its vendors. As the vendors send invoices for payment, the Sisyphus staff enters information into an Excel workbook. The information includes data such as PO number, PO date, invoice number, invoice amount, invoice date, and so on.

Over time, Sisyphus acquires a considerable amount of information, some of which appears in Figure 1.1.

Figure 1.1. Not all records with the same PO number are grouped together; the sort is by column F, Date Invoice Received.

graphics/01fig01.gif


NOTE

Although the vendors' names have been altered, the information shown in Figure 1.1 is genuine, taken from a workbook maintained by a mid-sized company (which is not named Sisyphus). All the case studies, and most of the figures, in this book are taken from real-world examples.


The data shown in Figure 1.1 actually extends from Row 1 to Row 6045. As part of your client's management and control process, Sisyphus wants to compare periodically the total amount paid on each PO with the amount for which the PO was originally drawn. To make that comparison, Sisyphus needs the total amount of all the invoices submitted for each PO.

The staff who created the worksheet inserted new rows in the midst of the worksheet to hold PO totals. This approach was reasonable, if rudimentary, when the worksheet contained only a few PO and invoice records. When a new invoice arrived, they took the following steps, using a worksheet with the same data as Figure 1.1, but with extra rows to show invoice totals:

  1. Sort the entire data range by PO number.

  2. Choose Find from the Edit menu to locate the first record with a PO number matching the current invoice.

  3. Scroll down to find the totaling row for that PO.

  4. Insert a blank row just above the totaling row.

  5. Enter information about the current invoice in the new row.

  6. Correct the range address in the totaling formula so that it captures the new invoice amount (see Figure 1.2).

    Figure 1.2. This intuitively obvious approach creates huge problems when it comes time to analyze the data more fully.

    graphics/01fig02.jpg


The arrangement shown in Figure 1.2 causes several problems:

  • Sorting the data for example, to get subtotals by month instead of by PO becomes clumsy. Those extra totaling rows get in the way.

  • Putting in new data, and pointing the SUM function at the proper cells, is subject to keying errors.

  • Each time a new row is inserted, and new data is entered, and a SUM formula is edited, all the SUM formulas recalculate. Sisyphus now has so much data that the process is unacceptably slow.

  • The user can't get a total of all invoices via something like =SUM(H1:H7000) because the PO subtotals get included.

When the worksheet contained roughly 50 POs, and 100 to 200 invoices, this approach wasn't a bad idea. It was quick and dirty, true, but it was quick.

By the time the worksheet grew to contain more than 3,000 POs and more than 6,000 invoices, things had gotten out of hand and Sisyphus called you in. These situations are insidious: They worm their way into how a department does business, people get trained to use them and give them up only reluctantly, and a "that's the way we've always done it" mindset takes over.

A slightly more sophisticated approach might have been more straightforward in the long run. Here's one possibility:

  1. Use a pivot table, or the Advanced Filter, to create a list of unique PO numbers. Figure 1.3 shows a pivot table with the unique PO numbers; see Chapter 3, "Excel's Lists, Names, and Filters," for information on how to use Excel's Advanced Filter.

    Figure 1.3. This pivot table counts the number of invoice records associated with each PO.

    graphics/01fig03.gif


  2. Suppose that the list created in step 1 is in AB3:AB3540. In (for instance) cell AE1, you might array-enter this formula:

     =SUM(IF($A$3:$A$6045=AB3,$H$3:$H$6045,0)) 

    and then copy and paste that formula into AE3:AE3540. Figure 1.4 shows the result.

    Figure 1.4. The curly brackets around the formula in the Formula Bar show that it's an array formula.

    graphics/01fig04.jpg


This array formula begins with a conditional:

 IF($A$3:$A$6045=AB3, 

The conditional evaluates as TRUE when a value in A2:A6045 equals the value in cell AB3. The formula continues:

 $H$3:$H$6045,0)) 

In words, for a row in column A where the conditional is true, the formula returns the corresponding value from column H, and otherwise it returns a zero. Finally, that logic is submitted to the SUM function:

 =SUM(IF($A$3:$A$6045=AB3,$H$3:$H$6045,0)) 

which adds the value in column H when the PO number in cell AB3 is found in column A, and adds zero otherwise.

NOTE

You array-enter a formula with the key combination Ctrl+Shift+Enter, instead of merely pressing the Enter key. You'll see several more examples of array formulas in Chapter 2.


This sequence of steps results in 3,538 array formulas, each providing the sum of the invoice amounts for each PO number. You can compare the results with the amount for which each PO was drawn this comparison tells you which PO amounts have been overrun and which have not yet been used up.

After brief consideration, you realize that this approach won't work. The problem is that it's going to take what seems like forever for all those formulas to calculate. And the moment that Sisyphus makes any change to the content of the worksheet, the formulas all have to recalculate (unless you've had the foresight to set Calculation to Manual, and later remembered to turn it back to Automatic). On a 1.80GHz Pentium 4, that recalculation takes more than a minute with the data shown in Figure 1.1.

Even a minute is too long to wait. Your client has already acquired far too much data for Excel to handle comfortably and effectively. A much better approach is to use a pivot table with PO Number as the row field and Invoice Amount as the data field (see Figure 1.5).

Figure 1.5. The figures shown in pivot tables are values, not formulas that must be recalculated.

graphics/01fig05.jpg


Better yet would be to store the PO and invoice data in a true relational database, because doing so keeps all the individual detail records out of the workbook. With that arrangement, you could

  • Return the data to the workbook in a pivot table, specifying External Data Source as your data location.

  • Return the data to the workbook in an external data range, using a Select query as the data source and grouping on PO number. (See Chapters 4, "Importing Data: An Overview," and 5, "Using Microsoft Query," for information about Select queries.)

In either case, you wind up with values, not formulas, showing the total invoice amount for each PO. This means that Sisyphus doesn't need to wait for 3,538 recalculations to complete whenever there's a change in the underlying data, or when someone edits the worksheet, kicking off a recalculation. Even without recalculating formulas, it's quick and easy to update the totals as new invoices arrive or new POs are created; all your client needs to do is refresh the pivot table or the external data range.



Managing Data with Microsoft Excel
Managing Data with Microsoft Excel
ISBN: 789731002
EAN: N/A
Year: 2003
Pages: 134

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net