Chapter 39: Summarizing Data with Database Statistical Functions


Overview

Joolas is a small makeup company. In a Microsoft Office Excel 2007 worksheet, they track each sales transaction. Often, they want to answer questions such as:

  • How many dollars worth of lip gloss did Jen sell?

  • What was the average number of lipstick units sold each time Jen made a sale in the East region?

  • What was the total dollar amount of all makeup sold by Emilee or in the East region?

  • How many dollars worth of lipstick were sold by Colleen or Zaret in the East region?

  • How many lipstick transactions were not in the East region?

  • How many dollars worth of lipstick did Jen sell during 2004?

  • How many units of makeup were sold for a price of at least $3.20?

  • What is the total dollar amount each salesperson sold of each makeup product?

  • What helpful tricks can I use to set up criteria ranges?

  • I have a database that lists for each sales transaction the revenue, the date sold, and the product ID code. Given the date sold and the ID code for a transaction, is there an easy way to extract the transaction’s revenue?

As you saw in Chapter 38, “Using PivotTables to Describe Data,” Microsoft PivotTables are a great tool for summarizing data. Often, however, a PivotTable gives us much more information than we need. Database statistical functions make it easy to answer any “reporting” question without having to create a PivotTable.

You are already familiar with functions such as SUM, AVERAGE, COUNT, MAX, and MIN. By prefixing a D (which stands for database) to these (and other functions), you create database statistical functions. But what does the DSUM function do, for example, that the SUM function can’t? Whereas the SUM function adds up every cell in a cell range, the DSUM function enables you to specify (by using criteria) a subset of rows to add together in a cell range. For example, suppose we have a sales database for a small makeup company that contains the following information about each sales transaction:

  • Name of salesperson

  • Transaction date

  • Product sold

  • Units sold

  • Dollars of revenue generated per transaction

  • Region of country where the transaction took place

You can find this data in the file Makeupdb.xlsx, which is shown in Figure 39-1.

image from book
Figure 39-1: We’ll use this data to describe how to work with database statistical functions.

Using the DSUM function with appropriate criteria, we could, for example, add up the revenue generated only by transactions involving lip gloss sales in the East during 2004. Essentially, the criteria we set up flags those rows that we want to include in the total sum. Within these rows, the DSUM function acts like the ordinary SUM function.

The syntax of the DSUM function is:

 DSUM(database,field,criteria)

  • Database is the cell range that makes up the database. The first row of the list contains labels for each column.

  • Field is the column containing the values you want the function to add. You can define the field by enclosing the column label in quotation marks. (For example, we would designate the Dollars column by entering "Dollars".) The field can also be specified by using the position of the column in the database, measured from left to right. For example, our database will use columns H through M. (We did not include the Transactions column as part of our database.) We could specify column H as field 1 and column M as field 6.

  • Criteria refers to a cell range that specifies the rows on which the function should operate. The first row of a criteria range must include one or more column labels. (The only exception to this rule is computed criteria, which I’ll discuss in the last two examples in this chapter.) As our examples illustrate, the key to creating a criteria range is to understand that multiple criteria in the same row are joined by AND, whereas criteria in different rows are joined by OR.

Now let’s go on to some examples that illustrate the power and versatility of database statistical functions.

  • How many dollars worth of lip gloss did Jen sell?

  • In this example, we want to apply DSUM to column 5 of the database. Column 5 contains the dollar volume for each transaction. (I gave the name data to our database, which consists of the range H4:M1895.) Our criteria range in O4:P5 flags all rows in the database in which Name equals Jen and Product equals lip gloss. Thus, entering in cell N5 (see Figure 39-2) the formula DSUM(data,5,O4:P5) calculates the total dollar amount of lip gloss sold by Jen. We could have also entered the formula as DSUM (data,"Dollars",O4:P5). We find that Jen sold $5,461.61 worth of lip gloss. In cell N6 we found the same answer by using the SUMIFS function (see Chapter 19, “The SUMIF, AVERAGEIF, SUMIFS, and AVERAGEIFS Functions”) with the formula =SUMIFS(Dollars,Name,"Jen",Product,"lip gloss").

    image from book
    Figure 39-2: Database statistical functions

  • What was the average number of lipstick units sold each time Jen had a sale in the East region?

  • We can compute this number by entering in cell N8 the formula DAVERAGE (data,4,O7:Q8). Using 4 as the value for field specifies the Units column, and the criteria range O7:Q8 flags all rows in the database in which Name equals Jen, Product equals lipstick, and Location equals East. Using DAVERAGE ensures that we average the units sold for the flagged rows. We find that, on average, Jen sold 42.25 units of lipstick in transactions in the East region. In cell N9, we found the same answer by using the formula =AVERAGEIFS(Units,Name,"Jen",Product,"lipstick",Location,"east").

  • What was the total dollar amount of all makeup sold by Emilee or in the East region?

  • In cell N11, we can compute the total dollars ($76,156.48) of sales made by Emilee or in the East by using the formula DSUM(data,5,O10:P12). The criteria in O10:P12 flags sales in the East or by Emilee. This is because criteria in different rows are treated as “OR.” The great programmers at Microsoft have ensured that this formula will not double-count Emilee’s sales in the East. Here, we cannot use SUMIFS to easily find the answer.

  • How many dollars worth of lipstick were sold by Colleen or Zaret in the East region?

  • The formula DSUM(data,5,O13:Q15) in cell N14 computes the total lipstick revenue generated through Colleen and Zaret’s sales ($1,073.20) in the East. Notice that O14:Q14 specifies criteria that selects Colleen’s lipstick sales in the East, and O15:Q15 specifies criteria that selects by Zaret’s lipstick sales in the East. Remember that criteria in different rows are treated as “or.” In cell N15, we also found the answer with the formula =SUMIFS(Dollars,Name,"Colleen",Product,"lipstick",Location,"east")+SUMIFS(Dollars, Name,"Zaret",Product,"lipstick",Location,"east").

  • How many lipstick transactions were not in the East region?

  • In cell N17, we compute the total number of lipstick transactions (164) outside the East region with the formula DCOUNT(data,4,O16:P17). I use DCOUNT in this problem because we want to specify criteria by which the function will count the number of rows involving lipstick sales and regions other than the East. Excel treats the expression <>East in the criteria range as “not East.” For this problem, using SUMIFS would require that we have a SUMIFS function for each region.

  • Because the COUNT function counts numbers, we need to refer to a column containing numerical values. Column 4, the Units column, contains numbers, so I designated that column in the formula. The formula DCOUNT(data,3,O16:P17) would return 0 because there are no numbers in the database’s third column (which is column J in the worksheet). Of course, the formula DCOUNTA(data,3,O16:P17) would return the correct answer because COUNTA counts text as well as numbers.

  • How many dollars worth of lipstick did Jen sell during 2004?

  • The trick here is how to flag only sales that occurred in 2004. By including in one row of our criteria range a reference to the Date field, using the expressions >=1/1/2004 and <1/1/2005, we capture only the 2004 sales. Thus, entering in cell N19 the formula DSUM(data,5,O18:R19) computes the total lipstick sales by Jen ($1,690.79) after January 1, 2004, and before January 1, 2005. In cell N20, we found the answer to this problem with the formula =SUMIFS(Dollars,Date,">=1/1/2004",Date,"<=12/31/ 2004",Product,"lipstick",Name,"Jen").

  • How many units of makeup were sold for a price of at least $3.20?

  • This example involves computed criteria. Basically, computed criteria flags rows of the database on the basis of whether a computed condition is true or false for that row. For this question, we want to flag each row that contains Dollars/Units>=$3.20. When setting up a computed criteria (see Figure 39-3), the label in the first row above the computed criteria must not be a column label. For example, you can’t use Name, Product, or another label from row 4 of this worksheet. The computed criteria is set up to be a formula that returns True based on the first row of information in the database. Thus, to specify rows in which the average price is greater than or equal to $3.20, we need to enter =(L5/K5)>=3.2 in our criteria range below a heading that is not a column label. If the first row of data does not satisfy this condition, you will see FALSE in the worksheet, but Excel will still flag all rows having a unit price that’s greater than or equal to $3.20. Entering in N22 the formula DSUM(data,4,O21:O22) computes the total number of units of makeup sold (1127) in orders for which the unit price was greater than or equal to $3.20. Note that cell O22 contains the formula =(L5/K5)>=3.2.

    image from book
    Figure 39-3: Computed criteria

  • What is the total dollar amount each salesperson sold of each makeup product?

  • For this problem, I’d like to use a DSUM function whose criteria range is based on both the Name and Product columns. Using a data table, I can easily “loop through” all possible combinations of Name and Product in the criteria range and compute the total revenue for each Name and Product combination.

  • I begin by entering any name in cell X26 and any product in cell Y26. Then I enter in cell Q25 the formula DSUM(data,5,X25:Y26), which computes total sales revenue for (in this case) Betsy and eye liner. Next, I enter each salesperson’s name in the cell range Q26:Q33 and each product in the cell range R25:V25. Now select the data table range (Q25:V33). On the Data tab, in the Data Tools group, click What-If Analysis, and then click Data Table. Choose cell X26 as the Column Input Cell and Y26 as the Row Input Cell. We then obtain the results shown in Figure 39-4 on the next page. Each entry in the data table computes the revenue generated for a different name/product combination because the data table causes the names to be placed in cell X26 and the products to be placed in cell Y26. For example, we find that Ashley sold $3,245.44 worth of lipstick.

    image from book
    Figure 39-4: Combining data tables with a DSUM function

  • This example shows how combining data tables with database statistical functions can quickly generate many statistics of interest.

  • We also solved this problem by copying from cell R37 to R37:V44 the formula =SUMIFS(Dollars,Name,$Q37,Product,R$36).

  • What helpful tricks can I use to set up criteria ranges? Here are some examples of little tricks that might help you set up an appropriate criteria range. Suppose the column label in the first row of the criteria range refers to a column containing text (for example, column H).

    • *Allie* will flag records containing the text string Allie in column H.

    • A?X will flag a record if the record’s column H entry begins with A and contains the character X as its third character. (The second character can be anything!)

    • <>*B* will flag a record if column H’s entry does not contain any Bs.

  • If a column (for example, column I) contains numerical values,

    • >100 will flag a record if column I contains a value greater than 100.

    • <>100 will flag a record if column I contains a value not equal to 100.

    • >=1000 will flag a record if column I contains a value greater than or equal to 1000.

  • I have a database that lists for each sales transaction the revenue, the date sold, and the product ID code. Given the date sold and the ID code for a transaction, is there an easy way to capture the transaction’s revenue?

  • The file Dget.xlsx (see Figure 39-5) contains a database that lists revenues, dates, and product ID codes for a series of sales transactions. If you know the date and the product ID code for a transaction, how can you find the transaction’s revenue? With the DGET function, it’s simple. The syntax of the DGET function is DGET(database,field#,criteria). Given a database (a cell range) and a field# in the database (counting columns from left to right across the range) the DGET function returns the entry in column field# from the database record satisfying the criteria. If no record satisfies the criteria, the DGET function returns the #VALUE error message. If more than one record satisfies the criteria, the DGET function returns the #NUM! error message.

    image from book
    Figure 39-5: DGET function

  • Suppose that we want to know the revenue for a transaction involving product ID code 62426 that occurred on 1/9/2006. Assuming that the transaction involving this product on the given date is unique, the formula (entered in cell G9) DGET(B7:28,G5:H6) yields the transaction’s revenue of $992. Note that we used 1 for the field# argument because Revenue is listed in the first column of the database. (Our database is contained in the cell range B7:D28). The criteria range G5:H6 ensures we find a transaction involving product 62426 on 1/9/2006.




Microsoft Press - Microsoft Office Excel 2007. Data Analysis and Business Modeling
MicrosoftВ® Office ExcelВ® 2007: Data Analysis and Business Modeling (Bpg -- Other)
ISBN: 0735623961
EAN: 2147483647
Year: 2007
Pages: 200

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net