Section 8.4. Calculating a Mode

   

8.4 Calculating a Mode

8.4.1 Problem

You want to calculate a modal average of the bulb-life results in your database. Recall from the discussion earlier in this chapter that the mode represents the most frequently occurring value in a sample.

8.4.2 Solution

SQL Server is not equipped with a mode function, so calculating the mode is a bit more difficult than calculating the mean. As the following solution shows, you can calculate the mode using a creative combination of COUNT and TOP:

 SELECT TOP 1 COUNT(*) frequency, Hours mode  FROM BulbLife WHERE TestPit=1 GROUP BY hours ORDER BY COUNT(*) DESC frequency   mode         ----------- -----------  2           1085 

8.4.3 Discussion

Although it appears strange at first, how the query works becomes clear once you think about the basic definition for mode. A mode is the value that occurs most frequently in an observed sample. You can begin by writing a query to group values together:

 SELECT Hours FROM BulbLife WHERE TestPit=1 GROUP BY hours hours        -----------  1043 1057 1077 1085 1093 1099 1109 1114 1129 

Next, add the COUNT function to the query to include a count of each distinct value with the query's results:

 SELECT COUNT(*) frequency, Hours FROM BulbLife WHERE TestPit=1 GROUP BY Hours frequency   Hours        ----------- -----------  1           1043 1           1057 1           1077 2           1085 1           1093 1           1099 1           1109 1           1114 1           1129 

Finally, use an ORDER BY clause to put the results in descending order by frequency, so that the most frequently occurring value is listed first. Then, use the TOP 1 syntax in your SELECT clause to limit the results to the first row. The hours value in that first row will be the mode.

What happens when you have more than one mode in the observed sample and you need to report all such values? In our hypothetical bulb-life data, the mode for Test Pit #1 is 1085, while the mode for Test Pit #2 is 1043. For both modes, the occurrence count is 2. If you want the mode for all light bulbs , regardless of test pit, then both values should be returned. The following query shows one way to deal with this:

 SELECT COUNT(*) frequency, Hours mode FROM BulbLife GROUP BY Hours HAVING COUNT(*)>= ALL(    SELECT COUNT(*)    FROM BulbLife    GROUP BY Hours) frequency   mode        ----------- -----------  2           1043 2           1085 

The subquery that you see in this example returns a list of all occurrence counts for all distinct values in the BulbLife table. It follows , logically, that one of those counts will represent a maximum. The HAVING clause in the outer query specifies that the occurrence count be greater than or equal to all values returned by the subquery, which, in effect, restricts the results to only those rows with an occurrence count that equals the maximum occurrence count.

Be aware that mode is a weak statistic. The mode can be useful if you know the distribution and the nature of the sample, but it can easily be altered by adding a few cases with extreme values. For example, let's say that we get 2 additional bulbs with a duration of 1129 hours. The mode is then 1129, which is misleading information, since all other bulbs lasted for much shorter periods.

   


Transact-SQL Cookbook
Transact-SQL Cookbook
ISBN: 1565927567
EAN: 2147483647
Year: 2005
Pages: 152

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net