13.5.1 Problem
A set of observations is incomplete. You want to find out how incomplete it is.
13.5.2 Solution
Count the number of NULL values in the set.
13.5.3 Discussion
Values can be missing from a set of observations for any number of reasons: A test may not yet have been administered, something may have gone wrong during the test that requires invalidating the observation, and so forth. You can represent such observations in a dataset as NULL values to signify that they're missing or otherwise invalid, then use summary queries to characterize the completeness of the dataset.
If a table t contains values to be summarized along a single dimension, a simple summary will do to characterize the missing values. Suppose t looks like this:
mysql> SELECT subject, score FROM t ORDER BY subject;
+---------+-------+
| subject | score |
+---------+-------+
|       1 |    38 |
|       2 |  NULL |
|       3 |    47 |
|       4 |  NULL |
|       5 |    37 |
|       6 |    45 |
|       7 |    54 |
|       8 |  NULL |
|       9 |    40 |
|      10 |    49 |
+---------+-------+
COUNT(*) counts the total number of rows and COUNT(score) counts only the number of non-missing scores. The difference between the two is the number of missing scores, and that difference in relation to the total provides the percentage of missing scores. These calculations are expressed as follows:
mysql> SELECT COUNT(*) AS 'n (total)',
    -> COUNT(score) AS 'n (non-missing)',
    -> COUNT(*) - COUNT(score) AS 'n (missing)',
    -> ((COUNT(*) - COUNT(score)) * 100) / COUNT(*) AS '% missing'
    -> FROM t;
+-----------+-----------------+-------------+-----------+
| n (total) | n (non-missing) | n (missing) | % missing |
+-----------+-----------------+-------------+-----------+
|        10 |               7 |           3 |     30.00 |
+-----------+-----------------+-------------+-----------+
As an alternative to counting NULL values as the difference between counts, you can count them directly using SUM(ISNULL(score)). The ISNULL( ) function returns 1 if its argument is NULL, zero otherwise:
mysql> SELECT COUNT(*) AS 'n (total)',
    -> COUNT(score) AS 'n (non-missing)',
    -> SUM(ISNULL(score)) AS 'n (missing)',
    -> (SUM(ISNULL(score)) * 100) / COUNT(*) AS '% missing'
    -> FROM t;
+-----------+-----------------+-------------+-----------+
| n (total) | n (non-missing) | n (missing) | % missing |
+-----------+-----------------+-------------+-----------+
|        10 |               7 |           3 |     30.00 |
+-----------+-----------------+-------------+-----------+
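The same summary can be produced from a program. The following is a minimal sketch, not part of the original recipe: it assumes Python with the standard sqlite3 module and an in-memory SQLite table standing in for the MySQL table t, and the function name missing_summary is hypothetical. Because SQLite has no single-argument ISNULL( ) function, the sketch counts missing values with SUM(score IS NULL), an expression that MySQL also accepts.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL table t.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (subject INTEGER PRIMARY KEY, score INTEGER)")

# Same ten observations as in the recipe; None is stored as NULL.
scores = [38, None, 47, None, 37, 45, 54, None, 40, 49]
conn.executemany(
    "INSERT INTO t (subject, score) VALUES (?, ?)",
    list(enumerate(scores, start=1)),
)


def missing_summary(conn):
    """Return (total, non_missing, missing, pct_missing) for t.score."""
    return conn.execute(
        """
        SELECT COUNT(*),
               COUNT(score),
               SUM(score IS NULL),
               SUM(score IS NULL) * 100.0 / COUNT(*)
        FROM t
        """
    ).fetchone()


total, non_missing, missing, pct = missing_summary(conn)
print(total, non_missing, missing, pct)
```

Multiplying by 100.0 rather than 100 forces floating-point division in SQLite, which would otherwise truncate the integer quotient.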
If values are arranged in groups, occurrences of NULL values can be assessed on a per-group basis. Suppose t contains scores for subjects that are distributed among conditions for two factors A and B, each of which has two levels:
mysql> SELECT subject, A, B, score FROM t ORDER BY subject;
+---------+------+------+-------+
| subject | A    | B    | score |
+---------+------+------+-------+
|       1 |    1 |    1 |    18 |
|       2 |    1 |    1 |  NULL |
|       3 |    1 |    1 |    23 |
|       4 |    1 |    1 |    24 |
|       5 |    1 |    2 |    17 |
|       6 |    1 |    2 |    23 |
|       7 |    1 |    2 |    29 |
|       8 |    1 |    2 |    32 |
|       9 |    2 |    1 |    17 |
|      10 |    2 |    1 |  NULL |
|      11 |    2 |    1 |  NULL |
|      12 |    2 |    1 |    25 |
|      13 |    2 |    2 |  NULL |
|      14 |    2 |    2 |    33 |
|      15 |    2 |    2 |    34 |
|      16 |    2 |    2 |    37 |
+---------+------+------+-------+
In this case, the query uses a GROUP BY clause to produce a summary for each combination of conditions:
mysql> SELECT A, B, COUNT(*) AS 'n (total)',
    -> COUNT(score) AS 'n (non-missing)',
    -> COUNT(*) - COUNT(score) AS 'n (missing)',
    -> ((COUNT(*) - COUNT(score)) * 100) / COUNT(*) AS '% missing'
    -> FROM t
    -> GROUP BY A, B;
+------+------+-----------+-----------------+-------------+-----------+
| A    | B    | n (total) | n (non-missing) | n (missing) | % missing |
+------+------+-----------+-----------------+-------------+-----------+
|    1 |    1 |         4 |               3 |           1 |     25.00 |
|    1 |    2 |         4 |               4 |           0 |      0.00 |
|    2 |    1 |         4 |               2 |           2 |     50.00 |
|    2 |    2 |         4 |               3 |           1 |     25.00 |
+------+------+-----------+-----------------+-------------+-----------+
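The per-group summary can be sketched from a program in the same way. As before, this is an illustration under stated assumptions rather than part of the original recipe: an in-memory SQLite table stands in for the MySQL table t, and missing_by_group is a hypothetical helper name. An ORDER BY clause is added so that the row order does not depend on how the database happens to process the grouping.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL table t.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (subject INTEGER PRIMARY KEY, A INTEGER, B INTEGER, score INTEGER)"
)

# Same sixteen observations as in the recipe: (subject, A, B, score).
rows = [
    (1, 1, 1, 18), (2, 1, 1, None), (3, 1, 1, 23), (4, 1, 1, 24),
    (5, 1, 2, 17), (6, 1, 2, 23), (7, 1, 2, 29), (8, 1, 2, 32),
    (9, 2, 1, 17), (10, 2, 1, None), (11, 2, 1, None), (12, 2, 1, 25),
    (13, 2, 2, None), (14, 2, 2, 33), (15, 2, 2, 34), (16, 2, 2, 37),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)


def missing_by_group(conn):
    """Return one (A, B, total, non_missing, missing, pct_missing) row per group."""
    return conn.execute(
        """
        SELECT A, B,
               COUNT(*),
               COUNT(score),
               COUNT(*) - COUNT(score),
               (COUNT(*) - COUNT(score)) * 100.0 / COUNT(*)
        FROM t
        GROUP BY A, B
        ORDER BY A, B
        """
    ).fetchall()


summary = missing_by_group(conn)
for row in summary:
    print(row)
```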