Chapter 6: Missing Values

Definition of Missing Values

missing value

  • is a value that indicates that no data value is stored for the variable in the current observation. There are three kinds of missing values:

    • numeric

    • character

    • special numeric.

    By default, SAS prints a missing numeric value as a single period (.) and a missing character value as a blank space. See 'Special Missing Values' for more information about special numeric missing values.
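The default display rules can be modeled with a few lines of Python. This is a minimal sketch, not SAS code. The class `NumMissing` and the function `display` are hypothetical names invented for this illustration. A numeric missing value is modeled as a tag: `None` stands for the ordinary missing value, and a letter or underscore stands for a special missing value.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class NumMissing:
    """Hypothetical model of a SAS numeric missing value.

    tag=None stands for the ordinary missing value (.);
    "A"-"Z" or "_" stands for a special missing value.
    """
    tag: Optional[str] = None


def display(value: Union[float, str, NumMissing]) -> str:
    """Render a value following the default printing rules described in the text."""
    if isinstance(value, NumMissing):
        # Ordinary numeric missing prints as a period; a special one prints only its tag.
        return "." if value.tag is None else value.tag
    if isinstance(value, str):
        # A missing character value is blank and prints as a blank space.
        return " " if value.strip() == "" else value
    # Print numbers without a trailing ".0" (a presentation choice of this sketch).
    return f"{value:g}"


print([display(v) for v in (115.0, NumMissing(), NumMissing("I"), "", "Smith")])
```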



Special Missing Values

Definition

special missing value

  • is a type of numeric missing value that enables you to represent different categories of missing data by using the letters A-Z or an underscore.

Tips

  • SAS accepts either uppercase or lowercase letters. Values are displayed and printed as uppercase.

  • If you do not begin a special numeric missing value with a period, SAS identifies it as a variable name. Therefore, to use a special numeric missing value in a SAS expression or assignment statement, you must begin the value with a period, followed by the letter or underscore, as in the following example:

    x=.d;
    

  • When SAS prints a special missing value, it prints only the letter or underscore.

  • When data values contain characters in numeric fields that you want SAS to interpret as special missing values, use the MISSING statement to specify those characters. For further information, see the MISSING statement in SAS Language Reference: Dictionary.
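The tips on writing a special missing value in an expression can be captured as a small classification rule. The Python sketch below is only a model of those tips, not a SAS parser. The function name `parse_missing_literal` is hypothetical. It treats a leading period followed by a single letter (either case) or an underscore as a special missing value and stores the tag in uppercase. A bare letter is rejected because SAS would read it as a variable name.

```python
import string
from typing import Optional

# Letters A-Z plus the underscore are the only valid special missing tags.
VALID_TAGS = set(string.ascii_uppercase) | {"_"}


def parse_missing_literal(token: str) -> Optional[str]:
    """Classify a token as written in a SAS expression (a Python model, not SAS).

    Returns "." for the ordinary missing value, the uppercase tag for a special
    missing value such as ".d", or None when the token is not a missing-value
    literal (a bare "d" would be taken as a variable name instead).
    """
    if not token.startswith("."):
        return None
    rest = token[1:].upper()  # lowercase is accepted; the tag is shown in uppercase
    if rest == "":
        return "."
    if len(rest) == 1 and rest in VALID_TAGS:
        return rest
    return None  # for example ".5" is a number, not a missing value


for tok in (".d", ".D", "._", ".", "d", ".5"):
    print(tok, "->", parse_missing_literal(tok))
```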

Example

The following example uses data from a marketing research company. Five testers were hired to test five different products for ease of use and effectiveness. If a tester was absent, there is no rating to report, and the value is recorded with an X for 'absent.' If the tester was unable to test the product adequately, there is no rating, and the value is recorded with an I for 'incomplete test.' The following program reads the data and displays the resulting SAS data set. Note the special missing values in the first and third data lines:

data period_a; 
  missing X I; 
  input Id $4. Foodpr1 Foodpr2 Foodpr3 Coffeem1 Coffeem2; 
  datalines; 
1001 115 45 65 I 78 
1002 86 27 55 72 86 
1004 93 52 X 76 88 
1015 73 35 43 112 108 
1027 101 127 39 76 79 
  ; 

proc print data=period_a; 
  title 'Results of Test Period A'; 
  footnote1 'X indicates TESTER ABSENT'; 
  footnote2 'I indicates TEST WAS INCOMPLETE'; 
run;

The following output is produced:

Output 6.1: Output with Multiple Missing Values

Results of Test Period A 
Obs     Id       Foodpr1    Foodpr2    Foodpr3     Coffeem1     Coffeem2 

1      1001        115         45         65            I           78 
2      1002         86         27         55           72           86 
3      1004         93         52          X           76           88 
4      1015         73         35         43          112          108 
5      1027        101        127         39           76           79 

                       X indicates TESTER ABSENT 
                    I indicates TEST WAS INCOMPLETE
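To see what the MISSING statement does to each field, the data lines from the program above can be read by a short Python model. This is a sketch under stated assumptions, not SAS behavior reproduced exactly:

- The class `Special` and the functions `read_numeric` and `read_period_a` are hypothetical.
- A field whose text matches a declared character becomes a special missing value, and "." becomes an ordinary missing value (modeled as `None`).
- Matching is assumed to be case-insensitive.
- Unlike SAS, which logs invalid data and stores a missing value, this sketch lets `float()` raise an error on any other text.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass(frozen=True)
class Special:
    """Hypothetical stand-in for a SAS special missing value such as .X."""
    tag: str


Numeric = Union[float, Special, None]  # None models the ordinary missing value (.)

DATALINES = """\
1001 115 45 65 I 78
1002 86 27 55 72 86
1004 93 52 X 76 88
1015 73 35 43 112 108
1027 101 127 39 76 79"""

NUMERIC_VARS = ["Foodpr1", "Foodpr2", "Foodpr3", "Coffeem1", "Coffeem2"]


def read_numeric(token: str, declared: Set[str]) -> Numeric:
    """Read one numeric field the way the MISSING statement directs (a sketch)."""
    if token.upper() in declared:  # assumption: case-insensitive match
        return Special(token.upper())
    if token == ".":
        return None
    # Simplification: SAS would log invalid data and store a missing value instead.
    return float(token)


def read_period_a(text: str, missing_chars: str) -> List[Dict[str, object]]:
    """Split each data line into Id (kept as text) and five numeric fields."""
    declared = {c.upper() for c in missing_chars.split()}
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        row: Dict[str, object] = {"Id": tokens[0]}  # Id is kept as text in this sketch
        for name, tok in zip(NUMERIC_VARS, tokens[1:]):
            row[name] = read_numeric(tok, declared)
        rows.append(row)
    return rows


period_a = read_period_a(DATALINES, "X I")  # mirrors: missing X I;
print(period_a[0]["Coffeem1"], period_a[2]["Foodpr3"])
```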
 



Order of Missing Values

Numeric Variables

Within SAS, a missing value for a numeric variable is smaller than all numbers. If you sort your data set by a numeric variable, observations with missing values for that variable appear first in the sorted data set. For numeric variables, you can compare special missing values with numbers and with each other. Table 6.1 shows the sort order of numeric values.

Table 6.1: Numeric Value Sort Order

Sort Order   Symbol   Description
smallest     ._       underscore
             .        period
             .A-.Z    special missing values A (smallest) through Z (largest)
             -n       negative numbers
             0        zero
largest      +n       positive numbers

For example, the numeric missing value (.) is sorted before the special numeric missing value .A, and both are sorted before the special missing value .Z. SAS does not distinguish between lowercase and uppercase letters when sorting special numeric missing values.

Note: The numeric missing value sort order is the same regardless of whether your system uses the ASCII or EBCDIC collating sequence.
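The order in Table 6.1 amounts to a ranking with the real numbers placed after every missing value. The Python sketch below turns the table into a sort key. It is an illustration only. The class `Missing` and the function `sort_key` are hypothetical names, and a missing value is written as its SAS literal ("._", ".", ".a" and so on) purely for readability.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Missing:
    """Hypothetical model of a SAS numeric missing value: "._", ".", or ".A"-".Z"."""
    tag: str = "."

    def __post_init__(self) -> None:
        # Case is not significant, so ".a" is stored as ".A".
        object.__setattr__(self, "tag", self.tag.upper())


def sort_key(value: Union[Missing, float]) -> Tuple[int, float]:
    """Rank values in the order of Table 6.1: ._, ., .A-.Z, then all numbers."""
    if isinstance(value, Missing):
        if value.tag == "._":
            return (0, 0.0)
        if value.tag == ".":
            return (1, 0.0)
        return (2 + ord(value.tag[1]) - ord("A"), 0.0)  # .A -> 2, ..., .Z -> 27
    return (28, float(value))  # every number ranks above every missing value


values = [3, Missing(".Z"), -2, Missing("."), Missing(".a"), 0, Missing("._")]
print(sorted(values, key=sort_key))
```

Because the key is a tuple, the same function also answers the comparisons the text mentions. For example, `sort_key(Missing(".Z")) < sort_key(-5)` is true.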

Character Variables

Missing values of character variables are smaller than any printable character value. Therefore, when you sort a data set by a character variable, observations with missing (blank) values of the BY variable always appear before observations in which values of the BY variable contain only printable characters. However, some usually unprintable characters (for example, machine carriage-control characters and real or binary numeric data that have been read in error as character data) have values less than the blank. Therefore, when your data includes unprintable characters, missing values may not appear first in a sorted data set.
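The effect of unprintable characters can be illustrated by sorting strings by code point. This Python sketch rests on three assumptions:

- ASCII ordering.
- Values padded with trailing blanks to a fixed width, as fixed-length character variables are stored.
- A missing value modeled as an all-blank string.

The function name `sas_char_sort` is hypothetical. A value that begins with a form-feed control character (code 12) sorts ahead of the blank (code 32), so the missing value is no longer first.

```python
from typing import List


def sas_char_sort(values: List[str], width: int) -> List[str]:
    """Pad each value with blanks to a fixed width, then sort by code point.

    A model of the behavior described in the text, assuming ASCII ordering;
    not the SAS SORT procedure itself.
    """
    padded = [v.ljust(width) for v in values]
    return sorted(padded)


# "" models a missing (blank) value; "\x0cJones" starts with a form-feed control character.
result = sas_char_sort(["Smith", "", "Adams", "\x0cJones"], 6)
print([repr(v) for v in result])
```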