Results: TABULATE Procedure


Missing Values

How PROC TABULATE Treats Missing Values

How a missing value for a variable in the input data set affects your output depends on how you use the variable in the PROC TABULATE step. Table 48.7 on page 1236 summarizes how the procedure treats missing values.

Table 48.7: Summary of How PROC TABULATE Treats Missing Values

If

PROC TABULATE, by default,

To override the default

an observation contains a missing value for an analysis variable

excludes that observation from the calculation of statistics (except N and NMISS ) for that particular variable

no alternative

an observation contains a missing value for a class variable

excludes that observation from the table [1]

use MISSING in the PROC TABULATE statement, or MISSING in the CLASS statement

there are no data for a category

does not show the category in the table

use PRINTMISS in the TABLE statement, or use CLASSDATA= in the PROC TABULATE statement

every observation that contributes to a table cell contains a missing value for an analysis variable

displays a missing value for any statistics (except N and NMISS) in that cell

use MISSTEXT= in the TABLE statement

there are no data for a formatted value

does not display that formatted value in the table

use PRELOADFMT in the CLASS statement with PRINTMISS in the TABLE statement, or use CLASSDATA= in the PROC TABULATE statement, or add dummy observations to the input data set so that it contains data for each formatted value

a FREQ variable value is missing or is less than 1

does not use that observation to calculate statistics

no alternative

a WEIGHT variable value is missing or 0

uses a value of 0

no alternative

[1] The CLASS statement applies to all TABLE statements in a PROC TABULATE step. Therefore, if you define a variable as a class variable, PROC TABULATE omits observations that have missing values for that variable even if you do not use the variable in a TABLE statement.

This section presents a series of PROC TABULATE steps that illustrate how PROC TABULATE treats missing values. The following program creates the data set and formats that are used in this section and prints the data set. The data set COMPREV contains no missing values (see Figure 48.6 on page 1237).

 proc format;     value cntryfmt 1='United States'                    2='Japan';     value compfmt  1='Supercomputer'                    2='Mainframe'                    3='Midrange'                    4='Workstation'                    5='Personal Computer'                    6='Laptop';  run;  data comprev;     input Country Computer Rev90 Rev91 Rev92;     datalines;  1 1 788.8 877.6 944.9  1 2 12538.1 9855.6 8527.9  1 3 9815.8 6340.3 8680.3  1 4 3147.2 3474.1 3722.4  1 5 18660.9 18428.0 23531.1  2 1 469.9 495.6 448.4  2 2 5697.6 6242.4 5382.3  2 3 5392.1 5668.3 4845.9  2 4 1511.6 1875.5 1924.5  2 5 4746.0 4600.8 4363.7  ;  proc print data=comprev noobs;     format country cntryfmt. computer compfmt.;     title 'The Data Set COMPREV';  run; 
start figure
 The Data Set COMPREV                   1  Country        Computer             Rev90    Rev91    Rev92  United States  Supercomputer        788.8    877.6    944.9  United States  Mainframe          12538.1   9855.6   8527.9  United States  Midrange            9815.8   6340.3   8680.3  United States  Workstation         3147.2   3474.1   3722.4  United States  Personal Computer  18660.9  18428.0  23531.1  Japan          Supercomputer        469.9    495.6    448.4  Japan          Mainframe           5697.6   6242.4   5382.3  Japan          Midrange            5392.1   5668.3   4845.9  Japan          Workstation         1511.6   1875.5   1924.5  Japan          Personal Computer   4746.0   4600.8   4363.7 
end figure

Figure 48.6: The Data Set COMPREV

No Missing Values

The following PROC TABULATE step produces Figure 48.7 on page 1238:

 proc tabulate data=comprev;     class country computer;     var rev90 rev91 rev92;     table computer*country,rev90 rev91 rev92 /           rts=32;     format country cntryfmt. computer compfmt.;     title 'Revenues from Computer Sales';     title2 'for 1990 to 1992';  run; 
start figure
 Revenues from Computer Sales                   1                         for 1990 to 1992                          --------------------------------------------------------------                               Rev90     Rev91     Rev92                                 ----------+----------+--------                                Sum       Sum       Sum      -----------------------------+----------+----------+--------  Computer      Country                                     --------------+--------------                              Supercomputer United States     788.80    877.60  944.90                --------------+----------+----------+--------                Japan             469.90    495.60  448.40  --------------+--------------+----------+----------+--------  Mainframe     United States   12538.10   9855.60 8527.90                --------------+----------+----------+--------                Japan            5697.60   6242.40 5382.30  --------------+--------------+----------+----------+--------  Midrange      United States    9815.80   6340.30 8680.30                --------------+----------+----------+--------                Japan            5392.10   5668.30 4845.90  --------------+--------------+----------+----------+--------  Workstation   United States    3147.20   3474.10 3722.40                --------------+----------+----------+--------                Japan            1511.60   1875.50 1924.50  --------------+--------------+----------+----------+--------  Personal      United States   18660.90  18428.0023531.10  Computer      --------------+----------+----------+--------                Japan            4746.00   4600.80 4363.70  -------------------------------------------------------------- 
end figure

Figure 48.7: Computer Sales Data: No Missing Values Because the data set contains no missing values, the table includes all observations. All headers and cells contain nonmissing values.

A Missing Class Variable

The next program copies COMPREV and alters the data so that the eighth observation has a missing value for Computer. Except for specifying this new data set, the program that produces Figure 48.8 on page 1239 is the same as the program that produces Figure 48.7 on page 1238. By default, PROC TABULATE ignores observations with missing values for a class variable.

 data compmiss;     set comprev;     if _n_=8 then computer=.;  run;  proc tabulate data=compmiss;     class country computer;     var rev90 rev91 rev92;     table computer*country,rev90 rev91 rev92 /           rts=32;     format country cntryfmt. computer compfmt.;     title 'Revenues from Computer Sales';     title2 'for 1990 to 1992';  run; 
start figure
 Revenues from Computer Sales                 1                          for 1990 to 1992  --------------------------------------------------------------                               Rev90    Rev91     Rev92                                  ---------+----------+---------                                Sum      Sum       Sum       -----------------------------+---------+----------+---------  Computer      Country                                     --------------+--------------                              Supercomputer United States    788.80    877.60   944.90                --------------+---------+----------+---------                Japan            469.90    495.60   448.40  --------------+--------------+---------+----------+---------  Mainframe     United States  12538.10   9855.60  8527.90                --------------+---------+----------+---------                Japan           5697.60   6242.40  5382.30  --------------+--------------+---------+----------+---------  Midrange      United States   9815.80   6340.30  8680.30  --------------+--------------+---------+----------+---------  Workstation   United States   3147.20   3474.10  3722.40                --------------+---------+----------+---------                Japan           1511.60   1875.50  1924.50  --------------+--------------+---------+----------+---------  Personal      United States  18660.90  18428.00 23531.10  Computer      --------------+---------+----------+---------                Japan           4746.00   4600.80  4363.70  -------------------------------------------------------------- 
end figure

Figure 48.8: Computer Sales Data: Midrange, Japan, Deleted

The observation with a missing value for Computer was the category Midrange, Japan . This category no longer exists. By default, PROC TABULATE ignores observations with missing values for a class variable, so this table contains one fewer row than Figure 48.7 on page 1238.

Including Observations with Missing Class Variables

This program adds the MISSING option to the previous program. MISSING is available either in the PROC TABULATE statement or in the CLASS statement. If you want MISSING to apply only to selected class variables, but not to others, then specify MISSING in a separate CLASS statement with the selected variable(s). The MISSING option includes observations with missing values of a class variable in the report (see Figure 48.9 on page 1240).

 proc tabulate data=compmiss missing;     class country computer;     var rev90 rev91 rev92;     table computer*country,rev90 rev91 rev92 /           rts=32;     format country cntryfmt. computer compfmt.;     title 'Revenues from Computer Sales';     title2 'for 1990 to 1992';  run; 

This table includes a category with missing values of Computer. This category makes up the first row of data in the table.

start figure
 1                                                                -------------------------------------------------------------                           Animal                             -----------------------------------------------------------               cat                          dog               -----------------------------+-----------------------------              Food                         Food               -----------------------------+-----------------------------   fish      meat     milk     fish     meat    bones     ---------+---------+---------+---------+---------+---------     N         N        N        N        N        N      ---------+---------+---------+---------+---------+---------          1        1        1        1        1        1  ------------------------------------------------------------- 
end figure

Figure 48.9: Computer Sales Data: Missing Values for Computer

Formatting Headings for Observations with Missing Class Variables

By default, as shown in Figure 48.9 on page 1240, PROC TABULATE displays missing values of a class variable as one of the standard SAS characters for missing values (a period, a blank, an underscore , or one of the letters A through Z). If you want to display something else instead, then you must assign a format to the class variable that has missing values, as shown in the following program (see Figure 48.10 on page 1241):

 proc format;     value misscomp 1='Supercomputer'                    2='Mainframe'                    3='Midrange'                    4='Workstation'                    5='Personal Computer'                    6='Laptop'                    .='No type given';  run;  proc tabulate data=compmiss missing;     class country computer;     var rev90 rev91 rev92;     table computer*country,rev90 rev91 rev92 /           rts=32;     format country cntryfmt. computer misscomp.;     title 'Revenues for Computer Sales';     title2 'for 1990 to 1992';  run; 

In this table, the missing value appears as the text that the MISSCOMP. format specifies.

start figure
 Revenues for Computer Sales                 1                       for 1990 to 1992  ---------------------------------------------------------                               Rev90   Rev91   Rev92                                 --------+--------+--------                                 Sum     Sum     Sum     -----------------------------+--------+--------+--------  Computer      Country                                 --------------+--------------   No type given   Japan          5392.10 5668.30 4845.90  --------------+--------------+--------+--------+--------  Supercomputer United States   788.80  877.60  944.90                --------------+--------+--------+--------                Japan           469.90  495.60  448.40  --------------+--------------+--------+--------+--------  Mainframe     United States 12538.10 9855.60 8527.90                --------------+--------+--------+--------                Japan          5697.60 6242.40 5382.30  --------------+--------------+--------+--------+--------  Midrange      United States  9815.80 6340.30 8680.30  --------------+--------------+--------+--------+--------  Workstation   United States  3147.20 3474.10 3722.40                --------------+--------+--------+--------                Japan          1511.60 1875.50 1924.50  --------------+--------------+--------+--------+--------  Personal      United States 18660.9018428.0023531.10  Computer      --------------+--------+--------+--------                Japan          4746.00 4600.80 4363.70  --------------------------------------------------------- 
end figure

Figure 48.10: Computer Sales Data: Text Supplied for Missing Computer Value

Providing Headings for All Categories

By default, PROC TABULATE evaluates each page that it prints and omits columns and rows for categories that do not exist. For example, Figure 48.10 on page 1241 does not include a row for No type given and for United States or for Midrange and for Japan because there are no data in these categories. If you want the table to represent all possible categories, then use the PRINTMISS option in the TABLE statement, as shown in the following program (see Figure 48.11 on page 1242):

 proc tabulate data=compmiss missing;     class country computer;     var rev90 rev91 rev92;     table computer*country,rev90 rev91 rev92 /           rts=32 printmiss;     format country cntryfmt. computer misscomp.;     title 'Revenues for Computer Sales';     title2 'for 1990 to 1992';  run; 

This table contains a row for the categories No type given , United States and Midrange , Japan . Because there are no data in these categories, the values for the statistics are all missing.

start figure
 Revenues for Computer Sales                  1                       for 1990 to 1992  -----------------------------------------------------------                              Rev90    Rev91    Rev92                                 ---------+---------+---------                                Sum      Sum      Sum      ----------------------------+---------+---------+---------  Computer      Country                                   --------------+-------------   No type given United States        .        .        .   -------------+---------+---------+---------                Japan          5392.10  5668.30  4845.90  --------------+-------------+---------+---------+---------  Supercomputer United States   788.80   877.60   944.90                -------------+---------+---------+---------                Japan           469.90   495.60   448.40  --------------+-------------+---------+---------+---------  Mainframe     United States 125  8.10 98 55.60 8527.90                -------------+---------+---------+---------                Japan          5697.60  6242.40  5382.30  --------------+-------------+---------+---------+---------  Midrange      United States  9815.80  6340.30  8680.30                -------------+---------+---------+---------   Japan                .        .        .   --------------+-------------+---------+---------+---------  Workstation   United States  3147.20  3474.10  3722.40                -------------+---------+---------+---------                Japan          1511.60  1875.50  1924.50  --------------+-------------+---------+---------+---------  Personal      United States 18660.90 18428.00 23531.10  Computer      -------------+---------+---------+---------                Japan          4746.00  4600.80  4363.70  ----------------------------------------------------------- 
end figure

Figure 48.11: Computer Sales Data: Missing Statistics Values

Providing Text for Cells That Contain Missing Values

If some observations in a category contain missing values for analysis variables, then PROC TABULATE does not use those observations to calculate statistics (except N and NMISS). However, if each observation in a category contains a missing value, then PROC TABULATE displays a missing value for the value of the statistic. To replace missing values for analysis variables with text, use the MISSTEXT= option in the TABLE statement to specify the text to use, as shown in the following program (see Figure 48.12 on page 1243).

 proc tabulate data=compmiss missing;     class country computer;     var rev90 rev91 rev92;     table computer*country,rev90 rev91 rev92 /           rts=32 printmiss misstext='NO DATA!';     format country cntryfmt. computer misscomp.;     title 'Revenues for Computer Sales';     title2 'for 1990 to 1992';  run; 

This table replaces the period normally used to display missing values with the text of the MISSTEXT= option.

start figure
 Revenues for Computer Sales        1                                 for 1990 to 1992  ---------------------------------------------------------                               Rev90   Rev91   Rev92                                 --------+--------+--------                                 Sum     Sum     Sum     -----------------------------+--------+------------+----  Computer      Country                                 --------------+--------------                          No type given United States   NO DATA!NO DATA!NO DATA!   --------------+--------+--------+--------                Japan          5392.10 5668.30 4845.90  --------------+--------------+--------+--------+--------  Supercomputer United States   788.80  877.60  944.90                --------------+--------+--------+--------                Japan           469.90  495.60  448.40  --------------+--------------+--------+--------+--------  Mainframe     United States 12538.10 9855.60 8527.90                --------------+--------+--------+--------                Japan          5697.60 6242.40 5382.30  --------------+--------------+--------+--------+--------  Midrange      United States  9815.80 6340.30 8680.30                --------------+--------+--------+--------                Japan   NO DATA!NO DATA!NO DATA!   --------------+--------------+--------+--------+--------  Workstation   United States  3147.20 3474.10 3722.40                --------------+--------+--------+--------                Japan          1511.60 1875.50 1924.50  --------------+--------------+--------+--------+--------  Personal      United States 18660.9018428.0023531.10  Computer      --------------+--------+--------+--------                Japan          4746.00 4600.80 4363.70  --------------------------------------------------------- 
end figure

Figure 48.12: Computer Sales Data: Text Supplied for Missing Statistics Values

Providing Headings for All Values of a Format

PROC TABULATE prints headings only for values that appear in the input data set. For example, the format COMPFMT. provides for six possible values of Computer. Only five of these values occur in the data set COMPREV. The data set contains no data for laptop computers.

If you want to include headings for all possible values of Computer (perhaps to make it easier to compare the output with tables that are created later when you do have data for laptops), then you have three different ways to create such a table:

  • Use the PRELOADFMT option in the CLASS statement with the PRINTMISS option in the TABLE statement. See Example 3 on page 1251 for another example that uses PRELOADFMT.

  • Use the CLASSDATA= option in the PROC TABULATE statement. See Example 2 on page 1249 for an example that uses the CLASSDATA= option.

  • Add dummy values to the input data set so that each value that the format handles appears at least once in the data set.

The following program adds the PRELOADFMT option to a CLASS statement that contains the relevant variable.

The results are shown in Figure 48.13 on page 1244.

 proc tabulate data=compmiss missing;     class country;     class computer / preloadfmt;     var rev90 rev91 rev92;     table computer*country,rev90 rev91 rev92 /           rts=32 printmiss misstext='NO DATA!';     format country cntryfmt. computer compfmt.;     title 'Revenues for Computer Sales';     title2 'for 1990 to 1992';  run; 

This table contains a heading for each possible value of Computer.

start figure
 Revenues for Computer Sales                1                       for 1990 to 1992  --------------------------------------------------------                              Rev90   Rev91   Rev92                                --------+--------+--------                                Sum     Sum     Sum     ----------------------------+--------+--------+--------  Computer      Country                                --------------+-------------                          .             United StatesNO DATA!NO DATA!NO DATA!                -------------+--------+--------+--------                Japan         5392.10 5668.30 4845.90  --------------+-------------+--------+--------+--------  Supercomputer United States  788.80  877.60  944.90                -------------+--------+--------+--------                Japan          469.90  495.60  448.40  --------------+-------------+--------+--------+--------  Mainframe     United States12538.10 9855.60 8527.90                -------------+--------+--------+--------                Japan         5697.60 6242.40 5382.30  --------------+-------------+--------+--------+--------  Midrange      United States 9815.80 6340.30 8680.30                -------------+--------+------------+----                Japan        NO DATA!NO DATA!NO DATA!  --------------+-------------+--------+------------+----  Workstation   United States 3147.20 3474.10 3722.40                -------------+--------+------------+----                Japan         1511.60 1875.50 1924.50  --------------+-------------+--------+--------+--------  Personal      United States18660.9018428.0023531.10  Computer      -------------+--------+--------+--------                Japan         4746.00 4600.80 4363.70  --------------+-------------+--------+--------+--------  Laptop        United StatesNO DATA!NO DATA!NO DATA!                -------------+--------+--------+--------                Japan        NO DATA!NO DATA!NO DATA!  -------------------------------------------------------- 
end figure

Figure 48.13: Computer Sales Data: All Possible Computer Values Included

Understanding the Order of Headings with ORDER=DATA

The ORDER= option applies to all class variables. Occasionally, you want to order the headings for different variables differently. One method for doing this is to group the data as you want them to appear and to specify ORDER=DATA.

For this technique to work, the first value of the first class variable must occur in the data with all possible values of all the other class variables. If this criterion is not met, then the order of the headings might surprise you.

The following program creates a simple data set in which the observations are ordered first by the values of Animal, then by the values of Food. The ORDER= option in the PROC TABULATE statement orders the heading for the class variables by the order of their appearance in the data set (see Figure 48.14 on page 1245). Although bones is the first value for Food in the group of observations where Animal= dog , all other values for Food appear before bones in the data set because bones never appears when Animal= cat . Therefore, the header for bones in the table in Figure 48.14 on page 1245 is not in alphabetical order.

In other words, PROC TABULATE maintains for subsequent categories the order that was established by earlier categories. If you want to re-establish the order of Food for each value of Animal, then use BY-group processing. PROC TABULATE creates a separate table for each BY group, so that the ordering can differ from one BY group to the next.

 data foodpref;     input Animal $ Food $;     datalines;  cat fish  cat meat  cat milk  dog bones  dog fish  dog meat  ;  proc tabulate data=foodpref format=9.                order=data;     class animal food;     table animal*food;  run; 
start figure
 1  ------------------------------------------------------------                           Animal                             -----------------------------------------------------------              cat                          dog                -----------------------------+-----------------------------              Food                         Food               -----------------------------+-----------------------------    fish     meat     milk     fish     meat     bones    ---------+---------+---------+---------+---------+---------      N        N        N        N        N        N      ---------+---------+---------+---------+---------+---------          1        1        1        1        1        1  ------------------------------------------------------------ 
end figure

Figure 48.14: Ordering the Headings of Class Variables

Portability of ODS Output with PROC TABULATE

Under certain circumstances, using PROC TABULATE with the Output Delivery System produces files that are not portable. If the SAS system option FORMCHAR= in your SAS session uses nonstandard line-drawing characters, then the output might include strange characters instead of lines in operating environments in which the SAS

Monospace font is not installed. To avoid this problem, specify the following OPTIONS statement before executing PROC TABULATE:

 options formchar="----+---+=-/\<>*"; 



Base SAS 9.1.3 Procedures Guide (Vol. 1)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
EAN: 2147483647
Year: 2004
Pages: 260

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net