Examples: TABULATE Procedure


Example 1: Creating a Basic Two-Dimensional Table

Procedure features:

PROC TABULATE statement options:

  • FORMAT=

TABLE statement

  • crossing (*) operator

TABLE statement options:

  • RTS=

Other features: FORMAT statement

This example

  • creates a category for each type of user (residential or business) in each division of each region

  • applies the same format to all cells in the table

  • applies a format to each class variable

  • extends the space for row headings.

Program

Create the ENERGY data set. ENERGY contains data on expenditures of energy for business and residential customers in individual states in the Northeast and West regions of the United States. A DATA step on page 1401 creates the data set.

 data energy;     length State ;     input Region Division state $ Type Expenditures;     datalines;  1 1 ME 1 708  1 1 ME 2 379  ...  more data lines  ...  4 4 HI 1 273  4 4 HI 2 298  ; 

Create the REGFMT., DIVFMT., and USETYPE. formats. PROC FORMAT creates formats for Region, Division, and Type.

 proc format;     value regfmt 1='Northeast'                  2='South'                  3='Midwest'                  4='West';     value divfmt 1='New England'                  2='Middle Atlantic'                  3='Mountain'                  4='Pacific';     value usetype 1='Residential Customers'                   2='Business Customers';  run; 

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell .

 proc tabulate data=energy format=dollar12.; 

Specify subgroups for the analysis. The CLASS statement separates the analysis by values of Region, Division, and Type.

 class region division type; 

Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.

 var expenditures; 

Define the table rows and columns . The TABLE statement creates a row for each formatted value of Region. Nested within each row are rows for each formatted value of Division. The TABLE statement also creates a column for each formatted value of Type. Each cell that is created by these rows and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell.

 table region*division,        type*expenditures 

Specify the row title space. RTS= provides 25 characters per line for row headings.

 / rts=25; 

Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.

 format region regfmt. division divfmt. type usetype.; 

Specify the titles.

 title 'Energy Expenditures for Each Region';     title2 '(millions of dollars)';  run; 

Output

 Energy Expenditures for Each Region               1                  (millions of dollars)  ---------------------------------------------------                                  Type                                    -------------------------                         Residential    Business                           Customers    Customers                          ------------+------------                         ExpendituresExpenditures                         ------------+------------                             Sum         Sum       -----------------------+------------+------------  Region     Division                             -----------+-----------                          Northeast  New England      ,477      ,129             -----------+------------+------------             Middle                                          Atlantic        ,379     ,078  -----------+-----------+------------+------------  West       Mountain         ,476      ,729             -----------+------------+------------             Pacific         ,959     ,619  -------------------------------------------------- 

Example 2: Specifying Class Variable Combinations to Appear in a Table

Procedure features:

PROC TABULATE Statement options:

  • CLASSDATA=

  • EXCLUSIVE

Data set: ENERGY ENERGY on page 1401

Formats: REGFMT., DIVFMT., and USETYPE. on page 1247

This example

  • uses the CLASSDATA= option to specify combinations of class variables to appear in a table

  • uses the EXCLUSIVE option to restrict the output to only the combinations specified in the CLASSDATA= data set. Without the EXCLUSIVE option, the output would be the same as in Example 1 on page 1246.

Program

Create the CLASSES data set. CLASSES contains the combinations of class variable values that PROC TABULATE uses to create the table.

 data classes;     input region division type;     datalines;  1 1 1  1 1 2  4 4 1  4 4 2  ; 

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the table options. CLASSDATA= and EXCLUSIVE restrict the class level combinations to those that are specified in the CLASSES data set.

 proc tabulate data=energy format=dollar12.                classdata=classes exclusive; 

Specify subgroups for the analysis. The CLASS statement separates the analysis by values of Region, Division, and Type.

 class region division type; 

Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.

 var expenditures; 

Define the table rows and columns. The TABLE statement creates a row for each formatted value of Region. Nested within each row are rows for each formatted value of Division. The TABLE statement also creates a column for each formatted value of Type. Each cell that is created by these rows and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell.

 table region*division,        type*expenditures 

Specify the row title space. RTS= provides 25 characters per line for row headings.

 / rts=25; 

Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.

 format region regfmt. division divfmt. type usetype.; 

Specify the titles.

 title 'Energy Expenditures for Each Region';     title2 '(millions of dollars)';  run; 

Output

 Energy Expenditures for Each Region                1                 (millions of dollars)  ---------------------------------------------------                                  Type                                    -------------------------                         Residential   Business                            Customers   Customers                           ------------+------------                         ExpendituresExpenditures                         ------------+------------                             Sum          Sum      -----------------------+------------+------------  Region     Division                             -----------+-----------                          Northeast  New England      ,477      ,129  -----------+-----------+------------+------------  West       Pacific         ,959     ,619  --------------------------------------------------- 

Example 3: Using Preloaded Formats with Class Variables

Procedure features:

PROC TABULATE statement option:

  • OUT=

CLASS statement options:

  • EXCLUSIVE

  • PRELOADFMT

TABLE statement option:

  • PRINTMISS

Other features: PRINT procedure

Data set: ENERGY ENERGY on page 1401

Formats: REGFMT., DIVFMT., and USETYPE. on page 1247

This example

  • creates a table that includes all possible combinations of formatted class variable values (PRELOADFMT with PRINTMISS), even if those combinations have a zero frequency and even if they do not make sense

  • uses only the preloaded range of user-defined formats as the levels of class variables (PRELOADFMT with EXCLUSIVE).

  • writes the output to an output data set, and prints that data set.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell.

 proc tabulate data=energy format=dollar12.; 

Specify subgroups for the analysis. The CLASS statement separates the analysis by values of Region, Division, and Type. PRELOADFMT specifies that PROC TABULATE use the preloaded values of the user-defined formats for the class variables.

 class region division type / preloadfmt; 

Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.

 var expenditures; 

Define the table rows and columns, and specify row and column options. PRINTMISS specifies that all possible combinations of user-defined formats be used as the levels of the class variables.

 table region*division,        type*expenditures / rts=25 printmiss; 

Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.

 format region regfmt. division divfmt. type usetype.; 

Specify the titles.

 title 'Energy Expenditures for Each Region';     title2 '(millions of dollars)';  run; 

Specify the table options and the output data set. The OUT= option specifies the name of the output data set to which PROC TABULATE writes the data.

 proc tabulate data=energy format=dollar12. out=tabdata; 

Specify subgroups for the analysis. The EXCLUSIVE option, when used with PRELOADFMT, uses only the preloaded range of user-defined formats as the levels of class variables.

 class region division type / preloadfmt exclusive; 

Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.

 var expenditures; 

Define the table rows and columns, and specify row and column options. The PRINTMISS option is not specified in this case. If it were, then it would override the EXCLUSIVE option in the CLASS statement.

 table region*division,        type*expenditures / rts=25; 

Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.

 format region regfmt. division divfmt. type usetype.; 

Specify the titles.

 title 'Energy Expenditures for Each Region';     title2 '(millions of dollars)';  run; 

Print the output data set WORK.TABDATA.

 proc print data=tabdata;  run; 

Output

This output, created with the PRELOADFMT and PRINTMISS options, contains all possible combinations of preloaded user-defined formats for the class variable values. It includes combinations with zero frequencies, and combinations that make no sense, such as Northeast and Pacific .

 Energy Expenditures for Each Region              1                   (millions of dollars)  ---------------------------------------------------                                  Type                                    -------------------------                         Residential  Business                             Customers  Customers                            ------------+------------                         ExpendituresExpenditures                         ------------+------------                             Sum         Sum       -----------------------+------------+------------  Region     Division                             -----------+-----------                          Northeast  New England      ,477      ,129             -----------+------------+------------             Middle                                          Atlantic        ,379     ,078             -----------+------------+------------             Mountain              .           .             -----------+------------+------------             Pacific               .           .  -----------+-----------+------------+------------  South      New England           .           .             -----------+------------+------------             Middle                                          Atlantic              .           .             -----------+------------+------------             Mountain              .           .             -----------+------------+------------             Pacific               .           .  -----------+-----------+------------+------------  Midwest    New England           .           .             -----------+------------+------------             Middle                                          Atlantic              .           .             -----------+------------+------------             Mountain              .           .             -----------+------------+------------             Pacific               .           .  -----------+-----------+------------+------------  West       New England           .           .             -----------+------------+------------             Middle                                          Atlantic              .           .             -----------+------------+------------             Mountain         ,476      ,729             -----------+------------+------------             Pacific         ,959     ,619  ------------------------------------------------- 

This output, created with the PRELOADFMT and EXCLUSIVE options, contains only those combinations of preloaded user-defined formats for the class variable values that appear in the input data set. This output is identical to the output from Example 1 on page 1246.

 Energy Expenditures for Each Region                    1                 (millions of dollars)  ---------------------------------------------------                                  Type                                    -------------------------                         Residential   Business                            Customers   Customers                           ------------+------------                         ExpendituresExpenditures                         ------------+------------                              Sum         Sum      -----------------------+------------+------------  Region     Division                             -----------+-----------                          Northeast  New England      ,477      ,129             -----------+------------+------------             Middle                                          Atlantic        ,379     ,078  -----------+-----------+------------+------------  West       Mountain         ,476      ,729             -----------+------------+------------             Pacific         ,959     ,619  --------------------------------------------------- 

This output is a listing of the output data set TABDATA, which was created by the OUT= option in the PROC TABULATE statement. TABDATA contains the data that is created by having the PRELOADFMT and EXCLUSIVE options specified.

 Energy Expenditures for Each Region                             (millions of dollars)                                                                     E                                                                     x                                                                     p                                                                     e                                                                     n                                                                     d                                                                     i                                                                     t                D                                                    u                i                                               _    r     R          v                                        __     T    e     e          i                                        T   P  A    s     g          s                          T             Y   A  B    _  O  i          i                          y             P   G  L    S  b  o          o                          p             E   E  E    u  s  n          n                          e             _   __      m  1  Northeast  New England      Residential Customers  111  1  1   7477  2  Northeast  New England      Business Customers     111  1  1   5129  3  Northeast  Middle Atlantic  Residential Customers  111  1  1  19379  4  Northeast  Middle Atlantic  Business Customers     111  1  1  15078  5  West       Mountain         Residential Customers  111  1  1   5476  6  West       Mountain         Business Customers     111  1  1   4729  7  West       Pacific          Residential Customers  111  1  1  13959  8  West       Pacific          Business Customers     111  1  1  12619 

Example 4: Using Multilabel Formats

Procedure features:

CLASS statement options:

  • MLF

PROC TABULATE statement options:

  • FORMAT=

TABLE statement

  • ALL class variable

  • concatenation (blank) operator

  • crossing (*) operator

  • grouping elements (parentheses) operator

  • label

  • variable list

Other features:

FORMAT procedure

FORMAT statement

VALUE statement options:

  • MULTILABEL

This example

  • shows how to specify a multilabel format in the VALUE statement of PROC FORMAT

  • shows how to activate multilabel format processing using the MLF option with the CLASS statement

  • demonstrates the behavior of the N statistic when multilabel format processing is activated.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=64; 

Create the CARSURVEY data set. CARSURVEY contains data from a survey that was distributed by a car manufacturer to a focus group of potential customers who were brought together to evaluate new car names. Each observation in the data set contains an identification number, the participant s age, and the participant s ratings of four car names . A DATA step creates the data set.

 data carsurvey;     input Rater Age Progressa Remark Jupiter Dynamo;     datalines;  1   38  94  98  84  80  2   49  96  84  80  77  3   16  64  78  76  73  4   27  89  73  90  92  ...  more data lines  ...  77  61  92  88  77  85  78  24  87  88  88  91  79  18  54  50  62  74  80  62  90  91  90  86  ; 

Create the AGEFMT. format. The FORMAT procedure creates a multilabel format for ages by using the MULTILABEL option on page 449. A multilabel format is one in which multiple labels can be assigned to the same value, in this case because of overlapping ranges. Each value is represented in the table for each range in which it occurs. The NOTSORTED option stores the ranges in the order in which they are defined.

 proc format;     value agefmt (multilabel notsorted)           15 - 29 = 'Below 30 years'           30 - 50 = 'Between 30 and 50'         51 - high = 'Over 50 years'           15 - 19 = '15 to 19'           20 - 25 = '20 to 25'           25 - 39 = '25 to 39'           40 - 55 = '40 to 55'         56 - high = '56 and above';  run; 

Specify the table options. The FORMAT= option specifies up to 10 digits as the default format for the value in each table cell.

 proc tabulate data=carsurvey format=10.; 

Specify subgroups for the analysis. The CLASS statement identifies Age as the class variable and uses the MLF option to activate multilabel format processing.

 class age / mlf; 

Specify the analysis variables. The VAR statement specifies that PROC TABULATE calculate statistics on the Progressa, Remark, Jupiter, and Dynamo variables.

 var progressa remark jupiter dynamo; 

Define the table rows and columns. The row dimension of the TABLE statement creates a row for each formatted value of Age. Multilabel formatting allows an observation to be included in multiple rows or age categories. The row dimension uses the ALL class variable to summarize information for all rows. The column dimension uses the N statistic to calculate the number of observations for each age group. Notice that the result of the N statistic crossed with the ALL class variable in the row dimension is the total number of observations instead of the sum of the N statistics for the rows. The column dimension uses the ALL class variable at the beginning of a crossing to assign a label, Potential Car Names . The four nested columns calculate the mean ratings of the car names for each age group.

 table age all, n all='Potential Car Names'*(progressa remark     jupiter dynamo)*mean; 

Specify the titles.

 title1 "Rating Four Potential Car Names";  title2 "Rating Scale 0-100 (100 is the highest rating)"; 

Format the output. The FORMAT statement assigns the user-defined format AGEFMT. to Age for this analysis.

 format age agefmt.;  run; 

Output

Output 48.4
start example
 Rating Four Potential Car Names                       1                 Rating Scale 0-100 (100 is the highest rating)  ---------------------------------------------------------------------------                                         Potential Car Names                                          -------------------------------------------                              Progressa   Remark   Jupiter    Dynamo                                ----------+----------+----------+----------                        N        Mean      Mean      Mean      Mean     ------------------+----------+----------+----------+----------+----------  Age                                                                   ------------------                                                    15 to 19                  14        75        78        81        73  ------------------+----------+----------+----------+----------+----------  20 to 25                  11        89        88        84        89  ------------------+----------+----------+----------+----------+----------  25 to 39                  26        84        90        82        72  ------------------+----------+----------+----------+----------+----------  40 to 55                  14        85        87        80        68  ------------------+----------+----------+----------+----------+----------  56 and above              15        84        82        81        75  ------------------+----------+----------+----------+----------+----------  Below 30 years            36        82        84        82        75  ------------------+----------+----------+----------+----------+----------  Between 30 and 50         25        86        89        81        73  ------------------+----------+----------+----------+----------+----------  Over 50 years             19        82        84        80        76  ------------------+----------+----------+----------+----------+----------  All                       80        83        86        81        74  --------------------------------------------------------------------------- 
end example
 

Example 5: Customizing Row and Column Headings

Procedure features:

  • TABLE statement

    • labels

Data set: ENERGY ENERGY on page 1401

Formats: REGFMT., DIVFMT., and USETYPE. on page 1247

This example shows how to customize row and column headings. A label specifies text for a heading. A blank label creates a blank heading. PROC TABULATE removes the space for blank column headings from the table.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell.

 proc tabulate data=energy format=dollar12.; 

Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and Type as class variables.

 class region division type; 

Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.

 var expenditures; 

Define the table rows and columns. The TABLE statement creates a row for each formatted value of Region. Nested within each row are rows for each formatted value of Division. The TABLE statement also creates a column for each formatted value of Type. Each cell that is created by these rows and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell. Text in quotation marks specifies headings for the corresponding variable or statistic. Although Sum is the default statistic, it is specified here so that you can specify a blank for its heading.

 table region*division,       type='Customer Base'*expenditures=' '*sum=' ' 

Specify the row title space. RTS= provides 25 characters per line for row headings.

 / rts=25; 

Format the output. The FORMAT statement assigns formats to Region, Division, and Type.

 format region regfmt. division divfmt. type usetype.; 

Specify the titles.

 title 'Energy Expenditures for Each Region';     title2 '(millions of dollars)';  run; 

Output

The heading for Type contains text that is specified in the TABLE statement. The TABLE statement eliminated the headings for Expenditures and Sum.

 Energy Expenditures for Each Region                  1                 (millions of dollars)  ---------------------------------------------------                              Customer Base                               -------------------------                         Residential   Business                            Customers   Customers    -----------------------+------------+------------  Region     Division                             -----------+-----------                          Northeast  New England      ,477      ,129             -----------+------------+------------             Middle                                          Atlantic        ,379     ,078  -----------+-----------+------------+------------  West       Mountain         ,476      ,729             -----------+------------+------------             Pacific         ,959     ,619  --------------------------------------------------- 

Example 6: Summarizing Information with the Universal Class Variable ALL

Procedure features:

  • PROC TABULATE statement options:

    • FORMAT=

  • TABLE statement:

    • ALL class variable

    • concatenation (blank operator)

    • format modifiers

    • grouping elements (parentheses operator)

Data set: ENERGY ENERGY on page 1401

Formats: REGFMT., DIVFMT., and USETYPE. on page 1247

This example shows how to use the universal class variable ALL to summarize information from multiple categories.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=64 pagesize=60; 

Specify the table options. The FORMAT= option specifies COMMA12. as the default format for the value in each table cell.

 proc tabulate data=energy format=comma12.; 

Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and Type as class variables.

 class region division type; 

Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.

 var expenditures; 

Define the table rows. The row dimension of the TABLE statement creates a row for each formatted value of Region. Nested within each row are rows for each formatted value of Division and a row (labeled Subtotal ) that summarizes all divisions in the region. The last row of the report (labeled Total for All Regions ) summarizes all regions. The format modifier f=DOLLAR12. assigns the DOLLAR12. format to the cells in this row.

 table region*(division all='Subtotal')           all='Total for All Regions'*f=dollar12., 

Define the table columns. The column dimension of the TABLE statement creates a column for each formatted value of Type and a column that is labeled All customers that shows expenditures for all customers in a row of the table. Each cell that is created by these rows and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell. Text in quotation marks specifies headings for the corresponding variable or statistic. Although Sum is the default statistic, it is specified here so that you can specify a blank for its heading.

 type='Customer Base'*expenditures=' '*sum=' '  all='All Customers'*expenditures=' '*sum=' ' 

Specify the row title space. RTS= provides 25 characters per line for row headings.

 / rts=25; 

Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.

 format region regfmt. division divfmt. type usetype.; 

Specify the titles.

 title 'Energy Expenditures for Each Region';     title2 '(millions of dollars)';  run; 

Output

The universal class variable ALL provides subtotals and totals in this table.

 Energy Expenditures for Each Region              1                       (millions of dollars)  ----------------------------------------------------------------                              Customer Base                                           -------------------------                                     Residential   Business      All                               Customers   Customers   Customers    -----------------------+------------+------------+------------  Region     Division                                         -----------+-----------                                      Northeast  New England       7,477       5,129      12,606             -----------+------------+------------+------------             Middle                                                      Atlantic         19,379      15,078      34,457             -----------+------------+------------+------------             Subtotal         26,856      20,207      47,063  -----------+-----------+------------+------------+------------  West       Division                                                    -----------                                                 Mountain          5,476       4,729      10,205             -----------+------------+------------+------------             Pacific          13,959      12,619      26,578             -----------+------------+------------+------------             Subtotal         19,435      17,348      36,783  -----------------------+------------+------------+------------  Total for All Regions       ,291     ,555     ,846  ---------------------------------------------------------------- 

Example 7: Eliminating Row Headings

Procedure features:

  • TABLE statement:

    • labels

    • ROW=FLOAT

Data set: ENERGY ENERGY on page 1401

Formats: REGFMT., DIVFMT., and USETYPE. on page 1247

This example shows how to eliminate blank row headings from a table. To do so, you must both provide blank labels for the row headings and specify ROW=FLOAT in the TABLE statement.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell.

 proc tabulate data=energy format=dollar12.; 

Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and Type as class variables.

 class region division type; 

Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.

 var expenditures; 

Define the table rows. The row dimension of the TABLE statement creates a row for each formatted value of Region. Nested within these rows is a row for each formatted value of Division. The analysis variable Expenditures and the Sum statistic are also included in the row dimension, so PROC TABULATE creates row headings for them as well. The text in quotation marks specifies the headings for the corresponding variable or statistic. Although Sum is the default statistic, it is specified here so that you can specify a blank for its heading.

 table region*division*expenditures=' '*sum=' ', 

Define the table columns. The column dimension of the TABLE statement creates a column for each formatted value of Type.

 type='Customer Base' 

Specify the row title space and eliminate blank row headings. RTS= provides 25 characters per line for row headings. ROW=FLOAT eliminates blank row headings.

 / rts=25 row=float; 

Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.

 format region regfmt. division divfmt. type usetype.; 

Specify the titles.

 title 'Energy Expenditures for Each Region';     title2 '(millions of dollars)';  run; 

Output

Compare this table with the output in Example 5 on page 1258. The two tables are identical, but the program that creates this table uses Expenditures and Sum in the row dimension. PROC TABULATE automatically eliminates blank headings from the column dimension, whereas you must specify ROW=FLOAT to eliminate blank headings from the row dimension.

 Energy Expenditures for Each Region                  1                 (millions of dollars)  ---------------------------------------------------                              Customer Base                               -------------------------                         Residential   Business                            Customers   Customers    -----------------------+------------+------------  Region     Division                             -----------+-----------                          Northeast  New England      ,477      ,129             -----------+------------+------------             Middle                                          Atlantic        ,379     ,078  -----------+-----------+------------+------------  West       Mountain         ,476      ,729             -----------+------------+------------             Pacific         ,959     ,619  --------------------------------------------------- 

Example 8: Indenting Row Headings and Eliminating Horizontal Separators

Procedure features:

  • PROC TABULATE statement options:

    • NOSEPS

  • TABLE statement options:

    • INDENT=

Data set: ENERGY ENERGY on page 1401

Formats: REGFMT., DIVFMT., and USETYPE. on page 1247

This example shows how to condense the structure of a table by

  • removing row headings for class variables

  • indenting nested rows underneath parent rows instead of placing them next to each other

  • eliminating horizontal separator lines from the row titles and the body of the table.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell. NOSEPS eliminates horizontal separator lines from row titles and from the body of the table.

 proc tabulate data=energy format=dollar12. noseps; 

Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and Type as class variables.

 class region division type; 

Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.

 var expenditures; 

Define the table rows and columns. The TABLE statement creates a row for each formatted value of Region. Nested within each row are rows for each formatted value of Division. The TABLE statement also creates a column for each formatted value of Type. Each cell that is created by these rows and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell. Text in quotation marks in all dimensions specifies headings for the corresponding variable or statistic. Although Sum is the default statistic, it is specified here so that you can specify a blank for its heading.

 table region*division,       type='Customer Base'*expenditures=' '*sum=' ' 

Specify the row title space and indention value. RTS= provides 25 characters per line for row headings. INDENT= removes row headings for class variables, places values for Division beneath values for Region rather than beside them, and indents values for Division four spaces.

 / rts=25 indent=4; 

Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.

 format region regfmt. division divfmt. type usetype.; 

Specify the titles.

 title 'Energy Expenditures for Each Region';     title2 '(millions of dollars)';  run; 

Output

NOSEPS removes the separator lines from the row titles and the body of the table. INDENT= eliminates the row headings for Region and Division and indents values for Division underneath values for Region.

 Energy Expenditures for Each Region                   1                 (millions of dollars)  ---------------------------------------------------                              Customer Base                               -------------------------                         Residential   Business                            Customers   Customers    -----------------------+------------+------------  Northeast                                            New England              ,477      ,129      Middle Atlantic         ,379     ,078  West                                                 Mountain                 ,476      ,729      Pacific                 ,959     ,619  --------------------------------------------------- 

Example 9: Creating Multipage Tables

Procedure features:

  • TABLE statement

    • ALL class variable

    • BOX=

    • CONDENSE

    • INDENT=

    • page expression

Data set: ENERGY ENERGY on page 1401

Formats: REGFMT., DIVFMT., and USETYPE. on page 1247

This example creates a separate table for each region and one table for all regions. By default, PROC TABULATE creates each table on a separate page, but the CONDENSE option places them all on the same page.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format for the value in each table cell.

 proc tabulate data=energy format=dollar12.; 

Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and Type as class variables.

 class region division type; 

Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Expenditures variable.

 var expenditures; 

Define the table pages. The page dimension of the TABLE statement creates one table for each formatted value of Region and one table for all regions. Text in quotation marks provides the heading for each page.

 table region='Region: ' all='All Regions', 

Define the table rows. The row dimension creates a row for each formatted value of Division and a row for all divisions. Text in quotation marks provides the row headings.

 division all='All Divisions', 

Define the table columns. The column dimension of the TABLE statement creates a column for each formatted value of Type. Each cell that is created by these pages, rows, and columns contains the sum of the analysis variable Expenditures for all observations that contribute to that cell. Text in quotation marks specifies headings for the corresponding variable or statistic. Although Sum is the default statistic, it is specified here so that you can specify a blank for its heading.

 type='Customer Base'*expenditures=' '*sum=' ' 

Specify additional table options. RTS= provides 25 characters per line for row headings. BOX= places the page heading inside the box above the row headings. CONDENSE places as many tables as possible on one physical page. INDENT= eliminates the row heading for Division. (Because there is no nesting in the row dimension, there is nothing to indent.)

 / rts=25 box=_page_ condense indent=1; 

Format the output. The FORMAT statement assigns formats to the variables Region, Division, and Type.

 format region regfmt. division divfmt. type usetype.; 

Specify the titles.

 title 'Energy Expenditures for Each Region and All Regions';     title2 '(millions of dollars)';  run; 

Output

 Energy Expenditures for Each Region and All Regions           1                 (millions of dollars)  --------------------------------------------------- Region: Northeast            Customer Base                               -------------------------                         Residential   Business                            Customers   Customers    -----------------------+------------+------------  New England                  ,477      ,129  -----------------------+------------+------------  Middle Atlantic             ,379     ,078  -----------------------+------------+------------  All Divisions               ,856    ,2 07  --------------------------------------------------- --------------------------------------------------- Region: West                 Customer Base                               -------------------------                         Residential   Business                            Customers   Customers    -----------------------+------------+------------  Mountain                     ,476      ,729  -----------------------+------------+------------  Pacific                     ,959     ,619  -----------------------+------------+------------  All Divisions               ,435     ,348  --------------------------------------------------- --------------------------------------------------- All Regions                  Customer Base                               -------------------------                         Residential   Business                            Customers   Customers    -----------------------+------------+------------  New England                  ,477      ,129  -----------------------+------------+------------  Middle Atlantic             ,379     ,078  -----------------------+------------+------------  Mountain                     ,476      ,729  -----------------------+------------+------------  Pacific                     ,959     ,619  -----------------------+------------+------------  All Divisions               ,291     ,555  --------------------------------------------------- 

Example 10: Reporting on Multiple-Response Survey Data

Procedure features:

  • TABLE statement:

    • denominator definition (angle bracket operators)

    • N statistic

    • PCTN statistic

    • variable list

Other features:

  • FORMAT procedure

  • SAS system options:

    • FORMDLIM=

    • NONUMBER

  • SYMPUT routine

The two tables in this example show

  • which factors most influenced customers decisions to buy products

  • where customers heard of the company.

The reports appear on one physical page with only one page number. By default, they would appear on separate pages.

In addition to showing how to create these tables, this example shows how to

  • use a DATA step to count the number of observations in a data set

  • store that value in a macro variable

  • access that value later in the SAS session.

Collecting the Data

Figure 48.15 on page 1270 shows the survey form that is used to collect data.

click to expand
Figure 48.15: Completed Survey Form

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page. The FORMDLIM= option replaces the character that delimits page breaks with a single blank. By default, a new physical page starts whenever a page break occurs.

 options nodate pageno=1 linesize=80 pagesize=18 formdlim=' '; 

Create the CUSTOMER_RESPONSE data set. CUSTOMER_RESPONSE contains data from a customer survey. Each observation in the data set contains information about factors that influence one respondent s decisions to buy products. A DATA step on page 1394 creates the data set. Using missing values rather than 0 s is crucial for calculating frequency counts in PROC TABULATE.

 data customer_response;     input Customer Factor1-Factor4 Source1-Source3           Quality1-Quality3;     datalines;  1 . . 1 1 1 1 . 1 . .  2 1 1 . 1 1 1 . 1 1 .  3 . . 1 1 1 1 . . . .  . . .  more data lines  . . .  119 . . . 1 . . . 1 . .  120 1 1 . 1 . . . . 1 .  ; 

Store the number of observations in a macro variable. The SET statement reads the descriptor portion of CUSTOMER_RESPONSE at compile time and stores the number of observations (the number of respondents) in COUNT. The SYMPUT routine stores the value of COUNT in the macro variable NUM. This variable is available for use by other procedures and DATA steps for the remainder of the SAS session. The IF 0 condition, which is always false, ensures that the SET statement, which reads the observations, never executes. (Reading observations is unnecessary.) The STOP statement ensures that the DATA step executes only once.

 data _null_;     if 0 then set customer_response nobs=count;     call symput('num',left(put(count,4.)));     stop;  run; 

Create the PCTFMT. format. The FORMAT procedure creates a format for percentages. The PCTFMT. format writes all values with at least one digit to the left of the decimal point and with one digit to the right of the decimal point. A blank and a percent sign follow the digits.

 proc format;     picture pctfmt low-high='009.9 %';  run; 

Create the report and use the default table options.

 proc tabulate data=customer_response; 

Specify the analysis variables. The VAR statement specifies that PROC TABULATE calculate statistics on the Factor1, Factor2, Factor3, Factor4, and Customer variables. The variable Customer must be listed because it is used to calculate the Percent column that is defined in the TABLE statement.

 var factor1-factor4 customer; 

Define the table rows and columns. The TABLE statement creates a row for each factor, a column for frequency counts, and a column for the percentages. Text in quotation marks supplies headers for the corresponding row or column. The format modifiers F=7. and F=PCTFMT9. provide formats for values in the associated cells and extend the column widths to accommodate the column headers.

 table factor1='Cost'        factor2='Performance'        factor3='Reliability'        factor4='Sales Staff',        (n='Count'*f=7. pctn<customer>='Percent'*f=pctfmt9.) ; 

Specify the titles.

 title 'Customer Survey Results: Spring 1996';     title3 'Factors Influencing the Decision to Buy';  run; 

Suppress page numbers . The SAS system option NONUMBER suppresses page numbers for subsequent pages.

 options nonumber; 

Create the report and use the default table options.

 proc tabulate data=customer_response; 

Specify the analysis variables. The VAR statement specifies that PROC TABULATE calculate statistics on the Source1, Source2, Source3, and Customer variables. The variable Customer must be in the variable list because it appears in the denominator definition.

 var source1-source3 customer; 

Define the table rows and columns. The TABLE statement creates a row for each source of the company name, a column for frequency counts, and a column for the percentages. Text in quotation marks supplies a heading for the corresponding row or column.

 table source1='TV/Radio'        source2='Newspaper'        source3='Word of Mouth',        (n='Count'*f=7. pctn<customer>='Percent'*f=pctfmt9.) ; 

Specify the title and footnote. The macro variable NUM resolves to the number of respondents. The FOOTNOTE statement uses double rather than single quotation marks so that the macro variable will resolve.

 title 'Source of Company Name';     footnote "Number of Respondents: &num";  run; 

Reset the SAS system options. The FORMDLIM= option resets the page delimiter to a page eject. The NUMBER option resumes the display of page numbers on subsequent pages.

 options formdlim='' number; 

Output

 Customer Survey Results: Spring 1996                1  Factors Influencing the Decision to Buy   --------------------------------------                     Count  Percent    ------------------+-------+---------   Cost                   87   72.5 %   ------------------+-------+---------   Performance            62   51.6 %   ------------------+-------+---------   Reliability            30   25.0 %   ------------------+-------+---------   Sales Staff           120  100.0 %   --------------------------------------          Source of Company Name   --------------------------------------                     Count  Percent    ------------------+-------+---------   TV/Radio               92   76.6 %   ------------------+-------+---------   Newspaper              69   57.5 %   ------------------+-------+---------   Word of Mouth          26   21.6 %   --------------------------------------        Number of Respondents: 120 

Example 11: Reporting on Multiple-Choice Survey Data

Procedure features:

  • TABLE statement:

    • N statistic

Other features:

  • FORMAT procedure

  • TRANSPOSE procedure

  • Data set options:

    • RENAME=

This report of listener preferences shows how many listeners select each type of programming during each of seven time periods on a typical weekday. The data was collected by a survey, and the results were stored in a SAS data set. Although this data set contains all the information needed for this report, the information is not arranged in a way that PROC TABULATE can use.

To make this crosstabulation of time of day and choice of radio programming, you must have a data set that contains a variable for time of day and a variable for programming preference. PROC TRANSPOSE reshapes the data into a new data set that contains these variables. Once the data are in the appropriate form, PROC TABULATE creates the report.

Collecting the Data

Figure 48.16 on page 1275 shows the survey form that is used to collect data.

click to expand
Figure 48.16: Completed Survey Form

An external file on page 1419 contains the raw data for the survey. Several lines from that file appear here.

 967 32 f 5 3 5  7 5 5 5 7 0 0 0 8 7 0 0 8 0  781 30 f 2 3 5  5 0 0 0 5 0 0 0 4 7 5 0 0 0 
 859 39 f 1 0 5  1 0 0 0 1 0 0 0 0 0 0 0 0 0  . . .  more data lines  . . .  859 32 m .25 .25 1  1 0 0 0 0 0 0 0 1 0 0 0 0 0 

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=132 pagesize=40; 

Create the RADIO data set and specify the input file. RADIO contains data from a survey of 336 listeners. The data set contains information about listeners and their preferences in radio programming. The INFILE statement specifies the external file that contains the data. MISSOVER prevents the input pointer from going to the next record if it fails to find values in the current line for all variables that are listed in the INPUT statement.

 data radio;     infile '  input-file  ' missover; 

Read the appropriate data line, assign a unique number to each respondent, and write an observation to RADIO. Each raw-data record contains two lines of information about each listener. The INPUT statement reads only the information that this example needs. The / line control skips the first line of information in each record. The rest of the INPUT statement reads Time1-Time7 from the beginning of the second line. These variables represent the listener s radio programming preference for each of seven time periods on weekdays (see Figure 48.16 on page 1275). The listener=_N_ statement assigns a unique identifier to each listener. An observation is automatically written to RADIO at the end of each iteration.

 input /(Time1-Time7) (. +1);     listener=_n_;  run; 

Create the $TIMEFMT. and $PGMFMT. formats. PROC FORMAT creates formats for the time of day and the choice of programming.

 proc format;     value $timefmt 'Time1='6-9 a.m.'                    'Time2'='9 a.m. to noon'                    'Time3'='noon to 1 p.m.'                    'Time4'='1-4 p.m.'                    'Time5'='4-6 p.m.'                    'Time6'='6-10 p.m.'                    'Time7'='10 p.m. to 2 a.m.'                      other='*** Data Entry Error ***';     value $pgmfmt      '0'="Don't Listen"                    '1','2'='Rock and Top 40'                        '3'='Country'                '4','5','6'='Jazz, Classical, and Easy Listening'                        '7'='News/ Information /Talk'                        '8'='Other'                      other='*** Data Entry Error ***';  run; 

Reshape the data by transposing the RADIO data set. PROC TRANSPOSE creates RADIO_TRANSPOSED. This data set contains the variable Listener from the original data set. It also contains two transposed variables: Timespan and Choice. Timespan contains the names of the variables (Time1-Time7) from the input data set that are transposed to form observations in the output data set. Choice contains the values of these variables. (See A Closer Look on page 1278 for a complete explanation of the PROC TRANSPOSE step.)

 proc transpose data=radio                 out=radio_transposed(rename=(col1=Choice))                 name=Timespan;     by listener;     var time1-time7; 

Format the transposed variables. The FORMAT statement permanently associates these formats with the variables in the output data set.

 format timespan $timefmt. choice $pgmfmt.;  run; 

Create the report and specify the table options. The FORMAT= option specifies the default format for the values in each table cell.

 proc tabulate data=radio_transposed format=12.; 

Specify subgroups for the analysis. The CLASS statement identifies Timespan and Choice as class variables.

 class timespan choice; 

Define the table rows and columns. The TABLE statement creates a row for each formatted value of Timespan and a column for each formatted value of Choice. In each column are values for the N statistic. Text in quotation marks supplies headings for the corresponding rows or columns.

 table timespan='Time of Day',        choice='Choice of Radio Program'*n='Number of Listeners'; 

Specify the title.

 title 'Listening Preferences on Weekdays';  run; 

Output

 Listening Preferences on Weekdays                                               1  ---------------------------------------------------------------------------------------------------------------                                                           Choice of Radio Program                                                            -----------------------------------------------------------------------------                                                                        Jazz,                                                                                                  Classical,    News/                                                             Rock and Top              and Easy  Information                                              Don't Listen     40       Country    Listening     /Talk       Other                                     ------------+------------+------------+------------+------------+------------                                  Number of   Number of   Number of   Number of   Number of   Number of                                    Listeners   Listeners   Listeners   Listeners   Listeners   Listeners    -------------------------------+------------+------------+------------+------------+------------+------------  Time of Day                                                                                              -------------------------------                                                                          6-9 a.m.                                 34         143           7          39          96          17  -------------------------------+------------+------------+------------+------------+------------+------------  9 a.m. to noon                          214          59           5          51           3           4  -------------------------------+------------+------------+------------+------------+------------+------------  noon to 1 p.m.                          238          55           3          27           9           4  -------------------------------+------------+------------+------------+------------+------------+------------  1-4 p.m.                                216          60           5          50           2           3  -------------------------------+------------+------------+------------+------------+------------+------------  4-6 p.m.                                 56         130           6          57          69          18  -------------------------------+------------+------------+------------+------------+------------+------------  6-10 p.m.                               202          54           9          44          20           7  -------------------------------+------------+------------+------------+------------+------------+------------  10 p.m. to 2 a.m.                       264          29           3          36           2           2  --------------------------------------------------------------------------------------------------------------- 

A Closer Look

Reshape the data

The original input data set has all the information that you need to make the crosstabular report, but PROC TABULATE cannot use the information in that form. PROC TRANSPOSE rearranges the data so that each observation in the new data set contains the variable Listener, a variable for time of day, and a variable for programming preference. Figure 48.17 on page 1279 illustrates the transposition. PROC TABULATE uses this new data set to create the crosstabular report.

click to expand
Figure 48.17: Transposing Two Observations

PROC TRANSPOSE restructures data so that values that were stored in one observation are written to one variable. You can specify which variables you want to transpose. This section illustrates how PROC TRANSPOSE reshapes the data. The following section explains the PROC TRANSPOSE step in this example.

When you transpose with BY processing, as this example does, you create from each BY group one observation for each variable that you transpose. In this example, Listener is the BY variable. Each observation in the input data set is a BY group because the value of Listener is unique for each observation.

This example transposes seven variables, Time1 through Time7. Therefore, the output data set has seven observations from each BY group (each observation) in the input data set.

Understanding the PROC TRANSPOSE Step

Here is a detailed explanation of the PROC TRANSPOSE step that reshapes the data:

 proc transpose data=radio   [1]                 out=radio_transposed(rename=(col1=Choice))   [2]                  name=Timespan;   [3]     by listener;   [4]     var time1-time7;   [5]     format timespan $timefmt. choice $pgmfmt.;   [6]  run; 
[1]  

The DATA= option specifies the input data set.

[2]  

The OUT= option specifies the output data set. The RENAME= data set option renames the transposed variable from COL1 (the default name) to Choice.

[3]  

The NAME= option specifies the name for the variable in the output data set that contains the name of the variable that is being transposed to create the current observation. By default, the name of this variable is _NAME_.

[4]  

The BY statement identifies Listener as the BY variable.

[5]  

The VAR statement identifies Time1 through Time7 as the variables to transpose.

[6]  

The FORMAT statement assigns formats to Timespan and Choice. The PROC TABULATE step that creates the report does not need to format Timespan and Choice because the formats are stored with these variables.

Example 12: Calculating Various Percentage Statistics

Procedure features:

  • PROC TABULATE statement options:

    • FORMAT=

  • TABLE statement:

      • ALL class variable

      • COLPCTSUM statistic

      • concatenation (blank) operator

      • crossing (*) operator

      • format modifiers

      • grouping elements (parentheses) operator

      • labels

      • REPPCTSUM statistic

      • ROWPCTSUM statistic

      • variable list

  • TABLE statement options:

      • ROW=FLOAT

      • RTS=

Other features: FORMAT procedure

This example shows how to use three percentage sum statistics: COLPCTSUM, REPPCTSUM, and ROWPCTSUM.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=105 pagesize=60; 

Create the FUNDRAIS data set. FUNDRAIS contains data on student sales during a school fund-raiser. A DATA step creates the data set.

 data fundrais;     length name $ 8 classrm $ 1;     input @1 team $ @8 classrm $ @10 name $           @19 pencils @23 tablets;     sales=pencils + tablets;     datalines;  BLUE   A ANN       4   8  RED    A MARY      5  10  GREEN  A JOHN      6   4  RED    A BOB       2   3  BLUE   B FRED      6   8  GREEN  B LOUISE   12   2  BLUE   B ANNETTE   .   9  RED    B HENRY     8  10  GREEN  A ANDREW    3   5  RED    A SAMUEL   12  10  BLUE   A LINDA     7  12  GREEN  A SARA      4   .  BLUE   B MARTIN    9  13  RED    B MATTHEW   7   6  GREEN  B BETH     15  10  RED    B LAURA     4   3  ; 

Create the PCTFMT. format. The FORMAT procedure creates a format for percentages. The PCTFMT. format writes all values with at least one digit, a blank, and a percent sign.

 proc format;     picture pctfmt low-high='009 %';  run; 

Specify the title.

 title "Fundraiser Sales"; 

Create the report and specify the table options. The FORMAT= option specifies up to seven digits as the default format for the value in each table cell.

 proc tabulate format=7.; 

Specify subgroups for the analysis. The CLASS statement identifies Team and Classrm as class variables.

 class team classrm; 

Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate statistics on the Sales variable.

 var sales; 

Define the table rows. The row dimension of the TABLE statement creates a row for each formatted value of Team. The last row of the report summarizes sales for all teams .

 table (team all), 

Define the table columns. The column dimension of the TABLE statement creates a column for each formatted value of Classrm. Crossed within each value of Classrm is the analysis variable ( sales ) with a blank label. Nested within each column are columns that summarize sales for the class.

  • The first nested column, labeled sum , is the sum of sales for the row for the classroom.

  • The second nested column, labeled ColPctSum , is the percentage of the sum of sales for the row for the classroom in relation to the sum of sales for all teams in the classroom.

  • The third nested column, labeled RowPctSum , is the percentage of the sum of sales for the row for the classroom in relation to the sum of sales for the row for all classrooms.

  • The fourth nested column, labeled RepPctSum , is the percentage of the sum of sales for the row for the classroom in relation to the sum of sales for all teams for all classrooms.

The last column of the report summarizes sales for the row for all classrooms.

 classrm='Classroom'*sales=' '*(sum  colpctsum*f=pctfmt9.  rowpctsum*f=pctfmt9.  reppctsum*f=pctfmt9.)  all*sales*sum=' ' 

Specify the row title space and eliminate blank row headings. RTS= provides 20 characters per line for row headings.

 /rts=20;  run; 

Output

 Fundraiser Sales                                            1  --------------------------------------------------------------------------------------------------------                                                    Classroom                                                            ---------------------------------------------------------------------------                                             A                                    B                    All                      -------------------------------------+-------------------------------------+-------                      Sum  ColPctSumRowPctSumRepPctSum  Sum  ColPctSumRowPctSumRepPctSum  Sum    ------------------+-------+---------+---------+---------+-------+---------+---------+---------+-------  team                                                                                           ------------------                                                                             BLUE                   31     34 %     46 %     15 %     36     31 %     53 %     17 %     67  ------------------+-------+---------+---------+---------+-------+---------+---------+---------+-------  GREEN                  18     19 %     31 %      8 %     39     34 %     68 %     19 %     57  ------------------+-------+---------+---------+---------+-------+---------+---------+---------+-------  RED                    42     46 %     52 %     20 %     38     33 %     47 %     18 %     80  ------------------+-------+---------+---------+---------+-------+---------+---------+---------+-------  All                    91    100 %     44 %     44 %    113    100 %     55 %     55 %    204  -------------------------------------------------------------------------------------------------------- 

A Closer Look

Here are the percentage sum statistic calculations used to produce the output for the Blue Team in Classroom A:

  • COLPCTSUM=31/91*100=34%

  • ROWPCTSUM=31/67*100=46%

  • REPPCTSUM=31/204*100=15%

Similar calculations were used to produce the output for the remaining teams and classrooms.

Example 13: Using Denominator Definitions to Display Basic Frequency Counts and Percentages

Procedure features:

  • TABLE statement:

    • ALL class variable

    • denominator definitions (angle bracket operators)

    • N statistic

    • PCTN statistic

Other features:

  • FORMAT procedure

Crosstabulation tables (also called contingency tables and stub-and-banner reports ) show combined frequency distributions for two or more variables. This table shows frequency counts for females and males within each of four job classes. The table also shows the percentage that each frequency count represents of

  • the total women and men in that job class (row percentage)

  • the total for that gender in all job classes (column percentage)

  • the total for all employees .

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Create the JOBCLASS data set. JOBCLASS contains encoded information about the gender and job class of employees at a fictitious company.

 data jobclass;     input Gender Occupation @@;     datalines;  1 1  1 1  1 1  1 1  1 1  1 1  1 1  1 2  1 2  1 2  1 2  1 2  1 2  1 2  1 3  1 3  1 3  1 3  1 3  1 3  1 3  1 1  1 1  1 1  1 2  1 2  1 2  1 2  1 2  1 2  1 3  1 3  1 4  1 4  1 4  1 4  1 4  1 4  1 1  1 1  1 1  1 1  1 1  1 2  1 2  1 2  1 2  1 2  1 2  1 2  1 3  1 3  1 3  1 3  1 4  1 4  1 4  1 4  1 4  1 1  1 3  2 1  2 1  2 1  2 1  2 1  2 1  2 1  2 2  2 2  2 2  2 2  2 2  2 3  2 3  2 3  2 4  2 4  2 4  2 4  2 4  2 4  2 1  2 3  2 3  2 3  2 3  2 3  2 4  2 4  2 4  2 4  2 4  2 1  2 1  2 1  2 1  2 1  2 2  2 2  2 2  2 2  2 2  2 2  2 2  2 3  2 3  2 4  2 4  2 4  2 1  2 1  2 1  2 1  2 1  2 2  2 2  2 2  2 3  2 3  2 3  2 3  2 4  ; 

Create the GENDFMT. and OCCUPFMT. formats. PROC FORMAT creates formats for the variables Gender and Occupation.

 proc format;     value gendfmt 1='Female'                   2='Male'               other='*** Data Entry Error ***';     value occupfmt 1='Technical'                    2='Manager/Supervisor'                    3='Clerical'                    4='Administrative'                other='*** Data Entry Error ***';  run; 

Create the report and specify the table options. The FORMAT= option specifies the 8.2 format as the default format for the value in each table cell.

 proc tabulate data=jobclass format=8.2; 

Specify subgroups for the analysis. The CLASS statement identifies Gender and Occupation as class variables.

 class gender occupation; 

Define the table rows. The TABLE statement creates a set of rows for each formatted value of Occupation and for all jobs together. Text in quotation marks supplies a header for the corresponding row.

The asterisk in the row dimension indicates that the statistics that follow in parentheses are nested within the values of Occupation and All to form sets of rows. Each set of rows includes four statistics:

  • N, the frequency count. The format modifier (F=9.) writes the values of N without the decimal places that the default format would use. It also extends the column width to nine characters so that the word Employees fits on one line.

  • the percentage of the row total (row percent).

  • the percentage of the column total (column percent).

  • the overall percent. Text in quotation marks supplies the heading for the corresponding row. A comma separates the row definition from the column definition.

For detailed explanations of the structure of this table and of the use of denominator definitions, see A Closer Look on page 1286.

 table (occupation='Job Class' all='All Jobs')            *(n='Number of employees'*f=9.            pctn<gender all>='Percent of row total'            pctn<occupation all>='Percent of column total'            pctn='Percent of total'), 

Define the table columns and specify the amount of space for row headings. The column dimension creates a column for each formatted value of Gender and for all employees. Text in quotation marks supplies the heading for the corresponding column. The RTS= option provides 50 characters per line for row headings.

 gender='Gender' all='All Employees'/ rts=50; 

Format the output. The FORMAT statement assigns formats to the variables Gender and Occupation.

 format gender gendfmt. occupation occupfmt.; 

Specify the titles.

 title 'Gender Distribution';     title2 'within Job Classes';  run; 

Output

 Gender Distribution                              1                                 within Job Classes  --------------------------------------------------------------------------------                                                       Gender                                                                  -------------------   All                                                      Female    Male   Employees  ------------------------------------------------+---------+---------+---------  Job Class                                                                   -----------------------+------------------------                             Technical              Number of employees            16       18       34                         ------------------------+---------+---------+---------                         Percent of row total        47.06    52.94   100.00                         ------------------------+---------+---------+---------                         Percent of column total     26.23    29.03    27.64                         ------------------------+---------+---------+---------                         Percent of total            13.01    14.63    27.64  -----------------------+------------------------+---------+---------+---------  Manager/Supervisor     Number of employees            20       15       35                         ------------------------+---------+---------+---------                         Percent of row total        57.14    42.86   100.00                         ------------------------+---------+---------+---------                         Percent of column total     32.79    24.19    28.46                         ------------------------+---------+---------+---------                         Percent of total            16.26    12.20    28.46  -----------------------+------------------------+---------+---------+---------  Clerical               Number of employees            14       14       28                         ------------------------+---------+---------+---------                         Percent of row total        50.00    50.00   100.00                         ------------------------+---------+---------+---------                         Percent of column total     22.95    22.58    22.76                         ------------------------+---------+---------+---------                         Percent of total            11.38    11.38    22.76  -----------------------+------------------------+---------+---------+---------  Administrative         Number of employees            11       15       26                         ------------------------+---------+---------+---------                         Percent of row total        42.31    57.69   100.00                         ------------------------+---------+---------+---------                         Percent of column total     18.03    24.19    21.14                         ------------------------+---------+---------+---------                         Percent of total             8.94    12.20    21.14  -----------------------+------------------------+---------+---------+---------  All Jobs               Number of employees            61       62      123                         ------------------------+---------+---------+---------                         Percent of row total        49.59    50.41   100.00                         ------------------------+---------+---------+---------                         Percent of column total    100.00   100.00   100.00                         ------------------------+---------+---------+---------                         Percent of total            49.59    50.41   100.00  -------------------------------------------------------------------------------- 

A Closer Look

The part of the TABLE statement that defines the rows of the table uses the PCTN statistic to calculate three different percentages.

In all calculations of PCTN, the numerator is N, the frequency count for one cell of the table. The denominator for each occurrence of PCTN is determined by the denominator definition . The denominator definition appears in angle brackets after the keyword PCTN. It is a list of one or more expressions. The list tells PROC TABULATE which frequency counts to sum for the denominator.

Analyzing the Structure of the Table

Taking a close look at the structure of the table helps you understand how PROC TABULATE uses the denominator definitions. The following simplified version of the TABLE statement clarifies the basic structure of the table:

 table occupation='Job Class' all='All Jobs',        gender='Gender' all='All Employees'; 

The table is a concatenation of four subtables. In this report, each subtable is a crossing of one class variable in the row dimension and one class variable in the column dimension. Each crossing establishes one or more categories. A category is a combination of unique values of class variables, such as female, technical or all, clerical . Table 48.8 on page 1287 describes each subtable.

Table 48.8: Contents of Subtables

Class variables contributing to the subtable

Description of frequency counts

Number of categories

Occupation and Gender

number of females in each job or number of males in each job

8

All and Gender

number of females or number of males

2

Occupation and All

number of people in each job

4

All and All

number of people in all jobs

1

Figure 48.18 on page 1288 highlights these subtables and the frequency counts for each category.

click to expand
Figure 48.18: Illustration of the Four Subtables

Interpreting Denominator Definitions

The following fragment of the TABLE statement defines the denominator definitions for this report. The PCTN keyword and the denominator definitions are highlighted.

 table (occupation='Job Class' all='All Jobs')             *(n='Number of employees'*f=5.   pctn<gender all>   ='Row percent'   pctn<occupation all>   ='Column percent'   pctn   ='Percent of total'), 

Each use of PCTN nests a row of statistics within each value of Occupation and All. Each denominator definition tells PROC TABULATE which frequency counts to sum for the denominators in that row. This section explains how PROC TABULATE interprets these denominator definitions.

Row Percentages

The part of the TABLE statement that calculates the row percentages and that labels the row is

 pctn<gender all>='Row percent' 

Consider how PROC TABULATE interprets this denominator definition for each subtable.

start figure
 Gender Distribution                                within Job Classes                                --------------------------------------------------------------------------------                                                       Gender                                                                 -------------------   All                                                     Female    Male   Employees ------------------------------------------------+---------+---------+--------- Job Class                                                                  -----------------------+------------------------                            Technical              Number of employees            16       18       34                        ------------------------+---------+---------+---------                        Percent of row total        47.06    52.94   100.00                        ------------------------+---------+---------+---------                        Percent of column total     26.23    29.03    27.64                        ------------------------+---------+---------+---------                        Percent of total            13.01    14.63    27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor     Number of employees            20       15       35                        ------------------------+---------+---------+---------                        Percent of row total        57.14    42.86   100.00                        ------------------------+---------+---------+---------                        Percent of column total     32.79    24.19    28.46                        ------------------------+---------+---------+---------                        Percent of total            16.26    12.20    28.46 -----------------------+------------------------+---------+---------+--------- Clerical               Number of employees            14       14       28                        ------------------------+---------+---------+---------                        Percent of row total        50.00    50.50   100.00                        ------------------------+---------+---------+---------                        Percent of column total     22.95    22.58    22.76                        ------------------------+---------+---------+---------                        Percent of total            11.38    11.38    22.76 -----------------------+------------------------+---------+---------+--------- Administrative         Number of employees            11       15       26                        ------------------------+---------+---------+---------                        Percent of row total        42.31    57.69   100.00                        ------------------------+---------+---------+---------                        Percent of column total     18.03    24.19    21.14                        ------------------------+---------+---------+---------                        Percent of total             8.94    12.20    21.14 -----------------------+------------------------+---------+---------+--------- All Jobs               Number of employees            61       62      123                        ------------------------+---------+---------+---------                        Percent of row total        49.59    50.41   100.00                        ------------------------+---------+---------+---------                        Percent of column total    100.00   100.00   100.00                        ------------------------+---------+---------+---------                        Percent of total            49.59    50.41   100.00 -------------------------------------------------------------------------------- 
end figure

Subtable 1: Occupation and Gender

PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Gender within the same value of Occupation.

For example, the denominator for the category female, technical is the sum of all frequency counts for all categories in this subtable for which the value of Occupation is technical . There are two such categories: female, technical and male, technical . The corresponding frequency counts are 16 and 18. Therefore, the denominator for this category is 16+18, or 34.

start figure
 Gender Distribution                                within Job Classes                                --------------------------------------------------------------------------------                                                       Gender                                                                 -------------------   All                                                     Female    Male   Employees ------------------------------------------------+---------+---------+--------- Job Class                                                                  -----------------------+------------------------                            Technical              Number of employees            16       18       34                        ------------------------+---------+---------+---------                        Percent of row total        47.06    52.94   100.00                        ------------------------+---------+---------+---------                        Percent of column total     26.23    29.03    27.64                        ------------------------+---------+---------+---------                        Percent of total            13.01    14.63    27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor     Number of employees            20       15       35                        ------------------------+---------+---------+---------                        Percent of row total        57.14    42.86   100.00                        ------------------------+---------+---------+---------                        Percent of column total     32.79    24.19    28.46                        ------------------------+---------+---------+---------                        Percent of total            16.26    12.20    28.46 -----------------------+------------------------+---------+---------+--------- Clerical               Number of employees            14       14       28                        ------------------------+---------+---------+---------                        Percent of row total        50.00    50.50   100.00                        ------------------------+---------+---------+---------                        Percent of column total     22.95    22.58    22.76                        ------------------------+---------+---------+---------                        Percent of total            11.38    11.38    22.76 -----------------------+------------------------+---------+---------+--------- Administrative         Number of employees            11       15       26                        ------------------------+---------+---------+---------                        Percent of row total        42.31    57.69   100.00                        ------------------------+---------+---------+---------                        Percent of column total     18.03    24.19    21.14                        ------------------------+---------+---------+---------                        Percent of total             8.94    12.20    21.14 -----------------------+------------------------+---------+---------+--------- All Jobs               Number of employees            61       62      123                        ------------------------+---------+---------+---------                        Percent of row total        49.59    50.41   100.00                        ------------------------+---------+---------+---------                        Percent of column total    100.00   100.00   100.00                        ------------------------+---------+---------+---------                        Percent of total            49.59    50.41   100.00 -------------------------------------------------------------------------------- 
end figure

Subtable 2: All and Gender

PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Gender in the subtable.

For example, the denominator for the category all, female is the sum of the frequency counts for all, female and all, male . The corresponding frequency counts are 61 and 62. Therefore, the denominator for cells in this subtable is 61+62, or 123.

start figure
 Gender Distribution                                within Job Classes                                --------------------------------------------------------------------------------                                                       Gender                                                                 -------------------   All                                                     Female    Male   Employees ------------------------------------------------+---------+---------+--------- Job Class                                                                  -----------------------+------------------------                            Technical              Number of employees            16       18       34                        ------------------------+---------+---------+---------                        Percent of row total        47.06    52.94   100.00                        ------------------------+---------+---------+---------                        Percent of column total     26.23    29.03    27.64                        ------------------------+---------+---------+---------                        Percent of total            13.01    14.63    27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor     Number of employees            20       15       35                        ------------------------+---------+---------+---------                        Percent of row total        57.14    42.86   100.00                        ------------------------+---------+---------+---------                        Percent of column total     32.79    24.19    28.46                        ------------------------+---------+---------+---------                        Percent of total            16.26    12.20    28.46 -----------------------+------------------------+---------+---------+--------- Clerical               Number of employees            14       14       28                        ------------------------+---------+---------+---------                        Percent of row total        50.00    50.50   100.00                        ------------------------+---------+---------+---------                        Percent of column total     22.95    22.58    22.76                        ------------------------+---------+---------+---------                        Percent of total            11.38    11.38    22.76 -----------------------+------------------------+---------+---------+--------- Administrative         Number of employees            11       15       26                        ------------------------+---------+---------+---------                        Percent of row total        42.31    57.69   100.00                        ------------------------+---------+---------+---------                        Percent of column total     18.03    24.19    21.14                        ------------------------+---------+---------+---------                        Percent of total             8.94    12.20    21.14 -----------------------+------------------------+---------+---------+--------- All Jobs               Number of employees            61       62      123                        ------------------------+---------+---------+---------                        Percent of row total        49.59    50.41   100.00                        ------------------------+---------+---------+---------                        Percent of column total    100.00   100.00   100.00                        ------------------------+---------+---------+---------                        Percent of total            49.59    50.41   100.00 -------------------------------------------------------------------------------- 
end figure

Subtable 3: Occupation and All

PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. The variable All does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.

For example, the denominator for the category clerical, all is the frequency count for that category, 28.

Note: In these table cells, because the numerator and the denominator are the same, the row percentages in this subtable are all 100.

start figure
 Gender Distribution                                within Job Classes                                --------------------------------------------------------------------------------                                                       Gender                                                                 -------------------   All                                                     Female    Male   Employees ------------------------------------------------+---------+---------+--------- Job Class                                                                  -----------------------+------------------------                            Technical              Number of employees            16       18       34                        ------------------------+---------+---------+---------                        Percent of row total        47.06    52.94   100.00                        ------------------------+---------+---------+---------                        Percent of column total     26.23    29.03    27.64                        ------------------------+---------+---------+---------                        Percent of total            13.01    14.63    27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor     Number of employees            20       15       35                        ------------------------+---------+---------+---------                        Percent of row total        57.14    42.86   100.00                        ------------------------+---------+---------+---------                        Percent of column total     32.79    24.19    28.46                        ------------------------+---------+---------+---------                        Percent of total            16.26    12.20    28.46 -----------------------+------------------------+---------+---------+--------- Clerical               Number of employees            14       14       28                        ------------------------+---------+---------+---------                        Percent of row total        50.00    50.50   100.00                        ------------------------+---------+---------+---------                        Percent of column total     22.95    22.58    22.76                        ------------------------+---------+---------+---------                        Percent of total            11.38    11.38    22.76 -----------------------+------------------------+---------+---------+--------- Administrative         Number of employees            11       15       26                        ------------------------+---------+---------+---------                        Percent of row total        42.31    57.69   100.00                        ------------------------+---------+---------+---------                        Percent of column total     18.03    24.19    21.14                        ------------------------+---------+---------+---------                        Percent of total             8.94    12.20    21.14 -----------------------+------------------------+---------+---------+--------- All Jobs               Number of employees            61       62      123                        ------------------------+---------+---------+---------                        Percent of row total        49.59    50.41   100.00                        ------------------------+---------+---------+---------                        Percent of column total    100.00   100.00   100.00                        ------------------------+---------+---------+---------                        Percent of total            49.59    50.41   100.00 -------------------------------------------------------------------------------- 
end figure

Subtable 4: All and All

PROC TABULATE looks at the first element in the denominator definition, Gender, and asks if Gender contributes to the subtable. Because Gender does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. The variable All does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.

There is only one category in this subtable: all, all . The denominator for this category is 123.

Note: In this table cell, because the numerator and denominator are the same, the row percentage in this subtable is 100.

Column Percentages

The part of the TABLE statement that calculates the column percentages and labels the row is

 pctn<occupation all>='Column percent' 

Consider how PROC TABULATE interprets this denominator definition for each subtable.

start figure
 Gender Distribution                                within Job Classes                                --------------------------------------------------------------------------------                                                       Gender                                                                 -------------------   All                                                     Female    Male   Employees ------------------------------------------------+---------+---------+--------- Job Class                                                                  -----------------------+------------------------                            Technical              Number of employees            16       18       34                        ------------------------+---------+---------+---------                        Percent of row total        47.06    52.94   100.00                        ------------------------+---------+---------+---------                        Percent of column total     26.23    29.03    27.64                        ------------------------+---------+---------+---------                        Percent of total            13.01    14.63    27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor     Number of employees            20       15       35                        ------------------------+---------+---------+---------                        Percent of row total        57.14    42.86   100.00                        ------------------------+---------+---------+---------                        Percent of column total     32.79    24.19    28.46                        ------------------------+---------+---------+---------                        Percent of total            16.26    12.20    28.46 -----------------------+------------------------+---------+---------+--------- Clerical               Number of employees            14       14       28                        ------------------------+---------+---------+---------                        Percent of row total        50.00    50.50   100.00                        ------------------------+---------+---------+---------                        Percent of column total     22.95    22.58    22.76                        ------------------------+---------+---------+---------                        Percent of total            11.38    11.38    22.76 -----------------------+------------------------+---------+---------+--------- Administrative         Number of employees            11       15       26                        ------------------------+---------+---------+---------                        Percent of row total        42.31    57.69   100.00                        ------------------------+---------+---------+---------                        Percent of column total     18.03    24.19    21.14                        ------------------------+---------+---------+---------                        Percent of total             8.94    12.20    21.14 -----------------------+------------------------+---------+---------+--------- All Jobs               Number of employees            61       62      123                        ------------------------+---------+---------+---------                        Percent of row total        49.59    50.41   100.00                        ------------------------+---------+---------+---------                        Percent of column total    100.00   100.00   100.00                        ------------------------+---------+---------+---------                        Percent of total            49.59    50.41   100.00 -------------------------------------------------------------------------------- 
end figure

Subtable 1: Occupation and Gender

PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Occupation within the same value of Gender.

For example, the denominator for the category manager/supervisor, male is the sum of all frequency counts for all categories in this subtable for which the value of Gender is male . There are four such categories: technical, male ; manager/ supervisor, male ; clerical, male ; and administrative, male . The corresponding frequency counts are 18, 15, 14, and 15. Therefore, the denominator for this category is 18+15+14+15, or 62.

start figure
 Gender Distribution                                within Job Classes                                --------------------------------------------------------------------------------                                                       Gender                                                                 -------------------   All                                                     Female    Male   Employees ------------------------------------------------+---------+---------+--------- Job Class                                                                  -----------------------+------------------------                            Technical              Number of employees            16       18       34                        ------------------------+---------+---------+---------                        Percent of row total        47.06    52.94   100.00                        ------------------------+---------+---------+---------                        Percent of column total     26.23    29.03    27.64                        ------------------------+---------+---------+---------                        Percent of total            13.01    14.63    27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor     Number of employees            20       15       35                        ------------------------+---------+---------+---------                        Percent of row total        57.14    42.86   100.00                        ------------------------+---------+---------+---------                        Percent of column total     32.79    24.19    28.46                        ------------------------+---------+---------+---------                        Percent of total            16.26    12.20    28.46 -----------------------+------------------------+---------+---------+--------- Clerical               Number of employees            14       14       28                        ------------------------+---------+---------+---------                        Percent of row total        50.00    50.50   100.00                        ------------------------+---------+---------+---------                        Percent of column total     22.95    22.58    22.76                        ------------------------+---------+---------+---------                        Percent of total            11.38    11.38    22.76 -----------------------+------------------------+---------+---------+--------- Administrative         Number of employees            11       15       26                        ------------------------+---------+---------+---------                        Percent of row total        42.31    57.69   100.00                        ------------------------+---------+---------+---------                        Percent of column total     18.03    24.19    21.14                        ------------------------+---------+---------+---------                        Percent of total             8.94    12.20    21.14 -----------------------+------------------------+---------+---------+--------- All Jobs               Number of employees            61       62      123                        ------------------------+---------+---------+---------                        Percent of row total        49.59    50.41   100.00                        ------------------------+---------+---------+---------                        Percent of column total    100.00   100.00   100.00                        ------------------------+---------+---------+---------                        Percent of total            49.59    50.41   100.00 -------------------------------------------------------------------------------- 
end figure

Subtable 2: All and Gender

PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. Because the variable All does contribute to this subtable, PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count for All as the denominator.

For example, the denominator for the category all, female is the frequency count for that category, 61.

Note: In these table cells, because the numerator and denominator are the same, the column percentages in this subtable are all 100.

start figure
 Gender Distribution                                within Job Classes                                --------------------------------------------------------------------------------                                                       Gender                                                                 -------------------   All                                                     Female    Male   Employees ------------------------------------------------+---------+---------+--------- Job Class                                                                  -----------------------+------------------------                            Technical              Number of employees            16       18       34                        ------------------------+---------+---------+---------                        Percent of row total        47.06    52.94   100.00                        ------------------------+---------+---------+---------                        Percent of column total     26.23    29.03    27.64                        ------------------------+---------+---------+---------                        Percent of total            13.01    14.63    27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor     Number of employees            20       15       35                        ------------------------+---------+---------+---------                        Percent of row total        57.14    42.86   100.00                        ------------------------+---------+---------+---------                        Percent of column total     32.79    24.19    28.46                        ------------------------+---------+---------+---------                        Percent of total            16.26    12.20    28.46 -----------------------+------------------------+---------+---------+--------- Clerical               Number of employees            14       14       28                        ------------------------+---------+---------+---------                        Percent of row total        50.00    50.50   100.00                        ------------------------+---------+---------+---------                        Percent of column total     22.95    22.58    22.76                        ------------------------+---------+---------+---------                        Percent of total            11.38    11.38    22.76 -----------------------+------------------------+---------+---------+--------- Administrative         Number of employees            11       15       26                        ------------------------+---------+---------+---------                        Percent of row total        42.31    57.69   100.00                        ------------------------+---------+---------+---------                        Percent of column total     18.03    24.19    21.14                        ------------------------+---------+---------+---------                        Percent of total             8.94    12.20    21.14 -----------------------+------------------------+---------+---------+--------- All Jobs               Number of employees            61       62      123                        ------------------------+---------+---------+---------                        Percent of row total        49.59    50.41   100.00                        ------------------------+---------+---------+---------                        Percent of column total    100.00   100.00   100.00                        ------------------------+---------+---------+---------                        Percent of total            49.59    50.41   100.00 -------------------------------------------------------------------------------- 
end figure

Subtable 3: Occupation and All

PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of Occupation in the subtable.

For example, the denominator for the category technical, all is the sum of the frequency counts for technical, all ; manager/supervisor, all ; clerical, all ; and administrative, all . The corresponding frequency counts are 34, 35, 28, and 26. Therefore, the denominator for this category is 34+35+28+26, or 123.

start figure
 Gender Distribution                                within Job Classes                                --------------------------------------------------------------------------------                                                       Gender                                                                 -------------------   All                                                     Female    Male   Employees ------------------------------------------------+---------+---------+--------- Job Class                                                                  -----------------------+------------------------                            Technical              Number of employees            16       18       34                        ------------------------+---------+---------+---------                        Percent of row total        47.06    52.94   100.00                        ------------------------+---------+---------+---------                        Percent of column total     26.23    29.03    27.64                        ------------------------+---------+---------+---------                        Percent of total            13.01    14.63    27.64 -----------------------+------------------------+---------+---------+--------- Manager/Supervisor     Number of employees            20       15       35                        ------------------------+---------+---------+---------                        Percent of row total        57.14    42.86   100.00                        ------------------------+---------+---------+---------                        Percent of column total     32.79    24.19    28.46                        ------------------------+---------+---------+---------                        Percent of total            16.26    12.20    28.46 -----------------------+------------------------+---------+---------+--------- Clerical               Number of employees            14       14       28                        ------------------------+---------+---------+---------                        Percent of row total        50.00    50.50   100.00                        ------------------------+---------+---------+---------                        Percent of column total     22.95    22.58    22.76                        ------------------------+---------+---------+---------                        Percent of total            11.38    11.38    22.76 -----------------------+------------------------+---------+---------+--------- Administrative         Number of employees            11       15       26                        ------------------------+---------+---------+---------                        Percent of row total        42.31    57.69   100.00                        ------------------------+---------+---------+---------                        Percent of column total     18.03    24.19    21.14                        ------------------------+---------+---------+---------                        Percent of total             8.94    12.20    21.14 -----------------------+------------------------+---------+---------+--------- All Jobs               Number of employees            61       62      123                        ------------------------+---------+---------+---------                        Percent of row total        49.59    50.41   100.00                        ------------------------+---------+---------+---------                        Percent of column total    100.00   100.00   100.00                        ------------------------+---------+---------+---------                        Percent of total            49.59    50.41   100.00 -------------------------------------------------------------------------------- 
end figure

Subtable 4: All and All

PROC TABULATE looks at the first element in the denominator definition, Occupation, and asks if Occupation contributes to the subtable. Because Occupation does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is All. Because the variable All does contribute to this subtable, PROC TABULATE uses it as the denominator definition. All is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of All as the denominator.

There is only one category in this subtable: all, all . The frequency count for this category is 123.

Note: In this calculation, because the numerator and denominator are the same, the column percentage in this subtable is 100.

Total Percentages

The part of the TABLE statement that calculates the total percentages and labels the row is

 pctn='Total percent' 

If you do not specify a denominator definition, then PROC TABULATE obtains the denominator for a cell by totaling all the frequency counts in the subtable. Table 48.9 on page 1293 summarizes the process for all subtables in this example.

Table 48.9: Denominators for Total Percentages

Class variables contributing to the subtable

Frequency counts

Total

Occupat and Gender

16, 18, 20, 15 14, 14, 11, 15

123

Occupat and All

34, 35, 28, 26

123

Gender and All

61, 62

123

All and All

123

123

Consequently, the denominator for total percentages is always 123.

Example 14: Specifying Style Elements for ODS Output

Procedure features:

  • STYLE= option in

    • PROC TABULATE statement

    • CLASSLEV statement

    • KEYWORD statement

    • TABLE statement

    • VAR statement

Other features:

  • ODS HTML statement

  • ODS PDF statement

  • ODS RTF statement

Data set: ENERGY ENERGY on page 1401

Formats: REGFMT, DIVFMT, and USETYPE. on page 1247

This example creates HTML, RTF, and PDF files and specifies style elements for various table regions.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= and PAGESIZE= are not set for this example because they have no effect on HTML, RTF, and Printer output.

 options nodate pageno=1; 

Specify the ODS output filenames. By opening multiple ODS destinations, you can produce multiple output files in a single execution. The ODS HTML statement produces output that is written in HTML. The ODS PDF statement produces output in Portable Document Format (PDF). The ODS RTF statement produces output in Rich Text Format (RTF). The output from PROC TABULATE goes to each of these files.

 ods html body='  external-HTML-file  ';  ods pdf file='  external-PDF-file  ';  ods rtf file='  external-RTF-file  '; 

Specify the table options. The STYLE= option in the PROC TABULATE statement specifies the style element for the data cells of the table.

 proc tabulate data=energy style=[font_weight=bold]; 

Specify subgroups for the analysis. The STYLE= option in the CLASS statement specifies the style element for the class variable name headings.

 class region division type / style=[just=center]; 

Specify the style attributes for the class variable value headings. The STYLE= option in the CLASSLEV statement specifies the style element for the class variable level value headings.

 classlev region division type / style=[just=left]; 

Specify the analysis variable and its style attributes. The STYLE= option in the VAR statement specifies a style element for the variable name headings.

 var expenditures / style=[font_size=3]; 

Specify the style attributes for keywords, and label the all keyword. The STYLE= option in the KEYWORD statement specifies a style element for keywords. The KEYLABEL statement assigns a label to the keyword.

 keyword all sum / style=[font_width=wide];  keylabel all="Total"; 

Define the table rows and columns and their style attributes. The STYLE= option in the dimension expression overrides any other STYLE= specifications in PROC TABULATE that specify attributes for table cells. The STYLE= option after the slash (/) specifies attributes for parts of the table other than table cells.

 table (region all)*(division all*[style=[background=yellow]]),        (type all)*(expenditures*f=dollar10.) /        style=[bordercolor=blue] 

Specify the style attributes for cells with missing values. The STYLE= option in the MISSTEXT option of the TABLE statement specifies a style element to use for the text in table cells that contain missing values.

 misstext=[label="Missing" style=[font_weight=light]] 

Specify the style attributes for the box above the row titles. The STYLE= option in the BOX option of the TABLE statement specifies a style element to use for text in the box above the row titles.

 box=[label="Region by Division by Type"     style=[font_style=italic]]; 

Format the class variable values. The FORMAT statement assigns formats to Region, Division, and Type.

 format region regfmt. division divfmt. type usetype.; 

Specify the titles.

 title 'Energy Expenditures';     title2 '(millions of dollars)';  run; 

Close the ODS destinations.

 ods html close;  ods pdf close;  ods rtf close; 

HTML Output

click to expand

PDF Output

click to expand

RTF Output

click to expand



Base SAS 9.1.3 Procedures Guide (Vol. 1)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
EAN: 2147483647
Year: 2004
Pages: 260

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net