Concepts: TABULATE Procedure


Statistics That Are Available in PROC TABULATE

Use the following keywords to request statistics in the TABLE statement or to specify statistic keywords in the KEYWORD or KEYLABEL statement. If a variable name (class or analysis) and a statistic name are the same, then enclose the statistic name in single quotation marks ” for example, 'MAX' .

Descriptive statistic keywords

COLPCTN

PCTSUM

COLPCTSUM

RANGE

CSS

REPPCTN

CV

REPPCTSUM

KURTOSIS KURT

ROWPCTN

LCLM

ROWPCTSUM

MAX

SKEWNESS SKEW

MEAN

STDDEVSTD

MIN

STDERR

N

SUM

NMISS

SUMWGT

PAGEPCTN

UCLM

PAGEPCTSUM

USS

PCTN

VAR

Quantile statistic keywords

MEDIANP50

Q3P75

P1

P90

P5

P95

P10

P99

Q1P25

QRANGE

Hypothesis testing keywords

PROBT

T

These statistics, the formulas that are used to calculate them, and their data requirements are discussed in Keywords and Formulas on page 1354.

To compute standard error of the mean (STDERR) or Student s t -test, you must use the default value of the VARDEF= option, which is DF. The VARDEF= option is specified in the PROC TABULATE statement.

To compute weighted quantiles, you must use QMETHOD=OS in the PROC TABULATE statement.

Use both LCLM and UCLM to compute a two-sided confidence limit for the mean. Use only LCLM or UCLM to compute a one-sided confidence limit. Use the ALPHA= option in the PROC TABULATE statement to specify a confidence level.

Formatting Class Variables

Use the FORMAT statement to assign a format to a class variable for the duration of a PROC TABULATE step. When you assign a format to a class variable, PROC TABULATE uses the formatted values to create categories, and it uses the formatted values in headings. If you do not specify a format for a class variable, and the variable does not have any other format assigned to it, then the default format, BEST12., is used, unless the GROUPINTERNAL option is specified.

User -defined formats are particularly useful for grouping values into fewer categories. For example, if you have a class variable, Age, with values ranging from 1 to 99, then you could create a user-defined format that groups the ages so that your tables contain a manageable number of categories. The following PROC FORMAT step creates a format that condenses all possible values of age into six groups of values.

 proc format;     value agefmt 0-29='Under 30'                 3039='3039'                 4049='4049'                 5059='5059'                 6069='6069'                 other=70 or over';  run; 

For information on creating user-defined formats, see Chapter 23, The FORMAT Procedure, on page 429.

By default, PROC TABULATE includes in a table only those formats for which the frequency count is not zero and for which values are not missing. To include missing values for all class variables in the output, use the MISSING option in the PROC TABULATE statement, and to include missing values for selected class variables, use the MISSING option in a CLASS statement. To include formats for which the frequency count is zero, use the PRELOADFMT option in a CLASS statement and the PRINTMISS option in the TABLE statement, or use the CLASSDATA= option in the PROC TABULATE statement.

Formatting Values in Tables

The formats for data in table cells serve two purposes. They determine how PROC TABULATE displays the values, and they determine the width of the columns . The default format for values in table cells is 12.2. You can modify the format for printing values in table cells by

  • changing the default format with the FORMAT= option in the PROC TABULATE statement

  • crossing elements in the TABLE statement with the F= format modifier.

PROC TABULATE determines the format to use for a particular cell from the following default order of precedence for formats:

  1. If no other formats are specified, then PROC TABULATE uses the default format (12.2).

  2. The FORMAT= option in the PROC TABULATE statement changes the default format. If no format modifiers affect a cell, then PROC TABULATE uses this format for the value in that cell.

  3. A format modifier in the page dimension applies to the values in all the table cells on the logical page unless you specify another format modifier for a cell in the row or column dimension.

  4. A format modifier in the row dimension applies to the values in all the table cells in the row unless you specify another format modifier for a cell in the column dimension.

  5. A format modifier in the column dimension applies to the values in all the table cells in the column.

You can change this order of precedence by using the FORMAT_PRECEDENCE= option in the TABLE statement. For example, if you specify FORMAT_PRECEDENCE=ROW and specify a format modifier in the row dimension, then that format overrides all other specified formats for the table cells.

How Using BY- Group Processing Differs from Using the Page Dimension

Using the page-dimension expression in a TABLE statement can have an effect similar to using a BY statement.

Table 48.4 on page 1230 contrasts the two methods .

Table 48.4: Contrasting the BY Statement and the Page Dimension

Issue

PROC TABULATE with a BY statement

PROC TABULATE with a page dimension in the TABLE statement

Order of observations in the input data set

The observations in the input data set must be sorted by the BY variables. [1]

Sorting is unnecessary.

One report summarizing all BY groups

You cannot create one report for all the BY groups.

Use ALL in the page dimension to create a report for all classes. (See Example 6 on page 1260.)

Percentages

The percentages in the tables are percentages of the total for that BY group. You cannot calculate percentages for a BY group compared to the totals for all BY groups because PROC TABULATE prepares the individual reports separately. Data for the report for one BY group are not available to the report for another BY group.

You can use denominator definitions to control the meaning of PCTN (see Calculating Percentages on page 1230.)

Titles

You can use the #BYVAL, #BYVAR, and #BYLINE specifications in TITLE statements to customize the titles for each BY group (see Creating Titles That Contain BY-Group Information on page 20).

The BOX= option in the TABLE statement customizes the page headers, but you must use the same title on each page.

Ordering class variables

ORDER=DATA and ORDER=FREQ order each BY group independently.

The order of class variables is the same on every page.

Obtaining uniform headings

You may need to insert dummy observations into BY groups that do not have all classes represented.

The PRINTMISS option ensures that each page of the table has uniform headings.

Multiple ranges with the same format

PROC TABULATE produces a table for each range.

PROC TABULATE combines observations from the two ranges.

[1] You can use the BY statement without sorting the data set if the data set has an index for the BY variable.

Calculating Percentages

Calculating the Percentage of the Value of in a Single Table Cell

The following statistics print the percentage of the value in a single table cell in relation to the total of the values in a group of cells. No denominator definitions are required; however, an analysis variable may be used as a denominator definition for percentage sum statistics.

  • REPPCTN and REPPCTSUM statistics ”print the percentage of the value in a single table cell in relation to the total of the values in the report.

  • COLPCTN and COLPCTSUM statistics ”print the percentage of the value in a single table cell in relation to the total of the values in the column.

  • ROWPCTN and ROWPCTSUM statistics ”print the percentage of the value in a single table cell in relation to the total of the values in the row.

  • PAGEPCTN and PAGEPCTSUM statistics ”print the percentage of the value in a single table cell in relation to the total of the values in the page.

These statistics calculate the most commonly used percentages. See Example 12 on page 1280 for an example.

Using PCTN and PCTSUM

PCTN and PCTSUM statistics can be used to calculate these same percentages. They allow you to manually define denominators. PCTN and PCTSUM statistics print the percentage of the value in a single table cell in relation to the value (used in the denominator of the calculation of the percentage) in another table cell or to the total of the values in a group of cells. By default, PROC TABULATE summarizes the values in all N cells (for PCTN) or all SUM cells (for PCTSUM) and uses the summarized value for the denominator. You can control the value that PROC TABULATE uses for the denominator with a denominator definition.

You place a denominator definition in angle brackets (< and >) next to the PCTN or PCTSUM statistic. The denominator definition specifies which categories to sum for the denominator.

This section illustrates how to specify denominator definitions in a simple table. Example 13 on page 1283 illustrates how to specify denominator definitions in a table that is composed of multiple subtables. For more examples of denominator definitions, see How Percentages Are Calculated in Chapter 3, Details of TABULATE Processing, in SAS Guide to TABULATE Processing .

Specifying a Denominator for the PCTN Statistic

The following PROC TABULATE step calculates the N statistic and three different versions of PCTN using the data set ENERGY ENERGY on page 1401.

 proc tabulate data=energy;     class division type;     table division*             (n='Number of customers'             pctn<type>='% of row' [1]             pctn<division>='% of column' [2]             pctn='% of all customers'), [3]           type/rts=50;     title 'Number of Users in Each Division';  run; 

The TABLE statement creates a row for each value of Division and a column for each value of Type. Within each row, the TABLE statement nests four statistics: N and three different calculations of PCTN (see Figure 48.4 on page 1232). Each occurrence of PCTN uses a different denominator definition.

start figure
 Number of Users in Each Division  1  ------------------------------------------------------------                                           Type                                             -------------------------                                       1           2        --------------------------------+------------+------------  Division                                                 ---------+----------------------                          1        Number of customers   6.00        6.00   ----------------------+------------+------------           % of row [1]                 50.00       50.00           ----------------------+------------+------------           % of column [2]              27.27       27.27           ----------------------+------------+------------           % of all customers [3]       13.64      13.64  ---------+----------------------+------------+------------  2        Number of customers   3.00        3.00   ----------------------+------------+------------           % of row                     50.00       50.00           ----------------------+------------+------------           % of column                  13.64       13.64           ----------------------+------------+------------           % of all customers            6.82        6.82  ---------+----------------------+------------+------------  3        Number of customers   8.00        8.00   ----------------------+------------+------------           % of row                     50.00       50.00           ----------------------+------------+------------           % of column                  36.36       36.36           ----------------------+------------+------------           % of all customers           18.18       18.18  ---------+----------------------+------------+------------  4        Number of customers   5.00        5.00   ----------------------+------------+------------           % of row                     50.00       50.00           ----------------------+------------+------------           % of column                  22.73       22.73           ----------------------+------------+------------           % of all customers           11.36       11.36 
end figure

Figure 48.4: Three Different Uses of the PCTN Statistic with Frequency Counts Highlighted
[1]  

< type > sums the frequency counts for all occurrences of Type within the same value of Division. Thus, for Division=1, the denominator is 6 + 6, or 12.

[2]  

< division > sums the frequency counts for all occurrences of Division within the same value of Type. Thus, for Type=1, the denominator is 6 + 3 + 8 + 5, or 22.

[3]  

The third use of PCTN has no denominator definition. Omitting a denominator definition is the same as including all class variables in the denominator definition. Thus, for all cells, the denominator is 6 + 3 + 8 + 5 + 6 + 3 + 8 + 5, or 44.

Specifying a Denominator for the PCTSUM Statistic

The following PROC TABULATE step sums expenditures for each combination of Type and Division and calculates three different versions of PCTSUM.

 proc tabulate data=energy format=8.2;     class division type;     var expenditures;     table division*             (sum='Expenditures'*f=dollar10.2              pctsum<type>='% of row' [1]              pctsum<division>='% of column' [2]              pctsum='% of all customers'), [3]           type*expenditures/rts=40;     title 'Expenditures in Each Division';  run; 

The TABLE statement creates a row for each value of Division and a column for each value of Type. Because Type is crossed with Expenditures, the value in each cell is the sum of the values of Expenditures for all observations that contribute to the cell. Within each row, the TABLE statement nests four statistics: SUM and three different calculations of PCTSUM (see Figure 48.5 on page 1233). Each occurrence of PCTSUM uses a different denominator definition.

start figure
 Expenditures in Each Division 1  --------------------------------------------------------                                         Type                                           ---------------------                                      1        2                                        ----------+----------                                   Expend    Expend     --------------------------------+----------+----------  Division                                             -----------+--------------------                      1          Expenditures   ,477.00 ,129.00   --------------------+----------+----------             % of row [1]             59.31     40.69             --------------------+----------+----------             % of column [2]          16.15     13.66             --------------------+----------+----------             % of all customers[3]      8.92      6.12  -----------+--------------------+----------+----------  2          Expenditures   ,379.00,078.00   --------------------+----------+----------             % of row                 56.24     43.76             --------------------+----------+----------             % of column              41.86     40.15             --------------------+----------+----------             % of all customers       23.11     17.98  -----------+--------------------+----------+----------  3          Expenditures   ,476.00 ,729.00   --------------------+----------+----------             % of row                 53.66     46.34             --------------------+----------+----------             % of column              11.83     12.59             --------------------+----------+----------             % of all customers        6.53      5.64  -----------+--------------------+----------+----------  4          Expenditures   ,959.00,619.00   --------------------+----------+----------             % of row                 52.52     47.48             --------------------+----------+----------             % of column              30.15     33.60             --------------------+----------+----------             % of all customers       16.65     15.05  -------------------------------------------------------- 
end figure

Figure 48.5: Three Different Uses of the PCTSUM Statistic with Sums Highlighted
[1]  

< type > sums the values of Expenditures for all occurrences of Type within the same value of Division. Thus, for Division=1, the denominator is $7,477 + $5,129.

[2]  

< division > sums the frequency counts for all occurrences of Division within the same value of Type. Thus, for Type=1, the denominator is $7,477 + $19,379 + $5,476 + $13,959.

[3]  

The third use of PCTN has no denominator definition. Omitting a denominator definition is the same as including all class variables in the denominator definition. Thus, for all cells, the denominator is $7,477 + $19,379 + $5,476 + $13,959 + $5,129 + $15,078 + $4,729 + $12,619.

Using Style Elements in PROC TABULATE

What Are Style Elements?

If you use the Output Delivery System to create HTML, RTF, or Printer output from PROC TABULATE, then you can set the style element that the procedure uses for various parts of the table. Style elements determine presentation attributes, such as font face, font weight, color , and so forth. See Output Delivery System on page 32 for more information. The following table lists the default styles for various regions of a table.

Table 48.5: Default Styles for Table Regions

Region

Style

column headings

Header

box

Header

page dimension text

Beforecaption

row headings

Rowheader

data cells

Data

table

Table

Using the STYLE= Option

You specify style elements for PROC TABULATE with the STYLE= option. The following table shows where you can use this option. Specifications in the TABLE statement override the same specification in the PROC TABULATE statement. However, any style attributes that you specify in the PROC TABULATE statement and that you do not override in the TABLE statement are inherited. For instance, if you specify a blue background and a white foreground for all data cells in the PROC TABULATE statement, and you specify a gray background for the data cells of a particular crossing in the TABLE statement, then the background for those data cells is gray, and the foreground is white (as specified in the PROC TABULATE statement).

Detailed information on STYLE= is provided in the documentation for individual statements.

Table 48.6: Using the STYLE= Option in PROC TABULATE

To set the style element for

Use STYLE in this statement

data cells

PROC TABULATE or dimension expression(s)

page dimension text and class variable name headings

CLASS

class level value headings

CLASSLEV

keyword headings

KEYWORD

table borders, rules, and other parts that are not specified elsewhere

TABLE

box text

TABLE statement, BOX= option

missing values

TABLE statement, MISSTEXT= option

analysis variable name headings

VAR

Applying Style Attributes to Table Cells

PROC TABULATE determines the style attributes to use for a particular cell from the following default order of precedence for styles:

  1. If no other style attributes are specified, then PROC TABULATE uses the default style attributes from the default style (Data).

  2. The STYLE= option in the PROC TABULATE statement changes the default style attributes. If no other STYLE= option specifications affect a cell, then PROC TABULATE uses these style attributes for that cell.

  3. A STYLE= option that is specified in the page dimension applies to all the table cells on the logical page unless you specify another STYLE= option for a cell in the row or column dimension.

  4. A STYLE= option that is specified in the row dimension applies to all the table cells in the row unless you specify another STYLE= option for a cell in the column dimension.

  5. A STYLE= option that is specified in the column dimension applies to all the table cells in the column.

You can change this order of precedence by using the STYLE_PRECEDENCE= option in the TABLE statement. For example, if you specify STYLE_PRECEDENCE=ROW and specify a STYLE= option in the row dimension, then those style attribute values override all others that are specified for the table cells.

Using a Format to Assign a Style Attribute

You can use a format to assign a style attribute value to any cell whose content is determined by value(s) of a class or analysis variable. For example, the following code assigns a red background to cells whose values are less than 10,000, a yellow background to cells whose values are at least 10,000 but less than 20,000, and a green background to cells whose values are at least 20,000:

 proc format;     value expfmt low-<10000='red'                        10000-<20000='yellow'                        20000high='green';  run;  ods html body='  external-HTML-file  ';  proc tabulate data=energy style=[background=expfmt.];     class region division type;     var expenditures;     table (region all)*(division all),             type*expenditures;  run;  ods html close; 



Base SAS 9.1.3 Procedures Guide (Vol. 1)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
EAN: 2147483647
Year: 2004
Pages: 260

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net