Procedure Concepts


Input Data Sets

Many base procedures require an input SAS data set. You specify the input SAS data set by using the DATA= option in the procedure statement, as in this example:

 proc print data=emp; 

If you omit the DATA= option, the procedure uses the value of the SAS system option _LAST_=. The default of _LAST_= is the most recently created SAS data set in the current SAS job or session. _LAST_= is described in detail in SAS Language Reference: Dictionary.
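The following minimal sketch illustrates the behavior described above. The data set WORK.EMP and its values are hypothetical and exist only to make the example self-contained.

```sas
/* Hypothetical sample data; creating it makes WORK.EMP the most */
/* recently created data set in the session.                     */
data work.emp;
   input Name $ Salary;
   datalines;
Ann 52000
Raj 48000
;
run;

/* No DATA= option: the procedure uses the value of _LAST_=, */
/* which at this point is WORK.EMP.                           */
proc print;
run;

/* _LAST_= can also be set explicitly. */
options _last_=work.emp;

proc print;
run;
```

Naming the data set with DATA= is usually easier to read and does not depend on the order in which earlier steps ran.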

RUN-Group Processing

RUN-group processing enables you to submit a PROC step with a RUN statement without ending the procedure. You can continue to use the procedure without issuing another PROC statement. To end the procedure, use a RUN CANCEL or a QUIT statement. Several base SAS procedures support RUN-group processing:

  • CATALOG

  • DATASETS

  • PLOT

  • PMENU

  • TRANTAB

See the section on the individual procedure for more information.

Note: PROC SQL executes each query automatically. Neither the RUN nor RUN CANCEL statement has any effect.
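As an illustration of RUN-group processing, the sketch below submits two RUN groups inside a single PROC DATASETS step and then ends the procedure with QUIT. The data set names EMP and STAFF are hypothetical.

```sas
/* Hypothetical sample data so that the procedure has something to manage */
data work.emp;
   input Name $ Salary;
   datalines;
Ann 52000
Raj 48000
;
run;

proc datasets library=work nolist;
   change emp=staff;       /* first RUN group: rename EMP to STAFF */
run;

   contents data=staff;    /* second RUN group: same PROC DATASETS step */
run;

quit;                      /* ends the procedure */
```

No second PROC statement is needed between the two RUN groups because the procedure remains active until QUIT.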

Creating Titles That Contain BY-Group Information

BY-Group Processing

BY-group processing uses a BY statement to process observations that are ordered, grouped, or indexed according to the values of one or more variables. By default, when you use BY-group processing in a procedure step, a BY line identifies each group. This section explains how to create titles that serve as customized BY lines.

Suppressing the Default BY Line

When you insert BY-group processing information into a title, you usually want to eliminate the default BY line. To suppress it, use the SAS system option NOBYLINE.

Note: You must use the NOBYLINE option if you insert BY-group information into titles for the following base SAS procedures:

  • MEANS

  • PRINT

  • STANDARD

  • SUMMARY

If you use the BY statement with the NOBYLINE option, then these procedures always start a new page for each BY group. This behavior prevents multiple BY groups from appearing on a single page and ensures that the information in the titles matches the report on the pages.

Inserting BY-Group Information into a Title

The general form for inserting BY-group information into a title is

#BY-specification<.suffix>

BY-specification

  • is one of the following:

    • BYVALn | BYVAL(BY-variable)

      • places the value of the specified BY variable in the title. You specify the BY variable with one of the following:

        • n

          • is the nth BY variable in the BY statement.

        • BY-variable

          • is the name of the BY variable whose value you want to insert in the title.

    • BYVARn | BYVAR(BY-variable)

      • places the label or the name (if no label exists) of the specified BY variable in the title. You designate the BY variable with one of the following:

        • n

          • is the nth BY variable in the BY statement.

        • BY-variable

          • is the name of the BY variable whose label or name you want to insert in the title.

    • BYLINE

      • inserts the complete default BY line into the title.

  • suffix

    • supplies text to place immediately after the BY-group information that you insert in the title. No space appears between the BY-group information and the suffix.

Example: Inserting a Value from Each BY Variable into the Title

This example

[1] creates a data set, GROC, that contains data for stores from four regions. Each store has four departments. See GROC on page 1402 for the DATA step that creates the data set.

[2] sorts the data by Region and Department.

[3] uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.

[4] uses PROC CHART to chart sales by Region and Department. In the first TITLE statement, #BYVAL2 inserts the value of the second BY variable, Department, into the title. In the second TITLE statement, #BYVAL(Region) inserts the value of Region into the title. The first period after Region indicates that a suffix follows. The second period is the suffix.

[5] uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.

data groc;   /* [1] */
   input Region $9. Manager $ Department $ Sales;
   datalines;
Southeast    Hayes       Paper       250
Southeast    Hayes       Produce     100
Southeast    Hayes       Canned      120
Southeast    Hayes       Meat         80
...  more lines of data  ...
Northeast    Fuller      Paper       200
Northeast    Fuller      Produce     300
Northeast    Fuller      Canned      420
Northeast    Fuller      Meat        125
;

proc sort data=groc;   /* [2] */
   by region department;
run;

options nobyline nodate pageno=1
        linesize=64 pagesize=20;   /* [3] */

proc chart data=groc;   /* [4] */
   by region department;
   vbar manager / type=sum sumvar=sales;
   title1 'This chart shows #byval2 sales';
   title2 'in the #byval(region)..';
run;

options byline;   /* [5] */

This partial output shows two BY groups with customized BY lines:

            This chart shows Canned sales            1
                  in the Northwest.

Sales Sum

400 +       *****       *****
            *****       *****
300 +       *****       *****
            *****       *****       *****
200 +       *****       *****       *****
            *****       *****       *****
100 +       *****       *****       *****
            *****       *****       *****
    --------------------------------------------
           Aikmann     Duncan     Jeffreys

                       Manager


            This chart shows Meat sales              2
                  in the Northwest.

Sales Sum

 75 +       *****       *****
            *****       *****
 60 +       *****       *****
            *****       *****
 45 +       *****       *****
            *****       *****
 30 +       *****       *****       *****
            *****       *****       *****
 15 +       *****       *****       *****
            *****       *****       *****
    --------------------------------------------
           Aikmann     Duncan     Jeffreys

                       Manager

Example: Inserting the Name of a BY Variable into a Title

This example inserts the name of a BY variable and the value of a BY variable into the title. The program

[1] uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.

[2] uses PROC CHART to chart sales by Region. In the first TITLE statement, #BYVAR(Region) inserts the name of the variable Region into the title. (If Region had a label, #BYVAR would use the label instead of the name.) The suffix al is appended to the variable name, which produces Regional. In the second TITLE statement, #BYVAL1 inserts the value of the first BY variable, Region, into the title.

[3] uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.

options nobyline nodate pageno=1
        linesize=64 pagesize=20;   /* [1] */

proc chart data=groc;   /* [2] */
   by region;
   vbar manager / type=mean sumvar=sales;
   title1 '#byvar(region).al Analysis';
   title2 'for the #byval1';
run;

options byline;   /* [3] */

This partial output shows one BY group with a customized BY line:

                 Regional Analysis                          1
                 for the Northwest

Sales Mean

300 +       *****
            *****
200 +       *****     *****
100 +       *****     *****       *****
            *****     *****       *****
    --------------------------------------------
           Aikmann   Duncan     Jeffreys

                       Manager

Example: Inserting the Complete BY Line into a Title

This example inserts the complete BY line into the title. The program

[1] uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.

[2] uses PROC CHART to chart sales by Region and Department. In the TITLE statement, #BYLINE inserts the complete BY line into the title.

[3] uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.

options nobyline nodate pageno=1
        linesize=64 pagesize=20;   /* [1] */

proc chart data=groc;   /* [2] */
   by region department;
   vbar manager / type=sum sumvar=sales;
   title 'Information for #byline';
run;

options byline;   /* [3] */

This partial output shows two BY groups with customized BY lines:

   Information for Region=Northwest Department=Canned       1

Sales Sum

400 +       *****       *****
            *****       *****
300 +       *****       *****
            *****       *****       *****
200 +       *****       *****       *****
            *****       *****       *****
100 +       *****       *****       *****
            *****       *****       *****
    --------------------------------------------
           Aikmann     Duncan     Jeffreys

                       Manager


   Information for Region=Northwest Department=Meat         2

Sales Sum

 75 +       *****       *****
            *****       *****
 60 +       *****       *****
            *****       *****
 45 +       *****       *****
            *****       *****
 30 +       *****       *****       *****
            *****       *****       *****
 15 +       *****       *****       *****
            *****       *****       *****
    --------------------------------------------
           Aikmann     Duncan     Jeffreys

                       Manager

Error Processing of BY-Group Specifications

SAS does not issue error or warning messages for incorrect #BYVAL, #BYVAR, or #BYLINE specifications. Instead, the text of the item simply becomes part of the title.
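The sketch below contrasts a valid specification with one that misspells the BY variable name. The data set WORK.SALES_DEMO and its values are hypothetical. According to the rule above, the second title should appear with the literal text of the incorrect item and no message should be written to the log.

```sas
/* Hypothetical sample data, already ordered by Region */
data work.sales_demo;
   input Region $ Sales;
   datalines;
East 100
East 150
West 200
;
run;

options nobyline;

proc print data=work.sales_demo;
   by region;
   title1 'Sales for #byval(region)';   /* valid: replaced by the BY value */
   title2 'Sales for #byval(regn)';     /* misspelled BY variable: an incorrect specification */
run;

/* Restore the defaults */
options byline;
title;
```

Because no message is issued, checking the printed titles is the only way to catch a mistyped specification.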

Shortcuts for Specifying Lists of Variable Names

Several statements in procedures allow multiple variable names. You can use these shortcut notations instead of specifying each variable name:

Notation | Meaning
x1-xn | specifies variables X1 through Xn. The numbers must be consecutive.
x: | specifies all variables that begin with the letter X.
x--a | specifies all variables between X and A, inclusive. This notation uses the position of the variables in the data set.
x-numeric-a | specifies all numeric variables between X and A, inclusive. This notation uses the position of the variables in the data set.
x-character-a | specifies all character variables between X and A, inclusive. This notation uses the position of the variables in the data set.
_numeric_ | specifies all numeric variables.
_character_ | specifies all character variables.
_all_ | specifies all variables.

Note: You cannot use shortcuts to list variable names in the INDEX CREATE statement in PROC DATASETS.

See SAS Language Reference: Concepts for complete documentation.
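The following sketch applies several of the shortcut notations in VAR statements. The data set WORK.SCORES and its variables are hypothetical; the comments list the variables that each notation selects, given the order in which the INPUT statement defines them.

```sas
/* Hypothetical data set; variable order is Id, x1, x2, x3, Name, y */
data work.scores;
   input Id $ x1 x2 x3 Name $ y;
   datalines;
A01 1 2 3 Ann 10
A02 4 5 6 Raj 20
;
run;

proc print data=work.scores;
   var x1-x3;               /* numbered range: x1, x2, x3 */
run;

proc print data=work.scores;
   var x:;                  /* names that begin with x: x1, x2, x3 */
run;

proc print data=work.scores;
   var x1--Name;            /* positional range: x1, x2, x3, Name */
run;

proc print data=work.scores;
   var Id-character-Name;   /* character variables from Id through Name: Id, Name */
run;

proc means data=work.scores;
   var _numeric_;           /* all numeric variables: x1, x2, x3, y */
run;
```

The positional notations depend on the order of the variables in the data set, so the same list can select different variables if the data set is rebuilt with a different variable order.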

Formatted Values

Using Formatted Values

Typically, when you print or group variable values, base SAS procedures use the formatted values. This section contains examples of how base procedures use formatted values.

Example: Printing the Formatted Values for a Data Set

The following example prints the formatted values of the data set PROCLIB.PAYROLL. (See PROCLIB.PAYROLL on page 1409 for the DATA step that creates this data set.) In PROCLIB.PAYROLL, the variable Jobcode indicates the job and level of the employee. For example, TA1 indicates that the employee is at the beginning level for a ticket agent.

libname proclib 'SAS-data-library';

options nodate pageno=1
        linesize=64 pagesize=40;

proc print data=proclib.payroll(obs=10)
           noobs;
   title  'PROCLIB.PAYROLL';
   title2 'First 10 Observations Only';
run;

This is a partial printing of PROCLIB.PAYROLL:

                 PROCLIB.PAYROLL                         1
            First 10 Observations Only

  Id
Number  Gender  Jobcode  Salary    Birth    Hired

 1919     M       TA2     34376  12SEP60  04JUN87
 1653     F       ME2     35108  15OCT64  09AUG90
 1400     M       ME1     29769  05NOV67  16OCT90
 1350     F       FA3     32886  31AUG65  29JUL90
 1401     M       TA3     38822  13DEC50  17NOV85
 1499     M       ME3     43025  26APR54  07JUN80
 1101     M       SCP     18723  06JUN62  01OCT90
 1333     M       PT2     88606  30MAR61  10FEB81
 1402     M       TA2     32615  17JAN63  02DEC90
 1479     F       TA3     38785  22DEC68  05OCT89

The following PROC FORMAT step creates the format $JOBFMT., which assigns descriptive names for each job:

proc format;
   value $jobfmt
      'FA1'='Flight Attendant Trainee'
      'FA2'='Junior Flight Attendant'
      'FA3'='Senior Flight Attendant'
      'ME1'='Mechanic Trainee'
      'ME2'='Junior Mechanic'
      'ME3'='Senior Mechanic'
      'PT1'='Pilot Trainee'
      'PT2'='Junior Pilot'
      'PT3'='Senior Pilot'
      'TA1'='Ticket Agent Trainee'
      'TA2'='Junior Ticket Agent'
      'TA3'='Senior Ticket Agent'
      'NA1'='Junior Navigator'
      'NA2'='Senior Navigator'
      'BCK'='Baggage Checker'
      'SCP'='Skycap';
run;

The FORMAT statement in this PROC MEANS step temporarily associates the $JOBFMT. format with the variable Jobcode:

options nodate pageno=1
        linesize=64 pagesize=60;

proc means data=proclib.payroll mean max;
   class jobcode;
   var salary;
   format jobcode $jobfmt.;
   title 'Summary Statistics for';
   title2 'Each Job Code';
run;

PROC MEANS produces this output, which uses the $JOBFMT. format:

                    Summary Statistics for                 1
                         Each Job Code

                      The MEANS Procedure

                   Analysis Variable : Salary

                              N
Jobcode                     Obs           Mean       Maximum
---------------------------------------------------------------
Baggage Checker               9       25794.22      26896.00
Flight Attendant Trainee     11       23039.36      23979.00
Junior Flight Attendant      16       27986.88      28978.00
Senior Flight Attendant       7       32933.86      33419.00
Mechanic Trainee              8       28500.25      29769.00
Junior Mechanic              14       35576.86      36925.00
Senior Mechanic               7       42410.71      43900.00
Junior Navigator              5       42032.20      43433.00
Senior Navigator              3       52383.00      53798.00
Pilot Trainee                 8       67908.00      71349.00
Junior Pilot                 10       87925.20      91908.00
Senior Pilot                  2       10504.50      11379.00
Skycap                        7       18308.86      18833.00
Ticket Agent Trainee          9       27721.33      28880.00
Junior Ticket Agent          20       33574.95      34803.00
Senior Ticket Agent          12       39679.58      40899.00
---------------------------------------------------------------

Note: Because formats are character strings, formats for numeric variables are ignored when the values of the numeric variables are needed for mathematical calculations.

Example: Grouping or Classifying Formatted Data

If you use a formatted variable to group or classify data, then the procedure uses the formatted values. The following example creates and assigns a format, $CODEFMT., that groups the levels of each job code into one category. PROC MEANS calculates statistics based on the groupings of the $CODEFMT. format.

proc format;
   value $codefmt
      'FA1','FA2','FA3'='Flight Attendant'
      'ME1','ME2','ME3'='Mechanic'
      'PT1','PT2','PT3'='Pilot'
      'TA1','TA2','TA3'='Ticket Agent'
            'NA1','NA2'='Navigator'
                  'BCK'='Baggage Checker'
                  'SCP'='Skycap';
run;

options nodate pageno=1
        linesize=64 pagesize=40;

proc means data=proclib.payroll mean max;
   class jobcode;
   var salary;
   format jobcode $codefmt.;
   title 'Summary Statistics for Job Codes';
   title2 '(Using a Format that Groups the Job Codes)';
run;

PROC MEANS produces this output:

            Summary Statistics for Job Codes           1
       (Using a Format that Groups the Job Codes)

                  The MEANS Procedure

               Analysis Variable : Salary

                      N
Jobcode             Obs            Mean         Maximum
-------------------------------------------------------
Baggage Checker       9        25794.22        26896.00
Flight Attendant     34        27404.71        33419.00
Mechanic             29        35274.24        43900.00
Navigator             8        45913.75        53798.00
Pilot                20        72176.25        91908.00
Skycap                7        18308.86        18833.00
Ticket Agent         41        34076.73        40899.00
-------------------------------------------------------

Example: Temporarily Associating a Format with a Variable

If you want to associate a format with a variable temporarily, then you can use the FORMAT statement. For example, the following PROC PRINT step associates the DOLLAR8. format with the variable Salary for the duration of this PROC PRINT step only:

options nodate pageno=1
        linesize=64 pagesize=40;

proc print data=proclib.payroll(obs=10)
           noobs;
   format salary dollar8.;
   title 'Temporarily Associating a Format';
   title2 'with the Variable Salary';
run;

PROC PRINT produces this output:

             Temporarily Associating a Format                    1
                 with the Variable Salary

  Id
Number   Gender  Jobcode    Salary     Birth     Hired

 1919      M       TA2     $34,376   12SEP60   04JUN87
 1653      F       ME2     $35,108   15OCT64   09AUG90
 1400      M       ME1     $29,769   05NOV67   16OCT90
 1350      F       FA3     $32,886   31AUG65   29JUL90
 1401      M       TA3     $38,822   13DEC50   17NOV85
 1499      M       ME3     $43,025   26APR54   07JUN80
 1101      M       SCP     $18,723   06JUN62   01OCT90
 1333      M       PT2     $88,606   30MAR61   10FEB81
 1402      M       TA2     $32,615   17JAN63   02DEC90
 1479      F       TA3     $38,785   22DEC68   05OCT89

Example: Temporarily Dissociating a Format from a Variable

If a variable has a permanent format that you do not want a procedure to use, then temporarily dissociate the format from the variable by using a FORMAT statement.

In this example, the FORMAT statement in the DATA step permanently associates the $YRFMT. format with the variable Year. Thus, when you use the variable in a PROC step, the procedure uses the formatted values. The PROC MEANS step, however, contains a FORMAT statement that dissociates the $YRFMT. format from Year for this PROC MEANS step only. PROC MEANS uses the stored value for Year in the output.

proc format;
   value $yrfmt '1'='Freshman'
                '2'='Sophomore'
                '3'='Junior'
                '4'='Senior';
run;

data debate;
   input Name $ Gender $ Year $ GPA @@;
   format year $yrfmt.;
   datalines;
Capiccio m 1 3.598 Tucker  m 1 3.901
Bagwell  f 2 3.722 Berry   m 2 3.198
Metcalf  m 2 3.342 Gold    f 3 3.609
Gray     f 3 3.177 Syme    f 3 3.883
Baglione f 4 4.000 Carr    m 4 3.750
Hall     m 4 3.574 Lewis   m 4 3.421
;

options nodate pageno=1
        linesize=64 pagesize=40;

proc means data=debate mean maxdec=2;
   class year;
   format year;
   title 'Average GPA';
run;

PROC MEANS produces this output, which does not use the $YRFMT. format:

          Average GPA                 1

      The MEANS Procedure

    Analysis Variable : GPA

              N
Year        Obs            Mean
-------------------------------
1             2            3.75
2             3            3.42
3             3            3.56
4             4            3.69
-------------------------------

Formats and BY-Group Processing

When a procedure processes a data set, it checks to see if a format is assigned to the BY variable. If it is, then the procedure adds observations to the current BY groups until the formatted value changes. If nonconsecutive internal values of the BY variable(s) have the same formatted value, then the values are grouped into different BY groups. This results in two BY groups with the same formatted value. Further, if different and consecutive internal values of the BY variable(s) have the same formatted value, then they are included in the same BY group.
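A small sketch can make these rules concrete. The format SIZEFMT., the data set WORK.PARTS, and its values are hypothetical. The NOTSORTED option in the BY statement is an addition for this example so that the step accepts data that is not in sorted order.

```sas
/* Hypothetical format: internal values 1 and 2 share one formatted value */
proc format;
   value sizefmt 1-2='Small'
                 3-4='Large';
run;

/* Hypothetical data: Size runs 1, 2, 3, 1 */
data work.parts;
   input Size Qty;
   format size sizefmt.;
   datalines;
1 10
2 20
3 30
1 40
;
run;

/* By the rules above, the consecutive values 1 and 2 should fall into */
/* one BY group (Small), 3 into a second group (Large), and the final  */
/* 1 into a third group that is also labeled Small.                    */
proc print data=work.parts;
   by size notsorted;
run;
```

If a single group per formatted value is what you want, one option is to sort the data so that all internal values that share a formatted value are adjacent.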

Formats and Error Checking

If SAS cannot find a format, then it stops processing and prints an error message in the SAS log. You can suppress this behavior with the SAS system option NOFMTERR. If you use NOFMTERR, and SAS cannot find the format, then SAS uses a default format and continues processing. Typically, for the default, SAS uses the BESTw. format for numeric variables and the $w. format for character variables.

Note: To ensure that SAS can find user-written formats, use the SAS system option FMTSEARCH=. How to store formats is described in Storing Informats and Formats on page 456.
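The sketch below shows one way the two options might be used together. The catalog name WORK.MYFMTS and the data set WORK.DEMO are hypothetical and were chosen so that the example runs without a permanent library.

```sas
/* Store a user-written format in a catalog other than the default. */
/* WORK.MYFMTS is a hypothetical catalog name.                      */
proc format library=work.myfmts;
   value $jobfmt 'BCK'='Baggage Checker'
                 'SCP'='Skycap';
run;

/* Tell SAS to search that catalog when it looks for formats. */
options fmtsearch=(work.myfmts);

/* Hypothetical data set that uses the format */
data work.demo;
   Jobcode='SCP';
   format Jobcode $jobfmt.;
run;

proc print data=work.demo;
run;

/* With NOFMTERR in effect, a format that cannot be found does not */
/* stop processing; SAS substitutes a default format instead.      */
options nofmterr;

/* Return to the default behavior, which treats a missing format as an error. */
options fmterr;
```

Leaving FMTERR in effect during development makes a missing format visible immediately, whereas NOFMTERR is mainly useful when you receive a data set without its format catalog.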

Processing All the Data Sets in a Library

You can use the SAS Macro Facility to run the same procedure on every data set in a library. The macro facility is part of base SAS software.

Example 9 on page 782 shows how to print all the data sets in a library. You can use the same macro definition to perform any procedure on all the data sets in a library. Simply replace the PROC PRINT piece of the program with the appropriate procedure code.
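The macro below is a minimal sketch of the general approach and is not the code from Example 9. The macro name PRINTALL, its parameter LIB, and the macro variables DS1, DS2, and so on are hypothetical. It reads the data set names from the DICTIONARY.TABLES view in PROC SQL and then generates one PROC PRINT step per data set.

```sas
%macro printall(lib);
   %local i n;

   /* Store the name of each data set in the library in DS1, DS2, ... */
   proc sql noprint;
      select memname
         into :ds1-:ds9999
         from dictionary.tables
         where libname=%upcase("&lib") and memtype='DATA';
   quit;

   /* SQLOBS holds the number of rows that the query returned */
   %let n=&sqlobs;

   /* Generate one procedure step for each data set */
   %do i=1 %to &n;
      proc print data=&lib..&&ds&i(obs=5);
         title "First observations of &lib..&&ds&i";
      run;
   %end;

   title;
%mend printall;

%printall(work)
```

To run a different procedure on every data set, replace the PROC PRINT step inside the %DO loop with the procedure code that you need.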

Operating Environment-Specific Procedures

Several base SAS procedures are specific to one operating environment or one release. Appendix 2, Operating Environment-Specific Procedures, on page 1389 contains a table with additional information. These procedures are described in more detail in the SAS documentation for operating environments.

Statistic Descriptions

Table 2.1 on page 31 identifies common descriptive statistics that are available in several Base SAS procedures. See Keywords and Formulas on page 1354 for more detailed information about available statistics and theoretical information.

Table 2.1: Common Descriptive Statistics That Base Procedures Calculate

Statistic | Description | Procedures
confidence intervals | | FREQ, MEANS/SUMMARY, TABULATE, UNIVARIATE
CSS | corrected sum of squares | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
CV | coefficient of variation | MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
goodness-of-fit tests | | FREQ, UNIVARIATE
KURTOSIS | kurtosis | MEANS/SUMMARY, TABULATE, UNIVARIATE
MAX | largest (maximum) value | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
MEAN | mean | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
MEDIAN | median (50th percentile) | CORR (for nonparametric correlation measures), MEANS/SUMMARY, TABULATE, UNIVARIATE
MIN | smallest (minimum) value | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
MODE | most frequent value (if not unique, the smallest mode is used) | UNIVARIATE
N | number of observations on which calculations are based | CORR, FREQ, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
NMISS | number of missing values | FREQ, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
NOBS | number of observations | MEANS/SUMMARY, UNIVARIATE
PCTN | the percentage of a cell or row frequency to a total frequency | REPORT, TABULATE
PCTSUM | the percentage of a cell or row sum to a total sum | REPORT, TABULATE
Pearson correlation | | CORR
percentiles | | FREQ, MEANS/SUMMARY, REPORT, TABULATE, UNIVARIATE
RANGE | range | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
robust statistics | trimmed means, Winsorized means | UNIVARIATE
SKEWNESS | skewness | MEANS/SUMMARY, TABULATE, UNIVARIATE
Spearman correlation | | CORR
STD | standard deviation | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
STDERR | the standard error of the mean | MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
SUM | sum | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
SUMWGT | sum of weights | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
tests of location | | UNIVARIATE
USS | uncorrected sum of squares | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
VAR | variance | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE

Computational Requirements for Statistics

The following are the computational requirements for the statistics that are listed in Table 2.1 on page 31. They do not describe recommended sample sizes.

  • N and NMISS do not require any nonmissing observations.

  • SUM, MEAN, MAX, MIN, RANGE, USS, and CSS require at least one nonmissing observation.

  • VAR, STD, STDERR, and CV require at least two observations.

  • CV requires that MEAN is not equal to zero.

Statistics are reported as missing if they cannot be computed.
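The following sketch illustrates these requirements with a single nonmissing observation. The data set WORK.ONE is hypothetical. N, NMISS, and MEAN can be computed, whereas STD and CV need at least two observations and are therefore reported as missing.

```sas
/* Hypothetical data set with exactly one observation */
data work.one;
   x=5;
run;

/* N, NMISS, and MEAN are computable; STD and CV are not, */
/* so they appear as missing values in the output.        */
proc means data=work.one n nmiss mean std cv;
   var x;
run;
```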




Base SAS 9.1.3 Procedures Guide (Vol. 1)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
EAN: 2147483647
Year: 2004
Pages: 260
