Many base procedures require an input SAS data set. You specify the input SAS data set by using the DATA= option in the procedure statement, as in this example:
proc print data=emp;
If you omit the DATA= option, the procedure uses the value of the SAS system option _LAST_=. The default of _LAST_= is the most recently created SAS data set in the current SAS job or session. _LAST_= is described in detail in SAS Language Reference: Dictionary.
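As a minimal sketch of this default (the data set EMP and its variables Name and Salary are hypothetical and are used only for illustration), the PROC PRINT step below omits DATA= and so reads the most recently created data set:

```sas
/* Hypothetical data set, created only for this illustration */
data emp;
   input Name $ Salary;
   datalines;
Ames 31000
Brown 42000
;
run;

/* No DATA= option: the procedure uses the value of _LAST_=,  */
/* which by default is the most recently created data set     */
/* (WORK.EMP in this session).                                */
proc print;
run;

/* _LAST_= can also be set explicitly as a system option */
options _last_=work.emp;
```

Naming the data set with DATA= is usually clearer in production programs, because the result does not depend on which step ran last.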
RUN-group processing enables you to submit a PROC step with a RUN statement without ending the procedure. You can continue to use the procedure without issuing another PROC statement. To end the procedure, use a RUN CANCEL or a QUIT statement. Several base SAS procedures support RUN-group processing:
CATALOG
DATASETS
PLOT
PMENU
TRANTAB
See the section on the individual procedure for more information.
Note: PROC SQL executes each query automatically. Neither the RUN nor RUN CANCEL statement has any effect.
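The following sketch shows RUN-group processing with PROC DATASETS, one of the procedures listed above. The data set names EMP and STAFF are assumptions made for illustration only.

```sas
/* Hypothetical data set to operate on */
data work.emp;
   x=1;
run;

proc datasets library=work nolist;
   change emp=staff;       /* first RUN group: rename EMP to STAFF   */
run;                       /* executes the group; procedure stays up */

   contents data=staff;    /* second RUN group, no new PROC statement */
run;

quit;                      /* ends the procedure */
```

Each RUN statement executes the statements submitted since the previous RUN, and only QUIT (or RUN CANCEL) ends the procedure.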
BY-group processing uses a BY statement to process observations that are ordered, grouped, or indexed according to the values of one or more variables. By default, when you use BY-group processing in a procedure step, a BY line identifies each group. This section explains how to create titles that serve as customized BY lines.
When you insert BY-group processing information into a title, you usually want to eliminate the default BY line. To suppress it, use the SAS system option NOBYLINE.
Note: You must use the NOBYLINE option if you insert BY-group information into titles for the following base SAS procedures:
MEANS
STANDARD
SUMMARY
If you use the BY statement with the NOBYLINE option, then these procedures always start a new page for each BY group. This behavior prevents multiple BY groups from appearing on a single page and ensures that the information in the titles matches the report on the pages.
The general form for inserting BY-group information into a title is
#BY-specification<.suffix>
BY-specification
   is one of the following:

   BYVALn | BYVAL(BY-variable)
      places the value of the specified BY variable in the title. You specify the BY variable with one of the following:
      n
         is the nth BY variable in the BY statement.
      BY-variable
         is the name of the BY variable whose value you want to insert in the title.

   BYVARn | BYVAR(BY-variable)
      places the label or the name (if no label exists) of the specified BY variable in the title. You designate the BY variable with one of the following:
      n
         is the nth BY variable in the BY statement.
      BY-variable
         is the name of the BY variable whose name you want to insert in the title.

   BYLINE
      inserts the complete default BY line into the title.

suffix
   supplies text to place immediately after the BY-group information that you insert in the title. No space appears between the BY-group information and the suffix.
This example
[1] creates a data set, GROC, that contains data for stores from four regions. Each store has four departments. See GROC on page 1402 for the DATA step that creates the data set.
[2] sorts the data by Region and Department.
[3] uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.
[4] uses PROC CHART to chart sales by Region and Department. In the first TITLE statement, #BYVAL2 inserts the value of the second BY variable, Department, into the title. In the second TITLE statement, #BYVAL(Region) inserts the value of Region into the title. The first period after Region indicates that a suffix follows. The second period is the suffix.
[5] uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.
data groc;                                   /* [1] */
   input Region $9. Manager $ Department $ Sales;
   datalines;
Southeast Hayes Paper 250
Southeast Hayes Produce 100
Southeast Hayes Canned 120
Southeast Hayes Meat 80
... more lines of data ...
Northeast Fuller Paper 200
Northeast Fuller Produce 300
Northeast Fuller Canned 420
Northeast Fuller Meat 125
;

proc sort data=groc;                         /* [2] */
   by region department;
run;

options nobyline nodate pageno=1
        linesize=64 pagesize=20;             /* [3] */

proc chart data=groc;                        /* [4] */
   by region department;
   vbar manager / type=sum sumvar=sales;
   title1 'This chart shows #byval2 sales';
   title2 'in the #byval(region)..';
run;

options byline;                              /* [5] */
This partial output shows two BY groups with customized BY lines:
[Page 1: vertical bar chart. Titles: "This chart shows Canned sales" and "in the Northwest." Vertical axis: Sales Sum. Horizontal axis: Manager (Aikmann, Duncan, Jeffreys).]
[Page 2: vertical bar chart. Titles: "This chart shows Meat sales" and "in the Northwest." Vertical axis: Sales Sum. Horizontal axis: Manager (Aikmann, Duncan, Jeffreys).]
This example inserts the name of a BY variable and the value of a BY variable into the title. The program
[1] uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.
[2] uses PROC CHART to chart sales by Region. In the first TITLE statement, #BYVAR(Region) inserts the name of the variable Region into the title. (If Region had a label, #BYVAR would use the label instead of the name.) The suffix al is appended to the label. In the second TITLE statement, #BYVAL1 inserts the value of the first BY variable, Region, into the title.
[3] uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.
options nobyline nodate pageno=1
        linesize=64 pagesize=20;             /* [1] */

proc chart data=groc;                        /* [2] */
   by region;
   vbar manager / type=mean sumvar=sales;
   title1 '#byvar(region).al Analysis';
   title2 'for the #byval1';
run;

options byline;                              /* [3] */
This partial output shows one BY group with a customized BY line:
[Page 1: vertical bar chart. Titles: "Regional Analysis" and "for the Northwest". Vertical axis: Sales Mean. Horizontal axis: Manager (Aikmann, Duncan, Jeffreys).]
This example inserts the complete BY line into the title. The program
[1] uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.
[2] uses PROC CHART to chart sales by Region and Department. In the TITLE statement, #BYLINE inserts the complete BY line into the title.
[3] uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.
options nobyline nodate pageno=1
        linesize=64 pagesize=20;             /* [1] */

proc chart data=groc;                        /* [2] */
   by region department;
   vbar manager / type=sum sumvar=sales;
   title 'Information for #byline';
run;

options byline;                              /* [3] */
This partial output shows two BY groups with customized BY lines:
[Page 1: vertical bar chart. Title: "Information for Region=Northwest Department=Canned". Vertical axis: Sales Sum. Horizontal axis: Manager (Aikmann, Duncan, Jeffreys).]
[Page 2: vertical bar chart. Title: "Information for Region=Northwest Department=Meat". Vertical axis: Sales Sum. Horizontal axis: Manager (Aikmann, Duncan, Jeffreys).]
SAS does not issue error or warning messages for incorrect #BYVAL, #BYVAR, or #BYLINE specifications. Instead, the text of the item simply becomes part of the title.
Several statements in procedures allow multiple variable names. You can use these shortcut notations instead of specifying each variable name:
Notation | Meaning |
---|---|
x1-xn | specifies variables X1 through Xn. The numbers must be consecutive. |
x: | specifies all variables that begin with the letter X. |
x--a | specifies all variables between X and A, inclusive. This notation uses the position of the variables in the data set. |
x-numeric-a | specifies all numeric variables between X and A, inclusive. This notation uses the position of the variables in the data set. |
x-character-a | specifies all character variables between X and A, inclusive. This notation uses the position of the variables in the data set. |
_numeric_ | specifies all numeric variables. |
_character_ | specifies all character variables. |
_all_ | specifies all variables. |
Note: You cannot use shortcuts to list variable names in the INDEX CREATE statement in PROC DATASETS.
See SAS Language Reference: Concepts for complete documentation.
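The notations in the table can be tried on a small data set. The sketch below assumes a hypothetical data set SCORES whose variables are stored in the order Id, x1, x2, x3, Name, y1, y2; none of these names come from the examples in this chapter.

```sas
/* Hypothetical data set; variable order matters for -- lists */
data scores;
   input Id x1-x3 Name $ y1 y2;
   datalines;
1 10 20 30 Ann 5 6
2 15 25 35 Bob 7 8
;
run;

proc print data=scores;
   var x1-x3;             /* numbered range: x1 x2 x3              */
run;

proc print data=scores;
   var x:;                /* every variable whose name starts with x */
run;

proc print data=scores;
   var x1--y1;            /* by position: x1 x2 x3 Name y1         */
run;

proc means data=scores;
   var x1-numeric-y2;     /* numeric variables from x1 through y2  */
run;

proc means data=scores;
   var _numeric_;         /* all numeric variables, including Id   */
run;
```

Because the double-hyphen forms depend on variable position, reordering the variables in the data set changes which variables they select.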
Typically, when you print or group variable values, base SAS procedures use the formatted values. This section contains examples of how base procedures use formatted values.
The following example prints the formatted values of the data set PROCLIB.PAYROLL. (See PROCLIB.PAYROLL on page 1409 for the DATA step that creates this data set.) In PROCLIB.PAYROLL, the variable Jobcode indicates the job and level of the employee. For example, TA1 indicates that the employee is at the beginning level for a ticket agent.
libname proclib 'SAS-data-library';

options nodate pageno=1 linesize=64 pagesize=40;

proc print data=proclib.payroll(obs=10) noobs;
   title 'PROCLIB.PAYROLL';
   title2 'First 10 Observations Only';
run;
This is a partial printing of PROCLIB.PAYROLL:
                        PROCLIB.PAYROLL                        1
                   First 10 Observations Only

   Id
 Number    Gender    Jobcode    Salary      Birth      Hired

  1919       M         TA2       34376    12SEP60    04JUN87
  1653       F         ME2       35108    15OCT64    09AUG90
  1400       M         ME1       29769    05NOV67    16OCT90
  1350       F         FA3       32886    31AUG65    29JUL90
  1401       M         TA3       38822    13DEC50    17NOV85
  1499       M         ME3       43025    26APR54    07JUN80
  1101       M         SCP       18723    06JUN62    01OCT90
  1333       M         PT2       88606    30MAR61    10FEB81
  1402       M         TA2       32615    17JAN63    02DEC90
  1479       F         TA3       38785    22DEC68    05OCT89
The following PROC FORMAT step creates the format $JOBFMT., which assigns descriptive names for each job:
proc format;
   value $jobfmt
         'FA1'='Flight Attendant Trainee'
         'FA2'='Junior Flight Attendant'
         'FA3'='Senior Flight Attendant'
         'ME1'='Mechanic Trainee'
         'ME2'='Junior Mechanic'
         'ME3'='Senior Mechanic'
         'PT1'='Pilot Trainee'
         'PT2'='Junior Pilot'
         'PT3'='Senior Pilot'
         'TA1'='Ticket Agent Trainee'
         'TA2'='Junior Ticket Agent'
         'TA3'='Senior Ticket Agent'
         'NA1'='Junior Navigator'
         'NA2'='Senior Navigator'
         'BCK'='Baggage Checker'
         'SCP'='Skycap';
run;
The FORMAT statement in this PROC MEANS step temporarily associates the $JOBFMT. format with the variable Jobcode:
options nodate pageno=1 linesize=64 pagesize=60;

proc means data=proclib.payroll mean max;
   class jobcode;
   var salary;
   format jobcode $jobfmt.;
   title 'Summary Statistics for';
   title2 'Each Job Code';
run;
PROC MEANS produces this output, which uses the $JOBFMT. format:
                     Summary Statistics for                    1
                         Each Job Code

                      The MEANS Procedure

                   Analysis Variable : Salary

                              N
Jobcode                     Obs          Mean       Maximum
---------------------------------------------------------------
Baggage Checker               9      25794.22      26896.00
Flight Attendant Trainee     11      23039.36      23979.00
Junior Flight Attendant      16      27986.88      28978.00
Senior Flight Attendant       7      32933.86      33419.00
Mechanic Trainee              8      28500.25      29769.00
Junior Mechanic              14      35576.86      36925.00
Senior Mechanic               7      42410.71      43900.00
Junior Navigator              5      42032.20      43433.00
Senior Navigator              3      52383.00      53798.00
Pilot Trainee                 8      67908.00      71349.00
Junior Pilot                 10      87925.20      91908.00
Senior Pilot                  2      10504.50      11379.00
Skycap                        7      18308.86      18833.00
Ticket Agent Trainee          9      27721.33      28880.00
Junior Ticket Agent          20      33574.95      34803.00
Senior Ticket Agent          12      39679.58      40899.00
---------------------------------------------------------------
Note: Because formats are character strings, formats for numeric variables are ignored when the values of the numeric variables are needed for mathematical calculations.
If you use a formatted variable to group or classify data, then the procedure uses the formatted values. The following example creates and assigns a format, $CODEFMT., that groups the levels of each job code into one category. PROC MEANS calculates statistics based on the groupings of the $CODEFMT. format.
proc format;
   value $codefmt
         'FA1','FA2','FA3'='Flight Attendant'
         'ME1','ME2','ME3'='Mechanic'
         'PT1','PT2','PT3'='Pilot'
         'TA1','TA2','TA3'='Ticket Agent'
               'NA1','NA2'='Navigator'
                     'BCK'='Baggage Checker'
                     'SCP'='Skycap';
run;

options nodate pageno=1 linesize=64 pagesize=40;

proc means data=proclib.payroll mean max;
   class jobcode;
   var salary;
   format jobcode $codefmt.;
   title 'Summary Statistics for Job Codes';
   title2 '(Using a Format that Groups the Job Codes)';
run;
PROC MEANS produces this output:
                Summary Statistics for Job Codes               1
           (Using a Format that Groups the Job Codes)

                      The MEANS Procedure

                   Analysis Variable : Salary

                      N
Jobcode             Obs          Mean       Maximum
-------------------------------------------------------
Baggage Checker       9      25794.22      26896.00
Flight Attendant     34      27404.71      33419.00
Mechanic             29      35274.24      43900.00
Navigator             8      45913.75      53798.00
Pilot                20      72176.25      91908.00
Skycap                7      18308.86      18833.00
Ticket Agent         41      34076.73      40899.00
-------------------------------------------------------
If you want to associate a format with a variable temporarily, then you can use the FORMAT statement. For example, the following PROC PRINT step associates the DOLLAR8. format with the variable Salary for the duration of this PROC PRINT step only:
options nodate pageno=1 linesize=64 pagesize=40;

proc print data=proclib.payroll(obs=10) noobs;
   format salary dollar8.;
   title 'Temporarily Associating a Format';
   title2 'with the Variable Salary';
run;
PROC PRINT produces this output:
                Temporarily Associating a Format               1
                    with the Variable Salary

   Id
 Number    Gender    Jobcode     Salary      Birth      Hired

  1919       M         TA2      $34,376    12SEP60    04JUN87
  1653       F         ME2      $35,108    15OCT64    09AUG90
  1400       M         ME1      $29,769    05NOV67    16OCT90
  1350       F         FA3      $32,886    31AUG65    29JUL90
  1401       M         TA3      $38,822    13DEC50    17NOV85
  1499       M         ME3      $43,025    26APR54    07JUN80
  1101       M         SCP      $18,723    06JUN62    01OCT90
  1333       M         PT2      $88,606    30MAR61    10FEB81
  1402       M         TA2      $32,615    17JAN63    02DEC90
  1479       F         TA3      $38,785    22DEC68    05OCT89
If a variable has a permanent format that you do not want a procedure to use, then temporarily dissociate the format from the variable by using a FORMAT statement.
In this example, the FORMAT statement in the DATA step permanently associates the $YRFMT. format with the variable Year. Thus, when you use the variable in a PROC step, the procedure uses the formatted values. The PROC MEANS step, however, contains a FORMAT statement that dissociates the $YRFMT. format from Year for this PROC MEANS step only. PROC MEANS uses the stored value for Year in the output.
proc format;
   value $yrfmt '1'='Freshman'
                '2'='Sophomore'
                '3'='Junior'
                '4'='Senior';
run;

data debate;
   input Name $ Gender $ Year $ GPA @@;
   format year $yrfmt.;
   datalines;
Capiccio m 1 3.598 Tucker   m 1 3.901 Bagwell  f 2 3.722
Berry    m 2 3.198 Metcalf  m 2 3.342 Gold     f 3 3.609
Gray     f 3 3.177 Syme     f 3 3.883 Baglione f 4 4.000
Carr     m 4 3.750 Hall     m 4 3.574 Lewis    m 4 3.421
;

options nodate pageno=1 linesize=64 pagesize=40;

proc means data=debate mean maxdec=2;
   class year;
   format year;
   title 'Average GPA';
run;
PROC MEANS produces this output, which does not use the $YRFMT. format:
                          Average GPA                          1

                      The MEANS Procedure

                    Analysis Variable : GPA

            N
Year      Obs          Mean
-------------------------------
1           2          3.75
2           3          3.42
3           3          3.56
4           4          3.69
-------------------------------
When a procedure processes a data set, it checks to see if a format is assigned to the BY variable. If it is, then the procedure adds observations to the current BY groups until the formatted value changes. If nonconsecutive internal values of the BY variable(s) have the same formatted value, then the values are grouped into different BY groups. This results in two BY groups with the same formatted value. Further, if different and consecutive internal values of the BY variable(s) have the same formatted value, then they are included in the same BY group.
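The sketch below illustrates the last case, in which different consecutive internal values share one formatted value. The format AGEGRP. and the data set KIDS are hypothetical and are introduced only for this illustration.

```sas
/* Hypothetical format that maps several ages to one label */
proc format;
   value agegrp low-12='Child'
                13-high='Teen or older';
run;

/* Hypothetical data, already in ascending order of Age */
data kids;
   input Age Score;
   datalines;
10 81
11 77
12 90
15 85
;
run;

/* Ages 10, 11, and 12 are consecutive and share the formatted     */
/* value 'Child', so by the rule described above they are expected */
/* to fall into a single BY group; age 15 starts a second group.   */
proc means data=kids mean;
   by age;
   var score;
   format age agegrp.;
run;
```

Removing the FORMAT statement from the PROC MEANS step would instead give one BY group for each distinct internal value of Age.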
If SAS cannot find a format, then it stops processing and prints an error message in the SAS log. You can suppress this behavior with the SAS system option NOFMTERR. If you use NOFMTERR, and SAS cannot find the format, then SAS uses a default format and continues processing. Typically, for the default, SAS uses the BESTw. format for numeric variables and the $w. format for character variables.
Note: To ensure that SAS can find user-written formats, use the SAS system option FMTSEARCH=. How to store formats is described in Storing Informats and Formats on page 456.
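As a brief sketch of these two options, the statements below assume that user-written formats are stored in a format catalog in the PROCLIB library; that storage location is an assumption made for illustration.

```sas
/* Search PROCLIB.FORMATS first, then WORK.FORMATS and LIBRARY.FORMATS. */
/* Storing formats in PROCLIB is an assumption made for this sketch.    */
options fmtsearch=(proclib work library);

/* Continue processing with a default format when a format is missing */
options nofmterr;

proc print data=proclib.payroll(obs=5);
run;

/* Restore the default behavior: a missing format is an error */
options fmterr;
```

NOFMTERR is convenient for inspecting a data set whose formats are unavailable, but the output then shows stored values instead of formatted ones.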
You can use the SAS Macro Facility to run the same procedure on every data set in a library. The macro facility is part of base SAS software.
Example 9 on page 782 shows how to print all the data sets in a library. You can use the same macro definition to perform any procedure on all the data sets in a library. Simply replace the PROC PRINT piece of the program with the appropriate procedure code.
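The macro below is only a sketch of the general idea and is not the program from Example 9. It assumes that the member names can be read from the DICTIONARY.TABLES view with PROC SQL; the macro name PRINTALL, its parameter LIB, and the macro variables DS1, DS2, and so on are hypothetical.

```sas
%macro printall(lib);
   %local i n;

   /* Collect the names of all data sets in the library */
   proc sql noprint;
      select memname
         into :ds1-:ds9999
         from dictionary.tables
         where libname="%upcase(&lib)" and memtype='DATA';
   quit;

   /* SQLOBS holds the number of rows that the query selected */
   %let n=&sqlobs;

   /* Run the same procedure on each data set; replace the */
   /* PROC PRINT step with any other procedure code.       */
   %do i=1 %to &n;
      proc print data=&lib..&&ds&i;
         title "Data set &lib..&&ds&i";
      run;
   %end;
%mend printall;

%printall(work)
```

Keeping the procedure code inside the %DO loop means that only that one step has to change in order to apply a different procedure to every data set.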
Several base SAS procedures are specific to one operating environment or one release. Appendix 2, Operating Environment-Specific Procedures, on page 1389 contains a table with additional information. These procedures are described in more detail in the SAS documentation for operating environments.
Table 2.1 on page 31 identifies common descriptive statistics that are available in several Base SAS procedures. See Keywords and Formulas on page 1354 for more detailed information about available statistics and theoretical information.
Statistic | Description | Procedures |
---|---|---|
| | confidence intervals | FREQ, MEANS/SUMMARY, TABULATE, UNIVARIATE |
CSS | corrected sum of squares | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
CV | coefficient of variation | MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
| | goodness-of-fit tests | FREQ, UNIVARIATE |
KURTOSIS | kurtosis | MEANS/SUMMARY, TABULATE, UNIVARIATE |
MAX | largest (maximum) value | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
MEAN | mean | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
MEDIAN | median (50th percentile) | CORR (for nonparametric correlation measures), MEANS/SUMMARY, TABULATE, UNIVARIATE |
MIN | smallest (minimum) value | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
MODE | most frequent value (if not unique, the smallest mode is used) | UNIVARIATE |
N | number of observations on which calculations are based | CORR, FREQ, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
NMISS | number of missing values | FREQ, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
NOBS | number of observations | MEANS/SUMMARY, UNIVARIATE |
PCTN | the percentage of a cell or row frequency to a total frequency | REPORT, TABULATE |
PCTSUM | the percentage of a cell or row sum to a total sum | REPORT, TABULATE |
| | Pearson correlation | CORR |
| | percentiles | FREQ, MEANS/SUMMARY, REPORT, TABULATE, UNIVARIATE |
RANGE | range | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
robust statistics | trimmed means, Winsorized means | UNIVARIATE |
SKEWNESS | skewness | MEANS/SUMMARY, TABULATE, UNIVARIATE |
| | Spearman correlation | CORR |
STD | standard deviation | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
STDERR | the standard error of the mean | MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
SUM | sum | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
SUMWGT | sum of weights | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
| | tests of location | UNIVARIATE |
USS | uncorrected sum of squares | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
VAR | variance | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE |
The following computational requirements apply to the statistics that are listed in Table 2.1 on page 31. They do not describe recommended sample sizes.
N and NMISS do not require any nonmissing observations.
SUM, MEAN, MAX, MIN, RANGE, USS, and CSS require at least one nonmissing observation.
VAR, STD, STDERR, and CV require at least two observations.
CV requires that MEAN is not equal to zero.
Statistics are reported as missing if they cannot be computed.
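A quick way to see these requirements is to request several statistics for a data set that has a single observation. The data set ONE and the variable X are hypothetical.

```sas
/* Hypothetical data set with exactly one nonmissing observation */
data one;
   x=5;
run;

/* N, NMISS, and MEAN can be computed from one observation.        */
/* STD and CV need at least two observations, so per the rules     */
/* above they are expected to be reported as missing.              */
proc means data=one n nmiss mean std cv;
   var x;
run;
```

Adding a second observation with a different value of X makes STD and CV computable, provided that the mean is not zero.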