About DATA Step Execution


The Default Sequence of Execution in the DATA Step

The following table outlines the default sequence of execution for statements in a DATA step. The DATA statement begins the step and usually identifies one or more SAS data sets that the step will create. (You can use the keyword _NULL_ as the data set name if you do not want to create an output data set.) Optional programming statements process your data. SAS then performs the default actions at the end of processing an observation.

Table 20.1: Default Execution for Statements in a DATA Step

| Structure of a DATA Step | Action Taken |
|---|---|
| DATA statement | begins the step; counts iterations |
| Data-reading statements: [*] | |
| INPUT | describes the arrangement of values in the input data record from a raw data source |
| SET | reads an observation from one or more SAS data sets |
| MERGE | joins observations from two or more SAS data sets into a single observation |
| MODIFY | replaces, deletes, or appends observations in an existing SAS data set in place |
| UPDATE | updates a master file by applying transactions |
| Optional SAS programming statements, for example: | further process the data for the current observation |
| FirstQuarter=Jan+Feb+Mar; | computes the value for FirstQuarter for the current observation |
| if RetailPrice < 500; | subsets by value of variable RetailPrice for the current observation |
| Default actions at the end of processing an observation | |
| At end of DATA step: automatic write, automatic return | writes an observation to a SAS data set; returns to the DATA statement |
| At top of DATA step: automatic reset | resets values to missing in the program data vector |

[*] The table shows the default processing of the DATA step. You can alter the sequence of statements in the DATA step. You can code optional programming statements, such as creating or reinitializing a constant, before you code a data-reading statement.
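The default sequence in Table 20.1 can be sketched as one small DATA step. The data set name work.sales, the variable Product, and the data values are assumptions made up for illustration; FirstQuarter and RetailPrice are the names used in the table.

```sas
/* A minimal sketch of the default sequence; all data values are hypothetical. */
data work.sales;                             /* DATA statement begins the step   */
   input Product $ Jan Feb Mar RetailPrice;  /* data-reading statement           */
   FirstQuarter = Jan + Feb + Mar;           /* computed for the current record  */
   if RetailPrice < 500;                     /* subsetting IF                    */
   /* No OUTPUT or RETURN is coded: SAS writes the observation and            */
   /* returns to the top of the step automatically.                           */
   datalines;
Lamp 10 12 15 49
Sofa 2 1 3 899
Desk 4 6 5 250
;

proc print data=work.sales;
run;
```

With these values, the Sofa record fails the subsetting IF, so work.sales should contain only the Lamp and Desk observations.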

Note: You can also use functions to read and process data. For information about how statements and functions process data differently, see 'Using Functions to Manipulate Files' on page 42. For specific information about SAS functions, see the SAS I/O Files and External Files categories in 'Functions and CALL Routines by Category' in SAS Language Reference: Dictionary.

Changing the Default Sequence of Execution

Using Statements to Change the Default Sequence of Execution

You can change the default sequence of execution to control how your program executes. SAS language statements give you considerable flexibility to do this in a DATA step. The following table shows the most common ways to control the flow of execution in a DATA step program.

Table 20.2: Common Methods that Alter the Sequence of Execution

| Task | Possible Methods |
|---|---|
| Read a record | merge, modify, join data sets |
| | read multiple records to create a single observation |
| | randomly select records for processing |
| | read from multiple external files |
| | read selected fields from a record by using statement or data set options |
| Process data | use conditional logic |
| | retain variable values |
| Write an observation | write to a SAS data set or to an external file |
| | control when output is written to a data set |
| | write to multiple files |

For more information, see the individual statements in SAS Language Reference: Dictionary.
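As a rough illustration of several of these methods together, the sketch below retains a value across iterations, uses conditional logic, and controls which of two data sets receives each observation. The data set names work.budget and work.premium, the variable ItemsRead, and the data values are hypothetical.

```sas
/* A sketch only; names and values are assumptions for illustration. */
data work.budget work.premium;          /* write to multiple data sets       */
   input Product $ RetailPrice;
   retain ItemsRead 0;                  /* value persists across iterations  */
   ItemsRead = ItemsRead + 1;
   /* Conditional logic with explicit OUTPUT controls where each           */
   /* observation is written.                                              */
   if RetailPrice < 500 then output work.budget;
   else output work.premium;
   datalines;
Lamp 49
Sofa 899
Desk 250
;
```

Because the step contains explicit OUTPUT statements, SAS does not also perform the automatic write at the bottom of the step, so each observation lands in exactly one data set.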

Using Functions to Change the Default Sequence of Execution

You can also use functions to read and process data. For information about how statements and functions process data differently, see 'Using Functions to Manipulate Files' on page 42. For specific information about SAS functions, see the SAS I/O Files and External Files categories in 'Functions and CALL Routines by Category' in SAS Language Reference: Dictionary.

Altering the Flow for a Given Observation

You can use statements, statement options, and data set options to alter the way SAS processes specific observations. The following table lists SAS language elements and their effects on processing.

Table 20.3: Language Elements that Alter Programming Flow

| SAS Language Element | Function |
|---|---|
| subsetting IF statement | stops the current iteration when a condition is false, does not write the current observation to the data set, and returns control to the top of the DATA step. |
| IF-THEN/ELSE statement | conditionally executes a statement: the THEN clause runs when a condition is true, and the optional ELSE clause runs when it is false. Processing then continues with the next statement in the DATA step. |
| DO loops | cause parts of the DATA step to be executed multiple times. |
| LINK and RETURN statements | alter the flow of control, execute statements following the label specified, and return control of the program to the next statement following the LINK statement. |
| HEADER= option in the FILE statement | alters the flow of control whenever a PUT statement causes a new page of output to begin; statements following the label specified in the HEADER= option are executed until a RETURN statement is encountered, at which time control returns to the point from which the HEADER= option was activated. |
| GO TO statement | alters the flow of execution by branching to the label that is specified in the GO TO statement. SAS executes subsequent statements, then returns control to the beginning of the DATA step. |
| EOF= option in an INFILE statement | alters the flow of execution when the end of the input file is reached; statements following the label that is specified in the EOF= option are executed at that time. |
| _N_ automatic variable in an IF-THEN construct | causes parts of the DATA step to execute only for particular iterations. |
| SELECT statement | conditionally executes one of a group of SAS statements. |
| OUTPUT statement in an IF-THEN construct | outputs an observation before the end of the DATA step, based on a condition; prevents automatic output at the bottom of the DATA step. |
| DELETE statement in an IF-THEN construct | deletes an observation based on a condition and causes a return to the top of the DATA step. |
| ABORT statement in an IF-THEN construct | stops execution of the DATA step and instructs SAS to resume execution with the next DATA or PROC step. It can also stop executing a SAS program altogether, depending on the options specified in the ABORT statement and on the method of operation. |
| WHERE statement or WHERE= data set option | causes SAS to read certain observations based on one or more specified criteria. |
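A few of the elements in Table 20.3 can be seen side by side in a short sketch. The data set name work.graded, the variables, and the data values are invented for illustration.

```sas
/* A sketch only; names and values are assumptions for illustration. */
data work.graded;
   input Name $ Score;
   /* _N_ in an IF-THEN construct: runs on the first iteration only */
   if _n_ = 1 then put 'First iteration of the DATA step';
   /* DELETE: drop the observation and return to the top of the step */
   if Score = . then delete;
   /* SELECT: execute exactly one of a group of statements */
   select;
      when (Score >= 90) Grade = 'A';
      when (Score >= 80) Grade = 'B';
      otherwise          Grade = 'C';
   end;
   datalines;
Ann 95
Bob .
Cy 72
;
```

With these values, the record for Bob has a missing Score and is deleted before the SELECT group runs, so work.graded should hold two observations: Ann with Grade A and Cy with Grade C.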

Step Boundary - How To Know When Statements Take Effect

Understanding step boundaries is important in SAS programming because step boundaries determine when SAS statements take effect. SAS executes program statements only when SAS crosses a default or an explicit step boundary. Consider the following DATA steps:

    data _null_;                          /* [1] */
       set allscores(drop=score5-score7);
       title 'Student Test Scores';       /* [2] */

    data employees;                       /* [3] */
       set employee_list;
    run;

[1] The DATA statement begins a DATA step and is a step boundary.

[2] The TITLE statement, which is a global statement, is in effect for both DATA steps because it appears before the boundary of the first DATA step.

[3] The DATA statement is the default boundary for the first DATA step.

Because no RUN statement ends the first DATA step, its boundary is the default one: the statement data employees;. The TITLE statement appears before that boundary, so it is in effect for the first DATA step as well as for the second.

The following example shows an OPTIONS statement inserted after a RUN statement.

    data scores;                          /* [1] */
       set allscores(drop=score5-score7);
    run;                                  /* [2] */

    options firstobs=5 obs=55;            /* [3] */

    data test;
       set alltests;
    run;

The OPTIONS statement specifies that the first observation that is read from the input data set should be the 5th, and the last observation that is read should be the 55th. Inserting a RUN statement immediately before the OPTIONS statement causes the first DATA step to reach its boundary (run;) before SAS encounters the OPTIONS statement. In this case, the step boundary is explicit. The OPTIONS statement settings, therefore, are put into effect for the second DATA step only.

[1] The DATA statement is a step boundary.

[2] The RUN statement is the explicit boundary for the first DATA step.

[3] The OPTIONS statement affects the second DATA step only.

Following the statements in a DATA step with a RUN statement is the simplest way to make the step begin to execute, but a RUN statement is not always necessary. SAS recognizes several step boundaries for a SAS step:

  • another DATA statement

  • a PROC statement

  • a RUN statement.

    Note: For SAS programs executed in interactive mode, a RUN statement is required to signal the step boundary for the last step you submit.

  • the semicolon (with a DATALINES or CARDS statement) or four semicolons (with a DATALINES4 or CARDS4 statement) after data lines

  • an ENDSAS statement

  • in noninteractive or batch mode, the end of a program file containing SAS programming statements

  • a QUIT statement (for some procedures).

When you submit a DATA step during interactive processing, it does not begin running until SAS encounters a step boundary. This fact enables you to submit statements as you write them while preventing a step from executing until you have entered all the statements.
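As a small sketch of an implicit boundary, the DATA step below has no RUN statement of its own; the PROC statement that follows it serves as the boundary. The data set name work.a and the variable x are hypothetical.

```sas
/* A sketch only; work.a and x are assumptions for illustration. */
data work.a;
   x = 1;
proc print data=work.a;   /* the PROC statement ends the DATA step       */
run;                      /* the RUN statement ends the PROC PRINT step  */
```

The final RUN statement matters most in interactive mode, where the last step submitted does not execute until SAS sees a boundary.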

What Causes a DATA Step to Stop Executing

DATA steps stop executing under different circumstances, depending on the type and number of sources of input.

Table 20.4: Causes that Stop DATA Step Execution

| Data Read | Data Source | SAS Statements | DATA Step Stops |
|---|---|---|---|
| no data | | | after only one iteration |
| any data | | | when it executes STOP or ABORT |
| | | | when the data is exhausted |
| raw data | instream data lines | INPUT statement | after the last data line is read |
| | one external file | INPUT and INFILE statements | when end-of-file is reached |
| | multiple external files | INPUT and INFILE statements | when end-of-file is first reached on any of the files |
| observations sequentially | one SAS data set | SET and MODIFY statements | after the last observation is read |
| | multiple SAS data sets | one SET, MERGE, MODIFY, or UPDATE statement | when all input data sets are exhausted |
| | multiple SAS data sets | multiple SET, MERGE, MODIFY, or UPDATE statements | when end-of-file is reached by any of the data-reading statements |

A DATA step that reads observations from a SAS data set with a SET statement that uses the POINT= option has no way to detect the end of the input SAS data set. (This method is called direct or random access.) Such a DATA step usually requires a STOP statement.
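The following sketch shows one common way to pair POINT= with a STOP statement. The data set names work.nums and work.every_third and the variables i, ptr, and total are hypothetical.

```sas
/* A sketch only; names are assumptions for illustration. */
data work.nums;                 /* build a 10-observation input data set */
   do i = 1 to 10;
      output;
   end;
run;

data work.every_third;
   /* NOBS= is assigned at compile time, so total is usable in the loop */
   do ptr = 3 to total by 3;
      set work.nums point=ptr nobs=total;   /* direct access by obs number */
      output;
   end;
   stop;    /* POINT= gives no end-of-file condition, so stop explicitly */
run;
```

With this input, work.every_third should contain the observations where i is 3, 6, and 9. The POINT= variable ptr is temporary and is not written to the output data set.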

A DATA step also stops when it executes a STOP or an ABORT statement. Some system options and data set options, such as OBS=, can cause a DATA step to stop earlier than it would otherwise.
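Two of these cases can be sketched briefly: a step with no data-reading statement, which runs for one iteration, and a step that the OBS= data set option cuts short. All data set and variable names here are hypothetical.

```sas
/* A sketch only; names are assumptions for illustration. */
data work.one_row;         /* no data-reading statement                  */
   x = 1;
run;                       /* one iteration, so one observation          */

data work.nums;            /* hypothetical 10-observation input data set */
   do i = 1 to 10;
      output;
   end;
run;

data work.first3;
   set work.nums(obs=3);   /* OBS= ends reading after observation 3      */
run;
```

Here work.first3 should receive only the first three observations of work.nums, even though the input data set holds ten.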




SAS 9.1 Language Reference. Concepts
ISBN: 1590471989
Year: 2004
Pages: 255
