Processing a DATA Step: A Walkthrough


Sample DATA Step

The following statements provide an example of a DATA step that reads raw data, calculates totals, and creates a data set:

 data total_points (drop=TeamName); [1]
    input TeamName $ ParticipantName $ Event1 Event2 Event3; [2]
    TeamTotal + (Event1 + Event2 + Event3); [3]
    datalines;
 Knights Sue    6   8  8
 Cardinals Jane 9   7  8
 Knights John   7   7  7
 Knights Lisa   8   9  9
 Knights Fran   7   6  6
 Knights Walter 9   8 10
 ;

[1]  The DROP= data set option prevents the variable TeamName from being written to the output SAS data set called TOTAL_POINTS.

[2]  The INPUT statement describes the data by giving a name to each variable, identifying its data type (character or numeric), and identifying its relative location in the data record.

[3]  The Sum statement accumulates the scores for three events in the variable TeamTotal.
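As a rough illustration of what the Sum statement in [3] does, the step below spells out the same accumulation with a RETAIN statement and the SUM function. The data set name total_points_retain is a hypothetical name chosen for this sketch. It is offered as an equivalent form under the documented behavior of the Sum statement, not as a replacement for the sample code.

```sas
data total_points_retain (drop=TeamName);   /* hypothetical data set name */
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   /* Keep TeamTotal across iterations and start it at 0,
      which the Sum statement does implicitly */
   retain TeamTotal 0;
   /* SUM ignores a missing argument, so a missing score total
      leaves the running total unchanged, as the Sum statement does */
   TeamTotal = sum(TeamTotal, Event1 + Event2 + Event3);
   datalines;
Knights Sue    6   8  8
Cardinals Jane 9   7  8
Knights John   7   7  7
Knights Lisa   8   9  9
Knights Fran   7   6  6
Knights Walter 9   8 10
;
run;
```

The Sum statement is the shorter way to write this, which is presumably why the sample uses it.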

Creating the Input Buffer and the Program Data Vector

When DATA step statements are compiled, SAS determines whether to create an input buffer. If the input file contains raw data (as in the example above), SAS creates an input buffer to hold the data before moving the data to the program data vector (PDV). (If the input file is a SAS data set, however, SAS does not create an input buffer. SAS writes the input data directly to the PDV.)

The PDV contains all the variables in the input data set, the variables created in DATA step statements, and the two variables, _N_ and _ERROR_, that are automatically generated for every DATA step. The _N_ variable represents the number of times the DATA step has iterated. The _ERROR_ variable acts like a binary switch whose value is 0 if no errors exist in the DATA step, or 1 if one or more errors exist.

The following figure shows the input buffer and the program data vector after DATA step compilation.

Figure 20.2: Input Buffer and Program Data Vector

Variables that are created by the INPUT statement (TeamName, ParticipantName, Event1, Event2, and Event3) are set to missing initially. Note that in this representation, numeric variables are initialized with a period and character variables are initialized with blanks. TeamTotal, which is created by the Sum statement, is initialized to 0 rather than to missing. The automatic variable _N_ is set to 1; the automatic variable _ERROR_ is set to 0.

The variable TeamName is marked Drop in the PDV because of the DROP= data set option in the DATA statement. Dropped variables are not written to the SAS data set. The _N_ and _ERROR_ variables are dropped because automatic variables created by the DATA step are not written to a SAS data set. See Chapter 5, 'SAS Variables,' on page 77 for details about automatic variables.
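One way to inspect the PDV yourself is to write it to the SAS log with PUT _ALL_, which lists every variable, including the automatic variables _N_ and _ERROR_. The sketch below adds such a statement to the sample step so that it runs once, before the first INPUT statement executes. It then runs PROC CONTENTS, where TeamName, _N_, and _ERROR_ would not be expected to appear because they are not written to the data set. The extra statements are additions for illustration only.

```sas
data total_points (drop=TeamName);
   /* On the first iteration only, dump the whole PDV to the log
      before INPUT has read any values */
   if _n_ = 1 then put _all_;
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   TeamTotal + (Event1 + Event2 + Event3);
   datalines;
Knights Sue    6   8  8
Cardinals Jane 9   7  8
Knights John   7   7  7
Knights Lisa   8   9  9
Knights Fran   7   6  6
Knights Walter 9   8 10
;
run;

/* List the variables that were actually stored in TOTAL_POINTS */
proc contents data=total_points;
run;
```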

Reading a Record

SAS reads the first data line into the input buffer. The input pointer, which SAS uses to keep its place as it reads data from the input buffer, is positioned at the beginning of the buffer, ready to read the data record. The following figure shows the position of the input pointer in the input buffer before SAS reads the data.

Figure 20.3: Position of the Pointer in the Input Buffer Before SAS Reads Data

The INPUT statement then reads data values from the record in the input buffer and writes them to the PDV, where they become variable values. The following figure shows both the position of the pointer in the input buffer and the values in the PDV after SAS reads the first record.

Figure 20.4: Values from the First Record are Read into the Program Data Vector
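The movement of the input pointer can also be observed in the log. The following sketch assumes the COLUMN= option of the INFILE statement, which stores the current pointer column in a variable (here the hypothetical name ptr), and the automatic variable _INFILE_, which refers to the contents of the input buffer. It reads the record in two stages so that the pointer position is reported partway through the line. No data set is created, because the step uses DATA _NULL_.

```sas
data _null_;
   /* COLUMN= keeps the current pointer column in ptr (hypothetical name) */
   infile datalines column=ptr;
   /* The trailing @ holds the record so that the next INPUT
      statement continues reading the same line */
   input TeamName $ @;
   put 'After TeamName:     ' ptr=;
   input ParticipantName $ Event1 Event2 Event3;
   put 'After the full read: ' ptr=;
   /* Show the record as it sits in the input buffer,
      followed by the values now held in the PDV */
   put 'Input buffer: ' _infile_;
   put TeamName= ParticipantName= Event1= Event2= Event3=;
   datalines;
Knights Sue    6   8  8
Cardinals Jane 9   7  8
;
run;
```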

After the INPUT statement reads a value for each variable, SAS executes the Sum statement. SAS computes a value for the variable TeamTotal and writes it to the PDV. The following figure shows the PDV with all of its values before SAS writes the observation to the data set.

Figure 20.5: Program Data Vector with Computed Value of the Sum Statement

Writing an Observation to the SAS Data Set

When SAS executes the last statement in the DATA step, all values in the PDV, except those marked to be dropped, are written as a single observation to the data set TOTAL_POINTS. The following figure shows the first observation in the TOTAL_POINTS data set.

Figure 20.6: The First Observation in Data Set TOTAL_POINTS
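The write described above happens implicitly, because the sample step contains no OUTPUT statement. A minimal sketch of the same behavior made explicit follows. The data set name total_points_output is hypothetical. When an OUTPUT statement is present, SAS writes observations only where that statement executes, so placing it last should give the same result as the implicit write.

```sas
data total_points_output (drop=TeamName);   /* hypothetical data set name */
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   TeamTotal + (Event1 + Event2 + Event3);
   /* Explicit OUTPUT: write the current PDV values (except dropped
      variables) as one observation; this replaces the automatic
      write at the end of the step */
   output;
   datalines;
Knights Sue    6   8  8
Cardinals Jane 9   7  8
Knights John   7   7  7
Knights Lisa   8   9  9
Knights Fran   7   6  6
Knights Walter 9   8 10
;
run;
```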

SAS then returns to the DATA statement to begin the next iteration. SAS resets the values in the PDV in the following way:

  • The values of variables created by the INPUT statement are set to missing.

  • The value created by the Sum statement is automatically retained.

  • The value of the automatic variable _N_ is incremented by 1, and the value of _ERROR_ is reset to 0.

The following figure shows the current values in the PDV.

Figure 20.7: Current Values in the Program Data Vector
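The reset rules listed above can be checked with a PUT statement placed at the top of the step, where it runs before INPUT refills the PDV. In this sketch, which creates no data set, ParticipantName and Event1 would be expected to appear as missing at the start of every iteration, while TeamTotal carries the running total forward and _N_ counts up. The message text is arbitrary.

```sas
data _null_;
   /* Runs at the top of every iteration, before INPUT executes */
   put 'Top of iteration: ' _n_= ParticipantName= Event1= TeamTotal=;
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   TeamTotal + (Event1 + Event2 + Event3);
   datalines;
Knights Sue    6   8  8
Cardinals Jane 9   7  8
Knights John   7   7  7
;
run;
```

The PUT statement should run once more than there are data lines, because the step begins a final iteration and ends only when INPUT finds no further record to read.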

Reading the Next Record

SAS reads the next record into the input buffer. The INPUT statement reads the data values from the input buffer and writes them to the PDV. The Sum statement adds the values of Event1, Event2, and Event3 to TeamTotal. The value of 2 for variable _N_ indicates that SAS is beginning the second iteration of the DATA step. The following figure shows the input buffer, the PDV for the second record, and the SAS data set with the first two observations.

Figure 20.8: Input Buffer, Program Data Vector, and First Two Observations

As SAS continues to read records, the value in TeamTotal grows larger as more participant scores are added to the variable. _N_ is incremented at the beginning of each iteration of the DATA step. This process continues until SAS reaches the end of the input file.

When the DATA Step Finishes Executing

The DATA step stops executing when the INPUT statement attempts to read a record and finds that no input records remain. You can use PROC PRINT to print the output in the TOTAL_POINTS data set:

Output 20.1: Output from the Walkthrough DATA Step

                     Total Team Scores                           1

          Participant                                    Team
 Obs        Name       Event1     Event2     Event3     Total

   1        Sue            6          8          8         22
   2        Jane           9          7          8         46
   3        John           7          7          7         67
   4        Lisa           8          9          9         93
   5        Fran           7          6          6        112
   6        Walter         9          8         10        139
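A listing like the one above could be requested with a step along the following lines. This is a minimal sketch that assumes the sample DATA step has already run in the same session, so that TOTAL_POINTS exists. The title text is taken from the output. The exact options used to produce the published listing are not shown in the source, so column headings and spacing may differ.

```sas
/* Assumes the TOTAL_POINTS data set created by the sample DATA step */
proc print data=total_points;
   title 'Total Team Scores';
run;
```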
 



SAS 9.1 Language Reference. Concepts
ISBN: 1590471989
Year: 2004
Pages: 255

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net