How the DATA Step Identifies BY Groups


Processing Observations in a BY Group

In the DATA step, SAS identifies the beginning and end of each BY group by creating two temporary variables for each BY variable: FIRST.variable and LAST.variable. These temporary variables are available for DATA step programming but are not added to the output data set. Their values indicate whether an observation is

  • the first one in a BY group

  • the last one in a BY group

  • neither the first nor the last one in a BY group

  • both first and last, as is the case when there is only one observation in a BY group.

You can take actions conditionally, based on whether you are processing the first or the last observation in a BY group.
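As a minimal sketch of such conditional processing, the following DATA step totals a value within each BY group and writes one observation per group. The data set names SALES and REGION_TOTALS and the variables Region, Amount, and Total are hypothetical, chosen only for illustration.

```sas
/* Hypothetical input: one observation per sale, already grouped by Region. */
data sales;
   input Region $ Amount;
   datalines;
East 100
East 250
West 75
;

data region_totals (keep=Region Total);
   set sales;
   by Region;
   if first.Region then Total = 0;   /* reset at the start of each BY group      */
   Total + Amount;                   /* sum statement: Total is retained          */
   if last.Region then output;       /* write one observation at the end of group */
run;
```

With these three input observations, REGION_TOTALS should contain two observations: East with Total=350 and West with Total=75. West is a single-observation BY group, so both FIRST.Region and LAST.Region are 1 on that observation.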

How SAS Determines FIRST.VARIABLE and LAST.VARIABLE

When an observation is the first in a BY group, SAS sets the value of FIRST.variable to 1 for the variable whose value changed, as well as for all of the variables that follow it in the BY statement. For all other observations in the BY group, the value of FIRST.variable is 0. Likewise, if the observation is the last in a BY group, SAS sets the value of LAST.variable to 1 for the variable whose value changes on the next observation, as well as for all of the variables that follow it in the BY statement. For all other observations in the BY group, the value of LAST.variable is 0. For the last observation in a data set, every LAST.variable is set to 1.

Grouping Observations by State, City, Zip Code, and Street

This example shows how SAS uses FIRST.variable and LAST.variable to flag the beginning and end of BY groups. The BY variables are State, City, and ZipCode, and they divide the seven observations into four BY groups; Street is not a BY variable. Because there are three BY variables, six temporary variables are created within the program data vector. These variables can be used during the DATA step, but they do not become variables in the new data set.

In the figure that follows, observations in the SAS data set are arranged in an order that can be used with this BY statement:

 by State City ZipCode; 

SAS creates the following temporary variables: FIRST.State, LAST.State, FIRST.City, LAST.City, FIRST.ZipCode, and LAST.ZipCode.

Observations in Four BY Groups      Corresponding FIRST. and LAST. Values

State  City    ZipCode  Street      FIRST.State  LAST.State  FIRST.City  LAST.City  FIRST.ZipCode  LAST.ZipCode
AZ     Tucson  85730    Glen Pl     1            1           1           1          1              1
FL     Miami   33133    Rice St     1            0           1           0          1              0
FL     Miami   33133    Tom Ave     0            0           0           0          0              0
FL     Miami   33133    Surrey Dr   0            0           0           0          0              1
FL     Miami   33146    Nervia St   0            0           0           0          1              0
FL     Miami   33146    Corsica St  0            1           0           1          0              1
OH     Miami   45056    Myrtle St   1            1           1           1          1              1
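The FIRST. and LAST. values for this example can be inspected with a sketch like the following. Because the temporary variables are dropped from the output data set, the step copies them into ordinary variables. The data set names ADDRESSES and FLAGS, the column positions in the INPUT statement, and the variable names FirstState through LastZip are assumptions made for illustration.

```sas
/* Hypothetical data set holding the seven observations, in State order. */
data addresses;
   input State $ 1-2 City $ 4-9 ZipCode $ 11-15 Street $ 17-26;
   datalines;
AZ Tucson 85730 Glen Pl
FL Miami  33133 Rice St
FL Miami  33133 Tom Ave
FL Miami  33133 Surrey Dr
FL Miami  33146 Nervia St
FL Miami  33146 Corsica St
OH Miami  45056 Myrtle St
;

data flags;
   set addresses;
   by State City ZipCode;
   /* Copy the temporary variables into ordinary (hypothetical) variables */
   /* so that they are kept in the output data set.                       */
   FirstState = first.State;    LastState = last.State;
   FirstCity  = first.City;     LastCity  = last.City;
   FirstZip   = first.ZipCode;  LastZip   = last.ZipCode;
run;

proc print data=flags noobs;
run;
```

Each copied variable holds 1 or 0 according to the rules described in How SAS Determines FIRST.VARIABLE and LAST.VARIABLE.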

Grouping Observations by City, State, Zip Code, and Street

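This example shows how SAS uses FIRST.variable and LAST.variable to flag the beginning and end of BY groups when the BY variables are listed in a different order. The BY variables are City, State, and ZipCode, and they again divide the seven observations into four BY groups; Street is not a BY variable. Because there are three BY variables, six temporary variables are created within the program data vector. These variables can be used during the DATA step, but they do not become variables in the new data set.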

In the figure that follows, observations in the SAS data set are arranged in an order that can be used with this BY statement:

 by City State ZipCode; 

SAS creates the following temporary variables: FIRST.City, LAST.City, FIRST.State, LAST.State, FIRST.ZipCode, and LAST.ZipCode.

Observations in Four BY Groups      Corresponding FIRST. and LAST. Values

City    State  ZipCode  Street      FIRST.City  LAST.City  FIRST.State  LAST.State  FIRST.ZipCode  LAST.ZipCode
Miami   FL     33133    Rice St     1           0          1            0           1              0
Miami   FL     33133    Tom Ave     0           0          0            0           0              0
Miami   FL     33133    Surrey Dr   0           0          0            0           0              1
Miami   FL     33146    Nervia St   0           0          0            0           1              0
Miami   FL     33146    Corsica St  0           0          0            1           0              1
Miami   OH     45056    Myrtle St   0           1          1            1           1              1
Tucson  AZ     85730    Glen Pl     1           1          1            1           1              1
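Observations must be arranged in City, State, ZipCode order before this BY statement can be used. The sketch below assumes a data set named ADDRESSES that holds the seven observations in State order. The data set names, the column positions in the INPUT statement, and the choice to write the values to the log with PUT are all assumptions made for illustration.

```sas
/* Hypothetical data set holding the seven observations, in State order. */
data addresses;
   input State $ 1-2 City $ 4-9 ZipCode $ 11-15 Street $ 17-26;
   datalines;
AZ Tucson 85730 Glen Pl
FL Miami  33133 Rice St
FL Miami  33133 Tom Ave
FL Miami  33133 Surrey Dr
FL Miami  33146 Nervia St
FL Miami  33146 Corsica St
OH Miami  45056 Myrtle St
;

/* Rearrange the observations to match the new BY statement. Without  */
/* this step, the DATA step below would stop with an error because    */
/* Tucson precedes Miami in the State-ordered data.                   */
proc sort data=addresses out=by_city;
   by City State ZipCode;
run;

data _null_;
   set by_city;
   by City State ZipCode;
   put City= State= ZipCode=
       first.City= last.City=
       first.State= last.State=
       first.ZipCode= last.ZipCode=;
run;
```

Note that on the Miami, OH observation FIRST.City is 0 because the value of City did not change, while FIRST.State and FIRST.ZipCode are both 1.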

Grouping Observations: Another Example

The value of FIRST.variable can be affected by a change in the value of a BY variable that precedes it in the BY statement, even if the current value of the variable itself remains the same.

In this example, the values of FIRST.variable and LAST.variable depend on the sort order, and not just on the value of the BY variable. For observation 3, the value of FIRST.Y is set to 1 because BLUEBERRY is a new value for Y. This change in Y causes FIRST.Z to be set to 1 as well, even though the value of Z did not change.

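options pageno=1 nodate linesize=80 pagesize=60;

/* Four observations; observation 3 has a new value of Y but the same Z. */
data testfile;
   input x $ y $ 9-17 z $ 19-26;
   datalines;
apple   banana    coconut
apple   banana    coconut
apple   blueberry coconut
apricot blueberry citron
;

/* Write the FIRST. and LAST. values to the log, grouping by X, Y, Z. */
data _null_;
   set testfile;
   by x y z;
   if _N_=1 then put 'Grouped by X Y Z';
   put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
run;

/* Same data, but with the BY variables listed in the order Y, X, Z. */
data _null_;
   set testfile;
   by y x z;
   if _N_=1 then put 'Grouped by Y X Z';
   put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
run;
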
Output 22.1: Partial SAS Log Showing the Results of Processing with BY Variables

Grouped by X Y Z
_N_=1 x=Apple FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 x=Apple FIRST.x=0 LAST.x=0 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 x=Apple FIRST.x=0 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1
_N_=4 x=Apricot FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1

Grouped by Y X Z
_N_=1 x=Apple FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0
_N_=2 x=Apple FIRST.x=0 LAST.x=1 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1
_N_=3 x=Apple FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=1
_N_=4 x=Apricot FIRST.x=1 LAST.x=1 FIRST.y=0 LAST.y=1 FIRST.z=1 LAST.z=1
 



SAS 9.1 Language Reference. Concepts
ISBN: 1590471989
Year: 2004
Pages: 255
