Chapter 22: BY-Group Processing in the DATA Step


Definitions for BY-Group Processing

BY-group processing

  • is a method of processing observations from one or more SAS data sets that are grouped or ordered by values of one or more common variables. The most common use of BY-group processing in the DATA step is to combine two or more SAS data sets by using the BY statement with a SET, MERGE, MODIFY, or UPDATE statement.

BY variable

  • names a variable or variables by which the data set is sorted or indexed. All data sets must be ordered or indexed on the values of the BY variable if you use the SET, MERGE, or UPDATE statements. If you use MODIFY, data does not need to be ordered. However, your program might run more efficiently with ordered data. All data sets that are being combined must include one or more BY variables. The position of the BY variable in the observations does not matter.
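As a minimal sketch of the ordering requirement, the following steps sort two data sets on a shared BY variable and then match-merge them. The data set names (WORK.NAMES, WORK.SCORES, WORK.COMBINED), the variable ID, and the data values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical sample data: both data sets contain the BY variable ID */
data work.names;
   input id name $;
   datalines;
3 Carol
1 Alice
2 Bob
;
run;

data work.scores;
   input id score;
   datalines;
2 88
1 92
3 75
;
run;

/* MERGE with BY needs each input ordered (or indexed) on ID */
proc sort data=work.names;
   by id;
run;

proc sort data=work.scores;
   by id;
run;

/* Match-merge: observations with the same ID are combined */
data work.combined;
   merge work.names work.scores;
   by id;
run;

proc print data=work.combined noobs;
run;
```

In this sketch, ID is the second variable read in neither data set but could appear anywhere in the observation; only its presence and the ordering of its values matter to the merge.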

BY value

  • is the value or formatted value of the BY variable.

BY group

  • includes all observations with the same BY value. If you use more than one variable in a BY statement, a BY group is a group of observations with the same combination of values for these variables. Each BY group has a unique combination of values for the variables.

FIRST.variable and LAST.variable

  • are variables that SAS creates for each BY variable. SAS sets FIRST.variable to 1 when it is processing the first observation in a BY group, and sets LAST.variable to 1 when it is processing the last observation in a BY group; otherwise, each has a value of 0. These assignments enable you to take different actions, based on whether processing is starting for a new BY group or ending for a BY group. For more information, see 'How the DATA Step Identifies BY Groups' on page 380.
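One common way to use these variables is to accumulate a total within each BY group. The following is a sketch under assumed names: the data set WORK.SALES and the variables REGION, AMOUNT, and TOTAL are hypothetical.

```sas
/* Hypothetical sample data */
data work.sales;
   input region $ amount;
   datalines;
East 100
East 250
West 75
West 125
West 300
;
run;

/* SET with BY requires the data to be ordered on REGION */
proc sort data=work.sales;
   by region;
run;

data work.region_totals;
   set work.sales;
   by region;
   if first.region then total = 0;   /* new BY group: reset the accumulator */
   total + amount;                   /* sum statement: TOTAL is retained across observations */
   if last.region then output;       /* end of BY group: write one summary observation */
   keep region total;
run;
```

With this sample data, WORK.REGION_TOTALS contains one observation per region: East with a total of 350 and West with a total of 500. FIRST.REGION and LAST.REGION themselves are not written to the output data set.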

For more information about BY-group processing, see Chapter 23, 'Reading, Combining, and Modifying SAS Data Sets,' on page 387. See also Combining and Modifying SAS Data Sets: Examples.

Syntax for BY-Group Processing

Use one of the following forms for BY-group processing:

BY variable(s);

BY <DESCENDING> variable(s) <NOTSORTED> <GROUPFORMAT>;

where

variable

  • names each variable by which the data set is sorted or indexed.

    Note: All data sets must be ordered or indexed on the values of the BY variable if you process them using the SET, MERGE, or UPDATE statements. If you use the MODIFY statement, your data does not need to be ordered. However, your program might run more efficiently with ordered data. All data sets that are being combined must include the BY variable or variables. The position of the BY variable in the observations does not matter.

GROUPFORMAT

  • uses the formatted values, instead of the internal values, of the BY variables to determine where BY groups begin and end, and therefore how FIRST.variable and LAST.variable are assigned. Although the GROUPFORMAT option can appear anywhere in the BY statement, the option applies to all variables in the BY statement.
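The effect of GROUPFORMAT can be sketched with a user-defined format that collapses many internal values into a few formatted values. The format name PASSFAIL, the data set WORK.EXAM, and the variables NAME, SCORE, and N are assumptions made for this illustration.

```sas
/* Hypothetical format: two formatted values cover all scores */
proc format;
   value passfail
      low -< 60 = 'Fail'
      60 - high = 'Pass';
run;

data work.exam;
   input name $ score;
   datalines;
Ann 42
Ben 58
Cal 60
Dee 77
Eve 91
;
run;

/* Sorting on the internal values keeps each formatted group contiguous */
proc sort data=work.exam;
   by score;
run;

data work.result_counts;
   set work.exam;
   format score passfail.;
   by score groupformat;          /* BY groups follow 'Fail' and 'Pass', not each distinct score */
   if first.score then n = 0;     /* first observation of a formatted group */
   n + 1;                         /* count observations in the group */
   if last.score then output;     /* last observation of a formatted group */
   keep score n;
run;
```

With this sample data, the output has two observations: the Fail group with N equal to 2 and the Pass group with N equal to 3. Without GROUPFORMAT, each of the five distinct scores would form its own BY group.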

DESCENDING

  • indicates that the data sets are sorted in descending order (largest to smallest) by the variable that is specified. If you have more than one variable in the BY statement, DESCENDING applies only to the variable that immediately follows it.
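As an illustration of how DESCENDING applies to a single variable, the following sketch orders a hypothetical data set by TEAM in ascending order and by SCORE in descending order within each team. The names WORK.TEAM_SCORES, WORK.TOP_SCORE, TEAM, and SCORE are assumptions.

```sas
/* Hypothetical sample data */
data work.team_scores;
   input team $ score;
   datalines;
Blue 70
Red 95
Blue 88
Red 60
;
run;

/* TEAM ascending; DESCENDING applies only to SCORE */
proc sort data=work.team_scores;
   by team descending score;
run;

/* The BY statement in the DATA step describes the same ordering */
data work.top_score;
   set work.team_scores;
   by team descending score;
   if first.team;   /* subsetting IF: keep the first (highest-scoring) observation per team */
run;
```

With this sample data, WORK.TOP_SCORE keeps the observation with a score of 88 for Blue and the observation with a score of 95 for Red.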

NOTSORTED

  • specifies that observations with the same BY value are grouped together but are not necessarily stored in alphabetical or numeric order.
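A typical case for NOTSORTED is data that is grouped in a meaningful order that is not alphabetical, such as calendar months. The following sketch uses a hypothetical data set WORK.MONTHLY with the variables MONTH, AMOUNT, and TOTAL.

```sas
/* Hypothetical sample data: grouped by month, but Jan, Feb, Mar is not alphabetical order */
data work.monthly;
   input month $ amount;
   datalines;
Jan 10
Jan 20
Feb 5
Mar 7
Mar 8
;
run;

data work.month_totals;
   set work.monthly;
   by month notsorted;              /* groups are contiguous but not in sorted order */
   if first.month then total = 0;   /* reset at the start of each group */
   total + amount;                  /* accumulate within the group */
   if last.month then output;       /* one summary observation per group */
   keep month total;
run;
```

With this sample data, the output contains Jan with a total of 30, Feb with a total of 5, and Mar with a total of 15, in the original group order. Note that NOTSORTED cannot be used with the MERGE or UPDATE statements.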

For complete information about the BY statement, see SAS Language Reference: Dictionary.




SAS 9.1 Language Reference. Concepts
ISBN: 1590471989
Year: 2004
Pages: 255
