To conditionally select observations from a SAS data set, you can use either a WHERE expression or a subsetting IF statement. While they both test a condition to determine if SAS should process an observation, they differ as follows :
The subsetting IF statement can be used only in a DATA step. A subsetting IF statement tests the condition after an observation is read into the Program Data Vector (PDV). If the condition is true, SAS continues processing the current observation. Otherwise, the observation is discarded, and processing continues with the next observation.
You can use a WHERE expression in both a DATA step and SAS procedures, as well as in a windowing environment, SCL programs, and as a data set option. A WHERE expression tests the condition before an observation is read into the PDV. If the condition is true, the observation is read into the PDV and processed . If the condition is false, the observation is not read into the PDV, and processing continues with the next observation, which can yield substantial savings when observations contain many variables or very long character variables (up to 32K bytes). Additionally, a WHERE expression can be optimized with an index, and the WHERE expression allows more operators, such as LIKE and CONTAINS.
Note | Although it is generally more efficient to use a WHERE expression and avoid the move to the PDV prior to processing, if the data set contains observations with very few variables, the move to the PDV could be cheap. However, one variable containing 32K bytes of character data is not cheap, even though it is only one variable. |
In most cases, you can use either method. However, the following table provides a list of tasks that require you to use a specific method:
Task | Method |
---|---|
Make the selection in a procedure without using a preceding DATA step | WHERE expression |
Take advantage of the efficiency available with an indexed data set | WHERE expression |
Use one of a group of special operators, such as BETWEEN-AND, CONTAINS, IS MISSING or IS NULL, LIKE, SAME-AND, and Sounds-Like | WHERE expression |
Base the selection on anything other than a variable value that already exists in a SAS data set. For example, you can select a value that is read from raw data, or a value that is calculated or assigned during the course of the DATA step | subsetting IF |
Make the selection at some point during a DATA step rather than at the beginning | subsetting IF |
Execute the selection conditionally | subsetting IF |