Statements


BY

Orders the output according to the BY groups.

See also: Creating Titles That Contain BY-Group Information on page 20

BY <DESCENDING> variable-1
   <... <DESCENDING> variable-n>
   <NOTSORTED>;

Required Arguments

variable

  • specifies the variable that the procedure uses to form BY groups. You can specify more than one variable. If you do not use the NOTSORTED option in the BY statement, then the observations in the data set must either be sorted by all the variables that you specify, or they must be indexed appropriately. Variables in a BY statement are called BY variables.

Options

DESCENDING

  • specifies that the observations are sorted in descending order by the variable that immediately follows the word DESCENDING in the BY statement.

NOTSORTED

  • specifies that observations are not necessarily sorted in alphabetic or numeric order. The observations are grouped in another way, for example, chronological order.

  • The requirement for ordering or indexing observations according to the values of BY variables is suspended for BY-group processing when you use the NOTSORTED option. In fact, the procedure does not use an index if you specify NOTSORTED. The procedure defines a BY group as a set of contiguous observations that have the same values for all BY variables. If observations with the same values for the BY variables are not contiguous, then the procedure treats each contiguous set as a separate BY group.

Note: You cannot use the NOTSORTED option in a PROC SORT step.

Note: You cannot use the GROUPFORMAT option, which is available in the BY statement in a DATA step, in a BY statement in any PROC step.
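As a minimal sketch of the NOTSORTED option described above, the following step prints data that are grouped in calendar order rather than alphabetic order. The MONTHLY data set and its variables are hypothetical and are used here only for illustration.

```sas
/* Hypothetical data: months appear in calendar order, so the values
   of Month are grouped but not sorted alphabetically. */
data monthly;
   input Month $ Sales;
   datalines;
Jan 100
Jan 120
Feb 90
Feb 95
Mar 110
;
run;

/* NOTSORTED tells the procedure to treat each run of contiguous
   observations with the same Month value as one BY group. */
proc print data=monthly noobs;
   by month notsorted;
run;
```

Without NOTSORTED, this step would be expected to stop with an error, because "Feb" follows "Jan" in the data but precedes it in sort order.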

BY-Group Processing

Procedures create output for each BY group. For example, the elementary statistics procedures and the scoring procedures perform separate analyses for each BY group. The reporting procedures produce a report for each BY group.

Note: All base SAS procedures except PROC PRINT process BY groups independently. PROC PRINT can report the number of observations in each BY group as well as the number of observations in all BY groups. Similarly, PROC PRINT can sum numeric variables in each BY group and across all BY groups.
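The PROC PRINT behavior in the note above can be sketched as follows, assuming the DEBATE data set that is used in the example later in this section. The N option and the SUM statement are standard PROC PRINT features; summing GPA is done here only to show where the totals appear, not because the total is meaningful.

```sas
/* N requests observation counts; with a BY statement, PROC PRINT can
   report them for each BY group and for all BY groups together. */
proc print data=debate noobs n;
   by year;
   sum gpa;   /* subtotal for each BY group plus a grand total */
run;
```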

You can use only one BY statement in each PROC step. When you use a BY statement, the procedure expects an input data set that is sorted by the order of the BY variables or one that has an appropriate index. If your input data set does not meet these criteria, then an error occurs. Either sort it with the SORT procedure or create an appropriate index on the BY variables.

Depending on the order of your data, you may need to use the NOTSORTED or DESCENDING option in the BY statement in the PROC step. For more information on

  • the BY statement, see SAS Language Reference: Dictionary.

  • PROC SORT, see Chapter 44, The SORT Procedure, on page 1017.

  • creating indexes, see INDEX CREATE Statement on page 341.
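A minimal sketch of the sort-then-process pattern described above, assuming the DEBATE data set from the example later in this section. The output data set name DEBATE_BY_GENDER is hypothetical.

```sas
/* Sort a copy of DEBATE so that it satisfies the BY statement below. */
proc sort data=debate out=debate_by_gender;
   by descending gender;
run;

/* The BY statement must describe the order that the data are actually
   in, so DESCENDING is repeated here. */
proc means data=debate_by_gender n mean maxdec=3;
   by descending gender;
   var gpa;
run;
```

In the PROC SORT step the BY statement requests an order; in the PROC MEANS step it declares the order that the data already have.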

Formatting BY-Variable Values

When a procedure is submitted with a BY statement, the following actions are taken with respect to processing of BY groups:

  1. The procedure determines whether the data is sorted by the internal (unformatted) values of the BY variable(s).

  2. The procedure determines whether a format has been applied to the BY variable(s). If the BY variable is numeric and has no user-applied format, then the BEST12. format is applied for the purpose of BY-group processing.

  3. The procedure continues adding observations to the current BY group until both the internal and the formatted values of the BY variable(s) change.

This process can have unexpected results if, for instance, nonconsecutive internal BY values share the same formatted value. In this case, the formatted value is represented in different BY groups. Alternatively, if different consecutive internal BY values share the same formatted value, then these observations are grouped into the same BY group.
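The following sketch illustrates both situations described above. The CODEFMT. format and the CODES data set are hypothetical; they are constructed so that consecutive internal values 1 and 2 share a formatted value and nonconsecutive values 2 and 4 do as well.

```sas
/* Hypothetical format: internal values 1, 2, and 4 share a label. */
proc format;
   value codefmt 1, 2 = 'Group A'
                 3    = 'Group B'
                 4    = 'Group A';
run;

/* Hypothetical data, already sorted by the internal values of Code. */
data codes;
   input Code Amount;
   datalines;
1 10
2 20
3 30
4 40
;
run;

proc print data=codes noobs;
   by code;
   format code codefmt.;
run;
```

Following the rules above, one would expect three BY groups: Group A (internal values 1 and 2), Group B (internal value 3), and a second Group A (internal value 4).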

Base SAS Procedures That Support the BY Statement

  • CALENDAR

  • CHART

  • COMPARE

  • CORR

  • FREQ

  • MEANS

  • PLOT

  • PRINT

  • RANK

  • REPORT (nonwindowing environment only)

  • SORT (required)

  • STANDARD

  • SUMMARY

  • TABULATE

  • TIMEPLOT

  • TRANSPOSE

  • UNIVARIATE

Note: In the SORT procedure, the BY statement specifies how to sort the data. With the other procedures, the BY statement specifies how the data are currently sorted.

Example

This example uses a BY statement in a PROC PRINT step. There is output for each value of the BY variable, Year. The DEBATE data set is created in Example: Temporarily Dissociating a Format from a Variable on page 29.

 options nodate pageno=1 linesize=64
         pagesize=40;

 proc print data=debate noobs;
    by year;
    title 'Printing of Team Members';
    title2 'by Year';
 run;

                     Printing of Team Members                  1
                             by Year

 ------------------------ Year=Freshman -------------------------

                   Name        Gender     GPA

                   Capiccio      m       3.598
                   Tucker        m       3.901

 ------------------------ Year=Sophomore ------------------------

                   Name        Gender     GPA

                   Bagwell       f       3.722
                   Berry         m       3.198
                   Metcalf       m       3.342

 ------------------------- Year=Junior --------------------------

                   Name        Gender     GPA

                   Gold          f       3.609
                   Gray          f       3.177
                   Syme          f       3.883

 ------------------------- Year=Senior --------------------------

                   Name        Gender     GPA

                   Baglione      f       4.000
                   Carr          m       3.750
                   Hall          m       3.574
                   Lewis         m       3.421

FREQ

Treats observations as if they appear multiple times in the input data set.

Tip: You can use a WEIGHT statement and a FREQ statement in the same step of any procedure that supports both statements.

FREQ variable ;

Required Arguments

variable

  • specifies a numeric variable whose value represents the frequency of the observation. If you use the FREQ statement, then the procedure assumes that each observation represents n observations, where n is the value of variable. If the value of variable is not an integer, then SAS truncates it. If the value of variable is less than 1 or is missing, then the procedure does not use that observation to calculate statistics. If a FREQ statement does not appear, then each observation has a default frequency of 1.

    The sum of the frequency variable represents the total number of observations.

Procedures That Support the FREQ Statement

  • CORR

  • MEANS/SUMMARY

  • REPORT

  • STANDARD

  • TABULATE

  • UNIVARIATE

Example

The data in this example represent a ship's course and speed (in nautical miles per hour), recorded every hour. The frequency variable, Hours, represents the number of hours that the ship maintained the same course and speed. Each of the following PROC MEANS steps calculates average course and speed. The different results demonstrate the effect of using Hours as a frequency variable.

The following PROC MEANS step does not use a frequency variable:

 options nodate pageno=1 linesize=64 pagesize=40;

 data track;
    input Course Speed Hours @@;
    datalines;
 30  4  8 50 7 20
 75 10 30 30 8 10
 80  9 22 20 8 25
 83 11  6 20 6 20
 ;

 proc means data=track maxdec=2 n mean;
    var course speed;
    title 'Average Course and Speed';
 run;

Without a frequency variable, each observation has a frequency of 1, and the total number of observations is 8.

                 Average Course and Speed                      1

                    The MEANS Procedure

              Variable    N           Mean
              -----------------------------
              Course      8          48.50
              Speed       8           7.88
              -----------------------------

The second PROC MEANS step uses Hours as a frequency variable:

 proc means data=track maxdec=2 n mean;
    var course speed;
    freq hours;
    title 'Average Course and Speed';
 run;

When you use Hours as a frequency variable, the frequency of each observation is the value of Hours, and the total number of observations is 141 (the sum of the values of the frequency variable).

                 Average Course and Speed                      1

                    The MEANS Procedure

         Variable                N           Mean
         ----------------------------------------
         Course                141          49.28
         Speed                 141           8.06
         ----------------------------------------
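One way to see what the FREQ statement does is to expand the data by hand. The sketch below writes each observation of TRACK once for every hour that it represents and then runs PROC MEANS without a FREQ statement. The data set name TRACK_EXPANDED and the loop variable i are hypothetical.

```sas
/* Write each observation Hours times, so that the expanded data set
   has one observation per hour (141 in total). */
data track_expanded;
   set track;
   do i = 1 to hours;
      output;
   end;
   drop i;
run;

/* No FREQ statement: every observation now has a frequency of 1. */
proc means data=track_expanded maxdec=2 n mean;
   var course speed;
   title 'Average Course and Speed (Expanded Data)';
run;
```

This step should report the same N and means as the step that uses FREQ HOURS, which is why the FREQ statement is described as treating observations as if they appear multiple times in the input data set.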

QUIT

Executes any statements that have not executed and ends the procedure.

QUIT ;

Procedures That Support the QUIT Statement

  • CATALOG

  • DATASETS

  • PLOT

  • PMENU

  • SQL
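As a short illustration, the following sketch ends a PROC DATASETS step with QUIT. It assumes that a data set named DEBATE exists in the WORK library, as in the examples elsewhere in this section.

```sas
proc datasets library=work nolist;
   contents data=debate;   /* pending statement */
quit;                      /* executes the CONTENTS statement, then ends the procedure */
```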

WEIGHT

Specifies weights for analysis variables in the statistical calculations.

Tip: You can use a WEIGHT statement and a FREQ statement in the same step of any procedure that supports both statements.

WEIGHT variable ;

Required Arguments

variable

  • specifies a numeric variable whose values weight the values of the analysis variables. The values of the variable do not have to be integers. The behavior of the procedure when it encounters a nonpositive weight variable value is as follows:

    Weight value    The procedure
    0               counts the observation in the total number of observations
    less than 0     converts the weight value to zero and counts the observation in the total number of observations
    missing         excludes the observation from the analysis

    Different behavior for nonpositive values is discussed in the WEIGHT statement syntax under the individual procedure.

    Prior to Version 7 of SAS, no base SAS procedure excluded the observations with missing weights from the analysis. Most SAS/STAT procedures, such as PROC GLM, have always excluded not only missing weights but also negative and zero weights from the analysis. You can achieve this same behavior in a base SAS procedure that supports the WEIGHT statement by using the EXCLNPWGT option in the PROC statement.

    The procedure substitutes the value of the WEIGHT variable for $w_i$, which appears in Keywords and Formulas on page 1354.
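The handling of nonpositive weights can be sketched with a small, hypothetical data set (SCORES, with variables Score and Wt) that contains a zero weight, a negative weight, and a missing weight. The two steps differ only in the EXCLNPWGT option.

```sas
/* Hypothetical data: weights of 2, 0, -1, missing, and 1. */
data scores;
   input Score Wt;
   datalines;
10 2
20 0
30 -1
40 .
50 1
;
run;

/* Default behavior: only the observation with a missing weight is
   excluded; zero and negative weights are counted with weight 0. */
proc means data=scores n mean sumwgt;
   weight wt;
   var score;
run;

/* EXCLNPWGT: observations with nonpositive weights are excluded too. */
proc means data=scores n mean sumwgt exclnpwgt;
   weight wt;
   var score;
run;
```

Based on the rules above, the first step would be expected to report N=4 and the second N=2, while the weighted mean is the same in both because the zero-weight observations contribute nothing to it.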

Procedures That Support the WEIGHT Statement

  • CORR

  • FREQ

  • MEANS/SUMMARY

  • REPORT

  • STANDARD

  • TABULATE

  • UNIVARIATE

Note: In PROC FREQ, the value of the variable in the WEIGHT statement represents the frequency of occurrence for each observation. See the PROC FREQ documentation in Volume 3 of this book for more information.

Calculating Weighted Statistics

The procedures that support the WEIGHT statement also support the VARDEF= option, which lets you specify a divisor to use in the calculation of the variance and standard deviation.

By using a WEIGHT statement to compute moments, you assume that the $i$th observation has a variance equal to $\sigma^2/w_i$. When you specify VARDEF=DF (the default), the computed variance is a weighted least squares estimate of $\sigma^2$. Similarly, the computed standard deviation is an estimate of $\sigma$. Note that the computed variance is not an estimate of the variance of the $i$th observation, because this variance involves the observation's weight, which varies from observation to observation.

If the values of your variable are counts that represent the number of occurrences of each observation, then use this variable in the FREQ statement rather than in the WEIGHT statement. In this case, because the values are counts, they should be integers. (The FREQ statement truncates any noninteger values.) The variance that is computed with a FREQ variable is an estimate of the common variance, $\sigma^2$, of the observations.

Note: If your data come from a stratified sample where the weights represent the strata weights, then neither the WEIGHT statement nor the FREQ statement provides appropriate stratified estimates of the mean, variance, or variance of the mean. To perform the appropriate analysis, consider using PROC SURVEYMEANS, which is a SAS/STAT procedure that is documented in the SAS/STAT User's Guide.

Weighted Statistics Example

As an example of the WEIGHT statement, suppose 20 people are asked to estimate the size of an object 30 cm wide. Each person is placed at a different distance from the object. As the distance from the object increases, the estimates should become less precise.

The SAS data set SIZE contains the estimate (ObjectSize) in centimeters at each distance (Distance) in meters and the precision (Precision) for each estimate. Notice that the largest deviation (an overestimate by 20 cm) came at the greatest distance (7.5 meters from the object). As a measure of precision, 1/Distance gives more weight to estimates that were made closer to the object and less weight to estimates that were made at greater distances.

The following statements create the data set SIZE:

 options nodate pageno=1 linesize=64 pagesize=60;

 data size;
    input Distance ObjectSize @@;
    Precision=1/distance;
    datalines;
 1.5 30 1.5 20 1.5 30 1.5 25
 3   43 3   33 3   25 3   30
 4.5 25 4.5 36 4.5 48 4.5 33
 6   43 6   36 6   23 6   48
 7.5 30 7.5 25 7.5 50 7.5 38
 ;

The following PROC MEANS step computes the average estimate of the object size while ignoring the weights. Without a WEIGHT variable, PROC MEANS uses the default weight of 1 for every observation. Thus, the estimates of object size at all distances are given equal weight. The average estimate of the object size exceeds the actual size by 3.55 cm.

 proc means data=size maxdec=3 n mean var stddev;
    var objectsize;
    title1 'Unweighted Analysis of the SIZE Data Set';
 run;

         Unweighted Analysis of the SIZE Data Set          1

                    The MEANS Procedure

               Analysis Variable : ObjectSize

      N            Mean        Variance        Std Dev
     -------------------------------------------------
     20          33.550          80.892          8.994
     -------------------------------------------------

The next two PROC MEANS steps use the precision measure (Precision) in the WEIGHT statement and show the effect of using different values of the VARDEF= option. The first PROC step creates an output data set that contains the variance and standard deviation. If you reduce the weighting of the estimates that are made at greater distances, the weighted average estimate of the object size is closer to the actual size.

 proc means data=size maxdec=3 n mean var stddev;
    weight precision;
    var objectsize;
    output out=wtstats var=Est_SigmaSq std=Est_Sigma;
    title1 'Weighted Analysis Using Default VARDEF=DF';
 run;

 proc means data=size maxdec=3 n mean var std
            vardef=weight;
    weight precision;
    var objectsize;
    title1 'Weighted Analysis Using VARDEF=WEIGHT';
 run;

In the first PROC MEANS step, the variance is an estimate of $\sigma^2$, where the variance of the $i$th observation is assumed to be $\operatorname{var}(x_i) = \sigma^2/w_i$ and $w_i$ is the weight for the $i$th observation. In the second PROC MEANS step, the computed variance is an estimate of $\frac{n-1}{n}\,\sigma^2/\bar{w}$, where $\bar{w}$ is the average weight. For large $n$, this is an approximate estimate of the variance of an observation with average weight.

         Weighted Analysis Using Default VARDEF=DF          1

                    The MEANS Procedure

               Analysis Variable : ObjectSize

      N            Mean        Variance        Std Dev
     -------------------------------------------------
     20          31.088          20.678          4.547
     -------------------------------------------------

           Weighted Analysis Using VARDEF=WEIGHT            2

                    The MEANS Procedure

               Analysis Variable : ObjectSize

      N            Mean        Variance        Std Dev
     -------------------------------------------------
     20          31.088          64.525          8.033
     -------------------------------------------------
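The relationship between the two variances can be written out explicitly. The following is a sketch that uses the standard weighted-moment definitions, with $x_i$ the ObjectSize values, $w_i$ the Precision values, and $n = 20$. The subscripts DF and WEIGHT are labels introduced here for illustration.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
s^2_{\mathrm{DF}} = \frac{\sum_{i=1}^{n} w_i \,(x_i - \bar{x}_w)^2}{n - 1},
\qquad
s^2_{\mathrm{WEIGHT}} = \frac{\sum_{i=1}^{n} w_i \,(x_i - \bar{x}_w)^2}{\sum_{i=1}^{n} w_i}

% Both variances share the same numerator, so
s^2_{\mathrm{WEIGHT}} = s^2_{\mathrm{DF}} \cdot \frac{n - 1}{\sum_{i=1}^{n} w_i}
```

For the SIZE data, $\sum w_i = 4\,(1/1.5 + 1/3 + 1/4.5 + 1/6 + 1/7.5) \approx 6.089$, so $20.678 \times 19 / 6.089 \approx 64.52$, which agrees with the VARDEF=WEIGHT variance of 64.525 up to rounding of the printed values.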

The following statements create and print a data set with the weighted variance and weighted standard deviation of each observation. The DATA step combines the output data set that contains the variance and the standard deviation from the weighted analysis with the original data set. The variance of each observation is computed by dividing Est_SigmaSq, the estimate of $\sigma^2$ from the weighted analysis when VARDEF=DF, by each observation's weight (Precision). The standard deviation of each observation is computed by dividing Est_Sigma, the estimate of $\sigma$ from the weighted analysis when VARDEF=DF, by the square root of each observation's weight (Precision).

 data wtsize(drop=_freq_ _type_);
    set size;
    if _n_=1 then set wtstats;
    Est_VarObs=est_sigmasq/precision;
    Est_StdObs=est_sigma/sqrt(precision);

 proc print data=wtsize noobs;
    title 'Weighted Statistics';
    by distance;
    format est_varobs est_stdobs
           est_sigmasq est_sigma precision 6.3;
 run;

                       Weighted Statistics                     4

 ------------------------- Distance=1.5 ------------------------

  Object                  Est_        Est_     Est_     Est_
    Size    Precision     SigmaSq     Sigma    VarObs   StdObs

      30       0.667       20.678      4.547    31.017    5.569
      20       0.667       20.678      4.547    31.017    5.569
      30       0.667       20.678      4.547    31.017    5.569
      25       0.667       20.678      4.547    31.017    5.569

 -------------------------- Distance=3 -------------------------

  Object                  Est_        Est_     Est_     Est_
    Size    Precision     SigmaSq     Sigma    VarObs   StdObs

      43       0.333       20.678      4.547    62.035    7.876
      33       0.333       20.678      4.547    62.035    7.876
      25       0.333       20.678      4.547    62.035    7.876
      30       0.333       20.678      4.547    62.035    7.876

 ------------------------- Distance=4.5 ------------------------

  Object                  Est_        Est_     Est_     Est_
    Size    Precision     SigmaSq     Sigma    VarObs   StdObs

      25       0.222       20.678      4.547    93.052    9.646
      36       0.222       20.678      4.547    93.052    9.646
      48       0.222       20.678      4.547    93.052    9.646
      33       0.222       20.678      4.547    93.052    9.646

 -------------------------- Distance=6 -------------------------

  Object                  Est_        Est_     Est_     Est_
    Size    Precision     SigmaSq     Sigma    VarObs   StdObs

      43       0.167       20.678      4.547    124.07   11.139
      36       0.167       20.678      4.547    124.07   11.139
      23       0.167       20.678      4.547    124.07   11.139
      48       0.167       20.678      4.547    124.07   11.139

 ------------------------- Distance=7.5 ------------------------

  Object                  Est_        Est_     Est_     Est_
    Size    Precision     SigmaSq     Sigma    VarObs   StdObs

      30       0.133       20.678      4.547    155.09   12.453
      25       0.133       20.678      4.547    155.09   12.453
      50       0.133       20.678      4.547    155.09   12.453
      38       0.133       20.678      4.547    155.09   12.453

WHERE

Subsets the input data set by specifying certain conditions that each observation must meet before it is available for processing.

WHERE where-expression ;

Required Arguments

where-expression

  • is a valid arithmetic or logical expression that generally consists of a sequence of operands and operators. See SAS Language Reference: Dictionary for more information on WHERE processing.
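As a brief illustration of a WHERE expression that combines operands and operators, the following sketch assumes the DEBATE data set used in this section, with Gender stored as 'm' or 'f' and GPA stored as a numeric value.

```sas
/* Select female team members whose GPA is between 3.5 and 3.9. */
proc print data=debate noobs;
   where gender='f' and gpa between 3.5 and 3.9;
run;
```

The expression is evaluated for each observation before the procedure processes it, so only matching observations reach PROC PRINT.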

Procedures That Support the WHERE Statement

You can use the WHERE statement with any of the following base SAS procedures that read a SAS data set:

  • CALENDAR

  • CHART

  • COMPARE

  • CORR

  • DATASETS (APPEND statement)

  • FREQ

  • MEANS/SUMMARY

  • PLOT

  • PRINT

  • RANK

  • REPORT

  • SORT

  • SQL

  • STANDARD

  • TABULATE

  • TIMEPLOT

  • TRANSPOSE

  • UNIVARIATE

Details

  • The CALENDAR and COMPARE procedures and the APPEND statement in PROC DATASETS accept more than one input data set. See the documentation for the specific procedure for more information.

  • To subset the output data set, use the WHERE= data set option:

     proc report data=debate nowd
                 out=onlyfr(where=(year='1'));
     run;

    For more information on WHERE=, see SAS Language Reference: Dictionary.

Example

In this example, PROC PRINT prints only those observations that meet the condition of the WHERE expression. The DEBATE data set is created in Example: Temporarily Dissociating a Format from a Variable on page 29.

 options nodate pageno=1 linesize=64
         pagesize=40;

 proc print data=debate noobs;
    where gpa>3.5;
    title 'Team Members with a GPA';
    title2 'Greater than 3.5';
 run;

                     Team Members with a GPA                    1
                         Greater than 3.5

            Name        Gender    Year           GPA

            Capiccio      m       Freshman      3.598
            Tucker        m       Freshman      3.901
            Bagwell       f       Sophomore     3.722
            Gold          f       Junior        3.609
            Syme          f       Junior        3.883
            Baglione      f       Senior        4.000
            Carr          m       Senior        3.750
            Hall          m       Senior        3.574



Base SAS 9.1.3 Procedures Guide (Vol. 1)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
EAN: 2147483647
Year: 2004
Pages: 260
