Syntax


The following statements are available in the STDIZE procedure.

  • PROC STDIZE < options > ;

    • BY variables ;

    • FREQ variable ;

    • LOCATION variables ;

    • SCALE variables ;

    • VAR variables ;

    • WEIGHT variable ;

The PROC STDIZE statement is required. The BY, FREQ, LOCATION, SCALE, VAR, and WEIGHT statements are described in alphabetical order following the PROC STDIZE statement.

PROC STDIZE Statement

  • PROC STDIZE < options > ;

The PROC STDIZE statement invokes the procedure. You can specify the following options in the PROC STDIZE statement.

Table 66.1: Summary of PROC STDIZE Statement Options

Task                              Options     Description

Specify standardization methods   METHOD=     specifies the name of the standardization method
                                  INITIAL=    specifies the method for computing initial estimates for the A estimates

Unstandardize variables           UNSTD       unstandardizes variables when you also specify the METHOD=IN option

Process missing values            NOMISS      omits observations with any missing values from computation
                                  MISSING=    specifies the method or a numeric value for replacing missing values
                                  REPLACE     replaces missing data by zero in the standardized data
                                  REPONLY     replaces missing data by the location measure (does not standardize the data)

Specify data set details          DATA=       specifies the input data set
                                  OUT=        specifies the output data set
                                  OUTSTAT=    specifies the output statistic data set

Specify computational settings    VARDEF=     specifies the variances divisor
                                  NMARKERS=   specifies the number of markers when you also specify PCTLMTD=ONEPASS
                                  MULT=       specifies the constant to multiply each value by after standardizing
                                  ADD=        specifies the constant to add to each value after standardizing and multiplying by the value specified in the MULT= option
                                  FUZZ=       specifies the relative fuzz factor for writing the output

Specify percentiles               PCTLDEF=    specifies the definition of percentiles when you also specify the PCTLMTD=ORD_STAT option
                                  PCTLMTD=    specifies the method used to estimate percentiles
                                  PCTLPTS=    writes observations containing percentiles to the data set specified in the OUTSTAT= option

Normalize scale estimators        NORM        normalizes the scale estimator to be consistent for the standard deviation of a normal distribution
                                  SNORM       normalizes the scale estimator to have an expectation of approximately 1 for a standard normal distribution

Specify output                    PSTAT       displays the location and scale measures

These options and their abbreviations are described, in alphabetical order, in the remainder of this section.

ADD= c

  • specifies a constant, c, to add to each value after standardizing and multiplying by the value you specify in the MULT= option. The default value is 0.

DATA= SAS-data-set

  • specifies the input data set to be standardized. If you omit the DATA= option, the most recently created data set is used.

FUZZ= c

  • specifies the relative fuzz factor. The default value is 1E-14. For the OUT= data set, the score is computed as follows:

    [formula not recoverable from source]
  • For the OUTSTAT= data set and the Location and Scale table, the scale and location values are computed as follows:

    [formula not recoverable from source]
  • Otherwise,

    [formula not recoverable from source]

INITIAL= method

  • specifies the method for computing initial estimates for the A estimates (ABW, AWAVE, and AHUBER). The following methods are not allowed: INITIAL=ABW, INITIAL=AHUBER, INITIAL=AWAVE, and INITIAL=IN. The default is INITIAL=MAD.

METHOD= name

  • specifies the name of the method for computing location and scale measures. Valid values for name are as follows: MEAN, MEDIAN, SUM, EUCLEN, USTD, STD, RANGE, MIDRANGE, MAXABS, IQR, MAD, ABW, AHUBER, AWAVE, AGK, SPACING, L, and IN.

  • For details on these methods, see the descriptions in the 'Standardization Methods' section on page 4136. The default is METHOD=STD.

MISSING= method

MISSING= value

  • specifies the method (or a numeric value) for replacing missing values. If you omit the MISSING= option, the REPLACE option replaces missing values with the location measure given by the METHOD= option. Specify the MISSING= option when you want to replace missing values with a different value. You can specify any name that is valid in the METHOD= option except the name IN. The corresponding location measure is used to replace missing values.

  • If a numeric value is given, the value replaces missing values after standardizing the data. However, you can specify the REPONLY option with the MISSING= option to suppress standardization for cases in which you want only to replace missing values.

MULT= c

  • specifies a constant, c, by which to multiply each value after standardizing. The default value is 1.

NMARKERS= n

  • specifies the number of markers used when you specify the one-pass algorithm (PCTLMTD=ONEPASS). The value n must be greater than or equal to 5. The default value is 105.

NOMISS

  • omits observations with missing values for any of the analyzed variables from calculation of the location and scale measures. If you omit the NOMISS option, all nonmissing values are used.
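The difference between the default behavior and NOMISS can be sketched in a few lines of Python. This is only an illustration of the rule stated above, using the mean as the location measure; the function name `location_mean` and the sample data are hypothetical, and `None` stands in for a SAS missing value.

```python
def location_mean(rows, nomiss=False):
    """Column means over rows of analyzed variables; None marks a missing value."""
    if nomiss:
        # NOMISS: drop every observation that has any missing analyzed variable.
        rows = [r for r in rows if all(v is not None for v in r)]
    means = []
    for j in range(len(rows[0])):
        # Without NOMISS, each variable uses all of its own nonmissing values.
        vals = [r[j] for r in rows if r[j] is not None]
        means.append(sum(vals) / len(vals))
    return means

data = [(1.0, 10.0), (2.0, None), (6.0, 30.0)]
print(location_mean(data))               # [3.0, 20.0]
print(location_mean(data, nomiss=True))  # [3.5, 20.0]
```

In the second call the middle observation is excluded entirely, so the mean of the first variable changes even though its own value was not missing.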

NORM

  • normalizes the scale estimator to be consistent for the standard deviation of a normal distribution when you specify the option METHOD=AGK, METHOD=IQR, METHOD=MAD, or METHOD=SPACING.

OUT= SAS-data-set

  • specifies the name of the SAS data set created by PROC STDIZE. The output data set is a copy of the DATA= data set except that the analyzed variables have been standardized. Note that analyzed variables are those specified in the VAR statement or, if there is no VAR statement, all numeric variables not listed in any other statement. See the section 'Output Data Sets' on page 4141 for more information.

  • If you want to create a permanent SAS data set, you must specify a two-level name. (Refer to 'SAS Files' in SAS Language Reference: Concepts for more information on permanent SAS data sets.)

  • If you omit the OUT= option, PROC STDIZE creates an output data set named according to the DATAn convention.

OUTSTAT= SAS-data-set

  • specifies the name of the SAS data set containing the location and scale measures and other computed statistics. See the section 'Output Data Sets' on page 4141 for more information.

PCTLDEF= percentiles

  • specifies which of five definitions is used to calculate percentiles when you specify the option PCTLMTD=ORD_STAT. By default, PCTLDEF=5.

  • Note that the option PCTLMTD=ONEPASS implies a specification of PCTLDEF=5. See the section 'Computational Methods for the PCTLDEF= Option' on page 4140 for details on the PCTLDEF= option.

PCTLMTD=ORD_STAT

PCTLMTD=ONEPASS | P2

  • specifies the method used to estimate percentiles. Specify the PCTLMTD=ORD_STAT option to compute the percentiles by the order statistics method. The PCTLMTD=ONEPASS option modifies an algorithm invented by Jain and Chlamtac (1985). See the 'Computing Quantiles' section on page 4139 for more details on this algorithm.


PCTLPTS= n

  • writes percentiles to the OUTSTAT= data set. Values of n can be any decimal number between 0 and 100, inclusive.

  • A requested percentile is identified by the _TYPE_ variable in the OUTSTAT= data set with a value of Pn. For example, suppose you specify the option PCTLPTS=10, 30. The corresponding observations in the OUTSTAT= data set that contain the 10th and the 30th percentiles would then have values _TYPE_=P10 and _TYPE_=P30, respectively.

PSTAT

  • displays the location and scale measures.

REPLACE

  • replaces missing data with the value 0 in the standardized data (this value corresponds to the location measure before standardizing). To replace missing data by other values, see the preceding description of the MISSING= option. You cannot specify both the REPLACE and REPONLY options.

REPONLY

  • replaces missing data only; PROC STDIZE does not standardize the data. Missing values are replaced with the location measure unless you also specify the MISSING=value option, in which case missing values are replaced with value. You cannot specify both the REPLACE and REPONLY options.
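The REPLACE and REPONLY descriptions can be compared side by side with a small Python sketch. It assumes the default MULT=1 and ADD=0, takes the location and scale measures as given, and models a numeric MISSING= value only in combination with REPONLY. The function `stdize_column` and its parameter names are hypothetical, and `None` stands in for a missing value.

```python
def stdize_column(values, location, scale, replace=False, reponly=False,
                  missing_value=None):
    """Illustrate REPLACE / REPONLY handling of missing values (None)."""
    if replace and reponly:
        raise ValueError("REPLACE and REPONLY cannot both be specified")
    out = []
    for v in values:
        if reponly:
            # REPONLY: no standardization; fill missing values only.
            fill = location if missing_value is None else missing_value
            out.append(fill if v is None else v)
        elif v is None:
            # REPLACE: a missing value becomes 0, the standardized location.
            out.append(0.0 if replace else None)
        else:
            out.append((v - location) / scale)
    return out

col = [2.0, None, 6.0]
print(stdize_column(col, 4.0, 2.0))                # [-1.0, None, 1.0]
print(stdize_column(col, 4.0, 2.0, replace=True))  # [-1.0, 0.0, 1.0]
print(stdize_column(col, 4.0, 2.0, reponly=True))  # [2.0, 4.0, 6.0]
```

Passing `missing_value=-9.0` together with `reponly=True` fills the gap with -9.0 instead of the location measure.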

SNORM

  • normalizes the scale estimator to have an expectation of approximately 1 for a standard normal distribution when you specify the METHOD=SPACING option.

UNSTD

UNSTDIZE

  • unstandardizes variables when you specify the METHOD=IN( ds ) option. The location and scale measures, along with constants for addition and multiplication that the unstandardization is based upon, are identified by the _TYPE_ variable in the ds data set.

    The ds data set must have a _TYPE_ variable and contain the following two observations: a _TYPE_='LOCATION' observation and a _TYPE_='SCALE' observation. The variable _TYPE_ can also contain the optional observations 'ADD' and 'MULT'; if these observations are not found in the ds data set, the constants specified in the ADD= and MULT= options (or their default values) are used for unstandardization.

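    See the 'OUTSTAT= Data Set' section on page 4141 for details on the statistics that each value of _TYPE_ represents. The formula used for unstandardization is as follows. If the final output value from the previous standardization is calculated as

    $$\text{result} = \text{add} + \text{multiply} \times \frac{\text{original} - \text{location}}{\text{scale}}$$

    then the unstandardized value is computed as

    $$\text{original} = \text{scale} \times \frac{\text{result} - \text{add}}{\text{multiply}} + \text{location}$$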
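A round trip makes the relationship between standardizing and unstandardizing concrete. The sketch below assumes the standardized result is add + mult × (original − location) / scale, which matches the order described for the MULT= and ADD= options, and inverts that expression. The function names and sample numbers are hypothetical; this is Python arithmetic, not PROC STDIZE itself.

```python
def standardize(original, location, scale, mult=1.0, add=0.0):
    """Assumed forward formula: standardize, multiply, then add."""
    return add + mult * (original - location) / scale

def unstandardize(result, location, scale, mult=1.0, add=0.0):
    """Inverse of standardize(): undo add, undo mult, then rescale and shift."""
    return scale * (result - add) / mult + location

score = standardize(7.0, location=5.0, scale=2.0, mult=10.0, add=50.0)
print(score)                                                # 60.0
print(unstandardize(score, 5.0, 2.0, mult=10.0, add=50.0))  # 7.0
```

The same location, scale, multiplier, and additive constant must be supplied in both directions, which is why the ds data set carries them in its _TYPE_ observations.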

VARDEF= DF

VARDEF= N

VARDEF= WDF

VARDEF= WEIGHT | WGT

  • specifies the divisor to be used in the calculation of variances. By default, VARDEF=DF. The values and associated divisors are as follows.

    Value           Divisor                    Formula
    DF              degrees of freedom         n − 1
    N               number of observations     n
    WDF             sum of weights minus 1     (Σ_i w_i) − 1
    WEIGHT | WGT    sum of weights             Σ_i w_i
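The four divisors can be written out as a short Python lookup. This is an illustrative sketch only: the function name `variance_divisor` is hypothetical, and n is taken to be the number of weights supplied.

```python
def variance_divisor(vardef, weights):
    """Return the variance divisor d for a VARDEF= value and a list of weights."""
    n = len(weights)
    total = sum(weights)
    divisors = {
        "DF": n - 1,          # degrees of freedom
        "N": n,               # number of observations
        "WDF": total - 1,     # sum of weights minus 1
        "WEIGHT": total,      # sum of weights
        "WGT": total,         # alias of WEIGHT
    }
    return divisors[vardef.upper()]

w = [1.0, 2.0, 3.0]
for name in ("DF", "N", "WDF", "WEIGHT"):
    print(name, variance_divisor(name, w))
# DF 2
# N 3
# WDF 5.0
# WEIGHT 6.0
```

With unit weights, WDF and WEIGHT reduce to DF and N respectively.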

BY Statement

  • BY variables ;

You can specify a BY statement with PROC STDIZE to obtain separate standardization for observations in groups defined by the BY variables.

If your DATA= input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the STDIZE procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.

When you specify the option METHOD=IN( ds ), the following rules are applied to BY-group processing:

  • If the ds data set does not contain any of the BY variables, the entire DATA= data set is standardized by the location and scale measures (along with the constants for addition and multiplication) in the ds data set.

  • If the ds data set contains some, but not all, of the BY variables or if some BY variables do not have the same type or length in the ds data set that they have in the DATA= data set, PROC STDIZE displays an error message and stops.

  • If all of the BY variables appear in the ds data set with the same type and length as in the DATA= data set, each BY group in the DATA= data set is standardized using the location and scale measures (along with the constants for addition and multiplication) from the corresponding BY group in the ds data set. The BY groups in the ds data set must be in the same order as they appear in the DATA= data set. All BY groups in the DATA= data set must also appear in the ds data set. If you do not specify the NOTSORTED option, some BY groups can appear in the ds data set but not in the DATA= data set; such BY groups are not used in standardizing data.
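The three rules above amount to a small decision procedure, sketched here in Python. The function `by_group_mode`, the returned labels, and the representation of a variable as a (type, length) pair are all hypothetical choices made for illustration; the sketch does not cover the ordering requirements on BY groups.

```python
def by_group_mode(by_vars, data_vars, ds_vars):
    """Decide how METHOD=IN(ds) measures are applied under BY processing.

    by_vars:   list of BY variable names
    data_vars: dict mapping name -> (type, length) in the DATA= data set
    ds_vars:   dict mapping name -> (type, length) in the ds data set
    """
    present = [v for v in by_vars if v in ds_vars]
    if not present:
        # No BY variables in ds: one set of measures for the whole data set.
        return "whole-data-set"
    if len(present) < len(by_vars):
        # Some, but not all, BY variables appear in ds.
        return "error"
    if any(ds_vars[v] != data_vars[v] for v in by_vars):
        # A BY variable differs in type or length between the two data sets.
        return "error"
    # All BY variables match: use the corresponding BY group in ds.
    return "by-group"

data_vars = {"region": ("char", 8), "year": ("num", 8)}
by = ["region", "year"]
print(by_group_mode(by, data_vars, {"x": ("num", 8)}))          # whole-data-set
print(by_group_mode(by, data_vars, {"region": ("char", 8)}))    # error
print(by_group_mode(by, data_vars, dict(data_vars)))            # by-group
```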

FREQ Statement

  • FREQ | FREQUENCY variable ;

If one variable in the input data set represents the frequency of occurrence for other values in the observation, specify the variable name in a FREQ statement. PROC STDIZE treats the data set as if each observation appeared n times, where n is the value of the FREQ variable for the observation. Nonintegral values of the FREQ variable are truncated to the largest integer less than the FREQ value. If the FREQ variable has a value that is less than 1 or is missing, the observation is not used in the analysis.
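The FREQ rules can be condensed into one helper, shown here as a Python sketch. The function name `effective_freq` is hypothetical; `None` and NaN both stand in for a missing value, and a return value of 0 means the observation is not used.

```python
import math

def effective_freq(freq):
    """Number of times an observation counts, or 0 if it is excluded."""
    if freq is None or math.isnan(freq) or freq < 1:
        # Missing or less than 1: the observation is not used in the analysis.
        return 0
    # Nonintegral values are truncated to their integer part.
    return math.floor(freq)

print(effective_freq(3))     # 3
print(effective_freq(2.7))   # 2
print(effective_freq(0.9))   # 0
print(effective_freq(None))  # 0
```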

LOCATION Statement

  • LOCATION variables ;

The LOCATION statement specifies a list of numeric variables that contain location measures in the input data set specified by the METHOD=IN option.

SCALE Statement

  • SCALE variables ;

The SCALE statement specifies the list of numeric variables containing scale measures in the input data set specified by the METHOD=IN option.

VAR Statement

  • VAR | VARIABLES variables ;

The VAR statement lists numeric variables to be standardized. If you omit the VAR statement, all numeric variables not listed in the BY, FREQ, and WEIGHT statements are used.

WEIGHT Statement

  • WGT | WEIGHT variable ;

The WEIGHT statement specifies a numeric variable in the input data set with values that are used to weight each observation. Only one variable can be specified.

The WEIGHT variable values can be nonintegers. An observation is used in the analysis only if the value of the WEIGHT variable is greater than zero. The WEIGHT variable applies only when you specify the option METHOD=MEAN, METHOD=SUM, METHOD=EUCLEN, METHOD=USTD, METHOD=STD, METHOD=AGK, or METHOD=L.

PROC STDIZE uses the value of the WEIGHT variable, $w_i$, as follows.

The sample mean and (uncorrected) sample variances are computed as

$$\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$$

$$u_w^2 = \frac{\sum_i w_i x_i^2}{d}, \qquad s_w^2 = \frac{\sum_i w_i (x_i - \bar{x}_w)^2}{d}$$

where $w_i$ is the weight value of the $i$th observation, $x_i$ is the value of the $i$th observation, and $d$ is the divisor controlled by the VARDEF= option (see the VARDEF= option for details).

PROC STDIZE uses the value of the WEIGHT variable to calculate the following statistics:

MEAN

the weighted mean, $\bar{x}_w$

SUM

the weighted sum, $\sum_i w_i x_i$

USTD

the weighted uncorrected standard deviation, $\sqrt{u_w^2}$

STD

the weighted standard deviation, $\sqrt{s_w^2}$

EUCLEN

the weighted Euclidean length, computed as the square root of the weighted uncorrected sum of squares: $\sqrt{\sum_i w_i x_i^2}$

AGK

the AGK estimate. This estimate is documented further in the ACECLUS procedure as the METHOD=COUNT option. See the discussion of the WEIGHT statement in Chapter 16, 'The ACECLUS Procedure,' for information on how the WEIGHT variable is applied to the AGK estimate.

L

the L_p estimate. This estimate is documented further in the FASTCLUS procedure as the LEAST= option. See the discussion of the WEIGHT statement in Chapter 28, 'The FASTCLUS Procedure,' for information on how the WEIGHT variable is used to compute weighted cluster means. Note that the number of clusters is always 1.
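The weighted MEAN, SUM, USTD, STD, and EUCLEN statistics listed above can be sketched in Python as follows. The sketch assumes the usual weighted forms (weighted sum of squares divided by the VARDEF= divisor d, then a square root) and assumes that n counts only observations with a positive weight. The function name `weighted_stats` and the sample data are hypothetical; the AGK and L statistics are not covered.

```python
import math

def weighted_stats(x, w, vardef="DF"):
    """Weighted location and scale statistics for one variable."""
    # Only observations with a weight greater than zero are used.
    pairs = [(xi, wi) for xi, wi in zip(x, w) if wi > 0]
    n = len(pairs)
    sum_w = sum(wi for _, wi in pairs)
    d = {"DF": n - 1, "N": n, "WDF": sum_w - 1, "WEIGHT": sum_w}[vardef]

    wsum = sum(wi * xi for xi, wi in pairs)                  # weighted sum
    mean = wsum / sum_w                                      # weighted mean
    uss = sum(wi * xi * xi for xi, wi in pairs)              # uncorrected SS
    css = sum(wi * (xi - mean) ** 2 for xi, wi in pairs)     # corrected SS
    return {
        "MEAN": mean,
        "SUM": wsum,
        "USTD": math.sqrt(uss / d),
        "STD": math.sqrt(css / d),
        "EUCLEN": math.sqrt(uss),
    }

stats = weighted_stats([1.0, 2.0, 3.0], [1.0, 1.0, 2.0], vardef="WEIGHT")
print(stats["MEAN"])  # 2.25
print(stats["SUM"])   # 9.0
```

Changing `vardef` alters only USTD and STD, since the mean, sum, and Euclidean length do not involve the divisor d.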




SAS.STAT 9.1 Users Guide (Vol. 6)
Year: 2004