Syntax


  • PROC ROBUSTREG < options >;

    • BY variables ;

    • CLASS variables ;

    • ID variables ;

    • MODEL response = < effects > < / options > ;

    • OUTPUT < OUT = SAS-data-set > < options > ;

    • PERFORMANCE < options > ;

    • <label:> TEST effects ;

    • WEIGHT variable ;

The PROC ROBUSTREG statement invokes the procedure. The METHOD= option in the PROC ROBUSTREG statement selects one of the four estimation methods: M, LTS, S, and MM. By default, Huber M estimation is used. The MODEL statement is required and specifies the variables used in the regression. Main effects and interaction terms can be specified in the MODEL statement, as in the GLM procedure. The CLASS statement specifies which explanatory variables are treated as categorical. These variables are allowed in the MODEL statement only for M estimation, and not for the other estimation methods. The ID statement names variables to identify observations in the outlier diagnostics tables. The WEIGHT statement identifies a variable in the input data set whose values are used to weight the observations. The OUTPUT statement creates an output data set containing final weights, predicted values, and residuals. The TEST statement requests robust linear tests for the model parameters. The PERFORMANCE statement tunes the performance of the procedure by using single or multiple processors available on the hardware. In one invocation of PROC ROBUSTREG, multiple OUTPUT and TEST statements are allowed.
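As a minimal sketch of how these statements fit together, the following program simulates a small contaminated data set and fits the default M estimate. The data set name work.demo, the variable names, and the simulation itself are assumptions made for illustration; they do not come from the guide.

```sas
/* Hypothetical data: 100 observations, the last 10 shifted upward to act as outliers */
data work.demo;
   do i = 1 to 100;
      x1 = rannor(1234);
      x2 = rannor(1234);
      y  = 10 + 5*x1 + 3*x2 + 0.5*rannor(1234);
      if i > 90 then y = y + 100;
      output;
   end;
run;

/* Default method (M estimation); only the MODEL statement is required */
proc robustreg data=work.demo;
   model y = x1 x2;
run;
```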

PROC ROBUSTREG Statement

  • PROC ROBUSTREG < options > ;

The PROC ROBUSTREG statement invokes the procedure. You can specify the following options in the PROC ROBUSTREG statement.

COVOUT

  • saves the estimated covariance matrix in the OUTEST= data set for M estimation and MM estimation.

DATA = SAS-data-set

  • specifies the input SAS data set used by PROC ROBUSTREG. By default, the most recently created SAS data set is used.

FWLS

  • requests that final weighted least squares estimators be computed.

INEST = SAS-data-set

  • specifies an input SAS data set that contains initial estimates for all the parameters in the model. See the section 'INEST= Data Set' on page 4011 for a detailed description of the contents of the INEST= data set.

ITPRINT

  • displays the iteration history for the iteratively reweighted least squares algorithm used by M and MM estimation. You can also use this option in the MODEL statement.

NAMELEN = n

  • specifies the length of effect names in tables and output data sets to be n characters, where n is a value between 20 and 200. The default length is 20 characters.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

  • specifies the sorting order for the levels of the classification variables (specified in the CLASS statement). This ordering determines which parameters in the model correspond to each level in the data. The following table explains how PROC ROBUSTREG interprets values of the ORDER= option.

    Table 62.1: Options for Order

    Value of ORDER=   Levels Sorted By
    DATA              order of appearance in the input data set
    FORMATTED         formatted value
    FREQ              descending frequency count; levels with the most observations come first in the order
    INTERNAL          unformatted value

    By default, ORDER=FORMATTED. For FORMATTED and INTERNAL, the sort order is machine dependent. For more information on sorting order, refer to the chapter titled 'The SORT Procedure' in the SAS Procedures Guide.

OUTEST = SAS-data-set

  • specifies an output SAS data set containing the parameter estimates, and, if the COVOUT option is specified, the estimated covariance matrix. See the section 'OUTEST= Data Set' on page 4011 for a detailed description of the contents of the OUTEST= data set.

SEED = number

  • specifies the seed for the random number generator used to randomly select the subgroups and subsets for LTS and S estimation. By default, or if you specify zero, the ROBUSTREG procedure generates a seed between one and one billion.

METHOD = method type <( options )>

  • specifies the estimation method; the options in parentheses specify additional settings for that method. PROC ROBUSTREG provides four estimation methods: M estimation, LTS estimation, S estimation, and MM estimation. The default method is M estimation.

  • Because the LTS and S methods use subsampling algorithms, they are not suitable for an analysis with continuous independent variables that have only a few nonzero values, or only a few nonzero values within one BY group.

Options with METHOD=M

With METHOD=M, you can specify the following additional options:

ASYMPCOV = H1 | H2 | H3

  • specifies the type of asymptotic covariance computed for the M estimate. The three types are described in the section 'Asymptotic Covariance and Confidence Intervals' on page 3997. By default, ASYMPCOV= H1.

CONVERGENCE = criterion < ( EPS = value ) >

  • specifies a convergence criterion for the M estimate.

    Table 62.2: Options to Specify Convergence Criteria

    Type          Option
    residual      CONVERGENCE= RESID
    weight        CONVERGENCE= WEIGHT
    coefficient   CONVERGENCE= COEF

  • By default, CONVERGENCE = COEF. You can specify the precision of the convergence with the EPS= option. By default, EPS=1.E-8.

MAXITER = n

  • sets the maximum number of iterations during the parameter estimation. By default, MAXITER=1000.

SCALE = scale type | value

  • specifies the scale parameter or a method for estimating the scale parameter.

    Table 62.3: Options to Specify Scale

    Scale             Option               Default d
    Median estimate   SCALE=MED
    Tukey estimate    SCALE=TUKEY<(D=d)>   2.5
    Huber estimate    SCALE=HUBER<(D=d)>   2.5
    Fixed constant    SCALE= value

  • By default, SCALE = MED.

WF | WEIGHTFUNCTION = function type

  • specifies the weight function used for the M estimate. The ROBUSTREG procedure provides ten weight functions, which are listed in the following table. You can specify the parameters in these functions with the A=, B=, and C= options. These functions are described in the section 'M Estimation' on page 3993. The default weight function is bisquare.

    Table 62.4: Options to Specify Weight Functions

    Weight Function   Option                              Default a, b, c
    andrews           WF = ANDREWS<(C=c)>                 1.339
    bisquare          WF = BISQUARE<(C=c)>                4.685
    cauchy            WF = CAUCHY<(C=c)>                  2.385
    fair              WF = FAIR<(C=c)>                    1.4
    hampel            WF = HAMPEL<( <A=a> <B=b> <C=c>)>   2, 4, 8
    huber             WF = HUBER<(C=c)>                   1.345
    logistic          WF = LOGISTIC<(C=c)>                1.205
    median            WF = MEDIAN<(C=c)>                  0.01
    talworth          WF = TALWORTH<(C=c)>                2.795
    welsch            WF = WELSCH<(C=c)>                  2.985
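The METHOD=M suboptions nest inside the parentheses that follow the method name. The sketch below spells out the Huber weight function with its tabulated default constant, the median scale estimate, and the coefficient convergence criterion; it assumes a data set work.demo with numeric variables y, x1, and x2 (hypothetical names), and the MAXITER= value is an arbitrary illustration.

```sas
/* Assumes work.demo with variables y, x1, x2 (hypothetical) */
proc robustreg data=work.demo
               method=m(wf=huber(c=1.345)           /* Huber weights, default constant */
                        scale=med                   /* median scale estimate */
                        convergence=coef(eps=1e-8)  /* stop on coefficient change */
                        maxiter=500);               /* arbitrary illustrative limit */
   model y = x1 x2;
run;
```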

Options with METHOD=LTS

With METHOD=LTS, you can specify the following additional options:

CSTEP = n

  • specifies the number of C-steps for the LTS estimate. See the section 'LTS Estimate' on page 4000 for how the default value is determined.

IADJUST = ALL | NONE

  • requests (IADJUST=ALL) or suppresses (IADJUST=NONE) the intercept adjustment for all estimates in the LTS algorithm. By default, the intercept adjustment is used for data sets with fewer than 10000 observations. See the section 'Algorithm' on page 4001 for details.

H = n

  • specifies the quantile for the LTS estimate. See the section 'LTS Estimate' on page 4000 for how the default value is determined.

NBEST= n

  • specifies the number of best solutions kept for each subgroup during the computation of the LTS estimate. The default number is 10, which is the maximum number allowed.

NREP= n

  • specifies the number of repeats of least squares fit in subgroups during the computation of the LTS estimate. See the section 'LTS Estimate' on page 4000 for how the default number is determined.

SUBANALYSIS

  • requests a display of the subgrouping information and parameter estimates within subgroups. This option may generate the following ODS tables:

    Table 62.5: ODS Tables Available with SUBANALYSIS

    ODS Table Name     Description
    BestEstimates      Best final estimates for LTS
    BestSubEstimates   Best estimates for each subgroup
    CStep              C-Step information for LTS
    Groups             Grouping information for LTS

  • Some of these tables are data dependent.

SUBGROUPSIZE= n

  • specifies the data set size of the subgroups in the computation of the LTS estimate. The default number is 300.
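A hedged sketch of an LTS fit follows. It assumes a data set work.demo with numeric variables y, x1, and x2 (hypothetical names); the SEED= and NBEST= values are arbitrary choices for illustration. Fixing SEED= makes the random subgroup selection repeatable from run to run.

```sas
/* Assumes work.demo with variables y, x1, x2 (hypothetical) */
proc robustreg data=work.demo
               method=lts(nbest=5 subanalysis)  /* keep 5 best solutions; show subgroup tables */
               seed=100;                        /* fixed seed for repeatable subsampling */
   model y = x1 x2;
run;
```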

Options with METHOD=S

With METHOD=S, you can specify the following additional options:

ASYMPCOV= H1 | H2 | H3 | H4

  • specifies the type of asymptotic covariance computed for the S estimate. The four types are described in the section 'Asymptotic Covariance and Confidence Intervals' on page 4005. By default, ASYMPCOV= H4.

CHIF= TUKEY | YOHAI

  • specifies the function for the S estimate. PROC ROBUSTREG provides two functions, Tukey's BISQUARE function and Yohai's OPTIMAL function, which you can request with CHIF=TUKEY and CHIF=YOHAI, respectively. The default is Tukey's bisquare function.

EFF= value

  • specifies the efficiency for the S estimate. The parameter k in the function is determined by this efficiency. The default efficiency is determined such that the consistent S estimate has the breakdown value of 25%.

MAXITER= n

  • sets the maximum number of iterations for computing the scale parameter of the S estimate. By default, MAXITER=1000.

NREP= n

  • specifies the number of repeats of subsampling in the computation of the S estimate. See the section 'Algorithm' on page 4004 for how the default number of repeats is determined.

NOREFINE

  • suppresses the refinement for the S estimate. See the section 'Algorithm' on page 4004 for details.

SUBSETSIZE= n

  • specifies the size of the subset for the S estimate. See the section 'Algorithm' on page 4004 for how its default value is determined.

TOLERANCE= value

  • specifies the tolerance for the S estimate of the scale. The default value is .001.
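The S estimation suboptions combine in the same way. The following sketch assumes a data set work.demo with numeric variables y, x1, and x2 (hypothetical names); the MAXITER=, TOLERANCE=, and SEED= values are arbitrary illustrations rather than recommendations.

```sas
/* Assumes work.demo with variables y, x1, x2 (hypothetical) */
proc robustreg data=work.demo
               method=s(chif=yohai        /* Yohai's optimal function instead of the default */
                        maxiter=500       /* iteration limit for the scale computation */
                        tolerance=0.0001) /* tighter than the documented default of .001 */
               seed=100;
   model y = x1 x2;
run;
```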

Options with METHOD=MM

With METHOD=MM, you can specify the following additional options:

ASYMPCOV= H1 | H2 | H3 | H4

  • specifies the type of asymptotic covariance computed for the MM estimate. The four types are described in the 'Details' section. By default, ASYMPCOV= H4.

BIASTEST<(ALPHA= number )>

  • requests the bias test for the final MM estimate. See the section 'Bias Test' on page 4008 for details about this test.

CHIF= TUKEY | YOHAI

  • selects the function for the MM estimate. PROC ROBUSTREG provides two functions: Tukey's BISQUARE function and Yohai's OPTIMAL function, which you can request with CHIF=TUKEY and CHIF=YOHAI, respectively. The default is Tukey's bisquare function. This function is also used by the initial S estimate if you specify the INITEST=S option.

CONVERGENCE= criterion < (EPS= number ) >

  • specifies a convergence criterion for the MM estimate.

    Table 62.6: Options to Specify Convergence Criteria

    Type          Option
    residual      CONVERGENCE= RESID
    weight        CONVERGENCE= WEIGHT
    coefficient   CONVERGENCE= COEF

  • By default, CONVERGENCE = COEF. You can specify the precision of the convergence with the EPS= option. By default, EPS=1.E-8.

EFF= value

  • specifies the efficiency for the MM estimate. The parameter k1 in the function is determined by this efficiency. The default efficiency is set to 85%, which corresponds to k1 = 3.440 for CHIF=TUKEY or k1 = 0.868 for CHIF=YOHAI.

INITH= n

  • specifies the integer h for the initial LTS estimator used by the MM estimator. See the section 'Algorithm' on page 4007 for how to specify h and how the default is determined.

INITEST= LTS | S

  • specifies the initial estimator for the MM estimator. By default, the LTS estimator is used as the initial estimator for the MM estimator.

K0= number

  • specifies the parameter k in the function for the MM estimate. For CHIF=TUKEY, the default is k = 2.9366. For CHIF=YOHAI, the default is k = 0.7405. These default values correspond to the 25% breakdown value of the MM estimator.

MAXITER= n

  • sets the maximum number of iterations during the parameter estimation. By default, MAXITER=1000.
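As an illustrative sketch, the call below requests MM estimation with an initial S estimate in place of the default LTS start, and adds the bias test. It assumes a data set work.demo with numeric variables y, x1, and x2 (hypothetical names); the MAXITER= and SEED= values are arbitrary.

```sas
/* Assumes work.demo with variables y, x1, x2 (hypothetical) */
proc robustreg data=work.demo
               method=mm(initest=s    /* initial S estimate instead of the default LTS */
                         chif=tukey   /* Tukey's bisquare; also used by the initial S estimate */
                         biastest     /* bias test for the final MM estimate */
                         maxiter=500)
               seed=100;
   model y = x1 x2;
run;
```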

BY Statement

  • BY variables ;

You can specify a BY statement with PROC ROBUSTREG to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the ROBUSTREG procedure. The NOTSORTED option does not mean that the data are unsorted, but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the SAS Procedures Guide.
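The first alternative, sorting before the analysis, might look like the following sketch. The data set work.demo_by and its grouping variable region are hypothetical; y, x1, and x2 are assumed to be numeric variables in that data set.

```sas
/* Assumes work.demo_by with variables region, y, x1, x2 (hypothetical) */
proc sort data=work.demo_by;
   by region;              /* same BY variables as in the analysis step */
run;

proc robustreg data=work.demo_by;
   by region;              /* one separate fit per value of region */
   model y = x1 x2;
run;
```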

CLASS Statement

  • CLASS variables ;

Explanatory variables that are classification variables rather than quantitative numeric variables must be listed in the CLASS statement. For each explanatory variable listed in the CLASS statement, indicator variables are generated for the levels assumed by the CLASS variable. If the CLASS statement is used, it must appear before the MODEL statement.
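A short sketch of the statement order follows, using the default M estimation. The data set work.demo_by and the classification variable region are hypothetical names chosen for illustration.

```sas
/* Assumes work.demo_by with variables region, y, x1 (hypothetical) */
proc robustreg data=work.demo_by;
   class region;           /* must precede the MODEL statement */
   model y = region x1;    /* indicator variables are generated for the levels of region */
run;
```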

ID Statement

  • ID variables ;

When the diagnostics table is requested with the DIAGNOSTICS option in the MODEL statement, the variables listed in the ID statement are displayed beside the observation number. These variables can be used to identify each observation. If the ID statement is omitted, the observation number is used to identify the observations.

MODEL Statement

  • <label:> MODEL response = < effects > < / options > ;

Main effects and interaction terms can be specified in the MODEL statement, as in the GLM procedure. Class variables are not allowed in the MODEL statement when you specify MM estimation or LTS estimation using the METHOD= option in the PROC statement.

The optional label is used to label output from the matching MODEL statement.

Options

You can specify the following options for the model fit.

ALPHA= value

  • specifies the significance level for the confidence intervals for regression parameters. The value must be between 0 and 1. By default, ALPHA = 0.05.

CORRB

  • produces the estimated correlation matrix of the parameter estimates.

COVB

  • produces the estimated covariance matrix of the parameter estimates.

CUTOFF= value

  • specifies the multiplier of the cutoff value for outlier detection. By default, CUTOFF =3.

DIAGNOSTICS<(ALL)>

  • requests the outlier diagnostics. By default, only observations identified as outliers or leverage points are displayed. To request that all observations be displayed, specify the ALL option.

ITPRINT

  • displays the iteration history for the iteratively reweighted least squares algorithm used by M and MM estimation. You can also use this option in the PROC statement.

LEVERAGE<(CUTOFF= value CUTOFFALPHA= value QUANTILE= n )>

  • requests an analysis of leverage points for the continuous covariates. The results are added to the diagnostics table, which you can request with the DIAGNOSTICS option in the MODEL statement. You can specify the cutoff value for leverage point detection with the CUTOFF= option. The default cutoff value is $\sqrt{\chi^2_{p;1-\alpha}}$, where α can be specified with the CUTOFFALPHA= option. By default, α = 0.025. You can use the QUANTILE= option to specify the quantile to be minimized for the MCD algorithm used for the leverage point analysis. By default, QUANTILE=[(3n + p + 1)/4], where n is the number of observations and p is the number of independent variables. The LEVERAGE option is ignored if the model includes class variables as covariates.

  • Since the MCD algorithm uses subsampling, it is not suitable to apply the leverage point analysis to continuous variables which have only a few nonzero values or a few nonzero values within one BY group.

NOGOODFIT

  • suppresses the computation of goodness-of-fit statistics.

NOINT

  • specifies no-intercept regression.

SINGULAR= value

  • specifies the tolerance for testing singularity of the information matrix and the crossproducts matrix for the initial least-squares estimates. Roughly, the test requires that a pivot be at least this value times the original diagonal value. By default, SINGULAR = 1.E-12.
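MODEL statement options follow a slash after the effects. The sketch below requests the full diagnostics table together with the leverage point analysis, and uses an ID variable to label observations. It assumes a data set work.demo with numeric variables i, y, x1, and x2 (hypothetical names); the option values shown simply restate the documented defaults.

```sas
/* Assumes work.demo with variables i, y, x1, x2 (hypothetical) */
proc robustreg data=work.demo;
   id i;                                      /* shown beside the observation number */
   model y = x1 x2 / diagnostics(all)         /* list every observation, not only flagged ones */
                     leverage(cutoffalpha=0.025)
                     cutoff=3
                     alpha=0.05
                     covb;                    /* covariance matrix of the estimates */
run;
```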

OUTPUT Statement

  • OUTPUT < OUT= SAS-data-set > keyword = name <... keyword = name > ;

The OUTPUT statement creates an output SAS data set containing statistics calculated after fitting the model. At least one specification of the form keyword = name is required.

All variables in the original data set are included in the new data set, along with the variables created with keyword options in the OUTPUT statement. These new variables contain fitted values and estimated quantiles. If you want to create a permanent SAS data set, you must specify a two-level name (refer to SAS Language Reference: Concepts for more information on permanent SAS data sets).

The following specifications can appear in the OUTPUT statement:

OUT= SAS-data-set

specifies the new data set. By default, the procedure uses the DATAn convention to name the new data set.

keyword=name

specifies the statistics to include in the output data set and gives names to the new variables. Specify a keyword for each desired statistic (see the following list), an equal sign, and the variable to contain the statistic.

The keywords allowed and the statistics they represent are as follows:

LEVERAGE

specifies a variable to indicate leverage points. To include this variable in the OUTPUT data set, you must specify the LEVERAGE option in the MODEL statement. See the section 'Leverage Point and Outlier Detection' on page 4010 for how to define LEVERAGE.

OUTLIER

specifies a variable to indicate outliers. See the section 'Leverage Point and Outlier Detection' on page 4010 for how to define OUTLIER.

PREDICTED | P

specifies a variable to contain the estimated response.

RESIDUAL | R

specifies a variable to contain the residuals.

SRESIDUAL | SR

specifies a variable to contain the standardized residuals.

STDP

specifies a variable to contain the estimates of the standard errors of the estimated response.

WEIGHT

specifies a variable to contain the computed final weights.
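A hedged sketch of an OUTPUT statement that uses several of these keywords follows. It assumes a data set work.demo with numeric variables y, x1, and x2; the output data set name and all new variable names are hypothetical. The LEVERAGE option is requested in the MODEL statement so that the leverage indicator is available.

```sas
/* Assumes work.demo with variables y, x1, x2 (hypothetical) */
proc robustreg data=work.demo;
   model y = x1 x2 / leverage;
   output out=work.robout          /* one-level or WORK names give a temporary data set */
          p=yhat                   /* estimated response */
          r=resid                  /* residuals */
          sr=stdresid              /* standardized residuals */
          weight=final_wt          /* computed final weights */
          outlier=is_outlier       /* outlier indicator */
          leverage=is_leverage;    /* leverage point indicator */
run;
```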

PERFORMANCE Statement

You use the PERFORMANCE statement to specify options that tune the performance of PROC ROBUSTREG. By default, these options are chosen to maximize performance. See Chen (2002) for some empirical results.

  • PERFORMANCE < options > ;

The following option is available:

CPUCOUNT= n

  • specifies the number of threads to use in the computation of LTS or S estimation (or the initial LTS or S estimation for MM estimation). By default, this number equals the number of processors on the hardware.
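For illustration, the following sketch limits an LTS fit to two threads. It assumes a data set work.demo with numeric variables y, x1, and x2 (hypothetical names); the thread count and seed are arbitrary.

```sas
/* Assumes work.demo with variables y, x1, x2 (hypothetical) */
proc robustreg data=work.demo method=lts seed=100;
   model y = x1 x2;
   performance cpucount=2;   /* use two threads for the LTS computation */
run;
```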

TEST Statement

<label:> TEST effects ;

With M estimation and MM estimation, the TEST statement provides a means for obtaining a test for the canonical linear hypothesis concerning the model parameters. The hypothesis is stated in terms of p, the total number of parameters in the model, and q, the number of parameters tested for significance.

PROC ROBUSTREG provides two kinds of robust tests: the ρ-test and the $R_n^2$-test. They are described in the 'Details' section. No test is available for LTS and S estimation.

The optional label is used to label output from the corresponding TEST statement.
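A minimal sketch of labeled TEST statements follows, assuming a data set work.demo with numeric variables y, x1, and x2; the data set, variable names, and labels are hypothetical. Because multiple TEST statements are allowed in one invocation, each hypothesis can carry its own label.

```sas
/* Assumes work.demo with variables y, x1, x2 (hypothetical) */
proc robustreg data=work.demo method=m;
   model y = x1 x2;
   x2_only: test x2;       /* tests the x2 effect alone */
   both:    test x1 x2;    /* tests x1 and x2 jointly */
run;
```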

WEIGHT Statement

  • WEIGHT variable ;

The WEIGHT statement specifies a weight variable in the input data set.

If you want to use fixed weights for each observation in the input data set, place the weights in a variable in the data set and specify the name in a WEIGHT statement. The values of the WEIGHT variable can be nonintegral and are not truncated. Observations with nonpositive or missing values for the weight variable do not contribute to the fit of the model.
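The following sketch shows fixed observation weights, assuming a data set work.demo_w that holds the weights in a numeric variable w alongside y, x1, and x2; all of these names are hypothetical.

```sas
/* Assumes work.demo_w with variables w, y, x1, x2 (hypothetical) */
proc robustreg data=work.demo_w;
   weight w;               /* rows with missing or nonpositive w do not contribute to the fit */
   model y = x1 x2;
run;
```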




SAS.STAT 9.1 Users Guide (Vol. 6)
Year: 2004