This section describes the statistical and computational aspects of the ROBUSTREG procedure. The following notation is used throughout this section.
Let X = (x_ij) denote an n × p matrix, y = (y_1, ..., y_n)^T a given n-vector of responses, and β = (β_1, ..., β_p)^T an unknown p-vector of parameters or coefficients whose components are to be estimated. The matrix X is called the design matrix. Consider the usual linear model

y = Xβ + e
where e = (e_1, ..., e_n)^T is an n-vector of unknown errors. It is assumed that (for a given X) the components e_i of e are independent and identically distributed according to a distribution L(·/σ), where σ is a scale parameter (usually unknown). The vector of residuals for a given value of β is denoted by r = (r_1, ..., r_n)^T, and the i-th row of the matrix X is denoted by x_i^T.
M estimation in the context of regression was first introduced by Huber (1973) as a result of making the least squares approach robust. Although M estimators are not robust with respect to leverage points, they are popular in applications where leverage points are not an issue.
Instead of minimizing a sum of squares of the residuals, a Huber-type M estimator β̂_M of β minimizes a sum of less rapidly increasing functions of the residuals:

Q(β) = Σ_{i=1}^n ρ(r_i/σ)
where r = y − Xβ. For ordinary least squares estimation, ρ is the quadratic function.
If σ is known, then by taking derivatives with respect to β, β̂_M is also a solution of the system of p equations:

Σ_{i=1}^n ψ(r_i/σ) x_ij = 0,   j = 1, ..., p
where ψ = ρ′. If ρ is convex, β̂_M is the unique solution.
The ROBUSTREG procedure solves this system by using iteratively reweighted least squares (IRLS). The weight function w(x) is defined as

w(x) = ψ(x)/x
The ROBUSTREG procedure provides ten kinds of weight functions (corresponding to ten ρ functions) through the WEIGHTFUNCTION= option in the MODEL statement. See the section 'Weight Functions' on page 3995 for a complete discussion. You can specify the scale parameter σ with the SCALE= option in the PROC statement.
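To make the mechanics concrete, the IRLS scheme can be sketched in a few lines of Python. This is an illustration only, not the procedure's implementation; the Huber weight with c = 1.345 and a fixed scale σ are assumptions of the example.

```python
import numpy as np

def huber_weight(x, c=1.345):
    # w(x) = psi(x)/x for the Huber psi: 1 inside [-c, c], c/|x| outside
    ax = np.abs(x)
    return np.where(ax <= c, 1.0, c / np.maximum(ax, 1e-12))

def irls(X, y, sigma, c=1.345, tol=1e-8, max_iter=1000):
    # Start from the unweighted least squares fit (the default initial fit)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(max_iter):
        r = y - X @ beta
        w = huber_weight(r / sigma, c)
        # Weighted least squares step: solve (X^T W X) beta = X^T W y
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol * max(1.0, np.max(np.abs(beta))):
            return beta_new
        beta = beta_new
    return beta
```

On clean data all weights equal 1 and the fit coincides with ordinary least squares; an observation with a large residual is downweighted by the factor c/|r/σ|.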
If σ is unknown, both β and σ are estimated by minimizing the function

Q(β, σ) = Σ_{i=1}^n [ρ(r_i/σ) + a] σ

where a is a constant.
The algorithm proceeds by alternately improving β̂ in a location step and σ̂ in a scale step.
For the scale step, three methods are available to estimate σ, which you can select with the SCALE= option.
(SCALE=HUBER<(D=d)>) Compute σ̂ by the iteration

(σ̂^(m+1))² = (1/(nh)) Σ_{i=1}^n χ_d(r_i/σ̂^(m)) (σ̂^(m))²
where

χ_d(x) = x²/2 if |x| < d, and χ_d(x) = d²/2 otherwise,

is the Huber function and h is the Huber constant (refer to Huber 1981, p. 179). You can specify d with the D= option. By default, d = 2.5.
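A sketch of one such scale update (Python; the normalizing constant h is passed in rather than derived, since its exact value depends on d, n, and p):

```python
import numpy as np

def huber_chi(x, d=2.5):
    # Huber's chi: x^2/2 inside (-d, d), d^2/2 outside
    return np.where(np.abs(x) < d, x * x / 2.0, d * d / 2.0)

def huber_scale_step(r, sigma, h, d=2.5):
    # One iteration: (sigma_new)^2 = (1/(n*h)) * sum(chi(r_i/sigma)) * sigma^2
    z = huber_chi(r / sigma, d)
    return np.sqrt(np.mean(z) / h) * sigma
```

With h = 1/2 and no truncated residuals (all |r_i/σ| < d), the update returns the ordinary root mean square of the residuals; large residuals enter only through the capped value d²/2.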
(SCALE=TUKEY<(D=d)>) Compute σ̂ by solving the supplementary equation

(1/(n−p)) Σ_{i=1}^n χ_d(r_i/σ̂) = β₀
where

χ_d(s) = 3(s/d)² − 3(s/d)⁴ + (s/d)⁶ if |s| ≤ d, and χ_d(s) = 1 otherwise,

is Tukey's biweight function, and β₀ = ∫ χ_d(s) dΦ(s) is the constant such that the solution σ̂ is asymptotically consistent when L(·/σ) = Φ(·) (refer to Hampel et al. 1986, p. 149). You can specify d with the D= option. By default, d = 2.5.
(SCALE=MED) Compute σ̂ by the iteration

σ̂^(m+1) = med_{1≤i≤n} |y_i − x_i^T β̂^(m)| / β₀

where β₀ = Φ⁻¹(0.75) is the constant such that the solution σ̂ is asymptotically consistent when L(·/σ) = Φ(·) (refer to Hampel et al. 1986, p. 312).
Note that SCALE = MED is the default.
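For illustration, the SCALE=MED update for a fixed set of residuals is just the median absolute residual divided by Φ⁻¹(0.75) ≈ 0.6745 (Python sketch):

```python
import numpy as np
from statistics import NormalDist

def med_scale(residuals):
    # sigma_hat = median(|r_i|) / Phi^{-1}(0.75), consistent at the normal model
    beta0 = NormalDist().inv_cdf(0.75)   # approximately 0.6745
    return np.median(np.abs(residuals)) / beta0
```

For standard normal residuals, the median absolute residual is about 0.6745, so the estimate is close to 1.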
The basic algorithm for computing M estimates for regression is iteratively reweighted least squares (IRLS). As the name suggests, a weighted least squares fit is carried out inside an iteration loop. For each iteration, a set of weights for the observations is used in the least squares fit. The weights are constructed by applying a weight function to the current residuals. Initial weights are based on residuals from an initial fit. The ROBUSTREG procedure uses the unweighted least squares fit as a default initial fit. The iteration terminates when a convergence criterion is satisfied. The maximum number of iterations is set to 1000. You can specify the weight function and the convergence criteria.
You can specify the weight function for M estimation with the WEIGHTFUNCTION= option. The ROBUSTREG procedure provides ten weight functions. By default, the procedure uses the bisquare weight function. In most cases, M estimates are more sensitive to the parameters of these weight functions than to the type of the weight function. The median weight function is not stable and is seldom recommended in data analysis; it is included in the procedure for completeness. You can specify the parameters for these weight functions. Except for the hampel and median weight functions, default values for these parameters are defined such that the corresponding M estimates have 95% asymptotic efficiency in the location model with the Gaussian distribution (see Holland and Welsch (1977)).
The following weight functions are available: andrews, bisquare, cauchy, fair, hampel, huber, logistic, median, talworth, and welsch.
See Table 62.4 on page 3985 for the default values of the constants in these weight functions.
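For example, the default bisquare weight and the Huber weight can be written as follows (Python sketch; the constants 4.685 and 1.345 are the usual 95%-efficiency defaults and are assumptions of this example):

```python
import numpy as np

def bisquare_weight(x, c=4.685):
    # Smoothly redescending: (1 - (x/c)^2)^2 inside (-c, c), zero outside
    u = x / c
    return np.where(np.abs(u) < 1.0, (1.0 - u * u) ** 2, 0.0)

def huber_weight(x, c=1.345):
    # Monotone: full weight near zero, c/|x| in the tails
    ax = np.abs(x)
    return np.where(ax <= c, 1.0, c / np.maximum(ax, 1e-12))
```

The bisquare weight assigns zero weight to observations with |x| ≥ c, so gross outliers are effectively removed from the fit, whereas the Huber weight only downweights them.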
The following convergence criteria are available in PROC ROBUSTREG:
Relative change in the scaled residuals (CONVERGENCE= RESID)
Relative change in the coefficients (CONVERGENCE= COEF)
Relative change in weights (CONVERGENCE= W)
You can specify the criteria with the CONVERGENCE= option in the PROC statement. The default is CONVERGENCE= COEF.
You can specify the precision of the convergence criterion with the EPS= sub-option.
In addition to these convergence criteria, a convergence criterion based on a scale-independent measure of the gradient is always checked; see Coleman et al. (1980). A warning is issued if this criterion is not satisfied.
The following three estimators of the asymptotic covariance of the robust estimator are available in PROC ROBUSTREG:

H1: K² [1/(n−p) Σ ψ(r_i)²] / [(1/n) Σ ψ′(r_i)]² (X^T X)⁻¹

H2: K [1/(n−p) Σ ψ(r_i)²] / [(1/n) Σ ψ′(r_i)] W⁻¹

H3: K⁻¹ [1/(n−p) Σ ψ(r_i)²] W⁻¹(X^T X)W⁻¹

where K is a correction factor and W_jk = Σ_i ψ′(r_i) x_ij x_ik. Refer to Huber (1981, p. 173) for more details.
You can specify the asymptotic covariance estimate with the ASYMPCOV= option. The ROBUSTREG procedure uses H1 as the default because of its simplicity and stability. Confidence intervals are computed from the diagonal elements of the estimated asymptotic covariance matrix.
The robust version of R² is defined as

R² = [ Σ ρ((y_i − μ̂)/σ̂) − Σ ρ((y_i − x_i^T β̂)/σ̂) ] / Σ ρ((y_i − μ̂)/σ̂)

and the robust deviance is defined as the optimal value of the objective function on the σ²-scale:

D = 2 σ̂² Σ ρ((y_i − x_i^T β̂)/σ̂)

where ψ = ρ′, β̂ is the M estimator of β, μ̂ is the M estimator of location, and σ̂ is the M estimator of the scale parameter in the full model.
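A Python sketch of these goodness-of-fit quantities, using the Huber ρ as an example (illustrative only; the specific ρ and tuning constant are assumptions of the example):

```python
import numpy as np

def huber_rho(x, c=1.345):
    # rho: x^2/2 inside [-c, c], linear growth c|x| - c^2/2 outside
    ax = np.abs(x)
    return np.where(ax <= c, x * x / 2.0, c * ax - c * c / 2.0)

def robust_deviance(resid, sigma, c=1.345):
    # D = 2 * sigma^2 * sum(rho(r_i / sigma))
    return 2.0 * sigma**2 * np.sum(huber_rho(resid / sigma, c))

def robust_r2(y, resid, mu, sigma, c=1.345):
    # Compare the fitted model with the location-only model on the rho scale
    q_mu = np.sum(huber_rho((y - mu) / sigma, c))
    q_fit = np.sum(huber_rho(resid / sigma, c))
    return (q_mu - q_fit) / q_mu
```

With a perfect fit (all residuals zero) the deviance is 0 and R² is 1, mirroring the classical quantities.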
Two tests are available in PROC ROBUSTREG for the canonical linear hypothesis

H₀: β_j = 0,   j = q+1, ..., p
The first test is a robust version of the F test, which is referred to as the ρ test. Denote the M estimators in the full and reduced models by β̂ ∈ Ω and β̂₁ ∈ Ω₁, respectively. Let
with
The robust F test is based on the test statistic
Asymptotically, the test statistic follows an F distribution under H₀, where the standardization factor is λ = ∫ ψ²(s) dΦ(s) / ∫ ψ′(s) dΦ(s) and Φ is the cumulative distribution function of the standard normal distribution. Large values of the statistic are significant. This test is a special case of the general τ test of Hampel et al. (1986, Section 7.2).
The second test is a robust version of the Wald test. The test uses a test statistic
where the standardizing matrix is the (p − q) × (p − q) lower right block of the asymptotic covariance matrix of the M estimate β̂_M of β in a p-parameter linear model.
Under H₀, the statistic has an asymptotic χ² distribution with p − q degrees of freedom. Large absolute values of the statistic are significant. Refer to Hampel et al. (1986, Chapter 7).
When M estimation is used, two criteria are available in PROC ROBUSTREG for model selection. The first criterion is a counterpart of the Akaike (1974) AIC criterion for robust regression, and it is defined as

AICR = 2 Σ_{i=1}^n ρ(r_i/σ̂) + α p

where σ̂ is a robust estimate of σ and β̂ is the M estimator with a p-dimensional design matrix.
As with AIC, α is the weight of the penalty for dimensions. The ROBUSTREG procedure uses α = 2 E[ψ²]/E[ψ′] (Ronchetti (1985)) and estimates it by using the final robust residuals.
The second criterion is a robust version of the Schwarz information criterion (BIC), and it is defined as

BICR = 2 Σ_{i=1}^n ρ(r_i/σ̂) + p log(n)
The breakdown value of an estimator is defined as the smallest fraction of contamination that can cause the estimator to take on values arbitrarily far from its value on the uncontaminated data. The breakdown value of an estimator can be used as a measure of the robustness of the estimator. Rousseeuw and Leroy (1987) and others introduced the following high breakdown value estimators for linear regression.
The least trimmed squares (LTS) estimate proposed by Rousseeuw (1984) is defined as the p-vector

β̂_LTS = arg min_β Σ_{i=1}^h r²_(i)(β)

where

r²_(1)(β) ≤ r²_(2)(β) ≤ ... ≤ r²_(n)(β)

are the ordered squared residuals and h is defined in the range n/2 + 1 ≤ h ≤ (3n + p + 1)/4.

You can specify the parameter h with the H= option in the PROC statement. By default, h = [(3n + p + 1)/4]. The breakdown value is (n − h)/n for the LTS estimate.
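In code, the LTS criterion for a candidate coefficient vector is simply the sum of the h smallest squared residuals (Python sketch; the default coverage [(3n + p + 1)/4] used here is an assumption of the example):

```python
import numpy as np

def lts_objective(X, y, beta, h=None):
    # Sum of the h smallest squared residuals for the candidate beta
    n, p = X.shape
    if h is None:
        h = (3 * n + p + 1) // 4
    r2 = np.sort((y - X @ beta) ** 2)
    return np.sum(r2[:h])
```

Because only the h smallest squared residuals enter the sum, up to n − h grossly contaminated observations leave the criterion unaffected.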
The least median of squares (LMS) estimate is defined as the p-vector

β̂_LMS = arg min_β r²_(h)(β)

where

r²_(1)(β) ≤ r²_(2)(β) ≤ ... ≤ r²_(n)(β)

are the ordered squared residuals and h is defined in the same range as for LTS.
The breakdown value for the LMS estimate is also (n − h)/n. However, the LTS estimate has several advantages over the LMS estimate. Its objective function is smoother, making the LTS estimate less 'jumpy' (that is, less sensitive to local effects) than the LMS estimate. Its statistical efficiency is better, because the LTS estimate is asymptotically normal whereas the LMS estimate has a lower convergence rate (Rousseeuw and Leroy (1987)). Another important advantage is that, with the FAST-LTS algorithm of Rousseeuw and Van Driessen (2000), the LTS estimate takes less computing time and is more accurate.
The ROBUSTREG procedure computes LTS estimates using the FAST-LTS algorithm. The estimates are often used to detect outliers in the data, which are then downweighted in the resulting weighted LS regression.
Algorithm
Least trimmed squares (LTS) regression is based on the subset of h observations (out of a total of n observations) whose least squares fit possesses the smallest sum of squared residuals. The coverage h can be set between n/2 and n. The LTS method was proposed by Rousseeuw (1984, p. 876) as a highly robust regression estimator with breakdown value (n − h)/n. The ROBUSTREG procedure uses the FAST-LTS algorithm given by Rousseeuw and Van Driessen (1998). The intercept adjustment technique is also used in this implementation. However, because this adjustment is expensive to compute, it is optional. You can use the IADJUST option in the PROC statement to request or suppress the intercept adjustment. By default, PROC ROBUSTREG does intercept adjustment for data sets with fewer than 10000 observations. The algorithm is described briefly as follows; refer to Rousseeuw and Van Driessen (2000) for details.
The default h is [(3n + p + 1)/4], where p is the number of independent variables. You can specify any integer h in this range with the H= option in the MODEL statement. The breakdown value for LTS, (n − h)/n, is reported. The default h is a good compromise between breakdown value and statistical efficiency.
If p = 1 (single regressor), the procedure uses the exact algorithm of Rousseeuw and Leroy (1987, p. 172).
If p ≥ 2, the procedure uses the following algorithm. If n < 2·ssubs, where ssubs is the size of the subgroups (you can specify ssubs with the SUBGROUPSIZE= option in the PROC statement; by default, ssubs = 300), draw a random p-subset and compute the regression coefficients by using these p points (if the regression is degenerate, draw another p-subset). Compute the absolute residuals for all observations in the data set, and select the h points with the smallest absolute residuals. From this selected h-subset, carry out nsteps C-steps (concentration steps; see Rousseeuw and Van Driessen (2000) for details). You can specify nsteps with the CSTEP= option in the PROC statement; by default, nsteps = 2. Redraw p-subsets and repeat the preceding computing procedure nrep times, and find the nbsol (at most) solutions with the lowest sums of h squared residuals. You can specify nrep with the NREP= option in the PROC statement; by default, NREP = min{500, the total number of p-subsets}. For small n and p, all subsets are used and the NREP= option is ignored (Rousseeuw and Hubert (1996)). You can specify nbsol with the NBEST= option in the PROC statement; by default, NBEST=10. For each of these nbsol best solutions, take C-steps until convergence and find the best final solution.
If n ≥ 5·ssubs, construct five disjoint random subgroups of size ssubs. If 2·ssubs < n < 5·ssubs, the data are split into at most four subgroups with ssubs or more observations in each subgroup, so that each observation belongs to a subgroup and the subgroups have roughly the same size. Let nsubs denote the number of subgroups. Inside each subgroup, repeat the subsampling procedure described previously and keep the nbsol best solutions. Pool the subgroups, yielding the merged set of size n_merged. In the merged set, for each of the nsubs × nbsol best solutions, carry out nsteps C-steps by using n_merged and the corresponding coverage, and keep the nbsol best solutions. In the full data set, for each of these nbsol best solutions, take C-steps by using n and h until convergence and find the best final solution.
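The concentration step (C-step) at the heart of this algorithm can be sketched as follows (Python; the subgroup bookkeeping is omitted):

```python
import numpy as np

def c_step(X, y, beta, h):
    # One concentration step: refit least squares on the h observations
    # with the smallest absolute residuals under the current beta.
    r = np.abs(y - X @ beta)
    keep = np.argsort(r)[:h]
    beta_new, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta_new
```

Each C-step cannot increase the sum of the h smallest squared residuals, so iterating C-steps from a candidate fit converges to a local optimum of the LTS criterion; the subsampling supplies many starting candidates so that the global optimum is found with high probability.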
R²
The robust version of R² for the LTS estimate is defined as
for models with the intercept term and as
for models without the intercept term, where
s_LTS = d_{h,n} √( (1/h) Σ_{i=1}^h r²_(i) ) is a preliminary estimate of the parameter σ in the distribution function L(·/σ).
Here d_{h,n} is chosen to make s_LTS consistent under a Gaussian model; it is computed from the distribution function Φ and the density function φ of the standard normal distribution.
Final Weighted Scale Estimator
The ROBUSTREG procedure displays two scale estimators, s_LTS and Wscale. The estimator Wscale is a more efficient scale estimator based on the preliminary estimate s_LTS; it is defined as

Wscale² = Σ_i w_i r_i² / (Σ_i w_i − p)

where

w_i = 1 if |r_i / s_LTS| ≤ k, and w_i = 0 otherwise.

You can specify k with the CUTOFF= option in the MODEL statement. By default, k = 3.
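In code, the final weighted scale estimate is a hard-rejection reweighting of the classical scale formula (Python sketch; p, the number of estimated parameters, is an input):

```python
import numpy as np

def wscale(residuals, s_lts, p, k=3.0):
    # Hard-rejection weights: w_i = 1 if |r_i / s_LTS| <= k, else 0
    w = (np.abs(residuals / s_lts) <= k).astype(float)
    # Classical scale formula applied to the retained residuals
    return np.sqrt(np.sum(w * residuals**2) / (np.sum(w) - p))
```

Observations flagged by the preliminary LTS scale are simply excluded, so a single gross outlier cannot inflate the final scale.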
The S estimate proposed by Rousseeuw and Yohai (1984) is defined as the p-vector

β̂_S = arg min_β S(β)

where the dispersion S(β) is the solution of

(1/(n−p)) Σ_{i=1}^n χ((y_i − x_i^T β)/S) = β₀

Here β₀ is set to ∫ χ(s) dΦ(s) such that β̂_S and S(β̂_S) are asymptotically consistent estimates of β and σ for the Gaussian regression model. The breakdown value of the S estimate is determined by the tuning constant in the χ function.
The ROBUSTREG procedure provides two choices for χ: the Tukey function and the Yohai function.
The Tukey function, which you can specify with the option CHIF=TUKEY, is

χ_k(s) = 3(s/k)² − 3(s/k)⁴ + (s/k)⁶ if |s| ≤ k, and χ_k(s) = 1 otherwise.

The constant k controls the breakdown value and efficiency of the S estimate. By specifying the efficiency with the EFF= option, you can determine the corresponding k. The default k is 2.9366, such that the breakdown value of the S estimate is 0.25 with a corresponding asymptotic efficiency for the Gaussian model of 75.9%.
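A Python sketch of this χ function (with u = (s/k)², the closed form 3u − 3u² + u³ equals 1 − (1 − u)³, so χ rises smoothly from 0 and levels off at 1 beyond k):

```python
import numpy as np

def tukey_chi(s, k=2.9366):
    # 3(s/k)^2 - 3(s/k)^4 + (s/k)^6 for |s| <= k, and 1 beyond
    u = (s / k) ** 2
    return np.where(np.abs(s) <= k, 3 * u - 3 * u**2 + u**3, 1.0)
```

Because χ is bounded, a bounded fraction of arbitrarily bad residuals can shift the left side of the dispersion equation by only a bounded amount, which is what gives the S estimate its high breakdown value.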
The Yohai function, which you can specify with the option CHIF=YOHAI, is piecewise polynomial with coefficients b₀ = 1.792, b₁ = −0.972, b₂ = 0.432, b₃ = −0.052, and b₄ = 0.002. By specifying the efficiency with the EFF= option, you can determine the corresponding k. By default, k is set to 0.7405, such that the breakdown value of the S estimate is 0.25 with a corresponding asymptotic efficiency for the Gaussian model of 72.7%.
Algorithm
The ROBUSTREG procedure implements the algorithm of Marazzi (1993) for the S estimate, which is a refined version of the algorithm proposed by Ruppert (1992). The refined algorithm is briefly described as follows.
Initialize iter = 1.
Draw a random q-subset of the total n observations and compute the regression coefficients by using these q observations (if the regression is degenerate, draw another q-subset), where q ≥ p can be specified with the SUBSIZE= option. By default, q = p.
Compute the residuals r_i = y_i − x_i^T β for i = 1, ..., n. If iter = 1, set s* = 2 med{|r_i|, i = 1, ..., n}; if s* = 0, set s* = min{|r_i| : |r_i| > 0}; go to Step 3. If iter > 1 and (1/(n−p)) Σ_{i=1}^n χ(r_i/s*) < β₀, go to Step 3; else go to Step 5.
Solve for s the equation

(1/(n−p)) Σ_{i=1}^n χ(r_i/s) = β₀

by using an iterative algorithm.
If iter > 1 and s > s*, go to Step 5. Otherwise, set s* = s and β* = β. If s* < TOLS, return s* and β*; else go to Step 5.
If iter < NREP, set iter = iter + 1 and return to Step 1; else return s* and β*.
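The supplementary scale equation that appears in this algorithm can be solved by a simple fixed-point iteration (Python sketch; the Tukey χ and a quadrature approximation of β₀ = ∫ χ(s) dΦ(s) are assumptions of the example):

```python
import numpy as np

def tukey_chi(s, k=2.9366):
    u = (s / k) ** 2
    return np.where(np.abs(s) <= k, 3 * u - 3 * u**2 + u**3, 1.0)

def beta0(k=2.9366):
    # beta0 = integral of chi(s) dPhi(s), here by simple Riemann quadrature
    s = np.linspace(-8.0, 8.0, 20001)
    phi = np.exp(-s * s / 2.0) / np.sqrt(2.0 * np.pi)
    return np.sum(tukey_chi(s, k) * phi) * (s[1] - s[0])

def solve_scale(r, p, k=2.9366, tol=1e-10, max_iter=200):
    # Fixed point: rescale s so that (1/(n-p)) * sum(chi(r_i/s))
    # is driven toward beta0
    b0 = beta0(k)
    n = len(r)
    s = 2.0 * np.median(np.abs(r))
    for _ in range(max_iter):
        lhs = np.sum(tukey_chi(r / s, k)) / (n - p)
        s_new = s * np.sqrt(lhs / b0)
        if abs(s_new - s) < tol * s:
            return s_new
        s = s_new
    return s
```

If s is too small, the left-hand side exceeds β₀ and the update inflates s; if s is too large, the update shrinks it. The solution is scale equivariant: doubling all residuals doubles the solved scale.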
The ROBUSTREG procedure performs the following refinement step by default. You can request that this refinement not be done by using the NOREFINE option in the PROC statement.
Let ψ = χ′. Using the values s* and β* from the previous steps, compute M estimates β̂_M and σ̂_M of β and σ with the setup for M estimation in the section 'M Estimation' on page 3993. If σ̂_M > s*, give a warning and return s* and β*; otherwise, return σ̂_M and β̂_M.
You can specify TOLS with the TOLERANCE= option; by default, TOLS = .001. You can specify NREP with the NREP= option. You can also use the options NREP=NREP0 or NREP=NREP1 to determine NREP according to the following table. NREP=NREP0 is the default.
P | NREP0 | NREP1 |
---|---|---|
1 | 150 | 500 |
2 | 300 | 1000 |
3 | 400 | 1500 |
4 | 500 | 2000 |
5 | 600 | 2500 |
6 | 700 | 3000 |
7 | 850 | 3000 |
8 | 1250 | 3000 |
9 | 1500 | 3000 |
>9 | 1500 | 3000 |
R² and Deviance
The robust version of R² for the S estimate is defined as
for the model with the intercept term and
for the model without the intercept term, where S_p is the S estimate of the scale in the full model, S_μ is the S estimate of the scale in the regression model with only the intercept term, and S is the S estimate of the scale without any regressor. The deviance D is defined as the optimal value of the objective function on the σ²-scale:
Asymptotic Covariance and Confidence Intervals
Since the S estimate satisfies the same first-order necessary conditions as the M estimate, it has the same asymptotic covariance as the M estimate. All three estimators of the asymptotic covariance for the M estimate in the section 'Asymptotic Covariance and Confidence Intervals' on page 3997 can be used for the S estimate. In addition, the weighted covariance estimator H4 described in the section 'Asymptotic Covariance and Confidence Intervals' on page 4008 is also available and is the default. Confidence intervals for estimated parameters are computed from the diagonal elements of the estimated asymptotic covariance matrix.
MM estimation, introduced by Yohai (1987), is a combination of high breakdown value estimation and efficient estimation. It has three steps:
Compute an initial (consistent) high breakdown value estimate β̂′. The ROBUSTREG procedure provides two kinds of estimates as the initial estimate: the LTS estimate and the S estimate. By default, the LTS estimate is used because of its speed, efficiency, and high breakdown value. The breakdown value of the final MM estimate is determined by the breakdown value of the initial LTS estimate and the constant k₀ in the χ function. To use the S estimate as the initial estimate, specify the INITEST=S option in the PROC statement. In this case, the breakdown value of the final MM estimate is determined only by the constant k₀. Instead of computing the LTS estimate or the S estimate as the initial estimate, you can specify the initial estimate explicitly by using the INEST= option in the PROC statement. See the section 'INEST= Data Set' on page 4011 for details.
Find σ̂′ such that

(1/(n−p)) Σ_{i=1}^n χ((y_i − x_i^T β̂′)/σ̂′) = β₀

where β₀ = ∫ χ(s) dΦ(s).
The ROBUSTREG procedure provides two choices for χ: the Tukey function and the Yohai function.
The Tukey function, which you can specify with the option CHIF=TUKEY, is

χ_{k₀}(s) = 3(s/k₀)² − 3(s/k₀)⁴ + (s/k₀)⁶ if |s| ≤ k₀, and χ_{k₀}(s) = 1 otherwise,

where k₀ can be specified with the K0= option. The default k₀ = 2.9366, such that the asymptotically consistent scale estimate σ̂′ has the breakdown value of 25%.
The Yohai function, which you can specify with the option CHIF=YOHAI, is piecewise polynomial with coefficients b₀ = 1.792, b₁ = −0.972, b₂ = 0.432, b₃ = −0.052, and b₄ = 0.002. You can specify k₀ with the K0= option. The default k₀ is 0.7405, such that the asymptotically consistent scale estimate σ̂′ has the breakdown value of 25%.
Find a local minimum β̂_MM of

Q_MM(β) = Σ_{i=1}^n ρ((y_i − x_i^T β)/σ̂′)

such that Q_MM(β̂_MM) ≤ Q_MM(β̂′). The algorithm for M estimation is used here.
The ROBUSTREG procedure provides two choices for ρ: the Tukey function and the Yohai function.
The Tukey function, which you can specify with the option CHIF=TUKEY, has the same form as the χ function given previously, where k₁ can be specified with the K1= option. The default k₁ is 3.440, such that the MM estimate has 85% asymptotic efficiency with the Gaussian distribution.
The Yohai function, which you can specify with the option CHIF=YOHAI, has the same form as given previously, where k₁ can be specified with the K1= option. The default k₁ is 0.868, such that the MM estimate has 85% asymptotic efficiency with the Gaussian distribution.
The initial LTS estimate is computed by using the algorithm described in the section 'LTS Estimate' on page 4000. You can control the quantile of the LTS estimate with the option INITH=h, where h is an integer in the range allowed for the LTS coverage. By default, h is chosen such that the initial LTS estimate has a breakdown value of around 25%.
The initial S estimate is computed by using the algorithm described in the section 'S Estimate' on page 4003. You can control the breakdown value and efficiency of this initial S estimate with the constant k₀, which can be specified with the K0= option.
The scale parameter σ is solved for by the iterative algorithm

(σ^(m+1))² = (1/(nβ₀)) Σ_{i=1}^n χ_{k₀}(r_i/σ^(m)) (σ^(m))²

where β₀ = ∫ χ_{k₀}(s) dΦ(s).
Once the scale parameter is computed, the iteratively reweighted least squares (IRLS) algorithm with fixed scale parameter is used to compute the final MM estimator.
In the iterative algorithm for the scale parameter, the relative change of the scale parameter controls the convergence.
In the iteratively reweighted least squares algorithm, the same convergence criteria as for the M estimate are used.
Although the final MM estimate inherits the high breakdown value property, its bias due to the distortion of the outliers can be high. Yohai, Stahel, and Zamar (1991) introduced a bias test. The ROBUSTREG procedure implements this test when you specify the BIASTEST option in the PROC statement. This test is based on the initial scale estimate σ̂′ and the final scale estimate σ̂₁, which is obtained by solving the corresponding scale equation with the final MM estimate.
Let ψ_{k₀}(·) = χ′_{k₀}(·) and ψ_{k₁}(·) = ρ′_{k₁}(·), where ′ denotes the derivative with respect to the argument, and compute the test statistic T from these quantities.
Standard asymptotic theory shows that T approximately follows a χ² distribution with p degrees of freedom. If T exceeds the α quantile of the χ² distribution with p degrees of freedom, then the ROBUSTREG procedure gives a warning and recommends using other methods. Otherwise, the final MM estimate and the initial scale estimate are reported. You can specify α with the ALPHA= option following the BIASTEST option. By default, ALPHA=0.99.
Since the MM estimate is computed as an M estimate with a fixed scale in the last step, the asymptotic covariance for the M estimate can be used for the asymptotic covariance of the MM estimate. Besides the three estimators H1, H2, and H3 described in the section 'Asymptotic Covariance and Confidence Intervals' on page 3997, a weighted covariance estimator H4 is available:
where K is the correction factor and W is the weighted cross-product matrix defined previously.
You can specify these estimators with the ASYMPCOV=H1 | H2 | H3 | H4 option. The ROBUSTREG procedure uses H4 as the default. Confidence intervals for estimated parameters are computed from the diagonal elements of the estimated asymptotic covariance matrix.
The robust version of R² for the MM estimate is defined as

R² = [ Σ ρ((y_i − μ̂)/σ̂) − Σ ρ((y_i − x_i^T β̂)/σ̂) ] / Σ ρ((y_i − μ̂)/σ̂)

and the robust deviance is defined as the optimal value of the objective function on the σ²-scale:

D = 2 σ̂² Σ ρ((y_i − x_i^T β̂)/σ̂)

where ψ = ρ′, β̂ is the MM estimator of β, μ̂ is the MM estimator of location, and σ̂ is the MM estimator of the scale parameter in the full model.
For MM estimation, the same two linear tests used for M estimation can be used. See the section 'Linear Tests' on page 3998 for details.
For MM estimation, the same two model selection methods used for M estimation can be used. See the section 'Model Selection' on page 3999 for details.
The ROBUSTREG procedure uses robust multivariate location and scale estimates for leverage point detection. The procedure provides the minimum covariance determinant (MCD) method, which was introduced by Rousseeuw (1984).
PROC ROBUSTREG implements the algorithm given by Rousseeuw and Van Driessen (1999) for MCD, which is similar to the algorithm for LTS.
The Mahalanobis distance is defined as

MD(x_i) = √( (x_i − x̄)^T C(X)⁻¹ (x_i − x̄) )

where x̄ = (1/n) Σ_{i=1}^n x_i and C(X) = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)(x_i − x̄)^T. Here x_i = (x_{i1}, ..., x_{i(p−1)})^T does not include the constant variable. The relation between the Mahalanobis distance MD(x_i) and the hat matrix H = (h_ij) = X(X^T X)⁻¹X^T is

h_ii = MD²(x_i)/(n−1) + 1/n
The robust distance is defined as

RD(x_i) = √( (x_i − T(X))^T C(X)⁻¹ (x_i − T(X)) )

where T(X) and C(X) are the robust multivariate location and scale obtained by the MCD method.
These distances are used to detect leverage points.
Let C = √(χ²_{p−1; 0.975}) be the cutoff value. The variable LEVERAGE is defined as

LEVERAGE = 0 if RD(x_i) ≤ C, and LEVERAGE = 1 otherwise.
You can specify a cutoff value with the LEVERAGE option in the MODEL statement.
Residuals r_i, i = 1, ..., n, based on robust regression estimates are used to detect vertical outliers. The variable OUTLIER is defined as

OUTLIER = 0 if |r_i| ≤ k·σ̂, and OUTLIER = 1 otherwise.
You can specify the multiplier k of the cutoff value by using the CUTOFF= option in the MODEL statement.
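To make the definitions concrete, both distances and the resulting flags can be sketched as follows (Python; the MCD location and scatter and the χ²-based cutoff are taken as inputs to the sketch):

```python
import numpy as np

def mahalanobis_dist(Xc, center, cov):
    # sqrt((x - center)^T cov^{-1} (x - center)) for each row of Xc
    d = Xc - center
    inv = np.linalg.inv(cov)
    return np.sqrt(np.einsum('ij,jk,ik->i', d, inv, d))

def diagnostics(Xc, resid, sigma, mcd_center, mcd_cov, cutoff, k=3.0):
    # Leverage: robust distance beyond the chi-square-based cutoff
    rd = mahalanobis_dist(Xc, mcd_center, mcd_cov)
    leverage = (rd > cutoff).astype(int)
    # Outlier: standardized robust residual beyond the multiplier k
    outlier = (np.abs(resid / sigma) > k).astype(int)
    return leverage, outlier
```

Using the MCD center and scatter instead of the classical mean and covariance prevents masking: a cluster of leverage points cannot inflate the scatter estimate enough to hide itself.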
An ODS table called DIAGNOSTICS contains these two variables.
When you use M or MM estimation, you can use the INEST= data set to specify initial estimates for all the parameters in the model. The INEST= option is ignored if you specify LTS or S estimation with the METHOD=LTS or METHOD=S option, or if you specify the INITEST= option after the METHOD=MM option in the PROC statement. The INEST= data set must contain the intercept variable (named Intercept) and all independent variables in the MODEL statement.
If BY processing is used, the INEST= data set should also include the BY variables, and there must be at least one observation for each BY group. If there is more than one observation in a BY group, the first one read is used for that BY group.
If the INEST= data set also contains the _TYPE_ variable, only observations with _TYPE_ value 'PARMS' are used as starting values.
You can specify starting values for the iteratively reweighted least squares algorithm in the INEST= data set. The INEST= data set has the same structure as the OUTEST= data set but is not required to have all the variables or observations that appear in the OUTEST= data set. One simple use of the INEST= option is passing the previous OUTEST= data set directly to the next model as an INEST= data set, assuming that the two models have the same parameterization.
The OUTEST= data set contains parameter estimates for the model. You can specify a label in the MODEL statement to distinguish between estimates from different models fit by the ROBUSTREG procedure. If the COVOUT option is specified, the OUTEST= data set also contains the estimated covariance matrix of the parameter estimates. Note that, if the ROBUSTREG procedure does not converge, the parameter estimates are set to missing in the OUTEST= data set.
The OUTEST= data set contains all variables specified in the MODEL statement and the BY statement. One observation consists of parameter values for the model, with the dependent variable having the value −1. If the COVOUT option is specified, there are additional observations containing the rows of the estimated covariance matrix. For these observations, the dependent variable contains the parameter estimate for the corresponding row variable. The following variables are also added to the data set:
_MODEL_ | a character variable containing the label of the MODEL statement, if present. Otherwise, the variable's value is blank |
_NAME_ | a character variable containing the name of the dependent variable for the parameter estimates observations or the name of the row for the covariance matrix estimates |
_TYPE_ | a character variable containing the type of the observation, either PARMS for parameter estimates or COV for covariance estimates |
_METHOD_ | a character variable containing the type of estimation method: M, LTS, S, or MM estimation |
_STATUS_ | a character variable containing the status of model fitting: Converged, Warning, or Failed |
INTERCEPT | a numeric variable containing the intercept parameter estimates and covariances |
_SCALE_ | a numeric variable containing the scale parameter estimates |
Any BY variables specified are also added to the OUTEST= data set.
The algorithms for the various estimation methods need different amounts of memory for working space. Let p be the number of parameters estimated and n be the number of observations used in the model estimation.
For M estimation, the minimum working space (in bytes) needed is
If sufficient space is available, the input data set is also kept in memory; otherwise, the input data set is reread for computing the iteratively reweighted least squares estimates, and the execution time of the procedure increases substantially. For each reweighted least squares step, O(np² + p³) multiplications and additions are required for computing the cross-product matrix and its inverse. The O(v) notation means that, for large values of the argument v, O(v) is approximately a constant times v.
Since the iteratively reweighted least squares algorithm converges very quickly (normally within less than 20 iterations), the computation of M estimates is fast.
LTS estimation is more expensive in computation. The minimum working space (in bytes) needed is
The memory is mainly used to store the current data used by LTS for modeling. The LTS algorithm uses subsampling and spends much of its computing time on resampling and computing estimates for subsamples. Since it resamples when singularity is detected, it can take more time if the data set has serious singularities.
The MCD algorithm for high leverage point diagnostics is similar to the LTS algorithm.
The ROBUSTREG procedure assigns a name to each table it creates. You can specify these names when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table.
ODS Table Name | Description | Statement | Option |
---|---|---|---|
BestEstimates | Best final estimates for LTS | PROC | SUBANALYSIS |
BestSubEstimates | Best estimates for each subgroup | PROC | SUBANALYSIS [*] |
BiasTest | Bias test for MM estimation | PROC | BIASTEST |
ClassLevels | Class variable levels | CLASS | default* |
CorrB | Parameter estimate correlation matrix | MODEL | CORRB |
CovB | Parameter estimate covariance matrix | MODEL | COVB |
CStep | C-Step for LTS fitting | PROC | SUBANALYSIS |
Diagnostics | Outlier diagnostics | MODEL | DIAGNOSTICS |
DiagSummary | Summary of the outlier diagnostics | MODEL | default |
GoodFit | R2, deviance, AIC, and BIC | MODEL | default |
InitLTSProfile | Profile for initial LTS estimate | PROC | METHOD |
InitSProfile | Profile for initial S estimate | PROC | METHOD |
IterHistory | Iteration history | PROC | ITPRINT |
LTSEstimates | LTS parameter estimates | PROC | METHOD |
LTSLocationScale | Location and scale for LTS | PROC | METHOD |
LTSProfile | Profile for LTS estimate | PROC | METHOD |
LTSRsquare | R2 for LTS estimate | PROC | METHOD |
MMProfile | Profile for MM estimate | PROC | METHOD |
ModelInfo | Model information | MODEL | default |
NObs | Observations Summary | PROC | default |
ParameterEstimates | Parameter estimates | MODEL | default |
ParameterEstimatesF | Final weighted LS estimates | PROC | FWLS |
ParameterEstimatesR | Reduced parameter estimates | TEST | default |
ParmInfo | Parameter indices | MODEL | default |
SProfile | Profile for S estimate | PROC | METHOD |
Groups | Groups for LTS fitting | PROC | SUBANALYSIS [*] |
SummaryStatistics | Summary statistics for model variables | MODEL | default |
TestsProfile | Results for tests | TEST | default |
[*] Depends on data. |
Graphical displays are important in robust regression and outlier detection. Two plots are particularly useful for revealing outliers and leverage points. The first is a scatter plot of the standardized robust residuals against the robust distances (RDPLOT). The second is a scatter plot of the robust distances against the classical Mahalanobis distances (DDPLOT). See Figure 62.4 on page 3975 and Figure 62.5 on page 3975 for examples. In addition to these two plots, a histogram and a quantile-quantile plot of the standardized robust residuals are also helpful. See Figure 62.6 on page 3976 and Figure 62.7 on page 3976 for examples.
This section describes the use of ODS for creating these four plots with the ROBUSTREG procedure. These graphics are experimental in this release, meaning that both the graphical results and the syntax for specifying them are subject to change in a future release.
To request these plots you must specify the ODS GRAPHICS statement in addition to the PLOT= (or PLOTS=) option, which is described as follows. For more information on the ODS GRAPHICS statement, see Chapter 15, 'Statistical Graphics Using ODS.'
You can specify the PLOT= or PLOTS= option in the PROC statement to request one or more plots:
PLOT = keyword
PLOTS =( keyword-list )
requests plots for robust regression. You can specify one or more of the following keywords:
With the RDPLOT and DDPLOT options, you can label the points on the plots by specifying the LABEL= suboption immediately after the keyword:
PLOT=DDPLOT <( LABEL = label method )>
PLOT=RDPLOT <( LABEL = label method )>
You can specify one of the following label methods:
Value of LABEL= | Label Method |
---|---|
ALL | label all points |
OUTLIER | label outliers |
LEVERAGE | label leverage points |
NONE | no labels |
By default, the ROBUSTREG procedure labels both outliers and leverage points.
If you specify ID variables in the ID statement, the values of the first ID variable are used as labels; otherwise, observation numbers are used as labels.
The histogram is superimposed with a normal density curve and a kernel density curve.
PROC ROBUSTREG assigns a name to each graph it creates using ODS. You can use these names to reference the graphs when using ODS. The names are listed in Table 62.11 on page 4015.
ODS Graph Name | Plot Description | Statement | PLOTS= Option |
---|---|---|---|
DDPlot | Robust distance - Mahalanobis distance | PROC | DDPLOT |
RDPlot | Standardized robust residual -Robust distance | PROC | RDPLOT |
ResidualHistogram | Histogram of standardized robust residuals | PROC | RESHISTOGRAM |
ResidualQQPlot | Q-Q plot of standardized robust residuals | PROC | RESQQPLOT |
To request these graphs you must specify the ODS GRAPHICS statement in addition to the PLOT= (or PLOTS=) option described in Table 62.9 on page 4014. For more information on the ODS GRAPHICS statement, see Chapter 15, 'Statistical Graphics Using ODS.'
Keyword | Plot |
---|---|
DDPLOT | Robust distance - Mahalanobis distance |
RDPLOT | Standardized robust residual - Robust distance |
RESHISTOGRAM | Histogram of standardized robust residuals |
RESQQPLOT | Q-Q plot of standardized robust residuals |
ALL | All plots |