Syntax | SAS/STAT 9.1 User's Guide (Vol. 6)

The following statements are available in PROC RSREG.

PROC RSREG < options > ;
- MODEL responses= independents < / options > ;
- RIDGE < options > ;
- WEIGHT variable ;
- ID variables ;
- BY variables ;

The PROC RSREG and MODEL statements are required. The BY, ID, MODEL, RIDGE, and WEIGHT statements are described after the PROC RSREG statement, and they can appear in any order.
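As a sketch of how these statements fit together, the following program simulates a small two-factor experiment and then uses every statement listed above. The data set WORK.EXPT, the variables BATCH, RUN_ID, X1, X2, Y, and W, and the coefficients in the simulated response are all hypothetical, chosen only for illustration.

```sas
/* Hypothetical simulated data: two factors on a 3 x 3 grid,       */
/* repeated for two batches so that a BY statement applies.        */
data work.expt;
   do batch = 1 to 2;
      do x1 = -1 to 1;
         do x2 = -1 to 1;
            run_id + 1;          /* sum statement: running run number */
            w = 1;               /* equal weights (placeholder)       */
            y = 50 + 4*x1 - 2*x2 - 3*x1*x1 - 2*x2*x2 + x1*x2
                + rannor(12345); /* simulated noise                   */
            output;
         end;
      end;
   end;
run;

/* The data are generated in BATCH order, so no PROC SORT is needed. */
proc rsreg data=work.expt out=work.rsout;
   model y = x1 x2 / predict residual;   /* required statement      */
   ridge max;                            /* optional ridge analysis */
   weight w;
   id run_id;
   by batch;
run;
```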

PROC RSREG Statement

PROC RSREG < options > ;

The PROC RSREG statement invokes the procedure. You can specify the following options in the PROC RSREG statement.

DATA= SAS-data-set

specifies the input SAS data set that contains the data to be analyzed. By default, PROC RSREG uses the most recently created SAS data set.

NOPRINT

suppresses the normal display of results when only the output data set is required. For more information, see the description of the NOPRINT option in the 'MODEL Statement' and 'RIDGE Statement' sections. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 14, 'Using the Output Delivery System,' for more information.

OUT= SAS-data-set

creates an output SAS data set that contains statistics for each observation in the input data set. In particular, this data set contains the BY variables, the ID variables, the WEIGHT variable, the variables in the MODEL statement, and the output options requested in the MODEL statement. You must specify output options in the MODEL statement; otherwise, the output data set is created but contains no observations. To create a permanent SAS data set, you must specify a two-level name (refer to the discussion in SAS Language Reference: Concepts for more information on permanent SAS data sets). For details on the data set created by PROC RSREG, see the 'Output Data Sets' section on page 4051.
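A minimal sketch of the PROC RSREG statement options, assuming a hypothetical simulated data set WORK.EXPT: DATA= names the input, OUT= names the output, and NOPRINT suppresses the displayed results because only the output data set is wanted. The PREDICT option in the MODEL statement is what gives WORK.FITTED its observations. For a permanent data set you would use a two-level name such as MYLIB.FITTED after assigning the (hypothetical) libref MYLIB with a LIBNAME statement.

```sas
/* Hypothetical simulated data on a coded 3 x 3 grid. */
data work.expt;
   do x1 = -1 to 1;
      do x2 = -1 to 1;
         y = 50 + 4*x1 - 2*x2 - 3*x1*x1 - 2*x2*x2 + rannor(2024);
         output;
      end;
   end;
run;

/* NOPRINT: no displayed output, only WORK.FITTED is created.        */
/* Without an output option such as PREDICT in the MODEL statement,  */
/* WORK.FITTED would be created but would contain no observations.   */
proc rsreg data=work.expt out=work.fitted noprint;
   model y = x1 x2 / predict;
run;

proc print data=work.fitted;
run;
```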

BY Statement

BY variables ;

You can specify a BY statement with PROC RSREG to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

- Sort the data using the SORT procedure with a similar BY statement.
- Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the RSREG procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
- Create an index on the BY variables using the DATASETS procedure (in base SAS software).

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
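The first alternative above, sorting with PROC SORT before the analysis, might look like the following sketch. The data set WORK.PLANTS and the variable PLANT are hypothetical; the observations for the two plants are deliberately interleaved, so neither NOTSORTED nor DESCENDING would help here and a sort is required.

```sas
/* Hypothetical data: observations for plants 'B' and 'A' are interleaved. */
data work.plants;
   do x1 = -1 to 1;
      do x2 = -1 to 1;
         do plant = 'B', 'A';
            y = 50 + 4*x1 - 2*x2 - 3*x1*x1 - 2*x2*x2 + rannor(7);
            output;
         end;
      end;
   end;
run;

/* Sort by the BY variable so that each plant forms one contiguous group. */
proc sort data=work.plants out=work.plants_sorted;
   by plant;
run;

/* One separate response-surface analysis per plant. */
proc rsreg data=work.plants_sorted;
   model y = x1 x2;
   by plant;
run;
```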

ID Statement

ID variables ;

The ID statement names variables that are to be transferred to the data set created by the OUT= option in the PROC RSREG statement.
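For instance, to keep a run number alongside each residual, a sketch like the following carries the hypothetical variable RUNNO into the OUT= data set. It assumes, as the MODEL statement description says, that the statistic is identified by the _TYPE_ variable; the response variable Y is assumed to hold the value of that statistic.

```sas
/* Hypothetical simulated data with a run number. */
data work.expt;
   do x1 = -1 to 1;
      do x2 = -1 to 1;
         runno + 1;
         y = 50 + 4*x1 - 2*x2 - 3*x1*x1 - 2*x2*x2 + rannor(42);
         output;
      end;
   end;
run;

proc rsreg data=work.expt out=work.resids noprint;
   model y = x1 x2 / residual;
   id runno;              /* RUNNO is copied into WORK.RESIDS */
run;

proc print data=work.resids;
   var runno _type_ y;    /* Y holds the statistic named by _TYPE_ */
run;
```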

MODEL Statement

MODEL responses=independents < / options > ;

The MODEL statement lists response (dependent) variables followed by an equal sign and then lists independent variables, some of which may be covariates. The output options to the MODEL statement specify which statistics are output to the data set created using the OUT= option in the PROC RSREG statement. If none of the options are selected, the data set is created but contains no observations. The option keywords become values of the special variable _TYPE_ in the output data set. Any of the following options can be specified.

Task	Options
Analyze Original Data	NOCODE
Fit Model to First BY Group Only	BYOUT
Declare Covariates	COVAR=
Request Additional Statistics	PRESS
Request Additional Tests	LACKFIT
Suppress Displayed Output	NOANOVA NOOPTIMAL NOPRINT
Output Statistics	ACTUAL PREDICT RESIDUAL L95 U95 L95M U95M D

ACTUAL

specifies that the observed response values from the input data set be written to the output data set.

BYOUT

uses only the first BY group to estimate the model. Subsequent BY groups have scoring statistics computed in the output data set only. The BYOUT option is used only when a BY statement is specified.
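A sketch of BYOUT, assuming a hypothetical data set in which BATCH=1 is a calibration batch and BATCH=2 is only to be scored: the model is estimated from the first BY group, and the second group receives predicted values and residuals computed from that fit.

```sas
/* Hypothetical simulated data for two batches. */
data work.batches;
   do batch = 1 to 2;
      do x1 = -1 to 1;
         do x2 = -1 to 1;
            y = 50 + 4*x1 - 2*x2 - 3*x1*x1 - 2*x2*x2 + rannor(99);
            output;
         end;
      end;
   end;
run;

/* BYOUT: estimate from BATCH=1 only; score BATCH=2 with that model. */
proc rsreg data=work.batches out=work.scored noprint;
   model y = x1 x2 / predict residual byout;
   by batch;
run;
```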

COVAR= n

declares that the first n variables on the right-hand side of the model are simple linear regressors (covariates) and not factors in the quadratic response surface. By default, PROC RSREG forms quadratic and crossproduct effects for all regressor variables in the MODEL statement. See the 'Handling Covariates' section on page 4050 for more details and Example 63.2 on page 4059 for an example using covariates.
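As an illustration, with COVAR=1 the first regressor listed (here a hypothetical covariate AGE) enters the model only as a linear term, while the factors X1 and X2 receive linear, quadratic, and crossproduct terms. The data set and variable names are assumptions made for this sketch.

```sas
/* Hypothetical simulated data: two factors plus a continuous covariate. */
data work.expt;
   do x1 = -1 to 1;
      do x2 = -1 to 1;
         do rep = 1 to 2;
            age = 20 + 10*ranuni(5);   /* hypothetical covariate */
            y = 40 + 0.3*age + 4*x1 - 2*x2 - 3*x1*x1 - 2*x2*x2 + rannor(6);
            output;
         end;
      end;
   end;
run;

/* COVAR=1: AGE is a simple linear regressor; X1 and X2 form */
/* the quadratic response surface.                           */
proc rsreg data=work.expt;
   model y = age x1 x2 / covar=1;
run;
```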

D

specifies that Cook's D influence statistic be written to the output data set. See Chapter 2, 'Introduction to Regression Procedures,' for details and formulas.

LACKFIT

performs a lack-of-fit test. Refer to Draper and Smith (1981) for a discussion of lack-of-fit tests.

L95

specifies that the lower bound of a 95% confidence interval for an individual predicted value be written to the output data set. The variance used in calculating this bound is a function of both the mean square error and the variance of the parameter estimates. See Chapter 2 for details and formulas.

L95M

specifies that the lower bound of a 95% confidence interval for the expected value of the dependent variable be written to the output data set. The variance used in calculating this bound is a function of the variance of the parameter estimates. See Chapter 2 for details and formulas.

NOANOVA

NOAOV

suppresses the display of the analysis of variance and parameter estimates from the model fit.

NOCODE

performs the canonical and ridge analyses with the parameter estimates derived from fitting the response to the original values of the factor variables, rather than their coded values (see the 'Coding the Factor Variables' section on page 4047 for more details). Use this option if the data are already stored in a coded form.
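A minimal sketch, assuming hypothetical data whose factors X1 and X2 are already stored in coded form (-1, 0, 1): NOCODE tells PROC RSREG to base the canonical and ridge analyses directly on these stored values.

```sas
/* Hypothetical data already stored on a coded -1, 0, 1 scale. */
data work.coded;
   do x1 = -1 to 1;
      do x2 = -1 to 1;
         y = 50 + 4*x1 - 2*x2 - 3*x1*x1 - 2*x2*x2 + rannor(3);
         output;
      end;
   end;
run;

/* Canonical and ridge analyses use X1 and X2 as stored. */
proc rsreg data=work.coded;
   model y = x1 x2 / nocode;
   ridge max;
run;
```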

NOOPTIMAL

NOOPT

suppresses the display of the canonical analysis for the quadratic response surface.

NOPRINT

suppresses the display of both the analysis of variance and the canonical analysis.

PREDICT

specifies that the values predicted by the model be written to the output data set.

PRESS

computes and displays the predicted residual sum of squares (PRESS) statistic for each dependent variable in the model. The PRESS statistic is added to the summary information at the beginning of the analysis of variance, so if the NOANOVA or NOPRINT option is specified, PRESS has no effect. See Chapter 2 for details and formulas.
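A sketch requesting both LACKFIT and PRESS, assuming hypothetical simulated data: the extra center runs give replicated design points, which supply the pure-error estimate that a lack-of-fit test needs. Because PRESS is reported with the analysis of variance, NOANOVA and NOPRINT are deliberately not specified.

```sas
/* Hypothetical simulated data: 3 x 3 grid plus replicated center runs. */
data work.expt;
   do x1 = -1 to 1;
      do x2 = -1 to 1;
         y = 50 + 4*x1 - 2*x2 - 3*x1*x1 - 2*x2*x2 + rannor(31);
         output;
      end;
   end;
   /* Three extra center runs provide pure error for LACKFIT. */
   x1 = 0;
   x2 = 0;
   do rep = 1 to 3;
      y = 50 + rannor(31);
      output;
   end;
run;

proc rsreg data=work.expt;
   model y = x1 x2 / lackfit press;
run;
```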

RESIDUAL

specifies that the residuals, calculated as ACTUAL - PREDICTED, be written to the output data set.

U95

specifies that the upper bound of a 95% confidence interval for an individual predicted value be written to the output data set. The variance used in calculating this bound is a function of both the mean square error and the variance of the parameter estimates. See Chapter 2 for details and formulas.

U95M

specifies that the upper bound of a 95% confidence interval for the expected value of the dependent variable be written to the output data set. The variance used in calculating this bound is a function of the variance of the parameter estimates. See Chapter 2 for details and formulas.
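The output options above can be combined in one MODEL statement. In this sketch (hypothetical data set and variable names) every output statistic is requested, and the printout is then restricted by _TYPE_; it assumes the stored _TYPE_ values are the uppercase option keywords.

```sas
/* Hypothetical simulated data. */
data work.expt;
   do x1 = -1 to 1;
      do x2 = -1 to 1;
         y = 50 + 4*x1 - 2*x2 - 3*x1*x1 - 2*x2*x2 + rannor(8);
         output;
      end;
   end;
run;

proc rsreg data=work.expt out=work.stats noprint;
   model y = x1 x2 / actual predict residual l95 u95 l95m u95m d;
run;

/* Show only predictions and the confidence limits for the mean. */
proc print data=work.stats;
   where _type_ in ('PREDICT', 'L95M', 'U95M');
run;
```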

RIDGE Statement

RIDGE < options > ;

A RIDGE statement computes the ridge of optimum response. The ridge starts at a given point x, and the point on the ridge at radius r from x is the collection of factor settings that optimizes the predicted response at this radius. You can think of the ridge as climbing or falling as fast as possible on the surface of predicted response. Thus, the ridge analysis can be used as a tool to help interpret an existing response surface or to indicate the direction in which further experimentation should be performed.

The default starting point, x, has each coordinate equal to the point midway between the highest and lowest values of the factor in the design. The default radii at which the ridge is computed are 0, 0.1, …, 0.9, 1. If, as usual, the ridge analysis is based on the response surface fit to coded values for the factor variables (see the 'Coding the Factor Variables' section on page 4047 for details), then this results in a ridge that starts at the point with a coded zero value for each coordinate and extends toward, but not beyond, the edge of the range of experimentation. Alternatively, both the center point for the ridge and the radii at which it is to be computed can be specified with the CENTER= and RADIUS= options.
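To make the defaults concrete, consider hypothetical uncoded factors TEMP (100 to 140) and TIME (10 to 30). With no CENTER= or RADIUS= option, the ridge below starts at TEMP=120, TIME=20, the midpoints of the two ranges, and is computed at coded radii 0, 0.1, …, 1. The data and the simulated yield function are assumptions for this sketch.

```sas
/* Hypothetical simulated process data in original (uncoded) units. */
data work.process;
   do temp = 100 to 140 by 20;
      do time = 10 to 30 by 10;
         yield = 80 - 0.01*(temp - 125)**2 - 0.05*(time - 22)**2
                 + rannor(11);
         output;
      end;
   end;
run;

/* Default ridge: starts at TEMP=120, TIME=20; radii 0 to 1 by 0.1. */
proc rsreg data=work.process;
   model yield = temp time;
   ridge max;
run;
```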

You can specify the following options in the RIDGE statement:

CENTER= uncoded-factor-values

gives the coordinates of the point x from which to begin the ridge. The coordinates should be given in the original (uncoded) factor variable values and should be separated by commas. There must be as many coordinates specified as there are factors in the model, and the order of the coordinates must be the same as that used in the MODEL statement. This starting point should be well inside the range of experimentation. The default sets each coordinate equal to the value midway between the highest and lowest values for the associated factor.

MAXIMUM

MAX

computes the ridge of maximum response. Both the MIN and MAX options can be specified; at least one must be specified.

MINIMUM

MIN

computes the ridge of minimum response. Both the MIN and MAX options can be specified; at least one must be specified.

NOPRINT

suppresses the display of the ridge analysis when only an output data set is required.

OUTR= SAS-data-set

creates an output SAS data set containing the computed optimum ridge. For details, see the 'Output Data Sets' section on page 4051.

RADIUS= coded-radii

gives the distances from the ridge starting point at which to compute the optimum. The values in the list represent distances between coded points. The list can take any of the following forms or can be composed of mixtures of them:

m₁, m₂, …, mₙ	several values
m TO n	a sequence where m equals the starting value, n equals the ending value, and the increment equals 1
m TO n BY i	a sequence where m equals the starting value, n equals the ending value, and i equals the increment

Mixtures of the preceding forms should be separated by commas. The default list runs from 0 to 1 by increments of 0.1. The following are examples of valid lists.
```
radius=0 to 5 by .5;
radius=0, .2, .25, .3, .5 to 1.0 by .1;
```
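Putting the RIDGE options together, the following sketch (hypothetical data set, variables, and output data set name) computes both the maximum and minimum ridges from a user-chosen center, at selected coded radii, and stores them in an OUTR= data set. CENTER= is given in uncoded units in MODEL-statement order (TEMP, then TIME); NOPRINT in both the MODEL and RIDGE statements leaves only the output data set.

```sas
/* Hypothetical simulated process data in original (uncoded) units. */
data work.process;
   do temp = 100 to 140 by 20;
      do time = 10 to 30 by 10;
         yield = 80 - 0.01*(temp - 125)**2 - 0.05*(time - 22)**2
                 + rannor(11);
         output;
      end;
   end;
run;

proc rsreg data=work.process;
   model yield = temp time / noprint;
   /* CENTER= uses uncoded values; RADIUS= uses coded distances. */
   ridge max min outr=work.ridge noprint
         center=115, 18
         radius=0 to 0.8 by 0.2;
run;

proc print data=work.ridge;
run;
```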

WEIGHT Statement

WEIGHT variable ;

When a WEIGHT statement is used, a weighted residual sum of squares

$$\sum_i w_i \left( y_i - \hat{y}_i \right)^2$$

is minimized, where $w_i$ is the value of the variable specified in the WEIGHT statement, $y_i$ is the observed value of the response variable, and $\hat{y}_i$ is the predicted value of the response variable.

An observation is used in the analysis only if its value of the WEIGHT variable is greater than zero. The WEIGHT statement has no effect on the degrees of freedom or the number of observations. If the weights for the observations are proportional to the reciprocals of the error variances, then the weighted least-squares estimates are best linear unbiased estimators (BLUE).
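A sketch of inverse-variance weighting, assuming hypothetical data in which each run's error variance VAR_Y is known: setting W = 1/VAR_Y makes the weights proportional to the reciprocals of the error variances, the case in which the weighted estimates are BLUE.

```sas
/* Hypothetical simulated data with run-specific error variances. */
data work.weighted;
   do x1 = -1 to 1;
      do x2 = -1 to 1;
         var_y = 0.5 + abs(x1) + abs(x2);   /* assumed error variance */
         y = 50 + 4*x1 - 2*x2 - 3*x1*x1 - 2*x2*x2
             + sqrt(var_y)*rannor(17);
         w = 1 / var_y;                     /* weight = 1/variance    */
         output;
      end;
   end;
run;

/* Minimizes the sum of W*(Y - predicted Y)**2. */
proc rsreg data=work.weighted;
   model y = x1 x2;
   weight w;
run;
```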