Syntax | SAS/STAT 9.1 User's Guide, Volume 3

The following statements are available in PROC LOESS:

PROC LOESS < DATA = SAS-data-set > ;
- MODEL dependents=regressors < / options > ;
- ID variables ;
- BY variables ;
- WEIGHT variable ;
- SCORE < DATA= SAS-data-set > < ID=( variable list ) > < / options > ;

The PROC LOESS and MODEL statements are required. The BY, WEIGHT, and ID statements are optional. The SCORE statement is optional, and more than one SCORE statement can be used.

The statements used with the LOESS procedure, in addition to the PROC LOESS statement, are as follows.

BY	specifies variables to define subgroups for the analysis.
ID	names variables to identify observations in the displayed output.
MODEL	specifies the dependent and independent variables in the loess model, details and parameters for the computational algorithm, and the required output.
SCORE	specifies a data set containing observations to be scored.
WEIGHT	declares a variable to weight observations.

PROC LOESS Statement

PROC LOESS < DATA = SAS-data-set > ;

The PROC LOESS statement is required. The only option in this statement is the DATA= option, which names a data set to use for the loess model.
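As a minimal sketch of a complete invocation, the following step simulates a small data set and fits a loess model with only the two required statements. The names work.demo, x, and y are hypothetical and are used for illustration only.

```sas
/* Simulate 100 noisy observations of a smooth curve.          */
/* work.demo, x, and y are hypothetical names for illustration. */
data work.demo;
   do i = 1 to 100;
      x = i / 10;
      y = sin(x) + 0.3 * rannor(12345);
      output;
   end;
   drop i;
run;

/* Only the PROC LOESS and MODEL statements are required. */
proc loess data=work.demo;
   model y = x;
run;
```

Because this model has one dependent variable and specifies neither SMOOTH= nor SELECT=, the smoothing parameter is chosen as described for the SELECT= option in the MODEL statement.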

BY Statement

BY variables ;

You can specify a BY statement with PROC LOESS to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in the order of the BY variables. The variables are one or more variables in the input data set.

If your input data set is not sorted in ascending order, use one of the following alternatives:

- Sort the data using the SORT procedure with a similar BY statement.
- Specify the BY statement option NOTSORTED. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
- Create an index on the BY variables using the DATASETS procedure (in Base SAS software).

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
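The first alternative (sorting before the analysis) can be sketched as follows. The data set work.groups and the variables grp, x, and y are hypothetical; the groups are generated out of order on purpose so that the sort is needed.

```sas
/* Two groups, written in the order B then A (not ascending). */
/* All names here are hypothetical.                           */
data work.groups;
   do grp = 'B', 'A';
      do i = 1 to 50;
         x = i / 5;
         y = cos(x) + 0.2 * rannor(2024);
         output;
      end;
   end;
   drop i;
run;

/* Sort with the same BY statement that PROC LOESS will use. */
proc sort data=work.groups;
   by grp;
run;

/* One separate loess analysis per value of grp. */
proc loess data=work.groups;
   by grp;
   model y = x / smooth=0.5;
run;
```

Because the observations for each group are already contiguous in the unsorted data, omitting the PROC SORT step and writing `by grp notsorted;` would be an instance of the second alternative.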

ID Statement

ID variables ;

The ID statement is optional, and more than one ID statement can be used. The variables listed in any of the ID statements are displayed in the Output Statistics table beside each observation. Any variables specified as a regressor or dependent variable in the MODEL statement already appear in the Output Statistics table and are not treated as ID variables, even if they appear in the variable list of an ID statement.

MODEL Statement

MODEL dependents=independent variables < / options > ;

The MODEL statement names the dependent variables and the independent variables. Variables specified in the MODEL statement must be numeric variables in the data set being analyzed.

Table 41.1 lists the options available in the MODEL statement.

Table 41.1: Model Statement Options
Option	Description
Fitting Parameters
DIRECT	specifies direct fitting at every data point
SMOOTH=	specifies the list of smoothing values
DEGREE=	specifies the degree of local polynomials (1 or 2)
DROPSQUARE=	specifies the variables whose squares are to be dropped from local quadratic polynomials
BUCKET=	specifies the number of points in kd tree buckets
ITERATIONS=	specifies the number of reweighting iterations
DFMETHOD=	specifies the method of computing lookup degrees of freedom
SELECT=	specifies that automatic smoothing parameter selection be done
TRACEL	displays the trace of the smoothing matrix
Residuals and Confidence limits
ALL	requests the following options: CLM, RESIDUAL, SCALEDINDEP, STD, and T
CLM	displays 100(1 − α)% confidence interval for the mean predicted value
RESIDUAL	displays residual statistics
STD	displays estimated prediction standard deviation
T	displays t statistics
INTERP=	specifies the degree of polynomials used in blending
Display Options
DETAILS	specifies which tables are to be displayed
Other options
ALPHA=	sets the significance level for confidence intervals
SCALE=	specifies the method used to scale the regressor variables
SCALEDINDEP	displays scaled independent variable coordinates

The following options are available in the MODEL statement after a slash (/).

ALL

requests all these options: CLM, RESIDUAL, SCALEDINDEP, STD, and T.

ALPHA= number

sets the significance level used for the construction of confidence intervals for the current MODEL statement. The value must be between 0 and 1; the default value of 0.05 results in 95% intervals.

BUCKET= number

specifies the maximum number of points in the leaf nodes of the kd tree. The default value used is s * n / 5, where s is a smoothing parameter value specified using the SMOOTH= option and n is the number of observations being used in the current BY group. The BUCKET= option is ignored if the DIRECT option is specified.

CLM

requests that 100(1 − α)% confidence limits on the mean predicted value be added to the Output Statistics table. By default, 95% limits are computed; the ALPHA= option in the MODEL statement can be used to change the α-level. The use of this option implicitly selects the model option DFMETHOD=EXACT if the DFMETHOD= option has not been explicitly used.

DEGREE= 1 | 2

sets the degree of the local polynomials to use for each local regression. The valid values are 1 for local linear fitting or 2 for local quadratic fitting, with 1 being the default.

DETAILS < ( tables ) >

selects which tables to display, where tables is one or more of KDTREE, MODELSUMMARY, OUTPUTSTATISTICS, and PREDATVERTICES:
- KDTREE displays the kd tree structure.
- MODELSUMMARY displays the fit criteria for all smoothing parameter values that are specified in the SMOOTH= option in the MODEL statement, or which are fit with automatic smoothing parameter selection.
- OUTPUTSTATISTICS displays the predicted values and other requested statistics at the points in the input data set.
- PREDATVERTICES displays fitted values and coordinates of the kd tree vertices where the local least squares fitting is done.

The KDTREE and PREDATVERTICES specifications are ignored if the DIRECT option is specified in the MODEL statement. Specifying the option DETAILS with no qualifying list outputs all tables.

DFMETHOD= NONE | EXACT | APPROX < ( approx-options ) >

specifies the method used to calculate the lookup degrees of freedom used in performing statistical inference. The default is DFMETHOD=NONE, unless you specify any of the MODEL statement options ALL, CLM, or T, or any SCORE statement CLM option, in which case the default is DFMETHOD=EXACT.

You can specify the following approx-options in parentheses after the DFMETHOD=APPROX option:

QUANTILE= number specifies that the smallest 100( number )% of the nonzero coefficients in the smoothing matrix are set to zero in computing the approximate lookup degrees of freedom. The default value is QUANTILE=0.9.

CUTOFF= number specifies that coefficients in the smoothing matrix whose magnitude is less than the specified value are set to zero in computing the approximate lookup degrees of freedom. Using the CUTOFF= option overrides the QUANTILE= option.

See the section "Sparse and Approximate Degrees of Freedom Computation" for a description of the method used when the DFMETHOD=APPROX option is specified.
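A sketch of requesting approximate lookup degrees of freedom with a nondefault quantile is shown below. The data set work.big and its variables are hypothetical, and the value 0.95 is an arbitrary choice made for illustration.

```sas
/* Hypothetical data: 500 noisy observations of a smooth curve. */
data work.big;
   do i = 1 to 500;
      x = i / 50;
      y = sin(x) + 0.3 * rannor(4242);
      output;
   end;
   drop i;
run;

/* Zero out the smallest 95% of the nonzero smoothing-matrix   */
/* coefficients when approximating the lookup degrees of       */
/* freedom (the documented default is QUANTILE=0.9).           */
proc loess data=work.big;
   model y = x / smooth=0.3 dfmethod=approx(quantile=0.95);
run;
```

Replacing `quantile=0.95` with a CUTOFF= value would use the magnitude threshold instead, since CUTOFF= overrides QUANTILE=.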

DIRECT

specifies that local least squares fits are to be done at every point in the input data set. When the DIRECT option is not specified, a computationally faster method is used. This faster method performs local fitting at vertices of a kd tree decomposition of the predictor space followed by blending of the local polynomials to obtain a regression surface.

DROPSQUARE=( variables )

specifies the quadratic monomials to exclude from the local quadratic fits. This option is ignored unless the DEGREE=2 option has been specified. For example,
```
  model z=x y / degree=2 dropsquare=(y);
```
uses the monomials 1, x, y, x², and xy in performing the local fitting.

INTERP= LINEAR | CUBIC

specifies the degree of the interpolating polynomials used for blending local polynomial fits at the kd tree vertices. This option is ignored if the DIRECT option is specified in the MODEL statement. INTERP=CUBIC is not supported for models with more than two regressors. The default is INTERP=LINEAR.

ITERATIONS= number

specifies the number of iterative reweighting steps to be done. Such iterations are appropriate when there are outliers in the data or when the error distribution is a symmetric long-tailed distribution. The default number of iterations is 1.
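The following sketch shows the kind of data for which extra reweighting steps are intended. The data set work.outl is hypothetical, the outliers are injected artificially, and the choice of four iterations is arbitrary.

```sas
/* Hypothetical data with a few large artificial outliers. */
data work.outl;
   do i = 1 to 100;
      x = i / 10;
      y = sin(x) + 0.2 * rannor(99);
      if mod(i, 25) = 0 then y = y + 5;   /* every 25th point is shifted up */
      output;
   end;
   drop i;
run;

/* Request four iterative reweighting steps and list the residuals. */
proc loess data=work.outl;
   model y = x / smooth=0.4 iterations=4 residual;
run;
```

Running the same step with the default ITERATIONS=1 gives a fit to compare against near the shifted points.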

RESIDUAL | R

specifies that residuals are to be included in the Output Statistics table.

SCALE= NONE | SD < ( number ) >

specifies the scaling method to be applied to scale the regressors. The default is NONE, in which case no scaling is applied. A specification of SD( number ) indicates that a trimmed standard deviation is to be used as a measure of scale, where number is the trimming fraction. A specification of SD with no qualification defaults to 10% trimmed standard deviation.

SCALEDINDEP

specifies that scaled regressor coordinates be included in the output tables. This option is ignored if the SCALE= model option is not used or if SCALE=NONE is specified.
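Scaling matters mainly when regressors are measured on very different scales. The sketch below assumes two hypothetical regressors, x1 and x2, whose ranges differ by a factor of about 1000, and assumes that the trimming fraction is written as a proportion (0.1 for 10%).

```sas
/* Hypothetical data: x1 is roughly in (0, 1), x2 roughly in (0, 1000). */
data work.two;
   do i = 1 to 200;
      x1 = ranuni(5);
      x2 = 1000 * ranuni(5);
      z  = sin(3 * x1) + x2 / 500 + 0.1 * rannor(5);
      output;
   end;
   drop i;
run;

/* Scale both regressors by a trimmed standard deviation and    */
/* include the scaled coordinates in the Output Statistics table. */
proc loess data=work.two;
   model z = x1 x2 / smooth=0.5 scale=sd(0.1) scaledindep
                     details(outputstatistics);
run;
```

With SCALE=NONE (the default), the SCALEDINDEP option in this statement would be ignored.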

SELECT= criterion < ( <GLOBAL> <STEPS> <RANGE( lower , upper )> )>

SELECT= DFCriterion ( target <GLOBAL> <STEPS> <RANGE( lower , upper )> )

specifies that automatic smoothing parameter selection be done using the named criterion or DFCriterion . Valid values for the criterion are

AICC	specifies the AIC_C criterion (Hurvich, Simonoff, and Tsai 1998).
AICC1	specifies the AIC_C1 criterion (Hurvich, Simonoff, and Tsai 1998).
GCV	specifies the generalized cross-validation criterion (Craven and Wahba 1979).

The DFCriterion specifies the measure used to estimate the model degrees of freedom. The measures implemented in PROC LOESS all depend on the prediction matrix L relating the observed and predicted values of the dependent variable. Valid values are

DF1	specifies Trace( L ).
DF2	specifies Trace( L ^T L ).
DF3	specifies 2 Trace( L ) − Trace( L ^T L ).

For both types of selection, the smoothing parameter value is selected to yield a minimum of an optimization criterion. If you specify criterion as one of AICC, AICC1, or GCV, the optimization criterion is the specified criterion. If you specify DFCriterion as one of DF1, DF2, or DF3, the optimization criterion is | DFCriterion − target |, where target is a specified target degree of freedom value. Note that if you specify a DFCriterion, then you must also specify a target value. See the section "Automatic Smoothing Parameter Selection" for definitions and properties of the selection criteria.

The selection is done as follows:

If you specify the SMOOTH= value-list option, then PROC LOESS selects the largest value in this list that yields the global minimum of the specified optimization criterion.
If you do not specify the SMOOTH= option, then PROC LOESS finds a local minimum of the specified optimization criterion using a golden section search of values less than or equal to one.

You can specify the following modifiers in parentheses after the specified criterion to alter the behavior of the SELECT= option:

GLOBAL	specifies that a global minimum be found within the range of smoothing parameter values examined. This modifier has no effect if you also specify the SMOOTH= option in the MODEL statement.
STEPS	specifies that all models evaluated in the selection process be displayed.
RANGE( lower , upper )	specifies that only smoothing parameter values greater than or equal to lower and less than or equal to upper be examined.

For models with one dependent variable, if you specify neither the SELECT= nor the SMOOTH= options in the MODEL statement, then PROC LOESS uses SELECT=AICC.

The following table summarizes how the smoothing parameter values are chosen for various combinations of the SMOOTH= option, the SELECT= option, and the SELECT= option modifiers.

Table 41.2: Smoothing Parameter Value(s) Used for Combinations of SMOOTH= and SELECT= Options for Models with One Dependent Variable
Syntax	Search Method	Search Domain
default	golden section using AICC	(0, 1]
SMOOTH= list	no selection	values in list
SMOOTH= list SELECT= criterion	global	values in list
SMOOTH= list SELECT= criterion ( RANGE( l, u ) )	global	values in list within [l, u]
SELECT= criterion	golden section	(0, 1]
SELECT= criterion ( RANGE( l, u ) )	golden section	[l, u]
SELECT= criterion ( GLOBAL )	global	(0, 1]
SELECT= criterion ( GLOBAL RANGE( l, u ) )	global	[l, u]

Some examples of using the SELECT= option are

SELECT=GCV	specifies selection using the GCV criterion.
SELECT=DF1(6.3)	specifies selection using the DF1 DFCriterion with target value 6.3.
SELECT=AICC(STEPS)	specifies selection using the AICC criterion showing all step details.
SELECT=DF2(7 GLOBAL)	specifies selection using the DF2 DFCriterion with target value 7 using a global search algorithm.

Note: The SELECT= option cannot be used for models with more than one dependent variable.
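Placed in complete MODEL statements, two of the selection variants might look as follows. This is a sketch on simulated data; work.seldemo, x, and y are hypothetical names, and the range limits and target value are arbitrary.

```sas
/* Hypothetical data with a single dependent variable. */
data work.seldemo;
   do i = 1 to 150;
      x = i / 15;
      y = sin(x) + 0.3 * rannor(606);
      output;
   end;
   drop i;
run;

/* Golden section search over [0.1, 0.8] using GCV, */
/* displaying every model evaluated along the way.  */
proc loess data=work.seldemo;
   model y = x / select=gcv(steps range(0.1, 0.8));
run;

/* Global search for the smoothing parameter whose Trace(L) */
/* is closest to the target value 6.3.                      */
proc loess data=work.seldemo;
   model y = x / select=df1(6.3 global);
run;
```

Both steps use one dependent variable, as the SELECT= option requires.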

SMOOTH= value-list

specifies a list of positive smoothing parameter values. If you do not specify the SELECT= option in the MODEL statement, then a separate fit is obtained for each SMOOTH= value specified. If you do specify the SELECT= option, then models with all values specified in the SMOOTH= list are examined, and PROC LOESS selects the value that minimizes the criterion specified in the SELECT= option.

For models with two or more dependent variables, if the SMOOTH= option is not specified in the MODEL statement, then SMOOTH=0.5 is used as a default.
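The two uses of a SMOOTH= list can be contrasted with a short sketch. The data set work.smdemo and the four candidate values are hypothetical choices made for illustration.

```sas
/* Hypothetical data for comparing smoothing parameter values. */
data work.smdemo;
   do i = 1 to 120;
      x = i / 12;
      y = sin(x) + 0.3 * rannor(808);
      output;
   end;
   drop i;
run;

/* No SELECT= option: one separate fit per listed value. */
proc loess data=work.smdemo;
   model y = x / smooth=0.2 0.4 0.6 0.8;
run;

/* With SELECT=: the listed values are examined and the one */
/* minimizing the AICC criterion is selected.               */
proc loess data=work.smdemo;
   model y = x / smooth=0.2 0.4 0.6 0.8 select=aicc;
run;
```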

STD

specifies that standard errors are to be included in the Output Statistics table.

T

specifies that t statistics are to be included in the Output Statistics table.

TRACEL

specifies that the trace of the prediction matrix as well as the GCV and AICC statistics are to be included in the Fit Summary table. The use of any of the MODEL statement options ALL, CLM, DFMETHOD=EXACT, DIRECT, SELECT=, or T implicitly selects the TRACEL option.

SCORE Statement

SCORE < DATA= SAS-data-set > < ID=( variable list ) > < / options > ;

The fitted loess model is used to score the data in the specified SAS data set. This data set must contain all the regressor variables specified in the MODEL statement. Furthermore, when a BY statement is used, the score data set must also contain all the BY variables sorted in the order of the BY variables. A SCORE statement is optional, and more than one SCORE statement can be used. SCORE statements cannot be used if the DIRECT option is specified in the MODEL statement. The optional ID= ( variable list ) specifies ID variables to be included in the Score Results table.

You find the results of the SCORE statement in the Score Results table. This table contains all the data in the data set named in the SCORE statement, including observations with missing values. However, only those observations with nonmissing regressor variables are scored. If no data set is named in the SCORE statement, the data set named in the PROC LOESS statement is scored. You use the PRINT option in the SCORE statement to request that the Score Results table be displayed. You can place the Score Results table in an output data set using an ODS OUTPUT statement even if this table is not displayed.

The following options are available in the SCORE statement after a slash (/).

CLM

requests that 100(1 − α)% confidence limits on the mean predicted value be added to the Score Results table. By default, 95% limits are computed; the ALPHA= option in the MODEL statement can be used to change the α-level. The use of this option implicitly selects the model option DFMETHOD=EXACT if the DFMETHOD= option has not been explicitly used.

PRINT < ( variables ) >

specifies that the Score Results table is to be displayed. By default, only the variables named in the MODEL statement, the variables listed in the ID list in the SCORE statement, and the scored dependent variables are displayed. The optional list in the PRINT option specifies additional variables in the score data set that are to be included in the displayed output. Note, however, that all columns in the score data set are placed in the Score Results table, even if you do not request that they be included in the displayed output.

SCALEDINDEP

specifies that scaled regressor coordinates be included in the Score Results table. This option is ignored if the SCALE= model option is not used or if SCALE=NONE is specified.

STEPS

requests that all models evaluated during smoothing parameter value selection be scored, provided that the SELECT= option together with the STEPS modifier is specified in the MODEL statement. By default only the selected model is scored.
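A sketch of scoring new observations and saving the results is shown below. The data sets work.train, work.new, and work.scored are hypothetical, and the ODS table name ScoreResults is an assumption about how the Score Results table is named for ODS; check it against the ODS table names for your release.

```sas
/* Hypothetical training data. */
data work.train;
   do i = 1 to 100;
      x = i / 10;
      y = sin(x) + 0.3 * rannor(777);
      output;
   end;
   drop i;
run;

/* Hypothetical observations to score. The data set must contain  */
/* the regressor x; the values lie inside the training range.     */
data work.new;
   do x = 0.55 to 9.55 by 1;
      output;
   end;
run;

/* Save the Score Results table (table name assumed). */
ods output ScoreResults=work.scored;

proc loess data=work.train;
   model y = x / smooth=0.5;
   score data=work.new / clm print;   /* confidence limits, displayed */
run;
```

The DIRECT option is deliberately absent from the MODEL statement, because SCORE statements cannot be used with it.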

WEIGHT Statement

WEIGHT variable ;

The WEIGHT statement specifies a variable in the input data set that contains values to be used as a priori weights for a loess fit.

The values of the weight variable must be nonnegative. If an observation's weight is zero, negative, or missing, the observation is deleted from the analysis.
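The following sketch illustrates a priori weights, including zero weights that remove observations from the fit. The data set work.wdemo and the weight variable w are hypothetical, and the weighting scheme is arbitrary.

```sas
/* Hypothetical data with an a priori weight variable w. */
data work.wdemo;
   do i = 1 to 100;
      x = i / 10;
      y = sin(x) + 0.3 * rannor(31);
      w = 1;
      if mod(i, 10) = 0 then w = 0;   /* zero weight: deleted from the analysis */
      else if i > 50 then w = 2;      /* later points count twice as much       */
      output;
   end;
   drop i;
run;

proc loess data=work.wdemo;
   weight w;
   model y = x / smooth=0.5;
run;
```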