Details


Model Statement Usage

  MODEL transform(dependents < / t-options >)
        < transform(dependents < / t-options >) ... > =
        transform(independents < / t-options >)
        < transform(independents < / t-options >) ... >
        < / a-options > ;

Here are some examples of model statements:

  • linear regression

      model identity(y) = identity(x);  
  • a linear model with a nonlinear regression function

      model identity(y) = spline(x / nknots=5);  
  • multiple regression

      model identity(y) = identity(x1-x5);  
  • multiple regression with nonlinear transformations

      model spline(y / nknots=3) = spline(x1-x5 / nknots=3);  
  • multiple regression with nonlinear but monotone transformations

      model mspline(y / nknots=3) = mspline(x1-x5 / nknots=3);  
  • multivariate multiple regression

      model identity(y1-y4) = identity(x1-x5);  
  • canonical correlation

      model identity(y1-y4) = identity(x1-x5) / method=canals;  
  • redundancy analysis

      model identity(y1-y4) = identity(x1-x5) / method=redundancy;  
  • preference mapping, vector model (Carroll 1972)

      model identity(Attrib1-Attrib3) = identity(Dim1-Dim2);  
  • preference mapping, ideal point model (Carroll 1972)

      model identity(Attrib1-Attrib3) = point(Dim1-Dim2);  
  • preference mapping, ideal point model, elliptical (Carroll 1972)

      model identity(Attrib1-Attrib3) = epoint(Dim1-Dim2);  
  • preference mapping, ideal point model, quadratic (Carroll 1972)

      model identity(Attrib1-Attrib3) = qpoint(Dim1-Dim2);  
  • metric conjoint analysis

      model identity(Subj1-Subj50) = class(a b c d e f / zero=sum);  
  • nonmetric conjoint analysis

      model monotone(Subj1-Subj50) = class(a b c d e f / zero=sum);  
  • main effects, two-way interaction

      model identity(y) = class(a|b);  
  • less-than-full-rank model: main effects and two-way interaction are constrained to sum to zero

      model identity(y) = class(a|b / zero=sum);  
  • main effects and all two-way interactions

      model identity(y) = class(a|b|c@2);  
  • main effects and all two- and three-way interactions

      model identity(y) = class(a|b|c);  
  • main effects and just B*C two-way interaction

      model identity(y) = class(a b c b*c);  
  • seven main effects, three two-way interactions

      model identity(y) = class(a b c d e f g a*b a*c a*d);  
  • deviations-from-means (effects or (1, 0, −1)) coding, with an A reference level of 1 and a B reference level of 2

      model identity(y) = class(a b / deviations zero='1' '2');  
  • cell-means coding (implicit intercept)

      model identity(y) = class(a*b / zero=none);  
  • reference cell model

      model identity(y) = class(a b / zero='1' '1');  
  • reference line with change in line parameters

      model identity(y) = class(a) | identity(x);  
  • reference curve with change in curve parameters

      model identity(y) = class(a) | spline(x);  
  • separate curves and intercepts

      model identity(y) = class(a / zero=none) | spline(x);  
  • quantitative effects with interaction

      model identity(y) = identity(x1|x2);  
  • separate quantitative effects with interaction within each cell

      model identity(y) = class(a*b / zero=none) | identity(x1|x2);  

Box-Cox Transformations

The Box-Cox (1964) transformation has the form

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\[4pt]
\log(y) & \lambda = 0
\end{cases}
$$

This family of transformations of the positive dependent variable y is controlled by the parameter λ. Transformations linearly related to square root, inverse, quadratic, cubic, and so on are all special cases. The limit as λ approaches 0 is the log transformation. More generally, Box-Cox transformations of the following form can be fit:

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{(y + c)^{\lambda} - 1}{\lambda g} & \lambda \neq 0 \\[4pt]
\dfrac{\log(y + c)}{g} & \lambda = 0
\end{cases}
$$

By default, c = 0. The parameter c can be used to rescale y so that it is strictly positive. By default, g = 1. Alternatively, g can be ẏ^(λ−1), where ẏ is the geometric mean of y.

The BOXCOX transformation in PROC TRANSREG can be used to perform a Box-Cox transformation of the dependent variable. You can specify a list of power parameters by using the LAMBDA= transformation option. By default, LAMBDA=-3 TO 3 BY 0.25. The procedure chooses the optimal power parameter using a maximum likelihood criterion (Draper and Smith 1981, pp. 225-226). You can specify the PARAMETER=c transformation option when you want to shift the values of y, usually to avoid negatives. To divide by ẏ^(λ−1), specify the GEOMETRICMEAN transformation option.

Here are some examples of usage of the LAMBDA= option:

  model BoxCox(y / lambda=0) = identity(x1-x5);
  model BoxCox(y / lambda=-2 to 2 by 0.1) = identity(x1-x5);
  model BoxCox(y) = identity(x1-x5);

In the first example

  model BoxCox(y / lambda=0) = identity(x1-x5);  

LAMBDA=0 specifies a Box-Cox transformation with a power parameter of 0. Since a single value of 0 was specified for LAMBDA=, there is no difference between the following models:

  model BoxCox(y / lambda=0) = identity(x1-x5);
  model log(y) = identity(x1-x5);

In the second example

  model BoxCox(y / lambda=-2 to 2 by 0.1) = identity(x1-x5);  

there is a list of power parameters specified. This tells PROC TRANSREG to find a Box-Cox transformation before the usual iterations begin. PROC TRANSREG tries each power parameter in the list and picks the best transformation. A maximum likelihood approach (Draper and Smith 1981, pp. 225-226) is used. Note that this is quite different from TRANSREG's usual approach of iteratively finding optimal transformations. It is analogous to SMOOTH, RANK, and the other nonoptimal transformations that are performed before the iterations begin.

In the third example

  model BoxCox(y) = identity(x1-x5);  

the default list of -3 TO 3 BY 0.25 is used.

The procedure prints the optimal power parameter, a confidence interval on the power parameter (using the ALPHA= transformation option), a convenient power parameter (selected from the CLL= option list), and the log likelihood for each power parameter tried (see Example 75.6).
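For example, the following sketch combines a fine LAMBDA= grid with the ALPHA= and CLL= transformation options described above. The variable names y and x1-x5 and the particular option values are illustrative assumptions, not part of the original examples:

  proc transreg;
     /* alpha=0.1 requests a 90% confidence interval for lambda;  */
     /* cll= gives a hypothetical list of convenient parameters   */
     model boxcox(y / lambda=-2 to 2 by 0.05
                      alpha=0.1
                      cll=-1 -0.5 0 0.5 1) = identity(x1-x5);
     output;
  run;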

Smoothing Splines

You can use PROC TRANSREG to output to a SAS data set the same smoothing splines that the GPLOT procedure creates. The SMOOTH transformation is a noniterative transformation for smoothing splines. The smoothing parameter can be specified with either the SM= or the PARAMETER= o-option. The independent variable transformation (Tx in this case) contains the results. The GPLOT request y*x=2 with I=SM50 creates the same curve as Tx*x.

  title 'Smoothing Splines';

  data x;
     do x = 1 to 100 by 2;
        do rep = 1 to 3;
           y = log(x) + sin(x / 10) + normal(7);
           output;
        end;
     end;
  run;

  proc transreg;
     model identity(y) = smooth(x / sm=50);
     output;
  run;

  %let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;

  proc gplot;
     axis1 minor=none label=(angle=90 rotate=0);
     axis2 minor=none;
     plot y*x=1 y*x=2 tx*x=3 / &opts overlay;
     symbol1 color=blue   v=star i=none;
     symbol2 color=yellow v=none i=sm50;
     symbol3 color=cyan   v=dot  i=none;
  run; quit;
Figure 75.5: Smoothing Spline Example 1

When you cross a SMOOTH variable with a CLASS variable, specify ZERO=NONE with the CLASS expansion and the AFTER t-option with the SMOOTH transformation so that separate functions are found within each group.

  title2 'Two Groups';

  data x;
     do x = 1 to 100;
        group = 1;
        do rep = 1 to 3;
           y = log(x) + sin(x / 10) + normal(7);
           output;
        end;
        group = 2;
        do rep = 1 to 3;
           y = -log(x) + cos(x / 10) + normal(7);
           output;
        end;
     end;
  run;

  proc transreg;
     model identity(y) = class(group / zero=none) |
                         smooth(x / after sm=50);
     output out=curves;
  run;

  data curves2;
     set curves;
     if group1 = 0 then tgroup1x = .;
     if group2 = 0 then tgroup2x = .;
  run;

  %let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;

  proc gplot;
     axis1 minor=none label=(angle=90 rotate=0);
     axis2 minor=none;
     plot y*x=1 tgroup1x*x=2 tgroup2x*x=2 / &opts overlay;
     symbol1 color=blue   v=star i=none;
     symbol2 color=yellow v=none i=join;
  run; quit;
Figure 75.6: Smoothing Spline Example 2

The SMOOTH transformation is valid only with independent variables; typically, it is used in models with a single independent and a single dependent variable. When there are multiple independent variables designated as SMOOTH, the TRANSREG procedure tries to smooth the i th independent variable using the i th dependent variable as a target. When there are more independent variables than dependent variables, the last dependent variable is reused as often as is necessary. For example, for the model

  model identity(y1-y3) = smooth(x1-x5);  

smoothing is based on the pairs ( y1 , x1 ), ( y2 , x2 ), ( y3 , x3 ), ( y3 , x4 ), and ( y3 , x5 ).

The SMOOTH transformation is a noniterative transformation; smoothing occurs once per variable before the iterations begin. In contrast, SSPLINE provides an iterative smoothing spline transformation. It does not generally minimize squared error; hence, divergence is possible with SSPLINE.

Missing Values

PROC TRANSREG can estimate missing values, with or without category or monotonicity constraints, so that the regression model fit is optimized. Several approaches to missing data handling are provided. All observations with missing values in IDENTITY, CLASS, POINT, EPOINT, QPOINT, SMOOTH, PSPLINE, and BSPLINE variables are excluded from the analysis. When METHOD=UNIVARIATE (specified in the PROC TRANSREG or MODEL statement), observations with missing values in any of the independent variables are excluded from the analysis. When you specify the NOMISS a-option, observations with missing values in the other analysis variables are excluded. Otherwise, missing data are estimated, using variable means as initial estimates.

You can specify the LINEAR, OPSCORE, MONOTONE, UNTIE, SPLINE, MSPLINE, SSPLINE, LOG, LOGIT, POWER, ARSIN, BOXCOX, RANK, and EXP transformations in any combination with nonmissing values, ordinary missing values, and special missing values, as long as the nonmissing values in each variable have positive variance. No category or order restrictions are placed on the estimates of ordinary missing values. You can force missing value estimates within a variable to be identical by using special missing values (refer to DATA Step Processing in SAS Language Reference: Concepts ). You can specify up to 27 categories of missing values, in which within-category estimates must be the same, by coding the missing values using ._ and .A through .Z.

You can also specify an ordering of some missing value estimates. You can use the MONOTONE= a-option in the PROC TRANSREG or MODEL statement to indicate a range of special missing values (a subset of the list from .A to .Z) with estimates that must be weakly ordered within each variable in which they appear. For example, if MONOTONE=AI, the nine classes .A, .B, ..., .I are monotonically scored and optimally scaled just as MONOTONE transformation values are scored. In this case, category but not order restrictions are placed on the missing values ._ and .J through .Z. You can also use the UNTIE= a-option (in the PROC TRANSREG or MODEL statement) to indicate a range of special missing values with estimates that must be weakly ordered within each variable in which they appear but can be untied.

The missing value estimation facilities allow for partitioned or mixed-type variables. For example, a variable can be considered part nominal and part ordinal. Nominal classes of otherwise ordinal variables are coded with special missing values. This feature can be useful with survey research. An example is the class "unfamiliar with the product" in the item "Rate your preference for Brand X on a 1 to 9 scale, or if you are unfamiliar with the product, check 'unfamiliar with the product'." You can code "unfamiliar with the product" as a special missing value, such as .A. The 1s to 9s can be monotonically transformed, while no monotonic restrictions are placed on the quantification of the "unfamiliar with the product" class.
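A minimal sketch of this survey coding follows. The variable names pref and x, and the use of 0 as the recorded code for "unfamiliar with the product," are assumptions for illustration:

  data survey;
     input pref x @@;
     /* recode the "unfamiliar with the product" checkbox         */
     /* (entered as 0) to the special missing value .A            */
     if pref = 0 then pref = .A;
     datalines;
  3 1  5 2  0 3  7 4  0 5  9 6  2 7  8 8
  ;

  proc transreg;
     /* the 1 to 9 ratings are monotonically transformed; both .A */
     /* values get the same estimate, with no order restriction   */
     model monotone(pref) = identity(x);
     output;
  run;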

A variable specified for a LINEAR transformation, with special missing values and ordered categorical missing values, can be part interval, part ordinal, and part nominal. A variable specified for a MONOTONE transformation can have two independent ordinal parts. A variable specified for an UNTIE transformation can have an ordered categorical part and an ordered part without category restrictions. Many other mixes are possible.

Missing Values, UNTIE, and Hypothesis Tests

The TRANSREG procedure has the ability to estimate missing data and monotonically transform variables while untying tied values. Estimates of ordinary missing values (.) may all be different. Analyses with UNTIE transformations, the UNTIE= a-option, and ordinary missing data estimation are all prone to degeneracy problems. Consider the following example, in which a perfect fit is found by collapsing all observations except the one with two missing values into a single value in Y and X1.

  data x;
     input y x1 x2 @@;
     datalines;
  1 3 7    8 3 9    1 8 6    . . 9    3 3 9
  8 5 1    6 7 3    2 7 2    1 8 2    . 9 1
  ;

  proc transreg dummy;
     model linear(y) = linear(x1 x2);
     output;
  run;

  proc print;
  run;
  Obs  _TYPE_  _NAME_   y      Ty     Intercept  x1  x2  TIntercept      Tx1   Tx2

    1  SCORE   ROW1     1    2.7680       1       3   7       1        5.1233    7
    2  SCORE   ROW2     8    2.7680       1       3   9       1        5.1233    9
    3  SCORE   ROW3     1    2.7680       1       8   6       1        5.1233    6
    4  SCORE   ROW4     .   12.5878       1       .   9       1       12.7791    9
    5  SCORE   ROW5     3    2.7680       1       3   9       1        5.1233    9
    6  SCORE   ROW6     8    2.7680       1       5   1       1        5.1233    1
    7  SCORE   ROW7     6    2.7680       1       7   3       1        5.1233    3
    8  SCORE   ROW8     2    2.7680       1       7   2       1        5.1233    2
    9  SCORE   ROW9     1    2.7680       1       8   2       1        5.1233    2
   10  SCORE   ROW10    .    2.7680       1       9   1       1        5.1233    1

Figure 75.7: Missing Values Example

Generally, the use of ordinary missing data estimation, the UNTIE transformation, and the UNTIE= a-option should be avoided, particularly with hypothesis tests. With these options, parameters are estimated based on only a single observation, and they can exert tremendous influence over the results. Each of these parameters has one model degree of freedom associated with it, so small or zero error degrees of freedom can also be a problem.

Controlling the Number of Iterations

Several a-options in the PROC TRANSREG or MODEL statement control the number of iterations performed. Iteration terminates when any one of the following conditions is satisfied:

  • The number of iterations equals the value of the MAXITER= a-option.

  • The average absolute change in variable scores from one iteration to the next is less than the value of the CONVERGE= a-option.

  • The criterion change is less than the value of the CCONVERGE= a-option.

You can specify negative values for either convergence option if you wish to define convergence only in terms of the other option. The criterion change can become negative when the data have converged so that it is numerically impossible, within machine precision, to increase the criterion. Usually, a negative criterion change is the result of very small amounts of rounding error since the algorithms are (usually) convergent. However, there are other cases where a negative criterion change is a sign of divergence, which is not necessarily an error. When you specify an SSPLINE transformation or the REITERATE or DUMMY a-option, divergence may be perfectly normal.

When there are no monotonicity constraints and there is only one canonical variable in each set, PROC TRANSREG (with the DUMMY a-option ) can usually find the optimal solution in only one iteration. (There are no monotonicity constraints when the MONOTONE, MSPLINE, or UNTIE transformations and the UNTIE= and MONOTONE= a-options are not specified. There is only one canonical variable in each set when METHOD=MORALS or METHOD=UNIVARIATE, or when METHOD=REDUNDANCY with only one dependent variable, or when METHOD=CANALS and NCAN=1.)

The initialization iteration is number 0. When there are no monotonicity constraints and there is only one canonical variable in each set, the next iteration shows no change and iteration stops. At least two iterations (0 and 1) are performed with the DUMMY a-option even if nothing changes in iteration 0. The MONOTONE, MSPLINE, and UNTIE variables are not transformed by the dummy variable initialization. Note that divergence with the DUMMY a-option, particularly in the second iteration, is not an error. The initialization iteration is slower and uses more memory than other iterations. However, for many models, specifying the DUMMY a-option can greatly decrease the amount of time required to find the optimal transformations. Furthermore, by solving for the transformations directly instead of iteratively, PROC TRANSREG avoids certain nonoptimal solutions.

You can increase the number of iterations to ensure convergence by increasing the value of the MAXITER= a-option and decreasing the value of the CONVERGE= a-option. Since the average absolute change in standardized variable scores seldom decreases below 1E−11, you should not specify a value for the CONVERGE= a-option less than 1E−8 or 1E−10. Most of the data changes occur during the first few iterations, but the data can still change after 50 or even 100 iterations. You can try different combinations of values for the CONVERGE= and MAXITER= a-options to ensure convergence without extreme overiteration. If the data do not converge with the default specifications, try CONVERGE=1E−8 and MAXITER=50, or CONVERGE=1E−10 and MAXITER=200. Note that you can specify the REITERATE a-option to start iterating where the previous analysis stopped.
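For example, this sketch (variable names assumed for illustration) requests tighter convergence and more iterations than the defaults:

  proc transreg maxiter=100 converge=1e-8;
     model mspline(y / nknots=5) = mspline(x1-x5 / nknots=5);
     output;
  run;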

Using the REITERATE Algorithm Option

You can use the REITERATE a-option to perform additional iterations when PROC TRANSREG stops before the data have adequately converged. For example, suppose that you execute the following code:

  proc transreg data=a;
     model mspline(y) = mspline(x1-x5);
     output out=b coefficients;
  run;

If the transformations do not converge in the default 30 iterations, you can perform more iterations without repeating the first 30 iterations.

  proc transreg data=b reiterate;
     model mspline(y) = mspline(x1-x5);
     output out=b coefficients;
  run;

Note that a WHERE statement is not necessary to exclude the coefficient observations. They are automatically excluded because their _TYPE_ value is not SCORE.

You can also use the REITERATE a-option to specify starting values other than the original values for the transformations. Providing alternate starting points may avoid local optima. Here are two examples.

  proc transreg data=a;
     model rank(y) = rank(x1-x5);
     output out=b;
  run;

  proc transreg data=b reiterate;
     /* Use ranks as the starting point. */
     model mspline(y) = mspline(x1-x5);
     output out=c coefficients;
  run;

  data b;
     set a;
     array tx[6] ty tx1-tx5;
     do j = 1 to 6;
        tx[j] = normal(7);
     end;
  run;

  proc transreg data=b reiterate;
     /* Use a random starting point. */
     model mspline(y) = mspline(x1-x5);
     output out=c coefficients;
  run;

Note that divergence with the REITERATE a-option, particularly in the second iteration, is not an error since the initial transformation is not required to be a valid member of the transformation family. When you specify the REITERATE a-option, the iteration does not terminate when the criterion change is negative during the first 10 iterations.

Avoiding Constant Transformations

There are times when the optimal scaling produces a constant transformed variable. This can happen with the MONOTONE, UNTIE, and MSPLINE transformations when the target is negatively correlated with the original input variable. It can happen with all transformations when the target is uncorrelated with the original input variable. When this happens, the procedure modifies the target to avoid a constant transformation. This strategy avoids certain nonoptimal solutions.

If the transformation is monotonic and a constant transformed variable results, the procedure multiplies the target by −1 and tries the optimal scaling again. If the transformation is not monotonic, or if the multiplication by −1 did not help, the procedure tries using a random target. If the transformation is still constant, the previous nonconstant transformation is retained. When a constant transformation is avoided by any strategy, this message is displayed: "A constant transformation was avoided for name."

With extreme collinearity, small amounts of rounding error might interact with the instability of the coefficients to produce target vectors that are not positively correlated with the original scaling. If a regression coefficient for a variable is zero, the formula for the target for that variable contains a zero divide. In a multiple regression model, after many iterations, one independent variable can be scaled the same way as the current scaling of the dependent variable, so the other independent variables have coefficients of zero. When the constant transformation warning appears, you should interpret your results with extreme caution, and recheck your model.

Constant Variables

Constant and almost constant variables are zeroed and ignored. As long as the dependent variable is not constant, PROC TRANSREG produces an iteration history table for all models, not just models in which the variables can change. When constant variables are expected and should not be zeroed, specify the NOZEROCONSTANT option.

Character OPSCORE Variables

Character OPSCORE variables are replaced by a numeric variable containing category numbers before the iterations, and the character values are discarded. Only the first eight characters are considered when determining category membership. If you want the original character variable in the output data set, give it a different name in the OPSCORE specification (OPSCORE(x / name=(x2))) and name the original variable in the ID statement (ID x;).
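A minimal sketch of this renaming (the variable names x, x2, and y are assumptions):

  proc transreg;
     /* the OPSCORE result is output under the name x2 ...        */
     model identity(y) = opscore(x / name=(x2));
     /* ... while ID keeps the original character variable x      */
     id x;
     output;
  run;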

Convergence and Degeneracies

When you specify the SSPLINE transformation, divergence is normal. The rest of this section assumes that you did not specify SSPLINE. For all the methods available in PROC TRANSREG, the algorithms are convergent, both in terms of the criterion being optimized and the parameters being estimated. The value of the criterion being maximized (squared multiple correlation, average squared multiple correlation, or average squared canonical correlation) can, theoretically, never decrease from one iteration to the next. The values of the parameters being solved for (the scores and weights of the transformed variables) become stable after sufficient iteration.

In practice, the criterion being maximized can decrease with overiteration. When the statistic has very nearly reached its maximum, further iterations might report a decrease in the criterion in the last few decimal places. This is a normal result of very small amounts of rounding error. By default, iteration terminates when this occurs because, by default, CCONVERGE=0.0. Specifying CCONVERGE=−1, an impossible change, turns off this check for convergence.

Even though the algorithms are convergent, they might not converge to a global optimum. Also, under extreme circumstances, the solution might degenerate. Because two points always form a straight line, the algorithms sometimes try to reach this degenerate optimum. This sometimes occurs when one observation is an ordinal outlier (when one observation has the extreme rank on all variables). The algorithm can reach an optimal solution that ties all other categories producing two points. Similar results can occur when there are many missing values. More generally, whenever there are very few constraints on the scoring of one or more points, degeneracies can be a problem. In a well-behaved analysis, the maximum data change, average data change, and criterion change all decrease at a rapid rate with each iteration. When the rate of change increases for several iterations, the solution might be degenerating.

Implicit and Explicit Intercepts

Depending on several options, the model intercept is nonzero, zero, or implicit, or there is no intercept. Ordinarily, the model contains an explicit nonzero intercept, and the Intercept variable in the OUT= data set contains ones. When TSTANDARD=CENTER or TSTANDARD=Z is specified, the model contains an explicit, zero intercept and the Intercept variable contains zeros. When METHOD=CANALS, the model is fit with centered variables and the Intercept variable is set to missing.

If you specify CLASS with ZERO=NONE or BSPLINE for one or more independent variables, and TSTANDARD=NOMISS or TSTANDARD=ORIGINAL (the default), an implicit intercept model is fit. The intercept is implicit in the independent variables, since there exists a set of independent variables whose sum is a column of ones. All statistics are mean corrected. The implicit intercept is not an option; it is implied by the model.
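For example, in this sketch (variable names assumed), the ZERO=NONE CLASS columns sum to a column of ones, so the model has an implicit intercept:

  proc transreg;
     model identity(y) = class(a / zero=none);
     output replace;
  run;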

With METHOD=CANALS, the Intercept variable contains the canonical intercept for canonical coefficients observations.

Passive Observations

Observations may be excluded from the analysis for several reasons; these include zero weight; zero frequency; missing values in variables designated IDENTITY, CLASS, POINT, EPOINT, QPOINT, SMOOTH, PSPLINE, or BSPLINE; and missing values with the NOMISS a-option specified. These observations are passive in that they do not contribute to determining transformations, R², sums of squares, degrees of freedom, and so on. However, some information can be computed for them. For example, if no independent variable values are missing, predicted values and redundancy variable values can both be computed. Residuals can be computed for observations with a nonmissing dependent and nonmissing predicted value. Canonical variables for dependent variables can be computed when no dependent variables are missing; canonical variables for independent variables can be computed when no independent variables are missing, and so on. Passive observations in the OUT= data set have a blank value for _TYPE_.

Point Models

The expanded set of independent variables generated from the POINT, EPOINT, and QPOINT expansions can be used to perform ideal point regressions (Carroll 1972) and compute ideal point coordinates for plotting in a biplot (Gabriel 1981). The three types of ideal point coordinates can all be described as transformed coefficients. Assume that m independent variables are specified in one of the three point expansions. Let b′ be a 1 × m row vector of coefficients for these variables and one of the dependent variables. Let R be a matrix created from the coefficients of the extra variables. When coordinates are requested with the MPC, MEC, or MQC o-options, b′ and R are created from multiple regression coefficients. When coordinates are requested with the CPC, CEC, or CQC o-options, b′ and R are created from canonical coefficients.

If you specify the POINT expansion in the MODEL statement, R is an m × m identity matrix times the coefficient for the sums-of-squares (_ISSQ_) variable. If you specify the EPOINT expansion, R is an m × m diagonal matrix of coefficients from the squared variables. If you specify the QPOINT expansion, R is an m × m symmetric matrix of coefficients from the squared variables on the diagonal and crossproduct variables off the diagonal. The MPC, MEC, MQC, CPC, CEC, and CQC ideal point coordinates are defined as −0.5 b′R⁻¹. When R is singular, the ideal point coordinates are infinitely far away and are set to missing, so you should try a simpler version of the model. The version that is simpler than the POINT model is the vector model, where no extra variables are created. In the vector model, designate all independent variables as IDENTITY. Then draw vectors from the origin to the COEFFICIENTS points.

Typically, when you request ideal point coordinates, the MODEL statement should consist of a single transformation for the dependent variables (usually IDENTITY, MONOTONE, or MSPLINE) and a single expansion for the independent variables (one of POINT, EPOINT, or QPOINT).
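A minimal sketch, reusing the preference mapping names from the earlier model statement examples, that writes MPC ideal point coordinates to the OUT= data set (the data set name coord is an assumption):

  proc transreg;
     model identity(Attrib1-Attrib3) = point(Dim1-Dim2);
     /* mpc requests ideal point coordinates computed from the    */
     /* multiple regression coefficients                          */
     output out=coord mpc;
  run;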

Redundancy Analysis

Redundancy analysis (Stewart and Love 1968) is a principal component analysis of multivariate regression predicted values. These first steps show the redundancy analysis results produced by PROC TRANSREG. The specification TSTANDARD=Z standardizes all variables to mean zero and variance one. METHOD=REDUNDANCY specifies redundancy analysis and outputs the redundancy variables to the OUT= data set. The MREDUNDANCY o-option outputs two sets of redundancy analysis coefficients to the OUT= data set.

  title 'Redundancy Analysis';

  data x;
     input y1-y3 x1-x4;
     datalines;
  6  8  8 15 18 26 27
  1 12 16 18  9 20  8
  5  6 15 20 17 29 31
  6  9 15 14 10 16 22
  7  5 12 14  6 13  9
  3  6  7  2 14 26 22
  3  5  9 13 18 10 22
  6  3 11  3 15 22 29
  6  3  7 10 20 21 27
  7  5  9  8 10 12 18
  ;

  proc transreg data=x tstandard=z method=redundancy;
     model identity(y1-y3) = identity(x1-x4);
     output out=red mredundancy replace;
  run;

  proc print data=red(drop=Intercept);
     format _numeric_ 4.1;
  run;
  Redundancy Analysis

  Obs  _TYPE_    _NAME_   y1    y2    y3    x1    x2    x3    x4   Red1  Red2  Red3

    1  SCORE     ROW1     0.5   0.6   0.8   0.6   0.9   1.0   0.7   0.2   0.5   0.9
    2  SCORE     ROW2     2.0   2.1   1.5   1.1   1.0   0.1   1.7   1.6   1.5   0.4
    3  SCORE     ROW3     0.0  -0.1   1.2   1.4   0.7   1.5   1.2   1.0   0.8   1.3
    4  SCORE     ROW4     0.5   1.0   1.2   0.4   0.8   0.5   0.1   0.5   1.7   0.1
    5  SCORE     ROW5     1.0  -0.4   0.3   0.4   1.6   1.0   1.6   1.0   0.1   0.9
    6  SCORE     ROW6     1.0  -0.1   1.1   1.6   0.1   1.0   0.1  -0.8   0.9   1.4
    7  SCORE     ROW7     1.0  -0.4   0.6   0.2   0.9   1.5   0.1  -1.0   0.4   1.3
    8  SCORE     ROW8     0.5  -1.2   0.0   1.5   0.3   0.4   1.0  -1.2   0.8   0.7
    9  SCORE     ROW9     0.5  -1.2   1.1   0.3   1.3   0.2   0.7  -1.0   0.9   0.8
   10  SCORE     ROW10    1.0  -0.4  -0.6   0.6   0.8   1.1   0.4   0.4   0.8   0.7
   11  M REDUND  Red1      .     .     .    0.7   0.6   0.4   0.1    .     .     .
   12  M REDUND  Red2      .     .     .    0.3   1.5   0.6   1.9    .     .     .
   13  M REDUND  Red3      .     .     .    0.7   0.7   0.3   0.3    .     .     .
   14  R REDUND  x1        .     .     .     .     .     .     .    0.8   0.0   0.6
   15  R REDUND  x2        .     .     .     .     .     .     .    0.6   0.2   0.7
   16  R REDUND  x3        .     .     .     .     .     .     .    0.1   0.2   0.1
   17  R REDUND  x4        .     .     .     .     .     .     .    0.5   0.3   0.5

Figure 75.8: Redundancy Analysis Example

The _TYPE_=SCORE observations of the Red1-Red3 variables contain the redundancy variables. The nonmissing M REDUND values are coefficients for predicting the redundancy variables from the independent variables. The nonmissing R REDUND values are coefficients for predicting the independent variables from the redundancy variables.

The following steps show how to generate the same results manually. The data set is standardized, predicted values are computed, and principal components of the predicted values are computed. The following statements produce the redundancy variables, shown in Figure 75.9:

  proc standard data=x out=std m=0 s=1;
     title2 'Manually Generate Redundancy Variables';
  run;

  proc reg noprint data=std;
     model y1-y3 = x1-x4;
     output out=p p=ay1-ay3;
  run; quit;

  proc princomp data=p cov noprint std out=p;
     var ay1-ay3;
  run;

  proc print data=p(keep=Prin:);
     format _numeric_ 4.1;
  run;
  Redundancy Analysis
  Manually Generate Redundancy Variables

  Obs    Prin1    Prin2    Prin3

    1      0.2      0.5      0.9
    2      1.6      1.5      0.4
    3      1.0      0.8      1.3
    4      0.5      1.7      0.1
    5      1.0      0.1      0.9
    6      0.8      0.9      1.4
    7      1.0      0.4      1.3
    8      1.2      0.8      0.7
    9      1.0      0.9      0.8
   10      0.4      0.8      0.7

Figure 75.9: Redundancy Analysis Example

The following statements produce the coefficients for predicting the redundancy variables from the independent variables, shown in Figure 75.10:

  proc reg data=p outest=redcoef noprint;
     title2 'Manually Create Redundancy Coefficients';
     model Prin1-Prin3 = x1-x4;
  run; quit;

  proc print data=redcoef(keep=x1-x4);
     format _numeric_ 4.1;
  run;
  Redundancy Analysis
  Manually Create Redundancy Coefficients

  Obs      x1      x2      x3      x4

    1     0.7     0.6     0.4     0.1
    2     0.3     1.5     0.6     1.9
    3     0.7     0.7     0.3     0.3

Figure 75.10: Redundancy Analysis Example

The following statements produce the coefficients for predicting the independent variables from the redundancy variables, shown in Figure 75.11:

  proc reg data=p outest=redcoef2 noprint;
     title2 'Manually Create Other Coefficients';
     model x1-x4 = prin1-prin3;
  run; quit;

  proc print data=redcoef2(keep=Prin1-Prin3);
     format _numeric_ 4.1;
  run;
  Redundancy Analysis
  Manually Create Other Coefficients

  Obs    Prin1    Prin2    Prin3

    1      0.8      0.0      0.6
    2      0.6      0.2      0.7
    3      0.1      0.2      0.1
    4      0.5      0.3      0.5

Figure 75.11: Redundancy Analysis Example

Optimal Scaling

An alternating least-squares optimal scaling algorithm can be divided into two major stages. The first stage estimates the parameters of the linear model. These parameters are used to create the predicted values or target for each variable that can be transformed. Each target minimizes squared error (as explained in the discussion of the algorithms in SAS Technical Report R-108). The definition of the target depends on many factors, such as whether a variable is independent or dependent, which algorithm is used (for example, regression, redundancy, CANALS, principal components), and so on. The definition of the target is independent of the transformation family you specify for the variable. However, the target values for a variable typically do not fit the prescribed transformation family for the variable. They might not have the right category structure; they might not have the right order; they might not be a linear combination of the columns of a B-spline basis; and so on.

The second major stage is optimal scaling. Optimal scaling can be defined as a possibly constrained, least-squares regression problem. When you specify an optimal transformation, or when missing data are estimated for any variable, the full representation of the variable is not simply a vector; it is a matrix with more than one column. The optimal scaling phase finds the vector that is a linear combination of the columns of this matrix and that is closest to the target (in terms of minimum squared error), among those that do not violate any of the constraints imposed by the transformation family. Optimal scaling methods are independent of the data analysis method that generated the target. In all cases, optimal scaling can be accomplished by creating a design matrix based on the original scaling of the variable and the transformation family specified for that variable. The optimally scaled variable is a linear combination of the columns of the design matrix. The coefficients of the linear combination are found using (possibly constrained) least squares. Many optimal scaling problems are solved without actually constructing design and projection matrices. The following two sections describe the algorithms used by PROC TRANSREG for optimal scaling. The first section discusses optimal scaling for OPSCORE, MONOTONE, UNTIE, and LINEAR transformations, including how missing values are handled. The second section addresses SPLINE and MSPLINE transformations.

OPSCORE, MONOTONE, UNTIE, and LINEAR Transformations

Two vectors of information are needed to produce the optimally scaled variable: the initial variable scaling vector x and the target vector y. For convenience, both vectors are first sorted on the values of the initial scaling vector. If you request an UNTIE transformation, the target vector is sorted within ties in the initial scaling vector. The normal SAS System collating sequence for missing and nonmissing values is used. Sorting simply allows constraints to be specified in terms of relations among adjoining coefficients. The sorting process partitions x and y into missing and nonmissing parts, (x′_m  x′_n)′ and (y′_m  y′_n)′.

Next, PROC TRANSREG determines category membership. Every ordinary missing value (.) forms a separate category. (Three ordinary missing values form three categories.) Every special missing value within the range specified in the UNTIE= a-option forms a separate category. (If UNTIE= BC and there are three .B and two .C missing values, five categories are formed from them.) For all other special missing values, a separate category is formed for each different value. (If there are four .A missing values, one category is formed from them.)

Each distinct nonmissing value forms a separate category for OPSCORE and MONOTONE transformations (1 1 1 2 2 3 form three categories). Each nonmissing datum forms a separate category for all other transformations (1 1 1 2 2 3 form six categories). Once category membership is determined, category means are computed. Here is an example:

  x:                (. . .A .A .B 1 1 1 2 2 3 3 3 4)'

  y:                (5 6  2  4  2 1 2 3 4 6 4 5 6 7)'

  OPSCORE and
  MONOTONE means:   (5 6  3     2 2     5   5     7)'

  other means:      (5 6  3     2 1 2 3 4 6 4 5 6 7)'

The category means are the coefficients of a category indicator design matrix. The category means are the Fisher (1938) optimal scores. For MONOTONE and UNTIE transformations, order constraints are imposed on the category means for the nonmissing partition by merging categories that are out of order. The algorithm checks upward until an order violation is found, then averages downward until the order violation is averaged away. (The average of x̄1 computed from n1 observations and x̄2 computed from n2 observations is (n1 x̄1 + n2 x̄2)/(n1 + n2).) The MONOTONE algorithm (Kruskal 1964, secondary approach to ties) for this example, with means (2 5 5 7)' for the nonmissing values, would do the following checks: 2 < 5: OK, 5 = 5: OK, 5 < 7: OK. The means are in the proper order, so no work is needed.

The UNTIE transformation (Kruskal 1964, primary approach to ties) uses the same algorithm on the means of the nonmissing values (1 2 3 4 6 4 5 6 7)' but with different results for this example: 1 < 2: OK, 2 < 3: OK, 3 < 4: OK, 4 < 6: OK, 6 > 4: average 6 and 4 and replace 6 and 4 by the average. The new means of the nonmissing values are (1 2 3 4 5 5 5 6 7)'. The check resumes: 4 < 5: OK, 5 = 5: OK, 5 = 5: OK, 5 < 6: OK, 6 < 7: OK. If some of the special missing values are ordered, the upward checking, downward averaging method is applied to them also, independently of the other missing and nonmissing partitions. Once the means conform to any required category or order constraints, an optimally scaled vector is produced from the means. The following example results from a MONOTONE transformation.

  x:        (. . .A .A .B 1 1 1 2 2 3 3 3 4)'

  y:        (5 6  2  4  2 1 2 3 4 6 4 5 6 7)'

  result:   (5 6  3  3  2 2 2 2 5 5 5 5 5 7)'

The upward checking, downward averaging algorithm is equivalent to creating a category indicator design matrix, solving for least-squares coefficients with order constraints, then computing the linear combination of design matrix columns.
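The following PROC IML fragment is a sketch of that upward checking, downward averaging step, applied to the UNTIE example means. It is an illustration of the algorithm, not PROC TRANSREG's internal code:

  proc iml;
     m = {1 2 3 4 6 4 5 6 7};        /* category means                 */
     n = j(1, ncol(m), 1);           /* observation count per category */
     i = 2;
     do while (i <= ncol(m));
        if m[i] < m[i-1] then do;    /* order violation: pool          */
           m[i-1] = (n[i-1]*m[i-1] + n[i]*m[i]) / (n[i-1] + n[i]);
           n[i-1] = n[i-1] + n[i];
           m = remove(m, i);
           n = remove(n, i);
           if i > 2 then i = i - 1;  /* re-check downward              */
        end;
        else i = i + 1;
     end;
     print m, n;                     /* pooled means and their counts  */
  quit;

Expanding the pooled means by their counts reproduces the UNTIE result (1 2 3 4 5 5 5 6 7)'.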

For the optimal transformation LINEAR and for nonoptimal transformations, missing values are handled as just described. The nonmissing target values are regressed onto the matrix defined by the nonmissing initial scaling values and an intercept. In this example, the target vector y_n = (1 2 3 4 6 4 5 6 7)' is regressed onto the design matrix

$$
\begin{pmatrix}
1 & 1 \\
1 & 1 \\
1 & 1 \\
1 & 2 \\
1 & 2 \\
1 & 3 \\
1 & 3 \\
1 & 3 \\
1 & 4
\end{pmatrix}
$$

Although only a linear transformation is performed, the effect of a linear regression optimal scaling is not eliminated by the later standardization step (unless the variable has no missing values). In the presence of missing values, the linear regression is necessary to minimize squared error.

SPLINE and MSPLINE Transformations

The missing portions of variables subjected to SPLINE or MSPLINE transformations are handled the same way as for OPSCORE, MONOTONE, UNTIE, and LINEAR transformations (see the previous section). The nonmissing partition is handled by first creating a B-spline basis of the specified degree with the specified knots for the nonmissing partition of the initial scaling vector and then regressing the target onto the basis. The optimally scaled vector is a linear combination of the B-spline basis vectors using least-squares regression coefficients. An algorithm for generating the B-spline basis is given in de Boor (1978, pp. 134-135). B-splines are both a computationally accurate and efficient way of constructing a basis for piecewise polynomials; however, they are not the most natural method of describing splines.

Consider an initial scaling vector x = (1 2 3 4 5 6 7 8 9)' and a degree three spline with interior knots at 3.5 and 6.5. The B-spline basis for the transformation is the left matrix in Table 75.5, and the natural piecewise polynomial spline basis is the right matrix. The two matrices span the same column space. The natural basis has an intercept, a linear term, a quadratic term, a cubic term, and two more terms since there are two interior knots. These terms are generated (for knot k and variable value x) by the formula (x − k)³ I(x > k). The indicator variable I(x > k) evaluates to 1.0 if x is greater than k and to 0.0 otherwise. If knot k had been repeated, there would also be a (x − k)² I(x > k) term. Notice that the fifth column makes no contribution to the curve before 3.5, makes zero contribution at 3.5 (the transformation is continuous), and makes an increasing contribution beyond 3.5. The same pattern of results holds for the last term with knot 6.5. The coefficient of the fifth column represents the change in the cubic portion of the curve after 3.5. The coefficient of the sixth column represents the change in the cubic portion of the curve after 6.5.

 
Table 75.5: Spline Bases (left: the B-spline basis; right: the natural piecewise polynomial basis)

The numbers in the B-spline basis do not have a simple interpretation like the numbers in the natural piecewise polynomial basis. The B-spline basis has a diagonally banded structure. The band shifts one column to the right after every knot. The number of entries in each row that may potentially be nonzero is one greater than the degree. The elements within a row always sum to one. The B-spline basis is accurate because of the smallness of the numbers and the lack of extreme collinearity inherent in the natural polynomials. B-splines are efficient because PROC TRANSREG can take advantage of the sparseness of the B-spline basis when it accumulates crossproducts. The number of required multiplications and additions to accumulate the crossproduct matrix does not increase with the number of knots but does increase with the degree of the spline, so it is much more computationally efficient to increase the number of knots than to increase the degree of the polynomial.
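Both halves of Table 75.5 can be materialized with the BSPLINE and PSPLINE expansions. The following sketch (the data set names are arbitrary, and the use of the DESIGN option to generate a design matrix without fitting a model is assumed from PROC TRANSREG's coding facilities) generates the two bases for x = 1 to 9 with knots at 3.5 and 6.5:

  data a;
     do x = 1 to 9;
        output;
     end;
  run;

  * B-spline basis (left matrix of Table 75.5);
  proc transreg data=a design;
     model bspline(x / knots=3.5 6.5);
     output out=bbasis;
  run;

  * piecewise polynomial basis (right matrix; the intercept column
    appears as the Intercept variable in the OUT= data set);
  proc transreg data=a design;
     model pspline(x / knots=3.5 6.5);
     output out=pbasis;
  run;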

MSPLINE transformations are handled like SPLINE transformations except that constraints are placed on the coefficients to ensure monotonicity. When the coefficients of the B-spline basis are monotonically increasing, the transformation is monotonically increasing. When the polynomial degree is two or less, monotone coefficient splines, integrated splines (Winsberg and Ramsay 1980), and the general class of all monotone splines are equivalent.

Specifying the Number of Knots

Keep the number of knots small (usually less than ten, although you can specify more). A degree three spline with nine knots, one at each decile, can closely follow a large variety of curves. Each spline transformation of degree p with q knots fits a model with p + q parameters. The total number of parameters should be much less than the number of observations. Usually in regression analyses, it is recommended that there be at least five or ten observations for each parameter in order to get stable results. For example, when spline transformations of degree three with nine knots are requested for six variables, the number of observations in the data set should be at least five or ten times 72 (since 6 × (3 + 9) is the total number of parameters). The overall model can also have a parameter for the intercept and one or more parameters for each nonspline variable in the model.

Increasing the number of knots gives the spline more freedom to bend and follow the data. Increasing the degree also gives the spline more freedom, but to a lesser extent. Specifying a large number of knots is much better than increasing the degree beyond three.

When you specify NKNOTS=q for a variable with n observations, each of the q + 1 segments of the spline contains n/(q + 1) observations on the average. When you specify KNOTS=number-list, make sure that there is a reasonable number of observations in each interval.

The following statements find a cubic polynomial transformation of X and no transformation of Y:

  proc transreg;
     model identity(Y) = spline(X);
     output;
  run;

The following statements find a cubic spline transformation curve for X that consists of the weighted sum of a single constant, a single straight line, a quadratic curve for the portion of the variable less than 3.0, a different quadratic curve for the portion greater than 3.0 (since the 3.0 knot is repeated), and a different cubic curve for each of the intervals: (minimum to 1.5), (1.5 to 2.4), (2.4 to 3.0), (3.0 to 4.0), and (4.0 to maximum). The transformation is continuous everywhere, its first derivative is continuous everywhere, its second derivative is continuous everywhere except at 3.0, and its third derivative is continuous everywhere except at 1.5, 2.4, 3.0, and 4.0.

  proc transreg;
     model identity(Y) = spline(X / knots=1.5 2.4 3.0 3.0 4.0);
     output;
  run;

The following statements find a quadratic spline transformation that consists of a polynomial X_t = b₀ + b₁X + b₂X² for the range (X < 3.0) and a completely different polynomial X_t = b₃ + b₄X + b₅X² for the range (X > 3.0). The two curves are not required to be continuous at 3.0.

  proc transreg;
     model identity(y) = spline(x / knots=3 3 3 degree=2);
     output;
  run;

The following statements categorize X into 10 intervals and find a step-function transformation. One aspect of this transformation family is unlike all other optimal transformation families: the initial scaling of the data does not fit the restrictions imposed by the transformation family. This is because the initial variable can be continuous, but a discrete step-function transformation is sought. Zero-degree spline variables are categorized before the first iteration.

  proc transreg;
     model identity(Y) = spline(X / degree=0 nknots=9);
     output;
  run;

The following statements find a continuous, piecewise linear transformation of X :

  proc transreg;
     model identity(Y) = spline(X / degree=1 nknots=8);
     output;
  run;

SPLINE, BSPLINE, and PSPLINE Comparisons

SPLINE is a transformation. It takes a variable as input and produces a transformed variable as output. Internally, with SPLINE, a B-spline basis is used to find the transformation, which is a linear combination of the columns of the B-spline basis. However, with SPLINE, the basis is not made available in any output.

BSPLINE is an expansion. It takes a variable as input and produces more than one variable as output. The output variables comprise the B-spline basis that is used internally by SPLINE.

PSPLINE is an expansion. It takes a variable as input and produces more than one variable as output. The difference between PSPLINE and BSPLINE is that PSPLINE produces a piecewise polynomial, whereas BSPLINE produces a B-spline. A matrix consisting of a piecewise polynomial basis and an intercept spans the same space as the B-spline matrix, but the basis vectors are quite different. The numbers in the piecewise polynomials can get quite large; the numbers in the B-spline basis range between 0 and 1. There are many more zeros in the B-spline basis.

Interchanging SPLINE, BSPLINE, and PSPLINE should have no effect on the fit of the overall model except for the fact that PSPLINE is much more prone to numerical problems. Similarly, interchanging a CLASS expansion and an OPSCORE transformation should have no effect on the fit of the overall model.
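As a sketch (the names y and x are assumptions), these three models fit the same spline space; only the basis exposed in the output differs:

  proc transreg;
     model identity(y) = spline(x / nknots=3);   /* basis used internally */
     output;
  run;

  proc transreg;
     model identity(y) = bspline(x / nknots=3);  /* B-spline basis output */
     output;
  run;

  proc transreg;
     model identity(y) = pspline(x / nknots=3);  /* piecewise polynomials */
     output;
  run;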

Hypothesis Tests

The TRANSREG procedure has a set of options for testing hypotheses in models with a single dependent variable. The TEST a-option produces an ANOVA table. It tests the null hypothesis that the vector of coefficients for all of the transformations is zero. The SS2 a-option produces a regression table with Type II tests of the contribution of each transformation to the overall model. In some cases, exact tests are provided; in other cases, the tests are approximate, liberal, or conservative.
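A minimal sketch (variable names assumed) requesting both tables:

  proc transreg test ss2;
     model identity(y) = spline(x1 x2 / nknots=3) linear(x3);
  run;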

For two reasons it is typically not appropriate to test hypotheses by using the output from PROC TRANSREG as input to other procedures such as the REG procedure. First, PROC REG has no way of determining how many degrees of freedom were used for each transformation. Second, the Type II sums of squares for the tests of the individual regression coefficients are not correct for the transformation regression model since PROC REG, as it evaluates the effect of each variable, cannot change the transformations of the other variables. PROC TRANSREG uses the correct degrees of freedom and sums of squares.

In an ordinary univariate linear model, there is one parameter for each independent variable, including the intercept. In the transformation regression model, many of the variables are used internally in the bases for the transformations. Each basis column has one parameter or scoring coefficient, and each linearly independent column has one model degree of freedom associated with it. Coefficients applied to transformed variables, model coefficients, do not enter into the degrees-of-freedom calculations. They are by-products of the standardizations and can be absorbed into the transformations by specifying the ADDITIVE a-option. The word parameter is reserved for model and scoring coefficients that have a degree of freedom associated with them.

For expansions, there is one model parameter for each variable created by the expansion (except for all-missing CLASS columns and expansions that have an implicit intercept). Each IDENTITY variable has one model parameter. If there are m POINT variables, they expand to m + 1 variables and, hence, have m + 1 model parameters. For m EPOINT variables, there are 2m model parameters. For m QPOINT variables, there are m(m + 3)/2 model parameters. If a variable with m categories is designated CLASS, there are m − 1 parameters. For BSPLINE and PSPLINE variables of DEGREE=n with NKNOTS=k, there are n + k parameters. Note that one of the n + k + 1 BSPLINE columns and one of the m CLASS(variable / ZERO=NONE) columns are not counted due to the implicit intercept.

There are scoring parameters for missing values in nonexcluded observations. Each ordinary missing value (.) has one scoring parameter. Each different special missing value (._ and .A through .Z) within each variable has one scoring parameter. Missing values specified in the UNTIE= and MONOTONE= options follow the rules for UNTIE and MONOTONE transformations, which are described later in this chapter.

For all nonoptimal transformations (LOG, LOGIT, ARSIN, POWER, EXP, RANK, BOXCOX), there is one parameter per variable in addition to any missing value scoring parameters.

For SPLINE, OPSCORE, and LINEAR transformations, the number of scoring parameters is the number of basis columns that are used internally to find the transformations, minus 1 for the intercept. The number of scoring parameters for SPLINE variables is the same as the number of model parameters for BSPLINE and PSPLINE variables. If DEGREE=n and NKNOTS=k, there are n + k scoring parameters. The number of scoring parameters for OPSCORE, SMOOTH, and SSPLINE variables is the same as the number of model parameters for CLASS variables. If there are m categories, there are m − 1 scoring parameters. There is one parameter for each LINEAR variable. For SPLINE, OPSCORE, LINEAR, MONOTONE, UNTIE, and MSPLINE transformations, missing value scoring parameters are computed as described previously with the nonoptimal transformations.

The number of scoring parameters for MONOTONE, UNTIE, and MSPLINE transformations is less precise than for SPLINE, OPSCORE, and LINEAR transformations. One way of handling a MONOTONE transformation is to treat it as if it were the same as an OPSCORE transformation. If there are m categories, there are m − 1 potential scoring parameters. However, there are typically fewer than m − 1 unique parameter estimates, since some of those m − 1 scoring parameter estimates may be tied during the optimal scaling to impose the order constraints. Imposing ties on the scoring parameter estimates is equivalent to fitting a model with fewer parameters. So there are two available scoring parameter counts: m − 1 and a smaller number that is determined during the analysis. Using m − 1 as the model degrees of freedom for MONOTONE variables (treating OPSCORE and MONOTONE transformations the same way) is conservative, since the MONOTONE scoring parameter estimates are more restricted than the OPSCORE scoring parameter estimates. Using the smaller count (the number of scoring parameter estimates that are different, minus 1 for the intercept) in the model degrees of freedom is liberal, since the data and the model together are being used to determine the number of parameters. PROC TRANSREG reports tests using both liberal and conservative degrees of freedom to provide lower and upper bounds on the true p-values.

For the UNTIE transformation, the conservative scoring parameter count is the number of distinct observations, whereas the liberal scoring parameter count is the number of scoring parameter estimates that are different minus 1 for the intercept. Hence, when you specify UNTIE, conservative tests have zero error degrees of freedom unless there are replicated observations.

For MSPLINE variables of DEGREE=n and NKNOTS=k, the conservative scoring parameter count is n + k, whereas the liberal parameter count is the number of scoring parameter estimates that are different, minus 1 for the intercept. A liberal degrees-of-freedom count of 1 does not necessarily imply a linear transformation. It just implies that n plus k minus the number of ties imposed equals 1. An example of a one-degree-of-freedom nonlinear transformation is a two-piece linear transformation in which the slope of one piece is 0.

The number of scoring parameters is determined during each iteration. After the last iteration, enough information is available for the TEST a-option to produce an ANOVA table that reports the overall fit of the model. If you specify the SS2 a-option , further iterations are necessary to test the contribution of each transformation to the overall model.

The liberal tests do not compensate for over-parameterization. For example, requesting a spline transformation with k knots when a linear transformation will suffice results in liberal tests that are actually conservative because too many degrees of freedom are being used for the transformations. Use as few knots as possible to avoid this problem.

In ordinary multiple regression, an F test of the null hypothesis that the coefficient for variable xj is zero can be constructed by comparing two linear models. One model is the full model with all parameters, and the other is a reduced model that has all parameters except the parameter for variable xj. The difference between the model sum of squares for the full model and the model sum of squares for the reduced model is the Type II sum of squares for the test of the null hypothesis that the coefficient for variable xj is 0. The numerator of the F test has one degree of freedom. The mean square error for the full model is the denominator of the F test of variable xj. Note that the estimates of the coefficients for the two models are not usually the same. When variable xj is removed, the coefficients for the other variables change to compensate for the removal of xj. In a transformation regression model, the transformations of the other variables must be allowed to change, and the numerator degrees of freedom are not always one. It is not correct to simply let the model coefficients for the transformed variables change and apply the new model coefficients to the old transformations computed with the old scoring parameter estimates. In a transformation regression model, further iteration is needed to test each transformation, because all the scoring parameter estimates for other variables must be allowed to change to test the effect of variable xj. This can be quite time-consuming for a large model if the DUMMY a-option cannot be used to solve directly for the transformations.

Output Data Set

The OUT= output data set can contain a great deal of information; however, in most cases, the output data set contains a small portion of the entire range of available information and is organized for direct input into the %PLOTIT macro or graphical or analysis procedures. For information on the %PLOTIT macro, see Appendix B, Using the %PLOTIT Macro.

Output Data Set Examples

The next section provides a complete list of the contents of the OUT= data set. However, before presenting complete details, this section provides three brief examples, illustrating some typical output data sets.

The first example shows the output data set from a two-way ANOVA model. The following statements produce Figure 75.12:

  title 'ANOVA Output Data Set Example';

  data ReferenceCell;
     input Y X1 $ X2 $;
     datalines;
  11 a a
  12 a a
  10 a a
   4 a b
   5 a b
   3 a b
   5 b a
   6 b a
   4 b a
   2 b b
   3 b b
   1 b b
  ;

  *---Fit Reference Cell Two-Way ANOVA Model---;
  proc transreg data=ReferenceCell;
     model identity(Y) = class(X1 | X2);
     output coefficients replace predicted residuals;
  run;

  *---Print the Results---;
  proc print;
  run;

  proc contents position;
     ods select position;
  run;
                     ANOVA Output Data Set Example

  Obs  _TYPE_    _NAME_    Y   PY   RY  Intercept  X1a  X2a  X1aX2a  X1  X2

    1  SCORE     ROW1     11   11    0      1      1.0    1      1   a   a
    2  SCORE     ROW2     12   11    1      1      1.0    1      1   a   a
    3  SCORE     ROW3     10   11   -1      1      1.0    1      1   a   a
    4  SCORE     ROW4      4    4    0      1      1.0    0      0   a   b
    5  SCORE     ROW5      5    4    1      1      1.0    0      0   a   b
    6  SCORE     ROW6      3    4   -1      1      1.0    0      0   a   b
    7  SCORE     ROW7      5    5    0      1      0.0    1      0   b   a
    8  SCORE     ROW8      6    5    1      1      0.0    1      0   b   a
    9  SCORE     ROW9      4    5   -1      1      0.0    1      0   b   a
   10  SCORE     ROW10     2    2    0      1      0.0    0      0   b   b
   11  SCORE     ROW11     3    2    1      1      0.0    0      0   b   b
   12  SCORE     ROW12     1    2   -1      1      0.0    0      0   b   b
   13  M COEFFI  Y         .    .    .      2      2.0    3      4
   14  MEAN      Y         .    .    .      .      7.5    8     11

                     ANOVA Output Data Set Example

                        The CONTENTS Procedure

                     Variables in Creation Order

   #  Variable   Type  Len  Label

   1  _TYPE_     Char    8
   2  _NAME_     Char   32
   3  Y          Num     8
   4  PY         Num     8  Y Predicted Values
   5  RY         Num     8  Y Residuals
   6  Intercept  Num     8  Intercept
   7  X1a        Num     8  X1 a
   8  X2a        Num     8  X2 a
   9  X1aX2a     Num     8  X1 a * X2 a
  10  X1         Char    8
  11  X2         Char    8

Figure 75.12: ANOVA Example Output Data Set Contents

The _TYPE_ variable indicates observation type: score, multiple regression coefficient (parameter estimates), and marginal means. The _NAME_ variable contains the default observation labels, ROW1, ROW2, and so on, and contains the dependent variable name (Y) for the remaining observations. If you specify an ID statement, _NAME_ contains the values of the first ID variable for score observations. The Y variable is the dependent variable, PY contains the predicted values, RY contains the residuals, and the variables Intercept through X1aX2a contain the design matrix. The X1 and X2 variables are the original CLASS variables.
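If you route the results to a named data set, the _TYPE_ variable lets you pull out just the coefficient observations with a WHERE clause. The following is a minimal sketch (the data set name trans_out is hypothetical):

  proc transreg data=ReferenceCell;
     model identity(Y) = class(X1 | X2);
     output out=trans_out coefficients replace predicted residuals;
  run;

  proc print data=trans_out;
     where _TYPE_ = 'M COEFFI';
  run;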

The next example shows the contents of the output data set from fitting a curve through a scatter plot.

  title 'Output Data Set for Curve Fitting Example';

  data A;
     do X = 1 to 100;
        Y = log(x) + sin(x / 10) + normal(7);
        output;
     end;
  run;

  proc transreg;
     model identity(Y) = spline(X / nknots=9);
     output predicted out=B;
  run;

  proc contents position;
     ods select position;
  run;

These statements produce Figure 75.13.

               Output Data Set for Curve Fitting Example

                       The CONTENTS Procedure

                    Variables in Creation Order

  #  Variable    Type  Len  Label

  1  _TYPE_      Char    8
  2  _NAME_      Char   32
  3  Y           Num     8
  4  TY          Num     8  Y Transformation
  5  PY          Num     8  Y Predicted Values
  6  Intercept   Num     8  Intercept
  7  X           Num     8
  8  TIntercept  Num     8  Intercept Transformation
  9  TX          Num     8  X Transformation

Figure 75.13: Predicted Values Example Output Data Set Contents

The OUT= data set contains _TYPE_ and _NAME_ variables. Since no coefficients or coordinates are requested, all observations are _TYPE_=SCORE. The Y variable is the original dependent variable, TY is the transformed dependent variable, PY contains the predicted values, X is the original independent variable, and TX is the transformed independent variable. The data set also contains an Intercept variable and a transformed intercept variable, TIntercept. (In this case, the transformed intercept is the same as the intercept. However, if you specify the TSTANDARD= and ADDITIVE options, the two are not always the same.)
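To see the fitted curve, you can overlay the predicted values on the scatter plot. The following is one possible sketch using SAS/GRAPH (the data set B comes from the example above; the plotting symbols are illustrative choices):

  proc gplot data=B;
     plot Y*X=1 PY*X=2 / overlay;  /* raw data plus fitted curve */
     symbol1 v=dot  i=none;
     symbol2 v=none i=join;
  run;
  quit;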

The next example shows the results from specifying METHOD=MORALS when there is more than one dependent variable.

  title 'METHOD=MORALS Output Data Set Example';

  data x;
     input Y1 Y2 X1 $ X2 $;
     datalines;
  11 1 a a
  10 4 b a
   5 2 a b
   5 9 b b
   4 3 c c
   3 6 b a
   1 8 a b
  ;

  *---Fit Reference Cell Two-Way ANOVA Model---;
  proc transreg data=x noprint dummy;
     model spline(Y1 Y2) = opscore(X1 X2 / name=(N1 N2));
     output coefficients predicted residuals;
     id x1 x2;
  run;

  *---Print the Results---;
  proc print;
  run;

  proc contents position;
     ods select position;
  run;

These statements produce Figure 75.14.

                METHOD=MORALS Output Data Set Example

  Obs   _DEPVAR_    _TYPE_    _NAME_  _DEPEND_  T_DEPEND_  P_DEPEND_  R_DEPEND_

    1  Spline(Y1)   SCORE       a        11      13.1600    11.1554    2.00464
    2  Spline(Y1)   SCORE       b        10       6.1931     6.8835   -0.69041
    3  Spline(Y1)   SCORE       a         5       2.4467     4.7140   -2.26724
    4  Spline(Y1)   SCORE       b         5       2.4467     0.4421    2.00464
    5  Spline(Y1)   SCORE       c         4       4.2076     4.2076    0.00000
    6  Spline(Y1)   SCORE       b         3       5.5693     6.8835   -1.31422
    7  Spline(Y1)   SCORE       a         1       4.9766     4.7140    0.26261
    8  Spline(Y1)   M COEFFI    Y1        .        .          .         .
    9  Spline(Y2)   SCORE       a         1      -0.5303    -0.5199   -0.01043
   10  Spline(Y2)   SCORE       b         4       5.5487     4.5689    0.97988
   11  Spline(Y2)   SCORE       a         2       3.8940     4.5575   -0.66347
   12  Spline(Y2)   SCORE       b         9       9.6358     9.6462   -0.01043
   13  Spline(Y2)   SCORE       c         3       5.6210     5.6210    0.00000
   14  Spline(Y2)   SCORE       b         6       3.5994     4.5689   -0.96945
   15  Spline(Y2)   SCORE       a         8       5.2314     4.5575    0.67390
   16  Spline(Y2)   M COEFFI    Y2        .        .          .         .

  Obs  Intercept    N1    N2    TIntercept       TN1         TN2      X1    X2

    1      1         0     0       1.0000      0.06711    -0.09384    a     a
    2      1         1     0       1.0000      1.51978    -0.09384    b     a
    3      1         0     1       1.0000      0.06711     1.32038    a     b
    4      1         1     1       1.0000      1.51978     1.32038    b     b
    5      1         2     2       1.0000      0.23932     1.32038    c     c
    6      1         1     0       1.0000      1.51978    -0.09384    b     a
    7      1         0     1       1.0000      0.06711     1.32038    a     b
    8      .         .     .      10.9253     -2.94071    -4.55475    Y1    Y1
    9      1         0     0       1.0000      0.03739    -0.09384    a     a
   10      1         1     0       1.0000      1.51395    -0.09384    b     a
   11      1         0     1       1.0000      0.03739     1.32038    a     b
   12      1         1     1       1.0000      1.51395     1.32038    b     b
   13      1         2     2       1.0000      0.34598     1.32038    c     c
   14      1         1     0       1.0000      1.51395    -0.09384    b     a
   15      1         0     1       1.0000      0.03739     1.32038    a     b
   16      .         .     .      -0.3119      3.44636     3.59024    Y2    Y2

                METHOD=MORALS Output Data Set Example

                       The CONTENTS Procedure

                    Variables in Creation Order

   #  Variable    Type  Len  Label

   1  _DEPVAR_    Char   42  Dependent Variable Transformation(Name)
   2  _TYPE_      Char    8
   3  _NAME_      Char   32
   4  _DEPEND_    Num     8  Dependent Variable
   5  T_DEPEND_   Num     8  Dependent Variable Transformation
   6  P_DEPEND_   Num     8  Dependent Variable Predicted Values
   7  R_DEPEND_   Num     8  Dependent Variable Residuals
   8  Intercept   Num     8  Intercept
   9  N1          Num     8
  10  N2          Num     8
  11  TIntercept  Num     8  Intercept Transformation
  12  TN1         Num     8  N1 Transformation
  13  TN2         Num     8  N2 Transformation
  14  X1          Char    8
  15  X2          Char    8

Figure 75.14: METHOD=MORALS Rolled Output Data Set

If you specify METHOD=MORALS with multiple dependent variables, PROC TRANSREG performs separate univariate analyses and stacks the results in the OUT= data set. For this example, the results of the first analysis are in the partition designated by _DEPVAR_ = Spline(Y1) and the results of the second analysis are in the partition designated by _DEPVAR_ = Spline(Y2); these values are the transformation and dependent variable names. Each partition has _TYPE_ = SCORE observations for the variables and a _TYPE_ = M COEFFI observation for the coefficients. In this example, an ID variable is specified, so the _NAME_ variable contains the formatted values of the first ID variable. Since both dependent variables have to go into the same column, the dependent variable is given a new name, _DEPEND_. The dependent variable transformation is named T_DEPEND_, the predicted values variable is named P_DEPEND_, and the residuals variable is named R_DEPEND_.

The independent variables are character OPSCORE variables. By default, PROC TRANSREG replaces character OPSCORE variables with category numbers and discards the original character variables. To avoid this, the input variables are renamed from X1 and X2 to N1 and N2, and the original X1 and X2 are added to the data set as ID variables. The N1 and N2 variables contain the initial values for the OPSCORE transformations, and the TN1 and TN2 variables contain the optimal scores. The data set also contains an Intercept variable and a transformed intercept variable, TIntercept. The regression coefficients are in the transformation columns, which also contain the variables to which they apply.

Output Data Set Contents

This section presents the various matrices that can result from PROC TRANSREG processing and that appear in the OUT= data set. The exact contents of an OUT= data set depend on many options.

Table 75.6: PROC TRANSREG OUT= Data Set Contents

  _TYPE_     Contents                                  Options, Default Prefix

  SCORE      dependent variables                       DREPLACE not specified
  SCORE      independent variables                     IREPLACE not specified
  SCORE      transformed dependent variables           default, TDPREFIX=T
  SCORE      transformed independent variables         default, TIPREFIX=T
  SCORE      predicted values                          PREDICTED, PPREFIX=P
  SCORE      residuals                                 RESIDUALS, RDPREFIX=R
  SCORE      leverage                                  LEVERAGE, LEVERAGE=Leverage
  SCORE      lower individual confidence limits        CLI, LILPREFIX=LIL, CILPREFIX=CIL
  SCORE      upper individual confidence limits        CLI, LIUPREFIX=LIU, CIUPREFIX=CIU
  SCORE      lower mean confidence limits              CLM, LMLPREFIX=LML, CMLPREFIX=CML
  SCORE      upper mean confidence limits              CLM, LMUPREFIX=LMU, CMUPREFIX=CMU
  SCORE      dependent canonical variables             CANONICAL, CDPREFIX=Cand
  SCORE      independent canonical variables           CANONICAL, CIPREFIX=Cani
  SCORE      redundancy variables                      REDUNDANCY, RPREFIX=Red
  SCORE      ID, CLASS, BSPLINE variables              ID, CLASS, BSPLINE
  SCORE      independent variables approximations      IAPPROXIMATIONS, IAPREFIX=A
  M COEFFI   multiple regression coefficients          COEFFICIENTS, MRC
  C COEFFI   canonical coefficients                    COEFFICIENTS, CCC
  MEAN       marginal means                            COEFFICIENTS, MEANS
  M REDUND   multiple redundancy coefficients          MREDUNDANCY
  R REDUND   multiple redundancy coefficients          MREDUNDANCY
  M POINT    point coordinates                         COORDINATES or MPC, POINT
  M EPOINT   elliptical point coordinates              COORDINATES or MEC, EPOINT
  M QPOINT   quadratic point coordinates               COORDINATES or MQC, QPOINT
  C POINT    canonical point coordinates               COORDINATES or CPC, POINT
  C EPOINT   canonical elliptical point coordinates    COORDINATES or CEC, EPOINT
  C QPOINT   canonical quadratic point coordinates     COORDINATES or CQC, QPOINT

The independent and dependent variables are created from the original input data. Several potential differences exist between these variables and the actual input data. An intercept variable can be added, new variables can be added for POINT, EPOINT, QPOINT, CLASS, IDENTITY, PSPLINE, and BSPLINE variables, and category numbers are substituted for character OPSCORE variables. These matrices are not always what is input to the first iteration. After the expanded data set is stored for inclusion in the output data set, several things happen to the data before they are input to the first iteration: column means are substituted for missing values; zero-degree SPLINE and MSPLINE variables are transformed so that the iterative algorithms get step-function data as input, which conform to the zero-degree transformation family restrictions; and the nonoptimal transformations are performed.

Details for the UNIVARIATE Method

When you specify METHOD=UNIVARIATE (in the MODEL or PROC TRANSREG statement), PROC TRANSREG can perform several analyses, one for each dependent variable. While each dependent variable can be transformed, the independent variables are not transformed. The OUT= data set optionally contains all of the _TYPE_ = SCORE observations, optionally followed by coefficients or coordinates.

Details for the MORALS Method

When you specify METHOD=MORALS (in the MODEL or PROC TRANSREG statement), successive analyses are performed, one for each dependent variable. Each analysis transforms one dependent variable and the entire set of independent variables. All information for the first dependent variable (scores then, optionally, coefficients) appears first. Then all information for the second dependent variable appears next. This arrangement is repeated for all dependent variables.

Details for the CANALS and REDUNDANCY Methods

For METHOD=CANALS and METHOD=REDUNDANCY (specified in either the MODEL or PROC TRANSREG statement), one analysis is performed that simultaneously transforms all dependent and independent variables. The OUT= data set optionally contains all of the _TYPE_ = SCORE observations, optionally followed by coefficients or coordinates.

Variable Names

As shown in the preceding examples, some variables in the output data set directly correspond to input variables and some are created. All original optimal and nonoptimal transformation variable names are unchanged.

The names of the POINT, QPOINT, and EPOINT expansion variables are also left unchanged, but new variables are created. When independent POINT variables are present, the sum-of-squares variable _ISSQ_ is added to the output data set. For each EPOINT and QPOINT variable, a new squared variable is created by appending _2. For example, Dim1 and Dim2 are expanded into Dim1, Dim2, Dim1_2, and Dim2_2. In addition, for each pair of QPOINT variables, a new crossproduct variable is created by combining the two names, for example, Dim1Dim2.

The names of the CLASS variables are constructed from the original variable names and levels. Lengths are controlled by the CPREFIX= a-option. For example, when X1 and X2 both have values of a and b, CLASS(X1 | X2 / ZERO=NONE) creates X1 main-effect variable names X1a and X1b, X2 main-effect variable names X2a and X2b, and interaction variable names X1aX2a, X1aX2b, X1bX2a, and X1bX2b.
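As a sketch, you can verify the generated names by running the expansion and listing the variables of the output data set (the input data set toy and the output name expanded are hypothetical):

  proc transreg data=toy;
     model identity(y) = class(x1 | x2 / zero=none);
     output out=expanded;
  run;

  proc contents data=expanded position;
     ods select position;
  run;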

PROC TRANSREG then uses these variable names when creating the transformed, predicted, and residual variable names by affixing the relevant prefix and possibly dropping extra characters.

METHOD=MORALS Variable Names

When you specify METHOD=MORALS and only one dependent variable is present, the output data set is structured exactly as if METHOD=REDUNDANCY (see the section Details for the CANALS and REDUNDANCY Methods). When more than one dependent variable is present, the dependent variables are output in the variable _DEPEND_, transformed dependent variables are output in the variable T_DEPEND_, predicted values are output in the variable P_DEPEND_, and residuals are output in the variable R_DEPEND_. You can partition the data set into BY groups, one per dependent variable, by referring to the character variable _DEPVAR_, which contains the original dependent variable names and transformations.
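For example, the following sketch (assuming the MORALS example's OUTPUT statement had routed the results to out=morals_out) prints the score observations separately for each dependent variable. NOTSORTED is appropriate because the partitions are stacked rather than sorted:

  proc print data=morals_out;
     by _depvar_ notsorted;
     where _type_ = 'SCORE';
  run;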

Duplicate Variable Names

When the same name is generated from multiple variables in the OUT= data set, new names are created by appending 2, 3, or 4, and so on, until a unique name is created. For 32-character names, the last character is replaced with a numeric suffix until a unique name is created. For example, if there are two output variables that otherwise would be named X, then X and X2 are created instead. If there are two output variables that otherwise would be named ThisIsAThirtyTwoCharacterVarName, then ThisIsAThirtyTwoCharacterVarName and ThisIsAThirtyTwoCharacterVarNam2 are created instead.

OUTTEST= Output Data Set

The OUTTEST= data set contains hypothesis test results. The OUTTEST= data set always contains ANOVA results. When you specify the SS2 a-option, regression tables are also output. When you specify the UTILITIES a-option, conjoint analysis part-worth utilities are also output. The OUTTEST= data set has the following variables:

_DEPVAR_

is a 42-character variable that contains the dependent variable transformation and name.

_TYPE_

is an 8-character variable that contains the table type. The first character is U for univariate or M for multivariate. The second character is blank. The third character is A for ANOVA, 2 for Type II sum of squares, or U for UTILITIES. The fourth character is blank. The fifth character is L for liberal tests, C for conservative tests, or U for the usual tests.

Title

is an 80-character variable that contains the table title.

Variable

is a 42-character variable that contains the independent variable transformations and names for regression tables and blanks for ANOVA tables.

Coefficient

contains the multiple regression coefficients for regression tables and underscore special missing values for ANOVA tables.

Statistic

is a 24-character variable that contains the names for statistics in other variables, such as Value.

Value

contains multivariate test statistics and all other information that does not fit in one of the other columns, including R-Square, Dependent Mean, Adj R-Sq, and Coeff Var. Whenever Value is not an underscore special missing value, Statistic describes the contents of Value.

NumDF

contains numerator degrees of freedom for F tests.

DenDF

contains denominator degrees of freedom for F tests.

SSq

contains sums of squares.

MeanSquare

contains mean squares.

F

contains F statistics.

NumericP

contains the p-value for the F statistic, stored in a numeric variable.

P

is a 9-character variable that contains the formatted p-value for the F statistic, including the appropriate ~, <=, >=, or blank symbols.

LowerLimit

contains lower confidence limits on the parameter estimates.

UpperLimit

contains upper confidence limits on the parameter estimates.

StdError

contains standard errors. For SS2 and UTILITIES tables, standard errors are output for each coefficient with one degree of freedom.

Importance

contains the relative importance of each factor for UTILITIES tables.

Label

is a 256-character variable that contains variable labels.

There are several possible tables in the OUTTEST= data set corresponding to combinations of univariate and multivariate tests; ANOVA and regression results; and liberal, conservative, and the usual tests. Each table is composed of only a subset of the variables. Numeric variables contain underscore special missing values when they are not a column in a table. Ordinary missing values (.) appear in variables that are part of a table when a nonmissing value cannot be produced. For example, the F statistic is missing for a test with zero degrees of freedom.
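A minimal sketch of creating and examining an OUTTEST= data set follows (the data set Mydata and the variables y and x are hypothetical):

  proc transreg data=Mydata outtest=tests;
     model identity(y) = spline(x / nknots=3) / ss2;
     output;
  run;

  proc print data=tests;
  run;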

Computational Resources

This section provides information on the computational resources required to use PROC TRANSREG.

Let

  • n = number of observations

  • q = number of expanded independent variables

  • r = number of expanded dependent variables

  • k = maximum spline degree

  • p = maximum number of knots

  • More than 56(q + r) bytes of array space are required, plus the maximum of the data matrix size, the optimal scaling work space, and the covariance matrix size. The data matrix size is 8n(q + r) bytes. The optimal scaling work space requires less than 8(6n + (p + k + 2)(p + k + 11)) bytes. The covariance matrix size is 4(q + r)(q + r + 1) bytes. (A worked example evaluating these formulas appears after this list.)

  • PROC TRANSREG tries to store the original and transformed data in memory. If there is not enough memory, a utility data set is used, potentially resulting in a large increase in execution time. The amount of memory for the preceding data formulas is an underestimate of the amount of memory needed to handle most problems. These formulas give the absolute minimum amount of memory required. If a utility data set is used, and if memory can be used with perfect efficiency, then roughly the amount of memory stated previously is needed. In reality, most problems require at least two or three times the minimum.

  • PROC TRANSREG sorts the data once. The sort time is roughly proportional to (q + r)n^(3/2).

  • One regression analysis per iteration is required to compute model parameters (or two canonical correlation analyses per iteration for METHOD=CANALS). The time required for accumulating the crossproducts matrix is roughly proportional to n(q + r)^2. The time required to compute the regression coefficients is roughly proportional to q^3.

  • Each optimal scaling is a multiple regression problem, although some transformations are handled with faster special-case algorithms. The number of regressors for the optimal scaling problems depends on the original values of the variable and the type of transformation. For each monotone spline transformation, an unknown number of multiple regressions is required to find a set of coefficients that satisfies the constraints. The B-spline basis is generated twice for each SPLINE and MSPLINE transformation for each iteration. The time required to generate the B-spline basis is roughly proportional to nk^2.
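As a worked example of the memory formulas above, the following DATA step sketch evaluates them for one hypothetical problem size (n = 1000 observations, q = 10 and r = 1 expanded variables, spline degree k = 3, and p = 9 knots); the numbers are illustrative only:

  data _null_;
     n = 1000;  q = 10;  r = 1;  k = 3;  p = 9;
     data_matrix = 8*n*(q + r);                   /* 88,000 bytes   */
     scaling_ws  = 8*(6*n + (p+k+2)*(p+k+11));    /* < 50,576 bytes */
     covariance  = 4*(q + r)*(q + r + 1);         /* 528 bytes      */
     minimum     = 56*(q + r) + max(data_matrix, scaling_ws, covariance);
     put minimum=;                                /* minimum=88616  */
  run;

As the text notes, this is an absolute minimum; most problems require at least two or three times this amount.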

Solving Standard Least-Squares Problems

This section illustrates how to solve some ordinary least-squares problems and generalizations of those problems by formulating them as transformation regression problems. One problem involves finding linear and nonlinear regression functions in a scatter plot. The next problem involves simultaneously fitting two lines or curves through a scatter plot. The last problem involves finding the overall fit of a multi-way main-effects and interactions analysis-of-variance model.

Nonlinear Regression Functions

This example uses PROC TRANSREG in simple regression to find the optimal regression line, a nonlinear but monotone regression function, and a nonlinear nonmonotone regression function. A regression line can be found by specifying

  proc transreg;
     model identity(y) = identity(x);
     output predicted;
  run;

A monotone regression function (in this case, a monotonically decreasing regression function, since the correlation coefficient is negative) can be found by requesting an MSPLINE transformation of the independent variable, as follows.

  proc transreg;
     model identity(y) = mspline(x / nknots=9);
     output predicted;
  run;

The monotonicity restriction can be relaxed by requesting a SPLINE transformation of the independent variable, as shown next.

  proc transreg;
     model identity(y) = spline(x / nknots=9);
     output predicted;
  run;

In this example, it is not useful to plot the transformation TX, since TX is just an intermediate result used in finding a regression function through the original X and Y scatter plot.

The following statements provide a specific example of using the TRANSREG procedure for fitting nonlinear regression functions. These statements produce Figure 75.15 through Figure 75.18.

  title 'Linear and Nonlinear Regression Functions';

  *---Generate an Artificial Nonlinear Scatter Plot---;
  *---SAS/IML Software is Required for this Example---;
  proc iml;
     N = 500;
     X = (1:N)

