PROC TRANSREG Statement


  • PROC TRANSREG < DATA= SAS-data-set > < OUTTEST= SAS-data-set > < a-options > < o-options > ;

The PROC TRANSREG statement starts the TRANSREG procedure. Optionally, this statement identifies an input and an OUTTEST= data set, specifies the algorithm and other computational details, requests displayed output, and controls the contents of the OUT= data set (which is created with the OUTPUT statement). The DATA= and OUTTEST= options can appear only in the PROC TRANSREG statement.

The following table summarizes options available in the PROC TRANSREG statement. All a-options and o-options are described in the sections on either the MODEL or OUTPUT statement, in which these options can also be specified.

DATA= SAS-data-set

  • specifies the SAS data set to be analyzed. If you do not specify the DATA= option, PROC TRANSREG uses the most recently created SAS data set. The data set must be an ordinary SAS data set; it cannot be a special TYPE= data set.

OUTTEST= SAS-data-set

  • specifies an output data set to contain hypothesis test results. When you specify the OUTTEST= option, the data set contains ANOVA results. When you specify the SS2 a-option, regression tables are also output. When you specify the UTILITIES o-option, conjoint analysis part-worth utilities are also output. For more information on the OUTTEST= data set, see the 'OUTTEST= Output Data Set' section on page 4626.
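As a minimal sketch of these two options, the following statements read an input data set and write the hypothesis test results to an OUTTEST= data set. The data set names WORK.DEMO, WORK.TESTS, and WORK.RESULTS and the variables X and Y are hypothetical and are simulated here only so that the example is self-contained.

```sas
/* Hypothetical input data, simulated only for illustration */
data work.demo;
   do i = 1 to 40;
      x = i / 4;
      y = sqrt(x) + 0.1 * rannor(7);
      output;
   end;
   drop i;
run;

/* DATA= names the input data set; OUTTEST= names the data set     */
/* that receives the ANOVA results. SS2 (an a-option) requests     */
/* that regression tables be added to the OUTTEST= data set.       */
proc transreg data=work.demo outtest=work.tests ss2;
   model identity(y) = spline(x / nknots=3);
   output out=work.results;
run;

/* Inspect the hypothesis test results */
proc print data=work.tests;
run;
```

SS2 is placed in the PROC TRANSREG statement here; it could equally be listed after a slash at the end of the MODEL statement.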

BY Statement

  • BY variables ;

You can specify a BY statement with PROC TRANSREG to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the TRANSREG procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
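The first two alternatives can be sketched as follows. The data set WORK.REGIONS and its variables REGION, X, and Y are hypothetical; the groups are deliberately written in descending order so that the data are grouped but not sorted in ascending order.

```sas
/* Hypothetical data: groups are contiguous but not in ascending order */
data work.regions;
   length region $ 4;
   do region = 'West', 'East';
      do i = 1 to 20;
         x = i;
         y = 2 * x + rannor(11);
         output;
      end;
   end;
   drop i;
run;

/* Alternative 1: sort first, then use a matching BY statement */
proc sort data=work.regions out=work.sorted;
   by region;
run;

proc transreg data=work.sorted;
   by region;
   model identity(y) = spline(x);
   output out=work.by_sorted;
run;

/* Alternative 2: leave the data grouped as they are and declare it */
proc transreg data=work.regions;
   by region notsorted;
   model identity(y) = spline(x);
   output out=work.by_grouped;
run;
```

Both runs produce one analysis per value of REGION; only the order in which the groups are processed differs.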

FREQ Statement

  • FREQ variable ;

If one variable in the input data set represents the frequency of occurrence for other values in the observation, specify the variable's name in a FREQ statement. PROC TRANSREG then treats the data set as if each observation appeared n times, where n is the value of the FREQ variable for the observation. Noninteger values of the FREQ variable are truncated to the largest integer less than the FREQ value. The observation is used in the analysis only if the value of the FREQ statement variable is greater than or equal to 1.
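The following sketch illustrates these rules with a small hypothetical data set (WORK.COUNTS with variables X, Y, and COUNT; all names and values are made up). According to the description above, the observation with COUNT=2.7 is treated as if it appeared twice, and the observation with COUNT=0.5 is not used.

```sas
/* Hypothetical data: COUNT holds the frequency of each observation */
data work.counts;
   input x y count;
   datalines;
1 2.1 3
2 3.9 1
3 6.2 2.7
4 8.1 0.5
5 9.8 4
;
run;

/* COUNT=2.7 is truncated to 2; COUNT=0.5 is less than 1, */
/* so that observation is excluded from the analysis.     */
proc transreg data=work.counts;
   freq count;
   model identity(y) = identity(x);
   output out=work.freq_results;
run;
```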

ID Statement

  • ID variables ;

The ID statement includes additional character or numeric variables in the OUT= data set. The variables must be contained in the input data set.

MODEL Statement

  • MODEL < transform(dependents < / t-options > )
      < transform(dependents < / t-options > ) > = >
      transform(independents < / t-options > )
      < transform(independents < / t-options > ) > < / a-options > ;

The MODEL statement specifies the dependent and independent variables (dependents and independents, respectively) and specifies the transformation (transform) to apply to each variable. Only one MODEL statement can appear in the TRANSREG procedure. The t-options are transformation options, and the a-options are the algorithm options. The t-options provide details for the transformation; these depend on the transform chosen. The t-options are listed after a slash in the parentheses that enclose the variable list (either dependents or independents). The a-options control the algorithm used, details of iteration, details of how the intercept and dummy variables are generated, and displayed output details. The a-options are listed after the entire model specification (the dependents, independents, transformations, and t-options) and after a slash. You can also specify the algorithm options in the PROC TRANSREG statement. When you specify the DESIGN o-option, dependents and an equal sign are not required. The operators '*', '|', and '@' from the GLM procedure are available for interactions with the CLASS expansion and the IDENTITY transformation.

  Class(a * b ... c | d ... e | f ... @ n)
  Identity(a * b ... c | d ... e | f ... @ n)

In addition, transformations and spline expansions can be crossed with classification variables:

  transform(var) * class(group)
  transform(var) | class(group)

See the 'Types of Effects' section on page 1784 in Chapter 32, 'The GLM Procedure,' for a description of the @, *, and | operators, and see the 'Model Statement Usage' section on page 4592 for information on how to use these operators in PROC TRANSREG. Note that nesting is not allowed in PROC TRANSREG.
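As a small illustration of the bar operator, the following sketch fits main effects and the two-way interaction of two classification variables. The data set WORK.DESIGN and the variables A, B, and Y are hypothetical and are simulated only for this example.

```sas
/* Hypothetical 2 x 3 design with four replicates per cell */
data work.design;
   do a = 1 to 2;
      do b = 1 to 3;
         do rep = 1 to 4;
            y = a + 0.5 * b + 0.25 * a * b + 0.2 * rannor(3);
            output;
         end;
      end;
   end;
run;

/* CLASS(a | b) requests the A and B main effects and the A*B interaction */
proc transreg data=work.design;
   model identity(y) = class(a | b);
   output out=work.design_out;
run;
```

Writing `class(a b a*b)` instead would request the same three effects by listing them explicitly.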

The next three sections discuss the transformations available (transforms) (see the 'Families of Transformations' section on page 4558), the transformation options (t-options) (see the 'Transformation Options (t-options)' section on page 4564), and the algorithm options (a-options) (see the 'Algorithm Options (a-options)' section on page 4573).

Families of Transformations

In the MODEL statement, transform specifies a transformation in one of four families.

Variable expansions

preprocess the specified variables, replacing them with more variables.

Nonoptimal transformations

preprocess the specified variables, replacing each one with a single new nonoptimal, nonlinear transformation.

Optimal transformations

replace the specified variables with new, iteratively derived optimal transformation variables that fit the specified model better than the original variable (except for contrived cases where the transformation fits the model exactly as well as the original variable).

Other transformations

are the IDENTITY and SSPLINE transformations. These do not fit into the preceding categories.

The following table summarizes the transformations in each family.

Family                      Members of Family
Variable expansions         B-spline basis                         BSPLINE
                            set of dummy variables                 CLASS
                            elliptical response surface            EPOINT
                            circular response surface              POINT
                            piecewise polynomial basis             PSPLINE
                            quadratic response surface             QPOINT
Nonoptimal transformations  inverse trigonometric sine             ARSIN
                            Box-Cox                                BOXCOX
                            exponential                            EXP
                            logarithm                              LOG
                            logit                                  LOGIT
                            raises variables to specified power    POWER
                            transforms to ranks                    RANK
                            noniterative smoothing spline          SMOOTH
Optimal transformations     linear                                 LINEAR
                            monotonic, ties preserved              MONOTONE
                            monotonic B-spline                     MSPLINE
                            optimal scoring                        OPSCORE
                            B-spline                               SPLINE
                            monotonic, ties not preserved          UNTIE
Other transformations       identity, no transformation            IDENTITY
                            iterative smoothing spline             SSPLINE

You can use any transformation with either dependent or independent variables (except the SMOOTH transformation, which can be used only with independent variables, and BOXCOX, which can be used only with dependent variables). However, the variable expansions are usually more appropriate for independent variables.

The transform is followed by a variable (or list of variables) enclosed in parentheses. Optionally, depending on the transform, the parentheses can also contain t-options, which follow the variables and a slash. For example,

  model log(y)=class(x);  

finds a LOG transformation of Y and performs a CLASS expansion of X.

  model identity(y) = spline(x1 x2 / nknots=3);  

The preceding statement finds SPLINE transformations of X1 and X2. The NKNOTS= t-option used with the SPLINE transformation specifies three knots. The IDENTITY(Y) transformation specifies that Y is not to be transformed.

The rest of this section provides syntax details for members of the four families of transformations. The t-options are discussed in the 'Transformation Options (t-options)' section on page 4564.

Variable Expansions

The TRANSREG procedure performs variable expansions before iteration begins. Variable expansions expand the original variables into a typically larger set of new variables. The original variables are those that are listed in parentheses after transform , and they are sometimes referred to by the name of the transform . For example, in CLASS( X1 X2 ), X1 and X2 are sometimes referred to as CLASS expansion variables or simply CLASS variables, and the expanded variables are referred to as dummy variables. Similarly, in POINT( Dim1 Dim2 ), Dim1 and Dim2 are sometimes referred to as POINT variables.

The resulting variables are not transformed by the iterative algorithms after the initial preprocessing. Observations with missing values for these types of variables are excluded from the analysis.

The POINT, EPOINT, and QPOINT variable expansions are used in preference mapping analyses (also called PREFMAP, external unfolding, ideal point regression) (Carroll 1972) and for response surface regressions. These three expansions create circular, elliptical, and quadratic response or preference surfaces (see the 'Point Models' section on page 4605 and Example 75.5). The CLASS variable expansion is used for main effects ANOVA.

The following list provides syntax and details for the variable expansion transforms .

BSPLINE

BSP

  • expands each variable to a B-spline basis. You can specify the DEGREE=, KNOTS=, NKNOTS=, and EVENLY t-options with the BSPLINE expansion. When DEGREE= n (3 by default) with k knots (0 by default), n + k + 1 variables are created. In addition, the original variable appears in the OUT= data set before the ID variables. For example, BSPLINE( X ) expands X into X_0, X_1, X_2, and X_3 and outputs X as well. The X_: variables contain the B-spline basis (the same basis vectors that the SPLINE and MSPLINE transformations use internally). The columns of the BSPLINE expansion sum to a column of ones, so an implicit intercept model is fit when the BSPLINE expansion is specified. If you specify the BSPLINE expansion for more than one variable, the model is less than full rank. See the section 'SPLINE, BSPLINE, and PSPLINE Comparisons' on page 4614. Variables following BSPLINE must be numeric, and they are typically continuous.

CLASS

CLA

  • expands the variables to a set of dummy variables. For example, CLASS( X1 X2 ) is used for a simple main-effects model, CLASS( X1 | X2 ) fits a main-effects and interactions model, and CLASS( X1 | X2 | X3 | X4@2 X1 * X2 * X3 ) creates all main effects, all two-way interactions, and one three-way interaction. See the 'Model Statement Usage' section on page 4592 for information on how to use the operators @, *, and | in PROC TRANSREG. To determine class membership, PROC TRANSREG uses the values of the formatted variables. Variables following CLASS can be either character or numeric; numeric variables should be discrete.

EPOINT

EPO

  • expands the variables for an elliptical response surface regression or for an elliptical ideal point regression. Specify the COORDINATES o-option to output PREFMAP ideal elliptical point model coordinates to the OUT= data set. Each axis of the ellipse (or ellipsoid) is oriented in the same direction as one of the variables. The EPOINT expansion creates a new variable for each original variable. The value of each new variable is the square of each observed value for the corresponding parenthesized variable. The regression analysis then uses both sets of variables (original and squared). Variables following EPOINT must be numeric, and they are typically continuous.

POINT

POI

  • expands the variables for a circular response surface regression or for a circular ideal point regression. Specify the COORDINATES o-option to output PREFMAP ideal point model coordinates to the OUT= data set. The POINT expansion creates a new variable whose value for each observation is the sum of squares of all the POINT variables. This new variable is added to the set of variables and is used in the regression analysis. For more on ideal point regression, refer to Carroll (1972). Variables following POINT must be numeric, and they are typically continuous.

PSPLINE

PSP

  • expands each variable to a piecewise polynomial basis. You can specify the DEGREE=, KNOTS=, NKNOTS=, and EVENLY t-options with PSPLINE. When DEGREE= n (3 by default) with k knots (0 by default), n + k variables are created. In addition, the original variable appears in the OUT= data set before the ID variables. For example, PSPLINE( X / NKNOTS=1) expands X into X_1, X_2, X_3, and X_4 and outputs X as well. Unlike BSPLINE, an intercept is not implicit in the columns of PSPLINE. Refer to Smith (1979) for a good introduction to piecewise polynomial splines. Also see the section 'SPLINE, BSPLINE, and PSPLINE Comparisons' on page 4614. Variables following PSPLINE must be numeric, and they are typically continuous.

QPOINT

QPO

  • expands the variables for a quadratic response surface regression or for a quadratic ideal point regression. Specify the COORDINATES o-option to output PREFMAP quadratic ideal point model coordinates to the OUT= data set. For m QPOINT variables, m ( m +1)/2 new variables are created containing the squares and crossproducts of the original variables. The regression analysis uses both sets (original and crossed). Variables following QPOINT must be numeric, and they are typically continuous.
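As a sketch of a variable expansion, the following statements apply the PSPLINE( X / NKNOTS=1) example described above and print the first few observations of the OUT= data set so that the expanded variables can be inspected. The data set WORK.CURVE and the variables X and Y are hypothetical.

```sas
/* Hypothetical data, simulated only for illustration */
data work.curve;
   do i = 1 to 40;
      x = i / 4;
      y = sqrt(x) + 0.1 * rannor(13);
      output;
   end;
   drop i;
run;

/* Default DEGREE=3 with one knot: n + k = 3 + 1 = 4 expansion variables */
proc transreg data=work.curve;
   model identity(y) = pspline(x / nknots=1);
   output out=work.pspline_out;
run;

/* Look at the expanded variables and the original X */
proc print data=work.pspline_out(obs=5);
run;
```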

Nonoptimal Transformations

Like variable expansions, nonoptimal transformations are computed before the iterative algorithm begins. Nonoptimal transformations create a single new transformed variable that replaces the original variable. The new variable is not transformed by the subsequent iterative algorithms (except for a possible linear transformation with missing value estimation).

The following list provides syntax and details for nonoptimal variable transformations.

ARSIN

ARS

  • finds an inverse trigonometric sine transformation. Variables following ARSIN must be numeric, in the interval (-1.0 <= X <= 1.0), and they are typically continuous.

BOXCOX

BOX

  • finds a Box-Cox transformation of the specified variables (see the 'Box-Cox Transformations' section on page 4595 and Example 75.6). The BOXCOX transformation can be used only with dependent variables. The ALPHA=, CLL=, CONVENIENT, GEOMETRICMEAN, LAMBDA=, and PARAMETER= t-options can be used with the BOXCOX transformation. Variables following BOXCOX must be numeric, and they are typically continuous.

EXP

  • exponentiates variables (the variable X is transformed to a^X). To specify the value of a, use the PARAMETER= t-option. By default, a is the mathematical constant e (approximately 2.718). Variables following EXP must be numeric, and they are typically continuous.

LOG

  • transforms variables to logarithms (the variable X is transformed to log_a(X)). To specify the base of the logarithm, use the PARAMETER= t-option. The default is a natural logarithm with base e (approximately 2.718). Variables following LOG must be numeric and positive, and they are typically continuous.

LOGIT

  • finds a logit transformation on the variables. The logit of X is log(X / (1 - X)). Unlike other transformations, LOGIT does not have a three-letter abbreviation. Variables following LOGIT must be numeric, in the interval (0.0 < X < 1.0), and they are typically continuous.

POWER

POW

  • raises variables to a specified power (the variable X is transformed to X^a). You must specify the power parameter a by specifying the PARAMETER= t-option following the variables:

      power(variable / parameter=number)  

    You can use POWER for squaring variables (PARAMETER=2), reciprocal transformations (PARAMETER=-1), square roots (PARAMETER=0.5), and so on. Variables following POWER must be numeric, and they are typically continuous.

RANK

RAN

  • transforms variables to ranks. Ranks are averaged within ties. The smallest input value is assigned the smallest rank. Variables following RANK must be numeric.

SMOOTH

SMO

  • is a noniterative smoothing spline transformation. You can specify the smoothing parameter with either the SM= or the PARAMETER= t-option. The default smoothing parameter is SM=0. Variables following SMOOTH must be numeric, and they are typically continuous. The SMOOTH transformation can be used only with independent variables. For more information, see the 'Smoothing Splines' section on page 4596.
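A brief sketch of two nonoptimal transformations used together follows. The data set WORK.GROWTH and the variables X and Y are hypothetical; Y is simulated so that it is strictly positive, which the LOG transformation requires.

```sas
/* Hypothetical data with a strictly positive response */
data work.growth;
   do i = 1 to 30;
      x = i / 3;
      y = exp(0.05 * x * x + 0.1 * rannor(5));
      output;
   end;
   drop i;
run;

/* LOG uses the natural logarithm by default;             */
/* POWER has no default, so PARAMETER= must be specified. */
proc transreg data=work.growth;
   model log(y) = power(x / parameter=2);
   output out=work.growth_out;
run;
```

Both transformations are computed once, before the iterative algorithm begins, so this model amounts to regressing log(Y) on the square of X.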

Optimal Transformations

Optimal transformations are iteratively derived. Missing values for these types of variables can be optimally estimated (see the 'Missing Values' section on page 4599).

The following list provides syntax and details for optimal transformations.

LINEAR

LIN

  • finds an optimal linear transformation of each variable. For variables with no missing values, the transformed variable is the same as the original variable. For variables with missing values, the transformed nonmissing values have a different scale and origin than the original values. Variables following LINEAR must be numeric.

MONOTONE

MON

  • finds a monotonic transformation of each variable, with the restriction that ties are preserved. The Kruskal (1964) secondary least-squares monotonic transformation is used. This transformation weakly preserves order and category membership (ties). Variables following MONOTONE must be numeric, and they are typically discrete.

MSPLINE

MSP

  • finds a monotonically increasing B-spline transformation with monotonic coefficients (de Boor 1978; de Leeuw 1986) of each variable. You can specify the DEGREE=, KNOTS=, NKNOTS=, and EVENLY t-options with MSPLINE. By default, PROC TRANSREG uses a quadratic spline. Variables following MSPLINE must be numeric, and they are typically continuous.

OPSCORE

OPS

  • finds an optimal scoring of each variable. The OPSCORE transformation assigns scores to each class (level) of the variable. Fisher's (1938) optimal scoring method is used. Variables following OPSCORE can be either character or numeric; numeric variables should be discrete.

SPLINE

SPL

  • finds a B-spline transformation (de Boor 1978) of each variable. By default, PROC TRANSREG uses a cubic polynomial transformation. You can specify the DEGREE=, KNOTS=, NKNOTS=, and EVENLY t-options with SPLINE. Variables following SPLINE must be numeric, and they are typically continuous.

UNTIE

UNT

  • finds a monotonic transformation of each variable without the restriction that ties are preserved. The TRANSREG procedure uses the Kruskal (1964) primary least-squares monotonic transformation method. This transformation weakly preserves order but not category membership (it may untie some previously tied values). Variables following UNTIE must be numeric, and they are typically discrete.
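The following sketch combines two optimal transformations: a MONOTONE transformation of a discrete dependent variable and a SPLINE transformation of a continuous independent variable. The data set WORK.RATINGS and the variables X and RATING are hypothetical and simulated only for illustration.

```sas
/* Hypothetical data: RATING is a discrete, roughly ordinal response */
data work.ratings;
   do i = 1 to 40;
      x = i / 4;
      rating = round(1 + 4 * (1 - exp(-x / 3)) + 0.3 * rannor(9));
      output;
   end;
   drop i;
run;

/* MONOTONE preserves ties in RATING; SPLINE is cubic by default */
proc transreg data=work.ratings;
   model monotone(rating) = spline(x / nknots=2);
   output out=work.ratings_out;
run;
```

Replacing `monotone` with `untie` would request the primary approach, in which tied values of RATING may receive different transformed values.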

Other Transformations

IDENTITY

IDE

  • specifies variables that are not changed by the iterations. Typically, the IDENTITY transformation is used with a simple variable list, such as IDENTITY( X1-X5 ). However, you can also specify interaction terms. For example, IDENTITY( X1 | X2 ) creates X1, X2, and the product X1 * X2; and IDENTITY( X1 | X2 | X3 ) creates X1, X2, X1 * X2, X3, X1 * X3, X2 * X3, and X1 * X2 * X3. See the 'Model Statement Usage' section on page 4592 for information on how to use the operators @, *, and | in PROC TRANSREG.

    The IDENTITY transformation is used for variables when no transformation and no missing data estimation are desired. However, the REFLECT t-option, the ADDITIVE a-option, and the TSTANDARD=Z and TSTANDARD=CENTER options can linearly transform all variables, including IDENTITY variables, after the iterations. Observations with missing values in IDENTITY variables are excluded from the analysis, and no optimal scores are computed for missing values in IDENTITY variables. Variables following IDENTITY must be numeric.

SSPLINE

SSP

  • finds an iterative smoothing spline transformation of each variable. The SSPLINE transformation does not generally minimize squared error. You can specify the smoothing parameter with either the SM= t-option or the PARAMETER= t-option. The default smoothing parameter is SM=0. Variables following SSPLINE must be numeric, and they are typically continuous.

Transformation Options (t-options)

If you use a nonoptimal, optimal, or other transformation, you can use t-options, which specify additional details of the transformation. The t-options are specified within the parentheses that enclose variables and are listed after a slash. You can use t-options with both dependent and independent variables. For example,

  proc transreg;
     model identity(y)=spline(x / nknots=3);
     output;
  run;

The preceding statements find an optimal variable transformation (SPLINE) of the independent variable, and they use a t-option to specify the number of knots (NKNOTS=). The following is a more complex example:

  proc transreg;
     model mspline(y / nknots=3)=class(x1 x2 / effects);
     output;
  run;

These statements find a monotone spline transformation (MSPLINE with three knots) of the dependent variable and perform a CLASS expansion with effects coding of the independents.

The following sections discuss the t-options available for nonoptimal, optimal, and other transformations.

The following table summarizes the t-options .

Table 75.2: t-options Available in the MODEL Statement

Task                                               Option

Nonoptimal transformation t-options
  uses original mean and variance                  ORIGINAL

Parameter t-options
  specifies miscellaneous parameters               PARAMETER=
  specifies smoothing parameter                    SM=

Spline t-options
  specifies the degree of the spline               DEGREE=
  spaces the knots evenly                          EVENLY
  exterior knots                                   EXKNOTS=
  specifies the interior knots or break points     KNOTS=
  creates n knots                                  NKNOTS=

CLASS Variable t-options
  CLASS dummy variable name prefix                 CPREFIX=
  requests a deviations-from-means coding          DEVIATIONS
  requests a deviations-from-means coding          EFFECTS
  CLASS dummy variable label prefix                LPREFIX=
  order of class variable levels                   ORDER=
  CLASS dummy variable label separators            SEPARATORS=
  controls reference levels                        ZERO=

BOXCOX t-options
  confidence interval alpha                        ALPHA=
  convenient lambda list                           CLL=
  use a convenient lambda                          CONVENIENT
  scale the transformation using geometric mean    GEOMETRICMEAN
  power parameter list                             LAMBDA=

Other t-options
  operations occur after the expansion             AFTER
  centers before the analysis begins               CENTER
  renames variables                                NAME=
  reflects the variable around the mean            REFLECT
  specifies transformation standardization         TSTANDARD=
  standardizes before the analysis begins          Z

Nonoptimal Transformation t-options

ORIGINAL

ORI

  • matches the variable's final mean and variance to the mean and variance of the original variable. By default, the mean and variance are based on the transformed values. The ORIGINAL t-option is available for all of the nonoptimal transformations.

Parameter t-options

PARAMETER= number

PAR= number

  • specifies the transformation parameter. The PARAMETER= t-option is available for the BOXCOX, EXP, LOG, POWER, SMOOTH, and SSPLINE transformations. For BOXCOX, the parameter is the value to add to each value of the BOXCOX variable before a Box-Cox transformation. For EXP, the parameter is the value to be exponentiated; for LOG, the parameter is the base value; and for POWER, the parameter is the power. For SMOOTH and SSPLINE, the parameter is the raw smoothing parameter. (You can specify a SAS/GRAPH-style smoothing parameter with the SM= t-option.) The default for the PARAMETER= t-option for the BOXCOX transformation is 0 and for the LOG and EXP transformations is e (approximately 2.718). The default parameter for SMOOTH and SSPLINE is computed from SM=0. For the POWER transformation, you must specify the PARAMETER= t-option; there is no default.

SM= n

  • specifies a SAS/GRAPH-style smoothing parameter in the range 0 to 100. You can specify the SM= t-option only with the SMOOTH and SSPLINE transformations. The smoothness of the function increases as the value of the smoothing parameter increases. By default, SM=0.

Spline t-options

The following t-options are available with the SPLINE and MSPLINE optimal transformations and the PSPLINE and BSPLINE expansions.

DEGREE= n

DEG= n

  • specifies the degree of the spline transformation. The degree must be a nonnegative integer. The defaults are DEGREE=3 for SPLINE, PSPLINE, and BSPLINE variables and DEGREE=2 for MSPLINE variables.

  • The polynomial degree should be a small integer, usually 0, 1, 2, or 3. Larger values are rarely useful. If you have any doubt as to what degree to specify, use the default.

EVENLY

EVE

  • is used with the NKNOTS= t-option to space the knots evenly. The differences between adjacent knots are constant.

  • If you specify NKNOTS= k, k knots are created at

      minimum + i((maximum - minimum) / (k + 1))

    for i = 1, ..., k. For example, if you specify

      spline(X / nknots=2 evenly)

    and the variable X has a minimum of 4 and a maximum of 10, then the two interior knots are 6 and 8. Without the EVENLY t-option, the NKNOTS= t-option places knots at percentiles, so the knots are not evenly spaced.

EXKNOTS= number-list | n TO m BY p

EXK= number-list | n TO m BY p

  • specifies exterior knots for SPLINE and MSPLINE transformations and BSPLINE expansions. Usually, this option is not needed; PROC TRANSREG automatically picks suitable exterior knots. The only time you need to use this option is when you want to ensure that the exact same basis is used for different splines, for example when applying coefficients from one spline transformation to a variable in a different data set (see 'Scoring Spline Variables' at the end of Example 75.1).

    Specify one or two values. If the minimum EXKNOTS= value is less than the minimum data value, it is used as the exterior knot. If the maximum EXKNOTS= value is greater than the maximum data value, it is used as the exterior knot. Otherwise these values are ignored. When EXKNOTS= is specified with the CENTER or Z t-options, the knots apply to the original variable, not to the centered or standardized variable.

    The B-spline transformations and expansions use a knot list consisting of exterior knots (values just smaller than the minimum), the specified (interior) knots, and exterior knots (values just larger than the maximum). You can use the DETAILS option to see all of these knots. Using different exterior knots gives different but equivalent B-spline bases. You can specify exterior knots in either the KNOTS= or EXKNOTS= t-options; however, for the BSPLINE expansion, the KNOTS= t-option creates extra all-zero basis columns, whereas the EXKNOTS= t-option gives you the correct basis.

KNOTS= number-list | n TO m BY p

KNO= number-list | n TO m BY p

  • specifies the interior knots or break points. By default, there are no knots. The first time you specify a value in the knot list, it indicates a discontinuity in the n th (from DEGREE= n ) derivative of the transformation function at the value of the knot. The second mention of a value indicates a discontinuity in the (n - 1)th derivative of the transformation function at the value of the knot. Knots can be repeated any number of times for decreasing smoothness at the break points, but the values in the knot list can never decrease.

    You cannot use the KNOTS= t-option with the NKNOTS= t-option. You should keep the number of knots small (see the section 'Specifying the Number of Knots' on page 4613).

NKNOTS= n

NKN= n

  • creates n knots, the first at the 100 / ( n + 1) percentile, the second at the 200 / ( n + 1) percentile, and so on. Knots are always placed at data values; there is no interpolation. For example, if NKNOTS=3, knots are placed at the twenty-fifth percentile, the median, and the seventy-fifth percentile. By default, NKNOTS=0. The NKNOTS= value must be greater than or equal to 0.

    You cannot use the NKNOTS= t-option with the KNOTS= t-option.

    You should keep the number of knots small (see the section 'Specifying the Number of Knots' on page 4613).
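The spline t-options can be sketched with a variable that ranges from 4 to 10, matching the EVENLY example above. The data set WORK.KNOTS and the variables X and Y are hypothetical and simulated only for illustration.

```sas
/* Hypothetical data: X runs from a minimum of 4 to a maximum of 10 */
data work.knots;
   do x = 4 to 10 by 0.25;
      y = sin(x) + 0.1 * rannor(21);
      output;
   end;
run;

/* NKNOTS=2 with EVENLY: per the EVENLY description, */
/* the two interior knots fall at 6 and 8.           */
proc transreg data=work.knots;
   model identity(y) = spline(x / nknots=2 evenly);
   output out=work.evenly_out;
run;

/* KNOTS= with an explicit list and DEGREE=1:            */
/* a piecewise linear transformation with breaks at 6, 8 */
proc transreg data=work.knots;
   model identity(y) = spline(x / degree=1 knots=6 8);
   output out=work.linear_out;
run;
```

The two runs use the same knot locations but different degrees, which makes it easy to compare a cubic and a piecewise linear fit. KNOTS= and NKNOTS= are never combined in one specification, as required above.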

CLASS Variable t-options

CPREFIX= n | number-list

CPR= n | number-list

  • specifies the number of first characters of a CLASS expansion variable's name to use in constructing names for dummy variables. When CPREFIX= is specified as an a-option (see the description of the CPREFIX= a-option on page 4575) or an o-option, it specifies the default for all CLASS variables. When you specify CPREFIX= as a t-option, it overrides the default only for selected variables. A different CPREFIX= value can be specified for each CLASS variable by specifying the CPREFIX=number-list t-option, like the ZERO=formatted-value-list t-option.

DEVIATIONS

DEV

EFFECTS

EFF

  • requests a deviations-from-means coding of CLASS variables. The coded design matrix has values of 0, 1, and -1 for reference levels. This coding is referred to as 'deviations-from-means,' 'effects,' 'center-point,' or 'full-rank' coding.

LPREFIX= n | number-list

LPR= n | number-list

  • specifies the number of first characters of a CLASS expansion variable's label (or name if no label is specified) to use in constructing labels for dummy variables. When LPREFIX= is specified as an a-option (see the description of the LPREFIX= a-option on page 4576) or an o-option, it specifies the default for all CLASS variables. When you specify LPREFIX= as a t-option, it overrides the default only for selected variables. A different LPREFIX= value can be specified for each CLASS variable by specifying the LPREFIX=number-list t-option, like the ZERO=formatted-value-list t-option.

ORDER=DATA | FREQ | FORMATTED | INTERNAL

ORD=DAT | FRE | FOR | INT

  • specifies the order in which the CLASS variable levels are to be reported. The default is ORDER=INTERNAL. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine dependent. When ORDER= is specified as an a-option (see the description of the ORDER= a-option on page 4578) or as an o-option, it specifies the default ordering for all CLASS variables. When you specify ORDER= as a t-option, it overrides the default ordering only for selected variables. You can specify a different ORDER= value for each CLASS specification.

SEPARATORS=' string-1 ' < ' string-2 ' >

SEP=' string-1 ' < ' string-2 ' >

  • specifies separators for creating CLASS expansion variable labels. By default, SEPARATORS=' ' ' * ' ('blank' and 'blank asterisk blank'). When SEPARATORS= is specified as an a-option (see the description of the SEPARATORS= a-option on page 4579) or an o-option, it specifies the default separators for all CLASS variables. When you specify SEPARATORS= as a t-option, it overrides the default only for selected variables. You can specify a different SEPARATORS= value for each CLASS specification.

ZERO=FIRST | LAST | NONE | SUM

ZER=FIR | LAS | NON | SUM

ZERO=' formatted-value ' < ' formatted-value ' ... >

  • is used with CLASS variables. The default is ZERO=LAST.

    The specification CLASS(variable / ZERO=FIRST) sets to missing the dummy variable for the first of the sorted categories, implying a zero coefficient for that category.

    The specification CLASS(variable / ZERO=LAST) sets to missing the dummy variable for the last of the sorted categories, implying a zero coefficient for that category.

    The specification CLASS(variable / ZERO='formatted-value') sets to missing the dummy variable for the category with a formatted value that matches 'formatted-value', implying a zero coefficient for that category. With ZERO=formatted-value-list, the first formatted value applies to the first variable in the specification, the second formatted value applies to the next variable that was not previously mentioned, and so on. For example, CLASS( A A*B B B*C C / ZERO='x' 'y' 'z') specifies that the reference level for A is 'x', for B is 'y', and for C is 'z'. With ZERO='formatted-value', the procedure first looks for exact matches between the formatted values and the specified value. If none are found, leading blanks are stripped from both and the values are compared again. If zero or two or more matches are found, warnings are issued.

    The specifications ZERO=FIRST, ZERO=LAST, and ZERO=' formatted-value ' are used for reference cell models. The Intercept parameter estimate is the marginal mean for the reference cell, and the other marginal means are obtained by adding the intercept to the dummy variable coefficients.

    The specification CLASS(variable / ZERO=NONE) sets to missing none of the dummy variables. The columns of the expansion sum to a column of ones, so an implicit intercept model is fit. If you specify ZERO=NONE for more than one variable, the model is less than full rank. In the model MODEL IDENTITY( Y ) = CLASS( X / ZERO=NONE), the coefficients are cell means.

    The specification CLASS(variable / ZERO=SUM) sets to missing none of the dummy variables, and the coefficients for the dummy variables created from the variable sum to 0. This creates a less-than-full-rank model, but the coefficients are uniquely determined due to the sum-to-zero constraint.

    In the presence of iterative transformations, hypothesis tests for ZERO=NONE and ZERO=SUM levels are not exact; they are liberal because a model with an explicit intercept is fit inside the iterations. There is no provision for adjusting the transformations while setting to 0 a parameter that is redundant given the explicit intercept and the other parameters.
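The ZERO= codings described above can be compared with a short sketch. The data set A and the variables Y and X are hypothetical names used only for illustration, and the SS2 a-option is added so that the coefficients are displayed.

```sas
/* Hypothetical data set A: numeric Y, CLASS variable X */

/* Reference cell coding: the first sorted level of X is the reference */
proc transreg data=a;
   model identity(y) = class(x / zero=first) / ss2;
   output out=refcell;
run;

/* No level is zeroed: implicit intercept, coefficients are cell means */
proc transreg data=a;
   model identity(y) = class(x / zero=none) / ss2;
   output out=cellmeans;
run;

/* Sum-to-zero coding: coefficients for the levels of X sum to 0 */
proc transreg data=a;
   model identity(y) = class(x / zero=sum) / ss2;
   output out=effects;
run;
```

Running the three steps on the same data is one way to see how the Intercept and dummy variable coefficients change with the coding while the fitted values stay the same.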

Box-Cox t-options

The following t-options are available only with the BOXCOX transformation of the dependent variable (see the 'Box-Cox Transformations' section on page 4595 and Example 75.6).

ALPHA= p

ALP= p

  • specifies the Box-Cox alpha for the confidence interval for the power parameter. By default, ALPHA=0.05.

CLL= number-list

  • specifies the Box-Cox convenient lambda list. When the confidence interval for the power parameter includes one of the values in this list, PROC TRANSREG reports it and can optionally use the convenient power parameter instead of the more optimal power parameter. The default is CLL=1.0 0.0 0.5 -1.0 -0.5 2.0 -2.0 3.0 -3.0. By default, a linear transformation is preferred over log, square root, inverse, inverse square root, quadratic, inverse quadratic, cubic, and inverse cubic. If you specify the CONVENIENT t-option, then PROC TRANSREG uses the first convenient power parameter in the list that is in the confidence interval. For example, if the optimal power parameter is 0.25 and 0.0 is in the confidence interval but not 1.0, then the convenient power parameter is 0.0.

CONVENIENT

CON

  • specifies that a power parameter from the CLL= t-option list is to be used for the final transformation instead of the LAMBDA= t-option value if a CLL= value is in the confidence interval. See the CLL= t-option for more information on its usage.

GEOMETRICMEAN

GEO

  • divides the Box-Cox transformation by $\dot{y}^{\lambda - 1}$, where $\dot{y}$ is the geometric mean of the variable to be transformed. This form of the Box-Cox transformation essentially converts the transformation back to original units and hence allows direct comparison of the residual sums of squares for models with different power parameters.
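    For reference, the scaled transformation can be written out as follows. This is a sketch in conventional Box-Cox notation rather than a quotation from this chapter: $\lambda$ is the power parameter, $y_1, \ldots, y_n$ are the values of the variable, and $\dot{y}$ is their geometric mean.

$$
\dot{y} = \Bigl(\prod_{i=1}^{n} y_i\Bigr)^{1/n},
\qquad
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda\,\dot{y}^{\lambda - 1}} & \lambda \neq 0 \\[1.5ex]
\dot{y}\,\log y & \lambda = 0
\end{cases}
$$

    The $\lambda = 0$ case follows from dividing $\log y$ by $\dot{y}^{-1}$.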

LAMBDA= number-list

LAM= number-list

  • specifies a list of Box-Cox power parameters. The default is LAMBDA=-3 TO 3 BY 0.25. PROC TRANSREG tries each power parameter in the list and picks the best one. However, when the CONVENIENT t-option is specified, a power parameter from the CLL= t-option list is used for the final transformation instead if a CLL= value is in the confidence interval. See the CLL= t-option for more information on its usage.
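A minimal sketch of how the Box-Cox t-options combine is shown next. The data set A and the variables Y and X are hypothetical, and the finer LAMBDA= grid is an arbitrary choice for illustration.

```sas
/* Hypothetical data set A: positive dependent variable Y, predictor X */
proc transreg data=a;
   /* Search lambda from -2 to 2 in steps of 0.05, form a 95%      */
   /* confidence interval (ALPHA=0.05), and use a convenient power */
   /* parameter from the default CLL= list if one is in the        */
   /* interval                                                     */
   model boxcox(y / convenient lambda=-2 to 2 by 0.05 alpha=0.05)
         = identity(x);
run;
```

Omitting CONVENIENT from the same step would leave the final transformation at the best LAMBDA= value.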

Other t-options

AFTER

AFT

  • requests that certain operations occur after the expansion. This t-option affects the NKNOTS= t-option when the SPLINE or MSPLINE transformation is crossed with a CLASS specification. For example, if the original spline variable (1 2 3 4 5 6 7 8 9) is expanded into the three variables (1 2 3 0 0 0 0 0 0), (0 0 0 4 5 6 0 0 0), and (0 0 0 0 0 0 7 8 9), then, by default, NKNOTS=1 would use the overall median of 5 as the knot for all three variables. When you specify the AFTER t-option, the knots for the three variables are 2, 5, and 8. Note that the structural zeros are ignored when the internal knot list is created, but they are not ignored for the external knots.

    You can also specify the AFTER t-option with the RANK and SMOOTH transformations. The following specifications compute ranks and smooth within groups, after crossing, ignoring the structural zeros.

      class(x / zero=none) | rank(z / after)
      class(x / zero=none) | smooth(z / after)

CENTER

CEN

  • centers the variables before the analysis begins (in contrast to the TSTANDARD=CENTER option, which centers after the analysis ends). The CENTER t-option can be used instead of running PROC STANDARD before PROC TRANSREG (see the 'Centering' section on page 4675). When the KNOTS= t-option is specified with CENTER, the knots apply to the original variable, not to the centered variable. PROC TRANSREG centers the knots.

NAME=( variable-list )

NAM=( variable-list )

  • renames variables as they are used in the MODEL statement. This t-option allows a variable to be used more than once.

    For example, if X is a character variable, then the following step stores both the original character variable X and a numeric variable XC that contains category numbers in the OUT= data set.

      proc transreg data=a;
         model identity(y) = opscore(x / name=(xc));
         output;
         id x;
      run;

    With the CLASS and IDENTITY transformations, which allow interaction effects, the first name applies to the first variable in the specification, the second name applies to the next variable that was not previously mentioned, and so on. For example, IDENTITY( A A*B B B*C C / NAME=( G H I )) specifies that the new name for A is G, for B is H, and for C is I. The same assignment is used for the (not useful) specification IDENTITY( A A B B C C / NAME=( G H I )). For all transforms other than CLASS and IDENTITY (all those in which interactions are not supported), repeated variables are not handled specially. For example, SPLINE( A A B B C C / NAME=( A G B H C I )) creates six variables: a copy of A named A, another copy of A named G, a copy of B named B, another copy of B named H, a copy of C named C, and another copy of C named I.

REFLECT

REF

  • reflects the transformation $y = -(y - \bar{y}) + \bar{y}$ after the iterations are completed and before the final standardization and results calculations. This t-option is particularly useful with the dependent variable in a conjoint analysis. When the dependent variable consists of ranks with the most preferred combination assigned 1.0, the REFLECT t-option reflects the transformation so that positive utilities mean high preference. (See Example 75.2.)
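A rank-based conjoint analysis of the kind described above might be specified as in the following sketch. The data set RANKS and the variables Rank, Brand, and Price are hypothetical names; Rank is assumed to hold 1 for the most preferred combination.

```sas
/* Hypothetical data set RANKS: Rank = 1 is the most preferred profile */
proc transreg data=ranks utilities;
   /* Monotone transformation of the ranks, reflected so that */
   /* larger part-worth utilities mean higher preference      */
   model monotone(rank / reflect) = class(brand price / zero=sum);
run;
```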

TSTANDARD=CENTER | NOMISS | ORIGINAL | Z

TST=CEN | NOM | ORI | Z

  • specifies the standardization of the transformed variables for the hypothesis tests and in the OUT= data set (see the 'Centering' section on page 4675). By default, TSTANDARD=ORIGINAL. When TSTANDARD= is specified as an a-option (see the description of the TSTANDARD= a-option on page 4580) or an o-option, it determines the default standardization for all variables. When you specify TSTANDARD= as a t-option, it overrides the default standardization only for selected variables. You can specify a different TSTANDARD= value for each transformation. For example, to perform a redundancy analysis with standardized dependent variables, specify

      model identity(y1-y4 / tstandard=z) = identity(x1-x10);  

Z

  • centers and standardizes the variables to variance one before the analysis begins (in contrast to the TSTANDARD=Z option, which standardizes after the analysis ends). The Z t-option can be used instead of running PROC STANDARD before PROC TRANSREG (see the 'Centering' section on page 4675). When the KNOTS= t-option is specified with Z, the knots apply to the original variable, not to the centered variable. PROC TRANSREG standardizes the knots.

Algorithm Options (a-options)

This section discusses the options that can appear in the PROC TRANSREG or MODEL statements as a-options. They are listed after the entire model specification and after a slash.

For example,

  proc transreg;
     model spline(y / nknots=3)=log(x1 x2 / parameter=2)
           / nomiss maxiter=50;
     output;
  run;

In the preceding statements, NOMISS and MAXITER= are a-options. (SPLINE and LOG are transforms, and NKNOTS= and PARAMETER= are t-options.) The statements find a spline transformation with 3 knots on Y and a base 2 logarithmic transformation on X1 and X2. The NOMISS a-option excludes all observations with missing values, and the MAXITER= a-option specifies the maximum number of iterations.

Table 75.3: Options Available in the PROC TRANSREG or MODEL Statements

| Task | Option |
|---|---|
| Input data set | |
| specifies input observation type | TYPE= |
| restarts iterations | REITERATE |
| Specify method and control iterations | |
| specifies minimum criterion change | CCONVERGE= |
| specifies minimum data change | CONVERGE= |
| specifies canonical dummy-variable initialization | DUMMY |
| specifies maximum number of iterations | MAXITER= |
| specifies iterative algorithm | METHOD= |
| specifies number of canonical variables | NCAN= |
| specifies singularity criterion | SINGULAR= |
| Control missing data handling | |
| METHOD=MORALS fits each model individually | INDIVIDUAL |
| includes monotone special missing values | MONOTONE= |
| excludes observations with missing values | NOMISS |
| unties special missing values | UNTIE= |
| Control intercept and CLASS variables | |
| CLASS dummy variable name prefix | CPREFIX= |
| CLASS dummy variable label prefix | LPREFIX= |
| no intercept or centering | NOINT |
| order of class variable levels | ORDER= |
| controls output of reference levels | REFERENCE= |
| CLASS dummy variable label separators | SEPARATORS= |
| Control displayed output | |
| confidence limits alpha | ALPHA= |
| displays parameter estimate confidence limits | CL |
| displays model specification details | DETAIL |
| displays iteration histories | HISTORY |
| suppresses displayed output | NOPRINT |
| suppresses the iteration histories | SHORT |
| displays regression results | SS2 |
| displays ANOVA table | TEST |
| displays conjoint part-worth utilities | UTILITIES |
| Control standardization | |
| fits additive model | ADDITIVE |
| do not zero constant variables | NOZEROCONSTANT |
| specifies transformation standardization | TSTANDARD= |

The following list provides details on these a-options.

ADDITIVE

ADD

  • creates an additive model by multiplying the values of each independent variable (after the TSTANDARD= standardization) by that variable's corresponding multiple regression coefficient. This process scales the independent variables so that the predicted-values variable for the final dependent variable is simply the sum of the final independent variables. An additive model is a univariate multiple regression model. As a result, the ADDITIVE a-option is not valid if METHOD=CANALS, or if METHOD=REDUNDANCY or METHOD=UNIVARIATE with more than one dependent variable.

ALPHA= number

ALP= number

  • specifies the level of significance for all of the confidence limits. By default, ALPHA=0.05.

CCONVERGE= n

CCO= n

  • specifies the minimum change in the criterion being optimized (squared multiple correlation for METHOD=MORALS and METHOD=UNIVARIATE, average squared multiple correlation for METHOD=REDUNDANCY, average squared canonical correlation for METHOD=CANALS) that is required to continue iterating. By default, CCONVERGE=0.0.

CL

  • requests confidence limits on the parameter estimates in the displayed output.

CONVERGE= n

CON= n

  • specifies the minimum average absolute change in standardized variable scores that is required to continue iterating. By default, CONVERGE=0.00001. Average change is computed over only those variables that can be transformed by the iterations; that is, all LINEAR, OPSCORE, MONOTONE, UNTIE, SPLINE, MSPLINE, and SSPLINE variables and nonoptimal transformation variables with missing values.

CPREFIX= n

CPR= n

  • specifies the number of first characters of a CLASS expansion variable's name to use in constructing names for dummy variables. Dummy variable names are constructed from the first n characters of the CLASS expansion variable's name and the first 32 - n characters of the formatted CLASS expansion variable's value. For example, if the variable ClassVariable has values 1, 2, and 3, then, by default, the dummy variables are named ClassVariable1, ClassVariable2, and ClassVariable3. However, with CPREFIX=5, the dummy variables are named Class1, Class2, and Class3. When CPREFIX=0, dummy variable names are created entirely from the CLASS expansion variable's formatted values. Valid values range from -1 to 31, where -1 indicates the default calculation and 0 to 31 are the number of prefix characters to use. The default, -1, sets n to 32 - min(32, max(2, fl)), where fl is the format length. When CPREFIX= is specified as an a-option or an o-option, it specifies the default for all CLASS variables. When you specify CPREFIX= as a t-option, it overrides the default only for selected variables.
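As a sketch of the naming rule, the following step assumes a hypothetical data set A that contains a numeric variable Y and the CLASS variable ClassVariable from the example above. For the default calculation, an assumed format length of fl = 8 (chosen only for illustration) gives n = 32 - min(32, max(2, 8)) = 24 prefix characters.

```sas
/* Hypothetical data set A: Y and ClassVariable (values 1, 2, 3) */
proc transreg data=a;
   /* Keep only the first 5 characters of the variable name, so */
   /* the dummy variables are named Class1, Class2, and Class3  */
   model identity(y) = class(ClassVariable / cprefix=5);
   output out=coded;
run;
```

Here CPREFIX= is given as a t-option, so it applies only to this CLASS specification.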

DETAIL

DET

  • reports on details of the model specification. For example, it reports the knots and coefficients for splines, reference levels for CLASS variables, Box-Cox results, and so on.

DUMMY

DUM

  • provides a canonical dummy variable initialization. When there are no monotonicity constraints and there is only one canonical variable in each set, PROC TRANSREG (with the DUMMY a-option) can usually find the optimal solution in only one iteration. The initialization iteration is number 0, which is slower and uses more memory than other iterations. However, when there are no monotonicity constraints, when there is only one canonical variable in each set, and when there is enough available memory, specifying the DUMMY a-option can greatly decrease the amount of time required to find the optimal transformations. Furthermore, by solving for the transformations directly instead of iteratively, PROC TRANSREG avoids certain nonoptimal solutions.

HISTORY

HIS

  • displays the iteration histories even when the NOPRINT a-option is specified.

INDIVIDUAL

IND

  • fits each model for each dependent variable individually. This means, for example, that when INDIVIDUAL is specified, missing values in one dependent variable will not cause that observation to be deleted for the other models with the other dependent variables. In contrast, by default, missing values in any variable in any model can cause the observation to be deleted for all models. The INDIVIDUAL option can only be specified with METHOD=MORALS.

    This option also affects the order of the output. By default, the number of observations table is printed once at the beginning of the output. With INDIVIDUAL, a number of observations table appears for each model.

LPREFIX= n

LPR= n

  • specifies the number of first characters of a CLASS expansion variable's label (or name if no label is specified) to use in constructing labels for dummy variables. Dummy variable labels are constructed from the first n characters of the CLASS expansion variable's name and the first 127 - n characters of the formatted CLASS expansion variable's value. Valid values range from -1 to 127. Values of 0 to 127 specify the number of name or label characters to use. The default is -1, which specifies that PROC TRANSREG should pick a value depending on the length of the prefix and the formatted class value. When LPREFIX= is specified as an a-option or an o-option, it determines the default for all CLASS variables. When you specify LPREFIX= as a t-option, it overrides the default only for selected variables.

MAXITER= n

MAX= n

  • specifies the maximum number of iterations (see the 'Controlling the Number of Iterations' section on page 4601). By default, MAXITER=30. A specification of MAXITER=0 is allowed to save time when no transformations are requested.

METHOD=CANALS | MORALS | REDUNDANCY | UNIVARIATE

MET=CAN | MOR | RED | UNI

  • specifies the iterative algorithm. By default, METHOD=UNIVARIATE, unless you specify options that cannot be handled by the UNIVARIATE algorithm. Specifically, the default is METHOD=MORALS for the following situations:

    • if you specify LINEAR, OPSCORE, MONOTONE, UNTIE, SPLINE, MSPLINE, or SSPLINE transformations for the independent variables

    • if you specify the ADDITIVE a-option with more than one dependent variable

    • if you specify the IAPPROXIMATIONS o-option

      CANALS

      specifies canonical correlation with alternating least squares. This jointly transforms all dependent and independent variables to maximize the average of the first n squared canonical correlations, where n is the value of the NCAN= a-option.

      MORALS

      specifies multiple optimal regression with alternating least squares. This transforms each dependent variable, along with the set of independent variables, to maximize the squared multiple correlation.

      REDUNDANCY

      jointly transforms all dependent and independent variables to maximize the average of the squared multiple correlations (see the 'Redundancy Analysis' section on page 4606).

      UNIVARIATE

      transforms each dependent variable to maximize the squared multiple correlation, while the independent variables are not transformed.
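The following sketch shows METHOD= used together with related iteration controls. The data set A and the variables Y1-Y3 and X1-X5 are hypothetical names for illustration, and the NCAN= and MAXITER= values are arbitrary.

```sas
/* Hypothetical data set A: dependents Y1-Y3, independents X1-X5 */
proc transreg data=a;
   /* Jointly transform both sets to maximize the average of the */
   /* first two squared canonical correlations                   */
   model monotone(y1-y3) = monotone(x1-x5)
         / method=canals ncan=2 maxiter=50;
   output out=b;
run;
```

Replacing METHOD=CANALS NCAN=2 with METHOD=REDUNDANCY or METHOD=MORALS in the same step selects one of the other criteria described above.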

MONOTONE= two-letters

MON= two-letters

  • specifies the first and last special missing value in the list of those special missing values to be estimated using within-variable order and category constraints. By default, there are no order constraints on missing value estimates. The two-letters value must consist of two letters in alphabetical order. For example, MONOTONE=DF means that the estimate of .D must be less than or equal to the estimate of .E, which must be less than or equal to the estimate of .F; no order constraints are placed on estimates of ._, .A through .C, and .G through .Z. For details, see the 'Missing Values' section on page 4599.

NCAN= n

NCA= n

  • specifies the number of canonical variables to use in the METHOD=CANALS algorithm. By default, NCAN=1. The value of the NCAN= a-option must be greater than or equal to 1.

    When canonical coefficients and coordinates are included in the OUT= data set, the NCAN= a-option also controls the number of rows of the canonical coefficient matrices in the data set. If you specify an NCAN= value larger than the minimum of the number of dependent variables and the number of independent variables, PROC TRANSREG displays a warning and sets the NCAN= a-option to the maximum allowable value.

NOINT

NOI

  • omits the intercept from the OUT= data set and suppresses centering of data. The NOINT a-option is not allowed with iterative transformations since there is no provision for optimal scaling without an intercept. The NOINT a-option is allowed only when there is no implicit intercept and when all of the data in a BY group absolutely will not change during the iterations.

NOMISS

NOM

  • excludes all observations with missing values from the analysis, but does not exclude them from the OUT= data set. If you omit the NOMISS a-option, PROC TRANSREG simultaneously computes the optimal transformations of the nonmissing values and estimates the missing values that minimize squared error. For details, see the 'Missing Values' section on page 4599.

    Casewise deletion of observations with missing values occurs when the NOMISS a-option is specified, when there are missing values in expansions, when there are missing values in METHOD=UNIVARIATE independent variables, when there are weights less than or equal to 0, or when there are frequencies less than 1. Excluded observations are output with a blank value for the _TYPE_ variable, and they have a weight of 0. They do not contribute to the analysis but are scored and transformed as supplementary or passive observations.

    See the 'Passive Observations' section on page 4605 for more information on excluded observations.

NOPRINT

NOP

  • suppresses the display of all output unless you specify the HISTORY a-option. The NOPRINT a-option without the HISTORY a-option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 14, 'Using the Output Delivery System.'

NOZEROCONSTANT

NOZERO

NOZ

  • specifies that constant variables are expected and should not be zeroed. By default, constant variables are zeroed. This option is useful when PROC TRANSREG is used to code experimental designs for discrete choice models (see the 'Discrete Choice Experiments: DESIGN, NORESTORE, NOZERO' section on page 4660). When these designs are very large, it may be more efficient to use the DESIGN= n option. It may be that attributes are constant within a block of n observations, so you need to specify the NOZEROCONSTANT option to get the correct results. You can specify this option in the PROC TRANSREG, MODEL, and OUTPUT statements.

ORDER=DATA | FREQ | FORMATTED | INTERNAL

ORD=DAT | FRE | FOR | INT

  • specifies the order in which the CLASS variable levels are to be reported. The default is ORDER=INTERNAL. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine dependent. When ORDER= is specified as an a-option or an o-option , it determines the default ordering for all CLASS variables. When you specify ORDER= as a t-option , it overrides the default ordering only for selected variables.

    DATA

    sorts by order of appearance in the input data set.

    FORMATTED

    sorts by formatted value.

    FREQ

    sorts by descending frequency count; levels with the most observations appear first.

    INTERNAL

    sorts by unformatted value.

REFERENCE=NONE | MISSING | ZERO

REF=NON | MIS | ZER

  • specifies how reference levels of CLASS variables are to be treated. The options are REFERENCE=NONE, the default, in which reference levels are suppressed; REFERENCE=MISSING, in which reference levels are displayed and output with missing values; and REFERENCE=ZERO, in which reference levels are displayed and output with zeros. The REFERENCE= option can be specified in the PROC TRANSREG, MODEL, or OUTPUT statement, and it can be independently specified for the OUT= data set and the displayed output. When you specify it in only one statement, it sets the option for both the displayed output and the OUT= data set.

REITERATE

REI

  • enables the TRANSREG procedure to use previous transformations as starting points. The REITERATE a-option affects only variables that are iteratively transformed (specified as LINEAR, OPSCORE, MONOTONE, UNTIE, SPLINE, MSPLINE, and SSPLINE). For iterative transformations, the REITERATE a-option requests a search in the input data set for a variable that consists of the value of the TDPREFIX= or TIPREFIX= o-option followed by the original variable name. If such a variable is found, it is used to provide the initial values for the first iteration. The final transformation is a member of the transformation family defined by the original variable, not the transformation family defined by the initialization variable. See the section 'Using the REITERATE Algorithm Option' on page 4602.
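A two-step sketch of this behavior follows. The data set and variable names are hypothetical, and the example assumes that the first step writes the transformed variables with the default TDPREFIX= and TIPREFIX= values, so that the second step can find them next to the original variables in the data set B.

```sas
/* Step 1: a short initial run on the hypothetical data set A */
proc transreg data=a;
   model monotone(y) = monotone(x1-x3) / maxiter=10;
   output out=b;
run;

/* Step 2: restart from the transformations stored in B rather */
/* than from the original variables                            */
proc transreg data=b reiterate;
   model monotone(y) = monotone(x1-x3) / maxiter=50;
   output out=c;
run;
```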

SEPARATORS=' string-1 ' < ' string-2 ' >

SEP=' string-1 ' < ' string-2 ' >

  • specifies separators for creating CLASS expansion variable labels. By default, SEPARATORS=' ' ' * ' ('blank' and 'blank asterisk blank'). The first value is used to separate variable names and values in interactions. The second value is used to separate interaction components. For example, the label for the dummy variable for the A=1 and B=2 cell is, by default, 'A 1 * B 2'. If SEPARATORS='=' 'x' is specified, then the label is 'A=1xB=2'. When SEPARATORS= is specified as an a-option or an o-option, it determines the default separators for all CLASS variables. When you specify SEPARATORS= as a t-option, it overrides the default only for selected variables.
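The label example above corresponds to a specification along the following lines. This is a sketch in which the data set T and the variables Y, A, and B are hypothetical.

```sas
/* Hypothetical data set T: numeric Y, CLASS variables A and B */
proc transreg data=t;
   /* Label interaction dummy variables as 'A=1xB=2' instead of */
   /* the default 'A 1 * B 2'                                   */
   model identity(y) = class(a*b / separators='=' 'x');
   output out=coded;
run;
```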

SHORT

SHO

  • suppresses the iteration histories.

SINGULAR= n

SIN= n

  • specifies the largest value within rounding error of zero. By default, SINGULAR=1E-12. The TRANSREG procedure uses the value of the SINGULAR= a-option for checking $1 - R^2$ when constructing full-rank matrices of predictor variables, checking denominators before dividing, and so on. PROC TRANSREG computes the regression coefficients by sweeping with rational pivoting.

SS2

  • produces a regression table based on Type II sums of squares. Tests of the contribution of each transformation to the overall model are displayed and output to the OUTTEST= data set when you specify the OUTTEST= option. When you specify the SS2 a-option, the TEST a-option is implied. See the section 'Hypothesis Tests' on page 4615. You can suppress the variable labels in the regression tables by specifying the NOLABEL option in the OPTIONS statement.

TEST

TES

  • generates an ANOVA table. PROC TRANSREG tests the null hypothesis that the vector of scoring coefficients for all of the transformations is zero. See the section 'Hypothesis Tests' on page 4615.
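The interaction of the TEST and SS2 a-options with the OUTTEST= data set can be sketched as follows. The data set A, the output data set TESTS, and the variables Y, X1, and X2 are hypothetical names.

```sas
/* Hypothetical data set A: numeric Y, CLASS variables X1 and X2 */
proc transreg data=a outtest=tests;
   /* SS2 implies TEST, so both the ANOVA table and the Type II */
   /* regression table are displayed and written to TESTS       */
   model identity(y) = class(x1 x2) / ss2;
run;

proc print data=tests;
run;
```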

TSTANDARD=CENTER | NOMISS | ORIGINAL | Z

TST=CEN | NOM | ORI | Z

  • specifies the standardization of the transformed variables for the hypothesis tests and in the OUT= data set. By default, TSTANDARD=ORIGINAL. When TSTANDARD= is specified as an a-option or an o-option , it determines the default standardization for all variables. When you specify TSTANDARD= as a t-option , it overrides the default standardization only for selected variables.

    CENTER

    centers the output variables to mean zero, but the variances are the same as the variances of the input variables.

    NOMISS

    sets the means and variances of the transformed variables in the OUT= data set, computed over all output values that correspond to nonmissing values in the input data set, to the means and variances computed from the nonmissing observations of the original variables. The TSTANDARD=NOMISS specification is useful with missing data. When a variable is linearly transformed, the final variable contains the original nonmissing values and the missing value estimates. In other words, the nonmissing values are unchanged. If your data have no missing values, TSTANDARD=NOMISS and TSTANDARD=ORIGINAL produce the same results.

    ORIGINAL

    sets the means and variances of the transformed variables to the means and variances of the original variables. This is the default.

    Z

    standardizes the variables to mean zero, variance one.

    The final standardization is affected by other options. If you also specify the ADDITIVE a-option , the TSTANDARD= option specifies an intermediate step in computing the final means and variances. The final independent variables, along with their means and standard deviations, are scaled by the regression coefficients, creating an additive model with all coefficients equal to one.

    For nonoptimal variable transformations, the means and variances of the original variables are actually the means and variances of the nonlinearly transformed variables, unless you specify the ORIGINAL nonoptimal t-option in the MODEL statement. For example, if a variable X with no missing values is specified as LOG, then, by default, the final transformation of X is simply LOG( X ), not LOG( X ) standardized to the mean of X and variance of X .

TYPE=' text ' | name

TYP=' text ' | name

  • specifies the valid value for the _TYPE_ variable in the input data set. If PROC TRANSREG finds an input _TYPE_ variable, it uses only observations with a _TYPE_ value that matches the TYPE= value. This enables a PROC TRANSREG OUT= data set containing coefficients to be used as input to PROC TRANSREG without requiring a WHERE statement to exclude the coefficients. If a _TYPE_ variable is not in the data set, all observations are used. The default is TYPE='SCORE', so if you do not specify the TYPE= a-option , only observations with _TYPE_ ='SCORE' are used. Do not confuse this option with the data set TYPE= option. The DATA= data set must be an ordinary SAS data set.

    PROC TRANSREG displays a note when it reads observations with blank values of _TYPE_ , but it does not automatically exclude those observations. Data sets created by the TRANSREG and PRINQUAL procedures have blank _TYPE_ values for those observations that were excluded from the analysis due to nonpositive weights, nonpositive frequencies, or missing data. When these observations are read again, they are excluded for the same reason that they were excluded from their original analysis, not because their _TYPE_ value is blank.

UNTIE= two-letters

UNT= two-letters

  • specifies the first and last special missing value in the list of those special missing values that are to be estimated with within-variable order constraints but no category constraints. The two-letters value must consist of two letters in alphabetical order. By default, there are category constraints but no order constraints on special missing value estimates. For details, see the 'Missing Values' section on page 4599 and the 'Optimal Scaling' section on page 4609.

UTILITIES

UTI

  • produces a table of the part-worth utilities from a conjoint analysis. Utilities, their standard errors, and the relative importance of each factor are displayed and output to the OUTTEST= data set when you specify the OUTTEST= option. When you specify the UTILITIES a-option , the TEST a-option is implied. Refer to SAS Technical Report R-109, Conjoint Analysis Examples , for more information on conjoint analysis.

OUTPUT Statement

  • OUTPUT OUT= SAS-data-set < o-options > ;

The OUTPUT statement creates a new SAS data set that contains coefficients, marginal means, and information on original and transformed variables. The information on original and transformed variables composes the score partition of the data set; observations have _TYPE_ ='SCORE'. The coefficients and marginal means compose the coefficient partition of the data set; observations have _TYPE_ ='M COEFFI' or _TYPE_ ='MEAN'. Other values of _TYPE_ are possible; for details, see ' _TYPE_ and _NAME_ Variables' later in this chapter. For details on data set structure, see the 'Output Data Set' section on page 4617.

To specify the data set, use the OUT= specification.

OUT= SAS-data-set

  • specifies the output data set for the data, transformed data, predicted values, residuals, scores, coefficients, and so on. When you use an OUTPUT statement but do not use the OUT= specification, PROC TRANSREG creates a data set and uses the DATAn convention. If you want to create a permanent SAS data set, you must specify a two-level name (refer to 'SAS Files' in SAS Language Reference: Concepts and 'Introduction to DATA Step Processing' in the SAS Procedures Guide for details).

    To control the contents of the data set and variable names, use one or more of the o-options. You can also specify these options in the PROC TRANSREG statement.
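A minimal sketch of an OUTPUT statement that creates a permanent data set follows. The libref MYLIB, its directory path, the data set names, and the variables are all hypothetical placeholders; PREDICTED and RESIDUALS are two of the o-options summarized in this section.

```sas
/* Hypothetical libref; replace the path with a real directory */
libname mylib 'path-to-directory';

proc transreg data=a;
   model identity(y) = spline(x / nknots=3);
   /* Two-level name creates a permanent data set; the o-options */
   /* add predicted values and residuals to it                   */
   output out=mylib.results predicted residuals;
run;
```

Omitting OUT= from the OUTPUT statement would instead create a data set named by the DATAn convention.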

Output Options (o-options)

The following table provides a summary of options in the OUTPUT statement. These options include the OUT= option and all of the o-options.

Table 75.4: Options Available in the OUTPUT Statement

| Task | Option |
|---|---|
| Identify output data set | |
| output data set | OUT= |
| Predicted values, residuals, scores | |
| outputs canonical scores | CANONICAL |
| outputs individual confidence limits | CLI |
| outputs mean confidence limits | CLM |
| specifies design matrix coding | DESIGN= |
| outputs leverage | LEVERAGE |
| does not restore missings | NORESTOREMISSING |
| suppresses output of scores | NOSCORES |
| outputs predicted values | PREDICTED |
| outputs redundancy variables | REDUNDANCY= |
| outputs residuals | RESIDUALS |
| Output data set replacement | |
| replaces dependent variables | DREPLACE |
| replaces independent variables | IREPLACE |
| replaces all variables | REPLACE |
| Output data set coefficients | |
| outputs coefficients | COEFFICIENTS |
| outputs ideal point coordinates | COORDINATES |
| outputs marginal means | MEANS |
| outputs redundancy analysis coefficients | MREDUNDANCY |
| Output data set variable name prefixes | |
| dependent variable approximations | ADPREFIX= |
| independent variable approximations | AIPREFIX= |
| canonical dependent variables | CDPREFIX= |
| conservative-individual-lower CL | CILPREFIX= |
| canonical independent variables | CIPREFIX= |
| conservative-individual-upper CL | CIUPREFIX= |
| conservative-mean-lower CL | CMLPREFIX= |
| conservative-mean-upper CL | CMUPREFIX= |
| METHOD=MORALS untransformed dependent | DEPENDENT= |
| liberal-individual-lower CL | LILPREFIX= |
| liberal-individual-upper CL | LIUPREFIX= |
| liberal-mean-lower CL | LMLPREFIX= |
| liberal-mean-upper CL | LMUPREFIX= |
| residuals | RDPREFIX= |
| predicted values | PPREFIX= |
| redundancy variables | RPREFIX= |
| transformed dependents | TDPREFIX= |
| transformed independents | TIPREFIX= |
| Output data set macros | |
| creates macro variables | MACRO |
| Control CLASS variables | |
| controls output of reference levels | REFERENCE= |
| Output data set details | |
| dependent and independent approximations | APPROXIMATIONS |
| canonical correlation coefficients | CCC |
| canonical elliptical point coordinates | CEC |
| canonical point coordinates | CPC |
| canonical quadratic point coordinates | CQC |
| approximations to transformed dependents | DAPPROXIMATIONS |
| approximations to transformed independents | IAPPROXIMATIONS |
| elliptical point coordinates | MEC |
| point coordinates | MPC |
| quadratic point coordinates | MQC |
| multiple regression coefficients | MRC |

For the coefficients partition, the COEFFICIENTS, COORDINATES, and MEANS o-options provide the coefficients that are appropriate for your model. For more explicit control of the coefficient partition, use the options that control details and prefixes.

The following list provides details on these options.

ADPREFIX= name

ADP= name

  • specifies a prefix for naming the dependent variable predicted values. The default is ADPREFIX= P when you specify the PREDICTED o-option ; otherwise, it is ADPREFIX= A . Specifying the ADPREFIX= o-option also implies the PREDICTED o-option , and the ADPREFIX= o-option is the same as the PPREFIX= o-option .

AIPREFIX= name

AIP= name

  • specifies a prefix for naming the independent variable approximations. The default is AIPREFIX= A . Specifying the AIPREFIX= o-option also implies the IAPPROXIMATIONS o-option .

APPROXIMATIONS

APPROX

APP

  • is equivalent to specifying both the DAPPROXIMATIONS and the IAPPROXIMATIONS o-options . If METHOD=UNIVARIATE, then the APPROXIMATIONS o-option implies only the DAPPROXIMATIONS o-option .

CANONICAL

CAN

  • outputs canonical variables to the OUT= data set. When METHOD=CANALS, the CANONICAL o-option is implied. The CDPREFIX= o-option specifies a prefix for naming the dependent canonical variables (default Cand ), and the CIPREFIX= o-option specifies a prefix for naming the independent canonical variables (default Cani ).

CCC

  • outputs canonical correlation coefficients to the OUT= data set.

CDPREFIX= name

CDP= name

  • provides a prefix for naming the canonical dependent variables. The default is CDPREFIX= Cand . Specifying the CDPREFIX= o-option also implies the CANONICAL o-option .

CEC

  • outputs canonical elliptical point model coordinates to the OUT= data set.

CILPREFIX= name

CIL= name

  • specifies a prefix for naming the conservative-individual-lower confidence limits. The default prefix is CIL. Specifying the CILPREFIX= o-option also implies the CLI o-option.

CIPREFIX= name

CIP= name

  • provides a prefix for naming the canonical independent variables. The default is CIPREFIX= Cani . Specifying the CIPREFIX= o-option also implies the CANONICAL o-option .

CIUPREFIX= name

CIU= name

  • specifies a prefix for naming the conservative-individual-upper confidence limits. The default prefix is CIU. Specifying the CIUPREFIX= o-option also implies the CLI o-option.

CLI

  • outputs individual confidence limits to the OUT= data set. The names of the confidence limits variables are constructed from the original dependent variable names and the prefixes specified in the following o-options : LILPREFIX= (default LIL for liberal individual lower), CILPREFIX= (default CIL for conservative individual lower), LIUPREFIX= (default LIU for liberal individual upper), and CIUPREFIX= (default CIU for conservative individual upper). When there are no monotonicity constraints, the liberal and conservative limits are the same.

CLM

  • outputs mean confidence limits to the OUT= data set. The names of the confidence limits variables are constructed from the original dependent variable names and the prefixes specified in the following o-options : LMLPREFIX= (default LML for liberal mean lower), CMLPREFIX= (default CML for conservative mean lower), LMUPREFIX= (default LMU for liberal mean upper), and CMUPREFIX= (default CMU for conservative mean upper). When there are no monotonicity constraints, the liberal and conservative limits are the same.
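
    The following sketch requests predicted values, residuals, and both kinds of confidence limits in one OUTPUT statement. The data set work.survey and the variables y and x are hypothetical; the transformations in the MODEL statement are only one possible choice.

    ```sas
    proc transreg data=work.survey;
       model identity(y) = spline(x);
       /* PREDICTED and RESIDUALS use the default prefixes P and R.      */
       /* CLI and CLM use the default prefixes LIL, CIL, LIU, CIU and    */
       /* LML, CML, LMU, CMU, each combined with the dependent name y.   */
       output out=work.scores predicted residuals cli clm;
    run;
    ```

    Because this model imposes no monotonicity constraints, the liberal and conservative limits in work.scores are expected to coincide, as described above.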

CMLPREFIX= name

CML= name

  • specifies a prefix for naming the conservative-mean-lower confidence limits. The default prefix is CML. Specifying the CMLPREFIX= o-option also implies the CLM o-option.

CMUPREFIX= name

CMU= name

  • specifies a prefix for naming the conservative-mean-upper confidence limits. The default prefix is CMU. Specifying the CMUPREFIX= o-option also implies the CLM o-option.

COEFFICIENTS

COE

  • outputs either multiple regression coefficients or raw canonical coefficients to the OUT= data set. If you specify METHOD=CANALS (in the MODEL or PROC TRANSREG statement), then the COEFFICIENTS o-option outputs the first n canonical variables, where n is the value of the NCAN= a-option (specified in the MODEL or PROC TRANSREG statement). Otherwise, the COEFFICIENTS o-option includes multiple regression coefficients in the OUT= data set. In addition, when you specify the CLASS expansion for any independent variable, the COEFFICIENTS o-option also outputs marginal means.

COORDINATES

COO

  • outputs either ideal point or vector model coordinates for preference mapping to the OUT= data set. When METHOD=CANALS, these coordinates are computed from canonical coefficients; otherwise, the coordinates are computed from multiple regression coefficients. For details, see the 'Point Models' section on page 4605.

CPC

  • outputs canonical point model coordinates to the OUT= data set.

CQC

  • outputs canonical quadratic point model coordinates to the OUT= data set.

DAPPROXIMATIONS

DAP

  • outputs the approximations of the transformed dependent variables to the OUT= data set. These are the target values for the optimal transformations. With METHOD=UNIVARIATE and METHOD=MORALS, the dependent variable approximations are the ordinary predicted values from the linear model. The names of the approximation variables are constructed from the ADPREFIX= o-option (default A ) and the original dependent variable names. For ordinary predicted values, use the PREDICTED o-option instead of the DAPPROXIMATIONS o-option , since the PREDICTED o-option uses a more relevant prefix (' P ' instead of ' A ') and a more relevant variable label suffix ('Predicted Values' instead of 'Approximations').

DESIGN < = n >

DES < = n >

  • specifies that your primary goal is design matrix coding, not analysis. Specifying the DESIGN o-option makes the procedure run faster. The DESIGN o-option sets the default method to UNIVARIATE and the default MAXITER= value to zero. It suppresses computing the regression coefficients, unless they are needed for some other option. Furthermore, when the DESIGN o-option is specified, the MODEL statement is not required to have an equal sign. When no MODEL statement equal sign is specified, all variables are considered independent variables, all options that require dependent variables are ignored, and the IREPLACE o-option is implied.

    You can use DESIGN= n for coding very large data sets, where n is the number of observations to code at one time. For example, to code a data set with a large number of observations, you can specify DESIGN=100 or DESIGN=1000 to process the data set in blocks of 100 or 1000 observations. If you specify the DESIGN o-option rather than DESIGN= n, PROC TRANSREG tries to process all observations at once, which will not work with very large data sets. Specify the NOZEROCONSTANT a-option with DESIGN= n to ensure that constant variables within blocks are not zeroed. See the section 'Using the DESIGN Output Option' on page 4654 and the section 'Discrete Choice Experiments: DESIGN, NORESTORE, NOZERO' on page 4660.
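
    A minimal sketch of block-wise design matrix coding follows. The data set work.choicedesign and the CLASS variables brand and price are hypothetical names used only for illustration.

    ```sas
    /* NOZEROCONSTANT is an a-option, so it is given in the PROC statement. */
    proc transreg data=work.choicedesign nozeroconstant;
       /* No equal sign: with DESIGN, all variables are treated as independent. */
       model class(brand price);
       /* Code the data in blocks of 1000 observations. */
       output out=work.coded design=1000;
    run;
    ```

    The block size of 1000 is an arbitrary choice here; a suitable value depends on the size of the data set and the available memory.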

DEPENDENT= name

DEP= name

  • specifies the untransformed dependent variable for OUT= data sets with METHOD=MORALS when there is more than one dependent variable. The default is DEPENDENT= _DEPEND_ .

DREPLACE

DRE

  • replaces the original dependent variables with the transformed dependent variables in the OUT= data set. The names of the transformed variables in the OUT= data set correspond to the names of the original dependent variables in the input data set. By default, both the original dependent variables and transformed dependent variables (with names constructed from the TDPREFIX= (default T ) o-option and the original dependent variable names) are included in the OUT= data set.

IAPPROXIMATIONS

IAP

  • outputs the approximations of the transformed independent variables to the OUT= data set. These are the target values for the optimal transformations. The names of the approximation variables are constructed from the AIPREFIX= o-option (default A ) and the original independent variable names. Specifying the AIPREFIX= o-option also implies the IAPPROXIMATIONS o-option . The IAPPROXIMATIONS o-option is not valid when METHOD=UNIVARIATE.

IREPLACE

IRE

  • replaces the original independent variables with the transformed independent variables in the OUT= data set. The names of the transformed variables in the OUT= data set correspond to the names of the original independent variables in the input data set. By default, both the original independent variables and transformed independent variables (with names constructed from the TIPREFIX= o-option (default T ) and the original independent variable names) are included in the OUT= data set.

LEVERAGE < = name >

LEV < = name >

  • creates a variable with the specified name in the OUT= data set that contains leverages . Specifying the LEVERAGE o-option is equivalent to specifying LEVERAGE= Leverage .

LILPREFIX= name

LIL= name

  • specifies a prefix for naming the liberal-individual-lower confidence limits. The default prefix is LIL. Specifying the LILPREFIX= o-option also implies the CLI o-option.

LIUPREFIX= name

LIU= name

  • specifies a prefix for naming the liberal-individual-upper confidence limits. The default prefix is LIU. Specifying the LIUPREFIX= o-option also implies the CLI o-option.

LMLPREFIX= name

LML= name

  • specifies a prefix for naming the liberal-mean-lower confidence limits. The default prefix is LML . Specifying the LMLPREFIX= o-option also implies the CLM o-option .

LMUPREFIX= name

LMU= name

  • specifies a prefix for naming the liberal-mean-upper confidence limits. The default prefix is LMU. Specifying the LMUPREFIX= o-option also implies the CLM o-option.

MACRO( keyword=name )

MAC( keyword=name )

  • creates macro variables. Most of the options available within the MACRO o-option are rarely needed. By default, the TRANSREG procedure creates a macro variable named _TRGIND with a complete list of independent variables created by the procedure. When the TRANSREG procedure is being used for design matrix creation prior to running a procedure without a CLASS statement, this macro provides a convenient way to use the results from PROC TRANSREG. For example, a PROC LOGISTIC step that uses a design matrix coded by PROC TRANSREG could use the following MODEL statement:

      model y=&_trgind;  

    The TRANSREG procedure, also by default, creates a macro variable named _TRGINDN , which contains the number of variables in the _TRGIND list. This macro variable could be used in an ARRAY statement as follows :

      array indvars[&_trgindn] &_trgind;  

    See the section 'Using the DESIGN Output Option' on page 4654 and the section 'Discrete Choice Experiments: DESIGN, NORESTORE, NOZERO' on page 4660 for examples of using the default macro variables.

    The available keywords are as follows.

    DN= name

    specifies the name of a macro variable that contains the number of dependent variables. By default, a macro variable named _TRGDEPN is created. This is the number of variables in the DL= list and the number of macro variables created by the DV= and DE= specifications.

    IN= name

    specifies the name of a macro variable that contains the number of independent variables. By default, a macro variable named _TRGINDN is created. This is the number of variables in the IL= list and the number of macro variables created by the IV= and IE= specifications.

    DL= name

    specifies the name of a macro variable that contains the list of the dependent variables. By default, a macro variable named _TRGDEP is created. These are the variable names of the final transformed variables in the OUT= data set. For example, if there are three dependent variables, Y1-Y3, then _TRGDEP contains, by default, TY1 TY2 TY3 (or Y1 Y2 Y3 if you specify the REPLACE o-option).

    IL= name

    specifies the name of a macro variable that contains the list of the independent variables. By default, a macro variable named _TRGIND is created. These are the variable names of the final transformed variables in the OUT= data set. For example, if there are three independent variables, X1-X3, then _TRGIND contains, by default, TX1 TX2 TX3 (or X1 X2 X3 if you specify the REPLACE o-option).

    DV= prefix

    specifies a prefix for creating a list of macro variables, each of which contains one dependent variable name. For example, if there are three dependent variables, Y1-Y3, and you specify MACRO(DV= DEP ), then three macro variables, DEP1, DEP2, and DEP3, are created, containing TY1, TY2, and TY3, respectively (or Y1, Y2, and Y3 if you specify the REPLACE o-option). By default, no list is created.

    IV= prefix

    specifies a prefix for creating a list of macro variables, each of which contains one independent variable name. For example, if there are three independent variables, X1-X3, and you specify MACRO(IV= IND ), then three macro variables, IND1, IND2, and IND3, are created, containing TX1, TX2, and TX3, respectively (or X1, X2, and X3 if you specify the REPLACE o-option). By default, no list is created.

    DE= prefix

    specifies a prefix for creating a list of macro variables, each of which contains one dependent variable effect. This list shows the origin of each model term. Each effect consists of two or more parts, and each part consists of a value in 32 columns followed by a blank. For example, if you specify MACRO(DE= D ), then a macro variable D1 is created for IDENTITY( Y ). The D1 macro variable is shown next.

      4                                  TY   IDENTITY                           Y  

    The first part is the number of parts (4), the second part is the transformed variable name, the third part is the transformation, and the last part is the input variable name. By default, no list is created.

    IE= prefix

    specifies a prefix for creating a list of macro variables, each of which contains one independent variable effect. This list shows the origin of each model term. Each effect consists of two or more parts, and each part consists of a value in 32 columns followed by a blank. For example, if you specify MACRO(IE= I ), then three macro variables, I1, I2, and I3, are created for CLASS( X1 X2 ) when both X1 and X2 have values of 1 and 2. These macro variables are shown next, one per line, but with extra white space removed.

      5      Tx11      CLASS      x1     1
      5      Tx21      CLASS      x2     1
      8      Tx11x21   CLASS      x1     1      CLASS      x2      1

    For CLASS variables, the formatted level appears after the variable name. The first two effects are the main effects, and the last is the interaction term. By default, no list is created.
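
    The default macro variables can be used end to end as in the following sketch. The data set work.trial, the binary response y, and the CLASS variables treatment and site are hypothetical; the ID statement is used here on the assumption that y must be carried into the OUT= data set because it does not appear in the MODEL statement.

    ```sas
    /* Step 1: code the design matrix only; no analysis is requested. */
    proc transreg data=work.trial design;
       model class(treatment site);
       id y;                          /* copy the response to the OUT= data set */
       output out=work.coded;
    run;

    /* Step 2: inspect the macro variables that PROC TRANSREG created. */
    %put NOTE: &_trgindn coded variables: &_trgind;

    /* Step 3: use the coded variables in a procedure run without a CLASS statement. */
    proc logistic data=work.coded;
       model y = &_trgind;
    run;
    ```

    The same &_trgind list could be passed to any other modeling procedure in place of PROC LOGISTIC.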

MEANS

MEA

  • outputs marginal means for CLASS variable expansions to the OUT= data set.

MEC

  • outputs multiple regression elliptical point model coordinates to the OUT= data set.

MPC

  • outputs multiple regression point model coordinates to the OUT= data set.

MQC

  • outputs multiple regression quadratic point model coordinates to the OUT= data set.

MRC

  • outputs multiple regression coefficients to the OUT= data set.

MREDUNDANCY

MRE

  • outputs multiple redundancy analysis coefficients to the OUT= data set.

NORESTOREMISSING

NORESTORE

NOR

  • specifies that missing values should not be restored when the OUT= data set is created. By default, the coded CLASS variable contains a row of missing values for observations in which the CLASS variable is missing. When you specify the NORESTOREMISSING o-option , these observations contain a row of zeros instead. This is useful when the TRANSREG procedure is used to code experimental designs for discrete choice models and there is a constant alternative indicated by a missing value.

NOSCORES

NOS

  • excludes original variables, transformed variables, predicted values, residuals, and scores from the OUT= data set. You can use the NOSCORES o-option with various other options to create an OUT= data set that contains only a coefficient partition (for example, a data set consisting entirely of coefficients and coordinates).
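
    As a sketch of a coefficients-only output data set, the following step combines the COEFFICIENTS and NOSCORES o-options. The data set and variable names are hypothetical.

    ```sas
    proc transreg data=work.survey;
       model identity(y) = class(x1 x2);
       /* Keep the coefficient partition and drop the score partition. */
       output out=work.coefs coefficients noscores;
    run;

    /* List the result; _TYPE_ and _NAME_ identify each observation. */
    proc print data=work.coefs;
    run;
    ```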

PREDICTED

PRE

P

  • outputs predicted values, which for METHOD=UNIVARIATE and METHOD=MORALS are the ordinary predicted values from the linear model, to the OUT= data set. The names of the predicted values' variables are constructed from the PPREFIX= o-option (default P ) and the original dependent variable names. Specifying the PPREFIX= o-option also implies the PREDICTED o-option .

PPREFIX= name

PDPREFIX= name

PDP= name

  • specifies a prefix for naming the dependent variable predicted values. The default is PPREFIX= P when you specify the PREDICTED o-option; otherwise, it is PPREFIX= A. Specifying the PPREFIX= o-option also implies the PREDICTED o-option, and the PPREFIX= o-option is the same as the ADPREFIX= o-option.

RDPREFIX= name

RDP= name

  • specifies a prefix for naming the residual (dependent) variables in the OUT= data set. The default is RDPREFIX= R. Specifying the RDPREFIX= o-option also implies the RESIDUALS o-option.

REDUNDANCY < =STANDARDIZE | UNSTANDARDIZE >

RED < =STA | UNS >

  • outputs redundancy variables to the OUT= data set, either standardized or unstandardized. Specifying the REDUNDANCY o-option is the same as specifying REDUNDANCY=STANDARDIZE. The results of the REDUNDANCY o-option depend on the TSTANDARD= option. You must specify TSTANDARD=Z to get results based on standardized data. The TSTANDARD= option controls how the data that go into the redundancy analysis are scaled, and REDUNDANCY=STANDARDIZE | UNSTANDARDIZE controls how the redundancy variables are scaled. The REDUNDANCY o-option is implied by METHOD=REDUNDANCY. The RPREFIX= o-option specifies a prefix (default Red ) for naming the redundancy variables.

REFERENCE=NONE | MISSING | ZERO

REF=NON | MIS | ZER

  • specifies how reference levels of CLASS variables are to be treated. The options are REFERENCE=NONE, the default, in which reference levels are suppressed; REFERENCE=MISSING, in which reference levels are displayed and output with missing values; and REFERENCE=ZERO, in which reference levels are displayed and output with zeros. The REFERENCE= option can be specified in the PROC TRANSREG, MODEL, or OUTPUT statement, and it can be independently specified for the OUT= data set and the displayed output. When you specify it in only one statement, it sets the option for both the displayed output and the OUT= data set.
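
    The following sketch specifies REFERENCE= in the OUTPUT statement only, so, per the description above, the setting should apply to both the displayed output and the OUT= data set. The data set and variable names are hypothetical.

    ```sas
    proc transreg data=work.survey;
       model identity(y) = class(x1 x2);
       /* Reference levels are displayed and output with zeros instead of being suppressed. */
       output out=work.withref coefficients reference=zero;
    run;
    ```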

REPLACE

REP

  • is equivalent to specifying both the DREPLACE and the IREPLACE o-options .

RESIDUALS

RES

R

  • outputs the differences between the transformed dependent variables and their predicted values. The names of the residual variables are constructed from the RDPREFIX= o-option (default R ) and the original dependent variable names.

RPREFIX= name

RPR= name

  • provides a prefix for naming the redundancy variables. The default is RPREFIX= Red . Specifying the RPREFIX= o-option also implies the REDUNDANCY o-option .

TDPREFIX= name

TDP= name

  • specifies a prefix for naming the transformed dependent variables. By default, TDPREFIX= T . The TDPREFIX= o-option is ignored when you specify the DREPLACE o-option .

TIPREFIX= name

TIP= name

  • specifies a prefix for naming the transformed independent variables. By default, TIPREFIX= T . The TIPREFIX= o-option is ignored when you specify the IREPLACE o-option .

WEIGHT Statement

  • WEIGHT variable ;

When you use a WEIGHT statement, a weighted residual sum of squares is minimized. The WEIGHT statement has no effect on degrees of freedom or number of observations, but the weights affect most other calculations. An observation is used in the analysis only if the value of the WEIGHT statement variable is greater than 0.
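
As a minimal sketch, the following step fits a weighted model. The data set work.survey, the weight variable w, and the model variables are hypothetical names used for illustration.

```sas
proc transreg data=work.survey;
   model identity(y) = identity(x1 x2);
   /* Observations with w <= 0 are excluded from the analysis. */
   weight w;
   output out=work.weighted predicted;
run;
```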




SAS/STAT 9.1 Users Guide, Volumes 1-7
ISBN: 1590472438
Year: 2004
Pages: 132
