Syntax


The following statements are available in PROC MIXED.

  • PROC MIXED < options > ;

    • BY variables ;

    • CLASS variables ;

    • ID variables ;

    • MODEL dependent = < fixed-effects >< / options > ;

    • RANDOM random-effects < / options > ;

    • REPEATED < repeated-effect >< / options > ;

    • PARMS ( value-list ) ...< / options > ;

    • PRIOR < distribution >< / options > ;

    • CONTRAST label < fixed-effect values ... >
          < | random-effect values ... > , ... < / options > ;

    • ESTIMATE label < fixed-effect values ... >
          < | random-effect values ... > < / options > ;

    • LSMEANS fixed-effects < / options > ;

    • WEIGHT variable ;

Items within angle brackets (< >) are optional. The CONTRAST, ESTIMATE, LSMEANS, and RANDOM statements can appear multiple times; all other statements can appear only once.

The PROC MIXED and MODEL statements are required, and the MODEL statement must appear after the CLASS statement if a CLASS statement is included. The CONTRAST, ESTIMATE, LSMEANS, RANDOM, and REPEATED statements must follow the MODEL statement. The CONTRAST and ESTIMATE statements must also follow any RANDOM statements.
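As a minimal sketch of these ordering rules, the following step lists the statements in an order that satisfies them. The data set work.trial and the variables y, trt, and block are hypothetical names chosen for illustration; only the statement order is the point.

```sas
/* Hypothetical data set and variables; illustrates statement order only */
proc mixed data=work.trial;
   class block trt;                         /* CLASS, if used, precedes MODEL   */
   model y = trt / s;                       /* MODEL is required                */
   random block;                            /* RANDOM follows MODEL             */
   contrast 'trt 1 vs 2' trt 1 -1;          /* follows MODEL and any RANDOM     */
   estimate 'trt 1 - trt 2' trt 1 -1 / cl;  /* follows MODEL and any RANDOM     */
   lsmeans trt / diff;                      /* LSMEANS follows MODEL            */
run;
```

If trt has more than two levels, the unspecified coefficients in the CONTRAST and ESTIMATE statements are set to 0.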

Table 46.1 summarizes the basic functions and important options of each PROC MIXED statement. The syntax of each statement in Table 46.1 is described in the following sections in alphabetical order after the description of the PROC MIXED statement.

Table 46.1: Summary of PROC MIXED Statements

Statement | Description | Important Options
PROC MIXED | invokes the procedure | DATA= specifies input data set, METHOD= specifies estimation method
BY | performs multiple PROC MIXED analyses in one invocation | none
CLASS | declares qualitative variables that create indicator variables in design matrices | none
ID | lists additional variables to be included in predicted values tables | none
MODEL | specifies dependent variable and fixed effects, setting up X | S requests solution for fixed-effects parameters, DDFM= specifies denominator degrees of freedom method, OUTP= outputs predicted values to a data set, INFLUENCE computes influence diagnostics
RANDOM | specifies random effects, setting up Z and G | SUBJECT= creates block-diagonality, TYPE= specifies covariance structure, S requests solution for random-effects parameters, G displays estimated G
REPEATED | sets up R | SUBJECT= creates block-diagonality, TYPE= specifies covariance structure, R displays estimated blocks of R, GROUP= enables between-subject heterogeneity, LOCAL adds a diagonal matrix to R
PARMS | specifies a grid of initial values for the covariance parameters | HOLD= and NOITER hold the covariance parameters or their ratios constant, PDATA= reads the initial values from a SAS data set
PRIOR | performs a sampling-based Bayesian analysis for variance component models | NSAMPLE= specifies the sample size, SEED= specifies the starting seed
CONTRAST | constructs custom hypothesis tests | E displays the L matrix coefficients
ESTIMATE | constructs custom scalar estimates | CL produces confidence limits
LSMEANS | computes least squares means for classification fixed effects | DIFF computes differences of the least squares means, ADJUST= performs multiple comparisons adjustments, AT changes covariates, OM changes weighting, CL produces confidence limits, SLICE= tests simple effects
WEIGHT | specifies a variable by which to weight R | none

PROC MIXED Statement

  • PROC MIXED < options > ;

The PROC MIXED statement invokes the procedure. You can specify the following options.

ABSOLUTE

  • makes the convergence criterion absolute. By default, it is relative (divided by the current objective function value). See the CONVF, CONVG, and CONVH options in this section for a description of various convergence criteria.

ALPHA= number

  • requests that confidence limits be constructed for the covariance parameter estimates with confidence level 1 − number. The value of number must be between 0 and 1; the default is 0.05.

ASYCORR

  • produces the asymptotic correlation matrix of the covariance parameter estimates. It is computed from the corresponding asymptotic covariance matrix (see the description of the ASYCOV option, which follows). For ODS purposes, the label of the Asymptotic Correlation table is AsyCorr.

ASYCOV

  • requests that the asymptotic covariance matrix of the covariance parameters be displayed. By default, this matrix is the observed inverse Fisher information matrix, which equals $2H^{-1}$, where $H$ is the Hessian (second derivative) matrix of the objective function. See the Covariance Parameter Estimates section on page 2750 for more information about this matrix. When you use the SCORING= option and PROC MIXED converges without stopping the scoring algorithm, PROC MIXED uses the expected Hessian matrix to compute the covariance matrix instead of the observed Hessian. For ODS purposes, the label of the Asymptotic Covariance table is AsyCov.

CL < =WALD >

  • requests confidence limits for the covariance parameter estimates. A Satterthwaite approximation is used to construct limits for all parameters that have a lower boundary constraint of zero. These limits take the form

    $$\frac{\nu\hat{\sigma}^2}{\chi^2_{\nu,\,1-\alpha/2}} \;\le\; \sigma^2 \;\le\; \frac{\nu\hat{\sigma}^2}{\chi^2_{\nu,\,\alpha/2}}$$

  • where $\nu = 2Z^2$, $Z$ is the Wald statistic $\hat{\sigma}^2/\mathrm{se}(\hat{\sigma}^2)$, and the denominators are quantiles of the $\chi^2$-distribution with $\nu$ degrees of freedom. Refer to Milliken and Johnson (1992) and Burdick and Graybill (1992) for similar techniques.

  • For all other parameters, Wald Z-scores and normal quantiles are used to construct the limits. Wald limits are also provided for variance components if you specify the NOBOUND option. The optional =WALD specification requests Wald limits for all parameters.

  • The confidence limits are displayed as extra columns in the Covariance Parameter Estimates table. The confidence level is $1 - \alpha = 0.95$ by default; this can be changed with the ALPHA= option.
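The following sketch shows one way the CL, ALPHA=, and COVTEST options might be combined to obtain 90% confidence limits alongside Wald tests for the covariance parameters. The data set work.trial and the variables y, trt, and block are hypothetical.

```sas
/* Hypothetical data set and variables */
proc mixed data=work.trial covtest cl alpha=0.1;   /* 1 - 0.1 = 90% limits */
   class block trt;
   model y = trt;
   random block;        /* variance component with a lower bound of zero */
run;
```

Replacing cl with cl=wald in this step would request Wald limits for all covariance parameters, including the variance components.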

CONVF < = number >

  • requests the relative function convergence criterion with tolerance number. The relative function convergence criterion is

    $$\frac{|f_k - f_{k-1}|}{|f_k|} \le \text{number}$$

  • where $f_k$ is the value of the objective function at iteration $k$. To prevent the division by $|f_k|$, use the ABSOLUTE option. The default convergence criterion is CONVH, and the default tolerance is 1E−8.

CONVG < = number >

  • requests the relative gradient convergence criterion with tolerance number. The relative gradient convergence criterion is

    $$\frac{\max_j |g_{jk}|}{|f_k|} \le \text{number}$$

  • where $f_k$ is the value of the objective function, and $g_{jk}$ is the $j$th element of the gradient (first derivative) of the objective function, both at iteration $k$. To prevent division by $|f_k|$, use the ABSOLUTE option. The default convergence criterion is CONVH, and the default tolerance is 1E−8.

CONVH < = number >

  • requests the relative Hessian convergence criterion with tolerance number. The relative Hessian convergence criterion is

    $$\frac{g_k' H_k^{-1} g_k}{|f_k|} \le \text{number}$$

  • where $f_k$ is the value of the objective function, $g_k$ is the gradient (first derivative) of the objective function, and $H_k$ is the Hessian (second derivative) of the objective function, all at iteration $k$.

  • If $H_k$ is singular, then PROC MIXED uses the following relative criterion:

    $$\frac{g_k' g_k}{|f_k|} \le \text{number}$$

  • To prevent the division by $|f_k|$, use the ABSOLUTE option. The default convergence criterion is CONVH, and the default tolerance is 1E−8.
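As an illustration of how a convergence criterion might be selected, the step below requests the gradient criterion with a looser tolerance and makes it absolute rather than relative. The data set and variable names are hypothetical, and the tolerance 1E-6 is an arbitrary choice for the sketch.

```sas
/* Hypothetical data set and variables; tolerance chosen arbitrarily */
proc mixed data=work.trial convg=1e-6 absolute;
   class block trt;
   model y = trt;
   random block;
run;
```

Omitting ABSOLUTE would leave the criterion divided by the current objective function value, which is the default behavior.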

COVTEST

  • produces asymptotic standard errors and Wald Z-tests for the covariance parameter estimates.

DATA= SAS-data-set

  • names the SAS data set to be used by PROC MIXED. The default is the most recently created data set.

DFBW

  • has the same effect as the DDFM=BW option in the MODEL statement.

EMPIRICAL

  • computes the estimated variance-covariance matrix of the fixed-effects parameters by using the asymptotically consistent estimator described in Huber (1967), White (1980), Liang and Zeger (1986), and Diggle, Liang, and Zeger (1994). This estimator is commonly referred to as the sandwich estimator, and it is computed as follows:

    $$(X'\hat{V}^{-1}X)^{-}\left(\sum_{i=1}^{S} X_i'\hat{V}_i^{-1}\hat{\epsilon}_i\hat{\epsilon}_i'\hat{V}_i^{-1}X_i\right)(X'\hat{V}^{-1}X)^{-}$$

  • Here, $\hat{\epsilon}_i = y_i - X_i\hat{\beta}$, $S$ is the number of subjects, and matrices with an $i$ subscript are those for the $i$th subject. You must include the SUBJECT= option in either a RANDOM or REPEATED statement for this option to take effect.

  • When you specify the EMPIRICAL option, PROC MIXED adjusts all standard errors and test statistics involving the fixed-effects parameters. This changes output in the following tables (listed in Table 46.8 on page 2752): Contrast, CorrB, CovB, Diffs, Estimates, InvCovB, LSMeans, MMEq, MMEqSol, Slices, SolutionF, and Tests1 through Tests3. The OUTP= and OUTPM= data sets are also affected. Finally, the Satterthwaite and Kenward-Roger degrees of freedom methods are not available if you specify EMPIRICAL.

IC

  • displays a table of various information criteria. The criteria are all in smaller-is-better form, and are described in Table 46.2.

    Table 46.2: Information Criteria

    Criterion | Formula | Reference
    AIC | $-2\ell + 2d$ | Akaike (1974)
    AICC | $-2\ell + 2dn^*/(n^* - d - 1)$ | Hurvich and Tsai (1989); Burnham and Anderson (1998)
    HQIC | $-2\ell + 2d \log\log n$ | Hannan and Quinn (1979)
    BIC | $-2\ell + d \log n$ | Schwarz (1978)
    CAIC | $-2\ell + d(\log n + 1)$ | Bozdogan (1987)

  • Here $\ell$ denotes the maximum value of the (possibly restricted) log likelihood, $d$ the dimension of the model, and $n$ the number of observations. In Version 6 of SAS/STAT software, $n$ equals the number of valid observations for maximum likelihood estimation and $n - p$ for restricted maximum likelihood estimation, where $p$ equals the rank of $X$. In later versions, $n$ equals the number of effective subjects as displayed in the Dimensions table, unless this value equals 1, in which case $n$ equals the number of levels of the first RANDOM effect you specify. If the number of effective subjects equals 1 and you have no RANDOM statements, then $n$ reverts to the Version 6 values. For AICC (a finite-sample corrected version of AIC), $n^*$ equals the Version 6 values of $n$, unless this number is less than $d + 2$, in which case it equals $d + 2$.

  • For restricted likelihood estimation, $d$ equals $q$, the effective number of estimated covariance parameters. In Version 6, when a parameter estimate lies on a boundary constraint, then it is still included in the calculation of $d$, but in later versions it is not. The most common example of this behavior is when a variance component is estimated to equal zero. For maximum likelihood estimation, $d$ equals $q + p$.

  • For ODS purposes, the name of the Information Criteria table is InfoCrit.
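To make the formulas in Table 46.2 concrete, here is a small worked calculation under assumed values that do not come from any real fit: a maximized log likelihood with $-2\ell = 100$, model dimension $d = 3$, and $n = n^* = 20$. It also assumes that log denotes the natural logarithm.

```latex
\begin{align*}
\mathrm{AIC}  &= 100 + 2(3) = 106 \\
\mathrm{AICC} &= 100 + \frac{2(3)(20)}{20 - 3 - 1} = 100 + 7.5 = 107.5 \\
\mathrm{HQIC} &= 100 + 2(3)\log(\log 20) \approx 100 + 6.58 = 106.58 \\
\mathrm{BIC}  &= 100 + 3\log 20 \approx 100 + 8.99 = 108.99 \\
\mathrm{CAIC} &= 100 + 3(\log 20 + 1) \approx 100 + 11.99 = 111.99
\end{align*}
```

Because all five criteria are in smaller-is-better form and share the same $-2\ell$ term, they differ only in how heavily they penalize the dimension $d$.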

INFO

  • is a default option. The creation of the Model Information, Dimensions, and Number of Observations tables can be suppressed using the NOINFO option.

  • Note that, in Version 6, this option displays the Model Information and Dimensions tables.

ITDETAILS

  • displays the parameter values at each iteration and enables the writing of notes to the SAS log pertaining to infinite likelihood and singularities during Newton-Raphson iterations.

LOGNOTE

  • writes periodic notes to the log describing the current status of computations. It is designed for use with analyses requiring extensive CPU resources.

MAXFUNC= number

  • specifies the maximum number of likelihood evaluations in the optimization process. The default is 150.

MAXITER= number

  • specifies the maximum number of iterations. The default is 50.

METHOD=REML

METHOD=ML

METHOD=MIVQUE0

METHOD=TYPE1

METHOD=TYPE2

METHOD=TYPE3

MMEQ

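  • requests that coefficients of the mixed model equations be displayed. These are

    $$\begin{bmatrix} X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z \\ Z'\hat{R}^{-1}X & Z'\hat{R}^{-1}Z + \hat{G}^{-1} \end{bmatrix}, \qquad \begin{bmatrix} X'\hat{R}^{-1}y \\ Z'\hat{R}^{-1}y \end{bmatrix}$$

  • assuming that $\hat{G}$ is nonsingular. If $\hat{G}$ is singular, PROC MIXED produces the following coefficients

    $$\begin{bmatrix} X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z\hat{G} \\ \hat{G}Z'\hat{R}^{-1}X & \hat{G}Z'\hat{R}^{-1}Z\hat{G} + \hat{G} \end{bmatrix}, \qquad \begin{bmatrix} X'\hat{R}^{-1}y \\ \hat{G}Z'\hat{R}^{-1}y \end{bmatrix}$$

  • See the Estimating β and γ in the Mixed Model section on page 2739 for further information on these equations.

MMEQSOL

  • requests that a solution to the mixed model equations be produced, as well as the inverted coefficients matrix. Formulas for these equations are provided in the preceding description of the MMEQ option.

  • When $\hat{G}$ is singular, the random-effects solution $\hat{\tau}$ of the modified equations and a generalized inverse of the left-hand-side coefficient matrix are transformed using $\hat{G}$ to produce $\hat{\gamma}$ and $\hat{C}$, respectively, where $\hat{C}$ is a generalized inverse of the left-hand-side coefficient matrix of the original equations.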

NAMELEN < = number >

  • specifies the length to which long effect names are shortened. The default and minimum value is 20.

NOBOUND

  • has the same effect as the NOBOUND option in the PARMS statement (see page 2707).

NOCLPRINT < = number >

  • suppresses the display of the Class Level Information table if you do not specify number . If you do specify number , only levels with totals that are less than number are listed in the table.

NOINFO

  • suppresses the display of the Model Information, Dimensions, and Number of Observations tables.

NOITPRINT

  • suppresses the display of the Iteration History table.

NOPROFILE

  • includes the residual variance as part of the Newton-Raphson iterations. This option applies only to models that have a residual variance parameter. By default, this parameter is profiled out of the likelihood calculations, except when you have specified the HOLD= or NOITER option in the PARMS statement.

ORD

  • displays ordinates of the relevant distribution in addition to p-values. The ordinate can be viewed as an approximate odds ratio of hypothesis probabilities.

ORDER=DATA

ORDER=FORMATTED

ORDER=FREQ

ORDER=INTERNAL

  • specifies the sorting order for the levels of all CLASS variables. This ordering determines which parameters in the model correspond to each level in the data, so the ORDER= option may be useful when you use CONTRAST or ESTIMATE statements.

  • The default is ORDER=FORMATTED, and its behavior has been modified for Version 8. When the default ORDER=FORMATTED is in effect for numeric variables for which you have supplied no explicit format, the levels are ordered by their internal values. In releases previous to Version 8, numeric class levels with no explicit format were ordered by their BEST12. formatted values. In order to revert to the previous method you can specify this format explicitly for the CLASS variables. The change was implemented because the former default behavior for ORDER=FORMATTED often resulted in levels not being ordered numerically and required you to use an explicit format or ORDER=INTERNAL to get the more natural ordering.

  • The following table shows how PROC MIXED interprets values of the ORDER= option.

Value of ORDER= | Levels Sorted By
DATA | order of appearance in the input data set
FORMATTED | external formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ | descending frequency count; levels with the most observations come first in the order
INTERNAL | unformatted value

For FORMATTED and INTERNAL, the sort order is machine dependent. For more information on sorting order, see the chapter on the SORT procedure in the SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.

RATIO

  • produces the ratio of the covariance parameter estimates to the estimate of the residual variance when the latter exists in the model.

RIDGE= number

  • specifies the starting value for the minimum ridge value used in the Newton-Raphson algorithm. The default is 0.3125.

SCORING < = number >

  • requests that Fisher scoring be used in association with the estimation method up to iteration number , which is 0 by default. When you use the SCORING= option and PROC MIXED converges without stopping the scoring algorithm, PROC MIXED uses the expected Hessian matrix to compute approximate standard errors for the covariance parameters instead of the observed Hessian. The output from the ASYCOV and ASYCORR options is similarly adjusted.
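A minimal sketch of the SCORING= option follows, again with a hypothetical data set and variables. It requests Fisher scoring up to iteration 5 and displays the asymptotic covariance and correlation matrices, which the description above says are adjusted when scoring is not stopped before convergence.

```sas
/* Hypothetical data set and variables; scoring=5 is an arbitrary choice */
proc mixed data=work.trial scoring=5 asycov asycorr;
   class block trt;
   model y = trt;
   random block;
run;
```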

SIGITER

  • is an alias for the NOPROFILE option.

UPDATE

  • is an alias for the LOGNOTE option.

BY Statement

  • BY variables ;

You can specify a BY statement with PROC MIXED to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. The variables are one or more variables in the input data set.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement options NOTSORTED or DESCENDING in the BY statement for the MIXED procedure. The NOTSORTED option does not mean that the data are unsorted but rather means that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure (in base SAS software).

Since sorting the data changes the order in which PROC MIXED reads observations, the sorting order for the levels of the CLASS variable may be affected if you have specified ORDER=DATA in the PROC MIXED statement. This, in turn, affects specifications in the CONTRAST statements.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
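The first alternative above (sorting with PROC SORT and then using a matching BY statement) can be sketched as follows. The data set work.trial, the BY variable site, and the model variables are hypothetical.

```sas
/* Hypothetical data set and variables */
proc sort data=work.trial out=work.trial_sorted;
   by site;                      /* same BY variables as in PROC MIXED */
run;

proc mixed data=work.trial_sorted;
   by site;                      /* one separate analysis per site */
   class block trt;
   model y = trt;
   random block;
run;
```

If the observations are already grouped by site but the groups are not in ascending order, writing by site notsorted; in the PROC MIXED step is the second alternative and avoids the sort.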

CLASS Statement

  • CLASS variables ;

The CLASS statement names the classification variables to be used in the analysis. If the CLASS statement is used, it must appear before the MODEL statement.

Classification variables can be either character or numeric. By default, class levels are determined from the entire formatted values of the CLASS variables. Note that this represents a slight change from previous releases in the way in which class levels are determined. In releases prior to Version 9, class levels were determined using no more than the first 16 characters of the formatted values. If you wish to revert to this previous behavior, you can use the TRUNCATE option in the CLASS statement. In any case, you can use formats to group values into levels. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide and to the discussions of the FORMAT statement and SAS formats in SAS Language Reference: Dictionary. You can adjust the order of CLASS variable levels with the ORDER= option in the PROC MIXED statement.

You can specify the following option in the CLASS statement after a slash (/).

TRUNCATE

  • specifies that class levels should be determined using no more than the first 16 characters of the formatted values of CLASS variables. When formatted values are longer than 16 characters, you can use this option to revert to the levels as determined in releases previous to Version 9.

CONTRAST Statement

  • CONTRAST label < fixed-effect values ... >
        < | random-effect values ... > , ... < / options > ;

The CONTRAST statement provides a mechanism for obtaining custom hypothesis tests. It is patterned after the CONTRAST statement in PROC GLM, although it has been extended to include random effects. This enables you to select an appropriate inference space (McLean, Sanders, and Stroup 1991).

You can test the hypothesis $L'\phi = 0$, where $L' = (K' \;\; M')$ and $\phi' = (\beta' \;\; \gamma')$, in several inference spaces. The inference space corresponds to the choice of $M$. When $M = 0$, your inferences apply to the entire population from which the random effects are sampled; this is known as the broad inference space. When all elements of $M$ are nonzero, your inferences apply only to the observed levels of the random effects. This is known as the narrow inference space, and you can also choose it by specifying all of the random effects as fixed. The GLM procedure uses the narrow inference space. Finally, by zeroing portions of $M$ corresponding to selected main effects and interactions, you can choose intermediate inference spaces. The broad inference space is usually the most appropriate, and it is used when you do not specify any random effects in the CONTRAST statement.

In the CONTRAST statement,

label

identifies the contrast in the table. A label is required for every contrast specified. Labels can be up to 20 characters and must be enclosed in single quotes.

fixed-effect

identifies an effect that appears in the MODEL statement. The keyword INTERCEPT can be used as an effect when an intercept is fitted in the model. You do not need to include all effects that are in the MODEL statement.

random-effect

identifies an effect that appears in the RANDOM statement. The first random effect must follow a vertical bar (|); however, random effects do not have to be specified.

values

are constants that are elements of the L matrix associated with the fixed and random effects.

The rows of $L'$ are specified in order and are separated by commas. The rows of the $K'$ component of $L'$ are specified on the left side of the vertical bars (|). These rows test the fixed effects and are, therefore, checked for estimability. The rows of the $M'$ component of $L'$ are specified on the right side of the vertical bars. They test the random effects, and no estimability checking is necessary.

If PROC MIXED finds the fixed-effects portion of the specified contrast to be nonestimable (see the SINGULAR= option on page 2684), then it displays Non-est for the contrast entries.

The following CONTRAST statement reproduces the F -test for the effect A in the split-plot example (see Example 46.1 on page 2777):

  contrast 'A broad'
           A    1 -1  0    A*B .5 .5 -.5 -.5   0   0 ,
           A    1  0 -1    A*B .5 .5   0   0 -.5 -.5 / df=6;

Note that no random effects are specified in the preceding contrast; thus, the inference space is broad. The resulting F-test has two numerator degrees of freedom because $L'$ has two rows. The denominator degrees of freedom is, by default, the residual degrees of freedom (9), but the DF= option changes the denominator degrees of freedom to 6.

The following CONTRAST statement reproduces the F-test for A when Block and A*Block are considered fixed effects (the narrow inference space):

  contrast 'A narrow'
           A         1  -1   0
           A*B      .5  .5 -.5 -.5   0   0 |
           A*Block  .25  .25  .25  .25
                   -.25 -.25 -.25 -.25
                     0    0    0    0 ,
           A         1   0  -1
           A*B      .5  .5   0   0 -.5 -.5 |
           A*Block  .25  .25  .25  .25
                     0    0    0    0
                   -.25 -.25 -.25 -.25 ;

The preceding contrast does not contain coefficients for B and Block because they cancel out in estimated differences between levels of A . Coefficients for B and Block are necessary when estimating the mean of one of the levels of A in the narrow inference space (see Example 46.1 on page 2777).

If the elements of L are not specified for an effect that contains a specified effect, then the elements of the specified effect are automatically filled in over the levels of the higher-order effect. This feature is designed to preserve estimability for cases when there are complex higher-order effects. The coefficients for the higher-order effect are determined by equitably distributing the coefficients of the lower-level effect as in the construction of least squares means. In addition, if the intercept is specified, it is distributed over all classification effects that are not contained by any other specified effect. If an effect is not specified and does not contain any specified effects, then all of its coefficients in L are set to 0. You can override this behavior by specifying coefficients for the higher-order effect.

If too many values are specified for an effect, the extra ones are ignored; if too few are specified, the remaining ones are set to 0. If no random effects are specified, the vertical bar can be omitted; otherwise, it must be present. If a SUBJECT effect is used in the RANDOM statement, then the coefficients specified for the effects in the RANDOM statement are equitably distributed across the levels of the SUBJECT effect. You can use the E option to see exactly what L matrix is used.

The SUBJECT and GROUP options in the CONTRAST statement are useful for the case when a SUBJECT= or GROUP= variable appears in the RANDOM statement, and you want to contrast different subjects or groups. By default, CONTRAST statement coefficients on random effects are distributed equally across subjects and groups.

PROC MIXED handles missing level combinations of classification variables similarly to the way PROC GLM does. Both procedures delete fixed-effects parameters corresponding to missing levels in order to preserve estimability. However, PROC MIXED does not delete missing level combinations for random-effects parameters because linear combinations of the random-effects parameters are always estimable. These conventions can affect the way you specify your CONTRAST coefficients.

The CONTRAST statement computes the statistic

$$F = \frac{\begin{bmatrix}\hat{\beta} \\ \hat{\gamma}\end{bmatrix}' L\,(L'\hat{C}L)^{-1}L' \begin{bmatrix}\hat{\beta} \\ \hat{\gamma}\end{bmatrix}}{\mathrm{rank}(L)}$$

and approximates its distribution with an F-distribution. In this expression, $\hat{C}$ is an estimate of the generalized inverse of the coefficient matrix in the mixed model equations. See the Inference and Test Statistics section on page 2741 for more information on this F-statistic.

The numerator degrees of freedom in the F-approximation is rank($L$), and the denominator degrees of freedom is taken from the Tests of Fixed Effects table and corresponds to the final effect you list in the CONTRAST statement. You can change the denominator degrees of freedom by using the DF= option.

You can specify the following options in the CONTRAST statement after a slash (/).

CHISQ

  • requests that $\chi^2$-tests be performed in addition to any F-tests. A $\chi^2$-statistic equals its corresponding F-statistic times the associated numerator degrees of freedom, and this same degrees of freedom is used to compute the p-value for the $\chi^2$-test. This p-value will always be less than that for the F-test, as it effectively corresponds to an F-test with infinite denominator degrees of freedom.

DF= number

  • specifies the denominator degrees of freedom for the F-test. The default is the denominator degrees of freedom taken from the Tests of Fixed Effects table and corresponds to the final effect you list in the CONTRAST statement.

E

  • requests that the L matrix coefficients for the contrast be displayed. For ODS purposes, the label of this L Matrix Coefficients table is Coef.

GROUP coeffs

GRP coeffs

  • sets up random-effect contrasts between different groups when a GROUP= variable appears in the RANDOM statement. By default, CONTRAST statement coefficients on random effects are distributed equally across groups.

SINGULAR= number

  • tunes the estimability checking. If $v$ is a vector, define ABS($v$) to be the absolute value of the element of $v$ with the largest absolute value. If ABS($K' - K'T$) is greater than C*number for any row of $K'$ in the contrast, then $K$ is declared nonestimable. Here $T$ is the Hermite form matrix $(X'X)^{-}X'X$, and C is ABS($K'$) except when it equals 0, and then C is 1. The value for number must be between 0 and 1; the default is 1E−4.

SUBJECT coeffs

SUB coeffs

  • sets up random-effect contrasts between different subjects when a SUBJECT= variable appears in the RANDOM statement. By default, CONTRAST statement coefficients on random effects are distributed equally across subjects.
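As a sketch of how the CONTRAST options combine, the step below reuses the broad-inference contrast for A shown earlier and adds the CHISQ and E options. The model statements follow the split-plot setting described in the text (A with three levels, B with two, Block and A*Block random); the data set name work.sp and the response name Y are assumptions.

```sas
/* work.sp and Y are assumed names for the split-plot data */
proc mixed data=work.sp;
   class A B Block;
   model Y = A B A*B;
   random Block A*Block;
   contrast 'A broad'
            A 1 -1  0   A*B .5 .5 -.5 -.5   0   0 ,
            A 1  0 -1   A*B .5 .5   0   0 -.5 -.5 / chisq e;
run;
```

Here CHISQ adds the chi-square test next to the F-test, and E displays the L matrix coefficients so that the automatically filled-in values can be checked.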

ESTIMATE Statement

  • ESTIMATE label < fixed-effect values ... >
        < | random-effect values ... > < / options > ;

The ESTIMATE statement is exactly like a CONTRAST statement, except only one-row L matrices are permitted. The actual estimate, $L'\hat{\phi}$, where $\hat{\phi}' = (\hat{\beta}' \;\; \hat{\gamma}')$, is displayed along with its approximate standard error. An approximate t-test that $L'\phi = 0$ is also produced.

PROC MIXED selects the degrees of freedom to match those displayed in the Tests of Fixed Effects table for the final effect you list in the ESTIMATE statement. You can modify the degrees of freedom using the DF= option.

If PROC MIXED finds the fixed-effects portion of the specified estimate to be nonestimable, then it displays Non-est for the estimate entries.

The following examples of ESTIMATE statements compute the mean of the first level of A in the split-plot example (see Example 46.1 on page 2777) for various inference spaces.

  estimate 'A1 mean narrow'
           intercept 1  A 1  B .5 .5  A*B .5 .5 |
           Block   .25 .25 .25 .25
           A*Block .25 .25 .25 .25
                    0   0   0   0
                    0   0   0   0;
  estimate 'A1 mean intermed'
           intercept 1  A 1  B .5 .5  A*B .5 .5 |
           Block   .25 .25 .25 .25;
  estimate 'A1 mean broad'
           intercept 1  A 1  B .5 .5  A*B .5 .5;

The construction of the L vector for an ESTIMATE statement follows the same rules as listed under the CONTRAST statement.

You can specify the following options in the ESTIMATE statement after a slash (/).

ALPHA= number

  • requests that a t-type confidence interval be constructed with confidence level 1 − number. The value of number must be between 0 and 1; the default is 0.05.

CL

  • requests that t-type confidence limits be constructed. The confidence level is 0.95 by default; this can be changed with the ALPHA= option.

DF= number

  • specifies the degrees of freedom for the t-test and confidence limits. The default is the denominator degrees of freedom taken from the Tests of Fixed Effects table and corresponds to the final effect you list in the ESTIMATE statement.

DIVISOR= number

  • specifies a value by which to divide all coefficients so that fractional coefficients can be entered as integer numerators.
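  • As a sketch of the DIVISOR= option, the two ESTIMATE statements below are intended to specify the same L vector: the second doubles every coefficient of the first and divides by 2. The first statement is the broad-inference estimate of the first level of A from the split-plot setting; the data set name work.sp and the response name Y are assumptions.

```sas
/* work.sp and Y are assumed names for the split-plot data */
proc mixed data=work.sp;
   class A B Block;
   model Y = A B A*B;
   random Block A*Block;
   estimate 'A1 mean broad'   intercept 1 A 1 B .5 .5 A*B .5 .5;
   estimate 'A1 broad, div=2' intercept 2 A 2 B 1 1 A*B 1 1 / divisor=2;
run;
```

  • Entering integer numerators in this way is mainly useful for fractions such as 1/3 that cannot be typed exactly as decimals.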

E

  • requests that the L matrix coefficients be displayed. For ODS purposes, the label of this L Matrix Coefficients table is Coef.

GROUP coeffs

GRP coeffs

  • sets up random-effect contrasts between different groups when a GROUP= variable appears in the RANDOM statement. By default, ESTIMATE statement coefficients on random effects are distributed equally across groups.

LOWER

LOWERTAILED

  • requests that the p-value for the t-test be based only on values less than the t-statistic. A two-tailed test is the default. A lower-tailed confidence limit is also produced if you specify the CL option.

SINGULAR= number

  • tunes the estimability checking as documented for the CONTRAST statement.

SUBJECT coeffs

SUB coeffs

  • sets up random-effect contrasts between different subjects when a SUBJECT= variable appears in the RANDOM statement. By default, ESTIMATE statement coefficients on random effects are distributed equally across subjects.

  • For example, the ESTIMATE statement in the following code from Example 46.5 constructs the difference between the random slopes of the first two batches.

      proc mixed data=rc;
         class batch;
         model y = month / s;
         random int month / type=un sub=batch s;
         estimate 'slope b1 - slope b2' | month 1 / subject 1 -1;
      run;

UPPER

UPPERTAILED

  • requests that the p-value for the t-test be based only on values greater than the t-statistic. A two-tailed test is the default. An upper-tailed confidence limit is also produced if you specify the CL option.

ID Statement

  • ID variables ;

The ID statement specifies which variables from the input data set are to be included in the OUTP= and OUTPM= data sets from the MODEL statement. If you do not specify an ID statement, then all variables are included in these data sets. Otherwise, only the variables you list in the ID statement are included. Specifying an ID statement with no variables prevents any variables from being included in these data sets.
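The following sketch shows an ID statement used together with the OUTP= option of the MODEL statement. All data set and variable names (work.trial, work.pred, subject_id, visit, and the model variables) are hypothetical.

```sas
/* Hypothetical data set and variables */
proc mixed data=work.trial;
   class block trt;
   id subject_id visit;               /* input variables to carry into OUTP= */
   model y = trt / outp=work.pred;    /* predicted values data set           */
   random block;
run;
```

Without the ID statement, the description above indicates that all input variables would be copied into work.pred.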

LSMEANS Statement

  • LSMEANS fixed-effects < / options > ;

The LSMEANS statement computes least-squares means (LS-means) of fixed effects. As in the GLM procedure, LS-means are predicted population margins ”that is, they estimate the marginal means over a balanced population. In a sense, LS-means are to unbalanced designs as class and subclass arithmetic means are to balanced designs. The L matrix constructed to compute them is the same as the L matrix formed in PROC GLM; however, the standard errors are adjusted for the covariance parameters in the model.

Each LS-mean is computed as $L\hat{\beta}$, where $L$ is the coefficient matrix associated with the least-squares mean and $\hat{\beta}$ is the estimate of the fixed-effects parameter vector (see the Estimating β and γ in the Mixed Model section on page 2739). The approximate standard error for the LS-mean is computed as the square root of $L(X'\hat{V}^{-1}X)^{-}L'$.

LS-means can be computed for any effect in the MODEL statement that involves CLASS variables. You can specify multiple effects in one LSMEANS statement or in multiple LSMEANS statements, and all LSMEANS statements must appear after the MODEL statement. As in the ESTIMATE statement, the L matrix is tested for estimability, and if this test fails, PROC MIXED displays Non-est for the LS-means entries.

Assuming the LS-mean is estimable, PROC MIXED constructs an approximate t-test to test the null hypothesis that the associated population quantity equals zero. By default, the denominator degrees of freedom for this test are the same as those displayed for the effect in the Tests of Fixed Effects table (see the Default Output section on page 2748).

You can specify the following options in the LSMEANS statement after a slash (/).

ADJUST=BON

ADJUST=DUNNETT

ADJUST=SCHEFFE

ADJUST=SIDAK

ADJUST=SIMULATE < ( simoptions ) >

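ADJUST=SMM | GT2

ADJUST=TUKEY

  • requests a multiple comparison adjustment for the p-values and confidence limits for the differences of LS-means.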

ALPHA= number

  • requests that a t-type confidence interval be constructed for each of the LS-means with confidence level 1 − number. The value of number must be between 0 and 1; the default is 0.05.

AT variable = value

AT ( variable-list ) = ( value-list )

AT MEANS

  • enables you to modify the values of the covariates used in computing LS-means. By default, all covariate effects are set equal to their mean values for computation of standard LS-means. The AT option enables you to assign arbitrary values to the covariates. Additional columns in the output table indicate the values of the covariates. If there is an effect containing two or more covariates, the AT option sets the effect equal to the product of the individual means rather than the mean of the product (as with standard LS-means calculations). The AT MEANS option sets covariates equal to their mean values (as with standard LS-means) and incorporates this adjustment to cross products of covariates.

  • As an example, consider the following invocation of PROC MIXED:

      proc mixed;
         class A;
         model Y = A X1 X2 X1*X2;
         lsmeans A;
         lsmeans A / at means;
         lsmeans A / at X1=1.2;
         lsmeans A / at (X1 X2)=(1.2 0.3);
      run;

  • For the first two LSMEANS statements, the LS-means coefficient for X1 is $\bar{x}_1$ (the mean of X1) and for X2 is $\bar{x}_2$ (the mean of X2). However, for the first LSMEANS statement, the coefficient for X1*X2 is $\overline{x_1 x_2}$, but for the second LSMEANS statement, the coefficient is $\bar{x}_1 \cdot \bar{x}_2$. The third LSMEANS statement sets the coefficient for X1 equal to 1.2 and leaves it at $\bar{x}_2$ for X2, and the final LSMEANS statement sets these values to 1.2 and 0.3, respectively.

  • If a WEIGHT variable is present, it is used in processing AT variables. Also, observations with missing dependent variables are included in computing the covariate means, unless these observations form a missing cell and the FULLX option in the MODEL statement is not in effect. You can use the E option in conjunction with the AT option to check that the modified LS-means coefficients are the ones you desire.

  • The AT option is disabled if you specify the BYLEVEL option.

BYLEVEL

  • requests PROC MIXED to process the OM data set by each level of the LS-mean effect (LSMEANS effect) in question. For more details, see the OM option later in this section.

CL

  • requests that t-type confidence limits be constructed for each of the LS-means. The confidence level is 0.95 by default; this can be changed with the ALPHA= option.

CORR

  • displays the estimated correlation matrix of the least-squares means as part of the Least Squares Means table.

COV

  • displays the estimated covariance matrix of the least-squares means as part of the Least Squares Means table.

DF= number

  • specifies the degrees of freedom for the t-test and confidence limits. The default is the denominator degrees of freedom taken from the Tests of Fixed Effects table corresponding to the LS-means effect unless the DDFM=SATTERTH or DDFM=KENWARDROGER option is in effect in the MODEL statement. For these DDFM= methods, degrees of freedom are determined separately for each test; see the DDFM= option on page 2693 for more information.

DIFF < =difftype >

PDIFF < =difftype >

  • requests that differences of the LS-means be displayed. The optional difftype specifies which differences to produce, with possible values being ALL, CONTROL, CONTROLL, and CONTROLU. The difftype ALL requests all pairwise differences, and it is the default. The difftype CONTROL requests the differences with a control, which, by default, is the first level of each of the specified LSMEANS effects.

  • To specify which levels of the effects are the controls, list the quoted formatted values in parentheses after the keyword CONTROL. For example, if the effects A, B, and C are class variables, each having two levels, 1 and 2, the following LSMEANS statement specifies the (1,2) level of A*B and the (2,1) level of B*C as controls:

      lsmeans A*B B*C / diff=control('1' '2' '2' '1');  
  • For multiple effects, the results depend upon the order of the list, and so you should check the output to make sure that the controls are correct.

  • Two-tailed tests and confidence limits are associated with the CONTROL difftype. For one-tailed results, use either the CONTROLL or CONTROLU difftype. The CONTROLL difftype tests whether the noncontrol levels are significantly smaller than the control; the upper confidence limits for the control minus the noncontrol levels are considered to be infinity and are displayed as missing. Conversely, the CONTROLU difftype tests whether the noncontrol levels are significantly larger than the control; the upper confidence limits for the noncontrol levels minus the control are considered to be infinity and are displayed as missing.

  • If you want to perform multiple comparison adjustments on the differences of LS-means, you must specify the ADJUST= option.

  • The differences of the LS-means are displayed in a table titled Differences of Least Squares Means. For ODS purposes, the table name is Diffs.
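The following sketch shows one way the DIFF=CONTROL, ADJUST=, and CL options might be combined to compare each treatment with a named control. The data set trial, the treatment labels, and the simulated response are hypothetical.

```sas
/* Hypothetical data: three treatments, 8 replicates each */
data trial;
   call streaminit(7);
   length trt $8;
   do trt = 'placebo', 'low', 'high';
      do rep = 1 to 8;
         y = 10 + 2*(trt = 'high') + rand('normal');
         output;
      end;
   end;
run;

proc mixed data=trial;
   class trt;
   model y = trt;
   /* differences of 'low' and 'high' from the 'placebo' control,
      with Dunnett-adjusted p-values and confidence limits */
   lsmeans trt / diff=control('placebo') adjust=dunnett cl;
run;
```

Naming the control explicitly avoids relying on the default, which is the first level of the effect in sort order.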

E

  • requests that the L matrix coefficients for all LSMEANS effects be displayed. For ODS purposes, the label of this L Matrix Coefficients table is Coef.

OM < = OM-data-set >

OBSMARGINS < = OM-data-set >

  • specifies a potentially different weighting scheme for the computation of LS-means coefficients. The standard LS-means have equal coefficients across classification effects; however, the OM option changes these coefficients to be proportional to those found in OM-data-set . This adjustment is reasonable when you want your inferences to apply to a population that is not necessarily balanced but has the margins observed in OM-data-set .

  • By default, OM-data-set is the same as the analysis data set. You can optionally specify another data set that describes the population for which you want to make inferences. This data set must contain all model variables except for the dependent variable (which is ignored if it is present). In addition, the levels of all CLASS variables must be the same as those occurring in the analysis data set. Specifying an OM-data-set enables you to construct arbitrarily weighted LS-means.

  • In computing the observed margins, PROC MIXED uses all observations for which there are no missing or invalid independent variables, including those for which there are missing dependent variables. Also, if OM-data-set has a WEIGHT variable, PROC MIXED uses weighted margins to construct the LS-means coefficients. If OM-data-set is balanced, the LS-means are unchanged by the OM option.

  • The BYLEVEL option modifies the observed-margins LS-means. Instead of computing the margins across all of the OM-data-set , PROC MIXED computes separate margins for each level of the LSMEANS effect in question. In this case the resulting LS-means are actually equal to raw means for fixed effects models and certain balanced random effects models, but their estimated standard errors account for the covariance structure that you have specified. If the AT option is specified, the BYLEVEL option disables it.

  • You can use the E option in conjunction with either the OM or BYLEVEL option to check that the modified LS-means coefficients are the ones you desire. It is possible that the modified LS-means are not estimable when the standard ones are, or vice versa. Nonestimable LS-means are noted as Non-est in the output.
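To make the difference between standard, observed-margins, and BYLEVEL LS-means concrete, the following sketch fits a two-way model to deliberately unbalanced data and prints the coefficients with the E option. The data set unbal and its cell sizes are assumptions chosen for illustration.

```sas
/* Hypothetical unbalanced two-way layout: cell (A,B) has A+B observations */
data unbal;
   call streaminit(11);
   do A = 1 to 2;
      do B = 1 to 3;
         do rep = 1 to A + B;
            y = 5 + A + 0.5*B + rand('normal');
            output;
         end;
      end;
   end;
run;

proc mixed data=unbal;
   class A B;
   model y = A B;
   lsmeans A / e;              /* standard LS-means: equal weights over levels of B   */
   lsmeans A / om e;           /* weights proportional to the observed margins of B   */
   lsmeans A / om bylevel e;   /* margins computed separately within each level of A  */
run;
```

Comparing the three coefficient tables is a quick way to confirm which weighting each statement actually applied.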

PDIFF

  • is the same as the DIFF option. See the description of the DIFF option on page 2690.

SINGULAR= number

  • tunes the estimability checking as documented in the CONTRAST Statement section on page 2681.

SLICE= fixed-effect

SLICE= ( fixed-effects )

  • specifies effects by which to partition interaction LSMEANS effects. This can produce what are known as tests of simple effects (Winer 1971). For example, suppose that A*B is significant, and you want to test the effect of A for each level of B. The appropriate LSMEANS statement is

      lsmeans A*B / slice=B;  

    This code tests for the simple main effects of A for B, which are calculated by extracting the appropriate rows from the coefficient matrix for the A*B LS-means and using them to form an F-test. See the Inference and Test Statistics section on page 2741 for more information on this F-test.

    The SLICE option produces a table titled Tests of Effect Slices. For ODS purposes, the table name is Slices.

MODEL Statement

  • MODEL dependent = < fixed-effects >< / options > ;

The MODEL statement names a single dependent variable and the fixed effects, which determine the X matrix of the mixed model (see the Parameterization of Mixed Models section on page 2743 for details). The specification of effects is the same as in the GLM procedure; however, unlike PROC GLM, you do not specify random effects in the MODEL statement. The MODEL statement is required.

An intercept is included in the fixed-effects model by default. If no fixed effects are specified, only this intercept term is fit. The intercept can be removed by using the NOINT option.
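The two intercept cases described above can be sketched as follows. The data set one and its variables are hypothetical and exist only to make the steps runnable.

```sas
/* Hypothetical data: a single covariate x and a response y */
data one;
   call streaminit(3);
   do subj = 1 to 20;
      x = rand('uniform');
      y = 2 + 3*x + rand('normal');
      output;
   end;
run;

/* No fixed effects listed: only the default intercept is fit */
proc mixed data=one;
   model y = / s;
run;

/* NOINT removes the intercept, leaving only the slope for x */
proc mixed data=one;
   model y = x / noint s;
run;
```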

You can specify the following options in the MODEL statement after a slash (/).

ALPHA= number

  • requests that a t-type confidence interval be constructed for each of the fixed-effects parameters with confidence level 1 − number. The value of number must be between 0 and 1; the default is 0.05.

ALPHAP= number

  • requests that a t-type confidence interval be constructed for the predicted values with confidence level 1 − number. The value of number must be between 0 and 1; the default is 0.05.

CHISQ

  • requests that $\chi^2$-tests be performed for all specified effects in addition to the F-tests. Type III tests are the default; you can produce the Type I and Type II tests using the HTYPE= option.

CL

  • requests that t-type confidence limits be constructed for each of the fixed-effects parameter estimates. The confidence level is 0.95 by default; this can be changed with the ALPHA= option.

CONTAIN

  • has the same effect as the DDFM=CONTAIN option.

CORRB

  • produces the approximate correlation matrix of the fixed-effects parameter estimates. For ODS purposes, the label for this table is CorrB.

COVB

  • produces the approximate variance-covariance matrix of the fixed-effects parameter estimates $\hat{\beta}$. By default, this matrix equals $(X'\hat{V}^{-1}X)^{-}$ and results from sweeping $(X\ y)'\hat{V}^{-1}(X\ y)$ on all but its last pivot and removing the $y$ border. The EMPIRICAL option in the PROC MIXED statement changes this matrix into empirical sandwich form, as described on page 2676. For ODS purposes, the label for this table is CovB.

COVBI

  • produces the inverse of the approximate variance-covariance matrix of the fixed-effects parameter estimates. For ODS purposes, the label for this table is InvCovB.

DDF= value-list

  • enables you to specify your own denominator degrees of freedom for the fixed effects. The value-list specification is a list of numbers or missing values (.) separated by commas. The degrees of freedom should be listed in the order in which the effects appear in the Tests of Fixed Effects table. If you want to retain the default degrees of freedom for a particular effect, use a missing value for its location in the list. For example,

      model Y = A B A*B / ddf=3,.,4.7;

  • assigns 3 denominator degrees of freedom to A and 4.7 to A*B, while those for B remain the same. If you specify DDFM=SATTERTH or DDFM=KENWARDROGER, the DDF= option has no effect.

DDFM=CONTAIN

DDFM=BETWITHIN

DDFM=RESIDUAL

DDFM=SATTERTH

DDFM=KENWARDROGER

  • specifies the method for computing the denominator degrees of freedom for the tests of fixed effects resulting from the MODEL, CONTRAST, ESTIMATE, and LSMEANS statements.

  • The DDFM=CONTAIN option invokes the containment method to compute denominator degrees of freedom, and it is the default when you specify a RANDOM statement. The containment method is carried out as follows: Denote the fixed effect in question A, and search the RANDOM effect list for the effects that syntactically contain A. For example, the RANDOM effect B(A) contains A, but the RANDOM effect C does not, even if it has the same levels as B(A).

  • Among the RANDOM effects that contain A, compute their rank contribution to the (X Z) matrix. The DDF assigned to A is the smallest of these rank contributions. If no effects are found, the DDF for A is set equal to the residual degrees of freedom, N − rank(X Z). This choice of DDF matches the tests performed for balanced split-plot designs and should be adequate for moderately unbalanced designs.

  • Caution: If you have a Z matrix with a large number of columns, the overall memory requirements and the computing time after convergence can be substantial for the containment method. If it is too large, you may want to use the DDFM=BETWITHIN option.
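The containment rule can be seen in a balanced split-plot layout, sketched below under assumed names (data set sp, whole-plot factor A, subplot factor B, 4 blocks). The RANDOM effect A*block syntactically contains A, so A should draw its denominator degrees of freedom from that effect, while B and A*B are not contained in any RANDOM effect and fall back to the residual degrees of freedom.

```sas
/* Hypothetical balanced split plot: 4 blocks, 3 whole-plot levels, 2 subplot levels */
data sp;
   call streaminit(5);
   do block = 1 to 4;
      bl = rand('normal');                /* block effect      */
      do A = 1 to 3;
         wp = rand('normal');             /* whole-plot error  */
         do B = 1 to 2;
            y = 20 + A + 2*B + bl + wp + rand('normal');
            output;
         end;
      end;
   end;
   drop bl wp;
run;

proc mixed data=sp;
   class block A B;
   model y = A B A*B / ddfm=contain;      /* CONTAIN is also the default with RANDOM */
   random block A*block;
run;
```

In the classical split-plot analysis of this layout, the whole-plot error has (3 − 1)(4 − 1) = 6 degrees of freedom, which is the value to look for in the row for A in the Tests of Fixed Effects table.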

  • The DDFM=BETWITHIN option is the default for REPEATED statement specifications (with no RANDOM statements). It is computed by dividing the residual degrees of freedom into between-subject and within-subject portions. PROC MIXED then checks whether a fixed effect changes within any subject. If so, it assigns within-subject degrees of freedom to the effect; otherwise, it assigns the between-subject degrees of freedom to the effect (refer to Schluchter and Elashoff 1990). If there are multiple within-subject effects containing classification variables, the within-subject degrees of freedom is partitioned into components corresponding to the subject-by-effect interactions.

  • One exception to the preceding method is the case when you have specified no RANDOM statements and a REPEATED statement with the TYPE=UN option. In this case, all effects are assigned the between-subject degrees of freedom to provide for better small-sample approximations to the relevant sampling distributions. DDFM=KENWARDROGER may be a better option to try for this case.

  • The DDFM=RESIDUAL option performs all tests using the residual degrees of freedom, n − rank(X), where n is the number of observations.

  • The DDFM=SATTERTH option performs a general Satterthwaite approximation for the denominator degrees of freedom, computed as follows. Suppose $\theta$ is the vector of unknown parameters in $V$, and suppose $C = (X'V^{-1}X)^{-}$, where $^{-}$ denotes a generalized inverse. Let $\hat{C}$ and $\hat{\theta}$ be the corresponding estimates.

  • Consider the one-dimensional case, and consider $\ell$ to be a vector defining an estimable linear combination of $\beta$. The Satterthwaite degrees of freedom for the t-statistic

    $$t = \frac{\ell\hat{\beta}}{\sqrt{\ell\hat{C}\ell'}}$$

  • is computed as

    $$\nu = \frac{2(\ell\hat{C}\ell')^{2}}{g'Ag}$$

  • where $g$ is the gradient of $\ell C\ell'$ with respect to $\theta$, evaluated at $\hat{\theta}$, and $A$ is the asymptotic variance-covariance matrix of $\hat{\theta}$ obtained from the second derivative matrix of the likelihood equations.

  • For the multi-dimensional case, let $L$ be an estimable contrast matrix of rank $q > 1$. The Satterthwaite denominator degrees of freedom for the F-statistic

    $$F = \frac{\hat{\beta}'L'(L\hat{C}L')^{-1}L\hat{\beta}}{q}$$

  • is computed by first performing the spectral decomposition $L\hat{C}L' = P'DP$, where $P$ is an orthogonal matrix of eigenvectors and $D$ is a diagonal matrix of eigenvalues, both of dimension $q \times q$. Define $\ell_m$ to be the $m$th row of $PL$, and let

    $$\nu_m = \frac{2(D_m)^{2}}{g_m'Ag_m}$$

  • where $D_m$ is the $m$th diagonal element of $D$ and $g_m$ is the gradient of $\ell_m C\ell_m'$ with respect to $\theta$, evaluated at $\hat{\theta}$. Then let

    $$E = \sum_{m=1}^{q} \frac{\nu_m}{\nu_m - 2}\, I(\nu_m > 2)$$

  • where the indicator function eliminates terms for which $\nu_m \le 2$. The degrees of freedom for F are then computed as

    $$\nu = \frac{2E}{E - q}$$

  • provided $E > q$; otherwise $\nu$ is set to zero.
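As a purely illustrative calculation (the component degrees of freedom below are made-up numbers, not output from any fitted model), suppose $q = 2$ with $\nu_1 = 6$ and $\nu_2 = 10$. Assuming the usual form of the approximation, in which $E$ sums $\nu_m/(\nu_m - 2)$ over the components with $\nu_m > 2$ and the final value is $2E/(E - q)$, the arithmetic is:

```latex
\begin{align*}
E   &= \frac{6}{6-2} + \frac{10}{10-2} = 1.5 + 1.25 = 2.75, \\
\nu &= \frac{2E}{E-q} = \frac{2(2.75)}{2.75-2} = \frac{5.5}{0.75} \approx 7.33 .
\end{align*}
```

Here $E > q$, so the result is used as is. Had one of the components been 2 or less, its term would have been dropped from $E$, and $E \le q$ would have set the denominator degrees of freedom to zero.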

  • This method is a generalization of the techniques described in Giesbrecht and Burns (1985), McLean and Sanders (1988), and Fai and Cornelius (1996). The method can also include estimated random effects. In this case, append $\hat{\gamma}$ to $\hat{\beta}$ and change $\hat{C}$ to be the inverse of the coefficient matrix in the mixed model equations. The calculations require extra memory to hold c matrices that are the size of the mixed model equations, where c is the number of covariance parameters. In the notation of Table 46.12 on page 2773, this is approximately 8q(p + g)(p + g)/2 bytes. Extra computing time is also required to process these matrices. The Satterthwaite method implemented here is intended to produce an accurate F-approximation; however, the results may differ from those produced by PROC GLM. Also, the small-sample properties of this approximation have not been extensively investigated for the various models available with PROC MIXED.

  • The DDFM=KENWARDROGER option performs the degrees-of-freedom calculations detailed by Kenward and Roger (1997). This approximation involves inflating the estimated variance-covariance matrix of the fixed and random effects by the method proposed by Prasad and Rao (1990) and Harville and Jeske (1992); refer also to Kackar and Harville (1984). Satterthwaite-type degrees of freedom are then computed based on this adjustment. By default, the observed information matrix of the covariance parameter estimates is used in the calculations.

  • When the asymptotic variance matrix of the covariance parameters is found to be singular, a generalized inverse is used. Covariance parameters with zero variance then do not contribute to the degrees-of-freedom adjustment for DDFM=SATTERTH and DDFM=KENWARDROGER, and a message is written to the LOG.

  • This method changes output in the following tables (listed in Table 46.8 on page 2752): Contrast, CorrB, CovB, Diffs, Estimates, InvCovB, LSMeans, MMEq, MMEqSol, Slices, SolutionF, SolutionR, and Tests1 through Tests3. The OUTP= and OUTPM= data sets are also affected.
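For the repeated measures case with TYPE=UN mentioned above, a request for the Kenward-Roger method might look like the following sketch. The data set rm, the variable names, and the simulated values are all hypothetical.

```sas
/* Hypothetical repeated measures data: 2 treatments, 10 subjects each, 3 times */
data rm;
   call streaminit(21);
   do trt = 1 to 2;
      do id = 1 to 10;
         u = rand('normal');              /* subject effect induces within-subject correlation */
         do time = 1 to 3;
            y = 10 + trt + 0.5*time + u + rand('normal');
            output;
         end;
      end;
   end;
   drop u;
run;

proc mixed data=rm;
   class trt id time;
   model y = trt time trt*time / ddfm=kenwardroger;
   /* id values repeat across treatments, so subjects are identified as id(trt) */
   repeated time / type=un subject=id(trt);
run;
```

Rerunning the same step with DDFM=BETWITHIN or DDFM=SATTERTH is a simple way to see how much the choice of method changes the denominator degrees of freedom for a given data set.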

E

  • requests that Type I, Type II, and Type III L matrix coefficients be displayed for all specified effects. For ODS purposes, the labels of the tables are Coef.

E1

  • requests that Type I L matrix coefficients be displayed for all specified effects. For ODS purposes, the label of this table is Coef.

E2

  • requests that Type II L matrix coefficients be displayed for all specified effects. For ODS purposes, the label of this table is Coef.

E3

  • requests that Type III L matrix coefficients be displayed for all specified effects. For ODS purposes, the label of this table is Coef.

FULLX

  • requests that columns of the X matrix that consist entirely of zeros not be eliminated from X ; otherwise, they are eliminated by default. For a column corresponding to a missing cell to be added to X , its particular levels must be present in at least one observation in the analysis data set along with a missing dependent variable. The use of the FULLX option can impact coefficient specifications in the CONTRAST and ESTIMATE statements, as well as covariate coefficients from LSMEANS statements specified with the AT MEANS option.

HTYPE= value-list

  • indicates the type of hypothesis test to perform on the fixed effects. Valid entries for value are 1, 2, and 3; the default value is 3. You can specify several types by separating the values with a comma or a space. The ODS table names are Tests1 for the Type 1 tests, Tests2 for the Type 2 tests, and Tests3 for Type 3 tests.
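The following sketch requests Type 1 and Type 3 tests together, along with the chi-square versions from the CHISQ option. The unbalanced data set unbal is hypothetical; unbalanced cells are used because the test types generally coincide for balanced data.

```sas
/* Hypothetical unbalanced two-way layout: cell (A,B) has A+B observations */
data unbal;
   call streaminit(11);
   do A = 1 to 2;
      do B = 1 to 3;
         do rep = 1 to A + B;
            y = 5 + A + 0.5*B + rand('normal');
            output;
         end;
      end;
   end;
run;

proc mixed data=unbal;
   class A B;
   model y = A B A*B / htype=1,3 chisq;   /* produces the Tests1 and Tests3 tables */
run;
```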

Experimental

INFLUENCE < ( < EFFECT= effect > < ESTIMATES | EST > < ITER= number > < KEEP= number > < SELECT= value-list > < SIZE= number > ) >

  • specifies that influence and case deletion diagnostics are to be computed.

  • The INFLUENCE option of the MODEL statement in the MIXED procedure computes influence diagnostics by noniterative or iterative methods. The noniterative diagnostics rely on recomputation formulas under the assumption that covariance parameters or their ratios remain fixed. With the possible exception of a profiled residual variance, no covariance parameters are updated. This is the default behavior because of its computational efficiency. However, the impact of an observation on the overall analysis can be underestimated if its effect on covariance parameters is not assessed. Toward this end, iterative methods can be applied to gauge the overall impact of observations and to obtain influence diagnostics for the covariance parameter estimates.

  • If you specify the INFLUENCE option without further suboptions, PROC MIXED computes single-case deletion diagnostics and influence statistics for each observation in the data set by updating estimates for the fixed effects parameter estimates, and also the residual variance, if it is profiled. The EFFECT=, SELECT=, ITER=, SIZE=, and KEEP= suboptions provide additional flexibility in the computation and reporting of influence statistics.

    Suboption     Description
    default       Compute influence diagnostics for individual observations
    EFFECT=       Measure influence of sets of observations chosen according to a classification variable or effect
    SIZE=2        Remove pairs of observations and report the results sorted by degree of influence
    SIZE=         Remove triples, quadruples of observations, ...
    SELECT=       Allow selection of individual observations, observations sharing specific levels of effects, and construction of tuples from specified subsets of observations
    ITER=n > 0    Update fixed effects and covariance parameters by refitting the mixed model, adding up to n iterations
    ITER=n > 0    Compute influence diagnostics for the covariance parameters
    ITER=0        Update only fixed effects and the residual variance, if it is profiled
    ESTIMATES     Add the reduced-data estimates to the data set created with ODS OUTPUT

  • The modifiers and their default values are discussed in the following paragraphs. The set of computed influence diagnostics varies with the suboptions. The most extensive set of influence diagnostics is obtained when ITER=n with n > 0.

  • You can produce statistical graphics of influence diagnostics when the experimental ODS GRAPHICS statement is specified. For general information about ODS graphics, see Chapter 15, Statistical Graphics Using ODS. For specific information about the graphics available in the MIXED procedure, see the ODS Graphics section on page 2757.
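The default, noniterative form of the option can be sketched as follows. The data set stab and its variables are hypothetical; with no suboptions, the request is for single-case deletion diagnostics for every observation.

```sas
/* Hypothetical data: 6 batches, each measured at 5 time points */
data stab;
   call streaminit(46);
   do batch = 1 to 6;
      b0 = rand('normal', 0, 2);
      do month = 0 to 12 by 3;
         y = 100 + b0 + 0.4*month + rand('normal');
         output;
      end;
   end;
   drop b0;
run;

proc mixed data=stab;
   class batch;
   /* no suboptions: one row of diagnostics per observation, ITER=0 */
   model y = month / s influence;
   random int / sub=batch;
run;
```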

    EFFECT= effect

    specifies an effect according to which observations are grouped. Observations sharing the same level of the effect are removed from the analysis as a group. The effect must contain only class variables, but need not be contained in the model.

    Removing observations can change the rank of the $(X'V^{-1}X)^{-}$ matrix. This is particularly likely to happen when multiple observations are eliminated from the analysis. If the rank of the estimated variance-covariance matrix of $\hat{\beta}$ changes or its singularity pattern is altered, no influence diagnostics are computed.

    ESTIMATES | EST

    specifies that the updated parameter estimates should be written to the ODS output data set. The values are not displayed in the Influence table, but if you use ODS OUTPUT to create a data set from the listing, the estimates are added to the data set. If ITER=0, only the fixed effects estimates are saved. In iterative influence analyses, fixed effects and covariance parameters are stored. The p fixed effects parameter estimates are named Parm1 through Parmp, and the q covariance parameter estimates are named CovP1 through CovPq. The order corresponds to that in the Solution for Fixed Effects and Covariance Parameter Estimates tables. If parameter updates fail, for example, because of a loss of rank or a nonpositive definite Hessian, missing values are reported.

    ITER= n

    controls the maximum number of additional iterations PROC MIXED performs to update the fixed effects and covariance parameter estimates following data point removal. If you specify n > 0, then statistics such as DFFITS, MDFFITS, and the likelihood distances measure the impact of observation(s) on all aspects of the analysis. Typically, the influence will grow compared to values at ITER=0. In models without RANDOM or REPEATED effects, the ITER= option has no effect.

    This documentation refers to analyses when n > 0 simply as iterative influence analysis, even if final covariance parameter estimates can be updated in a single step (for example, when METHOD=MIVQUE0 or METHOD=TYPE3). This nomenclature reflects the fact that only if n > 0 will all model parameters be updated, which may require additional iterations. If n > 0 and METHOD=REML (default) or METHOD=ML, the procedure updates fixed effects and variance-covariance parameters after removing the selected observations with additional Newton-Raphson iterations, starting from the converged estimates for the entire data. The process stops for each observation or set of observations if the convergence criterion is satisfied or the number of further iterations exceeds n . If n > 0 and METHOD=TYPE1, TYPE2, or TYPE3, ANOVA estimates of the covariance parameters are recomputed in a single step.

    Compared to noniterative updates the computations are more involved. In particular for large data sets and/or a large number of random effects, iterative updates require considerably more resources. A one-step (ITER=1) or two-step update may be a good compromise. The output includes the number of iterations performed, which is less than n if the iteration converges. If the process does not converge in n iterations, you should be careful in interpreting the results, especially if n is fairly large.

    Bounds and other restrictions on the covariance parameters carry over from the full-data model. Covariance parameters that are not iterated in the model fit to the full data (the NOITER or HOLD option of the PARMS statement) are likewise not updated in the refit. In certain models, for example, random effects models, the ratios between the covariance parameters and the residual variance are maintained rather than the actual value of the covariance parameter estimate (see the section Influence Diagnostics on page 2765 in the Details section).
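    An iterative, group-deletion analysis that also saves the reduced-data estimates might be requested as in the following sketch. The data set names stab and infl and all variables are assumptions; the ODS table name Influence is the one given for the Influence Diagnostics table in this section.

```sas
/* Hypothetical data: 6 batches, each measured at 5 time points */
data stab;
   call streaminit(46);
   do batch = 1 to 6;
      b0 = rand('normal', 0, 2);
      do month = 0 to 12 by 3;
         y = 100 + b0 + 0.4*month + rand('normal');
         output;
      end;
   end;
   drop b0;
run;

ods output Influence=infl;                /* capture the Influence table */
proc mixed data=stab;
   class batch;
   /* remove one batch at a time, refit with up to 5 extra iterations,
      and add the updated estimates to the ODS output data set */
   model y = month / s influence(effect=batch iter=5 estimates);
   random int / sub=batch;
run;

proc print data=infl;
run;
```

    Setting ITER=1 instead of ITER=5 in this sketch would give the one-step update that the text suggests as a compromise for larger problems.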

    KEEP= n

    determines how many observations are retained for display and in the output data set or how many tuples if you specify SIZE=. The output is sorted by an influence statistic as discussed for the SIZE= suboption.

    SELECT= value-list

    specifies which observations or effect levels are chosen for influence calculations. If SELECT= is not specified, diagnostics are computed for all possible subsets, that is,

    • all observations, if neither EFFECT= nor SIZE= is given

    • all levels of the specified effect, if EFFECT= is specified

    • all tuples of size k formed from the observations in value-list, if SIZE=k is specified

    When you specify an effect with the EFFECT= option, the values in value-list represent indices of the levels in the order in which PROC MIXED builds classification effects. Which observations in the data set correspond to this index depends on the order of the variables in the CLASS statement, not the order in which the variables appear in the interaction effect. See the section Parameterization of Mixed Models on page 2743 on precisely how the procedure indexes nested and crossed effects and how levels of classification variables are ordered. The actual values of the classification variables involved in the effect are shown on the output so you can determine which observations were removed.

    If the EFFECT= suboption is not specified, the SELECT= value list refers to the sequence in which observations are read from the input data set or from the current BY group if there is a BY statement. This indexing is not necessarily the same as the observation numbers in the input data set, for example, if a WHERE clause is specified or during BY processing.

    SIZE= n

    instructs PROC MIXED to remove groups of observations formed as tuples of size n. For example, SIZE=2 specifies all $n(n-1)/2$ unique pairs of observations. The number of tuples for SIZE=k is $n!/(k!(n-k)!)$ and grows quickly with n and k. Using the SIZE= option can result in considerable computing time. The MIXED procedure displays by default only the 50 tuples with the greatest influence. Use the KEEP= option to override this default and to retain a different number of tuples in the listing or ODS output data set. Regardless of the KEEP= specification, all tuples are evaluated and the results are ordered according to an influence statistic. This statistic is the (restricted) likelihood distance as a measure of overall influence if ITER=n > 0 or when a residual variance is profiled. When likelihood distances are unavailable, the results are ordered by the PRESS statistic.
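    To give a sense of how quickly the number of tuples grows, the following counts use an arbitrary, made-up data set size of $n = 30$ observations:

```latex
\begin{align*}
\binom{30}{2} &= \frac{30 \cdot 29}{2} = 435, \\
\binom{30}{3} &= \frac{30 \cdot 29 \cdot 28}{6} = 4060, \\
\binom{30}{4} &= \frac{30 \cdot 29 \cdot 28 \cdot 27}{24} = 27405 .
\end{align*}
```

    Each tuple requires its own deletion update, so the work grows by these same factors as the tuple size increases.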

    To reduce computational burden, the SIZE= option can be combined with the SELECT= value-list modifier. For example,

      proc mixed data=aerosol;
         class aerosol manufacturer filter;
         model penetration = aerosol manufacturer /
                             influence(size=2 keep=5
                                       select=13,14,18,30,31,33);
         random filter(manufacturer);
      run;

    evaluates all 15 = 6 × 5 / 2 pairs formed from observations 13, 14, 18, 30, 31, and 33 and displays the five pairs with the greatest influence. If any observation in a tuple contains missing values or has otherwise not contributed to the analysis, the tuple is not evaluated. This guarantees that the displayed results refer to the same number of observations, so that meaningful statistics are available by which to order the results. If computations fail for a particular tuple, for example, because the $(X'V^{-1}X)^{-}$ matrix changes rank or the G matrix is not positive definite, no results are produced. Results are retained when the maximum number of iterative updates is exceeded in iterative influence analyses.

    The SIZE= suboption cannot be combined with the EFFECT= suboption.

    As in the case of the EFFECT= suboption, the statistics being computed are those appropriate for removal of multiple data points, even if SIZE=1.

  • For ODS purposes the label of the Influence Diagnostics table is Influence. The variables in this table depend on whether you specify the EFFECT=, SIZE=, or KEEP= suboption and whether covariance parameters are iteratively updated. When ITER=0 (the default) certain influence diagnostics are only meaningful if the residual variance is profiled. Table 46.3 and Table 46.4 summarize the statistics obtained depending on the model and modifiers. The last column in these tables gives the variable name in the ODS OUTPUT INFLUENCE= data set. Restricted likelihood distances are reported instead of the likelihood distance unless METHOD=ML. See the Influence Diagnostics section beginning on page 2765 for details on the individual statistics.

Table 46.3: Statistics Computed with INFLUENCE Option, Noniterative Analysis (ITER=0)

Suboption                 σ² profiled   Statistic                            Variable Name
Default                   Yes           Observed value                       Observed
                                        Predicted value                      Predicted
                                        Residual                             Residual
                                        Leverage                             Leverage
                                        PRESS residual                       PRESSRes
                                        Internally studentized residual      Student
                                        Externally studentized residual      RStudent
                                        RMSE without deleted obs             RMSE
                                        Cook's D                             CookD
                                        DFFITS                               DFFITS
                                        COVRATIO                             COVRATIO
                                        (Restricted) likelihood distance     RLD, LD
Default                   No            Observed value                       Observed
                                        Predicted value                      Predicted
                                        Residual                             Residual
                                        Leverage                             Leverage
                                        PRESS residual                       PRESSRes
                                        Internally studentized residual      Student
                                        Cook's D                             CookD
EFFECT=, SIZE=, or KEEP=  Yes           Observations in level (tuple)        Nobs
                                        PRESS statistic                      PRESS
                                        Cook's D                             CookD
                                        MDFFITS                              MDFFITS
                                        COVRATIO                             COVRATIO
                                        COVTRACE                             COVTRACE
                                        RMSE without deleted level (tuple)   RMSE
                                        (Restricted) likelihood distance     RLD, LD
EFFECT=, SIZE=, or KEEP=  No            Observations in level (tuple)        Nobs
                                        PRESS statistic                      PRESS
                                        Cook's D                             CookD

Table 46.4: Statistics Computed with INFLUENCE Option, Iterative Analysis (ITER=n > 0)

| Suboption | Statistic | Variable Name |
|---|---|---|
| Default | Number of iterations | Iter |
| | Observed value | Observed |
| | Predicted value | Predicted |
| | Residual | Residual |
| | Leverage | Leverage |
| | PRESS residual | PRESSres |
| | Internally studentized residual | Student |
| | Externally studentized residual | RStudent |
| | RMSE without deleted obs (if possible) | RMSE |
| | Cook's D | CookD |
| | DFFITS | DFFITS |
| | COVRATIO | COVRATIO |
| | Cook's D CovParms | CookDCP |
| | COVRATIO CovParms | COVRATIOCP |
| | (Restricted) likelihood distance | RLD, LD |
| EFFECT=, SIZE=, or KEEP= | Number of iterations | Iter |
| | PRESS statistic | PRESS |
| | RMSE without deleted level (tuple) | RMSE |
| | Cook's D | CookD |
| | MDFFITS | MDFFITS |
| | COVRATIO | COVRATIO |
| | COVTRACE | COVTRACE |
| | Cook's D CovParms | CookDCP |
| | COVRATIO CovParms | COVRATIOCP |
| | (Restricted) likelihood distance | RLD, LD |
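
As a hedged illustration of how the statistics in Table 46.4 might be captured, the following sketch reuses the aerosol model shown earlier, requests up to five iterative updates per deletion with the ITER= suboption, and saves the Influence table through ODS. The output data set name work.infl is hypothetical, and the value ITER=5 is chosen only for illustration.

```sas
/* Save the Influence table to a data set (work.infl is a hypothetical name) */
ods output Influence=work.infl;

proc mixed data=aerosol;
   class aerosol manufacturer filter;
   /* iter=5: covariance parameters are updated iteratively after each deletion */
   model penetration = aerosol manufacturer / influence(iter=5);
   random filter(manufacturer);
run;

/* List the variables actually produced; compare them with Table 46.4 */
proc contents data=work.infl;
run;
```

Listing the contents rather than printing selected columns avoids assuming which of the variables in Table 46.4 are present for a given model.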

INTERCEPT

  • adds a row to the tables for Type 1, 2, and 3 tests corresponding to the overall intercept.

LCOMPONENTS

  • requests an estimate for each row of the L matrix used to form tests of fixed effects. Components corresponding to Type 3 tests are the default; you can produce the Type 1 and Type 2 component estimates with the HTYPE= option.

  • Tests of fixed effects involve testing of linear hypotheses of the form $L\beta = 0$. The matrix L is constructed from Type 1, 2, or 3 estimable functions. By default the MIXED procedure constructs Type 3 tests. In many situations, the individual rows of the matrix L represent contrasts of interest. For example, in a one-way classification model, the Type 3 estimable functions define differences of factor level means. In a balanced two-way layout, the rows of L correspond to differences of cell means.

  • For example, if factors A and B have a and b levels, respectively, the statements

      class A B;
      model y = A B x / htype=1,3 lcomponents;
  • produce (a − 1) one-degree-of-freedom tests for the rows of L associated with the Type 1 and Type 3 estimable functions for factor A, (b − 1) tests for the rows of L associated with factor B, and a single test for the Type 1 and Type 3 coefficients associated with regressor x.

  • The denominator degrees of freedom associated with a row of L are the same as those in the corresponding Tests of Fixed Effects table, except for DDFM=KENWARDROGER and DDFM=SATTERTH. For these degrees-of-freedom methods, the denominator degrees of freedom are computed separately for each row of L.

  • For ODS purposes, the name of the table containing all requested component tests is LComponents. See Example 46.9 on page 2839 for examples incorporating the LCOMPONENTS option.

NOCONTAIN

  • has the same effect as the DDFM=RESIDUAL option.

NOINT

  • requests that no intercept be included in the model. An intercept is included by default.

NOTEST

  • specifies that no hypothesis tests be performed for the fixed effects.

OUTP= SAS-data-set

OUTPRED= SAS-data-set

  • specifies an output data set containing predicted values and related quantities. This option replaces the P option from Version 6.

  • Predicted values are formed by using the rows from $(X\ Z)$ as L matrices. Thus, predicted values from the original data are $X\hat\beta + Z\hat\gamma$. Their approximate standard errors of prediction are formed from the quadratic form of L with $\hat C$ defined in the Statistical Properties section on page 2740. The L95 and U95 variables provide a t-type confidence interval for the predicted values, and they correspond to the L95M and U95M variables from the GLM and REG procedures for fixed-effect models. The residuals are the observed minus the predicted values. Predicted values for data points other than those observed can be obtained by using missing dependent variables in your input data set.

  • Specifications that have a REPEATED statement with the SUBJECT= option and missing dependent variables compute predicted values using empirical best linear unbiased prediction (EBLUP). Using hats ($\hat{\ }$) to denote estimates, the EBLUP formula is

    $$\hat m = X_m\hat\beta + \hat C_m\hat V^{-1}(y - X\hat\beta)$$

  • where m represents a hypothetical realization of a missing data vector with associated design matrix $X_m$. The matrix $C_m$ is the model-based covariance matrix between m and the observed data y, and other notation is as presented in the Mixed Models Theory section beginning on page 2731.

    The estimated prediction variance is as follows:

    $$\widehat{\mathrm{Var}}(\hat m - m) = \hat V_m - \hat C_m\hat V^{-1}\hat C_m' + [X_m - \hat C_m\hat V^{-1}X]\,(X'\hat V^{-1}X)^{-}\,[X_m - \hat C_m\hat V^{-1}X]'$$

  • where $V_m$ is the model-based variance matrix of m. For further details, refer to Henderson (1984) and Harville (1990). This feature can be useful for forecasting time series or for computing spatial predictions.

  • By default, all variables from the input data set are included in the OUTP= data set. You can select a subset of these variables using the ID statement.

OUTPM= SAS-data-set

OUTPREDM= SAS-data-set

  • specifies an output data set containing predicted means and related quantities. This option replaces the PM option from Version 6.

  • The output data set is of the same form as that resulting from the OUTP= option, except that the predicted values do not incorporate the EBLUP values $Z\hat\gamma$. They also do not use the EBLUPs for specifications that have a REPEATED statement with the SUBJECT= option and missing dependent variables. The predicted values are formed as $X\hat\beta$ in the OUTPM= data set, and standard errors are quadratic forms in the approximate variance-covariance matrix of $\hat\beta$ as displayed by the COVB option.

    By default, all variables from the input data set are included in the OUTPM= data set. You can select a subset of these variables using the ID statement.
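
The following sketch shows one way the OUTP= and OUTPM= options might be used together to obtain predictions for unobserved data points. The data set work.growth and its variables (person, time, y) are hypothetical; rows whose response y is missing do not contribute to the fit but still receive predicted values.

```sas
/* Hypothetical long-format data: one row per person and time point.
   Rows with y = . (missing) still receive predicted values. */
proc mixed data=work.growth;
   class person;
   /* outp=  : predictions that include the EBLUPs of the random effects
      outpm= : predicted means based on the fixed effects only */
   model y = time / solution outp=work.pred outpm=work.predmean;
   random int time / type=un subject=person;
run;

/* Rows with a missing response carry the out-of-sample predictions */
proc print data=work.pred;
   where y = .;
run;
```

No ID statement is used here, so all input variables (including y) are copied to both output data sets and the WHERE statement can refer to y.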

RESIDUAL Experimental

SINGULAR= number

  • tunes the sensitivity in sweeping. If a diagonal pivot element is less than D*number as PROC MIXED sweeps a matrix, the associated column is declared to be linearly dependent upon previous columns, and the associated parameter is set to 0. The value D is the original diagonal element of the matrix. The default is 1E4 times the machine epsilon; this product is approximately 1E-12 on most computers.

SINGCHOL= number

  • tunes the sensitivity in computing Cholesky roots. If a diagonal pivot element is less than D*number as PROC MIXED performs the Cholesky decomposition on a matrix, the associated column is declared to be linearly dependent upon previous columns and is set to 0. The value D is the original diagonal element of the matrix. The default for number is 1E4 times the machine epsilon; this product is approximately 1E-12 on most computers.

SINGRES= number

  • sets the tolerance for which the residual variance is considered to be zero. The default is 1E4 times the machine epsilon; this product is approximately 1E-12 on most computers.

SOLUTION

S

  • requests that a solution for the fixed-effects parameters be produced. Using notation from the Mixed Models Theory section beginning on page 2731, the fixed-effects parameter estimates are $\hat\beta$ and their approximate standard errors are the square roots of the diagonal elements of $(X'\hat V^{-1}X)^{-}$. You can output this approximate variance matrix with the COVB option or modify it with the EMPIRICAL option in the PROC MIXED statement.

  • Along with the estimates and their approximate standard errors, a t-statistic is computed as the estimate divided by its standard error. The degrees of freedom for this t-statistic match those appearing in the Tests of Fixed Effects table under the effect containing the parameter. The Pr > |t| column contains the two-tailed p-value corresponding to the t-statistic and associated degrees of freedom. You can use the CL option to request confidence intervals for all of the parameters; they are constructed around the estimate by using a radius of the standard error times a percentage point from the t-distribution.

VCIRY Experimental

  • requests that responses and marginal residuals be scaled by the inverse Cholesky root of the marginal variance-covariance matrix. The variables ScaledDep and ScaledResid are added to the OUTPM= data set. These quantities can be important in bootstrapping of data or residuals. Examination of the scaled residuals is also helpful in diagnosing departures from normality. Notice that the results of this scaling operation can depend on the order in which the MIXED procedure processes the data.

  • The VCIRY option has no effect unless you also use the OUTPM= option or you request statistical graphics with the experimental ODS GRAPHICS statement. For general information about ODS graphics, see Chapter 15, Statistical Graphics Using ODS. For specific information about the graphics available in the MIXED procedure, see the ODS Graphics section on page 2757.

XPVIX

  • is an alias for the COVBI option.

XPVIXI

  • is an alias for the COVB option.

ZETA= number

  • tunes the sensitivity in forming Type III functions. Any element in the estimable function basis with an absolute value less than number is set to 0. The default is 1E-8.

PARMS Statement

  • PARMS ( value-list ) ... < / options > ;

The PARMS statement specifies initial values for the covariance parameters, or it requests a grid search over several values of these parameters. You must specify the values in the order in which they appear in the Covariance Parameter Estimates table.

The value-list specification can take any of several forms:

m

a single value

$m_1, m_2, \ldots, m_n$

several values

m to n

a sequence where m equals the starting value, n equals the ending value, and the increment equals 1

m to n by i

a sequence where m equals the starting value, n equals the ending value, and the increment equals i

$m_1$, $m_2$ to $m_3$

mixed values and sequences

You can use the PARMS statement to input known parameters. Referring to the split-plot example (Example 46.1 on page 2777), suppose the three variance components are known to be 60, 20, and 6. The SAS statements to fix the variance components at these values are as follows:

  proc mixed data=sp noprofile;
     class Block A B;
     model Y = A B A*B;
     random Block A*Block;
     parms (60) (20) (6) / noiter;
  run;

The NOPROFILE option requests PROC MIXED to refrain from profiling the residual variance parameter during its calculations, thereby enabling its value to be held at 6 as specified in the PARMS statement. The NOITER option prevents any Newton-Raphson iterations so that the subsequent results are based on the given variance components. You can also specify known parameters of G using the GDATA= option in the RANDOM statement.

If you specify more than one set of initial values, PROC MIXED performs a grid search of the likelihood surface and uses the best point on the grid for subsequent analysis. Specifying a large number of grid points can result in long computing times. The grid search feature is also useful for exploring the likelihood surface. See Example 46.3 on page 2795.
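
A minimal sketch of a grid search on the same split-plot model follows. The grid values and the output data set name work.grid are chosen only for illustration; the ODS table name ParmSearch is the one given in this section.

```sas
/* Capture the Parameter Search table (work.grid is a hypothetical name) */
ods output ParmSearch=work.grid;

proc mixed data=sp;
   class Block A B;
   model Y = A B A*B;
   random Block A*Block;
   /* 3 values x 3 values x 1 value = 9 grid points are evaluated */
   parms (40 to 80 by 20) (10 to 30 by 10) (6);
run;
```

Because NOITER is not specified, the Newton-Raphson iterations start from the best of the nine grid points. Because NOPROFILE is not specified, the residual variance may be estimated by profiling rather than held at 6.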

The results from the PARMS statement are the values of the parameters on the specified grid (denoted by CovP1–CovPn), the residual variance (possibly estimated) for models with a residual variance parameter, and various functions of the likelihood.

For ODS purposes, the label of the Parameter Search table is ParmSearch.

You can specify the following options in the PARMS statement after a slash (/).

HOLD= value-list

EQCONS= value-list

  • specifies which parameter values PROC MIXED should hold to equal the specified values. For example, the statement

      parms (5) (3) (2) (3) / hold=1,3;  
  • constrains the first and third covariance parameters to equal 5 and 2, respectively.

LOGDETH

  • evaluates the log determinant of the Hessian matrix for each point specified in the PARMS statement. A Log Det H column is added to the Parameter Search table.

LOWERB= value-list

  • enables you to specify lower boundary constraints on the covariance parameters. The value-list specification is a list of numbers or missing values (.) separated by commas. You must list the numbers in the order that PROC MIXED uses for the covariance parameters, and each number corresponds to the lower boundary constraint. A missing value instructs PROC MIXED to use its default constraint, and if you do not specify numbers for all of the covariance parameters, PROC MIXED assumes the remaining ones are missing.

  • An example for which this option is useful is when you want to constrain the G matrix to be positive definite in order to avoid the more computationally intensive algorithms required when G becomes singular. The corresponding code for a random coefficients model is as follows:

      proc mixed;
         class person;
         model y = time;
         random int time / type=fa0(2) sub=person;
         parms / lowerb=1e-4,.,1e-4;
      run;
  • Here the FA0(2) structure is used in order to specify a Cholesky root parameterization for the 2 × 2 unstructured blocks in G. This parameterization ensures that the G matrix is nonnegative definite, and the PARMS statement then ensures that it is positive definite by constraining the two diagonal terms to be greater than or equal to 1E-4.

NOBOUND

  • requests the removal of boundary constraints on covariance parameters. For example, variance components have a default lower boundary constraint of 0, and the NOBOUND option allows their estimates to be negative.

NOITER

  • requests that no Newton-Raphson iterations be performed and that PROC MIXED use the best value from the grid search to perform inferences. By default, iterations begin at the best value from the PARMS grid search.

NOPROFILE

  • specifies a different computational method for the residual variance during the grid search. By default, PROC MIXED estimates this parameter using the profile likelihood when appropriate. This estimate is displayed in the Variance column of the Parameter Search table. The NOPROFILE option suppresses the profiling and uses the actual value of the specified variance in the likelihood calculations.

OLS

  • requests starting values corresponding to the usual general linear model. Specifically, all variances and covariances are set to zero except for the residual variance, which is set equal to its ordinary least-squares (OLS) estimate. This option is useful when the default MIVQUE0 procedure produces poor starting values for the optimization process.

PARMSDATA= SAS-data-set

PDATA= SAS-data-set

  • reads in covariance parameter values from a SAS data set. The data set should contain the EST or COVP1–COVPn variables.
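
As an illustrative sketch, the known variance components from the split-plot example (60, 20, and 6) could be supplied through a data set instead of a value-list. The data set name work.cp is hypothetical; it holds one row per covariance parameter in a variable named Est, listed in the order of the Covariance Parameter Estimates table.

```sas
/* One row per covariance parameter, in the order PROC MIXED uses */
data work.cp;
   input Est;
   datalines;
60
20
6
;

proc mixed data=sp noprofile;
   class Block A B;
   model Y = A B A*B;
   random Block A*Block;
   /* Read the values from work.cp and hold them fixed (no iterations) */
   parms / parmsdata=work.cp noiter;
run;
```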

RATIOS

  • indicates that ratios with the residual variance are specified instead of the covariance parameters themselves. The default is to use the individual covariance parameters.

UPPERB= value-list

  • enables you to specify upper boundary constraints on the covariance parameters. The value-list specification is a list of numbers or missing values (.) separated by commas. You must list the numbers in the order that PROC MIXED uses for the covariance parameters, and each number corresponds to the upper boundary constraint. A missing value instructs PROC MIXED to use its default constraint, and if you do not specify numbers for all of the covariance parameters, PROC MIXED assumes that the remaining ones are missing.

PRIOR Statement

  • PRIOR < distribution >< / options > ;

The PRIOR statement enables you to carry out a sampling-based Bayesian analysis in PROC MIXED. It currently operates only with variance component models. The analysis produces a SAS data set containing a pseudo-random sample from the joint posterior density of the variance components and other parameters in the mixed model.

The posterior analysis is performed after all other PROC MIXED computations. It begins with the Posterior Sampling Information table, which provides basic information about the posterior sampling analysis, including the prior densities, sampling algorithm, sample size, and random number seed. For ODS purposes, the name of this table is Posterior.

By default, PROC MIXED uses an independence chain algorithm in order to generate the posterior sample (Tierney 1994). This algorithm works by generating a pseudo-random proposal from a convenient base distribution, chosen to be as close as possible to the posterior. The proposal is then retained in the sample with probability proportional to the ratio of weights constructed by taking the ratio of the true posterior to the base density. If a proposal is not accepted, then a duplicate of the previous observation is added to the chain.
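
The acceptance step of an independence chain is commonly written as follows (after Tierney 1994). The notation is introduced here for illustration: $f$ is the posterior density, $g$ is the base density, $\theta^{(t)}$ is the current draw, and $\theta^{*}$ is the proposal. Whether PROC MIXED implements exactly this rule is an assumption, not something stated in this section.

```latex
w(\theta) = \frac{f(\theta)}{g(\theta)}, \qquad
\alpha\bigl(\theta^{(t)}, \theta^{*}\bigr)
  = \min\left\{ 1, \; \frac{w(\theta^{*})}{w(\theta^{(t)})} \right\}
```

With probability $\alpha$ the proposal becomes the next element of the chain; otherwise $\theta^{(t)}$ is repeated. Only the ratio of weights enters, so normalizing constants of $f$ and $g$ cancel.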

In selecting the base distribution, PROC MIXED makes use of the fact that the fixed-effects parameters can be analytically integrated out of the joint posterior, leaving the marginal posterior density of the variance components. In order to better approximate the marginal posterior density of the variance components, PROC MIXED transforms them using the MIVQUE(0) equations. You can display the selected transformation with the PTRANS option or specify your own with the TDATA= option. The density of the transformed parameters is then approximated by a product of inverted gamma densities (refer to Gelfand et al. 1990).

To determine the parameters for the inverted gamma densities, PROC MIXED evaluates the logarithm of the posterior density over a grid of points in each of the transformed parameters, and you can display the results of this search with the PSEARCH option. PROC MIXED then performs a linear regression of these values on the logarithm of the inverted gamma density. The resulting base densities are displayed in the Base Densities table; for ODS purposes, the name of this table is BaseDen. You can input different base densities with the BDATA= option.

At the end of the sampling, the Acceptance Rates table displays the acceptance rate computed as the number of accepted samples divided by the total number of samples generated. For ODS purposes, the label of the Acceptance Rates table is AcceptanceRates.

The OUT= option specifies the output data set containing the posterior sample. PROC MIXED automatically includes all variance component parameters in this data set (labeled COVP1–COVPn), the Type III F-statistics constructed as in Ghosh (1992) discussing Schervish (1992) (labeled T3Fn), the log values of the posterior (labeled LOGF), the log of the base sampling density (labeled LOGG), and the log of their ratio (labeled LOGRATIO). If you specify the SOLUTION option in the MODEL statement, the data set also contains a random sample from the posterior density of the fixed-effects parameters (labeled BETAn), and if you specify the SOLUTION option in the RANDOM statement, the data set contains a random sample from the posterior density of the random-effects parameters (labeled GAMn). PROC MIXED also generates additional variables corresponding to any CONTRAST, ESTIMATE, or LSMEANS statement that you specify.

Subsequently, you can use SAS/INSIGHT, or the UNIVARIATE, CAPABILITY, or KDE procedures to analyze the posterior sample.
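
A minimal sketch of this workflow on the split-plot variance component model follows. The seed value and the output data set name work.post are arbitrary choices for illustration; the model has three variance components (Block, A*Block, and residual), so the sample is assumed to contain COVP1 through COVP3.

```sas
proc mixed data=sp;
   class Block A B;
   model Y = A B A*B / solution;
   random Block A*Block;
   /* Jeffreys prior (the default), 10,000 draws, fixed seed for reproducibility */
   prior jeffreys / nsample=10000 seed=12345 out=work.post;
run;

/* Summarize the sampled variance components */
proc univariate data=work.post;
   var covp1-covp3;
run;
```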

The prior density of the variance components is, by default, a noninformative version of Jeffreys' prior (Box and Tiao 1973). You can also specify informative priors with the DATA= option or a flat (equal to 1) prior for the variance components. The prior density of the fixed-effects parameters is assumed to be flat (equal to 1), and the resulting posterior is conditionally multivariate normal (conditioning on the variance component parameters) with mean $(X'V^{-1}X)^{-}X'V^{-1}y$ and variance $(X'V^{-1}X)^{-}$.

The distribution argument in the PRIOR statement determines the prior density for the variance component parameters of your mixed model. Valid values are as follows.

DATA=

  • enables you to input the prior densities of the variance components used by the sampling algorithm. This data set must contain the TYPE and PARM1–PARMn variables, where n is the largest number of parameters among each of the base densities. The format of the DATA= data set matches that created by PROC MIXED in the Base Densities table, so you can output the densities from one run and use them as input for a subsequent run.

JEFFREYS

  • specifies a noninformative reference version of Jeffreys prior constructed using the square root of the determinant of the expected information matrix as in (1.3.92) of Box and Tiao (1973). This is the default prior.

FLAT

  • specifies a prior density equal to 1 everywhere, making the likelihood function the posterior.

  • You can specify the following options in the PRIOR statement after a slash (/).

ALG=IC | INDCHAIN

ALG=IS | IMPSAMP

ALG=RS | REJSAMP

ALG=RWC | RWCHAIN

  • specifies the algorithm used for generating the posterior sample. The ALG=IC option requests an independence chain algorithm, and it is the default. The option ALG=IS requests importance sampling, ALG=RS requests rejection sampling, and ALG=RWC requests a random walk chain. For more information on these techniques, refer to Ripley (1987), Smith and Gelfand (1992), and Tierney (1994).

BDATA=

  • enables you to input the base densities used by the sampling algorithm. This data set must contain the TYPE and PARM1–PARMn variables, where n is the largest number of parameters among each of the base densities. The format of the BDATA= data set matches that created by PROC MIXED in the Base Densities table, so you can output the densities from one run and use them as input for a subsequent run.

GRID= (value-list)

  • specifies a grid of values over which to evaluate the posterior density. The value-list syntax is the same as in the PARMS statement (see page 2706), and you must specify an output data set name with the OUTG= option.

GRIDT= (value-list)

  • specifies a transformed grid of values over which to evaluate the posterior density. The value-list syntax is the same as in the PARMS statement (see page 2706), and you must specify an output data set name with the OUTGT= option.

IFACTOR= number

  • is an alias for the SFACTOR= option.

LOGNOTE= number

  • instructs PROC MIXED to write a note to the SAS log after it generates the sample corresponding to each multiple of number . This is useful for monitoring the progress of CPU-intensive runs.

LOGRBOUND= number

  • specifies the bounding constant for rejection sampling. The value of number equals the maximum of log(f/g) over the variance component parameter space, where f is the posterior density and g is the product of inverted gamma densities used to perform rejection sampling.

  • When performing the rejection sampling, you may encounter the message

      WARNING: The log ratio bound of LL was violated at sample XX.  

    When this occurs, PROC MIXED reruns an optimization algorithm to determine a new log upper bound and then restarts the rejection sampling. The resulting OUT= data set contains all observations that have been generated; therefore, assuming that you have requested N samples, you should retain only the final N observations in this data set for analysis purposes.
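
One way to keep only the final N observations is sketched below. The data set names work.post and work.post_final are hypothetical, and N = 1000 is assumed to match the value requested with the NSAMPLE= option.

```sas
%let n = 1000;   /* number of samples requested with NSAMPLE= (assumed) */

data work.post_final;
   /* NOBS= supplies the total number of observations in work.post */
   set work.post nobs=total;
   /* Subsetting IF: keep only the last &n observations */
   if _n_ > total - &n;
run;
```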

NSAMPLE= number

  • specifies the number of posterior samples to generate. The default is 1000, but more accurate results are obtained with larger samples such as 10000.

NSEARCH= number

  • specifies the number of posterior evaluations PROC MIXED makes for each transformed parameter in determining the parameters for the inverted gamma densities. The default is 20.

OUT= SAS-data-set

  • creates an output data set containing the sample from the posterior density.

OUTG= SAS-data-set

  • creates an output data set from the grid evaluations specified in the GRID= option.

OUTGT= SAS-data-set

  • creates an output data set from the transformed grid evaluations specified in the GRIDT= option.

PSEARCH

  • displays the search used to determine the parameters for the inverted gamma densities. For ODS purposes, the name of the table is Search.

PTRANS

  • displays the transformation of the variance components. For ODS purposes, the name of the table is Trans.

SEED= number

  • specifies an integer used to start the pseudo-random number generator for the simulation. If you do not specify a seed, or specify a value less than or equal to zero, the seed is by default generated from reading the time of day from the computer clock. You should use a positive seed (less than $2^{31} - 1$) whenever you want to duplicate the sample in another run of PROC MIXED.

SFACTOR= number

  • enables you to adjust the range over which PROC MIXED searches the transformed parameters in order to determine the parameters for the inverted gamma densities. PROC MIXED determines the range by first transforming the estimates from the standard PROC MIXED analysis (REML, ML, or MIVQUE0, depending upon which estimation method you select). It then multiplies and divides the transformed estimates by 2* number to obtain upper and lower bounds, respectively. Transformed values that produce negative variance components in the original scale are not included in the search. The default value is 1; number must be greater than 0.5.
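
  • Written out, for a transformed estimate $\hat t$ (a symbol introduced here for illustration), the search bounds described above are:

```latex
\text{lower bound} = \frac{\hat t}{2 \cdot \mathit{number}}, \qquad
\text{upper bound} = 2 \cdot \mathit{number} \cdot \hat t
```

    With the default number = 1, the search therefore runs from $\hat t / 2$ to $2\hat t$. Requiring number > 0.5 keeps the factor $2 \cdot \mathit{number}$ above 1, so that, for a positive $\hat t$, the upper bound exceeds the lower bound.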

TDATA=

  • enables you to input the transformation of the covariance parameters used by the sampling algorithm. This data set should contain the CovP1–CovPn variables. The format of the TDATA= data set matches that created by PROC MIXED in the Trans table, so you can output the transformation from one run and use it as input for a subsequent run.

TRANS=EXPECTED

TRANS=MIVQUE0

TRANS=OBSERVED

  • specifies the particular algorithm used to determine the transformation of the covariance parameters. The default is MIVQUE0, indicating a transformation based on the MIVQUE(0) equations. The other two options indicate the type of Hessian matrix used in constructing the transformation via a Cholesky root.

UPDATE= number

  • is an alias for the LOGNOTE= option.

RANDOM Statement

  • RANDOM random-effects < / options > ;

The RANDOM statement defines the random effects constituting the $\gamma$ vector in the mixed model. It can be used to specify traditional variance component models (as in the VARCOMP procedure) and to specify random coefficients. The random effects can be classification or continuous, and multiple RANDOM statements are possible.

Using notation from the Mixed Models Theory section beginning on page 2731, the purpose of the RANDOM statement is to define the Z matrix of the mixed model, the random effects in the $\gamma$ vector, and the structure of G. The Z matrix is constructed exactly as the X matrix for the fixed effects, and the G matrix is constructed to correspond with the effects constituting Z. The structure of G is defined by using the TYPE= option described on page 2715.

You can specify INTERCEPT (or INT) as a random effect to indicate the intercept. PROC MIXED does not include the intercept in the RANDOM statement by default as it does in the MODEL statement.

You can specify the following options in the RANDOM statement after a slash (/).

ALPHA= number

  • requests that a t-type confidence interval be constructed for each of the random effect estimates with confidence level 1 − number. The value of number must be between 0 and 1; the default is 0.05.

CL

  • requests that t -type confidence limits be constructed for each of the random effect estimates. The confidence level is 0.95 by default; this can be changed with the ALPHA= option.

G

  • requests that the estimated G matrix be displayed. PROC MIXED displays blanks for values that are 0. If you specify the SUBJECT= option, then the block of the G matrix corresponding to the first subject is displayed. For ODS purposes, the name of the table is G.

GC

  • displays the lower-triangular Cholesky root of the estimated G matrix according to the rules listed under the G option. For ODS purposes, the name of the table is CholG.

GCI

  • displays the inverse Cholesky root of the estimated G matrix according to the rules listed under the G option. For ODS purposes, the name of the table is InvCholG.

GCORR

  • displays the correlation matrix corresponding to the estimated G matrix according to the rules listed under the G option. For ODS purposes, the name of the table is GCorr.

GDATA= SAS-data-set

  • requests that the G matrix be read in from a SAS data set. This G matrix is assumed to be known; therefore, only R -side parameters from effects in the REPEATED statement are included in the Newton-Raphson iterations. If no REPEATED statement is specified, then only a residual variance is estimated.

  • The information in the GDATA= data set can appear in one of two ways. The first is a sparse representation for which you include ROW, COL, and VALUE variables to indicate the row, column, and value of G. All unspecified locations are assumed to be 0. The second representation is for dense matrices. In it you include ROW and COL1–COLn variables to indicate the row and columns of G, which is a symmetric matrix of order n. For both representations, you must specify effects in the RANDOM statement that generate a Z matrix that contains n columns. See Example 46.4 on page 2802.

  • If you have more than one RANDOM statement, only one GDATA= option is required on any one of them, and the data set you specify must contain the entire G matrix defined by all of the RANDOM statements.

  • If the GDATA= data set contains variance ratios instead of the variances themselves, then use the RATIOS option.

  • Known parameters of G can also be input using the PARMS statement with the HOLD= option.
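
  • A minimal sketch of the sparse representation follows. The data sets work.g and work.trial, the variables clinic, x, and y, and the values in G are all hypothetical; the CLASS variable is assumed to have exactly three levels so that Z has three columns, matching the order of G.

```sas
/* Sparse form: only the nonzero elements of the 3 x 3 matrix G are listed */
data work.g;
   input Row Col Value;
   datalines;
1 1 4
2 2 4
3 3 4
;

proc mixed data=work.trial;
   class clinic;                    /* assumed to have exactly 3 levels */
   model y = x / solution;
   random clinic / gdata=work.g;    /* G is treated as known */
run;
```

    Because no REPEATED statement is present, only the residual variance is estimated in this sketch.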

GI

  • displays the inverse of the estimated G matrix according to the rules listed under the G option. For ODS purposes, the name of the table is InvG.

GROUP= effect

GRP= effect

  • defines an effect specifying heterogeneity in the covariance structure of G . All observations having the same level of the group effect have the same covariance parameters. Each new level of the group effect produces a new set of covariance parameters with the same structure as the original group. You should exercise caution in defining the group effect, as strange covariance patterns can result with its misuse. Also, the group effect can greatly increase the number of estimated covariance parameters, which may adversely affect the optimization process.

  • Continuous variables are permitted as arguments to the GROUP= option. PROC MIXED does not sort by the values of the continuous variable; rather, it considers the data to be from a new subject or group whenever the value of the continuous variable changes from the previous observation. Using a continuous variable decreases execution time for models with a large number of subjects or groups and also prevents the production of a large Class Levels Information table.

LDATA= SAS-data-set

  • reads the coefficient matrices associated with the TYPE=LIN(number) option. The data set must contain the variables PARM, ROW, COL1–COLn, or PARM, ROW, COL, VALUE. The PARM variable denotes which of the number coefficient matrices is currently being constructed, and the ROW, COL1–COLn, or ROW, COL, VALUE variables specify the matrix values, as they do with the GDATA= option. Unspecified values of these matrices are set equal to 0.

NOFULLZ

  • eliminates the columns in Z corresponding to missing levels of random effects involving CLASS variables. By default, these columns are included in Z .

RATIOS

  • indicates that ratios with the residual variance are specified in the GDATA= data set instead of the covariance parameters themselves. The default GDATA= data set contains the individual covariance parameters.

SOLUTION

S

  • requests that the solution for the random-effects parameters be produced. Using notation from the Mixed Models Theory section beginning on page 2731, these estimates are the empirical best linear unbiased predictors (EBLUPs) $\hat{\gamma} = \hat{G} Z' \hat{V}^{-1} (y - X\hat{\beta})$. They can be useful for comparing the random effects from different experimental units and can also be treated as residuals in performing diagnostics for your mixed model.

  • The numbers displayed in the SE Pred column of the Solution for Random Effects table are not the standard errors of the $\hat{\gamma}$ displayed in the Estimate column; rather, they are the standard errors of predictions $\hat{\gamma}_i - \gamma_i$, where $\hat{\gamma}_i$ is the ith EBLUP and $\gamma_i$ is the ith random-effect parameter.

SUBJECT= effect

SUB= effect

  • identifies the subjects in your mixed model. Complete independence is assumed across subjects; thus, for the RANDOM statement, the SUBJECT= option produces a block-diagonal structure in G with identical blocks. The Z matrix is modified to accommodate this block-diagonality. In fact, specifying a subject effect is equivalent to nesting all other effects in the RANDOM statement within the subject effect.

  • Continuous variables are permitted as arguments to the SUBJECT= option. PROC MIXED does not sort by the values of the continuous variable; rather, it considers the data to be from a new subject or group whenever the value of the continuous variable changes from the previous observation. Using a continuous variable decreases execution time for models with a large number of subjects or groups and also prevents the production of a large Class Levels Information table.

  • When you specify the SUBJECT= option and a classification random effect, computations are usually much quicker if the levels of the random effect are duplicated within each level of the SUBJECT= effect.
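
  • The equivalence between a subject effect and explicit nesting can be sketched as follows. Both steps are intended to specify the same model; the data set trial and the variables block, trt, and y are hypothetical.

```sas
/* Hypothetical data set and variables: trial, block, trt, y */

/* Subject form: G is block-diagonal with one identical block per block level */
proc mixed data=trial;
   class block trt;
   model y = trt;
   random intercept trt / subject=block;
run;

/* Nested form: the same random effects written out explicitly */
proc mixed data=trial;
   class block trt;
   model y = trt;
   random block trt*block;
run;
```

  • The first form lets PROC MIXED process the data one subject at a time, so it is usually the faster of the two when there are many subjects.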

TYPE= covariance-structure

  • specifies the covariance structure of G . Valid values for covariance-structure and their descriptions are listed in Table 46.5 on page 2721 and Table 46.6 on page 2722. Although a variety of structures are available, most applications call for either TYPE=VC or TYPE=UN. The TYPE=VC (variance components) option is the default structure, and it models a different variance component for each random effect.

  • The TYPE=UN (unstructured) option is useful for correlated random coefficient models. For example,

      random intercept age / type=un subject=person;  
  • specifies a random intercept-slope model that has different variances for the intercept and slope and a covariance between them. You can also use TYPE=FA0(2) here to request a G estimate that is constrained to be nonnegative definite.

  • If you are constructing your own columns of Z with continuous variables, you can use the TYPE=TOEP(1) structure to group them together to have a common variance component. If you desire to have different covariance structures in different parts of G , you must use multiple RANDOM statements with different TYPE= options.

V < = value-list >

  • requests that blocks of the estimated V matrix be displayed. The first block determined by the SUBJECT= effect is the default displayed block. PROC MIXED displays entries that are 0 as blanks in the table.

  • You can optionally use the value-list specification, which indicates the subjects for which blocks of V are to be displayed. For example, the statement

      random int time / type=un subject=person v=1,3,7;  
  • displays block matrices for the first, third, and seventh persons. The table name for ODS purposes is V.

VC < = value-list >

  • displays the Cholesky root of the blocks of the estimated V matrix. The value-list specification is the same as in the V= option. The table name for ODS purposes is CholV.

VCI < = value-list >

  • displays the inverse of the Cholesky root of the blocks of the estimated V matrix. The value-list specification is the same as in the V= option. The table name for ODS purposes is InvCholV.

VCORR < = value-list >

  • displays the correlation matrix corresponding to the blocks of the estimated V matrix. The value-list specification is the same as in the V= option. The table name for ODS purposes is VCorr.

VI < = value-list >

  • displays the inverse of the blocks of the estimated V matrix. The value-list specification is the same as in the V= option. The table name for ODS purposes is InvV.
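
  • The following sketch requests the V and VCorr blocks for the first and third subjects and also saves them to data sets through ODS, using the table names given above. The data set growth, its variables, and the output data set names vblk and vcorrblk are hypothetical.

```sas
/* Hypothetical data set and variables: growth, person, time, y */
ods output V=vblk VCorr=vcorrblk;
proc mixed data=growth;
   class person;
   model y = time / s;
   /* Display blocks of V and of its correlation matrix
      for the first and third persons */
   random int time / type=un subject=person v=1,3 vcorr=1,3;
run;
```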

REPEATED Statement

  • REPEATED < repeated-effect >< / options > ;

The REPEATED statement is used to specify the R matrix in the mixed model. Its syntax is different from that of the REPEATED statement in PROC GLM. If no REPEATED statement is specified, R is assumed to be equal to $\sigma^2 I$.

For many repeated measures models, no repeated effect is required in the REPEATED statement. Simply use the SUBJECT= option to define the blocks of R and the TYPE= option to define their covariance structure. In this case, the repeated measures data must be similarly ordered for each subject, and you must indicate all missing response variables with periods in the input data set unless they all fall at the end of a subject's repeated response profile. These requirements are necessary in order to inform PROC MIXED of the proper location of the observed repeated responses.

Specifying a repeated effect is useful when you do not want to indicate missing values with periods in the input data set. The repeated effect must contain only classification variables. Make sure that the levels of the repeated effect are different for each observation within a subject; otherwise, PROC MIXED constructs identical rows in R corresponding to the observations with the same level. This results in a singular R and an infinite likelihood.

Whether you specify a REPEATED effect or not, the rows of R for each subject are constructed in the order that they appear in the input data set.
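
The two ways of locating repeated responses can be sketched as follows. The data set long and the variables person, time, visit, and y are hypothetical; visit is assumed to be a classification copy of the measurement occasion.

```sas
/* Hypothetical data set and variables: long, person, time, visit, y */

/* No repeated effect: every person must have records in the same order,
   with missing responses held in place by periods */
proc mixed data=long;
   class person;
   model y = time;
   repeated / type=ar(1) subject=person;
run;

/* Repeated effect: the CLASS variable visit locates each record within
   a person, so absent records need no placeholder observation */
proc mixed data=long;
   class person visit;
   model y = time;
   repeated visit / type=ar(1) subject=person;
run;
```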

You can specify the following options in the REPEATED statement after a slash (/).

GROUP= effect

GRP= effect

  • defines an effect specifying heterogeneity in the covariance structure of R . All observations having the same level of the GROUP effect have the same covariance parameters. Each new level of the GROUP effect produces a new set of covariance parameters with the same structure as the original group. You should exercise caution in properly defining the GROUP effect, as strange covariance patterns can result with its misuse. Also, the GROUP effect can greatly increase the number of estimated covariance parameters, which may adversely affect the optimization process.

  • Continuous variables are permitted as arguments to the GROUP= option. PROC MIXED does not sort by the values of the continuous variable; rather, it considers the data to be from a new subject or group whenever the value of the continuous variable changes from the previous observation. Using a continuous variable decreases execution time for models with a large number of subjects or groups and also prevents the production of a large Class Levels Information table.

HLM

  • produces a table of Hotelling-Lawley-McKeon statistics (McKeon 1974) for all fixed effects whose levels change across data having the same level of the SUBJECT= effect (the within-subject fixed effects). This option applies only when you specify a REPEATED statement with the TYPE=UN option and no RANDOM statements. For balanced data, this model is equivalent to the multivariate model for repeated measures in PROC GLM.

  • The Hotelling-Lawley-McKeon statistic has a slightly better F approximation than the Hotelling-Lawley-Pillai-Samson statistic (see the description of the HLPS option, which follows). Both of the Hotelling-Lawley statistics can perform much better in small samples than the default F statistic (Wright 1994).

  • Separate tables are produced for Type I, II, and III tests, according to the ones you select. For ODS purposes, the labels for these tables are HLM1, HLM2, and HLM3, respectively.

HLPS

  • produces a table of Hotelling-Lawley-Pillai-Samson statistics (Pillai and Samson 1959) for all fixed effects whose levels change across data having the same level of the SUBJECT= effect (the within-subject fixed effects). This option applies only when you specify a REPEATED statement with the TYPE=UN option and no RANDOM statements. For balanced data, this model is equivalent to the multivariate model for repeated measures in PROC GLM, and this statistic is the same as the Hotelling-Lawley Trace statistic produced by PROC GLM.

  • Separate tables are produced for Type I, II, and III tests, according to the ones you select. For ODS purposes, the labels for these tables are HLPS1, HLPS2, and HLPS3, respectively.
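
  • A minimal sketch that satisfies the conditions for both options (a REPEATED statement with TYPE=UN and no RANDOM statement) follows. The data set long and the variables person, trt, visit, and y are hypothetical.

```sas
/* Hypothetical data set and variables: long, person, trt, visit, y */
proc mixed data=long;
   class person trt visit;
   model y = trt visit trt*visit;
   /* visit and trt*visit change within a person, so they are the
      within-subject fixed effects reported in the HLM and HLPS tables */
   repeated visit / type=un subject=person hlm hlps;
run;
```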

LDATA= SAS-data-set

  • reads the coefficient matrices associated with the TYPE=LIN( number ) option. The data set must contain the variables PARM, ROW, COL1–COLn, or PARM, ROW, COL, VALUE. The PARM variable denotes which of the number coefficient matrices is currently being constructed, and the ROW, COL1–COLn, or ROW, COL, VALUE variables specify the matrix values, as they do with the RANDOM statement option GDATA=. Unspecified values of these matrices are set equal to 0.
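
  • As an illustration of the PARM, ROW, COL, VALUE layout, the following sketch builds two 3 × 3 coefficient matrices (an identity matrix and a matrix of ones) so that TYPE=LIN(2) reproduces a compound-symmetry structure. The data set names lin_cs and long and the variables person, time, and y are hypothetical, and time is assumed to have three levels.

```sas
/* Hypothetical names throughout; time is assumed to have 3 levels */
data lin_cs;
   /* PARM=1: identity matrix (only the diagonal is listed;
      unspecified entries are set to 0) */
   parm = 1;
   do row = 1 to 3;
      col = row;
      value = 1;
      output;
   end;
   /* PARM=2: matrix of ones */
   parm = 2;
   do row = 1 to 3;
      do col = 1 to 3;
         value = 1;
         output;
      end;
   end;
run;

proc mixed data=long;
   class person time;
   model y = time;
   repeated time / type=lin(2) ldata=lin_cs subject=person;
run;
```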

LOCAL

LOCAL=EXP( < effects > )

LOCAL=POM( POM-data-set )

  • requests that a diagonal matrix be added to R . With just the LOCAL option, this diagonal matrix equals $\sigma^2 I$, and $\sigma^2$ becomes an additional variance parameter that PROC MIXED profiles out of the likelihood provided that you do not specify the NOPROFILE option in the PROC MIXED statement. The LOCAL option is useful if you want to add an observational error to a time series structure (Jones and Boadi-Boateng 1991) or a nugget effect to a spatial structure (Cressie 1991).

  • The LOCAL=EXP( <effects> ) option produces exponential local effects, also known as dispersion effects, in a log-linear variance model. These local effects have the form

        $$\sigma^2 \, \mathrm{diag}[\exp(U\delta)]$$

  • where U is the full-rank design matrix corresponding to the effects that you specify and $\delta$ are the parameters that PROC MIXED estimates. An intercept is not included in U because it is accounted for by $\sigma^2$. PROC MIXED constructs the full-rank U in terms of 1s and −1s for classification effects. Be sure to scale continuous effects in U sensibly.

  • The LOCAL=POM( POM-data-set ) option specifies the power-of-the-mean structure. This structure possesses a variance of the form $\sigma^2 \lvert x_i'\beta^* \rvert^{\theta}$ for the ith observation, where $x_i$ is the ith row of X (the design matrix of the fixed effects), and $\beta^*$ is an estimate of the fixed-effects parameters that you specify in POM-data-set .

  • The SAS data set specified by POM-data-set contains the numeric variable Estimate (in previous releases, the variable name was required to be EST), and it has at least as many observations as there are fixed-effects parameters. The first p observations of the Estimate variable in POM-data-set are taken to be the elements of $\beta^*$, where p is the number of columns of X . You must order these observations according to the non-full-rank parameterization of the MIXED procedure. One easy way to set up POM-data-set for a $\beta^*$ corresponding to ordinary least squares is illustrated by the following code:

      /* Step 1: ordinary least squares fit; save the solution as sf */
      ods output SolutionF=sf;
      proc mixed;
         class a;
         model y = a x / s;
      run;
      /* Step 2: power-of-the-mean fit that uses sf as beta* */
      proc mixed;
         class a;
         model y = a x;
         repeated / local=pom(sf);
      run;
  • Note that the generalized least-squares estimate of the fixed-effects parameters from the second PROC MIXED step usually is not the same as your specified $\beta^*$. However, you can iterate the POM fitting until the two estimates agree. Continuing from the previous example, the code for performing one step of this iteration is as follows.

      /* Refit with the current beta* and save the new solution as sf1 */
      ods output SolutionF=sf1;
      proc mixed;
         class a;
         model y = a x / s;
         repeated / local=pom(sf);
      run;
      /* Compare the old and new estimates */
      proc compare brief data=sf compare=sf1;
         var estimate;
      run;
      /* Carry the new estimates forward for the next iteration */
      data sf;
         set sf1;
      run;
  • Unfortunately, this iterative process does not always converge. For further details, refer to the description of pseudo-likelihood in Chapter 3 of Carroll and Ruppert (1988).
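
  • The other two forms of the LOCAL option can be sketched as follows. The data set long and the variables person, time, trt, dose, and y are hypothetical.

```sas
/* Hypothetical data set and variables: long, person, time, trt, dose, y */

/* AR(1) within person plus an added diagonal term
   (an observational error on top of the time series structure) */
proc mixed data=long;
   class person;
   model y = time;
   repeated / type=ar(1) subject=person local;
run;

/* Log-linear variance model: the residual variance depends on trt and dose */
proc mixed data=long;
   class trt;
   model y = trt dose;
   repeated / local=exp(trt dose);
run;
```

  • In the second step, dose enters U as a continuous column, so rescaling it to a moderate range before fitting is advisable.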

LOCALW

  • specifies that only the local effects and no others be weighted. By default, all effects are weighted. The LOCALW option is used in connection with the WEIGHT statement and the LOCAL option in the REPEATED statement.

NONLOCALW

  • specifies that only the nonlocal effects and no others be weighted. By default, all effects are weighted. The NONLOCALW option is used in connection with the WEIGHT statement and the LOCAL option in the REPEATED statement.

R < = value-list >

  • requests that blocks of the estimated R matrix be displayed. The first block determined by the SUBJECT= effect is the default displayed block. PROC MIXED displays entries that are 0 as blanks in the table.

  • The value-list indicates the subjects for which blocks of R are to be displayed. For example,

      repeated / type=cs subject=person r=1,3,5;  
  • displays block matrices for the first, third, and fifth persons. See the PARMS Statement section on page 2706 for the possible forms of value-list . The table name for ODS purposes is R.

RC < = value-list >

  • produces the Cholesky root of blocks of the estimated R matrix. The value-list specification is the same as with the R option. The table name for ODS purposes is CholR.

RCI < = value-list >

  • produces the inverse Cholesky root of blocks of the estimated R matrix. The value-list specification is the same as with the R option. The table name for ODS purposes is InvCholR.

RCORR < = value-list >

  • produces the correlation matrix corresponding to blocks of the estimated R matrix. The value-list specification is the same as with the R option. The table name for ODS purposes is RCorr.

RI < = value-list >

  • produces the inverse of blocks of the estimated R matrix. The value-list specification is the same as with the R option. The table name for ODS purposes is InvR.

SSCP

  • requests that an unstructured R matrix be estimated from the sum-of-squares-and-crossproducts matrix of the residuals. It applies only when you specify TYPE=UN and have no RANDOM statements. Also, you must have a sufficient number of subjects for the estimate to be positive definite.

  • This option is useful when the size of the blocks of R is large (for example, greater than 10) and you want to use or inspect an unstructured estimate that is much quicker to compute than the default REML estimate. The two estimates will agree for certain balanced data sets when you have a classification fixed effect defined across all time points within a subject.

SUBJECT= effect

SUB= effect

  • identifies the subjects in your mixed model. Complete independence is assumed across subjects; therefore, the SUBJECT= option produces a block-diagonal structure in R with identical blocks. When the SUBJECT= effect consists entirely of classification variables, the blocks of R correspond to observations sharing the same level of that effect. These blocks are sorted according to this effect as well.

  • Continuous variables are permitted as arguments to the SUBJECT= option. PROC MIXED does not sort by the values of the continuous variable; rather, it considers the data to be from a new subject or group whenever the value of the continuous variable changes from the previous observation. Using a continuous variable decreases execution time for models with a large number of subjects or groups and also prevents the production of a large Class Levels Information table.

  • If you want to model nonzero covariance among all of the observations in your SAS data set, specify SUBJECT=INTERCEPT to treat the data as if they are all from one subject. Be aware though that, in this case, PROC MIXED manipulates an R matrix with dimensions equal to the number of observations. If no SUBJECT= effect is specified, then every observation is assumed to be from a different subject and R is assumed to be diagonal. For this reason, you usually want to use the SUBJECT= option in the REPEATED statement.
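
  • A sketch of the SUBJECT=INTERCEPT specification for a single spatial field follows. The data set field and the variables east, north, fert, and yield are hypothetical.

```sas
/* Hypothetical data set and variables: field, east, north, fert, yield */
proc mixed data=field;
   model yield = fert;
   /* All observations belong to one subject, so R is a single
      n x n block with a spatial power structure */
   repeated / type=sp(pow)(east north) subject=intercept;
run;
```

  • Because R then has dimension equal to the number of observations, memory use and run time grow quickly with the size of the data set.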

TYPE= covariance-structure

  • specifies the covariance structure of the R matrix. The SUBJECT= option defines the blocks of R , and the TYPE= option specifies the structure of these blocks. Valid values for covariance-structure and their descriptions are provided in Table 46.5 and Table 46.6. The default structure is VC.

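    Table 46.5: Covariance Structures

    | Structure | Description | Parms | (i,j)th element |
    |---|---|---|---|
    | ANTE(1) | Ante-Dependence | $2t-1$ | $\sigma_i \sigma_j \prod_{k=i}^{j-1} \rho_k$ |
    | AR(1) | Autoregressive(1) | $2$ | $\sigma^2 \rho^{\lvert i-j \rvert}$ |
    | ARH(1) | Heterogeneous AR(1) | $t+1$ | $\sigma_i \sigma_j \rho^{\lvert i-j \rvert}$ |
    | ARMA(1,1) | ARMA(1,1) | $3$ | $\sigma^2 [\gamma \rho^{\lvert i-j \rvert - 1} 1(i \neq j) + 1(i = j)]$ |
    | CS | Compound Symmetry | $2$ | $\sigma_1 + \sigma^2 1(i = j)$ |
    | CSH | Heterogeneous CS | $t+1$ | $\sigma_i \sigma_j [\rho 1(i \neq j) + 1(i = j)]$ |
    | FA( q ) | Factor Analytic | $\frac{q}{2}(2t-q+1)+t$ | $\sum_{k=1}^{\min(i,j,q)} \lambda_{ik} \lambda_{jk} + \sigma_i^2 1(i = j)$ |
    | FA0( q ) | No Diagonal FA | $\frac{q}{2}(2t-q+1)$ | $\sum_{k=1}^{\min(i,j,q)} \lambda_{ik} \lambda_{jk}$ |
    | FA1( q ) | Equal Diagonal FA | $\frac{q}{2}(2t-q+1)+1$ | $\sum_{k=1}^{\min(i,j,q)} \lambda_{ik} \lambda_{jk} + \sigma^2 1(i = j)$ |
    | HF | Huynh-Feldt | $t+1$ | $(\sigma_i^2 + \sigma_j^2)/2 + \lambda 1(i \neq j)$ |
    | LIN( q ) | General Linear | $q$ | $\sum_{k=1}^{q} \theta_k (A_k)_{ij}$ |
    | TOEP | Toeplitz | $t$ | $\sigma_{\lvert i-j \rvert + 1}$ |
    | TOEP( q ) | Banded Toeplitz | $q$ | $\sigma_{\lvert i-j \rvert + 1} 1(\lvert i-j \rvert < q)$ |
    | TOEPH | Heterogeneous TOEP | $2t-1$ | $\sigma_i \sigma_j \rho_{\lvert i-j \rvert}$ |
    | TOEPH( q ) | Banded Hetero TOEP | $t+q-1$ | $\sigma_i \sigma_j \rho_{\lvert i-j \rvert} 1(\lvert i-j \rvert < q)$ |
    | UN | Unstructured | $t(t+1)/2$ | $\sigma_{ij}$ |
    | UN( q ) | Banded | $\frac{q}{2}(2t-q+1)$ | $\sigma_{ij} 1(\lvert i-j \rvert < q)$ |
    | UNR | Unstructured Corrs | $t(t+1)/2$ | $\sigma_i \sigma_j \rho_{\max(i,j)\min(i,j)}$ |
    | UNR( q ) | Banded Correlations | $\frac{q}{2}(2t-q+1)$ | $\sigma_i \sigma_j \rho_{\max(i,j)\min(i,j)} 1(\lvert i-j \rvert < q)$ |
    | UN@AR(1) | Direct Product AR(1) | $t_1(t_1+1)/2+1$ | $\sigma_{i_1 j_1} \rho^{\lvert i_2 - j_2 \rvert}$ |
    | UN@CS | Direct Product CS | $t_1(t_1+1)/2+1$ | $\sigma_{i_1 j_1}$ if $i_2 = j_2$; $\sigma^2 \sigma_{i_1 j_1}$ if $i_2 \neq j_2$, with $0 \le \sigma^2 \le 1$ |
    | UN@UN | Direct Product UN | $t_1(t_1+1)/2 + t_2(t_2+1)/2 - 1$ | $\sigma_{1,i_1 j_1} \sigma_{2,i_2 j_2}$ |
    | VC | Variance Components | $q$ | $\sigma_k^2 1(i = j)$ and i corresponds to kth effect |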

  • In Table 46.5, Parms is the number of covariance parameters in the structure, t is the overall dimension of the covariance matrix, and $1(A)$ equals 1 when A is true and 0 otherwise. For example, $1(i = j)$ equals 1 when $i = j$ and 0 otherwise, and $1(|i - j| < q)$ equals 1 when $|i - j| < q$ and 0 otherwise. For the TOEPH structures, $\rho_0 = 1$, and for the UNR structures, $\rho_{ii} = 1$ for all i. For the direct product structures, the subscripts 1 and 2 refer to the first and second structure in the direct product, respectively, and $i_1 = \mathrm{int}((i + t_2 - 1)/t_2)$, $j_1 = \mathrm{int}((j + t_2 - 1)/t_2)$, $i_2 = \mathrm{mod}(i - 1, t_2) + 1$, and $j_2 = \mathrm{mod}(j - 1, t_2) + 1$.
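
  • As a worked example of the direct product index mapping, take a second structure of dimension $t_2 = 3$ (an illustrative choice) and the element in row $i = 5$ and column $j = 2$ of the overall matrix:

```latex
% Direct product index mapping for t_2 = 3 and element (i, j) = (5, 2)
\begin{aligned}
i_1 &= \operatorname{int}\!\left(\tfrac{5 + 3 - 1}{3}\right) = \operatorname{int}(2.33\ldots) = 2, &
i_2 &= \operatorname{mod}(5 - 1,\, 3) + 1 = 2, \\
j_1 &= \operatorname{int}\!\left(\tfrac{2 + 3 - 1}{3}\right) = \operatorname{int}(1.33\ldots) = 1, &
j_2 &= \operatorname{mod}(2 - 1,\, 3) + 1 = 2 .
\end{aligned}
```

  • The (5, 2) element therefore combines row 2, column 1 of the first matrix with row 2, column 2 of the second. Under a Kronecker product with an AR(1) second factor, this would be $\sigma_{21}\rho^{|2-2|} = \sigma_{21}$.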

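    Table 46.6: Spatial Covariance Structures

    | Structure | Description | Parms | (i,j)th element |
    |---|---|---|---|
    | SP(EXP)( c-list ) | Exponential | $2$ | $\sigma^2 \exp\{-d_{ij}/\theta\}$ |
    | SP(EXPA)( c-list ) | Anisotropic Exponential | $2c+1$ | $\sigma^2 \prod_{k=1}^{c} \exp\{-\theta_k d(i,j,k)^{p_k}\}$ |
    | SP(EXPGA)( c1 c2 ) | 2D Exponential, Geometrically Anisotropic | $4$ | $\sigma^2 \exp\{-d_{ij}(\theta,\lambda)/\rho\}$ |
    | SP(GAU)( c-list ) | Gaussian | $2$ | $\sigma^2 \exp\{-d_{ij}^2/\rho^2\}$ |
    | SP(GAUGA)( c1 c2 ) | 2D Gaussian, Geometrically Anisotropic | $4$ | $\sigma^2 \exp\{-d_{ij}(\theta,\lambda)^2/\rho^2\}$ |
    | SP(LIN)( c-list ) | Linear | $2$ | $\sigma^2 (1 - \rho d_{ij}) 1(\rho d_{ij} \le 1)$ |
    | SP(LINL)( c-list ) | Linear log | $2$ | $\sigma^2 (1 - \rho \log(d_{ij})) 1(\rho \log(d_{ij}) \le 1, d_{ij} > 0)$ |
    | SP(MATERN)( c-list ) | Matérn | $3$ | $\sigma^2 \frac{1}{\Gamma(\nu)} \left(\frac{d_{ij}}{2\rho}\right)^{\nu} 2K_\nu(d_{ij}/\rho)$ |
    | SP(MATHSW)( c-list ) | Matérn (Handcock-Stein-Wallis) | $3$ | $\sigma^2 \frac{1}{\Gamma(\nu)} \left(\frac{d_{ij}\sqrt{\nu}}{\rho}\right)^{\nu} 2K_\nu\left(\frac{2 d_{ij}\sqrt{\nu}}{\rho}\right)$ |
    | SP(POW)( c-list ) | Power | $2$ | $\sigma^2 \rho^{d_{ij}}$ |
    | SP(POWA)( c-list ) | Anisotropic Power | $c+1$ | $\sigma^2 \rho_1^{d(i,j,1)} \rho_2^{d(i,j,2)} \cdots \rho_c^{d(i,j,c)}$ |
    | SP(SPH)( c-list ) | Spherical | $2$ | $\sigma^2 \left[1 - \frac{3 d_{ij}}{2\rho} + \frac{d_{ij}^3}{2\rho^3}\right] 1(d_{ij} \le \rho)$ |
    | SP(SPHGA)( c1 c2 ) | 2D Spherical, Geometrically Anisotropic | $4$ | $\sigma^2 \left[1 - \frac{3 d_{ij}(\theta,\lambda)}{2\rho} + \frac{d_{ij}(\theta,\lambda)^3}{2\rho^3}\right] 1(d_{ij}(\theta,\lambda) \le \rho)$ |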

  • In Table 46.6, c-list contains the names of the numeric variables used as coordinates of the location of the observation in space, and $d_{ij}$ is the Euclidean distance between the ith and jth vectors of these coordinates, which correspond to the ith and jth observations in the input data set. For SP(POWA) and SP(EXPA), c is the number of coordinates, and $d(i,j,k)$ is the absolute distance between the kth coordinate, $k = 1,\ldots,c$, of the ith and jth observations in the input data set. For the geometrically anisotropic structures SP(EXPGA), SP(GAUGA), and SP(SPHGA), exactly two spatial coordinate variables must be specified as c1 and c2. Geometric anisotropy is corrected by applying a rotation $\theta$ and scaling $\lambda$ to the coordinate system, and $d_{ij}(\theta, \lambda)$ represents the Euclidean distance between two points in the transformed space. SP(MATERN) and SP(MATHSW) represent covariance structures in a class defined by Matérn (refer to Matérn 1986, Handcock and Stein 1993, Handcock and Wallis 1994). The function $K_\nu$ is the modified Bessel function of the second kind of (real) order $\nu > 0$; the parameter $\nu$ governs the smoothness of the process (see below for more details).

  • Table 46.7 lists some examples of the structures in Table 46.5 and Table 46.6.

    Table 46.7: Covariance Structure Examples

    | Description | Structure |
    |---|---|
    | Variance Components | VC (default) |
    | Compound Symmetry | CS |
    | Unstructured | UN |
    | Banded Main Diagonal | UN(1) |
    | First-Order Autoregressive | AR(1) |
    | Toeplitz | TOEP |
    | Toeplitz with Two Bands | TOEP(2) |
    | Spatial Power | SP(POW)(c) |
    | Heterogeneous AR(1) | ARH(1) |
    | First-Order Autoregressive Moving-Average | ARMA(1,1) |
    | Heterogeneous CS | CSH |
    | First-Order Factor Analytic | FA(1) |
    | Huynh-Feldt | HF |
    | First-Order Ante-dependence | ANTE(1) |
    | Heterogeneous Toeplitz | TOEPH |
    | Unstructured Correlations | UNR |
    | Direct Product AR(1) | UN@AR(1) |

    (The Example column of this table contained matrix images, which are not reproduced here.)
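
    As an illustration of how a structure translates into a matrix, the following sketch writes out two cases for $t = 4$, assuming the usual element forms $\sigma^2\rho^{|i-j|}$ for AR(1) and $\sigma_1 + \sigma^2 1(i=j)$ for CS. The dimension 4 is an arbitrary choice.

```latex
% AR(1) and CS covariance matrices for t = 4
\text{AR(1):}\quad
\sigma^2
\begin{bmatrix}
1      & \rho   & \rho^2 & \rho^3 \\
\rho   & 1      & \rho   & \rho^2 \\
\rho^2 & \rho   & 1      & \rho   \\
\rho^3 & \rho^2 & \rho   & 1
\end{bmatrix}
\qquad
\text{CS:}\quad
\begin{bmatrix}
\sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1
\end{bmatrix}
```

    Both structures use two parameters regardless of $t$; they differ in that the AR(1) covariance decays with the lag $|i-j|$, whereas the CS covariance is constant off the diagonal.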

  • The following provides some further information about these covariance structures:

    TYPE=ANTE(1)

    specifies the first-order antedependence structure (refer to Kenward 1987, Patel 1991, and Macchiavelli and Arnold 1994). In Table 46.5, $\sigma_i^2$ is the ith variance parameter, and $\rho_k$ is the kth autocorrelation parameter satisfying $|\rho_k| < 1$.

    TYPE=AR(1)

    specifies a first-order autoregressive structure. PROC MIXED imposes the constraint $|\rho| < 1$ for stationarity.

    TYPE=ARH(1)

    specifies a heterogeneous first-order autoregressive structure. As with TYPE=AR(1), PROC MIXED imposes the constraint $|\rho| < 1$ for stationarity.

    TYPE=ARMA(1,1)

    specifies the first-order autoregressive moving average structure. In Table 46.5, $\rho$ is the autoregressive parameter, $\gamma$ models a moving average component, and $\sigma^2$ is the residual variance. In the notation of Fuller (1976, p. 68), $\rho = \theta_1$ and

        $$\gamma = \frac{(1 + b_1\theta_1)(\theta_1 + b_1)}{1 + b_1^2 + 2 b_1 \theta_1}$$

    The example in Table 46.7 and $|b_1| < 1$ imply that

        $$b_1 = \frac{\beta - \sqrt{\beta^2 - 4\alpha^2}}{2\alpha}$$

    where $\alpha = \gamma - \rho$ and $\beta = 1 + \rho^2 - 2\gamma\rho$. PROC MIXED imposes the constraints $|\rho| < 1$ and $|\gamma| < 1$ for stationarity, although for some values of $\rho$ and $\gamma$ in this region the resulting covariance matrix is not positive definite. When the estimated value of $\rho$ becomes negative, the computed covariance is multiplied by $\cos(\pi d_{ij})$ to account for the negativity.
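
    As a numerical check of the relation between $(\rho, \gamma)$ and the moving average coefficient $b_1$, the following sketch uses the illustrative values $\rho = 0.5$ and $\gamma = 0.6$. It assumes the quadratic $\alpha b_1^2 - \beta b_1 + \alpha = 0$, which follows from setting $\gamma$ equal to the lag-one autocorrelation of an ARMA(1,1) process with $\theta_1 = \rho$.

```latex
% Recover b_1 from rho = 0.5 and gamma = 0.6, then verify gamma
\begin{aligned}
\alpha &= \gamma - \rho = 0.1, \qquad
\beta = 1 + \rho^2 - 2\gamma\rho = 1 + 0.25 - 0.6 = 0.65, \\
b_1 &= \frac{\beta - \sqrt{\beta^2 - 4\alpha^2}}{2\alpha}
     = \frac{0.65 - \sqrt{0.4225 - 0.04}}{0.2}
     \approx \frac{0.65 - 0.6185}{0.2} \approx 0.158, \\
\gamma &= \frac{(1 + b_1\rho)(\rho + b_1)}{1 + b_1^2 + 2 b_1 \rho}
     \approx \frac{(1.0788)(0.6577)}{1.1825} \approx 0.600 .
\end{aligned}
```

    The smaller root is the one taken because the two roots of the quadratic are reciprocals of each other, and only one of them satisfies $|b_1| < 1$.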

    TYPE=CS

    specifies the compound-symmetry structure, which has constant variance and constant covariance.

    TYPE=CSH

    specifies the heterogeneous compound-symmetry structure. This structure has a different variance parameter for each diagonal element, and it uses the square roots of these parameters in the off-diagonal entries. In Table 46.5, $\sigma_i^2$ is the ith variance parameter, and $\rho$ is the correlation parameter satisfying $|\rho| < 1$.

    TYPE=FA( q )

    specifies the factor-analytic structure with q factors (Jennrich and Schluchter 1986). This structure is of the form $\Lambda\Lambda' + D$, where $\Lambda$ is a $t \times q$ rectangular matrix and D is a $t \times t$ diagonal matrix with t different parameters. When $q > 1$, the elements of $\Lambda$ in its upper right-hand corner (that is, the elements in the ith row and jth column for $j > i$) are set to zero to fix the rotation of the structure.

    TYPE=FA0( q )

    is similar to the FA( q ) structure except that no diagonal matrix D is included. When q < t , that is, when the number of factors is less than the dimension of the matrix, this structure is nonnegative definite but not of full rank. In this situation, you can use it for approximating an unstructured G matrix in the RANDOM statement or for combining with the LOCAL option in the REPEATED statement. When q = t , you can use this structure to constrain G to be nonnegative definite in the RANDOM statement.

    TYPE=FA1( q )

    is similar to the FA( q ) structure except that all of the elements in D are constrained to be equal. This offers a useful and more parsimonious alternative to the full factor-analytic structure.

    TYPE=HF

    specifies the Huynh-Feldt covariance structure (Huynh and Feldt 1970). This structure is similar to the CSH structure in that it has the same number of parameters and heterogeneity along the main diagonal. However, it constructs the off-diagonal elements by taking arithmetic rather than geometric means.

    You can perform a likelihood ratio test of the Huynh-Feldt conditions by running PROC MIXED twice, once with TYPE=HF and once with TYPE=UN, and then subtracting their respective values of −2 times the maximized log likelihood.

    If PROC MIXED does not converge under your Huynh-Feldt model, you can specify your own starting values with the PARMS statement. The default MIVQUE(0) starting values can sometimes be poor for this structure. A good choice for starting values is often the parameter estimates corresponding to an initial fit using TYPE=CS.
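
    The likelihood ratio test described above can be sketched as follows. The data set long, the variables person, time, and y, and the output data set names are hypothetical; time is assumed to have four levels, and the first row of the FitStatistics table is assumed to hold −2 times the (restricted) log likelihood.

```sas
/* Hypothetical names; time is assumed to have t = 4 levels */
proc mixed data=long;
   class person time;
   model y = time;
   repeated time / type=hf subject=person;
   ods output FitStatistics=fit_hf;
run;

proc mixed data=long;
   class person time;
   model y = time;
   repeated time / type=un subject=person;
   ods output FitStatistics=fit_un;
run;

data lrt;
   /* Keep only the first row of each table: -2 (Res) Log Likelihood */
   merge fit_hf(obs=1 rename=(Value=m2ll_hf))
         fit_un(obs=1 rename=(Value=m2ll_un));
   chisq = m2ll_hf - m2ll_un;
   /* UN parameters t(t+1)/2 minus HF parameters t+1, for t = 4 */
   df = 4*(4+1)/2 - (4+1);
   p  = 1 - probchi(chisq, df);
run;

proc print data=lrt;
   var m2ll_hf m2ll_un chisq df p;
run;
```

    Both fits use the same fixed effects, so the comparison of restricted likelihoods is meaningful; the degrees of freedom come from the Parms column of Table 46.5.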

    TYPE=LIN( q )

    specifies the general linear covariance structure with q parameters (Helms and Edwards 1991). This structure consists of a linear combination of known matrices that are input with the LDATA= option. This structure is very general, and you need to make sure that the variance matrix is positive definite. By default, PROC MIXED sets the initial values of the parameters to 1. You can use the PARMS statement to specify other initial values.

    TYPE=SIMPLE

    is an alias for TYPE=VC.

    TYPE=SP(EXPA)( c-list )

    specifies the spatial anisotropic exponential structure, where c-list is a list of variables indicating the coordinates. This structure has (i,j)th element equal to

        $$\sigma^2 \prod_{k=1}^{c} \exp\{-\theta_k \, d(i,j,k)^{p_k}\}$$

    where c is the number of coordinates and $d(i,j,k)$ is the absolute distance between the kth coordinate ($k = 1,\ldots,c$) of the ith and jth observations in the input data set. There are $2c+1$ parameters to be estimated: $\theta_k$, $p_k$ ($k = 1,\ldots,c$), and $\sigma^2$.

    You may want to constrain some of the EXPA parameters to known values. For example, suppose you have three coordinate variables C1, C2, and C3 and you want to constrain the powers $p_k$ to equal 2, as in Sacks et al. (1989). Suppose further that you want to model covariance across the entire input data set and you suspect the $\theta_k$ and $\sigma^2$ estimates are close to 3, 4, 5, and 1, respectively. Then specify

      repeated / type=sp(expa)(c1 c2 c3) subject=intercept;
      parms (3) (4) (5) (2) (2) (2) (1) / hold=4,5,6;

    TYPE=SP(EXPGA)( c 1 c 2 )

     

    TYPE=SP(GAUGA)( c 1 c 2 )

     

    TYPE=SP(SPHGA)( c 1 c 2 )

    specify modifications of the isotropic SP(EXP), SP(SPH), and SP(GAU) covariance structures that allow for geometric anisotropy in two dimensions. The coordinates are specified by the variables c1 and c2 .

    If the spatial process is geometrically anisotropic in $c = [c_{i1}, c_{i2}]$, then it is isotropic in the coordinate system

        $$A c = \begin{bmatrix} 1 & 0 \\ 0 & \lambda \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} c = c^*$$

    for a properly chosen angle $\theta$ and scaling factor $\lambda$. Elliptical iso-correlation contours are thereby transformed to spherical contours, adding two parameters to the respective isotropic covariance structures. Euclidean distances (see Table 46.6 on page 2722) are expressed in terms of $c^*$.

    The angle $\theta$ of the clockwise rotation is reported in radians, $0 \le \theta \le 2\pi$. The scaling parameter $\lambda$ represents the ratio of the range parameters in the direction of the major and minor axis of the correlation contours. In other words, following a rotation of the coordinate system by angle $\theta$, isotropy is achieved by compressing or magnifying distances in one coordinate by the factor $\lambda$.

    Fixing $\lambda = 1.0$ reduces the models to isotropic ones for any angle of rotation. If the scaling parameter is held constant at 1.0, you should also hold constant the angle of rotation, for example,

      repeated / type=sp(expga)(gxc gyc) subject=intercept;
      parms (6) (1.0) (0.0) (1) / hold=2,3;

    If $\lambda$ is fixed at any other value than 1.0, the angle of rotation can be estimated. Specifying a starting grid of angles and scaling factors can considerably improve the convergence properties of the optimization algorithm for these models. Only a single random effect with geometrically anisotropic structure is permitted.
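
    A starting grid of scaling factors and angles might be specified as in the following sketch. The data set field and the response yield are hypothetical, the grid values are arbitrary, and the parameter order (range, scaling factor, rotation angle, residual variance) is an assumption inferred from the PARMS example above.

```sas
/* Hypothetical data set and variables: field, gxc, gyc, yield */
proc mixed data=field;
   model yield = / s;
   repeated / type=sp(expga)(gxc gyc) subject=intercept;
   /* Assumed order: range, scaling factor, rotation angle, variance.
      PROC MIXED evaluates every combination in the grid (2 x 3 = 6 here)
      and starts the optimization from the best one */
   parms (6) (0.5, 2.0) (0.0, 0.8, 1.6) (1);
run;
```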

    TYPE=SP(MATERN)( c-list )

     

    TYPE=SP(MATHSW)( c-list )

    specifies covariance structures in the Matérn class of covariance functions (Matérn 1986). Two observations for the same subject (block of R ) that are Euclidean distance $d_{ij}$ apart have covariance

        $$\sigma^2 \frac{1}{\Gamma(\nu)} \left(\frac{d_{ij}}{2\rho}\right)^{\nu} 2 K_\nu\!\left(\frac{d_{ij}}{\rho}\right)$$

    where $K_\nu$ is the modified Bessel function of the second kind of (real) order $\nu > 0$. The smoothness (continuity) of a stochastic process with covariance function in this class increases with $\nu$. The Matérn class thus enables data-driven estimation of the smoothness properties. The covariance is identical to the exponential model for $\nu = 0.5$ (TYPE=SP(EXP)( c-list )), while for $\nu = 1$ the model advocated by Whittle (1954) results. As $\nu \to \infty$ the model approaches the gaussian covariance structure (TYPE=SP(GAU)( c-list )).

    The MATHSW structure represents the Matérn class in the parameterization of Handcock and Stein (1993) and Handcock and Wallis (1994),

        $$\sigma^2 \frac{1}{\Gamma(\nu)} \left(\frac{d_{ij}\sqrt{\nu}}{\rho}\right)^{\nu} 2 K_\nu\!\left(\frac{2 d_{ij}\sqrt{\nu}}{\rho}\right)$$

    Since computation of the function $K_\nu$ and its derivatives is numerically very intensive, fitting models with Matérn covariance structures can be more time consuming than for other spatial covariance structures. Good starting values are essential.
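
    The reduction to the exponential model at $\nu = 0.5$ can be checked directly. The sketch below assumes the MATERN covariance form $\sigma^2 \frac{1}{\Gamma(\nu)} \left(\frac{d}{2\rho}\right)^{\nu} 2K_\nu(d/\rho)$ and uses the standard identities $\Gamma(1/2) = \sqrt{\pi}$ and $K_{1/2}(z) = \sqrt{\pi/(2z)}\, e^{-z}$, writing $d$ for $d_{ij}$.

```latex
% Matern covariance at nu = 1/2 reduces to the exponential covariance
\begin{aligned}
\sigma^2 \frac{1}{\Gamma(1/2)} \left(\frac{d}{2\rho}\right)^{1/2} 2 K_{1/2}\!\left(\frac{d}{\rho}\right)
 &= \sigma^2 \frac{1}{\sqrt{\pi}} \sqrt{\frac{d}{2\rho}} \; 2 \sqrt{\frac{\pi \rho}{2 d}} \; e^{-d/\rho} \\
 &= \sigma^2 \cdot 2 \sqrt{\frac{d}{2\rho} \cdot \frac{\rho}{2 d}} \; e^{-d/\rho} \\
 &= \sigma^2 \cdot 2 \cdot \tfrac{1}{2} \; e^{-d/\rho}
  = \sigma^2 e^{-d/\rho} .
\end{aligned}
```

    The result is an exponential covariance with range parameter $\rho$, which is why an SP(EXP) fit can supply reasonable starting values for the variance and range of a Matérn model.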

    TYPE=SP(POW)( c-list )

     

    TYPE=SP(POWA)( c-list )

    specifies the spatial power structures. When the estimated value of $\rho$ becomes negative, the computed covariance is multiplied by $\cos(\pi d_{ij})$ to account for the negativity.

    TYPE=TOEP<( q )>

    specifies a banded Toeplitz structure. This can be viewed as a moving-average structure with order equal to $q - 1$. The TYPE=TOEP option is a full Toeplitz matrix, which can be viewed as an autoregressive structure with order equal to the dimension of the matrix. The specification TYPE=TOEP(1) is the same as $\sigma^2 I$, where I is an identity matrix, and it can be useful for specifying the same variance component for several effects.
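
    A sketch of the common-variance use of TYPE=TOEP(1) follows. The data set mydata and the variables y, x, and z1 through z3 (user-built continuous columns of Z) are hypothetical.

```sas
/* Hypothetical data set and variables: mydata, y, x, z1, z2, z3 */
proc mixed data=mydata;
   model y = x;
   /* The default TYPE=VC would estimate three variance components;
      TOEP(1) gives z1, z2, and z3 a single common variance */
   random z1 z2 z3 / type=toep(1);
run;
```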

    TYPE=TOEPH<( q )>

    specifies a heterogeneous banded Toeplitz structure. In Table 46.5, $\sigma_i^2$ is the ith variance parameter and $\rho_j$ is the jth correlation parameter satisfying $|\rho_j| < 1$. If you specify the order parameter q , then PROC MIXED estimates only the first q bands of the matrix, setting all higher bands equal to 0. The option TOEPH(1) is equivalent to both the UN(1) and UNR(1) options.

    TYPE=UN<( q )>

    specifies a completely general (unstructured) covariance matrix parameterized directly in terms of variances and covariances. The variances are constrained to be nonnegative, and the covariances are unconstrained. This structure is not constrained to be nonnegative definite in order to avoid nonlinear constraints; however, you can use the FA0 structure if you want this constraint to be imposed by a Cholesky factorization. If you specify the order parameter q , then PROC MIXED estimates only the first q bands of the matrix, setting all higher bands equal to 0.

    TYPE=UNR<( q )>

    specifies a completely general (unstructured) covariance matrix parameterized in terms of variances and correlations. This structure fits the same model as the TYPE=UN( q ) option but with a different parameterization. The i th variance parameter is σ_i². The parameter ρ_jk is the correlation between the j th and k th measurements; it satisfies |ρ_jk| < 1. If you specify the order parameter q , then PROC MIXED estimates only the first q bands of the matrix, setting all higher bands equal to zero.
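    The link between the two parameterizations can be written out for three measurements. This is a sketch in assumed notation, where σ_jk denotes the TYPE=UN covariance between measurements j and k.

    ```latex
    % Each UN covariance is the product of a correlation and two standard deviations
    \[
    \sigma_{jk} = \rho_{jk}\,\sigma_j\,\sigma_k , \qquad |\rho_{jk}| < 1
    \]
    % Resulting UNR matrix for three measurements
    \[
    \mathbf{R}_{\mathrm{UNR}} =
    \begin{pmatrix}
    \sigma_1^2                 & \rho_{21}\sigma_2\sigma_1 & \rho_{31}\sigma_3\sigma_1 \\
    \rho_{21}\sigma_2\sigma_1  & \sigma_2^2                & \rho_{32}\sigma_3\sigma_2 \\
    \rho_{31}\sigma_3\sigma_1  & \rho_{32}\sigma_3\sigma_2 & \sigma_3^2
    \end{pmatrix}
    \]
    ```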

    TYPE=UN@AR(1)

     

    TYPE=UN@CS

     

    TYPE=UN@UN

    specify direct (Kronecker) product structures designed for multivariate repeated measures (refer to Galecki 1994). These structures are constructed by taking the Kronecker product of an unstructured matrix (modeling covariance across the multivariate observations) with an additional covariance matrix (modeling covariance across time or another factor). The upper left value in the second matrix is constrained to equal 1 to identify the model. Refer to SAS/IML User's Guide, First Edition, for more details on direct products.

    To use these structures in the REPEATED statement, you must specify two distinct REPEATED effects, both of which must be included in the CLASS statement. The first effect indicates the multivariate observations, and the second identifies the levels of time or some additional factor. Note that the input data set must still be constructed in univariate format; that is, all dependent observations are still listed observation-wise in one single variable. Although this construction provides for general modeling possibilities, it forces you to construct variables indicating both dimensions of the Kronecker product.

    For example, suppose your observed data consist of heights and weights of several children measured over several successive years.

    Your input data set should then contain variables similar to the following:

    • Y , all of the heights and weights, with a separate observation for each

    • Var , indicating whether the measurement is a height or a weight

    • Year , indicating the year of measurement

    • Child , indicating the child on which the measurement was taken

    Your PROC MIXED code for a Kronecker AR(1) structure across years would then be

      proc mixed;
         class Var Year Child;
         model Y = Var Year Var*Year;
         repeated Var Year / type=un@ar(1)
                             subject=Child;
      run;

    You will nearly always want to model different means for the multivariate observations, hence the inclusion of Var in the MODEL statement. The preceding mean model consists of cell means for all combinations of Var and Year.
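    As an illustration of what the Kronecker structure implies for one child, consider two responses (height and weight) measured in three years. The symbols Σ, A, σ_11, σ_12, σ_22, and ρ are notation assumed for this sketch, and the block layout assumes that a child's observations are ordered by Var and then by Year.

    ```latex
    % Unstructured 2 x 2 matrix across the two responses
    \[
    \Sigma =
    \begin{pmatrix}
    \sigma_{11} & \sigma_{12} \\
    \sigma_{12} & \sigma_{22}
    \end{pmatrix},
    \qquad
    % AR(1) correlation matrix across the three years (upper left value is 1)
    A =
    \begin{pmatrix}
    1      & \rho & \rho^2 \\
    \rho   & 1    & \rho   \\
    \rho^2 & \rho & 1
    \end{pmatrix}
    \]
    % 6 x 6 covariance matrix for one child under TYPE=UN@AR(1)
    \[
    \Sigma \otimes A =
    \begin{pmatrix}
    \sigma_{11} A & \sigma_{12} A \\
    \sigma_{12} A & \sigma_{22} A
    \end{pmatrix}
    \]
    ```

    Fixing the upper left value of A at 1 is what identifies the model: without it, multiplying Σ by a constant and dividing A by the same constant would leave the product unchanged.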

    TYPE=VC

    specifies standard variance components and is the default structure for both the RANDOM and REPEATED statements. In the RANDOM statement, a distinct variance component is assigned to each effect. In the REPEATED statement, this structure is usually used only with the GROUP= option to specify a heterogeneous variance model.
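    A minimal sketch of both uses follows, assuming a hypothetical data set Trial with a response Y, a blocking factor Block, and a treatment factor Trt. TYPE=VC is written out here for clarity even though it is the default.

    ```sas
    proc mixed data=Trial;            /* Trial is a hypothetical data set */
       class Block Trt;
       model Y = Trt;
       /* one variance component for the Block effect */
       random Block / type=vc;
       /* a separate residual variance for each level of Trt */
       repeated / type=vc group=Trt;
    run;
    ```

    Omitting the GROUP= option from the REPEATED statement in this sketch would reduce the residual covariance to a single common variance.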

  • Jennrich and Schluchter (1986) provide general information about the use of covariance structures, and Wolfinger (1996) presents details about many of the heterogeneous structures. Marx and Thompson (1987), Cressie (1991), and Zimmerman and Harville (1991) discuss spatial structures.

WEIGHT Statement

  • WEIGHT variable ;

If you do not specify a REPEATED statement, the WEIGHT statement operates exactly like the one in PROC GLM. In this case PROC MIXED replaces X′X and Z′Z with X′WX and Z′WZ , where W is the diagonal weight matrix. If you specify a REPEATED statement, then the WEIGHT statement replaces R with LRL , where L is a diagonal matrix with elements W^(−1/2). Observations with nonpositive or missing weights are not included in the PROC MIXED analysis.
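As a sketch of typical usage, the following assumes a hypothetical data set CellMeans in which each observation is a mean YBar of N raw measurements, so that N serves as the weight. The data set and variable names are illustrative only.

```sas
proc mixed data=CellMeans;        /* CellMeans is a hypothetical data set */
   class Block Trt;
   model YBar = Trt;
   random Block;
   /* observations with N <= 0 or a missing N are excluded */
   weight N;
run;
```

Because this sketch has no REPEATED statement, the weights enter through X′WX and Z′WZ as described above.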




SAS.STAT 9.1 Users Guide (Vol. 4)
Year: 2004