Chapter 46: The MIXED Procedure | SAS/STAT 9.1 User's Guide (Vol. 4)

Overview

The MIXED procedure fits a variety of mixed linear models to data and enables you to use these fitted models to make statistical inferences about the data. A mixed linear model is a generalization of the standard linear model used in the GLM procedure, the generalization being that the data are permitted to exhibit correlation and nonconstant variability. The mixed linear model, therefore, provides you with the flexibility of modeling not only the means of your data (as in the standard linear model) but their variances and covariances as well.

The primary assumptions underlying the analyses performed by PROC MIXED are as follows:

The data are normally distributed (Gaussian).
The means (expected values) of the data are linear in terms of a certain set of parameters.
The variances and covariances of the data are in terms of a different set of parameters, and they exhibit a structure matching one of those available in PROC MIXED.

Since Gaussian data can be modeled entirely in terms of their means and variances/covariances, the two sets of parameters in a mixed linear model actually specify the complete probability distribution of the data. The parameters of the mean model are referred to as fixed-effects parameters, and the parameters of the variance-covariance model are referred to as covariance parameters.

The fixed-effects parameters are associated with known explanatory variables, as in the standard linear model. These variables can be either qualitative (as in the traditional analysis of variance) or quantitative (as in standard linear regression). However, the covariance parameters are what distinguishes the mixed linear model from the standard linear model.

The need for covariance parameters arises quite frequently in applications, the following being the two most typical scenarios:

The experimental units on which the data are measured can be grouped into clusters, and the data from a common cluster are correlated.
Repeated measurements are taken on the same experimental unit, and these repeated measurements are correlated or exhibit variability that changes.

The first scenario can be generalized to include one set of clusters nested within another. For example, if students are the experimental unit, they can be clustered into classes, which in turn can be clustered into schools. Each level of this hierarchy can introduce an additional source of variability and correlation. The second scenario occurs in longitudinal studies, where repeated measurements are taken over time. Alternatively, the repeated measures could be spatial or multivariate in nature.

PROC MIXED provides a variety of covariance structures to handle the previous two scenarios. The most common of these structures arises from the use of random-effects parameters, which are additional unknown random variables assumed to impact the variability of the data. The variances of the random-effects parameters, commonly known as variance components, become the covariance parameters for this particular structure. Traditional mixed linear models contain both fixed- and random-effects parameters, and, in fact, it is the combination of these two types of effects that led to the name mixed model. PROC MIXED fits not only these traditional variance component models but numerous other covariance structures as well.
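To make the clustering scenario concrete, the following Python sketch (an illustration added here, not SAS code or PROC MIXED output) shows how variance components translate into correlations between students. It assumes independent random intercepts for schools and for classes nested within schools, plus independent residual error. The function names and the variance component values are hypothetical.

```python
def student_covariance(a, b, var_school, var_class, var_residual):
    """Covariance between the responses of students a and b.

    Each student is a (school, class_id, student_id) tuple. Assumes
    independent random intercepts for schools and for classes nested
    within schools, plus independent residual error.
    """
    if a == b:
        # Variance of a single response: all three sources contribute.
        return var_school + var_class + var_residual
    cov = 0.0
    if a[0] == b[0]:
        cov += var_school          # shared school effect
        if a[1] == b[1]:
            cov += var_class       # shared class effect (same school only)
    return cov


def student_correlation(a, b, var_school, var_class, var_residual):
    """Correlation implied by the covariance above."""
    total = var_school + var_class + var_residual
    return student_covariance(a, b, var_school, var_class, var_residual) / total


# Hypothetical variance components, chosen only for illustration.
components = {"var_school": 2.0, "var_class": 1.0, "var_residual": 5.0}

same_class = student_correlation(("S1", "C1", 1), ("S1", "C1", 2), **components)
same_school = student_correlation(("S1", "C1", 1), ("S1", "C2", 3), **components)
different_school = student_correlation(("S1", "C1", 1), ("S2", "C1", 4), **components)
```

Under these assumptions, students who share more levels of the hierarchy are more strongly correlated, and students in different schools are uncorrelated.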

PROC MIXED fits the structure you select to the data using the method of restricted maximum likelihood (REML), also known as residual maximum likelihood. It is here that the Gaussian assumption for the data is exploited. Other estimation methods are also available, including maximum likelihood and MIVQUE0. The details behind these estimation methods are discussed in subsequent sections.

Once a model has been fit to your data, you can use it to draw statistical inferences via both the fixed-effects and covariance parameters. PROC MIXED computes several different statistics suitable for generating hypothesis tests and confidence intervals. The validity of these statistics depends upon the mean and variance-covariance model you select, so it is important to choose the model carefully. Some of the output from PROC MIXED helps you assess your model and compare it with others.

Basic Features

PROC MIXED provides easy accessibility to numerous mixed linear models that are useful in many common statistical analyses. In the style of the GLM procedure, PROC MIXED fits the specified mixed linear model and produces appropriate statistics.

Some basic features of PROC MIXED are

covariance structures, including variance components, compound symmetry, unstructured, AR(1), Toeplitz, spatial, general linear, and factor analytic
GLM-type grammar, using MODEL, RANDOM, and REPEATED statements for model specification and CONTRAST, ESTIMATE, and LSMEANS statements for inferences
appropriate standard errors for all specified estimable linear combinations of fixed and random effects, and corresponding t- and F-tests
subject and group effects that enable blocking and heterogeneity, respectively
REML and ML estimation methods implemented with a Newton-Raphson algorithm
capacity to handle unbalanced data
ability to create a SAS data set corresponding to any table

PROC MIXED uses the Output Delivery System (ODS), a SAS subsystem that provides capabilities for displaying and controlling the output from SAS procedures. ODS enables you to convert any of the output from PROC MIXED into a SAS data set. See the 'ODS Table Names' section on page 2752.

Experimental graphics are now available with the MIXED procedure. For more information, see the 'ODS Graphics' section on page 2757.

Notation for the Mixed Model

This section introduces the mathematical notation used throughout this chapter to describe the mixed linear model. You should be familiar with basic matrix algebra (refer to Searle 1982). A more detailed description of the mixed model is contained in the 'Mixed Models Theory' section on page 2731.

A statistical model is a mathematical description of how data are generated. The standard linear model, as used by the GLM procedure, is one of the most common statistical models:

$$y = X\beta + \epsilon$$

In this expression, y represents a vector of observed data, $\beta$ is an unknown vector of fixed-effects parameters with known design matrix X, and $\epsilon$ is an unknown random error vector modeling the statistical noise around $X\beta$. The focus of the standard linear model is to model the mean of y by using the fixed-effects parameters $\beta$. The residual errors $\epsilon$ are assumed to be independent and identically distributed Gaussian random variables with mean 0 and variance $\sigma^2$.
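As a small illustration of the standard linear model (response equals design matrix times fixed-effects parameters, plus independent errors), the following Python sketch fits an intercept and a slope by ordinary least squares and estimates the residual variance. It is a simplified stand-in, not a description of how PROC GLM or PROC MIXED performs its computations. The data values and function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (intercept, slope) for y = b0 + b1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residual_variance(x, y, intercept, slope):
    """Unbiased estimate of the error variance: RSS / (n - 2)."""
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return rss / (len(x) - 2)


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
sigma2_hat = residual_variance(x, y, b0, b1)
```

Here the design matrix has a column of ones and a column holding x, so the fixed-effects vector has two elements, and a single variance parameter describes all of the noise.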

The mixed model generalizes the standard linear model as follows:

$$y = X\beta + Z\gamma + \epsilon$$

Here, $\gamma$ is an unknown vector of random-effects parameters with known design matrix Z, and $\epsilon$ is an unknown random error vector whose elements are no longer required to be independent and homogeneous.

To further develop this notion of variance modeling, assume that $\gamma$ and $\epsilon$ are Gaussian random variables that are uncorrelated and have expectations 0 and variances G and R, respectively. The variance of y is thus

$$V = ZGZ' + R$$

Note that, when $R = \sigma^2 I$ and $Z = 0$, the mixed model reduces to the standard linear model.
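The following Python sketch illustrates how the variance of y is assembled as ZGZ' + R for a simple random-intercept model with two clusters of two observations each. It is a minimal illustration under stated assumptions, not PROC MIXED's implementation: G and R are taken to be scaled identity matrices, and the cluster labels, variance values, and helper names are all hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def indicator_matrix(labels):
    """Design matrix Z with one 0/1 column per distinct cluster label."""
    levels = sorted(set(labels))
    return [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]


def scaled_identity(n, scale):
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def marginal_covariance(z, g, r):
    """Variance of y under the mixed model: Z G Z' + R."""
    zgzt = matmul(matmul(z, g), transpose(z))
    return [[zgzt[i][j] + r[i][j] for j in range(len(r))]
            for i in range(len(r))]


# Hypothetical setup: four observations in two clusters.
clusters = ["A", "A", "B", "B"]
sigma2_cluster = 4.0   # variance of the random cluster effects (G)
sigma2_error = 1.0     # residual variance (R)

Z = indicator_matrix(clusters)
G = scaled_identity(2, sigma2_cluster)
R = scaled_identity(4, sigma2_error)
V = marginal_covariance(Z, G, R)
```

With these inputs, V is block diagonal: observations in the same cluster share the covariance contributed by G, and observations in different clusters are uncorrelated.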

You can model the variance of the data, y, by specifying the structure (or form) of Z, G, and R. The model matrix Z is set up in the same fashion as X, the model matrix for the fixed-effects parameters. For G and R, you must select some covariance structure. Possible covariance structures include

variance components
compound symmetry (common covariance plus diagonal)
unstructured (general covariance)
autoregressive
spatial
general linear
factor analytic

By appropriately defining the model matrices X and Z, as well as the covariance structure matrices G and R, you can perform numerous mixed model analyses.
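To show what two of the listed structures look like as matrices, the following Python sketch builds a compound symmetry matrix (a common covariance plus a diagonal term) and a first-order autoregressive matrix. The parameterizations follow the usual textbook forms and are assumptions of this sketch. The function names and numeric values are hypothetical and are not PROC MIXED options.

```python
def compound_symmetry(n, common_cov, diagonal_add):
    """Common covariance everywhere, plus an extra term on the diagonal."""
    return [[common_cov + (diagonal_add if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]


def ar1(n, variance, rho):
    """First-order autoregressive: covariance decays as rho**|i - j|."""
    return [[variance * rho ** abs(i - j) for j in range(n)]
            for i in range(n)]


def unstructured_parameter_count(n):
    """Number of free parameters in a general n x n covariance matrix."""
    return n * (n + 1) // 2


# Hypothetical parameter values for three repeated measurements.
cs_matrix = compound_symmetry(3, common_cov=2.0, diagonal_add=1.0)
ar1_matrix = ar1(3, variance=4.0, rho=0.5)
n_unstructured = unstructured_parameter_count(3)
```

Both structured forms here use two parameters regardless of the number of measurements, whereas the unstructured form needs a separate parameter for every variance and covariance.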

PROC MIXED Contrasted with Other SAS Procedures

PROC MIXED is a generalization of the GLM procedure in the sense that PROC GLM fits standard linear models, and PROC MIXED fits the wider class of mixed linear models. Both procedures have similar CLASS, MODEL, CONTRAST, ESTIMATE, and LSMEANS statements, but their RANDOM and REPEATED statements differ (see the following paragraphs). Both procedures use the non-full-rank model parameterization, although the sorting of classification levels can differ between the two. PROC MIXED computes only Type I-Type III tests of fixed effects, while PROC GLM offers Types I-IV.

The RANDOM statement in PROC MIXED incorporates random effects constituting the $\gamma$ vector in the mixed model. However, in PROC GLM, effects specified in the RANDOM statement are still treated as fixed as far as the model fit is concerned, and they serve only to produce corresponding expected mean squares. These expected mean squares lead to the traditional ANOVA estimates of variance components. PROC MIXED computes REML and ML estimates of variance parameters, which are generally preferred to the ANOVA estimates (Searle 1988; Harville 1988; Searle, Casella, and McCulloch 1992). Optionally, PROC MIXED also computes MIVQUE0 estimates, which are similar to ANOVA estimates.
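The following Python sketch illustrates how ANOVA estimates of variance components arise from expected mean squares. It covers only the simplest case, a balanced one-way random-effects layout, in which the expected between-group mean square is the error variance plus n times the group variance. It is not a reproduction of PROC GLM output. The data and function name are hypothetical.

```python
def anova_variance_components(groups):
    """ANOVA (method-of-moments) estimates for a balanced one-way layout.

    groups: list of equal-length lists of observations, one per group.
    Returns (var_group, var_error). The group estimate can be negative,
    which is a known drawback of this estimation method.
    """
    a = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch handles balanced data only")

    grand_mean = sum(sum(g) for g in groups) / (a * n)
    group_means = [sum(g) / n for g in groups]

    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means)
                    for y in g)

    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (n - 1))

    # Equate mean squares to their expectations and solve.
    var_error = ms_within
    var_group = (ms_between - ms_within) / n
    return var_group, var_error


# Hypothetical balanced data: three groups, two observations each.
data = [[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]]
var_group, var_error = anova_variance_components(data)
```

The sketch rejects unbalanced input because the simple formula above relies on every group having the same number of observations.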

The REPEATED statement in PROC MIXED is used to specify covariance structures for repeated measurements on subjects, while the REPEATED statement in PROC GLM is used to specify various transformations with which to conduct the traditional univariate or multivariate tests. In repeated measures situations, the mixed model approach used in PROC MIXED is more flexible and more widely applicable than either the univariate or multivariate approaches. In particular, the mixed model approach provides a larger class of covariance structures and a better mechanism for handling missing values (Wolfinger and Chang 1995).

PROC MIXED subsumes the VARCOMP procedure. PROC MIXED provides a wide variety of covariance structures, while PROC VARCOMP estimates only simple random effects. PROC MIXED carries out several analyses that are absent in PROC VARCOMP, including the estimation and testing of linear combinations of fixed and random effects.

The ARIMA and AUTOREG procedures provide more time series structures than PROC MIXED, although they do not fit variance component models. The CALIS procedure fits general covariance matrices, but it does not allow fixed effects as does PROC MIXED. The LATTICE and NESTED procedures fit special types of mixed linear models that can also be handled in PROC MIXED, although PROC MIXED may run slower because of its more general algorithm. The TSCSREG procedure analyzes time-series cross-sectional data, and it fits some structures not available in PROC MIXED.