Syntax


The following statements are available in PROC GLM.

  • PROC GLM < options > ;

    • CLASS variables < / option > ;

    • MODEL dependents=independents < / options > ;

    • ABSORB variables ;

    • BY variables ;

    • FREQ variable ;

    • ID variables ;

    • WEIGHT variable ;

    • CONTRAST label effect values < ... effect values > < / options > ;

    • ESTIMATE label effect values < ... effect values > < / options > ;

    • LSMEANS effects < / options > ;

    • MANOVA < test-options >< / detail-options > ;

    • MEANS effects < / options > ;

    • OUTPUT < OUT= SAS-data-set > keyword=names < ... keyword=names > < / option > ;

    • RANDOM effects < / options > ;

    • REPEATED factor-specification < / options > ;

    • TEST < H= effects > E= effect < / options > ;

Although there are numerous statements and options available in PROC GLM, many applications use only a few of them. Often you can find the features you need by looking at an example or by quickly scanning through this section.

To use PROC GLM, the PROC GLM and MODEL statements are required. You can specify only one MODEL statement (in contrast to the REG procedure, for example, which allows several MODEL statements in the same PROC REG run). If your model contains classification effects, the classification variables must be listed in a CLASS statement, and the CLASS statement must appear before the MODEL statement. In addition, if you use a CONTRAST statement in combination with a MANOVA, RANDOM, REPEATED, or TEST statement, the CONTRAST statement must be entered first in order for the contrast to be included in the MANOVA, RANDOM, REPEATED, or TEST analysis.
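The ordering rules in the preceding paragraph can be sketched as a single PROC GLM step. This is a minimal illustration rather than a recommended analysis; the data set trial and its variables trt, y1, and y2 are hypothetical.

```sas
/* Hypothetical data: three treatments, two responses */
data trial;
   input trt $ y1 y2 @@;
   datalines;
a 10 21  a 12 19  a 11 22
b 14 25  b 15 27  b 13 24
c  9 18  c  8 20  c 10 17
;

proc glm data=trial;
   class trt;                       /* CLASS comes before MODEL           */
   model y1 y2 = trt;               /* the one and only MODEL statement   */
   contrast 'a vs b' trt 1 -1 0;    /* after MODEL, before MANOVA         */
   manova h=trt;                    /* the contrast above is tested here  */
run;
quit;
```

Moving the CONTRAST statement below the MANOVA statement would leave the contrast out of the multivariate analysis.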

The following table summarizes the positional requirements for the statements in the GLM procedure.

Table 32.1: Positional Requirements for PROC GLM Statements

Statement  Must Appear Before the                           Must Appear After the
---------  -----------------------------------------------  ----------------------------------
ABSORB     first RUN statement
BY         first RUN statement
CLASS      MODEL statement
CONTRAST   MANOVA, REPEATED, or RANDOM statement            MODEL statement
ESTIMATE                                                    MODEL statement
FREQ       first RUN statement
ID         first RUN statement
LSMEANS                                                     MODEL statement
MANOVA                                                      CONTRAST or MODEL statement
MEANS                                                       MODEL statement
MODEL      CONTRAST, ESTIMATE, LSMEANS, or MEANS statement  CLASS statement
OUTPUT                                                      MODEL statement
RANDOM                                                      CONTRAST or MODEL statement
REPEATED                                                    CONTRAST, MODEL, or TEST statement
TEST       MANOVA or REPEATED statement                     MODEL statement
WEIGHT     first RUN statement

The following table summarizes the function of each statement (other than the PROC statement) in the GLM procedure:

Table 32.2: Statements in the GLM Procedure

Statement  Description
---------  -------------------------------------------------------------------------------------
ABSORB     absorbs classification effects in a model
BY         specifies variables to define subgroups for the analysis
CLASS      declares classification variables
CONTRAST   constructs and tests linear functions of the parameters
ESTIMATE   estimates linear functions of the parameters
FREQ       specifies a frequency variable
ID         identifies observations on output
LSMEANS    computes least-squares (marginal) means
MANOVA     performs a multivariate analysis of variance
MEANS      computes and optionally compares arithmetic means
MODEL      defines the model to be fit
OUTPUT     requests an output data set containing diagnostics for each observation
RANDOM     declares certain effects to be random and computes expected mean squares
REPEATED   performs multivariate and univariate repeated measures analysis of variance
TEST       constructs tests using the sums of squares for effects and the error term you specify
WEIGHT     specifies a variable for weighting observations

The rest of this section gives detailed syntax information for each of these statements, beginning with the PROC GLM statement. The remaining statements are covered in alphabetical order.

PROC GLM Statement

  • PROC GLM < options > ;

The PROC GLM statement starts the GLM procedure. You can specify the following options in the PROC GLM statement:

ALPHA= p

  • specifies the level of significance p for $100(1-p)\%$ confidence intervals. The value must be between 0 and 1; the default value of p = 0.05 results in 95% intervals. This value is used as the default confidence level for limits computed by the following options.

    Statement  Options
    ---------  ---------------------
    LSMEANS    CL
    MEANS      CLM CLDIFF
    MODEL      CLI CLM CLPARM
    OUTPUT     UCL= LCL= UCLM= LCLM=

  • You can override the default in each of these cases by specifying the ALPHA= option for each statement individually.

DATA= SAS-data-set

  • names the SAS data set used by the GLM procedure. By default, PROC GLM uses the most recently created SAS data set.

MANOVA

  • requests the multivariate mode of eliminating observations with missing values. If any of the dependent variables have missing values, the procedure eliminates that observation from the analysis. The MANOVA option is useful if you use PROC GLM in interactive mode and plan to perform a multivariate analysis.

MULTIPASS

  • requests that PROC GLM reread the input data set when necessary, instead of writing the necessary values of dependent variables to a utility file. This option decreases disk space usage at the expense of increased execution times, and is useful only in rare situations where disk space is at an absolute premium.

NAMELEN= n

  • specifies the length of effect names in tables and output data sets to be n characters long, where n is a value between 20 and 200 characters. The default length is 20 characters.

NOPRINT

  • suppresses the normal display of results. The NOPRINT option is useful when you want only to create one or more output data sets with the procedure. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 14, Using the Output Delivery System, for more information.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

  • specifies the sorting order for the levels of all classification variables (specified in the CLASS statement). This ordering determines which parameters in the model correspond to each level in the data, so the ORDER= option may be useful when you use CONTRAST or ESTIMATE statements. Note that the ORDER= option applies to the levels for all classification variables. The exception is the default ORDER=FORMATTED for numeric variables for which you have supplied no explicit format. In this case, the levels are ordered by their internal value. Note that this represents a change from previous releases for how class levels are ordered. In releases previous to Version 8, numeric class levels with no explicit format were ordered by their BEST12. formatted values, and in order to revert to the previous ordering you can specify this format explicitly for the affected classification variables. The change was implemented because the former default behavior for ORDER=FORMATTED often resulted in levels not being ordered numerically and usually required the user to intervene with an explicit format or ORDER=INTERNAL to get the more natural ordering. The following table shows how PROC GLM interprets values of the ORDER= option.

    Value of ORDER=  Levels Sorted By
    ---------------  ------------------------------------------------------------------------
    DATA             order of appearance in the input data set
    FORMATTED        external formatted value, except for numeric variables with no explicit
                     format, which are sorted by their unformatted (internal) value
    FREQ             descending frequency count; levels with the most observations come first
                     in the order
    INTERNAL         unformatted value

  • By default, ORDER=FORMATTED. For FORMATTED and INTERNAL, the sort order is machine dependent. For more information on sorting order, see the chapter on the SORT procedure in the SAS Procedures Guide, and the discussion of BY-group processing in SAS Language Reference: Concepts.

OUTSTAT= SAS-data-set

  • names an output data set that contains sums of squares, degrees of freedom, F statistics, and probability levels for each effect in the model, as well as for each CONTRAST that uses the overall residual or error mean square (MSE) as the denominator in constructing the F statistic. If you use the CANONICAL option in the MANOVA statement and do not use an M= specification in the MANOVA statement, the data set also contains results of the canonical analysis. See the section Output Data Sets on page 1840 for more information.
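As a rough sketch of how several of these PROC GLM statement options combine, the following step uses ALPHA=, ORDER=, and OUTSTAT= together. The data set dose, its variables, and the output data set name stats are hypothetical and chosen only for illustration.

```sas
/* Hypothetical data: levels appear in the order low, mid, high */
data dose;
   input level $ y @@;
   datalines;
low 3 low 4 mid 5 mid 7 high 8 high 9
;

proc glm data=dose order=data alpha=0.10 outstat=stats;
   class level;
   model y = level / solution clparm;    /* 90% limits, from ALPHA=0.10    */
   /* ORDER=DATA gives the level order low, mid, high, so these
      coefficients describe a linear trend. Under the default
      formatted order (high, low, mid) they would not.            */
   contrast 'linear trend' level -1 0 1;
   lsmeans level / cl alpha=0.05;        /* statement-level override: 95%  */
run;
quit;

proc print data=stats;                   /* sums of squares, df, F, and p  */
run;
```

The class level table at the top of the PROC GLM output is the place to confirm that the levels really are in the order the CONTRAST coefficients assume.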

ABSORB Statement

  • ABSORB variables ;

Absorption is a computational technique that provides a large reduction in time and memory requirements for certain types of models. The variables are one or more variables in the input data set.

For a main effect variable that does not participate in interactions, you can absorb the effect by naming it in an ABSORB statement. This means that the effect can be adjusted out before the construction and solution of the rest of the model. This is particularly useful when the effect has a large number of levels.

Several variables can be specified, in which case each one is assumed to be nested in the preceding variable in the ABSORB statement.

Note: When you use the ABSORB statement, the data set (or each BY group, if a BY statement appears) must be sorted by the variables in the ABSORB statement. The GLM procedure cannot produce predicted values or least-squares means (LS-means) or create an output data set of diagnostic values if an ABSORB statement is used. If the ABSORB statement is used, it must appear before the first RUN statement or it is ignored.

When you use an ABSORB statement and also use the INT option in the MODEL statement, the procedure ignores the option but computes the uncorrected total sum of squares (SS) instead of the corrected total sum of squares.

See the Absorption section on page 1799 for more information.
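A minimal sketch of the sorting requirement and statement placement described above follows. The data set herds and its variables herd, trt, and y are hypothetical.

```sas
/* Hypothetical data: herd is a many-level nuisance effect */
data herds;
   input herd trt $ y @@;
   datalines;
3 a  9  3 b 13  1 a 10  1 b 12  2 a 14  2 b 15
;

/* The data must be sorted by the ABSORB variables */
proc sort data=herds;
   by herd;
run;

proc glm data=herds;
   absorb herd;          /* adjusted out before the rest of the model */
   class trt;
   model y = trt;        /* herd does not appear in the MODEL statement */
run;
quit;
```

Because ABSORB is in effect, this step cannot also request predicted values, LS-means, or an OUTPUT data set.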

BY Statement

  • BY variables ;

You can specify a BY statement with PROC GLM to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the GLM procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure (in base SAS software).

Since sorting the data changes the order in which PROC GLM reads observations, the sorting order for the levels of the classification variables may be affected if you have also specified ORDER=DATA in the PROC GLM statement. This, in turn, affects specifications in CONTRAST and ESTIMATE statements.

If you specify the BY statement, it must appear before the first RUN statement or it is ignored. When you use a BY statement, the interactive features of PROC GLM are disabled.

When both BY and ABSORB statements are used, observations must be sorted first by the variables in the BY statement, and then by the variables in the ABSORB statement.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
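The first alternative listed above (sorting with PROC SORT, then using a matching BY statement) might look like the following sketch. The data set sites and its variables are hypothetical.

```sas
/* Hypothetical data: two sites, two treatments */
data sites;
   input site $ trt $ y @@;
   datalines;
south a  8  north a 10  south b 12  north b 14
north a 11  south a  9  north b 15  south b 13
;

proc sort data=sites;
   by site;
run;

proc glm data=sites;
   by site;              /* one complete analysis per site */
   class trt;
   model y = trt;
run;
```

If the data were already grouped by site but not in ascending order, `by site notsorted;` in the PROC GLM step would be the alternative to sorting.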

CLASS Statement

  • CLASS variables < / option > ;

The CLASS statement names the classification variables to be used in the model. Typical class variables are TREATMENT, SEX, RACE, GROUP, and REPLICATION. If you specify the CLASS statement, it must appear before the MODEL statement.

By default, class levels are determined from the entire formatted values of the CLASS variables. Note that this represents a slight change from previous releases in the way in which class levels are determined. In releases prior to Version 9, class levels were determined using no more than the first 16 characters of the formatted values. If you wish to revert to this previous behavior, you can use the TRUNCATE option in the CLASS statement. In any case, you can use formats to group values into levels. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide, and the discussions for the FORMAT statement and SAS formats in SAS Language Reference: Dictionary.

The GLM procedure displays a table summarizing the class variables and their levels, and you can use this to check the ordering of levels and, hence, of the corresponding parameters for main effects. If you need to check the ordering of parameters for interaction effects, use the E option in the MODEL, CONTRAST, ESTIMATE, and LSMEANS statements. See the Parameterization of PROC GLM Models section on page 1787 for more information.

You can specify the following option in the CLASS statement after a slash (/):

TRUNCATE

  • specifies that class levels should be determined using only up to the first 16 characters of the formatted values of CLASS variables. When formatted values are longer than 16 characters, you can use this option in order to revert to the levels as determined in releases previous to Version 9.
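Since class levels come from formatted values, a user-defined format is one way to group raw values into levels. The sketch below assumes a hypothetical format agegrp and a hypothetical data set people.

```sas
/* Hypothetical format that collapses ages into two groups */
proc format;
   value agegrp low -< 40  = 'under 40'
                40 - high = '40 and over';
run;

data people;
   input age y @@;
   datalines;
23 5.1  31 5.6  38 6.0  42 6.4  55 7.1  61 7.3
;

proc glm data=people;
   class age;
   format age agegrp.;   /* two class levels instead of six */
   model y = age;
run;
quit;
```

Adding `/ truncate` to the CLASS statement would matter only if distinct formatted values shared their first 16 characters, which is not the case for these labels.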

CONTRAST Statement

  • CONTRAST label effect values < ... effect values > < / options > ;

The CONTRAST statement enables you to perform custom hypothesis tests by specifying an L vector or matrix for testing the univariate hypothesis $L\beta = 0$ or the multivariate hypothesis $LBM = 0$. Thus, to use this feature you must be familiar with the details of the model parameterization that PROC GLM uses. For more information, see the Parameterization of PROC GLM Models section on page 1787. All of the elements of the L vector may be given, or if only certain portions of the L vector are given, the remaining elements are constructed by PROC GLM from the context (in a manner similar to rule 4 discussed in the Construction of Least-Squares Means section on page 1820).

There is no limit to the number of CONTRAST statements you can specify, but they must appear after the MODEL statement. In addition, if you use a CONTRAST statement and a MANOVA, REPEATED, or TEST statement, appropriate tests for contrasts are carried out as part of the MANOVA, REPEATED, or TEST analysis. If you use a CONTRAST statement and a RANDOM statement, the expected mean square of the contrast is displayed. As a result of these additional analyses, the CONTRAST statement must appear before the MANOVA, REPEATED, RANDOM, or TEST statement.

In the CONTRAST statement,

label

identifies the contrast on the output. A label is required for every contrast specified. Labels must be enclosed in quotes.

effect

identifies an effect that appears in the MODEL statement, or the INTERCEPT effect. The INTERCEPT effect can be used when an intercept is fitted in the model. You do not need to include all effects that are in the MODEL statement.

values

are constants that are elements of the L vector associated with the effect.

You can specify the following options in the CONTRAST statement after a slash (/):

E

  • displays the entire L vector. This option is useful in confirming the ordering of parameters for specifying L .

E= effect

  • specifies an error term, which must be one of the effects in the model. The procedure uses this effect as the denominator in F tests in univariate analysis. In addition, if you use a MANOVA or REPEATED statement, the procedure uses the effect specified by the E= option as the basis of the E matrix. By default, the procedure uses the overall residual or error mean square (MSE) as an error term.

ETYPE= n

  • specifies the type (1, 2, 3, or 4, corresponding to Type I, II, III, and IV tests, respectively) of the E= effect. If the E= option is specified and the ETYPE= option is not, the procedure uses the highest type computed in the analysis.

SINGULAR= number

  • tunes the estimability checking. If $\mathrm{ABS}(L - LH) > C \cdot \mathit{number}$ for any row in the contrast, then L is declared nonestimable. H is the $(X'X)^{-}X'X$ matrix, and C is ABS(L) except for rows where L is zero, and then it is 1. The default value for the SINGULAR= option is $10^{-4}$. Values for the SINGULAR= option must be between 0 and 1.

  • As stated previously, the CONTRAST statement enables you to perform custom hypothesis tests. If the hypothesis is testable in the univariate case, $SS(H_0\colon L\beta = 0)$ is computed as

    \[ (Lb)' \left( L (X'X)^{-} L' \right)^{-1} (Lb) \]

  • where $b = (X'X)^{-}X'y$. This is the sum of squares displayed on the analysis-of-variance table.

  • For multivariate testable hypotheses, the usual multivariate tests are performed using

    \[ H = M' (LB)' \left( L (X'X)^{-} L' \right)^{-1} (LB) M \]

  • where $B = (X'X)^{-}X'Y$ and Y is the matrix of multivariate responses or dependent variables. The degrees of freedom associated with the hypothesis is equal to the row rank of L. The sum of squares computed in this situation is equivalent to the sum of squares computed using an L matrix with any row deleted that is a linear combination of previous rows.

  • Multiple-degree-of-freedom hypotheses can be specified by separating the rows of the L matrix with commas.

  • For example, for the model

      proc glm;
         class A B;
         model Y=A B;
      run;

    with A at 5 levels and B at 2 levels, the parameter vector is

    \[ (\mu \;\; \alpha_1 \;\; \alpha_2 \;\; \alpha_3 \;\; \alpha_4 \;\; \alpha_5 \;\; \beta_1 \;\; \beta_2) \]

  • To test the hypothesis that the pooled A linear and A quadratic effect is zero, you can use the following L matrix:

    \[ L = \begin{bmatrix} 0 & -2 & -1 & 0 & 1 & 2 & 0 & 0 \\ 0 & 2 & -1 & -2 & -1 & 2 & 0 & 0 \end{bmatrix} \]

  • The corresponding CONTRAST statement is

      contrast 'A LINEAR & QUADRATIC' a -2 -1  0  1  2,
                                      a  2 -1 -2 -1  2;
  • If the first level of A is a control level and you want a test of control versus others, you can use this statement:

      contrast 'CONTROL VS OTHERS' a -1 0.25 0.25 0.25 0.25;  
  • See the following discussion of the ESTIMATE statement and the Specification of ESTIMATE Expressions section on page 1801 for rules on specification, construction, distribution, and estimability in the CONTRAST statement.
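The two contrasts above can be tried on simulated data. The sketch below is an assumption-laden illustration: the data set ab, the simulated response, and the seed are all made up, and the E option is added only to display the L matrix so that the coefficient ordering can be checked.

```sas
/* Hypothetical balanced data: A at 5 levels, B at 2 levels, 2 replicates */
data ab;
   do A = 1 to 5;
      do B = 1 to 2;
         do rep = 1 to 2;
            Y = 10 + 2*A + B + rannor(12345);   /* arbitrary simulated response */
            output;
         end;
      end;
   end;
run;

proc glm data=ab;
   class A B;
   model Y = A B;
   /* Two rows separated by a comma: a 2-degree-of-freedom hypothesis */
   contrast 'A LINEAR & QUADRATIC' A -2 -1  0  1  2,
                                   A  2 -1 -2 -1  2 / e;
   contrast 'CONTROL VS OTHERS'    A -1 0.25 0.25 0.25 0.25;
run;
quit;
```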

ESTIMATE Statement

  • ESTIMATE label effect values < ... effect values > < / options > ;

The ESTIMATE statement enables you to estimate linear functions of the parameters by multiplying the vector L by the parameter estimate vector b, resulting in Lb. All of the elements of the L vector may be given, or, if only certain portions of the L vector are given, the remaining elements are constructed by PROC GLM from the context (in a manner similar to rule 4 discussed in the Construction of Least-Squares Means section on page 1820).

The linear function is checked for estimability. The estimate Lb, where $b = (X'X)^{-}X'y$, is displayed along with its associated standard error, $\sqrt{L (X'X)^{-} L' s^2}$ (where $s^2$ is the error mean square), and t test. If you specify the CLPARM option in the MODEL statement (see page 1771), confidence limits for the true value are also displayed.

There is no limit to the number of ESTIMATE statements that you can specify, but they must appear after the MODEL statement. In the ESTIMATE statement,

label

identifies the estimate on the output. A label is required for every estimate specified. Labels must be enclosed in quotes.

effect

identifies an effect that appears in the MODEL statement, or the INTERCEPT effect. The INTERCEPT effect can be used as an effect when an intercept is fitted in the model. You do not need to include all effects that are in the MODEL statement.

values

are constants that are the elements of the L vector associated with the preceding effect. For example,

  estimate 'A1 VS A2' A 1 -1;  

forms an estimate that is the difference between the parameters estimated for the first and second levels of the CLASS variable A.

You can specify the following options in the ESTIMATE statement after a slash:

DIVISOR= number

  • specifies a value by which to divide all coefficients so that fractional coefficients can be entered as integer numerators. For example, you can use

      estimate '1/3(A1+A2) - 2/3A3' a 1 1 -2 / divisor=3;
  • instead of

      estimate '1/3(A1+A2) - 2/3A3' a 0.33333 0.33333 -0.66667;

E

  • displays the entire L vector. This option is useful in confirming the ordering of parameters for specifying L .

SINGULAR= number

  • tunes the estimability checking. If $\mathrm{ABS}(L - LH) > C \cdot \mathit{number}$, then the L vector is declared nonestimable. H is the $(X'X)^{-}X'X$ matrix, and C is ABS(L) except for rows where L is zero, and then it is 1. The default value for the SINGULAR= option is $10^{-4}$. Values for the SINGULAR= option must be between 0 and 1.

    See also the Specification of ESTIMATE Expressions section on page 1801.
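The ESTIMATE options above can be combined in one step, as in the following sketch. The data set three and its variables are hypothetical.

```sas
/* Hypothetical data: A at 3 levels, 2 observations per level */
data three;
   input A $ y @@;
   datalines;
a1 10 a1 12 a2 14 a2 15 a3 20 a3 19
;

proc glm data=three;
   class A;
   model y = A / clparm;      /* adds confidence limits to each estimate */
   /* Integer numerators with DIVISOR=3; E displays the full L vector */
   estimate '1/3(A1+A2) - 2/3A3' A 1 1 -2 / divisor=3 e;
   estimate 'A1 VS A2'           A 1 -1;
run;
quit;
```

Using DIVISOR= avoids the rounding that comes with typing 0.33333 and 0.66667, which can otherwise interfere with the estimability check.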

FREQ Statement

  • FREQ variable ;

The FREQ statement names a variable that provides frequencies for each observation in the DATA= data set. Specifically, if n is the value of the FREQ variable for a given observation, then that observation is used n times.

The analysis produced using a FREQ statement reflects the expanded number of observations. For example, means and total degrees of freedom reflect the expanded number of observations. You can produce the same analysis (without the FREQ statement) by first creating a new data set that contains the expanded number of observations. For example, if the value of the FREQ variable is 5 for the first observation, the first 5 observations in the new data set are identical. Each observation in the old data set is replicated $n_i$ times in the new data set, where $n_i$ is the value of the FREQ variable for that observation.

If the value of the FREQ variable is missing or is less than 1, the observation is not used in the analysis. If the value is not an integer, only the integer portion is used.

If you specify the FREQ statement, it must appear before the first RUN statement or it is ignored.
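The equivalence between a FREQ analysis and an analysis of the expanded data can be sketched as follows. The data sets counts and expanded and the variable names are hypothetical.

```sas
/* Hypothetical data: n is the number of times each row should count */
data counts;
   input trt $ y n;
   datalines;
a 10 3
a 12 2
b 15 4
b 13 1
;

/* Analysis 1: let PROC GLM apply the frequencies */
proc glm data=counts;
   class trt;
   model y = trt;
   freq n;
run;
quit;

/* Analysis 2: replicate each row n times and drop the FREQ statement */
data expanded;
   set counts;
   do i = 1 to n;
      output;
   end;
   drop i n;
run;

proc glm data=expanded;
   class trt;
   model y = trt;
run;
quit;
```

Both steps analyze 10 observations, so the two ANOVA tables should agree.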

ID Statement

  • ID variables ;

When predicted values are requested as a MODEL statement option, values of the variables given in the ID statement are displayed beside each observed, predicted, and residual value for identification. Although there are no restrictions on the length of ID variables, PROC GLM may truncate the number of values listed in order to display them on one line. The GLM procedure displays a maximum of five ID variables.

If you specify the ID statement, it must appear before the first RUN statement or it is ignored.
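A minimal sketch of the ID statement used together with the P option in the MODEL statement follows. The data set plots and the identifier variable plot are hypothetical.

```sas
/* Hypothetical data: plot labels each observation */
data plots;
   input plot $ trt $ y @@;
   datalines;
p1 a 10  p2 a 12  p3 b 15  p4 b 13
;

proc glm data=plots;
   class trt;
   model y = trt / p;    /* P requests observed, predicted, and residual values */
   id plot;              /* plot is shown beside each of those values           */
run;
quit;
```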

LSMEANS Statement

  • LSMEANS effects < / options > ;

Least-squares means (LS-means) are computed for each effect listed in the LSMEANS statement. You may specify only classification effects in the LSMEANS statement, that is, effects that contain only classification variables. You may also specify options to perform multiple comparisons. In contrast to the MEANS statement, the LSMEANS statement performs multiple comparisons on interactions as well as main effects.

LS-means are predicted population margins; that is, they estimate the marginal means over a balanced population. In a sense, LS-means are to unbalanced designs as class and subclass arithmetic means are to balanced designs. Each LS-mean is computed as $L'b$ for a certain column vector L, where b is the vector of parameter estimates, that is, the solution of the normal equations. For further information, see the section Construction of Least-Squares Means on page 1820.

Multiple effects can be specified in one LSMEANS statement, or multiple LSMEANS statements can be used, but they must all appear after the MODEL statement. For example,

  proc glm;
     class A B;
     model Y=A B A*B;
     lsmeans A B A*B;
  run;

LS-means are displayed for each level of the A, B, and A*B effects.

You can specify the following options in the LSMEANS statement after a slash:

ADJUST=BON

ADJUST=DUNNETT

ADJUST=SCHEFFE

ADJUST=SIDAK

ADJUST=SIMULATE <( simoptions )>

ADJUST=SMM | GT2

ADJUST=TUKEY

ADJUST=T

  • requests a multiple comparison adjustment for the p-values and confidence limits for the differences of LS-means. The ADJUST= option modifies the results of the TDIFF and PDIFF options; thus, if you omit the TDIFF or PDIFF option then the ADJUST= option has no effect. By default, PROC GLM analyzes all pairwise differences unless you specify ADJUST=DUNNETT, in which case PROC GLM analyzes all differences with a control level. The default is ADJUST=T, which really signifies no adjustment for multiple comparisons.

  • The BON (Bonferroni) and SIDAK adjustments involve correction factors described in the Multiple Comparisons section on page 1806 and in Chapter 48, "The MULTTEST Procedure." When you specify ADJUST=TUKEY and your data are unbalanced, PROC GLM uses the approximation described in Kramer (1956) and identifies the adjustment as Tukey-Kramer in the results. Similarly, when you specify ADJUST=DUNNETT and the LS-means are correlated, PROC GLM uses the factor-analytic covariance approximation described in Hsu (1992) and identifies the adjustment as Dunnett-Hsu in the results. The preceding references also describe the SCHEFFE and SMM adjustments.

  • The SIMULATE adjustment computes the adjusted p-values from the simulated distribution of the maximum or maximum absolute value of a multivariate t random vector. The simulation estimates q, the true $(1-\alpha)$th quantile, where $1-\alpha$ is the confidence coefficient. The default $\alpha$ is the value of the ALPHA= option in the PROC GLM statement or 0.05 if that option is not specified. You can change this value with the ALPHA= option in the LSMEANS statement.

  • The number of samples for the SIMULATE adjustment is set so that the tail area for the simulated q is within a certain accuracy radius $\gamma$ of $1-\alpha$ with an accuracy confidence of $100(1-\epsilon)\%$. In equation form,

    \[ P\left( \left| F(\hat{q}) - (1-\alpha) \right| \le \gamma \right) = 1 - \epsilon \]

  • where $\hat{q}$ is the simulated q and F is the true distribution function of the maximum; refer to Edwards and Berry (1987) for details. By default, $\gamma = 0.005$ and $\epsilon = 0.01$ so that the tail area of $\hat{q}$ is within 0.005 of 0.95 with 99% confidence.

    You can specify the following simoptions in parentheses after the ADJUST=SIMULATE option.

    ACC= value

    specifies the target accuracy radius $\gamma$ of a $100(1-\epsilon)\%$ confidence interval for the true probability content of the estimated $(1-\alpha)$th quantile. The default value is ACC=0.005. Note that, if you also specify the CVADJUST simoption, then the actual accuracy radius will probably be substantially less than this target.

    CVADJUST

    specifies that the quantile should be estimated by the control variate adjustment method of Hsu and Nelson (1998) instead of simply as the quantile of the simulated sample. Specifying the CVADJUST option typically has the effect of significantly reducing the accuracy radius $\gamma$ of a $100(1-\epsilon)\%$ confidence interval for the true probability content of the estimated $(1-\alpha)$th quantile. The control-variate-adjusted quantile estimate takes roughly twice as long to compute, but it is typically much more accurate than the sample quantile.

    EPS= value

    specifies the value $\epsilon$ for a $100(1-\epsilon)\%$ confidence interval for the true probability content of the estimated $(1-\alpha)$th quantile. The default value for the accuracy confidence is 99%, corresponding to EPS=0.01.

    NSAMP= n

    specifies the sample size for the simulation. By default, n is set based on the values of the target accuracy radius $\gamma$ and accuracy confidence $100(1-\epsilon)\%$ for an interval for the true probability content of the estimated $(1-\alpha)$th quantile. With the default values for $\gamma$, $\epsilon$, and $\alpha$ (0.005, 0.01, and 0.05, respectively), NSAMP=12604 by default.

    REPORT

    specifies that a report on the simulation should be displayed, including a listing of the parameters, such as $\gamma$, $\epsilon$, and $\alpha$, as well as an analysis of various methods for estimating or approximating the quantile.

    SEED= number

    specifies an integer used to start the pseudo-random number generator for the simulation. If you do not specify a seed, or specify a value less than or equal to zero, the seed is by default generated from reading the time of day from the computer's clock.
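A rough sketch of the simoptions in use follows. The data set trial, its variables, and the particular NSAMP= and SEED= values are arbitrary choices for illustration.

```sas
/* Hypothetical unbalanced one-way data */
data trial;
   input trt $ y @@;
   datalines;
a 10 a 12 a 11 b 14 b 15 c 9 c 8 c 10 c 11
;

proc glm data=trial;
   class trt;
   model y = trt;
   /* PDIFF is required for ADJUST= to have any effect.
      A fixed positive SEED= makes the simulation repeatable. */
   lsmeans trt / pdiff cl
                 adjust=simulate(nsamp=20000 seed=4321 report);
run;
quit;
```

Omitting NSAMP= leaves the sample size to be derived from ACC= and EPS=, and omitting SEED= gives a clock-based seed, so adjusted p-values would then vary slightly from run to run.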

ALPHA= p

  • specifies the level of significance p for $100(1-p)\%$ confidence intervals. This option is useful only if you also specify the CL option, and, optionally, the PDIFF option. By default, p is equal to the value of the ALPHA= option in the PROC GLM statement or 0.05 if that option is not specified. This value is used to set the endpoints for confidence intervals for the individual means as well as for differences between means.

AT variable = value

AT ( variable-list )=( value-list )

AT MEANS

  • enables you to modify the values of the covariates used in computing LS-means. By default, all covariate effects are set equal to their mean values for computation of standard LS-means. The AT option enables you to set the covariates to whatever values you consider interesting. For more information, see the section Setting Covariate Values on page 1821.
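The forms of the AT option can be compared side by side, as in this sketch. The data set growth, the covariate x, and the value 2 are hypothetical.

```sas
/* Hypothetical data: treatment groups with a covariate x */
data growth;
   input trt $ x y @@;
   datalines;
a 1 10  a 2 12  a 3 15  b 2 14  b 3 17  b 5 21
;

proc glm data=growth;
   class trt;
   model y = trt x;
   lsmeans trt;               /* default: x set to its mean           */
   lsmeans trt / at means;    /* the same setting, requested by name  */
   lsmeans trt / at x=2;      /* x fixed at a value of interest       */
run;
quit;
```

With several covariates, the list form `at (x1 x2)=(value1 value2)` sets them in one option; x1 and x2 here stand for whatever covariates the model contains.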

BYLEVEL

  • requests that PROC GLM process the OM data set by each level of the LS-mean effect in question. For more details, see the entry for the OM option in this section.

CL

  • requests confidence limits for the individual LS-means. If you specify the PDIFF option, confidence limits for differences between means are produced as well. You can control the confidence level with the ALPHA= option. Note that, if you specify an ADJUST= option, the confidence limits for the differences are adjusted for multiple inference but the confidence intervals for individual means are not adjusted.

COV

  • includes variances and covariances of the LS-means in the output data set specified in the OUT= option in the LSMEANS statement. Note that this is the covariance matrix for the LS-means themselves, not the covariance matrix for the differences between the LS-means, which is used in the PDIFF computations. If you omit the OUT= option, the COV option has no effect. When you specify the COV option, you can specify only one effect in the LSMEANS statement.

E

  • displays the coefficients of the linear functions used to compute the LS-means.

E= effect

  • specifies an effect in the model to use as an error term. The procedure uses the mean square for the effect as the error mean square when calculating estimated standard errors (requested with the STDERR option) and probabilities (requested with the STDERR, PDIFF, or TDIFF option). Unless you specify STDERR, PDIFF or TDIFF, the E= option is ignored. By default, if you specify the STDERR, PDIFF, or TDIFF option and do not specify the E= option, the procedure uses the error mean square for calculating standard errors and probabilities.

ETYPE= n

  • specifies the type (1, 2, 3, or 4, corresponding to Type I, II, III, and IV tests, respectively) of the E= effect. If you specify the E= option but not the ETYPE= option, the highest type computed in the analysis is used. If you omit the E= option, the ETYPE= option has no effect.

NOPRINT

  • suppresses the normal display of results from the LSMEANS statement. This option is useful when an output data set is created with the OUT= option in the LSMEANS statement.

OBSMARGINS

OM

  • specifies a potentially different weighting scheme for computing LS-means coefficients. The standard LS-means have equal coefficients across classification effects; however, the OM option changes these coefficients to be proportional to those found in the input data set. For more information, see the section Changing the Weighting Scheme on page 1822.

  • The BYLEVEL option modifies the observed-margins LS-means. Instead of computing the margins across the entire data set, the procedure computes separate margins for each level of the LS-mean effect in question. The resulting LS-means are actually equal to raw means in this case. If you specify the BYLEVEL option, it disables the AT option.
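To see how the weighting schemes differ, the three variants can be requested on the same unbalanced data. The data set unbal and its cell counts are hypothetical.

```sas
/* Hypothetical unbalanced two-way data:
   A=a1 is mostly observed with B=b1, A=a2 mostly with B=b2 */
data unbal;
   input A $ B $ y @@;
   datalines;
a1 b1 10  a1 b1 11  a1 b1 12  a1 b2 15
a2 b1  9  a2 b2 16  a2 b2 17  a2 b2 18
;

proc glm data=unbal;
   class A B;
   model y = A B;
   lsmeans A;                  /* equal weights across the levels of B        */
   lsmeans A / om;             /* weights proportional to B's observed margin */
   lsmeans A / om bylevel;     /* margins taken within each level of A        */
run;
quit;
```

Comparing the three LS-means tables for A shows how much the choice of margin matters when the cell counts are uneven.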

OUT= SAS-data-set

  • creates an output data set that contains the values, standard errors, and, optionally, the covariances (see the COV option) of the LS-means. For more information, see the Output Data Sets section on page 1840.

PDIFF <=difftype>

  • requests that p-values for differences of the LS-means be produced. The optional difftype specifies which differences to display. Possible values for difftype are ALL, CONTROL, CONTROLL, and CONTROLU. The ALL value requests all pairwise differences, and it is the default. The CONTROL value requests the differences with a control that, by default, is the first level of each of the specified LS-mean effects.

  • To specify which levels of the effects are the controls, list the quoted formatted values in parentheses after the keyword CONTROL. For example, if the effects A, B, and C are class variables, each having two levels, 1 and 2, the following LSMEANS statement specifies the 1 2 level of A*B and the 2 1 level of B*C as controls:

      lsmeans A*B B*C / pdiff=control('1' '2', '2' '1');  
  • For multiple effect situations such as this one, the ordering of the list is significant, and you should check the output to make sure that the controls are correct.

  • Two-tailed tests and confidence limits are associated with the CONTROL difftype. For one-tailed results, use either the CONTROLL or CONTROLU difftype.

    • PDIFF=CONTROLL tests whether the noncontrol levels are less than the control; you declare a noncontrol level to be significantly less than the control if the associated upper confidence limit for the noncontrol level minus the control is less than zero, and you ignore the associated lower confidence limits (which are set to minus infinity).

    • PDIFF=CONTROLU tests whether the noncontrol levels are greater than the control; you declare a noncontrol level to be significantly greater than the control if the associated lower confidence limit for the noncontrol level minus the control is greater than zero, and you ignore the associated upper confidence limits (which are set to infinity).

  • The default multiple comparisons adjustment for each difftype is shown in the following table.

    difftype                       Default ADJUST=
    Not specified                  T
    ALL                            TUKEY
    CONTROL, CONTROLL, CONTROLU    DUNNETT

  • If no difftype is specified, the default for the ADJUST= option is T (that is, no adjustment); for PDIFF=ALL, ADJUST=TUKEY is the default; in all other instances, the default value for the ADJUST= option is DUNNETT. If there is a conflict between the PDIFF= and ADJUST= options, the ADJUST= option takes precedence.

  • For example, in order to compute one-sided confidence limits for differences with a control, adjusted according to Dunnett's procedure, the following statements are equivalent:

      lsmeans Treatment / pdiff=controll cl;
      lsmeans Treatment / pdiff=controll cl adjust=dunnett;

SLICE= fixed-effect

SLICE= ( fixed-effects )

  • specifies effects within which to test for differences between interaction LS-mean effects. This can produce what are known as tests of simple effects (Winer 1971). For example, suppose that A*B is significant and you want to test for the effect of A within each level of B. The appropriate LSMEANS statement is

      lsmeans A*B / slice=B;  
  • This code tests for the simple main effects of A for B, which are calculated by extracting the appropriate rows from the coefficient matrix for the A*B LS-means and using them to form an F-test as performed by the CONTRAST statement.
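
  • The parenthesized form of the SLICE= option lists more than one slicing effect. As a sketch, assuming a hypothetical data set named factorial with class variables A and B, the following statements request the simple effects of A within each level of B and of B within each level of A:

      /* hypothetical data set and variable names */
      proc glm data=factorial;
         class A B;
         model Y = A B A*B;
         lsmeans A*B / slice=(A B);
      run;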

SINGULAR= number

  • tunes the estimability checking. If ABS(L - LH) > C × number for any row, then L is declared nonestimable. H is the (X'X)⁻X'X matrix, and C is ABS(L) except for rows where L is zero, and then it is 1. The default value for the SINGULAR= option is 10⁻⁴. Values for the SINGULAR= option must be between 0 and 1.

STDERR

  • produces the standard error of the LS-means and the probability level for the hypothesis H0: LS-mean = 0.

TDIFF

  • produces the t values for all hypotheses H0: LS-mean(i) = LS-mean(j) and the corresponding probabilities.

MANOVA Statement

  • MANOVA < test-options >< / detail-options > ;

If the MODEL statement includes more than one dependent variable, you can perform multivariate analysis of variance with the MANOVA statement. The test-options define which effects to test, while the detail-options specify how to execute the tests and what results to display.

When a MANOVA statement appears before the first RUN statement, PROC GLM enters a multivariate mode with respect to the handling of missing values; in addition to observations with missing independent variables, observations with any missing dependent variables are excluded from the analysis. If you want to use this mode of handling missing values and do not need any multivariate analyses, specify the MANOVA option in the PROC GLM statement.
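
As a brief sketch of the second case, the following statements request the multivariate handling of missing values without any multivariate tests. The data set name scores and the variable names are hypothetical:

  /* hypothetical data set and variable names */
  proc glm data=scores manova;
     class group;
     model Y1 Y2 Y3 = group;   /* observations missing any of Y1-Y3 are excluded */
  run;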

If you use both the CONTRAST and MANOVA statements, the MANOVA statement must appear after the CONTRAST statement.

Test Options

The following options can be specified in the MANOVA statement as test-options in order to define which multivariate tests to perform.

H= effects | INTERCEPT | _ALL_

  • specifies effects in the preceding model to use as hypothesis matrices. For each H matrix (the SSCP matrix associated with an effect), the H= specification displays the characteristic roots and vectors of E⁻¹H (where E is the matrix associated with the error effect), Hotelling-Lawley trace, Pillai's trace, Wilks' criterion, and Roy's maximum root criterion. By default, these statistics are tested with approximations based on the F distribution. To test them with exact (but computationally intensive) calculations, use the MSTAT=EXACT option.

  • Use the keyword INTERCEPT to produce tests for the intercept. To produce tests for all effects listed in the MODEL statement, use the keyword _ALL_ in place of a list of effects. For background and further details, see the "Multivariate Analysis of Variance" section on page 1823.

E= effect

  • specifies the error effect. If you omit the E= specification, the GLM procedure uses the error SSCP (residual) matrix from the analysis.

M= equation,...,equation | ( row-of-matrix,...,row-of-matrix )

  • specifies a transformation matrix for the dependent variables listed in the MODEL statement. The equations in the M= specification are of the form

    c1 × dependent-variable ± c2 × dependent-variable ... ± cn × dependent-variable

  • where the ci values are coefficients for the various dependent-variables. If the value of a given ci is 1, it can be omitted; in other words, 1 × Y is the same as Y. Equations should involve two or more dependent variables. For sample syntax, see the "Examples" section on page 1762.

  • Alternatively, you can input the transformation matrix directly by entering the elements of the matrix with commas separating the rows and parentheses surrounding the matrix. When this alternate form of input is used, the number of elements in each row must equal the number of dependent variables. Although these combinations actually represent the columns of the M matrix, they are displayed by rows.

  • When you include an M= specification, the analysis requested in the MANOVA statement is carried out for the variables defined by the equations in the specification, not the original dependent variables. If you omit the M= option, the analysis is performed for the original dependent variables in the MODEL statement.

  • If an M= specification is included without either the MNAMES= or PREFIX= option, the variables are labeled MVAR1, MVAR2, and so forth, by default. For further information, see the "Multivariate Analysis of Variance" section on page 1823.

MNAMES= names

  • provides names for the variables defined by the equations in the M= specification. Names in the list correspond to the M= equations or to the rows of the M matrix (as it is entered).

PREFIX= name

  • is an alternative means of identifying the transformed variables defined by the M= specification. For example, if you specify PREFIX=DIFF, the transformed variables are labeled DIFF1, DIFF2, and so forth.

Detail Options

You can specify the following options in the MANOVA statement after a slash as detail-options .

CANONICAL

  • displays a canonical analysis of the H and E matrices (transformed by the M matrix, if specified) instead of the default display of characteristic roots and vectors.

ETYPE= n

  • specifies the type (1, 2, 3, or 4, corresponding to Type I, II, III, and IV tests, respectively) of the E matrix, the SSCP matrix associated with the E= effect. You need this option if you use the E= specification to specify an error effect other than residual error and you want to specify the type of sums of squares used for the effect. If you specify ETYPE=n, the corresponding test must have been performed in the MODEL statement, either by options SSn, En, or the default Type I and Type III tests. By default, the procedure uses an ETYPE= value corresponding to the highest type (largest n) used in the analysis.

HTYPE= n

  • specifies the type (1, 2, 3, or 4, corresponding to Type I, II, III, and IV tests, respectively) of the H matrix. See the ETYPE= option for more details.

MSTAT=FAPPROX

MSTAT=EXACT

  • specifies the method of evaluating the multivariate test statistics. The default is MSTAT=FAPPROX, which specifies that the multivariate tests are evaluated using the usual approximations based on the F distribution, as discussed in the "Multivariate Tests" section in Chapter 2, "Introduction to Regression Procedures." Alternatively, you can specify MSTAT=EXACT to compute exact p-values for three of the four tests (Wilks' Lambda, the Hotelling-Lawley Trace, and Roy's Greatest Root) and an improved F-approximation for the fourth (Pillai's Trace). While MSTAT=EXACT provides better control of the significance probability for the tests, especially for Roy's Greatest Root, computations for the exact p-values can be appreciably more demanding, and are in fact infeasible for large problems (many dependent variables). Thus, although MSTAT=EXACT is more accurate for most data, it is not the default method. For more information on the results of MSTAT=EXACT, see the "Multivariate Analysis of Variance" section on page 1823.
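
  • For a small problem, the exact calculations can be requested for every effect in the model. The following is a sketch only; the data set name scores and the variable names are hypothetical:

      /* hypothetical data set and variable names */
      proc glm data=scores;
         class group;
         model Y1 Y2 Y3 = group / nouni;
         manova h=_all_ / mstat=exact;
      run;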

ORTH

  • requests that the transformation matrix in the M= specification of the MANOVA statement be orthonormalized by rows before the analysis.

PRINTE

  • displays the error SSCP matrix E . If the E matrix is the error SSCP (residual) matrix from the analysis, the partial correlations of the dependent variables given the independent variables are also produced.

  • For example, the statement

      manova / printe;  
  • displays the error SSCP matrix and the partial correlation matrix computed from the error SSCP matrix.

PRINTH

  • displays the hypothesis SSCP matrix H associated with each effect specified by the H= specification.

SUMMARY

  • produces analysis-of-variance tables for each dependent variable. When no M matrix is specified, a table is displayed for each original dependent variable from the MODEL statement; with an M matrix other than the identity, a table is displayed for each transformed variable defined by the M matrix.
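
  • As a sketch of the second case, the following statements would display one analysis-of-variance table for each of the transformed variables DIFF1 and DIFF2 rather than for Y1, Y2, and Y3. The data set name scores and the variable names are hypothetical:

      /* hypothetical data set and variable names */
      proc glm data=scores;
         class group;
         model Y1 Y2 Y3 = group / nouni;
         manova h=group m=Y1-Y2, Y2-Y3 prefix=diff / summary;
      run;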

Examples

The following statements provide several examples of using a MANOVA statement.

  proc glm;
     class A B;
     model Y1-Y5=A B(A) / nouni;
     manova h=A e=B(A) / printh printe htype=1 etype=1;
     manova h=B(A) / printe;
     manova h=A e=B(A) m=Y1-Y2,Y2-Y3,Y3-Y4,Y4-Y5
            prefix=diff;
     manova h=A e=B(A) m=(1 -1  0  0  0,
                          0  1 -1  0  0,
                          0  0  1 -1  0,
                          0  0  0  1 -1) prefix=diff;
  run;

Since this MODEL statement requests no options for type of sums of squares, the procedure uses Type I and Type III sums of squares. The first MANOVA statement specifies A as the hypothesis effect and B(A) as the error effect. As a result of the PRINTH option, the procedure displays the hypothesis SSCP matrix associated with the A effect; and, as a result of the PRINTE option, the procedure displays the error SSCP matrix associated with the B(A) effect. The option HTYPE=1 specifies a Type I H matrix, and the option ETYPE=1 specifies a Type I E matrix.

The second MANOVA statement specifies B(A) as the hypothesis effect. Since no error effect is specified, PROC GLM uses the error SSCP matrix from the analysis as the E matrix. The PRINTE option displays this E matrix. Since the E matrix is the error SSCP matrix from the analysis, the partial correlation matrix computed from this matrix is also produced.

The third MANOVA statement requests the same analysis as the first MANOVA statement, but the analysis is carried out for variables transformed to be successive differences between the original dependent variables. The option PREFIX=DIFF labels the transformed variables as DIFF1, DIFF2, DIFF3, and DIFF4.

Finally, the fourth MANOVA statement has the identical effect as the third, but it uses an alternative form of the M= specification. Instead of specifying a set of equations, the fourth MANOVA statement specifies rows of a matrix of coefficients for the five dependent variables.

As a second example of the use of the M= specification, consider the following:

  proc glm;
     class group;
     model dose1-dose4=group / nouni;
     manova h = group
            m = -3*dose1 -   dose2 +   dose3 + 3*dose4,
                   dose1 -   dose2 -   dose3 +   dose4,
                  -dose1 + 3*dose2 - 3*dose3 +   dose4
            mnames = Linear Quadratic Cubic
            / printe;
  run;

The M= specification gives a transformation of the dependent variables dose1 through dose4 into orthogonal polynomial components, and the MNAMES= option labels the transformed variables LINEAR, QUADRATIC, and CUBIC, respectively. Since the PRINTE option is specified and the default residual matrix is used as an error term, the partial correlation matrix of the orthogonal polynomial components is also produced.

MEANS Statement

  • MEANS effects < / options > ;

Within each group corresponding to each effect specified in the MEANS statement, PROC GLM computes the arithmetic means and standard deviations of all continuous variables in the model (both dependent and independent). You may specify only classification effects in the MEANS statement, that is, effects that contain only classification variables.

Note that the arithmetic means are not adjusted for other effects in the model; for adjusted means, see the "LSMEANS Statement" section on page 1753. If you use a WEIGHT statement, PROC GLM computes weighted means; see the "Weighted Means" section on page 1820.

You may also specify options to perform multiple comparisons. However, the MEANS statement performs multiple comparisons only for main effect means; for multiple comparisons of interaction means, see the "LSMEANS Statement" section on page 1753.

You can use any number of MEANS statements, provided that they appear after the MODEL statement. For example, suppose A and B each have two levels. Then, if you use the following statements

  proc glm;
     class A B;
     model Y=A B A*B;
     means A B / tukey;
     means A*B;
  run;

the means, standard deviations, and Tukey's multiple comparisons tests are displayed for each level of the main effects A and B, and just the means and standard deviations are displayed for each of the four combinations of levels for A*B. Since multiple comparisons tests apply only to main effects, the single MEANS statement

  means A B A*B / tukey;  

produces the same results.

PROC GLM does not compute means for interaction effects containing continuous variables. Thus, if you have the model

  class A;
  model Y=A X A*X;

then the effects X and A*X cannot be used in the MEANS statement. However, if you specify the effect A in the MEANS statement

  means A;  

then PROC GLM, by default, displays within-A arithmetic means of both Y and X. Use the DEPONLY option to display means of only the dependent variables.

  means A / deponly;  

If you use a WEIGHT statement, PROC GLM computes weighted means and estimates their variance as inversely proportional to the corresponding sum of weights (see the "Weighted Means" section on page 1820). However, note that the statistical interpretation of multiple comparison tests for weighted means is not well understood. See the "Multiple Comparisons" section on page 1806 for formulas. The following table summarizes categories of options available in the MEANS statement.

Task                                      Available options

Modify output                             DEPONLY

Perform multiple comparison tests         BON, DUNCAN, DUNNETT, DUNNETTL,
                                          DUNNETTU, GABRIEL, GT2, LSD, REGWQ,
                                          SCHEFFE, SIDAK, SMM, SNK, T, TUKEY,
                                          WALLER

Specify additional details for            ALPHA=, CLDIFF, CLM, E=, ETYPE=,
multiple comparison tests                 HTYPE=, KRATIO=, LINES, NOSORT

Test for homogeneity of variances         HOVTEST

Compensate for heterogeneous variances    WELCH

These options are described in the following list.

ALPHA= p

  • specifies the level of significance for comparisons among the means. By default, p is equal to the value of the ALPHA= option in the PROC GLM statement or 0.05 if that option is not specified. You can specify any value greater than 0 and less than 1.

BON

  • performs Bonferroni t tests of differences between means for all main effect means in the MEANS statement. See the CLDIFF and LINES options for a discussion of how the procedure displays results.

CLDIFF

  • presents results of the BON, GABRIEL, SCHEFFE, SIDAK, SMM, GT2, T, LSD, and TUKEY options as confidence intervals for all pairwise differences between means, and the results of the DUNNETT, DUNNETTU, and DUNNETTL options as confidence intervals for differences with the control. The CLDIFF option is the default for unequal cell sizes unless the DUNCAN, REGWQ, SNK, or WALLER option is specified.

CLM

  • presents results of the BON, GABRIEL, SCHEFFE, SIDAK, SMM, T, and LSD options as intervals for the mean of each level of the variables specified in the MEANS statement. For all options except GABRIEL, the intervals are confidence intervals for the true means. For the GABRIEL option, they are comparison intervals for comparing means pairwise: in this case, if the intervals corresponding to two means overlap, then the difference between them is insignificant according to Gabriel's method.

DEPONLY

  • displays only means for the dependent variables. By default, PROC GLM produces means for all continuous variables, including continuous independent variables.

DUNCAN

  • performs Duncan's multiple range test on all main effect means given in the MEANS statement. See the LINES option for a discussion of how the procedure displays results.

DUNNETT < ( formatted-control-values ) >

  • performs Dunnett's two-tailed t test, testing if any treatments are significantly different from a single control for all main effects means in the MEANS statement.

  • To specify which level of the effect is the control, enclose the formatted value in quotes in parentheses after the keyword. If more than one effect is specified in the MEANS statement, you can use a list of control values within the parentheses. By default, the first level of the effect is used as the control. For example,

      means A / dunnett('CONTROL');
  • where CONTROL is the formatted control value of A. As another example,

      means A B C / dunnett('CNTLA' 'CNTLB' 'CNTLC');

  • where CNTLA, CNTLB, and CNTLC are the formatted control values for A, B, and C, respectively.

DUNNETTL < ( formatted-control-value ) >

  • performs Dunnett's one-tailed t test, testing if any treatment is significantly less than the control. Control level information is specified as described for the DUNNETT option.

DUNNETTU < ( formatted-control-value ) >

  • performs Dunnett's one-tailed t test, testing if any treatment is significantly greater than the control. Control level information is specified as described for the DUNNETT option.

E= effect

  • specifies the error mean square used in the multiple comparisons. By default, PROC GLM uses the overall residual or error mean square (MS). The effect specified with the E= option must be a term in the model; otherwise, the procedure uses the residual MS.

ETYPE= n

  • specifies the type of mean square for the error effect. When you specify E= effect , you may need to indicate which type (1, 2, 3, or 4) of MS is to be used. The n value must be one of the types specified in or implied by the MODEL statement. The default MS type is the highest type used in the analysis.
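
  • As a sketch, assume a nested design in which B(A) is the appropriate error term for comparing levels of A. The data set name nested and the variable names are hypothetical:

      /* hypothetical data set and variable names */
      proc glm data=nested;
         class A B;
         model Y = A B(A);
         /* compare A means using the Type III mean square for B(A) */
         means A / tukey e=B(A) etype=3;
      run;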

GABRIEL

  • performs Gabriel's multiple-comparison procedure on all main effect means in the MEANS statement. See the CLDIFF and LINES options for discussions of how the procedure displays results.

GT2

  • see the SMM option.

HOVTEST

HOVTEST=BARTLETT

HOVTEST=BF

HOVTEST=LEVENE < (TYPE= ABS | SQUARE) >

HOVTEST=OBRIEN < (W= number ) >

  • requests a homogeneity of variance test for the groups defined by the MEANS effect. You can optionally specify a particular test; if you do not specify a test, Levene's test (Levene 1960) with TYPE=SQUARE is computed. Note that this option is ignored unless your MODEL statement specifies a simple one-way model.

  • The HOVTEST=BARTLETT option specifies Bartlett's test (Bartlett 1937), a modification of the normal-theory likelihood ratio test.

  • The HOVTEST=BF option specifies Brown and Forsythe's variation of Levene's test (Brown and Forsythe 1974).

  • The HOVTEST=LEVENE option specifies Levene's test (Levene 1960), which is widely considered to be the standard homogeneity of variance test. You can use the TYPE= option in parentheses to specify whether to use the absolute residuals (TYPE=ABS) or the squared residuals (TYPE=SQUARE) in Levene's test. TYPE=SQUARE is the default.

  • The HOVTEST=OBRIEN option specifies O'Brien's test (O'Brien 1979), which is basically a modification of HOVTEST=LEVENE(TYPE=SQUARE). You can use the W= option in parentheses to tune the variable to match the suspected kurtosis of the underlying distribution. By default, W=0.5, as suggested by O'Brien (1979; 1981).

  • See the "Homogeneity of Variance in One-Way Models" section on page 1818 for more details on these methods. Example 32.10 on page 1892 illustrates the use of the HOVTEST and WELCH options in the MEANS statement in testing for equal group variances and adjusting for unequal group variances in a one-way ANOVA.
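
  • The following sketch shows two of the forms listed above in a simple one-way model. The data set name oneway and the variable names are hypothetical:

      /* hypothetical data set and variable names */
      proc glm data=oneway;
         class group;
         model Y = group;                          /* simple one-way model */
         means group / hovtest=levene(type=abs);   /* Levene, absolute residuals */
         means group / hovtest=bf welch;           /* Brown-Forsythe plus Welch ANOVA */
      run;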

HTYPE= n

  • specifies the MS type for the hypothesis MS. The HTYPE= option is needed only when the WALLER option is specified. The default HTYPE= value is the highest type used in the model.

KRATIO= value

  • specifies the Type 1/Type 2 error seriousness ratio for the Waller-Duncan test. Reasonable values for the KRATIO= option are 50, 100, and 500, which roughly correspond for the two-level case to ALPHA levels of 0.1, 0.05, and 0.01, respectively. By default, the procedure uses the value of 100.
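
  • For example, a more conservative Waller-Duncan test, roughly comparable to ALPHA=0.01 in the two-level case, could be requested as in the following sketch. The data set name oneway and the variable names are hypothetical:

      /* hypothetical data set and variable names */
      proc glm data=oneway;
         class A;
         model Y = A;
         means A / waller kratio=500 htype=3;
      run;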

LINES

  • presents results of the BON, DUNCAN, GABRIEL, REGWQ, SCHEFFE, SIDAK, SMM, GT2, SNK, T, LSD, TUKEY, and WALLER options by listing the means in descending order and indicating nonsignificant subsets by line segments beside the corresponding means. The LINES option is appropriate for equal cell sizes, for which it is the default. The LINES option is also the default if the DUNCAN, REGWQ, SNK, or WALLER option is specified, or if there are only two cells of unequal size. The LINES option cannot be used in combination with the DUNNETT, DUNNETTL, or DUNNETTU option. In addition, the procedure has a restriction that no more than 24 overlapping groups of means can exist. If a mean belongs to more than 24 groups, the procedure issues an error message. You can either reduce the number of levels of the variable or use a multiple comparison test that allows the CLDIFF option rather than the LINES option.

  • Note: If the cell sizes are unequal, the harmonic mean of the cell sizes is used to compute the critical ranges. This approach is reasonable if the cell sizes are not too different, but it can lead to liberal tests if the cell sizes are highly disparate. In this case, you should not use the LINES option for displaying multiple comparisons results; use the TUKEY and CLDIFF options instead.

LSD

  • see the T option.

NOSORT

  • prevents the means from being sorted into descending order when the CLDIFF or CLM option is specified.

REGWQ

  • performs the Ryan-Einot-Gabriel-Welsch multiple range test on all main effect means in the MEANS statement. See the LINES option for a discussion of how the procedure displays results.

SCHEFFE

  • performs Scheffé's multiple-comparison procedure on all main effect means in the MEANS statement. See the CLDIFF and LINES options for discussions of how the procedure displays results.

SIDAK

  • performs pairwise t tests on differences between means with levels adjusted according to Sidak's inequality for all main effect means in the MEANS statement. See the CLDIFF and LINES options for discussions of how the procedure displays results.

SMM

GT2

  • performs pairwise comparisons based on the studentized maximum modulus and Sidak's uncorrelated-t inequality, yielding Hochberg's GT2 method when sample sizes are unequal, for all main effect means in the MEANS statement. See the CLDIFF and LINES options for discussions of how the procedure displays results.

SNK

  • performs the Student-Newman-Keuls multiple range test on all main effect means in the MEANS statement. See the LINES option for a discussion of how the procedure displays results.

T

LSD

  • performs pairwise t tests, equivalent to Fisher's least-significant-difference test in the case of equal cell sizes, for all main effect means in the MEANS statement. See the CLDIFF and LINES options for discussions of how the procedure displays results.

TUKEY

  • performs Tukey's studentized range test (HSD) on all main effect means in the MEANS statement. (When the group sizes are different, this is the Tukey-Kramer test.) See the CLDIFF and LINES options for discussions of how the procedure displays results.

WALLER

  • performs the Waller-Duncan k-ratio t test on all main effect means in the MEANS statement. See the KRATIO= and HTYPE= options for information on controlling details of the test, and the LINES option for a discussion of how the procedure displays results.

WELCH

  • requests the variance-weighted one-way ANOVA of Welch (1951). This alternative to the usual analysis of variance for a one-way model is robust to the assumption of equal within-group variances. This option is ignored unless your MODEL statement specifies a simple one-way model.

  • Note that using the WELCH option merely produces one additional table consisting of Welch's ANOVA. It does not affect any of the other tests displayed by the GLM procedure, which still require the assumption of equal variance for exact validity.

  • See the "Homogeneity of Variance in One-Way Models" section on page 1818 for more details on Welch's ANOVA. Example 32.10 on page 1892 illustrates the use of the HOVTEST and WELCH options in the MEANS statement in testing for equal group variances and adjusting for unequal group variances in a one-way ANOVA.

MODEL Statement

  • MODEL dependents=independents < / options > ;

The MODEL statement names the dependent variables and independent effects. The syntax of effects is described in the "Specification of Effects" section on page 1784. For any model effect involving classification variables (interactions as well as main effects), the number of levels cannot exceed 32,767. If no independent effects are specified, only an intercept term is fit. You can specify only one MODEL statement (in contrast to the REG procedure, for example, which allows several MODEL statements in the same PROC REG run).
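
As a sketch of the intercept-only case, the following MODEL statement lists no independent effects, so only the overall mean of Y is fit. The data set name trial and the variable name Y are hypothetical:

  /* hypothetical data set and variable name */
  proc glm data=trial;
     model Y = ;      /* no independent effects: intercept only */
  run;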

The following table summarizes options available in the MODEL statement.

Task                                       Options

Produce tests for the intercept            INTERCEPT
Omit the intercept parameter from model    NOINT
Produce parameter estimates                SOLUTION
Produce tolerance analysis                 TOLERANCE
Suppress univariate tests and output       NOUNI
Display estimable functions                E, E1, E2, E3, E4, ALIASING
Control hypothesis tests performed         SS1, SS2, SS3, SS4
Produce confidence intervals               ALPHA=, CLI, CLM, CLPARM
Display predicted and residual values      P
Display intermediate calculations          INVERSE, XPX
Tune sensitivity                           SINGULAR=, ZETA=

These options are described in the following list.

ALIASING

  • specifies that the estimable functions should be displayed as an aliasing structure, for which each row says which linear combination of the parameters is estimated by each estimable function; also, adds a column of the same information to the table of parameter estimates, giving for each parameter the expected value of the estimate associated with that parameter. This option is most useful in fractional factorial experiments that can be analyzed without a CLASS statement.

ALPHA= p

  • specifies the level of significance p for 100(1 - p)% confidence intervals. By default, p is equal to the value of the ALPHA= option in the PROC GLM statement, or 0.05 if that option is not specified. You may use values between 0 and 1.

CLI

  • produces confidence limits for individual predicted values for each observation. The CLI option is ignored if the CLM option is also specified.

CLM

  • produces confidence limits for a mean predicted value for each observation.

CLPARM

  • produces confidence limits for the parameter estimates (if the SOLUTION option is also specified) and for the results of all ESTIMATE statements.
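
  • For example, the following sketch requests 90% confidence limits for the parameter estimates of a model that contains a classification effect, which is why SOLUTION is also specified. The data set name growth and the variable names are hypothetical:

      /* hypothetical data set and variable names */
      proc glm data=growth;
         class A;
         model Y = A X / solution clparm alpha=0.1;
      run;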

E

  • displays the general form of all estimable functions. This is useful for determining the order of parameters when writing CONTRAST and ESTIMATE statements.

E1

  • displays the Type I estimable functions for each effect in the model and computes the corresponding sums of squares.

E2

  • displays the Type II estimable functions for each effect in the model and computes the corresponding sums of squares.

E3

  • displays the Type III estimable functions for each effect in the model and computes the corresponding sums of squares.

E4

  • displays the Type IV estimable functions for each effect in the model and computes the corresponding sums of squares.
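
  • These options can be combined. As a sketch, the following statements display the general form of the estimable functions together with the Type III estimable functions, which can help when working out coefficient order for CONTRAST and ESTIMATE statements. The data set name factorial and the variable names are hypothetical:

      /* hypothetical data set and variable names */
      proc glm data=factorial;
         class A B;
         model Y = A B A*B / e e3;
      run;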

INTERCEPT

INT

  • produces the hypothesis tests associated with the intercept as an effect in the model. By default, the procedure includes the intercept in the model but does not display associated tests of hypotheses. Except for producing the uncorrected total sum of squares instead of the corrected total sum of squares, the INT option is ignored when you use an ABSORB statement.

INVERSE

I

  • displays the augmented inverse (or generalized inverse) X'X matrix:

        [ (X'X)⁻         (X'X)⁻X'Y           ]
        [ Y'X(X'X)⁻      Y'Y - Y'X(X'X)⁻X'Y  ]

  • The upper left-hand corner is the generalized inverse of X'X, the upper right-hand corner is the parameter estimates, and the lower right-hand corner is the error sum of squares.

NOINT

  • omits the intercept parameter from the model.

NOUNI

  • suppresses the display of univariate statistics. You typically use the NOUNI option with a multivariate or repeated measures analysis of variance when you do not need the standard univariate results. The NOUNI option in a MODEL statement does not affect the univariate output produced by the REPEATED statement.

P

  • displays observed, predicted, and residual values for each observation that does not contain missing values for independent variables. The Durbin-Watson statistic is also displayed when the P option is specified. The PRESS statistic is also produced if either the CLM or CLI option is specified.
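
  • For example, the following sketch displays observed, predicted, and residual values along with confidence limits for individual predicted values; because CLI is specified, the PRESS statistic is produced as well. The data set name growth and the variable names are hypothetical:

      /* hypothetical data set and variable names */
      proc glm data=growth;
         model Y = X / p cli;
      run;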

SINGULAR= number

  • tunes the sensitivity of the regression routine to linear dependencies in the design. If a diagonal pivot element is less than C number as PROC GLM sweeps the X ² X matrix, the associated design column is declared to be linearly dependent with previous columns, and the associated parameter is zeroed.

  • The C value adjusts the check to the relative scale of the variable. The C value is equal to the corrected sum of squares for the variable, unless the corrected sum of squares is 0, in which case C is 1. If you specify the NOINT option but not the ABSORB statement, PROC GLM uses the uncorrected sum of squares instead.

  • The default value of the SINGULAR= option, 10⁻⁷, may be too small, but this value is necessary in order to handle the high-degree polynomials used in the literature to compare regression routines.

SOLUTION

  • produces a solution to the normal equations (parameter estimates). PROC GLM displays a solution by default when your model involves no classification variables, so you need this option only if you want to see the solution for models with classification effects.

SS1

  • displays the sum of squares associated with Type I estimable functions for each effect. These are also displayed by default.

SS2

  • displays the sum of squares associated with Type II estimable functions for each effect.

SS3

  • displays the sum of squares associated with Type III estimable functions for each effect. These are also displayed by default.

SS4

  • displays the sum of squares associated with Type IV estimable functions for each effect.

TOLERANCE

  • displays the tolerances used in the SWEEP routine. The tolerances are of the form C/USS or C/CSS, as described in the discussion of the SINGULAR= option. The tolerance value for the intercept is not divided by its uncorrected sum of squares.

XPX

  • displays the augmented X′X crossproducts matrix:

    \[
    \begin{bmatrix}
    X'X & X'y \\
    y'X & y'y
    \end{bmatrix}
    \]

ZETA= value

  • tunes the sensitivity of the check for estimability for Type III and Type IV functions. Any element in the estimable function basis with an absolute value less than the ZETA= value is set to zero. The default value for the ZETA= option is 10⁻⁸.

  • Although it is possible to generate data for which this absolute check can be defeated, the check suffices in most practical examples. Additional research needs to be performed to make this check relative rather than absolute.
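The MODEL statement options described above can be combined after a single slash. The following is a minimal sketch, not a definitive recipe; the data set work.demo and the variables a, x, and y are hypothetical and are created here only so that the step is self-contained.

```sas
/* Hypothetical data: classification factor a, covariate x, response y */
data work.demo;
   input a $ x y @@;
   datalines;
c 1 3.1  c 2 4.0  c 3 5.2  c 4 5.9
t 1 4.2  t 2 5.1  t 3 6.8  t 4 7.7
;

proc glm data=work.demo;
   class a;
   /* SOLUTION: display parameter estimates even though a is a CLASS effect
      SS1 SS3:  request the Type I and Type III sums of squares explicitly
      P CLM:    list observed, predicted, and residual values, with
                confidence limits for the mean predicted value           */
   model y = a x / solution ss1 ss3 p clm;
run;
quit;
```

Because the model contains a classification effect, the parameter estimates would not be displayed without the SOLUTION option.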

OUTPUT Statement

  • OUTPUT < OUT= SAS-data-set > keyword=names < ...keyword=names > < / option > ;

The OUTPUT statement creates a new SAS data set that saves diagnostic measures calculated after fitting the model. At least one specification of the form keyword=names is required.

All the variables in the original data set are included in the new data set, along with variables created in the OUTPUT statement. These new variables contain the values of a variety of diagnostic measures that are calculated for each observation in the data set. If you want to create a permanent SAS data set, you must specify a two-level name (refer to SAS Language Reference: Concepts for more information on permanent SAS data sets).

Details on the specifications in the OUTPUT statement follow.

keyword=names

  • specifies the statistics to include in the output data set and provides names to the new variables that contain the statistics. Specify a keyword for each desired statistic (see the following list of keywords), an equal sign, and the variable or variables to contain the statistic.

  • In the output data set, the first variable listed after a keyword in the OUTPUT statement contains that statistic for the first dependent variable listed in the MODEL statement; the second variable contains the statistic for the second dependent variable in the MODEL statement, and so on. The list of variables following the equal sign can be shorter than the list of dependent variables in the MODEL statement. In this case, the procedure creates the new names in order of the dependent variables in the MODEL statement. See the Examples section on page 1775.

  • The keywords allowed and the statistics they represent are as follows :

    COOKD

    Cook's D influence statistic

    COVRATIO

    standard influence of observation on covariance of parameter estimates

    DFFITS

    standard influence of observation on predicted value

    H

    leverage, h_i = x_i(X′X)⁻¹x_i′

    LCL

    lower bound of a 100(1 − p)% confidence interval for an individual prediction. The p-level is equal to the value of the ALPHA= option in the OUTPUT statement or, if this option is not specified, to the ALPHA= option in the PROC GLM statement. If neither of these options is set, then p = 0.05 by default, resulting in the lower bound for a 95% confidence interval. The interval also depends on the variance of the error, as well as the variance of the parameter estimates. For the corresponding upper bound, see the UCL keyword.

    LCLM

    lower bound of a 100(1 − p)% confidence interval for the expected value (mean) of the predicted value. The p-level is equal to the value of the ALPHA= option in the OUTPUT statement or, if this option is not specified, to the ALPHA= option in the PROC GLM statement. If neither of these options is set, then p = 0.05 by default, resulting in the lower bound for a 95% confidence interval. For the corresponding upper bound, see the UCLM keyword.

    PREDICTED | P

    predicted values

    PRESS

    residual for the ith observation that results from dropping it and predicting it on the basis of all other observations. This is the residual divided by (1 − h_i), where h_i is the leverage, defined previously.

    RESIDUAL | R

    residuals, calculated as ACTUAL − PREDICTED

    RSTUDENT

    a studentized residual with the current observation deleted

    STDI

    standard error of the individual predicted value

    STDP

    standard error of the mean predicted value

    STDR

    standard error of the residual

    STUDENT

    studentized residuals, the residual divided by its standard error

    UCL

    upper bound of a 100(1 − p)% confidence interval for an individual prediction. The p-level is equal to the value of the ALPHA= option in the OUTPUT statement or, if this option is not specified, to the ALPHA= option in the PROC GLM statement. If neither of these options is set, then p = 0.05 by default, resulting in the upper bound for a 95% confidence interval. The interval also depends on the variance of the error, as well as the variance of the parameter estimates. For the corresponding lower bound, see the LCL keyword.

    UCLM

    upper bound of a 100(1 − p)% confidence interval for the expected value (mean) of the predicted value. The p-level is equal to the value of the ALPHA= option in the OUTPUT statement or, if this option is not specified, to the ALPHA= option in the PROC GLM statement. If neither of these options is set, then p = 0.05 by default, resulting in the upper bound for a 95% confidence interval. For the corresponding lower bound, see the LCLM keyword.

OUT= SAS-data-set

  • gives the name of the new data set. By default, the procedure uses the DATAn convention to name the new data set.

  • The following option is available in the OUTPUT statement and is specified after a slash (/):

ALPHA= p

  • specifies the level of significance p for 100(1 − p)% confidence intervals. By default, p is equal to the value of the ALPHA= option in the PROC GLM statement or 0.05 if that option is not specified. You can use values between 0 and 1.

  • See Chapter 2, Introduction to Regression Procedures, and the Influence Diagnostics section on page 3898 in Chapter 61, The REG Procedure, for details on the calculation of these statistics.
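As a small worked illustration of the PRESS keyword, the definition given above (the residual divided by one minus the leverage) can be written out as follows. The symbols e_i for the ordinary residual and e_(i) for the PRESS residual are notation assumed here for the sketch, and the numbers are made up.

```latex
% PRESS residual for observation i, from the ordinary residual e_i
% and the leverage h_i (notation assumed for this illustration)
\[
  e_{(i)} = \frac{e_i}{1 - h_i}
\]
% Made-up numbers: e_i = 2.0 and h_i = 0.2
\[
  e_{(i)} = \frac{2.0}{1 - 0.2} = \frac{2.0}{0.8} = 2.5
\]
```

The higher the leverage of an observation, the more its PRESS residual exceeds its ordinary residual in absolute value.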

Examples

The following statements show the syntax for creating an output data set with a single dependent variable.

  proc glm;
     class a b;
     model y=a b a*b;
     output out=new p=yhat r=resid stdr=eresid;
  run;

These statements create an output data set named new . In addition to all the variables from the original data set, new contains the variable yhat , with values that are predicted values of the dependent variable y ; the variable resid , with values that are the residual values of y ; and the variable eresid , with values that are the standard errors of the residuals.

The following statements show a situation with five dependent variables.

  proc glm;
     by group;
     class a;
     model y1-y5=a x(a);
     output out=pout predicted=py1-py5;
  run;

Data set pout contains five new variables, py1 through py5 . The values of py1 are the predicted values of y1 ; the values of py2 are the predicted values of y2 ; and so on.

For more information on the data set produced by the OUTPUT statement, see the section Output Data Sets on page 1840.
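The ALPHA= option and the interval keywords are not shown in the examples above. The following sketch, under the assumption of a hypothetical data set work.demo with a classification variable a and a response y, saves 90% limits for an individual prediction together with two influence measures; the output variable names are arbitrary.

```sas
/* Hypothetical data: one classification factor a and a response y */
data work.demo;
   input a $ y @@;
   datalines;
c 3.1  c 4.0  c 5.2  c 5.9
t 4.2  t 5.1  t 6.8  t 7.7
;

proc glm data=work.demo;
   class a;
   model y = a;
   /* Save predicted values, limits for an individual prediction,
      leverage, and Cook's D. ALPHA=0.10 requests 100(1 - 0.10)% = 90%
      intervals instead of the default 95%.                          */
   output out=work.diag p=yhat lcl=lower ucl=upper h=lev cookd=cd
          / alpha=0.10;
run;
quit;
```

The data set work.diag contains all variables from work.demo plus the five new variables named in the OUTPUT statement.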

RANDOM Statement

  • RANDOM effects < / options > ;

When some model effects are random (that is, assumed to be sampled from a normal population of effects), you can specify these effects in the RANDOM statement in order to compute the expected values of mean squares for various model effects and contrasts and, optionally, to perform random effects analysis of variance tests. You can use as many RANDOM statements as you want, provided that they appear after the MODEL statement. If you use a CONTRAST statement with a RANDOM statement and you want to obtain the expected mean squares for the contrast hypothesis, you must enter the CONTRAST statement before the RANDOM statement.

Note: PROC GLM uses only the information pertaining to expected mean squares when you specify the TEST option in the RANDOM statement and, even then, only in the extra F tests produced by the RANDOM statement. Other features in the GLM procedure, including the results of the LSMEANS and ESTIMATE statements, assume that all effects are fixed, so that all tests and estimability checks for these statements are based on a fixed effects model, even when you use a RANDOM statement. Therefore, you should use the MIXED procedure to compute tests involving these features that take the random effects into account; see the section PROC GLM versus PROC MIXED for Random Effects Analysis on page 1833 and Chapter 46, The MIXED Procedure, for more information.

When you use the RANDOM statement, by default the GLM procedure produces the Type III expected mean squares for model effects and for contrasts specified before the RANDOM statement in the program code. In order to obtain expected values for other types of mean squares, you need to specify which types of mean squares are of interest in the MODEL statement. See the section Computing Type I, II, and IV Expected Mean Squares on page 1835 for more information.

The list of effects in the RANDOM statement should contain one or more of the pure classification effects specified in the MODEL statement (that is, main effects, crossed effects, or nested effects involving only class variables). The coefficients corresponding to each effect specified are assumed to be normally and independently distributed with common variance. Levels in different effects are assumed to be independent.

You can specify the following options in the RANDOM statement after a slash:

Q

  • displays all quadratic forms in the fixed effects that appear in the expected mean squares. For some designs (large mixed-level factorials, for example), the Q option may generate a substantial amount of output.

TEST

  • performs hypothesis tests for each effect specified in the model, using appropriate error terms as determined by the expected mean squares.

  • Caution: PROC GLM does not automatically declare interactions to be random when the effects in the interaction are declared random. For example,

      random a b / test;  
  • does not produce the same expected mean squares or tests as

      random a b a*b / test;  
  • To ensure correct tests, you need to list all random interactions and random main effects in the RANDOM statement.

  • See the section Random Effects Analysis on page 1833 for more information on the calculation of expected mean squares and tests and on the similarities and differences between the GLM and MIXED procedures. See Chapter 3, Introduction to Analysis-of-Variance Procedures, and Chapter 46, The MIXED Procedure, for more information on random effects.
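To make the caution about interactions concrete, the following sketch fits a two-factor model in which, purely as an assumption for illustration, a is treated as fixed and b as random. The data set work.rand is simulated and hypothetical. Because b is random, the a*b interaction is listed in the RANDOM statement as well.

```sas
/* Hypothetical simulated data: 3 levels of a, 4 levels of b, 2 replicates */
data work.rand;
   call streaminit(2024);            /* fix the seed so the data are reproducible */
   do a = 1 to 3;
      do b = 1 to 4;
         do rep = 1 to 2;
            y = 10 + a + 0.5*b + rand('normal');
            output;
         end;
      end;
   end;
run;

proc glm data=work.rand;
   class a b;
   model y = a b a*b;
   /* List the random main effect AND the random interaction;
      TEST requests F tests that use error terms chosen from
      the expected mean squares                               */
   random b a*b / test;
run;
quit;
```

Omitting a*b from the RANDOM statement would leave the interaction treated as fixed, which changes the expected mean squares and therefore the tests.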

REPEATED Statement

  • REPEATED factor-specification < / options > ;

When values of the dependent variables in the MODEL statement represent repeated measurements on the same experimental unit, the REPEATED statement enables you to test hypotheses about the measurement factors (often called within-subject factors ) as well as the interactions of within-subject factors with independent variables in the MODEL statement (often called between-subject factors ). The REPEATED statement provides multivariate and univariate tests as well as hypothesis tests for a variety of single-degree-of-freedom contrasts. There is no limit to the number of within-subject factors that can be specified.

The REPEATED statement is typically used for handling repeated measures designs with one repeated response variable. Usually, the variables on the left-hand side of the equation in the MODEL statement represent one repeated response variable. This does not mean that only one factor can be listed in the REPEATED statement. For example, one repeated response variable (hemoglobin count) might be measured 12 times (implying variables Y1 to Y12 on the left-hand side of the equal sign in the MODEL statement), with the associated within-subject factors treatment and time (implying two factors listed in the REPEATED statement). See the Examples section on page 1781 for an example of how PROC GLM handles this case. Designs with two or more repeated response variables can, however, be handled with the IDENTITY transformation; see page 1779 for more information, and Example 32.9 on page 1886 for an example of analyzing a doubly-multivariate repeated measures design.

When a REPEATED statement appears, the GLM procedure enters a multivariate mode of handling missing values. If any values for variables corresponding to each combination of the within-subject factors are missing, the observation is excluded from the analysis.

If you use a CONTRAST or TEST statement with a REPEATED statement, you must enter the CONTRAST or TEST statement before the REPEATED statement.

The simplest form of the REPEATED statement requires only a factor-name. With two repeated factors, you must specify the factor-name and number of levels (levels) for each factor. Optionally, you can specify the actual values for the levels (level-values), a transformation that defines single-degree-of-freedom contrasts, and options for additional analyses and output. When you specify more than one within-subject factor, the factor-names (and associated level and transformation information) must be separated by a comma in the REPEATED statement. These terms are described in the following section, Syntax Details.

Syntax Details

You can specify the following terms in the REPEATED statement.

factor-specification

  • The factor-specification for the REPEATED statement can include any number of individual factor specifications, separated by commas, of the following form:

    • factor-name levels < ( level-values ) >< transformation >

  • where

    factor-name

    names a factor to be associated with the dependent variables. The name should not be the same as any variable name that already exists in the data set being analyzed and should conform to the usual conventions of SAS variable names.

    When specifying more than one factor, list the dependent variables in the MODEL statement so that the within-subject factors defined in the REPEATED statement are nested; that is, the first factor defined in the REPEATED statement should be the one with values that change least frequently.

    levels

    gives the number of levels associated with the factor being defined. When there is only one within-subject factor, the number of levels is equal to the number of dependent variables. In this case, levels is optional. When more than one within-subject factor is defined, however, levels is required, and the product of the number of levels of all the factors must equal the number of dependent variables in the MODEL statement.

    ( level-values )

    gives values that correspond to levels of a repeated-measures factor. These values are used to label output and as spacings for constructing orthogonal polynomial contrasts if you specify a POLYNOMIAL transformation. The number of values specified must correspond to the number of levels for that factor in the REPEATED statement. Enclose the level-values in parentheses.

The following transformation keywords define single-degree-of-freedom contrasts for factors specified in the REPEATED statement. Since the number of contrasts generated is always one less than the number of levels of the factor, you have some control over which contrast is omitted from the analysis by which transformation you select. The only exception is the IDENTITY transformation; this transformation is not composed of contrasts and has the same degrees of freedom as the factor has levels. By default, the procedure uses the CONTRAST transformation.

CONTRAST < ( ordinal-reference-level ) >

generates contrasts between levels of the factor and a reference level. By default, the procedure uses the last level as the reference level; you can optionally specify a reference level in parentheses after the keyword CONTRAST. The reference level corresponds to the ordinal value of the level rather than the level value specified. For example, to generate contrasts between the first level of a factor and the other levels, use

  contrast(1)  

HELMERT

generates contrasts between each level of the factor and the mean of subsequent levels.

IDENTITY

generates an identity transformation corresponding to the associated factor. This transformation is not composed of contrasts; it has n degrees of freedom for an n-level factor, instead of n − 1. This can be used for doubly-multivariate repeated measures.

MEAN < ( ordinal-reference-level ) >

generates contrasts between levels of the factor and the mean of all other levels of the factor. Specifying a reference level eliminates the contrast between that level and the mean. Without a reference level, the contrast involving the last level is omitted. See the CONTRAST transformation for an example.

POLYNOMIAL

generates orthogonal polynomial contrasts. Level values, if provided, are used as spacings in the construction of the polynomials; otherwise, equal spacing is assumed.

PROFILE

generates contrasts between adjacent levels of the factor.

You can specify the following options in the REPEATED statement after a slash.

CANONICAL

  • performs a canonical analysis of the H and E matrices corresponding to the transformed variables specified in the REPEATED statement.

HTYPE= n

  • specifies the type of the H matrix used in the multivariate tests and the type of sums of squares used in the univariate tests. See the HTYPE= option in the specifications for the MANOVA statement for further details.

MEAN

  • generates the overall arithmetic means of the within-subject variables.

MSTAT=FAPPROX

MSTAT=EXACT

  • specifies the method of evaluating the test statistics for the multivariate analysis. The default is MSTAT=FAPPROX, which specifies that the multivariate tests are evaluated using the usual approximations based on the F distribution, as discussed in the Multivariate Tests section in Chapter 2, Introduction to Regression Procedures. Alternatively, you can specify MSTAT=EXACT to compute exact p-values for three of the four tests (Wilks' Lambda, the Hotelling-Lawley Trace, and Roy's Greatest Root) and an improved F-approximation for the fourth (Pillai's Trace). While MSTAT=EXACT provides better control of the significance probability for the tests, especially for Roy's Greatest Root, computations for the exact p-values can be appreciably more demanding, and are in fact infeasible for large problems (many dependent variables). Thus, although MSTAT=EXACT is more accurate for most data, it is not the default method. For more information on the results of MSTAT=EXACT, see the Multivariate Analysis of Variance section on page 1823.

NOM

  • displays only the results of the univariate analyses.

NOU

  • displays only the results of the multivariate analyses.

PRINTE

  • displays the E matrix for each combination of within-subject factors, as well as partial correlation matrices for both the original dependent variables and the variables defined by the transformations specified in the REPEATED statement. In addition, the PRINTE option provides sphericity tests for each set of transformed variables. If the requested transformations are not orthogonal, the PRINTE option also provides a sphericity test for a set of orthogonal contrasts.

PRINTH

  • displays the H (SSCP) matrix associated with each multivariate test.

PRINTM

  • displays the transformation matrices that define the contrasts in the analysis. PROC GLM always displays the M matrix so that the transformed variables are defined by the rows, not the columns, of the displayed M matrix. In other words, PROC GLM actually displays M′.

PRINTRV

  • displays the characteristic roots and vectors for each multivariate test.

SUMMARY

  • produces analysis-of-variance tables for each contrast defined by the within-subject factors. Along with tests for the effects of the independent variables specified in the MODEL statement, a term labeled MEAN tests the hypothesis that the overall mean of the contrast is zero.

Examples

When specifying more than one factor, list the dependent variables in the MODEL statement so that the within-subject factors defined in the REPEATED statement are nested; that is, the first factor defined in the REPEATED statement should be the one with values that change least frequently. For example, assume that three treatments are administered at each of four times, for a total of twelve dependent variables on each experimental unit. If the variables are listed in the MODEL statement as Y1 through Y12 , then the following REPEATED statement

  proc glm;
     class group;
     model Y1-Y12=group / nouni;
     repeated trt 3, time 4;
  run;

implies the following structure:

 

  Dependent Variables   Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9  Y10  Y11  Y12
  Value of trt           1   1   1   1   2   2   2   2   3    3    3    3
  Value of time          1   2   3   4   1   2   3   4   1    2    3    4

The REPEATED statement always produces a table like the preceding one. For more information, see the section Repeated Measures Analysis of Variance on page 1825.
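The level-values, transformation, and options described in Syntax Details can be combined in a single REPEATED statement. The following sketch assumes a hypothetical data set work.rm in which y1 through y4 are measurements taken at unequally spaced times 1, 2, 4, and 8 on each subject; the names and values are made up for illustration.

```sas
/* Hypothetical data: two groups, four repeated measurements per subject */
data work.rm;
   input group $ y1-y4;
   datalines;
a 10 12 15 19
a 11 13 15 20
a  9 12 14 18
a 10 14 16 21
b 10 11 12 13
b 12 12 14 14
b 11 13 13 15
b  9 10 12 12
;

proc glm data=work.rm;
   class group;
   model y1-y4 = group / nouni;
   /* One within-subject factor with 4 levels.
      (1 2 4 8):  level values, used as spacings for the polynomials
      POLYNOMIAL: orthogonal polynomial contrasts (linear, quadratic, cubic)
      SUMMARY:    analysis-of-variance table for each contrast
      PRINTE:     E matrix, partial correlations, and sphericity tests */
   repeated time 4 (1 2 4 8) polynomial / summary printe;
run;
quit;
```

With a single within-subject factor the number of levels is optional, but it must be given here because level-values follow it.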

TEST Statement

  • TEST < H= effects > E= effect < / options > ;

Although an F value is computed for all sums of squares in the analysis using the residual MS as an error term, you may request additional F tests using other effects as error terms. You need a TEST statement when a nonstandard error structure (as in a split-plot design) exists. Note, however, that this may not be appropriate if the design is unbalanced, since in most unbalanced designs with nonstandard error structures, mean squares are not necessarily independent with equal expectations under the null hypothesis.

Caution: The GLM procedure does not check any of the assumptions underlying the F statistic. When you specify a TEST statement, you assume sole responsibility for the validity of the F statistic produced. To help validate a test, you can use the RANDOM statement and inspect the expected mean squares, or you can use the TEST option of the RANDOM statement.

You may use as many TEST statements as you want, provided that they appear after the MODEL statement.

You can specify the following terms in the TEST statement.

H= effects

specifies which effects in the preceding model are to be used as hypothesis (numerator) effects.

E= effect

specifies one, and only one, effect to use as the error (denominator) term. The E= specification is required.

By default, the sum of squares type for all hypothesis sum of squares and error sum of squares is the highest type computed in the model. If the hypothesis type or error type is to be another type that was computed in the model, you should specify one or both of the following options after a slash.

ETYPE= n

  • specifies the type of sum of squares to use for the error term. The type must be a type computed in the model ( n =1, 2, 3, or 4).

HTYPE= n

  • specifies the type of sum of squares to use for the hypothesis. The type must be a type computed in the model ( n =1, 2, 3, or 4).

  • This example illustrates the TEST statement with a split-plot model:

      proc glm;
         class a b c;
         model y=a b(a) c a*c b*c(a);
         test h=a e=b(a) / htype=1 etype=1;
         test h=c a*c e=b*c(a) / htype=1 etype=1;
      run;

WEIGHT Statement

  • WEIGHT variable ;

When a WEIGHT statement is used, a weighted residual sum of squares

\[
\sum_i w_i (y_i - \hat{y}_i)^2
\]

is minimized, where w_i is the value of the variable specified in the WEIGHT statement, y_i is the observed value of the response variable, and ŷ_i is the predicted value of the response variable.

If you specify the WEIGHT statement, it must appear before the first RUN statement or it is ignored.

An observation is used in the analysis only if the value of the WEIGHT statement variable is nonmissing and greater than zero.

The WEIGHT statement has no effect on degrees of freedom or number of observations, but it is used by the MEANS statement when calculating means and performing multiple comparison tests (as described in the MEANS Statement section beginning on page 1763). The normal equations used when a WEIGHT statement is present are

\[
X'WX\beta = X'WY
\]

where W is a diagonal matrix consisting of the values of the variable specified in the WEIGHT statement.

If the weights for the observations are proportional to the reciprocals of the error variances, then the weighted least-squares estimates are best linear unbiased estimators (BLUE).
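As a minimal sketch of the last point, the following step builds weights as reciprocals of assumed error variances and passes them to PROC GLM. The data set work.wls and the variables x, y, sd, and w are hypothetical; sd is taken to be a known standard deviation for each observation.

```sas
/* Hypothetical data: sd is an assumed known error standard deviation */
data work.wls;
   input x y sd;
   w = 1 / sd**2;     /* weight = reciprocal of the error variance */
   datalines;
1  2.0 0.2
2  4.1 0.2
3  5.9 0.5
4  8.3 0.5
5  9.6 1.0
6 12.4 1.0
;

proc glm data=work.wls;
   model y = x;
   weight w;          /* must appear before the first RUN statement */
run;
quit;
```

Observations with a missing or nonpositive value of w would be excluded from the analysis.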




SAS/STAT 9.1 User's Guide, Volume 3
ISBN: B0042UQTBS
EAN: N/A
Year: 2004
Pages: 105
