Syntax | SAS/STAT 9.1 User's Guide (Vol. 4)

The following statements are available in PROC MULTTEST.

PROC MULTTEST < options > ;
- BY variables ;
- CLASS variable ;
- CONTRAST label values ;
- FREQ variable ;
- STRATA variable ;
- TEST name (variables < / options > ) ;

Items within angle brackets (< >) are optional, and statements following the PROC MULTTEST statement can appear in any order. The CLASS and TEST statements are required. The syntax of each statement is described in the following section in alphabetical order after the description of the PROC MULTTEST statement.
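As an illustration of how these statements fit together, the following is a minimal sketch rather than an example from this guide. The data set Tumors and the variables Dose, Liver, and Lung are hypothetical.

```sas
/* Hypothetical data set Tumors: one row per animal.              */
/* Dose is the grouping variable; Liver and Lung are 0/1 outcomes. */
proc multtest data=Tumors bonferroni;
   class Dose;                      /* required: grouping variable      */
   test ca(Liver Lung);             /* required: test and its variables */
   contrast 'linear trend' 0 1 2;   /* assumes Dose has three levels    */
run;
```

Because the statements after PROC MULTTEST can appear in any order, the CONTRAST statement could equally be placed before the CLASS statement.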

PROC MULTTEST Statement

PROC MULTTEST < options > ;

You can specify the following options in the PROC MULTTEST statement.

BONFERRONI

BON

specifies that the Bonferroni adjustments (number of tests × p-value) be computed for each test. These adjustments can be extremely conservative and should be viewed with caution. When exact tests are specified via the PERMUTATION= option in the TEST statement, the actual permutation distributions are used, resulting in a much less conservative version of this procedure (Westfall and Wolfinger 1997).

BOOTSTRAP

BOOT

specifies that the p-values be adjusted using the bootstrap method to resample vectors (Westfall and Young 1993). Resampling is performed with replacement and independently within levels of the STRATA variable. Continuous variables are mean-centered by default prior to resampling. The BOOTSTRAP option is not allowed with the PETO test for theoretical reasons.

If the PERMUTATION= suboption is used with the CA test in the TEST statement, the exact permutation distribution is recomputed for each bootstrap sample. Caution: This can be very time-consuming. It is preferable to use permutation resampling when permutation base tests are used.

CENTER

requests that continuous variables be mean-centered prior to resampling. The default action is to mean-center for bootstrap resampling and not to mean-center for permutation resampling.

DATA= SAS-data-set

names the input SAS data set to be used by PROC MULTTEST. The default is to use the most recently created data set. The DATA= and PDATA= options cannot both be specified.

FDR

requests adjusted p-values using the method of Benjamini and Hochberg (1995). These p-values do not control the familywise error rate, but they do control the false discovery rate in some cases.

FISHER_C

requests adjusted p-values using Fisher's combination method.

HOC

requests adjusted p-values using Hochberg's (1988) step-up Bonferroni method.

HOMMEL

HOM

requests adjusted p-values using Hommel's (1988) method.

HOLM

is an alias for the STEPBON adjustment.

NOCENTER

requests that continuous variables not be mean-centered prior to resampling. The default action is to mean-center for bootstrap resampling and not to mean-center for permutation resampling.

NOPRINT

suppresses the normal display of results. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 14, The Output Delivery System, for more information.

NOTABLES

suppresses display of the "Discrete Variable Tabulations" and "Continuous Variable Tabulations" tables.

NOZEROS

suppresses display of tables having zero occurrences for all CLASS levels.

NSAMPLE= number

N= number

specifies the number of resamples for use with the BOOTSTRAP and PERMUTATION options; it is assumed to be 20,000 by default. Large values of number (20,000 or more) are usually recommended for accuracy, but long execution times may result, particularly with large data sets.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

specifies the sorting order for the levels of the CLASS variable. By default, ORDER=FORMATTED. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine dependent. This ordering determines which parameters in the model correspond to each level in the data, so the ORDER= option may be useful when you use CONTRAST statements.

When ORDER=FORMATTED is in effect for numeric variables for which you have supplied no explicit format, the levels are ordered by their internal values. In releases previous to Version 8, numeric class levels with no explicit format were ordered by their BEST12. formatted values. In order to revert to the previous method, you can specify this format explicitly for the CLASS variable. The change was implemented because the former default behavior for ORDER=FORMATTED often resulted in levels not being ordered numerically and required you to use an explicit format or to specify ORDER=INTERNAL to get the more natural ordering.

The following table shows how PROC MULTTEST interprets values of the ORDER= option.

Value of ORDER=	Levels Sorted By
DATA	order of appearance in the input data set
FORMATTED	external formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ	descending frequency count; levels with the most observations come first in the order
INTERNAL	unformatted value

For more information on sorting order, see the chapter on the SORT procedure in the SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.

OUT= SAS-data-set

names the output SAS data set containing variable names, contrast names, intermediate calculations, and all associated p-values.

OUTPERM= SAS-data-set

names the output SAS data set containing entire permutation distributions (upper-tail probabilities) for all tests when the PERMUTATION= option is used. Caution: This data set can be very large.

OUTSAMP= SAS-data-set

names the output SAS data set containing information from the resampled data sets when resampling is performed. Caution: This data set can be very large.

PDATA= SAS-data-set

names an input SAS data set containing the variable raw_p with observations that consist of raw p-values. The MULTTEST procedure adjusts the collection of raw p-values for multiplicity. Resampling-based adjustments are not permitted with this type of data input. The PDATA= and DATA= options cannot both be specified.
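A short sketch of this kind of input follows. The data set names RawP and Adjusted and the five p-values are made up for illustration; only the variable name raw_p is fixed by the procedure.

```sas
/* Hypothetical raw p-values from five tests. */
/* The variable must be named raw_p.          */
data RawP;
   input raw_p @@;
   datalines;
0.001 0.012 0.030 0.045 0.200
;

/* Adjust the raw p-values for multiplicity.                  */
/* Only non-resampling adjustments are requested with PDATA=. */
proc multtest pdata=RawP bonferroni stepbon hoc fdr out=Adjusted;
run;

proc print data=Adjusted;
run;
```

No CLASS or TEST statement appears here because the tests have already been carried out elsewhere; the procedure only adjusts the supplied values.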

PERMUTATION

PERM

requests that p-values be adjusted in the same way as with the BOOTSTRAP option, except that PROC MULTTEST resamples without replacement rather than with replacement. Resampling is performed independently within levels of the STRATA variable. By default, continuous variables are not mean-centered prior to resampling. The PERMUTATION option is not allowed with the PETO test for theoretical reasons.

PVALS

requests that a summary table of raw and adjusted p-values be included.

SEED= number

S= number

specifies the initial seed for the random number generator used for resampling. The value for number must be an integer. If you do not specify a seed, or if you specify a value less than or equal to zero, then PROC MULTTEST uses the time of day from the computer's clock to generate an initial seed. For more details about seed values, refer to SAS Language Reference: Concepts.
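The resampling options are typically used together. The sketch below is illustrative only: the data set Tumors, its variables, and the seed value 12345 are hypothetical choices, not values recommended by this guide.

```sas
/* Hypothetical data set Tumors with CLASS variable Dose and 0/1 outcomes. */
/* A positive integer seed is supplied so that the clock is not used.      */
proc multtest data=Tumors permutation nsample=20000 seed=12345;
   class Dose;
   test ca(Liver Lung / uppertailed);
   contrast 'linear trend' 0 1 2;   /* assumes three Dose levels */
run;
```

Supplying a positive seed should make repeated runs draw the same resamples, which is convenient when results need to be reproduced.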

SIDAK

SID

specifies that the Sidak adjustments be computed for each test. These adjustments take the form

  1 - (1 - p)^n

where p is the raw p-value and n is the number of tests. These are slightly less conservative than the Bonferroni adjustments, but they still should be viewed with caution. When exact tests are specified via the PERMUTATION= option in the TEST statement, the actual permutation distributions are used, resulting in a much less conservative version of this procedure (Westfall and Wolfinger 1997).
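To give a sense of how close the two adjustments are, here is a small worked comparison. The values p = 0.01 and n = 10 are illustrative and do not come from this guide.

```latex
% Bonferroni adjustment: number of tests times the raw p-value
\[
  n\,p = 10 \times 0.01 = 0.1
\]
% Sidak adjustment: 1 - (1 - p)^n
\[
  1 - (1 - p)^{n} = 1 - 0.99^{10} \approx 1 - 0.9044 = 0.0956
\]
```

The Sidak value is smaller than the Bonferroni value, but only slightly, which matches the description of it as slightly less conservative.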

STEPBON

requests adjusted p-values using the stepdown Bonferroni method of Holm (1979).
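This section does not spell out the stepdown formula. The usual textbook form of Holm's procedure is sketched below, with an illustration for three hypothetical raw p-values; treat it as a reminder of the idea rather than a statement of the exact computation performed.

```latex
% Ordered raw p-values: p_{(1)} \le p_{(2)} \le \dots \le p_{(n)}
\[
  \tilde{p}_{(i)} = \max_{j \le i} \, \min\bigl\{ 1,\; (n - j + 1)\, p_{(j)} \bigr\}
\]
% Illustration with n = 3 and raw p-values 0.01, 0.02, 0.04:
\[
  \tilde{p}_{(1)} = 3(0.01) = 0.03, \quad
  \tilde{p}_{(2)} = \max\{0.03,\, 2(0.02)\} = 0.04, \quad
  \tilde{p}_{(3)} = \max\{0.04,\, 1(0.04)\} = 0.04
\]
```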

STEPBOOT

requests that adjusted p-values be computed using bootstrap resampling as described under the BOOTSTRAP option, but in stepdown fashion.

STEPPERM

requests that adjusted p-values be computed using permutation resampling as described under the PERMUTATION option, but in stepdown fashion.

STEPSID

requests adjusted p-values using the Sidak method as described in the SIDAK option, but in stepdown fashion.

BY Statement

BY variables ;

You can specify a BY statement with PROC MULTTEST to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. The variables are one or more variables in the input data set.

If your input data set is not sorted in ascending order, use one of the following alternatives:

- Sort the data using the SORT procedure with a similar BY statement.
- Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the MULTTEST procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
- Create an index on the BY variables using the DATASETS procedure (in base SAS software).

Since sorting the data changes the order in which PROC MULTTEST reads observations, this can affect the sorting order for the levels of the CLASS variable if you have specified ORDER=DATA in the PROC MULTTEST statement. This, in turn, affects specifications in the CONTRAST statements.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
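The first alternative, sorting before the analysis, might look like the following sketch. The data set Tumors and the BY variable Sex are hypothetical.

```sas
/* Hypothetical data set Tumors with BY variable Sex. */
proc sort data=Tumors;
   by Sex;
run;

proc multtest data=Tumors bonferroni;
   by Sex;                 /* separate analysis for each value of Sex */
   class Dose;
   test ca(Liver Lung);
run;
```

Note that the sort reorders the observations, so an ORDER=DATA specification would see the CLASS levels in the sorted order.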

CLASS Statement

CLASS variable ;

The CLASS statement is required. It declares a single variable (character or numeric) used to identify the groups for the analysis. For example, if the variable Treatment defines different levels of a treatment, then the statement is

  class Treatment;

The CLASS variable can be either character or numeric. By default, its levels are determined from entire formatted values. Note that this represents a slight change from previous releases in the way in which class levels are determined. In releases prior to Version 9, class levels were determined using no more than the first 16 characters of the formatted values. If you wish to revert to this previous behavior, you can use the TRUNCATE option in the CLASS statement. In any case, you can use formats to group values into levels. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide and to the discussions of the FORMAT statement and SAS formats in SAS Language Reference: Dictionary. You can adjust the order of CLASS variable levels with the ORDER= option in the PROC MULTTEST statement. You need to be aware of the order when using the CONTRAST statement, and you should check the "Contrast Coefficients" table to verify that it is suitable.

You can specify the following option in the CLASS statement after a slash (/):

TRUNCATE

specifies that class levels should be determined using no more than the first 16 characters of the formatted values of CLASS variables. When formatted values are longer than 16 characters, you can use this option in order to revert to the levels as determined in releases previous to Version 9.

The order of the CLASS levels used by PROC MULTTEST corresponds to their formatted values; this order can be changed with the ORDER= option in the PROC MULTTEST statement.
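Using a format to group values into levels could be sketched as follows. The format name dosegrp, the data set Tumors, and the variable names are hypothetical, and the example assumes Dose is numeric with 0 meaning control.

```sas
/* Hypothetical format that collapses numeric doses into two levels. */
proc format;
   value dosegrp 0       = 'Control'
                 0<-high = 'Treated';
run;

proc multtest data=Tumors order=formatted bonferroni;
   class Dose;
   format Dose dosegrp.;        /* levels come from the formatted values */
   test fisher(Liver Lung);
run;
```

With ORDER=FORMATTED, the level 'Control' sorts before 'Treated', so it becomes the first CLASS level.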

CONTRAST Statement

CONTRAST label values ;

This statement is used to identify tests between the levels of the CLASS variable; in particular, it is used to specify the coefficients for the trend tests. The label is a string naming the contrast; it contains a maximum of 21 characters. The values are scoring coefficients across the CLASS variable levels.

You can specify multiple CONTRAST statements, thereby specifying multiple contrasts for each variable. Multiplicity adjustments are computed for all contrasts and all variables simultaneously. The coefficients are applied in the order of the CLASS variable levels; this order can be changed with the ORDER= option in the PROC MULTTEST statement. For example, consider a four-group experiment with CLASS variable levels A1, A2, B1, and B2 denoting two levels of two treatments. The following statements produce three linear trend tests for each variable identified in the TEST statement. PROC MULTTEST computes the multiplicity adjustments over the entire collection of tests, which is three times the number of variables.

  contrast 'a vs b'   -1 -1  1  1;
  contrast 'a linear' -1  1  0  0;
  contrast 'b linear'  0  0 -1  1;

As another example, consider an animal carcinogenicity experiment with dose levels 0, 4, 8, 16, and 50. You might consider trend tests defined using the following statement:

  contrast 'arithmetic trend' 0 4 8 16 50;

This statement produces a trend test using the indicated scoring coefficients. Multiplicity-adjusted p-values are then computed over the collection of variables identified in the TEST statement. Refer to Lagakos and Louis (1985) for guidelines on the selection of contrast-scoring values.
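Placed in a complete step, the trend contrast above might be used as in this sketch. The data set Carcino and the outcome variables Liver, Lung, and Kidney are hypothetical.

```sas
/* Hypothetical carcinogenicity data: Dose takes the values 0, 4, 8, 16, 50. */
/* ORDER=INTERNAL keeps the levels in numeric order, so the coefficients     */
/* line up with the doses from lowest to highest.                            */
proc multtest data=Carcino order=internal stepbon;
   class Dose;
   test ca(Liver Lung Kidney / uppertailed);
   contrast 'arithmetic trend' 0 4 8 16 50;
run;
```

Here the adjustments are computed over three tests, one per variable in the TEST statement.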

When a Fisher test is specified in the TEST statement, the CONTRAST statement coefficients are used to group the CLASS variable's levels. Groups with a -1 contrast coefficient are combined and compared with groups with a 1 contrast coefficient for each test, and groups with a 0 coefficient are not included in the contrast. For example, the statements

  contrast 'c vs all' 1 -1 -1 -1;
  contrast 'c vs t1'  1 -1  0  0;
  contrast 'c vs t3'  1  0  0 -1;

compute Fisher exact tests for (a) control versus the combined treatment groups, (b) control versus the first treatment group, and (c) control versus the third treatment group. Multiplicity adjustments are then computed over the entire collection of tests and variables. Only -1, 1, and 0 are acceptable CONTRAST coefficients when the Fisher test is specified; PROC MULTTEST ignores the CONTRAST statement if any other coefficients appear.

If you specify the FISHER test and no CONTRAST statements, then all contrasts of control versus treatment are automatically generated, with the first level of the CLASS variable deemed to be the control. In this case, the control level is assigned the value 1 in each contrast and the other treatment levels are assigned -1. You should therefore use the LOWERTAILED option to test for higher success rates in the treatment groups.

For tests other than FISHER, CONTRAST values are 0, 1, 2, ... by default. For t-tests for the mean using continuous data (and for the FT tests), the contrast coefficients are centered to have mean 0. The resulting centered scoring coefficients are then applied to the sample means (or to the double-arcsine-transformed proportions in the case of the FT tests).
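As a worked illustration of the centering step, take the coefficients 0, 4, 8, 16, 50 from the carcinogenicity example. The arithmetic below only shows what centering to mean 0 does to those values; it is not output from the procedure.

```latex
% Mean of the scoring coefficients
\[
  \bar{c} = \frac{0 + 4 + 8 + 16 + 50}{5} = \frac{78}{5} = 15.6
\]
% Centered coefficients c_i - \bar{c}
\[
  (c_i - \bar{c}) = (-15.6,\; -11.6,\; -7.6,\; 0.4,\; 34.4),
  \qquad \sum_i (c_i - \bar{c}) = 0
\]
```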

FREQ Statement

FREQ variable ;

The FREQ statement names a variable that provides frequencies for each observation in the DATA= data set. Specifically, if n is the value of the FREQ variable for a given observation, then that observation is used n times.

If the value of the FREQ variable is missing or is less than 1, the observation is not used in the analysis. If the value is not an integer, only the integer portion is used.
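The FREQ statement is convenient when the data are stored as counts rather than one row per subject. The data set Grouped and its values below are invented for illustration.

```sas
/* Hypothetical grouped data: Count subjects share each Dose/Tumor combination. */
data Grouped;
   input Dose Tumor Count;
   datalines;
0 0 45
0 1  5
1 0 40
1 1 10
2 0 32
2 1 18
;

proc multtest data=Grouped;
   class Dose;
   test ca(Tumor);
   freq Count;        /* each row is used Count times */
run;
```

Each of the six rows stands for Count identical observations, so this input represents 150 subjects in total.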

STRATA Statement

STRATA variable ;

The STRATA statement identifies a single variable to use as a stratification variable in the analysis. This yields tests similar to those discussed in Mantel and Haenszel (1959) and Hoel and Walburg (1972) for binary data and pooled-means tests for continuous data. For example, when you test for prevalence in a carcinogenicity study, it is common to stratify on intervals of the time at death; the first level of the stratification variable may represent weeks 0–52, the second weeks 53–80, and so on. In multicenter clinical studies, each level of the stratification variable may represent a particular center.

The following option is available in the STRATA statement after a slash (/):

WEIGHT=

specifies the type of strata weighting to use when computing the Freeman-Tukey and t-tests for the mean. Valid values are SAMPLESIZE, HARMONIC, and EQUAL. SAMPLESIZE requests weights proportional to the within-stratum sample sizes, and is the default method. HARMONIC sets up weights equal to the harmonic mean of the non-missing within-stratum CLASS sizes, and is similar to a Type 2 analysis in PROC GLM. EQUAL specifies equal weights, and is similar to a Type 3 analysis in PROC GLM.
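A stratified analysis of continuous responses in a multicenter study could be set up along these lines. The data set Trial, the variables Trt, Center, and Score1 through Score3, and the seed are hypothetical.

```sas
/* Hypothetical multicenter data: Center is the stratum, Trt the group, */
/* and Score1-Score3 are continuous responses.                          */
proc multtest data=Trial bootstrap nsample=20000 seed=2024;
   class Trt;
   strata Center / weight=harmonic;   /* harmonic-mean strata weights */
   test mean(Score1-Score3);
run;
```

Because a STRATA statement is present, the bootstrap resampling is carried out independently within each center.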

TEST Statement

TEST name (variables < / options > ) ;

The TEST statement is required. It identifies statistical tests to be performed and the discrete and continuous variables to be tested. The following tests are permitted as name in the TEST statement.

CA	requests the Cochran-Armitage linear trend tests for group comparisons. The test variables should take the value 0 for a failure and 1 for a success. The PERMUTATION= option can be used to request an exact permutation test; otherwise, a Z-score approximation is used. The CONTINUITY= option can be used to specify a continuity correction for the Z-score approximation.
FISHER	requests Fisher exact tests for comparing two treatment groups. The test variables should take the value 0 for a failure and 1 for a success.
FT	requests Z-score CA tests based upon the Freeman-Tukey double arcsine transformation of the frequencies. The test variables should take the value 0 for a failure and 1 for a success.
MEAN	requests the t-test for the mean. The test variables can take on any numeric values.
PETO	requests the Peto mortality-prevalence test. The test variables should take the value 0 for a nonoccurrence, 1 for an incidental occurrence, and 2 for a fatal occurrence. The TIME= option should be used with the PETO test to specify a variable giving the age at death. The CONTINUITY= option can be used to specify a continuity correction for the test.

If the value of a TEST variable is invalid, the observation is not used in the analysis. You can specify two tests only if one of them is MEAN. For example, the following statement is valid

  test ca(d1-d2) mean(c1-c2);

but the statement

  test ca(d1-d2) ft(d1-d2);

is invalid.

You can specify the following options in the TEST statement (some apply to only one test).

BINOMIAL

specifies that the binomial variance estimate be used for CA and PETO tests in their asymptotic normal approximations. The default is to use the hypergeometric variance.

CONTINUITY= number

C= number

specifies number as a particular continuity correction for the Z-score approximation in the CA and PETO tests. The default is 0.

LOWERTAILED

LOWER

is used to make all tests lower-tailed. All tests are two-tailed by default.

PERMUTATION= number

PERM= number

specifies that p-values for the CA and PETO tests be computed using exact permutation distributions when marginal success or failure totals within a stratum are number or less. For values greater than number (or when the PERMUTATION= option is omitted), PROC MULTTEST uses standard normal approximations with a continuity correction chosen to approximate the permutation distribution. PROC MULTTEST computes the appropriate convolution distributions when you use the STRATA statement along with the PERMUTATION= option.

TIME= variable

identifies the PETO test variable containing the age at death, which is assumed to be integer valued. If the TIME= option is omitted, all ages are assumed to equal 1.

UPPERTAILED

UPPER

is used to make all tests upper-tailed. All tests are two-tailed by default.
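The TEST statement options go after the slash inside the parentheses. The two steps below are sketches under assumed names: the data sets Tumors and Carcino and the variables Liver, Lung, Tumor1, Tumor2, and DeathWeek are hypothetical, as are the threshold 10 and the correction 0.5.

```sas
/* CA test: exact permutation p-values when the marginal totals are 10 or */
/* less, upper-tailed. Liver and Lung are hypothetical 0/1 outcomes.      */
proc multtest data=Tumors bonferroni;
   class Dose;
   test ca(Liver Lung / permutation=10 uppertailed);
   contrast 'linear trend' 0 1 2;   /* assumes three Dose levels */
run;

/* PETO test: Tumor1 and Tumor2 are coded 0/1/2 and DeathWeek is the     */
/* integer age at death. A non-resampling adjustment is used because the */
/* BOOTSTRAP and PERMUTATION options are not allowed with PETO.          */
proc multtest data=Carcino stepbon;
   class Dose;
   test peto(Tumor1 Tumor2 / time=DeathWeek continuity=0.5 uppertailed);
run;
```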