In statistical hypothesis testing, you typically express the belief that some effect exists in a population by specifying an alternative hypothesis H1. You state a null hypothesis H0 as the assertion that the effect does not exist and attempt to gather evidence to reject H0 in favor of H1. Evidence is gathered in the form of sample data, and a statistical test is used to assess H0. If H0 is rejected but there really is no effect, this is called a Type 1 error. The probability of a Type 1 error is usually designated alpha or α, and statistical tests are designed to ensure that α is suitably small (for example, less than 0.05).
If there really is an effect in the population but H0 is not rejected in the statistical test, then a Type 2 error has been made. The probability of a Type 2 error is usually designated beta or β. The probability 1 − β of avoiding a Type 2 error, that is, correctly rejecting H0 and achieving statistical significance, is called the power. (Note: Another more general definition of power is the probability of rejecting H0 for any given set of circumstances, even those corresponding to H0 being true. The POWER procedure uses this more general definition.)
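To make these definitions concrete, here is a minimal sketch (in Python, not part of PROC POWER) of the power of an upper one-sided one-sample z test with known standard deviation; the function name and scenario values are ours:

```python
# Illustrative only: alpha, beta, and power for an upper 1-sided one-sample
# z test of H0: mu = mu0 vs. H1: mu > mu0, with sigma known.
from scipy.stats import norm

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power = P(reject H0 | true mean mu1)."""
    z_crit = norm.ppf(1 - alpha)             # rejection cutoff on the z scale
    delta = (mu1 - mu0) / (sigma / n ** 0.5) # shift of the statistic under H1
    return 1 - norm.cdf(z_crit - delta)      # power = 1 - beta

# Under the more general definition, "power" at mu1 = mu0 is just alpha.
power = z_test_power(mu0=0, mu1=0.5, sigma=2, n=50, alpha=0.05)
```

Note that at mu1 = mu0 the function returns exactly alpha, illustrating the more general definition of power used by the procedure.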
An important goal in study planning is to ensure an acceptably high level of power. Sample size plays a prominent role in power computations because the focus is often on determining a sufficient sample size to achieve a certain power, or assessing the power for a range of different sample sizes.
Some of the analyses in the POWER procedure focus on precision rather than power. An analysis of confidence interval precision is analogous to a traditional power analysis, with CI Half-Width taking the place of effect size and Prob(Width) taking the place of power. The CI Half-Width is the margin of error associated with the confidence interval, the distance between the point estimate and an endpoint. The Prob(Width) is the probability of obtaining a confidence interval with at most a target half-width.
Table 57.23 gives a summary of the analyses supported in the POWER procedure. The name of the analysis statement reflects the type of data and design. The TEST=, CI=, and DIST= options specify the focus of the statistical hypothesis (in other words, the criterion on which the research question is based) and the test statistic to be used in data analysis.
Analysis | Statement | Options
---|---|---
Multiple linear regression: Type III F test | MULTREG | 
Correlation: Fisher's z test | ONECORR | DIST=FISHERZ
Correlation: t test | ONECORR | DIST=T
Binomial proportion: Exact test | ONESAMPLEFREQ | TEST=EXACT
Binomial proportion: z test | ONESAMPLEFREQ | TEST=Z
Binomial proportion: z test with continuity adjustment | ONESAMPLEFREQ | TEST=ADJZ
One-sample t test | ONESAMPLEMEANS | TEST=T
One-sample t test with lognormal data | ONESAMPLEMEANS | TEST=T DIST=LOGNORMAL
One-sample equivalence test for mean of normal data | ONESAMPLEMEANS | TEST=EQUIV
One-sample equivalence test for mean of lognormal data | ONESAMPLEMEANS | TEST=EQUIV DIST=LOGNORMAL
Confidence interval for a mean | ONESAMPLEMEANS | CI=T
One-way ANOVA: One-degree-of-freedom contrast | ONEWAYANOVA | TEST=CONTRAST
One-way ANOVA: Overall F test | ONEWAYANOVA | TEST=OVERALL
McNemar exact conditional test | PAIREDFREQ | 
McNemar normal approximation test | PAIREDFREQ | DIST=NORMAL
Paired t test | PAIREDMEANS | TEST=DIFF
Paired t test of mean ratio with lognormal data | PAIREDMEANS | TEST=RATIO
Paired additive equivalence of mean difference with normal data | PAIREDMEANS | TEST=EQUIV_DIFF
Paired multiplicative equivalence of mean ratio with lognormal data | PAIREDMEANS | TEST=EQUIV_RATIO
Confidence interval for mean of paired differences | PAIREDMEANS | CI=DIFF
Pearson chi-square test for two independent proportions | TWOSAMPLEFREQ | TEST=PCHI
Fisher's exact test for two independent proportions | TWOSAMPLEFREQ | TEST=FISHER
Likelihood ratio chi-square test for two independent proportions | TWOSAMPLEFREQ | TEST=LRCHI
Two-sample t test assuming equal variances | TWOSAMPLEMEANS | TEST=DIFF
Two-sample Satterthwaite t test assuming unequal variances | TWOSAMPLEMEANS | TEST=DIFF_SATT
Two-sample pooled t test of mean ratio with lognormal data | TWOSAMPLEMEANS | TEST=RATIO
Two-sample additive equivalence of mean difference with normal data | TWOSAMPLEMEANS | TEST=EQUIV_DIFF
Two-sample multiplicative equivalence of mean ratio with lognormal data | TWOSAMPLEMEANS | TEST=EQUIV_RATIO
Two-sample confidence interval for mean difference | TWOSAMPLEMEANS | CI=DIFF
Log-rank test for comparing two survival curves | TWOSAMPLESURVIVAL | TEST=LOGRANK
Gehan rank test for comparing two survival curves | TWOSAMPLESURVIVAL | TEST=GEHAN
Tarone-Ware rank test for comparing two survival curves | TWOSAMPLESURVIVAL | TEST=TARONEWARE
To specify one or more scenarios for an analysis parameter (or set of parameters), you provide a list of values for the statement option that corresponds to the parameter(s). To identify the parameter you wish to solve for, you place missing values in the appropriate list.
There are five basic types of such lists: keyword-lists, number-lists, grouped-number-lists, name-lists, and grouped-name-lists. Some parameters, such as the direction of a test, have values represented by one or more keywords in a keyword-list. Scenarios for scalar-valued parameters, such as power, are represented by a number-list. Scenarios for groups of scalar-valued parameters, such as group sample sizes in a multigroup design, are represented by a grouped-number-list. Scenarios for named parameters, such as reference survival curves, are represented by a name-list. Scenarios for groups of named parameters, such as group survival curves, are represented by a grouped-name-list.
The following subsections explain these five basic types of lists.
A keyword-list is a list of one or more keywords separated by spaces. For example, you can specify both 2-sided and upper-tailed versions of a one-sample t test:
SIDES = 2 U
A number-list can be one of two things: a series of one or more numbers expressed in the form of one or more DOLISTs, or a missing value indicator (.).
The DOLIST format is the same as in the DATA step language. For example, for the one-sample t test you can specify four scenarios (30, 50, 70, and 100) for a total sample size in any of the following ways:

NTOTAL = 30 50 70 100
NTOTAL = 30 to 70 by 20 100
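The DOLIST semantics can be illustrated with a hypothetical helper (this is an illustration in Python, not SAS code; the function name is ours) that expands such a specification into an explicit list of numbers:

```python
# Hypothetical illustration of DOLIST expansion: "start to stop by step"
# runs, with "by step" defaulting to 1, mixed freely with plain numbers.
def expand_dolist(spec):
    tokens = spec.split()
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i + 1] == "to":
            start, stop = float(tokens[i]), float(tokens[i + 2])
            step = 1.0
            i += 3
            if i + 1 < len(tokens) and tokens[i] == "by":
                step = float(tokens[i + 1])
                i += 2
            x = start
            while x <= stop + 1e-9:   # tolerate float round-off at the stop
                out.append(x)
                x += step
        else:
            out.append(float(tokens[i]))
            i += 1
    return out
```

Both forms above expand to the same four scenarios: `expand_dolist("30 to 70 by 20 100")` yields the same list as `expand_dolist("30 50 70 100")`.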
A missing value identifies a parameter as the result parameter; it is valid only with options representing parameters you can solve for in a given analysis. For example, you can request a solution for NTOTAL:
NTOTAL = .
A grouped-number-list specifies multiple scenarios for numeric values in two or more groups, possibly including missing value indicators to solve for a specific group. The list can assume one of two general forms, a crossed version and a matched version.
The crossed version of a grouped-number-list consists of a series of number-lists (see the Number-lists section on page 3491), one representing each group, each separated by a vertical bar (|). The values for each group represent multiple scenarios for that group, and the scenarios for each individual group are crossed to produce the set of all scenarios for the analysis option. For example, you can specify the following six scenarios for the sizes (n1, n2) of two groups

(20, 30)  (20, 40)  (20, 50)
(25, 30)  (25, 40)  (25, 50)

as follows:

GROUPNS = 20 25 | 30 40 50
If the analysis can solve for a value in one group given the other groups, then one of the number-lists in a crossed grouped-number-list can be a missing value indicator (.). For example, in a two-sample t test you can posit three scenarios for the group 2 sample size while solving for the group 1 sample size:
GROUPNS = . | 30 40 50
Some analyses can involve more than two groups. For example, you can specify 2 × 3 × 1 = 6 scenarios for the means of three groups in a one-way ANOVA as follows:

GROUPMEANS = 10 12 | 10 to 20 by 5 | 24
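The crossing of per-group scenario lists is simply a Cartesian product; a short Python sketch of the example above (the variable names are ours):

```python
# Illustrative only: the crossed form of a grouped-number-list is the
# Cartesian product of the per-group number-lists.
from itertools import product

group1_means = [10, 12]
group2_means = [10, 15, 20]   # expansion of "10 to 20 by 5"
group3_means = [24]

# 2 x 3 x 1 = 6 scenarios, e.g. (10, 10, 24), (10, 15, 24), ...
scenarios = list(product(group1_means, group2_means, group3_means))
```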
The matched version of a grouped number list consists of a series of numeric lists each enclosed in parentheses. Each list consists of a value for each group and represents a single scenario for the analysis option. Multiple scenarios for the analysis option are represented by multiple lists. For example, you can express the crossed grouped-number-list
GROUPNS = 20 25 | 30 40 50
alternatively in a matched format:
GROUPNS = (20 30) (20 40) (20 50) (25 30) (25 40) (25 50)
The matched version is particularly useful when you wish to include only a subset of all combinations of individual group values. For example, you may want to pair 20 only with 50, and 25 only with 30 and 40:
GROUPNS = (20 50) (25 30) (25 40)
If the analysis can solve for a value in one group given the other groups, then you can replace the value for that group with a missing value indicator (.). If used, the missing value indicator must occur in the same group in every scenario. For example, you can solve for the group 1 sample size (as in the Crossed Grouped-number-lists section on page 3491) using a matched format:
GROUPNS = (. 30) (. 40) (. 50)
Some analyses can involve more than two groups. For example, you can specify two scenarios for the means of three groups in a one-way ANOVA:
GROUPMEANS = (15 24 32) (12 25 36)
A name-list is a list of one or more names in single or double quotes separated by spaces. For example, you can specify two scenarios for the reference survival curve in a log-rank test:
REFSURVIVAL = "Curve A" "Curve B"
A grouped-name-list specifies multiple scenarios for names in two or more groups. The list can assume one of two general forms, a crossed version and a matched version.
The crossed version of a grouped-name-list consists of a series of name-lists (see the Name-lists section on page 3492), one representing each group, each separated by a vertical bar (|). The values for each group represent multiple scenarios for that group, and the scenarios for each individual group are crossed to produce the set of all scenarios for the analysis option. For example, you can specify the following six scenarios for the survival curves (c1, c2) of two groups

(Curve A, Curve C)  (Curve A, Curve D)  (Curve A, Curve E)
(Curve B, Curve C)  (Curve B, Curve D)  (Curve B, Curve E)

as follows:

GROUPSURVIVAL = "Curve A" "Curve B" | "Curve C" "Curve D" "Curve E"
The matched version of a grouped name list consists of a series of name lists each enclosed in parentheses. Each list consists of a name for each group and represents a single scenario for the analysis option. Multiple scenarios for the analysis option are represented by multiple lists. For example, you can express the crossed grouped-name-list
GROUPSURVIVAL = "Curve A" "Curve B" | "Curve C" "Curve D" "Curve E"
alternatively in a matched format:
GROUPSURVIVAL = ("Curve A" "Curve C") ("Curve A" "Curve D") ("Curve A" "Curve E") ("Curve B" "Curve C") ("Curve B" "Curve D") ("Curve B" "Curve E")
The matched version is particularly useful when you wish to include only a subset of all combinations of individual group values. For example, you may want to pair Curve A only with Curve C , and Curve B only with Curve D and Curve E :
GROUPSURVIVAL = ("Curve A" "Curve C") ("Curve B" "Curve D") ("Curve B" "Curve E")
By default, PROC POWER rounds sample sizes conservatively (input sizes down, output sizes up) so that all total sizes (and individual group sample sizes, in a multigroup design) are integers. This is generally considered conservative because it selects the closest realistic design providing at most the power of the (possibly fractional) input or mathematically optimized design. In addition, in a multigroup design, all group sizes are adjusted to be multiples of the corresponding group weights. For example, if GROUPWEIGHTS = (2 6), then all group 1 sample sizes become multiples of 2, all group 2 sample sizes become multiples of 6, and all total sample sizes become multiples of 8.
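A rough Python sketch of the weight-multiple rounding just described (our reconstruction of the rule for illustration, not PROC POWER's actual algorithm):

```python
# Illustrative reconstruction: round an input total sample size down to the
# nearest multiple of the sum of the integer group weights, then split it
# proportionally across groups.
def round_total_to_weights(n_total, weights):
    block = sum(weights)                  # e.g. weights (2, 6) -> blocks of 8
    n_adj = (n_total // block) * block    # total rounded down to a multiple
    groups = [(n_adj // block) * w for w in weights]
    return n_adj, groups

# An input total of 27 with GROUPWEIGHTS = (2 6) becomes 24,
# split as group sizes 6 and 18.
n_adj, groups = round_total_to_weights(27, (2, 6))
```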
With the NFRACTIONAL option, sample size input is not rounded, and sample size output (whether total or groupwise) is reported in two versions: a raw fractional version and a ceiling version rounded up to the nearest integer.
Whenever an input sample size is adjusted, both the original (nominal) and adjusted (actual) sample sizes are reported. Whenever computed output sample sizes are adjusted, both the original input (nominal) power and the achieved (actual) power at the adjusted sample size are reported.
The Error column in the main output table explains reasons for missing results and flags numerical results that are bounds rather than exact answers. For example, consider the sample size analysis implemented by the following statements:
proc power;
   twosamplefreq test=pchi
      oddsratio = 1.0001
      refproportion = .4
      nulloddsratio = 1
      power = .9
      ntotal = .;
run;
The output in Figure 57.6 reveals that the sample size to achieve a power of 0.9 could not be computed, but that the sample size 2.15E+09 achieves a power of 0.206.
                     The POWER Procedure
          Pearson Chi-square Test for Two Proportions

                  Fixed Scenario Elements
          Distribution                  Asymptotic normal
          Method                        Normal approximation
          Null Odds Ratio               1
          Reference (Group 1) Proportion 0.4
          Odds Ratio                    1.0001
          Nominal Power                 0.9
          Number of Sides               2
          Alpha                         0.05
          Group 1 Weight                1
          Group 2 Weight                1

                     Computed N Total
          Actual Power    N Total    Error
                 0.206   2.15E+09    Solution is a lower bound
The Information column provides further details about Error entries, warnings about any boundary conditions detected, and notes about any adjustments to input. The Information column is hidden by default in the main output; to view it, save the output as a data set with the ODS OUTPUT statement and print it with the PRINT procedure. For example, the following SAS statements print both the Error and Info columns for a power computation in a two-sample t test.
proc power;
   twosamplemeans
      meandiff = 0 7
      stddev = 2
      ntotal = 2 5
      power = .;
   ods output output=Power;
run;

proc print noobs data=Power;
   var MeanDiff NominalNTotal NTotal Power Error Info;
run;
The output is shown in Figure 57.7.
                Nominal
    Mean Diff    NTotal    NTotal    Power    Error            Info
        0           2         2        .      Invalid input    N too small / No effect
        0           5         4      0.050                     Input N adjusted / No effect
        7           2         2        .      Invalid input    N too small
        7           5         4      0.477                     Input N adjusted
The mean difference of 0 specified with the MEANDIFF= option causes a No effect message to appear in the Info column. The sample size of 2 specified with the NTOTAL= option leads to an Invalid input message in the Error column and an N too small message in the Info column. The sample size of 5 leads to an Input N adjusted message in the Info column because it is rounded down to 4 to produce integer group sizes of 2 per group.
If you use the PLOTONLY option in the PROC POWER statement, the procedure displays only graphical output. Otherwise, the displayed output of the POWER procedure includes the following:
the Fixed Scenario Elements table, which shows all applicable single-valued analysis parameters, in the following order: distribution, method, parameters input explicitly, and parameters supplied with defaults
an output table showing the following when applicable (in order): the index of the scenario, all multivalued input, ancillary results, the primary computed result, and error descriptions
plots (if requested)
For each input parameter, the order of the input values is preserved in the output.
Ancillary results include the following:
Actual Power, the achieved power, if it differs from the input (Nominal) power value
Actual Prob(Width), the achieved precision probability, if it differs from the input (Nominal) probability value
Actual Alpha, the achieved significance level, if it differs from the input (Nominal) alpha value
fractional sample size, if the NFRACTIONAL option is used in the analysis statement
If sample size is the result parameter and the NFRACTIONAL option is used in the analysis statement, then both Fractional and Ceiling sample size results are displayed. Fractional sample sizes correspond to the Nominal values of power or precision probability. Ceiling sample sizes are simply the fractional sample sizes rounded up to the nearest integer; they correspond to Actual values of power or precision probability.
PROC POWER assigns a name to each table that it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 57.24. For more information on ODS, see Chapter 14, Using the Output Delivery System.
ODS Table Name | Description | Statement |
---|---|---|
FixedElements | factoid with single-valued analysis parameters | default [*] |
Output | all input and computed analysis parameters, error messages, and information messages for each scenario | default |
PlotContent | data contained in plots, including analysis parameters and indices identifying plot features. ( Note: this table is saved as a data set and not displayed in PROC POWER output.) | PLOT |
[*] Depends on input. |
The ODS path names are created as follows:
Power.<analysis statement name><n>.FixedElements
Power.<analysis statement name><n>.Output
Power.<analysis statement name><n>.PlotContent
Power.<analysis statement name><n>.Plot<m>
where
The Plot<m> objects are the graphs.
The <n> indexing the analysis statement name is used only if there is more than one instance.
The <m> indexing the plots increases with every panel in every plot statement, resetting to 1 only at new analysis statements.
In the TWOSAMPLESURVIVAL statement, the amount of required memory is roughly proportional to the product of the number of subintervals (specified by the NSUBINTERVAL= option) and the total time of the study (specified by the ACCRUALTIME=, FOLLOWUPTIME=, and TOTALTIME= options).
In the Satterthwaite t test analysis (TWOSAMPLEMEANS TEST=DIFF_SATT), the required CPU time grows as the mean difference decreases relative to the standard deviations. In the PAIREDFREQ statement, the required CPU time for the exact power computation (METHOD=EXACT) grows with the sample size.
This section describes the approaches used in PROC POWER to compute power for each analysis. The first subsection defines some common notation. The following subsections describe the various power analyses, including discussions of the data, statistical test, and power formula for each analysis. Unless otherwise indicated, computed values for parameters besides power (for example, sample size) are obtained by solving power formulas for the desired parameters.
Table 57.25 displays notation for some of the more common parameters across analyses. The Associated Syntax column shows examples of relevant analysis statement options, where applicable.
Symbol | Description | Associated Syntax
---|---|---
α | significance level | ALPHA=
N | total sample size | NTOTAL=, NPAIRS=
ni | sample size in ith group | NPERGROUP=, GROUPNS=
wi | allocation weight for ith group (standardized to sum to 1) | GROUPWEIGHTS=
μ | (arithmetic) mean | MEAN=
μi | (arithmetic) mean in ith group | GROUPMEANS=, PAIREDMEANS=
μdiff | (arithmetic) mean difference, μ2 − μ1 or μT − μR | MEANDIFF=
μ0 | null mean or mean difference (arithmetic) | NULL=, NULLDIFF=
γ | geometric mean | MEAN=
γi | geometric mean in ith group | GROUPMEANS=, PAIREDMEANS=
γ0 | null mean or mean ratio (geometric) | NULL=, NULLRATIO=
σ | standard deviation (or common standard deviation per group) | STDDEV=
σi | standard deviation in ith group | GROUPSTDDEVS=, PAIREDSTDDEVS=
σdiff | standard deviation of differences | 
CV | coefficient of variation, defined as the ratio of the standard deviation to the (arithmetic) mean | CV=, PAIREDCVS=
ρ | correlation | CORR=
μT, μR | treatment and reference (arithmetic) means for equivalence test | GROUPMEANS=, PAIREDMEANS=
γT, γR | treatment and reference geometric means for equivalence test | GROUPMEANS=, PAIREDMEANS=
θL | lower equivalence bound | LOWER=
θU | upper equivalence bound | UPPER=
t(ν, δ) | t distribution with d.f. ν and noncentrality δ | 
F(ν1, ν2, λ) | F distribution with numerator d.f. ν1, denominator d.f. ν2, and noncentrality λ | 
t p;ν | pth percentile of t distribution with d.f. ν | 
F p;ν1,ν2 | pth percentile of F distribution with numerator d.f. ν1 and denominator d.f. ν2 | 
Bin(N, p) | binomial distribution with sample size N and proportion p | 
A lower 1-sided test is associated with SIDES=L (or SIDES=1 with the effect smaller than the null value), and an upper 1-sided test is associated with SIDES=U (or SIDES=1 with the effect larger than the null value).
Owen (1965) defines a function, known as Owen's Q, that is convenient for representing terms in power formulas for confidence intervals and equivalence tests:
where φ(·) and Φ(·) are the density and cumulative distribution function of the standard normal distribution, respectively.
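The displayed formula for Owen's Q was lost in extraction; following Owen (1965), it can be written (our reconstruction) as an integral of Φ against a scaled chi kernel. A direct numerical version in Python, checked against the noncentral t CDF, which equals Qν(t, δ; 0, ∞):

```python
# Reconstruction of Owen's Q for illustration:
#   Q_nu(t, delta; a, b) = sqrt(2*pi) / (Gamma(nu/2) * 2**((nu-2)/2))
#       * integral_a^b Phi(t*x/sqrt(nu) - delta) * x**(nu-1) * phi(x) dx
# where phi and Phi are the standard normal pdf and cdf.
import math
from scipy.integrate import quad
from scipy.stats import nct, norm

def owens_q(nu, t, delta, a, b):
    const = math.sqrt(2 * math.pi) / (math.gamma(nu / 2) * 2 ** ((nu - 2) / 2))
    integrand = lambda x: (norm.cdf(t * x / math.sqrt(nu) - delta)
                           * x ** (nu - 1) * norm.pdf(x))
    val, _ = quad(integrand, a, b)
    return const * val

# The constant times x**(nu-1)*phi(x) is exactly the chi(nu) density, so
# Q_nu(t, delta; 0, infinity) = P(T <= t) for T ~ noncentral t(nu, delta).
q = owens_q(10, 2.0, 1.0, 0, math.inf)
```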
Maxwell (2000) discusses a number of different ways to represent effect sizes (and to compute exact power based on them) in multiple regression. PROC POWER supports two of these: the multiple partial correlation, and the R² values of full and reduced models.
Let p denote the total number of predictors in the full model (excluding the intercept) and Y the response variable. You are testing that the coefficients of p1 ≥ 1 predictors in a set X1 are 0, controlling for all of the other predictors X−1, a set of p − p1 variables.
The hypotheses can be expressed in two different ways. The first is in terms of the multiple partial correlation between the predictors in X1 and the response Y, adjusting for the predictors in X−1:
The second is in terms of the multiple correlations in full and reduced nested models:
Note that the squares of the full-model and reduced-model multiple correlations are the population R² values for the full and reduced models.
The test statistic can be written in terms of the sample multiple partial correlation
or the sample multiple correlations in full and reduced models,
The test is the usual Type III F test in multiple regression:
Although the test is invariant to whether the predictors are assumed to be random or fixed, the power is affected by this assumption. If the response and predictors are assumed to have a joint multivariate normal distribution, then the exact power is given by the following formula:
The distribution of the sample multiple partial correlation (for any true value) is given in Chapter 32 of Johnson, Kotz, and Balakrishnan (1995). Sample size tables are presented in Gatsonis and Sampson (1989).
If the predictors are assumed to have fixed values, then the exact power is given by the noncentral F distribution. The noncentrality parameter is
or equivalently,
The power is
The minimum acceptable input value of N depends on several factors, as shown in Table 57.26.
Predictor Type | Intercept in Model? | p1 = 1? | Minimum N
---|---|---|---|
Random | Yes | Yes | p + 3 |
Random | Yes | No | p + 2 |
Random | No | Yes | p + 2 |
Random | No | No | p + 1 |
Fixed | Yes | Yes or No | p + 2 |
Fixed | No | Yes or No | p + 1 |
Fisher's z transformation (Fisher 1921) of the sample correlation is defined as
Fisher's z test assumes the approximate normal distribution N(μ, σ²) for z, where
and
where p* is the number of variables partialled out (Anderson 1984, pp. 132–133) and ρ is the partial correlation between Y and X1 adjusting for the set of zero or more variables X−1.
The test statistic
is assumed to have a normal distribution N(δ, ν), where ρ0 is the null partial correlation and δ and ν are derived from Section 16.33 of Stuart and Ord (1994):
The approximate power is computed as
Because the test is biased, the achieved significance level may differ from the nominal significance level. The actual alpha is computed in the same way as the power except with the correlation ρ replaced by the null correlation ρ0.
The 2-sided case is identical to multiple regression with an intercept and p1 = 1, which is discussed in the Analyses in the MULTREG Statement section on page 3500.
Let p* denote the number of variables partialled out. For the 1-sided cases, the test statistic is
which is assumed to have a null distribution of t(N − 2 − p*).
If the X and Y variables are assumed to have a joint multivariate normal distribution, then the exact power is given by the following formula:
The distribution of the test statistic (given the underlying true correlation) is given in Chapter 32 of Johnson, Kotz, and Balakrishnan (1995).
If the X variables are assumed to have fixed values, then the exact power is given by the noncentral t distribution t(N − 2 − p*, δ), where the noncentrality δ is
The power is
Let X be distributed as Bin( N, p ). The hypotheses for the test of the proportion p are as follows:
The exact test assumes binomially distributed data and requires N ≥ 1 and 0 < p < 1. The test statistic is
The significance probability α is split symmetrically for 2-sided tests, in the sense that each tail is filled with as much as possible up to α/2.
Exact power computations are based on the binomial distribution and computing formulas such as the following from Johnson and Kotz (1970, equation 3.20):
where ν1 = 2C and ν2 = 2(N − C + 1)
Let CL and CU denote lower and upper critical values, respectively. Let αa denote the achieved (actual) significance level, which for 2-sided tests is the sum of the favorable major tail (αM) and the opposite minor tail (αm).
For the upper 1-sided case,
For the lower 1-sided case,
For the 2-sided case,
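The displayed critical-value and power formulas above were lost in extraction, but the exact computation can be sketched for the upper 1-sided case (a Python illustration with scipy, not PROC POWER; the critical-value search is our reconstruction):

```python
# Sketch of exact power for the upper 1-sided exact binomial test:
# find the smallest critical value C_U with P(X >= C_U | p0) <= alpha,
# then evaluate the same tail probability under the alternative p1.
from scipy.stats import binom

def exact_binomial_power_upper(n, p0, p1, alpha=0.05):
    for c in range(n + 1):
        upper_tail_null = 1 - binom.cdf(c - 1, n, p0)   # P(X >= c | p0)
        if upper_tail_null <= alpha:
            achieved_alpha = upper_tail_null            # actual alpha
            power = 1 - binom.cdf(c - 1, n, p1)         # P(X >= c | p1)
            return c, achieved_alpha, power
    return n + 1, 0.0, 0.0   # test can never reject

c_u, actual_alpha, power = exact_binomial_power_upper(n=50, p0=0.2, p1=0.35)
```

Because the binomial distribution is discrete, the achieved alpha is typically below the nominal level, which is why PROC POWER reports both nominal and actual alpha.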
For the normal approximation test, the test statistic is
For the METHOD=EXACT option, the computations are the same as described in the Exact Test of a Binomial Proportion (TEST=EXACT) section on page 3504 except for the definitions of the critical values.
For the upper 1-sided case,
For the lower 1-sided case,
For the 2-sided case,
For the METHOD=NORMAL option, the test statistic Z ( X ) is assumed to have the normal distribution
The approximate power is computed as
The approximate sample size is computed in closed form for the 1-sided cases by inverting the power equation,
and by numerical inversion for the 2-sided case.
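The 1-sided closed-form inversion can be sketched as follows (a Python illustration with scipy; the formulas are the standard normal-approximation expressions, reconstructed here because the displays were lost):

```python
# Approximate power of the upper 1-sided one-sample z test for a binomial
# proportion, and the closed-form sample size obtained by inverting it.
import math
from scipy.stats import norm

def z_prop_power(n, p0, p1, alpha=0.05):
    za = norm.ppf(1 - alpha)
    num = math.sqrt(n) * (p1 - p0) - za * math.sqrt(p0 * (1 - p0))
    return norm.cdf(num / math.sqrt(p1 * (1 - p1)))

def z_prop_n(p0, p1, power, alpha=0.05):
    # Solve the power equation for n, then take the ceiling.
    za, zb = norm.ppf(1 - alpha), norm.ppf(power)
    n = ((za * math.sqrt(p0 * (1 - p0)) + zb * math.sqrt(p1 * (1 - p1)))
         / (p1 - p0)) ** 2
    return math.ceil(n)

n = z_prop_n(0.2, 0.3, power=0.9, alpha=0.05)
```

The ceiling guarantees that the approximate power at the returned n is at least the requested value.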
For the normal approximation test with continuity adjustment, the test statistic is (Pagano and Gauvreau 1993, p. 295):
For the METHOD=EXACT option, the computations are the same as described in the Exact Test of a Binomial Proportion (TEST=EXACT) section on page 3504 except for the definitions of the critical values.
For the upper 1-sided case,
For the lower 1-sided case,
For the 2-sided case,
For the METHOD=NORMAL option, the test statistic Zc(X) is assumed to have the normal distribution N(μ, σ²), where μ and σ² are derived as follows.
For convenience of notation, define
Then
and
The probabilities P(X = Np), P(X < Np), and P(X > Np) and the truncated expectations are approximated by assuming the normal-approximate distribution of X, N(Np, Np(1 − p)). Letting φ(·) and Φ(·) denote the standard normal PDF and CDF, respectively, and defining d as
the terms are computed as follows:
The mean and variance of Z c ( X ) are thus approximated by
and
The approximate power is computed as
The hypotheses for the one-sample t test are
The test assumes normally distributed data and requires N ≥ 2. The test statistics are
where x̄ is the sample mean, s is the sample standard deviation, and
The test is
Exact power computations for t tests are discussed in O'Brien and Muller (1993, Section 8.2), although not specifically for the one-sample case. The power is based on the noncentral t and F distributions:
Solutions for N, α, and the effect parameters are obtained by numerically inverting the power equation. Closed-form solutions for other parameters, in terms of the noncentrality δ, are as follows:
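The noncentral t power computation can be sketched as follows (a Python illustration with scipy; the function name and example values are ours, and the noncentrality form is the standard one since the display was lost):

```python
# Exact power of the 2-sided one-sample t test via the noncentral t
# distribution: noncentrality delta = mean_diff / (stddev / sqrt(n)).
from scipy.stats import nct
from scipy.stats import t as t_dist

def one_sample_t_power(mean_diff, stddev, n, alpha=0.05):
    nu = n - 1
    delta = mean_diff / (stddev / n ** 0.5)
    t_crit = t_dist.ppf(1 - alpha / 2, nu)
    # P(reject) = P(T > t_crit) + P(T < -t_crit), T ~ noncentral t(nu, delta)
    return (1 - nct.cdf(t_crit, nu, delta)) + nct.cdf(-t_crit, nu, delta)

power = one_sample_t_power(mean_diff=5, stddev=12, n=50, alpha=0.05)
```

At mean_diff = 0 the noncentral t reduces to the central t, and the function returns exactly alpha.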
The lognormal case is handled by re-expressing the analysis equivalently as a normality-based test on the log-transformed data, using properties of the lognormal distribution as discussed in Johnson and Kotz (1970, Chapter 14). The approaches in the One-sample t Test (TEST=T) section on page 3508 then apply.
In contrast to the usual t test on normal data, the hypotheses with lognormal data are defined in terms of geometric means rather than arithmetic means. This is because the transformation of a null arithmetic mean of lognormal data to the normal scale depends on the unknown coefficient of variation, resulting in an ill-defined hypothesis on the log-transformed data. Geometric means transform cleanly and are more natural for lognormal data.
The hypotheses for the one-sample t test with lognormal data are
Let μ* and σ* be the (arithmetic) mean and standard deviation of the normal distribution of the log-transformed data. The hypotheses can be rewritten as follows:
where μ* = log(γ).
The test assumes lognormally distributed data and requires N ≥ 2.
The power is
where
The hypotheses for the equivalence test are
The analysis is the two one-sided tests (TOST) procedure of Schuirmann (1987). The test assumes normally distributed data and requires N ≥ 2. Phillips (1990) derives an expression for the exact power assuming a two-sample balanced design; the results are easily adapted to a one-sample design:
where Qν(·, ·; ·, ·) is Owen's Q function, defined in the Common Notation section on page 3498.
The lognormal case is handled by re-expressing the analysis equivalently as a normality-based test on the log-transformed data, using properties of the lognormal distribution as discussed in Johnson and Kotz (1970, chapter 14). The approaches in the Equivalence Test for Mean of Normal Data (TEST=EQUIV DIST=NORMAL) section on page 3510 then apply.
In contrast to the additive equivalence test on normal data, the hypotheses with lognormal data are defined in terms of geometric means rather than arithmetic means. This is because the transformation of an arithmetic mean of lognormal data to the normal scale depends on the unknown coefficient of variation, resulting in an ill-defined hypothesis on the log-transformed data. Geometric means transform cleanly and are more natural for lognormal data.
The hypotheses for the equivalence test are
The analysis is the two one-sided tests (TOST) procedure of Schuirmann (1987) on the log-transformed data. The test assumes lognormally distributed data and requires N ≥ 2. Diletti, Hauschke, and Steinijans (1991) derive an expression for the exact power assuming a crossover design; the results are easily adapted to a one-sample design:
where
is the standard deviation of the log-transformed data, and Qν(·, ·; ·, ·) is Owen's Q function, defined in the Common Notation section on page 3498.
This analysis of precision applies to the standard t-based confidence interval:
where x̄ is the sample mean and s is the sample standard deviation. The half-width is defined as the distance from the point estimate x̄ to a finite endpoint,
A valid confidence interval captures the true mean. The exact probability of obtaining at most the target confidence interval half-width h, unconditional or conditional on validity, is given by Beal (1989):
where
and Qν(·, ·; ·, ·) is Owen's Q function, defined in the Common Notation section on page 3498.
A quality confidence interval is both sufficiently narrow (half-width ≤ h) and valid:
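For the unconditional case, Prob(Width) reduces to a chi-square probability, since the observed half-width is monotone in s² and (N − 1)s²/σ² has a chi-square distribution with N − 1 degrees of freedom. A Python sketch of this special case (our reconstruction; the conditional-on-validity version additionally requires Owen's Q):

```python
# Unconditional P(half-width <= h) for the t-based confidence interval:
# the observed half-width t_{1-alpha/2, N-1} * s / sqrt(N) is at most h
# exactly when (N-1)*s^2/sigma^2 <= (N-1)*h^2*N / (t_crit^2 * sigma^2).
from scipy.stats import chi2
from scipy.stats import t as t_dist

def prob_halfwidth(n, sigma, h, alpha=0.05):
    t_crit = t_dist.ppf(1 - alpha / 2, n - 1)
    bound = (n - 1) * h ** 2 * n / (t_crit ** 2 * sigma ** 2)
    return chi2.cdf(bound, n - 1)

p = prob_halfwidth(n=30, sigma=4, h=1.8, alpha=0.05)
```

As expected, the probability increases with both the target half-width h and the sample size n.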
The hypotheses are
where G is the number of groups, c1, …, cG are the contrast coefficients, and c0 is the null contrast value.
The test is the usual F test for a contrast in one-way ANOVA. It assumes normal data with common group variances and requires N ≥ G + 1 and ni ≥ 1.
O'Brien and Muller (1993, Section 8.2.3.2) give the exact power as
where
The hypotheses are
where G is the number of groups.
The test is the usual overall F test for equality of means in one-way ANOVA. It assumes normal data with common group variances and requires N ≥ G + 1 and ni ≥ 1.
O'Brien and Muller (1993, Section 8.2.3.1) give the exact power as
where the noncentrality is
and
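The overall F test power above reduces to evaluating a noncentral F tail probability. A minimal sketch, assuming `scipy` and using the standard noncentrality λ = Σ n_i (μ_i − μ̄)² / σ² (the function name is illustrative):

```python
import numpy as np
from scipy import stats

def overall_f_power(group_means, group_ns, sigma, alpha=0.05):
    """Power of the overall one-way ANOVA F test via the noncentral F
    distribution, a sketch of the O'Brien-Muller computation."""
    mu = np.asarray(group_means, dtype=float)
    n = np.asarray(group_ns, dtype=float)
    G, N = len(mu), n.sum()
    mubar = (n * mu).sum() / N                        # weighted grand mean
    lam = (n * (mu - mubar) ** 2).sum() / sigma ** 2  # noncentrality
    fcrit = stats.f.ppf(1 - alpha, G - 1, N - G)      # critical value
    return stats.ncf.sf(fcrit, G - 1, N - G, lam)     # P(F' > fcrit)
```

Power increases with the spread of the group means, which provides a quick monotonicity check against published tables.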
Notation:
|         |         | Case: Failure | Case: Success | Total |
|---------|---------|---------------|---------------|-------|
| Control | Failure | n_00          | n_01          | n_0·  |
| Control | Success | n_10          | n_11          | n_1·  |
| Total   |         | n_·0          | n_·1          | N     |
n_00 = #{control=failure, case=failure}
n_01 = #{control=failure, case=success}
n_10 = #{control=success, case=failure}
n_11 = #{control=success, case=success}
N = n_00 + n_01 + n_10 + n_11
n_D = n_01 + n_10 ≡ # discordant pairs
π_ij = theoretical population value of the proportion p_ij = n_ij / N
π_1· = π_10 + π_11
π_·1 = π_01 + π_11
OR_0 = null odds ratio
All McNemar tests covered in PROC POWER are conditional, meaning that n_D is assumed fixed at its observed value.
For the usual null OR_0 = 1, the hypotheses are
The test statistic for both tests covered in PROC POWER (DIST=EXACT_COND and DIST=NORMAL) is the McNemar statistic Q_M, which has the following form when OR_0 = 1:
For the conditional McNemar tests, this is equivalent to the square of the Z(X) statistic for the test of a single proportion (normal approximation to the binomial), where the proportion tested is the success proportion among the discordant pairs, the null value is 0.5, and the sample size is n_D (see, e.g., Schork and Williams 1980):
This can be generalized to a custom null value for the proportion, which is equivalent to specifying a custom odds ratio:
So, a conditional McNemar test (asymptotic or exact) with a custom null odds ratio OR_0 is equivalent to the test of a single proportion with null value p_0 = OR_0 / (1 + OR_0) and a sample size of n_D:
which is equivalent to
The general form of the test statistic is thus
The two most common conditional McNemar tests assume either the exact conditional distribution of Q_M (covered by the DIST=EXACT_COND analysis) or a standard normal distribution for Q_M (covered by the DIST=NORMAL analysis).
For DIST=EXACT_COND, the power is calculated assuming that the test is conducted using the exact conditional distribution of Q_M (conditional on n_D). The power is calculated by first computing the conditional power for each possible n_D. The unconditional power is computed as a weighted average over all possible outcomes of n_D:
where n_D ~ Bin(π_01 + π_10, N), and P(Reject | n_D) is calculated using the exact method in the Exact Test of a Binomial Proportion (TEST=EXACT) section on page 3504.
The achieved significance level, reported as Actual Alpha in the analysis, is computed in the same way except using the actual alpha of the one-sample test in place of its power:
where α*(n_D) is the actual alpha calculated using the exact method in the Exact Test of a Binomial Proportion (TEST=EXACT) section on page 3504 with proportion p_1, null p_0, and sample size n_D.
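The weighted-average construction above can be sketched directly: for each possible n_D, compute the power of an exact binomial test and weight by the binomial probability of that n_D. This is an illustrative sketch assuming `scipy`; the exact-test rejection rule here uses the doubled-tail p-value convention, which is one common (conservative) choice and may differ in detail from PROC POWER's exact method:

```python
import numpy as np
from scipy import stats

def mcnemar_exact_cond_power(pi01, pi10, N, alpha=0.05, or0=1.0):
    """Sketch of DIST=EXACT_COND power: average exact binomial test
    power over the Bin(pi01 + pi10, N) distribution of n_D."""
    p_disc = pi01 + pi10            # P(a pair is discordant)
    p1 = pi01 / p_disc              # tested proportion among discordant pairs
    p0 = or0 / (1.0 + or0)          # null value implied by the null odds ratio
    total = 0.0
    for nD in range(N + 1):
        if nD == 0:
            continue                # no discordant pairs: cannot reject
        w = stats.binom.pmf(nD, N, p_disc)
        k = np.arange(nD + 1)
        # two-sided exact p-value by doubling the smaller tail, capped at 1
        pvals = np.minimum(1.0, 2 * np.minimum(stats.binom.cdf(k, nD, p0),
                                               stats.binom.sf(k - 1, nD, p0)))
        reject = pvals <= alpha
        total += w * stats.binom.pmf(k, nD, p1)[reject].sum()
    return total
```

Under the null (π_01 = π_10 with OR_0 = 1) the returned value is the achieved significance level, which the conservative exact test keeps at or below α.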
For DIST=NORMAL, power is calculated assuming the test is conducted using the normal-approximate distribution of Q M (conditional on n D ).
For the METHOD=EXACT option, the power is calculated in the same way as described in the McNemar Exact Conditional Test (TEST=MCNEMAR DIST=EXACT_COND) section on page 3516, except that P(Reject | n_D) is calculated using the exact method in the z Test for Binomial Proportion (TEST=Z) section on page 3505. The achieved significance level is calculated in the same way as described at the end of the McNemar Exact Conditional Test (TEST=MCNEMAR DIST=EXACT_COND) section on page 3516.
For the METHOD=MIETTINEN option, approximate sample size for the 1-sided cases is computed according to equation (5.6) in Miettinen (1968):
Approximate power for the 1-sided cases is computed by solving the sample size equation for power, and approximate power for the 2-sided case follows easily by summing the 1-sided powers each at α/2:
The 2-sided solution for N is obtained by numerically inverting the power equation.
In general, compared to METHOD=CONNOR, the METHOD=MIETTINEN approximation tends to be slightly more accurate but may be slightly anticonservative in the sense of underestimating sample size and overestimating power (Lachin 1992, p. 1250).
For the METHOD=CONNOR option, approximate sample size for the 1-sided cases is computed according to equation (3) in Connor (1987):
Approximate power for the 1-sided cases is computed by solving the sample size equation for power, and approximate power for the 2-sided case follows easily by summing the 1-sided powers each at α/2:
The 2-sided solution for N is obtained by numerically inverting the power equation.
In general, compared to METHOD=MIETTINEN, the METHOD=CONNOR approximation tends to be slightly less accurate but slightly conservative in the sense of overestimating sample size and underestimating power (Lachin 1992, p. 1250).
The hypotheses for the paired t test are
The test assumes normally distributed data and requires N ≥ 2. The test statistics are
where d̄ and s_d are the sample mean and standard deviation of the differences and
and
The test is
Exact power computations for t tests are given in O'Brien and Muller (1993, section 8.2.2):
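The exact paired t power is a noncentral t tail probability with noncentrality δ = μ_diff / (σ_diff / √N). A minimal sketch assuming `scipy` (the function name and argument names are illustrative):

```python
import numpy as np
from scipy import stats

def paired_t_power(mean_diff, std_diff, n, alpha=0.05, sides=2):
    """Exact paired t test power via the noncentral t distribution,
    a sketch of the O'Brien-Muller computation."""
    df = n - 1
    delta = mean_diff / (std_diff / np.sqrt(n))   # noncentrality parameter
    if sides == 2:
        tcrit = stats.t.ppf(1 - alpha / 2, df)
        # probability of landing in either rejection region
        return stats.nct.sf(tcrit, df, delta) + stats.nct.cdf(-tcrit, df, delta)
    tcrit = stats.t.ppf(1 - alpha, df)
    return stats.nct.sf(tcrit, df, delta)         # upper 1-sided case
```

As expected, the upper 1-sided test is more powerful than the 2-sided test for a positive mean difference at the same α.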
The lognormal case is handled by re-expressing the analysis equivalently as a normality-based test on the log-transformed data, using properties of the lognormal distribution as discussed in Johnson and Kotz (1970, chapter 14). The approaches in the Paired t Test (TEST=DIFF) section on page 3518 then apply.
In contrast to the usual t test on normal data, the hypotheses with lognormal data are defined in terms of geometric means rather than arithmetic means.
The hypotheses for the paired t test with lognormal pairs { Y 1 , Y 2 } are
Let μ_1*, μ_2*, σ_1*, σ_2*, and ρ* be the (arithmetic) means, standard deviations, and correlation of the bivariate normal distribution of the log-transformed data {log Y_1, log Y_2}. The hypotheses can be rewritten as follows:
where
and CV_1, CV_2, and ρ are the coefficients of variation and the correlation of the original untransformed pairs {Y_1, Y_2}. The conversion from ρ to ρ* is shown in Jones and Miller (1966).
The test assumes lognormally distributed data and requires N ≥ 2. The power is
where
and
The hypotheses for the equivalence test are
The analysis is the two one-sided tests (TOST) procedure of Schuirmann (1987). The test assumes normally distributed data and requires N ≥ 2. Phillips (1990) derives an expression for the exact power assuming a two-sample balanced design; the results are easily adapted to a paired design:
where
and Q(·, ·; ·, ·) is Owen's Q function, defined in the Common Notation section on page 3498.
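Owen's Q function has no simple closed form, but the TOST decision rule itself (reject both one-sided hypotheses) is easy to state, so the exact power can be checked by simulation. A Monte Carlo sketch for paired differences, assuming `numpy` and `scipy`; this is an illustrative check, not the Phillips formula:

```python
import numpy as np
from scipy import stats

def tost_power_mc(mean_diff, std_diff, n, lower, upper,
                  alpha=0.05, n_sims=20000, seed=7):
    """Monte Carlo sketch of Schuirmann's TOST power for paired data:
    equivalence is declared iff both one-sided t tests reject."""
    rng = np.random.default_rng(seed)
    tcrit = stats.t.ppf(1 - alpha, n - 1)
    diffs = rng.normal(mean_diff, std_diff, size=(n_sims, n))
    dbar = diffs.mean(axis=1)
    se = diffs.std(axis=1, ddof=1) / np.sqrt(n)
    t_lower = (dbar - lower) / se       # test of H0: diff <= lower bound
    t_upper = (dbar - upper) / se       # test of H0: diff >= upper bound
    return np.mean((t_lower > tcrit) & (t_upper < -tcrit))
```

Power is highest when the true difference sits at the center of the equivalence interval and falls off toward the bounds, which is a useful qualitative check against the exact Owen's Q result.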
The lognormal case is handled by re-expressing the analysis equivalently as a normality-based test on the log-transformed data, using properties of the lognormal distribution as discussed in Johnson and Kotz (1970, chapter 14). The approaches in the Additive Equivalence Test for Mean Difference with Normal Data (TEST=EQUIV_DIFF) section on page 3520 then apply.
In contrast to the additive equivalence test on normal data, the hypotheses with lognormal data are defined in terms of geometric means rather than arithmetic means.
The hypotheses for the equivalence test are
where 0 < θ_L < θ_U.
The analysis is the two one-sided tests (TOST) procedure of Schuirmann (1987) on the log-transformed data. The test assumes lognormally distributed data and requires N ≥ 2. Diletti, Hauschke, and Steinijans (1991) derive an expression for the exact power assuming a crossover design; the results are easily adapted to a paired design:
where σ* is the standard deviation of the differences between the log-transformed pairs (in other words, the standard deviation of log(Y_T) − log(Y_R), where Y_T and Y_R are observations from the treatment and reference, respectively), computed as
where CV_R, CV_T, and ρ are the coefficients of variation and the correlation of the original untransformed pairs {Y_T, Y_R}, and Q(·, ·; ·, ·) is Owen's Q function. The conversion from ρ to σ* is shown in Jones and Miller (1966), and Owen's Q function is defined in the Common Notation section on page 3498.
This analysis of precision applies to the standard t-based confidence interval:
where d̄ and s_d are the sample mean and standard deviation of the differences. The half-width is defined as the distance from the point estimate d̄ to a finite endpoint,
A valid confidence interval captures the true mean difference. The exact probability of obtaining at most the target confidence interval half-width h, unconditional or conditional on validity, is given by Beal (1989):
where
and Q(·, ·; ·, ·) is Owen's Q function, defined in the Common Notation section on page 3498.
A quality confidence interval is both sufficiently narrow (half-width ≤ h) and valid:
Notation:
|         | Group 1   | Group 2   | Total |
|---------|-----------|-----------|-------|
| Success | x_1       | x_2       | m     |
| Failure | n_1 − x_1 | n_2 − x_2 | N − m |
| Total   | n_1       | n_2       | N     |
x 1 = # successes in group 1
x 2 = # successes in group 2
m = x 1 + x 2 = total # successes
The hypotheses are
where p_0 is constrained to be 0 for all but the unconditional Pearson chi-square test.
Internal calculations are performed in terms of p_1, p_2, and p_0. An input set consisting of OR, p_1, and OR_0 is transformed as follows:
An input set consisting of RR, p_1, and RR_0 is transformed as follows:
Note that the transformation of either OR_0 or RR_0 to p_0 is not unique. The chosen parameterization fixes the null value p_10 at the input value of p_1.
The usual Pearson chi-square test is unconditional. The test statistic
is assumed to have a null distribution of N (0 , 1).
Sample size for the 1-sided cases is given by equation (4) in Fleiss, Tytun, and Ury (1980). One-sided power is computed as suggested by Diegert and Diegert (1981) by inverting the sample size formula. Power for the 2-sided case is computed by adding the lower-sided and upper-sided powers each with α/2, and sample size for the 2-sided case is obtained by numerically inverting the power formula. A custom null value p_0 for the proportion difference p_2 − p_1 is also supported.
For the 1-sided cases, a closed-form inversion of the power equation yields an approximate total sample size
For the 2-sided case, the solution for N is obtained by numerically inverting the power equation.
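The shape of such a closed-form inversion can be illustrated with the standard normal-approximation sample size formula for two proportions. This is a sketch, not the exact Fleiss-Tytun-Ury equation (which adds a continuity-style refinement); it assumes `scipy` and a balanced design, and the function name is illustrative:

```python
import math
from scipy import stats

def two_prop_n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-group sample size for the 2-sided two-proportion
    z test (Pearson chi-square), standard normal approximation."""
    za = stats.norm.ppf(1 - alpha / 2)     # critical value at alpha/2
    zb = stats.norm.ppf(power)             # quantile for target power
    pbar = (p1 + p2) / 2                   # pooled proportion under H0
    num = (za * math.sqrt(2 * pbar * (1 - pbar))
           + zb * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)
```

Larger effect sizes invert to smaller sample sizes, mirroring how the power equation and its closed-form inversion trade off.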
The usual likelihood ratio chi-square test is unconditional. The test statistic
is assumed to have a null distribution of N(0, 1) and an alternative distribution of N(δ, 1), where
The approximate power is
For the 1-sided cases, a closed-form inversion of the power equation yields an approximate total sample size
For the 2-sided case, the solution for N is obtained by numerically inverting the power equation.
Fisher's exact test is conditional on the observed total number of successes m. Power and sample size computations for the METHOD=WALTERS option are based on a test with similar power properties, the continuity-adjusted arcsine test. The test statistic
is assumed to have a null distribution of N(0, 1) and an alternative distribution of N(δ, 1), where
The approximate power for the 1-sided balanced case is given by Walters (1979) and is easily extended to the unbalanced and 2-sided cases:
The hypotheses for the two-sample t test are
The test assumes normally distributed data and common standard deviation per group, and it requires N ≥ 3, n_1 ≥ 1, and n_2 ≥ 1. The test statistics are
where x̄_1 and x̄_2 are the sample means and s_p is the pooled standard deviation, and
The test is
Exact power computations for t tests are given in O'Brien and Muller (1993, section 8.2.1):
Solutions for N, n_1, n_2, α, and δ are obtained by numerically inverting the power equation. Closed-form solutions for other parameters, in terms of δ, are as follows:
Finally, here is a derivation of the solution for w 1 :
Solve the equation for w_1 (which requires the quadratic formula). Then determine the range of δ given w_1:
This implies
The hypotheses for the two-sample Satterthwaite t test are
The test assumes normally distributed data and requires N ≥ 3, n_1 ≥ 1, and n_2 ≥ 1. The test statistics are
where x̄_1 and x̄_2 are the sample means and s_1 and s_2 are the sample standard deviations.
As DiSantostefano and Muller (1995, p. 585) state, the test is based on assuming that under H, F is distributed as F(1, ν), where ν is given by Satterthwaite's approximation (Satterthwaite 1946),
Since ν is unknown, in practice it must be replaced by an estimate
So the test is
Exact solutions for power for the 2-sided and upper 1-sided cases are given in Moser, Stevens, and Watts (1989). The lower 1-sided case follows easily using symmetry. The equations are as follows:
where
The density f(u) is obtained from the fact that
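The Satterthwaite degrees-of-freedom estimate used throughout this analysis has a simple closed form. A minimal sketch (the function name is illustrative):

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Satterthwaite's (1946) approximate degrees of freedom for the
    unequal-variance two-sample t statistic, from sample SDs s1, s2."""
    v1 = s1 ** 2 / n1              # estimated variance of mean 1
    v2 = s2 ** 2 / n2              # estimated variance of mean 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
```

When the two sample variances and sizes are equal, the estimate reduces to the pooled-test value n_1 + n_2 − 2; otherwise it falls between min(n_1 − 1, n_2 − 1) and n_1 + n_2 − 2.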
The lognormal case is handled by re-expressing the analysis equivalently as a normality-based test on the log-transformed data, using properties of the lognormal distribution as discussed in Johnson and Kotz (1970, chapter 14). The approaches in the Two-sample t Test Assuming Equal Variances (TEST=DIFF) section on page 3526 then apply.
In contrast to the usual t test on normal data, the hypotheses with lognormal data are defined in terms of geometric means rather than arithmetic means. The test assumes equal coefficients of variation in the two groups.
The hypotheses for the two-sample t test with lognormal data are
Let μ_1*, μ_2*, and σ* be the (arithmetic) means and common standard deviation of the corresponding normal distributions of the log-transformed data. The hypotheses can be rewritten as follows:
where
The test assumes lognormally distributed data and requires N ≥ 3, n_1 ≥ 1, and n_2 ≥ 1.
The power is
where
The hypotheses for the equivalence test are
The analysis is the two one-sided tests (TOST) procedure of Schuirmann (1987). The test assumes normally distributed data and requires N ≥ 3, n_1 ≥ 1, and n_2 ≥ 1. Phillips (1990) derives an expression for the exact power assuming a balanced design; the results are easily adapted to an unbalanced design:
where Q(·, ·; ·, ·) is Owen's Q function, defined in the Common Notation section on page 3498.
The lognormal case is handled by re-expressing the analysis equivalently as a normality-based test on the log-transformed data, using properties of the lognormal distribution as discussed in Johnson and Kotz (1970, chapter 14). The approaches in the Additive Equivalence Test for Mean Difference with Normal Data (TEST=EQUIV_DIFF) section on page 3530 then apply.
In contrast to the additive equivalence test on normal data, the hypotheses with lognormal data are defined in terms of geometric means rather than arithmetic means.
The hypotheses for the equivalence test are
where 0 < θ_L < θ_U.
The analysis is the two one-sided tests (TOST) procedure of Schuirmann (1987) on the log-transformed data. The test assumes lognormally distributed data and requires N ≥ 3, n_1 ≥ 1, and n_2 ≥ 1. Diletti, Hauschke, and Steinijans (1991) derive an expression for the exact power assuming a crossover design; the results are easily adapted to an unbalanced two-sample design:
where
is the (assumed common) standard deviation of the normal distribution of the log-transformed data, and Q(·, ·; ·, ·) is Owen's Q function, defined in the Common Notation section on page 3498.
This analysis of precision applies to the standard t-based confidence interval:
where x̄_1 and x̄_2 are the sample means and s_p is the pooled standard deviation. The half-width is defined as the distance from the point estimate x̄_2 − x̄_1 to a finite endpoint,
A valid confidence interval captures the true mean difference. The exact probability of obtaining at most the target confidence interval half-width h, unconditional or conditional on validity, is given by Beal (1989):
where
and Q(·, ·; ·, ·) is Owen's Q function, defined in the Common Notation section on page 3498.
A quality confidence interval is both sufficiently narrow (half-width ≤ h) and valid:
The method is from Lakatos (1988) and Cantor (1997, pp. 83–92).
Define the following notation:
X_j(i) = ith input time point on survival curve for group j
S_j(i) = input survivor function value corresponding to X_j(i)
h_j(t) = hazard rate for group j at time t
δ_j(t) = loss hazard rate for group j at time t
λ_j = exponential hazard rate for group j
R = hazard ratio of group 2 to group 1 ≡ (assumed constant) value of h_2(t)/h_1(t)
m_j = median survival time for group j
b = number of subintervals per time unit
T = accrual time
τ = post-accrual follow-up time
L_j = exponential loss rate for group j
XL_j = input time point on loss curve for group j
SL_j = input survivor function value corresponding to XL_j
mL_j = median loss time for group j
r_i = rank for ith time point
Each survival curve can be specified in one of several ways.
For exponential curves:
a single point ( X j (1) ,S j (1)) on the curve
median survival time
hazard rate
hazard ratio (for curve 2, with respect to curve 1)
For piecewise linear curves with proportional hazards:
a set of points {(X_1(1), S_1(1)), (X_1(2), S_1(2)), …} (for curve 1)
hazard ratio (for curve 2, with respect to curve 1)
For arbitrary piecewise linear curves:
a set of points {(X_j(1), S_j(1)), (X_j(2), S_j(2)), …}
A total of M evenly spaced time points {t_0 = 0, t_1, t_2, …, t_M = T + τ} are used in calculations, where
The hazard function is calculated for each survival curve at each time point. For an exponential curve, the (constant) hazard is given by one of the following, depending on the input parameterization:
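The exponential parameterizations can be sketched as small conversion helpers. This is an illustrative sketch of the standard exponential-survival relationships (S(t) = exp(−λt)); the function names are not from PROC POWER:

```python
import math

def exp_hazard_from_point(x, s):
    """Constant hazard from one input point (x, S(x)) on an
    exponential survival curve: S(x) = exp(-lambda * x)."""
    return -math.log(s) / x

def exp_hazard_from_median(m):
    """Constant hazard from the median survival time m: S(m) = 0.5."""
    return math.log(2.0) / m

def exp_hazard_group2_from_ratio(lambda1, hazard_ratio):
    """Group 2 hazard under the (constant) hazard-ratio parameterization."""
    return hazard_ratio * lambda1
```

For example, a median survival time of m gives the same hazard as the single point (m, 0.5), since both encode S(m) = 0.5.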
For a piecewise linear curve, define the following additional notation:
The hazard is computed using linear interpolation as follows:
With proportional hazards, the hazard rate of group 2's curve in terms of the hazard rate of group 1's curve is
Loss-hazard values at each time point are computed for the loss curves in an analogous way from {L_j, XL_j, SL_j, mL_j}.
The expected number at risk N_j(i) at time i in group j is calculated for each group and time points 0 through M − 1, as follows:
Define θ_i as the ratio of hazards and φ_i as the ratio of expected numbers at risk for time t_i:
The expected number of deaths in each subinterval is calculated as follows:
The rank values are calculated as follows according to which test statistic is used:
The distribution of the test statistic is approximated by N ( E, 1) where
Note that N^(1/2) can be factored out of the mean E, and so it can be expressed equivalently as
where E* is free of N and
The approximate power is
Note that the upper and lower 1-sided cases are expressed differently than in other analyses. This is because E* > 0 corresponds to a higher survival curve in group 1 and thus, by the convention used in PROC POWER for 2-group analyses, the lower side.
For the 1-sided cases, a closed-form inversion of the power equation yields an approximate total sample size
For the 2-sided case, the solution for N is obtained by numerically inverting the power equation.
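Once E* is in hand, the normal-approximation power and the closed-form sample size inversion are straightforward. A sketch assuming `scipy`, using the N(E, 1) distribution of the test statistic with E = E*·√N; the 2-sided sample size inversion below ignores the negligible far tail, as is standard:

```python
import math
from scipy import stats

def logrank_power(e_star, n_total, alpha=0.05):
    """Approximate 2-sided power when the statistic is N(E, 1)
    with E = e_star * sqrt(n_total) (Lakatos-style approximation)."""
    z = stats.norm.ppf(1 - alpha / 2)
    e = e_star * math.sqrt(n_total)
    return stats.norm.sf(z - e) + stats.norm.cdf(-z - e)

def logrank_n(e_star, power=0.8, alpha=0.05):
    """Closed-form inversion for approximate total sample size."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return math.ceil(((z_a + z_b) / abs(e_star)) ** 2)
```

Plugging the inverted N back into the power equation should return at least the target power, up to the rounding from the ceiling.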