Estimable Functions



Type I SS and Estimable Functions

The Type I SS and the associated hypotheses they test are by-products of the modified sweep operator used to compute a generalized inverse of X'X and a solution to the normal equations. For the model E(Y) = X1 × B1 + X2 × B2 + X3 × B3, the Type I SS for each effect correspond to

Effect    Type I SS
B1        R(B1)
B2        R(B2 | B1)
B3        R(B3 | B1, B2)

The Type I SS are model-order dependent; each effect is adjusted only for the preceding effects in the model.
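The order dependence can be seen on a small regression. The following is a minimal sketch, not PROC GLM's sweep-based computation: it assumes a full-rank design, made-up data, and two hypothetical helpers (`reduction` and `type1_ss`) that compute each sequential SS as the difference R(current and earlier effects) - R(earlier effects).

```python
from fractions import Fraction

def reduction(columns, y):
    """R(columns): y'X (X'X)^-1 X'y for a full-column-rank X, in exact arithmetic."""
    k = len(columns)
    # Augmented normal equations [X'X | X'y].
    aug = [
        [sum(Fraction(a) * b for a, b in zip(ci, cj)) for cj in columns]
        + [sum(Fraction(a) * b for a, b in zip(ci, y))]
        for ci in columns
    ]
    xty = [row[-1] for row in aug]
    # Gauss-Jordan elimination; pivots are nonzero because X'X is positive definite.
    for p in range(k):
        pivot = aug[p][p]
        aug[p] = [v / pivot for v in aug[p]]
        for r in range(k):
            if r != p:
                factor = aug[r][p]
                aug[r] = [vr - factor * vp for vr, vp in zip(aug[r], aug[p])]
    # b'X'y, where b solves the normal equations.
    return sum(row[-1] * t for row, t in zip(aug, xty))

def type1_ss(named_columns, y):
    """Sequential SS: each effect is adjusted only for the effects before it."""
    ss, fitted, previous = {}, [], Fraction(0)
    for name, column in named_columns:
        fitted.append(column)
        current = reduction(fitted, y)
        ss[name] = current - previous
        previous = current
    return ss

# Illustrative data only (not from the guide).
ones = [1, 1, 1, 1, 1]
x1 = [1, 2, 3, 4, 5]
x2 = [1, 1, 2, 3, 3]
y = [1, 3, 2, 5, 4]

order_a = type1_ss([("intercept", ones), ("x1", x1), ("x2", x2)], y)
order_b = type1_ss([("intercept", ones), ("x2", x2), ("x1", x1)], y)
print(order_a)
print(order_b)
```

With these data the SS for x1 is 32/5 when x1 enters first but 1/4 when it enters after x2, while the total reduction is the same in both orders.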

There are numerous ways to obtain a Type I hypothesis matrix L for each effect. One way is to form the X'X matrix and then reduce X'X to an upper triangular matrix by row operations, skipping over any rows with a zero diagonal. The nonzero rows of the resulting matrix associated with X1 provide an L such that

    SS(H0: Lβ = 0) = R(B1)

The nonzero rows of the resulting matrix associated with X2 provide an L such that

    SS(H0: Lβ = 0) = R(B2 | B1)

The last set of nonzero rows (associated with X3) provide an L such that

    SS(H0: Lβ = 0) = R(B3 | B1, B2)
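The row-reduction recipe can be tried on a tiny case. The sketch below assumes a one-way model with parameters (µ, A1, A2) and cell counts 2 and 3, chosen only for illustration; `triangularize` is a hypothetical helper, not a SAS routine.

```python
from fractions import Fraction

def triangularize(xtx):
    """Forward elimination without row swaps; rows with a zero diagonal are skipped."""
    a = [[Fraction(v) for v in row] for row in xtx]
    n = len(a)
    for p in range(n):
        if a[p][p] == 0:
            continue  # zero diagonal: this row is not used as a pivot
        for r in range(p + 1, n):
            factor = a[r][p] / a[p][p]
            a[r] = [vr - factor * vp for vr, vp in zip(a[r], a[p])]
    return a

# X'X for columns (mu, A1, A2) with n1 = 2 and n2 = 3 observations.
xtx = [
    [5, 2, 3],
    [2, 2, 0],
    [3, 0, 3],
]
tri = triangularize(xtx)
for row in tri:
    print([str(v) for v in row])
```

The first row (5, 2, 3) is the L associated with µ. The only nonzero row associated with A is (0, 6/5, -6/5), a multiple of A1 - A2, and the last row reduces to zeros and is dropped.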

Another, more formal representation of Type I generating sets for B1, B2, and B3, respectively, is

    G1 = ( X1'X1 | X1'X2   | X1'X3   )
    G2 = ( 0     | X2'M1X2 | X2'M1X3 )
    G3 = ( 0     | 0       | X3'M2X3 )

where

    M1 = I - X1 (X1'X1)^- X1'

and

    M2 = M1 - M1 X2 (X2'M1X2)^- X2' M1

Using the Type I generating set G2 (for example), if an L is formed from linear combinations of the rows of G2 such that L is of full row rank and of the same row rank as G2, then SS(H0: Lβ = 0) = R(B2 | B1).

In the GLM procedure, the Type I estimable functions displayed symbolically when the E1 option is requested are

    G1* = (X1'X1)^- G1
    G2* = (X2'M1X2)^- G2
    G3* = (X3'M2X3)^- G3

As can be seen from the nature of the generating sets G1, G2, and G3, only the Type I estimable functions for B3 are guaranteed not to involve the B1 and B2 parameters. The Type I hypothesis for B2 can (and usually does) involve B3 parameters. The Type I hypothesis for B1 usually involves B2 and B3 parameters.

There are, however, a number of models for which the Type I hypotheses are considered appropriate. These are

  • balanced ANOVA models specified in proper sequence (that is, interactions do not precede main effects in the MODEL statement and so forth)

  • purely nested models (specified in the proper sequence)

  • polynomial regression models (in the proper sequence).

Type II SS and Estimable Functions

For main effects models and regression models, the general form of estimable functions can be manipulated to provide tests of hypotheses involving only the parameters of the effect in question. The same result can also be obtained by entering each effect in turn as the last effect in the model and obtaining the Type I SS for that effect. These are the Type II SS. Using a modified reversible sweep operator, it is possible to obtain the Type II SS without actually rerunning the model.

Thus, the Type II SS correspond to the R notation in which each effect is adjusted for all other effects possible. For a regression model such as

    E(Y) = X1 × B1 + X2 × B2 + X3 × B3

the Type II SS correspond to

Effect    Type II SS
B1        R(B1 | B2, B3)
B2        R(B2 | B1, B3)
B3        R(B3 | B1, B2)
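Written out as differences of reductions, using the standard meaning of the R notation (the reduction in sum of squares for the full model minus the reduction for the model without the effect in question), the three Type II SS for this regression model are:

```latex
\begin{aligned}
R(B_1 \mid B_2, B_3) &= R(B_1, B_2, B_3) - R(B_2, B_3) \\
R(B_2 \mid B_1, B_3) &= R(B_1, B_2, B_3) - R(B_1, B_3) \\
R(B_3 \mid B_1, B_2) &= R(B_1, B_2, B_3) - R(B_1, B_2)
\end{aligned}
```

Only the last line coincides with a Type I SS for the model order B1, B2, B3, because B3 is the only effect that enters last in that order.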

For a main effects model (A, B, and C as classification variables), the Type II SS correspond to

Effect    Type II SS
A         R(A | B, C)
B         R(B | A, C)
C         R(C | A, B)

As the discussion in the section "A Three-Factor Main Effects Model" indicates, for regression and main effects models the Type II SS provide an MRH for each effect that does not involve the parameters of the other effects.

For models involving interactions and nested effects, in the absence of a priori parametric restrictions, it is not possible to obtain a test of a hypothesis for a main effect free of parameters of higher-level effects with which the main effect is involved.

It is reasonable to assume, then, that any test of a hypothesis concerning an effect should involve the parameters of that effect and only those other parameters with which that effect is involved.

Contained Effect

Given two effects F1 and F2, F1 is said to be contained in F2 provided that

  • both effects involve the same continuous variables (if any)

  • F2 has more CLASS variables than does F1, and if F1 has CLASS variables, they all appear in F2

Note that the intercept effect µ is contained in all pure CLASS effects, but it is not contained in any effect involving a continuous variable. No effect is contained by µ.
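The two conditions translate directly into a small predicate. This is an illustrative sketch only: the `effect` and `is_contained` helpers are hypothetical, and representing the continuous variables as a multiset (so that X and X*X differ) is a modeling assumption made here, not something the definition spells out.

```python
from collections import Counter

def effect(class_vars=(), continuous=()):
    """An effect as (set of CLASS variables, multiset of continuous variables)."""
    return frozenset(class_vars), Counter(continuous)

def is_contained(f1, f2):
    """True if effect f1 is contained in effect f2 under the two conditions above."""
    class1, cont1 = f1
    class2, cont2 = f2
    same_continuous = cont1 == cont2
    more_class = len(class2) > len(class1) and class1 <= class2
    return same_continuous and more_class

mu = effect()                      # intercept: no variables at all
A = effect(["A"])
AB = effect(["A", "B"])
X = effect(continuous=["X"])
XX = effect(continuous=["X", "X"])
XA = effect(["A"], ["X"])

print(is_contained(A, AB))   # A is contained in A*B
print(is_contained(mu, A))   # the intercept is contained in a pure CLASS effect
print(is_contained(mu, X))   # but not in an effect with a continuous variable
```

Under this representation X is contained in X*A, while neither X nor X*X contains the other, since neither has more CLASS variables.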

Type II, Type III, and Type IV estimable functions rely on this definition, and they all have one thing in common: the estimable functions involving an effect F1 also involve the parameters of all effects that contain F1, and they do not involve the parameters of effects that do not contain F1 (other than F1).

Hypothesis Matrix for Type II Estimable Functions

The Type II estimable functions for an effect F1 have an L (before reduction to full row rank) of the following form:

  • All columns of L associated with effects not containing F1 (except F1) are zero.

  • The submatrix of L associated with effect F1 is (X1'MX1)^- (X1'MX1).

  • Each of the remaining submatrices of L associated with an effect F2 that contains F1 is (X1'MX1)^- (X1'MX2).

In these submatrices,

  • X0 = the columns of X whose associated effects do not contain F1.

  • X1 = the columns of X associated with F1.

  • X2 = the columns of X associated with an F2 effect that contains F1.

  • M = I - X0 (X0'X0)^- X0'.

For the model Y = A B A*B, the Type II SS correspond to

    R(A | µ, B),  R(B | µ, A),  R(A*B | µ, A, B)

for effects A, B, and A*B, respectively. For the model Y = A B(A) C(A B), the Type II SS correspond to

    R(A | µ),  R(B(A) | µ, A),  R(C(AB) | µ, A, B(A))

for effects A, B(A), and C(AB), respectively. For the model Y = X X*X, the Type II SS correspond to

    R(X | µ, X*X)  and  R(X*X | µ, X)

for X and X*X, respectively.

Example of Type II Estimable Functions

For a 2 × 2 factorial with w observations per cell, the general form of estimable functions is shown in Table 11.5. Any nonzero values for L2, L4, and L6 can be used to construct L vectors for computing the Type II SS for A, B, and A*B, respectively.

Table 11.5: General Form of Estimable Functions for 2 × 2 Factorial

Effect    Coefficient
µ         L1
A1        L2
A2        L1 - L2
B1        L4
B2        L1 - L4
AB11      L6
AB12      L2 - L6
AB21      L4 - L6
AB22      L1 - L2 - L4 + L6

For a balanced 2 × 2 factorial with the same number of observations in every cell, the Type II estimable functions are shown in Table 11.6.

Table 11.6: Type II Estimable Functions for Balanced 2 × 2 Factorial

          Coefficients for Effect
Effect    A           B           A*B
µ         0           0           0
A1        L2          0           0
A2        -L2         0           0
B1        0           L4          0
B2        0           -L4         0
AB11      0.5*L2      0.5*L4      L6
AB12      0.5*L2      -0.5*L4     -L6
AB21      -0.5*L2     0.5*L4      -L6
AB22      -0.5*L2     -0.5*L4     L6

For an unbalanced 2 × 2 factorial (with two observations in every cell except the AB22 cell, which contains only one observation), the general form of estimable functions is the same as if it were balanced since the same effects are still estimable. However, the Type II estimable functions for A and B are not the same as they were for the balanced design. The Type II estimable functions for this unbalanced 2 × 2 factorial are shown in Table 11.7.

Table 11.7: Type II Estimable Functions for Unbalanced 2 × 2 Factorial

          Coefficients for Effect
Effect    A           B           A*B
µ         0           0           0
A1        L2          0           0
A2        -L2         0           0
B1        0           L4          0
B2        0           -L4         0
AB11      0.6*L2      0.6*L4      L6
AB12      0.4*L2      -0.6*L4     -L6
AB21      -0.6*L2     0.4*L4      -L6
AB22      -0.4*L2     -0.4*L4     L6

By comparing the hypothesis being tested in the balanced case to the hypothesis being tested in the unbalanced case for effects A and B, you can note that the Type II hypotheses for A and B are dependent on the cell frequencies in the design. For unbalanced designs in which the cell frequencies are not proportional to the background population, the Type II hypotheses for effects that are contained in other effects are of questionable merit.
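The dependence on cell frequencies can be checked numerically. The sketch below rebuilds the A column of Tables 11.6 and 11.7 (with L2 = 1) under two stated assumptions: that M projects off the columns for µ and B, which amounts to centering within each level of B, and that the generalized inverse of X1'MX1 is taken by inverting its leading 1 × 1 block. The function name `type2_row_for_a` is hypothetical.

```python
from fractions import Fraction

def type2_row_for_a(counts):
    """First row of (X1'MX1)^- X1'M X for effect A in a 2 x 2 design."""
    # One (a, b) pair per observation.
    obs = [cell for cell, n in counts.items() for _ in range(n)]
    columns = {"mu": [Fraction(1)] * len(obs)}
    for i in (1, 2):
        columns[f"A{i}"] = [Fraction(int(a == i)) for a, b in obs]
    for j in (1, 2):
        columns[f"B{j}"] = [Fraction(int(b == j)) for a, b in obs]
    for i in (1, 2):
        for j in (1, 2):
            columns[f"AB{i}{j}"] = [Fraction(int((a, b) == (i, j))) for a, b in obs]

    def apply_m(col):
        # M removes what mu and B explain: subtract the mean within each level of B.
        centered = []
        for (_, b), value in zip(obs, col):
            group = [v for (_, b2), v in zip(obs, col) if b2 == b]
            centered.append(value - sum(group) / len(group))
        return centered

    m_a1 = apply_m(columns["A1"])
    scale = sum(u * v for u, v in zip(m_a1, columns["A1"]))  # (1,1) entry of X1'MX1
    return {name: sum(u * v for u, v in zip(m_a1, col)) / scale
            for name, col in columns.items()}

balanced = type2_row_for_a({(1, 1): 2, (1, 2): 2, (2, 1): 2, (2, 2): 2})
unbalanced = type2_row_for_a({(1, 1): 2, (1, 2): 2, (2, 1): 2, (2, 2): 1})
print({k: str(v) for k, v in balanced.items()})
print({k: str(v) for k, v in unbalanced.items()})
```

The balanced design gives interaction coefficients of 1/2 and -1/2, and the unbalanced design gives 3/5, 2/5, -3/5, and -2/5, matching the A columns of the two tables.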

However, if an effect is not contained in any other effect, the Type II hypothesis for that effect is an MRH that does not involve any parameters except those associated with the effect in question.

Thus, Type II SS are appropriate for

  • any balanced model

  • any main effects model

  • any pure regression model

  • an effect not contained in any other effect (regardless of the model)

In addition to the preceding, the Type II SS are generally accepted by most statisticians for purely nested models.

Type III and IV SS and Estimable Functions

When an effect is contained in another effect, the Type II hypotheses for that effect are dependent on the cell frequencies. The philosophy behind both the Type III and Type IV hypotheses is that the hypotheses tested for any given effect should be the same for all designs with the same general form of estimable functions.

To demonstrate this concept, recall the hypotheses being tested by the Type II SS in the balanced 2 × 2 factorial shown in Table 11.6. Those hypotheses are precisely the ones that the Type III and Type IV hypotheses employ for all 2 × 2 factorials that have at least one observation per cell. The Type III and Type IV hypotheses for a design without missing cells usually differ from the hypothesis employed for the same design with missing cells since the general form of estimable functions usually differs.

Type III Estimable Functions

Type III hypotheses are constructed by working directly with the general form of estimable functions. The following steps are used to construct a hypothesis for an effect F1:

  1. For every effect in the model except F1 and those effects that contain F1, equate the coefficients in the general form of estimable functions to zero.

    If F1 is not contained in any other effect, this step defines the Type III hypothesis (as well as the Type II and Type IV hypotheses). If F1 is contained in other effects, go on to step 2. (See the section "Type II SS and Estimable Functions" for a definition of when effect F1 is contained in another effect.)

  2. If necessary, equate new symbols to compound expressions in the F1 block in order to obtain the simplest form for the F1 coefficients.

  3. Equate all symbolic coefficients outside of the F1 block to a linear function of the symbols in the F1 block in order to make the F1 hypothesis orthogonal to hypotheses associated with effects that contain F1.

By once again observing the Type II hypotheses being tested in the balanced 2 × 2 factorial, it is possible to verify that the A and A*B hypotheses are orthogonal and also that the B and A*B hypotheses are orthogonal. This principle of orthogonality between an effect and any effect that contains it holds for all balanced designs. Thus, construction of Type III hypotheses for any design is a logical extension of a process that is used for balanced designs.

The Type III hypotheses are precisely the hypotheses being tested by programs that reparameterize using the usual assumptions (for example, all parameters for an effect summing to zero). When no missing cells exist in a factorial model, Type III SS coincide with Yates' weighted squares-of-means technique. When cells are missing in factorial models, the Type III SS coincide with those discussed in Harvey (1960) and Henderson (1953).

The following steps illustrate the construction of Type III estimable functions for a 2 × 2 factorial with no missing cells.

To obtain the A*B interaction hypothesis, start with the general form and equate the coefficients for effects µ, A, and B to zero, as shown in Table 11.8.

Table 11.8: Type III Hypothesis for A*B Interaction

Effect    General Form         L1 = L2 = L4 = 0
µ         L1                   0
A1        L2                   0
A2        L1 - L2              0
B1        L4                   0
B2        L1 - L4              0
AB11      L6                   L6
AB12      L2 - L6              -L6
AB21      L4 - L6              -L6
AB22      L1 - L2 - L4 + L6    L6

The last column in Table 11.8 represents the form of the MRH for A*B.

To obtain the Type III hypothesis for A, first start with the general form and equate the coefficients for effects µ and B to zero (let L1 = L4 = 0). Next let L6 = K*L2, and find the value of K that makes the A hypothesis orthogonal to the A*B hypothesis. In this case, K = 0.5. Each of these steps is shown in Table 11.9.

Table 11.9: Type III Hypothesis for A

Effect    General Form         L1 = L4 = 0    L6 = K*L2     K = 0.5
µ         L1                   0              0             0
A1        L2                   L2             L2            L2
A2        L1 - L2              -L2            -L2           -L2
B1        L4                   0              0             0
B2        L1 - L4              0              0             0
AB11      L6                   L6             K*L2          0.5*L2
AB12      L2 - L6              L2 - L6        (1 - K)*L2    0.5*L2
AB21      L4 - L6              -L6            -K*L2         -0.5*L2
AB22      L1 - L2 - L4 + L6    -L2 + L6       (K - 1)*L2    -0.5*L2

In Table 11.9, the fourth column (under L6 = K*L2) represents the form of all estimable functions not involving µ, B1, or B2. The prime difference between the Type II and Type III hypotheses for A is the way K is determined. Type II chooses K as a function of the cell frequencies, whereas Type III chooses K such that the estimable functions for A are orthogonal to the estimable functions for A*B.
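The value K = 0.5 follows from a one-line orthogonality condition. Taking the A coefficients from the fourth column of Table 11.9 and the A*B coefficients (L6, -L6, -L6, L6) from Table 11.8, only the four interaction cells contribute to the inner product, because the A*B vector is zero elsewhere. The vector names below are introduced here for illustration:

```latex
\begin{aligned}
\ell_A \cdot \ell_{A*B}
  &= L_2 L_6 \,\bigl[\, K - (1 - K) + K + (K - 1) \,\bigr] \\
  &= L_2 L_6 \,(4K - 2) = 0
  \quad\Longrightarrow\quad K = \tfrac{1}{2}.
\end{aligned}
```

Because the cell frequencies never enter this condition, the same K results for every 2 × 2 factorial with no missing cells.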

An example of Type III estimable functions in a 3 × 3 factorial with unequal cell frequencies and missing diagonals is given in Table 11.10 (N1 through N6 represent the nonzero cell frequencies; a period marks an empty cell).

Table 11.10: A 3 × 3 Factorial Design with Unequal Cell Frequencies and Missing Diagonals

               B
          1     2     3
A   1     .     N1    N2
    2     N3    .     N4
    3     N5    N6    .

For any nonzero values of N1 through N6, the Type III estimable functions for each effect are shown in Table 11.11.

Table 11.11: Type III Estimable Functions for 3 × 3 Factorial Design with Unequal Cell Frequencies and Missing Diagonals

Effect    A                        B                        A*B
µ         0                        0                        0
A1        L2                       0                        0
A2        L3                       0                        0
A3        -L2 - L3                 0                        0
B1        0                        L5                       0
B2        0                        L6                       0
B3        0                        -L5 - L6                 0
AB12      0.667*L2 + 0.333*L3      0.333*L5 + 0.667*L6      L8
AB13      0.333*L2 - 0.333*L3      -0.333*L5 - 0.667*L6     -L8
AB21      0.333*L2 + 0.667*L3      0.667*L5 + 0.333*L6      -L8
AB23      -0.333*L2 + 0.333*L3     -0.667*L5 - 0.333*L6     L8
AB31      -0.333*L2 - 0.667*L3     0.333*L5 - 0.333*L6      L8
AB32      -0.667*L2 - 0.333*L3     -0.333*L5 + 0.333*L6     -L8
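The orthogonality that defines Type III hypotheses can be verified directly on the interaction-cell coefficients of Table 11.11. This is a quick check rather than a derivation; it assumes the printed decimals 0.333 and 0.667 stand for exactly 1/3 and 2/3.

```python
from fractions import Fraction

third, two_thirds = Fraction(1, 3), Fraction(2, 3)

# Cell order: AB12, AB13, AB21, AB23, AB31, AB32.
# Coefficients read from Table 11.11, taken as exact thirds (assumption).
a_l2 = [two_thirds, third, third, -third, -third, -two_thirds]   # multiplies L2
a_l3 = [third, -third, two_thirds, third, -two_thirds, -third]   # multiplies L3
b_l5 = [third, -third, two_thirds, -two_thirds, third, -third]   # multiplies L5
b_l6 = [two_thirds, -two_thirds, third, -third, -third, third]   # multiplies L6
ab_l8 = [1, -1, -1, 1, 1, -1]                                    # multiplies L8

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# The A*B vector is zero outside the interaction cells, so these inner
# products over the six cells are the full inner products.
products = {name: dot(vec, ab_l8)
            for name, vec in [("L2", a_l2), ("L3", a_l3), ("L5", b_l5), ("L6", b_l6)]}
print(products)
```

All four inner products are zero, so both the A and the B hypotheses are orthogonal to the A*B hypothesis, whatever the values of N1 through N6.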

Type IV Estimable Functions

By once again looking at the Type II hypotheses being tested in the balanced 2 × 2 factorial (see Table 11.6), you can see another characteristic of the hypotheses employed for balanced designs: the coefficients of lower-order effects are averaged across each higher-level effect involving the same subscripts. For example, in the A hypothesis, the coefficients of AB11 and AB12 are equal to one-half the coefficient of A1, and the coefficients of AB21 and AB22 are equal to one-half the coefficient of A2. With this in mind then, the basic concept used to construct Type IV hypotheses is that the coefficients of any effect, say F1, are distributed equitably across higher-level effects that contain F1. When missing cells occur, this same general philosophy is adhered to, but care must be taken in the way the distributive concept is applied.

Construction of Type IV hypotheses begins as does the construction of the Type III hypotheses. That is, for an effect F1, equate to zero all coefficients in the general form that do not belong to F1 or to any other effect containing F1. If F1 is not contained in any other effect, then the Type IV hypothesis (and Type II and III) has been found. If F1 is contained in other effects, then simplify, if necessary, the coefficients associated with F1 so that they are all free coefficients or functions of other free coefficients in the F1 block.

To illustrate the method of resolving the free coefficients outside of the F1 block, suppose that you are interested in the estimable functions for an effect A and that A is contained in AB, AC, and ABC. (In other words, the main effects in the model are A, B, and C.)

With missing cells, the coefficients of intermediate effects (here they are AB and AC) do not always have an equal distribution of the lower-order coefficients, so the coefficients of the highest-order effects are determined first (here it is ABC). Once the highest-order coefficients are determined, the coefficients of intermediate effects are automatically determined.

The following process is performed for each free coefficient of A in turn. The resulting symbolic vectors are then added together to give the Type IV estimable functions for A.

  1. Select a free coefficient of A, and set all other free coefficients of A to zero.

  2. If any of the levels of A have zero as a coefficient, equate all of the coefficients of higher-level effects involving that level of A to zero. This step alone usually resolves most of the free coefficients remaining.

  3. Check to see if any higher-level coefficients are now zero when the coefficient of the associated level of A is not zero. If this situation occurs, the Type IV estimable functions for A are not unique.

  4. For each level of A in turn, if the A coefficient for that level is nonzero, count the number of times that level occurs in the higher-level effect. Then equate each of the higher-level coefficients to the coefficient of that level of A divided by the count.

An example of a 3 × 3 factorial with four missing cells (N1 through N5 represent positive cell frequencies; a period marks an empty cell) is shown in Table 11.12.

Table 11.12: 3 × 3 Factorial Design with Four Missing Cells

               B
          1     2     3
A   1     N1    N2    .
    2     N3    N4    .
    3     .     .     N5

The Type IV estimable functions are shown in Table 11.13.

Table 11.13: Type IV Estimable Functions for 3 × 3 Factorial Design with Four Missing Cells

Effect    A           B           A*B
µ         0           0           0
A1        -L3         0           0
A2        L3          0           0
A3        0           0           0
B1        0           L5          0
B2        0           -L5         0
B3        0           0           0
AB11      -0.5*L3     0.5*L5      L8
AB12      -0.5*L3     -0.5*L5     -L8
AB21      0.5*L3      0.5*L5      -L8
AB22      0.5*L3      -0.5*L5     L8
AB33      0           0           0
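The interaction coefficients in the A and B columns of Table 11.13 follow from steps 2 and 4 of the Type IV process. The sketch below covers only those two steps for a two-factor design: it takes the main-effect coefficients from the table as given, does not derive them from the general form, and does not perform the uniqueness check of step 3. The helper `distribute` is hypothetical.

```python
from fractions import Fraction

def distribute(level_coeffs, cells, factor_index):
    """Spread each main-effect coefficient evenly over the nonmissing cells
    at that level (step 4); levels with coefficient zero give zeros (step 2)."""
    out = {}
    for cell in cells:
        level = cell[factor_index]
        count = sum(1 for other in cells if other[factor_index] == level)
        out[cell] = Fraction(level_coeffs[level]) / count
    return out

# Nonmissing cells (A level, B level) of the design in Table 11.12.
cells = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)]

# Main-effect coefficients from Table 11.13 with L3 = 1 and L5 = 1.
a_part = distribute({1: -1, 2: 1, 3: 0}, cells, factor_index=0)
b_part = distribute({1: 1, 2: -1, 3: 0}, cells, factor_index=1)
print({cell: str(v) for cell, v in a_part.items()})
print({cell: str(v) for cell, v in b_part.items()})
```

Levels 1 and 2 of A each occur in two nonmissing cells, so their coefficients are halved, while level 3 has a zero coefficient and the AB33 cell drops out of both hypotheses.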

A Comparison of Type III and Type IV Hypotheses

For the vast majority of designs, Type III and Type IV hypotheses for a given effect are the same. Specifically, they are the same for any effect F1 that is not contained in other effects for any design (with or without missing cells). For factorial designs with no missing cells, the Type III and Type IV hypotheses coincide for all effects. When there are missing cells, the hypotheses can differ. By using the GLM procedure, you can study the differences in the hypotheses and then decide on the appropriateness of the hypotheses for a particular model.

The Type III hypotheses for three-factor and higher completely nested designs with unequal Ns in the lowest level differ from the Type II hypotheses; however, the Type IV hypotheses do correspond to the Type II hypotheses in this case.

When missing cells occur in a design, the Type IV hypotheses may not be unique. If this occurs in PROC GLM, you are notified, and you may need to consider defining your own specific comparisons.




SAS/STAT 9.1 User's Guide, Volumes 1-7
ISBN: 1590472438
Year: 2004
Pages: 156
