Estimability | SAS/STAT 9.1 User's Guide, Volumes 1-7

For linear models such as

   Y = Xβ + ε

with E(Y) = Xβ, a primary analytical goal is to estimate or test for the significance of certain linear combinations of the elements of β. This is accomplished by computing linear combinations of the observed Ys. An unbiased linear estimate of a specific linear function of the individual βs, say Lβ, is a linear combination of the Ys that has an expected value of Lβ. Hence, the following definition:

A linear combination of the parameters Lβ is estimable if and only if a linear combination of the Ys exists that has expected value Lβ.

Any linear combination of the Ys, for instance KY, will have expectation E(KY) = KXβ. Thus, the expected value of any linear combination of the Ys is equal to that same linear combination of the rows of X multiplied by β. Therefore,

Lβ is estimable if and only if there is a linear combination of the rows of X that is equal to L; that is, if and only if there is a K such that L = KX.

Thus, the rows of X form a generating set from which any estimable L can be constructed. Since the row space of X is the same as the row space of X′X, the rows of X′X also form a generating set from which all estimable Ls can be constructed. Similarly, the rows of (X′X)⁻X′X also form a generating set for L.

Therefore, if L can be written as a linear combination of the rows of X, X′X, or (X′X)⁻X′X, then Lβ is estimable.
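One way to check this condition numerically is sketched below in SAS/IML. It relies on the standard result that L lies in the row space of X exactly when multiplying L by (X′X)⁻X′X reproduces L. The design matrix (a one-way layout with three levels and two observations per level), the two candidate rows, and the tolerance 1E-8 are illustrative assumptions. The GINV function returns the Moore-Penrose inverse rather than the generalized inverse that PROC GLM computes, but the check does not depend on which generalized inverse is used.

```sas
proc iml;
   /* Hypothetical one-way design; columns are mu, A1, A2, A3 */
   X = {1 1 0 0,
        1 1 0 0,
        1 0 1 0,
        1 0 1 0,
        1 0 0 1,
        1 0 0 1};
   XpX = X` * X;
   H   = ginv(XpX) * XpX;      /* rows of H span the row space of X */

   Ldiff = {0 1 -1 0};         /* candidate: A1 - A2 */
   Lmu   = {1 0 0 0};          /* candidate: mu alone */

   /* L is estimable when L*H reproduces L, up to rounding error */
   okDiff = ( max(abs(Ldiff*H - Ldiff)) < 1e-8 );
   okMu   = ( max(abs(Lmu*H   - Lmu))   < 1e-8 );
   print okDiff okMu;
quit;
```

In this layout A1 − A2 is a difference of two rows of X, so okDiff should be 1, while µ alone is not a combination of the rows, so okMu should be 0.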

Once an estimable L has been formed, Lβ can be estimated by computing Lb, where b = (X′X)⁻X′Y. From the general theory of linear models, the unbiased estimator Lb is, in fact, the best linear unbiased estimator of Lβ in the sense of having minimum variance; it is also the maximum likelihood estimator when the residuals are normal. To test the hypothesis that Lβ = 0, compute SS(H₀: Lβ = 0) = (Lb)′(L(X′X)⁻L′)⁻¹Lb and form an F test using the appropriate error term.
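The computation just described can be sketched in SAS/IML as follows. The design matrix, the response values, and the choice of L are made-up illustrations, and GINV (the Moore-Penrose inverse) stands in for the generalized inverse that PROC GLM uses; for an estimable L, both Lb and the sum of squares are the same for any generalized inverse.

```sas
proc iml;
   /* Hypothetical data: one-way layout, two observations per level */
   X = {1 1 0 0, 1 1 0 0, 1 0 1 0, 1 0 1 0, 1 0 0 1, 1 0 0 1};
   Y = {5, 7, 8, 10, 3, 3};

   G = ginv(X` * X);           /* a generalized inverse of X'X */
   b = G * X` * Y;             /* one solution of the normal equations */

   L  = {0 1 0 -1,
         0 0 1 -1};            /* two estimable rows: A1 - A3 and A2 - A3 */
   Lb = L * b;                 /* estimate of L*beta */
   SS = Lb` * inv(L * G * L`) * Lb;   /* SS(H0: L*beta = 0) */
   print Lb SS;
quit;
```

For these made-up values the level means are 6, 9, and 3, so Lb should be (3, 6) and SS should be 36 up to rounding, which is the usual between-level sum of squares.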

General Form of an Estimable Function

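This section demonstrates a shorthand technique for displaying the generating set for any estimable L. Suppose

       [ 1 1 0 0 ]
       [ 1 1 0 0 ]
   X = [ 1 0 1 0 ]   and   β = (µ, A₁, A₂, A₃)′
       [ 1 0 1 0 ]
       [ 1 0 0 1 ]
       [ 1 0 0 1 ]

X is a generating set for L, but so is the smaller set

        [ 1 1 0 0 ]
   X* = [ 1 0 1 0 ]
        [ 1 0 0 1 ]

X* is formed from X by deleting duplicate rows.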

Since all estimable Ls must be linear functions of the rows of X* for Lβ to be estimable, an L for a single-degree-of-freedom estimate can be represented symbolically as

   L1×(1 1 0 0) + L2×(1 0 1 0) + L3×(1 0 0 1)

or

   L = (L1 + L2 + L3, L1, L2, L3)

For this example, Lβ is estimable if and only if the first element of L is equal to the sum of the other elements of L. Equivalently,

   Lβ = (L1 + L2 + L3)×µ + L1×A₁ + L2×A₂ + L3×A₃

is estimable for any values of L1, L2, and L3.

If other generating sets for L are represented symbolically, the symbolic notation looks different. However, the inherent nature of the rules is the same. For example, if row operations are performed on X* to produce an identity matrix in the first 3×3 submatrix of the resulting matrix

         [ 1 0 0  1 ]
   X** = [ 0 1 0 −1 ]
         [ 0 0 1 −1 ]

then X** is also a generating set for L. An estimable L generated from X** can be represented symbolically as

   L = (L1, L2, L3, L1 − L2 − L3)

Note that, again, the first element of L is equal to the sum of the other elements.

With multiple generating sets available, the question arises as to which one is the best to represent L symbolically. Clearly, a generating set containing a minimum of rows (of full row rank) and a maximum of zero elements is desirable. The generalized inverse of X′X computed by the GLM procedure has the property that (X′X)⁻X′X usually contains numerous zeros. For this reason, PROC GLM uses the nonzero rows of (X′X)⁻X′X to represent L symbolically.

If the generating set represented symbolically is of full row rank, the number of symbols (L1, L2, ...) represents the maximum rank of any testable hypothesis (in other words, the maximum number of linearly independent rows for any L matrix that can be constructed). By letting each symbol in turn take on the value of 1 while the others are set to 0, the original generating set can be reconstructed.
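To see the symbolic form that PROC GLM prints, the E option in the MODEL statement can be used. The sketch below is a minimal illustration; the data set name oneway and its values are made up, with three levels of A and two observations per level.

```sas
/* Hypothetical one-way data: three levels of A, two observations each */
data oneway;
   input a y @@;
   datalines;
1 5  1 7  2 8  2 10  3 3  3 3
;

proc glm data=oneway;
   class a;
   model y = a / e;   /* E requests the general form of estimable functions */
run;
quit;
```

For this layout the printed coefficients are expected to involve three symbols, with the coefficient for the last level of A written in terms of the others, as in the X** representation.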

Introduction to Reduction Notation

Reduction notation can be used to represent differences in Sums of Squares for two models. The notation R(µ, A, B, C) denotes the complete main effects model for effects A, B, and C. The notation

   R(A | µ, B, C)

denotes the difference between the model SS for the complete main effects model containing A, B, and C and the model SS for the reduced model containing only B and C.

In other words, this notation represents the differences in Model SS produced by

   proc glm;
      class a b c;
      model y=a b c;
   run;

and

   proc glm;
      class b c;
      model y=b c;
   run;
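The same difference can also be read from a single run, because Type I sums of squares are sequential: each effect is adjusted only for the effects listed before it in the MODEL statement. The sketch below lists A last, so its Type I SS corresponds to the reduction for A after µ, B, and C. The data set threeway and its values are hypothetical.

```sas
/* Hypothetical balanced data for three two-level factors */
data threeway;
   input a b c y;
   datalines;
1 1 1 12
1 1 2 15
1 2 1 11
1 2 2 18
2 1 1 14
2 1 2 19
2 2 1 13
2 2 2 21
;

proc glm data=threeway;
   class a b c;
   model y = b c a / ss1;   /* A is last, so its Type I SS is adjusted for B and C */
run;
quit;
```

Subtracting the Model SS of the two separate runs shown above should give the same value as the Type I SS reported for A here.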

As another example, consider a regression equation with four independent variables. The notation R(β₃, β₄ | β₁, β₂) denotes the difference in Model SS between

   y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε

and

   y = β₀ + β₁x₁ + β₂x₂ + ε

With PROC REG, this is the difference in Model SS for the models produced by

  model y=x1 x2 x3 x4;

and

  model y=x1 x2;
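As an alternative to fitting both models, a TEST statement in PROC REG tests the two extra coefficients jointly. This is a sketch with a made-up data set (reg4); the numerator mean square of the test multiplied by its degrees of freedom should equal the difference in Model SS described above.

```sas
/* Hypothetical data with four regressors */
data reg4;
   input x1 x2 x3 x4 y;
   datalines;
1 2 1 5 10.1
2 1 4 3 12.3
3 5 2 8 15.2
4 3 6 1 14.8
5 7 3 9 20.5
6 4 8 2 19.9
7 8 5 7 25.1
8 6 7 4 24.7
;

proc reg data=reg4;
   model y = x1 x2 x3 x4;
   test x3 = 0, x4 = 0;   /* joint test of the two added regressors */
run;
quit;
```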

Examples

A One-Way Classification Model

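For the model

   Y = µ + Aᵢ + ε,   i = 1, 2, 3

the general form of estimable functions Lb is (from the previous example)

   Lβ = L1×µ + L2×A₁ + L3×A₂ + (L1 − L2 − L3)×A₃

Thus,

   L = (L1, L2, L3, L1 − L2 − L3)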

Tests involving only the parameters A₁, A₂, and A₃ must have an L of the form

   L = (0, L2, L3, −L2 − L3)

Since the preceding L involves only two symbols, hypotheses with at most two degrees of freedom can be constructed. For example, let L2 = 1 and L3 = 0; then let L2 = 0 and L3 = 1:

   L = [ 0 1 0 −1 ]
       [ 0 0 1 −1 ]

The preceding L can be used to test the hypothesis that A₁ = A₂ = A₃. For this example, any L with two linearly independent rows with column 1 equal to zero produces the same Sum of Squares. For example, a pooled linear quadratic

   L = [ 0 1  0 −1 ]
       [ 0 1 −2  1 ]

gives the same SS. In fact, for any L of full row rank and any nonsingular matrix K of conformable dimensions,

   SS(H₀: Lβ = 0) = SS(H₀: KLβ = 0)
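This invariance can be checked with two CONTRAST statements in PROC GLM, one for each choice of L. The sketch below uses a made-up data set (oneway) with three levels of A and two observations per level; the contrast labels are arbitrary.

```sas
/* Hypothetical one-way data: three levels of A, two observations each */
data oneway;
   input a y @@;
   datalines;
1 5  1 7  2 8  2 10  3 3  3 3
;

proc glm data=oneway;
   class a;
   model y = a;
   /* Rows (0 1 0 -1) and (0 0 1 -1); the intercept coefficient is zero */
   contrast 'A1=A2=A3'         a 1  0 -1,
                               a 0  1 -1;
   /* Rows (0 1 0 -1) and (0 1 -2 1): pooled linear and quadratic */
   contrast 'linear+quadratic' a 1  0 -1,
                               a 1 -2  1;
run;
quit;
```

Both contrasts have two degrees of freedom and should report the same Contrast SS, since the second L is a nonsingular transformation of the first.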

A Three-Factor Main Effects Model

Consider a three-factor main effects model involving the CLASS variables A , B , and C , as shown in Table 11.1.

Table 11.1: Three-Factor Main Effects Model
Obs	A	B	C
1	1	2	1
2	1	1	2
3	2	1	3
4	2	2	2
5	2	2	2

The general form of an estimable function is shown in Table 11.2.

Table 11.2: General Form of an Estimable Function for Three-Factor Main Effects Model
Parameter	Coefficient
µ (Intercept)	L1
A1	L2
A2	L1 − L2
B1	L4
B2	L1 − L4
C1	L6
C2	L1 + L2 − L4 − 2×L6
C3	−L2 + L4 + L6

Since only four symbols (L1, L2, L4, and L6) are involved, any testable hypothesis will have at most four degrees of freedom. If you form an L matrix with four linearly independent rows according to the preceding rules, then

   SS(H₀: Lβ = 0) = R(µ, A, B, C)

In a main effects model, the usual hypothesis of interest for a main effect is the equality of all the parameters. In this example, it is not possible to test such a hypothesis because of confounding. One way to proceed is to construct a maximum rank hypothesis (MRH) involving only the parameters of the main effect in question. This can be done using the general form of estimable functions. Note the following:

To get an MRH involving only the parameters of A, the coefficients of L associated with µ, B1, B2, C1, C2, and C3 must be equated to zero. Starting at the top of the general form, let L1 = 0, then L4 = 0, then L6 = 0. If C2 and C3 are not to be involved, then L2 must also be zero. Thus, A1 − A2 is not estimable; that is, the MRH involving only the A parameters has zero rank and R(A | µ, B, C) = 0.
To obtain the MRH involving only the B parameters, let L1 = L2 = L6 = 0. But then to remove C2 and C3 from the comparison, L4 must also be set to 0. Thus, B1 − B2 is not estimable and R(B | µ, A, C) = 0.
To obtain the MRH involving only the C parameters, let L1 = L2 = L4 = 0. Thus, the MRH involving only C parameters is

   C1 − 2×C2 + C3 = K   (for any K)

or any multiple of the left-hand side equal to K. Furthermore,

   SS(H₀: C1 − 2×C2 + C3 = 0) = R(C | µ, A, B)
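The general form for this design can be requested directly from PROC GLM. In the sketch below the A, B, and C values are those of Table 11.1; the response values y and the data set name main3 are made up, since the source gives only the classification variables.

```sas
/* A, B, C as in Table 11.1; y values are hypothetical */
data main3;
   input a b c y;
   datalines;
1 2 1 10
1 1 2 12
2 1 3 15
2 2 2 11
2 2 2 13
;

proc glm data=main3;
   class a b c;
   model y = a b c / e;   /* E prints the general form of estimable functions */
run;
quit;
```

The printed coefficients should involve only the four symbols L1, L2, L4, and L6, in line with Table 11.2.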

A Multiple Regression Model

Suppose

   E(Y) = β₀ + β₁×X1 + β₂×X2 + β₃×X3

If the X′X matrix is of full rank, the general form of estimable functions is as shown in Table 11.3.

Table 11.3: General Form of Estimable Functions for a Multiple Regression Model When X′X Matrix Is of Full Rank
Parameter	Coefficient
β₀	L1
β₁	L2
β₂	L3
β₃	L4

To test, for example, the hypothesis that β₂ = 0, let L1 = L2 = L4 = 0 and let L3 = 1. Then SS(Lβ = 0) = R(β₂ | β₀, β₁, β₃). In the full-rank case, all parameters, as well as any linear combination of parameters, are estimable.

Suppose, however, that X3 = 2×X1 + 3×X2. The general form of estimable functions is shown in Table 11.4.

Table 11.4: General Form of Estimable Functions for a Multiple Regression Model When X′X Matrix Is Not of Full Rank
Parameter	Coefficient
β₀	L1
β₁	L2
β₂	L3
β₃	2×L2 + 3×L3

For this example, it is possible to test H₀: β₀ = 0. However, β₁, β₂, and β₃ are not jointly estimable; that is,

   R(β₁ | β₀, β₂, β₃) = 0
   R(β₂ | β₀, β₁, β₃) = 0
   R(β₃ | β₀, β₁, β₂) = 0
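The rank-deficient case can be reproduced with a small made-up data set in which X3 is built as an exact linear function of X1 and X2. The data set name collin and all data values below are illustrative assumptions.

```sas
/* Hypothetical data; x3 is an exact linear combination of x1 and x2 */
data collin;
   input x1 x2 y;
   x3 = 2*x1 + 3*x2;
   datalines;
1 2 10
2 1 9
3 4 20
4 3 19
5 6 31
6 5 28
;

proc glm data=collin;
   model y = x1 x2 x3 / e;   /* E prints the general form of estimable functions */
run;
quit;
```

The general form printed for x3 should be expressed in terms of the symbols for x1 and x2, as in Table 11.4, and the sums of squares for each regressor adjusted for the other two should be zero.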

Using Symbolic Notation

The preceding examples demonstrate the ability to manipulate the symbolic representation of a generating set. Note that any operations performed on the symbolic notation have corresponding row operations that are performed on the generating set itself.