In an example from Ries and Smith (1963), the choice of detergent brand (Brand = M or X) is related to three other categorical variables: the softness of the laundry water (Softness = soft, medium, or hard), the temperature of the water (Temperature = high or low), and whether the subject was a previous user of Brand M (Previous = yes or no). The linear response function, which could also be specified as RESPONSE MARGINALS, yields one probability, Pr(Brand = M), as the response function to be analyzed. Two models are fit in this example: the first is a saturated model containing all of the main effects and interactions, and the second is a reduced model containing only the main effects. The following statements produce Output 22.1.1 through Output 22.1.4:
data detergent;
   input Softness $ Brand $ Previous $ Temperature $ Count @@;
   datalines;
soft X yes high 19   soft X yes low 57   soft X no high 29   soft X no low 63
soft M yes high 29   soft M yes low 49   soft M no high 27   soft M no low 53
med  X yes high 23   med  X yes low 47   med  X no high 33   med  X no low 66
med  M yes high 47   med  M yes low 55   med  M no high 23   med  M no low 50
hard X yes high 24   hard X yes low 37   hard X no high 42   hard X no low 68
hard M yes high 43   hard M yes low 52   hard M no high 30   hard M no low 42
;

title 'Detergent Preference Study';
proc catmod data=detergent;
   response 1 0;      /* linear function: Pr(Brand=M) */
   weight Count;      /* each record is a cell count */
   model Brand=Softness|Previous|Temperature / freq prob;   /* saturated */
   title2 'Saturated Model';
run;
Detergent Preference Study
Saturated Model

The CATMOD Procedure

                    Data Summary

Response            Brand        Response Levels       2
Weight Variable     Count        Populations          12
Data Set            DETERGENT    Total Frequency    1008
Frequency Missing   0            Observations         24

Detergent Preference Study
Saturated Model

                    Population Profiles

Sample    Softness    Previous    Temperature    Sample Size
------------------------------------------------------------
    1     hard        no          high                 72
    2     hard        no          low                 110
    3     hard        yes         high                 67
    4     hard        yes         low                  89
    5     med         no          high                 56
    6     med         no          low                 116
    7     med         yes         high                 70
    8     med         yes         low                 102
    9     soft        no          high                 56
   10     soft        no          low                 116
   11     soft        yes         high                 48
   12     soft        yes         low                 106

Detergent Preference Study
Saturated Model

  Response Profiles

Response    Brand
-----------------
    1       M
    2       X

   Response Frequencies

          Response Number
Sample        1        2
------------------------
    1        30       42
    2        42       68
    3        43       24
    4        52       37
    5        23       33
    6        50       66
    7        47       23
    8        55       47
    9        27       29
   10        53       63
   11        29       19
   12        49       57

    Response Probabilities

            Response Number
Sample          1          2
----------------------------
    1     0.41667    0.58333
    2     0.38182    0.61818
    3     0.64179    0.35821
    4     0.58427    0.41573
    5     0.41071    0.58929
    6     0.43103    0.56897
    7     0.67143    0.32857
    8     0.53922    0.46078
    9     0.48214    0.51786
   10     0.45690    0.54310
   11     0.60417    0.39583
   12     0.46226    0.53774
Detergent Preference Study
Saturated Model

                  Analysis of Variance

Source                      DF   Chi-Square    Pr > ChiSq
----------------------------------------------------------
Intercept                    1       983.13        <.0001
Softness                     2         0.09        0.9575
Previous                     1        22.68        <.0001
Softness*Previous            2         3.85        0.1457
Temperature                  1         3.67        0.0555
Softness*Temperature         2         0.23        0.8914
Previous*Temperature         1         2.26        0.1324
Softnes*Previou*Temperat     2         0.76        0.6850

Residual                     0          .           .

            Analysis of Weighted Least Squares Estimates

                                                  Standard      Chi-
Parameter                              Estimate      Error    Square   Pr > ChiSq
---------------------------------------------------------------------------------
Intercept                                0.5069     0.0162    983.13       <.0001
Softness                 hard          -0.00073     0.0225      0.00       0.9740
                         med            0.00623     0.0226      0.08       0.7830
Previous                 no             -0.0770     0.0162     22.68       <.0001
Softness*Previous        hard no        -0.0299     0.0225      1.77       0.1831
                         med no         -0.0152     0.0226      0.45       0.5007
Temperature              high            0.0310     0.0162      3.67       0.0555
Softness*Temperature     hard high     -0.00786     0.0225      0.12       0.7265
                         med high      -0.00298     0.0226      0.02       0.8953
Previous*Temperature     no high        -0.0243     0.0162      2.26       0.1324
Softnes*Previou*Temperat hard no high    0.0187     0.0225      0.69       0.4064
                         med no high    -0.0138     0.0226      0.37       0.5415
The Data Summary table (Output 22.1.1) indicates that you have two response levels and twelve populations.
The Population Profiles table in Output 22.1.2 displays the ordering of independent variable levels as used in the table of parameter estimates.
Since Brand M is the first level in the Response Profiles table (Output 22.1.3), the RESPONSE 1 0 statement causes Pr(Brand = M) to be the single response function modeled.
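The RESPONSE 1 0 statement applies the row vector (1, 0) to each population's vector of sample proportions (Pr(M), Pr(X)), so each response function is simply the observed proportion choosing Brand M. A minimal Python sketch of that arithmetic (not SAS; the function name `linear_response` is illustrative only), using the counts for the first population in the Response Frequencies table:

```python
def linear_response(coeffs, counts):
    """Apply a linear response function (one row of coefficients) to the
    sample proportions of a single population."""
    total = sum(counts)
    proportions = [c / total for c in counts]
    return sum(a * p for a, p in zip(coeffs, proportions))


# Population 1 (Softness=hard, Previous=no, Temperature=high):
# 30 subjects chose Brand M and 42 chose Brand X.
f1 = linear_response([1, 0], [30, 42])
print(round(f1, 5))  # 0.41667
```

The result matches the first entry of the Response Probabilities table.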
The Analysis of Variance table in Output 22.1.4 shows that all of the interactions are nonsignificant. Therefore, a main-effects model is fit with the following statements:
   model Brand=Softness Previous Temperature / clparm noprofile design;
   title2 'Main-Effects Model';
run;
quit;
A new PROC CATMOD statement is not required because the CATMOD procedure is interactive: the previous step ended with RUN rather than QUIT, so the new MODEL statement is processed against the same data. The NOPROFILE option suppresses the redisplay of the profile tables, the CLPARM option produces 95% confidence limits for the parameter estimates, and the DESIGN option displays the design matrix. Output 22.1.5 through Output 22.1.7 are produced.
Detergent Preference Study
Main-Effects Model

The CATMOD Procedure

                    Data Summary

Response            Brand        Response Levels       2
Weight Variable     Count        Populations          12
Data Set            DETERGENT    Total Frequency    1008
Frequency Missing   0            Observations         24

         Response Functions and Design Matrix

            Response            Design Matrix
Sample      Function     1     2     3     4     5
-------------------------------------------------
    1        0.41667     1     1     0     1     1
    2        0.38182     1     1     0     1    -1
    3        0.64179     1     1     0    -1     1
    4        0.58427     1     1     0    -1    -1
    5        0.41071     1     0     1     1     1
    6        0.43103     1     0     1     1    -1
    7        0.67143     1     0     1    -1     1
    8        0.53922     1     0     1    -1    -1
    9        0.48214     1    -1    -1     1     1
   10        0.45690     1    -1    -1     1    -1
   11        0.60417     1    -1    -1    -1     1
   12        0.46226     1    -1    -1    -1    -1
Detergent Preference Study
Main-Effects Model

          Analysis of Variance

Source          DF   Chi-Square    Pr > ChiSq
---------------------------------------------
Intercept        1      1004.93        <.0001
Softness         2         0.24        0.8859
Previous         1        20.96        <.0001
Temperature      1         3.95        0.0468

Residual         7         8.26        0.3100
Detergent Preference Study
Main-Effects Model

                   Analysis of Weighted Least Squares Estimates

                                 Standard      Chi-                      95% Confidence
Parameter             Estimate      Error    Square   Pr > ChiSq             Limits
-----------------------------------------------------------------------------------------
Intercept               0.5080     0.0160   1004.93       <.0001      0.4766      0.5394
Softness    hard      -0.00256     0.0218      0.01       0.9066     -0.0454      0.0402
            med         0.0104     0.0218      0.23       0.6342     -0.0323      0.0530
Previous    no         -0.0711     0.0155     20.96       <.0001     -0.1015     -0.0407
Temperature high        0.0319     0.0161      3.95       0.0468    0.000446      0.0634
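The design matrix in Output 22.1.5 shows how PROC CATMOD codes the factor effects: each effect with k levels contributes k-1 columns (deviation-from-the-mean coding), and the last level of each effect (soft, yes, and low) is coded -1 in all of that effect's columns.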
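The design matrix can be rebuilt with a short Python sketch (an illustration, not part of PROC CATMOD). It assumes deviation coding in which the last level of each effect is coded -1 in every column, and it takes the level orderings from the Population Profiles table; the helper names `effect_code` and `design_row` are hypothetical.

```python
from itertools import product

# Level orderings as shown in the Population Profiles table.
softness = ["hard", "med", "soft"]
previous = ["no", "yes"]
temperature = ["high", "low"]


def effect_code(level, levels):
    """Deviation coding: k-1 columns; the last level is -1 in every column."""
    if level == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if level == lv else 0 for lv in levels[:-1]]


def design_row(s, p, t):
    """Intercept column followed by the main-effect columns."""
    return ([1] + effect_code(s, softness)
            + effect_code(p, previous) + effect_code(t, temperature))


# product() varies the rightmost factor fastest, matching the sample order.
design = [design_row(s, p, t) for s, p, t in product(softness, previous, temperature)]
for i, row in enumerate(design, start=1):
    print(i, row)
```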
The analysis of variance table in Output 22.1.6 shows that previous use of Brand M and the temperature of the laundry water are both significant factors in preferring Brand M laundry detergent. The table also shows that the additive model fits, since the goodness-of-fit statistic (the Residual chi-square, 8.26 with 7 DF, p = 0.3100) is nonsignificant.
The chi-square test in Output 22.1.7 shows that the Softness parameters are not significantly different from zero; as expected, the Wald confidence limits for these two estimates contain zero. So softness of the water is not a factor in choosing Brand M.
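The Wald limits are the estimate plus or minus z(0.975), about 1.96, standard errors. A minimal Python sketch using the rounded estimates and standard errors from Output 22.1.7 (so the last digit can differ slightly from the displayed limits); it shows both Softness intervals straddling zero while the Temperature interval does not:

```python
from statistics import NormalDist


def wald_ci(estimate, std_err, level=0.95):
    """Wald confidence limits: estimate -/+ z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return estimate - z * std_err, estimate + z * std_err


for name, est, se in [("Softness hard", -0.00256, 0.0218),
                      ("Softness med", 0.0104, 0.0218),
                      ("Temperature high", 0.0319, 0.0161)]:
    lo, hi = wald_ci(est, se)
    print(f"{name:17s} {lo:8.4f} {hi:8.4f}  contains 0: {lo < 0 < hi}")
```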
The negative coefficient for Previous (-0.0711) indicates that the first level of Previous (which, from the table of population profiles, is 'no') is associated with a smaller probability of preferring Brand M than the second level of Previous (whose coefficient is constrained to be 0.0711, since the parameter estimates for a given effect must sum to zero). In other words, previous users of Brand M are much more likely to prefer it than those who have never used it before; the estimated difference is about 2 × 0.0711 = 0.14 in probability.
Similarly, the positive coefficient for Temperature (0.0319) indicates that the first level of Temperature (which, from the Population Profiles table, is 'high') has a larger probability of preferring Brand M than the second level of Temperature. In other words, those who do their laundry in hot water are more likely to prefer Brand M than those who do their laundry in cold water.
Four surgical operations for duodenal ulcers are compared in a clinical trial at four hospitals. The operations performed are: Treatment=a, drainage and vagotomy; Treatment=b, 25% resection and vagotomy; Treatment=c, 50% resection and vagotomy; and Treatment=d, 75% resection. The response is the severity of an undesirable complication called dumping syndrome. The data are from Grizzle, Starmer, and Koch (1969, pp. 489-504).
data operate;
   input Hospital Treatment $ Severity $ wt @@;
   datalines;
1 a none 23  1 a slight  7  1 a moderate 2
1 b none 23  1 b slight 10  1 b moderate 5
1 c none 20  1 c slight 13  1 c moderate 5
1 d none 24  1 d slight 10  1 d moderate 6
2 a none 18  2 a slight  6  2 a moderate 1
2 b none 18  2 b slight  6  2 b moderate 2
2 c none 13  2 c slight 13  2 c moderate 2
2 d none  9  2 d slight 15  2 d moderate 2
3 a none  8  3 a slight  6  3 a moderate 3
3 b none 12  3 b slight  4  3 b moderate 4
3 c none 11  3 c slight  6  3 c moderate 2
3 d none  7  3 d slight  7  3 d moderate 4
4 a none 12  4 a slight  9  4 a moderate 1
4 b none 15  4 b slight  3  4 b moderate 2
4 c none 14  4 c slight  8  4 c moderate 3
4 d none 13  4 d slight  6  4 d moderate 4
;
The response variable ( Severity ) is ordinally scaled with three levels, so assignment of scores is appropriate (0=none, 0.5=slight, 1=moderate). For these scores, the response function yields the mean score. The following statements produce Output 22.2.1 through Output 22.2.6.
title 'Dumping Syndrome Data';
proc catmod data=operate order=data;
   weight wt;
   response 0 0.5 1;   /* scores for none, slight, moderate: mean score */
   model Severity=Treatment Hospital / freq oneway design;
   title2 'Main-Effects Model';
quit;

Dumping Syndrome Data
Main-Effects Model

The CATMOD Procedure

                    Data Summary

Response            Severity     Response Levels      3
Weight Variable     wt           Populations         16
Data Set            OPERATE      Total Frequency    417
Frequency Missing   0            Observations        48

         One-Way Frequencies

Variable     Value       Frequency
----------------------------------
Severity     none              240
             slight            129
             moderate           48

Treatment    a                  96
             b                 104
             c                 110
             d                 107

Hospital     1                 148
             2                 105
             3                  74
             4                  90

Dumping Syndrome Data
Main-Effects Model

              Population Profiles

Sample    Treatment    Hospital    Sample Size
-----------------------------------------------
    1     a            1                    32
    2     a            2                    25
    3     a            3                    17
    4     a            4                    22
    5     b            1                    38
    6     b            2                    26
    7     b            3                    20
    8     b            4                    20
    9     c            1                    38
   10     c            2                    28
   11     c            3                    19
   12     c            4                    25
   13     d            1                    40
   14     d            2                    26
   15     d            3                    18
   16     d            4                    23

Dumping Syndrome Data
Main-Effects Model

   Response Profiles

Response    Severity
---------------------
    1       none
    2       slight
    3       moderate

      Response Frequencies

            Response Number
Sample        1        2        3
----------------------------------
    1        23        7        2
    2        18        6        1
    3         8        6        3
    4        12        9        1
    5        23       10        5
    6        18        6        2
    7        12        4        4
    8        15        3        2
    9        20       13        5
   10        13       13        2
   11        11        6        2
   12        14        8        3
   13        24       10        6
   14         9       15        2
   15         7        7        4
   16        13        6        4

Dumping Syndrome Data
Main-Effects Model

              Response Functions and Design Matrix

            Response                 Design Matrix
Sample      Function      1     2     3     4     5     6     7
----------------------------------------------------------------
    1        0.17188      1     1     0     0     1     0     0
    2        0.16000      1     1     0     0     0     1     0
    3        0.35294      1     1     0     0     0     0     1
    4        0.25000      1     1     0     0    -1    -1    -1
    5        0.26316      1     0     1     0     1     0     0
    6        0.19231      1     0     1     0     0     1     0
    7        0.30000      1     0     1     0     0     0     1
    8        0.17500      1     0     1     0    -1    -1    -1
    9        0.30263      1     0     0     1     1     0     0
   10        0.30357      1     0     0     1     0     1     0
   11        0.26316      1     0     0     1     0     0     1
   12        0.28000      1     0     0     1    -1    -1    -1
   13        0.27500      1    -1    -1    -1     1     0     0
   14        0.36538      1    -1    -1    -1     0     1     0
   15        0.41667      1    -1    -1    -1     0     0     1
   16        0.30435      1    -1    -1    -1    -1    -1    -1

Dumping Syndrome Data
Main-Effects Model

         Analysis of Variance

Source         DF   Chi-Square    Pr > ChiSq
--------------------------------------------
Intercept       1       248.77        <.0001
Treatment       3         8.90        0.0307
Hospital        3         2.33        0.5065

Residual        9         6.33        0.7069

Dumping Syndrome Data
Main-Effects Model

        Analysis of Weighted Least Squares Estimates

                                 Standard      Chi-
Parameter            Estimate       Error    Square   Pr > ChiSq
------------------------------------------------------------------
Intercept              0.2724      0.0173    248.77       <.0001
Treatment    a        -0.0552      0.0270      4.17       0.0411
             b        -0.0365      0.0289      1.59       0.2073
             c         0.0248      0.0280      0.78       0.3757
Hospital     1        -0.0204      0.0264      0.60       0.4388
             2        -0.0178      0.0268      0.44       0.5055
             3         0.0531      0.0352      2.28       0.1312
The ORDER=DATA option is specified so that the levels of the response variable remain in the correct order. A main-effects model is fit. The FREQ option displays the frequency of each response within each sample (Output 22.2.3), and the ONEWAY option produces a table of the number of subjects within each variable level (Output 22.2.1).
You can use the one-way frequencies (Output 22.2.1) and the response profiles (Output 22.2.3) to verify that the response levels are in the desired order (none, slight, moderate), so that the response scores (0, 0.5, 1.0) are applied appropriately. If the ORDER=DATA option had not been used, the character values of Severity would have been sorted alphabetically (moderate, none, slight), and the scores would have been attached to the wrong levels.
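The mean-score response function is the sum over levels of score × sample proportion, which is why the level order matters. A Python sketch (illustrative only; `mean_score` is a hypothetical helper) that reproduces the first response function in the Response Functions and Design Matrix table, and shows what the same counts would give if they were listed in alphabetical order instead:

```python
def mean_score(counts, scores=(0.0, 0.5, 1.0)):
    """Mean-score response function: sum of score * sample proportion.
    Counts must be listed in the same order as the scores."""
    total = sum(counts)
    return sum(s * c for s, c in zip(scores, counts)) / total


# Sample 1 (Treatment=a, Hospital=1): 23 none, 7 slight, 2 moderate.
print(round(mean_score([23, 7, 2]), 4))   # 0.1719
# The same counts in alphabetical order (moderate, none, slight) would
# silently pair the scores with the wrong levels.
print(round(mean_score([2, 23, 7]), 4))
```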
The analysis of variance table (Output 22.2.5) shows that the additive model fits (since the Residual Chi-Square is not significant), that the Treatment effect is significant, and that the Hospital effect is not significant.
The coefficients of Treatment in Output 22.2.6 show that the first two treatments (with negative coefficients) have lower mean scores than the last two treatments (the fourth coefficient, not shown, must be positive, 0.0669, since the four coefficients must sum to zero). In other words, the less extensive operations (the first two) are associated with less severe dumping syndrome complications.
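Because the effects use deviation coding, the coefficient of the omitted level is not displayed but can be recovered as minus the sum of the displayed ones. A small Python sketch with the rounded estimates from Output 22.2.6:

```python
# Rounded estimates from the Analysis of Weighted Least Squares Estimates table.
treatment = {"a": -0.0552, "b": -0.0365, "c": 0.0248}
hospital = {"1": -0.0204, "2": -0.0178, "3": 0.0531}

# Deviation coding: the effects for all levels of a factor sum to zero,
# so the omitted (last) level's effect is minus the sum of the others.
treatment["d"] = -sum(treatment.values())
hospital["4"] = -sum(hospital.values())

print(round(treatment["d"], 4), round(hospital["4"], 4))  # 0.0669 -0.0149
```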
In this data set, from Cox and Snell (1989), ingots are prepared with different heating and soaking times and tested for their readiness to be rolled. The response variable Y has value 1 for ingots that are not ready and value 0 otherwise. The explanatory variables are Heat and Soak.
data ingots;
   input Heat Soak nready ntotal @@;
   Count=nready;        Y=1; output;   /* Y=1: not ready */
   Count=ntotal-nready; Y=0; output;   /* Y=0: ready     */
   drop nready ntotal;
   datalines;
7 1.0 0 10   14 1.0 0 31   27 1.0 1 56   51 1.0 3 13
7 1.7 0 17   14 1.7 0 43   27 1.7 4 44   51 1.7 0  1
7 2.2 0  7   14 2.2 2 33   27 2.2 0 21   51 2.2 0  1
7 2.8 0 12   14 2.8 0 31   27 2.8 1 22   51 4.0 0  1
7 4.0 0  9   14 4.0 0 19   27 4.0 1 16
;
Logistic regression analysis is often used to investigate the relationship between discrete response variables and continuous explanatory variables. For logistic regression, the continuous explanatory variables are declared in a DIRECT statement so that their values enter the design matrix directly. The following statements produce Output 22.3.1 through Output 22.3.8.
title 'Maximum Likelihood Logistic Regression';
proc catmod data=ingots;
   weight Count;
   direct Heat Soak;   /* continuous covariates */
   model Y=Heat Soak / freq covb corrb itprint design;
quit;

Maximum Likelihood Logistic Regression

The CATMOD Procedure

                    Data Summary

Response            Y            Response Levels      2
Weight Variable     Count        Populations         19
Data Set            INGOTS       Total Frequency    387
Frequency Missing   0            Observations        25

        Population Profiles

Sample    Heat    Soak    Sample Size
-------------------------------------
    1        7    1                10
    2        7    1.7              17
    3        7    2.2               7
    4        7    2.8              12
    5        7    4                 9
    6       14    1                31
    7       14    1.7              43
    8       14    2.2              33
    9       14    2.8              31
   10       14    4                19
   11       27    1                56
   12       27    1.7              44
   13       27    2.2              21
   14       27    2.8              22
   15       27    4                16
   16       51    1                13
   17       51    1.7               1
   18       51    2.2               1
   19       51    4                 1

Maximum Likelihood Logistic Regression

 Response Profiles

Response    Y
-------------
    1       0
    2       1

   Response Frequencies

          Response Number
Sample        1        2
------------------------
    1        10        0
    2        17        0
    3         7        0
    4        12        0
    5         9        0
    6        31        0
    7        43        0
    8        31        2
    9        31        0
   10        19        0
   11        55        1
   12        40        4
   13        21        0
   14        21        1
   15        15        1
   16        10        3
   17         1        0
   18         1        0
   19         1        0

Maximum Likelihood Logistic Regression

     Response Functions and Design Matrix

            Response        Design Matrix
Sample      Function       1       2       3
---------------------------------------------
    1        2.99573       1       7     1
    2        3.52636       1       7     1.7
    3        2.63906       1       7     2.2
    4        3.17805       1       7     2.8
    5        2.89037       1       7     4
    6        4.12713       1      14     1
    7        4.45435       1      14     1.7
    8        2.74084       1      14     2.2
    9        4.12713       1      14     2.8
   10        3.63759       1      14     4
   11        4.00733       1      27     1
   12        2.30259       1      27     1.7
   13        3.73767       1      27     2.2
   14        3.04452       1      27     2.8
   15        2.70805       1      27     4
   16        1.20397       1      51     1
   17        0.69315       1      51     1.7
   18        0.69315       1      51     2.2
   19        0.69315       1      51     4
Maximum Likelihood Logistic Regression

                           Maximum Likelihood Analysis

               Sub        -2 Log    Convergence         Parameter Estimates
Iteration  Iteration   Likelihood     Criterion        1          2          3
-------------------------------------------------------------------------------
    0          0        536.49592        1.0000         0          0          0
    1          0        152.58961        0.7156    2.1594    -0.0139  -0.003733
    2          0        106.76066        0.3003    3.5334    -0.0363    -0.0120
    3          0        96.692171        0.0943    4.7489    -0.0640    -0.0299
    4          0        95.383825        0.0135    5.4138    -0.0790    -0.0498
    5          0        95.345659      0.000400    5.5539    -0.0819    -0.0564
    6          0        95.345613     4.8289E-7    5.5592    -0.0820    -0.0568
    7          0        95.345613     7.731E-13    5.5592    -0.0820    -0.0568

Maximum likelihood computations converged.
Maximum Likelihood Logistic Regression

   Maximum Likelihood Analysis of Variance

Source              DF   Chi-Square    Pr > ChiSq
--------------------------------------------------
Intercept            1        24.65        <.0001
Heat                 1        11.95        0.0005
Soak                 1         0.03        0.8639

Likelihood Ratio    16        13.75        0.6171
Maximum Likelihood Logistic Regression

        Analysis of Maximum Likelihood Estimates

                           Standard      Chi-
Parameter     Estimate        Error    Square   Pr > ChiSq
----------------------------------------------------------
Intercept       5.5592       1.1197     24.65       <.0001
Heat           -0.0820       0.0237     11.95       0.0005
Soak           -0.0568       0.3312      0.03       0.8639

Maximum Likelihood Logistic Regression

     Covariance Matrix of the Maximum Likelihood Estimates

Row   Parameter           Col1            Col2            Col3
------------------------------------------------------------------
  1   Intercept      1.2537133      -0.0215664      -0.2817648
  2   Heat          -0.0215664       0.0005633       0.0026243
  3   Soak          -0.2817648       0.0026243       0.1097020

Maximum Likelihood Logistic Regression

     Correlation Matrix of the Maximum Likelihood Estimates

Row   Parameter           Col1            Col2            Col3
------------------------------------------------------------------
  1   Intercept        1.00000        -0.81152        -0.75977
  2   Heat            -0.81152         1.00000         0.33383
  3   Soak            -0.75977         0.33383         1.00000
You can verify that the populations are defined as you intended by looking at the Population Profiles table in Output 22.3.1.
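Since the Response Profiles table shows the response level ordering as 0, 1, the default response function, the logit, is defined as $\log(p_0/p_1)$, where $p_0 = \Pr(Y=0)$ is the probability that an ingot is ready and $p_1 = \Pr(Y=1)$ is the probability that it is not ready.
The values of the continuous variables Heat and Soak are inserted directly into the design matrix.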
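Because both levels share the same sample size, $\log(p_0/p_1) = \log(n_0/n_1)$ for each population. The displayed response functions are also consistent with a zero count being replaced by 0.5 (Sample 1 has counts 10 and 0, and $\log(10/0.5) = 2.99573$); treat that substitution as an inference from the displayed numbers, not a documented rule. A Python sketch (the function name is hypothetical):

```python
import math


def logit_from_counts(n0, n1, zero_sub=0.5):
    """log(p0/p1) for one population, computed as log(n0/n1).
    Assumption: a zero count is replaced by zero_sub (0.5), which is
    consistent with the response functions displayed for these data."""
    n0 = n0 if n0 > 0 else zero_sub
    n1 = n1 if n1 > 0 else zero_sub
    return math.log(n0 / n1)


print(round(logit_from_counts(10, 0), 5))   # Sample 1: 2.99573
print(round(logit_from_counts(10, 3), 5))   # Sample 16: 1.20397
```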
Seven Newton-Raphson iterations are required to find the maximum likelihood estimates.
The analysis of variance table (Output 22.3.5) shows that the model fits since the likelihood ratio goodness-of-fit test is nonsignificant. It also shows that the length of heating time is a significant factor with respect to readiness but that length of soaking time is not.
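The chi-square values for Intercept, Heat, and Soak are Wald statistics, (estimate/standard error)², each referred to a chi-square distribution with 1 degree of freedom; for example, $(5.5592/1.1197)^2 \approx 24.65$. A Python sketch using the rounded estimates from the table of maximum likelihood estimates (so results can differ from the displayed values in the last digit):

```python
import math


def wald_chi_square(estimate, std_err):
    """Wald statistic (estimate / standard error) squared, 1 DF."""
    return (estimate / std_err) ** 2


def chi2_1df_pvalue(x):
    """Upper-tail probability of a 1-DF chi-square: P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))


# Rounded estimates and standard errors (Output 22.3.6).
for name, est, se in [("Intercept", 5.5592, 1.1197),
                      ("Heat", -0.0820, 0.0237),
                      ("Soak", -0.0568, 0.3312)]:
    w = wald_chi_square(est, se)
    print(f"{name:9s} chi-square={w:6.2f}  p={chi2_1df_pvalue(w):.4f}")
```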
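From the table of maximum likelihood estimates (Output 22.3.6), the fitted model is

$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = 5.5592 - 0.0820\,\mathrm{Heat} - 0.0568\,\mathrm{Soak}$$

where $\hat{p}$ is the predicted probability of readiness ($Y=0$). For example, for Sample 1 with Heat=7 and Soak=1, the estimate is

$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = 5.5592 - 0.0820(7) - 0.0568(1) = 4.9284$$

Predicted values of the logits, as well as the probabilities of readiness, could be obtained by specifying PRED=PROB in the MODEL statement. For the example of Sample 1 with Heat=7 and Soak=1, PRED=PROB would give an estimate of the probability of readiness equal to 0.9928, since

$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = 4.9284$$

implies that

$$\hat{p} = \frac{e^{4.9284}}{1 + e^{4.9284}} = 0.9928$$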
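The same arithmetic for Sample 1 can be checked with a short Python sketch. The coefficients are the rounded estimates from the maximum likelihood table; Heat and Soak enter with negative coefficients, so the logit of readiness decreases as heating or soaking time increases. The helper name `predicted` is hypothetical.

```python
import math

# Rounded estimates; p is the probability of readiness (Y=0).
b0, b_heat, b_soak = 5.5592, -0.0820, -0.0568


def predicted(heat, soak):
    eta = b0 + b_heat * heat + b_soak * soak   # fitted logit
    p = math.exp(eta) / (1 + math.exp(eta))    # inverse logit
    return eta, p


eta, p = predicted(7, 1.0)
print(round(eta, 4), round(p, 4))  # 4.9284 0.9928
```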
As another consideration, since soaking time is nonsignificant, you could fit another model that omits the variable Soak.
This analysis reproduces the predicted cell frequencies for Bartlett's data using a log-linear model of no three-variable interaction (Bishop, Fienberg, and Holland 1975, p. 89). Cuttings of two different lengths (Length = short or long) are planted at one of two time points (Time = now or spring), and their survival status (Status = dead or alive) is recorded.
As in the text, the variable levels are simply labeled 1 and 2. The following statements produce Output 22.4.1 through Output 22.4.5:
data bartlett;
   input Length Time Status wt @@;
   datalines;
1 1 1 156   1 1 2  84   1 2 1  84   1 2 2 156
2 1 1 107   2 1 2 133   2 2 1  31   2 2 2 209
;

title 'Bartlett''s Data';
proc catmod data=bartlett;
   weight wt;
   model Length*Time*Status=_response_ / noparm pred=freq;
   loglin Length|Time|Status @ 2;   /* all effects up to two-way interactions */
   title2 'Model with No 3-Variable Interaction';
quit;
Bartlett's Data
Model with No 3-Variable Interaction

The CATMOD Procedure

                         Data Summary

Response            Length*Time*Status    Response Levels      8
Weight Variable     wt                    Populations          1
Data Set            BARTLETT              Total Frequency    960
Frequency Missing   0                     Observations         8

 Population Profiles

Sample    Sample Size
---------------------
    1             960

Bartlett's Data
Model with No 3-Variable Interaction

          Response Profiles

Response    Length    Time    Status
------------------------------------
    1          1        1        1
    2          1        1        2
    3          1        2        1
    4          1        2        2
    5          2        1        1
    6          2        1        2
    7          2        2        1
    8          2        2        2

Bartlett's Data
Model with No 3-Variable Interaction

   Maximum Likelihood Analysis of Variance

Source              DF   Chi-Square    Pr > ChiSq
-------------------------------------------------
Length               1         2.64        0.1041
Time                 1         5.25        0.0220
Length*Time          1         5.25        0.0220
Status               1        48.94        <.0001
Length*Status        1        48.94        <.0001
Time*Status          1        95.01        <.0001

Likelihood Ratio     1         2.29        0.1299
Bartlett's Data
Model with No 3-Variable Interaction

The CATMOD Procedure

        Maximum Likelihood Predicted Values for Response Functions

              ------Observed------    ------Predicted-----
Function                  Standard                Standard
 Number     Function         Error    Function       Error     Residual
--------------------------------------------------------------------
    1       -0.29248      0.105806    -0.23565    0.098486     -0.05683
    2       -0.91152      0.129188    -0.94942    0.129948     0.037901
    3       -0.91152      0.129188    -0.94942    0.129948     0.037901
    4       -0.29248      0.105806    -0.23565    0.098486     -0.05683
    5       -0.66951      0.118872    -0.69362    0.120172     0.024113
    6       -0.45199      0.110921     -0.3897    0.102267     -0.06229
    7       -1.90835      0.192465    -1.73146    0.142969     -0.17688

Bartlett's Data
Model with No 3-Variable Interaction

              Maximum Likelihood Predicted Values for Frequencies

                          -------Observed------   ------Predicted-----
                                      Standard                Standard
Length   Time   Status   Frequency       Error   Frequency       Error     Residual
--------------------------------------------------------------------------------------
  1       1       1            156    11.43022    161.0961    11.07379     -5.09614
  1       1       2             84    8.754999    78.90386    7.808613     5.096139
  1       2       1             84    8.754999    78.90386    7.808613     5.096139
  1       2       2            156    11.43022    161.0961    11.07379     -5.09614
  2       1       1            107    9.750588    101.9039    8.924304     5.096139
  2       1       2            133    10.70392    138.0961    10.33434     -5.09614
  2       2       1             31     5.47713    36.09614    4.826315     -5.09614
  2       2       2            209    12.78667    203.9039    12.21285      5.09614
The analysis of variance table shows that the model fits since the likelihood ratio test for the three-variable interaction is nonsignificant. All of the two-variable interactions, however, are significant; this shows that there is mutual dependence among all three variables.
The predicted values table (Output 22.4.4) displays observed and predicted values for the generalized logits. The predicted frequencies table (Output 22.4.5) displays observed and predicted cell frequencies, their standard errors, and residuals.
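With one population and eight response levels, the response functions in Output 22.4.4 are the seven generalized logits $\log(n_j/n_8)$, which use the last response (Length=2, Time=2, Status=2) as the baseline. A Python sketch (illustrative, not SAS) that recomputes the observed and predicted functions and their residuals from the frequencies in Output 22.4.5:

```python
import math

# Cell counts in Response Profiles order:
# (Length, Time, Status) = 111, 112, 121, 122, 211, 212, 221, 222.
observed = [156, 84, 84, 156, 107, 133, 31, 209]
predicted = [161.0961, 78.90386, 78.90386, 161.0961,
             101.9039, 138.0961, 36.09614, 203.9039]


def generalized_logits(counts):
    """log(n_j / n_last) for j = 1..J-1, with the last response as baseline."""
    base = counts[-1]
    return [math.log(c / base) for c in counts[:-1]]


obs_f = generalized_logits(observed)
pred_f = generalized_logits(predicted)
residuals = [o - p for o, p in zip(obs_f, pred_f)]
for j, (o, p, r) in enumerate(zip(obs_f, pred_f, residuals), start=1):
    print(j, round(o, 5), round(p, 5), round(r, 5))
```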
This example illustrates a log-linear model of independence, using data that contain structural zero frequencies as well as sampling (random) zero frequencies.
In a population of six squirrel monkeys, the joint distribution of genital display with respect to active or passive role was observed. The data are from Fienberg (1980, Table 8-2). Since a monkey cannot have both the active and passive roles in the same interaction, the diagonal cells of the table are structural zeros. See Agresti (2002) for more information on the quasi-independence model.
The DATA step replaces the structural zeros with missing values, and the MISSING=STRUCTURAL option is specified in the MODEL statement to remove these zeros from the analysis. The ZERO=SAMPLING option treats the off-diagonal zeros as sampling zeros. Also, the row for Monkey 't' is deleted because it contains all zeros, so the cell frequencies predicted for it by a model of independence would also be zero. In addition, the CONTRAST statements compare the behavior of the two monkeys labeled 'u' and 'v'. See the section "Structural and Sampling Zeros with Raw Data" for information on how to perform this analysis when you have raw data. The following statements produce Output 22.5.1 through Output 22.5.8:
data Display;
   input Active $ Passive $ wt @@;
   if Active ne 't';                 /* drop the all-zero row for monkey t */
   if Active eq Passive then wt=.;   /* diagonal cells are structural zeros */
   datalines;
r r 0   r s 1   r t 5   r u 8   r v 9   r w 0
s r 29  s s 0   s t 14  s u 46  s v 4   s w 0
t r 0   t s 0   t t 0   t u 0   t v 0   t w 0
u r 2   u s 3   u t 1   u u 0   u v 38  u w 2
v r 0   v s 0   v t 0   v u 0   v v 0   v w 1
w r 9   w s 25  w t 4   w u 6   w v 13  w w 0
;

title 'Behavior of Squirrel Monkeys';
proc catmod data=Display;
   weight wt;
   model Active*Passive=_response_ /
         missing=structural zero=sampling freq pred=freq noparm oneway;
   loglin Active Passive;
   contrast 'Passive, U vs. V' Passive 0 0 0 1 -1;
   contrast 'Active, U vs. V'  Active  0 0 1 -1;
   title2 'Test Quasi-Independence for the Incomplete Table';
quit;
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

The CATMOD Procedure

                     Data Summary

Response            Active*Passive    Response Levels     25
Weight Variable     wt                Populations          1
Data Set            DISPLAY           Total Frequency    220
Frequency Missing   0                 Observations        25

Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

      One-Way Frequencies

Variable    Value    Frequency
-----------------------------
Active      r               23
            s               93
            u               46
            v                1
            w               57

Passive     r               40
            s               29
            t               24
            u               60
            v               64
            w                3

Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

 Population Profiles

Sample    Sample Size
---------------------
    1             220

      Response Profiles

Response    Active    Passive
-----------------------------
    1       r         s
    2       r         t
    3       r         u
    4       r         v
    5       r         w
    6       s         r
    7       s         t
    8       s         u
    9       s         v
   10       s         w
   11       u         r
   12       u         s
   13       u         t
   14       u         v
   15       u         w
   16       v         r
   17       v         s
   18       v         t
   19       v         u
   20       v         w
   21       w         r
   22       w         s
   23       w         t
   24       w         u
   25       w         v

Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

                       Response Frequencies

                          Response Number
Sample      1      2      3      4      5      6      7      8
----------------------------------------------------------------
    1       1      5      8      9      0     29     14     46

                       Response Frequencies

                          Response Number
Sample      9     10     11     12     13     14     15     16
----------------------------------------------------------------
    1       4      0      2      3      1     38      2      0

                       Response Frequencies

                          Response Number
Sample     17     18     19     20     21     22     23     24
----------------------------------------------------------------
    1       0      0      0      1      9     25      4      6

 Response Frequencies

   Response Number
Sample     25
---------------
    1      13

Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

   Maximum Likelihood Analysis of Variance

Source              DF   Chi-Square    Pr > ChiSq
--------------------------------------------------
Active               4        56.58        <.0001
Passive              5        47.94        <.0001

Likelihood Ratio    15       135.17        <.0001

Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

   Contrasts of Maximum Likelihood Estimates

Contrast            DF   Chi-Square    Pr > ChiSq
-------------------------------------------------
Passive, U vs. V     1         1.31        0.2524
Active, U vs. V      1        14.87        0.0001
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

The CATMOD Procedure

        Maximum Likelihood Predicted Values for Response Functions

              ------Observed------    ------Predicted-----
Function                  Standard                Standard
 Number     Function         Error    Function       Error     Residual
--------------------------------------------------------------------
    1       -2.56495      1.037749    -0.97355    0.339019      -1.5914
    2       -0.95551      0.526235    -1.72504    0.345438     0.769529
    3       -0.48551      0.449359    -0.52751    0.309254     0.042007
    4       -0.36772      0.433629    -0.73927    0.249006     0.371543
    5              .             .    -3.56052    0.634104            .
    6       0.802346      0.333775    0.320589     0.26629     0.481758
    7       0.074108      0.385164    -0.29934    0.295634      0.37345
    8       1.263692      0.314105    0.898184    0.250857     0.365508
    9       -1.17865      0.571772    0.686431    0.173396     -1.86509
   10              .             .    -2.13482    0.608071            .
   11        -1.8718      0.759555     -0.2415    0.287218     -1.63031
   12       -1.46634      0.640513    -0.10994    0.303568      -1.3564
   13       -2.56495      1.037749    -0.86143    0.314794     -1.70352
   14       1.072637      0.321308    0.124346    0.204345      0.94829
   15        -1.8718      0.759555     -2.6969    0.617433       0.8251
   16              .             .    -4.14787    1.024508            .
   17              .             .    -4.01632    1.030062            .
   18              .             .    -4.76781    1.032457            .
   19              .             .    -3.57028    1.020794            .
   20       -2.56495      1.037749    -6.60328    1.161289     4.038332
   21       -0.36772      0.433629    -0.36584    0.202959     -0.00188
   22       0.653926       0.34194    -0.23429    0.232794     0.888212
   23       -1.17865      0.571772    -0.98577    0.239408     -0.19288
   24       -0.77319      0.493548    0.211754    0.185007     -0.98494

Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

              Maximum Likelihood Predicted Values for Frequencies

                       -------Observed------   ------Predicted-----
                                   Standard                Standard
Active   Passive   Frequency          Error   Frequency       Error     Residual
------------------------------------------------------------------------------
r        s                 1       0.997725    5.259508     1.36156     -4.25951
r        t                 5       2.210512    2.480726    0.691066     2.519274
r        u                 8       2.776525    8.215948    1.855146     -0.21595
r        v                 9       2.937996    6.648049     1.50932     2.351951
r        w                 0              0    0.395769    0.240268     -0.39577
s        r                29       5.017696    19.18599    3.147915     9.814007
s        t                14       3.620648    10.32172    2.169599     3.678284
s        u                46       6.031734    34.18463    4.428706     11.81537
s        v                 4       1.981735    27.66096    3.722788      -23.661
s        w                 0              0      1.6467    0.952712      -1.6467
u        r                 2       1.407771     10.9364     2.12322      -8.9364
u        s                 3       1.720201    12.47407    2.554336     -9.47407
u        t                 1       0.997725    5.883583    1.380655     -4.88358
u        v                38       5.606814     15.7673    2.684692      22.2327
u        w                 2       1.407771    0.938652    0.551645     1.061348
v        r                 0              0    0.219966    0.221779     -0.21997
v        s                 0              0    0.250893    0.253706     -0.25089
v        t                 0              0    0.118338    0.120314     -0.11834
v        u                 0              0    0.391924    0.393255     -0.39192
v        w                 1       0.997725    0.018879    0.021728     0.981121
w        r                 9       2.937996    9.657645    1.808656     -0.65765
w        s                25       4.707344    11.01553    2.275019     13.98447
w        t                 4       1.981735    5.195638    1.184452     -1.19564
w        u                 6       2.415857     17.2075    2.772098     -11.2075
w        v                13       3.497402    13.92369     2.24158     -0.92369
The results of the ONEWAY option are shown in Output 22.5.2. Monkey 't' does not show up as a value for the Active variable, since that row was removed.
Sampling zeros are displayed as 0 in Output 22.5.4. The Response Number corresponds to the value displayed in the Response Profiles in Output 22.5.3.
The analysis of variance table (Output 22.5.5) shows that the model of independence does not fit, since the likelihood ratio test for the interaction is significant. In other words, the active and passive roles of the squirrel monkeys are not independent.
If the model fit these data, the contrasts in Output 22.5.6 would show that monkeys 'u' and 'v' have similar passive behavior patterns but very different active behavior patterns.
Output 22.5.7 displays the predicted response functions and Output 22.5.8 displays predicted cell frequencies (from the PRED=FREQ option), but since the model does not fit, these should be ignored. Note that, since the response function is the generalized logit with the twenty-fifth response as the baseline, the observed response functions for the sampling zeros are missing.
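The missing observed functions follow directly from the definition: with response 25 (w, v; count 13) as the baseline, the observed generalized logit for response j is $\log(n_j/13)$, which is undefined whenever $n_j = 0$. A Python sketch (illustrative only; `None` stands in for the missing value) that flags exactly the rows shown as missing in Output 22.5.7:

```python
import math

# Observed counts for responses 1..25 in Response Profiles order
# (response 25 is Active=w, Passive=v).
counts = [1, 5, 8, 9, 0, 29, 14, 46, 4, 0, 2, 3, 1, 38, 2, 0,
          0, 0, 0, 1, 9, 25, 4, 6, 13]


def generalized_logits(counts):
    """log(n_j / n_J) for j < J; None where n_j = 0 (log of 0 is undefined)."""
    base = counts[-1]
    return [math.log(c / base) if c > 0 else None for c in counts[:-1]]


funcs = generalized_logits(counts)
missing = [j for j, f in enumerate(funcs, start=1) if f is None]
print(missing)  # [5, 10, 16, 17, 18, 19]
```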
The preceding PROC CATMOD step uses cell count data as input. Prior to invoking the CATMOD procedure, structural and sampling zeros are easily identified and manipulated in a single DATA step. For the situation where structural or sampling zeros (or both) may exist and the input data set is raw data, use the following steps:
Run PROC FREQ on the raw data. In the TABLES statement, list all dependent and independent variables separated by asterisks and use the SPARSE option and the OUT= option. This creates an output data set that contains all possible zero frequencies. Since the tabled output can be huge, you should also specify the NOPRINT option on the TABLES statement.
Use a DATA step to change the zero frequencies associated with either sampling zeros or structural zeros to missing.
Use the resulting data set as input to PROC CATMOD, specify the statement WEIGHT COUNT to use adjusted frequencies, and specify the ZERO= and MISSING= options to define your sampling and structural zeros.
For example, suppose the data set RawDisplay contains the raw data for the squirrel monkey data. The following statements show how to obtain the same analysis as shown previously:
proc freq data=RawDisplay;
   tables Active*Passive / sparse out=Combos noprint;   /* keep zero cells */
run;

data Combos2;
   set Combos;
   if Active ne 't';                    /* match the deletion of monkey t */
   if Active eq Passive then count=.;   /* structural zeros become missing */
run;

proc catmod data=Combos2;
   weight count;
   model Active*Passive=_response_ /
         zero=sampling missing=structural freq pred=freq noparm noresponse;
   loglin Active Passive;
quit;
The first IF statement in the DATA step is needed only for this particular example; since observations for Monkey 't' were deleted from the Display data set, they also need to be deleted from Combos2.
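The effect of the PROC FREQ SPARSE step followed by the DATA step can be mimicked in a few lines of Python to see what Combos2 should contain. This is an illustration only: the dictionary of nonzero counts is typed in from the Response Frequencies table, and `None` stands in for a SAS missing value.

```python
from itertools import product

levels = ["r", "s", "t", "u", "v", "w"]
# Nonzero cell counts (Active, Passive) -> count for the squirrel monkey table.
nonzero = {("r", "s"): 1, ("r", "t"): 5, ("r", "u"): 8, ("r", "v"): 9,
           ("s", "r"): 29, ("s", "t"): 14, ("s", "u"): 46, ("s", "v"): 4,
           ("u", "r"): 2, ("u", "s"): 3, ("u", "t"): 1, ("u", "v"): 38,
           ("u", "w"): 2, ("v", "w"): 1,
           ("w", "r"): 9, ("w", "s"): 25, ("w", "t"): 4, ("w", "u"): 6,
           ("w", "v"): 13}

# Like TABLES ... / SPARSE: every Active*Passive combination, zeros included.
combos = {(a, p): nonzero.get((a, p), 0) for a, p in product(levels, levels)}

# Like the DATA step: drop monkey t's row and mark the diagonal
# (structural) zeros as missing; off-diagonal zeros stay 0 (sampling zeros).
combos2 = {(a, p): (None if a == p else n)
           for (a, p), n in combos.items() if a != "t"}

cells = [n for n in combos2.values() if n is not None]
print(len(combos2), len(cells), sum(cells))  # 30 25 220
```

The 25 nonmissing cells and total of 220 agree with the Observations and Total Frequency entries in the Data Summary.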
In this multi-population repeated measures example, from Guthrie (1981), subjects from three groups have their responses (0 or 1) recorded in each of four trials. The analysis of the marginal probabilities is directed at assessing the main effects of the repeated measurement factor ( Trial ) and the independent variable ( Group ), as well as their interaction. Although the contingency table is incomplete (only thirteen of the sixteen possible responses are observed), this poses no problem in the computation of the marginal probabilities. The following statements produce Output 22.6.1 through Output 22.6.5:
data group;
   input a b c d Group wt @@;
   datalines;
1 1 1 1 2 2   0 0 0 0 2 2   0 0 1 0 1 2   0 0 1 0 2 2
0 0 0 1 1 4   0 0 0 1 2 1   0 0 0 1 3 3   1 0 0 1 2 1
0 0 1 1 1 1   0 0 1 1 2 2   0 0 1 1 3 5   0 1 0 0 1 4
0 1 0 0 2 1   0 1 0 1 2 1   0 1 0 1 3 2   0 1 1 0 3 1
1 0 0 0 1 3   1 0 0 0 2 1   0 1 1 1 2 1   0 1 1 1 3 2
1 0 1 0 1 1   1 0 1 1 2 1   1 0 1 1 3 2
;

title 'Multi-Population Repeated Measures';
proc catmod data=group;
   weight wt;
   response marginals;
   model a*b*c*d=Group _response_ Group*_response_ / freq;
   repeated Trial 4;
   title2 'Saturated Model';
run;
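The sketch below recomputes the response functions that RESPONSE MARGINALS produces for these data. Each function is the marginal probability of the first response level (0) in each trial, computed within each group. The sketch assumes that this is the level CATMOD models, which matches the response functions printed later in this example. Because the saturated design matrix is square and invertible, the weighted least squares estimates equal the effect-coded decomposition of these probabilities. The sketch therefore also computes the Intercept and Group parameters directly.

```python
# (a, b, c, d, Group, wt) records copied from the DATALINES above.
RECORDS = [
    (1, 1, 1, 1, 2, 2), (0, 0, 0, 0, 2, 2), (0, 0, 1, 0, 1, 2), (0, 0, 1, 0, 2, 2),
    (0, 0, 0, 1, 1, 4), (0, 0, 0, 1, 2, 1), (0, 0, 0, 1, 3, 3), (1, 0, 0, 1, 2, 1),
    (0, 0, 1, 1, 1, 1), (0, 0, 1, 1, 2, 2), (0, 0, 1, 1, 3, 5), (0, 1, 0, 0, 1, 4),
    (0, 1, 0, 0, 2, 1), (0, 1, 0, 1, 2, 1), (0, 1, 0, 1, 3, 2), (0, 1, 1, 0, 3, 1),
    (1, 0, 0, 0, 1, 3), (1, 0, 0, 0, 2, 1), (0, 1, 1, 1, 2, 1), (0, 1, 1, 1, 3, 2),
    (1, 0, 1, 0, 1, 1), (1, 0, 1, 1, 2, 1), (1, 0, 1, 1, 3, 2),
]


def marginal_probs(records, group, level=0):
    """Pr(response == level) at each of the four trials within one group."""
    rows = [r for r in records if r[4] == group]
    total = sum(r[5] for r in rows)
    return [sum(r[5] for r in rows if r[t] == level) / total for t in range(4)]


functions = {g: marginal_probs(RECORDS, g) for g in (1, 2, 3)}
# Saturated model: Intercept = mean of all 12 functions; Group effects = group mean - grand mean.
grand_mean = sum(sum(f) for f in functions.values()) / 12
group_effects = {g: sum(functions[g]) / 4 - grand_mean for g in (1, 2)}
for g, f in functions.items():
    print(g, [round(x, 5) for x in f])
print(round(grand_mean, 4), {g: round(e, 4) for g, e in group_effects.items()})
```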
Multi-Population Repeated Measures, Saturated Model (The CATMOD Procedure)

Data Summary

| Response | Response Levels | Weight Variable | Populations | Data Set | Total Frequency | Frequency Missing | Observations |
|---|---|---|---|---|---|---|---|
| a*b*c*d | 13 | wt | 3 | GROUP | 45 | 0 | 23 |

Population Profiles

| Sample | Group | Sample Size |
|---|---|---|
| 1 | 1 | 15 |
| 2 | 2 | 15 |
| 3 | 3 | 15 |

Response Profiles

| Response | a | b | c | d |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 1 | 1 |
| 5 | 0 | 1 | 0 | 0 |
| 6 | 0 | 1 | 0 | 1 |
| 7 | 0 | 1 | 1 | 0 |
| 8 | 0 | 1 | 1 | 1 |
| 9 | 1 | 0 | 0 | 0 |
| 10 | 1 | 0 | 0 | 1 |
| 11 | 1 | 0 | 1 | 0 |
| 12 | 1 | 0 | 1 | 1 |
| 13 | 1 | 1 | 1 | 1 |

Response Frequencies (columns are response numbers)

| Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 4 | 2 | 1 | 4 | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 0 |
| 2 | 2 | 1 | 2 | 2 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 2 |
| 3 | 0 | 3 | 0 | 5 | 0 | 2 | 1 | 2 | 0 | 0 | 0 | 2 | 0 |

Analysis of Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Intercept | 1 | 354.88 | <.0001 |
| Group | 2 | 24.79 | <.0001 |
| Trial | 3 | 21.45 | <.0001 |
| Group*Trial | 6 | 18.71 | 0.0047 |
| Residual | 0 | . | . |

Analysis of Weighted Least Squares Estimates

| Effect | Parameter | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | 0.5833 | 0.0310 | 354.88 | <.0001 |
| Group | 2 | 0.1333 | 0.0335 | 15.88 | <.0001 |
|  | 3 | -0.0333 | 0.0551 | 0.37 | 0.5450 |
| Trial | 4 | 0.1722 | 0.0557 | 9.57 | 0.0020 |
|  | 5 | 0.1056 | 0.0647 | 2.66 | 0.1028 |
|  | 6 | -0.0722 | 0.0577 | 1.57 | 0.2107 |
| Group*Trial | 7 | -0.1556 | 0.0852 | 3.33 | 0.0679 |
|  | 8 | -0.0556 | 0.0800 | 0.48 | 0.4877 |
|  | 9 | -0.0889 | 0.0953 | 0.87 | 0.3511 |
|  | 10 | 0.0111 | 0.0866 | 0.02 | 0.8979 |
|  | 11 | 0.0889 | 0.0822 | 1.17 | 0.2793 |
|  | 12 | -0.0111 | 0.0824 | 0.02 | 0.8927 |
The analysis of variance table in Output 22.6.4 shows that there is a significant interaction between the independent variable Group and the repeated measurement factor Trial. Thus, an intermediate model (not shown) is fit in which the effects Trial and Group*Trial are replaced by Trial(Group=1), Trial(Group=2), and Trial(Group=3). Of these three effects, only the last is significant, so it is retained in the final model. The following statements produce Output 22.6.6 and Output 22.6.7:
   model a*b*c*d=Group _response_(Group=3) / noprofile noparm design;
   title2 'Trial Nested within Group 3';
quit;
Multi-Population Repeated Measures, Trial Nested within Group 3 (The CATMOD Procedure)

Data Summary

| Response | Response Levels | Weight Variable | Populations | Data Set | Total Frequency | Frequency Missing | Observations |
|---|---|---|---|---|---|---|---|
| a*b*c*d | 13 | wt | 3 | GROUP | 45 | 0 | 23 |

Response Functions and Design Matrix (columns 1-6 are the design matrix)

| Sample | Function Number | Response Function | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.73333 | 1 | 1 | 0 | 0 | 0 | 0 |
|  | 2 | 0.73333 | 1 | 1 | 0 | 0 | 0 | 0 |
|  | 3 | 0.73333 | 1 | 1 | 0 | 0 | 0 | 0 |
|  | 4 | 0.66667 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0.66667 | 1 | 0 | 1 | 0 | 0 | 0 |
|  | 2 | 0.66667 | 1 | 0 | 1 | 0 | 0 | 0 |
|  | 3 | 0.46667 | 1 | 0 | 1 | 0 | 0 | 0 |
|  | 4 | 0.40000 | 1 | 0 | 1 | 0 | 0 | 0 |
| 3 | 1 | 0.86667 | 1 | -1 | -1 | 1 | 0 | 0 |
|  | 2 | 0.66667 | 1 | -1 | -1 | 0 | 1 | 0 |
|  | 3 | 0.33333 | 1 | -1 | -1 | 0 | 0 | 1 |
|  | 4 | 0.06667 | 1 | -1 | -1 | -1 | -1 | -1 |

Analysis of Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Intercept | 1 | 386.94 | <.0001 |
| Group | 2 | 25.42 | <.0001 |
| Trial(Group=3) | 3 | 75.07 | <.0001 |
| Residual | 6 | 5.09 | 0.5319 |
Output 22.6.6 displays the design matrix resulting from retaining the nested effect.
The residual goodness-of-fit statistic tests the joint effect of Trial(Group=1) and Trial(Group=2). The analysis of variance table in Output 22.6.7 shows three things:

- The final model fits.
- There is a significant Group effect.
- There is a significant Trial effect in Group 3.
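The nested design matrix in Output 22.6.6 can be rebuilt from effect (deviation) coding. In this coding, the last level of a factor is coded as all -1. The trial columns are set to 0 outside Group 3. The sketch below assumes this coding, which is consistent with the printed matrix. It also shows where the 6 residual degrees of freedom come from: 12 response functions minus 6 parameters.

```python
def effect_code(level, n_levels):
    """Deviation (effect) coding: an indicator for levels 1..n-1, all -1 for level n."""
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if j == level else 0 for j in range(1, n_levels)]


def nested_design(n_groups=3, n_trials=4, nested_in=3):
    """Rows ordered by sample, then trial: intercept, Group effect, Trial(Group=3)."""
    rows = []
    for group in range(1, n_groups + 1):
        for trial in range(1, n_trials + 1):
            if group == nested_in:
                trial_cols = effect_code(trial, n_trials)
            else:
                trial_cols = [0] * (n_trials - 1)
            rows.append([1] + effect_code(group, n_groups) + trial_cols)
    return rows


X = nested_design()
for row in X:
    print(row)
residual_df = len(X) - len(X[0])  # 12 response functions - 6 parameters
```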
This example illustrates a repeated measurement analysis in which there are more than two levels of response. In this study, from Grizzle, Starmer, and Koch (1969, p. 493), 7,477 women aged 30-39 are tested for vision in both the right and left eyes.

There are four response levels for each dependent variable, so the RESPONSE statement computes three marginal probabilities for each one. The fourth is redundant because the four probabilities sum to 1. This gives six response functions for analysis.

The model contains a repeated measurement factor (Side) with two levels (Right, Left), so PROC CATMOD groups the functions into sets of three (6/2 = 3). Therefore, the Side effect has three degrees of freedom, one for each marginal probability, and it is the appropriate test of marginal homogeneity. The following statements produce Output 22.7.1 through Output 22.7.6:
title 'Vision Symmetry';
data vision;
   input Right Left count @@;
   datalines;
1 1 1520   1 2  266   1 3  124   1 4  66
2 1  234   2 2 1512   2 3  432   2 4  78
3 1  117   3 2  362   3 3 1772   3 4 205
4 1   36   4 2   82   4 3  179   4 4 492
;

proc catmod data=vision;
   weight count;
   response marginals;
   model Right*Left=_response_ / freq design;
   repeated Side 2;
   title2 'Test of Marginal Homogeneity';
quit;
Vision Symmetry, Test of Marginal Homogeneity (The CATMOD Procedure)

Data Summary

| Response | Response Levels | Weight Variable | Populations | Data Set | Total Frequency | Frequency Missing | Observations |
|---|---|---|---|---|---|---|---|
| Right*Left | 16 | count | 1 | VISION | 7477 | 0 | 16 |

Population Profiles

| Sample | Sample Size |
|---|---|
| 1 | 7477 |

Response Profiles

| Response | Right | Left |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
| 5 | 2 | 1 |
| 6 | 2 | 2 |
| 7 | 2 | 3 |
| 8 | 2 | 4 |
| 9 | 3 | 1 |
| 10 | 3 | 2 |
| 11 | 3 | 3 |
| 12 | 3 | 4 |
| 13 | 4 | 1 |
| 14 | 4 | 2 |
| 15 | 4 | 3 |
| 16 | 4 | 4 |

Response Frequencies (columns are response numbers)

| Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1520 | 266 | 124 | 66 | 234 | 1512 | 432 | 78 | 117 | 362 | 1772 | 205 | 36 | 82 | 179 | 492 |

Response Functions and Design Matrix (columns 1-6 are the design matrix)

| Sample | Function Number | Response Function | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.26428 | 1 | 0 | 0 | 1 | 0 | 0 |
|  | 2 | 0.30173 | 0 | 1 | 0 | 0 | 1 | 0 |
|  | 3 | 0.32847 | 0 | 0 | 1 | 0 | 0 | 1 |
|  | 4 | 0.25505 | 1 | 0 | 0 | -1 | 0 | 0 |
|  | 5 | 0.29718 | 0 | 1 | 0 | 0 | -1 | 0 |
|  | 6 | 0.33529 | 0 | 0 | 1 | 0 | 0 | -1 |

Analysis of Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Intercept | 3 | 78744.17 | <.0001 |
| Side | 3 | 11.98 | 0.0075 |
| Residual | 0 | . | . |

Analysis of Weighted Least Squares Estimates

| Effect | Parameter | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | 0.2597 | 0.00468 | 3073.03 | <.0001 |
|  | 2 | 0.2995 | 0.00464 | 4160.17 | <.0001 |
|  | 3 | 0.3319 | 0.00483 | 4725.25 | <.0001 |
| Side | 4 | 0.00461 | 0.00194 | 5.65 | 0.0174 |
|  | 5 | 0.00227 | 0.00255 | 0.80 | 0.3726 |
|  | 6 | -0.00341 | 0.00252 | 1.83 | 0.1757 |
The analysis of variance table in Output 22.7.5 shows that the Side effect is significant (chi-square = 11.98 with 3 DF, p = 0.0075). The hypothesis of marginal homogeneity between right-eye vision and left-eye vision is therefore rejected. In other words, the distribution of vision quality in the right eye differs significantly from that in the left eye of the same subjects. The test of the Side effect is equivalent to Bhapkar's test (Agresti 1990).
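The sketch below recomputes three things from the Right*Left counts, using plain Python:

- the six marginal probabilities,
- the saturated-model parameters (pair means and half-differences),
- Bhapkar's statistic, d' V^-1 d, where d is the vector of right-minus-left marginal differences.

V is the multinomial covariance of d, estimated from the observed cell proportions. This is the Wald test on the three Side parameters in the saturated model. The small linear solver is a generic helper written for this illustration, not part of any SAS interface.

```python
# Right-by-Left counts from the DATALINES above (rows: Right = 1-4, columns: Left = 1-4).
COUNTS = [
    [1520, 266, 124, 66],
    [234, 1512, 432, 78],
    [117, 362, 1772, 205],
    [36, 82, 179, 492],
]


def marginals(counts):
    """Pr(Right = i) and Pr(Left = i) for the first three levels (the fourth is redundant)."""
    n = sum(map(sum, counts))
    right = [sum(counts[i]) / n for i in range(3)]
    left = [sum(row[i] for row in counts) / n for i in range(3)]
    return right, left


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    size = len(m)
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (m[r][size] - tail) / m[r][r]
    return x


def bhapkar(counts):
    """Wald statistic d' V^-1 d for d_i = Pr(Right=i) - Pr(Left=i), i = 1..3."""
    n = sum(map(sum, counts))
    k = len(counts) - 1
    right, left = marginals(counts)
    d = [r - l for r, l in zip(right, left)]
    second_moment = [[0.0] * k for _ in range(k)]
    for r, row in enumerate(counts):
        for l, cnt in enumerate(row):
            # Contribution of one subject in cell (r, l) to each difference.
            c = [(r == i) - (l == i) for i in range(k)]
            for i in range(k):
                for j in range(k):
                    second_moment[i][j] += cnt / n * c[i] * c[j]
    cov = [[(second_moment[i][j] - d[i] * d[j]) / n for j in range(k)] for i in range(k)]
    return sum(di * yi for di, yi in zip(d, solve(cov, d)))


right, left = marginals(COUNTS)
intercepts = [(r + l) / 2 for r, l in zip(right, left)]  # saturated-model Intercept parameters
side = [(r - l) / 2 for r, l in zip(right, left)]        # saturated-model Side parameters
w = bhapkar(COUNTS)
print([round(x, 5) for x in right], [round(x, 5) for x in left])
print(round(w, 2))
```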
The data, from a longitudinal study reported in Koch et al. (1977), are from patients in four populations (2 diagnostic groups × 2 treatments). Each patient is measured at three times to assess the response to treatment (n=normal or a=abnormal).
title 'Growth Curve Analysis';
data growth2;
   input Diagnosis $ Treatment $ week1 $ week2 $ week4 $ count @@;
   datalines;
mild std n n n 16   severe std n n n  2
mild std n n a 13   severe std n n a  2
mild std n a n  9   severe std n a n  8
mild std n a a  3   severe std n a a  9
mild std a n n 14   severe std a n n  9
mild std a n a  4   severe std a n a 15
mild std a a n 15   severe std a a n 27
mild std a a a  6   severe std a a a 28
mild new n n n 31   severe new n n n  7
mild new n n a  0   severe new n n a  2
mild new n a n  6   severe new n a n  5
mild new n a a  0   severe new n a a  2
mild new a n n 22   severe new a n n 31
mild new a n a  2   severe new a n a  5
mild new a a n  9   severe new a a n 32
mild new a a a  0   severe new a a a  6
;
The analysis is directed at assessing the effect of the repeated measurement factor, Time , as well as the independent variables, Diagnosis (mild or severe) and Treatment (std or new). The RESPONSE statement is used to compute the logits of the marginal probabilities. The times used in the design matrix (0, 1, 2) correspond to the logarithms (base 2) of the actual times (1, 2, 4). The following statements produce Output 22.8.1 through Output 22.8.7:
proc catmod data=growth2 order=data;
   title2 'Reduced Logistic Model';
   weight count;
   population Diagnosis Treatment;
   response logit;
   model week1*week2*week4=(1 0 0 0,   /* mild, std */
                            1 0 1 0,
                            1 0 2 0,
                            1 0 0 0,   /* mild, new */
                            1 0 0 1,
                            1 0 0 2,
                            0 1 0 0,   /* severe, std */
                            0 1 1 0,
                            0 1 2 0,
                            0 1 0 0,   /* severe, new */
                            0 1 0 1,
                            0 1 0 2)
                           (1='Mild diagnosis, week 1',
                            2='Severe diagnosis, week 1',
                            3='Time effect for std trt',
                            4='Time effect for new trt')
                           / freq design;
   contrast 'Diagnosis effect, week 1' all_parms 1 -1 0 0;
   contrast 'Equal time effects' all_parms 0 0 1 -1;
quit;
Growth Curve Analysis, Reduced Logistic Model (The CATMOD Procedure)

Data Summary

| Response | Response Levels | Weight Variable | Populations | Data Set | Total Frequency | Frequency Missing | Observations |
|---|---|---|---|---|---|---|---|
| week1*week2*week4 | 8 | count | 4 | GROWTH2 | 340 | 0 | 29 |

Population Profiles

| Sample | Diagnosis | Treatment | Sample Size |
|---|---|---|---|
| 1 | mild | std | 80 |
| 2 | mild | new | 70 |
| 3 | severe | std | 100 |
| 4 | severe | new | 90 |

Response Profiles

| Response | week1 | week2 | week4 |
|---|---|---|---|
| 1 | n | n | n |
| 2 | n | n | a |
| 3 | n | a | n |
| 4 | n | a | a |
| 5 | a | n | n |
| 6 | a | n | a |
| 7 | a | a | n |
| 8 | a | a | a |

Response Frequencies (columns are response numbers)

| Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 16 | 13 | 9 | 3 | 14 | 4 | 15 | 6 |
| 2 | 31 | 0 | 6 | 0 | 22 | 2 | 9 | 0 |
| 3 | 2 | 2 | 8 | 9 | 9 | 15 | 27 | 28 |
| 4 | 7 | 2 | 5 | 2 | 31 | 5 | 32 | 6 |

Response Functions and Design Matrix (columns 1-4 are the design matrix)

| Sample | Function Number | Response Function | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| 1 | 1 | 0.05001 | 1 | 0 | 0 | 0 |
|  | 2 | 0.35364 | 1 | 0 | 1 | 0 |
|  | 3 | 0.73089 | 1 | 0 | 2 | 0 |
| 2 | 1 | 0.11441 | 1 | 0 | 0 | 0 |
|  | 2 | 1.29928 | 1 | 0 | 0 | 1 |
|  | 3 | 3.52636 | 1 | 0 | 0 | 2 |
| 3 | 1 | -1.32493 | 0 | 1 | 0 | 0 |
|  | 2 | -0.94446 | 0 | 1 | 1 | 0 |
|  | 3 | -0.16034 | 0 | 1 | 2 | 0 |
| 4 | 1 | -1.53148 | 0 | 1 | 0 | 0 |
|  | 2 | 0.00000 | 0 | 1 | 0 | 1 |
|  | 3 | 1.60944 | 0 | 1 | 0 | 2 |

Analysis of Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Mild diagnosis, week 1 | 1 | 0.28 | 0.5955 |
| Severe diagnosis, week 1 | 1 | 100.48 | <.0001 |
| Time effect for std trt | 1 | 26.35 | <.0001 |
| Time effect for new trt | 1 | 125.09 | <.0001 |
| Residual | 8 | 4.20 | 0.8387 |

Analysis of Weighted Least Squares Estimates

| Effect | Parameter | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Model | 1 | 0.0716 | 0.1348 | 0.28 | 0.5955 |
|  | 2 | -1.3529 | 0.1350 | 100.48 | <.0001 |
|  | 3 | 0.4944 | 0.0963 | 26.35 | <.0001 |
|  | 4 | 1.4552 | 0.1301 | 125.09 | <.0001 |

Analysis of Contrasts

| Contrast | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Diagnosis effect, week 1 | 1 | 77.02 | <.0001 |
| Equal time effects | 1 | 59.12 | <.0001 |
The samples and the response numbers are defined in Output 22.8.2, and Output 22.8.3 displays the frequency distribution of the response numbers within the samples. Output 22.8.4 displays the design matrix specified in the MODEL statement, and the observed logits of the marginal probabilities are displayed in the Response Function column.
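The Response Function column can be checked directly from the response frequencies. Each value is the log of the number of normal responses divided by the number of abnormal responses at a given week, within a given sample. The sketch below recomputes these logits. It assumes RESPONSE LOGIT models the first response level (n) of each variable, which matches the printed values.

```python
import math

# Response profiles in the order of the Response Profiles table: (week1, week2, week4).
PATTERNS = ["nnn", "nna", "nan", "naa", "ann", "ana", "aan", "aaa"]
FREQS = {
    "mild, std": [16, 13, 9, 3, 14, 4, 15, 6],
    "mild, new": [31, 0, 6, 0, 22, 2, 9, 0],
    "severe, std": [2, 2, 8, 9, 9, 15, 27, 28],
    "severe, new": [7, 2, 5, 2, 31, 5, 32, 6],
}


def marginal_logits(counts):
    """Logit of Pr(normal) at weeks 1, 2, and 4: log(#n / #a) for each time point."""
    logits = []
    for t in range(3):
        normal = sum(c for p, c in zip(PATTERNS, counts) if p[t] == "n")
        abnormal = sum(c for p, c in zip(PATTERNS, counts) if p[t] == "a")
        logits.append(math.log(normal / abnormal))
    return logits


logits = {name: marginal_logits(c) for name, c in FREQS.items()}
for name, values in logits.items():
    print(name, [round(v, 5) for v in values])
```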
The analysis of variance table (Output 22.8.5) shows that the data can be adequately modeled by two parameters that represent diagnosis effects at week 1 and two log-linear time effects (one for each treatment). Both of the time effects are significant.
The analysis of contrasts (Output 22.8.7) shows that the diagnosis effect at week 1 is highly significant. In Output 22.8.6, since the estimate of the logit for the severe diagnosis effect (parameter 2) is more negative than it is for the mild diagnosis effect (parameter 1), there is a smaller predicted probability of the first response (normal) for the severe diagnosis group. In other words, those subjects with a severe diagnosis have a significantly higher probability of abnormal response at week 1 than those subjects with a mild diagnosis.
The analysis of contrasts also shows that the time effect for the standard treatment is significantly different from the one for the new treatment. The table of parameter estimates (Output 22.8.6) shows that the time effect for the new treatment (parameter 4) is stronger than it is for the standard treatment (parameter 3).
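To make these statements concrete, the sketch below converts the reduced-model estimates back into predicted probabilities of a normal response. Under the design matrix above, the logit for each sample is a diagnosis term plus a treatment-specific slope times log2(week). At week 1, log2(1) = 0, so only parameters 1 and 2 matter. Parameter 2 is taken as negative (-1.3529), as the discussion of the severe diagnosis effect states.

```python
import math

# Reduced-model estimates (parameters 1-4); parameter 2 is negative, as the text describes.
BETA = {"mild": 0.0716, "severe": -1.3529, "std": 0.4944, "new": 1.4552}


def prob_normal(diagnosis, treatment, week):
    """Predicted Pr(normal) from logit = diagnosis term + treatment slope * log2(week)."""
    logit = BETA[diagnosis] + BETA[treatment] * math.log2(week)
    return 1 / (1 + math.exp(-logit))


for diagnosis in ("mild", "severe"):
    for treatment in ("std", "new"):
        probs = [prob_normal(diagnosis, treatment, w) for w in (1, 2, 4)]
        print(diagnosis, treatment, [round(p, 3) for p in probs])
```

At week 1 the severe group has the lower predicted probability of a normal response. By week 4 the steeper slope of the new treatment gives it the higher predicted probability.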
This example, from MacMillan et al. (1981), illustrates a repeated measurement analysis in which there are two repeated measurement factors. Two diagnostic procedures (standard and test) are performed on each subject, and the results of both are evaluated at each of two times as being positive or negative.
title 'Diagnostic Procedure Comparison';
data a;
   input std1 $ test1 $ std2 $ test2 $ wt @@;
   datalines;
neg neg neg neg 509   neg neg neg pos   4   neg neg pos neg  17
neg neg pos pos   3   neg pos neg neg  13   neg pos neg pos   8
neg pos pos pos   8   pos neg neg neg  14   pos neg neg pos   1
pos neg pos neg  17   pos neg pos pos   9   pos pos neg neg   7
pos pos neg pos   4   pos pos pos neg   9   pos pos pos pos 170
;
For the initial model, the response functions are marginal probabilities, and the repeated measurement factors are Time and Treatment. The model is a saturated one, containing effects for Time, Treatment, and Time*Treatment. The following statements produce Output 22.9.1 through Output 22.9.5:
proc catmod data=a;
   title2 'Marginal Symmetry, Saturated Model';
   weight wt;
   response marginals;
   model std1*test1*std2*test2=_response_ / freq design noparm;
   repeated Time 2, Treatment 2 / _response_=Time Treatment Time*Treatment;
run;
Diagnostic Procedure Comparison, Marginal Symmetry, Saturated Model (The CATMOD Procedure)

Data Summary

| Response | Response Levels | Weight Variable | Populations | Data Set | Total Frequency | Frequency Missing | Observations |
|---|---|---|---|---|---|---|---|
| std1*test1*std2*test2 | 15 | wt | 1 | A | 793 | 0 | 15 |

Population Profiles

| Sample | Sample Size |
|---|---|
| 1 | 793 |

Response Profiles

| Response | std1 | test1 | std2 | test2 |
|---|---|---|---|---|
| 1 | neg | neg | neg | neg |
| 2 | neg | neg | neg | pos |
| 3 | neg | neg | pos | neg |
| 4 | neg | neg | pos | pos |
| 5 | neg | pos | neg | neg |
| 6 | neg | pos | neg | pos |
| 7 | neg | pos | pos | pos |
| 8 | pos | neg | neg | neg |
| 9 | pos | neg | neg | pos |
| 10 | pos | neg | pos | neg |
| 11 | pos | neg | pos | pos |
| 12 | pos | pos | neg | neg |
| 13 | pos | pos | neg | pos |
| 14 | pos | pos | pos | neg |
| 15 | pos | pos | pos | pos |

Response Frequencies (columns are response numbers)

| Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 509 | 4 | 17 | 3 | 13 | 8 | 8 | 14 | 1 | 17 | 9 | 7 | 4 | 9 | 170 |

Response Functions and Design Matrix (columns 1-4 are the design matrix)

| Sample | Function Number | Response Function | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| 1 | 1 | 0.70870 | 1 | 1 | 1 | 1 |
|  | 2 | 0.72383 | 1 | 1 | -1 | -1 |
|  | 3 | 0.70618 | 1 | -1 | 1 | -1 |
|  | 4 | 0.73897 | 1 | -1 | -1 | 1 |

Analysis of Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Intercept | 1 | 2385.34 | <.0001 |
| Time | 1 | 0.85 | 0.3570 |
| Treatment | 1 | 8.20 | 0.0042 |
| Time*Treatment | 1 | 2.40 | 0.1215 |
| Residual | 0 | . | . |
The analysis of variance table in Output 22.9.5 shows that there is no significant effect of Time, either by itself or in its interaction with Treatment. Thus, the second model includes only the Treatment effect. Again, the response functions are marginal probabilities, and the repeated measurement factors are Time and Treatment. A main-effect model with respect to Treatment is fit. The following statements produce Output 22.9.6 through Output 22.9.10:
   title2 'Marginal Symmetry, Reduced Model';
   model std1*test1*std2*test2=_response_ / corrb design noprofile;
   repeated Time 2, Treatment 2 / _response_=Treatment;
run;
Diagnostic Procedure Comparison, Marginal Symmetry, Reduced Model (The CATMOD Procedure)

Data Summary

| Response | Response Levels | Weight Variable | Populations | Data Set | Total Frequency | Frequency Missing | Observations |
|---|---|---|---|---|---|---|---|
| std1*test1*std2*test2 | 15 | wt | 1 | A | 793 | 0 | 15 |

Response Functions and Design Matrix (columns 1-2 are the design matrix)

| Sample | Function Number | Response Function | 1 | 2 |
|---|---|---|---|---|
| 1 | 1 | 0.70870 | 1 | 1 |
|  | 2 | 0.72383 | 1 | -1 |
|  | 3 | 0.70618 | 1 | 1 |
|  | 4 | 0.73897 | 1 | -1 |

Analysis of Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Intercept | 1 | 2386.97 | <.0001 |
| Treatment | 1 | 9.55 | 0.0020 |
| Residual | 2 | 3.51 | 0.1731 |

Analysis of Weighted Least Squares Estimates

| Effect | Parameter | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | 0.7196 | 0.0147 | 2386.97 | <.0001 |
| Treatment | 2 | -0.0128 | 0.00416 | 9.55 | 0.0020 |

Correlation Matrix of the Parameter Estimates

| Row | Col1 | Col2 |
|---|---|---|
| 1 | 1.00000 | 0.04194 |
| 2 | 0.04194 | 1.00000 |
The analysis of variance table for the reduced model (Output 22.9.8) shows that the model fits (since the Residual is nonsignificant) and that the treatment effect is significant. The negative parameter estimate for Treatment in Output 22.9.9 shows that the first level of treatment (std) has a smaller probability of the first response level (neg) than the second level of treatment (test). In other words, the standard diagnostic procedure gives a significantly higher probability of a positive response than the test diagnostic procedure.
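The direction of the Treatment effect can be seen in the raw marginals. The sketch below recomputes the four response functions, Pr(neg) for std1, test1, std2, and test2, from the data. It then averages them over time for each procedure. The averages are only a descriptive check: the weighted least squares estimates are not simple averages. Even so, the standard procedure's lower Pr(neg) matches the negative Treatment estimate.

```python
# (std1, test1, std2, test2, wt) records from the DATALINES above.
RECORDS = [
    ("neg", "neg", "neg", "neg", 509), ("neg", "neg", "neg", "pos", 4),
    ("neg", "neg", "pos", "neg", 17), ("neg", "neg", "pos", "pos", 3),
    ("neg", "pos", "neg", "neg", 13), ("neg", "pos", "neg", "pos", 8),
    ("neg", "pos", "pos", "pos", 8), ("pos", "neg", "neg", "neg", 14),
    ("pos", "neg", "neg", "pos", 1), ("pos", "neg", "pos", "neg", 17),
    ("pos", "neg", "pos", "pos", 9), ("pos", "pos", "neg", "neg", 7),
    ("pos", "pos", "neg", "pos", 4), ("pos", "pos", "pos", "neg", 9),
    ("pos", "pos", "pos", "pos", 170),
]


def marginal_neg(records):
    """Pr(neg) for std1, test1, std2, test2: the four RESPONSE MARGINALS functions."""
    n = sum(r[4] for r in records)
    return [sum(r[4] for r in records if r[i] == "neg") / n for i in range(4)]


m = marginal_neg(RECORDS)
std_mean = (m[0] + m[2]) / 2   # standard procedure, averaged over the two times
test_mean = (m[1] + m[3]) / 2  # test procedure, averaged over the two times
print([round(x, 5) for x in m], round(std_mean, 4), round(test_mean, 4))
```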
The next example illustrates a RESPONSE statement that, at each time, computes the sensitivity and specificity of the test diagnostic procedure with respect to the standard procedure. Because these are measures of the relative accuracy of the two diagnostic procedures, the repeated measurement factors in this case are labeled Time and Accuracy. Only fifteen of the sixteen possible responses are observed: the pattern neg pos pos neg never occurs. Additional care must therefore be taken in formulating the RESPONSE statement for computing sensitivity and specificity, because each row of the LOG matrix has 15 columns, not 16.
The following statements produce Output 22.9.11 through Output 22.9.15:
   title2 'Sensitivity and Specificity Analysis, '
          'Main-Effects Model';
   model std1*test1*std2*test2=_response_ / covb design noprofile;
   repeated Time 2, Accuracy 2 / _response_=Time Accuracy;
   response exp 1 -1 0  0 0  0 0  0,
                0  0 1 -1 0  0 0  0,
                0  0 0  0 1 -1 0  0,
                0  0 0  0 0  0 1 -1
            log 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1,
                0 0 0 0 0 0 0 1 1 1 1 1 1 1 1,
                1 1 1 1 0 0 0 0 0 0 0 0 0 0 0,
                1 1 1 1 1 1 1 0 0 0 0 0 0 0 0,
                0 0 0 1 0 0 1 0 0 0 1 0 0 0 1,
                0 0 1 1 0 0 1 0 0 1 1 0 0 1 1,
                1 0 0 0 1 0 0 1 0 0 0 1 0 0 0,
                1 1 0 0 1 1 0 1 1 0 0 1 1 0 0;
quit;
Diagnostic Procedure Comparison, Sensitivity and Specificity Analysis, Main-Effects Model (The CATMOD Procedure)

Data Summary

| Response | Response Levels | Weight Variable | Populations | Data Set | Total Frequency | Frequency Missing | Observations |
|---|---|---|---|---|---|---|---|
| std1*test1*std2*test2 | 15 | wt | 1 | A | 793 | 0 | 15 |

Response Functions and Design Matrix (columns 1-3 are the design matrix)

| Sample | Function Number | Response Function | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 1 | 1 | 0.82251 | 1 | 1 | 1 |
|  | 2 | 0.94840 | 1 | 1 | -1 |
|  | 3 | 0.81545 | 1 | -1 | 1 |
|  | 4 | 0.96964 | 1 | -1 | -1 |

Analysis of Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Intercept | 1 | 6448.79 | <.0001 |
| Time | 1 | 4.10 | 0.0428 |
| Accuracy | 1 | 38.81 | <.0001 |
| Residual | 1 | 1.00 | 0.3178 |

Analysis of Weighted Least Squares Estimates

| Effect | Parameter | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | 0.8892 | 0.0111 | 6448.79 | <.0001 |
| Time | 2 | -0.00932 | 0.00460 | 4.10 | 0.0428 |
| Accuracy | 3 | -0.0702 | 0.0113 | 38.81 | <.0001 |

Covariance Matrix of the Parameter Estimates

| Row | Col1 | Col2 | Col3 |
|---|---|---|---|
| 1 | 0.00012260 | 0.00000229 | 0.00010137 |
| 2 | 0.00000229 | 0.00002116 | -0.00000587 |
| 3 | 0.00010137 | -0.00000587 | 0.00012697 |
For the sensitivity and specificity analysis, the four response functions displayed next to the design matrix (Output 22.9.12) represent the following:
sensitivity, time 1
specificity, time 1
sensitivity, time 2
specificity, time 2
The sensitivities and specificities are for the test diagnostic procedure relative to the standard procedure.
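The sketch below shows how the RESPONSE statement produces these four ratios. CATMOD computes exp(C log(A p)), where p holds the 15 cell proportions:

- Each row of the LOG matrix A sums the proportions of a set of response profiles.
- Each row of the EXP matrix C pairs a +1 with a -1, so exp(log(numerator) - log(denominator)) is a ratio.

Sensitivity is Pr(test pos | std pos), and specificity is Pr(test neg | std neg), at each time. The sketch applies the same two matrices in plain Python to reproduce the Response Function column.

```python
import math

# Weights for response profiles 1-15 (neg pos pos neg is never observed).
WT = [509, 4, 17, 3, 13, 8, 8, 14, 1, 17, 9, 7, 4, 9, 170]

# LOG matrix: each row sums a set of cell proportions.
A = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],  # std1 pos and test1 pos
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],  # std1 pos
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # std1 neg and test1 neg
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # std1 neg
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],  # std2 pos and test2 pos
    [0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1],  # std2 pos
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],  # std2 neg and test2 neg
    [1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0],  # std2 neg
]

# EXP matrix: +1/-1 pairs turn log(numerator) - log(denominator) into a ratio.
C = [
    [1, -1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, -1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, -1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, -1],
]


def compound_function(wt, a, c):
    """Compute exp(C log(A p)) for the cell proportions p."""
    n = sum(wt)
    p = [w / n for w in wt]
    logs = [math.log(sum(x * pi for x, pi in zip(row, p))) for row in a]
    return [math.exp(sum(x * v for x, v in zip(row, logs))) for row in c]


F = compound_function(WT, A, C)
print([round(f, 5) for f in F])  # sensitivity t1, specificity t1, sensitivity t2, specificity t2
```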
The ANOVA table (Output 22.9.13) shows that an additive model fits, that there is a significant effect of time, and that the sensitivity is significantly different from the specificity.
Output 22.9.14 shows that the predicted sensitivities and specificities are lower for time 1 (since parameter 2 is negative). It also shows that the sensitivity is significantly less than the specificity.
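The predicted values behind these statements come from multiplying each design-matrix row by the parameter estimates. The sketch below assumes effect coding, as in the design matrix above: Time is +1 at time 1 and -1 at time 2, and Accuracy is +1 for sensitivity and -1 for specificity. It takes parameters 2 and 3 as negative, as the text states.

```python
# Main-effects estimates: intercept, Time, Accuracy (parameters 2 and 3 negative).
B0, B_TIME, B_ACC = 0.8892, -0.00932, -0.0702

# Design rows (intercept, Time, Accuracy), assuming effect coding:
# time 1 = +1, time 2 = -1; sensitivity = +1, specificity = -1.
DESIGN = {
    ("sensitivity", 1): (1, 1, 1),
    ("specificity", 1): (1, 1, -1),
    ("sensitivity", 2): (1, -1, 1),
    ("specificity", 2): (1, -1, -1),
}

predicted = {key: B0 * x0 + B_TIME * x1 + B_ACC * x2 for key, (x0, x1, x2) in DESIGN.items()}
for key, value in predicted.items():
    print(key, round(value, 5))
```

Both time-1 predictions fall below their time-2 counterparts. Sensitivity stays below specificity at each time.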
This example illustrates the ability of PROC CATMOD to operate on an existing vector of functions and the corresponding covariance matrix. The estimates under investigation are composite indices summarizing the responses to eighteen psychological questions pertaining to general well-being. These estimates are computed for domains corresponding to an age by sex cross-classification, and the covariance matrix is calculated via the method of balanced repeated replications. The analysis is directed at obtaining a description of the variation among these domain estimates. The data are from Koch and Stokes (1979).
data fbeing(type=est);
   input b1-b5 _type_ $ _name_ $ b6-b10 #2;
   datalines;
7.93726 7.92509 7.82815 7.73696 8.16791 parms .
7.24978 7.18991 7.35960 7.31937 7.55184
0.00739 0.00019 0.00146 -0.00082 0.00076 cov b1
0.00189 0.00118 0.00140 -0.00140 0.00039
0.00019 0.01172 0.00183 0.00029 0.00083 cov b2
-0.00123 0.00629 -0.00088 0.00232 0.00034
0.00146 0.00183 0.01050 -0.00173 0.00011 cov b3
0.00434 0.00059 -0.00055 0.00023 -0.00013
-0.00082 0.00029 -0.00173 0.01335 0.00140 cov b4
0.00158 0.00212 0.00211 0.00066 0.00240
0.00076 0.00083 0.00011 0.00140 0.01430 cov b5
-0.00050 0.00098 0.00239 -0.00010 0.00213
0.00189 -0.00123 0.00434 0.00158 -0.00050 cov b6
0.01110 0.00101 0.00177 -0.00018 -0.00082
0.00118 0.00629 0.00059 0.00212 0.00098 cov b7
0.00101 0.02342 0.00144 0.00369 0.00253
0.00140 -0.00088 -0.00055 0.00211 0.00239 cov b8
0.00177 0.00144 0.01060 0.00157 0.00226
-0.00140 0.00232 0.00023 0.00066 -0.00010 cov b9
-0.00018 0.00369 0.00157 0.02298 0.00918
0.00039 0.00034 -0.00013 0.00240 0.00213 cov b10
-0.00082 0.00253 0.00226 0.00918 0.01921
;
The following statements produce Output 22.10.1 through Output 22.10.3:
proc catmod data=fbeing;
   title 'Complex Sample Survey Analysis';
   response read b1-b10;
   factors sex $ 2, age $ 5 / _response_=sex age
           profile=(male   '25-34', male   '35-44', male   '45-54',
                    male   '55-64', male   '65-74', female '25-34',
                    female '35-44', female '45-54', female '55-64',
                    female '65-74');
   model _f_=_response_ / design title='Main Effects for Sex and Age';
run;
Complex Sample Survey Analysis, Main Effects for Sex and Age (The CATMOD Procedure)

Response Functions and Design Matrix (columns 1-6 are the design matrix)

| Sample | Function Number | Response Function | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 7.93726 | 1 | 1 | 1 | 0 | 0 | 0 |
|  | 2 | 7.92509 | 1 | 1 | 0 | 1 | 0 | 0 |
|  | 3 | 7.82815 | 1 | 1 | 0 | 0 | 1 | 0 |
|  | 4 | 7.73696 | 1 | 1 | 0 | 0 | 0 | 1 |
|  | 5 | 8.16791 | 1 | 1 | -1 | -1 | -1 | -1 |
|  | 6 | 7.24978 | 1 | -1 | 1 | 0 | 0 | 0 |
|  | 7 | 7.18991 | 1 | -1 | 0 | 1 | 0 | 0 |
|  | 8 | 7.35960 | 1 | -1 | 0 | 0 | 1 | 0 |
|  | 9 | 7.31937 | 1 | -1 | 0 | 0 | 0 | 1 |
|  | 10 | 7.55184 | 1 | -1 | -1 | -1 | -1 | -1 |

Analysis of Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Intercept | 1 | 28089.07 | <.0001 |
| sex | 1 | 65.84 | <.0001 |
| age | 4 | 9.21 | 0.0561 |
| Residual | 4 | 2.92 | 0.5713 |

Analysis of Weighted Least Squares Estimates

| Effect | Parameter | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | 7.6319 | 0.0455 | 28089.07 | <.0001 |
| sex | 2 | 0.2900 | 0.0357 | 65.84 | <.0001 |
| age | 3 | 0.00780 | 0.0645 | 0.01 | 0.9037 |
|  | 4 | 0.0465 | 0.0636 | 0.54 | 0.4642 |
|  | 5 | 0.0343 | 0.0557 | 0.38 | 0.5387 |
|  | 6 | 0.1098 | 0.0764 | 2.07 | 0.1506 |
The analysis of variance table (Output 22.10.2) shows that the additive model fits (Residual p = 0.5713). The sex effect is significant, and the age effect is marginally significant (p = 0.0561). The following statements produce Output 22.10.4:
   contrast 'No Age Effect for Age<65' all_parms 0 0 1 0 0 -1,
                                       all_parms 0 0 0 1 0 -1,
                                       all_parms 0 0 0 0 1 -1;
run;
Analysis of Contrasts

| Contrast | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| No Age Effect for Age<65 | 3 | 0.72 | 0.8678 |
The analysis of the contrast shows that there is no significant difference among the four age groups that are under age 65. Thus, the next model contains a binary age effect (less than 65 versus 65 and over). The following statements produce Output 22.10.5 through Output 22.10.7:
   model _f_=(1  1  1,
              1  1  1,
              1  1  1,
              1  1  1,
              1  1 -1,
              1 -1  1,
              1 -1  1,
              1 -1  1,
              1 -1  1,
              1 -1 -1)
             (1='Intercept', 2='Sex', 3='Age (25-64 vs. 65-74)')
             / design title='Binary Age Effect (25-64 vs. 65-74)';
run;
quit;
Complex Sample Survey Analysis, Binary Age Effect (25-64 vs. 65-74) (The CATMOD Procedure)

Response Functions and Design Matrix (columns 1-3 are the design matrix)

| Sample | Function Number | Response Function | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 1 | 1 | 7.93726 | 1 | 1 | 1 |
|  | 2 | 7.92509 | 1 | 1 | 1 |
|  | 3 | 7.82815 | 1 | 1 | 1 |
|  | 4 | 7.73696 | 1 | 1 | 1 |
|  | 5 | 8.16791 | 1 | 1 | -1 |
|  | 6 | 7.24978 | 1 | -1 | 1 |
|  | 7 | 7.18991 | 1 | -1 | 1 |
|  | 8 | 7.35960 | 1 | -1 | 1 |
|  | 9 | 7.31937 | 1 | -1 | 1 |
|  | 10 | 7.55184 | 1 | -1 | -1 |

Analysis of Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Intercept | 1 | 19087.16 | <.0001 |
| Sex | 1 | 72.64 | <.0001 |
| Age (25-64 vs. 65-74) | 1 | 8.49 | 0.0036 |
| Residual | 7 | 3.64 | 0.8198 |

Analysis of Weighted Least Squares Estimates

| Effect | Parameter | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Model | 1 | 7.7183 | 0.0559 | 19087.16 | <.0001 |
|  | 2 | 0.2800 | 0.0329 | 72.64 | <.0001 |
|  | 3 | -0.1304 | 0.0448 | 8.49 | 0.0036 |
The analysis of variance table in Output 22.10.6 shows that the model fits. Its goodness-of-fit statistic is the sum of the previous one (Output 22.10.2) and the chi-square for the contrast in Output 22.10.4: 2.92 + 0.72 = 3.64, with 4 + 3 = 7 degrees of freedom.

The age and sex effects are both significant:

- The second parameter estimate is positive, so males (the first level of the sex variable) have a higher predicted index of well-being than females.
- The third parameter estimate is negative, so those younger than age 65 (the first level of age) have a lower predicted index of well-being than those aged 65 and older.
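The sketch below makes the sign interpretation concrete. It multiplies the binary-age design rows by the estimates to obtain the four distinct predicted indices. The coding is read from the MODEL statement: male = +1, female = -1; ages 25-64 = +1, ages 65-74 = -1. Parameter 3 is taken as negative (-0.1304), as the text states. The sketch also repeats the goodness-of-fit addition.

```python
# Binary-age model estimates: intercept, Sex (male = +1), Age (25-64 = +1, 65-74 = -1).
INTERCEPT, SEX, AGE = 7.7183, 0.2800, -0.1304


def predicted_index(sex, age_group):
    """Predicted well-being index for one sex-by-age-group cell."""
    x_sex = 1 if sex == "male" else -1
    x_age = 1 if age_group == "25-64" else -1
    return INTERCEPT + SEX * x_sex + AGE * x_age


for sex in ("male", "female"):
    for age_group in ("25-64", "65-74"):
        print(sex, age_group, round(predicted_index(sex, age_group), 4))

# Goodness of fit: main-effects residual plus the 3-DF age contrast.
residual_chisq = 2.92 + 0.72
residual_df = 4 + 3
```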
Suppose you have collected marketing research data to examine the relationship between a prospect's likelihood of buying your product and the prospect's education and income. Specifically, the variables are as follows:
Variable | Levels | Interpretation |
---|---|---|
Education | high, low | prospect's education level |
Income | high, low | prospect's income level |
Purchase | yes, no | Did prospect purchase product? |
The following statements first create a data set, loan , that contains the marketing research data, then they use the CATMOD procedure to fit a model, obtain the parameter estimates, and obtain the predicted probabilities of interest. These statements produce Output 22.11.1 through Output 22.11.5.
data loan;
   input Education $ Income $ Purchase $ wt;
   datalines;
high high yes 54
high high no  23
high low  yes 41
high low  no  12
low  high yes 35
low  high no  42
low  low  yes 19
low  low  no   8
;

ods output PredictedValues=Predicted (keep=Education Income PredFunction);
proc catmod data=loan order=data;
   weight wt;
   response marginals;
   model Purchase=Education Income / pred design;
run;

proc sort data=Predicted;
   by descending PredFunction;
run;

proc print data=Predicted;
run;
The CATMOD Procedure

Data Summary

| Response | Response Levels | Weight Variable | Populations | Data Set | Total Frequency | Frequency Missing | Observations |
|---|---|---|---|---|---|---|---|
| Purchase | 2 | wt | 4 | LOAN | 234 | 0 | 8 |

Population Profiles

| Sample | Education | Income | Sample Size |
|---|---|---|---|
| 1 | high | high | 77 |
| 2 | high | low | 53 |
| 3 | low | high | 77 |
| 4 | low | low | 27 |

Response Profiles

| Response | Purchase |
|---|---|
| 1 | yes |
| 2 | no |

Response Functions and Design Matrix (columns 1-3 are the design matrix)

| Sample | Response Function | 1 | 2 | 3 |
|---|---|---|---|---|
| 1 | 0.70130 | 1 | 1 | 1 |
| 2 | 0.77358 | 1 | 1 | -1 |
| 3 | 0.45455 | 1 | -1 | 1 |
| 4 | 0.70370 | 1 | -1 | -1 |

Analysis of Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Intercept | 1 | 418.36 | <.0001 |
| Education | 1 | 8.85 | 0.0029 |
| Income | 1 | 4.70 | 0.0302 |
| Residual | 1 | 1.84 | 0.1745 |

Analysis of Weighted Least Squares Estimates

| Parameter | Estimate | Standard Error | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|
| Intercept | 0.6481 | 0.0317 | 418.36 | <.0001 |
| Education high | 0.0924 | 0.0311 | 8.85 | 0.0029 |
| Income high | -0.0675 | 0.0312 | 4.70 | 0.0302 |

Predicted Values for Response Functions

| Education | Income | Function Number | Observed Function | Observed Standard Error | Predicted Function | Predicted Standard Error | Residual |
|---|---|---|---|---|---|---|---|
| high | high | 1 | 0.701299 | 0.052158 | 0.67294 | 0.047794 | 0.028359 |
| high | low | 1 | 0.773585 | 0.057487 | 0.808034 | 0.051586 | -0.03445 |
| low | high | 1 | 0.454545 | 0.056744 | 0.48811 | 0.051077 | -0.03356 |
| low | low | 1 | 0.703704 | 0.087877 | 0.623204 | 0.064867 | 0.080499 |

| Obs | Education | Income | PredFunction |
|---|---|---|---|
| 1 | high | low | 0.808034 |
| 2 | high | high | 0.67294 |
| 3 | low | low | 0.623204 |
| 4 | low | high | 0.48811 |
Notice that the preceding statements use the Output Delivery System (ODS), rather than the OUT= option, to save the predicted values to a data set. Either approach can be used.
You can use the predicted values (the values of PredFunction in Output 22.11.5) as scores representing the likelihood that a randomly chosen subject from one of these populations will purchase the product. The Response Profiles in Output 22.11.2 show that the first level of Purchase is yes, because ORDER=DATA orders the levels by their appearance in the data. The predicted probabilities are therefore for Pr(Purchase=yes).

For example, someone with high education and low income has an estimated probability of purchase of 0.808. As with any response function estimate given by PROC CATMOD, this estimate is obtained by multiplying the row of the design matrix for that sample (sample 2 in this case) by the vector of parameter estimates: (1 × 0.6481) + (1 × 0.0924) + (-1 × (-0.0675)) = 0.8080.
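The same row-by-estimate products give all four scores and their ranking. The sketch below assumes effect coding with 'high' = +1 and 'low' = -1 for both factors, as in the design matrix above. It uses the estimates with the Income high parameter negative, as in the worked example.

```python
ESTIMATES = (0.6481, 0.0924, -0.0675)  # Intercept, Education high, Income high


def design_row(education, income):
    """Effect coding: 'high' = +1 and 'low' = -1 for both factors."""
    return (1, 1 if education == "high" else -1, 1 if income == "high" else -1)


def predicted_purchase_prob(education, income):
    """Predicted Pr(Purchase = yes): design row times the parameter estimates."""
    return sum(x * b for x, b in zip(design_row(education, income), ESTIMATES))


populations = [("high", "high"), ("high", "low"), ("low", "high"), ("low", "low")]
ranking = sorted(populations, key=lambda pop: predicted_purchase_prob(*pop), reverse=True)
for education, income in ranking:
    print(education, income, round(predicted_purchase_prob(education, income), 4))
```

The resulting order matches the PROC PRINT listing sorted by descending PredFunction.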
This ranking of scores can help in decision making (for example, with respect to allocation of advertising dollars, choice of advertising media, choice of print media, and so on).