Examples | SAS/STAT 9.1 User's Guide, Volume 2

Example 22.1. Linear Response Function, r=2 Responses

In an example from Ries and Smith (1963), the choice of detergent brand (Brand=M or X) is related to three other categorical variables: the softness of the laundry water (Softness=soft, medium, or hard), the temperature of the water (Temperature=high or low), and whether the subject was a previous user of Brand M (Previous=yes or no). The linear response function, which could also be specified as RESPONSE MARGINALS, yields one probability, Pr(brand preference=M), as the response function to be analyzed. Two models are fit in this example: the first model is a saturated one, containing all of the main effects and interactions, while the second is a reduced model containing only the main effects. The following statements produce Output 22.1.1 through Output 22.1.4:

  data detergent;
     input Softness $ Brand $ Previous $ Temperature $ Count @@;
     datalines;
  soft X yes high 19   soft X yes low 57
  soft X no  high 29   soft X no  low 63
  soft M yes high 29   soft M yes low 49
  soft M no  high 27   soft M no  low 53
  med  X yes high 23   med  X yes low 47
  med  X no  high 33   med  X no  low 66
  med  M yes high 47   med  M yes low 55
  med  M no  high 23   med  M no  low 50
  hard X yes high 24   hard X yes low 37
  hard X no  high 42   hard X no  low 68
  hard M yes high 43   hard M yes low 52
  hard M no  high 30   hard M no  low 42
  ;

  title 'Detergent Preference Study';
  proc catmod data=detergent;
     response 1 0;
     weight Count;
     /* bar notation requests all main effects and interactions */
     model Brand=Softness|Previous|Temperature / freq prob;
     title2 'Saturated Model';
  run;

Output 22.1.1: Detergent Preference Study: Linear Model Analysis

  Detergent Preference Study
  Saturated Model

  The CATMOD Procedure

  Data Summary

  Response           Brand         Response Levels     2
  Weight Variable    Count         Populations        12
  Data Set           DETERGENT     Total Frequency  1008
  Frequency Missing  0             Observations       24

Output 22.1.2: Population Profiles

  Detergent Preference Study
  Saturated Model

  Population Profiles

  Sample    Softness    Previous    Temperature    Sample Size
  ------------------------------------------------------------
       1    hard        no          high                    72
       2    hard        no          low                    110
       3    hard        yes         high                    67
       4    hard        yes         low                     89
       5    med         no          high                    56
       6    med         no          low                    116
       7    med         yes         high                    70
       8    med         yes         low                    102
       9    soft        no          high                    56
      10    soft        no          low                    116
      11    soft        yes         high                    48
      12    soft        yes         low                    106

Output 22.1.3: Response Profiles, Frequencies, and Probabilities

  Detergent Preference Study
  Saturated Model

  Response Profiles

  Response    Brand
  -----------------
         1    M
         2    X

  Response Frequencies

            Response Number
  Sample        1        2
  ------------------------
       1       30       42
       2       42       68
       3       43       24
       4       52       37
       5       23       33
       6       50       66
       7       47       23
       8       55       47
       9       27       29
      10       53       63
      11       29       19
      12       49       57

  Response Probabilities

              Response Number
  Sample          1          2
  ----------------------------
       1    0.41667    0.58333
       2    0.38182    0.61818
       3    0.64179    0.35821
       4    0.58427    0.41573
       5    0.41071    0.58929
       6    0.43103    0.56897
       7    0.67143    0.32857
       8    0.53922    0.46078
       9    0.48214    0.51786
      10    0.45690    0.54310
      11    0.60417    0.39583
      12    0.46226    0.53774

Output 22.1.4: Analysis of Variance and WLS Estimates

  Detergent Preference Study
  Saturated Model

  Analysis of Variance

  Source                       DF   Chi-Square    Pr > ChiSq
  ----------------------------------------------------------
  Intercept                     1       983.13        <.0001
  Softness                      2         0.09        0.9575
  Previous                      1        22.68        <.0001
  Softness*Previous             2         3.85        0.1457
  Temperature                   1         3.67        0.0555
  Softness*Temperature          2         0.23        0.8914
  Previous*Temperature          1         2.26        0.1324
  Softnes*Previou*Temperat      2         0.76        0.6850
  Residual                      0          .           .

  Analysis of Weighted Least Squares Estimates

                                                        Standard      Chi-
  Parameter                                  Estimate      Error    Square    Pr > ChiSq
  --------------------------------------------------------------------------------------
  Intercept                                    0.5069     0.0162    983.13        <.0001
  Softness                 hard              -0.00073     0.0225      0.00        0.9740
                           med                0.00623     0.0226      0.08        0.7830
  Previous                 no                 -0.0770     0.0162     22.68        <.0001
  Softness*Previous        hard no            -0.0299     0.0225      1.77        0.1831
                           med no             -0.0152     0.0226      0.45        0.5007
  Temperature              high                0.0310     0.0162      3.67        0.0555
  Softness*Temperature     hard high         -0.00786     0.0225      0.12        0.7265
                           med high          -0.00298     0.0226      0.02        0.8953
  Previous*Temperature     no high            -0.0243     0.0162      2.26        0.1324
  Softnes*Previou*Temperat hard no high        0.0187     0.0225      0.69        0.4064
                           med no high        -0.0138     0.0226      0.37        0.5415

The Data Summary table (Output 22.1.1) indicates that you have two response levels and twelve populations.

The Population Profiles table in Output 22.1.2 displays the ordering of independent variable levels as used in the table of parameter estimates.

Since Brand M is the first level in the Response Profiles table (Output 22.1.3), the RESPONSE statement causes Pr(Brand=M) to be the single response function modeled.
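As a brief worked check, the coefficients in the RESPONSE statement can be read as weights applied to the response probabilities in Response Profiles order. The symbols $p_1$ and $p_2$ (the probabilities of Brand=M and Brand=X within a sample) are notation introduced here for illustration; the counts come from the Response Frequencies table in Output 22.1.3.

```latex
F = 1 \cdot p_1 + 0 \cdot p_2 = p_1 = \Pr(\text{Brand} = \text{M}),
\qquad
F_{\text{Sample 1}} = \frac{30}{30 + 42} = \frac{30}{72} \approx 0.41667
```

This agrees with the first entry of the Response Probabilities table for Sample 1.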

The Analysis of Variance table in Output 22.1.4 shows that all of the interactions are nonsignificant. Therefore, a main-effects model is fit with the following statements:

     model Brand=Softness Previous Temperature
           / clparm noprofile design;
     title2 'Main-Effects Model';
  run;
  quit;

The PROC CATMOD statement is not required due to the interactive capability of the CATMOD procedure. The NOPROFILE option suppresses the redisplay of the Response Profiles table. The CLPARM option produces 95% confidence limits for the parameter estimates. Output 22.1.5 through Output 22.1.7 are produced.

Output 22.1.5: Main-Effects Design Matrix

  Detergent Preference Study
  Main-Effects Model

  The CATMOD Procedure

  Data Summary

  Response           Brand         Response Levels     2
  Weight Variable    Count         Populations        12
  Data Set           DETERGENT     Total Frequency  1008
  Frequency Missing  0             Observations       24

  Response Functions and Design Matrix

            Response                  Design Matrix
  Sample    Function        1        2        3        4        5
  ---------------------------------------------------------------
       1     0.41667        1        1        0        1        1
       2     0.38182        1        1        0        1       -1
       3     0.64179        1        1        0       -1        1
       4     0.58427        1        1        0       -1       -1
       5     0.41071        1        0        1        1        1
       6     0.43103        1        0        1        1       -1
       7     0.67143        1        0        1       -1        1
       8     0.53922        1        0        1       -1       -1
       9     0.48214        1       -1       -1        1        1
      10     0.45690        1       -1       -1        1       -1
      11     0.60417        1       -1       -1       -1        1
      12     0.46226        1       -1       -1       -1       -1

Output 22.1.6: ANOVA Table for the Main-Effects Model

  Detergent Preference Study
  Main-Effects Model

  Analysis of Variance

  Source          DF   Chi-Square    Pr > ChiSq
  ---------------------------------------------
  Intercept        1      1004.93        <.0001
  Softness         2         0.24        0.8859
  Previous         1        20.96        <.0001
  Temperature      1         3.95        0.0468
  Residual         7         8.26        0.3100

Output 22.1.7: WLS Estimates for the Main-Effects Model

  Detergent Preference Study
  Main-Effects Model

  Analysis of Weighted Least Squares Estimates

                                 Standard       Chi-                  95% Confidence
  Parameter            Estimate     Error     Square   Pr > ChiSq         Limits
  ------------------------------------------------------------------------------------
  Intercept              0.5080    0.0160    1004.93       <.0001     0.4766    0.5394
  Softness    hard     -0.00256    0.0218       0.01       0.9066    -0.0454    0.0402
              med        0.0104    0.0218       0.23       0.6342    -0.0323    0.0530
  Previous    no        -0.0711    0.0155      20.96       <.0001    -0.1015   -0.0407
  Temperature high       0.0319    0.0161       3.95       0.0468   0.000446    0.0634

The design matrix in Output 22.1.5 displays the results of the factor effects modeling used in PROC CATMOD.

The analysis of variance table in Output 22.1.6 shows that previous use of Brand M, together with the temperature of the laundry water, are significant factors in preferring Brand M laundry detergent. The table also shows that the additive model fits since the goodness-of-fit statistic (the Residual Chi-Square) is nonsignificant.

The chi-square test in Output 22.1.7 shows that the Softness parameters are not significantly different from zero; as expected, the Wald confidence limits for these two estimates contain zero. So softness of the water is not a factor in choosing Brand M.
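The confidence limits can be checked by hand. The sketch below assumes the usual Wald form, estimate $\pm\, z_{0.975} \times$ standard error with $z_{0.975} \approx 1.96$; the symbol $\hat{\beta}$ is notation introduced here, and the inputs are the rounded values displayed in Output 22.1.7.

```latex
\hat{\beta}_{\text{hard}} \pm 1.96\,\mathrm{SE}
  = -0.00256 \pm 1.96\,(0.0218)
  = -0.00256 \pm 0.0427
  \;\Rightarrow\; (-0.0453,\; 0.0402)

\hat{\beta}_{\text{med}} \pm 1.96\,\mathrm{SE}
  = 0.0104 \pm 1.96\,(0.0218)
  \;\Rightarrow\; (-0.0323,\; 0.0531)
```

Both intervals contain zero. Differences of one unit in the last digit relative to the table are consistent with the table having been computed from unrounded estimates.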

The negative coefficient for Previous (-0.0711) indicates that the first level of Previous (which, from the table of population profiles, is 'no') is associated with a smaller probability of preferring Brand M than the second level of Previous (with coefficient constrained to be 0.0711 since the parameter estimates for a given effect must sum to zero). In other words, previous users of Brand M are much more likely to prefer it than those who have never used it before.

Similarly, the positive coefficient for Temperature indicates that the first level of Temperature (which, from the Population Profiles table, is 'high') has a larger probability of preferring Brand M than the second level of Temperature. In other words, those who do their laundry in hot water are more likely to prefer Brand M than those who do their laundry in cold water.
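To make the sum-to-zero constraint concrete, the following worked arithmetic derives the coefficients that are not displayed and the implied differences in Pr(Brand=M) between levels, holding the other factors fixed. The $\hat{\beta}$ symbols are notation introduced here; the numbers are the rounded estimates from Output 22.1.7.

```latex
\hat{\beta}_{\text{Previous=yes}} = -\hat{\beta}_{\text{Previous=no}} = 0.0711,
\qquad
\hat{\beta}_{\text{Previous=yes}} - \hat{\beta}_{\text{Previous=no}} = 2\,(0.0711) = 0.1422

\hat{\beta}_{\text{Temperature=low}} = -\hat{\beta}_{\text{Temperature=high}} = -0.0319,
\qquad
\hat{\beta}_{\text{Temperature=high}} - \hat{\beta}_{\text{Temperature=low}} = 2\,(0.0319) = 0.0638
```

Under the main-effects model, then, previous users are estimated to prefer Brand M with probability about 0.14 higher than nonusers, and the hot-water effect is less than half that size.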

Example 22.2. Mean Score Response Function, r=3 Responses

Four surgical operations for duodenal ulcers are compared in a clinical trial at four hospitals. The operations performed are: Treatment=a, drainage and vagotomy; Treatment=b, 25% resection and vagotomy; Treatment=c, 50% resection and vagotomy; and Treatment=d, 75% resection. The response is severity of an undesirable complication called dumping syndrome. The data are from Grizzle, Starmer, and Koch (1969, pp. 489-504).

  data operate;
     input Hospital Treatment $ Severity $ wt @@;
     datalines;
  1 a none 23    1 a slight  7    1 a moderate 2
  1 b none 23    1 b slight 10    1 b moderate 5
  1 c none 20    1 c slight 13    1 c moderate 5
  1 d none 24    1 d slight 10    1 d moderate 6
  2 a none 18    2 a slight  6    2 a moderate 1
  2 b none 18    2 b slight  6    2 b moderate 2
  2 c none 13    2 c slight 13    2 c moderate 2
  2 d none  9    2 d slight 15    2 d moderate 2
  3 a none  8    3 a slight  6    3 a moderate 3
  3 b none 12    3 b slight  4    3 b moderate 4
  3 c none 11    3 c slight  6    3 c moderate 2
  3 d none  7    3 d slight  7    3 d moderate 4
  4 a none 12    4 a slight  9    4 a moderate 1
  4 b none 15    4 b slight  3    4 b moderate 2
  4 c none 14    4 c slight  8    4 c moderate 3
  4 d none 13    4 d slight  6    4 d moderate 4
  ;

The response variable (Severity) is ordinally scaled with three levels, so assignment of scores is appropriate (0=none, 0.5=slight, 1=moderate). For these scores, the response function yields the mean score. The following statements produce Output 22.2.1 through Output 22.2.6.

  title 'Dumping Syndrome Data';
  proc catmod data=operate order=data;
     weight wt;
     response 0  0.5  1;
     model Severity=Treatment Hospital / freq oneway design;
     title2 'Main-Effects Model';
  quit;

Output 22.2.1: Surgical Data: Analysis of Mean Scores

  Dumping Syndrome Data
  Main-Effects Model

  The CATMOD Procedure

  Data Summary

  Response           Severity     Response Levels    3
  Weight Variable    wt           Populations       16
  Data Set           OPERATE      Total Frequency  417
  Frequency Missing  0            Observations      48

  One-Way Frequencies

  Variable     Value       Frequency
  ----------------------------------
  Severity     none              240
               slight            129
               moderate           48
  Treatment    a                  96
               b                 104
               c                 110
               d                 107
  Hospital     1                 148
               2                 105
               3                  74
               4                  90

Output 22.2.2: Population Sizes

  Dumping Syndrome Data
  Main-Effects Model

  Population Profiles

  Sample    Treatment    Hospital    Sample Size
  -----------------------------------------------
       1    a            1                    32
       2    a            2                    25
       3    a            3                    17
       4    a            4                    22
       5    b            1                    38
       6    b            2                    26
       7    b            3                    20
       8    b            4                    20
       9    c            1                    38
      10    c            2                    28
      11    c            3                    19
      12    c            4                    25
      13    d            1                    40
      14    d            2                    26
      15    d            3                    18
      16    d            4                    23

Output 22.2.3: Response Frequencies

  Dumping Syndrome Data
  Main-Effects Model

  Response Profiles

  Response    Severity
  ---------------------
         1    none
         2    slight
         3    moderate

  Response Frequencies

                 Response Number
  Sample        1        2        3
  ----------------------------------
       1       23        7        2
       2       18        6        1
       3        8        6        3
       4       12        9        1
       5       23       10        5
       6       18        6        2
       7       12        4        4
       8       15        3        2
       9       20       13        5
      10       13       13        2
      11       11        6        2
      12       14        8        3
      13       24       10        6
      14        9       15        2
      15        7        7        4
      16       13        6        4

Output 22.2.4: Design Matrix

  Dumping Syndrome Data
  Main-Effects Model

  Response Functions and Design Matrix

            Response                       Design Matrix
  Sample    Function       1       2       3       4       5       6       7
  ----------------------------------------------------------------------------
       1     0.17188       1       1       0       0       1       0       0
       2     0.16000       1       1       0       0       0       1       0
       3     0.35294       1       1       0       0       0       0       1
       4     0.25000       1       1       0       0      -1      -1      -1
       5     0.26316       1       0       1       0       1       0       0
       6     0.19231       1       0       1       0       0       1       0
       7     0.30000       1       0       1       0       0       0       1
       8     0.17500       1       0       1       0      -1      -1      -1
       9     0.30263       1       0       0       1       1       0       0
      10     0.30357       1       0       0       1       0       1       0
      11     0.26316       1       0       0       1       0       0       1
      12     0.28000       1       0       0       1      -1      -1      -1
      13     0.27500       1      -1      -1      -1       1       0       0
      14     0.36538       1      -1      -1      -1       0       1       0
      15     0.41667       1      -1      -1      -1       0       0       1
      16     0.30435       1      -1      -1      -1      -1      -1      -1

Output 22.2.5: ANOVA Table

  Dumping Syndrome Data
  Main-Effects Model

  Analysis of Variance

  Source        DF   Chi-Square    Pr > ChiSq
  --------------------------------------------
  Intercept      1       248.77        <.0001
  Treatment      3         8.90        0.0307
  Hospital       3         2.33        0.5065
  Residual       9         6.33        0.7069

Output 22.2.6: Parameter Estimates

  Dumping Syndrome Data
  Main-Effects Model

  Analysis of Weighted Least Squares Estimates

                             Standard        Chi-
  Parameter      Estimate       Error      Square    Pr > ChiSq
  -------------------------------------------------------------
  Intercept        0.2724      0.0173      248.77        <.0001
  Treatment a     -0.0552      0.0270        4.17        0.0411
            b     -0.0365      0.0289        1.59        0.2073
            c      0.0248      0.0280        0.78        0.3757
  Hospital  1     -0.0204      0.0264        0.60        0.4388
            2     -0.0178      0.0268        0.44        0.5055
            3      0.0531      0.0352        2.28        0.1312

The ORDER= option is specified so that the levels of the response variable remain in the correct order. A main effects model is fit. The FREQ option displays the frequency of each response within each sample (Output 22.2.3), and the ONEWAY option produces a table of the number of subjects within each variable level (Output 22.2.1).

You can use the oneway frequencies (Output 22.2.1) and the response profiles (Output 22.2.3) to verify that the response levels are in the desired order (none, slight, moderate) so that the response scores (0, 0.5, 1.0) are applied appropriately. If the ORDER=DATA option had not been used, the levels would have been in a different order.
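As a worked check of how the scores are applied, the mean score for Sample 1 (Treatment=a, Hospital=1) can be computed from its response frequencies 23, 7, and 2 in Output 22.2.3. The symbol $F$ for the response function is notation introduced here.

```latex
F_{\text{Sample 1}}
  = \frac{0\,(23) + 0.5\,(7) + 1\,(2)}{23 + 7 + 2}
  = \frac{5.5}{32}
  \approx 0.17188
```

This matches the first Response Function value in Output 22.2.4. If the levels were instead sorted alphabetically (moderate, none, slight), which is presumably the different order referred to above, the same scores would attach 0 to moderate and 1 to slight, and the mean score would no longer measure severity.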

The analysis of variance table (Output 22.2.5) shows that the additive model fits (since the Residual Chi-Square is not significant), that the Treatment effect is significant, and that the Hospital effect is not significant.

The coefficients of Treatment in Output 22.2.6 show that the first two treatments (with negative coefficients) have lower mean scores than the last two treatments (the fourth coefficient, not shown, must be positive since the four coefficients must sum to zero). In other words, the less severe treatments (the first two) cause significantly less severe dumping syndrome complications.
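The coefficients that are not displayed follow from the sum-to-zero constraint. The sketch below uses the rounded estimates in Output 22.2.6; the $\hat{\beta}$ and $\hat{\mu}$ symbols are notation introduced here.

```latex
\hat{\beta}_{d} = -(\hat{\beta}_{a} + \hat{\beta}_{b} + \hat{\beta}_{c})
  = -(-0.0552 - 0.0365 + 0.0248) = 0.0669

\hat{\beta}_{\text{Hospital 4}} = -(-0.0204 - 0.0178 + 0.0531) = -0.0149

\hat{\mu}_{a} = 0.2724 - 0.0552 = 0.2172,
\qquad
\hat{\mu}_{d} = 0.2724 + 0.0669 = 0.3393
```

Here $\hat{\mu}_{a}$ and $\hat{\mu}_{d}$ are the predicted mean scores for treatments a and d with the hospital effect averaged out (intercept plus treatment coefficient), so the most extensive resection has the highest predicted mean severity.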

Example 22.3. Logistic Regression, Standard Response Function

In this data set, from Cox and Snell (1989), ingots are prepared with different heating and soaking times and tested for their readiness to be rolled. The response variable Y has value 1 for ingots that are not ready and value 0 otherwise. The explanatory variables are Heat and Soak.

  data ingots;
     input Heat Soak nready ntotal @@;
     /* one record for the ingots not ready (Y=1) */
     Count=nready;
     Y=1;
     output;
     /* one record for the ingots ready (Y=0) */
     Count=ntotal-nready;
     Y=0;
     output;
     drop nready ntotal;
     datalines;
  7 1.0 0 10   14 1.0 0 31   27 1.0 1 56   51 1.0 3 13
  7 1.7 0 17   14 1.7 0 43   27 1.7 4 44   51 1.7 0  1
  7 2.2 0  7   14 2.2 2 33   27 2.2 0 21   51 2.2 0  1
  7 2.8 0 12   14 2.8 0 31   27 2.8 1 22   51 4.0 0  1
  7 4.0 0  9   14 4.0 0 19   27 4.0 1 16
  ;

Logistic regression analysis is often used to investigate the relationship between discrete response variables and continuous explanatory variables. For logistic regression, the continuous design-effects are declared in a DIRECT statement. The following statements produce Output 22.3.1 through Output 22.3.8.

  title 'Maximum Likelihood Logistic Regression';
  proc catmod data=ingots;
     weight Count;
     direct Heat Soak;
     model Y=Heat Soak / freq covb corrb itprint design;
  quit;

Output 22.3.1: Maximum Likelihood Logistic Regression

  Maximum Likelihood Logistic Regression

  The CATMOD Procedure

  Data Summary

  Response           Y          Response Levels    2
  Weight Variable    Count      Populations       19
  Data Set           INGOTS     Total Frequency  387
  Frequency Missing  0          Observations      25

  Population Profiles

  Sample    Heat    Soak    Sample Size
  -------------------------------------
       1       7       1             10
       2       7     1.7             17
       3       7     2.2              7
       4       7     2.8             12
       5       7       4              9
       6      14       1             31
       7      14     1.7             43
       8      14     2.2             33
       9      14     2.8             31
      10      14       4             19
      11      27       1             56
      12      27     1.7             44
      13      27     2.2             21
      14      27     2.8             22
      15      27       4             16
      16      51       1             13
      17      51     1.7              1
      18      51     2.2              1
      19      51       4              1

Output 22.3.2: Response Summaries

  Maximum Likelihood Logistic Regression

  Response Profiles

  Response    Y
  -------------
         1    0
         2    1

  Response Frequencies

            Response Number
  Sample        1        2
  ------------------------
       1       10        0
       2       17        0
       3        7        0
       4       12        0
       5        9        0
       6       31        0
       7       43        0
       8       31        2
       9       31        0
      10       19        0
      11       55        1
      12       40        4
      13       21        0
      14       21        1
      15       15        1
      16       10        3
      17        1        0
      18        1        0
      19        1        0

Output 22.3.3: Design Matrix

  Maximum Likelihood Logistic Regression

  Response Functions and Design Matrix

            Response         Design Matrix
  Sample    Function        1        2        3
  ---------------------------------------------
       1     2.99573        1        7        1
       2     3.52636        1        7      1.7
       3     2.63906        1        7      2.2
       4     3.17805        1        7      2.8
       5     2.89037        1        7        4
       6     4.12713        1       14        1
       7     4.45435        1       14      1.7
       8     2.74084        1       14      2.2
       9     4.12713        1       14      2.8
      10     3.63759        1       14        4
      11     4.00733        1       27        1
      12     2.30259        1       27      1.7
      13     3.73767        1       27      2.2
      14     3.04452        1       27      2.8
      15     2.70805        1       27        4
      16     1.20397        1       51        1
      17     0.69315        1       51      1.7
      18     0.69315        1       51      2.2
      19     0.69315        1       51        4

Output 22.3.4: Iteration History

  Maximum Likelihood Logistic Regression

  Maximum Likelihood Analysis

                  Sub        -2 Log   Convergence          Parameter Estimates
  Iteration Iteration    Likelihood     Criterion          1          2          3
  ---------------------------------------------------------------------------------
          0         0     536.49592        1.0000          0          0          0
          1         0     152.58961        0.7156     2.1594    -0.0139  -0.003733
          2         0     106.76066        0.3003     3.5334    -0.0363    -0.0120
          3         0     96.692171        0.0943     4.7489    -0.0640    -0.0299
          4         0     95.383825        0.0135     5.4138    -0.0790    -0.0498
          5         0     95.345659      0.000400     5.5539    -0.0819    -0.0564
          6         0     95.345613     4.8289E-7     5.5592    -0.0820    -0.0568
          7         0     95.345613     7.731E-13     5.5592    -0.0820    -0.0568

  Maximum likelihood computations converged.

Output 22.3.5: Analysis of Variance Table

  Maximum Likelihood Logistic Regression

  Maximum Likelihood Analysis of Variance

  Source               DF   Chi-Square    Pr > ChiSq
  --------------------------------------------------
  Intercept             1        24.65        <.0001
  Heat                  1        11.95        0.0005
  Soak                  1         0.03        0.8639
  Likelihood Ratio     16        13.75        0.6171

Output 22.3.6: Maximum Likelihood Estimates

  Maximum Likelihood Logistic Regression

  Analysis of Maximum Likelihood Estimates

                           Standard        Chi-
  Parameter    Estimate       Error      Square    Pr > ChiSq
  -----------------------------------------------------------
  Intercept      5.5592      1.1197       24.65        <.0001
  Heat          -0.0820      0.0237       11.95        0.0005
  Soak          -0.0568      0.3312        0.03        0.8639

Output 22.3.7: Covariance Matrix

  Maximum Likelihood Logistic Regression

  Covariance Matrix of the Maximum Likelihood Estimates

  Row      Parameter            Col1            Col2            Col3
  ------------------------------------------------------------------
    1      Intercept       1.2537133      -0.0215664      -0.2817648
    2      Heat           -0.0215664       0.0005633       0.0026243
    3      Soak           -0.2817648       0.0026243       0.1097020

Output 22.3.8: Correlation Matrix

  Maximum Likelihood Logistic Regression

  Correlation Matrix of the Maximum Likelihood Estimates

  Row      Parameter            Col1            Col2            Col3
  ------------------------------------------------------------------
    1      Intercept         1.00000        -0.81152        -0.75977
    2      Heat             -0.81152         1.00000         0.33383
    3      Soak             -0.75977         0.33383         1.00000

You can verify that the populations are defined as you intended by looking at the Population Profiles table in Output 22.3.1.

Since the Response Profiles table shows the response level ordering as 0, 1, the default response function, the logit, is defined as $\log\left(\frac{p_{Y=0}}{p_{Y=1}}\right)$.

The values of the continuous variable are inserted into the design matrix.

Seven Newton-Raphson iterations are required to find the maximum likelihood estimates.

The analysis of variance table (Output 22.3.5) shows that the model fits since the likelihood ratio goodness-of-fit test is nonsignificant. It also shows that the length of heating time is a significant factor with respect to readiness but that length of soaking time is not.
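The degrees of freedom of the goodness-of-fit test can be recovered by counting, on the assumption that they equal the number of response functions minus the number of estimated parameters. With 19 populations, one logit per population, and three parameters (Intercept, Heat, Soak):

```latex
\mathrm{df}_{\text{Likelihood Ratio}} = 19 \times 1 - 3 = 16
```

This agrees with the Likelihood Ratio row of Output 22.3.5.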

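From the table of maximum likelihood estimates (Output 22.3.6), the fitted model is

$$\operatorname{logit}(\hat{p}) = 5.5592 - 0.0820\,(\text{Heat}) - 0.0568\,(\text{Soak})$$

For example, for Sample 1 with Heat=7 and Soak=1, the estimate is

$$\operatorname{logit}(\hat{p}) = 5.5592 - 0.0820\,(7) - 0.0568\,(1) = 4.9284$$

Predicted values of the logits, as well as the probabilities of readiness, could be obtained by specifying PRED=PROB in the MODEL statement. For the example of Sample 1 with Heat=7 and Soak=1, PRED=PROB would give an estimate of the probability of readiness equal to 0.9928 since

$$\operatorname{logit}(\hat{p}) = \log\left(\frac{\hat{p}}{1-\hat{p}}\right) = 4.9284$$

implies that

$$\hat{p} = \frac{e^{4.9284}}{1 + e^{4.9284}} = 0.9928$$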

As another consideration, since soaking time is nonsignificant, you could fit another model that deleted the variable Soak.
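A minimal sketch of such a reduced fit follows. It is not part of the original example: the title text is made up for illustration, and the POPULATION statement is an added assumption, included so that the 19 Heat-by-Soak populations of the original analysis are retained rather than being collapsed to the four levels of Heat. PRED=PROB requests the predicted probabilities discussed above.

```sas
title 'Logistic Regression: Heat Only';        /* illustrative title, not from the source */
proc catmod data=ingots;
   weight Count;
   population Heat Soak;   /* assumption: keep the same 19 populations as before */
   direct Heat;            /* Heat stays a continuous (direct) effect            */
   model Y=Heat / pred=prob;
quit;
```

With the populations held fixed in this way, the likelihood ratio statistic of the reduced model would be expected to have 19 - 2 = 17 degrees of freedom, so it can be compared with the 16 degrees of freedom of the model that includes Soak.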

Example 22.4. Log-Linear Model, Three Dependent Variables

This analysis reproduces the predicted cell frequencies for Bartlett's data using a log-linear model of no three-variable interaction (Bishop, Fienberg, and Holland 1975, p. 89). Cuttings of two different lengths (Length=short or long) are planted at one of two time points (Time=now or spring), and their survival status (Status=dead or alive) is recorded.

As in the text, the variable levels are simply labeled 1 and 2. The following statements produce Output 22.4.1 through Output 22.4.5:

  data bartlett;
     input Length Time Status wt @@;
     datalines;
  1 1 1 156     1 1 2  84     1 2 1 84     1 2 2 156
  2 1 1 107     2 1 2 133     2 2 1 31     2 2 2 209
  ;

  title 'Bartlett''s Data';
  proc catmod data=bartlett;
     weight wt;
     model Length*Time*Status=_response_
           / noparm pred=freq;
     /* bar notation with @ 2 keeps main effects and two-variable interactions only */
     loglin Length|Time|Status @ 2;
     title2 'Model with No 3-Variable Interaction';
  quit;

Output 22.4.1: Analysis of Bartlett's Data: Log-Linear Model

  Bartlett's Data
  Model with No 3-Variable Interaction

  The CATMOD Procedure

  Data Summary

  Response           Length*Time*Status     Response Levels    8
  Weight Variable    wt                     Populations        1
  Data Set           BARTLETT               Total Frequency  960
  Frequency Missing  0                      Observations       8

  Population Profiles

  Sample    Sample Size
  ---------------------
       1            960

Output 22.4.2: Response Profiles

  Bartlett's Data
  Model with No 3-Variable Interaction

  Response Profiles

  Response    Length    Time    Status
  ------------------------------------
         1    1         1       1
         2    1         1       2
         3    1         2       1
         4    1         2       2
         5    2         1       1
         6    2         1       2
         7    2         2       1
         8    2         2       2

Output 22.4.3: Analysis of Variance Table

  Bartlett's Data
  Model with No 3-Variable Interaction

  Maximum Likelihood Analysis of Variance

  Source               DF   Chi-Square    Pr > ChiSq
  --------------------------------------------------
  Length                1         2.64        0.1041
  Time                  1         5.25        0.0220
  Length*Time           1         5.25        0.0220
  Status                1        48.94        <.0001
  Length*Status         1        48.94        <.0001
  Time*Status           1        95.01        <.0001
  Likelihood Ratio      1         2.29        0.1299

Output 22.4.4: Response Function Predicted Values

  Bartlett's Data
  Model with No 3-Variable Interaction

  The CATMOD Procedure

  Maximum Likelihood Predicted Values for Response Functions

               ------Observed------    ------Predicted-----
  Function                 Standard                Standard
    Number     Function       Error    Function       Error    Residual
  ----------------------------------------------------------------------
         1     -0.29248    0.105806    -0.23565    0.098486    -0.05683
         2     -0.91152    0.129188    -0.94942    0.129948    0.037901
         3     -0.91152    0.129188    -0.94942    0.129948    0.037901
         4     -0.29248    0.105806    -0.23565    0.098486    -0.05683
         5     -0.66951    0.118872    -0.69362    0.120172    0.024113
         6     -0.45199    0.110921     -0.3897    0.102267    -0.06229
         7     -1.90835    0.192465    -1.73146    0.142969    -0.17688

Output 22.4.5: Predicted Frequencies

  Bartlett's Data
  Model with No 3-Variable Interaction

  Maximum Likelihood Predicted Values for Frequencies

                              -------Observed------    ------Predicted-----
                                           Standard                Standard
  Length    Time    Status    Frequency       Error    Frequency      Error    Residual
  --------------------------------------------------------------------------------------
  1         1       1               156    11.43022     161.0961   11.07379    -5.09614
  1         1       2                84    8.754999     78.90386   7.808613    5.096139
  1         2       1                84    8.754999     78.90386   7.808613    5.096139
  1         2       2               156    11.43022     161.0961   11.07379    -5.09614
  2         1       1               107    9.750588     101.9039   8.924304    5.096139
  2         1       2               133    10.70392     138.0961   10.33434    -5.09614
  2         2       1                31     5.47713     36.09614   4.826315    -5.09614
  2         2       2               209    12.78667     203.9039   12.21285     5.09614

The analysis of variance table shows that the model fits since the likelihood ratio test for the three-variable interaction is nonsignificant. All of the two-variable interactions, however, are significant; this shows that there is mutual dependence among all three variables.
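For reference, the model of no three-variable interaction can be written in the usual log-linear notation. The symbols below ($m_{ijk}$ for the expected frequency in cell Length=$i$, Time=$j$, Status=$k$, and the $\lambda$ terms) are standard textbook notation introduced here, not output of the procedure.

```latex
\log m_{ijk} = \mu + \lambda^{L}_{i} + \lambda^{T}_{j} + \lambda^{S}_{k}
             + \lambda^{LT}_{ij} + \lambda^{LS}_{ik} + \lambda^{TS}_{jk}
```

With two levels per variable, each of the six effects contributes one parameter. The 8 response levels yield 7 response functions, so the likelihood ratio test has 7 - 6 = 1 degree of freedom, matching Output 22.4.3; that single degree of freedom corresponds to the omitted term $\lambda^{LTS}_{ijk}$.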

The predicted values table (Output 22.4.4) displays observed and predicted values for the generalized logits. The predicted frequencies table (Output 22.4.5) displays observed and predicted cell frequencies, their standard errors, and residuals.

Example 22.5. Log-Linear Model, Structural and Sampling Zeros

This example illustrates a log-linear model of independence, using data that contain structural zero frequencies as well as sampling (random) zero frequencies.

In a population of six squirrel monkeys, the joint distribution of genital display with respect to active or passive role was observed. The data are from Fienberg (1980, Table 8-2). Since a monkey cannot have both the active and passive roles in the same interaction, the diagonal cells of the table are structural zeros. See Agresti (2002) for more information on the quasi-independence model.

The DATA step replaces the structural zeros with missing values, and the MISSING=STRUCTURAL option is specified in the MODEL statement to remove these zeros from the analysis. The ZERO=SAMPLING option treats the off-diagonal zeros as sampling zeros. Also, the row for Monkey 't' is deleted since it contains all zeros; therefore, the cell frequencies predicted by a model of independence are also zero. In addition, the CONTRAST statements compare the behavior of the two monkeys labeled 'u' and 'v'. See the section "Structural and Sampling Zeros with Raw Data" for information on how to perform this analysis when you have raw data. The following statements produce Output 22.5.1 through Output 22.5.8:

  data Display;
     input Active $ Passive $ wt @@;
     if Active ne 't';
     if Active eq Passive then wt=.;
     datalines;
  r r  0   r s  1   r t  5   r u  8   r v  9   r w  0
  s r 29   s s  0   s t 14   s u 46   s v  4   s w  0
  t r  0   t s  0   t t  0   t u  0   t v  0   t w  0
  u r  2   u s  3   u t  1   u u  0   u v 38   u w  2
  v r  0   v s  0   v t  0   v u  0   v v  0   v w  1
  w r  9   w s 25   w t  4   w u  6   w v 13   w w  0
  ;

  title 'Behavior of Squirrel Monkeys';
  proc catmod data=Display;
     weight wt;
     model Active*Passive=_response_ /
           missing=structural zero=sampling
           freq pred=freq noparm oneway;
     loglin Active Passive;
     contrast 'Passive, U vs. V' Passive 0 0 0 1 -1;
     contrast 'Active,  U vs. V' Active  0 0 1 -1;
     title2 'Test Quasi-Independence for the Incomplete Table';
  quit;

Output 22.5.1: Log-Linear Model Analysis with Zero Frequencies

  Behavior of Squirrel Monkeys
  Test Quasi-Independence for the Incomplete Table

  The CATMOD Procedure

  Data Summary

  Response           Active*Passive     Response Levels   25
  Weight Variable    wt                 Populations        1
  Data Set           DISPLAY            Total Frequency  220
  Frequency Missing  0                  Observations      25

Output 22.5.2: Output from the ONEWAY option

  Behavior of Squirrel Monkeys
  Test Quasi-Independence for the Incomplete Table

  One-Way Frequencies

  Variable    Value   Frequency
  -----------------------------
  Active      r              23
              s              93
              u              46
              v               1
              w              57
  Passive     r              40
              s              29
              t              24
              u              60
              v              64
              w               3

Output 22.5.3: Profiles

  Behavior of Squirrel Monkeys
  Test Quasi-Independence for the Incomplete Table

  Population Profiles

  Sample    Sample Size
  ---------------------
       1            220

  Response Profiles

  Response    Active    Passive
  -----------------------------
         1    r         s
         2    r         t
         3    r         u
         4    r         v
         5    r         w
         6    s         r
         7    s         t
         8    s         u
         9    s         v
        10    s         w
        11    u         r
        12    u         s
        13    u         t
        14    u         v
        15    u         w
        16    v         r
        17    v         s
        18    v         t
        19    v         u
        20    v         w
        21    w         r
        22    w         s
        23    w         t
        24    w         u
        25    w         v

Output 22.5.4: Frequency of Response by Response Number

  Behavior of Squirrel Monkeys
  Test Quasi-Independence for the Incomplete Table

  Response Frequencies

                              Response Number
  Sample        1        2        3        4        5        6        7        8
  ------------------------------------------------------------------------------
       1        1        5        8        9        0       29       14       46

                              Response Number
  Sample        9       10       11       12       13       14       15       16
  ------------------------------------------------------------------------------
       1        4        0        2        3        1       38        2        0

                              Response Number
  Sample       17       18       19       20       21       22       23       24
  ------------------------------------------------------------------------------
       1        0        0        0        1        9       25        4        6

            Response
             Number
  Sample       25
  ---------------
       1       13

Output 22.5.5: Analysis of Variance Table

  Behavior of Squirrel Monkeys
  Test Quasi-Independence for the Incomplete Table

  Maximum Likelihood Analysis of Variance

  Source               DF   Chi-Square    Pr > ChiSq
  --------------------------------------------------
  Active                4        56.58        <.0001
  Passive               5        47.94        <.0001
  Likelihood Ratio     15       135.17        <.0001

Output 22.5.6: Contrasts between Monkeys 'u' and 'v'

  Behavior of Squirrel Monkeys
  Test Quasi-Independence for the Incomplete Table

  Contrasts of Maximum Likelihood Estimates

  Contrast            DF    Chi-Square    Pr > ChiSq
  --------------------------------------------------
  Passive, U vs. V     1          1.31        0.2524
  Active,  U vs. V     1         14.87        0.0001

Output 22.5.7: Response Function Predicted Values

  Behavior of Squirrel Monkeys
  Test Quasi-Independence for the Incomplete Table

  The CATMOD Procedure

  Maximum Likelihood Predicted Values for Response Functions

              ------Observed------    ------Predicted-----
  Function                Standard                Standard
    Number    Function       Error    Function       Error    Residual
  --------------------------------------------------------------------
         1    -2.56495    1.037749    -0.97355    0.339019     -1.5914
         2    -0.95551    0.526235    -1.72504    0.345438    0.769529
         3    -0.48551    0.449359    -0.52751    0.309254    0.042007
         4    -0.36772    0.433629    -0.73927    0.249006    0.371543
         5           .           .    -3.56052    0.634104           .
         6    0.802346    0.333775    0.320589     0.26629    0.481758
         7    0.074108    0.385164    -0.29934    0.295634     0.37345
         8    1.263692    0.314105    0.898184    0.250857    0.365508
         9    -1.17865    0.571772    0.686431    0.173396    -1.86509
        10           .           .    -2.13482    0.608071           .
        11     -1.8718    0.759555     -0.2415    0.287218    -1.63031
        12    -1.46634    0.640513    -0.10994    0.303568     -1.3564
        13    -2.56495    1.037749    -0.86143    0.314794    -1.70352
        14    1.072637    0.321308    0.124346    0.204345     0.94829
        15     -1.8718    0.759555     -2.6969    0.617433      0.8251
        16           .           .    -4.14787    1.024508           .
        17           .           .    -4.01632    1.030062           .
        18           .           .    -4.76781    1.032457           .
        19           .           .    -3.57028    1.020794           .
        20    -2.56495    1.037749    -6.60328    1.161289    4.038332
        21    -0.36772    0.433629    -0.36584    0.202959    -0.00188
        22    0.653926     0.34194    -0.23429    0.232794    0.888212
        23    -1.17865    0.571772    -0.98577    0.239408    -0.19288
        24    -0.77319    0.493548    0.211754    0.185007    -0.98494

Output 22.5.8: Predicted Frequencies

  Behavior of Squirrel Monkeys
  Test Quasi-Independence for the Incomplete Table

  Maximum Likelihood Predicted Values for Frequencies

                       ------Observed------    ------Predicted-----
                                   Standard                Standard
  Active    Passive    Frequency      Error    Frequency      Error    Residual
  -----------------------------------------------------------------------------
  r         s                  1   0.997725     5.259508    1.36156    -4.25951
  r         t                  5   2.210512     2.480726   0.691066    2.519274
  r         u                  8   2.776525     8.215948   1.855146    -0.21595
  r         v                  9   2.937996     6.648049    1.50932    2.351951
  r         w                  0          0     0.395769   0.240268    -0.39577
  s         r                 29   5.017696     19.18599   3.147915    9.814007
  s         t                 14   3.620648     10.32172   2.169599    3.678284
  s         u                 46   6.031734     34.18463   4.428706    11.81537
  s         v                  4   1.981735     27.66096   3.722788     -23.661
  s         w                  0          0       1.6467   0.952712     -1.6467
  u         r                  2   1.407771      10.9364    2.12322     -8.9364
  u         s                  3   1.720201     12.47407   2.554336    -9.47407
  u         t                  1   0.997725     5.883583   1.380655    -4.88358
  u         v                 38   5.606814      15.7673   2.684692     22.2327
  u         w                  2   1.407771     0.938652   0.551645    1.061348
  v         r                  0          0     0.219966   0.221779    -0.21997
  v         s                  0          0     0.250893   0.253706    -0.25089
  v         t                  0          0     0.118338   0.120314    -0.11834
  v         u                  0          0     0.391924   0.393255    -0.39192
  v         w                  1   0.997725     0.018879   0.021728    0.981121
  w         r                  9   2.937996     9.657645   1.808656    -0.65765
  w         s                 25   4.707344     11.01553   2.275019    13.98447
  w         t                  4   1.981735     5.195638   1.184452    -1.19564
  w         u                  6   2.415857      17.2075   2.772098    -11.2075
  w         v                 13   3.497402     13.92369    2.24158    -0.92369

The results of the ONEWAY option are shown in Output 22.5.2. Monkey 't' does not show up as a value for the Active variable since that row was removed.

Sampling zeros are displayed as 0 in Output 22.5.4. The Response Number corresponds to the value displayed in the Response Profiles in Output 22.5.3.

The analysis of variance table (Output 22.5.5) shows that the model of independence does not fit since the likelihood ratio test for the interaction is significant. In other words, active and passive behaviors of the squirrel monkeys are dependent behavior roles.

If the model fit these data, the contrasts in Output 22.5.6 would indicate that monkeys 'u' and 'v' have similar passive behavior patterns but very different active behavior patterns.

Output 22.5.7 displays the predicted response functions and Output 22.5.8 displays predicted cell frequencies (from the PRED=FREQ option), but since the model does not fit, these should be ignored. Note that, since the response function is the generalized logit with the twenty-fifth response as the baseline, the observed response functions for the sampling zeros are missing.
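The observed response functions in Output 22.5.7 can be checked by hand. The following Python sketch is illustrative only and is not part of the SAS example: it assumes that each generalized logit is the log of a cell count divided by the count of the last (twenty-fifth) response, and it uses the frequencies from Output 22.5.4. The function name `generalized_logits` is hypothetical.

```python
import math

# Observed cell counts in response-profile order (Output 22.5.4).
# The structural zeros (diagonal cells) are already excluded.
freqs = [1, 5, 8, 9, 0, 29, 14, 46, 4, 0, 2, 3, 1, 38, 2,
         0, 0, 0, 0, 1, 9, 25, 4, 6, 13]

def generalized_logits(counts):
    """Log of each count relative to the last (baseline) count.

    A zero count has no finite logarithm, so it is returned as None,
    which mirrors the missing observed functions for sampling zeros.
    """
    baseline = counts[-1]
    return [math.log(c / baseline) if c > 0 else None for c in counts[:-1]]

logits = generalized_logits(freqs)
print(logits[0])   # response 1 (r, s): log(1/13)
print(logits[5])   # response 6 (s, r): log(29/13)
print(logits[4])   # response 5 (r, w): sampling zero, so no observed logit
```

Responses 5, 10, and 16 through 19 have zero counts, which is why those rows show missing observed functions while still receiving predicted values.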

Structural and Sampling Zeros with Raw Data

The preceding PROC CATMOD step uses cell count data as input. Prior to invoking the CATMOD procedure, structural and sampling zeros are easily identified and manipulated in a single DATA step. For the situation where structural or sampling zeros (or both) may exist and the input data set is raw data, use the following steps:

Run PROC FREQ on the raw data. In the TABLES statement, list all dependent and independent variables separated by asterisks and use the SPARSE option and the OUT= option. This creates an output data set that contains all possible zero frequencies. Since the tabled output can be huge, you should also specify the NOPRINT option on the TABLES statement.
Use a DATA step to change the zero frequencies associated with either sampling zeros or structural zeros to missing.
Use the resulting data set as input to PROC CATMOD, specify the statement WEIGHT COUNT to use adjusted frequencies, and specify the ZERO= and MISSING= options to define your sampling and structural zeros.

For example, suppose the data set RawDisplay contains the raw data for the squirrel monkey data. The following statements show how to obtain the same analysis as shown previously:

  proc freq data=RawDisplay;
     tables Active*Passive / sparse out=Combos noprint;
  run;

  data Combos2;
     set Combos;
     if Active ne 't';
     if Active eq Passive then count=.;
  run;

  proc catmod data=Combos2;
     weight count;
     model Active*Passive=_response_ /
           zero=sampling missing=structural
           freq pred=freq noparm noresponse;
     loglin Active Passive;
  quit;

The first IF statement in the DATA step is needed only for this particular example; since observations for Monkey 't' were deleted from the Display data set, they also need to be deleted from Combos2.
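For readers who want to see the logic of these steps outside SAS, the following Python sketch mimics them on a tiny, made-up list of raw records. The `raw` list and the `table` dictionary are hypothetical illustrations, not the actual monkey data: every Active by Passive combination is enumerated (the role of SPARSE), diagonal cells are marked missing (structural zeros), and unobserved off-diagonal cells are kept as 0 (sampling zeros).

```python
from collections import Counter
from itertools import product

# Hypothetical raw records: one (active, passive) pair per observed display.
raw = [("r", "s"), ("s", "r"), ("s", "r"), ("u", "v"), ("w", "s")]
monkeys = ["r", "s", "t", "u", "v", "w"]

counts = Counter(raw)

table = {}
for active, passive in product(monkeys, repeat=2):
    if active == "t":
        # Row removed, as in the first IF statement of the DATA step.
        continue
    if active == passive:
        # Structural zero: recorded as missing rather than as a count.
        table[(active, passive)] = None
    else:
        # Unobserved off-diagonal cells stay at 0 (sampling zeros).
        table[(active, passive)] = counts.get((active, passive), 0)

for cell, count in table.items():
    print(cell, count)
```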

Example 22.6. Repeated Measures, 2 Response Levels, 3 Populations

In this multi-population repeated measures example, from Guthrie (1981), subjects from three groups have their responses (0 or 1) recorded in each of four trials. The analysis of the marginal probabilities is directed at assessing the main effects of the repeated measurement factor (Trial) and the independent variable (Group), as well as their interaction. Although the contingency table is incomplete (only thirteen of the sixteen possible responses are observed), this poses no problem in the computation of the marginal probabilities. The following statements produce Output 22.6.1 through Output 22.6.5:

  data group;
     input a b c d Group wt @@;
     datalines;
  1 1 1 1 2 2     0 0 0 0 2 2     0 0 1 0 1 2     0 0 1 0 2 2
  0 0 0 1 1 4     0 0 0 1 2 1     0 0 0 1 3 3     1 0 0 1 2 1
  0 0 1 1 1 1     0 0 1 1 2 2     0 0 1 1 3 5     0 1 0 0 1 4
  0 1 0 0 2 1     0 1 0 1 2 1     0 1 0 1 3 2     0 1 1 0 3 1
  1 0 0 0 1 3     1 0 0 0 2 1     0 1 1 1 2 1     0 1 1 1 3 2
  1 0 1 0 1 1     1 0 1 1 2 1     1 0 1 1 3 2
  ;

  title 'Multi-Population Repeated Measures';
  proc catmod data=group;
     weight wt;
     response marginals;
     model a*b*c*d=Group _response_ Group*_response_
           / freq;
     repeated Trial 4;
     title2 'Saturated Model';
  run;

Output 22.6.1: Analysis of Multiple-Population Repeated Measures

  Multi-Population Repeated Measures
  Saturated Model

  The CATMOD Procedure

  Data Summary

  Response           a*b*c*d     Response Levels  13
  Weight Variable    wt          Populations       3
  Data Set           GROUP       Total Frequency  45
  Frequency Missing  0           Observations     23

  Population Profiles

  Sample    Group    Sample Size
  ------------------------------
       1    1                 15
       2    2                 15
       3    3                 15

Output 22.6.2: Response Profiles

  Multi-Population Repeated Measures
  Saturated Model

  Response Profiles

  Response    a    b    c    d
  ----------------------------
         1    0    0    0    0
         2    0    0    0    1
         3    0    0    1    0
         4    0    0    1    1
         5    0    1    0    0
         6    0    1    0    1
         7    0    1    1    0
         8    0    1    1    1
         9    1    0    0    0
        10    1    0    0    1
        11    1    0    1    0
        12    1    0    1    1
        13    1    1    1    1

Output 22.6.3: Response Frequencies

  Multi-Population Repeated Measures
  Saturated Model

  Response Frequencies

                              Response Number
  Sample        1        2        3        4        5        6        7        8
  ------------------------------------------------------------------------------
       1        0        4        2        1        4        0        0        0
       2        2        1        2        2        1        1        0        1
       3        0        3        0        5        0        2        1        2

                     Response Number
  Sample        9       10       11       12       13
  ---------------------------------------------------
       1        3        0        1        0        0
       2        1        1        0        1        2
       3        0        0        0        2        0

Output 22.6.4: Analysis of Variance Table

  Multi-Population Repeated Measures
  Saturated Model

  Analysis of Variance

  Source               DF   Chi-Square    Pr > ChiSq
  --------------------------------------------------
  Intercept             1       354.88        <.0001
  Group                 2        24.79        <.0001
  Trial                 3        21.45        <.0001
  Group*Trial           6        18.71        0.0047
  Residual              0          .           .

Output 22.6.5: Parameter Estimates

  Multi-Population Repeated Measures
  Saturated Model

  Analysis of Weighted Least Squares Estimates

                                          Standard        Chi-
  Effect          Parameter    Estimate      Error      Square    Pr > ChiSq
  --------------------------------------------------------------------------
  Intercept               1      0.5833     0.0310      354.88        <.0001
  Group                   2      0.1333     0.0335       15.88        <.0001
                          3     -0.0333     0.0551        0.37        0.5450
  Trial                   4      0.1722     0.0557        9.57        0.0020
                          5      0.1056     0.0647        2.66        0.1028
                          6     -0.0722     0.0577        1.57        0.2107
  Group*Trial             7     -0.1556     0.0852        3.33        0.0679
                          8     -0.0556     0.0800        0.48        0.4877
                          9     -0.0889     0.0953        0.87        0.3511
                         10      0.0111     0.0866        0.02        0.8979
                         11      0.0889     0.0822        1.17        0.2793
                         12     -0.0111     0.0824        0.02        0.8927

The analysis of variance table in Output 22.6.4 shows that there is a significant interaction between the independent variable Group and the repeated measurement factor Trial. Thus, an intermediate model (not shown) is fit in which the effects Trial and Group*Trial are replaced by Trial(Group=1), Trial(Group=2), and Trial(Group=3). Of these three effects, only the last is significant, so it is retained in the final model. The following statements produce Output 22.6.6 and Output 22.6.7:

     model a*b*c*d=Group _response_(Group=3)
           / noprofile noparm design;
     title2 'Trial Nested within Group 3';
  quit;

Output 22.6.6: Final Model: Design Matrix

  Multi-Population Repeated Measures
  Trial Nested within Group 3

  The CATMOD Procedure

  Data Summary

  Response           a*b*c*d     Response Levels  13
  Weight Variable    wt          Populations       3
  Data Set           GROUP       Total Frequency  45
  Frequency Missing  0           Observations     23

  Response Functions and Design Matrix

            Function    Response                 Design Matrix
  Sample      Number    Function      1      2      3      4      5      6
  ------------------------------------------------------------------------
       1           1     0.73333      1      1      0      0      0      0
                   2     0.73333      1      1      0      0      0      0
                   3     0.73333      1      1      0      0      0      0
                   4     0.66667      1      1      0      0      0      0
       2           1     0.66667      1      0      1      0      0      0
                   2     0.66667      1      0      1      0      0      0
                   3     0.46667      1      0      1      0      0      0
                   4     0.40000      1      0      1      0      0      0
       3           1     0.86667      1     -1     -1      1      0      0
                   2     0.66667      1     -1     -1      0      1      0
                   3     0.33333      1     -1     -1      0      0      1
                   4     0.06667      1     -1     -1     -1     -1     -1

Output 22.6.7: ANOVA Table

  Multi-Population Repeated Measures
  Trial Nested within Group 3

  Analysis of Variance

  Source                  DF   Chi-Square    Pr > ChiSq
  -----------------------------------------------------
  Intercept                1       386.94        <.0001
  Group                    2        25.42        <.0001
  Trial(Group=3)           3        75.07        <.0001
  Residual                 6         5.09        0.5319

Output 22.6.6 displays the design matrix resulting from retaining the nested effect.

The residual goodness-of-fit statistic tests the joint effect of Trial(Group=1) and Trial(Group=2). The analysis of variance table in Output 22.6.7 shows that the final model fits, that there is a significant Group effect, and that there is a significant Trial effect in Group 3.
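The response functions listed in Output 22.6.6 can be reproduced directly from the input data. The Python sketch below is an illustration outside the SAS example. It assumes that each marginal response function is the weighted proportion of subjects in a group whose response on a given trial is 0 (the first response level); the helper name `marginal_zero_probs` is hypothetical.

```python
# (a, b, c, d, Group, wt) records copied from the DATA step for this example.
records = [
    (1, 1, 1, 1, 2, 2), (0, 0, 0, 0, 2, 2), (0, 0, 1, 0, 1, 2), (0, 0, 1, 0, 2, 2),
    (0, 0, 0, 1, 1, 4), (0, 0, 0, 1, 2, 1), (0, 0, 0, 1, 3, 3), (1, 0, 0, 1, 2, 1),
    (0, 0, 1, 1, 1, 1), (0, 0, 1, 1, 2, 2), (0, 0, 1, 1, 3, 5), (0, 1, 0, 0, 1, 4),
    (0, 1, 0, 0, 2, 1), (0, 1, 0, 1, 2, 1), (0, 1, 0, 1, 3, 2), (0, 1, 1, 0, 3, 1),
    (1, 0, 0, 0, 1, 3), (1, 0, 0, 0, 2, 1), (0, 1, 1, 1, 2, 1), (0, 1, 1, 1, 3, 2),
    (1, 0, 1, 0, 1, 1), (1, 0, 1, 1, 2, 1), (1, 0, 1, 1, 3, 2),
]

def marginal_zero_probs(group):
    """Weighted proportion of response 0 on each of the four trials."""
    rows = [r for r in records if r[4] == group]
    total = sum(r[5] for r in rows)
    return [sum(r[5] for r in rows if r[trial] == 0) / total
            for trial in range(4)]

for group in (1, 2, 3):
    print(group, [round(p, 5) for p in marginal_zero_probs(group)])
```

Only Group 3 shows a steady decline across the four trials, which agrees with retaining Trial(Group=3) as the only nested trial effect.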

Example 22.7. Repeated Measures, 4 Response Levels, 1 Population

This example illustrates a repeated measurement analysis in which there are more than two levels of response. In this study, from Grizzle, Starmer, and Koch (1969, p. 493), 7,477 women aged 30-39 are tested for vision in both right and left eyes. Since there are four response levels for each dependent variable, the RESPONSE statement computes three marginal probabilities for each dependent variable, resulting in six response functions for analysis. Since the model contains a repeated measurement factor (Side) with two levels (Right, Left), PROC CATMOD groups the functions into sets of three (=6/2). Therefore, the Side effect has three degrees of freedom (one for each marginal probability), and it is the appropriate test of marginal homogeneity. The following statements produce Output 22.7.1 through Output 22.7.6:

  title 'Vision Symmetry';
  data vision;
     input Right Left count @@;
     datalines;
  1 1 1520    1 2  266    1 3  124    1 4   66
  2 1  234    2 2 1512    2 3  432    2 4   78
  3 1  117    3 2  362    3 3 1772    3 4  205
  4 1   36    4 2   82    4 3  179    4 4  492
  ;

  proc catmod data=vision;
     weight count;
     response marginals;
     model Right*Left=_response_ / freq design;
     repeated Side 2;
     title2 'Test of Marginal Homogeneity';
  quit;

Output 22.7.1: Vision Study: Analysis of Marginal Homogeneity

  Vision Symmetry
  Test of Marginal Homogeneity

  The CATMOD Procedure

  Data Summary

  Response           Right*Left     Response Levels    16
  Weight Variable    count          Populations         1
  Data Set           VISION         Total Frequency  7477
  Frequency Missing  0              Observations       16

  Population Profiles

  Sample    Sample Size
  ---------------------
       1           7477

Output 22.7.2: Response Profiles

  Test of Marginal Homogeneity

  Response Profiles

  Response    Right    Left
  -------------------------
         1    1        1
         2    1        2
         3    1        3
         4    1        4
         5    2        1
         6    2        2
         7    2        3
         8    2        4
         9    3        1
        10    3        2
        11    3        3
        12    3        4
        13    4        1
        14    4        2
        15    4        3
        16    4        4

Output 22.7.3: Response Frequencies

  Test of Marginal Homogeneity

  Response Frequencies

                              Response Number
  Sample        1        2        3        4        5        6        7        8
  ------------------------------------------------------------------------------
       1     1520      266      124       66      234     1512      432       78

                              Response Number
  Sample        9       10       11       12       13       14       15       16
  ------------------------------------------------------------------------------
       1      117      362     1772      205       36       82      179      492

Output 22.7.4: Design Matrix

  Test of Marginal Homogeneity

  Response Functions and Design Matrix

            Function    Response                 Design Matrix
  Sample      Number    Function      1      2      3      4      5      6
  ------------------------------------------------------------------------
       1           1     0.26428      1      0      0      1      0      0
                   2     0.30173      0      1      0      0      1      0
                   3     0.32847      0      0      1      0      0      1
                   4     0.25505      1      0      0     -1      0      0
                   5     0.29718      0      1      0      0     -1      0
                   6     0.33529      0      0      1      0      0     -1

Output 22.7.5: ANOVA Table

  Test of Marginal Homogeneity

  Analysis of Variance

  Source         DF   Chi-Square    Pr > ChiSq
  --------------------------------------------
  Intercept       3     78744.17        <.0001
  Side            3        11.98        0.0075
  Residual        0          .           .

Output 22.7.6: Parameter Estimates

  Test of Marginal Homogeneity

  Analysis of Weighted Least Squares Estimates

                                        Standard        Chi-
  Effect        Parameter    Estimate      Error      Square    Pr > ChiSq
  ------------------------------------------------------------------------
  Intercept             1      0.2597    0.00468     3073.03        <.0001
                        2      0.2995    0.00464     4160.17        <.0001
                        3      0.3319    0.00483     4725.25        <.0001
  Side                  4     0.00461    0.00194        5.65        0.0174
                        5     0.00227    0.00255        0.80        0.3726
                        6    -0.00341    0.00252        1.83        0.1757

The analysis of variance table in Output 22.7.5 shows that the Side effect is significant, so there is not marginal homogeneity between left-eye vision and right-eye vision. In other words, the distribution of the quality of right-eye vision differs significantly from the quality of left-eye vision in the same subjects. The test of the Side effect is equivalent to Bhapkar's test (Agresti 1990).
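As a rough check on Output 22.7.4 and Output 22.7.6, the six marginal probabilities can be recomputed from the 4 by 4 table of counts. The Python sketch below is illustrative only. It also relies on the assumption that, because this model is saturated, the Side estimates equal half the difference between the right-eye and left-eye marginals for each of the first three vision grades.

```python
# Rows are Right-eye grades 1-4, columns are Left-eye grades 1-4.
counts = [
    [1520,  266,  124,   66],
    [ 234, 1512,  432,   78],
    [ 117,  362, 1772,  205],
    [  36,   82,  179,  492],
]

total = sum(sum(row) for row in counts)

# Marginal probabilities for grades 1-3 (the fourth is redundant).
right = [sum(counts[i]) / total for i in range(3)]
left = [sum(counts[i][j] for i in range(4)) / total for j in range(3)]

# With the +1/-1 coding in the design matrix, the saturated fit gives
# intercept = average of the two sides and Side = half their difference.
intercepts = [(r + l) / 2 for r, l in zip(right, left)]
side = [(r - l) / 2 for r, l in zip(right, left)]

print([round(p, 5) for p in right])
print([round(p, 5) for p in left])
print([round(s, 5) for s in side])
```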

Example 22.8. Repeated Measures, Logistic Analysis of Growth Curve

The data, from a longitudinal study reported in Koch et al. (1977), are from patients in four populations (2 diagnostic groups × 2 treatments) who are measured at three times to assess their response (n=normal or a=abnormal) to treatment.

  title 'Growth Curve Analysis';
  data growth2;
     input Diagnosis $ Treatment $ week1 $ week2 $ week4 $ count @@;
     datalines;
  mild std n n n 16    severe std n n n  2
  mild std n n a 13    severe std n n a  2
  mild std n a n  9    severe std n a n  8
  mild std n a a  3    severe std n a a  9
  mild std a n n 14    severe std a n n  9
  mild std a n a  4    severe std a n a 15
  mild std a a n 15    severe std a a n 27
  mild std a a a  6    severe std a a a 28
  mild new n n n 31    severe new n n n  7
  mild new n n a  0    severe new n n a  2
  mild new n a n  6    severe new n a n  5
  mild new n a a  0    severe new n a a  2
  mild new a n n 22    severe new a n n 31
  mild new a n a  2    severe new a n a  5
  mild new a a n  9    severe new a a n 32
  mild new a a a  0    severe new a a a  6
  ;

The analysis is directed at assessing the effect of the repeated measurement factor, Time, as well as the independent variables, Diagnosis (mild or severe) and Treatment (std or new). The RESPONSE statement is used to compute the logits of the marginal probabilities. The times used in the design matrix (0, 1, 2) correspond to the logarithms (base 2) of the actual times (1, 2, 4). The following statements produce Output 22.8.1 through Output 22.8.7:

  proc catmod data=growth2 order=data;
     title2 'Reduced Logistic Model';
     weight count;
     population Diagnosis Treatment;
     response logit;
     model week1*week2*week4=(1 0 0 0,  /* mild, std */
                              1 0 1 0,
                              1 0 2 0,
                              1 0 0 0,  /* mild, new */
                              1 0 0 1,
                              1 0 0 2,
                              0 1 0 0,  /* severe, std */
                              0 1 1 0,
                              0 1 2 0,
                              0 1 0 0,  /* severe, new */
                              0 1 0 1,
                              0 1 0 2)
           (1='Mild diagnosis, week 1',
            2='Severe diagnosis, week 1',
            3='Time effect for std trt',
            4='Time effect for new trt')
           / freq design;
     contrast 'Diagnosis effect, week 1' all_parms 1 -1 0 0;
     contrast 'Equal time effects' all_parms 0 0 1 -1;
  quit;

Output 22.8.1: Logistic Analysis of Growth Curve

  Growth Curve Analysis
  Reduced Logistic Model

  The CATMOD Procedure

  Data Summary

  Response           week1*week2*week4     Response Levels    8
  Weight Variable    count                 Populations        4
  Data Set           GROWTH2               Total Frequency  340
  Frequency Missing  0                     Observations      29

Output 22.8.2: Population and Response Profiles

  Growth Curve Analysis
  Reduced Logistic Model

  Population Profiles

  Sample    Diagnosis    Treatment    Sample Size
  -----------------------------------------------
       1    mild         std                   80
       2    mild         new                   70
       3    severe       std                  100
       4    severe       new                   90

  Response Profiles

  Response    week1    week2    week4
  -----------------------------------
         1    n        n        n
         2    n        n        a
         3    n        a        n
         4    n        a        a
         5    a        n        n
         6    a        n        a
         7    a        a        n
         8    a        a        a

Output 22.8.3: Response Frequencies

  Growth Curve Analysis
  Reduced Logistic Model

  Response Frequencies

                              Response Number
  Sample        1        2        3        4        5        6        7        8
  ------------------------------------------------------------------------------
       1       16       13        9        3       14        4       15        6
       2       31        0        6        0       22        2        9        0
       3        2        2        8        9        9       15       27       28
       4        7        2        5        2       31        5       32        6

Output 22.8.4: Design Matrix

  Growth Curve Analysis
  Reduced Logistic Model

  Response Functions and Design Matrix

            Function     Response            Design Matrix
  Sample      Number     Function        1        2        3        4
  -------------------------------------------------------------------
       1           1      0.05001        1        0        0        0
                   2      0.35364        1        0        1        0
                   3      0.73089        1        0        2        0
       2           1      0.11441        1        0        0        0
                   2      1.29928        1        0        0        1
                   3      3.52636        1        0        0        2
       3           1     -1.32493        0        1        0        0
                   2     -0.94446        0        1        1        0
                   3     -0.16034        0        1        2        0
       4           1     -1.53148        0        1        0        0
                   2      0.00000        0        1        0        1
                   3      1.60944        0        1        0        2

Output 22.8.5: Analysis of Variance

  Growth Curve Analysis
  Reduced Logistic Model

  Analysis of Variance

  Source                       DF   Chi-Square    Pr > ChiSq
  ----------------------------------------------------------
  Mild diagnosis, week 1        1         0.28        0.5955
  Severe diagnosis, week 1      1       100.48        <.0001
  Time effect for std trt       1        26.35        <.0001
  Time effect for new trt       1       125.09        <.0001
  Residual                      8         4.20        0.8387

Output 22.8.6: Parameter Estimates

  Growth Curve Analysis
  Reduced Logistic Model

  Analysis of Weighted Least Squares Estimates

                                    Standard        Chi-
  Effect    Parameter    Estimate      Error      Square    Pr > ChiSq
  --------------------------------------------------------------------
  Model             1     -0.0716     0.1348        0.28        0.5955
                    2     -1.3529     0.1350      100.48        <.0001
                    3      0.4944     0.0963       26.35        <.0001
                    4      1.4552     0.1301      125.09        <.0001

Output 22.8.7: Contrasts

  Growth Curve Analysis
  Reduced Logistic Model

  Analysis of Contrasts

  Contrast                   DF    Chi-Square    Pr > ChiSq
  ---------------------------------------------------------
  Diagnosis effect, week 1    1         77.02        <.0001
  Equal time effects          1         59.12        <.0001

The samples and the response numbers are defined in Output 22.8.2, and Output 22.8.3 displays the frequency distribution of the response numbers within the samples. Output 22.8.4 displays the design matrix specified in the MODEL statement, and the observed logits of the marginal probabilities are displayed in the Response Function column.

The analysis of variance table (Output 22.8.5) shows that the data can be adequately modeled by two parameters that represent diagnosis effects at week 1 and two log-linear time effects (one for each treatment). Both of the time effects are significant.

The analysis of contrasts (Output 22.8.7) shows that the diagnosis effect at week 1 is highly significant. In Output 22.8.6, since the estimate of the logit for the severe diagnosis effect (parameter 2) is more negative than it is for the mild diagnosis effect (parameter 1), there is a smaller predicted probability of the first response (normal) for the severe diagnosis group. In other words, those subjects with a severe diagnosis have a significantly higher probability of abnormal response at week 1 than those subjects with a mild diagnosis.

The analysis of contrasts also shows that the time effect for the standard treatment is significantly different from the one for the new treatment. The table of parameter estimates (Output 22.8.6) shows that the time effect for the new treatment (parameter 4, estimate 1.4552) is stronger than it is for the standard treatment (parameter 3, estimate 0.4944).
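The interpretation above can be checked numerically. The following Python sketch (not part of the SAS program) multiplies each design-matrix row from Output 22.8.4 by the rounded estimates printed in Output 22.8.6 and converts the predicted logits to probabilities of the first response (normal). The sample labels and the helper name `inv_logit` are assumptions made for illustration; the labels are inferred from the ANOVA source names.

```python
import math

# Parameter estimates as printed in Output 22.8.6 (rounded to four decimals).
beta = [-0.0716, -1.3529, 0.4944, 1.4552]

# Design-matrix rows from Output 22.8.4, three rows (time codes 0, 1, 2) per sample.
# The labels are inferred from the ANOVA source names; they are not SAS output.
design = {
    "mild, std trt":   [[1, 0, 0, 0], [1, 0, 1, 0], [1, 0, 2, 0]],
    "mild, new trt":   [[1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 0, 2]],
    "severe, std trt": [[0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 2, 0]],
    "severe, new trt": [[0, 1, 0, 0], [0, 1, 0, 1], [0, 1, 0, 2]],
}


def inv_logit(x):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Predicted logit = design row times the parameter vector.
predicted_logit = {
    label: [sum(x * b for x, b in zip(row, beta)) for row in rows]
    for label, rows in design.items()
}
predicted_prob = {
    label: [inv_logit(v) for v in logits]
    for label, logits in predicted_logit.items()
}

for label, probs in predicted_prob.items():
    print(label, [round(p, 3) for p in probs])
```

At the first time point the predicted probability of a normal response is about 0.48 for the mild diagnosis group and about 0.21 for the severe group, and the probabilities rise faster under the new treatment, which agrees with the discussion of parameters 1 through 4.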

Example 22.9. Repeated Measures, Two Repeated Measurement Factors

This example, from MacMillan et al. (1981), illustrates a repeated measurement analysis in which there are two repeated measurement factors. Two diagnostic procedures (standard and test) are performed on each subject, and the results of both are evaluated at each of two times as being positive or negative.

  title 'Diagnostic Procedure Comparison';
  data a;
     input std1 $ test1 $ std2 $ test2 $ wt @@;
     datalines;
  neg neg neg neg 509  neg neg neg pos  4  neg neg pos neg  17
  neg neg pos pos   3  neg pos neg neg 13  neg pos neg pos   8
  neg pos pos pos   8  pos neg neg neg 14  pos neg neg pos   1
  pos neg pos neg  17  pos neg pos pos  9  pos pos neg neg   7
  pos pos neg pos   4  pos pos pos neg  9  pos pos pos pos 170
  ;

For the initial model, the response functions are marginal probabilities, and the repeated measurement factors are Time and Treatment. The model is a saturated one, containing effects for Time, Treatment, and Time*Treatment. The following statements produce Output 22.9.1 through Output 22.9.5:

  proc catmod data=a;
     title2 'Marginal Symmetry, Saturated Model';
     weight wt;
     response marginals;
     model std1*test1*std2*test2=_response_ / freq design noparm;
     repeated Time 2, Treatment 2 / _response_=Time Treatment
                                               Time*Treatment;
  run;

Output 22.9.1: Diagnosis Data, Two Repeated Measurement Factors

  Diagnostic Procedure Comparison
  Marginal Symmetry, Saturated Model

  The CATMOD Procedure

  Data Summary

  Response           std1*test1*std2*test2     Response Levels   15
  Weight Variable    wt                        Populations        1
  Data Set           A                         Total Frequency  793
  Frequency Missing  0                         Observations      15

  Population Profiles

  Sample    Sample Size
  ---------------------
     1              793

Output 22.9.2: Response Profiles

  Diagnostic Procedure Comparison
  Marginal Symmetry, Saturated Model

  Response Profiles

  Response    std1    test1    std2    test2
  ------------------------------------------
      1       neg     neg      neg     neg
      2       neg     neg      neg     pos
      3       neg     neg      pos     neg
      4       neg     neg      pos     pos
      5       neg     pos      neg     neg
      6       neg     pos      neg     pos
      7       neg     pos      pos     pos
      8       pos     neg      neg     neg
      9       pos     neg      neg     pos
     10       pos     neg      pos     neg
     11       pos     neg      pos     pos
     12       pos     pos      neg     neg
     13       pos     pos      neg     pos
     14       pos     pos      pos     neg
     15       pos     pos      pos     pos

Output 22.9.3: Response Frequencies

  Diagnostic Procedure Comparison
  Marginal Symmetry, Saturated Model

  Response Frequencies

                                 Response Number
  Sample        1        2        3        4        5        6        7        8
  ------------------------------------------------------------------------------
     1        509        4       17        3       13        8        8       14

  Response Frequencies

                            Response Number
  Sample        9       10       11       12       13       14       15
  ---------------------------------------------------------------------
     1          1       17        9        7        4        9      170

Output 22.9.4: Design Matrix

  Diagnostic Procedure Comparison
  Marginal Symmetry, Saturated Model

  Response Functions and Design Matrix

            Function      Response               Design Matrix
  Sample     Number       Function        1        2        3        4
  --------------------------------------------------------------------
     1          1          0.70870        1        1        1        1
                2          0.72383        1        1       -1       -1
                3          0.70618        1       -1        1       -1
                4          0.73897        1       -1       -1        1
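The four response functions in Output 22.9.4 are the marginal probabilities of the first response level (neg) for std1, test1, std2, and test2. As a rough check outside SAS, the following Python sketch recomputes them from the cell counts in the DATA step. The dictionary layout and the function name `marginal_neg` are illustrative assumptions, not part of PROC CATMOD.

```python
# Cell counts from the DATA step, keyed by (std1, test1, std2, test2).
counts = {
    ("neg", "neg", "neg", "neg"): 509, ("neg", "neg", "neg", "pos"): 4,
    ("neg", "neg", "pos", "neg"): 17,  ("neg", "neg", "pos", "pos"): 3,
    ("neg", "pos", "neg", "neg"): 13,  ("neg", "pos", "neg", "pos"): 8,
    ("neg", "pos", "pos", "pos"): 8,   ("pos", "neg", "neg", "neg"): 14,
    ("pos", "neg", "neg", "pos"): 1,   ("pos", "neg", "pos", "neg"): 17,
    ("pos", "neg", "pos", "pos"): 9,   ("pos", "pos", "neg", "neg"): 7,
    ("pos", "pos", "neg", "pos"): 4,   ("pos", "pos", "pos", "neg"): 9,
    ("pos", "pos", "pos", "pos"): 170,
}
total = sum(counts.values())  # total frequency reported in the Data Summary

variables = ["std1", "test1", "std2", "test2"]


def marginal_neg(position):
    """Marginal probability that the variable at `position` is 'neg'."""
    hits = sum(n for profile, n in counts.items() if profile[position] == "neg")
    return hits / total


marginals = {name: marginal_neg(i) for i, name in enumerate(variables)}
for name, p in marginals.items():
    print(f"{name}: {p:.5f}")
```

The printed values agree with the Response Function column to five decimals, which confirms that the functions are ordered as time 1 (std, test) followed by time 2 (std, test).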

Output 22.9.5: ANOVA Table

  Diagnostic Procedure Comparison
  Marginal Symmetry, Saturated Model

  Analysis of Variance

  Source             DF   Chi-Square    Pr > ChiSq
  ------------------------------------------------
  Intercept           1      2385.34        <.0001
  Time                1         0.85        0.3570
  Treatment           1         8.20        0.0042
  Time*Treatment      1         2.40        0.1215

  Residual            0          .           .

The analysis of variance table in Output 22.9.5 shows that there is no significant effect of Time, either by itself or in its interaction with Treatment. Thus, the second model includes only the Treatment effect. Again, the response functions are marginal probabilities, and the repeated measurement factors are Time and Treatment. A main-effects model with respect to Treatment is fit. The following statements produce Output 22.9.6 through Output 22.9.10:

  title2 'Marginal Symmetry, Reduced Model';
  model std1*test1*std2*test2=_response_ / corrb design noprofile;
  repeated Time 2, Treatment 2 / _response_=Treatment;
  run;

Output 22.9.6: Diagnosis Data, Reduced Model

  Diagnostic Procedure Comparison
  Marginal Symmetry, Reduced Model

  The CATMOD Procedure

  Data Summary

  Response           std1*test1*std2*test2     Response Levels   15
  Weight Variable    wt                        Populations        1
  Data Set           A                         Total Frequency  793
  Frequency Missing  0                         Observations      15

Output 22.9.7: Design Matrix

  Diagnostic Procedure Comparison
  Marginal Symmetry, Reduced Model

  Response Functions and Design Matrix

            Function      Response     Design Matrix
  Sample     Number       Function        1        2
  --------------------------------------------------
     1          1          0.70870        1        1
                2          0.72383        1       -1
                3          0.70618        1        1
                4          0.73897        1       -1

Output 22.9.8: ANOVA Table

  Diagnostic Procedure Comparison
  Marginal Symmetry, Reduced Model

  Analysis of Variance

  Source         DF   Chi-Square    Pr > ChiSq
  --------------------------------------------
  Intercept       1      2386.97        <.0001
  Treatment       1         9.55        0.0020

  Residual        2         3.51        0.1731

Output 22.9.9: Parameter Estimates

  Diagnostic Procedure Comparison
  Marginal Symmetry, Reduced Model

  Analysis of Weighted Least Squares Estimates

                                        Standard        Chi-
  Effect        Parameter    Estimate      Error      Square    Pr > ChiSq
  ------------------------------------------------------------------------
  Intercept          1         0.7196     0.0147     2386.97        <.0001
  Treatment          2        -0.0128    0.00416        9.55        0.0020

Output 22.9.10: Correlation Matrix

  Diagnostic Procedure Comparison
  Marginal Symmetry, Reduced Model

  Correlation Matrix of the Parameter Estimates

  Row            Col1            Col2
  -----------------------------------
    1         1.00000         0.04194
    2         0.04194         1.00000

The analysis of variance table for the reduced model (Output 22.9.8) shows that the model fits (since the Residual is nonsignificant) and that the treatment effect is significant. The negative parameter estimate for Treatment in Output 22.9.9 shows that the first level of treatment (std) has a smaller probability of the first response level (neg) than the second level of treatment (test). In other words, the standard diagnostic procedure gives a significantly higher probability of a positive response than the test diagnostic procedure.
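To see the size of the treatment effect, the reduced-model predictions can be computed by hand from Output 22.9.7 and Output 22.9.9. The Python sketch below is only an arithmetic check on the rounded printed estimates; the row labels are assumptions based on the order of the response functions (std and test at time 1, then std and test at time 2).

```python
# Rounded estimates from Output 22.9.9: intercept and Treatment.
beta = [0.7196, -0.0128]

# Design-matrix rows from Output 22.9.7 with assumed descriptive labels.
design = [
    ("std, time 1",  [1,  1]),
    ("test, time 1", [1, -1]),
    ("std, time 2",  [1,  1]),
    ("test, time 2", [1, -1]),
]

# Predicted marginal probability of the first response level (neg).
predicted_neg = {
    label: sum(x * b for x, b in zip(row, beta)) for label, row in design
}
# The complementary probability of a positive result.
predicted_pos = {label: 1.0 - p for label, p in predicted_neg.items()}

for label in predicted_neg:
    print(f"{label}: Pr(neg) = {predicted_neg[label]:.4f}, "
          f"Pr(pos) = {predicted_pos[label]:.4f}")
```

Under this model the standard procedure has a predicted probability of a positive result of about 0.293 at both times, compared with about 0.268 for the test procedure.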

The next example illustrates a RESPONSE statement that, at each time, computes the sensitivity and specificity of the test diagnostic procedure with respect to the standard procedure. Since these are measures of the relative accuracy of the two diagnostic procedures, the repeated measurement factors in this case are labeled Time and Accuracy . Only fifteen of the sixteen possible responses are observed, so additional care must be taken in formulating the RESPONSE statement for computation of sensitivity and specificity.

The following statements produce Output 22.9.11 through Output 22.9.15:

  title2 'Sensitivity and Specificity Analysis, '
         'Main-Effects Model';
  model std1*test1*std2*test2=_response_ / covb design noprofile;
  repeated Time 2, Accuracy 2 / _response_=Time Accuracy;
  response exp 1 -1  0  0  0  0  0  0,
               0  0  1 -1  0  0  0  0,
               0  0  0  0  1 -1  0  0,
               0  0  0  0  0  0  1 -1

           log 0 0 0 0   0 0 0   0 0 0 0   1 1 1 1,
               0 0 0 0   0 0 0   1 1 1 1   1 1 1 1,
               1 1 1 1   0 0 0   0 0 0 0   0 0 0 0,
               1 1 1 1   1 1 1   0 0 0 0   0 0 0 0,

               0 0 0 1   0 0 1   0 0 0 1   0 0 0 1,
               0 0 1 1   0 0 1   0 0 1 1   0 0 1 1,
               1 0 0 0   1 0 0   1 0 0 0   1 0 0 0,
               1 1 0 0   1 1 0   1 1 0 0   1 1 0 0;
  quit;

Output 22.9.11: Diagnosis Data, Sensitivity and Specificity Analysis

  Diagnostic Procedure Comparison
  Sensitivity and Specificity Analysis, Main-Effects Model

  The CATMOD Procedure

  Data Summary

  Response           std1*test1*std2*test2     Response Levels   15
  Weight Variable    wt                        Populations        1
  Data Set           A                         Total Frequency  793
  Frequency Missing  0                         Observations      15

Output 22.9.12: Design Matrix

  Diagnostic Procedure Comparison
  Sensitivity and Specificity Analysis, Main-Effects Model

  Response Functions and Design Matrix

            Function      Response         Design Matrix
  Sample     Number       Function        1        2        3
  -----------------------------------------------------------
     1          1          0.82251        1        1        1
                2          0.94840        1        1       -1
                3          0.81545        1       -1        1
                4          0.96964        1       -1       -1

Output 22.9.13: ANOVA Table

  Diagnostic Procedure Comparison
  Sensitivity and Specificity Analysis, Main-Effects Model

  Analysis of Variance

  Source         DF   Chi-Square    Pr > ChiSq
  --------------------------------------------
  Intercept       1      6448.79        <.0001
  Time            1         4.10        0.0428
  Accuracy        1        38.81        <.0001

  Residual        1         1.00        0.3178

Output 22.9.14: Parameter Estimates

  Diagnostic Procedure Comparison
  Sensitivity and Specificity Analysis, Main-Effects Model

  Analysis of Weighted Least Squares Estimates

                                        Standard        Chi-
  Effect        Parameter    Estimate      Error      Square    Pr > ChiSq
  ------------------------------------------------------------------------
  Intercept          1         0.8892     0.0111     6448.79        <.0001
  Time               2       -0.00932    0.00460        4.10        0.0428
  Accuracy           3        -0.0702     0.0113       38.81        <.0001

Output 22.9.15: Covariance Matrix

  Diagnostic Procedure Comparison
  Sensitivity and Specificity Analysis, Main-Effects Model

  Covariance Matrix of the Parameter Estimates

  Row            Col1            Col2            Col3
  ---------------------------------------------------
    1      0.00012260      0.00000229      0.00010137
    2      0.00000229      0.00002116      -.00000587
    3      0.00010137      -.00000587      0.00012697

For the sensitivity and specificity analysis, the four response functions displayed next to the design matrix (Output 22.9.12) represent the following:

sensitivity, time 1
specificity, time 1
sensitivity, time 2
specificity, time 2

The sensitivities and specificities are for the test diagnostic procedure relative to the standard procedure.
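The RESPONSE statement builds each of these functions as exp(log a - log b), that is, a ratio of two sums of cell probabilities. The following Python sketch applies the same idea directly to the cell counts, as a check on the Response Function column in Output 22.9.12. It is an illustration outside SAS; the helper names `cell_sum`, `ratio`, and `accuracy` are hypothetical.

```python
import math

# Cell counts from the DATA step, keyed by (std1, test1, std2, test2).
counts = {
    ("neg", "neg", "neg", "neg"): 509, ("neg", "neg", "neg", "pos"): 4,
    ("neg", "neg", "pos", "neg"): 17,  ("neg", "neg", "pos", "pos"): 3,
    ("neg", "pos", "neg", "neg"): 13,  ("neg", "pos", "neg", "pos"): 8,
    ("neg", "pos", "pos", "pos"): 8,   ("pos", "neg", "neg", "neg"): 14,
    ("pos", "neg", "neg", "pos"): 1,   ("pos", "neg", "pos", "neg"): 17,
    ("pos", "neg", "pos", "pos"): 9,   ("pos", "pos", "neg", "neg"): 7,
    ("pos", "pos", "neg", "pos"): 4,   ("pos", "pos", "pos", "neg"): 9,
    ("pos", "pos", "pos", "pos"): 170,
}

STD1, TEST1, STD2, TEST2 = range(4)  # positions within each profile tuple


def cell_sum(predicate):
    """Sum the counts of all response profiles that satisfy `predicate`."""
    return sum(n for profile, n in counts.items() if predicate(profile))


def ratio(numerator, denominator):
    """exp(log a - log b) = a / b, mirroring the LOG and EXP transformations."""
    return math.exp(math.log(numerator) - math.log(denominator))


def accuracy(std, test, level):
    """Pr(test == level | std == level) for one time point."""
    agree = cell_sum(lambda p: p[std] == level and p[test] == level)
    reference = cell_sum(lambda p: p[std] == level)
    return ratio(agree, reference)


functions = {
    "sensitivity, time 1": accuracy(STD1, TEST1, "pos"),
    "specificity, time 1": accuracy(STD1, TEST1, "neg"),
    "sensitivity, time 2": accuracy(STD2, TEST2, "pos"),
    "specificity, time 2": accuracy(STD2, TEST2, "neg"),
}
for name, value in functions.items():
    print(f"{name}: {value:.5f}")
```

Because the ratio of counts equals the ratio of the corresponding probabilities, the total frequency cancels and the counts can be used directly.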

The ANOVA table (Output 22.9.13) shows that an additive model fits, that there is a significant effect of time, and that the sensitivity is significantly different from the specificity.

Output 22.9.14 shows that the predicted sensitivities and specificities are lower for time 1 (since parameter 2 is negative). It also shows that the sensitivity is significantly less than the specificity.

Example 22.10. Direct Input of Response Functions and Covariance Matrix

This example illustrates the ability of PROC CATMOD to operate on an existing vector of functions and the corresponding covariance matrix. The estimates under investigation are composite indices summarizing the responses to eighteen psychological questions pertaining to general well-being. These estimates are computed for domains corresponding to an age by sex cross-classification, and the covariance matrix is calculated via the method of balanced repeated replications. The analysis is directed at obtaining a description of the variation among these domain estimates. The data are from Koch and Stokes (1979).

  data fbeing(type=est);
     input   b1-b5   _type_ $  _name_ $  b6-b10 #2;
     datalines;
   7.93726   7.92509   7.82815   7.73696   8.16791  parms    .
   7.24978   7.18991   7.35960   7.31937   7.55184
   0.00739   0.00019   0.00146   0.00082   0.00076  cov      b1
   0.00189   0.00118   0.00140   0.00140   0.00039
   0.00019   0.01172   0.00183   0.00029   0.00083  cov      b2
  -0.00123   0.00629  -0.00088   0.00232   0.00034
   0.00146   0.00183   0.01050   0.00173   0.00011  cov      b3
   0.00434   0.00059  -0.00055   0.00023  -0.00013
  -0.00082   0.00029  -0.00173   0.01335   0.00140  cov      b4
   0.00158   0.00212   0.00211   0.00066   0.00240
   0.00076   0.00083   0.00011   0.00140   0.01430  cov      b5
  -0.00050   0.00098   0.00239   0.00010   0.00213
   0.00189   0.00123   0.00434   0.00158   0.00050  cov      b6
   0.01110   0.00101   0.00177   0.00018   0.00082
   0.00118   0.00629   0.00059   0.00212   0.00098  cov      b7
   0.00101   0.02342   0.00144   0.00369   0.00253
   0.00140   0.00088   0.00055   0.00211   0.00239  cov      b8
   0.00177   0.00144   0.01060   0.00157   0.00226
  -0.00140   0.00232   0.00023   0.00066  -0.00010  cov      b9
  -0.00018   0.00369   0.00157   0.02298   0.00918
   0.00039   0.00034   0.00013   0.00240   0.00213  cov      b10
  -0.00082   0.00253   0.00226   0.00918   0.01921
  ;

The following statements produce Output 22.10.1 through Output 22.10.3:

  proc catmod data=fbeing;
     title 'Complex Sample Survey Analysis';
     response read b1-b10;
     factors sex $ 2, age $ 5 / _response_=sex age
                      profile=(male     '25-34',
                               male     '35-44',
                               male     '45-54',
                               male     '55-64',
                               male     '65-74',
                               female   '25-34',
                               female   '35-44',
                               female   '45-54',
                               female   '55-64',
                               female   '65-74');
     model _f_=_response_
           / design title='Main Effects for Sex and Age';
  run;

Output 22.10.1: Health Survey Data, Using Direct Input

  Complex Sample Survey Analysis
  Main Effects for Sex and Age

  The CATMOD Procedure

  Response Functions and Design Matrix

           Function     Response                   Design Matrix
  Sample    Number      Function       1       2       3       4       5       6
  ------------------------------------------------------------------------------
     1         1         7.93726       1       1       1       0       0       0
               2         7.92509       1       1       0       1       0       0
               3         7.82815       1       1       0       0       1       0
               4         7.73696       1       1       0       0       0       1
               5         8.16791       1       1      -1      -1      -1      -1
               6         7.24978       1      -1       1       0       0       0
               7         7.18991       1      -1       0       1       0       0
               8         7.35960       1      -1       0       0       1       0
               9         7.31937       1      -1       0       0       0       1
              10         7.55184       1      -1      -1      -1      -1      -1

Output 22.10.2: ANOVA Table

  Complex Sample Survey Analysis

  Analysis of Variance

  Source         DF   Chi-Square    Pr > ChiSq
  --------------------------------------------
  Intercept       1     28089.07        <.0001
  sex             1        65.84        <.0001
  age             4         9.21        0.0561

  Residual        4         2.92        0.5713

Output 22.10.3: Parameter Estimates

  Complex Sample Survey Analysis

  Analysis of Weighted Least Squares Estimates

                                        Standard        Chi-
  Effect        Parameter    Estimate      Error      Square    Pr > ChiSq
  ------------------------------------------------------------------------
  Intercept          1         7.6319     0.0455    28089.07        <.0001
  sex                2         0.2900     0.0357       65.84        <.0001
  age                3       -0.00780     0.0645        0.01        0.9037
                     4        -0.0465     0.0636        0.54        0.4642
                     5        -0.0343     0.0557        0.38        0.5387
                     6        -0.1098     0.0764        2.07        0.1506

The analysis of variance table (Output 22.10.2) shows that the additive model fits (residual chi-square of 2.92 with 4 DF, p = 0.5713), that the effect of sex is significant, and that the effect of age is marginal (p = 0.0561). The following statements produce Output 22.10.4:

  contrast 'No Age Effect for Age<65' all_parms 0 0 1 0 0 -1,
                                      all_parms 0 0 0 1 0 -1,
                                      all_parms 0 0 0 0 1 -1;
  run;

Output 22.10.4: Age<65 Contrast

  Complex Sample Survey Analysis
  Main Effects for Sex and Age

  The CATMOD Procedure

  Analysis of Contrasts

  Contrast                   DF    Chi-Square    Pr > ChiSq
  ---------------------------------------------------------
  No Age Effect for Age<65    3          0.72        0.8678

The analysis of the contrast shows that there is no significant difference among the four age groups that are under age 65. Thus, the next model contains a binary age effect (less than 65 versus 65 and over). The following statements produce Output 22.10.5 through Output 22.10.7:

  model _f_=(1  1  1,
             1  1  1,
             1  1  1,
             1  1  1,
             1  1 -1,
             1 -1  1,
             1 -1  1,
             1 -1  1,
             1 -1  1,
             1 -1 -1)
            (1='Intercept',
             2='Sex',
             3='Age (25-64 vs. 65-74)')
            / design title='Binary Age Effect (25-64 vs. 65-74)';
  run;
  quit;

Output 22.10.5: Design Matrix

  Complex Sample Survey Analysis
  Binary Age Effect (25-64 vs. 65-74)

  The CATMOD Procedure

  Response Functions and Design Matrix

            Function      Response         Design Matrix
  Sample     Number       Function        1        2        3
  -----------------------------------------------------------
     1          1          7.93726        1        1        1
                2          7.92509        1        1        1
                3          7.82815        1        1        1
                4          7.73696        1        1        1
                5          8.16791        1        1       -1
                6          7.24978        1       -1        1
                7          7.18991        1       -1        1
                8          7.35960        1       -1        1
                9          7.31937        1       -1        1
               10          7.55184        1       -1       -1

Output 22.10.6: ANOVA Table

  Complex Sample Survey Analysis

  Analysis of Variance

  Source                    DF   Chi-Square    Pr > ChiSq
  -------------------------------------------------------
  Intercept                  1     19087.16        <.0001
  Sex                        1        72.64        <.0001
  Age (25-64 vs. 65-74)      1         8.49        0.0036

  Residual                   7         3.64        0.8198

Output 22.10.7: Parameter Estimates

  Complex Sample Survey Analysis

  Analysis of Weighted Least Squares Estimates

                                    Standard        Chi-
  Effect    Parameter    Estimate      Error      Square    Pr > ChiSq
  --------------------------------------------------------------------
  Model          1         7.7183     0.0559    19087.16        <.0001
                 2         0.2800     0.0329       72.64        <.0001
                 3        -0.1304     0.0448        8.49        0.0036

The analysis of variance table in Output 22.10.6 shows that the model fits. Note that the goodness-of-fit statistic (3.64 with 7 DF) is the sum of the previous residual chi-square (2.92 with 4 DF, Output 22.10.2) and the chi-square for the contrast in Output 22.10.4 (0.72 with 3 DF). The age and sex effects are significant. Since the second parameter in the table of estimates is positive, males (the first level for the sex variable) have a higher predicted index of well-being than females. Since the third parameter estimate is negative, those younger than age 65 (the first level of age) have a lower predicted index of well-being than those 65 and older.
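Both statements in the preceding paragraph can be verified with a few lines of arithmetic. The Python sketch below, which is not part of the SAS session, adds the two chi-square statistics and then computes the predicted well-being index for each sex by age-group combination from the rounded estimates in Output 22.10.7. The group labels are assumptions chosen for readability.

```python
# Goodness-of-fit decomposition: (chi-square, DF) pairs from the printed tables.
residual_main_effects = (2.92, 4)   # Residual, Output 22.10.2
contrast_under_65 = (0.72, 3)       # contrast, Output 22.10.4
residual_binary_age = (3.64, 7)     # Residual, Output 22.10.6

chi_sum = round(residual_main_effects[0] + contrast_under_65[0], 2)
df_sum = residual_main_effects[1] + contrast_under_65[1]
print("sum of chi-squares:", chi_sum, "with DF:", df_sum)

# Rounded estimates from Output 22.10.7: intercept, sex, age (25-64 vs. 65-74).
beta = [7.7183, 0.2800, -0.1304]

# The four distinct design-matrix rows from Output 22.10.5, with assumed labels.
groups = {
    "male, 25-64":   [1,  1,  1],
    "male, 65-74":   [1,  1, -1],
    "female, 25-64": [1, -1,  1],
    "female, 65-74": [1, -1, -1],
}
predicted = {
    label: sum(x * b for x, b in zip(row, beta)) for label, row in groups.items()
}
for label, value in predicted.items():
    print(f"{label}: {value:.4f}")
```

The predicted index is highest for males aged 65-74 and lowest for females aged 25-64, in line with the signs of the second and third parameter estimates.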

Example 22.11. Predicted Probabilities

Suppose you have collected marketing research data to examine the relationship between a prospect's likelihood of buying your product and the prospect's education and income. Specifically, the variables are as follows:

Variable	Levels	Interpretation
Education	high, low	prospect's education level
Income	high, low	prospect's income level
Purchase	yes, no	Did prospect purchase product?

The following statements first create a data set, loan, that contains the marketing research data. They then use the CATMOD procedure to fit a model, obtain the parameter estimates, and obtain the predicted probabilities of interest. These statements produce Output 22.11.1 through Output 22.11.5.

  data loan;
     input Education $ Income $ Purchase $ wt;
     datalines;
  high  high  yes    54
  high  high  no     23
  high  low   yes    41
  high  low   no     12
  low   high  yes    35
  low   high  no     42
  low   low   yes    19
  low   low   no      8
  ;

  ods output PredictedValues=Predicted
             (keep=Education Income PredFunction);
  proc catmod data=loan order=data;
     weight wt;
     response marginals;
     model Purchase=Education Income / pred design;
  run;

  proc sort data=Predicted;
     by descending PredFunction;
  run;

  proc print data=Predicted;
  run;

Output 22.11.1: Marketing Research Data, Obtaining Predicted Probabilities

  The CATMOD Procedure

  Data Summary

  Response           Purchase     Response Levels    2
  Weight Variable    wt           Populations        4
  Data Set           LOAN         Total Frequency  234
  Frequency Missing  0            Observations       8

Output 22.11.2: Profiles and Design Matrix

  Population Profiles

  Sample    Education    Income    Sample Size
  --------------------------------------------
     1      high         high               77
     2      high         low                53
     3      low          high               77
     4      low          low                27

  Response Profiles

  Response    Purchase
  --------------------
      1       yes
      2       no

  Response Functions and Design Matrix

              Response         Design Matrix
  Sample      Function        1        2        3
  -----------------------------------------------
     1         0.70130        1        1        1
     2         0.77358        1        1       -1
     3         0.45455        1       -1        1
     4         0.70370        1       -1       -1

Output 22.11.3: ANOVA Table and Parameter Estimates

  Analysis of Variance

  Source        DF   Chi-Square    Pr > ChiSq
  -------------------------------------------
  Intercept      1       418.36        <.0001
  Education      1         8.85        0.0029
  Income         1         4.70        0.0302

  Residual       1         1.84        0.1745

  Analysis of Weighted Least Squares Estimates

                               Standard        Chi-
  Parameter         Estimate      Error      Square    Pr > ChiSq
  ---------------------------------------------------------------
  Intercept           0.6481     0.0317      418.36        <.0001
  Education high      0.0924     0.0311        8.85        0.0029
  Income    high     -0.0675     0.0312        4.70        0.0302

Output 22.11.4: Predicted Values and Residuals

  Predicted Values for Response Functions

                                     ------Observed------    ------Predicted-----
                         Function                Standard                Standard
  Education    Income     Number     Function       Error    Function       Error    Residual
  -------------------------------------------------------------------------------------------
  high         high         1        0.701299    0.052158     0.67294    0.047794    0.028359
  high         low          1        0.773585    0.057487    0.808034    0.051586    -0.03445
  low          high         1        0.454545    0.056744     0.48811    0.051077    -0.03356
  low          low          1        0.703704    0.087877    0.623204    0.064867    0.080499

Output 22.11.5: Predicted Probabilities Data Set

                                    Pred
  Obs    Education    Income    Function

    1    high         low       0.808034
    2    high         high       0.67294
    3    low          low       0.623204
    4    low          high       0.48811

Notice that the preceding statements use the Output Delivery System (ODS) to output the predicted values instead of the OUT= option in the RESPONSE statement, though either can be used.

You can use the predicted values (values of PredFunction in Output 22.11.5) as scores representing the likelihood that a randomly chosen subject from one of these populations will purchase the product. Notice that the Response Profiles in Output 22.11.2 show you that the first sorted level of Purchase is yes, indicating that the predicted probabilities are for Pr(Purchase = yes). For example, someone with high education and low income has an estimated probability of purchase of 0.808. As with any response function estimate given by PROC CATMOD, this estimate can be obtained by cross-multiplying the row from the design matrix corresponding to the sample (sample number 2 in this case) with the vector of parameter estimates: (1 * 0.6481) + (1 * 0.0924) + (-1 * (-0.0675)) = 0.8080.

This ranking of scores can help in decision making (for example, with respect to allocation of advertising dollars, choice of advertising media, choice of print media, and so on).
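The cross-multiplication and the ranking can be reproduced for all four populations at once. The following Python sketch is an illustration outside SAS that uses the rounded estimates printed in Output 22.11.3, so its results match the PredFunction values only to about three decimals; the variable names are hypothetical.

```python
# Rounded estimates from Output 22.11.3: Intercept, Education high, Income high.
beta = [0.6481, 0.0924, -0.0675]

# Design-matrix rows from Output 22.11.2, keyed by (Education, Income).
populations = {
    ("high", "high"): [1,  1,  1],
    ("high", "low"):  [1,  1, -1],
    ("low", "high"):  [1, -1,  1],
    ("low", "low"):   [1, -1, -1],
}

# Predicted Pr(Purchase = yes): design row times the parameter vector.
predicted = {
    profile: sum(x * b for x, b in zip(row, beta))
    for profile, row in populations.items()
}

# Rank populations from most to least likely to purchase.
ranking = sorted(predicted.items(), key=lambda item: item[1], reverse=True)
for (education, income), score in ranking:
    print(f"Education={education:<5} Income={income:<5} score={score:.4f}")
```

The resulting order (high/low, high/high, low/low, low/high) is the same as the sorted data set in Output 22.11.5.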