Examples | Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4

Example 2.1. Creating an Output Data Set with Table Cell Frequencies

The eye and hair color of children from two different regions of Europe are recorded in the data set Color. Instead of recording one observation per child, the data are recorded as cell counts, where the variable Count contains the number of children exhibiting each of the 15 eye and hair color combinations. The data set does not include missing combinations.

  data Color;
     input Region Eyes $ Hair $ Count @@;
     label Eyes  ='Eye Color'
           Hair  ='Hair Color'
           Region='Geographic Region';
     datalines;
  1 blue  fair   23  1 blue  red     7  1 blue  medium 24
  1 blue  dark   11  1 green fair   19  1 green red     7
  1 green medium 18  1 green dark   14  1 brown fair   34
  1 brown red     5  1 brown medium 41  1 brown dark   40
  1 brown black   3  2 blue  fair   46  2 blue  red    21
  2 blue  medium 44  2 blue  dark   40  2 blue  black   6
  2 green fair   50  2 green red    31  2 green medium 37
  2 green dark   23  2 brown fair   56  2 brown red    42
  2 brown medium 53  2 brown dark   54  2 brown black  13
  ;

The following statements read the Color data set and create an output data set containing the frequencies, percentages, and expected cell frequencies of the Eyes by Hair two-way table. The TABLES statement requests three tables: Eyes and Hair frequency tables and an Eyes by Hair crosstabulation table. The OUT= option creates the FreqCnt data set, which contains the crosstabulation table frequencies. The OUTEXPECT option outputs the expected cell frequencies to FreqCnt, and the SPARSE option includes the zero cell counts. The WEIGHT statement specifies that Count contains the observation weights. These statements create Output 2.1.1 through Output 2.1.3.

  proc freq data=Color;
     weight Count;
     tables Eyes Hair Eyes*Hair / out=FreqCnt outexpect sparse;
     title1 'Eye and Hair Color of European Children';
  run;
  proc print data=FreqCnt noobs;
     title2 'Output Data Set from PROC FREQ';
  run;

Output 2.1.1: Frequency Tables

  Eye and Hair Color of European Children

  The FREQ Procedure

                        Eye Color

                                   Cumulative    Cumulative
  Eyes     Frequency     Percent    Frequency       Percent
  ---------------------------------------------------------
  blue          222       29.13          222         29.13
  brown         341       44.75          563         73.88
  green         199       26.12          762        100.00

                        Hair Color

                                    Cumulative    Cumulative
  Hair      Frequency     Percent    Frequency       Percent
  ----------------------------------------------------------
  black           22        2.89           22          2.89
  dark           182       23.88          204         26.77
  fair           228       29.92          432         56.69
  medium         217       28.48          649         85.17
  red            113       14.83          762        100.00

Output 2.1.2: Crosstabulation Table

  Eye and Hair Color of European Children

  Table of Eyes by Hair

  Eyes(Eye Color)     Hair(Hair Color)

  Frequency|
  Percent  |
  Row Pct  |
  Col Pct  |black   |dark    |fair    |medium  |red     |  Total
  ---------+--------+--------+--------+--------+--------+
  blue     |      6 |     51 |     69 |     68 |     28 |    222
           |   0.79 |   6.69 |   9.06 |   8.92 |   3.67 |  29.13
           |   2.70 |  22.97 |  31.08 |  30.63 |  12.61 |
           |  27.27 |  28.02 |  30.26 |  31.34 |  24.78 |
  ---------+--------+--------+--------+--------+--------+
  brown    |     16 |     94 |     90 |     94 |     47 |    341
           |   2.10 |  12.34 |  11.81 |  12.34 |   6.17 |  44.75
           |   4.69 |  27.57 |  26.39 |  27.57 |  13.78 |
           |  72.73 |  51.65 |  39.47 |  43.32 |  41.59 |
  ---------+--------+--------+--------+--------+--------+
  green    |      0 |     37 |     69 |     55 |     38 |    199
           |   0.00 |   4.86 |   9.06 |   7.22 |   4.99 |  26.12
           |   0.00 |  18.59 |  34.67 |  27.64 |  19.10 |
           |   0.00 |  20.33 |  30.26 |  25.35 |  33.63 |
  ---------+--------+--------+--------+--------+--------+
  Total          22      182      228      217      113      762
               2.89    23.88    29.92    28.48    14.83   100.00

Output 2.1.3: OUT= Data Set

  Output Data Set from PROC FREQ

  Eyes     Hair      COUNT    EXPECTED    PERCENT
  blue     black         6       6.409     0.7874
  blue     dark         51      53.024     6.6929
  blue     fair         69      66.425     9.0551
  blue     medium       68      63.220     8.9239
  blue     red          28      32.921     3.6745
  brown    black        16       9.845     2.0997
  brown    dark         94      81.446    12.3360
  brown    fair         90     102.031    11.8110
  brown    medium       94      97.109    12.3360
  brown    red          47      50.568     6.1680
  green    black         0       5.745     0.0000
  green    dark         37      47.530     4.8556
  green    fair         69      59.543     9.0551
  green    medium       55      56.671     7.2178
  green    red          38      29.510     4.9869

Output 2.1.1 displays the two frequency tables produced, one showing the distribution of eye color and one showing the distribution of hair color. By default, PROC FREQ lists the variable values in alphabetical order. The Eyes*Hair specification produces a crosstabulation table, shown in Output 2.1.2, with eye color defining the table rows and hair color defining the table columns. A zero cell count for green eyes and black hair indicates that this eye and hair color combination does not occur in the data.

The output data set (Output 2.1.3) contains frequency counts and percentages for the last table. The data set also includes an observation for the zero cell count (SPARSE) and a variable with the expected cell frequency for each table cell (OUTEXPECT).
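The EXPECTED and PERCENT columns of Output 2.1.3 can be checked by hand. The following Python sketch (outside SAS, standard library only) assumes the usual independence formula, expected = row total x column total / n, and uses the cell counts from Output 2.1.2; the variable and function names are illustrative only.

```python
# Eyes-by-Hair cell counts copied from Output 2.1.2 (both regions combined).
hair_levels = ["black", "dark", "fair", "medium", "red"]
counts = {
    "blue":  [6, 51, 69, 68, 28],
    "brown": [16, 94, 90, 94, 47],
    "green": [0, 37, 69, 55, 38],
}

n = sum(sum(row) for row in counts.values())
row_totals = {eyes: sum(row) for eyes, row in counts.items()}
col_totals = [sum(row[j] for row in counts.values())
              for j in range(len(hair_levels))]


def expected(eyes, j):
    """Expected count under independence: row total * column total / n."""
    return row_totals[eyes] * col_totals[j] / n


def percent(eyes, j):
    """Cell count as a percentage of the total sample size."""
    return 100 * counts[eyes][j] / n


# Print one line per cell, including the zero cell that SPARSE keeps.
for eyes, row in counts.items():
    for j, hair in enumerate(hair_levels):
        print(f"{eyes:6s} {hair:7s} {row[j]:4d} "
              f"{expected(eyes, j):9.3f} {percent(eyes, j):8.4f}")
```

The green/black cell has an observed count of 0 but a nonzero expected count, which is why it is worth keeping in the output data set.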

Example 2.2. Computing Chi-Square Tests for One-Way Frequency Tables

This example examines whether the children's hair color (from Example 2.1 on page 161) has a specified multinomial distribution for the two regions. The hypothesized distribution for hair color is 30% fair, 12% red, 30% medium, 25% dark, and 3% black.

In order to test the hypothesis for each region, the data are first sorted by Region. Then the FREQ procedure uses a BY statement to produce a separate table for each BY group (Region). The option ORDER=DATA orders the frequency table values (hair color) by their order in the data set. The TABLES statement requests a frequency table for hair color, and the option NOCUM suppresses the display of the cumulative frequencies and percentages. The TESTP= option specifies the hypothesized percentages for the chi-square test; the number of percentages specified equals the number of table levels, and the percentages sum to 100. The following statements produce Output 2.2.1.

  proc sort data=Color;
     by Region;
  run;
  proc freq data=Color order=data;
     weight Count;
     tables Hair / nocum testp=(30 12 30 25 3);
     by Region;
     title 'Hair Color of European Children';
  run;

Output 2.2.1: One-Way Frequency Table with BY Groups

  Hair Color of European Children

  ----------------------------- Geographic Region=1 ------------------------------

  The FREQ Procedure

                    Hair Color

                                        Test
  Hair      Frequency     Percent     Percent
  -------------------------------------------
  fair            76       30.89       30.00
  red             19        7.72       12.00
  medium          83       33.74       30.00
  dark            65       26.42       25.00
  black            3        1.22        3.00

       Chi-Square Test
  for Specified Proportions
  -------------------------
  Chi-Square         7.7602
  DF                      4
  Pr > ChiSq         0.1008

  Sample Size = 246

  Hair Color of European Children

  ----------------------------- Geographic Region=2 ------------------------------

                    Hair Color

                                        Test
  Hair      Frequency     Percent     Percent
  -------------------------------------------
  fair           152       29.46       30.00
  red             94       18.22       12.00
  medium         134       25.97       30.00
  dark           117       22.67       25.00
  black           19        3.68        3.00

       Chi-Square Test
  for Specified Proportions
  -------------------------
  Chi-Square        21.3824
  DF                      4
  Pr > ChiSq         0.0003

  Sample Size = 516

The frequency tables in Output 2.2.1 list the variable values (hair color) in the order in which they appear in the data set. The Test Percent column lists the hypothesized percentages for the chi-square test. Always check that you have ordered the TESTP= percentages to correctly match the order of the variable levels.

PROC FREQ computes a chi-square statistic for each region. The chi-square statistic is significant at the 0.05 level for Region 2 (p = 0.0003) but not for Region 1. This indicates a significant departure from the hypothesized percentages in Region 2.
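The chi-square values in Output 2.2.1 can be reproduced from the frequencies and the TESTP= percentages. The Python sketch below is an independent check, not SAS code. It assumes the standard goodness-of-fit statistic, the sum of (observed - expected)^2 / expected, and it uses a closed-form upper-tail probability that is valid only for even degrees of freedom (here DF = 4). The function names are hypothetical.

```python
import math


def chi_square_gof(observed, test_percents):
    """Pearson goodness-of-fit statistic against hypothesized percentages."""
    n = sum(observed)
    expected = [n * p / 100 for p in test_percents]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Levels in data order: fair, red, medium, dark, black.
test_percents = [30, 12, 30, 25, 3]
region1 = [76, 19, 83, 65, 3]
region2 = [152, 94, 134, 117, 19]

df = len(test_percents) - 1
chi1 = chi_square_gof(region1, test_percents)
chi2 = chi_square_gof(region2, test_percents)
p1 = chi2_sf_even_df(chi1, df)
p2 = chi2_sf_even_df(chi2, df)

print(f"Region 1: chi-square={chi1:.4f}  p={p1:.4f}")
print(f"Region 2: chi-square={chi2:.4f}  p={p2:.4f}")
```

Because the percentages are matched to levels by position, reordering `test_percents` without reordering the counts changes the statistic, which is the reason for checking the TESTP= order against the table.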

Example 2.3. Computing Binomial Proportions for One-Way Frequency Tables

The binomial proportion is computed as the proportion of observations for the first level of the variable that you are studying. The following statements compute the proportion of children with brown eyes (from the data set in Example 2.1 on page 161) and test this value against the hypothesis that the proportion is 50%. Also, these statements test whether the proportion of children with fair hair is 28%.

  proc freq data=Color order=freq;
     weight Count;
     tables Eyes / binomial alpha=.1;
     tables Hair / binomial(p=.28);
     title 'Hair and Eye Color of European Children';
  run;

The first TABLES statement produces a frequency table for eye color. The BINOMIAL option computes the binomial proportion and confidence limits, and it tests the hypothesis that the proportion for the first eye color level (brown) is 0.5. The option ALPHA=.1 specifies that 90% confidence limits should be computed. The second TABLES statement creates a frequency table for hair color and computes the binomial proportion and confidence limits, but it tests that the proportion for the first hair color (fair) is 0.28. These statements produce Output 2.3.1 and Output 2.3.2.

Output 2.3.1: Binomial Proportion for Eye Color

  Hair and Eye Color of European Children

  The FREQ Procedure

                        Eye Color

                                    Cumulative    Cumulative
  Eyes      Frequency     Percent    Frequency       Percent
  ----------------------------------------------------------
  brown          341       44.75          341         44.75
  blue           222       29.13          563         73.88
  green          199       26.12          762        100.00

        Binomial Proportion
         for Eyes = brown
  --------------------------------
  Proportion                0.4475
  ASE                       0.0180
  90% Lower Conf Limit      0.4179
  90% Upper Conf Limit      0.4771

  Exact Conf Limits
  90% Lower Conf Limit      0.4174
  90% Upper Conf Limit      0.4779

    Test of H0: Proportion = 0.5

  ASE under H0              0.0181
  Z                        -2.8981
  One-sided Pr <  Z         0.0019
  Two-sided Pr > |Z|        0.0038

  Sample Size = 762

Output 2.3.2: Binomial Proportion for Hair Color

  Hair and Eye Color of European Children

                         Hair Color

                                     Cumulative    Cumulative
  Hair       Frequency     Percent    Frequency       Percent
  -----------------------------------------------------------
  fair            228       29.92          228         29.92
  medium          217       28.48          445         58.40
  dark            182       23.88          627         82.28
  red             113       14.83          740         97.11
  black            22        2.89          762        100.00

        Binomial Proportion
         for Hair = fair
  --------------------------------
  Proportion                0.2992
  ASE                       0.0166
  95% Lower Conf Limit      0.2667
  95% Upper Conf Limit      0.3317

  Exact Conf Limits
  95% Lower Conf Limit      0.2669
  95% Upper Conf Limit      0.3331

    Test of H0: Proportion = 0.28

  ASE under H0              0.0163
  Z                         1.1812
  One-sided Pr >  Z         0.1188
  Two-sided Pr > |Z|        0.2375

  Sample Size = 762

The frequency table in Output 2.3.1 displays the variable values in order of descending frequency count. Since the first variable level is brown, PROC FREQ computes the binomial proportion of children with brown eyes. PROC FREQ also computes its asymptotic standard error (ASE), and asymptotic and exact 90% confidence limits. If you do not specify the ALPHA= option, then PROC FREQ computes the default 95% confidence limits.

Because the value of Z is less than zero, PROC FREQ computes a left-sided p-value (0.0019). This small p-value supports the alternative hypothesis that the true value of the proportion of children with brown eyes is less than 50%.

Output 2.3.2 displays the results from the second TABLES statement. PROC FREQ computes the default 95% confidence limits since the ALPHA= option is not specified. The value of Z is greater than zero, so PROC FREQ computes a right-sided p-value (0.1188). This large p-value provides insufficient evidence to reject the null hypothesis that the proportion of children with fair hair is 28%.
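The asymptotic quantities in Output 2.3.1 and Output 2.3.2 follow from a few formulas. The Python sketch below is an independent check that assumes the usual Wald limits, p ± z·sqrt(p(1-p)/n), and a Z statistic whose standard error uses the null proportion. It does not reproduce the exact confidence limits. The helper name `binomial_summary` is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binomial_summary(count, n, p0, alpha=0.05):
    """Asymptotic proportion, Wald limits, and Z test of H0: proportion = p0."""
    p_hat = count / n
    ase = math.sqrt(p_hat * (1 - p_hat) / n)       # ASE of the estimate
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)     # e.g. 1.645 for alpha=0.10
    ase_h0 = math.sqrt(p0 * (1 - p0) / n)          # ASE under H0
    z = (p_hat - p0) / ase_h0
    lower_tail = STD_NORMAL.cdf(z)
    # Left-sided p-value when Z < 0, right-sided otherwise.
    one_sided = lower_tail if z < 0 else 1 - lower_tail
    return {
        "proportion": p_hat,
        "ase": ase,
        "lower": p_hat - z_crit * ase,
        "upper": p_hat + z_crit * ase,
        "z": z,
        "one_sided": one_sided,
        "two_sided": 2 * one_sided,
    }


# Brown eyes: 341 of 762, H0 proportion 0.5, 90% limits (ALPHA=.1).
eyes = binomial_summary(341, 762, p0=0.5, alpha=0.10)
# Fair hair: 228 of 762, H0 proportion 0.28, default 95% limits.
hair = binomial_summary(228, 762, p0=0.28)

for name, result in (("eyes", eyes), ("hair", hair)):
    print(name, {key: round(value, 4) for key, value in result.items()})
```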

Example 2.4. Analyzing a 2x2 Contingency Table

This example computes chi-square tests and Fisher's exact test to compare the probability of coronary heart disease for two types of diet. It also estimates the relative risks and computes exact confidence limits for the odds ratio.

The data set FatComp contains hypothetical data for a case-control study of high fat diet and the risk of coronary heart disease. The data are recorded as cell counts, where the variable Count contains the frequencies for each exposure and response combination. The data set is sorted in descending order by the variables Exposure and Response, so that the first cell of the 2x2 table contains the frequency of positive exposure and positive response. The FORMAT procedure creates formats to identify the type of exposure and response with character values.

  proc format;
     value ExpFmt 1='High Cholesterol Diet'
                  0='Low Cholesterol Diet';
     value RspFmt 1='Yes'
                  0='No';
  run;
  data FatComp;
     input Exposure Response Count;
     label Response='Heart Disease';
     datalines;
  0 0  6
  0 1  2
  1 0  4
  1 1 11
  ;
  proc sort data=FatComp;
     by descending Exposure descending Response;
  run;

In the following statements, the TABLES statement creates a two-way table, and the option ORDER=DATA orders the contingency table values by their order in the data set. The CHISQ option produces several chi-square tests, while the RELRISK option produces relative risk measures. The EXACT statement creates the exact Pearson chi-square test and exact confidence limits for the odds ratio. These statements produce Output 2.4.1 through Output 2.4.3.

  proc freq data=FatComp order=data;
     weight Count;
     tables Exposure*Response / chisq relrisk;
     exact pchi or;
     format Exposure ExpFmt. Response RspFmt.;
     title 'Case-Control Study of High Fat/Cholesterol Diet';
  run;

Output 2.4.1: Contingency Table

  Case-Control Study of High Fat/Cholesterol Diet

  The FREQ Procedure

  Table of Exposure by Response

  Exposure          Response(Heart Disease)

  Frequency        |
  Percent          |
  Row Pct          |
  Col Pct          |Yes     |No      |  Total
  -----------------+--------+--------+
  High Cholesterol |     11 |      4 |     15
  Diet             |  47.83 |  17.39 |  65.22
                   |  73.33 |  26.67 |
                   |  84.62 |  40.00 |
  -----------------+--------+--------+
  Low Cholesterol  |      2 |      6 |      8
  Diet             |   8.70 |  26.09 |  34.78
                   |  25.00 |  75.00 |
                   |  15.38 |  60.00 |
  -----------------+--------+--------+
  Total                  13       10       23
                      56.52    43.48   100.00

Output 2.4.2: Chi-Square Statistics

  Case-Control Study of High Fat/Cholesterol Diet

  Statistics for Table of Exposure by Response

  Statistic                     DF       Value       Prob
  -------------------------------------------------------
  Chi-Square                     1      4.9597     0.0259
  Likelihood Ratio Chi-Square    1      5.0975     0.0240
  Continuity Adj. Chi-Square     1      3.1879     0.0742
  Mantel-Haenszel Chi-Square     1      4.7441     0.0294
  Phi Coefficient                       0.4644
  Contingency Coefficient               0.4212
  Cramer's V                            0.4644

  WARNING: 50% of the cells have expected counts less than 5.
           (Asymptotic) Chi-Square may not be a valid test.

       Pearson Chi-Square Test
  ----------------------------------
  Chi-Square                  4.9597
  DF                               1
  Asymptotic Pr >  ChiSq      0.0259
  Exact      Pr >= ChiSq      0.0393

         Fisher's Exact Test
  ----------------------------------
  Cell (1,1) Frequency (F)        11
  Left-sided Pr <= F          0.9967
  Right-sided Pr >= F         0.0367

  Table Probability (P)       0.0334
  Two-sided Pr <= P           0.0393

  Sample Size = 23

Output 2.4.3: Relative Risk

  Case-Control Study of High Fat/Cholesterol Diet

  Statistics for Table of Exposure by Response

           Estimates of the Relative Risk (Row1/Row2)

  Type of Study                   Value       95% Confidence Limits
  -----------------------------------------------------------------
  Case-Control (Odds Ratio)      8.2500        1.1535       59.0029
  Cohort (Col1 Risk)             2.9333        0.8502       10.1204
  Cohort (Col2 Risk)             0.3556        0.1403        0.9009

    Odds Ratio (Case-Control Study)
  -----------------------------------
  Odds Ratio                   8.2500

  Asymptotic Conf Limits
  95% Lower Conf Limit         1.1535
  95% Upper Conf Limit        59.0029

  Exact Conf Limits
  95% Lower Conf Limit         0.8677
  95% Upper Conf Limit       105.5488

  Sample Size = 23

The contingency table in Output 2.4.1 displays the variable values so that the first table cell contains the frequency for the first cell in the data set, the frequency of positive exposure and positive response.

Output 2.4.2 displays the chi-square statistics. Since the expected counts in some of the table cells are small, PROC FREQ gives a warning that the asymptotic chi-square tests may not be appropriate. In this case, the exact tests are appropriate. The alternative hypothesis for this analysis states that coronary heart disease is more likely to be associated with a high fat diet, so a one-sided test is desired. Fisher's exact right-sided test analyzes whether the probability of heart disease in the high fat group exceeds the probability of heart disease in the low fat group; since this p-value is small, the alternative hypothesis is supported.
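Fisher's exact test conditions on the table margins, so the (1,1) cell follows a hypergeometric distribution. The Python sketch below is an independent check of the Fisher's Exact Test block in Output 2.4.2. It assumes the two-sided p-value is the sum of the probabilities of all tables no more likely than the observed one, which matches the "Two-sided Pr <= P" label. The function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, row1, col1, n):
    """P(cell (1,1) = k) given row-1 total, column-1 total, and sample size."""
    return comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)


def fisher_exact_2x2(a, b, c, d):
    """Left-, right-, and two-sided Fisher p-values for [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo = max(0, col1 - (n - row1))   # smallest feasible (1,1) count
    hi = min(row1, col1)             # largest feasible (1,1) count
    probs = {k: hypergeom_pmf(k, row1, col1, n) for k in range(lo, hi + 1)}
    p_table = probs[a]
    left = sum(p for k, p in probs.items() if k <= a)
    right = sum(p for k, p in probs.items() if k >= a)
    # Small relative tolerance guards against floating-point ties.
    two_sided = sum(p for p in probs.values() if p <= p_table * (1 + 1e-9))
    return {"table": p_table, "left": left, "right": right, "two": two_sided}


# Rows: high, low cholesterol diet. Columns: heart disease yes, no.
fisher = fisher_exact_2x2(11, 4, 2, 6)
print({key: round(value, 4) for key, value in fisher.items()})
```

With only nine feasible tables for these margins, full enumeration is cheap, which is one reason the exact test is practical for small samples like this one.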

The odds ratio, displayed in Output 2.4.3, provides an estimate of the relative risk when an event is rare. This estimate indicates that the odds of heart disease are 8.25 times higher in the high fat diet group; however, the wide confidence limits indicate that this estimate has low precision.
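The point estimates and asymptotic limits in Output 2.4.3 can be recomputed from the four cell counts. The Python sketch below assumes the usual large-sample method: limits are formed on the log scale and then exponentiated, with variance 1/a + 1/b + 1/c + 1/d for the log odds ratio. It does not attempt the exact limits. All names are illustrative.

```python
import math
from statistics import NormalDist

# Cell counts from Output 2.4.1: rows are high/low cholesterol diet,
# columns are heart disease yes/no.
a, b = 11, 4
c, d = 2, 6
n1, n2 = a + b, c + d

z = NormalDist().inv_cdf(0.975)   # 95% two-sided critical value


def log_scale_limits(estimate, variance_of_log):
    """Confidence limits built on the log scale, then back-transformed."""
    half_width = z * math.sqrt(variance_of_log)
    return estimate * math.exp(-half_width), estimate * math.exp(half_width)


# Case-control estimate: odds ratio.
odds_ratio = (a * d) / (b * c)
or_limits = log_scale_limits(odds_ratio, 1 / a + 1 / b + 1 / c + 1 / d)

# Cohort estimates: ratio of row-1 risk to row-2 risk for each column.
col1_risk = (a / n1) / (c / n2)
col1_limits = log_scale_limits(col1_risk, b / (a * n1) + d / (c * n2))
col2_risk = (b / n1) / (d / n2)
col2_limits = log_scale_limits(col2_risk, a / (b * n1) + c / (d * n2))

print(f"Odds ratio {odds_ratio:.4f}  limits {or_limits[0]:.4f}, {or_limits[1]:.4f}")
print(f"Col1 risk  {col1_risk:.4f}  limits {col1_limits[0]:.4f}, {col1_limits[1]:.4f}")
print(f"Col2 risk  {col2_risk:.4f}  limits {col2_limits[0]:.4f}, {col2_limits[1]:.4f}")
```

The upper limit is roughly fifty times the lower limit, which is what "low precision" means in practice for a sample of 23.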

Example 2.5. Creating an Output Data Set Containing Chi-Square Statistics

This example uses the Color data from Example 2.1 (page 161) to output the Pearson chi-square and the likelihood-ratio chi-square statistics to a SAS data set. The following statements create a two-way table of eye color versus hair color.

  proc freq data=Color order=data;
     weight Count;
     tables Eyes*Hair / chisq expected cellchi2 norow nocol;
     output out=ChiSqData pchi lrchi n nmiss;
     title 'Chi-Square Tests for 3 by 5 Table of Eye and Hair Color';
  run;
  proc print data=ChiSqData noobs;
     title1 'Chi-Square Statistics for Eye and Hair Color';
     title2 'Output Data Set from the FREQ Procedure';
  run;

The CHISQ option produces chi-square tests, the EXPECTED option displays expected cell frequencies in the table, and the CELLCHI2 option displays the cell contribution to the chi-square. The NOROW and NOCOL options suppress the display of row and column percents in the table.

The OUTPUT statement creates the ChiSqData data set with eight variables: the N option stores the number of nonmissing observations, the NMISS option stores the number of missing observations, and the PCHI and LRCHI options store Pearson and likelihood-ratio chi-square statistics, respectively, together with their degrees of freedom and p-values.

The preceding statements produce Output 2.5.1 and Output 2.5.2.

Output 2.5.1: Contingency Table

  Chi-Square Tests for 3 by 5 Table of Eye and Hair Color

  The FREQ Procedure

  Table of Eyes by Hair

  Eyes(Eye Color)     Hair(Hair Color)

  Frequency      |
  Expected       |
  Cell Chi-Square|
  Percent        |fair    |red     |medium  |dark    |black   |  Total
  ---------------+--------+--------+--------+--------+--------+
  blue           |     69 |     28 |     68 |     51 |      6 |    222
                 | 66.425 | 32.921 |  63.22 | 53.024 | 6.4094 |
                 | 0.0998 | 0.7357 | 0.3613 | 0.0772 | 0.0262 |
                 |   9.06 |   3.67 |   8.92 |   6.69 |   0.79 |  29.13
  ---------------+--------+--------+--------+--------+--------+
  green          |     69 |     38 |     55 |     37 |      0 |    199
                 | 59.543 |  29.51 | 56.671 |  47.53 | 5.7454 |
                 | 1.5019 | 2.4422 | 0.0492 | 2.3329 | 5.7454 |
                 |   9.06 |   4.99 |   7.22 |   4.86 |   0.00 |  26.12
  ---------------+--------+--------+--------+--------+--------+
  brown          |     90 |     47 |     94 |     94 |     16 |    341
                 | 102.03 | 50.568 | 97.109 | 81.446 | 9.8451 |
                 | 1.4187 | 0.2518 | 0.0995 |  1.935 | 3.8478 |
                 |  11.81 |   6.17 |  12.34 |  12.34 |   2.10 |  44.75
  ---------------+--------+--------+--------+--------+--------+
  Total               228      113      217      182       22      762
                    29.92    14.83    28.48    23.88     2.89   100.00

Output 2.5.2: Chi-Square Statistics

  Chi-Square Tests for 3 by 5 Table of Eye and Hair Color

  Statistics for Table of Eyes by Hair

  Statistic                     DF       Value      Prob
  ------------------------------------------------------
  Chi-Square                     8     20.9248    0.0073
  Likelihood Ratio Chi-Square    8     25.9733    0.0011
  Mantel-Haenszel Chi-Square     1      3.7838    0.0518
  Phi Coefficient                       0.1657
  Contingency Coefficient               0.1635
  Cramer's V                            0.1172

  Sample Size = 762

The contingency table in Output 2.5.1 displays eye and hair color in the order in which they appear in the Color data set. The Pearson chi-square statistic in Output 2.5.2 provides evidence of an association between eye and hair color (p = 0.0073). The cell chi-square values show that most of the association is due to more green-eyed children with fair or red hair and fewer with dark or black hair. The opposite occurs with the brown-eyed children.
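The cell chi-square values and their total can be recomputed from the frequencies in Output 2.5.1. The Python sketch below is an independent check that assumes each cell contributes (observed - expected)^2 / expected, and it uses a closed-form tail probability that holds only for even degrees of freedom (DF = 8 here). The names are hypothetical.

```python
import math

# Frequencies from Output 2.5.1; columns in data order.
hair_levels = ["fair", "red", "medium", "dark", "black"]
table = {
    "blue":  [69, 28, 68, 51, 6],
    "green": [69, 38, 55, 37, 0],
    "brown": [90, 47, 94, 94, 16],
}

n = sum(sum(row) for row in table.values())
row_totals = {eyes: sum(row) for eyes, row in table.items()}
col_totals = [sum(row[j] for row in table.values())
              for j in range(len(hair_levels))]

# Contribution of each cell to the Pearson chi-square.
cell_chi2 = {}
for eyes, row in table.items():
    for j, hair in enumerate(hair_levels):
        e = row_totals[eyes] * col_totals[j] / n
        cell_chi2[(eyes, hair)] = (row[j] - e) ** 2 / e

pearson = sum(cell_chi2.values())
df = (len(table) - 1) * (len(hair_levels) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(pearson, df)

# Cells sorted by contribution, largest first.
for cell, value in sorted(cell_chi2.items(), key=lambda kv: -kv[1]):
    print(f"{cell[0]:6s} {cell[1]:7s} {value:7.4f}")
print(f"Pearson chi-square={pearson:.4f}  DF={df}  p={p_value:.6f}")
```

Sorting the contributions puts the green/black and brown/black cells at the top, followed by the green-eyed cells for red and dark hair.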

The OUT= data set is displayed in Output 2.5.3. It contains one observation with the sample size, the number of missing values, and the chi-square statistics and corresponding degrees of freedom and p-values as in Output 2.5.2.

Output 2.5.3: Output Data Set

  Chi-Square Statistics for Eye and Hair Color
  Output Data Set from the FREQ Procedure

    N    NMISS    _PCHI_    DF_PCHI      P_PCHI     _LRCHI_    DF_LRCHI     P_LRCHI
   762     0     20.9248       8      .007349898    25.9733        8      .001061424

Example 2.6. Computing Cochran-Mantel-Haenszel Statistics for a Stratified Table

The data set Migraine contains hypothetical data for a clinical trial of migraine treatment. Subjects of both genders receive either a new drug therapy or a placebo. Their response to treatment is coded as Better or Same. The data are recorded as cell counts, and the number of subjects for each treatment and response combination is recorded in the variable Count.

  data Migraine;
     input Gender $ Treatment $ Response $ Count @@;
     datalines;
  female Active  Better 16   female Active  Same 11
  female Placebo Better  5   female Placebo Same 20
  male   Active  Better 12   male   Active  Same 16
  male   Placebo Better  7   male   Placebo Same 19
  ;

The following statements create a three-way table stratified by Gender, where Treatment forms the rows and Response forms the columns. The CMH option produces the Cochran-Mantel-Haenszel statistics. For this stratified 2x2 table, estimates of the common relative risk and the Breslow-Day test for homogeneity of the odds ratios are also displayed. The NOPRINT option suppresses the display of the contingency tables. These statements produce Output 2.6.1 through Output 2.6.3.

  proc freq data=Migraine;
     weight Count;
     tables Gender*Treatment*Response / cmh noprint;
     title 'Clinical Trial for Treatment of Migraine Headaches';
  run;

Output 2.6.1: Cochran-Mantel-Haenszel Statistics

  Clinical Trial for Treatment of Migraine Headaches

  The FREQ Procedure

  Summary Statistics for Treatment by Response
  Controlling for Gender

  Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

  Statistic    Alternative Hypothesis    DF       Value      Prob
  ---------------------------------------------------------------
      1        Nonzero Correlation        1      8.3052    0.0040
      2        Row Mean Scores Differ     1      8.3052    0.0040
      3        General Association        1      8.3052    0.0040

  Total Sample Size = 106

Output 2.6.2: CMH Option: Relative Risks

  Clinical Trial for Treatment of Migraine Headaches

  Summary Statistics for Treatment by Response
  Controlling for Gender

            Estimates of the Common Relative Risk (Row1/Row2)

  Type of Study     Method                  Value     95% Confidence Limits
  -------------------------------------------------------------------------
  Case-Control      Mantel-Haenszel        3.3132       1.4456       7.5934
    (Odds Ratio)    Logit                  3.2941       1.4182       7.6515

  Cohort            Mantel-Haenszel        2.1636       1.2336       3.7948
    (Col1 Risk)     Logit                  2.1059       1.1951       3.7108

  Cohort            Mantel-Haenszel        0.6420       0.4705       0.8761
    (Col2 Risk)     Logit                  0.6613       0.4852       0.9013

  Total Sample Size = 106

Output 2.6.3: CMH Option: Breslow-Day Test

  Clinical Trial for Treatment of Migraine Headaches

  Summary Statistics for Treatment by Response
  Controlling for Gender

       Breslow-Day Test for
  Homogeneity of the Odds Ratios
  ------------------------------
  Chi-Square              1.4929
  DF                           1
  Pr > ChiSq              0.2218

  Total Sample Size = 106

For a stratified 2x2 table, the three CMH statistics displayed in Output 2.6.1 test the same hypothesis. The significant p-value (0.004) indicates that the association between treatment and response remains strong after adjusting for gender.

The CMH option also produces a table of relative risks, as shown in Output 2.6.2. Because this is a prospective study, the relative risk estimate assesses the effectiveness of the new drug; the Cohort (Col1 Risk) values are the appropriate estimates for the first column, or the risk of improvement. The probability of migraine improvement with the new drug is just over two times the probability of improvement with the placebo.

The large p-value for the Breslow-Day test (0.2218) in Output 2.6.3 indicates no significant gender difference in the odds ratios.
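For two 2x2 strata, the CMH statistic and the Mantel-Haenszel estimates reduce to short sums over the strata. The Python sketch below is an independent check that assumes the textbook formulas: the squared sum of (observed - expected) for the (1,1) cell divided by the sum of its hypergeometric variances, and ratio-of-sums estimators for the common odds ratio and Col1 relative risk. It does not compute the logit estimates, the confidence limits, or the Breslow-Day test. Names are illustrative.

```python
import math

# (a, b, c, d) per stratum: Active/Better, Active/Same,
# Placebo/Better, Placebo/Same, from the Migraine data set.
strata = {
    "female": (16, 11, 5, 20),
    "male":   (12, 16, 7, 19),
}

or_num = or_den = 0.0     # pieces of the Mantel-Haenszel odds ratio
rr_num = rr_den = 0.0     # pieces of the Mantel-Haenszel Col1 relative risk
deviation = variance = 0.0

for a, b, c, d in strata.values():
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    or_num += a * d / n
    or_den += b * c / n
    rr_num += a * row2 / n
    rr_den += c * row1 / n
    # Observed minus expected (1,1) count, and its variance under H0.
    deviation += a - row1 * col1 / n
    variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))

mh_odds_ratio = or_num / or_den
mh_col1_risk = rr_num / rr_den
cmh = deviation ** 2 / variance
# Upper tail of a chi-square with 1 DF, via the complementary error function.
cmh_p = math.erfc(math.sqrt(cmh / 2))

print(f"CMH statistic      {cmh:.4f}  p={cmh_p:.4f}")
print(f"MH odds ratio      {mh_odds_ratio:.4f}")
print(f"MH Col1 risk ratio {mh_col1_risk:.4f}")
```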

Example 2.7. Computing the Cochran-Armitage Trend Test

The data set Pain contains hypothetical data for a clinical trial of a drug therapy to control pain. The clinical trial investigates whether adverse responses increase with larger drug doses. Subjects receive either a placebo or one of four drug doses. An adverse response is recorded as Adverse = Yes; otherwise, it is recorded as Adverse = No. The number of subjects for each drug dose and response combination is contained in the variable Count.

  data pain;
     input Dose Adverse $ Count @@;
     datalines;
  0 No 26   0 Yes  6
  1 No 26   1 Yes  7
  2 No 23   2 Yes  9
  3 No 18   3 Yes 14
  4 No  9   4 Yes 23
  ;

The TABLES statement in the following program produces a two-way table. The MEASURES option produces measures of association, and the CL option produces confidence limits for these measures. The TREND option tests for a trend across the ordinal values of the Dose variable with the Cochran-Armitage test. The EXACT statement produces exact p-values for this test, and the MAXTIME= option terminates the exact computations if they do not complete within 60 seconds. The TEST statement computes an asymptotic test for Somers' D (C|R). These statements produce Output 2.7.1 through Output 2.7.3.

  proc freq data=Pain;
     weight Count;
     tables Dose*Adverse / trend measures cl;
     test smdcr;
     exact trend / maxtime=60;
     title1 'Clinical Trial for Treatment of Pain';
  run;

Output 2.7.1: Contingency Table

  Clinical Trial for Treatment of Pain

  The FREQ Procedure

  Table of Dose by Adverse

  Dose      Adverse

  Frequency|
  Percent  |
  Row Pct  |
  Col Pct  |No      |Yes     |  Total
  ---------+--------+--------+
         0 |     26 |      6 |     32
           |  16.15 |   3.73 |  19.88
           |  81.25 |  18.75 |
           |  25.49 |  10.17 |
  ---------+--------+--------+
         1 |     26 |      7 |     33
           |  16.15 |   4.35 |  20.50
           |  78.79 |  21.21 |
           |  25.49 |  11.86 |
  ---------+--------+--------+
         2 |     23 |      9 |     32
           |  14.29 |   5.59 |  19.88
           |  71.88 |  28.13 |
           |  22.55 |  15.25 |
  ---------+--------+--------+
         3 |     18 |     14 |     32
           |  11.18 |   8.70 |  19.88
           |  56.25 |  43.75 |
           |  17.65 |  23.73 |
  ---------+--------+--------+
         4 |      9 |     23 |     32
           |   5.59 |  14.29 |  19.88
           |  28.13 |  71.88 |
           |   8.82 |  38.98 |
  ---------+--------+--------+
  Total         102       59      161
              63.35    36.65   100.00

Output 2.7.2: Measures of Association

  Clinical Trial for Treatment of Pain

  Statistics for Table of Dose by Adverse

                                                                    95%
  Statistic                              Value      ASE       Confidence Limits
  -----------------------------------------------------------------------------
  Gamma                                 0.5313   0.0935       0.3480     0.7146
  Kendall's Tau-b                       0.3373   0.0642       0.2114     0.4631
  Stuart's Tau-c                        0.4111   0.0798       0.2547     0.5675
  Somers' D C|R                         0.2569   0.0499       0.1592     0.3547
  Somers' D R|C                         0.4427   0.0837       0.2786     0.6068
  Pearson Correlation                   0.3776   0.0714       0.2378     0.5175
  Spearman Correlation                  0.3771   0.0718       0.2363     0.5178
  Lambda Asymmetric C|R                 0.2373   0.0837       0.0732     0.4014
  Lambda Asymmetric R|C                 0.1250   0.0662       0.0000     0.2547
  Lambda Symmetric                      0.1604   0.0621       0.0388     0.2821
  Uncertainty Coefficient C|R           0.1261   0.0467       0.0346     0.2175
  Uncertainty Coefficient R|C           0.0515   0.0191       0.0140     0.0890
  Uncertainty Coefficient Symmetric     0.0731   0.0271       0.0199     0.1262

           Somers' D C|R
  --------------------------------
  Somers' D C|R             0.2569
  ASE                       0.0499
  95% Lower Conf Limit      0.1592
  95% Upper Conf Limit      0.3547

   Test of H0: Somers' D C|R = 0

  ASE under H0              0.0499
  Z                         5.1511
  One-sided Pr >  Z         <.0001
  Two-sided Pr > |Z|        <.0001

  Sample Size = 161

Output 2.7.3: Trend Test

  Clinical Trial for Treatment of Pain

  Statistics for Table of Dose by Adverse

    Cochran-Armitage Trend Test
  -------------------------------
  Statistic (Z)           -4.7918

  Asymptotic Test
  One-sided Pr <  Z        <.0001
  Two-sided Pr > |Z|       <.0001

  Exact Test
  One-sided Pr <=  Z    7.237E-07
  Two-sided Pr >= |Z|   1.324E-06

  Sample Size = 161

The Row Pct values in Output 2.7.1 show the expected increasing trend in the proportion of adverse effects due to increasing dosage (from 18.75% to 71.88%).

Output 2.7.2 displays the measures of association produced by the MEASURES option. Somers' D (C|R) measures the association treating the column variable (Adverse) as the response and the row variable (Dose) as a predictor. Because the asymptotic 95% confidence limits do not contain zero, this indicates a strong positive association. Similarly, the Pearson and Spearman correlation coefficients show evidence of a strong positive association, as hypothesized.
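Gamma and the two Somers' D values in Output 2.7.2 are all built from counts of concordant and discordant pairs. The Python sketch below is an independent check that assumes the standard definitions: gamma = (C - D) / (C + D), and Somers' D divides 2(C - D) by n^2 minus the sum of squared row totals (for C|R) or squared column totals (for R|C). It does not compute the standard errors. The function name is hypothetical.

```python
# Dose-by-Adverse counts from Output 2.7.1; rows are Dose 0..4,
# columns are Adverse = No, Yes.
table = [
    [26, 6],
    [26, 7],
    [23, 9],
    [18, 14],
    [9, 23],
]


def pair_counts(table):
    """Count concordant and discordant pairs, each unordered pair once."""
    concordant = discordant = 0
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):       # strictly lower row
                for m in range(cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


concordant, discordant = pair_counts(table)
n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(row[j] for row in table) for j in range(len(table[0]))]

gamma = (concordant - discordant) / (concordant + discordant)
somers_d_cr = 2 * (concordant - discordant) / (n ** 2 - sum(r ** 2 for r in row_totals))
somers_d_rc = 2 * (concordant - discordant) / (n ** 2 - sum(c ** 2 for c in col_totals))

print(f"Gamma         {gamma:.4f}")
print(f"Somers' D C|R {somers_d_cr:.4f}")
print(f"Somers' D R|C {somers_d_rc:.4f}")
```

The two Somers' D values differ only in their denominators, which is why D(R|C) is larger here: the two-level column variable leaves fewer pairs untied on the predictor.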

The Cochran-Armitage test (Output 2.7.3) supports the trend hypothesis. The small left-sided p-values for the Cochran-Armitage test indicate that the probability of the Column 1 level (Adverse = No) decreases as Dose increases or, equivalently, that the probability of the Column 2 level (Adverse = Yes) increases as Dose increases. The two-sided p-value tests against either an increasing or decreasing alternative. This is an appropriate hypothesis when you want to determine whether the drug has progressive effects on the probability of adverse effects but the direction is unknown.
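To make the one-sided versus two-sided distinction concrete, the following Python sketch converts the Z statistic printed in Output 2.7.3 into asymptotic p-values under a standard normal reference distribution. It is only an illustration with made-up variable names; it does not reproduce the exact test, which depends on the full table.

```python
import math

z = -4.7918  # Cochran-Armitage statistic copied from Output 2.7.3

# Left-sided p-value: probability that a standard normal variate is <= z.
p_left = 0.5 * math.erfc(-z / math.sqrt(2))

# Two-sided p-value: probability that its absolute value is >= |z|.
p_two = math.erfc(abs(z) / math.sqrt(2))

print(f"left-sided p = {p_left:.3e}")
print(f"two-sided  p = {p_two:.3e}")
print("both below 0.0001:", p_left < 1e-4 and p_two < 1e-4)
```

Because Z is negative, the left-sided p-value is half of the two-sided one, which is why both asymptotic lines in the listing print as <.0001.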

Example 2.8. Computing Friedman's Chi-Square Statistic

Friedman's test is a nonparametric test for treatment differences in a randomized complete block design. Each block of the design may be a subject or a homogeneous group of subjects. If blocks are groups of subjects, the number of subjects in each block must equal the number of treatments. Treatments are randomly assigned to subjects within each block. If there is one subject per block, then the subjects are repeatedly measured once under each treatment. The order of treatments is randomized for each subject.

In this setting, Friedman's test is identical to the ANOVA (row mean scores) CMH statistic when the analysis uses rank scores (SCORES=RANK). The three-way table uses subject (or subject group) as the stratifying variable, treatment as the row variable, and response as the column variable. PROC FREQ handles ties by assigning midranks to tied response values. If there are multiple subjects per treatment in each block, the ANOVA CMH statistic is a generalization of Friedman's test.

The data set Hypnosis contains data from a study investigating whether hypnosis has the same effect on skin potential (measured in millivolts) for four emotions (Lehmann 1975, p. 264). Eight subjects are asked to display fear, joy, sadness, and calmness under hypnosis. The data are recorded as one observation per subject for each emotion.

  data Hypnosis;
     length Emotion $ 10;
     input Subject Emotion $ SkinResponse @@;
     datalines;
  1 fear 23.1  1 joy 22.7  1 sadness 22.5  1 calmness 22.6
  2 fear 57.6  2 joy 53.2  2 sadness 53.7  2 calmness 53.1
  3 fear 10.5  3 joy  9.7  3 sadness 10.8  3 calmness  8.3
  4 fear 23.6  4 joy 19.6  4 sadness 21.1  4 calmness 21.6
  5 fear 11.9  5 joy 13.8  5 sadness 13.7  5 calmness 13.3
  6 fear 54.6  6 joy 47.1  6 sadness 39.2  6 calmness 37.0
  7 fear 21.0  7 joy 13.6  7 sadness 13.7  7 calmness 14.8
  8 fear 20.3  8 joy 23.6  8 sadness 16.3  8 calmness 14.8
  ;

In the following statements, the TABLES statement creates a three-way table stratified by Subject and a two-way table; the variables Emotion and SkinResponse form the rows and columns of each table. The CMH2 option produces the first two Cochran-Mantel-Haenszel statistics, the option SCORES=RANK specifies that rank scores are used to compute these statistics, and the NOPRINT option suppresses the contingency tables. These statements produce Output 2.8.1 and Output 2.8.2.

  proc freq data=Hypnosis;
     tables Subject*Emotion*SkinResponse
            Emotion*SkinResponse / cmh2 scores=rank noprint;
  run;

Output 2.8.1: CMH Statistics, Stratifying by Subject

  The FREQ Procedure

  Summary Statistics for Emotion by SkinResponse
  Controlling for Subject

  Cochran-Mantel-Haenszel Statistics (Based on Rank Scores)

  Statistic    Alternative Hypothesis    DF       Value      Prob
  ---------------------------------------------------------------
      1        Nonzero Correlation        1      0.2400    0.6242
      2        Row Mean Scores Differ     3      6.4500    0.0917

  Total Sample Size = 32

Output 2.8.2: CMH Statistics, No Stratification

  The FREQ Procedure

  Summary Statistics for Emotion by SkinResponse

  Cochran-Mantel-Haenszel Statistics (Based on Rank Scores)

  Statistic    Alternative Hypothesis    DF       Value      Prob
  ---------------------------------------------------------------
      1        Nonzero Correlation        1      0.0001    0.9933
      2        Row Mean Scores Differ     3      0.5678    0.9038

  Total Sample Size = 32

Because the CMH statistics in Output 2.8.1 are based on rank scores, the Row Mean Scores Differ statistic is identical to Friedman's chi-square (Q = 6.45). The p-value of 0.0917 indicates that differences in skin potential response for different emotions are significant at the 10% level but not at the 5% level.
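As a cross-check outside SAS, Friedman's chi-square can be recomputed directly from the Hypnosis data by ranking the four responses within each subject. The Python sketch below uses the textbook no-ties formula, which applies here because no subject has tied responses; it is not how PROC FREQ computes the CMH statistic, and the helper name within_block_ranks is hypothetical.

```python
import math

# Skin potential (millivolts), one row per subject,
# columns in the order fear, joy, sadness, calmness.
emotions = ["fear", "joy", "sadness", "calmness"]
data = [
    [23.1, 22.7, 22.5, 22.6],
    [57.6, 53.2, 53.7, 53.1],
    [10.5,  9.7, 10.8,  8.3],
    [23.6, 19.6, 21.1, 21.6],
    [11.9, 13.8, 13.7, 13.3],
    [54.6, 47.1, 39.2, 37.0],
    [21.0, 13.6, 13.7, 14.8],
    [20.3, 23.6, 16.3, 14.8],
]


def within_block_ranks(row):
    """Rank one subject's responses (1 = smallest); assumes no ties in the row."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0] * len(row)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


n, k = len(data), len(emotions)   # 8 blocks (subjects), 4 treatments (emotions)
rank_sums = [0] * k
for row in data:
    for j, r in enumerate(within_block_ranks(row)):
        rank_sums[j] += r

# Friedman's chi-square, no-ties form, with k - 1 degrees of freedom.
q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# Upper-tail chi-square probability; this closed form is valid only for 3 df.
p = math.erfc(math.sqrt(q / 2)) + math.sqrt(2 * q / math.pi) * math.exp(-q / 2)

print(dict(zip(emotions, rank_sums)))
print(f"Q = {q:.4f}, p = {p:.4f}")
```

The rank sums show where the difference comes from: fear has the largest within-subject ranks and calmness the smallest.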

When you do not stratify by subject, the Row Mean Scores Differ CMH statistic is identical to a Kruskal-Wallis test and is not significant (p = 0.9038 in Output 2.8.2). Thus, adjusting for subject is critical to reducing the background variation due to subject differences.
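The unstratified result can be checked the same way. The Python sketch below pools all 32 responses, assigns midranks to tied values, and forms the tie-corrected Kruskal-Wallis statistic as (N - 1) times the between-group sum of squares of the ranks divided by their total sum of squares. This is an illustrative recomputation under those assumptions, not the PROC FREQ implementation, and the function name midrank is hypothetical.

```python
import math

# Skin potential (millivolts) grouped by emotion, subjects 1 through 8.
groups = {
    "fear":     [23.1, 57.6, 10.5, 23.6, 11.9, 54.6, 21.0, 20.3],
    "joy":      [22.7, 53.2,  9.7, 19.6, 13.8, 47.1, 13.6, 23.6],
    "sadness":  [22.5, 53.7, 10.8, 21.1, 13.7, 39.2, 13.7, 16.3],
    "calmness": [22.6, 53.1,  8.3, 21.6, 13.3, 37.0, 14.8, 14.8],
}

pooled = sorted(v for values in groups.values() for v in values)
N = len(pooled)


def midrank(v):
    """Average of the 1-based positions that value v occupies in the pooled sort."""
    first = pooled.index(v) + 1
    count = pooled.count(v)
    return first + (count - 1) / 2


grand_mean = (N + 1) / 2

# Between-group sum of squares of the mean ranks.
between = sum(
    len(vals) * (sum(midrank(v) for v in vals) / len(vals) - grand_mean) ** 2
    for vals in groups.values()
)
# Total sum of squares of the midranks (ties make this smaller than N(N^2-1)/12).
total = sum((midrank(v) - grand_mean) ** 2 for v in pooled)

h = (N - 1) * between / total

# Upper-tail chi-square probability; this closed form is valid only for 3 df.
p = math.erfc(math.sqrt(h / 2)) + math.sqrt(2 * h / math.pi) * math.exp(-h / 2)

print(f"H = {h:.4f}, p = {p:.4f}")
```

Ignoring the subject blocks lets the large between-subject spread (roughly 8 to 58 millivolts) dominate the ranks, which is why the statistic drops from 6.45 to about 0.57.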

Example 2.9. Testing Marginal Homogeneity with Cochran's Q

When a binary response is measured several times or under different conditions, Cochran's Q tests that the marginal probability of a positive response is unchanged across the times or conditions. When there are more than two response categories, you can use the CATMOD procedure to fit a repeated-measures model.

The data set Drugs contains data for a study of three drugs to treat a chronic disease (Agresti 1990). Forty-six subjects receive drugs A, B, and C. The response to each drug is either favorable (F) or unfavorable (U).

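  /* Display labels for the one-character response codes */
  proc format;
     value $ResponseFmt 'F'='Favorable'
                        'U'='Unfavorable';
  run;

  /* One record per response pattern; Count is the number of subjects */
  data drugs;
     input Drug_A $ Drug_B $ Drug_C $ Count @@;
     datalines;
  F F F  6   U F F  2
  F F U 16   U F U  4
  F U F  2   U U F  6
  F U U  4   U U U  6
  ;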

The following statements create one-way frequency tables of the responses to each drug. The AGREE option produces Cochran's Q and other measures of agreement for the three-way table. These statements produce Output 2.9.1 through Output 2.9.3.

  proc freq data=Drugs;
     weight Count;
     tables Drug_A Drug_B Drug_C / nocum;
     tables Drug_A*Drug_B*Drug_C / agree noprint;
     format Drug_A Drug_B Drug_C $ResponseFmt.;
     title 'Study of Three Drug Treatments for a Chronic Disease';
  run;

Output 2.9.1: One-Way Frequency Tables

  Study of Three Drug Treatments for a Chronic Disease

  The FREQ Procedure

  Drug_A         Frequency     Percent
  ------------------------------------
  Favorable             28       60.87
  Unfavorable           18       39.13

  Drug_B         Frequency     Percent
  ------------------------------------
  Favorable             28       60.87
  Unfavorable           18       39.13

  Drug_C         Frequency     Percent
  ------------------------------------
  Favorable             16       34.78
  Unfavorable           30       65.22

Output 2.9.2: Measures of Agreement

  Study of Three Drug Treatments for a Chronic Disease

  Statistics for Table 1 of Drug_B by Drug_C
  Controlling for Drug_A=Favorable

  McNemar's Test
  ------------------------
  Statistic (S)    10.8889
  DF                     1
  Pr > S            0.0010

  Simple Kappa Coefficient
  --------------------------------
  Kappa                    -0.0328
  ASE                       0.1167
  95% Lower Conf Limit     -0.2615
  95% Upper Conf Limit      0.1960

  Sample Size = 28

  Statistics for Table 2 of Drug_B by Drug_C
  Controlling for Drug_A=Unfavorable

  McNemar's Test
  -----------------------
  Statistic (S)    0.4000
  DF                    1
  Pr > S           0.5271

  Simple Kappa Coefficient
  --------------------------------
  Kappa                    -0.1538
  ASE                       0.2230
  95% Lower Conf Limit     -0.5909
  95% Upper Conf Limit      0.2832

  Sample Size = 18

  Study of Three Drug Treatments for a Chronic Disease

  Summary Statistics for Drug_B by Drug_C
  Controlling for Drug_A

  Overall Kappa Coefficient
  --------------------------------
  Kappa                    -0.0588
  ASE                       0.1034
  95% Lower Conf Limit     -0.2615
  95% Upper Conf Limit      0.1439

  Test for Equal Kappa
  Coefficients
  --------------------
  Chi-Square    0.2314
  DF                 1
  Pr > ChiSq    0.6305

  Total Sample Size = 46

Output 2.9.3: Cochran's Q

  Study of Three Drug Treatments for a Chronic Disease

  Summary Statistics for Drug_B by Drug_C
  Controlling for Drug_A

  Cochran's Q, for Drug_A
  by Drug_B by Drug_C
  -----------------------
  Statistic (Q)    8.4706
  DF                    2
  Pr > Q           0.0145

  Total Sample Size = 46

The one-way frequency tables in Output 2.9.1 provide the marginal response for each drug. For drugs A and B, 61% of the subjects reported a favorable response while 35% of the subjects reported a favorable response to drug C.

McNemar's test (Output 2.9.2) shows strong discordance between drugs B and C when the response to drug A is favorable. The small negative value of the simple kappa indicates no agreement between drug B response and drug C response.
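Both statistics for the first table (Drug_A = Favorable) can be recomputed by hand from the four relevant data lines. The Python sketch below is an illustrative cross-check with made-up variable names. It uses the standard formulas: McNemar's statistic depends only on the discordant cells, and simple kappa compares observed agreement with the agreement expected from the margins.

```python
import math

# Drug_B (rows) by Drug_C (columns) for subjects with a favorable Drug_A response.
# Counts come from the Drugs data lines: F F F = 6, F F U = 16, F U F = 2, F U U = 4.
n_ff, n_fu = 6, 16   # Drug_B favorable:   Drug_C favorable / unfavorable
n_uf, n_uu = 2, 4    # Drug_B unfavorable: Drug_C favorable / unfavorable
n = n_ff + n_fu + n_uf + n_uu

# McNemar's statistic uses only the discordant (off-diagonal) cells.
s = (n_fu - n_uf) ** 2 / (n_fu + n_uf)
p_s = math.erfc(math.sqrt(s / 2))   # chi-square upper tail, valid for 1 df

# Simple kappa: (observed agreement - expected agreement) / (1 - expected agreement).
p_obs = (n_ff + n_uu) / n
p_exp = ((n_ff + n_fu) * (n_ff + n_uf) + (n_uf + n_uu) * (n_fu + n_uu)) / n ** 2
kappa = (p_obs - p_exp) / (1 - p_exp)

print(f"n = {n}")
print(f"McNemar S = {s:.4f}, p = {p_s:.4f}")
print(f"kappa = {kappa:.4f}")
```

The discordance is lopsided: 16 subjects favored drug B but not drug C, against 2 in the other direction, which is what drives the large McNemar statistic.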

Cochran's Q is statistically significant (p = 0.0145 in Output 2.9.3), which leads to rejection of the hypothesis that the probability of favorable response is the same for the three drugs.
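Cochran's Q can be recomputed from the eight response patterns in the Drugs data as a check on Output 2.9.3. The Python sketch below applies the usual formula, based on the number of favorable responses per drug and per subject. It is an independent illustration with made-up variable names, not the PROC FREQ implementation.

```python
import math

# Response patterns (Drug_A, Drug_B, Drug_C) and subject counts from the Drugs data.
patterns = {
    "FFF": 6, "UFF": 2, "FFU": 16, "UFU": 4,
    "FUF": 2, "UUF": 6, "FUU": 4, "UUU": 6,
}
k = 3  # number of drugs (repeated conditions)

# Number of favorable responses for each drug (column totals).
col_totals = [
    sum(count for pat, count in patterns.items() if pat[j] == "F")
    for j in range(k)
]
total = sum(col_totals)

# Sum over subjects of the squared number of favorable responses (row totals squared).
row_sq = sum(count * pat.count("F") ** 2 for pat, count in patterns.items())

# Cochran's Q with k - 1 degrees of freedom.
q = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2) / (k * total - row_sq)

# Upper-tail chi-square probability; exp(-q/2) is exact only for 2 df.
p = math.exp(-q / 2)

print("favorable counts (A, B, C):", col_totals)
print(f"Q = {q:.4f}, p = {p:.4f}")
```

The favorable counts of 28, 28, and 16 out of 46 subjects match the marginal percentages in Output 2.9.1, so the significant Q reflects the lower favorable rate for drug C.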