The eye and hair color of children from two different regions of Europe are recorded in the data set Color. Instead of recording one observation per child, the data are recorded as cell counts, where the variable Count contains the number of children exhibiting each of the 15 eye and hair color combinations. The data set does not include missing combinations.
   data Color;
      input Region Eyes $ Hair $ Count @@;
      label Eyes  ='Eye Color'
            Hair  ='Hair Color'
            Region='Geographic Region';
      datalines;
   1 blue  fair   23  1 blue  red     7  1 blue  medium 24
   1 blue  dark   11  1 green fair   19  1 green red     7
   1 green medium 18  1 green dark   14  1 brown fair   34
   1 brown red     5  1 brown medium 41  1 brown dark   40
   1 brown black   3  2 blue  fair   46  2 blue  red    21
   2 blue  medium 44  2 blue  dark   40  2 blue  black   6
   2 green fair   50  2 green red    31  2 green medium 37
   2 green dark   23  2 brown fair   56  2 brown red    42
   2 brown medium 53  2 brown dark   54  2 brown black  13
   ;
The following statements read the Color data set and create an output data set containing the frequencies, percentages, and expected cell frequencies of the Eyes by Hair two-way table. The TABLES statement requests three tables: Eyes and Hair frequency tables and an Eyes by Hair crosstabulation table. The OUT= option creates the FreqCnt data set, which contains the crosstabulation table frequencies. The OUTEXPECT option outputs the expected cell frequencies to FreqCnt, and the SPARSE option includes the zero cell counts. The WEIGHT statement specifies that Count contains the observation weights. These statements create Output 2.1.1 through Output 2.1.3.
   proc freq data=Color;
      weight Count;
      tables Eyes Hair Eyes*Hair / out=FreqCnt outexpect sparse;
      title1 'Eye and Hair Color of European Children';
   run;

   proc print data=FreqCnt noobs;
      title2 'Output Data Set from PROC FREQ';
   run;
   Eye and Hair Color of European Children

   The FREQ Procedure

                          Eye Color

                                    Cumulative    Cumulative
   Eyes     Frequency     Percent     Frequency      Percent
   ---------------------------------------------------------
   blue          222       29.13           222        29.13
   brown         341       44.75           563        73.88
   green         199       26.12           762       100.00

                          Hair Color

                                     Cumulative    Cumulative
   Hair      Frequency     Percent     Frequency      Percent
   ----------------------------------------------------------
   black           22        2.89            22         2.89
   dark           182       23.88           204        26.77
   fair           228       29.92           432        56.69
   medium         217       28.48           649        85.17
   red            113       14.83           762       100.00
   Eye and Hair Color of European Children

   Table of Eyes by Hair

   Eyes(Eye Color)     Hair(Hair Color)

   Frequency|
   Percent  |
   Row Pct  |
   Col Pct  |black   |dark    |fair    |medium  |red     |  Total
   ---------+--------+--------+--------+--------+--------+
   blue     |      6 |     51 |     69 |     68 |     28 |    222
            |   0.79 |   6.69 |   9.06 |   8.92 |   3.67 |  29.13
            |   2.70 |  22.97 |  31.08 |  30.63 |  12.61 |
            |  27.27 |  28.02 |  30.26 |  31.34 |  24.78 |
   ---------+--------+--------+--------+--------+--------+
   brown    |     16 |     94 |     90 |     94 |     47 |    341
            |   2.10 |  12.34 |  11.81 |  12.34 |   6.17 |  44.75
            |   4.69 |  27.57 |  26.39 |  27.57 |  13.78 |
            |  72.73 |  51.65 |  39.47 |  43.32 |  41.59 |
   ---------+--------+--------+--------+--------+--------+
   green    |      0 |     37 |     69 |     55 |     38 |    199
            |   0.00 |   4.86 |   9.06 |   7.22 |   4.99 |  26.12
            |   0.00 |  18.59 |  34.67 |  27.64 |  19.10 |
            |   0.00 |  20.33 |  30.26 |  25.35 |  33.63 |
   ---------+--------+--------+--------+--------+--------+
   Total          22      182      228      217      113      762
                2.89    23.88    29.92    28.48    14.83   100.00
   Output Data Set from PROC FREQ

   Eyes     Hair      COUNT    EXPECTED    PERCENT
   blue     black         6       6.409     0.7874
   blue     dark         51      53.024     6.6929
   blue     fair         69      66.425     9.0551
   blue     medium       68      63.220     8.9239
   blue     red          28      32.921     3.6745
   brown    black        16       9.845     2.0997
   brown    dark         94      81.446    12.3360
   brown    fair         90     102.031    11.8110
   brown    medium       94      97.109    12.3360
   brown    red          47      50.568     6.1680
   green    black         0       5.745     0.0000
   green    dark         37      47.530     4.8556
   green    fair         69      59.543     9.0551
   green    medium       55      56.671     7.2178
   green    red          38      29.510     4.9869
Output 2.1.1 displays the two frequency tables produced, one showing the distribution of eye color and one showing the distribution of hair color. By default, PROC FREQ lists the variable values in alphabetical order. The Eyes*Hair specification produces a crosstabulation table, shown in Output 2.1.2, with eye color defining the table rows and hair color defining the table columns. A zero cell count for green eyes and black hair indicates that this eye and hair color combination does not occur in the data.
The output data set (Output 2.1.3) contains frequency counts and percentages for the last table. The data set also includes an observation for the zero cell count (SPARSE) and a variable with the expected cell frequency for each table cell (OUTEXPECT).
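The EXPECTED values in Output 2.1.3 follow the usual independence formula: row total times column total, divided by the table total. The following Python sketch is not part of the SAS example; it only recomputes those values from the cell counts in Output 2.1.2, and the names counts and expected_counts are illustrative.

```python
# Cell counts from the Eyes by Hair table.
# Columns are in the order black, dark, fair, medium, red.
counts = {
    "blue":  [6, 51, 69, 68, 28],
    "brown": [16, 94, 90, 94, 47],
    "green": [0, 37, 69, 55, 38],
}


def expected_counts(table):
    """Expected count under independence: row total * column total / n."""
    row_totals = {eye: sum(row) for eye, row in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    n = sum(row_totals.values())
    return {
        eye: [row_totals[eye] * col_total / n for col_total in col_totals]
        for eye in table
    }


expected = expected_counts(counts)
for eye, row in expected.items():
    print(eye, [round(value, 3) for value in row])
```

The zero count for green eyes and black hair still has a positive expected frequency, which is why the SPARSE option is needed for that cell to appear in the output data set.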
This example examines whether the children's hair color (from Example 2.1) has a specified multinomial distribution for the two regions. The hypothesized distribution for hair color is 30% fair, 12% red, 30% medium, 25% dark, and 3% black.
In order to test the hypothesis for each region, the data are first sorted by Region. Then the FREQ procedure uses a BY statement to produce a separate table for each BY group (Region). The option ORDER=DATA orders the frequency table values (hair color) by their order in the data set. The TABLES statement requests a frequency table for hair color, and the option NOCUM suppresses the display of the cumulative frequencies and percentages. The TESTP= option specifies the hypothesized percentages for the chi-square test; the number of percentages specified equals the number of table levels, and the percentages sum to 100. The following statements produce Output 2.2.1.
   proc sort data=Color;
      by Region;
   run;

   proc freq data=Color order=data;
      weight Count;
      tables Hair / nocum testp=(30 12 30 25 3);
      by Region;
      title 'Hair Color of European Children';
   run;
   Hair Color of European Children

   ----------------------------- Geographic Region=1 ------------------------------

   The FREQ Procedure

                    Hair Color

                                       Test
   Hair      Frequency     Percent     Percent
   -------------------------------------------
   fair            76       30.89       30.00
   red             19        7.72       12.00
   medium          83       33.74       30.00
   dark            65       26.42       25.00
   black            3        1.22        3.00

   Chi-Square Test for Specified Proportions
   -------------------------
   Chi-Square         7.7602
   DF                      4
   Pr > ChiSq         0.1008

   Sample Size = 246

   Hair Color of European Children

   ----------------------------- Geographic Region=2 ------------------------------

                    Hair Color

                                       Test
   Hair      Frequency     Percent     Percent
   -------------------------------------------
   fair           152       29.46       30.00
   red             94       18.22       12.00
   medium         134       25.97       30.00
   dark           117       22.67       25.00
   black           19        3.68        3.00

   Chi-Square Test for Specified Proportions
   -------------------------
   Chi-Square        21.3824
   DF                      4
   Pr > ChiSq         0.0003

   Sample Size = 516
The frequency tables in Output 2.2.1 list the variable values (hair color) in the order in which they appear in the data set. The Test Percent column lists the hypothesized percentages for the chi-square test. Always check that you have ordered the TESTP= percentages to correctly match the order of the variable levels.
PROC FREQ computes a chi-square statistic for each region. The chi-square statistic is significant at the 0.05 level for Region 2 (p = 0.0003) but not for Region 1 (p = 0.1008). This indicates a significant departure from the hypothesized percentages in Region 2.
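To make the test for specified proportions concrete, the following Python sketch (not part of the SAS example) recomputes the two chi-square statistics from the frequencies in Output 2.2.1. The function names are illustrative, and the p-value helper relies on a closed form that holds only when the degrees of freedom are even, which is the case here (DF = 4).

```python
import math


def chisq_specified(freqs, test_percents):
    """Pearson chi-square statistic against hypothesized percentages."""
    n = sum(freqs)
    stat = 0.0
    for observed, pct in zip(freqs, test_percents):
        expected = n * pct / 100.0
        stat += (observed - expected) ** 2 / expected
    return stat


def chisq_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of df")
    half = x / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


# Levels in data order: fair, red, medium, dark, black.
testp = [30, 12, 30, 25, 3]
region1 = [76, 19, 83, 65, 3]
region2 = [152, 94, 134, 117, 19]

for label, freqs in (("Region 1", region1), ("Region 2", region2)):
    stat = chisq_specified(freqs, testp)
    p_value = chisq_upper_tail_even_df(stat, df=len(freqs) - 1)
    print(f"{label}: chi-square = {stat:.4f}, p = {p_value:.4f}")
```

Swapping two entries of testp changes the statistics, which illustrates why the TESTP= percentages must be listed in the same order as the variable levels.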
The binomial proportion is computed as the proportion of observations for the first level of the variable that you are studying. The following statements compute the proportion of children with brown eyes (from the data set in Example 2.1) and test this value against the hypothesis that the proportion is 50%. Also, these statements test whether the proportion of children with fair hair is 28%.
   proc freq data=Color order=freq;
      weight Count;
      tables Eyes / binomial alpha=.1;
      tables Hair / binomial(p=.28);
      title 'Hair and Eye Color of European Children';
   run;
The first TABLES statement produces a frequency table for eye color. The BINOMIAL option computes the binomial proportion and confidence limits, and it tests the hypothesis that the proportion for the first eye color level (brown) is 0.5. The option ALPHA=.1 specifies that 90% confidence limits should be computed. The second TABLES statement creates a frequency table for hair color and computes the binomial proportion and confidence limits, but it tests that the proportion for the first hair color (fair) is 0.28. These statements produce Output 2.3.1 and Output 2.3.2.
   Hair and Eye Color of European Children

   The FREQ Procedure

                          Eye Color

                                    Cumulative    Cumulative
   Eyes     Frequency     Percent     Frequency      Percent
   ----------------------------------------------------------
   brown         341       44.75           341        44.75
   blue          222       29.13           563        73.88
   green         199       26.12           762       100.00

   Binomial Proportion for Eyes = brown
   --------------------------------
   Proportion                0.4475
   ASE                       0.0180
   90% Lower Conf Limit      0.4179
   90% Upper Conf Limit      0.4771

   Exact Conf Limits
   90% Lower Conf Limit      0.4174
   90% Upper Conf Limit      0.4779

   Test of H0: Proportion = 0.5

   ASE under H0              0.0181
   Z                        -2.8981
   One-sided Pr <  Z         0.0019
   Two-sided Pr > |Z|        0.0038

   Sample Size = 762
   Hair and Eye Color of European Children

                          Hair Color

                                     Cumulative    Cumulative
   Hair      Frequency     Percent     Frequency      Percent
   -------------------------------------------------------------
   fair           228       29.92           228        29.92
   medium         217       28.48           445        58.40
   dark           182       23.88           627        82.28
   red            113       14.83           740        97.11
   black           22        2.89           762       100.00

   Binomial Proportion for Hair = fair
   --------------------------------
   Proportion                0.2992
   ASE                       0.0166
   95% Lower Conf Limit      0.2667
   95% Upper Conf Limit      0.3317

   Exact Conf Limits
   95% Lower Conf Limit      0.2669
   95% Upper Conf Limit      0.3331

   Test of H0: Proportion = 0.28

   ASE under H0              0.0163
   Z                         1.1812
   One-sided Pr >  Z         0.1188
   Two-sided Pr > |Z|        0.2375

   Sample Size = 762
The frequency table in Output 2.3.1 displays the variable values in order of descending frequency count. Since the first variable level is brown, PROC FREQ computes the binomial proportion of children with brown eyes. PROC FREQ also computes its asymptotic standard error (ASE), and asymptotic and exact 90% confidence limits. If you do not specify the ALPHA= option, then PROC FREQ computes the default 95% confidence limits.
Because the value of Z is less than zero, PROC FREQ computes a left-sided p-value (0.0019). This small p-value supports the alternative hypothesis that the true value of the proportion of children with brown eyes is less than 50%.
Output 2.3.2 displays the results from the second TABLES statement. PROC FREQ computes the default 95% confidence limits since the ALPHA= option is not specified. The value of Z is greater than zero, so PROC FREQ computes a right-sided p-value (0.1188). This large p-value provides insufficient evidence to reject the null hypothesis that the proportion of children with fair hair is 28%.
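The arithmetic behind the asymptotic binomial tests can be sketched in a few lines of Python. This is an illustration of the standard z test, not PROC FREQ's implementation, and the function name binomial_z_test is hypothetical. Note that the test statistic uses the standard error under the null hypothesis, while the reported ASE uses the estimated proportion.

```python
import math


def binomial_z_test(count, n, p0):
    """Asymptotic z test of H0: proportion = p0 for a binomial count."""
    p_hat = count / n
    ase = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error at the estimate
    ase_h0 = math.sqrt(p0 * (1 - p0) / n)      # standard error under H0
    z = (p_hat - p0) / ase_h0
    # One-sided p-value in the direction of z; the two-sided value doubles it.
    one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return {"p_hat": p_hat, "ase": ase, "ase_h0": ase_h0, "z": z,
            "one_sided": one_sided, "two_sided": 2 * one_sided}


eyes = binomial_z_test(341, 762, 0.50)   # brown eyes, H0: 0.5
hair = binomial_z_test(228, 762, 0.28)   # fair hair, H0: 0.28

for label, result in (("brown eyes", eyes), ("fair hair", hair)):
    print(f"{label}: Z = {result['z']:.4f}, "
          f"one-sided p = {result['one_sided']:.4f}, "
          f"two-sided p = {result['two_sided']:.4f}")
```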
This example computes chi-square tests and Fisher's exact test to compare the probability of coronary heart disease for two types of diet. It also estimates the relative risks and computes exact confidence limits for the odds ratio.
The data set FatComp contains hypothetical data for a case-control study of high fat diet and the risk of coronary heart disease. The data are recorded as cell counts, where the variable Count contains the frequencies for each exposure and response combination. The data set is sorted in descending order by the variables Exposure and Response, so that the first cell of the 2×2 table contains the frequency of positive exposure and positive response. The FORMAT procedure creates formats to identify the type of exposure and response with character values.
   proc format;
      value ExpFmt 1='High Cholesterol Diet'
                   0='Low Cholesterol Diet';
      value RspFmt 1='Yes'
                   0='No';
   run;

   data FatComp;
      input Exposure Response Count;
      label Response='Heart Disease';
      datalines;
   0 0  6
   0 1  2
   1 0  4
   1 1 11
   ;

   proc sort data=FatComp;
      by descending Exposure descending Response;
   run;
In the following statements, the TABLES statement creates a two-way table, and the option ORDER=DATA orders the contingency table values by their order in the data set. The CHISQ option produces several chi-square tests, while the RELRISK option produces relative risk measures. The EXACT statement creates the exact Pearson chi-square test and exact confidence limits for the odds ratio. These statements produce Output 2.4.1 through Output 2.4.3.
   proc freq data=FatComp order=data;
      weight Count;
      tables Exposure*Response / chisq relrisk;
      exact pchi or;
      format Exposure ExpFmt. Response RspFmt.;
      title 'Case-Control Study of High Fat/Cholesterol Diet';
   run;
   Case-Control Study of High Fat/Cholesterol Diet

   The FREQ Procedure

   Table of Exposure by Response

   Exposure          Response(Heart Disease)

   Frequency        |
   Percent          |
   Row Pct          |
   Col Pct          |Yes     |No      |  Total
   -----------------+--------+--------+
   High Cholesterol |     11 |      4 |     15
   Diet             |  47.83 |  17.39 |  65.22
                    |  73.33 |  26.67 |
                    |  84.62 |  40.00 |
   -----------------+--------+--------+
   Low Cholesterol  |      2 |      6 |      8
   Diet             |   8.70 |  26.09 |  34.78
                    |  25.00 |  75.00 |
                    |  15.38 |  60.00 |
   -----------------+--------+--------+
   Total                  13       10       23
                       56.52    43.48   100.00
   Case-Control Study of High Fat/Cholesterol Diet

   Statistics for Table of Exposure by Response

   Statistic                     DF       Value      Prob
   ------------------------------------------------------
   Chi-Square                     1      4.9597    0.0259
   Likelihood Ratio Chi-Square    1      5.0975    0.0240
   Continuity Adj. Chi-Square     1      3.1879    0.0742
   Mantel-Haenszel Chi-Square     1      4.7441    0.0294
   Phi Coefficient                       0.4644
   Contingency Coefficient               0.4212
   Cramer's V                            0.4644

   WARNING: 50% of the cells have expected counts less than 5.
            (Asymptotic) Chi-Square may not be a valid test.

   Pearson Chi-Square Test
   ----------------------------------
   Chi-Square                  4.9597
   DF                               1
   Asymptotic Pr >  ChiSq      0.0259
   Exact      Pr >= ChiSq      0.0393

   Fisher's Exact Test
   ----------------------------------
   Cell (1,1) Frequency (F)        11
   Left-sided Pr <= F          0.9967
   Right-sided Pr >= F         0.0367

   Table Probability (P)       0.0334
   Two-sided Pr <= P           0.0393

   Sample Size = 23
   Case-Control Study of High Fat/Cholesterol Diet

   Statistics for Table of Exposure by Response

   Estimates of the Relative Risk (Row1/Row2)

   Type of Study                   Value       95% Confidence Limits
   -----------------------------------------------------------------
   Case-Control (Odds Ratio)      8.2500        1.1535       59.0029
   Cohort (Col1 Risk)             2.9333        0.8502       10.1204
   Cohort (Col2 Risk)             0.3556        0.1403        0.9009

   Odds Ratio (Case-Control Study)
   -----------------------------------
   Odds Ratio                   8.2500

   Asymptotic Conf Limits
   95% Lower Conf Limit         1.1535
   95% Upper Conf Limit        59.0029

   Exact Conf Limits
   95% Lower Conf Limit         0.8677
   95% Upper Conf Limit       105.5488

   Sample Size = 23
The contingency table in Output 2.4.1 displays the variable values so that the first table cell contains the frequency for the first cell in the data set, the frequency of positive exposure and positive response.
Output 2.4.2 displays the chi-square statistics. Since the expected counts in some of the table cells are small, PROC FREQ gives a warning that the asymptotic chi-square tests may not be appropriate. In this case, the exact tests are appropriate. The alternative hypothesis for this analysis states that coronary heart disease is more likely to be associated with a high fat diet, so a one-sided test is desired. Fisher's exact right-sided test analyzes whether the probability of heart disease in the high fat group exceeds the probability of heart disease in the low fat group; since this p-value (0.0367) is small, the alternative hypothesis is supported.
The odds ratio, displayed in Output 2.4.3, provides an estimate of the relative risk when an event is rare. This estimate indicates that the odds of heart disease are 8.25 times as high in the high fat diet group; however, the wide confidence limits indicate that this estimate has low precision.
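The Fisher's exact p-values and the asymptotic odds ratio limits can be checked by hand from the four cell counts. The Python sketch below is not part of the SAS example: it sums hypergeometric probabilities with the margins fixed and uses the usual log-based (Woolf) limits for the odds ratio. The helper name hypergeom_pmf and the normal quantile 1.959964 are assumptions of this sketch.

```python
import math


def hypergeom_pmf(k, row1, col1, n):
    """P(cell (1,1) = k) when all margins of the 2x2 table are fixed."""
    return (math.comb(row1, k) * math.comb(n - row1, col1 - k)
            / math.comb(n, col1))


# Cells in table order: (High, Yes), (High, No), (Low, Yes), (Low, No).
a, b, c, d = 11, 4, 2, 6
row1, col1, n = a + b, a + c, a + b + c + d
k_min, k_max = max(0, row1 + col1 - n), min(row1, col1)

table_prob = hypergeom_pmf(a, row1, col1, n)
right_sided = sum(hypergeom_pmf(k, row1, col1, n) for k in range(a, k_max + 1))
# Two-sided: add every table that is no more likely than the observed one
# (the small factor guards against floating-point ties).
two_sided = sum(
    p for p in (hypergeom_pmf(k, row1, col1, n) for k in range(k_min, k_max + 1))
    if p <= table_prob * (1 + 1e-7)
)

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z975 = 1.959964  # assumed 97.5th percentile of the standard normal
lower = math.exp(math.log(odds_ratio) - z975 * se_log_or)
upper = math.exp(math.log(odds_ratio) + z975 * se_log_or)

print(f"right-sided p = {right_sided:.4f}, two-sided p = {two_sided:.4f}")
print(f"odds ratio = {odds_ratio:.4f}, 95% limits = ({lower:.4f}, {upper:.4f})")
```

The exact confidence limits for the odds ratio in Output 2.4.3 come from a different, conditional computation and are not reproduced by this sketch.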
This example uses the Color data from Example 2.1 to output the Pearson chi-square and the likelihood-ratio chi-square statistics to a SAS data set. The following statements create a two-way table of eye color versus hair color.
   proc freq data=Color order=data;
      weight Count;
      tables Eyes*Hair / chisq expected cellchi2 norow nocol;
      output out=ChiSqData pchi lrchi n nmiss;
      title 'Chi-Square Tests for 3 by 5 Table of Eye and Hair Color';
   run;

   proc print data=ChiSqData noobs;
      title1 'Chi-Square Statistics for Eye and Hair Color';
      title2 'Output Data Set from the FREQ Procedure';
   run;
The CHISQ option produces chi-square tests, the EXPECTED option displays expected cell frequencies in the table, and the CELLCHI2 option displays the cell contribution to the chi-square. The NOROW and NOCOL options suppress the display of row and column percents in the table.
The OUTPUT statement creates the ChiSqData data set with eight variables: the N option stores the number of nonmissing observations, the NMISS option stores the number of missing observations, and the PCHI and LRCHI options store Pearson and likelihood-ratio chi-square statistics, respectively, together with their degrees of freedom and p-values.
The preceding statements produce Output 2.5.1 through Output 2.5.3.
   Chi-Square Tests for 3 by 5 Table of Eye and Hair Color

   The FREQ Procedure

   Table of Eyes by Hair

   Eyes(Eye Color)     Hair(Hair Color)

   Frequency      |
   Expected       |
   Cell Chi-Square|
   Percent        |fair    |red     |medium  |dark    |black   |  Total
   ---------------+--------+--------+--------+--------+--------+
   blue           |     69 |     28 |     68 |     51 |      6 |    222
                  | 66.425 | 32.921 |  63.22 | 53.024 | 6.4094 |
                  | 0.0998 | 0.7357 | 0.3613 | 0.0772 | 0.0262 |
                  |   9.06 |   3.67 |   8.92 |   6.69 |   0.79 |  29.13
   ---------------+--------+--------+--------+--------+--------+
   green          |     69 |     38 |     55 |     37 |      0 |    199
                  | 59.543 |  29.51 | 56.671 |  47.53 | 5.7454 |
                  | 1.5019 | 2.4422 | 0.0492 | 2.3329 | 5.7454 |
                  |   9.06 |   4.99 |   7.22 |   4.86 |   0.00 |  26.12
   ---------------+--------+--------+--------+--------+--------+
   brown          |     90 |     47 |     94 |     94 |     16 |    341
                  | 102.03 | 50.568 | 97.109 | 81.446 | 9.8451 |
                  | 1.4187 | 0.2518 | 0.0995 |  1.935 | 3.8478 |
                  |  11.81 |   6.17 |  12.34 |  12.34 |   2.10 |  44.75
   ---------------+--------+--------+--------+--------+--------+
   Total               228      113      217      182       22      762
                     29.92    14.83    28.48    23.88     2.89   100.00
   Chi-Square Tests for 3 by 5 Table of Eye and Hair Color

   Statistics for Table of Eyes by Hair

   Statistic                     DF       Value      Prob
   ------------------------------------------------------
   Chi-Square                     8     20.9248    0.0073
   Likelihood Ratio Chi-Square    8     25.9733    0.0011
   Mantel-Haenszel Chi-Square     1      3.7838    0.0518
   Phi Coefficient                       0.1657
   Contingency Coefficient               0.1635
   Cramer's V                            0.1172

   Sample Size = 762
The contingency table in Output 2.5.1 displays eye and hair color in the order in which they appear in the Color data set. The Pearson chi-square statistic in Output 2.5.2 provides evidence of an association between eye and hair color (p = 0.0073). The cell chi-square values show that most of the association is due to more green-eyed children with fair or red hair and fewer with dark or black hair. The opposite occurs with the brown-eyed children.
The OUT= data set is displayed in Output 2.5.3. It contains one observation with the sample size, the number of missing values, and the chi-square statistics and corresponding degrees of freedom and p-values as in Output 2.5.2.
   Chi-Square Statistics for Eye and Hair Color
   Output Data Set from the FREQ Procedure

     N    NMISS    _PCHI_    DF_PCHI      P_PCHI     _LRCHI_    DF_LRCHI      P_LRCHI
   762      0     20.9248       8      .007349898    25.9733       8       .001061424
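The values stored in _PCHI_ and _LRCHI_ can be reproduced from the cell counts in Output 2.5.1. The following Python sketch is an illustration rather than PROC FREQ's code; the function name chi_square_stats is hypothetical. It accumulates the Pearson and likelihood-ratio sums cell by cell, with a zero cell contributing nothing to the likelihood-ratio sum.

```python
import math

# Rows in data order (blue, green, brown); columns fair, red, medium, dark, black.
table = [
    [69, 28, 68, 51, 6],
    [69, 38, 55, 37, 0],
    [90, 47, 94, 94, 16],
]


def chi_square_stats(rows):
    """Return (Pearson chi-square, likelihood-ratio chi-square, df, n)."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    pearson = 0.0
    lr = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected   # cell chi-square
            if observed > 0:   # a zero cell adds nothing to the LR sum
                lr += 2.0 * observed * math.log(observed / expected)
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return pearson, lr, df, n


pchi, lrchi, df, n = chi_square_stats(table)
print(f"N = {n}, DF = {df}, Pearson = {pchi:.4f}, LR = {lrchi:.4f}")
```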
The data set Migraine contains hypothetical data for a clinical trial of migraine treatment. Subjects of both genders receive either a new drug therapy or a placebo. Their response to treatment is coded as Better or Same. The data are recorded as cell counts, and the number of subjects for each treatment and response combination is recorded in the variable Count.
   data Migraine;
      input Gender $ Treatment $ Response $ Count @@;
      datalines;
   female Active  Better 16   female Active  Same 11
   female Placebo Better  5   female Placebo Same 20
   male   Active  Better 12   male   Active  Same 16
   male   Placebo Better  7   male   Placebo Same 19
   ;
The following statements create a three-way table stratified by Gender, where Treatment forms the rows and Response forms the columns. The CMH option produces the Cochran-Mantel-Haenszel statistics. For this stratified 2×2 table, estimates of the common relative risk and the Breslow-Day test for homogeneity of the odds ratios are also displayed. The NOPRINT option suppresses the display of the contingency tables. These statements produce Output 2.6.1 through Output 2.6.3.
   proc freq data=Migraine;
      weight Count;
      tables Gender*Treatment*Response / cmh noprint;
      title 'Clinical Trial for Treatment of Migraine Headaches';
   run;
   Clinical Trial for Treatment of Migraine Headaches

   The FREQ Procedure

   Summary Statistics for Treatment by Response
   Controlling for Gender

   Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

   Statistic    Alternative Hypothesis    DF       Value      Prob
   ---------------------------------------------------------------
       1        Nonzero Correlation        1      8.3052    0.0040
       2        Row Mean Scores Differ     1      8.3052    0.0040
       3        General Association        1      8.3052    0.0040

   Total Sample Size = 106
   Clinical Trial for Treatment of Migraine Headaches

   Summary Statistics for Treatment by Response
   Controlling for Gender

   Estimates of the Common Relative Risk (Row1/Row2)

   Type of Study     Method                  Value     95% Confidence Limits
   -------------------------------------------------------------------------
   Case-Control      Mantel-Haenszel        3.3132       1.4456       7.5934
     (Odds Ratio)    Logit                  3.2941       1.4182       7.6515

   Cohort            Mantel-Haenszel        2.1636       1.2336       3.7948
     (Col1 Risk)     Logit                  2.1059       1.1951       3.7108

   Cohort            Mantel-Haenszel        0.6420       0.4705       0.8761
     (Col2 Risk)     Logit                  0.6613       0.4852       0.9013

   Total Sample Size = 106
   Clinical Trial for Treatment of Migraine Headaches

   Summary Statistics for Treatment by Response
   Controlling for Gender

   Breslow-Day Test for
   Homogeneity of the Odds Ratios
   ------------------------------
   Chi-Square              1.4929
   DF                           1
   Pr > ChiSq              0.2218

   Total Sample Size = 106
For a stratified 2×2 table, the three CMH statistics displayed in Output 2.6.1 test the same hypothesis. The significant p-value (0.004) indicates that the association between treatment and response remains strong after adjusting for gender.
The CMH option also produces a table of relative risks, as shown in Output 2.6.2. Because this is a prospective study, the relative risk estimate assesses the effectiveness of the new drug; the Cohort (Col1 Risk) values are the appropriate estimates for the first column, or the risk of improvement. The probability of migraine improvement with the new drug is just over two times the probability of improvement with the placebo.
The large p-value for the Breslow-Day test (0.2218) in Output 2.6.3 indicates no significant gender difference in the odds ratios.
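For readers who want to see where the stratified estimates come from, the following Python sketch (not part of the SAS example) recomputes the CMH statistic and the Mantel-Haenszel odds ratio and column 1 relative risk from the two gender strata. The function name cmh_summary is hypothetical, and the formulas are the standard ones for a set of 2×2 tables.

```python
import math

# One 2x2 table per stratum, as (a, b, c, d) =
# (Active/Better, Active/Same, Placebo/Better, Placebo/Same).
strata = {
    "female": (16, 11, 5, 20),
    "male":   (12, 16, 7, 19),
}


def cmh_summary(tables):
    """CMH chi-square (1 df), its p-value, and Mantel-Haenszel OR and RR."""
    diff = var = 0.0          # pieces of the CMH statistic
    or_num = or_den = 0.0     # Mantel-Haenszel odds ratio
    rr_num = rr_den = 0.0     # Mantel-Haenszel relative risk (column 1)
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
        rr_num += a * row2 / n
        rr_den += c * row1 / n
    cmh = diff * diff / var
    p_value = math.erfc(math.sqrt(cmh / 2.0))   # chi-square upper tail, 1 df
    return cmh, p_value, or_num / or_den, rr_num / rr_den


cmh, p_value, mh_or, mh_rr = cmh_summary(strata.values())
print(f"CMH = {cmh:.4f} (p = {p_value:.4f}), "
      f"MH odds ratio = {mh_or:.4f}, MH relative risk = {mh_rr:.4f}")
```

The confidence limits, the logit estimates, and the Breslow-Day test in Output 2.6.2 and Output 2.6.3 require additional variance formulas that this sketch omits.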
The data set Pain contains hypothetical data for a clinical trial of a drug therapy to control pain. The clinical trial investigates whether adverse responses increase with larger drug doses. Subjects receive either a placebo or one of four drug doses. An adverse response is recorded as Adverse='Yes'; otherwise, it is recorded as Adverse='No'. The number of subjects for each drug dose and response combination is contained in the variable Count.
   data Pain;
      input Dose Adverse $ Count @@;
      datalines;
   0 No 26   0 Yes  6
   1 No 26   1 Yes  7
   2 No 23   2 Yes  9
   3 No 18   3 Yes 14
   4 No  9   4 Yes 23
   ;
The TABLES statement in the following program produces a two-way table. The MEASURES option produces measures of association, and the CL option produces confidence limits for these measures. The TREND option tests for a trend across the ordinal values of the Dose variable with the Cochran-Armitage test. The EXACT statement produces exact p-values for this test, and the MAXTIME= option terminates the exact computations if they do not complete within 60 seconds. The TEST statement computes an asymptotic test for Somers' D(C|R). These statements produce Output 2.7.1 through Output 2.7.3.
   proc freq data=Pain;
      weight Count;
      tables Dose*Adverse / trend measures cl;
      test smdcr;
      exact trend / maxtime=60;
      title1 'Clinical Trial for Treatment of Pain';
   run;
   Clinical Trial for Treatment of Pain

   The FREQ Procedure

   Table of Dose by Adverse

   Dose      Adverse

   Frequency|
   Percent  |
   Row Pct  |
   Col Pct  |No      |Yes     |  Total
   ---------+--------+--------+
          0 |     26 |      6 |     32
            |  16.15 |   3.73 |  19.88
            |  81.25 |  18.75 |
            |  25.49 |  10.17 |
   ---------+--------+--------+
          1 |     26 |      7 |     33
            |  16.15 |   4.35 |  20.50
            |  78.79 |  21.21 |
            |  25.49 |  11.86 |
   ---------+--------+--------+
          2 |     23 |      9 |     32
            |  14.29 |   5.59 |  19.88
            |  71.88 |  28.13 |
            |  22.55 |  15.25 |
   ---------+--------+--------+
          3 |     18 |     14 |     32
            |  11.18 |   8.70 |  19.88
            |  56.25 |  43.75 |
            |  17.65 |  23.73 |
   ---------+--------+--------+
          4 |      9 |     23 |     32
            |   5.59 |  14.29 |  19.88
            |  28.13 |  71.88 |
            |   8.82 |  38.98 |
   ---------+--------+--------+
   Total         102       59      161
               63.35    36.65   100.00
   Clinical Trial for Treatment of Pain

   Statistics for Table of Dose by Adverse

                                                                95%
   Statistic                              Value       ASE     Confidence Limits
   ----------------------------------------------------------------------------
   Gamma                                 0.5313    0.0935     0.3480     0.7146
   Kendall's Tau-b                       0.3373    0.0642     0.2114     0.4631
   Stuart's Tau-c                        0.4111    0.0798     0.2547     0.5675

   Somers' D C|R                         0.2569    0.0499     0.1592     0.3547
   Somers' D R|C                         0.4427    0.0837     0.2786     0.6068

   Pearson Correlation                   0.3776    0.0714     0.2378     0.5175
   Spearman Correlation                  0.3771    0.0718     0.2363     0.5178

   Lambda Asymmetric C|R                 0.2373    0.0837     0.0732     0.4014
   Lambda Asymmetric R|C                 0.1250    0.0662     0.0000     0.2547
   Lambda Symmetric                      0.1604    0.0621     0.0388     0.2821

   Uncertainty Coefficient C|R           0.1261    0.0467     0.0346     0.2175
   Uncertainty Coefficient R|C           0.0515    0.0191     0.0140     0.0890
   Uncertainty Coefficient Symmetric     0.0731    0.0271     0.0199     0.1262

   Somers' D C|R
   --------------------------------
   Somers' D C|R             0.2569
   ASE                       0.0499
   95% Lower Conf Limit      0.1592
   95% Upper Conf Limit      0.3547

   Test of H0: Somers' D C|R = 0

   ASE under H0              0.0499
   Z                         5.1511
   One-sided Pr >  Z         <.0001
   Two-sided Pr > |Z|        <.0001

   Sample Size = 161
   Clinical Trial for Treatment of Pain

   Statistics for Table of Dose by Adverse

   Cochran-Armitage Trend Test
   -------------------------------
   Statistic (Z)           -4.7918

   Asymptotic Test
   One-sided Pr <  Z        <.0001
   Two-sided Pr > |Z|       <.0001

   Exact Test
   One-sided Pr <=  Z    7.237E-07
   Two-sided Pr >= |Z|   1.324E-06

   Sample Size = 161
The Row Pct values in Output 2.7.1 show the expected increasing trend in the proportion of adverse effects due to increasing dosage (from 18.75% to 71.88%).
Output 2.7.2 displays the measures of association produced by the MEASURES option. Somers' D(C|R) measures the association treating the column variable (Adverse) as the response and the row variable (Dose) as a predictor. The asymptotic 95% confidence limits do not contain zero, which indicates a strong positive association. Similarly, the Pearson and Spearman correlation coefficients show evidence of a strong positive association, as hypothesized.
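Several of the ordinal measures in Output 2.7.2 are simple functions of the numbers of concordant and discordant pairs. The Python sketch below (not part of the SAS example) counts those pairs for the Dose by Adverse table and derives gamma, Kendall's tau-b, and both Somers' D values. The function names are illustrative, and the standard errors and confidence limits are not reproduced.

```python
import math

# Rows are doses 0 through 4; columns are (No, Yes).
table = [(26, 6), (26, 7), (23, 9), (18, 14), (9, 23)]


def pair_counts(rows):
    """Count concordant and discordant pairs, each pair counted once."""
    concordant = discordant = 0
    for i, upper in enumerate(rows):
        for lower in rows[i + 1:]:
            for k, x in enumerate(upper):
                for m, y in enumerate(lower):
                    if m > k:        # higher row and higher column
                        concordant += x * y
                    elif m < k:      # higher row but lower column
                        discordant += x * y
    return concordant, discordant


def ordinal_measures(rows):
    concordant, discordant = pair_counts(rows)
    n = sum(sum(row) for row in rows)
    w_r = n * n - sum(sum(row) ** 2 for row in rows)        # not tied on rows
    w_c = n * n - sum(sum(col) ** 2 for col in zip(*rows))  # not tied on columns
    diff = 2 * (concordant - discordant)
    return {
        "gamma": (concordant - discordant) / (concordant + discordant),
        "tau_b": diff / math.sqrt(w_r * w_c),
        "somers_d_cr": diff / w_r,
        "somers_d_rc": diff / w_c,
    }


measures = ordinal_measures(table)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```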
The Cochran-Armitage test (Output 2.7.3) supports the trend hypothesis. The small left-sided p-values for the Cochran-Armitage test indicate that the probability of the Column 1 level (Adverse='No') decreases as Dose increases or, equivalently, that the probability of the Column 2 level (Adverse='Yes') increases as Dose increases. The two-sided p-value tests against either an increasing or decreasing alternative. This is an appropriate hypothesis when you want to determine whether the drug has progressive effects on the probability of adverse effects but the direction is unknown.
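The sign of the trend statistic is easier to interpret once the computation is written out. The following Python sketch (not part of the SAS example) computes the asymptotic Cochran-Armitage Z for the Column 1 level using the dose values as row scores. The function name is hypothetical, and the exact p-values from the EXACT statement are not reproduced.

```python
import math

doses = [0, 1, 2, 3, 4]               # row scores
no_counts = [26, 26, 23, 18, 9]       # Column 1 (Adverse = No) per dose
row_totals = [32, 33, 32, 32, 32]


def cochran_armitage_z(scores, col1_counts, totals):
    """Asymptotic trend statistic for the Column 1 proportion."""
    n = sum(totals)
    p1 = sum(col1_counts) / n
    mean_score = sum(t * s for t, s in zip(totals, scores)) / n
    numerator = sum(c * (s - mean_score) for c, s in zip(col1_counts, scores))
    spread = sum(t * (s - mean_score) ** 2 for t, s in zip(totals, scores))
    return numerator / math.sqrt(p1 * (1 - p1) * spread)


z = cochran_armitage_z(doses, no_counts, row_totals)
left_sided = 0.5 * math.erfc(-z / math.sqrt(2))   # Pr < Z under normality
print(f"Z = {z:.4f}, left-sided asymptotic p = {left_sided:.2e}")
```

A negative Z means that the Column 1 proportion falls as the score rises, which matches the decreasing share of Adverse='No' responses across doses.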
Friedman's test is a nonparametric test for treatment differences in a randomized complete block design. Each block of the design may be a subject or a homogeneous group of subjects. If blocks are groups of subjects, the number of subjects in each block must equal the number of treatments. Treatments are randomly assigned to subjects within each block. If there is one subject per block, then the subjects are repeatedly measured once under each treatment. The order of treatments is randomized for each subject.
In this setting, Friedman's test is identical to the ANOVA (row mean scores) CMH statistic when the analysis uses rank scores (SCORES=RANK). The three-way table uses subject (or subject group) as the stratifying variable, treatment as the row variable, and response as the column variable. PROC FREQ handles ties by assigning midranks to tied response values. If there are multiple subjects per treatment in each block, the ANOVA CMH statistic is a generalization of Friedman's test.
The data set Hypnosis contains data from a study investigating whether hypnosis has the same effect on skin potential (measured in millivolts) for four emotions (Lehmann 1975, p. 264). Eight subjects are asked to display fear, joy, sadness, and calmness under hypnosis. The data are recorded as one observation per subject for each emotion.
   data Hypnosis;
      length Emotion $ 10;
      input Subject Emotion $ SkinResponse @@;
      datalines;
   1 fear 23.1  1 joy 22.7  1 sadness 22.5  1 calmness 22.6
   2 fear 57.6  2 joy 53.2  2 sadness 53.7  2 calmness 53.1
   3 fear 10.5  3 joy  9.7  3 sadness 10.8  3 calmness  8.3
   4 fear 23.6  4 joy 19.6  4 sadness 21.1  4 calmness 21.6
   5 fear 11.9  5 joy 13.8  5 sadness 13.7  5 calmness 13.3
   6 fear 54.6  6 joy 47.1  6 sadness 39.2  6 calmness 37.0
   7 fear 21.0  7 joy 13.6  7 sadness 13.7  7 calmness 14.8
   8 fear 20.3  8 joy 23.6  8 sadness 16.3  8 calmness 14.8
   ;
In the following statements, the TABLES statement creates a three-way table stratified by Subject and a two-way table; the variables Emotion and SkinResponse form the rows and columns of each table. The CMH2 option produces the first two Cochran-Mantel-Haenszel statistics, the option SCORES=RANK specifies that rank scores are used to compute these statistics, and the NOPRINT option suppresses the contingency tables. These statements produce Output 2.8.1 and Output 2.8.2.
   proc freq data=Hypnosis;
      tables Subject*Emotion*SkinResponse
             Emotion*SkinResponse / cmh2 scores=rank noprint;
   run;
   The FREQ Procedure

   Summary Statistics for Emotion by SkinResponse
   Controlling for Subject

   Cochran-Mantel-Haenszel Statistics (Based on Rank Scores)

   Statistic    Alternative Hypothesis    DF       Value      Prob
   ---------------------------------------------------------------
       1        Nonzero Correlation        1      0.2400    0.6242
       2        Row Mean Scores Differ     3      6.4500    0.0917

   Total Sample Size = 32
   The FREQ Procedure

   Summary Statistics for Emotion by SkinResponse

   Cochran-Mantel-Haenszel Statistics (Based on Rank Scores)

   Statistic    Alternative Hypothesis    DF       Value      Prob
   ---------------------------------------------------------------
       1        Nonzero Correlation        1      0.0001    0.9933
       2        Row Mean Scores Differ     3      0.5678    0.9038

   Total Sample Size = 32
Because the CMH statistics in Output 2.8.1 are based on rank scores, the Row Mean Scores Differ statistic is identical to Friedman's chi-square (Q = 6.45). The p-value of 0.0917 indicates that differences in skin potential response for different emotions are significant at the 10% level but not at the 5% level.
When you do not stratify by subject, the Row Mean Scores Differ CMH statistic is identical to a Kruskal-Wallis test and is not significant (p = 0.9038 in Output 2.8.2). Thus, adjusting for subject is critical to reducing the background variation due to subject differences.
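The equivalence with Friedman's test can be checked directly by ranking the four responses within each subject and applying the textbook formula. The Python sketch below is not part of the SAS example, and the function name friedman_q is hypothetical. It assumes that no subject has tied responses, which holds for these data; PROC FREQ itself uses midranks when ties occur.

```python
# Skin potential per subject, in the order fear, joy, sadness, calmness.
skin_response = {
    1: [23.1, 22.7, 22.5, 22.6],
    2: [57.6, 53.2, 53.7, 53.1],
    3: [10.5, 9.7, 10.8, 8.3],
    4: [23.6, 19.6, 21.1, 21.6],
    5: [11.9, 13.8, 13.7, 13.3],
    6: [54.6, 47.1, 39.2, 37.0],
    7: [21.0, 13.6, 13.7, 14.8],
    8: [20.3, 23.6, 16.3, 14.8],
}


def friedman_q(blocks):
    """Friedman's chi-square, assuming no ties within a block."""
    n = len(blocks)                          # number of blocks (subjects)
    k = len(next(iter(blocks.values())))     # number of treatments
    rank_sums = [0] * k
    for values in blocks.values():
        order = sorted(range(k), key=lambda j: values[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return q, rank_sums


q, rank_sums = friedman_q(skin_response)
print(f"rank sums = {rank_sums}, Q = {q:.2f}")
```

Ranking within each subject removes the large between-subject differences in baseline skin potential, which is the same adjustment that stratifying by Subject achieves in the CMH analysis.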
When a binary response is measured several times or under different conditions, Cochran's Q tests that the marginal probability of a positive response is unchanged across the times or conditions. When there are more than two response categories, you can use the CATMOD procedure to fit a repeated-measures model.
The data set Drugs contains data for a study of three drugs to treat a chronic disease (Agresti 1990). Forty-six subjects receive drugs A, B, and C. The response to each drug is either favorable (F) or unfavorable (U).
   proc format;
      value $ResponseFmt 'F'='Favorable'
                         'U'='Unfavorable';
   run;

   data Drugs;
      input Drug_A $ Drug_B $ Drug_C $ Count @@;
      datalines;
   F F F  6   U F F  2
   F F U 16   U F U  4
   F U F  2   U U F  6
   F U U  4   U U U  6
   ;
The following statements create one-way frequency tables of the responses to each drug. The AGREE option produces Cochran's Q and other measures of agreement for the three-way table. These statements produce Output 2.9.1 through Output 2.9.3.
   proc freq data=Drugs;
      weight Count;
      tables Drug_A Drug_B Drug_C / nocum;
      tables Drug_A*Drug_B*Drug_C / agree noprint;
      format Drug_A Drug_B Drug_C $ResponseFmt.;
      title 'Study of Three Drug Treatments for a Chronic Disease';
   run;
   Study of Three Drug Treatments for a Chronic Disease

   The FREQ Procedure

   Drug_A         Frequency     Percent
   ------------------------------------
   Favorable            28       60.87
   Unfavorable          18       39.13

   Drug_B         Frequency     Percent
   ------------------------------------
   Favorable            28       60.87
   Unfavorable          18       39.13

   Drug_C         Frequency     Percent
   ------------------------------------
   Favorable            16       34.78
   Unfavorable          30       65.22
   Study of Three Drug Treatments for a Chronic Disease

   Statistics for Table 1 of Drug_B by Drug_C
   Controlling for Drug_A=Favorable

   McNemar's Test
   ------------------------
   Statistic (S)    10.8889
   DF                     1
   Pr > S            0.0010

   Simple Kappa Coefficient
   --------------------------------
   Kappa                    -0.0328
   ASE                       0.1167
   95% Lower Conf Limit     -0.2615
   95% Upper Conf Limit      0.1960

   Sample Size = 28

   Statistics for Table 2 of Drug_B by Drug_C
   Controlling for Drug_A=Unfavorable

   McNemar's Test
   -----------------------
   Statistic (S)    0.4000
   DF                    1
   Pr > S           0.5271

   Simple Kappa Coefficient
   --------------------------------
   Kappa                    -0.1538
   ASE                       0.2230
   95% Lower Conf Limit     -0.5909
   95% Upper Conf Limit      0.2832

   Sample Size = 18

   Study of Three Drug Treatments for a Chronic Disease

   Summary Statistics for Drug_B by Drug_C
   Controlling for Drug_A

   Overall Kappa Coefficient
   --------------------------------
   Kappa                    -0.0588
   ASE                       0.1034
   95% Lower Conf Limit     -0.2615
   95% Upper Conf Limit      0.1439

   Test for Equal Kappa
   Coefficients
   --------------------
   Chi-Square    0.2314
   DF                 1
   Pr > ChiSq    0.6305

   Total Sample Size = 46
   Study of Three Drug Treatments for a Chronic Disease

   Summary Statistics for Drug_B by Drug_C
   Controlling for Drug_A

   Cochran's Q, for Drug_A
   by Drug_B by Drug_C
   -----------------------
   Statistic (Q)    8.4706
   DF                    2
   Pr > Q           0.0145

   Total Sample Size = 46
The one-way frequency tables in Output 2.9.1 provide the marginal response for each drug. For drugs A and B, 61% of the subjects reported a favorable response while 35% of the subjects reported a favorable response to drug C.
McNemar's test (Output 2.9.2) shows strong discordance between drugs B and C when the response to drug A is favorable. The small negative value of the simple kappa indicates no agreement between drug B response and drug C response.
Cochran's Q is statistically significant (p = 0.0145 in Output 2.9.3), which leads to rejection of the hypothesis that the probability of favorable response is the same for the three drugs.
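Cochran's Q depends only on the per-drug totals of favorable responses and on how many favorable responses each subject gave. The following Python sketch (not part of the SAS example) recomputes Q from the eight response patterns in the Drugs data set; the function name cochran_q is hypothetical. With two degrees of freedom the chi-square upper tail reduces to exp(-Q/2), which gives a p-value of about 0.0145, as in Output 2.9.3.

```python
import math

# Response pattern (Drug_A, Drug_B, Drug_C) -> number of subjects.
patterns = {
    ("F", "F", "F"): 6,  ("U", "F", "F"): 2,
    ("F", "F", "U"): 16, ("U", "F", "U"): 4,
    ("F", "U", "F"): 2,  ("U", "U", "F"): 6,
    ("F", "U", "U"): 4,  ("U", "U", "U"): 6,
}


def cochran_q(pattern_counts, positive="F"):
    """Cochran's Q for k matched binary responses, plus per-column totals."""
    k = len(next(iter(pattern_counts)))
    col_totals = [0] * k
    sum_row_squares = 0
    for responses, count in pattern_counts.items():
        hits = [1 if r == positive else 0 for r in responses]
        for j, hit in enumerate(hits):
            col_totals[j] += hit * count
        sum_row_squares += count * sum(hits) ** 2
    total = sum(col_totals)
    q = ((k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
         / (k * total - sum_row_squares))
    return q, col_totals


q, favorable_totals = cochran_q(patterns)
p_value = math.exp(-q / 2.0)   # chi-square upper tail; valid for 2 df only
print(f"favorable totals = {favorable_totals}, Q = {q:.4f}, p = {p_value:.4f}")
```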