This example produces a correlation analysis with descriptive statistics and four measures of association: the Pearson product-moment correlation, the Spearman rank-order correlation, Kendall's tau-b coefficients, and Hoeffding's measure of dependence, D.
The Fitness data set created in the Getting Started section beginning on page 4 contains measurements from a study of physical fitness of 31 participants. The following statements request all four measures of association for the variables Weight, Oxygen, and RunTime.
```sas
ods html;
ods graphics on;
title 'Measures of Association for a Physical Fitness Study';
proc corr data=Fitness pearson spearman kendall hoeffding plots;
   var Weight Oxygen RunTime;
run;
ods graphics off;
ods html close;
```
Note that Pearson correlations are computed by default only if none of the three nonparametric correlations (SPEARMAN, KENDALL, or HOEFFDING) is requested. Otherwise, you must specify the PEARSON option explicitly to compute Pearson correlations.
By default, observations with nonmissing values for each variable are used to derive the univariate statistics for that variable. When nonparametric measures of association are specified, the procedure displays the median instead of the sum as an additional descriptive measure.
```
Measures of Association for a Physical Fitness Study

The CORR Procedure

3 Variables: Weight Oxygen RunTime

Simple Statistics

Variable    N        Mean     Std Dev      Median     Minimum     Maximum
Weight     31    77.44452     8.32857    77.45000    59.08000    91.63000
Oxygen     29    47.22721     5.47718    46.67200    37.38800    60.05500
RunTime    29    10.67414     1.39194    10.50000     8.17000    14.03000
```
The Pearson correlation is a parametric measure of association for two continuous random variables. When data are missing, the number of observations used to calculate the correlation can vary from one pair of variables to another.
In Output 1.1.2, the Pearson correlation between RunTime and Oxygen is -0.86843, which is significant with a p-value less than 0.0001. This indicates a strong negative linear relationship between these two variables: as RunTime increases, Oxygen decreases linearly.
```
Measures of Association for a Physical Fitness Study

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

               Weight      Oxygen     RunTime

Weight        1.00000    -0.15358     0.20072
                           0.4264      0.2965
                   31          29          29

Oxygen       -0.15358     1.00000    -0.86843
               0.4264                  <.0001
                   29          29          28

RunTime       0.20072    -0.86843     1.00000
               0.2965      <.0001
                   29          28          29
```
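To make the pairwise handling of missing values concrete, here is a minimal Python sketch (an illustration, not the CORR procedure's implementation) that computes a Pearson correlation from only the observations where both values are present and reports how many pairs were used, like the "Number of Observations" rows in Output 1.1.2. `None` stands in for a SAS missing value; the function name and the toy numbers are hypothetical.

```python
import math


def pearson_pairwise(x, y):
    """Pearson r from the pairs where both values are present (None = missing).

    Returns (r, n) so that the per-pair sample size is visible.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical toy data with one missing value in each of two variables.
weight = [80.0, 75.0, 85.0, 70.0, 90.0]
oxygen = [44.0, 48.0, None, 50.0, 40.0]
runtime = [11.0, 10.0, 12.0, 9.5, None]

for label, a, b in [("weight x oxygen", weight, oxygen),
                    ("weight x runtime", weight, runtime),
                    ("oxygen x runtime", oxygen, runtime)]:
    r, n = pearson_pairwise(a, b)
    print(f"{label}: r = {r:.4f}, N = {n}")
```

With these toy values the three correlations rest on 4, 4, and 3 pairs, which is the same kind of variation that the N rows of the PROC CORR table show.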
The Spearman rank-order correlation is a nonparametric measure of association based on the ranks of the data values. The Spearman Correlation Coefficients table shown in Output 1.1.3 displays results similar to those of the Pearson Correlation Coefficients table.
```
Measures of Association for a Physical Fitness Study

Spearman Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

               Weight      Oxygen     RunTime

Weight        1.00000    -0.06824     0.13749
                           0.7250      0.4769
                   31          29          29

Oxygen       -0.06824     1.00000    -0.80131
               0.7250                  <.0001
                   29          29          28

RunTime       0.13749    -0.80131     1.00000
               0.4769      <.0001
                   29          28          29
```
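The Spearman coefficient in Output 1.1.3 is the Pearson correlation computed on ranks. A minimal Python sketch of that definition follows, assuming the usual convention that tied values receive the average of the ranks they span. It handles complete data only, and the helper names are illustrative rather than part of SAS.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(average_ranks([10, 20, 20, 30]))
print(spearman([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))  # monotone but nonlinear
```

Because only the ordering matters, any strictly increasing relationship (such as y = x³ above) gives a Spearman coefficient of 1 even though the Pearson coefficient of the raw values would be below 1.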
Kendall's tau-b is a nonparametric measure of association based on the number of concordances and discordances in paired observations. The Kendall Tau-b Correlation Coefficients table shown in Output 1.1.4 displays results similar to those of the Pearson Correlation Coefficients table in Output 1.1.2.
```
Measures of Association for a Physical Fitness Study

Kendall Tau b Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

               Weight      Oxygen     RunTime

Weight        1.00000    -0.00988     0.06675
                           0.9402      0.6123
                   31          29          29

Oxygen       -0.00988     1.00000    -0.62434
               0.9402                  <.0001
                   29          29          28

RunTime       0.06675    -0.62434     1.00000
               0.6123      <.0001
                   29          28          29
```
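The concordance counting behind tau-b can be sketched in a few lines of Python. This assumes the standard definition tau-b = (n_c - n_d) / sqrt((n_0 - n_1)(n_0 - n_2)), where n_0 = n(n-1)/2 and n_1, n_2 count the pairs tied in x and in y. It is an O(n²) illustration on complete data, not the procedure's algorithm, and the function name is hypothetical.

```python
import math
from collections import Counter


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant/discordant pair counts with tie correction."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
            # s == 0: the pair is tied in x or y and counts as neither.
    n0 = n * (n - 1) // 2
    n1 = sum(t * (t - 1) // 2 for t in Counter(x).values())  # pairs tied in x
    n2 = sum(t * (t - 1) // 2 for t in Counter(y).values())  # pairs tied in y
    return (concordant - discordant) / math.sqrt((n0 - n1) * (n0 - n2))


print(kendall_tau_b([1, 2, 2, 3], [1, 3, 2, 4]))
print(kendall_tau_b([1, 2, 3, 4], [4, 3, 2, 1]))  # perfectly discordant
```

In the first call one pair is tied in x, so it counts as neither concordant nor discordant and also shrinks the denominator; without that correction the coefficient could not reach 1 when ties are present.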
Hoeffding's measure of dependence, D, is a nonparametric measure of association that detects more general departures from independence. Without ties in the variables, the values of the D statistic can vary between -0.5 and 1, with 1 indicating complete dependence. With ties, the D statistic can be smaller. Because the variable Weight contains ties, the D statistic of Weight with itself is 0.97690, which is less than 1, as shown in the Hoeffding Dependence Coefficients table in Output 1.1.5.
```
Measures of Association for a Physical Fitness Study

Hoeffding Dependence Coefficients
Prob > D under H0: D=0
Number of Observations

               Weight      Oxygen     RunTime

Weight        0.97690    -0.00497    -0.02355
               <.0001      0.5101      1.0000
                   31          29          29

Oxygen       -0.00497     1.00000     0.23449
               0.5101                  <.0001
                   29          29          28

RunTime      -0.02355     0.23449     1.00000
               1.0000      <.0001
                   29          28          29
```
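The following Python sketch computes Hoeffding's D from marginal ranks R and S and bivariate ranks Q, using the standard rank-based formula scaled by 30 so that, without ties, values run from -0.5 to 1. We assume this scaling and the tie convention (a tie contributes 1/2 to each comparison) match the procedure's "Details" section; the function names are hypothetical, and at least five observations are required.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def hoeffding_d(x, y):
    """Hoeffding's D, scaled by 30."""
    n = len(x)
    if n < 5:
        raise ValueError("Hoeffding's D needs at least 5 observations")
    r = average_ranks(x)
    s = average_ranks(y)

    def phi(a, b):
        # 1 if a < b, 1/2 if tied, 0 otherwise
        return 1.0 if a < b else (0.5 if a == b else 0.0)

    # Bivariate rank: 1 + (tie-weighted) count of points below and to the left.
    q = [1.0 + sum(phi(x[j], x[i]) * phi(y[j], y[i]) for j in range(n) if j != i)
         for i in range(n)]
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))
    num = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    den = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30.0 * num / den


xs = [1, 2, 3, 4, 5, 6]
tied = [1, 1, 2, 3, 4, 5]
print(hoeffding_d(xs, xs), hoeffding_d(xs, xs[::-1]), hoeffding_d(tied, tied))
```

The last call mirrors the situation of Weight in the table: a variable paired with itself is completely dependent, yet a single tie pulls D below 1.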
The experimental PLOTS option requests a symmetric scatter plot matrix for the analysis variables listed in the VAR statement. The strong negative linear relationship between Oxygen and RunTime is evident in Output 1.1.6.
This display is requested by specifying both the ODS GRAPHICS statement and the PLOTS option. For general information about ODS graphics, refer to Chapter 15, "Statistical Graphics Using ODS" (SAS/STAT User's Guide). For specific information about the graphics available in the CORR procedure, see the section "ODS Graphics" on page 31.
The following statements create a data set that contains measurements for four iris parts from Fisher's iris data (1936): sepal length, sepal width, petal length, and petal width. Each observation represents one specimen.
```sas
*------------------- Data on Iris Setosa --------------------*
  The data set contains 50 iris specimens from the species
  Iris Setosa with the following four measurements:
  SepalLength (sepal length)
  SepalWidth  (sepal width)
  PetalLength (petal length)
  PetalWidth  (petal width)
  Certain values were changed to missing for the analysis.
*------------------------------------------------------------*;
data Setosa;
   input SepalLength SepalWidth PetalLength PetalWidth @@;
   label sepallength='Sepal Length in mm.'
         sepalwidth='Sepal Width in mm.'
         petallength='Petal Length in mm.'
         petalwidth='Petal Width in mm.';
   datalines;
50 33 14 02  46 34 14 03  46 36  . 02  51 33 17 05  55 35 13 02
48 31 16 02  52 34 14 02  49 36 14 01  44 32 13 02  50 35 16 06
44 30 13 02  47 32 16 02  48 30 14 03  51 38 16 02  48 34 19 02
50 30 16 02  50 32 12 02  43 30 11  .  58 40 12 02  51 38 19 04
49 30 14 02  51 35 14 02  50 34 16 04  46 32 14 02  57 44 15 04
50 36 14 02  54 34 15 04  52 41 15  .  55 42 14 02  49 31 15 02
54 39 17 04  50 34 15 02  44 29 14 02  47 32 13 02  46 31 15 02
51 34 15 02  50 35 13 03  49 31 15 01  54 37 15 02  54 39 13 04
51 35 14 03  48 34 16 02  48 30 14 01  45 23 13 03  57 38 17 03
51 38 15 03  54 34 17 02  51 37 15 04  52 35 15 02  53 37 15 02
;
```
The following statements request a correlation analysis between two sets of variables, the sepal measurements and the petal measurements.
```sas
ods html;
ods graphics on;
title 'Fisher (1936) Iris Setosa Data';
proc corr data=Setosa sscp cov plots;
   var sepallength sepalwidth;
   with petallength petalwidth;
run;
ods graphics off;
ods html close;
```
The CORR procedure displays univariate statistics for variables in the VAR and WITH statements.
```
Fisher (1936) Iris Setosa Data

The CORR Procedure

2 With Variables: PetalLength PetalWidth
2      Variables: SepalLength SepalWidth

Simple Statistics

Variable      N        Mean     Std Dev           Sum
PetalLength  49    14.71429     1.62019     721.00000
PetalWidth   48     2.52083     1.03121     121.00000
SepalLength  50    50.06000     3.52490          2503
SepalWidth   50    34.28000     3.79064          1714

Simple Statistics

Variable       Minimum     Maximum    Label
PetalLength   11.00000    19.00000    Petal Length in mm.
PetalWidth     1.00000     6.00000    Petal Width in mm.
SepalLength   43.00000    58.00000    Sepal Length in mm.
SepalWidth    23.00000    44.00000    Sepal Width in mm.
```
When the WITH statement is specified together with the VAR statement, the CORR procedure produces rectangular matrices for statistics such as covariances and correlations. The matrix rows correspond to the WITH variables (PetalLength and PetalWidth), while the matrix columns correspond to the VAR variables (SepalLength and SepalWidth). The CORR procedure uses the WITH variable labels to label the matrix rows.
The SSCP option requests a table of the uncorrected sum-of-squares and crossproducts matrix, and the COV option requests a table of the covariance matrix. The SSCP and COV options also produce a table of the Pearson correlations.
The sum-of-squares and crossproducts statistics for each pair of variables are computed by using observations with nonmissing row and column variable values. The Sums of Squares and Crossproducts table shown in Output 1.2.2 displays the crossproduct, sum of squares for the row variable, and sum of squares for the column variable for each pair of variables.
```
Fisher (1936) Iris Setosa Data

Sums of Squares and Crossproducts
SSCP / Row Var SS / Col Var SS

                        SepalLength     SepalWidth

PetalLength             36214.00000    24756.00000
Petal Length in mm.     10735.00000    10735.00000
                        123793.0000     58164.0000

PetalWidth               6113.00000     4191.00000
Petal Width in mm.        355.00000      355.00000
                        121356.0000     56879.0000
```
The variances and covariances are computed by using observations with nonmissing row and column variable values. The Variances and Covariances table shown in Output 1.2.3 displays the covariance, variance for the row variable, variance for the column variable, and the associated degrees of freedom for each pair of variables.
```
Fisher (1936) Iris Setosa Data

Variances and Covariances
Covariance / Row Var Variance / Col Var Variance / DF

                        SepalLength     SepalWidth

PetalLength             1.270833333    1.363095238
Petal Length in mm.     2.625000000    2.625000000
                        12.33333333    14.60544218
                                 48             48

PetalWidth              0.911347518    1.048315603
Petal Width in mm.      1.063386525    1.063386525
                        11.80141844    13.62721631
                                 47             47
```
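For readers who want to check Outputs 1.2.2 and 1.2.3 outside SAS, the Python sketch below (an illustration, not the procedure's implementation) recomputes the SSCP and covariance entries for one pair of variables from the Setosa data lines, keeping only observations where both values are nonmissing. As in the DATA step, `.` marks a missing value; the helper names are hypothetical.

```python
SETOSA_DATALINES = """
50 33 14 02 46 34 14 03 46 36 . 02 51 33 17 05 55 35 13 02
48 31 16 02 52 34 14 02 49 36 14 01 44 32 13 02 50 35 16 06
44 30 13 02 47 32 16 02 48 30 14 03 51 38 16 02 48 34 19 02
50 30 16 02 50 32 12 02 43 30 11 . 58 40 12 02 51 38 19 04
49 30 14 02 51 35 14 02 50 34 16 04 46 32 14 02 57 44 15 04
50 36 14 02 54 34 15 04 52 41 15 . 55 42 14 02 49 31 15 02
54 39 17 04 50 34 15 02 44 29 14 02 47 32 13 02 46 31 15 02
51 34 15 02 50 35 13 03 49 31 15 01 54 37 15 02 54 39 13 04
51 35 14 03 48 34 16 02 48 30 14 01 45 23 13 03 57 38 17 03
51 38 15 03 54 34 17 02 51 37 15 04 52 35 15 02 53 37 15 02
"""


def read_columns(text, n_vars=4):
    """Parse list-style input (like @@) into columns; '.' becomes None."""
    values = [None if tok == "." else float(tok) for tok in text.split()]
    rows = [values[i:i + n_vars] for i in range(0, len(values), n_vars)]
    return [list(col) for col in zip(*rows)]


sepal_length, sepal_width, petal_length, petal_width = read_columns(SETOSA_DATALINES)


def pair_stats(row_var, col_var):
    """SSCP and covariance statistics from pairs where both values are present."""
    pairs = [(r, c) for r, c in zip(row_var, col_var) if r is not None and c is not None]
    n = len(pairs)
    sum_r = sum(r for r, _ in pairs)
    sum_c = sum(c for _, c in pairs)
    sscp = sum(r * c for r, c in pairs)     # uncorrected crossproduct
    row_ss = sum(r * r for r, _ in pairs)   # uncorrected SS, row variable
    col_ss = sum(c * c for _, c in pairs)   # uncorrected SS, column variable
    df = n - 1
    return {
        "n": n, "sscp": sscp, "row_ss": row_ss, "col_ss": col_ss, "df": df,
        "cov": (sscp - sum_r * sum_c / n) / df,
        "row_var": (row_ss - sum_r ** 2 / n) / df,
        "col_var": (col_ss - sum_c ** 2 / n) / df,
    }


stats = pair_stats(petal_width, sepal_length)
print(stats["n"], stats["sscp"], stats["row_ss"], stats["col_ss"], stats["df"])
print(round(stats["cov"], 9), round(stats["row_var"], 9), round(stats["col_var"], 8))
```

Because PetalWidth is missing for two specimens, every statistic in the PetalWidth row uses 48 observations (47 degrees of freedom), while the PetalLength row uses 49.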
When there are missing values in the analysis variables, the Pearson Correlation Coefficients table shown in Output 1.2.4 displays the correlation, the p-value under the null hypothesis of zero correlation, and the number of observations for each pair of variables. All four correlations are positive but weak (0.22 to 0.28); the two largest, between PetalWidth and SepalLength and between PetalWidth and SepalWidth, are the only ones with p-values below 0.10 (0.0775 and 0.0582).
```
Fisher (1936) Iris Setosa Data

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

                        SepalLength     SepalWidth

PetalLength                 0.22335        0.22014
Petal Length in mm.          0.1229         0.1285
                                 49             49

PetalWidth                  0.25726        0.27539
Petal Width in mm.           0.0775         0.0582
                                 48             48
```
The experimental PLOTS option displays a rectangular scatter plot matrix for the two sets of variables. The VAR variables SepalLength and SepalWidth are listed across the top of the matrix, and the WITH variables PetalLength and PetalWidth are listed down the side of the matrix. As measured in Output 1.2.4, the plot for PetalWidth and SepalLength and the plot for PetalWidth and SepalWidth show slight positive correlations.
This display is requested by specifying both the ODS GRAPHICS statement and the PLOTS option. For general information about ODS graphics, refer to Chapter 15, "Statistical Graphics Using ODS" (SAS/STAT User's Guide). For specific information about the graphics available in the CORR procedure, see the section "ODS Graphics" on page 31.
The following statements request Pearson correlation statistics using Fisher's z transformation for the data set Fitness.
```sas
proc corr data=Fitness nosimple fisher;
   var weight oxygen runtime;
run;
```
The NOSIMPLE option suppresses the table of descriptive statistics. The Pearson Correlation Coefficients table is displayed by default.
```
Fisher (1936) Iris Setosa Data

The CORR Procedure

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

               Weight      Oxygen     RunTime

Weight        1.00000    -0.15358     0.20072
                           0.4264      0.2965
                   31          29          29

Oxygen       -0.15358     1.00000    -0.86843
               0.4264                  <.0001
                   29          29          28

RunTime       0.20072    -0.86843     1.00000
               0.2965      <.0001
                   29          28          29
```
The FISHER option requests correlation statistics using Fisher's z transformation, which are shown in Output 1.3.2.
```
Pearson Correlation Statistics (Fisher's z Transformation)

          With                Sample                     Bias  Correlation
Variable  Variable   N   Correlation  Fisher's z   Adjustment     Estimate
Weight    Oxygen    29      -0.15358    -0.15480     -0.00274     -0.15090
Weight    RunTime   29       0.20072     0.20348      0.00358      0.19727
Oxygen    RunTime   28      -0.86843    -1.32665     -0.01608     -0.86442

Pearson Correlation Statistics (Fisher's z Transformation)

          With                                    p Value for
Variable  Variable   95% Confidence Limits           H0:Rho=0
Weight    Oxygen     -0.490289     0.228229            0.4299
Weight    RunTime    -0.182422     0.525765            0.2995
Oxygen    RunTime    -0.935728    -0.725221            <.0001
```
See the section "Fisher's z Transformation" on page 21 for details on Fisher's z transformation.
The following statements request one-sided hypothesis tests and confidence limits for the correlation using Fisher's z transformation.
```sas
proc corr data=Fitness nosimple nocorr fisher (type=lower);
   var weight oxygen runtime;
run;
```
The NOSIMPLE option suppresses the Simple Statistics table, and the NOCORR option suppresses the Pearson Correlation Coefficients table.
```
The CORR Procedure

Pearson Correlation Statistics (Fisher's z Transformation)

          With                Sample                     Bias  Correlation
Variable  Variable   N   Correlation  Fisher's z   Adjustment     Estimate
Weight    Oxygen    29      -0.15358    -0.15480     -0.00274     -0.15090
Weight    RunTime   29       0.20072     0.20348      0.00358      0.19727
Oxygen    RunTime   28      -0.86843    -1.32665     -0.01608     -0.86442

Pearson Correlation Statistics (Fisher's z Transformation)

          With                          p Value for
Variable  Variable   Lower 95% CL       H0:Rho<=0
Weight    Oxygen        -0.441943          0.7850
Weight    RunTime       -0.122077          0.1497
Oxygen    RunTime       -0.927408          1.0000
```
The TYPE=LOWER option requests a lower confidence limit and a p-value for the test of the one-sided hypothesis $H_0\colon \rho \le 0$ against the alternative hypothesis $H_1\colon \rho > 0$. Fisher's z, the bias adjustment, and the correlation estimate are the same as for the two-sided alternative. However, because TYPE=LOWER is specified, only a lower confidence limit is computed for each correlation, and the p-values are one-sided.
This example illustrates some applications of Fisher's z transformation. For details, see the section "Fisher's z Transformation" on page 21.
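The quantities in the two Fisher's z tables for the Fitness data can be reproduced approximately with a short Python sketch. It assumes the usual definitions: $z = \tanh^{-1}(r)$, a bias adjustment of $r/(2(n-1))$, a bias-adjusted estimate $\tanh(z - \text{bias})$, limits $\tanh(z - \text{bias} \pm q/\sqrt{n-3})$, and the statistic $z\sqrt{n-3}$ for tests against $\rho = 0$. The function `fisher_z_stats` is a hypothetical helper, not SAS code; with the printed (rounded) correlations as input it matches the tabled values to about four decimals.

```python
import math
from statistics import NormalDist


def fisher_z_stats(r, n, alpha=0.05):
    """Fisher's z summaries for a sample correlation r based on n observations."""
    std_normal = NormalDist()
    z = math.atanh(r)                 # 0.5 * log((1 + r) / (1 - r))
    bias = r / (2 * (n - 1))          # bias adjustment
    center = z - bias
    se = 1 / math.sqrt(n - 3)
    q_two = std_normal.inv_cdf(1 - alpha / 2)
    q_one = std_normal.inv_cdf(1 - alpha)
    stat = z / se                     # statistic for rho = 0 (no bias under H0)
    return {
        "z": z,
        "bias": bias,
        "estimate": math.tanh(center),
        "lcl": math.tanh(center - q_two * se),
        "ucl": math.tanh(center + q_two * se),
        "p_two_sided": 2 * std_normal.cdf(-abs(stat)),
        # TYPE=LOWER: lower limit only, and H0: rho <= 0 vs H1: rho > 0
        "lower_cl": math.tanh(center - q_one * se),
        "p_upper": 1 - std_normal.cdf(stat),
    }


for pair, r, n in [("Weight/Oxygen", -0.15358, 29), ("Weight/RunTime", 0.20072, 29),
                   ("Oxygen/RunTime", -0.86843, 28)]:
    s = fisher_z_stats(r, n)
    print(pair, {k: round(v, 5) for k, v in s.items()})
```

The one-sided p-value for Weight and Oxygen (about 0.785) is large because the sample correlation is negative, which is evidence against the alternative $\rho > 0$ rather than for it.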
The following statements simulate independent samples of variables X and Y from a bivariate normal distribution. The first batch of 150 observations is sampled using a known correlation of 0.3, the second batch of 150 observations is sampled using a known correlation of 0.25, and the third batch of 100 observations is sampled using a known correlation of 0.3.
```sas
data Sim (drop=i);
   do i=1 to 400;
      X = rannor(135791);
      Batch = 1 + (i>150) + (i>300);
      if Batch = 1 then Y = 0.3*X + 0.9*rannor(246791);
      if Batch = 2 then Y = 0.25*X + sqrt(.8375)*rannor(246791);
      if Batch = 3 then Y = 0.3*X + 0.9*rannor(246791);
      output;
   end;
run;
```
This data set is used to illustrate the following applications of Fisher's z transformation:
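It can help to check which population correlation the DATA step actually implies. If X and the error E are independent standard normal variables (as RANNOR draws are intended to be), then Y = bX + cE has Var(Y) = b² + c² and corr(X, Y) = b / sqrt(b² + c²). The Python sketch below evaluates that formula for the three batches; `implied_corr` is a hypothetical helper name.

```python
import math


def implied_corr(b, c):
    """corr(X, Y) when Y = b*X + c*E with X, E independent standard normal."""
    return b / math.sqrt(b ** 2 + c ** 2)


# Coefficients (b, c) used for each batch in the DATA step.
batches = {1: (0.3, 0.9), 2: (0.25, math.sqrt(0.8375)), 3: (0.3, 0.9)}
for batch, (b, c) in batches.items():
    print(f"Batch {batch}: Var(Y) = {b ** 2 + c ** 2:.4f}, "
          f"implied rho = {implied_corr(b, c):.4f}")
```

With these coefficients Var(Y) is 0.9 in every batch, and the implied correlations are about 0.316 (Batches 1 and 3) and 0.264 (Batch 2), slightly above the nominal 0.3 and 0.25. Batches 1 and 3 still share the same population correlation, which is what the comparisons below rely on.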
Testing whether a population correlation is equal to a given value
Testing for equality of two population correlations
Combining correlation estimates from different samples
See the section "Fisher's z Transformation" on page 21.
You can use the following statements to test the null hypothesis $H_0\colon \rho = 0.5$ against the two-sided alternative $H_1\colon \rho \ne 0.5$.
```sas
ods select FisherPearsonCorr;
title 'Analysis for Batch 1';
proc corr data=Sim (where=(Batch=1)) fisher(rho0=.5);
   var X Y;
run;
```
The test is requested with the option FISHER(RHO0=0.5). The results, which are based on Fisher's z transformation, are shown in Output 1.4.1.
```
Analysis for Batch 1

The CORR Procedure

Pearson Correlation Statistics (Fisher's z Transformation)

          With                Sample                     Bias  Correlation
Variable  Variable    N  Correlation  Fisher's z   Adjustment     Estimate
X         Y         150      0.22081     0.22451    0.0007410      0.22011

Pearson Correlation Statistics (Fisher's z Transformation)

          With                                ------H0:Rho=Rho0-----
Variable  Variable   95% Confidence Limits         Rho0      p Value
X         Y           0.062034    0.367409      0.50000       <.0001
```
The null hypothesis is rejected since the p -value is less than 0.0001.
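A minimal Python sketch of the test against a nonzero $\rho_0$, assuming the statistic $(z - \tanh^{-1}(\rho_0) - \rho_0/(2(n-1)))\sqrt{n-3}$ is compared with a standard normal distribution. The exact form of the bias term under $H_0$ is an assumption here; with n = 150 the p-value falls below 0.0001 with or without it. The helper name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_test_rho0(r, n, rho0, alpha=0.05):
    """Confidence limits and two-sided test of H0: rho = rho0 via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    center = z - r / (2 * (n - 1))            # bias-adjusted z for the limits
    q = NormalDist().inv_cdf(1 - alpha / 2)
    limits = (math.tanh(center - q * se), math.tanh(center + q * se))
    # Assumed bias term under H0: rho0 / (2(n - 1)).
    stat = (z - math.atanh(rho0) - rho0 / (2 * (n - 1))) / se
    p_value = 2 * NormalDist().cdf(-abs(stat))
    return limits, stat, p_value


limits, stat, p_value = fisher_test_rho0(0.22081, 150, 0.5)
print(limits, stat, p_value)
```

The confidence interval (about 0.062 to 0.367) excludes 0.5, which is consistent with rejecting $H_0$ at the 5% level.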
You can use the following statements to test for equality of two population correlations, $\rho_1$ and $\rho_2$. Here, the null hypothesis $H_0\colon \rho_1 = \rho_2$ is tested against the alternative $H_1\colon \rho_1 \ne \rho_2$.
```sas
ods select FisherPearsonCorr;
ods output FisherPearsonCorr=SimCorr;
title 'Testing Equality of Population Correlations';
proc corr data=Sim (where=(Batch=1 or Batch=2)) fisher;
   var X Y;
   by Batch;
run;
```
The ODS SELECT statement restricts the output from PROC CORR to the FisherPearsonCorr table, which is shown in Output 1.4.2; see the section "ODS Table Names" on page 30. The output data set SimCorr contains Fisher's z statistics for both batches.
```
Testing Equality of Population Correlations

---------------------------------- Batch=1 -----------------------------------

The CORR Procedure

Pearson Correlation Statistics (Fisher's z Transformation)

          With                Sample                     Bias  Correlation
Variable  Variable    N  Correlation  Fisher's z   Adjustment     Estimate
X         Y         150      0.22081     0.22451    0.0007410      0.22011

Pearson Correlation Statistics (Fisher's z Transformation)

          With                                    p Value for
Variable  Variable   95% Confidence Limits           H0:Rho=0
X         Y           0.062034    0.367409             0.0065

Testing Equality of Population Correlations

---------------------------------- Batch=2 -----------------------------------

The CORR Procedure

Pearson Correlation Statistics (Fisher's z Transformation)

          With                Sample                     Bias  Correlation
Variable  Variable    N  Correlation  Fisher's z   Adjustment     Estimate
X         Y         150      0.33694     0.35064      0.00113      0.33594

Pearson Correlation Statistics (Fisher's z Transformation)

          With                                    p Value for
Variable  Variable   95% Confidence Limits           H0:Rho=0
X         Y           0.185676    0.470853             <.0001
```
The p-value for testing $H_0$ is derived by treating the difference $z_1 - z_2$ as a normal random variable with mean zero and variance $1/(n_1 - 3) + 1/(n_2 - 3)$, where $z_1$ and $z_2$ are the Fisher's z transformations of the sample correlations $r_1$ and $r_2$, respectively, and $n_1$ and $n_2$ are the corresponding sample sizes.
The following statements compute the p -value shown in Output 1.4.3.
```sas
data SimTest (drop=Batch);
   /* one-to-one merge of the Batch 1 and Batch 2 rows of SimCorr */
   merge SimCorr (where=(Batch=1) keep=Nobs ZVal Batch
                  rename=(Nobs=n1 ZVal=z1))
         SimCorr (where=(Batch=2) keep=Nobs ZVal Batch
                  rename=(Nobs=n2 ZVal=z2));
   variance = 1/(n1-3) + 1/(n2-3);
   z = (z1 - z2) / sqrt(variance);
   /* two-sided p-value */
   pval = probnorm(z);
   if (pval > 0.5) then pval = 1 - pval;
   pval = 2*pval;
run;

proc print data=SimTest noobs;
run;
```
```
 n1        z1     n2        z2    variance           z       pval
150   0.22451    150   0.35064    0.013605    -1.08135    0.27954
```
In Output 1.4.3, the p-value of 0.2795 does not provide evidence to reject the null hypothesis that $\rho_1 = \rho_2$. The sample sizes $n_1 = 150$ and $n_2 = 150$ are not large enough to detect the difference $\rho_1 - \rho_2 = 0.05$ at a significance level of $\alpha = 0.05$.
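The DATA step above folds the normal probability into [0, 0.5] and doubles it, which is the same as computing 2Φ(-|z|). The Python sketch below mirrors that arithmetic with the rounded z values from Output 1.4.2; `compare_correlations` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def compare_correlations(z1, n1, z2, n2):
    """Two-sided test of H0: rho1 = rho2 from Fisher's z of two independent samples."""
    variance = 1 / (n1 - 3) + 1 / (n2 - 3)
    z = (z1 - z2) / math.sqrt(variance)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return variance, z, p_value


variance, z, p_value = compare_correlations(0.22451, 150, 0.35064, 150)
print(f"variance = {variance:.6f}, z = {z:.5f}, p = {p_value:.5f}")
```

Because the inputs are the five-decimal z values rather than the unrounded ones stored in SimCorr, the last digit of z and p can differ slightly from Output 1.4.3.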
Assume that sample correlations $r_1$ and $r_2$ are computed from two independent samples of $n_1$ and $n_2$ observations, respectively. A combined correlation estimate is given by $r = \tanh(z)$, where $z$ is the weighted average of the z-transformations $z_1$ and $z_2$ of $r_1$ and $r_2$:

$$z = \frac{(n_1 - 3)\, z_1 + (n_2 - 3)\, z_2}{n_1 + n_2 - 6}$$

The following statements compute a combined estimate of $\rho$ by using Batch 1 and Batch 3:
```sas
ods output FisherPearsonCorr=SimCorr2;
proc corr data=Sim (where=(Batch=1 or Batch=3)) fisher noprint;
   var X Y;
   by Batch;
run;

data SimComb (drop=Batch);
   merge SimCorr2 (where=(Batch=1) keep=Nobs ZVal Batch
                   rename=(Nobs=n1 ZVal=z1))
         SimCorr2 (where=(Batch=3) keep=Nobs ZVal Batch
                   rename=(Nobs=n2 ZVal=z2));
   /* weighted average of Fisher's z, then back-transform */
   z = ((n1-3)*z1 + (n2-3)*z2) / (n1+n2-6);
   corr = tanh(z);
   var = 1/(n1+n2-6);
   lcl = corr - probit(0.975)*sqrt(var);
   ucl = corr + probit(0.975)*sqrt(var);
run;

proc print data=SimComb noobs;
   var n1 z1 n2 z2 corr lcl ucl;
run;
```
Output 1.4.4 displays the combined estimate of $\rho$.
```
 n1        z1     n2        z2       corr        lcl        ucl
150   0.22451    100   0.23929    0.22640    0.10092    0.35187
```
Thus, a correlation estimate from the combined samples is r = 0.23. The 95% confidence interval displayed in Output 1.4.4 is (0.10, 0.35), computed by using the variance of the combined estimate. Note that this interval contains the population correlation 0.3. See the section "Applications of Fisher's z Transformation" on page 23.
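A Python sketch of the same combination follows; the function name is hypothetical. It reproduces the DATA step, which adds ±1.96 standard errors (computed on the z scale) directly to the back-transformed r. For comparison it also forms the interval on the z scale and back-transforms both limits with tanh, a common alternative; that second interval is our addition and is not what the DATA step computes.

```python
import math
from statistics import NormalDist


def combine_correlations(z1, n1, z2, n2, level=0.95):
    """Combined Fisher's z estimate of a common correlation from two samples."""
    w1, w2 = n1 - 3, n2 - 3
    z = (w1 * z1 + w2 * z2) / (w1 + w2)      # weighted average of Fisher's z
    corr = math.tanh(z)
    se = math.sqrt(1 / (w1 + w2))            # sqrt of var = 1/(n1 + n2 - 6)
    q = NormalDist().inv_cdf(0.5 + level / 2)
    as_in_data_step = (corr - q * se, corr + q * se)
    back_transformed = (math.tanh(z - q * se), math.tanh(z + q * se))
    return corr, as_in_data_step, back_transformed


corr, doc_ci, back_ci = combine_correlations(0.22451, 150, 0.23929, 100)
print(round(corr, 5), [round(v, 5) for v in doc_ci], [round(v, 5) for v in back_ci])
```

With these data the back-transformed interval is slightly narrower and asymmetric around r, and it also contains 0.3.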
The following statements create the data set Fish1 from the Fish data set used in Chapter 67, "The STEPDISC Procedure." The cube root of the weight (Weight3) is computed as a one-dimensional measure of the size of a fish.
```sas
*------------------- Fish Measurement Data ----------------------*
  The data set contains 35 fish from the species Bream caught in
  lake Laengelmavesi, Finland, with the following measurements:
  Weight   (in grams)
  Length3  (length from the nose to the end of its tail, in cm)
  HtPct    (max height, as percentage of Length3)
  WidthPct (max width, as percentage of Length3)
*----------------------------------------------------------------*;
data Fish1 (drop=HtPct WidthPct);
   title 'Fish Measurement Data';
   input Weight Length3 HtPct WidthPct @@;
   Weight3 = Weight**(1/3);
   Height  = HtPct*Length3/100;
   Width   = WidthPct*Length3/100;
   datalines;
 242.0 30.0 38.4 13.4   290.0 31.2 40.0 13.8   340.0 31.1 39.8 15.1
 363.0 33.5 38.0 13.3   430.0 34.0 36.6 15.1   450.0 34.7 39.2 14.2
 500.0 34.5 41.1 15.3   390.0 35.0 36.2 13.4   450.0 35.1 39.9 13.8
 500.0 36.2 39.3 13.7   475.0 36.2 39.4 14.1   500.0 36.2 39.7 13.3
 500.0 36.4 37.8 12.0       . 37.3 37.3 13.6   600.0 37.2 40.2 13.9
 600.0 37.2 41.5 15.0   700.0 38.3 38.8 13.8   700.0 38.5 38.8 13.5
 610.0 38.6 40.5 13.3   650.0 38.7 37.4 14.8   575.0 39.5 38.3 14.1
 685.0 39.2 40.8 13.7   620.0 39.7 39.1 13.3   680.0 40.6 38.1 15.1
 700.0 40.5 40.1 13.8   725.0 40.9 40.0 14.8   720.0 40.6 40.3 15.0
 714.0 41.5 39.8 14.1   850.0 41.6 40.6 14.9  1000.0 42.6 44.5 15.5
 920.0 44.1 40.9 14.3   955.0 44.0 41.1 14.3   925.0 45.3 41.4 14.9
 975.0 45.9 40.6 14.7   950.0 46.5 37.9 13.7
;
```
The following statements request a correlation analysis and compute Cronbach's coefficient alpha for the variables Weight3, Length3, Height, and Width.
```sas
ods html;
ods graphics on;
title 'Fish Measurement Data';
proc corr data=fish1 nomiss alpha plots;
   var Weight3 Length3 Height Width;
run;
ods graphics off;
ods html close;
```
The NOMISS option excludes observations with missing values, and the PLOTS option requests a symmetric scatter plot matrix for the analysis variables.
By default, the CORR procedure displays descriptive statistics for each variable, as shown in Output 1.5.1.
```
Fish Measurement Data

The CORR Procedure

4 Variables: Weight3 Length3 Height Width

Simple Statistics

Variable    N        Mean     Std Dev          Sum      Minimum      Maximum
Weight3    34     8.44751     0.97574    287.21524      6.23168     10.00000
Length3    34    38.38529     4.21628         1305     30.00000     46.50000
Height     34    15.22057     1.98159    517.49950     11.52000     18.95700
Width      34     5.43805     0.72967    184.89370      4.02000      6.74970
```
Since the NOMISS option is specified, the same set of 34 observations is used to compute the correlation for each pair of variables. The correlations are shown in Output 1.5.2.
```
Fish Measurement Data

Pearson Correlation Coefficients, N = 34
Prob > |r| under H0: Rho=0

              Weight3     Length3      Height       Width

Weight3       1.00000     0.96523     0.96261     0.92789
                           <.0001      <.0001      <.0001

Length3       0.96523     1.00000     0.95492     0.92171
               <.0001                  <.0001      <.0001

Height        0.96261     0.95492     1.00000     0.92632
               <.0001      <.0001                  <.0001

Width         0.92789     0.92171     0.92632     1.00000
               <.0001      <.0001      <.0001
```
Since the data set contains only one species of fish, all the variables are highly correlated. This is evidenced in the scatter plot matrix for the analysis variables, which is shown in Output 1.7.3, created in Example 1.7.
Positive correlations among the variables are needed for the alpha coefficient because the variables are assumed to measure a common entity.
With the ALPHA option, the CORR procedure computes Cronbach's coefficient alpha for both the raw variables and the standardized variables. Coefficient alpha is a lower bound for the reliability coefficient.
```
Fish Measurement Data

Cronbach Coefficient Alpha

Variables           Alpha
--------------------------
Raw              0.822134
Standardized     0.985145
```
Because the variances of some variables vary widely, you should use the standardized score to estimate reliability. The overall standardized Cronbach's coefficient alpha of 0.985145 provides an acceptable lower bound for the reliability coefficient. This is much greater than the suggested value of 0.70 given by Nunnally and Bernstein (1994).
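Both alphas can be recovered from the summary statistics already printed. The Python sketch below assumes the textbook formulas: raw alpha = k/(k-1) · (1 - Σ item variances / variance of the total), and standardized alpha = k·r̄ / (1 + (k-1)·r̄), where r̄ is the mean off-diagonal correlation. The inputs are the Std Dev values from Output 1.5.1 and the correlations from Output 1.5.2; the function name is hypothetical.

```python
def cronbach_alpha(std_devs, corr):
    """Raw and standardized Cronbach's alpha from item std devs and correlations."""
    k = len(std_devs)
    cov = [[corr[i][j] * std_devs[i] * std_devs[j] for j in range(k)] for i in range(k)]
    item_variances = sum(cov[i][i] for i in range(k))
    total_variance = sum(sum(row) for row in cov)   # variance of the sum of the items
    raw = k / (k - 1) * (1 - item_variances / total_variance)
    mean_r = sum(corr[i][j] for i in range(k) for j in range(k) if i != j) / (k * (k - 1))
    standardized = k * mean_r / (1 + (k - 1) * mean_r)
    return raw, standardized


# Order: Weight3, Length3, Height, Width.
std_devs = [0.97574, 4.21628, 1.98159, 0.72967]
corr = [
    [1.00000, 0.96523, 0.96261, 0.92789],
    [0.96523, 1.00000, 0.95492, 0.92171],
    [0.96261, 0.95492, 1.00000, 0.92632],
    [0.92789, 0.92171, 0.92632, 1.00000],
]
raw, standardized = cronbach_alpha(std_devs, corr)
print(f"raw = {raw:.6f}, standardized = {standardized:.6f}")
```

The gap between the two values comes from the unequal variances: Length3 (variance about 17.8) dominates the raw total, while the standardized version gives every variable equal weight.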
The Cronbach Coefficient Alpha with Deleted Variable table reports, for each variable, the alpha coefficient computed after that variable is removed from the scale; the standardized columns show how each variable affects the reliability of the scale built from standardized variables. If the standardized alpha decreases after a variable is removed from the construct, that variable is strongly correlated with the other variables in the scale. On the other hand, if the standardized alpha increases after a variable is removed, removing that variable from the scale makes the construct more reliable. In Output 1.5.4, the standardized alpha coefficients after deletion range from 0.977103 to 0.986626, close to the overall value of 0.985145, so no variable shows a substantial increase or decrease. See the section "Cronbach's Coefficient Alpha" on page 24 for more information regarding constructs and Cronbach's alpha.
```
Fish Measurement Data

Cronbach Coefficient Alpha with Deleted Variable

                   Raw Variables             Standardized Variables
Deleted       Correlation                  Correlation
Variable       with Total       Alpha       with Total       Alpha
--------------------------------------------------------------------
Weight3          0.975379    0.783365         0.973464    0.977103
Length3          0.967602    0.881987         0.967177    0.978783
Height           0.964715    0.655098         0.968079    0.978542
Width            0.934635    0.824069         0.937599    0.986626
```
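The standardized "Alpha" column of Output 1.5.4 can be checked by dropping one row and column of the correlation matrix and reapplying the standardized formula k·r̄ / (1 + (k-1)·r̄). The Python sketch below assumes that formula; `standardized_alpha` is a hypothetical helper.

```python
def standardized_alpha(corr):
    """Standardized Cronbach's alpha from a correlation matrix."""
    k = len(corr)
    mean_r = sum(corr[i][j] for i in range(k) for j in range(k) if i != j) / (k * (k - 1))
    return k * mean_r / (1 + (k - 1) * mean_r)


names = ["Weight3", "Length3", "Height", "Width"]
corr = [
    [1.00000, 0.96523, 0.96261, 0.92789],
    [0.96523, 1.00000, 0.95492, 0.92171],
    [0.96261, 0.95492, 1.00000, 0.92632],
    [0.92789, 0.92171, 0.92632, 1.00000],
]

alpha_if_deleted = {}
for d, name in enumerate(names):
    keep = [i for i in range(len(names)) if i != d]
    sub = [[corr[i][j] for j in keep] for i in keep]   # drop row and column d
    alpha_if_deleted[name] = standardized_alpha(sub)
    print(f"without {name}: {alpha_if_deleted[name]:.6f}")
```

Deleting Width is the only case where the standardized alpha rises above the overall 0.985145, and only slightly, because Width has the lowest correlations with the other three variables.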
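The following statements compute Pearson correlations and save them, together with the means, standard deviations, and numbers of observations, in an output data set.
```sas
title 'Correlations for a Fitness and Exercise Study';
proc corr data=Fitness nomiss nosimple outp=CorrOutp;
   var weight oxygen runtime;
run;
```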
The NOMISS option excludes observations with missing values of the VAR statement variables from the analysis. The NOSIMPLE option suppresses the display of descriptive statistics, and the OUTP= option creates an output data set named CorrOutp that contains the Pearson correlation statistics. Since the NOMISS option is specified, the same set of 28 observations is used to compute the correlation for each pair of variables.
```
Correlations for a Fitness and Exercise Study

The CORR Procedure

Pearson Correlation Coefficients, N = 28
Prob > |r| under H0: Rho=0

               Weight      Oxygen     RunTime

Weight        1.00000    -0.18419     0.19505
                           0.3481      0.3199

Oxygen       -0.18419     1.00000    -0.86843
               0.3481                  <.0001

RunTime       0.19505    -0.86843     1.00000
               0.3199      <.0001
```
The following statements display the output data set, which is shown in Output 1.6.2.
```sas
title 'Output Data Set from PROC CORR';
proc print data=CorrOutp noobs;
run;
```
```
Output Data Set from PROC CORR

_TYPE_   _NAME_      Weight     Oxygen    RunTime
MEAN                77.2168    47.1327    10.6954
STD                  8.4495     5.5535     1.4127
N                   28.0000    28.0000    28.0000
CORR     Weight      1.0000    -0.1842     0.1950
CORR     Oxygen     -0.1842     1.0000    -0.8684
CORR     RunTime     0.1950    -0.8684     1.0000
```
The output data set has the default type CORR and can be used as an input data set for regression or other statistical procedures. For example, the following statements request a regression analysis that uses CorrOutp, so the REG procedure does not need to read the original data:
```sas
title 'Input Type CORR Data Set from PROC REG';
proc reg data=CorrOutp;
   model runtime= weight oxygen;
run;
```
The preceding statements generate the same results as the following statements:
```sas
proc reg data=Fitness nomiss;
   model runtime= weight oxygen;
run;
```
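Why a TYPE=CORR data set is enough: for an ordinary least-squares fit, the coefficients depend on the data only through means, standard deviations, and correlations. The Python sketch below shows that textbook algebra for two predictors. It makes no claim about how PROC REG is implemented, and because it uses the four-decimal values printed in Output 1.6.2, its coefficients only approximate the ones PROC REG would report. The helper names are hypothetical.

```python
# Rounded summary statistics as printed in the output data set CorrOutp.
mean = {"Weight": 77.2168, "Oxygen": 47.1327, "RunTime": 10.6954}
std = {"Weight": 8.4495, "Oxygen": 5.5535, "RunTime": 1.4127}
corr = {("Weight", "Oxygen"): -0.1842, ("Weight", "RunTime"): 0.1950,
        ("Oxygen", "RunTime"): -0.8684}


def r(a, b):
    """Look up a correlation in either order; 1 on the diagonal."""
    return 1.0 if a == b else corr.get((a, b), corr.get((b, a)))


def ols_from_corr(y, x1, x2):
    """Two-predictor least-squares fit from means, std devs, and correlations."""
    r12, r1y, r2y = r(x1, x2), r(x1, y), r(x2, y)
    # Standardized coefficients solve [[1, r12], [r12, 1]] @ beta = [r1y, r2y].
    det = 1 - r12 ** 2
    beta1 = (r1y - r12 * r2y) / det
    beta2 = (r2y - r12 * r1y) / det
    # Convert to the original units and recover the intercept from the means.
    b1 = beta1 * std[y] / std[x1]
    b2 = beta2 * std[y] / std[x2]
    intercept = mean[y] - b1 * mean[x1] - b2 * mean[x2]
    r_squared = beta1 * r1y + beta2 * r2y
    return intercept, b1, b2, r_squared, (beta1, beta2)


intercept, b1, b2, r_squared, (beta1, beta2) = ols_from_corr("RunTime", "Weight", "Oxygen")
print(f"RunTime = {intercept:.4f} + {b1:.5f}*Weight + {b2:.5f}*Oxygen, R-square = {r_squared:.4f}")
```

The same idea extends to more predictors by solving the full correlation system, which is why a TYPE=CORR data set built with NOMISS reproduces the fit on the complete cases.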
The following statements request a correlation analysis and a scatter plot matrix for the variables in the data set Fish1 , which was created in Example 1.5. This data set contains 35 observations, one of which contains a missing value for the variable Weight3 .
```sas
ods html;
ods graphics on;
title 'Fish Measurement Data';
proc corr data=fish1 nomiss plots=matrix;
   var Height Width Length3 Weight3;
run;
ods graphics off;
ods html close;
```
By default, the CORR procedure displays descriptive statistics for the VAR statement variables, which are shown in Output 1.7.1.
```
Fish Measurement Data

The CORR Procedure

4 Variables: Height Width Length3 Weight3

Simple Statistics

Variable    N        Mean     Std Dev          Sum      Minimum      Maximum
Height     34    15.22057     1.98159    517.49950     11.52000     18.95700
Width      34     5.43805     0.72967    184.89370      4.02000      6.74970
Length3    34    38.38529     4.21628         1305     30.00000     46.50000
Weight3    34     8.44751     0.97574    287.21524      6.23168     10.00000
```
Since the NOMISS option is specified, the same set of 34 observations is used to compute the correlation for each pair of variables. The correlations are shown in Output 1.7.2.
```
Fish Measurement Data

Pearson Correlation Coefficients, N = 34
Prob > |r| under H0: Rho=0

               Height       Width     Length3     Weight3

Height        1.00000     0.92632     0.95492     0.96261
                           <.0001      <.0001      <.0001

Width         0.92632     1.00000     0.92171     0.92789
               <.0001                  <.0001      <.0001

Length3       0.95492     0.92171     1.00000     0.96523
               <.0001      <.0001                  <.0001

Weight3       0.96261     0.92789     0.96523     1.00000
               <.0001      <.0001      <.0001
```
The variables are highly correlated. For example, the correlation between Height and Width is 0.92632.
The experimental PLOTS=MATRIX option requests a scatter plot matrix for the VAR statement variables, which is shown in Output 1.7.3.
In order to create this display, you must specify the experimental ODS GRAPHICS statement in addition to the PLOTS=MATRIX option. For general information about ODS graphics, refer to Chapter 15, "Statistical Graphics Using ODS" (SAS/STAT User's Guide). For specific information about ODS graphics available in the CORR procedure, see the section "ODS Graphics" on page 31.
To explore the correlation between Height and Width , the following statements request a scatter plot with prediction ellipses for the two variables, which is shown in Output 1.7.4. A prediction ellipse is a region for predicting a new observation from the population, assuming bivariate normality. It also approximates a region containing a specified percentage of the population.
```sas
ods html;
ods graphics on;
proc corr data=fish1 nomiss noprint
          plots=scatter(nmaxvar=2 alpha=.20 .30);
   var Height Width Length3 Weight3;
run;
ods graphics off;
ods html close;
```
The NOMISS option is specified with the original VAR statement to ensure that the same set of 34 observations is used for this analysis. The experimental PLOTS=SCATTER(NMAXVAR=2) option requests a scatter plot for the first two variables in the VAR list. The ALPHA= suboption requests 80% and 70% prediction ellipses.
The prediction ellipse is centered at the means $(\bar{x}, \bar{y})$. For further details, see the section "Confidence and Prediction Ellipses" on page 33.
Note that the following statements can also be used to create a scatter plot for Height and Width :
```sas
ods html;
ods graphics on;
proc corr data=fish1 noprint plots=scatter(alpha=.20 .30);
   var Height Width;
run;
ods graphics off;
ods html close;
```
Output 1.7.5 includes the point (13.9, 5.1), which was excluded from Output 1.7.4 because the observation had a missing value for Weight3. The prediction ellipses in Output 1.7.5 also reflect the inclusion of this observation.
The following statements request a scatter plot with confidence ellipses for the mean, which is shown in Output 1.7.6:
```sas
ods html;
ods graphics on;
title 'Fish Measurement Data';
proc corr data=fish1 nomiss noprint
          plots=scatter(ellipse=mean nmaxvar=2 alpha=.05 .01);
   var Height Width Length3 Weight3;
run;
ods graphics off;
ods html close;
```
The experimental PLOTS=SCATTER option requests scatter plots for all the variables in the VAR statement, and the NMAXVAR=2 suboption restricts the number of plots created to the first two variables in the VAR statement. The ELLIPSE=MEAN and ALPHA= suboptions request 95% and 99% confidence ellipses for the mean.
The confidence ellipse for the mean is centered at the means $(\bar{x}, \bar{y})$. For further details, see the section "Confidence and Prediction Ellipses" on page 33.
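To see how the two kinds of ellipses differ in size, the Python sketch below computes their semi-axis lengths under bivariate normality, using the standard Hotelling T² construction (p = 2): the prediction ellipse satisfies (x - x̄)ᵀS⁻¹(x - x̄) = 2(n+1)(n-1)/(n(n-2)) · F, and the confidence ellipse for the mean uses 2(n-1)/(n(n-2)) · F, where F is the 1-α quantile of F(2, n-2). We assume this matches the section on confidence and prediction ellipses; the function names are hypothetical. The F(2, m) quantile has a closed form, so no statistics library is needed.

```python
import math


def f2_quantile(prob, m):
    """Quantile of F(2, m); its survival function is (1 + 2f/m) ** (-m/2)."""
    return (m / 2) * ((1 - prob) ** (-2 / m) - 1)


def ellipse_semi_axes(n, s_xx, s_xy, s_yy, alpha, kind):
    """(major, minor) semi-axes of a 100(1-alpha)% ellipse centered at the means.

    kind="prediction": region for a new observation.
    kind="mean": confidence region for the population mean.
    """
    f = f2_quantile(1 - alpha, n - 2)
    scale = 2 * (n - 1) / (n * (n - 2)) * f
    if kind == "prediction":
        scale *= n + 1
    elif kind != "mean":
        raise ValueError("kind must be 'prediction' or 'mean'")
    # Eigenvalues of the 2x2 sample covariance matrix.
    half_trace = (s_xx + s_yy) / 2
    spread = math.sqrt(((s_xx - s_yy) / 2) ** 2 + s_xy ** 2)
    lam_major, lam_minor = half_trace + spread, half_trace - spread
    return math.sqrt(scale * lam_major), math.sqrt(scale * lam_minor)


# Height and Width (N = 34): Std Dev from Output 1.7.1, correlation from Output 1.7.2.
s_h, s_w, r_hw, n = 1.98159, 0.72967, 0.92632, 34
cov = r_hw * s_h * s_w
pred80 = ellipse_semi_axes(n, s_h ** 2, cov, s_w ** 2, alpha=0.20, kind="prediction")
mean95 = ellipse_semi_axes(n, s_h ** 2, cov, s_w ** 2, alpha=0.05, kind="mean")
print("80% prediction ellipse semi-axes:", pred80)
print("95% confidence ellipse (mean) semi-axes:", mean95)
```

At the same α, the prediction ellipse is larger than the mean's confidence ellipse by a factor of sqrt(n + 1) in each axis, because it must cover the scatter of a single new observation rather than the uncertainty in the average.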
A partial correlation measures the strength of the linear relationship between two variables, while adjusting for the effect of other variables.
The following statements request a partial correlation analysis of variables Height and Width while adjusting for the variables Length3 and Weight3. The latter variables, which are said to be partialled out of the analysis, are specified with the PARTIAL statement.
```sas
ods html;
ods graphics on;
title 'Fish Measurement Data';
proc corr data=fish1 plots=scatter(alpha=.20 .30);
   var Height Width;
   partial Length3 Weight3;
run;
ods graphics off;
ods html close;
```
By default, the CORR procedure displays descriptive statistics for all the variables and the partial variance and partial standard deviation for the VAR statement variables, as shown in Output 1.8.1.
```
Fish Measurement Data

The CORR Procedure

2 Partial Variables: Length3 Weight3
2         Variables: Height  Width

Simple Statistics

Variable    N        Mean     Std Dev          Sum      Minimum      Maximum
Length3    34    38.38529     4.21628         1305     30.00000     46.50000
Weight3    34     8.44751     0.97574    287.21524      6.23168     10.00000
Height     34    15.22057     1.98159    517.49950     11.52000     18.95700
Width      34     5.43805     0.72967    184.89370      4.02000      6.74970

Simple Statistics

              Partial     Partial
Variable     Variance     Std Dev
Length3
Weight3
Height        0.26607     0.51582
Width         0.07315     0.27047
```
When a PARTIAL statement is specified, observations with missing values are excluded from the analysis. The partial correlations for the VAR statement variables are shown in Output 1.8.2.
```
Fish Measurement Data

Pearson Partial Correlation Coefficients, N = 34
Prob > |r| under H0: Partial Rho=0

               Height       Width

Height        1.00000     0.25692
                           0.1558

Width         0.25692     1.00000
               0.1558
```
The partial correlation between the variables Height and Width is 0.25692, which is much less than the unpartialled correlation, 0.92632. The p -value for the partial correlation is 0.1558.
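The drop from 0.92632 to 0.25692 can be reproduced from the correlation matrix alone. The Python sketch below applies the standard recursive formula r_ij·k = (r_ij - r_ik r_jk) / sqrt((1 - r_ik²)(1 - r_jk²)) one control variable at a time, using the correlations from Output 1.7.2. Because those inputs are rounded to five decimals and the variables are highly collinear, the result agrees with the printed value only approximately; `partial_corr` is a hypothetical helper.

```python
import math


def partial_corr(r, i, j, controls):
    """Partial correlation of variables i and j given the variables in `controls`."""
    if not controls:
        return r[i][j]
    k, rest = controls[-1], controls[:-1]
    r_ij = partial_corr(r, i, j, rest)
    r_ik = partial_corr(r, i, k, rest)
    r_jk = partial_corr(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


# Order: Height, Width, Length3, Weight3.
R = [
    [1.00000, 0.92632, 0.95492, 0.96261],
    [0.92632, 1.00000, 0.92171, 0.92789],
    [0.95492, 0.92171, 1.00000, 0.96523],
    [0.96261, 0.92789, 0.96523, 1.00000],
]
print("given Length3:", round(partial_corr(R, 0, 1, [2]), 4))
print("given Length3 and Weight3:", round(partial_corr(R, 0, 1, [2, 3]), 4))
```

Controlling for Length3 alone already removes most of the association between Height and Width; adding Weight3 reduces it further, because much of what the two measurements share is overall fish size.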
The PLOTS=SCATTER option requests a scatter plot of the residuals for the variables Height and Width after controlling for the effect of the variables Length3 and Weight3.
The ALPHA= suboption requests 80% and 70% prediction ellipses. The scatter plot is shown in Output 1.8.3.
In Output 1.8.3, a standard deviation of Height has roughly the same length on the X-axis as a standard deviation of Width on the Y-axis. The major axis length is not significantly larger than the minor axis length, indicating a weak partial correlation between Height and Width .