Examples | Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4

Example 1.1. Computing Four Measures of Association

This example produces a correlation analysis with descriptive statistics and four measures of association: the Pearson product-moment correlation, the Spearman rank-order correlation, Kendall's tau-b coefficients, and Hoeffding's measure of dependence, D.

The Fitness data set created in the "Getting Started" section beginning on page 4 contains measurements from a study of physical fitness of 31 participants. The following statements request all four measures of association for the variables Weight, Oxygen, and RunTime.

  ods html;
  ods graphics on;

  title 'Measures of Association for a Physical Fitness Study';
  proc corr data=Fitness pearson spearman kendall hoeffding
            plots;
     var Weight Oxygen RunTime;
  run;

  ods graphics off;
  ods html close;

Note that Pearson correlations are computed by default only if none of the three nonparametric correlations (SPEARMAN, KENDALL, and HOEFFDING) is specified. Otherwise, you must specify the PEARSON option explicitly to compute Pearson correlations.

By default, observations with nonmissing values for each variable are used to derive the univariate statistics for that variable. When nonparametric measures of association are specified, the procedure displays the median instead of the sum as an additional descriptive measure.

Output 1.1.1: Simple Statistics

  Measures of Association for a Physical Fitness Study

  The CORR Procedure

  3 Variables:    Weight    Oxygen    RunTime

  Simple Statistics

  Variable     N        Mean     Std Dev      Median     Minimum     Maximum
  Weight      31    77.44452     8.32857    77.45000    59.08000    91.63000
  Oxygen      29    47.22721     5.47718    46.67200    37.38800    60.05500
  RunTime     29    10.67414     1.39194    10.50000     8.17000    14.03000

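The Pearson correlation is a parametric measure of association for two continuous random variables. When data are missing, the number of observations used to calculate the correlation can vary from one pair of variables to another. For example, the correlation between Oxygen and RunTime in Output 1.1.2 is based on 28 observations, while each of these variables individually has 29 nonmissing values.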
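The pairwise treatment of missing values can be illustrated outside SAS. The following Python function is a minimal sketch, not PROC CORR's implementation. It assumes that missing values are represented by `None`, computes the Pearson correlation from only the observations in which both variables are present, and returns the number of pairs used alongside r. The function name `pearson_pairwise` is hypothetical.

```python
import math


def pearson_pairwise(x, y):
    """Pearson correlation of x and y using only complete pairs.

    None marks a missing value. Returns (r, n), where n is the number of
    pairs in which both values are present.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Missing values in different positions leave only three usable pairs.
r, n = pearson_pairwise([1, 2, 3, None, 5], [2, 4, 6, 8, None])
print(r, n)  # prints 1.0 3
```

Each variable has four nonmissing values here, but only three observations are complete for the pair. This mirrors how the N reported for a pair of variables can be smaller than the N of either variable alone.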

In Output 1.1.2, the Pearson correlation between RunTime and Oxygen is -0.86843, which is significant with a p-value less than 0.0001. This indicates a strong negative linear relationship between these two variables: as RunTime increases, Oxygen decreases linearly.

Output 1.1.2: Pearson Correlation Coefficients

  Measures of Association for a Physical Fitness Study

  Pearson Correlation Coefficients
  Prob > |r| under H0: Rho=0
  Number of Observations

                  Weight      Oxygen     RunTime

  Weight         1.00000    -0.15358     0.20072
                              0.4264      0.2965
                      31          29          29

  Oxygen        -0.15358     1.00000    -0.86843
                  0.4264                  <.0001
                      29          29          28

  RunTime        0.20072    -0.86843     1.00000
                  0.2965      <.0001
                      29          28          29

The Spearman rank-order correlation is a nonparametric measure of association based on the ranks of the data values. The Spearman Correlation Coefficients table shown in Output 1.1.3 displays results similar to those of the Pearson Correlation Coefficients table.
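The Spearman coefficient is the Pearson correlation applied to the ranks of the data. Tied values receive the average of the ranks they span. The Python sketch below makes that relationship explicit. It is an illustration for complete data, not the procedure's code, and the helper names `average_ranks`, `pearson`, and `spearman` are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # mean of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation for complete data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotone but nonlinear relationship has Spearman correlation 1.
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # prints 1.0
```

Because the correlation is computed on ranks, any strictly increasing transformation of either variable leaves the Spearman coefficient unchanged. The Pearson coefficient does not have this property.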

Output 1.1.3: Spearman Correlation Coefficients

  Measures of Association for a Physical Fitness Study

  Spearman Correlation Coefficients
  Prob > |r| under H0: Rho=0
  Number of Observations

                  Weight      Oxygen     RunTime

  Weight         1.00000    -0.06824     0.13749
                              0.7250      0.4769
                      31          29          29

  Oxygen        -0.06824     1.00000    -0.80131
                  0.7250                  <.0001
                      29          29          28

  RunTime        0.13749    -0.80131     1.00000
                  0.4769      <.0001
                      29          28          29

Kendall's tau-b is a nonparametric measure of association based on the number of concordances and discordances in paired observations. The Kendall Tau-b Correlation Coefficients table shown in Output 1.1.4 displays results similar to those of the Pearson Correlation Coefficients table in Output 1.1.2.
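The following Python function sketches how tau-b combines concordances, discordances, and ties. For every pair of observations, it multiplies the signs of the x and y differences. This gives +1 for a concordant pair, -1 for a discordant pair, and 0 for a tie. It then divides the total by the geometric mean of the number of pairs not tied in x and the number not tied in y. This is an O(n²) illustration for complete data, not PROC CORR's algorithm, and `kendall_tau_b` is a hypothetical name.

```python
import math


def _sign(v):
    return (v > 0) - (v < 0)


def kendall_tau_b(x, y):
    """Kendall's tau-b for complete data, with the usual tie correction.

    numerator   = (#concordant pairs) - (#discordant pairs)
    denominator = sqrt((pairs not tied in x) * (pairs not tied in y))
    """
    n = len(x)
    s = 0
    untied_x = 0
    untied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = _sign(x[i] - x[j])
            dy = _sign(y[i] - y[j])
            s += dx * dy            # +1 concordant, -1 discordant, 0 tied
            untied_x += dx != 0
            untied_y += dy != 0
    return s / math.sqrt(untied_x * untied_y)


# One tie in x: 5 concordant pairs, 5 pairs untied in x, 6 pairs untied in y.
print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 4]))  # equals 5 / sqrt(30)
```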

Output 1.1.4: Kendall's Tau-b Correlation Coefficients

  Measures of Association for a Physical Fitness Study

  Kendall Tau b Correlation Coefficients
  Prob > |r| under H0: Rho=0
  Number of Observations

                  Weight      Oxygen     RunTime

  Weight         1.00000    -0.00988     0.06675
                              0.9402      0.6123
                      31          29          29

  Oxygen        -0.00988     1.00000    -0.62434
                  0.9402                  <.0001
                      29          29          28

  RunTime        0.06675    -0.62434     1.00000
                  0.6123      <.0001
                      29          28          29

Hoeffding's measure of dependence, D, is a nonparametric measure of association that detects more general departures from independence. Without ties in the variables, the D statistic ranges from -0.5 to 1, with 1 indicating complete dependence. With ties, D can take a smaller value. Because ties occur in the variable Weight, the D statistic for Weight with itself is 0.97690, which is less than 1, as shown in the Hoeffding Dependence Coefficients table in Output 1.1.5.
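The following Python sketch computes Hoeffding's D from ranks using the standard formula, scaled by 30 so that values fall between -0.5 and 1. It assumes complete data with no ties, so it deliberately omits the tie adjustments that make the Weight diagonal fall below 1 in Output 1.1.5. It is an illustration under that assumption, not PROC CORR's code, and `hoeffding_d` is a hypothetical name.

```python
def hoeffding_d(x, y):
    """Hoeffding's D, scaled by 30, for n >= 5 observations with no ties.

    D = 30 * ((n-2)(n-3)D1 + D2 - 2(n-2)D3) / (n(n-1)(n-2)(n-3)(n-4))
    """
    n = len(x)
    if n < 5 or len(y) != n:
        raise ValueError("need two sequences of equal length n >= 5")
    if len(set(x)) < n or len(set(y)) < n:
        raise ValueError("this sketch does not handle tied values")
    # R, S: ranks of x and y. Q: 1 + number of points with both coordinates smaller.
    R = [1 + sum(xj < xi for xj in x) for xi in x]
    S = [1 + sum(yj < yi for yj in y) for yi in y]
    Q = [1 + sum(x[j] < x[i] and y[j] < y[i] for j in range(n)) for i in range(n)]
    D1 = sum((q - 1) * (q - 2) for q in Q)
    D2 = sum((r - 1) * (r - 2) * (s - 1) * (s - 2) for r, s in zip(R, S))
    D3 = sum((r - 2) * (s - 2) * (q - 1) for r, s, q in zip(R, S, Q))
    num = (n - 2) * (n - 3) * D1 + D2 - 2 * (n - 2) * D3
    den = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30 * num / den


# Perfect dependence in either direction gives D = 1.
print(hoeffding_d([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # prints 1.0
print(hoeffding_d([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # prints 1.0
```

The second call shows that D, unlike a correlation coefficient, does not distinguish the direction of a dependence. It measures only how far the joint ranks depart from independence.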

Output 1.1.5: Hoeffding's Dependence Coefficients

  Measures of Association for a Physical Fitness Study

  Hoeffding Dependence Coefficients
  Prob > D under H0: D=0
  Number of Observations

                  Weight      Oxygen     RunTime

  Weight         0.97690    -0.00497    -0.02355
                  <.0001      0.5101      1.0000
                      31          29          29

  Oxygen        -0.00497     1.00000     0.23449
                  0.5101                  <.0001
                      29          29          28

  RunTime       -0.02355     0.23449     1.00000
                  1.0000      <.0001
                      29          28          29

The experimental PLOTS option requests a symmetric scatter plot matrix for the analysis variables listed in the VAR statement. The strong negative linear relationship between Oxygen and RunTime is evident in Output 1.1.6.

Output 1.1.6: Symmetric Scatter Plot Matrix (Experimental)

This display is requested by specifying both the ODS GRAPHICS statement and the PLOTS option. For general information about ODS Graphics, refer to Chapter 15, "Statistical Graphics Using ODS" (SAS/STAT User's Guide). For specific information about the graphics available in the CORR procedure, see the section "ODS Graphics" on page 31.

Example 1.2. Computing Correlations between Two Sets of Variables

The following statements create a data set that contains measurements of four iris parts from Fisher's iris data (1936): sepal length, sepal width, petal length, and petal width. Each observation represents one specimen.

  *------------------- Data on Iris Setosa --------------------*
    The data set contains 50 iris specimens from the species
    Iris Setosa with the following four measurements:
    SepalLength (sepal length)
    SepalWidth  (sepal width)
    PetalLength (petal length)
    PetalWidth  (petal width)
    Certain values were changed to missing for the analysis.
  *------------------------------------------------------------*;
  data Setosa;
     input SepalLength SepalWidth PetalLength PetalWidth @@;
     label sepallength='Sepal Length in mm.'
           sepalwidth='Sepal Width in mm.'
           petallength='Petal Length in mm.'
           petalwidth='Petal Width in mm.';
     datalines;
  50 33 14 02  46 34 14 03  46 36 .  02
  51 33 17 05  55 35 13 02  48 31 16 02
  52 34 14 02  49 36 14 01  44 32 13 02
  50 35 16 06  44 30 13 02  47 32 16 02
  48 30 14 03  51 38 16 02  48 34 19 02
  50 30 16 02  50 32 12 02  43 30 11 .
  58 40 12 02  51 38 19 04  49 30 14 02
  51 35 14 02  50 34 16 04  46 32 14 02
  57 44 15 04  50 36 14 02  54 34 15 04
  52 41 15 .   55 42 14 02  49 31 15 02
  54 39 17 04  50 34 15 02  44 29 14 02
  47 32 13 02  46 31 15 02  51 34 15 02
  50 35 13 03  49 31 15 01  54 37 15 02
  54 39 13 04  51 35 14 03  48 34 16 02
  48 30 14 01  45 23 13 03  57 38 17 03
  51 38 15 03  54 34 17 02  51 37 15 04
  52 35 15 02  53 37 15 02
  ;

The following statements request a correlation analysis between two sets of variables, the sepal measurements and the petal measurements.

  ods html;
  ods graphics on;

  title 'Fisher (1936) Iris Setosa Data';
  proc corr data=Setosa sscp cov plots;
     var  sepallength sepalwidth;
     with petallength petalwidth;
  run;

  ods graphics off;
  ods html close;

The CORR procedure displays univariate statistics for variables in the VAR and WITH statements.

Output 1.2.1: Simple Statistics

  Fisher (1936) Iris Setosa Data

  The CORR Procedure

  2 With Variables:    PetalLength    PetalWidth
  2      Variables:    SepalLength    SepalWidth

  Simple Statistics

  Variable          N        Mean     Std Dev          Sum
  PetalLength      49    14.71429     1.62019    721.00000
  PetalWidth       48     2.52083     1.03121    121.00000
  SepalLength      50    50.06000     3.52490         2503
  SepalWidth       50    34.28000     3.79064         1714

  Simple Statistics

  Variable          Minimum     Maximum    Label
  PetalLength      11.00000    19.00000    Petal Length in mm.
  PetalWidth        1.00000     6.00000    Petal Width in mm.
  SepalLength      43.00000    58.00000    Sepal Length in mm.
  SepalWidth       23.00000    44.00000    Sepal Width in mm.

When the WITH statement is specified together with the VAR statement, the CORR procedure produces rectangular matrices for statistics such as covariances and correlations. The matrix rows correspond to the WITH variables (PetalLength and PetalWidth), while the matrix columns correspond to the VAR variables (SepalLength and SepalWidth). The CORR procedure uses the WITH variable labels to label the matrix rows.

The SSCP option requests a table of the uncorrected sum-of-squares and crossproducts matrix, and the COV option requests a table of the covariance matrix. The SSCP and COV options also produce a table of the Pearson correlations.

The sum-of-squares and crossproducts statistics for each pair of variables are computed by using observations with nonmissing row and column variable values. The Sums of Squares and Crossproducts table shown in Output 1.2.2 displays the crossproduct, sum of squares for the row variable, and sum of squares for the column variable for each pair of variables.
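The following minimal Python sketch applies the pairwise rule to one cell of the table. It assumes that `None` marks a missing value and returns the three numbers stacked in each cell: the crossproduct, the row-variable sum of squares, and the column-variable sum of squares. The function name `sscp_pair` is hypothetical. The example uses the first three Setosa observations. The third PetalLength value is missing, so only two observations contribute.

```python
def sscp_pair(row, col):
    """Uncorrected crossproduct and sums of squares for one (row, column)
    pair of variables, using only observations where both are nonmissing.

    None marks a missing value. Returns (crossproduct, row SS, column SS).
    """
    pairs = [(a, b) for a, b in zip(row, col) if a is not None and b is not None]
    cross = sum(a * b for a, b in pairs)
    row_ss = sum(a * a for a, _ in pairs)
    col_ss = sum(b * b for _, b in pairs)
    return cross, row_ss, col_ss


# First three Setosa observations: PetalLength (row) and SepalLength (column).
print(sscp_pair([14, 14, None], [50, 46, 46]))  # prints (1344, 392, 4616)
```

The statistics are uncorrected, meaning no means are subtracted. This is why the column sum of squares for SepalLength (123793 in the PetalLength row) is so much larger than its variance.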

Output 1.2.2: Sum-of-squares and Crossproducts

  Fisher (1936) Iris Setosa Data

  Sums of Squares and Crossproducts
  SSCP / Row Var SS / Col Var SS

                            SepalLength      SepalWidth
  PetalLength               36214.00000     24756.00000
  Petal Length in mm.       10735.00000     10735.00000
                            123793.0000      58164.0000
  PetalWidth                 6113.00000      4191.00000
  Petal Width in mm.          355.00000       355.00000
                            121356.0000      56879.0000

The variances are computed by using observations with nonmissing row and column variable values. The Variances and Covariances table shown in Output 1.2.3 displays the covariance, variance for the row variable, variance for the column variable, and the associated degrees of freedom for each pair of variables.
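The same pairwise rule explains why Output 1.2.3 reports different variances for SepalLength in different rows: 12.33333333 in the PetalLength row (48 DF) and 11.80141844 in the PetalWidth row (47 DF). The following Python sketch is illustrative only. It assumes that `None` marks a missing value and uses the degrees of freedom n − 1 as the divisor. The name `cov_pair` is hypothetical.

```python
def cov_pair(row, col):
    """Covariance, row variance, column variance, and DF for one pair of
    variables, all computed from the observations where both are present."""
    pairs = [(a, b) for a, b in zip(row, col) if a is not None and b is not None]
    n = len(pairs)
    df = n - 1
    mean_r = sum(a for a, _ in pairs) / n
    mean_c = sum(b for _, b in pairs) / n
    cov = sum((a - mean_r) * (b - mean_c) for a, b in pairs) / df
    var_r = sum((a - mean_r) ** 2 for a, _ in pairs) / df
    var_c = sum((b - mean_c) ** 2 for _, b in pairs) / df
    return cov, var_r, var_c, df


# The last column value is dropped because its row value is missing.
print(cov_pair([1, 2, 3, None], [2, 4, 6, 8]))  # prints (2.0, 1.0, 4.0, 2)
```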

Output 1.2.3: Variances and Covariances

  Fisher (1936) Iris Setosa Data

  Variances and Covariances
  Covariance / Row Var Variance / Col Var Variance / DF

                            SepalLength      SepalWidth
  PetalLength               1.270833333     1.363095238
  Petal Length in mm.       2.625000000     2.625000000
                            12.33333333     14.60544218
                                     48              48
  PetalWidth                0.911347518     1.048315603
  Petal Width in mm.        1.063386525     1.063386525
                            11.80141844     13.62721631
                                     47              47

When there are missing values in the analysis variables, the Pearson Correlation Coefficients table shown in Output 1.2.4 displays the correlation, the p-value under the null hypothesis of zero correlation, and the number of observations for each pair of variables. All four correlations are weakly positive. The correlations of PetalWidth with SepalLength and with SepalWidth are the largest, with p-values (0.0775 and 0.0582) just above 0.05.

Output 1.2.4: Pearson Correlation Coefficients

  Fisher (1936) Iris Setosa Data

  Pearson Correlation Coefficients
  Prob > |r| under H0: Rho=0
  Number of Observations

                                  Sepal           Sepal
                                 Length           Width
  PetalLength                   0.22335         0.22014
  Petal Length in mm.            0.1229          0.1285
                                     49              49
  PetalWidth                    0.25726         0.27539
  Petal Width in mm.             0.0775          0.0582
                                     48              48

The experimental PLOTS option displays a rectangular scatter plot matrix for the two sets of variables. The VAR variables SepalLength and SepalWidth are listed across the top of the matrix, and the WITH variables PetalLength and PetalWidth are listed down the side of the matrix. As measured in Output 1.2.4, the plot for PetalWidth and SepalLength and the plot for PetalWidth and SepalWidth show slight positive correlations.

Output 1.2.5: Rectangular Matrix Plot (Experimental)

Example 1.3. Analysis Using Fisher's z Transformation

The following statements request Pearson correlation statistics using Fisher's z transformation for the data set Fitness.

  proc corr data=Fitness nosimple fisher;
     var weight oxygen runtime;
  run;

The NOSIMPLE option suppresses the table of descriptive statistics. The Pearson Correlation Coefficients table is displayed by default.

Output 1.3.1: Sample Correlations

  Fisher (1936) Iris Setosa Data

  The CORR Procedure

  Pearson Correlation Coefficients
  Prob > |r| under H0: Rho=0
  Number of Observations

                  Weight      Oxygen     RunTime

  Weight         1.00000    -0.15358     0.20072
                              0.4264      0.2965
                      31          29          29

  Oxygen        -0.15358     1.00000    -0.86843
                  0.4264                  <.0001
                      29          29          28

  RunTime        0.20072    -0.86843     1.00000
                  0.2965      <.0001
                      29          28          29

The FISHER option requests correlation statistics using Fisher's z transformation, which are shown in Output 1.3.2.

Output 1.3.2: Correlation Statistics Using Fisher's z Transformation

  Pearson Correlation Statistics (Fisher's z Transformation)

                 With               Sample                    Bias  Correlation
  Variable   Variable       N  Correlation  Fisher's z  Adjustment     Estimate
  Weight     Oxygen        29     -0.15358    -0.15480    -0.00274     -0.15090
  Weight     RunTime       29      0.20072     0.20348     0.00358      0.19727
  Oxygen     RunTime       28     -0.86843    -1.32665    -0.01608     -0.86442

  Pearson Correlation Statistics (Fisher's z Transformation)

                 With                                p Value for
  Variable   Variable     95% Confidence Limits         H0:Rho=0
  Weight     Oxygen       -0.490289     0.228229          0.4299
  Weight     RunTime      -0.182422     0.525765          0.2995
  Oxygen     RunTime      -0.935728    -0.725221          <.0001

See the section "Fisher's z Transformation" on page 21 for details on Fisher's z transformation.
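The columns of Output 1.3.2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PROC CORR's code. The bias adjustment r/(2(n − 1)) is subtracted from z before the correlation estimate and the confidence limits are formed, and the limits are computed on the z scale and mapped back with tanh. The p-value uses the unadjusted z. For a nonzero null value ρ₀, the sketch assumes the null value is shifted by the analogous term ρ₀/(2(n − 1)). The function name `fisher_z_stats` is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_stats(r, n, rho0=0.0, alpha=0.05):
    """Two-sided Fisher z statistics for a sample correlation r based on n pairs."""
    z = math.atanh(r)                   # Fisher's z = 0.5*log((1+r)/(1-r))
    bias = r / (2 * (n - 1))            # bias adjustment
    z_adj = z - bias
    estimate = math.tanh(z_adj)         # bias-adjusted correlation estimate
    se = 1 / math.sqrt(n - 3)           # approximate standard error of z
    q = NormalDist().inv_cdf(1 - alpha / 2)
    lower = math.tanh(z_adj - q * se)   # limits formed on the z scale,
    upper = math.tanh(z_adj + q * se)   # then transformed back with tanh
    # Assumption: the null value is shifted by rho0/(2(n-1)); zero when rho0 = 0.
    z0 = math.atanh(rho0) + rho0 / (2 * (n - 1))
    stat = (z - z0) / se
    p = 2 * (1 - NormalDist().cdf(abs(stat)))
    return {"z": z, "bias": bias, "estimate": estimate,
            "lower": lower, "upper": upper, "p": p}


# Oxygen with RunTime (n = 28): compare with the last row of Output 1.3.2.
print(fisher_z_stats(-0.86843, 28))
```

With r = -0.86843 and n = 28, the values agree with the Oxygen and RunTime row up to rounding of the printed correlation. With r = -0.15358 and n = 29, the sketch gives a p-value of about 0.4299, matching the Weight and Oxygen row.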

The following statements request one-sided hypothesis tests and confidence limits for the correlations using Fisher's z transformation.

  proc corr data=Fitness nosimple nocorr fisher (type=lower);
     var weight oxygen runtime;
  run;

The NOSIMPLE option suppresses the Simple Statistics table, and the NOCORR option suppresses the Pearson Correlation Coefficients table.

Output 1.3.3: One-sided Correlation Analysis Using Fisher's z Transformation

  The CORR Procedure

  Pearson Correlation Statistics (Fisher's z Transformation)

                 With               Sample                    Bias  Correlation
  Variable   Variable       N  Correlation  Fisher's z  Adjustment     Estimate
  Weight     Oxygen        29     -0.15358    -0.15480    -0.00274     -0.15090
  Weight     RunTime       29      0.20072     0.20348     0.00358      0.19727
  Oxygen     RunTime       28     -0.86843    -1.32665    -0.01608     -0.86442

  Pearson Correlation Statistics (Fisher's z Transformation)

                 With                  p Value for
  Variable   Variable  Lower 95% CL      H0:Rho<=0
  Weight     Oxygen       -0.441943         0.7850
  Weight     RunTime      -0.122077         0.1497
  Oxygen     RunTime      -0.927408         1.0000

The TYPE=LOWER option requests a lower confidence limit and a p-value for the test of the one-sided hypothesis H₀: ρ ≤ 0 against the alternative hypothesis H₁: ρ > 0. Here Fisher's z, the bias adjustment, and the estimate of the correlation are the same as for the two-sided alternative. However, because TYPE=LOWER is specified, only a lower confidence limit is computed for each correlation, and one-sided p-values are computed.
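The following Python sketch shows the two quantities that change under TYPE=LOWER. The lower limit uses the one-sided normal quantile (about 1.645 for 95%) on the bias-adjusted z scale. The p-value is the upper-tail normal probability of √(n − 3)·z. This is an illustration only, not the procedure's code, and `fisher_lower` is a hypothetical name.

```python
import math
from statistics import NormalDist


def fisher_lower(r, n, alpha=0.05):
    """Lower confidence limit and p-value for H0: rho <= 0 vs H1: rho > 0."""
    z = math.atanh(r)
    z_adj = z - r / (2 * (n - 1))        # same bias adjustment as two-sided case
    se = 1 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(1 - alpha)  # one-sided quantile, about 1.645
    lower = math.tanh(z_adj - q * se)
    p = 1 - NormalDist().cdf(z / se)     # upper-tail probability
    return lower, p


# Weight with Oxygen (n = 29): compare with the first row of Output 1.3.3.
print(fisher_lower(-0.15358, 29))
```

For the negative sample correlation between Weight and Oxygen, the upper-tail p-value is large (about 0.785). A one-sided test against ρ > 0 cannot find evidence of positive correlation when the observed correlation is negative.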

Example 1.4. Applications of Fisher's z Transformation

This example illustrates some applications of Fisher's z transformation. For details, see the section "Fisher's z Transformation" on page 21.

The following statements simulate independent samples of variables X and Y from a bivariate normal distribution. The first batch of 150 observations is sampled using a known correlation of 0.3, the second batch of 150 observations is sampled using a known correlation of 0.25, and the third batch of 100 observations is sampled using a known correlation of 0.3.

  data Sim (drop=i);
     do i=1 to 400;
        X = rannor(135791);
        Batch = 1 + (i>150) + (i>300);
        if Batch = 1 then Y = 0.3*X + 0.9*rannor(246791);
        if Batch = 2 then Y = 0.25*X + sqrt(.8375)*rannor(246791);
        if Batch = 3 then Y = 0.3*X + 0.9*rannor(246791);
        output;
     end;
  run;

This data set is used to illustrate the following applications of Fisher's z transformation:

Testing whether a population correlation is equal to a given value
Testing for equality of two population correlations
Combining correlation estimates from different samples

See the section "Fisher's z Transformation" on page 21.

Testing Whether a Population Correlation Is Equal to a Given Value

You can use the following statements to test the null hypothesis H₀: ρ = 0.5 against a two-sided alternative H₁: ρ ≠ 0.5.

  ods select FisherPearsonCorr;
  title 'Analysis for Batch 1';
  proc corr data=Sim (where=(Batch=1)) fisher(rho0=.5);
     var X Y;
  run;

The test is requested with the option FISHER(RHO0=0.5). The results, which are based on Fisher's z transformation, are shown in Output 1.4.1.

Output 1.4.1: Fisher's Test for H₀: ρ = ρ₀

  Analysis for Batch 1

  The CORR Procedure

  Pearson Correlation Statistics (Fisher's z Transformation)

                 With               Sample                    Bias  Correlation
  Variable   Variable       N  Correlation  Fisher's z  Adjustment     Estimate
  X          Y            150      0.22081     0.22451   0.0007410      0.22011

  Pearson Correlation Statistics (Fisher's z Transformation)

                 With                              ------H0:Rho=Rho0-----
  Variable   Variable     95% Confidence Limits          Rho0     p Value
  X          Y             0.062034     0.367409      0.50000      <.0001

The null hypothesis is rejected because the p-value is less than 0.0001.

Testing for Equality of Two Population Correlations

You can use the following statements to test for equality of two population correlations, ρ₁ and ρ₂. Here, the null hypothesis H₀: ρ₁ = ρ₂ is tested against the alternative H₁: ρ₁ ≠ ρ₂.

  ods select FisherPearsonCorr;
  ods output FisherPearsonCorr=SimCorr;
  title 'Testing Equality of Population Correlations';
  proc corr data=Sim (where=(Batch=1 or Batch=2)) fisher;
     var X Y;
     by Batch;
  run;

The ODS SELECT statement restricts the output from PROC CORR to the FisherPearsonCorr table, which is shown in Output 1.4.2; see the section "ODS Table Names" on page 30. The output data set SimCorr contains Fisher's z statistics for both batches.

Output 1.4.2: Fisher's Correlation Statistics

  Testing Equality of Population Correlations

  --------------------------------- Batch=1 ------------------------------------

  The CORR Procedure

  Pearson Correlation Statistics (Fisher's z Transformation)

                 With               Sample                    Bias  Correlation
  Variable   Variable       N  Correlation  Fisher's z  Adjustment     Estimate
  X          Y            150      0.22081     0.22451   0.0007410      0.22011

  Pearson Correlation Statistics (Fisher's z Transformation)

                 With                                p Value for
  Variable   Variable     95% Confidence Limits         H0:Rho=0
  X          Y             0.062034     0.367409          0.0065

  Testing Equality of Population Correlations

  --------------------------------- Batch=2 ------------------------------------

  The CORR Procedure

  Pearson Correlation Statistics (Fisher's z Transformation)

                 With               Sample                    Bias  Correlation
  Variable   Variable       N  Correlation  Fisher's z  Adjustment     Estimate
  X          Y            150      0.33694     0.35064     0.00113      0.33594

  Pearson Correlation Statistics (Fisher's z Transformation)

                 With                                p Value for
  Variable   Variable     95% Confidence Limits         H0:Rho=0
  X          Y             0.185676     0.470853          <.0001

The p-value for testing H₀ is derived by treating the difference z₁ − z₂ as a normal random variable with mean zero and variance 1/(n₁ − 3) + 1/(n₂ − 3). Here z₁ and z₂ are Fisher's z transformations of the sample correlations r₁ and r₂, respectively, and n₁ and n₂ are the corresponding sample sizes.

The following statements compute the p-value shown in Output 1.4.3.

  data SimTest (drop=Batch);
     merge SimCorr (where=(Batch=1) keep=Nobs ZVal Batch
                    rename=(Nobs=n1 ZVal=z1))
           SimCorr (where=(Batch=2) keep=Nobs ZVal Batch
                    rename=(Nobs=n2 ZVal=z2));
     variance = 1/(n1-3) + 1/(n2-3);
     z = (z1 - z2) / sqrt(variance);
     /* two-sided p-value from the standard normal distribution */
     pval = probnorm(z);
     if (pval > 0.5) then pval = 1 - pval;
     pval = 2*pval;
  run;

  proc print data=SimTest noobs;
  run;

Output 1.4.3: Test of Equality of Observed Correlations

   n1         z1     n2         z2    variance          z       pval
  150    0.22451    150    0.35064    0.013605   -1.08135    0.27954

In Output 1.4.3, the p-value of 0.2795 does not provide evidence to reject the null hypothesis that ρ₁ = ρ₂. The sample sizes n₁ = 150 and n₂ = 150 are not large enough to detect the difference ρ₁ − ρ₂ = 0.05 at a significance level of α = 0.05.

Combining Correlation Estimates from Different Samples

Assume that sample correlations r₁ and r₂ are computed from two independent samples of n₁ and n₂ observations, respectively. A combined correlation estimate is given by r = tanh(z), where z is the weighted average of the z transformations of r₁ and r₂:

$$ z = \frac{(n_1 - 3)\, z_1 + (n_2 - 3)\, z_2}{n_1 + n_2 - 6} $$

The following statements compute a combined estimate of ρ using Batch 1 and Batch 3:

  ods output FisherPearsonCorr=SimCorr2;
  proc corr data=Sim (where=(Batch=1 or Batch=3)) fisher noprint;
     var X Y;
     by Batch;
  run;

  data SimComb (drop=Batch);
     merge SimCorr2 (where=(Batch=1) keep=Nobs ZVal Batch
                     rename=(Nobs=n1 ZVal=z1))
           SimCorr2 (where=(Batch=3) keep=Nobs ZVal Batch
                     rename=(Nobs=n2 ZVal=z2));
     /* weighted average of z1 and z2 with weights n-3 */
     z = ((n1-3)*z1 + (n2-3)*z2) / (n1+n2-6);
     corr = tanh(z);
     /* variance of the combined z */
     var = 1/(n1+n2-6);
     lcl = corr - probit(0.975)*sqrt(var);
     ucl = corr + probit(0.975)*sqrt(var);
  run;

  proc print data=SimComb noobs;
     var n1 z1 n2 z2 corr lcl ucl;
  run;

Output 1.4.4 displays the combined estimate of ρ.

Output 1.4.4: Combined Correlation Estimate

   n1         z1     n2         z2       corr        lcl        ucl
  150    0.22451    100    0.23929    0.22640    0.10092    0.35187

Thus, a correlation estimate from the combined samples is r = 0.23. The 95% confidence interval displayed in Output 1.4.4 is (0.10, 0.35), computed by using the variance of the combined estimate. Note that this interval contains the population correlation 0.3. See the section "Applications of Fisher's z Transformation" on page 23.

Example 1.5. Computing Cronbach's Coefficient Alpha

The following statements create the data set Fish1 from the Fish data set used in Chapter 67, "The STEPDISC Procedure." The cube root of the weight (Weight3) is computed as a one-dimensional measure of the size of a fish.

  *------------------- Fish Measurement Data ----------------------*
    The data set contains 35 fish from the species Bream caught in
    Finlands lake Laengelmavesi with the following measurements:
    Weight   (in grams)
    Length3  (length from the nose to the end of its tail, in cm)
    HtPct    (max height, as percentage of Length3)
    WidthPct (max width,  as percentage of Length3)
  *----------------------------------------------------------------*;
  data Fish1 (drop=HtPct WidthPct);
     title 'Fish Measurement Data';
     input Weight Length3 HtPct WidthPct @@;
     Weight3= Weight**(1/3);
     Height=HtPct*Length3/100;
     Width=WidthPct*Length3/100;
     datalines;
  242.0 30.0 38.4 13.4     290.0 31.2 40.0 13.8
  340.0 31.1 39.8 15.1     363.0 33.5 38.0 13.3
  430.0 34.0 36.6 15.1     450.0 34.7 39.2 14.2
  500.0 34.5 41.1 15.3     390.0 35.0 36.2 13.4
  450.0 35.1 39.9 13.8     500.0 36.2 39.3 13.7
  475.0 36.2 39.4 14.1     500.0 36.2 39.7 13.3
  500.0 36.4 37.8 12.0        .  37.3 37.3 13.6
  600.0 37.2 40.2 13.9     600.0 37.2 41.5 15.0
  700.0 38.3 38.8 13.8     700.0 38.5 38.8 13.5
  610.0 38.6 40.5 13.3     650.0 38.7 37.4 14.8
  575.0 39.5 38.3 14.1     685.0 39.2 40.8 13.7
  620.0 39.7 39.1 13.3     680.0 40.6 38.1 15.1
  700.0 40.5 40.1 13.8     725.0 40.9 40.0 14.8
  720.0 40.6 40.3 15.0     714.0 41.5 39.8 14.1
  850.0 41.6 40.6 14.9    1000.0 42.6 44.5 15.5
  920.0 44.1 40.9 14.3     955.0 44.0 41.1 14.3
  925.0 45.3 41.4 14.9     975.0 45.9 40.6 14.7
  950.0 46.5 37.9 13.7
  ;

The following statements request a correlation analysis and compute Cronbach's coefficient alpha for the variables Weight3, Length3, Height, and Width.

  ods html;
  ods graphics on;

  title 'Fish Measurement Data';
  proc corr data=fish1 nomiss alpha plots;
     var Weight3 Length3 Height Width;
  run;

  ods graphics off;
  ods html close;

The NOMISS option excludes observations with missing values, and the PLOTS option requests a symmetric scatter plot matrix for the analysis variables.

By default, the CORR procedure displays descriptive statistics for each variable, as shown in Output 1.5.1.

Output 1.5.1: Simple Statistics

  Fish Measurement Data

  The CORR Procedure

  4 Variables:    Weight3    Length3    Height    Width

  Simple Statistics

  Variable     N        Mean     Std Dev         Sum     Minimum     Maximum
  Weight3     34     8.44751     0.97574   287.21524     6.23168    10.00000
  Length3     34    38.38529     4.21628        1305    30.00000    46.50000
  Height      34    15.22057     1.98159   517.49950    11.52000    18.95700
  Width       34     5.43805     0.72967   184.89370     4.02000     6.74970

Since the NOMISS option is specified, the same set of 34 observations is used to compute the correlation for each pair of variables. The correlations are shown in Output 1.5.2.

Output 1.5.2: Pearson Correlation Coefficients

  Fish Measurement Data

  Pearson Correlation Coefficients, N = 34
  Prob > |r| under H0: Rho=0

                 Weight3     Length3      Height       Width

  Weight3        1.00000     0.96523     0.96261     0.92789
                              <.0001      <.0001      <.0001

  Length3        0.96523     1.00000     0.95492     0.92171
                  <.0001                  <.0001      <.0001

  Height         0.96261     0.95492     1.00000     0.92632
                  <.0001      <.0001                  <.0001

  Width          0.92789     0.92171     0.92632     1.00000
                  <.0001      <.0001      <.0001

Because the data set contains only one species of fish, all the variables are highly correlated. This is evident in the scatter plot matrix for the analysis variables, which is shown in Output 1.7.3 (created in Example 1.7).

Output 1.7.3: Scatter Plot Matrix (Experimental)

Positive correlations among the variables are needed for the alpha coefficient because the variables are assumed to measure a common entity.

With the ALPHA option, the CORR procedure computes Cronbach's coefficient alpha, which is a lower bound for the reliability coefficient, for both the raw variables and the standardized variables.
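Both alpha values come from one formula: α = k/(k − 1)·(1 − Σ item variances / variance of the total score). Applied to the covariance matrix, it gives the raw alpha. Applied to the correlation matrix, it gives the standardized alpha, which reduces to k·r̄/(1 + (k − 1)·r̄), where r̄ is the mean correlation between distinct variables. The following Python sketch is illustrative; `cronbach_alpha` is a hypothetical name. It uses the correlations in Output 1.5.2 and the standard deviations in Output 1.5.1, and the results agree with Output 1.5.3 up to rounding of those printed inputs.

```python
def cronbach_alpha(cov):
    """Cronbach's alpha from a k x k covariance matrix given as a list of lists.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)).
    Passing a correlation matrix yields the standardized alpha.
    """
    k = len(cov)
    item_var = sum(cov[i][i] for i in range(k))
    total_var = sum(sum(row) for row in cov)  # Var(sum) = sum of all entries
    return k / (k - 1) * (1 - item_var / total_var)


# Correlations from Output 1.5.2 and standard deviations from Output 1.5.1
# (order: Weight3, Length3, Height, Width).
corr = [[1.00000, 0.96523, 0.96261, 0.92789],
        [0.96523, 1.00000, 0.95492, 0.92171],
        [0.96261, 0.95492, 1.00000, 0.92632],
        [0.92789, 0.92171, 0.92632, 1.00000]]
sd = [0.97574, 4.21628, 1.98159, 0.72967]
cov = [[corr[i][j] * sd[i] * sd[j] for j in range(4)] for i in range(4)]

print("raw         ", round(cronbach_alpha(cov), 4))
print("standardized", round(cronbach_alpha(corr), 4))
```

The raw alpha is much lower because Length3, whose variance is far larger than that of the other variables, dominates the total-score variance.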

Output 1.5.3: Cronbach's Coefficient Alpha

  Fish Measurement Data

  Cronbach Coefficient Alpha

  Variables              Alpha
  ----------------------------
  Raw                 0.822134
  Standardized        0.985145

Because the variances of the variables differ widely, you should use the standardized alpha to estimate reliability. The overall standardized Cronbach's coefficient alpha of 0.985145 provides an acceptable lower bound for the reliability coefficient. This is much greater than the suggested value of 0.70 given by Nunnally and Bernstein (1994).

The standardized alpha coefficients computed with each variable deleted show how each variable contributes to the reliability of the scale with standardized variables. If the standardized alpha decreases after a variable is removed from the construct, then this variable is strongly correlated with the other variables in the scale. On the other hand, if the standardized alpha increases after a variable is removed, then removing this variable from the scale makes the construct more reliable. The Cronbach Coefficient Alpha with Deleted Variable table in Output 1.5.4 shows no substantial increase or decrease in the standardized alpha coefficients. See the section "Cronbach's Coefficient Alpha" on page 24 for more information regarding constructs and Cronbach's alpha.
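The standardized "alpha if deleted" values can be reproduced by recomputing k·r̄/(1 + (k − 1)·r̄) on the correlation matrix with one variable removed at a time. The following minimal Python sketch uses the correlations in Output 1.5.2. The function names `standardized_alpha` and `alpha_if_deleted` are hypothetical.

```python
def standardized_alpha(corr):
    """k * rbar / (1 + (k - 1) * rbar), rbar = mean off-diagonal correlation."""
    k = len(corr)
    off_diag = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    rbar = sum(off_diag) / len(off_diag)
    return k * rbar / (1 + (k - 1) * rbar)


def alpha_if_deleted(corr, names):
    """Standardized alpha recomputed with each variable left out in turn."""
    result = {}
    for drop, name in enumerate(names):
        keep = [i for i in range(len(names)) if i != drop]
        sub = [[corr[i][j] for j in keep] for i in keep]
        result[name] = standardized_alpha(sub)
    return result


names = ["Weight3", "Length3", "Height", "Width"]
corr = [[1.00000, 0.96523, 0.96261, 0.92789],   # from Output 1.5.2
        [0.96523, 1.00000, 0.95492, 0.92171],
        [0.96261, 0.95492, 1.00000, 0.92632],
        [0.92789, 0.92171, 0.92632, 1.00000]]
for name, value in alpha_if_deleted(corr, names).items():
    print(f"{name:8s} {value:.6f}")
```

For example, dropping Width leaves an average correlation of about 0.961 among the other three variables. The resulting standardized alpha, about 0.9866, matches the last row of Output 1.5.4 and is only slightly above the overall 0.985145.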

Output 1.5.4: Cronbach's Coefficient Alpha with Deleted Variables

  Fish Measurement Data

  Cronbach Coefficient Alpha with Deleted Variable

                      Raw Variables              Standardized Variables

  Deleted       Correlation                     Correlation
  Variable       with Total           Alpha      with Total           Alpha
  -------------------------------------------------------------------------
  Weight3          0.975379        0.783365        0.973464        0.977103
  Length3          0.967602        0.881987        0.967177        0.978783
  Height           0.964715        0.655098        0.968079        0.978542
  Width            0.934635        0.824069        0.937599        0.986626

Example 1.6. Saving Correlations in an Output Data Set

The following statements compute Pearson correlations and save them in an output data set.

  title 'Correlations for a Fitness and Exercise Study';
  proc corr data=Fitness nomiss nosimple outp=CorrOutp;
     var weight oxygen runtime;
  run;

The NOMISS option excludes observations with missing values of the VAR statement variables from the analysis. The NOSIMPLE option suppresses the display of descriptive statistics, and the OUTP= option creates an output data set named CorrOutp that contains the Pearson correlation statistics. Since the NOMISS option is specified, the same set of 28 observations is used to compute the correlation for each pair of variables.

Output 1.6.1: Pearson Correlation Coefficients

  Correlations for a Fitness and Exercise Study

  The CORR Procedure

  Pearson Correlation Coefficients, N = 28
  Prob > |r| under H0: Rho=0

                  Weight      Oxygen     RunTime

  Weight         1.00000    -0.18419     0.19505
                              0.3481      0.3199

  Oxygen        -0.18419     1.00000    -0.86843
                  0.3481                  <.0001

  RunTime        0.19505    -0.86843     1.00000
                  0.3199      <.0001

The following statements display the output data set, which is shown in Output 1.6.2.

  title 'Output Data Set from PROC CORR';
  proc print data=CorrOutp noobs;
  run;

Output 1.6.2: OUTP= Data Set with Pearson Correlations

  Output Data Set from PROC CORR

  _TYPE_    _NAME_       Weight      Oxygen     RunTime

  MEAN                  77.2168     47.1327     10.6954
  STD                    8.4495      5.5535      1.4127
  N                     28.0000     28.0000     28.0000
  CORR      Weight       1.0000     -0.1842      0.1950
  CORR      Oxygen      -0.1842      1.0000     -0.8684
  CORR      RunTime      0.1950     -0.8684      1.0000

The output data set has the default type CORR and can be used as an input data set for regression or other statistical procedures. For example, the following statements request a regression analysis from CorrOutp, so the REG procedure does not need to read the original data:

  title 'Input Type CORR Data Set from PROC REG';
  proc reg data=CorrOutp;
     model runtime= weight oxygen;
  run;

The preceding statements generate the same results as the following statements:

  proc reg data=Fitness nomiss;
     model runtime= weight oxygen;
  run;
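The least squares fit depends on the data only through the means, standard deviations, and correlations, which is why the TYPE=CORR data set is sufficient. The sketch below covers two predictors. It writes $W$, $O$, and $R$ for Weight, Oxygen, and RunTime (labels chosen here for brevity), uses $\beta$ for the standardized slopes and $s$ for standard deviations, and takes the rounded values from Output 1.6.2:

```latex
\beta_W = \frac{r_{WR} - r_{OR}\,r_{WO}}{1 - r_{WO}^{2}}
        = \frac{0.1950 - (-0.8684)(-0.1842)}{1 - 0.1842^{2}} \approx 0.036

\beta_O = \frac{r_{OR} - r_{WR}\,r_{WO}}{1 - r_{WO}^{2}}
        = \frac{-0.8684 - (0.1950)(-0.1842)}{1 - 0.1842^{2}} \approx -0.862

b_W = \beta_W\,\frac{s_R}{s_W}, \qquad b_O = \beta_O\,\frac{s_R}{s_O}, \qquad
b_0 = \bar{R} - b_W\,\bar{W} - b_O\,\bar{O}
```

For example, $b_O \approx -0.862 \times 1.4127 / 5.5535 \approx -0.219$. The MEAN, STD, and CORR rows of CorrOutp supply every quantity on the right-hand sides, so the raw observations are not needed to obtain the coefficients.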

Example 1.7. Creating Scatter Plots

The following statements request a correlation analysis and a scatter plot matrix for the variables in the data set Fish1, which was created in Example 1.5. This data set contains 35 observations, one of which contains a missing value for the variable Weight3.

  ods html;
  ods graphics on;

  title 'Fish Measurement Data';
  proc corr data=fish1 nomiss plots=matrix;
     var Height Width Length3 Weight3;
  run;

  ods graphics off;
  ods html close;

By default, the CORR procedure displays descriptive statistics for the VAR statement variables, which are shown in Output 1.7.1.

Output 1.7.1: Simple Statistics

  Fish Measurement Data

  The CORR Procedure

  4 Variables:    Height   Width    Length3    Weight3

                              Simple Statistics

  Variable      N        Mean     Std Dev         Sum     Minimum     Maximum
  Height       34    15.22057     1.98159   517.49950    11.52000    18.95700
  Width        34     5.43805     0.72967   184.89370     4.02000     6.74970
  Length3      34    38.38529     4.21628        1305    30.00000    46.50000
  Weight3      34     8.44751     0.97574   287.21524     6.23168    10.00000

Since the NOMISS option is specified, the same set of 34 observations is used to compute the correlation for each pair of variables. The correlations are shown in Output 1.7.2.

Output 1.7.2: Pearson Correlation Coefficients

  Fish Measurement Data

  Pearson Correlation Coefficients, N = 34
  Prob > |r| under H0: Rho=0

               Height       Width     Length3     Weight3
  Height      1.00000     0.92632     0.95492     0.96261
                           <.0001      <.0001      <.0001
  Width       0.92632     1.00000     0.92171     0.92789
               <.0001                  <.0001      <.0001
  Length3     0.95492     0.92171     1.00000     0.96523
               <.0001      <.0001                  <.0001
  Weight3     0.96261     0.92789     0.96523     1.00000
               <.0001      <.0001      <.0001

The variables are highly correlated. For example, the correlation between Height and Width is 0.92632.

The experimental PLOTS=MATRIX option requests a scatter plot matrix for the VAR statement variables, which is shown in Output 1.7.3.

In order to create this display, you must specify the experimental ODS GRAPHICS statement in addition to the PLOTS=MATRIX option. For general information about ODS graphics, refer to Chapter 15, "Statistical Graphics Using ODS" (SAS/STAT User's Guide). For specific information about ODS graphics available in the CORR procedure, see the section "ODS Graphics" on page 31.

To explore the correlation between Height and Width, the following statements request a scatter plot with prediction ellipses for the two variables, which is shown in Output 1.7.4. A prediction ellipse is a region for predicting a new observation from the population, assuming bivariate normality. It also approximates a region containing a specified percentage of the population.

  ods html;
  ods graphics on;

  proc corr data=fish1 nomiss noprint
            plots=scatter(nmaxvar=2 alpha=.20 .30);
     var Height Width Length3 Weight3;
  run;

  ods graphics off;
  ods html close;

The NOMISS option is specified with the original VAR statement to ensure that the same set of 34 observations is used for this analysis. The experimental PLOTS=SCATTER(NMAXVAR=2) option requests a scatter plot for the first two variables in the VAR list. The ALPHA= suboption requests 80% and 70% prediction ellipses.

Output 1.7.4: Scatter Plot with Prediction Ellipses (Experimental)

The prediction ellipse is centered at the means $(\bar{x}, \bar{y})$. For further details, see the section "Confidence and Prediction Ellipses" on page 33.
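For reference, here is the standard construction of a $100(1-\alpha)\%$ prediction ellipse for a bivariate normal sample. We assume it corresponds to the one described in that section. Let $n$ be the number of observations, $\bar{\mathbf{z}} = (\bar{x}, \bar{y})^{\top}$ the mean vector, and $S$ the sample covariance matrix. The ellipse is the boundary of the region

```latex
\left\{ \mathbf{z} : (\mathbf{z}-\bar{\mathbf{z}})^{\top} S^{-1} (\mathbf{z}-\bar{\mathbf{z}})
  \le \frac{2(n+1)(n-1)}{n(n-2)}\, F_{2,\,n-2}(1-\alpha) \right\}
```

With ALPHA=.20 and .30, the $F$ quantile is taken at 0.80 and 0.70, giving the 80% and 70% ellipses. As $n$ grows, the constant on the right approaches the $\chi^{2}_{2}$ quantile $-2\ln\alpha$. This is why the ellipse also approximates a region containing $100(1-\alpha)\%$ of the population.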

Note that the following statements can also be used to create a scatter plot for Height and Width:

  ods html;
  ods graphics on;

  proc corr data=fish1 noprint
            plots=scatter(alpha=.20 .30);
     var Height Width;
  run;

  ods graphics off;
  ods html close;

Output 1.7.5 includes the point (13.9, 5.1), which was excluded from Output 1.7.4 because the observation had a missing value for Weight3. The prediction ellipses in Output 1.7.5 also reflect the inclusion of this observation.

Output 1.7.5: Scatter Plot with Prediction Ellipses (Experimental)

The following statements request a scatter plot with confidence ellipses for the mean, which is shown in Output 1.7.6:

  ods html;
  ods graphics on;

  title 'Fish Measurement Data';
  proc corr data=fish1 nomiss noprint
            plots=scatter(ellipse=mean nmaxvar=2 alpha=.05 .01);
     var Height Width Length3 Weight3;
  run;

  ods graphics off;
  ods html close;

Output 1.7.6: Scatter Plot with Confidence Ellipses (Experimental)

The experimental PLOTS=SCATTER option requests scatter plots for pairs of variables in the VAR statement, and the NMAXVAR=2 suboption restricts the plots to the first two variables in the VAR statement, Height and Width. The ELLIPSE=MEAN and ALPHA= suboptions request 95% and 99% confidence ellipses for the mean.

The confidence ellipse for the mean is centered at the means $(\bar{x}, \bar{y})$. For further details, see the section "Confidence and Prediction Ellipses" on page 33.
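The analogous region for the mean uses the same standard bivariate normal assumptions and the same notation: $n$ observations, mean vector $\bar{\mathbf{z}} = (\bar{x}, \bar{y})^{\top}$, and sample covariance matrix $S$. It is the set of candidate means $\boldsymbol{\mu}$ satisfying

```latex
\left\{ \boldsymbol{\mu} : (\bar{\mathbf{z}}-\boldsymbol{\mu})^{\top} S^{-1} (\bar{\mathbf{z}}-\boldsymbol{\mu})
  \le \frac{2(n-1)}{n(n-2)}\, F_{2,\,n-2}(1-\alpha) \right\}
```

Compare this with a prediction ellipse at the same $\alpha$: the right-hand side is smaller by a factor of $n+1$. With the 34 observations used here, each axis of the confidence ellipse is therefore about $\sqrt{35} \approx 5.9$ times shorter.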

Example 1.8. Computing Partial Correlations

A partial correlation measures the strength of the linear relationship between two variables, while adjusting for the effect of other variables.
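Equivalently, the partial correlation of $X$ and $Y$ given $Z$ is the ordinary correlation between the residuals of $X$ and of $Y$ after each is regressed on $Z$. For one adjusting variable this reduces to the closed form on the left below. With two adjusting variables $Z_1$ and $Z_2$, as in this example, the same formula can be applied recursively (right), first adjusting for $Z_2$ and then for $Z_1$:

```latex
r_{XY\cdot Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{\left(1 - r_{XZ}^{2}\right)\left(1 - r_{YZ}^{2}\right)}},
\qquad
r_{XY\cdot Z_1 Z_2} = \frac{r_{XY\cdot Z_2} - r_{XZ_1\cdot Z_2}\, r_{YZ_1\cdot Z_2}}
  {\sqrt{\left(1 - r_{XZ_1\cdot Z_2}^{2}\right)\left(1 - r_{YZ_1\cdot Z_2}^{2}\right)}}
```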

The following statements request a partial correlation analysis of the variables Height and Width while adjusting for the variables Length3 and Weight3. The latter variables, which are said to be partialled out of the analysis, are specified with the PARTIAL statement.

  ods html;
  ods graphics on;

  title 'Fish Measurement Data';
  proc corr data=fish1 plots=scatter(alpha=.20 .30);
     var Height Width;
     partial Length3 Weight3;
  run;

  ods graphics off;
  ods html close;

By default, the CORR procedure displays descriptive statistics for all the variables and the partial variance and partial standard deviation for the VAR statement variables, as shown in Output 1.8.1.

Output 1.8.1: Descriptive Statistics

  Fish Measurement Data

  The CORR Procedure

  2  Partial Variables:   Length3  Weight3
  2          Variables:   Height   Width

                              Simple Statistics

  Variable      N        Mean     Std Dev         Sum     Minimum     Maximum
  Length3      34    38.38529     4.21628        1305    30.00000    46.50000
  Weight3      34     8.44751     0.97574   287.21524     6.23168    10.00000
  Height       34    15.22057     1.98159   517.49950    11.52000    18.95700
  Width        34     5.43805     0.72967   184.89370     4.02000     6.74970

             Simple Statistics

                Partial       Partial
  Variable     Variance       Std Dev
  Length3
  Weight3
  Height        0.26607       0.51582
  Width         0.07315       0.27047
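The partial variance is the residual variance of a VAR variable after it is regressed on the PARTIAL variables. The sketch below assumes a divisor of $n - k - 1$, with $k = 2$ PARTIAL variables. It computes the squared multiple correlation $R^{2}$ of Height ($H$) on Length3 ($L$) and Weight3 ($V$) from the correlations in Output 1.7.2; the letters are labels chosen here for brevity:

```latex
s^{2}_{\mathrm{partial}} = \frac{\mathrm{SSE}}{n-k-1} = \frac{(n-1)\,s^{2}\,(1-R^{2})}{n-k-1}

R^{2}_{H\cdot LV} = \frac{r_{HL}^{2} + r_{HV}^{2} - 2\,r_{HL}\,r_{HV}\,r_{LV}}{1 - r_{LV}^{2}}
  = \frac{0.95492^{2} + 0.96261^{2} - 2(0.95492)(0.96261)(0.96523)}{1 - 0.96523^{2}} \approx 0.9364

s^{2}_{\mathrm{partial}} \approx \frac{33 \times 1.98159^{2} \times (1 - 0.9364)}{31} \approx 0.266
```

This matches the partial variance of 0.26607 for Height up to rounding of the inputs. Its square root, about 0.516, matches the partial standard deviation of 0.51582.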

When a PARTIAL statement is specified, observations with missing values are excluded from the analysis. The partial correlations for the VAR statement variables are shown in Output 1.8.2.

Output 1.8.2: Pearson Partial Correlation Coefficients

  Fish Measurement Data

  Pearson Partial Correlation Coefficients, N = 34
  Prob > |r| under H0: Partial Rho=0

              Height        Width
  Height     1.00000      0.25692
                           0.1558
  Width      0.25692      1.00000
              0.1558

The partial correlation between the variables Height and Width is 0.25692, which is much less than the unpartialled correlation, 0.92632. The p-value for the partial correlation is 0.1558.
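This value can be reproduced approximately from the rounded correlations in Output 1.7.2. Write $H$, $W$, $L$, and $V$ for Height, Width, Length3, and Weight3 (labels chosen here for brevity). Then apply the first-order formula $r_{XY\cdot Z} = (r_{XY} - r_{XZ}\,r_{YZ})/\sqrt{(1-r_{XZ}^{2})(1-r_{YZ}^{2})}$ twice: first adjust each pair for $V$, then adjust $H$ and $W$ for $L$. The last line assumes the usual $t$ test for a partial correlation, with $n - k - 2$ degrees of freedom, where $k = 2$ is the number of PARTIAL variables:

```latex
r_{HW\cdot V} \approx 0.3279, \qquad r_{HL\cdot V} \approx 0.3641, \qquad r_{WL\cdot V} \approx 0.2676

r_{HW\cdot LV} \approx \frac{0.3279 - (0.3641)(0.2676)}{\sqrt{(1 - 0.3641^{2})(1 - 0.2676^{2})}} \approx 0.257

t = r\,\sqrt{\frac{n-k-2}{1-r^{2}}} = 0.25692\,\sqrt{\frac{30}{1 - 0.25692^{2}}} \approx 1.456
```

The small gap between 0.257 and 0.25692 comes from rounding the inputs. A value of $|t| \approx 1.456$ on 30 degrees of freedom corresponds to the p-value of 0.1558.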

The PLOTS=SCATTER option requests a scatter plot of the residuals for the variables Height and Width after controlling for the effect of the variables Length3 and Weight3.

The ALPHA= suboption requests 80% and 70% prediction ellipses. The scatter plot is shown in Output 1.8.3.

Output 1.8.3: Partial Residual Scatter Plot (Experimental)

In Output 1.8.3, a standard deviation of Height has roughly the same length on the X-axis as a standard deviation of Width on the Y-axis. The major axis length is not significantly larger than the minor axis length, indicating a weak partial correlation between Height and Width.
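A short calculation quantifies this, assuming both axes are scaled in standard-deviation units as described above. The correlation matrix of two standardized variables with correlation $r$ has eigenvalues $1 + r$ and $1 - r$. Every ellipse drawn from it, whatever its confidence level, therefore has the axis ratio

```latex
\frac{\text{major axis}}{\text{minor axis}} = \sqrt{\frac{1+r}{1-r}}, \qquad
\sqrt{\frac{1+0.25692}{1-0.25692}} \approx 1.30, \qquad
\sqrt{\frac{1+0.92632}{1-0.92632}} \approx 5.11
```

The partial correlation of 0.25692 therefore gives a nearly circular ellipse. The unpartialled correlation of 0.92632 from Output 1.7.2 would give one about five times longer than it is wide.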