If an observation has a missing value for any of the variables in the analysis, that observation is omitted from the analysis.
Assume without loss of generality that the two sets of variables, $X$ with $p$ variables and $Y$ with $q$ variables, have means of zero. Let $n$ be the number of observations, and let $m = n - 1$.
Note that the scales of eigenvectors and canonical coefficients are arbitrary. PROC CANCORR follows the usual procedure of rescaling the canonical coefficients so that each canonical variable has a variance of one.
There are several different sets of formulas that can be used to compute the canonical correlations $\rho_i$, $i = 1, \ldots, \min(p, q)$, and unscaled canonical coefficients:
Let $S_{XX} = X'X/m$ be the covariance matrix of $X$, $S_{YY} = Y'Y/m$ be the covariance matrix of $Y$, and $S_{XY} = X'Y/m$ be the covariance matrix between $X$ and $Y$. Then the eigenvalues of $S_{YY}^{-1} S_{XY}' S_{XX}^{-1} S_{XY}$ are the squared canonical correlations, and the right eigenvectors are raw canonical coefficients for the $Y$ variables. The eigenvalues of $S_{XX}^{-1} S_{XY} S_{YY}^{-1} S_{XY}'$ are the squared canonical correlations, and the right eigenvectors are raw canonical coefficients for the $X$ variables.
Let $T = Y'Y$ and $H = Y'X(X'X)^{-1}X'Y$. The eigenvalues $\xi_i$ of $T^{-1}H$ are the squared canonical correlations, $\rho_i^2$, and the right eigenvectors are raw canonical coefficients for the $Y$ variables. Interchange $X$ and $Y$ in the above formulas, and the eigenvalues remain the same, but the right eigenvectors are raw canonical coefficients for the $X$ variables.
Let $E = T - H$. The eigenvalues of $E^{-1}H$ are $\lambda_i = \rho_i^2 / (1 - \rho_i^2)$. The right eigenvectors of $E^{-1}H$ are the same as the right eigenvectors of $T^{-1}H$.
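The three formulations above can be checked against one another numerically. The following is a minimal NumPy sketch, not PROC CANCORR's own algorithm; the simulated data, the seed, and the function name `squared_canonical_correlations` are assumptions made for illustration only.

```python
import numpy as np

def squared_canonical_correlations(X, Y):
    """Return (rho^2 from covariances, xi from T^-1 H, lambda from E^-1 H)."""
    X = X - X.mean(axis=0)          # center so the zero-mean assumption holds
    Y = Y - Y.mean(axis=0)
    m = X.shape[0] - 1
    Sxx, Syy, Sxy = X.T @ X / m, Y.T @ Y / m, X.T @ Y / m

    # Covariance form: eigenvalues of Syy^-1 Sxy' Sxx^-1 Sxy
    A = np.linalg.solve(Syy, Sxy.T @ np.linalg.solve(Sxx, Sxy))
    rho2 = np.sort(np.linalg.eigvals(A).real)[::-1]

    # T = Y'Y and H = Y'X (X'X)^-1 X'Y: eigenvalues xi of T^-1 H
    T = Y.T @ Y
    H = Y.T @ X @ np.linalg.solve(X.T @ X, X.T @ Y)
    xi = np.sort(np.linalg.eigvals(np.linalg.solve(T, H)).real)[::-1]

    # E = T - H: eigenvalues lambda of E^-1 H
    E = T - H
    lam = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]
    return rho2, xi, lam

# Simulated data with p = 3 and q = 2 (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = X[:, :2] + rng.normal(size=(50, 2))
rho2, xi, lam = squared_canonical_correlations(X, Y)
print(np.sqrt(rho2))   # canonical correlations
```

In this sketch `rho2` and `xi` agree to rounding error, and `lam` equals `xi / (1 - xi)`, which mirrors the relationship between the eigenvalues of $T^{-1}H$ and $E^{-1}H$ stated above.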
Canonical correlation can be viewed as a principal component analysis of the predicted values of one set of variables from a regression on the other set of variables, in the metric of the error covariance matrix. For example, regress the $Y$ variables on the $X$ variables. Call the predicted values $P = X(X'X)^{-1}X'Y$ and the residuals $R = Y - P = (I - X(X'X)^{-1}X')Y$. The error covariance matrix is $R'R/m$. Choose a transformation $Q$ that converts the error covariance matrix to an identity, that is, $(RQ)'(RQ) = Q'R'RQ = mI$. Apply the same transformation to the predicted values to yield, say, $Z = PQ$. Now do a principal component analysis on the covariance matrix of $Z$, and you get the eigenvalues of $E^{-1}H$. Repeat with the $X$ and $Y$ variables interchanged, and you get the same eigenvalues.
To show this relationship between canonical correlation and principal components, note that $P'P = H$, $R'R = E$, and $QQ' = mE^{-1}$. Let the covariance matrix of $Z$ be $G$. Then $G = Z'Z/m = (PQ)'PQ/m = Q'P'PQ/m = Q'HQ/m$. Let $u$ be an eigenvector of $G$ and $\lambda$ be the corresponding eigenvalue. Then by definition, $Gu = \lambda u$, hence $Q'HQu/m = \lambda u$. Premultiplying both sides by $Q$ yields $QQ'HQu/m = \lambda Qu$ and thus $E^{-1}HQu = \lambda Qu$. Hence $Qu$ is an eigenvector of $E^{-1}H$ and $\lambda$ is also an eigenvalue of $E^{-1}H$.
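The argument above can also be checked numerically. The sketch below is an illustration under stated assumptions, not the procedure's implementation: the data are simulated, and $Q$ is built from a Cholesky factor of $E$, which is only one of many transformations that satisfy $QQ' = mE^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 60, 3, 2
X = rng.normal(size=(n, p))
Y = X[:, :q] + rng.normal(size=(n, q))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)
m = n - 1

# Regress Y on X: predicted values P and residuals R
P = X @ np.linalg.solve(X.T @ X, X.T @ Y)
R = Y - P
H, E = P.T @ P, R.T @ R

# One possible Q with Q Q' = m E^-1, so that (RQ)'(RQ) = m I.
# With E = L L' (Cholesky), Q = sqrt(m) * inv(L)' satisfies the condition.
L = np.linalg.cholesky(E)
Q = np.sqrt(m) * np.linalg.inv(L).T

Z = P @ Q
G = Z.T @ Z / m                      # covariance matrix of Z

eig_G = np.sort(np.linalg.eigvalsh(G))[::-1]
eig_EinvH = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]
print(eig_G, eig_EinvH)              # the two sets of eigenvalues coincide
```

Because $G = Q'HQ/m$ is symmetric, a symmetric eigensolver can be used for it, whereas $E^{-1}H$ is generally not symmetric and needs a general eigensolver.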
If the covariance matrices are replaced by correlation matrices, the formulas above yield standardized canonical coefficients instead of raw canonical coefficients.
The formulas for multivariate test statistics are shown in 'Multivariate Tests' in Chapter 2, 'Introduction to Regression Procedures.' Formulas for linear regression are provided in other sections of that chapter.
The OUT= data set contains all the variables in the original data set plus new variables containing the canonical variable scores. The number of new variables is twice that specified by the NCAN= option. The names of the new variables are formed by concatenating the values given by the VPREFIX= and WPREFIX= options (the defaults are V and W) with the numbers 1, 2, 3, and so on. The new variables have mean 0 and variance equal to 1. An OUT= data set cannot be created if the DATA= data set is TYPE=CORR, COV, FACTOR, SSCP, UCORR, or UCOV or if a PARTIAL statement is used.
The OUTSTAT= data set is similar to the TYPE=CORR or TYPE=UCORR data set produced by the CORR procedure, but it contains several results in addition to those produced by PROC CORR.
The new data set contains the following variables:
the BY variables, if any
two new character variables, _TYPE_ and _NAME_
Intercept , if the INT option is used
the variables analyzed (those in the VAR statement and the WITH statement)
Each observation in the new data set contains some type of statistic as indicated by the _TYPE_ variable. The values of the _TYPE_ variable are as follows:
_TYPE_ | Contents |
---|---|
MEAN | means |
STD | standard deviations |
USTD | uncorrected standard deviations. When you specify the NOINT option in the PROC CANCORR statement, the OUTSTAT= data set contains standard deviations not corrected for the mean ( _TYPE_ ='USTD'). |
N | number of observations on which the analysis is based. This value is the same for each variable. |
SUMWGT | sum of the weights if a WEIGHT statement is used. This value is the same for each variable. |
CORR | correlations. The _NAME_ variable contains the name of the variable corresponding to each row of the correlation matrix. |
UCORR | uncorrected correlation matrix. When you specify the NOINT option in the PROC CANCORR statement, the OUTSTAT= data set contains a matrix of correlations not corrected for the means. |
CORRB | correlations among the regression coefficient estimates |
STB | standardized regression coefficients. The _NAME_ variable contains the name of the dependent variable. |
B | raw regression coefficients |
SEB | standard errors of the regression coefficients |
LCLB | 95% lower confidence limits for the regression coefficients |
UCLB | 95% upper confidence limits for the regression coefficients |
T | t statistics for the regression coefficients |
PROBT | probability levels for the t statistics |
SPCORR | semipartial correlations between regressors and dependent variables |
SQSPCORR | squared semipartial correlations between regressors and dependent variables |
PCORR | partial correlations between regressors and dependent variables |
SQPCORR | squared partial correlations between regressors and dependent variables |
RSQUARED | $R^2$ values for the multiple regression analyses |
ADJRSQ | adjusted $R^2$ values |
LCLRSQ | approximate 95% lower confidence limits for the $R^2$ values |
UCLRSQ | approximate 95% upper confidence limits for the $R^2$ values |
F | F statistics for the multiple regression analyses |
PROBF | probability levels for the F statistics |
CANCORR | canonical correlations |
SCORE | standardized canonical coefficients. The _NAME_ variable contains the name of the canonical variable. To obtain the canonical variable scores, these coefficients should be multiplied by the standardized data using means obtained from the observation with _TYPE_ ='MEAN' and standard deviations obtained from the observation with _TYPE_ ='STD'. |
RAWSCORE | raw canonical coefficients. To obtain the canonical variable scores, these coefficients should be multiplied by the raw data centered by means obtained from the observation with _TYPE_ ='MEAN'. |
USCORE | scoring coefficients to be applied without subtracting the mean from the raw variables. These are standardized canonical coefficients computed under a NOINT model. To obtain the canonical variable scores, these coefficients should be multiplied by the data that are standardized by the uncorrected standard deviations obtained from the observation with _TYPE_ ='USTD'. |
STRUCTUR | canonical structure |
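The scoring instructions for the SCORE and RAWSCORE observations describe two routes to the same canonical variable scores. The following NumPy sketch illustrates that equivalence on simulated data. The arrays `mean`, `std`, `raw`, and `std_coef` are hypothetical stand-ins for the MEAN, STD, RAWSCORE, and SCORE observations, and the sketch assumes that standardized coefficients equal raw coefficients multiplied by the corresponding standard deviations; it does not read an actual OUTSTAT= data set.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = rng.normal(loc=5.0, scale=[1.0, 2.0, 0.5], size=(n, 3))
Y = X[:, :2] * [0.5, -1.0] + rng.normal(size=(n, 2))

mean = X.mean(axis=0)            # stand-in for the _TYPE_='MEAN' observation
std = X.std(axis=0, ddof=1)      # stand-in for the _TYPE_='STD' observation

Xc, Yc = X - mean, Y - Y.mean(axis=0)
m = n - 1
Sxx, Syy, Sxy = Xc.T @ Xc / m, Yc.T @ Yc / m, Xc.T @ Yc / m

# Raw coefficients for X: right eigenvectors of Sxx^-1 Sxy Syy^-1 Sxy'
M = np.linalg.solve(Sxx, Sxy @ np.linalg.solve(Syy, Sxy.T))
vals, vecs = np.linalg.eig(M)
order = np.argsort(vals.real)[::-1][:2]     # keep min(p, q) = 2 canonical variables
raw = vecs.real[:, order]
raw = raw / np.sqrt(np.diag(raw.T @ Sxx @ raw))   # rescale to unit variance

std_coef = raw * std[:, None]    # assumed relation: standardized = raw * std dev

scores_from_raw = (X - mean) @ raw                 # RAWSCORE route
scores_from_std = ((X - mean) / std) @ std_coef    # SCORE route
print(scores_from_raw.var(axis=0, ddof=1))
```

Both routes give the same scores, with mean 0 and variance 1, up to rounding error. The sign of each column is arbitrary because eigenvectors are determined only up to scale.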
n = number of observations
v = number of VAR variables
w = number of WITH variables
p = max( v , w )
q = min( v , w )
b = v + w
t = total number of variables (VAR, WITH, and PARTIAL)
The time required to compute the correlation matrix is roughly proportional to
The time required for the canonical analysis is roughly proportional to
but the coefficient for $q^3$ varies depending on the number of QR iterations in the singular value decomposition.
The minimum memory required is approximately
bytes. Additional memory is required if you request the VDEP or WDEP option.
If the SIMPLE option is specified, PROC CANCORR produces means and standard deviations for each input variable. If the CORR option is specified, PROC CANCORR produces correlations among the input variables. Unless the NOPRINT option is specified, PROC CANCORR displays a table of canonical correlations containing the following:
Canonical Correlations. These are always nonnegative.
Adjusted Canonical Correlations (Lawley 1959), which are asymptotically less biased than the raw correlations and may be negative. The adjusted canonical correlations may not be computable, and they are displayed as missing values if two canonical correlations are nearly equal or if some are close to zero. A missing value is also displayed if an adjusted canonical correlation is larger than a previous adjusted canonical correlation.
Approx Standard Errors, which are the approximate standard errors of the canonical correlations
Squared Canonical Correlations
Eigenvalues of INV(E)*H, which are equal to CanRsq/(1-CanRsq), where CanRsq is the corresponding squared canonical correlation. Also displayed for each eigenvalue is the Difference from the next eigenvalue, the Proportion of the sum of the eigenvalues, and the Cumulative proportion.
Likelihood Ratio for the hypothesis that the current canonical correlation and all smaller ones are 0 in the population. The likelihood ratio for all canonical correlations equals Wilks' lambda.
Approx F statistic based on Rao's approximation to the distribution of the likelihood ratio (Rao 1973, p. 556; Kshirsagar 1972, p. 326)
Num DF and Den DF (numerator and denominator degrees of freedom) and Pr > F (probability level) associated with the F statistic
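Several columns of this table follow directly from the canonical correlations. The sketch below uses made-up canonical correlations to illustrate the eigenvalue, difference, proportion, cumulative, and likelihood ratio columns. It assumes the usual form of the likelihood ratio, the product of (1 - CanRsq) over the current and all smaller canonical correlations, and it omits the standard errors and the F approximation.

```python
import numpy as np

can_corr = np.array([0.8, 0.5, 0.2])     # hypothetical canonical correlations
can_rsq = can_corr ** 2

eigenvalue = can_rsq / (1.0 - can_rsq)            # eigenvalues of INV(E)*H
difference = eigenvalue[:-1] - eigenvalue[1:]     # difference from the next eigenvalue
proportion = eigenvalue / eigenvalue.sum()
cumulative = np.cumsum(proportion)

# Row i tests that canonical correlation i and all smaller ones are zero:
# product of (1 - CanRsq) over rows i, i+1, ...
likelihood_ratio = np.cumprod((1.0 - can_rsq)[::-1])[::-1]

for row in zip(can_corr, can_rsq, eigenvalue, proportion, cumulative, likelihood_ratio):
    print(" ".join(f"{value:8.4f}" for value in row))
```

The first entry of `likelihood_ratio` covers all canonical correlations and therefore corresponds to Wilks' lambda.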
Unless you specify the NOPRINT option, PROC CANCORR produces a table of multivariate statistics for the null hypothesis that all canonical correlations are zero in the population. These statistics, as described in the section 'Multivariate Tests' in Chapter 2, 'Introduction to Regression Procedures,' are:
Wilks' Lambda
Pillai's Trace
Hotelling-Lawley Trace
Roy's Greatest Root
For each of the preceding statistics, PROC CANCORR displays
an F approximation or upper bound
Num DF, the numerator degrees of freedom
Den DF, the denominator degrees of freedom
Pr > F , the probability level
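All four statistics are functions of the squared canonical correlations. The following sketch shows the standard textbook relationships with hypothetical values. It assumes that Roy's greatest root is taken as the largest eigenvalue of $E^{-1}H$ (some texts use the largest squared canonical correlation instead), and it does not compute the F approximations or degrees of freedom.

```python
import numpy as np

can_rsq = np.array([0.64, 0.25, 0.04])   # hypothetical squared canonical correlations
lam = can_rsq / (1.0 - can_rsq)          # eigenvalues of E^-1 H

wilks_lambda = np.prod(1.0 - can_rsq)    # product of (1 - rho_i^2)
pillai_trace = can_rsq.sum()             # sum of rho_i^2
hotelling_lawley_trace = lam.sum()       # sum of eigenvalues of E^-1 H
roys_greatest_root = lam.max()           # largest eigenvalue of E^-1 H (assumed convention)

print(wilks_lambda, pillai_trace, hotelling_lawley_trace, roys_greatest_root)
```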
Unless you specify the SHORT or NOPRINT option, PROC CANCORR displays the following:
both Raw (unstandardized) and Standardized Canonical Coefficients normalized to give canonical variables with unit variance. Standardized coefficients can be used to compute canonical variable scores from the standardized (zero mean and unit variance) input variables. Raw coefficients can be used to compute canonical variable scores from the input variables without standardizing them.
all four Canonical Structure matrices, giving Correlations Between the canonical variables and the original variables
If you specify the REDUNDANCY option, PROC CANCORR displays
the Canonical Redundancy Analysis (Stewart and Love 1968; Cooley and Lohnes 1971), including Raw (unstandardized) and Standardized Variance and Cumulative Proportion of the Variance of each set of variables Explained by Their Own Canonical Variables and Explained by The Opposite Canonical Variables
the Squared Multiple Correlations of each variable with the first m canonical variables of the opposite set, where m varies from 1 to the number of canonical correlations
If you specify the VDEP option, PROC CANCORR performs multiple regression analyses with the VAR variables as dependent variables and the WITH variables as regressors. If you specify the WDEP option, PROC CANCORR performs multiple regression analyses with the WITH variables as dependent variables and the VAR variables as regressors. If you specify the VDEP or WDEP option and also specify the ALL option, PROC CANCORR displays the following items. You can also specify individual options to request a subset of the output generated by the ALL option, or you can suppress the output by specifying the NOPRINT option.
if you specify the SMC option, Squared Multiple Correlations and F Tests. For each regression model, identified by its dependent variable name, PROC CANCORR displays the R-Squared, Adjusted R-Squared (Wherry 1931), F Statistic, and Pr > F. Also for each regression model, PROC CANCORR displays an Approximate 95% Confidence Interval for the population $R^2$ (Helland 1987). These confidence limits are valid only when the regressors are random and when the regressors and dependent variables are approximately distributed according to a multivariate normal distribution.
The average $R^2$ values for the models considered, unweighted and weighted by variance, are also given.
if you specify the CORRB option, Correlations Among the Regression Coefficient Estimates
if you specify the STB option, Standardized Regression Coefficients
if you specify the B option, Raw Regression Coefficients
if you specify the SEB option, Standard Errors of the Regression Coefficients
if you specify the CLB option, 95% confidence limits for the regression coefficients
if you specify the T option, T Statistics for the Regression Coefficients
if you specify the PROBT option, Probability > T for the Regression Coefficients
if you specify the SPCORR option, Semipartial Correlations between regressors and dependent variables, Removing from Each Regressor the Effects of All Other Regressors
if you specify the SQSPCORR option, Squared Semipartial Correlations between regressors and dependent variables, Removing from Each Regressor the Effects of All Other Regressors
if you specify the PCORR option, Partial Correlations between regressors and dependent variables, Removing the Effects of All Other Regressors from Both Regressor and Criterion
if you specify the SQPCORR option, Squared Partial Correlations between regressors and dependent variables, Removing the Effects of All Other Regressors from Both Regressor and Criterion
PROC CANCORR assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 20.2.
ODS Table Name | Description | Statement | Option |
---|---|---|---|
AvgRSquare | Average R-Squares (weighted and unweighted) | PROC CANCORR | VDEP (or WDEP) SMC (or ALL) |
CanCorr | Canonical correlations | PROC CANCORR | default |
CanStructureVCan | Correlations between the VAR canonical variables and the VAR and WITH variables | PROC CANCORR | default (unless SHORT) |
CanStructureWCan | Correlations between the WITH canonical variables and the WITH and VAR variables | PROC CANCORR | default (unless SHORT) |
ConfidenceLimits | 95% Confidence limits for the regression coefficients | PROC CANCORR | VDEP (or WDEP) CLB (or ALL) |
Corr | Correlations among the original variables | PROC CANCORR | CORR (or ALL) |
CorrOnPartial | Partial correlations | PARTIAL | CORR (or ALL) |
CorrRegCoefEst | Correlations among the regression coefficient estimates | PROC CANCORR | VDEP (or WDEP) CORRB (or ALL) |
MultStat | Multivariate statistics | PROC CANCORR | default |
NObsNVar | Number of observations and variables | PROC CANCORR | SIMPLE (or ALL) |
ParCorr | Partial correlations | PROC CANCORR | VDEP (or WDEP) PCORR (or ALL) |
ProbtRegCoef | Prob > t for the regression coefficients | PROC CANCORR | VDEP (or WDEP) PROBT (or ALL) |
RawCanCoefV | Raw canonical coefficients for the VAR variables | PROC CANCORR | default (unless SHORT) |
RawCanCoefW | Raw canonical coefficients for the WITH variables | PROC CANCORR | default (unless SHORT) |
RawRegCoef | Raw regression coefficients | PROC CANCORR | VDEP (or WDEP) B (or ALL) |
Redundancy | Canonical redundancy analysis | PROC CANCORR | REDUNDANCY (or ALL) |
Regression | Squared multiple correlations and F tests | PROC CANCORR | VDEP (or WDEP) SMC (or ALL) |
RSquareRMSEOnPartial | R-Squares and RMSEs on PARTIAL | PARTIAL | CORR (or ALL) |
SemiParCorr | Semi-partial correlations | PROC CANCORR | VDEP (or WDEP) SPCORR (or ALL) |
SimpleStatistics | Simple statistics | PROC CANCORR | SIMPLE (or ALL) |
SqMultCorr | Canonical redundancy analysis: squared multiple correlations | PROC CANCORR | REDUNDANCY (or ALL) |
SqParCorr | Squared partial correlations | PROC CANCORR | VDEP (or WDEP) SQPCORR (or ALL) |
SqSemiParCorr | Squared semi-partial correlations | PROC CANCORR | VDEP (or WDEP) SQSPCORR (or ALL) |
StdCanCoefV | Standardized canonical coefficients for the VAR variables | PROC CANCORR | default (unless SHORT) |
StdCanCoefW | Standardized canonical coefficients for the WITH variables | PROC CANCORR | default (unless SHORT) |
StdErrRawRegCoef | Standard errors of the raw regression coefficients | PROC CANCORR | VDEP (or WDEP) SEB (or ALL) |
StdRegCoef | Standardized regression coefficients | PROC CANCORR | VDEP (or WDEP) STB (or ALL) |
StdRegCoefOnPartial | Standardized regression coefficients on PARTIAL | PARTIAL | CORR (or ALL) |
tValueRegCoef | t values for the regression coefficients | PROC CANCORR | VDEP (or WDEP) T (or ALL) |
For more information on ODS, see Chapter 14, 'Using the Output Delivery System.'