Details


Missing Values

If an observation has a missing value for any of the variables in the analysis, that observation is omitted from the analysis.
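The omission rule above amounts to listwise deletion. The following is a minimal Python sketch of that rule, not SAS code; it assumes observations are stored as dictionaries and that a missing value is represented by `None`. The function name `complete_cases` and the sample data are hypothetical.

```python
def complete_cases(rows, analysis_vars):
    """Keep only the rows that have no missing (None) value
    in any of the variables used in the analysis."""
    return [
        row for row in rows
        if all(row.get(var) is not None for var in analysis_vars)
    ]

# Hypothetical data: "id" is not an analysis variable.
rows = [
    {"x1": 1.0, "x2": 2.0, "y1": 3.0, "id": "a"},
    {"x1": None, "x2": 5.0, "y1": 6.0, "id": "b"},   # dropped: x1 is missing
    {"x1": 7.0, "x2": 8.0, "y1": 9.0, "id": None},   # kept: only id is missing
]

kept = complete_cases(rows, ["x1", "x2", "y1"])
print(len(kept), "of", len(rows), "observations used")
```

Only variables that take part in the analysis matter here: a missing value in some other variable does not remove the observation.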

Formulas

Assume without loss of generality that the two sets of variables, $X$ with $p$ variables and $Y$ with $q$ variables, have means of zero. Let $n$ be the number of observations, and let $m = n - 1$.

Note that the scales of eigenvectors and canonical coefficients are arbitrary. PROC CANCORR follows the usual procedure of rescaling the canonical coefficients so that each canonical variable has a variance of one.

There are several different sets of formulas that can be used to compute the canonical correlations, $\rho_i$, $i = 1, \ldots, \min(p, q)$, and unscaled canonical coefficients:

  1. Let $S_{XX} = X'X/m$ be the covariance matrix of $X$, $S_{YY} = Y'Y/m$ be the covariance matrix of $Y$, and $S_{XY} = X'Y/m$ be the covariance matrix between $X$ and $Y$. Then the eigenvalues of $S_{YY}^{-1} S_{XY}' S_{XX}^{-1} S_{XY}$ are the squared canonical correlations, and the right eigenvectors are raw canonical coefficients for the $Y$ variables. The eigenvalues of $S_{XX}^{-1} S_{XY} S_{YY}^{-1} S_{XY}'$ are the squared canonical correlations, and the right eigenvectors are raw canonical coefficients for the $X$ variables.

  2. Let $T = Y'Y$ and $H = Y'X(X'X)^{-1}X'Y$. The eigenvalues $\xi_i$ of $T^{-1}H$ are the squared canonical correlations, $\rho_i^2$, and the right eigenvectors are raw canonical coefficients for the $Y$ variables. Interchange $X$ and $Y$ in the above formulas, and the eigenvalues remain the same, but the right eigenvectors are raw canonical coefficients for the $X$ variables.

  3. Let $E = T - H$. The eigenvalues of $E^{-1}H$ are $\lambda_i = \rho_i^2 / (1 - \rho_i^2)$. The right eigenvectors of $E^{-1}H$ are the same as the right eigenvectors of $T^{-1}H$.

  4. Canonical correlation can be viewed as a principal component analysis of the predicted values of one set of variables from a regression on the other set of variables, in the metric of the error covariance matrix. For example, regress the $Y$ variables on the $X$ variables. Call the predicted values $P = X(X'X)^{-1}X'Y$ and the residuals $R = Y - P = (I - X(X'X)^{-1}X')Y$. The error covariance matrix is $R'R/m$. Choose a transformation $Q$ that converts the error covariance matrix to an identity, that is, $(RQ)'(RQ) = Q'R'RQ = mI$. Apply the same transformation to the predicted values to yield, say, $Z = PQ$. Now do a principal component analysis on the covariance matrix of $Z$, and you get the eigenvalues of $E^{-1}H$. Repeat with the $X$ and $Y$ variables interchanged, and you get the same eigenvalues.

    To show this relationship between canonical correlation and principal components, note that $P'P = H$, $R'R = E$, and $QQ' = mE^{-1}$. Let the covariance matrix of $Z$ be $G$. Then $G = Z'Z/m = (PQ)'PQ/m = Q'P'PQ/m = Q'HQ/m$. Let $u$ be an eigenvector of $G$ and $\lambda$ be the corresponding eigenvalue. Then by definition, $Gu = \lambda u$, hence $Q'HQu/m = \lambda u$. Premultiplying both sides by $Q$ yields $QQ'HQu/m = \lambda Qu$ and thus $E^{-1}HQu = \lambda Qu$. Hence $Qu$ is an eigenvector of $E^{-1}H$ and $\lambda$ is also an eigenvalue of $E^{-1}H$.

  5. If the covariance matrices are replaced by correlation matrices, the formulas above yield standardized canonical coefficients instead of raw canonical coefficients.
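The relationships among these formulas can be checked numerically. The sketch below is an illustration in plain Python, not the algorithm PROC CANCORR uses. It assumes two X variables and two Y variables so that every matrix is 2 x 2, and it uses made-up data. All helper names (`centered`, `crossprod`, `matmul`, `inv2`, `eigvals2`, `t_and_h`) are hypothetical. It forms T = Y'Y, H = Y'X(X'X)^{-1}X'Y, and E = T - H, then compares the eigenvalues of T^{-1}H and E^{-1}H.

```python
import math

def centered(col):
    """Subtract the column mean so the variable has mean zero."""
    mean = sum(col) / len(col)
    return [v - mean for v in col]

def crossprod(A, B):
    """Return A'B, where A and B are matrices stored as lists of columns."""
    return [[sum(a * b for a, b in zip(ca, cb)) for cb in B] for ca in A]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Invert a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def eigvals2(M):
    """Real eigenvalues of a 2 x 2 matrix, largest first."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return [(tr + root) / 2.0, (tr - root) / 2.0]

def t_and_h(X, Y):
    """Return T = Y'Y and H = Y'X (X'X)^{-1} X'Y."""
    T = crossprod(Y, Y)
    H = matmul(matmul(crossprod(Y, X), inv2(crossprod(X, X))), crossprod(X, Y))
    return T, H

# Made-up data: two X variables and two Y variables, eight observations.
X = [centered(c) for c in ([1, 2, 3, 4, 5, 6, 7, 8], [2, 1, 4, 3, 6, 8, 5, 7])]
Y = [centered(c) for c in ([1, 3, 2, 5, 4, 7, 6, 9], [5, 3, 6, 2, 7, 1, 8, 4])]

T, H = t_and_h(X, Y)
E = [[T[i][j] - H[i][j] for j in range(2)] for i in range(2)]

xi = eigvals2(matmul(inv2(T), H))    # squared canonical correlations
lam = eigvals2(matmul(inv2(E), H))   # eigenvalues of E^{-1}H

# Interchanging X and Y should leave the eigenvalues unchanged.
T_swapped, H_swapped = t_and_h(Y, X)
xi_swapped = eigvals2(matmul(inv2(T_swapped), H_swapped))

print("canonical correlations:", [math.sqrt(max(v, 0.0)) for v in xi])
print("eigenvalues of E^-1 H: ", lam)
```

With these data, each eigenvalue of E^{-1}H should equal the matching squared canonical correlation divided by one minus itself, and `xi_swapped` should agree with `xi` up to rounding error.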

The formulas for multivariate test statistics are shown in 'Multivariate Tests' in Chapter 2, 'Introduction to Regression Procedures.' Formulas for linear regression are provided in other sections of that chapter.

Output Data Sets

OUT= Data Set

The OUT= data set contains all the variables in the original data set plus new variables containing the canonical variable scores. The number of new variables is twice that specified by the NCAN= option. The names of the new variables are formed by concatenating the values given by the VPREFIX= and WPREFIX= options (the defaults are V and W) with the numbers 1, 2, 3, and so on. The new variables have mean 0 and variance equal to 1. An OUT= data set cannot be created if the DATA= data set is TYPE=CORR, COV, FACTOR, SSCP, UCORR, or UCOV or if a PARTIAL statement is used.
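The naming rule for the new variables can be sketched as follows. This is a hypothetical Python helper that mirrors the description above and is not part of SAS. The parameter names `ncan`, `vprefix`, and `wprefix` echo the NCAN=, VPREFIX=, and WPREFIX= options. The order of the names in the returned list is an assumption made for illustration only.

```python
def score_variable_names(ncan, vprefix="V", wprefix="W"):
    """Form the canonical variable names by concatenating each prefix
    with the numbers 1, 2, ..., ncan."""
    v_names = [f"{vprefix}{i}" for i in range(1, ncan + 1)]
    w_names = [f"{wprefix}{i}" for i in range(1, ncan + 1)]
    return v_names + w_names

names = score_variable_names(2)
print(names)  # ['V1', 'V2', 'W1', 'W2']
```

The list has `2 * ncan` entries, which matches the statement that the number of new variables is twice the NCAN= value.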

OUTSTAT= Data Set

The OUTSTAT= data set is similar to the TYPE=CORR or TYPE=UCORR data set produced by the CORR procedure, but it contains several results in addition to those produced by PROC CORR.

The new data set contains the following variables:

  • the BY variables, if any

  • two new character variables, _TYPE_ and _NAME_

  • Intercept, if the INT option is used

  • the variables analyzed (those in the VAR statement and the WITH statement)

Each observation in the new data set contains some type of statistic as indicated by the _TYPE_ variable. The values of the _TYPE_ variable are as follows:

| _TYPE_ | Contents |
| --- | --- |
| MEAN | means |
| STD | standard deviations |
| USTD | uncorrected standard deviations. When you specify the NOINT option in the PROC CANCORR statement, the OUTSTAT= data set contains standard deviations not corrected for the mean (_TYPE_='USTD'). |
| N | number of observations on which the analysis is based. This value is the same for each variable. |
| SUMWGT | sum of the weights if a WEIGHT statement is used. This value is the same for each variable. |
| CORR | correlations. The _NAME_ variable contains the name of the variable corresponding to each row of the correlation matrix. |
| UCORR | uncorrected correlation matrix. When you specify the NOINT option in the PROC CANCORR statement, the OUTSTAT= data set contains a matrix of correlations not corrected for the means. |
| CORRB | correlations among the regression coefficient estimates |
| STB | standardized regression coefficients. The _NAME_ variable contains the name of the dependent variable. |
| B | raw regression coefficients |
| SEB | standard errors of the regression coefficients |
| LCLB | 95% lower confidence limits for the regression coefficients |
| UCLB | 95% upper confidence limits for the regression coefficients |
| T | t statistics for the regression coefficients |
| PROBT | probability levels for the t statistics |
| SPCORR | semipartial correlations between regressors and dependent variables |
| SQSPCORR | squared semipartial correlations between regressors and dependent variables |
| PCORR | partial correlations between regressors and dependent variables |
| SQPCORR | squared partial correlations between regressors and dependent variables |
| RSQUARED | $R^2$ values for the multiple regression analyses |
| ADJRSQ | adjusted $R^2$ values |
| LCLRSQ | approximate 95% lower confidence limits for the $R^2$ values |
| UCLRSQ | approximate 95% upper confidence limits for the $R^2$ values |
| F | F statistics for the multiple regression analyses |
| PROBF | probability levels for the F statistics |
| CANCORR | canonical correlations |
| SCORE | standardized canonical coefficients. The _NAME_ variable contains the name of the canonical variable. To obtain the canonical variable scores, these coefficients should be multiplied by the standardized data using means obtained from the observation with _TYPE_='MEAN' and standard deviations obtained from the observation with _TYPE_='STD'. |
| RAWSCORE | raw canonical coefficients. To obtain the canonical variable scores, these coefficients should be multiplied by the raw data centered by means obtained from the observation with _TYPE_='MEAN'. |
| USCORE | scoring coefficients to be applied without subtracting the mean from the raw variables. These are standardized canonical coefficients computed under a NOINT model. To obtain the canonical variable scores, these coefficients should be multiplied by the data that are standardized by the uncorrected standard deviations obtained from the observation with _TYPE_='USTD'. |
| STRUCTUR | canonical structure |
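The scoring rules described for the SCORE, RAWSCORE, and USCORE observations can be sketched in Python as follows. This is an illustration rather than SAS code: the function names and all numbers are hypothetical, and the raw coefficients are derived here by dividing each standardized coefficient by the corresponding standard deviation so that the two corrected scoring rules can be compared.

```python
import math

def score_from_standardized(obs, means, stds, std_coefs):
    """SCORE rule: standardize with the MEAN and STD rows, then apply the coefficients."""
    return sum(c * (x - m) / s for x, m, s, c in zip(obs, means, stds, std_coefs))

def score_from_raw(obs, means, raw_coefs):
    """RAWSCORE rule: center with the MEAN row, then apply the raw coefficients."""
    return sum(c * (x - m) for x, m, c in zip(obs, means, raw_coefs))

def score_uncorrected(obs, ustds, u_coefs):
    """USCORE rule: divide by the USTD row without subtracting the mean."""
    return sum(c * x / s for x, s, c in zip(obs, ustds, u_coefs))

# Hypothetical statistics for two variables and one canonical variable.
means = [10.0, 50.0]
stds = [2.0, 5.0]
std_coefs = [0.6, 0.5]
raw_coefs = [c / s for c, s in zip(std_coefs, stds)]  # raw = standardized / std dev

obs = [12.0, 45.0]  # one hypothetical observation
score_std = score_from_standardized(obs, means, stds, std_coefs)
score_raw = score_from_raw(obs, means, raw_coefs)
print(score_std, score_raw)
```

Under these assumptions the two corrected rules give the same score, because dividing the data by the standard deviation is equivalent to dividing the coefficient by it.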

Computational Resources

Notation

  • n = number of observations

  • v = number of VAR variables

  • w = number of WITH variables

  • p = max( v , w )

  • q = min( v , w )

  • b = v + w

  • t = total number of variables (VAR, WITH, and PARTIAL)

Time Requirements

The time required to compute the correlation matrix is roughly proportional to

The time required for the canonical analysis is roughly proportional to

but the coefficient for $q^3$ varies depending on the number of QR iterations in the singular value decomposition.

Memory Requirements

The minimum memory required is approximately

bytes. Additional memory is required if you request the VDEP or WDEP option.

Displayed Output

If the SIMPLE option is specified, PROC CANCORR produces means and standard deviations for each input variable. If the CORR option is specified, PROC CANCORR produces correlations among the input variables. Unless the NOPRINT option is specified, PROC CANCORR displays a table of canonical correlations containing the following:

  • Canonical Correlations. These are always nonnegative.

  • Adjusted Canonical Correlations (Lawley 1959), which are asymptotically less biased than the raw correlations and may be negative. The adjusted canonical correlations may not be computable, and they are displayed as missing values if two canonical correlations are nearly equal or if some are close to zero. A missing value is also displayed if an adjusted canonical correlation is larger than a previous adjusted canonical correlation.

  • Approx Standard Errors, which are the approximate standard errors of the canonical correlations

  • Squared Canonical Correlations

  • Eigenvalues of INV(E)*H, which are equal to CanRsq/(1-CanRsq), where CanRsq is the corresponding squared canonical correlation. Also displayed for each eigenvalue is the Difference from the next eigenvalue, the Proportion of the sum of the eigenvalues, and the Cumulative proportion.

  • Likelihood Ratio for the hypothesis that the current canonical correlation and all smaller ones are 0 in the population. The likelihood ratio for all canonical correlations equals Wilks' lambda.

  • Approx F statistic based on Rao's approximation to the distribution of the likelihood ratio (Rao 1973, p. 556; Kshirsagar 1972, p. 326)

  • Num DF and Den DF (numerator and denominator degrees of freedom) and Pr > F (probability level) associated with the F statistic
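Several columns of this table follow directly from the canonical correlations. The sketch below is a Python illustration with hypothetical function and key names, not SAS code. It computes the squared canonical correlations, the eigenvalues CanRsq/(1-CanRsq) with their difference, proportion, and cumulative proportion, and the likelihood ratio for each row. The likelihood ratio is taken as the product of (1 - CanRsq) over the current and all smaller canonical correlations. The adjusted correlations, standard errors, and F approximation are not computed.

```python
import math

def canonical_table(can_corrs):
    """Build one dictionary per canonical correlation (largest first)."""
    rsq = [r * r for r in can_corrs]
    eig = [r2 / (1.0 - r2) for r2 in rsq]      # eigenvalues of INV(E)*H
    total = sum(eig)
    rows, cumulative = [], 0.0
    for i, (r, r2, ev) in enumerate(zip(can_corrs, rsq, eig)):
        cumulative += ev / total
        rows.append({
            "CanCorr": r,
            "CanRsq": r2,
            "Eigenvalue": ev,
            # Difference from the next eigenvalue; none for the last row.
            "Difference": ev - eig[i + 1] if i + 1 < len(eig) else None,
            "Proportion": ev / total,
            "Cumulative": cumulative,
            # Tests that this and all smaller canonical correlations are 0.
            "LikelihoodRatio": math.prod(1.0 - s for s in rsq[i:]),
        })
    return rows

table = canonical_table([0.8, 0.5, 0.2])  # hypothetical canonical correlations
for row in table:
    print(row)
```

The likelihood ratio in the first row covers all canonical correlations and therefore equals Wilks' lambda.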

Unless you specify the NOPRINT option, PROC CANCORR produces a table of multivariate statistics for the null hypothesis that all canonical correlations are zero in the population. These statistics, described in the section 'Multivariate Tests' in Chapter 2, 'Introduction to Regression Procedures,' are:

  • Wilks' Lambda

  • Pillai's Trace

  • Hotelling-Lawley Trace

  • Roy's Greatest Root

For each of the preceding statistics, PROC CANCORR displays

  • an F approximation or upper bound

  • Num DF, the numerator degrees of freedom

  • Den DF, the denominator degrees of freedom

  • Pr > F, the probability level
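The four statistics themselves, though not their F approximations or degrees of freedom, can be written in terms of the squared canonical correlations. The following Python sketch is an illustration with a hypothetical function name and made-up input values. It takes Roy's greatest root to be the largest eigenvalue of INV(E)*H.

```python
import math

def multivariate_statistics(can_corrs):
    """Compute the four multivariate statistics from canonical correlations."""
    rsq = [r * r for r in can_corrs]
    eig = [r2 / (1.0 - r2) for r2 in rsq]  # eigenvalues of INV(E)*H
    return {
        "Wilks' Lambda": math.prod(1.0 - r2 for r2 in rsq),
        "Pillai's Trace": sum(rsq),
        "Hotelling-Lawley Trace": sum(eig),
        "Roy's Greatest Root": max(eig),
    }

stats = multivariate_statistics([0.8, 0.5, 0.2])  # hypothetical values
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Because only the largest eigenvalue enters Roy's greatest root, its F value is an upper bound rather than an approximation.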

Unless you specify the SHORT or NOPRINT option, PROC CANCORR displays the following:

  • both Raw (unstandardized) and Standardized Canonical Coefficients normalized to give canonical variables with unit variance. Standardized coefficients can be used to compute canonical variable scores from the standardized (zero mean and unit variance) input variables. Raw coefficients can be used to compute canonical variable scores from the input variables without standardizing them.

  • all four Canonical Structure matrices, giving Correlations Between the canonical variables and the original variables

If you specify the REDUNDANCY option, PROC CANCORR displays

  • the Canonical Redundancy Analysis (Stewart and Love 1968; Cooley and Lohnes 1971), including Raw (unstandardized) and Standardized Variance and Cumulative Proportion of the Variance of each set of variables Explained by Their Own Canonical Variables and Explained by The Opposite Canonical Variables

  • the Squared Multiple Correlations of each variable with the first m canonical variables of the opposite set, where m varies from 1 to the number of canonical correlations

If you specify the VDEP option, PROC CANCORR performs multiple regression analyses with the VAR variables as dependent variables and the WITH variables as regressors. If you specify the WDEP option, PROC CANCORR performs multiple regression analyses with the WITH variables as dependent variables and the VAR variables as regressors. If you specify the VDEP or WDEP option and also specify the ALL option, PROC CANCORR displays the following items. You can also specify individual options to request a subset of the output generated by the ALL option; or you can suppress the output by specifying the NOPRINT option.

  • if you specify the SMC option, Squared Multiple Correlations and F Tests. For each regression model, identified by its dependent variable name, PROC CANCORR displays the R-Squared, Adjusted R-Squared (Wherry 1931), F Statistic, and Pr > F. Also for each regression model, PROC CANCORR displays an Approximate 95% Confidence Interval for the population $R^2$ (Helland 1987). These confidence limits are valid only when the regressors are random and when the regressors and dependent variables are approximately distributed according to a multivariate normal distribution.

    The average $R^2$ values for the models considered, unweighted and weighted by variance, are also given.

  • if you specify the CORRB option, Correlations Among the Regression Coefficient Estimates

  • if you specify the STB option, Standardized Regression Coefficients

  • if you specify the B option, Raw Regression Coefficients

  • if you specify the SEB option, Standard Errors of the Regression Coefficients

  • if you specify the CLB option, 95% confidence limits for the regression coefficients

  • if you specify the T option, T Statistics for the Regression Coefficients

  • if you specify the PROBT option, Probability > T for the Regression Coefficients

  • if you specify the SPCORR option, Semipartial Correlations between regressors and dependent variables, Removing from Each Regressor the Effects of All Other Regressors

  • if you specify the SQSPCORR option, Squared Semipartial Correlations between regressors and dependent variables, Removing from Each Regressor the Effects of All Other Regressors

  • if you specify the PCORR option, Partial Correlations between regressors and dependent variables, Removing the Effects of All Other Regressors from Both Regressor and Criterion

  • if you specify the SQPCORR option, Squared Partial Correlations between regressors and dependent variables, Removing the Effects of All Other Regressors from Both Regressor and Criterion

ODS Table Names

PROC CANCORR assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 20.2.

Table 20.2: ODS Tables Produced in PROC CANCORR

| ODS Table Name | Description | Statement | Option |
| --- | --- | --- | --- |
| AvgRSquare | Average R-Squares (weighted and unweighted) | PROC CANCORR | VDEP (or WDEP) SMC (or ALL) |
| CanCorr | Canonical correlations | PROC CANCORR | default |
| CanStructureVCan | Correlations between the VAR canonical variables and the VAR and WITH variables | PROC CANCORR | default (unless SHORT) |
| CanStructureWCan | Correlations between the WITH canonical variables and the WITH and VAR variables | PROC CANCORR | default (unless SHORT) |
| ConfidenceLimits | 95% Confidence limits for the regression coefficients | PROC CANCORR | VDEP (or WDEP) CLB (or ALL) |
| Corr | Correlations among the original variables | PROC CANCORR | CORR (or ALL) |
| CorrOnPartial | Partial correlations | PARTIAL | CORR (or ALL) |
| CorrRegCoefEst | Correlations among the regression coefficient estimates | PROC CANCORR | VDEP (or WDEP) CORRB (or ALL) |
| MultStat | Multivariate statistics | | default |
| NObsNVar | Number of observations and variables | PROC CANCORR | SIMPLE (or ALL) |
| ParCorr | Partial correlations | PROC CANCORR | VDEP (or WDEP) PCORR (or ALL) |
| ProbtRegCoef | Prob > t for the regression coefficients | PROC CANCORR | VDEP (or WDEP) PROBT (or ALL) |
| RawCanCoefV | Raw canonical coefficients for the VAR variables | PROC CANCORR | default (unless SHORT) |
| RawCanCoefW | Raw canonical coefficients for the WITH variables | PROC CANCORR | default (unless SHORT) |
| RawRegCoef | Raw regression coefficients | PROC CANCORR | VDEP (or WDEP) B (or ALL) |
| Redundancy | Canonical redundancy analysis | PROC CANCORR | REDUNDANCY (or ALL) |
| Regression | Squared multiple correlations and F tests | PROC CANCORR | VDEP (or WDEP) SMC (or ALL) |
| RSquareRMSEOnPartial | R-Squares and RMSEs on PARTIAL | PARTIAL | CORR (or ALL) |
| SemiParCorr | Semi-partial correlations | PROC CANCORR | VDEP (or WDEP) SPCORR (or ALL) |
| SimpleStatistics | Simple statistics | PROC CANCORR | SIMPLE (or ALL) |
| SqMultCorr | Canonical redundancy analysis: squared multiple correlations | PROC CANCORR | REDUNDANCY (or ALL) |
| SqParCorr | Squared partial correlations | PROC CANCORR | VDEP (or WDEP) SQPCORR (or ALL) |
| SqSemiParCorr | Squared semi-partial correlations | PROC CANCORR | VDEP (or WDEP) SQSPCORR (or ALL) |
| StdCanCoefV | Standardized canonical coefficients for the VAR variables | PROC CANCORR | default (unless SHORT) |
| StdCanCoefW | Standardized canonical coefficients for the WITH variables | PROC CANCORR | default (unless SHORT) |
| StdErrRawRegCoef | Standard errors of the raw regression coefficients | PROC CANCORR | VDEP (or WDEP) SEB (or ALL) |
| StdRegCoef | Standardized regression coefficients | PROC CANCORR | VDEP (or WDEP) STB (or ALL) |
| StdRegCoefOnPartial | Standardized regression coefficients on PARTIAL | PARTIAL | CORR (or ALL) |
| tValueRegCoef | t values for the regression coefficients | PROC CANCORR | VDEP (or WDEP) T (or ALL) |

For more information on ODS, see Chapter 14, 'Using the Output Delivery System.'




SAS/STAT 9.1 Users Guide, Volumes 1-7
ISBN: 1590472438
EAN: 2147483647
Year: 2004
Pages: 156
