Chapter 27: The FACTOR Procedure


Overview

The FACTOR procedure performs a variety of common factor and component analyses and rotations. Input can be multivariate data, a correlation matrix, a covariance matrix, a factor pattern, or a matrix of scoring coefficients. The procedure can factor either the correlation or covariance matrix, and you can save most results in an output data set.

PROC FACTOR can process output from other procedures. For example, it can rotate the canonical coefficients from multivariate analyses in the GLM procedure.

The methods for factor extraction are principal component analysis, principal factor analysis, iterated principal factor analysis, unweighted least-squares factor analysis, maximum likelihood (canonical) factor analysis, alpha factor analysis, image component analysis, and Harris component analysis. A variety of methods for prior communality estimation is also available.

Specific methods for orthogonal rotation are varimax, quartimax, biquartimax, equamax, parsimax, and factor parsimax. Oblique versions of these methods are also available. In addition, quartimin, biquartimin, and covarimin methods for (direct) oblique rotation are available. General methods for orthogonal rotation are orthomax with user-specified gamma, Crawford-Ferguson family with user-specified weights on variable parsimony and factor parsimony, and generalized Crawford-Ferguson family with user-specified weights. General methods for oblique rotation are direct oblimin with user-specified tau, Crawford-Ferguson family with user-specified weights on variable parsimony and factor parsimony, generalized Crawford-Ferguson family with user-specified weights, promax with user-specified exponent, Harris-Kaiser case II with user-specified exponent, and Procrustean with a user-specified target pattern.

Output includes means, standard deviations, correlations, Kaiser's measure of sampling adequacy, eigenvalues, a scree plot, eigenvectors, prior and final communality estimates, the unrotated factor pattern, residual and partial correlations, the rotated primary factor pattern, the primary factor structure, interfactor correlations, the reference structure, reference axis correlations, the variance explained by each factor both ignoring and eliminating other factors, plots of both rotated and unrotated factors, squared multiple correlation of each factor with the variables, standard error estimates, confidence limits, coverage displays, and scoring coefficients.

Any topics that are not given explicit references are discussed in Mulaik (1972) or Harman (1976).

Background

See Chapter 58, The PRINCOMP Procedure, for a discussion of principal component analysis. See Chapter 19, The CALIS Procedure, for a discussion of confirmatory factor analysis.

Common factor analysis was invented by Spearman (1904). Kim and Mueller (1978a, 1978b) provide a very elementary discussion of the common factor model. Gorsuch (1974) contains a broad survey of factor analysis, and Gorsuch (1974) and Cattell (1978) are useful as guides to practical research methodology. Harman (1976) gives a lucid discussion of many of the more technical aspects of factor analysis, especially oblique rotation. Morrison (1976) and Mardia, Kent, and Bibby (1979) provide excellent statistical treatments of common factor analysis. Mulaik (1972) is the most thorough and authoritative general reference on factor analysis and is highly recommended to anyone familiar with matrix algebra. Stewart (1981) gives a nontechnical presentation of some issues to consider when deciding whether or not a factor analysis may be appropriate.

A frequent source of confusion in the field of factor analysis is the term factor. It sometimes refers to a hypothetical, unobservable variable, as in the phrase common factor. In this sense, factor analysis must be distinguished from component analysis since a component is an observable linear combination. Factor is also used in the sense of matrix factor, in that one matrix is a factor of a second matrix if the first matrix multiplied by its transpose equals the second matrix. In this sense, factor analysis refers to all methods of data analysis using matrix factors, including component analysis and common factor analysis.

A common factor is an unobservable, hypothetical variable that contributes to the variance of at least two of the observed variables. The unqualified term factor often refers to a common factor. A unique factor is an unobservable, hypothetical variable that contributes to the variance of only one of the observed variables. The model for common factor analysis posits one unique factor for each observed variable.

The equation for the common factor model is

  y_ij = x_i1 b_1j + x_i2 b_2j + ... + x_iq b_qj + e_ij

where

  y_ij   is the value of the ith observation on the jth variable

  x_ik   is the value of the ith observation on the kth common factor

  b_kj   is the regression coefficient of the kth common factor for predicting the jth variable

  e_ij   is the value of the ith observation on the jth unique factor

  q      is the number of common factors

It is assumed, for convenience, that all variables have a mean of 0. In matrix terms, these equations reduce to

  Y = XB + E

In the preceding equation, Y is the matrix of observed values, X is the matrix of factor scores, and B' is the factor pattern.

There are two critical assumptions:

  • The unique factors are uncorrelated with each other.

  • The unique factors are uncorrelated with the common factors.

In principal component analysis, the residuals are generally correlated with each other. In common factor analysis, the unique factors play the role of residuals and are defined to be uncorrelated both with each other and with the common factors. Each common factor is assumed to contribute to at least two variables; otherwise, it would be a unique factor.

When the factors are initially extracted, it is also assumed, for convenience, that the common factors are uncorrelated with each other and have unit variance. In this case, the common factor model implies that the covariance s_jk between the jth and kth variables, j ≠ k, is given by

  s_jk = b_1j b_1k + b_2j b_2k + ... + b_qj b_qk

or, in matrix terms,

  S = B'B + U²

where S is the covariance matrix of the observed variables, and U² is the diagonal covariance matrix of the unique factors.

If the original variables are standardized to unit variance, the preceding formula yields correlations instead of covariances. It is in this sense that common factors explain the correlations among the observed variables. The difference between the correlation predicted by the common factor model and the actual correlation is the residual correlation. A good way to assess the goodness-of-fit of the common factor model is to examine the residual correlations.
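As a worked illustration of this diagnostic (with invented loadings, not taken from any example in this chapter): for a one-factor model with standardized variables, the model-implied correlation between two variables is simply the product of their loadings.

```latex
% One-factor model, standardized variables: the implied correlation is
% the product of the two loadings (illustrative values only).
\hat{\rho}_{jk} = b_{1j}\,b_{1k} = 0.8 \times 0.6 = 0.48
% If the observed correlation is r_{jk} = 0.52, the residual correlation is
r_{jk} - \hat{\rho}_{jk} = 0.52 - 0.48 = 0.04
% A residual this small suggests one factor reproduces this correlation well.
```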

The common factor model implies that the partial correlations among the variables, removing the effects of the common factors, must all be 0. When the common factors are removed, only unique factors, which are by definition uncorrelated, remain.

The assumptions of common factor analysis imply that the common factors are, in general, not linear combinations of the observed variables. In fact, even if the data contain measurements on the entire population of observations, you cannot compute the scores of the observations on the common factors. Although the common factor scores cannot be computed directly, they can be estimated in a variety of ways.

The problem of factor score indeterminacy has led several factor analysts to propose methods yielding components that can be considered approximations to common factors. Since these components are defined as linear combinations, they are computable. The methods include Harris component analysis and image component analysis. The advantage of producing determinate component scores is offset by the fact that, even if the data fit the common factor model perfectly, component methods do not generally recover the correct factor solution. You should not use any type of component analysis if you really want a common factor analysis (Dziuban and Harris 1973; Lee and Comrey 1979).

After the factors are estimated, it is necessary to interpret them. Interpretation usually means assigning to each common factor a name that reflects the salience of the factor in predicting each of the observed variables, that is, the coefficients in the pattern matrix corresponding to the factor. Factor interpretation is a subjective process. It can sometimes be made less subjective by rotating the common factors, that is, by applying a nonsingular linear transformation. A rotated pattern matrix in which all the coefficients are close to 0 or ±1 is easier to interpret than a pattern with many intermediate elements. Therefore, most rotation methods attempt to optimize a simplicity function of the rotated pattern matrix that measures, in some sense, how close the elements are to 0 or ±1. Because the loading estimates are subject to sampling variability, it is useful to obtain standard error estimates for the loadings to assess the uncertainty due to random sampling. Notice that the salience of a factor loading refers to the magnitude of the loading, while statistical significance refers to the statistical evidence against a particular hypothetical value. A loading significantly different from 0 is not automatically salient. For example, if salience is defined as a magnitude greater than 0.4 while the entire 95% confidence interval for a loading lies between 0.1 and 0.3, the loading is statistically significantly larger than 0, but it is not salient. Under the maximum likelihood method, you can obtain standard errors and confidence intervals for judging the salience of factor loadings.

After the initial factor extraction, the common factors are uncorrelated with each other. If the factors are rotated by an orthogonal transformation, the rotated factors are also uncorrelated. If the factors are rotated by an oblique transformation, the rotated factors become correlated. Oblique rotations often produce more useful patterns than do orthogonal rotations. However, a consequence of correlated factors is that there is no single unambiguous measure of the importance of a factor in explaining a variable. Thus, for oblique rotations, the pattern matrix does not provide all the necessary information for interpreting the factors; you must also examine the factor structure and the reference structure.

Rotating a set of factors does not change the statistical explanatory power of the factors. You cannot say that any rotation is better than any other rotation from a statistical point of view; all rotations, orthogonal or oblique, are equally good statistically. Therefore, the choice among different rotations must be based on nonstatistical grounds. For most applications, the preferred rotation is that which is most easily interpretable, or which is most compatible with substantive theories.

If two rotations give rise to different interpretations, those two interpretations must not be regarded as conflicting. Rather, they are two different ways of looking at the same thing, two different points of view in the common-factor space. Any conclusion that depends on one and only one rotation being correct is invalid.

Outline of Use

Principal Component Analysis

One important type of analysis performed by the FACTOR procedure is principal component analysis. The statements

  proc factor;
  run;

result in a principal component analysis. The output includes all the eigenvalues and the pattern matrix for eigenvalues greater than one.

Most applications require additional output. For example, you may want to compute principal component scores for use in subsequent analyses or obtain a graphical aid to help decide how many components to keep. You can save the results of the analysis in a permanent SAS data library by using the OUTSTAT= option. (Refer to the SAS Language Reference: Dictionary for more information on permanent SAS data libraries and librefs.) Assuming that your SAS data library has the libref save and that the data are in a SAS data set called raw , you could do a principal component analysis as follows:

  proc factor data=raw method=principal scree mineigen=0 score
              outstat=save.fact_all;
  run;

The SCREE option produces a plot of the eigenvalues that is helpful in deciding how many components to use. The MINEIGEN=0 option causes all components with variance greater than zero to be retained. The SCORE option requests that scoring coefficients be computed. The OUTSTAT= option saves the results in a specially structured SAS data set. The name of the data set, in this case fact_all , is arbitrary. To compute principal component scores, use the SCORE procedure:

  proc score data=raw score=save.fact_all out=save.scores;
  run;

The SCORE procedure uses the data and the scoring coefficients that are saved in save.fact_all to compute principal component scores. The component scores are placed in variables named Factor1, Factor2, ..., Factorn and are saved in the data set save.scores. If you know ahead of time how many principal components you want to use, you can obtain the scores directly from PROC FACTOR by specifying the NFACTORS= and OUT= options. To get scores from three principal components, specify

  proc factor data=raw method=principal
              nfactors=3 out=save.scores;
  run;

To plot the scores for the first three components, use the PLOT procedure:

  proc plot;
     plot factor2*factor1 factor3*factor1 factor3*factor2;
  run;

Principal Factor Analysis

The simplest and computationally most efficient method of common factor analysis is principal factor analysis, which is obtained the same way as principal component analysis except for the use of the PRIORS= option. The usual form of the initial analysis is

  proc factor data=raw method=principal scree
              mineigen=0 priors=smc outstat=save.fact_all;
  run;

The squared multiple correlations (SMC) of each variable with all the other variables are used as the prior communality estimates. If your correlation matrix is singular, you should specify PRIORS=MAX instead of PRIORS=SMC. The SCREE and MINEIGEN= options serve the same purpose as in the preceding principal component analysis. Saving the results with the OUTSTAT= option enables you to examine the eigenvalues and scree plot before deciding how many factors to rotate and to try several different rotations without re-extracting the factors. The OUTSTAT= data set is automatically marked TYPE=FACTOR, so the FACTOR procedure realizes that it contains statistics from a previous analysis instead of raw data.
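For example, if the correlation matrix is singular (as when one variable is an exact linear combination of others), the squared multiple correlations are undefined and PRIORS=SMC cannot be used. A minimal variant of the preceding step, assuming the same raw data set and libref:

```sas
proc factor data=raw method=principal scree
            mineigen=0 priors=max outstat=save.fact_all;
run;
```

PRIORS=MAX uses each variable's maximum absolute correlation with any other variable as its prior communality estimate.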

After looking at the eigenvalues to estimate the number of factors, you can try some rotations. Two and three factors can be rotated with the following statements:

  proc factor data=save.fact_all method=principal n=2
              rotate=promax reorder score outstat=save.fact_2;
  proc factor data=save.fact_all method=principal n=3
              rotate=promax reorder score outstat=save.fact_3;
  run;

The output data set from the previous run is used as input for these analyses. The options N=2 and N=3 specify the number of factors to be rotated. The specification ROTATE=PROMAX requests a promax rotation, which has the advantage of providing both orthogonal and oblique rotations with only one invocation of PROC FACTOR. The REORDER option causes the variables to be reordered in the output so that variables associated with the same factor appear next to each other.

You can now compute and plot factor scores for the two-factor promax-rotated solution as follows:

  proc score data=raw score=save.fact_2 out=save.scores;
  proc plot;
     plot factor2*factor1;
  run;

Maximum Likelihood Factor Analysis

Although principal factor analysis is perhaps the most commonly used method of common factor analysis, most statisticians prefer maximum likelihood (ML) factor analysis (Lawley and Maxwell 1971). The ML method of estimation has desirable asymptotic properties (Bickel and Doksum 1977) and produces better estimates than principal factor analysis in large samples. You can test hypotheses about the number of common factors using the ML method. You can also obtain standard error and confidence interval estimates for many classes of rotated or unrotated factor loadings, factor correlations, and structure loadings under the ML theory.

The unrotated ML solution is equivalent to Rao's (1955) canonical factor solution and Howe's solution maximizing the determinant of the partial correlation matrix (Morrison 1976). Thus, as a descriptive method, ML factor analysis does not require a multivariate normal distribution. The validity of Bartlett's χ² test for the number of factors does require approximate normality plus additional regularity conditions that are usually satisfied in practice (Geweke and Singleton 1980).

Lawley and Maxwell (1971) derive the standard error formulas for unrotated loadings, while Archer and Jennrich (1973) and Jennrich (1973, 1974) derive the standard error formulas for several classes of rotated solutions. Extended results appear in Browne, Cudeck, Tateneni, and Mels (1998), Hayashi and Yung (1999), and Yung and Hayashi (2001). A combination of these methods is used to compute standard errors in an efficient manner. Confidence intervals are computed using the asymptotic normality of the estimates. To ensure that the confidence intervals are range respecting, transformation methods due to Browne (1982) are used. The validity of the standard error estimates and confidence limits requires the assumptions of multivariate normality and a fixed number of factors.

The ML method is more computationally demanding than principal factor analysis for two reasons. First, the communalities are estimated iteratively, and each iteration takes about as much computer time as principal factor analysis. The number of iterations typically ranges from about five to twenty. Second, if you want to extract different numbers of factors, as is often the case, you must run the FACTOR procedure once for each number of factors. Therefore, an ML analysis can take 100 times as long as a principal factor analysis. This does not include the time for computing standard error estimates, which is even more computationally demanding. For analyses with fewer than 35 variables, the computing time for the ML method, including the computation of standard errors, usually ranges from a few seconds to well under a minute. This seems to be a reasonable performance.

You can use principal factor analysis to get a rough idea of the number of factors before doing an ML analysis. If you think that there are between one and three factors, you can use the following statements for the ML analysis:

  proc factor data=raw method=ml n=1
              outstat=save.fact1;
  run;
  proc factor data=raw method=ml n=2 rotate=promax
              outstat=save.fact2;
  run;
  proc factor data=raw method=ml n=3 rotate=promax
              outstat=save.fact3;
  run;

The output data sets can be used for trying different rotations, computing scoring coefficients, or restarting the procedure in case it does not converge within the allotted number of iterations.
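Because these OUTSTAT= data sets are TYPE=FACTOR, you can feed one back to PROC FACTOR to try a different rotation without repeating the ML extraction, just as in the principal factor example earlier. A sketch, assuming the two-factor run above completed successfully:

```sas
/* Rotate the saved two-factor ML solution without re-extraction */
proc factor data=save.fact2 method=ml n=2 rotate=quartimin;
run;
```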

If you can determine how many factors should be retained before an analysis, you can get the standard errors and confidence limits to aid interpretations for the ML analysis:

  proc factor data=raw method=ml n=3 rotate=quartimin se
              cover=.4;
  run;

In the analysis, you define salience as a magnitude greater than 0.4. You can then use the coverage displays to determine the salience. See the section Confidence Intervals and the Salience of Factor Loadings on page 1327 for more details.

The ML method cannot be used with a singular correlation matrix, and it is especially prone to Heywood cases. (See the section Heywood Cases and Other Anomalies on page 1332 for a discussion of Heywood cases.) If you have problems with ML, the best alternative is to use the METHOD=ULS option for unweighted least-squares factor analysis.
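For example, a ULS analysis paralleling the three-factor ML run above might look like the following sketch (the output data set name fact3u is arbitrary):

```sas
/* Unweighted least-squares extraction as a fallback for ML problems */
proc factor data=raw method=uls n=3 rotate=promax
            outstat=save.fact3u;
run;
```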

Factor Rotation

After the initial factor extraction, the factors are uncorrelated with each other. If the factors are rotated by an orthogonal transformation, the rotated factors are also uncorrelated. If the factors are rotated by an oblique transformation, the rotated factors become correlated. Oblique rotations often produce more useful patterns than do orthogonal rotations. However, a consequence of correlated factors is that there is no single unambiguous measure of the importance of a factor in explaining a variable. Thus, for oblique rotations, the pattern matrix does not provide all the necessary information for interpreting the factors; you must also examine the factor structure and the reference structure.

Nowadays, most rotations are done analytically. There are many choices for orthogonal and oblique rotations. An excellent summary of a wide class of analytic rotations is in Crawford and Ferguson (1970). The Crawford-Ferguson family of orthogonal rotations includes the orthomax rotation as a subclass and the popular varimax rotation as a special case. For example, assuming that there are nine variables in the analysis, the following four specifications for orthogonal rotations give the same results:

  /* Orthogonal Crawford-Ferguson Family with
     variable parsimony weight = 8,
     factor parsimony weight = 1 */
  proc factor data=raw method=ml n=3 rotate=orthcf(8,1);
  run;
  /* Orthomax without the GAMMA= option */
  proc factor data=raw method=ml n=3 rotate=orthomax(1);
  run;
  /* Orthomax with the GAMMA= option */
  proc factor data=raw method=ml n=3 rotate=orthomax gamma=1;
  run;
  /* Varimax */
  proc factor data=raw method=ml n=3 rotate=varimax;
  run;

You can also get the oblique versions of the varimax in two equivalent ways:

  /* Oblique Crawford-Ferguson Family with
     variable parsimony weight = 8,
     factor parsimony weight = 1 */
  proc factor data=raw method=ml n=3 rotate=oblicf(8,1);
  run;
  /* Oblique Varimax */
  proc factor data=raw method=ml n=3 rotate=obvarimax;
  run;

Jennrich (1973) proposes a generalized Crawford-Ferguson family that includes the Crawford-Ferguson family and the (direct) oblimin family (refer to Harman 1976) as subclasses. The more well-known quartimin rotation is a special case of the oblimin class, and hence a special case of the generalized Crawford-Ferguson family. For example, the following four specifications of oblique rotations are equivalent:

  /* Oblique generalized Crawford-Ferguson Family
     with weights 0, 1, 0, -1 */
  proc factor data=raw method=ml n=3 rotate=obligencf(0,1,0,-1);
  run;
  /* Oblimin family without the TAU= option */
  proc factor data=raw method=ml n=3 rotate=oblimin(0);
  run;
  /* Oblimin family with the TAU= option */
  proc factor data=raw method=ml n=3 rotate=oblimin tau=0;
  run;
  /* Quartimin */
  proc factor data=raw method=ml n=3 rotate=quartimin;
  run;

In addition to the generalized Crawford-Ferguson family, the available oblique rotation methods include Harris-Kaiser, promax, and Procrustean. See the section Simplicity Functions for Rotations on page 1329 for details about the definitions of various rotations. Refer to Harman (1976) and Mulaik (1972) for further information.




SAS/STAT 9.1 User's Guide, Volume 2. SAS Institute Inc., 2004.
