Details


Missing Values

Observations with missing values for any variable in the VAR, PARTIAL, FREQ, or WEIGHT statement are omitted from the analysis and are given missing values for principal component scores in the OUT= data set. If a correlation, covariance, or SSCP matrix is read, it can contain missing values as long as every pair of variables has at least one nonmissing entry.
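The omission rule amounts to listwise deletion. The Python sketch below illustrates it under two assumptions of our own: missing values are represented as None, and the hypothetical variables x and y stand for VAR variables while w stands for a WEIGHT variable. It is an illustration of the rule, not PROC PRINCOMP's implementation.

```python
from typing import Optional

# A row maps variable names to values; None stands in for a SAS missing value
# (an assumption of this sketch).
Row = dict[str, Optional[float]]


def is_complete(row: Row, analysis_vars: list[str]) -> bool:
    """True if no variable used in VAR, PARTIAL, FREQ, or WEIGHT is missing."""
    return all(row.get(name) is not None for name in analysis_vars)


def listwise_split(rows: list[Row], analysis_vars: list[str]) -> tuple[list[Row], list[int]]:
    """Return the rows used in the analysis and the indices of omitted rows.

    Omitted rows would still appear in an OUT= data set, but with missing
    principal component scores.
    """
    used = [row for row in rows if is_complete(row, analysis_vars)]
    omitted = [i for i, row in enumerate(rows) if not is_complete(row, analysis_vars)]
    return used, omitted


# Hypothetical data: x and y play the role of VAR variables, w a WEIGHT variable.
data: list[Row] = [
    {"x": 1.0, "y": 2.0, "w": 1.0},
    {"x": None, "y": 3.0, "w": 1.0},  # missing VAR value -> omitted
    {"x": 4.0, "y": 5.0, "w": None},  # missing WEIGHT value -> omitted
    {"x": 6.0, "y": 7.0, "w": 2.0},
]
used, omitted = listwise_split(data, ["x", "y", "w"])
print(len(used), omitted)  # 2 [1, 2]
```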

Output Data Sets

OUT= Data Set

The OUT= data set contains all the variables in the original data set plus new variables containing the principal component scores. The N= option determines the number of new variables. The names of the new variables are formed by concatenating the value of the PREFIX= option (or Prin if PREFIX= is omitted) with the numbers 1, 2, 3, and so on. The new variables have mean 0 and variance equal to the corresponding eigenvalue, unless you specify the STANDARD option to standardize the scores to unit variance. Also, if you specify the COV option, the procedure computes the principal component scores from the centered variables (or, if the NOINT option is also specified, from the uncentered variables) rather than from the standardized variables.
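To illustrate the claim about score variances, here is a small Python sketch for the two-variable case, where the eigenvalues of a correlation matrix have a closed form. The data values are hypothetical, the variances use the n - 1 divisor, and dividing each score by the square root of its eigenvalue stands in for the effect of the STANDARD option; this is not PROC PRINCOMP's implementation.

```python
import math
import statistics

# Hypothetical raw data for two VAR variables.
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0]
n = len(x1)


def standardize(xs: list[float]) -> list[float]:
    """Center and scale to unit variance (divisor n - 1)."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [(x - m) / s for x in xs]


z1, z2 = standardize(x1), standardize(x2)

# Correlation of the two variables (about 0.8 for this data).
r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)

# For the 2 x 2 correlation matrix [[1, r], [r, 1]] the eigenvalues are 1 + r and
# 1 - r, with unit-length eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
# Because r > 0 here, 1 + r is the larger eigenvalue and belongs to Prin1.
eigenvalues = [1 + r, 1 - r]
h = 1 / math.sqrt(2)
eigenvectors = [(h, h), (h, -h)]

# Default scores: unit-length eigenvectors applied to the standardized data.
scores = [[e1 * a + e2 * b for a, b in zip(z1, z2)] for e1, e2 in eigenvectors]

# STANDARD-style rescaling: divide each score by sqrt(eigenvalue).
std_scores = [[s / math.sqrt(lam) for s in col] for col, lam in zip(scores, eigenvalues)]

for k, (col, lam) in enumerate(zip(scores, eigenvalues), start=1):
    print(f"Prin{k}: variance={statistics.variance(col):.6f} eigenvalue={lam:.6f}")
```

Each score column has mean 0 and a variance equal to its eigenvalue (1.8 and 0.2 here), while the rescaled columns have unit variance.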

If you use a PARTIAL statement, the OUT= data set also contains the residuals from predicting the VAR variables from the PARTIAL variables. The names of the residual variables are formed by prefixing R_ to the names of the VAR variables.
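The naming rules for the score and residual variables can be expressed as two small Python helpers. The helper names, the prefix Comp, and the variable names Height and Weight are hypothetical; SAS naming limits such as maximum name length are not modeled.

```python
def score_names(n_components: int, prefix: str = "Prin") -> list[str]:
    """Names of the score variables: PREFIX= value (default Prin) + 1, 2, 3, ..."""
    return [f"{prefix}{i}" for i in range(1, n_components + 1)]


def residual_names(var_names: list[str]) -> list[str]:
    """Names of the PARTIAL residual variables: R_ prefixed to each VAR name."""
    return ["R_" + name for name in var_names]


print(score_names(3))                        # ['Prin1', 'Prin2', 'Prin3']
print(score_names(2, prefix="Comp"))         # ['Comp1', 'Comp2']
print(residual_names(["Height", "Weight"]))  # ['R_Height', 'R_Weight']
```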

An OUT= data set cannot be created if the DATA= data set is TYPE=ACE, TYPE=CORR, TYPE=COV, TYPE=EST, TYPE=FACTOR, TYPE=SSCP, TYPE=UCORR, or TYPE=UCOV.

OUTSTAT= Data Set

The OUTSTAT= data set is similar to the TYPE=CORR data set produced by the CORR procedure. The following table relates the TYPE= value for the OUTSTAT= data set to the options specified in the PROC PRINCOMP statement.

Options        TYPE=
(default)      CORR
COV            COV
NOINT          UCORR
COV NOINT      UCOV

Notice that the default (neither the COV nor the NOINT option) produces a TYPE=CORR data set.
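For illustration, the same mapping from options to TYPE= value can be written as a Python lookup table (the names OUTSTAT_TYPE and outstat_type are hypothetical):

```python
# (COV specified, NOINT specified) -> TYPE= of the OUTSTAT= data set
OUTSTAT_TYPE = {
    (False, False): "CORR",  # default
    (True, False): "COV",
    (False, True): "UCORR",
    (True, True): "UCOV",
}


def outstat_type(cov: bool = False, noint: bool = False) -> str:
    """Return the TYPE= value implied by the COV and NOINT options."""
    return OUTSTAT_TYPE[(cov, noint)]


print(outstat_type())                      # CORR
print(outstat_type(cov=True, noint=True))  # UCOV
```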

The new data set contains the following variables:

  • the BY variables, if any

  • two new variables, _TYPE_ and _NAME_, both character variables

  • the variables analyzed, that is, those in the VAR statement; or, if there is no VAR statement, all numeric variables not listed in any other statement; or, if there is a PARTIAL statement, the residual variables as described under the OUT= data set

Each observation in the new data set contains some type of statistic, as indicated by the _TYPE_ variable. The values of the _TYPE_ variable are as follows:

_TYPE_

MEAN

mean of each variable. If you specify the PARTIAL statement, this observation is omitted.

STD

standard deviations. If you specify the COV option, this observation is omitted, so the SCORE procedure does not standardize the variables before computing scores. If you use the PARTIAL statement, the standard deviation of a variable is computed as its root mean squared error as predicted from the PARTIAL variables.

USTD

uncorrected standard deviations. When you specify the NOINT option in the PROC PRINCOMP statement, the OUTSTAT= data set contains standard deviations not corrected for the mean. However, if you also specify the COV option in the PROC PRINCOMP statement, this observation is omitted.

N

number of observations on which the analysis is based. This value is the same for each variable. If you specify the PARTIAL statement and the value of the VARDEF= option is DF or unspecified, then the number of observations is decremented by the degrees of freedom for the PARTIAL variables.

SUMWGT

the sum of the weights of the observations. This value is the same for each variable. If you specify the PARTIAL statement and VARDEF=WDF, then the sum of the weights is decremented by the degrees of freedom for the PARTIAL variables. This observation is output only if the value is different from that in the observation with _TYPE_='N'.

CORR

correlations between each variable and the variable specified by the _NAME_ variable. The number of observations with _TYPE_='CORR' is equal to the number of variables being analyzed. If you specify the COV option, no _TYPE_='CORR' observations are produced. If you use the PARTIAL statement, the partial correlations, not the raw correlations, are output.

UCORR

uncorrected correlation matrix. When you specify the NOINT option without the COV option in the PROC PRINCOMP statement, the OUTSTAT= data set contains a matrix of correlations not corrected for the means. However, if you also specify the COV option in the PROC PRINCOMP statement, this observation is omitted.

COV

covariances between each variable and the variable specified by the _NAME_ variable. _TYPE_='COV' observations are produced only if you specify the COV option. If you use the PARTIAL statement, the partial covariances, not the raw covariances, are output.

UCOV

uncorrected covariance matrix. When you specify the NOINT and COV options in the PROC PRINCOMP statement, the OUTSTAT= data set contains a matrix of covariances not corrected for the means.

EIGENVAL

eigenvalues. If the N= option requests fewer than the maximum number of principal components, only the specified number of eigenvalues is produced, with missing values filling out the observation.

SCORE

eigenvectors. The _NAME_ variable contains the name of the corresponding principal component as constructed from the PREFIX= option. The number of observations with _TYPE_='SCORE' equals the number of principal components computed. The eigenvectors have unit length unless you specify the STD option, in which case the unit-length eigenvectors are divided by the square roots of the eigenvalues to produce scores with unit standard deviations.

To obtain the principal component scores when the COV option is not specified, multiply these coefficients by the standardized data. With the COV option, multiply them by the centered data. Use the means from the observation with _TYPE_='MEAN' and the standard deviations from the observation with _TYPE_='STD' to center and standardize the data.
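The scoring rule can be sketched in Python for a single observation. The function name and the statistics shown are hypothetical stand-ins for values read from an OUTSTAT= data set; when std is None, the sketch follows the COV case and only centers the data. PARTIAL residuals and FREQ/WEIGHT handling are not modeled.

```python
import math
from typing import Optional


def component_scores(
    x: list[float],
    mean: list[float],
    std: Optional[list[float]],
    score_rows: list[list[float]],
) -> list[float]:
    """Score one observation from OUTSTAT=-style statistics.

    x          -- raw values of the analyzed variables
    mean       -- values from the _TYPE_='MEAN' observation
    std        -- values from the _TYPE_='STD' observation, or None when the
                  COV option was used (no STD observation; data are only centered)
    score_rows -- one list of coefficients per _TYPE_='SCORE' observation
    """
    if std is None:
        z = [xi - mi for xi, mi in zip(x, mean)]
    else:
        z = [(xi - mi) / si for xi, mi, si in zip(x, mean, std)]
    # Each component score is the dot product of its coefficients with z.
    return [sum(c * zi for c, zi in zip(coefs, z)) for coefs in score_rows]


# Hypothetical statistics for two variables (not output from a real run).
mean = [6.0, 3.0]
std = [math.sqrt(10.0), math.sqrt(2.5)]
h = 1 / math.sqrt(2)
score_rows = [[h, h], [h, -h]]

print(component_scores([10.0, 4.0], mean, std, score_rows))
```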

USCORE

scoring coefficients to be applied without subtracting the mean from the raw variables. _TYPE_='USCORE' observations are produced when you specify the NOINT option in the PROC PRINCOMP statement.

To obtain the principal component scores, multiply these coefficients by the data standardized by the uncorrected standard deviations from the observation with _TYPE_='USTD'.
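A corresponding Python sketch for USCORE coefficients, again with hypothetical names and values: the raw values are divided by the uncorrected standard deviations, and no mean is subtracted.

```python
def uncorrected_scores(
    x: list[float], ustd: list[float], uscore_rows: list[list[float]]
) -> list[float]:
    """Score one observation with USCORE-style coefficients.

    The raw values are divided by the uncorrected standard deviations
    (_TYPE_='USTD'); no mean is subtracted.
    """
    z = [xi / si for xi, si in zip(x, ustd)]
    return [sum(c * zi for c, zi in zip(coefs, z)) for coefs in uscore_rows]


# Hypothetical values: z = [3/1.5, 4/2] = [2, 2], so the score is 0.6*2 + 0.8*2.
print(uncorrected_scores([3.0, 4.0], [1.5, 2.0], [[0.6, 0.8]]))
```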

RSQUARED

R-squares for each VAR variable as predicted by the PARTIAL variables.

B

regression coefficients for each VAR variable as predicted by the PARTIAL variables. This observation is produced only if you specify the COV option.

STB

standardized regression coefficients for each VAR variable as predicted by the PARTIAL variables. If you specify the COV option, this observation is omitted.

The data set can be used with the SCORE procedure to compute principal component scores, or it can be used as input to the FACTOR procedure specifying METHOD=SCORE to rotate the components. If you use the PARTIAL statement, the scoring coefficients should be applied to the residuals, not the original variables.

Computational Resources

Let

  • n = number of observations

  • v = number of VAR variables

  • p = number of PARTIAL variables

  • c = number of components

  • The minimum allocated memory required is [formula not shown] bytes.

  • The time required to compute the correlation matrix is roughly proportional to [formula not shown].

  • The time required to compute eigenvalues is roughly proportional to v^3.

  • The time required to compute eigenvectors is roughly proportional to cv^2.
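The two growth rates for eigenvalues (v^3) and eigenvectors (cv^2) can be turned into rough what-if ratios. This Python sketch only compares cost units under those proportionalities; it ignores the unknown constants and the correlation-matrix cost, and the sizes v = 20, 40 and c = 2, 20 are arbitrary.

```python
def eigen_costs(v: int, c: int) -> tuple[int, int]:
    """(eigenvalue cost ~ v**3, eigenvector cost ~ c * v**2) in arbitrary units."""
    return v**3, c * v**2


base_val, base_vec = eigen_costs(v=20, c=20)
big_val, _ = eigen_costs(v=40, c=20)
_, few_vec = eigen_costs(v=20, c=2)

print(big_val / base_val)  # 8.0: doubling v multiplies eigenvalue time by about 8
print(few_vec / base_vec)  # 0.1: computing 2 of 20 eigenvectors takes about a tenth
```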

Displayed Output

The PRINCOMP procedure displays the following items if the DATA= data set is not TYPE=CORR, TYPE=COV, TYPE=SSCP, TYPE=UCORR, or TYPE=UCOV:

  • Simple Statistics, including the Mean and Std (standard deviation) for each variable. If you specify the NOINT option, the uncorrected standard deviation (UStD) is displayed.

  • the Correlation Matrix or, if you specify the COV option, the Covariance Matrix

The PRINCOMP procedure displays the following items if you use the PARTIAL statement.

  • Regression Statistics, giving the R-square and RMSE (root mean square error) for each VAR variable as predicted by the PARTIAL variables (not shown)

  • Standardized Regression Coefficients or, if you specify the COV option, Regression Coefficients for predicting the VAR variables from the PARTIAL variables (not shown)

  • the Partial Correlation Matrix or, if you specify the COV option, the Partial Covariance Matrix (not shown)

The PRINCOMP procedure displays the following item if you specify the COV option:

  • the Total Variance

The PRINCOMP procedure displays the following items unless you specify the NOPRINT option:

  • Eigenvalues of the correlation or covariance matrix, as well as the Difference between successive eigenvalues, the Proportion of variance explained by each eigenvalue, and the Cumulative proportion of variance explained

  • the Eigenvectors
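The Difference, Proportion, and Cumulative columns follow directly from the eigenvalues. The Python sketch below assumes the full set of eigenvalues is available in descending order; the proportion divides each eigenvalue by their sum, which equals the trace of the matrix (for a correlation matrix, the number of variables). The function name and the eigenvalues used are hypothetical.

```python
from typing import Optional


def eigenvalue_table(
    eigenvalues: list[float],
) -> list[tuple[float, Optional[float], float, float]]:
    """Rows of (Eigenvalue, Difference, Proportion, Cumulative).

    eigenvalues must be sorted in descending order. Difference is the gap to
    the next eigenvalue (None for the last one); Proportion divides by the sum
    of all eigenvalues.
    """
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for i, lam in enumerate(eigenvalues):
        diff = lam - eigenvalues[i + 1] if i + 1 < len(eigenvalues) else None
        proportion = lam / total
        cumulative += proportion
        rows.append((lam, diff, proportion, cumulative))
    return rows


# Hypothetical eigenvalues of a 3 x 3 correlation matrix (they sum to 3).
for row in eigenvalue_table([2.1, 0.6, 0.3]):
    print(row)
```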

ODS Table Names

PROC PRINCOMP assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table.

For more information on ODS, see Chapter 14, Using the Output Delivery System.

Table 58.1: ODS Tables Produced in PROC PRINCOMP

ODS Table Name      Description                                                   Statement / Option
Corr                Correlation Matrix                                            default unless COV is specified
Cov                 Covariance Matrix                                             default if COV is specified
Eigenvalues         Eigenvalues                                                   default
Eigenvectors        Eigenvectors                                                  default
NObsNVar            Number of Observations, Variables and (Partial) Variables     default
ParCorr             Partial Correlation Matrix                                    PARTIAL statement
ParCov              Uncorrected Partial Covariance Matrix                         PARTIAL statement and COV
RegCoef             Regression Coefficients                                       PARTIAL statement and COV
RSquareRMSE         Regression Statistics: R-Squares and RMSEs                    PARTIAL statement
SimpleStatistics    Simple Statistics                                             default
StdRegCoef          Standardized Regression Coefficients                          PARTIAL statement
TotalVariance       Total Variance                                                PROC PRINCOMP COV

ODS Graphics (Experimental)

This section describes the use of ODS for creating graphics with the PRINCOMP procedure. These graphics are experimental in this release, meaning that both the graphical results and the syntax for specifying them are subject to change in a future release.

To request these graphs, you must specify the ODS GRAPHICS statement. For more information on the ODS GRAPHICS statement, see Chapter 15, Statistical Graphics Using ODS.

You can specify the N= option in the PROC PRINCOMP statement to control the number of principal components to be displayed.

ODS Graph Names

PROC PRINCOMP assigns a name to each graph it creates using ODS. You can use these names to reference the graphs when using ODS. The names are listed in Table 58.2.

Table 58.2: ODS Graphics Produced by PROC PRINCOMP

ODS Graph Name              Plot Description                                                 Statement
EigenvaluePlot              Eigenvalues and Proportion Plot                                  default
PaintedPrinCompScoresPlot   Painted Component Scores Plot: 2nd versus 3rd, painted by 1st    default and nvar* >= 3
PrinCompMatrixPlot          Component Scores Matrix Plot                                     default and nvar >= 2
PrinCompPatternPlot         Component Pattern Plot                                           default
PrinCompScoresPlot12        Component Scores Plot: 1st versus 2nd                            default and nvar >= 2
PrinCompScoresPlot13        Component Scores Plot: 1st versus 3rd                            default and nvar >= 3

* nvar = number of variables to be analyzed
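The conditions in Table 58.2 can be restated as a Python function that lists the default graphs for a given number of analyzed variables. This is only a restatement of the table under the assumption that ODS GRAPHICS is enabled; the function name is hypothetical, and any effect of the N= option is not modeled.

```python
def default_graphs(nvar: int) -> list[str]:
    """ODS graph names produced by default for nvar analyzed variables,
    following the conditions in Table 58.2."""
    graphs = ["EigenvaluePlot", "PrinCompPatternPlot"]
    if nvar >= 2:
        graphs += ["PrinCompMatrixPlot", "PrinCompScoresPlot12"]
    if nvar >= 3:
        graphs += ["PaintedPrinCompScoresPlot", "PrinCompScoresPlot13"]
    return graphs


print(default_graphs(2))
```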





SAS.STAT 9.1 Users Guide (Vol. 5)
Year: 2004
Pages: 98
