Syntax


The following statements are available in PROC PRINCOMP.

  • PROC PRINCOMP < options > ;

    • BY variables ;

    • FREQ variable ;

    • PARTIAL variables ;

    • VAR variables ;

    • WEIGHT variable ;

Usually only the VAR statement is used in addition to the PROC PRINCOMP statement. The rest of this section provides detailed syntax information for each of the preceding statements, beginning with the PROC PRINCOMP statement. The remaining statements are described in alphabetical order.

PROC PRINCOMP Statement

  • PROC PRINCOMP < options > ;

The PROC PRINCOMP statement starts the PRINCOMP procedure and, optionally, identifies input and output data sets, specifies details of the analysis, or suppresses the display of output. You can specify the following options in the PROC PRINCOMP statement.

Tasks and the options that perform them:

  • Specify data sets: DATA=, OUT=, OUTSTAT=

  • Specify details of analysis: COV, N=, NOINT, PREFIX=, SINGULAR=, STD, VARDEF=

  • Suppress the display of output: NOPRINT

The following list provides details on these options.

COVARIANCE

COV

  • computes the principal components from the covariance matrix. If you omit the COV option, the correlation matrix is analyzed. Use of the COV option causes variables with large variances to be more strongly associated with components with large eigenvalues and causes variables with small variances to be more strongly associated with components with small eigenvalues. You should not specify the COV option unless the units in which the variables are measured are comparable or the variables are standardized in some way.
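The effect of the COV option can be illustrated outside SAS. The following is a minimal Python sketch, not PROC PRINCOMP code: the data values and the helper names `covariance_2x2` and `first_component` are made up for illustration, and the eigenvector uses the closed form for a symmetric 2x2 matrix.

```python
import math

def covariance_2x2(x, y):
    """Return (var_x, cov_xy, var_y) using the n - 1 divisor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return var_x, cov_xy, var_y

def first_component(var_x, cov_xy, var_y):
    """Largest eigenvalue and unit eigenvector of [[var_x, cov_xy], [cov_xy, var_y]].

    Closed form for a symmetric 2x2 matrix; assumes cov_xy != 0.
    """
    half_trace = (var_x + var_y) / 2
    radius = math.hypot((var_x - var_y) / 2, cov_xy)
    eigenvalue = half_trace + radius
    vx, vy = cov_xy, eigenvalue - var_x
    norm = math.hypot(vx, vy)
    return eigenvalue, (vx / norm, vy / norm)

# Hypothetical data in very different units: metres and grams.
height_m = [1.60, 1.70, 1.80, 1.90]
weight_g = [55000.0, 65000.0, 72000.0, 90000.0]

var_h, cov_hw, var_w = covariance_2x2(height_m, weight_g)
_, cov_vector = first_component(var_h, cov_hw, var_w)

# The correlation matrix has unit diagonal and r off the diagonal.
r = cov_hw / math.sqrt(var_h * var_w)
_, corr_vector = first_component(1.0, r, 1.0)

print("covariance loadings: ", cov_vector)
print("correlation loadings:", corr_vector)
```

In this sketch the first component of the covariance matrix is almost entirely the high-variance weight variable, while the first component of the correlation matrix loads equally on both variables.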

DATA= SAS-data-set

  • specifies the SAS data set to be analyzed. The data set can be an ordinary SAS data set or a TYPE=ACE, TYPE=CORR, TYPE=COV, TYPE=FACTOR, TYPE=SSCP, TYPE=UCORR, or TYPE=UCOV data set (see Appendix A, Special SAS Data Sets). Also, the PRINCOMP procedure can read the _TYPE_='COVB' matrix from a TYPE=EST data set. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

N= number

  • specifies the number of principal components to be computed. The default is the number of variables. The value of the N= option must be an integer greater than or equal to zero.

NOINT

  • omits the intercept from the model. In other words, the NOINT option requests that the covariance or correlation matrix not be corrected for the mean. As a result, the standard deviations that the procedure reports are also not corrected for the mean. If you need standard deviations corrected for the mean, you can obtain them from a procedure such as the MEANS procedure.

  • If you use a TYPE=SSCP data set as input to the PRINCOMP procedure and list the variable Intercept in the VAR statement, the procedure acts as if you had also specified the NOINT option. If you use NOINT and also create an OUTSTAT= data set, the data set is TYPE=UCORR or TYPE=UCOV rather than TYPE=CORR or TYPE=COV.
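The difference between corrected and uncorrected standard deviations can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not SAS code; the function name `std_dev` is hypothetical, and the divisor follows the default VARDEF=DF rule (n minus 1 with an intercept, n without).

```python
import math

def std_dev(values, noint=False):
    """Standard deviation with or without correction for the mean.

    noint=False: deviations are taken from the mean, divisor is n - 1.
    noint=True:  deviations are taken from zero, divisor is n.
    """
    n = len(values)
    center = 0.0 if noint else sum(values) / n
    i = 0 if noint else 1
    sum_of_squares = sum((v - center) ** 2 for v in values)
    return math.sqrt(sum_of_squares / (n - i))

values = [10.0, 12.0, 14.0]
corrected = std_dev(values)                 # deviations from the mean
uncorrected = std_dev(values, noint=True)   # deviations from zero
print(corrected, uncorrected)
```

When the mean is far from zero, the uncorrected value is dominated by the mean rather than by the spread of the data, which is why the corrected figure is usually the one of interest.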

NOPRINT

  • suppresses the display of all output. Note that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 14, Using the Output Delivery System.

OUT= SAS-data-set

  • creates an output SAS data set that contains all the original data as well as the principal component scores. If you want to create a permanent SAS data set, you must specify a two-level name (refer to SAS Language Reference: Concepts for information on permanent SAS data sets).

OUTSTAT= SAS-data-set

  • creates an output SAS data set that contains means, standard deviations, number of observations, correlations or covariances, eigenvalues, and eigenvectors. If you specify the COV option, the data set is TYPE=COV or TYPE=UCOV, depending on the NOINT option, and it contains covariances; otherwise, the data set is TYPE=CORR or TYPE=UCORR, depending on the NOINT option, and it contains correlations. If you specify the PARTIAL statement, the OUTSTAT= data set contains R-squares as well. If you want to create a permanent SAS data set, you must specify a two-level name (refer to SAS Language Reference: Concepts for information on permanent SAS data sets).

PREFIX= name

  • specifies a prefix for naming the principal components. By default, the names are Prin1, Prin2, ..., Prinn. If you specify PREFIX=ABC, the components are named ABC1, ABC2, ABC3, and so on. The number of characters in the prefix plus the number of digits required to designate the variables should not exceed the current name length defined by the VALIDVARNAME= system option.
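The naming rule can be checked ahead of time with a short Python sketch. The function `component_names` is hypothetical, and the default limit of 32 characters is an assumption about the name length in effect; substitute whatever limit your VALIDVARNAME= setting imposes.

```python
def component_names(prefix, n_components, max_name_length=32):
    """Build names prefix1, prefix2, ... and reject any that are too long.

    max_name_length is an assumed limit, not a value read from SAS.
    """
    names = [f"{prefix}{k}" for k in range(1, n_components + 1)]
    too_long = [name for name in names if len(name) > max_name_length]
    if too_long:
        raise ValueError(
            f"names exceed {max_name_length} characters: {too_long}"
        )
    return names

print(component_names("Prin", 3))
print(component_names("ABC", 3))
```

The check matters most with long prefixes and many components, because the digit count grows with the component number.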

SINGULAR= p

SING= p

  • specifies the singularity criterion, where 0 < p < 1. If a variable in a PARTIAL statement has an R-square as large as 1 - p when predicted from the variables listed before it in the statement, the variable is assigned a standardized coefficient of 0. By default, SINGULAR=1E-8.
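The singularity test itself is a single comparison. The following Python sketch restates it; the function name `is_singular` is hypothetical, and the R-square values passed to it are made up rather than computed from data.

```python
def is_singular(r_square, singular=1e-8):
    """Return True when r_square is as large as 1 - singular.

    A True result corresponds to the case in which the variable would be
    assigned a standardized coefficient of 0.
    """
    if not 0 < singular < 1:
        raise ValueError("the singularity criterion must satisfy 0 < p < 1")
    return r_square >= 1 - singular

print(is_singular(1.0))                  # perfectly predicted variable
print(is_singular(0.99))                 # well below the default threshold
print(is_singular(0.95, singular=0.1))   # looser criterion flags it
```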

STANDARD

STD

  • standardizes the principal component scores in the OUT= data set to unit variance. If you omit the STANDARD option, the scores have variance equal to the corresponding eigenvalue. Note that STANDARD has no effect on the eigenvalues themselves.
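Standardizing a score column amounts to dividing it by the square root of its eigenvalue. A minimal Python sketch follows; the scores and the helper names are hypothetical, and the eigenvalue is chosen to equal the variance of the example scores.

```python
import math

def sample_variance(values):
    """Variance with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def standardize_scores(scores, eigenvalue):
    """Rescale component scores so that their variance becomes 1."""
    scale = math.sqrt(eigenvalue)
    return [s / scale for s in scores]

raw_scores = [-2.0, 0.0, 2.0]             # hypothetical scores, variance 4
eigenvalue = sample_variance(raw_scores)  # unstandardized variance
std_scores = standardize_scores(raw_scores, eigenvalue)
print(eigenvalue, sample_variance(std_scores))
```

Only the scores change; the eigenvalue used for the rescaling is left as it was.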

VARDEF=DF | N | WDF | WEIGHT | WGT

  • specifies the divisor used in calculating variances and standard deviations. By default, VARDEF=DF. The following table displays the values and associated divisors.

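Value: divisor, formula

  • DF: error degrees of freedom, $n - i$ (before partialling) or $n - p - i$ (after partialling)

  • N: number of observations, $n$

  • WEIGHT | WGT: sum of weights, $\sum_j w_j$

  • WDF: sum of weights minus one, $(\sum_j w_j) - i$ (before partialling) or $(\sum_j w_j) - p - i$ (after partialling)

Here $n$ is the number of observations and $w_j$ is the weight of the $j$th observation.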

In the formulas for VARDEF=DF and VARDEF=WDF, p is the number of degrees of freedom of the variables in the PARTIAL statement, and i is 0 if the NOINT option is specified and 1 otherwise.
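The divisor rules can be collected into one function. The Python sketch below is illustrative only: the name `vardef_divisor` is hypothetical, and the WDF branch assumes that its formula parallels the DF formula with the sum of weights in place of n, which is consistent with the definitions of p and i given above.

```python
def vardef_divisor(vardef, n, weights=None, p=0, noint=False):
    """Return the variance divisor for a VARDEF= value.

    n       number of observations
    weights observation weights (needed for WEIGHT, WGT, and WDF)
    p       degrees of freedom of the PARTIAL variables (0 before partialling)
    noint   True when the NOINT option is in effect, so that i = 0
    """
    i = 0 if noint else 1
    key = vardef.upper()
    if key == "DF":
        return n - p - i
    if key == "N":
        return n
    if key in ("WEIGHT", "WGT", "WDF"):
        if weights is None:
            raise ValueError(f"VARDEF={key} requires weights")
        total = sum(weights)
        # Assumed form for WDF: same correction terms as DF.
        return total if key != "WDF" else total - p - i
    raise ValueError(f"unknown VARDEF value: {vardef}")

print(vardef_divisor("DF", n=10))
print(vardef_divisor("DF", n=10, p=2))
print(vardef_divisor("WDF", n=3, weights=[1, 2, 3]))
```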

BY Statement

  • BY variables ;

You can specify a BY statement with PROC PRINCOMP to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the PRINCOMP procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure.
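The grouping behavior described for NOTSORTED, in which consecutive observations with the same BY value form a group regardless of order, resembles what Python's `itertools.groupby` does. The sketch below is an analogy only, with made-up rows and a hypothetical `by_groups` helper.

```python
from itertools import groupby

def by_groups(rows, by_variable):
    """Split rows into runs of consecutive rows that share a BY value."""
    return [
        (value, list(group))
        for value, group in groupby(rows, key=lambda row: row[by_variable])
    ]

# Rows are grouped by region, but the groups are not in alphabetical order.
rows = [
    {"region": "West", "x": 1.0},
    {"region": "West", "x": 2.0},
    {"region": "East", "x": 3.0},
    {"region": "East", "x": 4.0},
    {"region": "North", "x": 5.0},
]

groups = by_groups(rows, "region")
for value, members in groups:
    print(value, len(members))
```

If the rows for one region were split across two separate runs, this kind of grouping would treat them as two groups, which is why sorting or indexing is the safer choice when the data are not already arranged in groups.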

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.

FREQ Statement

  • FREQ variable ;

The FREQ statement specifies a variable that provides frequencies for each observation in the DATA= data set. Specifically, if n is the value of the FREQ variable for a given observation, then that observation is used n times.

The analysis produced using a FREQ statement reflects the expanded number of observations. The total number of observations is considered equal to the sum of the FREQ variable. You could produce the same analysis (without the FREQ statement) by first creating a new data set that contains the expanded number of observations. For example, if the value of the FREQ variable is 5 for the first observation, the first 5 observations in the new data set would be identical. Each observation in the old data set would be replicated $n_j$ times in the new data set, where $n_j$ is the value of the FREQ variable for that observation.

If the value of the FREQ variable is missing or is less than one, the observation is not used in the analysis. If the value is not an integer, only the integer portion is used.
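The expansion rule, including the handling of missing, small, and fractional frequencies, can be written out as a short Python sketch. The function name `expand_by_freq` and the sample values are hypothetical; a missing frequency is represented here by `None` or NaN.

```python
import math

def expand_by_freq(observations, freqs):
    """Replicate each observation according to its frequency value.

    Observations whose frequency is missing or less than one are dropped;
    fractional frequencies are truncated to their integer portion.
    """
    expanded = []
    for obs, freq in zip(observations, freqs):
        is_missing = freq is None or (isinstance(freq, float) and math.isnan(freq))
        if is_missing or freq < 1:
            continue
        expanded.extend([obs] * int(freq))
    return expanded

observations = ["a", "b", "c", "d"]
freqs = [5, 2.9, 0.5, None]
expanded = expand_by_freq(observations, freqs)
print(expanded)
print("total observations:", len(expanded))
```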

PARTIAL Statement

  • PARTIAL variables ;

If you want to analyze a partial correlation or covariance matrix, specify the names of the numeric variables to be partialled out in the PARTIAL statement. The PRINCOMP procedure computes the principal components of the residuals from the prediction of the VAR variables by the PARTIAL variables. If you request an OUT= or OUTSTAT= data set, the residual variables are named by prefixing the characters R_ to the VAR variables. Thus, the number of characters required to distinguish the VAR variables should be, at most, two characters fewer than the current name length defined by the VALIDVARNAME= system option.
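To make the idea of residuals concrete, the following Python sketch partials a single variable out of another by simple least squares and builds the R_ name for the result. It is a one-variable illustration with made-up data, not the procedure's algorithm; `residuals`, `residual_name`, and the assumed 32-character limit are all hypothetical.

```python
def residuals(y, z):
    """Residuals of y after least-squares prediction from z with an intercept."""
    n = len(y)
    mean_y, mean_z = sum(y) / n, sum(z) / n
    slope = (
        sum((a - mean_z) * (b - mean_y) for a, b in zip(z, y))
        / sum((a - mean_z) ** 2 for a in z)
    )
    intercept = mean_y - slope * mean_z
    return [b - (intercept + slope * a) for a, b in zip(z, y)]

def residual_name(var_name, max_name_length=32):
    """Prefix R_ to a variable name; max_name_length is an assumed limit."""
    name = "R_" + var_name
    if len(name) > max_name_length:
        raise ValueError(f"{name} exceeds {max_name_length} characters")
    return name

age = [1.0, 2.0, 3.0]      # variable to be partialled out
height = [1.0, 2.0, 4.0]   # analysis variable
r_height = residuals(height, age)
print(residual_name("Height"), r_height)
```

The residuals sum to zero and are uncorrelated with the partialled variable, so any components computed from them describe variation that the partialled variable does not explain.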

VAR Statement

  • VAR variables ;

The VAR statement lists the numeric variables to be analyzed. If you omit the VAR statement, all numeric variables not specified in other statements are analyzed. If, however, the DATA= data set is TYPE=SSCP, the default set of variables used as VAR variables does not include Intercept so that the correlation or covariance matrix is constructed correctly. If you want to analyze Intercept as a separate variable, you should specify it in the VAR statement.

WEIGHT Statement

  • WEIGHT variable ;

If you want to use relative weights for each observation in the input data set, place the weights in a variable in the data set and specify the name in a WEIGHT statement. This is often done when the variance associated with each observation is different and the values of the weight variable are proportional to the reciprocals of the variances.

The observation is used in the analysis only if the value of the WEIGHT statement variable is nonmissing and is greater than zero.
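The weight filter can be sketched in Python together with a weighted mean. The names `is_usable_weight` and `weighted_mean` and the data are hypothetical; a missing weight is represented by `None` or NaN.

```python
import math

def is_usable_weight(weight):
    """True when the weight is nonmissing and greater than zero."""
    if weight is None or math.isnan(weight):
        return False
    return weight > 0

def weighted_mean(values, weights):
    """Weighted mean over observations that have a usable weight."""
    pairs = [(v, w) for v, w in zip(values, weights) if is_usable_weight(w)]
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight

values = [1.0, 2.0, 3.0, 100.0, 200.0]
weights = [1.0, 1.0, 2.0, 0.0, None]   # the last two observations are excluded
result = weighted_mean(values, weights)
print(result)
```

When the weights are proportional to reciprocal variances, less precise observations contribute less to the result, and observations with zero, negative, or missing weights contribute nothing.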




SAS/STAT 9.1 User's Guide (Vol. 5)
Year: 2004
Pages: 98
