Special SAS Data Sets | SAS/STAT 9.1 User's Guide, Volumes 1-7

TYPE=CORR Data Sets

A TYPE=CORR data set usually contains a correlation matrix and possibly other statistics including means, standard deviations, and the number of observations in the original SAS data set from which the correlation matrix was computed.

Using PROC CORR with an output data set option (OUTP=, OUTS=, OUTK=, OUTH=, or OUT=) produces a TYPE=CORR data set. (For a complete description of the CORR procedure, refer to the SAS Procedures Guide.) The CALIS, CANCORR, CANDISC, DISCRIM, PRINCOMP, and VARCLUS procedures can also create a TYPE=CORR data set with additional statistics.

A TYPE=CORR data set containing a correlation matrix can be used as input for the ACECLUS, CALIS, CANCORR, CANDISC, DISCRIM, FACTOR, PRINCOMP, REG, SCORE, STEPDISC, and VARCLUS procedures.

The variables in a TYPE=CORR data set are

the BY variable or variables, if a BY statement is used with the procedure
_TYPE_, a character variable of length eight with values identifying the type of statistic in each observation, such as 'MEAN', 'STD', 'N', and 'CORR'
_NAME_, a character variable with values identifying the variable with which a given row of the correlation matrix is associated
other variables that were analyzed by the CORR procedure or other procedures

The usual values of the _TYPE_ variable are as follows.

_TYPE_	Contents
MEAN	mean of each variable analyzed
STD	standard deviation of each variable
N	number of observations used in the analysis. PROC CORR records the number of nonmissing values for each variable unless the NOMISS option is used. If the NOMISS option is specified, or if the CALIS, CANCORR, CANDISC, PRINCOMP, or VARCLUS procedure is used to create the data set, observations with one or more missing values are omitted from the analysis, so this value is the same for each variable and provides the number of observations with no missing values. If a FREQ statement is used with the procedure that creates the data set, the number of observations is the sum of the relevant values of the variable in the FREQ statement. Procedures that read a TYPE=CORR data set use the smallest value in the observation with _TYPE_ ='N' as the number of observations in the analysis.
SUMWGT	sum of the observation weights if a WEIGHT statement is used with the procedure that creates the data set. The values are determined analogously to those of the _TYPE_ ='N' observation.
CORR	correlations with the variable named by the _NAME_ variable

There may be additional observations in a TYPE=CORR data set depending on the particular procedure and options used.

If you create a TYPE=CORR data set yourself, the data set need not contain the observations with _TYPE_='MEAN', 'STD', 'N', or 'SUMWGT' unless you intend to use one of the discriminant procedures. If this information is missing from the TYPE=CORR data set, procedures assume that all means are 0.0 and all standard deviations are 1.0. If _TYPE_='N' does not appear, most procedures assume that the number of observations is 10,000; in that case, significance tests and other statistics that depend on the number of observations are meaningless. In the CALIS and CANCORR procedures, you can use the EDF= option instead of including a _TYPE_='N' observation.

A correlation matrix is symmetric; that is, the correlation between X and Y is the same as the correlation between Y and X. The CALIS, CANCORR, CANDISC, CORR, DISCRIM, PRINCOMP, and VARCLUS procedures output the entire correlation matrix. If you create the data set yourself, you need to include only one of the two occurrences of the correlation between two variables; the other may be given a missing value.

If you create a TYPE=CORR data set yourself, the _TYPE_ and _NAME_ variables are not necessary except for use with the discriminant procedures and PROC SCORE. If there is no _TYPE_ variable, then all observations are assumed to contain correlations. If there is no _NAME_ variable, the first observation is assumed to correspond to the first variable in the analysis, the second observation to the second variable, and so on. However, if you omit the _NAME_ variable, you will not be able to analyze arbitrary subsets of the variables or list the variables in a VAR or MODEL statement in a different order.
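The points above can be illustrated with a minimal sketch of a hand-built TYPE=CORR data set that carries _TYPE_='N', 'MEAN', and 'STD' observations in addition to the lower triangle of the correlation matrix. The data set name corrfull is hypothetical, and the values are copied from Output A.1.1 and Example A.2; PROC PRINCOMP is used here only as one of the procedures listed earlier as accepting TYPE=CORR input.

```sas
/* Hand-built TYPE=CORR data set; "corrfull" is a hypothetical name. */
data corrfull(type=corr);
   /* MISSOVER leaves the unread upper-triangle values missing. */
   infile datalines missover;
   length _type_ $ 8 _name_ $ 8;
   /* A lone period read into _name_ is stored as a blank. */
   input _type_ $ _name_ $ pop school employ services house;
   datalines;
N     .         12       12       12       12       12
MEAN  .         6241.67  11.4417  2333.33  120.833  17000.00
STD   .         3439.99  1.7865   1241.21  114.928  6367.53
CORR  pop       1.00000
CORR  school    0.00975  1.00000
CORR  employ    0.97245  0.15428  1.00000
CORR  services  0.43887  0.69141  0.51472  1.00000
CORR  house     0.02241  0.86307  0.12193  0.77765  1.00000
;
run;

/* Read the matrix back in; because _NAME_ is present, the VAR
   statement may list any subset of the variables in any order. */
proc princomp data=corrfull;
   var pop school employ services house;
run;
```

Because the _TYPE_='N' observation is present, the reading procedure should take the number of observations as 12 rather than falling back on the default of 10,000.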

Example A.1: A TYPE=CORR Data Set Produced by PROC CORR

See Output A.1.1 for an example of a TYPE=CORR data set produced by the following SAS statements. Output A.1.2 displays partial output from the CONTENTS procedure, which indicates that the 'Data Set Type' is 'CORR'.

  title 'Five Socioeconomic Variables';
  data SocEcon;
     title2 'Harman (1976), Modern Factor Analysis, 3rd ed';
     input pop school employ services house;
     datalines;
  5700     12.8      2500      270       25000
  1000     10.9      600       10        10000
  3400     8.8       1000      10        9000
  3800     13.6      1700      140       25000
  4000     12.8      1600      140       25000
  8200     8.3       2600      60        12000
  1200     11.4      400       10        16000
  9100     11.5      3300      60        14000
  9900     12.5      3400      180       18000
  9600     13.7      3600      390       25000
  9600     9.6       3300      80        12000
  9400     11.4      4000      100       13000
  ;
  proc corr noprint out=corrcorr;
  run;
  proc print data=corrcorr;
  run;
  proc contents data=corrcorr;
  run;

Output A.1.1: A TYPE=CORR Data Set Produced by PROC CORR

  Five Socioeconomic Variables
  Harman (1976), Modern Factor Analysis, 3rd ed

  Obs   _TYPE_   _NAME_            pop    school    employ   services      house
    1   MEAN                   6241.67   11.4417   2333.33    120.833   17000.00
    2   STD                    3439.99    1.7865   1241.21    114.928    6367.53
    3   N                        12.00   12.0000     12.00     12.000      12.00
    4   CORR     pop              1.00    0.0098      0.97      0.439       0.02
    5   CORR     school           0.01    1.0000      0.15      0.691       0.86
    6   CORR     employ           0.97    0.1543      1.00      0.515       0.12
    7   CORR     services         0.44    0.6914      0.51      1.000       0.78
    8   CORR     house            0.02    0.8631      0.12      0.778       1.00

Output A.1.2: Contents of a TYPE=CORR Data Set

  The CONTENTS Procedure

  Data Set Name  WORK.CORRCORR                     Observations          8
  Member Type    DATA                              Variables             7
  Engine         V8                                Indexes               0
  Created        13:56 Wednesday, July 25, 2001    Observation Length    56
  Last Modified  13:56 Wednesday, July 25, 2001    Deleted Observations  0
  Protection                                       Compressed            NO
  Data Set Type  CORR                              Sorted                NO
  Label          Pearson Correlation Matrix

Example A.2: Creating a TYPE=CORR Data Set in a DATA Step

This example creates a TYPE=CORR data set by reading a correlation matrix in a DATA step. Output A.2.2 shows the resulting data set.

  title 'Five Socioeconomic Variables';
  data datacorr(type=corr);
     infile cards missover;
     _type_='corr';
     input _name_ $ pop school employ services house;
     datalines;
  POP        1.00000
  SCHOOL     0.00975   1.00000
  EMPLOY     0.97245   0.15428   1.00000
  SERVICES   0.43887   0.69141   0.51472   1.00000
  HOUSE      0.02241   0.86307   0.12193   0.77765   1.00000
  ;
  run;
  proc print data=datacorr;
  run;

Output A.2.2: A TYPE=CORR Data Set Created by a DATA Step

  Five Socioeconomic Variables

  Obs   _type_   _name_          pop     school     employ   services   house
    1   corr     POP         1.00000     .          .          .          .
    2   corr     SCHOOL      0.00975    1.00000     .          .          .
    3   corr     EMPLOY      0.97245    0.15428    1.00000     .          .
    4   corr     SERVICES    0.43887    0.69141    0.51472    1.00000     .
    5   corr     HOUSE       0.02241    0.86307    0.12193    0.77765     1

TYPE=UCORR Data Sets

A TYPE=UCORR data set is almost identical to a TYPE=CORR data set, except that the correlations are uncorrected for the mean. The corresponding value of the _TYPE_ variable is 'UCORR' instead of 'CORR'. Uncorrected standard deviations are in observations with _TYPE_ ='USTD'.

A TYPE=UCORR data set can be used as input for every SAS/STAT procedure that uses a TYPE=CORR data set, except for the CANDISC, DISCRIM, and STEPDISC procedures. TYPE=UCORR data sets can be created by the CALIS, CANCORR, PRINCOMP, and VARCLUS procedures.

TYPE=COV Data Sets

A TYPE=COV data set is similar to a TYPE=CORR data set except that it has _TYPE_ ='COV' observations containing covariances instead of or in addition to _TYPE_ ='CORR' observations containing correlations. The CALIS and PRINCOMP procedures create a TYPE=COV data set if the COV option is used. You can also create a TYPE=COV data set by using PROC CORR with the COV and NOCORR options and specifying the data set option TYPE=COV in parentheses following the name of the output data set. You can use only the OUTP= or OUT= options to create a TYPE=COV data set with PROC CORR.
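As a brief sketch of the PROC CORR route just described, the following statements request covariances only and mark the OUTP= data set as TYPE=COV. The input data set SocEcon is the one created in Example A.1; the output name corrcov is a hypothetical choice.

```sas
/* COV requests covariances, NOCORR omits the correlations, and the
   TYPE=COV data set option marks the output data set accordingly.
   "corrcov" is a hypothetical data set name.                        */
proc corr data=SocEcon cov nocorr noprint outp=corrcov(type=cov);
run;

/* Inspect the result; the matrix rows should have _TYPE_='COV'. */
proc print data=corrcov;
run;
```

Without the TYPE=COV data set option, the output data set would be expected to keep the default type of CORR even though it holds covariances.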

Another way to create a TYPE=COV data set is to read a covariance matrix in a DATA step, in the same manner as shown in Example A.2 for a TYPE=CORR data set.

TYPE=COV data sets are used by the same procedures that use TYPE=CORR data sets.

TYPE=UCOV Data Sets

A TYPE=UCOV data set is similar to a TYPE=COV data set, except that the covariances are uncorrected for the mean. Also, the corresponding value of the _TYPE_ variable is 'UCOV' instead of 'COV'.

A TYPE=UCOV data set can be used as input for every SAS/STAT procedure that uses a TYPE=COV data set, except for the CANDISC, DISCRIM, and STEPDISC procedures. TYPE=UCOV data sets can be created by the CALIS and PRINCOMP procedures.

TYPE=SSCP Data Sets

A TYPE=SSCP data set contains an uncorrected sum of squares and crossproducts (SSCP) matrix. TYPE=SSCP data sets are produced by PROC REG when the OUTSSCP= option is specified in the PROC REG statement. You can also create a TYPE=SSCP data set by using PROC CORR with the SSCP option and specifying the data set option TYPE=SSCP in parentheses following the name of the OUTP= or OUT= data set. You can also create TYPE=SSCP data sets in a DATA step; in this case, TYPE=SSCP must be specified as a data set option.

The variables in a TYPE=SSCP data set include those found in a TYPE=CORR data set. In addition, there is a variable called Intercept that contains crossproducts for the intercept (sums of the variables). The SSCP matrix is stored in observations with _TYPE_ ='SSCP', including a row with _NAME_ ='Intercept'. PROC REG also outputs an observation with _TYPE_ ='N'. PROC CORR includes observations with _TYPE_ ='MEAN' and _TYPE_ ='STD' as well. TYPE=SSCP data sets are used by the same procedures that use TYPE=CORR data sets.
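The PROC CORR route to a TYPE=SSCP data set can be sketched as follows, again assuming the SocEcon data set from Example A.1. The output name corrsscp is hypothetical.

```sas
/* SSCP requests the uncorrected sums of squares and crossproducts;
   the TYPE=SSCP data set option marks the OUTP= data set.
   "corrsscp" is a hypothetical data set name.                       */
proc corr data=SocEcon sscp noprint outp=corrsscp(type=sscp);
run;

/* Compare with the PROC REG version: this one should also carry
   _TYPE_='MEAN' and _TYPE_='STD' observations.                      */
proc print data=corrsscp;
run;
```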

Example A.3: A TYPE=SSCP Data Set Produced by PROC REG

Output A.3.1 shows a TYPE=SSCP data set produced by PROC REG from the SocEcon data set created in Example A.1.

  proc reg data=SocEcon outsscp=regsscp;
     model house=pop school employ services / noprint;
  run;
  proc print data=regsscp;
  run;

Output A.3.1: A TYPE=SSCP Data Set Produced by PROC REG

  Obs  _TYPE_  _NAME_     Intercept         pop      school     employ  services       house
    1  SSCP    Intercept       12.0       74900      137.30      28000      1450      204000
    2  SSCP    pop          74900.0   597670000   857640.00  220440000  10959000  1278700000
    3  SSCP    school         137.3      857640     1606.05     324130     18152     2442100
    4  SSCP    employ       28000.0   220440000   324130.00   82280000   4191000   486600000
    5  SSCP    services      1450.0    10959000    18152.00    4191000    320500    30910000
    6  SSCP    house       204000.0  1278700000  2442100.00  486600000  30910000  3914000000
    7  N                       12.0          12       12.00         12        12          12

TYPE=CSSCP Data Sets

A TYPE=CSSCP data set contains a corrected sum of squares and crossproducts (CSSCP) matrix. TYPE=CSSCP data sets are created by using the CORR procedure with the CSSCP option and specifying the data set option TYPE=CSSCP in parentheses following the name of the OUTP= or OUT= data set. You can also create TYPE=CSSCP data sets in a DATA step; in this case, TYPE=CSSCP must be specified as a data set option.

The variables in a TYPE=CSSCP data set are the same as those found in a TYPE=SSCP data set, except that there is not a variable called Intercept or a row with _NAME_ ='Intercept'.

TYPE=CSSCP data sets are read by only the CANDISC, DISCRIM, and STEPDISC procedures.

TYPE=EST Data Sets

A TYPE=EST data set contains parameter estimates. The CALIS, CATMOD, LIFEREG, LOGISTIC, NLIN, ORTHOREG, PHREG, PROBIT, and REG procedures create TYPE=EST data sets when the OUTEST= option is specified. A TYPE=EST data set produced by PROC LIFEREG, PROC ORTHOREG, or PROC REG can be used with PROC SCORE to compute residuals or predicted values.

The variables in a TYPE=EST data set include

the BY variables, if a BY statement is used
_TYPE_, a character variable of length eight that indicates the type of estimate. The values depend on which procedure created the data set. Usually a value of 'PARM' or 'PARMS' indicates estimated regression coefficients, and a value of 'COV' or 'COVB' indicates estimated covariances of the parameter estimates. Some procedures, such as PROC NLIN, have other values of _TYPE_ for special purposes.
_NAME_, a character variable that contains the names of the rows of the covariance matrix when the procedure outputs the covariance matrix of the parameter estimates.
variables that contain the parameter estimates, usually the same variables that appear in the VAR statement or in any MODEL statement. See Chapter 19, 'The CALIS Procedure,' Chapter 22, 'The CATMOD Procedure,' and Chapter 50, 'The NLIN Procedure,' for details on the variable names used in output data sets created by those procedures.

Other variables can be included depending on the particular procedure and options used.
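To illustrate the use of a TYPE=EST data set with PROC SCORE mentioned above, here is a minimal sketch that fits one regression on the SocEcon data set from Example A.1 and then scores the same data to obtain predicted values. The data set names est and pred are hypothetical.

```sas
/* Fit the model and save the parameter estimates in a TYPE=EST
   data set. "est" and "pred" are hypothetical data set names.     */
proc reg data=SocEcon outest=est noprint;
   model house=pop school employ services;
run;

/* TYPE=PARMS tells PROC SCORE to use the _TYPE_='PARMS' observation.
   The dependent variable is left out of the VAR statement, so the
   scores are predicted values rather than residuals.               */
proc score data=SocEcon score=est out=pred type=parms;
   var pop school employ services;
run;

proc print data=pred;
run;
```

The new score variable takes its name from the _MODEL_ value in the TYPE=EST data set, which should be MODEL1 here because the MODEL statement has no label.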

Example A.4: A TYPE=EST Data Set Produced by PROC REG

Output A.4.1 shows the TYPE=EST data set produced by the following statements:

  proc reg data=SocEcon outest=regest covout;
     full: model house=pop school employ services / noprint;
     empser: model house=employ services / noprint;
  run;
  proc print data=regest;
  run;

Output A.4.1: A TYPE=EST Data Set Produced by PROC REG

  Obs    _MODEL_    _TYPE_    _NAME_       _DEPVAR_     _RMSE_        Intercept
    1    full       PARMS                   house      3122.03         -8074.21
    2    full       COV       Intercept     house      3122.03     109408014.44
    3    full       COV       pop           house      3122.03         -9157.04
    4    full       COV       school        house      3122.03      -9784744.54
    5    full       COV       employ        house      3122.03         20612.49
    6    full       COV       services      house      3122.03        102764.89
    7    empser     PARMS                   house      3789.96         15021.71
    8    empser     COV       Intercept     house      3789.96       5824096.19
    9    empser     COV       employ        house      3789.96         -1915.99
   10    empser     COV       services      house      3789.96         -1294.94

  Obs         pop         school       employ      services    house
    1        0.65        2140.10        -2.92         27.81       -1
    2    -9157.04    -9784744.54     20612.49     102764.89        .
    3        2.32         852.86        -6.20         -5.20        .
    4      852.86      907886.36     -2042.24      -9608.59        .
    5       -6.20       -2042.24        17.44          6.50        .
    6       -5.20       -9608.59         6.50        202.56        .
    7         .              .          -1.94         53.88       -1
    8         .              .       -1915.99      -1294.94        .
    9         .              .           1.15         -6.41        .
   10         .              .          -6.41        134.49        .

TYPE=ACE Data Sets

A TYPE=ACE data set is created by the ACECLUS procedure, and it contains the approximate within-cluster covariance estimate, as well as eigenvalues and eigenvectors from a canonical analysis, among other statistics. It can be used as input to the ACECLUS procedure to initialize another execution of PROC ACECLUS. It can also be used to compute canonical variable scores with the SCORE procedure and as input to the FACTOR procedure, specifying METHOD=SCORE, to rotate the canonical variables. See Chapter 16, 'The ACECLUS Procedure,' for details.

TYPE=DISTANCE Data Sets

You can create a TYPE=DISTANCE data set containing distance or dissimilarity measures using the DISTANCE procedure. The proximity measures are stored as a lower triangular matrix or a square matrix in the OUT= data set (depending on the SHAPE= option). See Chapter 26, 'The DISTANCE Procedure,' for details. You can also create a TYPE=DISTANCE data set in a DATA step by reading or computing a lower triangular or symmetric matrix of dissimilarity values, such as a chart of mileage between cities. The number of observations must be equal to the number of variables used in the analysis. This type of data set is used as input by the CLUSTER and MODECLUS procedures. PROC CLUSTER ignores the upper triangular portion of a TYPE=DISTANCE data set and assumes that all main diagonal values are zero, even if they are missing. PROC MODECLUS uses the entire distance matrix and does not require the matrix to be symmetric. See Chapter 23, 'The CLUSTER Procedure,' and Chapter 47, 'The MODECLUS Procedure,' for examples and details.
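The DATA step route described above can be sketched with a small mileage chart. The data set name mileages, the ID variable city, and the four cities are illustrative assumptions, and the distances are illustrative values only; only the lower triangle is entered, which is all PROC CLUSTER reads.

```sas
/* Lower-triangular mileage chart stored as TYPE=DISTANCE.
   Names and distances are illustrative. The number of observations
   (4) equals the number of numeric variables (4).                  */
data mileages(type=distance);
   infile datalines missover;   /* upper triangle stays missing */
   length city $ 8;
   input city $ Atlanta Chicago Denver Houston;
   datalines;
Atlanta     0
Chicago   587     0
Denver   1212   920     0
Houston   701   940   879     0
;
run;

/* Cluster the cities directly from the distance matrix. */
proc cluster data=mileages method=average outtree=tree noprint;
   id city;
run;
```

The same data set would not be a good fit for PROC MODECLUS as written, because that procedure uses the entire matrix and the upper triangle here is missing.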

TYPE=FACTOR Data Sets

A TYPE=FACTOR data set is created by PROC FACTOR when the OUTSTAT= option is specified. The CALIS, CANCORR, FACTOR, PRINCOMP, SCORE, and VARCLUS procedures can use TYPE=FACTOR data sets as input. The variables are the same as in a TYPE=CORR data set. The statistics include means, standard deviations, sample size, correlations, eigenvalues, eigenvectors, factor patterns, residual correlations, scoring coefficients, and others depending on the options specified. See Chapter 27, 'The FACTOR Procedure,' for details.

When the NOINT option is used with the OUTSTAT= option in PROC FACTOR, the value of the _TYPE_ variable is set to 'USCORE' instead of 'SCORE' to indicate that the scoring coefficients have not been corrected for the mean. If this data set is used with the SCORE procedure, the value of the _TYPE_ variable tells PROC SCORE whether or not to subtract the mean from the scoring coefficients.

TYPE=RAM Data Sets

The CALIS procedure creates and accepts as input a TYPE=RAM data set. This data set contains the model specification and the computed parameter estimates. A TYPE=RAM data set is intended to be reused as an input data set to specify good initial values in subsequent analyses by PROC CALIS. See Chapter 19, 'The CALIS Procedure,' for details.

TYPE=WEIGHT Data Sets

The CALIS procedure creates and accepts as input a TYPE=WEIGHT data set. This data set contains the weight matrix used in generalized, weighted, or diagonally weighted least-squares estimation. See Chapter 19, 'The CALIS Procedure,' for details.

TYPE=LINEAR Data Sets

A TYPE=LINEAR data set contains the coefficients of a linear function of the variables in observations with _TYPE_ ='LINEAR'.

The DISCRIM procedure stores linear discriminant function coefficients in a TYPE=LINEAR data set when you specify METHOD=NORMAL (the default method), POOL=YES, and an OUTSTAT= data set; the data set can be used in a subsequent invocation of PROC DISCRIM to classify additional observations. Many other statistics can be included depending on the options used. See Chapter 25, 'The DISCRIM Procedure,' for details.

TYPE=QUAD Data Sets

A TYPE=QUAD data set contains the coefficients of a quadratic function of the variables in observations with _TYPE_ ='QUAD'.

The DISCRIM procedure stores quadratic discriminant function coefficients in a TYPE=QUAD data set when you specify METHOD=NORMAL (the default method), POOL=NO, and an OUTSTAT= data set; the data set can be used in a subsequent invocation of PROC DISCRIM to classify additional observations. Many other statistics can be included depending on the options used. See Chapter 25, 'The DISCRIM Procedure,' for details.

TYPE=MIXED Data Sets

A TYPE=MIXED data set contains coefficients of either a linear or a quadratic function, or both if there are BY groups.

The DISCRIM procedure produces a TYPE=MIXED data set when you specify METHOD=NORMAL (the default method), POOL=TEST, and an OUTSTAT= data set. See Chapter 25, 'The DISCRIM Procedure,' for details.