Syntax


The following statements are available in the CORRESP procedure.

  • PROC CORRESP < options > ;

    • TABLES < row-variables, > column-variables ;

    • VAR variables ;

    • BY variables ;

    • ID variable ;

    • SUPPLEMENTARY variables ;

    • WEIGHT variable ;

There are two separate forms of input to PROC CORRESP: one is specified with the TABLES statement and the other with the VAR statement. You must specify either the TABLES or the VAR statement, but not both, each time you run PROC CORRESP.

Specify the TABLES statement if you are using raw, categorical data, the levels of which define the rows and columns of a table.

Specify the VAR statement if your data are already in tabular form. PROC CORRESP is generally more efficient with VAR statement input than with TABLES statement input.

The other statements are optional. Each of the statements is explained in alphabetical order following the PROC CORRESP statement. All of the options in PROC CORRESP can be abbreviated to their first three letters, except for the OUTF= option. This is a special feature of PROC CORRESP and is not generally true of SAS/STAT procedures.

PROC CORRESP Statement

  • PROC CORRESP < options > ;

The PROC CORRESP statement invokes the procedure. You can specify the following options in the PROC CORRESP statement. These options are described following Table 24.1.

Table 24.1: Summary of PROC CORRESP Statement Options

Task                                                      Options

Specify data sets
  specify input SAS data set                              DATA=
  specify output coordinate SAS data set                  OUTC=
  specify output frequency SAS data set                   OUTF=
Compute row and column coordinates
  specify the number of dimensions or axes                DIMENS=
  perform multiple correspondence analysis                MCA
  standardize the row and column coordinates              PROFILE=
Construct tables
  specify binary table                                    BINARY
  specify cross levels of TABLES variables                CROSS=
  specify input data in PROC FREQ output                  FREQOUT
  include observations with missing values                MISSING
Display output
  display all output                                      ALL
  display inertias adjusted by Benzécri's method          BENZECRI
  display cell contributions to chi-square                CELLCHI2
  display column profile matrix                           CP
  display observed minus expected values                  DEVIATION
  display chi-square expected values                      EXPECTED
  display inertias adjusted by Greenacre's method         GREENACRE
  suppress the display of column coordinates              NOCOLUMN=
  suppress the display of all output                      NOPRINT
  suppress the display of row coordinates                 NOROW=
  display contingency table of observed frequencies       OBSERVED
  display percentages or frequencies                      PRINT=
  display row profile matrix                              RP
  suppress all point and coordinate statistics            SHORT
  display unadjusted inertias                             UNADJUSTED
Other tasks
  specify rarely used column coordinate standardizations  COLUMN=
  specify minimum inertia                                 MININERTIA=
  specify number of classification variables              NVARS=
  specify rarely used row coordinate standardizations     ROW=
  specify effective zero                                  SINGULAR=
  include level source in the OUTC= data set              SOURCE

The display options control the amount of displayed output. The CELLCHI2, EXPECTED, and DEVIATION options display additional chi-square information. See the "Details" section on page 1082 for more information. The units of the matrices displayed by the CELLCHI2, CP, DEVIATION, EXPECTED, OBSERVED, and RP options depend on the value of the PRINT= option. The table construction options control the construction of the contingency table; these options are valid only when you also specify a TABLES statement.

You can specify the following options in the PROC CORRESP statement. They are described in alphabetical order.

ALL

  • is equivalent to specifying the OBSERVED, RP, CP, CELLCHI2, EXPECTED, and DEVIATION options. Specifying the ALL option does not affect the PRINT= option. Therefore, these options display only frequencies (not percentages) unless you specify otherwise with the PRINT= option.

BENZECRI | BEN

  • displays adjusted inertias when performing multiple correspondence analysis. By default, unadjusted inertias, the usual inertias from multiple correspondence analysis, are displayed. However, adjusted inertias using a method proposed by Benzécri (1979) and described by Greenacre (1984, p. 145) can be displayed by specifying the BENZECRI option. Specify the UNADJUSTED option to output the usual table of unadjusted inertias as well. See the section "MCA Adjusted Inertias" on page 1102 for more information.

BINARY

  • enables you to create binary tables easily. When you specify the BINARY option, specify only column variables in the TABLES statement. Each input data set observation forms a single row in the constructed table.
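The following Python sketch illustrates the kind of table the BINARY option describes; it is not PROC CORRESP's implementation. The function name `binary_table` and the list-of-dictionaries input are hypothetical, and sorting the levels of each variable is an assumption made only to give a stable column order.

```python
def binary_table(observations, variables):
    """Build a 0/1 indicator matrix: one row per observation and one
    column per (variable, level) pair. Levels are sorted to give a
    stable column order, which is an assumption of this sketch."""
    columns = [
        (var, level)
        for var in variables
        for level in sorted({obs[var] for obs in observations})
    ]
    rows = [
        [1 if obs[var] == level else 0 for var, level in columns]
        for obs in observations
    ]
    return columns, rows


data = [
    {"Age": "Old", "Sex": "Male"},
    {"Age": "Young", "Sex": "Female"},
    {"Age": "Old", "Sex": "Female"},
]
cols, z = binary_table(data, ["Age", "Sex"])
print(cols)  # [('Age', 'Old'), ('Age', 'Young'), ('Sex', 'Female'), ('Sex', 'Male')]
print(z)     # [[1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 1, 0]]
```

Each row contains exactly one 1 per variable, so every row of this two-variable table sums to 2.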

CELLCHI2 | CEL

  • displays the contribution to the total chi-square test statistic for each cell. See also the descriptions of the DEVIATION, EXPECTED, and OBSERVED options.

COLUMN=B | BD | DB | DBD | DBD1/2 | DBID1/2

COL=B | BD | DB | DBD | DBD1/2 | DBID1/2

  • provides other standardizations of the column coordinates. The COLUMN= option is rarely needed. Typically, you should use the PROFILE= option instead (see the section "The PROFILE=, ROW=, and COLUMN= Options" on page 1099). By default, COLUMN=DBD.

CP

  • displays the column profile matrix. Column profiles contain the observed conditional probabilities of row membership given column membership. See also the RP option.
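As an illustration, the Python sketch below computes column profiles (the CP matrix) and, for comparison, row profiles (the RP matrix) from a small table of hypothetical counts. The function names are illustrative and not part of SAS, and the sketch assumes that no row or column total is zero.

```python
def row_profiles(table):
    """RP-style matrix: each row divided by its row total, P(column | row)."""
    profiles = []
    for row in table:
        total = sum(row)
        profiles.append([x / total for x in row])
    return profiles


def column_profiles(table):
    """CP-style matrix: each column divided by its column total, P(row | column)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[x / t for x, t in zip(row, col_totals)] for row in table]


observed = [[10, 30],
            [30, 30]]
print(row_profiles(observed))     # [[0.25, 0.75], [0.5, 0.5]]
print(column_profiles(observed))  # [[0.25, 0.5], [0.75, 0.5]]
```

Each row of the RP matrix sums to 1, and each column of the CP matrix sums to 1.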

CROSS=BOTH | COLUMN | NONE | ROW

CRO=BOT | COL | NON | ROW

  • specifies the method of crossing (factorially combining) the levels of the TABLES variables. The default is CROSS=NONE.

    • CROSS=NONE causes each level of every row variable to become a row label and each level of every column variable to become a column label.

    • CROSS=ROW causes each combination of levels for all row variables to become a row label, whereas each level of every column variable becomes a column label.

    • CROSS=COLUMN causes each combination of levels for all column variables to become a column label, whereas each level of every row variable becomes a row label.

    • CROSS=BOTH causes each combination of levels for all row variables to become a row label and each combination of levels for all column variables to become a column label.
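The four CROSS= settings can be sketched in Python as follows. This is a hypothetical illustration of how row and column labels are formed, not SAS code; the function names and the " * " separator used to join crossed levels are assumptions, and PROC CORRESP's actual label format may differ.

```python
from itertools import product


def axis_labels(levels_by_var, crossed):
    """Labels for one side (rows or columns) of the constructed table.

    levels_by_var maps each variable name to its list of levels.
    crossed=True gives one label per combination of levels;
    crossed=False gives one label per level of each variable.
    The " * " separator is a hypothetical display format.
    """
    if crossed:
        return [" * ".join(combo) for combo in product(*levels_by_var.values())]
    return [level for levels in levels_by_var.values() for level in levels]


def table_labels(row_vars, col_vars, cross="NONE"):
    """Row and column labels under a given CROSS= setting."""
    rows = axis_labels(row_vars, crossed=cross in ("ROW", "BOTH"))
    cols = axis_labels(col_vars, crossed=cross in ("COLUMN", "BOTH"))
    return rows, cols


row_vars = {"Age": ["Old", "Young"], "Sex": ["Female", "Male"]}
col_vars = {"Hair": ["Blond", "Brown"]}
print(table_labels(row_vars, col_vars, "NONE")[0])
# ['Old', 'Young', 'Female', 'Male']
print(table_labels(row_vars, col_vars, "ROW")[0])
# ['Old * Female', 'Old * Male', 'Young * Female', 'Young * Male']
```

With two row variables of two levels each, CROSS=NONE yields 2 + 2 = 4 row labels, while CROSS=ROW yields 2 × 2 = 4 combinations; with more levels the crossed count grows multiplicatively.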

DATA= SAS-data-set

  • specifies the SAS data set to be used by PROC CORRESP. If you do not specify the DATA= option, PROC CORRESP uses the most recently created SAS data set.

DEVIATION | DEV

  • displays the matrix of deviations between the observed frequency matrix and the product of its row marginals and column marginals divided by its grand frequency. For ordinary two-way contingency tables, these are the observed minus expected frequencies under the hypothesis of row and column independence and are components of the chi-square test statistic. See also the CELLCHI2, EXPECTED, and OBSERVED options.

DIMENS=n

DIM=n

  • specifies the number of dimensions or axes to use. The default is DIMENS=2. The maximum value of the DIMENS= option in an $n_r \times n_c$ table is $n_r - 1$ or $n_c - 1$, whichever is smaller. For example, in a table with 4 rows and 5 columns, the maximum specification is DIMENS=3. If your table has 2 rows or 2 columns, specify DIMENS=1.
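The DIMENS= limit amounts to one line of arithmetic. This small Python sketch (the function name `max_dimens` is hypothetical) reproduces the 4-by-5 example above.

```python
def max_dimens(n_rows, n_cols):
    """Largest DIMENS= value allowed for an n_rows x n_cols table."""
    return min(n_rows, n_cols) - 1


print(max_dimens(4, 5))  # 3, the example in the text
print(max_dimens(2, 7))  # 1, so a two-row table needs DIMENS=1
```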

EXPECTED | EXP

  • displays the product of the row marginals and the column marginals divided by the grand frequency of the observed frequency table. For ordinary two-way contingency tables, these are the expected frequencies under the hypothesis of row and column independence and are components of the chi-square test statistic. In other situations, this interpretation is not strictly valid. See also the CELLCHI2, DEVIATION, and OBSERVED options.
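To make the EXPECTED, DEVIATION, and CELLCHI2 definitions concrete, the Python sketch below computes all three for a hypothetical 2-by-2 table. It illustrates the formulas only; the function name is not part of SAS. Summing the cell contributions gives the Pearson chi-square statistic.

```python
def chi_square_parts(observed):
    """Return the EXPECTED, DEVIATION, and CELLCHI2 matrices for a
    two-way frequency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    # Expected: row marginal times column marginal over the grand frequency.
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    # Deviation: observed minus expected, cell by cell.
    deviation = [[o - e for o, e in zip(o_row, e_row)]
                 for o_row, e_row in zip(observed, expected)]
    # Cell chi-square: squared deviation divided by the expected value.
    cellchi2 = [[d * d / e for d, e in zip(d_row, e_row)]
                for d_row, e_row in zip(deviation, expected)]
    return expected, deviation, cellchi2


expected, deviation, cellchi2 = chi_square_parts([[10, 30], [30, 30]])
print(expected)   # [[16.0, 24.0], [24.0, 36.0]]
print(deviation)  # [[-6.0, 6.0], [6.0, -6.0]]
print(cellchi2)   # [[2.25, 1.5], [1.5, 1.0]]
print(sum(map(sum, cellchi2)))  # 6.25, the Pearson chi-square statistic
```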

FREQOUT | FRE

  • indicates that the PROC CORRESP input data set has the same form as an output data set from the FREQ procedure, even if it was not directly produced by PROC FREQ. The FREQOUT option enables PROC CORRESP to take shortcuts in constructing the contingency table.

    When you specify the FREQOUT option, you must also specify a WEIGHT statement. The cell frequencies in a PROC FREQ output data set are contained in a variable called COUNT, so specify COUNT in a WEIGHT statement with PROC CORRESP. The FREQOUT option may produce unexpected results if the DATA= data set is structured incorrectly. Each of the two variable lists specified in the TABLES statement must consist of a single variable, and observations must be grouped by the levels of the row variable and then by the levels of the column variable. It is not required that the observations be sorted by the row variable and column variable, but they must be grouped consistently. There must be as many observations in the input data set (or BY group) as there are cells in the completed contingency table. Zero cells must be specified with zero weights. When you use PROC FREQ to create the PROC CORRESP input data set, you must specify the SPARSE option in the FREQ procedure's TABLES statement so that the zero cells are written to the output data set.
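The layout that FREQOUT expects can be illustrated with a Python sketch that rebuilds a contingency table from one (row level, column level, count) record per cell. The function name is hypothetical, and the sketch does not reproduce the shortcuts PROC CORRESP takes; it only shows why a missing zero cell makes the record count disagree with the number of cells.

```python
def table_from_freqout(records):
    """Rebuild a contingency table from (row_level, col_level, count)
    records shaped like a PROC FREQ OUT= data set written with SPARSE.

    Raises ValueError unless there is exactly one record per cell,
    which is why zero cells must be present with zero counts.
    """
    row_levels, col_levels, cells = [], [], {}
    for r, c, count in records:
        if r not in row_levels:
            row_levels.append(r)
        if c not in col_levels:
            col_levels.append(c)
        cells[(r, c)] = count
    n_cells = len(row_levels) * len(col_levels)
    if len(records) != n_cells or len(cells) != len(records):
        raise ValueError("need exactly one record per cell, including zero cells")
    table = [[cells[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


records = [("a1", "b1", 5), ("a1", "b2", 0), ("a2", "b1", 3), ("a2", "b2", 7)]
print(table_from_freqout(records))  # (['a1', 'a2'], ['b1', 'b2'], [[5, 0], [3, 7]])
```

Dropping the `("a1", "b2", 0)` record, as happens when SPARSE is omitted, leaves three records for four cells and raises the error.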

GREENACRE | GRE

  • displays adjusted inertias when performing multiple correspondence analysis. By default, unadjusted inertias, the usual inertias from multiple correspondence analysis, are displayed. However, adjusted inertias using a method proposed by Greenacre (1994, p. 156) can be displayed by specifying the GREENACRE option. Specify the UNADJUSTED option to output the usual table of unadjusted inertias as well. See the section "MCA Adjusted Inertias" on page 1102 for more information.

MCA

  • requests a multiple correspondence analysis. This option requires that the input table be a Burt table, which is a symmetric matrix of crosstabulations among several categorical variables. If you specify the MCA option and a VAR statement, you must also specify the NVARS= option, which gives the number of categorical variables that were used to create the table. With raw categorical data, if you want results for the individuals as well as the categories, use the BINARY option instead.

MININERTIA=n

MIN=n

  • specifies the minimum inertia $n$ ($0 \le n \le 1$) used to create the "best" tables, which indicate the points that best explain the inertia of each dimension. By default, MININERTIA=0.8. See the "Algorithm and Notation" section on page 1097 for more information.

MISSING | MIS

  • specifies that observations with missing values for the TABLES statement variables are included in the analysis. Missing values are treated as a distinct level of each categorical variable. By default, observations with missing values are excluded from the analysis.

NOCOLUMN < = BOTH | DATA | PRINT >

NOC < = BOT | DAT | PRI >

  • suppresses the display of the column coordinates and statistics and omits them from the output coordinate data set.

    BOTH

    suppresses all column information from both the SAS listing and the output data set. The NOCOLUMN option is equivalent to the option NOCOLUMN=BOTH.

    DATA

    suppresses all column information from the output data set.

    PRINT

    suppresses all column information from the SAS listing.

NOPRINT | NOP

  • suppresses the display of all output. This option is useful when you need only an output data set. Note that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 14, "Using the Output Delivery System."

NOROW < = BOTH | DATA | PRINT >

NOR < = BOT | DAT | PRI >

  • suppresses the display of the row coordinates and statistics and omits them from the output coordinate data set.

    BOTH

    suppresses all row information from both the SAS listing and the output data set. The NOROW option is equivalent to the option NOROW=BOTH.

    DATA

    suppresses all row information from the output data set.

    PRINT

    suppresses all row information from the SAS listing.

  • The NOROW option can be useful when the rows of the contingency table are replications.

NVARS=n

NVA=n

  • specifies the number of classification variables that were used to create the Burt table. For example, if the Burt table was originally created with the statement

      tables a b c;  

    you must specify NVARS=3 to read the table with a VAR statement.

    The NVARS= option is required when you specify both the MCA option and a VAR statement. (See the section "VAR Statement" on page 1081 for an example.)

OBSERVED | OBS

  • displays the contingency table of observed frequencies and its row, column, and grand totals. If you do not specify the OBSERVED or ALL option, the contingency table is not displayed.

OUTC= SAS-data-set

OUT= SAS-data-set

  • creates an output coordinate SAS data set to contain the row, column, supplementary observation, and supplementary variable coordinates. This data set also contains the masses, squared cosines, quality of each point's representation in the DIMENS=n dimensional display, relative inertias, partial contributions to inertia, and best indicators.

OUTF= SAS-data-set

  • creates an output frequency SAS data set to contain the contingency table, the row and column profiles, the expected values, the observed minus expected values, and the contributions to the chi-square statistic.

PRINT=BOTH | FREQ | PERCENT

PRI=BOT | FRE | PER

  • affects the OBSERVED, RP, CP, CELLCHI2, EXPECTED, and DEVIATION options. The default is PRINT=FREQ.

    • The PRINT=FREQ option displays output in the appropriate raw or natural units. (That is, PROC CORRESP displays raw frequencies for the OBSERVED option, relative frequencies with row marginals of 1.0 for the RP option, and so on.)

    • The PRINT=PERCENT option scales results to percentages for the display of the output. (All elements in the OBSERVED matrix sum to 100.0, the row marginals are 100.0 for the RP option, and so on.)

    • The PRINT=BOTH option displays both percentages and frequencies.
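The PRINT=PERCENT scaling described above can be sketched in Python for the OBSERVED and RP matrices. The function names and the counts are hypothetical; the sketch only shows the arithmetic.

```python
def percent_of_total(table):
    """PRINT=PERCENT-style OBSERVED matrix: all elements sum to 100."""
    grand = sum(map(sum, table))
    return [[100 * x / grand for x in row] for row in table]


def row_profile_percent(table):
    """PRINT=PERCENT-style RP matrix: each row sums to 100."""
    result = []
    for row in table:
        total = sum(row)
        result.append([100 * x / total for x in row])
    return result


observed = [[2, 6], [4, 4]]
print(percent_of_total(observed))     # [[12.5, 37.5], [25.0, 25.0]]
print(row_profile_percent(observed))  # [[25.0, 75.0], [50.0, 50.0]]
```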

PROFILE=BOTH | COLUMN | NONE | ROW

PRO=BOT | COL | NON | ROW

  • specifies the standardization for the row and column coordinates. The default is PROFILE=BOTH.

    PROFILE=BOTH

    specifies a standard correspondence analysis, which jointly displays the principal row and column coordinates. Row coordinates are computed from the row profile matrix, and column coordinates are computed from the column profile matrix.

    PROFILE=ROW

    specifies a correspondence analysis of the row profile matrix. The row coordinates are weighted centroids of the column coordinates.

    PROFILE=COLUMN

    specifies a correspondence analysis of the column profile matrix. The column coordinates are weighted centroids of the row coordinates.

    PROFILE=NONE

    is rarely needed. Row and column coordinates are the generalized singular vectors, without the customary standardizations.

ROW=A | AD | DA | DAD | DAD1/2 | DAID1/2

  • provides other standardizations of the row coordinates. The ROW= option is rarely needed. Typically, you should use the PROFILE= option instead (see the section "The PROFILE=, ROW=, and COLUMN= Options" on page 1099). By default, ROW=DAD.

RP

  • displays the row profile matrix. Row profiles contain the observed conditional probabilities of column membership given row membership. See also the CP option.

SHORT | SHO

  • suppresses the display of all point and coordinate statistics except the coordinates. The following information is suppressed: each point's mass, relative contribution to the total inertia, and quality of representation in the DIMENS=n dimensional display; the squared cosines of the angles between each axis and a vector from the origin to the point; the partial contributions of each point to the inertia of each dimension; and the best indicators.

SINGULAR=n

SIN=n

  • specifies the largest value that is considered to be within rounding error of zero. The default value is 1E-8. This parameter is used when checking for zero rows and columns, when checking Burt table diagonal sums for equality, when checking denominators before dividing, and so on. Typically, you should not assign a value outside the range 1E-6 to 1E-12.

SOURCE | SOU

  • adds the variable _VAR_, which contains the name or label of the variable corresponding to the current level, to the OUTC= and OUTF= data sets.

UNADJUSTED | UNA

  • displays unadjusted inertias when performing multiple correspondence analysis. By default, unadjusted inertias, the usual inertias from multiple correspondence analysis, are displayed. However, if adjusted inertias are requested by either the GREENACRE option or the BENZECRI option, then the unadjusted inertia table is not displayed unless the UNADJUSTED option is specified. See the section "MCA Adjusted Inertias" on page 1102 for more information.
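The display rules for the inertia tables can be summarized as a small Python function. It is an illustrative sketch of the logic stated above, not SAS code; the function name and the order of the returned table names are assumptions.

```python
def inertia_tables(benzecri=False, greenacre=False, unadjusted=False):
    """Names of the MCA inertia tables that would be displayed.

    The unadjusted table appears by default; once BENZECRI or GREENACRE
    is requested, it appears only if UNADJUSTED is also given.
    The order of the returned names is an assumption of this sketch.
    """
    tables = []
    if unadjusted or not (benzecri or greenacre):
        tables.append("unadjusted")
    if benzecri:
        tables.append("Benzecri")
    if greenacre:
        tables.append("Greenacre")
    return tables


print(inertia_tables())                                # ['unadjusted']
print(inertia_tables(greenacre=True))                  # ['Greenacre']
print(inertia_tables(benzecri=True, unadjusted=True))  # ['unadjusted', 'Benzecri']
```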

BY Statement

  • BY variables ;

You can specify a BY statement with PROC CORRESP to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the CORRESP procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure.
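The difference between sorted input and NOTSORTED grouping can be illustrated with Python's `itertools.groupby`, which, like a BY statement with NOTSORTED, starts a new group whenever the key value changes. The data and the function names `by_groups` and `region` are hypothetical.

```python
from itertools import groupby


def by_groups(observations, key):
    """Split observations into BY groups as NOTSORTED treats them: each
    run of consecutive observations with the same key is one group."""
    return [(k, list(g)) for k, g in groupby(observations, key=key)]


def region(obs):
    """Key function: the BY variable is the first field."""
    return obs[0]


data = [("West", 1), ("West", 2), ("East", 3), ("West", 4)]

# Grouped but unsorted: the second "West" run starts a new group.
print([(k, len(g)) for k, g in by_groups(data, region)])
# [('West', 2), ('East', 1), ('West', 1)]

# Sorting first, as PROC SORT would, yields one group per value.
print([(k, len(g)) for k, g in by_groups(sorted(data, key=region), region)])
# [('East', 1), ('West', 3)]
```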

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.

ID Statement

  • ID variable ;

You specify the ID statement only in conjunction with the VAR statement. You cannot specify the ID statement when you use the TABLES statement or the MCA option. When you specify an ID variable, PROC CORRESP labels the rows of the tables with the ID values and places the ID variable in the output data set.

SUPPLEMENTARY Statement

  • SUPPLEMENTARY variables ;

  • SUP variables ;

The SUPPLEMENTARY statement specifies variables that are to be represented as points in the joint row and column space but that are not used when determining the locations of the other, active row and column points of the contingency table. Supplementary observations on supplementary variables are ignored in simple correspondence analysis but are needed to compute the squared cosines for multiple correspondence analysis. Variables that are specified in the SUPPLEMENTARY statement must also be specified in the TABLES or VAR statement.

When you specify a VAR statement, each SUPPLEMENTARY variable indicates one supplementary column of the table. Supplementary variables must be numeric with VAR statement input.

When you specify a TABLES statement, each SUPPLEMENTARY variable indicates a set of rows or columns of the table that is supplementary. Supplementary variables can be either character or numeric with TABLES statement input.

TABLES Statement

  • TABLES < row-variables, > column-variables ;

The TABLES statement instructs PROC CORRESP to create a contingency table, Burt table, or binary table from the values of two or more categorical variables. The TABLES statement specifies classification variables that are used to construct the rows and columns of the contingency table. The variables can be either numeric or character. The variable lists in the TABLES statement and the CROSS= option together determine the row and column labels of the contingency table.

You can specify both row variables and column variables separated by a comma, or you can specify only column variables and no comma. If you do not specify row variables (that is, if you list variables but do not use the comma as a delimiter), then you should specify either the MCA or the BINARY option. With the MCA option, PROC CORRESP creates a Burt table, which is a crosstabulation of each variable with itself and every other variable. The Burt table is symmetric. With the BINARY option, PROC CORRESP creates a binary table, which consists of one row for each input data set observation and one column for each category of each TABLES statement variable. If the binary matrix is $Z$, then the Burt table is $Z'Z$. Specifying the BINARY option with the NOROW option produces the same results as specifying the MCA option (except for the chi-square statistics).
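The relationship between the binary table and the Burt table can be checked with a short Python sketch that computes $Z'Z$ for a hypothetical three-observation indicator matrix. The function name `burt_table` is illustrative only.

```python
def burt_table(z):
    """Compute the Burt table Z'Z from an indicator matrix Z that has
    one row per observation and one column per category."""
    n_cols = len(z[0])
    return [[sum(row[i] * row[j] for row in z) for j in range(n_cols)]
            for i in range(n_cols)]


# Columns: Age=Old, Age=Young, Sex=Female, Sex=Male
z = [[1, 0, 0, 1],
     [0, 1, 1, 0],
     [1, 0, 1, 0]]
for line in burt_table(z):
    print(line)
# [2, 0, 1, 1]
# [0, 1, 1, 0]
# [1, 1, 2, 0]
# [1, 0, 0, 1]
```

The diagonal blocks hold the category counts of each variable (two Old, one Young), the off-diagonal blocks hold the Age-by-Sex crosstabulation, and the result is symmetric.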

See Figure 24.3 for an example or see the section "The MCA Option" on page 1101 for a detailed description of Burt tables.

        Age    Age     Sex    Sex  Height  Height     Hair     Hair     Hair
Obs     Old  Young  Female   Male   Short    Tall    Blond    Brown    White  Age    Sex     Height  Hair   Name
  1       1      0     0.0    1.0       1       0  0.00000  0.00000  1.00000  Old    Male    Short   White  Jones
  2       0      1     1.0    0.0       0       1  0.00000  1.00000  0.00000  Young  Female  Tall    Brown  Smith
  3       1      0     0.0    1.0       1       0  0.00000  1.00000  0.00000  Old    Male    Short   Brown  Kasavitz
  4       1      0     1.0    0.0       0       1  0.00000  0.00000  1.00000  Old    Female  Tall    White  Ernst
  5       1      0     1.0    0.0       1       0  0.00000  1.00000  0.00000  Old    Female  Short   Brown  Zannoria
  6       0      1     0.0    1.0       0       1  1.00000  0.00000  0.00000  Young  Male    Tall    Blond  Spangel
  7       0      1     0.0    1.0       0       1  0.00000  1.00000  0.00000  Young  Male    Tall    Brown  Myers
  8       1      0     0.0    1.0       1       0  1.00000  0.00000  0.00000  Old    Male    Short   Blond  Kasinski
  9       0      1     1.0    0.0       1       0  1.00000  0.00000  0.00000  Young  Female  Short   Blond  Colman
 10       1      0     0.0    1.0       0       1  0.00000  1.00000  0.00000  Old    Male    Tall    Brown  Delafave
 11       0      1     0.0    1.0       0       1  0.00000  1.00000  0.00000  Young  Male    Tall    Brown  Singer
 12       1      0     0.5    0.5       1       0  0.33333  0.33333  0.33333  Old            Short          Igor

Figure 24.3: Fuzzy Coding of Missing Values
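The pattern in the Igor row, where a missing Sex value becomes 0.5 for each of two levels and a missing Hair value becomes 1/3 for each of three levels, can be reproduced with the Python sketch below. The function name and the use of `None` for a missing value are assumptions; the sketch shows only how such a fuzzy-coded row could be built before it is read with a VAR statement.

```python
def fuzzy_code(value, levels):
    """Indicator coding for one categorical variable. A missing value
    (None here) is spread evenly across all levels of the variable."""
    if value is None:
        return [1 / len(levels)] * len(levels)
    return [1.0 if value == level else 0.0 for level in levels]


print(fuzzy_code("Male", ["Female", "Male"]))  # [0.0, 1.0]
print(fuzzy_code(None, ["Female", "Male"]))    # [0.5, 0.5]
print([round(x, 5) for x in fuzzy_code(None, ["Blond", "Brown", "White"])])
# [0.33333, 0.33333, 0.33333]
```

Each variable's block still sums to 1, so a fuzzy-coded observation carries the same total mass as a fully observed one.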

You can use the WEIGHT statement with the TABLES statement to read category frequencies. Specify the SUPPLEMENTARY statement to name variables with categories that are supplementary rows or columns. You cannot specify the ID or VAR statement with the TABLES statement.

See the section "Using the TABLES Statement" on page 1088 for an example.

VAR Statement

  • VAR variables ;

You should specify the VAR statement when your data are in tabular form. The VAR variables must be numeric. The VAR statement instructs PROC CORRESP to read an existing contingency table, binary indicator matrix, fuzzy-coded indicator matrix, or Burt table, rather than raw data. See the "Algorithm and Notation" section on page 1097 for a description of a binary indicator matrix and a fuzzy-coded indicator matrix.

You can specify the WEIGHT statement with the VAR statement to read category frequencies and designate supplementary rows. Specify the SUPPLEMENTARY statement to name supplementary variables. You cannot specify the TABLES statement with the VAR statement.

WEIGHT Statement

  • WEIGHT variable ;

The WEIGHT statement specifies weights for each observation and indicates supplementary observations for simple correspondence analyses with VAR statement input. You can include only one WEIGHT statement, and the weight variable must be numeric.

If you omit the WEIGHT statement, each observation contributes a value of 1 to the frequency count for its category. That is, each observation represents one subject. When you specify a WEIGHT statement, each observation contributes the value of the weighting variable for that observation. For example, a weight of 3 means that the observation represents 3 subjects. Weight values are not required to be integers.
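The counting rule can be sketched in Python. The function name and the tuple representation of a cell are hypothetical; the sketch shows that, without weights, each observation adds 1 to its cell, and that with weights a value of 3 adds 3 and non-integer weights are accumulated unchanged.

```python
from collections import defaultdict


def weighted_counts(cells, weights=None):
    """Accumulate frequencies for (row_level, col_level) cells.
    Without weights each observation adds 1; with weights it adds its
    weight, which need not be an integer."""
    if weights is None:
        weights = [1.0] * len(cells)
    counts = defaultdict(float)
    for cell, w in zip(cells, weights):
        counts[cell] += w
    return dict(counts)


cells = [("a1", "b1"), ("a1", "b1"), ("a2", "b2")]
print(weighted_counts(cells))               # {('a1', 'b1'): 2.0, ('a2', 'b2'): 1.0}
print(weighted_counts(cells, [3, 0.5, 2]))  # {('a1', 'b1'): 3.5, ('a2', 'b2'): 2.0}
```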

You can specify the WEIGHT statement with a TABLES statement to indicate category frequencies, as in the following example:

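  /* Create a frequency data set; SPARSE also writes the zero cells. */
  proc freq;
     tables a*b / out=outfreq sparse;
  run;

  /* Read the FREQ output; COUNT holds the cell frequencies. */
  proc corresp data=outfreq freqout;
     tables a, b;
     weight count;
  run;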

If you specify a VAR statement, you can specify the WEIGHT statement to indicate supplementary observations and to weight some rows of the table more heavily than others. When the value of the WEIGHT variable is negative, the observation is treated as supplementary, and the absolute value of the weight is used as the weighting value.
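The sign rule for WEIGHT values with VAR statement input can be sketched in Python. The function name is hypothetical, and treating a zero weight as active is an assumption, since the text does not address that case.

```python
def split_by_weight(rows, weights):
    """Separate table rows into active and supplementary rows using the
    sign of the weight; the absolute value is kept as the weighting value.
    Treating a zero weight as active is an assumption of this sketch."""
    active, supplementary = [], []
    for row, w in zip(rows, weights):
        (supplementary if w < 0 else active).append((row, abs(w)))
    return active, supplementary


active, supp = split_by_weight(["r1", "r2", "r3"], [1, -2, 0.5])
print(active)  # [('r1', 1), ('r3', 0.5)]
print(supp)    # [('r2', 2)]
```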

You cannot specify a WEIGHT statement with a VAR statement and the MCA option, because the table must be symmetric. Supplementary variables are indicated with the SUPPLEMENTARY statement, so differential weighting of rows is inappropriate.




SAS.STAT 9.1 Users Guide (Vol. 2)
SAS/STAT 9.1 Users Guide Volume 2 only
ISBN: B003ZVJDOK
EAN: N/A
Year: 2004
Pages: 92
