Syntax


The following statements are available in PROC CATMOD.

  • PROC CATMOD < options > ;

    • DIRECT < variables > ;

    • MODEL response-effect=design-effects < / options > ;

    • CONTRAST 'label' row-description <, ..., row-description > < / options > ;

    • BY variables ;

    • FACTORS factor-description <, ..., factor-description > < / options > ;

    • LOGLIN effects < / option > ;

    • POPULATION variables ;

    • REPEATED factor-description <, ..., factor-description > < / options > ;

    • RESPONSE function <, ..., function > < / options > ;

    • RESTRICT parameter=value < ... parameter=value > ;

    • WEIGHT variable ;

You can use all of the statements in PROC CATMOD interactively. The first RUN statement executes all of the previous statements. Any subsequent RUN statement executes only those statements that appear between the previous RUN statement and the current one. However, if you specify a BY statement, interactive processing is disabled. That is, all statements through the following RUN statement are processed for each BY group in the data set, but no additional statements are accepted by the procedure.

If more than one CONTRAST statement appears between two RUN statements, all the CONTRAST statements are processed. If more than one RESPONSE statement appears between two RUN statements, then analyses associated with each RESPONSE statement are produced. For all other statements, there can be only one occurrence of the statement between any two RUN statements. For example, if there are two LOGLIN statements between two RUN statements, the first LOGLIN statement is ignored.

The PROC CATMOD and MODEL statements are required. If specified, the DIRECT statement must precede the MODEL statement. As a result, if you use the DIRECT statement interactively, you need to specify a MODEL statement in the same RUN group. See the section 'DIRECT Statement' on page 835 for an example.

The CONTRAST statements, if any, must follow the MODEL statement.

You can specify only one of the LOGLIN, REPEATED, and FACTORS statements between any two RUN statements, because they all specify the same information: how to partition the variation among the response functions within a population.

A QUIT statement executes any statements that have not been processed and then ends the CATMOD procedure.

The purpose of each statement, other than the PROC CATMOD statement, is summarized in the following list:

BY

determines groups in which data are to be processed separately.

CONTRAST

specifies a hypothesis to test.

DIRECT

specifies independent variables that are to be treated quantitatively (like continuous variables) rather than qualitatively (like class or discrete variables). These variables also help to determine the rows of the contingency table and distinguish response functions in one population from those in other populations.

FACTORS

specifies (1) the factors that distinguish response functions from others in the same population and (2) model effects, based on these factors, which help to determine the design matrix.

LOGLIN

specifies log-linear model effects.

MODEL

specifies (1) dependent variables, which determine the columns of the contingency table, (2) independent variables, which distinguish response functions in one population from those in other populations, and (3) model effects, which determine the design matrix and the way in which total variation among the response functions is partitioned.

POPULATION

specifies variables which determine the rows of the contingency table and distinguish response functions in one population from those in other populations.

REPEATED

specifies (1) the repeated measurement factors that distinguish response functions from others in the same population and (2) model effects, based on these factors, which help to determine the design matrix.

RESPONSE

determines the response functions that are to be modeled.

RESTRICT

restricts values of parameters to the values you specify.

WEIGHT

specifies a variable containing frequency counts.

PROC CATMOD Statement

  • PROC CATMOD < options > ;

The PROC CATMOD statement invokes the procedure. You can specify the following options.

DATA= SAS-data-set

  • names the SAS data set containing the data to be analyzed. By default, the CATMOD procedure uses the most recently created SAS data set. For details, see the section 'Input Data Sets' on page 860.

NAMELEN= n

  • specifies the length of effect names in tables and output data sets to be n characters long, where n is a value between 24 and 200 characters. The default length is 24 characters.

NOPRINT

  • suppresses the normal display of results. The NOPRINT option is useful when you only want to create output data sets with the OUT= or OUTEST= option in the RESPONSE statement. A NOPRINT option is also available in the MODEL statement. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 14, 'Using the Output Delivery System,' for more information.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

  • specifies the sorting order for the levels of classification variables. This affects the ordering of the populations, responses, and parameters, as well as the definitions of the parameters. The default, ORDER=INTERNAL, orders the variable levels by their unformatted values (for example, numeric order or alphabetical order).

  • The following table shows how PROC CATMOD interprets values of the ORDER= option.

Value of ORDER=   Levels Sorted By
DATA              order of appearance in the input data set
FORMATTED         external formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ              descending frequency count; levels with the most observations come first in the order
INTERNAL          unformatted value

By default, ORDER=INTERNAL. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine dependent. See the section 'Ordering of Populations and Responses' on page 863 for more information and examples. For more information on sorting order, see the chapter on the SORT procedure in the SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
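The ORDER= rules can be mimicked outside SAS to check how contrast coefficients will line up with the levels. The following Python sketch is illustrative only, not SAS code, and rests on several assumptions. `order_levels` is a hypothetical helper, and the `fmt` argument stands in for a SAS format. Ties under FREQ are broken by first appearance. Python's code-point string ordering stands in for the machine-dependent sort order.

```python
from collections import Counter


def order_levels(values, order="INTERNAL", fmt=None):
    """Return the distinct levels of a classification variable in the order
    implied by an ORDER=-style keyword (illustrative sketch only)."""
    # Hypothetical stand-in for a SAS format; identity means "no format".
    fmt = fmt or (lambda v: v)
    if order == "DATA":
        # Order of first appearance in the input data.
        return list(dict.fromkeys(values))
    if order == "FORMATTED":
        # Sort by the formatted (external) value.
        return sorted(set(values), key=fmt)
    if order == "FREQ":
        counts = Counter(values)
        # Descending frequency; sorted() is stable, so ties keep the order of
        # first appearance (an assumption, not documented SAS behavior).
        return sorted(counts, key=lambda v: -counts[v])
    if order == "INTERNAL":
        # Sort by the unformatted value.
        return sorted(set(values))
    raise ValueError(f"unknown ORDER= value: {order}")


data = [2, 1, 3, 3, 3, 1]
labels = {1: "one", 2: "two", 3: "three"}  # hypothetical format
print(order_levels(data, "FORMATTED", labels.get))
```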

BY Statement

  • BY variables ;

You can specify a BY statement with PROC CATMOD to obtain separate analyses of groups determined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. The variables are one or more variables in the input data set.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the CATMOD procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure (in base SAS software).
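With NOTSORTED, each run of consecutive observations that share the same BY values forms its own group. A value that reappears later therefore starts a new group. Python's `itertools.groupby` groups consecutive items in the same way, which makes it a convenient analogy. This is a Python illustration, not SAS code, and `notsorted_by_groups` and the sample rows are hypothetical.

```python
from itertools import groupby


def notsorted_by_groups(rows, key):
    """Group rows the way a NOTSORTED BY statement does: every run of
    consecutive rows with the same BY value is a separate group."""
    return [(value, list(run)) for value, run in groupby(rows, key=key)]


# Hypothetical rows: (BY value, some measurement).
rows = [("M", 12), ("M", 7), ("F", 9), ("M", 4)]
groups = notsorted_by_groups(rows, key=lambda r: r[0])
# The trailing "M" row is a third group, not part of the first one.
```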

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.

When you specify a BY statement with PROC CATMOD, no further interactive processing is possible. In other words, once the BY statement appears, all statements up to the associated RUN statement are executed for each BY group in the data set. After the RUN statement, no further statements are accepted by the procedure.

CONTRAST Statement

  • CONTRAST 'label' row-description <, ..., row-description > < / options > ;

where a row-description is

  • < @ n > effect values < ... < @ n > effect values >

The CONTRAST statement constructs and tests linear functions of the parameters in the MODEL statement or effects listed in the LOGLIN statement. Each set of effects (separated by commas) specifies one row or set of rows of the matrix $C$ that PROC CATMOD uses to test the hypothesis $C\beta = 0$.

CONTRAST statements must be preceded by the MODEL statement, and by the LOGLIN statement, if one is used. You can specify the following terms in the CONTRAST statement.

' label '

specifies up to 256 characters of identifying information displayed with the test. The ' label ' is required.

effect

is one of the effects specified in the MODEL or LOGLIN statement, INTERCEPT (for the intercept parameter), or ALL_PARMS (for the complete set of parameters).

The ALL_PARMS option is regarded as an effect with the same number of parameters as the number of columns in the design matrix. This is particularly useful when the design matrix is input directly, as in the following example:

  model y=(1 0 0 0,
           1 0 1 0,
           1 1 0 0,
           1 1 1 1);
  contrast 'Main Effect of B' all_parms 0 1 0 0;
  contrast 'Main Effect of C' all_parms 0 0 1 0;
  contrast 'B*C Interaction ' all_parms 0 0 0 1;

values

are numbers that form the coefficients of the parameters associated with the given effect. If there are fewer values than parameters for an effect, the remaining coefficients become zero. For example, if you specify two values and the effect actually has five parameters, the final three are set to zero.

@ n

points to the parameters in the nth set when the model has a separate set of parameters for each of the response functions. The @ n notation is seldom needed. It enables you to test the variation among response functions in the same population. However, it is usually easier to model and test such variation by using the _RESPONSE_ effect in the MODEL statement or by using the ALL_PARMS designation. Usually, contrasts are performed with respect to all of the response functions, and this is what the CONTRAST statement does by default (in this case, do not use the @ n notation).

 

For example, if there are three response functions per population, then

  contrast 'Level 1 vs. Level 2' A 1 -1 0;
 

results in a three-degree-of-freedom test comparing the first two levels of A simultaneously on the three response functions.

 

If, however, you want to specify a contrast with respect to the parameters in the nth set only, then use a single @ n in a row-description. For example, to test that the first parameter of A and the first parameter of B are zero in the third response function, specify

  contrast 'A=0, B=0, Function 3' @3 A 1 B 1;  
 

To specify a contrast with respect to parameters in two or more different sets of effects, use @ n with each effect. For example,

  contrast 'Average over Functions' @1 A 1 0 -1 @2 A 1 1 -2;
 

When the model does not have a separate set of parameters for each of the response functions, the @ n notation is invalid. This type of model is called AVERAGED. For details, see the description of the AVERAGED option on page 842 and the 'Generation of the Design Matrix' section on page 876.

You can specify the following options in the CONTRAST statement after a slash.

ALPHA= value

  • specifies the significance level of the confidence interval for each contrast when the ESTIMATE= option is specified. The default is ALPHA=0.05, resulting in a 95% confidence interval for each contrast.

ESTIMATE= keyword

EST= keyword

  • requests that each individual contrast (that is, each row, $c_i\beta$, of $C\beta$) or exponentiated contrast ($\exp(c_i\beta)$) be estimated and tested. PROC CATMOD displays the point estimate, its standard error, a Wald confidence interval, and a Wald chi-square test for each contrast. The significance level of the confidence interval is controlled by the ALPHA= option.

  • You can estimate the contrast or the exponentiated contrast, or both, by specifying one of the following keywords:

PARM

specifies that the contrast itself be estimated.

EXP

specifies that the exponentiated contrast be estimated.

BOTH

specifies that both the contrast and the exponentiated contrast be estimated.
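The quantities listed for ESTIMATE= follow the usual Wald formulas. For a contrast row $c$, the estimate is $c\hat\beta$, and its standard error is $\sqrt{c V c'}$, where $V$ is the estimated covariance matrix of the parameter estimates. The confidence interval is estimate ± $z_{1-\alpha/2}$ × SE, and the chi-square is (estimate/SE)². The Python sketch below illustrates these formulas. `estimate_contrast` and the numbers are hypothetical. For the exponentiated form, the sketch only exponentiates the estimate and the confidence limits.

```python
import math
from statistics import NormalDist


def estimate_contrast(c, beta, cov, alpha=0.05):
    """Wald estimate, standard error, confidence limits, and chi-square for
    one contrast row c applied to estimates beta with covariance cov."""
    est = sum(ci * bi for ci, bi in zip(c, beta))
    # Variance of c*beta is c V c'.
    var = sum(c[i] * cov[i][j] * c[j] for i in range(len(c)) for j in range(len(c)))
    se = math.sqrt(var)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    lower, upper = est - z * se, est + z * se
    chisq = (est / se) ** 2  # Wald chi-square with 1 degree of freedom
    return {
        "PARM": (est, se, lower, upper, chisq),
        # Exponentiated contrast: exponentiate the estimate and the limits.
        "EXP": (math.exp(est), math.exp(lower), math.exp(upper)),
    }


result = estimate_contrast([1, -1], [0.5, 0.2], [[0.04, 0.0], [0.0, 0.05]])
```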

Specifying Contrasts

PROC CATMOD is parameterized differently than PROC GLM, so you must be careful not to use the same contrasts that you would with PROC GLM. Since PROC CATMOD uses a full-rank parameterization, all estimable parameters are directly estimable without involving other parameters.

For example, suppose a class variable A has four levels. Then there are four parameters ($\alpha_1, \alpha_2, \alpha_3, \alpha_4$), of which PROC CATMOD uses only the first three. The fourth parameter is related to the others by the equation

$$\alpha_4 = -\alpha_1 - \alpha_2 - \alpha_3$$

To test the first versus the fourth level of A, you would test $\alpha_1 = \alpha_4$, which is

$$\alpha_1 = -\alpha_1 - \alpha_2 - \alpha_3$$

or, equivalently,

$$2\alpha_1 + \alpha_2 + \alpha_3 = 0$$

Therefore, you would use the following CONTRAST statement:

  contrast '1 vs. 4' A 2 1 1;

To contrast the third level with the average of the first two levels, you would test

$$\frac{\alpha_1 + \alpha_2}{2} = \alpha_3$$

or, equivalently,

$$\alpha_1 + \alpha_2 - 2\alpha_3 = 0$$

Therefore, you would use the following CONTRAST statement:

  contrast '1&2 vs. 3' A 1 1 -2;  

Other CONTRAST statements are constructed similarly; for example,

  contrast '1 vs. 2    '  A 1 -1  0;
  contrast '1&2 vs. 4  '  A 3  3  2;
  contrast '1&2 vs. 3&4'  A 2  2  0;
  contrast 'Main Effect'  A 1  0  0,
                          A 0  1  0,
                          A 0  0  1;
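The derivation above generalizes. Write a contrast over all k levels as $\sum_i d_i\alpha_i$. Substituting the constraint for $\alpha_k$ turns it into $\sum_{j<k}(d_j - d_k)\alpha_j$. The hypothetical Python helper below applies that rule, which makes it easy to check CONTRAST coefficients under the parameterization described above. The level coefficients passed in are scaled to integers, as in the statements shown.

```python
def catmod_coefficients(level_coefs):
    """Convert coefficients d_1..d_k on all k level effects into the k-1
    coefficients used with PROC CATMOD's parameters, using the constraint
    alpha_k = -(alpha_1 + ... + alpha_{k-1})."""
    *first, last = level_coefs
    # d_1*a_1 + ... + d_k*a_k = sum over j < k of (d_j - d_k) * a_j
    return [d - last for d in first]


# '1 vs. 4' compares alpha_1 with alpha_4: level coefficients (1, 0, 0, -1).
print(catmod_coefficients([1, 0, 0, -1]))
```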

The actual form of the C matrix depends on the effects in the model. The following examples assume a single response function for each population.

  proc catmod;
     model y=a;
     contrast '1 vs. 4' A 2 1 1;
  run;

The C matrix for the preceding statements is

$$C = \begin{bmatrix} 0 & 2 & 1 & 1 \end{bmatrix}$$

since the first parameter corresponds to the intercept.

But if there is a variable B with three levels and you use the following statements,

  proc catmod;
     model y=b a;
     contrast '1 vs. 4' A 2 1 1;
  run;

then the CONTRAST statement induces the C matrix

$$C = \begin{bmatrix} 0 & 0 & 0 & 2 & 1 & 1 \end{bmatrix}$$

since the first parameter corresponds to the intercept and the next two correspond to the B main effect.

You can also use the CONTRAST statement to test the joint effect of two or more effects in the MODEL statement. For example, the joint effect of A and B in the previous model has five degrees of freedom and is obtained by specifying

  contrast 'Joint Effect of A&B' A 1 0 0,
                                 A 0 1 0,
                                 A 0 0 1,
                                 B 1 0,
                                 B 0 1;
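The way an effect's values are placed into a full row of C can be sketched in a few lines. Each effect's values are padded with zeros to that effect's number of parameters, and every other effect gets zeros. The Python helper `contrast_row` below is hypothetical and assumes the model's parameter order is known, for example intercept, then B, then A for `model y=b a`.

```python
def contrast_row(layout, effect_values):
    """Build one row of C.

    layout: ordered (effect_name, n_params) pairs in the model's parameter
        order, e.g. [("INTERCEPT", 1), ("B", 2), ("A", 3)].
    effect_values: {effect_name: [values]}; short value lists are padded
        with zeros, and effects not mentioned get all-zero coefficients.
    """
    row = []
    for name, n_params in layout:
        values = list(effect_values.get(name, []))
        if len(values) > n_params:
            raise ValueError(f"too many values for effect {name}")
        row.extend(values + [0] * (n_params - len(values)))
    return row


layout_ba = [("INTERCEPT", 1), ("B", 2), ("A", 3)]  # model y=b a
joint_rows = [contrast_row(layout_ba, spec) for spec in (
    {"A": [1, 0, 0]}, {"A": [0, 1, 0]}, {"A": [0, 0, 1]},
    {"B": [1, 0]}, {"B": [0, 1]},
)]  # five rows, so the joint test of A and B has 5 degrees of freedom
```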

The ordering of variable levels is determined by the ORDER= option in the PROC CATMOD statement. Whenever you specify a contrast that depends on the order of the variable levels, you should verify the order from the 'Population Profiles' table, the 'Response Profiles' table, or the 'One-Way Frequencies' table.

DIRECT Statement

  • DIRECT variables ;

The DIRECT statement lists numeric independent variables to be treated in a quantitative, rather than qualitative, way. The DIRECT statement is useful for logistic regression, which is described in the 'Logistic Regression' section on page 869. For limitations of models involving continuous variables, see the 'Continuous Variables' section on page 870.

If a DIRECT variable is formatted, then the unformatted (internal) values are used in the analysis and the formatted values are displayed. CAUTION: If you use a format to group the internal values into one formatted value, then the first internal value is used in the analysis.

If specified, the DIRECT statement must precede the MODEL statement. For example,

  proc catmod;
     direct X;
     model Y=X;
  run;

Suppose X has five levels. Then the main effect X induces only one column in the design matrix, rather than four. The values inserted into the design matrix are the actual values of X .

You can interactively change the variables declared as DIRECT variables by using the statement without listing any variables. The following statements are valid:

  proc catmod;
     direct X;
     model Y=X;
     weight wt;
  run;
     direct;
     model Y=X;
  run;

The first MODEL statement uses the actual values of X , and the second MODEL statement uses the four variables created when PROC CATMOD generates the design matrix. Note that the preceding statements can be run without a WEIGHT statement if the input data are raw data rather than cell counts.
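The difference between a DIRECT variable and a classification variable shows up directly in the design matrix. The Python sketch below builds the rows for the main effect X both ways. For the classification case, it assumes the deviation (effect) coding implied by the constraint in the 'Specifying Contrasts' discussion: the last level's parameter is minus the sum of the others. It also sorts levels by value. `design_rows` is a hypothetical helper, and the five X values are made up.

```python
def design_rows(x_values, direct):
    """Rows of the design matrix (intercept first) for the main effect X.

    direct=True: X enters as one column holding its actual values.
    direct=False: X is a classification variable with k levels, coded with
        k-1 deviation columns; the last level gets -1 in every column.
    """
    if direct:
        return [[1, x] for x in x_values]
    levels = sorted(set(x_values))  # ORDER=INTERNAL-style ordering
    k = len(levels)
    rows = []
    for x in x_values:
        i = levels.index(x)
        if i == k - 1:
            coded = [-1] * (k - 1)
        else:
            coded = [1 if j == i else 0 for j in range(k - 1)]
        rows.append([1] + coded)
    return rows


xs = [10, 20, 30, 40, 50]  # hypothetical X with five levels
direct_x = design_rows(xs, direct=True)   # 1 column for X
class_x = design_rows(xs, direct=False)   # 4 columns for X
```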

For more details, see the discussions of main and direct effects in the section 'Generation of the Design Matrix' on page 876.

FACTORS Statement

  • FACTORS factor-description <, ..., factor-description > < / options > ;

where a factor-description is

  • factor-name < $ > < levels >

and factor-descriptions are separated from each other by a comma. The $ is required for character-valued factors. The value of levels provides the number of levels of the factor identified by a given factor-name. For only one factor, levels is optional; for two or more factors, it is required.

The FACTORS statement identifies factors that distinguish response functions from others in the same population. It also specifies how those factors are incorporated into the model. You can use the FACTORS statement whenever there is more than one response function per population and the keyword _RESPONSE_ is specified in the MODEL statement. You can specify the name, type, and number of levels of each factor and the identification of each level.

The FACTORS statement is most useful when the response functions and their covariance matrix are read directly from the input data set. In this case, PROC CATMOD reads the response functions as though they are from one population (this poses no problem in the multiple-population case because the appropriately constructed covariance matrix is also read directly). Thus, you can use the FACTORS statement to partition the variation among the response functions into appropriate sources, even when the functions actually represent separate populations.

The format of the FACTORS statement is identical to that of the REPEATED statement. In fact, repeated measurement factors are simply special cases of factors in which some of the response functions correspond to multiple dependent variables that are measurements on the same experimental (or sampling) units.

You cannot specify the FACTORS statement for an analysis that also contains the REPEATED or LOGLIN statement since all of them specify the same information: how to partition the variation among the response functions within a population.

In the FACTORS statement,

factor-name

names a factor that corresponds to two or more response functions. This name must be a valid SAS variable name, and it should not be the same as the name of a variable that already exists in the data set being analyzed.

$

indicates that the factor is character-valued. If the $ is omitted, then PROC CATMOD assumes that the factor is numeric. The type of the factor is relevant only when you use the PROFILE= option or when the _RESPONSE_= option (described later in this section) specifies nested-by-value effects.

levels

specifies the number of levels of the corresponding factor. If there is only one such factor, and the number is omitted, then PROC CATMOD assumes that the number of levels is equal to the number of response functions per population (q). Unless you specify the PROFILE= option, the number q must either be equal to or be a multiple of the product of the number of levels of all the factors.

You can specify the following options in the FACTORS statement after a slash.

PROFILE=( matrix )

  • specifies the values assumed by the factors for each response function. There should be one column for each factor, and the values in a given column (character or numeric) should match the type of the corresponding factor. Character values are restricted to 16 characters or less. If there are q response functions per population, then the matrix must have i rows, where q must either be equal to or be a multiple of i . Adjacent rows of the matrix should be separated by a comma.

  • The values in the PROFILE matrix are useful for specifying models in those situations where the study design is not a full factorial with respect to the factors. They can also be used to specify nested-by-value effects in the _RESPONSE_= option. If you specify character values in both places (the PROFILE= option and the _RESPONSE_= option), then the values must match with respect to whether or not they are enclosed in quotes (that is, enclosed in quotes in both places or in neither place).

  • For an example of using the PROFILE= option, see Example 22.10 on page 944.

_RESPONSE_= effects

  • specifies design effects. The variables named in the effects must be factor-names that appear in the FACTORS statement. If the _RESPONSE_= option is omitted, then PROC CATMOD builds a full factorial _RESPONSE_ effect with respect to the factors.

TITLE= ' title '

  • displays the title at the top of certain pages of output that correspond to the current FACTORS statement.

  • For an example of how the FACTORS statement is useful, consider the case where the response functions and their covariance matrix are read directly from the input data set. The TYPE=EST data set might be created in the following manner:

      data direct(type=est);
         input b1-b4 _type_ $ _name_ $8.;
         datalines;
      0.590463  0.384720  0.273269  0.136458  parms  .
      0.001690  0.000911  0.000474  0.000432  cov    b1
      0.000911  0.001823  0.000031  0.000102  cov    b2
      0.000474  0.000031  0.001056  0.000477  cov    b3
      0.000432  0.000102  0.000477  0.000396  cov    b4
      ;
  • Suppose the response functions correspond to four populations that represent the cross-classification of age (two groups) by sex. You can use the FACTORS statement to identify these two factors and to name the effects in the model. The statements required to fit a main-effects model to these data are

      proc catmod data=direct;
         response read b1-b4;
         model _f_=_response_;
         factors age 2, sex 2 / _response_=age sex;
      run;

    If you want to specify some nested-by-value effects, you can change the FACTORS statement to

      factors age $ 2, sex $ 2 /
              _response_=age sex(age='under 30') sex(age='30 & over')
              profile=('under 30'   male,
                       'under 30'   female,
                       '30 & over'  male,
                       '30 & over'  female);

    If, by design or by chance, the study contains no male subjects under 30 years of age, then there are only three response functions, and you can specify a main-effects model as

      proc catmod data=direct;
         response read b2-b4;
         model _f_=_response_;
         factors age $ 2, sex $ 2 / _response_=age sex
                 profile=('under 30'   female,
                          '30 & over'  male,
                          '30 & over'  female);
      run;

    When you specify two or more factors and omit the PROFILE= option, PROC CATMOD presumes that the response functions are ordered so that the levels of the rightmost factor change most rapidly. For the preceding example, the order implied by the FACTORS statement is as follows.

    Response Function   Dependent Variable   Age   Sex
    1                   b1                   1     1
    2                   b2                   1     2
    3                   b3                   2     1
    4                   b4                   2     2
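The "rightmost factor changes most rapidly" rule is the same ordering that Python's `itertools.product` uses. The sketch below reproduces the table above and checks the requirement that q equal, or be a multiple of, the product of the numbers of levels. Both helpers are hypothetical illustrations, not PROC CATMOD internals.

```python
from itertools import product
from math import prod


def factor_profiles(levels):
    """Level combinations implied when PROFILE= is omitted: one tuple per
    response function, with the rightmost factor changing most rapidly."""
    return list(product(*(range(1, n + 1) for n in levels)))


def q_is_valid(levels, q):
    """q must equal, or be a multiple of, the product of the numbers of levels."""
    return q % prod(levels) == 0


profiles = factor_profiles([2, 2])  # age 2, sex 2
table = dict(zip(["b1", "b2", "b3", "b4"], profiles))  # variable -> (age, sex)
```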

    For additional examples of how to use the FACTORS statement, see the section 'Repeated Measures Analysis' on page 873. All of the examples in that section are applicable, with the REPEATED statement replaced by the FACTORS statement.

LOGLIN Statement

  • LOGLIN effects < / option > ;

The LOGLIN statement is used to define log-linear model effects. It can be used whenever the default response functions (generalized logits) are used.

In the LOGLIN statement, effects are design effects that contain dependent variables in the MODEL statement, including interaction, nested, and nested-by-value effects. You can use the bar (|) and at (@) operators as well. The following lists of effects are equivalent:

   a b c a*b a*c b*c

and

   a|b|c @2
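The expansion of bar notation limited by @n can be reproduced by listing every combination of the variables up to the given size. The hypothetical Python helper below does this and returns the effects grouped by order. PROC CATMOD itself may list the expanded effects in a different sequence, which does not change the model.

```python
from itertools import combinations


def expand_bar(variables, max_order=None):
    """Expand bar notation such as a|b|c, optionally limited with @n,
    into the equivalent list of effects (main effects first)."""
    max_order = max_order or len(variables)
    effects = []
    for size in range(1, max_order + 1):
        for combo in combinations(variables, size):
            effects.append("*".join(combo))
    return effects


print(expand_bar(["a", "b", "c"], 2))
```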

When you use the LOGLIN statement, the keyword _RESPONSE_ should be specified in the MODEL statement. For further information on log-linear model analysis, see the 'Log-Linear Model Analysis' section on page 870.

You cannot specify the LOGLIN statement for an analysis that also contains the REPEATED or FACTORS statement since all of them specify the same information: how to partition the variation among the response functions within a population. You can specify the following option in the LOGLIN statement after a slash.

TITLE= ' title '

  • displays the title at the top of certain pages of output that correspond to this LOGLIN statement.

  • The following statements give an example of how to use the LOGLIN statement.

      proc catmod;
         model a*b*c=_response_;
         loglin a|b|c @ 2;
      run;
  • These statements yield a log-linear model analysis that contains all main effects and two-variable interactions. For more examples of log-linear model analysis, see the 'Log-Linear Model Analysis' section on page 870.
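For a complete table, you can count the residual degrees of freedom of such a model by hand. An effect involving variables with $l_1, l_2, \ldots$ levels contributes $(l_1-1)(l_2-1)\cdots$ parameters. The Python sketch below counts $n_p$ that way and returns $n_c - n_p$, the unadjusted degrees of freedom described under the DF= option of the MODEL statement. It counts one parameter for the overall mean. That is an assumption about how $n_p$ is tallied, but it gives the familiar single degree of freedom for the no-three-factor-interaction model on a 2×2×2 table. The helper name is hypothetical.

```python
from itertools import combinations
from math import prod


def unadjusted_df(levels, max_order):
    """n_c - n_p for a hierarchical log-linear model that contains every
    effect of order <= max_order, fitted to a complete table."""
    n_c = prod(levels)  # number of cells
    n_p = 1             # overall mean (see the assumption above)
    for size in range(1, max_order + 1):
        for combo in combinations(levels, size):
            # An effect among variables with l1, l2, ... levels has
            # (l1-1)(l2-1)... independent parameters.
            n_p += prod(l - 1 for l in combo)
    return n_c - n_p


# a|b|c @2 on a 2x2x2 table: 8 cells - (1 + 3 + 3) parameters.
print(unadjusted_df([2, 2, 2], 2))
```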

MODEL Statement

  • MODEL response-effect= < design-effects >< / options > ;

PROC CATMOD requires a MODEL statement. You can specify the following in a MODEL statement:

response-effect

can be either a single variable, a crossed effect with two or more variables joined by asterisks, or _F_. The _F_ specification indicates that the response functions and their estimated covariance matrix are to be read directly into the procedure (see the 'Inputting Response Functions and Covariances Directly' section on page 862 for details). The response-effect indicates the dependent variables that determine the response categories (the columns of the underlying contingency table).

design-effects

specify potential sources of variation (such as main effects and interactions) in the model. Thus, these effects determine the number of model parameters, as well as the interpretation of such parameters. In addition, if there is no POPULATION statement, PROC CATMOD uses these variables to determine the populations (the rows of the underlying contingency table). When fitting the model, PROC CATMOD adjusts the independent effects in the model for all other independent effects in the model.

Design-effects can be any of those described in the section 'Specification of Effects' on page 864, or they can be defined by specifying the actual design matrix, enclosed in parentheses (see the 'Specifying the Design Matrix Directly' section on page 847). In addition, you can use the keyword _RESPONSE_ alone or as part of an effect. Effects cannot be nested within _RESPONSE_, so effects of the form A (_RESPONSE_) are invalid.

For more information, see the 'Log-Linear Model Analysis' section on page 870 and the 'Repeated Measures Analysis' section on page 873.

Some examples of MODEL statements are

  model r=a b;                 /* main effects only */
  model r=a b a*b;             /* main effects with interaction */
  model r=a b(a);              /* nested effect */
  model r=a|b;                 /* complete factorial */
  model r=a b(a=1) b(a=2);     /* nested-by-value effects */
  model r*s=_response_;        /* log-linear model */
  model r*s=a _response_(a);   /* nested repeated measurement factor */
  model _f_=_response_;        /* direct input of the response functions */

The relationship between these specifications and the structure of the design matrix X is described in the 'Generation of the Design Matrix' section on page 876.

The following table summarizes the options available in the MODEL statement.

Task                                                          Options

Specify details of computation
  Generates maximum likelihood estimates                      ML=
  Generates weighted least-squares estimates                  GLS, WLS
  Omits intercept term from the model                         NOINT
  Specifies parameterization of classification variables      PARAM=
  Adds a number to each cell frequency                        ADDCELL=
  Averages main effects across response functions             AVERAGED
  Specifies the convergence criterion for maximum likelihood  EPSILON=
  Specifies the number of iterations for maximum likelihood   MAXITER=
  Specifies how missing cells are treated                     MISSING=
  Specifies how zero cells are treated                        ZERO=

Request additional computation and tables
  Significance level of confidence intervals                  ALPHA=
  Wald confidence intervals of estimates                      CLPARM
  Estimated correlation matrix of estimates                   CORRB
  Covariance matrix of response functions                     COV
  Estimated covariance matrix of estimates                    COVB
  Design and _RESPONSE_ matrix                                DESIGN
  Two-way frequency tables                                    FREQ
  Iterations for maximum likelihood                           ITPRINT
  One-way frequency tables                                    ONEWAY
  Predicted values                                            PRED=, PREDICT
  Probability estimates                                       PROB
  Population profiles                                         PROFILE
  Crossproducts matrix                                        XPX
  Title                                                       TITLE=

Suppress output
  Design matrix                                               NODESIGN
  Parameter estimates                                         NOPARM
  Variable levels                                             NOPREDVAR
  Population and response profiles                            NOPROFILE
  _RESPONSE_ matrix                                           NORESPONSE

The following list describes these options in alphabetical order.

ADDCELL= number

  • adds number to the frequency count in each cell, where number is any positive number. This option has no effect on maximum likelihood analysis; it is used only for weighted least-squares analysis.

ALPHA= number

  • sets the significance level for the Wald confidence intervals for parameter estimates. The value must be between 0 and 1. The default value of 0.05 results in the calculation of a 95% confidence interval. This option has no effect unless the CLPARM option is also specified.

AVERAGED

  • specifies that dependent variable effects can be modeled and that independent variable main effects are averaged across the response functions in a population. For further information on the effect of using (or not using) the AVERAGED option, see the 'Generation of the Design Matrix' section on page 876. Direct input of the design matrix or specification of the _RESPONSE_ keyword in the MODEL statement automatically induces an AVERAGED model type.

CLPARM

  • produces Wald confidence limits for the parameter estimates. The confidence coefficient can be specified with the ALPHA= option.

CORRB

  • displays the estimated correlation matrix of the parameter estimates.

COV

  • displays $S_i$, which is the covariance matrix of the response functions for each population.

COVB

  • displays the estimated covariance matrix of the parameter estimates.

DESIGN

  • displays the design matrix X for WLS and ML analyses, and also displays the _RESPONSE_ matrix for log-linear models. For further information, see the 'Generation of the Design Matrix' section on page 876.

EPSILON= number

  • specifies the convergence criterion for the maximum likelihood estimation of the parameters. The iterative estimation process stops when the proportional change in the log likelihood is less than number, or after the number of iterations specified by the MAXITER= option, whichever comes first. By default, EPSILON=1E-8.

FREQ

  • produces the two-way frequency table for the cross-classification of populations by responses.

ITPRINT

  • displays parameter estimates and other information at each iteration of a maximum likelihood analysis.

MAXITER= number

  • specifies the maximum number of iterations used for the maximum likelihood estimation of the parameters. By default, MAXITER=20.
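To make the EPSILON= and MAXITER= stopping rule concrete, here is a minimal Newton-Raphson fit of a one-predictor binary logit in Python. It is not PROC CATMOD's implementation. The function name, the zero starting values, and the reading of "proportional change" as |Δ log L| / |log L| are assumptions. Only the defaults 1E-8 and 20 come from the option descriptions above.

```python
import math


def logistic_nr(x, y, epsilon=1e-8, max_iter=20):
    """Newton-Raphson fit of logit(p) = b0 + b1*x for 0/1 responses y.

    Stops when the proportional change in the log likelihood is below
    epsilon or after max_iter iterations, whichever comes first.
    Returns (b0, b1, iterations).
    """
    def loglik(b0, b1):
        # log L = sum of y*eta - log(1 + e^eta)
        return sum(yi * (b0 + b1 * xi) - math.log1p(math.exp(b0 + b1 * xi))
                   for xi, yi in zip(x, y))

    b0 = b1 = 0.0
    old = loglik(b0, b1)
    for iteration in range(1, max_iter + 1):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
        new = loglik(b0, b1)
        if abs(new - old) / abs(old) < epsilon:
            return b0, b1, iteration
        old = new
    return b0, b1, max_iter


x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]  # hypothetical, not perfectly separated
b0, b1, iterations = logistic_nr(x, y)
```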

ML < =NR | IPF < ( ipf-options ) > >

  • computes maximum likelihood estimates (MLE) using either a Newton-Raphson algorithm (NR) or an iterative proportional fitting algorithm (IPF).

  • The option ML=NR (or simply ML) is available when you use generalized logits, and also when you perform binary logistic regression with logits, cumulative logits, or adjacent category logits. For generalized logits (the default response functions), ML=NR is the default estimation method.

  • The option ML=IPF is available for fitting a hierarchical log-linear model with one population (no independent variables and no population variables). The use of bar notation to express the log-linear effects guarantees that the model is hierarchical (the presence of any interaction term in the model requires the presence of all its lower-order terms). If your table is incomplete (that is, your table has a zero or missing entry in at least one cell), then all missing cells and all cells with zero weight are treated as structural zeros by default; this behavior can be modified with the ZERO= and MISSING= options in the MODEL statement.

  • You can control the convergence of the two algorithms with the EPSILON= and MAXITER= options in the MODEL statement. You can select the convergence criterion for the IPF algorithm with the CONVCRIT= option. Note: The RESTRICT statement is not available with the ML=IPF option.

  • You can specify the following ipf-options within parentheses after the ML=IPF option.

CONV= keyword

CONVCRIT= keyword

  • specifies the method that determines when convergence of the IPF algorithm occurs. You can specify one of the following keywords:

CELL

termination requires the maximum absolute difference between consecutive cell estimates to be less than 0.001 (or the value of the EPSILON= option, if specified).

LOGL

termination requires the relative difference between consecutive estimates of the log-likelihood to be less than 1E-8 (or the value of the EPSILON= option, if specified). This is the default.

MARGIN

termination requires the maximum absolute difference between consecutive margin estimates to be less than 0.001 (or the value of the EPSILON= option, if specified).

DF= keyword

  • specifies the method used to compute the degrees of freedom for the goodness-of-fit $G^2$ test (labeled 'Likelihood Ratio' in the 'Estimates' table).

  • For a complete table (a table having nonzero entries in every cell), the degrees of freedom are calculated as the number of cells in the table ($n_c$) minus the number of independent parameters specified in the model ($n_p$). For incomplete tables, these degrees of freedom may be adjusted by the number of fitted zeros ($n_z$, which includes the number of structural zeros) and the number of nonestimable parameters due to the zeros ($n_n$). If you are analyzing an incomplete table, you should verify that the degrees of freedom are correct.

  • You can specify one of the following keywords:

    UNADJ

    computes the unadjusted degrees of freedom as $n_c - n_p$. These are the same degrees of freedom you would get if all cells in the table were positive.

    ADJ

    computes the degrees of freedom as $(n_c - n_p) - (n_z - n_n)$ (Bishop, Fienberg, and Holland 1975), which adjusts for fitted zeros and nonestimable parameters. This is the default, and for complete tables it gives the same results as the UNADJ option.

    ADJEST

    computes the degrees of freedom as $(n_c - n_p) - n_z$, which adjusts for fitted zeros only. This gives a lower bound on the true degrees of freedom.

PARM

  • computes parameter estimates, generates the 'ANOVA,' 'Parameter Estimates,' and 'Predicted Values of Response Functions' tables, and includes the predicted standard errors in the 'Predicted Values of Frequencies and Probabilities' tables.

  • When you specify the PARM option, the algorithm used to obtain the maximum likelihood parameter estimates is weighted least squares on the IPF-predicted frequencies. This algorithm can be much faster than the Newton-Raphson algorithm used if you just specify the ML=NR option. In the resulting ANOVA table, the likelihood ratio is computed from the initial IPF fit while the degrees of freedom are generated from the WLS analysis; the DF= option can override this. Also, the initial response function, which the WLS method usually computes from the raw data, is computed from the IPF fitted frequencies.

  • If there are any zero marginals in the configurations that define the model, then there are predicted cell frequencies of zero and WLS cannot be used to compute the estimates. In this case, PROC CATMOD automatically changes the algorithm from ML=IPF to ML=NR and prints a note in the log.
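The idea behind ML=IPF and the DF= formulas can be illustrated outside SAS. The following Python sketch assumes NumPy. The function names `ipf` and `catmod_df` are hypothetical and are not part of PROC CATMOD. The sketch starts from a constant table and, in each cycle, rescales the fitted table so that each fitted margin in turn matches the observed one. It stops when the relative change in the log-likelihood kernel sum(n log m) is at most `epsilon`, or after `maxiter` cycles. Because it uses that kernel rather than the full log likelihood, its stopping rule only approximates CONVCRIT=LOGL. It also handles only tables whose cells are all positive, so structural zeros and the ZERO= and MISSING= options are not modeled.

```python
import numpy as np


def ipf(table, configs, epsilon=1e-8, maxiter=20):
    """Fit a hierarchical log-linear model by iterative proportional fitting.

    table   -- observed counts; this sketch assumes every cell is positive
    configs -- fitted margins given as tuples of axes, for example
               [(0, 1), (0, 2), (1, 2)] for the model with all two-way terms
    Stops when the relative change in sum(n * log(m)) is at most epsilon
    (similar in spirit to CONVCRIT=LOGL) or after maxiter cycles.
    """
    table = np.asarray(table, dtype=float)
    fitted = np.ones_like(table)  # a constant table satisfies any hierarchical model
    all_axes = set(range(table.ndim))
    old_ll = np.sum(table * np.log(fitted))
    for iteration in range(1, maxiter + 1):
        for config in configs:
            other = tuple(sorted(all_axes - set(config)))
            observed = table.sum(axis=other, keepdims=True)
            current = fitted.sum(axis=other, keepdims=True)
            fitted = fitted * observed / current  # this margin now matches exactly
        ll = np.sum(table * np.log(fitted))
        if abs(ll - old_ll) <= epsilon * abs(ll):
            break
        old_ll = ll
    return fitted, iteration


def catmod_df(n_c, n_p, n_z=0, n_n=0, method="ADJ"):
    """Degrees of freedom for the G^2 test under DF=UNADJ, ADJ, or ADJEST."""
    if method == "UNADJ":
        return n_c - n_p
    if method == "ADJ":
        return (n_c - n_p) - (n_z - n_n)
    if method == "ADJEST":
        return (n_c - n_p) - n_z
    raise ValueError(f"unknown DF= method: {method}")


# Hypothetical 2 x 2 x 2 table (all cells positive); model with all two-way interactions
counts = np.array([[[10, 20], [30, 40]],
                   [[15, 25], [35, 5]]])
fit, iterations = ipf(counts, [(0, 1), (0, 2), (1, 2)], epsilon=1e-12, maxiter=1000)
g2 = 2 * np.sum(counts * np.log(counts / fit))  # likelihood-ratio statistic
# Complete table: 8 cells and 1 + 3 + 3 = 7 parameters, so UNADJ and ADJ both give 1
print(iterations, round(g2, 3), catmod_df(8, 7))
```

For a two-way independence model, a single cycle already reproduces the familiar fitted values (row total times column total divided by N). Models with all two-way interactions in three or more dimensions genuinely need iteration.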

MISSING= keyword

MISS= keyword

  • specifies whether a missing cell is treated as a sampling or structural zero.

  • Structural zero cells are removed from the analysis since their expected values are zero, while sampling zero cells may have nonzero expected value and may be estimable. For a single population, the missing cells are treated as structural zeros by default. For multiple populations, as long as some population has a nonzero count for a given population and response profile, the missing values are treated as sampling zeros by default.

  • The following table displays the available keywords and summarizes how PROC CATMOD treats missing values for one or more populations.

    MISSING=               One Population                            Multiple Populations
    STRUCTURAL (default)   structural zeros                          sampling zeros
    SAMP | SAMPLING        sampling zeros                            sampling zeros
    value                  sets missing weights and cells to value   sets missing weights and cells to value

NODESIGN

  • suppresses the display of the design matrix X when the DESIGN option is also specified. This enables you to display only the _RESPONSE_ matrix for log-linear models.

NOINT

  • suppresses the intercept term in the model.

NOITER

  • suppresses the display of parameter estimates and other information at each iteration of a maximum likelihood analysis.

NOPARM

  • suppresses the display of the estimated parameters and the statistics for testing that each parameter is zero.

NOPREDVAR

  • suppresses the display of the variable levels in tables requested with the PRED= option and in the 'Estimates' table. Population profiles are replaced with the sample number, class variable levels are suppressed, and response profiles are replaced with a function number.

NOPRINT

  • suppresses the normal display of results. The NOPRINT option is useful when you only want to create output data sets with the OUT= or OUTEST= option in the RESPONSE statement. A NOPRINT option is also available in the PROC CATMOD statement. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 14, 'Using the Output Delivery System,' for more information.

NOPROFILE

  • suppresses the display of the population profiles and the response profiles.

NORESPONSE

  • suppresses the display of the _RESPONSE_ matrix for log-linear models when the DESIGN option is also specified. This enables you to display only the design matrix for log-linear models.

ONEWAY

  • produces a one-way table of frequencies for each variable used in the analysis. This table is useful in determining the order of the observed levels for each variable.

PARAM=EFFECT | REFERENCE

  • specifies the parameterization method for the classification variable or variables. The default is PARAM=EFFECT. Both the effect and reference parameterizations are full rank. See the 'Generation of the Design Matrix' section on page 876 for further details.

PREDICT

PRED=FREQ | PROB

  • displays the observed and predicted values of the response functions for each population, together with their standard errors and the residuals (observed - predicted). In addition, if the response functions are the standard ones (generalized logits), then the PRED=FREQ option specifies the computation and display of predicted cell frequencies, while PRED=PROB (or just PREDICT) specifies the computation and display of predicted cell probabilities.

  • The OUT= data set always contains the predicted probabilities. If the response functions are the generalized logits, the predicted cell probabilities are output unless the option PRED=FREQ is specified, in which case the predicted cell frequencies are output.

PROB

  • produces the two-way table of probability estimates for the cross-classification of populations by responses. These estimates sum to one across the response categories for each population.

PROFILE

  • displays all of the population profiles. If you have more than 60 populations, then by default only the first 40 profiles are displayed; the PROFILE option overrides this default behavior.

TITLE=' title '

  • displays the title at the top of certain pages of output that correspond to this MODEL statement.

WLS

GLS

  • computes weighted least-squares estimates. This type of estimation is also called generalized-least-squares estimation. For response functions other than the default (of generalized logits), WLS is the default estimation method.

XPX

  • displays $X' S^{-1} X$, the crossproducts matrix for the normal equations.

ZERO= keyword

ZEROS= keyword

ZEROES= keyword

  • specifies whether a non-missing cell with zero weight in the data set is treated as a sampling or structural zero.

  • Structural zero cells are removed from the analysis since their expected values are zero, while sampling zero cells have nonzero expected value and may be estimable. For a single population, the zero cells are treated as structural zeros by default; with multiple populations, as long as some population has a nonzero count for a given population and response profile, the zeros are treated as sampling zeros by default.

  • The following table displays the available keywords and summarizes how PROC CATMOD treats zeros for one or more populations.

    ZERO=                  One Population               Multiple Populations
    STRUCTURAL (default)   structural zeros             sampling zeros
    SAMP | SAMPLING        sampling zeros               sampling zeros
    value                  sets zero weights to value   sets zero weights to value

Specifying the Design Matrix Directly

If you specify the design matrix directly, adjacent rows of the matrix must be separated by a comma, and the matrix must have $qs$ rows, where s is the number of populations and q is the number of response functions per population. The first q rows correspond to the response functions for the first population, the second set of q rows corresponds to the functions for the second population, and so forth. The following is an example using direct specification of the design matrix.

  proc catmod;
     model R=(1 0,
              1 1,
              1 2,
              1 3);
  run;

These statements are appropriate for the case of one population and for R with five levels (generating four response functions), so that $q \times s = 4 \times 1 = 4$. These statements are also appropriate for a situation with two populations and two response functions per population, giving $2 \times 2 = 4$ rows of the design matrix. (To induce more than one population, the POPULATION statement is needed.)

When you input the design matrix directly, you also have the option of specifying that any subsets of the parameters be tested for equality to zero. Indicate each subset by specifying the appropriate column numbers of the design matrix, followed by an equal sign and a label (24 characters or less, in quotes) that describes the subset. Adjacent subsets are separated by a comma, and the entire specification is enclosed in parentheses and placed after the design matrix. For example,

  proc catmod;
     population Group Time;
     model R=(1  1  0  0,
              1  1  0  1,
              1  1  0  2,
              1  0  1  0,
              1  0  1  1,
              1  0  1  2,
              1 -1 -1  0,
              1 -1 -1  1,
              1 -1 -1  2)
             (1  ='Intercept',
              2 3='Group main effect',
              4  ='Linear effect of Time');
  run;

The preceding statements are appropriate when Group and Time each have three levels, and R is dichotomous. The POPULATION statement induces nine populations, and $q = 1$ (since R is dichotomous), so $qs = 1 \times 9 = 9$.
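It may help to see where the matrix in this example comes from. The following Python sketch assumes NumPy and rebuilds the matrix from three pieces: an intercept column, two effect-coded columns for the three-level Group factor (the last level coded -1 in both columns), and a column of linear scores 0, 1, 2 for Time. The row order (Group varying slowest, Time fastest) is read off the example itself. It is not meant as a general statement about how PROC CATMOD orders populations.

```python
import numpy as np

# Effect coding for the three-level Group factor: the last level gets -1 in every column
group_codes = [[1, 0], [0, 1], [-1, -1]]
time_scores = [0, 1, 2]  # linear scores for the three Time levels

# Group varies slowest and Time fastest, matching the row order in the example
X = np.array([[1, *g, t] for g in group_codes for t in time_scores])

q, s = 1, 9  # one response function per population (R is dichotomous), nine populations
assert X.shape == (q * s, 4)
print(X)
```

Columns 2 and 3 together carry the Group main effect and column 4 the linear Time effect. This is why the example tests the subsets 2 3 and 4 separately.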

If you input the design matrix directly but do not specify any subsets of the parameters to be tested, then PROC CATMOD tests the effect of MODEL MEAN, which represents the significance of the model beyond what is explained by an overall mean. For the previous example, the MODEL MEAN effect is the same as that obtained by specifying

  (2 3 4='modelmean');  

at the end of the MODEL statement.

POPULATION Statement

  • POPULATION variables ;

The POPULATION statement specifies that populations are to be based only on cross-classifications of the specified variables . If you do not specify the POPULATION statement, then populations are based only on cross-classifications of the independent variables in the MODEL statement.

The POPULATION statement has two major uses:

  • When you enter the design matrix directly, there are no independent variables in the MODEL statement; therefore, the POPULATION statement is the only way of inducing more than one population.

  • When you fit a reduced model, the POPULATION statement may be necessary if you want to form the same number of populations as there are for the saturated model.

To illustrate the first use, suppose that you specify the following statements:

  data one;
     input A $ B $ wt @@;
     datalines;
  yes yes 23   yes no 31   no yes 47   no no 50
  ;
  proc catmod;
     weight wt;
     population B;
     model A=(1 0,
              1 1);
  run;

Since the dependent variable A has two levels, there is one response function per population. Since the variable B has two levels, there are two populations. Thus, the MODEL statement is valid since the number of rows in the design matrix (2) is the same as the total number of response functions. If the POPULATION statement were omitted, there would be only one population and one response function, and the MODEL statement would be invalid.

To illustrate the second use, suppose that you specify

  data two;
     input A $ B $ Y wt @@;
     datalines;
  yes  yes  1  23       yes yes 2 63
  yes  no   1  31       yes no  2 70
  no   yes  1  47       no  yes 2 80
  no   no   1  50       no  no  2 84
  ;
  proc catmod;
     weight wt;
     model Y=A B A*B / wls;
  run;

These statements form four populations and produce the following design matrix and analysis of variance table.

    Source      DF   Chi-Square   Pr > ChiSq
    Intercept    1        48.10       <.0001
    A            1         3.47       0.0625
    B            1         0.25       0.6186
    A*B          1         0.19       0.6638
    Residual

Since the B and A*B effects are nonsignificant (p > 0.10), you may want to fit the reduced model that contains only the A effect. If your new statements are

  proc catmod;
     weight wt;
     model Y=A / wls;
  run;

then only two populations are formed, and the design matrix and the analysis of variance table are as follows.

    Source      DF   Chi-Square   Pr > ChiSq
    Intercept    1        47.94       <.0001
    A            1         3.33       0.0678
    Residual

However, if the new statements are

  proc catmod;
     weight wt;
     population A B;
     model Y=A / wls;
  run;

then four populations are formed, and the design matrix and the analysis of variance table are as follows.

    Source      DF   Chi-Square   Pr > ChiSq
    Intercept    1        47.76       <.0001
    A            1         3.30       0.0694
    Residual     2         0.35       0.8374

The advantage of the latter analysis is that it retains four populations for the reduced model, thereby creating a built-in goodness-of-fit test: the residual chi-square. Such a test is important because the cumulative (or joint) effect of deleting two or more effects from the model may be significant, even if the individual effects are not.

The resulting differences between the two analyses are due to the fact that the latter analysis uses pure weighted least-squares estimates with respect to the four populations that are actually sampled. The former analysis pools populations and therefore uses parameter estimates that can be regarded as weighted least-squares estimates of maximum likelihood predicted cell frequencies. In any case, the estimation methods are asymptotically equivalent; therefore, the results are very similar. If you specify the ML option (instead of the WLS option) in the MODEL statements, then the parameter estimates are identical for the two analyses.
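The three WLS analyses above can be repeated by hand. The following Python sketch assumes NumPy, and its helper names are hypothetical. It also rests on three modeling assumptions:

- For the dichotomous Y, each population contributes one logit log(p1/p2) with estimated variance 1/n1 + 1/n2.
- Populations are sorted by the character values of A and B.
- A and B are effect coded, with the first level coded 1 and the last level coded -1.

Under these assumptions, the Wald chi-squares and the residual chi-square should reproduce the Chi-Square columns of the three tables.

```python
import numpy as np

# Data set TWO: (A, B) -> (weight for Y=1, weight for Y=2)
counts = {
    ("yes", "yes"): (23, 63),
    ("yes", "no"): (31, 70),
    ("no", "yes"): (47, 80),
    ("no", "no"): (50, 84),
}


def logit_and_variance(n1, n2):
    """Logit log(p1/p2) for a dichotomous Y and its estimated variance 1/n1 + 1/n2."""
    return np.log(n1 / n2), 1.0 / n1 + 1.0 / n2


def responses(cells):
    """Stack the logits and their variances for a list of (n1, n2) populations."""
    pairs = [logit_and_variance(n1, n2) for n1, n2 in cells]
    return np.array([f for f, _ in pairs]), np.array([v for _, v in pairs])


def wls(F, V, X):
    """WLS fit of F = X beta with diagonal covariance V.

    Returns the estimates, a Wald chi-square for each parameter, and the
    residual (goodness-of-fit) chi-square.
    """
    W = np.diag(1.0 / V)
    cov = np.linalg.inv(X.T @ W @ X)  # inverse of X' S^-1 X
    beta = cov @ X.T @ W @ F
    resid = F - X @ beta
    return beta, beta**2 / np.diag(cov), float(resid @ W @ resid)


keys = sorted(counts)  # (no,no), (no,yes), (yes,no), (yes,yes)
F4, V4 = responses([counts[k] for k in keys])
a = np.array([1.0 if k[0] == "no" else -1.0 for k in keys])  # effect coding for A
b = np.array([1.0 if k[1] == "no" else -1.0 for k in keys])  # effect coding for B
one = np.ones(4)

# MODEL Y=A B A*B: four populations, saturated (residual DF = 0)
_, chisq_full, _ = wls(F4, V4, np.column_stack([one, a, b, a * b]))

# MODEL Y=A with no POPULATION statement: B is pooled, leaving two populations
pooled = [np.add(counts[(lev, "no")], counts[(lev, "yes")]) for lev in ("no", "yes")]
F2, V2 = responses(pooled)
_, chisq_pooled, _ = wls(F2, V2, np.array([[1.0, 1.0], [1.0, -1.0]]))

# POPULATION A B; MODEL Y=A: four populations, two parameters, residual DF = 2
_, chisq_reduced, resid_reduced = wls(F4, V4, np.column_stack([one, a]))

print(np.round(chisq_full, 2))
print(np.round(chisq_pooled, 2))
print(np.round(chisq_reduced, 2), round(resid_reduced, 2))
```

Only the last fit leaves residual degrees of freedom. That is the built-in goodness-of-fit test that the POPULATION statement preserves.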

CAUTION: if your model has different covariate profiles within any population, then the first profile is used in the analysis.

REPEATED Statement

  • REPEATED factor-description < , ..., factor-description >< / options > ;

where a factor-description is

  • factor-name < $ >< levels >

and factor-descriptions are separated from each other by a comma. The $ is required for character-valued factors. The value of levels provides the number of levels of the repeated measurement factor identified by a given factor-name. For only one repeated measurement factor, levels is optional; for two or more repeated measurement factors, it is required.

The REPEATED statement incorporates repeated measurement factors into the model. You can use this statement whenever there is more than one dependent variable and the keyword _RESPONSE_ is specified in the MODEL statement. If the dependent variables correspond to one or more repeated measurement factors, you can use the REPEATED statement to define _RESPONSE_ in terms of those factors. You can specify the name, type, and number of levels of each factor, as well as the identification of each level.

You cannot specify the REPEATED statement for an analysis that also contains the FACTORS or LOGLIN statement since all of them specify the same information: how to partition the variation among the response functions within a population.

In the REPEATED statement,

factor-name

names a repeated measurement factor that corresponds to two or more response functions. This name must be a valid SAS variable name, and it should not be the same as the name of a variable that already exists in the data set being analyzed.

$

indicates that the factor is character-valued. If the $ is omitted, then PROC CATMOD assumes that the factor is numeric. The type of the factor is relevant only when you use the PROFILE= option or when the _RESPONSE_= option specifies nested-by-value effects.

levels

specifies the number of levels of the corresponding repeated measurement factor. If there is only one such factor and the number is omitted, then PROC CATMOD assumes that the number of levels is equal to the number of response functions per population ( q ). Unless you specify the PROFILE= option, the number q must either be equal to or be a multiple of the product of the number of levels of all the factors.

You can specify the following options in the REPEATED statement after a slash.

PROFILE=( matrix )

  • specifies the values assumed by the factors for each response function. There should be one column for each factor, and the values in a given column should match the type (character or numeric) of the corresponding factor. Character values are restricted to 16 characters or less. If there are q response functions per population, then the matrix must have i rows, where q must either be equal to or be a multiple of i . Adjacent rows of the matrix should be separated by a comma.

  • The values in the PROFILE matrix are useful for specifying models in those situations where the study design is not a full factorial with respect to the factors. They can also be used to specify nested-with-value effects in the _RESPONSE_= option. If you specify character values in both the PROFILE= option and the _RESPONSE_= option, then the values must match with respect to whether or not they are enclosed in quotes (that is, enclosed in quotes in both places or in neither place).

_RESPONSE_= effects

  • specifies design effects. The variables named in the effects must be factor-names that appear in the REPEATED statement. If the _RESPONSE_= option is omitted, then PROC CATMOD builds a full factorial _RESPONSE_ effect with respect to the repeated measurement factors. For example, the following two statements are equivalent in that they produce the same parameter estimates.

      repeated Time 2, Treatment 2;
      repeated Time 2, Treatment 2 / _response_=Time|Treatment;

  • However, the second statement produces tests of the Time, Treatment, and Time*Treatment effects in the 'Analysis of Variance' table, whereas the first statement produces a single test for the combined effects in _RESPONSE_.

TITLE= ' title '

  • displays the title at the top of certain pages of output that correspond to this REPEATED statement.

    For further information and numerous examples of the REPEATED statement, see the section 'Repeated Measures Analysis' on page 873.

RESPONSE Statement

  • RESPONSE < function >< / options > ;

The RESPONSE statement specifies functions of the response probabilities. The procedure models these response functions as linear combinations of the parameters.

By default, PROC CATMOD uses the standard response functions (generalized logits, which are explained in detail in the 'Understanding the Standard Response Functions' section on page 859). With these standard response functions, the default estimation method is maximum likelihood, but you can use the WLS option in the MODEL statement to request weighted least-squares estimation. With other response functions (specified in the RESPONSE statement), the default (and only) estimation method is weighted least squares.

You can specify more than one RESPONSE statement, in which case each RESPONSE statement produces a separate analysis. If the computed response functions for any population are linearly dependent (yielding a singular covariance matrix), then PROC CATMOD displays an error message and stops processing. See the 'Cautions' section on page 887 for methods of dealing with this.

The function specification can be any of the items in the following list. For an example of response functions generated and formulas for q (the number of response functions), see the 'More on Response Functions' section on page 854.

ALOGIT ALOGITS

specifies response functions as adjacent-category logits of the marginal probabilities for each of the dependent variables. For each dependent variable, the response functions are a set of linearly independent adjacent-category logits, obtained by taking the logarithms of the ratios of two probabilities. The denominator of the kth ratio is the marginal probability corresponding to the kth level of the variable, and the numerator is the marginal probability corresponding to the (k + 1)th level. If a dependent variable has two levels, then the adjacent-category logit is the negative of the generalized logit.

CLOGIT CLOGITS

specifies that the response functions are cumulative logits of the marginal probabilities for each of the dependent variables. For each dependent variable, the response functions are a set of linearly independent cumulative logits, obtained by taking the logarithms of the ratios of two probabilities. The denominator of the kth ratio is the cumulative probability, $c_k$, corresponding to the kth level of the variable, and the numerator is $1 - c_k$ (Agresti 1984, 113-114). If a dependent variable has two levels, then PROC CATMOD computes its cumulative logit as the negative of its generalized logit. You should use cumulative logits only when the dependent variables are ordinally scaled.

JOINT

specifies that the response functions are the joint response probabilities. A linearly independent set is created by deleting the last response probability. For the case of one dependent variable, the JOINT and MARGINALS specifications are equivalent.

LOGIT LOGITS

specifies that the response functions are generalized logits of the marginal probabilities for each of the dependent variables. For each dependent variable, the response functions are a set of linearly independent generalized logits, obtained by taking the logarithms of the ratios of two probabilities. The denominator of each ratio is the marginal probability corresponding to the last observed level of the variable, and the numerators are the marginal probabilities corresponding to each of the other levels. If there is one dependent variable, then specifying LOGIT is equivalent to using the standard response functions.

MARGINAL MARGINALS

specifies that the response functions are marginal probabilities for each of the dependent variables in the MODEL statement. For each dependent variable, the response functions are a set of linearly independent marginals, obtained by deleting the marginal probability corresponding to the last level.

MEAN MEANS

specifies that the response functions are the means of the dependent variables in the MODEL statement. This specification requires that all of the dependent variables be numeric.

READ variables

specifies that the response functions and their covariance matrix are to be read directly from the input data set with one response function for each variable named. See the section 'Inputting Response Functions and Covariances Directly' on page 862 for more information.

transformation

specifies response functions that can be expressed by using successive applications of the four operations: LOG, EXP, * matrix literal, or + matrix literal. The operations are described in detail in the 'Using a Transformation to Specify Response Functions' section on page 856.
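For a single dependent variable, the LOGITS, ALOGITS, CLOGITS, MARGINALS, and MEANS definitions above can be checked numerically. The following Python sketch assumes NumPy, and `response_functions` is a hypothetical helper, not a SAS interface. It computes each set of functions from a vector of marginal probabilities. For MEANS it assumes that the levels are valued 1, 2, ..., r, as in the 'More on Response Functions' examples. It ignores the multivariable case and the covariance matrix that PROC CATMOD also computes. For one dependent variable, JOINT coincides with MARGINALS.

```python
import numpy as np


def response_functions(p, scores=None):
    """Response functions for one dependent variable with r levels.

    p      -- marginal probabilities p_1, ..., p_r (summing to 1)
    scores -- numeric level values used by MEANS (default 1, 2, ..., r)
    """
    p = np.asarray(p, dtype=float)
    r = len(p)
    c = np.cumsum(p)[:-1]  # cumulative probabilities c_1, ..., c_{r-1}
    if scores is None:
        scores = np.arange(1, r + 1)
    return {
        "LOGITS": np.log(p[:-1] / p[-1]),   # log(p_k / p_r), last level as baseline
        "ALOGITS": np.log(p[1:] / p[:-1]),  # log(p_{k+1} / p_k)
        "CLOGITS": np.log((1 - c) / c),     # log((1 - c_k) / c_k)
        "MARGINALS": p[:-1],                # drop the last level
        "MEANS": np.array([np.dot(scores, p)]),
    }


for name, values in response_functions([0.2, 0.3, 0.5]).items():
    print(name, values)

# With two levels, ALOGITS and CLOGITS are both the negative of LOGITS
two = response_functions([0.3, 0.7])
print(two["ALOGITS"], two["CLOGITS"], -two["LOGITS"])
```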

You can specify the following options in the RESPONSE statement after a slash.

OUT= SAS-data-set

  • produces a SAS data set that contains, for each population, the observed and predicted values of the response functions, their standard errors, and the residuals. Moreover, if you use the standard response functions, the data set also includes observed and predicted values of the cell frequencies or the cell probabilities. For further information, see the 'Output Data Sets' section on page 866.

OUTEST= SAS-data-set

  • produces a SAS data set that contains the estimated parameter vector and its estimated covariance matrix. For further information, see the 'Output Data Sets' section on page 866.

TITLE= ' title'

  • displays the title at the top of certain pages of output that correspond to this RESPONSE statement.

More on Response Functions

Suppose the dependent variable A has 3 levels and is the only response-effect in the MODEL statement. The following table shows the proportions upon which the response functions are defined.

    Value of A:     1       2       3
    proportions:    $p_1$   $p_2$   $p_3$

Note that $\sum_j p_j = 1$. The following table shows the response functions generated for each population.

    Function Specification   Value of q   Response Function
    none [*]                 2            $\log(p_1/p_3)$, $\log(p_2/p_3)$
    ALOGITS                  2            $\log(p_2/p_1)$, $\log(p_3/p_2)$
    CLOGITS                  2            $\log((p_2+p_3)/p_1)$, $\log(p_3/(p_1+p_2))$
    JOINT                    2            $p_1$, $p_2$
    LOGITS                   2            $\log(p_1/p_3)$, $\log(p_2/p_3)$
    MARGINAL                 2            $p_1$, $p_2$
    MEAN                     1            $1p_1 + 2p_2 + 3p_3$

[*] Without a function specification, the default response functions are generalized logits.

Now, suppose the dependent variables A and B each have 3 levels (valued 1, 2, and 3 each) and the response-effect in the MODEL statement is A * B . The following table shows the proportions upon which the response functions are defined.

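    Value of A:     1       1       1       2       2       2       3       3       3
    Value of B:     1       2       3       1       2       3       1       2       3
    proportions:    $p_1$   $p_2$   $p_3$   $p_4$   $p_5$   $p_6$   $p_7$   $p_8$   $p_9$

The marginal totals for the preceding table are defined as follows,

$$p_{1\cdot} = p_1 + p_2 + p_3, \quad p_{2\cdot} = p_4 + p_5 + p_6, \quad p_{3\cdot} = p_7 + p_8 + p_9$$

$$p_{\cdot 1} = p_1 + p_4 + p_7, \quad p_{\cdot 2} = p_2 + p_5 + p_8, \quad p_{\cdot 3} = p_3 + p_6 + p_9$$

where $\sum_j p_j = 1$. The following table shows the response functions generated for each population.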

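    Function Specification   Value of q   Response Function
    none [*]                 8            $\log(p_1/p_9)$, $\log(p_2/p_9)$, ..., $\log(p_8/p_9)$
    ALOGITS                  4            $\log(p_{2\cdot}/p_{1\cdot})$, $\log(p_{3\cdot}/p_{2\cdot})$, $\log(p_{\cdot 2}/p_{\cdot 1})$, $\log(p_{\cdot 3}/p_{\cdot 2})$
    CLOGITS                  4            $\log((p_{2\cdot}+p_{3\cdot})/p_{1\cdot})$, $\log(p_{3\cdot}/(p_{1\cdot}+p_{2\cdot}))$, $\log((p_{\cdot 2}+p_{\cdot 3})/p_{\cdot 1})$, $\log(p_{\cdot 3}/(p_{\cdot 1}+p_{\cdot 2}))$
    JOINT                    8            $p_1, p_2, p_3, p_4, p_5, p_6, p_7, p_8$
    LOGITS                   4            $\log(p_{1\cdot}/p_{3\cdot})$, $\log(p_{2\cdot}/p_{3\cdot})$, $\log(p_{\cdot 1}/p_{\cdot 3})$, $\log(p_{\cdot 2}/p_{\cdot 3})$
    MARGINAL                 4            $p_{1\cdot}, p_{2\cdot}, p_{\cdot 1}, p_{\cdot 2}$
    MEAN                     2            $1p_{1\cdot} + 2p_{2\cdot} + 3p_{3\cdot}$, $1p_{\cdot 1} + 2p_{\cdot 2} + 3p_{\cdot 3}$

[*] Without a function specification, the default response functions are generalized logits.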

The READ and transformation function specifications are not shown in the preceding table. For these two situations, there is not a general response function; the response functions generated depend on what you specify.

Another important aspect of the function specification is the number of response functions generated per population, q. Let $m_i$ represent the number of levels for the ith dependent variable in the MODEL statement, and let d represent the number of dependent variables in the MODEL statement. Then, if the function specification is ALOGITS, CLOGITS, LOGITS, or MARGINALS, the number of response functions is

$$q = \sum_{i=1}^{d} (m_i - 1)$$

If the function specification is JOINT or the default (generalized logits), the number of response functions per population is

$$q = r - 1$$

where r is the number of response profiles. If every possible cross-classification of the dependent variables is observed in the samples, then

$$r = \prod_{i=1}^{d} m_i$$

Otherwise, r is the number of cross-classifications actually observed.

If the function specification is MEANS, the number of response functions per population is q = d .
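The counting rules for q can be collected in a small Python function, offered as a sketch. The name `n_response_functions` is hypothetical. For ALOGITS, CLOGITS, LOGITS, and MARGINALS the function sums $m_i - 1$ over the dependent variables. For JOINT and the default generalized logits it returns r - 1, taking r as the product of the $m_i$ unless an observed number of profiles is supplied. For MEANS it returns d.

```python
from math import prod


def n_response_functions(levels, spec=None, n_profiles=None):
    """Number of response functions per population, q.

    levels     -- m_i, the number of levels of each dependent variable
    spec       -- "ALOGITS", "CLOGITS", "LOGITS", "MARGINALS", "JOINT",
                  "MEANS", or None for the default generalized logits
    n_profiles -- r, the number of observed response profiles; if omitted,
                  every cross-classification is assumed to be observed
    """
    if spec in ("ALOGITS", "CLOGITS", "LOGITS", "MARGINALS"):
        return sum(m - 1 for m in levels)
    if spec in ("JOINT", None):
        r = prod(levels) if n_profiles is None else n_profiles
        return r - 1
    if spec == "MEANS":
        return len(levels)
    raise ValueError(f"no general formula for {spec!r}")


# Two dependent variables with three levels each, as in the A*B table
for spec in ("ALOGITS", "CLOGITS", "LOGITS", "MARGINALS", "JOINT", None, "MEANS"):
    print(spec, n_response_functions([3, 3], spec))
```

READ and transformation specifications are deliberately rejected, because their q depends on what you specify.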

Response Statement Examples

Some example response statements are shown in the following table.

    Example                  Result
    response marginals;      marginals for each dependent variable
    response means;          the mean of each dependent variable
    response logits;         generalized logits of the marginal probabilities
    response clogits;        cumulative logits of the marginal probabilities
    response alogits;        adjacent-category logits of the marginal probabilities
    response joint;          the joint probabilities
    response 1 -1 log;       the logit
    response;                generalized logits
    response 1 2 3;          the mean score, with scores of 1, 2, and 3 corresponding to the three response levels
    response read b1-b4;     four response functions and their covariance matrix, read directly from the input data set

Using a Transformation to Specify Response Functions

If you specify a transformation , it is applied to the vector that contains the sample proportions in each population. The transformation can be any combination of the following four operations.

    Operation            Specification
    linear combination   * matrix literal   or   matrix literal
    logarithm            LOG
    exponential          EXP
    adding constant      + matrix literal

If more than one operation is specified, then PROC CATMOD applies the operations consecutively from right to left.

A matrix literal is a matrix of numbers with each row of the matrix separated from the next by a comma. If you specify a linear combination, in most cases the * is not needed. The following statement defines the response function $p_1 + 1$. The * is needed to separate the two matrix literals '1' and '1 0'.

  response + 1 * 1 0;  

The LOG of a vector transforms each element of the vector into its natural logarithm; the EXP of a vector transforms each element into its exponential function (antilogarithm).

In order to specify a linear response function for data that have r = 3 response categories, you could specify either of the following RESPONSE statements:

  response * 1 0 0 , 0 1 0;
  response   1 0 0 , 0 1 0;

The matrix literal in the preceding statements specifies a 2 × 3 matrix, which is applied to each population as follows:

$$\begin{bmatrix} F_1 \\ F_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}$$

where $p_1$, $p_2$, and $p_3$ are sample proportions for the three response categories in a population, and $F_1$ and $F_2$ are the two response functions computed for that population. This response function, therefore, sets $F_1 = p_1$ and $F_2 = p_2$ in each population.

As another example of the linear response function, suppose you have two dependent variables corresponding to two observers who evaluate the same subjects. If the observers grade on the same three-point scale and if all nine possible responses are observed, then the following RESPONSE statement would compute the probability that the observers agree on their assessments:

  response 1 0 0 0 1 0 0 0 1;  

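This response function is then computed as

$$F = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{21} & p_{22} & p_{23} & p_{31} & p_{32} & p_{33} \end{bmatrix}^{T} = p_{11} + p_{22} + p_{33}$$

where $p_{ij}$ denotes the probability that a subject gets a grade of i from the first observer and j from the second observer.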

If the function is a compound function, requiring more than one operation to specify it, then the operations should be listed so that the first operation to be applied is on the right and the last operation to be applied is on the left. For example, if there are two response levels, the response function

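  response 1 -1 log;

is equivalent to the matrix expression:

$$F = \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} \log p_1 \\ \log p_2 \end{bmatrix} = \log\left(\frac{p_1}{p_2}\right) = \log\left(\frac{p_1}{1 - p_1}\right)$$

which is the logit response function since $p_2 = 1 - p_1$ when there are only two response levels.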

Another example of a compound response function is

  response exp 1 -1 * 1 0 0 1, 0 1 1 0 log;

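which is equivalent to the matrix expression

$$F = \mathrm{EXP}(A \times B \times \mathrm{LOG}(P))$$

where P is the vector of sample proportions for some population,

$$A = \begin{bmatrix} 1 & -1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}, \quad P = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{bmatrix}$$

so that $F = \mathrm{EXP}(\log p_1 + \log p_4 - \log p_2 - \log p_3) = \dfrac{p_1 p_4}{p_2 p_3}$. If the four responses are based on two dependent variables, each with two levels, then the function can also be written as

$$F = \frac{p_{11}\, p_{22}}{p_{12}\, p_{21}}$$

which is the odds (crossproduct) ratio for a 2 × 2 table.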
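The right-to-left evaluation rule can be mimicked in a few lines. The following Python sketch assumes NumPy. `apply_transformation` and its list encoding of the operations are hypothetical conveniences, not PROC CATMOD syntax: each operation is "LOG", "EXP", a `("+", constants)` pair, or a nested list standing for a matrix literal. The sketch applies each of the statements discussed in this section to illustrative proportions.

```python
import numpy as np


def apply_transformation(ops, p):
    """Apply a RESPONSE-statement style transformation to proportions p.

    ops lists the operations left to right, as they are written in the
    statement; they are applied right to left. Each entry is "LOG", "EXP",
    ("+", constants) for adding a matrix literal, or a nested list for
    multiplying by a matrix literal.
    """
    f = np.asarray(p, dtype=float)
    for op in reversed(ops):
        if isinstance(op, str) and op == "LOG":
            f = np.log(f)
        elif isinstance(op, str) and op == "EXP":
            f = np.exp(f)
        elif isinstance(op, tuple) and op[0] == "+":
            f = f + np.asarray(op[1], dtype=float)
        else:
            f = np.asarray(op, dtype=float) @ f  # multiply by the matrix literal
    return f


# response + 1 * 1 0;  (two response levels): p1 + 1
print(apply_transformation([("+", [1]), [[1, 0]]], [0.3, 0.7]))
# response 1 -1 log;  the logit log(p1/p2)
print(apply_transformation([[[1, -1]], "LOG"], [0.25, 0.75]))
# response 1 0 0 0 1 0 0 0 1;  probability of agreement p11 + p22 + p33
agree = [0.2, 0.05, 0.05, 0.05, 0.3, 0.05, 0.0, 0.05, 0.25]
print(apply_transformation([[[1, 0, 0, 0, 1, 0, 0, 0, 1]]], agree))
# response exp 1 -1 * 1 0 0 1, 0 1 1 0 log;  odds ratio p1*p4 / (p2*p3)
print(apply_transformation(["EXP", [[1, -1]], [[1, 0, 0, 1], [0, 1, 1, 0]], "LOG"],
                           [0.1, 0.2, 0.3, 0.4]))
```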

Understanding the Standard Response Functions

If no RESPONSE statement is specified, PROC CATMOD computes the standard response functions, which contrast the log of each response probability with the log of the probability for the last response category. If there are r response categories, then there are r ˆ’ 1 standard response functions. For example, if there are four response categories, using no RESPONSE statement is equivalent to specifying

  response 1 0 0 -1, 0 1 0 -1, 0 0 1 -1 log;  

This results in three response functions:

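$$\begin{bmatrix}F_1 \\ F_2 \\ F_3\end{bmatrix} = \begin{bmatrix}1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1\end{bmatrix}\begin{bmatrix}\log p_1 \\ \log p_2 \\ \log p_3 \\ \log p_4\end{bmatrix} = \begin{bmatrix}\log(p_1/p_4) \\ \log(p_2/p_4) \\ \log(p_3/p_4)\end{bmatrix}$$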
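The equivalence between the default and the explicit RESPONSE statement can be checked with two runs on the same data. In the sketch below, the data set FOUR and the variables GROUP, RESP, and COUNT are hypothetical. WLS is requested explicitly in the first run so that both runs use the same estimation method.

```sas
/* Hypothetical data: a four-level response RESP, two populations */
/* defined by GROUP, and the cell counts in COUNT.                 */
data four;
   input group $ resp count;
   datalines;
A 1 10
A 2 20
A 3 15
A 4  5
B 1  8
B 2 12
B 3 20
B 4 10
;

/* Run 1: no RESPONSE statement, so the three standard response */
/* functions (generalized logits) are computed by default.      */
proc catmod data=four;
   weight count;
   model resp = group / wls;
run;
quit;

/* Run 2: the same three functions written out explicitly. */
proc catmod data=four;
   weight count;
   response 1 0 0 -1, 0 1 0 -1, 0 0 1 -1 log;
   model resp = group;
run;
quit;
```

If the two specifications are equivalent as described, the response-function tables and the parameter estimates from the two runs should agree.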

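If there are only two response levels, the resulting response function is a logit. Thus, the standard response functions are called generalized logits. They are useful in dealing with the log-linear model:

$$\pi = \exp(X\beta)$$

where $\pi$ is the vector of cell probabilities, $X$ is the design matrix, and $\beta$ is the vector of parameters.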

If C denotes the matrix in the preceding RESPONSE statement, then because of the restriction that the probabilities sum to 1, it follows that an equivalent model is

$$C\log(\pi) = (CX)\beta$$

But $C\log(P)$ is simply the vector of standard response functions. Thus, fitting a log-linear model on the cell probabilities is equivalent to fitting a linear model on the generalized logits.
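One way to see why the model $C\log(\pi) = (CX)\beta$ is equivalent to the log-linear model is to write the sum-to-one restriction as an explicit normalizing constant. This sketch assumes the log-linear model is read as $\pi = \exp(X\beta)/\bigl(\mathbf{1}'\exp(X\beta)\bigr)$, where $\mathbf{1}$ is a vector of ones:

$$\begin{aligned}
\log(\pi) &= X\beta - \log\bigl(\mathbf{1}'\exp(X\beta)\bigr)\,\mathbf{1} \\
C\log(\pi) &= CX\beta - \log\bigl(\mathbf{1}'\exp(X\beta)\bigr)\,C\mathbf{1} = (CX)\beta
\end{aligned}$$

The last step uses $C\mathbf{1} = \mathbf{0}$. Each row of $C$, for example $(1, 0, 0, -1)$, sums to zero, so the normalizing constant drops out.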

RESTRICT Statement

  • RESTRICT parameter=value < ... parameter=value > ;

where parameter is the letter B followed by a number; for example, B3 specifies the third parameter in the model. The value is the value to which the parameter is restricted. The RESTRICT statement restricts values of parameters to the values you specify, so that the estimation of the remaining parameters is subject to these restrictions. Consider the following statement:

  restrict b1=1 b4=0 b6=0;  

This restricts the values of three parameters. The first parameter is set to 1, and the fourth and sixth parameters are set to zero.

The RESTRICT statement is interactive. A new RESTRICT statement replaces any previous ones. In addition, if you submit two or more MODEL, LOGLIN, FACTORS, or REPEATED statements, then the subsequent occurrences of these statements also delete the previous RESTRICT statement.
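The following sketch places a RESTRICT statement in a complete step. The data set TRIAL and its variables are hypothetical. The sketch assumes that, with one response function and an effect-coded two-level GROUP variable, B1 is the intercept and B2 is the GROUP effect. The parameter listing in the output should be checked before relying on any B number.

```sas
/* Hypothetical two-population, two-response data with counts. */
data trial;
   input group $ resp $ count;
   datalines;
A yes 30
A no  20
B yes 18
B no  32
;

proc catmod data=trial;
   weight count;
   /* One logit per population; assumed order: B1 = intercept, */
   /* B2 = GROUP effect.                                        */
   model resp = group;
   /* Fix B2 at zero, so both populations share a single logit */
   /* and only B1 is estimated.                                */
   restrict b2=0;
run;
quit;
```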

WEIGHT Statement

  • WEIGHT variable ;

You can use a WEIGHT statement to refer to a variable containing the cell frequencies, which need not be integers. The WEIGHT statement lets you use summary data sets containing a count variable. See the 'Input Data Sets' section on page 860 for further information concerning the WEIGHT statement.
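A minimal sketch of a summary data set analyzed with a WEIGHT statement follows. The data set SUMMARY and the variables SEX, ANSWER, and WT are hypothetical. Each row is one cell of the table, and WT holds that cell's frequency, which here is deliberately not always an integer.

```sas
/* Hypothetical summary data: one row per cell, with a  */
/* (possibly non-integer) cell frequency in WT.          */
data summary;
   input sex $ answer $ wt;
   datalines;
F no  12.5
F yes 27.5
M no  20.0
M yes 15.0
;

proc catmod data=summary;
   weight wt;            /* WT supplies the cell frequencies    */
   model answer = sex;   /* one response function per SEX level */
run;
quit;
```

With no RESPONSE statement, each population gets the standard response function. Under the default ordering of response levels, no sorts before yes, so that function is $\log(p_{\text{no}}/p_{\text{yes}})$.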




SAS.STAT 9.1 Users Guide (Vol. 2)
SAS/STAT 9.1 Users Guide Volume 2 only
ISBN: B003ZVJDOK
EAN: N/A
Year: 2004
Pages: 92
