Syntax


The following statements are available in PROC LOGISTIC:

  • PROC LOGISTIC < options > ;

    • BY variables ;

    • CLASS variable < (v-options) > < variable < (v-options) > ... > < / v-options > ;

    • CONTRAST label effect values < , effect values >< / options > ;

    • EXACT < label >< Intercept >< effects >< / options > ;

    • FREQ variable ;

    • MODEL events/trials = < effects >< / options > ;

    • MODEL variable < (variable_options) > = < effects >< / options > ;

    • OUTPUT < OUT= SAS-data-set > < keyword=name ... keyword=name > < / option > ;

    • SCORE < options > ;

    • STRATA effects < / options > ;

    • < label: > TEST equation1 < , ... , < equationk >>< / option > ;

    • UNITS independent1=list1 < ... independentk=listk >< / option > ;

    • WEIGHT variable < / option > ;

The PROC LOGISTIC and MODEL statements are required; only one MODEL statement can be specified. The CLASS statement (if used) must precede the MODEL statement, and the CONTRAST, EXACT, and STRATA statements (if used) must follow the MODEL statement. The rest of this section provides detailed syntax information for each of the preceding statements, beginning with the PROC LOGISTIC statement. The remaining statements are covered in alphabetical order.
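The ordering rules can be sketched with a minimal step. The data set work.trial and the variables response, treatment, and age are hypothetical, and treatment is assumed to have two levels so that it contributes a single parameter under reference coding.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc logistic data=work.trial;
   class treatment / param=ref;               /* CLASS precedes MODEL      */
   model response = treatment age;            /* exactly one MODEL         */
   contrast 'Treatment effect' treatment 1;   /* CONTRAST follows MODEL    */
run;
```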

PROC LOGISTIC Statement

  • PROC LOGISTIC < options > ;

The PROC LOGISTIC statement starts the LOGISTIC procedure and optionally identifies input and output data sets and suppresses the display of results.

ALPHA= $\alpha$

  • specifies the level of significance $\alpha$ for $100(1 - \alpha)\%$ confidence intervals. The value $\alpha$ must be between 0 and 1; the default value is 0.05, which results in 95% intervals. This value is used as the default confidence level for limits computed by the following options.

    Statement    Options
    CONTRAST     ESTIMATE=
    EXACT        ESTIMATE=
    MODEL        CLODDS= CLPARM=
    OUTPUT       UCL= LCL=
    SCORE        CLM

    You can override the default in each of these cases by specifying the ALPHA= option for each statement individually.
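    A short sketch of such an override follows. The data set and variable names are hypothetical; the procedure-level ALPHA= sets 90% limits as the default, and the MODEL statement replaces it with 99% limits for its own CLPARM= and CLODDS= output.

```sas
/* Hypothetical data set and variables */
proc logistic data=work.trial alpha=0.10;      /* default: 90% limits      */
   model response = age weight
         / clparm=wald                         /* parameter limits         */
           clodds=wald                         /* odds ratio limits        */
           alpha=0.01;                         /* override: 99% limits     */
run;
```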

COVOUT

  • adds the estimated covariance matrix to the OUTEST= data set. For the COVOUT option to have an effect, the OUTEST= option must be specified. See the section OUTEST= Output Data Set on page 2374 for more information.

DATA= SAS-data-set

  • names the SAS data set containing the data to be analyzed. If you omit the DATA= option, the procedure uses the most recently created SAS data set. The INMODEL= option cannot be specified with this option.

DESCENDING

DESC

  • reverses the sorting order for the levels of the response variable. If both the DESCENDING and ORDER= options are specified, PROC LOGISTIC orders the levels according to the ORDER= option and then reverses that order. This option has the same effect as the response variable option DESCENDING in the MODEL statement. See the Response Level Ordering section on page 2329 for more detail.

EXACTONLY

  • requests only the exact analyses. The asymptotic analysis that PROC LOGISTIC usually performs is suppressed.

EXACTOPTIONS( options )

  • specifies options that apply to every EXACT statement in the program. The following options are available:

  • ADDTOBS adds the observed sufficient statistic to the sampled exact distribution if the statistic was not sampled. This option has no effect unless the METHOD=NETWORKMC option is specified and the ESTIMATE option is specified in the EXACT statement. If the observed statistic has not been sampled, then the parameter estimate does not exist; by specifying this option, you can produce (biased) estimates.

  • MAXTIME= seconds specifies the maximum clock time (in seconds) that PROC LOGISTIC can use to calculate the exact distributions. If the limit is exceeded, the procedure halts all computations and prints a note to the LOG. The default maximum clock time is seven days.

  • METHOD= keyword specifies which exact conditional algorithm to use for every EXACT statement specified. You can specify one of the following keywords:

    • DIRECT invokes the multivariate shift algorithm of Hirji, Mehta, and Patel (1987). This method directly builds the exact distribution, but it may require an excessive amount of memory in its intermediate stages. METHOD=DIRECT is invoked by default when you are conditioning out at most the intercept, or when the LINK=GLOGIT option is specified in the MODEL statement.

    • NETWORK invokes an algorithm similar to that described in Mehta, Patel, and Senchaudhuri (1992). This method builds a network for each parameter that you are conditioning out, combines the networks, then uses the multivariate shift algorithm to create the exact distribution. The NETWORK method can be faster and require less memory than the DIRECT method. The NETWORK method is invoked by default for most analyses.

    • NETWORKMC invokes the hybrid network and Monte Carlo algorithm of Mehta, Patel, and Senchaudhuri (2000). This method creates a network then samples from that network; this method does not reject any of the samples at the cost of using a large amount of memory to create the network. METHOD=NETWORKMC is most useful for producing parameter estimates for problems that are too large for the DIRECT and NETWORK methods to handle and for which asymptotic methods are invalid; for example, for sparse data on a large grid.

  • N= n specifies the number of Monte Carlo samples to take when METHOD=NETWORKMC. By default, n = 10,000. If the procedure cannot obtain n samples due to a lack of memory, then a note is printed in the LOG (the number of valid samples is also reported in the listing) and the analysis continues.

    Note that the number of samples used to produce any particular statistic may be smaller than n. For example, let $X_1$ and $X_2$ be continuous variables, denote their joint distribution by $f(X_1, X_2)$, and let $f(X_1 \mid X_2 = x_2)$ denote the marginal distribution of $X_1$ conditioned on the observed value of $X_2$. If you request the JOINT test of $X_1$ and $X_2$, then n samples are used to generate the estimate $\hat{f}(X_1, X_2)$ of $f(X_1, X_2)$, from which the test is computed. However, the parameter estimate for $X_1$ is computed from the subset of $\hat{f}(X_1, X_2)$ having $X_2 = x_2$, and this subset need not contain n samples. Similarly, the distribution for each level of a classification variable is created by extracting the appropriate subset from the joint distribution for the CLASS variable. The sample sizes used to compute the statistics are written to the ODS OUTPUT data set of the tables.

    In some cases, the marginal sample size may be too small to admit accurate estimation of a particular statistic; a note is printed in the LOG when a marginal sample size is less than 100. Increasing n will increase the number of samples used in a marginal distribution; however, if you want to control the sample size exactly, you can:

    • Remove the JOINT option from the EXACT statement.

    • Create dummy variables in a DATA step to represent the levels of a CLASS variable, and specify them as independent variables in the MODEL statement.

  • ONDISK uses disk-space instead of random access memory to build the exact conditional distribution. Use this option to handle larger problems at the cost of slower processing.

  • SEED= n specifies the initial seed for the random number generator used to take the Monte Carlo samples for METHOD=NETWORKMC. The value of the SEED= option must be an integer. If you do not specify a seed, or if you specify a value less than or equal to zero, then PROC LOGISTIC uses the time of day from the computer's clock to generate an initial seed. The seed is displayed in the Model Information table.

  • STATUSN= n prints a status line in the LOG after every n Monte Carlo samples for METHOD=NETWORKMC. The number of samples taken and the current exact p-value for testing the significance of the model are displayed. You can use this status line to track the progress of the computation of the exact conditional distributions.

  • STATUSTIME= seconds specifies the time interval (in seconds) for printing a status line in the LOG. You can use this status line to track the progress of the computation of the exact conditional distributions. The time interval you specify is approximate; the actual time interval will vary. By default, no status reports are produced.
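As an illustration of how these suboptions combine, the following sketch requests the Monte Carlo method with a fixed seed and periodic status lines. The data set, the variables, and the particular option values are hypothetical choices, not recommendations.

```sas
/* Hypothetical data set and variables; option values chosen arbitrarily */
proc logistic data=work.sparse
              exactoptions(method=networkmc   /* hybrid network/Monte Carlo */
                           n=20000            /* Monte Carlo samples        */
                           seed=20231         /* positive integer seed      */
                           statustime=60);    /* status line about every minute */
   model response = dose;
   exact dose / estimate;
run;
```

Fixing SEED= to a positive integer avoids the clock-based seed, so a rerun draws the same samples.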

INEST= SAS-data-set

  • names the SAS data set that contains initial estimates for all the parameters in the model. BY-group processing is allowed in setting up the INEST= data set. See the section INEST= Input Data Set on page 2376 for more information.

INMODEL= SAS-data-set

  • specifies the name of the SAS data set that contains the model information needed for scoring new data. This INMODEL= data set is the OUTMODEL= data set saved in a previous PROC LOGISTIC call. The DATA= option cannot be specified with this option; instead, specify the data sets to be scored in the SCORE statements.

  • When the INMODEL= data set is specified, FORMAT statements are not allowed; variables in the DATA= and PRIOR= data sets should be formatted within the data sets. If a SCORE statement is specified in the same run as fitting the model, FORMAT statements should be specified after the SCORE statement in order for the formats to apply to all the DATA= and PRIOR= data sets in the SCORE statement.

  • You can specify the BY statement provided the INMODEL= data set is created under the same BY-group processing.

  • The CLASS, EXACT, MODEL, OUTPUT, TEST, and UNITS statements are not available with the INMODEL= option.
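The two-step pattern described for INMODEL= might look like the following sketch, in which a model is fit and stored with OUTMODEL=, then reused to score new observations. All data set and variable names are hypothetical.

```sas
/* Step 1: fit the model and store it (hypothetical names) */
proc logistic data=work.train outmodel=work.fit;
   model response = age weight;
run;

/* Step 2: score new data without refitting; DATA= is not allowed   */
/* in the PROC statement here, so the SCORE statement names the data */
proc logistic inmodel=work.fit;
   score data=work.new out=work.scored;
run;
```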

NAMELEN= n

  • specifies the length of effect names in tables and output data sets to be n characters, where n is a value between 20 and 200. The default length is 20 characters.

NOCOV

  • specifies that the covariance matrix is not saved in the OUTMODEL= data set. The covariance matrix is needed for computing the confidence intervals for the posterior probabilities in the OUT= data set in the SCORE statement. Specifying this option will reduce the size of the OUTMODEL= data set.

NOPRINT

  • suppresses all displayed output. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 14, Using the Output Delivery System, for more information.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

RORDER=DATA | FORMATTED | INTERNAL

  • specifies the sorting order for the levels of the response variable. See the response variable option ORDER= in the MODEL statement for more information.

OUTDESIGN= SAS-data-set

  • specifies the name of the data set that contains the design matrix for the model. The data set contains the same number of observations as the corresponding DATA= data set and includes the response variable (with the same format as in the input data), the FREQ variable, the WEIGHT variable, the OFFSET variable, and the design variables for the covariates, including the Intercept variable of constant value 1 unless the NOINT option in the MODEL statement is specified.

OUTDESIGNONLY

  • suppresses the model fitting and only creates the OUTDESIGN= data set. This option is ignored if the OUTDESIGN= option is not specified.
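A brief sketch of using these two options together to inspect the coding of a CLASS variable without fitting anything; the data set and variable names are hypothetical.

```sas
/* Hypothetical data set and variables */
proc logistic data=work.trial outdesign=work.design outdesignonly;
   class treatment / param=ref;
   model response = treatment age;
run;

/* Inspect the generated design columns */
proc print data=work.design(obs=10);
run;
```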

OUTEST= SAS-data-set

  • creates an output SAS data set that contains the final parameter estimates and, optionally, their estimated covariances (see the preceding COVOUT option). The output data set also includes a variable named _LNLIKE_, which contains the log likelihood.

  • See the section OUTEST= Output Data Set on page 2374 for more information.
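For instance, the following sketch saves the estimates together with their covariance matrix; COVOUT has an effect only because OUTEST= is also given. The data set and variable names are hypothetical.

```sas
/* Hypothetical data set and variables */
proc logistic data=work.trial outest=work.est covout;
   model response = age weight;
run;

/* The estimates, covariance rows, and _LNLIKE_ are in work.est */
proc print data=work.est;
run;
```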

OUTMODEL= SAS-data-set

  • specifies the name of the SAS data set that contains the information about the fitted model. This data set contains sufficient information to score new data without having to refit the model. It is solely used as the input to the INMODEL= option in a subsequent PROC LOGISTIC call. Note: information is stored in this data set in a very compact form, hence you should not modify it manually.

SIMPLE

  • displays simple descriptive statistics (mean, standard deviation, minimum and maximum) for each continuous explanatory variable; and for each CLASS variable involved in the modeling, the frequency counts of the classification levels are displayed. The SIMPLE option generates a breakdown of the simple descriptive statistics or frequency counts for the entire data set and also for individual response categories.

TRUNCATE

  • specifies that class levels should be determined using no more than the first 16 characters of the formatted values of CLASS, response, and strata variables. When formatted values are longer than 16 characters, you can use this option to revert to the levels as determined in releases previous to Version 9. This option invokes the same option in the CLASS statement.

BY Statement

  • BY variables ;

You can specify a BY statement with PROC LOGISTIC to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. The variables are one or more variables in the input data set.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the LOGISTIC procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure (in base SAS software).

If a SCORE statement is specified, then define the primary data set to be the DATA= or the INMODEL=data set in the PROC LOGISTIC statement, and define the secondary data set to be the DATA= data set and PRIOR= data set in the SCORE statement. The primary data set contains all of the BY variables, and the secondary data set must contain either all of them or none of them. If the secondary data set contains all the BY-variables, matching is carried out between the primary and secondary data sets. If the secondary data set does not contain any of the BY-variables, the entire secondary data set is used for every BY-group in the primary data set and the BY-variables are added to the output data sets specified in the SCORE statement.

Caution: The order of your response and classification variables is determined by combining data across all BY groups; however, the observed levels may change between BY groups. This may affect the value of the reference level for these variables, and hence your interpretation of the model and the parameters.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
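The first alternative, sorting before the analysis, can be sketched as follows. The data set work.trial and the BY variable center are hypothetical.

```sas
/* Sort by the BY variable first (hypothetical names) */
proc sort data=work.trial out=work.trial_sorted;
   by center;
run;

/* One separate analysis per value of center */
proc logistic data=work.trial_sorted;
   by center;
   model response = age weight;
run;
```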

CLASS Statement

  • CLASS variable < (v-options) > < variable < (v-options) > ... > < / v-options > ;

The CLASS statement names the classification variables to be used in the analysis. The CLASS statement must precede the MODEL statement. You can specify various v-options for each variable by enclosing them in parentheses after the variable name. You can also specify global v-options for the CLASS statement by placing them after a slash (/). Global v-options are applied to all the variables specified in the CLASS statement. If you specify more than one CLASS statement, the global v-options specified on any one CLASS statement apply to all CLASS statements. However, individual CLASS variable v-options override the global v-options.

CPREFIX= n

  • specifies that, at most, the first n characters of a CLASS variable name be used in creating names for the corresponding design variables. The default is $32 - \min(32, \max(2, f))$, where f is the formatted length of the CLASS variable.
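    As a worked instance of the default, assume a CLASS variable whose formatted length is f = 8 (a value chosen only for illustration):

```latex
32 - \min(32, \max(2, 8)) = 32 - \min(32, 8) = 32 - 8 = 24
```

    Under that assumption, at most the first 24 characters of the variable name are used. At the extremes, f = 1 gives a default of 30, and any f of 32 or more gives a default of 0.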

DESCENDING

DESC

  • reverses the sorting order of the classification variable. If both the DESCENDING and ORDER= options are specified, PROC LOGISTIC orders the categories according to the ORDER= option and then reverses that order.

LPREFIX= n

  • specifies that, at most, the first n characters of a CLASS variable label be used in creating labels for the corresponding design variables. The default is $256 - \min(256, \max(2, f))$, where f is the formatted length of the CLASS variable.

MISSING

  • allows a missing value ('.' for a numeric variable and blanks for a character variable) as a valid value for the CLASS variable.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

  • specifies the sorting order for the levels of classification variables. By default, ORDER=FORMATTED. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine dependent. When ORDER=FORMATTED is in effect for numeric variables for which you have supplied no explicit format, the levels are ordered by their internal values. This ordering determines which parameters in the model correspond to each level in the data, so the ORDER= option may be useful when you use the CONTRAST statement.

  • The following table shows how PROC LOGISTIC interprets values of the ORDER= option.

    Value of ORDER=    Levels Sorted By
    DATA               order of appearance in the input data set
    FORMATTED          external formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
    FREQ               descending frequency count; levels with the most observations come first in the order
    INTERNAL           unformatted value

  • For more information on sorting order, see the chapter on the SORT procedure in the SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.

PARAM= keyword

  • specifies the parameterization method for the classification variable or variables. Design matrix columns are created from CLASS variables according to the following coding schemes. The default is PARAM=EFFECT. If PARAM=ORTHPOLY or PARAM=POLY, and the CLASS levels are numeric, then the ORDER= option in the CLASS statement is ignored, and the internal, unformatted values are used. See the CLASS Variable Parameterization section on page 2331 for further details.

    EFFECT              specifies effect coding
    GLM                 specifies less-than-full-rank reference cell coding; this option can only be used as a global option
    ORDINAL             specifies the cumulative parameterization for an ordinal CLASS variable
    POLYNOMIAL, POLY    specifies polynomial coding
    REFERENCE, REF      specifies reference cell coding
    ORTHEFFECT          orthogonalizes PARAM=EFFECT
    ORTHORDINAL         orthogonalizes PARAM=ORDINAL
    ORTHPOLY            orthogonalizes PARAM=POLYNOMIAL
    ORTHREF             orthogonalizes PARAM=REFERENCE

  • The EFFECT, POLYNOMIAL, REFERENCE, ORDINAL, and their orthogonal parameterizations are full rank. The REF= option in the CLASS statement determines the reference level for the EFFECT, REFERENCE, and their orthogonal parameterizations.

  • Parameter names for a CLASS predictor variable are constructed by concatenating the CLASS variable name with the CLASS levels. However, for the POLYNOMIAL and orthogonal parameterizations, parameter names are formed by concatenating the CLASS variable name and keywords that reflect the parameterization.

REF= 'level' | keyword

  • specifies the reference level for PARAM=EFFECT, PARAM=REFERENCE, and their orthogonalizations. For an individual (but not a global) variable REF= option, you can specify the level of the variable to use as the reference level. For a global or individual variable REF= option, you can use one of the following keywords. The default is REF=LAST.

    FIRST    designates the first ordered level as reference
    LAST     designates the last ordered level as reference

TRUNCATE

  • specifies that class levels should be determined using no more than the first 16 characters of the formatted values of CLASS, response, and strata variables. When formatted values are longer than 16 characters, you can use this option to revert to the levels as determined in releases previous to Version 9. The TRUNCATE option is only available as a global option. This option invokes the same option in the PROC LOGISTIC statement.
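A sketch of how individual and global v-options interact: the parenthesized options apply only to treatment and override the global options after the slash, which apply to sex. The data set, the variables, and the level value 'Placebo' are hypothetical.

```sas
/* Hypothetical data set, variables, and level value */
proc logistic data=work.trial;
   class treatment (param=ref ref='Placebo')   /* individual v-options   */
         sex
         / param=effect order=internal;        /* global v-options       */
   model response = treatment sex age;
run;
```

Here treatment uses reference coding with 'Placebo' as the reference level, while sex uses effect coding with the default REF=LAST.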

CONTRAST Statement

  • CONTRAST label row-description < , row-description >< / options > ;

  • where a row-description is: effect values <, effect values >

The CONTRAST statement provides a mechanism for obtaining customized hypothesis tests. It is similar to the CONTRAST and ESTIMATE statements in PROC GLM and PROC CATMOD, depending on the coding schemes used with any classification variables involved.

The CONTRAST statement enables you to specify a matrix, $L$, for testing the hypothesis $L\beta = 0$, where $\beta$ is the parameter vector. You must be familiar with the details of the model parameterization that PROC LOGISTIC uses (for more information, see the PARAM= option in the section CLASS Statement on page 2295).

Optionally, the CONTRAST statement enables you to estimate each row, $l_i'\beta$, of $L\beta$ and test the hypothesis $l_i'\beta = 0$. Computed statistics are based on the asymptotic chi-square distribution of the Wald statistic.

There is no limit to the number of CONTRAST statements that you can specify, but they must appear after the MODEL statement.

The following parameters are specified in the CONTRAST statement:

label

identifies the contrast on the output. A label is required for every contrast specified, and it must be enclosed in quotes.

effect

identifies an effect that appears in the MODEL statement. The name INTERCEPT can be used as an effect when one or more intercepts are included in the model. You do not need to include all effects that are included in the MODEL statement.

values

are constants that are elements of the L matrix associated with the effect. To correctly specify your contrast, it is crucial to know the ordering of parameters within each effect and the variable levels associated with any parameter. The Class Level Information table shows the ordering of levels within variables. The E option, described later in this section, enables you to verify the proper correspondence of values to parameters.

The rows of L are specified in order and are separated by commas. Multiple degree-of-freedom hypotheses can be tested by specifying multiple row-descriptions . For any of the full-rank parameterizations, if an effect is not specified in the CONTRAST statement, all of its coefficients in the L matrix are set to 0. If too many values are specified for an effect, the extra ones are ignored. If too few values are specified, the remaining ones are set to 0.

When you use effect coding (by default or by specifying PARAM=EFFECT in the CLASS statement), all parameters are directly estimable (involve no other parameters). For example, suppose an effect-coded CLASS variable A has four levels. Then there are three parameters ($\alpha_1, \alpha_2, \alpha_3$) representing the first three levels, and the fourth parameter is represented by

$$-\alpha_1 - \alpha_2 - \alpha_3$$

To test the first versus the fourth level of A, you would test

$$\alpha_1 = -\alpha_1 - \alpha_2 - \alpha_3$$

or, equivalently,

$$2\alpha_1 + \alpha_2 + \alpha_3 = 0$$

which, in the form $L\beta = 0$, is

$$\begin{bmatrix} 2 & 1 & 1 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} = 0$$

Therefore, you would use the following CONTRAST statement:

  contrast '1 vs. 4' A 2 1 1;

To contrast the third level with the average of the first two levels, you would test

$$\frac{\alpha_1 + \alpha_2}{2} = \alpha_3$$

or, equivalently,

$$\alpha_1 + \alpha_2 - 2\alpha_3 = 0$$

Therefore, you would use the following CONTRAST statement:

  contrast '1&2 vs. 3' A 1 1 -2;

Other CONTRAST statements are constructed similarly. For example,

  contrast '1 vs. 2    ' A  1 -1  0;
  contrast '1&2 vs. 4  ' A  3  3  2;
  contrast '1&2 vs. 3&4' A  2  2  0;
  contrast 'Main Effect' A  1  0  0,
                         A  0  1  0,
                         A  0  0  1;

When you use the less-than-full-rank parameterization (by specifying PARAM=GLM in the CLASS statement), each row is checked for estimability. If PROC LOGISTIC finds a contrast to be nonestimable, it displays missing values in corresponding rows in the results. PROC LOGISTIC handles missing level combinations of classification variables in the same manner as PROC GLM. Parameters corresponding to missing level combinations are not included in the model. This convention can affect the way in which you specify the L matrix in your CONTRAST statement. If the elements of L are not specified for an effect that contains a specified effect, then the elements of the specified effect are distributed over the levels of the higher-order effect just as the GLM procedure does for its CONTRAST and ESTIMATE statements. For example, suppose that the model contains effects A and B and their interaction A*B. If you specify a CONTRAST statement involving A alone, the L matrix contains nonzero terms for both A and A*B, since A*B contains A.

The degrees of freedom is the number of linearly independent constraints implied by the CONTRAST statement, that is, the rank of L .

You can specify the following options after a slash (/).

ALPHA= $\alpha$

  • specifies the level of significance $\alpha$ for the $100(1 - \alpha)\%$ confidence interval for each contrast when the ESTIMATE option is specified. The value $\alpha$ must be between 0 and 1. By default, $\alpha$ is equal to the value of the ALPHA= option in the PROC LOGISTIC statement, or 0.05 if that option is not specified.

E

  • displays the L matrix.

ESTIMATE= keyword

  • requests that each individual contrast (that is, each row, $l_i'\beta$, of $L\beta$) or exponentiated contrast ($e^{l_i'\beta}$) be estimated and tested. PROC LOGISTIC displays the point estimate, its standard error, a Wald confidence interval, and a Wald chi-square test for each contrast. The significance level of the confidence interval is controlled by the ALPHA= option. You can estimate the contrast or the exponentiated contrast ($e^{l_i'\beta}$), or both, by specifying one of the following keywords:

    PARM    specifies that the contrast itself be estimated
    EXP     specifies that the exponentiated contrast be estimated
    BOTH    specifies that both the contrast and the exponentiated contrast be estimated

SINGULAR= number

  • tunes the estimability check. This option is ignored when the full-rank parameterization is used. If $v$ is a vector, define ABS($v$) to be the largest absolute value of the elements of $v$. For a row vector $l'$ of the contrast matrix $L$, define $c$ to be equal to ABS($l$) if ABS($l$) is greater than 0; otherwise, $c$ equals 1. If ABS($l' - l'T$) is greater than $c$*number, then $l$ is declared nonestimable. The $T$ matrix is the Hermite form matrix $I^{-}I$, where $I^{-}$ represents a generalized inverse of the information matrix $I$ of the null model. The value for number must be between 0 and 1; the default value is 1E-4.
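The CONTRAST options can be combined in one statement, as in this sketch. It assumes a hypothetical data set in which A is an effect-coded CLASS variable with four levels, matching the example earlier in this section.

```sas
/* Hypothetical data set; A is assumed to have four levels */
proc logistic data=work.trial;
   class A / param=effect;
   model response = A;
   contrast '1 vs. 4' A 2 1 1
            / e                 /* display the L matrix                  */
              estimate=both     /* contrast and exponentiated contrast   */
              alpha=0.10;       /* 90% Wald confidence limits            */
run;
```

Displaying the L matrix with the E option is a convenient check that the coefficients line up with the intended parameters.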

EXACT Statement

  • EXACT < label >< Intercept >< effects >< / options > ;

The EXACT statement performs exact tests of the parameters for the specified effects and optionally estimates the parameters and outputs the exact conditional distributions. You can specify the keyword INTERCEPT and any effects in the MODEL statement. Inference on the parameters of the specified effects is performed by conditioning on the sufficient statistics of all the other model parameters (possibly including the intercept).

You can specify several EXACT statements, but they must follow the MODEL statement. Each statement can optionally include an identifying label. If several EXACT statements are specified, any statement without a label will be assigned a label of the form Exact n , where n indicates the n th EXACT statement. The label is included in the headers of the displayed exact analysis tables.

If a STRATA statement is also specified, then a stratified exact conditional logistic regression is performed. The model contains a different intercept for each stratum, and these intercepts are conditioned out of the model along with any other nuisance parameters (essentially, any parameters specified in the MODEL statement which are not in the EXACT statement).

If the LINK=GLOGIT option is specified in the MODEL statement, then the EXACTOPTIONS option METHOD=DIRECT is invoked by default and a generalized logit model is fit. Since each effect specified in the MODEL statement adds k parameters to the model (where k + 1 is the number of response levels), exact analysis of the generalized logit model using this method is limited to rather small problems.

The CONTRAST, OUTPUT, SCORE, TEST, and UNITS statements are not available with an exact analysis. Exact analyses are not performed when you specify a WEIGHT statement, a link other than LINK=LOGIT or LINK=GLOGIT, an offset variable, the NOFIT option, or a model-selection method. Exact estimation is not available for ordinal response models.
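A stratified exact analysis might be set up as in the following sketch, with STRATA and EXACT both placed after MODEL. The data set work.matched and the variables outcome, exposure, and pair_id are hypothetical.

```sas
/* Hypothetical matched-pairs data; names are for illustration only */
proc logistic data=work.matched exactonly;   /* skip the asymptotic analysis */
   model outcome = exposure;
   strata pair_id;                           /* one intercept per stratum    */
   exact exposure / estimate=both;           /* parameter and odds ratio     */
run;
```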

The following options can be specified in each EXACT statement after a slash (/):

ALPHA= $\alpha$

  • specifies the level of significance $\alpha$ for $100(1 - \alpha)\%$ confidence limits for the parameters or odds ratios. The value $\alpha$ must be between 0 and 1. By default, $\alpha$ is equal to the value of the ALPHA= option in the PROC LOGISTIC statement, or 0.05 if that option is not specified.

ESTIMATE < = keyword >

  • estimates the individual parameters (conditional on all other parameters) for the effects specified in the EXACT statement. For each parameter, a point estimate, a confidence interval, and a p-value for a two-sided test that the parameter is zero are displayed. Note that the two-sided p-value is twice the one-sided p-value. You can optionally specify one of the following keywords:

    PARM    specifies that the parameters be estimated. This is the default.
    ODDS    specifies that the odds ratios be estimated. For classification variables, use of the reference parameterization is recommended.
    BOTH    specifies that the parameters and odds ratios be estimated

JOINT

  • performs the joint test that all of the parameters are simultaneously equal to zero, individual hypothesis tests for the parameter of each continuous variable, and joint tests for the parameters of each classification variable. The joint test is indicated in the Conditional Exact Tests table by the label Joint.

JOINTONLY

  • performs only the joint test of the parameters. The test is indicated in the Conditional Exact Tests table by the label Joint. When this option is specified, individual tests for the parameters of each continuous variable and joint tests for the parameters of the classification variables are not performed.

CLTYPE=EXACT | MIDP

  • requests either the exact or mid-p confidence intervals for the parameter estimates. By default, the exact intervals are produced. The confidence coefficient can be specified with the ALPHA= option. The mid-p interval can be modified with the MIDPFACTOR= option. See the Inference for a Single Parameter section on page 2373 for details.

MIDPFACTOR= $\delta_1$ | ($\delta_1$, $\delta_2$)

  • sets the tie factors used to produce the mid-p hypothesis statistics and the mid-p confidence intervals. $\delta_1$ modifies both the hypothesis tests and confidence intervals, while $\delta_2$ affects only the hypothesis tests. By default, $\delta_1 = 0.5$ and $\delta_2 = 1.0$. See the Hypothesis Tests section on page 2371 and the Inference for a Single Parameter section on page 2373 for details.

ONESIDED

  • requests one-sided confidence intervals and p-values for the individual parameter estimates and odds ratios. The one-sided p-value is the smaller of the left and right tail probabilities for the observed sufficient statistic of the parameter under the null hypothesis that the parameter is zero. The two-sided p-values (default) are twice the one-sided p-values. See the Inference for a Single Parameter section on page 2373 for more details.

OUTDIST= SAS-data-set

  • names the SAS data set containing the exact conditional distributions. This data set contains all of the exact conditional distributions required to process the corresponding EXACT statement. The data set contains the possible sufficient statistics for the parameters of the effects specified in the EXACT statement, the counts, and, when hypothesis tests are performed on the parameters, the probability of occurrence and the score value for each sufficient statistic. When you request an OUTDIST= data set, the observed sufficient statistics are displayed in the Sufficient Statistics table. See the OUTDIST= Output Data Set section on page 2377 for more information.
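Several of these options can appear on one EXACT statement. The sketch below, with a hypothetical data set and variables, requests parameter and odds ratio estimates with 90% mid-p limits and saves the exact conditional distribution.

```sas
/* Hypothetical data set and variables */
proc logistic data=work.sparse;
   model response = dose;
   exact 'Dose' dose
         / estimate=both        /* parameters and odds ratios        */
           cltype=midp          /* mid-p instead of exact limits     */
           alpha=0.10           /* 90% confidence limits             */
           outdist=work.dist;   /* save the exact distribution       */
run;
```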

EXACT Statement Examples

  • In the following example, two exact tests are computed: one for x1 and the other for x2. The test for x1 is based on the exact conditional distribution of the sufficient statistic for the x1 parameter given the observed values of the sufficient statistics for the intercept, x2, and x3 parameters; likewise, the test for x2 is conditional on the observed sufficient statistics for the intercept, x1, and x3:

      proc logistic;
         model y= x1 x2 x3;
         exact 'lab1' x1 x2;
      run;

  • You can specify multiple EXACT statements in the same PROC LOGISTIC invocation. PROC LOGISTIC determines, from all the EXACT statements, the distinct conditional distributions that need to be evaluated. For example, there is only one exact conditional distribution for the following two EXACT statements, and it would be a waste of resources to compute the same exact conditional distribution twice:

      exact 'One' x1 / estimate=parm;
      exact 'Two' x1 / estimate=parm onesided;

  • For each EXACT statement, individual tests for the parameters of the specified effects are computed unless the JOINTONLY option is specified. Consider the following EXACT statements:

      exact 'E12' x1 x2 / estimate;
      exact 'E1'  x1    / estimate;
      exact 'E2'  x2    / estimate;
      exact 'J12' x1 x2 / joint;

    In the E12 statement, the parameters for x1 and x2 are estimated and tested separately. Specifying the E12 statement is equivalent to specifying both the E1 and E2 statements. In the J12 statement, the joint test for the parameters of x1 and x2 is computed as well as the individual tests for x1 and x2.

    All exact conditional distributions for the tests and estimates computed in a single EXACT statement are output to the corresponding OUTDIST= data set. For example, consider the following EXACT statements:

      exact 'O1'   x1    /           outdist=o1;
      exact 'OJ12' x1 x2 / jointonly outdist=oj12;
      exact 'OA12' x1 x2 / joint     outdist=oa12;
      exact 'OE12' x1 x2 / estimate  outdist=oe12;

    The O1 statement outputs a single exact conditional distribution. The OJ12 statement outputs only the joint distribution for x1 and x2. The OA12 statement outputs three conditional distributions: one for x1, one for x2, and one jointly for x1 and x2. The OE12 statement outputs two conditional distributions: one for x1 and the other for x2. Data set oe12 contains both the x1 and x2 variables; the distribution for x1 has missing values in the x2 column while the distribution for x2 has missing values in the x1 column.

    See the OUTDIST= Output Data Set section on page 2377 for more information.
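
  • As a further sketch of the options described earlier in this section, the following statements request mid-p confidence limits in one EXACT statement and one-sided inference in another. The data set name trial and the variables y, x1, and x2 are hypothetical, and the MIDPFACTOR= values shown simply restate the documented defaults:

      proc logistic data=trial;
         model y(event='1') = x1 x2;
         /* mid-p limits for the x1 parameter, tie factors written out explicitly */
         exact 'MidP' x1 / estimate=parm cltype=midp midpfactor=(0.5, 1.0);
         /* same conditional distribution, one-sided limits and p-values */
         exact 'OneSided' x1 / estimate=parm onesided;
      run;

    Both statements condition on the same sufficient statistics, so only one exact conditional distribution needs to be evaluated.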

FREQ Statement

  • FREQ variable ;

The variable in the FREQ statement identifies a variable that contains the frequency of occurrence of each observation. PROC LOGISTIC treats each observation as if it appears n times, where n is the value of the FREQ variable for the observation. If it is not an integer, the frequency value is truncated to an integer. If the frequency value is less than 1 or missing, the observation is not used in the model fitting. When the FREQ statement is not specified, each observation is assigned a frequency of 1.

If a SCORE statement is specified, then the FREQ variable is used for computing fit statistics and the ROC curve, but it is not required for scoring. If the DATA= data set in the SCORE statement does not contain the FREQ variable, the frequency values are assumed to be 1 and a warning message is issued in the log. If you fit a model and perform the scoring in the same run, the same FREQ variable is used for fitting and scoring. If you fit a model in a previous run and input it with the INMODEL= option in the current run, then the FREQ variable can be different from the one used in the previous run; however, if a FREQ variable was not specified in the previous run you can still specify a FREQ variable in the current run.
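
As a minimal sketch of the FREQ statement, the following steps fit a binary model to grouped data in which each row stands for several identical observations. The data set name grouped, the variables y, x, and count, and the data values are all made up for illustration:

      data grouped;
         input y x count;
         datalines;
      1 0 12
      0 0 30
      1 1 25
      0 1 18
      ;

      proc logistic data=grouped;
         freq count;               /* each row is used count times */
         model y(event='1') = x;
      run;

Under these assumptions the fit is based on 85 observations (12 + 30 + 25 + 18), not on the four rows of the data set.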

MODEL Statement

  • MODEL events/trials= < effects >< / options > ;

  • MODEL variable < (variable_options) > = < effects >< / options > ;

The MODEL statement names the response variable and the explanatory effects, including covariates, main effects, interactions, and nested effects; see the section Specification of Effects on page 1784 of Chapter 32, The GLM Procedure, for more information. If you omit the explanatory effects, the procedure fits an intercept-only model. Model options can be specified after a slash (/).

Two forms of the MODEL statement can be specified. The form with a single response variable, referred to as single-trial syntax, is applicable to binary, ordinal, and nominal response data. The form with events/trials on the left side, referred to as events/trials syntax, is restricted to the case of binary response data. The single-trial syntax is used when each observation in the DATA= data set contains information on only a single trial, for instance, a single subject in an experiment. When each observation contains information on multiple binary-response trials, such as the counts of the number of subjects observed and the number responding, then events/trials syntax can be used.

In the events/trials syntax, you specify two variables that contain count data for a binomial experiment. These two variables are separated by a slash. The value of the first variable, events, is the number of positive responses (or events). The value of the second variable, trials, is the number of trials. The values of both events and (trials − events) must be nonnegative and the value of trials must be positive for the response to be valid.

In the single-trial syntax, you specify one variable (on the left side of the equal sign) as the response variable. This variable can be character or numeric. Options specific to the response variable can be specified immediately after the response variable with a pair of parentheses around them.

For both forms of the MODEL statement, explanatory effects follow the equal sign. Variables can be either continuous or classification variables. Classification variables can be character or numeric, and they must be declared in the CLASS statement. When an effect is a classification variable, the procedure enters a set of coded columns into the design matrix instead of directly entering a single column containing the values of the variable.
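
The two forms can be sketched side by side as follows. The data set names dose and subjects, the variable names, and the data values are hypothetical and serve only to show the shape of each form:

      /* events/trials syntax: one row per group, r events out of n trials */
      data dose;
         input x r n;
         datalines;
      1  2 20
      2  6 20
      3 11 20
      4 17 20
      ;

      proc logistic data=dose;
         model r/n = x;
      run;

      /* single-trial syntax: one row per subject, y is the observed response */
      proc logistic data=subjects;
         model y(event='1') = x;
      run;

In the first step every row satisfies the validity conditions stated above: r and n − r are nonnegative and n is positive.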

Response Variable Options

  • You can specify the following options by enclosing them in a pair of parentheses after the response variable.

DESCENDING | DESC

  • reverses the order of the response categories. If both the DESCENDING and ORDER= options are specified, PROC LOGISTIC orders the response categories according to the ORDER= option and then reverses that order. See the Response Level Ordering section on page 2329 for more detail.

EVENT= category | keyword

  • specifies the event category for the binary response model. PROC LOGISTIC models the probability of the event category. The EVENT= option has no effect when there are more than two response categories. You can specify the value (formatted if a format is applied) of the event category in quotes or you can specify one of the following keywords. The default is EVENT=FIRST.

    FIRST

    designates the first ordered category as the event

    LAST

    designates the last ordered category as the event

    One of the most common sets of response levels is {0,1}, with 1 representing the event for which the probability is to be modeled. Consider the example where Y takes the values 1 and 0 for event and nonevent, respectively, and Exposure is the explanatory variable. To specify the value 1 as the event category, use the MODEL statement

      model Y(event='1') = Exposure;  

ORDER= DATA | FORMATTED | FREQ | INTERNAL

  • specifies the sorting order for the levels of the response variable. By default, ORDER=FORMATTED. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine dependent. When ORDER=FORMATTED is in effect for numeric variables for which you have supplied no explicit format, the levels are ordered by their internal values.

  • The following table shows the interpretation of the ORDER= values.

    Value of ORDER=    Levels Sorted By

    DATA               order of appearance in the input data set
    FORMATTED          external formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
    FREQ               descending frequency count; levels with the most observations come first in the order
    INTERNAL           unformatted value

  • For more information on sorting order, see the chapter on the SORT procedure in the SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
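
  • As an illustrative sketch, the following statements combine the ORDER= and DESCENDING response options. The data set survey, the response rating, and the covariate age are hypothetical names:

      proc logistic data=survey;
         /* order levels by first appearance in the data, then reverse that order */
         model rating(order=data descending) = age;
      run;

    Following the rule given for DESCENDING, the levels are first ordered according to ORDER=DATA and that order is then reversed.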

REFERENCE= category | keyword

REF= category | keyword

  • specifies the reference category for the generalized logit model and the binary response model. For the generalized logit model, each nonreference category is contrasted with the reference category. For the binary response model, specifying one response category as the reference is the same as specifying the other response category as the event category. You can specify the value (formatted if a format is applied) of the reference category in quotes or you can specify one of the following keywords. The default is REF=LAST.

    FIRST

    designates the first ordered category as the reference

    LAST

    designates the last ordered category as the reference
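
  • A minimal sketch of the REF= option with a generalized logit model follows. The data set choices, the nominal response brand with a level 'A', and the covariate price are assumptions made for illustration:

      proc logistic data=choices;
         /* every other brand is contrasted with brand 'A' */
         model brand(ref='A') = price / link=glogit;
      run;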

Model Options

Table 42.1 summarizes the options available in the MODEL statement, which can be specified after a slash (/).

Table 42.1: Model Statement Options

    Option           Description

    Model Specification Options
    LINK=            specifies link function
    NOINT            suppresses intercept
    NOFIT            suppresses model fitting
    OFFSET=          specifies offset variable
    SELECTION=       specifies effect selection method

    Effect Selection Options
    BEST=            controls the number of models displayed for SCORE selection
    DETAILS          requests detailed results at each step
    FAST             uses fast elimination method
    HIERARCHY=       specifies whether and how hierarchy is maintained and whether a single effect or multiple effects are allowed to enter or leave the model per step
    INCLUDE=         specifies number of effects included in every model
    MAXSTEP=         specifies maximum number of steps for STEPWISE selection
    SEQUENTIAL       adds or deletes effects in sequential order
    SLENTRY=         specifies significance level for entering effects
    SLSTAY=          specifies significance level for removing effects
    START=           specifies number of variables in first model
    STOP=            specifies number of variables in final model
    STOPRES          adds or deletes variables by residual chi-square criterion

    Model-Fitting Specification Options
    ABSFCONV=        specifies absolute function convergence criterion
    FCONV=           specifies relative function convergence criterion
    GCONV=           specifies relative gradient convergence criterion
    XCONV=           specifies relative parameter convergence criterion
    MAXFUNCTION=     specifies maximum number of function calls for the conditional analysis
    MAXITER=         specifies maximum number of iterations
    NOCHECK          suppresses checking for infinite parameters
    RIDGING=         specifies the technique used to improve the log-likelihood function when its value is worse than that of the previous step
    SINGULAR=        specifies tolerance for testing singularity
    TECHNIQUE=       specifies iterative algorithm for maximization

    Options for Confidence Intervals
    ALPHA=           specifies α for the 100(1 − α)% confidence intervals
    CLPARM=          computes confidence intervals for parameters
    CLODDS=          computes confidence intervals for odds ratios
    PLCONV=          specifies profile likelihood convergence criterion

    Options for Classifying Observations
    CTABLE           displays classification table
    PEVENT=          specifies prior event probabilities
    PPROB=           specifies probability cutpoints for classification

    Options for Overdispersion and Goodness-of-Fit Tests
    AGGREGATE=       determines subpopulations for Pearson chi-square and deviance
    SCALE=           specifies method to correct overdispersion
    LACKFIT          requests Hosmer and Lemeshow goodness-of-fit test

    Options for ROC Curves
    OUTROC=          names the output data set
    ROCEPS=          specifies probability grouping criterion

    Options for Regression Diagnostics
    INFLUENCE        displays influence statistics
    IPLOTS           requests index plots

    Options for Display of Details
    CORRB            displays correlation matrix
    COVB             displays covariance matrix
    EXPB             displays exponentiated values of estimates
    ITPRINT          displays iteration history
    NODUMMYPRINT     suppresses Class Level Information table
    PARMLABEL        displays parameter labels
    RSQUARE          displays generalized R²
    STB              displays standardized estimates

    Computational Options
    NOLOGSCALE       performs calculations using normal scaling

The following list describes these options.

ABSFCONV= value

  • specifies the absolute function convergence criterion. Convergence requires a small change in the log-likelihood function in subsequent iterations,

    $$ |l_i - l_{i-1}| < \textit{value} $$

    where $l_i$ is the value of the log-likelihood function at iteration i. See the section Convergence Criteria on page 2338.

AGGREGATE

AGGREGATE= (variable-list)

  • specifies the subpopulations on which the Pearson chi-square test statistic and the likelihood ratio chi-square test statistic (deviance) are calculated. Observations with common values in the given list of variables are regarded as coming from the same subpopulation. Variables in the list can be any variables in the input data set. Specifying the AGGREGATE option is equivalent to specifying the AGGREGATE= option with a variable list that includes all explanatory variables in the MODEL statement. The deviance and Pearson goodness-of-fit statistics are calculated only when the SCALE= option is specified. Thus, the AGGREGATE (or AGGREGATE=) option has no effect if the SCALE= option is not specified. See the section Rescaling the Covariance Matrix on page 2354 for more detail.
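
  • Because the goodness-of-fit statistics are computed only when the SCALE= option is specified, a typical request pairs the two options. The following is a sketch under assumed names (data set trial, response y, covariates x1 and x2):

      proc logistic data=trial;
         /* subpopulations are the distinct (x1, x2) combinations;      */
         /* SCALE=NONE requests the statistics without any adjustment */
         model y(event='1') = x1 x2 / aggregate=(x1 x2) scale=none;
      run;

    Here the variable list names every explanatory variable in the model, so specifying AGGREGATE alone would define the same subpopulations.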

ALPHA= α

  • sets the level of significance α for 100(1 − α)% confidence intervals for regression parameters or odds ratios. The value α must be between 0 and 1. By default, α is equal to the value of the ALPHA= option in the PROC LOGISTIC statement, or 0.05 if the option is not specified. This option has no effect unless confidence limits for the parameters or odds ratios are requested.

BEST= n

  • specifies that n models with the highest score chi-square statistics are to be displayed for each model size. It is used exclusively with the SCORE model selection method. If the BEST= option is omitted and there are no more than ten explanatory variables, then all possible models are listed for each model size. If the option is omitted and there are more than ten explanatory variables, then the number of models selected for each model size is, at most, equal to the number of explanatory variables listed in the MODEL statement.

CLODDS=PL | WALD | BOTH

  • requests confidence intervals for the odds ratios. Computation of these confidence intervals is based on the profile likelihood (CLODDS=PL) or based on individual Wald tests (CLODDS=WALD). By specifying CLODDS=BOTH, the procedure computes two sets of confidence intervals for the odds ratios, one based on the profile likelihood and the other based on the Wald tests. The confidence coefficient can be specified with the ALPHA= option.

CLPARM=PL | WALD | BOTH

  • requests confidence intervals for the parameters. Computation of these confidence intervals is based on the profile likelihood (CLPARM=PL) or individual Wald tests (CLPARM=WALD). By specifying CLPARM=BOTH, the procedure computes two sets of confidence intervals for the parameters, one based on the profile likelihood and the other based on individual Wald tests. The confidence coefficient can be specified with the ALPHA= option. See the Confidence Intervals for Parameters section on page 2345 for more information.
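
  • The following sketch requests both kinds of limits for the parameters and profile-likelihood limits for the odds ratios. The data set trial and the variables y, x1, and x2 are hypothetical:

      proc logistic data=trial;
         model y(event='1') = x1 x2 / clparm=both clodds=pl alpha=0.1;
      run;

    With ALPHA=0.1, the intervals are 100(1 − 0.1)% = 90% intervals.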

CORRB

  • displays the correlation matrix of the parameter estimates.

COVB

  • displays the covariance matrix of the parameter estimates.

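CTABLE

  • displays the classification table for the observations. Use the PPROB= option to specify the probability cutpoints and the PEVENT= option to specify prior event probabilities.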

DETAILS

  • produces a summary of computational details for each step of the effect selection process. It produces the Analysis of Effects Not in the Model table before displaying the effect selected for entry for FORWARD or STEPWISE selection. For each model fitted, it produces the Type 3 Analysis of Effects table if the fitted model involves CLASS variables, the Analysis of Maximum Likelihood Estimates table, and measures of association between predicted probabilities and observed responses. For the statistics included in these tables, see the Displayed Output section on page 2381. The DETAILS option has no effect when SELECTION=NONE.

EXPB

EXPEST

  • displays the exponentiated values ($e^{\hat{\beta}_i}$) of the parameter estimates $\hat{\beta}_i$ in the Analysis of Maximum Likelihood Estimates table for the logit model. These exponentiated values are the estimated odds ratios for the parameters corresponding to the continuous explanatory variables.

FAST

  • uses a computational algorithm of Lawless and Singhal (1978) to compute a first-order approximation to the remaining slope estimates for each subsequent elimination of a variable from the model. Variables are removed from the model based on these approximate estimates. The FAST option is extremely efficient because the model is not refitted for every variable removed. The FAST option is used when SELECTION=BACKWARD and in the backward elimination steps when SELECTION=STEPWISE. The FAST option is ignored when SELECTION=FORWARD or SELECTION=NONE.

FCONV= value

  • specifies the relative function convergence criterion. Convergence requires a small relative change in the log-likelihood function in subsequent iterations,

    $$ \frac{|l_i - l_{i-1}|}{|l_{i-1}| + 10^{-6}} < \textit{value} $$

    where $l_i$ is the value of the log likelihood at iteration i. See the section Convergence Criteria on page 2338.

GCONV= value

  • specifies the relative gradient convergence criterion. Convergence requires that the normalized prediction function reduction is small,

    $$ \frac{\mathbf{g}_i^{\top} \mathbf{I}_i^{-1} \mathbf{g}_i}{|l_i| + 10^{-6}} < \textit{value} $$

    where $l_i$ is the value of the log-likelihood function, $\mathbf{g}_i$ is the gradient vector, and $\mathbf{I}_i$ is the (expected) information matrix, all at iteration i. This is the default convergence criterion, and the default value is 1E−8. See the section Convergence Criteria on page 2338.

HIERARCHY= keyword

HIER= keyword

  • specifies whether and how the model hierarchy requirement is applied and whether a single effect or multiple effects are allowed to enter or leave the model in one step. You can specify that only CLASS effects, or both CLASS and interval effects, be subject to the hierarchy requirement. The HIERARCHY= option is ignored unless you also specify one of the following options: SELECTION=FORWARD, SELECTION=BACKWARD, or SELECTION=STEPWISE.

  • Model hierarchy refers to the requirement that, for any term to be in the model, all effects contained in the term must be present in the model. For example, in order for the interaction A*B to enter the model, the main effects A and B must be in the model. Likewise, neither effect A nor B can leave the model while the interaction A*B is in the model.

  • The keywords you can specify in the HIERARCHY= option are as follows:

    NONE

    Model hierarchy is not maintained. Any single effect can enter or leave the model at any given step of the selection process.

    SINGLE

    Only one effect can enter or leave the model at one time, subject to the model hierarchy requirement. For example, suppose that you specify the main effects A and B and the interaction A*B in the model. In the first step of the selection process, either A or B can enter the model. In the second step, the other main effect can enter the model. The interaction effect can enter the model only when both main effects have already been entered. Also, before A or B can be removed from the model, the A*B interaction must first be removed. All effects (CLASS and interval) are subject to the hierarchy requirement.

    SINGLECLASS

    This is the same as HIERARCHY=SINGLE except that only CLASS effects are subject to the hierarchy requirement.

    MULTIPLE

    More than one effect can enter or leave the model at one time, subject to the model hierarchy requirement. In a forward selection step, a single main effect can enter the model, or an interaction can enter the model together with all the effects that are contained in the interaction. In a backward elimination step, an interaction itself, or the interaction together with all the effects that the interaction contains, can be removed. All effects (CLASS and interval) are subject to the hierarchy requirement.

    MULTIPLECLASS

    This is the same as HIERARCHY=MULTIPLE except that only CLASS effects are subject to the hierarchy requirement.

  • The default value is HIERARCHY=SINGLE, which means that model hierarchy is to be maintained for all effects (that is, both CLASS and interval effects) and that only a single effect can enter or leave the model at each step.
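
  • As a sketch of how the option is used, the following statements run backward elimination with the hierarchy requirement applied to CLASS effects only. The data set trial, the classification variables a and b, and the interval variable x are hypothetical:

      proc logistic data=trial;
         class a b;
         model y(event='1') = a b a*b x / selection=backward
                                          hierarchy=singleclass;
      run;

    Under this setting neither a nor b can leave the model while a*b is still in it, and only one effect is removed per step.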

INCLUDE= n

  • includes the first n effects in the MODEL statement in every model. By default, INCLUDE=0. The INCLUDE= option has no effect when SELECTION=NONE.

  • Note that the INCLUDE= and START= options perform different tasks: the INCLUDE= option includes the first n effects in every model, whereas the START= option only requires that the first n effects appear in the first model.
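
  • The difference can be sketched with two otherwise identical runs. The data set trial and the variables y and x1 through x4 are assumed names:

      /* x1 is forced into every model considered */
      proc logistic data=trial;
         model y(event='1') = x1 x2 x3 x4 / selection=stepwise include=1;
      run;

      /* x1 is in the first model only; later steps may remove it */
      proc logistic data=trial;
         model y(event='1') = x1 x2 x3 x4 / selection=stepwise start=1;
      run;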

INFLUENCE

  • displays diagnostic measures for identifying influential observations in the case of a binary response model. It has no effect otherwise. For each observation, the INFLUENCE option displays the case number (which is the sequence number of the observation), the values of the explanatory variables included in the final model, and the regression diagnostic measures developed by Pregibon (1981). For a discussion of these diagnostic measures, see the Regression Diagnostics section on page 2359. When a STRATA statement is specified, the diagnostics are computed following Storer and Crowley (1985); see the Regression Diagnostic Details section on page 2367 for details.

IPLOTS

  • produces an index plot for each regression diagnostic statistic. An index plot is a scatterplot with the regression diagnostic statistic represented on the y-axis and the case number on the x-axis. See Example 42.6 on page 2422 for an illustration.

ITPRINT

  • displays the iteration history of the maximum-likelihood model fitting. The ITPRINT option also displays the last evaluation of the gradient vector and the final change in the −2 Log Likelihood.

LACKFIT

LACKFIT < ( n ) >

  • performs the Hosmer and Lemeshow goodness-of-fit test (Hosmer and Lemeshow 2000) for the case of a binary response model. The subjects are divided into approximately ten groups of roughly the same size based on the percentiles of the estimated probabilities. The discrepancies between the observed and expected number of observations in these groups are summarized by the Pearson chi-square statistic, which is then compared to a chi-square distribution with t degrees of freedom, where t is the number of groups minus n. By default, n=2. A small p-value suggests that the fitted model is not an adequate model. See the Hosmer-Lemeshow Goodness-of-Fit Test section on page 2356 for more information.
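
  • A minimal sketch of a request for this test follows; the data set trial and the variables y, x1, and x2 are hypothetical:

      proc logistic data=trial;
         model y(event='1') = x1 x2 / lackfit;
      run;

    If the procedure happens to form exactly ten groups, the default n=2 gives t = 10 − 2 = 8 degrees of freedom for the reference chi-square distribution.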

LINK= keyword

L= keyword

  • specifies the link function linking the response probabilities to the linear predictors. You can specify one of the following keywords. The default is LINK=LOGIT.

    CLOGLOG

    the complementary log-log function. PROC LOGISTIC fits the binary complementary log-log model when there are two response categories and fits the cumulative complementary log-log model when there are more than two response categories. Aliases: CCLOGLOG, CCLL, CUMCLOGLOG.

    GLOGIT

    the generalized logit function. PROC LOGISTIC fits the generalized logit model where each nonreference category is contrasted with the reference category. You can use the response variable option REF= to specify the reference category.

    LOGIT

    the log odds function. PROC LOGISTIC fits the binary logit model when there are two response categories and fits the cumulative logit model when there are more than two response categories. Aliases: CLOGIT, CUMLOGIT.

    PROBIT

    the inverse standard normal distribution function. PROC LOGISTIC fits the binary probit model when there are two response categories and fits the cumulative probit model when there are more than two response categories. Aliases: NORMIT, CPROBIT, CUMPROBIT.

  • See the section Link Functions and the Corresponding Distributions on page 2334 for details.
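
  • As an illustration, the following sketch replaces the default logit link with the probit link. The data set trial and the variables y, x1, and x2 are assumed names:

      proc logistic data=trial;
         model y(event='1') = x1 x2 / link=probit;
      run;

    With a two-level response this fits the binary probit model; with more than two ordered response levels the same LINK=PROBIT specification fits the cumulative probit model.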

MAXFUNCTION= n

  • specifies the maximum number of function calls to perform when maximizing the conditional likelihood. This option is only valid when a STRATA statement is specified. The default values are

    • 125 when the number of parameters p < 40

    • 500 when 40 ≤ p < 400

    • 1000 when p ≥ 400

  • Since the optimization is terminated only after completing a full iteration, the number of function calls that are actually performed can exceed n. If convergence is not attained, the displayed output and all output data sets created by the procedure contain results based on the last maximum likelihood iteration.

MAXITER= n

  • specifies the maximum number of iterations to perform. By default, MAXITER=25. If convergence is not attained in n iterations, the displayed output and all output data sets created by the procedure contain results that are based on the last maximum likelihood iteration.

MAXSTEP= n

  • specifies the maximum number of times any explanatory variable is added to or removed from the model when SELECTION=STEPWISE. The default number is twice the number of explanatory variables in the MODEL statement. When the MAXSTEP= limit is reached, the stepwise selection process is terminated. All statistics displayed by the procedure (and included in output data sets) are based on the last model fitted. The MAXSTEP= option has no effect when SELECTION=NONE, FORWARD, or BACKWARD.

NOCHECK

  • disables the checking process to determine whether maximum likelihood estimates of the regression parameters exist. If you are sure that the estimates are finite, this option can reduce the execution time if the estimation takes more than eight iterations. For more information, see the Existence of Maximum Likelihood Estimates section on page 2338.

NODUMMYPRINT

NODESIGNPRINT

NODP

  • suppresses the Class Level Information table, which shows how the design matrix columns for the CLASS variables are coded.

NOINT

  • suppresses the intercept for the binary response model, the first intercept for the ordinal response model (which forces all intercepts to be nonnegative), or all intercepts for the generalized logit model. This can be particularly useful in conditional logistic analysis; see Example 42.10 on page 2443.

NOFIT

  • performs the global score test without fitting the model. The global score test evaluates the joint significance of the effects in the MODEL statement. No further analyses are performed. If the NOFIT option is specified along with other MODEL statement options, NOFIT takes effect and all other options except LINK=, TECHNIQUE=, and OFFSET= are ignored.

NOLOGSCALE

  • specifies that computations for the conditional and exact conditional logistic models be performed using normal scaling. Log scaling can handle numerically larger problems than normal scaling; however, computations in the log scale are slower than computations in the normal scale.

OFFSET= name

  • names the offset variable. The regression coefficient for this variable will be fixed at 1.

OUTROC= SAS-data-set

OUTR= SAS-data-set

  • creates, for binary response models, an output SAS data set that contains the data necessary to produce the receiver operating characteristic (ROC) curve. See the section OUTROC= Output Data Set on page 2378 for the list of variables in this data set.

PARMLABEL

  • displays the labels of the parameters in the Analysis of Maximum Likelihood Estimates table.

PEVENT= value

PEVENT= ( list )

  • specifies one prior probability or a list of prior probabilities for the event of interest. The false positive and false negative rates are then computed as posterior probabilities by Bayes' theorem. The prior probability is also used in computing the rate of correct prediction. For each prior probability in the given list, a classification table of all observations is computed. By default, the prior probability is the total sample proportion of events. The PEVENT= option is useful for stratified samples. It has no effect if the CTABLE option is not specified. For more information, see the section False Positive and Negative Rates Using Bayes' Theorem on page 2353. Also see the PPROB= option for information on how the list is specified.

PLCL

  • is the same as specifying CLPARM=PL.

PLCONV= value

  • controls the convergence criterion for confidence intervals based on the profile likelihood function. The quantity value must be a positive number, with a default value of 1E−4. The PLCONV= option has no effect if profile likelihood confidence intervals (CLPARM=PL) are not requested.

PLRL

  • is the same as specifying CLODDS=PL.

PPROB= value

PPROB= ( list )

  • specifies one critical probability value (or cutpoint) or a list of critical probability values for classifying observations with the CTABLE option. Each value must be between 0 and 1. A response that has a cross-validated predicted probability greater than or equal to the current PPROB= value is classified as an event response. The PPROB= option is ignored if the CTABLE option is not specified.

  • A classification table for each of several cutpoints can be requested by specifying a list. For example,

      pprob= (0.3, 0.5 to 0.8 by 0.1)

    requests a classification of the observations for each of the cutpoints 0.3, 0.5, 0.6, 0.7, and 0.8. If the PPROB= option is not specified, the default is to display the classification for a range of probabilities from the smallest estimated probability (rounded down to the nearest 0.02) to the highest estimated probability (rounded up to the nearest 0.02) with 0.02 increments.
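
  • Put together with the CTABLE and PEVENT= options, a complete request might look like the following sketch. The data set trial, the variables y, x1, and x2, and the prior probability 0.1 are assumptions chosen for illustration:

      proc logistic data=trial;
         model y(event='1') = x1 x2 / ctable
                                      pprob=(0.3, 0.5 to 0.8 by 0.1)
                                      pevent=0.1;
      run;

    This asks for one classification at each of the five cutpoints, with the false positive and false negative rates computed for a prior event probability of 0.1 instead of the sample proportion of events.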

RIDGING=ABSOLUTE | RELATIVE | NONE

  • specifies the technique used to improve the log-likelihood function when its value in the current iteration is less than that in the previous iteration. If you specify the RIDGING=ABSOLUTE option, the diagonal elements of the negative (expected) Hessian are inflated by adding the ridge value. If you specify the RIDGING=RELATIVE option, the diagonal elements are inflated by a factor of 1 plus the ridge value. If you specify the RIDGING=NONE option, the crude line search method of taking half a step is used instead of ridging. By default, RIDGING=RELATIVE.
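
  • The two ridging rules can be written compactly as follows. The symbols are introduced here only for illustration: $h_{jj}$ denotes the j-th diagonal element of the negative (expected) Hessian and $r$ denotes the ridge value.

    $$ \text{ABSOLUTE: } h_{jj} \leftarrow h_{jj} + r, \qquad \text{RELATIVE: } h_{jj} \leftarrow h_{jj}\,(1 + r) $$

    The off-diagonal elements are unchanged under both rules, as the description above refers only to the diagonal.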

RISKLIMITS

RL

WALDRL

  • is the same as specifying CLODDS=WALD.

ROCEPS= number

  • specifies the criterion for grouping estimated event probabilities that are close to each other for the ROC curve. In each group, the difference between the largest and the smallest estimated event probabilities does not exceed the given value. The value for number must be between 0 and 1; the default value is 1E−4. The smallest estimated probability in each group serves as a cutpoint for predicting an event response. The ROCEPS= option has no effect if the OUTROC= option is not specified.
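
  • The following sketch writes the ROC data to a data set and uses a coarser grouping criterion than the default. The names trial, y, x1, x2, and roc1 and the value 0.001 are illustrative assumptions:

      proc logistic data=trial;
         model y(event='1') = x1 x2 / outroc=roc1 roceps=0.001;
      run;

      /* inspect the first few rows of the ROC data set */
      proc print data=roc1(obs=10);
      run;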

RSQUARE

RSQ

  • requests a generalized R² measure for the fitted model. For more information, see the Generalized Coefficient of Determination section on page 2342.

SCALE= scale

  • enables you to supply the value of the dispersion parameter or to specify the method for estimating the dispersion parameter. It also enables you to display the Deviance and Pearson Goodness-of-Fit Statistics table. To correct for overdispersion or underdispersion, the covariance matrix is multiplied by the estimate of the dispersion parameter. Valid values for scale are as follows:

    D DEVIANCE

    specifies that the dispersion parameter be estimated by the deviance divided by its degrees of freedom.

    P | PEARSON

    specifies that the dispersion parameter be estimated by the Pearson chi-square statistic divided by its degrees of freedom.

    WILLIAMS <( constant )>

    specifies that Williams' method be used to model overdispersion. This option can be used only with the events/trials syntax. An optional constant can be specified as the scale parameter; otherwise, a scale parameter is estimated under the full model. A set of weights is created based on this scale parameter estimate. These weights can then be used in fitting subsequent models that have fewer terms than the full model. When fitting these submodels, specify the computed scale parameter as constant. See Example 42.9 on page 2438 for an illustration.

    N | NONE

    specifies that no correction is needed for the dispersion parameter; that is, the dispersion parameter remains as 1. This specification is used for requesting the deviance and the Pearson chi-square statistic without adjusting for overdispersion.

    constant

    sets the estimate of the dispersion parameter to be the square of the given constant . For example, SCALE=2 sets the dispersion parameter to 4. The value constant must be a positive number.

  • You can use the AGGREGATE (or AGGREGATE=) option to define the subpopulations for calculating the Pearson chi-square statistic and the deviance. In the absence of the AGGREGATE (or AGGREGATE=) option, each observation is regarded as coming from a different subpopulation. For the events/trials syntax, each observation consists of n Bernoulli trials, where n is the value of the trials variable. For single-trial syntax, each observation consists of a single response, and for this setting it is not appropriate to carry out the Pearson or deviance goodness-of-fit analysis. Thus, PROC LOGISTIC ignores specifications SCALE=P, SCALE=D, and SCALE=N when single-trial syntax is specified without the AGGREGATE (or AGGREGATE=) option.

  • The Deviance and Pearson Goodness-of-Fit Statistics table includes the Pearson chi-square statistic, the deviance, their degrees of freedom, the ratio of each statistic divided by its degrees of freedom, and the corresponding p -value. For more information, see the Overdispersion section on page 2354.
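
As a minimal sketch of how SCALE= and AGGREGATE interact, the following statements assume a hypothetical data set Assay, in which Deaths and Total are the events and trials variables and Dose is a covariate; none of these names come from this chapter. The second step reuses the Neuralgia variables from the SCORE statement example and adds AGGREGATE, since without it a SCALE=D specification is ignored for single-trial syntax.

      /* events/trials syntax: dispersion estimated by Pearson chi-square / DF */
      proc logistic data=Assay;
         model Deaths/Total = Dose / scale=pearson;
      run;

      /* single-trial syntax: AGGREGATE defines the subpopulations */
      proc logistic data=Neuralgia;
         class Treatment Sex;
         model Pain(event='Yes') = Treatment Sex Age
               / scale=deviance aggregate;
      run;

In both steps the covariance matrix is multiplied by the estimated dispersion parameter, so standard errors change while the parameter estimates do not.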

SELECTION=BACKWARD | B

    | FORWARD | F

    | NONE | N

    | STEPWISE | S

    | SCORE

  • specifies the method used to select the variables in the model. BACKWARD requests backward elimination, FORWARD requests forward selection, NONE fits the complete model specified in the MODEL statement, and STEPWISE requests stepwise selection. SCORE requests best subset selection. By default, SELECTION=NONE. For more information, see the Effect Selection Methods section on page 2340.

SEQUENTIAL

SEQ

  • forces effects to be added to the model in the order specified in the MODEL statement or eliminated from the model in the reverse order specified in the MODEL statement. The model-building process continues until the next effect to be added has an insignificant adjusted chi-square statistic or until the next effect to be deleted has a significant Wald chi-square statistic. The SEQUENTIAL option has no effect when SELECTION=NONE.

SINGULAR= value

  • specifies the tolerance for testing the singularity of the Hessian matrix (Newton-Raphson algorithm) or the expected value of the Hessian matrix (Fisher-scoring algorithm). The Hessian matrix is the matrix of second partial derivatives of the log-likelihood function. The test requires that a pivot for sweeping this matrix be at least this number times a norm of the matrix. Values of the SINGULAR= option must be numeric. By default, value is the machine epsilon times 1E7, which is approximately 1E-9 on most machines.

SLENTRY= value

SLE= value

  • specifies the significance level of the score chi-square for entering an effect into the model in the FORWARD or STEPWISE method. Values of the SLENTRY= option should be between 0 and 1, inclusive. By default, SLENTRY=0.05. The SLENTRY= option has no effect when SELECTION=NONE, SELECTION=BACKWARD, or SELECTION=SCORE.

SLSTAY= value

SLS= value

  • specifies the significance level of the Wald chi-square for an effect to stay in the model in a backward elimination step. Values of the SLSTAY= option should be between 0 and 1, inclusive. By default, SLSTAY=0.05. The SLSTAY= option has no effect when SELECTION=NONE, SELECTION=FORWARD, or SELECTION=SCORE.

START= n

  • begins the FORWARD, BACKWARD, or STEPWISE effect selection process with the first n effects listed in the MODEL statement. The value of n ranges from 0 to s , where s is the total number of effects in the MODEL statement. The default value of n is s for the BACKWARD method and 0 for the FORWARD and STEPWISE methods. Note that START= n specifies only that the first n effects appear in the first model, while INCLUDE= n requires that the first n effects be included in every model. For the SCORE method, START= n specifies that the smallest models contain n effects, where n ranges from 1 to s ; the default value is 1. The START= option has no effect when SELECTION=NONE.

STB

  • displays the standardized estimates for the parameters for the continuous explanatory variables in the Analysis of Maximum Likelihood Estimates table. The standardized estimate of $\beta_i$ is given by $\hat{\beta}_i / (s / s_i)$, where $s_i$ is the total sample standard deviation for the $i$th explanatory variable and

    \[
    s = \begin{cases} \pi/\sqrt{3} & \text{Logistic} \\ 1 & \text{Normal} \\ \pi/\sqrt{6} & \text{Extreme-value} \end{cases}
    \]

  • For the intercept parameters and parameters associated with a CLASS variable, the standardized estimates are set to missing.

STOP= n

  • specifies the maximum (FORWARD method) or minimum (BACKWARD method) number of effects to be included in the final model. The effect selection process is stopped when n effects are found. The value of n ranges from 0 to s , where s is the total number of effects in the MODEL statement. The default value of n is s for the FORWARD method and 0 for the BACKWARD method. For the SCORE method, STOP= n specifies that the largest models contain n effects, where n ranges from 1 to s ; the default value of n is s . The STOP= option has no effect when SELECTION=NONE or STEPWISE.

STOPRES

SR

  • specifies that the removal or entry of effects be based on the value of the residual chi-square. If SELECTION=FORWARD, then the STOPRES option adds the effects into the model one at a time until the residual chi-square becomes insignificant (until the p -value of the residual chi-square exceeds the SLENTRY= value ). If SELECTION=BACKWARD, then the STOPRES option removes effects from the model one at a time until the residual chi-square becomes significant (until the p -value of the residual chi-square becomes less than the SLSTAY= value ). The STOPRES option has no effect when SELECTION=NONE or SELECTION=STEPWISE.
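
The selection options above can be combined as in the following sketch, which reuses the Neuralgia variables from the SCORE statement example. The significance levels and the START=, STOP=, and BEST= values are arbitrary choices for illustration, not recommendations. Because only one MODEL statement is allowed per step, each selection method is shown in its own PROC LOGISTIC step.

      /* stepwise: enter at the 0.10 level, stay at the 0.15 level */
      proc logistic data=Neuralgia;
         class Treatment Sex;
         model Pain(event='Yes') = Treatment Sex Age
               / selection=stepwise slentry=0.10 slstay=0.15;
      run;

      /* forward: start from the first effect, stop at two effects */
      proc logistic data=Neuralgia;
         class Treatment Sex;
         model Pain(event='Yes') = Treatment Sex Age
               / selection=forward start=1 stop=2;
      run;

      /* best subsets: CLASS variables are omitted here on the      */
      /* assumption that SELECTION=SCORE does not accept CLASS      */
      /* effects; Age and Duration are assumed numeric covariates   */
      proc logistic data=Neuralgia;
         model Pain(event='Yes') = Age Duration
               / selection=score start=1 stop=2 best=1;
      run;

Note that START=1 only places Treatment in the first model; to keep it in every model you would use INCLUDE=1 instead.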

TECHNIQUE=FISHER | NEWTON

TECH=FISHER | NEWTON

  • specifies the optimization technique for estimating the regression parameters. NEWTON (or NR) is the Newton-Raphson algorithm and FISHER (or FS) is the Fisher-scoring algorithm. Both techniques yield the same estimates, but the estimated covariance matrices are slightly different except for the case when the LOGIT link is specified for binary response data. The default is TECHNIQUE=FISHER. See the section Iterative Algorithms for Model-Fitting on page 2336 for details.

WALDCL

CL

  • is the same as specifying CLPARM=WALD.

XCONV= value

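  • specifies the relative parameter convergence criterion. Convergence requires a small relative parameter change in subsequent iterations,

    \[
    \max_j \left| \delta_j^{(i)} \right| < \textit{value}
    \]

    where

    \[
    \delta_j^{(i)} = \begin{cases} \beta_j^{(i)} - \beta_j^{(i-1)} & \text{if } \left| \beta_j^{(i-1)} \right| < 0.01 \\ \dfrac{\beta_j^{(i)} - \beta_j^{(i-1)}}{\beta_j^{(i-1)}} & \text{otherwise} \end{cases}
    \]

    and $\beta_j^{(i)}$ is the estimate of the $j$th parameter at iteration $i$. See the section Convergence Criteria on page 2338.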
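
As a brief sketch, the optimization options described above can be set together in one MODEL statement. The example reuses the Neuralgia variables from the SCORE statement example; the XCONV= value is an arbitrary illustration.

      proc logistic data=Neuralgia;
         class Treatment Sex;
         /* Newton-Raphson with absolute ridging and a relative */
         /* parameter convergence criterion of 1E-6              */
         model Pain(event='Yes') = Treatment Sex Age
               / technique=newton ridging=absolute xconv=1e-6;
      run;

For a binary response with the logit link, switching between TECHNIQUE=NEWTON and TECHNIQUE=FISHER should change neither the estimates nor the estimated covariance matrix.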

OUTPUT Statement

  • OUTPUT < OUT= SAS-data-set >< options > ;

The OUTPUT statement creates a new SAS data set that contains all the variables in the input data set and, optionally, the estimated linear predictors and their standard error estimates, the estimates of the cumulative or individual response probabilities, and the confidence limits for the cumulative probabilities. Regression diagnostic statistics and estimates of cross validated response probabilities are also available for binary response models. Formulas for the statistics are given in the Linear Predictor, Predicted Probability, and Confidence Limits section on page 2350, the Regression Diagnostics section on page 2359, and, for conditional logistic regression, in the Conditional Logistic Regression section on page 2365.

If you use the single-trial syntax, the data set also contains a variable named _LEVEL_ , which indicates the level of the response that the given row of output is referring to. For instance, the value of the cumulative probability variable is the probability that the response variable is as large as the corresponding value of _LEVEL_ . For details, see the section OUT= Output Data Set in the OUTPUT Statement on page 2376.

The estimated linear predictor, its standard error estimate, all predicted probabilities, and the confidence limits for the cumulative probabilities are computed for all observations in which the explanatory variables have no missing values, even if the response is missing. By adding observations with missing response values to the input data set, you can compute these statistics for new observations or for settings of the explanatory variables not present in the data without affecting the model fit.
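
The technique of scoring new settings through missing responses can be sketched as follows. The data set NewPatients and its values are hypothetical, and the sketch assumes that its variables have the same types as those in the Neuralgia data set used in the SCORE statement example. Because NewPatients has no Pain variable, Pain is missing for the appended rows, so they do not affect the fit but still receive predicted probabilities and confidence limits.

      /* hypothetical covariate settings to be predicted */
      data NewPatients;
         input Treatment $ Sex $ Age;
         datalines;
      A F 68
      B M 74
      ;

      /* append them to the analysis data; Pain is missing for these rows */
      data Combined;
         set Neuralgia NewPatients;
      run;

      proc logistic data=Combined;
         class Treatment Sex;
         model Pain(event='Yes') = Treatment Sex Age;
         output out=Pred p=PHat lower=LCL upper=UCL / alpha=0.10;
      run;

The variable names PHat, LCL, and UCL are arbitrary; ALPHA=0.10 requests 90% confidence limits.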

OUT= SAS-data-set

  • names the output data set. If you omit the OUT= option, the output data set is created and given a default name using the DATA n convention.

  • The following sections explain options in the OUTPUT statement, divided into statistic options for any type of categorical response, statistic options only for binary response, and other options. The statistic options specify the statistics to be included in the output data set and name the new variables that contain the statistics. If a STRATA statement is specified, only the PREDICTED=, DFBETAS=, and H= options are available; see the Regression Diagnostic Details section on page 2367 for details.

Statistic Options for Any Type of Categorical Response

LOWER= name

L= name

  • names the variable containing the lower confidence limits for π, where π is the probability of the event response if events/trials syntax or single-trial syntax with binary response is specified; for a cumulative model, π is the cumulative probability (that is, the probability that the response is less than or equal to the value of _LEVEL_); for the generalized logit model, it is the individual probability (that is, the probability that the response category is represented by the value of _LEVEL_). See the ALPHA= option to set the confidence level.

PREDICTED= name

PRED= name

PROB= name

P= name

  • names the variable containing the predicted probabilities. For the events/trials syntax or single-trial syntax with binary response, it is the predicted event probability. For a cumulative model, it is the predicted cumulative probability (that is, the probability that the response variable is less than or equal to the value of _LEVEL_ ); and for the generalized logit model, it is the predicted individual probability (that is, the probability of the response category represented by the value of _LEVEL_ ).

PREDPROBS=( keywords )

  • requests individual, cumulative, or cross validated predicted probabilities. Descriptions of the keywords are as follows.

    INDIVIDUAL | I

    requests the predicted probability of each response level. For a response variable Y with three levels, 1, 2, and 3, the individual probabilities are Pr( Y =1), Pr( Y =2), and Pr( Y =3).

    CUMULATIVE | C

    requests the cumulative predicted probability of each response level. For a response variable Y with three levels, 1, 2, and 3, the cumulative probabilities are Pr(Y ≤ 1), Pr(Y ≤ 2), and Pr(Y ≤ 3). The cumulative probability for the last response level always has the constant value of 1. For generalized logit models, the cumulative predicted probabilities are not computed and are set to missing.

    CROSSVALIDATE | XVALIDATE | X

    requests the cross validated individual predicted probability of each response level. These probabilities are derived from the leave-one-out principle; that is, dropping the data of one subject and reestimating the parameter estimates. PROC LOGISTIC uses a less expensive one-step approximation to compute the parameter estimates. This option is only valid for binary response models; for nominal and ordinal models, the cross validated probabilities are not computed and are set to missing.

  • See the Details of the PREDPROBS= Option section on page 2322 at the end of this section for further details.

STDXBETA= name

  • names the variable containing the standard error estimates of XBETA (the definition of which follows).

UPPER= name

U= name

  • names the variable containing the upper confidence limits for π, where π is the probability of the event response if events/trials syntax or single-trial syntax with binary response is specified; for a cumulative model, π is the cumulative probability (that is, the probability that the response is less than or equal to the value of _LEVEL_); for the generalized logit model, it is the individual probability (that is, the probability that the response category is represented by the value of _LEVEL_). See the ALPHA= option to set the confidence level.

XBETA= name

  • names the variable containing the estimates of the linear predictor $\alpha_i + \boldsymbol{\beta}'\mathbf{x}$, where $i$ is the corresponding ordered value of _LEVEL_.

Statistic Options Only for Binary Response

C= name

  • specifies the confidence interval displacement diagnostic that measures the influence of individual observations on the regression estimates.

CBAR= name

  • specifies another confidence interval displacement diagnostic, which measures the overall change in the global regression estimates due to deleting an individual observation.

DFBETAS= _ALL_

DFBETAS= var-list

  • specifies the standardized differences in the regression estimates for assessing the effects of individual observations on the estimated regression parameters in the fitted model. You can specify a list of up to s + 1 variable names, where s is the number of explanatory variables in the MODEL statement, or you can specify just the keyword _ALL_. In the former specification, the first variable contains the standardized differences in the intercept estimate, the second variable contains the standardized differences in the parameter estimate for the first explanatory variable in the MODEL statement, and so on. In the latter specification, the DFBETAS statistics are named DFBETA_ xxx , where xxx is the name of the regression parameter. For example, if the model contains two variables X1 and X2, the specification DFBETAS=_ALL_ produces three DFBETAS statistics: DFBETA_Intercept, DFBETA_X1, and DFBETA_X2. If an explanatory variable is not included in the final model, the corresponding output variable named in DFBETAS= var-list contains missing values.

DIFCHISQ= name

  • specifies the change in the chi-square goodness-of-fit statistic attributable to deleting the individual observation.

DIFDEV= name

  • specifies the change in the deviance attributable to deleting the individual observation.

H= name

  • specifies the diagonal element of the hat matrix for detecting extreme points in the design space.

RESCHI= name

  • specifies the Pearson (Chi) residual for identifying observations that are poorly accounted for by the model.

RESDEV= name

  • specifies the deviance residual for identifying poorly fitted observations.
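
As a sketch of how the binary-response diagnostics are requested together, the following statements reuse the Neuralgia variables from the SCORE statement example. The output variable names (RChi, RDev, and so on) are arbitrary.

      proc logistic data=Neuralgia;
         class Treatment Sex;
         model Pain(event='Yes') = Treatment Sex Age;
         /* residuals, leverage, displacement, and deletion diagnostics */
         output out=Diag reschi=RChi resdev=RDev h=Leverage
                c=CDisp cbar=CBarDisp difchisq=DChi difdev=DDev
                dfbetas=_all_;
      run;

With DFBETAS=_ALL_, the DFBETAS variables are named automatically according to the DFBETA_xxx convention described above, one for each regression parameter.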

Other Options

You can specify the following option after a slash.

ALPHA= α

  • sets the level of significance α for 100(1 − α)% confidence limits for the appropriate response probabilities. The value α must be between 0 and 1. By default, α is equal to the value of the ALPHA= option in the PROC LOGISTIC statement, or 0.05 if that option is not specified.

Details of the PREDPROBS= Option

You can request any of the three given types of predicted probabilities. For example, you can request both the individual predicted probabilities and the cross validated probabilities by specifying PREDPROBS=(I X).

When you specify the PREDPROBS= option, two automatic variables, _FROM_ and _INTO_, are included for the single-trial syntax, and only one variable, _INTO_, is included for the events/trials syntax. The _FROM_ variable contains the formatted value of the observed response. The variable _INTO_ contains the formatted value of the response level with the largest individual predicted probability.

If you specify PREDPROBS=INDIVIDUAL, the OUTPUT data set contains k additional variables representing the individual probabilities, one for each response level, where k is the maximum number of response levels across all BY-groups. The names of these variables have the form IP_ xxx , where xxx represents the particular level. The representation depends on the following situations.

  • If you specify events/trials syntax, xxx is either 'Event' or 'Nonevent'. Thus, the variable containing the event probabilities is named IP_Event and the variable containing the nonevent probabilities is named IP_Nonevent.

  • If you specify the single-trial syntax with more than one BY group, xxx is 1 for the first ordered level of the response, 2 for the second ordered level of the response, and so forth, as given in the Response Profile table. The variable containing the predicted probabilities Pr(Y=1) is named IP_1, where Y is the response variable. Similarly, IP_2 is the name of the variable containing the predicted probabilities Pr(Y=2), and so on.

  • If you specify the single-trial syntax with no BY-group processing, xxx is the left-justified formatted value of the response level (the value may be truncated so that IP_xxx does not exceed 32 characters). For example, if Y is the response variable with response levels 'None', 'Mild', and 'Severe', the variables representing individual probabilities Pr(Y='None'), Pr(Y='Mild'), and Pr(Y='Severe') are named IP_None, IP_Mild, and IP_Severe, respectively.

If you specify PREDPROBS=CUMULATIVE, the OUTPUT data set contains k additional variables representing the cumulative probabilities, one for each response level, where k is the maximum number of response levels across all BY-groups. The names of these variables have the form CP_xxx, where xxx represents the particular response level. The naming convention is similar to that given by PREDPROBS=INDIVIDUAL. The PREDPROBS=CUMULATIVE values are the same as those output by the PREDICTED= option, but they are arranged in variables on each output observation rather than in multiple output observations.

If you specify PREDPROBS=CROSSVALIDATE, the OUTPUT data set contains k additional variables representing the cross validated predicted probabilities of the k response levels, where k is the maximum number of response levels across all BY-groups. The names of these variables have the form XP_ xxx , where xxx represents the particular level. The representation is the same as that given by PREDPROBS=INDIVIDUAL except that for the events/trials syntax there are four variables for the cross validated predicted probabilities instead of two:

  • XP_EVENT_R1E is the cross validated predicted probability of an event when a current event trial is removed.

  • XP_NONEVENT_R1E is the cross validated predicted probability of a nonevent when a current event trial is removed.

  • XP_EVENT_R1N is the cross validated predicted probability of an event when a current nonevent trial is removed.

  • XP_NONEVENT_R1N is the cross validated predicted probability of a nonevent when a current nonevent trial is removed.

The cross validated predicted probabilities are precisely those used in the CTABLE option. See the Predicted Probability of an Event for Classification section on page 2352 for details of the computation.
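
As a sketch of the naming rules above, the following statements reuse the Neuralgia variables from the SCORE statement example and request individual and cross validated probabilities for a binary response under single-trial syntax with no BY-group processing. The sketch assumes that the formatted values of Pain are 'Yes' and 'No'.

      proc logistic data=Neuralgia;
         class Treatment Sex;
         model Pain(event='Yes') = Treatment Sex Age;
         output out=Probs predprobs=(individual crossvalidate);
      run;

      /* inspect the automatic and generated variables */
      proc print data=Probs(obs=5);
         var _FROM_ _INTO_ IP_: XP_:;
      run;

Under that assumption, the Probs data set would contain IP_No, IP_Yes, XP_No, and XP_Yes in addition to _FROM_ and _INTO_.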

SCORE Statement

  • SCORE < options > ;

The SCORE statement creates a data set that contains all the data in the DATA= data set together with posterior probabilities and, optionally, prediction confidence intervals. Fit statistics are displayed on request. If you have binary response data, the SCORE statement can be used to create the OUTROC= data set containing data for the ROC curve. You can specify several SCORE statements. FREQ, WEIGHT, and BY statements can be used with the SCORE statements.

See the Scoring Data Sets section on page 2362 for more information, and see Example 42.13 on page 2462 for an illustration of how to use this statement.

You can specify the following options:

ALPHA= α

  • specifies the significance level α for 100(1 − α)% confidence intervals. By default, α is equal to the value of the ALPHA= option in the PROC LOGISTIC statement, or 0.05 if that option is not specified. This option has no effect unless the CLM option in the SCORE statement is requested.

CLM

  • outputs the Wald-test-based confidence limits for the predicted probabilities. This option is not available when the INMODEL= data set is created with the NOCOV option.

DATA= SAS-data-set

  • names the SAS data set that you want to score. If you omit the DATA= option in the SCORE statement, then scoring is performed on the DATA= input data set in the PROC LOGISTIC statement, if specified; otherwise, the DATA=_LAST_ data set is used.

  • It is not necessary for the DATA= data set in the SCORE statement to contain the response variable unless you are specifying the FITSTAT or OUTROC= option.

  • Only those variables involved in the fitted model effects are required in the DATA= data set in the SCORE statement. For example, the following code uses forward selection to select effects.

      proc logistic data=Neuralgia outmodel=sasuser.Model;
         class Treatment Sex;
         model Pain(event='Yes')= Treatment|Sex Age
               / selection=forward sle=.01;
      run;

  • Suppose Treatment and Age are the effects selected for the final model. You can score a data set that does not contain the variable Sex, since the effect Sex is not in the model that the scoring is based on.

      proc logistic inmodel=sasuser.Model;
         score data=Neuralgia(drop=Sex);
      run;

FITSTAT

  • displays a table of fit statistics. Four statistics are computed: total frequency, total weight, log likelihood, and misclassification rate.

OUT= SAS-data-set

  • names the SAS data set that contains the predicted information. If you omit the OUT= option, the output data set is created and given a default name using the DATA n convention.

OUTROC= SAS-data-set

  • names the SAS data set that contains the ROC curve for the DATA= data set. The ROC curve is computed only for binary response data. See the section OUTROC= Output Data Set on page 2378 for the list of variables in this data set.

PRIOR= SAS-data-set

  • names the SAS data set that contains the priors of the response categories. The priors may be values proportional to the prior probabilities; thus, they do not necessarily sum to one. This data set should include a variable named _PRIOR_ that contains the prior probabilities. For events/trials MODEL syntax, this data set should also include an _OUTCOME_ variable that contains the values EVENT and NONEVENT; for single-trial MODEL syntax, this data set should include the response variable that contains the unformatted response categories. See Example 42.13 on page 2462 for an example.

PRIOREVENT= value

  • specifies the prior event probability for a binary response model. If both PRIOR= and PRIOREVENT= options are specified, the PRIOR= option takes precedence.

ROCEPS= value

  • specifies the criterion for grouping estimated event probabilities that are close to each other for the ROC curve. In each group, the difference between the largest and the smallest estimated event probability does not exceed the given value. The value must be between 0 and 1; the default value is 1E-4. The smallest estimated probability in each group serves as a cutpoint for predicting an event response. The ROCEPS= option has no effect if the OUTROC= option is not specified.
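
The SCORE statement options can be combined as in the following sketch. The data sets NewPatients and Holdout are hypothetical, and the PRIOREVENT= and ALPHA= values are arbitrary. Holdout is assumed to contain the response variable Pain, which FITSTAT and OUTROC= require; NewPatients needs only the explanatory variables.

      proc logistic data=Neuralgia;
         class Treatment Sex;
         model Pain(event='Yes') = Treatment Sex Age;
         /* score new cases with 90% confidence limits and a prior */
         score data=NewPatients out=Scored clm alpha=0.10 priorevent=0.3;
         /* evaluate the fitted model on data with a known response */
         score data=Holdout out=HoldoutScored outroc=HoldoutROC fitstat;
      run;

Each SCORE statement is processed independently, so the two output data sets can use different options.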

STRATA Statement

  • STRATA variable < (option) >< variable < (option) > >< / options > ;

The STRATA statement names the variables that define strata or matched sets to use in a stratified conditional logistic regression of binary response data. Observations having the same variable levels are in the same matched set. At least one variable must be specified to invoke the stratified analysis, and the usual unconditional asymptotic analysis is not performed. The stratified logistic model has the form

\[
\operatorname{logit}(\pi_{hi}) = \alpha_h + \mathbf{x}_{hi}'\boldsymbol{\beta}
\]

where $\pi_{hi}$ is the event probability for the $i$th observation in stratum $h$ having covariates $\mathbf{x}_{hi}$, and where the stratum-specific intercepts $\alpha_h$ are the nuisance parameters that are to be conditioned out.

STRATA variables can also be specified in the MODEL statement as classification or continuous covariates; however, the effects are nondegenerate only when crossed with a non-stratification variable. Specifying several STRATA statements is the same as specifying one STRATA statement containing all the strata variables. The STRATA variables can be either character or numeric, and the formatted values of the STRATA variables determine the levels. Thus, you can also use formats to group values into levels. See the discussion of the FORMAT procedure in the SAS Procedures Guide.

If an EXACT statement is also specified, then a stratified exact conditional logistic regression is performed.

The SCORE and WEIGHT statements are not available with a STRATA statement. The following MODEL options are also not supported with a STRATA statement: CLPARM=PL, CLODDS=PL, CTABLE, LACKFIT, LINK=, NOFIT, OUTMODEL=, OUTROC=, and SCALE=.

The Strata Summary table is displayed by default; it displays the number of strata which have a specific number of events and nonevents. For example, if you are analyzing a 1:5 matched study, this table enables you to verify that every stratum in the analysis has exactly one event and five non-events. Strata containing only events or only non-events are reported in this table, but such strata are uninformative and are not used in the analysis. (Note that you can use the response variable option EVENT= to identify the events; otherwise, the first ordered response category is the event.)

The following option can be specified for a stratification variable by enclosing the option in parentheses after the variable name, or it can be specified globally for all STRATA variables after a slash (/).

MISSING

  • treats missing values ('.', '.A', ..., '.Z' for numeric variables and blanks for character variables) as valid STRATA variable values.

The following strata options are also available after the slash.

NOSUMMARY

  • suppresses the display of the Strata Summary table.

INFO

  • displays the Strata Information table, which includes the stratum number, levels of the STRATA variables that define the stratum, the number of events, the number of nonevents, and the total frequency for each stratum. Since the number of strata can be very large, this table is only displayed on request.
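
A minimal sketch of a stratified (conditional) analysis follows. It assumes a hypothetical matched data set named Matched, in which Pair identifies the matched set, Outcome is a binary response coded 0/1, and Exposure is a covariate.

      proc logistic data=Matched;
         /* one stratum per matched set; INFO lists every stratum */
         strata Pair / info;
         model Outcome(event='1') = Exposure;
      run;

No intercept is reported in this analysis, because the stratum-specific intercepts are conditioned out. Adding an EXACT statement would request the stratified exact conditional analysis instead.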

TEST Statement

  • < label: > TEST equation1 < , ... , < equationk >>< / option > ;

The TEST statement tests linear hypotheses about the regression coefficients. The Wald test is used to test jointly the null hypotheses ($H_0\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{c}$) specified in a single TEST statement. When $\mathbf{c} = \mathbf{0}$ you should specify a CONTRAST statement instead.

Each equation specifies a linear hypothesis (a row of the L matrix and the corresponding element of the c vector); multiple equations are separated by commas. The label, which must be a valid SAS name, is used to identify the resulting output and should always be included. You can submit multiple TEST statements.

The form of an equation is as follows:

  • term < ± term ... ><= ± term < ± term ... >>

where term is a parameter of the model, or a constant, or a constant times a parameter. For a binary response model, the intercept parameter is named INTERCEPT; for an ordinal response model, the intercept parameters are named INTERCEPT, INTERCEPT2, INTERCEPT3, and so on. See the Parameter Names in the OUTEST= Data Set section on page 2375 for details on parameter naming conventions. When no equal sign appears, the expression is set to 0. The following code illustrates possible uses of the TEST statement:

      proc logistic;
         model y= a1 a2 a3 a4;
         test1: test intercept + .5 * a2 = 0;
         test2: test intercept + .5 * a2;
         test3: test a1=a2=a3;
         test4: test a1=a2, a2=a3;
      run;

Note that the first and second TEST statements are equivalent, as are the third and fourth TEST statements.

You can specify the following option in the TEST statement after a slash (/).

PRINT

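  • displays intermediate calculations in the testing of the null hypothesis $H_0\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{c}$. This includes $\mathbf{L}\hat{\mathbf{V}}(\hat{\boldsymbol{\beta}})\mathbf{L}'$ bordered by $(\mathbf{L}\hat{\boldsymbol{\beta}} - \mathbf{c})$ and $[\mathbf{L}\hat{\mathbf{V}}(\hat{\boldsymbol{\beta}})\mathbf{L}']^{-1}$ bordered by $[\mathbf{L}\hat{\mathbf{V}}(\hat{\boldsymbol{\beta}})\mathbf{L}']^{-1}(\mathbf{L}\hat{\boldsymbol{\beta}} - \mathbf{c})$, where $\hat{\boldsymbol{\beta}}$ is the maximum likelihood estimator of $\boldsymbol{\beta}$ and $\hat{\mathbf{V}}(\hat{\boldsymbol{\beta}})$ is the estimated covariance matrix of $\hat{\boldsymbol{\beta}}$.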

  • For more information, see the Testing Linear Hypotheses about the Regression Coefficients section on page 2358.
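
As a worked illustration of how equations map to rows of $\mathbf{L}$ and elements of $\mathbf{c}$, consider the statement labeled test4 in the preceding example. The sketch assumes that the parameters are ordered as $\boldsymbol{\beta} = (\beta_0, \beta_{a1}, \beta_{a2}, \beta_{a3}, \beta_{a4})'$, with $\beta_0$ the intercept; this ordering is chosen here only for illustration.

    \[
    \mathbf{L} = \begin{pmatrix} 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \end{pmatrix},
    \qquad
    \mathbf{c} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
    \]

    \[
    W = (\mathbf{L}\hat{\boldsymbol{\beta}} - \mathbf{c})'
        \left[ \mathbf{L}\hat{\mathbf{V}}(\hat{\boldsymbol{\beta}})\mathbf{L}' \right]^{-1}
        (\mathbf{L}\hat{\boldsymbol{\beta}} - \mathbf{c})
    \]

Under the null hypothesis, $W$ has an asymptotic chi-square distribution with degrees of freedom equal to the rank of $\mathbf{L}$, which is 2 here. The statement labeled test1 corresponds to the single row $\mathbf{L} = (1, 0, 0.5, 0, 0)$ with $\mathbf{c} = 0$.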

UNITS Statement

  • UNITS independent1 = list1 < ... independentk = listk >< / option > ;

The UNITS statement enables you to specify units of change for the continuous explanatory variables so that customized odds ratios can be estimated. An estimate of the corresponding odds ratio is produced for each unit of change specified for an explanatory variable. The UNITS statement is ignored for CLASS variables. If the CLODDS= option is specified in the MODEL statement, the corresponding confidence limits for the odds ratios are also displayed.

The term independent is the name of an explanatory variable and list represents a list of units of change, separated by spaces, that are of interest for that variable. Each unit of change in a list has one of the following forms:

  • number

  • SD or −SD

  • number * SD

where number is any nonzero number, and SD is the sample standard deviation of the corresponding independent variable. For example, X = −2 requests an odds ratio that represents the change in the odds when the variable X is decreased by two units. X = 2*SD requests an estimate of the change in the odds when X is increased by two sample standard deviations.

You can specify the following option in the UNITS statement after a slash (/).

DEFAULT= list

  • gives a list of units of change for all explanatory variables that are not specified in the UNITS statement. Each unit of change can be in any of the forms described previously. If the DEFAULT= option is not specified, PROC LOGISTIC does not produce customized odds ratio estimates for any explanatory variable that is not listed in the UNITS statement.

  • For more information, see the Odds Ratio Estimation section on page 2347.
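
As a sketch, the following statements request customized odds ratios for Age in the Neuralgia model used in the SCORE statement example. The units 10, SD, and 2*SD are arbitrary choices. For a change of c units in a variable with estimated coefficient b, the customized odds ratio is exp(c × b).

      proc logistic data=Neuralgia;
         class Treatment Sex;
         model Pain(event='Yes') = Treatment Sex Age / clodds=wald;
         /* odds ratios for a 10-year, 1-SD, and 2-SD increase in Age */
         units Age=10 sd 2*sd;
      run;

Because CLODDS=WALD is specified, Wald confidence limits are displayed for each customized odds ratio. Treatment and Sex are CLASS variables, so listing them in the UNITS statement would have no effect.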

WEIGHT Statement

  • WEIGHT variable < / option > ;

When a WEIGHT statement appears, each observation in the input data set is weighted by the value of the WEIGHT variable. The values of the WEIGHT variable can be nonintegral and are not truncated. Observations with negative, zero, or missing values for the WEIGHT variable are not used in the model fitting. When the WEIGHT statement is not specified, each observation is assigned a weight of 1.

If a SCORE statement is specified, then the WEIGHT variable is used for computing fit statistics and the ROC curve, but it is not required for scoring. If the DATA= data set in the SCORE statement does not contain the WEIGHT variable, the weights are assumed to be 1 and a warning message is issued in the LOG. If you fit a model and perform the scoring in the same run, the same WEIGHT variable is used for fitting and scoring. If you fit a model in a previous run and input it with the INMODEL= option in the current run, then the WEIGHT variable can be different from the one used in the previous run; however, if a WEIGHT variable was not specified in the previous run you can still specify a WEIGHT variable in the current run.

The following option can be added to the WEIGHT statement after a slash (/).

NORMALIZE

NORM

  • causes the weights specified by the WEIGHT variable to be normalized so that they add up to the actual sample size. With this option, the estimated covariance matrix of the parameter estimators is invariant to the scale of the WEIGHT variable.
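
A minimal sketch of the NORMALIZE option follows. It assumes a hypothetical data set Survey with a positive weight variable SampWt, a binary response Response coded 0/1, and covariates Age and Income. With n observations used in the analysis, normalization rescales each weight w to n × w / (sum of all weights), so that the weights add up to n.

      proc logistic data=Survey;
         weight SampWt / normalize;
         model Response(event='1') = Age Income;
      run;

Multiplying every value of SampWt by a constant would then leave the estimated covariance matrix, and hence the standard errors, unchanged.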




SAS/STAT 9.1 User's Guide (Vol. 4)
Year: 2004