Details


Missing Values

Any observation with missing values for the response, offset, or explanatory variables is excluded from the analysis. The estimated linear predictor, its standard error estimate, the fitted probabilities, and their confidence limits are not computed for any observation that has missing offset or explanatory variable values.

An observation is also excluded if it has a missing value for any STRATA or CLUSTER variable, unless the MISSING option is used in the PROC SURVEYLOGISTIC statement.
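
The MISSING option is specified in the PROC SURVEYLOGISTIC statement. As a sketch (the data set and variable names here are hypothetical), it retains observations whose STRATA or CLUSTER values are missing by treating a missing value as a valid level of those variables:

      proc surveylogistic data=Survey missing;   /* hypothetical data set                       */
         strata Region;                          /* missing Region values form their own level  */
         cluster School;
         model Y(event='1') = X;
      run;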

Missing values in your survey data (such as nonresponse) can compromise the quality of your results. An observation without missing values is called a complete respondent, and an observation with missing values is called an incomplete respondent.

If the missing data are missing at random, then PROC SURVEYLOGISTIC produces unbiased results when it excludes observations with missing values. However, if the complete respondents are different from the incomplete respondents with regard to a survey effect or outcome, then excluding nonrespondents from the analysis may result in biased estimates that do not accurately represent the survey population.

When the missing data are not missing at random, you should use imputation to replace missing values with acceptable values and use sampling weight adjustments to compensate for nonresponse before you use PROC SURVEYLOGISTIC. Refer to Cochran (1977), Kalton and Kasprzyk (1986), and Brick and Kalton (1996) for more information.

Model Specification

Response Level Ordering

Response level ordering is important because, by default, PROC SURVEYLOGISTIC models the probabilities of response levels with lower Ordered Values. Ordered Values, displayed in the 'Response Profile' table, are assigned to response levels in ascending sorted order. That is, the lowest response level is assigned Ordered Value 1, the next lowest is assigned Ordered Value 2, and so on. For example, if your response variable Y takes values in {1, ..., D + 1}, then the functions of the response probabilities modeled with the cumulative model are

    g( Pr(Y <= d | x) ),   d = 1, ..., D

and for the generalized logit model they are

    log( Pr(Y = d | x) / Pr(Y = D + 1 | x) ),   d = 1, ..., D

where the highest Ordered Value, Y = D + 1, is the reference level. You can change these default functions by specifying the EVENT=, REF=, DESCENDING, or ORDER= response variable options in the MODEL statement.

For binary response data with event and nonevent categories, the procedure models the function

    logit(π) = log( π / (1 - π) ) = α + β'x

where π is the probability of the response level assigned Ordered Value 1 in the 'Response Profile' table. Since

    logit(π) = -logit(1 - π)

the effect of reversing the order of the two response levels is to change the signs of α and β in the model logit(π) = α + β'x.

If your event category has a higher Ordered Value than the nonevent category, the procedure models the nonevent probability. You can use response variable options to model the event probability. For example, suppose the binary response variable Y takes the values 1 and 0 for event and nonevent, respectively, and Exposure is the explanatory variable. By default, the procedure assigns Ordered Value 1 to response level Y =0, and Ordered Value 2 to response level Y =1. Therefore, the procedure models the probability of the nonevent (Ordered Value=1) category. To model the event probability, you can do the following:

  • explicitly state which response level is to be modeled using the response variable option EVENT= in the MODEL statement,

      model Y(event='1') = Exposure;  
  • specify the response variable option DESCENDING in the MODEL statement,

      model Y(descending)=Exposure;  
  • specify the response variable option REF= in the MODEL statement as the nonevent category for the response variable. This option is most useful when you are fitting a generalized logit model.

      model Y(ref='0') = Exposure;  
  • assign a format to Y such that the first formatted value (when the formatted values are put in sorted order) corresponds to the event. For this example, Y=1 is assigned the formatted value 'event' and Y=0 is assigned the formatted value 'nonevent'. Since ORDER=FORMATTED by default, Ordered Value 1 is assigned to response level Y=1, so the procedure models the event.

      proc format;
         value Disease 1='event' 0='nonevent';
      run;
      proc surveylogistic;
         format Y Disease.;
         model Y=Exposure;
      run;

CLASS Variable Parameterization

Consider a model with one CLASS variable A with four levels, 1, 2, 5, and 7. Details of the possible choices for the PARAM= option follow.

EFFECT

Three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of -1. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.

 

Effect Coding

            Design Matrix
A          A1      A2      A5
1           1       0       0
2           0       1       0
5           0       0       1
7          -1      -1      -1

 

Parameter estimates of CLASS main effects using the effect coding scheme estimate the difference in the effect of each nonreference level compared to the average effect over all four levels.
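
As a sketch of how a parameterization is requested (the data set Study, the response Y, and the CLASS variable A are hypothetical), the following statements ask for effect coding with 7 as the reference level:

      proc surveylogistic data=Study;
         class A(ref='7') / param=effect;   /* effect coding; reference level 7 */
         model Y = A;
      run;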

GLM

As in PROC GLM, four columns are created to indicate group membership. The design matrix columns for A are as follows.

 

GLM Coding

            Design Matrix
A          A1      A2      A5      A7
1           1       0       0       0
2           0       1       0       0
5           0       0       1       0
7           0       0       0       1

 

Parameter estimates of CLASS main effects using the GLM coding scheme estimate the difference in the effects of each level compared to the last level.

ORDINAL

Three columns are created to indicate group membership of the higher levels of the effect. For the first level of the effect (which for A is 1), all three dummy variables have a value of 0. The design matrix columns for A are as follows.

 

Ordinal Coding

            Design Matrix
A          A2      A5      A7
1           0       0       0
2           1       0       0
5           1       1       0
7           1       1       1

 

The first level of the effect is a control or baseline level. Parameter estimates of CLASS main effects using the ORDINAL coding scheme estimate the effect on the response as the ordinal factor is set to each succeeding level. When the parameters for an ordinal main effect have the same sign, the response effect is monotonic across the levels.

POLYNOMIAL

POLY

Three columns are created. The first represents the linear term (x), the second represents the quadratic term (x²), and the third represents the cubic term (x³), where x is the level value. If the CLASS levels are not numeric, they are translated into 1, 2, 3, ... according to their sorting order. The design matrix columns for A are as follows.

 

Polynomial Coding

            Design Matrix
A        APOLY1   APOLY2   APOLY3
1           1        1        1
2           2        4        8
5           5       25      125
7           7       49      343

REFERENCE

REF

Three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of 0. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.

 

Reference Coding

            Design Matrix
A          A1      A2      A5
1           1       0       0
2           0       1       0
5           0       0       1
7           0       0       0

 

Parameter estimates of CLASS main effects using the reference coding scheme estimate the difference in the effect of each nonreference level compared to the effect of the reference level.

ORTHEFFECT

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=EFFECT. The design matrix columns for A are as follows.

 

Orthogonal Effect Coding

              Design Matrix
A        AOEFF1      AOEFF2      AOEFF3
1         1.41421    -0.81650    -0.57735
2         0.00000     1.63299    -0.57735
5         0.00000     0.00000     1.73205
7        -1.41421    -0.81649    -0.57735

ORTHORDINAL

ORTHOTHERM

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=ORDINAL. The design matrix columns for A are as follows.

 

Orthogonal Ordinal Coding

              Design Matrix
A        AOORD1      AOORD2      AOORD3
1        -1.73205     0.00000     0.00000
2         0.57735    -1.63299     0.00000
5         0.57735     0.81650    -1.41421
7         0.57735     0.81650     1.41421

ORTHPOLY

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=POLY. The design matrix columns for A are as follows.

 

Orthogonal Polynomial Coding

              Design Matrix
A        AOPOLY1     AOPOLY2     AOPOLY5
1        -1.153       0.907      -0.921
2        -0.734      -0.540       1.473
5         0.524      -1.370      -0.921
7         1.363       1.004       0.368

ORTHREF

The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=REFERENCE. The design matrix columns for A are as follows.

 

Orthogonal Reference Coding

              Design Matrix
A        AOREF1      AOREF2      AOREF3
1         1.73205     0.00000     0.00000
2        -0.57735     1.63299     0.00000
5        -0.57735    -0.81650     1.41421
7        -0.57735    -0.81650    -1.41421

Link Functions and the Corresponding Distributions

Four link functions are available in the SURVEYLOGISTIC procedure. The logit function is the default. To specify a different link function, use the LINK= option in the MODEL statement. The link functions and the corresponding distributions are as follows:

  • The logit function

    g(p) = log( p / (1 - p) )

    is the inverse of the cumulative logistic distribution function, which is

    F(x) = 1 / (1 + e^(-x)) = e^x / (1 + e^x)

  • The probit (or normit) function

    g(p) = Φ^(-1)(p)

    is the inverse of the cumulative standard normal distribution function, which is

    F(x) = Φ(x) = (2π)^(-1/2) ∫ from -∞ to x of exp(-z²/2) dz

    Traditionally, the probit function includes an additive constant 5, but throughout PROC SURVEYLOGISTIC, the terms probit and normit are used interchangeably and are defined as g(p) above.

  • The complementary log-log function

    g(p) = log( -log(1 - p) )

    is the inverse of the cumulative extreme-value function (also called the Gompertz distribution), which is

    F(x) = 1 - exp( -e^x )

  • The generalized logit function extends the binary logit link to a vector of levels (π_1, ..., π_(k+1)) by contrasting each level with a fixed level:

    g_i(π) = log( π_i / π_(k+1) ),   i = 1, ..., k

The variances of the normal, logistic, and extreme-value distributions are not the same. Their respective means and variances are

Distribution        Mean      Variance
Normal               0           1
Logistic             0          π²/3
Extreme-value       -γ          π²/6

where γ is the Euler constant. In comparing parameter estimates using different link functions, you need to take into account the different scalings of the corresponding distributions and, for the complementary log-log function, a possible shift in location. For example, if the fitted probabilities are in the neighborhood of 0.1 to 0.9, then the parameter estimates using the logit link function should be about π/√3 ≈ 1.8 times larger than the estimates from the probit link function.
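
A brief sketch of selecting a nondefault link (the data set and variable names are hypothetical); here a cumulative probit model replaces the default cumulative logit:

      proc surveylogistic data=Survey;
         model Rating = Income / link=probit;   /* probit instead of the default logit link */
      run;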

Model Fitting

Determining Observations for Likelihood Contributions

If you use events/trials syntax, each observation is split into two observations. One has the response value 1 with a frequency equal to the frequency of the original observation (which is 1 if the FREQ statement is not used) times the value of the events variable. The other observation has the response value 2 and a frequency equal to the frequency of the original observation times the value of ( trials - events ). These two observations have the same explanatory variable values and the same FREQ and WEIGHT values as the original observation.

For either single-trial or events/trials syntax, let j index all observations. In other words, for single-trial syntax, j indexes the actual observations. And, for events/trials syntax, j indexes the observations after splitting (as described previously). If your data set has 30 observations and you use single-trial syntax, j has values from 1 to 30; if you use events/trials syntax, j has values from 1 to 60.
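
As an illustration of the events/trials syntax (all names hypothetical):

      proc surveylogistic data=Seeds;
         model Germinated/Total = Moisture;   /* Germinated events out of Total trials */
      run;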

Suppose the response variable in a cumulative response model can take on the ordered values 1, ..., k, k+1, where k ≥ 1 is an integer. The likelihood for the jth observation with ordered response value y_j and explanatory variables vector x_j (row vector) is given by

    L_j = F(α_1 + x_j β)                           if y_j = 1
    L_j = F(α_i + x_j β) - F(α_(i-1) + x_j β)      if 1 < y_j = i ≤ k
    L_j = 1 - F(α_k + x_j β)                       if y_j = k + 1

where F(·) is the logistic, normal, or extreme-value distribution function; α_1, ..., α_k are ordered intercept parameters; and β is the slope parameter vector.

For the generalized logit model, letting the (k+1)st level be the reference level, the intercepts α_1, ..., α_k are unordered and the slope vector β_i varies with each logit. The likelihood for the jth observation with response value y_j and explanatory variables vector x_j (row vector) is given by

    L_j = Pr(Y = y_j | x_j) = e^(α_i + x_j β_i) / ( 1 + Σ_(m=1..k) e^(α_m + x_j β_m) )   if y_j = i ≤ k
    L_j = Pr(Y = y_j | x_j) = 1 / ( 1 + Σ_(m=1..k) e^(α_m + x_j β_m) )                    if y_j = k + 1

Iterative Algorithms for Model-Fitting

Two iterative maximum likelihood algorithms are available in PROC SURVEYLOGISTIC to obtain the maximum likelihood estimate of the model parameter θ. The default is the Fisher scoring method, which is equivalent to fitting by iteratively reweighted least squares. The alternative algorithm is the Newton-Raphson method. Both algorithms give the same parameter estimates; the covariance matrix of the estimator is estimated as described in the section 'Variance Estimation for Sample Survey Data' on page 4282. For a generalized logit model, only the Newton-Raphson technique is available. You can use the TECHNIQUE= option to select a fitting algorithm.

Iteratively Reweighted Least-Squares Algorithm (Fisher Scoring)

  • Let Y be the response variable, which takes values 1, ..., k, k+1 (k ≥ 1). Let j index all observations, and let Y_j be the value of the response for the jth observation. Consider the multinomial variable Z_j = (Z_1j, ..., Z_kj)' such that

    Z_ij = 1 if Y_j = i, and Z_ij = 0 otherwise

    and Z_(k+1)j = 1 - Σ_(i=1..k) Z_ij. With π_ij denoting the probability that the jth observation has response value i, the expected value of Z_j is π_j = (π_1j, ..., π_kj)', and π_(k+1)j = 1 - Σ_(i=1..k) π_ij. The covariance matrix of Z_j is V_j, which is the covariance matrix of a multinomial random variable for one trial with parameter vector π_j. Let θ be the vector of regression parameters; for example, θ = (α_1, ..., α_k, β')' for the cumulative logit model. Let D_j be the matrix of partial derivatives of π_j with respect to θ. The estimating equation for the regression parameters is

    Σ_j D_j' W_j (Z_j - π_j) = 0

    where W_j = w_j f_j V_j^(-1), and w_j and f_j are the WEIGHT and FREQ values of the jth observation.

    With a starting value of θ^(0), the maximum likelihood estimate of θ is obtained iteratively as

    θ^(i+1) = θ^(i) + ( Σ_j D_j' W_j D_j )^(-1) Σ_j D_j' W_j (Z_j - π_j)

    where D_j, W_j, and π_j are evaluated at θ^(i). The expression after the plus sign is the step size. If the log likelihood evaluated at θ^(i+1) is less than that evaluated at θ^(i), then θ^(i+1) is recomputed by step-halving or ridging. The iterative scheme continues until convergence is obtained, that is, until θ^(i+1) is sufficiently close to θ^(i). Then the maximum likelihood estimate of θ is θ̂ = θ^(i+1).

    By default, starting values are zero for the slope parameters, and for the intercept parameters, starting values are the observed cumulative logits (that is, logits of the observed cumulative proportions of response). Alternatively, the starting values may be specified with the INEST= option.

Newton-Raphson Algorithm

  • Let

    g = Σ_j w_j f_j (∂l_j/∂θ)   and   H = Σ_j w_j f_j (∂²l_j/∂θ∂θ')

    be the gradient vector and the Hessian matrix, where l_j = log L_j is the log likelihood for the jth observation. With a starting value of θ^(0), the maximum likelihood estimate of θ is obtained iteratively until convergence is obtained:

    θ^(i+1) = θ^(i) - H^(-1) g

    where H and g are evaluated at θ^(i). If the log likelihood evaluated at θ^(i+1) is less than that evaluated at θ^(i), then θ^(i+1) is recomputed by step-halving or ridging. The iterative scheme continues until convergence is obtained, that is, until θ^(i+1) is sufficiently close to θ^(i). Then the maximum likelihood estimate of θ is θ̂ = θ^(i+1).

Convergence Criteria

Four convergence criteria are allowed, namely ABSFCONV=, FCONV=, GCONV=, and XCONV=. If you specify more than one convergence criterion, the optimization is terminated as soon as one of the criteria is satisfied. If none of the criteria is specified, the default is GCONV=1E-8.
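
A sketch (hypothetical names) of selecting the fitting algorithm and a convergence criterion in the MODEL statement:

      proc surveylogistic data=Survey;
         model Y(event='1') = X1 X2 / technique=newton xconv=1e-6;   /* Newton-Raphson with an XCONV= criterion */
      run;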

Existence of Maximum Likelihood Estimates

The likelihood equation for a logistic regression model does not always have a finite solution. Sometimes there is a nonunique maximum on the boundary of the parameter space, at infinity. The existence, finiteness, and uniqueness of maximum likelihood estimates for the logistic regression model depend on the patterns of data points in the observation space (Albert and Anderson 1984; Santner and Duffy 1986).

Consider a binary response model. Let Y_j be the response of the jth subject, and let x_j be the vector of explanatory variables (including the constant 1 associated with the intercept). There are three mutually exclusive and exhaustive types of data configurations: complete separation, quasi-complete separation, and overlap.

Complete Separation

There is a complete separation of data points if there exists a vector b that correctly allocates all observations to their response groups; that is,

    b'x_j > 0   for all observations with y_j = 1
    b'x_j < 0   for all observations with y_j = 2

This configuration gives nonunique infinite estimates. If the iterative process of maximizing the likelihood function is allowed to continue, the log likelihood diminishes to zero, and the dispersion matrix becomes unbounded.

Quasi-Complete Separation

The data are not completely separable but there is a vector b such that

    b'x_j ≥ 0   for all observations with y_j = 1
    b'x_j ≤ 0   for all observations with y_j = 2

and equality holds for at least one subject in each response group. This configuration also yields nonunique infinite estimates. If the iterative process of maximizing the likelihood function is allowed to continue, the dispersion matrix becomes unbounded and the log likelihood diminishes to a nonzero constant.

Overlap

If neither complete nor quasi-complete separation exists in the sample points, there is an overlap of sample points. In this configuration, the maximum likelihood estimates exist and are unique.

Complete separation and quasi-complete separation are problems typically encountered with small data sets. Although complete separation can occur with any type of data, quasi-complete separation is not likely with truly continuous explanatory variables.

The SURVEYLOGISTIC procedure uses a simple empirical approach to recognize the data configurations that lead to infinite parameter estimates. The basis of this approach is that any convergence method of maximizing the log likelihood must yield a solution giving complete separation, if such a solution exists. In maximizing the log likelihood, there is no checking for complete or quasi-complete separation if convergence is attained in eight or fewer iterations. Subsequent to the eighth iteration, the probability of the observed response is computed for each observation. If the probability of the observed response is one for all observations, there is a complete separation of data points and the iteration process is stopped. If complete separation of the data has not been determined and an observation is identified as having an extremely large probability (≥ 0.95) of the observed response, there are two possible situations. First, there is overlap in the data set, and the observation is an atypical observation of its own group. The iterative process, if allowed to continue, will stop when a maximum is reached. Second, there is quasi-complete separation in the data set, and the asymptotic dispersion matrix is unbounded. If any of the diagonal elements of the dispersion matrix for the standardized observation vectors (all explanatory variables standardized to zero mean and unit variance) exceeds 5,000, quasi-complete separation is declared and the iterative process is stopped. If either complete separation or quasi-complete separation is detected, a warning message is displayed in the procedure output.

Checking for quasi-complete separation is less foolproof than checking for complete separation. The NOCHECK option in the MODEL statement turns off the process of checking for infinite parameter estimates. In cases of complete or quasi-complete separation, turning off the checking process typically results in the procedure failing to converge.

Model Fitting Statistics

Suppose the model contains s explanatory effects. For the jth observation, let π̂_j be the estimated probability of the observed response. The three criteria displayed by the SURVEYLOGISTIC procedure are calculated as follows:

  • -2 Log Likelihood:

    -2 Log L = -2 Σ_j w_j f_j log(π̂_j)

    where w_j and f_j are the weight and frequency values of the jth observation. For binary response models using events/trials syntax, this is equivalent to

    -2 Log L = -2 Σ_j w_j f_j [ r_j log(π̂_j) + (n_j - r_j) log(1 - π̂_j) ]

    where r_j is the number of events, n_j is the number of trials, and π̂_j is the estimated event probability.

  • Akaike Information Criterion:

    AIC = -2 Log L + 2p

    where p is the number of parameters in the model. For cumulative response models, p = k + s where k is the total number of response levels minus one, and s is the number of explanatory effects. For the generalized logit model, p = k ( s + 1).

  • Schwarz Criterion:

    SC = -2 Log L + p log( Σ_j f_j n_j )

    where p is as defined previously.

The -2 Log Likelihood statistic has a chi-square distribution under the null hypothesis (that all the explanatory effects in the model are zero), and the procedure produces a p-value for this statistic. The AIC and SC statistics give two different ways of adjusting the -2 Log Likelihood statistic for the number of terms in the model and the number of observations used.

Generalized Coefficient of Determination

Cox and Snell (1989, pp. 208-209) propose the following generalization of the coefficient of determination to a more general linear model:

    R² = 1 - { L(0) / L(θ̂) }^(2/n)

where L(0) is the likelihood of the intercept-only model, L(θ̂) is the likelihood of the specified model, and n is the sample size. The quantity R² achieves a maximum of less than 1 for discrete models, where the maximum is given by

    R²_max = 1 - { L(0) }^(2/n)

Nagelkerke (1991) proposes the following adjusted coefficient, which can achieve a maximum value of 1:

    R̃² = R² / R²_max

Properties and interpretation of R² and R̃² are provided in Nagelkerke (1991). In the 'Testing Global Null Hypothesis: BETA=0' table, R² is labeled as 'RSquare' and R̃² is labeled as 'Max-rescaled RSquare.' Use the RSQUARE option to request R² and R̃².

INEST= Data Set

You can specify starting values for the iterative algorithm in the INEST= data set.

The INEST= data set contains one observation for each BY group. The INEST= data set must contain the intercept variables (named Intercept for binary response models and Intercept, Intercept2, Intercept3, and so forth, for ordinal response models) and all explanatory variables in the MODEL statement. If BY processing is used, the INEST= data set should also include the BY variables, and there must be one observation for each BY group. If the INEST= data set also contains the _TYPE_ variable, only observations with _TYPE_ value 'PARMS' are used as starting values.
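
A minimal sketch (data set and variable names are hypothetical) of supplying starting values through the INEST= data set:

      data Start;                         /* one observation of starting values          */
         Intercept = -1.5;                /* intercept variable is named Intercept       */
         Exposure  =  0.2;                /* one variable per explanatory variable       */
      run;
      proc surveylogistic data=Survey inest=Start;
         model Y(event='1') = Exposure;
      run;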

Survey Design Information

Specification of Population Totals and Sampling Rates

Variance estimates in survey samples involve a finite population correction ( fpc ) for sampling without replacement. For small sampling fractions or sampling with replacement, it is appropriate to ignore this correction (Cochran 1977; Kish 1965), and PROC SURVEYLOGISTIC does so by default. If your analysis requires an fpc , specify the sampling fraction using either the RATE= option or the TOTAL= option on the PROC SURVEYLOGISTIC statement. If you do not specify one of these options, the procedure does not use the fpc when computing variance estimates.

If your design has multiple stages of selection and you are specifying the RATE= option, you should input the first-stage sampling rate, which is the ratio of the number of primary sampling units (PSUs) in the sample to the total number of PSUs in the study population. If you are using the TOTAL= option for a multistage design, you should specify the total number of PSUs in the study population. See the section 'Primary Sampling Units (PSUs)' on page 4281 for more details.

For a nonstratified sample design, or for a stratified sample design for which all the strata have the same sampling rate or population total, you should specify the rate or total as the value of the RATE=value option or the TOTAL=value option. If your sample design is stratified with different sampling rates or population totals in the strata, then you should use the RATE=SAS-data-set option or the TOTAL=SAS-data-set option to name a SAS data set that contains the stratum sampling rates or totals. This data set is called a secondary data set, as opposed to the primary data set that you specify with the DATA= option.

The secondary data set must contain all the stratification variables listed in the STRATA statement, as well as all the variables in the BY statement if any of these statements are specified. If there are formats associated with the STRATA variables and the BY variables, then the formats must be consistent in the primary and the secondary data sets. If you specify the TOTAL = SAS-data-set option, the secondary data set must have a variable named _TOTAL_ that contains the stratum population totals. If you specify the RATE = SAS-data-set option, the secondary data set must have a variable named _RATE_ that contains the stratum sampling rates. If the secondary data set contains more than one observation for any one stratum, then the procedure uses the first value of _TOTAL_ or _RATE_ for that stratum and ignores the rest.

The value in the RATE= option or the values of the _RATE_ variable in the secondary data set must be positive numbers. You can specify value as a number between 0 and 1, or you can specify value in percentage form as a number between 1 and 100, in which case PROC SURVEYLOGISTIC converts that number to a proportion. The procedure treats the value 1 as 100%, not as the percentage form 1%.

If you specify the TOTAL= value option, this value must not be less than the sample size. If you provide stratum population totals in a secondary data set, these values must not be less than the corresponding stratum sample sizes.
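
As a sketch (all names hypothetical), stratum population totals can be supplied in a secondary data set that contains the STRATA variables and the _TOTAL_ variable:

      data StratumTotals;              /* hypothetical stratum population totals */
         input Region $ _TOTAL_;
         datalines;
      East 1520
      West 2310
      ;
      run;
      proc surveylogistic data=Survey total=StratumTotals;
         strata Region;
         model Y(event='1') = X;
      run;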

Primary Sampling Units (PSUs)

When you have clusters, or primary sampling units (PSUs), in your sample design, the procedure estimates the variance based on the variation among PSUs. Use the CLUSTER statement to identify the first-stage clusters in your design. PROC SURVEYLOGISTIC assumes that each cluster represents a PSU in the sample and that each observation is an element of a PSU. If you do not specify a CLUSTER statement, the procedure treats each observation as a PSU.
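
The following sketch (hypothetical names) identifies the stratification, first-stage cluster (PSU), and weight variables:

      proc surveylogistic data=Survey;
         strata  Region;               /* stratification variable      */
         cluster School;               /* first-stage clusters (PSUs)  */
         weight  SamplingWeight;       /* sampling weights             */
         model Y(event='1') = X;
      run;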

Variance Estimation for Sample Survey Data

Due to the variability of characteristics among items in the population, researchers apply scientific sample designs in the sample selection process to reduce the risk of a distorted view of the population, and they make inferences about the population based on the information from the sample survey data. In order to make statistically valid inferences for the population, they must incorporate the sample design in the data analysis.

The SURVEYLOGISTIC procedure fits linear logistic regression models for discrete response survey data using the maximum likelihood method. In the variance estimation, the procedure incorporates complex survey sample designs, including designs with stratification, clustering, and unequal weighting.

The procedure uses the Taylor expansion method to estimate sampling errors of estimators based on complex sample designs. This method obtains a linear approximation for the estimator and then uses the variance estimate for this approximation to estimate the variance of the estimate itself. See Binder (1981, 1983), Roberts, Rao, and Kumar (1987), Skinner, Holt, and Smith (1989), Morel (1989), and Lehtonen and Pahkinen (1995) for papers that describe logistic regression for sample survey data. When there are clusters, or primary sampling units (PSUs), in the sample design, the procedure estimates variance from the variation among PSUs. When the design is stratified, the procedure pools stratum variance estimates to compute the overall variance estimate. For t tests of the estimates, the degrees of freedom equals the number of clusters minus the number of strata in the sample design. Statistical analyses, such as hypothesis tests and confidence limits, depend on these variance estimates.

Notation

Let Y be the response variable with categories 1, 2, ..., D, D + 1. The p covariates are denoted by a p-dimensional row vector x.

For a stratified clustered sample design, each observation is represented by a row vector,

    ( w_hij, y_hij', y_hij(D+1), x_hij )

where

  • h = 1, 2, ..., H is the stratum number, with a total of H strata

  • i = 1, 2, ..., n_h is the cluster number within stratum h, with a total of n_h clusters

  • j = 1, 2, ..., m_hi is the unit number within cluster i of stratum h, with a total of m_hi units

  • w_hij denotes the sampling weight

  • y_hij is a D-dimensional column vector whose elements are indicator variables for the first D categories of variable Y. If the response of the jth member of the ith cluster in stratum h falls in category d, the dth element of the vector is one and the remaining elements are zero, where d = 1, 2, ..., D

  • y_hij(D+1) is the indicator variable for the (D + 1)st category of variable Y

  • x_hij denotes the k-dimensional row vector of explanatory variables for the jth member of the ith cluster in stratum h. If there is an intercept, then x_hij1 ≡ 1.

  • n = Σ_h n_h is the total number of clusters in the entire sample

  • m = Σ_h Σ_i m_hi is the total sample size

The following notations are also used in the following sections:

  • f h denotes the sampling rate for stratum h

  • π_hij is the expected vector of the response variable:

    π_hij = E( y_hij | x_hij ) = ( π_hij1, π_hij2, ..., π_hijD )'

Note that π_hij(D+1) = 1 - 1'π_hij, where 1 is a D-dimensional column vector whose elements are 1.

Likelihood Function

Let f ( ·) be a link function such that

where is a p -dimensional column vector for regression coefficients. The pseudo log likelihood is

click to expand

Denote the maximum likelihood estimator as θ̂, which is a solution to the estimating equations:

click to expand

where D_hij is the matrix of partial derivatives of the link function f with respect to θ.

To obtain the maximum likelihood estimator θ̂, the procedure uses iterations with a starting value θ^(0) for θ. See the section 'Iterative Algorithms for Model-Fitting' on page 4275 for details.

Generalized Logistic Model

Formulation of the generalized logit models for nominal response variables can be found in Agresti (1990). Without loss of generality, let the last category, D + 1, be the reference category for the response variable Y . The link function for the generalized logistic model is defined as

click to expand

and the model parameters are:

click to expand

for d = 1, 2, ..., D.

Cumulative Logit Model

Details of the cumulative logit model (or proportional odds model) can be found in McCullagh and Nelder (1989). Denote the cumulative sum of the expected proportions for the first d categories of variable Y by

for d = 1, 2, ..., D. Then the link function for the proportional odds model is

click to expand

with the model parameters:

click to expand

Complementary log-log Model

Using the notation of the previous section, the link function for the complementary log-log model is

click to expand

with the model parameters:

click to expand

Probit Model

Another commonly used model for ordinal responses is the probit model with the link function

click to expand

where

click to expand

is the cumulative distribution function of the standard normal distribution. The model parameters are:

click to expand

Estimated Variances

Using the Taylor approximation, the estimated covariance matrix of θ̂ is

click to expand

where

click to expand

If you use the Newton-Raphson algorithm by specifying the TECHNIQUE=NEWTON option in the MODEL statement, this matrix is replaced by the negative (expected) Hessian matrix.

The matrices of partial derivatives D_hij and the response probabilities π_hij are evaluated at θ̂.

Adjustments to the Variance Estimation

The factor (n - 1)/(n - p) in the computation of the matrix should reduce the small-sample bias associated with using the estimated function to calculate deviations (Morel 1989; Hidiroglou, Fuller, and Hickman 1980). For simple random sampling, this factor corresponds to the degrees-of-freedom correction applied to the residual mean square in ordinary least squares regression in which p parameters are estimated. By default, the procedure uses this adjustment in variance estimation; this is equivalent to specifying the VADJUST=DF option in the MODEL statement. If you do not want to use this multiplier in the variance estimation, you can specify the VADJUST=NONE option in the MODEL statement to suppress this factor.

In addition, you can specify the VADJUST=MOREL option to compute a further adjustment to the variance estimator for the regression coefficients , introduced by Morel (1989):

click to expand

where, for two given nonnegative constants (the bounds described below),

click to expand

This adjustment will

  • reduce the small-sample bias reflected in inflated Type I error rates

  • guarantee a positive definite estimated covariance matrix, provided that the required matrix inverse exists

  • be close to zero when the sample size becomes large

In this adjustment, the estimated design effect is bounded below by a positive constant. You can use the DEFFBOUND= suboption in the VADJUST=MOREL option in the MODEL statement to specify this lower bound; by default, the procedure uses the value 1. The factor λ converges to zero as the sample size becomes large, and λ has an upper bound. You can use the ADJBOUND= suboption in the VADJUST=MOREL option in the MODEL statement to specify this upper bound; by default, the procedure uses the value 0.5.
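
A sketch (hypothetical names) of requesting the Morel adjustment with explicit bounds; the values shown are the defaults:

      proc surveylogistic data=Survey;
         cluster PSU;
         model Y(event='1') = X / vadjust=morel(deffbound=1 adjbound=0.5);
      run;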

Hypothesis Testing and Estimation

Score Statistics and Tests

To understand the general form of the score statistics, let g(θ) be the vector of first partial derivatives of the log likelihood with respect to the parameter vector θ, and let H(θ) be the matrix of second partial derivatives of the log likelihood with respect to θ. That is, g(θ) is the gradient vector, and H(θ) is the Hessian matrix. Let I(θ) be either -H(θ) or the expected value of -H(θ). Consider a null hypothesis H0. Let θ̂_0 be the MLE of θ under H0. The chi-square score statistic for testing H0 is defined by

    g'(θ̂_0) I^(-1)(θ̂_0) g(θ̂_0)

and it has an asymptotic chi-square distribution with r degrees of freedom under H0, where r is the number of restrictions imposed on θ by H0.

Testing the Parallel Lines Assumption

For an ordinal response, PROC SURVEYLOGISTIC performs a test of the parallel lines assumption. In the displayed output, this test is labeled 'Score Test for the Equal Slopes Assumption' when the LINK= option is NORMIT or CLOGLOG. When LINK=LOGIT, the test is labeled as 'Score Test for the Proportional Odds Assumption' in the output. This section describes the methods used to calculate the test.

For this test, the number of response levels, D + 1, is assumed to be strictly greater than 2. Let Y be the response variable taking values 1, ..., D, D + 1. Suppose there are k explanatory variables. Consider the general cumulative model without making the parallel lines assumption:

    g( Pr(Y ≤ d | x) ) = (1, x') θ_d,   d = 1, ..., D

where g(·) is the link function and θ_d = (α_d, β_d1, ..., β_dk)' is a vector of unknown parameters consisting of an intercept α_d and k slope parameters β_d1, ..., β_dk. The parameter vector for this general cumulative model is θ = (θ_1', ..., θ_D')'.

Under the null hypothesis of parallelism, H0: β_1i = β_2i = ... = β_Di for 1 ≤ i ≤ k, there is a single common slope parameter for each of the k explanatory variables. Let β_1, ..., β_k be the common slope parameters. Let α̂_1, ..., α̂_D and β̂_1, ..., β̂_k be the MLEs of the intercept parameters and the common slope parameters under H0. Then, under H0, the MLE of θ is

    θ̂_0 = (θ̂_1', ..., θ̂_D')'   with   θ̂_d = (α̂_d, β̂_1, ..., β̂_k)',   d = 1, ..., D

and the chi-square score statistic g'(θ̂_0) I^(-1)(θ̂_0) g(θ̂_0) has an asymptotic chi-square distribution with k(D - 1) degrees of freedom. This tests the parallel lines assumption by testing the equality of the separate slope parameters simultaneously for all explanatory variables.

Wald Confidence Intervals for Parameters

Wald confidence intervals are sometimes called normal confidence intervals. They are based on the asymptotic normality of the parameter estimators. The 100(1 - α)% Wald confidence interval for θ_j is given by

    θ̂_j ± z_(1-α/2) σ̂_j

where z_p is the 100pth percentile of the standard normal distribution, θ̂_j is the maximum likelihood estimate of θ_j, and σ̂_j is the standard error estimate of θ̂_j described in the section 'Variance Estimation for Sample Survey Data' on page 4282.

Testing Linear Hypotheses about the Regression Coefficients

Linear hypotheses for θ are expressed in matrix form as

    H0: Lθ = c

where L is a matrix of coefficients for the linear hypotheses and c is a vector of constants. The vector of regression coefficients θ includes slope parameters as well as intercept parameters. The Wald chi-square statistic for testing H0 is computed as

    χ²_W = ( Lθ̂ - c )' [ L V̂(θ̂) L' ]^(-1) ( Lθ̂ - c )

where V̂(θ̂) is the estimated covariance matrix described in the section 'Variance Estimation for Sample Survey Data' on page 4282. Under H0, χ²_W has an asymptotic chi-square distribution with r degrees of freedom, where r is the rank of L.
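
Such hypotheses are specified with the TEST statement; a sketch with hypothetical effect names:

      proc surveylogistic data=Survey;
         model Y(event='1') = X1 X2;
         test X1;                      /* H0: the coefficient of X1 is 0              */
         Equal: test X1 = X2;          /* H0: the coefficients of X1 and X2 are equal */
      run;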

Odds Ratio Estimation

Consider a dichotomous response variable with outcomes event and nonevent . Consider a dichotomous risk factor variable X that takes the value 1 if the risk factor is present and 0 if the risk factor is absent. According to the logistic model, the log odds function, g ( X ), is given by

    g(X) = β_0 + β_1 X

The odds ratio ψ is defined as the ratio of the odds for those with the risk factor (X = 1) to the odds for those without the risk factor (X = 0). The log of the odds ratio is given by

    log(ψ) = g(X = 1) - g(X = 0) = (β_0 + β_1) - β_0 = β_1

The parameter β_1 associated with X represents the change in the log odds from X = 0 to X = 1. So the odds ratio is obtained by simply exponentiating the value of the parameter associated with the risk factor. The odds ratio indicates how the odds of the event change as you change X from 0 to 1. For instance, ψ = 2 means that the odds of an event when X = 1 are twice the odds of an event when X = 0.

Suppose the values of the dichotomous risk factor are coded as constants a and b instead of 0 and 1. The odds when X = a become exp(β_0 + aβ_1), and the odds when X = b become exp(β_0 + bβ_1). The odds ratio corresponding to an increase in X from a to b is

    ψ = exp[ (b - a)β_1 ] = [ exp(β_1) ]^(b-a) = [ exp(β_1) ]^c

Note that for any a and b such that c = b - a = 1, ψ = exp(β_1). So the odds ratio can be interpreted as the change in the odds for any increase of one unit in the corresponding risk factor. However, the change in odds for some amount other than one unit is often of greater interest. For example, a change of one pound in body weight may be too small to be considered important, while a change of 10 pounds may be more meaningful. The odds ratio for a change in X from a to b is estimated by raising the odds ratio estimate for a unit change in X to the power of c = b - a, as shown previously.

For a polytomous risk factor, the computation of odds ratios depends on how the risk factor is parameterized. For illustration, suppose that Race is a risk factor with four categories: White, Black, Hispanic, and Other.

For the effect parameterization scheme (PARAM=EFFECT) with White as the reference group, the design variables for Race are as follows.

 

Design Variables

Race          X1    X2    X3
Black          1     0     0
Hispanic       0     1     0
Other          0     0     1
White         -1    -1    -1

The log odds for Black is

    g(Black) = β_0 + β_1(1) + β_2(0) + β_3(0) = β_0 + β_1

The log odds for White is

    g(White) = β_0 + β_1(-1) + β_2(-1) + β_3(-1) = β_0 - β_1 - β_2 - β_3

Therefore, the log odds ratio of Black versus White becomes

    log(ψ) = g(Black) - g(White) = 2β_1 + β_2 + β_3

For the reference cell parameterization scheme (PARAM=REF) with White as the reference cell , the design variables for race are as follows.

 

Design Variables

Race          X1    X2    X3
Black          1     0     0
Hispanic       0     1     0
Other          0     0     1
White          0     0     0

The log odds ratio of Black versus White is given by

    log(ψ) = g(Black) - g(White) = β_1

For the GLM parameterization scheme (PARAM=GLM), the design variables are as follows.

 

Design Variables

Race

X 1

X 2

X 3

X 4

Black

1

Hispanic

1

Other

1

White

1

The log odds ratio of Black versus White is

    log(ψ) = g(Black) - g(White) = β_1 - β_4

Consider the hypothetical example of heart disease among race in Hosmer and Lemeshow (2000, p. 51). The entries in the following contingency table represent counts.

 

                          Race
Disease Status    White   Black   Hispanic   Other
Present               5      20         15      10
Absent               20      10         10      10

The computation of odds ratio of Black versus White for various parameterization schemes is tabulated in the following table.

Odds Ratio of Heart Disease Comparing Black to White

                Parameter Estimates
PARAM        1        2        3        4       Odds Ratio Estimation
EFFECT     0.7651   0.4774   0.0719            exp(2 × 0.7651 + 0.4774 + 0.0719) = 8
REF        2.0794   1.7917   1.3863            exp(2.0794) = 8
GLM        2.0794   1.7917   1.3863   0.0000   exp(2.0794) = 8

Since the log odds ratio, log(ψ), is a linear function of the parameters, the Wald confidence interval for log(ψ) can be derived from the parameter estimates and the estimated covariance matrix. Confidence intervals for the odds ratios are obtained by exponentiating the corresponding confidence intervals for the log odds ratios. In the displayed output of PROC SURVEYLOGISTIC, the 'Odds Ratio Estimates' table contains the odds ratio estimates and the corresponding 95% Wald confidence intervals computed using the covariance matrix described in the section 'Variance Estimation for Sample Survey Data' on page 4282. For continuous explanatory variables, these odds ratios correspond to a unit increase in the risk factors.

To customize odds ratios for specific units of change for a continuous risk factor, you can use the UNITS statement to specify a list of relevant units for each explanatory variable in the model. Estimates of these customized odds ratios are given in a separate table. Let (L_j, U_j) be a confidence interval for log(ψ). The corresponding lower and upper confidence limits for the customized odds ratio exp(cβ_j) are exp(cL_j) and exp(cU_j), respectively (for c > 0), or exp(cU_j) and exp(cL_j), respectively (for c < 0). You use the CLODDS option to request the confidence intervals for the odds ratios.

For a generalized logit model, odds ratios are computed similarly, except D odds ratios are computed for each effect, corresponding to the D logits in the model.
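
A sketch (hypothetical names) of requesting confidence limits for odds ratios and a customized odds ratio for a 10-unit change in a continuous risk factor:

      proc surveylogistic data=Survey;
         model Disease(event='Present') = Weight / clodds;
         units Weight = 10;            /* odds ratio for a 10-unit change in Weight */
      run;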

Rank Correlation of Observed Responses and Predicted Probabilities

The predicted mean score of an observation is the sum of the Ordered Values (shown in the 'Response Profile' table) minus one, weighted by the corresponding predicted probabilities for that observation; that is, the predicted mean score = Σ_(d=1..D+1) (d - 1) π̂_d, where D + 1 is the number of response levels and π̂_d is the predicted probability of the dth (ordered) response.

A pair of observations with different observed responses is said to be concordant if the observation with the lower ordered response value has a lower predicted mean score than the observation with the higher ordered response value. If the observation with the lower ordered response value has a higher predicted mean score than the observation with the higher ordered response value, then the pair is discordant. If the pair is neither concordant nor discordant, it is a tie. Enumeration of the total numbers of concordant and discordant pairs is carried out by categorizing the predicted mean score into intervals of length D/500 and accumulating the corresponding frequencies of observations.

Let N be the sum of observation frequencies in the data. Suppose there is a total of t pairs with different responses, n c of them are concordant, n d of them are discordant, and t ˆ’ n c ˆ’ n d of them are tied. PROC SURVEYLOGISTIC computes the following four indices of rank correlation for assessing the predictive ability of a model:

    Somers' D = (n_c - n_d) / t
    Goodman-Kruskal Gamma = (n_c - n_d) / (n_c + n_d)
    Kendall's Tau-a = (n_c - n_d) / ( 0.5 N (N - 1) )
    c = ( n_c + 0.5 (t - n_c - n_d) ) / t

Note that c also gives an estimate of the area under the receiver operating characteristic (ROC) curve when the response is binary (Hanley and McNeil 1982).

For binary responses, the predicted mean score is equal to the predicted probability for Ordered Value 2. As such, the preceding definition of concordance is consistent with the definition used in previous releases for the binary response model.

Output

Displayed Output

The displayed output of the SURVEYLOGISTIC procedure includes the following:

  • name of the input Data Set

  • name and label of the Response Variable if the single-trial syntax is used

  • number of Response Levels

  • name of the Events Variable if the events/trials syntax is used

  • name of the Trials Variable if the events/trials syntax is used

  • Number of Observations read from the input data set

  • Number of Observations used in the analysis

  • name of the Frequency Variable if the FREQ statement is specified

  • Sum of Frequencies of all the observations read from the input data set

  • Sum of Frequencies of all the observations used in the analysis

  • name of the Weight Variable if the WEIGHT statement is specified

  • Sum of Weights of all the observations read from the input data set

  • Sum of Weights of all the observations used in the analysis

  • name of the Offset Variable if the OFFSET= option is specified

  • name(s) of the stratification variable(s) if the STRATA statement is specified

  • total number of strata if the STRATA statement is specified

  • name(s) of the cluster variable(s) if the CLUSTER statement is specified

  • total number of clusters if the CLUSTER statement is specified


  • Link Function

  • variance adjustment method

  • parameters used in the VADJUST=MOREL option if this option is specified

  • 'Response Profile' table, which gives, for each response level, the ordered value (an integer between one and the number of response levels, inclusive); the value of the response variable if the single-trial syntax is used or the values 'EVENT' and 'NO EVENT' if the events/trials syntax is used; the count or frequency; and the sum of weights if the WEIGHT statement is specified

  • 'Class Level Information' table, which gives the level and the design variables for each CLASS explanatory variable

  • 'Maximum Likelihood Iterative Phase' table, which gives the iteration number, the step size (in the scale of 1.0, 0.5, 0.25, and so on) or the ridge value, -2 log likelihood, and parameter estimates for each iteration. Also displayed are the last evaluation of the gradient vector and the last change in the -2 log likelihood. You need to use the ITPRINT option in the MODEL statement to obtain this table

  • score test result for testing the parallel lines assumption, if an ordinal response model is fitted. If LINK=CLOGLOG or LINK=PROBIT, this test is labeled 'Score Test for the Parallel Slopes Assumption.' The proportional odds assumption is a special case of the parallel lines assumption when LINK=LOGIT. In this case, the test is labeled 'Score Test for the Proportional Odds Assumption'

  • 'Model Fit Statistics' and 'Testing Global Null Hypothesis: BETA=0' tables, which give the various criteria (-2 Log L, AIC, SC) based on the likelihood for fitting a model with intercepts only and for fitting a model with intercepts and explanatory variables. If you specify the NOINT option, these statistics are calculated without considering the intercept parameters. The third column of the table gives the chi-square statistics and p-values for the -2 Log L statistic and for the Score statistic. These test the joint effect of the explanatory variables included in the model. The Score criterion is always missing for the models identified by the first two columns of the table. Note also that the first two rows of the Chi-Square column are always missing, since tests cannot be performed for AIC and SC

  • generalized R 2 measures for the fitted model if you specify the RSQUARE option in the MODEL statement

  • 'Type III Analysis of Effects' table if the model contains an effect involving a CLASS variable. This table gives the Wald Chi-square statistic, the degrees of freedom, and the p -value for each effect in the model

  • 'Analysis of Maximum Likelihood Estimates' table, which includes

    • maximum likelihood estimate of the parameter

    • estimated standard error of the parameter estimate, computed as the square root of the corresponding diagonal element of the estimated covariance matrix

    • Wald chi-square statistic, computed by squaring the ratio of the parameter estimate divided by its standard error estimate

    • p -value of the Wald chi-square statistic with respect to a chi-square distribution with one degree of freedom

    • standardized estimate for the slope parameter β_i, given by β̂_i / (s/s_i), where s_i is the total sample standard deviation for the ith explanatory variable and s is the standard deviation of the underlying distribution: π/√3 for the logistic distribution, 1 for the normal distribution, and π/√6 for the extreme-value distribution. You need to specify the STB option in the MODEL statement to obtain these estimates. Standardized estimates of the intercept parameters are set to missing.

    • value of exp(β̂_i) for each slope parameter β_i if you specify the EXPB option in the MODEL statement. For continuous variables, this is equivalent to the estimated odds ratio for a 1-unit change.

    • label of the variable (if space permits ) if you specify the PARMLABEL option in the MODEL statement. Due to constraints on the line size, the variable label may be suppressed in order to display the table in one panel. Use the SAS system option LINESIZE= to specify a larger line size to accommodate variable labels. A shorter line size can break the table into two panels allowing labels to be displayed.

  • 'Odds Ratio Estimates' table, which contains the odds ratio estimates and the corresponding 95% Wald confidence intervals. For continuous explanatory variables, these odds ratios correspond to a unit increase in the risk factors.

  • measures of association between predicted probabilities and observed responses, which include a breakdown of the number of pairs with different responses and four rank correlation indexes: Somers' D, Goodman-Kruskal Gamma, Kendall's Tau-a, and c

  • confidence intervals for all the parameters if you use the CLPARM option in the MODEL statement

  • confidence intervals for all the odds ratios if you use the CLODDS option in the MODEL statement

  • 'Analysis of Effects not in the Model' table, which gives the score chi-square statistic for testing the significance of each variable not in the model after adjusting for the variables already in the model, and the p -value of the chi-square statistic with respect to a chi-square distribution with one degree of freedom. You specify the DETAILS option in the MODEL statement to obtain this table.

  • estimated covariance matrix of the parameter estimates if you use the COVB option in the MODEL statement

  • estimated correlation matrix of the parameter estimates if you use the CORRB option in the MODEL statement

  • 'Linear Hypothesis Testing' table, which gives the result of the Wald test for each TEST statement (if specified)

ODS Table Names

PROC SURVEYLOGISTIC assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, 'Using the Output Delivery System.'

Table 69.2: ODS Tables Produced in PROC SURVEYLOGISTIC

ODS Table Name        Description                                            Statement          Option
ClassLevelInfo        CLASS variable levels and design variables             MODEL              default (with CLASS vars)
CLOdds                Wald's confidence limits for odds ratios               MODEL              CLODDS
CLparmWald            Wald's confidence limits for parameters                MODEL              CLPARM
ContrastCoeff         L matrix from CONTRAST                                 CONTRAST           E
ContrastEstimate      Estimates from CONTRAST                                CONTRAST           ESTIMATE=
ContrastTest          Wald test for CONTRAST                                 CONTRAST           default
ConvergenceStatus     Convergence status                                     MODEL              default
CorrB                 Estimated correlation matrix of parameter estimators   MODEL              CORRB
CovB                  Estimated covariance matrix of parameter estimators    MODEL              COVB
CumulativeModelTest   Test of the cumulative model assumption                MODEL              (ordinal response)
DesignSummary         Design summary                                         STRATA, CLUSTER    default
FitStatistics         Model fit statistics                                   MODEL              default
GlobalTests           Test for global null hypothesis                        MODEL              default
IterHistory           Iteration history                                      MODEL              ITPRINT
LastGradient          Last evaluation of gradient                            MODEL              ITPRINT
LogLikeChange         Final change in the log likelihood                     MODEL              ITPRINT
ModelInfo             Model information                                      PROC               default
NObs                  Number of observations                                 PROC               default
OddsRatios            Odds ratios                                            MODEL              default
ParameterEstimates    Maximum likelihood estimates of model parameters       MODEL              default
RSquare               R-square                                               MODEL              RSQUARE
ResponseProfile       Response profile                                       PROC               default
SimpleStatistics      Summary statistics for explanatory variables           PROC               SIMPLE
StrataInfo            Stratum information                                    STRATA             LIST
TestPrint1            L[cov(b)]L' and Lb - c                                 TEST               PRINT
TestPrint2            Ginv(L[cov(b)]L') and Ginv(L[cov(b)]L')(Lb - c)        TEST               PRINT
TestStmts             Linear hypotheses testing results                      TEST               default
TypeIII               Type III tests of effects                              MODEL              default (with CLASS variables)

By referring to the names of such tables, you can use the ODS OUTPUT statement to place one or more of these tables in output data sets.
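
For example (hypothetical data set and output names), this sketch saves the 'ParameterEstimates' table to a data set:

      proc surveylogistic data=Survey;
         model Y(event='1') = X;
         ods output ParameterEstimates=Parms;   /* write the ParameterEstimates table to WORK.Parms */
      run;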



