The following statements are available in PROC SURVEYLOGISTIC:
PROC SURVEYLOGISTIC < options > ;
BY variables ;
CLASS variable < ( v-options ) > < variable < ( v-options ) > >
< / v-options > ;
CLUSTER variables ;
CONTRAST 'label' effect values < , effect values >< / options > ;
FREQ variable ;
MODEL events/trials = < effects >< / options > ;
MODEL variable < ( variable_options ) > = < effects >< / options > ;
STRATA variables < / options > ;
< label : > TEST equation1 < , ... , < equationk >>< / option > ;
UNITS independent1 = list1 < ... independentk = listk >< / option > ;
WEIGHT variable < / option > ;
The PROC SURVEYLOGISTIC and MODEL statements are required. The CLASS, CLUSTER, STRATA, and CONTRAST statements can appear multiple times. Use only one MODEL statement and one WEIGHT statement. The CLASS statement (if used) must precede the MODEL statement, and the CONTRAST statement (if used) must follow the MODEL statement.
The rest of this section provides detailed syntax information for each of the preceding statements, beginning with the PROC SURVEYLOGISTIC statement. The remaining statements are covered in alphabetical order.
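The ordering rules above can be sketched with a small, self-contained run. Everything in the DATA step is a hypothetical toy sample invented for illustration (the data set `work.survey` and the variables `region`, `psu`, `sex`, `age`, `response`, and `sampwt` are assumptions, not names from this documentation).

```sas
/* Toy data: 2 strata x 4 clusters x 10 respondents (all values are made up) */
data work.survey;
   length sex $1;
   call streaminit(2024);
   do region = 1 to 2;
      do psu = 1 to 4;
         do i = 1 to 10;
            age      = 20 + floor(50 * rand('uniform'));
            sex      = ifc(rand('uniform') < 0.5, 'F', 'M');
            response = rand('bernoulli', 0.4);
            sampwt   = 25;            /* hypothetical sampling weight */
            output;
         end;
      end;
   end;
   drop i;
run;

proc surveylogistic data=work.survey;
   strata region;                     /* design strata */
   cluster psu;                       /* primary sampling units */
   class sex;                         /* CLASS precedes MODEL */
   model response = sex age;          /* the single MODEL statement */
   contrast 'Sex effect' sex 1;       /* CONTRAST follows MODEL */
   weight sampwt;                     /* the single WEIGHT statement */
run;
```

With only 80 simulated observations the estimates themselves are not meaningful; the sketch is meant only to show where each statement sits relative to the MODEL statement.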
PROC SURVEYLOGISTIC < options > ;
The PROC SURVEYLOGISTIC statement invokes the SURVEYLOGISTIC procedure and optionally identifies input data sets and controls the ordering of the response levels.
ALPHA= α
sets the confidence level for confidence limits. The value of the ALPHA= option must be between 0 and 1, and the default value is 0.05. A confidence level of α produces 100(1 − α)% confidence limits. The default of ALPHA=0.05 produces 95% confidence limits.
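As a brief illustration of the ALPHA= option, the following sketch requests 90% confidence limits, since 100(1 − 0.10)% = 90%. The data set `work.survey` and its variables are hypothetical and are assumed to exist already.

```sas
/* Assumes a hypothetical data set work.survey with variables
   response (binary), age, and sampwt */
proc surveylogistic data=work.survey alpha=0.10;   /* 90% confidence limits */
   model response = age;
   weight sampwt;
run;
```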
DATA= SAS-data-set
names the SAS data set containing the data to be analyzed. If you omit the DATA= option, the procedure uses the most recently created SAS data set.
INEST= SAS-data-set
names the SAS data set that contains initial estimates for all the parameters in the model. BY-group processing is allowed in setting up the INEST= data set. See the section 'INEST= Data Set' on page 4280 for more information.
MISSING
requests that the procedure treat missing values as a valid category for all categorical variables, which include classification variables in the model, strata variables, and cluster variables.
NAMELEN= n
specifies the length of effect names in tables and output data sets to be n characters, where n is a value between 20 and 200. The default length is 20 characters.
NOSORT
suppresses the internal sorting process to shorten the computation time if the data set is presorted by the STRATA and CLUSTER variables. By default, the procedure sorts the data by the STRATA variables if you use the STRATA statement; then the procedure sorts the data by the CLUSTER variables within strata. If your data are already sorted in the order of the STRATA and CLUSTER variables, you can specify this option to skip the sort and reduce the use of computing resources, especially when your data set is very large. However, if you specify the NOSORT option when your data are not presorted by the STRATA and CLUSTER variables, then any change in the values of these variables from one observation to the next creates a new stratum or cluster.
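One way to use NOSORT safely is to sort the data explicitly by the STRATA variables and then the CLUSTER variables before calling the procedure. The sketch below assumes a hypothetical data set `work.survey` with a stratum variable `region` and a cluster variable `psu`; none of these names come from this documentation.

```sas
/* Presort by stratum, then by cluster within stratum */
proc sort data=work.survey out=work.survey_sorted;
   by region psu;
run;

/* NOSORT is appropriate here because the input is already in
   STRATA-then-CLUSTER order */
proc surveylogistic data=work.survey_sorted nosort;
   strata region;
   cluster psu;
   model response = age;
   weight sampwt;
run;
```

If the explicit sort were omitted and the rows for one `psu` were scattered through the file, each separate run of rows would be counted as its own cluster, which is the hazard the NOSORT description warns about.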
RATE= value | SAS-data-set
R= value | SAS-data-set
specifies the sampling rate as a nonnegative value, or specifies an input data set that contains the stratum sampling rates. The procedure uses this information to compute a finite population correction (fpc) for variance estimation when the sample design is without replacement. If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of primary sampling units (PSUs) selected to the total number of PSUs in the population.
For a nonstratified sample design, or for a stratified sample design with the same sampling rate in all strata, you should specify a nonnegative value for the RATE= option. If your design is stratified with different sampling rates in the strata, then you should name a SAS data set that contains the stratification variables and the sampling rates. See the section 'Specification of Population Totals and Sampling Rates' on page 4280 for more details.
The value in the RATE= option or the values of _RATE_ in the secondary data set must be nonnegative numbers. You can specify value as a number between 0 and 1. Alternatively, you can specify value in percentage form as a number between 1 and 100, and PROC SURVEYLOGISTIC converts that number to a proportion. The procedure treats the value 1 as 100%, not as 1%.
If you do not specify the TOTAL= option or the RATE= option, then the variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option.
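The two forms of the RATE= option might look like the following sketch. The data sets `work.survey` and `work.rates`, the stratum variable `region`, and all rate values are hypothetical; the stratum variable in the secondary data set is assumed to match the one in the analysis data set in name, type, and values.

```sas
/* Form 1: one sampling rate for the whole design (5% of PSUs sampled) */
proc surveylogistic data=work.survey rate=0.05;
   cluster psu;
   model response = age;
   weight sampwt;
run;

/* Form 2: stratum-specific rates supplied in a secondary data set
   that holds the stratification variable and _RATE_ */
data work.rates;
   input region $ _rate_;
   datalines;
East 0.02
West 0.05
;
run;

proc surveylogistic data=work.survey rate=work.rates;
   strata region;
   cluster psu;
   model response = age;
   weight sampwt;
run;
```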
TOTAL= value | SAS-data-set
N= value | SAS-data-set
specifies the total number of primary sampling units (PSUs) in the study population as a positive value, or names an input data set that contains the stratum population totals. The procedure uses this information to compute a finite population correction for variance estimation.
For a nonstratified sample design, or for a stratified sample design with the same population total in all strata, you should specify a positive value for the TOTAL= option. If your sample design is stratified with different population totals in the strata, then you should name a SAS data set that contains the stratification variables and the population totals. See the section 'Specification of Population Totals and Sampling Rates' on page 4280 for more details.
If you do not specify the TOTAL= option or the RATE= option, then the variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option.
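A sketch of the data set form of the TOTAL= option follows. It assumes that the stratum totals are stored in a variable named `_TOTAL_`, by analogy with `_RATE_`; that name is not stated in this section, so confirm it in 'Specification of Population Totals and Sampling Rates' before relying on it. The data sets, the stratum variable `region`, and the totals are hypothetical.

```sas
/* Hypothetical stratum totals: number of PSUs in each stratum of the population.
   The variable name _TOTAL_ is an assumption taken from outside this section. */
data work.totals;
   input region $ _total_;
   datalines;
East 400
West 250
;
run;

proc surveylogistic data=work.survey total=work.totals;
   strata region;
   cluster psu;
   model response = age;
   weight sampwt;
run;
```

Because TOTAL= and RATE= cannot be combined, a design described this way would omit the RATE= option entirely.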
BY variables ;
You can specify a BY statement with PROC SURVEYLOGISTIC to obtain separate analyses on observations in groups defined by the BY variables.
Note that using a BY statement provides completely separate analyses of the BY groups. It does not provide a statistically valid subpopulation or domain analysis, where the total number of units in the subpopulation is not known with certainty.
When a BY statement appears, the procedure expects the input data sets to be sorted in the order of the BY variables. The variables are one or more variables in the input data set.
If you specify more than one BY statement, the procedure uses only the latest BY statement and ignores any previous ones.
If your input data set is not sorted in ascending order, use one of the following alternatives:
Sort the data using the SORT procedure with a similar BY statement.
Use the BY statement options NOTSORTED or DESCENDING in the BY statement. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
Create an index on the BY variables using the DATASETS procedure.
For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
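The first two alternatives can be sketched as follows, assuming a hypothetical data set `work.survey` with a BY variable `year`. The first run sorts the data and then analyzes each year separately; the second relies on NOTSORTED and is appropriate only if the observations for each year are already stored together.

```sas
/* Alternative 1: sort with a matching BY statement, then analyze by group */
proc sort data=work.survey out=work.survey_by;
   by year;
run;

proc surveylogistic data=work.survey_by;
   by year;
   model response = age;
   weight sampwt;
run;

/* Alternative 2: the data are grouped by year but the groups
   are not in ascending order */
proc surveylogistic data=work.survey;
   by year notsorted;
   model response = age;
   weight sampwt;
run;
```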
CLASS variable < ( v-options ) > < variable < ( v-options ) > >
< / v-options > ;
The CLASS statement names the classification variables to be used in the analysis. The CLASS statement must precede the MODEL statement. You can specify various v-options for each variable by enclosing them in parentheses after the variable name. You can also specify global v-options for the CLASS statement by placing them after a slash (/). Global v-options are applied to all the variables specified in the CLASS statement. However, individual CLASS variable v-options override the global v-options.
CPREFIX= n
specifies that, at most, the first n characters of a CLASS variable name be used in creating names for the corresponding dummy variables. The default is 32 − min(32, max(2, f)), where f is the formatted length of the CLASS variable.
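As a worked example of the default, the following DATA step evaluates the formula for a hypothetical CLASS variable whose formatted length is f = 8. It only reproduces the arithmetic; it does not query the procedure.

```sas
/* Evaluate the CPREFIX= default for an assumed formatted length of 8 */
data _null_;
   f = 8;
   n = 32 - min(32, max(2, f));   /* 32 - 8 */
   put n=;                        /* writes n=24 to the log */
run;
```

Under this formula, a longer formatted value leaves fewer characters of the variable name for the prefix, and a formatted length of 32 or more leaves none.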
DESCENDING
DESC
reverses the sorting order of the classification variable.
LPREFIX= n
specifies that, at most, the first n characters of a CLASS variable label be used in creating labels for the corresponding dummy variables.
ORDER=DATA | FORMATTED | FREQ | INTERNAL
specifies the sorting order for the levels of classification variables. This ordering determines which parameters in the model correspond to each level in the data, so the ORDER= option may be useful when you use the CONTRAST statement. When the default ORDER=FORMATTED is in effect for numeric variables for which you have supplied no explicit format, the levels are ordered by their internal values.
The following table shows how PROC SURVEYLOGISTIC interprets values of the ORDER= option.
Value of ORDER= | Levels Sorted By |
---|---|
DATA | order of appearance in the input data set |
FORMATTED | external formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value |
FREQ | descending frequency count; levels with the most observations come first in the order |
INTERNAL | unformatted value |
By default, ORDER=FORMATTED. For FORMATTED and INTERNAL, the sort order is machine dependent. For more information on sorting order, see the chapter on the SORT procedure in the SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
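The ORDER= option can be set per variable or globally. In the sketch below, the data set and the CLASS variables `educ` and `region` are hypothetical; `educ` is ordered by its unformatted values, while `region` picks up the global setting and is ordered by descending frequency.

```sas
proc surveylogistic data=work.survey;
   /* individual v-option for educ; global v-option after the slash */
   class educ (order=internal) region / order=freq;
   model response = educ region;
   weight sampwt;
run;
```

Because the level order determines which parameter corresponds to which level, it is worth checking the class level information in the output before writing CONTRAST coefficients.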
PARAM= keyword
specifies the parameterization method for the classification variable or variables. Design matrix columns are created from CLASS variables according to the following coding schemes. The default is PARAM=EFFECT. If PARAM=ORTHPOLY or PARAM=POLY, and the CLASS levels are numeric, then the ORDER= option in the CLASS statement is ignored, and the internal, unformatted values are used.
Keyword | Coding |
---|---|
EFFECT | specifies effect coding |
GLM | specifies less-than-full-rank, reference cell coding; this option can be used only as a global option |
ORDINAL | specifies the cumulative parameterization for an ordinal CLASS variable |
POLYNOMIAL, POLY | specifies polynomial coding |
REFERENCE, REF | specifies reference cell coding |
ORTHEFFECT | orthogonalizes PARAM=EFFECT |
ORTHORDINAL, ORTHOTHERM | orthogonalizes PARAM=ORDINAL |
ORTHPOLY | orthogonalizes PARAM=POLYNOMIAL |
ORTHREF | orthogonalizes PARAM=REFERENCE |
The EFFECT, POLYNOMIAL, REFERENCE, ORDINAL, and their orthogonal parameterizations are full rank. The REF= option in the CLASS statement determines the reference level for the EFFECT, REFERENCE, and their orthogonal parameterizations.
Parameter names for a CLASS predictor variable are constructed by concatenating the CLASS variable name with the CLASS levels. However, for the POLYNOMIAL and orthogonal parameterizations, parameter names are formed by concatenating the CLASS variable name and keywords that reflect the parameterization.
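A short sketch of how the PARAM= option might be specified, using hypothetical CLASS variables `educ` and `region`. The first run applies reference cell coding to `educ` only and leaves `region` at the default effect coding; the second run requests GLM coding, which has to be given as a global option.

```sas
/* Individual v-option: reference coding for educ, default (EFFECT) for region */
proc surveylogistic data=work.survey;
   class educ (param=ref) region;
   model response = educ region;
   weight sampwt;
run;

/* Global v-option: GLM coding for every CLASS variable */
proc surveylogistic data=work.survey;
   class educ region / param=glm;
   model response = educ region;
   weight sampwt;
run;
```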
REF= 'level' | keyword
specifies the reference level for PARAM=EFFECT or PARAM=REFERENCE. For an individual (but not a global) variable REF= option, you can specify the level of the variable to use as the reference level. For a global or individual variable REF= option, you can use one of the following keywords. The default is REF=LAST.
Keyword | Reference Level |
---|---|
FIRST | designates the first ordered level as reference |
LAST | designates the last ordered level as reference |
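The two forms of REF= can be combined in one CLASS statement, as in this sketch. The variables `educ` and `sex` and the level 'College' are hypothetical; the quoted level is assumed to match a formatted value of `educ` that actually occurs in the data.

```sas
proc surveylogistic data=work.survey;
   /* educ: explicit reference level (individual option only)
      sex:  takes the global keyword, the first ordered level */
   class educ (ref='College') sex / param=ref ref=first;
   model response = educ sex;
   weight sampwt;
run;
```

Here the individual REF='College' overrides the global REF=FIRST for `educ`, consistent with the rule that individual v-options take precedence over global ones.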