The following statements are available in PROC SURVEYREG:
PROC SURVEYREG < options > ;
BY variables ;
CLASS variables ;
CLUSTER variables ;
CONTRAST 'label' effect values < effect values >< / options > ;
ESTIMATE 'label' effect values < effect values >< / options > ;
MODEL dependent = < effects >< / options > ;
STRATA variables < / options > ;
WEIGHT variable ;
The PROC SURVEYREG and MODEL statements are required. If your model contains classification effects, you must list the classification variables in a CLASS statement, and the CLASS statement must precede the MODEL statement. If you use a CONTRAST statement or an ESTIMATE statement, the MODEL statement must precede the CONTRAST or ESTIMATE statement.
The CLASS, CLUSTER, STRATA, CONTRAST, and ESTIMATE statements can appear multiple times. Use only one MODEL statement and one WEIGHT statement; if you specify more than one of either, the procedure uses the first and ignores the rest.
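The ordering rules above can be sketched as follows. This is a minimal illustration, not a recommended analysis: the data set WORK.SURVEY and the variables income, age, region, stratum_id, psu_id, and samp_wt are hypothetical names.

```sas
/* Hypothetical data set WORK.SURVEY; all variable names are illustrative. */
proc surveyreg data=work.survey;
   strata   stratum_id;                    /* first-stage strata            */
   cluster  psu_id;                        /* primary sampling units        */
   class    region;                        /* CLASS must precede MODEL      */
   model    income = age region / solution;
   estimate 'Age slope' age 1;             /* ESTIMATE must follow MODEL    */
   weight   samp_wt;                       /* sampling weights              */
run;
```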
PROC SURVEYREG < options > ;
The PROC SURVEYREG statement invokes the procedure. You can specify the following options in the PROC SURVEYREG statement:
ALPHA = α
sets the confidence level for confidence limits. The value of the ALPHA= option must be between 0 and 1, and the default value is 0.05. A confidence level of α produces 100(1 − α)% confidence limits. The default of ALPHA=0.05 produces 95% confidence limits.
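As a small sketch of the ALPHA= option, the following step requests 90% confidence limits, since 100(1 − 0.10)% = 90%. The data set and variable names are hypothetical.

```sas
/* ALPHA=0.10 gives 90% limits; CLPARM asks for the limits to be displayed. */
proc surveyreg data=work.survey alpha=0.10;
   model income = age / clparm;
run;
```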
DATA = SAS-data-set
specifies the SAS data set to be analyzed by PROC SURVEYREG. If you omit the DATA= option, the procedure uses the most recently created SAS data set.
RATE = value | SAS-data-set
R = value | SAS-data-set
specifies the sampling rate as a non-negative value, or specifies an input data set that contains the stratum sampling rates. The procedure uses this information to compute a finite population correction for variance estimation. If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of PSUs selected to the total number of PSUs in the population.
For a nonstratified sample design, or for a stratified sample design with the same sampling rate in all strata, you should specify a non-negative value for the RATE= option. If your design is stratified with different sampling rates in the strata, then you should name a SAS data set that contains the stratification variables and the sampling rates. See the section 'Specification of Population Totals and Sampling Rates' on page 4382 for more details.
The value in the RATE= option or the values of _RATE_ in the secondary data set must be non-negative numbers. You can specify value as a number between 0 and 1. Or you can specify value in percentage form as a number between 1 and 100, and PROC SURVEYREG will convert that number to a proportion. The procedure treats the value 1 as 100%, and not the percentage form 1%.
If you do not specify the TOTAL= option or the RATE= option, then the variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option.
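The two forms of the RATE= option might look like the sketch below. The data sets WORK.SURVEY and WORK.RATES, the stratification variable stratum_id, and the rate values are invented for illustration; only the variable name _RATE_ comes from the text above.

```sas
/* Hypothetical stratum sampling rates, one row per stratum. */
data work.rates;
   input stratum_id _rate_;
   datalines;
1 0.10
2 0.25
3 0.05
;

/* Different rates by stratum: name the secondary data set. */
proc surveyreg data=work.survey rate=work.rates;
   strata stratum_id;
   model income = age;
run;

/* One rate for the whole design: give a single value.       */
/* RATE=0.10 and RATE=10 both mean 10%; RATE=1 means 100%.   */
proc surveyreg data=work.survey rate=0.10;
   model income = age;
run;
```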
TOTAL = value | SAS-data-set
N = value | SAS-data-set
specifies the total number of primary sampling units in the study population as a positive value, or specifies an input data set that contains the stratum population totals. The procedure uses this information to compute a finite population correction for variance estimation.
For a nonstratified sample design, or for a stratified sample design with the same population total in all strata, you should specify a positive value for the TOTAL= option. If your sample design is stratified with different population totals in the strata, then you should name a SAS data set that contains the stratification variables and the population totals. See the section 'Specification of Population Totals and Sampling Rates' on page 4382 for more details.
If you do not specify the TOTAL= option or the RATE= option, then the variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option.
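A parallel sketch for the TOTAL= option follows. The data set names and totals are made up, and the variable name _TOTAL_ is an assumption based on the procedure's convention for _RATE_; confirm it against the section 'Specification of Population Totals and Sampling Rates' before relying on it.

```sas
/* Hypothetical stratum population totals (number of PSUs per stratum). */
/* The variable name _TOTAL_ is assumed by analogy with _RATE_.         */
data work.totals;
   input stratum_id _total_;
   datalines;
1 200
2 80
3 400
;

proc surveyreg data=work.survey total=work.totals;
   strata  stratum_id;
   cluster psu_id;
   model   income = age;
run;
```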
TRUNCATE
specifies that class levels should be determined using no more than the first 16 characters of the formatted values of the CLASS, STRATA, and CLUSTER variables. When formatted values are longer than 16 characters, you can use this option in order to revert to the levels as determined in releases previous to Version 9.
BY variables ;
You can specify a BY statement with PROC SURVEYREG to obtain separate analyses on observations in groups defined by the BY variables.
Note that using a BY statement provides completely separate analyses of the BY groups. It does not provide a statistically valid subpopulation or domain analysis, where the total number of units in the subpopulation is not known with certainty. For more information on subpopulation analysis for sample survey data, refer to Cochran (1977).
When a BY statement appears, the procedure expects the input data sets to be sorted in order of the BY variables. If you specify more than one BY statement, the procedure uses only the latest BY statement and ignores any previous ones.
If your input data set is not sorted in ascending order, use one of the following alternatives:
Sort the data using the SORT procedure with a similar BY statement.
Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the SURVEYREG procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
Create an index on the BY variables using the DATASETS procedure.
For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
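The first alternative, sorting before the analysis, can be sketched as follows. Data set and variable names are hypothetical. Each BY group is analyzed separately, so this is not a domain analysis.

```sas
/* Sort by the BY variable first, writing to a new data set. */
proc sort data=work.survey out=work.survey_sorted;
   by region;
run;

/* One separate regression per value of region. */
proc surveyreg data=work.survey_sorted;
   by region;
   strata  stratum_id;
   cluster psu_id;
   model   income = age;
   weight  samp_wt;
run;
```

If the data are already grouped by region but the groups are not in ascending order, `by region notsorted;` in the PROC SURVEYREG step is the alternative described above.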
CLASS | CLASSES variables ;
The CLASS statement specifies the classification variables to be used in the model. Typical class variables are TREATMENT, GENDER, RACE, GROUP, and REPLICATION. If you specify the CLASS statement, it must appear before the MODEL statement.
Classification variables can be either character or numeric. Class levels are determined from the formatted values of the CLASS variables. Thus, you can use formats to group values into levels. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide and to the discussions of the FORMAT statement and SAS formats in SAS Language Reference: Concepts. By default, class levels are determined from the entire formatted values of the CLASS variables. Note that this represents a slight change from previous releases in the way in which class levels are determined. In releases prior to Version 9, class levels were determined using no more than the first 16 characters of the formatted values. If you wish to revert to this previous behavior you can use the TRUNCATE option in the PROC SURVEYREG statement.
You can use multiple CLASS statements to specify classification variables.
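Because class levels come from formatted values, a format can turn a numeric variable into a small number of levels. The sketch below assumes a hypothetical numeric variable age and a user-defined format named agegrp; the cut points are arbitrary.

```sas
/* Hypothetical format that groups age into three levels. */
proc format;
   value agegrp
      low -< 30   = 'Under 30'
      30  -< 50   = '30 to 49'
      50  - high  = '50 and over';
run;

proc surveyreg data=work.survey;
   class  age;                      /* levels are the three formatted values */
   model  income = age / solution;
   format age agegrp.;
   weight samp_wt;
run;
```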
CLUSTER | CLUSTERS variables ;
The CLUSTER statement specifies variables that identify clusters in a clustered sample design. The combinations of categories of CLUSTER variables define the clusters in the sample. If there is a STRATA statement, clusters are nested within strata.
If your sample design has clustering at multiple stages, you should identify only the first-stage clusters, or primary sampling units (PSUs), in the CLUSTER statement.
The CLUSTER variables are one or more variables in the DATA= input data set. These variables can be either character or numeric. The formatted values of the CLUSTER variables determine the CLUSTER variable levels. Thus, you can use formats to group values into levels. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide and to the discussions of the FORMAT statement and SAS formats in SAS Language Reference: Dictionary. By default, clusters are determined from the entire formatted values of the CLUSTER variables. Note that this represents a slight change from previous releases in the way in which clusters are determined. In releases prior to Version 9, clusters were determined using no more than the first 16 characters of the formatted values. If you wish to revert to this previous behavior you can use the TRUNCATE option in the PROC SURVEYREG statement.
You can use multiple CLUSTER statements to specify cluster variables. The procedure uses variables from all CLUSTER statements to create clusters.
CONTRAST 'label' effect values < / options > ;
CONTRAST 'label' effect values < effect values >< / options > ;
The CONTRAST statement provides custom hypothesis tests for linear combinations of the regression parameters H0: Lβ = 0, where L is the vector or matrix you specify and β is the vector of regression parameters. Thus, to use this feature, you must be familiar with the details of the model parameterization used by PROC SURVEYREG. For information on the parameterization, see the section 'Parameterization of PROC GLM Models' on page 1787 in Chapter 32, 'The GLM Procedure.'
Each term in the MODEL statement, called an effect, is a variable or a combination of variables. You can specify an effect with a variable name or a special notation using variable names and operators. For more details on how to specify an effect, see the section 'Specification of Effects' on page 1784 in Chapter 32, 'The GLM Procedure.'
For each CONTRAST statement, PROC SURVEYREG computes Wald's F test. The procedure displays this value with the degrees of freedom, and identifies it with the contrast label. The numerator degrees of freedom for Wald's F test equals rank( L ). The denominator degrees of freedom equals the number of clusters (or the number of observations if there is no CLUSTER statement) minus the number of strata. Alternatively, you can use the DF= option in the MODEL statement to specify the denominator degrees of freedom.
You can specify any number of CONTRAST statements, but they must appear after the MODEL statement.
In the CONTRAST statement,
label
identifies the contrast in the output. A label is required for every contrast specified. Labels must be enclosed in single quotes.
effect
identifies an effect that appears in the MODEL statement. You can use the INTERCEPT keyword as an effect when an intercept is fitted in the model. You do not need to include all effects that are in the MODEL statement.
values
are constants that are elements of L associated with the effect.
You can specify the following options in the CONTRAST statement after a slash (/):
E
displays the entire coefficient L vector or matrix.
NOFILL
requests no filling in higher-order effects. When you specify only certain portions of L, by default PROC SURVEYREG constructs the remaining elements from the context. (For more information, see the section 'Specification of ESTIMATE Expressions' on page 1801 in Chapter 32, 'The GLM Procedure.')
When you specify the NOFILL option, PROC SURVEYREG does not construct the remaining portions and treats the vector or matrix L as it is defined in the CONTRAST statement.
SINGULAR = value
specifies the sensitivity for checking estimability. If v is a vector, define ABS(v) to be the largest absolute value of the elements of v. Say H is the (X′X)⁻X′X matrix, and C is ABS(L) except for elements of L that equal 0, and then C is 1. If ABS(L − LH) > C · value, then L is declared nonestimable. The SINGULAR= value must be between 0 and 1, and the default is 10⁻⁴.
As stated previously, the CONTRAST statement enables you to perform hypothesis tests H0: Lβ = 0.
If the L matrix contains more than one contrast, then you can separate the rows of the L matrix with commas. For example, for the model
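   proc surveyreg;
      class A B;
      model Y=A B;
   run;
with A at 5 levels and B at 2 levels, the parameter vector is
\[
(\mu \;\; \alpha_1 \;\; \alpha_2 \;\; \alpha_3 \;\; \alpha_4 \;\; \alpha_5 \;\; \beta_1 \;\; \beta_2)'
\]
To test the hypothesis that the pooled A linear and A quadratic effect is zero, you can use the following L matrix:
\[
L = \begin{bmatrix}
0 & -2 & -1 & 0 & 1 & 2 & 0 & 0 \\
0 & 2 & -1 & -2 & -1 & 2 & 0 & 0
\end{bmatrix}
\]
The corresponding CONTRAST statement is
   contrast 'A Linear & Quadratic' a -2 -1 0 1 2, a 2 -1 -2 -1 2;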
ESTIMATE 'label' effect values < / options > ;
ESTIMATE 'label' effect values < effect values >< / options > ;
You can use an ESTIMATE statement to estimate a linear function of the regression parameters by multiplying a row vector L by the parameter estimate vector β̂.
Each term in the MODEL statement, called an effect, is a variable or a combination of variables. You can specify an effect with a variable name or with a special notation using variable names and operators. For more details on how to specify an effect, see the section 'Specification of Effects' on page 1784 in Chapter 32, 'The GLM Procedure.'
PROC SURVEYREG checks the linear function for estimability. (See the SINGULAR= option described on page 4379.)
The procedure displays the estimate Lβ̂ along with its standard error and t test. If you specify the CLPARM option in the MODEL statement, PROC SURVEYREG also displays confidence limits for the linear function. By default, the degrees of freedom for the t test equals the number of clusters (or the number of observations if there is no CLUSTER statement) minus the number of strata. Alternatively, you can specify the degrees of freedom with the DF= option in the MODEL statement.
You can specify any number of ESTIMATE statements, but they must appear after the MODEL statement.
In the ESTIMATE statement,
label
identifies the linear function L in the output. A label is required for every function specified. Labels must be enclosed in single quotes.
effect
identifies an effect that appears in the MODEL statement. You can use the INTERCEPT keyword as an effect when an intercept is fitted in the model. You do not need to include all effects that are in the MODEL statement.
values
are constants that are elements of the vector L associated with the effect. For example, the following code forms an estimate that is the difference between the parameters estimated for the first and second levels of the CLASS variable A.
   estimate 'A1 vs A2' A 1 -1;
You can specify the following options in the ESTIMATE statement after a slash (/):
DIVISOR = value
specifies a value by which to divide all coefficients so that fractional coefficients can be entered as integers. For example, you can use
   estimate '1/3(A1+A2) - 2/3A3' a 1 1 -2 / divisor=3;
instead of
   estimate '1/3(A1+A2) - 2/3A3' a 0.33333 0.33333 -0.66667;
E
displays the entire coefficient vector L.
NOFILL
requests no filling in higher-order effects. When you specify only certain portions of the vector L, by default PROC SURVEYREG constructs the remaining elements from the context. (See the section 'Specification of ESTIMATE Expressions' on page 1801 in Chapter 32, 'The GLM Procedure.') When you specify the NOFILL option, PROC SURVEYREG does not construct the remaining portions and treats the vector L as it is defined in the ESTIMATE statement.
SINGULAR = value
specifies the sensitivity for checking estimability. If v is a vector, define ABS(v) to be the largest absolute value of the elements of v. Say H is the (X′X)⁻X′X matrix, and C is ABS(L) except for elements of L that equal 0, and then C is 1. If ABS(L − LH) > C · value, then L is declared nonestimable. The SINGULAR= value must be between 0 and 1, and the default is 10⁻⁴.
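Putting the ESTIMATE options together, a step might look like the following sketch. It assumes a hypothetical CLASS variable region with exactly three levels; the labels and data set are illustrative.

```sas
proc surveyreg data=work.survey;
   class  region;                               /* assumed to have 3 levels */
   model  income = age region / solution clparm;
   /* Difference between the first two region parameters; E prints L. */
   estimate 'Region 1 vs 2' region 1 -1 0 / e;
   /* (region1 + region2)/2 - region3, entered as integers with DIVISOR=. */
   estimate 'Mean of regions 1,2 vs 3' region 1 1 -2 / divisor=2;
   weight samp_wt;
run;
```

With CLPARM in the MODEL statement, confidence limits are also displayed for both linear functions.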
MODEL dependent = < effects >< / options > ;
The MODEL statement specifies the dependent (response) variable and the independent (regressor) variables or effects. Each term in a MODEL statement, called an effect, is a variable or a combination of variables. You can specify an effect with a variable name or with special notation using variable names and operators. For more information on how to specify an effect, see the section 'Specification of Effects' on page 1784 in Chapter 32, 'The GLM Procedure.' The dependent variable must be numeric. Only one MODEL statement is allowed for each PROC SURVEYREG statement. If you specify more than one MODEL statement, the procedure uses the first model and ignores the rest.
You can specify the following options in the MODEL statement after a slash (/):
ADJRSQ
requests that the procedure compute the adjusted multiple R-square.
ANOVA
requests that the ANOVA table be displayed in the output. By default, the ANOVA table is not displayed.
CLPARM
requests confidence limits for the parameter estimates. The SURVEYREG procedure determines the confidence coefficient using the ALPHA= option, which by default equals 0.05 and produces 95% confidence bounds. The CLPARM option also requests confidence limits for all the estimable linear functions of regression parameters in the ESTIMATE statements.
Note that when there is a CLASS statement, you need to use the SOLUTION option with the CLPARM option to obtain the parameter estimates and their confidence limits.
COVB
displays the estimated covariance matrix of the estimated regression coefficients.
DEFF
displays design effects for the regression coefficient estimates.
DF = value
specifies the denominator degrees of freedom for the F tests and the degrees of freedom for the t tests. The default is the number of clusters (or the number of observations if there is no CLUSTER statement) minus the number of actual strata. The number of actual strata equals the number of strata in the data before collapsing minus the number of strata collapsed plus 1. See the section 'Stratum Collapse' on page 4388 for details on 'collapsing of strata.'
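As a worked illustration of the default, using made-up counts: suppose the data contain 10 strata and 40 clusters, and 3 of those strata are collapsed. Under the rule stated above, the arithmetic would be:

```latex
\[
\text{actual strata} = 10 - 3 + 1 = 8,
\qquad
\text{df} = 40 - 8 = 32 .
\]
```

Specifying DF=value in the MODEL statement replaces this default.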
I
INVERSE
displays the inverse or the generalized inverse of the X′X matrix. When there is a WEIGHT variable, the procedure displays the inverse or the generalized inverse of the X′WX matrix, where W is the diagonal matrix constructed from WEIGHT variable values.
NOINT
omits the intercept from the model.
SOLUTION
displays a solution to the normal equations, which are the parameter estimates. The SOLUTION option is useful only when you use a CLASS statement. If you do not specify a CLASS statement, PROC SURVEYREG displays parameter estimates by default. But if you specify a CLASS statement, PROC SURVEYREG does not display parameter estimates unless you also specify the SOLUTION option.
VADJUST=DF | NONE
VARADJ=DF | NONE
VARADJUST=DF | NONE
specifies whether to use the degrees of freedom adjustment (n − 1)/(n − p) in the computation of the matrix G for the variance estimation on page 4385. If you do not specify the VADJUST= option, PROC SURVEYREG uses the degrees of freedom adjustment by default, which is equivalent to specifying the VARADJ=DF option. If you do not want to use this variance adjustment, specify the VADJUST=NONE option.
X
XPX
displays the X′X matrix, or the X′WX matrix when there is a WEIGHT variable, where W is the diagonal matrix constructed from WEIGHT variable values. The X option also displays the crossproducts vector X′y, or X′Wy.
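Several of the MODEL statement options above can be combined after the slash. The following sketch uses hypothetical data set and variable names and simply shows the options side by side; whether all of them are wanted depends on the analysis.

```sas
proc surveyreg data=work.survey;
   strata  stratum_id;
   cluster psu_id;
   class   region;
   /* SOLUTION is needed to see parameter estimates because CLASS is used. */
   model   income = age region
           / solution clparm deff covb adjrsq anova vadjust=none;
   weight  samp_wt;
run;
```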
STRATA | STRATUM variables < / options > ;
The STRATA statement specifies variables that form the strata in a stratified sample design. The combinations of categories of STRATA variables define the strata in the sample.
If your sample design has stratification at multiple stages, you should identify only the first-stage strata in the STRATA statement. See the section 'Specification of Population Totals and Sampling Rates' on page 4382 for more information.
The STRATA variables are one or more variables in the DATA= input data set. These variables can be either character or numeric. By default, strata are determined from the entire formatted values of the STRATA variables. Note that this represents a slight change from previous releases in the way in which strata are determined. In releases prior to Version 9, strata were determined using no more than the first 16 characters of the formatted values. If you wish to revert to this previous behavior you can use the TRUNCATE option in the PROC SURVEYREG statement.
Because strata are determined from formatted values, you can use formats to group values into levels. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide.
You can use multiple STRATA statements to specify stratum variables.
You can specify the following options in the STRATA statement after a slash (/):
LIST
displays a 'Stratum Information' table, which includes values of the STRATA variables, and the number of observations, number of clusters, population total, and sampling rate for each stratum. This table also displays stratum collapse information.
NOCOLLAPSE
prevents the procedure from collapsing, or combining, strata that have only one sampling unit. By default, the procedure collapses strata that contain only one sampling unit. See the section 'Stratum Collapse' on page 4388 for details.
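A minimal sketch of the STRATA statement options, with hypothetical data set and variable names:

```sas
proc surveyreg data=work.survey;
   /* LIST displays the 'Stratum Information' table.                  */
   /* Add NOCOLLAPSE after LIST to stop single-unit strata from being */
   /* combined.                                                       */
   strata  stratum_id / list;
   cluster psu_id;
   model   income = age;
   weight  samp_wt;
run;
```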
WEIGHT | WGT variable ;
The WEIGHT statement specifies the variable that contains the sampling weights. This variable must be numeric. If you do not specify a WEIGHT statement, PROC SURVEYREG assigns all observations a weight of 1. Sampling weights must be positive numbers. If an observation has a weight that is nonpositive or missing, then the procedure omits that observation from the analysis. If you specify more than one WEIGHT statement, the procedure uses only the first WEIGHT statement and ignores the rest.
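Because observations with nonpositive or missing weights are omitted from the analysis, it can help to count them beforehand. The DATA step below is one possible check, not part of PROC SURVEYREG; the data set WORK.SURVEY and the weight variable samp_wt are hypothetical.

```sas
/* Count observations that the WEIGHT statement would cause to be omitted. */
data _null_;
   set work.survey end=last;
   /* The sum statement initializes n_dropped to 0 and retains it. */
   if missing(samp_wt) or samp_wt <= 0 then n_dropped + 1;
   if last then put 'Observations with unusable weights: ' n_dropped;
run;
```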