Example | SAS/STAT 9.1 User's Guide, Volumes 1-7

This section demonstrates how you can use the survey procedures to select a probability-based sample and then analyze the survey data to make inferences about the population. The analyses include descriptive statistics and regression analysis. The example is a survey of income and expenditures for a group of households in North Carolina and South Carolina. The goals of the survey are to

estimate total income and total basic living expenses
investigate the linear relationship between income and living expenses

Sample Selection

To select a sample with PROC SURVEYSELECT, you input a SAS data set that contains the sampling frame or list of units from which the sample is to be selected. You also specify the selection method, the desired sample size or sampling rate, and other selection parameters.

In this example, the sample design is a stratified simple random sampling design, with households as the sampling units. The sampling frame (the list of households in the group) is stratified by State and Region. Within each stratum, households are selected by simple random sampling. Using this design, the following PROC SURVEYSELECT statements select a probability sample of households from the HHSample data set:

  proc surveyselect data=HHSample out=Sample
                    method=srs n=(3, 5, 3, 6, 2);
     strata State Region;
  run;

The STRATA statement names the stratification variables State and Region. In the PROC SURVEYSELECT statement, the DATA= option names the SAS data set HHSample as the input data set (the sampling frame) from which to select the sample. The OUT= option stores the sample in the SAS data set named Sample. The METHOD=SRS option specifies simple random sampling as the sample selection method. The N= option specifies the stratum sample sizes, one value for each of the five strata, for a total sample of 3 + 5 + 3 + 6 + 2 = 19 households.
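PROC SURVEYSELECT expects the input data set to be sorted by the STRATA variables, and the values in N= are matched to the strata in that sorted order. The following is a minimal sketch that sorts the frame first and adds the SEED= option so that the same sample is selected each time the program runs; the seed value 12345 is an arbitrary, illustrative choice and not part of the original example.

```sas
/* Sort the sampling frame by the stratification variables, */
/* as the STRATA statement expects.                          */
proc sort data=HHSample;
   by State Region;
run;

/* SEED= fixes the random number stream so the selection is  */
/* reproducible; 12345 is an arbitrary illustrative value.   */
proc surveyselect data=HHSample out=Sample
                  method=srs n=(3, 5, 3, 6, 2) seed=12345;
   strata State Region;
run;
```

Without SEED=, the procedure derives a seed from the system clock, so repeated runs generally select different households.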

The SURVEYSELECT procedure then selects a stratified random sample of households and produces the output data set Sample, which contains the selected households together with their selection probabilities and sampling weights. The data set Sample also contains the sampling unit identification variable Id and the stratification variables State and Region from the data set HHSample.
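For a stratified simple random sample, the selection probabilities and sampling weights follow directly from the design. As a sketch, let N_h be the number of households in stratum h of the frame and n_h the sample size given for that stratum in the N= option. Every selected household i in stratum h then has

$$\pi_{hi} = \frac{n_h}{N_h}, \qquad w_{hi} = \frac{1}{\pi_{hi}} = \frac{N_h}{n_h}$$

so each sampled household represents N_h / n_h households of its stratum. The symbols N_h, n_h, π_hi, and w_hi are introduced here only for illustration; they are not variable names in the data set.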

Survey Data Analysis

You can use the SURVEYMEANS and SURVEYREG procedures to estimate population values and to perform regression analyses for survey data. The following statements briefly illustrate each procedure. See Chapter 70, "The SURVEYMEANS Procedure," and Chapter 71, "The SURVEYREG Procedure," for more detailed information.

To estimate the total income and total expenditure in the population from the sample, you specify the input data set containing the sample, the statistics to be computed, the variables to be analyzed, any stratification variables, and the sampling weight variable. The statements to compute the descriptive statistics are as follows:

  proc surveymeans data=Sample sum clm;
     var Income Expense;
     strata State Region;
     weight SamplingWeight;
  run;

The PROC SURVEYMEANS statement invokes the procedure, specifies the input data set, and requests estimates of population totals and their standard deviations for the analysis variables (SUM), as well as confidence limits for the estimates (CLM).

The VAR statement specifies the two analysis variables, Income and Expense. The STRATA statement identifies State and Region as the stratification variables in the sample design. The WEIGHT statement specifies the sampling weight variable SamplingWeight, which PROC SURVEYSELECT stores in the output data set Sample.
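The totals that PROC SURVEYMEANS reports are weighted sums over the sample. A sketch of the standard estimators for this design, assuming no missing values and no finite population correction (the example specifies neither the TOTAL= nor the RATE= option), with y_hi the value of Income or Expense for household i in stratum h, w_hi its sampling weight, and H the number of strata:

$$\hat{Y} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} w_{hi} \, y_{hi}, \qquad \hat{V}(\hat{Y}) = \sum_{h=1}^{H} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( w_{hi} y_{hi} - \bar{z}_h \right)^2, \qquad \bar{z}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} w_{hi} y_{hi}$$

The standard deviation of the total is the square root of this variance, and t-based confidence limits use n - H degrees of freedom, here 19 - 5 = 14. Because the factor n_h / (n_h - 1) requires at least two sampled households per stratum, the smallest stratum sample size in this design (2) is the minimum that allows variance estimation. Chapter 70 gives the exact formulas the procedure uses.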

You can also use the SURVEYREG procedure to perform regression analysis for sample survey data. Suppose that, to explore the relationship between the total income and the total basic living expenses of a household in the survey population, you choose the following linear model to describe the relationship:

$$\text{Expense} = \alpha + \beta \times \text{Income} + \epsilon$$

where α is the intercept, β is the slope, and ε is a random error term.

The following statements fit this linear model:

  proc surveyreg data=Sample;
     strata State Region;
     model Expense = Income;
     weight SamplingWeight;
  run;

In the PROC SURVEYREG statement, the DATA= option specifies Sample as the input sample survey data. The STRATA statement identifies the stratification variables as State and Region. The MODEL statement specifies the model, with Expense as the dependent variable and Income as the independent variable. The WEIGHT statement specifies the sampling weight variable SamplingWeight.
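The regression coefficients that PROC SURVEYREG reports are weighted least squares estimates. As a sketch, let X be the matrix with one row (1, Income) per sampled household, y the vector of the corresponding Expense values, and W the diagonal matrix of sampling weights; then

$$\begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix} = \left( X^{\top} W X \right)^{-1} X^{\top} W y$$

The standard errors do not come from the ordinary weighted least squares formula: PROC SURVEYREG estimates them with a design-based (Taylor series) variance estimator that accounts for the stratification. See Chapter 71 for the exact formulas.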