This chapter introduces the SAS/STAT procedures for survey sampling and describes how you can use these procedures to analyze survey data.
Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Because the items in a population vary, researchers apply scientific probability-based designs to select the sample. Doing so reduces the risk of a distorted view of the population and allows statistically valid inferences to be made from the sample. See Lohr (1999), Kalton (1983), Cochran (1977), and Kish (1965) for more information about statistical sampling and the analysis of complex survey data.
To select probability-based random samples from a study population, you can use the SURVEYSELECT procedure, which provides a variety of methods for probability sampling. To analyze sample survey data, you can use the SURVEYMEANS, SURVEYFREQ, SURVEYREG, and SURVEYLOGISTIC procedures, which incorporate the sample design into the analyses.
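To make the idea of probability-based selection concrete, the following Python sketch draws a simple random sample and a systematic sample from a small sampling frame. It is an illustration only, not the SURVEYSELECT implementation: the frame of 100 unit IDs, the function names, and the seed are all hypothetical, and the systematic version assumes that the frame size is an exact multiple of the sample size.

```python
import random

# Hypothetical sampling frame: unit IDs 1..100 (not from the source text)
frame = list(range(1, 101))

def simple_random_sample(frame, n, rng):
    """Select n distinct units, each subset of size n being equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Select every k-th unit after a random start (assumes len(frame) % n == 0)."""
    if len(frame) % n != 0:
        raise ValueError("this sketch requires the frame size to be a multiple of n")
    k = len(frame) // n          # selection interval
    start = rng.randrange(k)     # random start in 0..k-1
    return frame[start::k]

rng = random.Random(2024)        # fixed seed so the selection is reproducible
srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)

# Under either design each unit has inclusion probability n/N,
# so the sampling weight is N/n.
weight = len(frame) / 10
```

In both designs every unit has a known, nonzero chance of selection, which is what later allows the sample to be weighted back to the population.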
Many SAS/STAT procedures, such as the MEANS, FREQ, GLM, and LOGISTIC procedures, can compute sample means, produce crosstabulation tables, and estimate regression relationships. However, in most of these procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures generally do not calculate the estimates and their variances according to the design actually used. Using analyses that are not appropriate for your sample design can lead to incorrect statistical inferences.
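The following Python sketch illustrates how much the two approaches can differ. It compares a naive analysis, which treats all observations as one simple random sample from an infinite population, with the textbook estimator for a stratified simple random sample, which weights each stratum by its population share and applies a finite population correction. The strata, population sizes, and values are hypothetical, and the code is a sketch of the standard formulas rather than a reproduction of any SAS procedure.

```python
import math
from statistics import mean, variance

# Hypothetical stratified sample: stratum -> (population size N_h, sampled values)
strata = {
    "urban": (9000, [12.0, 15.0, 11.0, 14.0]),
    "rural": (1000, [30.0, 34.0, 29.0, 31.0]),
}

def naive_mean_se(strata):
    """Pool all values and ignore the design (infinite-population SRS assumption)."""
    values = [y for _, ys in strata.values() for y in ys]
    return mean(values), math.sqrt(variance(values) / len(values))

def stratified_mean_se(strata):
    """Design-based mean and standard error for stratified simple random sampling."""
    N = sum(N_h for N_h, _ in strata.values())
    est = 0.0
    var = 0.0
    for N_h, ys in strata.values():
        n_h = len(ys)
        W_h = N_h / N                      # stratum share of the population
        fpc = 1 - n_h / N_h                # finite population correction
        est += W_h * mean(ys)
        var += W_h ** 2 * fpc * variance(ys) / n_h
    return est, math.sqrt(var)

naive_est, naive_se = naive_mean_se(strata)
design_est, design_se = stratified_mean_se(strata)
```

Because the rural stratum is heavily oversampled relative to its population share, the naive mean (22.0) is far from the design-based mean (14.8), and the naive standard error is also much larger than the design-based one.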
The SURVEYMEANS, SURVEYFREQ, SURVEYREG, and SURVEYLOGISTIC procedures analyze complex survey data properly, taking the sample design into account. You can use these procedures for multistage designs or for single-stage designs, with or without stratification, and with or without unequal weighting. The procedures use the Taylor expansion method to estimate sampling errors of estimators based on complex sample designs. This method is appropriate for all designs in which the first-stage sample is selected with replacement or in which the first-stage sampling fraction is small, as it often is in practice.
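As a rough illustration of the Taylor expansion (linearization) idea, the following Python sketch estimates a weighted mean and its standard error for a stratified, clustered sample. The weighted mean is a ratio of two estimated totals, so each observation is replaced by its linearized value, the values are summed within each first-stage cluster, and the variance is computed from the variation among cluster totals within strata, as if clusters were drawn with replacement. The data layout, names, and the choice to reject strata with a single cluster are assumptions of this sketch; it is not presented as the procedures' actual implementation.

```python
import math
from collections import defaultdict

# Hypothetical records: (stratum, cluster, weight, y)
records = [
    ("A", 1, 10.0, 4.0),
    ("A", 1, 10.0, 6.0),
    ("A", 2, 10.0, 5.0),
    ("A", 2, 10.0, 9.0),
    ("B", 1, 20.0, 1.0),
    ("B", 1, 20.0, 3.0),
    ("B", 2, 20.0, 2.0),
    ("B", 2, 20.0, 2.0),
]

def weighted_mean_taylor_se(records):
    """Weighted mean with a linearized, with-replacement variance estimate."""
    w_total = sum(w for _, _, w, _ in records)
    y_total = sum(w * y for _, _, w, y in records)
    mean_hat = y_total / w_total

    # Sum the weighted linearized values w * (y - mean_hat) / w_total by cluster.
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for stratum, cluster, w, y in records:
        cluster_totals[stratum][cluster] += w * (y - mean_hat) / w_total

    var = 0.0
    for clusters in cluster_totals.values():
        n_h = len(clusters)
        if n_h < 2:
            # Sketch-level choice: at least two clusters per stratum are needed.
            raise ValueError("each stratum needs at least two clusters")
        z_bar = sum(clusters.values()) / n_h
        var += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in clusters.values())
    return mean_hat, math.sqrt(var)

mean_hat, se_hat = weighted_mean_taylor_se(records)
```

No finite population correction appears in this estimator, which is why it suits designs where the first-stage sample is selected with replacement or the first-stage sampling fraction is small.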
The following table briefly describes the sampling and analysis procedures in SAS/STAT software.
SURVEYSELECT | |
Sampling Methods | simple random sampling; unrestricted random sampling (with replacement); systematic; sequential; selection probability proportional to size (PPS) with and without replacement; PPS systematic; PPS for two units per stratum; sequential PPS with minimum replacement |
SURVEYMEANS | |
Statistics | estimates of population means and totals; estimates of population proportions; standard errors; confidence limits; hypothesis tests; domain analyses; ratio estimates |
SURVEYFREQ | |
Analyses | one-way frequency tables; two-way and multiway crosstabulation tables; estimates of population totals and proportions; standard errors; confidence limits; tests of goodness-of-fit; tests of independence |
SURVEYREG | |
Analyses | linear regression model fitting; regression coefficients; covariance matrices; hypothesis tests; confidence limits; estimable functions; contrasts |
SURVEYLOGISTIC | |
Analyses | cumulative logit regression model fitting; logit, complementary log-log, and probit link functions; generalized logit regression model fitting; regression coefficients; covariance matrices; hypothesis tests; model diagnostics; odds ratios; confidence limits; estimable functions; contrasts |