Details


Missing Values

If an observation has a missing or nonpositive value for the SIZE variable, PROC SURVEYSELECT excludes that observation from the sample selection. The procedure writes a note to the log giving the number of observations omitted due to missing or nonpositive size measures.

PROC SURVEYSELECT treats missing STRATA variable values like any other STRATA variable value. The missing values form a separate stratum.

If a value of _NSIZE_ is missing in the SAMPSIZE= input data set, then PROC SURVEYSELECT writes an error message to the log and does not select a sample from that stratum. The procedure treats missing values of _NRATE_, _MINSIZE_, _MAXSIZE_, and _CERTSIZE_ similarly.

Sorting by CONTROL Variables

If you specify a CONTROL statement, PROC SURVEYSELECT sorts the input data set by the CONTROL variables before selecting the sample. If you also specify a STRATA statement, the procedure sorts by CONTROL variables within strata. Sorting by CONTROL variables is available for systematic and sequential selection methods, which include METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ. Sorting provides additional control over the distribution of the sample, giving some benefits of proportionate stratification.

By default, the sorted data set replaces the input data set. Alternatively, you can use the OUTSORT= option to name an output data set that contains the sorted input data set.

PROC SURVEYSELECT provides two types of sorting: nested sorting and hierarchic serpentine sorting. If you specify the SORT=NEST option, then the procedure sorts by the CONTROL variables according to nested sorting. If you do not specify the SORT=NEST option, the procedure uses serpentine sorting by default. These two types of sorting are equivalent when there is only one CONTROL variable.

If you request nested sorting, PROC SURVEYSELECT sorts observations in the same order as PROC SORT does for an ascending sort by the CONTROL variables. Refer to the chapter on the SORT procedure in the SAS Procedures Guide. PROC SURVEYSELECT sorts within strata if you also specify a STRATA statement. The procedure first arranges the input observations in ascending order of the first CONTROL variable. Then within each level of the first CONTROL variable, the procedure arranges the observations in ascending order of the second CONTROL variable. This continues for all CONTROL variables specified.

In hierarchic serpentine sorting, PROC SURVEYSELECT sorts by the first CONTROL variable in ascending order. Then within the first level of the first CONTROL variable, the procedure sorts by the second CONTROL variable in ascending order. Within the second level of the first CONTROL variable, the procedure sorts by the second CONTROL variable in descending order. Sorting by the second CONTROL variable continues to alternate between ascending and descending sorting throughout all levels of the first CONTROL variable. If there is a third CONTROL variable, the procedure sorts by that variable within levels formed from the first two CONTROL variables, again alternating between ascending and descending sorting. This continues for all CONTROL variables specified. This sorting algorithm minimizes the change from one observation to the next with respect to the CONTROL variable values, thus making nearby observations more similar. For more information on serpentine sorting, refer to Chromy (1979) and Williams and Chromy (1980).
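The alternating order described above can be sketched in a few lines of Python. This is only an illustration of the ordering rule, not the procedure's implementation; the function name `serpentine_sort` and the representation of observations as dictionaries are assumptions made for the example.

```python
from itertools import groupby

def serpentine_sort(rows, keys):
    """Order rows (dicts) by hierarchic serpentine sorting on the given keys.

    Hypothetical helper for illustration only.
    """
    # One flag per nesting depth: True means the next group met at that
    # depth is sorted in ascending order, False means descending.
    ascending = [True] * len(keys)

    def sort_level(group, depth):
        if depth == len(keys):
            return list(group)
        key = keys[depth]
        ordered = sorted(group, key=lambda r: r[key],
                         reverse=not ascending[depth])
        # Flip the direction for the next group encountered at this depth.
        ascending[depth] = not ascending[depth]
        result = []
        for _, sub in groupby(ordered, key=lambda r: r[key]):
            result.extend(sort_level(list(sub), depth + 1))
        return result

    return sort_level(rows, 0)

rows = [{"A": a, "B": b} for a in (2, 1) for b in (3, 1, 2)]
print([(r["A"], r["B"]) for r in serpentine_sort(rows, ["A", "B"])])
# prints [(1, 1), (1, 2), (1, 3), (2, 3), (2, 2), (2, 1)]
```

Note that the last observation of the first level and the first observation of the second level differ only in the first variable, which is the property that makes neighboring observations similar.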

Sample Selection Methods

PROC SURVEYSELECT provides a variety of methods for selecting probability-based random samples. With probability sampling, each unit in the survey population has a known, positive probability of selection. This property of probability sampling avoids selection bias and enables you to use statistical theory to make valid inferences from the sample to the survey population. Refer to Lohr (1999), Kish (1965, 1987), Kalton (1983), and Cochran (1977) for more information on probability sampling.

In equal probability sampling, each unit in the sampling frame, or in a stratum, has the same probability of being selected for the sample. PROC SURVEYSELECT provides the following methods that select units with equal probability: simple random sampling, unrestricted random sampling, systematic random sampling, and sequential random sampling. In simple random sampling, units are selected without replacement, which means that a unit cannot be selected more than once. Both systematic and sequential equal probability sampling are also without replacement. In unrestricted random sampling, units are selected with replacement, which means that a unit can be selected more than once. In with-replacement sampling, the number of hits refers to the number of times a unit is selected.

In probability proportional to size (PPS) sampling, a unit's selection probability is proportional to its size measure. PROC SURVEYSELECT provides the following methods that select units with probability proportional to size (PPS): PPS sampling without replacement, PPS sampling with replacement, PPS systematic sampling, PPS sequential sampling, Brewer's method, Murthy's method, and Sampford's method. PPS sampling is often used in cluster sampling, where you select clusters (or groups of sampling units) of varying size in the first stage of selection. For example, clusters may be schools, hospitals, or geographical areas, and the final sampling units may be students, patients, or citizens. Cluster sampling can provide efficiencies in frame construction and other survey operations. Refer to Lohr (1999), Kalton (1983), Kish (1965), and the other references cited in the following sections for more information.

All the probability sampling methods provided by PROC SURVEYSELECT use random numbers in their selection algorithms, as described in the following sections and in the references cited. PROC SURVEYSELECT uses a uniform random number function to generate streams of pseudo-random numbers from an initial starting point, or seed. You can use the SEED= option to specify the initial seed. If you do not specify the SEED= option, PROC SURVEYSELECT uses the time of day from the computer's clock to obtain the initial seed. PROC SURVEYSELECT generates uniform random numbers according to the method of Fishman and Moore (1982), using a prime modulus multiplicative generator with modulus $2^{31} - 1$ and multiplier 397204094. PROC SURVEYSELECT uses the same uniform random number generator as the RANUNI function. For more information on the RANUNI function, see the SAS Language Reference: Dictionary.
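As a rough illustration of a prime modulus multiplicative generator, the following Python sketch iterates the recurrence "next state = multiplier × state mod modulus" with the prime modulus 2^31 − 1 and the multiplier quoted above. The function name `uniform_stream` is hypothetical, and the sketch makes no claim to reproduce the exact stream that RANUNI or PROC SURVEYSELECT returns for a given seed.

```python
MODULUS = 2**31 - 1       # prime modulus (assumed to be 2^31 - 1)
MULTIPLIER = 397204094    # multiplier quoted in the text

def uniform_stream(seed, count):
    """Return `count` pseudo-random numbers in (0, 1).

    Illustrative only: state_{k+1} = MULTIPLIER * state_k mod MODULUS.
    """
    if not 0 < seed < MODULUS:
        raise ValueError("seed must be between 1 and MODULUS - 1")
    state = seed
    values = []
    for _ in range(count):
        state = (MULTIPLIER * state) % MODULUS
        values.append(state / MODULUS)
    return values

# The same seed always reproduces the same stream.
print(uniform_stream(12345, 3) == uniform_stream(12345, 3))  # prints True
```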

The following sections give detailed descriptions of the sample selection methods available in PROC SURVEYSELECT. In these sections, $n_h$ denotes the sample size (the number of units in the sample) for stratum $h$, and $N_h$ denotes the population size (number of units in the population) for stratum $h$, for $h = 1, 2, \ldots, H$. When the sample design is not stratified, $n$ denotes the sample size, and $N$ denotes the population size. For PPS sampling, $M_{hi}$ represents the size measure for unit $i$ in stratum $h$, $M_{h\cdot}$ is the total of all size measures for the population of stratum $h$, and $Z_{hi} = M_{hi}/M_{h\cdot}$ is the relative size of unit $i$ in stratum $h$.

Simple Random Sampling

The method of simple random sampling (METHOD=SRS) selects units with equal probability and without replacement. Each possible sample of $n$ different units out of $N$ has the same probability of being selected. The selection probability for each individual unit equals $n/N$. When you request stratified sampling with a STRATA statement, PROC SURVEYSELECT selects samples independently within strata. The selection probability for a unit in stratum $h$ equals $n_h/N_h$ for stratified simple random sampling.

By default, PROC SURVEYSELECT uses Floyd's ordered hash table algorithm for simple random sampling. This algorithm is fast, efficient, and appropriate for large data sets. Refer to Bentley and Floyd (1987) and Bentley and Knuth (1986).
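For orientation, the textbook form of Floyd's sampling algorithm can be sketched as follows. This is a generic version that uses a Python set in place of the ordered hash table; it is not the procedure's code, and the function name `floyd_sample` is an assumption for the example.

```python
import random

def floyd_sample(N, n, rng):
    """Select n distinct integers from 1..N, each subset equally likely.

    Generic sketch of Floyd's algorithm; a Python set stands in for the
    hash table.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    selected = set()
    for j in range(N - n + 1, N + 1):
        t = rng.randint(1, j)          # uniform on 1..j
        # If t is already in the sample, take j instead; j cannot be in
        # the sample yet because all earlier picks are at most j - 1.
        selected.add(j if t in selected else t)
    return selected

sample = floyd_sample(N=1000, n=10, rng=random.Random(2004))
print(sorted(sample))
```

The loop runs exactly $n$ times and needs memory only for the selected units, which is why this approach suits large frames.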

If there is not enough memory available for Floyd's algorithm, PROC SURVEYSELECT switches to the sequential algorithm of Fann, Muller, and Rezucha (1962), which requires less memory but may require more time to select the sample. When PROC SURVEYSELECT uses the alternative sequential algorithm, it writes a note to the log. To request the sequential algorithm, even if enough memory is available for Floyd's algorithm, you can specify METHOD=SRS2 in the PROC SURVEYSELECT statement.

Unrestricted Random Sampling

The method of unrestricted random sampling (METHOD=URS) selects units with equal probability and with replacement. Because units are selected with replacement, a unit can be selected for the sample more than once. The expected number of selections or hits for each unit equals $n/N$ when sampling without stratification. For stratified sampling, the expected number of hits for a unit in stratum $h$ equals $n_h/N_h$. Note that the expected number of hits exceeds one when the sample size $n$ is greater than the population size $N$.

For unrestricted random sampling, by default, the output data set contains one observation for each distinct unit selected for the sample, together with a variable NumberHits that gives the number of times the observation was selected. But if you specify the OUTHITS option, then the output data set contains a separate observation for each selection, so that a unit selected three times, for example, is represented by three observations in the output data set. For information on the contents of the output data set, see the section 'Output Data Set'.
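The difference between the two output layouts can be illustrated with a small Python sketch. The helper names and the dictionary layout are assumptions made for the example; only the variable name NumberHits comes from the text.

```python
import random
from collections import Counter

def unrestricted_sample(units, n, rng):
    """Draw n units with equal probability and with replacement."""
    return Counter(rng.choice(units) for _ in range(n))

def default_layout(hits):
    # One row per distinct selected unit, with its number of hits.
    return [{"Unit": u, "NumberHits": k} for u, k in sorted(hits.items())]

def outhits_layout(hits):
    # One row per selection, as with the OUTHITS option.
    return [u for u, k in sorted(hits.items()) for _ in range(k)]

hits = unrestricted_sample(["A", "B", "C", "D", "E"], n=8, rng=random.Random(1))
print(default_layout(hits))
print(outhits_layout(hits))
```

Because $n = 8$ exceeds $N = 5$ here, at least one unit must appear more than once, and the expected number of hits per unit is $8/5$.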

Systematic Random Sampling

The method of systematic random sampling (METHOD=SYS) selects units at a fixed interval throughout the sampling frame or stratum after a random start. If you specify the sample size (or the stratum sample sizes) with the SAMPSIZE= option, PROC SURVEYSELECT uses a fractional interval to provide exactly the specified sample size. The interval equals $N/n$, or $N_h/n_h$ for stratified sampling. The selection probability for each unit equals $n/N$, or $n_h/N_h$ for stratified sampling. If you specify the sampling rate (or the stratum sampling rates) with the SAMPRATE= option, PROC SURVEYSELECT uses the inverse of the rate as the interval for systematic selection. The selection probability for each unit equals the specified rate.

Systematic random sampling controls the distribution of the sample by spreading it throughout the sampling frame or stratum at equal intervals, thus providing implicit stratification. You can use the CONTROL statement to order the input data set by the CONTROL variables before sample selection. If you also use a STRATA statement, PROC SURVEYSELECT sorts by the CONTROL variables within strata. If you do not specify a CONTROL statement, PROC SURVEYSELECT applies systematic selection to the observations in the order in which they appear in the input data set.
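One common way to realize systematic selection with a fractional interval is sketched below: choose a random start in $[0, N/n)$, step through the frame in increments of $N/n$, and take the unit in which each selection point falls. How PROC SURVEYSELECT handles the fractional interval internally is not specified here, so treat this as an assumption; the function name is hypothetical.

```python
import random

def systematic_sample(N, n, rng):
    """Return n zero-based frame positions chosen systematically.

    Sketch under the assumption that selection points are
    start + k * (N / n) and each point selects the unit it falls in.
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    interval = N / n
    start = rng.random() * interval          # random start in [0, interval)
    # min() guards against floating-point rounding at the end of the frame.
    return [min(int(start + k * interval), N - 1) for k in range(n)]

print(systematic_sample(N=10, n=4, rng=random.Random(7)))
```

Because the interval is at least 1, consecutive selection points land in different units, so the sample contains exactly $n$ distinct units in frame order.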

Sequential Random Sampling

If you specify the option METHOD=SEQ and do not include a SIZE statement, PROC SURVEYSELECT uses the equal probability version of Chromy's method for sequential random sampling. This method selects units sequentially with equal probability and without replacement. Refer to Chromy (1979) and Williams and Chromy (1980). See the section 'PPS Sequential Sampling' for a description of Chromy's PPS selection method.

Sequential random sampling controls the distribution of the sample by spreading it throughout the sampling frame or stratum, thus providing implicit stratification according to the order of units in the frame or stratum. You can use the CONTROL statement to sort the input data set by the CONTROL variables before sample selection. If you also use a STRATA statement, PROC SURVEYSELECT sorts by the CONTROL variables within strata. By default, the procedure uses hierarchic serpentine ordering for sorting. If you specify the SORT=NEST option, the procedure uses nested sorting. See the section 'Sorting by CONTROL Variables' for descriptions of serpentine and nested sorting. If you do not specify a CONTROL statement, PROC SURVEYSELECT applies sequential selection to the observations in the order in which they appear in the input data set.

Following Chromy's method of sequential selection, PROC SURVEYSELECT randomly chooses a starting unit from the entire stratum (or frame, if the design is not stratified). Using this unit as the first one, the procedure treats the stratum units as a closed loop. This is done so that all pairwise (joint) selection probabilities are positive and an unbiased variance estimator can be obtained. The procedure numbers units sequentially from the random start to the end of the stratum and then continues from the beginning of the stratum until all units are numbered.

Beginning with the randomly chosen starting unit, PROC SURVEYSELECT accumulates the expected number of selections or hits, where the expected number of selections $ES_{hi}$ equals $n_h/N_h$ for all units $i$ in stratum $h$. The procedure computes

$$I_{hi} = \mathrm{Int}\left(\sum_{j=1}^{i} ES_{hj}\right), \qquad F_{hi} = \mathrm{Frac}\left(\sum_{j=1}^{i} ES_{hj}\right)$$

where Int denotes the integer part of the number, and Frac denotes the fractional part.

Considering each unit sequentially, Chromy's method determines whether unit $i$ is selected by comparing the total number of selections for the first $i - 1$ units,

$$T_{h(i-1)} = \sum_{j=1}^{i-1} \mathrm{Hits}_{hj}$$

(where $\mathrm{Hits}_{hj}$ is the number of times unit $j$ is selected) with the value of $I_{h(i-1)}$.

If $T_{h(i-1)} = I_{h(i-1)}$, Chromy's method determines whether or not unit $i$ is selected as follows. If $F_{hi} = 0$ or $F_{h(i-1)} > F_{hi}$, then unit $i$ is selected with certainty. Otherwise, unit $i$ is selected with probability

$$\frac{F_{hi} - F_{h(i-1)}}{1 - F_{h(i-1)}}$$

If $T_{h(i-1)} = I_{h(i-1)} + 1$, Chromy's method determines whether or not unit $i$ is selected as follows. If $F_{hi} = 0$ or $F_{hi} > F_{h(i-1)}$, then the unit is not selected. Otherwise, unit $i$ is selected with probability

$$\frac{F_{hi}}{F_{h(i-1)}}$$

PPS Sampling without Replacement

If you specify the option METHOD=PPS, PROC SURVEYSELECT selects units with probability proportional to size and without replacement. The selection probability for unit $i$ in stratum $h$ equals $n_h Z_{hi}$. The procedure uses the Hanurav-Vijayan algorithm for PPS selection without replacement. Hanurav (1967) introduced this algorithm for the selection of two units per stratum, and Vijayan (1968) generalized it for the selection of more than two units. The algorithm enables computation of joint selection probabilities and provides joint selection probability values that usually ensure nonnegativity and stability of the Sen-Yates-Grundy variance estimator. Refer to Fox (1989), Golmant (1990), and Watts (1991) for details.

Notation in the remainder of this section drops the stratum subscript $h$ for simplicity, but selection is still done independently within strata if you specify a stratified design. For a stratified design, $n$ now denotes the sample size for the current stratum, $N$ denotes the stratum population size, and $M_i$ denotes the size measure for unit $i$ in the stratum. If the design is not stratified, this notation applies to the entire sampling frame.

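According to the Hanurav-Vijayan algorithm, PROC SURVEYSELECT first orders units within the stratum in ascending order by size measure, so that $M_1 \le M_2 \le \cdots \le M_N$. Then the procedure selects the PPS sample of $n$ observations as follows:

  1. The procedure randomly chooses one of the integers $1, 2, \ldots, n$ with probability $\theta_1, \theta_2, \ldots, \theta_n$, where

    $$\theta_i = n \, \frac{(Z_{N-n+i+1} - Z_{N-n+i})\,(T + i Z_{N-n+1})}{T}$$

    $Z_j = M_j/M$ (with $M$ the total of the size measures), $T = \sum_{j=1}^{N-n} Z_j$, and, by definition, $Z_{N+1} = 1/n$ to ensure that $\sum_{i=1}^{n} \theta_i = 1$.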

  2. If $i$ is the integer selected in step 1, the procedure includes the last $(n - i)$ units of the stratum in the sample, where the units are ordered by size measure as described previously. The procedure then selects the remaining $i$ units according to steps 3 through 6 below.

  3. The procedure defines new normed size measures for the remaining $(N - n + i)$ stratum units that were not selected in steps 1 and 2:

    $$Z_j^{*} = \begin{cases} \dfrac{Z_j}{T + i Z_{N-n+1}} & \text{for } j = 1, \ldots, N-n+1 \\[2ex] \dfrac{Z_{N-n+1}}{T + i Z_{N-n+1}} & \text{for } j = N-n+2, \ldots, N-n+i \end{cases}$$
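  4. The procedure selects the next unit from the first $(N - n + 1)$ stratum units with probability proportional to $a_j(1)$, where

    $$a_1(1) = i Z_1^{*}, \qquad a_j(1) = i Z_j^{*} \prod_{k=1}^{j-1} \left[ 1 - (i - 1) P_k \right] \quad \text{for } j = 2, \ldots, N-n+1$$

    and $P_k = M_k / (M_{k+1} + M_{k+2} + \cdots + M_{N-n+i})$.

  5. If stratum unit $j_1$ is the unit selected in step 4, then the procedure selects the next unit from units $j_1 + 1$ through $N - n + 2$ with probability proportional to $a_j(2, j_1)$, where

    $$a_{j_1+1}(2, j_1) = (i - 1) Z_{j_1+1}^{*}, \qquad a_j(2, j_1) = (i - 1) Z_j^{*} \prod_{k=j_1+1}^{j-1} \left[ 1 - (i - 2) P_k \right] \quad \text{for } j = j_1+2, \ldots, N-n+2$$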
  6. The procedure repeats step 5 until all n sample units are selected.
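Step 1 can be checked numerically with a short Python sketch. It assumes that the step 1 probabilities have the form $\theta_i = n (Z_{N-n+i+1} - Z_{N-n+i})(T + i Z_{N-n+1})/T$ with $T = \sum_{j=1}^{N-n} Z_j$ and $Z_{N+1} = 1/n$; the function name is hypothetical, and the sketch covers only step 1, not the full selection algorithm. Exact fractions are used so that the sum can be compared with 1 exactly.

```python
from fractions import Fraction

def step1_probabilities(sizes, n):
    """Return [theta_1, ..., theta_n] for integer size measures.

    Illustrative sketch of step 1 only, under the assumed formula for
    theta_i stated in the lead-in.
    """
    sizes = sorted(sizes)                     # ascending order by size
    N = len(sizes)
    if not 0 < n < N:
        raise ValueError("need 0 < n < N")
    total = sum(sizes)
    z = [Fraction(m, total) for m in sizes]   # z[k-1] holds Z_k
    z.append(Fraction(1, n))                  # Z_{N+1} = 1/n by definition
    T = sum(z[:N - n])                        # sum of Z_1 .. Z_{N-n}
    z_cut = z[N - n]                          # Z_{N-n+1}
    theta = []
    for i in range(1, n + 1):
        diff = z[N - n + i] - z[N - n + i - 1]   # Z_{N-n+i+1} - Z_{N-n+i}
        theta.append(n * diff * (T + i * z_cut) / T)
    return theta

theta = step1_probabilities([1, 2, 3, 4, 4, 6], n=2)
print(theta, sum(theta))
# prints [Fraction(7, 25), Fraction(18, 25)] 1
```

The probabilities are nonnegative only when the largest relative size does not exceed $1/n$, which is the usual requirement for PPS selection without replacement.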

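If you request the JTPROBS option, PROC SURVEYSELECT computes the joint selection probabilities for all pairs of selected units in each stratum. The joint selection probability for units $i$ and $j$ in the stratum equals

$$P_{(ij)} = \sum_{r=1}^{n} \theta_r K_{ij}^{(r)}$$

where

$$K_{ij}^{(r)} = \begin{cases} 1 & N-n+r < i \le N-1 \\[1ex] \dfrac{r Z_{N-n+1}}{T + r Z_{N-n+1}} & N-n < i \le N-n+r,\ j > N-n+r \\[2ex] \dfrac{r Z_i}{T + r Z_{N-n+1}} & 1 \le i \le N-n,\ j > N-n+r \\[2ex] \pi_{ij}^{(r)} & j \le N-n+r \end{cases}$$

and

$$\pi_{ij}^{(r)} = \frac{r (r - 1)}{2} \, P_i Z_j \prod_{k=1}^{i-1} (1 - P_k)$$

where $P_k = M_k / (M_{k+1} + M_{k+2} + \cdots + M_{N-n+r})$.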

PPS Sampling with Replacement

If you specify the option METHOD=PPS_WR, PROC SURVEYSELECT selects units with probability proportional to size and with replacement. The procedure makes $n_h$ independent random selections from the stratum of $N_h$ units, selecting with probability $Z_{hi} = M_{hi}/M_{h\cdot}$. Because units are selected with replacement, a unit can be selected for the sample more than once. The expected number of selections or hits for unit $i$ in stratum $h$ equals $n_h Z_{hi}$. If you request the JTPROBS option, PROC SURVEYSELECT computes the joint expected number of hits for all pairs of selected units in each stratum. The joint expected number of hits for units $i$ and $j$ ($i \ne j$) in stratum $h$ equals

$$P_{h(ij)} = n_h (n_h - 1) Z_{hi} Z_{hj}$$
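A minimal Python sketch of PPS selection with replacement follows. It treats the joint expected number of hits for two different units as the expectation of the product of their hit counts under $n$ independent draws, which works out to $n(n-1) Z_i Z_j$; that reading of the quantity, and the function names, are assumptions made for the example.

```python
import random
from collections import Counter

def pps_wr_sample(sizes, n, rng):
    """Make n independent draws with probability proportional to size."""
    draws = rng.choices(range(len(sizes)), weights=sizes, k=n)
    return Counter(draws)                  # unit index -> number of hits

def expected_hits(sizes, n):
    total = sum(sizes)
    return [n * m / total for m in sizes]  # n * Z_i

def joint_expected_hits(sizes, n, i, j):
    # E[hits_i * hits_j] for i != j under n independent draws.
    total = sum(sizes)
    return n * (n - 1) * (sizes[i] / total) * (sizes[j] / total)

sizes = [10, 20, 30, 40]
print(pps_wr_sample(sizes, n=3, rng=random.Random(11)))
print(expected_hits(sizes, n=3))
print(joint_expected_hits(sizes, 3, 0, 1))
```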

PPS Systematic Sampling

If you specify the option METHOD=PPS_SYS, PROC SURVEYSELECT selects units by systematic random sampling with probability proportional to size. Systematic sampling selects units at a fixed interval throughout the stratum or sampling frame after a random start. PROC SURVEYSELECT uses a fractional interval to provide exactly the specified sample size. The interval equals $M_{h\cdot}/n_h$ for stratified sampling and $M/n$ for sampling without stratification. Depending on the sample size and the values of the size measures, it may be possible for a unit to be selected more than once. The expected number of selections or hits for unit $i$ in stratum $h$ equals $n_h M_{hi}/M_{h\cdot} = n_h Z_{hi}$. Refer to Cochran (1977, pp. 265-266) and Madow (1949).

Systematic random sampling controls the distribution of the sample by spreading it throughout the sampling frame or stratum at equal intervals, thus providing implicit stratification. You can use the CONTROL statement to order the input data set by the CONTROL variables before sample selection. If you also use a STRATA statement, PROC SURVEYSELECT sorts by the CONTROL variables within strata. If you do not specify a CONTROL statement, PROC SURVEYSELECT applies systematic selection to the observations in the order in which they appear in the input data set.
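PPS systematic selection is commonly implemented on the cumulated size measures: selection points are placed at a fixed interval of total size divided by sample size after a random start, and each point selects the unit whose cumulative size range contains it. The sketch below follows that textbook approach; the function name is hypothetical, and the procedure's internal handling may differ.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pps_systematic_hits(sizes, n, rng):
    """Return the number of hits per unit, in frame order."""
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n            # total size / sample size
    start = rng.random() * interval          # random start in [0, interval)
    hits = [0] * len(sizes)
    for k in range(n):
        point = start + k * interval
        # First unit whose cumulative size exceeds the selection point;
        # min() guards against rounding at the end of the frame.
        unit = min(bisect_right(cumulative, point), len(sizes) - 1)
        hits[unit] += 1
    return hits

print(pps_systematic_hits([5, 10, 40, 45], n=4, rng=random.Random(3)))
```

With these sizes the interval is 25, so the units of size 40 and 45 are each hit once or twice, which illustrates how a unit larger than the interval can be selected more than once.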

PPS Sequential Sampling

If you specify the option METHOD=PPS_SEQ, PROC SURVEYSELECT uses Chromy's method of sequential random sampling. Refer to Chromy (1979) and Williams and Chromy (1980). Chromy's method selects units sequentially with probability proportional to size and with minimum replacement. Selection with minimum replacement means that the actual number of hits for a unit can equal the integer part of the expected number of hits for that unit, or the next largest integer. This can be compared to selection without replacement, where each unit can be selected only once, so the number of hits can equal 0 or 1. The other alternative is selection with replacement, where there is no restriction on the number of hits for each unit, so the number of hits can equal $0, 1, \ldots, n_h$, where $n_h$ is the stratum sample size.

Sequential random sampling controls the distribution of the sample by spreading it throughout the sampling frame or stratum, thus providing implicit stratification according to the order of units in the frame or stratum. You can use the CONTROL statement to sort the input data set by the CONTROL variables before sample selection. If you also use a STRATA statement, PROC SURVEYSELECT sorts by the CONTROL variables within strata. By default, the procedure uses hierarchic serpentine ordering to sort the sampling frame by the CONTROL variables within strata. If you specify the SORT=NEST option, the procedure uses nested sorting. See the section 'Sorting by CONTROL Variables' for descriptions of serpentine and nested sorting. If you do not specify a CONTROL statement, PROC SURVEYSELECT applies sequential selection to the observations in the order in which they appear in the input data set.

According to Chromy's method of sequential selection, PROC SURVEYSELECT first chooses a starting unit randomly from the entire stratum, with probability proportional to size. The procedure uses this unit as the first one and treats the stratum observations as a closed loop. This is done so that all pairwise (joint) expected numbers of hits are positive and an unbiased variance estimator can be obtained. The procedure numbers observations sequentially from the random start to the end of the stratum and then continues from the beginning of the stratum until all units are numbered.

Beginning with the randomly chosen starting unit, Chromy's method partitions the ordered stratum sampling frame into $n_h$ zones of equal size. There is one selection from each zone and a total of $n_h$ selections or hits, although fewer than $n_h$ distinct units may be selected. Beginning with the random start, the procedure accumulates the expected number of hits and computes

$$I_{hi} = \mathrm{Int}\left(\sum_{j=1}^{i} ES_{hj}\right), \qquad F_{hi} = \mathrm{Frac}\left(\sum_{j=1}^{i} ES_{hj}\right)$$

where $ES_{hi}$ represents the expected number of hits for unit $i$ in stratum $h$; Int denotes the integer part of the number; and Frac denotes the fractional part.

Considering each unit sequentially, Chromy's method determines the actual number of hits for unit $i$ by comparing the total number of hits for the first $i - 1$ units,

$$T_{h(i-1)} = \sum_{j=1}^{i-1} \mathrm{Hits}_{hj}$$

(where $\mathrm{Hits}_{hj}$ is the actual number of hits for unit $j$) with the value of $I_{h(i-1)}$.

If $T_{h(i-1)} = I_{h(i-1)}$, Chromy's method determines the total number of hits for the first $i$ units as follows. If $F_{hi} = 0$ or $F_{h(i-1)} > F_{hi}$, then $T_{hi} = I_{hi}$. Otherwise, $T_{hi} = I_{hi} + 1$ with probability

$$\frac{F_{hi} - F_{h(i-1)}}{1 - F_{h(i-1)}}$$

And the number of hits for unit $i$ equals $T_{hi} - T_{h(i-1)}$.

If $T_{h(i-1)} = I_{h(i-1)} + 1$, Chromy's method determines the total number of hits for the first $i$ units as follows. If $F_{hi} = 0$, then $T_{hi} = I_{hi}$. If $F_{hi} > F_{h(i-1)}$, then $T_{hi} = I_{hi} + 1$. Otherwise, $T_{hi} = I_{hi} + 1$ with probability

$$\frac{F_{hi}}{F_{h(i-1)}}$$
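The sequential rule can be sketched in Python as follows. The sketch assumes the two conditional probabilities are $(F_{hi} - F_{h(i-1)})/(1 - F_{h(i-1)})$ and $F_{hi}/F_{h(i-1)}$, as in Chromy's published algorithm, and it expects the expected hits to be supplied already in closed-loop order from the random start. The function name is hypothetical. Exact fractions avoid rounding problems when testing whether a fractional part is zero. Passing equal expected hits of $n_h/N_h$ gives the equal probability version.

```python
import random
from fractions import Fraction
from math import floor

def chromy_hits(expected_hits, rng):
    """Return the number of hits per unit under Chromy's sequential rule.

    expected_hits: exact values (Fraction or int), ordered from the
    random start around the closed loop; they should sum to an integer.
    """
    hits = []
    cum = Fraction(0)
    int_prev, frac_prev, total_prev = 0, Fraction(0), 0
    for es in expected_hits:
        cum += Fraction(es)
        int_cur = floor(cum)              # I_i
        frac_cur = cum - int_cur          # F_i
        if total_prev == int_prev:
            if frac_cur == 0 or frac_prev > frac_cur:
                total_cur = int_cur
            else:
                p = (frac_cur - frac_prev) / (1 - frac_prev)
                total_cur = int_cur + 1 if rng.random() < p else int_cur
        else:                             # total_prev == int_prev + 1
            if frac_cur == 0:
                total_cur = int_cur
            elif frac_cur > frac_prev:
                total_cur = int_cur + 1
            else:
                p = frac_cur / frac_prev
                total_cur = int_cur + 1 if rng.random() < p else int_cur
        hits.append(total_cur - total_prev)
        int_prev, frac_prev, total_prev = int_cur, frac_cur, total_cur
    return hits

sizes, n = [5, 10, 40, 45], 4
es = [Fraction(n * m, sum(sizes)) for m in sizes]
print(chromy_hits(es, random.Random(5)))
```

Each unit receives either the integer part of its expected number of hits or the next integer, and the hits always total the sample size.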

Brewer's PPS Method

Brewer's method (METHOD=PPS_BREWER) selects two units from each stratum, with probability proportional to size and without replacement. The selection probability for unit $i$ in stratum $h$ equals $2 M_{hi}/M_{h\cdot} = 2 Z_{hi}$.

Brewer's algorithm first selects a unit with probability

$$\frac{Z_{hi} (1 - Z_{hi})}{D_h (1 - 2 Z_{hi})}$$

where

$$D_h = \sum_{i=1}^{N_h} \frac{Z_{hi} (1 - Z_{hi})}{1 - 2 Z_{hi}}$$

Then a second unit is selected from the remaining units with probability

$$\frac{Z_{hj}}{1 - Z_{hi}}$$

where unit $i$ is the first unit selected. The joint selection probability for units $i$ and $j$ in stratum $h$ equals

$$P_{h(ij)} = \frac{2 Z_{hi} Z_{hj}}{D_h} \cdot \frac{1 - Z_{hi} - Z_{hj}}{(1 - 2 Z_{hi})(1 - 2 Z_{hj})}$$

Brewer's method requires that the relative size $Z_{hi}$ be less than 0.5 for all units. Refer to Cochran (1977, pp. 261-263) and Brewer (1963). Brewer's method yields the same selection probabilities and joint selection probabilities as Durbin's method. Refer to Cochran (1977) and Durbin (1967).
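A short Python sketch can confirm that Brewer's two-step draw yields selection probabilities of $2 Z_{hi}$. It assumes the first-draw probability $Z_i (1 - Z_i) / (D (1 - 2 Z_i))$ and the second-draw probability $Z_j / (1 - Z_i)$, as in Cochran's presentation; the function names are hypothetical, and exact fractions are used so the identities hold exactly.

```python
import random
from fractions import Fraction

def brewer_setup(sizes):
    """Return relative sizes, the constant D, and first-draw probabilities."""
    total = sum(sizes)
    z = [Fraction(m, total) for m in sizes]
    if any(2 * zi >= 1 for zi in z):
        raise ValueError("every relative size must be below 0.5")
    d = sum(zi * (1 - zi) / (1 - 2 * zi) for zi in z)
    first = [zi * (1 - zi) / (d * (1 - 2 * zi)) for zi in z]
    return z, d, first

def brewer_joint(z, d, i, j):
    """Joint selection probability of units i and j (i != j)."""
    return (2 * z[i] * z[j] * (1 - z[i] - z[j])
            / (d * (1 - 2 * z[i]) * (1 - 2 * z[j])))

def brewer_draw(sizes, rng):
    """Draw two distinct unit indices by the two-step procedure."""
    z, d, first = brewer_setup(sizes)
    units = list(range(len(sizes)))
    i = rng.choices(units, weights=[float(p) for p in first], k=1)[0]
    rest = [u for u in units if u != i]
    # Second draw: probability Z_j / (1 - Z_i) among the remaining units.
    j = rng.choices(rest, weights=[float(z[u]) for u in rest], k=1)[0]
    return i, j

z, d, first = brewer_setup([1, 2, 3, 4])
# Summing the joint probabilities over partners recovers 2 * Z_i.
print([sum(brewer_joint(z, d, i, j) for j in range(4) if j != i) == 2 * z[i]
       for i in range(4)])
# prints [True, True, True, True]
```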

Murthy's PPS Method

Murthy's method (METHOD=PPS_MURTHY) selects two units from each stratum, with probability proportional to size and without replacement. The selection probability for unit $i$ in stratum $h$ equals

$$P_{hi} = Z_{hi} \left[ 1 + K_h - \frac{Z_{hi}}{1 - Z_{hi}} \right]$$

where $Z_{hi} = M_{hi}/M_{h\cdot}$ and

$$K_h = \sum_{j=1}^{N_h} \frac{Z_{hj}}{1 - Z_{hj}}$$

Murthy's algorithm first selects a unit with probability $Z_{hi}$. Then a second unit is selected from the remaining units with probability $Z_{hj}/(1 - Z_{hi})$, where unit $i$ is the first unit selected. The joint selection probability for units $i$ and $j$ in stratum $h$ equals

$$P_{h(ij)} = Z_{hi} Z_{hj} \, \frac{2 - Z_{hi} - Z_{hj}}{(1 - Z_{hi})(1 - Z_{hj})}$$

Refer to Cochran (1977, pp. 263-265) and Murthy (1957).
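The selection and joint probabilities for Murthy's two-step draw follow directly from the draw probabilities $Z_i$ and $Z_j/(1 - Z_i)$, and a short Python sketch can verify their consistency. The closed forms used here are derived from those two draw probabilities; the function names are hypothetical.

```python
from fractions import Fraction

def murthy_probabilities(sizes):
    """Return relative sizes and per-unit selection probabilities."""
    total = sum(sizes)
    z = [Fraction(m, total) for m in sizes]
    k = sum(zj / (1 - zj) for zj in z)                 # K = sum Z_j / (1 - Z_j)
    selection = [zi * (1 + k - zi / (1 - zi)) for zi in z]
    return z, selection

def murthy_joint(z, i, j):
    """Joint selection probability of units i and j (i != j)."""
    # P(i first, j second) + P(j first, i second)
    return z[i] * z[j] * (2 - z[i] - z[j]) / ((1 - z[i]) * (1 - z[j]))

z, selection = murthy_probabilities([1, 2, 3, 4])
print(sum(selection))   # prints 2
```

The selection probabilities sum to 2 because exactly two units are selected from the stratum.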

Sampford's PPS Method

Sampford's method (METHOD=PPS_SAMPFORD) is an extension of Brewer's method that selects more than two units from each stratum, with probability proportional to size and without replacement. The selection probability for unit $i$ in stratum $h$ equals

$$P_{hi} = n_h Z_{hi}$$

Sampford's method first selects a unit from stratum $h$ with probability $Z_{hi}$. Then subsequent units are selected with probability proportional to

$$\frac{Z_{hi}}{1 - n_h Z_{hi}}$$

and with replacement. If the same unit appears more than once in the sample of size $n_h$, then Sampford's algorithm rejects that sample and selects a new sample. The sample is accepted if it contains $n_h$ distinct units.

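The joint selection probability for units $i$ and $j$ in stratum $h$ equals

$$P_{h(ij)} = K_h \lambda_i \lambda_j \sum_{t=2}^{n_h} \frac{\left[ t - n_h (Z_{hi} + Z_{hj}) \right] L_{n_h - t}(\bar{i}\bar{j})}{n_h^{\,t-2}}$$

where

$$\lambda_i = \frac{Z_{hi}}{1 - n_h Z_{hi}}, \qquad L_m = \sum_{S(m)} \lambda_{i_1} \lambda_{i_2} \cdots \lambda_{i_m}$$

where $S(m)$ denotes all possible samples of size $m$, for $m = 1, 2, \ldots, N_h$. The sum $L_m(\bar{i}\bar{j})$ is defined similarly to $L_m$ but sums over all possible samples of size $m$ that do not include units $i$ and $j$, and

$$K_h = \left( \sum_{t=1}^{n_h} \frac{t \, L_{n_h - t}}{n_h^{\,t}} \right)^{-1}$$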

Sampford's method requires that the relative size $Z_{hi}$ be less than $1/n_h$ for all units. Refer to Cochran (1977, pp. 262-263) and Sampford (1967).
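The rejective procedure can be sketched in Python as follows. It assumes the subsequent draws use weights $Z_i/(1 - n Z_i)$, as in Sampford (1967); the function name and the `max_tries` safeguard are assumptions added for the example.

```python
import random

def sampford_sample(sizes, n, rng, max_tries=100000):
    """Return n distinct unit indices selected by rejective sampling."""
    total = sum(sizes)
    z = [m / total for m in sizes]
    if any(n * zi >= 1 for zi in z):
        raise ValueError("every relative size must be below 1/n")
    lam = [zi / (1 - n * zi) for zi in z]   # weights for draws 2..n
    units = range(len(sizes))
    for _ in range(max_tries):
        # First draw with probability Z_i, the rest with replacement
        # and probability proportional to Z_i / (1 - n * Z_i).
        draw = (rng.choices(units, weights=z, k=1)
                + rng.choices(units, weights=lam, k=n - 1))
        if len(set(draw)) == n:             # accept only distinct units
            return sorted(draw)
    raise RuntimeError("no sample of distinct units found")

print(sampford_sample([10, 12, 15, 18, 20, 25], n=3, rng=random.Random(9)))
```

The rejection step becomes more frequent as relative sizes approach $1/n_h$, because the weights of the large units grow without bound and duplicates become likely.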

Output Data Set

PROC SURVEYSELECT creates a SAS data set that contains the sample of selected units. You can specify the name of this output data set with the OUT= option in the PROC SURVEYSELECT statement. If you omit the OUT= option, the data set is named DATAn, where n is the smallest integer that makes the name unique.

By default, the output data set contains one observation for each unit selected for the sample. But if you specify the OUTALL option, the output data set includes all observations from the input data set. With OUTALL, the output data set also contains a variable to indicate each observation's selection status. The variable Selected equals 1 for an observation selected for the sample, and equals 0 for an observation not selected. The OUTALL option is available only for equal probability selection methods.

If you specify the OUTHITS option for methods that may select the same unit more than once (that is, methods that select with replacement or with minimum replacement), the output data set contains a separate observation for each selection. If you do not specify the OUTHITS option, the output data set contains only one observation for each selected unit, even if the unit is selected more than once, and the variable NumberHits contains the number of hits or selections for that unit.

The output data set contains design information and selection statistics, depending on the selection method and output options you specify. The output data set can include the following variables:

  • Selected, which indicates whether or not the observation is selected for the sample. This variable is included if you specify the OUTALL option. It equals 1 for an observation selected for the sample, and it equals 0 for an observation not selected.

  • STRATA variables, which you specify in the STRATA statement

  • Replicate, which is the sample replicate number. This variable is included when you request replicated sampling with the REP= option.

  • ID variables, which you name in the ID statement

  • CONTROL variables, which you specify in the CONTROL statement

  • Zone, which is the selection zone. This variable is included for METHOD=PPS_SEQ.

  • SIZE variable, which you specify in the SIZE statement

  • AdjustedSize, which is the adjusted size measure. This variable is included if you request adjusted sizes with the MINSIZE= option or the MAXSIZE= option.

  • Certain, which indicates certainty selection. This variable is included if you specify the CERTSIZE= option. It equals 1 for units included with certainty because their size measures exceed the certainty size measure. Otherwise, it equals 0.

  • NumberHits, which is the number of hits or selections. This variable is included for selection methods that are with replacement or with minimum replacement (METHOD=URS, METHOD=PPS_WR, METHOD=PPS_SYS, and METHOD=PPS_SEQ).

The output data set includes the following variables if you request a PPS selection method or if you specify the STATS option for other methods:

  • ExpectedHits, which is the expected number of hits or selections. This variable is included for selection methods that are with replacement or with minimum replacement, and so may select the same unit more than once (METHOD=URS, METHOD=PPS_WR, METHOD=PPS_SYS, and METHOD=PPS_SEQ).

  • SelectionProb, which is the probability of selection. This variable is included for selection methods that are without replacement.

  • SamplingWeight, which is the sampling weight. This variable equals the inverse of ExpectedHits or SelectionProb.

For METHOD=PPS_BREWER and METHOD=PPS_MURTHY, which select two units from each stratum with probability proportional to size, the output data set contains the following variable:

  • JtSelectionProb, which is the joint probability of selection for the two units selected from the stratum

If you request the JTPROBS option to compute joint probabilities of selection for METHOD=PPS or METHOD=PPS_SAMPFORD, then the output data set contains the following variables:

  • Unit, which is an identification variable that numbers the selected units sequentially within each stratum

  • JtProb_1, JtProb_2, JtProb_3, ..., where the variable JtProb_1 contains the joint probability of selection for the current unit and unit 1. Similarly, JtProb_2 contains the joint probability of selection for the current unit and unit 2, and so on.

If you request the JTPROBS option for METHOD=PPS_WR, then the output data set contains the following variables:

  • Unit, which is an identification variable that numbers the selected units sequentially within each stratum

  • JtHits_1, JtHits_2, JtHits_3, ..., where the variable JtHits_1 contains the joint expected number of hits for the current unit and unit 1. Similarly, JtHits_2 contains the joint expected number of hits for the current unit and unit 2, and so on.

If you request the OUTSIZE option, the output data set contains the following variables. If you specify a STRATA statement, the output data set includes stratum-level values of these variables. Otherwise, the output data set contains population-level values of these variables.

  • MinimumSize, which is the minimum size measure specified with the MINSIZE= option. This variable is included if you request the MINSIZE= option.

  • MaximumSize, which is the maximum size measure specified with the MAXSIZE= option. This variable is included if you request the MAXSIZE= option.

  • CertaintySize, which is the certainty size measure specified with the CERTSIZE= option. This variable is included if you request the CERTSIZE= option.

  • Total, which is the total number of sampling units in the stratum. This variable is included if there is no SIZE statement.

  • TotalSize, which is the total of size measures in the stratum. This variable is included if there is a SIZE statement.

  • TotalAdjSize, which is the total of adjusted size measures in the stratum. This variable is included if there is a SIZE statement and if you request adjusted sizes with the MAXSIZE= option or the MINSIZE= option.

  • SamplingRate, which is the sampling rate. This variable is included if you specify the SAMPRATE= option.

  • SampleSize, which is the sample size. This variable is included if you specify the SAMPSIZE= option, or if you specify METHOD=PPS_BREWER or METHOD=PPS_MURTHY, which select two units from each stratum.

If you request the OUTSEED option, the output data set contains the following variable:

  • InitialSeed, which is the initial seed for the stratum.

Displayed Output

By default, PROC SURVEYSELECT displays two tables that summarize the sample selection. You can suppress display of these tables by using the NOPRINT option.

PROC SURVEYSELECT creates an output data set that contains the units selected for the sample. The procedure does not display this output data set. Use PROC PRINT, PROC REPORT, or any other SAS reporting tool to display the output data set.

PROC SURVEYSELECT displays the following information in the 'Sample Selection Method' table:

  • Selection Method

  • Size Measure variable, if you specify a SIZE statement

  • Minimum Size Measure, if you specify the MINSIZE= option

  • Maximum Size Measure, if you specify the MAXSIZE= option

  • Certainty Size Measure, if you specify the CERTSIZE= option

  • Strata Variables, if you specify a STRATA statement

  • Control Variables, if you specify a CONTROL statement

  • type of Control Sorting, Serpentine or Nested, if you specify a CONTROL statement

PROC SURVEYSELECT displays the following information in the 'Sample Selection Summary' table:

  • Input Data Set name

  • Sorted Data Set name, if you specify the OUTSORT= option

  • Random Number Seed

  • Sample Size or Stratum Sample Size, if you specify the SAMPSIZE= n option

  • Sample Size Data Set, if you specify the SAMPSIZE= SAS-data-set option

  • Sampling Rate or Stratum Sampling Rate, if you specify the SAMPRATE= r option

  • Sampling Rate Data Set, if you specify the SAMPRATE= SAS-data-set option

  • Minimum Sample Size or Stratum Minimum Sample Size, if you specify the NMIN= option with the SAMPRATE= option

  • Maximum Sample Size or Stratum Maximum Sample Size, if you specify the NMAX= option with the SAMPRATE= option

  • Selection Probability, if you specify METHOD=SRS, METHOD=SYS, or METHOD=SEQ and do not specify a STRATA statement

  • Expected Number of Hits, if you specify METHOD=URS and do not specify a STRATA statement

  • Sampling Weight for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, METHOD=SEQ) if you do not specify a STRATA statement

  • Number of Strata, if you specify a STRATA statement

  • Number of Replicates, if you specify the REP= option

  • Total Sample Size, if you specify a STRATA statement or the REP= option

  • Output Data Set name

ODS Table Names

PROC SURVEYSELECT assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, 'Using the Output Delivery System.'

Table 72.2: ODS Tables Produced in PROC SURVEYSELECT

ODS Table Name    Description                 Statement    Option
Method            Sample selection method     PROC         default
Summary           Sample selection summary    PROC         default




SAS.STAT 9.1 Users Guide (Vol. 6)
Year: 2004