Syntax


The following statements are available in PROC SURVEYSELECT:

  • PROC SURVEYSELECT options ;

    • STRATA variables ;

    • CONTROL variables ;

    • SIZE variable ;

    • ID variables ;

The PROC SURVEYSELECT statement invokes the procedure and optionally identifies input and output data sets. It also specifies the selection method, the sample size, and other sample design parameters. The PROC SURVEYSELECT statement is required.

The SIZE statement identifies the variable that contains the size measures. It is required for any selection method that is probability proportional to size (PPS).

The remaining statements are optional. The STRATA statement identifies a variable or set of variables that stratify the input data set. When you specify a STRATA statement, PROC SURVEYSELECT selects samples independently from the strata formed by the STRATA variables. The CONTROL statement identifies variables for ordering units within strata. It can be used for systematic and sequential sampling methods. The ID statement identifies variables to copy from the input data set to the output data set of selected units.

The rest of this section gives detailed syntax information for the CONTROL, ID, SIZE, and STRATA statements in alphabetical order after the description of the PROC SURVEYSELECT statement.

PROC SURVEYSELECT Statement

  • PROC SURVEYSELECT options ;

The PROC SURVEYSELECT statement invokes the procedure and optionally identifies input and output data sets. If you do not name a DATA= input data set, the procedure selects the sample from the most recently created SAS data set. If you do not name an OUT= output data set to contain the sample of selected units, the procedure still creates an output data set and names it according to the DATAn convention.

The PROC SURVEYSELECT statement also specifies the sample selection method, the sample size, and other sample design parameters. If you do not specify a selection method, PROC SURVEYSELECT uses simple random sampling (METHOD=SRS) if there is no SIZE statement. If you specify a SIZE statement but do not specify a selection method, PROC SURVEYSELECT uses probability proportional to size selection without replacement (METHOD=PPS). You must specify the sample size or sampling rate unless you request a method that selects two units from each stratum (METHOD=PPS_BREWER or METHOD=PPS_MURTHY).
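The default method rules above can be illustrated with a minimal sketch. The data set name Frame and the variables UnitID and Revenue are hypothetical and are used only for illustration. Neither call specifies METHOD=, so the presence or absence of the SIZE statement should determine the method.

```sas
/* Hypothetical sampling frame: 200 units with a size measure */
data Frame;
   do UnitID = 1 to 200;
      Revenue = 60 + 10 * mod(UnitID, 5);
      output;
   end;
run;

/* No METHOD= and no SIZE statement: simple random sampling (METHOD=SRS) */
proc surveyselect data=Frame sampsize=20 seed=12345 out=SampleSRS;
run;

/* No METHOD= but a SIZE statement: PPS without replacement (METHOD=PPS) */
proc surveyselect data=Frame sampsize=20 seed=12345 out=SamplePPS;
   size Revenue;
run;
```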

You can use the SAMPSIZE= n option to specify the sample size, or you can use the SAMPSIZE= SAS-data-set option to name a secondary input data set that contains stratum sample sizes. You can also specify stratum sampling rates, minimum size measures, maximum size measures, and certainty size measures in the secondary input data set. See the descriptions of the SAMPSIZE=, SAMPRATE=, MINSIZE=, MAXSIZE=, and CERTSIZE= options. You can name only one secondary input data set in each invocation of the procedure.

The following table lists the options available with the PROC SURVEYSELECT statement. Descriptions follow in alphabetical order.

Table 72.1: PROC SURVEYSELECT Statement Options

Task                               Options

Specify the input data set         DATA=
Specify output data sets           OUT=, OUTSORT=
Suppress displayed output          NOPRINT
Specify selection method           METHOD=
Specify sample size                SAMPSIZE=, SELECTALL
Specify sampling rate              SAMPRATE=, NMIN=, NMAX=
Specify number of replicates       REP=
Adjust size measures               MINSIZE=, MAXSIZE=
Specify certainty size measures    CERTSIZE=
Specify sorting type               SORT=
Specify random number seed         SEED=
Control OUT= contents              JTPROBS, OUTALL, OUTHITS, OUTSEED, OUTSIZE, STATS

You can specify the following options in the PROC SURVEYSELECT statement:

CERTSIZE

  • requests automatic selection of those units with size measures greater than or equal to the stratum certainty size measure. You provide sampling unit size measures in the DATA= input data set variable named in the SIZE statement, and you provide the stratum certainty size measures in the secondary input data set variable _CERTSIZE_. Use the CERTSIZE option when you have already named the secondary input data set in another option, such as SAMPSIZE= SAS-data-set, SAMPRATE= SAS-data-set, MAXSIZE= SAS-data-set, or MINSIZE= SAS-data-set. You can name only one secondary input data set in each invocation of the procedure.

  • If any unit's size measure is greater than or equal to the certainty size measure for its stratum, then PROC SURVEYSELECT selects this unit with certainty. Each certainty size measure must be a positive number. The CERTSIZE option is available for METHOD=PPS and METHOD=PPS_SAMPFORD.

  • If you want to specify a single certainty size measure in the PROC SURVEYSELECT statement, use the CERTSIZE= certain option.

CERTSIZE= certain

  • specifies the certainty size measure. PROC SURVEYSELECT selects with certainty any unit with size measure greater than or equal to the value certain, which must be a positive number. You provide size measures in the DATA= input data set variable named in the SIZE statement. This option is available for METHOD=PPS and METHOD=PPS_SAMPFORD.

  • If you request a stratified sample design with a STRATA statement and specify the CERTSIZE= option, PROC SURVEYSELECT uses the certainty size certain for all strata. If you do not want to use the same certainty size for all strata, use the CERTSIZE= SAS-data-set option to specify a certainty size for each stratum.
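A minimal sketch of the CERTSIZE= certain option follows. The data set Frame, its variables, and the threshold 500 are hypothetical. Three units are given a size measure of 1000, so with CERTSIZE=500 they would be expected to enter the sample with certainty, and the rest of the sample would be drawn from the remaining units.

```sas
/* Hypothetical frame: 197 ordinary units and three very large ones */
data Frame;
   do UnitID = 1 to 200;
      Revenue = 60 + 10 * mod(UnitID, 5);
      if UnitID <= 3 then Revenue = 1000;
      output;
   end;
run;

/* Units with Revenue >= 500 are selected with certainty */
proc surveyselect data=Frame method=pps sampsize=10 certsize=500
                  seed=12345 out=SampleCert;
   size Revenue;
run;
```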

CERTSIZE= SAS-data-set

  • names a SAS data set that contains the certainty size measures for the strata. PROC SURVEYSELECT selects with certainty any unit with size measure greater than or equal to the certainty size measure for its stratum. You provide sampling unit size measures in the DATA= input data set variable named in the SIZE statement, and you provide the stratum certainty size measures in the CERTSIZE= input data set variable _CERTSIZE_. Each certainty size measure must be a positive number. This option is available for METHOD=PPS and METHOD=PPS_SAMPFORD.

  • The CERTSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the CERTSIZE= data set as in the DATA= data set. The CERTSIZE= data set must include a variable named _CERTSIZE_ that contains the certainty size measure for each stratum.

CERTSIZE=P= p

  • specifies the certainty proportion. PROC SURVEYSELECT selects with certainty any unit with size measure greater than or equal to the proportion p of the total size for all units in the stratum. The procedure repeats this process with the remaining units until no more certainty units are selected. You provide size measures in the DATA= input data set variable named in the SIZE statement. This option is available for METHOD=PPS and METHOD=PPS_SAMPFORD.

  • The certainty proportion must be a positive number. You can specify p as a number between 0 and 1. Or you can specify p in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure treats the value 1 as 100%, and not the percentage form 1%.

  • If you request a stratified sample design with a STRATA statement and specify the CERTSIZE=P= option, PROC SURVEYSELECT uses the same certainty proportion p for all strata.
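The iterative rule for CERTSIZE=P= can be sketched with a small hypothetical frame (the names Frame, UnitID, and Revenue are assumptions). The size measures total 18,760, so with p = 0.05 the first pass threshold is 938 and the three units of size 1000 qualify. On the next pass the remaining total is 15,760, the threshold is 788, and no further unit qualifies, so the process should stop there.

```sas
/* Hypothetical frame: total Revenue is 3 * 1000 + 15,760 = 18,760 */
data Frame;
   do UnitID = 1 to 200;
      Revenue = 60 + 10 * mod(UnitID, 5);
      if UnitID <= 3 then Revenue = 1000;
      output;
   end;
run;

/* Certainty proportion of 5% of the (remaining) stratum total */
proc surveyselect data=Frame method=pps sampsize=10 certsize=p=0.05
                  seed=12345 out=SampleCertP;
   size Revenue;
run;
```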

DATA= SAS-data-set

  • names the SAS data set from which PROC SURVEYSELECT selects the sample. If you omit the DATA= option, the procedure uses the most recently created SAS data set. In sampling terminology, the input data set is the sampling frame , or list of units from which the sample is selected.

JTPROBS

  • includes joint probabilities of selection in the OUT= output data set. This option is available for the following probability proportional to size selection methods: METHOD=PPS, METHOD=PPS_SAMPFORD, and METHOD=PPS_WR. By default, PROC SURVEYSELECT outputs joint selection probabilities for METHOD=PPS_BREWER and METHOD=PPS_MURTHY, which select two units per stratum.

  • For details on computation of joint selection probabilities for a particular sampling method, see the method description in the section 'Sample Selection Methods' on page 4446. For more information on the contents of the output data set, see the section 'Output Data Set' on page 4456.

MAXSIZE

  • requests that sampling unit size measures be adjusted according to the stratum maximum size measures in the secondary input data set. You provide sampling unit size measures in the DATA= input data set variable named in the SIZE statement, and you provide the stratum maximum size measures in the secondary input data set variable _MAXSIZE_. Use the MAXSIZE option when you have already named the secondary input data set in another option, such as SAMPSIZE= SAS-data-set, SAMPRATE= SAS-data-set, MINSIZE= SAS-data-set, or CERTSIZE= SAS-data-set. You can name only one secondary input data set in each invocation of the procedure.

  • If any size measure exceeds the maximum size measure for its stratum, then PROC SURVEYSELECT adjusts this size measure downward to equal the maximum size measure. Each maximum size measure must be a positive number. The MAXSIZE option is available whenever you specify a SIZE statement for probability proportional to size selection and a STRATA statement for stratification.

  • If you want to specify a single maximum size value in the PROC SURVEYSELECT statement, use the MAXSIZE= max option.
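The following sketch shows one way the MAXSIZE option might be combined with a secondary input data set that is already named in SAMPSIZE=. All data set and variable names other than _NSIZE_ and _MAXSIZE_ are hypothetical. Each stratum contains one very large unit whose size measure would be adjusted downward to the stratum maximum before selection.

```sas
/* Hypothetical frame, already in stratum (Region) order */
data Frame;
   do Region = 1 to 2;
      do UnitID = 1 to 100;
         Revenue = 60 + 10 * mod(UnitID, 5);
         if UnitID = 1 then Revenue = 5000;   /* one very large unit per stratum */
         output;
      end;
   end;
run;

/* Single secondary data set: stratum sample sizes and maximum size measures */
data Design;
   input Region _NSIZE_ _MAXSIZE_;
   datalines;
1 10 200
2 15 300
;
run;

/* SAMPSIZE= names the secondary data set; MAXSIZE reads _MAXSIZE_ from it */
proc surveyselect data=Frame method=pps sampsize=Design maxsize
                  seed=12345 out=Sample outsize;
   strata Region;
   size Revenue;
run;
```

Keeping both _NSIZE_ and _MAXSIZE_ in one data set respects the limit of one secondary input data set for each invocation.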

MAXSIZE= max

  • specifies the maximum size measure allowed. If any size measure exceeds the value max, then PROC SURVEYSELECT adjusts this size measure downward to equal max, which must be a positive number. You provide size measures in the DATA= input data set variable named in the SIZE statement. This option is available whenever you specify a SIZE statement for selection with probability proportional to size.

  • If you request a stratified sample design with a STRATA statement and specify the MAXSIZE= option, PROC SURVEYSELECT uses the maximum size max for all strata. If you do not want to use the same maximum size for all strata, use the MAXSIZE= SAS-data-set option to specify a maximum size for each stratum.

MAXSIZE= SAS-data-set

  • names a SAS data set that contains the maximum size measures allowed for the strata. If any size measure exceeds the maximum size measure for its stratum, then PROC SURVEYSELECT adjusts this size measure downward to equal the maximum size measure. You provide sampling unit size measures in the DATA= input data set variable named in the SIZE statement, and you provide the stratum maximum size measures in the MAXSIZE= input data set variable _MAXSIZE_. Each maximum size measure must be a positive number. This option is available whenever you specify a SIZE statement for probability proportional to size selection and a STRATA statement for stratified selection.

  • The MAXSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the MAXSIZE= data set as in the DATA= data set. The MAXSIZE= data set must include a variable named _MAXSIZE_ that contains the maximum size measure for each stratum.

METHOD= name

M= name

  • specifies the method for sample selection. If you do not specify the METHOD= option, by default, PROC SURVEYSELECT uses simple random sampling (METHOD=SRS) if there is no SIZE statement. If you specify a SIZE statement, the default selection method is probability proportional to size without replacement (METHOD=PPS). Valid values for name are as follows:

    PPS

    requests selection with probability proportional to size and without replacement. See the section 'PPS Sampling without Replacement' on page 4449 for details. If you specify METHOD=PPS, you must name the size measure variable in the SIZE statement.

    PPS_BREWER | BREWER

    requests selection according to Brewer's method. Brewer's method selects two units from each stratum with probability proportional to size and without replacement. See the section 'Brewer's PPS Method' on page 4453 for details. If you specify METHOD=PPS_BREWER, you must name the size measure variable in the SIZE statement. You do not need to specify the sample size with the SAMPSIZE= option, since Brewer's method selects two units from each stratum.

    PPS_MURTHY | MURTHY

    requests selection according to Murthy's method. Murthy's method selects two units from each stratum with probability proportional to size and without replacement. See the section 'Murthy's PPS Method' on page 4454 for details. If you specify METHOD=PPS_MURTHY, you must name the size measure variable in the SIZE statement. You do not need to specify the sample size with the SAMPSIZE= option, since Murthy's method selects two units from each stratum.

    PPS_SAMPFORD | SAMPFORD

    requests selection according to Sampford's method. Sampford's method selects units with probability proportional to size and without replacement. See the section 'Sampford's PPS Method' on page 4455 for details. If you specify METHOD=PPS_SAMPFORD, you must name the size measure variable in the SIZE statement.

    PPS_SEQ | CHROMY

    requests sequential selection with probability proportional to size and with minimum replacement. This method is also known as Chromy's method. See the section 'PPS Sequential Sampling' on page 4452 for details. If you specify METHOD=PPS_SEQ, you must name the size measure variable in the SIZE statement.

    PPS_SYS

    requests systematic selection with probability proportional to size. See the section 'PPS Systematic Sampling' on page 4451 for details on this method. If you specify METHOD=PPS_SYS, you must name the size measure variable in the SIZE statement.

    PPS_WR

    requests selection with probability proportional to size and with replacement. See the section 'PPS Sampling with Replacement' on page 4451 for details on this method. If you specify METHOD=PPS_WR, you must name the size measure variable in the SIZE statement.

    SEQ

    requests sequential selection according to Chromy's method. If you specify METHOD=SEQ and do not specify a size measure variable with the SIZE statement, PROC SURVEYSELECT uses sequential zoned selection with equal probability and without replacement. See the section 'Sequential Random Sampling' on page 4448 for details on this method. If you specify METHOD=SEQ and also name a size measure variable in the SIZE statement, PROC SURVEYSELECT uses METHOD=PPS_SEQ, which is sequential selection with probability proportional to size and with minimum replacement. See the section 'PPS Sequential Sampling' on page 4452 for details on this method.

    SRS

    requests simple random sampling, which is selection with equal probability and without replacement. See the section 'Simple Random Sampling' on page 4447 for details. This method is the default if you do not specify the METHOD= option and also do not specify a SIZE statement.

    SYS

    requests systematic random sampling. If you specify METHOD=SYS and do not specify a size measure variable with the SIZE statement, PROC SURVEYSELECT uses systematic selection with equal probability. See the section 'Systematic Random Sampling' on page 4448 for details on this method. If you specify METHOD=SYS and also name a size measure variable in the SIZE statement, PROC SURVEYSELECT uses METHOD=PPS_SYS, which is systematic selection with probability proportional to size. See the section 'PPS Systematic Sampling' on page 4451 for details.

    URS

    requests unrestricted random sampling, which is selection with equal probability and with replacement. See the section 'Unrestricted Random Sampling' on page 4447 for details.
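As a minimal sketch of a method that fixes the sample size itself, the following step requests Brewer's method on a hypothetical stratified frame (Frame, Region, UnitID, and Revenue are assumed names). No SAMPSIZE= option is given, because the method selects two units from each stratum.

```sas
/* Hypothetical frame with two strata, generated in stratum order */
data Frame;
   do Region = 1 to 2;
      do UnitID = 1 to 100;
         Revenue = 60 + 10 * mod(UnitID, 5);
         output;
      end;
   end;
run;

/* Brewer's method: two units per stratum, PPS without replacement */
proc surveyselect data=Frame method=pps_brewer seed=12345 out=SampleBrewer;
   strata Region;
   size Revenue;
run;
```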

MINSIZE

  • requests that sampling unit size measures be adjusted according to the stratum minimum size measures in the secondary input data set. You provide sampling unit size measures in the DATA= input data set variable named in the SIZE statement, and you provide the stratum minimum size measures in the secondary input data set variable _MINSIZE_. Use the MINSIZE option when you have already named the secondary input data set in another option, such as SAMPSIZE= SAS-data-set, SAMPRATE= SAS-data-set, MAXSIZE= SAS-data-set, or CERTSIZE= SAS-data-set. You can name only one secondary input data set in each invocation of the procedure.

  • If any size measure is less than the minimum size measure for its stratum, then PROC SURVEYSELECT adjusts this size measure upward to equal the minimum size measure. Each minimum size measure must be a positive number. The MINSIZE option is available whenever you specify a SIZE statement for probability proportional to size selection and a STRATA statement for stratification.

  • If you want to specify a single minimum size value in the PROC SURVEYSELECT statement, use the MINSIZE= min option.

MINSIZE= min

  • specifies the minimum size measure allowed. If any size measure is less than the value min, then PROC SURVEYSELECT adjusts this size measure upward to equal min, which must be a positive number. You provide size measures in the DATA= input data set variable named in the SIZE statement. This option is available whenever you specify a SIZE statement for selection with probability proportional to size.

  • If you request a stratified sample design with a STRATA statement and specify the MINSIZE= option, PROC SURVEYSELECT uses the minimum size min for all strata. If you do not want to use the same minimum size for all strata, use the MINSIZE= SAS-data-set option to specify a minimum size for each stratum.

MINSIZE= SAS-data-set

  • names a SAS data set that contains the minimum size measures allowed for the strata. If any size measure is less than the minimum size measure for its stratum, then PROC SURVEYSELECT adjusts this size measure upward to equal the minimum size measure. You provide sampling unit size measures in the DATA= input data set variable named in the SIZE statement, and you provide the stratum minimum size measures in the MINSIZE= input data set variable _MINSIZE_. Each minimum size measure must be a positive number. This option is available whenever you specify a SIZE statement for probability proportional to size selection and a STRATA statement for stratified selection.

  • The MINSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the MINSIZE= data set as in the DATA= data set. The MINSIZE= data set must include a variable named _MINSIZE_ that contains the minimum size measure for each stratum.

NMAX= n

  • specifies the maximum stratum sample size n for the SAMPRATE= option. When you specify the SAMPRATE= option, PROC SURVEYSELECT calculates the desired stratum sample size from the specified sampling rate and the total number of units in the stratum. If this sample size is greater than the NMAX= value n, then PROC SURVEYSELECT selects only n units from the stratum.

  • The maximum sample size n must be a positive integer. The NMAX= option is available only with the SAMPRATE= option, which can be used with equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ).

NMIN= n

  • specifies the minimum stratum sample size n for the SAMPRATE= option. When you specify the SAMPRATE= option, PROC SURVEYSELECT calculates the desired stratum sample size from the specified sampling rate and the total number of units in the stratum. If this sample size is less than the NMIN= value n, then PROC SURVEYSELECT selects n units from the stratum.

  • The minimum sample size n must be a positive integer. The NMIN= option is available only with the SAMPRATE= option, which can be used with equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ).
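The interaction of SAMPRATE= with NMIN= and NMAX= can be sketched as follows. The frame is hypothetical, with strata of 20, 200, and 2000 units. A 10% rate gives computed sizes of 2, 20, and 200, so under the rules above the bounds would be expected to produce stratum sample sizes of 5, 20, and 50.

```sas
/* Hypothetical frame with strata of 20, 200, and 2000 units */
data Frame;
   do Region = 1 to 3;
      do UnitID = 1 to 2 * 10**Region;
         output;
      end;
   end;
run;

/* 10% sampling rate, bounded below by 5 and above by 50 units per stratum */
proc surveyselect data=Frame method=srs samprate=0.1 nmin=5 nmax=50
                  seed=12345 out=Sample;
   strata Region;
run;
```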

NOPRINT

  • suppresses the display of all output. You can use the NOPRINT option when you want only to create an output data set. Note that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 14, 'Using the Output Delivery System.'

OUT= SAS-data-set

  • names the output data set that contains the sample. If you omit the OUT= option, the data set is named DATAn, where n is the smallest integer that makes the name unique.

  • The output data set contains the units selected for the sample, as well as design information and selection statistics, depending on the selection method and output options you specify. See the descriptions for the options JTPROBS, OUTHITS, OUTSEED, OUTSIZE, and STATS. For information on the contents of the output data set, see the section 'Output Data Set' on page 4456.

  • By default, the output data set contains only those units selected for the sample. To include all observations from the input data set in the output data set, use the OUTALL option.

OUTALL

  • includes all observations from the input data set in the output data set. By default, the output data set includes only those observations selected for the sample. When you specify the OUTALL option, the output data set includes all observations from DATA= and also contains a variable to indicate each observation's selection status. The variable Selected equals 1 for an observation selected for the sample, and equals 0 for an observation not selected. For information on the contents of the output data set, see the section 'Output Data Set' on page 4456.

  • The OUTALL option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ).
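A minimal sketch of the OUTALL option, using a hypothetical frame named Frame: the output data set should contain all 200 observations, and the Selected variable can then be tabulated to confirm how many were chosen.

```sas
/* Hypothetical frame of 200 units */
data Frame;
   do UnitID = 1 to 200;
      output;
   end;
run;

/* Keep every frame observation and flag the sampled ones */
proc surveyselect data=Frame method=srs sampsize=20 outall
                  seed=12345 out=Flagged;
run;

/* Selected = 1 for sampled units, 0 otherwise */
proc freq data=Flagged;
   tables Selected;
run;
```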

OUTHITS

  • includes a separate observation in the output data set for each selection when the same unit is selected more than once. By default, the output data set contains only one observation for each selected unit, even if it is selected more than once, and the variable NumberHits contains the number of hits or selections for that unit. The OUTHITS option is available for selection methods that select with replacement or with minimum replacement (METHOD=URS, METHOD=PPS_WR, METHOD=PPS_SYS, and METHOD=PPS_SEQ).
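The difference between the default output and OUTHITS can be sketched with unrestricted random sampling from a small hypothetical frame. Drawing 30 selections from 50 units makes repeated hits likely. The first output data set would hold one observation per distinct unit with its NumberHits count, and the second would hold one observation per selection.

```sas
/* Hypothetical frame of 50 units */
data Frame;
   do UnitID = 1 to 50;
      output;
   end;
run;

/* Default: one observation per selected unit, with NumberHits */
proc surveyselect data=Frame method=urs sampsize=30
                  seed=12345 out=Hits;
run;

/* OUTHITS: one observation per selection */
proc surveyselect data=Frame method=urs sampsize=30 outhits
                  seed=12345 out=HitsExpanded;
run;
```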

OUTSEED

  • includes the initial seed for each stratum in the output data set. The variable InitialSeed contains the stratum initial seed. See the section 'Sample Selection Methods' on page 4446 for information on initial seeds and random number generation in PROC SURVEYSELECT.

  • To reproduce the same sample for any stratum in a subsequent execution of PROC SURVEYSELECT, you can specify the same stratum initial seed with the SEED= SAS-data-set option, along with the same sample selection parameters.

OUTSIZE

  • includes additional design and sampling frame parameters in the output data set. If you specify the OUTSIZE option, PROC SURVEYSELECT includes the sample size or sampling rate in the output data set. When you request the OUTSIZE option and also specify the SIZE statement, the procedure outputs the size measure total for the sampling frame. If you do not specify the SIZE statement, the procedure outputs the total number of sampling units in the frame. Also, PROC SURVEYSELECT includes the minimum size measure if you specify the MINSIZE= option, the maximum size measure if you specify the MAXSIZE= option, and the certainty size measure if you specify the CERTSIZE= option.

  • If you have a stratified design, the output data set includes the stratum-level values of these parameters. Otherwise, the output data set includes the overall population-level values.

  • For information on the contents of the output data set, see the section 'Output Data Set' on page 4456.

OUTSORT= SAS-data-set

  • names an output data set that contains the sorted input data set. This option is available when you specify a CONTROL statement for systematic or sequential selection methods (METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ). PROC SURVEYSELECT sorts the input data set by the CONTROL variables within strata before selecting the sample.

  • If you specify CONTROL variables but do not name an output data set with the OUTSORT= option, then the sorted data set replaces the input data set.
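Because the sorted data set otherwise replaces the input data set, naming an OUTSORT= data set is one way to leave the original frame untouched. The following sketch assumes a hypothetical frame with variables Region and Revenue used as CONTROL variables.

```sas
/* Hypothetical unsorted frame */
data Frame;
   do UnitID = 1 to 200;
      Region  = mod(UnitID, 4) + 1;
      Revenue = 60 + 10 * mod(UnitID, 5);
      output;
   end;
run;

/* Systematic sample after control sorting; sorted frame goes to FrameSorted */
proc surveyselect data=Frame method=sys samprate=0.1 seed=12345
                  out=SampleSys outsort=FrameSorted;
   control Region Revenue;
run;
```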

REP= nrep

  • specifies the number of sample replicates. If you specify the REP= option, PROC SURVEYSELECT selects nrep independent samples, each with the same specified sample size or sampling rate and the same sample design.

  • You can use replicated sampling to provide a simple method of variance estimation for any form of statistic, as well as to evaluate variable nonsampling errors such as interviewer differences. Refer to Lohr (1999), Kish (1965, 1987), and Kalton (1983) for information on replicated sampling.
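A minimal sketch of replicated sampling follows, again with a hypothetical frame. With REP=3 and SAMPSIZE=10, the output data set would be expected to hold 30 observations, with each replicate identified in the output (in SAS, by the Replicate variable).

```sas
/* Hypothetical frame of 200 units */
data Frame;
   do UnitID = 1 to 200;
      output;
   end;
run;

/* Three independent simple random samples of 10 units each */
proc surveyselect data=Frame method=srs sampsize=10 rep=3
                  seed=12345 out=Replicates;
run;
```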

SAMPRATE= r

RATE= r

  • specifies the sampling rate, which is the proportion of units selected for the sample. The sampling rate r must be a positive number. You can specify r as a number between 0 and 1. Or you can specify r in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure treats the value 1 as 100%, and not the percentage form 1%.

  • The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the sampling rate r as the interval. See the section 'Systematic Random Sampling' on page 4448 for details. For other selection methods, PROC SURVEYSELECT converts the sampling rate r to the sample size before selection, multiplying the rate by the number of units in the stratum or frame and rounding up to the nearest integer.

  • If you request a stratified sample design with a STRATA statement and specify the SAMPRATE= r option, PROC SURVEYSELECT uses the sampling rate r for each stratum. If you do not want to use the same sampling rate for each stratum, use the SAMPRATE=( values ) option or the SAMPRATE= SAS-data-set option to specify a sampling rate for each stratum.

SAMPRATE=( values )

RATE=( values )

  • specifies sampling rates for the strata. You can separate values with blanks or commas. The number of SAMPRATE= values must equal the number of strata in the input data set.

  • List the stratum sampling rate values in the order in which the strata appear in the input data set. If you use the SAMPRATE=( values ) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED options in the STRATA statement.

  • Each stratum sampling rate value must be a positive number. You can specify each value as a number between 0 and 1. Or you can specify a value in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure treats the value 1 as 100%, and not the percentage form 1%.

  • The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the stratum sampling rate as the interval for the stratum. See the section 'Systematic Random Sampling' on page 4448 for details on systematic sampling. For other selection methods, PROC SURVEYSELECT converts the stratum sampling rate to a stratum sample size before selection, multiplying the rate by the number of units in the stratum and rounding up to the nearest integer.
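The SAMPRATE=( values ) form can be sketched with a hypothetical frame of three strata of 100 units each, generated in ascending Region order. The rates are listed in stratum order, so the expected stratum sample sizes are 10, 25, and 50.

```sas
/* Hypothetical frame: three strata of 100 units, in ascending Region order */
data Frame;
   do Region = 1 to 3;
      do UnitID = 1 to 100;
         output;
      end;
   end;
run;

/* Rates for Region 1, 2, 3; the percentage form (10 25 50) is the alternative */
proc surveyselect data=Frame method=srs samprate=(0.10 0.25 0.50)
                  seed=12345 out=Sample;
   strata Region;
run;
```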

SAMPRATE= SAS-data-set

RATE= SAS-data-set

  • names a SAS data set that contains sampling rates for the strata. This input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the SAMPRATE= data set as in the DATA= data set. The SAMPRATE= data set should have a variable _RATE_ that contains the sampling rate for each stratum.

  • Each sampling rate value must be a positive number. You can specify each value as a number between 0 and 1. Or you can specify a value in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure treats the value 1 as 100%, and not the percentage form 1%.

  • The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the stratum sampling rate as the interval for the stratum. See the section 'Systematic Random Sampling' on page 4448 for details. For other selection methods, PROC SURVEYSELECT converts the stratum sampling rate to the stratum sample size before selection, multiplying the rate by the number of units in the stratum and rounding up to the nearest integer.
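When the rates are kept in a data set instead, the layout might look like the following sketch. The names Frame, Rates, Region, and UnitID are hypothetical; only _RATE_ is prescribed by the option. Region has the same type and length in both data sets, and the strata appear in the same order.

```sas
/* Hypothetical frame: three strata of 100 units each */
data Frame;
   do Region = 1 to 3;
      do UnitID = 1 to 100;
         output;
      end;
   end;
run;

/* Secondary input data set: one sampling rate per stratum */
data Rates;
   input Region _RATE_;
   datalines;
1 0.10
2 0.25
3 0.50
;
run;

proc surveyselect data=Frame method=srs samprate=Rates
                  seed=12345 out=Sample;
   strata Region;
run;
```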

SAMPSIZE= n

N= n

  • specifies the sample size, which is the number of units selected for the sample. The sample size n must be a positive integer. For methods that select without replacement, the sample size n must not exceed the number of units in the input data set.

  • If you request a stratified sample design with a STRATA statement and specify the SAMPSIZE= n option, PROC SURVEYSELECT selects n units from each stratum. For methods that select without replacement, the sample size n must not exceed the number of units in any stratum. If you do not want to select the same number of units from each stratum, use the SAMPSIZE=( values ) option or the SAMPSIZE= SAS-data-set option to specify different sample sizes for the strata.

  • For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total number of units in the stratum. However, you can change this default by specifying the SELECTALL option. With the SELECTALL option, PROC SURVEYSELECT selects all stratum units whenever the stratum sample size exceeds the number of units in the stratum.

SAMPSIZE=( values )

N=( values )

  • specifies sample sizes for the strata. You can separate values with blanks or commas. The number of SAMPSIZE= values must equal the number of strata in the input data set.

  • List the stratum sample size values in the order in which the strata appear in the input data set. If you use the SAMPSIZE=( values ) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED options in the STRATA statement.

  • Each stratum sample size value must be a positive integer. For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total number of units in the stratum. However, you can change this default by specifying the SELECTALL option. With the SELECTALL option, PROC SURVEYSELECT selects all stratum units whenever the stratum sample size exceeds the number of units in the stratum.

SAMPSIZE= SAS-data-set

N= SAS-data-set

  • names a SAS data set that contains the sample sizes for the strata. This input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the SAMPSIZE= data set as in the DATA= data set. The SAMPSIZE= data set should have a variable _NSIZE_ that contains the sample size for each stratum.

  • Each stratum sample size value must be a positive integer. For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total number of units in the stratum. However, you can change this default by specifying the SELECTALL option. With the SELECTALL option, PROC SURVEYSELECT selects all stratum units whenever the stratum sample size exceeds the number of units in the stratum.
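The following sketch combines SAMPSIZE= SAS-data-set with SELECTALL. The frame and the data set Sizes are hypothetical; only _NSIZE_ is prescribed. Region 1 holds 10 units but the requested stratum sample size is 15, so with SELECTALL all 10 units of that stratum would be expected in the sample instead of an error.

```sas
/* Hypothetical frame: strata of 10, 20, and 30 units */
data Frame;
   do Region = 1 to 3;
      do UnitID = 1 to 10 * Region;
         output;
      end;
   end;
run;

/* Secondary input data set: one sample size per stratum */
data Sizes;
   input Region _NSIZE_;
   datalines;
1 15
2 5
3 5
;
run;

/* SELECTALL takes every unit when _NSIZE_ exceeds the stratum count */
proc surveyselect data=Frame method=srs sampsize=Sizes selectall
                  seed=12345 out=Sample;
   strata Region;
run;
```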

SEED= number

  • specifies the initial seed for random number generation. The value of the SEED= option must be an integer. If you do not specify the SEED= option, or if the SEED= value is negative or zero, PROC SURVEYSELECT uses the time of day from the computer's clock to obtain the initial seed. See the section 'Sample Selection Methods' on page 4446 for more information.

  • Whether or not you specify the SEED= option, PROC SURVEYSELECT displays the value of the initial seed in the 'Sample Selection Summary' table. If you need to reproduce the same sample in a subsequent execution of PROC SURVEYSELECT, you can specify this same seed value with the SEED= option, along with the same sample selection parameters, and PROC SURVEYSELECT will reproduce the sample.

  • If you request a stratified sample design with a STRATA statement, you can use the SEED= SAS-data-set option to specify an initial seed for each stratum. Otherwise, PROC SURVEYSELECT generates random numbers continuously across strata from the random number stream initialized by the SEED= value, as described in the section 'Sample Selection Methods' on page 4446.

  • To include the stratum initial seeds in the output data set, use the OUTSEED option.
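
To illustrate reproducibility with a fixed seed, the sketch below runs the same selection twice and compares the results. The data set names and the seed value are arbitrary assumptions.

```sas
/* Two runs with identical design parameters and the same SEED= value. */
proc surveyselect data=Customers out=Sample1
      method=srs sampsize=100 seed=39647;
run;

proc surveyselect data=Customers out=Sample2
      method=srs sampsize=100 seed=39647;
run;

/* PROC COMPARE should report no differences between the two samples. */
proc compare base=Sample1 compare=Sample2;
run;
```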

SEED= SAS-data-set

  • names a SAS data set that contains initial seeds for the strata. This input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the SEED= data set as in the DATA= data set. The SEED= data set should have a variable _SEED_ that contains the initial seed for each stratum.

  • Each stratum initial seed value should be an integer. If the initial seed value for the first stratum is not a positive integer, PROC SURVEYSELECT uses the time of day from the computer's clock to obtain the initial seed. If the initial seed value for a subsequent stratum is not a positive integer, PROC SURVEYSELECT continues to use the random number stream already initialized by the seed for the previous stratum. See the section 'Sample Selection Methods' on page 4446 for more information.

  • To include the stratum initial seeds in the output data set, specify the OUTSEED option.

  • If you specified initial seeds by strata with the SEED= SAS-data-set option, you can reproduce the same sample in a subsequent execution of PROC SURVEYSELECT by specifying these same stratum initial seeds, along with the same sample selection parameters. If you need to reproduce the same sample for only a subset of the strata, you can use the same initial seeds for those strata in the subset.
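
A per-stratum seed data set could be set up along the following lines. The data set names, the variable State with its assumed length, and the seed values are hypothetical; _SEED_ and OUTSEED are the names described above.

```sas
/* Hypothetical per-stratum seeds. State must match the type and       */
/* length in the DATA= data set, and rows must follow stratum order.   */
data StratumSeeds;
   length State $ 2;
   input State $ _SEED_;
   datalines;
AL 50417
FL 18332
GA 72961
;
run;

proc surveyselect data=Customers out=SampleByState
      method=srs sampsize=10
      seed=StratumSeeds
      outseed;              /* keep each stratum's initial seed in OUT= */
   strata State;
run;
```

Rerunning the step with the same _SEED_ value for a given stratum, and the same design parameters, should reproduce the sample for that stratum even if the seeds for other strata change.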

SELECTALL

  • requests that PROC SURVEYSELECT select all stratum units whenever the stratum sample size exceeds the total number of units in the stratum, for without-replacement selection methods. By default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total number of units in the stratum, for methods that select without replacement.

  • The SELECTALL option is available for without-replacement selection methods, which include METHOD=SRS, METHOD=SYS, METHOD=SEQ, METHOD=PPS, and METHOD=PPS_SAMPFORD. The SELECTALL option is not available for with-replacement selection methods, with-minimum-replacement methods, or for those PPS methods that select two units per stratum.
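
As an illustrative sketch, the step below requests 50 units from every stratum of a hypothetical Customers data set in which some strata are assumed to contain fewer than 50 units. The names and values are assumptions.

```sas
/* Without SELECTALL, a stratum with fewer than 50 units would cause   */
/* an error. With SELECTALL, every unit in such a stratum is selected. */
proc surveyselect data=Customers out=SampleByState
      method=srs sampsize=50
      selectall
      seed=4081;
   strata State;
run;
```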

SORT=NEST | SERP

  • specifies the type of sorting by CONTROL variables. The option SORT=NEST requests nested sorting, and SORT=SERP requests hierarchic serpentine sorting. The default is SORT=SERP. See the section 'Sorting by CONTROL Variables' on page 4445 for descriptions of serpentine and nested sorting. Where there is only one CONTROL variable, the two types of sorting are equivalent.

  • This option is available when you specify a CONTROL statement for systematic or sequential selection methods (METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ). When you specify a CONTROL statement, PROC SURVEYSELECT sorts the input data set by the CONTROL variables within strata before selecting the sample.

  • With sorting by CONTROL variables, you can also use the OUTSORT= option to name an output data set that contains the sorted input data set. If you do not specify the OUTSORT= option, the sorted data set replaces the input data set.
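
The following sketch combines SORT=NEST with a CONTROL statement and OUTSORT=, so that the original input data set is left untouched. The data set names and the variables State, Type, and Usage are hypothetical.

```sas
/* Systematic sampling within strata, with nested (not serpentine)     */
/* sorting by the CONTROL variables. OUTSORT= stores the sorted copy   */
/* so that Customers itself is not replaced.                           */
proc surveyselect data=Customers
      out=SampleSys
      outsort=CustomersSorted
      method=sys sampsize=20
      sort=nest
      seed=6127;
   strata State;
   control Type Usage;
run;
```

Omitting SORT=NEST would give the default serpentine ordering. With a single CONTROL variable the two orderings coincide, as noted above.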

STATS

  • includes selection probabilities and sampling weights in the OUT= output data set for equal probability selection methods when you do not specify a STRATA statement. This option is available for the following equal probability selection methods: METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ. For PPS selection methods and stratified designs, the output data set contains selection probabilities and sampling weights by default. For more information on the contents of the output data set, see the section 'Output Data Set' on page 4456.
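
A minimal sketch of the STATS option for an unstratified simple random sample follows. The data set names and values are assumptions.

```sas
/* Unstratified SRS: selection probabilities and sampling weights are  */
/* not written to OUT= by default, so STATS requests them explicitly.  */
proc surveyselect data=Customers out=SampleWithWeights
      method=srs sampsize=100
      stats
      seed=2290;
run;

/* Inspect the variables that were added to the output data set. */
proc contents data=SampleWithWeights;
run;
```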

CONTROL Statement

  • CONTROL variables ;

The CONTROL statement names variables for sorting the input data set. The CONTROL variables can be character or numeric.

PROC SURVEYSELECT sorts the input data set by the CONTROL variables before selecting the sample. If you also specify a STRATA statement, PROC SURVEYSELECT sorts by CONTROL variables within strata. Control sorting is available for systematic and sequential selection methods (METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ).

By default, PROC SURVEYSELECT uses hierarchic serpentine sorting by the CONTROL variables. If you specify the SORT=NEST option, the procedure uses nested sorting. See the description for the SORT= option. For more information on serpentine and nested sorting, see the section 'Sorting by CONTROL Variables' on page 4445.

You can use the OUTSORT= option to name an output data set that contains the sorted input data set. If you do not specify the OUTSORT= option when you use the CONTROL statement, then the sorted data set replaces the input data set.

ID Statement

  • ID variables ;

The ID statement names variables from the DATA= input data set to be included in the OUT= data set of selected units. If there is no ID statement, PROC SURVEYSELECT includes all variables from the DATA= data set in the OUT= data set. The ID variables can be character or numeric.
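
For example, the sketch below copies only two variables to the output data set. The data set names and the variables CustomerID and State are hypothetical.

```sas
/* Only CustomerID and State are copied from Customers to the sample.  */
/* Without the ID statement, all input variables would be copied.      */
proc surveyselect data=Customers out=SampleIDs
      method=srs sampsize=100
      seed=7715;
   id CustomerID State;
run;
```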

SIZE Statement

  • SIZE variable ;

The SIZE statement names exactly one variable, which contains the size measures to be used when sampling with probability proportional to size. The SIZE variable must be numeric. When the value of an observation's SIZE variable is missing or nonpositive, that observation has no chance of being selected for the sample.

The SIZE statement is required for all PPS selection methods, which include METHOD=PPS, METHOD=PPS_BREWER, METHOD=PPS_MURTHY, METHOD=PPS_SAMPFORD, METHOD=PPS_SEQ, METHOD=PPS_SYS, and METHOD=PPS_WR. For details on how size measures are used, see the descriptions of PPS methods in the section 'Sample Selection Methods' on page 4446.

Note that a unit's size measure, specified in the SIZE statement and used for PPS selection, is not the same as the sample size. The sample size is the number of units selected for the sample, and you can specify this with the SAMPSIZE= option.
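
The distinction between the size measure and the sample size can be seen in a sketch such as the following, where Usage is a hypothetical numeric size measure and the data set names are assumptions.

```sas
/* SIZE names the per-unit size measure (here the assumed variable     */
/* Usage). SAMPSIZE= is the number of units to select.                 */
proc surveyselect data=Customers out=SamplePPS
      method=pps
      sampsize=20          /* sample size: how many units to draw      */
      seed=3358;
   size Usage;             /* size measure: drives selection chances   */
run;
```

Because a SIZE statement is present, leaving out METHOD=PPS would give the same selection method by default.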

STRATA Statement

  • STRATA variables ;

You can specify a STRATA statement with PROC SURVEYSELECT to partition the input data set into nonoverlapping groups defined by the STRATA variables. PROC SURVEYSELECT then selects independent samples from these strata, according to the selection method and design parameters specified in the PROC SURVEYSELECT statement. For information on the use of stratification in sample design, refer to Lohr (1999), Kalton (1983), Kish (1965, 1987), and Cochran (1977).

The variables are one or more variables in the input data set. The STRATA variables function much like BY variables, and PROC SURVEYSELECT expects the input data set to be sorted in order of the STRATA variables.

If you specify a CONTROL statement, or if you specify METHOD=PPS, the input data set must be sorted in ascending order of the STRATA variables. This means you cannot use the STRATA option NOTSORTED or DESCENDING when you specify a CONTROL statement or METHOD=PPS.

If your input data set is not sorted by the STRATA variables in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with the STRATA variables in a BY statement.

  • Specify the option NOTSORTED or DESCENDING in the STRATA statement for the SURVEYSELECT procedure (when you do not specify a CONTROL statement or METHOD=PPS). The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the STRATA variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the STRATA variables using the DATASETS procedure.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
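
The first two alternatives might be sketched as follows. The data set names and the variables State and Type are hypothetical, and the placement of NOTSORTED after the variable list is assumed to follow BY statement conventions.

```sas
/* Alternative 1: sort by the STRATA variables, then stratify. */
proc sort data=Customers out=CustomersSorted;
   by State Type;
run;

proc surveyselect data=CustomersSorted out=Sample1
      method=srs sampsize=5 seed=8102;
   strata State Type;
run;

/* Alternative 2: the data are grouped by State but the groups are not */
/* in ascending order. This form is not allowed with a CONTROL         */
/* statement or with METHOD=PPS.                                       */
proc surveyselect data=CustomersGrouped out=Sample2
      method=srs sampsize=5 seed=8102;
   strata State notsorted;
run;
```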




SAS.STAT 9.1 Users Guide (Vol. 6)
Year: 2004
Pages: 127
