Syntax | SAS/STAT 9.1 User's Guide (Vol. 6)

The following statements are available in PROC SURVEYFREQ.

PROC SURVEYFREQ < options > ;
- BY variables ;
- CLUSTER variables ;
- STRATA variables < / option > ;
- TABLES requests < / options > ;
- WEIGHT variable ;

The PROC SURVEYFREQ statement invokes the procedure, identifies the data set to be analyzed, and provides sample design information. The PROC SURVEYFREQ statement is required.

The TABLES statement specifies frequency or crosstabulation tables and requests tests and statistics for those tables. The STRATA statement lists the variables that form the strata in a stratified sample design. The CLUSTER statement specifies cluster identification variables in a clustered sample design. The WEIGHT statement names the sampling weight variable. You can use a BY statement with PROC SURVEYFREQ to obtain separate analyses for groups defined by the BY variables.

All statements can appear multiple times except the PROC SURVEYFREQ statement and the WEIGHT statement, which can appear only once.

After the description of the PROC SURVEYFREQ statement, the rest of this section gives detailed syntax information for the BY, CLUSTER, STRATA, TABLES, and WEIGHT statements in alphabetical order.
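As a rough sketch of how these statements fit together, the following step analyzes a stratified cluster sample. The data set SchoolSurvey and its variables (Region as the stratum, School as the first-stage cluster, SamplingWeight as the sampling weight, and the analysis variables Grade and Response) are hypothetical and invented only for illustration.

```sas
/* Hypothetical stratified cluster sample: 2 regions (strata),
   2 schools (PSUs) per region, 2 students per school. */
data SchoolSurvey;
   input Region $ School Grade Response $ SamplingWeight;
   datalines;
North 1 7 Yes 25
North 1 8 No  25
North 2 7 Yes 30
North 2 8 No  30
South 3 7 No  40
South 3 8 Yes 40
South 4 7 Yes 35
South 4 8 No  35
;

proc surveyfreq data=SchoolSurvey;
   strata  Region;                   /* first-stage strata           */
   cluster School;                   /* first-stage clusters (PSUs)  */
   weight  SamplingWeight;           /* sampling weights             */
   tables  Response Grade*Response;  /* one-way and two-way tables   */
run;
```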

PROC SURVEYFREQ Statement

PROC SURVEYFREQ < options > ;

The PROC SURVEYFREQ statement invokes the procedure. In this statement, you identify the data set to be analyzed and specify sample design information. The DATA= option names the input data set to be analyzed. If your analysis includes a finite population correction factor, you can input either the sampling rate or the population total using the RATE= or TOTAL= option. If your design is stratified, with different sampling rates or totals for different strata, then you can input these stratum rates or totals in a SAS data set containing the stratification variables.

You can specify the following options in the PROC SURVEYFREQ statement:

DATA= SAS-data-set

names the SAS data set to be analyzed by PROC SURVEYFREQ. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

MISSING

requests that the procedure treat missing values as a valid category for all categorical variables, which include TABLES variables, STRATA variables, and CLUSTER variables. For more information, see the section 'Missing Values' on page 4205.
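A minimal sketch of the MISSING option, assuming a small hypothetical data set (SchoolSurvey, with variables School, Response, and SamplingWeight) in which one Response value is missing. With MISSING, the blank value is treated as a valid category and reported as its own level of the Response table.

```sas
/* Hypothetical data; the '.' in the second row is read as a
   missing (blank) value of the character variable Response. */
data SchoolSurvey;
   input School Response $ SamplingWeight;
   datalines;
1 Yes 20
1 .   20
2 No  30
2 Yes 30
;

/* MISSING makes the blank Response value a valid category,
   so it appears as a separate level in the one-way table. */
proc surveyfreq data=SchoolSurvey missing;
   cluster School;
   weight  SamplingWeight;
   tables  Response;
run;
```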

NOSUMMARY

suppresses the display of the Data Summary table, which PROC SURVEYFREQ produces by default. For a description of this table, see the section 'Data and Sample Design Summary Table' on page 4225.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

specifies the order in which the values of the frequency and crosstabulation table variables are reported. The following table shows how PROC SURVEYFREQ interprets values of the ORDER= option:

DATA	orders values according to their order in the input data set.
FORMATTED	orders values by their formatted values. This order is operating-environment dependent. By default, the order is ascending.
FREQ	orders values by descending frequency count. The frequency count of a variable value is its (nonweighted) frequency of occurrence or sample size, and not its weighted frequency.
INTERNAL	orders values by their unformatted values, which yields the same order that the SORT procedure does. This order is operating-environment dependent.

By default, ORDER=INTERNAL.
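The difference between ordering by unweighted counts and by weighted frequencies can be seen with a hypothetical data set (names invented for illustration) in which the value with more observations has the smaller weighted total. With ORDER=FREQ, Yes (3 observations, weighted total 30) should be listed before No (1 observation, weighted total 50); under the default ORDER=INTERNAL, No would come first.

```sas
/* Hypothetical data: Yes has 3 observations with weight 10 each
   (weighted total 30); No has 1 observation with weight 50. */
data SchoolSurvey;
   input Response $ SamplingWeight;
   datalines;
Yes 10
Yes 10
Yes 10
No  50
;

/* Levels are ordered by descending unweighted count: Yes, then No. */
proc surveyfreq data=SchoolSurvey order=freq;
   weight SamplingWeight;
   tables Response;
run;
```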

PAGE

displays only one table per page. Otherwise, PROC SURVEYFREQ displays multiple tables per page as space permits.

RATE= value | SAS-data-set

R= value | SAS-data-set

specifies the sampling rate as a nonnegative value, or identifies an input data set that gives the stratum sampling rates in a variable named _RATE_. The procedure uses this information to compute a finite population correction for variance estimation. If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of PSUs selected to the total number of PSUs in the population.
For a nonstratified sample design, or for a stratified sample design with the same sampling rate in all strata, you should specify a nonnegative value for the RATE= option. If your design is stratified with different sampling rates in the strata, then you should name a SAS data set that contains the stratification variables and the sampling rates. See the section 'Population Totals and Sampling Rates' on page 4204 for more details.
The sampling rate value must be a nonnegative number. You can specify value as a proportion between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYFREQ converts to a proportion. The procedure treats the value 1 as 100%, not as the percentage form 1%.
If you do not specify the TOTAL= option or the RATE= option, then the variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option.
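A sketch of the two equivalent ways to give a single sampling rate for a nonstratified design, assuming hypothetically that 3 of 30 schools in the population were selected, for a first-stage rate of 3/30 = 0.1. The data set and variable names are invented for illustration.

```sas
/* Hypothetical nonstratified cluster sample: 3 schools (PSUs). */
data SchoolSurvey;
   input School Response $ SamplingWeight;
   datalines;
1 Yes 10
1 No  10
2 Yes 10
3 No  10
;

/* Proportion form: 3 of 30 schools selected, so the rate is 0.1. */
proc surveyfreq data=SchoolSurvey rate=0.1;
   cluster School;
   weight  SamplingWeight;
   tables  Response;
run;

/* Percentage form: RATE=10 is converted to the proportion 0.1, so this
   step should give the same results. (RATE=1 would mean 100%, not 1%.) */
proc surveyfreq data=SchoolSurvey rate=10;
   cluster School;
   weight  SamplingWeight;
   tables  Response;
run;
```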

TOTAL= value | SAS-data-set

N= value | SAS-data-set

specifies the total number of primary sampling units (PSUs) in the study population as a positive value, or identifies an input data set that gives the stratum population totals in a variable named _TOTAL_. The procedure uses this information to compute a finite population correction for variance estimation.
For a nonstratified sample design, or for a stratified sample design with the same population total in all strata, you should specify a positive value for the TOTAL= option. If your sample design is stratified with different population totals in the strata, then you should name a SAS data set that contains the stratification variables and the population totals. See the section 'Population Totals and Sampling Rates' on page 4204 for more details.
If you do not specify the TOTAL= option or the RATE= option, then the variance estimation does not include a finite population correction. You cannot specify both the TOTAL= option and the RATE= option.
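For a stratified design with different population totals in the strata, the totals go in a secondary data set that contains the STRATA variable and _TOTAL_. The sketch below assumes hypothetical populations of 20 schools in the North region and 40 in the South; with two schools sampled in each region, the implied first-stage sampling fractions are 2/20 = 0.10 and 2/40 = 0.05. The LIST option in the STRATA statement displays the sampling fraction for each stratum. All data set and variable names are invented for illustration.

```sas
/* Hypothetical sample: 2 schools (PSUs) sampled per region. */
data SchoolSurvey;
   input Region $ School Grade Response $ SamplingWeight;
   datalines;
North 1 7 Yes 25
North 1 8 No  25
North 2 7 Yes 30
North 2 8 No  30
South 3 7 No  40
South 3 8 Yes 40
South 4 7 Yes 35
South 4 8 No  35
;

/* Stratum population totals: number of PSUs in each region. */
data StrataTotals;
   input Region $ _TOTAL_;
   datalines;
North 20
South 40
;

proc surveyfreq data=SchoolSurvey total=StrataTotals;
   strata  Region / list;   /* Stratum Information table */
   cluster School;
   weight  SamplingWeight;
   tables  Response;
run;
```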

BY Statement

BY variables ;

You can specify a BY statement with PROC SURVEYFREQ to obtain separate analyses on observations in groups defined by the BY variables. The variables are one or more variables in the input data set.

Note that using a BY statement provides completely separate analyses of the BY groups. It does not provide a statistically valid subpopulation or domain analysis. The difference is that in domain analysis the total number of units in the subpopulation is not known with certainty. To obtain a domain analysis, include the domain variable(s) in your TABLES requests. See the section 'Domain Analysis' on page 4205 for more details.

If you specify more than one BY statement, the procedure uses only the last BY statement and ignores any previous BY statements.

When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If your input data set is not sorted in ascending order, use one of the following alternatives:

- Sort the data using the SORT procedure with a similar BY statement.
- Specify the NOTSORTED or DESCENDING option in the BY statement for the SURVEYFREQ procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
- Create an index on the BY variables using the DATASETS procedure.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
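The following sketch contrasts a BY-group analysis with a domain analysis, using a hypothetical data set (all names invented for illustration) in which Region is treated as a subpopulation of interest. The BY step produces completely separate analyses for each region; the last step instead includes Region in the TABLES request, which is the approach this section recommends for domain analysis.

```sas
/* Hypothetical cluster sample; Region is a domain of interest here. */
data SchoolSurvey;
   input Region $ School Grade Response $ SamplingWeight;
   datalines;
North 1 7 Yes 25
North 1 8 No  25
North 2 7 Yes 30
North 2 8 No  30
South 3 7 No  40
South 3 8 Yes 40
South 4 7 Yes 35
South 4 8 No  35
;

/* The BY statement expects data sorted by the BY variables. */
proc sort data=SchoolSurvey;
   by Region;
run;

/* Separate analyses: each region is analyzed as its own sample. */
proc surveyfreq data=SchoolSurvey;
   by      Region;
   cluster School;
   weight  SamplingWeight;
   tables  Response;
run;

/* Domain analysis: Region is a TABLES variable instead. */
proc surveyfreq data=SchoolSurvey;
   cluster School;
   weight  SamplingWeight;
   tables  Region*Response;
run;
```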

CLUSTER Statement

CLUSTER variables ;

The CLUSTER statement names variables that identify the first-stage clusters, or PSUs, in a clustered sample design. The combinations of categories of CLUSTER variables define the clusters in the sample. If there is a STRATA statement, clusters are nested within strata.

If your sample design has clustering at multiple stages, you should specify only the first-stage clusters or primary sampling units (PSUs) in the CLUSTER statement. See the section 'Specifying the Sample Design' on page 4203 for more information.

The CLUSTER variables are one or more variables in the DATA= input data set. These variables can be either character or numeric, but the procedure treats them as categorical variables. The formatted values of the CLUSTER variables determine the CLUSTER variable levels. Thus, you can use formats to group values into levels. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide and to the discussions of the FORMAT statement and SAS formats in SAS Language Reference: Dictionary.

You can use multiple CLUSTER statements to specify CLUSTER variables. The procedure uses variables from all CLUSTER statements to create clusters.
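Because the combinations of CLUSTER variable levels define the clusters, you can identify PSUs whose numbering restarts within a higher-level unit. The sketch below assumes a hypothetical data set in which School is numbered 1, 2 within each District, so the four PSUs are the District*School combinations. Names and values are invented for illustration.

```sas
/* Hypothetical data: School numbers restart within each District. */
data DistrictSurvey;
   input District School Response $ SamplingWeight;
   datalines;
1 1 Yes 10
1 1 No  10
1 2 Yes 12
2 1 No  15
2 1 No  15
2 2 Yes 18
;

/* Clusters are the District*School combinations: (1,1), (1,2),
   (2,1), (2,2), that is, 4 clusters rather than 2.
   Equivalent form: cluster District; cluster School; */
proc surveyfreq data=DistrictSurvey;
   cluster District School;
   weight  SamplingWeight;
   tables  Response;
run;
```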

STRATA Statement

STRATA variables < / option > ;

The STRATA statement names variables that form the strata in a stratified sample design. The combinations of categories of STRATA variables define the strata in the sample, where strata are nonoverlapping subgroups that were sampled independently.

If your sample design has stratification at multiple stages, you should identify only the first-stage strata in the STRATA statement. See the section 'Specifying the Sample Design' on page 4203 for more information.

The STRATA variables are one or more variables in the DATA= input data set. These variables can be either character or numeric, but the procedure treats them as categorical. The formatted values of the STRATA variables determine the STRATA variable levels. Thus, you can use formats to group values into levels. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide and to the discussions of the FORMAT statement and SAS formats in SAS Language Reference: Dictionary.

You can specify the following option in the STRATA statement after a slash (/):

LIST

displays a 'Stratum Information' table, which lists all strata together with the corresponding values of the STRATA variables. This table provides the number of observations and number of clusters for each stratum, as well as the sampling fraction if you specify the RATE= or the TOTAL= option. See the section 'Stratum Information Table' on page 4225 for more information.

TABLES Statement

TABLES requests < / options > ;

The TABLES statement requests one-way to n-way frequency and crosstabulation tables and statistics for those tables.

If you omit the TABLES statement, PROC SURVEYFREQ generates one-way frequency tables for all data set variables that are not listed in the other statements.

The following argument is required in the TABLES statement.

requests

specify the frequency and crosstabulation tables to produce. A request is composed of one variable name or several variable names separated by asterisks. To request a one-way frequency table, use a single variable. To request a two-way crosstabulation table, use an asterisk between two variables. To request a multiway table (an n-way table, where n > 2), separate the desired variables with asterisks. The unique values of these variables form the rows, columns, and layers of the table.
For two-way and multiway tables, the values of the last variable form the crosstabulation table columns, while the values of the next-to-last variable form the rows. Each level (or combination of levels) of the other variables forms one layer. PROC SURVEYFREQ produces a separate crosstabulation table for each layer. For example, a specification of A*B*C*D in a TABLES statement produces k tables, where k is the number of different combinations of levels for A and B. Each table lists the levels for D (columns) within each level of C (rows).
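For instance, with a hypothetical data set containing Region, Urban, Grade, and Response (names and values invented for illustration), a four-way request should produce one crosstabulation table per combination of Region and Urban levels, here 2 × 2 = 4 tables, each with Grade levels as rows and Response levels as columns.

```sas
/* Hypothetical data with two levels each of Region and Urban. */
data SchoolSurvey;
   input Region $ Urban $ Grade Response $ SamplingWeight;
   datalines;
North Yes 7 Yes 20
North Yes 8 No  20
North No  7 Yes 30
North No  8 Yes 30
South Yes 7 No  25
South Yes 8 Yes 25
South No  7 No  40
South No  8 Yes 40
;

/* Layers: each Region*Urban combination gets its own table;
   rows are Grade levels, columns are Response levels. */
proc surveyfreq data=SchoolSurvey;
   weight SamplingWeight;
   tables Region*Urban*Grade*Response;
run;
```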

You can use multiple TABLES statements in the PROC SURVEYFREQ step. You can also specify any number of table requests in a single TABLES statement. To specify multiple table requests quickly, use a grouping syntax by placing parentheses around several variables and joining other variables or variable combinations. For example, the following statements illustrate grouping syntax:

Table 68.1: Grouping Syntax
Request	Equivalent to
tables A*(B C);	tables A*B A*C;
tables (A B)*(C D);	tables A*C B*C A*D B*D;
tables (A B C)*D;	tables A*D B*D C*D;
tables A--C;	tables A B C;
tables (A--C)*D;	tables A*D B*D C*D;

The TABLES statement variables are one or more variables from the DATA= input data set. These variables can be either character or numeric, but the procedure treats them as categorical variables. PROC SURVEYFREQ uses the formatted values of the TABLES variables to determine the categorical variable levels. Thus, if you assign a format to a variable with a FORMAT statement, PROC SURVEYFREQ formats the values before dividing observations into the levels of a frequency or crosstabulation table. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide and to the discussions of the FORMAT statement and SAS formats in SAS Language Reference: Dictionary.
The frequency or crosstabulation table lists the values of both character and numeric variables in ascending order based on internal (unformatted) variable values unless you change the order with the ORDER= option. To list the values in ascending order by formatted value, use ORDER=FORMATTED in the PROC SURVEYFREQ statement.
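A sketch of grouping values into levels with a user-defined format, assuming a hypothetical data set and a hypothetical format named GradeBand that maps grades 6 and 7 to 'Lower' and grades 8 and 9 to 'Upper'. The table should then have two Grade levels rather than four, listed by formatted value.

```sas
/* Hypothetical data with four distinct Grade values. */
data SchoolSurvey;
   input Grade Response $ SamplingWeight;
   datalines;
6 Yes 20
7 No  20
8 Yes 30
9 No  30
;

/* Hypothetical format that collapses grades into two bands. */
proc format;
   value GradeBand 6-7 = 'Lower'
                   8-9 = 'Upper';
run;

/* Levels are the formatted values; ORDER=FORMATTED lists them
   in ascending order of the formatted values. */
proc surveyfreq data=SchoolSurvey order=formatted;
   format Grade GradeBand.;
   weight SamplingWeight;
   tables Grade*Response;
run;
```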

Without Options

If you request a frequency or crosstabulation table without specifying options, PROC SURVEYFREQ produces the following for each table level or cell:

- frequency (sample size)
- weighted frequency (estimated total)
- standard error of weighted frequency
- percentage (estimated proportion)
- standard error of percentage

The table displays weighted frequencies if your analysis includes a WEIGHT statement, or if you specify the WTFREQ option in the TABLES statement. The table also displays the number of observations with missing values. See the section 'One-Way Frequency Tables' on page 4226 and the section 'Crosstabulation Tables' on page 4227 for more information.

Options

The following table lists the options available with the TABLES statement. Descriptions follow in alphabetical order.

Table 68.2: TABLES Statement Options
Option	Description
Control Statistical Analysis
ALPHA=	sets the level for confidence limits
CHISQ	requests Rao-Scott chi-square test
CHISQ1	requests Rao-Scott modified chi-square test
DDF=	specifies denominator DF for Wald chi-square tests
LRCHISQ	requests Rao-Scott likelihood ratio test
LRCHISQ1	requests Rao-Scott modified likelihood ratio test
TESTP=	specifies null proportions for one-way chi-square tests
WCHISQ	requests Wald chi-square test
WLLCHISQ	requests Wald log-linear chi-square test
Control Additional Table Information
CL	displays confidence limits for percents
CLWT	displays confidence limits for weighted frequencies
COL	displays column percents and standard errors
CV	displays coefficients of variation for percents
CVWT	displays coefficients of variation for weighted frequencies
DEFF	displays design effects for percents
EXPECTED	displays expected weighted frequencies for two-way tables
ROW	displays row percents and standard errors
VAR	displays variances for percents
VARWT	displays variances for weighted frequencies
WTFREQ	displays weighted frequencies and standard errors when there is no WEIGHT statement
Control Displayed Output
NOFREQ	suppresses display of frequency counts
NOPERCENT	suppresses display of percents
NOPRINT	suppresses display of tables but displays statistical tests
NOSPARSE	suppresses display of zero rows and columns in two-way tables
NOSTD	suppresses display of standard errors for all estimates
NOTOTAL	suppresses display of row and column totals
NOWT	suppresses display of weighted frequencies

You can specify the following options in a TABLES statement:

ALPHA= α

sets the level for confidence limits. The value of the ALPHA= option must be between 0 and 1, and the default is 0.05. An ALPHA= value of α produces 100(1 − α)% confidence limits. The default of ALPHA=0.05 produces 95% confidence limits.
You request confidence limits for percentages with the CL option, and you request confidence limits for weighted frequencies with the CLWT option. See the section 'Confidence Limits' on page 4213 for more information.
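A minimal sketch requesting 90% confidence limits, since ALPHA=0.1 gives 100(1 − 0.1)% = 90%. The data set and variable names are hypothetical.

```sas
/* Hypothetical cluster sample. */
data SchoolSurvey;
   input School Response $ SamplingWeight;
   datalines;
1 Yes 20
1 No  20
2 Yes 30
2 Yes 30
3 No  25
3 Yes 25
;

/* CL: limits for percentages; CLWT: limits for weighted frequencies.
   ALPHA=0.1 requests 90% limits instead of the default 95%. */
proc surveyfreq data=SchoolSurvey;
   cluster School;
   weight  SamplingWeight;
   tables  Response / cl clwt alpha=0.1;
run;
```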

CHISQ

requests the Rao-Scott chi-square test. This test applies a design effect correction to the Pearson chi-square statistic computed from the weighted frequencies. See the section 'Rao-Scott Chi-Square Test' on page 4216 for more information.
By default for one-way tables, the CHISQ option provides a design-based goodness-of-fit test for equal proportions. To compute the test for other null hypothesis proportions, specify the null proportions with the TESTP= option.
The CHISQ option uses proportion estimates to compute the design effect correction. To use null hypothesis proportions instead, specify the CHISQ1 option.

CHISQ1

requests the Rao-Scott modified chi-square test. This test applies a design effect correction to the Pearson chi-square statistic computed from the weighted frequencies, and bases the design effect correction on null hypothesis proportions. See the section 'Rao-Scott Chi-Square Test' on page 4216 for more information. To compute the design effect correction from proportion estimates instead of null proportions, specify the CHISQ option.
By default for one-way tables, the CHISQ1 option provides a design-based goodness-of-fit test for equal proportions. To compute the test for other null hypothesis proportions, specify the null proportions with the TESTP= option.
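A sketch requesting both Rao-Scott chi-square tests for a two-way table, so that the design effect correction based on proportion estimates (CHISQ) can be compared with the one based on null hypothesis proportions (CHISQ1). The data set and variable names are hypothetical.

```sas
/* Hypothetical stratified cluster sample. */
data SchoolSurvey;
   input Region $ School Grade Response $ SamplingWeight;
   datalines;
North 1 7 Yes 25
North 1 8 No  25
North 2 7 Yes 30
North 2 8 No  30
South 3 7 No  40
South 3 8 Yes 40
South 4 7 Yes 35
South 4 8 No  35
;

/* Test of independence of Grade and Response under the design. */
proc surveyfreq data=SchoolSurvey;
   strata  Region;
   cluster School;
   weight  SamplingWeight;
   tables  Grade*Response / chisq chisq1;
run;
```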

CL

requests confidence limits for the percentages, or proportion estimates, in the crosstabulation table. PROC SURVEYFREQ determines the confidence coefficient from the ALPHA= option, which by default equals 0.05 and produces 95% confidence limits. See the section 'Confidence Limits' on page 4213 for more information.

CLWT

requests confidence limits for the weighted frequencies, or estimated totals, in the crosstabulation table. PROC SURVEYFREQ determines the confidence coefficient from the ALPHA= option, which by default equals 0.05 and produces 95% confidence limits. See the section 'Confidence Limits' on page 4213 for more information.

COL

displays the column percentage, or estimated proportion of the column total, for each cell in a two-way table. The COL option also displays the standard errors of the column percentages. See the section 'Row and Column Proportions' on page 4212 for more information. This option has no effect for one-way tables.

CV

displays the coefficient of variation for each percentage, or proportion estimate, in the crosstabulation table. See the section 'Coefficient of Variation' on page 4214 for more information.

CVWT

displays the coefficient of variation for each weighted frequency, or estimated total, in the crosstabulation table. See the section 'Coefficient of Variation' on page 4214 for more information.

DDF= df

specifies the denominator degrees of freedom for the F-statistics used in the Wald chi-square tests. By default, the denominator degrees of freedom is the number of clusters minus the number of strata. See the section 'Wald Chi-Square Test' on page 4221 and the section 'Wald Log-Linear Chi-Square Test' on page 4223 for more information. You request the Wald chi-square test with the WCHISQ option, and you request the Wald log-linear chi-square test with the WLLCHISQ option.
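A sketch of the default and an overridden denominator degrees of freedom. In this hypothetical design there are 4 clusters in 2 strata, so the default is 4 − 2 = 2; the DDF=3 value below is an arbitrary illustration of the syntax, not a recommendation.

```sas
/* Hypothetical design: 2 strata, 2 clusters per stratum. */
data SchoolSurvey;
   input Region $ School Grade Response $ SamplingWeight;
   datalines;
North 1 7 Yes 25
North 1 8 No  25
North 2 7 Yes 30
North 2 8 No  30
South 3 7 No  40
South 3 8 Yes 40
South 4 7 Yes 35
South 4 8 No  35
;

/* Default DDF would be 4 clusters - 2 strata = 2.
   DDF=3 is a hypothetical override for illustration only. */
proc surveyfreq data=SchoolSurvey;
   strata  Region;
   cluster School;
   weight  SamplingWeight;
   tables  Grade*Response / wchisq wllchisq ddf=3;
run;
```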

DEFF

displays the design effect for each overall proportion estimate in the crosstabulation table. See the section 'Design Effect' on page 4215 for more information.

EXPECTED

displays expected weighted frequencies for the table cells in a two-way table. The expected frequencies are computed under the null hypothesis that the row and column variables are independent. See the section 'Expected Weighted Frequency' on page 4215 for more information. This option has no effect for one-way tables.

LRCHISQ

requests the Rao-Scott likelihood ratio chi-square test. This test applies a design effect correction to the likelihood ratio chi-square statistic computed from the weighted frequencies. See the section 'Rao-Scott Likelihood Ratio Chi-Square Test' on page 4219 for more information.
By default for one-way tables, the LRCHISQ option provides a design-based test for equal proportions. To compute the test for other null hypothesis proportions, specify the null proportions with the TESTP= option.
The LRCHISQ option uses proportion estimates to compute the design effect correction. To use null hypothesis proportions instead, specify the LRCHISQ1 option.

LRCHISQ1

requests the Rao-Scott modified likelihood ratio chi-square test. This test applies a design effect correction to the likelihood ratio chi-square statistic computed from the weighted frequencies, and bases the design effect correction on null hypothesis proportions. See the section 'Rao-Scott Likelihood Ratio Chi-Square Test' on page 4219 for more information. To compute the design effect correction from proportion estimates instead of null proportions, specify the LRCHISQ option.
By default for one-way tables, the LRCHISQ1 option provides a design-based test for equal proportions. To compute the test for other null hypothesis proportions, specify the null proportions with the TESTP= option.

NOFREQ

suppresses the display of cell frequencies in the crosstabulation table. The NOFREQ option also suppresses the display of row, column, and overall table frequencies.

NOPERCENT

suppresses the display of cell percentages in the crosstabulation table. The NOPERCENT option also suppresses the display of standard errors of the percentages.

NOPRINT

suppresses the display of frequency and crosstabulation tables but displays all requested statistical tests. Note that this option disables the Output Delivery System (ODS) for the suppressed tables. For more information, see Chapter 14, 'Using the Output Delivery System.'

NOSPARSE

suppresses the display of variable levels with zero frequency in two-way tables. By default, the procedure displays all levels of the column variable within each level of the row variable, including any column variable levels with zero frequency for that row. For multiway tables, the procedure displays all levels of the row variable for each layer of the table by default, including any row variable levels with zero frequency for the layer. Also by default, the procedure displays all variable levels that occur in the input data set, including those levels with no observations actually used in the analysis due to missing or nonpositive weights or missing values. See the section 'Missing Values' on page 4205 for details.

NOSTD

suppresses the display of all standard errors in the crosstabulation table.

NOTOTAL

suppresses the display of row totals, column totals, and overall totals in the crosstabulation table.

NOWT

suppresses the display of weighted frequencies in the crosstabulation table. The NOWT option also suppresses the display of standard errors of the weighted frequencies.

ROW

displays the row percentage, or estimated proportion of the row total, for each cell in a two-way table. The ROW option also displays the standard errors of the row percentages. See the section 'Row and Column Proportions' on page 4212 for more information. This option has no effect for one-way tables.

TESTP=( values )

specifies null hypothesis proportions, or test percentages, for one-way chi-square tests. You can separate values with blanks or commas. Specify values in probability form as numbers between 0 and 1, where the proportions sum to 1. Or specify values in percentage form as numbers between 0 and 100, where the percentages sum to 100. PROC SURVEYFREQ treats the value 1 as the percentage form 1%. The number of TESTP= values must equal the number of variable levels in the one-way table. List these values in the order in which the corresponding variable levels appear in the output.
When you specify the TESTP= option, PROC SURVEYFREQ displays the specified test percentages in the one-way frequency table. The TESTP= option has no effect for two-way tables.
PROC SURVEYFREQ uses the TESTP= values for all one-way chi-square tests you request in the TABLES statement. The available one-way chi-square tests include the Rao-Scott (Pearson) chi-square test and the Rao-Scott likelihood ratio chi-square test and their modified versions, requested by options CHISQ, CHISQ1, LRCHISQ, and LRCHISQ1. See the section 'Rao-Scott Chi-Square Test' on page 4216 and the section 'Rao-Scott Likelihood Ratio Chi-Square Test' on page 4219 for more details.
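A sketch of one-way tests against unequal null proportions. In this hypothetical data set the Response levels appear in internal order (No, then Yes), so TESTP=(60 40) specifies null proportions of 60% for No and 40% for Yes; the percentages sum to 100. The names and values are invented for illustration.

```sas
/* Hypothetical cluster sample. */
data SchoolSurvey;
   input School Response $ SamplingWeight;
   datalines;
1 Yes 20
1 No  20
2 No  30
2 Yes 30
3 No  25
3 No  25
;

/* Null proportions No = 60%, Yes = 40%, listed in output order.
   TESTP=(0.6 0.4) in probability form would be equivalent.
   Both requested one-way tests use these TESTP= values. */
proc surveyfreq data=SchoolSurvey;
   cluster School;
   weight  SamplingWeight;
   tables  Response / chisq lrchisq testp=(60 40);
run;
```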

VAR

displays the variance estimate for each percentage in the crosstabulation table. See the section 'Proportions' on page 4210 for details.

VARWT

displays the variance estimate for each weighted frequency, or estimated total, in the crosstabulation table. See the section 'Totals' on page 4209 for details.

WCHISQ

requests the Wald chi-square test. See the section 'Wald Chi-Square Test' on page 4221 for more information. By default, the denominator degrees of freedom for the Wald test F-statistic is the number of clusters minus the number of strata. Alternatively, you can specify the denominator degrees of freedom with the DDF= option.

WLLCHISQ

requests the Wald log-linear chi-square test. See the section 'Wald Log-Linear Chi-Square Test' on page 4223 for more information. By default, the denominator degrees of freedom for the Wald test F-statistic is the number of clusters minus the number of strata. Alternatively, you can specify the denominator degrees of freedom with the DDF= option.

WTFREQ

displays the weighted frequencies and their standard errors when you do not specify a WEIGHT statement. PROC SURVEYFREQ displays the weighted frequencies by default when the analysis includes a WEIGHT statement. Without a WEIGHT statement, PROC SURVEYFREQ assigns all observations a weight of 1.
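A minimal sketch of WTFREQ without a WEIGHT statement, using a hypothetical data set. Because every observation then has weight 1, the weighted frequencies should equal the unweighted counts, but their standard errors still reflect the clustered design.

```sas
/* Hypothetical cluster sample with no weight variable. */
data SchoolSurvey;
   input School Response $;
   datalines;
1 Yes
1 No
2 Yes
2 Yes
;

/* No WEIGHT statement: all weights are 1. WTFREQ displays the
   weighted frequencies and their standard errors anyway. */
proc surveyfreq data=SchoolSurvey;
   cluster School;
   tables  Response / wtfreq;
run;
```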

WEIGHT Statement

WEIGHT variable ;

The WEIGHT statement names the variable that contains the sampling weights. This variable must be numeric. If you do not specify a WEIGHT statement, PROC SURVEYFREQ assigns all observations a weight of 1. Sampling weights must be positive numbers. If an observation has a weight that is nonpositive or missing, then the procedure omits that observation from the analysis. See the section 'Missing Values' on page 4205 for more information. If you specify more than one WEIGHT statement, the procedure uses only the first WEIGHT statement and ignores the rest.