Examples | SAS/STAT 9.1 User's Guide (Vol. 6)

The 'Getting Started' section on page 4315 contains examples of analyzing data from simple random sampling and stratified simple random sample designs. This section provides more examples that illustrate how to use PROC SURVEYMEANS.

Example 70.1. Stratified Cluster Sample Design

Consider the example in the section 'Stratified Sampling' on page 4318. The study population is a junior high school with a total of 4,000 students in grades 7, 8, and 9. Researchers want to know how much these students spend weekly for ice cream, on the average, and what percentage of students spend at least $10 weekly for ice cream.

The example in the section 'Stratified Sampling' on page 4318 assumes that the sample of students was selected using a stratified simple random sample design. This example shows analysis based on a more complex sample design.

Suppose that every student belongs to a study group and that study groups are formed within each grade level. Each study group contains between two and four students. Table 70.4 shows the total number of study groups for each grade.

Table 70.4: Study Groups and Students by Grade
Grade	Number of Study Groups	Number of Students
7	608	1,824
8	252	1,025
9	403	1,151
Total	1,263	4,000

It is quicker and more convenient to collect data from students in the same study group than to collect data from students individually. Therefore, this study uses a stratified clustered sample design. The primary sampling units, or clusters, are study groups. The list of all study groups in the school is stratified by grade level. From each grade level, a sample of study groups is randomly selected, and all students in each selected study group are interviewed. The sample consists of eight study groups from the 7th grade, three groups from the 8th grade, and five groups from the 9th grade.

The SAS data set named IceCreamStudy saves the responses of the selected students:

  data IceCreamStudy;
     input Grade StudyGroup Spending @@;
     if (Spending < 10) then Group='less';
        else Group='more';
     datalines;
  7  34  7   7  34  7   7 412  4   9  27 14
  7  34  2   9 230 15   9  27 15   7 501  2
  9 230  8   9 230  7   7 501  3   8  59 20
  7 403  4   7 403 11   8  59 13   8  59 17
  8 143 12   8 143 16   8  59 18   9 235  9
  8 143 10   9 312  8   9 235  6   9 235 11
  9 312 10   7 321  6   8 156 19   8 156 14
  7 321  3   7 321 12   7 489  2   7 489  9
  7  78  1   7  78 10   7 489  2   7 156  1
  7  78  6   7 412  6   7 156  2   9 301  8
  ;

In the data set IceCreamStudy, the variable Grade contains a student's grade. The variable StudyGroup identifies a student's study group. It is possible for students from different grades to have the same study group number because study groups are sequentially numbered within each grade. The variable Spending contains a student's response to how much the student spends per week for ice cream, in dollars. The variable Group indicates whether a student spends at least $10 weekly for ice cream. It is not necessary to store the data in order of grade and study group.

The SAS data set StudyGroups is created to provide PROC SURVEYMEANS with the sample design information shown in Table 70.4:

  data StudyGroups;
     input Grade _total_;
     datalines;
  7 608
  8 252
  9 403
  ;

The variable Grade identifies the strata, and the variable _TOTAL_ contains the total number of study groups in each stratum. As discussed in the section 'Specification of Population Totals and Sampling Rates' on page 4334, the population totals stored in the variable _TOTAL_ should be expressed in terms of the primary sampling units (PSUs), which are study groups in this example. Therefore, the variable _TOTAL_ contains the total number of study groups for each grade, rather than the total number of students.

In order to obtain unbiased estimates, you create sampling weights using the following SAS statements:

  data IceCreamStudy;
     set IceCreamStudy;
     if Grade=7 then Prob=8/608;
     if Grade=8 then Prob=3/252;
     if Grade=9 then Prob=5/403;
     Weight=1/Prob;

The sampling weights are the reciprocals of the selection probabilities. The variable Weight contains the sampling weights. Because the sampling design is clustered, and all students from each selected cluster are interviewed, the sampling weights equal the inverse of the cluster (or study group) selection probabilities.
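The weight arithmetic can be cross-checked outside SAS. The following Python sketch is only an illustration: the dictionary layout and function name are assumptions, while the counts come from Table 70.4, the sample description, and the number of students interviewed in each grade (20, 9, and 11).

```python
# Design information: grade -> (study groups in the population,
#                               study groups sampled,
#                               students interviewed in the sampled groups)
design = {
    7: (608, 8, 20),
    8: (252, 3, 9),
    9: (403, 5, 11),
}


def cluster_weight(population_groups, sampled_groups):
    """Sampling weight = reciprocal of a study group's selection probability."""
    prob = sampled_groups / population_groups
    return 1 / prob


weights = {g: cluster_weight(N, n) for g, (N, n, _) in design.items()}
sampling_rates = {g: n / N for g, (N, n, _) in design.items()}

# Every interviewed student carries the weight of his or her study group.
sum_of_weights = sum(weights[g] * m for g, (_, _, m) in design.items())

for g in design:
    print(f"Grade {g}: weight = {weights[g]:.1f}, "
          f"sampling rate = {sampling_rates[g]:.2%}")
print(f"Sum of weights = {sum_of_weights:.1f}")
```

Under these inputs the weights are 76.0, 84.0, and 80.6 for grades 7, 8, and 9, and the weights summed over all 40 students give 3162.6.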

The following SAS statements perform the analysis for this sample design:

  title1 'Analysis of Ice Cream Spending';
  title2 'Stratified Clustered Sample Design';
  proc surveymeans data=IceCreamStudy total=StudyGroups;
     strata Grade / list;
     cluster StudyGroup;
     var Spending Group;
     weight Weight;
  run;

Output 70.1.1 provides information on the sample design and the input data set. There are 3 strata in the sample design, and the sample contains 16 clusters and 40 observations. The variable Group has two levels, 'less' and 'more'.

Output 70.1.1: Data Summary and Class Information

  Analysis of Ice Cream Spending
  Stratified Clustered Sample Design

  The SURVEYMEANS Procedure

  Data Summary

  Number of Strata                   3
  Number of Clusters                16
  Number of Observations            40
  Sum of Weights                3162.6

  Class Level Information

  Class
  Variable      Levels    Values
  Group              2    less more

Output 70.1.2 displays information for each stratum. Since the primary sampling units in this design are study groups, the population totals shown in Output 70.1.2 are the total numbers of study groups for each stratum or grade. This differs from Figure 70.3 on page 4320, which provides the population totals in terms of students since students were the primary sampling units for that design. Output 70.1.2 also displays the number of clusters for each stratum and analysis variable.

Output 70.1.2: Stratum Information

  Analysis of Ice Cream Spending
  Stratified Clustered Sample Design

  The SURVEYMEANS Procedure

  Stratum Information

  Stratum          Population  Sampling
    Index   Grade       Total      Rate   N Obs  Variable  Level      N
  ---------------------------------------------------------------------
        1       7         608     1.32%      20  Spending            20
                                                 Group     less      17
                                                           more       3
        2       8         252     1.19%       9  Spending             9
                                                 Group     less       0
                                                           more       9
        3       9         403     1.24%      11  Spending            11
                                                 Group     less       6
                                                           more       5
  ---------------------------------------------------------------------

  Stratum Information

  Stratum          Population  Sampling
    Index   Grade       Total      Rate   N Obs  Variable  Level  Clusters
  ------------------------------------------------------------------------
        1       7         608     1.32%      20  Spending                8
                                                 Group     less          8
                                                           more          3
        2       8         252     1.19%       9  Spending                3
                                                 Group     less          0
                                                           more          3
        3       9         403     1.24%      11  Spending                5
                                                 Group     less          4
                                                           more          4
  ------------------------------------------------------------------------
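The observation and cluster counts in Output 70.1.2 can be tallied directly from the input data. The Python sketch below is an illustrative cross-check, not part of the SAS example; the tuples are transcribed from the IceCreamStudy DATA step, and the names `records`, `n_obs`, and `clusters` are assumptions. Clusters are keyed by grade as well as study group number, because the same number can occur in different grades.

```python
from collections import defaultdict

# (Grade, StudyGroup, Spending) triples transcribed from the IceCreamStudy DATA step.
records = [
    (7, 34, 7), (7, 34, 7), (7, 412, 4), (9, 27, 14),
    (7, 34, 2), (9, 230, 15), (9, 27, 15), (7, 501, 2),
    (9, 230, 8), (9, 230, 7), (7, 501, 3), (8, 59, 20),
    (7, 403, 4), (7, 403, 11), (8, 59, 13), (8, 59, 17),
    (8, 143, 12), (8, 143, 16), (8, 59, 18), (9, 235, 9),
    (8, 143, 10), (9, 312, 8), (9, 235, 6), (9, 235, 11),
    (9, 312, 10), (7, 321, 6), (8, 156, 19), (8, 156, 14),
    (7, 321, 3), (7, 321, 12), (7, 489, 2), (7, 489, 9),
    (7, 78, 1), (7, 78, 10), (7, 489, 2), (7, 156, 1),
    (7, 78, 6), (7, 412, 6), (7, 156, 2), (9, 301, 8),
]

n_obs = defaultdict(int)     # (grade, level) -> number of students
clusters = defaultdict(set)  # (grade, level) -> distinct study groups

for grade, study_group, spending in records:
    level = "less" if spending < 10 else "more"
    # Count each student once for Spending ("all") and once for the Group level.
    for key in ((grade, "all"), (grade, level)):
        n_obs[key] += 1
        clusters[key].add(study_group)

for grade in (7, 8, 9):
    for level in ("all", "less", "more"):
        key = (grade, level)
        print(f"Grade {grade} {level:<4}  N = {n_obs[key]:2d}  "
              f"clusters = {len(clusters[key])}")
```

A study group can contribute to both levels of Group, which is why the cluster counts for 'less' and 'more' within a grade can add up to more than the number of sampled study groups.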

Output 70.1.3 displays the estimates of the average weekly ice cream expense and the percentage of students spending at least $10 weekly for ice cream.

Output 70.1.3: Statistics

  Analysis of Ice Cream Spending
  Stratified Clustered Sample Design

  The SURVEYMEANS Procedure

  Statistics

                                                       Std Error       Lower 95%
  Variable    Level            N            Mean         of Mean     CL for Mean
  ------------------------------------------------------------------------------
  Spending                    40        8.923860        0.650859        7.517764
  Group       less            23        0.561437        0.056368        0.439661
              more            17        0.438563        0.056368        0.316787
  ------------------------------------------------------------------------------

  Statistics

                             Upper 95%
  Variable    Level        CL for Mean
  ------------------------------------
  Spending                   10.329957
  Group       less            0.683213
              more            0.560339
  ------------------------------------
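The point estimates in Output 70.1.3 are weighted means, so they can be reproduced from the data and the sampling weights. The Python sketch below is an illustrative cross-check under stated assumptions: the standard error is copied from the output rather than recomputed, and the t quantile assumes 13 degrees of freedom (16 clusters minus 3 strata). All variable names are hypothetical.

```python
# (Grade, StudyGroup, Spending) triples transcribed from the IceCreamStudy DATA step.
records = [
    (7, 34, 7), (7, 34, 7), (7, 412, 4), (9, 27, 14),
    (7, 34, 2), (9, 230, 15), (9, 27, 15), (7, 501, 2),
    (9, 230, 8), (9, 230, 7), (7, 501, 3), (8, 59, 20),
    (7, 403, 4), (7, 403, 11), (8, 59, 13), (8, 59, 17),
    (8, 143, 12), (8, 143, 16), (8, 59, 18), (9, 235, 9),
    (8, 143, 10), (9, 312, 8), (9, 235, 6), (9, 235, 11),
    (9, 312, 10), (7, 321, 6), (8, 156, 19), (8, 156, 14),
    (7, 321, 3), (7, 321, 12), (7, 489, 2), (7, 489, 9),
    (7, 78, 1), (7, 78, 10), (7, 489, 2), (7, 156, 1),
    (7, 78, 6), (7, 412, 6), (7, 156, 2), (9, 301, 8),
]

# Sampling weight per grade: study groups in the population / study groups sampled.
weight = {7: 608 / 8, 8: 252 / 3, 9: 403 / 5}

sum_w = sum(weight[g] for g, _, _ in records)
mean_spending = sum(weight[g] * s for g, _, s in records) / sum_w
share_less = sum(weight[g] for g, _, s in records if s < 10) / sum_w
share_more = 1 - share_less

# Assumptions: SE_MEAN is copied from Output 70.1.3; T_975_DF13 is the 0.975
# quantile of Student's t distribution with 13 degrees of freedom.
SE_MEAN = 0.650859
T_975_DF13 = 2.160369
lower = mean_spending - T_975_DF13 * SE_MEAN
upper = mean_spending + T_975_DF13 * SE_MEAN

print(f"Mean spending : {mean_spending:.6f}")
print(f"Share 'less'  : {share_less:.6f}")
print(f"Share 'more'  : {share_more:.6f}")
print(f"95% CL        : {lower:.6f} to {upper:.6f}")
```

The standard errors themselves depend on the between-cluster variation within each stratum and are not reproduced here.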

Example 70.2. Domain Analysis

Suppose that you are studying profiles of the 800 top-performing companies to provide information on their impact on the economy. You are also interested in the company profiles within each market type. A sample of 66 companies is selected with unequal probability across market types. However, market type is not included in the sample design. Thus, the number of companies within each market type is a random variable in your sample. To obtain statistics within each market type, you should use domain analysis. The data of the 66 companies are saved in the following data set:

  data Company;
     length Type $14;
     input Type$ Asset Sale Value Profit Employee Weight;
     datalines;
  Other            2764.0  1828.0  1850.3   144.0   18.7   9.6
  Energy          13246.2  4633.5  4387.7   462.9   24.3  42.6
  Finance          3597.7   377.8    93.0    14.0    1.1  12.2
  Transportation   6646.1  6414.2  2377.5   348.2   47.1  21.8
  HiTech           1068.4  1689.8  1430.2    72.9    4.6   4.3
  Manufacturing    1125.0  1719.4  1057.5    98.1   20.4   4.5
  Other            1459.0  1241.4   452.7    24.5   20.1   5.5
  Finance          2672.3   262.5   296.2    23.1    2.2   9.3
  Finance           311.0   566.2   932.0    52.8    2.7   1.9
  Energy           1148.6  1014.6   485.1    60.6    4.0   4.5
  Finance          5327.0   572.4   372.9    25.2    4.2  17.7
  Energy           1602.7   678.4   653.0    75.6    2.8   6.0
  Energy           5808.8  1288.4  2007.0   318.8    5.9  19.2
  Medical           268.8   204.4   820.9    45.6    3.7   1.8
  Transportation   5222.6  2627.8  1910.0   245.6   22.8  17.4
  Other             872.7  1419.4   939.3    69.7   12.2   3.7
  Retail           4461.7  8946.8  4662.7   289.0  132.1  15.0
  HiTech           6719.2  6942.0  8240.2   381.3   85.8  22.1
  Retail            833.4  1538.8  1090.3    64.9   15.4   3.5
  Finance           415.9   167.3  1126.8    56.8    0.7   2.2
  HiTech            442.4  1139.9  1039.9    57.6   22.7   2.3
  Other             801.5  1157.0   664.2    56.9   15.5   3.4
  Finance          4954.8   468.8   366.4    41.7    3.0  16.5
  Finance          2661.9   257.9   181.1    21.2    2.1   9.3
  Finance          5345.8   530.1   337.4    36.4    4.3  17.8
  Energy           3334.3  1644.7  1407.8   157.6    6.4  11.4
  Manufacturing    1826.6  2671.7   483.2    71.3   25.3   6.7
  Retail            618.8  2354.7   767.7    58.6   19.0   2.9
  Retail           1529.1  6534.0   826.3    58.3   65.8   5.7
  Manufacturing    4458.4  4824.5  3132.1    28.9   67.0  15.0
  HiTech           5831.7  6611.1  9464.7   459.6   86.7  19.3
  Medical          6468.3  4199.2  3170.4   270.1   59.5  21.3
  Energy           1720.7   473.1   811.1    86.6    1.6   6.3
  Energy           1679.7  1379.9   721.1    91.8    4.5   6.2
  Retail           4018.2 16823.4  2038.3   178.1  162.0  13.6
  Other             227.1   575.8  1083.8    62.6    1.9   1.6
  Finance          3872.8   362.0   209.3    27.6    2.4  13.1
  Retail           3359.3  4844.7  2651.4   224.1   75.6  11.5
  Energy           1295.6   356.9   180.8   162.3    0.6   5.0
  Energy           1658.0   626.6   688.0   126.0    3.5   6.1
  Finance         12156.7  1345.5   680.7   106.6    9.4  39.2
  HiTech           3982.6  4196.0  3946.8   313.9   64.3  13.5
  Finance          8760.7   886.4  1006.9    90.0    7.5  28.5
  Manufacturing    2362.2  3153.3  1080.0   137.0   25.2   8.4
  Transportation   2499.9  3419.0   992.6    47.2   25.3   8.8
  Energy           1430.4  1610.0   664.3    77.7    3.5   5.4
  Energy          13666.5 15465.4  2736.7   411.4   26.6  43.9
  Manufacturing    4069.3  4174.7  2907.6   289.2   38.2  13.7
  Energy           2924.7   711.9  1067.8   146.7    3.4  10.1
  Transportation   1262.1  1716.0   364.3    71.2   14.5   4.9
  Medical           684.4   672.9   287.4    61.8    6.0   3.1
  Energy           3069.3  1719.0  1439.0   196.4    4.9  10.6
  Medical           246.5   318.8   924.1    43.8    3.1   1.7
  Finance         11562.2  1128.5   580.4    64.2    6.7  37.3
  Finance          9316.0  1059.4   816.5    95.9    8.0  30.2
  Retail           1094.3  3848.0   563.3    29.4   44.7   4.4
  Retail           1102.1  4878.3   932.4    65.2   47.3   4.4
  HiTech            466.4   675.8   845.7    64.5    5.2   2.4
  Manufacturing   10839.4  5468.7  1895.4   232.8   47.8  35.0
  Manufacturing     733.5  2135.3    96.6    10.9    2.7   3.2
  Manufacturing   10354.2 14477.4  5607.2   321.9  188.5  33.5
  Energy           1902.1  2697.9   329.3    34.2    2.2   6.9
  Other            2245.2  2132.2  2230.4   198.9    8.0   8.0
  Transportation    949.4  1248.3   298.9    35.4   10.4   3.9
  Retail           2834.4  2884.6   458.2    41.2   49.8   9.8
  Retail           2621.1  6173.8  1992.7   183.7  115.1   9.2
  ;

For each company in your sample,

the variable Type identifies the type of market for the company.
the variable Asset contains the company's assets in millions of dollars.
the variable Sale contains sales in millions of dollars.
the variable Value contains the market value of the company in millions of dollars.
the variable Profit contains the profit in millions of dollars.
the variable Employee stores the number of employees in thousands.
the variable Weight contains the sampling weight.

The following SAS statements use PROC SURVEYMEANS to perform the domain analysis, estimating means and other statistics for the overall population and also for the subpopulations (or domains) defined by market type. The DOMAIN statement specifies Type as the domain variable:

  title1 'Top Companies Profile Study';
  proc surveymeans data=Company total=800 mean sum;
     var Asset Sale Value Profit Employee;
     weight Weight;
     domain Type;
  run;

Output 70.2.1 shows that there are 66 observations in the sample. The sum of the sampling weights equals 799.8, which is close to the total number of companies in the study population.

Output 70.2.1: Company Profile Study

  Top Companies Profile Study

  The SURVEYMEANS Procedure

  Data Summary

  Number of Observations            66
  Sum of Weights                 799.8

  Statistics

                                 Std Error
  Variable              Mean       of Mean           Sum       Std Dev
  --------------------------------------------------------------------
  Asset          6523.488510    720.557075       5217486       1073829
  Sale           4215.995799    839.132506       3371953        847885
  Value          2145.935121    342.531720       1716319        359609
  Profit          188.788210     25.057876        150993         30144
  Employee         36.874869      7.787857         29493   7148.003298
  --------------------------------------------------------------------
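The Mean and Sum columns of Output 70.2.1 are linked through the sum of weights: the estimated total is the weighted sum of the observations, and the estimated mean is that total divided by the sum of weights. A brief Python sketch (an illustrative check only; the dictionary name is hypothetical) confirms that the published figures agree to within rounding:

```python
SUM_OF_WEIGHTS = 799.8

# variable -> (Mean, Sum) as printed in Output 70.2.1
published = {
    "Asset":    (6523.488510, 5217486),
    "Sale":     (4215.995799, 3371953),
    "Value":    (2145.935121, 1716319),
    "Profit":   (188.788210, 150993),
    "Employee": (36.874869, 29493),
}

# Total implied by each mean: mean * sum of weights.
implied_totals = {name: mean * SUM_OF_WEIGHTS
                  for name, (mean, _) in published.items()}

for name, (_, total) in published.items():
    print(f"{name:<9} implied total = {implied_totals[name]:12.1f}   "
          f"published = {total}")
```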

The 'Statistics' table in Output 70.2.1 displays the estimates of the mean and total for all analysis variables for the entire 800 companies, while Output 70.2.2 shows the mean and total estimates for each company type.

Output 70.2.2: Domain Analysis for Company Profile Study

  Top Companies Profile Study

  The SURVEYMEANS Procedure

  Domain Analysis: Type

                                              Std Error
  Type            Variable           Mean       of Mean           Sum       Std Dev
  ---------------------------------------------------------------------------------
  Energy          Asset       7868.302932   1941.699163       1449341        785962
                  Sale        5419.679099   2416.214417        998305        673373
                  Value       2249.297177    520.295162        414321        213580
                  Profit       289.564658     52.512141         53338         25927
                  Employee      14.151194      3.974697   2606.650000   1481.777769
  Finance         Asset       7890.190264   1057.185336       1855773        704506
                  Sale         829.210502    115.762531        195030         74436
                  Value        565.068197     76.964547        132904         48156
                  Profit        63.716837     10.099341         14986   5801.108513
                  Employee       5.806293      0.811555   1365.640000    519.658410
  HiTech          Asset       5031.959781    732.436967        321542        183302
                  Sale        5464.292019    731.296997        349168        196013
                  Value       6707.828482   1194.160584        428630        249154
                  Profit       346.407042     42.299004         22135         12223
                  Employee      70.766980      8.683595   4522.010000   2524.778281
  Manufacturing   Asset       7403.004250   1454.921083        888361        492577
                  Sale        7207.638833   2112.444703        864917        501679
                  Value       2986.442750    799.121544        358373        196979
                  Profit       211.933583     39.993255         25432         13322
                  Employee      83.314333     31.089019   9997.720000   6294.309490
  Medical         Asset       5046.570609   1218.444638        140799        131942
                  Sale        3313.219713    758.216303         92439         85655
                  Value       2561.614695    530.802245         71469         64663
                  Profit       218.682796     44.051447   6101.250000   5509.560969
                  Employee      46.518996     11.135955   1297.880000   1213.651734
  Other           Asset       1850.250000    338.128984         58838         31375
                  Sale        1620.784906    168.686773         51541         24593
                  Value       1432.820755    297.869828         45564         24204
                  Profit       115.089937     27.970560   3659.860000   2018.201371
                  Employee      14.306604      2.313733    454.950000    216.327710
  Retail          Asset       2939.845750    393.692369        235188         94605
                  Sale        7395.453500   1746.187580        591636        263263
                  Value       2103.863125    529.756409        168309         78304
                  Profit       157.171875     31.734253         12574   5478.281027
                  Employee      93.624000     15.726743   7489.920000   3093.832061
  Transportation  Asset       4712.047359    888.954411        267644        163516
                  Sale        4030.233275   1015.555708        228917        142669
                  Value       1703.330282    313.841326         96749         58947
                  Profit       224.762324     56.168925         12767   8287.585418
                  Employee      30.946303      6.786270   1757.750000   1066.586615
  ---------------------------------------------------------------------------------
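The domain point estimates are weighted means and totals computed over the sampled companies that fall in each domain. As an illustration, the Python sketch below recomputes the means and totals for the 'Other' domain from its six sampled companies; the rows are transcribed from the Company DATA step, and the names `FIELDS` and `other_rows` are assumptions. The standard errors are not reproduced, since domain variance estimation uses the whole sample rather than the domain rows alone.

```python
FIELDS = ("Asset", "Sale", "Value", "Profit", "Employee")

# (Asset, Sale, Value, Profit, Employee, Weight) for the six 'Other' companies.
other_rows = [
    (2764.0, 1828.0, 1850.3, 144.0, 18.7, 9.6),
    (1459.0, 1241.4, 452.7, 24.5, 20.1, 5.5),
    (872.7, 1419.4, 939.3, 69.7, 12.2, 3.7),
    (801.5, 1157.0, 664.2, 56.9, 15.5, 3.4),
    (227.1, 575.8, 1083.8, 62.6, 1.9, 1.6),
    (2245.2, 2132.2, 2230.4, 198.9, 8.0, 8.0),
]

# Sum of the sampling weights in the domain (estimated number of companies).
domain_weight = sum(row[-1] for row in other_rows)

# Weighted total and weighted mean of each analysis variable in the domain.
domain_totals = {
    field: sum(row[i] * row[-1] for row in other_rows)
    for i, field in enumerate(FIELDS)
}
domain_means = {field: total / domain_weight
                for field, total in domain_totals.items()}

for field in FIELDS:
    print(f"{field:<9} mean = {domain_means[field]:12.6f}   "
          f"sum = {domain_totals[field]:12.2f}")
```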

Example 70.3. Ratio Analysis

Suppose you are interested in the profit per employee and the sale per employee among the 800 top-performing companies in the data in the previous example. The following SAS statements illustrate how you can use PROC SURVEYMEANS to estimate these ratios:

  title1 'Ratio Analysis in Top Companies Profile Study';
  proc surveymeans data=Company total=800 ratio;
     var Profit Sale Employee;
     weight Weight;
     ratio Profit Sale / Employee;
  run;

The RATIO statement requests the ratio of the profit and the sale to the number of employees.

Output 70.3.1 shows the estimated ratios and their standard errors. Because the profit and the sale figures are in millions of dollars, and the employee numbers are in thousands, each ratio is expressed in thousands of dollars per employee. The profit per employee is therefore estimated as $5,120 with a standard error of $1,059, and the sale per employee is $114,332 with a standard error of $20,503.

Output 70.3.1: Estimate Ratios

  Ratio Analysis in Top Companies Profile Study

  The SURVEYMEANS Procedure

  Ratio Analysis

  Numerator  Denominator         Ratio       Std Err
  --------------------------------------------------
  Sale       Employee       114.332497     20.502742
  Profit     Employee         5.119698      1.058939
  --------------------------------------------------
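The unit conversion behind the dollar figures is simple arithmetic: a numerator in millions of dollars divided by a denominator in thousands of employees yields thousands of dollars per employee. A small Python sketch (variable names are illustrative only) applies that scale factor to the ratios and standard errors printed in Output 70.3.1:

```python
MILLION = 1_000_000   # Profit and Sale are recorded in millions of dollars
THOUSAND = 1_000      # Employee is recorded in thousands of employees

# numerator -> (Ratio, Std Err) as printed in Output 70.3.1
ratios = {
    "Profit": (5.119698, 1.058939),
    "Sale": (114.332497, 20.502742),
}

# Dollars per employee = printed ratio * (1,000,000 / 1,000).
scale = MILLION / THOUSAND
dollars_per_employee = {
    name: (round(ratio * scale), round(std_err * scale))
    for name, (ratio, std_err) in ratios.items()
}

for name, (estimate, std_err) in dollars_per_employee.items():
    print(f"{name} per employee: ${estimate:,} (standard error ${std_err:,})")
```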

Example 70.4. Analyzing Survey Data with Missing Values

As described in the section 'Missing Values' on page 4333, the SURVEYMEANS procedure excludes an observation from the analysis if it has a missing value for the analysis variable or a nonpositive value for the WEIGHT variable.

However, if there is evidence indicating that the nonrespondents are different from the respondents for your study, you can use the DOMAIN statement to compute descriptive statistics among respondents from your survey data without imputation for nonrespondents. Note that although the variance estimation for respondents takes into account the assumption that the study population consists of distinct groups of respondents and nonrespondents, the degrees of freedom will not adjust for the nonrespondents because they are deleted from the computation. As a result, there are fewer degrees of freedom and wider confidence limits in comparison to counting those nonrespondents for degrees of freedom. When the sample size and the number of respondents are large, the difference may be ignored.

Consider the ice cream example in the section 'Stratified Sampling' on page 4318. Suppose that some of the students failed to provide the amounts spent on ice cream, as shown in the following data set IceCream:

  data IceCream;
     input Grade Spending @@;
     datalines;
  7 7  7  7  8  .  9 10  7  .  7 10  7  3  8 20  8 19  7 2
  7 .  9 15  8 16  7  6  7  6  7  6  9 15  8 17  8 14  9 .
  9 8  9  7  7  3  7 12  7  4  9 14  8 18  9  9  7  2  7 1
  7 4  7 11  9  8  8  .  8 13  7  .  9  .  9 11  7  2  7 9
  ;

  data StudentTotals;
     input Grade _total_;
     datalines;
  7 1824
  8 1025
  9 1151
  ;

Considering the possibility that those students who didn't respond spend differently than those students who did respond, you can create an indicator variable to identify the respondents and nonrespondents with the following SAS DATA step statements:

  data IceCream;
     set IceCream;
     if Spending=. then Indicator='Nonrespondent';
     else do;
        Indicator='Respondent';
        if (Spending < 10) then Group='less';
           else Group='more';
     end;
     if Grade=7 then Prob=20/1824;
     if Grade=8 then Prob=9/1025;
     if Grade=9 then Prob=11/1151;
     Weight=1/Prob;

The variable Indicator identifies a student in the data set as either a respondent or a nonrespondent. For respondents, the variable Group specifies whether the student spent at least $10 weekly for ice cream; it is left missing for nonrespondents.

The following SAS statements produce the desired analysis:

  title1 'Analysis of Ice Cream Spending';
  proc surveymeans data=IceCream total=StudentTotals mean sum;
     strata Grade / list;
     var Spending Group;
     weight Weight;
     domain Indicator;
  run;

Output 70.4.1 shows the mean and total estimates excluding those students who failed to provide the spending amount on ice cream.

Output 70.4.1: Analysis of Incomplete Ice Cream Data Excluding Observations

  Analysis of Ice Cream Spending

  The SURVEYMEANS Procedure

  Data Summary

  Number of Strata                   3
  Number of Observations            40
  Sum of Weights                  4000

  Statistics

                                         Std Error
  Variable    Level           Mean         of Mean             Sum         Std Dev
  --------------------------------------------------------------------------------
  Spending                9.770542        0.541381           32139     1780.792065
  Group       less        0.515404        0.067092     1695.345455      220.690305
              more        0.484596        0.067092     1594.004040      220.690305
  --------------------------------------------------------------------------------

Output 70.4.2 shows the mean and total estimates treating respondents as a domain in the student population. Compared to the estimates in Output 70.4.1, the point estimates are the same, but the variance estimates for Spending are higher: the standard error of the mean rises from 0.541381 to 0.652347. The standard errors for Group are unchanged.

Output 70.4.2: Analysis of Incomplete Ice Cream Data Treating Respondents as a Domain

  Analysis of Ice Cream Spending

  The SURVEYMEANS Procedure

  Domain Analysis: Indicator

                                                          Std Error
  Indicator        Variable    Level           Mean         of Mean             Sum
  ---------------------------------------------------------------------------------
  Nonrespondent    Spending                       .               .               .
                   Group       less               .               .               .
                               more               .               .               .
  Respondent       Spending                9.770542        0.652347           32139
                   Group       less        0.515404        0.067092     1695.345455
                               more        0.484596        0.067092     1594.004040
  ---------------------------------------------------------------------------------

  Domain Analysis: Indicator

  Indicator        Variable    Level         Std Dev
  --------------------------------------------------
  Nonrespondent    Spending                        .
                   Group       less                .
                               more                .
  Respondent       Spending              3515.126876
                   Group       less       220.690305
                               more       220.690305
  --------------------------------------------------
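The respondent point estimates reported in Output 70.4.1 and Output 70.4.2 can be cross-checked from the data. The Python sketch below is an illustration only: the pairs are transcribed from the IceCream DATA step with `None` standing in for a missing Spending value, and all variable names are assumptions. It reproduces the sum of weights, the weighted mean, and the weighted totals for the two levels of Group, but not the standard errors.

```python
# (Grade, Spending) pairs transcribed from the IceCream DATA step;
# None marks a student who did not report a spending amount.
records = [
    (7, 7), (7, 7), (8, None), (9, 10), (7, None),
    (7, 10), (7, 3), (8, 20), (8, 19), (7, 2),
    (7, None), (9, 15), (8, 16), (7, 6), (7, 6),
    (7, 6), (9, 15), (8, 17), (8, 14), (9, None),
    (9, 8), (9, 7), (7, 3), (7, 12), (7, 4),
    (9, 14), (8, 18), (9, 9), (7, 2), (7, 1),
    (7, 4), (7, 11), (9, 8), (8, None), (8, 13),
    (7, None), (9, None), (9, 11), (7, 2), (7, 9),
]

# Sampling weight per grade: students in the population / students sampled.
weight = {7: 1824 / 20, 8: 1025 / 9, 9: 1151 / 11}

all_weight = sum(weight[g] for g, _ in records)

# Keep only respondents, as the procedure does for the analysis variables.
respondents = [(g, s) for g, s in records if s is not None]
resp_weight = sum(weight[g] for g, _ in respondents)
resp_total = sum(weight[g] * s for g, s in respondents)
resp_mean = resp_total / resp_weight

less_total = sum(weight[g] for g, s in respondents if s < 10)
more_total = resp_weight - less_total
share_less = less_total / resp_weight

print(f"Sum of weights (all 40 students): {all_weight:.0f}")
print(f"Respondents: {len(respondents)}")
print(f"Mean spending: {resp_mean:.6f}   total: {resp_total:.0f}")
print(f"'less': share {share_less:.6f}, total {less_total:.6f}")
print(f"'more': share {1 - share_less:.6f}, total {more_total:.6f}")
```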