Getting Started


This section demonstrates how you can use PROC SURVEYREG to perform a regression analysis for sample survey data. For a complete description of PROC SURVEYREG usage, see the section 'Syntax'. The 'Examples' section provides more detailed examples that illustrate the applications of PROC SURVEYREG.

Simple Random Sampling

Suppose that, in a junior high school, there are a total of 4,000 students in grades 7, 8, and 9. You want to know how household income and the number of children in a household affect students' average weekly spending for ice cream.

In order to answer this question, you draw a sample using simple random sampling from the student population in the junior high school. You randomly select 40 students and ask them their average weekly expenditure for ice cream, their household income, and the number of children in their household. The answers from the 40 students are saved as a SAS data set:

  data IceCream;
     input Grade Spending Income Kids @@;
     datalines;
  7   7  39  2   7   7  38  1   8  12  47  1   9  10  47  4
  7   1  34  4   7  10  43  2   7   3  44  4   8  20  60  3
  8  19  57  4   7   2  35  2   7   2  36  1   9  15  51  1
  8  16  53  1   7   6  37  4   7   6  41  2   7   6  39  2
  9  15  50  4   8  17  57  3   8  14  46  2   9   8  41  2
  9   8  41  1   9   7  47  3   7   3  39  3   7  12  50  2
  7   4  43  4   9  14  46  3   8  18  58  4   9   9  44  3
  7   2  37  1   7   1  37  2   7   4  44  2   7  11  42  2
  9   8  41  2   8  10  42  2   8  13  46  1   7   2  40  3
  9   6  45  1   9  11  45  4   7   2  36  1   7   9  46  1
  ;

In the data set IceCream, the variable Grade indicates a student's grade. The variable Spending contains the dollar amount of each student's average weekly spending for ice cream. The variable Income specifies the household income, in thousands of dollars. The variable Kids indicates how many children are in a student's family.

The following PROC SURVEYREG statements request a regression analysis:

  title1 'Ice Cream Spending Analysis';
  title2 'Simple Random Sample Design';
  proc surveyreg data=IceCream total=4000;
     class Kids;
     model Spending = Income Kids / solution anova;
  run;

The PROC SURVEYREG statement invokes the procedure. The TOTAL=4000 option specifies the population total, that is, the 4,000 students from which the sample is drawn. The CLASS statement requests that the procedure use the variable Kids as a classification variable in the analysis. The MODEL statement describes the linear model that you want to fit, with Spending as the dependent variable and Income and Kids as the independent variables. The SOLUTION option in the MODEL statement requests that the procedure output the regression coefficient estimates. The ANOVA option requests that the procedure output the ANOVA table.
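As an aside, the same design information can be given as a sampling rate instead of a population total. The following is a minimal sketch, assuming the RATE= option of the PROC SURVEYREG statement and a rate of 40/4000 = 0.01; it is an illustration added here, not part of the original example.

```sas
/* Sketch: supply the sampling rate (40 sampled / 4000 students = 0.01) */
/* in place of the population total.                                    */
proc surveyreg data=IceCream rate=0.01;
   class Kids;
   model Spending = Income Kids / solution anova;
run;
```

The finite population correction implied by RATE=0.01 matches the one implied by TOTAL=4000 for this sample of 40, so the results should agree. The figures in this section come from the TOTAL=4000 run.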

Figure 71.1 displays the summary of the data, the summary of the fit, and the levels of the classification variable Kids. The 'Fit Statistics' table displays the denominator degrees of freedom, which are used in F tests and t tests in the regression analysis.

  Ice Cream Spending Analysis
  Simple Random Sample Design

  The SURVEYREG Procedure

  Regression Analysis for Dependent Variable Spending

  Data Summary

  Number of Observations            40
  Mean of Spending             8.75000
  Sum of Spending            350.00000

  Fit Statistics

  R-square            0.8132
  Root MSE            2.4506
  Denominator DF          39

  Class Level Information

  Class
  Variable      Levels    Values
  Kids               4    1 2 3 4

Figure 71.1: Summary of Data

Figure 71.2 displays the ANOVA table for the regression and the tests for model effects. The effect Income is significant in the linear regression model, while the effect Kids is not significant at the 5% level.

  Ice Cream Spending Analysis
  Simple Random Sample Design

  The SURVEYREG Procedure

  Regression Analysis for Dependent Variable Spending

  Tests of Model Effects

  Effect       Num DF    F Value    Pr > F
  Model             4     119.15    <.0001
  Intercept         1     153.32    <.0001
  Income            1     324.45    <.0001
  Kids              3       0.92    0.4385

  NOTE: The denominator degrees of freedom for the F tests is 39.

Figure 71.2: Testing Effects in the Regression
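The test for Kids in Figure 71.2 is an overall test with 3 numerator degrees of freedom. If one specific comparison between levels is of interest, a CONTRAST statement can be added. The following is a sketch only: the label text and the choice of comparison are illustrative assumptions, and the coefficients rely on the level order 1 2 3 4 reported in the 'Class Level Information' table.

```sas
proc surveyreg data=IceCream total=4000;
   class Kids;
   model Spending = Income Kids / solution;
   /* Hypothetical comparison: one-child versus four-child households. */
   /* Coefficients follow the Kids level order 1 2 3 4.                */
   contrast 'Kids 1 vs Kids 4' Kids 1 0 0 -1;
run;
```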

The regression coefficient estimates and their standard errors and associated t tests are displayed in Figure 71.3.

  Ice Cream Spending Analysis
  Simple Random Sample Design

  The SURVEYREG Procedure

  Regression Analysis for Dependent Variable Spending

  Estimated Regression Coefficients

                                 Standard
  Parameter      Estimate         Error    t Value    Pr > |t|
  Intercept    -26.084677    2.46720403     -10.57      <.0001
  Income         0.775330    0.04304415      18.01      <.0001
  Kids 1         0.897655    1.12352876       0.80      0.4292
  Kids 2         1.494032    1.24705263       1.20      0.2381
  Kids 3        -0.513181    1.33454891      -0.38      0.7027
  Kids 4         0.000000    0.00000000        .         .

  NOTE: The denominator degrees of freedom for the t tests is 39.
        Matrix X'X is singular and a generalized inverse was used to solve the
        normal equations. Estimates are not unique.

Figure 71.3: Regression Coefficients
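Because the estimate for Kids 4 is set to zero, each of the other Kids coefficients is a difference from four-child households at the same income. To obtain a fitted mean for one particular combination of predictors, an ESTIMATE statement can be used. The sketch below assumes an illustrative household with Income=45 and Kids=2; these values and the label are chosen here for demonstration and do not come from the original example.

```sas
proc surveyreg data=IceCream total=4000;
   class Kids;
   model Spending = Income Kids / solution;
   /* Fitted mean: Intercept + 45*Income + effect of Kids level 2. */
   /* Kids coefficients follow the level order 1 2 3 4.            */
   estimate 'Income=45, Kids=2' intercept 1 Income 45 Kids 0 1 0 0;
run;
```

Writing the combination as an ESTIMATE statement lets the procedure report a design-based standard error for the fitted mean, which a hand calculation from the coefficient table would not provide.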

Stratified Sampling

Suppose that the previous student sample is actually drawn using a stratified sample design. The strata are grades in the junior high school: 7, 8, and 9. Within strata, simple random samples are selected. Table 71.1 provides the number of students in each grade.

Table 71.1: Students in Grades

  Grade    Number of Students
  7                     1,824
  8                     1,025
  9                     1,151
  Total                 4,000

In order to analyze this sample using PROC SURVEYREG, you need to input the stratification information by creating a SAS data set with the information in Table 71.1. The following SAS statements create such a data set called StudentTotals:

  data StudentTotals;
     input Grade _TOTAL_;
     datalines;
  7 1824
  8 1025
  9 1151
  ;

The variable Grade is the stratification variable, and the variable _TOTAL_ contains the total number of students in each stratum of the survey population. PROC SURVEYREG requires you to use the keyword _TOTAL_ as the name of the variable that contains the population total information.

In a stratified sample design, when the sampling rates in the strata are unequal, you need to use sampling weights to reflect this information. For this example, the appropriate sampling weights are the reciprocals of the probabilities of selection. You can use the following DATA step to create the sampling weights:

  data IceCream;
     set IceCream;
     if Grade=7 then Prob=20/1824;
     if Grade=8 then Prob=9/1025;
     if Grade=9 then Prob=11/1151;
     Weight=1/Prob;
  run;

If you use PROC SURVEYSELECT to select your sample, PROC SURVEYSELECT creates these sampling weights for you.
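As a rough sketch of that route, the statements below build a hypothetical sampling frame and draw a stratified simple random sample with the same stratum sample sizes (20, 9, and 11). The data set names StudentFrame and IceCreamSample, the variable StudentID, and the seed value are assumptions made for illustration; the original example does not show how the sample was drawn.

```sas
/* Hypothetical frame: one record per student, already sorted by Grade. */
data StudentFrame;
   array GradeSize{7:9} _temporary_ (1824 1025 1151);
   do Grade = 7 to 9;
      do StudentID = 1 to GradeSize{Grade};
         output;
      end;
   end;
run;

/* Stratified simple random sample: 20, 9, and 11 students */
/* from grades 7, 8, and 9, in that order.                 */
proc surveyselect data=StudentFrame method=srs n=(20 9 11)
                  seed=12345 out=IceCreamSample;
   strata Grade;
run;
```

For a stratified design such as this one, the OUT= data set is expected to carry the selection probability and sampling weight alongside the frame variables (named SelectionProb and SamplingWeight, to the best of my knowledge), so the weight variable could be named directly in the WEIGHT statement of PROC SURVEYREG once the survey responses are merged in.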

The following statements demonstrate how you can fit a linear model while incorporating the sample design information (stratification):

  title1 'Ice Cream Spending Analysis';
  title2 'Stratified Simple Random Sample Design';
  proc surveyreg data=IceCream total=StudentTotals;
     strata Grade /list;
     class Kids;
     model Spending = Income Kids / solution anova;
     weight Weight;
  run;

Comparing these statements to those in the section 'Simple Random Sampling', you can see how the TOTAL=StudentTotals option replaces the previous TOTAL=4000 option.

The STRATA statement specifies the stratification variable Grade. The LIST option in the STRATA statement requests that the stratification information be included in the output. The WEIGHT statement specifies the weight variable.

Figure 71.4 summarizes the data information, the sample design information, and the fit information. Because of the stratification, the denominator degrees of freedom for the F tests and t tests are 37 (40 observations minus 3 strata), rather than the 39 shown in Figure 71.1.

  Ice Cream Spending Analysis
  Stratified Simple Random Sample Design

  The SURVEYREG Procedure

  Regression Analysis for Dependent Variable Spending

  Data Summary

  Number of Observations               40
  Sum of Weights                   4000.0
  Weighted Mean of Spending       9.14130
  Weighted Sum of Spending        36565.2

  Design Summary

  Number of Strata             3

  Fit Statistics

  R-square            0.8219
  Root MSE            2.4185
  Denominator DF          37

Figure 71.4: Summary of the Regression
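The 'Data Summary' values offer a quick check on the weights. Each weight is the stratum total divided by the stratum sample size, so the weights should sum to 1824 + 1025 + 1151 = 4000. A minimal sketch of that check with PROC MEANS, assuming the IceCream data set already contains the Weight variable created earlier:

```sas
/* Unweighted sum of the weights: should equal the population size. */
proc means data=IceCream n sum;
   var Weight;
run;

/* Weighted mean and weighted sum of Spending, for comparison */
/* with the 'Data Summary' table.                             */
proc means data=IceCream mean sum;
   var Spending;
   weight Weight;
run;
```

If the sum of the weights differs from the population total, the probabilities in the DATA step probably do not match the stratum sample sizes.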

For each stratum, Figure 71.5 displays the value of identifying variables, the number of observations (sample size), the total population size, and the calculated sampling rate or fraction.

  Ice Cream Spending Analysis
  Stratified Simple Random Sample Design

  The SURVEYREG Procedure

  Regression Analysis for Dependent Variable Spending

  Stratum Information

  Stratum                           Population    Sampling
  Index     Grade       N Obs           Total        Rate
  1         7            20            1824      1.10%
  2         8             9            1025      0.88%
  3         9            11            1151      0.96%

  Class Level Information

  Class
  Variable      Levels    Values
  Kids               4    1 2 3 4

Figure 71.5: Stratification and Classification Information
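The sampling rate in each stratum is the number of sampled students divided by the stratum total, for example 20/1824 for grade 7. The sketch below derives the rates from the data rather than typing them in, which is one way to cross-check the 'Stratum Information' table. The data set names GradeCounts and SamplingRates and the variable names Rate and CheckWeight are assumptions introduced for this illustration.

```sas
/* Count the sampled students in each grade. */
proc freq data=IceCream noprint;
   tables Grade / out=GradeCounts(keep=Grade Count);
run;

/* Combine the sample counts with the stratum totals. */
data SamplingRates;
   merge GradeCounts StudentTotals;
   by Grade;
   Rate        = Count / _TOTAL_;   /* sampling fraction            */
   CheckWeight = _TOTAL_ / Count;   /* reciprocal, i.e. the weight  */
   format Rate percent8.2;
run;

proc print data=SamplingRates noobs;
run;
```

Both input data sets are in Grade order here, so no sort is needed before the merge.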

Figure 71.6 displays the ANOVA table for the regression and tests for the significance of model effects under the stratified sample design. The Income effect is strongly significant, while the Kids effect is not significant at the 5% level.

  Ice Cream Spending Analysis
  Stratified Simple Random Sample Design

  The SURVEYREG Procedure

  Regression Analysis for Dependent Variable Spending

  Tests of Model Effects

  Effect       Num DF    F Value    Pr > F
  Model             4     124.85    <.0001
  Intercept         1     150.95    <.0001
  Income            1     326.89    <.0001
  Kids              3       0.99    0.4081

  NOTE: The denominator degrees of freedom for the F tests is 37.

Figure 71.6: Testing Effects

The regression coefficient estimates for the stratified sample, along with their standard errors and associated t tests, are displayed in Figure 71.7.

  Ice Cream Spending Analysis
  Stratified Simple Random Sample Design

  The SURVEYREG Procedure

  Regression Analysis for Dependent Variable Spending

  Estimated Regression Coefficients

                                 Standard
  Parameter      Estimate         Error    t Value    Pr > |t|
  Intercept    -26.086882    2.44108058     -10.69      <.0001
  Income         0.776699    0.04295904      18.08      <.0001
  Kids 1         0.888631    1.07000634       0.83      0.4116
  Kids 2         1.545726    1.20815863       1.28      0.2087
  Kids 3        -0.526817    1.32748011      -0.40      0.6938
  Kids 4         0.000000    0.00000000        .         .

  NOTE: The denominator degrees of freedom for the t tests is 37.
        Matrix X'WX is singular and a generalized inverse was used to solve the
        normal equations. Estimates are not unique.

Figure 71.7: Regression Coefficients

You can request other statistics and tests using PROC SURVEYREG. You can also analyze data from a more complex sample design. The remainder of this chapter provides more detailed information.

Output Data Set

PROC SURVEYREG uses the Output Delivery System (ODS) to create output data sets. This is a departure from older SAS procedures that provide OUTPUT statements for similar functionality. For more information on ODS, see Chapter 14, 'Using the Output Delivery System.'

For example, to save the 'ParameterEstimates' table (Figure 71.7) in the previous section in an output data set, you use the ODS OUTPUT statement as follows:

  title1 'Ice Cream Spending Analysis';
  title2 'Stratified Simple Random Sample Design';
  proc surveyreg data=IceCream total=StudentTotals;
     strata Grade /list;
     class Kids;
     model Spending = Income Kids / solution;
     weight Weight;
     ods output ParameterEstimates = MyParmEst;
  run;

The statement

  ods output ParameterEstimates = MyParmEst;  

requests that the 'ParameterEstimates' table that appears in Figure 71.7 be placed in a SAS data set named MyParmEst.

The PRINT procedure displays observations of the data set MyParmEst:

  proc print data=MyParmEst;
  run;

Figure 71.8 displays the observations in the data set MyParmEst.

  Ice Cream Spending Analysis
  Stratified Simple Random Sample Design

  OBS    Parameter      Estimate        StdErr    DenDF    tValue     Probt
    1    Intercept    -26.086882    2.44108058       37    -10.69    <.0001
    2    Income         0.776699    0.04295904       37     18.08    <.0001
    3    Kids 1         0.888631    1.07000634       37      0.83    0.4116
    4    Kids 2         1.545726    1.20815863       37      1.28    0.2087
    5    Kids 3        -0.526817    1.32748011       37     -0.40    0.6938
    6    Kids 4         0.000000    0.00000000       37       .       .

Figure 71.8: The Data Set MyParmEst
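Once the estimates are in a data set, they can be processed like any other SAS data. As one possible use, the sketch below computes 95% confidence limits from the Estimate, StdErr, and DenDF variables shown in Figure 71.8. The data set name MyParmCI and the variables tCrit, Lower, and Upper are assumptions introduced for this illustration.

```sas
data MyParmCI;
   set MyParmEst;
   /* Skip the reference level, whose standard error is zero. */
   if StdErr > 0 then do;
      tCrit = tinv(0.975, DenDF);        /* two-sided 95% t quantile */
      Lower = Estimate - tCrit * StdErr;
      Upper = Estimate + tCrit * StdErr;
   end;
run;

proc print data=MyParmCI noobs;
   var Parameter Estimate StdErr Lower Upper;
run;
```

The limits use the design-based denominator degrees of freedom (37 in this example) rather than the usual residual degrees of freedom.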

The section 'ODS Table Names' gives the complete list of the tables produced by PROC SURVEYREG.
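The table names can also be discovered from a run of the procedure itself. The following sketch assumes the standard ODS TRACE statement, which writes a record for each output object to the SAS log while the procedure executes:

```sas
ods trace on;
proc surveyreg data=IceCream total=StudentTotals;
   strata Grade / list;
   class Kids;
   model Spending = Income Kids / solution;
   weight Weight;
run;
ods trace off;
```

Each record in the log includes the name of the table, which is the name to use on the left side of the equal sign in an ODS OUTPUT statement.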




SAS.STAT 9.1 Users Guide (Vol. 6)
Year: 2004
Pages: 127
