Examples


Example 36.1. Computing a Basic Kernel Density Estimate

The following example illustrates the basic functionality of the UNIVAR statement. The effective channel length (in microns) is measured for 1225 field effect transistors. The channel lengths are saved as values of the variable length in a SAS data set named channel; refer to the file kdex2.sas in the SAS Sample Library.

   data channel;
      input length @@;
      datalines;
   0.91 1.01 0.95 1.13 1.12 0.86 0.96 1.17 1.36 1.10
   0.98 1.27 1.13 0.92 1.15 1.26 1.14 0.88 1.03 1.00
   0.98 0.94 1.09 0.92 1.10 0.95 1.05 1.05 1.11 1.15
   . . .
   1.80 2.35 2.23 1.96 2.16 2.08 2.06 2.03 2.18 1.83
   2.13 2.05 1.90 2.07 2.15 1.96 2.15 1.89 2.15 2.04
   1.95 1.93 2.22 1.74 1.91
   ;

The following statements request a kernel density estimate of the variable length.

   ods html;
   ods graphics on;

   proc kde data=channel;
      univar length;
   run;

   ods graphics off;
   ods html close;

You can see a histogram with an overlaid kernel density estimate in Output 36.1.1. This graph is requested by specifying the experimental ODS GRAPHICS statement prior to the PROC KDE statements. For general information about ODS graphics, see Chapter 15, Statistical Graphics Using ODS. For specific information about the graphics available in the KDE procedure, see the ODS Graphics section on page 2009.

Output 36.1.1: Histogram with Overlaid Kernel Density Estimate (Experimental)

The default output tables for this analysis are the Inputs and the Controls tables.

   Inputs

   Data Set                       WORK.CHANNEL
   Number of Observations Used    1225
   Variable                       length
   Bandwidth Method               Sheather-Jones Plug In

The Inputs table lists basic information about the density fit, including the input data set, the number of observations, the variable used, and the bandwidth method. The default bandwidth method is the Sheather-Jones plug-in.

   Controls

                                 length
   Grid Points                      401
   Lower Grid Limit                0.58
   Upper Grid Limit                2.43
   Bandwidth Multiplier               1

The Controls table lists the primary numbers controlling the kernel density fit. Here the default number of grid points is used and no adjustment is made to the default bandwidth.
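To make the roles of these controls concrete, here is a minimal Python sketch (not PROC KDE's implementation) that evaluates a kernel density estimate at 401 equally spaced points between the grid limits from the Controls table. It assumes a normal (Gaussian) kernel, uses only the first 30 channel lengths listed in the data step, and replaces the Sheather-Jones plug-in with a hypothetical fixed bandwidth of 0.05; make_grid, kde_1d, and trapezoid are illustrative helper names.

```python
import math


def make_grid(lower, upper, n_points):
    """Return n_points equally spaced values from lower to upper, inclusive."""
    step = (upper - lower) / (n_points - 1)
    return [lower + i * step for i in range(n_points)]


def kde_1d(data, grid, bandwidth, multiplier=1.0):
    """Gaussian kernel density estimate of `data` at each point of `grid`.

    The effective bandwidth is bandwidth * multiplier, an assumed analogue
    of the Bandwidth Multiplier (BWM=) control.
    """
    h = bandwidth * multiplier
    scale = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data)
            for g in grid]


def trapezoid(xs, ys):
    """Trapezoidal-rule integral of ys over the grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


# First 30 channel lengths from the data step above (the full set has 1225).
lengths = [0.91, 1.01, 0.95, 1.13, 1.12, 0.86, 0.96, 1.17, 1.36, 1.10,
           0.98, 1.27, 1.13, 0.92, 1.15, 1.26, 1.14, 0.88, 1.03, 1.00,
           0.98, 0.94, 1.09, 0.92, 1.10, 0.95, 1.05, 1.05, 1.11, 1.15]

# Grid settings from the Controls table; the bandwidth 0.05 is hypothetical.
grid = make_grid(0.58, 2.43, 401)
density = kde_1d(lengths, grid, bandwidth=0.05)
print(len(grid), round(trapezoid(grid, density), 4))
```

Because every kernel integrates to 1, the estimate integrates to approximately 1 over a grid that covers the data, which the trapezoidal sum checks.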

Example 36.2. Changing the Bandwidth

Continuing with the previous example, you can specify different bandwidth multipliers that determine the smoothness of the kernel density estimate. The following statements show kernel density estimates for the variable length by specifying two different bandwidth multipliers with the BWM= option. Output 36.2.1 shows an oversmoothed estimate because the bandwidth multiplier is 2. Output 36.2.2 is created by specifying BWM=0.25, so it is an undersmoothed estimate.

   ods html;
   ods graphics on;

   proc kde data=channel;
      univar length(bwm=2) length(bwm=0.25);
   run;

   ods graphics off;
   ods html close;
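The effect of BWM= can be seen in a small Python sketch that scales a base bandwidth by the two multipliers used above. The two observations at -1 and 1, the base bandwidth of 1.0, and the helper names kde_1d and count_modes are all hypothetical, and a normal kernel is assumed.

```python
import math


def kde_1d(data, grid, bandwidth, multiplier=1.0):
    """Gaussian kernel density estimate at each grid point.

    The effective bandwidth is bandwidth * multiplier, an assumed analogue
    of the BWM= option.
    """
    h = bandwidth * multiplier
    scale = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data)
            for g in grid]


def count_modes(values):
    """Count strict interior local maxima in a sequence of density values."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])


data = [-1.0, 1.0]                             # two hypothetical observations
grid = [-4.0 + 0.01 * i for i in range(801)]   # evaluation points on [-4, 4]
base_bandwidth = 1.0                           # hypothetical default bandwidth

results = {}
for bwm in (2.0, 0.25):                        # the multipliers used above
    dens = kde_1d(data, grid, base_bandwidth, bwm)
    results[bwm] = (count_modes(dens), max(dens))
    print(f"BWM={bwm}: {results[bwm][0]} mode(s), peak {results[bwm][1]:.3f}")
```

With the multiplier 2, the two observations merge into one broad bump; with 0.25, each keeps its own peak. For two equally weighted normal kernels, the estimate becomes bimodal once the effective bandwidth drops below half the distance between the points.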
Output 36.2.1: Histogram with Oversmoothed Kernel Density Estimate (Experimental)
Output 36.2.2: Histogram with Undersmoothed Kernel Density Estimate (Experimental)

Example 36.3. Changing the Bandwidth (Bivariate)

Recall the analysis from the Getting Started section on page 1993. Suppose that you would like a slightly smoother estimate. You could then rerun the analysis with a larger bandwidth:

   ods html;
   ods graphics on;

   proc kde data=bivnormal;
      bivar x y / bwm=2;
   run;

   ods graphics off;
   ods html close;

The BWM= option requests bandwidth multipliers of 2 for both x and y. By specifying the experimental ODS GRAPHICS statement, you can visualize the results of this fit immediately in a contour plot, as shown in Output 36.3.1.

Output 36.3.1: Contour Plot of Estimated Density with Additional Smoothing (Experimental)

Multiple Bandwidths

You can also specify multiple bandwidths with only one run of the KDE procedure. Notice that when you specify pairs of variables inside parentheses, a kernel density estimate is computed for each pair. In the following statements, the first kernel density is computed with the default bandwidth, but the second specifies a bandwidth multiplier of 0.5 for the variable x and a multiplier of 2 for the variable y. The effect of the latter options is shown in Output 36.3.2.

   ods html;
   ods graphics on;

   proc kde data=bivnormal;
      bivar (x y)
            (x (bwm=0.5) y (bwm=2));
   run;

   ods graphics off;
   ods html close;
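A bivariate estimate with separate multipliers can be sketched in Python as a product of two univariate normal kernels, one bandwidth per variable. The product-kernel form, the base bandwidths of 1.0, the single observation at the origin, and the function name kde_2d are assumptions for illustration; the multipliers 0.5 and 2 are the ones used in the statements above.

```python
import math


def kde_2d(points, x, y, hx, hy):
    """Bivariate estimate at (x, y) using a product of univariate normal
    kernels; hx and hy are the bandwidths for x and y."""
    total = sum(math.exp(-0.5 * (((x - px) / hx) ** 2 + ((y - py) / hy) ** 2))
                for px, py in points)
    return total / (len(points) * 2.0 * math.pi * hx * hy)


# Hypothetical base bandwidths of 1.0 for both variables, scaled by the
# multipliers used above: 0.5 for x and 2 for y.
base_hx, base_hy = 1.0, 1.0
hx, hy = base_hx * 0.5, base_hy * 2.0

# One hypothetical observation at the origin makes the effect easy to see.
obs = [(0.0, 0.0)]
at_center = kde_2d(obs, 0.0, 0.0, hx, hy)
one_unit_in_x = kde_2d(obs, 1.0, 0.0, hx, hy)
one_unit_in_y = kde_2d(obs, 0.0, 1.0, hx, hy)
print(at_center, one_unit_in_x, one_unit_in_y)
```

A smaller multiplier for x makes the estimate fall off faster in the x direction than in the y direction, so the contours around a point are narrower along x.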
Output 36.3.2: Contour Plot of Estimated Density with Different Smoothing for x and y (Experimental)

Example 36.4. Requesting Additional Output Tables

The following example illustrates how to request output tables with summary statistics in addition to the default output tables.

Using the same data as in the Getting Started section on page 1993, the following statements request univariate and bivariate summary statistics, percentiles, and levels of the kernel density estimate.

   proc kde data=bivnormal;
      bivar x y / bivstats levels percentiles unistats;
   run;
   The KDE Procedure

   Univariate Statistics

                               x          y
   Mean                   -0.075     -0.070
   Variance                 9.73       9.93
   Standard Deviation       3.12       3.15
   Range                   20.39      19.09
   Interquartile Range      4.46       4.51
   Bandwidth                0.99       1.00

The Univariate Statistics table contains standard univariate statistics for each variable, as well as statistics associated with the density estimate. Note that the estimated variances for both x and y are fairly close to the true values of 10.

   Bivariate Statistics

   Covariance      8.88
   Correlation     0.90

The Bivariate Statistics table lists the covariance and correlation between the two variables. Note that the estimated correlation is equal to its true value to two decimal places.
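The two statistics in this table are related by correlation = covariance / (standard deviation of x times standard deviation of y). The minimal Python sketch below computes both; it assumes the usual n - 1 divisor, and the four-point data set is hypothetical.

```python
import math


def covariance(xs, ys):
    """Sample covariance with the n - 1 divisor (an assumption here)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # exactly 2*x + 1, so the correlation is 1
print(covariance(xs, ys), correlation(xs, ys))
```

Applied to the tables above, 8.88 / sqrt(9.73 × 9.93) ≈ 0.90, which matches the reported correlation.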

   Percentiles

                 x          y
    0.5      -7.71      -8.44
    1.0      -7.08      -7.46
    2.5      -6.17      -6.31
    5.0      -5.28      -5.23
   10.0      -4.18      -4.11
   25.0      -2.24      -2.30
   50.0      -0.11     -0.058
   75.0       2.22       2.21
   90.0       3.81       3.94
   95.0       4.88       5.22
   97.5       6.03       5.94
   99.0       6.90       6.77
   99.5       7.71       7.07

The Percentiles table lists percentiles for each variable.

   Levels

                            Lower    Upper    Lower    Upper
   Percent     Density      for x    for x    for y    for y
       1      0.001181      -8.14     8.45    -8.76     8.39
       5      0.003031      -7.10     7.07    -7.14     6.77
      10      0.004989      -6.41     5.69    -6.49     6.12
      50       0.01591      -3.64     3.96    -3.58     3.86
      90       0.02388      -1.22     1.19    -1.32     0.95
      95       0.02525      -0.88     0.50    -0.99     0.62
      99       0.02608      -0.53     0.16    -0.67     0.30
     100       0.02629      -0.19    -0.19    -0.35    -0.35

The Levels table lists contours of the density corresponding to percentiles of the bivariate data, and the minimum and maximum values of each variable on those contours. For example, 5% of the observed data have a density value less than 0.0030. The minimum x and y values on this contour are -7.10 and -7.14, respectively (the Lower for x and Lower for y columns), and the maximum values are 7.07 and 6.77, respectively (the Upper for x and Upper for y columns).
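One reading of this description is that the level for a given percent is a percentile of the density estimate evaluated at the observations themselves. The Python sketch below follows that reading under stated assumptions: a product normal kernel with bandwidths of 1.0, a hypothetical 200-point sample with variances of 10 and correlation 0.9 in place of the bivnormal data, and a simple order-statistic percentile rule that may differ from the one PROC KDE uses. The names kde_2d and density_level are illustrative.

```python
import math
import random


def kde_2d(points, x, y, hx, hy):
    """Product-normal-kernel density estimate at (x, y)."""
    total = sum(math.exp(-0.5 * (((x - px) / hx) ** 2 + ((y - py) / hy) ** 2))
                for px, py in points)
    return total / (len(points) * 2.0 * math.pi * hx * hy)


def density_level(points, hx, hy, percent):
    """Density value below which at most `percent`% of the observations fall.

    The estimate is evaluated at every observation, the values are sorted,
    and the order statistic at index floor(percent / 100 * n) is returned.
    This percentile rule is an assumption; PROC KDE's may differ.
    """
    values = sorted(kde_2d(points, px, py, hx, hy) for px, py in points)
    k = min(int(percent / 100.0 * len(values)), len(values) - 1)
    return values[k]


# Hypothetical correlated sample standing in for the bivnormal data set.
rng = random.Random(1993)
sample = []
for _ in range(200):
    x = rng.gauss(0.0, math.sqrt(10.0))
    y = 0.9 * x + rng.gauss(0.0, math.sqrt(10.0 * (1.0 - 0.81)))
    sample.append((x, y))

levels = {p: density_level(sample, 1.0, 1.0, p) for p in (1, 5, 10, 50, 90, 95, 99)}
for p, lvl in levels.items():
    print(f"{p:>3}%  density level {lvl:.6f}")
```

By construction, at most p percent of the observations have an estimated density below the p percent level, so the region inside that contour holds at least 100 - p percent of the data.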

You can also request Percentiles or Levels tables with specific percentiles. For example,

   proc kde data=bivnormal;
      bivar x y / levels=2.5, 50, 97.5
                  percentiles=2.5, 25, 50, 75, 97.5;
   run;
   The KDE Procedure

   Percentiles

                 x          y
    2.5      -6.17      -6.31
   25.0      -2.24      -2.30
   50.0      -0.11     -0.058
   75.0       2.22       2.21
   97.5       6.03       5.94

   Levels

                            Lower    Upper    Lower    Upper
   Percent     Density      for x    for x    for y    for y
     2.5      0.001914      -7.79     8.11    -7.79     7.74
    50.0       0.01591      -3.64     3.96    -3.58     3.86
    97.5       0.02573      -0.88     0.50    -0.99     0.30

Example 36.5. Using Output Data Set to Produce Graphics

You can create a SAS data set containing the kernel density estimate by specifying the OUT= option. Using the same 1000 simulated observations from a bivariate normal density as in the Getting Started section on page 1993, you can specify

   proc kde data=bivnormal;
      bivar x y / levels out=MyOut;
   run;

The output data set MyOut from this analysis contains the kernel density estimate at 3600 grid points. The variables value1 and value2 of this data set contain the grid values of the x and y variables, respectively, and the variable density contains the kernel density estimate. You can generate surface and contour plots of this estimate using SAS/GRAPH as follows:

   proc g3d data=MyOut;
      plot value2*value1=density;
   run;

   proc gcontour data=MyOut;
      plot value2*value1=density;
   run;

Output 36.5.1 and Output 36.5.2 display these plots.
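The OUT= data set holds one observation per grid point, which is the layout that plot value2*value1=density reads. The Python sketch below builds rows of that shape. The 60-by-60 grid is inferred from the count of 3600 points (60 × 60 = 3600), while the grid limits, the function name out_grid, and the standard bivariate normal used as a stand-in for the kernel estimate are assumptions.

```python
import math


def out_grid(density, x_limits, y_limits, n_grid=60):
    """Rows shaped like the OUT= data set: value1, value2, density."""
    (x_lo, x_hi), (y_lo, y_hi) = x_limits, y_limits
    xs = [x_lo + i * (x_hi - x_lo) / (n_grid - 1) for i in range(n_grid)]
    ys = [y_lo + j * (y_hi - y_lo) / (n_grid - 1) for j in range(n_grid)]
    return [{"value1": x, "value2": y, "density": density(x, y)}
            for y in ys for x in xs]


def standard_bivariate_normal(x, y):
    """Stand-in density; PROC KDE would supply its kernel estimate here."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)


rows = out_grid(standard_bivariate_normal, (-4.0, 4.0), (-4.0, 4.0))
print(len(rows), rows[0])
```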

Output 36.5.2: Contour Plot of the Bivariate Kernel Density Estimate
   Levels

                            Lower    Upper    Lower    Upper
   Percent     Density      for x    for x    for y    for y
       1      0.001181      -8.14     8.45    -8.76     8.39
       5      0.003031      -7.10     7.07    -7.14     6.77
      10      0.004989      -6.41     5.69    -6.49     6.12
      50       0.01591      -3.64     3.96    -3.58     3.86
      90       0.02388      -1.22     1.19    -1.32     0.95
      95       0.02525      -0.88     0.50    -0.99     0.62
      99       0.02608      -0.53     0.16    -0.67     0.30
     100       0.02629      -0.19    -0.19    -0.35    -0.35

The Levels table lists contours of the density corresponding to percentiles of the bivariate data, and the minimum and maximum values of each variable on those contours. For example, 5% of the observed data have a density value less than 0.0030. The Levels table therefore lets you plot specific contours corresponding to percentiles of the data: by passing the values from the Density column to PROC GCONTOUR, you can plot the 1, 5, 10, 50, 90, 95, and 99 percent levels of the density. This plot is displayed in Output 36.5.3.

   proc gcontour data=MyOut;
      plot value2*value1=density / levels=0.0012 0.0030 0.0050 0.0159
                                          0.0239 0.0253 0.0261;
   run;

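The next-to-outermost contour of Output 36.5.3 is the 5% level (density 0.0030). Because about 95% of the observations have a higher estimated density, this contour represents an approximate 95% ellipse for x and y.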

Example 36.6. Univariate KDE Graphics (Experimental)

This is a continuation of Example 36.1, used here to illustrate the experimental ODS graphics. The following statements request the available univariate plots in PROC KDE.

   ods html;
   ods graphics on;

   proc kde data=channel;
      univar length / plots=density histogram histdensity;
   run;

   ods graphics off;
   ods html close;
Output 36.6.1: Histogram (Experimental)
Output 36.6.2: Kernel Density Estimate (Experimental)
Output 36.6.3: Histogram with Overlaid Kernel Density Estimate (Experimental)

Output 36.6.1, Output 36.6.2, and Output 36.6.3 show a histogram, a kernel density estimate, and a histogram with an overlaid kernel density estimate, respectively. These graphical displays are requested by specifying the experimental ODS GRAPHICS statement and the experimental PLOTS= option in the UNIVAR statement. For general information about ODS graphics, see Chapter 15, Statistical Graphics Using ODS. For specific information about the graphics available in the KDE procedure, see the ODS Graphics section on page 2009.
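A histogram and a kernel density estimate are most directly comparable in an overlay when the bars are on the density scale, that is, when each bar's height is its count divided by n times the bin width, so that the bar areas sum to 1. The Python sketch below assumes equal-width bins over the grid limits from Example 36.1 and uses the first 30 channel lengths; the bin count of 37 (width 0.05) and the name density_histogram are hypothetical, and PROC KDE's own binning rule is not described here.

```python
def density_histogram(data, lower, upper, n_bins):
    """Equal-width histogram scaled so that the bar areas sum to 1.

    Heights are count / (n * width), the same units as a density estimate.
    Values outside [lower, upper] are skipped but still counted in n,
    so the total area is then less than 1.
    """
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for v in data:
        if lower <= v <= upper:
            # A value equal to `upper` goes into the last bin.
            counts[min(int((v - lower) / width), n_bins - 1)] += 1
    return [c / (len(data) * width) for c in counts], width


# First 30 channel lengths from Example 36.1 (the full set has 1225).
lengths = [0.91, 1.01, 0.95, 1.13, 1.12, 0.86, 0.96, 1.17, 1.36, 1.10,
           0.98, 1.27, 1.13, 0.92, 1.15, 1.26, 1.14, 0.88, 1.03, 1.00,
           0.98, 0.94, 1.09, 0.92, 1.10, 0.95, 1.05, 1.05, 1.11, 1.15]

heights, width = density_histogram(lengths, 0.58, 2.43, 37)
print(round(width, 4), round(sum(h * width for h in heights), 6))
```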

Example 36.7. Bivariate KDE Graphics (Experimental)

This example illustrates the available bivariate graphics in PROC KDE. The octane data set comes from Rodriguez and Taniguchi (1980), where it is used for predicting customer octane satisfaction using trained-rater observations. The variables in this data set are Rater and Customer. Either variable may have missing values. Refer to the file kdex3.sas in the SAS Sample Library.

   data octane;
      input Rater Customer;
      label Rater    = 'Rater'
            Customer = 'Customer';
      datalines;
   94.5 92.0
   94.0 88.0
   94.0 90.0
   . . .
   93.0 87.0
   88.0 84.0
   .H 90.0
   ;

The following statements request all the available bivariate plots in PROC KDE.

   ods html;
   ods graphics on;

   proc kde data=octane;
      bivar Rater Customer / plots=all;
   run;

   ods graphics off;
   ods html close;

Output 36.7.1 shows a scatter plot of the data, Output 36.7.2 shows a bivariate histogram of the data, Output 36.7.3 shows a contour plot of the bivariate density estimate, Output 36.7.4 shows a contour plot of the bivariate density estimate overlaid with a scatter plot of the data, Output 36.7.5 shows a surface plot of the bivariate kernel density estimate, and Output 36.7.6 shows a bivariate histogram overlaid with a bivariate kernel density estimate. These graphical displays are requested by specifying the experimental ODS GRAPHICS statement and the experimental PLOTS= option in the BIVAR statement. For general information about ODS graphics, see Chapter 15, Statistical Graphics Using ODS. For specific information about the graphics available in the KDE procedure, see the ODS Graphics section on page 2009.

Output 36.7.1: Scatter Plot (Experimental)
Output 36.7.2: Bivariate Histogram (Experimental)
Output 36.7.3: Contour Plot (Experimental)
Output 36.7.4: Contour Plot with Overlaid Scatter Plot (Experimental)
Output 36.7.5: Surface Plot (Experimental)
Output 36.7.6: Bivariate Histogram with Overlaid Surface Plot (Experimental)



SAS.STAT 9.1 Users Guide (Vol. 3)
SAS/STAT 9.1, Users Guide, Volume 3 (volume 3 ONLY)
ISBN: B0042UQTBS
EAN: N/A
Year: 2004
Pages: 105
