Example 36.1. Computing a Basic Kernel Density Estimate
The following example illustrates the basic functionality of the UNIVAR statement. The effective channel length (in microns) is measured for 1225 field effect transistors. The channel lengths are saved as values of the variable length in a SAS data set named channel; refer to the file kdex2.sas in the SAS Sample Library.
data channel;
   input length @@;
   datalines;
0.91 1.01 0.95 1.13 1.12 0.86 0.96 1.17 1.36 1.10
0.98 1.27 1.13 0.92 1.15 1.26 1.14 0.88 1.03 1.00
0.98 0.94 1.09 0.92 1.10 0.95 1.05 1.05 1.11 1.15
   . . .
1.80 2.35 2.23 1.96 2.16 2.08 2.06 2.03 2.18 1.83
2.13 2.05 1.90 2.07 2.15 1.96 2.15 1.89 2.15 2.04
1.95 1.93 2.22 1.74 1.91
;
The following statements request a kernel density estimate of the variable length.
ods html;
ods graphics on;

proc kde data=channel;
   univar length;
run;

ods graphics off;
ods html close;
You can see a histogram with an overlaid kernel density estimate in Output 36.1.1. This graph is requested by specifying the experimental ODS GRAPHICS statement prior to the PROC KDE statements. For general information about ODS graphics, see Chapter 15, Statistical Graphics Using ODS. For specific information about the graphics available in the KDE procedure, see the ODS Graphics section on page 2009.
Output 36.1.1: Histogram with Overlaid Kernel Density Estimate (Experimental)

The default output tables for this analysis are the Inputs and the Controls tables.
Inputs

Data Set                       WORK.CHANNEL
Number of Observations Used    1225
Variable                       length
Bandwidth Method               Sheather-Jones Plug In
The Inputs table lists basic information about the density fit, including the input data set, the number of observations, the variable used, and the bandwidth method. The default bandwidth method is the Sheather-Jones plug-in.
Controls

                        length
Grid Points                401
Lower Grid Limit          0.58
Upper Grid Limit          2.43
Bandwidth Multiplier         1
The Controls table lists the primary numbers controlling the kernel density fit. Here the default number of grid points is used and no adjustment is made to the default bandwidth.
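The values reported in the Controls table can be changed with options on the UNIVAR statement. The following statements are a minimal sketch, assuming the NGRID=, GRIDL=, GRIDU=, and BWM= options of the UNIVAR statement; the particular values shown (201 grid points, grid limits of 0.5 and 2.5, and a bandwidth multiplier of 1.5) are illustrative choices only, not recommendations.

```sas
proc kde data=channel;
   /* illustrative values: coarser grid, wider limits, 1.5 times the default bandwidth */
   univar length / ngrid=201 gridl=0.5 gridu=2.5 bwm=1.5;
run;
```

With settings like these, the Controls table should report the requested grid and a bandwidth multiplier of 1.5 in place of the defaults.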
Example 36.3. Changing the Bandwidth (Bivariate)
Recall the analysis from the Getting Started section on page 1993. Suppose that you would like a slightly smoother estimate. You could then rerun the analysis with a larger bandwidth:
ods html;
ods graphics on;

proc kde data=bivnormal;
   bivar x y / bwm=2;
run;

ods graphics off;
ods html close;
The BWM= option requests bandwidth multipliers of 2 for both x and y. By specifying the experimental ODS GRAPHICS statement, you can visualize the results of this fit immediately in a contour plot, as shown in Output 36.3.1.
Output 36.3.1: Contour Plot of Estimated Density with Additional Smoothing (Experimental)

Multiple Bandwidths
You can also specify multiple bandwidths in a single run of the KDE procedure. When you specify pairs of variables inside parentheses, a kernel density estimate is computed for each pair. In the following statements, the first kernel density estimate is computed with the default bandwidth, but the second uses a bandwidth multiplier of 0.5 for the variable x and a multiplier of 2 for the variable y. The effect of the latter options is shown in Output 36.3.2.
ods html;
ods graphics on;

proc kde data=bivnormal;
   bivar (x y)
         (x (bwm=0.5) y (bwm=2));
run;

ods graphics off;
ods html close;
Output 36.3.2: Contour Plot of Estimated Density with Different Smoothing for x and y (Experimental)

Example 36.4. Requesting Additional Output Tables
The following example illustrates how to request output tables with summary statistics in addition to the default output tables.
Using the same data as in the Getting Started section on page 1993, the following statements request univariate and bivariate summary statistics, percentiles, and levels of the kernel density estimate.
proc kde data=bivnormal;
   bivar x y / bivstats levels percentiles unistats;
run;
The KDE Procedure

Univariate Statistics

                             x         y
Mean                     0.075     0.070
Variance                  9.73      9.93
Standard Deviation        3.12      3.15
Range                    20.39     19.09
Interquartile Range       4.46      4.51
Bandwidth                 0.99      1.00
The Univariate Statistics table contains standard univariate statistics for each variable, as well as statistics associated with the density estimate. Note that the estimated variances for both x and y are fairly close to the true values of 10.
Bivariate Statistics

Covariance     8.88
Correlation    0.90
The Bivariate Statistics table lists the covariance and correlation between the two variables. Note that the estimated correlation is equal to its true value to two decimal places.
Percentiles

              x         y
 0.5      -7.71     -8.44
 1.0      -7.08     -7.46
 2.5      -6.17     -6.31
 5.0      -5.28     -5.23
10.0      -4.18     -4.11
25.0      -2.24     -2.30
50.0       0.11     0.058
75.0       2.22      2.21
90.0       3.81      3.94
95.0       4.88      5.22
97.5       6.03      5.94
99.0       6.90      6.77
99.5       7.71      7.07
The Percentiles table lists percentiles for each variable.
Levels

                       Lower    Upper    Lower    Upper
Percent    Density     for x    for x    for y    for y
      1   0.001181     -8.14     8.45    -8.76     8.39
      5   0.003031     -7.10     7.07    -7.14     6.77
     10   0.004989     -6.41     5.69    -6.49     6.12
     50    0.01591     -3.64     3.96    -3.58     3.86
     90    0.02388     -1.22     1.19    -1.32     0.95
     95    0.02525     -0.88     0.50    -0.99     0.62
     99    0.02608     -0.53     0.16    -0.67     0.30
    100    0.02629     -0.19    -0.19    -0.35    -0.35
The Levels table lists contours of the density corresponding to percentiles of the bivariate data, and the minimum and maximum values of each variable on those contours. For example, 5% of the observed data have a density value less than 0.0030. The minimum x and y values on this contour are -7.10 and -7.14, respectively (the Lower for x and Lower for y columns), and the maximum values are 7.07 and 6.77, respectively (the Upper for x and Upper for y columns).
You can also request Percentiles or Levels tables with specific percentiles. For example,
proc kde data=bivnormal;
   bivar x y / levels=2.5, 50, 97.5
               percentiles=2.5, 25, 50, 75, 97.5;
run;
The KDE Procedure

Percentiles

              x         y
 2.5      -6.17     -6.31
25.0      -2.24     -2.30
50.0       0.11     0.058
75.0       2.22      2.21
97.5       6.03      5.94
Levels

                       Lower    Upper    Lower    Upper
Percent    Density     for x    for x    for y    for y
    2.5   0.001914     -7.79     8.11    -7.79     7.74
   50.0    0.01591     -3.64     3.96    -3.58     3.86
   97.5    0.02573     -0.88     0.50    -0.99     0.30
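Summary statistics and specific percentiles can be requested for a univariate estimate in much the same way. The following statements are a sketch, assuming that the UNIVAR statement accepts the UNISTATS and PERCENTILES= options in the same form as the BIVAR statement; the data set channel is the one from Example 36.1, and the three percentiles are arbitrary illustrative values.

```sas
proc kde data=channel;
   /* request summary statistics and three illustrative percentiles */
   univar length / unistats percentiles=5, 50, 95;
run;
```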
Example 36.5. Using Output Data Set to Produce Graphics
You can create a SAS data set containing the kernel density estimate by specifying the OUT= option. Using the same 1000 simulated observations from a bivariate normal density as in the Getting Started section on page 1993, you can specify
proc kde data=bivnormal;
   bivar x y / levels out=MyOut;
run;
The output data set MyOut from this analysis contains 3600 grid points at which the kernel density estimate is evaluated. The variables value1 and value2 of this data set contain the grid values of the x and y variables, respectively. The variable density is the kernel density estimate. You can generate surface and contour plots of this estimate using SAS/GRAPH as follows:
proc g3d data=MyOut;
   plot value2*value1=density;
run;

proc gcontour data=MyOut;
   plot value2*value1=density;
run;
Output 36.5.1 and Output 36.5.2 display these plots.
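To inspect the output data set directly, you can list a few observations and summarize the density values. The following sketch uses only the variables named above (value1, value2, and density); the OBS=5 limit and the choice of statistics are arbitrary.

```sas
/* list the first few grid points and their estimated density values */
proc print data=MyOut(obs=5);
   var value1 value2 density;
run;

/* count the grid points and find the range of the density estimate */
proc means data=MyOut n min max;
   var density;
run;
```

If the data set is as described, PROC MEANS reports N = 3600 for the variable density.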
Output 36.5.2: Contour Plot of the Bivariate Kernel Density Estimate

Levels

                       Lower    Upper    Lower    Upper
Percent    Density     for x    for x    for y    for y
      1   0.001181     -8.14     8.45    -8.76     8.39
      5   0.003031     -7.10     7.07    -7.14     6.77
     10   0.004989     -6.41     5.69    -6.49     6.12
     50    0.01591     -3.64     3.96    -3.58     3.86
     90    0.02388     -1.22     1.19    -1.32     0.95
     95    0.02525     -0.88     0.50    -0.99     0.62
     99    0.02608     -0.53     0.16    -0.67     0.30
    100    0.02629     -0.19    -0.19    -0.35    -0.35
The Levels table lists contours of the density corresponding to percentiles of the bivariate data, and the minimum and maximum values of each variable on those contours. For example, 5% of the observed data have a density value less than 0.0030. You can use the values from the Density column of this table with PROC GCONTOUR to plot the contours at the 1, 5, 10, 50, 90, 95, and 99 percent levels of the density; this plot is displayed in Output 36.5.3.
proc gcontour data=MyOut;
   plot value2*value1=density /
        levels=0.0012 0.0030 0.0050 0.0159 0.0239 0.0253 0.0261;
run;
The next-to-outermost contour of Output 36.5.3 represents an approximate 95% ellipsoid for x and y.
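Rather than copying density values by hand from the printed Levels table, you could capture the table in a data set with the ODS OUTPUT statement. The following statements are a sketch that assumes the ODS table name is Levels; MyLevels is a hypothetical data set name. The variable names in that data set are not stated here, so the sketch lists them with PROC CONTENTS instead of assuming them.

```sas
/* save the Levels table as a data set; MyLevels is a hypothetical name */
ods output Levels=MyLevels;

proc kde data=bivnormal;
   bivar x y / levels out=MyOut;
run;

/* check the column names before using the values in a LEVELS= list */
proc contents data=MyLevels;
run;

proc print data=MyLevels;
run;
```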