The following example illustrates the basic functionality of the UNIVAR statement. The effective channel length (in microns) is measured for 1225 field-effect transistors. The channel lengths are saved as values of the variable length in a SAS data set named channel; refer to the file kdex2.sas in the SAS Sample Library.
data channel;
   input length @@;
   datalines;
0.91 1.01 0.95 1.13 1.12 0.86 0.96 1.17 1.36 1.10
0.98 1.27 1.13 0.92 1.15 1.26 1.14 0.88 1.03 1.00
0.98 0.94 1.09 0.92 1.10 0.95 1.05 1.05 1.11 1.15
   . . .
1.80 2.35 2.23 1.96 2.16 2.08 2.06 2.03 2.18 1.83
2.13 2.05 1.90 2.07 2.15 1.96 2.15 1.89 2.15 2.04
1.95 1.93 2.22 1.74 1.91
;
The following statements request a kernel density estimate of the variable length.
ods html;
ods graphics on;

proc kde data=channel;
   univar length;
run;

ods graphics off;
ods html close;
You can see a histogram with an overlaid kernel density estimate in Output 36.1.1. This graph is requested by specifying the experimental ODS GRAPHICS statement prior to the PROC KDE statements. For general information about ODS graphics, see Chapter 15, Statistical Graphics Using ODS. For specific information about the graphics available in the KDE procedure, see the ODS Graphics section on page 2009.
The default output tables for this analysis are the Inputs and the Controls tables.
Inputs

Data Set                       WORK.CHANNEL
Number of Observations Used    1225
Variable                       length
Bandwidth Method               Sheather-Jones Plug In
The Inputs table lists basic information about the density fit, including the input data set, the number of observations, the variable used, and the bandwidth method. The default bandwidth method is the Sheather-Jones plug-in.
Controls

                          length
Grid Points                  401
Lower Grid Limit            0.58
Upper Grid Limit            2.43
Bandwidth Multiplier           1
The Controls table lists the primary numbers controlling the kernel density fit. Here the default number of grid points is used and no adjustment is made to the default bandwidth.
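To make the Controls table concrete, here is a minimal Python sketch (outside SAS, and not PROC KDE's own algorithm) that evaluates a Gaussian kernel density estimate at 401 equally spaced points between the grid limits 0.58 and 2.43, scaling the bandwidth by a multiplier in the way the Bandwidth Multiplier row suggests. It assumes a hypothetical base bandwidth of 0.05 instead of the Sheather-Jones plug-in value, and it uses only the 55 channel lengths printed in the DATA step above. The helper name kde_on_grid is hypothetical.

```python
import numpy as np

# Only the channel lengths printed in the DATA step above (elided values omitted).
LENGTHS = np.array([
    0.91, 1.01, 0.95, 1.13, 1.12, 0.86, 0.96, 1.17, 1.36, 1.10,
    0.98, 1.27, 1.13, 0.92, 1.15, 1.26, 1.14, 0.88, 1.03, 1.00,
    0.98, 0.94, 1.09, 0.92, 1.10, 0.95, 1.05, 1.05, 1.11, 1.15,
    1.80, 2.35, 2.23, 1.96, 2.16, 2.08, 2.06, 2.03, 2.18, 1.83,
    2.13, 2.05, 1.90, 2.07, 2.15, 1.96, 2.15, 1.89, 2.15, 2.04,
    1.95, 1.93, 2.22, 1.74, 1.91,
])

def kde_on_grid(data, bandwidth, lower, upper, ngrid=401, bwm=1.0):
    """Gaussian kernel density estimate evaluated on an equally spaced grid.

    The effective bandwidth is bandwidth * bwm (hypothetical helper).
    """
    h = bandwidth * bwm
    grid = np.linspace(lower, upper, ngrid)
    # Standardized distance between every grid point and every observation.
    z = (grid[:, None] - data[None, :]) / h
    density = np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))
    return grid, density

# Hypothetical base bandwidth; PROC KDE would select its own.
grid, density = kde_on_grid(LENGTHS, bandwidth=0.05, lower=0.58, upper=2.43)
spacing = grid[1] - grid[0]                                  # (2.43 - 0.58) / 400
area = np.sum((density[1:] + density[:-1]) / 2) * spacing    # trapezoidal rule
print(f"grid spacing = {spacing:.6f}, area under estimate = {area:.3f}")
```

With 401 points the grid spacing is (2.43 - 0.58)/400 = 0.004625, and the area under the estimate should come out close to 1; any small shortfall comes from kernel mass that falls beyond the upper grid limit.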
Continuing with the previous example, you can specify different bandwidth multipliers that determine the smoothness of the kernel density estimate. The following statements show kernel density estimates for the variable length by specifying two different bandwidth multipliers with the BWM= option. Output 36.2.1 shows an oversmoothed estimate because the bandwidth multiplier is 2. Output 36.2.2 is created by specifying BWM=0.25, so it is an undersmoothed estimate.
ods html;
ods graphics on;

proc kde data=channel;
   univar length(bwm=2) length(bwm=0.25);
run;

ods graphics off;
ods html close;
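The smoothing effect of BWM= can be illustrated with a short Python sketch. For a univariate Gaussian kernel, shrinking the bandwidth can only add local maxima to the estimate, never remove them, which is why an undersmoothed estimate shows spurious bumps. The sketch counts local maxima on a fine grid for multipliers 2, 1, and 0.25, assuming a hypothetical base bandwidth of 0.05 and using only the printed subset of channel lengths; it illustrates the idea and is not PROC KDE's computation.

```python
import numpy as np

# Printed subset of the channel lengths (elided values omitted).
LENGTHS = np.array([
    0.91, 1.01, 0.95, 1.13, 1.12, 0.86, 0.96, 1.17, 1.36, 1.10,
    0.98, 1.27, 1.13, 0.92, 1.15, 1.26, 1.14, 0.88, 1.03, 1.00,
    0.98, 0.94, 1.09, 0.92, 1.10, 0.95, 1.05, 1.05, 1.11, 1.15,
    1.80, 2.35, 2.23, 1.96, 2.16, 2.08, 2.06, 2.03, 2.18, 1.83,
    2.13, 2.05, 1.90, 2.07, 2.15, 1.96, 2.15, 1.89, 2.15, 2.04,
    1.95, 1.93, 2.22, 1.74, 1.91,
])

def gaussian_kde(data, h, grid):
    """Gaussian kernel density estimate with bandwidth h, evaluated at grid."""
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

def count_local_maxima(values):
    """Count interior grid points strictly higher than both neighbours."""
    inner = values[1:-1]
    return int(np.sum((inner > values[:-2]) & (inner > values[2:])))

BASE_H = 0.05                          # hypothetical base bandwidth
grid = np.linspace(0.58, 2.43, 2001)   # fine grid so narrow bumps are resolved
modes = {bwm: count_local_maxima(gaussian_kde(LENGTHS, BASE_H * bwm, grid))
         for bwm in (2.0, 1.0, 0.25)}
print(modes)   # number of local maxima for each bandwidth multiplier
```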
Recall the analysis from the Getting Started section on page 1993. Suppose that you would like a slightly smoother estimate. You could then rerun the analysis with a larger bandwidth:
ods html;
ods graphics on;

proc kde data=bivnormal;
   bivar x y / bwm=2;
run;

ods graphics off;
ods html close;
The BWM= option requests bandwidth multipliers of 2 for both x and y. By specifying the experimental ODS GRAPHICS statement, you can visualize the results of this fit immediately in a contour plot, as shown in Output 36.3.1.
You can also specify multiple bandwidths in a single run of the KDE procedure. When you enclose pairs of variables in parentheses, PROC KDE computes a separate kernel density estimate for each pair. In the following statements, the first kernel density estimate is computed with the default bandwidths, and the second uses a bandwidth multiplier of 0.5 for the variable x and a multiplier of 2 for the variable y. The effect of the latter options is shown in Output 36.3.2.
ods html;
ods graphics on;

proc kde data=bivnormal;
   bivar (x y) (x (bwm=0.5) y (bwm=2));
run;

ods graphics off;
ods html close;
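Per-variable multipliers such as x (bwm=0.5) and y (bwm=2) scale the two bandwidths independently. The following Python sketch of a bivariate estimate with a product Gaussian kernel shows the effect for a single observation at the origin, assuming hypothetical base bandwidths of 1 for both variables. The peak height 1/(2*pi*hx*hy) is unchanged because 0.5 * 2 = 1, but the estimate now falls off faster in x than in y. The function name bivariate_kde is hypothetical.

```python
import numpy as np

def bivariate_kde(x, y, px, py, hx, hy, bwm_x=1.0, bwm_y=1.0):
    """Product-Gaussian kernel density estimate at the point (px, py).

    hx, hy are base bandwidths; bwm_x, bwm_y are per-variable multipliers.
    """
    hx, hy = hx * bwm_x, hy * bwm_y
    zx = (np.asarray(px)[..., None] - x) / hx
    zy = (np.asarray(py)[..., None] - y) / hy
    k = np.exp(-0.5 * (zx**2 + zy**2))
    return k.sum(axis=-1) / (len(x) * 2.0 * np.pi * hx * hy)

# One observation at the origin; base bandwidths are hypothetical.
x = np.array([0.0])
y = np.array([0.0])
default = bivariate_kde(x, y, 0.0, 0.0, hx=1.0, hy=1.0)
mixed = bivariate_kde(x, y, 0.0, 0.0, hx=1.0, hy=1.0, bwm_x=0.5, bwm_y=2.0)
# One unit away from the observation along each axis, with the mixed multipliers.
along_x = bivariate_kde(x, y, 1.0, 0.0, hx=1.0, hy=1.0, bwm_x=0.5, bwm_y=2.0)
along_y = bivariate_kde(x, y, 0.0, 1.0, hx=1.0, hy=1.0, bwm_x=0.5, bwm_y=2.0)
print(float(default), float(mixed), float(along_x), float(along_y))
```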
The following example illustrates how to request output tables with summary statistics in addition to the default output tables.
Using the same data as in the Getting Started section on page 1993, the following statements request univariate and bivariate summary statistics, percentiles, and levels of the kernel density estimate.
proc kde data=bivnormal;
   bivar x y / bivstats levels percentiles unistats;
run;
The KDE Procedure

Univariate Statistics

                              x        y
Mean                      0.075    0.070
Variance                   9.73     9.93
Standard Deviation         3.12     3.15
Range                     20.39    19.09
Interquartile Range        4.46     4.51
Bandwidth                  0.99     1.00
The Univariate Statistics table contains standard univariate statistics for each variable, as well as statistics associated with the density estimate. Note that the estimated variances for both x and y are fairly close to the true values of 10.
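The Bandwidth row can be checked by hand. Assuming that the default bivariate bandwidths follow the normal-reference rule sd * (4/((d+2)n))^(1/(d+4)), which for d = 2 reduces to sd * n^(-1/6), the standard deviations in the table and the n = 1000 simulated observations reproduce the reported 0.99 and 1.00. The rule is stated here as an assumption; the close agreement is what the sketch demonstrates.

```python
# Normal-reference bandwidth check (assumed rule), using values from the table above.
n = 1000                      # simulated observations from the Getting Started data
sd = {"x": 3.12, "y": 3.15}   # Standard Deviation row of the Univariate Statistics table

def normal_reference_bandwidth(sd, n, d=2):
    """d-dimensional normal-reference rule: sd * (4 / ((d + 2) * n)) ** (1 / (d + 4))."""
    return sd * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))

for name, s in sd.items():
    print(name, round(normal_reference_bandwidth(s, n), 2))
```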
Bivariate Statistics

Covariance     8.88
Correlation    0.90
The Bivariate Statistics table lists the covariance and correlation between the two variables. Note that the estimated correlation is equal to its true value to two decimal places.
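The two tables are consistent with each other: the correlation is the covariance divided by the product of the standard deviations. A quick Python check using only the reported values:

```python
import math

cov_xy = 8.88              # Covariance row of the Bivariate Statistics table
var_x, var_y = 9.73, 9.93  # Variance row of the Univariate Statistics table

corr = cov_xy / math.sqrt(var_x * var_y)
print(round(corr, 2))      # 0.9, matching the Correlation row
```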
Percentiles

Percent        x        y
    0.5    -7.71    -8.44
    1.0    -7.08    -7.46
    2.5    -6.17    -6.31
    5.0    -5.28    -5.23
   10.0    -4.18    -4.11
   25.0    -2.24    -2.30
   50.0     0.11    0.058
   75.0     2.22     2.21
   90.0     3.81     3.94
   95.0     4.88     5.22
   97.5     6.03     5.94
   99.0     6.90     6.77
   99.5     7.71     7.07
The Percentiles table lists percentiles for each variable.
Levels

                          Lower    Upper    Lower    Upper
Percent      Density      for x    for x    for y    for y
      1     0.001181      -8.14     8.45    -8.76     8.39
      5     0.003031      -7.10     7.07    -7.14     6.77
     10     0.004989      -6.41     5.69    -6.49     6.12
     50      0.01591      -3.64     3.96    -3.58     3.86
     90      0.02388      -1.22     1.19    -1.32     0.95
     95      0.02525      -0.88     0.50    -0.99     0.62
     99      0.02608      -0.53     0.16    -0.67     0.30
    100      0.02629      -0.19    -0.19    -0.35    -0.35
The Levels table lists contours of the density corresponding to percentiles of the bivariate data, and the minimum and maximum values of each variable on those contours. For example, 5% of the observed data have a density value less than 0.0030. The minimum x and y values on this contour are -7.10 and -7.14, respectively (the Lower for x and Lower for y columns), and the maximum values are 7.07 and 6.77, respectively (the Upper for x and Upper for y columns).
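The definition of a level can be sketched in Python: evaluate the density estimate at every observation, then take percentiles of those density values, so that p% of the observations fall below the p% level. The sketch simulates 1000 hypothetical observations from a bivariate normal distribution with variances 10 and correlation 0.9 (a stand-in for the Getting Started data, which are not reproduced here), uses a product Gaussian kernel with normal-reference bandwidths (an assumption), and evaluates the estimate exactly at each observation, so its numbers will not match the table above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
cov = np.array([[10.0, 9.0], [9.0, 10.0]])     # variances 10, correlation 0.9
xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
x, y = xy[:, 0], xy[:, 1]

# Normal-reference bandwidths for two variables (assumed rule: sd * n**(-1/6)).
hx = x.std(ddof=1) * n ** (-1 / 6)
hy = y.std(ddof=1) * n ** (-1 / 6)

# Product-Gaussian density estimate at every observation (n x n kernel matrix).
zx = (x[:, None] - x[None, :]) / hx
zy = (y[:, None] - y[None, :]) / hy
dens_at_obs = np.exp(-0.5 * (zx**2 + zy**2)).sum(axis=1) / (n * 2 * np.pi * hx * hy)

# The p% level is the p-th percentile of the densities at the observations.
percents = [1, 5, 10, 50, 90, 95, 99]
levels = np.percentile(dens_at_obs, percents)
for p, level in zip(percents, levels):
    share_below = np.mean(dens_at_obs < level)
    print(f"{p:>3}%  level={level:.6f}  share of observations below={share_below:.3f}")
```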
You can also request Percentiles or Levels tables with specific percentiles. For example, the following statements request three density levels and five percentiles:
proc kde data=bivnormal;
   bivar x y / levels=2.5, 50, 97.5
               percentiles=2.5, 25, 50, 75, 97.5;
run;
The KDE Procedure

Percentiles

Percent        x        y
    2.5    -6.17    -6.31
   25.0    -2.24    -2.30
   50.0     0.11    0.058
   75.0     2.22     2.21
   97.5     6.03     5.94
Levels

                          Lower    Upper    Lower    Upper
Percent      Density      for x    for x    for y    for y
    2.5     0.001914      -7.79     8.11    -7.79     7.74
   50.0      0.01591      -3.64     3.96    -3.58     3.86
   97.5      0.02573      -0.88     0.50    -0.99     0.30
You can create a SAS data set containing the kernel density estimate by specifying the OUT= option. Using the same 1000 simulated observations from a bivariate normal density as in the Getting Started section on page 1993, you can submit the following statements:
proc kde data=bivnormal;
   bivar x y / levels out=MyOut;
run;
The output data set MyOut from this analysis contains 3600 observations, one for each point of the grid on which the kernel density estimate is evaluated. The variables value1 and value2 of this data set contain the grid values of the x and y variables, respectively, and the variable density contains the kernel density estimate. You can generate surface and contour plots of this estimate by using SAS/GRAPH as follows:
proc g3d data=MyOut;
   plot value2*value1=density;
run;

proc gcontour data=MyOut;
   plot value2*value1=density;
run;
Output 36.5.1 and Output 36.5.2 display these plots.
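The layout of MyOut (one row per grid point, with the two grid coordinates and the density) can be mimicked outside SAS. The following Python sketch evaluates a product Gaussian estimate on a rectangular grid and returns an array whose columns play the roles of value1, value2, and density. The 60 points per axis are an assumption chosen so that 60 x 60 matches the 3600 rows reported above; the data, bandwidths, and grid limits are hypothetical, and the function name kde_grid_table is hypothetical.

```python
import numpy as np

def kde_grid_table(x, y, hx, hy, ngrid=60, pad=0.0):
    """Return an (ngrid*ngrid, 3) array with columns value1, value2, density.

    The grid spans the data range (optionally padded); each grid point gets one row.
    """
    gx = np.linspace(x.min() - pad, x.max() + pad, ngrid)
    gy = np.linspace(y.min() - pad, y.max() + pad, ngrid)
    v1, v2 = np.meshgrid(gx, gy, indexing="ij")
    v1, v2 = v1.ravel(), v2.ravel()
    zx = (v1[:, None] - x[None, :]) / hx
    zy = (v2[:, None] - y[None, :]) / hy
    dens = np.exp(-0.5 * (zx**2 + zy**2)).sum(axis=1) / (len(x) * 2 * np.pi * hx * hy)
    return np.column_stack([v1, v2, dens])

rng = np.random.default_rng(2)
x, y = rng.normal(size=200), rng.normal(size=200)   # hypothetical data
table = kde_grid_table(x, y, hx=0.4, hy=0.4)
print(table.shape)   # (3600, 3)
```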
Levels

                          Lower    Upper    Lower    Upper
Percent      Density      for x    for x    for y    for y
      1     0.001181      -8.14     8.45    -8.76     8.39
      5     0.003031      -7.10     7.07    -7.14     6.77
     10     0.004989      -6.41     5.69    -6.49     6.12
     50      0.01591      -3.64     3.96    -3.58     3.86
     90      0.02388      -1.22     1.19    -1.32     0.95
     95      0.02525      -0.88     0.50    -0.99     0.62
     99      0.02608      -0.53     0.16    -0.67     0.30
    100      0.02629      -0.19    -0.19    -0.35    -0.35
The Levels table lists contours of the density corresponding to percentiles of the bivariate data, and the minimum and maximum values of each variable on those contours. For example, 5% of the observed data have a density value less than 0.0030. You can use the values in the Density column of this table with PROC GCONTOUR to plot the contours corresponding to the 1, 5, 10, 50, 90, 95, and 99 percent levels of the density; this plot is displayed in Output 36.5.3.
proc gcontour data=MyOut;
   plot value2*value1=density / levels=0.0012 0.0030 0.0050 0.0159
                                       0.0239 0.0253 0.0261;
run;
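The next-to-outermost contour of Output 36.5.3 is the 5% level (density 0.0030). Because only 5% of the observations have a lower density value, this contour encloses approximately 95% of the data and represents an approximate 95% ellipsoid for x and y.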
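The LEVELS= values in the PROC GCONTOUR step above appear to be the Density column of the Levels table rounded to four decimal places, rounding halves up (0.02525 becomes 0.0253). A short Python check of that correspondence, using decimal arithmetic so that the half-up rounding is exact:

```python
from decimal import Decimal, ROUND_HALF_UP

# Density column of the Levels table, keyed by percent.
levels_table = {
    1: "0.001181", 5: "0.003031", 10: "0.004989", 50: "0.01591",
    90: "0.02388", 95: "0.02525", 99: "0.02608",
}

# Round each density to four decimals (half up) to obtain contour values for LEVELS=.
contour_values = [
    Decimal(v).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
    for v in levels_table.values()
]
print(" ".join(str(c) for c in contour_values))
# 0.0012 0.0030 0.0050 0.0159 0.0239 0.0253 0.0261
```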
This is a continuation of Example 36.1, used here to illustrate the experimental ODS graphics. The following statements request the available univariate plots in PROC KDE.
ods html;
ods graphics on;

proc kde data=channel;
   univar length / plots=density histogram histdensity;
run;

ods graphics off;
ods html close;
Output 36.6.1, Output 36.6.2, and Output 36.6.3 show a histogram, a kernel density estimate, and a histogram with an overlaid kernel density estimate, respectively. These graphical displays are requested by specifying the experimental ODS GRAPHICS statement and the experimental PLOTS= option in the UNIVAR statement. For general information about ODS graphics, see Chapter 15, Statistical Graphics Using ODS. For specific information about the graphics available in the KDE procedure, see the ODS Graphics section on page 2009.
This example illustrates the available bivariate graphics in PROC KDE. The octane data set comes from Rodriguez and Taniguchi (1980), where it is used for predicting customer octane satisfaction from trained-rater observations. The variables in this data set are Rater and Customer. Either variable may have missing values. Refer to the file kdex3.sas in the SAS Sample Library.
data octane;
   input Rater Customer;
   label Rater    = 'Rater'
         Customer = 'Customer';
   datalines;
94.5 92.0
94.0 88.0
94.0 90.0
   . . .
93.0 87.0
88.0 84.0
 .H  90.0
;
The following statements request all the available bivariate plots in PROC KDE.
ods html;
ods graphics on;

proc kde data=octane;
   bivar Rater Customer / plots=all;
run;

ods graphics off;
ods html close;
Output 36.7.1 shows a scatter plot of the data, Output 36.7.2 shows a bivariate histogram of the data, Output 36.7.3 shows a contour plot of the bivariate density estimate, Output 36.7.4 shows a contour plot of the bivariate density estimate overlaid with a scatter plot of the data, Output 36.7.5 shows a surface plot of the bivariate kernel density estimate, and Output 36.7.6 shows a bivariate histogram overlaid with the bivariate kernel density estimate. These graphical displays are requested by specifying the experimental ODS GRAPHICS statement and the experimental PLOTS= option in the BIVAR statement. For general information about ODS graphics, see Chapter 15, Statistical Graphics Using ODS. For specific information about the graphics available in the KDE procedure, see the ODS Graphics section on page 2009.