Getting Started


The following example demonstrates how you can use the TPSPLINE procedure to fit a semiparametric model.

Suppose that y is a continuous variable and x1 and x2 are two explanatory variables of interest. To fit a smoothing spline model, you can use a MODEL statement similar to that used in many regression procedures in the SAS System.

  proc tpspline;   model y = (x1 x2);   run;  

The TPSPLINE procedure can fit semiparametric models; the parentheses in the preceding MODEL statement separates the smoothing variables from the regression variables. The following statements illustrates this syntax.

  proc tpspline;   modely = x3 (x1 x2);   run;  

This model assumes a linear relation with x3 and an unknown functional relation with x1 and x2 .

If you want to fit several responses using the same explanatory variables, you can save computation time by using the multiple responses feature in the MODEL statement. For example, if y1 and y2 are two response variables, the following MODEL statement can be used to fit two models. Separate analyses are then performed for each response variable.

  proc tpspline;   model y1 y2 = (x1 x2);   run;  

The following example illustrates the use of PROC TPSPLINE. The data are from Bates, Lindstrom, Wahba, and Yandell (1987).

  data Measure;   input x1 x2 y @@;   datalines;     1.0   1.0   15.54483570   1.0   1.0    15.76312613     .5   1.0   18.67397826   .5   1.0    18.49722167   .0   1.0   19.66086310     .0   1.0    19.80231311   .5   1.0   18.59838649     .5   1.0    18.51904737   1.0   1.0   15.86842815    1.0   1.0    16.03913832     1.0   .5   10.92383867   1.0   .5    11.14066546     .5   .5   14.81392847   .5   .5    14.82830425   .0   .5   16.56449698     .0   .5    16.44307297   .5   .5   14.90792284     .5   .5    15.05653924   1.0   .5   10.91956264    1.0   .5    10.94227538     1.0   .0    9.61492010   1.0   .0     9.64648093     .5   .0   14.03133439   .5   .0    14.03122345   .0   .0   15.77400253     .0   .0    16.00412514   .5   .0   13.99627680     .5   .0    14.02826553   1.0   .0    9.55700164    1.0   .0     9.58467047     1.0   .5   11.20625177   1.0   .5    11.08651907     .5   .5   14.83723493   .5   .5    14.99369172   .0   .5   16.55494349     .0   .5    16.51294369   .5   .5   14.98448603     .5   .5    14.71816070   1.0   .5   11.14575565    1.0   .5    11.17168689     1.0  1.0   15.82595514   1.0  1.0    15.96022497     .5  1.0   18.64014953   .5  1.0    18.56095997   .0  1.0   19.54375504     .0  1.0   19.80902641   .5  1.0   18.56884576     .5  1.0   18.61010439   1.0  1.0   15.86586951    1.0  1.0   15.90136745   ;  

The dataset Measure contains three variables x1 , x2 ,and y . Suppose that you want to fit a surface by using the variables x1 and x2 to model the response y . The variables x1 and x2 are spaced evenly on a [ ˆ’ 1 — 1] — [ ˆ’ 1 — 1] square, and the response y is generated by adding a random error to a function f ( x 1 ,x 2). The raw data are plotted using the G3D procedure. In order to plot those replicates, the data are jittered a little bit.

  data Measure1;   set Measure;   run;   proc sort data=Measure1;   by x2 x1;   run;   data measure1;   set measure1; by x1;   if last.x1 then x1=x1+0.00001;   run;   proc g3d data=Measure1;   scatter x2*x1=y /size=.5   zmin=9 zmax=21   zticknum=4;   title "Raw Data";   run;  

Figure 74.1 displays the raw data.

click to expand
Figure 74.1: Plot of Data Set MEASURE

The following statements invoke the TPSPLINE procedure, using the Measure data set as input. In the MODEL statement, the x1 and x2 variables are listed as smoothing variables. The LOGNLAMBDA= option returns a list of GCV values with log 10 ( n » ) ranging from ˆ’ 4 to ˆ’ 2. The OUTPUT statement creates the data set estimate to contain the predicted values and the 95% upper and lower confidence limits.

  proc tpspline data=Measure;   model y=(x1 x2) /lognlambda=(   4 to   2 by 0.1);   output out=estimate pred uclm lclm;   run;   proc print data=estimate;   run;  

The results of this analysis are displayed in the following figures. Figure 74.2 shows that the data set Measure contains 50 observations with 25 unique design points. The GCV values are listed along with the log 10 of n » . The value of log 10 ( n » ) that minimizes the GCV function is around ˆ’ 3 . 5.Thefinal thin-plate smoothing spline estimate is based on LOGNLAMBDA = ˆ’ 3.4762. The residual sum of squares is 0.246110, and the degrees of freedom is 24.593203. The standard deviation, defined as RSS/(Tr(I-A)), is 0.098421. The predictions and 95% confidence limits are displayed in Figure 74.3.

start figure
  Estimates from Proc TPSPLINE   Obs     x1      x2        y         P_y       LCLM_y     UCLM_y   1   1.0   1.0    15.5448    15.6474    15.5115    15.7832   2   1.0   1.0    15.7631    15.6474    15.5115    15.7832   3   0.5   1.0    18.6740    18.5783    18.4430    18.7136   4   0.5   1.0    18.4972    18.5783    18.4430    18.7136   5     0.0   1.0    19.6609    19.7270    19.5917    19.8622   6     0.0   1.0    19.8023    19.7270    19.5917    19.8622   7     0.5   1.0    18.5984    18.5552    18.4199    18.6905   8     0.5   1.0    18.5190    18.5552    18.4199    18.6905   9     1.0   1.0    15.8684    15.9436    15.8077    16.0794   10     1.0   1.0    16.0391    15.9436    15.8077    16.0794   11   1.0   0.5    10.9238    11.0467    10.9114    11.1820   12   1.0   0.5    11.1407    11.0467    10.9114    11.1820   13   0.5   0.5    14.8139    14.8246    14.6896    14.9597   14   0.5   0.5    14.8283    14.8246    14.6896    14.9597   15     0.0   0.5    16.5645    16.5102    16.3752    16.6452   16     0.0   0.5    16.4431    16.5102    16.3752    16.6452   17     0.5   0.5    14.9079    14.9812    14.8461    15.1162   18     0.5   0.5    15.0565    14.9812    14.8461    15.1162   19     1.0   0.5    10.9196    10.9497    10.8144    11.0850   20     1.0   0.5    10.9423    10.9497    10.8144    11.0850   21   1.0     0.0     9.6149     9.6372     9.5019     9.7724   22   1.0     0.0     9.6465     9.6372     9.5019     9.7724   23   0.5     0.0    14.0313    14.0188    13.8838    14.1538   24   0.5     0.0    14.0312    14.0188    13.8838    14.1538   25     0.0     0.0    15.7740    15.8822    15.7472    16.0171   26     0.0     0.0    16.0041    15.8822    15.7472    16.0171   27     0.5     0.0    13.9963    14.0006    13.8656    14.1356   28     0.5     0.0    14.0283    14.0006    13.8656    14.1356   29     1.0     0.0     9.5570     9.5769     9.4417     9.7122   30     1.0     0.0     9.5847     9.5769     9.4417     9.7122   31   1.0     0.5    11.2063    11.1614    11.0261    11.2967   32   1.0     0.5    11.0865    11.1614    11.0261    11.2967   33   0.5     0.5    14.8372    14.9182    14.7831    15.0532   34   0.5     0.5    14.9937    14.9182    14.7831    15.0532   35     0.0     0.5    16.5549    16.5386    16.4036    16.6736   36     0.0     0.5    16.5129    16.5386    16.4036    16.6736   37     0.5     0.5    14.9845    14.8549    14.7199    14.9900   38     0.5     0.5    14.7182    14.8549    14.7199    14.9900   39     1.0     0.5    11.1458    11.1727    11.0374    11.3080   40     1.0     0.5    11.1717    11.1727    11.0374    11.3080   41   1.0     1.0    15.8260    15.8851    15.7493    16.0210   42   1.0     1.0    15.9602    15.8851    15.7493    16.0210   43   0.5     1.0    18.6401    18.5946    18.4593    18.7299   44   0.5     1.0    18.5610    18.5946    18.4593    18.7299   45     0.0     1.0    19.5438    19.6729    19.5376    19.8081   46     0.0     1.0    19.8090    19.6729    19.5376    19.8081   47     0.5     1.0    18.5688    18.5832    18.4478    18.7185   48     0.5     1.0    18.6101    18.5832    18.4478    18.7185   49     1.0     1.0    15.8659    15.8761    15.7402    16.0120   50     1.0     1.0    15.9014    15.8761    15.7402    16.0120  
end figure

Figure 74.3: Data Set ESTIMATE
  The TPSPLINE Procedure   Dependent Variable: y   GCV Function   log10(n*Lambda)             GCV     4.000000        0.019215     3.900000        0.019183     3.800000        0.019148     3.700000        0.019113     3.600000        0.019082     3.500000        0.019064*     3.400000        0.019074     3.300000        0.019135     3.200000        0.019286     3.100000        0.019584     3.000000        0.020117     2.900000        0.021015     2.800000        0.022462     2.700000        0.024718     2.600000        0.028132     2.500000        0.033165     2.400000        0.040411     2.300000        0.050614     2.200000        0.064699     2.100000        0.083813     2.000000        0.109387   Note: * indicates minimum GCV value.   Summary Statistics   of Final Estimation   log10(n*Lambda)   3.4762   Smoothing Penalty        2558.1432   Residual SS                 0.2461   Tr(I-A)                    25.4068   Model DF                   24.5932   Standard Deviation          0.0984  

The fitted surface is plotted with PROC G3D as follows .

  proc g3d data=estimate;   plot x2*x1=p_y/grid   zmin=9 zmax=21 zticknum=4;   title 'Plot of Fitted Surface';   run;  

The resulting plot is displayed in Figure 74.4.

click to expand
Figure 74.4: Plot of TPSPLINE Fit of Data Set Measure

Because the data in data set Measure are very sparse, the fitted surface is not smooth. To produce a smoother surface, the following statements generate the data set Pred in order to obtain a finer grid. The SCORE statement evaluates the fitted surface at those new design points.

  data pred;   do x1=   1 to 1 by 0.1;   do x2=   1 to 1 by 0.1;   output;   end;   end;   run;   proc tpspline data=measure;   model y=(x1 x2)/lognlambda=(   4 to   2 by 0.1);   score data=pred out=predy;   run;   proc g3d data=predy;   plot x2*x1=p_y/grid   zmin=9 zmax=21 zticknum=4;   title 'Plot of Fitted Surface on a Fine Grid';   run;  

The surface plot based on the finer grid is displayed in Figure 74.5. The plot shows that a parametric model with quadratic terms of x1 and x2 provides a reasonable fit to the data.

click to expand
Figure 74.5: Plot of TPSPLINE fit



SAS.STAT 9.1 Users Guide (Vol. 6)
SAS.STAT 9.1 Users Guide (Vol. 6)
ISBN: N/A
EAN: N/A
Year: 2004
Pages: 127

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net