Getting Started


In the following example, PROC PRINQUAL uses the MTV method. Suppose that the problem is to linearize a curve through three-dimensional space. Let

   X1 = X^3
   X2 = X1 - X^5
   X3 = X2 - X^6

where X = -1.00, -0.98, -0.96, ..., 1.00.

These three variables define a curve in three-dimensional space. The GPLOT procedure is used to display two-dimensional views of this curve. These data are completely described by three linear components, but they define a single curve, which could be described as a single nonlinear component.

PROC PRINQUAL is used to attempt to straighten the curve into a one-dimensional line with a continuous transformation of each variable. The N=1 option in the PROC PRINQUAL statement requests one principal component. The TRANSFORM statement requests a cubic spline transformation with nine knots.

Splines are curves, which are usually required to be continuous and smooth. Splines are usually defined as piecewise polynomials of degree n with function values and first n - 1 derivatives that agree at the points where they join. The abscissa values of the join points are called knots. The term spline is also used for polynomials (splines with no knots) and piecewise polynomials with more than one discontinuous derivative.

Splines with no knots are generally smoother than splines with knots, which are generally smoother than splines with multiple discontinuous derivatives. Splines with few knots are generally smoother than splines with many knots; however, increasing the number of knots usually increases the fit of the spline function to the data. Knots give the curve freedom to bend to more closely follow the data. Refer to Smith (1979) for an excellent introduction to splines. For another example of using splines, see Example 75.1 in Chapter 75, "The TRANSREG Procedure."
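The piecewise-polynomial definition above can be made concrete with a small DATA step. The following is only an illustrative sketch: it writes a cubic spline with a single knot in truncated power form, which is convenient for exposition and is not claimed to be the basis that PROC PRINQUAL uses internally. The data set name SplineDemo, the knot location, and the coefficients are all hypothetical.

```sas
* Illustrative sketch only: a cubic spline with one knot at zero.
  The knot location and all coefficients are hypothetical;
data SplineDemo;
   knot = 0;
   do x = -1 to 1 by 0.02;
      * Ordinary cubic polynomial part, that is, a spline with no knots;
      poly = 1 + 0.5*x - x**2 + 0.25*x**3;
      * Truncated cubic term: zero to the left of the knot, so the function
        value and the first two derivatives agree where the pieces join;
      bend = max(x - knot, 0)**3;
      y = poly + 2*bend;
      output;
   end;
run;
```

Each additional knot would add one more truncated term of the same form, which is the extra freedom to bend that the text describes.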

One component accounts for 71 percent of the variance of the untransformed data, and after 45 iterations, over 98 percent of the variance of the transformed data is accounted for by one component (see Figure 59.2). Note that the algorithm would not have converged with 50 iterations and the default convergence criterion, so more iterations may be needed for this problem.
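As a rough sketch of how these statements could be followed up, the first step below runs PROC PRINCOMP on the untransformed variables with the COV option, which should give a first-component proportion comparable to the first line of the iteration history, assuming both analyses are based on the same covariance matrix. The second step reruns PROC PRINQUAL without the CONVERGE= option and with a larger iteration limit. The value MAXITER=500 is an arbitrary assumption, not a recommendation, and both steps assume the data set X created by the DATA step in this section.

```sas
* Cross-check the variance accounted for before any transformation;
proc princomp data=X cov n=1;
   var X1-X3;
run;

* Rerun with the default convergence criterion and more iterations.
  The MAXITER value here is an arbitrary assumption;
proc prinqual data=X n=1 maxiter=500 covariance;
   title 'Iteratively Derive Variable Transformations';
   transform spline(X1-X3 / nknots=9);
run;
```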

                 Iteratively Derive Variable Transformations

                           The PRINQUAL Procedure

                  PRINQUAL MTV Algorithm Iteration History

   Iteration    Average    Maximum     Proportion    Criterion
      Number     Change     Change    of Variance       Change    Note
   ---------------------------------------------------------------------------
           1    0.16253    1.33045        0.71369
           2    0.07871    0.94549        0.79035      0.07667
           3    0.06518    0.80219        0.86334      0.07299
           4    0.05322    0.57928        0.91379      0.05045
           5    0.04154    0.38404        0.94204      0.02825
           6    0.03181    0.24391        0.95640      0.01436
           7    0.02461    0.15397        0.96349      0.00709
           8    0.01982    0.10205        0.96704      0.00355
           9    0.01662    0.07393        0.96894      0.00189
          10    0.01439    0.06232        0.97005      0.00112
          11    0.01288    0.05436        0.97081      0.00075
          12    0.01189    0.04911        0.97139      0.00058
          13    0.01119    0.04531        0.97188      0.00049
          14    0.01068    0.04276        0.97232      0.00044
          15    0.01027    0.04115        0.97273      0.00041
          16    0.00993    0.04039        0.97313      0.00040
          17    0.00965    0.04249        0.97351      0.00038
          18    0.00940    0.04400        0.97388      0.00037
          19    0.00919    0.04509        0.97423      0.00036
          20    0.00900    0.04587        0.97458      0.00034
          21    0.00883    0.04643        0.97491      0.00033
          22    0.00867    0.04681        0.97523      0.00032
          23    0.00852    0.04705        0.97555      0.00031
          24    0.00839    0.04719        0.97585      0.00031
          25    0.00827    0.04724        0.97615      0.00030
          26    0.00816    0.04722        0.97644      0.00029
          27    0.00805    0.04713        0.97672      0.00028
          28    0.00795    0.04699        0.97700      0.00027
          29    0.00785    0.04680        0.97726      0.00027
          30    0.00776    0.04656        0.97752      0.00026
          31    0.00768    0.04629        0.97777      0.00025
          32    0.00760    0.04598        0.97802      0.00025
          33    0.00752    0.04564        0.97826      0.00024
          34    0.00745    0.04528        0.97849      0.00023
          35    0.00739    0.04489        0.97872      0.00023
          36    0.00733    0.04448        0.97894      0.00022
          37    0.00729    0.04405        0.97915      0.00022
          38    0.00724    0.04361        0.97936      0.00021
          39    0.00720    0.04315        0.97957      0.00021
          40    0.00716    0.04268        0.97977      0.00020
          41    0.00713    0.04219        0.97997      0.00020
          42    0.00709    0.04170        0.98016      0.00019
          43    0.00706    0.04120        0.98035      0.00019
          44    0.00703    0.04070        0.98054      0.00019
          45    0.00699    0.04019        0.98072      0.00018    Converged

   Algorithm converged.

Figure 59.2: PROC PRINQUAL MTV Iteration History

PROC PRINQUAL creates an output data set (which is not displayed) that contains both the original and transformed variables. The original variables have the names X1, X2, and X3. Transformed variables are named TX1, TX2, and TX3. All observations in the output data set have _TYPE_='SCORE', since the CORRELATIONS option is not specified in the PROC PRINQUAL statement. The GPLOT procedure uses this output data set and displays the nonlinear transformations of all three variables and the nearly one-dimensional scatter plot (see Figure 59.3 and Figure 59.4).
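To look at the output data set directly, one option is to name it with the OUT= option and print a few observations. This is a minimal sketch that assumes the data set X created by the DATA step in this section; the data set name Results and the choice of five observations are hypothetical.

```sas
* Name the output data set so that later steps can refer to it.
  The name Results is an assumption made for this sketch;
proc prinqual data=X out=Results n=1 maxiter=50 covariance converge=0.007;
   transform spline(X1-X3 / nknots=9);
run;

* Show the original and transformed variables side by side;
proc print data=Results(obs=5);
   var _TYPE_ X1-X3 TX1-TX3;
run;
```

Naming the data set explicitly also avoids relying on the most recently created data set in later PROC GPLOT steps.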

Figure 59.3: Variable Transformation Plots
Figure 59.4: Plots of the Nearly One-Dimensional Curve

PROC PRINQUAL tries to project each variable on the first principal component. Notice that the curve in this example is closer to a circle than to a function from some views (see the plot of X3 vs. X2 in Figure 59.1) and that the first component does not run approximately from one end point of the curve to the other (see Figure 59.4). Since the curve has these characteristics, PROC PRINQUAL linearizes the scatter plot by collapsing the scatter around the principal axis, not by straightening the curve into a single line. PROC PRINQUAL would straighten simpler curves.

Figure 59.1: Three-Dimensional Curve Example Output

The following statements produce Figure 59.1 through Figure 59.4:

   * Generate a Three-Dimensional Curve;
   data X;
      do X = -1 to 1 by 0.02;
         X1 =      X ** 3;
         X2 = X1 - X ** 5;
         X3 = X2 - X ** 6;
         output;
      end;
      drop X;
   run;

   goptions goutmode=replace nodisplay;
   %let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;
   * Depending on your goptions, these plot options may work better:
   * %let opts = haxis=axis2 vaxis=axis1 frame;

   proc gplot data=X;
      title;
      axis1 minor=none label=(angle=90 rotate=0) order=(-1 to 1);
      axis2 minor=none order=(-1 to 1);
      plot X1*X2 / &opts name='prqin1';
      plot X3*X2 / &opts name='prqin2' vreverse;
      plot X1*X3 / &opts name='prqin3';
      symbol1 color=blue;
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:prqin1 2:prqin2 3:prqin3;
   run; quit;

   * Try to Straighten the Curve;
   proc prinqual data=X n=1 maxiter=50 covariance converge=0.007;
      title 'Iteratively Derive Variable Transformations';
      transform spline(X1-X3 / nknots=9);
   run;

   * Plot the Transformations;
   goptions nodisplay;
   proc gplot;
      title;
      axis1 minor=none label=(angle=90 rotate=0);
      axis2 minor=none;
      plot TX1*X1 / &opts name='prqin4';
      plot TX2*X2 / &opts name='prqin5';
      plot TX3*X3 / &opts name='prqin6';
      symbol1 color=blue;
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:prqin4 2:prqin6 3:prqin5;
   run; quit;

   * Plot the Straightened Scatter Plot;
   goptions nodisplay;
   proc gplot;
      axis1 minor=none label=(angle=90 rotate=0) order=(-1 to 1);
      axis2 minor=none order=(-1 to 1);
      plot TX1*TX2 / &opts name='prqin7';
      plot TX3*TX2 / &opts name='prqin8' vreverse;
      plot TX1*TX3 / &opts name='prqin9';
      symbol1 color=blue;
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:prqin7 2:prqin8 3:prqin9;
   run; quit;



SAS.STAT 9.1 Users Guide (Vol. 5)
Year: 2004
Pages: 98
