Examples


Example 56.1. Examining Model Details

The following example, from Umetrics (1995), demonstrates different ways to examine a PLS model. The data come from the field of drug discovery. New drugs are developed from chemicals that are biologically active. Testing a compound for biological activity is an expensive procedure, so it is useful to be able to predict biological activity from cheaper chemical measurements. In fact, computational chemistry makes it possible to calculate certain chemical measurements without even making the compound. These measurements include size, lipophilicity, and polarity at various sites on the molecule. The following statements create a data set named penta, which contains these data.

   data penta;
      input obsnam $ S1 L1 P1 S2 L2 P2
                     S3 L3 P3 S4 L4 P4
                     S5 L5 P5  log_RAI @@;
      n = _n_;
      datalines;
   VESSK  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701  1.9607 -1.6324  0.5746
           1.9607 -1.6324  0.5746  2.8369  1.4092 -3.1398                    0.00
   VESAK  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701  1.9607 -1.6324  0.5746
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    0.28
   VEASK  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701  0.0744 -1.7333  0.0902
           1.9607 -1.6324  0.5746  2.8369  1.4092 -3.1398                    0.20
   VEAAK  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701  0.0744 -1.7333  0.0902
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    0.51
   VKAAK  -2.6931 -2.5271 -1.2871  2.8369  1.4092 -3.1398  0.0744 -1.7333  0.0902
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    0.11
   VEWAK  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701 -4.7548  3.6521  0.8524
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    2.73
   VEAAP  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701  0.0744 -1.7333  0.0902
           0.0744 -1.7333  0.0902 -1.2201  0.8829  2.2253                    0.18
   VEHAK  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701  2.4064  1.7438  1.1057
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    1.53
   VAAAK  -2.6931 -2.5271 -1.2871  0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    0.10
   GEAAK   2.2261 -5.3648  0.3049  3.0777  0.3891 -0.0701  0.0744 -1.7333  0.0902
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    0.52
   LEAAK  -4.1921 -1.0285 -0.9801  3.0777  0.3891 -0.0701  0.0744 -1.7333  0.0902
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    0.40
   FEAAK  -4.9217  1.2977  0.4473  3.0777  0.3891 -0.0701  0.0744 -1.7333  0.0902
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    0.30
   VEGGK  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701  2.2261 -5.3648  0.3049
           2.2261 -5.3648  0.3049  2.8369  1.4092 -3.1398                    1.00
   VEFAK  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701 -4.9217  1.2977  0.4473
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    1.57
   VELAK  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701 -4.1921 -1.0285 -0.9801
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    0.59
   AAAAA   0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902
           0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902                    0.10
   AAYAA   0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902 -1.3944  2.3230  0.0139
           0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902                    0.46
   AAWAA   0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902 -4.7548  3.6521  0.8524
           0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902                    0.75
   VAWAA  -2.6931 -2.5271 -1.2871  0.0744 -1.7333  0.0902 -4.7548  3.6521  0.8524
           0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902                    1.43
   VAWAK  -2.6931 -2.5271 -1.2871  0.0744 -1.7333  0.0902 -4.7548  3.6521  0.8524
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    1.45
   VKWAA  -2.6931 -2.5271 -1.2871  2.8369  1.4092 -3.1398 -4.7548  3.6521  0.8524
           0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902                    1.71
   VWAAK  -2.6931 -2.5271 -1.2871 -4.7548  3.6521  0.8524  0.0744 -1.7333  0.0902
           0.0744 -1.7333  0.0902  2.8369  1.4092 -3.1398                    0.04
   VAAWK  -2.6931 -2.5271 -1.2871  0.0744 -1.7333  0.0902  0.0744 -1.7333  0.0902
          -4.7548  3.6521  0.8524  2.8369  1.4092 -3.1398                    0.23
   EKWAP   3.0777  0.3891 -0.0701  2.8369  1.4092 -3.1398 -4.7548  3.6521  0.8524
           0.0744 -1.7333  0.0902 -1.2201  0.8829  2.2253                    1.30
   VKWAP  -2.6931 -2.5271 -1.2871  2.8369  1.4092 -3.1398 -4.7548  3.6521  0.8524
           0.0744 -1.7333  0.0902 -1.2201  0.8829  2.2253                    2.35
   RKWAP   2.8827  2.5215 -3.4435  2.8369  1.4092 -3.1398 -4.7548  3.6521  0.8524
           0.0744 -1.7333  0.0902 -1.2201  0.8829  2.2253                    1.98
   VEWVK  -2.6931 -2.5271 -1.2871  3.0777  0.3891 -0.0701 -4.7548  3.6521  0.8524
          -2.6931 -2.5271 -1.2871  2.8369  1.4092 -3.1398                    1.71
   PGFSP  -1.2201  0.8829  2.2253  2.2261 -5.3648  0.3049 -4.9217  1.2977  0.4473
           1.9607 -1.6324  0.5746 -1.2201  0.8829  2.2253                    0.90
   FSPFR  -4.9217  1.2977  0.4473  1.9607 -1.6324  0.5746 -1.2201  0.8829  2.2253
          -4.9217  1.2977  0.4473  2.8827  2.5215 -3.4435                    0.64
   RYLPT   2.8827  2.5215 -3.4435 -1.3944  2.3230  0.0139 -4.1921 -1.0285 -0.9801
          -1.2201  0.8829  2.2253  0.9243 -2.0921 -1.3996                    0.40
   GGGGG   2.2261 -5.3648  0.3049  2.2261 -5.3648  0.3049  2.2261 -5.3648  0.3049
           2.2261 -5.3648  0.3049  2.2261 -5.3648  0.3049                       .
   ;
   data ptrain; set penta; if (n <= 15);
   data ptest ; set penta; if (n >  15);
   run;

You would like to study the relationship between these measurements and the activity of the compound, represented by the logarithm of the relative Bradykinin activating activity (log_RAI). Notice that these data consist of many predictors relative to the number of observations. Partial least squares is especially appropriate in this situation as a useful tool for finding a few underlying predictive factors that account for most of the variation in the response. Typically, the model is fit to part of the data (the training or work set), and the quality of the fit is judged by how well it predicts the other part of the data (the test or prediction set). For this example, the first 15 observations serve as the training set and the rest constitute the test set (refer to Ufkes et al. 1978, 1982).

When you fit a PLS model, you hope to find a few PLS factors that explain most of the variation in both predictors and responses. Factors that explain response variation provide good predictive models for new responses, and factors that explain predictor variation are well represented by the observed values of the predictors. The following statements fit a PLS model with two factors and save predicted values, residuals, and other information for each data point in a data set named outpls .

   proc pls data=ptrain nfac=2;
      model log_RAI = S1-S5 L1-L5 P1-P5;
      output out=outpls predicted = yhat1
                        yresidual = yres1
                        xresidual = xres1-xres15
                        xscore    = xscr
                        yscore    = yscr;
   run;

The PLS procedure displays a table, shown in Output 56.1.1, showing how much predictor and response variation is explained by each PLS factor.

Output 56.1.1: Amount of Training Set Variation Explained
start example
                            The PLS Procedure

                  Percent Variation Accounted for
                  by Partial Least Squares Factors

      Number of
      Extracted        Model Effects        Dependent Variables
        Factors     Current       Total     Current       Total
              1     16.9014     16.9014     89.6399     89.6399
              2     12.7721     29.6735      7.8368     97.4767
end example
 

From Output 56.1.1, note that 97% of the response variation is already explained, but only 29% of the predictor variation is explained.

Partial least squares algorithms choose successive orthogonal factors that maximize the covariance between each X-score and the corresponding Y-score. For a good PLS model, the first few factors show a high correlation between the X- and Y-scores. The correlation usually decreases from one factor to the next. You can plot the X-scores versus the Y-scores for the first PLS factor using the following SAS statements.

   %let ifac = 1;
   data pltanno; set outpls;
      length text $ 2;
      retain function 'label' position '5' hsys '3' xsys '2' ysys '2'
             color 'blue' style 'swissb';
      text=left(put(n,2.)); x=xscr&ifac; y=yscr&ifac;
   run;
   axis1 label=(angle=270 rotate=90 "Y-score &ifac")
         major=(number=5) minor=none;
   axis2 label=("X-score &ifac") minor=none;
   symbol1 v=none i=none;
   proc gplot data=outpls;
      plot yscr&ifac*xscr&ifac=1
           / anno=pltanno vaxis=axis1 haxis=axis2 frame cframe=ligr;
   run;

By changing the macro variable ifac to 2 instead of 1, you can use the same statements to plot the X-scores versus the Y-scores for the second PLS factor. The resulting plots are shown in Output 56.1.2 and Output 56.1.3. The numbers on the plot represent the observation number in the penta data set.

Output 56.1.2: First X- and Y-scores for Penta-Peptide Model 1
start example
click to expand
end example
 
Output 56.1.3: Second X- and Y-scores for Penta-Peptide Model 1
start example
click to expand
end example
 

For this example, the figures show high correlation between X- and Y-scores for the first factor but somewhat lower correlation for the second factor.
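The covariance-maximizing construction of the first factor can be sketched numerically. The following Python snippet is only an illustration of the idea, not PROC PLS itself; the data are simulated and all names are our own. For a single centered response, the first X-weight vector is proportional to X'y, which makes the resulting X-score have maximal covariance with the response.

```python
import numpy as np

# Toy illustration of the first PLS factor (not PROC PLS; simulated data)
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 5))                 # 15 observations, 5 predictors
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=15)

Xc = X - X.mean(axis=0)                      # center (PROC PLS also scales)
yc = y - y.mean()

w = Xc.T @ yc                                # w proportional to X'y
w /= np.linalg.norm(w)                       # normalize the weight vector
t = Xc @ w                                   # X-score for factor 1

# For a good first factor, the X-score correlates highly with the response
r = np.corrcoef(t, yc)[0, 1]
```

Plotting t against yc for such data would show the same tight first-factor scatter seen in Output 56.1.2.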

You can also plot the X-scores against each other to look for irregularities in the data. You should look for patterns or clearly grouped observations. If you see a curved pattern, for example, you may want to add a quadratic term. Two or more groupings of observations indicate that it might be better to analyze the groups separately, perhaps by including classification effects in the model. The following SAS statements produce a plot of the first and second X-scores:

   data pltanno; set outpls;
      length text $ 2;
      retain function 'label' position '5' hsys '3' xsys '2' ysys '2'
             color 'blue' style 'swissb';
      text=left(put(n,2.)); x=xscr1; y=xscr2;
   run;
   axis1 label=(angle=270 rotate=90 "X-score 2")
         major=(number=5) minor=none;
   axis2 label=("X-score 1") minor=none;
   symbol1 v=none i=none;
   proc gplot data=outpls;
      plot xscr2*xscr1=1
           / anno=pltanno vaxis=axis1 haxis=axis2 frame cframe=ligr;
   run;

The plot is shown in Output 56.1.4.

Output 56.1.4: First and Second X-scores for Penta-Peptide Model 1
start example
click to expand
end example
 

This plot appears to show most of the observations close together, with a few being more spread out with larger positive X-scores for factor 2. There are no clear grouping patterns, but observation 13 stands out; note that this observation is the most extreme on all three plots so far. This observation may be overly influential in the PLS analysis; thus, you should check to make sure it is reliable.

Plots of the weights give the directions toward which each PLS factor projects. They show which predictors are most represented in each factor. Those predictors with small weights in absolute value are less important than those with large weights.

You can use the DETAILS option in the PROC PLS statement to display various model details, including the X-weights. You can then use the ODS statement to send the weights to an output data set, as follows:

   ods output XWeights=xweights;
   proc pls data=ptrain nfac=2 details;
      model log_RAI = S1-S5 L1-L5 P1-P5;
   run;

Once the X-weights are in a data set, you can use the following statements to plot the weights for the first two PLS factors against one another:

   proc transpose data=xweights(drop=NumberOfFactors InnerRegCoef)
                  out=xweights;
   data xweights; set xweights;
      rename col1=w1 col2=w2;
   data wt_anno; set xweights;
      length text $ 2;
      retain function 'label'
             position '5'
             hsys     '3'
             xsys     '2'
             ysys     '2'
             color    'blue'
             style    'swissb';
      text=_name_; x=w1; y=w2;
   run;
   axis1 label=(angle=270 rotate=90 "X-weight 2")
         major=(number=5) minor=none;
   axis2 label=("X-weight 1") minor=none;
   symbol1 v=none i=none;
   proc gplot data=xweights;
      plot w2*w1=1 / anno=wt_anno vaxis=axis1
                     haxis=axis2 frame cframe=ligr;
   run; quit;

The plot of the X-weights is shown in Output 56.1.5.

Output 56.1.5: First and Second X-weights for Penta-Peptide Model 1
start example
click to expand
end example
 

The weights plot shows a cluster of X-variables that are weighted at nearly zero for both factors. These variables add little to the model fit, and removing them may improve the model's predictive capability.

To explore further which predictors can be eliminated from the analysis, you can look at the regression coefficients for the standardized data. Predictors with small coefficients (in absolute value) make a small contribution to the response prediction. Another statistic summarizing the contribution a variable makes to the model is the Variable Importance for Projection (VIP) of Wold (1994). Whereas the regression coefficients represent the importance each predictor has in the prediction of just the response, the VIP represents the value of each predictor in fitting the PLS model for both predictors and response. If a predictor has a relatively small coefficient (in absolute value) and a small value of VIP, then it is a prime candidate for deletion. Wold in Umetrics (1995) considers a value less than 0.8 to be small for the VIP. The following statements produce coefficients and the VIP.

   /*
   /  Put coefficients, weights, and R**2's into data sets.
   /-------------------------------------------------------*/
   ods listing close;
   ods output PercentVariation = pctvar
              XWeights         = xweights
              CenScaleParms    = solution;
   proc pls data=ptrain nfac=2 details;
      model log_RAI = S1 L1 P1
                      S2 L2 P2
                      S3 L3 P3
                      S4 L4 P4
                      S5 L5 P5 / solution;
   run;
   ods listing;

   /*
   /  Just reformat the coefficients.
   /-------------------------------------------------------*/
   data solution; set solution;
      format log_RAI 8.5;
      if (RowName = 'Intercept') then delete;
      rename RowName = Predictor log_RAI = B;
   run;

   /*
   /  Transpose weights and R**2's.
   /-------------------------------------------------------*/
   data xweights; set xweights; _name_='W'||trim(left(_n_));
   data pctvar  ; set pctvar  ; _name_='R'||trim(left(_n_));
   proc transpose data=xweights(drop=NumberOfFactors InnerRegCoef)
                  out=xweights;
   proc transpose data=pctvar(keep=_name_ CurrentYVariation)
                  out=pctvar;
   run;

   /*
   /  Sum the normalized squared weights times the
   /  normalized R**2's.  The VIP is defined as the square
   /  root of this weighted average times the number of
   /  predictors.
   /-------------------------------------------------------*/
   proc sql;
      create table vip as
         select *,
                w1/sqrt(uss(w1)) as wnorm1,
                w2/sqrt(uss(w2)) as wnorm2
            from xweights left join pctvar(drop=_name_) on 1;
   data vip; set vip; keep _name_ vip;
      array wnorm{2};
      array r{2};
      VIP = 0;
      do i = 1 to 2;
         VIP = VIP + r{i}*(wnorm{i}**2)/sum(of r1-r2);
      end;
      VIP = sqrt(VIP * 15);
   data vipbpls; merge solution vip(drop=_name_);
   proc print data=vipbpls;
   run;

The output appears in Output 56.1.6.

Output 56.1.6: Estimated PLS Regression Coefficients and VIP (Model 1)
start example
   Obs    Predictor           B        VIP

     1       S1         -0.13831    0.61108
     2       L1          0.05720    0.31822
     3       P1         -0.19064    0.75127
     4       S2          0.12383    0.50482
     5       L2          0.05909    0.27123
     6       P2          0.09361    0.35927
     7       S3         -0.28415    1.57775
     8       L3          0.47131    2.43480
     9       P3          0.26613    1.13222
    10       S4         -0.09145    1.22255
    11       L4          0.12265    1.17994
    12       P4         -0.04878    0.88380
    13       S5          0.03320    0.21288
    14       L5          0.03320    0.21288
    15       P5         -0.03320    0.21288
end example
 

For this data set, the variables L1, L2, P2, S5, L5, and P5 have small absolute coefficients and small VIP. Looking back at the weights plot in Output 56.1.5, you can see that these variables tend to be the ones near zero for both PLS factors. You should consider dropping these variables from the model.
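Numerically, the VIP computation carried out in the SQL and DATA steps above reduces to a weighted average of squared, normalized weights. The following Python sketch restates the formula with invented numbers (2 factors, 3 predictors; these are not the penta results): VIP_j = sqrt( p * sum_a R2_a * wnorm_aj**2 / sum_a R2_a ).

```python
import numpy as np

# Sketch of the VIP formula, with made-up weights and R**2 values
w  = np.array([[0.6,  0.3, 0.1],     # factor 1 X-weights (invented)
               [0.8, -0.2, 0.4]])    # factor 2 X-weights (invented)
r2 = np.array([89.6, 7.8])           # % response variation per factor

# Normalize each factor's weight vector to unit sum of squares
wnorm = w / np.sqrt((w**2).sum(axis=1, keepdims=True))
p = w.shape[1]                       # number of predictors

# Weighted average of squared weights, scaled by p, then square root
vip = np.sqrt(p * (r2[:, None] * wnorm**2).sum(axis=0) / r2.sum())
```

A useful sanity check on any VIP computation: because each normalized weight vector has unit sum of squares, the squared VIPs always sum to the number of predictors, so their average is 1; this is why 0.8 serves as a natural "small" threshold.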

Example 56.2. Examining Outliers

This example is a continuation of Example 56.1.

A PLS model effectively models both the predictors and the responses. In order to check for outliers, you should, therefore, look at the Euclidean distance from each point to the PLS model in both the standardized predictors and the standardized responses. No point should be dramatically farther from the model than the rest. If there is a group of points that are all farther from the model than the rest, they may have something in common, in which case they should be analyzed separately. The following statements compute and plot these distances to the reduced model, dropping variables L1, L2, P2, P4, S5, L5, and P5:

   proc pls data=ptrain nfac=2 noprint;
      model log_RAI = S1    P1
                      S2
                      S3 L3 P3
                      S4 L4   ;
      output out=stdres stdxsse=stdxsse
                        stdysse=stdysse;
   data stdres; set stdres;
      xdist = sqrt(stdxsse);
      ydist = sqrt(stdysse);
   run;
   symbol1 i=needles v=dot c=blue;
   proc gplot data=stdres;
      plot xdist*n=1 / cframe=ligr;
   proc gplot data=stdres;
      plot ydist*n=1 / cframe=ligr;
   run;

The plots are shown in Output 56.2.1 and Output 56.2.2.

Output 56.2.1: Distances from the X-variables to the Model (Training Set)
start example
click to expand
end example
 
Output 56.2.2: Distances from the Y-variables to the Model (Training Set)
start example
click to expand
end example
 

There appear to be no profound outliers in either the predictor space or the response space.
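The xdist and ydist quantities computed above are simply root sums of squared residuals in standardized coordinates. The following Python sketch (the helper name is ours, not part of PROC PLS) shows the computation on invented residuals:

```python
import numpy as np

# Sketch: distance from each observation to the model is the root sum
# of squared standardized residuals, computed per observation
def model_distance(std_resid):
    """std_resid: (n_obs, n_vars) array of standardized residuals."""
    return np.sqrt((np.asarray(std_resid)**2).sum(axis=1))

res = [[0.1, -0.2],     # close to the model
       [0.0,  0.0],     # exactly on the model
       [3.0,  4.0]]     # far from the model
d = model_distance(res)
```

An observation like the third one, with a distance far larger than the rest, would stand out in plots like Output 56.2.1 and 56.2.2.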

Example 56.3. Choosing a PLS Model by Test Set Validation

The following example demonstrates issues in spectrometric calibration. The data (Umetrics 1995) consist of spectrographic readings on 33 samples containing known concentrations of two amino acids, tyrosine and tryptophan. The spectra are measured at 30 frequencies across the overall range of frequencies. For example, Output 56.3.1 shows the observed spectra for three samples, one with only tryptophan, one with only tyrosine, and one with a mixture of the two, all at a total concentration of 10^-6.

Output 56.3.1: Spectra for Three Samples of Tyrosine and Tryptophan
start example
click to expand
end example
 

Of the 33 samples, 18 are used as a training set and 15 as a test set. The data originally appear in McAvoy et al. (1989).

These data were created in a lab, with the concentrations fixed in order to provide a wide range of applicability for the model. You want to use a linear function of the logarithms of the spectra to predict the logarithms of tyrosine and tryptophan concentration, as well as the logarithm of the total concentration. Actually, because of the possibility of zeros in both the responses and the predictors, slightly different transformations are used. The following statements create SAS data sets containing the training and test data, named ftrain and ftest, respectively:

   data ftrain;
      input obsnam $ tot tyr f1-f30 @@;
      try = tot - tyr;
      if (tyr) then tyr_log = log10(tyr); else tyr_log = -8;
      if (try) then try_log = log10(try); else try_log = -8;
      tot_log = log10(tot);
      datalines;
   17mix35 0.00003 0
   -6.215 -5.809 -5.114 -3.963 -2.897 -2.269 -1.675 -1.235
   -0.900 -0.659 -0.497 -0.395 -0.335 -0.315 -0.333 -0.377
   -0.453 -0.549 -0.658 -0.797 -0.878 -0.954 -1.060 -1.266
   -1.520 -1.804 -2.044 -2.269 -2.496 -2.714
   19mix35 0.00003 3E-7
   -5.516 -5.294 -4.823 -3.858 -2.827 -2.249 -1.683 -1.218
   -0.907 -0.658 -0.501 -0.400 -0.345 -0.323 -0.342 -0.387
   -0.461 -0.554 -0.665 -0.803 -0.887 -0.960 -1.072 -1.272
   -1.541 -1.814 -2.058 -2.289 -2.496 -2.712
   21mix35 0.00003 7.5E-7
   -5.519 -5.294 -4.501 -3.863 -2.827 -2.280 -1.716 -1.262
   -0.939 -0.694 -0.536 -0.444 -0.384 -0.369 -0.377 -0.421
   -0.495 -0.596 -0.706 -0.824 -0.917 -0.988 -1.103 -1.294
   -1.565 -1.841 -2.084 -2.320 -2.521 -2.729
   23mix35 0.00003 1.5E-6
   -5.294 -4.705 -4.262 -3.605 -2.726 -2.239 -1.681 -1.250
   -0.925 -0.697 -0.534 -0.437 -0.381 -0.359 -0.369 -0.426
   -0.499 -0.591 -0.701 -0.843 -0.925 -0.989 -1.109 -1.310
   -1.579 -1.852 -2.090 -2.316 -2.521 -2.743
   25mix35 0.00003 3E-6
   -4.600 -4.069 -3.764 -3.262 -2.598 -2.191 -1.680 -1.273
   -0.958 -0.729 -0.573 -0.470 -0.422 -0.407 -0.422 -0.468
   -0.538 -0.639 -0.753 -0.887 -0.968 -1.037 -1.147 -1.357
   -1.619 -1.886 -2.141 -2.359 -2.585 -2.792
   27mix35 0.00003 7.5E-6
   -3.812 -3.376 -3.026 -2.726 -2.249 -1.919 -1.541 -1.198
   -0.951 -0.764 -0.639 -0.570 -0.528 -0.525 -0.550 -0.606
   -0.689 -0.781 -0.909 -1.031 -1.126 -1.191 -1.303 -1.503
   -1.784 -2.058 -2.297 -2.507 -2.727 -2.970
   29mix35 0.00003 0.000015
   -3.053 -2.641 -2.382 -2.194 -1.977 -1.913 -1.728 -1.516
   -1.317 -1.158 -1.029 -0.963 -0.919 -0.915 -0.933 -0.981
   -1.055 -1.157 -1.271 -1.409 -1.505 -1.546 -1.675 -1.880
   -2.140 -2.415 -2.655 -2.879 -3.075 -3.319
   28mix35 0.00003 0.0000225
   -2.626 -2.248 -2.004
-1.839 -1.742 -1.791 -1.786 -1.772   -1.728 -1.666 -1.619 -1.591 -1.575 -1.580 -1.619 -1.671   -1.754 -1.857 -1.982 -2.114 -2.210 -2.258 -2.379 -2.570   -2.858 -3.117 -3.347 -3.568 -3.764 -4.012   26mix35 0.00003 0.000027   -2.370 -1.990 -1.754 -1.624 -1.560 -1.655 -1.772 -1.899   -1.982 -2.074 -2.157 -2.211 -2.267 -2.317 -2.369 -2.460   -2.545 -2.668 -2.807 -2.951 -3.030 -3.075 -3.214 -3.376   -3.685 -3.907 -4.129 -4.335 -4.501 -4.599   24mix35 0.00003 0.0000285   -2.326 -1.952 -1.702 -1.583 -1.507 -1.629 -1.771 -1.945   -2.115 -2.297 -2.448 -2.585 -2.696 -2.808 -2.913 -3.030   -3.163 -3.265 -3.376 -3.534 -3.642 -3.721 -3.858 -4.012   -4.262 -4.501 -4.704 -4.822 -4.956 -5.292   22mix35 0.00003 0.00002925   -2.277 -1.912 -1.677 -1.556 -1.487 -1.630 -1.791 -1.969   -2.203 -2.437 -2.655 -2.844 -3.032 -3.214 -3.378 -3.503   -3.646 -3.812 -3.958 -4.129 -4.193 -4.262 -4.415 -4.501   -4.823 -5.111 -5.113 -5.294 -5.290 -5.294   20mix35 0.00003 0.0000297   -2.266 -1.912 -1.688 -1.546 -1.500 -1.640 -1.801 -2.011   -2.277 -2.545 -2.823 -3.094 -3.376 -3.572 -3.812 -4.012   -4.262 -4.415 -4.501 -4.705 -4.823 -4.823 -4.956 -5.111   -5.111 -5.516 -5.524 -5.806 -5.806 -5.806   18mix35 0.00003 0.00003   -2.258 -1.900 -1.666 -1.524 -1.479 -1.621 -1.803 -2.043   -2.308 -2.626 -2.895 -3.214 -3.568 -3.907 -4.193 -4.423   -4.825 -5.111 -5.111 -5.516 -5.516 -5.516 -5.516 -5.806   -5.806 -5.806 -5.806 -5.806 -6.210 -6.215   trp2    0.0001 0   -5.922 -5.435 -4.366 -3.149 -2.124 -1.392 -0.780 -0.336   -0.002  0.233  0.391  0.490  0.540  0.563  0.541  0.488   0.414  0.313  0.203  0.063 -0.028 -0.097 -0.215 -0.411   -0.678 -0.953 -1.208 -1.418 -1.651 -1.855   mix5    0.0001 0.00001   -3.932 -3.411 -2.964 -2.462 -1.836 -1.308 -0.796 -0.390   -0.076  0.147  0.294  0.394  0.446  0.460  0.443  0.389   0.314  0.220  0.099 -0.033 -0.128 -0.197 -0.308 -0.506   -0.785 -1.050 -1.313 -1.529 -1.745 -1.970   mix4    0.0001 0.000025   -2.996 -2.479 -2.099 -1.803 -1.459 -1.126 -0.761 -0.424   -0.144  
0.060  0.195  0.288  0.337  0.354  0.330  0.274   0.206  0.105 -0.009 -0.148 -0.242 -0.306 -0.424 -0.626   -0.892 -1.172 -1.425 -1.633 -1.877 -2.071   mix3    0.0001 0.00005   -2.128 -1.661 -1.344 -1.160 -0.996 -0.877 -0.696 -0.495   -0.313 -0.165 -0.042  0.032  0.069  0.079  0.050 -0.006   -0.082 -0.179 -0.295 -0.436 -0.523 -0.584 -0.706 -0.898   -1.178 -1.446 -1.696 -1.922 -2.128 -2.350   mix6    0.0001 0.00009   -1.140 -0.757 -0.497 -0.362 -0.329 -0.412 -0.513 -0.647   -0.772 -0.877 -0.958 -1.040 -1.104 -1.162 -1.233 -1.317   -1.425 -1.543 -1.661 -1.804 -1.877 -1.959 -2.034 -2.249   -2.502 -2.732 -2.964 -3.142 -3.313 -3.576   ;   data ftest;   input obsnam $ tot tyr f1-f30 @@;   try = tot - tyr;   if (tyr) then tyr_log = log10(tyr); else tyr_log = -8;   if (try) then try_log = log10(try); else try_log = -8;   tot_log = log10(tot);   datalines;   43trp6 1E-6 0   -5.915 -5.918 -6.908 -5.428 -4.117 -5.103 -4.660 -4.351   -4.023 -3.849 -3.634 -3.634 -3.572 -3.513 -3.634 -3.572   -3.772 -3.772 -3.844 -3.932 -4.017 -4.023 -4.117 -4.227   -4.492 -4.660 -4.855 -5.428 -5.103 -5.428   59mix6 1E-6 1E-7   -5.903 -5.903 -5.903 -5.082 -4.213 -5.083 -4.838 -4.639   -4.474 -4.213 -4.001 -4.098 -4.001 -4.001 -3.907 -4.001   -4.098 -4.098 -4.206 -4.098 -4.213 -4.213 -4.335 -4.474   -4.639 -4.838 -4.837 -5.085 -5.410 -5.410   51mix6 1E-6 2.5E-7   -5.907 -5.907 -5.415 -4.843 -4.213 -4.843 -4.843 -4.483   -4.343 -4.006 -4.006 -3.912 -3.830 -3.830 -3.755 -3.912   -4.006 -4.001 -4.213 -4.213 -4.335 -4.483 -4.483 -4.642   -4.841 -5.088 -5.088 -5.415 -5.415 -5.415   49mix6 1E-6 5E-7   -5.419 -5.091 -5.091 -4.648 -4.006 -4.846 -4.648 -4.483   -4.343 -4.220 -4.220 -4.220 -4.110 -4.110 -4.110 -4.220   -4.220 -4.343 -4.483 -4.483 -4.650 -4.650 -4.846 -4.846   -5.093 -5.091 -5.419 -5.417 -5.417 -5.907   53mix6 1E-6 7.5E-7   -5.083 -4.837 -4.837 -4.474 -3.826 -4.474 -4.639 -4.838   -4.837 -4.639 -4.639 -4.641 -4.641 -4.639 -4.639 -4.837   -4.838 -4.838 -5.083 -5.082 -5.083 -5.410 -5.410 
-5.408   -5.408 -5.900 -5.410 -5.903 -5.900 -6.908   57mix6 1E-6 9E-7   -5.082 -4.836 -4.639 -4.474 -3.826 -4.636 -4.638 -4.638   -4.837 -5.082 -5.082 -5.408 -5.082 -5.080 -5.408 -5.408   -5.408 -5.408 -5.408 -5.408 -5.408 -5.900 -5.900 -5.900   -5.900 -5.900 -5.900 -5.900 -6.908 -6.908   41tyro6 1E-6 1E-6   -5.104 -4.662 -4.662 -4.358 -3.705 -4.501 -4.662 -4.859   -5.104 -5.431 -5.433 -5.918 -5.918 -5.918 -5.431 -5.918   -5.918 -5.918 -5.918 -5.918 -5.918 -5.918 -5.918 -6.908   -5.918 -5.918 -6.908 -6.908 -5.918 -5.918   28trp5 0.00001 0   -5.937 -5.937 -5.937 -4.526 -3.544 -3.170 -2.573 -2.115   -1.792 -1.564 -1.400 -1.304 -1.244 -1.213 -1.240 -1.292   -1.373 -1.453 -1.571 -1.697 -1.801 -1.873 -2.008 -2.198   -2.469 -2.706 -2.990 -3.209 -3.384 -3.601   37mix5  0.00001 1E-6   -5.109 -4.865 -4.501 -4.029 -3.319 -3.070 -2.569 -2.207   -1.895 -1.684 -1.516 -1.423 -1.367 -1.348 -1.374 -1.415   -1.503 -1.596 -1.718 -1.839 -1.927 -1.997 -2.118 -2.333   -2.567 -2.874 -3.106 -3.313 -3.579 -3.781   33mix5  0.00001 2.5E-6   -4.366 -4.129 -3.781 -3.467 -3.037 -2.939 -2.593 -2.268   -1.988 -1.791 -1.649 -1.565 -1.520 -1.509 -1.524 -1.580   -1.665 -1.758 -1.882 -2.037 -2.090 -2.162 -2.284 -2.465   -2.761 -3.037 -3.270 -3.520 -3.709 -3.937   31mix5  0.00001 5E-6   -3.790 -3.373 -3.119 -2.915 -2.671 -2.718 -2.555 -2.398   -2.229 -2.085 -1.971 -1.902 -1.860 -1.837 -1.881 -1.949   -2.009 -2.127 -2.230 -2.381 -2.455 -2.513 -2.624 -2.827   -3.117 -3.373 -3.586 -3.785 -4.040 -4.366   35mix5  0.00001 7.5E-6   -3.321 -2.970 -2.765 -2.594 -2.446 -2.548 -2.616 -2.617   -2.572 -2.550 -2.508 -2.487 -2.488 -2.487 -2.529 -2.593   -2.688 -2.792 -2.908 -3.037 -3.149 -3.189 -3.273 -3.467   -3.781 -4.029 -4.241 -4.501 -4.669 -4.865   39mix5  0.00001 9E-6   -3.142 -2.812 -2.564 -2.404 -2.281 -2.502 -2.589 -2.706   -2.842 -2.964 -3.068 -3.103 -3.182 -3.268 -3.361 -3.411   -3.517 -3.576 -3.705 -3.849 -3.932 -3.932 -4.029 -4.234   -4.501 -4.664 -4.860 -5.104 -5.431 -5.433   26tyro5 0.00001 0.00001   
-3.037 -2.696 -2.464 -2.321 -2.239 -2.444 -2.602 -2.823   -3.144 -3.396 -3.742 -4.063 -4.398 -4.699 -4.893 -5.138   -5.140 -5.461 -5.463 -5.945 -5.461 -5.138 -5.140 -5.138   -5.138 -5.463 -5.461 -5.461 -5.461 -5.461   tyro2   0.0001 0.0001   -1.081 -0.710 -0.470 -0.337 -0.327 -0.433 -0.602 -0.841   -1.119 -1.423 -1.750 -2.121 -2.449 -2.818 -3.110 -3.467   -3.781 -4.029 -4.241 -4.366 -4.501 -4.366 -4.501 -4.501   -4.668 -4.668 -4.865 -4.865 -5.109 -5.111   ;  
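The zero-handling in the tyr_log and try_log assignments of the DATA steps above can be sketched in Python (the function name is ours, for illustration only): take log10 of each concentration, substituting -8 for exact zeros, which have no logarithm.

```python
import math

# Sketch of the response transformation used in the DATA steps:
# log10 of a concentration, with -8 substituted for exact zeros
def safe_log10(c, floor=-8.0):
    return math.log10(c) if c else floor

tyr_log = safe_log10(0.0)     # a sample containing no tyrosine
tot_log = safe_log10(1e-6)    # total concentration of 10^-6
```

The floor of -8 is well below the logarithm of the smallest nonzero concentration in the data, so zero-concentration samples remain clearly separated on the log scale.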

The following statements fit a PLS model with 10 factors.

   proc pls data=ftrain nfac=10;
      model tot_log tyr_log try_log = f1-f30;
   run;

The table shown in Output 56.3.2 indicates that only three or four factors are required to explain almost all of the variation in both the predictors and the responses.

Output 56.3.2: Amount of Training Set Variation Explained
start example
                            The PLS Procedure

                  Percent Variation Accounted for
                  by Partial Least Squares Factors

      Number of
      Extracted        Model Effects        Dependent Variables
        Factors     Current       Total     Current       Total
              1     81.1654     81.1654     48.3385     48.3385
              2     16.8113     97.9768     32.5465     80.8851
              3      1.7639     99.7407     11.4438     92.3289
              4      0.1951     99.9357      3.8363     96.1652
              5      0.0276     99.9634      1.6880     97.8532
              6      0.0132     99.9765      0.7247     98.5779
              7      0.0052     99.9817      0.2926     98.8705
              8      0.0053     99.9870      0.1252     98.9956
              9      0.0049     99.9918      0.1067     99.1023
             10      0.0034     99.9952      0.1684     99.2707
end example
 

In order to choose the optimal number of PLS factors, you can explore how well models based on the training data with different numbers of factors fit the test data. To do so, use the CV=TESTSET option, with an argument pointing to the test data set ftest , as in the following statements:

   proc pls data=ftrain nfac=10 cv=testset(ftest)
            cvtest(stat=press seed=12345);
      model tot_log tyr_log try_log = f1-f30;
   run;

The results of the test set validation are shown in Output 56.3.3. They indicate that, although five PLS factors give the minimum predicted residual sum of squares, the residuals for four factors are insignificantly different from those for five. Thus, the smaller model is preferred.

Output 56.3.3: Test Set Validation for the Number of PLS Factors
start example
                            The PLS Procedure

         Test Set Validation for the Number of Extracted Factors

                    Number of        Root
                    Extracted        Mean      Prob >
                      Factors       PRESS       PRESS
                            0    3.056797      <.0001
                            1    2.630561      <.0001
                            2     1.00706      0.0070
                            3    0.664603      0.0020
                            4    0.521578      0.3800
                            5    0.500034      1.0000
                            6    0.513561      0.5100
                            7    0.501431      0.6870
                            8    1.055791      0.1530
                            9    1.435085      0.1010
                           10    1.720389      0.0320

              Minimum root mean PRESS                   0.5000
              Minimizing number of factors                   5
              Smallest number of factors with p > 0.1        4

                            The PLS Procedure

                  Percent Variation Accounted for
                  by Partial Least Squares Factors

      Number of
      Extracted        Model Effects        Dependent Variables
        Factors     Current       Total     Current       Total
              1     81.1654     81.1654     48.3385     48.3385
              2     16.8113     97.9768     32.5465     80.8851
              3      1.7639     99.7407     11.4438     92.3289
              4      0.1951     99.9357      3.8363     96.1652
end example
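The PRESS statistic driving this table can be sketched as follows. This is illustrative Python, not the PROC PLS internals: PROC PLS additionally scales PRESS to a root mean version (dividing by the number of test responses before taking the square root) and computes comparison p-values via the randomization test requested by CVTEST.

```python
import numpy as np

# Sketch of test-set PRESS: predict the test responses with a model
# fit to the training set, then sum the squared prediction errors
def press(y_test, y_pred):
    y_test, y_pred = np.asarray(y_test), np.asarray(y_pred)
    return ((y_test - y_pred)**2).sum()

# Invented test responses and predictions, for illustration only
p = press([1.0, 2.0, 3.0], [1.1, 1.8, 3.0])
```

Choosing the factor count that minimizes PRESS (or, as here, the smallest count whose PRESS is not significantly worse than the minimum) guards against overfitting the training set.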
 

The factor loadings show how the PLS factors are constructed from the centered and scaled predictors. For spectral calibration, it is useful to plot the loadings against the frequency. In many cases, the physical meanings that can be attached to factor loadings help to validate the scientific interpretation of the PLS model. You can use the following statements to plot the loadings for the four PLS factors against frequency.

   ods listing close;
   ods output XLoadings=xloadings;
   proc pls data=ftrain nfac=4 details method=pls;
      model tot_log tyr_log try_log = f1-f30;
   run;
   ods listing;
   proc transpose data=xloadings(drop=NumberOfFactors)
                  out=xloadings;
   data xloadings; set xloadings;
      n = _n_;
      rename col1=Factor1 col2=Factor2
             col3=Factor3 col4=Factor4;
   run;
   goptions border;
   axis1 label=("Loading")   major=(number=5) minor=none;
   axis2 label=("Frequency")                  minor=none;
   symbol1 v=none i=join c=red    l=1;
   symbol2 v=none i=join c=green  l=1 /*l= 3*/;
   symbol3 v=none i=join c=blue   l=1 /*l=34*/;
   symbol4 v=none i=join c=yellow l=1 /*l=46*/;
   legend1 label=none cborder=black;
   proc gplot data=xloadings;
      plot (Factor1 Factor2 Factor3 Factor4)*n
           / overlay legend=legend1 vaxis=axis1
             haxis=axis2 vref=0 lvref=2 frame cframe=ligr;
   run; quit;

The resulting plot is shown in Output 56.3.4.

Output 56.3.4: Predictor Loadings Across Frequencies
start example
click to expand
end example
 

Notice that all four factors handle frequencies below and above about 7 or 8 differently. For example, the first factor is very nearly a simple contrast between the averages of the two sets of frequencies, and the second factor appears to be a weighted sum of only the frequencies in the first set.
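A "simple contrast between the averages of the two sets of frequencies," as described for the first factor, can be written down directly. The Python sketch below is illustrative only; the band split at frequency 8 and the loading values are assumed, not the fitted loadings from Output 56.3.4.

```python
import numpy as np

# Sketch of a contrast-type loading: average of one frequency band
# minus the average of the other (assumed split point, not the fit)
n_lo, n_hi = 8, 22
loading = np.r_[np.full(n_lo, 1.0/n_lo), np.full(n_hi, -1.0/n_hi)]

spectrum = np.arange(30.0)       # a stand-in spectrum, for illustration
score = loading @ spectrum       # equals band-1 mean minus band-2 mean
```

Applying such a loading to a spectrum yields a score that measures how much the low-frequency band exceeds the high-frequency band on average, which is one physically interpretable summary a calibration factor can encode.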




SAS/STAT 9.1 User's Guide (Vol. 5), 2004