Examples


Example 59.1. Multidimensional Preference Analysis of Cars Data

This example uses PROC PRINQUAL to perform a nonmetric multidimensional preference (MDPREF) analysis (Carroll 1972). MDPREF analysis is a principal component analysis of a data matrix whose columns correspond to people and whose rows correspond to objects. The data are ratings or rankings of each person's preference for each object. The data matrix is the transpose of the usual multivariate data matrix; in other words, the columns are people instead of the more typical arrangement in which rows represent people. The final result of an MDPREF analysis is a biplot (Gabriel 1981) of the resulting preference space. A biplot displays the judges and objects in a single plot by projecting them onto the plane in the transformed variable space that accounts for the most variance.

The data are ratings by 25 judges of their preference for each of 17 automobiles. The ratings are made on a 0 to 9 scale, with 0 meaning very weak preference and 9 meaning very strong preference for the automobile. These judgments were made in 1980 about that year's products. Two additional variables indicate the manufacturer and model of each automobile.

This example uses PROC PRINQUAL, PROC FACTOR, and the %PLOTIT macro. PROC FACTOR is used before PROC PRINQUAL to perform a principal component analysis of the raw judgments. PROC FACTOR is also used immediately after PROC PRINQUAL, since PROC PRINQUAL is a scoring procedure that optimally scores the data but does not report the principal component analysis.

The %PLOTIT macro produces the biplot. For information on the %PLOTIT macro, see Appendix B, Using the %PLOTIT Macro.

The scree plot, in the standard principal component analysis reported by PROC FACTOR, shows that two principal components should be retained for further use. (See the scree plot in Output 59.1.1; there is a clear separation between the first two components and the remaining components.) Nine eigenvalues are precisely zero because there are fewer observations than variables in the data matrix: the 17 centered observations span at most 16 dimensions, so 25 - 16 = 9 eigenvalues of the 25-variable correlation matrix must be zero. PROC PRINQUAL is then used to monotonically transform the raw judgments to maximize the proportion of variance accounted for by the first two principal components. The following statements create the data set and perform a principal component analysis of the original data. These statements produce Output 59.1.1.

title 'Preference Ratings for Automobiles Manufactured in 1980';

data CarPref;
   input Make $ 1-10 Model $ 12-22 @25 (Judge1-Judge25) (1.);
   datalines;
Cadillac   Eldorado     8007990491240508971093809
Chevrolet  Chevette     0051200423451043003515698
Chevrolet  Citation     4053305814161643544747795
Chevrolet  Malibu       6027400723121345545668658
Ford       Fairmont     2024006715021443530648655
Ford       Mustang      5007197705021101850657555
Ford       Pinto        0021000303030201500514078
Honda      Accord       5956897609699952998975078
Honda      Civic        4836709507488852567765075
Lincoln    Continental  7008990592230409962091909
Plymouth   Gran Fury    7006000434101107333458708
Plymouth   Horizon      3005005635461302444675655
Plymouth   Volare       4005003614021602754476555
Pontiac    Firebird     0107895613201206958265907
Volkswagen Dasher       4858696508877795377895000
Volkswagen Rabbit       4858509709695795487885000
Volvo      DL           9989998909999987989919000
;

* Principal Component Analysis of the Original Data;
options ls=80 ps=65;

proc factor data=CarPref nfactors=2 scree;
   ods select Eigenvalues ScreePlot;
   var Judge1-Judge25;
   title3 'Principal Components of Original Data';
run;
Output 59.1.1: Principal Component Analysis of Original Data
start example
Preference Ratings for Automobiles Manufactured in 1980
Principal Components of Original Data

The FACTOR Procedure
Initial Factor Method: Principal Components

Eigenvalues of the Correlation Matrix: Total = 25  Average = 1

        Eigenvalue    Difference    Proportion    Cumulative
  1    10.8857202     5.0349926        0.4354        0.4354
  2     5.8507276     3.8077964        0.2340        0.6695
  3     2.0429312     0.5207808        0.0817        0.7512
  4     1.5221504     0.3078035        0.0609        0.8121
  5     1.2143469     0.2564839        0.0486        0.8606
  6     0.9578630     0.2197345        0.0383        0.8989
  7     0.7381286     0.1497259        0.0295        0.9285
  8     0.5884027     0.2117186        0.0235        0.9520
  9     0.3766841     0.1091250        0.0151        0.9671
 10     0.2675591     0.0773893        0.0107        0.9778
 11     0.1901698     0.0463921        0.0076        0.9854
 12     0.1437776     0.0349382        0.0058        0.9911
 13     0.1088394     0.0607418        0.0044        0.9955
 14     0.0480977     0.0056610        0.0019        0.9974
 15     0.0424367     0.0202714        0.0017        0.9991
 16     0.0221653     0.0221653        0.0009        1.0000
 17     0.0000000     0.0000000        0.0000        1.0000
 18     0.0000000     0.0000000        0.0000        1.0000
 19     0.0000000     0.0000000        0.0000        1.0000
 20     0.0000000     0.0000000        0.0000        1.0000
 21     0.0000000     0.0000000        0.0000        1.0000
 22     0.0000000     0.0000000        0.0000        1.0000
 23     0.0000000     0.0000000        0.0000        1.0000
 24     0.0000000     0.0000000        0.0000        1.0000
 25     0.0000000                      0.0000        1.0000

Scree Plot of Eigenvalues
[Line-printer scree plot: eigenvalues 1 (10.89) and 2 (5.85) stand well apart
from eigenvalues 3 through 25, which trail off toward zero.]
end example
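The zero-eigenvalue count follows from rank alone and can be checked outside SAS. The following is a minimal sketch in Python with NumPy, using random stand-in data of the same shape as the car data (17 objects by 25 judges), not the actual ratings:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(17, 25))      # same shape as the car data: 17 objects, 25 judges

# Correlation matrix of 25 variables computed from 17 observations.
# Centering leaves at most 17 - 1 = 16 independent rows, so the matrix
# has rank at most 16 and exactly 25 - 16 = 9 zero eigenvalues here.
R = np.corrcoef(X, rowvar=False)
ev = np.linalg.eigvalsh(R)
print((np.abs(ev) < 1e-8).sum())
```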
 

To fit the nonmetric MDPREF model, you can use the PRINQUAL procedure. The MONOTONE option is specified in the TRANSFORM statement to request a nonmetric MDPREF analysis; alternatively, you can specify the IDENTITY option for a metric analysis. Several options are used in the PROC PRINQUAL statement. The DATA=CarPref option specifies the input data set, OUT=Results creates an output data set, and N=2 and the default METHOD=MTV transform the data to better fit a two-component model. The REPLACE option replaces the original data with the monotonically transformed data in the OUT= data set. The MDPREF option standardizes the component scores to variance one so that the geometry of the biplot is correct, and it creates two variables in the OUT= data set named Prin1 and Prin2. These variables contain the standardized principal component scores and structure matrix, which are used to make the biplot. If the variables in the data matrix X are standardized to mean zero and variance one, and n is the number of rows in X, then X = V Λ^(1/2) W' is the principal component model, where X'X/(n-1) = W Λ W'. The matrices W and Λ contain the eigenvectors and eigenvalues, respectively, of the correlation matrix of X. The first two columns of V, the standardized component scores, and of W Λ^(1/2), which is the structure matrix, are output. The advantage of creating a biplot based on principal components is that their coordinates do not depend on the sample size. The following statements transform the data and produce Output 59.1.2.

* Transform the Data to Better Fit a Two Component Model;
proc prinqual data=CarPref out=Results n=2 replace mdpref;
   id model;
   transform monotone(Judge1-Judge25);
   title2 'Multidimensional Preference (MDPREF) Analysis';
   title3 'Optimal Monotonic Transformation of Preference Data';
run;
Output 59.1.2: Transformation of Automobile Preference Data
start example
Preference Ratings for Automobiles Manufactured in 1980
Multidimensional Preference (MDPREF) Analysis
Optimal Monotonic Transformation of Preference Data

The PRINQUAL Procedure

PRINQUAL MTV Algorithm Iteration History

Iteration    Average    Maximum     Proportion    Criterion
   Number     Change     Change    of Variance       Change    Note
----------------------------------------------------------------------------
        1    0.24994    1.28017        0.66946
        2    0.07223    0.36958        0.80194      0.13249
        3    0.04522    0.29026        0.81598      0.01404
        4    0.03096    0.25213        0.82178      0.00580
        5    0.02182    0.23045        0.82493      0.00315
        6    0.01602    0.19017        0.82680      0.00187
        7    0.01219    0.14748        0.82793      0.00113
        8    0.00953    0.11031        0.82861      0.00068
        9    0.00737    0.06461        0.82904      0.00043
       10    0.00556    0.04469        0.82930      0.00026
       11    0.00445    0.04087        0.82944      0.00014
       12    0.00381    0.03706        0.82955      0.00011
       13    0.00319    0.03348        0.82965      0.00009
       14    0.00255    0.02999        0.82971      0.00006
       15    0.00213    0.02824        0.82976      0.00005
       16    0.00183    0.02646        0.82980      0.00004
       17    0.00159    0.02472        0.82983      0.00003
       18    0.00139    0.02305        0.82985      0.00003
       19    0.00123    0.02145        0.82988      0.00002
       20    0.00109    0.01993        0.82989      0.00002
       21    0.00096    0.01850        0.82991      0.00001
       22    0.00086    0.01715        0.82992      0.00001
       23    0.00076    0.01588        0.82993      0.00001
       24    0.00067    0.01440        0.82994      0.00001
       25    0.00059    0.00871        0.82994      0.00001
       26    0.00050    0.00720        0.82995      0.00000
       27    0.00043    0.00642        0.82995      0.00000
       28    0.00037    0.00573        0.82995      0.00000
       29    0.00031    0.00510        0.82995      0.00000
       30    0.00027    0.00454        0.82995      0.00000    Not Converged

WARNING: Failed to converge, however criterion change is less than 0.0001.
end example
 
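The principal component model described above (X = V Λ^(1/2) W', with X'X/(n-1) = W Λ W') can be sketched numerically outside SAS. This Python/NumPy fragment uses illustrative random data and hypothetical names; it standardizes X, eigendecomposes the correlation matrix to get W and Λ, and confirms that the standardized scores V and the structure matrix W Λ^(1/2) reproduce X:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 17, 6
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # mean zero, variance one

# X'X/(n-1) = W Lam W': eigendecomposition of the correlation matrix
lam, W = np.linalg.eigh(X.T @ X / (n - 1))
lam, W = lam[::-1], W[:, ::-1]                     # descending eigenvalue order

V = X @ W / np.sqrt(lam)          # standardized component scores (variance one)
S = W * np.sqrt(lam)              # structure matrix W Lam^(1/2)

print(np.allclose(X, V @ S.T))                     # X = V Lam^(1/2) W'
print(np.allclose(V.std(axis=0, ddof=1), 1.0))     # scores have variance one
```

Keeping only the first two columns of V and S gives the rank-two approximation that the biplot displays.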

The iteration history displayed by PROC PRINQUAL indicates that the proportion of variance increases from an initial 0.66946 to 0.82995. The proportion of variance accounted for by PROC PRINQUAL on the first iteration equals the cumulative proportion of variance shown by PROC FACTOR for the first two principal components; in this example, PROC PRINQUAL's initial iteration performs a standard principal component analysis of the raw data. The columns labeled Average Change, Maximum Change, and Criterion Change contain values that always decrease, indicating that PROC PRINQUAL is improving the transformations at a monotonically decreasing rate over the iterations. This does not always happen, and when it does not, it suggests that the analysis may be converging to a degenerate solution. See Example 59.3 on page 3688 for a discussion of a degenerate solution. The algorithm does not converge in 30 iterations; however, the criterion change is small, indicating that more iterations are unlikely to have much effect on the results.

The second PROC FACTOR analysis is performed on the transformed data. The WHERE statement retains only the monotonically transformed judgments. The scree plot shows that the first two eigenvalues are now much larger than the remaining smaller eigenvalues; the second eigenvalue has increased markedly at the expense of the next several eigenvalues. Two principal components seem to be necessary and sufficient to adequately describe these judges' preferences for these automobiles. The cumulative proportion of variance displayed by PROC FACTOR for the first two principal components is 0.83. The following statements perform the analysis and produce Output 59.1.3:

* Final Principal Component Analysis;
proc factor data=Results nfactors=2 scree;
   ods select Eigenvalues ScreePlot;
   var Judge1-Judge25;
   where _TYPE_='SCORE';
   title3 'Principal Components of Monotonically Transformed Data';
run;
Output 59.1.3: Principal Components of Transformed Data
start example
Preference Ratings for Automobiles Manufactured in 1980
Multidimensional Preference (MDPREF) Analysis
Principal Components of Monotonically Transformed Data

The FACTOR Procedure
Initial Factor Method: Principal Components

Eigenvalues of the Correlation Matrix: Total = 25  Average = 1

        Eigenvalue    Difference    Proportion    Cumulative
  1    11.5959045     2.4429455        0.4638        0.4638
  2     9.1529589     7.9952554        0.3661        0.8300
  3     1.1577036     0.3072013        0.0463        0.8763
  4     0.8505023     0.1284323        0.0340        0.9103
  5     0.7220700     0.2613540        0.0289        0.9392
  6     0.4607160     0.0958339        0.0184        0.9576
  7     0.3648821     0.0877851        0.0146        0.9722
  8     0.2770970     0.1250945        0.0111        0.9833
  9     0.1520025     0.0506622        0.0061        0.9894
 10     0.1013403     0.0292763        0.0041        0.9934
 11     0.0720640     0.0200979        0.0029        0.9963
 12     0.0519661     0.0336675        0.0021        0.9984
 13     0.0182987     0.0027059        0.0007        0.9991
 14     0.0155927     0.0093669        0.0006        0.9997
 15     0.0062258     0.0055503        0.0002        1.0000
 16     0.0006755     0.0006755        0.0000        1.0000
 17     0.0000000     0.0000000        0.0000        1.0000
 18     0.0000000     0.0000000        0.0000        1.0000
 19     0.0000000     0.0000000        0.0000        1.0000
 20     0.0000000     0.0000000        0.0000        1.0000
 21     0.0000000     0.0000000        0.0000        1.0000
 22     0.0000000     0.0000000        0.0000        1.0000
 23     0.0000000     0.0000000        0.0000        1.0000
 24     0.0000000     0.0000000        0.0000        1.0000
 25     0.0000000                      0.0000        1.0000

Scree Plot of Eigenvalues
[Line-printer scree plot: eigenvalues 1 (11.60) and 2 (9.15) stand well apart
from eigenvalues 3 through 25, which trail off toward zero.]
end example
 

The remainder of the example constructs the MDPREF biplot. A biplot is a plot that displays the relation between the row points and the columns of a data matrix. The rows of V, the standardized component scores, and W Λ^(1/2), which is the structure matrix, contain enough information to reproduce X. The (i, j) element of X is the product of row i of V and row j of W Λ^(1/2). If all but the first two columns of V and W Λ^(1/2) are discarded, the (i, j) element of X is approximated by the product of the truncated row i of V and row j of W Λ^(1/2).

Since the MDPREF analysis is based on a principal component model, the dimensions of the MDPREF biplot are the first two principal components. The first principal component is the longest dimension through the MDPREF biplot; it represents overall preference, the most salient dimension in the preference judgments. One end points in the direction that is, on average, preferred most by the judges, and the other end points in the least preferred direction. The second principal component is orthogonal to the first, and it is the orthogonal direction that is the second most salient. The interpretation of the second dimension varies from example to example.

With an MDPREF biplot, it is geometrically appropriate to represent each automobile (object) by a point and each judge by a vector. The automobile points have coordinates that are the scores of the automobile on the first two principal components. The judge vectors emanate from the origin of the space and go through a point with coordinates that are the coefficients of the judge (variable) on the first two principal components.

The absolute length of a vector is arbitrary. However, the relative lengths of the vectors indicate fit, with the squared lengths being proportional to the communalities in the PROC FACTOR output. The direction of a vector indicates the direction that is most preferred by the individual judge, with preference increasing as the vector moves away from the origin. Let v' be row i of V, u' be row j of U = W Λ^(1/2), ||v|| be the length of v, ||u|| be the length of u, and θ be the angle between v and u. The predicted degree of preference that an individual judge has for an automobile is u'v = ||u|| ||v|| cos θ. Each car point can be orthogonally projected onto each vector. The projection of car i on vector j is u((u'v)/(u'u)), and the length of this projection is ||v|| cos θ. The automobile that projects farthest along a vector in the direction it points is that judge's most preferred automobile, since the length of this projection, ||v|| cos θ, differs from the predicted preference, ||u|| ||v|| cos θ, only by ||u||, which is constant within each judge.
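The projection argument can be illustrated with a small numeric sketch (Python/NumPy; the judge vector and car scores below are made up for illustration): ordering cars by signed projection length ||v|| cos θ onto a judge's vector gives the same ordering as the predicted preferences u'v.

```python
import numpy as np

u = np.array([0.6, 0.8])                  # one judge's vector (row of W Lam^(1/2)), illustrative
V2 = np.array([[ 2.0,  1.0],              # car scores (rows of V), illustrative
               [-1.0,  0.5],
               [ 0.5,  2.0]])

pred = V2 @ u                             # predicted preferences u'v for each car
proj_len = pred / np.linalg.norm(u)       # ||v|| cos(theta): signed projection length

# The two quantities differ only by the constant positive factor ||u||,
# so they rank the cars identically.
print(np.array_equal(np.argsort(pred), np.argsort(proj_len)))
```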

To interpret the biplot, look for directions through the plot that show a continuous change in some attribute of the automobiles, or look for regions in the plot that contain clusters of automobile points and determine what attributes the automobiles have in common. Those points that are tightly clustered in a region of the plot represent automobiles that have the same preference patterns across the judges. Those vectors that point in roughly the same direction represent judges who tend to have similar preference patterns.

The following statement constructs the biplot and produces Output 59.1.4:

title3 'Biplot of Automobiles and Judges';
%plotit(data=results, datatype=mdpref 2);
Output 59.1.4: Preference Ratings for Automobiles Manufactured in 1980
start example
[Graphic: biplot of the automobile points and judge vectors in the two-dimensional preference space]
end example
 

The DATATYPE=MDPREF 2 option indicates that the coordinates come from an MDPREF analysis, so the macro represents the scores as points and the structure as vectors, with the vectors stretched by a factor of two to make a better graphical display.

In the biplot, American automobiles are located on the left of the space, while European and Japanese automobiles are located on the right. At the top of the space are expensive American automobiles (Cadillac Eldorado, Lincoln Continental) while at the bottom are inexpensive ones (Pinto, Chevette). The first principal component differentiates American from imported automobiles, and the second arranges automobiles by price and other associated characteristics.

The two expensive American automobiles form a cluster, the sporty automobile (Firebird) is by itself, the Volvo DL is by itself, and the remaining imported autos form a cluster, as do the remaining American autos. It seems there are 5 prototypical automobiles in this set of 17, in terms of preference patterns among the 25 judges.

Most of the judges prefer the imported automobiles, especially the Volvo. There is also a fairly large minority who prefer the expensive cars, whether or not they are American (those with vectors that point toward one o'clock), or who simply prefer expensive American automobiles (vectors that point toward eleven o'clock). There are two people who prefer anything except expensive American cars (five o'clock vectors), and one who prefers inexpensive American cars (seven o'clock vector).

Several vectors point toward the upper-right corner of the plot, toward a region with no cars. This is the region between the European and Japanese cars on the right and the luxury cars on the top. This suggests that there is a market for luxury Japanese and European cars.

Example 59.2. Multidimensional Preference Analysis of Cars Data, ODS Graphics (Experimental)

The following graphical displays are requested by specifying the experimental ODS GRAPHICS statement. For general information about ODS Graphics, see Chapter 15, 'Statistical Graphics Using ODS.' For specific information about the graphics available in the PRINQUAL procedure, see the 'ODS Graphics' section on page 3677.

title 'Preference Ratings for Automobiles Manufactured in 1980';
options validvarname=any;

data CarPref;
   input Make $ 1-10 Model $ 12-22 @25 ('1'n-'25'n) (1.);
   datalines;
Cadillac   Eldorado     8007990491240508971093809
Chevrolet  Chevette     0051200423451043003515698
Chevrolet  Citation     4053305814161643544747795
Chevrolet  Malibu       6027400723121345545668658
Ford       Fairmont     2024006715021443530648655
Ford       Mustang      5007197705021101850657555
Ford       Pinto        0021000303030201500514078
Honda      Accord       5956897609699952998975078
Honda      Civic        4836709507488852567765075
Lincoln    Continental  7008990592230409962091909
Plymouth   Gran Fury    7006000434101107333458708
Plymouth   Horizon      3005005635461302444675655
Plymouth   Volare       4005003614021602754476555
Pontiac    Firebird     0107895613201206958265907
Volkswagen Dasher       4858696508877795377895000
Volkswagen Rabbit       4858509709695795487885000
Volvo      DL           9989998909999987989919000
;

ods html;
ods graphics on;

proc prinqual data=CarPref out=Results n=2 replace mdpref maxiter=100;
   id model;
   transform monotone('1'n-'25'n);
   title2 'Multidimensional Preference (MDPREF) Analysis';
   title3 'Optimal Monotonic Transformation of Preference Data';
run;

ods graphics off;
ods html close;
Output 59.2.1: Multidimensional Preference Analysis (Experimental)
start example
[Graphic: ODS Graphics displays for the multidimensional preference analysis]
end example
 

Example 59.3. Principal Components of Basketball Rankings

The data in this example are 1985-1986 preseason rankings of 35 college basketball teams by 10 different news services. The services do not all rank the same teams or the same number of teams, so there are missing values in these data. Each of the 35 teams in the data set is ranked by at least one news service. One way of summarizing these data is with a principal component analysis, since the rankings should all be related to a single underlying variable, the first principal component.

You can use PROC PRINQUAL to estimate the missing ranks and compute scores for all observations. You can formulate a PROC PRINQUAL analysis that assumes that the observed ranks are ordinal variables and replaces the ranks with new numbers that are monotonic with the ranks and better fit the one principal component model. The missing rank estimates need to be constrained since a news service would have positioned the unranked teams below the teams it ranked. PROC PRINQUAL should impose order constraints within the nonmissing values and between the missing and nonmissing values, but not within the missing values. PROC PRINQUAL has sophisticated missing data handling facilities; however, these facilities cannot directly handle this problem. The solution requires reformulating the problem.

By performing some preliminary data manipulations, specifying the N=1 option in the PROC PRINQUAL statement, and specifying the UNTIE transformation in the TRANSFORM statement, you can make the missing value estimates conform to the requirements. The PROC MEANS step finds the largest rank for each variable. The next DATA step replaces missing values with a value that is one larger than the largest observed rank. The N=1 option (in the PRINQUAL procedure) specifies that the variables should be transformed to make them as one-dimensional as possible. The UNTIE transformation in the TRANSFORM statement monotonically transforms the ranks, untying any ties in an optimal way. Because the only ties are for the values that replace the missing values, and because these values are larger than the observed values, the rescoring of the data satisfies the preceding requirements.
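The PROC MEANS / DATA step logic (find each column's largest observed rank and replace its missing values with that maximum plus one) can be sketched outside SAS as follows; this Python/NumPy fragment uses a tiny made-up rank matrix, not the basketball data:

```python
import numpy as np

# teams x services; NaN marks a team the service did not rank
ranks = np.array([[ 1.0,    2.0],
                  [ 2.0, np.nan],
                  [np.nan,  1.0],
                  [ 3.0, np.nan]])

col_max = np.nanmax(ranks, axis=0)                     # largest observed rank per service
filled = np.where(np.isnan(ranks), col_max + 1, ranks)

# Every estimated rank now sits tied, one place below the ranked teams,
# ready for an order-preserving (untying) transformation.
print(filled)
```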

The following statements create the data set and perform the transformations discussed previously. These statements produce Output 59.3.1.

* Example 2: Basketball Data
*
* Preseason 1985 College Basketball Rankings
* (rankings of 35 teams by 10 news services)
*
* Note: (a) Various news services rank varying numbers of teams.
*       (b) Not all 35 teams are ranked by all news services.
*       (c) Each team is ranked by at least one service.
*       (d) Rank 20 is missing for UPI.;

title1 '1985 Preseason College Basketball Rankings';

data bballm;
   input School $ 1-13 CSN DurhamSun DurhamHerald WashingtonPost
         USA_Today SportMagazine InsideSports UPI AP
         SportsIllustrated;
   label CSN               = 'Community Sports News (Chapel Hill, NC)'
         DurhamSun         = 'Durham Sun'
         DurhamHerald      = 'Durham Morning Herald'
         WashingtonPost    = 'Washington Post'
         USA_Today         = 'USA Today'
         SportMagazine     = 'Sport Magazine'
         InsideSports      = 'Inside Sports'
         UPI               = 'United Press International'
         AP                = 'Associated Press'
         SportsIllustrated = 'Sports Illustrated'
         ;
   format CSN--SportsIllustrated 5.1;
   datalines;
Louisville     1  8  1  9  8  9  6 10  9  9
Georgia Tech   2  2  4  3  1  1  1  2  1  1
Kansas         3  4  5  1  5 11  8  4  5  7
Michigan       4  5  9  4  2  5  3  1  3  2
Duke           5  6  7  5  4 10  4  5  6  5
UNC            6  1  2  2  3  4  2  3  2  3
Syracuse       7 10  6 11  6  6  5  6  4 10
Notre Dame     8 14 15 13 11 20 18 13 12  .
Kentucky       9 15 16 14 14 19 11 12 11 13
LSU           10  9 13  . 13 15 16  9 14  8
DePaul        11  . 21 15 20  . 19  .  . 19
Georgetown    12  7  8  6  9  2  9  8  8  4
Navy          13 20 23 10 18 13 15  . 20  .
Illinois      14  3  3  7  7  3 10  7  7  6
Iowa          15 16  .  . 23  .  . 14  . 20
Arkansas      16  .  .  . 25  .  .  .  . 16
Memphis State 17  . 11  . 16  8 20  . 15 12
Washington    18  .  .  .  .  .  . 17  .  .
UAB           19 13 10  . 12 17  . 16 16 15
UNLV          20 18 18 19 22  . 14 18 18  .
NC State      21 17 14 16 15  . 12 15 17 18
Maryland      22  .  .  . 19  .  .  . 19 14
Pittsburgh    23  .  .  .  .  .  .  .  .  .
Oklahoma      24 19 17 17 17 12 17  . 13 17
Indiana       25 12 20 18 21  .  .  .  .  .
Virginia      26  . 22  .  . 18  .  .  .  .
Old Dominion  27  .  .  .  .  .  .  .  .  .
Auburn        28 11 12  8 10  7  7 11 10 11
St. Johns     29  .  .  .  . 14  .  .  .  .
UCLA          30  .  .  .  .  .  . 19  .  .
St. Joseph's   .  . 19  .  .  .  .  .  .  .
Tennessee      .  . 24  .  . 16  .  .  .  .
Montana        .  .  . 20  .  .  .  .  .  .
Houston        .  .  .  . 24  .  .  .  .  .
Virginia Tech  .  .  .  .  .  . 13  .  .  .
;

* Find maximum rank for each news service and replace
* each missing value with the next highest rank.;
proc means data=bballm noprint;
   output out=maxrank
      max=mcsn mdurs mdurh mwas musa mspom mins mupi map mspoi;
run;

data bball;
   set bballm;
   if _n_=1 then set maxrank;
   array services[10] CSN--SportsIllustrated;
   array maxranks[10] mcsn--mspoi;
   keep School CSN--SportsIllustrated;
   do i=1 to 10;
      if services[i]=. then services[i]=maxranks[i]+1;
   end;
run;

* Assume that the ranks are ordinal and that unranked teams
* would have been ranked lower than ranked teams.  Monotonically
* transform all ranked teams while estimating the unranked teams.
* Enforce the constraint that the missing ranks are estimated to
* be larger than the observed ranks.  Order the unranked teams
* optimally within this constraint.  Do this so as to maximize
* the variance accounted for by one linear combination.  This
* makes the data as nearly rank one as possible, given the
* constraints.
*
* NOTE: The UNTIE transformation should be used with caution.
* It frequently produces degenerate results.;

proc prinqual data=bball out=tbball scores n=1 tstandard=z;
   title2 'Optimal Monotonic Transformation of Ranked Teams';
   title3 'with Constrained Estimation of Unranked Teams';
   transform untie(CSN -- SportsIllustrated);
   id School;
run;
Output 59.3.1: Transformation of Basketball Team Rankings
start example
1985 Preseason College Basketball Rankings
Optimal Monotonic Transformation of Ranked Teams
with Constrained Estimation of Unranked Teams

The PRINQUAL Procedure

PRINQUAL MTV Algorithm Iteration History

Iteration    Average    Maximum     Proportion    Criterion
   Number     Change     Change    of Variance       Change    Note
----------------------------------------------------------------------------
        1    0.18563    0.76531        0.85850
        2    0.03225    0.14627        0.94362      0.08512
        3    0.02126    0.10530        0.94669      0.00307
        4    0.01467    0.07526        0.94801      0.00132
        5    0.01067    0.05282        0.94865      0.00064
        6    0.00800    0.03669        0.94899      0.00034
        7    0.00617    0.02862        0.94919      0.00020
        8    0.00486    0.02636        0.94932      0.00013
        9    0.00395    0.02453        0.94941      0.00009
       10    0.00327    0.02300        0.94947      0.00006
       11    0.00275    0.02166        0.94952      0.00005
       12    0.00236    0.02041        0.94956      0.00004
       13    0.00205    0.01927        0.94959      0.00003
       14    0.00181    0.01818        0.94962      0.00003
       15    0.00162    0.01719        0.94964      0.00002
       16    0.00147    0.01629        0.94966      0.00002
       17    0.00136    0.01546        0.94968      0.00002
       18    0.00128    0.01469        0.94970      0.00002
       19    0.00121    0.01398        0.94971      0.00001
       20    0.00115    0.01332        0.94973      0.00001
       21    0.00111    0.01271        0.94974      0.00001
       22    0.00105    0.01213        0.94975      0.00001
       23    0.00099    0.01155        0.94976      0.00001
       24    0.00095    0.01095        0.94977      0.00001
       25    0.00091    0.01038        0.94978      0.00001
       26    0.00088    0.00986        0.94978      0.00001
       27    0.00084    0.00936        0.94979      0.00001
       28    0.00081    0.00889        0.94980      0.00001
       29    0.00077    0.00846        0.94980      0.00000
       30    0.00073    0.00805        0.94980      0.00000    Not Converged

WARNING: Failed to converge, however criterion change is less than 0.0001.
end example
 

An alternative approach is to use the pairwise deletion option of the CORR procedure to compute the correlation matrix and then use PROC PRINCOMP or PROC FACTOR to perform the principal component analysis. This approach has several disadvantages. The correlation matrix might not be positive semidefinite (psd), which principal component analysis requires; PROC PRINQUAL always produces a psd correlation matrix. Even with pairwise deletion, PROC CORR removes the six observations that have only a single nonmissing value from this data set. Finally, it is still not possible to calculate scores on the principal components for those teams that have missing values.
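The psd caveat is easy to demonstrate: pairwise-deleted correlations, each computed on a different subset of observations, can combine into a matrix that no complete data set could produce. A minimal sketch (Python/NumPy; the correlation values are contrived, not taken from the basketball data):

```python
import numpy as np

# Pairwise correlations assembled into one matrix: r12 = r13 = 0.9 together
# with r23 = -0.9 are mutually inconsistent, so the matrix is indefinite.
R = np.array([[ 1.0,  0.9,  0.9],
              [ 0.9,  1.0, -0.9],
              [ 0.9, -0.9,  1.0]])

ev = np.linalg.eigvalsh(R)
print(ev.min() < 0)     # a negative eigenvalue: not a valid input for PCA
```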

It is possible to compute the composite ranking using PROC PRINCOMP and some preliminary data manipulations, similar to those discussed previously.

Chapter 58, The PRINCOMP Procedure, contains an example where the average of the unused ranks in each poll is substituted for the missing values, and each observation is weighted by the number of nonmissing values. This method has much to recommend it. It is much faster and simpler than using PROC PRINQUAL. It is also much less prone to degeneracies and capitalization on chance. However, PROC PRINCOMP does not allow the nonmissing ranks to be monotonically transformed and the missing values untied to optimize fit.

PROC PRINQUAL monotonically transforms the observed ranks and estimates the missing ranks (within the constraints given previously) to account for almost 95 percent of the variance of the transformed data by just one dimension. PROC FACTOR is then used to report details of the principal component analysis of the transformed data. As shown by the Factor Pattern values in Output 59.3.2, nine of the ten news services have a correlation of 0.95 or larger with the scores on the first principal component after the data are optimally transformed. The scores are sorted and the composite ranking is displayed following the PROC FACTOR output. More confidence can be placed in the stability of the scores for the teams that are ranked by the majority of the news services than in scores for teams that are seldom ranked.

Output 59.3.2: Alternative Approach for Analyzing Basketball Rankings
start example
1985 Preseason College Basketball Rankings
Optimal Monotonic Transformation of Ranked Teams
with Constrained Estimation of Unranked Teams
Principal Component Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Prior Communality Estimates: ONE

Eigenvalues of the Correlation Matrix: Total = 10  Average = 1

         Eigenvalue    Difference    Proportion    Cumulative
    1    9.49808040    9.27698055        0.9498        0.9498
    2    0.22109985    0.13434105        0.0221        0.9719
    3    0.08675881    0.01266762        0.0087        0.9806
    4    0.07409119    0.03048596        0.0074        0.9880
    5    0.04360523    0.00567364        0.0044        0.9924
    6    0.03793160    0.02098385        0.0038        0.9962
    7    0.01694775    0.00299099        0.0017        0.9979
    8    0.01395675    0.00982630        0.0014        0.9992
    9    0.00413045    0.00073249        0.0004        0.9997
   10    0.00339797                      0.0003        1.0000

1 factor will be retained by the NFACTOR criterion.

Factor Pattern

                                                                 Factor1
   TCSN                 CSN Transformation                       0.91136
   TDurhamSun           DurhamSun Transformation                 0.98887
   TDurhamHerald        DurhamHerald Transformation              0.97402
   TWashingtonPost      WashingtonPost Transformation            0.97408
   TUSA_Today           USA_Today Transformation                 0.98867
   TSportMagazine       SportMagazine Transformation             0.95331
   TInsideSports        InsideSports Transformation              0.98521
   TUPI                 UPI Transformation                       0.98534
   TAP                  AP Transformation                        0.99590
   TSportsIllustrated   SportsIllustrated Transformation         0.98615

Variance Explained by Each Factor

   Factor1
   9.4980804

Final Communality Estimates: Total = 9.498080

                            TDurham      TWashington
         TCSN   TDurhamSun     Herald           Post     TUSA_Today
   0.83057866   0.97785439  0.94870875     0.94882907     0.97747798

       TSport      TInside                               TSports
     Magazine       Sports         TUPI          TAP     Illustrated
   0.90879058   0.97064640   0.97088804   0.99181626     0.97249026

1985 Preseason College Basketball Rankings
Optimal Monotonic Transformation of Ranked Teams
with Constrained Estimation of Unranked Teams
Teams Ordered by Scores on First Principal Component

   OBS    School             Prin1
     1    Georgia Tech     -6.20315
     2    UNC              -5.93314
     3    Michigan         -5.71034
     4    Kansas           -4.78699
     5    Duke             -4.75896
     6    Illinois         -4.19220
     7    Georgetown       -4.02861
     8    Louisville       -3.73087
     9    Syracuse         -3.47497
    10    Auburn           -1.78429
    11    LSU              -0.35928
    12    Memphis State     0.46737
    13    Kentucky          0.63661
    14    Notre Dame        0.71919
    15    Navy              0.76187
    16    UAB               0.98316
    17    DePaul            1.09891
    18    Oklahoma          1.12012
    19    NC State          1.15144
    20    UNLV              1.28766
    21    Iowa              1.45260
    22    Indiana           1.48123
    23    Maryland          1.54935
    24    Virginia          2.01385
    25    Arkansas          2.02718
    26    Washington        2.10878
    27    Tennessee         2.27770
    28    Virginia Tech     2.36103
    29    St. Johns         2.37387
    30    Montana           2.43502
    31    UCLA              2.52481
    32    Pittsburgh        3.00907
    33    Old Dominion      3.03324
    34    St. Joseph's      3.39259
    35    Houston           4.69614
end example
 

The monotonic transformations are plotted for each of the ten news services. Each plot shows the raw ranks (with the missing ranks replaced by the maximum rank plus one) against the rescored (transformed) ranks. The transformations are the step functions that maximize the fit of the data to the principal component model. Smoother transformations could be found by using MSPLINE transformations, but MSPLINE transformations would not correctly handle the missing data problem.
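Least-squares monotone step functions of this kind are conventionally computed with the pool-adjacent-violators algorithm. The sketch below illustrates the idea on made-up target values; it is not PROC PRINQUAL's implementation.

```python
def pava(y):
    """Least-squares nondecreasing fit to y (taken in rank order) by
    pool-adjacent-violators: repeatedly merge adjacent blocks whose
    means are out of order, replacing each block by its mean."""
    blocks = [[v, 1] for v in y]                  # [sum, count] per block
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] / blocks[i][1] > blocks[i + 1][0] / blocks[i + 1][1]:
            blocks[i][0] += blocks[i + 1][0]      # pool the two blocks
            blocks[i][1] += blocks[i + 1][1]
            del blocks[i + 1]
            if i:
                i -= 1                            # a merge can create a new
        else:                                     # violation upstream
            i += 1
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)                   # flat step: block mean
    return fit

# Hypothetical first-component fitted values, listed in rank order.
# Out-of-order stretches are pooled into flat steps, which is exactly
# the step-function shape seen in the transformation plots.
target = [1.0, 3.0, 2.0, 2.0, 6.0, 5.0, 7.0]
print(pava(target))  # blocks pooled to means 1, 7/3, 7/3, 7/3, 5.5, 5.5, 7
```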

The following statements perform the final analysis and produce Output 59.3.2:

   * Perform the Final Principal Component Analysis;
   proc factor nfactors=1;
      var TCSN -- TSportsIllustrated;
      title4 'Principal Component Analysis';
   run;

   proc sort;
      by Prin1;
   run;

   * Display Scores on the First Principal Component;
   proc print;
      title4 'Teams Ordered by Scores on First Principal Component';
      var School Prin1;
   run;

   * Plot the Transformations;
   goptions goutmode=replace nodisplay;
   %let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;
   * Depending on your goptions, these plot options may work better:
   * %let opts = haxis=axis2 vaxis=axis1 frame;

   proc gplot;
      title;
      axis1 minor=none label=(angle=90 rotate=0)
            order=(-3 to 2 by 1);
      axis2 minor=none order=(0 to 40 by 10);
      plot TCSN*CSN                             / &opts name='prqex1';
      plot TDurhamSun*DurhamSun                 / &opts name='prqex2';
      plot TDurhamHerald*DurhamHerald           / &opts name='prqex3';
      plot TWashingtonPost*WashingtonPost       / &opts name='prqex4';
      plot TUSA_Today*USA_Today                 / &opts name='prqex5';
      plot TSportMagazine*SportMagazine         / &opts name='prqex6';
      plot TInsideSports*InsideSports           / &opts name='prqex7';
      plot TUPI*UPI                             / &opts name='prqex8';
      plot TAP*AP                               / &opts name='prqex9';
      plot TSportsIllustrated*SportsIllustrated / &opts name='prqex10';
      symbol1 c=blue;
   run; quit;

   goptions display;

   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:prqex1 2:prqex2 3:prqex3 4:prqex4;
      treplay 1:prqex5 2:prqex6 3:prqex7 4:prqex8;
      treplay 1:prqex9 3:prqex10;
   run; quit;
Output 59.3.3: Monotonic Transformation for Each News Service
start example
[Plots of the monotonic transformation for each of the ten news services, displayed four per panel]
end example
 

The ordinary PROC PRINQUAL missing data handling facilities do not work for these data because they do not constrain the missing data estimates properly. If you code the missing ranks as missing and specify linear transformations, then you can compute least-squares estimates of the missing values without transforming the observed values. The first principal component then accounts for 92 percent of the variance after 20 iterations. However, Virginia Tech is ranked number 11 by its score even though it appeared in only one poll (InsideSports ranked it number 13, anchoring it firmly in the middle). Specifying monotone transformations is also inappropriate, because they, too, allow unranked teams to move in between ranked teams.

With these data, the combination of monotone transformations and the freedom to score the missing ranks without constraint leads to degenerate transformations. PROC PRINQUAL tries to merge the 35 points into two points, producing a perfect fit in one dimension. There is evidence for this after 20 iterations, when the Average Change, Maximum Change, and Variance Change values are all increasing, instead of the more stable decreasing change rates seen in the analysis shown. The change rates do not stop increasing until after 41 iterations, and it is clear by 70 or 80 iterations that one component will account for 100 percent of the transformed variables' variance after sufficient iteration. While this may seem desirable (after all, it is a perfect fit), you should, in fact, be on guard when this happens. Whenever convergence is slow, the rates of change increase, or the final data perfectly fit the model, the solution is probably degenerating because of too few constraints on the scorings.

PROC PRINQUAL can account for 100 percent of the variance by scoring Montana and UCLA with one positive value on all variables and scoring all the other teams with one negative value on all variables. This inappropriate analysis suggests that all ranked teams are equally good except for two teams that are less good. Both of those teams are ranked by only one news service, and in each case the only nonmissing rank is last place in that poll. This accounts for the degeneracy.
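A small sketch shows why such two-point scoring gives a perfect one-dimensional fit: when every column assigns one value to the same two teams and another value to the remaining teams, any two columns are exact linear functions of each other, so all pairwise correlations are 1 in absolute value, the correlation matrix has rank 1, and the first principal component accounts for all of the variance. The numbers below are made up; only the two-point pattern matters.

```python
from math import sqrt

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# 35 teams: a degenerate scoring gives two teams one positive value and
# the other 33 teams one negative value, in every column.  The actual
# numbers differ by column, but the grouping is identical:
col1 = [-1.0] * 33 + [3.0, 3.0]
col2 = [-0.5] * 33 + [1.0, 1.0]

print(corr(col1, col2))  # approximately 1.0
```

Because every pair of columns correlates perfectly, a single component reproduces the whole transformed data matrix, which is exactly the "too good to be true" fit the text warns about.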




SAS/STAT 9.1 User's Guide (Vol. 5)
Year: 2004