The following notation is used:
Symbol | Meaning |
---|---|
$A_p$ | intercept for partition $p$ |
$B_p$ | slope for partition $p$ |
$C_p$ | power for partition $p$ |
$D_{rcs}$ | distance computed from the model between objects $r$ and $c$ for subject $s$ |
$F_{rcs}$ | data weight for objects $r$ and $c$ for subject $s$, obtained from the $c$th WEIGHT variable, or 1 if there is no WEIGHT statement |
$f$ | value of the FIT= option |
$N$ | number of objects |
$O_{rcs}$ | observed dissimilarity between objects $r$ and $c$ for subject $s$ |
$P_{rcs}$ | partition index for objects $r$ and $c$ for subject $s$ |
$Q_{rcs}$ | dissimilarity after applying any applicable estimated transformation, for objects $r$ and $c$ for subject $s$ |
$R_{rcs}$ | residual for objects $r$ and $c$ for subject $s$ |
$S_p$ | standardization factor for partition $p$ |
$T_p(\cdot)$ | estimated transformation for partition $p$ |
$V_{sd}$ | coefficient for subject $s$ on dimension $d$ |
$X_{nd}$ | coordinate for object $n$ on dimension $d$ |
Summations are taken over nonmissing values.
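Distances are computed from the model as follows.

For COEF=IDENTITY (Euclidean distance):

$$D_{rcs} = \sqrt{\sum_{d} \left(X_{rd} - X_{cd}\right)^2}$$

For COEF=DIAGONAL (weighted Euclidean distance):

$$D_{rcs} = \sqrt{\sum_{d} V_{sd}^2 \left(X_{rd} - X_{cd}\right)^2}$$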
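As a minimal Python sketch of the model distance: it assumes that each dimension coefficient $V_{sd}$ multiplies the coordinate difference on dimension $d$ before squaring (the weighted Euclidean case), and that omitting the coefficients gives the COEF=IDENTITY case. The function name `model_distance` is hypothetical and not part of PROC MDS.

```python
import math


def model_distance(x_r, x_c, v_s=None):
    """Model distance D_rcs between objects r and c for subject s.

    x_r, x_c: coordinates X_rd and X_cd for d = 1..m
    v_s: dimension coefficients V_sd (COEF=DIAGONAL);
         None means COEF=IDENTITY (every coefficient equal to 1)
    """
    if v_s is None:
        v_s = [1.0] * len(x_r)
    # Assumption: V_sd scales the coordinate difference before squaring.
    return math.sqrt(sum((v * (a - b)) ** 2 for v, a, b in zip(v_s, x_r, x_c)))


print(model_distance([0.0, 0.0], [3.0, 4.0]))              # 5.0
print(model_distance([0.0, 0.0], [3.0, 4.0], [2.0, 0.5]))  # sqrt(40), about 6.3246
```

With COEF=DIAGONAL, each subject sees the same configuration stretched or shrunk along each dimension by its own coefficients.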
Partition indexes are

$$P_{rcs} = \begin{cases} 1 & \text{for CONDITION=UN} \\ s & \text{for CONDITION=MATRIX} \\ r + N(s-1) & \text{for CONDITION=ROW} \end{cases}$$
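The partition index only decides which observations share a transformation and a standardization factor. The following is a minimal sketch, assuming 1-based indexes and that row partitions are numbered consecutively within each subject as $r + N(s-1)$. That numbering, and the helper name `partition_index`, are assumptions of this illustration.

```python
def partition_index(r, s, n_objects, condition="UN"):
    """Partition index P_rcs for row object r and subject s (1-based).

    The column object c is not needed: a partition is the whole data set
    (UN), one subject's matrix (MATRIX), or one row of one matrix (ROW).
    """
    condition = condition.upper()
    if condition == "UN":
        return 1
    if condition == "MATRIX":
        return s
    if condition == "ROW":
        return r + n_objects * (s - 1)
    raise ValueError(f"unsupported CONDITION={condition}")


# Four objects, two subjects: row 3 of the second matrix
print(partition_index(3, 2, 4, "ROW"))     # 7
print(partition_index(3, 2, 4, "MATRIX"))  # 2
```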
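The estimated transformation for each partition is

$$T_p(d) = \begin{cases} d & \text{for LEVEL=ABSOLUTE} \\ B_p\, d & \text{for LEVEL=RATIO} \\ A_p + B_p\, d & \text{for LEVEL=INTERVAL} \end{cases}$$

For LEVEL=LOGINTERVAL, $T_p(\cdot)$ is a power transformation of $d$ with power $C_p$.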
For LEVEL=ORDINAL, $T_p(\cdot)$ is computed as a least-squares monotone transformation.
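The section does not say which algorithm PROC MDS uses for the least-squares monotone transformation. A standard way to compute such a fit is the pool-adjacent-violators algorithm (PAVA), sketched below under these assumptions: the targets `y` are already sorted by the observed data within one partition, all weights are positive, and tied data values get no special treatment. In an ordinal fit, the targets would typically be the model distances, and the fitted values would play the role of the transformed data. The function name `pava` is illustrative.

```python
def pava(y, w=None):
    """Weighted least-squares nondecreasing fit (pool-adjacent-violators).

    y: targets already ordered by the observed data within one partition
    w: positive weights (default 1)
    Returns z minimizing sum(w_i * (y_i - z_i)**2) subject to z_1 <= ... <= z_n.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


print(pava([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
```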
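For LEVEL=ABSOLUTE, RATIO, or INTERVAL, the residuals are computed as

$$Q_{rcs} = O_{rcs}, \qquad R_{rcs} = Q_{rcs}^{f} - \left[T_{P_{rcs}}(D_{rcs})\right]^{f}$$

For LEVEL=ORDINAL, the residuals are computed as

$$Q_{rcs} = T_{P_{rcs}}(O_{rcs}), \qquad R_{rcs} = Q_{rcs}^{f} - D_{rcs}^{f}$$

If $f$ is 0, then natural logarithms are used in place of the $f$th powers.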
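The residual definitions for the linear levels can be sketched in Python. This sketch assumes $T_p(d) = d$ for ABSOLUTE, $B_p d$ for RATIO, and $A_p + B_p d$ for INTERVAL, and leaves out LOGINTERVAL and ORDINAL. It computes the residual as the FIT-transformed datum minus the FIT-transformed, transformed distance, which matches the OUTRES= definition RESIDUAL = FITDATA minus FITDIST. All function names are hypothetical.

```python
import math


def fit_transform(value, f):
    """Apply the FIT= power f; f == 0 means the natural logarithm."""
    return math.log(value) if f == 0 else value ** f


def transform_distance(d, level, intercept=0.0, slope=1.0):
    """T_p(d) for the linear measurement levels only (assumed forms)."""
    level = level.upper()
    if level == "ABSOLUTE":
        return d
    if level == "RATIO":
        return slope * d
    if level == "INTERVAL":
        return intercept + slope * d
    raise ValueError(f"LEVEL={level} is not covered by this sketch")


def residual(observed, distance, f, level, intercept=0.0, slope=1.0):
    """R_rcs for LEVEL=ABSOLUTE, RATIO, or INTERVAL, where Q_rcs = O_rcs."""
    fitdata = fit_transform(observed, f)
    fitdist = fit_transform(transform_distance(distance, level, intercept, slope), f)
    return fitdata - fitdist


# O = 5, D = 2, INTERVAL with A_p = 1 and B_p = 1.5, so T_p(D) = 4
print(residual(5.0, 2.0, f=1, level="INTERVAL", intercept=1.0, slope=1.5))  # 1.0
print(residual(5.0, 2.0, f=2, level="INTERVAL", intercept=1.0, slope=1.5))  # 9.0
```

With $f = 0$, both the datum and the transformed distance must be positive, because logarithms are taken.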
For each partition, let
and
Then the standardization factor for each partition is
| for FORMULA=0 for FORMULA=1 for FORMULA=2 |
The badness-of-fit criterion that the MDS procedure tries to minimize is
The OUT= data set contains the following variables:
BY variables, if any
_ITER_ (if the OUTITER option is specified), a numeric variable containing the iteration number
_DIMENS_, a numeric variable containing the number of dimensions
_MATRIX_ or the variable in the MATRIX statement, identifying the data matrix or subject to which the observation pertains. This variable contains a missing value for observations that pertain to the data set as a whole and not to a particular matrix, such as the coordinates (_TYPE_='CONFIG').
_TYPE_, a character variable of length 10 identifying the type of information in the observation
The values of _TYPE_ are as follows:
_TYPE_ | Contents of the observation |
---|---|
CONFIG | the estimated coordinates of the configuration of objects |
DIAGCOEF | the estimated dimension coefficients for COEF=DIAGONAL |
INTERCEPT | the estimated intercept parameters |
SLOPE | the estimated slope parameters |
POWER | the estimated power parameters |
CRITERION | the badness-of-fit criterion |
_LABEL_ or the variable in the ID statement, containing the variable label or value of the ID variable of the object to which the observation pertains. This variable contains a missing value for observations that do not pertain to a particular object or dimension.
_NAME_, a character variable of length 8 containing the variable name of the object or dimension to which the observation pertains. This variable contains a missing value for observations that do not pertain to a particular object or dimension.
DIM1, ..., DIMm, where m is the maximum number of dimensions
The OUTFIT= data set contains various measures of goodness and badness of fit. There is one observation for the entire sample plus one observation for each matrix. For the CONDITION=ROW option, there is also one observation for each row.
The OUTFIT= data set contains the following variables:
BY variables, if any
_ITER_ (if the OUTITER option is specified), a numeric variable containing the iteration number
_DIMENS_, a numeric variable containing the number of dimensions
_MATRIX_ or the variable in the MATRIX statement, identifying the data matrix or subject to which the observation pertains
_LABEL_ or the variable in the ID statement, containing the variable label or value of the ID variable of the object to which the observation pertains when CONDITION=ROW
_NAME_, a character variable of length 8 containing the variable name of the object or dimension to which the observation pertains when CONDITION=ROW
N, the number of nonmissing data
WEIGHT, the weight of the partition
CRITER, the badness-of-fit criterion
DISCORR, the correlation between the transformed data and the distances for LEVEL=ORDINAL, or between the data and the transformed distances otherwise
UDISCORR, the correlation, uncorrected for the mean, between the transformed data and the distances for LEVEL=ORDINAL, or between the data and the transformed distances otherwise
FITCORR, the correlation between the fit-transformed data and the fit-transformed distances
UFITCORR, the correlation, uncorrected for the mean, between the fit-transformed data and the fit-transformed distances
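The four OUTFIT= correlations differ only in which pair of vectors is correlated and whether the means are subtracted. A sketch, assuming that the correlations may be weighted by the data weights. The section does not say whether PROC MDS weights them, so passing no weights gives the unweighted version. The helper `correlation` is hypothetical. For DISCORR with LEVEL=ORDINAL, `x` would be the transformed data and `y` the distances. For FITCORR, both vectors would be fit-transformed.

```python
import math


def correlation(x, y, w=None, centered=True):
    """Weighted correlation of x and y.

    centered=False gives the correlation "uncorrected for the mean",
    that is, the cosine of the angle between the two weighted vectors.
    """
    if w is None:
        w = [1.0] * len(x)
    total_w = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total_w if centered else 0.0
    my = sum(wi * yi for wi, yi in zip(w, y)) / total_w if centered else 0.0
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)


data = [1.0, 2.0, 3.0]
dist = [3.0, 2.0, 1.0]
print(correlation(data, dist))                  # -1.0
print(correlation(data, dist, centered=False))  # 10/14, about 0.714
```

The uncorrected version stays high whenever data and distances are both positive and of similar size. This is one reason to report it alongside the ordinary correlation.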
The OUTRES= data set has one observation for each nonmissing datum. It contains the following variables:
BY variables, if any
_ITER_ (if the OUTITER option is specified), a numeric variable containing the iteration number
_DIMENS_, a numeric variable containing the number of dimensions
_MATRIX_ or the variable in the MATRIX statement, identifying the data matrix or subject to which the observation pertains
_ROW_, containing the variable label or value of the ID variable of the row to which the observation pertains
_COL_, containing the variable label or value of the ID variable of the column to which the observation pertains
DATA, the original datum
TRANDATA, the optimally transformed datum when LEVEL=ORDINAL
DISTANCE, the distance computed from the PROC MDS model
TRANSDIST, the optimally transformed distance when the LEVEL= option is not ORDINAL or ABSOLUTE
FITDATA, the datum further transformed according to the FIT= option
FITDIST, the distance further transformed according to the FIT= option
WEIGHT, the combined weight of the datum based on the WEIGHT variable(s), if any, and the standardization specified by the FORMULA= option
RESIDUAL, FITDATA minus FITDIST
To make a datum appear in the OUTRES= data set while it is ignored in fitting the model, give the datum a nonmissing value and a weight of 0 (see the section "WEIGHT Statement").
The INITIAL= data set has the same structure as the OUT= data set but is not required to have all of the variables or observations that appear in the OUT= data set. You can use an OUT= data set previously created by PROC MDS (without the OUTITER option) as an INITIAL= data set in a subsequent invocation of the procedure.
The only variables that are required are DIM1, ..., DIMm (where m is the maximum number of dimensions) or equivalent variables specified in the INVAR statement. If these are the only variables, then all the observations are assumed to contain coordinates of the configuration; you cannot read dimension coefficients or transformation parameters.
To read initial values for the dimension coefficients or transformation parameters, the INITIAL= data set must contain the _TYPE_ variable and either the variable specified in the ID statement or, if no ID statement is used, the variable _NAME_ . In addition, if there is more than one data matrix, either the variable specified in the MATRIX statement or, if no MATRIX statement is used, the variable _MATRIX_ or _MATNUM_ is required.
If the INITIAL= data set contains the variable _DIMENS_, initial values are obtained from observations with the corresponding number of dimensions. If there is no _DIMENS_ variable, the same observations are used for each number of dimensions analyzed. If you want PROC MDS to read initial values from some but not all of the observations in the INITIAL= data set, use the WHERE= data set option to select the desired observations.
Missing data in the similarity or dissimilarity matrices are ignored in fitting the model and are omitted from the OUTRES= data set. Any matrix that is completely missing is omitted from the analysis.
Missing weights are treated as 0.
Missing values are also allowed in the INITIAL= data set, but a large number of missing values may yield a degenerate initial configuration.
In multidimensional scaling models, the parameter estimates are not uniquely determined; the estimates can be transformed in various ways without changing their badness of fit. The initial and final estimates from PROC MDS are, therefore, normalized (unless you specify the NONORM option) to make it easier to compare results from different analyses.
The configuration always has a mean of 0 for each dimension.
With the COEF=IDENTITY option, the configuration is rotated to a principal-axis orientation. Unless you specify the LEVEL=ABSOLUTE option, the entire configuration is scaled so that the root-mean-square element is 1, and the transformations are adjusted to compensate.
With the COEF=DIAGONAL option, each dimension is scaled to a root-mean-square value of 1, and the dimension coefficients are adjusted to compensate. Unless you specify the LEVEL=ABSOLUTE option, the dimension coefficients are normalized as follows. If you specify the CONDITION=UN option, all of the dimension coefficients are scaled to a root-mean-square value of 1. For other values of the CONDITION= option, the dimension coefficients are scaled separately for each subject to a root-mean-square value of 1. In either case, the transformations are adjusted to compensate.
Each dimension is reflected to give a positive rank correlation with the order of the objects in the data set.
For the LEVEL=ORDINAL option, if the intercept, slope, or power parameters are fitted, the transformed data are normalized to eliminate these parameters if possible.
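The centering, scaling, and reflection steps described above for COEF=IDENTITY can be sketched as follows. This is an illustration under stated assumptions. The principal-axis rotation is omitted. The configuration is a list of rows (objects) by columns (dimensions). The sign of a Spearman rank correlation with the object order 1, ..., N decides whether a dimension is reflected. The compensating adjustment of the transformations is not shown. The function names are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def normalize_identity(config):
    """Center, rescale, and reflect a configuration (rows = objects, columns = dimensions)."""
    n, m = len(config), len(config[0])
    # 1. Give each dimension a mean of 0.
    means = [sum(row[d] for row in config) / n for d in range(m)]
    x = [[row[d] - means[d] for d in range(m)] for row in config]
    # (The principal-axis rotation is omitted in this sketch.)
    # 2. Scale the whole configuration so that its root-mean-square element is 1.
    rms = math.sqrt(sum(v * v for row in x for v in row) / (n * m))
    x = [[v / rms for v in row] for row in x]
    # 3. Reflect any dimension whose rank correlation with the object order is negative.
    #    The sign of Spearman's correlation equals the sign of this rank covariance.
    mid = (n + 1) / 2
    for d in range(m):
        col_ranks = ranks([row[d] for row in x])
        cov = sum((rk - mid) * (pos + 1 - mid) for pos, rk in enumerate(col_ranks))
        if cov < 0:
            for row in x:
                row[d] = -row[d]
    return x


# Dimension 1 is reflected; dimension 2 already increases with object order.
for row in normalize_identity([[3.0, 1.0], [1.0, 2.0], [2.0, 3.0]]):
    print(row)
```

Because every step changes only the orientation, location, or overall scale, the distances change at most by a common factor. Under the assumption above, that factor would be absorbed by the transformation parameters.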
The MDS procedure generally produces results similar to those from the ALSCAL procedure (Young, Lewyckyj, and Takane 1986; Young 1982) if you use the following options in PROC MDS:
FIT=SQUARED
FORMULA=1 except for unfolding data, which require FORMULA=2
PFINAL to get output similar to that from PROC ALSCAL
Unlike PROC ALSCAL, PROC MDS produces no plots, so you must use the output data sets with PROC PLOT or PROC GPLOT.
The MDS and ALSCAL procedures may sometimes produce different results for the following reasons:
With the LEVEL=INTERVAL option, PROC MDS fits a regression model while PROC ALSCAL fits a measurement model. These models are not equivalent if there is more than one partition, although the differences in the parameter estimates are usually minor.
PROC MDS and PROC ALSCAL use different algorithms for initialization and optimization. Hence, different local optima may be found by PROC MDS and PROC ALSCAL for some data sets with poor fit. Using the INAV=SSCP option causes the initial estimates from PROC MDS to be more like those from PROC ALSCAL.
The default convergence criteria in PROC MDS are more strict than those in PROC ALSCAL. The convergence measure in PROC ALSCAL may cause PROC ALSCAL to stop iterating because progress is slow rather than because a local optimum has been reached. Even if you run PROC ALSCAL with a very small convergence criterion and a very large iteration limit, PROC ALSCAL may never achieve the same degree of precision as PROC MDS. For most applications, this problem is of no practical consequence since two- or three-digit precision is sufficient. If the model does not fit well, obtaining higher precision may require hundreds of iterations.
PROC MDS accepts some PROC ALSCAL options as synonyms for the preceding options, as displayed in Table 43.1.
PROC ALSCAL Option | Accepted by PROC MDS? | Related PROC MDS Option or Comments |
---|---|---|
CONDITION= | Yes | |
CONVERGE= | Yes | Convergence measures are not comparable |
CUTOFF= | Yes | |
DATA= | Yes | |
DEGREE= | No | |
DIMENS= | Yes | |
DIRECTIONS= | No | |
HEADER | Yes | Default in PROC MDS |
IN= | Yes | |
ITER= | Yes | MAXITER= |
LEVEL= | Yes | LEVEL=NOMINAL is not supported |
MAXDIM= m | Yes | DIMENSION= n TO m |
MINDIM= n | Yes | DIMENSION= n TO m |
MINSTRESS= | Yes | MINCRIT= |
MODEL=EUCLID | Yes | COEF=IDENTITY |
MODEL=INDSCAL | Yes | COEF=DIAGONAL |
MODEL=GEMSCAL | No | |
MODEL=ASYMSCAL | No | |
MODEL=ASYMINDS | No | |
NEGATIVE | (Yes) | In PROC MDS, the NEGATIVE option affects slopes and powers, not subject weights. |
NOULB | Yes | |
OUT= | Yes | Some differences in contents |
PLOT | No | |
PLOTALL | No | |
| No | |
READV, etc. | No | Use WHERE data set option |
READFIXV, etc. | No | |
ROWS= | No | |
SHAPE=SYMMETRI | Yes | SHAPE=TRIANGLE |
SHAPE=ASYMMETR | Yes | SHAPE=SQUARE |
SHAPE=RECTANGU | No | Use SHAPE=TRIANGLE with extra missing values to fill out the matrix. |
SIMILAR | Yes | |
TIESTORE= | Yes | Ignored by PROC MDS |
UNTIE | Yes | |
Running the MDS procedure with the options
proc mds fit=log level=loginterval ... ;
generally produces results similar to using the MLSCALE procedure (Ramsay 1986) with the options
proc mlscale stvarnce=constant suvarnce=constant ... ;
Alternatively, using the FIT=DISTANCE option in the PROC MDS statement produces results similar to specifying the NORMAL option in the PROC MLSCALE statement.
The MDS procedure uses the least-squares method of estimation. The least-squares method is equivalent to the maximum-likelihood method if the error terms are assumed to be independent and identically distributed normal random variables. Unlike PROC MLSCALE, PROC MDS does not provide any options for unequal error variances.
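The equivalence follows from the assumptions stated above: independent, identically distributed normal errors, shown here with unit data weights. For $n$ residuals $R_1, \dots, R_n$ with common variance $\sigma^2$, the negative log-likelihood is

$$-\log L = \frac{n}{2}\log\left(2\pi\sigma^2\right) + \frac{1}{2\sigma^2}\sum_{i=1}^{n} R_i^2$$

For any fixed $\sigma^2$, the first term does not depend on the model parameters. The parameter values that maximize $L$ are therefore exactly those that minimize $\sum_i R_i^2$. Maximizing over $\sigma^2$ as well gives $\hat\sigma^2 = \sum_i R_i^2 / n$ and does not change those parameter values.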
PROC MDS accepts some PROC MLSCALE options as synonyms for the options described previously, as displayed in Table 43.2.
PROC MLSCALE Option | Accepted by PROC MDS? | Related PROC MDS Option or Comments |
---|---|---|
SQUARE | Yes | SHAPE=SQUARE |
INPUT=MATRIX | No | Default |
INPUT=VECTOR | No | |
STLABEL= | No | ID statement |
STLBDS | No | |
SULABEL= | No | MATRIX statement |
SULBDS | No | |
CONFIG | No | |
CONFDS= | No | IN= data set |
NEQU= | No | |
CONSDS= | No | |
METVAL | No | |
METVDS | No | IN= |
SEWGTS | No | |
SEWGDS= | No | |
SPLVAL | No | |
SLPVDS= | No | |
DIMENS= | Yes | |
METRIC=IDENTITY | Yes | COEF=IDENTITY |
METRIC=DIAGONAL | Yes | COEF=DIAGONAL |
METRIC=FULL | No | |
TRANSFRM=SCALE | Yes | LEVEL=RATIO |
TRANSFRM=POWER | Yes | LEVEL=LOGINTERVAL |
TRANSFRM=SPLINE | No | |
STVARNCE= | No | |
SUVARNCE= | No | |
NORMAL | No | Default (FIT=DISTANCE) |
ITMAX= | Yes | MAXITER= |
ITXMAX= | No | |
ITWMAX= | No | |
ITAMAX= | No | |
ITPMAX= | No | |
CONV= | (Yes) | Meaning is different |
FACTOR= | No | |
HISTORY | No | PITER |
ASYMP | No | |
OUTCON | No | OUT= |
OUTDIS | No | |
OUTMET | No | OUT= |
OUTSPL | No | |
OUTRES | (Yes) | OUTRES= data set |
Unless you specify the NOPHIST option, PROC MDS displays the iteration history containing
Iteration number
Type of iteration:
Initial | initial configuration |
Monotone | monotone transformation |
Gau-New | Gauss-Newton step |
Lev-Mar | Levenberg-Marquardt step |
Badness-of-Fit Criterion
Change in Criterion
Convergence Measures:
Monotone | the Euclidean norm of the change in the optimally scaled data divided by the Euclidean norm of the optimally scaled data, averaged across partitions |
Gradient | the multiple correlation of the Jacobian matrix with the residual vector, uncorrected for the mean |
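Both convergence measures can be sketched directly from their descriptions. The sketch makes two assumptions: the monotone measure divides by the norm of the new optimally scaled data, and the gradient measure uses the Jacobian of the residuals with respect to the parameters. The gradient of the residual sum of squares is proportional to $J^\top R$. The gradient measure therefore approaches 0 at a stationary point, where the residual vector is orthogonal to every Jacobian column. The helper names and the small Gaussian-elimination solver are illustrative only.

```python
import math


def monotone_measure(old_by_partition, new_by_partition):
    """Average over partitions of ||new - old|| / ||new|| for the optimally scaled data."""
    ratios = []
    for old, new in zip(old_by_partition, new_by_partition):
        change = math.sqrt(sum((b - a) ** 2 for a, b in zip(old, new)))
        size = math.sqrt(sum(b * b for b in new))
        ratios.append(change / size)
    return sum(ratios) / len(ratios)


def _solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gradient_measure(jacobian, residuals):
    """Uncentered multiple correlation of the residual vector with the Jacobian columns.

    jacobian: one row per datum, one column per parameter (assumed full column rank).
    """
    k = len(jacobian[0])
    jtj = [[sum(row[i] * row[j] for row in jacobian) for j in range(k)] for i in range(k)]
    jtr = [sum(row[i] * r for row, r in zip(jacobian, residuals)) for i in range(k)]
    total = sum(r * r for r in residuals)
    if total == 0:
        return 0.0  # perfect fit: nothing left to explain
    beta = _solve(jtj, jtr)
    # Sum of squares explained by regressing R on J without an intercept: beta . (J^T R)
    explained = max(sum(bi * gi for bi, gi in zip(beta, jtr)), 0.0)
    return math.sqrt(explained / total)


print(gradient_measure([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [1.0, 0.0, 1.0]))  # about 0.7071
```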
Depending on what options are specified, PROC MDS may also display the following tables:
Data Matrix and possibly Weight Matrix for each subject
Eigenvalues from the computation of the initial coordinates
Sum of Data Weights and Pooled Data Matrix computed during initialization with INAV=DATA
Configuration, the estimated coordinates of the objects
Dimension Coefficients
A table of transformation parameters, including one or more of the following:
Intercept
Slope
Power
A table of fit statistics for each matrix and possibly each row, including
Number of Nonmissing Data
Weight of the matrix or row, allowing for both observation weights and standardization factors
Badness-of-Fit Criterion
Distance Correlation computed between the distances and data with optimal transformation
Uncorrected Distance Correlation not corrected for the mean
Fit Correlation computed after applying the FIT= transformation to both distances and data
Uncorrected Fit Correlation not corrected for the mean
PROC MDS assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, "Using the Output Delivery System."
ODS Table Name | Description | Option |
---|---|---|
ConvergenceStatus | Convergence status | default |
DimensionCoef | Dimension coefficients | PCOEF w/COEF= not IDENTITY |
FitMeasures | Measures of fit | PFIT |
IterHistory | Iteration history | default |
PConfig | Estimated coordinates of the objects in the configuration | PCONFIG |
PData | Data matrices | PDATA |
PInAvData | Initial sum of weights and weighted average of data matrices with INAV=DATA | PINAVDATA |
PInEigval | Initial eigenvalues | PINEIGVAL |
PInEigvec | Initial eigenvectors | PINEIGVEC |
PInWeight | Initialization weights | PINWEIGHT |
Transformations | Transformation parameters | PTRANS w/LEVEL=RATIO, INTERVAL, LOGINTERVAL |