Details | SAS/STAT 9.1 User's Guide (Vol. 4)

Formulas

The following notation is used:

A_p	intercept for partition p
B_p	slope for partition p
C_p	power for partition p
D_rcs	distance computed from the model between objects r and c for subject s
F_rcs	data weight for objects r and c for subject s obtained from the cth WEIGHT variable, or 1 if there is no WEIGHT statement
f	value of the FIT= option
N	number of objects
O_rcs	observed dissimilarity between objects r and c for subject s
P_rcs	partition index for objects r and c for subject s
Q_rcs	dissimilarity after applying any applicable estimated transformation for objects r and c for subject s
R_rcs	residual for objects r and c for subject s
S_p	standardization factor for partition p
T_p(·)	estimated transformation for partition p
V_sd	coefficient for subject s on dimension d
X_nd	coordinate for object n on dimension d

Summations are taken over nonmissing values.

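Distances are computed from the model as

$$
D_{rcs} =
\begin{cases}
\sqrt{\sum_{d} (X_{rd} - X_{cd})^2} & \text{for COEF=IDENTITY (Euclidean distance)} \\
\sqrt{\sum_{d} V_{sd}^2 \, (X_{rd} - X_{cd})^2} & \text{for COEF=DIAGONAL (weighted Euclidean distance)}
\end{cases}
$$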
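As a numerical illustration outside SAS, the following Python sketch computes the Euclidean and weighted Euclidean distances for one pair of objects. It assumes that each dimension coefficient multiplies the coordinate difference before squaring; the function name and the sample coordinates are invented for this example and are not part of PROC MDS.

```python
import math

def model_distance(x_r, x_c, v_s=None):
    """Distance between two objects given their coordinates.

    x_r, x_c: coordinates of objects r and c, one value per dimension.
    v_s: optional dimension coefficients for subject s (COEF=DIAGONAL);
         None means every coefficient is 1 (COEF=IDENTITY).
    Assumption: each coefficient scales the coordinate difference,
    so it contributes (v * (x_r - x_c)) ** 2 to the sum.
    """
    if v_s is None:
        v_s = [1.0] * len(x_r)
    total = sum((v * (a - b)) ** 2 for v, a, b in zip(v_s, x_r, x_c))
    return math.sqrt(total)

# Hypothetical two-dimensional coordinates for two objects.
print(model_distance([0.0, 0.0], [3.0, 4.0]))              # Euclidean
print(model_distance([0.0, 0.0], [3.0, 4.0], [2.0, 1.0]))  # weighted Euclidean
```

With all coefficients equal to 1, the weighted form reduces to the ordinary Euclidean distance, which is why a single function can cover both cases.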

Partition indexes are

$$
P_{rcs} =
\begin{cases}
1 & \text{for CONDITION=UN} \\
s & \text{for CONDITION=MATRIX} \\
N(s - 1) + r & \text{for CONDITION=ROW}
\end{cases}
$$
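The three partitioning schemes can be sketched in Python as follows. This is only an illustration: it assumes 1-based object and subject numbers and assumes that row partitions are numbered subject by subject; the function name is hypothetical.

```python
def partition_index(condition, r, s, n_objects):
    """Partition index for the datum in row r of subject s (both 1-based).

    Assumption: row partitions are numbered subject by subject, so
    row r of subject s gets index (s - 1) * n_objects + r.
    """
    condition = condition.upper()
    if condition == "UN":
        return 1                          # one partition for all the data
    if condition == "MATRIX":
        return s                          # one partition per subject (matrix)
    if condition == "ROW":
        return (s - 1) * n_objects + r    # one partition per row of each matrix
    raise ValueError(f"unsupported CONDITION= value: {condition}")

# Row 2 of the third matrix when there are 5 objects.
print(partition_index("ROW", r=2, s=3, n_objects=5))
```

Any numbering that gives each row of each matrix its own index would serve the same purpose; only the grouping of the data matters for the fit.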

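The estimated transformation for each partition is

$$
T_p(d) =
\begin{cases}
d & \text{for LEVEL=ABSOLUTE} \\
B_p \, d & \text{for LEVEL=RATIO} \\
A_p + B_p \, d & \text{for LEVEL=INTERVAL} \\
B_p \, d^{C_p} & \text{for LEVEL=LOGINTERVAL}
\end{cases}
$$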
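The parametric transformations can be illustrated with a short Python sketch. It assumes the conventional forms for these measurement levels (identity, slope only, intercept plus slope, and slope times a power of the distance); the function and argument names are hypothetical.

```python
def transform_distance(d, level, intercept=0.0, slope=1.0, power=1.0):
    """Apply the transformation for one partition to a distance d.

    intercept, slope, and power play the roles of A_p, B_p, and C_p.
    Assumption: the four levels use the conventional forms noted below.
    """
    level = level.upper()
    if level == "ABSOLUTE":
        return d                        # no parameters
    if level == "RATIO":
        return slope * d                # slope only
    if level == "INTERVAL":
        return intercept + slope * d    # intercept and slope
    if level == "LOGINTERVAL":
        return slope * d ** power       # slope and power
    raise ValueError(f"unsupported LEVEL= value: {level}")

print(transform_distance(3.0, "INTERVAL", intercept=1.0, slope=2.0))
print(transform_distance(3.0, "LOGINTERVAL", slope=2.0, power=2.0))
```

The name LOGINTERVAL reflects that taking logarithms of the last form gives a straight line in log(d), with log(slope) as the intercept and the power as the slope.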

For LEVEL=ORDINAL, T_p(·) is computed as a least-squares monotone transformation.
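A least-squares monotone fit can be computed with the standard pool-adjacent-violators algorithm. The Python sketch below shows that algorithm for unit weights and no special handling of ties; it is offered as an illustration of the idea, not as a description of how PROC MDS computes the transformation internally. The input is assumed to be the model distances listed in order of increasing observed dissimilarity.

```python
def monotone_fit(values):
    """Least-squares nondecreasing fit to `values` (unit weights).

    Pool-adjacent-violators: keep a stack of blocks, each stored as
    [sum, count]; whenever a block's mean falls below the mean of the
    block before it, merge the two and recheck.
    """
    blocks = []
    for v in values:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)   # each block takes its mean
    return fitted

# Hypothetical distances, ordered by increasing observed dissimilarity.
print(monotone_fit([1.0, 3.0, 2.0, 4.0]))
```

Values that already increase are left alone; a run that violates the ordering is replaced by its mean, which is the least-squares choice among nondecreasing sequences.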

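For LEVEL=ABSOLUTE, RATIO, or INTERVAL, the residuals are computed as

$$
Q_{rcs} = O_{rcs}, \qquad R_{rcs} = Q_{rcs}^{f} - \left[ T_{P_{rcs}}(D_{rcs}) \right]^{f}
$$

For LEVEL=ORDINAL, the residuals are computed as

$$
Q_{rcs} = T_{P_{rcs}}(O_{rcs}), \qquad R_{rcs} = Q_{rcs}^{f} - D_{rcs}^{f}
$$

If f is 0, then natural logarithms are used in place of the fth powers.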
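The role of f can be shown with a small Python sketch. It assumes that the residual is the fit-transformed datum minus the fit-transformed distance, which matches the OUTRES= variable RESIDUAL (FITDATA minus FITDIST); the function names are hypothetical.

```python
import math

def fit_transform(value, f):
    """Raise value to the f-th power, or take the natural log when f is 0."""
    return math.log(value) if f == 0 else value ** f

def residual(datum, distance, f=1):
    """Fit-transformed datum minus fit-transformed distance.

    datum: the dissimilarity after any estimated transformation of the data.
    distance: the model distance after any estimated transformation of it.
    """
    return fit_transform(datum, f) - fit_transform(distance, f)

print(residual(3.0, 2.0, f=1))   # distances compared directly
print(residual(3.0, 2.0, f=2))   # squared distances compared
print(residual(3.0, 2.0, f=0))   # logarithms compared
```

Larger values of f give more influence to large dissimilarities, whereas f = 0 measures the discrepancy on a ratio scale.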

For each partition, let

and

Then the standardization factor for each partition is

for FORMULA=0

for FORMULA=1

for FORMULA=2

The badness-of-fit criterion that the MDS procedure tries to minimize is

OUT= Data Set

The OUT= data set contains the following variables:

BY variables, if any
_ITER_ (if the OUTITER option is specified), a numeric variable containing the iteration number
_DIMENS_, a numeric variable containing the number of dimensions
_MATRIX_ or the variable in the MATRIX statement, identifying the data matrix or subject to which the observation pertains. This variable contains a missing value for observations that pertain to the data set as a whole and not to a particular matrix, such as the coordinates (_TYPE_='CONFIG').
_TYPE_, a character variable of length 10 identifying the type of information in the observation

The values of _TYPE_ are as follows:

CONFIG	the estimated coordinates of the configuration of objects
DIAGCOEF	the estimated dimension coefficients for COEF=DIAGONAL
INTERCEPT	the estimated intercept parameters
SLOPE	the estimated slope parameters
POWER	the estimated power parameters
CRITERION	the badness-of-fit criterion

_LABEL_ or the variable in the ID statement, containing the variable label or value of the ID variable of the object to which the observation pertains. This variable contains a missing value for observations that do not pertain to a particular object or dimension.
_NAME_, a character variable of length 8 containing the variable name of the object or dimension to which the observation pertains. This variable contains a missing value for observations that do not pertain to a particular object or dimension.
DIM1, ..., DIMm, where m is the maximum number of dimensions

OUTFIT= Data Set

The OUTFIT= data set contains various measures of goodness and badness of fit. There is one observation for the entire sample plus one observation for each matrix. For the CONDITION=ROW option, there is also one observation for each row.

The OUTFIT= data set contains the following variables:

BY variables, if any
_ITER_ (if the OUTITER option is specified), a numeric variable containing the iteration number
_DIMENS_, a numeric variable containing the number of dimensions
_MATRIX_ or the variable in the MATRIX statement, identifying the data matrix or subject to which the observation pertains
_LABEL_ or the variable in the ID statement, containing the variable label or value of the ID variable of the object to which the observation pertains when CONDITION=ROW
_NAME_, a character variable of length 8 containing the variable name of the object or dimension to which the observation pertains when CONDITION=ROW
N, the number of nonmissing data
WEIGHT, the weight of the partition
CRITER, the badness-of-fit criterion
DISCORR, the correlation between the transformed data and the distances for LEVEL=ORDINAL or the correlation between the data and the transformed distances otherwise
UDISCORR, the correlation uncorrected for the mean between the transformed data and the distances for LEVEL=ORDINAL or the correlation between the data and the transformed distances otherwise
FITCORR, the correlation between the fit-transformed data and the fit-transformed distances
UFITCORR, the correlation uncorrected for the mean between the fit-transformed data and the fit-transformed distances
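The difference between a correlation and a correlation uncorrected for the mean can be illustrated with the following Python sketch. It assumes unit weights and uses made-up values; it is not a reproduction of how PROC MDS computes DISCORR or UDISCORR.

```python
import math

def correlation(x, y, corrected=True):
    """Correlation of x and y.

    corrected=True subtracts the means first (ordinary Pearson correlation);
    corrected=False uses the raw cross products and sums of squares.
    Assumption: every pair has unit weight.
    """
    if corrected:
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        x = [a - mean_x for a in x]
        y = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data and distances that differ by a constant shift.
data = [1.0, 2.0, 3.0]
dist = [3.0, 4.0, 5.0]
print(correlation(data, dist))                   # corrected for the mean
print(correlation(data, dist, corrected=False))  # uncorrected for the mean
```

A constant shift between data and distances leaves the corrected correlation at 1 but lowers the uncorrected one, so the uncorrected versions are the ones that detect a misfit in overall level.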

OUTRES= Data Set

The OUTRES= data set has one observation for each nonmissing datum. It contains the following variables:

BY variables, if any
_ITER_ (if the OUTITER option is specified), a numeric variable containing the iteration number
_DIMENS_, a numeric variable containing the number of dimensions
_MATRIX_ or the variable in the MATRIX statement, identifying the data matrix or subject to which the observation pertains
_ROW_, containing the variable label or value of the ID variable of the row to which the observation pertains
_COL_, containing the variable label or value of the ID variable of the column to which the observation pertains
DATA, the original datum
TRANDATA, the optimally transformed datum when LEVEL=ORDINAL
DISTANCE, the distance computed from the PROC MDS model
TRANSDIST, the optimally transformed distance when the LEVEL= option is not ORDINAL or ABSOLUTE
FITDATA, the datum further transformed according to the FIT= option
FITDIST, the distance further transformed according to the FIT= option
WEIGHT, the combined weight of the datum based on the WEIGHT variable(s), if any, and the standardization specified by the FORMULA= option
RESIDUAL, FITDATA minus FITDIST

To cause a datum to appear in the OUTRES= data set, yet be ignored in fitting the model, give the datum a nonmissing value but a 0 weight (see the WEIGHT statement).

INITIAL= Data Set

The INITIAL= data set has the same structure as the OUT= data set but is not required to have all of the variables or observations that appear in the OUT= data set. You can use an OUT= data set previously created by PROC MDS (without the OUTITER option) as an INITIAL= data set in a subsequent invocation of the procedure.

The only variables that are required are DIM1, ..., DIMm (where m is the maximum number of dimensions) or equivalent variables specified in the INVAR statement. If these are the only variables, then all the observations are assumed to contain coordinates of the configuration; you cannot read dimension coefficients or transformation parameters.

To read initial values for the dimension coefficients or transformation parameters, the INITIAL= data set must contain the _TYPE_ variable and either the variable specified in the ID statement or, if no ID statement is used, the variable _NAME_. In addition, if there is more than one data matrix, either the variable specified in the MATRIX statement or, if no MATRIX statement is used, the variable _MATRIX_ or _MATNUM_ is required.

If the INITIAL= data set contains the variable _DIMENS_, initial values are obtained from observations with the corresponding number of dimensions. If there is no _DIMENS_ variable, the same observations are used for each number of dimensions analyzed. If you want PROC MDS to read initial values from some but not all of the observations in the INITIAL= data set, use the WHERE= data set option to select the desired observations.

Missing Values

Missing data in the similarity or dissimilarity matrices are ignored in fitting the model and are omitted from the OUTRES= data set. Any matrix that is completely missing is omitted from the analysis.

Missing weights are treated as 0.

Missing values are also allowed in the INITIAL= data set, but a large number of missing values may yield a degenerate initial configuration.

Normalization of the Estimates

In multidimensional scaling models, the parameter estimates are not uniquely determined; the estimates can be transformed in various ways without changing their badness of fit. The initial and final estimates from PROC MDS are, therefore, normalized (unless you specify the NONORM option) to make it easier to compare results from different analyses.

The configuration always has a mean of 0 for each dimension.

With the COEF=IDENTITY option, the configuration is rotated to a principal-axis orientation. Unless you specify the LEVEL=ABSOLUTE option, the entire configuration is scaled so that the root-mean-square element is 1, and the transformations are adjusted to compensate.

With the COEF=DIAGONAL option, each dimension is scaled to a root-mean-square value of 1, and the dimension coefficients are adjusted to compensate. Unless you specify the LEVEL=ABSOLUTE option, the dimension coefficients are normalized as follows. If you specify the CONDITION=UN option, all of the dimension coefficients are scaled to a root-mean-square value of 1. For other values of the CONDITION= option, the dimension coefficients are scaled separately for each subject to a root-mean-square value of 1. In either case, the transformations are adjusted to compensate.

Each dimension is reflected to give a positive rank correlation with the order of the objects in the data set.

For the LEVEL=ORDINAL option, if the intercept, slope, or power parameters are fitted, the transformed data are normalized to eliminate these parameters if possible.
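The centering and root-mean-square scaling described above for COEF=IDENTITY can be sketched in Python. The example uses a made-up configuration and omits the principal-axis rotation and the reflection step; the helper names are hypothetical. It illustrates why the transformations must be adjusted to compensate: centering leaves the distances unchanged, whereas scaling divides every distance by the same factor.

```python
import math

def center(config):
    """Shift each dimension (column) of the configuration to mean 0."""
    n = len(config)
    means = [sum(row[d] for row in config) / n for d in range(len(config[0]))]
    return [[x - m for x, m in zip(row, means)] for row in config]

def scale_to_unit_rms(config):
    """Scale the whole configuration so its root-mean-square element is 1.

    Returns the scaled configuration and the factor it was divided by.
    """
    count = len(config) * len(config[0])
    rms = math.sqrt(sum(x * x for row in config for x in row) / count)
    return [[x / rms for x in row] for row in config], rms

def euclid(a, b):
    """Euclidean distance between two coordinate lists."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical configuration: 3 objects in 2 dimensions.
raw = [[1.0, 2.0], [3.0, 6.0], [5.0, 1.0]]
scaled, factor = scale_to_unit_rms(center(raw))

# Multiplying a normalized distance by `factor` recovers the original
# distance, so a slope multiplied by `factor` would absorb the rescaling.
print(euclid(raw[0], raw[1]), factor * euclid(scaled[0], scaled[1]))
```

This also suggests why the rescaling is skipped for LEVEL=ABSOLUTE: with no slope to adjust, changing the scale of the configuration would change the fit.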

Comparison with the ALSCAL Procedure

The MDS procedure generally produces results similar to those from the ALSCAL procedure (Young, Lewyckyj, and Takane 1986; Young 1982) if you use the following options in PROC MDS:

FIT=SQUARED
FORMULA=1 except for unfolding data, which require FORMULA=2
PFINAL to get output similar to that from PROC ALSCAL

Unlike PROC ALSCAL, PROC MDS produces no plots, so you must use output data sets and PROC PLOT or PROC GPLOT.

The MDS and ALSCAL procedures may sometimes produce different results for the following reasons:

With the LEVEL=INTERVAL option, PROC MDS fits a regression model while PROC ALSCAL fits a measurement model. These models are not equivalent if there is more than one partition, although the differences in the parameter estimates are usually minor.
PROC MDS and PROC ALSCAL use different algorithms for initialization and optimization. Hence, different local optima may be found by PROC MDS and PROC ALSCAL for some data sets with poor fit. Using the INAV=SSCP option causes the initial estimates from PROC MDS to be more like those from PROC ALSCAL.
The default convergence criteria in PROC MDS are more strict than those in PROC ALSCAL. The convergence measure in PROC ALSCAL may cause PROC ALSCAL to stop iterating because progress is slow rather than because a local optimum has been reached. Even if you run PROC ALSCAL with a very small convergence criterion and a very large iteration limit, PROC ALSCAL may never achieve the same degree of precision as PROC MDS. For most applications, this problem is of no practical consequence since two- or three-digit precision is sufficient. If the model does not fit well, obtaining higher precision may require hundreds of iterations.

PROC MDS accepts some PROC ALSCAL options as synonyms for the preceding options, as displayed in Table 43.1.

Table 43.1: PROC MDS Options Compared to PROC ALSCAL Options
PROC ALSCAL Option	Accepted by PROC MDS?	Related PROC MDS Option or Comments
CONDITION=	Yes
CONVERGE=	Yes	Convergence measures are not comparable
CUTOFF=	Yes
DATA=	Yes
DEGREE=	No
DIMENS=	Yes
DIRECTIONS=	No
HEADER	Yes	Default in PROC MDS
IN=	Yes
ITER=	Yes	MAXITER=
LEVEL=	Yes	LEVEL=NOMINAL is not supported
MAXDIM= m	Yes	DIMENSION= n TO m
MINDIM= n	Yes	DIMENSION= n TO m
MINSTRESS=	Yes	MINCRIT=
MODEL=EUCLID	Yes	COEF=IDENTITY
MODEL=INDSCAL	Yes	COEF=DIAGONAL
MODEL=GEMSCAL	No
MODEL=ASYMSCAL	No
MODEL=ASYMINDS	No
NEGATIVE	(Yes)	In PROC MDS, the NEGATIVE option affects slopes and powers, not subject weights.
NOULB	Yes
OUT=	Yes	Some differences in contents
PLOT	No
PLOTALL	No
PRINT	No
READV, etc.	No	Use WHERE data set option
READFIXV, etc.	No
ROWS=	No
SHAPE=SYMMETRI	Yes	SHAPE=TRIANGLE
SHAPE=ASYMMETR	Yes	SHAPE=SQUARE
SHAPE=RECTANGU	No	Use SHAPE=TRIANGLE with extra missing values to fill out the matrix.
SIMILAR	Yes
TIESTORE=	Yes	Ignored by PROC MDS
UNTIE	Yes

Comparison with the MLSCALE Procedure

Running the MDS procedure with the options

  proc mds fit=log level=loginterval ... ;

generally produces results similar to using the MLSCALE procedure (Ramsay 1986) with the options

  proc mlscale stvarnce=constant suvarnce=constant ... ;

Alternatively, using the FIT=DISTANCE option in the PROC MDS statement produces results similar to specifying the NORMAL option in the PROC MLSCALE statement.

The MDS procedure uses the least-squares method of estimation. The least-squares method is equivalent to the maximum-likelihood method if the error terms are assumed to be independent and identically distributed normal random variables. Unlike PROC MLSCALE, PROC MDS does not provide any options for unequal error variances.
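The equivalence can be seen from the normal log likelihood. As a sketch under simplifying assumptions (n nonmissing residuals R_i that depend on the parameters θ, unit data weights, no standardization, and a common error variance σ²; this notation is introduced only for the illustration):

```latex
\ell(\theta, \sigma^2)
  = -\frac{n}{2} \log\left( 2 \pi \sigma^2 \right)
    - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} R_i(\theta)^2
```

For any fixed σ², the first term does not depend on θ, so maximizing the log likelihood over θ is the same as minimizing the sum of squared residuals.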

PROC MDS accepts some PROC MLSCALE options as synonyms for the options described previously, as displayed in Table 43.2.

Table 43.2: PROC MDS Options Compared to PROC MLSCALE Options
PROC MLSCALE Option	Accepted by PROC MDS?	Related PROC MDS Option or Comments
SQUARE	Yes	SHAPE=SQUARE
INPUT=MATRIX	No	Default
INPUT=VECTOR	No
STLABEL=	No	ID statement
STLBDS	No
SULABEL=	No	MATRIX statement
SULBDS	No
CONFIG	No
CONFDS=	No	IN= data set
NEQU=	No
CONSDS=	No
METVAL	No
METVDS	No	IN=
SEWGTS	No
SEWGDS=	No
SPLVAL	No
SLPVDS=	No
DIMENS=	Yes
METRIC=IDENTITY	Yes	COEF=IDENTITY
METRIC=DIAGONAL	Yes	COEF=DIAGONAL
METRIC=FULL	No
TRANSFRM=SCALE	Yes	LEVEL=RATIO
TRANSFRM=POWER	Yes	LEVEL=LOGINTERVAL
TRANSFRM=SPLINE	No
STVARNCE=	No
SUVARNCE=	No
NORMAL	No	Default (FIT=DISTANCE)
ITMAX=	Yes	MAXITER=
ITXMAX=	No
ITWMAX=	No
ITAMAX=	No
ITPMAX=	No
CONV=	(Yes)	Meaning is different
FACTOR=	No
HISTORY	No	PITER
ASYMP	No
OUTCON	No	OUT=
OUTDIS	No
OUTMET	No	OUT=
OUTSPL	No
OUTRES	(Yes)	OUTRES= data set

Displayed Output

Unless you specify the NOPHIST option, PROC MDS displays the iteration history containing

Iteration number
Type of iteration:
Initial	initial configuration
Monotone	monotone transformation
Gau-New	Gauss-Newton step
Lev-Mar	Levenberg-Marquardt step
Badness-of-Fit Criterion
Change in Criterion
Convergence Measures:

Monotone	the Euclidean norm of the change in the optimally scaled data divided by the Euclidean norm of the optimally scaled data, averaged across partitions
Gradient	the multiple correlation of the Jacobian matrix with the residual vector, uncorrected for the mean
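The Monotone convergence measure can be illustrated with a short Python sketch. It assumes that the denominator is the norm of the current optimally scaled data and that the average across partitions is a plain unweighted mean; neither detail is stated above, and the function names are hypothetical.

```python
import math

def relative_change(previous, current):
    """Norm of the change in the scaled data divided by the norm of the data.

    Assumption: the denominator uses the current values.
    """
    change = math.sqrt(sum((c - p) ** 2 for p, c in zip(previous, current)))
    size = math.sqrt(sum(c * c for c in current))
    return change / size

def monotone_measure(partitions):
    """Average the relative change over partitions.

    partitions: list of (previous, current) pairs, one pair per partition.
    Assumption: an unweighted mean across partitions.
    """
    values = [relative_change(p, c) for p, c in partitions]
    return sum(values) / len(values)

# Two hypothetical partitions: one still changing, one already stable.
history = [([3.0, 3.0], [3.0, 4.0]), ([6.0, 8.0], [6.0, 8.0])]
print(monotone_measure(history))
```

Because the change is divided by the size of the data, the measure does not depend on the units of the dissimilarities.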

Depending on what options are specified, PROC MDS may also display the following tables:

Data Matrix and possibly Weight Matrix for each subject
Eigenvalues from the computation of the initial coordinates
Sum of Data Weights and Pooled Data Matrix computed during initialization with INAV=DATA
Configuration, the estimated coordinates of the objects
Dimension Coefficients
A table of transformation parameters, including one or more of the following:
- Intercept
- Slope
- Power

A table of fit statistics for each matrix and possibly each row, including
- Number of Nonmissing Data
- Weight of the matrix or row, allowing for both observation weights and standardization factors
- Badness-of-Fit Criterion
- Distance Correlation computed between the distances and data with optimal transformation
- Uncorrected Distance Correlation not corrected for the mean
- Fit Correlation computed after applying the FIT= transformation to both distances and data
- Uncorrected Fit Correlation not corrected for the mean

ODS Table Names

PROC MDS assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, Using the Output Delivery System.

Table 43.3: ODS Tables Produced in PROC MDS
ODS Table Name	Description	Option
ConvergenceStatus	Convergence status	default
DimensionCoef	Dimension coefficients	PCOEF with COEF= other than IDENTITY
FitMeasures	Measures of fit	PFIT
IterHistory	Iteration history	default
PConfig	Estimated coordinates of the objects in the configuration	PCONFIG
PData	Data matrices	PDATA
PInAvData	Initial sum of weights and weighted average of data matrices with INAV=DATA	PINAVDATA
PInEigval	Initial eigenvalues	PINEIGVAL
PInEigvec	Initial eigenvectors	PINEIGVEC
PInWeight	Initialization weights	PINWEIGHT
Transformations	Transformation parameters	PTRANS with LEVEL=RATIO, INTERVAL, or LOGINTERVAL
