Details


Formulas

The following notation is used:

$A_p$: intercept for partition $p$

$B_p$: slope for partition $p$

$C_p$: power for partition $p$

$D_{rcs}$: distance computed from the model between objects $r$ and $c$ for subject $s$

$F_{rcs}$: data weight for objects $r$ and $c$ for subject $s$ obtained from the $c$th WEIGHT variable, or 1 if there is no WEIGHT statement

$f$: value of the FIT= option

$N$: number of objects

$O_{rcs}$: observed dissimilarity between objects $r$ and $c$ for subject $s$

$P_{rcs}$: partition index for objects $r$ and $c$ for subject $s$

$Q_{rcs}$: dissimilarity after applying any applicable estimated transformation for objects $r$ and $c$ for subject $s$

$R_{rcs}$: residual for objects $r$ and $c$ for subject $s$

$S_p$: standardization factor for partition $p$

$T_p(\cdot)$: estimated transformation for partition $p$

$V_{sd}$: coefficient for subject $s$ on dimension $d$

$X_{nd}$: coordinate for object $n$ on dimension $d$

Summations are taken over nonmissing values.

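Distances are computed from the model as

$$D_{rcs} = \sqrt{\sum_{d} (X_{rd} - X_{cd})^2}$$

for COEF=IDENTITY (Euclidean distance), and

$$D_{rcs} = \sqrt{\sum_{d} V_{sd}^2 \, (X_{rd} - X_{cd})^2}$$

for COEF=DIAGONAL (weighted Euclidean distance).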

Partition indexes are

$$P_{rcs} = 1$$ for CONDITION=UN

$$P_{rcs} = s$$ for CONDITION=MATRIX

$$P_{rcs} = (s - 1)N + r$$ for CONDITION=ROW

The estimated transformation for each partition is

$$T_p(d) = d$$ for LEVEL=ABSOLUTE

$$T_p(d) = B_p d$$ for LEVEL=RATIO

$$T_p(d) = A_p + B_p d$$ for LEVEL=INTERVAL

$$T_p(d) = B_p d^{C_p}$$ for LEVEL=LOGINTERVAL

For LEVEL=ORDINAL, $T_p(\cdot)$ is computed as a least-squares monotone transformation.

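For LEVEL=ABSOLUTE, RATIO, or INTERVAL, the residuals are computed as

$$Q_{rcs} = O_{rcs}, \qquad R_{rcs} = Q_{rcs}^{f} - \left[ T_p(D_{rcs}) \right]^{f}$$

For LEVEL=ORDINAL, the residuals are computed as

$$Q_{rcs} = T_p(O_{rcs}), \qquad R_{rcs} = Q_{rcs}^{f} - D_{rcs}^{f}$$

In both cases $p = P_{rcs}$. If $f$ is 0, then natural logarithms are used in place of the $f$th powers.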

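For each partition, let

$$U_p = \frac{\sum_{r,c,s} F_{rcs}}{\sum_{r,c,s \mid P_{rcs}=p} F_{rcs}}$$

and

$$\bar{Q}_p = \frac{\sum_{r,c,s \mid P_{rcs}=p} Q_{rcs} F_{rcs}}{\sum_{r,c,s \mid P_{rcs}=p} F_{rcs}}$$

Then the standardization factor for each partition is

$$S_p = 1$$ for FORMULA=0

$$S_p = U_p \sum_{r,c,s \mid P_{rcs}=p} Q_{rcs}^2 F_{rcs}$$ for FORMULA=1

$$S_p = U_p \sum_{r,c,s \mid P_{rcs}=p} (Q_{rcs} - \bar{Q}_p)^2 F_{rcs}$$ for FORMULA=2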

The badness-of-fit criterion that the MDS procedure tries to minimize is

$$\sqrt{\sum_{r,c,s} \frac{R_{rcs}^2 F_{rcs}}{S_{P_{rcs}}}}$$
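As a rough illustration of how these quantities combine, consider a hypothetical case with invented numbers: a single partition (CONDITION=UN), LEVEL=ABSOLUTE so that the data and distances are compared untransformed, unit data weights, $f = 1$ (assumed here to correspond to FIT=DISTANCE), and FORMULA=1. The sketch also assumes that the criterion is the square root of the weighted residual sum of squares divided by the standardization factor, and that with one partition the factor reduces to the weighted sum of squared data.

```latex
\begin{align*}
O &= (3,\ 4,\ 5), \qquad D = (3.2,\ 3.9,\ 5.1) \\
R &= O - D = (-0.2,\ 0.1,\ -0.1) \\
\sum R^2 F &= 0.04 + 0.01 + 0.01 = 0.06 \\
S &= \sum Q^2 F = 9 + 16 + 25 = 50 \\
\text{criterion} &= \sqrt{0.06 / 50} = \sqrt{0.0012} \approx 0.0346
\end{align*}
```

Under these assumptions, a criterion near 0.03 means the residuals are small relative to the size of the data.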

OUT= Data Set

The OUT= data set contains the following variables:

  • BY variables, if any

  • _ITER_ (if the OUTITER option is specified), a numeric variable containing the iteration number

  • _DIMENS_ , a numeric variable containing the number of dimensions

  • _MATRIX_ or the variable in the MATRIX statement, identifying the data matrix or subject to which the observation pertains. This variable contains a missing value for observations that pertain to the data set as a whole and not to a particular matrix, such as the coordinates (_TYPE_='CONFIG').

  • _TYPE_ , a character variable of length 10 identifying the type of information in the observation

    The values of _TYPE_ are as follows:

    CONFIG: the estimated coordinates of the configuration of objects

    DIAGCOEF: the estimated dimension coefficients for COEF=DIAGONAL

    INTERCEPT: the estimated intercept parameters

    SLOPE: the estimated slope parameters

    POWER: the estimated power parameters

    CRITERION: the badness-of-fit criterion

  • _LABEL_ or the variable in the ID statement, containing the variable label or value of the ID variable of the object to which the observation pertains. This variable contains a missing value for observations that do not pertain to a particular object or dimension.

  • _NAME_ , a character variable of length 8 containing the variable name of the object or dimension to which the observation pertains. This variable contains a missing value for observations that do not pertain to a particular object or dimension.

  • DIM1, ..., DIMm, where m is the maximum number of dimensions
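The following sketch shows one way to create an OUT= data set and list only the coordinate observations. The data set name DISSIM, the ID variable OBJECT, and the output name MDSOUT are hypothetical; the input is assumed to hold a single dissimilarity matrix with one row per object.

```sas
/* Hypothetical input: WORK.DISSIM holds one dissimilarity matrix,
   with the character variable OBJECT naming each row. */
proc mds data=dissim level=ordinal dimension=2 out=mdsout;
   id object;
run;

/* Keep only the coordinate observations (_TYPE_='CONFIG').
   Because an ID statement is used, OBJECT takes the place of _LABEL_. */
proc print data=mdsout;
   where _type_ = 'CONFIG';
   var object dim1 dim2;
run;
```

Observations with other _TYPE_ values, such as CRITERION, remain in MDSOUT and can be selected the same way.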

OUTFIT= Data Set

The OUTFIT= data set contains various measures of goodness and badness of fit. There is one observation for the entire sample plus one observation for each matrix. For the CONDITION=ROW option, there is also one observation for each row.

The OUTFIT= data set contains the following variables:

  • BY variables, if any

  • _ITER_ (if the OUTITER option is specified), a numeric variable containing the iteration number

  • _DIMENS_ , a numeric variable containing the number of dimensions

  • _MATRIX_ or the variable in the MATRIX statement, identifying the data matrix or subject to which the observation pertains

  • _LABEL_ or the variable in the ID statement, containing the variable label or value of the ID variable of the object to which the observation pertains when CONDITION=ROW

  • _NAME_ , a character variable of length 8 containing the variable name of the object or dimension to which the observation pertains when CONDITION=ROW

  • N , the number of nonmissing data

  • WEIGHT , the weight of the partition

  • CRITER , the badness-of-fit criterion

  • DISCORR , the correlation between the transformed data and the distances for LEVEL=ORDINAL or the correlation between the data and the transformed distances otherwise

  • UDISCORR , the correlation uncorrected for the mean between the transformed data and the distances for LEVEL=ORDINAL or the correlation between the data and the transformed distances otherwise

  • FITCORR , the correlation between the fit-transformed data and the fit-transformed distances

  • UFITCORR , the correlation uncorrected for the mean between the fit-transformed data and the fit-transformed distances
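As a minimal sketch of how the OUTFIT= variables might be inspected, the step below fits several matrices and prints the fit measures for each. The names JUDGE, SUBJECT, OBJECT, and FIT are assumptions made for the illustration; JUDGE is taken to stack one dissimilarity matrix per subject.

```sas
/* Hypothetical input: WORK.JUDGE stacks one dissimilarity matrix per
   subject; SUBJECT identifies the matrix and OBJECT names each row. */
proc mds data=judge level=ordinal condition=matrix outfit=fit;
   id object;
   matrix subject;
run;

/* SUBJECT replaces _MATRIX_ because a MATRIX statement is used.
   The observation for the entire sample has a missing SUBJECT value. */
proc print data=fit;
   var subject n weight criter discorr fitcorr;
run;
```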

OUTRES= Data Set

The OUTRES= data set has one observation for each nonmissing datum. It contains the following variables:

  • BY variables, if any

  • _ITER_ (if the OUTITER option is specified), a numeric variable containing the iteration number

  • _DIMENS_ , a numeric variable containing the number of dimensions

  • _MATRIX_ or the variable in the MATRIX statement, identifying the data matrix or subject to which the observation pertains

  • _ROW_ , containing the variable label or value of the ID variable of the row to which the observation pertains

  • _COL_ , containing the variable label or value of the ID variable of the column to which the observation pertains

  • DATA , the original datum

  • TRANDATA , the optimally transformed datum when LEVEL=ORDINAL

  • DISTANCE , the distance computed from the PROC MDS model

  • TRANSDIST , the optimally transformed distance when the LEVEL= option is not ORDINAL or ABSOLUTE

  • FITDATA , the datum further transformed according to the FIT= option

  • FITDIST , the distance further transformed according to the FIT= option

  • WEIGHT , the combined weight of the datum based on the WEIGHT variable(s), if any, and the standardization specified by the FORMULA= option

  • RESIDUAL , FITDATA minus FITDIST

To cause a datum to appear in the OUTRES= data set, yet be ignored in fitting the model, give the datum a nonmissing value but a 0 weight (see the WEIGHT statement).
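The zero-weight technique can be sketched as follows. Everything named here is hypothetical: a single 5-by-5 matrix in WORK.DISSIM with data variables D1 through D5, and weight variables W1 through W5 created for the illustration. The choice of row 4, column 2 is arbitrary.

```sas
/* Build one weight variable per data variable, all equal to 1,
   then set the weight for the datum in row 4, column 2 to 0. */
data dissim_w;
   set dissim;
   array w{5} w1-w5;
   do j = 1 to 5;
      w{j} = 1;
   end;
   if _n_ = 4 then w{2} = 0;
   drop j;
run;

/* The WEIGHT variables are listed in the same order as the VAR variables. */
proc mds data=dissim_w level=ordinal outres=res;
   var d1-d5;
   weight w1-w5;
run;
```

The datum with weight 0 should still appear in RES with its residual, which makes this a convenient way to check how well held-out data are predicted.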

INITIAL= Data Set

The INITIAL= data set has the same structure as the OUT= data set but is not required to have all of the variables or observations that appear in the OUT= data set. You can use an OUT= data set previously created by PROC MDS (without the OUTITER option) as an INITIAL= data set in a subsequent invocation of the procedure.

The only variables that are required are DIM1, ..., DIMm (where m is the maximum number of dimensions) or equivalent variables specified in the INVAR statement. If these are the only variables, then all the observations are assumed to contain coordinates of the configuration; you cannot read dimension coefficients or transformation parameters.

To read initial values for the dimension coefficients or transformation parameters, the INITIAL= data set must contain the _TYPE_ variable and either the variable specified in the ID statement or, if no ID statement is used, the variable _NAME_ . In addition, if there is more than one data matrix, either the variable specified in the MATRIX statement or, if no MATRIX statement is used, the variable _MATRIX_ or _MATNUM_ is required.

If the INITIAL= data set contains the variable _DIMENS_ , initial values are obtained from observations with the corresponding number of dimensions. If there is no _DIMENS_ variable, the same observations are used for each number of dimensions analyzed. If you want PROC MDS to read initial values from some but not all of the observations in the INITIAL= data set, use the WHERE= data set option to select the desired observations.
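A possible two-step use of the INITIAL= data set is sketched below. The data set names DISSIM and START and the ID variable OBJECT are hypothetical. The second step reads only the coordinate observations from the earlier OUT= data set, so any transformation parameters stored there are not used as starting values.

```sas
/* First run: save the final estimates (no OUTITER option). */
proc mds data=dissim level=ordinal dimension=2 out=start;
   id object;
run;

/* Second run: start from the saved coordinates only, selected
   with the WHERE= data set option. */
proc mds data=dissim level=interval dimension=2
         initial=start(where=(_type_='CONFIG'));
   id object;
run;
```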

Missing Values

Missing data in the similarity or dissimilarity matrices are ignored in fitting the model and are omitted from the OUTRES= data set. Any matrix that is completely missing is omitted from the analysis.

Missing weights are treated as 0.

Missing values are also allowed in the INITIAL= data set, but a large number of missing values may yield a degenerate initial configuration.

Normalization of the Estimates

In multidimensional scaling models, the parameter estimates are not uniquely determined; the estimates can be transformed in various ways without changing their badness of fit. The initial and final estimates from PROC MDS are, therefore, normalized (unless you specify the NONORM option) to make it easier to compare results from different analyses.

The configuration always has a mean of 0 for each dimension.

With the COEF=IDENTITY option, the configuration is rotated to a principal-axis orientation. Unless you specify the LEVEL=ABSOLUTE option, the entire configuration is scaled so that the root-mean-square element is 1, and the transformations are adjusted to compensate.

With the COEF=DIAGONAL option, each dimension is scaled to a root-mean-square value of 1, and the dimension coefficients are adjusted to compensate. Unless you specify the LEVEL=ABSOLUTE option, the dimension coefficients are normalized as follows. If you specify the CONDITION=UN option, all of the dimension coefficients are scaled to a root-mean-square value of 1. For other values of the CONDITION= option, the dimension coefficients are scaled separately for each subject to a root-mean-square value of 1. In either case, the transformations are adjusted to compensate.

Each dimension is reflected to give a positive rank correlation with the order of the objects in the data set.

For the LEVEL=ORDINAL option, if the intercept, slope, or power parameters are fitted, the transformed data are normalized to eliminate these parameters if possible.

Comparison with the ALSCAL Procedure

The MDS procedure generally produces results similar to those from the ALSCAL procedure (Young, Lewyckyj, and Takane 1986; Young 1982) if you use the following options in PROC MDS:

  • FIT=SQUARED

  • FORMULA=1 except for unfolding data, which require FORMULA=2

  • PFINAL to get output similar to that from PROC ALSCAL

Unlike PROC ALSCAL, PROC MDS produces no plots, so you must use output data sets and PROC PLOT or PROC GPLOT.
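The options listed above and the plotting step can be combined roughly as follows. The data set DISSIM, the ID variable OBJECT, and the output name MDSOUT are hypothetical, and the example assumes data that are not unfolding data, so FORMULA=1 applies.

```sas
/* ALSCAL-like settings, with the configuration saved for plotting. */
proc mds data=dissim fit=squared formula=1 pfinal dimension=2 out=mdsout;
   id object;
run;

/* Plot the second dimension against the first, labeling each point
   with the value of the ID variable. */
proc plot data=mdsout;
   where _type_ = 'CONFIG';
   plot dim2*dim1 $ object;
run;
```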

The MDS and ALSCAL procedures may sometimes produce different results for the following reasons:

  • With the LEVEL=INTERVAL option, PROC MDS fits a regression model while PROC ALSCAL fits a measurement model. These models are not equivalent if there is more than one partition, although the differences in the parameter estimates are usually minor.

  • PROC MDS and PROC ALSCAL use different algorithms for initialization and optimization. Hence, different local optima may be found by PROC MDS and PROC ALSCAL for some data sets with poor fit. Using the INAV=SSCP option causes the initial estimates from PROC MDS to be more like those from PROC ALSCAL.

  • The default convergence criteria in PROC MDS are more strict than those in PROC ALSCAL. The convergence measure in PROC ALSCAL may cause PROC ALSCAL to stop iterating because progress is slow rather than because a local optimum has been reached. Even if you run PROC ALSCAL with a very small convergence criterion and a very large iteration limit, PROC ALSCAL may never achieve the same degree of precision as PROC MDS. For most applications, this problem is of no practical consequence since two- or three-digit precision is sufficient. If the model does not fit well, obtaining higher precision may require hundreds of iterations.

PROC MDS accepts some PROC ALSCAL options as synonyms for the preceding options, as displayed in Table 43.1.

Table 43.1: PROC MDS Options Compared to PROC ALSCAL Options

| PROC ALSCAL Option | Accepted by PROC MDS? | Related PROC MDS Option or Comments |
|---|---|---|
| CONDITION= | Yes | |
| CONVERGE= | Yes | Convergence measures are not comparable |
| CUTOFF= | Yes | |
| DATA= | Yes | |
| DEGREE= | No | |
| DIMENS= | Yes | |
| DIRECTIONS= | No | |
| HEADER | Yes | Default in PROC MDS |
| IN= | Yes | |
| ITER= | Yes | MAXITER= |
| LEVEL= | Yes | LEVEL=NOMINAL is not supported |
| MAXDIM= m | Yes | DIMENSION= n TO m |
| MINDIM= n | Yes | DIMENSION= n TO m |
| MINSTRESS= | Yes | MINCRIT= |
| MODEL=EUCLID | Yes | COEF=IDENTITY |
| MODEL=INDSCAL | Yes | COEF=DIAGONAL |
| MODEL=GEMSCAL | No | |
| MODEL=ASYMSCAL | No | |
| MODEL=ASYMINDS | No | |
| NEGATIVE | (Yes) | In PROC MDS, the NEGATIVE option affects slopes and powers, not subject weights. |
| NOULB | Yes | |
| OUT= | Yes | Some differences in contents |
| PLOT | No | |
| PLOTALL | No | |
| PRINT | No | |
| READV, etc. | No | Use WHERE data set option |
| READFIXV, etc. | No | |
| ROWS= | No | |
| SHAPE=SYMMETRI | Yes | SHAPE=TRIANGLE |
| SHAPE=ASYMMETR | Yes | SHAPE=SQUARE |
| SHAPE=RECTANGU | No | Use SHAPE=TRIANGLE with extra missing values to fill out the matrix. |
| SIMILAR | Yes | |
| TIESTORE= | Yes | Ignored by PROC MDS |
| UNTIE | Yes | |

Comparison with the MLSCALE Procedure

Running the MDS procedure with the options

  proc mds fit=log level=loginterval ... ;  

generally produces results similar to using the MLSCALE procedure (Ramsay 1986) with the options

  proc mlscale stvarnce=constant suvarnce=constant ... ;  

Alternatively, using the FIT=DISTANCE option in the PROC MDS statement produces results similar to specifying the NORMAL option in the PROC MLSCALE statement.

The MDS procedure uses the least-squares method of estimation. The least-squares method is equivalent to the maximum-likelihood method if the error terms are assumed to be independent and identically distributed normal random variables. Unlike PROC MLSCALE, PROC MDS does not provide any options for unequal error variances.

PROC MDS accepts some PROC MLSCALE options as synonyms for the options described previously, as displayed in Table 43.2.

Table 43.2: PROC MDS Options Compared to PROC MLSCALE Options

| PROC MLSCALE Option | Accepted by PROC MDS? | Related PROC MDS Option or Comments |
|---|---|---|
| SQUARE | Yes | SHAPE=SQUARE |
| INPUT=MATRIX | No | Default |
| INPUT=VECTOR | No | |
| STLABEL= | No | ID statement |
| STLBDS | No | |
| SULABEL= | No | MATRIX statement |
| SULBDS | No | |
| CONFIG | No | |
| CONFDS= | No | IN= data set |
| NEQU= | No | |
| CONSDS= | No | |
| METVAL | No | |
| METVDS | No | IN= |
| SEWGTS | No | |
| SEWGDS= | No | |
| SPLVAL | No | |
| SLPVDS= | No | |
| DIMENS= | Yes | |
| METRIC=IDENTITY | Yes | COEF=IDENTITY |
| METRIC=DIAGONAL | Yes | COEF=DIAGONAL |
| METRIC=FULL | No | |
| TRANSFRM=SCALE | Yes | LEVEL=RATIO |
| TRANSFRM=POWER | Yes | LEVEL=LOGINTERVAL |
| TRANSFRM=SPLINE | No | |
| STVARNCE= | No | |
| SUVARNCE= | No | |
| NORMAL | No | Default (FIT=DISTANCE) |
| ITMAX= | Yes | MAXITER= |
| ITXMAX= | No | |
| ITWMAX= | No | |
| ITAMAX= | No | |
| ITPMAX= | No | |
| CONV= | (Yes) | Meaning is different |
| FACTOR= | No | |
| HISTORY | No | PITER |
| ASYMP | No | |
| OUTCON | No | OUT= |
| OUTDIS | No | |
| OUTMET | No | OUT= |
| OUTSPL | No | |
| OUTRES | (Yes) | OUTRES= data set |

Displayed Output

Unless you specify the NOPHIST option, PROC MDS displays the iteration history containing

  • Iteration number

  • Type of iteration:

    Initial: initial configuration

    Monotone: monotone transformation

    Gau-New: Gauss-Newton step

    Lev-Mar: Levenberg-Marquardt step

  • Badness-of-Fit Criterion

  • Change in Criterion

  • Convergence Measures:

    Monotone: the Euclidean norm of the change in the optimally scaled data divided by the Euclidean norm of the optimally scaled data, averaged across partitions

    Gradient: the multiple correlation of the Jacobian matrix with the residual vector, uncorrected for the mean

Depending on what options are specified, PROC MDS may also display the following tables:

  • Data Matrix and possibly Weight Matrix for each subject

  • Eigenvalues from the computation of the initial coordinates

  • Sum of Data Weights and Pooled Data Matrix computed during initialization with INAV=DATA

  • Configuration, the estimated coordinates of the objects

  • Dimension Coefficients

  • A table of transformation parameters, including one or more of the following:

    • Intercept

    • Slope

    • Power

  • A table of fit statistics for each matrix and possibly each row, including

    • Number of Nonmissing Data

    • Weight of the matrix or row, allowing for both observation weights and standardization factors

    • Badness-of-Fit Criterion

    • Distance Correlation computed between the distances and data with optimal transformation

    • Uncorrected Distance Correlation not corrected for the mean

    • Fit Correlation computed after applying the FIT= transformation to both distances and data

    • Uncorrected Fit Correlation not corrected for the mean

ODS Table Names

PROC MDS assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, Using the Output Delivery System.

Table 43.3: ODS Tables Produced in PROC MDS

| ODS Table Name | Description | Option |
|---|---|---|
| ConvergenceStatus | Convergence status | default |
| DimensionCoef | Dimension coefficients | PCOEF w/COEF= not IDENTITY |
| FitMeasures | Measures of fit | PFIT |
| IterHistory | Iteration history | default |
| PConfig | Estimated coordinates of the objects in the configuration | PCONFIG |
| PData | Data matrices | PDATA |
| PInAvData | Initial sum of weights and weighted average of data matrices with INAV=DATA | PINAVDATA |
| PInEigval | Initial eigenvalues | PINEIGVAL |
| PInEigvec | Initial eigenvectors | PINEIGVEC |
| PInWeight | Initialization weights | PINWEIGHT |
| Transformations | Transformation parameters | PTRANS w/LEVEL=RATIO, INTERVAL, LOGINTERVAL |
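The table names can be used with the ODS OUTPUT statement to capture displayed tables as data sets. The sketch below is illustrative only: DISSIM, OBJECT, FITSTATS, and HISTORY are hypothetical names, and PFIT is specified because the FitMeasures table is listed with that option.

```sas
/* Route two of the ODS tables to data sets. */
ods output FitMeasures=fitstats IterHistory=history;

proc mds data=dissim level=ordinal pfit;
   id object;
run;

/* Inspect the captured iteration history. */
proc print data=history;
run;
```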




SAS.STAT 9.1 Users Guide (Vol. 4)
Year: 2004