Syntax


You can specify the following statements with the MDS procedure:

  • PROC MDS < options > ;

    • VAR variables ;

    • INVAR variables ;

    • ID | OBJECT variable ;

    • MATRIX | SUBJECT variable ;

    • WEIGHT variables ;

    • BY variables ;

The PROC MDS statement is required. All other statements are optional.

PROC MDS Statement

  • PROC MDS < options > ;

By default, the only result produced by the MDS procedure is the iteration history. Hence, you should always specify one or more options for output data sets (OUT=, OUTFIT=, and OUTRES=) or displayed output (such as PFINAL). PROC MDS does not produce any plots; to produce plots, use the output data sets with PROC PLOT or PROC GPLOT.

The types of estimates written to the OUT= data set are determined by the OCONFIG, OCOEF, OTRANS, and OCRIT options. If you do not specify any of these four options, the estimates of all the parameters of the PROC MDS model and the value of the badness-of-fit criterion appear in the OUT= data set. If you specify one or more of these options, only the information requested by the specified options appears in the OUT= data set. Also, the OITER option causes these statistics to be written to the OUT= data set after initialization and on each iteration, as well as after the iterations have terminated.
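A minimal sketch of this advice: request an OUT= data set and displayed final results, then plot the coordinates with PROC PLOT. The data set names (dist, mdsout), the object labels, and the distances are made up for illustration (they are the distances between the corners of a 3-by-4 rectangle). The coordinate variable names Dim1 and Dim2 and the value _TYPE_='CONFIG' for coordinate observations are assumptions about the OUT= data set that this section does not state; inspect the data set with PROC PRINT before relying on them.

```sas
/* Hypothetical dissimilarities among four objects (illustrative values only) */
data dist;
   input object $ a b c d;
   datalines;
A 0 3 5 4
B 3 0 4 5
C 5 4 0 3
D 4 5 3 0
;

/* OUT= saves the estimates; PFINAL displays the final estimates */
proc mds data=dist level=absolute out=mdsout pfinal;
   id object;
   var a b c d;
run;

/* Assumed OUT= layout: coordinates in Dim1 and Dim2 on rows with _TYPE_='CONFIG' */
proc plot data=mdsout;
   plot dim2*dim1 $ object;
   where _type_='CONFIG';
run;
```

Adding OCONFIG to the PROC MDS statement would restrict the OUT= data set to the object coordinates; adding OCRIT as well would also keep the badness-of-fit criterion.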

Displayed output is controlled by the interaction of the PCONFIG, PCOEF, PTRANS, PFIT, and PFITROW options with the PININ, PINIT, PITER, and PFINAL options. The PCONFIG, PCOEF, PTRANS, PFIT, and PFITROW options specify which estimates and fit statistics are to be displayed. The PININ, PINIT, PITER, and PFINAL options specify when the estimates and fit statistics are to be displayed. If you specify at least one of the PCONFIG, PCOEF, PTRANS, PFIT and PFITROW options but none of the PININ, PINIT, PITER, and PFINAL options, the final results (PFINAL) are displayed. If you specify at least one of the PININ, PINIT, PITER, and PFINAL options but none of the PCONFIG, PCOEF, PTRANS, PFIT and PFITROW options, all estimates (PCONFIG, PCOEF, PTRANS) and the fit statistics for each matrix and for the entire sample (PFIT) are displayed. If you do not specify any of these nine options, no estimates or fit statistics are displayed (except the badness-of-fit criterion in the iteration history).
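As a sketch of how the "what" and "when" options combine, the following call asks for the coordinates and fit statistics (PCONFIG, PFIT) at two points: after initialization (PINIT) and at the end (PFINAL). The data set and its values are hypothetical.

```sas
/* Hypothetical dissimilarities among four objects (illustrative values only) */
data dist;
   input object $ a b c d;
   datalines;
A 0 3 5 4
B 3 0 4 5
C 5 4 0 3
D 4 5 3 0
;

/* PCONFIG and PFIT choose what is displayed; PINIT and PFINAL choose when */
proc mds data=dist level=ratio pconfig pfit pinit pfinal;
   id object;
   var a b c d;
run;
```

Dropping PINIT and PFINAL from this call should leave only the final results displayed, because PFINAL is implied when only "what" options are given.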

ALTERNATE | ALT=NONE | NO | N

ALTERNATE | ALT=MATRIX | MAT | M | SUBJECT | SUB | S

ALTERNATE | ALT=ROW | R < =n >

  • determines what form of alternating-least-squares algorithm is used. The default depends on the amount of memory available. The following ALTERNATE= options are listed in order of decreasing memory requirements:

    ALT=NONE

    causes all parameters to be adjusted simultaneously on each iteration. This option is usually best for a small number of subjects and objects.

    ALT=MATRIX

    adjusts all the parameters for the first subject, then all the parameters for the second subject, and so on, and finally adjusts all parameters that do not correspond to a subject, such as coordinates and unconditional transformations. This option usually works best for a large number of subjects with a small number of objects.

    ALT=ROW

    treats subject parameters the same way as the ALTERNATE=MATRIX option but also includes separate stages for unconditional parameters and for subsets of the objects. The ALT=ROW option usually works best for a large number of objects.

    Specifying ALT=ROW=n divides the objects into subsets of n objects each, except possibly for one subset when n does not divide the number of objects evenly. If you omit =n, the number of objects in the subsets is determined from the amount of memory available. The smaller the value of n, the less memory is required.

  • When you specify the LEVEL=ORDINAL option, the monotone transformation is always computed in a separate stage and is listed as a separate iteration in the iteration history. In this case, estimation is done by iteratively reweighted least squares. The weights are recomputed according to the FORMULA= option on each monotone iteration; hence, it is possible for the badness-of-fit criterion to increase after a monotone iteration.

COEF=IDENTITY | IDEN | I

COEF=DIAGONAL | DIAG | D

  • specifies the type of matrix for the dimension coefficients.

    COEF=IDENTITY

    is the default, which yields Euclidean distances.

    COEF=DIAGONAL

    produces weighted Euclidean distances, in which each subject is allowed differential weights for the dimensions. The dimension coefficients that PROC MDS outputs are related to the square roots of what are called subject weights in PROC ALSCAL; the normalization in PROC MDS also differs from that in PROC ALSCAL. The weighted Euclidean model is related to the INDSCAL model (Carroll and Chang 1970).
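A hedged sketch of a weighted Euclidean analysis with more than one subject follows. The data set judged, the variable judge, and all values are invented for illustration: each hypothetical judge contributes one 4-by-4 matrix, and the matrices are stacked with the objects in the same order.

```sas
/* Two hypothetical judges rating the same four objects (made-up values) */
data judged;
   input judge $ object $ a b c d;
   datalines;
J1 A 0   3   5   4
J1 B 3   0   4   5
J1 C 5   4   0   3
J1 D 4   5   3   0
J2 A 0   6   7.2 4
J2 B 6   0   4   7.2
J2 C 7.2 4   0   6
J2 D 4   7.2 6   0
;

/* COEF=DIAGONAL lets each judge weight the dimensions differently;
   PCOEF displays the estimated dimension coefficients */
proc mds data=judged coef=diagonal level=ratio pcoef pconfig pfinal;
   matrix judge;
   id object;
   var a b c d;
run;
```

The MATRIX statement here only supplies labels for the two matrices; the matrix boundaries follow from the number of VAR variables.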

CONDITION | COND=UN | U

CONDITION | COND=MATRIX | MAT | M | SUBJECT | SUB | S

CONDITION | COND=ROW | R

  • specifies the conditionality of the data (Young 1987, pp. 60–63). The default is CONDITION=MATRIX.

  • The data are divided into disjoint subsets called partitions. Within each partition, a separate transformation is applied, as specified by the LEVEL= option.

    COND=UN

    puts all the data into a single partition.

    COND=MATRIX

    makes each data matrix a partition.

    COND=ROW

    makes each row of each data matrix a partition.

  • The CONDITION= option also determines the default value for the SHAPE= option. If you specify the CONDITION=ROW option and omit the SHAPE= option, each data matrix is stored as a square and possibly asymmetric matrix. If you specify the CONDITION=UN or CONDITION=MATRIX option and omit the SHAPE= option, only one triangle is stored. See the SHAPE= option for details.

CONVERGE | CONV=p

  • sets both the gradient convergence criterion and the monotone convergence criterion to p, where 0 ≤ p ≤ 1. The default is CONVERGE=0.01; smaller values may greatly increase the number of iterations required. Values less than 0.0001 may be impossible to satisfy because of the limits of machine precision. See the GCONVERGE= and MCONVERGE= options.

CUTOFF= n

  • causes data less than n to be replaced by missing values. The default is CUTOFF=0.

DATA= SAS-data-set

  • specifies the SAS data set containing one or more square matrices to be analyzed. In typical psychometric data, each matrix contains judgments from one subject, so there is a one-to-one correspondence between data matrices and subjects.

  • The data matrices contain similarity or dissimilarity measurements to be modeled and, optionally, weights for these data. The data are generally assumed to be dissimilarities unless you use the SIMILAR option. However, if there are nonmissing diagonal values and these values are predominantly larger than the off-diagonal values, the data are assumed to be similarities and are treated as if the SIMILAR option is specified. The diagonal elements are not otherwise used in fitting the model.

  • Each matrix must have exactly the same number of observations as the number of variables specified by the VAR statement or determined by defaults. This number is the number of objects or stimuli.

  • The first observation and variable are assumed to contain data for the first object, the second observation and variable are assumed to contain data for the second object, and so on.

  • When there are two or more matrices, the observations in each matrix must correspond to the same objects in the same order as in the first matrix.

  • The matrices can be symmetric or asymmetric, as specified by the SHAPE= option.

DECIMALS | DEC=n

  • specifies how many decimal places to use when displaying the parameter estimates and fit statistics. The default is DECIMALS=2, which is generally reasonable except in conjunction with the LEVEL=ABSOLUTE option and very large or very small data.

DIMENSION | DIMENS | DIM=n < TO m < BY=i >>

  • specifies the number of dimensions to use in the MDS model, where 1 ≤ n, m < number of objects. The parameter i can be either positive or negative but not zero. If you specify different values for n and m, a separate model is fitted for each requested dimension. If you specify only DIMENSION=n, then only n dimensions are fitted. The default is DIMENSION=2 if there are three or more objects; otherwise, DIMENSION=1 is the only valid specification. The analyses for each number of dimensions are done independently. For information on choosing the dimensionality, refer to Kruskal and Wish (1978, pp. 48–60).
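To illustrate fitting several dimensionalities in one call, the sketch below requests one-, two-, and three-dimensional models for four hypothetical objects (so every requested value is less than the number of objects) and collects the fit measures in an OUTFIT= data set. The data set names and values are invented.

```sas
/* Hypothetical dissimilarities among four objects (illustrative values only) */
data dist;
   input object $ a b c d;
   datalines;
A 0 3 5 4
B 3 0 4 5
C 5 4 0 3
D 4 5 3 0
;

/* A separate, independent model is fitted for each of 1, 2, and 3 dimensions */
proc mds data=dist level=ratio dimension=1 to 3 outfit=fit nophist;
   id object;
   var a b c d;
run;

/* Compare the fit measures across dimensionalities */
proc print data=fit;
run;
```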

EPSILON | EPS=n

  • specifies a number n, 0 < n < 1, that determines the amount added to squared distances computed from the model to avoid numerical problems such as division by 0. This amount is computed as n times the mean squared distance in the initial configuration. The distance in the MDS model is thus computed as

    $\mathit{dist} = \sqrt{\mathit{sqdist} + \mathit{amount}}$

  • where sqdist is the squared Euclidean distance or the weighted squared Euclidean distance.

  • The default is EPSILON=1E-12, which is small enough to have no practical effect on the estimates unless the FIT= value is nonpositive and there are dissimilarities that are very close to 0. Hence, when the FIT= value is nonpositive, dissimilarities less than n times 100 times the maximum dissimilarity are disallowed.

FIT=DISTANCE | DIS | D

FIT=SQUARED | SQU | S

FIT=LOG | L

FIT=n

  • specifies a predetermined (not estimated) transformation to apply to both sides of the MDS model before the error term is added.

  • The default is FIT=DISTANCE or, equivalently, FIT=1, which fits data to distances.

  • The option FIT=SQUARED or FIT=2 fits squared data to squared distances. This gives greater importance to large data and distances and lesser importance to small data and distances in fitting the model.

  • The FIT=LOG or FIT=0 option fits log data to log distances. This gives lesser importance to large data and distances and greater importance to small data and distances in fitting the model.

  • In general, the FIT=n option fits nth-power data to nth-power distances. Values of n that are large in absolute value can cause floating-point overflows.

  • If the FIT= value is 0 or negative, the data must be strictly positive (see the EPSILON= option). Negative data may produce strange results with any value other than FIT=1.

FORMULA | FOR=0 | OLS | O

FORMULA | FOR=1 | USS | U

FORMULA | FOR=2 | CSS | C

  • determines how the badness-of-fit criterion is standardized in correspondence with stress formulas 1 and 2 (Kruskal and Wish 1978, pp. 24–26). The default is FORMULA=1 unless you specify FIT=LOG, in which case the default is FORMULA=2. Data partitions are defined by the CONDITION= option.

    FORMULA=0

    fits a regression model by ordinary least squares (Null and Sarle 1982) without standardizing the partitions; this option cannot be used with the LEVEL=ORDINAL option. The badness-of-fit criterion is the square root of the error sum of squares.

    FORMULA=1

    standardizes each partition by the uncorrected sum of squares of the (possibly transformed) data; this option should not be used with the FIT=LOG option. With the FIT=DISTANCE and LEVEL=ORDINAL options, this is equivalent to Kruskal's stress formula 1 or an obvious generalization thereof. With the FIT=SQUARED and LEVEL=ORDINAL options, this is equivalent to Young's s-stress formula 1 or an obvious generalization thereof. The badness-of-fit criterion is analogous to $\sqrt{1 - R^2}$, where R is a multiple correlation about the origin.

    FORMULA=2

    standardizes each partition by the corrected sum of squares of the (possibly transformed) data; this option is the recommended method for unfolding. With the FIT=DISTANCE and LEVEL=ORDINAL options, this is equivalent to Kruskal's stress formula 2 or an obvious generalization thereof. With the FIT=SQUARED and LEVEL=ORDINAL options, this is equivalent to Young's s-stress formula 2 or an obvious generalization thereof. The badness-of-fit criterion is analogous to $\sqrt{1 - R^2}$, where R is a multiple correlation computed with a denominator corrected for the mean.

GCONVERGE | GCONV=p

  • sets the gradient convergence criterion to p, where 0 ≤ p ≤ 1. The default is GCONVERGE=0.01; smaller values may greatly increase the number of iterations required. Values less than 0.0001 may be impossible to satisfy because of the limits of machine precision.

  • The gradient convergence measure is the multiple correlation of the Jacobian matrix with the residual vector, uncorrected for the mean. See the CONVERGE= and MCONVERGE= options.

INAV=DATA | D

INAV=SSCP | S

  • affects the computation of initial coordinates. The default is INAV=DATA.

    INAV=DATA

    computes a weighted average of the data matrices. The value of an element is estimated only if that element is missing from every data matrix. The weighted average of the data matrices with missing values filled in is then converted to a scalar products matrix (or what would be a scalar products matrix if the fit were perfect), from which the initial coordinates are computed.

    INAV=SSCP

    estimates missing values in each data matrix and converts each data matrix to a scalar products matrix. The initial coordinates are computed from the unweighted average of the scalar products matrices.

INITIAL | IN=SAS-data-set

  • specifies a SAS data set containing initial values for some or all of the parameters of the MDS model. If the INITIAL= option is omitted, the initial values are computed from the data.

LEVEL=ABSOLUTE | ABS | A

LEVEL=RATIO | RAT | R

LEVEL=INTERVAL | INT | I

LEVEL=LOGINTERVAL | LOG | L

LEVEL=ORDINAL | ORD | O

  • specifies the measurement level of the data and hence the type of estimated (optimal) transformations applied to the data or distances (Young 1987, pp. 57–60; Krantz et al. 1971, pp. 9–12) within each partition as specified by the CONDITION= option. LEVEL=ORDINAL specifies a nonmetric analysis, while all other LEVEL= options specify metric analyses. The default is LEVEL=ORDINAL.

    LEVEL=ABSOLUTE

    allows no optimal transformations. Hence, the distinction between regression and measurement models is irrelevant.

    LEVEL=RATIO

    fits a regression model in which the distances are multiplied by a slope parameter in each partition (a linear transformation). In this case, the regression model is equivalent to the measurement model with the slope parameter reciprocated.

    LEVEL=INTERVAL

    fits a regression model in which the distances are multiplied by a slope parameter and added to an intercept parameter in each partition (an affine transformation). In this case, the regression and measurement models differ if there is more than one partition.

    LEVEL=LOGINTERVAL

    fits a regression model in which the distances are raised to a power and multiplied by a slope parameter in each partition (a power transformation).

    LEVEL=ORDINAL

    fits a measurement model in which a least-squares monotone increasing transformation is applied to the data in each partition. At the ordinal measurement level, the regression and measurement models differ.

MAXITER | ITER=n

  • specifies the maximum number of iterations, where n ≥ 0. The default is MAXITER=100.

MCONVERGE | MCONV=p

  • sets the monotone convergence criterion to p, where 0 ≤ p ≤ 1, for use with the LEVEL=ORDINAL option. The default is MCONVERGE=0.01; if you want greater precision, MCONVERGE=0.001 is usually reasonable, but smaller values may greatly increase the number of iterations required.

  • The monotone convergence criterion is the Euclidean norm of the change in the optimally scaled data divided by the Euclidean norm of the optimally scaled data, averaged across partitions defined by the CONDITION= option. See the CONVERGE= and GCONVERGE= options.

MINCRIT | CRITMIN=n

  • causes iteration to terminate when the badness-of-fit criterion is less than or equal to n, where n ≥ 0. The default is MINCRIT=1E-6.
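The stopping rules can be combined in one call. The sketch below, on hypothetical data, tightens both convergence criteria for a nonmetric analysis and raises the iteration limit to allow for the extra iterations that tighter criteria may need; the particular values are illustrative choices, not recommendations from this section beyond the note that MCONVERGE=0.001 is usually reasonable.

```sas
/* Hypothetical dissimilarities among four objects (illustrative values only) */
data dist;
   input object $ a b c d;
   datalines;
A 0 3 5 4
B 3 0 4 5
C 5 4 0 3
D 4 5 3 0
;

/* GCONVERGE= and MCONVERGE= are set separately here; CONVERGE=0.001 would set both.
   Iteration also stops once the criterion falls to MINCRIT= or MAXITER= is reached. */
proc mds data=dist level=ordinal
         gconverge=0.001 mconverge=0.001 maxiter=200 mincrit=1e-6
         pfinal;
   id object;
   var a b c d;
run;
```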

NEGATIVE

  • allows slopes or powers to be negative with the LEVEL=RATIO, INTERVAL, or LOGINTERVAL option.

NONORM

  • suppresses normalization of the initial and final estimates.

NOPHIST | NOPRINT | NOP

  • suppresses the output of the iteration history.

NOULB

  • causes missing data to be estimated during initialization by the average nonmissing value, where the average is computed according to the FIT= option. Otherwise, missing data are estimated by interpolating between the Rabinowitz (1976) upper and lower bounds.

OCOEF

  • writes the dimension coefficients to the OUT= data set. See the OUT= option for interactions with other options.

OCONFIG

  • writes the coordinates of the objects to the OUT= data set. See the OUT= option for interactions with other options.

OCRIT

  • writes the badness-of-fit criterion to the OUT= data set. See the OUT= option for interactions with other options.

OITER | OUTITER

  • writes current values to the output data sets after initialization and on every iteration. Otherwise, only the final values are written to any output data sets. See the OUT=, OUTFIT=, and OUTRES= options.

OTRANS

  • writes the transformation parameter estimates to the OUT= data set if any such estimates are computed. There are no transformation parameters with the LEVEL=ORDINAL option. See the OUT= option for interactions with other options.

OUT= SAS-data-set

  • creates a SAS data set containing, by default, the estimates of all the parameters of the PROC MDS model and the value of the badness-of-fit criterion. However, if you specify one or more of the OCONFIG, OCOEF, OTRANS, and OCRIT options, only the information requested by the specified options appears in the OUT= data set. See also the OITER option.

OUTFIT= SAS-data-set

  • creates a SAS data set containing goodness-of-fit and badness-of-fit measures for each partition as well as for the entire data set. See also the OITER option.

OUTRES= SAS-data-set

  • creates a SAS data set containing one observation for each nonmissing datum from the DATA= data set. Each observation contains the original datum, the estimated distance computed from the MDS model, transformed data and distances, and the residual. See also the OITER option.

OVER= n

  • specifies the maximum overrelaxation factor, where n ≥ 1. Values between 1 and 2 are generally reasonable. The default is OVER=2 with the LEVEL=ORDINAL, ALTERNATE=MATRIX, or ALTERNATE=ROW option; otherwise, the default is OVER=1. Use this option only if you have convergence problems.

PCOEF

  • produces the estimated dimension coefficients.

PCONFIG

  • produces the estimated coordinates of the objects in the configuration.

PDATA

  • displays each data matrix.

PFINAL

  • displays final estimates.

PFIT

  • displays the badness-of-fit criterion and various types of correlations between the data and fitted values for each data matrix, as well as for the entire sample.

PFITROW

  • displays the badness-of-fit criterion and various types of correlations between the data and fitted values for each row as well as for each data matrix and for the entire sample. This option works only with the CONDITION=ROW option.

PINAVDATA

  • displays the sum of the weights and the weighted average of the data matrices computed during initialization with the INAV=DATA option.

PINEIGVAL

  • displays the eigenvalues computed during initialization.

PINEIGVEC

  • displays the eigenvectors computed during initialization.

PININ

  • displays values read from the INITIAL= data set. Since these values may be incomplete, the PFIT and PFITROW options do not apply.

PINIT

  • displays initial values.

PITER

  • displays estimates on each iteration.

PTRANS

  • displays the estimated transformation parameters if any are computed. There are no transformation parameters with the LEVEL=ORDINAL option.

RANDOM < = seed >

  • causes initial coordinate values to be pseudorandom numbers. In one dimension, the pseudorandom numbers are uniformly distributed on an interval. In two or more dimensions, the pseudorandom numbers are uniformly distributed on the circumference of a circle or the surface of a (hyper)sphere.

RIDGE= n

  • specifies the initial ridge value, where n ≥ 0. The default is RIDGE=1E-4.

  • If you get a floating-point overflow in the first few iterations, specify a larger value such as RIDGE=0.01 or RIDGE=1 or RIDGE=100.

  • If you know that the initial estimates are very good, using RIDGE=0 may speed convergence.

SHAPE=TRIANGULAR | TRIANGLE | TRI | T

SHAPE=SQUARE | SQU | S

  • determines whether the entire data matrix for each subject or only one triangle of the matrix is stored and analyzed. If you specify the CONDITION=ROW option, the default is SHAPE=SQUARE. Otherwise, the default is SHAPE=TRIANGLE.

    SHAPE=SQUARE

    causes the entire matrix to be stored and analyzed. The matrix can be asymmetric.

    SHAPE=TRIANGLE

    causes only one triangle to be stored. However, PROC MDS reads both upper and lower triangles to look for nonmissing values and to symmetrize the data if needed. If corresponding elements in the upper and lower triangles both contain nonmissing values, only the average of the two values is stored and analyzed (Kruskal and Wish 1978, p. 74). Also, if an OUTRES= data set is requested, only the average of the two corresponding elements is output.

SIMILAR | SIM < =max >

  • causes the data to be treated as similarity measurements rather than dissimilarities. If =max is not specified, each datum is converted to a dissimilarity by subtracting it from the maximum value in the data set or BY group. If =max is specified, each datum is subtracted from the maximum of max and the data. The diagonal data are included in computing these maxima.

  • By default, the data are assumed to be dissimilarities unless there are nonmissing diagonal values and these values are predominantly larger than the off-diagonal values. In this case, the data are assumed to be similarities and are treated as if the SIMILAR option is specified.
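As an illustration, the sketch below analyzes hypothetical similarity ratings on a 0 to 10 scale with the diagonal left missing, so the automatic similarity detection described above does not apply and SIMILAR must be stated. Specifying SIMILAR=10 subtracts each rating from 10 (the larger of 10 and the data maximum of 8). The data set name and ratings are invented.

```sas
/* Hypothetical similarity ratings (0 = unlike, 10 = identical); diagonal missing */
data simil;
   input object $ a b c d;
   datalines;
A . 8 2 5
B 8 . 5 2
C 2 5 . 8
D 5 2 8 .
;

/* SIMILAR=10 converts each rating r to the dissimilarity 10 - r */
proc mds data=simil similar=10 level=ordinal pfinal;
   id object;
   var a b c d;
run;
```

With plain SIMILAR instead, each rating would be subtracted from the maximum value in the data, which is 8 here.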

SINGULAR= p

  • specifies the singularity criterion p, 0 ≤ p ≤ 1. The default is SINGULAR=1E-8.

UNTIE

  • allows tied data to be assigned different optimally scaled values with the LEVEL=ORDINAL option. Otherwise, tied data are assigned equal optimally scaled values. The UNTIE option has no effect with values of the LEVEL= option other than LEVEL=ORDINAL.

BY Statement

  • BY variables ;

You can specify a BY statement with PROC MDS to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the MDS procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure.

If the INITIAL= data set contains the BY variables, the BY groups must appear in the same order as in the DATA= data set. If the BY variables are not in the INITIAL= data set, the entire data set is used to provide initial values for each BY group in the DATA= data set.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
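A short sketch of BY-group processing, using invented data: two groups, each holding one 3-by-3 matrix, arrive out of order and are sorted before the analysis. PROC SORT keeps the original relative order of observations within a BY group by default, so the rows of each matrix stay aligned with the VAR variables.

```sas
/* Hypothetical stacked data: one 3-by-3 matrix per value of GROUP (made-up values) */
data grouped;
   input group $ object $ a b c;
   datalines;
g2 A 0 2 3
g2 B 2 0 4
g2 C 3 4 0
g1 A 0 1 2
g1 B 1 0 2
g1 C 2 2 0
;

/* Sort by the BY variable only, so the object order within each group is unchanged */
proc sort data=grouped out=sorted;
   by group;
run;

/* One separate analysis per BY group */
proc mds data=sorted level=ratio pfinal;
   by group;
   id object;
   var a b c;
run;
```

Because the unsorted data are already arranged in groups, running PROC MDS on grouped with `by group notsorted;` would be an alternative to sorting.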

ID Statement

  • ID | OBJECT | OBJ variable ;

The ID statement specifies a variable in the DATA= data set that contains descriptive labels for the objects. The labels are used in the output and are copied to the OUT= data set. If there is more than one data matrix, only the ID values from the observations containing the first data matrix are used.

The ID variable is not used to establish any correspondence between observations and variables.

If the ID statement is omitted, the variable labels or names are used as object labels.

INVAR Statement

  • INVAR variables ;

The INVAR statement specifies the numeric variables in the INITIAL= data set that contain initial parameter estimates. The first variable corresponds to the first dimension, the second variable to the second dimension, and so on.

If the INVAR statement is omitted, the variables DIM1, ..., DIMm are used, where m is the maximum number of dimensions.
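One way to obtain an INITIAL= data set is to reuse the estimates saved by an earlier run. The sketch below assumes, beyond what this section states, that an OUT= data set written without the OITER option is acceptable as an INITIAL= data set and that it stores the coordinates in variables named Dim1 and Dim2. The data set names and values are hypothetical.

```sas
/* Hypothetical dissimilarities among four objects (illustrative values only) */
data dist;
   input object $ a b c d;
   datalines;
A 0 3 5 4
B 3 0 4 5
C 5 4 0 3
D 4 5 3 0
;

/* First run: save the final estimates */
proc mds data=dist level=ratio out=start nophist;
   id object;
   var a b c d;
run;

/* Second run: start from the saved estimates (assumed layout: Dim1, Dim2);
   PININ displays the values read from the INITIAL= data set */
proc mds data=dist level=ratio initial=start gconverge=0.001 pinin pfinal;
   id object;
   var a b c d;
   invar dim1 dim2;
run;
```

The INVAR statement is written out only for clarity; it names the same variables that would be used by default for a two-dimensional model.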

MATRIX Statement

  • MATRIX | MAT | SUBJECT | SUB variable ;

The MATRIX statement specifies a variable in the DATA= data set that contains descriptive labels for the data matrices or subjects. The labels are used in the output and are copied to the OUT= and OUTRES= data sets. Only the first observation from each data matrix is used to obtain the label for that matrix.

If the MATRIX statement is omitted, the matrices are labeled 1, 2, 3, and so on.

VAR Statement

  • VAR variables ;

The VAR statement specifies the numeric variables in the DATA= data set that contain similarity or dissimilarity measurements on a set of objects or stimuli. Each variable corresponds to one object.

If the VAR statement is omitted, all numeric variables that are not specified in another statement are used.

To analyze a subset of the objects in a data set, you can specify the variable names corresponding to the columns in the subset, but you must also use a DATA step or a WHERE clause to specify the rows in the subset. PROC MDS expects to read one or more square matrices, and you must ensure that the rows in the data set correctly correspond to the columns in number and order.
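For example, to analyze only three of four objects, the columns are restricted with the VAR statement and the rows with a WHERE statement, keeping both in the same order. The data set and values below are hypothetical.

```sas
/* Hypothetical dissimilarities among four objects (illustrative values only) */
data dist;
   input object $ a b c d;
   datalines;
A 0 3 5 4
B 3 0 4 5
C 5 4 0 3
D 4 5 3 0
;

/* Keep rows A, B, C and columns a, b, c so the matrix stays square */
proc mds data=dist level=ratio pfinal;
   where object in ('A', 'B', 'C');
   id object;
   var a b c;
run;
```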

WEIGHT Statement

  • WEIGHT variables ;

The WEIGHT statement specifies numeric variables in the DATA= data set that contain weights for each similarity or dissimilarity measurement. These weights are used to compute weighted least-squares estimates. The number of WEIGHT variables must be the same as the number of VAR variables, and the variables in the WEIGHT statement must be in the same order as the corresponding variables in the VAR statement.

If the WEIGHT statement is omitted, all data within a partition are assigned equal weights.

Data with 0 or negative weights are ignored in fitting the model but are included in the OUTRES= data set and in monotone transformations.
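A minimal sketch of a weighted analysis follows. The weight variables wa through wd are hypothetical names, listed in the same order as the VAR variables; the invented weights count the B-D dissimilarity twice and give the C-D dissimilarity a weight of 0, so it is ignored in fitting but still written to the OUTRES= data set.

```sas
/* Hypothetical dissimilarities (a-d) with matching weights (wa-wd); made-up values */
data weighted;
   input object $ a b c d wa wb wc wd;
   datalines;
A 0 3 5 4  1 1 1 1
B 3 0 4 5  1 1 1 2
C 5 4 0 3  1 1 1 0
D 4 5 3 0  1 2 0 1
;

/* One WEIGHT variable per VAR variable, in corresponding order */
proc mds data=weighted level=ratio outres=res pfinal;
   id object;
   var a b c d;
   weight wa wb wc wd;
run;

/* Inspect data, fitted distances, and residuals */
proc print data=res;
run;
```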




SAS.STAT 9.1 Users Guide (Vol. 4)
Year: 2004