PROC CALIS < options > ;
COSAN matrix model ;
MATRIX matrix elements ;
VARNAMES variables ;
LINEQS model equations ;
STD variance pattern ;
COV covariance pattern ;
RAM model list ;
VARNAMES variables ;
FACTOR < options > ;
MATRIX matrix elements ;
VARNAMES variables ;
BOUNDS boundary constraints ;
BY variables ;
FREQ variable ;
LINCON linear constraints ;
NLINCON nonlinear constraints ;
NLOPTIONS optimization options ;
PARAMETERS parameters ;
PARTIAL variables ;
STRUCTEQ variables ;
VAR variables ;
WEIGHT variable ;
program statements
If no INRAM= data set is specified, you must use one of the four statements that define the input form of the analysis model: COSAN, LINEQS, RAM, or FACTOR.
The MATRIX statement can be used multiple times for the same or different matrices along with a COSAN or FACTOR statement. If the MATRIX statement is used multiple times for the same matrix, later definitions override earlier ones.
The STD and COV statements can be used only with the LINEQS model statement.
You can formulate a generalized COSAN model using a COSAN statement. MATRIX statements can be used to define the elements of a matrix used in the COSAN statement. The input notation resembles the COSAN program of R. McDonald and C. Fraser (McDonald 1978, 1980).
The RAM statement uses a simple list input that is especially suitable for describing J. McArdle's RAM analysis model (McArdle 1980, McArdle and McDonald 1984) for causal and path analysis problems.
The LINEQS statement formulates the analysis model by means of a system of linear equations similar to P. Bentler's (1989) EQS program notation. The STD and COV statements can be used to define the variances and covariances of the exogenous variables in the LINEQS model.
A FACTOR statement can be used to compute a first-order exploratory or confirmatory factor (or component) analysis. The analysis of a simple exploratory factor analysis model performed by PROC CALIS is not as efficient as one performed by the FACTOR procedure. The CALIS procedure is designed for more general structural problems, and it needs significantly more computation time for a simple unrestricted factor or component analysis than does PROC FACTOR.
You can add program statements to impose linear or nonlinear constraints on the parameters if you specify the model by means of a COSAN, LINEQS, or RAM statement. The PARAMETERS statement defines additional parameters that are needed as independent variables in your program code and that belong to the set of parameters to be estimated. Variable names used in the program code should differ from the preceding statement names. The code should respect the syntax rules of SAS statements usually used in the DATA step. See the SAS Program Statements section on page 628 for more information.
The BOUNDS statement can be used to specify simple lower and upper boundary constraints for the parameters.
You can specify general linear equality and inequality constraints with the LINCON statement (or via an INEST= data set). The NLINCON statement can be used to specify general nonlinear equality and inequality constraints by referring to nonlinear functions defined by program statements.
The VAR, PARTIAL, WEIGHT, FREQ, and BY statements can be used in the same way as in other procedures, for example, the FACTOR or PRINCOMP procedure. You can select a subset of the input variables to analyze with the VAR statement. The PARTIAL statement defines a set of input variables that are chosen as partial variables for the analysis of a matrix of partial correlations or covariances. The BY statement specifies groups in which separate covariance structure analyses are performed.
PROC CALIS < options > ;
This statement invokes the procedure. The options available with the PROC CALIS statement are summarized in Table 19.1 and discussed in the following six sections.
Data Set Options | Short Description |
---|---|
DATA= | input data set |
INEST= | input initial values, constraints |
INRAM= | input model |
INWGT= | input weight matrix |
OUTEST= | covariance matrix of estimates |
OUTJAC | Jacobian into OUTEST= data set |
OUTRAM= | output model |
OUTSTAT= | output statistic |
OUTWGT= | output weight matrix |
Data Processing | Short Description |
---|---|
AUGMENT | analyzes augmented moment matrix |
COVARIANCE | analyzes covariance matrix |
EDF= | defines nobs by number error df |
NOBS= | defines number of observations nobs |
NOINT | analyzes uncorrected moments |
RDF= | defines nobs by number regression df |
RIDGE | specifies ridge factor for moment matrix |
UCORR | analyzes uncorrected CORR matrix |
UCOV | analyzes uncorrected COV matrix |
VARDEF= | specifies variance divisor |
Estimation Methods | Short Description |
---|---|
METHOD= | estimation method |
ASYCOV= | formula of asymptotic covariances |
DFREDUCE= | reduces degrees of freedom |
G4= | algorithm for STDERR |
NODIAG | excludes diagonal elements from fit |
WPENALTY= | penalty weight to fit correlations |
WRIDGE= | ridge factor for weight matrix |
Optimization Techniques | Short Description |
---|---|
TECHNIQUE= | minimization method |
UPDATE= | update technique |
LINESEARCH= | line-search method |
FCONV= | function convergence criterion |
GCONV= | gradient convergence criterion |
INSTEP= | initial step length (RADIUS=, SALPHA=) |
LSPRECISION= | line-search precision (SPRECISION=) |
MAXFUNC= | max number function calls |
MAXITER= | max number iterations |
Displayed Output Options | Short Description |
---|---|
KURTOSIS | compute and display kurtosis |
MODIFICATION | modification indices |
NOMOD | no modification indices |
NOPRINT | suppresses the displayed output |
PALL | all displayed output (ALL) |
PCORR | analyzed and estimated moment matrix |
PCOVES | covariance matrix of estimates |
PDETERM | determination coefficients |
PESTIM | parameter estimates |
PINITIAL | pattern and initial values |
PJACPAT | displays structure of variable and constant elements of the Jacobian matrix |
PLATCOV | latent variable covariances, scores |
PREDET | displays predetermined moment matrix |
PRIMAT | displays output in matrix form |
| adds default displayed output |
PRIVEC | displays output in vector form |
PSHORT | reduces default output (SHORT) |
PSUMMARY | displays only fit summary (SUMMARY) |
PWEIGHT | weight matrix |
RESIDUAL= | residual matrix and distribution |
SIMPLE | univariate statistics |
STDERR | standard errors |
NOSTDERR | computes no standard errors |
TOTEFF | displays total and indirect effects |
Miscellaneous Options | Short Description |
---|---|
ALPHAECV= | probability Browne & Cudeck ECV |
ALPHARMS= | probability Steiger & Lind RMSEA |
BIASKUR | biased skewness and kurtosis |
DEMPHAS= | emphasizes diagonal entries |
FDCODE | uses numeric derivatives for code |
HESSALG= | algorithm for Hessian |
NOADJDF | no adjustment of df for active constraints |
RANDOM= | randomly generated initial values |
SINGULAR= | singularity criterion |
ASINGULAR= | absolute singularity information matrix |
COVSING= | singularity tolerance of information matrix |
MSINGULAR= | relative M singularity of information matrix |
VSINGULAR= | relative V singularity of information matrix |
SLMW= | probability limit for Wald test |
START= | constant initial values |
DATA= SAS-data-set
specifies an input data set that can be an ordinary SAS data set or a specially structured TYPE=CORR, TYPE=COV, TYPE=UCORR, TYPE=UCOV, TYPE=SSCP, or TYPE=FACTOR SAS data set, as described in the section Input Data Sets on page 630. If the DATA= option is omitted, the most recently created SAS data set is used.
INEST INVAR ESTDATA= SAS-data-set
specifies an input data set that contains initial estimates for the parameters used in the optimization process and can also contain boundary and general linear constraints on the parameters. If the model has not changed much, you can specify an OUTEST= data set from a previous PROC CALIS analysis. The initial estimates are taken from the values of the PARMS observation.
INRAM= SAS-data-set
specifies an input data set that contains in RAM list form all information needed to specify an analysis model. The INRAM= data set is described in the section Input Data Sets on page 630. Typically, this input data set is an OUTRAM= data set (possibly modified) from a previous PROC CALIS analysis. If you use an INRAM= data set to specify the analysis model, you cannot use the model specification statements COSAN, MATRIX, RAM, LINEQS, STD, COV, FACTOR, or VARNAMES, but you can use the BOUNDS and PARAMETERS statements and program statements. If the INRAM= option is omitted, you must define the analysis model with a COSAN, RAM, LINEQS, or FACTOR statement.
INWGT= SAS-data-set
specifies an input data set that contains the weight matrix W used in generalized least-squares (GLS), weighted least-squares (WLS, ADF), or diagonally weighted least-squares (DWLS) estimation. If the weight matrix W defined by an INWGT= data set is not positive definite, it can be ridged using the WRIDGE= option. See the section Estimation Criteria on page 644 for more information. If no INWGT= data set is specified, default settings for the weight matrices are used in the estimation process. The INWGT= data set is described in the section Input Data Sets on page 630. Typically, this input data set is an OUTWGT= data set from a previous PROC CALIS analysis.
OUTEST OUTVAR= SAS-data-set
creates an output data set containing the parameter estimates, their gradient, Hessian matrix, and boundary and linear constraints. For METHOD=ML, METHOD=GLS, and METHOD=WLS, the OUTEST= data set also contains the information matrix, the approximate covariance matrix of the parameter estimates ((generalized) inverse of information matrix), and approximate standard errors. If linear or nonlinear equality or active inequality constraints are present, the Lagrange multiplier estimates of the active constraints, the projected Hessian, and the Hessian of the Lagrange function are written to the data set. The OUTEST= data set also contains the Jacobian if the OUTJAC option is used.
The OUTEST= data set is described in the section OUTEST= SAS-data-set on page 634. If you want to create a permanent SAS data set, you must specify a two-level name. Refer to the chapter titled SAS Data Files in SAS Language Reference: Concepts for more information on permanent data sets.
OUTJAC
writes the Jacobian matrix, if it has been computed, to the OUTEST= data set. This is useful when the information and Jacobian matrices need to be computed for other analyses.
OUTSTAT= SAS-data-set
creates an output data set containing the BY group variables, the analyzed covariance or correlation matrices, and the predicted and residual covariance or correlation matrices of the analysis. You can specify the correlation or covariance matrix in an OUTSTAT= data set as an input DATA= data set in a subsequent analysis by PROC CALIS. The OUTSTAT= data set is described in the section OUTSTAT= SAS-data-set on page 641. If the model contains latent variables, this data set also contains the predicted covariances between latent and manifest variables and the latent variable score regression coefficients (see the PLATCOV option on page 586). If the FACTOR statement is used, the OUTSTAT= data set also contains the rotated and unrotated factor loadings, the unique variances, the matrix of factor correlations, the transformation matrix of the rotation, and the matrix of standardized factor loadings.
You can use the latent variable score regression coefficients with PROC SCORE to compute factor scores.
If you want to create a permanent SAS data set, you must specify a two-level name. Refer to the chapter titled SAS Data Files in SAS Language Reference: Concepts for more information on permanent data sets.
OUTRAM= SAS-data-set
creates an output data set containing the model information for the analysis, the parameter estimates, and their standard errors. An OUTRAM= data set can be used as an input INRAM= data set in a subsequent analysis by PROC CALIS. The OUTRAM= data set also contains a set of fit indices; it is described in more detail in the section OUTRAM= SAS-data-set on page 638. If you want to create a permanent SAS data set, you must specify a two-level name. Refer to the chapter titled SAS Data Files in SAS Language Reference: Concepts for more information on permanent data sets.
OUTWGT= SAS-data-set
creates an output data set containing the weight matrix W used in the estimation process. You cannot create an OUTWGT= data set with an unweighted least-squares or maximum likelihood estimation. The fit function in GLS, WLS (ADF), and DWLS estimation contains the inverse of the (Cholesky factor of the) weight matrix W written in the OUTWGT= data set. The OUTWGT= data set contains the weight matrix to which the WRIDGE= and the WPENALTY= options have been applied. An OUTWGT= data set can be used as an input INWGT= data set in a subsequent analysis by PROC CALIS. The OUTWGT= data set is described in the section OUTWGT= SAS-data-set on page 643. If you want to create a permanent SAS data set, you must specify a two-level name. Refer to the chapter titled SAS Data Files in SAS Language Reference: Concepts for more information on permanent data sets.
AUGMENT AUG
analyzes the augmented correlation or covariance matrix. Using the AUG option is equivalent to specifying UCORR (NOINT but not COV) or UCOV (NOINT and COV) for a data set that is augmented by an intercept variable INTERCEPT that has constant values equal to 1. The variable INTERCEP can be used instead of the default INTERCEPT only if you specify the SAS option OPTIONS VALIDVARNAME=V6. The dimension of an augmented matrix is one higher than that of the corresponding correlation or covariance matrix. The AUGMENT option is effective only if the data set does not contain a variable called INTERCEPT and if you specify the UCOV, UCORR, or NOINT option.
Caution: The INTERCEPT variable is included in the moment matrix as the variable with number n + 1. The RAM model statement assumes that the first n variable numbers correspond to the n manifest variables in the input data set. Therefore, when you specify the AUGMENT option, the numbering of the latent variables used in the RAM or path model must start with n + 2.
COVARIANCE COV
analyzes the covariance matrix instead of the correlation matrix. By default, PROC CALIS (like the FACTOR procedure) analyzes a correlation matrix. If the DATA= input data set is a valid TYPE=CORR data set (containing a correlation matrix and standard deviations), using the COV option means that the covariance matrix is computed and analyzed.
DFE EDF= n
makes the effective number of observations n + i, where i is 0 if the NOINT, UCORR, or UCOV option is specified without the AUGMENT option, and i is 1 otherwise. You can also use the NOBS= option to specify the number of observations.
DFR RDF= n
makes the effective number of observations the actual number of observations minus the RDF= value. The degree of freedom for the intercept should not be included in the RDF= option. If you use PROC CALIS to compute a regression model, you can specify RDF= number-of-regressor-variables to get approximate standard errors equal to those computed by PROC REG.
NOBS= nobs
specifies the number of observations. If the DATA= input data set is a raw data set, nobs is defined by default to be the number of observations in the raw data set. The NOBS= and EDF= options override this default definition. You can use the RDF= option to modify the nobs specification. If the DATA= input data set contains a covariance, correlation, or scalar product matrix, you can specify the number of observations either by using the NOBS=, EDF=, and RDF= options in the PROC CALIS statement or by including a _TYPE_ = N observation in the DATA= input data set.
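The NOBS=, EDF=, and RDF= rules above can be summarized as a small function. This is a hypothetical sketch, not PROC CALIS code: the function name `effective_nobs` and its arguments are illustrative, and because the text does not say which option wins when both NOBS= and EDF= are given, the sketch simply rejects that combination.

```python
def effective_nobs(raw_nobs=None, nobs=None, edf=None, rdf=0,
                   noint=False, ucorr=False, ucov=False, augment=False):
    """Hypothetical sketch of the NOBS=/EDF=/RDF= rules described above."""
    if nobs is not None and edf is not None:
        # The text does not state which option takes precedence; refuse to guess.
        raise ValueError("specify NOBS= or EDF=, not both")
    if edf is not None:
        # EDF=n yields n + i: i = 0 for uncorrected moments (NOINT, UCORR,
        # UCOV) without AUGMENT, and i = 1 otherwise.
        uncorrected = noint or ucorr or ucov
        i = 0 if (uncorrected and not augment) else 1
        base = edf + i
    elif nobs is not None:
        base = nobs  # NOBS= overrides the raw-data count
    elif raw_nobs is not None:
        base = raw_nobs  # default: number of observations in a raw DATA= set
    else:
        raise ValueError("number of observations is unknown")
    # RDF= subtracts the number of regressors (intercept not included).
    return base - rdf
```

For example, a regression with three regressors on 50 raw observations gives `effective_nobs(raw_nobs=50, rdf=3) == 47`.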
NOINT
specifies that no intercept be used in computing covariances and correlations; that is, covariances or correlations are not corrected for the mean. You can specify this option (or UCOV or UCORR) to analyze mean structures in an uncorrected moment matrix, that is, to compute intercepts in systems of structured linear equations (see Example 19.2). The term NOINT is misleading in this case because the analyzed uncorrected covariance or correlation matrix contains a constant (intercept) variable that is used in the analysis model. The degrees of freedom used in the variance divisor (specified by the VARDEF= option) and some of the fit assessment statistics (see the section Assessment of Fit on page 649) depend on whether an intercept variable is included in the model (the intercept is used in computing the corrected covariance or correlation matrix or is used as a variable in the uncorrected covariance or correlation matrix to estimate mean structures) or not included (an uncorrected covariance or correlation matrix is used that does not contain a constant variable).
RIDGE < = r >
defines a ridge factor r for the diagonal of the moment matrix S that is analyzed. The matrix S is transformed to

$$ S \rightarrow \tilde{S} = S + r\,\mathrm{diag}(S) $$

If you do not specify r in the RIDGE option, PROC CALIS tries to ridge the moment matrix S so that the smallest eigenvalue is about $10^{-3}$.
Caution: The moment matrix in the OUTSTAT= output data set does not contain the ridged diagonal.
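To see why ridging helps, the sketch below applies the transformation S + r·diag(S) (each diagonal entry scaled by 1 + r) to a singular 2×2 matrix and checks its smallest eigenvalue in closed form. It is illustrative only: the helper names `ridge` and `min_eig_2x2` are hypothetical, and the sketch does not reproduce how PROC CALIS chooses r automatically.

```python
import math


def ridge(S, r):
    """Return S + r*diag(S): each diagonal entry is scaled by (1 + r)."""
    n = len(S)
    return [[S[i][j] * (1.0 + r) if i == j else S[i][j] for j in range(n)]
            for i in range(n)]


def min_eig_2x2(M):
    """Smallest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    return (a + d) / 2.0 - math.sqrt(((a - d) / 2.0) ** 2 + b * b)


S = [[1.0, 1.0], [1.0, 1.0]]        # singular: smallest eigenvalue is 0
print(min_eig_2x2(S))
print(min_eig_2x2(ridge(S, 0.01)))  # about 0.01 after ridging
```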
UCORR
analyzes the uncorrected correlation matrix instead of the correlation matrix corrected for the mean. Using the UCORR option is equivalent to specifying the NOINT option but not the COV option.
UCOV
analyzes the uncorrected covariance matrix instead of the covariance matrix corrected for the mean. Using the UCOV option is equivalent to specifying both the COV and NOINT options. You can specify this option to analyze mean structures in an uncorrected covariance matrix, that is, to compute intercepts in systems of linear structural equations (see Example 19.2).
VARDEF= DF N WDF WEIGHT WGT
specifies the divisor used in the calculation of covariances and standard deviations. The default value is VARDEF=DF. The values and associated divisors are displayed in the following table, where i = 0 if the NOINT option is used and i = 1 otherwise and where k is the number of partial variables specified in the PARTIAL statement. Using an intercept variable in a mean structure analysis, by specifying the AUGMENT option, includes the intercept variable in the analysis. In this case, i = 1. When a WEIGHT statement is used, $w_j$ is the value of the WEIGHT variable in the jth observation, and the summation is performed only over observations with positive weight.
Value | Description | Divisor |
---|---|---|
DF | degrees of freedom | N − k − i |
N | number of observations | N |
WDF | sum of weights DF | $\sum_j w_j - k - i$ |
WEIGHT WGT | sum of weights | $\sum_j w_j$ |
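The VARDEF= table can be read as a small lookup. The following hypothetical sketch (the name `vardef_divisor` is illustrative) computes the divisor from the number of observations N, the number of partial variables k, the intercept indicator i, and, for the weighted variants, the WEIGHT values, summing only positive weights as the text specifies.

```python
def vardef_divisor(vardef, n_obs, k=0, noint=False, augment=False,
                   weights=None):
    """Divisor for covariances and standard deviations per the VARDEF= table."""
    # i = 0 with NOINT, i = 1 otherwise; AUGMENT includes the intercept (i = 1).
    i = 0 if (noint and not augment) else 1
    vardef = vardef.upper()
    if vardef == "DF":
        return n_obs - k - i
    if vardef == "N":
        return n_obs
    if vardef in ("WDF", "WEIGHT", "WGT"):
        if weights is None:
            raise ValueError("weighted divisors need WEIGHT values")
        # Only observations with positive weight enter the sum.
        wsum = sum(w for w in weights if w > 0)
        return wsum - k - i if vardef == "WDF" else wsum
    raise ValueError(f"unknown VARDEF= value: {vardef}")
```

For instance, with 100 observations, two partial variables, and NOINT, `vardef_divisor("DF", 100, k=2, noint=True)` returns 98.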
The default estimation method is maximum likelihood (METHOD=ML), assuming a multivariate normal distribution of the observed variables. The two-stage estimation methods METHOD=LSML, METHOD=LSGLS, METHOD=LSWLS, and METHOD=LSDWLS first compute unweighted least-squares estimates of the model parameters and their residuals. Afterward, these estimates are used as initial values for the optimization process to compute maximum likelihood, generalized least-squares, weighted least-squares, or diagonally weighted least-squares parameter estimates. You can do the same thing by using an OUTRAM= data set with least-squares estimates as an INRAM= data set for a further analysis to obtain the second set of parameter estimates. This strategy is also discussed in the section Use of Optimization Techniques on page 664. For more details, see the Estimation Criteria section on page 644.
METHOD MET= name
specifies the method of parameter estimation. The default is METHOD=ML. Valid values for name are as follows:
ML M MAX | performs normal-theory maximum likelihood parameter estimation. The ML method requires a nonsingular covariance or correlation matrix. |
GLS G | performs generalized least-squares parameter estimation. If no INWGT= data set is specified, the GLS method uses the inverse sample covariance or correlation matrix as weight matrix W . Therefore, METHOD=GLS requires a nonsingular covariance or correlation matrix. |
WLS W ADF | performs weighted least-squares parameter estimation. If no INWGT= data set is specified, the WLS method uses the inverse matrix of estimated asymptotic covariances of the sample covariance or correlation matrix as the weight matrix W. In this case, the WLS estimation method is equivalent to Browne's (1982, 1984) asymptotically distribution-free estimation. The WLS method requires a nonsingular weight matrix. |
DWLS D | performs diagonally weighted least-squares parameter estimation. If no INWGT= data set is specified, the DWLS method uses the inverse diagonal matrix of asymptotic variances of the input sample covariance or correlation matrix as the weight matrix W . The DWLS method requires a nonsingular diagonal weight matrix. |
ULS LS U | performs unweighted least-squares parameter estimation. |
LSML LSM LSMAX | performs unweighted least-squares followed by normal-theory maximum likelihood parameter estimation. |
LSGLS LSG | performs unweighted least-squares followed by generalized least-squares parameter estimation. |
LSWLS LSW LSADF | performs unweighted least-squares followed by weighted least-squares parameter estimation. |
LSDWLS LSD | performs unweighted least-squares followed by diagonally weighted least-squares parameter estimation. |
NONE NO | uses no estimation method. This option is suitable for checking the validity of the input information and for displaying the model matrices and initial values. |
ASYCOV ASC= name
specifies the formula for asymptotic covariances used in the weight matrix W for WLS and DWLS estimation. The ASYCOV option is effective only if METHOD=WLS or METHOD=DWLS and no INWGT= input data set is specified. The following formulas are implemented:
BIASED: | Browne's (1984) formula (3.4) biased asymptotic covariance estimates; the resulting weight matrix is at least positive semidefinite. This is the default for analyzing a covariance matrix. |
UNBIASED: | Browne's (1984) formula (3.8) asymptotic covariance estimates corrected for bias; the resulting weight matrix can be indefinite (that is, can have negative eigenvalues), especially for small N. |
CORR: | Browne and Shapiro's (1986) formula (3.2) (identical to DeLeeuw's (1983) formulas (2,3,4)); the asymptotic variances of the diagonal elements are set to the reciprocal of the value r specified by the WPENALTY= option (default: r = 100). This formula is the default for analyzing a correlation matrix. |
Caution: Using the WLS and DWLS methods with the ASYCOV=CORR option means that you are fitting a correlation (rather than a covariance) structure. Since the fixed diagonal of a correlation matrix for some models does not contribute to the model's degrees of freedom, you can specify the DFREDUCE= i option to reduce the degrees of freedom by the number of manifest variables used in the model. See the section Counting the Degrees of Freedom on page 676 for more information.
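As an illustration of the kind of quantity the ASYCOV=BIASED weights are built from, the sketch below computes the widely used biased distribution-free estimate acov(s_ij, s_kl) ≈ s_ijkl − s_ij·s_kl, where s_ijkl is the fourth-order central sample moment. This is an assumption about the general form of Browne's formula (3.4), not a transcription of it: the function name `biased_acov` is hypothetical and both moments use divisor N here, which may differ from the divisors PROC CALIS applies. The WLS weight matrix W would then be the inverse of the matrix of such estimates.

```python
def biased_acov(data, i, j, k, l):
    """Biased estimate of acov(s_ij, s_kl) = s_ijkl - s_ij * s_kl.

    data: list of observations, each a list of variable values.
    Both moments use divisor N (an assumption of this sketch).
    """
    n = len(data)
    p = len(data[0])
    means = [sum(row[c] for row in data) / n for c in range(p)]
    dev = [[row[c] - means[c] for c in range(p)] for row in data]

    def s2(a, b):
        # Second-order central moment s_ab.
        return sum(r[a] * r[b] for r in dev) / n

    # Fourth-order central moment s_ijkl.
    s4 = sum(r[i] * r[j] * r[k] * r[l] for r in dev) / n
    return s4 - s2(i, j) * s2(k, l)
```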
DFREDUCE DFRED= i
reduces the degrees of freedom of the χ² test by i. In general, the number of degrees of freedom is the number of elements of the lower triangle of the predicted model matrix C, n(n + 1)/2, minus the number of parameters, t. If the NODIAG option is used, the number of degrees of freedom is additionally reduced by n. Because negative values of i are allowed, you can also increase the number of degrees of freedom by using this option. If the DFREDUCE= or NODIAG option is used in a correlation structure analysis, PROC CALIS does not additionally reduce the degrees of freedom by the number of constant elements in the diagonal of the predicted model matrix, which is otherwise done automatically. See the section Counting the Degrees of Freedom on page 676 for more information.
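The degrees-of-freedom bookkeeping described above can be written out directly. In this hypothetical sketch, `model_df` and its `const_diag` argument (the number of constant diagonal elements in a correlation structure analysis) are illustrative names; `dfreduce=None` stands for "DFREDUCE= not specified", so that the automatic reduction is skipped whenever DFREDUCE= or NODIAG is used, as the text states.

```python
def model_df(n, t, nodiag=False, dfreduce=None, const_diag=0):
    """Degrees of freedom of the chi-square test (hypothetical sketch).

    n: number of manifest variables, t: number of parameters,
    const_diag: constant diagonal elements of C in a correlation analysis.
    """
    df = n * (n + 1) // 2 - t      # lower triangle of C minus parameters
    if nodiag:
        df -= n                    # diagonal excluded from the fit
    if dfreduce is not None:
        df -= dfreduce             # negative values increase df
    elif not nodiag:
        df -= const_diag           # automatic reduction otherwise
    return df
```

For example, 6 manifest variables and 13 parameters give 21 − 13 = 8 degrees of freedom, and 2 with NODIAG.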
G4= i
specifies the algorithm to compute the approximate covariance matrix of parameter estimates used for computing the approximate standard errors and modification indices when the information matrix is singular. If the number of parameters t used in the model you analyze is smaller than the value of i , the time-expensive Moore-Penrose (G4) inverse of the singular information matrix is computed by eigenvalue decomposition. Otherwise, an inexpensive pseudo (G1) inverse is computed by sweeping. By default, i = 60. For more details, see the section Estimation Criteria on page 644.
NODIAG NODI
omits the diagonal elements of the analyzed correlation or covariance matrix from the fit function. This option is useful only for special models with constant error variables. The NODIAG option does not allow fitting those parameters that contribute to the diagonal of the estimated moment matrix. The degrees of freedom are automatically reduced by n. A simple example of the usefulness of the NODIAG option is the fit of the first-order factor model, $S = FF' + U^2$. In this case, you do not have to estimate the diagonal matrix of unique variances $U^2$, which is fully determined by $\mathrm{diag}(S - FF')$.
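The last point can be made concrete: once the loading matrix F is known, each unique variance is the diagonal element of S minus the squared loadings in that row, since the ith diagonal element of FF' is the sum of squares of row i of F. The function name `unique_variances` is hypothetical and the sketch uses plain Python lists.

```python
def unique_variances(S, F):
    """Return diag(S - F F') for an n x m loading matrix F (list of rows)."""
    # (F F')_ii is the sum of squared loadings in row i.
    return [S[i][i] - sum(f * f for f in F[i]) for i in range(len(S))]


S = [[1.0, 0.48], [0.48, 1.0]]
F = [[0.8], [0.6]]                # one factor; F F' reproduces s_12 = 0.48
print(unique_variances(S, F))     # approximately [0.36, 0.64]
```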
WPENALTY WPEN= r
specifies the penalty weight r ≥ 0 for the WLS and DWLS fit of the diagonal elements of a correlation matrix (constant 1s). The criterion for weighted least-squares estimation of a correlation structure is

$$ F_{WLS} = \sum_{i=2}^{n}\sum_{j=1}^{i-1}\sum_{k=2}^{n}\sum_{l=1}^{k-1} w^{ij,kl}\,(s_{ij}-c_{ij})(s_{kl}-c_{kl}) + r\sum_{i=1}^{n}(s_{ii}-c_{ii})^2 $$

where r is the penalty weight specified by the WPENALTY= r option, $s_{ij}$ and $c_{ij}$ are the elements of the analyzed matrix S and the predicted model matrix C, and the $w^{ij,kl}$ are the elements of the inverse of the reduced $(n(n-1)/2) \times (n(n-1)/2)$ weight matrix that contains only the nonzero rows and columns of the full weight matrix W. The second term is a penalty term to fit the diagonal elements of the correlation matrix. The default value is 100. The reciprocal of this value replaces the asymptotic variance corresponding to the diagonal elements of a correlation matrix in the weight matrix W, and it is effective only with the ASYCOV=CORR option. The often-used value r = 1 seems to be too small in many cases to fit the diagonal elements of a correlation matrix properly. The default WPENALTY= value emphasizes the importance of the fit of the diagonal elements in the correlation matrix. You can decrease or increase the value of r if you want to decrease or increase the importance of the diagonal elements fit. This option is effective only with the WLS or DWLS estimation method and the analysis of a correlation matrix. See the section Estimation Criteria on page 644 for more details.
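A sketch of the penalized criterion, assuming it consists of a quadratic form in the off-diagonal residuals plus r times the sum of squared diagonal residuals, as described above. The function name `wls_corr_criterion` is hypothetical, and the residual ordering (lower triangle, row by row, i > j) is an assumption of this sketch; `w_inv` must use the same ordering.

```python
def wls_corr_criterion(S, C, w_inv, r=100.0):
    """Penalized WLS criterion for a correlation structure (sketch).

    S, C: analyzed and predicted n x n matrices (lists of rows).
    w_inv: inverse reduced weight matrix, indexed by lower-triangle
           positions (i > j) taken row by row.
    r: penalty weight for the diagonal (WPENALTY=, default 100).
    """
    n = len(S)
    # Off-diagonal residuals s_ij - c_ij in lower-triangle order.
    e = [S[i][j] - C[i][j] for i in range(1, n) for j in range(i)]
    m = len(e)
    quad = sum(w_inv[a][b] * e[a] * e[b] for a in range(m) for b in range(m))
    # Penalty term for the diagonal elements.
    penalty = r * sum((S[i][i] - C[i][i]) ** 2 for i in range(n))
    return quad + penalty
```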
WRIDGE= r
defines a ridge factor r for the diagonal of the weight matrix W used in GLS, WLS, or DWLS estimation. The weight matrix W is transformed to

$$ W \rightarrow \tilde{W} = W + r\,\mathrm{diag}(W) $$

The WRIDGE= option is applied to the weight matrix before the WPENALTY= option is applied to it, before the weight matrix is written to the OUTWGT= data set, and before the weight matrix is displayed.
Since no single nonlinear optimization algorithm is clearly superior (in terms of stability, speed, and memory) for all applications, the CALIS procedure provides different types of optimization techniques. Each technique can be modified in various ways. The default optimization technique for fewer than 40 parameters (t < 40) is TECHNIQUE=LEVMAR. For 40 ≤ t < 400, TECHNIQUE=QUANEW is the default method, and for t ≥ 400, TECHNIQUE=CONGRA is the default method. For more details, see the section Use of Optimization Techniques on page 664. You can specify the following set of options in the PROC CALIS statement or in the NLOPTIONS statement.
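The default choice of technique can be summarized as follows. This hypothetical helper follows the thresholds in the preceding paragraph (t < 40, 40 ≤ t < 400, t ≥ 400) and adds the rule, stated in the QUANEW entry below, that QUANEW is the default when nonlinear constraints are specified. Note that the technique entries phrase the boundaries as "more than 40" and "more than 400"; the sketch uses the paragraph's inequalities.

```python
def default_technique(t, has_nonlinear_constraints=False):
    """Default TECHNIQUE= value for t parameters (hypothetical sketch)."""
    if has_nonlinear_constraints:
        return "QUANEW"   # SQP variant handles NLINCON constraints
    if t < 40:
        return "LEVMAR"
    if t < 400:
        return "QUANEW"
    return "CONGRA"       # O(t) memory for very large problems
```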
TECHNIQUE TECH= name
OMETHOD OM= name
specifies the optimization technique. Valid values for name are as follows:
CONGRA CG | chooses one of four different conjugate-gradient optimization algorithms, which can be more precisely defined with the UPDATE= option and modified with the LINESEARCH= option. The conjugate-gradient techniques need only O(t) memory compared to the O(t²) memory for the other three techniques, where t is the number of parameters. On the other hand, the conjugate-gradient techniques are significantly slower than other optimization techniques and should be used only when memory is insufficient for more efficient techniques. When you choose this option, UPDATE=PB by default. This is the default optimization technique if there are more than 400 parameters to estimate. |
DBLDOG DD | performs a version of double dogleg optimization, which uses the gradient to update an approximation of the Cholesky factor of the Hessian. This technique is, in many aspects, very similar to the dual quasi-Newton method, but it does not use line search. The implementation is based on Dennis and Mei (1979) and Gay (1983). |
LEVMAR LM MARQUARDT | performs a highly stable but, for large problems, memory- and time-consuming Levenberg-Marquardt optimization technique, a slightly improved variant of the Moré (1978) implementation. This is the default optimization technique if there are fewer than 40 parameters to estimate. |
NEWRAP NR NEWTON | performs a usually stable but, for large problems, memory- and time-consuming Newton-Raphson optimization technique. The algorithm combines a line-search algorithm with ridging, and it can be modified with the LINESEARCH= option. In releases prior to Release 6.11, this option invokes the NRRIDG option. |
NRRIDG NRR NR | performs a usually stable but, for large problems, memory- and time-consuming Newton-Raphson optimization technique. This algorithm does not perform a line search. Since TECH=NRRIDG uses an orthogonal decomposition of the approximate Hessian, each iteration of TECH=NRRIDG can be slower than that of TECH=NEWRAP, which works with Cholesky decomposition. However, TECH=NRRIDG usually needs fewer iterations than TECH=NEWRAP. |
QUANEW QN | chooses one of four different quasi-Newton optimization algorithms that can be more precisely defined with the UPDATE= option and modified with the LINESEARCH= option. If boundary constraints are used, these techniques sometimes converge slowly. When you choose this option, UPDATE=DBFGS by default. If nonlinear constraints are specified in the NLINCON statement, a modification of Powell's (1982a, 1982b) VMCWD algorithm is used, which is a sequential quadratic programming (SQP) method. This algorithm can be modified by specifying VERSION=1, which replaces the update of the Lagrange multiplier estimate vector µ with the original update of Powell (1978a, 1978b) that is used in the VF02AD algorithm. This can be helpful for applications with linearly dependent active constraints. The QUANEW technique is the default optimization technique if there are nonlinear constraints specified or if there are more than 40 and fewer than 400 parameters to estimate. The QUANEW algorithm uses only first-order derivatives of the objective function and, if available, of the nonlinear constraint functions. |
TRUREG TR | performs a usually very stable but, for large problems, memory- and time-consuming trust region optimization technique. The algorithm is implemented similarly to Gay (1983) and Moré and Sorensen (1983). |
NONE NO | does not perform any optimization. This option is similar to METHOD=NONE, but TECH=NONE also computes and displays residuals and goodness-of-fit statistics. If you specify METHOD=ML, METHOD=LSML, METHOD=GLS, METHOD=LSGLS, METHOD=WLS, or METHOD=LSWLS, this option computes and, if the corresponding display options are specified, displays the standard error estimates and modification indices corresponding to the input parameter estimates. |
UPDATE UPD= name
specifies the update method for the quasi-Newton or conjugate-gradient optimization technique.
For TECHNIQUE=CONGRA, the following updates can be used:
PB | performs the automatic restart update method of Powell (1977) and Beale (1972). This is the default. |
FR | performs the Fletcher-Reeves update (Fletcher 1980, p. 63). |
PR | performs the Polak-Ribiere update (Fletcher 1980, p. 66). |
CD | performs a conjugate-descent update of Fletcher (1987). |
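The FR, PR, and CD updates differ only in the coefficient β used to form the next search direction d_new = −g_new + β·d_old. The sketch below uses the standard textbook formulas for these three coefficients (see Fletcher 1987); the function names are hypothetical, and the Powell-Beale restart logic of UPDATE=PB is not shown.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def beta_fr(g_new, g_old):
    """Fletcher-Reeves: ||g_new||^2 / ||g_old||^2."""
    return dot(g_new, g_new) / dot(g_old, g_old)


def beta_pr(g_new, g_old):
    """Polak-Ribiere: g_new'(g_new - g_old) / ||g_old||^2."""
    diff = [a - b for a, b in zip(g_new, g_old)]
    return dot(g_new, diff) / dot(g_old, g_old)


def beta_cd(g_new, g_old, d_old):
    """Fletcher's conjugate descent: ||g_new||^2 / (-d_old' g_old)."""
    return dot(g_new, g_new) / -dot(d_old, g_old)


def next_direction(g_new, d_old, beta):
    """New search direction d = -g_new + beta * d_old."""
    return [-g + beta * d for g, d in zip(g_new, d_old)]
```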
For TECHNIQUE=DBLDOG, the following updates (Fletcher 1987) can be used:
DBFGS | performs the dual Broyden, Fletcher, Goldfarb, and Shanno (BFGS) update of the Cholesky factor of the Hessian matrix. This is the default. |
DDFP | performs the dual Davidon, Fletcher, and Powell (DFP) update of the Cholesky factor of the Hessian matrix. |
For TECHNIQUE=QUANEW, the following updates (Fletcher 1987) can be used:
BFGS | performs the original BFGS update of the inverse Hessian matrix. This was the default in earlier releases. |
DFP | performs the original DFP update of the inverse Hessian matrix. |
DBFGS | performs the dual BFGS update of the Cholesky factor of the Hessian matrix. This is the default. |
DDFP | performs the dual DFP update of the Cholesky factor of the Hessian matrix. |
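The original BFGS and DFP choices update an approximation H of the inverse Hessian from the step s = x_new - x_old and the gradient change y = g_new - g_old. The pure-Python sketch below implements the standard textbook formulas (Fletcher 1987) for these two non-dual updates. The helper names are hypothetical, and the Cholesky-factor updates used by DBFGS and DDFP are not shown. Both updates satisfy the secant condition H_new y = s, which makes a convenient check.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def outer(u, v):
    return [[a * b for b in v] for a in u]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(a, b, scale=1.0):
    """Return a + scale * b for matrices of equal shape."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bfgs_inverse_update(h, s, y):
    """BFGS update of a symmetric inverse Hessian approximation h."""
    sy = dot(s, y)          # curvature s'y, assumed positive
    hy = matvec(h, y)       # H y (equals (y'H)' because H is symmetric)
    yhy = dot(y, hy)
    # H+ = H + (s'y + y'Hy) ss' / (s'y)^2 - (Hy s' + s y'H) / s'y
    new = add(h, outer(s, s), (sy + yhy) / sy ** 2)
    new = add(new, outer(hy, s), -1.0 / sy)
    return add(new, outer(s, hy), -1.0 / sy)


def dfp_inverse_update(h, s, y):
    """DFP update of a symmetric inverse Hessian approximation h."""
    sy = dot(s, y)
    hy = matvec(h, y)
    yhy = dot(y, hy)
    # H+ = H + ss' / s'y - Hy y'H / y'Hy
    new = add(h, outer(s, s), 1.0 / sy)
    return add(new, outer(hy, hy), -1.0 / yhy)
```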
LINESEARCH | LIS | SMETHOD | SM= i
specifies the line-search method for the CONGRA, QUANEW, and NEWRAP optimization techniques. Refer to Fletcher (1980) for an introduction to line-search techniques. The value of i can be 1, ..., 8; the default is i = 2.
LIS=1 | specifies a line-search method that needs the same number of function and gradient calls for cubic interpolation and cubic extrapolation; this method is similar to one used by the Harwell subroutine library. |
LIS=2 | specifies a line-search method that needs more function calls than gradient calls for quadratic and cubic interpolation and cubic extrapolation; this method is implemented as shown in Fletcher (1987) and can be modified to an exact line search by using the LSPRECISION= option. |
LIS=3 | specifies a line-search method that needs the same number of function and gradient calls for cubic interpolation and cubic extrapolation; this method is implemented as shown in Fletcher (1987) and can be modified to an exact line search by using the LSPRECISION= option. |
LIS=4 | specifies a line-search method that needs the same number of function and gradient calls for stepwise extrapolation and cubic interpolation. |
LIS=5 | specifies a line-search method that is a modified version of LIS=4. |
LIS=6 | specifies a golden-section line search (Polak 1971), which uses only function values for linear approximation. |
LIS=7 | specifies a bisection line search (Polak 1971), which uses only function values for linear approximation. |
LIS=8 | specifies the Armijo line-search technique (Polak 1971), which uses only function values for linear approximation. |
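To illustrate a line search that needs only function values, the sketch below is a generic textbook golden-section minimizer of a unimodal one-dimensional function φ(α) = f(x + αd) on a bracketing interval. The bracket, tolerance, and function name are assumptions for illustration and do not reproduce the LIS=6 implementation in PROC CALIS.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618


def golden_section(phi, lo, hi, tol=1e-8, max_iter=200):
    """Minimize a unimodal function phi on [lo, hi] using function values only."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)  # interior point closer to a
    d = a + INV_PHI * (b - a)  # interior point closer to b
    fc, fd = phi(c), phi(d)
    for _ in range(max_iter):
        if b - a <= tol:
            break
        if fc < fd:
            # Minimum lies in [a, d]; the old c is reused as the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = phi(c)
        else:
            # Minimum lies in [c, b]; the old d is reused as the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = phi(d)
    return (a + b) / 2.0
```

Each iteration shrinks the interval by a factor of about 0.618 and costs one new function evaluation, because one interior point is always reused.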
FCONV | FTOL= r
specifies the relative function convergence criterion. The optimization process is terminated when the relative difference of the function values of two consecutive iterations is smaller than the specified value of r, that is,
$$\frac{|f(x^{(k)}) - f(x^{(k-1)})|}{\max(|f(x^{(k-1)})|, \mathrm{FSIZE})} \le r$$
where FSIZE can be defined by the FSIZE= option in the NLOPTIONS statement. The default value is $r = 10^{-\mathrm{FDIGITS}}$, where FDIGITS either can be specified in the NLOPTIONS statement or is set by default to $-\log_{10}(\epsilon)$, where $\epsilon$ is the machine precision.
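A minimal sketch of the FCONV= test follows. It assumes the usual form of the criterion, |f_k - f_(k-1)| / max(|f_(k-1)|, FSIZE) <= r, with FSIZE = 0 unless given, and it uses Python's `sys.float_info.epsilon` as the machine precision when forming the default r = 10^(-FDIGITS). The function names and the handling of a zero denominator are hypothetical.

```python
import math
import sys


def default_fconv(fdigits=None):
    """Default FCONV= value r = 10**(-FDIGITS); FDIGITS defaults to -log10(eps)."""
    if fdigits is None:
        fdigits = -math.log10(sys.float_info.epsilon)
    return 10.0 ** (-fdigits)


def fconv_satisfied(f_new, f_old, r, fsize=0.0):
    """Relative function convergence test between two consecutive iterations."""
    denom = max(abs(f_old), fsize)
    if denom == 0.0:
        # Assumption: with a zero denominator, only an exact repeat counts as converged.
        return f_new == f_old
    return abs(f_new - f_old) / denom <= r
```

With the default FDIGITS, the default r equals the machine precision itself, so by default this criterion is very strict.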
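GCONV | GTOL= r
specifies the relative gradient convergence criterion (see the ABSGCONV= option on page 617 for the absolute gradient convergence criterion).
Termination of all techniques (except the CONGRA technique) requires the normalized predicted function reduction to be small,
$$\frac{g(x^{(k)})^T [G^{(k)}]^{-1} g(x^{(k)})}{\max(|f(x^{(k)})|, \mathrm{FSIZE})} \le r$$
where FSIZE can be defined by the FSIZE= option in the NLOPTIONS statement. For the CONGRA technique (where a reliable Hessian estimate G is not available),
$$\frac{\|g(x^{(k)})\|_2^2 \, \|s(x^{(k)})\|_2}{\|g(x^{(k)}) - g(x^{(k-1)})\|_2 \, \max(|f(x^{(k)})|, \mathrm{FSIZE})} \le r$$
is used, where $s(x^{(k)})$ is the step taken in iteration k. The default value is $r = 10^{-8}$.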
Note that for releases prior to Release 6.11, the GCONV= option specified the absolute gradient convergence criterion.
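The following sketch shows the non-CONGRA form of the relative gradient test. It assumes the normalized predicted reduction is g' G^(-1) g / max(|f|, FSIZE), with G the Hessian estimate and FSIZE = 0 unless given. It solves G z = g by Gaussian elimination instead of forming the inverse. The solver, the function names, and the requirement max(|f|, FSIZE) > 0 are illustrative assumptions, and the CONGRA variant is omitted.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - acc) / m[row][row]
    return x


def gconv_satisfied(g, hess, f, r=1e-8, fsize=0.0):
    """Test g' G^{-1} g / max(|f|, FSIZE) <= r; assumes max(|f|, FSIZE) > 0."""
    z = solve(hess, g)  # z = G^{-1} g without forming the inverse
    reduction = sum(gi * zi for gi, zi in zip(g, z))
    return reduction / max(abs(f), fsize) <= r
```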
INSTEP= r
For highly nonlinear objective functions, such as the EXP function, the default initial radius of the trust-region algorithms TRUREG, DBLDOG, and LEVMAR or the default step length of the line-search algorithms can produce arithmetic overflows. If this occurs, specify decreasing values of 0 < r < 1, such as INSTEP=1E-1, INSTEP=1E-2, INSTEP=1E-4, ..., until the iteration starts successfully.
For trust-region algorithms (TRUREG, DBLDOG, and LEVMAR), the INSTEP option specifies a positive factor for the initial radius of the trust region. The default initial trust-region radius is the length of the scaled gradient, and it corresponds to the default radius factor of r = 1.
For line-search algorithms (NEWRAP, CONGRA, and QUANEW), INSTEP specifies an upper bound for the initial step length for the line search during the first five iterations. The default initial step length is r = 1.
For releases prior to Release 6.11, specify the SALPHA= and RADIUS= options. For more details, see the section Computational Problems on page 678.
LSPRECISION | LSP= r
SPRECISION | SP= r
specifies the degree of accuracy that should be obtained by the line-search algorithms LIS=2 and LIS=3. Usually an imprecise line search is inexpensive and successful. For more difficult optimization problems, a more precise and more expensive line search may be necessary (Fletcher 1980, p. 22). The second (default for NEWRAP, QUANEW, and CONGRA) and third line-search methods approach exact line search for small LSPRECISION= values. If you have numerical problems, you should decrease the LSPRECISION= value to obtain a more precise line search. The default LSPRECISION= values are displayed in the following table.
TECH= | UPDATE= | LSP default |
---|---|---|
QUANEW | DBFGS, BFGS | r = 0.4 |
QUANEW | DDFP, DFP | r = 0.06 |
CONGRA | all | r = 0.1 |
NEWRAP | no update | r = 0.9 |
For more details, refer to Fletcher (1980, pp. 25–29).
MAXFUNC | MAXFU= i
specifies the maximum number i of function calls in the optimization process. The default values are displayed in the following table.
TECH= | MAXFUNC default |
---|---|
LEVMAR, NEWRAP, NRRIDG, TRUREG | i =125 |
DBLDOG, QUANEW | i =500 |
CONGRA | i =1000 |
The default is used if you specify MAXFUNC=0. The optimization can be terminated only after completing a full iteration. Therefore, the number of function calls that is actually performed can exceed the number that is specified by the MAXFUNC= option.
MAXITER | MAXIT= i < n >
specifies the maximum number i of iterations in the optimization process. The default values are displayed in the following table.
TECH= | MAXITER default |
---|---|
LEVMAR, NEWRAP, NRRIDG, TRUREG | i =50 |
DBLDOG, QUANEW | i =200 |
CONGRA | i =400 |
The default is used if you specify MAXITER=0 or if you omit the MAXITER option.
The optional second value n is valid only for TECH=QUANEW with nonlinear constraints. It specifies an upper bound n for the number of iterations of an algorithm that reduces the violation of nonlinear constraints at a starting point. The default is n = 20. For example, specifying
maxiter= . 0
means that you do not want to exceed the default number of iterations during the main optimization process and that you want to suppress the feasible point algorithm for nonlinear constraints.
RADIUS= r
is an alias for the INSTEP= option for Levenberg-Marquardt minimization.
SALPHA= r
is an alias for the INSTEP= option for line-search algorithms.
SPRECISION | SP= r
is an alias for the LSPRECISION= option.
There are three kinds of options to control the displayed output:
The PCORR, KURTOSIS, MODIFICATION, NOMOD, PCOVES, PDETERM, PESTIM, PINITIAL, PJACPAT, PLATCOV, PREDET, PWEIGHT, RESIDUAL, SIMPLE, STDERR, and TOTEFF options refer to specific parts of displayed output.
The PALL, PRINT, PSHORT, PSUMMARY, and NOPRINT options refer to special subsets of the displayed output options mentioned in the first item. If the NOPRINT option is not specified, a default set of output is displayed. The PRINT and PALL options add other output options to the default output, and the PSHORT and PSUMMARY options reduce the default displayed output.
The PRIMAT and PRIVEC options describe the form in which some of the output is displayed (the only nonredundant information displayed by PRIVEC is the gradient).
Output Options | PALL | PRINT | default | PSHORT | PSUMMARY |
---|---|---|---|---|---|
fit indices | * | * | * | * | * |
linear dependencies | * | * | * | * | * |
PREDET | * | (*) | (*) | (*) | |
model matrices | * | * | * | * | |
PESTIM | * | * | * | * | |
iteration history | * | * | * | * | |
PINITIAL | * | * | * | | |
SIMPLE | * | * | * | | |
STDERR | * | * | * | | |
RESIDUAL | * | * | | | |
KURTOSIS | * | * | | | |
PLATCOV | * | * | | | |
TOTEFF | * | * | | | |
PCORR | * | | | | |
MODIFICATION | * | | | | |
PWEIGHT | * | | | | |
PCOVES | | | | | |
PDETERM | | | | | |
PJACPAT | | | | | |
PRIMAT | | | | | |
PRIVEC | | | | | |
KURTOSIS | KU
computes and displays univariate kurtosis and skewness, various coefficients of multivariate kurtosis, and the numbers of observations that contribute most to the normalized multivariate kurtosis. See the section Measures of Multivariate Kurtosis on page 658 for more information. Using the KURTOSIS option implies the SIMPLE display option. This information is computed only if the DATA= data set is a raw data set, and it is displayed by default if the PRINT option is specified. The multivariate LS kappa and the multivariate mean kappa are displayed only if you specify METHOD=WLS and the weight matrix is computed from an input raw data set. All measures of skewness and kurtosis are corrected for the mean. If an intercept variable is included in the analysis, the measures of multivariate kurtosis do not include the intercept variable in the corrected covariance matrix, as indicated by a displayed message. Using the BIASKUR option displays the biased values of univariate skewness and kurtosis.
MODIFICATION | MOD
computes and displays Lagrange multiplier test indices for constant parameter constraints, equality parameter constraints, and active boundary constraints, as well as univariate and multivariate Wald test indices. The modification indices are not computed in the case of unweighted or diagonally weighted least-squares estimation.
The Lagrange multiplier test (Bentler 1986; Lee 1985; Buse 1982) provides an estimate of the χ² reduction that results from dropping the constraint. For constant parameter constraints and active boundary constraints, the approximate change of the parameter value is also displayed. You can use this value to obtain an initial value if the parameter is allowed to vary in a modified model. For more information, see the section Modification Indices on page 673.
NOMOD
does not compute modification indices. The NOMOD option is useful in connection with the PALL option because it saves computing time.
NOPRINT | NOP
suppresses the displayed output. Note that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 14, Using the Output Delivery System.
PALL | ALL
displays all optional output except the output generated by the PCOVES, PDETERM, PJACPAT, and PRIVEC options.
Caution: The PALL option includes the very expensive computation of the modification indices. If you do not really need modification indices, you can save computing time by specifying the NOMOD option in addition to the PALL option.
PCORR | CORR
displays the (corrected or uncorrected) covariance or correlation matrix that is analyzed and the predicted model covariance or correlation matrix.
PCOVES | PCE
displays the following:
the information matrix (crossproduct Jacobian)
the approximate covariance matrix of the parameter estimates (generalized inverse of the information matrix)
the approximate correlation matrix of the parameter estimates
The covariance matrix of the parameter estimates is not computed for estimation methods ULS and DWLS. This displayed output is not included in the output generated by the PALL option.
PDETERM | PDE
displays three coefficients of determination: the determination of all equations (DETAE), the determination of the structural equations (DETSE), and the determination of the manifest variable equations (DETMV). These determination coefficients are intended to be global means of the squared multiple correlations for different subsets of model equations and variables. The coefficients are displayed only when you specify a RAM or LINEQS model, but they are displayed for all five estimation methods: ULS, GLS, ML, WLS, and DWLS.
You can use the STRUCTEQ statement to define which equations are structural equations. If you don't use the STRUCTEQ statement, PROC CALIS uses its own default definition to identify structural equations.
The term structural equation is not defined in a unique way. The LISREL program defines the structural equations by the user-defined BETA matrix. In PROC CALIS, the default definition of a structural equation is an equation that has a dependent left-side variable that appears at least once on the right side of another equation, or an equation that has at least one right-side variable that is the left-side variable of another equation. Therefore, PROC CALIS sometimes identifies more equations as structural equations than the LISREL program does.
If the model contains structural equations, PROC CALIS also displays the Stability Coefficient of Reciprocal Causation, that is, the largest eigenvalue of the BB' matrix, where B is the causal coefficient matrix of the structural equations. These coefficients are computed as in the LISREL VI program of Jöreskog and Sörbom (1985). This displayed output is not included in the output generated by the PALL option.
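The stability coefficient is the largest eigenvalue of BB'. Because BB' is symmetric and positive semidefinite, a simple power iteration is enough to illustrate the computation. This sketch is not the LISREL or PROC CALIS algorithm. The function names and iteration settings are assumptions, and a production implementation would use a symmetric eigenvalue routine instead.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def stability_coefficient(b, iters=500, tol=1e-12):
    """Largest eigenvalue of B B' by power iteration (B B' is symmetric PSD).

    Limitation: a start vector orthogonal to the dominant eigenvector is not handled.
    """
    m = matmul(b, transpose(b))
    v = [1.0] * len(m)
    lam = 0.0
    for _ in range(iters):
        w = [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0  # B B' maps the start vector to zero (for example, B = 0)
        v = [x / norm for x in w]
        # Rayleigh quotient v' M v for the normalized vector v
        new_lam = sum(vi * sum(mij * vj for mij, vj in zip(row, v))
                      for vi, row in zip(v, m))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam
```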
PESTIM | PES
displays the parameter estimates. In some cases, this includes displaying the standard errors and t values.
PINITIAL | PIN
displays the input model matrices and the vector of initial values.
PJACPAT | PJP
displays the structure of variable and constant elements of the Jacobian matrix. This displayed output is not included in the output generated by the PALL option.
PLATCOV | PLC
displays the following:
the estimates of the covariances among the latent variables
the estimates of the covariances between latent and manifest variables
the latent variable score regression coefficients
The estimated covariances between latent and manifest variables and the latent variable score regression coefficients are written to the OUTSTAT= data set. You can use the score coefficients with PROC SCORE to compute factor scores.
PREDET | PRE
displays the pattern of variable and constant elements of the predicted moment matrix that is predetermined by the analysis model. It is especially helpful in finding manifest variables that are not used or that are used as exogenous variables in a complex model specified in the COSAN statement. Those entries of the predicted moment matrix for which the model generates variable (rather than constant) elements are displayed as missing values. This output is displayed even without specifying the PREDET option if the model generates constant elements in the predicted model matrix different from those in the analysis moment matrix and if you specify at least the PSHORT amount of displayed output.
If the analyzed matrix is a correlation matrix (containing constant elements of 1s in the diagonal) and the model generates a predicted model matrix with q constant (rather than variable) elements in the diagonal, the degrees of freedom are automatically reduced by q. The output generated by the PREDET option displays those constant diagonal positions. If you specify the DFREDUCE= or NODIAG option, this automatic reduction of the degrees of freedom is suppressed. See the section Counting the Degrees of Freedom on page 676 for more information.
PRIMAT | PMAT
displays parameter estimates, approximate standard errors, and t values in matrix form if you specify the analysis model in the RAM or LINEQS statement. When a COSAN statement is used, this occurs by default.
PRINT | PRI
adds the options KURTOSIS, RESIDUAL, PLATCOV, and TOTEFF to the default output.
PRIVEC | PVEC
displays parameter estimates, approximate standard errors, the gradient, and t values in vector form. The values are displayed with more decimal places. This displayed output is not included in the output generated by the PALL option.
PSHORT | SHORT | PSH
excludes the output produced by the PINITIAL, SIMPLE, and STDERR options from the default output.
PSUMMARY | SUMMARY | PSUM
displays the fit assessment table and the ERROR, WARNING, and NOTE messages.
PWEIGHT | PW
displays the weight matrix W used in the estimation. The weight matrix is displayed after the WRIDGE= and the WPENALTY= options are applied to it.
RESIDUAL | RES < = NORM | VARSTAND | ASYSTAND >
displays the absolute and normalized residual covariance matrix, the rank order of the largest residuals, and a bar chart of the residuals. This information is displayed by default when you specify the PRINT option.
Three types of normalized or standardized residual matrices can be chosen with the RESIDUAL= specification.
RESIDUAL= NORM Normalized Residuals
RESIDUAL= VARSTAND Variance Standardized Residuals
RESIDUAL= ASYSTAND Asymptotically Standardized Residuals
For more details, see the section Assessment of Fit on page 649.
SIMPLE | S
displays means, standard deviations, skewness, and univariate kurtosis if available. This information is displayed when you specify the PRINT option. If you specify the UCOV, UCORR, or NOINT option, the standard deviations are not corrected for the mean. If the KURTOSIS option is specified, the SIMPLE option is set by default.
STDERR | SE
displays approximate standard errors if estimation methods other than unweighted least squares (ULS) or diagonally weighted least squares (DWLS) are used (and the NOSTDERR option is not specified). If you specify neither the STDERR nor the NOSTDERR option, the standard errors are computed for the OUTRAM= data set. This information is displayed by default when you specify the PRINT option.
NOSTDERR | NOSE
specifies that standard errors should not be computed. Standard errors are not computed for unweighted least-squares (ULS) or diagonally weighted least-squares (DWLS) estimation. In general, standard errors are computed even if the STDERR display option is not used (for file output).
TOTEFF | TE
computes and displays total effects and indirect effects.
ALPHAECV= α
specifies the significance level for a $1 - \alpha$ confidence interval, $0 \le \alpha \le 1$, for the Browne & Cudeck (1993) expected cross validation index (ECVI). The default value is $\alpha = 0.1$, which corresponds to a 90% confidence interval for the ECVI.
ALPHARMS= α
specifies the significance level for a $1 - \alpha$ confidence interval, $0 \le \alpha \le 1$, for the Steiger & Lind (1980) root mean squared error of approximation (RMSEA) coefficient (refer to Browne and Du Toit 1992). The default value is $\alpha = 0.1$, which corresponds to a 90% confidence interval for the RMSEA.
ASINGULAR | ASING= r
specifies an absolute singularity criterion r, r > 0, for the inversion of the information matrix, which is needed to compute the covariance matrix. The following singularity criterion is used:
$$|d_{j,j}| \le \max\left(\mathrm{ASING},\ \mathrm{VSING} \cdot |H_{j,j}|,\ \mathrm{MSING} \cdot \max(|H_{1,1}|, \ldots, |H_{n,n}|)\right)$$
In the preceding criterion, $d_{j,j}$ is the diagonal pivot of the matrix, $H_{j,j}$ are the diagonal elements of the information matrix $H$, and VSING and MSING are the specified values of the VSINGULAR= and MSINGULAR= options. The default value for ASING is the square root of the smallest positive double precision value. Note that, in many cases, a normalized matrix $D^{-1} H D^{-1}$ is decomposed, and the singularity criteria are modified correspondingly.
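The pivot test shared by the ASINGULAR=, MSINGULAR=, and VSINGULAR= options can be written as a small predicate. This sketch assumes the criterion |d_jj| <= max(ASING, VSING * |H_jj|, MSING * max_i |H_ii|), where H is the information matrix. It takes the defaults stated in this section for the case where SINGULAR= is not specified: ASING as the square root of the smallest positive normalized double, VSING = 1E-8, and MSING = 1E-12. The function and constant names are hypothetical.

```python
import math
import sys

DEFAULT_ASING = math.sqrt(sys.float_info.min)  # sqrt of smallest positive normalized double
DEFAULT_VSING = 1e-8
DEFAULT_MSING = 1e-12


def pivot_is_singular(d_jj, h_jj, h_diag, asing=DEFAULT_ASING,
                      vsing=DEFAULT_VSING, msing=DEFAULT_MSING):
    """Return True if pivot d_jj is treated as zero during the inversion.

    h_jj: diagonal element of H in the pivot's row; h_diag: all diagonal elements of H.
    """
    threshold = max(asing,
                    vsing * abs(h_jj),
                    msing * max(abs(h) for h in h_diag))
    return abs(d_jj) <= threshold
```

Of the three terms, the VSING term scales with the pivot's own diagonal element. The MSING term scales with the largest diagonal element, and the ASING term is a fixed floor.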
BIASKUR
computes univariate skewness and kurtosis by formulas uncorrected for bias. See the section Measures of Multivariate Kurtosis on page 658 for more information.
COVSING= r
specifies a nonnegative threshold r , which determines whether the eigenvalues of the information matrix are considered to be zero. If the inverse of the information matrix is found to be singular (depending on the VSINGULAR=, MSINGULAR=, ASINGULAR=, or SINGULAR= option), a generalized inverse is computed using the eigenvalue decomposition of the singular matrix. Those eigenvalues smaller than r are considered to be zero. If a generalized inverse is computed and you do not specify the NOPRINT option, the distribution of eigenvalues is displayed.
DEMPHAS | DE= r
changes the initial values of all parameters that are located on the diagonals of the central model matrices by the relationship
$$\theta_{\mathrm{new}} = r\,(|\theta_{\mathrm{old}}| + 1)$$
The initial values of the diagonal elements of the central matrices should always be nonnegative to generate positive definite predicted model matrices in the first iteration. By using values of r > 1, for example, r = 2, r = 10, ... , you can increase these initial values to produce predicted model matrices with high positive eigenvalues in the first iteration. The DEMPHAS= option is effective independent of the way the initial values are set; that is, it changes the initial values set in the model specification as well as those set by an INRAM= data set and those automatically generated for RAM, LINEQS, or FACTOR model statements. It also affects the initial values set by the START= option, which uses, by default, DEMPHAS=100 if a covariance matrix is analyzed and DEMPHAS=10 for a correlation matrix.
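Assuming the DEMPHAS= relationship θ_new = r(|θ_old| + 1), the adjustment of the diagonal start values can be sketched as follows. The function name and the plain-list representation of the diagonal are hypothetical.

```python
def apply_demphas(diagonal_starts, r):
    """Apply theta_new = r * (|theta_old| + 1) to each diagonal initial value."""
    return [r * (abs(theta) + 1.0) for theta in diagonal_starts]


print(apply_demphas([0.5, -0.2, 0.0], 2.0))
```

Every adjusted value is at least r, so with r >= 1 the diagonal start values become strictly positive even if some of the original values were negative or zero.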
FDCODE
replaces the analytic derivatives of the program statements by numeric derivatives (finite difference approximations). In general, this option is needed only when you have program statements that are too difficult for the built-in function compiler to differentiate analytically. For example, if the program code for the nonlinear constraints contains many arrays and many DO loops with array processing, the built-in function compiler can require too much time and memory to compute derivatives of the constraints with respect to the parameters. In this case, the Jacobian matrix of constraints is computed numerically by using finite difference approximations. The FDCODE option does not modify the kind of derivatives specified with the HESSALG= option.
HESSALG | HA= 1 | 2 | 3 | 4 | 5 | 6 | 11
specifies the algorithm used to compute the (approximate) Hessian matrix when TECHNIQUE=LEVMAR or TECHNIQUE=NEWRAP is used, to compute approximate standard errors of the parameter estimates, and to compute Lagrange multipliers. There are different groups of algorithms available.
analytic formulas: HA= 1,2,3,4,11
finite difference approximation: HA= 5,6
dense storage: HA= 1,2,3,4,5,6
sparse storage: HA= 11
If the Jacobian is more than 25% dense, the dense analytic algorithm, HA= 1, is used by default. The HA= 1 algorithm is faster than the other dense algorithms, but it needs considerably more memory for large problems than HA= 2,3,4. If the Jacobian is more than 75% sparse, the sparse analytic algorithm, HA= 11, is used by default. The dense analytic algorithm HA= 4 corresponds to the original COSAN algorithm; you are advised not to specify HA= 4 due to its very slow performance. If there is not enough memory available for the dense analytic algorithm HA= 1 and you must specify HA= 2 or HA= 3, it may be more efficient to use one of the quasi-Newton or conjugate-gradient optimization techniques since Levenberg-Marquardt and Newton-Raphson optimization techniques need to compute the Hessian matrix in each iteration. For approximate standard errors and modification indices, the Hessian matrix has to be computed at least once, regardless of the optimization technique.
The algorithms HA= 5 and HA= 6 compute approximate derivatives by using forward difference formulas. The HA= 5 algorithm corresponds to the analytic HA= 1: it is faster than HA= 6; however, it needs much more memory. The HA= 6 algorithm corresponds to the analytic HA= 2: it is slower than HA= 5; however, it needs much less memory.
Test computations of large sparse problems show that the sparse algorithm HA= 11 can be up to ten times faster than HA= 1 (and needs much less memory).
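Forward-difference Hessian approximations of the kind used by HA= 5 and HA= 6 can be illustrated with a generic sketch. It assumes an analytic gradient is available, differences the gradient along each coordinate, and then symmetrizes the result. The step-size heuristic and the function name are illustrative choices, not the formulas PROC CALIS uses.

```python
import math
import sys


def fd_hessian(grad, x, step=None):
    """Forward-difference Hessian approximation from an analytic gradient.

    Column j is (grad(x + h_j e_j) - grad(x)) / h_j; the result is symmetrized.
    """
    n = len(x)
    g0 = grad(x)
    if step is None:
        step = math.sqrt(sys.float_info.epsilon)  # common heuristic step factor
    h = [[0.0] * n for _ in range(n)]
    for j in range(n):
        hj = step * max(abs(x[j]), 1.0)  # scale the step with |x_j|
        xp = list(x)
        xp[j] += hj
        gp = grad(xp)
        for i in range(n):
            h[i][j] = (gp[i] - g0[i]) / hj
    # Average with the transpose to remove asymmetry introduced by differencing.
    return [[0.5 * (h[i][j] + h[j][i]) for j in range(n)] for i in range(n)]
```

Each column costs one extra gradient evaluation, which is why finite-difference Hessians become expensive as the number of parameters grows.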
MSINGULAR | MSING= r
specifies a relative singularity criterion r, r > 0, for the inversion of the information matrix, which is needed to compute the covariance matrix. The following singularity criterion is used:
$$|d_{j,j}| \le \max\left(\mathrm{ASING},\ \mathrm{VSING} \cdot |H_{j,j}|,\ \mathrm{MSING} \cdot \max(|H_{1,1}|, \ldots, |H_{n,n}|)\right)$$
where $d_{j,j}$ is the diagonal pivot of the matrix, $H_{j,j}$ are the diagonal elements of the information matrix $H$, and ASING and VSING are the specified values of the ASINGULAR= and VSINGULAR= options. If you do not specify the SINGULAR= option, the default value for MSING is 1E-12; otherwise, the default value is 1E-4 * SINGULAR. Note that, in many cases, a normalized matrix $D^{-1} H D^{-1}$ is decomposed, and the singularity criteria are modified correspondingly.
NOADJDF
turns off the automatic adjustment of degrees of freedom when there are active constraints in the analysis. When the adjustment is in effect, most fit statistics and the associated probability levels will be affected. This option should be used when the researcher believes that the active constraints observed in the current sample will have little chance to occur in repeated sampling.
RANDOM= i
specifies a positive integer as a seed value for the pseudo-random number generator to generate initial values for the parameter estimates for which no other initial value assignments in the model definitions are made. Except for the parameters in the diagonal locations of the central matrices in the model, the initial values are set to random numbers in the range $0 \le r \le 1$. The values for parameters in the diagonals of the central matrices are random numbers multiplied by 10 or 100. For more information, see the section Initial Estimates on page 661.
SINGULAR | SING= r
specifies the singularity criterion r, $0 < r < 1$, used, for example, for matrix inversion. The default value is the square root of the relative machine precision or, equivalently, the square root of the largest double precision value that, when added to 1, results in 1.
SLMW= r
specifies the probability limit used for computing the stepwise multivariate Wald test. The process stops when the univariate probability is smaller than r. The default value is r = 0.05.
START= r
In general, this option is needed only in connection with the COSAN model statement, and it specifies a constant r as an initial value for all the parameter estimates for which no other initial value assignments in the pattern definitions are made. Start values in the diagonal locations of the central matrices are set to 100r if a COV or UCOV matrix is analyzed and 10r if a CORR or UCORR matrix is analyzed. The default value is r = 0.5. Unspecified initial values in a FACTOR, RAM, or LINEQS model are usually computed by PROC CALIS. If none of the initialization methods are able to compute all starting values for a model specified by a FACTOR, RAM, or LINEQS statement, then the start values of parameters that could not be computed are set to r, 10r, or 100r. If the DEMPHAS= option is used, the initial values of the diagonal elements of the central model matrices are multiplied by the value specified in the DEMPHAS= option. For more information, see the section Initial Estimates on page 661.
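The START= rule for parameters without other initial values can be summarized in a few lines. This sketch covers only the rule stated in this paragraph (r off the diagonal of the central matrices, 100r for a COV or UCOV matrix, 10r for a CORR or UCORR matrix). The function name and string codes are hypothetical, and any DEMPHAS= adjustment is left out.

```python
def start_value(r=0.5, on_central_diagonal=False, analyzed="COV"):
    """Initial value assigned by START=r to a parameter with no other start value."""
    if not on_central_diagonal:
        return r
    if analyzed in ("COV", "UCOV"):
        return 100.0 * r
    if analyzed in ("CORR", "UCORR"):
        return 10.0 * r
    raise ValueError(f"unknown analyzed matrix type: {analyzed}")
```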
VSINGULAR | VSING= r
specifies a relative singularity criterion r, r > 0, for the inversion of the information matrix, which is needed to compute the covariance matrix. The following singularity criterion is used:
$$|d_{j,j}| \le \max\left(\mathrm{ASING},\ \mathrm{VSING} \cdot |H_{j,j}|,\ \mathrm{MSING} \cdot \max(|H_{1,1}|, \ldots, |H_{n,n}|)\right)$$
where $d_{j,j}$ is the diagonal pivot of the matrix, $H_{j,j}$ are the diagonal elements of the information matrix $H$, and ASING and MSING are the specified values of the ASINGULAR= and MSINGULAR= options. If you do not specify the SINGULAR= option, the default value for VSING is 1E-8; otherwise, the default value is SINGULAR. Note that, in many cases, a normalized matrix $D^{-1} H D^{-1}$ is decomposed, and the singularity criteria are modified correspondingly.
COSAN matrix_term < + matrix_term ...> ;
where matrix_term represents matrix_definition < * matrix_definition ... >
and matrix_definition represents matrix_name (column_number < ,general_form < ,transformation >> )
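The COSAN statement constructs the symmetric matrix model for the covariance analysis mentioned earlier (see the section The Generalized COSAN Model on page 552):
$$C = F_1 P_1 F_1' + \cdots + F_m P_m F_m', \qquad F_k = F_{k_1} \cdots F_{k_{n(k)}}, \qquad P_k = P_k', \qquad k = 1, \ldots, m$$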
You can specify only one COSAN statement with each PROC CALIS statement. The COSAN statement contains m matrix_term s corresponding to the generalized COSAN formula. The matrix_term s are separated by plus signs (+) according to the addition of the terms within the model.
Each matrix_term of the COSAN statement contains the definitions of the first $n(k) + 1$ matrices, $F_{k_j}$ and $P_k$, separated by asterisks (*) according to the multiplication of the matrices within the term. The matrices of the right-hand-side product are redundant and are not specified within the COSAN statement.
Each matrix_definition consists of the name of the matrix (matrix_name), followed in parentheses by the number of columns of the matrix (column_number) and, optionally, one or two matrix properties, separated by commas, describing the form of the matrix.
The number of rows of the first matrix in each term is defined by the input correlation or covariance matrix. You can reorder and reduce the variables in the input moment matrix using the VAR statement. The number of rows of the other matrices within the term is defined by the number of columns of the preceding matrix.
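To make the term structure concrete, the following sketch evaluates C = F_1 P_1 F_1' + ... + F_m P_m F_m' from explicit numeric matrices. Each term is given as its factor list F_k1, ..., F_kn(k) followed by the central matrix P_k, mirroring the order in a matrix_term. It uses plain nested lists and is only an illustration of the algebra. The function names are hypothetical, and PROC CALIS does not accept input in this form.

```python
from functools import reduce


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cosan_covariance(terms):
    """Evaluate C = sum_k F_k P_k F_k' for terms given as [F_k1, ..., F_kn, P_k]."""
    c = None
    for term in terms:
        *factors, p = term
        if factors:
            f = reduce(matmul, factors)  # F_k = F_k1 * ... * F_kn
            t = matmul(matmul(f, p), transpose(f))
        else:
            t = p  # the term consists of P_k alone
        c = t if c is None else [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(c, t)]
    return c
```

For example, a one-factor model with loadings F = (0.8, 0.6)', factor variance P = 1, and unique standard deviations U = diag(0.6, 0.8) is written as two terms, F * P and U * I. The predicted matrix then has unit diagonal and an off-diagonal covariance of 0.48.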
The first matrix property describes the general form of the matrix in the model. You can choose one of the following specifications of the first matrix property. The default first matrix property is GEN.
Code | Description |
---|---|
IDE | specifies an identity matrix; if the matrix is not square, this specification describes an identity submatrix followed by a rectangular zero submatrix. |
ZID | specifies an identity matrix; if the matrix is not square, this specification describes a rectangular zero submatrix followed by an identity submatrix. |
DIA | specifies a diagonal matrix; if the matrix is not square, this specification describes a diagonal submatrix followed by a rectangular zero submatrix. |
ZDI | specifies a diagonal matrix; if the matrix is not square, this specification describes a rectangular zero submatrix followed by a diagonal submatrix. |
LOW | specifies a lower triangular matrix; the matrix can be rectangular. |
UPP | specifies an upper triangular matrix; the matrix can be rectangular. |
SYM | specifies a symmetric matrix; the matrix cannot be rectangular. |
GEN | specifies a general rectangular matrix (default). |
The second matrix property describes the kind of inverse matrix transformation. If the second matrix property is omitted, no transformation is applied to the matrix.
Code | Description |
---|---|
INV | uses the inverse of the matrix. |
IMI | uses the inverse of the difference between the identity and the matrix. |
You cannot specify a nonsquare parameter matrix as an INV or IMI model matrix. Specifying a matrix of type DIA, ZDI, UPP, LOW, or GEN is not necessary if you do not use the unspecified location list in the corresponding MATRIX statements. After PROC CALIS processes the corresponding MATRIX statements, the matrix type DIA, ZDI, UPP, LOW, or GEN is recognized from the pattern of possibly nonzero elements. If you do not specify the first matrix property and you use the unspecified location list in a corresponding MATRIX statement, the matrix is recognized as a GEN matrix. You can also generate an IDE or ZID matrix by specifying a DIA, ZDI, or IMI matrix and by using MATRIX statements that define the pattern structure. However, PROC CALIS would be unable to take advantage of the fast algorithms that are available for IDE and ZID matrices in this case.
For example, to specify a second-order factor analysis model
$$C = F_1 F_2 P_2 F_2' F_1' + F_1 U_2^2 F_1' + U_1^2$$
with $m_1 = 3$ first-order factors and $m_2 = 2$ second-order factors and with $n = 9$ variables, you can use the following COSAN statement:
cosan F1(3) * F2(2) * P2(2,SYM) + F1(3) * U2(3,DIA) * I1(3,IDE) + U1(9,DIA) * I2(9,IDE);
MATRIX matrix-name < location > = list < , location = list ...> ;
You can specify one or more MATRIX statements with a COSAN or FACTOR statement. A MATRIX statement specifies which elements of the matrix are constant and which are parameters. You can also assign values to the constant elements and initial values for the parameters. The input notation resembles that used in the COSAN program of R. McDonald and C. Fraser (personal communication), except that in PROC CALIS, parameters are distinguished from constants by giving parameters names instead of by using positive and negative integers.
A MATRIX statement cannot be used for an IDE or ZID matrix. For all other types of matrices, each element is assumed to be a constant of 0 unless a MATRIX statement specifies otherwise. Hence, there must be at least one MATRIX statement for each matrix mentioned in the COSAN statement except for IDE and ZID matrices. There can be more than one MATRIX statement for a given matrix. If the same matrix element is given different definitions, later definitions override earlier definitions.
At the start, all elements of each model matrix, except IDE or ZID matrices, are set equal to 0.
Description of location:
There are several ways to specify the starting location and continuation direction of a list with n + 1 elements (n ≥ 0) within the parameter matrix.
Location | Description |
---|---|
[i,j] | The list elements correspond to the diagonally continued matrix elements [i,j], [i+1,j+1], ..., [i+n,j+n]. The number of elements is defined by the length of the list, and the list is cut off at the matrix boundary. If the list contains just one element (constant or parameter), it is assigned to the matrix element [i,j]. |
[i,] | The list elements correspond to the horizontally continued matrix elements [i,j], [i,j+1], ..., [i,j+n], where the starting column j is the diagonal position for a DIA, ZDI, or UPP matrix and is the first column for all other matrix types. For a SYM matrix, the list elements refer only to the matrix elements in the lower triangle. For a DIA or ZDI matrix, only one list element is accepted. |
[,j] | The list elements correspond to the vertically continued matrix elements [i,j], [i+1,j], ..., [i+n,j], where the starting row i is the diagonal position for a DIA, ZDI, SYM, or LOW matrix and is the first row for all other matrix types. For a SYM matrix, the list elements refer only to the matrix elements in the lower triangle. For a DIA or ZDI matrix, only one list element is accepted. |
[,] | Unspecified location: the list is allocated to all valid matrix positions (except for a ZDI matrix), starting at the element [1,1] and continuing rowwise. The only valid matrix positions for a DIA or ZDI matrix are the diagonal elements; for an UPP or LOW matrix, the valid positions are the elements above or below the diagonal; and for a symmetric matrix, the valid positions are the elements in the lower triangle, because the other triangle receives the symmetric allocation automatically. This location definition differs from the definitions with specified locations in one important respect: if the number of elements in the list is smaller than the number of valid matrix elements, the list is repeated until all valid matrix elements are filled. |
Omitting the left-hand-side term is equivalent to using [ , ] for an unspecified location .
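To make the four location forms concrete, here is a small Python model of how a list is placed into a GEN matrix. It illustrates the rules in the table above and is not PROC CALIS code. The function name `allocate` and its location tuples are hypothetical, and the special starting positions for DIA, ZDI, UPP, LOW, and SYM matrices are deliberately left out.

```python
def allocate(nrows, ncols, location, values):
    """Map a MATRIX-statement list onto 1-based (row, col) cells of a GEN matrix.

    Hypothetical location encodings:
      ("diag", i, j) -> [i,j]  diagonal continuation from [i,j]
      ("row", i)     -> [i,]   horizontal continuation from column 1
      ("col", j)     -> [,j]   vertical continuation from row 1
      ("all",)       -> [,]    rowwise over every cell, list recycled
    """
    kind = location[0]
    if kind == "all":
        if not values:
            raise ValueError("list must not be empty")
        cells = [(r, c) for r in range(1, nrows + 1) for c in range(1, ncols + 1)]
        # an unspecified location repeats a short list until every cell is filled
        return {cell: values[k % len(values)] for k, cell in enumerate(cells)}
    if kind == "diag":
        _, row, col = location
        d_row, d_col = 1, 1
    elif kind == "row":
        row, col, d_row, d_col = location[1], 1, 0, 1
    elif kind == "col":
        row, col, d_row, d_col = 1, location[1], 1, 0
    else:
        raise ValueError(f"unknown location kind: {kind!r}")
    cells = {}
    for value in values:
        if row > nrows or col > ncols:
            break  # a specified location is cut off at the matrix boundary
        cells[(row, col)] = value
        row, col = row + d_row, col + d_col
    return cells


# Second column of a 9 x 3 loading matrix: [ ,2] = 3 * 0 x4-x6
column2 = allocate(9, 3, ("col", 2), [0, 0, 0, "x4", "x5", "x6"])

# Later definitions override earlier ones, so successive lists can be merged.
pattern = {}
pattern.update(allocate(2, 2, ("all",), ["a"]))
pattern.update(allocate(2, 2, ("diag", 1, 1), [1.0]))
```

Merging dictionaries with `update` mirrors the rule that a later MATRIX definition of the same element overrides an earlier one.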
Description of list:
The list contains numeric values or parameter names, or both, that are assigned to a list of matrix elements starting at a specified position and proceeding in a specified direction. A real number r in the list defines the corresponding matrix element as a constant element with this value. The notation n * r generates n values of r in the list. A name in the list defines the corresponding matrix element as a parameter to be estimated. You can use numbered name lists (X1-X10) or the asterisk notation (5 * X means five occurrences of the parameter X). If a sublist of n1 names inside a list is followed by a list of n2 ≤ n1 real values inside parentheses, the last n2 parameters in the name sublist are given the initial values listed inside the parentheses. For example, the following list
0. 1. A2-A5 (1.4 1.9 2.5) 5.
specifies that the first two matrix elements (specified by the location to the left of the equal sign) are constants with values 0 and 1. The next element is parameter A2 with no specified initial value. The next three matrix elements are the variable parameters A3 , A4 , and A5 with initial values 1.4, 1.9, and 2.5, respectively. The next matrix element is specified by the seventh list element to be the constant 5.
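The expansion rules for a list can be sketched in Python. This is a hedged model of the documented notation (constants, n * r repetition, numbered name lists, and trailing initial values in parentheses); the function `expand_list` is hypothetical, and prefix names such as A: are not handled here.

```python
import re

NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")
TOKEN = re.compile(r"\([^)]*\)|\d+\s*\*\s*[^\s()]+|[^\s()]+")
REPEAT = re.compile(r"(\d+)\s*\*\s*(\S+)")
RANGE = re.compile(r"([A-Za-z_]+)(\d+)-([A-Za-z_]+)(\d+)")


def _numbers(group):
    """Expand the numbers inside parentheses, allowing n * r repetition."""
    values = []
    for tok in TOKEN.findall(group):
        rep = REPEAT.fullmatch(tok)
        count, body = (int(rep.group(1)), rep.group(2)) if rep else (1, tok)
        values.extend([float(body)] * count)
    return values


def expand_list(text):
    """Expand a list such as '0. 1. A2-A5 (1.4 1.9 2.5) 5.' element by element."""
    items = []
    sublist = []  # parameter items in the current run of names
    for tok in TOKEN.findall(text):
        if tok.startswith("("):
            inits = _numbers(tok[1:-1])
            if len(inits) > len(sublist):
                raise ValueError("more initial values than names in the sublist")
            # the LAST len(inits) names of the sublist receive the values
            for item, value in zip(sublist[len(sublist) - len(inits):], inits):
                item["init"] = value
            sublist = []
            continue
        count, body = 1, tok
        rep = REPEAT.fullmatch(tok)
        if rep:  # n * r or n * NAME
            count, body = int(rep.group(1)), rep.group(2)
        if NUMBER.fullmatch(body):
            items.extend({"const": float(body)} for _ in range(count))
            sublist = []  # a constant ends the current name sublist
            continue
        rng = RANGE.fullmatch(body)
        if rng and rng.group(1).upper() == rng.group(3).upper():
            first, last = int(rng.group(2)), int(rng.group(4))
            names = [f"{rng.group(1)}{k}" for k in range(first, last + 1)]
        else:
            names = [body]
        for _ in range(count):
            for name in names:
                item = {"param": name, "init": None}
                items.append(item)
                sublist.append(item)
    return items
```

For the example list, the third item is A2 with no initial value and the next three items carry 1.4, 1.9, and 2.5, matching the description above.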
If your model contains many unconstrained parameters and it is too cumbersome to find different parameter names, you can specify all those parameters by the same prefix name. A prefix is a short name followed by a colon. The CALIS procedure generates a parameter name by appending an integer suffix to this prefix name. The prefix name should have no more than five or six characters so that the generated parameter name is not longer than eight characters. For example, if the prefix A has already been used once in a list (generating the parameter A1), the previous example would be identical to
0. 1. 4 * A: (1.4 1.9 2.5) 5.
To avoid unintentional equality constraints, the prefix names should not coincide with explicitly defined parameter names.
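The following is a hedged sketch of how successive uses of one prefix could draw consecutive suffixes. It assumes a single counter per prefix and case-insensitive names, which is consistent with the A1 example above but is not a documented implementation detail. The class `PrefixNamer` is hypothetical.

```python
from collections import defaultdict


class PrefixNamer:
    """Hypothetical model of prefix-name expansion: A: -> A1, A2, ..."""

    def __init__(self):
        # next integer suffix for each prefix; SAS names are case-insensitive
        self._next = defaultdict(lambda: 1)

    def take(self, prefix, count=1):
        key = prefix.upper()
        start = self._next[key]
        names = [f"{key}{k}" for k in range(start, start + count)]
        too_long = [name for name in names if len(name) > 8]
        if too_long:
            raise ValueError(f"generated names exceed eight characters: {too_long}")
        self._next[key] = start + count
        return names


namer = PrefixNamer()
first_use = namer.take("A")     # the earlier single use of A:
from_list = namer.take("A", 4)  # 4 * A: in the rewritten example
```

Because the counter is shared, an explicitly named parameter such as A3 elsewhere in the model would silently coincide with a generated name. This is the unintentional equality constraint that the warning above refers to.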
If you do not assign initial values to the parameters (listed in parentheses following a name sublist within the pattern list), PROC CALIS assigns initial values as follows:
If the PROC CALIS statement contains a START= r option, each uninitialized parameter is given the initial value r. The uninitialized parameters in the diagonals of the central model matrices are given the initial value 10r, 100r, or r multiplied by the value specified in the DEMPHAS= option.
If the PROC CALIS statement contains a RANDOM= i option, each uninitialized parameter is given a random initial value 0 ≤ r ≤ 1. The uninitialized parameters in the diagonals of the central model matrices are given the random values multiplied by 10, 100, or the value specified in the DEMPHAS= option.
Otherwise, the initial value is set corresponding to START=0.5.
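For example, to specify a confirmatory second-order factor analysis model

$$C = F_1 F_2 P_2 F_2' F_1' + F_1 U_2^2 F_1' + U_1^2$$

with m1 = 3 first-order factors, m2 = 2 second-order factors, and n = 9 variables and the following matrix pattern,

$$F_1 = \begin{pmatrix} \mathrm{X1} & 0 & 0 \\ \mathrm{X2} & 0 & 0 \\ \mathrm{X3} & 0 & 0 \\ 0 & \mathrm{X4} & 0 \\ 0 & \mathrm{X5} & 0 \\ 0 & \mathrm{X6} & 0 \\ 0 & 0 & \mathrm{X7} \\ 0 & 0 & \mathrm{X8} \\ 0 & 0 & \mathrm{X9} \end{pmatrix}, \quad F_2 = \begin{pmatrix} \mathrm{Y1} & 0 \\ \mathrm{Y1} & \mathrm{Y2} \\ 0 & \mathrm{Y2} \end{pmatrix}, \quad P_2 = \begin{pmatrix} \mathrm{P} & 0 \\ 0 & \mathrm{P} \end{pmatrix}$$

$$U_1 = \mathrm{diag}(\mathrm{U1}, \ldots, \mathrm{U9}), \quad U_2 = \mathrm{diag}(\mathrm{V1}, \mathrm{V2}, \mathrm{V3})$$

you can specify the following COSAN and MATRIX statements: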
cosan f1(3) * f2(2) * p2(2,dia) + f1(3) * u2(3,dia) * i1(3,ide) + u1(9,dia) * i2(9,ide);
matrix f1 [ ,1] = x1-x3,
          [ ,2] = 3 * 0 x4-x6,
          [ ,3] = 6 * 0 x7-x9;
matrix u1 [1,1] = u1-u9;
matrix f2 [ ,1] = 2 * y1,
          [ ,2] = 0. 2 * y2;
matrix u2 = 3 * v:;
matrix p2 = 2 * p;
run;
The matrix pattern includes several equality constraints. Within each column of F2, the two loadings share one parameter name (Y1 in the first column, Y2 in the second), and the two variances of the second-order factors in the diagonal of matrix P2 share the parameter name P, so each group is constrained to be equal. There are many other ways to specify the same model. See Figure 19.2 for the path diagram of this model.
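To see what this COSAN specification computes, the following NumPy sketch evaluates the model covariance matrix C = F1 F2 P2 F2' F1' + F1 U2² F1' + U1² for the pattern defined by the MATRIX statements. The numeric parameter values are made up for illustration, and the function name `second_order_cov` is hypothetical. PROC CALIS estimates these values instead of taking them as input.

```python
import numpy as np


def second_order_cov(x, y, p, u, v):
    """Model covariance of the confirmatory second-order example.

    x: nine first-order loadings X1-X9, y: (Y1, Y2), p: diagonal of P2,
    u: diagonal of U1 (nine values), v: diagonal of U2 (three values).
    """
    f1 = np.zeros((9, 3))
    for k in range(9):
        f1[k, k // 3] = x[k]  # X1-X3 on factor 1, X4-X6 on 2, X7-X9 on 3
    f2 = np.array([[y[0], 0.0],
                   [y[0], y[1]],
                   [0.0, y[1]]])  # Y1 shared in column 1, Y2 in column 2
    p2 = np.diag([p, p])          # both diagonal elements use parameter P
    u2 = np.diag(v)
    u1 = np.diag(u)
    # each COSAN term is (product of matrices) * central matrix * (product)'
    return (f1 @ f2 @ p2 @ f2.T @ f1.T
            + f1 @ u2 @ u2.T @ f1.T   # term f1 * u2 * i1 with identity i1
            + u1 @ u1.T)              # term u1 * i2 with identity i2


c = second_order_cov(x=[0.7] * 9, y=[1.0, 0.8], p=1.0, u=[0.5] * 9, v=[0.3] * 3)
```

With these values, the first diagonal element is 0.49 × 1 + 0.49 × 0.09 + 0.25 = 0.7841. Variables that load on first-order factors 1 and 3 have zero covariance, because those factors share no second-order factor.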
The MATRIX statement can also be used with the FACTOR model statement. See "Using the FACTOR and MATRIX Statements" on page 608 for the usage.
RAM list-entry < , list-entry ...> ;
where list-entry represents matrix-number row-number column-number < value > < parameter-name >
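The RAM statement defines the elements of the symmetric RAM matrix model

$$v = Av + u$$

in the form of a list type input (McArdle and McDonald 1984).
The covariance structure is given by

$$C = J(I - A)^{-1} P \left((I - A)^{-1}\right)' J'$$

with selection matrix J and

$$C = \mathcal{E}\{J v v' J'\}, \quad P = \mathcal{E}\{u u'\}$$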
You can specify only one RAM statement with each PROC CALIS statement. Using the RAM statement requires that the first n variable numbers in the path diagram and in the vector v correspond to the numbers of the n manifest variables of the given covariance or correlation matrix. If you are not sure what the order of the manifest variables in the DATA= data set is, use a VAR statement to specify the order of these observed variables. Using the AUGMENT option includes the INTERCEPT variable as a manifest variable with number n + 1 in the RAM model. In this case, latent variables have to start with n + 2. The box of each manifest variable in the path diagram is assigned the number of the variable in the covariance or correlation matrix.
The selection matrix J is always a rectangular identity (IDE) matrix, and it does not have to be specified in the RAM statement. A constant matrix element is defined in a RAM statement by a list-entry with four numbers. You define a parameter element by three or four numbers followed by a name for the parameter. Separate the list entries with a comma. Each list-entry in the RAM statement corresponds to a path in the diagram, as follows:
The first number in each list entry ( matrix-number ) is the number of arrow heads of the path, which is the same as the number of the matrix in the RAM model (1 := A , 2 := P ).
The second number in each list entry ( row-number ) is the number of the node in the diagram to which the path points, which is the same as the row number of the matrix element.
The third number in each list entry ( column-number ) is the number of the node in the diagram from which the path originates, which is the same as the column number of the matrix element.
The fourth number (value) gives the (initial) value of the path coefficient. If you do not specify a fifth item, this number specifies a constant coefficient; otherwise, this number specifies the initial value of this parameter. It is not necessary to specify the fourth item. If you specify neither the fourth nor the fifth item, the constant is set to 1 by default. If the fourth item (value) is not specified for a parameter, PROC CALIS tries to compute an initial value for this parameter.
If the path coefficient is a parameter rather than a constant, then a fifth item in the list entry ( parameter-name ) is required to assign a name to the parameter. Using the same name for different paths constrains the corresponding coefficients to be equal.
If the initial value of a parameter is not specified in the list, the initial value is chosen in one of the following ways:
If the PROC CALIS statement contains a RANDOM= i option, then the parameter obtains a randomly generated initial value r, such that 0 ≤ r ≤ 1. The uninitialized parameters in the diagonals of the central model matrices are given the random values r multiplied by 10, 100, or the value specified in the DEMPHAS= option.
If the RANDOM= option is not used, PROC CALIS tries to estimate the initial values.
If the initial values cannot be estimated, the value of the START= option is used as an initial value.
If your model contains many unconstrained parameters and it is too cumbersome to find different parameter names, you can specify all those parameters by the same prefix name. A prefix is a short name followed by a colon. The CALIS procedure then generates a parameter name by appending an integer suffix to this prefix name. The prefix name should have no more than five or six characters so that the generated parameter name is not longer than eight characters. To avoid unintentional equality constraints, the prefix names should not coincide with explicitly defined parameter names.
For example, you can specify the confirmatory second-order factor analysis model (mentioned on page 595) using the following RAM model statement:
ram 1 1 10 x1, 1 2 10 x2, 1 3 10 x3,
    1 4 11 x4, 1 5 11 x5, 1 6 11 x6,
    1 7 12 x7, 1 8 12 x8, 1 9 12 x9,
    1 10 13 y1, 1 11 13 y1, 1 11 14 y2, 1 12 14 y2,
    2 1 1 u:, 2 2 2 u:, 2 3 3 u:, 2 4 4 u:, 2 5 5 u:,
    2 6 6 u:, 2 7 7 u:, 2 8 8 u:, 2 9 9 u:,
    2 10 10 v:, 2 11 11 v:, 2 12 12 v:,
    2 13 13 p, 2 14 14 p;
run;
The confirmatory second-order factor analysis model corresponds to the path diagram displayed in Figure 19.2.
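The RAM list can be turned into the matrices A and P and evaluated with C = J(I − A)^{-1} P ((I − A)^{-1})' J'. The following NumPy sketch does this for the list above. It assumes that the prefixes u: and v: expand to u1-u9 and v1-v3, which are written out by hand here. The parameter values are made up for illustration, and the function `ram_cov` is hypothetical. In the RAM list, the P entries are variances. In the COSAN version, by contrast, U1 and U2 enter the model squared.

```python
import numpy as np


def ram_cov(entries, values, n_manifest, n_nodes):
    """C = J (I - A)^-1 P ((I - A)^-1)' J' from (matrix, row, column, name) entries."""
    a = np.zeros((n_nodes, n_nodes))
    p = np.zeros((n_nodes, n_nodes))
    for matrix, row, col, name in entries:
        if matrix == 1:  # one-headed arrow: path from node col to node row
            a[row - 1, col - 1] = values[name]
        else:            # two-headed arrow: symmetric element of P
            p[row - 1, col - 1] = p[col - 1, row - 1] = values[name]
    inv = np.linalg.inv(np.eye(n_nodes) - a)
    j = np.eye(n_manifest, n_nodes)  # selection matrix: rectangular identity
    return j @ inv @ p @ inv.T @ j.T


# Manifest nodes 1-9, first-order factors 10-12, second-order factors 13-14.
entries = [(1, k, 10 + (k - 1) // 3, f"x{k}") for k in range(1, 10)]
entries += [(1, 10, 13, "y1"), (1, 11, 13, "y1"), (1, 11, 14, "y2"), (1, 12, 14, "y2")]
entries += [(2, k, k, f"u{k}") for k in range(1, 10)]
entries += [(2, 9 + k, 9 + k, f"v{k}") for k in range(1, 4)]
entries += [(2, 13, 13, "p"), (2, 14, 14, "p")]

values = {f"x{k}": 0.7 for k in range(1, 10)}
values.update({f"u{k}": 0.25 for k in range(1, 10)})
values.update({f"v{k}": 0.09 for k in range(1, 4)})
values.update({"y1": 1.0, "y2": 0.8, "p": 1.0})

c = ram_cov(entries, values, n_manifest=9, n_nodes=14)
```

With these values, the variance of the first manifest variable is 0.49 × (1 + 0.09) + 0.25 = 0.7841.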
There is a very close relationship between the RAM model algebra and the specification of structural linear models by path diagrams. See Figure 19.3 for an example.
Refer to McArdle (1980) for the interpretation of the models displayed in Figure 19.3.
LINEQS equation < , equation ...> ;
where equation represents dependent = term < + term... > and where term represents one of the following:
coefficient-name < (number) > variable-name
prefix-name < (number) > variable-name
< number > variable-name
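The LINEQS statement defines the LINEQS model

$$\eta = \beta\eta + \gamma\xi$$

$$C = J(I - \beta)^{-1} \gamma \Phi \gamma' \left((I - \beta)^{-1}\right)' J'$$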
You can specify only one LINEQS statement with each PROC CALIS statement. There are some differences from Bentler's notation in choosing the variable names. The length of each variable name is restricted to eight characters. The names of the manifest variables are defined in the DATA= input data set. The VAR statement can be used to select a subset of manifest variables in the DATA= input data set to analyze. You do not need to use a V prefix for manifest variables in the LINEQS statement, nor do you need to use a numerical suffix in any variable name. The names of the latent variables must start with the prefix letter F (for Factor); the names of the residuals must start with the prefix letters E (for Error) or D (for Disturbance). The trailing part of the variable name can contain letters or digits. The prefix letter E is used for the errors of the manifest variables, and the prefix letter D is used for the disturbances of the latent variables. The names of the manifest variables in the DATA= input data set can start with F, E, or D, but these names should not coincide with the names of latent or error variables used in the model. The left-hand side (that is, the endogenous dependent variable) of each equation should be either a manifest variable of the data set or a latent variable with prefix letter F. The left-hand-side variable should not appear on the right-hand side of the same equation; this means that matrix β should not have a nonzero diagonal element. Each equation should contain at most one E or D variable.
The equations must be separated by a comma. The order of the equations is arbitrary. The displayed output generally contains equations and terms in an order different from the input.
Coefficients to estimate are indicated in the equations by a name preceding the independent variable's name. The coefficient's name can be followed by a number inside parentheses indicating the initial value for this coefficient. A number preceding the independent variable's name indicates a constant coefficient. If neither a coefficient name nor a number precedes the independent variable's name, a constant coefficient of 1 is assumed.
If the initial value of a parameter is not specified in the equation, the initial value is chosen in one of the following ways:
If you specify the RANDOM= option in the PROC CALIS statement, the parameter obtains a randomly generated initial value r, such that 0 ≤ r ≤ 1. The uninitialized parameters in the diagonals of the central model matrices are given the nonnegative random values r multiplied by 10, 100, or the value specified in the DEMPHAS= option.
If the RANDOM= option is not used, PROC CALIS tries to estimate the initial values.
If the initial values cannot be estimated, the value of the START= option is used as an initial value.
In Bentler's notation, estimated coefficients are indicated by asterisks. Referring to a parameter in Bentler's notation requires the specification of two variable names that correspond to the row and column of the position of the parameter in the matrix. Specifying the estimated coefficients by parameter names makes it easier to impose additional constraints with program statements. You do not need any additional statements to express equality constraints. Simply specify the same name for parameters that should have equal values.
If your model contains many unconstrained parameters and it is too cumbersome to find different parameter names, you can specify all those parameters by the same prefix name. A prefix is a short name followed by a colon. The CALIS procedure then generates a parameter name by appending an integer suffix to this prefix name. The prefix name should have no more than five or six characters so that the generated parameter name is not longer than eight characters. To avoid unintentional equality constraints, the prefix names should not coincide with explicitly defined parameter names.
For example, you can specify the confirmatory second-order factor analysis model (mentioned on page 595) by using the LINEQS and STD statements:
lineqs V1 = X1 F1 + E1,
       V2 = X2 F1 + E2,
       V3 = X3 F1 + E3,
       V4 = X4 F2 + E4,
       V5 = X5 F2 + E5,
       V6 = X6 F2 + E6,
       V7 = X7 F3 + E7,
       V8 = X8 F3 + E8,
       V9 = X9 F3 + E9,
       F1 = Y1 F4 + D1,
       F2 = Y1 F4 + Y2 F5 + D2,
       F3 = Y2 F5 + D3;
std E1-E9 = 9 * U:,
    D1-D3 = 3 * V:,
    F4 F5 = 2 * P;
run;
STD assignment < , assignment ...> ;
where assignment represents variables = pattern-definition
The STD statement tells which variances are parameters to estimate and which are fixed. The STD statement can be used only with the LINEQS statement. You can specify only one STD statement with each LINEQS model statement. The STD statement defines the diagonal elements of the central model matrix Φ. These elements correspond to the variances of the exogenous variables and to the error variances of the endogenous variables. Elements that are not defined are assumed to be 0.
Each assignment consists of a variable list ( variables ) on the left-hand side and a pattern list ( pattern-definition ) on the right-hand side of an equal sign. The assignments in the STD statement must be separated by commas. The variables list on the left-hand side of the equal sign should contain only names of variables that do not appear on the left-hand side of an equation in the LINEQS statement, that is, exogenous, error, and disturbance variables.
The pattern-definition on the right-hand side is similar to that used in the MATRIX statement. Each list element on the right-hand side defines the variance of the variable on the left-hand side in the same list position. A name on the right-hand side means that the corresponding variance is a parameter to estimate. A name on the right-hand side can be followed by a number inside parentheses that gives the initial value. A number on the right-hand side means that the corresponding variance of the variable on the left-hand side is fixed. If the right-hand-side list is longer than the left-hand-side variable list, the right-hand-side list is shortened to the length of the variable list. If the right-hand-side list is shorter than the variable list, the right-hand-side list is filled with repetitions of the last item in the list.
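The length rule for STD lists can be modeled with a few lines of Python. This sketch takes variable names and pattern items that have already been expanded (names as strings, fixed values as numbers). It ignores prefixes and initial values, and the function name `match_std` is hypothetical.

```python
def match_std(variables, pattern):
    """Pair STD variables with pattern items: truncate, or pad with the last item."""
    if not pattern:
        raise ValueError("pattern-definition must not be empty")
    items = list(pattern[:len(variables)])                  # drop surplus items
    items += [pattern[-1]] * (len(variables) - len(items))  # repeat the last item
    return dict(zip(variables, items))
```

For example, `match_std(["E1", "E2", "E3"], ["VE1", 1.0])` makes the variance of E1 a parameter named VE1 and fixes the variances of E2 and E3 at 1.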
The right-hand side can also contain prefixes. A prefix is a short name followed by a colon. The CALIS procedure then generates a parameter name by appending an integer suffix to this prefix name. The prefix name should have no more than five or six characters so that the generated parameter name is not longer than eight characters. To avoid unintentional equality constraints, the prefix names should not coincide with explicitly defined parameter names. For example, if the prefix A is not used in any previous statement, this STD statement
std E1-E6=6 * A: (6 * 3.) ;
defines the six error variances as free parameters A1, ..., A6, all with starting values of 3.
COV assignment < , assignment ...> ;
where assignment represents variables < * variables2 > = pattern-definition
The COV statement tells which covariances are parameters to estimate and which are fixed. The COV statement can be used only with the LINEQS statement. The COV statement differs from the STD statement only in the meaning of the left-hand-side variables list. You can specify only one COV statement with each LINEQS statement. The COV statement defines the off-diagonal elements of the central model matrix Φ. These elements correspond to the covariances of the exogenous variables and to the error covariances of the endogenous variables. Elements that are not defined are assumed to be 0. The assignments in the COV statement must be separated by commas.
The variables list on the left-hand side of the equal sign should contain only names of variables that do not appear on the left-hand side of an equation in the LINEQS statement, that is, exogenous, error, and disturbance variables.
The pattern-definition on the right-hand side is similar to that used in the MATRIX statement. Each list element on the right-hand side defines the covariance of a pair of variables in the list on the left-hand side. A name on the right-hand side means that the corresponding covariance is a parameter to estimate, and the name can be followed by a number inside parentheses that gives the initial value. A number on the right-hand side means that the corresponding covariance is fixed. If the right-hand-side list is longer than the list of variable pairs on the left-hand side, the right-hand-side list is shortened to the number of pairs. If the right-hand-side list is shorter, it is filled with repetitions of the last item in the list.
You can use one of two alternatives to refer to parts of Φ. The first alternative uses only one variable list and refers to all distinct pairs of variables within the list. The second alternative uses two variable lists separated by an asterisk and refers to all pairs of variables among the two lists.
Using k variable names in the variables list on the left-hand side of an equal sign in a COV statement means that the parameter list (pattern-definition) on the right-hand side refers to all k(k − 1)/2 distinct variable pairs in the below-diagonal part of the Φ matrix. Order is very important. The order relation between the left-hand-side variable pairs and the right-hand-side parameter list is illustrated by the following example:
COV E1-E4 = PHI1-PHI6 ;
This is equivalent to the following specification:
COV E2 E1 = PHI1, E3 E1 = PHI2, E3 E2 = PHI3, E4 E1 = PHI4, E4 E2 = PHI5, E4 E3 = PHI6;
The symmetric elements are generated automatically. When you use prefix names on the right-hand sides, you do not have to count the exact number of parameters. For example,
COV E1-E4 = PHI: ;
generates the same list of parameter names if the prefix PHI is not used in a previous statement.
Using k1 and k2 variable names in the two lists (separated by an asterisk) on the left-hand side of an equal sign in a COV statement means that the parameter list on the right-hand side refers to all k1 × k2 distinct variable pairs in the Φ matrix. Order is very important. The order relation between the left-hand-side variable pairs and the right-hand-side parameter list is illustrated by the following example:
COV E1 E2 * E3 E4 = PHI1-PHI4 ;
This is equivalent to the following specification:
COV E1 E3 = PHI1, E1 E4 = PHI2, E2 E3 = PHI3, E2 E4 = PHI4;
The symmetric elements are generated automatically.
Using prefix names on the right-hand sides lets you achieve the same purpose without counting the number of parameters. That is,
COV E1 E2 * E3 E4 = PHI: ;
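The pair ordering for both alternatives can be modeled in Python. This sketch reproduces the two equivalences shown above. The function `cov_pairs` is hypothetical, and the variable lists are assumed to be already expanded from forms such as E1-E4.

```python
def cov_pairs(first, second=None):
    """Variable pairs, in the order a COV pattern-definition is matched to them."""
    if second is None:
        # one list: all k(k-1)/2 below-diagonal pairs, row by row
        return [(first[r], first[c]) for r in range(1, len(first)) for c in range(r)]
    # two lists: all k1 * k2 pairs, first-list variable varying slowest
    return [(a, b) for a in first for b in second]


within = dict(zip(cov_pairs(["E1", "E2", "E3", "E4"]), [f"PHI{k}" for k in range(1, 7)]))
across = dict(zip(cov_pairs(["E1", "E2"], ["E3", "E4"]), [f"PHI{k}" for k in range(1, 5)]))
```

Zipping the pairs with PHI1, PHI2, ... yields exactly the assignments in the two expanded COV statements.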
FACTOR < options > ;
You can use the FACTOR statement to specify an exploratory or confirmatory first-order factor analysis of the given covariance or correlation matrix C,

$$C = FF' + U, \quad U = \mathrm{diag}$$

or

$$C = FPF' + U, \quad P = P'$$

where U is a diagonal matrix and P is symmetric. Within this section, n denotes the number of manifest variables corresponding to the rows and columns of matrix C, and m denotes the number of latent variables (factors or components) corresponding to the columns of the loading matrix F.
You can specify only one FACTOR statement with each PROC CALIS statement. You can specify higher-order factor analysis problems using a COSAN model specification. PROC CALIS requires more computing time and memory than PROC FACTOR because it is designed for more general structural estimation problems and is unable to exploit the special properties of the unconstrained factor analysis model.
For default (exploratory) factor analysis, PROC CALIS computes initial estimates for factor loadings and unique variances by an algebraic method of approximate factor analysis. If you use a MATRIX statement together with a FACTOR model specification, initial values are computed by McDonald's (McDonald and Hartmann 1992) method (if possible). For details, see "Using the FACTOR and MATRIX Statements" on page 608. If neither of the two methods is appropriate, the initial values are set by the START= option.
The unrestricted factor analysis model is not identified because any orthogonally rotated factor loading matrix $\tilde{F} = F\Theta$, with $\Theta'\Theta = \Theta\Theta' = I$, is equivalent to the result F:

$$C = \tilde{F}\tilde{F}' + U$$

To obtain an identified factor solution, the FACTOR statement imposes zero constraints on the m(m − 1)/2 elements in the upper triangle of F by default.
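The identification argument can be checked numerically. This NumPy sketch uses an arbitrary random loading matrix with n = 6 and m = 3. It shows that an orthogonal rotation leaves FF' unchanged, and that some rotation always produces zeros in the m(m − 1)/2 upper-triangle positions, so fixing those zeros does not restrict the fit. It illustrates the algebra only, not how PROC CALIS computes its solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
f = rng.normal(size=(n, m))                       # an arbitrary loading matrix F
theta, _ = np.linalg.qr(rng.normal(size=(m, m)))  # an orthogonal m x m matrix
f_rot = f @ theta                                 # rotated loadings F * Theta

# Rotation leaves the fitted part FF' unchanged, so F alone is not identified.
same_fit = np.allclose(f @ f.T, f_rot @ f_rot.T)

# One particular rotation zeroes the upper triangle: with F' = QR, F Q = R'.
q, _ = np.linalg.qr(f.T)
f_lower = f @ q
n_zero_constraints = m * (m - 1) // 2
```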
The following options are available in the FACTOR statement.
COMPONENT COMP
computes a component analysis instead of a factor analysis (the diagonal matrix U in the model is set to 0). Note that the rank of FF' is equal to the number m of components in F. If m is smaller than the number of variables in the moment matrix C, the matrix of predicted model values is singular and maximum likelihood estimates for F cannot be computed. You should compute ULS estimates in this case.
HEYWOOD HEY
constrains the diagonal elements of U to be nonnegative; in other words, the model is replaced by

$$C = FF' + U^2, \quad U = \mathrm{diag}$$
N = m
specifies the number of first-order factors or components. The number m of factors should not exceed the number n of variables in the covariance or correlation matrix analyzed. For the saturated model, m = n, the COMP option should generally be specified for U = 0; otherwise, df < 0. For m = 0, no factor loadings are estimated, and the model is C = U, with U = diag. By default, m = 1.
NORM
normalizes the rows of the factor pattern for rotation using Kaiser's normalization.
RCONVERGE= p
RCONV= p
specifies the convergence criterion for rotation cycles. The option is applicable to rotation using the QUARTIMAX, VARIMAX, EQUAMAX, or PARSIMAX method in the ROTATE= option. Rotation stops when the scaled change of the simplicity function value is less than the RCONVERGE= value. The default convergence criterion is

$$\frac{|f_{\mathrm{new}} - f_{\mathrm{old}}|}{K} < \epsilon$$

where f_new and f_old are the simplicity function values of the current cycle and the previous cycle, respectively, K = max(1, f_old) is a scaling factor, and ε is 1E-9 by default and is modified by the RCONVERGE= value.
RITER= n
specifies the maximum number of cycles n for factor rotation using the QUARTIMAX, VARIMAX, EQUAMAX, or PARSIMAX method in the ROTATE= option. The default n is the larger of 100 and 10 times the number of variables.
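The stopping rule and the default cycle limit for orthogonal rotation can be sketched in Python. The absolute value in the scaled change is an assumption made for this illustration, and the function names `rotation_converged` and `default_riter` are hypothetical.

```python
def rotation_converged(f_new, f_old, epsilon=1e-9):
    """Scaled-change stopping rule for rotation cycles (sketch, absolute change assumed)."""
    k = max(1.0, f_old)  # scaling factor K = max(1, f_old)
    return abs(f_new - f_old) / k < epsilon


def default_riter(n_variables):
    """Default RITER= value: the larger of 100 and 10 times the number of variables."""
    return max(100, 10 * n_variables)
```

The scaling by K makes the test relative for large simplicity values and absolute for values below 1.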
ROTATE= name
R= name
specifies an orthogonal rotation. By default, ROTATE=NONE. The possible values for name are as follows:
Name | Description |
---|---|
PRINCIPAL PC | specifies a principal axis rotation. If ROTATE=PRINCIPAL is used with a factor rather than a component model, the following rotation is performed: $F_{\mathrm{new}} = F_{\mathrm{old}} T$, with $T' F_{\mathrm{old}}' F_{\mathrm{old}} T = \Lambda$, where the columns of matrix T contain the eigenvectors of $F_{\mathrm{old}}' F_{\mathrm{old}}$. |
QUARTIMAX Q | specifies quartimax rotation. |
VARIMAX V | specifies varimax rotation. |
EQUAMAX E | specifies equamax rotation. |
PARSIMAX P | specifies parsimax rotation. |
NONE | performs no rotation (default). |
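A principal axis rotation as described for ROTATE=PRINCIPAL can be sketched with NumPy's symmetric eigendecomposition. Ordering the columns by decreasing eigenvalue is an assumption of this sketch, and the function name `principal_rotation` is hypothetical.

```python
import numpy as np


def principal_rotation(f_old):
    """Rotate loadings so that F_new' F_new is diagonal (principal axis rotation)."""
    eigvals, t = np.linalg.eigh(f_old.T @ f_old)  # columns of T: eigenvectors of F'F
    t = t[:, np.argsort(eigvals)[::-1]]           # largest eigenvalue first (assumed)
    return f_old @ t


rng = np.random.default_rng(1)
f_old = rng.normal(size=(8, 3))
f_new = principal_rotation(f_old)
gram = f_new.T @ f_new
```

Because T is orthogonal, the rotated loadings reproduce the same FF' as the original ones, and only the orientation of the factors changes.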
You can specify the MATRIX statement and the FACTOR statement to compute a confirmatory first-order factor or component analysis. You can define the elements of the matrices F, P, and U of the oblique model,

$$C = FPF' + U, \quad P = P', \quad U = \mathrm{diag}$$
To specify the structure for matrix F, P, or U, you have to refer to the matrix _F_, _P_, or _U_ in the MATRIX statement. Matrix names automatically set by PROC CALIS always start with an underscore. As you name your own matrices or variables, you should avoid leading underscores.
The default matrix forms are as follows.
_F_ lower triangular matrix (0 upper triangle for problem identification, removing rotational invariance)
_P_ identity matrix (constant)
_U_ diagonal matrix
For details about specifying the elements in matrices, see the section "MATRIX Statement" on page 593. If you are using at least one MATRIX statement in connection with a FACTOR model statement, you can also use the BOUNDS or PARAMETERS statement and program statements to constrain the parameters named in the MATRIX statement. Initial estimates are computed by McDonald's (McDonald and Hartmann 1992) method. McDonald's method of computing initial values works better if you scale the factors by setting the factor variances to 1 rather than by setting the loadings of the reference variables equal to 1.
BOUNDS constraint < , constraint ...> ;
where constraint represents < number operator > parameter-list < operator number >
You can use the BOUNDS statement to define boundary constraints for any parameter that has its name specified in a MATRIX, LINEQS, STD, COV, or RAM statement or that is used in the model of an INRAM= data set. Valid operators are <=, <, >=, >, and = or, equivalently, LE, LT, GE, GT, and EQ. The following is an example of the BOUNDS statement:
bounds 0. <= a1-a9 x <= 1. , -1. <= c2-c5 , b1-b10 y >= 0. ;
You must separate boundary constraints with a comma, and you can specify more than one BOUNDS statement. The feasible region for a parameter is the intersection of all boundary constraints specified for that parameter; if a parameter has a maximum lower boundary constraint larger than its minimum upper bound, the parameter is set equal to the minimum of the upper bounds.
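The intersection rule for boundary constraints can be modeled in Python. This sketch treats each constraint on one parameter as a (lower, upper) pair, using infinities for a missing side, and applies the documented fallback when the bounds conflict. The function `feasible_interval` is hypothetical and ignores the strict-inequality caveat discussed in this section.

```python
import math


def feasible_interval(bounds):
    """Intersect the (lower, upper) boundary constraints given for one parameter."""
    lower = max((lo for lo, _ in bounds), default=-math.inf)
    upper = min((hi for _, hi in bounds), default=math.inf)
    if lower > upper:
        # conflicting bounds: the parameter is set to the minimum upper bound
        return upper, upper
    return lower, upper
```

For the parameter x in the example above, `feasible_interval([(0.0, 1.0)])` gives (0.0, 1.0). Adding a second constraint such as 2. <= x would make the bounds conflict and fix x at 1.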
If you need to compute the values of the upper or lower bounds, create a TYPE=EST data set containing _TYPE_ = UPPERBD or _TYPE_ = LOWERBD observations and use it as an INEST= or INVAR= input data set in a later PROC CALIS run.
The BOUNDS statement can contain only parameter names and numerical constants. You cannot use the names of variables created in program statements.
The active set strategies made available in PROC CALIS cannot realize the strict inequality constraints < or >. For example, you cannot specify BOUNDS x > 0; to prevent infinite values for y = log(x). Use BOUNDS x > 1E-8; instead.
If the CALIS procedure encounters negative diagonal elements in the central model matrices during the minimization process, serious convergence problems can occur. You can use the BOUNDS statement to constrain these parameters to nonnegative values. Using negative values in these locations can lead to a smaller χ² value but uninterpretable estimates.
LINCON constraint < , constraint ...> ;
where constraint represents number operator linear-term or linear-term operator number ,
and linear-term is <+-><coefficient * > parameter <<+-><coefficient * > parameter...>
The LINCON statement specifies a set of linear equality or inequality constraints of the form

$$\sum_{j=1}^{n} a_{ij} x_j \;\{\le, =, \ge\}\; b_i, \quad i = 1, \ldots, m$$

The constraints must be separated by commas. Each linear constraint i in the statement consists of a linear combination Σ_j a_ij x_j of a subset of the n parameters x_j, j = 1, ..., n, and a constant value b_i separated by a comparison operator. Valid operators are <=, <, >=, >, and = or, equivalently, LE, LT, GE, GT, and EQ. PROC CALIS cannot enforce the strict inequalities < or >. Note that the coefficients a_ij in the linear combination must be constant numbers and must be followed by an asterisk and the name of a parameter (for example, a parameter listed in the PARMS, STD, or COV statement). The following is an example of the LINCON statement that sets a linear constraint on parameters x1 and x2:
lincon x1 + 3 * x2 <= 1;
Although you can easily express boundary constraints in LINCON statements, for many applications it is much more convenient to specify both the BOUNDS and the LINCON statements in the same PROC CALIS call.
The LINCON statement can contain only parameter names, operators, and numerical constants. If you need to compute the values of the coefficients a ij or right-hand sides b i , you can run a preliminary DATA step and create a TYPE=EST data set containing _TYPE_ = LE , _TYPE_ = GE , or _TYPE_ = EQ observations, then specify this data set as an INEST= or INVAR= data set in a following PROC CALIS run.
NLINCON | NLC constraint < , constraint ...> ;
where constraint represents
number operator variable-list operator number or
variable-list operator number or
number operator variable-list
You can specify nonlinear equality and inequality constraints with the NLINCON or NLC statement. The QUANEW optimization subroutine is used when you specify nonlinear constraints using the NLINCON statement.
The syntax of the NLINCON statement is similar to that of the BOUNDS statement, except that the NLINCON statement must contain the names of variables that are defined in the program statements and are defined as continuous functions of parameters in the model. They must not be confused with the variables in the data set.
As with the BOUNDS statement, one- or two-sided constraints are allowed in the NLINCON statement; equality constraints must be one sided. Valid operators are <=, <, >=, >, and = or, equivalently, LE, LT, GE, GT, and EQ.
PROC CALIS cannot enforce the strict inequalities < or > but instead treats them as <= and >=, respectively. The listed nonlinear constraints must be separated by commas. The following is an example of the NLINCON statement that constrains the nonlinear parametric function x1 * x1 + u1, which is defined below in a program statement, to a fixed value of 1:
nlincon xx = 1;
xx = x1 * x1 + u1;
Note that x1 and u1 are parameters defined in the model. The following three NLINCON statements, which require xx1, xx2, and xx3 to be between zero and ten, are equivalent:
nlincon 0. <= xx1-xx3, xx1-xx3 <= 10;
nlincon 0. <= xx1-xx3 <= 10.;
nlincon 10. >= xx1-xx3 >= 0.;
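The bookkeeping behind these statements can be sketched in Python (an illustration, not SAS): a function stands in for the program statement that defines xx from the parameters x1 and u1, and each constraint becomes a (lower, upper) pair, with infinite bounds for one-sided constraints and lower equal to upper for an equality. The function names and the tolerance are hypothetical.

```python
import math


def program_statements(params):
    """Hypothetical stand-in for the SAS program statement xx = x1 * x1 + u1;"""
    return {"xx": params["x1"] * params["x1"] + params["u1"]}


def nlincon_satisfied(bounds, values, tol=1e-8):
    """Check lower <= value <= upper for each named constraint function.

    bounds maps a function name to (lower, upper): use -math.inf or math.inf
    for a one-sided constraint and lower == upper for an equality. The
    tolerance tol is an illustrative assumption, not a PROC CALIS default.
    """
    for name, (lower, upper) in bounds.items():
        v = values[name]
        if v < lower - tol or v > upper + tol:
            return False
    return True


values = program_statements({"x1": 0.5, "u1": 0.75})        # xx = 0.25 + 0.75
print(nlincon_satisfied({"xx": (1.0, 1.0)}, values))         # nlincon xx = 1;  True
print(nlincon_satisfied({"xx": (0.0, math.inf)}, values))    # one-sided: True

# nlincon 0. <= xx1-xx3 <= 10.;  one (lower, upper) pair per variable
box = {name: (0.0, 10.0) for name in ("xx1", "xx2", "xx3")}
print(nlincon_satisfied(box, {"xx1": 0.0, "xx2": 5.0, "xx3": 10.5}))  # False
```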
NLOPTIONS option(s) ;
Many options that are available in PROC NLP can now be specified for the optimization subroutines in PROC CALIS using the NLOPTIONS statement. The NLOPTIONS statement provides more displayed and file output on the results of the optimization process, and it permits the same set of termination criteria as in PROC NLP. These are more technical options that you may not need to specify in most cases. The available options are summarized in Table 19.2 through Table 19.4, and the options are described in detail in the following three sections.
Option | Short Description |
---|---|
Estimation Methods | |
G4= i | algorithm for computing STDERR |
Optimization Techniques | |
TECHNIQUE= name | minimization method |
UPDATE= name | update technique |
LINESEARCH= i | line-search method |
FCONV= r | relative change function convergence criterion |
GCONV= r | relative gradient convergence criterion |
INSTEP= r | initial step length (SALPHA=, RADIUS=) |
LSPRECISION= r | line-search precision |
MAXFUNC= i | maximum number of function calls |
MAXITER= i<n> | maximum number of iterations |
Miscellaneous Options | |
ASINGULAR= r | absolute singularity criterion for inversion of the information matrix |
COVSING= r | singularity tolerance of the information matrix |
MSINGULAR= r | relative M singularity criterion for inversion of the information matrix |
SINGULAR= r | singularity criterion for inversion of the Hessian |
VSINGULAR= r | relative V singularity criterion for inversion of the information matrix |
Option | Short Description |
---|---|
Options Used by All Techniques | |
ABSCONV= r | absolute function convergence criterion |
MAXFUNC= i | maximum number of function calls |
MAXITER= i<n> | maximum number of iterations |
MAXTIME= r | maximum CPU time |
MINITER= i | minimum number of iterations |
Options for Unconstrained and Linearly Constrained Techniques | |
ABSFCONV= r<n> | absolute change function convergence criterion |
ABSGCONV= r<n> | absolute gradient convergence criterion |
ABSXCONV= r<n> | absolute change parameter convergence criterion |
FCONV= r<n> | relative change function convergence criterion |
FCONV2= r<n> | function convergence criterion |
FDIGITS= r | precision in computation of the objective function |
FSIZE= r | parameter for FCONV= and GCONV= |
GCONV= r<n> | relative gradient convergence criterion |
GCONV2= r<n> | relative gradient convergence criterion |
XCONV= r<n> | relative change parameter convergence criterion |
XSIZE= r | parameter for XCONV= |
Options for Nonlinearly Constrained Techniques | |
ABSGCONV= r<n> | maximum absolute gradient of Lagrange function criterion |
FCONV2= r<n> | predicted objective function reduction criterion |
GCONV= r<n> | normalized predicted objective function reduction criterion |
Option | Short Description |
---|---|
Options for the Approximate Covariance Matrix of Parameter Estimates | |
CFACTOR= r | scalar factor for STDERR |
NOHLF | use Hessian of the objective function for STDERR |
Options for Additional Displayed Output | |
PALL | display initial and final optimization values |
PCRPJAC | display approximate Hessian matrix |
PHESSIAN | display Hessian matrix |
PHISTORY | display optimization history |
PINIT | display initial values and derivatives (PALL) |
PNLCJAC | display Jacobian matrix of nonlinear constraints (PALL) |
PRINT | display results of the optimization process |
Additional Options for Optimization Techniques | |
DAMPSTEP < =r > | controls initial line-search step size |
HESCAL= n | scaling version of Hessian or Jacobian |
LCDEACT= r | Lagrange multiplier threshold of constraint |
LCEPSILON= r | range for boundary and linear constraints |
LCSINGULAR= r | QR decomposition linear dependence criterion |
NOEIGNUM | suppress computation of determinant and inertia of matrices |
RESTART= i | restart algorithm with a steepest descent direction |
VERSION= 1 or 2 | quasi-Newton optimization technique version |
Options Documented in the PROC CALIS Statement
The following options are the same as in the PROC CALIS statement and are documented in the section PROC CALIS Statement on page 568.
G4= i
specifies the method for computing the generalized (G2 or G4) inverse of a singular matrix needed for the approximate covariance matrix of parameter estimates. This option is valid only for applications where the approximate covariance matrix of parameter estimates is found to be singular.
TECHNIQUE TECH= name
OMETHOD OM= name
specifies the optimization technique.
UPDATE UPD= name
specifies the update method for the quasi-Newton or conjugate-gradient optimization technique.
LINESEARCH LIS= i
specifies the line-search method for the CONGRA, QUANEW, and NEWRAP optimization techniques.
FCONV FTOL= r
specifies the relative function convergence criterion. For more details, see the section Termination Criteria Options on page 615.
GCONV GTOL= r
specifies the relative gradient convergence criterion. For more details, see the section Termination Criteria Options on page 615.
INSTEP SALPHA RADIUS= r
restricts the step length of an optimization algorithm during the first iterations.
LSPRECISION LSP= r
specifies the degree of accuracy that should be obtained by the line-search algorithms LIS=2 and LIS=3.
MAXFUNC MAXFU= i
specifies the maximum number i of function calls in the optimization process. For more details, see the section Termination Criteria Options on page 615.
MAXITER MAXIT= i < n >
specifies the maximum number i of iterations in the optimization process. For more details, see the section Termination Criteria Options on page 615.
ASINGULAR ASING= r
specifies an absolute singularity criterion r , r > 0, for the inversion of the information matrix, which is needed to compute the approximate covariance matrix of parameter estimates.
COVSING= r
specifies a threshold r, r > 0, that decides whether the eigenvalues of the information matrix are considered to be zero. This option is valid only for applications where the approximate covariance matrix of parameter estimates is found to be singular.
MSINGULAR MSING= r
specifies a relative singularity criterion r , r > 0, for the inversion of the information matrix, which is needed to compute the approximate covariance matrix of parameter estimates.
SINGULAR SING = r
specifies the singularity criterion r, 0 ≤ r ≤ 1, that is used for the inversion of the Hessian matrix. The default value is 1E-8.
VSINGULAR VSING= r
specifies a relative singularity criterion r , r > 0, for the inversion of the information matrix, which is needed to compute the approximate covariance matrix of parameter estimates.
Let $x^*$ be the point at which the objective function $f(\cdot)$ is optimized, and let $x^{(k)}$ be the parameter values attained at the kth iteration. All optimization techniques stop at the kth iteration if at least one of a set of termination criteria is satisfied. The specified termination criteria should allow termination in an area of sufficient size around $x^*$. You can avoid termination with respect to any of the following function, gradient, or parameter criteria by setting the corresponding option to zero. There is a default set of termination criteria for each optimization technique; most of these default settings make the criteria ineffective for termination. PROC CALIS may have problems due to rounding errors (especially in derivative evaluations) that prevent an optimizer from satisfying strong termination criteria.
Note that PROC CALIS also terminates if the point $x^{(k)}$ is fully constrained by linearly independent active linear or boundary constraints, and all Lagrange multiplier estimates of active inequality constraints are greater than a small negative tolerance.
The following options are available only in the NLOPTIONS statement (except for FCONV, GCONV, MAXFUNC, and MAXITER), and they affect the termination criteria.
The following five criteria are used by all optimization techniques.
ABSCONV ABSTOL= r
specifies an absolute function convergence criterion.
For minimization, termination requires $f^{(k)} = f(x^{(k)}) \le r$.
For maximization, termination requires $f^{(k)} = f(x^{(k)}) \ge r$.
The default value of ABSCONV is
for minimization, the negative square root of the largest double precision value
for maximization, the positive square root of the largest double precision value
MAXFUNC MAXFU= i
requires the number of function calls to be no larger than i . The default values are listed in the following table.
TECH= | MAXFUNC default |
---|---|
LEVMAR, NEWRAP, NRRIDG, TRUREG | i =125 |
DBLDOG, QUANEW | i =500 |
CONGRA | i =1000 |
The default is used if you specify MAXFUNC=0. The optimization can be terminated only after completing a full iteration. Therefore, the number of function calls that is actually performed can exceed the number that is specified by the MAXFUNC= option.
MAXITER MAXIT= i < n >
requires the number of iterations to be no larger than i . The default values are listed in the following table.
TECH= | MAXITER default |
---|---|
LEVMAR, NEWRAP, NRRIDG, TRUREG | i =50 |
DBLDOG, QUANEW | i =200 |
CONGRA | i =400 |
The default is used if you specify MAXITER=0 or you omit the MAXITER option.
The optional second value n is valid only for TECH=QUANEW with nonlinear constraints. It specifies an upper bound n for the number of iterations of the algorithm that reduces the violation of nonlinear constraints at a starting point (the feasible-point algorithm). The default value is n = 20. For example, specifying MAXITER= . 0 means that you do not want to exceed the default number of iterations during the main optimization process and that you want to suppress the feasible-point algorithm for nonlinear constraints.
MAXTIME= r
requires the CPU time to be no larger than r . The default value of the MAXTIME= option is the largest double floating point number on your computer.
MINITER MINIT= i
specifies the minimum number of iterations. The default value is i = 0.
The ABSCONV=, MAXITER=, MAXFUNC=, and MAXTIME= options are useful for dividing a time-consuming optimization problem into a series of smaller problems by using the OUTEST= and INEST= data sets.
This section contains additional termination criteria for all unconstrained, boundary, or linearly constrained optimization techniques.
ABSFCONV ABSFTOL= r < n >
specifies the absolute function convergence criterion. Termination requires a small change of the function value in successive iterations, $|f(x^{(k-1)}) - f(x^{(k)})| \le r$.
The default value is r = 0. The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
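Several options in this section accept an optional integer n. The following Python sketch shows one way to do that bookkeeping for ABSFCONV=, where termination is signaled only once $|f(x^{(k-1)}) - f(x^{(k)})| \le r$ has held for n successive iterations. The class name and the reset-on-failure rule are illustrative assumptions about what "successive" means, not PROC CALIS internals.

```python
class SuccessiveCriterion:
    """Signal termination only after a criterion holds n successive times.

    Models the optional integer n of options such as ABSFCONV= r <n>;
    the class itself is hypothetical.
    """

    def __init__(self, r, n=1):
        self.r = r
        self.n = n
        self.count = 0

    def update(self, f_prev, f_curr):
        # ABSFCONV: |f(x^(k-1)) - f(x^(k))| <= r
        if abs(f_prev - f_curr) <= self.r:
            self.count += 1
        else:
            self.count = 0  # the run of successive iterations is broken
        return self.count >= self.n


crit = SuccessiveCriterion(r=1e-3, n=2)
history = [10.0, 5.0, 4.9995, 4.9990, 4.9989]
stops = [crit.update(a, b) for a, b in zip(history, history[1:])]
print(stops)  # [False, False, True, True]
```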
ABSGCONV ABSGTOL= r < n >
specifies the absolute gradient convergence criterion. Termination requires the maximum absolute gradient element to be small, $\max_j |g_j^{(k)}| \le r$.
The default value is r = 1E-5. The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
Note: In some applications, the small default value of the ABSGCONV= criterion is too difficult to satisfy for some of the optimization techniques.
ABSXCONV ABSXTOL= r < n >
specifies the absolute parameter convergence criterion. Termination requires a small Euclidean distance between successive parameter vectors, $\| x^{(k)} - x^{(k-1)} \|_2 \le r$.
The default value is r = 0. The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
FCONV FTOL= r < n >
specifies the relative function convergence criterion. Termination requires a small relative change of the function value in successive iterations,
$$\frac{|f(x^{(k)}) - f(x^{(k-1)})|}{\max(|f(x^{(k-1)})|, \mathrm{FSIZE})} \le r$$
where FSIZE is defined by the FSIZE= option. The default value is $r = 10^{-\mathrm{FDIGITS}}$, where FDIGITS either is specified or is set by default to $-\log_{10}(\epsilon)$, where $\epsilon$ is the machine precision. The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
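A Python sketch of the relative function criterion, $|f(x^{(k)}) - f(x^{(k-1)})| / \max(|f(x^{(k-1)})|, \mathrm{FSIZE}) \le r$, together with the default $r = 10^{-\mathrm{FDIGITS}}$ with FDIGITS $= -\log_{10}(\epsilon)$. It assumes IEEE double precision for $\epsilon$; the zero-denominator guard is an addition for this sketch, not documented PROC CALIS behavior.

```python
import math
import sys


def fconv_default(fdigits=None):
    """Default FCONV= value r = 10**(-FDIGITS), FDIGITS = -log10(machine eps)."""
    if fdigits is None:
        fdigits = -math.log10(sys.float_info.epsilon)
    return 10.0 ** (-fdigits)


def fconv_met(f_prev, f_curr, r, fsize=0.0):
    """|f(x_k) - f(x_{k-1})| / max(|f(x_{k-1})|, FSIZE) <= r."""
    denom = max(abs(f_prev), fsize)
    if denom == 0.0:
        # Guard added for this sketch: the relative change is undefined at f = 0.
        return f_curr == f_prev
    return abs(f_curr - f_prev) / denom <= r


print(fconv_default())                         # about 2.2e-16 in double precision
print(fconv_met(1e-6, 0.0, 1e-4))              # False: relative change is 1
print(fconv_met(1e-6, 0.0, 1e-4, fsize=1.0))   # True: FSIZE=1 bounds the scaling
```

The example shows why FSIZE= matters: when $f$ approaches zero, a tiny absolute change can still be a large relative change.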
FCONV2 FTOL2= r < n >
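specifies another function convergence criterion. For least-squares problems, termination requires a small predicted reduction $df^{(k)} \approx f(x^{(k)}) - f(x^{(k)} + s^{(k)})$ of the objective function. The predicted reduction
$$df^{(k)} = -g^{(k)T} s^{(k)} - \tfrac{1}{2} s^{(k)T} G^{(k)} s^{(k)} = -\tfrac{1}{2} s^{(k)T} g^{(k)} \le r$$
is computed by approximating the objective function $f$ by the first two terms of the Taylor series and substituting the Newton step $s^{(k)} = -[G^{(k)}]^{-1} g^{(k)}$, where $g^{(k)}$ and $G^{(k)}$ denote the gradient and the Hessian matrix (or its approximation) at $x^{(k)}$.
The FCONV2 criterion is the unscaled version of the GCONV criterion. The default value is r = 0. The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.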
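A small numerical check in Python/NumPy (an illustration, not how PROC CALIS computes it): for the Newton step $s = -G^{-1} g$, the two-term Taylor model $-g^T s - \tfrac12 s^T G s$ of the predicted reduction equals $-\tfrac12 s^T g$. The gradient and positive definite Hessian below are made-up values.

```python
import numpy as np


def predicted_reduction(g, G):
    """Predicted reduction -g's - 0.5 s'Gs for the Newton step s = -G^{-1} g."""
    s = -np.linalg.solve(G, g)        # Newton step
    df = -g @ s - 0.5 * s @ G @ s     # two-term Taylor model of the reduction
    return df, s


g = np.array([1.0, -2.0])                  # hypothetical gradient
G = np.array([[2.0, 0.0], [0.0, 4.0]])     # hypothetical Hessian
df, s = predicted_reduction(g, G)
print(df, -0.5 * s @ g)   # both 0.75
```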
FDIGITS= r
specifies the number of accurate digits in evaluations of the objective function. Fractional values such as FDIGITS=4.7 are allowed. The default value is $r = -\log_{10}(\epsilon)$, where $\epsilon$ is the machine precision. The value of r is used for the specification of the default value of the FCONV= option.
FSIZE= r
specifies the FSIZE parameter of the relative function and relative gradient termination criteria. The default value is r = 0. See the FCONV= and GCONV= options.
GCONV GTOL= r < n >
specifies the relative gradient convergence criterion. For all techniques except the CONGRA technique, termination requires that the normalized predicted function reduction is small,
$$\frac{g^{(k)T} [G^{(k)}]^{-1} g^{(k)}}{\max(|f(x^{(k)})|, \mathrm{FSIZE})} \le r$$
where FSIZE is defined by the FSIZE= option. For the CONGRA technique (where a reliable Hessian estimate $G$ is not available),
$$\frac{\|g^{(k)}\|_2^2 \, \|s^{(k)}\|_2}{\|g^{(k)} - g^{(k-1)}\|_2 \, \max(|f(x^{(k)})|, \mathrm{FSIZE})} \le r$$
is used. The default value is r = 1E-8. The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
Note: The default setting for the GCONV= option sometimes leads to early termination far from the location of the optimum. This is especially true for the special form of this criterion used in the CONGRA optimization.
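The two forms of the GCONV= quantity can be sketched in Python/NumPy as follows (illustration only, with made-up vectors): $g^T G^{-1} g / \max(|f|, \mathrm{FSIZE})$ for techniques with a Hessian, and $\|g^{(k)}\|_2^2 \|s^{(k)}\|_2 / (\|g^{(k)} - g^{(k-1)}\|_2 \max(|f|, \mathrm{FSIZE}))$ for CONGRA, where $s^{(k)}$ is the step. With the default FSIZE=0, both quantities involve a division by zero when $f = 0$; a positive FSIZE= value avoids that.

```python
import numpy as np


def gconv_value(g, G, f, fsize=0.0):
    """Normalized predicted reduction g' G^{-1} g / max(|f|, FSIZE)."""
    return float(g @ np.linalg.solve(G, g)) / max(abs(f), fsize)


def gconv_congra_value(g, g_prev, s, f, fsize=0.0):
    """CONGRA form: ||g||^2 ||s|| / (||g - g_prev|| * max(|f|, FSIZE))."""
    num = float(g @ g) * float(np.linalg.norm(s))
    den = float(np.linalg.norm(g - g_prev)) * max(abs(f), fsize)
    return num / den


g = np.array([1.0, -2.0])
G = np.array([[2.0, 0.0], [0.0, 4.0]])
print(gconv_value(g, G, f=10.0))   # 0.15

g_k, g_km1 = np.array([0.1, 0.0]), np.array([1.0, 0.0])
s_k = np.array([0.5, 0.0])
print(gconv_congra_value(g_k, g_km1, s_k, f=2.0))   # about 0.00278
```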
GCONV2 GTOL2= r < n >
specifies another relative gradient convergence criterion. For least-squares problems and the TRUREG, LEVMAR, NRRIDG, and NEWRAP techniques, the criterion of Browne (1982) is used,
$$\max_j \frac{|g_j^{(k)}|}{\sqrt{f(x^{(k)}) \, G_{j,j}^{(k)}}} \le r$$
This criterion is not used by the other techniques. The default value is r = 0. The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
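A Python/NumPy sketch of the Browne (1982) quantity $\max_j |g_j^{(k)}| / \sqrt{f(x^{(k)}) G_{j,j}^{(k)}}$. It assumes a positive objective value and positive Hessian diagonal, as in a least-squares discrepancy function; the inputs are made up.

```python
import numpy as np


def browne_gconv2(g, G, f):
    """max_j |g_j| / sqrt(f * G_jj); assumes f > 0 and G_jj > 0."""
    return float(np.max(np.abs(g) / np.sqrt(f * np.diag(G))))


g = np.array([0.01, -0.02])              # hypothetical gradient
G = np.array([[4.0, 0.3], [0.3, 1.0]])   # only the diagonal is used
print(browne_gconv2(g, G, f=0.25))       # 0.04, from the second element
```

Each gradient element is scaled by its own curvature and by the current function value, so the criterion is insensitive to the overall scale of the discrepancy function.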
XCONV XTOL= r < n >
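specifies the relative parameter convergence criterion. Termination requires a small relative parameter change in subsequent iterations,
$$\max_j \frac{|x_j^{(k)} - x_j^{(k-1)}|}{\max(|x_j^{(k)}|, |x_j^{(k-1)}|, \mathrm{XSIZE})} \le r$$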
The default value is r = 0. The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
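For contrast, the following Python sketch computes both parameter-change quantities: the relative change used by XCONV=, $\max_j |x_j^{(k)} - x_j^{(k-1)}| / \max(|x_j^{(k)}|, |x_j^{(k-1)}|, \mathrm{XSIZE})$, and the Euclidean distance used by ABSXCONV=. Skipping parameters that are zero in both iterations is a choice made for this sketch to avoid 0/0.

```python
import math


def xconv_value(x_prev, x_curr, xsize=0.0):
    """max_j |x_j^(k) - x_j^(k-1)| / max(|x_j^(k)|, |x_j^(k-1)|, XSIZE)."""
    worst = 0.0
    for a, b in zip(x_prev, x_curr):
        den = max(abs(a), abs(b), xsize)
        if den > 0.0:  # both values zero: no change, nothing to add
            worst = max(worst, abs(b - a) / den)
    return worst


def absxconv_value(x_prev, x_curr):
    """Euclidean distance ||x^(k) - x^(k-1)||_2 used by ABSXCONV=."""
    return math.dist(x_prev, x_curr)


x_prev, x_curr = [1.0, 0.0, -2.0], [1.001, 0.0, -2.004]
print(xconv_value(x_prev, x_curr))     # about 0.002, driven by the third parameter
print(absxconv_value(x_prev, x_curr))  # about 0.00412
```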
XSIZE= r
specifies the XSIZE parameter of the relative parameter termination criterion. The default value is r = 0. See the XCONV= option.
The non-NMSIMP algorithms available for nonlinearly constrained optimization (currently only TECH=QUANEW) do not monotonically reduce either the value of the objective function or some kind of merit function that combines objective and constraint functions. Furthermore, the algorithm uses the watchdog technique with backtracking (Chamberlain et al., 1982). Therefore, no termination criteria are implemented that are based on the values ($x$ or $f$) of successive iterations. In addition to the criteria used by all optimization techniques, only three more termination criteria are currently available, and they are based on the Lagrange function
$$L(x, \lambda) = f(x) - \sum_{i=1}^{m} \lambda_i c_i(x)$$
and its gradient
$$\nabla_x L(x, \lambda) = g(x) - \sum_{i=1}^{m} \lambda_i \nabla_x c_i(x)$$
Here, $m$ denotes the total number of constraints, $c_i(x)$ denotes the $i$th constraint function, $g = g(x)$ denotes the gradient of the objective function, and $\lambda$ denotes the $m$ vector of Lagrange multipliers. The Kuhn-Tucker conditions require that the gradient of the Lagrange function is zero at the optimal point $(x^*, \lambda^*)$:
$$\nabla_x L(x^*, \lambda^*) = 0$$
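A worked example in Python/NumPy (illustrative only): for minimizing $f(x) = x_1^2 + x_2^2$ subject to $c(x) = x_1 + x_2 - 1 = 0$, the optimum is $x^* = (0.5, 0.5)$ with multiplier $\lambda^* = 1$, and the gradient of $L(x, \lambda) = f(x) - \sum_i \lambda_i c_i(x)$ vanishes there. The maximum absolute element of that gradient is the quantity that the nonlinearly constrained ABSGCONV= criterion bounds. The toy problem is made up.

```python
import numpy as np


def lagrangian_gradient(g, jac_c, lam):
    """grad_x L(x, lam) = g(x) - sum_i lam_i * grad c_i(x) = g - J' lam.

    jac_c is the m x n Jacobian whose ith row is the gradient of c_i(x).
    """
    return g - jac_c.T @ lam


# Minimize f(x) = x1^2 + x2^2 subject to c(x) = x1 + x2 - 1 = 0.
x_star = np.array([0.5, 0.5])
g = 2.0 * x_star                   # gradient of f at x*: [1, 1]
jac_c = np.array([[1.0, 1.0]])     # gradient of c
lam_star = np.array([1.0])         # Lagrange multiplier at the optimum
grad_L = lagrangian_gradient(g, jac_c, lam_star)
print(grad_L, np.max(np.abs(grad_L)))   # zero vector: the Kuhn-Tucker condition holds
```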
The termination criteria available for nonlinearly constrained optimization follow.
ABSGCONV ABSGTOL= r < n >
specifies that termination requires the maximum absolute gradient element of the Lagrange function to be small,
$$\max_j \left| \left[ \nabla_x L(x^{(k)}, \lambda^{(k)}) \right]_j \right| \le r$$
The default value is r = 1E-5. The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
FCONV2 FTOL2= r < n >
specifies that termination requires the predicted objective function reduction to be small:
$$|g(x^{(k)})^T s(x^{(k)})| + \sum_{i=1}^{m} |\lambda_i c_i| \le r$$
where $s(x^{(k)})$ is the search direction and $c_i = c_i(x^{(k)})$.
The default value is r = 1E-6. This is the criterion used by the programs VMCWD and VF02AD (Powell 1982b). The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
GCONV GTOL= r < n >
specifies that termination requires the normalized predicted objective function reduction to be small:
$$\frac{|g(x^{(k)})^T s(x^{(k)})| + \sum_{i=1}^{m} |\lambda_i c_i|}{\max(|f(x^{(k)})|, \mathrm{FSIZE})} \le r$$
where FSIZE is defined by the FSIZE= option. The default value is r = 1E-8. The optional integer value n determines the number of successive iterations for which the criterion must be satisfied before the process can be terminated.
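A Python/NumPy sketch of the two QUANEW quantities for nonlinear constraints: $|g^T s| + \sum_i |\lambda_i c_i|$ (bounded by FCONV2=) and the same quantity divided by $\max(|f|, \mathrm{FSIZE})$ (bounded by GCONV=). All input values are made up for illustration; in PROC CALIS they come from the current iterate.

```python
import numpy as np


def nlc_predicted_reduction(g, s, lam, c):
    """|g(x)' s(x)| + sum_i |lam_i * c_i(x)|, the quantity bounded by FCONV2=."""
    return abs(float(g @ s)) + float(np.sum(np.abs(lam * c)))


def nlc_gconv_value(g, s, lam, c, f, fsize=0.0):
    """The same quantity normalized by max(|f(x)|, FSIZE), bounded by GCONV=."""
    return nlc_predicted_reduction(g, s, lam, c) / max(abs(f), fsize)


g = np.array([1.0, 2.0])        # objective gradient at x^(k)
s = np.array([0.001, -0.001])   # search direction at x^(k)
lam = np.array([2.0])           # Lagrange multiplier estimates
c = np.array([0.0005])          # constraint values c_i(x^(k))
print(nlc_predicted_reduction(g, s, lam, c))   # about 0.002
print(nlc_gconv_value(g, s, lam, c, f=4.0))    # about 0.0005
```

The constraint term $\sum_i |\lambda_i c_i|$ keeps the criterion from being met while active constraints with nonzero multipliers are still violated.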
Miscellaneous Options
You can specify the following options to modify the approximate covariance matrix of parameter estimates.
CFACTOR= r
specifies the scalar factor for the covariance matrix of parameter estimates. The scalar r ≥ 0 replaces the default value c/NM. For more details, see the section Approximate Standard Errors on page 648.
NOHLF
specifies that the Hessian matrix of the objective function (rather than the Hessian matrix of the Lagrange function) is used for computing the approximate covariance matrix of parameter estimates and, therefore, the approximate standard errors.
It is theoretically not correct to use the NOHLF option. However, since most implementations use the Hessian matrix of the objective function and not the Hessian matrix of the Lagrange function for computing approximate standard errors, the NOHLF option can be used to compare the results.
You can specify the following options to obtain additional displayed output.
PALL ALL
displays information on the starting values and final values of the optimization process.
PCRPJAC PJTJ
displays the approximate Hessian matrix. If general linear or nonlinear constraints are active at the solution, the projected approximate Hessian matrix is also displayed.
PHESSIAN PHES
displays the Hessian matrix. If general linear or nonlinear constraints are active at the solution, the projected Hessian matrix is also displayed.
PHISTORY PHIS
displays the optimization history. The PHISTORY option is set automatically if the PALL or PRINT option is set.
PINIT PIN
displays the initial values and derivatives (if available). The PINIT option is set automatically if the PALL option is set.
PNLCJAC
displays the Jacobian matrix of nonlinear constraints specified by the NLINCON statement. The PNLCJAC option is set automatically if the PALL option is set.
PRINT PRI
displays the results of the optimization process, such as parameter estimates and constraints.
You can specify the following options, in addition to the options already listed, to fine-tune the optimization process. These options should not be necessary in most applications of PROC CALIS.
DAMPSTEP DS < =r >
specifies that the initial step-size value $\alpha^{(0)}$ for each line search (used by the QUANEW, CONGRA, or NEWRAP techniques) cannot be larger than r times the step-size value used in the former iteration. If the factor r is not specified, the default value is r = 2. The DAMPSTEP option can prevent the line-search algorithm from repeatedly stepping into regions where some objective functions are difficult to compute or where they can lead to floating point overflows during the computation of objective functions and their derivatives. The DAMPSTEP< =r > option can prevent time-costly function calls during line searches with very small step sizes $\alpha$ of objective functions. For more information on setting the start values of each line search, see the section Restricting the Step Length on page 672.
HESCAL HS= 0 | 1 | 2 | 3
specifies the scaling version of the Hessian or crossproduct Jacobian matrix used in NRRIDG, TRUREG, LEVMAR, NEWRAP, or DBLDOG optimization. If HS is not equal to zero, the first iteration and each restart iteration set the diagonal scaling matrix $D^{(0)} = \mathrm{diag}(d_i^{(0)})$:
$$d_i^{(0)} = \sqrt{\max(|G_{i,i}^{(0)}|, \epsilon)}$$
where $G_{i,i}^{(0)}$ are the diagonal elements of the Hessian or crossproduct Jacobian matrix. In every other iteration, the diagonal scaling matrix is updated depending on the HS option:
Value | Description |
---|---|
HS=0 | specifies that no scaling is done. |
HS=1 | specifies the Moré (1978) scaling update: $d_i^{(k+1)} = \max\left(d_i^{(k)}, \sqrt{\max(\lvert G_{i,i}^{(k)} \rvert, \epsilon)}\right)$ |
HS=2 | specifies the Dennis, Gay, and Welsch (1981) scaling update: $d_i^{(k+1)} = \max\left(0.6\, d_i^{(k)}, \sqrt{\max(\lvert G_{i,i}^{(k)} \rvert, \epsilon)}\right)$ |
HS=3 | specifies that $d_i$ is reset in each iteration: $d_i^{(k+1)} = \sqrt{\max(\lvert G_{i,i}^{(k)} \rvert, \epsilon)}$ |
In the preceding equations, $\epsilon$ is the relative machine precision. The default is HS=1 for LEVMAR minimization and HS=0 otherwise. Scaling of the Hessian or crossproduct Jacobian can be time-consuming in the case where general linear constraints are active.
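The HESCAL= update rules can be sketched in Python as follows, where $d_i^{(0)} = \sqrt{\max(|G_{i,i}^{(0)}|, \epsilon)}$ and HS=1, 2, and 3 use $\max(d_i, \cdot)$, $\max(0.6\,d_i, \cdot)$, and a plain reset, respectively. Representing "no scaling" (HS=0) as an all-ones vector, that is, the identity matrix, is an assumption of this sketch; the diagonal values are made up.

```python
import math
import sys

EPS = sys.float_info.epsilon  # relative machine precision


def _fresh(diag):
    # sqrt(max(|G_ii|, eps)) for each diagonal element of the current matrix
    return [math.sqrt(max(abs(h), EPS)) for h in diag]


def initial_scaling(diag):
    """d_i^(0), set at the first iteration and at each restart when HS != 0."""
    return _fresh(diag)


def update_scaling(d, diag, hs):
    """Return d^(k+1) from d^(k) and the diagonal of G^(k) for HESCAL=hs."""
    fresh = _fresh(diag)
    if hs == 0:
        return [1.0] * len(d)                      # no scaling (identity, assumed)
    if hs == 1:                                    # Moré (1978)
        return [max(di, fi) for di, fi in zip(d, fresh)]
    if hs == 2:                                    # Dennis, Gay, and Welsch (1981)
        return [max(0.6 * di, fi) for di, fi in zip(d, fresh)]
    if hs == 3:                                    # reset in each iteration
        return fresh
    raise ValueError("HESCAL= must be 0, 1, 2, or 3")


d0 = initial_scaling([4.0, 9.0])                   # [2.0, 3.0]
for hs in (1, 2, 3):
    print(hs, update_scaling(d0, [1.0, 16.0], hs))
```

HS=1 never lets a scale factor shrink, HS=2 lets it shrink by at most 40% per iteration, and HS=3 tracks the current diagonal exactly.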
LCDEACT LCD = r
specifies a threshold r for the Lagrange multiplier that decides whether an active inequality constraint remains active or can be deactivated. For maximization, r must be greater than zero; for minimization, r must be smaller than zero. The default is
$$r = \pm \min\left(0.01, \max\left(0.1 \times \mathrm{ABSGCONV}, 0.001 \times g_{\max}^{(k)}\right)\right)$$
where + stands for maximization, − stands for minimization, ABSGCONV is the value of the absolute gradient criterion, and $g_{\max}^{(k)}$ is the maximum absolute element of the (projected) gradient $g^{(k)}$ or $Z^T g^{(k)}$.
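The default LCDEACT= threshold, $r = \pm\min(0.01, \max(0.1 \times \mathrm{ABSGCONV}, 0.001 \times g_{\max}^{(k)}))$, written as a small Python function for illustration. The caller supplies the current maximum absolute (projected) gradient element.

```python
def lcdeact_default(absgconv, gmax, maximize=False):
    """Default threshold: +/- min(0.01, max(0.1 * ABSGCONV, 0.001 * gmax))."""
    r = min(0.01, max(0.1 * absgconv, 0.001 * gmax))
    return r if maximize else -r


print(lcdeact_default(1e-5, 2.0))                    # -0.002 (minimization)
print(lcdeact_default(1e-5, 100.0, maximize=True))   # 0.01, capped by the 0.01 bound
```

Because the threshold scales with the current gradient, constraints are deactivated more cautiously near the solution, where $g_{\max}^{(k)}$ is small.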
LCEPSILON LCEPS LCE = r
specifies the range r, r ≥ 0, for active and violated boundary and linear constraints. If the point $x^{(k)}$ satisfies the condition
$$\left| \sum_{j=1}^{n} a_{ij} x_j^{(k)} - b_i \right| \le r \, (|b_i| + 1)$$
the constraint i is recognized as an active constraint. Otherwise, the constraint i is either an inactive inequality or a violated inequality or equality constraint. The default value is r = 1E-8. During the optimization process, the introduction of rounding errors can force PROC CALIS to increase the value of r by factors of 10. If this happens, it is indicated by a message displayed in the log.
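The activity test $|\sum_j a_{ij} x_j^{(k)} - b_i| \le r(|b_i| + 1)$ as a Python sketch, reusing the LINCON example $x_1 + 3x_2 \le 1$ with the default r = 1E-8. The function name is hypothetical.

```python
def is_active(a_row, x, b, r=1e-8):
    """Constraint i is treated as active if |a_i'x - b_i| <= r * (|b_i| + 1)."""
    lhs = sum(a * xi for a, xi in zip(a_row, x))
    return abs(lhs - b) <= r * (abs(b) + 1.0)


print(is_active([1.0, 3.0], [0.25, 0.25], 1.0))   # True: lies on the boundary
print(is_active([1.0, 3.0], [0.2, 0.25], 1.0))    # False: strictly inside
```

The factor $|b_i| + 1$ makes the tolerance relative for large right-hand sides and absolute for right-hand sides near zero.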
LCSINGULAR LCSING LCS = r
specifies a criterion r, r ≥ 0, used in the update of the QR decomposition that decides whether an active constraint is linearly dependent on a set of other active constraints. The default is r = 1E-8. The larger r becomes, the more the active constraints are recognized as being linearly dependent.
NOEIGNUM
suppresses the computation and displayed output of the determinant and the inertia of the Hessian, crossproduct Jacobian, and covariance matrices. The inertia of a symmetric matrix consists of the numbers of negative, positive, and zero eigenvalues. For large applications, the NOEIGNUM option can save computer time.
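To show what NOEIGNUM skips, the following Python/NumPy sketch computes the inertia of a symmetric matrix from its eigenvalues. The relative tolerance that decides which eigenvalues count as zero is an assumption of this sketch; the full eigendecomposition is the cost that becomes noticeable for large matrices.

```python
import numpy as np


def inertia(matrix, tol=1e-10):
    """Return (negative, positive, zero) eigenvalue counts of a symmetric matrix.

    Eigenvalues within tol * max(1, largest |eigenvalue|) of zero count as zero.
    """
    eig = np.linalg.eigvalsh(matrix)
    cutoff = tol * max(1.0, float(np.max(np.abs(eig))))
    neg = int(np.sum(eig < -cutoff))
    pos = int(np.sum(eig > cutoff))
    return neg, pos, len(eig) - neg - pos


H = np.diag([2.0, -1.0, 0.0])   # hypothetical indefinite, singular matrix
print(inertia(H))               # (1, 1, 1)
```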
RESTART REST = i
specifies that the QUANEW or CONGRA algorithm is restarted with a steepest descent/ascent search direction after at most i iterations, i > 0. Default values are as follows:
CONGRA with UPDATE=PB: restart is done automatically, so the specified i is not used.
CONGRA with UPDATE≠PB: $i = \min(10n, 80)$, where n is the number of parameters.
QUANEW: i is the largest integer available.
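The default RESTART= rules listed above can be written as a small Python function. The UPDATE= value "FR" in the usage lines is only an example of a non-PB update, and sys.maxsize merely stands in for the largest available integer.

```python
import sys


def restart_default(tech, update, n):
    """Default RESTART= value i for QUANEW and CONGRA; None means automatic."""
    if tech == "CONGRA":
        if update == "PB":
            return None               # restarts are automatic; i is not used
        return min(10 * n, 80)        # n is the number of parameters
    if tech == "QUANEW":
        return sys.maxsize            # stands in for "largest integer available"
    raise ValueError("RESTART= applies only to QUANEW and CONGRA")


print(restart_default("CONGRA", "FR", 5))    # 50
print(restart_default("CONGRA", "FR", 12))   # 80
```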
VERSION VS= 1 | 2
specifies the version of the quasi-Newton optimization technique with nonlinear constraints.
Value | Description |
---|---|
VS=1 | specifies the update of the μ vector as in Powell (1978a, 1978b) (update like VF02AD). |
VS=2 | specifies the update of the μ vector as in Powell (1982a, 1982b) (update like VMCWD). |
The default is VS=2.
PARAMETERS PARMS parameter(s) << = > number(s) > << , > parameter(s) << = > number(s) >...> ;
The PARAMETERS statement defines additional parameters that are not elements of a model matrix to use in your own program statements. You can specify more than one PARAMETERS statement with each PROC CALIS statement. The parameters can be followed by an equal sign and a number list. The values in the number list are assigned as initial values to the preceding parameters in the parameter list. For example, each of the following statements assigns the initial values ALFA=.5 and BETA=-.5 for the parameters used in program statements:
parameters alfa beta=.5 -.5;
parameters alfa beta (.5 -.5);
parameters alfa beta .5 -.5;
parameters alfa=.5 beta (-.5);
The number of parameters and the number of values do not have to match. When there are fewer values than parameter names, either the RANDOM= or START= option is used. When there are more values than parameter names, the extra values are dropped. Parameters listed in the PARAMETERS statement can be assigned initial values by program statements or by the START= or RANDOM= option in the PROC CALIS statement.
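A minimal Python sketch of the matching rule described above: names and values are paired from left to right, extra values are dropped, and leftover names get a value from a fallback. The fallback hook is hypothetical; in PROC CALIS that role is played by the START= or RANDOM= option, whose actual values are not modeled here.

```python
def assign_initial_values(names, values, fallback):
    """Pair PARAMETERS names with listed initial values.

    zip() drops extra values; names without a value get fallback(name),
    which stands in for the START= or RANDOM= option (hypothetical hook).
    """
    result = dict(zip(names, values))
    for name in names[len(values):]:
        result[name] = fallback(name)
    return result


def start_value(name):
    return 0.5  # hypothetical constant playing the role of START=


print(assign_initial_values(["alfa", "beta"], [0.5, -0.5], start_value))
print(assign_initial_values(["alfa", "beta"], [0.5, -0.5, 9.0], start_value))  # 9.0 dropped
print(assign_initial_values(["alfa", "beta"], [0.25], start_value))            # beta from fallback
```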
Caution: The OUTRAM= and INRAM= data sets do not contain any information about the PARAMETERS statement or additional program statements.
STRUCTEQ variable < variable ...> ;
The STRUCTEQ statement is used to list the dependent variables of the structural equations. This statement is ignored if you omit the PDETERM option. This statement is useful because the term structural equation is not defined in a unique way, and PROC CALIS has difficulty identifying the structural equations.
If LINEQS statements are used, the names of the left-hand-side (dependent) variables of those equations to be treated as structural equations should be listed in the STRUCTEQ statement.
If the RAM statement is used, variable names in the STRUCTEQ statements depend on the VARNAMES statement:
If the VARNAMES statement is used, variable names must correspond to those in the VARNAMES statement.
If the VARNAMES statement is not used, variable names must correspond to the names of manifest variables or latent (F) variables.
The STRUCTEQ statement also defines the names of variables used in the causal coefficient matrix of the structural equations, $B$, for computing the Stability Coefficient of Reciprocal Causation (the largest eigenvalue of the $BB'$ matrix). If the PROC CALIS option PDETERM is used without the STRUCTEQ statement, the structural equations are defined as described in the PDETERM option. See the PROC CALIS option PDETERM on page 585 for more details.
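Following the description above literally, the stability coefficient can be sketched in Python/NumPy as the largest eigenvalue of $BB'$. The 2 x 2 coefficient matrix with reciprocal effects is a made-up example, and this sketch is not the PROC CALIS implementation.

```python
import numpy as np


def stability_coefficient(B):
    """Largest eigenvalue of B B' for the structural coefficient matrix B."""
    return float(np.max(np.linalg.eigvalsh(B @ B.T)))


# Hypothetical reciprocal effects between two endogenous variables.
B = np.array([[0.0, 0.5],
              [0.4, 0.0]])
print(stability_coefficient(B))   # about 0.25, since B B' = diag(0.25, 0.16)
```

Because $BB'$ is symmetric and positive semidefinite, the symmetric eigenvalue routine eigvalsh applies and all eigenvalues are real and nonnegative.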
VARNAMES VNAMES assignment < , assignment ...> ;
where assignment represents
matrix-id variable-names or matrix-name = matrix-name
Use the VARNAMES statement in connection with the RAM, COSAN, or FACTOR model statement to allocate names to latent variables including error and disturbance terms. This statement is not needed if you are using the LINEQS statement.
In connection with the RAM model statement, the matrix-id must be specified by the integer number as it is used in the RAM list input (1 for matrix A , 2 for matrix P ). Because the first variables of matrix A correspond to the manifest variables in the input data set, you can specify names only for the latent variables following the manifest variables in the rows of A . For example, in the RAM notation of the alienation example, you can specify the latent variables by names F1, F2, F3 and the error variables by names E1, ... , E6, D1, D2, D3 with the following statement:
vnames 1 F1-F3, 2 E1-E6 D1-D3;
If the RAM model statement is not accompanied by a VNAMES statement, default variable names are assigned using the prefixes F, E, and D with numerical suffixes: latent variables are F1, F2, ... , and error variables are E1, E2, ... .
The matrix-id must be specified by its name when used with the COSAN or FACTOR statement. The variable-names following the matrix name correspond to the columns of this matrix. The variable names corresponding to the rows of this matrix are set automatically by
the names of the manifest variables for the first matrix in each term
the column variable names of the same matrix for the central symmetric matrix in each term
the column variable names of the preceding matrix for each other matrix
You also can use the second kind of name assignment in connection with a COSAN statement. Two matrix names separated by an equal sign allocate the column names of one matrix to the column names of the other matrix. This assignment assumes that the column names of at least one of the two matrices are already allocated. For example, in the COSAN notation of the alienation example, you can specify the variable names by using the following statements to allocate names to the columns of J , A , and P :
vnames J V1-V6 F1-F3 , A =J , P E1-E6 D1-D3 ;
BY variables ;
You can specify a BY statement with PROC CALIS to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.
If your input data set is not sorted in ascending order, use one of the following alternatives:
Sort the data using the SORT procedure with a similar BY statement.
Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the CALIS procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
Create an index on the BY variables using the DATASETS procedure.
For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts . For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide .
VAR variables ;
The VAR statement lists the numeric variables to be analyzed. If the VAR statement is omitted, all numeric variables not mentioned in other statements are analyzed. You can use the VAR statement to ensure that the manifest variables appear in correct order for use in the RAM statement. Only one VAR statement can be used with each PROC CALIS statement. If you do not use all manifest variables when you specify the model with a RAM or LINEQS statement, PROC CALIS does automatic variable selection. For more information, see the section Automatic Variable Selection on page 662.
PARTIAL variables ;
If you want the analysis to be based on a partial correlation or covariance matrix, use the PARTIAL statement to list the variables used to partial out the variables in the analysis. You can specify only one PARTIAL statement with each PROC CALIS statement.
FREQ variable ;
If one variable in your data set represents the frequency of occurrence for the other values in the observation, specify the variable's name in a FREQ statement. PROC CALIS then treats the data set as if each observation appears $n_i$ times, where $n_i$ is the value of the FREQ variable for observation i. Only the integer portion of the value is used. If the value of the FREQ variable is less than 1 or is missing, that observation is not included in the analysis. The total number of observations is considered to be the sum of the FREQ values when the procedure computes significance probabilities. You can use only one FREQ statement with each PROC CALIS statement.
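The FREQ rules above (integer portion only; values below 1 or missing exclude the observation; the total is the sum of the frequencies) can be sketched in Python. Representing SAS missing values as None or NaN is an assumption of this sketch.

```python
import math


def effective_frequencies(freqs):
    """Integer part of each FREQ value; values < 1 or missing drop the row (0)."""
    out = []
    for f in freqs:
        if f is None or (isinstance(f, float) and math.isnan(f)) or f < 1:
            out.append(0)           # observation excluded from the analysis
        else:
            out.append(int(f))      # only the integer portion is used
    return out


freqs = [3.7, 1, 0.9, None, 2]
counts = effective_frequencies(freqs)
print(counts, sum(counts))   # [3, 1, 0, 0, 2] 6
```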
WEIGHT variable ;
To compute weighted covariances or correlations, specify the name of the weighting variable in a WEIGHT statement. This is often done when the error variance associated with each observation is different and the values of the weight variable are proportional to the reciprocals of the variances. You can use only one WEIGHT statement with each PROC CALIS statement. The WEIGHT and FREQ statements have a similar effect, except the WEIGHT statement does not alter the number of observations unless VARDEF=WGT or VARDEF=WDF. An observation is used in the analysis only if the WEIGHT variable is greater than 0 and is not missing.
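A Python sketch of a weighted covariance that follows the rules above: observations whose weight is missing or not greater than 0 are skipped. The divisors follow the usual SAS VARDEF= convention (DF: n − 1, N: n, WDF: sum of weights − 1, WGT: sum of weights); that convention and the omission of any PARTIAL adjustment are assumptions of this sketch.

```python
def weighted_cov(x, y, w, vardef="DF"):
    """Weighted covariance of x and y under an assumed VARDEF= divisor.

    Rows whose weight is missing (None) or not greater than 0 are skipped.
    Divisors: DF -> n - 1, N -> n, WDF -> sum(w) - 1, WGT -> sum(w).
    """
    rows = [(a, b, c) for a, b, c in zip(x, y, w) if c is not None and c > 0]
    n = len(rows)
    sw = sum(c for _, _, c in rows)
    mx = sum(c * a for a, _, c in rows) / sw     # weighted means
    my = sum(c * b for _, b, c in rows) / sw
    ss = sum(c * (a - mx) * (b - my) for a, b, c in rows)
    divisor = {"DF": n - 1, "N": n, "WDF": sw - 1, "WGT": sw}[vardef]
    return ss / divisor


x = [1.0, 2.0, 3.0, 100.0]
y = [2.0, 4.0, 6.0, 0.0]
w = [1.0, 1.0, 2.0, 0.0]      # last row has weight 0 and is excluded
print(weighted_cov(x, y, w))          # 2.75 (divisor n - 1 = 2)
print(weighted_cov(x, y, w, "WGT"))   # 1.375 (divisor sum of weights = 4)
```

With VARDEF=DF or N, the divisor depends only on the number of observations, which is why the WEIGHT statement does not alter the number of observations in that case.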
This section lists the program statements used to express the linear and nonlinear constraints on the parameters and documents the differences between program statements in PROC CALIS and program statements in the DATA step. The very different use of the ARRAY statement by PROC CALIS is also discussed. Most of the program statements that can be used in the SAS DATA step also can be used in PROC CALIS. Refer to SAS Language Reference: Dictionary for a description of the SAS program statements. You can specify the following SAS program statements to compute parameter constraints with the CALIS procedure:
ABORT ;
CALL name < ( expression < , expression ...>)> ;
DELETE;
DO < variable = expression < TO expression> < BY expression>
< , expression < TO expression> < BY expression> ...>>
< WHILE expression>
< UNTIL expression> ;
END;
GOTO statement-label ;
IF expression ;
IF expression THEN program-statement ;
ELSE program-statement ;
variable = expression ;
variable+expression ;
LINK statement-label ;
PUT <variable> <=> < ...> ;
RETURN ;
SELECT < ( expression ) > ;
STOP;
SUBSTR ( variable, index, length ) = expression ;
WHEN (expression) program-statement ;
OTHERWISE program-statement ;
For the most part, the SAS program statements work the same as they do in the SAS DATA step as documented in SAS Language Reference: Concepts . However, there are several differences that should be noted.
The ABORT statement does not allow any arguments.
The DO statement does not allow a character index variable. Thus,
do I=1,2,3;
is supported; however,
do I='A','B','C';
is not valid in PROC CALIS, although it is supported in the DATA step.
The PUT statement, used mostly for program debugging in PROC CALIS, supports only some of the features of the DATA step PUT statement, and it has some new features that the DATA step PUT statement does not have:
The CALIS procedure PUT statement does not support line pointers, factored lists, iteration factors, overprinting, _INFILE_, the colon (:) format modifier, or $.
The CALIS procedure PUT statement does support expressions enclosed in parentheses. For example, the following statement displays the square root of x:
put (sqrt(x));
The CALIS procedure PUT statement supports the print item _PDV_ to display a formatted listing of all variables in the program. For example, the following statement displays a much more readable listing of the variables than the _ALL_ print item:
put _pdv_ ;
The WHEN and OTHERWISE statements allow more than one target statement. That is, a DO/END group is not necessary when a WHEN clause executes several statements. For example, the following syntax is valid:
select;
   when ( expression1 ) statement1;
                        statement2;
   when ( expression2 ) statement3;
                        statement4;
end;
You can specify one or more PARAMETERS (or PARMS) statements to define parameters that are used in the program statements but are not defined in the model matrices (MATRIX, RAM, LINEQS, STD, or COV statement).
Parameters that are used only on the right-hand side of your program statements are called independent parameters, and parameters that are used at least once on the left-hand side of an equation in the program code are called dependent parameters. The dependent parameters are used only indirectly in the minimization process. They should be fully defined as functions of the independent parameters. The independent parameters are included in the set X of parameters used in the minimization. Be sure that all independent parameters used in your program statements are somehow connected to elements of the model matrices. Otherwise, the minimization function does not depend on those independent parameters, and the parameters vary without control (since the corresponding derivative is the constant 0). You can also use the PARMS statement to set the initial values of all independent parameters used in the program statements that are not defined as elements of model matrices.
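The dependent/independent distinction, and the requirement that every independent parameter reach the model matrices, can be checked mechanically. In this Python sketch, the program statements are assumed to be already parsed into hypothetical `(left_hand_side, [right_hand_side_names])` pairs. `model_params` is the set of parameter names that appear in the model matrices. This illustrates the rules stated above and is not the procedure's own logic.

```python
def classify_parameters(assignments):
    """Split program-statement parameters into dependent and independent sets."""
    dependent = {lhs for lhs, _ in assignments}          # on a left-hand side at least once
    used_on_rhs = {name for _, rhs in assignments for name in rhs}
    independent = used_on_rhs - dependent                # right-hand side only
    return dependent, independent

def unconnected(assignments, model_params):
    """Independent parameters that cannot influence any model-matrix element."""
    _, independent = classify_parameters(assignments)
    # Walk backward from the model-matrix parameters through the assignments.
    reached = set(model_params)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in assignments:
            if lhs in reached:
                new = set(rhs) - reached
                if new:
                    reached |= new
                    changed = True
    return independent - reached

prog = [("b1", ["a", "c"]),   # b1 = a * c;
        ("b2", ["a"]),        # b2 = a ** 2;
        ("d", ["e"])]         # d = e + 1;
print(unconnected(prog, {"b1", "b2", "x"}))
```

Here `e` is flagged because `d` does not appear in any model matrix. The minimization function therefore has a zero derivative with respect to `e`.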
ARRAY arrayname <(dimensions)>< $ ><variables and constants> ;
The ARRAY statement is similar to, but not the same as, the ARRAY statement in the DATA step. The ARRAY statement is used to associate a name with a list of variables and constants. The array name can then be used with subscripts in the program to refer to the items in the list.
The ARRAY statement supported by PROC CALIS does not support all the features of the DATA step ARRAY statement. With PROC CALIS, the ARRAY statement cannot be used to give initial values to array elements. Implicit indexing variables cannot be used; all array references must have explicit subscript expressions. Only exact array dimensions are allowed; lower-bound specifications are not supported. A maximum of six dimensions is allowed.
On the other hand, the ARRAY statement supported by PROC CALIS does allow both variables and constants to be used as array elements. Constant array elements cannot be changed. Both the dimension specification and the list of elements are optional, but at least one must be given. When the list of elements is not given or fewer elements than the size of the array are listed, array variables are created by suffixing element numbers to the array name to complete the element list.
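The following Python sketch illustrates the element-list completion rule. It makes two assumptions: each generated name is the array name followed by the element's 1-based position in the full list, and listing more elements than the array holds is an error. `complete_array` is a hypothetical helper, and constants are represented as plain numbers.

```python
from math import prod

MAX_DIMS = 6  # at most six dimensions are allowed

def complete_array(name, dims=None, elements=None):
    """Return the full element list an ARRAY statement would refer to.

    Assumption: unlisted elements are named <name><position>, e.g. b3, b4.
    """
    if dims is None and elements is None:
        raise ValueError("give a dimension specification, an element list, or both")
    elements = list(elements or [])
    if dims is None:
        dims = (len(elements),)          # size taken from the element list
    if len(dims) > MAX_DIMS:
        raise ValueError("at most six dimensions are allowed")
    size = prod(dims)
    if len(elements) > size:             # assumption: too many elements is an error
        raise ValueError("more elements listed than the array can hold")
    # Complete the list with generated names for the unlisted positions.
    return elements + [f"{name}{i}" for i in range(len(elements) + 1, size + 1)]

print(complete_array("b", (4,), ["x", 1.5]))
```

In this example, the variable `x` and the constant `1.5` occupy the first two positions. The names `b3` and `b4` are generated for the rest.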