Details


Input Data Sets

You can use four different kinds of input data sets in the CALIS procedure, and you can use them simultaneously. The DATA= data set contains the data to be analyzed, and it can be an ordinary SAS data set containing raw data or a special TYPE=COV, TYPE=UCOV, TYPE=CORR, TYPE=UCORR, TYPE=SYMATRIX, TYPE=SSCP, or TYPE=FACTOR data set containing previously computed statistics. The INEST= data set specifies an input data set that contains initial estimates for the parameters used in the optimization process, and it can also contain boundary and general linear constraints on the parameters. If the model does not change too much, you can use an OUTEST= data set from a previous PROC CALIS analysis; the initial estimates are taken from the values of the PARMS observation. The INRAM= data set names a third input data set that contains all information needed to specify the analysis model in RAM list form (except for user-written program statements). Often the INRAM= data set can be the OUTRAM= data set from a previous PROC CALIS analysis. See the section OUTRAM= SAS-data-set on page 638 for the structure of both OUTRAM= and INRAM= data sets. Using the INWGT= data set enables you to read in the weight matrix W that can be used in generalized least-squares, weighted least-squares, or diagonally weighted least-squares estimation.

DATA= SAS-data-set

A TYPE=COV, TYPE=UCOV, TYPE=CORR, or TYPE=UCORR data set can be created by the CORR procedure or various other procedures. It contains means, standard deviations, the sample size, the covariance or correlation matrix, and possibly other statistics depending on which procedure is used.

If your data set has many observations and you plan to run PROC CALIS several times, you can save computer time by first creating a TYPE=COV, TYPE=UCOV, TYPE=CORR, or TYPE=UCORR data set and using it as input to PROC CALIS. For example, assuming that PROC CALIS is first run with an OUTRAM=MOD option, you can run

  * create TYPE=COV data set;
  proc corr cov nocorr data=raw outp=cov(type=cov);
  run;
  * analysis using correlations;
  proc calis data=cov inram=mod;
  run;
  * analysis using covariances;
  proc calis cov data=cov inram=mod;
  run;

Most procedures automatically set the TYPE= option of an output data set appropriately. However, the CORR procedure sets TYPE=CORR unless an explicit TYPE= option is used. Thus, the (TYPE=COV) data set option is needed in the preceding PROC CORR step, since the output data set contains a covariance matrix. If you use a DATA step with a SET statement to modify this data set, you must declare the TYPE=COV, TYPE=UCOV, TYPE=CORR, or TYPE=UCORR attribute in the new data set.

You can use a VAR statement with PROC CALIS when reading a TYPE=COV, TYPE=UCOV, TYPE=CORR, TYPE=UCORR, or TYPE=SSCP data set to select a subset of the variables or to change the order of the variables.
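For example, assuming a TYPE=COV data set named COV that contains variables X1 through X3 and a model data set MOD (both names are illustrative, not from this section), a VAR statement can select and reorder variables:

```sas
  * analyze only X1 and X3, in reverse of their stored order;
  proc calis data=cov inram=mod;
     var X3 X1;
  run;
```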

Caution: Problems can arise from using the CORR procedure when there are missing data. By default, PROC CORR computes each covariance or correlation from all observations that have values present for the pair of variables involved (pairwise deletion). The resulting covariance or correlation matrix can have negative eigenvalues. A correlation or covariance matrix with negative eigenvalues is recognized as a singular matrix in PROC CALIS, and you cannot compute (default) generalized least-squares or maximum likelihood estimates. You can specify the RIDGE option to ridge the diagonal of such a matrix to obtain a positive definite data matrix. If the NOMISS option is used with the CORR procedure, observations with any missing values are completely omitted from the calculations (listwise deletion), and there is no possibility of negative eigenvalues (but still a chance for a singular matrix).
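As a sketch of the listwise-deletion workaround described above (the data set names RAW and COVMAT are illustrative), you can create the covariance data set with the NOMISS option:

```sas
  * listwise deletion: omit observations with any missing values;
  proc corr cov nocorr nomiss data=raw outp=covmat(type=cov);
  run;
```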

PROC CALIS can also create a TYPE=COV, TYPE=UCOV, TYPE=CORR, or TYPE=UCORR data set that includes all the information needed for repeated analyses. If the data set DATA=RAW does not contain missing values, the following statements should give the same PROC CALIS results as the previous example.

  * using correlations;
  proc calis data=raw outstat=cov inram=mod;
  run;
  * using covariances;
  proc calis cov data=cov inram=mod;
  run;

You can create a TYPE=COV, TYPE=UCOV, TYPE=CORR, TYPE=UCORR, or TYPE=SSCP data set in a DATA step. Be sure to specify the TYPE= option in parentheses after the data set name in the DATA statement, and include the _TYPE_ and _NAME_ variables. If you want to analyze the covariance matrix but your DATA= data set is a TYPE=CORR or TYPE=UCORR data set, you should include an observation with _TYPE_=STD giving the standard deviation of each variable. If you specify the COV option, PROC CALIS analyzes the recomputed covariance matrix:

  data correl(type=corr);
  input _type_ $ _name_ $ X1-X3;
  datalines;
  std   .   4.  2.  8.
  corr  X1  1.0  .   .
  corr  X2   .7 1.0  .
  corr  X3   .5  .4 1.0
  ;
  proc calis cov inram=model;
  run;

If you want to analyze the UCOV or UCORR matrix but your DATA= data set is a TYPE=COV or TYPE=CORR data set, you should include observations with _TYPE_=STD and _TYPE_=MEAN giving the standard deviation and mean of each variable.
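A minimal sketch of such a data set, assuming three illustrative variables X1-X3 with made-up means and standard deviations, is:

```sas
  * TYPE=CORR data set with MEAN and STD observations, so that
  * PROC CALIS can recompute the UCOV or UCORR matrix;
  data correl2(type=corr);
  input _type_ $ _name_ $ X1-X3;
  datalines;
  mean  .   10.  5.  20.
  std   .    4.  2.   8.
  corr  X1  1.0  .   .
  corr  X2   .7 1.0  .
  corr  X3   .5  .4 1.0
  ;
  proc calis ucov data=correl2 inram=model;
  run;
```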

INEST= SAS-data-set

You can use the INEST= (or INVAR= or ESTDATA=) input data set to specify the initial values of the parameters used in the optimization and to specify boundary constraints and the more general linear constraints that can be imposed on these parameters.

The variables of the INEST= data set must correspond to

  • a character variable _TYPE_ that indicates the type of the observation

  • n numeric variables with the parameter names used in the specified PROC CALIS model

  • the BY variables that are used in a DATA= input data set

  • a numeric variable _RHS_ (right-hand side) (needed only if linear constraints are used)

  • additional variables with names corresponding to constants used in the program statements

The content of the _TYPE_ variable defines the meaning of the observation of the INEST= data set. PROC CALIS recognizes observations with the following _TYPE_ specifications.

PARMS

specifies initial values for parameters that are defined in the model statements of PROC CALIS. The _RHS_ variable is not used. Additional variables can contain the values of constants that are referred to in program statements. At the beginning of each run of PROC CALIS, the values of the constants are read from the PARMS observation, initializing the constants in the program statements.

UPPERBD UB

specifies upper bounds with nonmissing values. The use of a missing value indicates that no upper bound is specified for the parameter. The _RHS_ variable is not used.

LOWERBD LB

specifies lower bounds with nonmissing values. The use of a missing value indicates that no lower bound is specified for the parameter. The _RHS_ variable is not used.

LE, <=, <

specifies the linear constraint ∑_j a_ij x_j ≤ b_i. The n parameter values contain the coefficients a_ij, and the _RHS_ variable contains the right-hand side b_i. The use of a missing value indicates a zero coefficient a_ij.

GE, >=, >

specifies the linear constraint ∑_j a_ij x_j ≥ b_i. The n parameter values contain the coefficients a_ij, and the _RHS_ variable contains the right-hand side b_i. The use of a missing value indicates a zero coefficient a_ij.

EQ, =

specifies the linear constraint ∑_j a_ij x_j = b_i. The n parameter values contain the coefficients a_ij, and the _RHS_ variable contains the right-hand side b_i. The use of a missing value indicates a zero coefficient a_ij.

The constraints specified in the INEST=, INVAR=, or ESTDATA= data set are added to the constraints specified in BOUNDS and LINCON statements.
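As a sketch, an INEST= data set for a hypothetical model with parameters a, b, and c might supply initial values, lower bounds, and the linear constraint a + 2b <= 5 (all names and values here are illustrative):

```sas
  data est;
  input _type_ $ a b c _rhs_;
  datalines;
  parms    1.0  0.5  0.2   .
  lowerbd  0.0  0.0   .    .
  le       1.0  2.0   .   5.0
  ;
  proc calis data=raw inest=est inram=mod;
  run;
```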

You can use an OUTEST= data set from a PROC CALIS run as an INEST= data set in a new run. However, be aware that the OUTEST= data set also contains the boundary and general linear constraints specified in the previous run of PROC CALIS. When you use this OUTEST= data set without changes as an INEST= data set, PROC CALIS adds the constraints from the data set to the constraints specified by the BOUNDS and LINCON statements. Although PROC CALIS automatically eliminates multiple identical constraints, you should avoid specifying the same constraint a second time.

INRAM= SAS-data-set

This data set is usually created in a previous run of PROC CALIS. It is useful if you want to reanalyze a problem in a different way such as using a different estimation method. You can alter an existing OUTRAM= data set, either in the DATA step or using the FSEDIT procedure, to create the INRAM= data set describing a modified model. For more details on the INRAM= data set, see the section OUTRAM= SAS-data-set on page 638.

In the case of a RAM or LINEQS analysis of linear structural equations, the OUTRAM= data set always contains the variable names of the model specified. These variable names and the model specified in the INRAM= data set are the basis of the automatic variable selection algorithm performed after reading the INRAM= data set.

INWGT= SAS-data-set

This data set enables you to specify a weight matrix other than the default matrix for the generalized, weighted, and diagonally weighted least-squares estimation methods. The specification of an INWGT= data set for unweighted least-squares or maximum likelihood estimation is ignored. For generalized and diagonally weighted least-squares estimation, the INWGT= data set must contain _TYPE_ and _NAME_ variables as well as the manifest variables used in the analysis. The value of the _NAME_ variable indicates the row index i of the weight w_ij. For weighted least squares, the INWGT= data set must contain _TYPE_, _NAME_, _NAM2_, and _NAM3_ variables as well as the manifest variables used in the analysis. The values of the _NAME_, _NAM2_, and _NAM3_ variables indicate the three indices i, j, k of the weight w_ij,kl. You can store information other than the weight matrix in the INWGT= data set, but only observations with _TYPE_=WEIGHT are used to specify the weight matrix W. This property enables you to store more than one weight matrix in the INWGT= data set. You can then run PROC CALIS with each of the weight matrices by changing only the _TYPE_ observation in the INWGT= data set with an intermediate DATA step.
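The intermediate DATA step mentioned above can be sketched as follows, assuming two weight matrices were stored with the illustrative _TYPE_ values WEIGHT and WGT2:

```sas
  * activate the second stored weight matrix;
  data wgt2;
  set wgt;
  if _type_ = 'WEIGHT' then _type_ = 'WGT1';
  else if _type_ = 'WGT2' then _type_ = 'WEIGHT';
  run;
  proc calis data=raw inwgt=wgt2 inram=mod method=gls;
  run;
```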

For more details on the INWGT= data set, see the section OUTWGT= SAS-data-set on page 643.

Output Data Sets

OUTEST= SAS-data-set

The OUTEST= (or OUTVAR=) data set is of TYPE=EST and contains the final parameter estimates, the gradient, the Hessian, and boundary and linear constraints. For METHOD=ML, METHOD=GLS, and METHOD=WLS, the OUTEST= data set also contains the approximate standard errors, the information matrix (crossproduct Jacobian), and the approximate covariance matrix of the parameter estimates ((generalized) inverse of the information matrix). If there are linear or nonlinear equality or active inequality constraints at the solution, the OUTEST= data set also contains Lagrange multipliers, the projected Hessian matrix, and the Hessian matrix of the Lagrange function.

The OUTEST= data set can be used to save the results of an optimization by PROC CALIS for another analysis with either PROC CALIS or another SAS procedure. Saving results to an OUTEST= data set is advised for expensive applications that cannot be repeated without considerable effort.
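A sketch of this workflow (data set names are illustrative):

```sas
  * expensive first run: save estimates and model;
  proc calis data=raw method=ml outest=est outram=mod;
  run;
  * later run: restart from the saved results;
  proc calis data=raw method=wls inest=est inram=mod;
  run;
```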

The OUTEST= data set contains the BY variables, two character variables _TYPE_ and _NAME_, n numeric variables corresponding to the parameters used in the model, a numeric variable _RHS_ (right-hand side) that is used for the right-hand-side value b_i of a linear constraint or for the value f = f(x*) of the objective function at the final point x* of the parameter space, and a numeric variable _ITER_ that is set to zero for initial values, set to the iteration number for the OUTITER output, and set to missing for the result output.

The _TYPE_ observations in Table 19.5 are available in the OUTEST= data set, depending on the request.

Table 19.5: _TYPE_ Observations in the OUTEST= data set

_TYPE_

Description

ACTBC

If there are active boundary constraints at the solution x*, three observations indicate which of the parameters are actively constrained, as follows.

 

_NAME_

Description

 

GE

indicates the active lower bounds

 

LE

indicates the active upper bounds

 

EQ

indicates the active masks

COV

contains the approximate covariance matrix of the parameter estimates; used in computing the approximate standard errors.

COVRANK

contains the rank of the covariance matrix of the parameter estimates.

CRPJ_LF

contains the Hessian matrix of the Lagrange function (based on CRPJAC).

CRPJAC

contains the approximate Hessian matrix used in the optimization process. This is the inverse of the information matrix.

EQ

If linear constraints are used, this observation contains the ith linear constraint ∑_j a_ij x_j = b_i. The parameter variables contain the coefficients a_ij, j = 1,...,n, the _RHS_ variable contains b_i, and _NAME_=ACTLC or _NAME_=LDACTLC.

GE

If linear constraints are used, this observation contains the ith linear constraint ∑_j a_ij x_j ≥ b_i. The parameter variables contain the coefficients a_ij, j = 1,...,n, and the _RHS_ variable contains b_i. If the constraint i is active at the solution x*, then _NAME_=ACTLC or _NAME_=LDACTLC.

GRAD

contains the gradient of the estimates.

GRAD_LF

contains the gradient of the Lagrange function. The _RHS_ variable contains the value of the Lagrange function.

HESSIAN

contains the Hessian matrix.

HESS_LF

contains the Hessian matrix of the Lagrange function (based on HESSIAN).

INFORMAT

contains the information matrix of the parameter estimates (only for METHOD=ML, METHOD=GLS, or METHOD=WLS).

INITIAL

contains the starting values of the parameter estimates.

JACNLC

contains the Jacobian of the nonlinear constraints evaluated at the final estimates.

JACOBIAN

contains the Jacobian matrix (only if the OUTJAC option is used).

LAGM BC

contains Lagrange multipliers for masks and active boundary constraints.

 

_NAME_

Description

 

GE

indicates the active lower bounds

 

LE

indicates the active upper bounds

 

EQ

indicates the active masks

LAGM LC

contains Lagrange multipliers for linear equality and active inequality constraints in pairs of observations containing the constraint number and the value of the Lagrange multiplier.

 

_NAME_

Description

 

LEC_NUM

number of the linear equality constraint

 

LEC_VAL

corresponding Lagrange multiplier value

 

LIC_NUM

number of the linear inequality constraint

 

LIC_VAL

corresponding Lagrange multiplier value

LAGM NLC

contains Lagrange multipliers for nonlinear equality and active inequality constraints in pairs of observations containing the constraint number and the value of the Lagrange multiplier.

 

_NAME_

Description

 

NLEC_NUM

number of the nonlinear equality constraint

 

NLEC_VAL

corresponding Lagrange multiplier value

 

NLIC_NUM

number of the nonlinear inequality constraint

 

NLIC_VAL

corresponding Lagrange multiplier value

LE

If linear constraints are used, this observation contains the ith linear constraint ∑_j a_ij x_j ≤ b_i. The parameter variables contain the coefficients a_ij, j = 1,...,n, and the _RHS_ variable contains b_i. If the constraint i is active at the solution x*, then _NAME_=ACTLC or _NAME_=LDACTLC.

LOWERBD LB

If boundary constraints are used, this observation contains the lower bounds. Those parameters not subjected to lower bounds contain missing values. The _RHS_ variable contains a missing value, and the _NAME_ variable is blank.

NACTBC

All parameter variables contain the number n_abc of active boundary constraints at the solution x*. The _RHS_ variable contains a missing value, and the _NAME_ variable is blank.

NACTLC

All parameter variables contain the number n_alc of active linear constraints at the solution x* that are recognized as linearly independent. The _RHS_ variable contains a missing value, and the _NAME_ variable is blank.

NLC_EQ

NLC_GE

NLC_LE

contains values and residuals of nonlinear constraints. The _NAME_ variable is described as follows.

 

_NAME_

Description

 

NLC

inactive nonlinear constraint

 

NLCACT

linearly independent active nonlinear constraint

 

NLCACTLD

linearly dependent active nonlinear constraint

 

NLDACTBC

contains the number of active boundary constraints at the solution x* that are recognized as linearly dependent. The _RHS_ variable contains a missing value, and the _NAME_ variable is blank.

NLDACTLC

contains the number of active linear constraints at the solution x* that are recognized as linearly dependent. The _RHS_ variable contains a missing value, and the _NAME_ variable is blank.

_NOBS_

contains the number of observations.

PARMS

contains the final parameter estimates. The _RHS_ variable contains the value of the objective function.

PCRPJ_LF

contains the projected Hessian matrix of the Lagrange function (based on CRPJAC).

PHESS_LF

contains the projected Hessian matrix of the Lagrange function (based on HESSIAN).

PROJCRPJ

contains the projected Hessian matrix (based on CRPJAC).

PROJGRAD

If linear constraints are used in the estimation, this observation contains the n − n_act values of the projected gradient g_Z = Z′g in the variables corresponding to the first n − n_act parameters. The _RHS_ variable contains a missing value, and the _NAME_ variable is blank.

PROJHESS

contains the projected Hessian matrix (based on HESSIAN).

SIGSQ

contains the scalar factor of the covariance matrix of the parameter estimates.

STDERR

contains approximate standard errors (only for METHOD=ML, METHOD=GLS, or METHOD=WLS).

TERMINAT

The _NAME_ variable contains the name of the termination criterion.

UPPERBD UB

If boundary constraints are used, this observation contains the upper bounds. Those parameters not subjected to upper bounds contain missing values. The _RHS_ variable contains a missing value, and the _NAME_ variable is blank.

If the technique specified by the TECH= option cannot be performed (for example, no feasible initial values can be computed, or the function value or derivatives cannot be evaluated at the starting point), the OUTEST= data set may contain only some of the observations (usually only the PARMS and GRAD observations).

OUTRAM= SAS-data-set

The OUTRAM= data set is of TYPE=RAM and contains the model specification and the computed parameter estimates. This data set is intended to be reused as an INRAM= data set to specify good initial values in a subsequent analysis by PROC CALIS.

The OUTRAM= data set contains the following variables:

  • the BY variables, if any

  • the character variable _TYPE_ , which takes the values MODEL, ESTIM, VARNAME, METHOD, and STAT

  • six additional variables whose meaning depends on the _TYPE_ of the observation

Each observation with _TYPE_ =MODEL defines one matrix in the generalized COSAN model. The additional variables are as follows.

Table 19.6: Additional Variables when _TYPE_=MODEL

Variable

Contents

_NAME_

name of the matrix (character)

_MATNR_

number for the term and matrix in the model (numeric)

_ROW_

matrix row number (numeric)

_COL_

matrix column number (numeric)

_ESTIM_

first matrix type (numeric)

_STDERR_

second matrix type (numeric)

If the generalized COSAN model has only one matrix term, the _MATNR_ variable contains only the number of the matrix in the term. If there is more than one term, then it is the term number multiplied by 10,000 plus the matrix number (assuming that there are no more than 9,999 matrices specified in the COSAN model statement).
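For instance, with more than one term, the third matrix of the second term would be encoded as 2 x 10000 + 3; a quick check of this encoding in a DATA step:

```sas
  data _null_;
  term = 2; matrix = 3;
  matnr = term * 10000 + matrix;
  put matnr=;   * writes matnr=20003 to the log;
  run;
```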

Each observation with _TYPE_ =ESTIM defines one element of a matrix in the generalized COSAN model. The variables are used as follows.

Table 19.7: Additional Variables when _TYPE_=ESTIM

Variable

Contents

_NAME_

name of the parameter (character)

_MATNR_

term and matrix location of parameter (numeric)

_ROW_

row location of parameter (numeric)

_COL_

column location of parameter (numeric)

_ESTIM_

parameter estimate or constant value (numeric)

_STDERR_

standard error of estimate (numeric)

For constants rather than estimates, the _STDERR_ variable is 0. The _STDERR_ variable is missing for ULS and DWLS estimates if NOSTDERR is specified or if the approximate standard errors are not computed.

Each observation with _TYPE_ =VARNAME defines a column variable name of a matrix in the generalized COSAN model.

The observations with _TYPE_ =METHOD and _TYPE_ =STAT are not used to build the model. The _TYPE_ =METHOD observation contains the name of the estimation method used to compute the parameter estimates in the _NAME_ variable. If METHOD=NONE is not specified, the _ESTIM_ variable of the _TYPE_ =STAT observations contains the information summarized in Table 19.8 (described in the section Assessment of Fit on page 649).

Table 19.8: _ESTIM_ Contents for _TYPE_=STAT

_NAME_

_ESTIM_

N

sample size

NPARM

number of parameters used in the model

DF

degrees of freedom

N_ACT

number of active boundary constraints for ML, GLS, and WLS estimation

FIT

fit function

GFI

goodness-of-fit index (GFI)

AGFI

adjusted GFI for degrees of freedom

RMR

root mean square residual

PGFI

parsimonious GFI of Mulaik et al. (1989)

CHISQUAR

overall χ²

P_CHISQ

probability > χ²

CHISQNUL

null (baseline) model χ²

RMSEAEST

Steiger and Lind's (1980) RMSEA index estimate

RMSEALOB

lower range of RMSEA confidence interval

RMSEAUPB

upper range of RMSEA confidence interval

P_CLOSFT

Browne and Cudeck's (1993) probability of close fit

ECVI_EST

Browne and Cudeck's (1993) ECVI estimate

ECVI_LOB

lower range of ECVI confidence interval

ECVI_UPB

upper range of ECVI confidence interval

COMPFITI

Bentler's (1989) comparative fit index

ADJCHISQ

adjusted χ² for elliptic distribution

P_ACHISQ

probability corresponding to adjusted χ²

RLSCHISQ

reweighted least-squares χ² (ML estimation only)

AIC

Akaike's information criterion

CAIC

Bozdogan's consistent information criterion

SBC

Schwarz's Bayesian criterion

CENTRALI

McDonald's centrality criterion

PARSIMON

parsimonious index of James, Mulaik, and Brett

ZTESTWH

z test of Wilson and Hilferty

BB_NONOR

Bentler-Bonett (1980) nonnormed index

BB_NORMD

Bentler-Bonett (1980) normed index

BOL_RHO1

Bollen's (1986) normed index ρ1

BOL_DEL2

Bollen's (1989a) nonnormed index Δ2

CNHOELT

Hoelter's critical N index

You can edit the OUTRAM= data set to use its contents for initial estimates in a subsequent analysis by PROC CALIS, perhaps with a slightly changed model. But you should be especially careful for _TYPE_ =MODEL when changing matrix types. The codes for the two matrix types are listed in Table 19.9.

Table 19.9: Matrix Type Codes

Code

First Matrix Type

Description

1:

IDE

identity matrix

2:

ZID

zero:identity matrix

3:

DIA

diagonal matrix

4:

ZDI

zero:diagonal matrix

5:

LOW

lower triangular matrix

6:

UPP

upper triangular matrix

7:

 

temporarily not used

8:

SYM

symmetric matrix

9:

GEN

general-type matrix

10:

BET

identity minus general-type matrix

11:

PER

selection matrix

12:

 

first matrix (J) in LINEQS model statement

13:

 

second matrix (β) in LINEQS model statement

14:

 

third matrix (γ) in LINEQS model statement

Code

Second Matrix Type

Description

0:

 

noninverse model matrix

1:

INV

inverse model matrix

2:

IMI

identity minus inverse model matrix
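Editing the OUTRAM= data set in a DATA step, as described earlier, might for example fix one loading at a constant value; the parameter name LOAD1 and the data set names here are illustrative:

```sas
  * turn the estimate named LOAD1 into the constant 1.0;
  data mod2;
  set mod;
  if _type_ = 'ESTIM' and upcase(_name_) = 'LOAD1' then do;
     _name_ = ' ';
     _estim_ = 1.0;
     _stderr_ = 0;
  end;
  run;
  proc calis data=raw inram=mod2;
  run;
```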

OUTSTAT= SAS-data-set

The OUTSTAT= data set is similar to the TYPE=COV, TYPE=UCOV, TYPE=CORR, or TYPE=UCORR data set produced by the CORR procedure. The OUTSTAT= data set contains the following variables:

  • the BY variables, if any

  • two character variables, _TYPE_ and _NAME_

  • the variables analyzed, that is, those in the VAR statement, or, if there is no VAR statement, all numeric variables not listed in any other statement but used in the analysis. (Caution: Using the LINEQS or RAM model statements selects variables automatically.)

The OUTSTAT= data set contains the following information (when available):

  • the mean and standard deviation

  • the skewness and kurtosis (if the DATA= data set is a raw data set and the KURTOSIS option is specified)

  • the number of observations

  • if the WEIGHT statement is used, sum of the weights

  • the correlation or covariance matrix to be analyzed

  • the predicted correlation or covariance matrix

  • the standardized or normalized residual correlation or covariance matrix

  • if the model contains latent variables, the predicted covariances between latent and manifest variables, and the latent variable (or factor) score regression coefficients (see the PLATCOV display option on page 586)

In addition, if the FACTOR model statement is used, the OUTSTAT= data set contains:

  • the unrotated factor loadings, the unique variances, and the matrix of factor correlations

  • the rotated factor loadings and the transformation matrix of the rotation

  • the matrix of standardized factor loadings

Each observation in the OUTSTAT= data set contains some type of statistic as indicated by the _TYPE_ variable. The values of the _TYPE_ variable are given in Table 19.10.

Table 19.10: _TYPE_ Observations in the OUTSTAT= data set

_TYPE_

Contents

MEAN

means

STD

standard deviations

USTD

uncorrected standard deviations

SKEWNESS

univariate skewness

KURTOSIS

univariate kurtosis

N

sample size

SUMWGT

sum of weights (if WEIGHT statement is used)

COV

covariances analyzed

CORR

correlations analyzed

UCOV

uncorrected covariances analyzed

UCORR

uncorrected correlations analyzed

ULSPRED

ULS predicted model values

GLSPRED

GLS predicted model values

MAXPRED

ML predicted model values

WLSPRED

WLS predicted model values

DWLSPRED

DWLS predicted model values

ULSNRES

ULS normalized residuals

GLSNRES

GLS normalized residuals

MAXNRES

ML normalized residuals

WLSNRES

WLS normalized residuals

DWLSNRES

DWLS normalized residuals

ULSSRES

ULS variance standardized residuals

GLSSRES

GLS variance standardized residuals

MAXSRES

ML variance standardized residuals

WLSSRES

WLS variance standardized residuals

DWLSSRES

DWLS variance standardized residuals

ULSASRES

ULS asymptotically standardized residuals

GLSASRES

GLS asymptotically standardized residuals

MAXASRES

ML asymptotically standardized residuals

WLSASRES

WLS asymptotically standardized residuals

DWLSASRS

DWLS asymptotically standardized residuals

UNROTATE

unrotated factor loadings

FCORR

matrix of factor correlations

UNIQUE_V

unique variances

TRANSFOR

transformation matrix of rotation

LOADINGS

rotated factor loadings

STD_LOAD

standardized factor loadings

LSSCORE

latent variable (or factor) score regression coefficients for ULS method

SCORE

latent variable (or factor) score regression coefficients other than ULS method

The _NAME_ variable contains the name of the manifest variable corresponding to each row for the covariance, correlation, predicted, and residual matrices and contains the name of the latent variable in the case of factor score regression coefficients. For other observations, _NAME_ is blank.

The unique variances and rotated loadings can be used as starting values in more difficult and constrained analyses.

If the model contains latent variables, the OUTSTAT= data set also contains the latent variable score regression coefficients and the predicted covariances between latent and manifest variables. You can use the latent variable score regression coefficients with PROC SCORE to compute factor scores.
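A sketch of this use of PROC SCORE (data set names are illustrative):

```sas
  * write statistics, including score coefficients, to OSTAT;
  proc calis data=raw outstat=ostat inram=mod;
  run;
  * compute factor scores from the score coefficients;
  proc score data=raw score=ostat out=fscores;
  run;
```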

If the analyzed matrix is a (corrected or uncorrected) covariance rather than a correlation matrix, the _TYPE_ =STD or _TYPE_ =USTD observation is not included in the OUTSTAT= data set. In this case, the standard deviations can be obtained from the diagonal elements of the covariance matrix. Dropping the _TYPE_ =STD or _TYPE_ =USTD observation prevents PROC SCORE from standardizing the observations before computing the factor scores.

OUTWGT= SAS-data-set

You can create an OUTWGT= data set that is of TYPE=WEIGHT and contains the weight matrix used in generalized, weighted, or diagonally weighted least-squares estimation. The inverse of the weight matrix is used in the corresponding fit function. The OUTWGT= data set contains the weight matrix on which the WRIDGE= and the WPENALTY= options are applied. For unweighted least-squares or maximum likelihood estimation, no OUTWGT= data set can be written. The last weight matrix used in maximum likelihood estimation is the predicted model matrix (observations with _TYPE_ =MAXPRED) that is included in the OUTSTAT= data set.

For generalized and diagonally weighted least-squares estimation, the weight matrices W of the OUTWGT= data set contain all elements w_ij, where the indices i and j correspond to all manifest variables used in the analysis. Let varnam_i be the name of the ith variable in the analysis. In this case, the OUTWGT= data set contains n observations with variables as displayed in the following table.

Table 19.11: Contents of OUTWGT= data set for GLS and DWLS Estimation

Variable

Contents

_TYPE_

WEIGHT (character)

_NAME_

name of variable varnam i (character)

varnam_1

weight w_i1 for variable varnam_1 (numeric)

varnam_n

weight w_in for variable varnam_n (numeric)

For weighted least-squares estimation, the weight matrix W of the OUTWGT= data set contains only the nonredundant elements w_ij,kl. In this case, the OUTWGT= data set contains n(n+1)(2n+1)/6 observations with variables as follows.

Table 19.12: Contents of OUTWGT= data set for WLS Estimation

Variable

Contents

_TYPE_

WEIGHT (character)

_NAME_

name of variable varnam i (character)

_NAM2_

name of variable varnam j (character)

_NAM3_

name of variable varnam k (character)

varnam_1

weight w_ij,k1 for variable varnam_1 (numeric)

varnam_n

weight w_ij,kn for variable varnam_n (numeric)

Symmetric redundant elements are set to missing values.

Missing Values

If the DATA= data set contains raw data (rather than a covariance or correlation matrix), observations with missing values for any variables in the analysis are omitted from the computations. If a covariance or correlation matrix is read, missing values are allowed as long as every pair of variables has at least one nonmissing value.

Estimation Criteria

The following five estimation methods are available in PROC CALIS:

  • unweighted least squares (ULS)

  • generalized least squares (GLS)

  • normal-theory maximum likelihood (ML)

  • weighted least squares (WLS, ADF)

  • diagonally weighted least squares (DWLS)

An INWGT= data set can be used to specify other than the default weight matrices W for GLS, WLS, and DWLS estimation.

In each case, the parameter vector is estimated iteratively by a nonlinear optimization algorithm that optimizes a goodness-of-fit function F . When n denotes the number of manifest variables, S denotes the given sample covariance or correlation matrix for a sample with size N , and C denotes the predicted moment matrix, then the fit function for unweighted least-squares estimation is

F_ULS = (1/2) Tr[(S − C)²]

For normal-theory generalized least-squares estimation, the function is

F_GLS = (1/2) Tr[(W⁻¹(S − C))²]

For normal-theory maximum likelihood estimation, the function is

F_ML = Tr(SC⁻¹) − n + ln(det(C)) − ln(det(S))

The first three functions can be expressed by the generalized weighted least-squares criterion (Browne 1982):

F_GWLS = (1/2) Tr[(W⁻¹(S − C))²]

For unweighted least squares, the weight matrix W is chosen as the identity matrix I ; for generalized least squares, the default weight matrix W is the sample covariance matrix S ; and for normal-theory maximum likelihood, W is the iteratively updated predicted moment matrix C . The values of the normal-theory maximum likelihood function F ML and the generally weighted least-squares criterion F GWLS with W = C are asymptotically equivalent.
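As an illustrative sketch (the matrix values here are made up, not from this section), the three normal-theory fit functions can be evaluated directly in SAS/IML for a given S and C:

```sas
  proc iml;
  n = 2;
  S = {4 2, 2 3};            /* sample covariance matrix */
  C = {4.1 1.9, 1.9 3.2};    /* model-predicted moment matrix */
  D = S - C;
  F_uls = 0.5 * trace(D * D);
  W = S;                     /* default GLS weight matrix */
  F_gls = 0.5 * trace(inv(W) * D * inv(W) * D);
  F_ml  = trace(S * inv(C)) - n + log(det(C)) - log(det(S));
  print F_uls F_gls F_ml;
  quit;
```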

The goodness-of-fit function that is minimized in weighted least-squares estimation is

F_WLS = Vec(s_ij − c_ij)′ W⁻¹ Vec(s_ij − c_ij)

where Vec(s_ij − c_ij) denotes the vector of the n(n+1)/2 elements of the lower triangle of the symmetric matrix S − C, and W = (w_ij,kl) is a positive definite symmetric matrix with n(n+1)/2 rows and columns.
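
The half-vectorization and the quadratic form above can be sketched as follows; the identity weight matrix is used purely for illustration (the default W is described below):

```python
import numpy as np

def vech(A):
    # stack the n(n+1)/2 lower-triangle elements (including the diagonal)
    idx = np.tril_indices(A.shape[0])
    return A[idx]

def f_wls(S, C, W):
    # F_WLS = vech(S - C)' W^{-1} vech(S - C)
    d = vech(S - C)
    return d @ np.linalg.solve(W, d)

S = np.array([[2.0, 0.5], [0.5, 1.0]])
C = np.array([[2.0, 0.3], [0.3, 1.0]])
W = np.eye(3)    # identity weight matrix, for illustration only
```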

If the moment matrix S is considered as a covariance rather than a correlation matrix, the default setting of W = (w_ij,kl) is the consistent but biased estimator of the asymptotic covariance σ_ij,kl of the sample covariance s_ij with the sample covariance s_kl,

w_ij,kl = s_ij,kl − s_ij s_kl

where

s_ij,kl = (1/N) Σ_{r=1}^{N} (z_ri − z̄_i)(z_rj − z̄_j)(z_rk − z̄_k)(z_rl − z̄_l)

The formula of the asymptotic covariances of uncorrected covariances (using the UCOV or NOINT option) is a straightforward generalization of this expression.

The resulting weight matrix W is at least positive semidefinite (except for rounding errors). Using the ASYCOV option, you can use Browne's (1984, formula (3.8)) unbiased estimators

w_ij,kl = (N(N−1) / ((N−2)(N−3))) (s_ij,kl − s_ij s_kl) − (N / ((N−2)(N−3))) (s_ik s_jl + s_il s_jk − (2/(N−1)) s_ij s_kl)

There is no guarantee that this weight matrix is positive semidefinite. However, the second part is of order O(N⁻¹) and does not destroy the positive semidefinite first part for sufficiently large N. For a large number of independent observations, default settings of the weight matrix W result in asymptotically distribution-free parameter estimates with unbiased standard errors and a correct χ² test statistic (Browne 1982, 1984).
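
The default (biased) WLS weights w_ij,kl = s_ij,kl − s_ij s_kl can be sketched directly from raw data; this is a naive O(n⁴) illustration using biased (1/N) moments as in the formula above, not production code, and the toy data set is arbitrary:

```python
import numpy as np

def adf_weights(Z):
    # Z: (N, n) raw data; returns the 4-index array w[i, j, k, l]
    N, n = Z.shape
    Zc = Z - Z.mean(axis=0)             # center each variable
    s2 = (Zc.T @ Zc) / N                # biased sample covariances s_ij
    # fourth-order central moments s_ij,kl = (1/N) sum_r z_ri z_rj z_rk z_rl
    s4 = np.einsum('ri,rj,rk,rl->ijkl', Zc, Zc, Zc, Zc) / N
    return s4 - np.einsum('ij,kl->ijkl', s2, s2)

Z = np.array([[-1.0], [-1.0], [1.0], [1.0]])   # toy one-variable data set
w = adf_weights(Z)
```

For this two-point distribution the fourth central moment equals the squared variance, so the single weight is exactly zero — a degenerate case that also illustrates why W need not be positive definite for small or peculiar samples.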

If the moment matrix S is a correlation (rather than a covariance) matrix, the default setting of W = (w_ij,kl) is the estimator of the asymptotic covariances σ_ij,kl of the correlations S = (s_ij) (Browne and Shapiro 1986; DeLeeuw 1983)

click to expand

where

click to expand

The asymptotic variances of the diagonal elements of a correlation matrix are 0. Therefore, the weight matrix computed by Browne and Shapiro's formula is always singular. In this case the goodness-of-fit function for weighted least-squares estimation is modified to

F_WLS = Σ_{i=2}^{n} Σ_{j=1}^{i−1} Σ_{k=2}^{n} Σ_{l=1}^{k−1} w^{ij,kl} (s_ij − c_ij)(s_kl − c_kl) + r Σ_{i=1}^{n} (s_ii − c_ii)²

where r is the penalty weight specified by the WPENALTY=r option and the w^{ij,kl} are the elements of the inverse of the reduced (n(n−1)/2) × (n(n−1)/2) weight matrix that contains only the nonzero rows and columns of the full weight matrix W. The second term is a penalty term to fit the diagonal elements of the moment matrix S. The default value of r = 100 can be decreased or increased by the WPENALTY= option. The often used value of r = 1 seems to be too small in many cases to fit the diagonal elements of a correlation matrix properly. If your model does not fit the diagonal of the moment matrix S, you can specify the NODIAG option to exclude the diagonal elements from the fit function.

Storing and inverting the huge weight matrix W in WLS estimation requires considerable computer resources. A compromise is the DWLS method, which uses only the diagonal of the weight matrix W from the WLS estimation in the minimization function

F_DWLS = Σ_{i=1}^{n} Σ_{j=1}^{i} (s_ij − c_ij)² / w_ij,ij

The statistical properties of DWLS estimates are still not known.

In generalized, weighted, or diagonally weighted least-squares estimation, you can replace the default weight matrix W by using an INWGT= data set. Because the diagonal elements w_ii,kk of the weight matrix W are interpreted as asymptotic variances of the sample covariances or correlations, they cannot be negative. The CALIS procedure requires a positive definite weight matrix with positive diagonal elements.

Relationships among Estimation Criteria

The five estimation functions, F_ULS, F_GLS, F_ML, F_WLS, and F_DWLS, belong to the following two groups:

  • The functions F_ULS, F_GLS, and F_ML take into account all n² elements of the symmetric residual matrix S − C. This means that each off-diagonal residual contributes to F twice, once as a lower and once as an upper triangle element.

  • The functions F_WLS and F_DWLS take into account only the n(n+1)/2 lower triangular elements of the symmetric residual matrix S − C. This means that each off-diagonal residual contributes to F only once.
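
The distinction between the two groups can be checked numerically: summing squared residuals over the full symmetric matrix counts each off-diagonal element twice, while summing over the lower triangle counts it once. A small sketch with an arbitrary symmetric residual matrix:

```python
import numpy as np

D = np.array([[0.0, 0.3, 0.1],
              [0.3, 0.0, 0.2],
              [0.1, 0.2, 0.0]])       # symmetric residual matrix S - C

full = (D ** 2).sum()                       # all n^2 elements: off-diagonals twice
tril = (D[np.tril_indices(3)] ** 2).sum()   # n(n+1)/2 elements: once each
diag = (np.diag(D) ** 2).sum()
# full = 2 * tril - diag, since only the diagonal is not doubled
```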

The F_DWLS function used in PROC CALIS differs from that used by the LISREL 7 program. Formula (1.25) of the LISREL 7 manual (Jöreskog and Sörbom 1988, p. 23) shows that LISREL places the F_DWLS function in the first group by taking into account all n² elements of the symmetric residual matrix S − C.

  • Relationship between DWLS and WLS:

    PROC CALIS: The F_DWLS and F_WLS estimation functions deliver the same results for the special case that the weight matrix W used by WLS estimation is a diagonal matrix.

    LISREL 7: This is not the case.

  • Relationship between DWLS and ULS:

    LISREL 7: The F_DWLS and F_ULS estimation functions deliver the same results for the special case that the diagonal weight matrix W used by DWLS estimation is an identity matrix (contains only 1s).

    PROC CALIS: To obtain the same results with F_DWLS and F_ULS estimation, set the diagonal weight matrix W used in DWLS estimation to

    w_ij,ij = 1/2 for i ≠ j,   w_ii,ii = 1

    Because the reciprocal elements of the weight matrix are used in the goodness-of-fit function, the off-diagonal residuals are weighted by a factor of 2.

Testing Rank Deficiency in the Approximate Covariance Matrix

The inverse of the information matrix (or approximate Hessian matrix) is used for the covariance matrix of the parameter estimates, which is needed for the computation of approximate standard errors and modification indices. The numerical condition of the information matrix (computed as the crossproduct J′J of the Jacobian matrix J) can be very poor in many practical applications, especially for the analysis of unscaled covariance data. The following four-step strategy is used for the inversion of the information matrix.

  1. The inversion (usually of a normalized matrix D⁻¹HD⁻¹) is tried using a modified form of the Bunch and Kaufman (1977) algorithm, which allows the specification of a different singularity criterion for each pivot. The following three criteria for the detection of rank loss in the information matrix are used to specify thresholds:

    • ASING specifies absolute singularity.

    • MSING specifies relative singularity depending on the whole matrix norm.

    • VSING specifies relative singularity depending on the column matrix norm.

      If no rank loss is detected, the inverse of the information matrix is used for the covariance matrix of parameter estimates, and the next two steps are skipped.

  2. The linear dependencies among the parameter subsets are displayed based on the singularity criteria.

  3. If the number of parameters t is smaller than the value specified by the G4= option (the default value is 60), the Moore-Penrose inverse is computed based on the eigenvalue decomposition of the information matrix. If you do not specify the NOPRINT option, the distribution of eigenvalues is displayed, and those eigenvalues that are set to zero in the Moore-Penrose inverse are indicated. You should inspect this eigenvalue distribution carefully.

  4. If PROC CALIS did not set the right subset of eigenvalues to zero, you can specify the COVSING= option to set a larger or smaller subset of eigenvalues to zero in a further run of PROC CALIS.

Approximate Standard Errors

Except for unweighted and diagonally weighted least-squares estimation, approximate standard errors can be computed as the square roots of the diagonal elements of the matrix

(c / NM) H⁻¹

where NM = N − 1 if a CORR or COV matrix is analyzed and NM = N if a UCORR or UCOV matrix is analyzed.

The matrix H is the approximate Hessian matrix of F evaluated at the final estimates, c = 1 for the WLS estimation method, c = 2 for the GLS and ML method, and N is the sample size. If a given correlation or covariance matrix is singular, PROC CALIS offers two ways to compute a generalized inverse of the information matrix and, therefore, two ways to compute approximate standard errors of implicitly constrained parameter estimates, t values, and modification indices. Depending on the G4= specification, either a Moore-Penrose inverse or a G2 inverse is computed. The expensive Moore-Penrose inverse computes an estimate of the null space using an eigenvalue decomposition. The cheaper G2 inverse is produced by sweeping the linearly independent rows and columns and zeroing out the dependent ones. The information matrix, the approximate covariance matrix of the parameter estimates, and the approximate standard errors are not computed in the cases of unweighted or diagonally weighted least-squares estimation.

Assessment of Fit

This section contains a collection of formulas used in computing indices to assess the goodness of fit by PROC CALIS. The following notation is used:

  • N for the sample size

  • n for the number of manifest variables

  • t for the number of parameters to estimate

  • NM = N − 1 if a CORR or COV matrix is analyzed, or NM = N if a UCORR or UCOV matrix is analyzed

  • df for the degrees of freedom

  • γ for the t vector of optimal parameter estimates

  • S = (s_ij) for the n × n input COV, CORR, UCOV, or UCORR matrix

  • C = (c_ij) for the predicted model matrix

  • W for the weight matrix ( W = I for ULS, W = S for default GLS, and W = C for ML estimates)

  • U for the n² × n² asymptotic covariance matrix of sample covariances

  • Φ(x | λ, df) for the cumulative distribution function of the noncentral chi-squared distribution with noncentrality parameter λ

The following notation is for indices that allow testing nested models by a χ² difference test:

  • f_0 for the function value of the independence model

  • df_0 for the degrees of freedom of the independence model

  • f_min = F for the function value of the fitted model

  • df_min = df for the degrees of freedom of the fitted model

The degrees of freedom df_min and the number of parameters t are adjusted automatically when there are active constraints in the analysis. The computation of many fit statistics and indices is affected. You can turn off the automatic adjustment using the NOADJDF option. See the section Counting the Degrees of Freedom on page 676 for more information.

Residuals

PROC CALIS computes four types of residuals and writes them to the OUTSTAT= data set.

  • Raw Residuals

    click to expand

    The raw residuals are displayed whenever the PALL, the PRINT, or the RESIDUAL option is specified.

  • Variance Standardized Residuals

    click to expand

    The variance standardized residuals are displayed when you specify

    • the PALL, the PRINT, or the RESIDUAL option and METHOD=NONE, METHOD=ULS, or METHOD=DWLS

    • RESIDUAL=VARSTAND

      The variance standardized residuals are equal to those computed by the EQS 3 program (Bentler 1989).

    • Asymptotically Standardized Residuals

      click to expand

      The matrix J is the n² × t Jacobian matrix dC/dγ, and Cov(γ) is the t × t asymptotic covariance matrix of parameter estimates (the inverse of the information matrix). Asymptotically standardized residuals are displayed when one of the following conditions is met:

    • The PALL, the PRINT, or the RESIDUAL option is specified, and METHOD=ML, METHOD=GLS, or METHOD=WLS, and the expensive information and Jacobian matrices are computed for some other reason.

    • RESIDUAL= ASYSTAND is specified.

      The asymptotically standardized residuals are equal to those computed by the LISREL 7 program (Jöreskog and Sörbom 1988) except for the denominator NM in the definition of matrix U.

  • Normalized Residuals

    click to expand

    where the diagonal elements u_ij,ij of the n² × n² asymptotic covariance matrix U of sample covariances are defined for the following methods.

    • GLS as u_ij,kl = (1/(N−1)) (s_ik s_jl + s_il s_jk)

    • ML as u_ij,kl = (1/(N−1)) (c_ik c_jl + c_il c_jk)

    • WLS as u_ij,kl = w_ij,kl

    Normalized residuals are displayed when one of the following conditions is met:

    • The PALL, the PRINT, or the RESIDUAL option is specified, and METHOD=ML, METHOD=GLS, or METHOD=WLS, and the expensive information and Jacobian matrices are not computed for some other reason.

    • RESIDUAL=NORM is specified.

    The normalized residuals are equal to those computed by the LISREL VI program (Jöreskog and Sörbom 1985) except for the definition of the denominator NM in matrix U.

For estimation methods that are not BGLS estimation methods (Browne 1982, 1984), such as METHOD=NONE, METHOD=ULS, or METHOD=DWLS, the assumption of an asymptotic covariance matrix U of sample covariances does not seem to be appropriate. In this case, the normalized residuals should be replaced by the more relaxed variance standardized residuals. Computation of asymptotically standardized residuals requires computing the Jacobian and information matrices. This is computationally very expensive and is done only if the Jacobian matrix has to be computed for some other reason, that is, if at least one of the following items is true:

  • The default, PRINT, or PALL displayed output is requested, and neither the NOMOD nor NOSTDERR option is specified.

  • Either the MODIFICATION (included in PALL), PCOVES, or STDERR (included in default, PRINT, and PALL output) option is requested or RESIDUAL=ASYSTAND is specified.

  • The LEVMAR or NEWRAP optimization technique is used.

  • An OUTRAM= data set is specified without using the NOSTDERR option.

  • An OUTEST= data set is specified without using the NOSTDERR option.

Since normalized residuals use an overestimate of the asymptotic covariance matrix of residuals (the diagonal of U), the normalized residuals cannot be larger than the asymptotically standardized residuals (which use the diagonal of U − J Cov(γ) J′).

Together with the residual matrices, the values of the average residual, the average off-diagonal residual, and the rank order of the largest values are displayed. The distribution of the normalized and standardized residuals is displayed also.

Goodness-of-Fit Indices Based on Residuals

The following items are computed for all five kinds of estimation: ULS, GLS, ML, WLS, and DWLS. All these indices are written to the OUTRAM= data set. The goodness of fit (GFI), adjusted goodness of fit (AGFI), and root mean square residual (RMR) are computed as in the LISREL VI program of Jöreskog and Sörbom (1985).

  • Goodness-of-Fit Index

    The goodness-of-fit index for the ULS, GLS, and ML estimation methods is

    GFI = 1 − Tr[(W⁻¹(S − C))²] / Tr[(W⁻¹S)²]

    but for WLS and DWLS estimation, it is

    GFI = 1 − (Vec(s_ij − c_ij)′ W⁻¹ Vec(s_ij − c_ij)) / (Vec(s_ij)′ W⁻¹ Vec(s_ij))

    where, for DWLS estimation, W is the diagonal weight matrix, and Vec(s_ij − c_ij) denotes the vector of the n(n+1)/2 elements of the lower triangle of the symmetric matrix S − C. For a constant weight matrix W, the goodness-of-fit index is 1 minus the ratio of the minimum function value and the function value before any model has been fitted. The GFI should be between 0 and 1. The data probably do not fit the model if the GFI is negative or much larger than 1.

  • Adjusted Goodness-of-Fit Index

    The AGFI is the GFI adjusted for the degrees of freedom of the model

    AGFI = 1 − (n(n+1) / (2 df)) (1 − GFI)

    The AGFI corresponds to the GFI in replacing the total sum of squares by the mean sum of squares.

    Caution:

    • Large n and small df can result in a negative AGFI. For example, GFI=0.90, n=19, and df=2 result in an AGFI of -8.5.

    • AGFI is not defined for a saturated model, due to division by df = 0.

    • AGFI is not sensitive to losses in df .

      The AGFI should be between 0 and 1. The data probably do not fit the model if the AGFI is negative or much larger than 1. For more information, refer to Mulaik et al. (1989).

  • Root Mean Square Residual

    The RMR is the root of the mean of the squared residuals:

    RMR = sqrt( (2 / (n(n+1))) Σ_{i=1}^{n} Σ_{j=1}^{i} (s_ij − c_ij)² )
  • Parsimonious Goodness-of-Fit Index

    The PGFI (Mulaik et al. 1989) is a modification of the GFI that takes the parsimony of the model into account:

    PGFI = (df_min / (n(n+1)/2)) GFI

    The PGFI uses the same parsimonious factor as the parsimonious normed Bentler-Bonett index (James, Mulaik, and Brett 1982).

Goodness-of-Fit Indices Based on the χ²

The following items are transformations of the overall χ² value and in general depend on the sample size N. These indices are not computed for ULS or DWLS estimates.

  • Uncorrected χ²

    The overall χ² measure is the optimum function value F multiplied by N − 1 if a CORR or COV matrix is analyzed, or multiplied by N if a UCORR or UCOV matrix is analyzed; that is, χ² = NM · F, where F is the function value at the minimum. This gives the likelihood ratio test statistic for the null hypothesis that the predicted matrix C has the specified model structure against the alternative that C is unconstrained. The χ² test is valid only if the observations are independent and identically distributed, the analysis is based on the nonstandardized sample covariance matrix S, and the sample size N is sufficiently large (Browne 1982; Bollen 1989b; Jöreskog and Sörbom 1985). For ML and GLS estimates, the variables must also have an approximately multivariate normal distribution. The notation Prob>Chi**2 means the probability under the null hypothesis of obtaining a greater χ² statistic than that observed.

  • χ² Value of the Independence Model

    The χ² value of the independence model, χ²_0 = NM · f_0, and the corresponding degrees of freedom df_0 can be used (in large samples) to evaluate the gain of explanation by fitting the specific model (Bentler 1989).

  • RMSEA Index (Steiger and Lind 1980)

    The Steiger and Lind (1980) root mean squared error approximation (RMSEA) coefficient is

    ε = sqrt( max( f_min/df_min − 1/NM, 0 ) )

    The lower and upper limits of the confidence interval are computed using the cumulative distribution function of the noncentral chi-squared distribution Φ(x | λ, df), with x = NM · F, λ_L satisfying Φ(x | λ_L, df) = 1 − α/2, and λ_U satisfying Φ(x | λ_U, df) = α/2:

    ε_L = sqrt( λ_L / (NM · df) ),   ε_U = sqrt( λ_U / (NM · df) )

    Refer to Browne and Du Toit (1992) for more details. The size of the confidence interval is defined by the option ALPHARMS=α, 0 ≤ α ≤ 1. The default is α = 0.1, which corresponds to the 90% confidence interval for the RMSEA.

  • Probability for Test of Close Fit (Browne and Cudeck 1993)

    The traditional exact χ² test hypothesis H₀: ε = 0 is replaced by the null hypothesis of close fit H₀: ε ≤ 0.05, and the exceedance probability P is computed as

    P = 1 − Φ(x | λ*, df)

    where x = NM · F and λ* = 0.05² · NM · df. The null hypothesis of close fit is rejected if P is smaller than a prespecified level (for example, P < 0.05).

  • Expected Cross Validation Index (Browne and Cudeck 1993)

    For GLS and WLS, the estimator c of the ECVI is linearly related to AIC:

    click to expand

    For ML estimation, c ML is used.

    click to expand

    The confidence interval (c_L; c_U) for c is computed using the cumulative distribution function Φ(x | λ, df) of the noncentral chi-squared distribution,

    click to expand

    with click to expand , and . The confidence interval for c ML is

    click to expand

    where c_L and c_U are computed from the corresponding noncentrality parameters λ_L and λ_U. Refer to Browne and Cudeck (1993). The size of the confidence interval is defined by the option ALPHAECV=α, 0 ≤ α ≤ 1. The default is α = 0.1, which corresponds to the 90% confidence interval for the ECVI.

  • Comparative Fit Index (Bentler 1989)

    CFI = 1 − max(NM · f_min − df_min, 0) / max(NM · f_0 − df_0, NM · f_min − df_min, 0)
  • Adjusted χ² Value (Browne 1982)

    If the variables are n-variate elliptic rather than normal and have significant amounts of multivariate kurtosis (leptokurtic or platykurtic), the χ² value can be adjusted to

    χ²_adj = χ² / η₂

    where η₂ is the multivariate relative kurtosis coefficient.

  • Normal Theory Reweighted LS χ² Value

    This index is displayed only if METHOD=ML. Instead of the function value F_ML, the reweighted goodness-of-fit function F_GWLS is used,

    χ²_GWLS = NM · F_GWLS

    where F_GWLS is the value of the function at the minimum.

  • Akaike's Information Criterion (AIC) (Akaike 1974; Akaike 1987)

    This is a criterion for selecting the best model among a number of candidate models. The model that yields the smallest value of AIC is considered the best.

    AIC = χ² − 2 df

  • Consistent Akaike's Information Criterion (CAIC) (Bozdogan 1987)

    This is another criterion, similar to AIC, for selecting the best model among alternatives. The model that yields the smallest value of CAIC is considered the best. CAIC is preferred by some people to AIC or the χ² test.

    CAIC = χ² − (ln(N) + 1) df
  • Schwarz's Bayesian Criterion (SBC) (Schwarz 1978; Sclove 1987)

    This is another criterion, similar to AIC, for selecting the best model. The model that yields the smallest value of SBC is considered the best. SBC is preferred by some people to AIC or the χ² test.

    SBC = χ² − ln(N) df
  • McDonald's Measure of Centrality (McDonald and Hartmann 1992)

    CENT = exp( −(χ² − df) / (2N) )
  • Parsimonious Normed Fit Index (James, Mulaik, and Brett 1982)

    The PNFI is a modification of Bentler and Bonett's normed fit index that takes parsimony of the model into account,

    PNFI = (df_min / df_0) (f_0 − f_min) / f_0

    The PNFI uses the same parsimonious factor as the parsimonious GFI of Mulaik et al. (1989).

  • Z-Test (Wilson and Hilferty 1931)

    The Z-Test of Wilson and Hilferty assumes an n-variate normal distribution:

    Z = ( (χ²/df)^{1/3} − (1 − 2/(9 df)) ) / sqrt( 2/(9 df) )

    Refer to McArdle (1988) and Bishop, Fienberg, and Holland (1977, p. 527) for an application of the Z-Test.

  • Nonnormed Coefficient (Bentler and Bonett 1980)

    ρ = (f_0/df_0 − f_min/df_min) / (f_0/df_0 − 1/NM)

    Refer to Tucker and Lewis (1973).

  • Normed Coefficient (Bentler and Bonett 1980)

    Δ = (f_0 − f_min) / f_0

    Mulaik et al. (1989) recommend the parsimonious weighted form PNFI.

  • Normed Index ρ₁ (Bollen 1986)

    ρ₁ = (f_0/df_0 − f_min/df_min) / (f_0/df_0)

    ρ₁ is always less than or equal to 1; ρ₁ < 0 is unlikely in practice. Refer to the discussion in Bollen (1989a).

  • Nonnormed Index Δ₂ (Bollen 1989a)

    Δ₂ = (χ²_0 − χ²_min) / (χ²_0 − df_min)

    is a modification of Bentler and Bonett's Δ that uses df and lessens the dependence on N. Refer to the discussion in Bollen (1989b). Δ₂ is identical to Mulaik et al.'s (1989) IFI2 index.

  • Critical N Index (Hoelter 1983)

    CN = χ²_crit / F + 1

    where χ²_crit is the critical chi-square value for the given df degrees of freedom and probability α = 0.05, and F is the value of the estimation criterion (minimization function). Refer to Bollen (1989b, p. 277). Hoelter (1983) suggests that CN should be at least 200; however, Bollen (1989b) notes that the CN value may lead to an overly pessimistic assessment of fit for small samples.
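
The χ²-based indices above can be illustrated numerically; this sketch assumes the reconstructions given in this section (χ² = NM · F with NM = N − 1 for a COV or CORR analysis, the RMSEA point estimate, AIC = χ² − 2 df, SBC = χ² − ln(N) df, CAIC = χ² − (ln(N)+1) df, and the Wilson-Hilferty Z) and is not the procedure's actual code:

```python
import math

def fit_indices(F, df, N):
    # NM = N - 1 when a CORR or COV matrix is analyzed
    NM = N - 1
    chi2 = NM * F
    rmsea = math.sqrt(max(F / df - 1.0 / NM, 0.0))
    aic = chi2 - 2.0 * df                      # Akaike
    sbc = chi2 - math.log(N) * df              # Schwarz Bayesian
    caic = chi2 - (math.log(N) + 1.0) * df     # consistent AIC
    # Wilson-Hilferty approximate z statistic
    z = ((chi2 / df) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df))) \
        / math.sqrt(2.0 / (9.0 * df))
    return chi2, rmsea, aic, sbc, caic, z

chi2, rmsea, aic, sbc, caic, z = fit_indices(F=0.5, df=10, N=101)
```

For F = 0.5, df = 10, N = 101 this gives χ² = 50 and an RMSEA point estimate of sqrt(0.05 − 0.01) = 0.2, which makes the roles of NM and df in the formulas easy to trace.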

Squared Multiple Correlation

The following are measures of the squared multiple correlation for manifest and endogenous variables and are computed for all five estimation methods: ULS, GLS, ML, WLS, and DWLS. These coefficients are computed as in the LISREL VI program of Jöreskog and Sörbom (1985). The DETAE, DETSE, and DETMV determination coefficients are intended to be global means of the squared multiple correlations for different subsets of model equations and variables. These coefficients are displayed only when you specify the PDETERM option with a RAM or LINEQS model.

  • R 2 Values Corresponding to Endogenous Variables

    click to expand
  • Total Determination of All Equations

    click to expand
  • Total Determination of the Structural Equations

    click to expand
  • Total Determination of the Manifest Variables

    click to expand

    Caution: In the LISREL program, the structural equations are defined by specifying the BETA matrix. In PROC CALIS, a structural equation has a dependent left-hand-side variable that appears at least once on the right-hand side of another equation, or the equation has at least one right-hand-side variable that is the left-hand-side variable of another equation. Therefore, PROC CALIS sometimes identifies more equations as structural equations than the LISREL program does.

Measures of Multivariate Kurtosis

In many applications, the manifest variables are not even approximately multivariate normal. If this happens to be the case with your data set, the default generalized least-squares and maximum likelihood estimation methods are not appropriate, and you should compute the parameter estimates and their standard errors by an asymptotically distribution-free method, such as the WLS estimation method. If your manifest variables are multivariate normal, then they have a zero relative multivariate kurtosis, and all marginal distributions have zero kurtosis (Browne 1982). If your DATA= data set contains raw data, PROC CALIS computes univariate skewness and kurtosis and a set of multivariate kurtosis values. By default, the values of univariate skewness and kurtosis are corrected for bias (as in PROC UNIVARIATE), but using the BIASKUR option enables you to compute the uncorrected values also. The values are displayed when you specify the PROC CALIS statement option KURTOSIS.

  • Corrected Variance for Variable z j

    click to expand
  • Corrected Univariate Skewness for Variable z j

    click to expand
  • Uncorrected Univariate Skewness for Variable z j

    click to expand
  • Corrected Univariate Kurtosis for Variable z j

    click to expand
  • Uncorrected Univariate Kurtosis for Variable z j

    click to expand
  • Mardia s Multivariate Kurtosis

    click to expand
  • Relative Multivariate Kurtosis

    click to expand
  • Normalized Multivariate Kurtosis

    click to expand
  • Mardia Based Kappa

  • Mean Scaled Univariate Kurtosis

  • Adjusted Mean Scaled Univariate Kurtosis

    with

    click to expand
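
Mardia's multivariate kurtosis is commonly written as γ₂ = (1/N) Σᵢ dᵢ² − n(n+2), where dᵢ is the squared Mahalanobis distance of observation i from the mean, computed with the biased (1/N) covariance matrix; a sketch under that assumption (not the procedure's actual code):

```python
import numpy as np

def mardia_kurtosis(Z):
    # Z: (N, n) raw data
    N, n = Z.shape
    Zc = Z - Z.mean(axis=0)
    S = (Zc.T @ Zc) / N                 # biased covariance matrix
    Sinv = np.linalg.inv(S)
    d = np.einsum('ri,ij,rj->r', Zc, Sinv, Zc)   # squared Mahalanobis distances
    return (d ** 2).mean() - n * (n + 2)
```

For the extreme two-point data set below (a maximally platykurtic univariate distribution), the statistic is exactly −2, the lower bound of the univariate kurtosis.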

If variable Z_j is normally distributed, the uncorrected univariate kurtosis γ_2(j) is equal to 0. If Z has an n-variate normal distribution, Mardia's multivariate kurtosis γ_2 is equal to 0. A variable Z_j is called leptokurtic if it has a positive value of γ_2(j) and is called platykurtic if it has a negative value of γ_2(j). The values of κ_1, κ_2, and κ_3 should not be smaller than a lower bound (Bentler 1985):

PROC CALIS displays a message if this happens.

If weighted least-squares estimates (METHOD=WLS or METHOD=ADF) are specified and the weight matrix is computed from an input raw data set, the CALIS procedure computes two more measures of multivariate kurtosis.

  • Multivariate Mean Kappa

    click to expand

    where

    click to expand

    and m = n ( n + 1)( n + 2)( n +3) / 24 is the number of elements in the vector s ij,kl (Bentler 1985).

  • Multivariate Least-Squares Kappa

    where

    click to expand

    s_4 is the vector of the s_ij,kl, and s_2 is the vector of the elements in the denominator of κ (Bentler 1985).

The occurrence of significant nonzero values of Mardia's multivariate kurtosis γ_2 and significant amounts of some of the univariate kurtosis values γ_2(j) indicate that your variables are not multivariate normally distributed. Violating the multivariate normality assumption in (default) generalized least-squares and maximum likelihood estimation usually leads to wrong approximate standard errors and incorrect fit statistics based on the χ² value. In general, the parameter estimates are more stable against violation of the normal distribution assumption. For more details, refer to Browne (1974, 1982, 1984).

Initial Estimates

Each optimization technique requires a set of initial values for the parameters. To avoid local optima, the initial values should be as close as possible to the globally optimal solution. You can check for local optima by running the analysis with several different sets of initial values; the RANDOM= option in the PROC CALIS statement is useful in this regard.

  • RAM and LINEQS: There are several default estimation methods available in PROC CALIS for initial values of parameters in a linear structural equation model specified by a RAM or LINEQS model statement, depending on the form of the specified model.

    • two-stage least-squares estimation

    • instrumental variable method (Hägglund 1982; Jennrich 1987)

    • approximative factor analysis method

    • ordinary least-squares estimation

    • estimation method of McDonald (McDonald and Hartmann 1992)

  • FACTOR: For default (exploratory) factor analysis, PROC CALIS computes initial estimates for factor loadings and unique variances by an algebraic method of approximate factor analysis. If you use a MATRIX statement together with a FACTOR model specification, initial values are computed by McDonald's method (McDonald and Hartmann 1992) if possible. McDonald's method of computing initial values works better if you scale the factors by setting the factor variances to 1 rather than by setting the loadings of the reference variables equal to 1. If neither of the two methods seems appropriate, the initial values are set by the START= option.

  • COSAN: For the more general COSAN model, there is no default estimation method for the initial values. In this case, the START= or RANDOM= option can be used to set otherwise unassigned initial values.

Poor initial values can cause convergence problems, especially with maximum likelihood estimation. You should not specify a constant initial value for all parameters since this would produce a singular predicted model matrix in the first iteration. Sufficiently large positive diagonal elements in the central matrices of each model matrix term provide a nonnegative definite initial predicted model matrix. If maximum likelihood estimation fails to converge, it may help to use METHOD=LSML, which uses the final estimates from an unweighted least-squares analysis as initial estimates for maximum likelihood. Or you can fit a slightly different but better-behaved model and produce an OUTRAM= data set, which can then be modified in accordance with the original model and used as an INRAM= data set to provide initial values for another analysis.

If you are analyzing a covariance or scalar product matrix, be sure to take into account the scales of the variables. The default initial values may be inappropriate when some variables have extremely large or small variances.

Automatic Variable Selection

You can use the VAR statement to reorder the variables in the model and to delete the variables not used. Using the VAR statement saves memory and computation time. If a linear structural equation model using the RAM or LINEQS statement (or an INRAM= data set specifying a RAM or LINEQS model) does not use all the manifest variables given in the input DATA= data set, PROC CALIS automatically deletes those manifest variables not used in the model.

In some special circumstances, the automatic variable selection performed for the RAM and LINEQS statements may be inappropriate, for example, if you are interested in modification indices connected to some of the variables that are not used in the model. You can include such manifest variables as exogenous variables in the analysis by specifying constant zero coefficients.

For example, the first three steps in a stepwise regression analysis of the Werner Blood Chemistry data (Jöreskog and Sörbom 1988, p. 111) can be performed as follows:

   proc calis data=dixon method=gls nobs=180 print mod;
      lineqs y=0 x1+0 x2+0 x3+0 x4+0 x5+0 x6+0 x7+e;
      std e=var;
   run;
   proc calis data=dixon method=gls nobs=180 print mod;
      lineqs y=g1 x1+0 x2+0 x3+0 x4+0 x5+0 x6+0 x7+e;
      std e=var;
   run;
   proc calis data=dixon method=gls nobs=180 print mod;
      lineqs y=g1 x1+0 x2+0 x3+0 x4+0 x5+g6 x6+0 x7+e;
      std e=var;
   run;

Using the COSAN statement does not automatically delete those variables from the analysis that are not used in the model. You can use the output of the predetermined values in the predicted model matrix (PREDET option) to detect unused variables. Variables that are not used in the model are indicated by 0 in the rows and columns of the predetermined predicted model matrix.

Exogenous Manifest Variables

If there are exogenous manifest variables in the linear structural equation model, then there is a one-to-one relationship between the given covariances and corresponding estimates in the central model matrix (P or Φ). In general, using exogenous manifest variables reduces the degrees of freedom since the corresponding sample correlations or covariances are not part of the exogenous information provided for the parameter estimation. See the section Counting the Degrees of Freedom on page 676 for more information.

If you specify a RAM or LINEQS model statement, or if such a model is recognized in an INRAM= data set, those elements in the central model matrices that correspond to the exogenous manifest variables are reset to the sample values after computing covariances or correlations within the current BY group.

The COSAN statement does not automatically set the covariances in the central model matrices that correspond to manifest exogenous variables.

You can use the output of the predetermined values in the predicted model matrix (PREDET option) that correspond to manifest exogenous variables to see which of the manifest variables are exogenous variables and to help you set the corresponding locations of the central model matrices with their covariances.

The following two examples show how different the results of PROC CALIS can be if manifest variables are considered either as endogenous or as exogenous variables. (See Figure 19.5.) In both examples, a correlation matrix S is tested against an identity model matrix C; that is, no parameter is estimated. The three runs of the first example (specified by the COSAN, LINEQS, and RAM statements) consider the two variables y and x as endogenous variables.

Figure 19.5: Exogenous and Endogenous Variables
   title2 'Data: FULLER (1987, p.18)';
   data corn;
      input y x;
      datalines;
    86  70
   115  97
    90  53
    86  64
   110  95
    91  64
    99  50
    96  70
    99  94
   104  69
    96  51
   ;
   title3 'Endogenous Y and X';
   proc calis data=corn;
      cosan corr(2,ide);
   run;
   proc calis data=corn;
      lineqs
         y=ey,
         x=ex;
      std    ey ex=2 * 1;
   run;
   proc calis data=corn;
      ram
         1  1  3  1.,
         1  2  4  1.,
         2  3  3  1.,
         2  4  4  1.;
   run;

The two runs of the second example (specified by the LINEQS and RAM statements) consider y and x as exogenous variables.

   title3 'Exogenous Y and X';
   proc calis data=corn;
      std y x=2 * 1;
   run;
   proc calis data=corn;
      ram
         2  1  1  1.,
         2  2  2  1.;
   run;

The LINEQS and the RAM model statements set the covariances (correlations) of exogenous manifest variables in the estimated model matrix and automatically reduce the degrees of freedom.

Use of Optimization Techniques

No algorithm for optimizing general nonlinear functions exists that will always find the global optimum for a general nonlinear minimization problem in a reasonable amount of time. Since no single optimization technique is invariably superior to others, PROC CALIS provides a variety of optimization techniques that work well in various circumstances. However, you can devise problems for which none of the techniques in PROC CALIS will find the correct solution. All optimization techniques in PROC CALIS use O(n²) memory except the conjugate gradient methods, which use only O(n) memory and are designed to optimize problems with many parameters.

The PROC CALIS statement NLOPTIONS can be especially helpful for tuning applications with nonlinear equality and inequality constraints on the parameter estimates. Some of the options available in NLOPTIONS may also be invoked as PROC CALIS options. The NLOPTIONS statement can specify almost the same options as the SAS/OR NLP procedure.

Nonlinear optimization requires the repeated computation of

  • the function value (optimization criterion)

  • the gradient vector (first-order partial derivatives)

  • for some techniques, the (approximate) Hessian matrix (second-order partial derivatives)

  • values of linear and nonlinear constraints

  • the first-order partial derivatives (Jacobian) of nonlinear constraints

For the criteria used by PROC CALIS, computing the gradient takes more computer time than computing the function value, and computing the Hessian takes much more computer time and memory than computing the gradient, especially when there are many parameters to estimate. Unfortunately, optimization techniques that do not use the Hessian usually require many more iterations than techniques that do use the (approximate) Hessian, and so they are often slower. Techniques that do not use the Hessian also tend to be less reliable (for example, they may terminate at local rather than global optima).

The available optimization techniques are displayed in Table 19.13 and can be chosen by the TECH= name option.

Table 19.13: Optimization Techniques

TECH=

Optimization Technique

LEVMAR

Levenberg-Marquardt Method

TRUREG

Trust-Region Method

NEWRAP

Newton-Raphson Method with Line Search

NRRIDG

Newton-Raphson Method with Ridging

QUANEW

Quasi-Newton Methods (DBFGS, DDFP, BFGS, DFP)

DBLDOG

Double-Dogleg Method (DBFGS, DDFP)

CONGRA

Conjugate Gradient Methods (PB, FR, PR, CD)

Table 19.14 shows, for each optimization technique, which derivatives are needed (first-order or second-order) and what kind of constraints (boundary, linear, or nonlinear) can be imposed on the parameters.

Table 19.14: Derivatives Needed and Constraints Allowed
 

Derivatives

Constraints

TECH=

First Order

Second Order

Boundary

Linear

Nonlinear

LEVMAR

x

x

x

x

-

TRUREG

x

x

x

x

-

NEWRAP

x

x

x

x

-

NRRIDG

x

x

x

x

-

QUANEW

x

-

x

x

x

DBLDOG

x

-

x

x

-

CONGRA

x

-

x

x

-

The Levenberg-Marquardt, trust-region, and Newton-Raphson techniques are usually the most reliable, work well with boundary and general linear constraints, and generally converge after a few iterations to a precise solution. However, these techniques need to compute a Hessian matrix in each iteration. For HESSALG=1, this means that you need about 4(n(n+1)/2)t bytes of work memory (n = the number of manifest variables, t = the number of parameters to estimate) to store the Jacobian and its cross product. With HESSALG=2 or HESSALG=3, you do not need this work memory, but the use of a utility file increases execution time. Computing the approximate Hessian in each iteration can be very time- and memory-consuming, especially for large problems (more than 60 or 100 parameters, depending on the computer used). For large problems, a quasi-Newton technique, especially with the BFGS update, can be far more efficient.
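The work-memory formula above is easy to evaluate for a planned model. The following Python sketch (the function name is hypothetical, introduced only for illustration) computes the HESSALG=1 estimate 4(n(n+1)/2)t bytes:

```python
def levmar_work_bytes(n, t):
    """Approximate HESSALG=1 work memory in bytes for storing the
    Jacobian and its cross product: 4 * (n*(n+1)/2) * t, where n is the
    number of manifest variables and t the number of parameters."""
    return 4 * (n * (n + 1) // 2) * t

# A model with 20 manifest variables and 50 parameters:
print(levmar_work_bytes(20, 50))  # 42000 bytes
```

For large models the quadratic growth in n and the linear growth in t make this estimate a quick feasibility check before choosing TECH=LEVMAR.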

For a poor choice of initial values, the Levenberg-Marquardt method seems to be more reliable.

If memory problems occur, you can use one of the conjugate gradient techniques, but they are generally slower and less reliable than the methods that use second-order information.

There are several options for controlling the optimization process. First, you can specify various termination criteria. The GCONV= option specifies a relative gradient termination criterion. If there are active boundary constraints, only those gradient components that correspond to inactive constraints contribute to the criterion. The GCONV= option is useful when you want very precise parameter estimates. Other criteria that use relative changes in function values or parameter estimates in consecutive iterations can lead to early termination when active constraints cause small steps to occur. The small default value for the FCONV= option helps prevent early termination. The MAXITER= and MAXFUNC= options enable you to specify the maximum number of iterations and function calls in the optimization process. These limits are especially useful in combination with the INRAM= and OUTRAM= options; you can run a few iterations at a time, inspect the results, and decide whether to continue iterating.

Nonlinearly Constrained QN Optimization

The algorithm used for nonlinearly constrained quasi-Newton optimization is an efficient modification of Powell's (1978a, 1978b, 1982a, 1982b) Variable Metric Constrained WatchDog (VMCWD) algorithm. A similar but older algorithm (VF02AD) is part of the Harwell library. Both VMCWD and VF02AD use Fletcher's VE02AD algorithm (also part of the Harwell library) for positive definite quadratic programming. The PROC CALIS QUANEW implementation uses a quadratic programming subroutine that updates and downdates the approximation of the Cholesky factor when the active set changes. The nonlinear QUANEW algorithm is not a feasible point algorithm, and the value of the objective function need not decrease (minimization) or increase (maximization) monotonically. Instead, the algorithm tries to reduce a linear combination of the objective function and constraint violations, called the merit function.
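The idea of a merit function can be sketched in Python as a generic l1 penalty combining the objective with the constraint violations. This is an illustrative sketch of the concept only, not PROC CALIS's exact merit function, and all names are hypothetical:

```python
def merit(f, cons, x, mu=10.0):
    """Generic l1 penalty merit function: objective value plus a weighted
    sum of constraint violations. cons(x) returns values that should be
    >= 0 at a feasible point; only violated constraints are penalized."""
    violations = sum(max(0.0, -c) for c in cons(x))
    return f(x) + mu * violations

# Minimize x**2 subject to x >= 1, i.e. constraint c(x) = x - 1 >= 0:
f = lambda x: x * x
cons = lambda x: [x - 1.0]
print(merit(f, cons, 0.5))  # 0.25 + 10*0.5 = 5.25 (infeasible, penalized)
print(merit(f, cons, 1.0))  # 1.0 (feasible, no penalty)
```

Reducing this combined quantity lets the iterates trade temporary constraint violations against progress on the objective, which is why the objective alone need not decrease monotonically.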

The following are similarities and differences between this algorithm and VMCWD:

  • A modification of this algorithm can be performed by specifying VERSION=1, which replaces the update of the Lagrange vector µ with the original update of Powell (1978a, 1978b), which is used in VF02AD. This can be helpful for some applications with linearly dependent active constraints.

  • If the VERSION= option is not specified or VERSION=2 is specified, the evaluation of the Lagrange vector µ is performed in the same way as Powell (1982a, 1982b) describes.

  • Instead of updating an approximate Hessian matrix, this algorithm uses the dual BFGS (or DFP) update that updates the Cholesky factor of an approximate Hessian. If the condition of the updated matrix becomes too poor, a restart is performed with a positive diagonal matrix. At the end of the first iteration after each restart, the Cholesky factor is scaled.

  • The Cholesky factor is loaded into the quadratic programming subroutine, automatically ensuring positive definiteness of the problem. During the quadratic programming step, the Cholesky factor of the projected Hessian matrix and the QT decomposition are updated simultaneously when the active set changes. Refer to Gill et al. (1984) for more information.

  • The line-search strategy is very similar to that of Powell (1982a, 1982b). However, this algorithm does not call for derivatives during the line search; hence, it generally needs fewer derivative calls than function calls. The VMCWD algorithm always requires the same number of derivative and function calls. It was also found in several applications of VMCWD that Powell's line-search method sometimes uses steps that are too long during the first iterations. In those cases, you can use the INSTEP= option specification to restrict the step length α of the first iterations.

  • The watchdog strategy is also similar to that of Powell (1982a, 1982b). However, this algorithm does not automatically return to a former, better point after a fixed number of iterations. A return is delayed further if the observed function reduction is close to the expected function reduction of the quadratic model.

  • Although Powell's termination criterion is still used (as FCONV2), the QUANEW implementation uses two additional termination criteria (GCONV and ABSGCONV).

This algorithm is automatically invoked when you specify the NLINCON statement. The nonlinear QUANEW algorithm needs the Jacobian matrix of the first-order derivatives (constraint normals) of the constraints

CJ(x) = (∂ci(x)/∂xj),   i = 1, ..., nc,   j = 1, ..., n

where nc is the number of nonlinear constraints for a given point x .

You can specify two update formulas with the UPDATE= option:

  • UPDATE=DBFGS performs the dual BFGS update of the Cholesky factor of the Hessian matrix. This is the default.

  • UPDATE=DDFP performs the dual DFP update of the Cholesky factor of the Hessian matrix.

This algorithm uses its own line-search technique. All options and parameters (except the INSTEP= option) controlling the line search in the other algorithms do not apply here. In several applications, large steps in the first iterations are troublesome. You can specify the INSTEP= option to impose an upper bound for the step size α during the first five iterations. The values of the LCSINGULAR=, LCEPSILON=, and LCDEACT= options, which control the processing of linear and boundary constraints, are valid only for the quadratic programming subroutine used in each iteration of the nonlinear constraints QUANEW algorithm.

Optimization and Iteration History

The optimization and iteration histories are displayed by default because it is important to check for possible convergence problems.

The optimization history includes the following summary of information about the initial state of the optimization.

  • the number of constraints that are active at the starting point, or more precisely, the number of constraints that are currently members of the working set. If this number is followed by a plus sign, there are more active constraints, of which at least one is temporarily released from the working set due to negative Lagrange multipliers.

  • the value of the objective function at the starting point

  • if the (projected) gradient is available, the value of the largest absolute (projected) gradient element

  • for the TRUREG and LEVMAR subroutines, the initial radius of the trust region around the starting point

The optimization history ends with some information concerning the optimization result:

  • the number of constraints that are active at the final point, or more precisely, the number of constraints that are currently members of the working set. If this number is followed by a plus sign, there are more active constraints, of which at least one is temporarily released from the working set due to negative Lagrange multipliers.

  • the value of the objective function at the final point

  • if the (projected) gradient is available, the value of the largest absolute (projected) gradient element

  • other information specific to the optimization technique

The iteration history generally consists of one line of displayed output containing the most important information for each iteration. The _LIST_ variable (see the SAS Program Statements section on page 628) also enables you to display the parameter estimates and the gradient in some or all iterations.

The iteration history always includes the following (the words in parentheses are the column header output):

  • the iteration number (Iter)

  • the number of iteration restarts (rest)

  • the number of function calls (nfun)

  • the number of active constraints (act)

  • the value of the optimization criterion (optcrit)

  • the difference between adjacent function values (difcrit)

  • the maximum of the absolute gradient components corresponding to inactive boundary constraints (maxgrad)

An apostrophe trailing the number of active constraints indicates that at least one of the active constraints is released from the active set due to a significant Lagrange multiplier.

For the Levenberg-Marquardt technique (LEVMAR), the iteration history also includes the following information:

  • An asterisk trailing the iteration number means that the computed Hessian approximation is singular and consequently ridged with a positive lambda value. If all or the last several iterations show a singular Hessian approximation, the problem is not sufficiently identified. Thus, there are other locally optimal solutions that lead to the same optimum function value for different parameter values. This implies that standard errors for the parameter estimates are not computable without the addition of further constraints.

  • the value of the Lagrange multiplier (lambda); this is 0 if the optimum of the quadratic function approximation is inside the trust region (a trust-region-scaled Newton step can be performed) and is greater than 0 when the optimum of the quadratic function approximation is located at the boundary of the trust region (the scaled Newton step is too long to fit in the trust region and a quadratic constraint optimization is performed). Large values indicate optimization difficulties. For a nonsingular Hessian matrix, the value of lambda should go to 0 during the last iterations, indicating that the objective function can be well approximated by a quadratic function in a small neighborhood of the optimum point. An increasing lambda value often indicates problems in the optimization process.

  • the value of the ratio ρ (rho) between the actually achieved difference in function values and the predicted difference in the function values on the basis of the quadratic function approximation. Values much less than 1 indicate optimization difficulties. The value of the ratio indicates the goodness of the quadratic function approximation; in other words, ρ << 1 means that the radius of the trust region has to be reduced. A fairly large value of ρ means that the radius of the trust region need not be changed. And a value close to or larger than 1 means that the radius can be increased, indicating a good quadratic function approximation.
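The way the ratio rho drives the trust-region radius can be sketched as follows. The thresholds and factors here are conventional textbook choices, not PROC CALIS's exact internal constants, and the function name is hypothetical:

```python
def update_radius(radius, rho, shrink=0.25, grow=2.0):
    """Illustrative trust-region radius update driven by rho, the ratio of
    achieved to predicted function reduction. rho << 1 means the quadratic
    model is poor, so the radius shrinks; rho near or above 1 means the
    model is good, so the radius may grow."""
    if rho < 0.25:      # poor quadratic approximation: shrink
        return radius * shrink
    if rho > 0.75:      # very good approximation: allow a larger step
        return radius * grow
    return radius       # adequate approximation: keep the radius

print(update_radius(1.0, 0.1))  # 0.25
print(update_radius(1.0, 0.9))  # 2.0
print(update_radius(1.0, 0.5))  # 1.0
```

This also explains the diagnostic advice above: a string of iterations with small rho values shows up in the history as a shrinking radius and a growing lambda.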

For the Newton-Raphson technique (NRRIDG), the iteration history also includes the following information:

  • the value of the ridge parameter. This is 0 when a Newton step can be performed, and it is greater than 0 when either the Hessian approximation is singular or a Newton step fails to reduce the optimization criterion. Large values indicate optimization difficulties.

  • the value of the ratio (rho) between the actually achieved difference in function values and the predicted difference in the function values on the basis of the quadratic function approximation. Values much less than 1.0 indicate optimization difficulties.

For the Newton-Raphson with line-search technique (NEWRAP), the iteration history also includes

  • the step size α (alpha) computed with one of the line-search algorithms

  • the slope of the search direction at the current parameter iterate. For minimization, this value should be significantly negative. Otherwise, the line-search algorithm has difficulty reducing the function value sufficiently.

For the Trust-Region technique (TRUREG), the iteration history also includes the following information.

  • An asterisk after the iteration number means that the computed Hessian approximation is singular and consequently ridged with a positive lambda value.

  • the value of the Lagrange multiplier (lambda). This value is zero when the optimum of the quadratic function approximation is inside the trust region (a trust-region-scaled Newton step can be performed) and is greater than zero when the optimum of the quadratic function approximation is located at the boundary of the trust region (the scaled Newton step is too long to fit in the trust region, and a quadratically constrained optimization is performed). Large values indicate optimization difficulties. As in Gay (1983), a negative lambda value indicates the special case of an indefinite Hessian matrix (the smallest eigenvalue is negative in minimization).

  • the value of the radius of the trust region. Small trust region radius values combined with large lambda values in subsequent iterations indicate optimization problems.

For the quasi-Newton (QUANEW) and conjugate gradient (CONGRA) techniques, the iteration history also includes the following information:

  • the step size (alpha) computed with one of the line-search algorithms

  • the descent of the search direction at the current parameter iterate. This value should be significantly smaller than 0. Otherwise, the line-search algorithm has difficulty reducing the function value sufficiently.

Frequent update restarts (rest) of a quasi-Newton algorithm often indicate numerical problems related to required properties of the approximate Hessian update, and they decrease the speed of convergence. This can happen particularly if the ABSGCONV= termination criterion is too small, that is, when the requested precision cannot be obtained by quasi-Newton optimization. Generally, the number of automatic restarts used by conjugate gradient methods is much higher.

For the nonlinearly constrained quasi-Newton technique, the iteration history also includes the following information:

  • the maximum value of all constraint violations,

    click to expand
  • the value of the predicted function reduction used with the GCONV and FCONV2 termination criteria,

    click to expand
  • the step size α of the quasi-Newton step. Note that this algorithm works with a special line-search algorithm.

  • the maximum element of the gradient of the Lagrange function,

    click to expand

For the double dogleg technique, the iteration history also includes the following information:

  • the parameter λ of the double-dogleg step. A value of λ = 0 corresponds to the full (quasi-) Newton step.

  • the slope of the search direction at the current parameter iterate. For minimization, this value should be significantly negative.

Line-Search Methods

In each iteration k, the (dual) quasi-Newton, hybrid quasi-Newton, conjugate gradient, and Newton-Raphson minimization techniques use iterative line-search algorithms that try to optimize a linear, quadratic, or cubic approximation of the nonlinear objective function f of n parameters x along a feasible descent search direction s^(k),

f(x^(k) + α^(k) s^(k))

by computing an approximately optimal scalar α^(k) > 0. Since the outside iteration process is based only on the approximation of the objective function, the inside iteration of the line-search algorithm does not have to be perfect. Usually, it is satisfactory that the choice of α significantly reduces (in a minimization) the objective function. Criteria often used for termination of line-search algorithms are the Goldstein conditions (Fletcher 1987).

Various line-search algorithms can be selected by using the LIS= option (page 580). The line-search methods LIS=1, LIS=2, and LIS=3 satisfy the left-hand-side and right-hand-side Goldstein conditions (refer to Fletcher 1987). When derivatives are available, the line-search methods LIS=6, LIS=7, and LIS=8 try to satisfy the right-hand-side Goldstein condition; if derivatives are not available, these line-search algorithms use only function calls.

The line-search method LIS=2 seems to be superior when function evaluation consumes significantly less computation time than gradient evaluation. Therefore, LIS=2 is the default value for Newton-Raphson, (dual) quasi-Newton, and conjugate gradient optimizations.
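A minimal line search enforcing a sufficient-decrease (left-hand Goldstein/Armijo) condition can be sketched in Python. This is an illustration of the general principle only; the actual LIS= methods in PROC CALIS use more elaborate quadratic and cubic interpolation schemes, and all names here are hypothetical:

```python
def backtrack(f, x, s, g, alpha=1.0, c1=1e-4, tau=0.5, max_iter=50):
    """Backtracking line search: shrink alpha until the sufficient-decrease
    condition f(x + alpha*s) <= f(x) + c1*alpha*(g's) holds, where g's is
    the (negative) slope along the descent direction s."""
    fx = f(x)
    slope = sum(gi * si for gi, si in zip(g, s))  # g's, negative for descent
    for _ in range(max_iter):
        xn = [xi + alpha * si for xi, si in zip(x, s)]
        if f(xn) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= tau  # step too long: shrink and retry
    return alpha

# Minimize f(x) = x1^2 + x2^2 from (1, 1) along the steepest descent direction:
f = lambda x: x[0] ** 2 + x[1] ** 2
g = [2.0, 2.0]       # gradient at (1, 1)
s = [-2.0, -2.0]     # descent direction (negative gradient)
alpha = backtrack(f, [1.0, 1.0], s, g)
print(alpha)  # 0.5: the full step overshoots to (-1, -1); 0.5 lands at (0, 0)
```

Note that this scheme needs only function values after the initial gradient, which mirrors why methods such as LIS=2 are preferred when function evaluations are much cheaper than gradient evaluations.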

Restricting the Step Length

Almost all line-search algorithms use iterative extrapolation techniques that can easily lead to feasible points where the objective function f is no longer defined (resulting in indefinite matrices for ML estimation) or is difficult to compute (resulting in floating point overflows). Therefore, PROC CALIS provides options that restrict the step length or trust region radius, especially during the first main iterations.

The inner product gᵀs of the gradient g and the search direction s is the slope of f(α) = f(x + αs) along the search direction s with step length α. The default starting value α^(0) = α^(k,0) in each line-search algorithm (minimizing f(x + αs) over α > 0) during the main iteration k is computed in three steps.

  1. Use either the difference df = f^(k) − f^(k−1) of the function values during the last two consecutive iterations or the final step-size value α^− of the previous iteration k − 1 to compute a first value α^(0).

    • Using the DAMPSTEP<= r > option:

      click to expand

      The initial value for the new step length can be no larger than r times the final step length α^− of the previous iteration. The default is r = 2.

    • Not using the DAMPSTEP option:

      click to expand

      with

      click to expand

      This starting value can be too large and can lead to a difficult or impossible function evaluation, especially for highly nonlinear functions such as the EXP function.

  2. During the first five iterations, the second step enables you to reduce the starting value by using the INSTEP= r option:

    click to expand

    After more than five iterations, this reduction is no longer applied.

  3. The third step can further reduce the step length by

    click to expand

    where u is the maximum length of a step inside the feasible region.
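The three steps above can be sketched in Python. The undamped first guess 2·df/|slope| is a common textbook choice and stands in for the formula PROC CALIS actually uses (which is not reproduced here); the function and parameter names are hypothetical:

```python
def initial_alpha(df_prev, slope, alpha_prev, iteration,
                  dampstep_r=None, instep_r=None, max_feasible=float("inf")):
    """Illustrative starting step length for a line search, following the
    three-step outline: (1) a first value from the last function decrease,
    optionally capped at dampstep_r times the previous final step length;
    (2) an INSTEP= cap during the first five iterations; (3) a cap at the
    maximum feasible step length u."""
    alpha = 2.0 * df_prev / abs(slope) if slope != 0 else 1.0
    if dampstep_r is not None:
        alpha = min(alpha, dampstep_r * alpha_prev)   # DAMPSTEP cap
    if instep_r is not None and iteration <= 5:
        alpha = min(alpha, instep_r)                  # INSTEP cap, early only
    return min(alpha, max_feasible)                   # stay feasible

print(initial_alpha(df_prev=4.0, slope=-8.0, alpha_prev=0.3,
                    iteration=2, dampstep_r=2.0))  # min(1.0, 0.6) = 0.6
```

The sketch makes the interplay visible: DAMPSTEP ties the new step to the previous one, while INSTEP guards only the first few iterations.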

The INSTEP= r option lets you specify a smaller or larger radius of the trust region used in the first iteration by the trust-region, double-dogleg, and Levenberg-Marquardt algorithms. The default initial trust region radius is the length of the scaled gradient (Moré 1978). This step corresponds to the default radius factor of r = 1. This choice is successful in most practical applications of the TRUREG, DBLDOG, and LEVMAR algorithms. However, for bad initial values used in the analysis of a covariance matrix with high variances, or for highly nonlinear constraints (such as using the EXP function) in your programming code, the default start radius can result in arithmetic overflows. If this happens, you can try decreasing values of INSTEP= r, 0 < r < 1, until the iteration starts successfully. A small factor r also affects the trust region radius of the next steps because the radius is changed in each iteration by a factor c, 0 < c ≤ 4, depending on the ratio ρ. Reducing the radius corresponds to increasing the ridge parameter, which produces smaller steps directed closer toward the gradient direction.

Modification Indices

While fitting structural models, you may want to modify the specified model in order to

  • reduce the χ² value significantly

  • reduce the number of parameters to estimate without increasing the χ² value too much

If you specify the MODIFICATION or MOD option, PROC CALIS computes and displays a default set of modification indices:

  • Univariate Lagrange multiplier test indices for most elements in the model matrices that are constrained to equal constants. These are second-order approximations of the decrease in the χ² value that would result from allowing the constant matrix element to vary. Besides the value of the Lagrange multiplier, the corresponding probability (df = 1) and the approximate change of the parameter value (should the constant be changed to a parameter) are displayed. If allowing the constant to be a free estimated parameter would result in a singular information matrix, the string 'sing' is displayed instead of the Lagrange multiplier index. Not all elements in the model matrices should be allowed to vary; the diagonal elements of the inverse matrices in the RAM or LINEQS model must be constant ones. The univariate Lagrange multipliers are displayed at the constant locations of the model matrices.

  • Univariate Wald test indices for those matrix elements that correspond to parameter estimates in the model. These are second-order approximations of the increase in the χ² value that would result from constraining the parameter to a 0 constant. The univariate Wald test indices are the same as the t values that are displayed together with the parameter estimates and standard errors. The univariate Wald test indices are displayed at the parameter locations of the model matrices.

  • Univariate Lagrange multiplier test indices that are second-order approximations of the decrease in the χ² value that would result from the release of equality constraints. Multiple equality constraints containing n > 2 parameters are tested successively in n steps, each assuming the release of one of the equality-constrained parameters. The expected change of the parameter values of the separated parameter and the remaining parameter cluster are displayed, too.

  • Univariate Lagrange multiplier test indices for releasing active boundary constraints specified by the BOUNDS statement

  • Stepwise multivariate Wald test indices for constraining estimated parameters to 0 are computed and displayed. In each step, the parameter that would lead to the smallest increase in the multivariate χ² value is set to 0. Besides the multivariate χ² value and its probability, the univariate increments are also displayed. The process stops when the univariate probability is smaller than the specified value in the SLMW= option.

All of the preceding tests are approximations. You can often get more accurate tests by actually fitting different models and computing likelihood ratio tests. For more details about the Wald and the Lagrange multiplier test, refer to MacCallum (1986), Buse (1982), Bentler (1986), or Lee (1985).
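Since a univariate Lagrange multiplier or Wald index is referred to a χ² distribution with 1 degree of freedom, its p-value can be computed with only the standard library, using the identity P(χ²₁ > x) = erfc(√(x/2)). The function name below is hypothetical:

```python
import math

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variate with df = 1, via the
    identity P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# A Lagrange multiplier index of 3.84 sits right at the 5% level:
print(round(chi2_1df_pvalue(3.84), 4))  # ~0.05
```

This is the same probability PROC CALIS reports next to each univariate index, so the snippet is useful mainly for post-processing saved output.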

Note that, for large model matrices, the computation time for the default modification indices can considerably exceed the time needed for the minimization process.

The modification indices are not computed for unweighted least-squares or diagonally weighted least-squares estimation.

Caution: Modification indices are not computed if the model matrix is an identity matrix (IDE or ZID), a selection matrix (PER), or the first matrix J in the LINEQS model. If you want to display the modification indices for such a matrix, you should specify the matrix as another type; for example, specify an identity matrix used in the COSAN statement as a diagonal matrix with constant diagonal elements of 1.

Constrained Estimation Using Program Code

The CALIS procedure offers a very flexible way to constrain parameter estimates. You can use your own programming statements to express special properties of the parameter estimates. This tool is also present in McDonald's COSAN implementation but is considerably easier to use in the CALIS procedure. PROC CALIS is able to compute analytic first- and second-order derivatives that you would have to specify using the COSAN program. There are also three PROC CALIS statements you can use:

  • the BOUNDS statement, to specify simple bounds on the parameters used in the optimization process

  • the LINCON statement, to specify general linear equality and inequality constraints on the parameters used in the optimization process

  • the NLINCON statement, to specify general nonlinear equality and inequality constraints on the parameters used in the optimization process. The variables listed in the NLINCON statement must be specified in the program code.

There are some traditional ways to enforce parameter constraints by using parameter transformations (McDonald 1980).

  • One-sided boundary constraints: For example, the parameter q_k should be at least as large as a given constant a (or at most as small as a given constant b),

    q_k ≥ a   (or q_k ≤ b)

    This inequality constraint can be expressed as an equality constraint

    q_k = a + x_j²   (or q_k = b − x_j²)

    in which the fundamental parameter x j is unconstrained.

  • Two-sided boundary constraints: For example, the parameter q_k should be located between two given constant values a and b, a < b.

    This inequality constraint can be expressed as an equality constraint

    click to expand

    in which the fundamental parameter x j is unconstrained.

  • One-sided order constraints: For example, the parameters q_1, ..., q_k should be ordered in the form

    q_1 ≤ q_2 ≤ ··· ≤ q_k

    These inequality constraints can be expressed as a set of equality constraints

    q_1 = x_1,   q_i = q_{i−1} + x_i²,   i = 2, ..., k

    in which the fundamental parameters x_1, ..., x_k are unconstrained.

  • Two-sided order constraints: For example, the parameters q_1, ..., q_k should be ordered in the form

    click to expand

    These inequality constraints can be expressed as a set of equality constraints

    click to expand

    in which the fundamental parameters x_1, ..., x_k are unconstrained.

  • Linear equation constraints: For example, the parameters q_1, q_2, q_3 should be linearly constrained in the form

    which can be expressed in the form of three explicit equations in which the fundamental parameters x_1 and x_2 are unconstrained:

    click to expand

Refer to McDonald (1980) and Browne (1982) for further notes on reparameterizing techniques. If the optimization problem is not too large to apply the Levenberg-Marquardt or Newton-Raphson algorithm, boundary constraints should be requested by the BOUNDS statement rather than by reparameterizing code. If the problem is so large that you must use a quasi-Newton or conjugate gradient algorithm, reparameterizing techniques may be more efficient than the BOUNDS statement.
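The boundary and order transformations above are easy to verify numerically. In the following Python sketch, the squared-parameter forms match the outline above, while the sin² form for two-sided bounds is one traditional choice (McDonald 1980) and may differ from the exact transformation used elsewhere; all function names are hypothetical:

```python
import math

def one_sided(a, x):
    """theta = a + x**2 guarantees theta >= a for any unconstrained x."""
    return a + x * x

def two_sided(a, b, x):
    """theta = a + (b - a) * sin(x)**2 keeps theta in [a, b] for any
    unconstrained x; one traditional reparameterization."""
    return a + (b - a) * math.sin(x) ** 2

def ordered(base, xs):
    """One-sided order constraints: q_1 = base, q_i = q_{i-1} + x_i**2
    produces a nondecreasing sequence from unconstrained x_2, ..., x_k."""
    qs = [base]
    for x in xs:
        qs.append(qs[-1] + x * x)
    return qs

print(one_sided(2.0, -3.0))        # 11.0, always >= 2
print(two_sided(0.0, 1.0, 17.3))   # lies in [0, 1] for any argument
print(ordered(1.0, [2.0, 0.5]))    # [1.0, 5.0, 5.25], nondecreasing
```

The design trade-off named in the text is visible here: the constraint is absorbed into the parameterization, so the optimizer sees only unconstrained x values, at the cost of a nonlinear (and possibly ill-conditioned) mapping.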

Counting the Degrees of Freedom

In a regression problem, the number of degrees of freedom for the error estimate is the number of observations in the data set minus the number of parameters. The NOBS=, DFR= (RDF=), and DFE= (EDF=) options refer to degrees of freedom in this sense. However, these values are not related to the degrees of freedom of a test statistic used in a covariance or correlation structure analysis. The NOBS=, DFR=, and DFE= options should be used in PROC CALIS to specify only the effective number of observations in the input DATA= data set.

In general, the number of degrees of freedom in a covariance or correlation structure analysis is defined as the difference between the number of nonredundant values q in the observed n × n correlation or covariance matrix S and the number t of free parameters in the vector X used in the fit of the specified model, df = q − t. Both values, q and t, are counted differently in different situations by PROC CALIS.

The number of nonredundant values q is generally equal to the number of lower triangular elements in the n × n moment matrix S, including all diagonal elements, minus a constant c that depends on special circumstances:

    q = n(n + 1)/2 − c

The number c is evaluated by adding the following quantities:

  • If you specify a linear structural equation model containing exogenous manifest variables by using the RAM or LINEQS statement, PROC CALIS adds to c the number of variances and covariances among these manifest exogenous variables, which are automatically set in the corresponding locations of the central model matrices (see the section Exogenous Manifest Variables on page 662).

  • If you specify the DFREDUCE= i option, PROC CALIS adds the specified number i to c . The number i can be a negative integer.

  • If you specify the NODIAG option to exclude the fit of the diagonal elements of the data matrix S , PROC CALIS adds the number n of diagonal elements to c .

  • If all the following conditions hold, then PROC CALIS adds to c the number of the diagonal locations:

    • NODIAG and DFREDUCE= options are not specified.

    • A correlation structure is being fitted.

    • The predicted correlation matrix contains constants on the diagonal.

In some complicated models, especially those using programming statements, PROC CALIS may not be able to detect all the constant predicted values. In such cases, you must specify the DFREDUCE= option to get the correct degrees of freedom.
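For concreteness, the counting rule q = n(n+1)/2 − c and df = q − t can be sketched in a few lines of Python; the helper name is illustrative, not a CALIS feature.

```python
def calis_degrees_of_freedom(n, t, c=0):
    """Degrees of freedom df = q - t, where q = n(n+1)/2 - c.

    n: number of manifest variables
    t: number of free parameters
    c: constant accumulated from the special circumstances listed above
       (exogenous manifest moments, DFREDUCE=i, NODIAG adds n, ...)
    """
    q = n * (n + 1) // 2 - c  # nonredundant lower-triangular elements of S
    return q - t

# 6 manifest variables, 13 free parameters, no adjustments:
print(calis_degrees_of_freedom(6, 13))       # 8
# Same model with NODIAG: the n = 6 diagonal elements are added to c:
print(calis_degrees_of_freedom(6, 13, c=6))  # 2
```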

The number t is the number of different parameter names used in constructing the model if you do not use programming statements to impose constraints on the parameters. Using programming statements in general introduces two kinds of parameters:

  • independent parameters, which are used only at the right-hand side of the expressions

  • dependent parameters, which are used at least once at the left-hand side of the expressions

The independent parameters belong to the parameters involved in the estimation process, whereas the dependent parameters are fully defined by the programming statements and can be computed from the independent parameters. In this case, the number t is the number of different parameter names used in the model specification, but not used in the programming statements, plus the number of independent parameters. The independent parameters and their initial values can be defined in a model specification statement or in a PARMS statement.

The degrees of freedom are automatically increased by the number of active constraints in the solution. Similarly, the number of parameters is decreased by the number of active constraints. This affects the computation of many fit statistics and indices. Refer to Dijkstra (1992) for a discussion of the validity of statistical inferences with active boundary constraints. If the researcher believes that the active constraints will have a small chance of occurrence in repeated sampling, it may be more suitable to turn off the automatic adjustment using the NOADJDF option.

Computational Problems

First Iteration Overflows

Analyzing a covariance matrix that includes large variances on the diagonal, combined with poor initial estimates for the parameters, can easily lead to arithmetic overflows in the first iterations of the minimization algorithm. The line-search algorithms that work with cubic extrapolation are especially sensitive to arithmetic overflows. If this occurs with quasi-Newton or conjugate gradient minimization, you can specify the INSTEP= option to reduce the length of the first step. If an arithmetic overflow occurs in the first iteration of the Levenberg-Marquardt algorithm, you can specify the INSTEP= option to reduce the trust region radius of the first iteration. You can also change the minimization technique or the line-search method. If none of these help, you should consider

  • scaling the covariance matrix

  • providing better initial values

  • changing the model

No Convergence of Minimization Process

If convergence does not occur during the minimization process, perform the following tasks:

  • If there are negative variance estimates in the diagonal locations of the central model matrices, you can

    • specify the BOUNDS statement to obtain nonnegative variance estimates

    • specify the HEYWOOD option, if the FACTOR model statement is specified

  • Change the estimation method to obtain a better set of initial estimates. For example, if you use METHOD=ML, you can

    • change to METHOD=LSML

    • run some iterations with METHOD=DWLS or METHOD=GLS, write the results in an OUTRAM= data set, and use the results as initial values specified by an INRAM= data set in a second run with METHOD=ML

  • Change the optimization technique. For example, if you use the default TECH=LEVMAR, you can

    • change to TECH=QUANEW or to TECH=NEWRAP

    • run some iterations with TECH=CONGRA, write the results in an OUTRAM= data set, and use the results as initial values specified by an INRAM= data set in a second run with a different TECH= technique

  • Change or modify the update technique or the line-search algorithm, or both, when using TECH=QUANEW or TECH=CONGRA. For example, if you use the default update formula and the default line-search algorithm, you can

    • change the update formula with the UPDATE= option

    • change the line-search algorithm with the LIS= option

    • specify a more precise line search with the LSPRECISION= option, if you use LIS=2 or LIS=3

  • You can allow more iterations and function calls by using the MAXIT= and MAXFU= options.

  • Change the initial values. For many kinds of model specifications made with the LINEQS, RAM, or FACTOR statement, PROC CALIS computes an appropriate set of initial values automatically. However, for some model specifications (for example, structural equations with latent variables on the left-hand side and manifest variables on the right-hand side), PROC CALIS can generate very obscure initial values. In these cases, you have to set the initial values yourself.

    • Increase the initial values of the parameters located at the diagonal of central matrices

      • manually, by setting the values in the model specification

      • automatically, by using the DEMPHAS= option

    • Use a slightly different, but more stable, model to obtain preliminary estimates.

    • Use additional information to specify initial values, for example, by using other SAS software like the FACTOR, REG, SYSLIN, and MODEL (SYSNLIN) procedures for the modified, unrestricted model case.

Unidentified Model

The parameter vector x in the covariance structure model

    C = C(x)

is said to be identified in a parameter space G if

    C(x) = C(x̃),  x̃ ∈ G

implies x̃ = x. The parameter estimates that result from an unidentified model can be very far from the parameter estimates of a very similar but identified model. They are usually machine dependent. Don't use parameter estimates of an unidentified model as initial values for another run of PROC CALIS.

Singular Predicted Model Matrix

You can easily specify models with singular predicted model matrices, for example, by fixing diagonal elements of central matrices to 0. In such cases, you cannot compute maximum likelihood estimates (the ML function value F is not defined). Since singular predicted model matrices can also occur temporarily in the minimization process, PROC CALIS tries in such cases to change the parameter estimates so that the predicted model matrix becomes positive definite. In such cases, the following message is displayed:

  NOTE: Parameter set changed.  

This process does not always work well, especially if there are fixed instead of variable diagonal elements in the central model matrices. A famous example where you cannot compute ML estimates is a component analysis with fewer components than given manifest variables. See the section FACTOR Model Statement on page 606 for more details. If you continue to get a singular predicted model matrix after changing initial values and optimization techniques, then your model is perhaps specified so that ML estimates cannot be computed.

Saving Computing Time

For large models, the most computing time is needed to compute the modification indices. If you don't really need the Lagrange multipliers or multiple Wald test indices (the univariate Wald test indices are the same as the t values), using the NOMOD option can save a considerable amount of computing time.

Central Matrices with Negative Eigenvalues

A covariance matrix cannot have negative eigenvalues, since a negative eigenvalue means that some linear combination of the variables has negative variance. PROC CALIS displays a warning if a central model matrix has negative eigenvalues but does not actually compute the eigenvalues. Sometimes this warning can be triggered by 0 or very small positive eigenvalues that appear negative because of numerical error. If you want to be sure that the central model matrix you are fitting can be considered to be a variance-covariance matrix, you can use the SAS/IML command VAL=EIGVAL(U) to compute the vector VAL of eigenvalues of matrix U .
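Outside SAS/IML, the same eigenvalue check can be sketched with NumPy; the matrix values here are an illustration, not a PROC CALIS interface.

```python
import numpy as np

# Eigenvalues of a symmetric central model matrix U; a covariance matrix
# must not have (numerically significant) negative eigenvalues.
U = np.array([[1.0, 2.0],
              [2.0, 1.0]])
val = np.linalg.eigvalsh(U)          # analogous to SAS/IML VAL = EIGVAL(U)
tol = 1e-8 * max(1.0, abs(val).max())
is_psd = bool(val.min() >= -tol)     # tolerate tiny negatives from rounding
print(val, is_psd)                   # eigenvalues -1 and 3: not a valid covariance matrix
```

The tolerance term reflects the warning in the text: eigenvalues that are 0 or very small can appear slightly negative purely from numerical error, so an exact `>= 0` test would be too strict.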

Negative R² Values

The estimated squared multiple correlations R² of the endogenous variables are computed using the estimated error variances:

    R²_i = 1 − (estimated error variance of variable i) / (estimated total variance of variable i)

If the model is a poor fit, it is possible that the estimated error variance exceeds the estimated total variance, which results in R²_i < 0.
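A small numeric sketch (illustrative values only) of how a poor fit produces a negative R²:

```python
def squared_multiple_correlation(error_var, total_var):
    """R^2 = 1 - (estimated error variance) / (estimated total variance)."""
    return 1.0 - error_var / total_var

print(squared_multiple_correlation(2.0, 10.0))   # 0.8 -- a good fit
print(squared_multiple_correlation(12.0, 10.0))  # negative: error variance exceeds total variance
```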

Displayed Output

The output displayed by PROC CALIS depends on the statement used to specify the model. Since an analysis requested by the LINEQS or RAM statement implies the analysis of a structural equation model, more statistics can be computed and displayed than for a covariance structure analysis following the generalized COSAN model requested by the COSAN statement. The displayed output resulting from use of the FACTOR statement includes all the COSAN displayed output as well as more statistics displayed only when you specify the FACTOR statement. Since the displayed output using the RAM statement differs only in its form from that generated by the LINEQS statement, in this section distinctions are made between COSAN and LINEQS output only.

The unweighted least-squares and diagonally weighted least-squares estimation methods do not provide a sufficient statistical basis for the following output (which is neither displayed nor written to an OUTEST= data set):

  • most of the fit indices

  • approximate standard errors

  • normalized or asymptotically standardized residuals

  • modification indices

  • information matrix

  • covariance matrix of parameter estimates

The notation S = (s_ij) is used for the analyzed covariance or correlation matrix, C = (c_ij) for the predicted model matrix, W for the weight matrix (for example, W = I for ULS, W = S for GLS, W = C for ML estimates), X for the vector of optimal parameter estimates, n for the number of manifest variables, t for the number of parameter estimates, and N for the sample size.

The output of PROC CALIS includes the following:

  • COSAN and LINEQS: List of the matrices and their properties specified by the generalized COSAN model if you specify at least the PSHORT option.

  • LINEQS: List of manifest variables that are not used in the specified model and that are automatically omitted from the analysis. Note that there is no automatic variable reduction with the COSAN or FACTOR statement. If necessary, you should use the VAR statement in these cases.

  • LINEQS: List of the endogenous and exogenous variables specified by the LINEQS, STD, and COV statements if you specify at least the PSHORT option.

  • COSAN: Initial values of the parameter matrices indicating positions of constants and parameters. The output, or at least the default output, is displayed if you specify the PINITIAL option.

  • LINEQS: The set of structural equations containing the initial values and indicating constants and parameters, and output of the initial error variances and covariances. The output, or at least the default output, is displayed if you specify the PINITIAL option.

  • COSAN and LINEQS: The weight matrix W is displayed if GLS, WLS, or DWLS estimation is used and you specify the PWEIGHT or PALL option.

  • COSAN and LINEQS: General information about the estimation problem: number of observations (N), number of manifest variables (n), amount of independent information in the data matrix (information, n(n+1)/2), number of terms and matrices in the specified generalized COSAN model, and number of parameters to be estimated (parameters, t). If there are no exogenous manifest variables, the difference between the amount of independent information (n(n+1)/2) and the number of requested estimates (t) is equal to the degrees of freedom (df). A necessary condition for a model to be identified is that the degrees of freedom are nonnegative. The output, or at least the default output, is displayed if you specify the SIMPLE option.

  • COSAN and LINEQS: Mean and Std Dev (standard deviation) of each variable if you specify the SIMPLE option, as well as skewness and kurtosis if the DATA= data set is a raw data set and you specify the KURTOSIS option.

  • COSAN and LINEQS: Various coefficients of multivariate kurtosis and the numbers of observations that contribute most to the normalized multivariate kurtosis if the DATA= data set is a raw data set and you specify the KURTOSIS option, or at least the PRINT option. See the section Measures of Multivariate Kurtosis on page 658 for more information.

  • COSAN and LINEQS: Covariance or correlation matrix to be analyzed and the value of its determinant if you specify the output option PCORR or PALL. A 0 determinant indicates a singular data matrix. In this case, the generalized least-squares estimates with default weight matrix S and maximum likelihood estimates cannot be computed.

  • LINEQS: If exogenous manifest variables in the linear structural equation model are specified, then there is a one-to-one relationship between the given covariances and corresponding estimates in the central model matrix Φ or P. The output indicates which manifest variables are recognized as exogenous, that is, for which variables the entries in the central model matrix are set to fixed parameters. The output, or at least the default output, is displayed if you specify the PINITIAL option.

  • COSAN and LINEQS: Vector of parameter names, initial values, and corresponding matrix locations, also indicating dependent parameter names used in your program statements that are not allocated to matrix locations and have no influence on the fit function. The output, or at least the default output, is displayed if you specify the PINITIAL option.

  • COSAN and LINEQS: The pattern of variable and constant elements of the predicted moment matrix that is predetermined by the analysis model is displayed if there are significant differences between constant elements in the predicted model matrix and the data matrix and you specify at least the PSHORT option. It is also displayed if you specify the PREDET option. The output indicates the differences between constant values in the predicted model matrix and the data matrix that is analyzed.

  • COSAN and LINEQS: Special features of the optimization technique chosen if you specify at least the PSHORT option.

  • COSAN and LINEQS: Optimization history if at least the PSHORT option is specified. For more details, see the section Use of Optimization Techniques on page 664.

  • COSAN and LINEQS: Specific output requested by options in the NLOPTIONS statement; for example, parameter estimates, gradient, gradient of Lagrange function, constraints, Lagrange multipliers, projected gradient, Hessian, projected Hessian, Hessian of Lagrange function, Jacobian of nonlinear constraints.

  • COSAN and LINEQS: The predicted model matrix and its determinant, if you specify the output option PCORR or PALL.

  • COSAN and LINEQS: Residual and normalized residual matrix if you specify the RESIDUAL, or at least the PRINT option. The variance standardized or asymptotically standardized residual matrix can be displayed also. The average residual and the average off-diagonal residual are also displayed. See the section Assessment of Fit on page 649 for more details.

  • COSAN and LINEQS: Rank order of the largest normalized residuals if you specify the RESIDUAL, or at least the PRINT option.

  • COSAN and LINEQS: Bar chart of the normalized residuals if you specify the RESIDUAL, or at least the PRINT option.

  • COSAN and LINEQS: Value of the fit function F . See the section Estimation Criteria on page 644 for more details. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: Goodness-of-fit index (GFI), adjusted goodness-of-fit index (AGFI), and root mean square residual (RMR) (Jöreskog and Sörbom 1985). See the section Assessment of Fit on page 649 for more details. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: Parsimonious goodness-of-fit index (PGFI) of Mulaik et al. (1989). See the section Assessment of Fit on page 649 for more detail. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: Overall χ², df, and Prob>Chi**2 if the METHOD= option is not ULS or DWLS. The χ² measure is the optimum function value F multiplied by (N − 1) if a CORR or COV matrix is analyzed or multiplied by N if a UCORR or UCOV matrix is analyzed; χ² measures the likelihood ratio test statistic for the null hypothesis that the predicted matrix C has the specified model structure against the alternative that C is unconstrained. The notation Prob>Chi**2 means the probability under the null hypothesis of obtaining a greater χ² statistic than that observed. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: If METHOD= is not ULS or DWLS, the χ² value of the independence model and the corresponding degrees of freedom can be used (in large samples) to evaluate the gain of explanation by fitting the specific model (Bentler 1989). See the section Assessment of Fit on page 649 for more detail. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: If METHOD= is not ULS or DWLS, the value of the Steiger & Lind (1980) root mean squared error of approximation (RMSEA) coefficient and the lower and upper limits of the confidence interval. The size of the confidence interval is defined by the option ALPHARMS=α, 0 ≤ α ≤ 1. The default is α = 0.1, which corresponds to a 90% confidence interval. See the section Assessment of Fit on page 649 for more detail. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: If the value of the METHOD= option is not ULS or DWLS, the value of the probability of close fit (Browne and Cudeck 1993). See the section Assessment of Fit on page 649 for more detail. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: If the value of the METHOD= option is not ULS or DWLS, the value of the Browne & Cudeck (1993) expected cross validation (ECVI) index and the lower and upper limits of the confidence interval. The size of the confidence interval is defined by the option ALPHAECV=α, 0 ≤ α ≤ 1. The default is α = 0.1, which corresponds to a 90% confidence interval. See the section Assessment of Fit on page 649 for more detail. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: If the value of the METHOD= option is not ULS or DWLS, Bentler's (1989) Comparative Fit Index. See the section Assessment of Fit on page 649 for more detail. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: If you specify METHOD=ML or METHOD=GLS, the χ² value and corresponding probability adjusted by the relative kurtosis coefficient η₂, which should be a close approximation of the χ² value for elliptically distributed data (Browne 1982). See the section Assessment of Fit on page 649 for more detail. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: The Normal Theory Reweighted LS χ² Value is displayed if METHOD=ML. Instead of the function value F_ML, the reweighted goodness-of-fit function F_GWLS is used. See the section Assessment of Fit on page 649 for more detail.

  • COSAN and LINEQS: Akaike's Information Criterion if the value of the METHOD= option is not ULS or DWLS. See the section Assessment of Fit on page 649. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: Bozdogan's (1987) Consistent Information Criterion, CAIC. See the section Assessment of Fit on page 649. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: Schwarz's Bayesian Criterion (SBC) if the value of the METHOD= option is not ULS or DWLS (Schwarz 1978). See the section Assessment of Fit on page 649. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: If the value of the METHOD= option is not ULS or DWLS, the following fit indices based on the overall χ² value are displayed:

    • McDonald's (McDonald and Hartmann 1992) measure of centrality

    • Parsimonious index of James, Mulaik, and Brett (1982)

    • Z-Test of Wilson and Hilferty (1931)

    • Bentler and Bonett's (1980) nonnormed coefficient

    • Bentler and Bonett's (1980) normed coefficient

    • Bollen's (1986) normed index ρ₁

    • Bollen's (1989a) nonnormed index Δ₂

    See the section Assessment of Fit on page 649 for more detail. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: Hoelter's (1983) Critical N Index is displayed (Bollen 1989b, p. 277). See the section Assessment of Fit on page 649 for more detail. This output can be suppressed only by the NOPRINT option.

  • COSAN and LINEQS: Equations of linear dependencies among the parameters used in the model specification if the information matrix is recognized as singular at the final solution.

  • COSAN: Model matrices containing the parameter estimates. Except for ULS or DWLS estimates, the approximate standard errors and t values are also displayed. This output is displayed if you specify the PESTIM option or at least the PSHORT option.

  • LINEQS: Linear equations containing the parameter estimates. Except for ULS and DWLS estimates, the approximate standard errors and t values are also displayed. This output is displayed if you specify the PESTIM option, or at least the PSHORT option.

  • LINEQS: Variances and covariances of the exogenous variables. This output is displayed if you specify the PESTIM option, or at least the PSHORT option.

  • LINEQS: Linear equations containing the standardized parameter estimates. This output is displayed if you specify the PESTIM option, or at least the PSHORT option.

  • LINEQS: Table of correlations among the exogenous variables. This output is displayed if you specify the PESTIM option, or at least the PSHORT option.

  • LINEQS: Correlations among the exogenous variables. This output is displayed if you specify the PESTIM option, or at least the PSHORT option.

  • LINEQS: Squared Multiple Correlations table, which displays the error variances of the endogenous variables. These are the diagonal elements of the predicted model matrix. Also displayed are the Total Variance and the R² values corresponding to all endogenous variables. See the section Assessment of Fit on page 649 for more detail. This output is displayed if you specify the PESTIM option, or at least the PSHORT option.

  • LINEQS: If you specify the PDETERM or the PALL option, the total determination of all equations (DETAE), the total determination of the structural equations (DETSE), and the total determination of the manifest variables (DETMV) are displayed. See the section Assessment of Fit on page 649 for more details. If one of the determinants in the formulas is 0, the corresponding coefficient is displayed as a missing value. If there are structural equations, PROC CALIS also displays the Stability Coefficient of Reciprocal Causation, that is, the largest eigenvalue of the BB′ matrix, where B is the causal coefficient matrix of the structural equations.

  • LINEQS: The matrix of estimated covariances among the latent variables if you specify the PLATCOV option, or at least the PRINT option.

  • LINEQS: The matrix of estimated covariances between latent and manifest variables used in the model if you specify the PLATCOV option, or at least the PRINT option.

  • LINEQS and FACTOR: The matrix FSR of latent variable scores regression coefficients if you specify the PLATCOV option, or at least the PRINT option.

    The FSR matrix is a generalization of Lawley and Maxwell's (1971, p. 109) factor scores regression matrix,

        FSR = C_yx (C_xx)^(-1)

    where C_xx is the n × n predicted model matrix (predicted covariances among manifest variables) and C_yx is the n_lat × n matrix of the predicted covariances between latent and manifest variables. You can multiply the manifest observations by this matrix to estimate the scores of the latent variables used in your model.

  • LINEQS: The matrix TEF of total effects if you specify the TOTEFF option, or at least the PRINT option. (For the LISREL model, refer to Jöreskog and Sörbom 1985.) The matrix of indirect effects is displayed also.

  • FACTOR: The matrix of rotated factor loadings and the orthogonal transformation matrix if you specify the ROTATE= and PESTIM options, or at least the PSHORT option.

  • FACTOR: Standardized (rotated) factor loadings, variance estimates of endogenous variables, R 2 values, correlations among factors, and factor scores regression matrix, if you specify the PESTIM option, or at least the PSHORT option. The determination of manifest variables is displayed only if you specify the PDETERM option.

  • COSAN and LINEQS: Univariate Lagrange multiplier and Wald test indices are displayed in matrix form if you specify the MODIFICATION (or MOD) or the PALL option. Those matrix locations that correspond to constants in the model in general contain three values: the value of the Lagrange multiplier, the corresponding probability (df = 1), and the estimated change of the parameter value should the constant be changed to a parameter. If allowing the constant to be an estimated parameter would result in a singular information matrix, the string 'sing' is displayed instead of the Lagrange multiplier index. Those matrix locations that correspond to parameter estimates in the model contain the Wald test index and the name of the parameter in the model. See the section Modification Indices on page 673 for more detail.

  • COSAN and LINEQS: Univariate Lagrange multiplier test indices for releasing equality constraints if you specify the MODIFICATION (or MOD) or the PALL option. See the section Modification Indices on page 673 for more detail.

  • COSAN and LINEQS: Univariate Lagrange multiplier test indices for releasing active boundary constraints specified by the BOUNDS statement if you specify the MODIFICATION (or MOD) or the PALL option. See the section Modification Indices on page 673 for more detail.

  • COSAN and LINEQS: If the MODIFICATION (or MOD) or the PALL option is specified, the stepwise multivariate Wald test for constraining estimated parameters to zero constants is performed as long as the univariate probability is larger than the value specified in the PMW= option (default PMW=0.05). See the section Modification Indices on page 673 for more details.
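As a numerical illustration of the factor scores regression matrix FSR described in the list above (the matrix values here are hypothetical, not produced by PROC CALIS):

```python
import numpy as np

# Predicted covariances among 3 manifest variables (C_xx, n x n) and
# between 1 latent and the 3 manifest variables (C_yx, n_lat x n).
C_xx = np.array([[1.0, 0.5, 0.3],
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
C_yx = np.array([[0.6, 0.7, 0.5]])

# Factor scores regression coefficients: FSR = C_yx * inverse(C_xx)
FSR = C_yx @ np.linalg.inv(C_xx)

# Multiply (centered) manifest observations by FSR' to estimate latent scores:
obs = np.array([[1.0, -0.5, 0.2]])
scores = obs @ FSR.T
print(FSR.shape, scores.shape)  # (1, 3) (1, 1)
```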

ODS Table Names

PROC CALIS assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, Using the Output Delivery System.

Table 19.15: ODS Tables Created in PROC CALIS

ODS Table Name

Model [1]

Description

Option [2]

AddParms

C, F, L, R

Additional parameters in the PARAMETERS statement

PINITIAL, or default

AsymStdRes

C, F, L, R

Asymptotically standardized residual matrix

RESIDUAL=, or PRINT

AveAsymStdRes

C, F, L, R

Average absolute asymptotically standardized residuals

RESIDUAL=, or PRINT

AveNormRes

C, F, L, R

ODS Table Name | Model [1] | Description | Option [2]
 |  | Average absolute normalized residuals | RESIDUAL=, or PRINT
AveRawRes | C, F, L, R | Average absolute raw residuals | RESIDUAL=, or PRINT
AveVarStdRes | C, F, L, R | Average absolute variance standardized residuals | RESIDUAL=, or PRINT
ContKurtosis | C, F, L, R | Contributions to kurtosis | KURTOSIS, or PRINT
ConvergenceStatus | C, F, L, R | Convergence status | PSHORT
CorrExog | L | Correlations among exogenous variables | PESTIM, or PSHORT
CorrParm | C, F, L, R | Correlations among parameter estimates | PCOVES, and default
CovMat | C, F, L, R | Assorted covariance matrices | PCOVES, and default
DependParms | C, F, L, R | Dependent parameters (if specified by program statements) | PRIVEC, and default
Determination | L, F, R | Coefficients of determination | PDETERM, and default
DistAsymStdRes | C, F, L, R | Distribution of asymptotically standardized residuals | RESIDUAL=, or PRINT
DistNormRes | C, F, L, R | Distribution of normalized residuals | RESIDUAL=, or PRINT
DistVarStdRes | C, F, L, R | Distribution of variance standardized residuals | RESIDUAL=, or PRINT
EndogenousVar | L | Endogenous variables | PESTIM, or PSHORT
EstCovExog | L | Estimated covariances among exogenous variables | PESTIM, or PSHORT
Estimates | C, F, L, R | Vector of estimates | PRIVEC
EstLatentEq | L | Estimated latent variable equations | PESTIM, or PSHORT
EstManifestEq | L | Estimated manifest variable equations | PESTIM, or PSHORT
EstParms | C, F | Estimated parameter matrix | PESTIM, or PSHORT
EstVarExog | L | Estimated variances of exogenous variables | PESTIM, or PSHORT
ExogenousVar | L | List of exogenous variables | PESTIM, or PSHORT
FactCorrExog | F | Correlations among factors | PESTIM, or PSHORT
FactScoreCoef | F | Factor score regression coefficients | PESTIM, or PSHORT
Fit | C, F, L, R | Fit statistics | PSUMMARY
GenModInfo | C, F, L, R | General modeling information | PSIMPLE, or default
Gradient | C, F, L, R | First partial derivatives (gradient) | PRIVEC, and default
InCorr | C, F, L, R | Input correlation matrix | PCORR, or PALL
InCorrDet | C, F, L, R | Determinant of the input correlation matrix | PCORR, or PALL
InCov | C, F, L, R | Input covariance matrix | PCORR, or PALL
InCovDet | C, F, L, R | Determinant of the input covariance matrix | PCORR, or PALL
InCovExog | L | Input covariances among exogenous variables | PESTIM, or PSHORT
IndirectEffects | L, R | Indirect effects | TOTEFF, or PRINT
Information | C, F, L, R | Information matrix | PCOVES, and default
InitEstimates | C, F, L, R | Initial vector of parameter estimates | PINITIAL, or default
InitParms | C, F | Initial matrix of parameter estimates | PINITIAL, or default
InitParms | L, R | Initial matrix of parameter estimates | PRIMAT, and default
InitRAMEstimates | R | Initial RAM estimates | PESTIM, or PSHORT
InLatentEq | L | Input latent variable equations | PESTIM, or PSHORT
InManifestEq | L | Input manifest variable equations | PESTIM, or PSHORT
InSymmetric | C, F, L, R | Input symmetric matrix (SYMATRIX data type) | PCORR, or PALL
InVarExog | L | Input variances of exogenous variables | PESTIM, or PSHORT
IterHist | C, F, L, R | Iteration history | PSHORT
IterStart | C, F, L, R | Iteration start | PSHORT
IterStop | C, F, L, R | Iteration stop | PSHORT
Jacobian | C, F, L, R | Jacobi column pattern | PJACPAT
Kurtosis | C, F, L, R | Kurtosis, with raw data input | KURTOSIS, or PRINT
LagrangeBoundary | C, F, L, R | Lagrange, releasing active boundary constraints | MODIFICATION [3], or PALL
LagrangeEquality | C, F, L, R | Lagrange, releasing equality constraints | MODIFICATION, or PALL
LatentScoreCoef | L, R | Latent variable regression score coefficients | PLATCOV, or PRINT
ModelStatement | C, F, L, R | Model summary | PSHORT
ModIndices | C, F, L, R | Lagrange multiplier and Wald test statistics | MODIFICATION, or PALL
NormRes | C, F, L, R | Normalized residual matrix | RESIDUAL=, or PRINT
PredetElements | C, F, L, R | Predetermined elements | PREDET, or PALL
PredModel | C, F, L, R | Predicted model matrix | PCORR, or PALL
PredModelDet | C, F, L, R | Predicted model determinant | PCORR, or PALL
PredMomentLatent | L, R | Predicted latent variable moments | PLATCOV, or PRINT
PredMomentManLat | L, R | Predicted manifest and latent variable moments | PLATCOV, or PRINT
ProblemDescription | C, F, L, R | Problem description | PSHORT
RAMCorrExog | R | Correlations among exogenous variables | PESTIM, or PSHORT
RAMEstimates | R | RAM final estimates | PESTIM, or PSHORT
RAMStdEstimates | R | Standardized estimates | PESTIM, or PSHORT
RankAsymStdRes | C, F, L, R | Ranking of the largest asymptotically standardized residuals | RESIDUAL=, or PRINT
RankLagrange | C, F, L, R | Ranking of the largest Lagrange indices | RESIDUAL=, or PRINT
RankNormRes | C, F, L, R | Ranking of the largest normalized residuals | RESIDUAL=, or PRINT
RankRawRes | C, F, L, R | Ranking of the largest raw residuals | RESIDUAL=, or PRINT
RankVarStdRes | C, F, L, R | Ranking of the largest variance standardized residuals | RESIDUAL=, or PRINT
RawRes | C, F, L, R | Raw residual matrix | RESIDUAL=, or PRINT
RotatedLoadings | F | Rotated loadings, with ROTATE= option in FACTOR statement | PESTIM, or PSHORT
Rotation | F | Rotation matrix, with ROTATE= option in FACTOR statement | PESTIM, or PSHORT
SetCovExog | L, R | Set covariance parameters for manifest exogenous variables | PINITIAL, or default
SimpleStatistics | C, F, L, R | Simple statistics, with raw data input | SIMPLE, or default
SqMultCorr | F, L, R | Squared multiple correlations | PESTIM, or PSHORT
Stability | L, R | Stability of reciprocal causation | PDETERM, and default
StdErrs | C, F, L, R | Vector of standard errors | PRIVEC, and default
StdLatentEq | L | Standardized latent variable equations | PESTIM, or PSHORT
StdLoadings | F | Standardized factor loadings | PESTIM, or PSHORT
StdManifestEq | L | Standardized manifest variable equations | PESTIM, or PSHORT
StructEq | L, R | Variables in the structural equations | PDETERM, and default
SumSqDif | C, F, L, R | Sum of squared differences of predetermined elements | PREDET, or PALL
TotalEffects | L, R | Total effects | TOTEFF, or PRINT
tValues | C, F, L, R | Vector of t values | PRIVEC, and default
VarSelection | L, R | Manifest variables, if not all are used, selected for modeling | default
VarStdRes | C, F, L, R | Variance standardized residual matrix | RESIDUAL=, or PRINT
WaldTest | C, F, L, R | Wald test | MODIFICATION, or PALL
Weights | C, F, L, R | Weight matrix | PWEIGHT [4], or PALL
WeightsDet | C, F, L, R | Determinant of the weight matrix | PWEIGHT [4], or PALL

[1] Most CALIS output tables are specific to the model statement used. Keys: C: COSAN model, F: FACTOR model, L: LINEQS model, R: RAM model.

[2] The printing options PALL, PRINT, default, PSHORT, and PSUMMARY form hierarchical levels of output control: PALL includes all the output enabled by the options at the lower levels, and so on. The default option means that NOPRINT is not specified. Therefore, in the table, if PSHORT is the printing option for an output, then PALL, PRINT, or default also enables that output.

[3] The printing of LagrangeBoundary is effective only if you have set some boundary constraints for parameters.

[4] The printing of Weights or WeightsDet is effective only if your estimation method uses the weight matrix (for example, GLS, WLS, or DWLS).
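Because each name in the first column of the table is an ODS table name, you can use it to subset the displayed output with an ODS SELECT statement. As a sketch, assuming the DATA=COV and INRAM=MOD data sets from the earlier example, the following step prints only the Fit and ModIndices tables (the MODIFICATION option is specified so that the ModIndices table is produced):

  * display only fit statistics and modification indices;
  ods select Fit ModIndices;
  proc calis data=cov inram=mod modification;
  run;
  ods select all;

The final ODS SELECT ALL statement restores the default selection for subsequent steps.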




SAS/STAT 9.1 User's Guide, Volumes 1-7
ISBN: 1590472438
Year: 2004