WHAT IS STRUCTURAL EQUATION MODELING?


Structural equation modeling (SEM) encompasses an entire family of models known by many names, among them covariance structure analysis, latent variable analysis, confirmatory factor analysis, and often simply LISREL analysis (the name of one of the more popular software packages). Resulting from an evolution of multi-equation modeling developed principally in econometrics and merged with the principles of measurement from psychology and sociology, SEM has emerged as an integral tool in both managerial and academic research (Austin and Calderon, 1996; Bagozzi and Yi, 1988; Bentler, 1980; Breckler, 1990; Dolan, 1996; Duncan, 1975; Fan, 1997; Hatcher, 1996; Hox, 1995; Joreskog and Sorbom, 1993a; Marsh and Hocevar, 1994; McDonald and Marsh, 1990; Neale et al., 1989; O'Brien and Reilly, 1995; Pedhazur and Schmelkin, 1992; Rigdon, 1996; Robles, 1996; Rubio and Gillespie, 1995; Steenkamp and van Trijp, 1991; Tremblay and Gardner, 1996). It can also be used as a means of estimating other multivariate models, including regression, principal components (Dolan, 1996), canonical correlation (Fan, 1997), and even MANOVA (Bagozzi, 1988).

As might be expected for a technique with such widespread use and so many variations in applications, many researchers are uncertain about what constitutes structural equation modeling. Yet all SEM techniques are distinguished by two characteristics: estimation of multiple and interrelated dependence relationships, and the ability to represent unobserved concepts in these relationships and account for measurement error in the estimation process.

ACCOMMODATING MULTIPLE INTERRELATED DEPENDENCE RELATIONSHIPS

The most obvious difference between SEM and other multivariate techniques is the use of separate relationships for each of a set of dependent variables. In simple terms, SEM estimates a series of separate, but interdependent, multiple regression equations simultaneously by specifying the structural model used by the statistical program. First, the experimenter draws upon theory, prior experience, and the research objectives to distinguish which independent variables predict each dependent variable. For example, we may first want to predict "car" image. We then may want to use "car" image to predict satisfaction, both of which in turn may be used to predict "car" loyalty. Thus, some dependent variables become independent variables in subsequent relationships, giving rise to the interdependent nature of the structural model. Moreover, many of the same variables affect each of the dependent variables, but with differing effects. The structural model expresses these relationships among independent and dependent variables, even when a dependent variable becomes an independent variable in other relationships.

The proposed relationships are then translated into a series of structural equations (similar to regression equations) for each dependent variable, as illustrated below. This feature sets SEM apart from techniques discussed previously that accommodate multiple dependent variables (multivariate analysis of variance and canonical correlation) in that those techniques allow only a single relationship between dependent and independent variables.
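For instance, the car example above might be translated into a system such as the following, where the variable names and coefficients are purely illustrative:

    Image        = b1(Price) + b2(Styling) + e1
    Satisfaction = b3(Image) + b4(Price) + e2
    Loyalty      = b5(Image) + b6(Satisfaction) + e3

Note that Image, a dependent variable in the first equation, serves as an independent variable in the second and third equations, illustrating the interdependent nature of the structural model.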

INCORPORATING VARIABLES THAT WE DO NOT MEASURE DIRECTLY

The estimation of multiple interrelated dependence relationships is not the only unique element of structural equation modeling. SEM also has the ability to incorporate latent variables into the analysis. A latent variable is a hypothesized and unobserved concept that can only be approximated by observable or measurable variables. The observed variables, which we gather from respondents through various data collection methods (e.g., surveys, tests, observations), are known as manifest variables. Yet why would we want to use a latent variable that we did not measure instead of the exact data (manifest variables) the respondents provided? Although this may sound like a nonsensical or "black box" approach, it has both practical and theoretical justification by improving statistical estimation, better representing theoretical concepts, and accounting for measurement error.

IMPROVING STATISTICAL ESTIMATION

Statistical theory tells us that a regression coefficient is actually composed of two elements: the "true" or structural coefficient between the dependent and independent variable and the reliability of the predictor variable. Reliability is the degree to which the independent variable is "error-free" (Blalock, 1982). In all the multivariate techniques to this point, we have assumed we had no error in our variables. But we know from both practical and theoretical perspectives that we cannot perfectly measure a concept and that there is always some degree of measurement error. For example, when asking about something as straightforward as household income, we know some people will answer incorrectly, either overstating or understating the amount or not knowing it precisely. The answers provided have some measurement error and thus affect the estimation of the "true" structural coefficient (Rigdon, 1994).

The impact of measurement error (and the corresponding lowered reliability) can be shown from an expression of the regression coefficient as

β_yx = β_s × ρ_x

where β_yx is the observed regression coefficient, β_s is the "true" structural coefficient, and ρ_x is the reliability of the predictor variable. Unless the reliability is 100 percent, the observed coefficient will always understate the "true" relationship. Because all dependence relationships are based on the observed correlation (and resulting regression coefficient) between variables, we would hope to "strengthen" the correlations used in the dependence models and make them more accurate estimates of the structural coefficients by first accounting for the correlation attributable to any number of measurement problems.
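As a minimal numeric sketch of this attenuation (all values are hypothetical), the observed coefficient can be computed directly from the structural coefficient and the predictor's reliability:

    # Attenuation of an observed regression coefficient by unreliability.
    # All values are hypothetical, for illustration only.
    true_coefficient = 0.60   # beta_s, the "true" structural coefficient
    reliability = 0.80        # rho_x, reliability of the predictor variable

    observed_coefficient = true_coefficient * reliability
    print(observed_coefficient)  # approximately 0.48; understates the true 0.60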

OVERALL GOODNESS-OF-FIT MEASURES FOR STRUCTURAL EQUATION MODELING

Assessing the overall goodness-of-fit for structural equation models is not as straightforward as with other multivariate dependence techniques such as multiple regression, discriminant analysis, multivariate analysis of variance, or even conjoint analysis. SEM has no single statistical test that best describes the "strength" of the model's predictions. Instead, experimenters have developed a number of goodness-of-fit measures that, when used in combination, assess the results from three perspectives: overall fit, comparative fit to a base model, and model parsimony. The discussions that follow present alternative measures for each of these perspectives, along with the methods of calculation for those measures that are not contained in the results and must be computed separately.

One common question arises in the discussion of each measure: What is an acceptable level of fit? None of the measures (except the chi-square statistic) has an associated statistical test. Although in many instances guidelines have been suggested, no absolute test is available, and the experimenter must ultimately decide whether the fit is acceptable. Bollen (1989, p. 275) addresses this issue directly: "Overall, selecting a rigid cutoff for the incremental fit indices is like selecting a minimum R² for a regression equation. Any value will be controversial. Awareness of the factors affecting the values and good judgment are the best guides to evaluating their size." This advice applies equally well to the other goodness-of-fit measures.

Before examining the various goodness-of-fit measures, it may be useful to review the derivation of degrees of freedom in structural models. The number of unique data values in the input matrix is s (where s = 1/2(k)(k + 1) and k is the total number of indicators for both endogenous and exogenous constructs). The degrees of freedom (df) for any estimated model are then calculated as df = s - t, where t is the number of estimated coefficients. If the experimenter knows the df for an estimated model and the total number of indicators, then t can be calculated directly as t = s - df.
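A short sketch of this bookkeeping (the indicator and coefficient counts are hypothetical):

    # Degrees of freedom for an estimated structural model.
    k = 12                  # total number of indicators (hypothetical)
    s = k * (k + 1) // 2    # unique values in the input matrix
    t = 30                  # number of estimated coefficients (hypothetical)
    df = s - t
    print(s, df)            # 78 unique values, 48 degrees of freedom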

The examination and derivation of goodness-of-fit measures for SEM has gained widespread interest among academic researchers in recent years, resulting in the continual development of new goodness-of-fit measures (Ding et al., 1995; Rigdon, 1994, 1996; Tanaka, 1993; Satorra and Bentler, 1994; and others). This is reflected in the statistical programs, which are continually modified to provide the most relevant information regarding the estimated model. In this discussion, we focus on the LISREL program because of its widespread application; it has undergone these changes as well. The newest version of LISREL substantially expands the number and type of fit indices available directly in the output. For this reason, the following discussion and example data detail the calculations of those measures not provided in earlier versions of the program.

MEASURES OF ABSOLUTE FIT

Absolute fit measures determine the degree to which the overall model (structural and measurement models) predicts the observed covariance or correlation matrix. No distinction is made as to whether the model fit is better or worse in the structural or measurement models. Among the absolute fit measures commonly used to evaluate SEM are the chi-square statistic, the noncentrality parameter, the goodness-of-fit index, the root mean square residual, the root mean square error of approximation, and the expected cross-validation index.

Likelihood-Ratio Chi-Square Statistic

The most fundamental measure of overall fit is the likelihood-ratio chi-square (χ²) statistic, the only statistically based measure of goodness-of-fit available in SEM (Joreskog and Sorbom, 1993b). A large value of chi-square relative to the degrees of freedom signifies that the observed and estimated matrices differ considerably. Statistical significance levels indicate the probability that these differences are due solely to sampling variations. Thus, low chi-square values, which result in significance levels greater than .05 or .01, indicate that the actual and predicted input matrices are not statistically different. In this instance, the experimenter is looking for nonsignificant differences because the test is between actual and predicted matrices; this differs from the customary desire to find statistical significance. However, even statistical nonsignificance does not guarantee that the "correct" model has been identified, but only that the proposed model fits the observed covariances and correlations well. It does not assure the experimenter that another model would not fit as well or better. The .05 significance level is recommended as the minimum accepted, and levels of .1 or .2 should be exceeded before nonsignificance is confirmed (Fornell, 1983).
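As a minimal sketch of this significance test (the chi-square value and degrees of freedom are hypothetical), the probability can be obtained from the chi-square distribution:

    from scipy.stats import chi2

    # Hypothetical model results.
    chi_square = 52.3
    df = 48

    # Probability of a chi-square this large if the model is correct.
    p_value = chi2.sf(chi_square, df)
    print(p_value)  # greater than .05 here, so the actual and predicted
                    # input matrices are not statistically different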

An important criticism of the chi-square measure is that it is too sensitive to sample size differences, especially for cases in which the sample size exceeds 200 respondents. As sample size increases, this measure has a greater tendency to indicate significant differences for equivalent models. If the sample size becomes large enough, significant differences will be found for any specified model. Moreover, as the sample size nears 100 or goes even lower, the chi-square test will show acceptable fit (nonsignificant differences in the predicted and observed input matrices), even when none of the model relationships is shown to be statistically significant. Thus, the chi-square statistic is quite sensitive in different ways to both small and large sample sizes, and the experimenter is encouraged to complement this measure with other measures of fit in all instances. The use of chi-square is appropriate for sample sizes between 100 and 200, with the significance test becoming less reliable with sample sizes outside this range.

The sensitivity of the chi-square measure extends past sample size considerations. For example, it has been shown that this measure varies based on the number of categories in the response variable (Green et al., 1997). Given its sensitivity to many factors, the researcher is encouraged to complement the chi-square measure with other goodness-of-fit measures.

Noncentrality and Scaled Noncentrality Parameters

The noncentrality parameter (NCP) is the result of statisticians' search for an alternative measure to the likelihood-ratio chi-square statistic that is less affected by or independent of the sample size. Statistical theory suggests that a noncentrality chi-square measure will be less affected by sample size in its representation of the differences between the actual and estimated data matrices (McDonald and Marsh, 1990). In a LISREL problem, the noncentrality parameter can be calculated as:

NCP = χ² - Degrees of freedom

Although this measure adjusts the chi-square by the degrees of freedom of the estimated model, it is still in terms of the original sample size. To "standardize" the NCP, divide it by the sample size to obtain the scaled noncentrality parameter (SNCP) (McDonald and Marsh, 1990). This can be calculated as

SNCP = (χ² - Degrees of freedom)/Sample size

This scaled measure is analogous to the average squared Euclidean distance measure between the estimated model and the unrestricted model (McDonald and Marsh, 1990). For both the unscaled and the scaled parameters, the objective is to minimize the parameter value. Because there is no statistical test for this measure, it is best used in making comparisons between alternative models.
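Both parameters follow directly from the quantities already computed (the inputs here are hypothetical):

    # Noncentrality and scaled noncentrality parameters.
    chi_square = 52.3   # hypothetical
    df = 48
    n = 200             # sample size

    ncp = chi_square - df
    sncp = ncp / n
    print(ncp, sncp)    # approximately 4.3 and 0.0215; smaller values indicate better fit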

Goodness-of-Fit Index

The goodness-of-fit index (GFI) (Joreskog and Sorbom, 1988b, 1993a) is another measure provided by LISREL. It is a nonstatistical measure ranging in value from 0 (poor fit) to 1.0 (perfect fit). It represents the overall degree of fit (the squared residuals from prediction compared with the actual data), but it is not adjusted for the degrees of freedom. Higher values indicate better fit, but no absolute threshold levels for acceptability have been established.

Root Mean Square Residual (RMSR)

The root mean square residual is the square root of the mean of the squared residuals, an average of the residuals between observed and estimated input matrices. If covariances are used, the RMSR is the average residual covariance. If a correlation matrix is used, then the RMSR is in terms of an average residual correlation. The RMSR is more useful for correlations, which are all on the same scale, than for covariances, which may differ from variable to variable depending on unit of measure. Again, no threshold level can be established, but the experimenter can assess the practical significance of the magnitude of the RMSR in light of the research objectives and the observed or actual covariances or correlations (Bagozzi and Yi, 1988).
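A minimal sketch of the RMSR computation, assuming a small hypothetical residual matrix (observed minus estimated correlations) and averaging over its unique lower-triangular elements:

    import numpy as np

    # Hypothetical residual matrix: observed minus estimated correlations.
    residuals = np.array([[ 0.00, 0.03, -0.02],
                          [ 0.03, 0.00,  0.05],
                          [-0.02, 0.05,  0.00]])

    # Average the squared unique (lower-triangular) elements, then take the root.
    unique = residuals[np.tril_indices_from(residuals)]
    rmsr = np.sqrt(np.mean(unique ** 2))
    print(rmsr)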

Root Mean Square Error of Approximation

Another measure that attempts to correct for the tendency of the chi-square statistic to reject any specified model with a sufficiently large sample is the root mean square error of approximation (RMSEA). Similar to the RMSR, the RMSEA is the discrepancy per degree of freedom. It differs from the RMSR, however, in that the discrepancy is measured in terms of the population, not just the sample used for estimation (Steiger, 1990). The value is representative of the goodness-of-fit that could be expected if the model were estimated in the population, not just the sample drawn for the estimation. Values ranging from .05 to .08 are deemed acceptable. An empirical examination of several measures found that the RMSEA was best suited to use in a confirmatory or competing models strategy with larger samples (Rigdon, 1996).
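The text does not give a computing formula for the RMSEA; one commonly used form (stated here as an assumption) is RMSEA = sqrt(max(χ² - df, 0) / (df × (n - 1))), sketched below with hypothetical values:

    import math

    # Hypothetical inputs.
    chi_square, df, n = 52.3, 48, 200

    # Population discrepancy per degree of freedom (assumed formula).
    rmsea = math.sqrt(max(chi_square - df, 0) / (df * (n - 1)))
    print(rmsea)  # the text deems values of roughly .05 to .08 acceptable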

Expected Cross-Validation Index

The expected cross-validation index (ECVI) is an approximation of the goodness-of-fit the estimated model would achieve in another sample of the same size. Based on the sample covariance matrix, it takes into account the actual sample size and the difference that could be expected in another sample. The ECVI also takes into account the number of estimated parameters for both the structural and measurement models. The ECVI is calculated as

ECVI = χ²/(Sample size - 1) + (2 × Number of estimated parameters)/(Sample size - 1)

The ECVI has no specified range of acceptable values, but it is used in comparisons between alternative models.
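A direct translation of the formula (the inputs are hypothetical):

    # Expected cross-validation index.
    chi_square = 52.3   # hypothetical
    n = 200             # sample size
    q = 30              # number of estimated parameters

    ecvi = chi_square / (n - 1) + (2 * q) / (n - 1)
    print(ecvi)         # approximately 0.56; lower values are preferred
                        # when comparing alternative models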

Cross-Validation Index

The cross-validation index (CVI) assesses goodness-of-fit when an actual cross-validation has been performed. Cross-validation is performed in two steps. First, the overall sample is split into two samples: an estimation sample and a validation sample. The estimation sample is used to estimate a model and create the estimated correlation or covariance matrix. This matrix is then compared to the data matrix from the validation sample. A double cross-validation process can be performed by comparing the estimated correlation or covariance matrix from each sample to a data matrix from the other sample.
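A schematic sketch of the splitting step. For illustration, the estimation sample's own covariance matrix stands in for the model-implied matrix an SEM program would produce, and a simple sum of squared differences stands in for the fit function; both simplifications are assumptions:

    import numpy as np

    # Hypothetical data: rows are respondents, columns are indicators.
    rng = np.random.default_rng(0)
    data = rng.normal(size=(400, 6))

    # Step 1: split into estimation and validation samples.
    estimation, validation = data[:200], data[200:]

    # Step 2: compare the matrix implied by the model fitted to the
    # estimation sample with the validation sample's covariance matrix.
    implied = np.cov(estimation, rowvar=False)        # stand-in for model-implied matrix
    validation_cov = np.cov(validation, rowvar=False)
    discrepancy = np.sum((implied - validation_cov) ** 2)
    print(discrepancy)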

INCREMENTAL FIT MEASURES

The second class of measures compares the proposed model to some baseline model, most often referred to as the null model. The null model should be some realistic model that all other models should be expected to exceed. In most cases, the null model is a single-construct model with all indicators perfectly measuring the construct (i.e., this represents the chi-square value associated with the total variance in the set of correlations or covariances). There is, however, some disagreement over exactly how to specify the null model in many situations (Sobel and Bohrnstedt, 1985).

Adjusted Goodness-of-Fit Index

The adjusted goodness-of-fit index (AGFI) is an extension of the GFI, adjusted by the ratio of the degrees of freedom for the proposed model to the degrees of freedom for the null model. It is quite similar to the parsimonious normed fit index, and a recommended acceptance level is a value greater than or equal to .90.

Tucker-Lewis Index

The next incremental fit measure is the Tucker-Lewis index (TLI) (Tucker and Lewis, 1973), also known as the nonnormed fit index (NNFI). First proposed as a means of evaluating factor analysis, the TLI has been extended to SEM. It combines a measure of parsimony into a comparative index between the proposed and null models, resulting in values ranging from 0 to 1.0. It is expressed as:

TLI = [(χ²_null/df_null) - (χ²_proposed/df_proposed)] / [(χ²_null/df_null) - 1]

A recommended value of TLI is .90 or greater. This measure can also be used for comparing between alternative models by substituting the alternative model for the null model.
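A direct translation of the TLI formula (all chi-square values and degrees of freedom are hypothetical):

    # Tucker-Lewis index.
    chi_null, df_null = 820.0, 66   # null (baseline) model
    chi_prop, df_prop = 52.3, 48    # proposed model

    null_ratio = chi_null / df_null
    tli = (null_ratio - chi_prop / df_prop) / (null_ratio - 1)
    print(tli)  # approximately 0.99; .90 or greater is recommended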

Normed Fit Index

One of the more popular measures is the normed fit index (NFI) (Bentler and Bonett, 1980), which is a measure ranging from 0 (no fit at all) to 1.0 (perfect fit). Again, the NFI is a relative comparison of the proposed model to the null model. The NFI is calculated as:

NFI = (χ²_null - χ²_proposed)/χ²_null

As with the Tucker-Lewis index, there is no absolute value indicating an acceptable level of fit, but a commonly recommended value is .90 or greater.
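The NFI is equally simple to compute (hypothetical values, reused from the sketch above):

    # Normed fit index.
    chi_null, chi_prop = 820.0, 52.3

    nfi = (chi_null - chi_prop) / chi_null
    print(nfi)  # approximately 0.94; .90 or greater is commonly recommended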

Other Incremental Fit Measures

A number of other incremental fit measures have been proposed, and the newer version of LISREL includes three in its output. The relative fit index (RFI), the incremental fit index (IFI), and the comparative fit index (CFI) all represent comparisons between the estimated model and a null or independence model. The values lie between 0 and 1.0, and larger values indicate higher levels of goodness-of-fit. The CFI has been found to be more appropriate in a model development strategy or when a smaller sample is available (Rigdon, 1996). The interested reader can find the specific details of each measure in selected readings (Bollen, 1986 and 1989; Bentler, 1990).

PARSIMONIOUS FIT MEASURES

Parsimonious fit measures relate the goodness-of-fit of the model to the number of estimated coefficients required to achieve this level of fit. Their basic objective is to diagnose whether model fit has been achieved by "overfitting" the data with too many coefficients. This procedure is similar to the "adjustment" of the R² in multiple regression. However, because no statistical test is available for these measures, their use in an absolute sense is limited in most instances to comparisons between models.

Parsimonious Normed Fit Index

The first measure in this case is the parsimonious normed fit index (PNFI) (James et al., 1982), a modification of the NFI. The PNFI takes into account the number of degrees of freedom used to achieve a level of fit. Parsimony is defined as achieving higher degrees of fit per degree of freedom used (one degree of freedom per estimated coefficient). Thus more parsimony is desirable. The PNFI is defined as:

PNFI = (df_proposed/df_null) × NFI

Higher values of PNFI are better, and its principal use is for the comparison of models with differing degrees of freedom. It is used to compare alternative models, and there are no recommended levels of acceptable fit. However, when comparing between models, differences of .06 to .09 are proposed to be indicative of substantial model differences (Williams and Holahan, 1994).
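A direct translation (hypothetical inputs, reusing the NFI computed above):

    # Parsimonious normed fit index.
    df_prop, df_null = 48, 66
    nfi = 0.936

    pnfi = (df_prop / df_null) * nfi
    print(pnfi)  # approximately 0.68; used to compare models with differing df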

Parsimonious Goodness-of-Fit Index

The parsimonious goodness-of-fit index (PGFI) modifies the GFI differently from the AGFI. Where the AGFI's adjustment of the GFI was based on the degrees of freedom in the estimated and null models, the PGFI is based on the parsimony of the estimated model. It adjusts the GFI in the following manner:

PGFI = [df_proposed / (1/2 × (No. of manifest variables)(No. of manifest variables + 1))] × GFI

The value varies between 0 and 1.0, with higher values indicating greater model parsimony.
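A direct translation of the adjustment (the inputs are hypothetical):

    # Parsimonious goodness-of-fit index.
    df_prop = 48
    k = 12          # number of manifest variables
    gfi = 0.95      # hypothetical GFI

    pgfi = (df_prop / (k * (k + 1) / 2)) * gfi
    print(pgfi)     # approximately 0.58; higher values indicate greater parsimony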

Normed Chi-Square

Joreskog (1970) proposed that the chi-square be "adjusted" by the degrees of freedom to assess model fit for various models. This measure can be termed the normed chi-square and is the ratio of the chi-square divided by the degrees of freedom. This measure provides two ways to assess inappropriate models: (1) a model that may be "overfitted," thereby capitalizing on chance, typified by values less than 1.0; and (2) models that are not yet truly representative of the observed data and thus need improvement, having values greater than an upper threshold, either 2.0 or 3.0 (Carmines and McIver, 1981) or the more liberal limit of 5.0 (Joreskog, 1970). However, because the chi-square value is the major component of this measure, it is subject to the sample size effects discussed earlier with regard to the chi-square statistic.

The normed chi-square has been shown to be somewhat unreliable (Hayduk, 1987; Wheaton, 1987), so experimenters should always combine it with other goodness-of-fit measures.
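The computation and the threshold checks described above, with hypothetical inputs:

    # Normed chi-square.
    chi_square, df = 52.3, 48

    normed = chi_square / df
    if normed < 1.0:
        print("possible overfitting:", normed)
    elif normed > 3.0:          # or the more liberal limit of 5.0
        print("model may need improvement:", normed)
    else:
        print("within the acceptable range:", normed)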

Akaike Information Criterion

Another measure based on statistical information theory is the Akaike information criterion (AIC; Akaike, 1987). Similar to the PNFI, the AIC is a comparative measure between models with differing numbers of constructs. The AIC is calculated as:

AIC = χ² + 2 × Number of estimated parameters

AIC values closer to zero indicate better fit and greater parsimony. A small AIC generally occurs when small chi-square values are achieved with fewer estimated coefficients. This shows not only a good fit of observed versus predicted covariances or correlations but also a model not prone to "overfitting."
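A direct translation (the inputs are hypothetical):

    # Akaike information criterion, in the SEM form given above.
    chi_square = 52.3   # hypothetical
    q = 30              # number of estimated parameters

    aic = chi_square + 2 * q
    print(aic)          # 112.3; values closer to zero indicate better fit and parsimony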



