LIMITATIONS | Data Mining: Opportunities and Challenges

Chapter X - Maximum Performance Efficiency Approaches for Estimating Best Practice Costs
Data Mining: Opportunities and Challenges
by John Wang (ed)
Idea Group Publishing 2003


Limitations may be discussed both for the new estimation technique itself and for its application to the present context and data. To parallel existing OLS theory for model aptness testing more fully, attention should be given to potential outliers, to the independence of the v_j, and to the constancy of the distribution of the v_j from trial to trial (analogous to homoscedasticity in OLS theory; see, for example, Madansky, 1988, and Neter, Wasserman & Kutner, 1985). Theoretical developments addressing these issues are not yet available for the MPE model.

Hypothesis tests and confidence intervals for the estimates do not appear to be readily derivable from the proposed approach. However, information on their variances can be obtained by simulation under additional, more specific assumptions. As an illustration, 100 data sets of 62 observations each were simulated as follows. First, a value of v_j was generated from the density model (4.2) using the estimates of p, α, and β. Then a vector, Y_rj, was generated according to the uniform distribution on the convex polytope where a_r* is given by (3.4). Finally, the MPE model, (2.3) through (2.5), was solved for each data set, and descriptive statistics for the estimates were obtained. Additional details on the simulation steps are given in the Appendix. The results are shown in Table 3.

Table 3: Descriptive statistics estimated from 100 simulated data sets
Statistic	â₁*	â₂*	â₃*	â₄*
Mean	0.2687	0.0510	0.1446	0.1412
Standard Deviation	0.0625	0.0131	0.0314	0.0297
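
The replication scheme behind Table 3 can be organized as a generic loop: simulate a data set, fit the model, and summarize each coefficient's estimates across replications. The following is a minimal standard-library sketch under stated assumptions: `simulate_data` and `fit_model` are hypothetical callables standing in for the chapter's data-generation steps and for solving the MPE model (2.3) through (2.5), neither of which is reproduced here; the toy stand-ins merely exercise the loop; and the standard deviation uses the n − 1 denominator, since the chapter does not say which was used.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def summarize(estimates: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Column-wise mean and sample standard deviation of replicated estimate vectors."""
    columns = list(zip(*estimates))  # one column per coefficient, e.g. a_1*, ..., a_4*
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # n - 1 denominator (assumption)
    return means, sds


def monte_carlo_study(
    n_reps: int,
    simulate_data: Callable[[random.Random], list],
    fit_model: Callable[[list], List[float]],
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Simulate n_reps data sets, fit each one, and summarize the estimates."""
    rng = random.Random(seed)  # a single seeded generator makes the study reproducible
    estimates = [fit_model(simulate_data(rng)) for _ in range(n_reps)]
    return summarize(estimates)


# Hypothetical stand-ins: 62 observations per data set and a "model" that just
# returns the sample mean, so the loop can be run end to end.
def toy_simulate(rng: random.Random) -> List[float]:
    return [rng.gauss(0.0, 1.0) for _ in range(62)]


def toy_fit(data: List[float]) -> List[float]:
    return [statistics.mean(data)]


means, sds = monte_carlo_study(100, toy_simulate, toy_fit, seed=1)
```

Plugging in a real generator and an MPE solver would make the two returned lists play the role of the Mean and Standard Deviation rows of Table 3.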

The proposed NLOB criterion is a strong standard for performance effectiveness. It requires that squared-distance performance with respect to the target set be as good as or better than that of unbiased, multivariate normal-like performance with respect to a point target in ℝⁿ. A still weaker class of target effectiveness densities might be developed in further research by including a vector parameter corresponding to possible bias in the multivariate normal-like model.

With regard to limitations of the methodology for the application setting and data used here, we first return to the connection with the cost of unused capacity. A cost of unused capacity in the Cooper and Kaplan (1992) sense, which can be denoted as s_j^ck, might coexist with a cost of inefficiency, s_j^I, as used in the present chapter, so that s_j = s_j^ck + s_j^I. The effect of such unused capacities, as distinct from costs of inefficiency, would be to make the present results understate the true efficiencies. The approach taken with the MPE model is worst-case in the sense that, when the s_j^ck are identifiable, the appropriate data adjustment would be x_j' = x_j - s_j^ck, and the average performance efficiency would then be expected to be larger.

Thanassoulis et al. (1987) also discuss what we have called the comparability of these units. A concern was noted regarding activity four, whose monetary driver level might have been affected by the prosperity of the community being served. That is, offices serving communities of above-average prosperity, and with correspondingly high activity-four levels, might be considered as being unfairly compared with the others. Other things being equal, units with an inappropriately inflated driver level would be expected to exert heavy downward pressure on the corresponding estimate in model MPE. We believe this kind of incomparability should ideally be removed by a normalization process, such as division by a socio-economic index. To concentrate on the essential features of the present technique and to maintain comparability with the results of Dyson and Thanassoulis (1988), we regard this level of detailed analysis as beyond the scope of the chapter.
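
To see why removing s_j^ck pushes measured efficiency upward, consider the ratio of best-practice cost to observed cost for a single unit. The sketch below assumes, for illustration only, that a unit's efficiency is Σ_r a_r y_rj divided by x_j; it holds the coefficients fixed (in practice the MPE model would be re-solved on the adjusted data) and uses made-up numbers. The function names are hypothetical.

```python
from typing import Sequence


def efficiency(a: Sequence[float], y: Sequence[float], x: float) -> float:
    """Best-practice cost sum_r a_r * y_r divided by observed cost x (assumed definition)."""
    best_practice = sum(a_r * y_r for a_r, y_r in zip(a, y))
    return best_practice / x


def adjust_for_unused_capacity(x: float, s_ck: float) -> float:
    """Adjusted cost x' = x - s_ck, removing a Cooper-Kaplan unused-capacity cost."""
    if not 0.0 <= s_ck < x:
        raise ValueError("s_ck must be non-negative and smaller than x")
    return x - s_ck


# Made-up unit with two drivers and fixed coefficients.
a = [0.5, 0.25]
y = [10.0, 8.0]
x = 10.0     # observed cost
s_ck = 1.0   # hypothetical identifiable unused-capacity cost

before = efficiency(a, y, x)                                   # 7.0 / 10.0 = 0.7
after = efficiency(a, y, adjust_for_unused_capacity(x, s_ck))  # 7.0 / 9.0, about 0.778
```

With the coefficients held fixed, the adjustment only lowers the denominator, which is the sense in which leaving s_j^ck inside x_j is a worst-case treatment.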

In applying the NLOB criterion to these data, the estimate of α was compared with n/2, with n = 4 chosen. This assumes that the Y_r data are truly four-dimensional. The discussion of the data in Thanassoulis et al. (1987) suggested to us that units were free to emphasize or vary all four drivers, with the possible exception of the fourth. If this driver is regarded as unavailable to the units for improvement, then the data should be treated as three-dimensional. In this case, the estimate of α would be compared with n/2 = 1.5. Since , the NLOB criterion is still met by the data under this assumption.
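
The benchmark n/2 can be traced to a standard distributional fact. If Y is unbiased multivariate normal around a point target in ℝⁿ with covariance σ²I, then ‖Y − target‖²/σ² is chi-square with n degrees of freedom, so the squared distance is gamma distributed with shape n/2 and scale 2σ². The sketch below checks this by simulation for n = 4 and n = 3, recovering the shape by the method of moments (mean² / variance). It assumes, for illustration, that α in the density model (4.2) plays the role of this gamma shape parameter and that the covariance is isotropic; the function names are hypothetical, and the code does not implement the chapter's estimator.

```python
import random
import statistics
from typing import Dict, List


def squared_distances(n: int, sigma: float, n_samples: int, seed: int = 0) -> List[float]:
    """Squared distances to a point target for unbiased isotropic normal draws in R^n."""
    rng = random.Random(seed)
    target = [0.0] * n  # any fixed point works; the origin keeps the code short
    out = []
    for _ in range(n_samples):
        y = [t + rng.gauss(0.0, sigma) for t in target]  # unbiased: centered on target
        out.append(sum((yi - t) ** 2 for yi, t in zip(y, target)))
    return out


def gamma_shape_mom(z: List[float]) -> float:
    """Method-of-moments gamma shape estimate: mean^2 / variance."""
    m = statistics.fmean(z)
    v = statistics.pvariance(z, mu=m)
    return m * m / v


shape_estimates: Dict[int, float] = {}
for n in (4, 3):  # four drivers, or three if the fourth is not controllable
    z = squared_distances(n, sigma=1.5, n_samples=100_000, seed=n)
    shape_estimates[n] = gamma_shape_mom(z)
    print(f"n={n}: estimated shape {shape_estimates[n]:.3f}, n/2 = {n / 2}")
```

The scale σ cancels out of mean²/variance, which is why only the dimension n enters the benchmark: 2 for four drivers and 1.5 for three.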

