Data envelopment analysis (DEA) is a mathematical programming approach developed to evaluate the relative efficiency of a set of units that have multiple performance measures, namely inputs and outputs (Charnes, Cooper, & Rhodes, 1978). DEA is particularly useful when the relationships among the multiple performance measures are unknown. By solving an optimization problem for each individual unit, DEA yields an efficient frontier, or tradeoff curve, that represents the relations among the multiple performance measures.

For example, consider the tradeoff between IT investment and the number of bank employees. Figure 1 illustrates the efficient frontier, or tradeoff curve, containing F1, F2, and F3, together with the area dominated by the curve. A bank's performance (or IT investment strategy) on the efficient frontier is nondominated (efficient) in the sense that no other performance is strictly better in both IT investment and number of employees. Through performance evaluation, the efficient frontier that represents the best practice is identified, and any currently inefficient performance (e.g., point F) can be moved onto the efficient frontier along suggested directions (to F1, F2, F3, or other points along the curve).

Suppose we have n observations on a set of banks; that is, for bank j (j = 1, ..., n) we have observed input values x_{ij} (i = 1, ..., m) and output values y_{rj} (r = 1, ..., s). The (empirical) efficient frontier is formed by these n observations. The following two properties ensure that we have a piecewise linear approximation to the efficient frontier and the area dominated by the frontier (Banker, Charnes, & Cooper, 1984).
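In the notation above, the two properties are commonly stated as convexity and inefficiency (free disposability). The following is a sketch of that standard statement rather than a verbatim quotation; the weights \lambda_j and the hatted symbols are notation assumed here for illustration.

```latex
\textbf{Property 1 (convexity).}
$\sum_{j=1}^{n} \lambda_j x_{ij}$ $(i = 1, \dots, m)$ and
$\sum_{j=1}^{n} \lambda_j y_{rj}$ $(r = 1, \dots, s)$
are achievable inputs and outputs, where the $\lambda_j$ $(j = 1, \dots, n)$
are nonnegative scalars with $\sum_{j=1}^{n} \lambda_j = 1$.

\textbf{Property 2 (inefficiency).}
The same outputs $y_{rj}$ can be obtained with inputs $\hat{x}_{ij} \ge x_{ij}$,
and the same inputs $x_{ij}$ can be used to obtain outputs $\hat{y}_{rj} \le y_{rj}$.
```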
Consider Figure 1, where IT investment and the number of employees represent two inputs. Applying Property 1 to F1, F2, and F3 yields a piecewise linear approximation to the curve in Figure 1. Applying both properties expands the line segments F1F2 and F2F3 into the area dominated by the curve. Applying the two properties to specific inputs x_{i} (i = 1, ..., m) and outputs y_{r} (r = 1, ..., s) yields the conditions referred to as (1).
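Written out, the conditions labeled (1) can be sketched as follows. This is a reconstruction from the convexity and inefficiency properties in their standard form; the weights \lambda_j are assumed notation.

```latex
\[
\begin{aligned}
\sum_{j=1}^{n} \lambda_j x_{ij} &\le x_i, & i &= 1, \dots, m, \\
\sum_{j=1}^{n} \lambda_j y_{rj} &\ge y_r, & r &= 1, \dots, s, \\
\sum_{j=1}^{n} \lambda_j &= 1, & \lambda_j &\ge 0, \quad j = 1, \dots, n.
\end{aligned}
\tag{1}
\]
```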
The DEA efficient frontier is determined by the nondominated observations satisfying (1). Based upon (1), we have the following DEA model:
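In the standard input-oriented form with variable returns to scale, the model can be written as below. This is a reconstruction consistent with the symbols defined right after it; the weights \lambda_j are assumed notation.

```latex
\[
\begin{aligned}
\theta^{*} = \min\ & \theta \\
\text{subject to}\quad
& \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{ij_o}, & i &= 1, \dots, m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{rj_o}, & r &= 1, \dots, s, \\
& \sum_{j=1}^{n} \lambda_j = 1, \qquad \lambda_j \ge 0, & j &= 1, \dots, n,
\end{aligned}
\tag{2}
\]
```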
where x_{ij_{o}} is the ith input and y_{rj_{o}} is the rth output of the j_{o}th bank (observation) under evaluation. If θ^{*} = 1, then the j_{o}th bank is located on the frontier (i.e., it is efficient). Otherwise, if θ^{*} < 1, the j_{o}th bank is inefficient. Model (2) is called an input-oriented DEA model, where the goal is to minimize input usage while keeping the outputs at their current levels. Similarly, we can have an output-oriented DEA model, model (3), where the goal is to maximize output production while keeping the inputs at their current levels.
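The output-oriented counterpart can be sketched by moving the scaling factor from the input constraints to the output constraints. The display below is a reconstruction in the standard form, with \Phi as the output expansion factor and \lambda_j as assumed weights.

```latex
\[
\begin{aligned}
\Phi^{*} = \max\ & \Phi \\
\text{subject to}\quad
& \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{ij_o}, & i &= 1, \dots, m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge \Phi\, y_{rj_o}, & r &= 1, \dots, s, \\
& \sum_{j=1}^{n} \lambda_j = 1, \qquad \lambda_j \ge 0, & j &= 1, \dots, n.
\end{aligned}
\tag{3}
\]
```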
Both models (2) and (3) identify the same efficient frontier, because θ^{*} = 1 if and only if Φ^{*} = 1. To further illustrate the DEA methodology, we consider the dual program of model (3).
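Taking the linear programming dual of the standard output-oriented form gives a multiplier program of the following shape. This is a sketch that assumes model (3) is the usual variable-returns-to-scale formulation; v_i and u_r are the input and output weights, \alpha is the free variable associated with the convexity constraint, and the subscript o abbreviates j_o.

```latex
\[
\begin{aligned}
\min\ & \sum_{i=1}^{m} v_i x_{io} + \alpha \\
\text{subject to}\quad
& \sum_{i=1}^{m} v_i x_{ij} - \sum_{r=1}^{s} u_r y_{rj} + \alpha \ge 0, & j &= 1, \dots, n, \\
& \sum_{r=1}^{s} u_r y_{ro} = 1, \\
& u_r \ge 0,\ v_i \ge 0,\ \alpha \text{ free in sign}.
\end{aligned}
\]
```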
The above model is equivalent to a ratio (fractional) program, referred to here as model (4): minimize (Σ_{i} v_{i} x_{io} + α) / (Σ_{r} u_{r} y_{ro}) subject to (Σ_{i} v_{i} x_{ij} + α) / (Σ_{r} u_{r} y_{rj}) ≥ 1 for j = 1, ..., n, with u_{r}, v_{i} ≥ 0 and α free in sign.

Let h_{o} = (Σ_{i} v_{i} x_{io} + α) / (Σ_{r} u_{r} y_{ro}). Then model (4) seeks to determine the relative efficiency of each bank. It is clear from model (4) that a smaller value of h_{o}^{*} is preferred, since we prefer larger values of y_{ro} and smaller values of x_{io}. Therefore, model (4) tries to find a set of weights v_{i} and u_{r} so that the ratio of aggregated x_{io} to aggregated y_{ro} reaches its minimum. Note that model (4) is solved for each bank. Therefore, model (4) does not seek the average best performance, but the efficient or best performance achievable by a proper set of optimized weights. Note that when h_{o}^{*} = 1, we have
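Setting the ratio to one and clearing the denominator gives the relation below. It is a reconstruction of (5) that assumes h_o is the ratio of \alpha plus the weighted inputs to the weighted outputs, which matches the description of \alpha^{*} as an intercept.

```latex
\[
\sum_{r=1}^{s} u_r^{*} y_{ro} = \alpha^{*} + \sum_{i=1}^{m} v_i^{*} x_{io}
\tag{5}
\]
```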
where (*) represents the optimal values in model (4). It can be seen that (5) is similar to a regression model with α^{*} as the intercept on the y-axis. The difference between model (4) and the regression model lies in the fact that (i) model (4) deals with more than one dependent variable (y_{rj}) at the same time, and (ii) equation (5) is obtained for each bank with a score of one. Further, (5) represents the efficient frontier. Since different units with a score of one in model (4) may not lie on the same facet of the frontier, the resulting efficient frontier is piecewise linear, as shown in Figure 1. From the above discussion, we can see that DEA can be an excellent data-mining approach for extracting efficiency information from performance data.

Consider three two-stage bank operations as presented in Table 1, where the first stage has two inputs (IT investment and labor) and one output (deposit), and the second stage has one input (the deposit generated in the first stage) and one output (profit) (see Figure 2).
Applying model (2) to the two stages indicates that banks A and B are efficient in the first stage and bank C is efficient in the second stage. Now, if we ignore the intermediate measure of deposit and apply model (2), the last column of Table 1 indicates that all banks are efficient. This simple numerical example indicates that conventional DEA fails to correctly characterize the performance of two-stage operations, since an overall DEA-efficient performance does not necessarily indicate efficient performance in each individual component. Consequently, improvement toward the best practice can be distorted; that is, the performance improvement of one stage affects the efficiency status of the other because of the presence of intermediate measures. In the next section, we present a DEA model that can directly evaluate the performance of two-stage operations and set performance targets for intermediate measures.
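The stage-by-stage comparison described above can be sketched numerically. The code below is a minimal illustration, assuming model (2) is the standard input-oriented model with variable returns to scale and solving it with SciPy's `linprog`. The function name `input_oriented_bcc` and all data values are hypothetical: they are not the Table 1 figures, and were chosen only so that the same qualitative pattern appears.

```python
import numpy as np
from scipy.optimize import linprog


def input_oriented_bcc(X, Y, o):
    """Return theta* for unit o under an input-oriented, VRS envelopment LP.

    X has shape (m, n) and Y has shape (s, n); columns are units (banks).
    Hypothetical helper written for this illustration.
    """
    m, n = X.shape
    s = Y.shape[0]
    # Decision vector: [theta, lambda_1, ..., lambda_n]; minimize theta.
    c = np.zeros(n + 1)
    c[0] = 1.0
    # Inputs: sum_j lambda_j x_ij - theta * x_io <= 0
    A_in = np.hstack([-X[:, [o]], X])
    # Outputs: -sum_j lambda_j y_rj <= -y_ro
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m), -Y[:, o]])
    # Convexity: sum_j lambda_j = 1
    A_eq = np.hstack([[0.0], np.ones(n)]).reshape(1, -1)
    b_eq = [1.0]
    # theta is free, the lambdas are nonnegative.
    bounds = [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


# Hypothetical data for banks A, B, C (NOT the values in Table 1).
it_investment = [2.0, 4.0, 4.0]
labor = [4.0, 2.0, 4.0]
deposit = [20.0, 20.0, 10.0]   # intermediate measure
profit = [5.0, 5.0, 6.0]

X1 = np.array([it_investment, labor])  # stage 1 inputs
Y1 = np.array([deposit])               # stage 1 output
X2 = Y1                                # stage 2 input is the stage 1 output
Y2 = np.array([profit])                # stage 2 output

stage1 = [input_oriented_bcc(X1, Y1, o) for o in range(3)]
stage2 = [input_oriented_bcc(X2, Y2, o) for o in range(3)]
# Overall evaluation ignores the intermediate measure (deposit).
overall = [input_oriented_bcc(X1, Y2, o) for o in range(3)]

for name, scores in [("stage 1", stage1), ("stage 2", stage2), ("overall", overall)]:
    print(name, np.round(scores, 3))
```

With these made-up numbers, banks A and B score 1 in stage 1 while C scores 0.75; only C scores 1 in stage 2 (A and B score 0.5); and all three banks score 1 when the intermediate measure is ignored. This is the kind of masking effect the text describes.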
 
