Introduction


Data mining (DM), which is also referred to as knowledge discovery, is a systematic approach to finding hidden patterns, trends and relationships in data (Chen, Park, & Yu, 1996; Pass, 1997). Recently, DM has attracted tremendous attention from both researchers and practitioners because of its applicability to information management, decision support, fraud detection, marketing strategy, financial forecasting, process control and many other applications (Chen et al., 1996; Cheung, Ng, Fu, & Fu, 1996).

Many data mining algorithms and methodologies have been proposed (Agrawal, Ghosh, Imielinski, & Swami, 1992; Agrawal, Imielinski, & Swami, 1993a, 1993b; Agrawal, Link, Sawhney, & Shim, 1995; Agrawal, Mehta, Shafer, Srikant, Arning, & Bollinger, 1996; Agrawal & Srikant, 1994; Agrawal & Srikant, 1995; Cheeseman & Stutz, 1996; Chen, Han, & Yu, 1996; Chen et al., 1996; Cheung et al., 1996). Researchers have classified data mining algorithms by the kind of database they work on, the kind of knowledge to be mined, and the kind of technique used (Chen et al., 1996; Cheung et al., 1996). More recently, Pass (1997) classified approaches to data mining into two categories: methodologies and technologies. According to Pass (1997), the methodologies consist of cluster analysis, linkage analysis, visualization, and categorization analysis. The technologies are connectionist models/neural networks, decision trees, genetic algorithms, fuzzy logic, statistical approaches, and time series approaches.

Software development and maintenance involve significant cost for organizations (Banker & Slaughter, 1997). Productivity improvements in software development and maintenance can enable IS departments to reduce long-term costs and increase profitability. Previous research (Banker & Slaughter, 1997; Subramanian & Zarnich, 1996) suggests that systems development methodology, project size, developer's experience, and system design tools (e.g., CASE tools) all significantly explain software effort and productivity. While most studies have examined the significance of the predictor variables for the predicted software effort, very few have applied data mining techniques to historical data to improve future software effort estimation. There is evidence that the relationships between the predictor and predicted variables are non-linear. In a recent study, Banker and Slaughter (1997) used a data envelopment analysis approach to illustrate that scale economies exist in software maintenance. The existence of scale economies in software productivity indicates that better estimates can be obtained if non-linear forecasting models are used to learn and predict software effort. Further, the lack of a priori knowledge about the exact non-linear relationships between the multivariate independent variables and the dependent variable precludes the use of any predetermined non-linear statistical technique.

In the current research, we use connectionist models (also called artificial neural networks) and evolutionary models (also called genetic algorithms) to discover and learn the non-linear relationships between the predictor variables in the training samples and to predict software effort in holdout samples. We use previously reported data (Subramanian & Zarnich, 1996) on 40 different software projects and split them into two sets, 30 projects for training and 10 projects for testing. The rest of the paper is organized as follows. First, we provide a brief review of the literature on data mining. We then describe connectionist and evolutionary models for our problem domain. This description is followed by special design considerations in using these models for our research. In the sections after the design issues, we report the results of our experiments. Finally, we provide directions for future work.
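The 30/10 division of the 40 projects into training and holdout sets mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the projects were assigned to the two sets, so the seeded random shuffle here is an assumption, and the function name split_projects and the integer project ids are hypothetical placeholders rather than the actual data of Subramanian and Zarnich (1996).

```python
import random


def split_projects(project_ids, n_train=30, seed=0):
    """Randomly partition project ids into a training list and a holdout list.

    Assumption: a seeded random shuffle; the source does not state how the
    projects were actually assigned to the two sets.
    """
    if not 0 < n_train < len(project_ids):
        raise ValueError("n_train must be between 1 and len(project_ids) - 1")
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(project_ids)   # copy, so the caller's sequence is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical ids 1..40 standing in for the 40 software projects.
project_ids = list(range(1, 41))
train_ids, holdout_ids = split_projects(project_ids)
print(len(train_ids), len(holdout_ids))
```

Keeping the holdout projects entirely out of model fitting is what allows the prediction error on them to serve as an estimate of performance on unseen projects.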




Managing Data Mining Technologies in Organizations: Techniques and Applications
ISBN: 1591400570
Year: 2003
Pages: 174
