Database systems, a subfield of computer science, have advanced rapidly over the past few decades. A major strength of database systems is their ability to store large volumes of data and to provide rapid access to information while correctly capturing and reflecting database updates. The advent of direct access storage devices (DASD), together with significant theoretical results (namely, the relational model and the transaction model), has enabled the deployment of large-scale operational applications such as airline reservation systems, disability benefit systems, financial trading systems, patient registration systems and telephone call routing systems.
Alongside these advances in database systems, our relationship with data has evolved from the pre-relational and relational periods to the data warehouse period. Today, we are in the knowledge discovery and data mining (KDDM) period, in which the emphasis is no longer on finding ways to store data, nor on consolidating and aggregating data to provide a single, unified perspective. Instead, KDDM emphasizes sifting through large volumes of historical data for new and valuable information that will lead to competitive advantage. The evolution to KDDM is natural, since our capabilities to produce, collect and store information have grown exponentially. Electronic banking, e-commerce transactions, the widespread introduction of bar codes for commercial products and advances in remote sensing data capture devices have all contributed to the mountains of data stored in business, government and academic databases. Traditional analytical techniques, especially standard query and reporting and online analytical processing, are ineffective when the volume of data is large and when the analyst is uncertain about the exact nature of the information to be extracted.
Data mining has thus emerged as a class of analytical techniques that go beyond statistics and that aim at examining large quantities of data. According to Hirji (1999), data mining is the analysis and non-trivial extraction of data from databases for the purpose of discovering new and valuable information, in the form of patterns and rules, from relationships between data elements. Data mining is receiving widespread attention in both the academic literature and the public press (Adriaans & Zantinge, 1996; Brand & Gerritsen, 1998; Chen, Han & Yu, 1996; Fayyad, Piatetsky-Shapiro & Smyth, 1996; Fayyad & Smyth, 1993), and case studies and anecdotal evidence to date suggest that companies are increasingly investigating the potential of data mining technology to deliver competitive advantage.
The enthusiasm surrounding data mining continues to grow; at the same time, however, there are claims that data mining projects fail to deliver the expected value. Many of these failures can be traced back to strategy, process and technology variables. Reasons cited for them include: (i) a lack of discipline, accountability and skills, (ii) no common understanding of requirements, (iii) failure to set and manage expectations, (iv) the belief that the tool was the answer, and (v) making a buy decision but then implementing a build project.
Research into data mining has thus far focused on developing new algorithms (Fayyad & Smyth, 1993) and on identifying future application areas (Simoudis, Livezey & Kerber, 1996). Because data mining is a relatively new field of study, it is not surprising that its research is not equally well developed in all areas. To date, no theory-based process model of data mining has emerged. The purpose of this chapter is to identify a process for performing data mining projects and to propose this process to practitioners as a starting point for decisions about planning, organizing, executing and closing data mining projects. At a minimum, having such a process offers the following benefits:
it can be used to jump-start a data mining project
it can be used to foster understanding and enable clear communication among project team members
it can mitigate risk associated with performing data mining projects
The section titled "Literature Review" surveys prior work to develop a straw man of what a data mining process might look like. The research methodology, including research design, case analysis and data collection, is then described in "Research Methodology." The "Findings" section highlights the key areas where observations point to new insights. "Conclusions" discusses the implications for practitioners and managers and suggests some directions for future research.