Chapter IX - The Pitfalls of Knowledge Discovery in Databases and Data Mining
Data Mining: Opportunities and Challenges
by John Wang (ed) 
Idea Group Publishing 2003

DM in an organization has both benefits and drawbacks. Naturally, the manner in which we interpret data will determine its ultimate benefit. Gathering data is generally not the issue here; there is much data already stored in data warehouses. We need to remember that DM, when misinterpreted, may lead to costly errors. There are a number of organizational factors and issues that also may be drawbacks and limit DM's implementation and effectiveness. These factors will be discussed in this section.

A recent survey of retail IT indicated that of the companies using DM, 53% attribute no direct benefit to their bottom line from DM. About 20% of respondents indicated that DM has contributed very little, while only 8.4% of the respondents indicated that DM has contributed substantially to profitability. Additionally, 64% of all companies responding indicated that they do not plan to use dedicated technology to manage their customer relationships (Anonymous, 2001).

The lack of bottom-line results can be partially attributed to a number of organizational factors. First, start-up costs for implementing DM projects are very high and can create barriers that many corporations choose not to overcome. There are also short-term and long-term administrative costs associated with DM. The information and output provided by DM systems must be delivered to end-users within the organization in a meaningful form; otherwise, the information may not be usable or, worse, may be used inappropriately. This can actually lead to a reduction in the return on investment (ROI).

Finally, organizations implementing DM systems must view the project as an enterprise-wide endeavor, rather than a departmental endeavor. Failure to address these problems will limit a firm's ability to successfully implement DM initiatives and recognize the benefits to its profitability.

Qualifications of Information Technology (IT) Staff

The move by corporations to DM has added to the administrative burdens of already overworked IT staffs. Often, DM requires that data be accessible 24 hours a day, 7 days a week, and that they be adequately protected at all times. This is a very difficult task given the shortage of personnel available for around-the-clock management. Before the project even begins, it is vital that IT staffs qualify the current network hardware and software, particularly in terms of how much data will be held on the servers.

Additionally, they must qualify the amount of data that will be handled on a daily basis, the devices that are already in place to perform data backup and recovery, and the software running those devices. Careful analysis must be given to the current network configuration, the number of servers, their location, where those servers are administered, and where and how the data are backed up. Administrators must also analyze future needs for the corporation, such as employees, functions, and business processes.

Successful DM projects require two components: appropriate technical expertise, and an appropriate technical infrastructure for accessing the Web. Many problems can be traced back to these two issues. Other pitfalls include inadequate technical expertise, inadequate planning, inadequate integration, and inadequate security (Griffin, 2000).

Requirements of Information Technology Infrastructure (ITIS)

In general, for most corporations that are involved in the process of DM, ITIS is not a major issue because they will already have an ITIS and sufficient hardware/software in place to handle DM algorithms. Please note that the IT structure meant here is not limited to hardware and software but also includes personnel. Another simple point to keep in mind here is that the more data being mined, the more powerful the ITIS required.

An ITIS does become an issue for companies that do not have one in place. Establishing one will usually involve rather large capital expenditures, and this cost would not be limited to a one-time event. Instead, as software and hardware modifications are made, it will be necessary for capital expenditures to continue to be made for maintenance and future upgrades of the existing ITIS. Total cost of ownership (TCO) will be very high. This might become even more of an issue for smaller to medium-sized firms that might not have sufficient resources for such measures.

Accessibility and Usability

Many organizations have experienced serious problems implementing a standard DM project. Most problems do not lie with the technology, but rather with the people using it. To have a successful impact on an organization, a DM system must be speedy, accessible, and user friendly. In today's highly technological society, many of these systems employ very sophisticated tools running against data stored on high-end systems, but it is important that management can interpret the end results. The information desired must be accessible quickly through a PC or Web browser. If end-users have difficulty using a system, or if it cannot deliver the information they need quickly enough to resolve their needs, they may abandon the application altogether, and the benefits will not be realized.

Affordability and Efficiency

A simple pitfall that potential DM users must consider is its cost, which can range from just a few thousand to millions of dollars for both hardware and software. Implementing an effective DM system can be a very complicated and expensive endeavor for a firm. META Group (2001) estimates that within the next two years, "the top 2000 global firms will spend $250 million each on Customer Relationship Management (CRM) and DM solutions." Data warehouse users spent an average of $6 million on their data warehouse in 1998. This cost has been rising at an annual rate of 35% since 1996 (Groth, 2001).
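
To see what a 35% annual growth rate implies, the figures quoted above can be compounded forward. The following is a minimal sketch in Python, assuming the rate stays constant and applies to the 1998 average of $6 million. The source does not make that claim, and the function name project_cost is hypothetical.

```python
def project_cost(base_cost_millions, annual_growth, years):
    """Compound a base cost forward by a fixed annual growth rate."""
    return base_cost_millions * (1 + annual_growth) ** years


# Figures quoted in the text: $6 million in 1998, rising 35% per year.
# Assumption: the rate stays constant, which the source does not claim.
for year in range(1998, 2002):
    cost = project_cost(6.0, 0.35, year - 1998)
    print(f"{year}: ${cost:.2f} million")
```

Under that assumption, the average cost would roughly double in a little over two years, which helps explain why executives may hesitate to commit.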

Given these costs, corporate executives who may be seeking to cut expenses and contain losses may not want to invest the resources needed to develop such a system. Corporate reluctance becomes even clearer when one considers that such a system may take a few years to implement and may not have an immediate, observable impact on the firm's profitability. Furthermore, even with all of the money being spent on DM projects, many companies have not been able to calculate the resulting ROI for these systems. As a result, all too often corporate executives do not view DM projects in the appropriate manner, i.e., as long-term strategic investments.
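
The difficulty of judging a DM project by its early returns can be illustrated with a simple cumulative calculation. The sketch below assumes the common definition ROI = (cumulative benefit - cumulative cost) / cumulative cost. All cash-flow figures are invented for illustration and do not come from the source.

```python
def cumulative_roi(costs, benefits):
    """Return ROI after each year as (total benefit - total cost) / total cost."""
    roi_by_year = []
    total_cost = 0.0
    total_benefit = 0.0
    for cost, benefit in zip(costs, benefits):
        total_cost += cost
        total_benefit += benefit
        roi_by_year.append((total_benefit - total_cost) / total_cost)
    return roi_by_year


# Hypothetical figures (in millions): a large start-up cost, then maintenance,
# with benefits that only appear once the system is in use.
costs = [5.0, 1.0, 1.0, 1.0]
benefits = [0.0, 2.0, 4.0, 5.0]

for year, roi in enumerate(cumulative_roi(costs, benefits), start=1):
    print(f"Year {year}: ROI = {roi:+.0%}")
```

With these made-up numbers, ROI is negative for the first three years and turns positive only in the fourth. A project evaluated after one or two years would look like a loss even if it eventually pays off.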

It becomes apparent that expenditures towards maintaining DM through faster hardware and more powerful DM applications will become a constant cash outflow for many companies. As the DB increases in size over time, more capital will be needed for hardware and software upgrades. While this might not represent a large commitment for companies of substantial size, small to medium-sized firms may not be able to afford an ITIS or a DM application and the support it needs over the long term.

Efficiency may be defined as a measure of the ability to do a task within certain time and space constraints and should include the bottlenecks that might arise in a system when doing certain tasks (Wasserman, 1999). As a dataset becomes larger, most IT managers will want to reduce bottlenecks in the network to increase speed or supplement processing power through investment in new hardware.

The costs of implementing DM systems may include the costs of intellectual property, licensing, hardware, and software, and are especially burdensome to small firms because these represent a much larger percentage of a small firm's revenues. This may place a greater burden on the small firm to identify and realize measurable results quickly. Failure to do so may even jeopardize the company's very survival. Building a successful system therefore requires extreme patience, vision, and long-range planning.

Scalability and Adaptability

Scalability refers to how well a computer system's hardware or software can adapt to increased demands. Since DM tends to work with large amounts of data, scalability of the computer system often becomes a major issue. The network and the computers on it must be scalable, or large enough to handle increased data flows; otherwise, the added load may bring the network or individual computers on it to a grinding halt. Also, system solutions need to be able to grow along with evolving user needs in such a way as to not lock the organization into a particular vendor's infrastructure as technology changes.

This is an important issue because the volume of data has increased at a significant rate in recent years. One paper further points out that "some companies already have data warehouses in the terabyte range (e.g., FedEx, UPS, Wal-Mart). Similarly, scientific data is reaching gigantic proportions (e.g., NASA space missions, Human Genome Project)" (Two Crows Corporation, 2001).

Most recent research notes scalability as a possible pitfall of DM. Even in the cases of very simple forms of data analysis, speed and memory become issues. Since hard drive access speed or network speed is not as fast as resident memory, many older DM applications prefer to be loaded into memory. As the datasets become larger or more variables are added, it follows that the amount of memory needed increases. Without this hardware, virtual memory may have to be used. In general terms, virtual memory is a process of using space on the hard drive to serve as actual memory. The problem with virtual memory is that it is slower, which in turn makes DM slower.
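
A rough back-of-the-envelope check shows how quickly in-memory requirements grow with the number of rows and variables. The sketch below assumes a dense table of 8-byte numeric values and ignores any overhead from the DM application itself. The function names and the example sizes are hypothetical.

```python
def estimate_memory_gb(rows, variables, bytes_per_value=8):
    """Estimate memory for a dense numeric table, in gigabytes (10**9 bytes)."""
    return rows * variables * bytes_per_value / 10**9


def fits_in_memory(rows, variables, available_gb, bytes_per_value=8):
    """True if the estimated table size is within the available physical memory."""
    return estimate_memory_gb(rows, variables, bytes_per_value) <= available_gb


# Hypothetical dataset: 10 million rows, first with 20 and then with 200 variables.
for variables in (20, 200):
    needed = estimate_memory_gb(10_000_000, variables)
    ok = fits_in_memory(10_000_000, variables, available_gb=4)
    print(f"{variables} variables: {needed:.1f} GB needed, fits in 4 GB: {ok}")
```

In this example, the 20-variable table fits in 4 GB of physical memory, while the 200-variable table needs about 16 GB and would spill into virtual memory, with the slowdown described above.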

An Enterprise View

Another organizational factor that can be a problem for DM is the failure to view the project as an enterprise-wide endeavor. In many failed DM projects, companies viewed DM as an IT project that was relegated to a specific department, rather than an enterprise-wide initiative. Most DM initiatives have been fragmented, implemented within departments without a cross-organizational perspective.

Given the cost and scope of DM projects, it is essential that all facets of the business have representation in the design of the system. Customer information is vital to the overall success of sales, marketing, and customer service, and this information must be shared across the organization. If one or more departments are excluded from this design process, there will be an inclination to not accept the finished product, or departments will lack the required knowledge to operate the system successfully.

Final Thoughts on Organizational Implications

There are many organizational factors that can serve as drawbacks to successful utilization of DM. Initially, organizations may balk at the high price tag that is associated with DM, and choose to not invest in items such as computer hardware, software, etc. The decision to do so may only come with plausible estimations of ROI, something, as was said above, that is often very difficult to measure. The output provided by DM must be usable information that can be quickly acted upon; otherwise, end-users will not rely on or effectively use the information.

Finally, DM must be viewed as a company-wide initiative. That way all employees will feel that they have a stake in the system's outcome. Failure to take this route may result in failure of the DM project. Remember that failure rates are often as high as 70%. Companies embarking on such projects must take these issues into account when making decisions.

ISBN: 1591400511
Year: 2003
Pages: 194