Chapter XVIII - Social, Ethical and Legal Issues of Data Mining
Data Mining: Opportunities and Challenges
by John Wang (ed) 
Idea Group Publishing 2003

After mounting complaints about excessive force, false arrests, and racial profiling, the Los Angeles Police Department (LAPD) is being forced under a federal consent decree to implement a computerized risk-management system that uses data mining to track officers' conduct and performance (Mearian & Rosencrance, 2001). In 2000, the LAPD paid out more than $28 million as a result of civil lawsuits (Mearian & Rosencrance, 2001). The New Jersey State Police and the Pittsburgh Police Department have installed similar systems to track officers' use of force, searches and seizures, citizen complaints, criminal charges, civil lawsuits, and commendations and awards earned. However, these systems are only as good as the data entered into them.

Data-mining software is also used in other well-known industries. "Advanced Scout" is a data-mining application, developed by an IBM researcher, that NBA coaches use. It can answer questions such as "Under what circumstances do the Chicago Bulls outscore the New York Knicks?" (Glode, 1997). The application can also be used by television announcers and by fans at NBA websites, and it can be adapted to other sports. In the automobile industry, manufacturers search huge databases of car repairs using pattern recognition algorithms to identify patterns of breakdowns (Waldrup, 2001). The University of California at Berkeley's LINDI system is "used to help geneticists search the biomedical literature and produce plausible hypotheses for the function of newly discovered genes" (Sheier, 2001). Carnegie Mellon University's Informedia II system produces a computer-searchable index of, for example, CNN news clips "by automatically dividing each clip into individual scenes accompanied by transcripts and headlines" (Sheier, 2001). Data mining is also used for national security and military applications, examples of which are provided in the paragraphs that follow.

After the attack on the World Trade Center on September 11, 2001, a new antiterrorist law (the USA Patriot Act of 2001) allows wiretapping of the Net, among other things. However, with this increased ability for surveillance comes the realization that the FBI does not have enough personnel to analyze all of this data. Therefore, data mining will play a part in providing national security. Using data mining to search for suspicious keywords, however, could be problematic.

"As 60 Minutes reported, the Canadian Security Agency identified a mother as a potential terrorist after she told a friend on the phone that her son had bombed in his school play. Filtered or unfiltered information taken out of context is no substitute for the genuine knowledge about a person that can emerge only slowly over time." (Rosen, 2001, p. 19)
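The problem in this anecdote can be illustrated with a minimal sketch of context-free keyword matching. The watch list and the function name below are hypothetical and are not taken from any real surveillance system; the sketch only shows that a word-level filter cannot distinguish "bombed in his school play" from a genuine threat.

```python
import re

# Hypothetical watch list; the chapter does not name any real keyword list.
SUSPICIOUS_KEYWORDS = {"bomb", "bombed", "attack"}

def flag_message(text: str) -> set:
    """Return the watch-list words that appear in the text, ignoring case."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & SUSPICIOUS_KEYWORDS

message = "My son bombed in his school play last night."
hits = flag_message(message)
print(hits)  # the harmless sentence is flagged because it contains "bombed"
```

Because the filter looks only at individual words, any refinement would have to bring in surrounding context, which is exactly the knowledge Rosen argues a keyword search lacks.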

Advances in networking, medical remote sensing, and data mining will, in the future, be combined to detect the presence and origin of chemical weapons on the battlefield (Ceruti, 2000). This will be accomplished by:

"Mining geographic patterns from networks of devices worn by troops in the field designed to record and transmit a soldier's or a marine's vital health data and environmental data. These geographic patterns will help to identify the origin of the attack. It will also affect the early response and treatment of wartime casualties with a result of more lives saved on the battlefield." (Ceruti, 2000, p. 1875)

Another military application of data mining revolves around intrusion detection systems for military networks. Unlike their commercial counterparts, military networks

"often face unique constraints operation over wireless media, unique message traffic, different perceived threats, limited bandwidth, mobile and dynamic environment, robustness in the face of direct attacks on infrastructure that lead to normal operation that is different from civilian networks. This results in an unacceptably high false-alarm rate from Intrusion Detection systems." (Clifton & Gengo, 2000, p. 440)

Data mining can be used to identify the patterns of false alarms that arise under battlefield conditions, where traffic differs substantially from commercial traffic.

What are the social implications of such systems? Next, we explore the benefits and drawbacks of data mining in terms of individuals, society, and businesses.

Data Mining and Consumers

Why pursue data mining? Simply satisfying customers is not enough. Relationships need to be built on loyalty, fostered by employee enthusiasm and by customized product and service offerings that delight customers. Customers expect businesses to anticipate their needs. A number of studies have shown that it costs five to ten times as much money to attract a new customer as it does to sell to an existing customer. Data mining allows management to create and analyze customer profiles in order to customize marketing efforts appropriately. Companies are beginning to realize that they do not need to market to large masses of customers in order to maximize profits. Hence, data mining can be beneficial to both consumers and businesses. However, the benefits to a company, which may reduce costs for the consumer, may also be seen as an invasion of consumer privacy. This has large-scale social implications. The following examples show how businesses mine data.

Insurance Rates: One insurer used data mining to help develop competitive rates for sports car owners. These owners are often charged higher rates because it is assumed that they are riskier than other drivers. The insurer found that a majority of sports car owners are between 30 and 50 years old, married, and own two cars. "Data mining uncovered the fact that those customers were not high-risk drivers, so the insurer reduced its premiums and provided a competitive edge" (Maciag, 2001, p. 35).

Manufacturing and Inventory: How can suppliers increase their ability to deliver the right products to the right place at the right time? The supply chain can account for as much as 75% of a product's cost. One company that is attempting to decrease this cost by manufacturing its products in direct proportion to actual consumer demand, rather than forecasts, is Procter & Gamble (P&G). P&G has tested technology that will inform it the moment a customer lifts one of its products off the store shelf, whether it is Downy fabric softener, Tide laundry detergent, Crest toothpaste, or Charmin toilet paper (Dalton, 2001). A trial run was conducted in Baltimore, Maryland with an unidentified retailer.

This new technology could provide a wealth of data that consumer goods manufacturers could mine for predictive patterns; rather than reacting to sales, they could foresee peaks and valleys in demand. The circuit boards for store shelves cost approximately $10 each, and the chips for individual packages cost five cents apiece (Dalton, 2001). The hope is to reduce this latter price to one cent each. When a consumer picks up a product, the package signals the shelf. It also signals the shelf if it is put back. The shelf transmits the data to a computer, which then periodically, but frequently, transmits it to P&G. P&G can then adjust its manufacturing plans accordingly.

Currently, a few large suppliers like P&G wait only four or five days to obtain data collected at checkout. Most companies wait from 20 to 30 days. Therefore, even if this technology proves too expensive for widespread deployment, it can be used at a few select locations to test consumer response to new products and to predict demand for existing products. What benefits could P&G accrue from such technology? It could obtain sales data in practically real time, which would in turn decrease pipeline inventories. This would also increase working capital and give P&G a greater ability to deliver the right products to the right place at the right time, ultimately benefiting consumers in the form of lower costs and fewer stockouts.
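As a rough illustration of the shelf signals described above, the following sketch tallies pickup and put-back events into net units taken per product. The event format, the action names, and the function name are assumptions made for this example; Dalton (2001) does not describe the actual data format used in the trial.

```python
from collections import Counter

def net_units_taken(events):
    """Net units removed per product from (product, action) shelf events.

    action is "pickup" when a package leaves the shelf and "putback"
    when it is returned. Both labels are hypothetical.
    """
    taken = Counter()
    for product, action in events:
        if action == "pickup":
            taken[product] += 1
        elif action == "putback":
            taken[product] -= 1
        else:
            raise ValueError(f"unknown action: {action!r}")
    return dict(taken)

# Illustrative events, not real trial data.
events = [
    ("Tide", "pickup"),
    ("Crest", "pickup"),
    ("Crest", "putback"),
    ("Tide", "pickup"),
    ("Downy", "pickup"),
]
demand = net_units_taken(events)
print(demand)
```

A product that is picked up and then returned nets to zero, which is information that checkout data alone would never reveal.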

Record Matching: Most data is never used. Data warehouses are often data tombs. However, data-mining techniques make this data more useful and meaningful. Another area in which data mining might be used in a socially positive manner is record matching based on "fuzzy set logic." When analyzing data, it is important to weed out "matches," or duplicate entries. Data mining helps determine not only whether two records match, but whether they match well, match fairly well, match somewhat, match poorly, or are a complete mismatch (Bell & Sethi, 2001). For example, if the social security number in two records matches but nothing else does, then one of the two records might indicate fraud. The logic for matching records from a number of sources can take into account events that naturally occur in one's life. Consider, for example, the rule:

If the first name matches well and the last name is a mismatch, and the subject is a female, and date of birth matches, and social security number matches, and the maiden name matches well, and the place of birth matches well, and marital status of the old record is single or divorced, and marital status of the new record is married, then the record probably matches [Marriage]. (Bell & Sethi, 2001, p. 88)
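The quoted rule can be sketched in code under some loudly stated assumptions. The similarity thresholds, the record field names, and the reading of "the maiden name matches well" as "the new record's maiden name matches the old record's last name" are all illustrative choices made here, not details given by Bell and Sethi (2001). Similarity is scored with `difflib.SequenceMatcher` from the Python standard library, which stands in for whatever fuzzy-set membership functions the original system used.

```python
from difflib import SequenceMatcher

def grade(a: str, b: str) -> str:
    """Grade the similarity of two field values (thresholds are illustrative)."""
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if score >= 0.9:
        return "well"
    if score >= 0.75:
        return "fairly well"
    if score >= 0.5:
        return "somewhat"
    if score >= 0.25:
        return "poorly"
    return "mismatch"

def marriage_rule(old: dict, new: dict) -> bool:
    """True when the quoted [Marriage] rule says the records probably match."""
    return (
        grade(old["first_name"], new["first_name"]) == "well"
        and grade(old["last_name"], new["last_name"]) == "mismatch"
        and new["sex"] == "F"
        and old["dob"] == new["dob"]
        and old["ssn"] == new["ssn"]
        # Assumption: the new record's maiden name is compared with the old last name.
        and grade(old["last_name"], new["maiden_name"]) == "well"
        and grade(old["birthplace"], new["birthplace"]) == "well"
        and old["marital_status"] in {"single", "divorced"}
        and new["marital_status"] == "married"
    )

# Made-up records for illustration only.
old_record = {"first_name": "Maria", "last_name": "Nguyen", "sex": "F",
              "dob": "1970-04-02", "ssn": "000-00-0000",
              "birthplace": "Denver", "marital_status": "single"}
new_record = {"first_name": "Maria", "last_name": "Brooks", "sex": "F",
              "dob": "1970-04-02", "ssn": "000-00-0000",
              "maiden_name": "Nguyen", "birthplace": "Denver",
              "marital_status": "married"}
print(marriage_rule(old_record, new_record))
```

A production matcher would combine many such rules and would likely weight the grades rather than require exact grade labels, but the sketch shows how a life event such as marriage can be encoded so that a changed last name does not block a match.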

Targeting Audiences: Instead of bombarding consumers indiscriminately, smart, technologically savvy corporations use advanced data warehousing and data-mining techniques to target their marketing efforts. Managing specialized customer knowledge allows businesses to respond to customer needs even before these needs are expressed. David Diamond is the President of Catalina Marketing, a Florida firm that operates customer loyalty programs for 14,000 supermarkets in the United States. He stated that one chain had used data mining to search through customer purchase data and find low-fat-food buyers who never bought potato chips. Using this data, the company offered these customers a coupon for a new brand of low-calorie chips. Diamond stated that 40% responded, which is much higher than the typical 1% or 2% response to coupons sent in the mail ("Selling is getting personal," 2000).
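The selection described in this example amounts to a simple query over purchase histories. The sketch below assumes a hypothetical list of (customer, category) pairs; the category labels and the function name are invented for illustration and do not reflect Catalina Marketing's actual systems.

```python
def coupon_targets(purchases, wanted_category, excluded_category):
    """Customers who bought from wanted_category but never from excluded_category."""
    bought = {}
    for customer, category in purchases:
        bought.setdefault(customer, set()).add(category)
    return sorted(
        customer
        for customer, categories in bought.items()
        if wanted_category in categories and excluded_category not in categories
    )

# Made-up purchase history for four customers.
purchases = [
    ("c1", "low-fat"), ("c1", "produce"),
    ("c2", "low-fat"), ("c2", "potato chips"),
    ("c3", "potato chips"),
    ("c4", "low-fat"),
]
targets = coupon_targets(purchases, "low-fat", "potato chips")
print(targets)
```

Only the customers who buy low-fat food and have never bought chips are selected, so the coupon reaches a far smaller and more receptive audience than a mass mailing would.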

Web Mining: With the widespread use of the Internet, we cannot talk about data mining without looking at Web mining and, specifically, at "clickstream analysis." Companies with a Web presence that concentrate on content rather than on the medium for distributing that content will be those that survive and thrive on the Internet. But how does a company know what content is important? One answer is by analyzing clickstream data. As a consumer traverses a website, he or she leaves a trail of activity that reveals browsing behavior and potential interest in a corporation's products and services. This trail of activity is commonly referred to as clickstream data, and it is much more than a count of page hits. It captures which links surfers click, how long a surfer spends on each Web page, and what search terms are used. Furthermore, if visitors fill out a profile or survey, even more information can be gleaned by matching their answers with their surfing behavior. Lastly, if the site belongs to a third-party banner ad exchange, the company might be able to track surfing behavior beyond its own site. One benefit of mining clickstream data is that a company can remove from its product line inventory that is not enhancing profitability. Another is the ability to enhance product exposure through better marketing.

However, the sheer volume of data generated by consumers traversing a corporation's website requires that the data be condensed and summarized prior to its storage in the corporate data warehouse (Inmon, 2001). As a consumer journeys through a corporation's website, many of the pages displayed have dynamic content. Capturing this dynamic content in such a way that the user's experience can later be analyzed is a time-consuming and expensive endeavor. Companies often have the capability to gather the data but fail to act upon it because of the cost of analyzing it. As analysis tools become simpler, more and more corporations will mine their clickstream data for patterns that might answer a number of questions.

Some issues that clickstream data can shed light on are (Inmon, 2001, p. 22):

  1. What items did the customer look at?

  2. What items did the customer not look at?

  3. What items did the customer purchase?

  4. What items did the customer examine and not purchase?

  5. What items did the customer buy in conjunction with other items?

  6. What items did the customer look at in conjunction with other items, but did not purchase?

  7. What advertisements or promotions are effective?

  8. What advertisements or promotions are ineffective?

  9. What advertisements or promotions generate a lot of attention but few sales?

  10. What advertisements or promotions are the most effective at generating sales?

  11. Are certain products too hard to find?

  12. Are certain products too expensive?

  13. Is there a substitute product that the customer finds first?

  14. Are there too many products for the customer to wade through?

  15. Are certain products not being promoted?

  16. Do the products have adequate descriptions?
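Several of these questions reduce to set operations over the clickstream. As a minimal sketch, the code below answers question 4 (items examined but not purchased) from a hypothetical list of (customer, action, item) events; the event layout and the action names "view" and "purchase" are assumptions made for this example and are not taken from Inmon (2001).

```python
def viewed_not_purchased(events):
    """Map each customer to the items they viewed but did not purchase."""
    viewed, purchased = {}, {}
    for customer, action, item in events:
        if action == "view":
            viewed.setdefault(customer, set()).add(item)
        elif action == "purchase":
            purchased.setdefault(customer, set()).add(item)
        # Other actions (searches, ad clicks, ...) are ignored in this sketch.
    return {
        customer: sorted(items - purchased.get(customer, set()))
        for customer, items in viewed.items()
    }

# Made-up clickstream for two visitors.
events = [
    ("ann", "view", "tent"),
    ("ann", "view", "stove"),
    ("ann", "purchase", "tent"),
    ("bob", "view", "stove"),
]
abandoned = viewed_not_purchased(events)
print(abandoned)
```

An item that many customers view but few buy, such as the stove in this made-up data, is a natural starting point for questions 12 and 16 about price and product descriptions.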

Clickstream data allows an unprecedented analysis of consumer behavior. Obviously, when a consumer enters a store at a traditional physical location such as a mall, it would be a very time-consuming process to analyze every product a consumer picked up, every department he or she entered, and every interaction he or she had with a sales clerk. Theoretically, you could do that, but, in essence, you would be conducting surveillance on individuals. If the public found out such activity was being conducted, clearly there would be a public outcry. On the other hand, on the Web, such analysis is much easier. As time goes by, more and more companies are going to be turning to their clickstream data and creating profiles based on customers' online shopping data. Of course, with this increased scrutiny should come increased protections, whether through self-regulation or laws. This leads us to the issue of whether data mining benefits or harms society.

Customer Satisfaction: Over the past few years, companies have learned that they need to interact with customers in new ways. Customers are not as brand loyal as they once were. Hence, companies need to understand their customers better and respond quickly to their wants and needs. The time frame for a company to respond is also shrinking in our "need it now" society. Businesses can no longer wait until the signs of customer dissatisfaction are obvious before taking corrective action; they must now anticipate customer desires. Given all of this, data mining is an obvious choice for businesses that want to gain a strategic advantage, maintain their customer base, and attract new customers.

It is often useful to examine a topic from several different perspectives. With respect to data mining, three obvious perspectives are (1) businesses, (2) individuals, and (3) society. Table 1 provides a list of benefits of data mining from each of these perspectives. The list is not meant to be comprehensive, but rather illustrates how each of these groups benefits from data mining. Table 2 provides some representative drawbacks of data mining, once again from the three different perspectives.

Table 1: Benefits of data mining

For Businesses

  • Tailor offerings to specific customer needs

  • Determine product and service features that are important to customers

  • Better customer relationship management

  • Money and time savings

  • Find, attract, and retain the best customers

  • Distinguish preferred from marginal customers

  • Customize marketing plans to specific markets

  • Identify customers who may be planning to defect to competitors

  • Identify new market opportunities

  • Enhance productivity

  • Develop insight into changing customer requirements

  • Find out which customers will be interested in the company's new products or services

  • Reduce risk

  • Analyze delivery channels

  • Manage portfolios

  • Target pricing

  • Identify patterns showing which customers are open to cross-selling

  • Identify customers who purchase particular products or services at a high rate

For Individuals

  • Can reach conclusions that are beyond simple human analysis

  • Companies will understand the needs of consumers better

  • Companies can react to customers faster

  • Customers receive more customized products

  • Customers get better prices, better deals, better facilities

  • Better customer relationships

  • Treat the more valuable customer differently

  • Rapid access to integrated systems and information increases the number of choices for consumers

  • Customers get exactly what they need

For Society

  • Gaining intelligence information, such as information that might lead to terrorist activity

  • Identifying criminal activity

Table 2: Drawbacks of data mining

For Businesses

  • Open themselves up to possible lawsuits

  • Data may be flawed, resulting in inaccurate conclusions

  • With respect to information gathered via the WWW, many people enter false information

  • Cost of setting up data warehouses

For Individuals

  • Invasion of privacy

  • Conclusions can be made about individuals that are not necessarily true

  • Information gathered could be used inappropriately to the detriment of an individual

  • When a large quantity of information is distributed, data errors can greatly affect and even destroy the lives of individuals

For Society

  • Ethical questions about what is appropriate to mine

Does Data Mining Benefit or Harm Society?

How can governments take data from a number of different sources and mine it for possible suspects associated with terrorist acts? In the aftermath of the World Trade Center tragedy, governments will be seeking new broad powers to monitor electronic communications in an effort to identify and locate potential terrorists. One of the means at their disposal is to gather data from a number of different sources and use data mining to identify suspect financial transactions, which it is hoped will ultimately lead back to those who sponsor terrorism.

Shari Steele, an executive director of the San Francisco-based Electronic Frontier Foundation, a special interest group that aims to keep the Web as free and democratic as possible, is concerned about potential new powers for the government to invade personal privacy (Swisher, 2001):

"Now it looks like the government will be able to know that and a whole lot more, such as from where you surf, patterns of your e-mail use, what you buy. The ability to learn these patterns has been the dream of marketers and the rallying point for privacy advocates, who have fought successfully since cyberspace's earliest days to prevent such snooping." (p. B1)

Prior to the World Trade Center disaster, it would have been impossible for the government to put legislation in place that would allow it to monitor behavior on such an individual basis. In the aftermath, it is not clear whether those in society who are clamoring for restraint and a reasonable response will be heard. More than likely, their calls for a reasoned response will be drowned out by the cries of the masses for measures to safeguard society.

Another drawback is that there can be flaws in the data-mining process. There are a number of reasons why records in different databases may not "match up" through the data-mining process even though they contain information about the same person or entity. Difficulties can arise from letter-, field-, or word-related mismatches. Bell and Sethi (2001), in their article "Matching Records in a National Medical Patient Index," examine some of the reasons why records from different sources are difficult to match. Letter-related mismatches include transposition of letters, omission of letters, misspellings, and typing errors that occur during data entry. Word- or field-related mismatches arise when the person whose data is held in two different databases has changed address, name, ZIP code, or phone number. There is always the possibility of fraud as well. So, consumers must be concerned about the accuracy of mined information, and about whether data from another person's history has been accidentally coupled with their own legitimate data through improper matching.

In addition to letter- and field-related mismatches, a number of other issues might create inaccuracies. For example, Asian names often have the surname and given name in reverse order relative to Western usage. If the person entering the data is not aware of this, the names may be switched. Names may also appear differently for the same individual when, for example, a person named William goes by Bill. The data-mining process must take these possibilities into account. Another problem is misspelling, e.g., when the last name Cook is entered as Cooke. Similarly, the name Smith can be mistakenly entered as Smithe or Smyth.
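A minimal sketch of how a matcher might tolerate the mismatches listed above follows. It uses an edit distance that counts adjacent transpositions as a single edit (the optimal string alignment variant of Damerau-Levenshtein distance), a tiny nickname table, and a check for swapped surname and given name. The nickname table, the one-edit threshold, and the function names are assumptions made for this example and are not the method of Bell and Sethi (2001).

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insert, delete, substitute, and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"bill": "william"}

def same_person_name(first_a, last_a, first_b, last_b, max_edits=1):
    """True if the two names plausibly refer to the same person."""
    def canon(name):
        name = name.strip().lower()
        # Applied to every name part so that reversed names are also covered.
        return NICKNAMES.get(name, name)

    def close(x, y):
        return osa_distance(x, y) <= max_edits

    fa, la, fb, lb = (canon(n) for n in (first_a, last_a, first_b, last_b))
    in_order = close(fa, fb) and close(la, lb)
    reversed_order = close(fa, lb) and close(la, fb)
    return in_order or reversed_order

print(same_person_name("Bill", "Cook", "William", "Cooke"))
print(same_person_name("Wei", "Zhang", "Zhang", "Wei"))
```

A tolerance of one edit is generous for short names and would produce false matches on its own, which is why name similarity is normally combined with other fields such as date of birth before two records are merged.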

Having examined ethical and social implications, we now turn our attention to legal issues. It is important that companies protect sensitive data while allowing key decision-makers access to real-time information online. This next section discusses legal matters and their role in data mining. Laws dictate what kind of data can be mined. It is important to understand some of the federal laws currently related to privacy to understand their impact (or lack thereof) on how data is currently mined. Following are brief descriptions of many federal laws that are important when dealing with privacy. While reading this section, keep this in mind: if you collect, manipulate, disseminate, and mine data, investing in legal advice is a smart decision.

ISBN: 1591400511
Year: 2003
Pages: 194