Applications


Chapter VI - Data Mining Based on Rough Sets
Data Mining: Opportunities and Challenges
by John Wang (ed)
Idea Group Publishing 2003



LERS has been used in the medical field, nursing, global warming, environmental protection, natural language, and data transmission. LERS can process large data sets and frequently outperforms not only other rule induction systems but also human experts.

Medical Field

In the medical field, LERS was used for prediction of preterm birth, for diagnosis of melanoma, for prediction of behavior under mental retardation, and for analysis of animal models for prediction of self-injurious behavior.

Predicting which pregnant women are at risk of giving birth prematurely is a difficult problem in health care. Medical science and research have not offered viable solutions for the prematurity problem. In one of our projects, completed in 1992-93, three large prenatal databases were acquired. Each database was divided into two halves: 50% for training data and 50% for testing data. Each data set was then analyzed using statistical and data-mining programs. The best predictive accuracy was achieved using LERS. Manual methods of assessing preterm birth have a positive predictive value of 17-38%. The data-mining methods based on LERS reached a positive predictive value of 59-92%.
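As a reminder of what the positive predictive value measures, the following minimal Python sketch computes it as the fraction of positive predictions that turn out to be correct. The function name and the counts are hypothetical and were chosen only to illustrate the calculation; they are not taken from the prenatal databases.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        raise ValueError("no positive predictions were made")
    return true_positives / predicted_positive


# Hypothetical counts (assumption, not data from the study).
flagged_and_preterm = 59     # predicted preterm, delivered preterm
flagged_and_full_term = 41   # predicted preterm, delivered at term

ppv = positive_predictive_value(flagged_and_preterm, flagged_and_full_term)
print(f"PPV = {ppv:.0%}")
```

A predictor with a positive predictive value of 17% is wrong about roughly five of every six women it flags, which is why the range reported for LERS is a substantial improvement.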

Another project was associated with melanoma diagnosis based on the well-known ABCD formula. Our main objective was to check whether the original ABCD formula is optimal. As a result of more than 20,000 experiments, the optimal ABCD formula was found, thus reducing the error rate from 10.21% (original ABCD formula) to 6.04% (optimal ABCD formula).
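To put the two error rates in perspective, the short Python sketch below computes the absolute and relative reduction from the figures quoted above. The helper name is an assumption introduced for illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Share of the original error rate that was eliminated."""
    return (before - after) / before


original_error = 10.21  # percent, original ABCD formula
optimal_error = 6.04    # percent, optimal ABCD formula

absolute_drop = original_error - optimal_error
print(f"absolute drop: {absolute_drop:.2f} percentage points")
print(f"relative reduction: {relative_reduction(original_error, optimal_error):.1%}")
```

The drop of 4.17 percentage points corresponds to removing roughly two fifths of the errors made with the original formula.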

In yet another project, data on heart rate were linked to environmental and behavioral data coded from videotapes of one adult subject diagnosed with severe mental retardation who engaged in problem behavior. The results of the analysis suggest that using the LERS system will be a valuable strategy for exploring large data sets that include heart rate, environmental, and behavioral measures.

Similarly, LERS was used for prediction of animal models based on their behavioral responsiveness to a dopamine agonist, GBR12909. The three animal groups received five injections of GBR12909 and were observed for stereotyped and self-injurious behaviors immediately following the injections and six hours after the injections. Differences in the rule sets computed for each group enabled the prediction of the stereotyped behaviors that may occur prior to the occurrence of self-injurious behavior.

Also, LERS has been used by the NASA Johnson Space Center as a tool to develop an expert system that may be used in medical decision making on board the International Space Station.

Natural Language

One of our projects was to derive data associated with the word concept from the Oxford English Dictionary and then place additional terms in Roget's Thesaurus. Two rule sets were computed from training data, using the LEM2 and All-Rules algorithms of LERS, respectively. Both rule sets were validated on testing data. The rule set computed by the All-Rules algorithm was much better than the rule set computed by the LEM2 algorithm. This conclusion is yet another endorsement of the claim that the knowledge acquisition approach is better for rule induction than the machine-learning approach.

Another project in this area was a data-mining experiment for determining parts of speech from a file containing the last three characters of words from the entire Roget's Thesaurus. Every entry was classified as belonging to one of five parts of speech: nouns, verbs, adjectives, adverbs, and prepositions. The file had 129,797 entries. Only a small portion of the file (4.82%) was consistent. The LEM2 algorithm of LERS computed 836 certain rules and 2,294 possible rules. Since the file was created from the entire Roget's Thesaurus, the same file was used for training and testing. The final error rate was equal to 26.71%, with the following partial error rates: 11.75% for nouns, 73.58% for verbs, 11.99% for adverbs, 33.50% for adjectives, and 85.76% for prepositions.
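The notion of a consistent portion of a file, and the split into certain and possible rules, can be illustrated with a small Python sketch. It assumes the usual rough set reading: two entries conflict when they share all attribute values but differ in the decision, certain rules are induced from the lower approximation of a concept, and possible rules from its upper approximation. The function names and the five toy entries are hypothetical; this is not the LERS implementation and does not use the Thesaurus data.

```python
from collections import defaultdict


def consistent_fraction(cases):
    """Share of cases whose attribute values never occur with another decision."""
    decisions = defaultdict(set)
    for attrs, decision in cases:
        decisions[attrs].add(decision)
    consistent = sum(1 for attrs, _ in cases if len(decisions[attrs]) == 1)
    return consistent / len(cases)


def approximations(cases, concept):
    """Return (lower, upper) approximations of a concept as sets of case indices."""
    blocks = defaultdict(list)  # indiscernibility classes keyed by attribute values
    for index, (attrs, _) in enumerate(cases):
        blocks[attrs].append(index)
    lower, upper = set(), set()
    for indices in blocks.values():
        labels = {cases[i][1] for i in indices}
        if labels == {concept}:
            lower.update(indices)   # every case in the block belongs to the concept
        if concept in labels:
            upper.update(indices)   # at least one case in the block belongs to it
    return lower, upper


# Toy entries (assumption): last three characters of a word, part of speech.
toy_cases = [
    (("i", "n", "g"), "noun"),
    (("i", "n", "g"), "verb"),
    (("e", "l", "y"), "adverb"),
    (("o", "u", "s"), "adjective"),
    (("e", "l", "y"), "adverb"),
]

print(consistent_fraction(toy_cases))       # 3 of the 5 entries are consistent
print(approximations(toy_cases, "noun"))    # empty lower, upper holds both "ing" entries
print(approximations(toy_cases, "adverb"))  # lower and upper coincide
```

In the toy data the ending "ing" occurs with two different parts of speech, so no certain rule for nouns can be induced from it, only a possible one. With only three characters per word, such conflicts are common, which is consistent with the small consistent portion reported for the file.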

