DM FOR BIOINFORMATICS

Chapter XX - Critical and Future Trends in Data Mining: A Review of Key Data Mining Technologies/Applications
Data Mining: Opportunities and Challenges
by John Wang (ed) 
Idea Group Publishing 2003

Bioinformatics is the science of storing, extracting, organizing, analyzing, interpreting, and utilizing information from biological sequences and molecules. It has been fueled mainly by advances in DNA sequencing and mapping techniques. The Human Genome Project has resulted in an exponentially growing DB of genetic sequences. KDD techniques are playing an increasingly important role in the analysis and discovery of sequence, structure, and functional patterns or models from large sequence DBs. High-performance techniques are also becoming central to this task (Han et al., 2001; Han & Kamber, 2001).

Bioinformatics provides opportunities for developing novel mining methods. Some of the main challenges in bioinformatics include protein structure prediction, homology search, multiple alignment and phylogeny construction, genomic sequence analysis, gene finding and gene mapping, as well as applications in gene expression data analysis and in drug discovery in the pharmaceutical industry. As a consequence of the large amounts of data produced in the field of molecular biology, most current bioinformatics projects deal with the structural and functional aspects of genes and proteins. Many of these projects are related to the Human Genome Project. The data produced by thousands of research teams all over the world are collected and organized in DBs specialized for particular subjects; examples include GDB, SWISS-PROT, GenBank, and PDB. Computational tools are needed to analyze the collected data efficiently. For example, bioinformaticists are working on the prediction of the biological functions of genes and proteins based on structural data (Chalifa-Caspi, Prilusky, & Lancet, 1998). Another example of a bioinformatics application is the GeneCards encyclopedia (Rebhan, Chalifa-Caspi, Prilusky, & Lancet, 1997). This resource contains data about human genes, their products, and the diseases in which they are involved.
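To make the idea of homology search more concrete, the following is a minimal sketch, not a method taken from the chapter. It scores a query against each entry of a small in-memory collection using a Smith-Waterman style local alignment with a linear gap penalty, then ranks the entries by score. The function names, the scoring parameters (match=2, mismatch=-1, gap=-2), and the toy sequences are all assumptions chosen for illustration; production tools use heuristics and statistical significance measures that are omitted here.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Smith-Waterman recurrence with a linear gap penalty. Only the
    score is returned, so two rows of the table are enough.
    The scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_by_similarity(query, database):
    """Return (score, name) pairs for every entry, highest score first.

    `database` is a hypothetical dict mapping a name to a sequence.
    """
    scored = [(local_alignment_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    toy_db = {"seqA": "TTACGTTT", "seqB": "CCCCCCCC"}  # made-up sequences
    for score, name in rank_by_similarity("ACGT", toy_db):
        print(name, score)
```

The table-filling step costs time proportional to the product of the two sequence lengths, which is one reason large sequence DBs are usually searched with faster approximate methods.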

Since DM offers the ability to discover patterns and relationships in large amounts of data, it seems ideally suited to the analysis of DNA. This is because DNA is essentially a sequence, or chain, of four main components called nucleotides. A gene is a stretch of nucleotides in a particular order, ranging in length from a few hundred nucleotides to far more. Early estimates put the number of genes in the human genome at about 100,000, although later analyses of the finished sequence suggest roughly 20,000 protein-coding genes. Aside from the task of integrating DBs of biological information noted above, another important application is comparison and similarity search on DNA sequences. This is useful in the study of genetics-linked diseases, as it makes it possible to compare and contrast the gene sequences of normal and diseased tissues and attempt to determine which sequences are found in the diseased, but not in the normal, tissues. A number of projects are being conducted in this area, both on the topics discussed above and on the analysis of microarray data and related topics. Among the centers doing research in this area are the European Bioinformatics Institute (EBI) near Cambridge, UK, and the Weizmann Institute of Science in Israel.
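The comparison of diseased and normal tissue described above can be sketched in a very simplified form. The example below is an illustration under stated assumptions, not the procedure used by any of the projects mentioned: it breaks each sequence into overlapping substrings of length k (k-mers) and reports those that occur in every diseased sample but in no normal sample. The function names, the choice k=4, and the toy sequences are hypothetical; real analyses must also handle sequencing errors, natural variation, and statistical significance.

```python
def kmers(sequence, k):
    """Set of all overlapping substrings of length k in `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def disease_specific_kmers(diseased, normal, k=4):
    """k-mers present in every diseased sample and absent from all normal ones.

    `diseased` and `normal` are lists of sequence strings.
    """
    if not diseased:
        return set()
    # k-mers shared by all diseased samples.
    shared = set.intersection(*(kmers(s, k) for s in diseased))
    # Every k-mer seen in any normal sample.
    seen_in_normal = set().union(*(kmers(s, k) for s in normal))
    return shared - seen_in_normal


if __name__ == "__main__":
    diseased_samples = ["ACGTTGCA", "GGACGTTG"]  # made-up sequences
    normal_samples = ["ACGTAGCA"]                # made-up sequence
    print(sorted(disease_specific_kmers(diseased_samples, normal_samples)))
```

Requiring a k-mer to appear in every diseased sample is a deliberately strict choice; a frequency threshold would be a natural relaxation for noisy data.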

ISBN: 1591400511
Year: 2003
Pages: 194
Authors: John Wang
