Chapter 5: Text Mining: Clustering Concepts


5.1 What Is Text Mining?

The explosion in the amount of data generated from government and corporate databases, e-mails, Internet survey forms, phone and cellular records, and other communications has driven the adoption of several data mining technologies. Among them are text mining tools, which use specialized clustering techniques to extract concepts and keywords from unstructured data. Patterns in digital text files provide clues to the identity and characteristics of criminals, which forensic investigators and intelligence analysts can uncover with this class of tools.

Based on a field of AI known as natural language processing (NLP), text mining tools can capture critical features of a document's content based on the analysis of its linguistic characteristics. NLP attempts to analyze, understand, and generate languages that humans use naturally. This goal is not easy to reach. Understanding language means, among other things, knowing what concept a word or phrase stands for and how to link those concepts together in a meaningful way. It's ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master.

NLP research may focus on how best to incorporate natural language into multimedia interfaces, such as software agents, or on integrating linguistic processing with speech recognition, both to make speech recognition more accurate and to put its results to use in practical applications. Much NLP work centers on developing natural language parsing and semantic interpretation systems based on unification grammar, for applications such as retrieving airline schedules, fares, and related information from relational databases, or providing a spoken-language interface to synthetic forces in military battlefield simulations. In the context of forensic data mining, however, NLP is most applicable to interpreting and extracting information from written text, such as online newspaper articles or other types of evidence documents.

Text mining tools use a variety of search methods that combine lexical parsing and clustering techniques to extract phrases from gigabytes of text and organize its content and key concepts. Text mining allows the investigator and analyst to discover hot keywords and key concepts within documents, and groups of similar documents, without having to read an entire database of documents. These tools eliminate up-front manual categorization, tagging, building of topic trees, and indexing; instead, they automatically identify and index concepts within the text. Some text mining tools also let users draw new associations and relationships, offering three-dimensional charts, paths, and links for further analysis by forensic investigators.
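
To make the idea of combining lexical parsing with clustering concrete, here is a minimal sketch in Python. It is not the method used by any particular text mining product. It assumes a tiny hand-picked stopword list, a smoothed TF-IDF weighting (one common variant), and a simple single-pass threshold clustering chosen for brevity. The function names, the sample documents, and the 0.3 similarity threshold are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not a standard one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "was",
             "on", "at", "by", "from", "for", "with"}


def tokenize(text):
    """Lexical parsing step: lowercase, keep alphabetic non-stopword tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed TF-IDF)."""
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_documents(vectors, threshold=0.3):
    """Single-pass clustering: join the first cluster whose seed document
    is similar enough, otherwise start a new cluster."""
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def top_keywords(vectors, cluster, k=4):
    """Sum term weights over a cluster and return the k heaviest terms."""
    totals = Counter()
    for i in cluster:
        for term, weight in vectors[i].items():
            totals[term] += weight
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical sample documents, invented for this illustration.
    docs = [
        "Wire transfer of funds to offshore bank account",
        "Offshore account received a wire transfer from shell company",
        "Suspect vehicle seen near the warehouse at night",
        "Warehouse night guard reported a suspect vehicle",
    ]
    vectors = tfidf_vectors(docs)
    for cluster in cluster_documents(vectors):
        print(cluster, top_keywords(vectors, cluster))
```

On these four sample documents the sketch groups the two financial notes together and the two surveillance notes together, and the keywords it reports for each group are the terms the grouped documents share. Real tools work at a much larger scale and use richer linguistic analysis, but the pipeline has the same shape: parse the text into terms, weight them, group similar documents, and surface each group's key concepts.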




Investigative Data Mining for Security and Criminal Detection
ISBN: 0750676132
Year: 2005
Pages: 232
Authors: Jesus Mena
