5.4 Searching for Clues in Aviation Crashes: A Case Study

NASA developed Perilog, a suite of data mining tools designed to retrieve and organize contextually relevant data from any sequence of terms. Perilog has been used to sort through thousands of narrative reports and extract key terms that help identify the root causes of air crashes. The software measures the degree of contextual association between large numbers of term pairs in text, or in any other sequence, and builds models from those measurements. It then measures how similar each model is to a query model, ranks the results by relevance, and presents them in table format.
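The idea of modeling a text as weighted term pairs and comparing it with a query model can be sketched as follows. This is a minimal illustration only, not Perilog's actual algorithm: the function names pair_model and similarity, the window size, the proximity weighting, and the sample narratives are all assumptions made for the example.

```python
from collections import Counter

def pair_model(text, window=3):
    """Weight each ordered term pair by how closely the two terms co-occur."""
    terms = text.lower().split()
    model = Counter()
    for i, left in enumerate(terms):
        for distance in range(1, window + 1):
            if i + distance < len(terms):
                # Assumed weighting: adjacent terms score highest.
                model[(left, terms[i + distance])] += window - distance + 1
    return model

def similarity(model, query_model):
    """Sum the pair weights that the model shares with the query model."""
    return sum(min(weight, model[pair]) for pair, weight in query_model.items())

# Made-up narratives, used only to exercise the sketch.
narratives = [
    "engine failure after takeoff",
    "crew reported engine failure during climb",
    "runway incursion during taxi",
]
query_model = pair_model("engine failure")
ranked = sorted(
    narratives,
    key=lambda n: similarity(pair_model(n), query_model),
    reverse=True,
)
for narrative in ranked:
    print(similarity(pair_model(narrative), query_model), narrative)
```

Because the comparison works on ordered pairs of symbols rather than on words as such, the same sketch would apply to any tokenized sequence, which matches the claim that the approach is not limited to text.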

Perilog was originally designed to support the FAA's Aviation Safety Reporting System (ASRS). The NASA software was used to analyze thousands of aviation accident and incident reports, which typically contain free-form narrative descriptions written by participants such as flight crews, ground crews, air traffic controllers, and other professionals. From this large volume of reports, Perilog was used to extract the dominant causes of airline crashes, such as mechanical failure or pilot error. Perilog relies on four methods for text mining:

Keyword-in-context search, which retrieves narratives that contain one or more user-specified keywords in typical or selected contexts and ranks the narratives by their relevance to the keywords in context
Flexible, model-based phrase search, which retrieves narratives that contain one or more user-specified phrases and ranks them by their relevance to the phrases
Model-based phrase generation, which produces a list of phrases from documents that contain a user-specified word or group of words
Narrative-based phrase discovery, which finds phrases related to topics of interest by generating a list of narratives similar in meaning to the keyword or phrase query
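The first of these methods, keyword-in-context search, can be illustrated with a short sketch. It is a simplified stand-in rather than Perilog's implementation: the function kwic_search, its width parameter, and the sample reports are hypothetical, and ranking by raw hit count is an assumed substitute for Perilog's relevance measure, which is not specified here.

```python
import re

def kwic_search(narratives, keyword, width=2):
    """Return (hit_count, narrative, contexts) tuples, most hits first."""
    keyword = keyword.lower()
    results = []
    for narrative in narratives:
        terms = re.findall(r"[a-z0-9']+", narrative.lower())
        # Collect the words surrounding each occurrence of the keyword.
        contexts = [
            " ".join(terms[max(0, i - width): i + width + 1])
            for i, term in enumerate(terms)
            if term == keyword
        ]
        if contexts:
            results.append((len(contexts), narrative, contexts))
    return sorted(results, key=lambda r: r[0], reverse=True)

# Made-up reports, used only to exercise the sketch.
reports = [
    "Tower cleared us to land. Flaps failed to extend on final.",
    "Flaps jammed at 15 degrees; we reset the flaps after the checklist.",
    "Ground crew reported a fuel leak.",
]
hits = kwic_search(reports, "flaps")
for count, narrative, contexts in hits:
    print(count, contexts)
```

Narratives without the keyword are dropped, and each retained narrative carries the snippets that show the keyword in its surrounding context.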

Relevance ranking is the process of sorting a list of items so that those likely to be most relevant to one's concerns and interests appear closer to the top of the list. Relevance ranking helps an analyst read and interpret very large collections of narratives, reports, and other text efficiently. Perilog can sort through thousands of pages and rank and prioritize pairs of terms by a relational metric value that is highest when there is a match:

       Probe Term   Term in Context   Relational Metric Value
       FBI          crash             205
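A table of this shape can be approximated by counting the terms that appear near a probe term and sorting them by that count. The sketch below is illustrative only: context_table is a hypothetical helper, the sample narratives are invented, and a simple co-occurrence count is assumed in place of Perilog's relational metric, whose formula is not given here.

```python
from collections import Counter

def context_table(narratives, probe, window=2):
    """Count terms found within `window` words of the probe term."""
    counts = Counter()
    for narrative in narratives:
        terms = narrative.lower().split()
        for i, term in enumerate(terms):
            if term != probe:
                continue
            lo, hi = max(0, i - window), min(len(terms), i + window + 1)
            # Neighbors on both sides, excluding the probe term itself.
            counts.update(terms[lo:i] + terms[i + 1:hi])
    return counts.most_common()  # highest count first

# Invented narratives, used only to exercise the sketch.
sample = [
    "engine fire warning on climb",
    "engine fire after bird strike",
    "engine vibration and fire warning",
]
rows = context_table(sample, "fire")
print(f"{'Probe Term':<12}{'Term in Context':<18}Value")
for term, value in rows:
    print(f"{'fire':<12}{term:<18}{value}")
```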

Because Perilog manipulates patterned or sequential symbols, data, items, objects, events, causes, time spans, actions, attributes, entities, relations, and representations, it can search any type of information repository, not just text. Notably, this NASA-developed software can perform smart retrieval of sound, voice, or audio data, which makes it well suited as a context search and retrieval tool for investigative monitoring and analysis of multimedia. NASA is looking for a commercial developer to bring the government-developed software to market.