CONCLUSION


The concept of textual warehouse that we propose makes it possible to manipulate the documents of a heterogeneous collection by both their structure and their content, in contrast to other systems that impose a predefined structure. The proposed generic model is suitable for storing heterogeneous documents according to their logical structures and for applying techniques of information retrieval (returning passages rather than whole documents), data querying (returning factual information), and multidimensional analysis (analyzing data along several dimensions by means of a graphic language that is simple for users).

Several experiments have been carried out on two aspects: first, the integration of large collections of heterogeneous documents taken from the Laboratory Intranet, and second, the analysis and use of the warehouse content by several inexperienced users. The distinction between the generic and the specific structures improved the expressiveness of a large document collection with respect to retrieving, exploiting, and analyzing its content. The graphic language is also open enough to allow any user to construct any query, even a complex one.

At present, our main goal is to continue merging the techniques developed in the fields of information retrieval and data warehousing. In particular, the specifications of the document warehouse need to be extended in order to:

  • define a query language appropriate for the warehouse, instead of using SQL, in order to simplify query syntax;

  • apply the multidimensional operators to textual marts either textually, according to a formalism, or graphically;

  • extract statistical information and knowledge to explain user behavior and to define user profiles.

We consider the document warehouse to be the basis for defining a business memory, intended for anyone in an organization who must quickly access and analyze useful information. This memory must contain the knowledge extracted from document content (i.e., from both structure and text). Our future work will aim to extend the textual analysis process to integrate personalization criteria and metadata (supplied by the user or by an automatic process).




(ed.) Intelligent Agents for Data Mining and Information Retrieval
Year: 2004
Pages: 171
