Chapter VI: Multilingual Web Content Mining - A User-Oriented Approach


Rowena Chau, Monash University, Australia
Chung-Hsing Yeh, Monash University, Australia

This chapter presents a novel user-oriented, concept-based approach to multilingual web content mining using self-organizing maps. The linguistic knowledge required for multilingual web content mining is made available by encoding all multilingual concept-term relationships in a multilingual concept space. With this linguistic knowledge base, a concept-based multilingual text classifier is developed. It reveals the conceptual content of multilingual web documents and organizes them into concept categories on a concept-based browsing interface. To personalize multilingual web content mining, a concept-based user profile is generated from the user's bookmark file to highlight the user's topics of information interest on the browsing interface. As such, both explorative browsing and user-oriented, concept-focused information filtering in the multilingual web are facilitated.

INTRODUCTION

The rapid expansion of the World Wide Web across the globe means that electronically accessible information is now available in an ever-increasing number of languages. Because the majority of this web data is unstructured text (Chakrabarti, 2000), web content mining technology capable of discovering useful knowledge from multilingual web documents holds the key to exploiting the vast human knowledge hidden in this largely untapped multilingual text. Moreover, users' information interests differ: knowledge useful to one user may not be useful to another. Mining multilingual web content and delivering the discovered knowledge without considering the user's information interests may therefore not be effective.

To help each user discover knowledge specific to his or her domain of interest from the multilingual web, a user-oriented approach to multilingual web content mining is required. This chapter introduces such an approach: user-oriented, concept-based multilingual web content mining. Its objective is to facilitate personalized multilingual web content mining, which is especially important when the user's motive for information seeking is personalized global knowledge discovery.




(ed.) Intelligent Agents for Data Mining and Information Retrieval
Year: 2004
Pages: 171
