BACKGROUND


Web content mining has attracted much research attention in recent years (Kosala & Blockeel, 2000). It has emerged as an area of text mining specific to web documents, focusing on analyzing and deriving meaning from textual collections on the Internet (Chang et al., 2001). Currently, web content mining technology is still limited to processing monolingual web documents.

The challenge of discovering knowledge from linguistically diverse textual data is well recognized in text mining research (Tan, 1999). In a monolingual environment, the conceptual content of documents can be discovered by directly detecting patterns of frequent features (i.e., terms), without prior knowledge of the concept-term relationship. Documents that contain an identical known term pattern are therefore taken to share the same concept. In a multilingual environment, however, vocabulary mismatch among languages means that documents expressing a similar concept will not contain identical term patterns. This feature incompatibility problem makes term pattern matching inapplicable for inferring conceptual content.
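The feature incompatibility problem can be illustrated with a minimal sketch. The toy documents, the whitespace tokenizer, and the use of Jaccard overlap as a stand-in for term pattern matching are all assumptions made for illustration; they are not taken from the chapter.

```python
def term_set(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Fraction of terms shared by two term sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical documents that all express the same concept.
doc_en_1 = "stock market prices fall"
doc_en_2 = "stock prices fall sharply"
doc_de = "aktienmarkt kurse fallen stark"

# Same language: shared terms reveal the shared concept.
same_language = jaccard(term_set(doc_en_1), term_set(doc_en_2))

# Different languages: no term is shared, so the overlap is zero
# even though the documents are about the same thing.
cross_language = jaccard(term_set(doc_en_1), term_set(doc_de))
```

Under these assumptions the two English documents share three of five distinct terms, while the English and German documents share none, so a purely term-based comparison cannot relate them.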

To enable multilingual web content mining, linguistic knowledge of concept-term relationships is essential for exploiting any knowledge relevant to the domain of a multilingual document collection. Without such linguistic knowledge, no text or web mining algorithm can effectively infer the conceptual content of multilingual documents.
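As a rough sketch of what such linguistic knowledge could look like in practice, the following example assumes a small hand-built lexicon that maps language-specific terms to language-independent concept labels. The lexicon, the concept labels, and the function name are hypothetical and serve only to illustrate the idea; the chapter does not prescribe this representation.

```python
# Hypothetical concept-term lexicon: each language-specific term maps to a
# language-independent concept label. A real lexicon would be far larger.
CONCEPT_OF_TERM = {
    "stock": "EQUITY", "aktie": "EQUITY",
    "market": "MARKET", "markt": "MARKET",
    "fall": "DECLINE", "fallen": "DECLINE",
}


def concept_set(text, lexicon=CONCEPT_OF_TERM):
    """Map each known term to its concept; terms not in the lexicon are dropped."""
    return {lexicon[term] for term in text.lower().split() if term in lexicon}


# Hypothetical documents in two languages with no terms in common.
doc_en = "stock market prices fall"
doc_de = "aktie markt kurse fallen"

# Comparing concepts instead of terms exposes the shared content.
shared_concepts = concept_set(doc_en) & concept_set(doc_de)
```

Here the two documents have no term in common, yet both map to the same three concept labels, which is the kind of inference that term matching alone cannot support.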

In addition, on the multilingual WWW, a user's motive for information seeking is global knowledge exploration. Accordingly, the major multilingual web content mining activities include: (a) explorative browsing, which aims to gain a general overview of a certain domain; and (b) user-oriented, concept-focused information filtering, which looks only for knowledge relevant to the user's personal topics of interest. To support global knowledge exploration, it is necessary to reveal the conceptual content of multilingual web documents by offering the user a document browsing scheme that suits his or her information-seeking needs.




(ed.) Intelligent Agents for Data Mining and Information Retrieval
Year: 2004
Pages: 171
