INTRODUCTION


The World Wide Web (Web) is a growing environment for several research fields, such as agents and multi-agent systems (see Balabanovic et al., 1995; Knoblock et al., 2000), Information Retrieval (Baeza-Yates & Ribeiro-Neto, 1999; Jones & Willett, 1997), and Software Engineering (Petrie, 1996). Over the past two decades, the evolution of the Web, and especially of the information stored in its connected electronic sources, has led to an explosion of system development and research efforts.

However, the success of the Web could also be its main pitfall: the enormous growth of the stored information makes web applications difficult to build and maintain. At the same time, there is increasing interest in building systems that can reuse the information stored in the Web (Fan & Gauch, 1999). To build such systems, several problems need to be analyzed and solved, namely how to retrieve, extract, and reuse the stored information.

Information extraction (see Freitag, 1998; Kushmerick et al., 1997) is a complex problem because many of the electronic sources connected to the Web do not provide their information in a standardized way. It is therefore necessary to use specialized agents (or other types of applications) to retrieve and extract the stored knowledge from HTML pages. Once extracted, this knowledge can be used by other agents.
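To make the problem concrete, the following minimal Python sketch (our own illustration, not part of any system cited above; the class name HeadlineExtractor is hypothetical) shows how even a simple task, collecting the headlines of a page, forces an agent to navigate presentation markup rather than read the information directly:

from html.parser import HTMLParser

class HeadlineExtractor(HTMLParser):
    """Collects the text of every <h2> element from an HTML page."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        # Only text found inside an <h2> element is knowledge we want.
        if self.in_h2 and data.strip():
            self.headlines.append(data.strip())

page = "<html><body><h2>Agents on the Web</h2><p>...</p></body></html>"
extractor = HeadlineExtractor()
extractor.feed(page)
print(extractor.headlines)  # ['Agents on the Web']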

Several solutions for information extraction have been proposed. Some of the most popular, and already implemented, solutions are related to the Semantic Web (Berners-Lee et al., 2001). Others use XML-based specifications (Bremer & Gertz, 2002) and ontologies (Gruber, 1993) to represent the information stored in the Web in a coherent way. In the near future, this approach may make it possible to build robust, distributed web applications. However, the Semantic Web is still evolving, so an application that aims to reuse web information today must rely on other extraction approaches.

The wrapper approach (Sahuguet & Azavant, 1999) is one of the most widely used. It relies on wrappers (see Sahuguet & Azavant, 2001; Serafini & Ghidini, 2000) that allow the Web to be accessed as a relational database (see Ashish & Knoblock, 1997; Camacho et al., 2002c; Fan & Gauch, 1999). Building these wrappers can be a complex task because, whenever an information source changes, the corresponding wrappers must be reprogrammed as well. Several toolkits, including W4F (Sahuguet & Azavant, 2001) and WrapperBuilder (Ashish & Knoblock, 1997), have been developed to help engineers build and maintain wrappers.
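The following hedged sketch illustrates the wrapper idea under our own assumptions (the pattern and page layout are hypothetical, and real toolkits such as W4F generate far more robust wrappers): a hand-written pattern turns one site's HTML into relational-style tuples, and it breaks as soon as the site's markup changes, which is precisely the maintenance problem described above.

import re
from typing import List, Tuple

# Hypothetical pattern, tied to one specific site's table layout.
ROW_PATTERN = re.compile(
    r"<tr><td>(?P<title>[^<]+)</td><td>(?P<price>[^<]+)</td></tr>"
)

def wrap_catalog(html: str) -> List[Tuple[str, str]]:
    """Expose one source's pages as (title, price) tuples, like rows of a table."""
    return [(m.group("title"), m.group("price"))
            for m in ROW_PATTERN.finditer(html)]

page = "<table><tr><td>Web Mining</td><td>30 EUR</td></tr></table>"
print(wrap_catalog(page))  # [('Web Mining', '30 EUR')]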

The main goal of this work is to find mechanisms that allow the design and implementation of robust and adaptable multi-agent web systems. These mechanisms should also integrate, as a particular skill of some specialized agents (web agents), the ability to automatically filter and extract the available web knowledge. Toward this end, our approach uses a semiautomatic web parser, or simply WebParser, deployed as a reusable software component.

The WebParser is used by different web agents, which can change its behavior by modifying two sets of rules. The first set is used by the agents to define the knowledge to be extracted from the HTML pages (so that different agents can access different sources), and the second set is used to represent the final structure in which the retrieved knowledge is stored (so that any agent can adapt the extracted knowledge). Finally, this parser is used as a specific skill in several agents to build a particular multi-agent web system (such as SimpleNews).
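As a rough illustration of this two-rule-set design, the sketch below (our own assumptions; the rule format shown is hypothetical and is not the WebParser's actual syntax) parameterizes a parser with extraction rules, which name what to pull from a page, and structuring rules, which name how the result is stored:

import re
from typing import Dict, List

def web_parser(html: str,
               extraction_rules: Dict[str, str],
               structuring_rules: Dict[str, str]) -> List[dict]:
    """Apply per-agent extraction patterns, then rename fields per agent."""
    extracted: Dict[str, List[str]] = {
        field: re.findall(pattern, html)
        for field, pattern in extraction_rules.items()
    }
    # Re-shape the extracted fields into the agent's preferred record layout.
    n = min((len(v) for v in extracted.values()), default=0)
    return [
        {structuring_rules[field]: extracted[field][i] for field in extracted}
        for i in range(n)
    ]

page = "<h2>Election Results</h2><span class='date'>2004-05-01</span>"
rules_in = {"headline": r"<h2>([^<]+)</h2>",
            "date": r"<span class='date'>([^<]+)</span>"}
rules_out = {"headline": "title", "date": "published"}
print(web_parser(page, rules_in, rules_out))
# [{'title': 'Election Results', 'published': '2004-05-01'}]

Because both rule sets are plain data, two agents can reuse the same parser against different sources, or store the same source's knowledge in different layouts, without reprogramming the component.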



