Chapter XIII - Query-By-Structure Approach for the Web
Data Mining: Opportunities and Challenges
by John Wang (ed) 
Idea Group Publishing 2003
The content of a document is clearly crucial in any type of query. However, based on the material presented in this chapter, it should be apparent that the query-by-structure approach has potential as well. The CLaP system demonstrated that utilizing structure as well as content could significantly improve query performance. The neural network systems achieved equally good performance and demonstrated that the burden of identifying structure within the document can become the responsibility of the system rather than the user. Since most data on the Web is in the form of HTML documents, this data can be thought of as at least semi-structured, because the HTML tags within a document impose some form of structure on the data. Clearly, being able to take advantage of this structure could improve the query process.
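To make the idea of exploiting HTML structure concrete, the following is a minimal sketch in Python. It is not the CLaP system or either neural network system. It only illustrates, under assumed settings, how a match on a query term might count for more when it occurs inside certain tags. The weights in TAG_WEIGHTS, the function name structure_score, and the two sample pages are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical weights (an assumption, not taken from the chapter):
# a term match counts for more inside these tags.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "b": 2.0}
DEFAULT_WEIGHT = 1.0


class StructureScorer(HTMLParser):
    """Scores one query term by where it occurs in the tag structure."""

    def __init__(self, term):
        super().__init__()
        self.term = term.lower()
        self.open_tags = []  # stack of currently open tags
        self.score = 0.0

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this also discards
        # unclosed tags such as <br> that were pushed in between.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        hits = re.findall(r"\w+", data.lower()).count(self.term)
        if hits:
            # Use the highest weight among the enclosing tags.
            weight = max(
                (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
                default=DEFAULT_WEIGHT,
            )
            self.score += hits * weight


def structure_score(html, term):
    """Return a structure-weighted match score for term in html."""
    scorer = StructureScorer(term)
    scorer.feed(html)
    scorer.close()
    return scorer.score


# Two made-up pages: page_a mentions the term in its title,
# page_b mentions it twice but only in body text.
page_a = ("<html><head><title>Data mining</title></head>"
          "<body><p>About mining.</p></body></html>")
page_b = ("<html><head><title>Home</title></head>"
          "<body><p>mining mining</p></body></html>")

print(structure_score(page_a, "mining"))  # 6.0
print(structure_score(page_b, "mining"))  # 2.0
```

A content-only count would rank page_b first (two occurrences against two), or at best tie the pages. With the assumed weights, the occurrence in the title lifts page_a above page_b, which is the kind of effect the query-by-structure approach aims to exploit.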

At first glance, the new Semantic Web described earlier might lead one to think that the query-by-structure approach presented here could soon become obsolete. However, there are at least two major considerations that should lead one to conclude the exact opposite. First, what is to be done with the billions of already created or soon-to-be-created Web pages containing only HTML with no semantic information whatsoever? Clearly, it would be unreasonable to expect all of these documents to be redesigned to adapt to this new Web. Second, if one thinks of these new semantic tags as simply another form of structure (i.e., semantic structure instead of presentation structure), the query-by-structure approach might, in fact, be extremely relevant to this new Semantic Web.
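The second point can be illustrated with a small sketch: a query that is restricted to a named tag does not care whether that tag describes presentation or meaning. The function find_in_tag, the tag names topic and instructor, and both sample documents are hypothetical, and the sketch assumes well-formed markup so that Python's standard XML parser can read it.

```python
import xml.etree.ElementTree as ET


def find_in_tag(markup, tag, term):
    """Return the text of every <tag> element whose text contains term.

    Assumes well-formed markup; real-world HTML would need a more
    forgiving parser.
    """
    root = ET.fromstring(markup)
    term = term.lower()
    return [
        el.text.strip()
        for el in root.iter(tag)
        if el.text and term in el.text.lower()
    ]


# Made-up documents: one marked up with presentation tags,
# one with (hypothetical) semantic tags.
presentation = ("<html><body><h1>Neural networks</h1>"
                "<p>Notes on networks.</p></body></html>")
semantic = ("<course><topic>Neural networks</topic>"
            "<instructor>A. Smith</instructor></course>")

print(find_in_tag(presentation, "h1", "neural"))      # ['Neural networks']
print(find_in_tag(semantic, "topic", "neural"))       # ['Neural networks']
print(find_in_tag(semantic, "instructor", "neural"))  # []
```

The same function serves both documents; only the tag name passed in the query changes. In that sense semantic tags simply give a structure-aware query a richer vocabulary to work with.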

Year: 2003
Pages: 194
