MODELLING BLOCKS


In this work, it was important to adopt a flexible yet powerful way to represent both background information and documents. XML (Bray et al., 1998) was therefore adopted for both. Background information is stored in an XML file that represents the index terms. The file has the structure shown in Figure 1.

start figure
<indexTerms>
        <general_category indexChildNodes="true">
                <name> diseases </name>
                <sameAs> disorders </sameAs>
        </general_category>
        <general_category indexChildNodes="true">
                <name> Varieties </name>
        </general_category>
        <disease indexChildNodes="false">
                <name>Powdery Mildew</name>
                <sameAs>  aSynonym  </sameAs>
                <sameAs> ........... </sameAs>
        </disease>
        ...
        ...
        <operation indexChildNodes="false">
                <name>  aNameOfanOperation  </name>
                <sameAs>  aSynonym  </sameAs>
        </operation>
        ...
        ...
        <pest indexChildNodes="false">
                <name>  aNameOfaPest  </name>
        ...
        ...
        </pest>
        ...
        ...
</indexTerms>
end figure

Figure 1: XML Representation of Background Knowledge

This representation, despite its simplicity, allows various phrases to be mapped to their corresponding categories, and provides a simple thesaurus through the <sameAs> tag. The indexChildNodes attribute specifies whether or not specializations of a given term should be indexed as belonging to that term, i.e. whether or not a document's hierarchy is to be utilized.
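As an illustration of how such a file could serve as a phrase-to-category map and thesaurus, the following is a minimal Python sketch, not the implementation used in this work. The function name build_term_lookup, the lower-casing of phrases, and the sample data are assumptions made for the example; only the element and attribute names come from Figure 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that follows the structure shown in Figure 1.
SAMPLE = """
<indexTerms>
    <general_category indexChildNodes="true">
        <name> diseases </name>
        <sameAs> disorders </sameAs>
    </general_category>
    <disease indexChildNodes="false">
        <name>Powdery Mildew</name>
        <sameAs> aSynonym </sameAs>
    </disease>
</indexTerms>
"""


def build_term_lookup(xml_text):
    """Map each phrase (name or synonym) to (category, canonical name, flag)."""
    lookup = {}
    root = ET.fromstring(xml_text)
    for entry in root:
        name_el = entry.find("name")
        if name_el is None or not (name_el.text or "").strip():
            continue  # skip entries without a usable name
        canonical = name_el.text.strip()
        # indexChildNodes is treated as false when the attribute is absent (assumption).
        flag = entry.get("indexChildNodes", "false").strip().lower() == "true"
        phrases = [canonical]
        phrases += [(s.text or "").strip() for s in entry.findall("sameAs")]
        for phrase in phrases:
            if phrase:
                # Lower-casing for case-insensitive matching is an assumption.
                lookup[phrase.lower()] = (entry.tag, canonical, flag)
    return lookup


lookup = build_term_lookup(SAMPLE)
print(lookup["disorders"])  # ('general_category', 'diseases', True)
```

Here the tag name of each entry acts as the category, so a synonym such as "disorders" resolves to the same category and canonical name as "diseases".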

A document will have the XML representation illustrated in Figure 2.

start figure
<doc>
        <title>  aTitle  </title>
        <section>
                <id>102328933656</id>
                <level>1</level>   <!-- the level of a section within a document hierarchy -->
                <heading>  the text heading of the section  </heading>
                <text>  a pure text representation of the contents of the section  </text>
                <html> <![CDATA[  the html text representation of this section  ]]> </html>
        </section>
        <section>
                .....
                .....
        </section>
</doc>
end figure

Figure 2: XML Representation of an Unindexed Document
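To show how the level element could be used to recover a document's hierarchy, the following is a minimal Python sketch under stated assumptions: sections are taken to appear in document order, and the parent of a section is taken to be the nearest preceding section with a smaller level. The function name read_sections, the sample ids, and the sample headings are hypothetical; only the element names come from Figure 2.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that follows the structure shown in Figure 2.
SAMPLE_DOC = """
<doc>
    <title>aTitle</title>
    <section>
        <id>1</id>
        <level>1</level>
        <heading>Diseases</heading>
        <text>Overview text</text>
        <html><![CDATA[<p>Overview text</p>]]></html>
    </section>
    <section>
        <id>2</id>
        <level>2</level>
        <heading>Powdery Mildew</heading>
        <text>Details</text>
        <html><![CDATA[<p>Details</p>]]></html>
    </section>
    <section>
        <id>3</id>
        <level>1</level>
        <heading>Varieties</heading>
        <text>More text</text>
        <html><![CDATA[<p>More text</p>]]></html>
    </section>
</doc>
"""


def read_sections(xml_text):
    """Return the sections in order, each annotated with its parent's id."""
    root = ET.fromstring(xml_text)
    sections = []
    stack = []  # currently open ancestors, as (level, id) pairs
    for sec in root.findall("section"):
        sec_id = sec.findtext("id", default="").strip()
        level = int(sec.findtext("level", default="1").strip())
        # Close every open section at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parent_id = stack[-1][1] if stack else None
        sections.append({
            "id": sec_id,
            "level": level,
            "heading": sec.findtext("heading", default="").strip(),
            "parent": parent_id,
        })
        stack.append((level, sec_id))
    return sections


sections = read_sections(SAMPLE_DOC)
for s in sections:
    print(s["id"], s["level"], s["heading"], s["parent"])
```

With a parent link available for each section, a term whose indexChildNodes attribute is true could, in principle, also be credited with the subsections nested beneath a section it indexes.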



(ed.) Intelligent Agents for Data Mining and Information Retrieval
Year: 2004
Pages: 171