POTENTIAL SELECTION CASES IN DTDS


In the real world, a web user usually tries to find information relevant to a given topic. Categorizing web databases into subject (topic) domains can alleviate the time-consuming problem of searching a large number of databases. Once the user submits a query, he or she is guided directly to the web databases that hold documents on the relevant topic. As a result, the database selection task becomes simpler and more effective.

In this section, we analyze potential database selection cases in DTDs, based on the relationships between the subject domains that the contents of the databases may cover. If all the databases have the same subject domain as the one the user query involves, relevant documents are likely to be found in these databases. Clearly, in such a DTD environment, the database selection task is drastically simplified. Unfortunately, the databases distributed on the Internet, especially those of large-scale commercial web sites, usually contain documents from various topic categories. Informally, there are four basic relationships between the topic categories of the databases: (a) identical; (b) inclusion; (c) overlap; and (d) disjoint.

The formal definitions of the different potential selection cases are as follows:

Definition 6

For a given user query q, if the contents of the documents of all the databases come from the same subject domain(s), we say that an identical selection case occurs in DTDs corresponding to the query q.
 
Definition 7

For a given user query q, if the set of subject domains that one database contains is a subset of the set of subject domains of another database, we say that an inclusion selection case occurs in DTDs corresponding to the query q.
 

For example, suppose the documents of database $S_i$ relate only to subject domains $c_1$ and $c_2$, while the documents of database $S_j$ relate to subject domains $c_1$, $c_2$ and $c_3$. Then $C_i \subset C_j$, where $C_i$ and $C_j$ denote the sets of subject domains of $S_i$ and $S_j$.

Definition 8

For a given user query q, if the intersection of the sets of subject domains of any two databases is empty, we say that a disjoint selection case occurs in DTDs corresponding to the query q. That is, $\forall S_i, S_j \in S$ ($1 \le i, j \le n$, $i \ne j$), $C_i \cap C_j = \emptyset$.
 

For example, suppose database $S_i$ contains documents of subject domains $c_1$ and $c_2$, while database $S_j$ contains documents of subject domains $c_4$, $c_5$ and $c_6$. Then $C_i \cap C_j = \emptyset$.

Definition 9

For a given user query q, if the set of subject domains of database $S_i$ satisfies the following conditions: $\forall S_j \in S$ ($1 \le j \le n$, $i \ne j$), (1) $C_i \cap C_j \ne \emptyset$, (2) $C_i \ne C_j$, and (3) $C_i \not\subset C_j$ and $C_j \not\subset C_i$, we say that an overlap selection case occurs in DTDs corresponding to the query q.
 

For example, suppose database $S_i$ contains documents of subject domains $c_1$ and $c_2$, while database $S_j$ contains documents of subject domains $c_2$, $c_5$ and $c_6$. Then $C_i \cap C_j = \{c_2\}$.
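
The four relationships between two sets of subject domains can be checked with ordinary set operations. The following is a minimal Python sketch, not the authors' implementation: the function name `domain_relationship` and the string labels used for topic categories are illustrative assumptions, and the overlap branch assumes that neither set contains the other.

```python
def domain_relationship(c_i, c_j):
    """Classify the relationship between two sets of subject domains.

    c_i, c_j: iterables of topic-category labels (hypothetical representation).
    """
    c_i, c_j = set(c_i), set(c_j)
    if c_i == c_j:
        return "identical"
    if c_i.isdisjoint(c_j):
        return "disjoint"
    if c_i < c_j or c_j < c_i:   # one set is a proper subset of the other
        return "inclusion"
    return "overlap"             # shared domains, but neither contains the other


# The three worked examples from the text:
print(domain_relationship({"c1", "c2"}, {"c1", "c2", "c3"}))  # inclusion
print(domain_relationship({"c1", "c2"}, {"c4", "c5", "c6"}))  # disjoint
print(domain_relationship({"c1", "c2"}, {"c2", "c5", "c6"}))  # overlap
```

The identical test comes first so that equal sets are never reported as inclusion; the remaining branches are mutually exclusive.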

Definition 10

For a given user query q, $\forall S_i, S_j \in S$ ($1 \le i, j \le n$, $i \ne j$), let $c_k \in C_i \cap C_j$ ($1 \le k \le p$), and let $D_{ik}$ and $D_{jk}$ be the subsets of documents corresponding to topic category $c_k$ in these two databases, respectively. If they satisfy the following conditions:

  1. the numbers of documents in $D_{ik}$ and $D_{jk}$ are equal, and

  2. all these documents are the same,

then we define $D_{ik} = D_{jk}$. Otherwise, $D_{ik} \ne D_{jk}$.
 
Definition 11

For a given user query q, $\forall S_i, S_j \in S$ ($1 \le i, j \le n$, $i \ne j$), if the proposition "$c_k \in C_i \cap C_j$ ($1 \le k \le p$), $D_{ik} = D_{jk} \Rightarrow \mathrm{Simi}_{L_i}(D_{ik}, q) = \mathrm{Simi}_{L_j}(D_{jk}, q)$" is true, we say that a non-conflict selection case occurs in DTDs corresponding to the query q. Otherwise, the selection case is a conflict selection case. Here $\mathrm{Simi}_{L_i}(S_i, q)$ ($1 \le i \le n$) is the local similarity function of the $i$th database with respect to the query q.
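
The conditions of Definitions 10 and 11 can be illustrated for a single pair of databases. The sketch below is a hedged illustration only: the dictionary layout, the function name `is_non_conflict`, and the toy similarity functions are assumptions made for the example, and the exact comparison of scores stands in for whatever equality test a real system would use.

```python
def is_non_conflict(docs_i, docs_j, sim_i, sim_j, query):
    """Check the non-conflict condition for one pair of databases.

    docs_i, docs_j: dicts mapping a topic category to the set of document
                    identifiers held for that category (assumed layout).
    sim_i, sim_j:   callables (documents, query) -> number, standing in for
                    the local similarity functions of the two databases.
    """
    common = docs_i.keys() & docs_j.keys()   # categories in both C_i and C_j
    for c_k in common:
        d_ik, d_jk = docs_i[c_k], docs_j[c_k]
        # Definition 10: the subsets are equal when they have the same size
        # and the same members, which is exactly Python set equality.
        if d_ik == d_jk and sim_i(d_ik, query) != sim_j(d_jk, query):
            return False                     # same documents, different scores
    return True


# Toy data and toy similarity functions (purely illustrative).
docs_a = {"c1": {"d1", "d2"}, "c2": {"d3"}}
docs_b = {"c2": {"d3"}, "c5": {"d9"}}
count = lambda docs, q: len(docs)
double = lambda docs, q: 2 * len(docs)

print(is_non_conflict(docs_a, docs_b, count, count, "q"))   # True
print(is_non_conflict(docs_a, docs_b, count, double, "q"))  # False
```

When the two databases share no category, the loop body never runs and the function returns True vacuously; a caller would need to handle the disjoint case separately.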
 
Theorem 1

A disjoint selection case is neither a non-conflict selection case nor a conflict selection case.
 

Proof: For a disjoint selection case, $\forall S_i, S_j \in S$ ($1 \le i, j \le n$, $i \ne j$), $C_i \cap C_j = \emptyset$ and $D_i \ne D_j$. Hence, databases $S_i$ and $S_j$ are incomparable with respect to the user query q, so the case is neither a non-conflict selection case nor a conflict selection case.

By an analysis similar to the preceding one, we can show that there are seven kinds of potential selection cases in DTDs:

  1. Non-conflict identical selection cases

  2. Conflict identical selection cases

  3. Non-conflict inclusion selection cases

  4. Conflict inclusion selection cases

  5. Non-conflict overlap selection cases

  6. Conflict overlap selection cases

  7. Disjoint selection cases
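
One way to read the count of seven is that each of the three comparable relationships (identical, inclusion, overlap) can be either non-conflict or conflict, while the disjoint relationship carries no conflict status. The short sketch below enumerates the cases on that reading; the function name `selection_case` and the string labels are hypothetical and chosen only for illustration.

```python
COMPARABLE = ("identical", "inclusion", "overlap")


def selection_case(relationship, non_conflict=None):
    """Combine a domain relationship and a conflict flag into a case name."""
    if relationship == "disjoint":
        # Theorem 1: conflict status does not apply to disjoint databases.
        return "disjoint selection case"
    if relationship not in COMPARABLE:
        raise ValueError(f"unknown relationship: {relationship!r}")
    if non_conflict is None:
        raise ValueError("conflict status is required for comparable databases")
    prefix = "non-conflict" if non_conflict else "conflict"
    return f"{prefix} {relationship} selection case"


# Three comparable relationships x two conflict states, plus disjoint.
all_cases = [selection_case(r, nc) for r in COMPARABLE for nc in (True, False)]
all_cases.append(selection_case("disjoint"))

for number, name in enumerate(all_cases, start=1):
    print(number, name)
```

The loop prints the cases in the same order as the list above.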

In summary, given a number of databases $S$, we can first identify which kind of selection case exists in a DTD environment, based on the relationships among their subject domains.




(ed.) Intelligent Agents for Data Mining and Information Retrieval
Year: 2004
Pages: 171
