The architecture of the system rests largely on the design of its agents. It is therefore important to establish what types of agents are needed before attempting to solve the problem. Figure 1 shows the relationships among the modular agents.
Each mobile user can activate the Main Agent of the OntoQuery system. Three major application agents, namely the Synonyms Editor Agent, the Ontology Editor Agent, and the Query Formation Agent, are interfaced to the Main Agent. The Synonyms Editor Agent interacts with the Synonyms Agent to edit the semantics of the synonyms. The Ontology Editor Agent interfaces with the Product Ontology Agent to let the user edit his own product ontology.
At the same time, it saves new ontology terms with the help of the Synonyms Agent so that their synonyms can be prepared for editing. The Query Formation Agent uses the Product Ontology Agent to help the user form queries. It also calls the Information Retrieval Agent, which interfaces directly with the Normal Search Agent and the Genetic Algorithm Agent. The Genetic Algorithm Agent employs the Restructuring Agent to restructure the query population according to the feedback and requirements received from the Feedback Agent; the Restructuring Agent in turn performs query restructuring with the help of the Synonyms Agent. The Feedback Agent also links to the Normal Search Agent, which submits queries to retrieve information wirelessly from databases on the Internet through the Internet service provider. In addition, Search Agents residing at the host or the Internet service provider provide the information retrieval service with their connected databases. Many mobile users connect to the Internet via the OntoQuery architecture, while the Internet connects to many databases. The Normal Search Agent then displays the retrieval results via the Results Agent.
Query formation itself is a rather straightforward application; as in present industrial query systems, a query consists of multiple requirements. The challenges lie in the way queries are created and the format in which they are sent to the Search Agents. Here, query formation is done with the aid of the tree ontology: following a tree path forms one requirement of a query, which makes a query easy to construct, and multiple requirements together form a complete query. An illustration of the query formation process is shown in Figure 2. As can be seen, using the ontology reduces query formation to a simple three-step procedure and saves the user several steps by forming the query from the selected ontology path. Thus, it can be claimed that forming queries using the ontology is more efficient than using keywords.
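The path-to-requirement idea can be sketched in code. The function names and the tuple-based API below are illustrative assumptions, not the paper's actual implementation; only the bracketed requirement notation (e.g. <Television><Price><<><2000>) is taken from the text.

```python
# Sketch: forming a query requirement by walking a selected ontology tree path.
# The function names and query string format are illustrative assumptions.

def form_requirement(path, operator, value):
    """Turn a selected ontology path plus a constraint into one requirement,
    e.g. ["Television", "Price"], "<", 2000 -> "<Television><Price><<><2000>"."""
    return "".join(f"<{p}>" for p in path) + f"<{operator}><{value}>"

def form_query(requirements, connective="AND"):
    """Join several requirements into one complete query."""
    return f" {connective} ".join(requirements)

req = form_requirement(["Television", "Price"], "<", 2000)
query = form_query([req, form_requirement(["Television", "Brand"], "=", "Sony")])
```

Following the tree path fills in the requirement's context automatically, which is where the claimed saving over plain keyword entry comes from.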
The design of the parallel combination is rather straightforward. An ontology does not cover everything, so besides the ontology the user clicks on when forming a query, some fields should be present for the user to fill in. When these fields are filled in, they replace the use of the ontology either partially or completely.
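A minimal sketch of this parallel combination, assuming a simple dictionary representation in which a filled-in free-text field overrides the corresponding ontology selection (the field and term names are hypothetical):

```python
# Sketch of the parallel combination: free-text fields replace ontology
# selections partially or completely. Names are illustrative assumptions.

def combine_parallel(ontology_terms, field_values):
    """Filled-in field values override the corresponding ontology picks."""
    merged = dict(ontology_terms)
    for key, value in field_values.items():
        if value:  # a filled-in field takes precedence over the clicked term
            merged[key] = value
    return merged

picks = {"category": "Television", "brand": "Sony"}
fields = {"brand": "Samsung", "model": "QLED-55"}
merged = combine_parallel(picks, fields)
```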
For the serial combination, keywords are used to look for ontology terms in the ontology. This is necessary because when the ontology is large, searching for a term by manual clicking becomes difficult. Thus, a field allows the user to highlight terms in the ontology itself. The process is similar to searching for words in a Microsoft Word document, except that the search now traverses an ontology tree structure, as shown in Figure 3. From this illustration, it can be seen that using keywords to search for ontology terms creates an efficient environment and context for the user, and it increases the chances of getting results from an otherwise tedious search of ontology terms, especially as the product ontology grows.
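The keyword-over-tree search can be sketched as a depth-first traversal. The dict-of-dicts tree and the sample terms below are assumptions for illustration, not the paper's data model.

```python
# Sketch: searching ontology terms by keyword, traversing the tree the way
# one searches text in a document. The dict-of-dicts tree is an assumed
# representation; matching paths are yielded for highlighting.

def find_term(tree, keyword, path=()):
    """Depth-first traversal; yields the path to every term containing keyword."""
    for term, children in tree.items():
        current = path + (term,)
        if keyword.lower() in term.lower():
            yield current
        yield from find_term(children, keyword, current)

ontology = {
    "Electronics": {
        "Television": {"Price": {}, "Brand": {}},
        "Telephone": {"Price": {}},
    }
}
matches = list(find_term(ontology, "tele"))
```

Yielding full paths rather than bare terms matters here: the path is exactly what the query formation step consumes.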
Using the query formed by the query formation application, an application searches the databases to retrieve information. Intuitively, this application first does a normal search before allowing the user to proceed with a genetic algorithm search, because a genetic algorithm search takes much longer than a normal search owing to its expensive iterations. The retrieval results are presented to the user; if he is not satisfied, he can then choose to proceed with a genetic algorithm search.
If the user requests a genetic algorithm search, the system asks for some input from the user to perform the genetic algorithm computation, and then creates a population of queries from the original query. Figure 4 shows a flowchart of the sequence of events. The genetic algorithm mutates the queries according to the synonyms of the terms in the ontology.
The main concern in using the genetic algorithm is the design of the fitness function. In this application, three major elements define the fitness function: the fitness of the number of documents retrieved (f_d), the fitness of the average quality of the query results (f_q), and the overall correlation for the query (f_r). The fitness of each chromosome is calculated as follows:
f'_i = f_i / Σ_j f_j,

where f_i combines f_d, f_q, and f_r for chromosome i and the sum runs over the whole population. The normalization ensures that the fitness values form a population distribution function.
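A sketch of this normalization step, assuming (since the text does not specify the combination rule) that the three elements are multiplied per chromosome before normalizing over the population:

```python
# Sketch: combining the three fitness elements and normalizing over the
# population so the values form a distribution (sum to 1). The product
# combination of f_d, f_q, f_r is an assumption; the paper only states
# that the three elements define the fitness and that it is normalized.

def population_fitness(elements):
    """elements: list of (f_d, f_q, f_r) tuples, one per chromosome."""
    raw = [fd * fq * fr for fd, fq, fr in elements]
    total = sum(raw)
    if total == 0:
        return [0.0] * len(raw)
    return [f / total for f in raw]  # normalized: a distribution over queries

probs = population_fitness([(0.8, 0.5, 0.9), (0.4, 0.5, 0.5)])
```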
The calculation of f_d is less straightforward. Since the drawback is retrieving either too few or too many documents, the user should be able to specify the ideal number of documents (i); if the user does not specify i, the default value is 20. Using i as the mean, two band-pass-filter-like functions, namely the triangular and Gaussian functions, are used to create a flexible mapping from the number of documents retrieved (d) to f_d. The triangular function gives a constant drop in "gain" (decrease in fitness) away from the "center frequency" (the mean). It suits a user who wants to give an equal demerit for every document away from his expected or ideal number; by specifying the gradient of the right side of the triangle, the user determines how many demerits to give for every document that exceeds the expected number. When this gradient is 0, the triangular function behaves like a "high-pass filter". The Gaussian function is a more robust, higher-order "band-pass" whose "bandwidth" is specified by the standard deviation; it is useful when the user wants to give heavy demerits to queries that do not fall near the expected or ideal number.
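The two mappings above can be sketched directly; the parameter names and default slopes are illustrative assumptions, with only the default ideal count of 20 taken from the text.

```python
import math

# Sketch of the two band-pass-like mappings from document count d to f_d,
# centred on the user's ideal count i (default 20, per the text). The
# slope and sigma defaults are illustrative assumptions.

def triangular_fd(d, i=20, left_slope=0.05, right_slope=0.05):
    """Constant drop in fitness per document away from the ideal i.
    A right_slope of 0 gives the 'high-pass filter' shape: no penalty
    for retrieving more than i documents."""
    slope = left_slope if d < i else right_slope
    return max(0.0, 1.0 - slope * abs(d - i))

def gaussian_fd(d, i=20, sigma=5.0):
    """Gaussian 'band-pass': sigma sets the bandwidth, so counts far from
    the ideal are penalised much more heavily than nearby ones."""
    return math.exp(-((d - i) ** 2) / (2 * sigma ** 2))
```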
Only requirements specified by numerical constraints contribute to f_q. For example, documents retrieved according to price < 2000 have numerical results under the price column, while documents retrieved according to brand = Sony have no numerical results under the brand column. The numerical values are summed, averaged, and then normalized. There is one further consideration: the "<" and ">" signs in the query indicate the direction in which quality is favored. For example, for a "<" sign, it is logical to give higher fitness to chromosomes with smaller values in the numerical results; here, a reciprocal function (1/x) is applied before normalization takes place.
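A minimal sketch of this f_q computation, assuming each chromosome's retrieval has already been reduced to an average numerical value and normalizing by the maximum score (the normalization scheme is an assumption):

```python
# Sketch of f_q: the retrieved numerical column values are averaged per
# chromosome; for a "<" constraint smaller values are favoured, so a
# reciprocal is applied before normalisation. Normalising by the maximum
# score is an illustrative assumption.

def quality_fitness(avg_values, operator):
    """avg_values: average numerical result per chromosome's retrieval."""
    if operator == "<":           # favour smaller values via reciprocal 1/x
        scores = [1.0 / v for v in avg_values]
    else:                         # ">" favours larger values directly
        scores = list(avg_values)
    top = max(scores)
    return [s / top for s in scores]  # normalise into [0, 1]

fq = quality_fitness([1000.0, 2000.0], "<")
```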
The next interesting portion that contributes to the fitness function is the correlation of the synonyms with the ontology terms (f_r). A value from 0 to 1 is assigned to each relation between an ontology term and its synonym. For a requirement such as <Television><Price><<><2000>, f_r is the product of all the discrete correlations. The user should also be able to edit the correlation values to his preference. When a query contains many requirements linked by "OR" or "AND" terms, multiplying the correlation values together is no longer rational: when too many terms differ, f_r becomes very small. To alleviate this problem, maximum and minimum approaches are more suitable than the product approach.
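The two combination rules can be sketched as follows; the specific correlation values are made up for illustration.

```python
# Sketch of f_r: each ontology-term/synonym substitution carries a
# correlation in [0, 1]. Within one requirement the correlations are
# multiplied; across many AND/OR-linked requirements a minimum (or
# maximum) avoids the product's shrink-toward-zero problem.

def requirement_correlation(correlations):
    """Product of the discrete correlations within one requirement."""
    result = 1.0
    for c in correlations:
        result *= c
    return result

def query_correlation(per_requirement, mode="min"):
    """Combine correlations across requirements without the vanishing product."""
    return min(per_requirement) if mode == "min" else max(per_requirement)

# e.g. <Television><Price><<><2000> with "TV" (correlation 0.9) swapped in:
fr_single = requirement_correlation([0.9, 1.0, 1.0, 1.0])
fr_query = query_correlation([0.9, 0.72, 0.81])
```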
The concept of mutation here is to replace some terms in a query with their synonyms. The genes are the terms included in each query; they are mutated randomly according to their synonyms so that new populations are formed. Crossover simply interchanges genes between two different chromosomes: a one-point crossover is performed, also at a random position.
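These two operators can be sketched as below; the synonym table, mutation rate, and function names are illustrative assumptions.

```python
import random

# Sketch of the GA operators: query terms are mutated by swapping in
# synonyms, and one-point crossover interchanges the genes (terms) of two
# chromosomes. The synonym table and rate are illustrative assumptions.

SYNONYMS = {"Television": ["TV", "Telly"], "Price": ["Cost"]}

def mutate(terms, rate=0.5, rng=random):
    """Randomly replace terms with one of their synonyms."""
    return [rng.choice(SYNONYMS[t]) if t in SYNONYMS and rng.random() < rate
            else t
            for t in terms]

def one_point_crossover(a, b, rng=random):
    """Interchange the tails of two chromosomes at a random point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

mutated = mutate(["Television", "Price"], rate=1.0)
child1, child2 = one_point_crossover(["Television", "Price"], ["TV", "Cost"])
```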
The survivors are selected according to their overall fitness in the roulette-wheel selection manner. Before this is done, however, the system prompts for feedback from the user. The feedback shows the user a quality metric for each query, from which the user may choose to kill queries that do not meet his requirements. Figure 5 shows a screenshot of the feedback presented to the user.
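Roulette-wheel selection can be sketched as follows; killed queries are modelled here simply as having zero fitness, which is an assumption about how the feedback step feeds into selection.

```python
import random

# Sketch of roulette-wheel survivor selection: each chromosome's chance of
# being picked is proportional to its fitness. Queries the user killed in
# the feedback step are modelled as having zero fitness (an assumption).

def roulette_select(population, fitness, n, rng=random):
    """Select n survivors with probability proportional to fitness."""
    total = sum(fitness)
    survivors = []
    for _ in range(n):
        spin, cumulative = rng.uniform(0, total), 0.0
        for individual, f in zip(population, fitness):
            cumulative += f
            if spin <= cumulative:
                survivors.append(individual)
                break
    return survivors

# "q2" was killed by the user, so its fitness is zero and it cannot survive.
picked = roulette_select(["q1", "q2", "q3"], [0.7, 0.0, 0.3], n=2)
```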
The query results selected through user feedback have their fitness re-evaluated and then go through survivor selection. In addition, if the user is satisfied with the results, he can end the genetic algorithm by clicking the "Stop" button and look at the retrieved results immediately. The final result shows the product items retrieved by the queries that evolved each round and survived the feedback selection.
There is a trade-off between accurate feedback and computational time. Because the feedback displays fitness results to the user, it gains accuracy but loses efficiency in terms of computational time. When computational time matters more than accuracy, it may be better for the system to display the mutated queries before evolution instead of displaying the fitness results after evolution; the feedback is thus moved earlier, to the stage where the queries have just been mutated. The two types of feedback can also be combined: early feedback kills some queries first, and the survivors then go through the second feedback.
Besides replacing query terms with synonyms, logical operators and numerical constraints can be restructured. This is necessary when the system cannot retrieve any document even with synonyms. However, the system must still inform the user and ask his permission before carrying out this restructuring.
The system mutates the "AND" operators that link requirements into "OR" operators. Logically, no documents will be retrieved if two far-fetched requirements are submitted together as a query; mutating a logical "AND" into an "OR" relaxes the query constraints without greatly affecting the originality of the query. In addition, the system can mutate numerical constraints, applying step-by-step relaxation to the numerical constraints in a query until some documents are found. An illustration is shown in Figure 6.
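The AND-to-OR relaxation can be sketched on the bracketed query notation; replacing only the first connective per restructuring step is an assumption for illustration.

```python
# Sketch of logical-operator restructuring: relaxing a query by mutating an
# "AND" connective to "OR" when no documents are retrieved. Relaxing one
# connective at a time is an illustrative assumption.

def relax_connective(query):
    """Replace the first AND with OR to loosen the query constraints."""
    return query.replace(" AND ", " OR ", 1)

strict = "<Television><Price><<><2000> AND <Television><Brand><=><Sony>"
relaxed = relax_connective(strict)
```

A document matching either requirement now satisfies the relaxed query, while both bracketed requirements survive verbatim, which is why the query's originality is largely preserved.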
This illustration uses steps of only 5%. As can be seen, because of the compounding effect, one item is found at the fifth iteration of the process. This method is still effective for query restructuring because it is able to retrieve something that may be useful to the user.
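The compounding relaxation can be checked in code: five 5% steps loosen a "< 2000" bound to 2000 × 1.05^5 ≈ 2553. The retrieval function and catalogue below are stand-ins for illustration.

```python
# Sketch of step-wise numerical relaxation at 5% per iteration. The bound
# compounds each step, so after five iterations "< 2000" has loosened to
# roughly 2553. The retrieve function and catalogue are stand-ins.

def relax_until_found(bound, retrieve, step=0.05, max_iter=10):
    """Loosen an upper bound by `step` per iteration until documents appear."""
    for iteration in range(1, max_iter + 1):
        bound *= 1 + step          # compounding relaxation
        docs = retrieve(bound)
        if docs:
            return iteration, bound, docs
    return max_iter, bound, []

# Stand-in catalogue: one item priced just above the original bound.
items = [2500.0]
iteration, final_bound, found = relax_until_found(
    2000.0, lambda b: [p for p in items if p < b])
```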
The screenshots used for the illustrations come from an implementation in Java on a desktop computer. One major consideration is that mobile devices tend to have much smaller screens, so realizing the full OntoQuery architecture requires scaling down the display size. Another possible solution is to use scrollbars to view the screen, as shown in Figure 7.