Evaluation of the Genetic Algorithm

Effectiveness of the genetic algorithm

The effectiveness of the genetic algorithm (GA) is judged mainly by how much it amplifies query effectiveness: its evolutionary search lets a query retrieve more results. The system was tested with a product list database, and effectiveness was measured by running a series of queries with and without the genetic algorithm. For example, the query "<Product Ontology><Grocery><Price><<> <3>" retrieved only 9 items with a normal search but 98 items with the genetic algorithm. Table 1 shows the results obtained for other queries.

Table 1: Results showing effectiveness of GA

Query Formed | Without GA | With GA
<Product Ontology><Drinks><Price><<><2> | |
<Product Ontology><Diary><Price><<><3> | |
<Product Ontology><Candy><Price><<><3> | |
<Product Ontology><Confectionery><Price><<><3> | |
<Product Ontology><Confectionery><Supplier> <Contains><Ho> | |

Comparing the results in Table 1 shows that the genetic algorithm retrieves more items than a normal search.

Effect of the fitness function

The fitness function in a genetic algorithm determines how well it can optimize a query. Various fitness functions were tested in the OntoQuery system to improve the power of the genetic algorithm. Using triangle or Gaussian functions to evaluate the fitness for the number of documents retrieved is one way to counter the "too many or too few retrieved documents" dilemma of typical search engines. The fitness for quality assesses queries based on what they can retrieve, and it provides a counterweight that prevents the fitness for the number of documents retrieved from dominating the fitness function.
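The triangle and Gaussian shapes mentioned above can be sketched as follows. This is a minimal illustration, not the OntoQuery implementation: the function names and the parameters `target`, `half_width` and `sigma` are assumptions introduced here. Both functions score a query highest when it retrieves close to a desired number of documents and penalize retrieving too many or too few.

```python
import math


def triangle_fitness(n_docs: int, target: float, half_width: float) -> float:
    """Score 1.0 at `target`, falling linearly to 0.0 at target +/- half_width.

    `target` and `half_width` are hypothetical tuning parameters.
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    return max(0.0, 1.0 - abs(n_docs - target) / half_width)


def gaussian_fitness(n_docs: int, target: float, sigma: float) -> float:
    """Score 1.0 at `target`, decaying smoothly (never reaching 0) on both sides.

    `sigma` is a hypothetical spread parameter.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((n_docs - target) ** 2) / (2.0 * sigma ** 2))
```

The triangle function gives a hard cut-off beyond which a query scores zero, whereas the Gaussian still ranks queries that are far from the target, which may matter when every candidate in a generation misses the desired range.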

Including correlation in the fitness function is thought to prevent the original query from mutating into irrelevant queries. OntoQuery provides an option to include or exclude the correlation fitness, which makes it possible to test this claim. Figure 8 shows the resulting trends. When correlation fitness is used, the trendlines converge, which indicates that the mutated queries remain relevant to the original query. When correlation fitness is not included, the trend for the mutated queries diverges or decreases. This supports the claim that the use of correlation can prevent the original query from mutating into irrelevant queries.

Figure 8: Trends for the correlations between mutated queries and original query
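The option to include or exclude the correlation fitness could look roughly like the sketch below. The text does not define how correlation is computed or how the fitness terms are combined, so both choices here are assumptions: `term_overlap` uses Jaccard overlap of query terms as a stand-in for the correlation measure, and `combined_fitness` uses a normalized weighted sum with hypothetical weights.

```python
def term_overlap(original: set, mutated: set) -> float:
    """Jaccard overlap of two term sets (an assumed stand-in for correlation)."""
    if not original and not mutated:
        return 1.0
    return len(original & mutated) / len(original | mutated)


def combined_fitness(count_fit, quality_fit, correlation=None,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted average of the fitness terms (the combination rule is assumed).

    Passing correlation=None leaves the correlation term out entirely,
    mirroring the include/exclude option described in the text.
    """
    w_count, w_quality, w_corr = weights
    total = w_count * count_fit + w_quality * quality_fit
    norm = w_count + w_quality
    if correlation is not None:
        total += w_corr * correlation
        norm += w_corr
    return total / norm
```

With the correlation term included, a mutated query that shares no terms with the original is scored lower than one that stays close to it, even if both retrieve a similar number of documents.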

Efficiency of the genetic algorithm

Although the genetic algorithm provides a more flexible and effective platform for retrieving information, it trades off efficiency because of its expensive iterations. Thus, the only study that can be made here concerns its improvement over relevance feedback. In relevance feedback, query expansion is achieved by modifying a query. The genetic algorithm extends the relevance feedback technique with an additional rule: the survival of the fittest.
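The "survival of the fittest" rule can be illustrated with a short sketch. It is a generic selection loop under stated assumptions, not the system's actual procedure: `mutate` and `fitness` are hypothetical callables supplied by the caller, and keeping parents in the candidate pool alongside their offspring is a design choice made here for simplicity.

```python
from typing import Callable, List, TypeVar

Q = TypeVar("Q")


def select_fittest(candidates: List[Q], fitness: Callable[[Q], float],
                   keep: int) -> List[Q]:
    """Keep the `keep` highest-scoring candidates (survival of the fittest)."""
    return sorted(candidates, key=fitness, reverse=True)[:keep]


def evolve(query: Q, mutate: Callable[[Q], Q], fitness: Callable[[Q], float],
           population_size: int, generations: int) -> List[Q]:
    """Repeatedly modify queries, then let only the fittest survive."""
    population = [query]
    for _generation in range(generations):
        # Query modification step, as in relevance feedback.
        offspring = [mutate(parent)
                     for parent in population
                     for _copy in range(population_size)]
        # Additional rule: only the fittest candidates reach the next round.
        population = select_fittest(population + offspring, fitness,
                                    population_size)
    return population
```

Each generation evaluates the fitness of every candidate, which is where the extra cost relative to a single relevance feedback step comes from.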

In this research, the efficiency of the system is measured as follows:

  Efficiency = E / t ∝ D / I

where:

  • E denotes the effectiveness of the system.

  • t denotes the time taken for the system.

  • D denotes the number of relevant documents retrieved.

  • I denotes the number of iterations.

Efficiency is formulated in this way because the number of documents retrieved is assumed to be linearly proportional to the effectiveness of the system, and the number of iterations is directly related to the time taken to retrieve the results.
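Under the two proportionality assumptions stated above, efficiency can be estimated as the number of relevant documents retrieved per iteration. The sketch below assumes that reading (efficiency taken as D / I); the function names are hypothetical, and the averaging helper simply mirrors the practice of averaging repeated samples.

```python
from typing import Iterable, Tuple


def efficiency(relevant_docs: int, iterations: int) -> float:
    """Relevant documents retrieved per iteration (assumed reading: D / I)."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return relevant_docs / iterations


def mean_efficiency(samples: Iterable[Tuple[int, int]]) -> float:
    """Average efficiency over repeated runs given as (D, I) pairs."""
    values = [efficiency(d, i) for d, i in samples]
    if not values:
        raise ValueError("at least one sample is required")
    return sum(values) / len(values)
```

For instance, `efficiency(30, 6)` evaluates to 5.0; these numbers are purely illustrative and are not measurements from the experiments.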

Effect of the population size

This test compared the results obtained by varying the population size from 1 to 6. Several queries were used to find the optimal population size by comparing the efficiency obtained at each population size. The number of generations was set to 3. Figure 9 shows the results obtained by averaging three samples.

Figure 9: Graph of efficiency versus population size

Figure 9 shows that, for all the queries used in this test, the efficiency initially increases with the population size but eventually decreases. Every query therefore has an optimal population size, but these optimal values differ from query to query, and the peak efficiencies also differ. Further study of these findings showed that the optimal population size depends largely on the uniformity of the database. For example, in the category "Confectionery" or "Candy", the number of items contained under the category of each synonym in the database is more uniform than in the category "Drinks" or "Grocery". In addition, the optimal efficiency value is higher when the database is bigger. This is reasonable, as more results would be returned.