9.7 The Search Service


The search service uses the same collected object pattern as the crawler/indexer. Our two classes this time are the QueryBean, which is the main entry point into the search service, and the HitBean, a representation of a single result from the result set. In order to perform a search, we need to know the location of the index to search, the search query itself, and which field of the indexed documents to search:

    private String query;
    private String index;
    private String field;

We also need an extensible collection to store our search results:

private List results = new ArrayList( );

We must provide a constructor for the class, which will take three values:

    public QueryBean(String index, String query, String field) {
        this.field = field;
        this.index = index;
        this.query = query;
    }

The field variable contains the name of the field of an indexable document we want to search. We want this to be configurable so future versions might allow searching on any field in the document; for our first version, the only important field is "contents". We provide an overload of the constructor that only takes index and query and uses "contents" as the default for field:

    public QueryBean(String index, String query) {
        this(index, query, "contents");
    }
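The constructor chaining can be tried in isolation with a minimal sketch that has no Lucene dependency. The class name QueryBeanSketch, the DEFAULT_FIELD constant, and the getters are assumptions made for illustration; only the "contents" default comes from the text.

```java
// Hypothetical, Lucene-free sketch of the two constructors described above.
class QueryBeanSketch {
    // The default value comes from the text; the constant name is an assumption.
    static final String DEFAULT_FIELD = "contents";

    private final String index;
    private final String query;
    private final String field;

    // Full constructor: the caller chooses which document field to search.
    QueryBeanSketch(String index, String query, String field) {
        this.index = index;
        this.query = query;
        this.field = field;
    }

    // Convenience overload: delegates to the full constructor with the default field.
    QueryBeanSketch(String index, String query) {
        this(index, query, DEFAULT_FIELD);
    }

    String getIndex() { return index; }
    String getQuery() { return query; }
    String getField() { return field; }
}
```

Because the two-argument form delegates to the three-argument form, any validation later added to the full constructor applies to both.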

The search feature itself is fairly straightforward:

    public void execute( ) throws IOException, ParseException {
        results.clear( );
        if (query == null) return;
        if (field == null) throw new IllegalArgumentException("field cannot be null");
        if (index == null) throw new IllegalArgumentException("index cannot be null");
        IndexSearcher indexSearcher = new IndexSearcher(index);
        try {
            Analyzer analyzer = new StandardAnalyzer( );
            Query q = QueryParser.parse(query, field, analyzer);
            Hits hits = indexSearcher.search(q);
            for (int n = 0; n < hits.length( ); n++) {
                // Hits arrive in descending score order, so stop at the first weak one.
                if (hits.score(n) < THRESHOLD_SCORE) {
                    return;
                }
                Document d = hits.doc(n);
                String title = safeGetFieldString(d, "title");
                results.add(new HitBean(d.getField("url").stringValue( ),
                                        title, hits.score(n)));
            }
        } finally {
            // Always release the index handle, even on an early return or an exception.
            indexSearcher.close( );
        }
    }

First, we make sure our results collection is empty and validate our arguments: a null query simply leaves the results empty, while a null field or index is rejected with an IllegalArgumentException. Then we create a new instance of Lucene's IndexSearcher, pointing it to the current version of the search index. Next, we ask Lucene's QueryParser to build a Query from our search term(s), the field we are searching, and a new instance of Lucene's StandardAnalyzer, and we hand that Query to the IndexSearcher. The result of the IndexSearcher's search method is a Lucene Hits object, a collection of matching documents sorted in descending order by score; because of that ordering, we can stop as soon as a score falls below THRESHOLD_SCORE. We grab the values we need from each hit in order to create instances of our own HitBean class. Notice we're using the helper method safeGetFieldString to retrieve values from the hit's document:

    private String safeGetFieldString(Document d, String field) {
        Field f = d.getField(field);
        return (f == null) ? "" : f.stringValue( );
    }

This prevents us from adding a null instead of the empty string as our field value. Last, but certainly not least (it's in the finally block because it's important), we close the indexSearcher handle to the index. This step is vital when we start exposing the service via a web service: open handles to the index prevent other users from accessing it.
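The control flow of execute, an early return from inside the loop with the handle still being closed, can be checked with a small self-contained sketch. FakeSearcher is a hypothetical stand-in for the index handle and is not a Lucene class, and the value 0.5 for THRESHOLD_SCORE is an assumption, since the text does not state it.

```java
import java.util.ArrayList;
import java.util.List;

class EarlyExitSketch {
    // Assumed cutoff for illustration; the text does not give the real value.
    static final float THRESHOLD_SCORE = 0.5f;

    // Hypothetical stand-in for an index handle; this is not a Lucene class.
    static class FakeSearcher {
        private final float[] scoresDescending;
        boolean closed = false;

        FakeSearcher(float[] scoresDescending) {
            this.scoresDescending = scoresDescending;
        }

        // Returns scores already sorted from best to worst.
        float[] search() { return scoresDescending; }

        void close() { closed = true; }
    }

    // Collects scores until the first one below the threshold, then stops.
    static List<Float> collect(FakeSearcher searcher) {
        List<Float> kept = new ArrayList<>();
        try {
            for (float score : searcher.search()) {
                if (score < THRESHOLD_SCORE) {
                    return kept; // early return; the finally block still runs
                }
                kept.add(score);
            }
            return kept;
        } finally {
            searcher.close(); // runs on normal exit, early return, or exception
        }
    }
}
```

The early return is only safe because the scores are sorted in descending order: once one score falls below the cutoff, every later one does too.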

The HitBean is primarily for storing simple result data:

    final String url;
    final String title;
    final float score;
    private static NumberFormat nf;

    static {
        // Shared formatter: scores are shown with at most two decimal places.
        nf = NumberFormat.getNumberInstance( );
        nf.setMaximumFractionDigits(2);
    }

    public HitBean(String url, String title, float score) {
        this.url = url;
        this.title = title;
        this.score = score;
    }

    public String getScoreAsString( ) {
        return nf.format(getScore( ));
    }

    public String getUrl( ) {
        return url;
    }

    public String getTitle( ) {
        return title;
    }

    public float getScore( ) {
        return score;
    }

Instances of the class store a full URL to the result file, the title of that file, and a relative rank score. We provide a series of getters to retrieve those values and a single constructor to initialize them. The only interesting part is the use of the java.text.NumberFormat class to create a formatter for our result score.
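To see what the formatter does to a raw score, here is a minimal sketch. The class and method names are hypothetical, and it pins the locale to Locale.US so the decimal separator is predictable, whereas the HitBean code uses the default locale.

```java
import java.text.NumberFormat;
import java.util.Locale;

class ScoreFormatSketch {
    // Formats a score with at most two fraction digits.
    // Locale.US is an assumption made here so the output does not vary by machine.
    static String formatScore(float score) {
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.US);
        nf.setMaximumFractionDigits(2);
        return nf.format(score);
    }

    public static void main(String[] args) {
        System.out.println(formatScore(0.87654f)); // prints 0.88
        System.out.println(formatScore(1.0f));     // prints 1
    }
}
```

Only the maximum number of fraction digits is set, so trailing zeros are dropped rather than padded.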

Once we chose Lucene as our search tool, our code became very straightforward. After a user supplies a search term, we simply verify that the query will run as provided and then execute it, compiling the results into a simple series of HitBean instances.

9.7.1 Principles in Action

  • Keep it simple: simple objects representing query and results, unit tests for search results

  • Choose the right tools: Lucene, JUnit

  • Do one thing, and do it well: QueryBean focuses on search, HitBean is a simple data structure, and IndexPathBean encapsulates the configurable index property

  • Strive for transparency: shadow-copied index so search and index can run simultaneously

  • Allow for extension: none



Better, Faster, Lighter Java
ISBN: 0596006764
Year: 2003
Pages: 111
