So far, the spider has crawled the pages, and the search engine has analyzed the markup and text on each one. The next step is creating the search index, which is a specially designed database that the search engine uses to quickly find the matching pages in response to any search query.
A search engine "remembers" which words are on which pages by storing them in its search index. In its simplest form, a search index contains a record for every word, and each record holds a list of the pages that contain that word. So, when you search for the word "glaucoma" in Google, the search engine looks in its index for the record for "glaucoma" and retrieves the list of pages.
When a search engine creates its search index, it examines the unique words on each page that the spider finds, and for each word it checks to see whether a record exists in the index. If so, it adds the Web page's address (Uniform Resource Locator, or URL) to the end of the record. If no record for that word exists, a new record containing that URL is created. The actual URL would take up a lot of space in the index, so the search engine converts each URL to a unique number that it stores in the index. Figure 2-5 showed how a simple search index might look.
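The indexing steps just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any real search engine is built: the function name `build_index`, the sample URLs, and the rule that a "word" is any run of letters or digits are all invented for the example.

```python
import re


def build_index(pages):
    """Build a toy search index from a dict mapping URL -> page text."""
    url_ids = {}  # URL -> compact number stored in the index instead of the URL
    index = {}    # word -> list of page numbers (the "record" for that word)

    for url, text in pages.items():
        # Convert the URL to a unique number the first time it is seen.
        page_id = url_ids.setdefault(url, len(url_ids) + 1)

        # Examine only the unique words on the page (assumed tokenization).
        unique_words = set(re.findall(r"[a-z0-9]+", text.lower()))

        for word in unique_words:
            # Create the record if it does not exist, then append the page.
            index.setdefault(word, []).append(page_id)

    return index, url_ids


# Hypothetical pages that a spider might have crawled.
pages = {
    "http://example.com/eyes": "Glaucoma damages the optic nerve.",
    "http://example.com/vision": "Early glaucoma screening protects vision.",
    "http://example.com/diet": "Diet and exercise.",
}

index, url_ids = build_index(pages)
print(index["glaucoma"])      # [1, 2]
print(index.get("cataract"))  # None: no record exists for this word
```

Looking up a word is then a single dictionary access, which is why the search engine can answer a query without rereading the pages themselves.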
In addition, the search engine stores the metadata about each page for use in displaying the search results. So, it stores the URL, the title, and any information needed to display the snippets that highlight where the terms were found. That way, when it must display that page as a search result, it has all the information in the index to do so.
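To illustrate how stored metadata might be used at display time, here is a minimal sketch. Everything in it is assumed for the example: the `page_info` table, the choice to keep the page text alongside the URL and title, the `make_snippet` and `show_results` helpers, and the use of asterisks to mark the highlighted term.

```python
import re

# Hypothetical metadata store: page number -> what is needed to show a result.
page_info = {
    1: {
        "url": "http://example.com/eyes",
        "title": "Eye Health Basics",
        "text": "Glaucoma damages the optic nerve.",
    },
    2: {
        "url": "http://example.com/vision",
        "title": "Vision Screening",
        "text": "Early screening can catch glaucoma before vision is lost.",
    },
}

# Toy index in the same shape as before: word -> list of page numbers.
index = {"glaucoma": [1, 2]}


def make_snippet(text, term, width=30):
    """Return the text around the first occurrence of term, with it marked."""
    match = re.search(re.escape(term), text, re.IGNORECASE)
    if match is None:
        return text[: 2 * width]
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    found = text[match.start():match.end()]
    return text[start:match.start()] + "**" + found + "**" + text[match.end():end]


def show_results(term):
    """Look up term in the index and return (title, URL, snippet) tuples."""
    results = []
    for page_id in index.get(term.lower(), []):
        info = page_info[page_id]
        results.append((info["title"], info["url"], make_snippet(info["text"], term)))
    return results


for title, url, snippet in show_results("glaucoma"):
    print(title, url, snippet, sep=" | ")
```

Because the title, URL, and snippet source are all stored with the page number, the result list can be built from the index alone, with no need to fetch the live page.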