Aggregated search and indexing is one of the most important features of Microsoft Office SharePoint Server 2007. This chapter shows you how to use the various features of Search.
At the heart of the search engine is the crawler. The crawler goes out to the content source and gathers the content that needs to be placed in the index. After the index is built, users execute queries against the index to receive a result set. The crawler does only what it is instructed to do: it crawls only the content that you have told it to crawl, and its crawl actions occur within the security and timing rules that you create manually.
If you ran previous versions of SharePoint Server, you will remember that the crawler was encapsulated inside the Gatherer service. The Gatherer service is now called the crawler. The crawler works by connecting to the content source and extracting the data. However, it can connect to the content source only if the appropriate protocol handler is installed, because the protocol handler is what the crawler uses to make the connection. Once connected, the crawler opens the files (read permission is all that is required for this action) and extracts the contents of each file without copying down the entire file. The crawler can extract content only if the correct iFilter (Index Filter) is installed on the indexing server. The iFilter tells the crawler what type of document it will be reading once it arrives at the content source. For example, if you need to crawl the contents of a file server, the crawler will load the "File" protocol handler, which allows the crawler to use Server Message Block (SMB) and Remote Procedure Calls (RPC) to connect to the content source. The crawler will also load various file-type iFilters, such as Word, Excel, and text, so that, once connected, it can open the files on the file server and read their contents.
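The two prerequisites described above (a protocol handler for the connection and an iFilter for the file type) can be pictured as two lookups that must both succeed before a document is crawled. The following Python fragment is only a conceptual sketch: the registries, the `plan_crawl` function, and the example addresses are hypothetical illustrations, not part of any SharePoint API.

```python
import os
from urllib.parse import urlparse

# Hypothetical registries standing in for what is installed on the index server.
PROTOCOL_HANDLERS = {"file": "File protocol handler", "http": "HTTP protocol handler"}
IFILTERS = {".doc": "Word iFilter", ".xls": "Excel iFilter", ".txt": "Text iFilter"}


def plan_crawl(address):
    """Return the (protocol handler, iFilter) pair needed to crawl an address.

    Raises LookupError if either component is missing, mirroring the rule that
    the crawler cannot connect without a protocol handler and cannot extract
    content without an iFilter.
    """
    parsed = urlparse(address)
    scheme = parsed.scheme.lower()
    if scheme not in PROTOCOL_HANDLERS:
        raise LookupError(f"no protocol handler installed for '{scheme}'")

    extension = os.path.splitext(parsed.path)[1].lower()
    if extension not in IFILTERS:
        raise LookupError(f"no iFilter installed for '{extension}'")

    return PROTOCOL_HANDLERS[scheme], IFILTERS[extension]
```

In this sketch, `plan_crawl("file://fileserver/share/report.doc")` succeeds because both lookups find an entry, whereas an address with an unregistered scheme or file extension fails before any content would be read.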
After the crawler has connected to the content source and has cracked open the documents, it streams the content from the content source to the index server. When the content arrives at the index server, the Indexer breaks the stream into 64-KB chunks, performs word breaking and stemming on the words, and removes the noise words (words that you have specified should not appear in the index). It then sends the content itself to the index (the Content Store) and the metadata to the SQL search databases for the Shared Services Provider (the Property Store).
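The Indexer steps just described (chunking, word breaking, stemming, and noise-word removal) can be sketched as a small pipeline. This Python fragment is a simplified illustration under stated assumptions: the noise-word list, the regular-expression word breaker, and the suffix-stripping stemmer are toy stand-ins chosen for the example, and they do not reproduce the word breakers or stemmers that ship with SharePoint.

```python
import re

CHUNK_SIZE = 64 * 1024  # 64 KB, the chunk size named in the text
NOISE_WORDS = {"the", "a", "an", "and", "of", "to"}  # hypothetical noise-word list


def chunk_stream(data, size=CHUNK_SIZE):
    """Split a byte stream into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def break_words(text):
    """Toy word breaker: lowercase runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Toy stemmer: strip one common English suffix if enough of the word remains."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(data):
    """Chunk the stream, break and stem the words, then drop noise words."""
    terms = []
    for chunk in chunk_stream(data):
        # A real indexer would handle words that straddle a chunk boundary;
        # this sketch simply processes each chunk on its own.
        text = chunk.decode("utf-8", errors="ignore")
        for word in break_words(text):
            term = stem(word)
            if term not in NOISE_WORDS:
                terms.append(term)
    return terms
```

For example, `index_terms(b"The crawler crawled the documents")` yields `["crawler", "crawl", "document"]`: the noise word "the" is dropped and the remaining words are reduced to their stems. The sketch stops at producing terms; routing content to the Content Store and metadata to the Property Store is outside its scope.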