Searching in WSS and MOSS | Software Testing Fundamentals: Methods and Metrics

Search engines are among the greatest time savers. Just look at how often you use MSN Search or Google, to mention just a couple of them. On the Internet, searching is absolutely critical, since you have no idea where information is stored, and new sources may appear a minute from now. That is why you search all the time. Your internal network is not really that different. True, the volume of information is much smaller, and you know where at least some of it is stored, since you created it. Still, it does not take much activity within an organization to create so much information that the average user loses track of where things are stored. So users start looking around for the file, document, or whatever they need. After some minutes, they find it. The question then becomes: is this the latest version, or is there a newer version somewhere? And once they have what they were looking for, they most likely want to be notified if that document is updated later on. What you need is a solution that helps you:

- Find information regardless of where it is stored.
- Make sure that it is the latest version.
- Send you a notification when this information is updated.

SharePoint has solutions for the first item, but what MOSS offers differs greatly from what the WSS environment offers, as the following sections describe in more detail. The second item is covered by the built-in version management of documents and list content in both MOSS and WSS, and the third item is the alert feature, also built into both MOSS and WSS. So let's focus on the search functionality.

Searching in WSS

WSS 3.0 has a basic search feature that allows users to search for content. This is a big change from previous WSS versions, which required WSS to be configured to run on SQL Server 2000 and use its Full-Text Indexing engine. Another important change is that WSS 3.0 also searches subsites, while the previous version of WSS searched only the current site. The following list summarizes the search functionality in WSS, when combined with SQL Server:

- Finds information of any type stored in the current site or a subsite.
- Provides free-text searching in documents, files, and all list content.

SharePoint creates a special Windows service, Windows SharePoint Services Search (MSSearch.exe), on Windows Server 2003. This service must be running before WSS can use it for indexing and searching. The steps to configure WSS to use this search feature are described in the following Try It Out.

Try It Out Activate Search Indexing in WSS

Log on as an administrator.
Start SharePoint's Central Administrative tool. Switch to the Operations page, then click Services on Server.
Make sure that the correct server is selected, and then start the Windows SharePoint Services Search service if it's not already started. Fill in this information in the web form:

Important
If WSS was installed using SQL Express, the Search service may not be listed. If this is the case, select All on the View menu on the toolbar.
1. Service Account: Enter the service account and its password for the search service; be sure to include the domain name, for example, filobit\sp_service.
2. Content Access Account: Enter the default user account to be used by the search service when searching content sources. You can later configure other user accounts for specific content sources.
3. Search Database: Enter the SQL server name and the database name for the index. Use the default server and database name unless you have a good reason not to.
4. Indexing Schedule: Enter how often the index process will run. The default is every five minutes.
5. Click OK to save and close the page.
Important
When you create a new web application, make sure to select a search server.
When the process is done, all documents and lists will be indexed and ready to be searched within five minutes. Open any WSS team site, and use the search field at the top-right corner of the page. Type a text string that you know exists in one of the lists or inside a document stored in this team site.

Understanding the Search Feature in WSS

When searching is activated in WSS, there is nothing more to configure. The search engine in WSS is fast and stable; its behavior is controlled by stored procedures in SQL Server. You may find tips on how to optimize these stored procedures, but before you do that you must understand that this will violate the conditions for getting help from Microsoft's support team! It may also create problems when you install the next service pack or upgrade to the next release.

The objects indexed by Full-Text Indexing are these:

- List items: Such as individual names in a Contact list.
- Documents: Documents of these types: .doc, .xls, .ppt, .txt, and .html.
- Lists: Such as Announcements and Events.

There are also objects that will not be indexed and therefore not searchable:

- Nontext columns in lists, for example, Lookup, Currency, and Yes/No columns.
- Attachments to list items.
- Survey lists.
- Hidden lists.

Reindexing new or modified information is automatic in WSS; by default, an incremental indexing process runs every five minutes. As soon as this process is done, users can search for the new or modified content.

The search field in the top-right corner of the web page (unless it has been moved) is visible in all team sites. Enter the string you are searching for and press Enter, or click the icon to the right of the search field. Note that if you enter more than one word, the search matches any object that contains either or both of them; this is called a Boolean OR search. The search engine uses a type of search called FREETEXT, which uses a feature called stemming. For example, if you search for the word Run, it will also match Running and Ran. Stemming works on whole words, so you must enter the complete word: you cannot search for Admin and find Administrator, for example, since Admin is an abbreviation, not a complete word.

Important

Stemming only works with certain languages, such as English and German.
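To make the Boolean OR and stemming behavior concrete, here is a minimal Python sketch. It is not how the WSS stored procedures are implemented: the stem table is a hypothetical, hand-made stand-in for a real language-specific stemmer, and the function names are illustrative only.

```python
# Hypothetical stem table mapping inflected English forms to a common stem.
# A real stemmer is language-specific; this table exists only for illustration.
STEMS = {
    "run": "run", "runs": "run", "running": "run", "ran": "run",
    "admin": "admin", "administrator": "administrator",
}

def stem(word: str) -> str:
    """Return the stem of a word, or the lowercased word if it is unknown."""
    w = word.lower()
    return STEMS.get(w, w)

def freetext_or_match(query: str, document: str) -> bool:
    """True if ANY query word shares a stem with any document word (Boolean OR)."""
    doc_stems = {stem(w) for w in document.split()}
    return any(stem(q) in doc_stems for q in query.split())

print(freetext_or_match("Run", "The service was running all night"))  # True: same stem
print(freetext_or_match("Admin", "Contact the Administrator"))        # False: no prefix match
print(freetext_or_match("budget Run", "Budget for 2007"))             # True: one word is enough
```

Note how Admin fails to match Administrator: stemming maps whole words to stems, and nothing in the sketch (or in FREETEXT, as described above) performs prefix matching.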

All of these constraints and behaviors are due to the way the stored procedures are defined. If you absolutely must change this, be sure to make a backup of the original stored procedure, and make notes of what you did and why, so that later on anyone can restore or remove this customization, if necessary.

Important

You may find tips on the Internet about how to enhance the search functionality in WSS, but remember the warnings above about modifying stored procedures in SQL, since Microsoft will not support your system!

Indexing New File Types

You can have the MS Search service in WSS index more file types than it does by default. The most common request is for Adobe's PDF files. What MS Search needs to index any file type is a program that can open that file and read its text. Such a program is called an index filter, or IFilter for short. So, to index PDF files you need an IFilter for PDF. The good news is that this IFilter is free to download from Adobe's web site:

     www.adobe.com/support/downloads/detail.jsp?ftpID=2611

Note that this IFilter is regularly updated; make sure you get the latest version. After you have downloaded this program, install it on the SQL server, if you are using separate WSS and SQL servers.

Important

This is only true if you are running a pure WSS environment! If you are running a MOSS server, this IFilter must be installed on all SharePoint servers with the Index role.

After the installation, Microsoft recommends in the following Knowledge Base article that all existing PDF files be reloaded, or updated, so that they are indexed:

     http://support.microsoft.com/kb/927675/en-us

But in many cases it will actually be enough to force a full update in order to index these existing PDF files.

Searching in MOSS

This is one of the strongest features in Office SharePoint Server! MOSS has its own search and index engine, completely independent of the Full-Text Indexing service in SQL Server. You can in fact activate them both, but that wastes resources, since the MOSS search and indexing feature works in any web site, including both MOSS sites and WSS sites. The search features in MOSS can be summarized as follows:

- Search everywhere in SharePoint: any MOSS site, any team site, and any workspace site.
- Search almost any content source outside SharePoint: file servers, MS Exchange servers, Lotus Notes, and other web servers, including any public web site on the Internet.
- With MOSS Enterprise Edition, use the Business Data Catalog feature to search external databases and applications, such as Oracle, SAP, and Navision.
- Search all MS Office file types by default, plus all neutral file formats, such as TXT, HTML, and so on.
- Extend the search to any file type; all that is needed is an IFilter for each file type.
- Control which file types are indexed, even if an IFilter is installed for them.
- Search user profile properties, which are indexed; you can search for a user with a specific property.
- Set the schedule for full and incremental indexing, and force a full indexing at any time.

This indexing and search feature is activated by default for all information stored in SharePoint, both MOSS sites and WSS team sites; there is no special configuration needed to activate this. Since this feature is much more advanced than the full-text search in SQL Server, there is also a lot more configuration you can do; however, this also requires more management. You, as an administrator, must understand how this feature works in MOSS and what you can do to optimize it. This is especially true when a problem arises, such as when the search results are not as expected, or when a content source isn't indexed. The following section will tell you all you need to know for your everyday work as an administrator, and how to extend and adjust this very important feature.

Important

For an in-depth description of the Search and Indexing feature, see the Microsoft SharePoint Product and Technology 2007 Resource Kit.

The Basics

There are two MOSS services involved in this feature:

- Indexing: Responsible for crawling content sources and building index files.
- Searching: Responsible for finding all information that matches the search query by searching the index files.

This is important: All searching is performed against the index files; if they don't contain what the user is looking for, there will not be a match. So, the index files are critical to the success of the search feature of MOSS. In fact, practically all configuration and management is related to the indexing service. The search functionality can be described in its simplest form as a web page where the user defines his or her search query.

The index role can be configured to run on its own MOSS server, or run together with all the other roles, such as the Web service, Excel Services and Forms Services. It performs its indexing tasks following this general workflow:

1. SharePoint stores all configuration settings for the indexing in its database.
2. When activated, the index service looks in SharePoint's databases to see what content sources to index and what type of indexing to perform, such as a full or incremental indexing.
3. The index service starts a program called the Gatherer, which tries to open the content that should be indexed.
4. For each information type, the Gatherer needs an Index Filter, or IFilter, that knows how to read the text inside that particular type of information. For example, to read an MS Word file, an IFilter for .DOC is needed.
5. The Gatherer receives a stream of Unicode characters from the IFilter. It then uses a small program called a Word Breaker, whose job is to convert the stream of Unicode characters into words.
6. Some words are not worth storing in the index, such as "the", "a", and numbers, so the Gatherer compares each word found against a list of Noise Words. This is a text file that contains all words that will be removed from the stream of words.
7. The remaining words are stored in an index file, together with a link to the source. If a word already exists in the index, only the new source is added, so one word can point to multiple sources.
8. If the source was information stored in SharePoint, or a file in the file system, the index also stores the security settings for this source. This prevents users from getting search results that they are not allowed to open.
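The workflow above can be sketched as a toy inverted index in Python. This is a minimal sketch under stated assumptions, not the MOSS implementation: it assumes the IFilter step has already produced plain text, uses a regular expression as a stand-in Word Breaker and a small hypothetical noise-word set, and models the stored security settings as a set of allowed user names per source. All names (`build_index`, `search`, `NOISE_WORDS`) are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a noise-word file (a real one is a per-language text file).
NOISE_WORDS = {"the", "a", "an", "and", "of"}

def word_breaker(text: str) -> list[str]:
    """Split a Unicode text stream into lowercase words (simplified)."""
    return re.findall(r"\w+", text.lower())

def build_index(sources: dict[str, tuple[str, set[str]]]):
    """sources maps a source URL to (text already extracted by an IFilter, allowed users)."""
    index: dict[str, set[str]] = defaultdict(set)  # word -> URLs of sources containing it
    acl: dict[str, set[str]] = {}                   # URL -> users allowed to open it
    for url, (text, readers) in sources.items():
        acl[url] = readers                          # store the security settings
        for word in word_breaker(text):
            if word in NOISE_WORDS or word.isdigit():
                continue                            # drop noise words and numbers
            index[word].add(url)                    # one word can point to many sources
    return index, acl

def search(index, acl, word: str, user: str) -> set[str]:
    """Search the index only, returning just the sources this user may open."""
    return {url for url in index.get(word.lower(), set()) if user in acl[url]}

sources = {
    "http://srv1/docs/a.doc": ("The budget for 2007", {"anna", "bob"}),
    "http://srv1/hr/b.doc": ("Budget draft and notes", {"bob"}),
}
index, acl = build_index(sources)
print(search(index, acl, "Budget", "anna"))  # only a.doc; b.doc is trimmed for anna
```

Because `search` never looks at the sources themselves, anything missing from `index` cannot be found, which mirrors the point made earlier that all searching is performed against the index files.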

Pretty straightforward, if you think about it. But the underlying process is a bit more complex. Fortunately you do not need to dive into these details, unless you have a very good reason to. By default, MOSS will create a single index file. This index file is not stored on the SQL server, as the other information stored in SharePoint is; instead, it is stored in the file system on the server configured to run the Index role in the SharePoint farm. This index file is stored in separate folders in the following location (assuming that you have used the default installation folder):

     <Drive:>\Program Files\Microsoft Office Servers\12.0\DATA\Office Server\Applications\<Application GUID>

The Application GUID is a unique hexadecimal string that identifies a specific SSP instance. If you have more than one SSP instance created on the same server, you can check the following registry key to see exactly which SSP instance this Application GUID points to:

     HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Applications\<GUID>\CatalogNames

The property DisplayName will tell you what SSP instance this is. The number of files and folders stored in each index folder may surprise you, but indexing is a complex process and it shows here. You do not need to configure these files, since everything is managed by SharePoint's administration pages.

The Gatherer process keeps a log of all its activities. These log files are also stored in this folder structure, but the easiest way to view these log entries is to use SharePoint's administrative web pages.

Configuring Searching and Indexing

By default, SharePoint takes care of configuring the search and index feature. Still, there is a lot for an administrator to do, especially when you want to extend the information indexed, for example, by adding new content sources and new file types, or simply by forcing a full reindexing. To open the start page for all these administrative activities, start SharePoint's Central Administration tool, click the SSP instance name (e.g., SharedServices1), and then click Search settings. The next page is divided into three sections:

- Crawl Settings (see Figure 8-12): Contains the status of the index, the number of documents found, and when it was last indexed. It also contains links to most of the configuration settings related to search and indexing, such as managing content sources, metadata property mappings, and resetting all crawled content. This section is what you will work with most of the time when it comes to search and indexing activities, as you will see later in this chapter.

Figure 8-12
- Scopes (see Figure 8-13): Use this section of the Search settings page when you need to manage existing search scopes or create new ones.

Figure 8-13
- Authoritative Pages (see Figure 8-13): Use this section of the Search settings page when you need to define which web site URLs are more important than others in search results. Sites listed here will be listed before other URL sources in the search results; that is, you can control the ranking of search results. There are three levels of authoritative pages: Most, Second, and Third. It works like this: if you search for the string ABC, and two documents contain this string, one on a web page defined as Second-level and one on a Third-level authoritative page, the document on the Second-level page gets a higher ranking and is therefore listed above the other document. To define web page URLs as Most, Second, or Third level pages, click the link Specify authoritative pages, and enter the URLs for each level, as depicted in Figure 8-14.

Figure 8-14
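The ordering rule in the ABC example can be modeled with a short Python sketch. It is a simplified model of the paragraph above, not the MOSS ranking algorithm: it assumes authority level is a strict sort key and that an authoritative URL applies to every page beneath it, whereas real relevance ranking combines authority with other factors not described here. The function names and example URLs are hypothetical.

```python
from typing import Optional

# Sort priority per authoritative level; lower values are listed first (assumed ordering).
LEVEL_PRIORITY = {"Most": 0, "Second": 1, "Third": 2}
NOT_AUTHORITATIVE = len(LEVEL_PRIORITY)

def authority_level(url: str, authoritative: dict[str, str]) -> Optional[str]:
    """Level of the longest authoritative URL that is a prefix of url, or None."""
    matches = [prefix for prefix in authoritative if url.startswith(prefix)]
    if not matches:
        return None
    return authoritative[max(matches, key=len)]

def rank(results: list[str], authoritative: dict[str, str]) -> list[str]:
    """Order results Most, Second, Third, then non-authoritative; ties keep their order."""
    def priority(url: str) -> int:
        level = authority_level(url, authoritative)
        return LEVEL_PRIORITY[level] if level is not None else NOT_AUTHORITATIVE
    return sorted(results, key=priority)  # sorted() is stable, so ties keep input order

auth = {"http://srv1/policies": "Second", "http://srv1/archive": "Third"}
hits = ["http://www.example.com/abc",
        "http://srv1/archive/abc.doc",
        "http://srv1/policies/abc.doc"]
print(rank(hits, auth))  # policies (Second), then archive (Third), then example.com
```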

Checking Errors and Warnings

With this information in mind, let's work with the indexing feature. For example, say that you want to check for any errors or warnings listed in the Crawl Settings section described above. Look at the line Errors in log; if you see anything other than 0 here, you may have a problem. To see exactly what the error or warning is, click the error number, or use the Crawl Logs link. This opens the Crawl Log page, as shown in Figure 8-15.

Figure 8-15

A typical error is that the crawler process failed to connect to a site, for example, an Internet site such as http://www.microsoft.com. This may indicate a problem with Internet access. Another possible problem is a file that is locked by another process. This is usually not serious, since SharePoint will try to index that location again during the next crawl.

Forcing an Update

Still on the Search settings page, click the Content sources and crawl schedules link to open the page where you can see more details about the crawler process (see Figure 8-16). This page shows the current status and when the next full crawl and the next incremental crawl will take place. If you see None, as in Figure 8-16, no schedule has been defined yet. To get more details about a specific content source, such as Local Office SharePoint Server sites, click its name; this shows the following information:

Figure 8-16

Important

The content source Local Office SharePoint Server sites is also known as the Default Content Source. Every time you create a new site collection, its URL is added to this content source.

- Name of the content source.
- Details, such as type, status, the number of sources it crawls, the last time it was crawled, and errors.
- Start addresses, that is, the URL sources that are crawled.
- Crawl settings: should only these start addresses be crawled, or everything under them as well?
- Crawl schedules (full and incremental).
- Whether to start a full crawl now.

To force a full or incremental update, use the quick menu for the content source, and select Start Full Crawl or Start Incremental Crawl. You can also start crawls for all content sources by clicking Start all crawls on the Quick Launch bar. This menu also allows you to stop or pause an active crawl.
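The difference between the two crawl types can be sketched conceptually in Python. This is an illustrative model only, assuming each item exposes a last-modified time; it does not reflect how MOSS actually detects changes, and it ignores deletions. The function name and inputs are hypothetical.

```python
from datetime import datetime
from typing import Optional

def items_to_crawl(items: dict[str, datetime],
                   last_crawl: Optional[datetime],
                   full: bool) -> list[str]:
    """Return the item URLs a crawl would process.

    items maps each item URL to its last-modified time (hypothetical input).
    A full crawl processes everything; an incremental crawl processes only
    items modified since the previous crawl.
    """
    if full or last_crawl is None:
        # Full crawl, or nothing to compare against yet: this sketch takes everything.
        return sorted(items)
    return sorted(url for url, modified in items.items() if modified > last_crawl)

items = {"http://srv1/a.doc": datetime(2007, 1, 1, 8, 0),
         "http://srv1/b.doc": datetime(2007, 1, 1, 12, 0)}
print(items_to_crawl(items, datetime(2007, 1, 1, 10, 0), full=False))  # only b.doc
```

This is why an incremental crawl is usually much cheaper than a full crawl: the amount of work depends on how much changed, not on the total size of the content source.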

Managing the Indexing Schedules

SharePoint uses different indexing schedules for different content sources. You need to know exactly when the index is updated to understand when you can expect updated information to be searchable. These are the default schedules used by MOSS:

- SharePoint content (the content in any MOSS and WSS site):
  - Incremental: No schedule defined by default.
  - Full: No schedule defined by default.
- WSS Search (only applied when WSS alone is installed):
  - Incremental: Every 5 minutes.

Important

The WSS Search service is renamed Windows SharePoint Services Help Search when you install MOSS, since it is then used only to index the help system files.

The consequence of this default schedule is that new and updated information on any of the SharePoint sites in a MOSS installation will never be indexed, unless you set a schedule. A WSS-only installation will be indexed every 5 minutes, 24 hours per day. To set the schedule for MOSS indexing, use the Content sources and crawl schedules page, described earlier. To change the WSS Search setting, follow the steps in the Try It Out below.

Try It Out Define WSS Search Service Indexing Schedule

Open the Central Administration tool, and switch to the Operations page.
Click on Services on Server.
Click on the name Windows SharePoint Services Search.
Define the indexing schedule (by default every five minutes).
Click OK to save and close the page.
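The schedules described in this section determine how long new content waits before it becomes searchable. The following Python sketch computes the next scheduled crawl after a change; it assumes a simple fixed-interval schedule starting at a known time and ignores crawl duration, so treat the result as a lower bound. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

def next_crawl(changed_at: datetime,
               first_run: datetime,
               interval: Optional[timedelta]) -> Optional[datetime]:
    """Return the first scheduled crawl starting at or after changed_at.

    first_run is when the recurring schedule started; interval is the time
    between runs, or None when no schedule is defined (the MOSS default for
    SharePoint content described above). Crawl duration is not modeled.
    """
    if interval is None:
        return None  # no schedule: the change is never picked up automatically
    if changed_at <= first_run:
        return first_run
    # Ceiling division: number of whole intervals needed to reach changed_at.
    periods = -(-(changed_at - first_run) // interval)
    return first_run + periods * interval

start = datetime(2007, 1, 1, 0, 0)
print(next_crawl(datetime(2007, 1, 1, 0, 12), start, timedelta(minutes=5)))  # 00:15 (WSS-style)
print(next_crawl(datetime(2007, 1, 1, 0, 12), start, None))                 # None (no schedule)
```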

Controlling What Files to Index

When the indexing process is running, as described earlier in this chapter, the Gatherer process opens the files found in the content sources. But exactly which file types will it open? This is controlled by a list of file types that you manage through the File types link on the Search settings page described earlier. This list shows two things:

- Which file types the Gatherer will try to open.
- Whether an icon is defined in SharePoint for each file type.

The last item is interesting: if a file type does not have an icon next to it in this list, files of that type will not have a specific icon when listed in a document library. Instead, they get the icon used for unknown file types. This can be changed; later in this chapter you will learn how to add an icon for PDF files, and the same technique can be used for any file type.

Look at the File types list. If a file type is missing, you can add it by clicking New File Type. But this is not enough; the Gatherer also needs the specific IFilter for this file type. Some file types are actually handled by the default IFilters but are still not listed here, for example, RTF files. To add the RTF file type, click New File Type, enter rtf, and click OK. Note that it is now listed and that the MS WordPad icon was automatically associated with it.

Important

You can use this list to temporarily stop indexing a specific file type; just remove it from this list. Then add it when you want to index that file type again.

Managing Search Scope

SharePoint allows you to limit the search scope, in order to make it easier for users to find the information they are searching for. This is especially handy when the index file contains information from several content sources. For example, if the user knows that the document they are looking for is stored somewhere in the file system, set the search scope to the file system only. This will make the search faster and more focused, and generate less CPU load on the SharePoint server.

By default, there is a single search scope: All Sites. Defining a new search scope is a two-step process: first you create the search scope in the Central Administration tool, and then you enable it in a site collection. Depending on which scope you want to use, this is easy or may require some planning. For example, say that you want to create a search scope that only matches information in the team site Sales, but no other site. The following Try It Out shows how you do this.

Try It Out Add a New Search Scope

Open the Central Administration tool, click the SSP name (e.g., SharedServices1), and click Search settings.
In the Scopes section, click View scopes.
Click New Scope, then enter these values (see Figure 8-17):

Figure 8-17
1. Title: Sales Web Only. The name for this scope.
2. Description: "This search scope only shows results from the Sales team site."
3. Target Result Page: Select Use the default Search Results Page. This choice will make sure that search results based on this search scope will use the default result page. If you create a custom result page, then use the other option: Specify a different page for searching this scope, and enter the URL to that .aspx file.
4. Click OK to save and close the page.
The new search scope is now listed along with all the others. Look at the Update Status column. It says Empty – Add Rules; this is actually a link to the page where this search scope is defined. Click Add Rules.
1. On the "Add Scope Rule" page, select "Scope Rule Type" = Web Address. This will expand this page with more settings (see Figure 8-18).
  
  Figure 8-18
2. Set Folder to http://srv1/sitedirectory/sales. This is the URL to the Sales site.
3. Select Include. Any item that matches this rule will be included, unless the item is excluded by another rule.
4. Click OK.
Now this search scope has another Update Status message: New Scope – Ready after the next update (start in xx minutes), where "xx" can be anything from 1 to 20 minutes. In other words, this search scope cannot be used until this period has passed. But you can force SharePoint to rebuild the search scope directly: Click Search Settings in the breadcrumb trail at the top of this page, then click Start update now in the Scope section, and wait until the Update status shows Idle.
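The Include rule you just added can be modeled in a few lines of Python to show how Web Address rules decide what is in scope. This is a simplified sketch, assuming a rule matches a URL equal to or beneath its folder and following the behavior stated in step 3 (an included item is still dropped if another rule excludes it); other rule types and behaviors that MOSS offers are not modeled, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WebAddressRule:
    folder: str    # URL the rule applies to, e.g. the Sales site
    behavior: str  # "Include" or "Exclude"

def _matches(url: str, folder: str) -> bool:
    """True if url is the folder itself or lies beneath it (case-insensitive)."""
    u, f = url.lower(), folder.lower().rstrip("/")
    return u == f or u.startswith(f + "/")

def in_scope(url: str, rules: list[WebAddressRule]) -> bool:
    """In scope if some Include rule matches and no Exclude rule matches."""
    included = any(_matches(url, r.folder) for r in rules if r.behavior == "Include")
    excluded = any(_matches(url, r.folder) for r in rules if r.behavior == "Exclude")
    return included and not excluded

sales_only = [WebAddressRule("http://srv1/sitedirectory/sales", "Include")]
print(in_scope("http://srv1/sitedirectory/sales/Shared Documents/q1.doc", sales_only))  # True
print(in_scope("http://srv1/sitedirectory/hr/policy.doc", sales_only))                  # False
```

The trailing "/" check keeps a sibling site such as http://srv1/sitedirectory/salesarchive from accidentally matching the Sales rule.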

The global definition of this search scope is now done. The next step is to enable this search scope in a site collection. To do that, follow the steps in the next Try It Out.

Try It Out Enable the New Search Scope in a Site Collection

Open the intranet page; for example, http://srv1.
Make sure that the top site in this site collection is open, then click Site Actions > Site Settings > Modify All Site Settings.
Click Search scopes in the Site Collection Administration section. Note that the View Scopes page that opens lists this new search scope under Unused Scopes. To enable it for the search drop-down menu, do this:
1. Click on the link Search Dropdown.
2. On the Edit Scope Display Group page, in the Scopes section, check Sales Web Only (the search scope you created earlier).
3. Click OK to save and close the page.
4. Note that this search scope is now listed in the display group Search Dropdown. If you want to add this search scope to the Advanced Search page as well, click that display group's link and repeat steps 2 and 3.
Test the new search scope. Open the top site, and note the search dropdown menu at the top right of the page; it will now contain the new search scope. Enter a text string that you know exists in any of the content of the Sales web, and then select the Sales Web Only search scope. The results will now only display content on that site.

There is more to say about search scopes, but first you must understand how to add new content sources; more about that shortly.

Managing Crawls of the Site Directory

By default, all sites in a site collection will be indexed. If necessary, you can change which sites are indexed by following the steps in the Try It Out below.

Try It Out Manage Crawls of Sites

Open the site to be managed.
Click Site Actions > Site Settings.
Click Search visibility in the Site Administration section.
Use the Allow this web to appear in search results setting to enable or disable searching of the site.
Click OK to save and close the page.

Important

In contrast to the previous version of SharePoint, this setting will only apply to the current site, regardless of whether it is a top site or a subsite.

Adding New Content Sources

When you install SharePoint, your organization probably already has a lot of information stored on its file servers. Some, but most likely not all, of this information will be moved into SharePoint, making that content easy to search. What should you do with the other files? You probably don't want to delete them; after all, that information may be needed someday. An elegant way to make this information available to users is to add this content to SharePoint's index file. This enables users to search for both old and new information without needing to know exactly where it is stored.

To add external information to SharePoint's index file, you create new content sources. You may recall from previous sections in this chapter that SharePoint can index almost any source and location, such as SharePoint's own database, any file server, MS Exchange folders, Lotus Notes databases, other web applications, and external web applications. The way to make that information searchable is to define a content source that points to that location. This enables the index engine to crawl that content.

For example, say that you want to index a specific file share: \\dc1\projects. The following Try It Out shows you how to do this.

Try It Out Add a Content Source

Open the Central Administration tool, and switch to the SSP instance (e.g., SharedServices1).
Click Search settings > Content sources and crawl schedules.

On the Manage Content Sources page, click New Content Source, and add this information:

Name = "Project Files on DC1"
In the Content Source Type section, select File Shares. This will add more options to this page.
In the Start Addresses section, enter \\dc1\projects. If you want to add more start addresses here to other file locations, add them one per line.
In the Crawl Settings section, select the folder and all subfolders of each start address if you want the index engine to crawl any subfolder.
In the Crawl Schedules section, click Create schedule for "Full Crawl" and "Incremental Crawl" to set the schedule for when to crawl this content source.
In the Start Full Crawl section, check Start full crawl of this content source, if you want the crawler to start indexing the content source immediately.
Click OK to save and close this page.
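The Crawl Settings choice in this Try It Out (only the start address, or the folder and all subfolders) can be illustrated with a short Python sketch that lists the files a crawler would visit. It only models which files are reached from each start address; it is not how the SharePoint crawler works, and the function name is hypothetical. On Windows, a UNC start address such as \\dc1\projects can be passed as a string.

```python
from pathlib import Path

def files_to_crawl(start_addresses: list[str], include_subfolders: bool) -> list[Path]:
    """List the files reachable from each start address.

    include_subfolders=True models "the folder and all subfolders";
    False models crawling only the files directly in each start folder.
    """
    found: list[Path] = []
    for start in start_addresses:
        root = Path(start)
        candidates = root.rglob("*") if include_subfolders else root.glob("*")
        found.extend(p for p in candidates if p.is_file())
    return sorted(found)

# Example (Windows): files_to_crawl([r"\\dc1\projects"], include_subfolders=True)
```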

Important

If the content source contains a large volume of information, you must plan when to run a full crawl. One solution is to schedule incremental crawls only, and then manually force a full crawl when necessary, for example, after a restore.

The new content source is now listed along with the others on the Manage Content Sources page. If you did not choose to start a full crawl when you created the content source, you can do it now: use the quick menu for the new content source, and select Start Full Crawl. You can safely leave this page; the indexing will continue to run.

If you need to modify an existing content source, click its name on the Manage Content Sources page, and make whatever changes are needed. You can also delete a content source by using its quick menu. This immediately removes all results from that content source.

Important

Before you can add a Lotus Notes database as a content source, you must install a Lotus Notes client on the SharePoint server. The Gatherer uses this client to read the Notes database. Unless this client is installed, there will be no option to add Lotus Notes content sources.

Adding New File Types

Besides the default file types, you can index almost any other well-known file type. You can even add your own file type, if necessary, but this requires writing code, since you must then provide your own IFilter. There are two things you must do to enable the Gatherer to crawl a new file type:

- The file type must be listed in the File types list discussed previously.
- There must be an IFilter installed that can read that type of file.
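The two requirements can be modeled as a pair of lookups in Python. This is a conceptual sketch only: the registry and the "File types" set are hypothetical stand-ins for what is actually installed on the index server, and the sketch simply returns None in both failure cases without modeling what the real Gatherer records for such files.

```python
from typing import Callable, Optional

# A hypothetical IFilter: turns raw file bytes into plain text.
IFilter = Callable[[bytes], str]

def plain_text_filter(data: bytes) -> str:
    """Stand-in filter for .txt files."""
    return data.decode("utf-8", errors="replace")

INSTALLED_IFILTERS: dict[str, IFilter] = {"txt": plain_text_filter}  # no PDF IFilter yet
INDEXED_FILE_TYPES: set[str] = {"txt", "pdf"}                      # the "File types" list

def extract_text(filename: str, data: bytes) -> Optional[str]:
    """Return the text the Gatherer could index, or None if it cannot read the content."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in INDEXED_FILE_TYPES:
        return None  # requirement 1 fails: type not in the File types list
    ifilter = INSTALLED_IFILTERS.get(ext)
    if ifilter is None:
        return None  # requirement 2 fails: listed, but no IFilter installed
    return ifilter(data)

print(extract_text("notes.txt", b"quarterly budget"))  # 'quarterly budget'
print(extract_text("report.pdf", b"%PDF-1.4"))         # None until a PDF IFilter is added
```

Removing an extension from `INDEXED_FILE_TYPES` in this model corresponds to the tip above about temporarily removing a file type from the list to stop indexing it.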

The trick, of course, is to find the IFilter. The good news is that there are lots of sources on the Internet. These IFilters are not specific to SharePoint's index engine; most also work with SQL Server Full-Text Indexing and other MS Search-based engines. The same IFilters used for SPS 2003 also work fine with the MOSS 2007 search engine. Table 8.1 lists the most common IFilters and at least one source for each. Some are free; others are commercial, but most have a low price:

Table 8.1: Common IFilters
File Type	Download Source	Price
PDF	http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611	Free
ZIP	http://www.citeknet.com	Free
RAR	http://www.citeknet.com	Free
HLP	http://www.citeknet.com	Free
CHM	http://www.citeknet.com	Free
MHT	http://www.citeknet.com	Free
CAB	http://www.citeknet.com	Free
EXE	http://www.citeknet.com	Free
DWF	http://ifiltershop.com	$200 per server
StarOffice	http://www.ifilter.org	Free for personal use
Inventor	http://ifiltershop.com	$299 per server
SHTML	http://ifiltershop.com	$299 per server
vCard	http://ifiltershop.com	$19 per server
OpenOffice	http://www.ifilter.org	Free for personal use
MindManager	http://www.ifiltershop.com	$299 per server
MS Project	http://www.ifiltershop.com	$299 per server
MS Visio	http://www.microsoft.com/downloads	Free
OneNote	Install MS OneNote on the SharePoint server	Included with OneNote
Audio/Video files: MP3, WMA, WMV, ASF	http://www.aimingtech.com	Free for personal use
AutoCAD DWG	http://www.cadcompany.nl	250 euros

This list is long, and it grows constantly. Remember that each new file type indexed increases the CPU load and the size of the index files; be sure you really need to search files like MP3s before you add that type, even if it is cool!

Important

http://www.citeknet.com has a very nice (and free) IFilter Explorer. Use it to see all IFilters installed on the server.

If you need to remove an IFilter, just uninstall it like any other program, using the Add/Remove Programs applet in the Control Panel.

So let's practice all this. In the following example you will add PDF as an indexed file type. The download link for the IFilter is listed in Table 8.1, and you already know how to add PDF as a file type to be indexed. But in this case, and some others too, one thing will be missing: users will not see the familiar PDF icon next to PDF files in SharePoint's document libraries, so you must also download this icon and install it properly. The following Try It Out shows how to do this.

Try It Out Index PDF Files in MOSS

Download the PDF IFilter from the source listed in Table 8.1. Install the IFilter on the SharePoint server. If you are running a SharePoint farm, it must be installed on the MOSS server running the Index role!
Open the Central Administration tool, and switch to the SSP instance (e.g., SharedServices1).
Click Search settings > File types.
Click New File Type and enter pdf. Click OK to save and close the page. Check that PDF is now listed among the indexed file types. Also note that it does not have any icon. This is a cosmetic, but nevertheless important, problem.
Download the file pdf16.tif from the Internet; for example, from http://sharepoint-blog.com/?p=6. Save the pdf16.tif file in the following location on the SharePoint Server: C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\IMAGES.

Important
Do not change the file name; it must be pdf16.tif (the same name you reference in DOCICON.XML), or the icon will not appear next to PDF files!
The next step is to get SharePoint to display this icon for each PDF file. Open the following file with Notepad: C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\XML\DOCICON.XML.

Important
Make a backup of the original DOCICON.XML file just to be on the safe side.
1. Add the line <Mapping Key="pdf" Value="pdf16.tif"/> somewhere in the section that starts with <ByExtension>. The exact location is not important, but why not add it just before the "png" entry to keep the list in alphabetical order?
2. Save and close the DOCICON.XML file.
3. Open a command prompt, and run iisreset.
Open the SharePoint administrative page File types again. The PDF file type should now show its well-known icon! If it doesn't, check the file name and the DOCICON.XML entry, and make sure you ran iisreset. Everything is done; all new PDF files will now be indexed.

Important

Start a full crawl for all content sources where there are PDF files, to ensure that they are indexed.
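If you have several servers to update, the DOCICON.XML edit can be scripted. The following Python sketch uses the standard xml.etree.ElementTree module. It assumes the section element is named ByExtension (adjust BY_EXT_TAG if your file differs) and that the file sits in the default 12-hive location used above; patch_docicon and add_mapping are hypothetical helper names. Note that ElementTree does not preserve XML comments or the original indentation when it rewrites the file, which is one more reason to keep the backup the script makes.

```python
import shutil
import xml.etree.ElementTree as ET

# Default SharePoint 2007 location, as given in the steps above (assumption: unchanged install path)
DOCICON_PATH = (r"C:\Program Files\Common Files\Microsoft Shared"
                r"\web server extensions\12\TEMPLATE\XML\DOCICON.XML")
BY_EXT_TAG = "ByExtension"  # assumed name of the per-extension section


def add_mapping(root: ET.Element, key: str, value: str) -> bool:
    """Add <Mapping Key=key Value=value/> under the per-extension section.

    Returns False, changing nothing, if the extension is already mapped.
    The new entry is placed in alphabetical order by Key.
    """
    by_ext = root.find(BY_EXT_TAG)
    if by_ext is None:
        raise ValueError(f"DOCICON.XML has no <{BY_EXT_TAG}> section")
    children = list(by_ext)
    if any(m.get("Key", "").lower() == key.lower() for m in children):
        return False
    # Position of the first existing key that sorts after ours, or the end
    pos = next((i for i, m in enumerate(children)
                if m.get("Key", "").lower() > key.lower()), len(children))
    by_ext.insert(pos, ET.Element("Mapping", {"Key": key, "Value": value}))
    return True


def patch_docicon(path: str = DOCICON_PATH, key: str = "pdf",
                  value: str = "pdf16.tif") -> bool:
    """Back up DOCICON.XML, add the mapping, and save it. Run iisreset afterwards."""
    shutil.copy2(path, path + ".bak")  # keep the original, as advised above
    tree = ET.parse(path)
    added = add_mapping(tree.getroot(), key, value)
    if added:
        tree.write(path, encoding="utf-8", xml_declaration=True)
    return added
```

On the server you would call patch_docicon() and then run iisreset from a command prompt, just as in the manual steps. Because add_mapping checks for an existing entry first, running the script twice does not create a duplicate mapping.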

Some Tips about Searching

The search and index functionality in MOSS has many features, as you have seen so far, and there are still more things you can do. One common request is to activate wildcard search in MOSS. This is possible, but requires coding; the details are described in the SharePoint Products and Technologies Software Development Kit (SPPT SDK). There are also products that enhance the search feature, such as Cartesis 10 (http://www.cartesis.com), Infosys (http://www.infosys.com), and Ontolica (http://www.mondosoft.com). The rest of this section describes how to define a specific user account for the crawler to use when it opens a specific content source.

You may remember from Chapter 4 that the Default Content Access Account is the user account the Gatherer uses by default when it crawls external information, such as a file system or an Exchange public folder. If this account does not have at least Read access to these sources, the crawl generates an access error in the crawl log (see Figure 8-19). If you instead use another account that has Read access to that particular location, the crawl process will succeed.

Figure 8-19

Important

To define the default crawler account, open the Central Administration tool and go to SSP > Search settings > Default content access account.

To make a content source use a special account, you must configure a crawl rule (an Include or Exclude rule) for the path that the content source crawls. Follow the steps in the following Try It Out to set up another search account when using advanced search mode.

Try It Out Configure a Custom Search Access Account

Open the Central Administration tool, and switch to the SSP instance (e.g., SharedServices1).
Click Search settings > Crawl rules.
Click New Crawl Rule, then enter the following (see Figure 8-20):

Figure 8-20
1. In the Path section, enter the start address for the content source; for example, \\dc1\projects.
2. Select the option Include all items in this path.
3. Select the option Specify a different content access account. Enter the account name, then enter its password twice. Make sure to check Do not allow Basic Authentication (see Figure 8-20).
4. Click OK to save the new rule. The new rule will be listed alongside the other rules.
Start a full crawl of this content source now. When it completes, check the crawl log to make sure that the content source was indexed this time.
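The effect of such a crawl rule can be modeled as a lookup from path to account. This Python sketch assumes, as the ordered list on the Crawl rules page suggests, that rules are checked in order and the first matching rule wins. It simplifies matching to a plain path prefix, whereas real crawl rules also accept wildcard patterns, and the CrawlRule class and account_for function are hypothetical names, not part of any SharePoint API.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of a MOSS crawl rule; not a SharePoint API.
@dataclass
class CrawlRule:
    path: str                      # start address, e.g. r"\\dc1\projects"
    include: bool = True           # True = include rule, False = exclude rule
    account: Optional[str] = None  # set when "Specify a different content access account" is chosen


def _is_under(url: str, path: str) -> bool:
    """Case-insensitive check that url is the path itself or lies below it."""
    url, base = url.lower(), path.lower().rstrip("\\/")
    return url == base or url.startswith(base + "\\") or url.startswith(base + "/")


def account_for(url: str, rules: list[CrawlRule], default_account: str) -> Optional[str]:
    """Return the account used to crawl url, or None if an exclude rule skips it.

    Assumes the first matching rule wins (simplified prefix matching;
    real crawl rules also support wildcards).
    """
    for rule in rules:
        if _is_under(url, rule.path):
            if not rule.include:
                return None
            # An include rule without its own account falls back to the default one
            return rule.account or default_account
    # No rule matched: the Default Content Access Account is used
    return default_account
```

With a single Include rule for \\dc1\projects that names its own account, any file under that share is crawled with that account, while every other location falls back to the Default Content Access Account.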

With this knowledge of the search and indexing process, you will be able to set up the most common search scenarios, as well as solve most of the problems that arise.