Using the Microsoft Indexing Service


In the first part of this chapter, you examined how to search documents stored in a database table. For the remainder of this chapter, you examine how to search through static documents stored on the file system. You learn how to use the Microsoft Indexing Service with an ASP.NET page.

In the following sections, you learn how to create both a search page that enables you to perform free-form queries and a search page that enables you to perform Boolean queries. You also learn how to work with document properties.

Configuring the Microsoft Indexing Service

Microsoft Windows 2000 includes the Microsoft Indexing Service as a component of the operating system. In fact, if you have ever searched for a file by clicking Start, Search, For Files or Folders, you have already used the Indexing Service.

Before you can use the service in an ASP.NET page, you must verify that the Indexing Service has been started. To do so, follow these steps:

  1. Go to Start, Programs, Administrative Tools, Computer Management.

  2. Expand the Services and Applications node.

  3. Select Indexing Service.

  4. Click the Start Indexing button (the VCR Run button).

After you start the Indexing Service, it runs continuously in the background. The service automatically scans the directories selected for indexing for new or changed files.

The indexes are stored in catalogs. Windows 2000 includes two default catalogs: System and Web. You use the Web catalog to index all your pages for your Web applications.

You can index any directory accessible to your Web server. You can enable indexing for a directory within the Internet Services Manager. To do this, launch the Internet Services Manager, open the property sheet for a directory, select the Directory tab, and check the option Index This Resource.

After you enable indexing for a directory, you must be patient. It might take several minutes for the Indexing Service to start scanning the directory. If you are impatient, you can force the service to scan a directory within Computer Management by right-clicking the name of the directory, selecting All Tasks, and then selecting Rescan.

The Indexing Service has two important configuration options. First, you can specify the level of performance for the Indexing Service. To do so, you need to temporarily stop the service by following these steps:

  1. Within Computer Management, select Indexing Service.

  2. Click the Stop Indexing button (the stop VCR button).

  3. Right-click Indexing Service and select All Tasks and then Tune Performance.

You are provided with the option of making the server a dedicated Indexing Service server. If you are less ambitious, you can select Used Often, Used Occasionally, Never Used, or a custom level. After you select a performance level, remember to restart the Indexing Service by clicking the Start Indexing button.

You also have the option of generating abstracts for each document when the document is indexed. This option is useful for displaying a brief summary of a document next to each search result in a search page. The disadvantage of generating abstracts is that it requires more work for the Indexing Service and consumes more hard drive space.

To automatically generate abstracts, follow these steps:

  1. Within Computer Management, open the property sheet for the Indexing Service (click the hand holding the sheet of paper).

  2. Check the option labeled Generate Abstracts.

After this option is enabled, the Indexing Service generates and stores a brief summary of each document in the catalog for the index.

Connecting to Microsoft Indexing Service

You can communicate with the Microsoft Indexing Service by using the Microsoft OLE DB Provider for Indexing Service. The name of the provider is MSIDXS. For example, the following statement creates a connection to the Microsoft Indexing Service:

 
 Dim con As New OleDbConnection( "Provider=MSIDXS;Data Source=Web" )

Notice that the connection string in this statement names a particular catalog: the Web catalog. Specifying a particular catalog is optional.
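Because the catalog is optional, it can be convenient to assemble the connection string in one place. The following is a minimal sketch, written in Python rather than the chapter's VB.NET so that it can be run without the Indexing Service installed. The function name build_connection_string is hypothetical; only the Provider and Data Source keywords come from the example above.

```python
def build_connection_string(catalog=None):
    """Return a connection string for the MSIDXS provider.

    When no catalog is given, the Data Source keyword is simply left out,
    mirroring the statement that naming a catalog is optional.
    """
    parts = ["Provider=MSIDXS"]
    if catalog:
        parts.append("Data Source=" + catalog)
    return ";".join(parts)


if __name__ == "__main__":
    print(build_connection_string("Web"))
    print(build_connection_string())
```

The string returned for the Web catalog is the same one passed to the OleDbConnection constructor in the statement above.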

After you open a connection to the Indexing Service, you can execute queries like the following by using either an OleDbCommand or OleDbDataAdapter object:

 
 SELECT FileName FROM SCOPE() WHERE FREETEXT('ASP.NET')

This particular query retrieves a list of files that contain the word ASP.NET from indexed files on your server's hard drive.

Retrieving Document Properties

When you perform queries by using the Indexing Service, you can return any of a number of standard document properties. For example, the following query returns the document filename, document file size, and author name for each document that matches the search expression:

 
 SELECT FileName, Size, DocAuthor
 FROM SCOPE()
 WHERE FREETEXT('ASP.NET')
 ORDER BY RANK DESC

The Indexing Service supports several standard properties. However, additional custom properties can be added by new applications.

A list of standard properties appears in Table 14.1. (This table contains a partial list; see the Windows 2000 Help file for a complete list.)

Table 14.1. Standard Document Properties

Property          Description
----------------  -----------
Access            Date and time the document was last accessed.
Characterization  Summary of the document automatically generated by the Indexing Service.
Created           Date and time the document was created.
Directory         Physical path to the document, not including the document name.
DocAppName        Name of the application that created the document.
DocAuthor         Author of the document.
DocByteCount      Number of bytes in the document.
DocCategory       Type of document (such as a memo, schedule, or white paper).
DocCharCount      Number of characters in the document.
DocComments       Comments about the document.
DocCompany        Name of the company for which the document was written.
DocCreatedTm      Time the document was created.
DocEditTime       Total time spent editing the document.
DocHiddenCount    Number of hidden slides in a Microsoft PowerPoint document.
DocKeywords       Document keywords.
DocLastAuthor     User who edited the document most recently.
DocLastPrinted    Time the document was last printed.
DocLastSavedTm    Time the document was last saved.
DocLineCount      Number of lines contained in the document.
DocPageCount      Number of pages in the document.
DocParaCount      Number of paragraphs in the document.
DocPartTitles     Names of document parts, such as spreadsheet names in a Microsoft Excel document or slide titles in a Microsoft PowerPoint slide show.
DocSubject        Subject of the document.
DocTitle          Title of the document.
DocWordCount      Number of words in the document.
FileIndex         Unique ID of the document.
FileName          Name of the document.
HitCount          Number of hits (elements in the results list) in the document.
Path              Full physical path to the document, including the document name.
Rank              Rank of how well an item in a result list matches query criteria. The range is from 0 to 1000; larger numbers indicate better matches.
ShortFileName     Short (8.3 format) document name.
Size              Size of the document, in bytes.
VPath             Full virtual path to the document, including the document name. If more than one path is possible, the best match for the specific query is chosen.
WorkId            Internal ID for the document used within the Indexing Service.
Write             Last time the document was modified (written).

Performing Free Text Queries with File System Data

You can perform free text queries with the Microsoft Indexing Service by using the FREETEXT function. A free text query can contain any word, phrase, or sentence. Any Boolean operators or wildcard characters that appear in a free text query are ignored.

You can use the following query, for example, to retrieve a list of all documents that match the search phrase How do you use ASP.NET?:

 
 SELECT RANK, FileName, Characterization
 FROM SCOPE()
 WHERE FREETEXT('How do you use ASP.NET?')
 ORDER BY RANK DESC

This query returns three properties for each search result: RANK, FileName, and Characterization. The RANK represents how well each result matched the search phrase. The FileName represents the name of the matching document. Finally, the Characterization contains a brief summary of the document.

Notice that the query is performed with a particular scope. In this case, all documents in all directories enabled for indexing are searched. You could, however, provide a path with the SCOPE() function to limit the search to a particular directory. For example, if you want to limit your search to only those files in the root virtual directory (wwwroot), you would specify your query like this:

 
 SELECT RANK, FileName, Characterization
 FROM SCOPE('SHALLOW TRAVERSAL OF "/"')
 WHERE FREETEXT('How do you use ASP.NET?')
 ORDER BY RANK DESC

This query performs a search of the wwwroot directory. Notice the special phrase SHALLOW TRAVERSAL OF. This phrase causes the search to be performed only against the contents of the wwwroot directory and not any subdirectories. If you also want to search subdirectories, you can use the phrase DEEP TRAVERSAL OF.
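The choice between SCOPE(), SHALLOW TRAVERSAL OF, and DEEP TRAVERSAL OF can be captured in a small helper. The following is a sketch in Python (used here only so that the string building can be run without the Indexing Service). The function name scope_clause and its parameters are hypothetical, and the /Products path is an illustrative value; the generated text follows the pattern of the query shown above.

```python
def scope_clause(virtual_path=None, deep=False):
    """Build the FROM target for an Indexing Service query.

    virtual_path=None searches every indexed directory.
    deep=False limits the search to the named directory itself;
    deep=True also searches its subdirectories.
    """
    if virtual_path is None:
        return "SCOPE()"
    # Quotes in the path would break out of the quoted scope string.
    if '"' in virtual_path or "'" in virtual_path:
        raise ValueError("quotes are not allowed in a scope path")
    traversal = "DEEP" if deep else "SHALLOW"
    return "SCOPE('%s TRAVERSAL OF \"%s\"')" % (traversal, virtual_path)


if __name__ == "__main__":
    print(scope_clause())
    print(scope_clause("/"))
    print(scope_clause("/Products", deep=True))
```

Rejecting quotes outright, rather than trying to escape them, keeps the sketch simple; a real page would probably offer the user a fixed list of directories instead of accepting a typed path.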

The page in Listing 14.9 illustrates how you can create a search form for performing free text queries within an ASP.NET page (see Figure 14.9).

Listing 14.9 FileFreeText.aspx
<%@ Import Namespace="System.Data" %>
<%@ Import Namespace="System.Data.OleDb" %>

<Script Runat="Server">

Sub Button_Click( s As Object, e As EventArgs )
  Dim conMyData As OleDbConnection
  Dim strSearch As String
  Dim cmdSearch As OleDbCommand
  Dim dtrSearch As OleDbDataReader

  conMyData = New OleDbConnection( "Provider=MSIDXS;Data Source=Web" )
  ' Double any single quotes so the phrase cannot end the string literal early.
  ' Assumption: a doubled quote is read as a literal quote, as in standard SQL.
  strSearch = "SELECT RANK, FileName, VPath, Characterization " & _
    "FROM SCOPE() " & _
    "WHERE FREETEXT( '" & txtSearchPhrase.Text.Replace( "'", "''" ) & _
    "') " & _
    "ORDER BY RANK DESC"
  cmdSearch = New OleDbCommand( strSearch, conMyData )
  conMyData.Open()
  Try
    dtrSearch = cmdSearch.ExecuteReader()
    lblResults.Text = "<ul>"
    While dtrSearch.Read
      ' RANK runs from 0 to 1000, so dividing by 10 gives a percentage
      lblResults.Text &= "<li> (" & dtrSearch( "RANK" ) / 10 & "%) "
      lblResults.Text &= "<a href=""" & dtrSearch( "VPath" ).ToString()
      lblResults.Text &= """>"
      lblResults.Text &= dtrSearch( "FileName" ) & "</a><br>"
      lblResults.Text &= dtrSearch( "Characterization" )
      lblResults.Text &= "<p>"
    End While
    ' Close the list opened before the loop
    lblResults.Text &= "</ul>"
  Catch exc As Exception
    lblResults.Text = "Please rephrase your query"
  End Try
  conMyData.Close()
End Sub

</Script>

<html>
<head><title>FileFreeText.aspx</title></head>
<body>

<form Runat="Server">

<h2>File Free Text Search:</h2>

<asp:TextBox
  ID="txtSearchPhrase"
  Columns="50"
  Runat="Server" />

<asp:Button
  Text="Search!"
  OnClick="Button_Click"
  Runat="Server" />

<hr>

<asp:Label
  ID="lblResults"
  EnableViewState="False"
  Width="400px"
  Runat="Server" />

</form>

</body>
</html>

The C# version of this code can be found on the CD-ROM.

Figure 14.9. Performing a free text query with the file system.


In Listing 14.9, the FREETEXT function is used in the query executed in the Button_Click subroutine. The RANK, filename, and characterization are displayed for each result. Furthermore, each query result links to the document associated with the result.
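Listing 14.9 divides the RANK value by 10 before displaying it. Because RANK ranges from 0 to 1000, that division yields a percentage. The following Python sketch shows the same arithmetic in isolation (Python is used only so that it can be run without the Indexing Service; the function name rank_to_percent is hypothetical).

```python
def rank_to_percent(rank):
    """Convert an Indexing Service RANK (0 to 1000) to a percentage string."""
    if not 0 <= rank <= 1000:
        raise ValueError("RANK is expected to fall between 0 and 1000")
    return "%.1f%%" % (rank / 10.0)


if __name__ == "__main__":
    print(rank_to_percent(873))
    print(rank_to_percent(1000))
```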

Notice that a TRY...CATCH block is used when displaying the query results. Certain search phrases generate errors when used with the Microsoft Indexing Service. For example, a search phrase that contains the single word The would generate an error because the search phrase would contain only words that are ignored by the Indexing Service.
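Some of these errors can be avoided before the query is sent. The following Python sketch (again, Python only so that it can be run without the Indexing Service) shows two precautions: rejecting a phrase that consists entirely of ignored words, and doubling single quotes so that a phrase such as O'Reilly does not end the quoted string early. The NOISE_WORDS set is a small hypothetical sample, not the list that the Indexing Service actually uses, and the quote-doubling convention is assumed to work as it does in standard SQL. The function names are also hypothetical.

```python
import re

# Hypothetical sample only; the real list of ignored words is
# maintained by the Indexing Service itself.
NOISE_WORDS = {"the", "a", "an", "and", "of", "to"}


def has_searchable_words(phrase, noise_words=NOISE_WORDS):
    """Return True if at least one word in the phrase is not a noise word."""
    words = re.findall(r"[A-Za-z0-9.]+", phrase.lower())
    return any(word not in noise_words for word in words)


def freetext_query(phrase):
    """Build the FREETEXT query used in Listing 14.9 for a user's phrase."""
    if not has_searchable_words(phrase):
        raise ValueError("the phrase contains only ignored words")
    # Assumption: a doubled single quote is read as a literal quote.
    escaped = phrase.replace("'", "''")
    return ("SELECT RANK, FileName, VPath, Characterization "
            "FROM SCOPE() "
            "WHERE FREETEXT('" + escaped + "') "
            "ORDER BY RANK DESC")


if __name__ == "__main__":
    print(freetext_query("How do you use ASP.NET?"))
    print(has_searchable_words("The"))
```

A check like this complements the TRY...CATCH block rather than replacing it, because the sample word list cannot anticipate every phrase that the service rejects.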

Performing Boolean Queries with File System Data

You can perform Boolean queries with the Microsoft Indexing Service by using the CONTAINS function. The Indexing Service supports searches that contain the Boolean operators AND, OR, and AND NOT.

The CONTAINS function also supports several advanced text matching features. For example, you can perform proximity and weighted matches. You can also use the CONTAINS function to match both singular and plural forms of a word.

The following sample query returns the names of documents that contain the word apple but not the word green:

 
 SELECT FileName, Characterization
 FROM SCOPE()
 WHERE CONTAINS('apple AND NOT green')
 ORDER BY RANK DESC
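If a search form collects required and excluded words in separate fields, the Boolean expression can be assembled from the two lists. The following Python sketch illustrates the idea (Python is used only so that it can be run without the Indexing Service). The function name contains_expression is hypothetical, and the sketch deliberately accepts only plain alphanumeric words so that no quoting rules need to be assumed.

```python
def contains_expression(required, excluded=()):
    """Join words into an expression such as 'apple AND NOT green'.

    At least one required word is needed, because AND NOT must follow
    a positive term.
    """
    required = list(required)
    excluded = list(excluded)
    if not required:
        raise ValueError("at least one required word is needed")
    for word in required + excluded:
        # Plain words only; phrases and punctuation are out of scope here.
        if not word.isalnum():
            raise ValueError("only single alphanumeric words are accepted")
    expression = " AND ".join(required)
    for word in excluded:
        expression += " AND NOT " + word
    return expression


if __name__ == "__main__":
    expression = contains_expression(["apple"], ["green"])
    print("WHERE CONTAINS('" + expression + "')")
```

Restricting the input to single words means a term such as ASP.NET is rejected; a fuller version would need to follow the service's rules for quoting phrases.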

The page in Listing 14.10 illustrates how you can execute a query by using the CONTAINS function in an ASP.NET page (see Figure 14.10).

Listing 14.10 FileContains.aspx
<%@ Import Namespace="System.Data" %>
<%@ Import Namespace="System.Data.OleDb" %>

<Script Runat="Server">

Sub Button_Click( s As Object, e As EventArgs )
  Dim conMyData As OleDbConnection
  Dim strSearch As String
  Dim cmdSearch As OleDbCommand
  Dim dtrSearch As OleDbDataReader

  conMyData = New OleDbConnection( "Provider=MSIDXS;Data Source=Web" )
  ' Double any single quotes so the expression cannot end the string literal early.
  ' Assumption: a doubled quote is read as a literal quote, as in standard SQL.
  strSearch = "SELECT RANK, FileName, VPath, Characterization " & _
    "FROM SCOPE() " & _
    "WHERE CONTAINS( '" & txtSearchPhrase.Text.Replace( "'", "''" ) & "') " & _
    "ORDER BY RANK DESC"
  cmdSearch = New OleDbCommand( strSearch, conMyData )
  conMyData.Open()
  Try
    dtrSearch = cmdSearch.ExecuteReader()
    lblResults.Text = "<ul>"
    While dtrSearch.Read
      ' RANK runs from 0 to 1000, so dividing by 10 gives a percentage
      lblResults.Text &= "<li> (" & dtrSearch( "RANK" ) / 10 & "%) "
      lblResults.Text &= "<a href=""" & dtrSearch( "VPath" ).ToString()
      lblResults.Text &= """>"
      lblResults.Text &= dtrSearch( "FileName" ) & "</a><br>"
      lblResults.Text &= dtrSearch( "Characterization" )
      lblResults.Text &= "<p>"
    End While
    ' Close the list opened before the loop
    lblResults.Text &= "</ul>"
  Catch exc As Exception
    lblResults.Text = "Please rephrase your query"
  End Try
  conMyData.Close()
End Sub

</Script>

<html>
<head><title>FileContains.aspx</title></head>
<body>

<form Runat="Server">

<h2>File Contains Search:</h2>

<asp:TextBox
  ID="txtSearchPhrase"
  Columns="50"
  Runat="Server" />

<asp:Button
  Text="Search!"
  OnClick="Button_Click"
  Runat="Server" />

<hr>

<asp:Label
  ID="lblResults"
  EnableViewState="False"
  Width="400px"
  Runat="Server" />

</form>

</body>
</html>

The C# version of this code can be found on the CD-ROM.

Figure 14.10. Performing a contains query with the file system.


The page in Listing 14.10 is almost identical to the page in Listing 14.9. The only difference is that the query uses the CONTAINS function rather than the FREETEXT function.

Performing Queries with Document Properties

When you perform a query with the Microsoft Indexing Service, you can use any of the document properties to limit the results returned by the query. For example, suppose that you want to return a list of all Microsoft Word documents that contain the word Apple and that were authored by Stephen Walther. To do so, you could create the query as follows:

 
 SELECT FileName, Characterization
 FROM SCOPE()
 WHERE CONTAINS('Apple')
 AND DocAuthor='Stephen Walther'
 AND DocAppName LIKE 'Microsoft Word%'
 ORDER BY RANK DESC

This query uses the DocAuthor property to limit the results to only those documents authored by Stephen Walther. A LIKE operator is used with the DocAppName property to return only Microsoft Word documents (any version).
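When the property filters are optional, the WHERE clause can be built from whichever values the user supplies. The following Python sketch shows one way to do this (Python is used only so that it can be run without the Indexing Service). The function name property_query and its parameters are hypothetical, and doubling single quotes is assumed to escape them as in standard SQL.

```python
def _escape(value):
    # Assumption: a doubled single quote is read as a literal quote.
    return value.replace("'", "''")


def property_query(word, author=None, app_prefix=None):
    """Build a CONTAINS query with optional DocAuthor and DocAppName filters."""
    conditions = ["CONTAINS('" + _escape(word) + "')"]
    if author:
        conditions.append("DocAuthor='" + _escape(author) + "'")
    if app_prefix:
        # The trailing % makes LIKE match any version of the application.
        conditions.append("DocAppName LIKE '" + _escape(app_prefix) + "%'")
    return ("SELECT FileName, Characterization FROM SCOPE() WHERE "
            + " AND ".join(conditions)
            + " ORDER BY RANK DESC")


if __name__ == "__main__":
    print(property_query("Apple", "Stephen Walther", "Microsoft Word"))
```

Called with all three arguments, the function produces the same conditions as the query shown above; called with only a word, it falls back to a plain CONTAINS search.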

NOTE

The LIKE operator enables you to do wildcard matches. Microsoft Indexing Service also supports regular expression matches through the MATCHES predicate.


To see a list of additional document properties you can use when performing a query, refer to the section "Retrieving Document Properties" earlier in the chapter.



ASP.NET Unleashed