In the first part of this chapter, you examined how to search documents stored in a database table. For the remainder of this chapter, you examine how to search through static documents stored on the file system. You learn how to use the Microsoft Indexing Service with an ASP.NET page.

In the following sections, you learn how to create both a search page that enables you to perform free-form queries and a search page that enables you to perform Boolean queries. You also learn how to work with document properties.

Configuring the Microsoft Indexing Service

Microsoft Windows 2000 includes the Microsoft Indexing Service as a component of the operating system. In fact, if you have ever searched for a file by clicking Start, Search, For Files or Folders, you have already used the Indexing Service.

Before you can use the service in an ASP.NET page, you must verify that the Indexing Service has been started. To do so, follow these steps:
After you start the Indexing Service, it runs continuously in the background. The service automatically scans the directories selected for indexing for new or changed files.

The indexes are stored in catalogs. Windows 2000 includes two default catalogs: System and Web. You use the Web catalog to index the pages of your Web applications. You can index any directory accessible to your Web server.

You can enable indexing for a directory within the Internet Services Manager. To do this, launch the Internet Services Manager, open the property sheet for a directory, select the Directory tab, and check the option Index This Resource.

After you enable indexing for a directory, you must be patient. It might take several minutes for the Indexing Service to start scanning the directory. If you are impatient, you can force the service to scan a directory within Computer Management by right-clicking the name of the directory, selecting All Tasks, and then selecting Rescan.

The Indexing Service has two important configuration options. First, you can specify the level of performance for the Indexing Service. To do so, you need to temporarily stop the service by following these steps:
You are provided with the option of making the server a dedicated Indexing Service server. If you are less ambitious, you can select Used Often, Used Occasionally, Never Used, or a custom level. After you select a performance level, remember to restart the Indexing Service by clicking the Start Indexing button.

Second, you have the option of generating abstracts for each document when the document is indexed. This option is useful for displaying a brief summary of a document next to each search result in a search page. The disadvantage of generating abstracts is that it requires more work for the Indexing Service and consumes more hard drive space. To automatically generate abstracts, follow these steps:
After this option is enabled, the Indexing Service generates and stores a brief summary of each document in the catalog for the index.

Connecting to Microsoft Indexing Service

You can communicate with the Microsoft Indexing Service by using the Microsoft OLE DB Provider for Indexing Service. The name of the provider is MSIDXS. For example, the following statement creates a connection to the Microsoft Indexing Service:

    Dim con As New OleDbConnection( "Provider=MSIDXS;Data Source=Web" )

Notice that the connection string names a particular catalog: the Web catalog. Specifying a particular catalog is optional.

After you open a connection to the Indexing Service, you can execute queries like the following by using either an OleDbCommand or an OleDbDataAdapter object:

    SELECT FileName
    FROM SCOPE()
    WHERE FREETEXT( 'ASP.NET' )

This particular query retrieves a list of the indexed files on your server's hard drive that contain the word ASP.NET.

Retrieving Document Properties

When you perform queries by using the Indexing Service, you can return any of a number of standard document properties. For example, the following query returns the document filename, document file size, and author name for each document that matches the search expression:

    SELECT FileName, Size, DocAuthor
    FROM SCOPE()
    WHERE FREETEXT( 'ASP.NET' )
    ORDER BY RANK DESC

The Indexing Service supports several standard properties. However, additional custom properties can be added by new applications. A list of standard properties appears in Table 14.1. (This table contains a partial list; see the Windows 2000 Help file for a complete list.)

Table 14.1. Standard Document Properties
Performing Free Text Queries with File System Data

You can perform free text queries with the Microsoft Indexing Service by using the FREETEXT function. A free text query can contain any word, phrase, or sentence. Any Boolean operators or wildcard characters that appear in a free text query are ignored.

You can use the following query, for example, to retrieve a list of all documents that match the search phrase How do you use ASP.NET?:

    SELECT RANK, FileName, Characterization
    FROM SCOPE()
    WHERE FREETEXT( 'How do you use ASP.NET?' )
    ORDER BY RANK DESC

This query returns three properties for each search result: RANK, FileName, and Characterization. The RANK represents how well each result matched the search phrase. The FileName represents the name of the matching document. Finally, the Characterization contains a brief summary of the document.

Notice that the query is performed with a particular scope. In this case, all documents in all directories enabled for indexing are searched. You could, however, provide a path with the SCOPE() function to limit the search to a particular directory. For example, if you want to limit your search to only those files in the root directory of your Web site, you would specify your query like this:

    SELECT RANK, FileName, Characterization
    FROM SCOPE( 'SHALLOW TRAVERSAL OF "/"' )
    WHERE FREETEXT( 'How do you use ASP.NET?' )
    ORDER BY RANK DESC

This query performs a search of the wwwroot directory. To search a virtual directory such as Products instead, you would supply its virtual path (for example, "/Products") in place of "/". When the query is built inside a Visual Basic string, each double quote in the path must be doubled ("").

Notice the special phrase SHALLOW TRAVERSAL OF. This phrase causes the search to be performed only against the contents of the wwwroot directory and not any subdirectories. If you also want to search subdirectories, you can use the phrase DEEP TRAVERSAL OF.

The page in Listing 14.9 illustrates how you can create a search form for performing free text queries within an ASP.NET page (see Figure 14.9).
Listing 14.9 FileFreeText.aspx

    <%@ Import Namespace="System.Data" %>
    <%@ Import Namespace="System.Data.OleDb" %>

    <Script Runat="Server">

    Sub Button_Click( s As Object, e As EventArgs )
      Dim conMyData As OleDbConnection
      Dim strSearch As String
      Dim cmdSearch As OleDbCommand
      Dim dtrSearch As OleDbDataReader

      ' Connect to the Web catalog through the Indexing Service provider
      conMyData = New OleDbConnection( "Provider=MSIDXS;Data Source=Web" )

      ' Build a free text query from the phrase entered in the text box
      strSearch = "SELECT RANK, FileName, VPath, Characterization " & _
        "FROM SCOPE() " & _
        "WHERE FREETEXT( '" & txtSearchPhrase.Text & "') " & _
        "ORDER BY RANK DESC"
      cmdSearch = New OleDbCommand( strSearch, conMyData )
      conMyData.Open()
      Try
        dtrSearch = cmdSearch.ExecuteReader()
        lblResults.Text = "<ul>"
        While dtrSearch.Read
          ' RANK is divided by 10 to display it as a percentage
          lblResults.Text &= "<li> (" & dtrSearch( "RANK" ) / 10 & "%) "
          lblResults.Text &= "<a href=""" & dtrSearch( "VPath" ).ToString()
          lblResults.Text &= """>"
          lblResults.Text &= dtrSearch( "FileName" ) & "</a><br>"
          lblResults.Text &= dtrSearch( "Characterization" )
          lblResults.Text &= "<p>"
        End While
        lblResults.Text &= "</ul>"
        dtrSearch.Close()
      Catch exc As Exception
        ' Some phrases (for example, only ignored words) raise an error
        lblResults.Text = "Please rephrase your query"
      End Try
      conMyData.Close()
    End Sub

    </Script>

    <html>
    <head><title>FileFreeText.aspx</title></head>
    <body>

    <form Runat="Server">

    <h2>File Free Text Search:</h2>

    <asp:TextBox
      ID="txtSearchPhrase"
      Columns="50"
      Runat="Server" />
    <asp:Button
      Text="Search!"
      OnClick="Button_Click"
      Runat="Server" />
    <hr>
    <asp:Label
      ID="lblResults"
      EnableViewState="False"
      Width="400px"
      Runat="Server" />

    </form>
    </body>
    </html>

The C# version of this code can be found on the CD-ROM.

Figure 14.9. Performing a free text query with the file system.
In Listing 14.9, the FREETEXT function is used in the query executed in the Button_Click subroutine. The RANK, filename, and characterization are displayed for each result. Furthermore, each query result links to the document associated with the result.

Notice that a Try...Catch block is used when displaying the query results. Certain search phrases generate errors when used with the Microsoft Indexing Service. For example, a search phrase that contains the single word The would generate an error because the search phrase would contain only words that are ignored by the Indexing Service.

Performing Boolean Queries with File System Data

You can perform Boolean queries with the Microsoft Indexing Service by using the CONTAINS function. The Indexing Service supports searches that contain the Boolean operators AND, OR, and AND NOT.

The CONTAINS function also supports several advanced text matching features. For example, you can perform proximity and weighted matches. You can also use the CONTAINS function to match both singular and plural forms of a word.

The following sample query returns the names of documents that contain the word apple but not the word green:

    SELECT FileName, Characterization
    FROM SCOPE()
    WHERE CONTAINS( 'apple AND NOT green' )
    ORDER BY RANK DESC

The page in Listing 14.10 illustrates how you can execute a query by using the CONTAINS function in an ASP.NET page (see Figure 14.10).
Listing 14.10 FileContains.aspx

    <%@ Import Namespace="System.Data" %>
    <%@ Import Namespace="System.Data.OleDb" %>

    <Script Runat="Server">

    Sub Button_Click( s As Object, e As EventArgs )
      Dim conMyData As OleDbConnection
      Dim strSearch As String
      Dim cmdSearch As OleDbCommand
      Dim dtrSearch As OleDbDataReader

      ' Connect to the Web catalog through the Indexing Service provider
      conMyData = New OleDbConnection( "Provider=MSIDXS;Data Source=Web" )

      ' Build a Boolean query from the expression entered in the text box
      strSearch = "SELECT RANK, FileName, VPath, Characterization " & _
        "FROM SCOPE() " & _
        "WHERE CONTAINS( '" & txtSearchPhrase.Text & "') " & _
        "ORDER BY RANK DESC"
      cmdSearch = New OleDbCommand( strSearch, conMyData )
      conMyData.Open()
      Try
        dtrSearch = cmdSearch.ExecuteReader()
        lblResults.Text = "<ul>"
        While dtrSearch.Read
          ' RANK is divided by 10 to display it as a percentage
          lblResults.Text &= "<li> (" & dtrSearch( "RANK" ) / 10 & "%) "
          lblResults.Text &= "<a href=""" & dtrSearch( "VPath" ).ToString()
          lblResults.Text &= """>"
          lblResults.Text &= dtrSearch( "FileName" ) & "</a><br>"
          lblResults.Text &= dtrSearch( "Characterization" )
          lblResults.Text &= "<p>"
        End While
        lblResults.Text &= "</ul>"
        dtrSearch.Close()
      Catch exc As Exception
        ' Malformed Boolean expressions raise an error
        lblResults.Text = "Please rephrase your query"
      End Try
      conMyData.Close()
    End Sub

    </Script>

    <html>
    <head><title>FileContains.aspx</title></head>
    <body>

    <form Runat="Server">

    <h2>File Contains Search:</h2>

    <asp:TextBox
      ID="txtSearchPhrase"
      Columns="50"
      Runat="Server" />
    <asp:Button
      Text="Search!"
      OnClick="Button_Click"
      Runat="Server" />
    <hr>
    <asp:Label
      ID="lblResults"
      EnableViewState="False"
      Width="400px"
      Runat="Server" />

    </form>
    </body>
    </html>

The C# version of this code can be found on the CD-ROM.

Figure 14.10. Performing a contains query with the file system.
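Listings 14.9 and 14.10 place the text box value directly between single quotes in the query, so a phrase that itself contains an apostrophe (for example, What's new) ends the string literal early and falls through to the Catch block. The following is a minimal sketch of one way to build the query text more defensively. It assumes that the Indexing Service SQL dialect escapes a single quote inside a literal by doubling it, as most SQL dialects do; check that against the Indexing Service documentation before relying on it. The module and function names (SearchQueryBuilder, EscapeLiteral, BuildQuery) are hypothetical and do not appear in the listings.

```vb
Imports System

Module SearchQueryBuilder

    ' Doubles single quotes so the phrase stays inside one SQL string literal.
    ' Assumption: the Indexing Service SQL dialect escapes ' as ''.
    Function EscapeLiteral(ByVal phrase As String) As String
        If phrase Is Nothing Then Return ""
        Return phrase.Trim().Replace("'", "''")
    End Function

    ' Builds the same SELECT statement as the listings.
    ' predicate is expected to be either "FREETEXT" or "CONTAINS".
    Function BuildQuery(ByVal predicate As String, ByVal phrase As String) As String
        Return "SELECT RANK, FileName, VPath, Characterization " & _
               "FROM SCOPE() " & _
               "WHERE " & predicate & "( '" & EscapeLiteral(phrase) & "' ) " & _
               "ORDER BY RANK DESC"
    End Function

    Sub Main()
        ' Print the query text that would be passed to OleDbCommand
        Console.WriteLine(BuildQuery("FREETEXT", "What's new in ASP.NET?"))
    End Sub

End Module
```

In a page, the result of BuildQuery( "FREETEXT", txtSearchPhrase.Text ) could be assigned to strSearch in place of the concatenation. Escaping only keeps the literal intact; phrases made up entirely of ignored words can still raise an error, so the Try...Catch block remains useful.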
The page in Listing 14.10 is almost exactly the same as the page in Listing 14.9, except that the query uses the CONTAINS function rather than the FREETEXT function.

Performing Queries with Document Properties

When you perform a query with the Microsoft Indexing Service, you can use any of the document properties to limit the results returned by the query. For example, suppose that you want to return a list of all Microsoft Word documents that contain the word Apple and were authored by Stephen Walther. To do so, you could create the query as follows:

    SELECT FileName, Characterization
    FROM SCOPE()
    WHERE CONTAINS( 'Apple' )
    AND DocAuthor='Stephen Walther'
    AND DocAppName LIKE 'Microsoft Word%'
    ORDER BY RANK DESC

This query uses the DocAuthor property to limit the results to only those documents authored by Stephen Walther. A LIKE operator is used with the DocAppName property to return only Microsoft Word documents (any version).

NOTE

The LIKE operator enables you to do wildcard matches. Microsoft Indexing Service also supports regular expression matches through the MATCHES predicate.

To see a list of additional document properties you can use when performing a query, refer to the section "Retrieving Document Properties" earlier in the chapter.
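A search form often makes property filters such as the author and application name optional. The following is a minimal sketch of how the property query above might be assembled from optional inputs. The module and function names (PropertyQueryBuilder, Quote, BuildPropertyQuery) are hypothetical, and the sketch assumes that a single quote inside a literal is escaped by doubling it, which should be confirmed against the Indexing Service documentation.

```vb
Imports System
Imports System.Text

Module PropertyQueryBuilder

    ' Wraps a value in single quotes, doubling any embedded single quotes.
    ' Assumption: the Indexing Service SQL dialect escapes ' as ''.
    Function Quote(ByVal value As String) As String
        Return "'" & value.Replace("'", "''") & "'"
    End Function

    ' Builds a CONTAINS query and adds a property filter for each
    ' optional argument that is not empty. word is required.
    Function BuildPropertyQuery(ByVal word As String, _
                                ByVal author As String, _
                                ByVal appNamePrefix As String) As String
        Dim sql As New StringBuilder()
        sql.Append("SELECT FileName, Characterization FROM SCOPE() ")
        sql.Append("WHERE CONTAINS(" & Quote(word) & ") ")
        If author <> "" Then
            sql.Append("AND DocAuthor=" & Quote(author) & " ")
        End If
        If appNamePrefix <> "" Then
            ' The trailing % matches any version of the application name
            sql.Append("AND DocAppName LIKE " & Quote(appNamePrefix & "%") & " ")
        End If
        sql.Append("ORDER BY RANK DESC")
        Return sql.ToString()
    End Function

    Sub Main()
        ' Reproduces the property query shown in this section
        Console.WriteLine(BuildPropertyQuery("Apple", "Stephen Walther", "Microsoft Word"))
    End Sub

End Module
```

Passing an empty string for author or appNamePrefix simply omits that condition, so the same function can serve a form in which the property fields are left blank.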