Full Text Search Indexes


Full text search (FTS) is one of the more powerful and exciting features introduced in ADS 7.0. With FTS you can find records based on the words in your text fields, including the presence or absence of specific words, the number of instances of specific words, and the proximity of specific words to each other. For example, you can select all records in which the word “punctual” appears in a comment field. Alternatively, you can select all records in which the word “performance” appears near the word “database.”

Like an AOF, an FTS may be fully or partially optimized, or not optimized at all. When an FTS is not optimized, it means that ADS must read data from every record in the table, searching for the selection criteria. As you can imagine, unless your table is relatively small, an FTS that is not optimized can be very slow.

To have a fully or partially optimized FTS, you must create one or more FTS index orders. An FTS index order contains one key for each search word in each record. A fully optimized FTS can determine which records contain the search words from the FTS index alone. A partially optimized FTS refers to the records that might satisfy the search criteria, but ADS must read the individual records identified by the index to complete the operation.

For example, imagine that you want to search for all records that contain the word “database” in the Comments field. If you have created an FTS index order on the Comments field, the selection of which records contain this word can be performed exclusively through the index.

However, to find those records in which the word “performance” appears in proximity to the word “database,” an FTS is used initially to identify those records that contain both words in the Comments field, but a record by record examination of those identified records is necessary to make a final determination. Such a search is partially optimized.

You can create FTS index orders for both ADT tables and for FoxPro DBF tables that use CDX indexes. You cannot create FTS index orders for Clipper DBF tables that use NTX indexes. You can still perform FTS selections on Clipper tables, but they will not be optimized by definition.

Creating FTS Index Orders

You create FTS index orders using the Create New FTS Index page of the Index Management dialog box. This section demonstrates how to create FTS index orders by walking you through the process of creating an FTS index on a table. For this process to be meaningful, it is best if the table on which you are creating the FTS has either a large text field or a memo field. And although the simple CUST.ADT table that you have been working with up to this point does have a Comments field of type MEMO, that field contains very little data.

The following steps makes use of one of the free tables located on the CD-ROM associated with this book.

ON THE CD

Before you continue, you should copy the tables in the sample database from the CD-ROM to a directory on your computer’s hard drive. See Appendix B for a description of these files, including how you can download these files from the Internet if you have lost your CD-ROM.

Use the following steps to demonstrate the creation of an FTS index:

  1. Select File | Open Tables to display the Open Table(s) dialog box. Set Path to the directory into which you copied the sample ADT tables for this book, and set File(s) to CUSTOMER.ADT. Click OK. The CUSTOMER.ADT table is opened in the Table Browser, as shown in Figure 3-7.

    click to expand
    Figure 3-7: The CUSTOMER.ADT table opened in the Table Browser

  2. Click the Index Management button from the Table Browser’s toolbar.

  3. From the Index Management dialog box, select the Create New FTS Index tab. The Create New FTS Index page is displayed, as shown in Figure 3-8.

    click to expand
    Figure 3-8: The Create New FTS Index page of the Index Management dialog box

  4. You use the options on the Create New FTS Index page of the Index Management dialog box to set the options for your FTS index. Leave Index File Name set to CUSTOMER in order to create this FTS index in the structural index.

  5. Use the Index Key Field dropdown list to choose the Notes field for the FTS Index. This dropdown list only includes the names of the text fields for which you can build an FTS index. Select Notes.

  6. When you set Index Key Field to Notes, the Index Name field is automatically set to Notes as well. You can keep this index name, or set this index name to any other valid name. In this case, keep Index Name set to Notes.

  7. Leave all other options set to their defaults, and click Create Index to create the FTS index. How these FTS index options affect your FTS index order is discussed later in this section.

  8. Click Close.

Now that you have created your FTS index, you are ready to test it.

Testing FTS Index Orders

FTS index orders, unlike the expression-based indexes you created earlier in this chapter, are used for filtering records, and not for sorting data. Therefore, you test an FTS index by setting filters and executing queries. The following steps demonstrate how to test an FTS-based filter:

  1. With the CUSTOMER.ADT table open in the Table Browser, enter contains(Notes, 'birthday or anniversary') in the field below the Set Filter button and click Set Filter. (Note that the word “Notes” in the CONTAINS function is the name of the field on which to perform a full text search. It is not the index name.)

  2. The Table Browser responds by displaying only those records whose Notes field contains the words “birthday” or “anniversary” or both. You can double-click the Notes field to examine the memo and verify that at least one of the search words is present. You will also notice that the green circle appears, indicating that this is an optimized filter. Click Clear Filter.

  3. Change the filter to contains(*, 'train*') and click Set Filter.

  4. Once again a green circle indicates that an optimized FTS index was used. In this case, the asterisk (*) instructed ADS to use all FTS indexes available for this table in its search. Since there was only one FTS index, this had the same effect as when you specified the Notes field explicitly. Click Clear Filter.

  5. Finally, try searching on a field for which there is no FTS index order. Enter contains(Last Name, 'zap*') in the filter field and click Set Filter. This time the red circle indicates that a non-optimized filter was used. In this case, ADS searched the Last Name field record-by-record to locate the matching values.

The preceding example demonstrated how to test FTS filters. Use the following steps to demonstrate the use of FTS indexes in SQL SELECT statements:

  1. Begin by closing the Table Browser for the CUSTOMER.ADT table.

  2. Next, Select Tools | Native SQL Utility from the Advantage Data Architect main menu.

  3. Set Connection Type to Path, and use the browse button (…) to choose the directory into which you exported the CUSTOMER.ADT table.

  4. Click Connect to make a connection to this directory. You should see a message indicating that the connection succeeded. If not, fix your directory path and click Connect again.

  5. Enter the following SQL statement in the SQL editor pane located at the top-left corner of the Native SQL Utility:

    SELECT * FROM CUSTOMER WHERE CONTAINS(Notes, 'birthday NEAR card')

  6. Click the Execute SQL button. Two records are returned, as shown in Figure 3-9.

    click to expand
    Figure 3-9: A SQL query using full text search in the Native SQL Utility

  7. Now, edit the SQL statement to look like the following:

    SELECT * FROM CUSTOMER WHERE CONTAINS(Notes, 'birthday')   and SCORE(Notes, 'birthday') > 1
  8. Click the Execute SQL button again. This time only records where the word “birthday” appears more than once are displayed.

Options for FTS Index Orders

The Notes FTS index order created in the earlier section “Creating FTS Index Orders” used the default settings for FTS indexes. As you can see on the Create New FTS Index page of the Index Management dialog box (shown previously in Figure 3-8), there are many different options, and they can dramatically influence what keys the FTS index order contains.

Each of the options associated with FTS index orders is discussed briefly here. Note, however, that some of these options have somewhat complicated implications. For a complete discussion of FTS index order options, please refer to the Advantage documentation.

After an FTS index is created, you can view its options by selecting the index from the View Index Structures page of the Index Management dialog box. The options for the Notes index are shown in Figure 3-10. You should refer to this information if you have any questions about the default values for FTS indexes.

click to expand
Figure 3-10: The default settings used for the Notes FTS index order are visible on the View Index Structures page of the Index Management dialog box.

Minimum Word Length

Keys in an FTS index are not created for any words shorter than the minimum word length. The default minimum word length is 3.

Maximum Word Length

The maximum word length value identifies the longest key that will be stored in an FTS index order. If your text field contains a word longer than the maximum word length, that value is truncated to the maximum word length. Consequently, words longer than the maximum word length affect an FTS index, but only up to the maximum word length. The default maximum word length is 30.

Delimiters

Delimiters are the ANSI characters that separate words in a text-based field. For example, if you stored Java language code segments in a memo field, and wanted to search for specific Java class members, you would want to include the period (.) as a delimiter, since a period separates member references in Java’s dot notation.

The default delimiters are the following:

backspace

tab

newline

vertical tab

form feed

carriage return

space

If you want to use the default delimiters, leave the Use Defaults checkbox of the Delimiters section checked. If you want to define your own delimiters, uncheck Use Defaults and enter your own delimiters in the Additional field. To use the default delimiters in addition to some of your own, leave Use Defaults checked, and enter your additional delimiters in the Additional field.

Drop Characters

Drop characters are those that are unconditionally ignored when the index is being created. The default drop characters are the double quote, single quote, and apostrophe characters.

You can use the default drop characters, replace the drop characters with your own, or add your own to the defaults, as described in the preceding section, “Delimiters.”

Conditional Drop Characters

Conditional drop characters are those that are dropped only if they appear at the beginning or end of a word. A good example of a conditional drop character is a comma, which can appear in numbers, for example, 3,000. By making this character a conditional drop character, it will be dropped if it appears at the end of a word, as it did in the preceding phrase, but will be retained if it appears in a number.

The default conditional drop characters are the following:

comma

period

question mark

exclamation point

semicolon

colon

at sign (@)

pound or hash sign (#)

dollar sign

percent sign

caret or hat (^)

ampersand

open paren

close paren

n-dash or minus sign (–)

underscore

You can use the default conditional drop characters, replace the conditional drop characters with your own, or add your own to the defaults, as described in the earlier section “Delimiters.”

Keep Score

Score refers to the number of times a key appears in a text-based field. If you use the SCORE function on a field regularly, you can improve the speed of your FTS index order by checking the Keep Score checkbox. Doing so causes ADS to include the score in the index.

Including the score in the index means that ADS must constantly calculate the score as records are updated. If you do not keep score, and use the SCORE function, ADS uses the index to locate the records in which the word or words appear, but then must read the individual records to calculate the score at runtime.

Scores are not included in FTS index orders by default.

Case Sensitive

By default, the keys of an FTS index order are not case sensitive. If you want case- sensitive index orders, check the Case Sensitive checkbox.

Fixed

A fixed FTS index order is not maintained, meaning that the index is not updated during write operations to the associated table. The index is only rebuilt by explicit request.

If you make frequent changes to text fields in a table, and only occasionally need to perform full text searches, you may be able to improve the speed of table updates by using a fixed FTS index, requesting that the index be rebuilt in code prior to performing a full text search.

FTS index orders are maintained by default.

Protect Numbers

Protect numbers is used when you need to include periods and/or commas as delimiter characters. Consider the previous example of Java code. If you added a period as a delimiter, the floating-point constant 1.000 would be divided into two values, 1 and 000. If periods or commas are used as delimiters, check the Protect Numbers checkbox to treat numbers that include these delimiters as the original numbers.

Protect numbers is not enabled by default.

Noise Words

Noise words are common words that you do not want to appear as keys in an FTS index order. For example, it is extremely unlikely that you would search a text field for the word “the.” To prevent this word from being treated as a content word, it is included as a default noise word.

The default noise words are as follows:

about

after

all

also

and

another

any

are

because

been

before

being

between

both

but

came

can

come

could

did

does

each

else

for

from

get

got

had

has

have

her

here

him

himself

his

how

into

its

just

like

make

many

might

more

most

much

must

never

now

only

other

our

out

over

said

same

see

should

since

some

still

such

take

than

that

the

their

them

then

there

these

they

this

those

through

too

under

use

very

want

was

way

well

were

what

when

where

which

while

who

will

with

would

you

your

You can add additional noise words by entering them into the Additional area of the Noise Words section, separated by spaces. If you want to specify your own set of noise words, uncheck the Use Defaults checkbox.

In the next chapter you will learn how to create and use data dictionaries.




Advantage Database Server. The Official Guide
Advantage Database Server: The Official Guide
ISBN: 0072230843
EAN: 2147483647
Year: 2002
Pages: 129

Similar book on Amazon

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net