Dig Server | Red Hat Enterprise Linux & Fedora Edition (DVD): The Complete Reference

Dig, known officially as ht://Dig, is a Web indexing and search system designed for small networks or intranets. Dig is not intended as a replacement for full-scale Internet search systems, such as Lycos, Infoseek, or AltaVista. Unlike Web server-based search engines, Dig can span several Web servers at a site. Dig was developed at San Diego State University and is distributed free under the GNU General Public License. You can obtain information, documentation, and software packages at www.htdig.org.

Dig Searches

Dig supports simple and complex searches, including complex Boolean and fuzzy search methods. Fuzzy searching supports a number of search algorithms, including exact, soundex, and synonyms. Searches can be carried out on both text and HTML documents. HTML documents can have keywords placed in them for more accurate retrieval, and you can also use HTML templates to control how results are displayed.

Searches can be constrained by authentication requirements, location, and search depth. To index documents in password-protected directories, Dig can be told to use a specific username and password. You can also restrict a search to retrieve documents under a certain URL, to search subsections of the database, or to retrieve only documents that are a specified number of links away.

Dig Configuration

All the ht://Dig programs use the same configuration file, htdig.conf, located in the /etc/htdig directory. The configuration file consists of attribute entries, each made up of the attribute name, a colon, and then the value. Each program takes only the attributes it needs. The following entry sets the max_head_length attribute:

max_head_length: 10000

You can specify attributes such as allow_virtual_hosts, which indexes virtual hosts as separate servers, and search_algorithm, which specifies the search algorithms to use for searches.
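The entry format described above is simple enough to read with a few lines of code. The following Python sketch is only an illustration, not part of ht://Dig: the function name parse_htdig_conf is hypothetical, the sample values are made up, and the handling of # comment lines is an assumption. It splits each entry at the first colon, so values that themselves contain colons stay intact.

```python
def parse_htdig_conf(text):
    """Parse 'attribute: value' entries into a dict (simplified sketch)."""
    attributes = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (assumed here to start with '#').
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so a value may contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not an attribute entry
        attributes[name.strip()] = value.strip()
    return attributes


# Illustrative entries only; the values are not recommendations.
sample = """\
# sample htdig.conf entries
max_head_length: 10000
allow_virtual_hosts: true
search_algorithm: exact:1 synonyms:0.5
"""

config = parse_htdig_conf(sample)
print(config["max_head_length"])
print(config["search_algorithm"])
```

This sketch ignores any other syntax the real configuration file may support, so treat it as a reading aid rather than a replacement for the ht://Dig parser.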

Dig Tools

Dig consists of five programs: htdig, htmerge, htfuzzy, htnotify, and htsearch. htdig, htmerge, and htfuzzy generate the index, while htsearch performs the actual searches. First, htdig gathers information for your database, following the links in your domain and associating Web pages with terms. The htmerge program uses this information to create a searchable database, merging in the information from any previously generated database. htfuzzy creates indexes to allow searches using fuzzy algorithms, such as soundex and synonyms. Once the database is created, users can use Web pages that invoke htsearch to search this index. Results are listed on a Web page.

You can use META tags in your HTML documents to enter specific htdig keywords, exclude a document from indexing, or provide notification information such as an e-mail address and an expiration date. htnotify uses the e-mail address and expiration date to notify Web page authors when their pages are out of date.
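As a rough illustration of the META tags mentioned above, the following Python sketch generates tag strings for a page. The tag names used here (htdig-keywords, htdig-noindex, htdig-email, and htdig-notification-date) are assumptions that should be checked against the ht://Dig documentation, as should the date format your version expects. The function name htdig_meta_tags and the sample values are hypothetical.

```python
from html import escape


def htdig_meta_tags(keywords=None, email=None, notification_date=None, noindex=False):
    """Return a list of META tag strings for one HTML document (sketch)."""
    tags = []
    if noindex:
        # Assumed tag name for excluding a document from indexing.
        tags.append('<META NAME="htdig-noindex">')
    if keywords:
        # Keywords are joined with spaces here; the separator is an assumption.
        content = escape(" ".join(keywords), quote=True)
        tags.append(f'<META NAME="htdig-keywords" CONTENT="{content}">')
    if email:
        tags.append(f'<META NAME="htdig-email" CONTENT="{escape(email, quote=True)}">')
    if notification_date:
        # The date string is passed through unchanged.
        date = escape(notification_date, quote=True)
        tags.append(f'<META NAME="htdig-notification-date" CONTENT="{date}">')
    return tags


for tag in htdig_meta_tags(
    keywords=["linux", "fedora"],
    email="author@example.com",
    notification_date="2005-12-31",
):
    print(tag)
```

The resulting lines would go in the HEAD section of the document they describe.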

The htsearch program is a CGI program that expects to be invoked by an HTML form, and it accepts both the GET and POST methods of passing data. The htsearch program can accept a search request from any form containing the required configuration values. Values include search features such as config (configuration file), method (search method), and sort (sort criteria). For the Web page form that invokes htsearch, you can use the default page provided by htdig or create your own. Output is formatted using templates you can modify. Several sample files are included with the htdig software: rundig is a sample script for creating a database, searchform.html is a sample HTML document that contains a search form for submitting htdig searches, header.html is a sample header for search results, and footer.html is a sample footer for search results.
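Because htsearch accepts the GET method, a search request can be expressed as a URL with the form values in the query string. The following Python sketch shows the idea using the config, method, and sort values named above. The words field for the search terms, the example values (htdig, and, score), the host name, and the /cgi-bin/htsearch path are all assumptions made for illustration; check them against your own installation.

```python
from urllib.parse import urlencode


def build_htsearch_url(base_url, words, config="htdig", method="and", sort="score"):
    """Build a GET URL for an htsearch request (sketch with assumed values)."""
    params = {
        "config": config,  # configuration file to use
        "method": method,  # search method
        "sort": sort,      # sort criteria
        "words": words,    # search terms (field name is an assumption)
    }
    # urlencode escapes the values and joins them as key=value pairs.
    return base_url + "?" + urlencode(params)


url = build_htsearch_url("http://www.example.com/cgi-bin/htsearch", "fedora linux")
print(url)
```

A search form that uses the GET method produces the same kind of request when a user submits it.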
