Verity's proximity operators specify how close together search words must be within a document for it to count as a match. For example, if you are looking for rules about where smoking is permitted, you might want only documents that have the words smoking and permitted sitting pretty close to one another within the actual text. A document that has the word smoking at the beginning and the word permitted way at the end probably won't interest you. The proximity operators include NEAR, NEAR/N, PARAGRAPH, and SENTENCE.
The NEAR operator specifies that you are most interested in those documents in which the search words are closest together. Verity considers all documents in which the words are within 1,000 words of each other to be "found," but the closer together the words are, the higher the document's score, so it appears nearer the top of the results. The following is an example:
CRITERIA="smoking <NEAR> permitted"
The NEAR/N operator is just like NEAR, except that you get to specify how close together the words must be to qualify as a match. This operator still ranks documents based on the closeness of the words. In reality, NEAR is just shorthand for NEAR/1000. Some examples of the NEAR/N operator are as follows:
CRITERIA="smoking <NEAR/3> permitted"
CRITERIA="<NEAR/3>(smoking,permitted)"
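To make the idea of "within N words" and "closer ranks higher" concrete, here is a minimal Python sketch. It is not Verity's implementation: the function names are hypothetical, the tokenizer is deliberately naive, and counting distance as the difference between word positions is an assumption about how such a measure might work.

```python
import re

def min_word_distance(text, first, second):
    """Smallest gap in word positions between the two words, or None if either is missing."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    if not first_positions or not second_positions:
        return None
    return min(abs(a - b) for a in first_positions for b in second_positions)

def near_n(text, first, second, n=1000):
    """Illustrative stand-in for <NEAR/N>; the default of 1000 mirrors plain NEAR."""
    distance = min_word_distance(text, first, second)
    return distance is not None and distance <= n

def rank_by_closeness(docs, first, second, n=1000):
    """Keep only matching documents and list the closest pairings first."""
    matches = [(min_word_distance(d, first, second), d)
               for d in docs if near_n(d, first, second, n)]
    return [d for _, d in sorted(matches, key=lambda pair: pair[0])]
```

Calling rank_by_closeness with a list of document strings and the words smoking and permitted returns only the documents that contain both words within n positions, ordered so that the tightest pairing comes first.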
The PARAGRAPH and SENTENCE operators specify that the words need to be in the same paragraph or sentence, respectively. Sometimes these work better than NEAR or NEAR/N because the words are then related by their linguistic context, not merely by how close together they sit in the text. Some examples follow:
CRITERIA="smoking <PARAGRAPH> permitted"
CRITERIA="<SENTENCE>(smoking,permitted)"
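The difference between sentence-level and paragraph-level matching can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper names are hypothetical, sentences are split naively on ., !, or ? followed by whitespace, and paragraphs are assumed to be separated by blank lines. Verity's own boundary detection is not described here and may well differ.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_paragraphs(text):
    """Assume paragraphs are separated by one or more blank lines."""
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p]

def words_share_unit(units, first, second):
    """True if any single unit (sentence or paragraph) contains both words."""
    for unit in units:
        words = set(re.findall(r"[a-z0-9']+", unit.lower()))
        if first.lower() in words and second.lower() in words:
            return True
    return False
```

For the text "Smoking areas are listed below. Pets are not permitted." the sentence-level check fails, because the two words fall in different sentences, while the paragraph-level check succeeds.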
The PHRASE operator enables you to search for a phrase. A phrase consists of two or more words in a specific order, as in this example:
CRITERIA="<PHRASE>(not permitted) <OR> (not allowed)"
Another proximity operator, IN, enables you to search HTML, XML, and SGML documents and limit your search to certain tag bodies or zones. For example, suppose you want to search for specific film descriptions in the following XML file:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<films>
  <film>
    <filmtitle>Horror From the Deep</filmtitle>
    <filmgenre>Horror</filmgenre>
    <filmdesc>Grade B monster-flick rip-off of Creature the Depths</filmdesc>
  </film>
  <film>
    <filmtitle>Dracula Sucks</filmtitle>
    <filmgenre>Horror</filmgenre>
    <filmdesc>Cheesy horror flick/black comedy about a disrespected vampire.</filmdesc>
  </film>
  <film>
    <filmtitle>Little Ship of Horror</filmtitle>
    <filmgenre>Comedy</filmgenre>
    <filmdesc>Schlocky takeoff on the classic from the '50s featuring Seymour Krelboing.</filmdesc>
  </film>
</films>
If you wanted to search only the <filmgenre> tag body for the word Horror, the following search would do it:
CRITERIA="Horror <IN> filmgenre"
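As a rough illustration of what restricting a search to one zone means, the following Python sketch uses the standard library's xml.etree.ElementTree to look for a word only inside a named tag. It is not how Verity indexes zones: the function name is hypothetical, the sample data is a trimmed copy of the film listing with the XML declaration left out, and matching is a simple case-insensitive comparison on whitespace-separated words.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample film data (XML declaration omitted for brevity).
FILMS_XML = """<films>
  <film>
    <filmtitle>Horror From the Deep</filmtitle>
    <filmgenre>Horror</filmgenre>
    <filmdesc>Grade B monster-flick rip-off</filmdesc>
  </film>
  <film>
    <filmtitle>Dracula Sucks</filmtitle>
    <filmgenre>Horror</filmgenre>
    <filmdesc>Cheesy horror flick/black comedy about a disrespected vampire.</filmdesc>
  </film>
  <film>
    <filmtitle>Little Ship of Horror</filmtitle>
    <filmgenre>Comedy</filmgenre>
    <filmdesc>Schlocky takeoff on the classic from the '50s.</filmdesc>
  </film>
</films>"""

def titles_where_zone_contains(xml_text, zone, word):
    """Return film titles whose <zone> element text contains the given word."""
    root = ET.fromstring(xml_text)
    titles = []
    for film in root.iter("film"):
        # Collect text only from the requested zone, ignoring every other tag.
        zone_text = " ".join(el.text or "" for el in film.iter(zone))
        if word.lower() in zone_text.lower().split():
            titles.append(film.findtext("filmtitle"))
    return titles
```

Searching the filmgenre zone for Horror skips Little Ship of Horror, even though the word appears in its title, because that film's genre is Comedy.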