SPEDIS Function



Determines the likelihood of two words matching, expressed as the asymmetric spelling distance between the two words

Category: Character

Syntax

SPEDIS ( query , keyword )

Arguments

query

  • identifies the word to query for the likelihood of a match. SPEDIS removes trailing blanks before comparing the value.

keyword

  • specifies a target word for the query. SPEDIS removes trailing blanks before comparing the value.

Details

SPEDIS returns the distance between the query and a keyword, a nonnegative value that is usually less than 100 but never greater than 200 with the default costs.

SPEDIS computes an asymmetric spelling distance between two words as the normalized cost for converting the keyword to the query word by using a sequence of operations. SPEDIS( QUERY , KEYWORD ) is not the same as SPEDIS( KEYWORD , QUERY ).
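The asymmetry can be checked with a minimal sketch such as the following DATA step, which calls SPEDIS twice with the arguments swapped. The variable names D1 and D2 are illustrative only. The first call corresponds to the truncate case in the example later in this section. Going by the cost table, the second call would presumably be priced as an append, so the two values are expected to differ.

```sas
data _null_;
   /* d1: cost of converting the keyword 'fuzzy' to the query 'fuzz' */
   d1 = spedis('fuzz', 'fuzzy');
   /* d2: arguments swapped, so the keyword is now 'fuzz' */
   d2 = spedis('fuzzy', 'fuzz');
   /* write both distances to the log for comparison */
   put d1= d2=;
run;
```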

Costs for each operation that is required to convert the keyword to the query are as follows:

Operation   Cost   Explanation
match          0   no change
singlet       25   delete one of a double letter
doublet       50   double a letter
swap          50   reverse the order of two consecutive letters
truncate      50   delete a letter from the end
append        35   add a letter to the end
delete        50   delete a letter from the middle
insert       100   insert a letter in the middle
replace      100   replace a letter in the middle
firstdel     100   delete the first letter
firstins     200   insert a letter at the beginning
firstrep     200   replace the first letter

The distance is the sum of the costs divided by the length of the query. If this ratio is greater than one, the result is rounded down to the nearest whole number.
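As a worked illustration of this rule, the sketch below reproduces the arithmetic for a single append operation by hand instead of calling SPEDIS. The variable names are illustrative, and the use of FLOOR is an assumption about how the rounding down can be mimicked. The cost of 35 and the query `fuzzys` are taken from the cost table and from the example in this section.

```sas
data _null_;
   length query $ 8;
   query = 'fuzzys';               /* query from the append case */
   cost  = 35;                     /* cost of one append operation */
   ratio = cost / length(query);   /* 35 / 6, the unrounded ratio */
   distance = floor(ratio);        /* ratio exceeds one, so round down */
   put ratio= distance=;
run;
```

The resulting distance of 5 agrees with the value that the example output reports for the append case.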

Comparisons

The SPEDIS function is similar to the COMPLEV and COMPGED functions, but COMPLEV and COMPGED are much faster, especially for long strings.
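To compare the three functions on the same pair of words, a sketch along the following lines can be used. The variable names are illustrative. Each function applies its own cost scheme, so the three values are on different scales and are not expected to be equal.

```sas
data _null_;
   query   = 'fuzy';
   keyword = 'fuzzy';
   spedis_dist  = spedis(query, keyword);    /* asymmetric spelling distance */
   complev_dist = complev(query, keyword);   /* Levenshtein edit distance */
   compged_dist = compged(query, keyword);   /* generalized edit distance */
   put spedis_dist= complev_dist= compged_dist=;
run;
```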

Examples

options nodate pageno=1 linesize=64;

/* Compute the spelling distance for each query and keyword pair. */
data words;
   input Operation $ Query $ Keyword $;
   Distance = spedis(query,keyword);
   /* Approximate total cost; inexact because Distance is rounded down. */
   Cost = distance * length(query);
   datalines;
match        fuzzy       fuzzy
singlet      fuzy        fuzzy
doublet      fuuzzy      fuzzy
swap         fzuzy       fuzzy
truncate     fuzz        fuzzy
append       fuzzys      fuzzy
delete       fzzy        fuzzy
insert       fluzzy      fuzzy
replace      fizzy       fuzzy
firstdel     uzzy        fuzzy
firstins     pfuzzy      fuzzy
firstrep     wuzzy       fuzzy
several      floozy      fuzzy
;

proc print data = words;
run;

The output from the program is as follows.

Output 4.50: Costs for SPEDIS Operations.
                       The SAS System                          1

Obs    Operation    Query     Keyword    Distance    Cost

  1    match        fuzzy     fuzzy             0       0
  2    singlet      fuzy      fuzzy             6      24
  3    doublet      fuuzzy    fuzzy             8      48
  4    swap         fzuzy     fuzzy            10      50
  5    truncate     fuzz      fuzzy            12      48
  6    append       fuzzys    fuzzy             5      30
  7    delete       fzzy      fuzzy            12      48
  8    insert       fluzzy    fuzzy            16      96
  9    replace      fizzy     fuzzy            20     100
 10    firstdel     uzzy      fuzzy            25     100
 11    firstins     pfuzzy    fuzzy            33     198
 12    firstrep     wuzzy     fuzzy            40     200
 13    several      floozy    fuzzy            50     300
 

See Also

Functions:

  • 'COMPLEV Function'

  • 'COMPGED Function'




SAS 9.1 Language Reference Dictionary, Volumes 1, 2 and 3
Year: 2004
Pages: 704
