Compares two strings by computing the Levenshtein edit distance
Category: Character
COMPLEV(string-1, string-2 <, cutoff> <, modifiers>)
string-1
specifies a character constant, variable, or expression.
string-2
specifies a character constant, variable, or expression.
cutoff
specifies a numeric constant, variable, or expression. If the actual Levenshtein edit distance is greater than the value of cutoff, the value that is returned is equal to the value of cutoff.
Tip: Using a small value of cutoff improves the efficiency of COMPLEV if the values of string-1 and string-2 are long.
modifiers
specifies a character string that can modify the action of the COMPLEV function. You can use one or more of the following characters as a valid modifier:
i or I | ignores the case in string-1 and string-2. |
l or L | removes leading blanks in string-1 and string-2 before comparing the values. |
n or N | removes quotation marks from any argument that is an n-literal and ignores the case of string-1 and string-2. |
: (colon) | truncates the longer of string-1 or string-2 to the length of the shorter string, or to one, whichever is greater. |
Tip: COMPLEV ignores blanks that are used as modifiers.
The order in which the modifiers appear in the COMPLEV function is relevant.
'LN' first removes leading blanks from each string and then removes quotation marks from n-literals.
'NL' first removes quotation marks from n-literals and then removes leading blanks from each string.
The COMPLEV function ignores trailing blanks.
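As a brief illustration of the optional arguments, the following DATA step is a minimal sketch and is not part of the original example. The variable names d_full, d_capped, d_case, and d_trunc are hypothetical, and the string values are borrowed from the example data in this section. The sketch assumes that a numeric third argument is read as cutoff and that a character third or fourth argument is read as modifiers.

```sas
data _null_;
   /* No cutoff: the full Levenshtein edit distance is returned. */
   d_full = complev('WaXbYcZ', 'abc');

   /* cutoff=2: if the actual distance is greater than 2, 2 is returned. */
   d_capped = complev('WaXbYcZ', 'abc', 2);

   /* Both optional arguments: a cutoff of 10 and the modifier i (ignore case). */
   d_case = complev('aBc', 'AbC', 10, 'i');

   /* Modifier : truncates the longer string to the length of the shorter one. */
   d_trunc = complev('XYZ', 'abcdef', ':');

   /* Write the four results to the SAS log. */
   put d_full= d_capped= d_case= d_trunc=;
run;
```

The example output in this section reports a distance of 4 for WaXbYcZ and abc, so d_capped is expected to equal the cutoff value of 2. Likewise, d_case is expected to be 0, which matches the result that is shown for aBc and AbC with the i modifier.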
COMPLEV returns the Levenshtein edit distance between string-1 and string-2. Levenshtein edit distance is the number of insertions, deletions, or replacements of single characters that are required to convert one string to the other. Levenshtein edit distance is symmetric. That is, COMPLEV(string-1,string-2) is the same as COMPLEV(string-2,string-1).
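The three kinds of single-character edits, and the symmetry of the distance, can be sketched with the short DATA step below. This is an illustrative sketch only. The variable names ins, del, and rep are hypothetical, and the string values are taken from the example data in this section.

```sas
data _null_;
   /* One insertion (x) converts abc to abxc. */
   ins = complev('abc', 'abxc');

   /* Same pair with the arguments swapped: one deletion converts abxc to abc. */
   del = complev('abxc', 'abc');

   /* One replacement (X becomes b) converts aXc to abc. */
   rep = complev('aXc', 'abc');

   /* Write the three results to the SAS log. */
   put ins= del= rep=;
run;
```

Each pair differs by exactly one edit, so all three values are expected to be 1. Because the distance is symmetric, ins and del are expected to agree.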
The Levenshtein edit distance that is computed by COMPLEV is a special case of the generalized edit distance that is computed by COMPGED.
COMPLEV executes much more quickly than COMPGED.
The following example compares two strings by computing the Levenshtein edit distance.
options pageno=1 nodate ls=80 ps=60;

data test;
   infile datalines missover;
   input string1 $char8. string2 $char8. modifiers $char8.;
   result=complev(string1, string2, modifiers);
   datalines;
1234567812345678
abc     abxc
ac      abc
aXc     abc
aXbZc   abc
aXYZc   abc
WaXbYcZ abc
XYZ     abcdef
aBc     abc
aBc     AbC     i
  abc   abc
  abc   abc     l
AxC     'abc'n
AxC     'abc'n  n
;

proc print data=test;
run;
The following output shows the results.
                         The SAS System                        1

Obs    string1     string2     modifiers    result

  1    12345678    12345678                    0
  2    abc         abxc                        1
  3    ac          abc                         1
  4    aXc         abc                         1
  5    aXbZc       abc                         2
  6    aXYZc       abc                         3
  7    WaXbYcZ     abc                         4
  8    XYZ         abcdef                      6
  9    aBc         abc                         1
 10    aBc         AbC         i               0
 11      abc       abc                         2
 12      abc       abc         l               0
 13    AxC         'abc'n                      6
 14    AxC         'abc'n      n               1
Functions and CALL Routines:
'COMPARE Function' on page 445
'COMPGED Function' on page 449
'CALL COMPCOST Routine' on page 342