2.7. Performing Specialized String Comparisons

Ruby has built-in ideas about comparing strings; comparisons are done lexicographically as we have come to expect (that is, based on character set order). But if we want, we can introduce rules of our own for string comparisons, and these can be of arbitrary complexity.

For example, suppose that we want to ignore the English articles "a", "an", and "the" at the front of a string, and we also want to ignore most common punctuation marks. We can do this by overriding the built-in method <=> (which is called for <, <=, >, and >=). Listing 2.1 shows how we do this.

Listing 2.1. Specialized String Comparisons

class String
  alias old_compare <=>

  def <=>(other)
    a = self.dup
    b = other.dup
    # Remove punctuation
    a.gsub!(/[\,\.\?\!\:\;]/, "")
    b.gsub!(/[\,\.\?\!\:\;]/, "")
    # Remove initial articles
    a.gsub!(/^(a |an |the )/i, "")
    b.gsub!(/^(a |an |the )/i, "")
    # Remove leading/trailing whitespace
    a.strip!
    b.strip!
    # Use the old <=>
    a.old_compare(b)
  end
end

title1 = "Calling All Cars"
title2 = "The Call of the Wild"

# Ordinarily this would print "yes"
if title1 < title2
  puts "yes"
else
  puts "no"         # But now it prints "no"
end

Note that we "save" the old <=> with an alias and then call it at the end. This is because if we tried to use the < method, it would call the new <=> rather than the old one, resulting in infinite recursion and a program crash.
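The ordering rules of Listing 2.1 can also be tried out on a list of titles without reopening String. The following is a minimal sketch under that assumption: title_key is a hypothetical helper name (it does not appear in the listing), and it assumes single-line titles, so it anchors the article pattern with \A rather than ^.

```ruby
# Hypothetical helper, not part of Listing 2.1: returns the text that the
# comparison should actually look at.
def title_key(title)
  key = title.gsub(/[,.?!:;]/, "")          # remove punctuation
  key = key.sub(/\A(a |an |the )/i, "")     # remove one leading article
  key.strip                                 # remove leading/trailing whitespace
end

titles = [
  "The Call of the Wild",
  "Calling All Cars",
  "A Tale of Two Cities",
  "An American Tragedy"
]

plain  = titles.sort                          # ordinary String ordering
custom = titles.sort_by { |t| title_key(t) }  # ordering by the cleaned-up key

puts custom
# An American Tragedy
# The Call of the Wild
# Calling All Cars
# A Tale of Two Cities
```

Under the plain ordering, "The Call of the Wild" sorts last; under the custom rules, it moves ahead of "Calling All Cars", which matches the "no" printed by the listing.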

Note also that the == operator does not go through <=>. String defines its own == rather than relying on the one mixed in from Comparable (which would call <=>). This means that if we need to check equality in some specialized way, we will have to override the == method separately. But in this case, == works as we want it to anyhow.
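The difference between == and <=> can be observed directly. This sketch uses a hypothetical subclass named Title so that the experiment stays contained (the listing itself reopens String), and it applies only the article rule as a reduced stand-in for the full comparison.

```ruby
# Hypothetical subclass used only for this experiment.
class Title < String
  ARTICLE = /\A(a |an |the )/i

  # Compare the plain-String contents with any leading article removed.
  # to_s yields an ordinary String, so the built-in String#<=> does the
  # final comparison and there is no recursion.
  def <=>(other)
    to_s.sub(ARTICLE, "") <=> other.to_s.sub(ARTICLE, "")
  end
end

t1 = Title.new("The Call of the Wild")
t2 = Title.new("Call of the Wild")

same_order = (t1 <=> t2)   # 0: equal under the custom ordering
same_text  = (t1 == t2)    # false: String#== compares the raw contents
at_most    = (t1 <= t2)    # true: <= comes from Comparable and uses <=>

puts same_order
puts same_text
puts at_most
```

The two titles tie under <=> yet are not ==, which is the situation the text describes: equality needs its own override if it is supposed to follow the custom rules.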

Suppose that we wanted to do case-insensitive string comparisons. The built-in method casecmp will do this; we just have to make sure that it is used instead of the usual comparison.

Here is one way:

class String
  def <=>(other)
    casecmp(other)
  end
end


But there is a slightly easier way:

class String
  alias <=> casecmp
end


However, we haven't finished. We need to redefine == so that it will behave in the same way:

class String
  def ==(other)
    casecmp(other) == 0
  end
end


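Now string comparisons made with ==, <=>, and the operators that Comparable builds on <=> (<, <=, >, and >=) will be case-insensitive. Any sorting operation that depends on <=> will likewise be case-insensitive. Methods that we have not redefined, such as eql?, still distinguish case, and so do hash-key lookups, which rely on eql? and hash.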
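To see the combined effect of the two redefinitions without changing every String in a program, the same method bodies can be placed in a subclass. This is only a sketch: Caseless is a hypothetical class name, and the sort is given an explicit block so that the redefined <=> is certain to be the method consulted.

```ruby
# Hypothetical subclass: same method bodies as in the text, but confined
# to Caseless objects instead of all Strings.
class Caseless < String
  def <=>(other)
    casecmp(other)
  end

  def ==(other)
    casecmp(other) == 0
  end
end

raw   = %w[banana Cherry apple]
words = raw.map { |w| Caseless.new(w) }

plain_sorted    = raw.sort                                  # uppercase sorts first
caseless_sorted = words.sort { |a, b| a <=> b }.map(&:to_s)

equal_ignoring_case = (Caseless.new("Ruby") == "RUBY")
still_case_aware    = Caseless.new("Ruby").eql?("RUBY")

puts caseless_sorted        # apple, banana, Cherry (one per line)
puts equal_ignoring_case    # true
puts still_case_aware       # false: eql? was not redefined
```

The ordinary sort places "Cherry" first because uppercase letters precede lowercase ones in character-set order; the case-insensitive ordering puts it last.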




The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming (2nd Edition)
ISBN: 0672328844
EAN: 2147483647
Year: 2004
Pages: 269
Authors: Hal Fulton