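2.8. Tokenizing a String

The split method parses a string and returns an array of tokens. It accepts two optional parameters: a delimiter and a field limit (an integer).

The delimiter defaults to whitespace. Strictly speaking, the default is the value of $; (or its English equivalent $FIELD_SEPARATOR, available after require 'English'), which is nil unless it has been set; when it is nil, split breaks on whitespace. If the delimiter is a string, the literal value of that string is used as the token separator. If it is a regular expression, each match is treated as a separator.

    s1 = "It was a dark and stormy night."
    words = s1.split          # ["It", "was", "a", "dark", "and",
                              #  "stormy", "night."]

    s2 = "apples, pears, and peaches"
    list = s2.split(", ")     # ["apples", "pears", "and peaches"]

    s3 = "lions and tigers and bears"
    zoo = s3.split(/ and /)   # ["lions", "tigers", "bears"]

The limit parameter places an upper limit on the number of fields returned, according to these rules:

1. If it is omitted, trailing empty entries are suppressed.
2. If it is a positive number, at most that many fields are returned, with the rest of the string left unsplit in the last field. Trailing empty entries are retained.
3. If it is a negative number, there is no limit on the number of fields, and trailing empty entries are retained.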
These three rules are illustrated here:

    str = "alpha,beta,gamma,,"
    list1 = str.split(",")      # ["alpha", "beta", "gamma"]
    list2 = str.split(",",2)    # ["alpha", "beta,gamma,,"]
    list3 = str.split(",",4)    # ["alpha", "beta", "gamma", ","]
    list4 = str.split(",",8)    # ["alpha", "beta", "gamma", "", ""]
    list5 = str.split(",",-1)   # ["alpha", "beta", "gamma", "", ""]

The scan method can be used to match regular expressions or strings against a target string:

    str = "I am a leaf on the wind..."

    # A string is interpreted literally, not as a regex
    arr = str.scan("a")         # ["a", "a", "a"]

    # A regex will return all matches
    arr = str.scan(/\w+/)       # ["I", "am", "a", "leaf", "on", "the", "wind"]

    # A block can be specified
    str.scan(/\w+/) {|x| puts x }

The StringScanner class, from the standard library, is different in that it maintains state for the scan rather than doing it all at once:

    require 'strscan'
    str = "Watch how I soar!"
    ss = StringScanner.new(str)
    loop do
      word = ss.scan(/\w+/)     # Grab a word at a time
      break if word.nil?
      puts word
      sep = ss.scan(/\W+/)      # Grab next non-word piece
      break if sep.nil?
    end
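Because StringScanner remembers its position, it lends itself to tokenizers that try several patterns in turn at the current point in the string. The following is only a minimal sketch of that idea: the tokenize method, the token type symbols (:number, :ident, :op), and the small set of operators are assumptions made up for this illustration and are not part of the standard library.

```ruby
require 'strscan'

# Hypothetical helper (not a library method): break a simple
# expression into [type, text] pairs using a StringScanner.
def tokenize(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      # Whitespace only separates tokens; nothing is recorded
      next
    elsif (text = ss.scan(/\d+/))
      tokens << [:number, text]
    elsif (text = ss.scan(/[a-zA-Z_]\w*/))
      tokens << [:ident, text]
    elsif (text = ss.scan(/[-+*\/()=]/))
      tokens << [:op, text]
    else
      # No pattern matched at the current position
      raise ArgumentError,
            "unexpected character #{ss.peek(1).inspect} at position #{ss.pos}"
    end
  end
  tokens
end

p tokenize("total = price * 12")
# [[:ident, "total"], [:op, "="], [:ident, "price"], [:op, "*"], [:number, "12"]]
```

Each call to scan or skip matches only at the scanner's current position and advances past the match, so the order of the branches decides which pattern wins when more than one could apply. Unlike split, this approach keeps the separators available and can report where an unrecognized character occurred.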