This one isn't really Ruby's fault

May 15, 2008 19:04

Consider the following regular expression, designed to validate a tab-separated string and extract data from it:

>> t = "a\tx\tb"
=> "a\tx\tb"
>> re = Regexp.new("a\t\S\tb")
=> /a	S	b/
>> re.match(t)
=> nil
>> re = Regexp.new("a\t[^\t]\tb")
=> /a	[^	]	b/
>> re.match(t)
=> #<MatchData "a\tx\tb">
This, at first, had me glaring at the docs for Ruby's regular expressions to make sure \S was implemented as I expected. Both patterns ought to accept a lone x between the tabs (if anything, [^\t] is the looser of the two, since it also accepts spaces), yet only the second one matched. After a few minutes' fuming, I got the inklings of an idea and tried the following:

>> re = Regexp.new('a\t\S\tb')
=> /a\t\S\tb/
>> re.match(t)
=> #<MatchData "a\tx\tb">
>> re = /a\t\S\tb/
=> /a\t\S\tb/
>> re.match(t)
=> #<MatchData "a\tx\tb">
Then it hit me. Ruby's double-quoted strings interpret escape sequences before Regexp.new ever sees them. \S is not a string escape, so the backslash is silently dropped and the regex ends up looking for a literal S. The \t only worked by coincidence: it evaluates to a literal tab character, which a regex treats the same as \t. So in my original case (turning a bunch of strings into regexen) I could use single-quoted strings, if I didn't have to interpolate values into them. Unfortunately, I do. And for reasons unrelated to this post, I can't use regex literal syntax, which allows interpolation but leaves escape sequences for the regex engine to interpret.
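
To make the difference concrete, here is a minimal sketch of how each quoting style treats the pattern. The last case is one possible workaround rather than anything from my actual code: it assumes you can double the backslashes in the double-quoted string, and the variable last_field is a made-up stand-in for whatever value needs interpolating.

```ruby
target = "a\tx\tb"

# Double quotes: \t becomes a real tab, but \S is not a string escape,
# so the backslash is dropped and the pattern looks for a literal "S".
double = "a\t\S\tb"

# Single quotes: the backslashes survive, so the regex engine sees \t and \S.
single = 'a\t\S\tb'

double_match = Regexp.new(double).match(target)
single_match = Regexp.new(single).match(target)

# Hypothetical workaround: keep double quotes for interpolation, but
# double the backslash so the string still contains \S when it reaches
# the regex engine. last_field is an illustrative name, not real code.
last_field = "b"
escaped = "a\t\\S\t#{Regexp.escape(last_field)}"
escaped_match = Regexp.new(escaped).match(target)

p double         # the string after escape processing
p single         # backslashes intact
p double_match   # no match
p single_match   # match
p escaped_match  # match
```

Regexp.escape is there only so that an interpolated value containing regex metacharacters gets treated literally; if the value is itself meant to be a pattern fragment, leave it out.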

Once again, Ruby foils me. This time, however, it managed to foil a righteously indignant rant about regex escape sequence handling. Go figure.

programming, ruby
