Wednesday, October 03, 2007

Know your Regular Expression anchors

As everyone knows, regular expressions are incredibly important in many programming tasks, so it pays to know the particulars of the regexp syntax. One detail that bit me a while back was a simple oversight: something I did know but hadn't kept in mind while writing the bad code. Namely, in Ruby the caret (^) matches at the start of every line, not just at the start of the string, which matters as soon as a String has newlines in it. To be fair, I've been using Java regexps for a while, and there the caret only matches at the start of the input unless you ask for MULTILINE.

To illustrate the difference, here is a program you can run in either MRI or JRuby. When run in JRuby, it also shows that the Java version needs the MULTILINE flag to behave as Ruby does by default.
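
str = "one\nover\nyou"

# In Ruby, ^ matches at the start of every line,
# so this prints the offsets of two matches.
puts "Match with ^"
str.gsub(/^o/) do |e|
  p $~.offset(0)
  e
end

# \A matches only at the start of the string,
# so this prints the offset of a single match.
puts "Match with \\A"
str.gsub(/\Ao/) do |e|
  p $~.offset(0)
  e
end

# The Java comparison only runs under JRuby.
if defined?(JRUBY_VERSION)
  require 'java'

  # Java needs MULTILINE for ^ to match at every line start.
  regexp = java.util.regex.Pattern.compile("^o", java.util.regex.Pattern::MULTILINE)
  matcher = regexp.matcher(str)
  puts "Java match with ^"
  while matcher.find()
    p matcher
  end

  # \A still matches only at the start of the input, even with MULTILINE.
  regexp = java.util.regex.Pattern.compile("\\Ao", java.util.regex.Pattern::MULTILINE)
  matcher = regexp.matcher(str)
  puts "Java match with \\A"
  while matcher.find()
    p matcher
  end
end
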
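So, what's the lesson here? Don't use caret (^) and dollar ($) if you actually want to match the beginning or the end of the string. Instead, use \A and \Z. That's what they're there for. One caveat: \Z still matches just before a single trailing newline at the end of the string, so use lowercase \z when you need the absolute end.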
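
To make the lesson concrete, here is a minimal sketch of how line anchors can let unwanted input through a validation check. The constant and method names (LINE_ANCHORED, STRING_ANCHORED, loose_digits?, strict_digits?) are made up for this illustration and do not come from any library. It also compares \Z with \z, which differ only in how they treat one trailing newline.

```ruby
# Hypothetical names, chosen for illustration only.
LINE_ANCHORED   = /^\d+$/    # ^ and $ match at every line boundary
STRING_ANCHORED = /\A\d+\z/  # \A and \z match only at the string's ends

# True if ANY line of the input consists solely of digits.
def loose_digits?(input)
  !!(input =~ LINE_ANCHORED)
end

# True only if the WHOLE input consists solely of digits.
def strict_digits?(input)
  !!(input =~ STRING_ANCHORED)
end

tricky = "123\nnot digits"

p loose_digits?(tricky)   # true: the first line alone satisfies ^\d+$
p strict_digits?(tricky)  # false: \z refuses to match before the newline
p strict_digits?("123")   # true

# \Z tolerates one trailing newline at the end of the string; \z does not.
p !!("123\n" =~ /\A\d+\Z/)  # true
p !!("123\n" =~ /\A\d+\z/)  # false
```

If the input comes from a user, the string-anchored version is usually the one you want, since the line-anchored check passes as soon as any single line looks valid.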

3 comments:

Mike Owens said...

Whoa, thanks for the heads up. I wrote in Python for years before learning Ruby, and just assumed they both used the same MULTILINE default.

It's gonna be pretty hard to break the "\A == ^" mindset.

It's very rare that a blog post makes me go back and scan every file I've written in the past two years. Time to get pickier with the unit tests.

Dr Nic said...

+1 mike - I also never knew there was any difference. Patooey.

Wolf said...

Regular expressions are really wonderful for parsing HTML or matching patterns. I use them a lot when I code. Actually, when I learn any new language, the first thing I check is whether it supports regex. I feel at ease when I find that it does.

http://icfun.blogspot.com/2008/04/ruby-regular-expression-handling.html

Here is a post about Ruby regex. I wrote it when I was first learning Ruby regex, so it should be helpful for new coders.