söndag, februari 11, 2007

Ragel performance

I did some performance testing on the old and new Resolver implementation. The testing have some stupid tests that exercise bad parts of both implementations (like longest match, where it can't be decided what type something is until we have to backtrack about 20 characters). I placed these 24 strings in an array, and pounded on it with an instance of the ResolverImpl that is used in exactly the same way on all scalar values in an YAML document. The objective is to find out if the value is an implicit type or not. So basically, we give it a String, and get back a tag URI. So it's not like I'm parsing a language or anything. I'm just doing some recognizing here.

The old implementation was based on a Map> where the first letter of the string to resolve was used as an index to find a list of patterns to try sequentially. This worked fine, and made it extensible. But not very fast. This is the baseline. For 24 different strings, iterated 100 000 times for 2 400 000 resolves it takes 7879ms. That's OK, but not great.

Now, the new Ragel implementation is dead simple. It's just a translation of the regexps in the aforementioned Pattern's into a state machine. At EOF out actions (%/ for people in the Ragel knowhow), I execute an action that sets a local variable to a tag, and at the end of the resolve method returns that tag. Dead simple, and not exercising the full strength of Ragel, of course.
So, for the same number of resolves, this ResolverImpl takes 1288ms. That's 611% improvement in speed. Ain't it nice to have a friend such as Ragel? And the best part is, for harder tasks, these improvements would be even larger.
Finite State Machines are your friends. All your base are belongs to us.

Inga kommentarer: