Saturday, February 10, 2007

Faster YAML with byte processing

As noted in my last post, I have started work on converting JvYAML into JvYAMLb. I have now finished the Scanner and the Parser, and it's looking quite good. The numbers I reported in the last post for regular JvYAML performance were wrong, though. We're looking at about 7.8s to 10.0s for scanning that 3.5MB gemspec file (and that's only the scanning, not file IO). With the Scanner converted to use bytes and ByteList, the same processing takes 2.8s. That's a substantial difference, roughly three times faster. But it doesn't end there.
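To give a feel for what scanning on bytes looks like, here is a minimal sketch. It is not the actual JvYAMLb Scanner; the class name `ByteScanSketch` and the helper `findKeyIndicator` are made up for illustration, and it assumes the input is ASCII-compatible so that YAML indicators can be compared as single bytes. The point is only that the loop works on the raw `byte[]` and never builds a String or char[].

```java
import java.nio.charset.StandardCharsets;

public class ByteScanSketch {
    // Hypothetical helper: returns the index of the first ':' that is followed by
    // a space, a newline, or the end of the range, or -1 if there is none.
    // Works directly on the byte buffer, with no decoding step.
    public static int findKeyIndicator(byte[] buf, int start, int end) {
        for (int i = start; i < end; i++) {
            if (buf[i] == ':' && (i + 1 == end || buf[i + 1] == ' ' || buf[i + 1] == '\n')) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] line = "name: jvyaml".getBytes(StandardCharsets.UTF_8);
        System.out.println(findKeyIndicator(line, 0, line.length)); // prints 4
    }
}
```

A String-based scanner would have to decode the whole buffer first and then hand out substrings, which is where I assume much of the extra garbage comes from.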

As I said, I also converted the Parser. It doesn't do any String processing at all, so I didn't expect either a speedup or a slowdown beyond what comes from the Scanner. But before the conversion, parsing the gemspec took 18.515s; after, it runs in 4s. That's a dramatic speedup, and I don't really know where it comes from. My best guess is that the earlier implementation generated so much more garbage, and used so much more memory, that it showed up in the timings. Anyway, this looks good for JRuby YAML processing, since I expect big reductions in call-path complexity and object generation once the YAML processor is byte-based all the way through.

But tomorrow it's time to work on the Resolver, and that's going to be hard. Ideally, there would be a byte-based Regexp engine to build on. And maybe that would be something for JRuby too, no? Our regular expressions must be dead slow now that they have to convert to Strings all the time.
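The problem can be sketched like this. The code below is not the JvYAML Resolver; `ResolverSketch` and both methods are hypothetical, and the integer pattern is a deliberately simplified stand-in for the real YAML implicit-type rules. The first method shows what `java.util.regex` forces on byte data: decode to a String, then match. The second is a hand-written check for the same pattern that stays on the bytes, which is roughly what a byte-based Regexp engine would do for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class ResolverSketch {
    // Simplified, assumed pattern for a decimal integer scalar.
    private static final Pattern INT = Pattern.compile("[-+]?[0-9]+");

    // java.util.regex only matches CharSequences, so the bytes must be
    // decoded into a new String for every scalar that is resolved.
    public static boolean isIntViaString(byte[] buf, int start, int len) {
        String s = new String(buf, start, len, StandardCharsets.ISO_8859_1);
        return INT.matcher(s).matches();
    }

    // Hand-rolled equivalent of the same pattern, with no allocation.
    public static boolean isIntViaBytes(byte[] buf, int start, int len) {
        int i = start;
        int end = start + len;
        if (i < end && (buf[i] == '-' || buf[i] == '+')) {
            i++; // optional sign
        }
        if (i == end) {
            return false; // at least one digit is required
        }
        for (; i < end; i++) {
            if (buf[i] < '0' || buf[i] > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] scalar = "-42".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isIntViaString(scalar, 0, scalar.length)); // prints true
        System.out.println(isIntViaBytes(scalar, 0, scalar.length));  // prints true
    }
}
```

Hand-rolling every pattern like this doesn't scale to the full set of Resolver rules, which is why a general byte-based engine would be the nicer solution.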
