YAML.load_file "gems.yml"

First, the file is opened and wrapped inside a RandomAccessFile. Then YAML reads data from it. Reading proceeds like this:
1. Bytes are read through the RAF, hopefully in chunks.
2. Those bytes are wrapped in a RubyString so they can be returned from the IO#read method.
3. An IOReader wraps the RubyIO object, gets the RubyString, and converts it from bytes into a String, which in turn is converted into a char array.
4. That char array is returned to the YAML Scanner.
5. The chars from the char array are collected in a StringBuffer and saved in various Strings as token values.
6. The parser, resolver and constructor work on these Strings in various ways.
7. The JRubyConstructor takes these Strings and creates RubyString objects from them, converting each String back to a byte array in the process.
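
To make the cost of the steps above concrete, here is a minimal sketch in plain Java of the same round trip: bytes to String, String to char array, chars into a StringBuffer, back to a String, and finally back to bytes. This is not JvYAML or JRuby code. The class name CharRoundTrip, the method roundTrip, and the choice of ISO-8859-1 as the charset are assumptions made purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: mimics the conversions in steps 3-7, not the real JvYAML code.
final class CharRoundTrip {

    // Takes raw bytes through the same chain of representations and returns bytes again.
    static byte[] roundTrip(byte[] raw) {
        // Steps 3-4: bytes are decoded into a String, then copied into a char array.
        // ISO-8859-1 is an assumption; it maps every byte to exactly one char.
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        char[] chars = decoded.toCharArray();

        // Step 5: the chars are collected in a StringBuffer and saved as a token String.
        StringBuffer buffer = new StringBuffer();
        for (char c : chars) {
            buffer.append(c);
        }
        String token = buffer.toString();

        // Step 7: the String is converted back into a byte array.
        return token.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = "name: rake".getBytes(StandardCharsets.ISO_8859_1);
        byte[] result = roundTrip(raw);
        // The bytes come out unchanged, after several intermediate copies.
        System.out.println(Arrays.equals(raw, result));
    }
}
```

The output bytes are identical to the input bytes, so every intermediate copy in between is pure overhead for data that starts and ends as a byte array.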
Is there any doubt that this process is slow? Well, it wasn't that big of a problem until now, when we are doing so well on performance in other parts of the system that it stands out.
So, the radical decision is to rewrite JvYAML, making it more SYCK-compliant and having it work with InputStreams and byte arrays, which eliminates several of the steps above. So that's what I'm going to do. I hereby create JvYAMLb. It will only be a part of the JRuby codebase, but it will be reasonably separate, so it can be extracted for other purposes. I will not stop work on regular JvYAML, but will maintain both projects.
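
As a rough sketch of what working directly with InputStreams and byte arrays can look like, the toy tokenizer below reads bytes from a stream and cuts tokens out as byte arrays without ever creating a String or a char array. It is not the JvYAMLb Scanner and does not implement YAML. The class ByteTokenSketch, its methods, and the simplistic rule of splitting on ':', space and newline are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy byte-based tokenizer, not a YAML scanner.
final class ByteTokenSketch {

    // Reads the whole stream into one byte array, in chunks.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Splits the input on ':', ' ' and '\n' and returns each token as a byte array.
    static List<byte[]> tokens(InputStream in) throws IOException {
        byte[] data = readAll(in);
        List<byte[]> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = (i == data.length);
            if (atEnd || data[i] == ':' || data[i] == ' ' || data[i] == '\n') {
                if (i > start) {
                    // The token value stays in byte form; no decoding takes place.
                    result.add(Arrays.copyOfRange(data, start, i));
                }
                start = i + 1;
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "name: rake\n".getBytes(StandardCharsets.ISO_8859_1);
        List<byte[]> found = tokens(new ByteArrayInputStream(input));
        System.out.println(found.size() + " tokens");
    }
}
```

Because token values never leave byte form, a constructor that needs a byte-backed string object could use them directly instead of converting a String back to bytes.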
Since the objective of this new project is blazing speed, I will post some numbers on this now and again. But first I will show you the speed of the regular system. JvYAML's Scanner can scan an old gem source index (about 3.5 MB) of 435654 tokens in about 1654 ms, which works out to roughly 263,000 tokens per second. This is the baseline I'm going to use to test performance, and I'll post more on this as soon as the byte-based Scanner is ready to try out.