Wednesday, October 31, 2007

An interesting memory leak in JRuby

The last two days I've had lots of fun with the interesting task of finding a major memory leak in JRuby. The only way I could reliably reproduce it was by running Mingle's test suite and watching memory get eaten. I tried several approaches, the first being jhat to analyze heap dumps. That didn't really help me much, since all the interesting queries I tried to run with OQL had a tendency to just cause out-of-memory errors. Not nice.

The next step was to install SAP Memory Analyzer, which actually worked really well, even though it's built on top of Eclipse. After several false starts, including one where I thought we had found the memory leak, I finally got somewhere. Actually, I did find a memory leak in our method cache implementation. But alas, after fixing that it was obvious there was another leak in there.

I finally got SAP to tell me that RubyClasses were being retained. But when I tried to find the root chain to see how that happened, I couldn't see anything strange. In fact, what I saw was the normal chaining of frames, blocks, classes and other interesting parts. And this is really the problem when debugging this kind of issue in JRuby: since a leak will almost always be leaking several different kinds of objects, it can be hard to pinpoint the exact problem. In this case I guessed that the problem was in a large branch that Bill merged a few weeks back, so I tried going back to it and checking. Alas, the branch was good. But since I had gone back 200 revisions, I at least knew within which range the problem had to be. Since I couldn't find anything more from the heap dumps, I resorted to the venerable tradition of binary search: going through the revisions to find the faulty one. According to log2, I would find the bad revision in at most 8 tries, so I started out.
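The bisection itself is mechanical enough to sketch. This is a hypothetical illustration (the revision numbers and the leaks predicate are made up for the example), showing why a range of about 200 revisions needs at most 8 tests:

```ruby
revisions = (4400..4600).to_a   # hypothetical revision range, ~200 candidates

# Pretend the leak was introduced at revision 4533 (made up). In reality
# this predicate means: check out the revision, build, run the test suite,
# and watch whether memory grows.
leaks = ->(rev) { rev >= 4533 }

# Classic lower-bound binary search for the first "bad" revision.
lo, hi = 0, revisions.size - 1
tries = 0
while lo < hi
  mid = (lo + hi) / 2
  tries += 1
  if leaks.(revisions[mid])
    hi = mid          # leak already present: culprit is here or earlier
  else
    lo = mid + 1      # still clean: culprit is later
  end
end

revisions[lo]  # => 4533, found in at most ceil(log2(201)) = 8 tries
```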

After a while I actually found the problem. Let me show it to you here:
def __jtrap(*args, &block)
  sig = args.first
  sig = SIGNALS[sig] if sig.kind_of?(Fixnum)
  sig = sig.to_s.sub(/^SIG(.+)/,'\1')
  signal_class = Java::sun.misc.Signal
  signal_class.send :attr_accessor, :prev_handler
  signal_object = signal_class.new(sig) rescue nil
  return unless signal_object
  signal_handler = Java::sun.misc.SignalHandler.impl do
    begin
      block.call
    rescue Exception => e
      Thread.main.raise(e) rescue nil
    ensure
      # re-register the handler
      signal_class.handle(signal_object, signal_handler)
    end
  end
  signal_object.prev_handler = signal_class.handle(signal_object, signal_handler)
end
This is part of our signal handling code. Interestingly enough, I was nonplussed. How could trap leak? I mean, no one actually calls trap enough times to make it leak, right?

Well, wrong. It turns out that ActiveRecord traps abort inside transactions and then restores the original handler afterwards. So each transaction created new trap handlers. That would have been fine, except for the last line. In effect, in the current signal handler we save a reference to the previous signal handler. After a few iterations we have a long chain of signal handlers, all pointing back, all held by a hard reference from one of the static root sets in the JVM (namely, the list of all signal handlers). That wouldn't be so bad on its own, except that a saved block has references to dynamic scopes (which reference variables). It has a reference to the Frame, and the Frame has references to RubyClass. RubyClass has references to method objects, and method objects in some cases have references to RubyProcs, which in turn have more references to Blocks. At the end, we have a massive leak.
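The shape of the leak can be illustrated without any signal machinery at all. This is a hypothetical pure-Ruby sketch (Handler and install are made-up names): every newly installed handler keeps a hard reference to the one it replaces, so as long as the newest handler is rooted, every handler ever installed stays reachable:

```ruby
# Made-up stand-in for a signal handler; prev_handler mirrors the
# attr_accessor the real code above adds to the Signal class.
Handler = Struct.new(:block, :prev_handler)

current = nil  # stand-in for the JVM's static list of signal handlers

install = lambda do |&blk|
  # Each install saves a reference to the previously installed handler,
  # just like the last line of __jtrap does.
  current = Handler.new(blk, current)
end

1000.times { install.call { } }

# Walk the chain: every handler (and the block it closes over, with its
# scopes, frames and classes) is still strongly reachable from one root.
length = 0
h = current
while h
  length += 1
  h = h.prev_handler
end
length  # => 1000
```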

The solution? Simply remove the saving of the previous handler and simplify the signal handler.

QCon and OpenWorld

As mentioned before, I will be in San Francisco next week for QCon, and the week after that for Oracle OpenWorld. I will be part of a panel debate at QCon and man a booth at Oracle OpenWorld. In fact, if you're attending OpenWorld you should visit the ThoughtWorks booth at 343 Moscone South. Looking forward to seeing you there.

Sunday, October 28, 2007

Michael Gira and Tim Bray

Don't Tim Bray and Michael Gira look very much alike? See for yourself:


I'm not sure if Tim has the voice of Michael, though. Anyway, I gotta say, warming up with a preparty featuring Bill Hicks, and then a 90-minute Michael Gira show, was awesome.

Is this the wrong blog for this? Yeah, probably, but the Tim Bray inclusion warrants it. =)

Friday, October 26, 2007

Current state of Regular Expressions

As I've made clear earlier, the current regular expression situation has once again become impractical. To reiterate the history: we began with regular Java regex support. This started to cave in when we found out that the algorithm used is actually recursive, and fails for some common regexps used inside Rails, among others. To fix that, we integrated JRegex instead. That's the engine 1.0 was released with, and it is still the engine in use. It works fairly well, and is fast for a Java engine. But not fast enough. In particular, there is no support for searching for exact strings and failing fast, and the engine requires us to transform our byte[] strings into char[] or String. Not exactly optimal. Another problem is that compatibility with MRI suffers, especially in the multibyte support.

There are two solutions currently on the way. Core developer Marcin is working on a port of the 1.9 regexp engine Oniguruma. This port still has some way to go, and is not integrated with JRuby. The other effort is called REJ, and is a port of the MRI engine I did a few months back. I've freshened up the work and integrated it with JRuby in a branch. At the moment this work actually seems to be going quite well, but there are some snags.

First of all, let me point out that this approach gives us more or less total multibyte compatibility for 1.8, which is quite nice.

When benchmarking, I generally use Rails as the bar. I have a series of regular expressions that Petstore uses for each request, and I'm using these to check performance. As a first datapoint, JRuby+REJ is faster at parsing regexps than JRuby trunk for basically all regexps, ranging from slightly faster to twice as fast.

Most of the Rails regexen are actually faster in REJ than in JRuby trunk, but the problem is that some of them are quite a bit slower. 4 of the 22 Rails regexps are slower, by between 20 and 250 percent. And then there is this one: /.*_f/ =~ "_fxxxxxxxxxxxxxxxxxxxxxxx", which runs about 10x slower than JRuby trunk. Not nice at all.

In the end, the problem is backtracking. Since REJ is a straight port of the MRI code, the backtracking is ported too. But it seems that Java is unusually bad at handling that specific algorithm, and it performs quite poorly. At the moment I'm continuing to look at it and trying to improve performance in all ways possible, so we'll see what happens. Charles Nutter has also started to look at it.

But what's really interesting is that I reran my Petstore benchmarks with the current REJ code. To rehash, my last results with JRuby trunk looked like this:
controller :   1.804000   0.000000   1.804000 (  1.804000)
view       :   5.510000   0.000000   5.510000 (  5.510000)
full action:  13.876000   0.000000  13.876000 ( 13.876000)
But the results from rerunning with REJ were interesting, to say the least. I expected bad results because of the poor backtracking performance, but it seems the other speed improvements outweigh it:
controller :   1.782000   0.000000   1.782000 (  1.782000)
view       :   4.735000   0.000000   4.735000 (  4.735000)
full action:  12.727000   0.000000  12.727000 ( 12.727000)
As you can see, the improvement in the view numbers is quite large. It's also almost on par with MRI, which had 4.57s. Finally, the full action is better by a full second too: MRI is 9.57s and JRuby 12.72s. It's getting closer. I am quite optimistic right now; provided we manage to fix the remaining backtracking problems, our regexp engine might well be a great boon to performance.

Interesting times in JRuby land

The last week or two have been quite interesting. We are finally starting to see performance numbers that seem good enough. Take a look at Nick's posts here and here for more information. The second post contains some valuable tips; in particular, make sure to turn off ObjectSpace and run with -server. Both of these will improve your performance and scalability considerably.

Secondly, JRuby on Rails on Oracle Application Server. Nice, huh? I would imagine more interesting things coming out of all this.

But the end message seems to be that JRuby is really ready now. The 1.1 release looks like it's going to be something really amazing. I can't wait!

Tuesday, October 16, 2007

Updated JRuby on Rails performance numbers

So, after my last post, several of us have spent time looking at different parts of Rails and JRuby performance. We have managed to improve things quite nicely since my last performance note. Among the things changed are JRuby's each_line implementation, JRuby's split implementation, a few other improvements, and some small fixes to AR-JDBC. With that, here are some new numbers for Petstore. Remember, the MRI numbers are for MRI 1.8.6 with the native C MySQL driver. The JRuby numbers are with ActiveRecord-JDBC trunk and MySQL. (I'm only showing the best numbers of each.) The number in bold is the most important one for comparison.

MRI   controller :   1.000000   0.070000   1.070000 (  1.430260)
JRuby controller :   1.804000   0.000000   1.804000 (  1.804000)

MRI   view       :   4.410000   0.150000   4.560000 (  4.575399)
JRuby view       :   5.510000   0.000000   5.510000 (  5.510000)

MRI   full action:   8.260000   0.410000   8.670000 (  9.574460)
JRuby full action:  13.876000   0.000000  13.876000 ( 13.876000)
As you can see, we are talking about 9.5s for MRI versus 13.8s for JRuby, which I find quite a nice achievement if you compare with the numbers from Friday. We are inching closer and closer. Both the view and the controller numbers are looking very nice. This is indicative of a nice trend: since general JRuby primitive performance is really good, the slowness in our regular expression engine is offset by much faster execution speed.

Once the port of Oniguruma lands, this story will almost certainly look very different. But even so, this is looking good.

Sunday, October 14, 2007

JRuby discovery number one

After my last entry I've spent lots of time checking different parts of JRuby, trying to find the one true bottleneck for Rails. Of course, I still haven't found it (otherwise I would have said YAY in the subject of this post). But I have found a few things - for example, symbols are slow right now, but Bill's work will make them better. And it doesn't affect Rails performance at all.

But I made the discovery when I looked at the performance of the regular expressions used in Rails. There are exactly 50 of them for each request, so I wrote a script that checked the performance of each of them against MRI. And I found one in particular that had really interesting performance when comparing MRI to JRuby. In fact, it was between 200 and 1000 times slower. What's worse, the performance wasn't linear.

So which regular expression was the culprit? Well, /.*?\n/m. That doesn't look too bad. And in fact, this expression exposed not one but two problems in JRuby. The first is that any regular expression engine should be able to fail fast on something like this, simply because there is a literal substring that always needs to be present in the input for this expression to match. In MRI, that part of the engine is called bm_search, and is a very fast way to fail. JRuby doesn't have that. Marcin is working on a port of Oniguruma though, so that will fix that part of the problem.
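A sketch of the fail-fast idea, under the assumption that the engine knows the pattern requires a literal "\n": a plain substring scan can reject non-matching input in linear time before any backtracking starts:

```ruby
str = "abc" * 4000          # long string with no newline anywhere

# A required-literal prescan (the role bm_search plays in MRI, implemented
# there with Boyer-Moore): if the mandatory "\n" is absent, the whole
# regexp can be rejected without ever running the backtracking engine.
match = str.include?("\n") ? (/.*?\n/m =~ str) : nil

match  # => nil, decided by a single linear scan over the string
```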

But wait, if you grep for this regexp in the Rails sources you won't find it. So where was it actually used? Here is the kicker: it was used in JRuby's implementation of String#each_line. So, let's take some time to look at a quick benchmark for each_line:
require 'benchmark'

str = "Content-Type: text/html; charset=utf-8\r\nSet-Cookie: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa "

TIMES=100_000

puts "each_line on small string with several lines"
10.times do
  puts(Benchmark.measure{TIMES.times { str.each_line{} }})
end

str = "abc" * 15

puts "each_line on short string with no line divisions"
10.times do
  puts(Benchmark.measure{TIMES.times { str.each_line{} }})
end

str = "abc" * 4000

puts "each_line on large string with no line divisions"
10.times do
  puts(Benchmark.measure{TIMES.times { str.each_line{} }})
end
As you can see, we simply measure the performance of doing 100,000 each_line calls on three different strings. The first is a short string with several newlines, the second is a short string with no newlines, and the last is a long string with no newlines. How does MRI run this benchmark?
each_line on small string with several lines
0.160000 0.000000 0.160000 ( 0.157664)
0.150000 0.000000 0.150000 ( 0.160450)
0.160000 0.000000 0.160000 ( 0.171563)
0.150000 0.000000 0.150000 ( 0.157854)
0.150000 0.000000 0.150000 ( 0.154578)
0.150000 0.000000 0.150000 ( 0.154547)
0.160000 0.000000 0.160000 ( 0.158894)
0.150000 0.000000 0.150000 ( 0.158064)
0.150000 0.010000 0.160000 ( 0.156975)
0.160000 0.000000 0.160000 ( 0.156857)
each_line on short string with no line divisions
0.080000 0.000000 0.080000 ( 0.086789)
0.090000 0.000000 0.090000 ( 0.084559)
0.080000 0.000000 0.080000 ( 0.093477)
0.090000 0.000000 0.090000 ( 0.084700)
0.080000 0.000000 0.080000 ( 0.089917)
0.090000 0.000000 0.090000 ( 0.084176)
0.080000 0.000000 0.080000 ( 0.086735)
0.090000 0.000000 0.090000 ( 0.085536)
0.080000 0.000000 0.080000 ( 0.084668)
0.090000 0.000000 0.090000 ( 0.090176)
each_line on large string with no line divisions
3.350000 0.020000 3.370000 ( 3.404514)
3.330000 0.020000 3.350000 ( 3.690576)
3.320000 0.040000 3.360000 ( 3.851804)
3.320000 0.020000 3.340000 ( 3.651748)
3.340000 0.020000 3.360000 ( 3.478186)
3.340000 0.020000 3.360000 ( 3.447704)
3.330000 0.020000 3.350000 ( 3.448651)
3.350000 0.010000 3.360000 ( 3.489842)
3.350000 0.020000 3.370000 ( 3.429135)
3.350000 0.010000 3.360000 ( 3.372925)
OK, this looks reasonable. The large string is obviously taking more time to search, but not incredibly much time. What about trunk JRuby?
each_line on small string with several lines
32.668000 0.000000 32.668000 ( 32.668000)
30.785000 0.000000 30.785000 ( 30.785000)
30.824000 0.000000 30.824000 ( 30.824000)
30.878000 0.000000 30.878000 ( 30.877000)
30.904000 0.000000 30.904000 ( 30.904000)
30.826000 0.000000 30.826000 ( 30.826000)
30.550000 0.000000 30.550000 ( 30.550000)
32.331000 0.000000 32.331000 ( 32.331000)
30.971000 0.000000 30.971000 ( 30.971000)
30.537000 0.000000 30.537000 ( 30.537000)
each_line on short string with no line divisions
7.472000 0.000000 7.472000 ( 7.472000)
7.350000 0.000000 7.350000 ( 7.350000)
7.516000 0.000000 7.516000 ( 7.516000)
7.252000 0.000000 7.252000 ( 7.252000)
7.313000 0.000000 7.313000 ( 7.313000)
7.262000 0.000000 7.262000 ( 7.262000)
7.383000 0.000000 7.383000 ( 7.383000)
7.786000 0.000000 7.786000 ( 7.786000)
7.583000 0.000000 7.583000 ( 7.583000)
7.529000 0.000000 7.529000 ( 7.529000)
each_line on large string with no line divisions
Oops. That doesn't look so good... And also, where are the last ten lines? Eh... It's still running. It's been running for two hours without producing even the first line. That means it's taking at least 7200 seconds, which is more than 2400 times slower than MRI. But in fact, since the matching of the regular expression above is not linear but quadratic in the length of the string, I don't expect this to ever finish.

There are a few interesting lessons to take away from this exercise:
  1. There may still be implementation problems like this in many parts of JRuby - performance will improve quite a bit every time we find something like this. I haven't measured Rails performance after this fix, and I don't expect it to solve the whole problem, but I think I'll see better numbers.
  2. Understand regular expressions. Why is /.*?\n/ so incredibly bad for strings over a certain length? In this case it's the combination of .* and ?. What would be a better implementation in almost all cases? /[^\n]*\n/. Notice that there is no backtracking in this version, and because of that, this regexp has O(n) performance while the earlier one was O(n^2). Learn and know these things. They are the difference between usage and expertise.
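To see the difference concretely, here is a small timing sketch along the lines of the benchmark above (the string size and loop count are arbitrary):

```ruby
require 'benchmark'

str = "abc" * 2000   # no newline anywhere, so both regexps fail to match

# The lazy dot-star retries from every starting position, scanning to the
# end of the string each time: O(n^2) for a plain backtracking engine.
quadratic = /.*?\n/

# The character-class version consumes non-newlines in one pass and fails
# after a single scan: O(n).
linear = /[^\n]*\n/

puts Benchmark.measure { 200.times { quadratic =~ str } }
puts Benchmark.measure { 200.times { linear    =~ str } }
```

Depending on the engine, the first measurement may not look dramatic: engines that apply the kind of required-literal optimization discussed earlier can reject the lazy form quickly too.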

Friday, October 12, 2007

Mystery: An exposé on JRuby performance

So, after Charles's awesome work yesterday (documented here), I felt it was time to set the story straight on general JRuby performance. It's something of a mystery. I'll begin by showing you the results of lots of different benchmarks compared with MRI. All these benchmarks have been run on my MacBook Pro, dual 2.33GHz cores with 2GB memory. The JRuby revision is 4578, Java is Apple 1.6 beta, and the Ruby version is 1.8.6 (2007-03-13 patchlevel 0). JRuby was run with -J-server and -O.

Now, let's begin with the YARV benchmarks. The first value is always MRI, the second is JRuby. I've made the benchmark red when MRI is faster, and green when JRuby is faster.
bm_app_answer.rb:
0.480000 0.010000 0.490000 ( 0.511641)
0.479000 0.000000 0.479000 ( 0.478000)

bm_app_factorial.rb:
ERROR stack level too deep
2.687000 0.000000 2.687000 ( 2.687000)

bm_app_fib.rb:
5.880000 0.030000 5.910000 ( 6.151324)
2.687000 0.000000 2.687000 ( 2.687000)

bm_app_mandelbrot.rb:
1.930000 0.010000 1.940000 ( 1.992984)
2.752000 0.000000 2.752000 ( 2.876000)

bm_app_pentomino.rb:
84.160000 0.450000 84.610000 ( 88.205519)
77.117000 0.000000 77.117000 ( 77.117000)

bm_app_raise.rb:
2.600000 0.430000 3.030000 ( 3.169156)
2.162000 0.000000 2.162000 ( 2.162000)

bm_app_strconcat.rb:
1.390000 0.010000 1.400000 ( 1.427766)
1.003000 0.000000 1.003000 ( 1.003000)

bm_app_tak.rb:
7.570000 0.060000 7.630000 ( 7.888381)
2.676000 0.000000 2.676000 ( 2.676000)

bm_app_tarai.rb:
6.020000 0.030000 6.050000 ( 6.186971)
2.236000 0.000000 2.236000 ( 2.236000)

bm_loop_times.rb:
4.240000 0.020000 4.260000 ( 4.404826)
3.354000 0.000000 3.354000 ( 3.354000)

bm_loop_whileloop.rb:
9.450000 0.050000 9.500000 ( 9.678552)
5.037000 0.000000 5.037000 ( 5.037000)

bm_loop_whileloop2.rb:
1.890000 0.010000 1.900000 ( 1.936502)
1.039000 0.000000 1.039000 ( 1.039000)

bm_so_ackermann.rb:
ERROR stack level too deep
4.928000 0.000000 4.928000 ( 4.927000)

bm_so_array.rb:
5.580000 0.020000 5.600000 ( 5.709101)
5.552000 0.000000 5.552000 ( 5.552000)

bm_so_concatenate.rb:
1.580000 0.040000 1.620000 ( 1.647592)
1.602000 0.000000 1.602000 ( 1.602000)

bm_so_exception.rb:
4.170000 0.390000 4.560000 ( 4.597234)
4.683000 0.000000 4.683000 ( 4.683000)

bm_so_lists.rb:
0.970000 0.030000 1.000000 ( 1.036678)
0.814000 0.000000 0.814000 ( 0.814000)

bm_so_matrix.rb:
1.700000 0.010000 1.710000 ( 1.765739)
1.878000 0.000000 1.878000 ( 1.879000)

bm_so_nested_loop.rb:
5.130000 0.020000 5.150000 ( 5.258066)
4.661000 0.000000 4.661000 ( 4.661000)

bm_so_object.rb:
5.480000 0.030000 5.510000 ( 5.615154)
3.095000 0.000000 3.095000 ( 3.095000)

bm_so_random.rb:
1.760000 0.010000 1.770000 ( 1.806116)
1.495000 0.000000 1.495000 ( 1.495000)

bm_so_sieve.rb:
0.680000 0.010000 0.690000 ( 0.705296)
0.853000 0.000000 0.853000 ( 0.853000)

bm_vm1_block.rb:
19.920000 0.110000 20.030000 ( 20.547236)
12.698000 0.000000 12.698000 ( 12.698000)

bm_vm1_const.rb:
15.720000 0.080000 15.800000 ( 16.426734)
7.654000 0.000000 7.654000 ( 7.654000)

bm_vm1_ensure.rb:
14.530000 0.070000 14.600000 ( 15.137106)
7.588000 0.000000 7.588000 ( 7.589000)

bm_vm1_length.rb:
17.230000 0.090000 17.320000 ( 20.406438)
6.415000 0.000000 6.415000 ( 6.416000)

bm_vm1_rescue.rb:
11.520000 0.040000 11.560000 ( 11.736435)
5.604000 0.000000 5.604000 ( 5.603000)

bm_vm1_simplereturn.rb:
17.560000 0.080000 17.640000 ( 18.178065)
5.413000 0.000000 5.413000 ( 5.413000)

bm_vm1_swap.rb:
22.160000 0.110000 22.270000 ( 22.698746)
6.836000 0.000000 6.836000 ( 6.835000)

bm_vm2_array.rb:
5.600000 0.020000 5.620000 ( 5.675354)
1.844000 0.000000 1.844000 ( 1.844000)

bm_vm2_method.rb:
9.800000 0.030000 9.830000 ( 9.918884)
5.152000 0.000000 5.152000 ( 5.151000)

bm_vm2_poly_method.rb:
13.570000 0.050000 13.620000 ( 13.803066)
10.289000 0.000000 10.289000 ( 10.289000)

bm_vm2_poly_method_ov.rb:
3.990000 0.010000 4.000000 ( 4.071277)
1.750000 0.000000 1.750000 ( 1.749000)

bm_vm2_proc.rb:
5.670000 0.020000 5.690000 ( 5.723124)
3.267000 0.000000 3.267000 ( 3.267000)

bm_vm2_regexp.rb:
0.380000 0.000000 0.380000 ( 0.387671)
0.961000 0.000000 0.961000 ( 0.961000)

bm_vm2_send.rb:
3.720000 0.010000 3.730000 ( 3.748266)
2.135000 0.000000 2.135000 ( 2.136000)

bm_vm2_super.rb:
4.100000 0.010000 4.110000 ( 4.138355)
1.781000 0.000000 1.781000 ( 1.781000)

bm_vm2_unif1.rb:
3.320000 0.010000 3.330000 ( 3.348069)
1.385000 0.000000 1.385000 ( 1.385000)

bm_vm2_zsuper.rb:
4.810000 0.020000 4.830000 ( 4.856368)
2.920000 0.000000 2.920000 ( 2.921000)

bm_vm3_thread_create_join.rb:
0.000000 0.000000 0.000000 ( 0.006621)
0.368000 0.000000 0.368000 ( 0.368000)
What is interesting about these numbers is that almost all of them are way faster, and the ones that are slower are so by quite a narrow margin (except for the regexp and thread tests).

So, this is the baseline against MRI. Let's take a look at a few other benchmarks. These can all be found in JRuby, in the test/bench directory. When you run them, each generates 5 or 10 runs of all measures, but I've simply taken the best one for each; the repetition is to allow Java to warm up. Here are a few different benchmarks, same conventions and running parameters as above:
bench_fib_recursive.rb
----------------------
1.390000 0.000000 1.390000 ( 1.412710)
0.532000 0.000000 0.532000 ( 0.532000)


bench_method_dispatch.rb
------------------------
Control: 1m loops accessing a local variable 100 times
2.830000 0.010000 2.840000 ( 2.864822)
0.105000 0.000000 0.105000 ( 0.105000)

Test STI: 1m loops accessing a fixnum var and calling to_i 100 times
10.000000 0.030000 10.030000 ( 10.100846)
2.111000 0.000000 2.111000 ( 2.111000)

Test ruby method: 1m loops calling self's foo 100 times
16.130000 0.060000 16.190000 ( 16.359876)
7.971000 0.000000 7.971000 ( 7.971000)


bench_method_dispatch_only.rb
-----------------------------
Test ruby method: 100k loops calling self's foo 100 times
1.570000 0.000000 1.570000 ( 1.588715)
0.587000 0.000000 0.587000 ( 0.587000)


bench_block_invocation.rb
-------------------------
1m loops yielding a fixnum 10 times to a block that just retrieves dvar
2.800000 0.010000 2.810000 ( 2.822425)
1.194000 0.000000 1.194000 ( 1.194000)

1m loops yielding two fixnums 10 times to block accessing one
6.550000 0.030000 6.580000 ( 6.623452)
2.064000 0.000000 2.064000 ( 2.064000)

1m loops yielding three fixnums 10 times to block accessing one
7.390000 0.020000 7.410000 ( 7.467841)
2.120000 0.000000 2.120000 ( 2.120000)

1m loops yielding three fixnums 10 times to block splatting and accessing them
9.250000 0.040000 9.290000 ( 9.339131)
2.451000 0.000000 2.451000 ( 2.451000)

1m loops yielding a fixnums 10 times to block with just a fixnum (no vars)
1.890000 0.000000 1.890000 ( 1.908501)
1.278000 0.000000 1.278000 ( 1.277000)

1m loops calling a method with a fixnum that just returns it
2.740000 0.010000 2.750000 ( 2.766255)
1.426000 0.000000 1.426000 ( 1.426000)


bench_string_ops.rb
----
Measure string array sort time
5.950000 0.060000 6.010000 ( 6.055483)
8.061000 0.000000 8.061000 ( 8.061000)

Measure hash put time
0.390000 0.010000 0.400000 ( 0.398593)
0.208000 0.000000 0.208000 ( 0.209000)

Measure hash get time (note: not same scale as put test)
1.620000 0.000000 1.620000 ( 1.646155)
0.740000 0.000000 0.740000 ( 0.740000)

Measure string == comparison time
2.340000 0.010000 2.350000 ( 2.368579)
0.812000 0.000000 0.812000 ( 0.812000)

Measure string == comparison time, different last pos
2.690000 0.000000 2.690000 ( 2.724772)
0.860000 0.000000 0.860000 ( 0.860000)

Measure string <=> comparison time
2.340000 0.010000 2.350000 ( 2.369915)
0.824000 0.000000 0.824000 ( 0.824000)

Measure 'string'.index(fixnum) time
0.790000 0.010000 0.800000 ( 0.808189)
1.113000 0.000000 1.113000 ( 1.113000)

Measure 'string'.index(str) time
2.860000 0.010000 2.870000 ( 2.892730)
0.956000 0.000000 0.956000 ( 0.956000)

Measure 'string'.rindex(fixnum) time
0.800000 0.000000 0.800000 ( 0.817300)
0.631000 0.000000 0.631000 ( 0.631000)

Measure 'string'.rindex(str) time
12.190000 0.040000 12.230000 ( 12.310492)
1.247000 0.000000 1.247000 ( 1.247000)


bench_ivar_access.rb
----
100k * 100 ivar gets, 1 ivar
0.500000 0.000000 0.500000 ( 0.582682)
0.340000 0.000000 0.340000 ( 0.340000)

100k * 100 ivar sets, 1 ivar
0.700000 0.010000 0.710000 ( 0.816724)
0.402000 0.000000 0.402000 ( 0.401000)

100k * 100 attr gets, 1 ivar
0.970000 0.000000 0.970000 ( 0.988212)
0.875000 0.000000 0.875000 ( 0.874000)

100k * 100 attr sets, 1 ivar
1.390000 0.010000 1.400000 ( 1.406535)
1.114000 0.000000 1.114000 ( 1.114000)

100k * 100 ivar gets, 2 ivars
0.490000 0.000000 0.490000 ( 0.506206)
0.344000 0.000000 0.344000 ( 0.344000)

100k * 100 ivar sets, 2 ivars
0.680000 0.000000 0.680000 ( 0.693064)
0.388000 0.000000 0.388000 ( 0.388000)

100k * 100 attr gets, 2 ivars
0.970000 0.000000 0.970000 ( 0.989313)
0.878000 0.000000 0.878000 ( 0.878000)

100k * 100 attr sets, 2 ivars
1.400000 0.000000 1.400000 ( 1.434206)
1.129000 0.000000 1.129000 ( 1.128000)

100k * 100 ivar gets, 4 ivars
0.490000 0.000000 0.490000 ( 0.502097)
0.340000 0.000000 0.340000 ( 0.340000)

100k * 100 ivar sets, 4 ivars
0.690000 0.000000 0.690000 ( 0.696852)
0.389000 0.000000 0.389000 ( 0.389000)

100k * 100 attr gets, 4 ivars
0.970000 0.010000 0.980000 ( 0.986163)
0.872000 0.000000 0.872000 ( 0.872000)

100k * 100 attr sets, 4 ivars
1.370000 0.010000 1.380000 ( 1.394921)
1.128000 0.000000 1.128000 ( 1.128000)

100k * 100 ivar gets, 8 ivars
0.500000 0.000000 0.500000 ( 0.519511)
0.344000 0.000000 0.344000 ( 0.344000)

100k * 100 ivar sets, 8 ivars
0.690000 0.000000 0.690000 ( 0.710896)
0.389000 0.000000 0.389000 ( 0.389000)

100k * 100 attr gets, 8 ivars
0.970000 0.000000 0.970000 ( 0.987582)
0.870000 0.000000 0.870000 ( 0.870000)

100k * 100 attr sets, 8 ivars
1.380000 0.000000 1.380000 ( 1.400542)
1.132000 0.000000 1.132000 ( 1.132000)

100k * 100 ivar gets, 16 ivars
0.500000 0.000000 0.500000 ( 0.523690)
0.342000 0.000000 0.342000 ( 0.343000)

100k * 100 ivar sets, 16 ivars
0.680000 0.000000 0.680000 ( 0.707385)
0.391000 0.000000 0.391000 ( 0.391000)

100k * 100 attr gets, 16 ivars
0.970000 0.010000 0.980000 ( 1.017880)
0.879000 0.000000 0.879000 ( 0.879000)

100k * 100 attr sets, 16 ivars
1.370000 0.010000 1.380000 ( 1.387713)
1.128000 0.000000 1.128000 ( 1.128000)


bench_for_loop.rb
----
100k calls to a method containing 5x a for loop over a 10-element range
0.890000 0.000000 0.890000 ( 0.917563)
0.654000 0.000000 0.654000 ( 0.654000)

All of this is really good, of course. Lots of green. And the red parts are not that bad either.

But here is the mystery. General Rails performance sucks. This test case can be run by checking out Petstore from tw-commons at RubyForge. There is a file called script/console_bench that will run the benchmarks. The two commands run were ruby script/console_bench production and jruby -J-server -O script/console_bench ar_jdbc. Without further ado, here are the numbers:
controller :   1.000000   0.070000   1.070000 (  1.472701)
controller :   2.144000   0.000000   2.144000 (  2.144000)
view       :   4.470000   0.160000   4.630000 (  4.794336)
view       :   6.335000   0.000000   6.335000 (  6.335000)
full action:   8.300000   0.400000   8.700000 (  9.597351)
full action:  16.055000   0.000000  16.055000 ( 16.055000)
These numbers plainly stink. And we can't find the reason for it. As you can see, most benchmarks are much better, and there is nothing we can think of that would add up to a general degradation like this. It can't be IO, since these tests use app.get directly. So what is it? There are very likely one or two bottlenecks causing this problem, and when we have found them, Rails will most likely blaze along. But profiling hasn't helped yet. We found a few things, but there is still stuff missing. It's an interesting problem.

My current thesis is that either symbols or regexps are responsible. I'll spend the day checking that.

Recent interviews

I have been doing several interviews recently, mostly in connection with my book. Here are the ones I know of right now:

Thursday, October 11, 2007

Grails eXchange, QCon SF and OpenWorld

It seems conference season is just coming up for me. I'll actually not be presenting at any of these events - just attending, doing marketing and so on. First off, I'll run around at Grails eXchange in London next week as an official ThoughtWorks representative. That is bound to be interesting.

I'll also attend QCon in San Francisco from the 5th to the 9th of November. Directly following that, I'll be at Oracle's OpenWorld event from the 11th to the 15th of November. I'll have a fair amount of spare time in the evenings during the two weeks in San Francisco, so if anyone wants to have a beer and discuss geeky things, feel free to mail me. Also, if anyone is interested in JRuby, I'm more or less always available to do talks on things like that.

Later on, I'll also be at JavaPolis in Antwerp, Belgium, in December.

See you out there.

Wednesday, October 10, 2007

Announcing JRuby/LDAP

I have just released JRuby/LDAP, a new project within JRuby-extras that aims to be interface-compatible with Ruby/LDAP. That is, if you have a Ruby project written for Ruby/LDAP, you should be able to run it on JRuby without any changes.

This is a first release and some of the more obscure functionalities haven't been added yet.

Installation is easy:
jruby -S gem install jruby-ldap
If you're interested in the source, it's available on RubyForge, within the JRuby-extras project.

JRuby/LDAP uses JNDI under the covers for its LDAP access. You can also use the standard Java mechanisms (i.e. jndi.properties) to change the LDAP access implementation.

The code is BSD licensed.

Tuesday, October 09, 2007

In favor of Ruby

Chad Wathington recently published a post called Many Facets of Ruby on the official ThoughtWorks Studios blog. I would like to expand on some of its points, and how I see them. To be sure, what is posted on the TW Studios blog is the "official" ThoughtWorks view - whereas what I write in this blog is purely my own opinion, with no relationship to ThoughtWorks at all.

The point Chad makes is that ThoughtWorks has lately been talking a lot about JRuby, in such a way that it's easy to get the impression that we as a company have chosen one implementation over the others. As Chad writes, that's not correct.

I've probably contributed to that impression in this blog. Obviously, I really like JRuby and hope it will work out well. I really like the Rubinius effort, and I predicted a while back that Rubinius may take over after MRI as the standard C Ruby implementation. But that doesn't mean I'm not interested in the other approaches around. MRI and YARV definitely have strong points going for them (MRI and JRuby are still the only fully working implementations of Ruby). And when IronRuby, XRuby, Rubinius, YARV, Gardens Point and Cardinal are more complete, the Ruby environment will be that much richer for it.

I'm not in this game for a specific implementation. I would use Ruby whether there was a JRuby or not. It's just that JRuby solves some of my problems, and allows me to hack on something that I know a segment of the Ruby user base will find useful. I'm in this for the language. I have chosen Ruby as my language, and the language is the same across implementations. And it's going to be really exciting in the Ruby space over the next few months.

lördag, oktober 06, 2007

Operator Overloading in Ruby

I really like the fact that Ruby has operator overloading. In many languages this feature doesn't fit well, but in Ruby it definitely has its place. So what's the difference? Why do people generally say that operator overloading in C++ is a bad thing, and that Java shouldn't have it? I believe the main reason is static typing. When you combine static typing with a verbose language, operator overloading doesn't really fit in. You can't create DSL-looking APIs out of the language, since the line noise usually makes such efforts futile. And then you're left with only the bad parts of operator overloading.

In Ruby, on the other hand, it fits very well. It actually makes sense to be able to redefine and add operators, since there are many places where your code can be made much more readable with them. Of course, you always have to be careful not to go overboard, but when used right it improves readability to a large degree. A typical example of nice use of overloading is in Hpricot:
doc/:html/:body/:p/:img
Here the division operator has been overloaded to make it possible to write XPath-like search expressions beginning from a parsed Hpricot document.
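To make the idea concrete, here is a toy sketch (not Hpricot's actual implementation, just an assumed miniature with a made-up Node class) of how overloading / can give that kind of path-like navigation over nested data:

```ruby
# A minimal navigator over nested hashes. Overloading / lets callers
# chain descents the same way the Hpricot example does.
class Node
  def initialize(data)
    @data = data
  end

  # Descend one level and wrap the result in a new Node,
  # so calls to / can be chained.
  def /(key)
    Node.new(@data[key])
  end

  def value
    @data
  end
end

doc = Node.new(html: { body: { p: "hello" } })
result = (doc/:html/:body/:p).value
puts result  # prints "hello"
```

Because / binds left to right, doc/:html/:body/:p reads as three chained descents with no extra punctuation, which is exactly the readability win the operator buys here.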

Another place I really like is in the Ruby core library, where the call method is usually aliased as []. This makes sense if you think about it: in most cases, the objects you can call on take parameters and return a result based on them. This maps quite neatly onto an Array or Hash containing the values to be returned, and in fact you can imagine swapping such a restricted Array or Hash for something you can call.
If the [] method is all you use for access, duck typing makes it easy to send in a Proc instead of an Array or Hash.
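A small sketch of that duck typing in action: since Hash and Proc both respond to [], a lookup backed by stored values can be swapped for one that computes them, without the calling code changing (describe is a hypothetical helper for illustration):

```ruby
# The caller only cares that its argument responds to [].
def describe(lookup)
  lookup[3]
end

# A Hash returns a stored value...
table = { 3 => "three (from a Hash)" }
# ...while a Proc computes one on demand. Same interface either way.
rule = proc { |n| "#{n} (computed by a Proc)" }

puts describe(table)  # prints "three (from a Hash)"
puts describe(rule)   # prints "3 (computed by a Proc)"
```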

Another reason [] can be nice is that Ruby doesn't let you overload (). If you want to emulate that kind of behavior, [] can act exactly the same way.

In the Ruby core, the more useful things that implement [] are Array and Hash, of course, but also Method, UnboundMethod, Proc and Continuation. This means you can do something like proc{ puts "hello" }[] if you feel like it.

There are lots of other nice operators to overload in Ruby, though. Some of the better ones are unary + and unary -, all the common arithmetic operators, |, &, <, >, <=, >=, ~, =~, **, ==, === and a few others. The things missing are () and ! and its friends (&& and || are control structures in Ruby, not methods, so they can't be overloaded either). Of course, adding () wouldn't really work with the Ruby language, but ! on the other hand should be overridable.

So, you should use operator overloading in Ruby, but be careful that it makes sense and actually gives you a good abstraction, not just something "cool".

onsdag, oktober 03, 2007

Know your Regular Expression anchors

As everyone knows, regular expressions are incredibly important in many programming tasks, so it pays to know some of the particulars of regexp syntax. One example that bit me a while back was a simple oversight, something I knew but hadn't kept in mind while writing the bad code: namely, the way the caret (^) works when used on a String containing newlines. To be fair, I'd been using Java regexps for a while, and that problem doesn't exist there.

To illustrate the difference, here is a program you can run in either MRI or JRuby. If you run it in JRuby, you'll see that the Java version needs the MULTILINE flag to behave the way Ruby does by default.
str = "one\nover\nyou"

puts "Match with ^"
str.gsub(/^o/) do |e|
  p $~.offset(0)
  e
end

puts "Match with \\A"
str.gsub(/\Ao/) do |e|
  p $~.offset(0)
  e
end


if defined?(JRUBY_VERSION)
  require 'java'

  regexp = java.util.regex.Pattern.compile("^o", java.util.regex.Pattern::MULTILINE)
  matcher = regexp.matcher(str)
  puts "Java match with ^"
  while matcher.find()
    p matcher
  end

  regexp = java.util.regex.Pattern.compile("\\Ao", java.util.regex.Pattern::MULTILINE)
  matcher = regexp.matcher(str)
  puts "Java match with \\A"
  while matcher.find()
    p matcher
  end
end
So, what's the lesson here? Don't use caret (^) and dollar ($) if you actually want to match the beginning or the end of the string. Instead, use \A and \Z. That's what they're there for.
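One caveat worth sketching: \Z itself still matches just before a single trailing newline, while the lowercase \z anchors at the absolute end of the string. A quick illustration:

```ruby
str = "done\n"

# \Z permits one trailing newline, so this matches.
p !!(str =~ /done\Z/)  # prints true

# \z is the strict end-of-string anchor, so this does not.
p !!(str =~ /done\z/)  # prints false

# $ matches at the end of each line, so it matches here too.
p !!(str =~ /done$/)   # prints true
```

So if a trailing newline should count as a non-match, reach for \z rather than \Z.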