So, after Charlies awesome work yesterday (documented
here), I felt it was time I put the story straight on general JRuby performance. It's something of a mystery. I'll begin by showing you the results of lots of different benchmarks and comparison to MRI. All these benchmarks have been run on my MacBook Pro, dual 2.33Ghz cores with 2Gb memory. JRuby revision is 4578, Java is Apple 1.6 beta, and Ruby version is 1.8.6 (2007-03-13 patchlevel 0). JRuby was ran with -J-server and -O.
Now, lets begin with the YARV benchmarks. The first values is always MRI, the second i JRuby. I've made the benchmark
red when MRI is faster, and
green when JRuby is faster.
bm_app_answer.rb:
0.480000 0.010000 0.490000 ( 0.511641)
0.479000 0.000000 0.479000 ( 0.478000)
bm_app_factorial.rb:
ERROR stack level too deep
2.687000 0.000000 2.687000 ( 2.687000)
bm_app_fib.rb:
5.880000 0.030000 5.910000 ( 6.151324)
2.687000 0.000000 2.687000 ( 2.687000)
bm_app_mandelbrot.rb:
1.930000 0.010000 1.940000 ( 1.992984)
2.752000 0.000000 2.752000 ( 2.876000)
bm_app_pentomino.rb:
84.160000 0.450000 84.610000 ( 88.205519)
77.117000 0.000000 77.117000 ( 77.117000)
bm_app_raise.rb:
2.600000 0.430000 3.030000 ( 3.169156)
2.162000 0.000000 2.162000 ( 2.162000)
bm_app_strconcat.rb:
1.390000 0.010000 1.400000 ( 1.427766)
1.003000 0.000000 1.003000 ( 1.003000)
bm_app_tak.rb:
7.570000 0.060000 7.630000 ( 7.888381)
2.676000 0.000000 2.676000 ( 2.676000)
bm_app_tarai.rb:
6.020000 0.030000 6.050000 ( 6.186971)
2.236000 0.000000 2.236000 ( 2.236000)
bm_loop_times.rb:
4.240000 0.020000 4.260000 ( 4.404826)
3.354000 0.000000 3.354000 ( 3.354000)
bm_loop_whileloop.rb:
9.450000 0.050000 9.500000 ( 9.678552)
5.037000 0.000000 5.037000 ( 5.037000)
bm_loop_whileloop2.rb:
1.890000 0.010000 1.900000 ( 1.936502)
1.039000 0.000000 1.039000 ( 1.039000)
bm_so_ackermann.rb:
ERROR stack level too deep
4.928000 0.000000 4.928000 ( 4.927000)
bm_so_array.rb:
5.580000 0.020000 5.600000 ( 5.709101)
5.552000 0.000000 5.552000 ( 5.552000)
bm_so_concatenate.rb:
1.580000 0.040000 1.620000 ( 1.647592)
1.602000 0.000000 1.602000 ( 1.602000)
bm_so_exception.rb:
4.170000 0.390000 4.560000 ( 4.597234)
4.683000 0.000000 4.683000 ( 4.683000)
bm_so_lists.rb:
0.970000 0.030000 1.000000 ( 1.036678)
0.814000 0.000000 0.814000 ( 0.814000)
bm_so_matrix.rb:
1.700000 0.010000 1.710000 ( 1.765739)
1.878000 0.000000 1.878000 ( 1.879000)
bm_so_nested_loop.rb:
5.130000 0.020000 5.150000 ( 5.258066)
4.661000 0.000000 4.661000 ( 4.661000)
bm_so_object.rb:
5.480000 0.030000 5.510000 ( 5.615154)
3.095000 0.000000 3.095000 ( 3.095000)
bm_so_random.rb:
1.760000 0.010000 1.770000 ( 1.806116)
1.495000 0.000000 1.495000 ( 1.495000)
bm_so_sieve.rb:
0.680000 0.010000 0.690000 ( 0.705296)
0.853000 0.000000 0.853000 ( 0.853000)
bm_vm1_block.rb:
19.920000 0.110000 20.030000 ( 20.547236)
12.698000 0.000000 12.698000 ( 12.698000)
bm_vm1_const.rb:
15.720000 0.080000 15.800000 ( 16.426734)
7.654000 0.000000 7.654000 ( 7.654000)
bm_vm1_ensure.rb:
14.530000 0.070000 14.600000 ( 15.137106)
7.588000 0.000000 7.588000 ( 7.589000)
bm_vm1_length.rb:
17.230000 0.090000 17.320000 ( 20.406438)
6.415000 0.000000 6.415000 ( 6.416000)
bm_vm1_rescue.rb:
11.520000 0.040000 11.560000 ( 11.736435)
5.604000 0.000000 5.604000 ( 5.603000)
bm_vm1_simplereturn.rb:
17.560000 0.080000 17.640000 ( 18.178065)
5.413000 0.000000 5.413000 ( 5.413000)
bm_vm1_swap.rb:
22.160000 0.110000 22.270000 ( 22.698746)
6.836000 0.000000 6.836000 ( 6.835000)
bm_vm2_array.rb:
5.600000 0.020000 5.620000 ( 5.675354)
1.844000 0.000000 1.844000 ( 1.844000)
bm_vm2_method.rb:
9.800000 0.030000 9.830000 ( 9.918884)
5.152000 0.000000 5.152000 ( 5.151000)
bm_vm2_poly_method.rb:
13.570000 0.050000 13.620000 ( 13.803066)
10.289000 0.000000 10.289000 ( 10.289000)
bm_vm2_poly_method_ov.rb:
3.990000 0.010000 4.000000 ( 4.071277)
1.750000 0.000000 1.750000 ( 1.749000)
bm_vm2_proc.rb:
5.670000 0.020000 5.690000 ( 5.723124)
3.267000 0.000000 3.267000 ( 3.267000)
bm_vm2_regexp.rb:
0.380000 0.000000 0.380000 ( 0.387671)
0.961000 0.000000 0.961000 ( 0.961000)
bm_vm2_send.rb:
3.720000 0.010000 3.730000 ( 3.748266)
2.135000 0.000000 2.135000 ( 2.136000)
bm_vm2_super.rb:
4.100000 0.010000 4.110000 ( 4.138355)
1.781000 0.000000 1.781000 ( 1.781000)
bm_vm2_unif1.rb:
3.320000 0.010000 3.330000 ( 3.348069)
1.385000 0.000000 1.385000 ( 1.385000)
bm_vm2_zsuper.rb:
4.810000 0.020000 4.830000 ( 4.856368)
2.920000 0.000000 2.920000 ( 2.921000)
bm_vm3_thread_create_join.rb:
0.000000 0.000000 0.000000 ( 0.006621)
0.368000 0.000000 0.368000 ( 0.368000)
What is interesting about these numbers is that almost all of them are way faster, and the ones that are slower are so by a quite narrow margin (except for the regexp and thread tests).
So, this is the baseline against MRI. Let's take a look at a few other benchmarks. These can all be found in JRuby, in the test/bench directory. When you run them, all of them generate 5 or 10 runs of all measures, but I've simply taken the best one for each. The repetition is to allow Java to warm up. Here are a few different benchmarks, same convention and running parameters as above:
bench_fib_recursive.rb
----------------------
1.390000 0.000000 1.390000 ( 1.412710)
0.532000 0.000000 0.532000 ( 0.532000)
bench_method_dispatch.rb
------------------------
Control: 1m loops accessing a local variable 100 times
2.830000 0.010000 2.840000 ( 2.864822)
0.105000 0.000000 0.105000 ( 0.105000)
Test STI: 1m loops accessing a fixnum var and calling to_i 100 times
10.000000 0.030000 10.030000 ( 10.100846)
2.111000 0.000000 2.111000 ( 2.111000)
Test ruby method: 1m loops calling self's foo 100 times
16.130000 0.060000 16.190000 ( 16.359876)
7.971000 0.000000 7.971000 ( 7.971000)
bench_method_dispatch_only.rb
-----------------------------
Test ruby method: 100k loops calling self's foo 100 times
1.570000 0.000000 1.570000 ( 1.588715)
0.587000 0.000000 0.587000 ( 0.587000)
bench_block_invocation.rb
-------------------------
1m loops yielding a fixnum 10 times to a block that just retrieves dvar
2.800000 0.010000 2.810000 ( 2.822425)
1.194000 0.000000 1.194000 ( 1.194000)
1m loops yielding two fixnums 10 times to block accessing one
6.550000 0.030000 6.580000 ( 6.623452)
2.064000 0.000000 2.064000 ( 2.064000)
1m loops yielding three fixnums 10 times to block accessing one
7.390000 0.020000 7.410000 ( 7.467841)
2.120000 0.000000 2.120000 ( 2.120000)
1m loops yielding three fixnums 10 times to block splatting and accessing them
9.250000 0.040000 9.290000 ( 9.339131)
2.451000 0.000000 2.451000 ( 2.451000)
1m loops yielding a fixnums 10 times to block with just a fixnum (no vars)
1.890000 0.000000 1.890000 ( 1.908501)
1.278000 0.000000 1.278000 ( 1.277000)
1m loops calling a method with a fixnum that just returns it
2.740000 0.010000 2.750000 ( 2.766255)
1.426000 0.000000 1.426000 ( 1.426000)
bench_string_ops.rb
----
Measure string array sort time
5.950000 0.060000 6.010000 ( 6.055483)
8.061000 0.000000 8.061000 ( 8.061000)
Measure hash put time
0.390000 0.010000 0.400000 ( 0.398593)
0.208000 0.000000 0.208000 ( 0.209000)
Measure hash get time (note: not same scale as put test)
1.620000 0.000000 1.620000 ( 1.646155)
0.740000 0.000000 0.740000 ( 0.740000)
Measure string == comparison time
2.340000 0.010000 2.350000 ( 2.368579)
0.812000 0.000000 0.812000 ( 0.812000)
Measure string == comparison time, different last pos
2.690000 0.000000 2.690000 ( 2.724772)
0.860000 0.000000 0.860000 ( 0.860000)
Measure string <=> comparison time
2.340000 0.010000 2.350000 ( 2.369915)
0.824000 0.000000 0.824000 ( 0.824000)
Measure 'string'.index(fixnum) time
0.790000 0.010000 0.800000 ( 0.808189)
1.113000 0.000000 1.113000 ( 1.113000)
Measure 'string'.index(str) time
2.860000 0.010000 2.870000 ( 2.892730)
0.956000 0.000000 0.956000 ( 0.956000)
Measure 'string'.rindex(fixnum) time
0.800000 0.000000 0.800000 ( 0.817300)
0.631000 0.000000 0.631000 ( 0.631000)
Measure 'string'.rindex(str) time
12.190000 0.040000 12.230000 ( 12.310492)
1.247000 0.000000 1.247000 ( 1.247000)
bench_ivar_access.rb
----
100k * 100 ivar gets, 1 ivar
0.500000 0.000000 0.500000 ( 0.582682)
0.340000 0.000000 0.340000 ( 0.340000)
100k * 100 ivar sets, 1 ivar
0.700000 0.010000 0.710000 ( 0.816724)
0.402000 0.000000 0.402000 ( 0.401000)
100k * 100 attr gets, 1 ivar
0.970000 0.000000 0.970000 ( 0.988212)
0.875000 0.000000 0.875000 ( 0.874000)
100k * 100 attr sets, 1 ivar
1.390000 0.010000 1.400000 ( 1.406535)
1.114000 0.000000 1.114000 ( 1.114000)
100k * 100 ivar gets, 2 ivars
0.490000 0.000000 0.490000 ( 0.506206)
0.344000 0.000000 0.344000 ( 0.344000)
100k * 100 ivar sets, 2 ivars
0.680000 0.000000 0.680000 ( 0.693064)
0.388000 0.000000 0.388000 ( 0.388000)
100k * 100 attr gets, 2 ivars
0.970000 0.000000 0.970000 ( 0.989313)
0.878000 0.000000 0.878000 ( 0.878000)
100k * 100 attr sets, 2 ivars
1.400000 0.000000 1.400000 ( 1.434206)
1.129000 0.000000 1.129000 ( 1.128000)
100k * 100 ivar gets, 4 ivars
0.490000 0.000000 0.490000 ( 0.502097)
0.340000 0.000000 0.340000 ( 0.340000)
100k * 100 ivar sets, 4 ivars
0.690000 0.000000 0.690000 ( 0.696852)
0.389000 0.000000 0.389000 ( 0.389000)
100k * 100 attr gets, 4 ivars
0.970000 0.010000 0.980000 ( 0.986163)
0.872000 0.000000 0.872000 ( 0.872000)
100k * 100 attr sets, 4 ivars
1.370000 0.010000 1.380000 ( 1.394921)
1.128000 0.000000 1.128000 ( 1.128000)
100k * 100 ivar gets, 8 ivars
0.500000 0.000000 0.500000 ( 0.519511)
0.344000 0.000000 0.344000 ( 0.344000)
100k * 100 ivar sets, 8 ivars
0.690000 0.000000 0.690000 ( 0.710896)
0.389000 0.000000 0.389000 ( 0.389000)
100k * 100 attr gets, 8 ivars
0.970000 0.000000 0.970000 ( 0.987582)
0.870000 0.000000 0.870000 ( 0.870000)
100k * 100 attr sets, 8 ivars
1.380000 0.000000 1.380000 ( 1.400542)
1.132000 0.000000 1.132000 ( 1.132000)
100k * 100 ivar gets, 16 ivars
0.500000 0.000000 0.500000 ( 0.523690)
0.342000 0.000000 0.342000 ( 0.343000)
100k * 100 ivar sets, 16 ivars
0.680000 0.000000 0.680000 ( 0.707385)
0.391000 0.000000 0.391000 ( 0.391000)
100k * 100 attr gets, 16 ivars
0.970000 0.010000 0.980000 ( 1.017880)
0.879000 0.000000 0.879000 ( 0.879000)
100k * 100 attr sets, 16 ivars
1.370000 0.010000 1.380000 ( 1.387713)
1.128000 0.000000 1.128000 ( 1.128000)
bench_for_loop.rb
----
100k calls to a method containing 5x a for loop over a 10-element range
0.890000 0.000000 0.890000 ( 0.917563)
0.654000 0.000000 0.654000 ( 0.654000)
All of this is really good, of course. Lots of green. And the red parts are not that bad either.
But here is the mystery. General Rails performance sucks. This test case can be ran by checking out petstore from tw-commons at RubyForge. There is a file called script/console_bench that will run the benchmarks. The two commands ran was: ruby script/console_bench production, and jruby -J-server -O script/console_bench ar_jdbc. With further ado, here are the numbers:
controller : 1.000000 0.070000 1.070000 ( 1.472701)
controller : 2.144000 0.000000 2.144000 ( 2.144000)
view : 4.470000 0.160000 4.630000 ( 4.794336)
view : 6.335000 0.000000 6.335000 ( 6.335000)
full action: 8.300000 0.400000 8.700000 ( 9.597351)
full action: 16.055000 0.000000 16.055000 ( 16.055000)
These numbers plainly stink. And we can't find the reason for it. As you can see, most benchmarks are much better, and there is nothing we can think of that would add up to a general degradation like this. It can't be IO since these tests use app.get directly. So what is it? There is very likely to be one or two bottlenecks causing this problem, and when we have found it, Rails will most likely blaze along. But profiling haven't helped yet. We found a few things, but there is still stuff missing. It's an interesting problem.
My current thesis is that either symbols or regexps are responsible. I'll spend the day checking that.