Ola Bini: Programming Language Synchronicity: Mystery: An exposé on JRuby performance

So, after Charlies awesome work yesterday (documented here), I felt it was time I put the story straight on general JRuby performance. It's something of a mystery. I'll begin by showing you the results of lots of different benchmarks and comparison to MRI. All these benchmarks have been run on my MacBook Pro, dual 2.33Ghz cores with 2Gb memory. JRuby revision is 4578, Java is Apple 1.6 beta, and Ruby version is 1.8.6 (2007-03-13 patchlevel 0). JRuby was ran with -J-server and -O.

Now, lets begin with the YARV benchmarks. The first values is always MRI, the second i JRuby. I've made the benchmark red when MRI is faster, and green when JRuby is faster.

bm_app_answer.rb:
0.480000   0.010000   0.490000 (  0.511641)
0.479000   0.000000   0.479000 (  0.478000)
bm_app_factorial.rb:
ERROR stack level too deep
2.687000   0.000000   2.687000 (  2.687000)
bm_app_fib.rb:
5.880000   0.030000   5.910000 (  6.151324)
2.687000   0.000000   2.687000 (  2.687000)
bm_app_mandelbrot.rb:
1.930000   0.010000   1.940000 (  1.992984)
2.752000   0.000000   2.752000 (  2.876000)
bm_app_pentomino.rb:
84.160000   0.450000  84.610000 ( 88.205519)
77.117000   0.000000  77.117000 ( 77.117000)
bm_app_raise.rb:
2.600000   0.430000   3.030000 (  3.169156)
2.162000   0.000000   2.162000 (  2.162000)
bm_app_strconcat.rb:
1.390000   0.010000   1.400000 (  1.427766)
1.003000   0.000000   1.003000 (  1.003000)
bm_app_tak.rb:
7.570000   0.060000   7.630000 (  7.888381)
2.676000   0.000000   2.676000 (  2.676000)
bm_app_tarai.rb:
6.020000   0.030000   6.050000 (  6.186971)
2.236000   0.000000   2.236000 (  2.236000)
bm_loop_times.rb:
4.240000   0.020000   4.260000 (  4.404826)
3.354000   0.000000   3.354000 (  3.354000)
bm_loop_whileloop.rb:
9.450000   0.050000   9.500000 (  9.678552)
5.037000   0.000000   5.037000 (  5.037000)
bm_loop_whileloop2.rb:
1.890000   0.010000   1.900000 (  1.936502)
1.039000   0.000000   1.039000 (  1.039000)
bm_so_ackermann.rb:
ERROR stack level too deep
4.928000   0.000000   4.928000 (  4.927000)
bm_so_array.rb:
5.580000   0.020000   5.600000 (  5.709101)
5.552000   0.000000   5.552000 (  5.552000)
bm_so_concatenate.rb:
1.580000   0.040000   1.620000 (  1.647592)
1.602000   0.000000   1.602000 (  1.602000)
bm_so_exception.rb:
4.170000   0.390000   4.560000 (  4.597234)
4.683000   0.000000   4.683000 (  4.683000)
bm_so_lists.rb:
0.970000   0.030000   1.000000 (  1.036678)
0.814000   0.000000   0.814000 (  0.814000)
bm_so_matrix.rb:
1.700000   0.010000   1.710000 (  1.765739)
1.878000   0.000000   1.878000 (  1.879000)
bm_so_nested_loop.rb:
5.130000   0.020000   5.150000 (  5.258066)
4.661000   0.000000   4.661000 (  4.661000)
bm_so_object.rb:
5.480000   0.030000   5.510000 (  5.615154)
3.095000   0.000000   3.095000 (  3.095000)
bm_so_random.rb:
1.760000   0.010000   1.770000 (  1.806116)
1.495000   0.000000   1.495000 (  1.495000)
bm_so_sieve.rb:
0.680000   0.010000   0.690000 (  0.705296)
0.853000   0.000000   0.853000 (  0.853000)
bm_vm1_block.rb:
19.920000   0.110000  20.030000 ( 20.547236)
12.698000   0.000000  12.698000 ( 12.698000)
bm_vm1_const.rb:
15.720000   0.080000  15.800000 ( 16.426734)
7.654000   0.000000   7.654000 (  7.654000)
bm_vm1_ensure.rb:
14.530000   0.070000  14.600000 ( 15.137106)
7.588000   0.000000   7.588000 (  7.589000)
bm_vm1_length.rb:
17.230000   0.090000  17.320000 ( 20.406438)
6.415000   0.000000   6.415000 (  6.416000)
bm_vm1_rescue.rb:
11.520000   0.040000  11.560000 ( 11.736435)
5.604000   0.000000   5.604000 (  5.603000)
bm_vm1_simplereturn.rb:
17.560000   0.080000  17.640000 ( 18.178065)
5.413000   0.000000   5.413000 (  5.413000)
bm_vm1_swap.rb:
22.160000   0.110000  22.270000 ( 22.698746)
6.836000   0.000000   6.836000 (  6.835000)
bm_vm2_array.rb:
5.600000   0.020000   5.620000 (  5.675354)
1.844000   0.000000   1.844000 (  1.844000)
bm_vm2_method.rb:
9.800000   0.030000   9.830000 (  9.918884)
5.152000   0.000000   5.152000 (  5.151000)
bm_vm2_poly_method.rb:
13.570000   0.050000  13.620000 ( 13.803066)
10.289000   0.000000  10.289000 ( 10.289000)
bm_vm2_poly_method_ov.rb:
3.990000   0.010000   4.000000 (  4.071277)
1.750000   0.000000   1.750000 (  1.749000)
bm_vm2_proc.rb:
5.670000   0.020000   5.690000 (  5.723124)
3.267000   0.000000   3.267000 (  3.267000)
bm_vm2_regexp.rb:
0.380000   0.000000   0.380000 (  0.387671)
0.961000   0.000000   0.961000 (  0.961000)
bm_vm2_send.rb:
3.720000   0.010000   3.730000 (  3.748266)
2.135000   0.000000   2.135000 (  2.136000)
bm_vm2_super.rb:
4.100000   0.010000   4.110000 (  4.138355)
1.781000   0.000000   1.781000 (  1.781000)
bm_vm2_unif1.rb:
3.320000   0.010000   3.330000 (  3.348069)
1.385000   0.000000   1.385000 (  1.385000)
bm_vm2_zsuper.rb:
4.810000   0.020000   4.830000 (  4.856368)
2.920000   0.000000   2.920000 (  2.921000)
bm_vm3_thread_create_join.rb:
0.000000   0.000000   0.000000 (  0.006621)
0.368000   0.000000   0.368000 (  0.368000)

What is interesting about these numbers is that almost all of them are way faster, and the ones that are slower are so by a quite narrow margin (except for the regexp and thread tests).

So, this is the baseline against MRI. Let's take a look at a few other benchmarks. These can all be found in JRuby, in the test/bench directory. When you run them, all of them generate 5 or 10 runs of all measures, but I've simply taken the best one for each. The repetition is to allow Java to warm up. Here are a few different benchmarks, same convention and running parameters as above:

bench_fib_recursive.rb
----------------------
  1.390000   0.000000   1.390000 (  1.412710)
0.532000   0.000000   0.532000 (  0.532000)

bench_method_dispatch.rb
------------------------
Control: 1m loops accessing a local variable 100 times
2.830000   0.010000   2.840000 (  2.864822)
0.105000   0.000000   0.105000 (  0.105000)
Test STI: 1m loops accessing a fixnum var and calling to_i 100 times
10.000000   0.030000  10.030000 ( 10.100846)
2.111000   0.000000   2.111000 (  2.111000)
Test ruby method: 1m loops calling self's foo 100 times
16.130000   0.060000  16.190000 ( 16.359876)
7.971000   0.000000   7.971000 (  7.971000)

bench_method_dispatch_only.rb
-----------------------------
Test ruby method: 100k loops calling self's foo 100 times
1.570000   0.000000   1.570000 (  1.588715)
0.587000   0.000000   0.587000 (  0.587000)

bench_block_invocation.rb
-------------------------
1m loops yielding a fixnum 10 times to a block that just retrieves dvar
2.800000   0.010000   2.810000 (  2.822425)
1.194000   0.000000   1.194000 (  1.194000)
1m loops yielding two fixnums 10 times to block accessing one
6.550000   0.030000   6.580000 (  6.623452)
2.064000   0.000000   2.064000 (  2.064000)
1m loops yielding three fixnums 10 times to block accessing one
7.390000   0.020000   7.410000 (  7.467841)
2.120000   0.000000   2.120000 (  2.120000)
1m loops yielding three fixnums 10 times to block splatting and accessing them
9.250000   0.040000   9.290000 (  9.339131)
2.451000   0.000000   2.451000 (  2.451000)
1m loops yielding a fixnums 10 times to block with just a fixnum (no vars)
1.890000   0.000000   1.890000 (  1.908501)
1.278000   0.000000   1.278000 (  1.277000)
1m loops calling a method with a fixnum that just returns it
2.740000   0.010000   2.750000 (  2.766255)
1.426000   0.000000   1.426000 (  1.426000)

bench_string_ops.rb
----
Measure string array sort time
5.950000   0.060000   6.010000 (  6.055483)
8.061000   0.000000   8.061000 (  8.061000)
Measure hash put time
0.390000   0.010000   0.400000 (  0.398593)
0.208000   0.000000   0.208000 (  0.209000)
Measure hash get time (note: not same scale as put test)
1.620000   0.000000   1.620000 (  1.646155)
0.740000   0.000000   0.740000 (  0.740000)
Measure string == comparison time
2.340000   0.010000   2.350000 (  2.368579)
0.812000   0.000000   0.812000 (  0.812000)
Measure string == comparison time, different last pos
2.690000   0.000000   2.690000 (  2.724772)
0.860000   0.000000   0.860000 (  0.860000)
Measure string <=> comparison time
2.340000   0.010000   2.350000 (  2.369915)
0.824000   0.000000   0.824000 (  0.824000)
Measure 'string'.index(fixnum) time
0.790000   0.010000   0.800000 (  0.808189)
1.113000   0.000000   1.113000 (  1.113000)
Measure 'string'.index(str) time
2.860000   0.010000   2.870000 (  2.892730)
0.956000   0.000000   0.956000 (  0.956000)
Measure 'string'.rindex(fixnum) time
0.800000   0.000000   0.800000 (  0.817300)
0.631000   0.000000   0.631000 (  0.631000)
Measure 'string'.rindex(str) time
12.190000   0.040000  12.230000 ( 12.310492)
1.247000   0.000000   1.247000 (  1.247000)

bench_ivar_access.rb
----
100k * 100 ivar gets, 1 ivar
0.500000   0.000000   0.500000 (  0.582682)
0.340000   0.000000   0.340000 (  0.340000)
100k * 100 ivar sets, 1 ivar
0.700000   0.010000   0.710000 (  0.816724)
0.402000   0.000000   0.402000 (  0.401000)
100k * 100 attr gets, 1 ivar
0.970000   0.000000   0.970000 (  0.988212)
0.875000   0.000000   0.875000 (  0.874000)
100k * 100 attr sets, 1 ivar
1.390000   0.010000   1.400000 (  1.406535)
1.114000   0.000000   1.114000 (  1.114000)
100k * 100 ivar gets, 2 ivars
0.490000   0.000000   0.490000 (  0.506206)
0.344000   0.000000   0.344000 (  0.344000)
100k * 100 ivar sets, 2 ivars
0.680000   0.000000   0.680000 (  0.693064)
0.388000   0.000000   0.388000 (  0.388000)
100k * 100 attr gets, 2 ivars
0.970000   0.000000   0.970000 (  0.989313)
0.878000   0.000000   0.878000 (  0.878000)
100k * 100 attr sets, 2 ivars
1.400000   0.000000   1.400000 (  1.434206)
1.129000   0.000000   1.129000 (  1.128000)
100k * 100 ivar gets, 4 ivars
0.490000   0.000000   0.490000 (  0.502097)
0.340000   0.000000   0.340000 (  0.340000)
100k * 100 ivar sets, 4 ivars
0.690000   0.000000   0.690000 (  0.696852)
0.389000   0.000000   0.389000 (  0.389000)
100k * 100 attr gets, 4 ivars
0.970000   0.010000   0.980000 (  0.986163)
0.872000   0.000000   0.872000 (  0.872000)
100k * 100 attr sets, 4 ivars
1.370000   0.010000   1.380000 (  1.394921)
1.128000   0.000000   1.128000 (  1.128000)
100k * 100 ivar gets, 8 ivars
0.500000   0.000000   0.500000 (  0.519511)
0.344000   0.000000   0.344000 (  0.344000)
100k * 100 ivar sets, 8 ivars
0.690000   0.000000   0.690000 (  0.710896)
0.389000   0.000000   0.389000 (  0.389000)
100k * 100 attr gets, 8 ivars
0.970000   0.000000   0.970000 (  0.987582)
0.870000   0.000000   0.870000 (  0.870000)
100k * 100 attr sets, 8 ivars
1.380000   0.000000   1.380000 (  1.400542)
1.132000   0.000000   1.132000 (  1.132000)
100k * 100 ivar gets, 16 ivars
0.500000   0.000000   0.500000 (  0.523690)
0.342000   0.000000   0.342000 (  0.343000)
100k * 100 ivar sets, 16 ivars
0.680000   0.000000   0.680000 (  0.707385)
0.391000   0.000000   0.391000 (  0.391000)
100k * 100 attr gets, 16 ivars
0.970000   0.010000   0.980000 (  1.017880)
0.879000   0.000000   0.879000 (  0.879000)
100k * 100 attr sets, 16 ivars
1.370000   0.010000   1.380000 (  1.387713)
1.128000   0.000000   1.128000 (  1.128000)

bench_for_loop.rb
----
100k calls to a method containing 5x a for loop over a 10-element range
0.890000   0.000000   0.890000 (  0.917563)
0.654000   0.000000   0.654000 (  0.654000)

All of this is really good, of course. Lots of green. And the red parts are not that bad either.

But here is the mystery. General Rails performance sucks. This test case can be ran by checking out petstore from tw-commons at RubyForge. There is a file called script/console_bench that will run the benchmarks. The two commands ran was: ruby script/console_bench production, and jruby -J-server -O script/console_bench ar_jdbc. With further ado, here are the numbers:

controller :   1.000000   0.070000   1.070000 (  1.472701)
controller :   2.144000   0.000000   2.144000 (  2.144000)
view       :   4.470000   0.160000   4.630000 (  4.794336)
view       :   6.335000   0.000000   6.335000 (  6.335000)
full action:   8.300000   0.400000   8.700000 (  9.597351)
full action:  16.055000   0.000000  16.055000 ( 16.055000)

These numbers plainly stink. And we can't find the reason for it. As you can see, most benchmarks are much better, and there is nothing we can think of that would add up to a general degradation like this. It can't be IO since these tests use app.get directly. So what is it? There is very likely to be one or two bottlenecks causing this problem, and when we have found it, Rails will most likely blaze along. But profiling haven't helped yet. We found a few things, but there is still stuff missing. It's an interesting problem.

My current thesis is that either symbols or regexps are responsible. I'll spend the day checking that.

8 kommentarer:

Anonym sa...: If I remember correctly from last time I looked at it, Rails's routing code leans very heavily on regular expressions for route recognition. If I had to put the blame somewhere for poor rails performance in JRuby, that would be the first place I'd look.; 1:14:00 em
Anonym sa...: Ugh... so much for us red/green color blind folks.

That thin colored text on the white background is hard to pick apart. :(; 3:45:00 em
Anonym sa...: Hey, thanks for the benchmark numbers, wouldn't it be easier to just tag each line like so:

MRI: 0000.000 0000.0000 0000.0000
JRB: 1111.111 3333.3333 9999.9999; 7:00:00 em
Adam Sanderson sa...: Any chance it's the rails class reloading? I would assume it is disabled, but I could see that having fairly different performance characteristics between MRI and JRuby.; 7:17:00 em
Okke sa...: It sounds a bit like "Look, all my unit tests succeed so my application must work but what the ****, it crashes??"

Two possibilities imho:

1. Granularity of the benchmarks is not representative for a real life rails case.
2. Unforeseen behavior in rails that should have been benchmarked but did not made into the benchmarks

Its probably a combination. Rails simply stretches ruby to every edge it can find.

Finally, it is likely rails has been tested for performance on the MRI, not on JRuby.

Nevertheless, great figures. And given the JRuby progress that have been made lately, I bet it's nothing but a matter of time and you can show green benchmark results for Rails also.; 11:20:00 em
Anonym sa...: Route regognition does depend heavily on regular expressions, but as it only happens once a request it's unlikely to be a huge bottleneck.

You're running in production mode so the class-loading should be one-off.

Your benchmark rig seems quite a bit harder than the once I've been using(http://p.caboo.se/106832) not sure if it's worth trying that.

Grab me on IRC if you find anything interesting, always keen to apply benchmark driven patches :); 9:49:00 fm
joevandyk sa...: Have you looked into using dtrace to profile what's happening?; 9:59:00 em
Anonym sa...: http://joyeur.com/2007/05/07/dtrace-for-ruby-is-available; 2:54:00 em

Skicka en kommentar

Ola Bini: Programming Language Synchronicity

fredag, oktober 12, 2007

Mystery: An exposé on JRuby performance

8 kommentarer:

About me

People

Links

Bloggarkiv

Etiketter