Monday, September 24, 2007

Can bytecodes perform well?

I really need to answer a comment by Kragen Sitaker that appeared on my post about Rubinius. It makes two points, and I want to address one of them at some length. So here goes. The comment read like this:

"It is byte code based. This means it's easier to handle performance."

If by "handle" you mean "not have".

"garbage collection/object memory can be switched out to use another algorithm"

In your dreams. I suspect you'll find out that the architecture isn't as clean as you think it is when you try to plug in some other existing garbage collector or object memory.

It sounds like a good project, but I don't think its advantages are going to be so enormous as you think they are.

Whoa. That's hard to argue with. Isn't it? Or maybe not. Let's begin with the second point. Is Rubinius' architecture clean and decoupled enough to allow swapping of GC and object memory? My answer is a resounding yes. But of course you can go and look for yourself. Of course it's possible to make a system where you can decouple the GC. Sun's standard JVM does this, for example. But the interesting point isn't whether Rubinius' hooks for this are clean enough, but whether they are cleaner than MRI's. If you have ever tried to switch out the GC in MRI, you know that Rubinius beats MRI hands down in this regard. If not, you can ask Laurent Sansonetti at Apple, who has actually done most of the work of switching out the GC in MRI, whether that was a fun experience.
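To make the decoupling argument concrete, here is a minimal sketch of what it can look like when the collector sits behind a small interface that the rest of the runtime calls. This is not Rubinius' actual API; the names (gc_strategy, vm_allocate, vm_use_gc) are invented for illustration.

```c
/* A hypothetical, minimal "pluggable GC" interface, just to illustrate the
 * idea of decoupling; this is not Rubinius' actual API. */
#include <stddef.h>

typedef struct object object;

typedef struct gc_strategy gc_strategy;
struct gc_strategy {
    void *(*allocate)(gc_strategy *self, size_t bytes);
    void  (*collect)(gc_strategy *self);
    void  (*write_barrier)(gc_strategy *self, object *holder, object *value);
    void  (*shutdown)(gc_strategy *self);
};

/* The rest of the runtime only ever talks to the active strategy... */
static gc_strategy *active_gc;

void *vm_allocate(size_t bytes) {
    return active_gc->allocate(active_gc, bytes);
}

/* ...so swapping collectors means installing a different table of function
 * pointers at boot time, without touching the interpreter or the libraries. */
void vm_use_gc(gc_strategy *gc) {
    active_gc = gc;
}
```

Whether any given VM keeps its boundaries this clean in practice is exactly what the comment below disputes, but this is the shape of the hook Rubinius provides.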

Let's see, what's next? Oh yeah. Bytecodes are always slow. No argument here. C++ will always beat a bytecode based engine. And C will almost always beat C++. And assembler (in the right hands) will usually beat C. But wait... Isn't Java bytecode based? And doesn't Java 6 perform on the same level as C in many cases, and in some cases even better? Well, yes, it does... And wasn't Smalltalk always based on bytecodes? Most Smalltalk engines performed very well. Why is MRI switching to bytecodes for 1.9? And why has Python always been bytecode based? And why is the CLR bytecode based? Why was even Pascal using bytecodes back in the day? (OK, that is cheating... Pascal used bytecodes for portability, not performance, but it still worked well in that respect too.) Erlang is bytecode based.

Basically, there are some cases where a static language will benefit from straight compilation down to hardware machine code. OCaml is a typical example of such a language. Due to the extremely stringent type requirements of the language, the emitted code is usually faster than C. But that is the exception, and it only works for bondage-tightly typed languages. When talking dynamic languages, bytecodes are the real path to good performance. Granted, a naive implementation of a bytecode engine will not perform well. But that is true for a compiler too. The difference is that the part interpreting bytecodes is usually quite a small part of the whole runtime system, and it can be switched out for better performance, piecemeal or altogether.
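To show how small that interpreting part typically is, here is a toy dispatch loop for a made-up three-instruction bytecode set. The opcodes and stack layout are invented for illustration; a real VM would add threaded dispatch, inline caches or a JIT behind the same instruction stream.

```c
/* A toy bytecode interpreter: the whole "slow" part of a naive VM is roughly
 * this loop, which is exactly the piece you can later replace with threaded
 * code or a JIT without touching the rest of the runtime. The instruction
 * set here is invented for illustration. */
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const int *code) {
    int stack[64];
    int sp = 0;            /* stack pointer   */
    int pc = 0;            /* program counter */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];          break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
        case OP_PRINT: printf("%d\n", stack[--sp]);       break;
        case OP_HALT:  return;
        }
    }
}

int main(void) {
    /* push 2; push 3; add; print; halt  =>  prints 5 */
    const int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(program);
    return 0;
}
```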

There are other reasons too. For example, statically typed languages like Java and the CLR family of languages use bytecodes because it gives the runtime system the opportunity to dynamically change the running machine code based on statistics and criteria gathered at runtime. This means that your application will actually have better performance in the parts where it counts, and the parts that are not heavily used will not be optimized. (That's what HotSpot does, for example.) This is not possible in a clean compilation to machine code. Java would never have had the performance it has now if it weren't for the bytecodes.
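As a rough illustration of that feedback loop, a runtime can simply count how often each method is entered and hand only the hot ones to a compiler. The method struct, the threshold and compile_to_native() below are made up for this sketch; HotSpot's real machinery is far more sophisticated.

```c
/* A rough sketch of profile-driven compilation: count invocations in the
 * interpreter and compile only methods that prove to be hot. The method
 * struct, the threshold and compile_to_native() are invented for this
 * example; real JITs such as HotSpot use much richer profiles. */
#include <stddef.h>

#define HOT_THRESHOLD 10000

typedef struct method {
    const int *bytecode;
    long       invocations;
    void     (*native_code)(void);   /* NULL until the JIT has compiled it */
} method;

static void interpret(const int *bytecode) { (void)bytecode; /* slow path */ }

static void compiled_stub(void) { /* stands in for generated machine code */ }

static void (*compile_to_native(const int *bytecode))(void) {
    (void)bytecode;
    return compiled_stub;            /* a real JIT would emit machine code */
}

void invoke(method *m) {
    if (m->native_code) {            /* fast path: already compiled */
        m->native_code();
        return;
    }
    if (++m->invocations >= HOT_THRESHOLD) {
        /* This method turned out to matter, so spend compile time on it. */
        m->native_code = compile_to_native(m->bytecode);
        m->native_code();
        return;
    }
    interpret(m->bytecode);          /* cold code stays interpreted */
}
```

The point is that the decision of what to optimize is made from observed behavior, which a one-shot ahead-of-time compiler never gets to see.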

So please, stop spreading this myth. It is NOT true and it has NEVER been true.

8 comments:

Anonymous said...

Minor nit: OCaml isn't usually faster than C. It is occasionally faster than C, and this is as much a result of the compilation strategy and runtime representation of things as it is the type system.

Anonymous said...

Of course, it's possible to make a system where you can decouple the GC. Sun's standard JVM does this, for example.

I'm not saying it's impossible to do, of course. I'm saying:

1. The architecture isn't as clean as you think it is --- that is, the architecture will probably require some changes when you try to change the garbage collector in a significant way. This is just a variant of the observation that the only really portable code is code that's actually been ported.

2. When you try to plug in some other existing garbage collector or object memory. I've never seen a pluggable GC system that was sufficiently abstract that you could take some random GC off the shelf and plug it in with a small amount of work; typically adapting an existing GC to the system, when it's even possible, is more work than writing a simple GC.

It's just that the properties of the GC and object memory are deeply coupled with the rest of the system, including programs written in it. For example, it's pretty much impossible to put a compacting GC into a system written for a nonmoving GC; if your code uses "become:" to any significant extent, the kinds of object memory you can use are quite restricted (Squeak gets away with just not using "become:" much); write barriers need to be integrated into code generation in order to be reasonably efficient (and they are necessary if you want to have a generational or incremental GC); and finalization is always a mess, in part because the guarantees you can provide about finalization always depend deeply on the approach taken by your garbage collector.
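To make the write-barrier point concrete, here is a generic card-marking barrier sketch, not any particular VM's implementation: every pointer store the code generator emits has to include it, which is why the collector and the compiler cannot be designed in isolation. The heap layout constants are invented for illustration.

```c
/* A generic card-marking write barrier of the kind a generational GC needs.
 * Every pointer store the code generator emits must include this check,
 * which is why the barrier cannot be bolted on after the fact. The heap
 * layout constants are invented for illustration. */
#include <stdint.h>

#define HEAP_BASE   ((uintptr_t)0x10000000)  /* hypothetical heap start */
#define CARD_SHIFT  9                        /* 512-byte cards          */
#define NUM_CARDS   (1u << 20)

static uint8_t card_table[NUM_CARDS];

typedef struct object { struct object *field; } object;

/* Inlined by the compiler at every "holder->field = value" store. */
static inline void store_with_barrier(object *holder, object *value) {
    holder->field = value;
    /* Remember that this card may now hold an old-to-young pointer, so a
     * minor collection can find it without scanning the whole old heap. */
    card_table[((uintptr_t)holder - HEAP_BASE) >> CARD_SHIFT] = 1;
}
```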

I hope this is a sufficient explanation of my GC/OM remark.

Anonymous said...

Oh, I should say that you may well be right that Rubinius gives you more flexibility in garbage-collection approaches than MRI; I'm not familiar with the internals of either one of those. I'm just saying that I don't expect that to be the enormous advantage that you seemed to think it was.

Ola Bini said...

@kragen: fair enough. I know for a fact that Evan has gone through two totally different GC implementations, switching them out in a few hours at most. So my comments about the ease of this were based on actual results.

Unknown said...

For example, [...] languages use bytecodes because it gives the runtime system the opportunity to dynamically change the machine code running based on statistics and criteria found out during runtime. [...] This is not possible in a clean compilation to machine code.

Nothing's impossible. See Dynamo (tech report).

Jengu said...

I have to disagree with the byte code bit. "Isn't Java as fast as C in some cases?" Well yes, when the JIT kicks in and compiles the code so it's not byte code anymore :P

Using Java is also a bit of an apples and oranges comparison, because Java could be faster than C in some areas because it has fewer unsafe language features, i.e., it is more restrictive (no direct pointer arithmetic means fewer aliasing issues). Ruby/Python, on the other hand, are pretty damn open to anything getting changed at any time by any part of the code, which means writing optimizations will be harder. Heterogeneous lists of variable size are also a slow data structure, but they're the bread and butter of Ruby/Python coding.
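As a rough illustration of why heterogeneous lists cost, even summing an array means checking a type tag and branching per element. The tagged layout below is a generic sketch, not Ruby's or Python's actual object model.

```c
/* Why heterogeneous containers cost: every element carries a type tag and
 * every operation dispatches on it. This tagged-union layout is a generic
 * sketch, not Ruby's or Python's actual object representation. */
#include <stdio.h>

typedef enum { TAG_INT, TAG_DOUBLE, TAG_STRING } tag;

typedef struct {
    tag t;
    union { long i; double d; const char *s; } as;
} value;

static double sum(const value *items, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        switch (items[i].t) {        /* per-element type check and branch */
        case TAG_INT:    total += (double)items[i].as.i; break;
        case TAG_DOUBLE: total += items[i].as.d;         break;
        default:         /* e.g. raise a TypeError */    break;
        }
    }
    return total;                    /* a plain C long[] loop needs none of this */
}

int main(void) {
    value xs[] = { { TAG_INT,    { .i = 1 } },
                   { TAG_DOUBLE, { .d = 2.5 } },
                   { TAG_STRING, { .s = "three" } } };
    printf("%f\n", sum(xs, 3));      /* 3.5, skipping the string */
    return 0;
}
```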

BTW, OCaml supports bytecode compilation as well as native compilation. Its main problem seems to be that its floating point performance sucks compared to C, but I think that's a compiler issue.

Anonymous said...

The JVM is only fast when it has JIT compiled the bytecode... and then it's not running bytecode anymore, it's running native machine code. Performance isn't the main reason for any language to use bytecode, at least not for any lisps or MLs that I'm aware of.

Anonymous said...

Have you had a chance to look at the Tamarin VM that is being integrated by the Firefox team?