Thursday, August 02, 2007

The pain of compiling try-catch

I've been spending some time trying to implement a compiler for the defined? feature of Ruby. If you haven't seen it, be happy. It's quite annoying, and incredibly complicated to implement, since you basically need to write a small interpreter just for the nodes that can appear inside defined?. So why is defined? so important? Well, for one, it's actually needed to implement the ||= construct correctly. And that is used everywhere, which means that not compiling it will severely impact our ability to compile code. Also, it just so happens that OpAsgnOrNode (as it's called) and EnsureNode are the two nodes left to implement before we can compile the Test::Unit assert methods, since the internal _wrap_assertion uses both ensure and ||=.
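To make the link between ||= and defined? concrete, here is a small sketch in plain Ruby. The names Settings, LIMIT and MISSING are made up for illustration, and the expansion in the comment only approximates the semantics; it is not what JRuby actually emits.

```ruby
# Hypothetical names (Settings, LIMIT, MISSING), used only for illustration.
class Settings
  # LIMIT does not exist yet. A plain read of it would raise NameError,
  # so ||= has to check for definedness first, roughly like:
  #   (defined?(LIMIT) && LIMIT) || (LIMIT = 10)
  LIMIT ||= 10

  # LIMIT is now defined and truthy, so the right-hand side is never evaluated.
  LIMIT ||= raise("not reached")
end

# defined? inspects an expression without evaluating it:
# it yields nil for something undefined instead of raising.
missing_probe = defined?(Settings::MISSING)
present_probe = defined?(Settings::LIMIT)
puts Settings::LIMIT
puts missing_probe.inspect
puts present_probe.inspect
```

The second ||= line is the interesting one for a compiler: the right-hand side must stay unevaluated, which is why the definedness check cannot simply be "try it and catch the error" without some care.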

So, now you know why. Next, a quick intro to the compilation strategy of JRuby. Basically we try to compile each script and each method into one Java method. We try to use the stack as much as possible, since that lets us chain statements together correctly. And that's about it.

The problem appears when you need to handle exceptions in the emitted Java bytecode. This isn't a problem in the interpreter, since we explicitly return a value for each node, and the interpreter doesn't use the Java stack as much as the compiler does. We also want to be able to use finally blocks in places, mainly so that ensure can be compiled, but also to make the implementation of defined? safe.

So what's the problem? Can't we just emit the catch table and so on correctly? Well, yes, we can do that. But it doesn't work, because of a very annoying feature of the JVM. Namely, when a catch block is entered, the operand stack gets blown away. Completely; only the exception object is left on it. So if the Ruby code is in the middle of a long chained statement, every intermediate value disappears. And what's worse, the class will actually fail to load with a VerifyError saying "inconsistent stack height", since there will now be one code path with the intermediate values on the stack and one code path without them, and the way JRuby works, these end up at the same point later on. And the JVM doesn't allow that either.
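As an illustration of the kind of Ruby code that triggers this, here is a minimal sketch. The method name parse_or_zero is hypothetical. In a stack-based compilation of the expression, the partial sum of 1 + 2 would already be sitting on the operand stack when the rescue branch is entered.

```ruby
# Hypothetical helper, for illustration only.
def parse_or_zero(text)
  # 1 + 2 is evaluated first; a naive stack-based compilation keeps that
  # partial sum on the operand stack while the begin/rescue part runs.
  # If Integer(text) raises, the handler starts with that value gone.
  1 + 2 + (begin
             Integer(text)
           rescue ArgumentError
             0
           end)
end

puts parse_or_zero("4")
puts parse_or_zero("not a number")
```

Ruby itself evaluates this without trouble, of course; the difficulty is only in mapping it straight onto JVM bytecode.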

This makes it incredibly hard to handle these constructs in bytecode, and frankly, right now I have no idea how to do it. My first approach was to create a new method for each try-catch or try-finally, and just put the code in there instead. The nice thing about that is that the surrounding stack will not be blown away, since it belongs to the invoking method and not to the current activation frame. And that approach actually works fairly well. Until you want to refer to values from outside the try or catch block. Then it breaks down.
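The separate-method approach can be sketched at the Ruby source level as an analogy. The helper names guarded_integer and sum_with_helper are hypothetical stand-ins for what a compiler would generate; this is not JRuby's actual output.

```ruby
# Hypothetical stand-in for a compiler-generated method that holds
# nothing but the protected region and its handler.
def guarded_integer(text, fallback)
  Integer(text)
rescue ArgumentError
  # Only this method's own frame is affected when the handler runs.
  fallback
end

def sum_with_helper(text)
  # The partial sum lives in the caller's frame, so it survives.
  # The catch: guarded_integer cannot see this method's locals. Anything
  # it needs (here text and the fallback 0) has to be passed in, and
  # anything it wants to change out here has to be returned.
  1 + 2 + guarded_integer(text, 0)
end

puts sum_with_helper("4")
puts sum_with_helper("oops")
```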

So, right now I don't know what to do. We have no way of knowing at any specific place how deep the stack is, so it's not possible to copy it somewhere and then restore it in the catch block. That would be totally inefficient too. In fact, I have no idea how other implementations handle this. There's gotta be a trick to it.

5 comments:

Unknown said...

Well, maybe you could parameterize the interpreter with an execution flag.
If it is set to false, the node is just resolved without anything being executed.

If you can avoid a try ... catch block, it would be better for performance anyway.


My 2 cents,

xue.yong.zhi said...

Here is what I did in xruby:

1. Try to keep the stack depth always equal to 1. If the value on the stack is no longer needed, just pop it.

2. If the stack depth has to be more than 1 and an exception may be thrown, keep the type information of the objects on the stack somewhere. Before begin..end, save them to local variables, and after begin..end, restore them from the local variables.
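The second step can be pictured at the Ruby source level roughly like this. It is only an analogy for what would happen with JVM local variable slots, and the names sum_with_spill, partial and parsed are invented for the example.

```ruby
# Hypothetical illustration of "save before begin..end, restore after".
def sum_with_spill(text)
  # Save: the value that would otherwise sit on the operand stack
  # goes into a local variable before the protected region starts.
  partial = 1 + 2

  # The protected region now starts and ends with nothing pending,
  # so the handler losing the stack costs nothing.
  parsed = begin
             Integer(text)
           rescue ArgumentError
             0
           end

  # Restore: bring the saved value back and finish the expression.
  partial + parsed
end

puts sum_with_spill("4")
puts sum_with_spill("oops")
```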

Anonymous said...

A (possibly naive) thought; if you're compiling blocks as methods, you may be able to use some of the same machinery.

The idea would be that

begin
  foo; bar
rescue
  zot
end

could be compiled as if the Ruby looked like:

_caught_exception = nil
_trampoline_func {
  begin
    foo; bar
  rescue
    _caught_exception = $!
  end
}
if !_caught_exception.nil?
  zot
end

where _trampoline_func is just

def _trampoline_func; yield; end

The code in the block ("foo; bar") would have access to the local variables of the method that it's declared in, and the rescue clause in the block is going to immediately return (from the block) anyway, so (the hope is that) it doesn't matter that the stack got trashed.

But I am making some dangerous assumptions here --- that you are implementing blocks more or less like this, and that there is no subtle semantic distinction between code in the body of a 'try', and code in a block...
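For what it's worth, the idea can be tried out in plain Ruby with concrete bodies filled in. This is only a sketch of the suggestion above: trampoline and lookup are hypothetical names, a Hash#fetch call plays the role of "foo; bar", and returning :missing plays the role of "zot".

```ruby
# The trampoline does nothing except run the block it is given.
def trampoline
  yield
end

# Hypothetical example method using the transformed shape.
def lookup(table, key)
  caught = nil
  result = nil
  trampoline do
    begin
      # The block closes over the method's locals, so it can read
      # table and key and assign result.
      result = table.fetch(key)
    rescue KeyError => e
      # The handler only records the exception and leaves the block.
      caught = e
    end
  end
  # The original rescue body runs out here, back in the method itself.
  caught.nil? ? result : :missing
end

puts lookup({ a: 1 }, :a).inspect
puts lookup({ a: 1 }, :b).inspect
```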

Eugene Kuleshov said...

Ola, can you please give a simplified example of the Java bytecode you want to generate? I think I had a similar issue at some point, but can't recall all the details now...

Daniel Garcia said...

Why not use local variables instead of the stack? You have as many local variable slots available to you as you have stack locations. The stack in the JVM isn't really intended to be long-term storage for things; that's what the local variable table is for.

As for the person making a comment about avoiding try/catch blocks for performance: try/catch has no overhead in Java until an exception gets thrown, and even then the overhead is minimal, since it's pretty much just a local goto.