Saturday, July 07, 2007

ObjectSpace: to have or not to have

Among all the features of Ruby that JRuby supports, I would say that two things take the number one place as being really inconvenient. Threads are one; making the native threading of Java match the green threading semantics of Ruby is not fun, and it's not even possible for all edge cases. But that argument has been made several times by both me and Charles.

ObjectSpace now, that is another story. The problems with OS are many. But first, let's take a quick look at the most common usage of OS; iterating over classes:
ObjectSpace::each_object(Class) do |c|
  p c if c < Test::Unit::TestCase
end

This code is totally obvious; we iterate over all instances of Class in the system, and print an inspected version of each one that is a subclass of Test::Unit::TestCase.

Before we take a closer look at this example, let's talk quickly about how MRI and JRuby implement this functionality. In MRI it is dead easy, and it costs nothing in performance when it's not used. The trick is that MRI just walks the heap when iterating over ObjectSpace. Since MRI can inspect the heap and stack without problems, nothing special needs to be done to support this behavior. (Note that this can never be safe when using a real threading system.)

So, the other side of the story: how does JRuby implement it? Well, JRuby can't inspect the heap of course. So we need to keep a WeakReference to each instance of RubyObject ever created in the system. This is gross. We pay a huge penalty for managing all this stuff. Many of the larger performance benefits we have found over the last year have revolved around having internal objects be smarter and not put themselves into ObjectSpace until necessary. One of my latest optimizations of regexp matching was simply to make MatchData lazy, so it only goes into OS when someone actually uses it. RDoc runs about 40% faster when ObjectSpace is turned off for JRuby.
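To make the cost concrete, here is a minimal sketch of the weak-reference idea in plain Ruby. It is not JRuby's actual implementation (which lives in Java); the WeakRegistry class and its register and each_object methods are hypothetical names made up for this illustration, built on the standard weakref library.

```ruby
require 'weakref'

# Hypothetical registry: every tracked object is remembered through a
# weak reference, so the registry itself never keeps an object alive.
class WeakRegistry
  def initialize
    @refs = []
  end

  # The bookkeeping cost is paid here, once per tracked object.
  def register(obj)
    @refs << WeakRef.new(obj)
    obj
  end

  # Rough analogue of ObjectSpace.each_object(klass).
  def each_object(klass = Object)
    return enum_for(:each_object, klass) unless block_given?
    live = []
    @refs.each do |ref|
      begin
        obj = ref.__getobj__ # raises WeakRef::RefError once collected
      rescue WeakRef::RefError
        next
      end
      live << ref
      yield obj if obj.is_a?(klass)
    end
    @refs = live # forget references whose objects are gone
  end
end

registry = WeakRegistry.new
# Hold strong references so the objects stay alive for the demo.
kept = [registry.register("a string"), registry.register([1, 2, 3])]
registry.each_object(String) { |s| puts s }
```

Even in this toy version, every object creation pays for an extra wrapper object and a list append, whether or not anyone ever iterates. That is the kind of overhead the lazy MatchData change avoids.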

So, is it worth it? In real life, when do you need the functionality of ObjectSpace? I've seen two places that use it in code I use every day. First, Rails uses it to find generators, and secondly, Test::Unit uses it to find subclasses of TestCase. But the fun thing is this: the above code is almost exactly what they do; they iterate over all classes in the system and check if they inherit from a specific base class. Isn't that quite a gross implementation? Shouldn't it be possible to do something better? Euhm, yes:
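module SubclassTracking
  def self.extended(klazz)
    # Give the extended class itself a `subclasses` accessor
    (class <<klazz; self; end).send :attr_accessor, :subclasses
    # Record every class that inherits from it, directly or indirectly
    (class <<klazz; self; end).send :define_method, :inherited do |clzz|
      klazz.subclasses << clzz
      super(clzz)
    end
    klazz.subclasses = []
  end
end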

# Where Test::Unit::TestCase is defined
Test::Unit::TestCase.extend SubclassTracking

# Load all other classes

# To find all subclasses and test them:
Test::Unit::TestCase.subclasses

I would say that this code solves the problem more elegantly and usefully than ObjectSpace. There is no performance degradation due to it, and it will only affect subclasses of the class you are interested in. What's the best benefit of this? You can use the -O flag when running JRuby, and your tests and the rest of the code will run much faster and use less memory.
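For anyone who wants to try the idea without Test::Unit, here is a self-contained sketch. BaseCase and the three test classes are hypothetical stand-ins invented for this example, and the module is repeated so the snippet runs on its own. It also runs the ObjectSpace scan to check that both approaches find the same classes.

```ruby
# Same idea as the SubclassTracking module above, repeated here so the
# example runs on its own.
module SubclassTracking
  def self.extended(klazz)
    singleton = class << klazz; self; end
    singleton.send :attr_accessor, :subclasses
    singleton.send :define_method, :inherited do |clzz|
      klazz.subclasses << clzz
      super(clzz)
    end
    klazz.subclasses = []
  end
end

# Hypothetical stand-in for Test::Unit::TestCase.
class BaseCase
  extend SubclassTracking
end

class UserTest < BaseCase; end
class OrderTest < BaseCase; end
class AdminUserTest < UserTest; end # indirect subclass, also tracked

tracked = BaseCase.subclasses
scanned = ObjectSpace.each_object(Class).select { |c| c < BaseCase }

p tracked # [UserTest, OrderTest, AdminUserTest]
p tracked.sort_by(&:name) == scanned.sort_by(&:name) # true
```

Note that the tracked list comes back in definition order, while the ObjectSpace scan gives no ordering guarantee, hence the sort before comparing.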

As a sidenote: I'm putting together a patch based on this to both Test::Unit and Rails. ObjectSpace is unnecessary for real code and the vision of JRuby is that you will explicitly have to turn it on to use it, instead of the other way around.

Anyone have any real world examples of things you need to do with ObjectSpace?

13 comments:

Anonymous said...

what if other features of ruby that allow for easy runtime inspection of metadata do not work well in say the .net runtime? are we going to rally to get those thrown out too?

lots of folks use objectspace to find leaks, and understand software that is running that they had no hand in authoring.

Anonymous said...

Well, in that case you could just enable ObjectSpace/feature X, right? As Ola says, the idea is to leave performance-heavy and not-necessarily-needed features off by default, and let them be there for you to enable when/if needed.

murphee said...

I'm not entirely sure that the shown method is more elegant... you're basically polluting the subclasses with accessors and methods and change the behavior of the inherited method. But maybe that's a matter of taste.
Actually, one problem: wouldn't this approach create a memory leak if you used it for some design that requires a lot of classes? You're tracking the subclasses in a list, which means they'll be kept alive as long as the superclass. I'm not sure if this could be a (big) problem, but still... smacks of a leak.

However: the example with Rails using it to find Generators is interesting. It put me in mind of Smalltalk: in Smalltalk, you have all your system available in the Smalltalk image, and you can just iterate over all classes and objects. So if you want to find, say, all Generators you'd just query the classes and get their objects. Removing this functionality from Ruby just doesn't seem like a step in the right direction, just because some insufficient VMs (JVM, .NET) cause trouble.

I'd be fine with the switch to turn ObjectSpace and object tracking off for situations where it's not needed and when the performance is crucial.

Anonymous said...

"what if other features of ruby that allow for easy runtime inspection of metadata do not work well in say the .net runtime"

-> Well, actually nothing is running fine on .Net, so that would just be what you could expect from M$. My 2 cents.

Ola Bini said...

So, in what way am I polluting the namespace of subclasses? Except for the inherited method, all the accessors are on the singleton class of the base class. And really, that could be called an important functionality.

And further, talking about leaks is kinda interesting, since this implementation would keep alive all subclasses of test cases. It seems to me that _not_ keeping them alive would possibly allow some of the test classes to be collected before Test::Unit has had time to iterate over them, which is definitely a bug.

The comparison to Smalltalk isn't really fair; Smalltalk implementations don't use anything at all like ObjectSpace to allow walking existing objects. Since Smalltalk is a live object system you really can't compare them.

@anonymous one: as anonymous two said, I'm not talking about removing ObjectSpace, just making the default to run without it. In fact, the two examples you describe are definitely things that don't need to have runtime support for it all the time.

Logan Capaldo said...

As far as leaks go, in most normal usage of Test::Unit, etc. the classes are not anonymous, which means they'll be around forever anyway. And if you are still worried about leaking anonymous classes, the solution is simple: use WeakRefs in your implementation of the inherited class tracking.

Tiago said...

You talk about threads...

I am a Ruby newbie (*) and one thing I cannot stop thinking about is that the JRuby thread model (ie, the JVM one) is better than the Ruby one.

Is there really a need to downgrade to Ruby's model? Or putting it another way: Couldn't compatibility be sacrificed here? And BTW, the model of Ruby 2.0, will it still be green threads? On multi-core architectures (read all recent machines) green threads suck in any case...

(*) I have some of my initial DSL experiences discussed here.

Robert Thau said...

FWIW, there's already been some discussion of this in the post and comment thread over here, including a few other random uses of ObjectSpace.each_object (enumeration of I/O handles, the test suite for one Rails DB adapter which enumerates StatementHandles, etc.)

Matt said...

"So, the other side of the story: how does JRuby implement it? Well, JRuby can't inspect the heap of course."

Ola, could you explain why JRuby can't inspect the heap?


Chris Carter said...

ObjectSpace._id2ref powers DRuby, IIRC.

Josh Graham said...

I do like your approach as it puts the information where you'd expect to look for it. I suppose, though, that the compiler and runtime of a language are going to have more intimate knowledge of objects than they otherwise normally provide.

For example, ObjectSpace may be needed to help support refactoring and completion in IDEs.

Is the penalty you mentioned more about space or more about latency?

If the objects are on the heap in the MRI is it so different that you manufacture your own heap in ObjectSpace? (By "you" I mean the JRuby authors, and in this case Anders, I suppose).

As the MRI ObjectSpace iteration is single-threaded, is it so different that your heap is a synchronized collection?

This may be a useless thought, or at least the non-determinism makes it of low/worse value, but could you look at having a part of ObjectSpace run in its own thread, keeping a ThreadLocal collection as its heap? Then the ObjectSpace.iterator runs in the thread of the caller, using a Future to obtain the next IRubyObject off the ObjectSpace heap.

I suppose this depends if the thread switching is more likely to be efficient than the critical section needed to return a temporary, stack-based collection of strong references.

With the separate thread / Future approach you can reduce the granularity of temporary strong-references to each next() of the iteration, so that objects lower on the heap may be finalized if so deemed by the GC. When you get to that part of the heap, your weak reference is invalid so you clear it and move to the next element on the heap.

I'm probably missing something, but it's interesting to think about. Maybe a Sydney GeekNight challenge...

James Moore said...

Don't Ruby weak references depend on ObjectSpace - _id2ref/object_id? Pretty rare that this is actually used though.

Giles Bowkett said...

Totally trivial use case: I use ObjectSpace in my IRB, to enable a simple "what was that class called again?" reminder method called grep_classes.

Bleak House used to use ObjectSpace but got rid of it because its profiling was totally inaccurate (iirc).