Monday, December 31, 2007

Scala unit testing

I wish this could be a happy story. But it's really not. If I have made any factual errors in this little rant, don't hesitate to correct me - I would love to be proved wrong about this.

Actually, I wrote this introduction before I went out to celebrate New Year's. Now I'm back to finish the story, and the picture has changed a bit. Not enough yet, but we'll see.

Let's tell it from the start. I have this project I've just started working on. It seemed like a fun and quite large thing that I could tinker with in my own time. It also seemed like a perfect match for Scala. I haven't done anything real in Scala yet, and wanted a chance to do so. I like everything I've seen about the language itself - I've said so before and I'll say it again. So I decided to use it.

As you all probably know, the first step in a new project is setting up your basic structure and getting all the simple stuff working together. Right, for me that means a simple Ant script that can compile Java and Scala, package it into a jar file, and run unit tests on the code. This was simple. ... Well, except for the testing bit, that is.

It seems there are a few options for testing in Scala. The ones I found were SUnit (included in the Scala distribution), ScUnit, Rehersal and specs (which is based on ScalaCheck, another framework). So these are our contestants.

First, take SUnit -- this is a very small project, no real support for anything spectacular. The syntax kinda stinks. One class for each test? No way. Also, no integration with Ant. I haven't even tried to run this. Testing should be painless, and in this case I feel that using Java would have been an improvement.

ScUnit looked really, really promising. Quite nice syntax, lots of smarts in the framework. I liked what the documentation showed me. It had a custom Ant task and so on. Very nice. It even worked for a simple Hello World test case. I thought this was it. So I started writing the starting points for the first test. For some reason I needed 20 random numbers for this test. Scala has so many ways of achieving this... I think I've tried almost all of them. But they all failed with a nice class loading exception just saying InstantiationException on an anonymous function. Lovely. Through some trial and error, I found out that ScUnit fails to run with basically any syntax that causes Scala to generate extra classes. I have no idea why.

So I gave up and started on the next framework: Rehersal. I have no idea what the misspelling is about. Anyway, this was a no-show quite quickly, since the Ant test task didn't even load (it referenced scala.CaseClass, which doesn't seem to be in the distribution anymore). Well then.

Finally I found specs and ScalaCheck. Now, these frameworks look mighty good, but they need better Google numbers. Specs also has the problem of being the plural of a quite common word. Not a good recipe for success. So I tried to get it working. Specs is built on top of ScalaCheck, and I much preferred the specs way of doing things (being an RSpec fanboy and all). Now, specs doesn't have Ant integration at all, but it does have a JUnit compatibility layer. So I followed the documentation exactly and tried to run it with the Ant JUnit task. KABOOM. "No runnable methods". This is an error message from JUnit4. But as far as I know, I have been able to run JUnit3 classes as well as JUnit4 classes with the same classpath. Hell, JRuby uses JUnit3 syntax. So obviously I have JUnit4 somewhere on my classpath. For the life of me, I cannot find it though.

It doesn't really matter. At that point I had spent several hours trying to get simple unit testing working. I gave up and integrated JtestR. Lovely. Half of my project will now not be Scala. I had imagined I would learn more Scala by writing the tests in it than by writing the implementation. Apparently not. JtestR took less than a minute to get up and working.

I am not saying anything about Scala the language here. What I am saying is that things like this need to work. The integration points need to be there, especially with Ant. Testing is the most important thing a software developer does. I mean, seriously, no matter what code you write, how do you know it works correctly unless you test it in a repeatable way? It's the only responsible way of coding.

I'm not saying I'm the world's best coder in any way. I know the Java and Ruby worlds quite well, and I've seen lots of other stuff. But the fact that I couldn't get any sane testing framework in Scala up and running in several hours, with Ant, tells me that the Scala ecosystem might not be ready for some time.

Now, we'll see what happens with specs. If I get it working I'll use it to test my code. I would love that to happen. I would love to help make it happen - except I haven't learned enough Scala to actually do it yet. One way or another, I'm not giving up on using Scala for this project. I will see where this leads me. And you can probably expect a series of these first-impression posts about Scala, since I have a tendency to rant or rave about my experiences.

Happy New Year, people!

Friday, December 28, 2007

JtestR 0.1 released

In case people have wondered, this is what I have been working on in my spare time for the last few weeks. But now it's finally released: the first version of JtestR!

So what is it? A library that allows you to easily test your Java code with Ruby libraries.
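
As a taste, here is a minimal sketch of what an RSpec file testing a Java class through JtestR might look like (the spec is my own invented example, not from the release notes; JRuby's Java integration does the bridging):
require 'java'

describe "java.util.ArrayList" do
  it "keeps track of the elements added to it" do
    list = java.util.ArrayList.new
    list.add "foo"
    list.size.should == 1
  end
end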

Homepage: http://jtestr.codehaus.org
Download: http://dist.codehaus.org/jtestr

JtestR 0.1 is the first public release of the JtestR testing tool. JtestR integrates JRuby with several Ruby frameworks to allow painless testing of Java code, using RSpec, Test/Unit, dust and Mocha.

Features:
  • Integrates with Ant and Maven
  • Includes JRuby 1.1, Test/Unit, RSpec, dust, Mocha and ActiveSupport
  • Customizes Mocha so that mocking of any Java class is possible
  • Background testing server for quick startup of tests
  • Automatically runs your JUnit codebase as part of the build
Getting started: http://jtestr.codehaus.org/Getting+Started

Team:
Ola Bini - ola.bini@gmail.com
Anda Abramovici - anda.abramovici@gmail.com

Monday, December 24, 2007

Code size and dynamic languages

I've had a fun time the last week noting the reactions to Steve Yegge's latest post (Code's Worst Enemy). Now, Yegge always manages to write stuff that generates interesting - and in some cases insane - comments. This time, the reactions are actually quite a bit more aligned. I'm seeing several trends, the largest being that having generated a 500K LOC code base in the first place is a sin against mankind. The second is that you should never have one code base that large - it should be modularized into hundreds of smaller projects/modules. The third reaction is that Yegge should be using Scala for the rewrite.

Now, from my perspective I don't really care that he managed to generate that large a code base. I think any programmer could fall into the same tar pit, especially over a long period of time. Second, it doesn't take a lone programmer to get into this situation - I would wager that there are millions of heinous code bases like this, all over the place. So my reaction is the pragmatic one: how do you actually handle the situation if you find yourself in it? Provided you understand the whole project and have the time to rewrite it, how should it be done? The first step, in my opinion, would be to not do it alone. The second would be to do it in small steps, replacing small parts of the system and writing unit tests as you go.

But at the end of the day, maybe a totally new approach is needed. So that's where Yegge chooses to go with Rhino as the implementation language. Now, if I had tackled the same problem, I would never have reimplemented the whole application in Rhino - rather, it would be more interesting to find the obvious places where the system needs to be dynamic and split it there, keeping the stable parts in Java and implementing the new functionality on top of that stable Java layer. Emacs comes to mind as a typical example: the base parts are implemented in C, but most of the actual functionality is implemented in Emacs Lisp.

The choice of language is something Stevey gets a lot of comments about. People just can't seem to understand why it has to be a dynamic language. (This is another rant, but people who comment on Stevey's blog seem to have a really hard time distinguishing between static typing and strong typing. Interesting, that.) So, one reason is obviously that Stevey prefers dynamic typing. Another is that hot-swapping code is one of those intrinsic features of dynamic languages that is really useful, especially in a game. The compilation stage just gets in the way at that level, especially if we're talking about something that's going to live for a long time and hopefully not have any downtime. I understand why Scala doesn't cut it in this case. As good as Scala is, it's good exactly because it has a fair amount of static features. Those are extremely nice for certain applications, but they don't fit the top level of a system that needs to be malleable. In fact, I'm getting more and more certain that Scala needs to replace Java as the semi-stable layer beneath a dynamic language, but that's yet another rant. At the end of it, something like Java needs to be there - so why not make that thing a better Java?

I didn't see too many comments about Stevey's ideas on refactoring and design patterns. Now, refactoring is a highly useful technique in dynamic languages too, and I believe Stevey is wrong in saying that refactorings almost always increase code size. The standard refactorings tend to do that in a language like Java, but that's mostly because of the language. Refactoring in itself is really just a systematic way of making small, safe changes to a code base. The end result is usually a cleaner code base, a better understanding of that code base, and code that is easier to read. As such, refactorings are as applicable to dynamic languages as to static ones.

Design patterns are another matter. I believe they serve two purposes, the first and more important being communication. Patterns make it easier to understand and communicate high-level features of a code base. The second purpose is to make up for deficiencies in the language, and that's mostly what people see when talking about design patterns. When you're working in a language like Lisp, where most design patterns are already in the language, you tend not to need them as much for communication either. Since the language itself provides ways of creating new abstractions, you can use those directly, instead of using design patterns to create "artificial abstractions".
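
To make that concrete, here is a small Ruby sketch (my example, not Stevey's): where Java would introduce a Strategy interface and one class per algorithm, a block serves directly as the abstraction:
# the block itself plays the role a Strategy object would play in Java
def total_price(prices, &discount)
  prices.map(&discount).inject(0) { |sum, price| sum + price }
end

puts total_price([100, 200]) { |p| p * 0.9 } # => 270.0
puts total_price([100, 200]) { |p| p }       # => 300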

As a typical example of a case where a design pattern is totally invisible due to language design, take a look at the Factory. Now, Ruby has factories - in fact, they are all over the place. Let's take a very typical example: the new method that you use to create new instances of a class. new is just a factory method. In fact, you can reimplement new yourself:
class Class
  def new(*args)
    object = self.allocate
    object.send :initialize, *args
    object
  end
end
You could drop this code into any Ruby project, and everything would continue to work as before. That's because new is just a regular method, and its behavior can be changed. You can create a custom new method that returns different objects based on something:
class Werewolf; end
class Wolf; end
class Man; end

class << Werewolf
  def new(*args)
    object = if $phase_of_the_moon == :full
      Wolf.allocate
    else
      Man.allocate
    end
    object.send :initialize, *args
    object
  end
end

$phase_of_the_moon = :half
p Werewolf.new

$phase_of_the_moon = :full
p Werewolf.new
Here, creating a new Werewolf will give you either an instance of Man or an instance of Wolf, depending on the phase of the moon. So in this case we are actually creating and returning something from new that is not even a subclass of Werewolf - new really is just a factory method. Of course, the one lesson we should all take from Factory is that, if you can, you should name your things better than "new". And since there is no difference between new and other methods in Ruby, you should definitely make sure that creating objects uses the right name.
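
A sketch of what that could look like, building on the classes above (the method name is my own invention):
class << Werewolf
  # an intention-revealing factory method - hypothetical name
  def for_moon_phase(phase)
    phase == :full ? Wolf.new : Man.new
  end
end

p Werewolf.for_moon_phase(:full) # => #<Wolf:0x...>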

Friday, December 21, 2007

Ruby closures and memory usage

You might have seen the trend - I've been spending time looking at memory usage in situations with larger applications. Specifically, the things I've been looking at are mostly deployments where a large number of JRuby runtimes is needed - but don't let that scare you. This information is exactly as applicable to regular Ruby as to JRuby.

One of the things that can really cause unintended high memory usage in Ruby programs is long-lived blocks that close over things you might not intend. Remember, a closure actually has to close over all local variables, the surrounding blocks, and also the self that is live at that moment.

Say that you have an object of some kind that has a method that returns a Proc. This proc will get saved somewhere and live for a long time - maybe even becoming a method with define_method:
class Factory
  def create_something
    proc { puts "Hello World" }
  end
end

block = Factory.new.create_something
Notice that this block doesn't even care about the actual environment it's created in. But as long as the variable block is still live, or something else points to the same Proc instance, the Factory instance will also stay alive. Think about a situation where you have an ActiveRecord instance of some kind that returns a Proc - not an uncommon situation in medium to large applications. The side effect will be that all the instance variables (and ActiveRecord objects usually have a few) and local variables will never disappear, no matter what you do in the block. Now, as I see it, there are really three different kinds of blocks in Ruby code:
  1. Blocks that process something without needing access to variables outside. (Stuff like [1,2,3,4,5].select {|n| n%2 == 0} doesn't need a closure at all.)
  2. Blocks that process or do something based on live variables from the outside.
  3. Blocks that need to change variables on the outside.
What's interesting is that 1 and 2 are much more common than 3. I would imagine that this is because number 3 is really bad design in many cases. There are situations where it's really useful, but you can get really far with the first two alternatives.
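
For completeness, a tiny example of the third kind - a block that changes a variable on the outside:
total = 0
[1, 2, 3].each { |n| total += n } # the block writes to a local defined outside it
puts total # => 6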

So, if you're seeing yourself using long-lived blocks that might leak memory, consider isolating their creation in as small a scope as possible. The best way to do that is something like this:
o = Object.new
class << o
  def create_something
    proc { puts "Hello World" }
  end
end
block = o.create_something
Obviously, this is overkill if you don't know that the block needs to be long-lived and will capture things it shouldn't. The way it works is simple - just define a new, clean Object instance, define a singleton method on that instance, and use that singleton method to create the block. The only thing that will be captured is the "o" instance. Since "o" doesn't have any instance variables that's fine, and the only local variables captured will be the ones in the scope of the create_something method - which in this case doesn't have any.

Of course, if you actually need values from the outside, you can be selective and only scope in the values you actually need - unless you have to change them, of course:
o = Object.new
class << o
  def create_something(v, v2)
    proc { puts "#{v} #{v2}" }
  end
end
v = "hello"
v2 = "world"
v3 = "foobar" # will not be captured by the block
block = o.create_something(v, v2)
In this case, only "v" and "v2" will be available to the block, through the usage of regular method arguments.

This way of defining blocks is a bit heavyweight, but absolutely necessary in some cases. It's also the best way to get a blank slate binding, if you need that. Actually, to get a real blank slate you also need to remove all the Object methods from the "o" instance - ActiveSupport has a library for blank slates. But this is the idea behind it.
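
A minimal sketch of that idea (ActiveSupport's version is more careful about which methods to keep):
o = Object.new
class << o
  # undefine everything except the methods Ruby itself depends on
  instance_methods.each do |m|
    undef_method m unless m.to_s =~ /^(__|instance_eval$|send$)/
  end
end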

It might seem stupid to care about memory at all these days, but higher memory usage is one of the prices we pay for higher language abstractions. It's wasteful to take it too far, though.

Wednesday, December 19, 2007

ThoughtWorks is looking at Sweden

I am not sure how well it comes across in my blog posts, but joining ThoughtWorks has been the best move of my life. I can't really describe what a wonderful place this is to be (for me at least). I sometimes try - in person, after a few beers - but I always end up not being able to capture the real feeling of working for a company that is more than a company.

I'm happy at being with ThoughtWorks. It's as simple as that - I feel like I've found my home.

So imagine how happy I am to tell you that ThoughtWorks is exploring opportunities for an office in Sweden!

Now, I am one of the people involved in this effort, and we have been talking about it for a while (actually, we started talking about it for real not long after I joined). But now it's reality. The first trips to Sweden will be in January. ThoughtWorks will be sponsoring JFokus (which is shaping up to be a really good conference, by the way - I'm happy to have presented there its first year). We will have a few representatives at JFokus, of course. I will be there, for example. =)

Of course, exploring Sweden is not the same thing as saying that an office will actually happen. But we think there are good reasons to at least consider it. I personally think it would be a perfect fit, but I am a bit biased about it.

So what are we doing for exploration? Well, of course we have started to look into business opportunities and possible clients. We are looking at partnerships and collaboration. We are looking at potential recruits. But really, the most important thing at this stage is to talk to people, get a feeling for the lay of the land, get to know interesting folks that can give us advice and so on. And that is what our travels in January will be about.

So. Do you feel you might fit any of the categories of people above? We'd love to meet you and talk - very informally. So get in touch.

These are exciting times for us!

Your Ruby tests are memory leaks

The title says it all. The only reason you haven't noticed is that you probably aren't working on a large enough application, or don't have enough tests. But the fact is, Test::Unit leaks memory. Of course, that's to be expected if it is going to be able to report results. But that leak should be more or less linear.

That is not the case. So, to make this concrete, let's take a look at a test that exhibits the problem:
class LargeTest < Test::Unit::TestCase
  def setup
    @large1 = ["foobar"] * 1000
    @large2 = ["fruxy"] * 1000
  end

  1_000_000.times do |n|
    define_method :"test_abc#{n}" do
      assert true
    end
  end
end
This is obviously fabricated. The important details are these: the setup method will create two semi-large objects and assign them to instance variables. This is a common pattern in many test suites - you want the same objects created each time, so you assign them to instance variables. In most cases there is some teardown associated, but I rarely see a teardown that includes assigning nil to the instance variables. Now, this will run one million tests, with one million setup calls. Not only that - the way Test::Unit works, it will actually create one million LargeTest instances, and each of those instances will have those two instance variables defined. Now, if you take a look at your test suites, you probably have fewer than one million tests in total. You also probably don't have objects that large all over the place. But remember, it's the object graph that counts: if you have a small object that refers to something else, the whole referral chain will be kept from garbage collection.
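
For the record, a teardown that avoids the leak for this particular example is as simple as nilling the references out:
def teardown
  @large1 = nil
  @large2 = nil
end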

... Or God forbid - if you have a closure somewhere inside of that stuff. Closures are a good way to leak lots of memory, if they aren't collected. The way the structures work, they refer to many things all over the place. Leaking closures will kill your application.

What's the solution? Well, the good one would be for Test::Unit to change its implementation of TestCase#run to remove all instance variables after teardown. Lacking that, something like this will do it:
class Test::Unit::TestCase
  NEEDED_INSTANCE_VARIABLES = %w(@loaded_fixtures @_assertion_wrapped @fixture_cache @test_passed @method_name)

  def teardown_instance_variables
    teardown_real
    instance_variables.each do |name|
      unless NEEDED_INSTANCE_VARIABLES.include?(name)
        instance_variable_set name, nil
      end
    end
  end

  def teardown_real; end
  alias teardown teardown_instance_variables

  def self.method_added(name)
    if name == :teardown && !@__inside
      alias_method :teardown_real, :teardown
      @__inside = true
      alias_method :teardown, :teardown_instance_variables
      @__inside = false
    end
  end
end
This code will make sure that all instance variables, except for those that Test::Unit needs, will be removed at teardown time. That means the instances will still be there, but no memory will be leaked through the things you set up in your tests. Much better - but at the end of the day, I feel that the approach Test::Unit uses is dangerous. At some point, this probably needs to be fixed for real.

Tuesday, December 18, 2007

Joda Time

I spent a few hours this weekend converting RubyTime in JRuby to use Joda Time instead of Calendar and Date. That was a very nice experience, actually. I'm incredibly impressed by Joda, and overall I think it was worth adding a new dependency to JRuby for this. The API is very nice, and the immutability of these classes makes things so much easier.

There were a few things I got a bit annoyed at, though. First, that Joda is ISO 8601 compliant is a really good thing, but I missed the ability to tune a few things. Stuff like saying which weekday a week should start on, for the calculation of the current week, would be very nice. As it is right now, that functionality has to use Calendar. It might be in Joda, but I couldn't find it.

The other thing I had a problem with - and this actually made me a bit annoyed - was how Joda handles GMT and UTC. Now, it says clearly in the documentation that Joda works with the UTC concept, and that GMT is not exactly the same thing. So why is it that this code passes (assuming an assertNotEquals helper)?
    public void testJodaStrangeNess() {
        assertEquals(DateTimeZone.UTC, DateTimeZone.forID("UTC"));
        assertEquals(DateTimeZone.UTC, DateTimeZone.forID("GMT"));
        assertEquals(DateTimeZone.UTC, DateTimeZone.forOffsetHours(0));
        assertNotEquals(DateTimeZone.UTC, DateTimeZone.forID("Etc/GMT"));
        assertNotEquals(DateTimeZone.forID("GMT"), DateTimeZone.forID("Etc/GMT"));
    }

Yeah, you're reading it right - UTC and GMT are the same time zone, and +00:00 is the same as UTC too. But Etc/GMT is not the same as UTC or GMT or +00:00. Isn't that a bit strange?

Friday, December 14, 2007

JavaPolis report

Earlier today I attended the last sessions of this year's JavaPolis. This was the first time I attended, and I've been incredibly impressed by it. The whole conference has been very good.

I arrived on Monday, sneaking in on Brian Leonard and Charles's JRuby tutorial. I didn't see much of it though, and after that Charles and I had to prepare our session a bit, so no BOFs.

Tuesday I slept late (being sick and all), and then saw Jim Weaver's JavaFX tutorial, which was very well done. I feel I have a fairly good grasp of the capabilities of JavaFX Script now, at least. There were a few BOFs I wanted to go to that evening, but since the speaker dinner/open bar was that night, I obviously chose that. Cue getting to bed at 3am, after getting home to the hotel from... uhm. somewhere in or around Antwerpen.

On Wednesday, the real conference started. My first session was the Groovy Update. I always enjoy seeing presentations of other language implementations, partly because I'm a language geek, but also because everyone has a very different presentation style that I like to contrast with each other. One thing I noticed about the Groovy presentation was that much of it was spent comparing Groovy to "other" languages.

Right, after that I saw two quickies - the first one about IntelliJ's new support for JRuby. And yes, this is support for JRuby, not just Ruby: you can use IntelliJ to navigate from Ruby code to the Java code you use from your Ruby. It looks really promising actually, and I spent some time showing the presenter a few more things that could be included. I don't know of any other IDE that supports JRuby-specific things like that.

After that I saw Dick Wall's presentation on GWT. Since I have actually managed to avoid any knowledge about GWT, it was kinda interesting.

The next sessions didn't seem too interesting, so I worked a bit more on my presentation, and walked around talking to people.

Charles's and my presentation went quite well, even though I managed to tank all the demonstrations quite heavily. For some reason I actually locked JIRB in comment mode and couldn't get out of it, and then I fell upon the block coercion bug that happens when you call a Java method with overloads where one takes no arguments and another takes the interface you want the block coerced into. Charles didn't stop me until afterwards... =)

But yes, it went well. Lots of people in the audience, and lots of interest.

The final session of the day was the future-of-computing panel, with Gosling, Bloch, Gafter and Odersky. To be honest, I found it boring - Quinn was the moderator, but didn't really manage to get the panel enthused about anything.

After that, it was BOF time, I sat in on the Adobe one to pass the time, but didn't learn anything spectacular. The Groovy BOF was nice - it's always fun to see lots of code.

I started Thursday with the Scala presentation. Now, I didn't learn anything I didn't know here, but it was still a very good presentation. And oh, I found out that there is a Scala book on the way. (It's actually available as a Rough Cut from Artima. Very nice.)

The next session was supposed to be Bloch's Effective Java, but he used the spot to rant about the BGGA closures proposal instead. Of course, Joshua Bloch always rants in a very entertaining way, and he had chosen insidiously good examples for his point of view - but I'm still not convinced.

The Java Posse live show was good fun. After that I managed to see Bob Lee's Web Beans presentation, and then the one on JAX-RS. I don't really have much to say about those two. Except... am I the only one who is starting to get bored by annotations all over the place?

The day was nearly over, and then it was time for BOFs - most importantly, the JRuby BOF. All went well, except that Charles didn't show up, I didn't have a projector for the first half of the BOF, Tom had introduced a bug on Wednesday that made all my examples fail, and so on. A huge thanks to Damian Steer, who saved me by keeping the audience entertained while I fixed the bug in front of everyone.

I sat through Chet Haase's talk about Update N, but didn't pay that much attention since I was hacking on JRuby.

Finally, it was time for the BOF on other new language features in Java, with Gafter and Bloch. This was actually very interesting stuff - it ended up being almost two hours, but I think most people got their fill of new language syntax. The question is, which parts are good? I particularly didn't like method extensions: all the proposals seem to lose the runtime component, and at that point it just stops being interesting. I would much rather see the language add real categories or something like that.

Friday was a lazy day. I sat in on the OSGi presentation and the TDD one, but nothing really exciting there either.

So that's my JavaPolis week. It's been a good time. And now I think it's time to have some more beers with JRuby people before moving out from here.

JDBC and DDL

It really is time for JDBC to add support for database-agnostic DDL. This is still one of the grosser areas of many database libraries (just look at the dialects in Hibernate), and most of that is actually caused by DDL. Yet at the end of the day, most of the operations supported are exactly the same. Am I the only one who thinks it would be nice to have programmatic access to DDL operations?
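
For comparison, Ruby's ActiveRecord migrations already give you roughly this - programmatic, mostly database-agnostic DDL (a sketch with an invented table):
ActiveRecord::Schema.define do
  create_table :users do |t|
    t.column :name, :string
    t.column :created_at, :datetime
  end
  add_index :users, :name
end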

Sunday, December 09, 2007

JavaPolis

Tomorrow I'm going to JavaPolis - Charles and I are presenting on JRuby on Rails on Wednesday. We'll be there the whole week, so if you want to get in touch, don't hesitate. We're aiming to taste many nice Belgian beers.

Except for that, it's actually quite quiet right now. There are some things in the works which will soon be announced, though.

Thursday, December 06, 2007

AspectJ and JRuby?

This is one of those idea-posts. There is no implementation and no code. But if someone wants to take the idea and do something with it, go ahead.

The gist of it is this: what if you could implement the actions for AspectJ in Ruby? You could define a load-time pointcut that matches everything and dispatches to Ruby. From there you could do basically anything from the Ruby side - including dynamically changing what happens. Of course, there would be a performance cost, but it could be incredibly useful for debugging, when you don't really want to restart your application and recompile every time you want to change the implementation of the aspect code.
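
To sketch what the Ruby side of such a bridge could look like (everything here is hypothetical - the join_point object, its methods, and how AspectJ's around advice would hand over to JRuby are all assumptions, not an existing API):
# hypothetical handler, called from an AspectJ around advice via
# JRuby's embedding API; join_point is assumed to wrap the intercepted call
def handle_join_point(join_point)
  puts "intercepted: #{join_point.signature}"
  join_point.proceed # let the original call continue
end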