Monday, December 24, 2007

Code size and dynamic languages

I've had a fun time the last week noting the reactions to Steve Yegge's latest post (Code's Worst Enemy). Now, Yegge always manages to write stuff that generates interesting - and in some cases insane - comments. This time, the results are actually quite a bit more aligned. I'm seeing several trends, the largest being that having generated a 500K LOC code base in the first place is a sin against mankind. The second is that you should never have one code base that large; it should be modularized into several hundred smaller projects/modules. The third reaction is that Yegge should be using Scala for the rewrite.

Now, from my perspective I don't really care that he managed to generate that large a code base. I think any programmer could fall into the same tar pit, especially over a long period of time. Secondly, you don't need to be a lone programmer to get this problem. I would wager that there are millions of heinous code bases like this, all over the place. So my reaction is rather the pragmatic one: how do you actually handle the situation if you find yourself in it? Provided you understand the whole project and have the time to rewrite it, how should it be done? The first step, in my opinion, would probably be to not do it alone. The second would be to do it in small steps, replacing small parts of the system and writing unit tests as you go.

But at the end of the day, maybe a totally new approach is needed. So that's where Yegge chooses to go with Rhino as the implementation language. Now, if I had tackled the same problem, I would never reimplement the whole application in Rhino - rather, it would be more interesting to find the obvious places where the system needs to be dynamic and split it there, keep the stable parts in Java, and then implement the new functionality on top of that stable Java layer. Emacs comes to mind as a typical example, where the base parts are implemented in C, but most of the actual functionality is implemented in Emacs Lisp.

The choice of language is something that Stevey gets a lot of comments about. People just can't seem to understand why it has to be a dynamic language. (This is another rant, but people who comment on Stevey's blog seem to have a really hard time distinguishing between static typing and strong typing. Interesting, that.) So, one reason is obviously that Stevey prefers dynamic typing. Another is that hotswapping code is one of those intrinsic features of dynamic languages that are really useful, especially in a game. The compilation stage just gets in the way at that level, especially if we're talking about something that's going to live for a long time, and hopefully not have any downtime. I understand why Scala doesn't cut it in this case. As good as Scala is, it's good exactly because it has a fair amount of static features. These are things that are extremely nice for certain applications, but they don't fit the top level of a system that needs to be malleable. In fact, I'm getting more and more certain that Scala needs to replace Java as the semi-stable layer beneath a dynamic language, but that's yet another rant. At the end of it, something like Java needs to be there - so why not make that thing be a better Java?
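
To make the hotswapping point concrete, here is a minimal sketch in Ruby (not Rhino, to match the examples later in this post). The Monster class and its attack method are made up for illustration; the only point is that reopening a class changes the behavior of objects that already exist, with no compile or restart step in between.

```ruby
# Hypothetical game entity; the names are invented for this sketch.
class Monster
  def attack
    "bites"
  end
end

troll = Monster.new
before = troll.attack

# Later, while the program is still running, reopen the class and
# replace the method. No recompilation and no restart are involved.
class Monster
  def attack
    "throws a rock"
  end
end

# The instance created before the change picks up the new behavior.
after = troll.attack
```

A real long-running system would need some way to load the new definition into the live process; that part is left out here.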

I didn't see too many comments about Stevey's ideas about refactoring and design patterns. Now, refactoring is a highly useful technique in dynamic languages too. And I believe Stevey is wrong in saying that refactorings almost always increase the code size. The standard refactorings tend to cause that in a language like Java, but that's more because of the language. Refactoring in itself is really just a systematic way of making small, safe changes to a code base. The end result of refactoring is usually a cleaner code base, a better understanding of that code base, and code that is easier to read. As such, refactorings are as applicable to dynamic languages as to static ones.
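
As a rough illustration of a refactoring that shrinks code rather than growing it, here is a small Ruby sketch. The report classes and their fields are hypothetical; the refactoring is a plain "extract method" applied to a duplicated loop.

```ruby
# Before: two methods that differ only in the key they read.
class ReportBefore
  def initialize(rows)
    @rows = rows
  end

  def total_price
    sum = 0
    @rows.each { |row| sum += row[:price] }
    sum
  end

  def total_weight
    sum = 0
    @rows.each { |row| sum += row[:weight] }
    sum
  end
end

# After: the shared loop is extracted into one private helper.
class ReportAfter
  def initialize(rows)
    @rows = rows
  end

  def total_price
    total(:price)
  end

  def total_weight
    total(:weight)
  end

  private

  # Sums the value stored under the given key across all rows.
  def total(key)
    @rows.inject(0) { |sum, row| sum + row[key] }
  end
end
```

Both versions behave the same; the second simply has less to read, and each further total costs three lines instead of five.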

Design patterns are another matter. I believe they serve two purposes - the first and more important being communication. Patterns make it easier to understand and communicate high level features of a code base. But the second purpose is to make up for deficiencies in the language, and that's mostly what people see when talking about design patterns. When you're working in a language like Lisp, where most design patterns are already in the language, you tend to not need them for communication as much either. Since the language itself provides ways of creating new abstractions, you can use those directly, instead of using design patterns to create "artificial abstractions".
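
One small illustration of a pattern dissolving into the language is Strategy. The sketch below is in Ruby, not Lisp, and the total method and the discount rule are hypothetical. In Java of this era you would typically write an interface plus one class per rule; here the "strategy" is just a block.

```ruby
# Sums the prices and applies an optional discount rule.
# The rule is passed as a block; no strategy interface or classes needed.
def total(prices, &discount)
  sum = prices.inject(0) { |acc, price| acc + price }
  discount ? discount.call(sum) : sum
end

full_price = total([10, 20, 30])
half_off   = total([10, 20, 30]) { |sum| sum / 2 }
```

Nobody reading this code would say "that's the Strategy pattern"; it is simply a method that takes a block.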

As a typical example of a case where a design pattern is totally invisible due to language design, take a look at the Factory. Now, Ruby has factories. In fact, they are all over the place. Let's take a very typical example: the Class.new method that you use to create new instances of a class. New is just a factory method. In fact, you can reimplement new yourself:
class Class
  def new(*args)
    object = self.allocate
    object.send :initialize, *args
    object  # return the new instance, not the result of initialize
  end
end
You could drop this code into any Ruby project, and everything would continue to work like before. That's because new is just a regular method, and its behavior can be changed. You can create a custom new method that returns different objects based on something:
class Werewolf;end
class Wolf;end
class Man;end

class << Werewolf
  def new(*args)
    # Pick the class to instantiate based on the phase of the moon
    object = if $phase_of_the_moon == :full
               Wolf.allocate
             else
               Man.allocate
             end
    object.send :initialize, *args
    object
  end
end

$phase_of_the_moon = :half
p Werewolf.new

$phase_of_the_moon = :full
p Werewolf.new
Here, creating a new Werewolf will give you an instance of either Man or Wolf, depending on the phase of the moon. So in this case we are actually creating and returning something from new that is not even an instance of a subclass of Werewolf. So new is just a factory method. Of course, the one lesson we should all take from Factory is that if you can, you should name your things better than "new". And since there is no difference between new and other methods in Ruby, you should definitely make sure that creating objects uses the right name.
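
A minimal sketch of what "a better name than new" could look like, assuming the same three classes. The method name form_for is invented for this example; the point is only that the name tells the caller what comes back.

```ruby
class Wolf; end
class Man; end

class Werewolf
  # Hypothetical factory method: the name says that you get a form
  # (a Wolf or a Man), not necessarily a Werewolf instance.
  def self.form_for(phase_of_the_moon)
    phase_of_the_moon == :full ? Wolf.new : Man.new
  end
end

at_full_moon = Werewolf.form_for(:full)
at_half_moon = Werewolf.form_for(:half)
```

Passing the phase as an argument instead of reading a global also makes the factory easier to test, though that is a separate design choice from the naming.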

5 comments:

Thomas Lockney said...

You make an interesting point about Scala replacing Java as the underlying language for apps. I've been thinking about this a lot lately and wonder how well Scala would serve for implementing a dynamic language. For instance, would JRuby be "better" (how one defines "better" is a matter for debate) if it had been implemented in Scala? This is purely a rhetorical question, of course.

LudoA said...
This comment has been removed by the author.
LudoA said...

Why is that a purely rhetorical question? Granted, I don't know a lot about the implementation of dynamic languages or even about Scala, but I find the question of whether implementing e.g. Ruby would be easier in Scala than in Java an interesting one.

Would this be easier? Or would it maybe not even be possible?

David Pollak said...

Personally, I would choose Java to implement other languages. While I am a huge Scala fan, where performance matters (squeezing the extra 5% out of the code), Scala constructs are sometimes more verbose than Java constructs. Java constructs map more directly and predictably to JVM instructions and that matters for implementing languages.

On the other hand, I strongly disagree with Ola about the whole dynamic vs. static dispatch thing.

I can and do write dynamic, yet type checked and type safe, code in Scala. I've done substantial work in both Scala and Ruby and find that Scala's type system works in favor of code brevity and expressiveness.

Ola is a coding god in both Ruby and Java. Well... okay, I've only seen his Java code, but if he's 30% as good in Ruby as he is in Java, he's a god in Ruby. But I think that if he spent 6-9 months coding Scala hardcore, he'd come to appreciate what can be done with a powerful type system and the flexibility that Scala offers.

Put another way, I've been coding lift for 9+ months. In the last 5 months, the LoC has stayed stable, but there have been hundreds of new features in lift. This is because as I build my vocabulary of Scala idioms, I am able to express more in fewer LoC.

Stephan.Schmidt said...

I also would probably use JS for the dynamic parts and Java for the more static ones.

But one could think of using Janino to create Java "scripts" at runtime.



Stephan Schmidt :: stephan@reposita.org
Reposita Open Source - Monitor your software development
Blog at http://stephan.reposita.org - No signal. No noise.