Wednesday, May 14, 2008

A New Hope: Polyglotism

OK, so this isn't necessarily anything new, but I had to go with the running joke of the two blog posts this post is more or less a follow-up to. If you haven't already read them, go read Yegge's Dynamic Languages Strike Back, and Beust's Return Of The Statically Typed Languages.

So let's see. Distilled, Steve thinks that static languages have reached the ceiling for what's possible to do, and that dynamic languages offer more flexibility and power without actually sacrificing performance and maintainability. He backs this up with several research papers that point to very interesting runtime performance improvement techniques that really can help dynamic languages perform exceptionally well.

On the other hand, Cedric believes that Scala is bad because of implicits and pattern matching, that it's common sense to not allow people to use the languages they like, that tools for dynamic languages will never be as good as the ones for static ones, that Java generics aren't really a problem, that dynamic language performance will improve but that this doesn't matter, that static languages really haven't failed at all, and that Java is still the best language choice and will continue to be for a long time.

Now, these two bloggers obviously have different opinions, and it's really hard to actually see which parts are facts and which are opinions. So let me try to sort out some facts first:

Dynamic languages have been around for a long time, as long as statically typed languages in fact. Lisp was the first one.

There have been extremely efficient dynamic language implementations. Some of the Common Lisp implementations are on par with C performance, and Strongtalk also achieved incredible numbers. As several commenters have noted, Strongtalk's performance did not come from the optional type tags.

None of the dynamic languages in wide use today are even on the same map when it comes to performance. There are several approaches to fixing this, but we can't know how well they will work out in practice.

Java's type system is not very strong, and not very static, as these definitions go. From a type-theoretic standpoint, Java offers neither static type safety nor any complete guarantees.
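To make that claim a bit more concrete, here is a minimal Java sketch of two well-known holes: covariant arrays and unchecked downcasts. The class and method names (`TypeHoles`, `arrayStoreFails`, `castFails`) are invented for this illustration. Both methods compile without complaint, and both only fail once the program is running.

```java
// Hypothetical illustration; all names are invented for this sketch.
class TypeHoles {

    // Arrays are covariant: a String[] can be viewed as an Object[].
    // The compiler accepts the store below; the JVM rejects it at runtime.
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // A downcast is checked when the program runs, not when it compiles.
    static boolean castFails() {
        Object value = "not a number";
        try {
            Integer number = (Integer) value;
            return number == null; // not reached, because the cast throws
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("array store failed at runtime: " + arrayStoreFails());
        System.out.println("cast failed at runtime: " + castFails());
    }
}
```

In both cases the static types line up on the surface, so the type error is only discovered by a runtime check.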

There is a good reason for these holes in Java. In particular, Java was created to give lots of hints to the compiler so the compiler can catch errors where the programmer is inconsistent. This is one of the reasons that you very often find yourself writing the same type name twice, including the type arguments (generics). If the programmer makes a mistake on one side, the compiler will be able to catch this error very easily. It is a redundancy in the syntax that makes Java programs very verbose, but helps against certain kinds of mistakes.
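The redundancy described above looks roughly like this in the Java of the time. This is only a sketch; the class, method and variable names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; all names are invented for this sketch.
class Redundancy {

    static Map<String, List<Integer>> buildScores() {
        // The full type, including its generic arguments, is written on both sides.
        Map<String, List<Integer>> scores = new HashMap<String, List<Integer>>();
        List<Integer> olaScores = new ArrayList<Integer>();
        olaScores.add(Integer.valueOf(10));
        scores.put("ola", olaScores);

        // If the two sides disagree, the compiler rejects the line, for example:
        // Map<String, List<Integer>> wrong = new HashMap<String, List<String>>();
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(buildScores());
    }
}
```

The commented-out line is the kind of mistake the duplication is meant to catch: the declared type and the constructed type no longer agree, so the program does not compile.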

Really strong type systems like those Haskell and OCaml use provide extremely strong compile time guarantees. This means that if the compiler accepts your program, you will never see any runtime errors from the type system. This allows these compilers to generate very efficient code, because they know more about the state of the application at most points in time, compared to the compiler for Java, which knows some things, but not nearly as much as Haskell or OCaml.

The downside of really strong type systems is that they disallow some extremely common expressions - these are things you can intuitively imagine, but that can't be expressed within the constraints of such a type system. One solution to these problems is to add higher kinds, but these have a tendency to create more complexity and also suffer from some of the same problems.

So, we have three categories of languages here. The strongly statically checked ones, like Haskell. The weakly statically checked ones, like Java. And the dynamically checked ones, like Ruby. The way I look at these, they are good at very different things. They don't even compete in the same leagues. And comparing them is not really a valid point of reasoning. The one thing that I am totally sure of is that we need better tools. And the most important tool in my book is the language. It's interesting: many Java programmers talk so much about tools, but they never seem to think about their language as a tool. For me, the language is what shapes my thinking, and thus it's definitely much more important than which editor I'm using.

I think Cedric has a point in that dynamic language tool support will never be as good as that for statically typed languages - at least not when you're defining "good" to be the things that current Java tools are good at. Steve thinks that the tools will be just as good, but different. I'm not sure. To a degree I know that no tool can ever be completely safe and complete, as long as the language includes things like external configuration, reflection and so on. There is no way for a tool to cover all the dynamic aspects of Java, but sticking to the common mainstream parts of the language will give you most of the benefits. As always this is a trade-off. You might get better IDE support for Java right now, but you will be able to express things in Ruby that you just can't express in Java because the abstractions will become too large.
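As a hedged illustration of why reflection limits what a tool can guarantee, consider the sketch below. `ReflectionDemo`, `Greeter`, `greet` and `callByName` are invented names, and the method name is passed in as a string, the way it might arrive from an external configuration file.

```java
import java.lang.reflect.Method;

// Hypothetical illustration; all names are invented for this sketch.
class ReflectionDemo {

    static class Greeter {
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // methodName stands in for a value read from external configuration.
    static String callByName(Object target, String methodName, String argument) {
        try {
            Method method = target.getClass().getMethod(methodName, String.class);
            return (String) method.invoke(target, argument);
        } catch (ReflectiveOperationException e) {
            // A misspelled or renamed method only shows up here, at runtime.
            throw new IllegalStateException("cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName(new Greeter(), "greet", "Ola"));
    }
}
```

A rename refactoring can update every direct call to `greet`, but it has no reliable way to know that a string living in a configuration file refers to the same method.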

This is the point where I'm going to do a cop-out. These discussions are good, to the degree that we are working on improving our languages (our tools). But there is a fuzzy line in these discussions, where you end up comparing apples and oranges. These languages are all useful, for different things. A good programmer uses his common sense to provide the best value possible. That includes choosing the best language for the job. If Ruby allows you to provide functionality 5 times faster than the equivalent functionality with Java, you need to think about whether this is acceptable or not. On the one hand, Java has IDEs that make maintainability easier, but with the Ruby code base you will end up maintaining a fifth of the size of the Java code base. Is that trade-off acceptable? In some cases yes, in some cases no.

In many cases the best solution is a hybrid one. There is a reason that Google allows more than one language (C++, Java, Python and JavaScript). This is because the languages are good at different things. They have different characteristics, and you can get a synergistic effect by combining them. A polyglot system can be greater than the sum of its parts.

I guess that's the message of this post. Compare languages, understand your most important tools. Have several different tools for different tasks, and understand the failings of your current tools. Reason about these failings in comparison to the tasks they should do well, instead of just comparing languages to languages.

Be good polyglot programmers. The world will not have a new big language again, and you need to rewire your head to work in this environment.

11 comments:

Bob Aman said...

A minor point. OCaml can't guarantee type safety if you unmarshal something. It just assumes that you know what you're doing and that you're treating some deserialized value as the correct type. If you make a mistake, you get a segfault. Generally in OCaml, it's a good idea to manually specify the type of values that are being deserialized so that the compiler can more reliably infer the types.

Anonymous said...

I miss Objective-C.

(Not that it's gone, but that I'm not using it.)

Philip Schwarz said...

Hi Ola,

you said:

"Java's type system is not very strong, and not very static...
...
So, we have three categories of languages here. The strongly statically checked ones, like Haskell. The weakly statically checked ones, like Java. And the dynamically checked ones, like Ruby."


Can you clarify (for me) your labelling of Java's type system in the context of the following definitions from On Understanding Types, Data Abstraction, and Polymorphism (1985) by Luca Cardelli and Peter Wegner:

Programming languages in which the type of every expression can be determined by static program analysis are said to be statically typed.

Static typing is a useful property, but the requirement that all variables and expressions are bound to a type at compile time is sometimes too restrictive. It may be replaced by the weaker requirement that all expressions are guaranteed to be type-consistent although the type itself may be statically unknown; this can be generally done by introducing some run-time type checking.

Languages in which all expressions are type-consistent are called strongly typed languages. If a language is strongly typed its compiler can guarantee that the programs it accepts will execute without type errors.

In general, we should strive for strong typing, and adopt static typing whenever possible.

Note that every statically typed language is strongly typed but the converse is not necessarily true.

Static typing allows type inconsistencies to be discovered at compile time and guarantees that executed programs are type-consistent. It facilitates early detection of type errors and allows greater execution-time efficiency. It enforces a programming discipline on the programmer that makes programs more structured and easier to read.

Thanks.

Philip Schwarz.

Christophe said...

Philip,

I am not Ola, but I can tell you readily that your book is wrong on many points.

"Note that every statically typed language is strongly typed but the converse is not necessarily true."

That's wrong. Statically typed languages can easily be weak, due in particular to one property: typecasting. Because of casting, even when your static analysis tells you that your surface types match correctly, you are never sure that your underlying types are correct as well, since they can easily be cast to other types. C and Java both allow typecasting, and are both statically typed languages, so they are weak statically typed languages. Haskell, OCaml and Ruby don't allow such casting, so they are strong languages (the first two being static, while the last one being dynamic).

"In general, we should strive for strong typing, and adopt static typing whenever possible."

Modern practice has proven this piece of advice to be misleading. Strong typing is good to have, and I do think indeed that you should strive to have it. Static typing is something completely different, whose usefulness (at least without the presence of type inference to shorten the written expressions) isn't unquestioned.

GM said...

"The downside of really strong type systems is that they disallow some extremely common expressions"

Eh? An example? My experience has been that this is very rare with the sophisticated type systems we have these days, and it's almost always because I've made a mistake with my design. (sometimes even though that behaviour was possible, it wasn't desirable, hence the type-checker not being extended to facilitate it) To the point where it seems likely that any exceptions are mistakes I don't yet realise I made.

Anonymous said...

What I find ironic is that both SteveY and Cedric almost seem to be in agreement on the idea that picking one language and sticking to it is ultimately a good idea, no matter how it may chafe. Although I wonder if Yegge will stick to that conclusion if and when he leaves Google.

Frankly, Cedric lost me after he started complaining about pattern matching in Scala. Advocating sophisticated type systems in languages like Haskell and Scala is one thing; but advocating Java as your paragon of static typing is like trying to introduce people to authentic Mexican food by taking them to Taco Bell.

Anonymous said...

I liked Avdi's last sentence in his comment.

After four years of Java and four years of Ruby after that (all full-time programming), switching to Haskell made me notice that Java has one interesting property: it's not possible to create a type that does not include null amongst its set of values. It turns out that this was the source of a lot of pain.

In one of the language continuums, I'd place Java at one end, Ruby in the middle, and Haskell at the other.

Anonymous said...

you said: "Java's type system is not very strong, and not very static..."

Here is how Venkat Subramaniam classifies Java (in Programming Groovy):

             STRONG
                |
                |
       Ruby     |  Java
       /Groovy  |  /C++
                |
                |
 DYNAMIC--------|-------STATIC
                |
                |
      JavaScript|  C/C++
        /Perl   |
                |
              WEAK

Anonymous said...

I'll try again...let's see if this displays OK
..............................
.............STRONG...........
................|.............
................|.............
.......Ruby.....|..Java.......
......./Groovy..|../C++.......
................|.............
................|.............
.DYNAMIC--------|-------STATIC
................|.............
................|.............
......JavaScript|..C/C++......
......../Perl...|.............
................|.............
..............WEAK............

Anonymous said...

Doh! typing mistake: The C++ in the top right quarter should of course be C#.

Anonymous said...

You may find this funny, but I believe that from the language (not libraries) point of view JavaScript has the best usability, flexibility, readability and maintainability of them all. I am not talking about the mess of client-side (web browser) programming here, so don't be harsh on me.

I have used all four parts of the diagram above, and this is just a personal opinion :)

With the right tools and support; OO and functional programming compatibility; in a world where marketing and buzzwords and programming heroes mean nothing; I believe that (kind of) option would have a higher chance to survive XD