Ola Bini: Programming Language Synchronicity: September 2006

Saturday, September 30, 2006

In Joy and Sorrow with Continuations

Continuations are one of those topics that tend to crop up now and again. This is not strange, of course, since they happen to be one of the more powerful features of certain languages, but also one of the most confusing. I would like to stick my neck out and say that continuations are probably up there beside real macros in power. The reason is that you can implement so many language features in terms of them.

Since there still seems to be some confusion about them, I'll write my piece on them. Not just for you readers, of course, but more importantly for myself. I intend to get a good grip on continuations in Ruby by writing this (which is incidentally one of the best ways to learn about something confusing: try to write about it).

First of all, what exactly is a continuation? Basically, at every point in the evaluation of an expression, there will be one or more continuations lurking. Take the very simple expression foo = 13 * (10 - 7). Here there are four interesting continuations waiting. (There are actually eight of them in all, but only four are interesting.) We start by looking at the expression 10 - 7. We can write the rest of the expression like this: foo = 13 * [], where the square brackets mark the place where the result of 10 - 7 will go. That expression with a hole in it, "take a value, multiply it by 13 and bind foo to the result", is the continuation of 10 - 7. The result of evaluating 10 - 7 is injected into the hole, and the rest of the expression waiting for that value is what the continuation is.
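To make the hole concrete, here is a minimal sketch of the same expression written by hand in continuation-passing style with plain Ruby lambdas. The names sub_cps, mul_cps and rest_of_expression are illustrative only: each operation takes its continuation k as an extra argument and "returns" by calling it, so rest_of_expression is exactly foo = 13 * [] waiting for the value of 10 - 7.

```ruby
# Each CPS operation takes its continuation k as an extra argument and
# "returns" by calling k with the result instead of returning normally.
sub_cps = ->(a, b, k) { k.call(a - b) }
mul_cps = ->(a, b, k) { k.call(a * b) }

foo = nil

# The continuation of (10 - 7): "given r, compute 13 * r and bind foo".
# This is foo = 13 * [] from the text, with r filling the hole.
rest_of_expression = ->(r) { mul_cps.call(13, r, ->(v) { foo = v }) }

sub_cps.call(10, 7, rest_of_expression)
puts foo # prints 39
```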

Until now, I have spoken about continuations as a concept. Those of you who know the Ruby interpreter know that it isn't written in continuation-passing style. But it could be, and it doesn't really matter, since we still have a way to get at the current continuation. So, how should a continuation be represented? Most languages represent it as something much like an anonymous closure that takes one parameter: the result to hand back to the waiting evaluation. In the example above, if we inject the callcc primitive into the mix, we get code that looks like this:

require 'continuation' # needed on Ruby 1.9 and later; callcc was built in on 1.8
foo = 13 * callcc {|c| c.call(10-7) }

This doesn't really look that spectacular, of course. The code above has exactly the same effect as the first example, namely binding the variable foo to the value 39.

If you want to, you can look at every computation like this. It sometimes helps to imagine that you are just turning the evaluation inside out.
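What makes callcc more than a roundabout way of writing 10 - 7 is that the continuation is an ordinary object that can be kept and called again later, re-running "foo = 13 * []" with a new value in the hole. A minimal sketch, assuming MRI Ruby 1.9 or later, where callcc lives in the continuation library (recent versions may warn that it is obsolete); the global $again and the values fed back in are illustrative.

```ruby
require 'continuation' # Ruby 1.9 and later; callcc was built in on 1.8

# Re-entering a continuation restores the control stack, but it does not
# undo changes to heap objects, so this array keeps every run's result.
results = []

# The first time through, callcc returns the block's value, 10 - 7.
# $again holds the continuation "foo = 13 * []"; a global keeps it clear
# of any local-variable restoring on re-entry.
foo = 13 * callcc { |k| $again = k; 10 - 7 }
results << foo

# Jump back into the saved continuation with a new value for the hole.
$again.call(results.size + 3) if results.size < 3

p results # prints [39, 52, 65]
```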

So, what can you do with them? Almost anything, actually. Many parts of Scheme are implemented in CPS (continuation-passing style). To name a few concrete things that can be implemented easily: exceptions, throw/catch, break, return, coroutines, generators and much more. As an example, we can implement a return like this:

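require 'continuation' # needed on Ruby 1.9 and later

def val
  callcc do |ret|
    1000.times do |v|
      if v == 13
        ret.call(v+1)   # leaves both the loop and the method, like return
      end
    end
  end
end
bar = val
puts bar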

This gives exactly the same result as if we had used the return keyword: bar ends up as 14. Most of the other flow control primitives can be implemented this way too.
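The same trick generalizes to a break-like escape from nested loops. A minimal sketch, again assuming Ruby 1.9+ with the continuation library; with_escape is a hypothetical helper name, not a standard method:

```ruby
require 'continuation' # Ruby 1.9 and later; callcc was built in on 1.8

# Hypothetical helper: runs the block with an escape continuation.
# Calling escape.call(value) abandons the block at once and makes
# with_escape return value, much like break or return.
def with_escape
  callcc { |escape| yield escape }
end

pair = with_escape do |escape|
  (1..5).each do |i|
    (1..5).each do |j|
      escape.call([i, j]) if i * j == 12 # leaves both loops at once
    end
  end
  nil # only reached if no pair was found
end

p pair # prints [3, 4]
```

If the block finishes without calling escape, callcc simply returns the block's value, so with_escape { 42 } is 42.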

What has made continuations trendy lately is something called continuation-based web servers. The idea is to make the statelessness of the web totally transparent by hiding the client round trips inside methods. These methods save the current continuation and then break off evaluation. When the client's next request arrives, the continuation is looked up in some session storage and restarted right where it left off. Basically, this allows web applications to work more or less exactly like a console application. This is very powerful, but as I hope this small post has shown, continuations have much more to give.
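The suspend-and-resume shape of such a server can be sketched in a few lines. This sketch deliberately swaps callcc for Ruby's Fiber (core since 1.9), a one-shot, coroutine-style relative of continuations, because a Fiber hands control straight back to whoever resumed it, which keeps a single-process simulation simple. Everything here is hypothetical: the SESSIONS hash stands in for session storage, ask stands in for "send a page and wait", and handle_request stands in for the client's next HTTP request.

```ruby
# Hypothetical session storage: session id => suspended conversation.
SESSIONS = {}

# Hypothetical helper: "send" a page to the client and suspend until the
# next request brings back an answer.
def ask(page)
  Fiber.yield(page)
end

# Starts a conversation and returns the first page to show.
def start_session(id, &flow)
  SESSIONS[id] = Fiber.new do
    result = flow.call
    SESSIONS.delete(id) # conversation finished; drop the saved state
    result
  end
  SESSIONS[id].resume
end

# Simulates the client's next request: look up the suspended
# conversation and resume it exactly where it stopped.
def handle_request(id, answer)
  SESSIONS.fetch(id).resume(answer)
end

page = start_session(1) do
  a = Integer(ask("First number?"))
  b = Integer(ask("Second number?"))
  "The sum is #{a + b}"
end

puts page                   # prints First number?
puts handle_request(1, "2") # prints Second number?
puts handle_request(1, "3") # prints The sum is 5
```

The conversation reads like a console program, while each step actually waits for a separate "request".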

YAML and JRuby - the last bit

An hour ago I sent the patches to make JRuby's YAML support completely Java-based. More specifically, I have removed RbYAML completely and instead used the newly developed 0.2 support in JvYAML. A few different parts had to be done to make this possible, especially since most of the interface to YAML was Ruby-based and used the slow Java proxy support to interact with JvYAML.

So, what's involved in an operation like this? Well, first I created custom versions of the Representer and the Serializer. (I have had a custom JRubyConstructor since May.) These weren't that big, mostly just delegating to the objects themselves to decide how they want to be serialized. That leads me to the RubyYAML class, which is what will get loaded when you write "require 'yaml'" in JRuby from now on. It contains two important parts. First, the module YAML and the singleton methods on this module, which are the main interface to YAML functionality in Ruby. This was implemented in RbYAML until now.

The next part is several implementations of the methods "taguri" and "to_yaml_node" on various classes. These methods are used to handle the dumping, and it's really there that most of the dumping action happens. For example, the taguri method for Object says that the tag for a typical Ruby object should be "!ruby/object:#{self.class.name}". The "to_yaml_node" for a Set says that it should be represented as a map where the values of the set are keys, and the values for these keys are null.

So, when this support gets into JRuby trunk it will mean a few things, but nothing that is really apparent to the regular JRuby user. The most important benefits are partly performance and partly correctness. Performance will improve since we now have Java all the way, and correctness because I have had the chance to add lots of unit tests and to fix many bugs in the process. Also, this release makes YAML 1.0 support a reality, which means that communication with MRI will work much better from now on.

So, enjoy. If we're lucky, it will get into the next minor release of JRuby, which probably will be here quite soon.

Announcing JvYAML 0.2.1

The last few days have been spent integrating the JvYAML dumper with JRuby, and making YAML support in JRuby totally implemented in Java. As a side effect I have been able to root out a few bugs in JvYAML; enough of them to warrant a minor release, actually. So, what's new? Working binary support, better handling of null types, better 1.0 support and a few hooks that make it possible to remove anchors in places where they don't make sense (like empty sequences).

The URL is http://jvyaml.dev.java.net and I recommend that everyone upgrade.

Monday, September 25, 2006

Two things in Rails

This will be a short in-between post. Don't expect to be annoyed, enlightened or even trivially entertained. I'm just going to describe two small things I do in all my Rails-projects, and I haven't found a way to do them as plugins. This is very annoying, of course, so I hope someone from the Rails team will eventually see this and tell me how to do it DRY.

1. Add a production_test environment
I feel constrained by the three environments that Rails delivers out of the box. For every project where the customer isn't myself and the codebase is bigger than about 50 lines of (hand-written) code, I tend to add a new environment to Rails: 'production_test'. The problem this environment solves is the situation where I want my customers to test an application, but I don't want them to do it against a real production environment. For example, a few months back I did an application called LPW that works against a 3rd party web service. This web service has one production environment and one test environment. I want production_test to be as fast, responsive and generally as similar to the production environment as possible, but not go against the production web service. I solve this by adding a production_test environment that is exactly like the production environment, except that the web service endpoint address points to the test service.

I usually do this, so I can give my customers a nice application that they can play with, but without worrying about them damaging production data.

2. Add plugin environment configuration
This is actually a major pain. I have developed a few plugins, and generally I want them to have configurations based on which environment we are in. For example, the CAS authentication plugin shouldn't really redirect to the CAS server when in development environment. But, I can't set this in any good way, since the plugins will be loaded after the environment-specific files have been loaded. So, what I do is simply to add a new directory, called config/plugin and in environment.rb I have this:

 plugin_environment = File.join(RAILS_ROOT,'config', 'plugin', "#{ENV['RAILS_ENV']}.rb")
load plugin_environment if File.exist?(plugin_environment)

This solution sucks, but it works.
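Wrapped in a method so it can be exercised outside Rails, the same idea looks like the sketch below. load_plugin_environment and the $cas_redirect setting are hypothetical names; in the version above, the root and environment come from RAILS_ROOT and ENV['RAILS_ENV'].

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: load config/plugin/<env>.rb under root if it exists.
# Returns true if a file was loaded, false otherwise.
def load_plugin_environment(root, env)
  path = File.join(root, 'config', 'plugin', "#{env}.rb")
  return false unless File.exist?(path)
  load path
  true
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'config', 'plugin'))
  # Hypothetical per-environment plugin setting, e.g. no CAS redirects in development.
  File.write(File.join(root, 'config', 'plugin', 'development.rb'),
             "$cas_redirect = false\n")

  p load_plugin_environment(root, 'development') # prints true
  p $cas_redirect                                 # prints false
  p load_plugin_environment(root, 'production')  # prints false (no such file)
end
```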

Sunday, September 24, 2006

The Ruby singleton class

After my post on Meta-programming techniques I got a few comments and questions about the singleton class. This feature seems to be quite hard to understand, so I have decided to try to clarify the issue by first describing what it is, and then detailing why it is so useful. This entry will be concept-heavy and code-light.

What it is
A child with many names, the singleton class has been called metaclass, shadow class, and other similar names. I will stay with singleton class, since that's the term the Pickaxe uses for it.

Now, in Ruby, all objects have a class that they are an instance of. You can find this class by calling the method class on any object. The methods an object responds to will originally be the ones in that object's class. But as you probably know, Ruby allows you to add new methods to any object. There are two syntaxes to do this:

class << foo
  def bar
    puts "hello, world"
  end
end

and

def foo.bar
  puts "hello, world"
end

To the Ruby interpreter, there is no difference between the two in this case. Now, if foo is a String, the method bar will be available to call on the object referenced by foo, but not on any other Strings. The way this works is that the first time a singleton method is defined on a specific object, a new, anonymous class is inserted between the object and its real class. So, when I call a method on foo, the interpreter will first search the anonymous class for a definition, and then go on searching the real class hierarchy for an implementation. As you probably understand, that anonymous class is our singleton class.
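A small sketch to poke at this from irb, assuming Ruby 1.9 or later (where a singleton class reports the object's original class as its superclass). It uses the class << foo; self; end idiom to get hold of the anonymous class:

```ruby
foo   = String.new("a string")       # String.new: a plain, unfrozen String
other = String.new("another string")

def foo.bar
  "hello, world"
end

# Grab foo's singleton class.
singleton = class << foo; self; end

p foo.class                         # prints String: the singleton class stays hidden
p singleton.superclass              # prints String: it sits between foo and String
p singleton.instance_methods(false) # prints [:bar]
p foo.respond_to?(:bar)             # prints true
p other.respond_to?(:bar)           # prints false
```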

The other part of the mystery about singleton classes (and the really nifty part) is this. Remember, all objects can have a singleton class. And classes are objects in themselves: a class such as String is just an instance of the class Class. There is nothing special about these instances. They have capitalized names, but that's because the names are constants. And since every class in Ruby is an instance of the class Class, what are called class methods (or static methods, if you come from Java) are actually just singleton methods defined on the instance of the class in question. So, say you would like to add a new class method to String:

def String.hello
  puts "hello"
end

String.hello

And now you see that the syntax is actually the same as when we add a new singleton method to any other object. The only difference here is that the object happens to be an instance of Class. There are two other common ways to define class methods, and they work the same way:

class String
  def self.hello
    puts "hello"
  end
end

class String
  class << self
    def hello
      puts "hello"
    end
  end
end

The second version especially needs explaining, for two reasons: it is the preferred idiom in Ruby, and it makes the singleton class explicit. Since the code inside the "class String" declaration is executed in the scope of the String instance of Class, we can get at its singleton class with the same class << syntax we used on foo earlier. So, the definition of hello happens inside the singleton class for String. This also explains the common idiom for getting the singleton class:

 class << self; self; end

There is no other good way to get it, so we extract the self from inside a singleton class definition.
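Putting the pieces together, here is a sketch showing that a class method really lives in the class's singleton class. Note that Ruby 1.9.2, released after this post was written, added Object#singleton_class as a direct way to get it; the last line assumes such a version.

```ruby
# The value of a class body is its last expression, here the singleton class.
meta = class String
  def self.hello
    "hello"
  end

  class << self; self; end
end

p meta                                          # prints #<Class:String>
p meta.instance_methods(false).include?(:hello) # prints true
p String.singleton_methods.include?(:hello)     # prints true
p meta.equal?(String.singleton_class)           # prints true (Ruby 1.9.2 and later)
```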

Why is it so useful for metaprogramming?
Obviously, you can define class methods with it, but that's not the main benefit. You can do many metaprogramming tricks with it that are impossible without it. The first one is to create a superclass that can define new class methods on subclasses of itself. That is the use I showcased in my earlier blog entry. The problem is that you can't just use self by itself, since that only gives you the class instance. This code, with its results, shows the difference:

class String
  p self
end # => String

class String
  p (class << self; self; end)
end # => #<Class:String>

And, if you want to use define_method, module_eval and all the other tricks, you need to invoke them on the singleton-class, not the regular self. Basically, if you need to dynamically define class methods, you need the singleton-class. This example will show the difference between defining a dynamic method with self or the singleton class:

class String
  self.module_eval do
    define_method :foo do
      puts "inside foo"
    end
  end

  (class << self; self; end).module_eval do
    define_method :bar do
      puts "inside bar"
    end
  end
end

"string".foo # prints "inside foo"
String.bar   # prints "inside bar"

As you can see, going through the singleton class defines the method on the class itself instead of on its instances. Of course, if you know the class name it will always be easier to avoid an explicit singleton class, but when the method needs to be defined dynamically you need it. It's as simple as that.
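The superclass use case mentioned earlier, a superclass that defines class methods on its subclasses, can be sketched like this. Base, class_attr and Settings are hypothetical names; the method has to go into self's singleton class, because module_eval on self would define an instance method instead:

```ruby
class Base
  # Hypothetical class-level helper: defines a class method called name,
  # returning value, on whichever class calls it (Base or a subclass).
  def self.class_attr(name, value)
    singleton = class << self; self; end
    singleton.module_eval do
      define_method(name) { value }
    end
  end
end

class Settings < Base
  class_attr :timeout, 30
end

p Settings.timeout            # prints 30
p Base.respond_to?(:timeout)  # prints false
```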

Announcing JvYAML 0.2

I'm very pleased to announce that JvYAML 0.2 was released a few minutes ago. The new release contains all the things I've talked about earlier and a few extra things I felt would fit well. The important parts of this release are:

The Dumper - JvYAML is now a complete YAML processor, not just a loader.
Loading and dumping JavaBeans - This feature is necessary for most serious usage of YAML. It allows people to read configuration files right into their bean objects.
Loading and dumping specific implementations of mappings and sequences. Very nice if you happen to need your mapping to be a TreeMap instead of a HashMap.
Configuration options to allow 1.0-compatibility with regard to the ! versus !! tag prefixes.
The simplified interface has been substantially improved, with several new utility methods.
Lots and lots of bug fixes.

So, as you can see, this release is really something. I am planning on spending a few nights this week integrating it with JRuby too. And soon after that we will be able to have YAML completely in Java-land. That is great news for performance. It also makes it easier to just have one YAML implementation to fix bugs in, instead of two.

A howto? Oh, you want a guide to the new features? Hmm. Well, OK, but there really isn't much to show. How to dump an object and get the YAML string back:

 YAML.dump(obj);

or dump directly to a file:

 YAML.dump(obj,new FileWriter("/path/to/file.yaml"));

or dump with version 1.0 instead of 1.1:

 YAML.dump(obj, YAML.options().version("1.0"));

dumping a JavaBean:

 String beanString = YAML.dump(bean);

and loading it back again:

 YAML.load(beanString);

That's more or less it. Nothing fancy. Of course, all the different parts underneath are still there, and you can provide your own implementation of YAMLFactory to add your own specific hacks. If you want to dump your object in a special way, you can implement the YAMLNodeCreator interface, and your object will be in charge of creating the information used to represent it.

Saturday, September 23, 2006

Three ways to add Ruby Macros

As most of my readers have probably realized by now, I have a few obsessions. Lisp and Ruby happen to be two of the more prominent ones. And regarding Lisp, macros are what especially interest me. I have been thinking a lot lately about how you could go about adding some kind of macro facility to Ruby, and these three options are the result.

I should begin by saying that none of these options is entirely practical right now. All of them have some serious problems which I frankly haven't been able to find an answer for yet. But that doesn't stop me from blogging about my ideas, of course. Another thing to notice is that this is not about hygienic macros. This is the full-blown, powerful, blow-the-moon-away version of macros.

MacRuby - Direct defmacro in Ruby
The first approach rests on modifying the language itself. You could add a defmacro keyword that takes a name and a code block to execute. Each time the compiler/interpreter finds a macro definition, it will remember the name. When that name is found later in the code, each place will be marked. Then, before execution begins, every call to the macro will be replaced by the output of calling the macro with the subnodes at that place as arguments. An example of a simple macro:

defmacro log logger, level, *messages
  if $DEBUG
    :call, logger, level, *messages
  else
    :nop
  end
end

log @l, :debug, "value is: #{very_expensive_operation()}"

What's interesting in this case is that the messages will not be evaluated if the $DEBUG flag is not set. This is because the value returned from the macro will be spliced into the AST only if that flag is set. Otherwise a no-op will be inserted instead. Obviously, for this kind of code to work, the interpreter would need to change substantially. There is also a big problem with it, since it's very hard to fit this model into the object-oriented system of Ruby. As I think about it now, it seems macros would be the only non-OOP feature in Ruby, if added in this way. Another big problem with this model is that it is really not that intuitive what the resulting code from the macro will be. As soon as something more advanced needs to be returned, it will be very hard getting it straight in your head. One solution to this would be to do it the standard CL way. First write the output from the macro in several different instances. Then transform this to the AST code through a tool that parses the code. Then transform this into the macro. This process would be helped by tools, of course.
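Plain Ruby cannot skip evaluating a method's arguments, but for this particular logging case a block gets part of the effect at runtime. For comparison, here is a sketch using the standard library Logger, whose severity methods such as debug accept a block and only call it when that severity is enabled. Unlike the macro, the call to debug still happens and the check is made at runtime; very_expensive_operation is a stand-in lambda that counts its own evaluations.

```ruby
require 'logger'
require 'stringio'

evaluations = 0
# Stand-in for very_expensive_operation() from the macro example.
very_expensive_operation = -> { evaluations += 1; 42 }

output = StringIO.new
logger = Logger.new(output)

logger.level = Logger::INFO
logger.debug { "value is: #{very_expensive_operation.call}" } # block never runs

logger.level = Logger::DEBUG
logger.debug { "value is: #{very_expensive_operation.call}" } # block runs once

p evaluations                            # prints 1
p output.string.include?("value is: 42") # prints true
```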

Back-and-Lisp-Ruby - Write macros in Lisp, translate Ruby back and forth
Another way to achieve this power in Ruby would be to separate the macro language from the main language. In effect, the macros would be a classic pre-processor. To offer the same power level as Lisp and others, the best way would be to write the macros themselves in a Lisp dialect, then transform Ruby in a well-defined way to Lisp and back again. (See the next version for more about this idea.) In this situation the same macro as before could look like this:

 (defmacro log (logger level &rest messages)
 (if $DEBUG
     `(,level ,logger ,@messages)
     '()))

The main difference in this code is that the macro and the output from the macro is Lisp. We have gotten rid of the ugly :call and :nop return values, and to me this seems quite readable. Of course, I'm not sure everyone else feels the same way. And we still have the same problem with Object Orientedness. It's missing.
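Ruby's standard library already covers the first half of such a round trip: Ripper can turn Ruby source into nested arrays in S-expression style. A small sketch; the tree is in Ripper's own format rather than the Lisp dialect imagined above, and turning it back into Ruby would need a separate unparser:

```ruby
require 'ripper'
require 'pp'

# Single quotes: the interpolation is part of the parsed source, not run here.
source = 'log @l, :debug, "value is: #{very_expensive_operation()}"'
sexp = Ripper.sexp(source)

pp sexp          # nested arrays: the call, its arguments, the interpolated string
p sexp.first     # prints :program
```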

RoCL - Ruby over Common Lisp
The final idea is to build a Ruby runtime within Common Lisp and transform Ruby into Common Lisp before running it. The macros could be added either as Ruby code or as Lisp code. Everything would be transformed into the equivalent Lisp code, maybe using CLOS as the object system, or building something based on Ruby's. Of course, the semantics of many things would change, and many libraries would need to be rewritten. But in the end, there would be incredible power available. Especially if we can make it go both ways, so that Common Lisp can use Ruby libraries.

An example transformation could look like this. From this Ruby:

 class String
  def revert(a, *args)
    if block_given?
      yield a
    else
      args + [a]
    end
  end
end

"abc".revert "one" do |x|
  puts x
end

This is nonsense code, if you hadn't noticed. =)

(with-class "String" nil
            (def revert (a block &rest args)
              (if block
                  (funcall block a)
                  (+ args (list a)))))
(revert "abc" "one" #'(lambda (x)
                        (puts self x)))

Conclusions
It is very hard to retrofit macros into Ruby after the fact. I'm still not sure it can be done while keeping enough of Ruby's semantics to make it meaningful. It seems that we need a new language. But if I had to choose among these approaches, the RoCL one seems the most interesting and also the most fun to implement. If I have a motto, it would have to be something along the lines of "best of all worlds". I want the best from Ruby, Java, Lisp, Erlang and everything I can find.

Friday, September 22, 2006

The Dark Ages of programming languages

We seem to be living in the dark ages of programming languages. I'm not saying this to bash everything; I'm actually being totally objective right now. Obviously, our situation right now is much better than it was 10 years ago. Or even 5 years ago. I would actually say that it's much better now than 1 year ago. But programming is still way too painful in almost all cases. We are doing so much by hand that obviously should be done by the computer.

I spend quite a lot of time learning new languages now and then, to try to find something that's really good for me. So far, the best contenders are Ruby, Erlang, OCaml and Lisp, but all of those have their share of problems too. They just suck less than the alternatives.

Ruby... I really like Ruby. Ruby is such an improvement that I really want to do almost everything in it nowadays. I think in Ruby half the time and in Lisp the other half. But it's not enough. It is still clunky. I want tail calls. I want real macros. I want blazing speed and complete integration with good libraries for everything and more. I'm just a sucker for power, and I want more of it in Ruby.
Erlang and OCaml. These languages are really great. For specific applications. Specifically, Erlang is totally superior for concurrent programming. And OCaml is incredibly fast, very typesafe and has great GUI libraries. So, if I was asked to do something massively concurrent I would probably choose Erlang, and OCaml if it was GUI programming. But otherwise... Well, Erlang does have some neat functional properties, but not any nice macro support. It doesn't have a central code repository and many other things you expect from a general purpose language. OCaml suffers from the same things.
Lisp is the love of my life. But as so many people before me have noted, all the implementations are bad in some way or another. Scheme is lovely; for research. Common Lisp is so powerful, but it needs users. Lots of them, creating libraries for every little data format there can be, creating competing implementations of particularly important APIs, like databases.

Conclusion: nothing is good enough right now. I see two paths ahead. Two ways that could actually end in the "100-year language".

The first path is one new language. This language will be based on all the best features of all current languages, plus a good amount of research output. I have a small list of what this language would need to be successful as the next big one:

It needs to be multiparadigm. I'm not saying it can't choose one paradigm as the base, but it should be possible to program in it functionally, OOP, AOP, imperative. It should be possible to build a declarative library so you can do logic programming without leaving the language.
It should have static type inference where possible. It should also allow optional type hints. This is so important for creating great implementations. It can also increase readability in some cases.
It needs all the trappings of functional languages: closures, first-class functions and lambdas. This is essential, to avoid locking the language into an evolutionary corner.
It needs garbage collection. Possibly several competing implementations of GC's, running evolutionary algorithms to find out which one is best suited for long running processes of the program in question.
A JIT VM. It seems almost a given right now that Virtual Machines are a big win. They can also be made incredibly fast.
Another JIT VM.
A non-VM implementation. Several competing implementations for different purposes is important to allow competition and experimentation with new features of implementation.
Great integration with legacy languages (Java, Ruby (note, I'm counting on all Rubyists moving to this new language when it gets out, making Ruby legacy), Cobol). This is obvious. There are too many things lying around, bitrotting, that we will never get rid of.
The language and at least one production quality implementation needs to be totally open-source. No lock-in of the language should be possible.
Likewise, good company support is essential. A language needs money to be developed.
A centralized code/library repository. This is one of Java's biggest failings. Installing a new library in Java is painful. We need something like CPAN, ASDF, RubyGems.
The language needs great, small and very orthogonal libraries. The libraries included with the language need to be great, since they have to be small but still pack all the most needed punch.
Concurrency must be a breeze. There should be facilities in the language itself for making this obvious. (Like Erlang or Gambit Scheme).
It should be natural to do meta-programming in it (in the manner of Ruby).
It should be natural to solve problems bottom-up, by implementing DSL's inside or outside the language.
The language needs a powerful macro facility that isn't too hard to use.
Importantly, for the macro facility, the language needs to have a well-defined syntax tree of the simplest possible kind, but it also needs to have optional syntax.

So, that's what I deem necessary (but maybe not sufficient) for a really useful, good, long-term programming language. When I read this list, it doesn't seem that probable that this language will show up any time soon, though. Actually, it seems kind of unrealistic.

So maybe the other way ahead is the right one? The other way I envision is that languages become easier and easier to create, and each language has its strengths in different places. Along this path I envision the descendants of Ruby and Erlang exploiting what they're good at and eschewing everything else. But for this strategy to work, the first thing implemented in each language needs to be a seamless way to integrate with other languages. Maybe an extremely good glue language will appear (not like Perl or Ruby, but a language that only serves as glue between programming languages), and all languages will implement good support for it. For example, you could code a base Erlang concurrent framework which uses G (the glue language) to implement some enterprise functionality in Java sandboxes, and some places where Ruby, through G, implements a DSL which has subparts where Ruby uses G to run Prolog knowledge engines.

If I had to choose between the two futures, I would frankly be more inclined towards the one-language one. But the multi-language way seems much more probable. And since I'm trying to choose a way now, I'm placing my bets on the second option. We are not ready to implement G yet, but I do think that as many programming-language techies as possible should do their best to learn how languages can cooperate in different ways, to prepare for this project.

The difference between Kernel#` and Kernel#system

Today I had a fun learning experience. It cost me several hours of work, so I will post a small note about it here so Google can make other developers' lives easier. Or maybe I'm the only one who made this mistake.

Anyway. What I was trying to do was to start an external Ruby script (from another Ruby script). This other Ruby script daemonized itself, but since I didn't want to install the daemonize package (another bad decision, probably), I just wrote the script in question to fork and detach. I have condensed the problem a little, to this Ruby script:

 `ruby -e'if pid=fork; Process.detach(pid); else; sleep(5); end'`

Everyone please raise your hands if it is obvious that this script will sleep for 5 seconds before giving back my prompt. It wasn't obvious to me, since it had been a long time since I did any UNIX system programming.

For those who still want to know: backtick runs the command with its STDOUT connected to a pipe, and reads from that pipe until it gets end-of-file. The forked child inherits that same STDOUT, so as long as any process still holds the pipe open, backtick will wait. And since forking and detaching doesn't redirect any of the STD* streams, this will wait until both processes have finished.

There are two ways to fix this: one right way, and one fast way. The right way is to rebind the streams in the child after forking. This can easily be done with this code:

# in the forked child, before doing the real work:
STDIN.reopen('/dev/null')
STDOUT.reopen('/dev/null', 'a')   # explicit mode: open for writing
STDERR.reopen('/dev/null', 'a')

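The faster way is to replace backtick with a call to system. Since system isn't interested in the output from the process, it doesn't connect the child's STDOUT to a pipe; the child simply inherits the parent's streams, and system returns as soon as the process it started has exited. So just running this instead will work: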

 system "ruby -e'if pid=fork; Process.detach(pid); else; sleep(5); end'"
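The difference is easy to measure. A sketch, assuming a UNIX-like system with fork, and a 1 second sleep instead of 5; it runs the one-liner above with backtick and with system, plus a variant where the forked child rebinds STDOUT before sleeping:

```ruby
require 'rbconfig'

# The one-liner from above, sleeping 1 second instead of 5.
daemon  = "if pid = fork; Process.detach(pid); else; sleep 1; end"
# Same, but the forked child first points STDOUT (the pipe backtick reads) elsewhere.
rebound = "if pid = fork; Process.detach(pid); else; STDOUT.reopen('/dev/null', 'a'); sleep 1; end"

# Wall-clock seconds spent in the block.
def seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

ruby = RbConfig.ruby
backtick_time = seconds { `#{ruby} -e "#{daemon}"` }
rebound_time  = seconds { `#{ruby} -e "#{rebound}"` }
system_time   = seconds { system(ruby, '-e', daemon) }

printf("backtick: %.2fs, rebound: %.2fs, system: %.2fs\n",
       backtick_time, rebound_time, system_time)
```

Expect the plain backtick run to take at least a second, while the other two return as soon as the first process exits.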

I have learned the lesson, and I have bought a copy of the Stevens book. (UNIX Network Programming, which details the interaction between fork and ports, which was what my original problem was about.)

Wednesday, September 20, 2006

Dynamic Ruby power and static balance

Update: This post has been updated to explain, clarify and remove certain things that sounded like an attack on people that didn't agree with me, especially Austin. This was certainly not my intent when writing it. Added explanations will be highlighted with italic text.

Sir Bedevere: And what do you burn, apart from witches?
Peasant 1: More witches.
Peasant 2: Wood.
Sir Bedevere: Good. Now, why do witches burn?
Peasant 3: ...because they're made of... wood?
Sir Bedevere: Good. So how do you tell whether she is made of wood?
Peasant 1: Build a bridge out of her.
Sir Bedevere: But can you not also build bridges out of stone?
Peasant 1: Oh yeah.
Sir Bedevere: Does wood sink in water?
Peasant 1: No, no, it floats!... It floats! Throw her into the pond!
Sir Bedevere: No, no. What else floats in water?
Peasant 1: Bread.
Peasant 2: Apples.
Peasant 3: Very small rocks.
Peasant 1: Cider.
Peasant 2: Gravy.
Peasant 3: Cherries.
Peasant 1: Mud.
Peasant 2: Churches.
Peasant 3: Lead! Lead!
King Arthur: A Duck.
Sir Bedevere: ...Exactly. So, logically...
Peasant 1: If she weighed the same as a duck... she's made of wood.
Sir Bedevere: And therefore...
Peasant 2: ...A witch!
(quotes from Monty Python and the Holy Grail, courtesy of IMDB)

My post announcing Ducktator seems to have stirred up a few emotions on Ruby-talk. Of course, most of this is my fault, for naming the library in such a frivolous way and not explaining the domains for its usage correctly. But on the other hand, there seems to be a general confusion about the concept of Duck typing, dynamic versus static typing, validation and other issues. Actually, I got a whiff of religion when my mention of Duck typing engendered such a diverse set of responses.

Of course, my own reaction about duck typing was just as religious. I see this as a general trap when discussing programming languages. The Ruby community is altogether very good at avoiding religion, which is why I was quite startled when I found hints of it. Duck typing as a concept seems to be very loaded right now. I'm merely pointing this out as something we should be on the lookout for. From now on I will try to be as objective as possible when discussing this, and I suggest that people in the Ruby community try to do the same.

And everyone and their aunt seems to have a different opinion on what duck typing really is. It's all quite fun, actually, except for the fact that it misses the point. I should have avoided mentioning ducks. I should have avoided saying anything at all about typing, since that isn't the point. And I bloody well shouldn't have used the class-validator in my example. Well, done is done. And this post won't be about that. Just the next paragraph.

The Ducktator disclaimer

I won't mention the words duck typing from here on. I would change the name of the project if it weren't so damn hard to do on RubyForge. But what I want to explain is this. Ducktator is about validating things. But not everywhere. You shouldn't use Ducktator in places where you have one or two checks for something in an object. You should really only use it at the borders of your code, the borders where you will receive complex objects. Really complex objects, where a method_missing won't tell anyone anything useful at all. The use case I had in mind when writing the library was RubyGems: when the YAML spec for a Gem has been loaded, check that the important parts actually have what it takes to get into the source index. Since I managed to break RubyGems this way, I feel that this kind of validation can be really important. Once again, this is validation of live Ruby objects. Nothing else. You can check practically anything you want, but the easiest examples have been about each, class and respond_to. Hope this clarifies things a bit.

I removed the entire paragraph about typing. But my recommendation still stands; if you find formal types in programming languages interesting and/or confusing, read Programming Language Pragmatics, and you will be enlightened.

The main point

My possibly improper use of the term Duck typing engendered a response I hadn't expected at all. Of course, I realize that this is a very obvious community effect. Since Duck typing is one of the trademarks of the Ruby community, everyone has opinions on it, and more importantly feels the need to defend it as soon as some threat is perceived. Steve Yegge has written lots and lots about what language religion is really about, and I feel that this is an extension of that issue, so I won't write more about it here either. You can find more in many of his excellent Drunken Blog Rants.

Finally. Balance is what I'm after. One person (Austin) said that the d**k t****g philosophy (I had written the word 'issue' here. That seems to have been misinterpreted. I blame that on my poor grasp of English, since my mother tongue is Swedish. =) is about TRUST. That you should trust the caller of your library to read your documentation (which, obviously, is perfect), and supply the correct objects. This isn't too much to ask if your docs are up to scratch, and if the caller is the same one who will suffer if he mishandles your library. But trust isn't enough when you're at the borders. When talking to other languages through shaky serialization systems. When talking with clients that could be hostile. (Yes, in this case setting $SAFE helps, but it doesn't go all the way.) (Sandbox is, or will be, a good alternative here, but I still see places where object validation is a better solution.)

Further, Austin responded in his blog post that he thinks I have 'set up a false dichotomy here: people who are for duck typing as trusting your caller are against validation'. This wasn't my intention. Actually, it's more the other way around. I am for duck typing, in most places. What I'm saying is that no solution is perfect at all points in your code, and duck typing is a good fit in many, but not all. The next paragraph clarifies my wish for balance.

What I'm saying is, most of the time you won't need it, but in some cases, some kind of interface validation really helps a lot. I know the so-called dynamic community doesn't like to hear this. But what is so dynamic about failing without control? (The arguments I heard about letting code fail when the method isn't there sounded very much like failing without control to me. That was my interpretation of the argument that you don't need to use respond_to? for duck typing.) I know that I, as a developer, am not infallible. I make mistakes. Most of the time I am in control of all my objects, but there are times when I'm not. For example, there are situations where I develop smaller applications for other (non-programmer) people. I like to create configurations and rules in YAML for these projects and leave the client in charge of configuring the application. But what if he/she/it makes a mistake? Using the 'other' way, I would fail when trying to call protocol on something that should have been a URI, but wasn't, because the person made a typo and put an illegal character inside the URL. Will that message help the person doing the configuration? Should you wrap your calls in rescues all over the place and give the same explanation? Should you trust that the (non-programming) client will be able to read your RDoc and figure out that a method (which I bet you didn't name get_uri_from_yaml_configuration) failed because of something they did in the configuration? I believe not.
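To make the configuration scenario concrete, here is a minimal sketch of validating a YAML config at the border, before the rest of the program touches it. The `endpoint` key, the `ConfigError` class and the `load_endpoint` method are all hypothetical, made up for this example; only the standard `yaml` and `uri` libraries are used. A malformed URL becomes a message the person editing the file can act on, instead of a NoMethodError deep inside the application:

```ruby
require 'yaml'
require 'uri'

# Hypothetical error class, carrying a message aimed at whoever edits the config file.
class ConfigError < StandardError; end

# Hypothetical loader: check the YAML at the border, not deep inside the app.
def load_endpoint(yaml_text)
  config = YAML.safe_load(yaml_text)
  raise ConfigError, "the configuration must be a set of key: value pairs" unless config.is_a?(Hash)

  url = config['endpoint']
  raise ConfigError, "missing 'endpoint' entry" if url.nil?
  raise ConfigError, "'endpoint' must be text, got #{url.inspect}" unless url.is_a?(String)

  begin
    uri = URI.parse(url)
  rescue URI::InvalidURIError
    raise ConfigError, "'endpoint' is not a valid URL (check for typos): #{url.inspect}"
  end
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass this check.
  raise ConfigError, "'endpoint' must start with http:// or https://" unless uri.is_a?(URI::HTTP)

  uri
end
```

The point is not the specific checks but where they live: one place, at the border, with messages phrased in terms of the configuration file rather than the code.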

What I'm really ranting about is balance. There needs to be a balance between checking and laissez-faire. In most places, just calling the method is fine. In other places it's appropriate to check with respond_to?, and in some cases you need to check the class. We're programmers. We are supposed to be good at judging which technique to use where. Yes, Ruby is a dynamic language. Yes, Ruby is very easy to learn. Yes, Ruby makes most stuff very easy on you. That doesn't mean you should stop thinking. It doesn't mean you should be lazy. We are programmers, and we should be able to adapt.
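The three levels of checking can be sketched side by side. The method names here are made up for illustration, and which level is right depends entirely on where in your code the call sits:

```ruby
# Level 1: just call the method; a wrong argument surfaces as NoMethodError.
def shout_plain(obj)
  obj.to_str.upcase
end

# Level 2: check the protocol with respond_to? and fail with a clearer message.
def shout_checked(obj)
  unless obj.respond_to?(:to_str)
    raise ArgumentError, "expected something string-like, got #{obj.class}"
  end
  obj.to_str.upcase
end

# Level 3: insist on a particular class.
def shout_strict(obj)
  raise TypeError, "expected a String, got #{obj.class}" unless obj.is_a?(String)
  obj.upcase
end
```

Level 1 is usually enough inside your own code; levels 2 and 3 earn their keep at the borders, where the caller needs to know what went wrong.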

One more time. Balance. Balance. Everywhere. And I do love the Ruby community. It is the best. Even though people get mad at each other, we can solve our differences. I'm proud of being a part of it.

MySQL, some concrete suggestions!

After my post Rails, Databases, ActiveRecord and the path towards damnation, I got an e-mail from Mårten Mickos, the CEO of MySQL. He asked me to provide concrete suggestions on how to improve MySQL (since the other post just contained some unspecified not-like vibes), so that's the rationale for this post. I'm going to point at a few things I see as a problem for using MySQL as a production database right now. Standard disclaimer stands: these are my opinions, my own only, and my employer doesn't necessarily agree or disagree with them on any level.

Let us jump into the fray:

Sequences. I would like real, nice and sweet sequences. I really don't like having no control over my primary key generation, and I especially don't like that I can't have sequences for anything else. The recommended solution according to the manual is to create a table with one auto-increment column in it, and use this as a sequence. That's not acceptable, especially since I cannot tie this so-called sequence to the generation of ids on other tables with subselects and other fun things.
OK, I really don't like the auto-increment feature. Why not provide an IDENTITY keyword like the non-core feature ID T174+T175 specifies?
Real, honest-to-god, boolean types. Real ones. Not tinyint(1)s. Not enums. Not tinyints hidden behind the word boolean (like JDBC). Real boolean types.
I would like table1 and Table1 to be different (as per the spec). Oh yes, we seem to live in an insensitive world (case and otherwise) with Windows all over the place. But in my database I want that kind of control.
Limiting the number of rows returned in result sets. Now, I have no problem with LIMIT and friends, but since there is a spec, and that spec has a feature for this functionality too (T611), why can't that be in MySQL?
Time-types should be able to store fractional seconds and time zones.
And what's the matter with the TIMESTAMP type? That doesn't really do what the standard says it should do. Please give it a name not in the standard.
And for Pete's sake, double bars are for concatenation in SQL. || is for 'or' in programming, but SQL is a DSL. This screams leaky abstractions and is very annoying.
Stability of 5.0 features. I know triggers, foreign keys and stored procedures are all there now. But frankly, I don't trust my referential integrity with them yet. Not from a database vendor that a few years ago wrote in their manual that the only reason for foreign keys was to let GUIs diagram relationships between database objects. Not from a vendor that said that you don't need transactions to ensure data integrity. All in all, I want these features to mature: get the bugs hashed out, let them be pounded on for a while. But that's not going to happen if people move to Rails, since Rails doesn't believe in data integrity or foreign keys.

Well, that's that. Only my opinions, remember? Anyway, for small and fast development, MySQL is really useful. I'm just arguing that a big production system should choose something else.

Ruby Metaprogramming techniques

Updated: Scott Labounty wondered how the trace example could work and since a typical metaprogramming technique is writing before- and after-methods, I have added a small version of this.
Updated: Fixed two typos, found by Stephen Viles

I have been thinking much about Metaprogramming lately. I have come to the conclusion that I would like to see more examples and explanations of these techniques. For good or bad, metaprogramming has entered the Ruby community as the standard way of accomplishing various tasks, and of compressing code. Since I couldn't find any good resources of this kind, I will get the ball rolling by writing about some common Ruby techniques. These tips are probably most useful for programmers who come to Ruby from another language or haven't experienced the joy of Ruby Metaprogramming yet.

1. Use the singleton-class

Many ways of manipulating single objects are based on manipulations on the singleton class and having this available will make metaprogramming easier. The classic way to get at the singleton class is to execute something like this:

 sclass = (class << self; self; end)

RCR231 proposes the method Kernel#singleton_class with this definition:

 module Kernel
   def singleton_class
     class << self; self; end
   end
 end

I will use this method in some of the next tips.
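As a quick illustration of why the singleton class is handy, here is a small sketch (the `greeter` object and `greet` method are made up) that adds a method to one object without touching its class, using the classic idiom from above:

```ruby
greeter = Object.new

# Open the singleton class of this one object.
sclass = (class << greeter; self; end)

# define_method is private on older Rubies, hence send.
sclass.send(:define_method, :greet) { |name| "Hello, #{name}" }

greeter.greet("Ola")            # => "Hello, Ola"
Object.new.respond_to?(:greet)  # => false, other objects are untouched
```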

2. Write DSL's using class-methods that rewrite subclasses

When you want to create a DSL for defining information about classes, the most common trouble is how to represent the information so that other parts of the framework can use it. Take this example where I define an ActiveRecord model object:

 class Product < ActiveRecord::Base
   set_table_name 'produce'
 end

In this case, the interesting call is set_table_name. How does that work? Well, there is a small amount of magic involved. One way to do it would be like this:

 module ActiveRecord
   class Base
     # Simplified default, so there is an original method for define_attr_method to alias
     def self.table_name
       name.downcase
     end

     def self.set_table_name name
       define_attr_method :table_name, name
     end

     def self.define_attr_method(name, value)
       singleton_class.send :alias_method, "original_#{name}", name
       singleton_class.class_eval do
         define_method(name) do
           value
         end
       end
     end
   end
 end

What's interesting here is the define_attr_method. In this case we need to get at the singleton-class for the Product class, but we do not want to modify ActiveRecord::Base. By using singleton_class we can achieve this. We have to use send to alias the original method since alias_method is private. Then we just define a new accessor which returns the value. If ActiveRecord wants the table name for a specific class, it can just call the accessor on the class. This way of dynamically creating methods and accessors on the singleton-class is very common, and especially so in Rails.

3. Create classes and modules dynamically

Ruby allows you to create and modify classes and modules dynamically. You can do almost anything you would like on any class or module that isn't frozen. This is very useful in certain places. The Struct class is probably the best example, where

 PersonVO = Struct.new(:name, :phone, :email)
 p1 = PersonVO.new("Ola Bini")

will create a new class, assign this to the name PersonVO and then go ahead and create an instance of this class. Creating a new class from scratch and defining a new method on it is as simple as this:

 c = Class.new
 c.class_eval do
   define_method :foo do
     puts "Hello World"
   end
 end

 c.new.foo    # prints "Hello World"

Apart from Struct, examples of creating classes on the fly can be found in SOAP4R and Camping. Camping is especially interesting, since it has methods that create these classes, and you are supposed to inherit your controllers and views from these classes. Much of the interesting functionality in Camping is actually achieved in this way. From the unabridged version:

 def R(*urls); Class.new(R) { meta_def(:urls) { urls } }; end

This makes it possible for you to create controllers like this:

 class View < R '/view/(\d+)'
   def get post_id
   end
 end

You can also create modules in this way, and include them in classes dynamically.
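A minimal sketch of the module variant, assuming a hypothetical `make_greetings` helper and `Person` class: `Module.new` evaluates its block inside the new module, so methods can be generated from data and the resulting module included wherever it is needed:

```ruby
# Hypothetical helper: build an anonymous module with one greet_* method per name.
def make_greetings(*names)
  Module.new do
    names.each do |n|
      define_method("greet_#{n}") { "Hello, #{n}!" }
    end
  end
end

class Person; end

# include was private on older Rubies, hence send.
Person.send(:include, make_greetings(:ola, :scott))

Person.new.greet_ola    # => "Hello, ola!"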

4. Use method_missing to do interesting things

Apart from blocks, method_missing is probably the most powerful feature of Ruby. It's also one that is easy to abuse. Much code can be greatly simplified by good use of method_missing. Some things can be done that aren't even possible without it. A good example (also from Camping) is an extension to Hash:

 class Hash
   def method_missing(m, *a)
     if m.to_s =~ /=$/
       self[$`] = a[0]     # x.foo = 1 stores under the key 'foo'
     elsif a.empty?
       self[m.to_s]        # x.abc reads the key 'abc' (string, not symbol)
     else
       raise NoMethodError, "#{m}"
     end
   end
 end

This code makes it possible to use a hash like this:

 x = {'abc' => 123}
 x.abc # => 123
 x.foo = :baz
 x # => {'abc' => 123, 'foo' => :baz}

As you can see, if someone calls a method that doesn't exist on a hash, the method name is looked up as a key in the hash. If the method name ends with an =, a value is stored under the method name minus the equal sign.

Another nice method_missing technique can be found in Markaby. The code I'm referring to makes it possible to emit any XHTML tag, with CSS classes added to it. This code:

 body do
   h1.header 'Blog'
   div.content do
     'Hellu'
   end
 end

will emit this XML:

 <body>
   <h1 class="header">Blog</h1>
   <div class="content">
     Hellu
   </div>
 </body>

Most of this functionality, especially the CSS class names, is created by having a method_missing that sets attributes on self and then returns self again.
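To show the general shape of this technique, here is a heavily simplified builder. It is a hypothetical sketch, not Markaby's actual implementation, and it only knows the three tags from the example above. A bare tag call such as `div` returns a proxy; any unknown method called on that proxy is recorded as a CSS class, and the proxy returns self until some content arrives:

```ruby
# Hypothetical, heavily simplified builder; not Markaby's real code.
class TinyBuilder
  # Returned by a bare tag call like `h1`; unknown methods become CSS classes.
  class Tag
    def initialize(builder, name)
      @builder, @name, @classes = builder, name, []
    end

    def method_missing(css_class, *args, &block)
      @classes << css_class.to_s
      # No content yet: return self so classes can be chained (div.a.b).
      return self if args.empty? && block.nil?
      @builder.tag!(@name, @classes, args.first, &block)
    end
  end

  def initialize
    @out = "".dup
  end

  # Evaluate the markup block with self set to the builder; return the HTML.
  def build(&block)
    instance_eval(&block)
    @out
  end

  # Emit one element. If a block emits no markup but returns a String,
  # that String becomes the element's text.
  def tag!(name, classes, text = nil, &block)
    attrs = classes.empty? ? "" : %( class="#{classes.join(' ')}")
    @out << "<#{name}#{attrs}>"
    @out << text.to_s if text
    if block
      before = @out.length
      result = block.call
      @out << result if @out.length == before && result.is_a?(String)
    end
    @out << "</#{name}>"
    nil
  end

  %w[body h1 div].each do |name|
    define_method(name) do |text = nil, &block|
      if text.nil? && block.nil?
        Tag.new(self, name)   # e.g. the `div` in `div.content do ... end`
      else
        tag!(name, [], text, &block)
      end
    end
  end
end
```

Running the markup from above through `TinyBuilder.new.build { ... }` yields the same elements as the XML shown, just without whitespace between tags.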

5. Dispatch on method-patterns

This is an easy way to achieve extensibility in ways you can't anticipate. For example, I recently created a small framework for validation. The central Validator class will find all methods in self that begin with check_ and call this method, making it very easy to add new checks: just add a new method to the class, or to one instance.

 methods.grep(/^check_/) do |m|
   self.send m
 end

This is really easy, and incredibly powerful. Just look at Test::Unit which uses this method all over the place.
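A self-contained version of this dispatch, assuming a hypothetical `Validator` class with two built-in checks, might look like the following. A check added to a single instance through `def v.check_...` is picked up automatically, since `methods` includes singleton methods:

```ruby
# Hypothetical validator: every public method named check_* is one rule.
class Validator
  def initialize(obj)
    @obj = obj
  end

  # Run every check_* method and return the names of the ones that failed.
  def errors
    methods.grep(/\Acheck_/).sort.reject { |m| send(m) }
  end

  def check_not_nil
    !@obj.nil?
  end

  def check_responds_to_each
    @obj.respond_to?(:each)
  end
end

v = Validator.new(-5)
v.errors  # => [:check_responds_to_each]

# A new rule for this one instance only; errors picks it up without changes.
def v.check_positive
  @obj > 0
end
v.errors  # => [:check_positive, :check_responds_to_each]
```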

6. Replacing methods

Sometimes a method implementation just doesn't do what you want. Or maybe it only does half of it. The standard Object Oriented Way (tm) is to subclass and override, and then call super. This only works if you have control over the object instantiation for the class in question. This is often not the case, and then subclassing is worthless. To achieve the same functionality, alias the old method and add a new method definition that calls the old method. Make sure that the previous method's pre- and postconditions are preserved.

 class String
   alias_method :original_reverse, :reverse

   def reverse
     puts "reversing, please wait..."
     original_reverse
   end
 end

Also, a twist on this technique is to temporarily alias a method, then restore it afterwards. For example, you could do something like this:

 def trace(*mths)
   add_tracing(*mths)      # aliases the methods named, adding tracing
   begin
     yield
   ensure
     remove_tracing(*mths) # removes the tracing aliases, even if the block raises
   end
 end

This example shows a typical way one could code the add_tracing and remove_tracing methods. It depends on singleton_class being available, as per tip #1:

 class Object
   def add_tracing(*mths)
     mths.each do |m|
       singleton_class.send :alias_method, "traced_#{m}", m
       singleton_class.send :define_method, m do |*args|
         $stderr.puts "before #{m}(#{args.inspect})"
         ret = self.send("traced_#{m}", *args)
         $stderr.puts "after #{m} - #{ret.inspect}"
         ret
       end
     end
   end

   def remove_tracing(*mths)
     mths.each do |m|
       singleton_class.send :alias_method, m, "traced_#{m}"
     end
   end
 end

"abc".add_tracing :reverse

If these methods were added to Module (with a slightly different implementation; see if you can get it working!), you could also add and remove tracing on classes instead of instances.

7. Use NilClass to implement the Introduce Null Object refactoring

In Fowler's Refactoring, the refactoring called Introduce Null Object is for situations where a variable can hold either an object or null, and when it's null a predefined default should be used instead. A typical example would be this:

 name = x.nil? ? "default name" : x.name

Now, the refactoring is based on Java, which is why it recommends creating a subclass of the object in question, which gets used where there would otherwise have been a null. For example, a NullPerson object will inherit Person, and override name to always return the "default name" string. But in Ruby we have open classes, which means you can do this:

 def nil.name; "default name"; end
 x # => nil
 name = x.name # => "default name"

8. Learn the different versions of eval

There are several versions of evaluation primitives in Ruby, and it's important to know the difference between them, and when to use which. The available contestants are eval, instance_eval, module_eval and class_eval. First, class_eval is an alias for module_eval. Second, there are some differences between eval and the others. Most importantly, eval only takes a string to evaluate, while the others can evaluate a block instead. That means that eval should be your absolute last resort. It has its uses, but mostly you can get away with just evaluating blocks with instance_eval and module_eval.

Eval will evaluate the string in the current environment or, if a binding is provided, in that binding's environment. (See tip #11.)

Instance_eval will evaluate the string or the block in the context of the receiver. Specifically, this means that self will be set to the receiver while evaluating.

Module_eval will evaluate the string or the block in the context of the module it is called on. This sees much use for defining new methods on modules or singleton classes. The main difference between instance_eval and module_eval lies in where the methods defined will be put. If you use String.instance_eval and do a def foo inside, this will be available as String.foo, but if you do the same thing with module_eval you'll get String.new.foo instead.
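The difference is easy to see in a small sketch, using a throwaway `Widget` class (made up for the example) instead of String, so nothing built-in gets modified:

```ruby
class Widget; end

# instance_eval: self is the Widget class object itself, so `def` creates a
# singleton method on it, i.e. a class method.
Widget.instance_eval do
  def built_with_instance_eval
    "class method"
  end
end

# module_eval (alias class_eval): the body behaves as if inside `class Widget`,
# so `def` creates an instance method.
Widget.module_eval do
  def built_with_module_eval
    "instance method"
  end
end

Widget.built_with_instance_eval    # => "class method"
Widget.new.built_with_module_eval  # => "instance method"
```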

Module_eval is almost always what you want. Avoid eval like the plague. Follow these simple rules and you'll be OK.

9. Introspect on instance variables

A trick that Rails uses to make instance variables from the controller available in the view is to introspect on an object's instance variables. This is a grave violation of encapsulation, of course, but can be really handy sometimes. It's easy to do with instance_variables, instance_variable_get and instance_variable_set. To copy all instance variables from one object to another, you could do it like this:

 from.instance_variables.each do |v|
   to.instance_variable_set v, from.instance_variable_get(v)
 end
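Here is a runnable version of that loop, with made-up `Controller` and `View` classes standing in for the Rails objects (a sketch of the idea, not Rails' actual code):

```ruby
# Hypothetical stand-ins for a controller action and a view object.
class Controller
  def index
    @title = "Blog"
    @posts = ["first", "second"]
  end
end

class View; end

from = Controller.new
from.index
to = View.new

# Copy every instance variable across, bypassing encapsulation entirely.
from.instance_variables.each do |v|
  to.instance_variable_set v, from.instance_variable_get(v)
end

to.instance_variable_get(:@title)  # => "Blog"
to.instance_variables              # => [:@title, :@posts] (symbols on Ruby 1.9+)
```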

10. Create Procs from blocks and send them around

Materializing a Proc, saving it in variables and sending it around makes many APIs very easy to use. This is one of the ways Markaby manages those CSS class definitions. As the pick-axe details, it's easy to turn a block into a Proc:

 def create_proc(&p); p; end
 create_proc do
   puts "hello"
 end       # => #<Proc ...>

Calling it is as easy:

 p.call(*args)

If you want to use the proc for defining methods, you should use lambda to create it, so return and break will behave the way you expect:

 p = lambda { puts "hoho"; return 1 }
 define_method(:a, &p)
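The difference in `return` behavior behind this advice can be shown with a small sketch (the method names are made up):

```ruby
# `return` inside a Proc.new block returns from the enclosing method...
def run_proc
  pr = Proc.new { return :returned_from_proc }
  pr.call
  :reached_end_of_run_proc
end

# ...while `return` inside a lambda only leaves the lambda itself.
def run_lambda
  l = lambda { return :returned_from_lambda }
  l.call
  :reached_end_of_run_lambda
end

run_proc    # => :returned_from_proc
run_lambda  # => :reached_end_of_run_lambda
```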

Remember that method_missing will provide a block if one is given:

 def method_missing(name, *args, &block)
   block.call(*args) if block_given?
 end

 thismethoddoesntexist("abc", "cde") do |*args|
   p args
 end  # => ["abc", "cde"]

11. Use binding to control your evaluations

If you do feel the need to really use eval, you should know that you can control what variables are available when doing this. Use the Kernel-method binding to get the Binding-object at the current point. An example:

 def get_b; binding; end
 foo = 13
 eval("puts foo", get_b) # => NameError: undefined local variable or method `foo' for main:Object

This technique is used in ERb and Rails, among others, to set which instance variables are available. As an example:

 class Holder
   def get_b; binding; end
 end

 h = Holder.new
 h.instance_variable_set "@foo", 25
 eval("@foo", h.get_b) # => 25
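ERB, from Ruby's standard library, accepts a Binding in `ERB#result`, which is what makes this pattern useful for templates. A minimal sketch, with a hypothetical `ViewContext` class playing the role of the view:

```ruby
require 'erb'

# Hypothetical view context: the template sees exactly what this object sees.
class ViewContext
  def initialize(title)
    @title = title
  end

  # Kernel#binding is private, so expose it through a public method.
  def get_binding
    binding
  end
end

template = ERB.new("<h1><%= @title %></h1>")
html = template.result(ViewContext.new("Blog").get_binding)
# html is now "<h1>Blog</h1>"
```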

Hopefully, some of these tips and techniques have clarified metaprogramming for you. I don't claim to be an expert on either Ruby or Metaprogramming. These are just my humble thoughts on the matter.

Tuesday, September 19, 2006

Announcing Ducktator - A Duck Type Validator

As I hinted in my last post, I feel with all my heart that there should be some way to actively validate my static expectations on certain kinds of objects. Now, respond_to? and friends are fine, but they do not scale. Not at all. So, I have built Ducktator - a duck type validator. It uses a very recursive, extensible rule syntax. Rules can be specified in either YAML or simple Ruby, with hashes and arrays and all that stuff.

First, though, where would this be useful? Not everywhere, of course, but these are the places that drop into my head while writing this: validating objects that have been serialized or marshalled; validating what you get when loading YAML files, so that the object graph matches what your code expects; writing test cases that expect a complicated object back. The possibilities are many.

Ducktator is very easy to extend. Basically, you just create a method on the validator whose name begins with "check_" and this will be automatically called for all objects. The base library is divided into modules that are mixed-in to the central Validator. I won't detail exact usage here, but just show an example. First, the rule file, which resides in rules.yml:


---
root:
  class: Hash
  each_key: {class: String}
  each_value:
    class: Array
    value:
    - - 0
      - class: Symbol
    - - 1
      - class: Integer
      - max: 256

Then, our code to create a Validator from this:


require 'ducktator'
v = Ducktator::from_file('rules.yml')

And lastly, to use it to validate the objects foo and bar:


foo = {'baz' => 13}
bar = {'b1' => [:try1, 130],
      'q16' => [:foobaz, 255]}
v.valid?(foo) # => false
v.valid?(foo,bar) # => false
v.valid?(bar) # => true

Whereto
Now, you'll certainly be wondering where to get this interesting code. As always, it will be found on RubyForge here, and the first release is available through gems, so just gem install ducktator and you should be set to go.

It is licensed with a nice MIT license and I am the project creator, maintainer et al.

YAML needs schema

It has been said before and it needs to be said again. YAML really needs schema. Now, before all your enterprisey warning bells start ringing I want to add that I'm only proposing this for specific applications. Most uses of YAML can continue gladly without any need for schema. But for some cases the security and validation capabilities of a good YAML schema would be invaluable. One example could be for RubyGems. It shouldn't be possible to crash RubyGems with bad YAML. Also, in all cases where Ruby emits objects as YAML it should be possible to automatically generate a schema specification from the object structure. This means that in many cases you may not need to create your schema by hand. You could just serialize your domain objects to YAML, take the schema generated and modify it as needed.

What would the advantages of YAML schema be? Numerous:

Validation: Validate that a YAML file conforms to your expectations before loading it
Default values: The possibility to provide default values for missing parts of the YAML, making convention over configuration even more powerful. With reasonable defaults most YAML documents could shrink dramatically in size.
Tool help: GUI builders and other tools would be able to help you construct your YAML-file from scratch. I like being able to auto-complete XML with nXML in Emacs. Very neat. I just wish I had that capability with yaml-mode too.
Loading hints and instructions: A schema could specify that the key named 'foo' always has a value with the tag !ruby/object:Gem::Specification or that all integer values should be decimal, regardless of leading zeroes. Many instructions that you at this point need to customize your YAML system to achieve would be automatic.
Remove clutter from YAML-file: If the schema defines the tags for values, it means that this information doesn't need to appear in the YAML file itself, reducing clutter and noise. This would make it even easier to edit YAML files by hand.
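The validation and default-value ideas above can be sketched in a few lines. This does not follow any existing schema format: the schema layout (`type`, `required`, `default`) and the `apply_schema` method are invented for illustration, and a real format would need much more (nesting, sequences, tags):

```ruby
require 'yaml'

# Hypothetical schema layout, invented for this sketch.
SCHEMA = YAML.safe_load(<<~YML)
  name:    {type: String, required: true}
  version: {type: String, required: true}
  summary: {type: String, default: "No summary given"}
YML

TYPES = { 'String' => String, 'Integer' => Integer }

# Returns [document_with_defaults_filled_in, list_of_error_messages].
def apply_schema(doc, schema)
  result = doc.dup
  errors = []
  schema.each do |key, rule|
    if !result.key?(key)
      if rule.key?('default')
        result[key] = rule['default']        # default value fills the gap
      elsif rule['required']
        errors << "missing required key '#{key}'"
      end
    elsif !result[key].is_a?(TYPES.fetch(rule['type']))
      errors << "'#{key}' should be a #{rule['type']}"
    end
  end
  [result, errors]
end

doc = YAML.safe_load("name: ducktator\nversion: 0.1.0\n")
filled, errors = apply_schema(doc, SCHEMA)
```

Here `errors` comes back empty and `filled['summary']` is the default string, so the document on disk can stay short.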

A YAML schema format should be specified in YAML, and it should be self-hosting (meaning its format language should be definable in itself). For most parts it seems we can use ideas from XML Schema. The only part I'm not really sure about for YAML schema is how to bind a document to a schema. Maybe the best way would just be to add a new directive that specifies the schema for that document. I don't believe that YAML needs different schema for different parts of documents right now, though. I don't think we need the proliferation of schema metadata inside the YAML document that XML experiences. (Anyone tried to manually work with a WSDL file which includes all the required namespaces and such? Nightmare!)

There are a few different parts needed for this to work. I believe it could be done with the current YAML spec (and retrofitted on YAML 1.0 too), since the only real change to the document would be a new directive in the stream header. The next step is for someone to start defining a format for schema. Then, a tool would be needed that could validate against schema. This wouldn't give us all the benefits of schema, but it's a start. The final step would be to integrate schema support in existing YAML libraries, to allow validation and using schema for metadata information.

Actually, this solves exactly half the problem, the part of the problem I call external validation. The other part is not YAML specific, and it's something I've been thinking about for Ruby. This regards validation of object hierarchies in the current language. Expect some more info on this in a day or two. I want to have something usable to release. But I believe Ducktator will be really useful for certain use cases.
