Monday, September 10, 2007

Should Ruby have optional typing and compiler directives?

I'm starting to miss a few Common Lisp forms more and more. To be specific, I miss the declare/declaim/proclaim family. I think a version of them could actually be extremely useful for Ruby. The current approach to String encodings in 1.9.1 uses a "pragma" in a comment on the first line of a Ruby file, like this:
# coding=utf-8
This works fine for encodings, I guess, but I would like to have something more general - something that can be used inside of code too. As a typical example of the kind of thing I would love to be able to do, here is a Common Lisp implementation of fib:
(defun fib (n)
  (cond
    ((or (eql n 0) (eql n 1)) n)
    (t (+
        (fib (- n 1))
        (fib (- n 2))))))
As you can see, it's the standard implementation. But, after this implementation is finished, I can continue by adding declarations:
(defun fib (n)
  (declare (integer n))
  (cond
    ((or (eql n 0) (eql n 1)) n)
    (t (+
        (fib (- n 1))
        (fib (- n 2))))))
This helps the compiler by telling it that the argument n is always an integer. You can also add more information to the declaration:
(defun fib (n)
  (declare (integer n)
           (optimize (safety 0)
                     (speed 3)))
  (cond
    ((or (eql n 0) (eql n 1)) n)
    (t (+
        (fib (- n 1))
        (fib (- n 2))))))
Now we're looking at directives for how the compiler should prioritize between safety and speed. You can also add directives for how much debug information should be retained, or whether the compiler should favor speed or size. All of this is interesting, but if I compile it with SBCL I'll get a few warnings. I've specified that I want a high level of speed and low safety, but there are many optimizations that SBCL still can't do, so it warns about them. In most cases that's because it has no way to tell whether the calls to fib return integers, floats or rationals. If fib is seen as a bottleneck, the way to fix this is to add declarations inline in the code, in this style: (the fixnum (- n 1)) and so on. This will allow the compiler to generate the best code possible.

Now, these kinds of changes are quite intrusive, and they aren't really something you want all over your code base. That's why you can add these declarations selectively, only in the places where it makes sense from a performance perspective. Done correctly, Common Lisp code can be compiled to be incredibly efficient and fast while staying very clean, since in most cases you don't add the type declarations at all.

This is also why Common Lisp is very good for rapid prototyping. You can get something working really quickly, but you can also change the code to be efficient later on with quite small changes.

I guess you all wonder what this has got to do with Ruby. Or maybe you don't. Anyway - I want this in Ruby, or something equivalent to it. Not necessarily exactly the same thing, but something that can, in a standard way, add type declarations and other things that are interesting to know from a compiler perspective. It needs to be a keyword with specific syntax so the information is available before runtime. It should probably be used for the encoding declarations too, so it should be possible to use it at top level, within class declarations, or within method definitions. What do I want to be able to do? Well, type declarations are the first thing. The second is compiler directives that can help improve code more than is possible right now. A typical example could look like this, maybe:
declare encoding: utf-8

def test_one(first, second, *rest)
  declare type: [Fixnum first, Enumerable second, Array rest]
  declare return: Fixnum
  declare compiler: [no-bindings, no-eval, no-closures]

  puts "hello"
  return 1+1
end
This code includes several things that would be very useful to know and would allow compilers like JRuby, XRuby, YARV, Rubinius and IronRuby to produce much better code. If we know that the arguments should always be of specific types, we can declare this and hopefully get better results. The return value is also usually of the same type, and declaring this can make code that uses this method more efficient. Finally, the compiler declarations will help with some of the current pain points in compiling Ruby efficiently. If no bindings, evals or closures are used, the generated code will be much better.
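To make the intended meaning concrete, here is a minimal sketch of what the type and return declarations would assert, emulated with runtime checks in plain Ruby. Everything in it is an assumption for illustration: `TypeDeclarations` and `declare_types` are hypothetical names, not an existing API, and `Integer` stands in for `Fixnum`, which later Ruby versions folded into `Integer`. Since these checks run on every call, they cost time rather than save it, which is exactly why the proposal asks for a keyword that is known before runtime.

```ruby
# Hypothetical helper, not part of Ruby: emulates the proposed type and
# return declarations with runtime checks.
module TypeDeclarations
  # params maps each leading positional parameter name to its expected type.
  def declare_types(method_name, params, returns:)
    original = instance_method(method_name)

    define_method(method_name) do |*args|
      params.zip(args).each do |(name, type), arg|
        unless arg.is_a?(type)
          raise TypeError, "#{name} should be #{type}, got #{arg.class}"
        end
      end

      result = original.bind(self).call(*args)
      unless result.is_a?(returns)
        raise TypeError, "return value should be #{returns}, got #{result.class}"
      end
      result
    end
  end
end

class Checked
  extend TypeDeclarations

  def test_one(first, second, *rest)
    first + second.count + rest.size
  end

  # Integer stands in for the Fixnum of the example above.
  declare_types(:test_one, { first: Integer, second: Enumerable }, returns: Integer)
end

Checked.new.test_one(1, [2, 3], :extra)  # passes every check
```

Calling `Checked.new.test_one("one", [2, 3])` raises a `TypeError` before the method body runs. The `*rest` parameter needs no check here because Ruby always collects it into an Array.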

This kind of information would be very useful not only for the compiler but also for tool support. If you know something should always be a specific type, you can add that as a declaration and things will automatically function better.

I know that this is kind of un-Ruby-like, but I really believe there is great value to be had by adding these kinds of declarations.

11 comments:

Anonymous said...

If I understand you correctly, this seems similar to the function annotations being introduced in Python 3 (but your Lisp example seems more powerful and geared towards the compiler).

I guess the "optionality" of features like these will add a lot of value. You can continue with your rapid prototyping, and then add typing and optimization in later iterations when you are getting ready for production. In fact, this would fit nicely with agile methods.

Anonymous said...

Just to share how other langs do this: In Dylan (which is also a Lisp relative), you can also optionally specify type information, e.g.

define function built-a-house (area, height :: <integer>, windows :: <sequence>) => (house :: <house>)
...
end define

This not only improves performance, but also enables multiple dispatch based on the given type.

Daniel Berger said...

At least give us some nice notation:

def test_one(Fixnum first, Enumerable second, *rest) => Fixnum
  puts "hello"
  return 1+1
end

compiler :test_one [no-bindings, no-eval, no-closures]

Keep the compiler directives outside the method if possible. Something like an annotation along the lines of public/private/protected.

Also, you shouldn't have to explicitly specify the type for "*rest". The asterisk tells you that it's an Array already.

Daniel Spiewak said...

It's an interesting idea. Should even be possible without screwing the language too much. The following fragment would provide the benefits of your example, without requiring new syntax:

declare return=Fixnum
declare type= :first => Fixnum, :second => Enumerable, :rest => Array
declare compiler= :no_bindings, :no_eval, :no_closures

Since declare is obviously just a method with named parameters, you could streamline the above into a single call if you really wanted to.
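A rough sketch of that idea in valid Ruby might look like the following. It is only one possible reading of the fragment: the `Declare` module, the `declarations` registry and the choice to place the calls before the `def` are all assumptions made for illustration, `returns` replaces `return` because `return` is a reserved word, and `Integer` stands in for `Fixnum`.

```ruby
# Hypothetical sketch: `declare` as an ordinary class-level method that
# records declarations for the next method defined in the class.
module Declare
  def declare(**info)
    (@pending_declarations ||= {}).merge!(info)
  end

  # Registry of recorded declarations, keyed by method name.
  def declarations
    @declarations ||= {}
  end

  # Ruby calls this hook whenever an instance method is defined.
  def method_added(name)
    super
    declarations[name] = @pending_declarations || {}
    @pending_declarations = nil
  end
end

class Annotated
  extend Declare

  declare returns: Integer
  declare types: { first: Integer, second: Enumerable, rest: Array }
  declare compiler: [:no_bindings, :no_eval, :no_closures]
  def test_one(first, second, *rest)
    puts "hello"
    1 + 1
  end
end

Annotated.declarations[:test_one][:compiler]
```

Because `declare` is a normal method call in this sketch, its arguments are evaluated when the class body runs, so a compiler could only rely on them ahead of time if it restricted itself to literal arguments.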

Ola Bini said...

@daniel berger: there was a very specific reason for not putting the type declarations inside the declaration of the arguments. the reason was that I wanted to make it possible to use exactly the same syntax for locals and instance variables. this is also why I put it inside of the method instead of outside. in my opinion, the state changing of private/protected/public is kinda gross, and not at all thread safe.

@daniel spiewak: I considered using "standard" Ruby syntax but decided against it - the reason being that declare would need to be a parse time keyword and should definitely not look anything like regular Ruby. that would invite using variables as arguments or other fallacies like that. I think it's best to keep it as a keyword, totally separate in functioning.

Charles Oliver Nutter said...

We could of course implement this in JRuby in a way that would be non-breaking if the same code ran in regular Ruby. For example, via additional syntax normally parsed as comments, or in a way similar to Rubinius's primitive VM instructions. It would potentially make it easier to implement parts of JRuby in Ruby without getting a perf hit from using heavily dynamic types.

Anonymous said...

I second Daniel Spiewak. Implementing this using plain Ruby syntax doesn't complicate the Ruby grammar, and a no-op function could be used in "non-compiling" Ruby implementations.

Anonymous said...

I don't understand why you want Ruby to be like Lisp, or Java or C or C++ or whatever. Surely, if you like Lisp you should use Lisp.

It's fine to chat about stuff and throw ideas out there, but let Ruby be Ruby. See where it goes on its own. Lisp remains niche, so we know where that went - i.e. not as successful (these days) as Ruby.

If anything, you should be recommending that Lisp gets rid of the features that have seemingly held it back from the big time.

Jonathan

Unknown said...

I'm probably revealing my ignorance here but anyway:

If you declare arguments as Fixnum/Array/etc. don't you jeopardize future use of the code that could rely on the dynamic nature of the language?
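The concern can be illustrated with a small sketch. The two methods below are hypothetical and use a runtime check to stand in for a declaration; the point is only that a check against a concrete class rejects objects that would otherwise have worked.

```ruby
require "set"

# Hypothetical strict version: the check stands in for a declaration of Array.
def total_strict(items)
  raise TypeError, "expected Array, got #{items.class}" unless items.is_a?(Array)
  items.inject(0) { |sum, item| sum + item }
end

# Duck-typed version: anything that responds to `inject` is accepted.
def total_duck(items)
  items.inject(0) { |sum, item| sum + item }
end

total_duck([1, 2, 3])        # an Array works
total_duck(Set.new([1, 2]))  # so does a Set
total_duck(1..4)             # and a Range
# total_strict(1..4) raises TypeError, although the method body would have worked.
```

Declaring a module such as `Enumerable` instead of a concrete class, as the post does for its `second` argument, keeps more of that flexibility, since Array, Set and Range all include it.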

Anonymous said...

Hi Ola,

We already have this in Ruby In Steel. We call them type assertions. They exist in comment blocks so don't affect the meaning of Ruby code. They are currently most useful for providing any IntelliSense which cannot be inferred (e.g. input args) but we plan to extend this to add additional features in future.

You can get an overview of Ruby In Steel type assertions by watching the short "ParameterInfo & Type Assertions" screencast on this page: http://www.sapphiresteel.com/Ruby-In-Steel-Movies

best wishes

Huw

Unknown said...

I don't know very much about Ruby but, as I understand it, this should be used only in very few cases, to improve the performance of heavily used code. So wouldn't it be interesting to move these optimizations out of the code, into a file such as optimizations.jrb with a format like this:

Person.testMethod.variable is Fixnum
Person.testMethod.variable2 is Date
Person.testMethod.options =[no-bindings, no-eval, no-closures]

This keeps the code compatible with the other xRuby implementations and gives you the same functionality. Obviously this solution is useless and error-prone if you use it massively...
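As a minimal sketch of how a tool might read such a side file, assuming the line format is exactly the one shown in the three example lines: `parse_declarations` and the `Class#method` keys are hypothetical choices, and type names are kept as plain strings rather than resolved to classes.

```ruby
# Hypothetical parser for the side-file format suggested above.
def parse_declarations(text)
  type_line    = /\A(\w+)\.(\w+)\.(\w+)\s+is\s+(\w+)\z/
  options_line = /\A(\w+)\.(\w+)\.options\s*=\s*\[(.*)\]\z/

  # One entry per "Class#method", created on first use.
  result = Hash.new { |hash, key| hash[key] = { types: {}, options: [] } }

  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if (match = options_line.match(line))
      klass, meth, options = match.captures
      result["#{klass}##{meth}"][:options] = options.split(",").map(&:strip)
    elsif (match = type_line.match(line))
      klass, meth, variable, type = match.captures
      result["#{klass}##{meth}"][:types][variable] = type
    else
      raise ArgumentError, "unrecognized declaration: #{line}"
    end
  end

  result
end

sample = <<~TEXT
  Person.testMethod.variable is Fixnum
  Person.testMethod.variable2 is Date
  Person.testMethod.options =[no-bindings, no-eval, no-closures]
TEXT

parse_declarations(sample)["Person#testMethod"]
```

Raising on unrecognized lines is a deliberate choice in this sketch: a silently ignored typo in a side file is the kind of error the comment warns about.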