Wednesday, November 08, 2006

Nooks and Crannies of Ruby

There are many small parts of Ruby: tips, tricks and strange things. I thought that I would write about some of the more interesting of these, since some of them are common idioms in the Ruby community. The basis for the information is, as always, the Pickaxe, but how these things are used in real life comes from various places.

The splat operator

The asterisk is sometimes called the splat operator when not used for multiplication. It is used in two different places for opposite cases. When on the right hand side of an expression, it is used to convert an array into more than one right hand value. This makes splicing of lists very easy and nice to do.
a,b,c = *[1,3,2]
Second, it's used on the left hand side to collect more than one right hand value into an array:
*a = 1,3,2
It makes no difference whether you're calling a method or assigning variables. What matters, as usual with programming languages, is that there is a left hand side and a right hand side (lhs and rhs from now on):
def foo(a,*b)
  p b
end

foo 1,2,3,*[4,5,6]
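To make the two roles a bit more concrete, here is a minimal sketch. The method names first_and_rest and split_list are made up for illustration; the expected values in the comments assume a reasonably current Ruby.

```ruby
# Splat in a parameter list: collect any extra arguments into an array.
def first_and_rest(first, *rest)
  [first, rest]
end

# Splat on the lhs of an assignment: collect the leftover values.
def split_list(list)
  head, *tail = list
  [head, tail]
end

# Splat on the rhs (here in a call): spread an array into separate arguments.
extra = [4, 5, 6]
p first_and_rest(1, 2, 3, *extra)  # => [1, [2, 3, 4, 5, 6]]
p split_list([1, 3, 2])            # => [1, [3, 2]]
```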
This is all old news, and not very exciting. It's useful and the basis for some niceties, but nothing overwhelming. The really nice thing about the rhs version of the splat operator is what it does when the value it's applied to isn't an array. Basically, the interpreter first checks whether a to_ary method is available. If not, it goes for the to_a method. Kernel has a default to_a method, so all objects respond to to_a. Calling this method directly is deprecated, but calling it through splat or Kernel#Array doesn't generate a warning. So:
a = *1
will result in the same thing as
a = 1
except for jumping through some unnecessary hoops underneath the covers. But say that you have an object that implements Enumerable and you want to do something with it. Maybe you want to transform a Hash into an array of 2-element arrays; you can do it like this:
*a = *{:a=>1,:b=>2}
Now, this still isn't that useful. Oh, it's slightly useful but there is a method in Hash that does this too. But say that we have a file object:
*a = *open('/etc/passwd')
Since File includes Enumerable, it also has a to_a method which creates the array by using each to iterate and collect all elements. In this case all the lines in the file.
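Here is a small sketch of the same idea that doesn't depend on /etc/passwd. Countdown and splat_to_array are hypothetical names invented for this example, and a StringIO stands in for the file. It only shows the to_a route that Enumerable provides, as it behaves on current Ruby versions.

```ruby
require 'stringio'

# A hypothetical collection class: it only defines each, and Enumerable
# builds to_a on top of that.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  def each
    @from.downto(1) { |n| yield n }
  end
end

# Splat asks the object for its array form and spreads it into the literal.
def splat_to_array(obj)
  [*obj]
end

p splat_to_array(Countdown.new(3))            # => [3, 2, 1]
p splat_to_array({ :a => 1, :b => 2 })        # => [[:a, 1], [:b, 2]]
p splat_to_array(StringIO.new("one\ntwo\n"))  # => ["one\n", "two\n"]
```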
Camping uses the splat operator in many places, mostly with the common idiom of taking any arguments offered and passing them all on as separate arguments again:
def foo(*args)
  bar(*args)
end

Symbols and to_proc

I hesitate to use the word neat, but I can't really find anything that better describes the sweet, sweet combination of symbols and to_proc. I'm going to show you a small example of how it's used before I explain this very common practice:
[1e3,/(foo)/,"abc",:hoho].collect &:to_s
Now, this code will not run without a small addition to your code base. But first of all, let's just walk through the code. First we define a literal array that contains four elements of different type. One Float, one Regexp, a String and a Symbol. Then we call collect to make a new array out of this. But where we usually provide collect with a block, we instead see the ampersand that symbolizes that we want to turn a Proc-object into a block argument for a method. But what comes next is not a variable, but a symbol. So, what happens? Well, the ampersand checks if the value provided to it is a Proc, and if not it calls to_proc on the value in question, if such a method is defined. And how should this method look? Like this:
class Symbol
  def to_proc
    lambda { |o| o.send(self) }
  end
end
Now, this method is nothing much, but it employs some fun trickery. It first creates a Proc by calling Kernel#lambda with a literal block. This block takes one argument and calls the method send on that argument, passing self as the argument to send. As self in this case is a symbol, specifically the symbol :to_s in the above example, the end result is that the Proc returned will call to_s on each object yielded to the block. So, with this explanation it's easier to understand what the first example does. In effect it is exactly the same as
[1e3,/(foo)/,"abc",:hoho].collect {|v| v.to_s}
but without that nasty duplication of the v-argument. It's not a big saving, but many small savings...
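The ampersand trick isn't limited to symbols; any object with a to_proc method will do. Here is a minimal sketch, where Multiplier, double_all and stringify_all are hypothetical names. Note that later Ruby versions ship Symbol#to_proc in the core, so the second method runs there without any additions.

```ruby
# Any object that responds to to_proc can follow the ampersand.
class Multiplier
  def initialize(factor)
    @factor = factor
  end

  # The returned lambda closes over this instance, so @factor is visible.
  def to_proc
    lambda { |n| n * @factor }
  end
end

def double_all(numbers)
  numbers.collect(&Multiplier.new(2))
end

def stringify_all(values)
  values.collect(&:to_s)   # relies on Symbol#to_proc
end

p double_all([1, 2, 3])              # => [2, 4, 6]
p stringify_all([1, :hoho, "abc"])   # => ["1", "hoho", "abc"]
```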

I recommend installing facets, which includes numerous small, nice solutions like this. They can also be required separately, so if you have facets installed, just require 'facet/symbol/to_proc' to get this specific functionality included.

Using operators as method names

Ruby allows many more operators to be redefined than most languages. This makes some interesting tricks possible, but most importantly it can make your code radically more readable. An excellent example of this can be found in the net/ldap-library (available as ruby-net-ldap from RubyGems). Now, LDAP uses something called filters for searching, and the syntax for filters is basically prefix notation with ampersand, pipe and exclamation mark for and, or and not, respectively. Now, with the net/ldap-library you can define a combined filter like this:
include Net
f = (LDAP::Filter.eq(:cn,'*Ola*') & LDAP::Filter.eq(:mail,'*ologix*')) |
LDAP::Filter.eq(:uid,'olagus')
This defines a filter that basically says: find all entries where cn is '*Ola*' and mail is '*ologix*', or uid is 'olagus'. Thanks to the infix operators this is very readable, and anyone who knows LDAP will find it easy to understand.
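To show how little is needed for this kind of API, here is a sketch of a much simplified filter class. SimpleFilter and example_filter are hypothetical names; this is not how net/ldap is implemented, just one way the infix operators could be mapped onto the prefix notation.

```ruby
# A hypothetical filter class that renders LDAP-style prefix notation
# from infix Ruby operators.
class SimpleFilter
  def initialize(text)
    @text = text
  end

  def self.eq(attribute, value)
    new("(#{attribute}=#{value})")
  end

  # a & b becomes (&ab)
  def &(other)
    SimpleFilter.new("(&" + to_s + other.to_s + ")")
  end

  # a | b becomes (|ab)
  def |(other)
    SimpleFilter.new("(|" + to_s + other.to_s + ")")
  end

  def to_s
    @text
  end
end

def example_filter
  (SimpleFilter.eq(:cn, '*Ola*') & SimpleFilter.eq(:mail, '*ologix*')) |
    SimpleFilter.eq(:uid, 'olagus')
end

puts example_filter  # prints (|(&(cn=*Ola*)(mail=*ologix*))(uid=olagus))
```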

The next example comes from Hpricot, where _why puts the slash to good use:
doc = Hpricot(open("http://redhanded.hobix.com/index.html"))
(doc/"span.entryPermalink").set("class", "newLinks")
Note how neatly doc/"span..." fits in, and it looks like XQuery, or any other path query syntax. But it's just regular Ruby code and the slash is just a method call. I'm really sad that /. isn't allowed as a method in this way... =)

Now, according to the Pickaxe, all of these infix operators will be translated from arg1 op arg2 into arg1.op(arg2). But Ruby still needs to be able to parse everything. This means that most operators need to have one required argument. Trying this with a home-defined *-operator will not work:
x = a *
But, an experimental syntax for importing packages in JRuby actually used this effect:
import java.util.*
This is just a simple exploitation of the fact that * is a regular method name, and used like this it will be parsed by Ruby as one, which means it doesn't need an argument. So, which operators are available for your leisure? According to the Pickaxe, these are [], []=, **, !, ~, + (unary), - (unary), *, /, %, +, -, >>, <<, &, ^, |, <=, <, >, >=, <=>, ==, ===, !=, =~, !~.
Note that the method names when implementing the unary + and - are +@ and -@:
class String
  def -@
    swapcase
  end
end
The most important thing to remember when reusing operators like this is to not overdo it. Use it where it makes sense and is natural but not elsewhere. Remember that Ruby code should follow the principle of least surprise. The above example of using unary minus to return a swapcased version of the string is probably not obvious enough to warrant its use, for example.
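For contrast, here is a sketch of a case where unary minus probably does follow the principle of least surprise. Point is a hypothetical class made up for this example.

```ruby
# Negating a point is what most readers would expect unary minus to do.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Unary minus is defined under the method name -@.
  def -@
    Point.new(-x, -y)
  end

  def +(other)
    Point.new(x + other.x, y + other.y)
  end

  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

p((-Point.new(1, -2)) == Point.new(-1, 2))                   # => true
p((Point.new(1, 2) + (-Point.new(1, 2))) == Point.new(0, 0)) # => true
```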

Using lifecycle methods to simplify daily life

Inversion of control is all the rage in the Java world right now, but using callbacks of all kinds has always been a great way to make code readable and compact. The Observer pattern is used in many places, and I suspect it's implemented without any knowledge of the pattern in most places.

Ruby contains a few callback methods and lifecycle hooks that make life that much easier for the Ruby library writer. Probably the most useful of these is Module#included. Basically, this is a method you define like this:
module Enumerable
  def self.included(mod)
    puts "and now Enumerable has been used by #{mod.inspect}..."
  end
end
It will be called every time the module is included somewhere else.
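A sketch of the hook on a module of your own, rather than on Enumerable, could look like this. Auditable, Invoice and Receipt are hypothetical names invented for the example.

```ruby
# A hypothetical module that remembers every class that includes it.
module Auditable
  INCLUDERS = []

  # Called by Ruby with the including class or module as argument.
  def self.included(base)
    INCLUDERS << base
  end

  def audit_label
    "#{self.class.name} is auditable"
  end
end

class Invoice
  include Auditable
end

class Receipt
  include Auditable
end

p Auditable::INCLUDERS        # => [Invoice, Receipt]
puts Invoice.new.audit_label  # prints Invoice is auditable
```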

There are other callbacks that can be useful: Module#method_added, Module#method_removed, Module#method_undefined, and counterparts for singleton methods whose names are prefixed with singleton_. Class#inherited is interesting. Through this you can actually keep track of all direct subclasses of your class and with some metaprogramming trickery (basically writing a new inherited for each subclass that does the same thing) you can get hold of the complete tree of subclasses. If you want that for some reason. I would for example use this approach for Test::Unit, rather than iterating over ObjectSpace. But I guess that's a matter of taste.
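Here is one way the subclass tracking could be sketched. Shape, CHILDREN and descendants are hypothetical names. In this sketch the hook defined on the root class also fires for deeper subclasses, since class-level methods are inherited along with everything else, so one definition turns out to be enough here.

```ruby
# A hypothetical root class that records its whole subclass tree.
class Shape
  # Maps each class to the list of its direct subclasses.
  CHILDREN = Hash.new { |hash, parent| hash[parent] = [] }

  # Ruby calls this on the parent class each time it is subclassed;
  # self is the class being inherited from.
  def self.inherited(subclass)
    super
    CHILDREN[self] << subclass
  end

  # Collect every descendant of this class, depth first.
  def self.descendants
    CHILDREN.fetch(self, []).flat_map { |child| [child] + child.descendants }
  end
end

class Polygon < Shape; end
class Square < Polygon; end
class Circle < Shape; end

p Shape.descendants    # => [Polygon, Square, Circle]
p Polygon.descendants  # => [Square]
```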

Class variables versus Class instance variables

This is one thing that always trips people up. Including me. Class variables are special variables that are associated with a class. They are referenced with two at-signs and a name, like @@name. So far, it's simple. But classes are also instances of Class, which means that these instances can have regular one-at-sign instance variables. These are not the same thing. Not at all. Something like this:
class Foo
  @@borg = []
  @me = nil

  def initialize
    @me = self
    Foo::add_borg
  end

  def self.add_borg
    @@borg << @me
  end
end
will result in a @@borg-list filled with nils. This is because the first @me refers to an instance variable in the Foo instance of Class; not the @me instance variable associated with an instance of the Foo-class.

Condensed lesson: Classes have instance variables of their own. These are rarely useful; they usually contribute to hard-to-find errors. And don't confuse them with class variables, which are a totally different kind of beast.
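The difference is perhaps easiest to see with a subclass involved. In this sketch Tally and SubTally are hypothetical names; the class variable is shared by the whole hierarchy, while each class gets its own copy of the one-at-sign variable.

```ruby
class Tally
  @@shared = 0   # class variable: one per class hierarchy
  @own = 0       # instance variable of the Tally object itself

  def self.shared
    @@shared
  end

  def self.own
    @own || 0    # subclasses start out without their own @own
  end

  def self.bump
    @@shared += 1
    @own = own + 1
  end
end

class SubTally < Tally; end

Tally.bump
SubTally.bump
SubTally.bump

p Tally.shared     # => 3
p SubTally.shared  # => 3
p Tally.own        # => 1
p SubTally.own     # => 2
```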

Shortcuts: __FILE__ and ARGF

Ruby contains a myriad of shortcuts, many influenced by Perl and others invented to make it easier to write condensed programs. The regexp result globals are always good to have, but there are others that can be very useful too. The two that I like most are __FILE__ and ARGF. __FILE__ is also part of a very, very common idiom that the Pickaxe details. Combined with the global $0, it makes it easy to distinguish between a file being required and a file being executed. Basically, $0 contains the name of the file that has been executed. In C this would be argv[0]. __FILE__ is the full filename of the file the code can be found in. If these are the same, the current file is the one asked to execute. This is useful in many places. I use it often in gemspecs:
if $0 == __FILE__
  Gem::manage_gems
  Gem::Builder.new(spec).build
end
If I run the file above with gem build, this part will not execute, but if I execute the file directly, it will run.
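The same idiom works for any file that should be usable both as a library and as a script. A minimal sketch, where greeting is a hypothetical helper:

```ruby
# Other files can require this one and call greeting without
# triggering the script section at the bottom.
def greeting(name)
  "Hello, #{name}!"
end

# Runs only when this file is the program being executed,
# not when it is loaded with require.
if $0 == __FILE__
  puts greeting(ARGV.first || "world")
end
```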

Matz sometimes likes to show how to implement the UNIX utility cat in Ruby:
puts *ARGF
This combines tip number uno in this blog entry with the constant ARGF. ARGF is a nice special object that when you reference it will open all the files named in ARGV. If you have any options in your ARGV you'd better remove them before referencing ARGF, though. Basically what you get when referencing ARGF is a file handle to the files named on the command line. And since a File has Enumerable and thus to_a, splat will read all the lines in all the files and combine them into an array and then splay the array into the call to puts which will print each line. Here you are, cat!
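Wrapped in a method, the one-liner becomes easier to try out without touching the command line. This is only a sketch: cat and its two parameters are my own naming, and a StringIO stands in for the files.

```ruby
require 'stringio'

# Copy every line of input to output. The default input is ARGF, so
# calling cat with no arguments behaves like the one-liner.
def cat(input = ARGF, output = $stdout)
  output.puts(*input)   # splat turns the stream into an array of lines
end

sample = StringIO.new("first\nsecond\n")
result = StringIO.new
cat(sample, result)
p result.string  # => "first\nsecond\n"
```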

There are other globals and constants available, but most aren't as useful as the ones named above. For example, you can put __END__ on a line of its own, and the interpreter will stop reading code there; the rest of the file will be available through the constant DATA. I haven't seen anyone use this. It's a remnant from when Ruby was a tool to replace Perl and the other scripting tools in UNIX.

Everything is runtime

Basically, the whole difference in Ruby compared to compiled languages is that everything happens at runtime. Actually, this difference can be seen when looking at Lisp too. In Common Lisp there are three different times when code can be evaluated: at compile-time, load-time and eval-time. In Java class-structure is fixed. You can't change class structure based on compile parameters (oh boy, sometimes I miss C-style macros). But in Ruby, everything is runtime. Everything happens at that time (except for constants... this is a different story). This means that class definitions can be customized based on environment. A typical example is this:
class Foo
  include Tracing if $DEBUG
end
This class will include the Tracing module only when the -d flag is provided (which sets $DEBUG); otherwise the include is skipped. Basically there isn't much syntax in Ruby that couldn't be implemented in the language itself. A class declaration can be duplicated with
Foo = Class.new do
  # class declarations go here
end
And almost all parts of a method-definition with def can be provided with define_method. The glaring mismatch (blocks) will be corrected with 1.9. Except for that, it's just sugar. If statements could be implemented with duck typing/polymorphism:
class TrueClass
  def if(t,f)
    t.call
  end
end

class FalseClass
  def if(t,f)
    f.call if f
  end
end

x = true

x.if lambda{ puts "true" }, lambda{ puts "false"}
And that's the real Lisp heritage of Ruby. There really isn't any essential syntax. Everything can be implemented with the basics of receiver, message, arguments, and blocks. Just remember that. It's the basis for all useful metaprogramming. There is no compile-time. Everything can change. "There is no spoon".
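To tie the pieces of this section together, here is a sketch that builds a class without the class keyword, includes a module conditionally, and generates its methods in a loop. Tracing, TRACE_ENABLED and Robot are hypothetical names, and the block parameter with a default value assumes Ruby 1.9 or later.

```ruby
# A hypothetical tracing module.
module Tracing
  def trace(message)
    "[trace] #{message}"
  end
end

# Stands in for $DEBUG so the sketch behaves the same on every run.
TRACE_ENABLED = true

# Build a class without the class keyword; the block is the class body.
Robot = Class.new do
  include Tracing if TRACE_ENABLED

  # Generate one method per action at runtime instead of writing three defs.
  [:walk, :talk, :beep].each do |action|
    define_method(action) do |times = 1|
      ([action.to_s] * times).join(" ")
    end
  end
end

robot = Robot.new
puts robot.walk            # prints walk
puts robot.beep(3)         # prints beep beep beep
puts robot.trace("ready")  # prints [trace] ready
```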

16 comments:

Derek said...
This comment has been removed by the author.
Derek said...

In the last example, wouldn't it need to be:

def if(t,f = nil) ...

or even

def if(t,&f) ...

Then you could just pass the true closure.

Unknown said...

Hi Ola,

!= and !~ (in addition to ! of course) actually are not customizable; they're just artificial syntax converted into a not node surrounding the contained expression during parsing. See my findings during a discussion on the RSpec list for details.

Peter Cooper said...

Condensed lesson: Class have instance variables of themselves, these are rarely useful; they usually contribute to hard-to-find-errors. And don't confuse them with class variables which is a totally different kind of beast.

Actually, they're extremely useful if you want TRUE class variables rather than class hierarchy variables (until it all changes in Ruby 1.9/2, of course).. example:

class X
def self.x=(x); @@x = x; end
def self.x; @@x; end
end

class Y < X
end

X.x = 10
puts X.x # => 10
puts Y.x # => 10

If @@x was really a 'class variable', Y.x should be nil. If you use @x in the methods, it works on a per class rather than per hierarchy basis :)

Ola Bini said...

@Derek: True, f = nil would be better, to mimic a real if-statement.
I chose not to use a block param, since it's more consistent to use lambda for both.

@nick: of course that is true. But on the other hand you can customize !, == and =~ respectively, which comes down to being able to customize != and !~.

@coop: yes, that is true. They are very useful in those cases, but on the other hand you don't often need it.

Unknown said...

If you can really customize !, please show me how to do it.

def !@; nil; end doesn't work for me, I get a syntax error.

Ola Bini said...

@nick:

You can do it, but it doesn't seem to work:
class A
  module_eval do
    define_method "!" do
      puts "hello"
    end
  end
end

!(A.new)

The method is there, and you can call it by send, but it doesn't seem to execute in this case. =)

I call it a draw.

Anonymous said...

When I do:

=> a, b, c = [3, 2, 1]

it works without the splat.

Dave

Christoffer S. said...

Neat write-up!

One additional nice feature of ARGF is that it points at STDIN if ARGV is empty.

Pedro Côrte-Real said...

The "everything is a method call with a block" part is probably a SmallTalk inheritance more than a Lisp one.

Anonymous said...

NO NO NO NO NO NO NO

Please please do not let our lovely innocent little cute Ruby grow up to be a clone of that ugly cackling &*(&^*^%$^%$*^ Perl.

I am scared of the splat operator.

Evan Farrar said...

Perl? looks more lisp like to me:

head,*tail = list

Unknown said...

Thanks a lot for this post!

Lots of nice tips'n'tricks to help me increase my ruby-fu

Unknown said...

So I have a question... maybe you can help me out. So as you pointed out in your article, you can't currently use blocks in methods defined by define_method (or so it seems).

An example of this:
class Dog
  self.instance_eval do
    define_method(:bark) do
      if block_given?
        yield
      end
    end
  end
end

rover = Dog.new
rover.bark { puts "hello" } => nil

The yield in this case does not grab the block, which I believe is as you said.

However, it can be done with straight eval (not the most elegant solution, admittedly...) or one other way, although this way is essentially just cloning another method.

class Cat
  def meow
    if block_given?
      yield
    end
  end
  self.instance_eval do
    define_method(:purr, instance_method(:meow))
  end
end

lily = Cat.new
lily.purr { puts "meow!" }
In this case the block does get called.

Does this work just because purr is essentially a clone of meow? Also, you say they are planning to fix this in 1.9?

I'm not exactly sure what my question here was... but I'd just like to hear some more about this issue if you know anything.

Thank You

Anonymous said...

I am afraid that your examples on the splat operator do not make much sense to me, as in most of them the presence of the * has no visible effect compared to the same statements without it. I would advise that you devote a single post to the splat op and use proper irb snippets to support your claims about its behaviour.

And about your so-called customization of !, I call it a failure. Again, you would gain much credibility by substantiating your claims with proper code snippets+output.

My reading of your post brings me much FUD.

Anonymous said...

Hi, Ola,
I learnt quite a bit on Ruby Metaprogramming from your posts. I have a suggestion on this point:

Class#inherited is interesting. Through this you can actually keep track of all direct subclasses of your class and with some meta trickery (basically writing a new inherited for each subclass) you can get hold of the complete tree of subclasses..

Actually, I think that no code is necessary on any of the classes, with the exception of the top class, which would just include a module. The module defines included, which in turn defines inherited on the metaclass of the class including the module. Any subclass automatically triggers the inherited method, which keeps track of the hierarchy tree. I just tried the following (of course omitting the construction of the tree):

module TreeBuilder

def self.included(root)
root.class_eval do
class << self
def inherited(subclass)
puts "#{subclass} subclassed #{self}"
end; end; end; end

end

class Top; include TreeBuilder; end
class B < Top; end
class C < B; end
# etc