lördag, juni 09, 2007

What's wrong with this code?

Today I will introduce to you a method from ActiveRecord. The method takes a parameter called type and that value can bu for example :primary_key, :string or :integer. Now, in the first line there is a call to native_database_types. Generally, that call returns a structure that looks somewhat like this:
def native_database_types #:nodoc:
{
:primary_key => "int(11) DEFAULT NULL auto_increment PRIMARY KEY",
:string => { :name => "varchar", :limit => 255 },
:text => { :name => "text" },
:integer => { :name => "int", :limit => 11 },
:float => { :name => "float" },
:decimal => { :name => "decimal" },
:datetime => { :name => "datetime" },
:timestamp => { :name => "datetime" },
:time => { :name => "time" },
:date => { :name => "date" },
:binary => { :name => "blob" },
:boolean => { :name => "tinyint", :limit => 1 }
}
end

The method itself looks like this.
def type_to_sql(type, limit = nil, precision = nil, scale = nil) #:nodoc:
native = native_database_types[type]
column_type_sql = native.is_a?(Hash) ? native[:name] : native
if type == :decimal # ignore limit, use precison and scale
precision ||= native[:precision]
scale ||= native[:scale]
if precision
if scale
column_type_sql << "(#{precision},#{scale})"
else
column_type_sql << "(#{precision})"
end
else
raise ArgumentError, "Error adding decimal column: precision cannot be empty if scale if specified" if scale
end
column_type_sql
else
limit ||= native[:limit]
column_type_sql << "(#{limit})" if limit
column_type_sql
end
end

There is something very wrong with this implementation. Of course, there could exist many errors here, but what I'm thinking about right now is a violation of the usual way methods should work. And in effect, that problem with this method have caused ActiveRecord-JDBC to implement some very inefficient code to handle this method. And it gets called a lot in ActiveRecord. I'll get back later today with a pointer to what's wrong here, and I will also discuss some of what I've done in AR-JDBC to handle this situation. I hope for many suggestions here! =)

7 kommentarer:

Lars Westergren sa...

Wild guess from a Ruby noob:
The code modifies the hash entries by concating, which is not expected behaviour. It should "dup" first.

Anonym sa...

Hm... something to do with Strings?

The << modifies the Strings, so this might be inefficient, especially if the same type (+ same precision, limit and scale) is requested over and over. Some kind of memoization could help.

Other idea (from Lars' comment): the modification of the returned {:name => ... } Hash. I guess, the method navtive_database_types must recreate the Hash (+ nested Hashes) at every call. If the Hashes and their data (Strings) were immutable, then the same Hash could be returned over and over.

Hmm... I have this suspicion that I'm missing something very easy to spot...

Anonym sa...

It's one more example of Rails using a data structure where it should be using objects.

It's endemic throughout the Rails code base.

Anonym sa...

And now, a little less gnomically. The implementation of this could, and arguably should look like:

def type_to_sql(type, limit=nil, precision=nil, scale=nil)
 ColumnType.build(type, limit, precision, scale).to_sql
end


Where ColumnType.build would build a ColumnType of the appropriate subclass each of whose implementations of to_sql would be radically simpler than the current implementation of type_to_sql.

The fact that native_database_types also fails to cache its (invariant) result, thus leading to a whole mess of objects being needlessly created and then recycled is mere icing on this particular cake.

Assuming you don't go down the class extraction route, then, at the very least, you should replace the native_database_types method with (say)

NATIVE_DATABASE_TYPES = {...}.freeze

Mark M. Fredrickson sa...

I would agree with Piers as to how it _should_ work. I would like to see ActiveField become the norm for generating SQL - if for now other reason that it makes it easier to use types that are not in the list (e.g. geometry types in Postresql's PostGIS package).

I would take it step further - why not just pass in the class name and skip the unnessecary builder module.

Here's an IRB session to illustrate:

>> def builder(klass)
>> klass.new()
>> end
=> nil
>> a = builder(String)
=> ""
>> a.inspect
=> """"
>> a = String
=> String
>> b = builder(a)
=> ""
>> b.inspect
=> """"

Anonym sa...

Funny that you posted this, as I was just digging around in this code the other day at work.

We were looking for a way to specify a type for the default MySQL primary key that rails sets because we wanted unsigned integers for our primary keys. I was left scratching my head why that was done the way it was.

I ended up writing a Rails plugin that monkey-patches the native_database_types for the MySQL adapter, and the table method for ActiveRecord::SchemaDumper so that it correctly dumps the column types.

Unknown sa...

If you guys come up with a nice fix for this, please submit a patch.

The column metadata is a part of rails that could do with a cleanup, it seems this could trigger some motivation :).

All the best.