Saturday, May 27, 2006

RbYAML version 0.1.0 released

Version 0.1.0 of RbYAML has now been released. Most of the interesting work on this was done on the flight from San Francisco and JavaOne to Stockholm. I guess I got tired of all the Java code. Anyhow, this is a major release, which improves almost all areas, with better testing, more functionality, Ruby-fied code, a new parser, and huge performance improvements.

I will take some time later this week to write more about the things I have done, implementation-wise.

Saturday, May 20, 2006

RbYAML version 0.0.2 released.

I have released version 0.0.2 of RbYAML. This is mostly fixes and convergence to the current PyYAML codebase, so nothing revolutionary. There are some things working now, that didn't before. I've also added some more automated tests.

The code can be downloaded here.

JavaOne, last day.

So. The last day of JavaOne is always a strange experience. Most people are too tired to stand straight after 3 really intense days of information gathering and people interactions. Personally, I was too tired to go to all sessions, but I managed the general session with Gosling and McNealy, the Mustang scripting session and the one about writing good APIs.

All three were worthwhile. Gosling showcased some really amazing toys, as usual. The Mustang scripting session was interesting, mostly so because it seems they've ripped some parts of the Rhino JavaScript engine out, for some reason.

The best session today was the one on writing good APIs, though. It had some really interesting advice and tips about API design. Basically you should apply the same rules as when you're doing UI design.

After this, I went to the JRuby meetup, where we sat around talking for a few hours, until I felt the need to go home and pack. JRuby is really on the go now, we have momentum and some really cool stuff almost finished. Stay tuned.

Friday, May 19, 2006

JavaOne, day 3.

So, the third day of JavaOne has also featured some interesting presentations. My blog today will not be a blow-for-blow description of these, but more a few interesting tidbits I noticed during the day.

I managed to talk to Gilad Bracha about how I thought his proposal for super packages looked very inspired by Common Lisp packages, and his response was that it was an interesting observation. He hadn't thought that way consciously until I pointed it out, so it was not designed that way, but he said that it was a good sign for the proposal that it looked like Common Lisp packages.

Actually, the first session was probably the most interesting from my perspective. This was Gilad's talk about supporting dynamically typed languages on the JVM. The first part talked about invokedynamic, which is fairly straightforward. The only new information I got about this area was that they're thinking about adding handlers for cases where the JVM can't discern a correct overloaded method to call for a dynamic invocation. In reality, this would more or less be a method_missing, available directly on the JVM, with all the performance characteristics you can get from the JIT. Nice stuff. The handler architecture could probably also be used to implement some variations of multiple inheritance and mixins, which are also hard to do efficiently on the JVM.

The second part of his talk was about hotswapping, which I didn't even know they're trying to get into the JVM. Basically hotswapping is what enables eval and replacing, adding and removing methods and types at runtime. This seems to be a very hard problem, but Gilad had some ideas, so it looks promising. It seems that JRuby may actually be able to run completely in JVM bytecode sometime in the future. Very cool.

After this I went to a session about simplifying enterprise development with scripting. This turned out to not match the title; it was basically another presentation on Groovy, and nothing much more.

The session on Compiler Optimizations was really interesting, and full of the kind of vocabulary that makes your head spin (but for different reasons if you're a compiler head or just a regular geek).

The Harmony session was really cool; they actually have a working (but slow) Swing implementation. The demonstration showed JEdit running inside Harmony, which is nice.

The security traps session was mostly basic material. Nothing new at all if you've been reading the books.

The last session for me today was about good ways to botch an enterprise application. This presentation was really great, one of the top 3 this JavaOne, and I'm definitely planning on going home to study the slides. (It was TS-5397 if anyone wasn't there.) Great stuff, really.

So, the rest of the evening will be After Dark Bash, and then out to make San Francisco unsafe.

Thursday, May 18, 2006

JavaOne, day 2, second part.

So, the second part of day two was composed of a few different BOFs. I won't bother to talk about them all separately, since there really wasn't that much information in them.

First of all I went to the Collections Connection, which is always fun. Josh still had most of the responsibility, even though he's officially at Google now. They talked about the new collections in Mustang, of which the Deque interface is the most important addition. Also, navigable collections have been added. This is more or less SortedSet and SortedMap done right, with navigation in all directions.
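
As a rough illustration of the two additions mentioned above, here is a small sketch that uses ArrayDeque and TreeSet from the JDK as stand-ins for the Deque and NavigableSet interfaces. The class name, method names and sample values are made up for this example and do not come from the session.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NavigableSet;
import java.util.TreeSet;

class CollectionsSketch {
    // Adds and removes elements at both ends of a Deque, then returns what is left.
    static int dequeDemo() {
        Deque<String> deque = new ArrayDeque<String>();
        deque.addFirst("b");
        deque.addFirst("a");   // "a" is now at the head
        deque.addLast("c");    // "c" is now at the tail
        deque.pollFirst();     // removes "a"
        deque.pollLast();      // removes "c"
        return deque.size();
    }

    // Builds a small sorted set to navigate in.
    static NavigableSet<Integer> sample() {
        NavigableSet<Integer> set = new TreeSet<Integer>();
        set.add(10);
        set.add(20);
        set.add(30);
        return set;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> set = sample();
        System.out.println(set.floor(25));        // greatest element <= 25
        System.out.println(set.higher(20));       // least element > 20
        System.out.println(set.descendingSet());  // reverse-order view of the same set
        System.out.println(dequeDemo());
    }
}
```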

My second BOF talked about identity management and federation. I really didn't get much out of this presentation. The presenter showcased a few standards that should be used, and some fairly complicated graphics showing how to interconnect these data transport protocols. Most of the stuff focused on SAML 2.0, XACML and ID-FF.

After that there was the BOF on Java Language and Compiler Issues, where they talked a little about the new compiler API in Mustang. The new packages javax.tools, javax.lang.model and com.sun.source seem really interesting and usable for doing neat stuff. Another cool thing they showed was something called the JavacViewer, which more or less gives access to most of the information that the different compiler types use internally. Parse trees, annotation processing, internal labeling; it's all there. Very cool.

Last, but not least at all, was the late night BOF called "A script for more powerful Java technology-based applications". It talked about how you can leverage different scripting technologies to add a different interface to your application in a few different ways: by providing plugin possibilities, as a way of adding new features quickly, and by offering macros to keep your power users happy. The presenter used different kinds of scripting to demonstrate these techniques. Some parts integrated BeanShell, and a big part of the demonstration talked about how to write your own domain specific language, and a parser and definition for this. As the session was late at night, and there were fairly few people attending, it tended to drift to different subjects depending on questions from the audience, but this didn't detract at all. It was mostly very interesting and one of the better sessions this JavaOne.

One of the best rationales for adopting scripting languages is your own needs as a developer. It makes sense to add scripting support so you can explore a huge code base and test out corner cases easily. (I know I constantly do this: start up JRuby or BeanShell inside Emacs, and test something there before using it in a real Java application.)

After this session, me, Pop, Bob Evans, Charles Nutter, Thomas Enebo (the JRuby guys) and a few others went to a pub, drank some beer and continued talking scripting, JRuby, Lisp and other cool stuff for some parts of the night. I've learned some very neat stuff, and we've talked some more about the future for implementing RubyGems in JRuby. It will be very soon.

JavaOne, day 2, first part.

So, this day I've been trying to keep my notes more close to the final result seen in this blog, with the result that I'll actually be able to post information even before the day is over. So, what I'm posting now is information from the beginning of the day, to the JRuby session that ended at 5pm.

Effective Java Reloaded
Effective Java has not been reloaded. Or not yet at least. But there is much material that can be used, and the session went through some great stuff. The presentation was divided into three parts: Object Creation, Generics and Other.

So, the object creation part had some great patterns. The first regarded static factories and how you can use factory methods to improve creation of generic instances. For example, take this horrible example of creating a HashMap:
Map<String, List<String>> m = new HashMap<String,List<String>>();
Instead, HashMap should have a factory method, and then you can do this:
Map<String, List<String>> m = HashMap.newInstance();
The recommendation is to always write your generic code like this.
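
HashMap has no such factory method in the JDK, so the idea can only be tried with a helper of your own. The following is a minimal sketch under that assumption; MapFactory and newHashMap are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapFactory {
    private MapFactory() {
        // Utility class, not meant to be instantiated.
    }

    // Generic static factory: the compiler infers K and V from the assignment target,
    // so the caller does not have to repeat the type arguments.
    static <K, V> HashMap<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    public static void main(String[] args) {
        Map<String, List<String>> m = MapFactory.newHashMap();
        System.out.println(m.isEmpty());
    }
}
```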

There are a few disadvantages that both static factories and constructors share. A big one is optional parameters. There are many ways of solving this, but none good. The pattern to fix this is to use a variation of the builder pattern.
You create a static Builder nested class, this builder constructor takes all required parameters and then provides setters for all optional parameters. It also exposes a build method that returns a created object. An example:
final NutritionFacts twoLdietCoke = new NutritionFacts.Builder("Diet Coke",240,8).sodium(1).build();
or even
final NutritionFacts twoLdietCoke = NutritionFacts.builder("Diet Coke",240,8).sodium(1).build();
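
The notes above only show the call site, so here is one way the class behind it might look. This is a sketch, not the code from the slides: the meaning of the three required arguments (taken here as name, serving size and number of servings) and all field and accessor names are assumptions.

```java
class NutritionFacts {
    private final String name;
    private final int servingSize;  // assumed meaning of the second argument
    private final int servings;     // assumed meaning of the third argument
    private final int sodium;       // optional, defaults to 0

    static class Builder {
        // Required parameters are set once, in the constructor.
        private final String name;
        private final int servingSize;
        private final int servings;
        // Optional parameters start out with default values.
        private int sodium = 0;

        Builder(String name, int servingSize, int servings) {
            this.name = name;
            this.servingSize = servingSize;
            this.servings = servings;
        }

        // Each optional setter returns the builder so calls can be chained.
        Builder sodium(int value) {
            this.sodium = value;
            return this;
        }

        NutritionFacts build() {
            return new NutritionFacts(this);
        }
    }

    // Static factory variant of the Builder constructor.
    static Builder builder(String name, int servingSize, int servings) {
        return new Builder(name, servingSize, servings);
    }

    private NutritionFacts(Builder b) {
        this.name = b.name;
        this.servingSize = b.servingSize;
        this.servings = b.servings;
        this.sodium = b.sodium;
    }

    String name() { return name; }
    int servingSize() { return servingSize; }
    int servings() { return servings; }
    int sodium() { return sodium; }

    public static void main(String[] args) {
        NutritionFacts coke = NutritionFacts.builder("Diet Coke", 240, 8).sodium(1).build();
        System.out.println(coke.name() + " sodium=" + coke.sodium());
    }
}
```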

This approach is really powerful. If we're lucky this interface may be added to the JDK in the future:

public interface Builder<T> {
    T build();
}

Then we could stop passing Class objects around, and use the typesafe Builder instead.
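
To make the comparison with Class objects concrete, here is a small sketch of a method that takes a Builder where older code might take a Class and call newInstance() on it. The interface is declared locally because it is not part of the JDK; BuilderSketch and createAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class BuilderSketch {
    // Same shape as the interface quoted above, declared here so the example compiles.
    interface Builder<T> {
        T build();
    }

    // Creates n instances without reflection, casts or checked exceptions.
    static <T> List<T> createAll(Builder<? extends T> builder, int n) {
        List<T> result = new ArrayList<T>();
        for (int i = 0; i < n; i++) {
            result.add(builder.build());
        }
        return result;
    }

    public static void main(String[] args) {
        Builder<StringBuilder> buffers = new Builder<StringBuilder>() {
            public StringBuilder build() {
                return new StringBuilder("x");
            }
        };
        List<StringBuilder> items = createAll(buffers, 3);
        System.out.println(items.size());
    }
}
```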

The generic part of the session had some interesting information that was new to me, at least.
The first recommendation was to never use raw types anymore. Those are only for legacy code. Raw types are really evil.
You should never ignore compiler warnings. They should be understood and eliminated if possible. If not they should be commented, and suppressed with the SuppressWarnings annotation if it can be proved safe.

Wildcards should be preferred to explicit type parameters. In many cases this makes method signatures clearer, and you don't have to manage a type variable. The exception to this is conjunctive types (which are really neat too).

Bounded wildcards are almost always better to use in your API, it will make it work for many more cases where people expect it to work.
The usual case when this is a problem is when you're using generics of generic types in your code. The reason this is a problem is that, for example, Collection<Integer> is NOT a subtype of Collection<Number>.
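
A short sketch of why bounded wildcards make an API accept more of what callers expect. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class WildcardSketch {
    // Reads from the collection: accepts Collection<Integer>, Collection<Double>, Collection<Number>, ...
    static double sum(Collection<? extends Number> values) {
        double total = 0;
        for (Number n : values) {
            total += n.doubleValue();
        }
        return total;
    }

    // Writes to the list: accepts List<Integer>, List<Number> and List<Object>.
    static void fill(List<? super Integer> target, int count) {
        for (int i = 1; i <= count; i++) {
            target.add(i);
        }
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        // Collection<Number> nums = ints;  // does not compile: not a subtype
        System.out.println(sum(ints));      // fine thanks to the wildcard

        List<Number> numbers = new ArrayList<Number>();
        fill(numbers, 3);
        System.out.println(sum(numbers));
    }
}
```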

Bounded wildcards should never be a return type. This forces clients to deal with wildcards explicitly. Only library designers should use wildcards.
Sometimes you actually need to do it, but it's very unlikely.

Generics and arrays don't mix very well, so use generics whenever you can.
Some people say avoid arrays altogether, but there are cases where arrays are both prettier and faster.

Finally, the presentation ended with a few various recommendations.

Use the @Override annotation. This avoids the common problem where you think you're overriding something but really aren't, for example equals or hashCode.
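
The equals case is easy to reproduce. In the sketch below (all names are made up), Point accidentally overloads equals, while SafePoint overrides it. Putting @Override on Point's method would turn the mistake into a compile error.

```java
class OverrideSketch {
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Takes a Point, not an Object, so this is an overload of equals, not an override.
        public boolean equals(Point other) {
            return other != null && x == other.x && y == other.y;
        }
    }

    static final class SafePoint {
        private final int x;
        private final int y;

        SafePoint(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof SafePoint)) {
                return false;
            }
            SafePoint p = (SafePoint) other;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    // Through Object references, Point falls back to Object.equals (identity).
    static boolean overloadedEquals() {
        Object a = new Point(1, 2);
        Object b = new Point(1, 2);
        return a.equals(b);
    }

    static boolean overriddenEquals() {
        Object a = new SafePoint(1, 2);
        Object b = new SafePoint(1, 2);
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(overloadedEquals());
        System.out.println(overriddenEquals());
    }
}
```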

Final should be used everywhere, except where there really is a reason to not do that. This minimizes mutability and is clearly thread-safe, which means you have one less thing to worry about. The only problem is readObject and clone, so take care with these.

A HashMap makes a fine sparse array, with generics and autoboxing.
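
A minimal sketch of that idea, assuming that missing cells should read as 0.0. The class name and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class SparseArraySketch {
    // Only the indexes that were actually set take up space.
    private final Map<Integer, Double> cells = new HashMap<Integer, Double>();

    void set(int index, double value) {
        cells.put(index, value);  // autoboxing turns int and double into Integer and Double
    }

    double get(int index) {
        Double value = cells.get(index);
        return value == null ? 0.0 : value;  // unset cells read as 0.0 (an assumption of this sketch)
    }

    int storedCount() {
        return cells.size();
    }

    public static void main(String[] args) {
        SparseArraySketch array = new SparseArraySketch();
        array.set(1000000, 2.5);
        System.out.println(array.get(1000000));
        System.out.println(array.get(7));
        System.out.println(array.storedCount());
    }
}
```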

The Serialization Proxy pattern is really neat.
Since serialization depends on implementation details you should take care with serialization.
The pattern solves this problem by having you create a new class representing the logical state of your object, and you just use writeReplace and readResolve to use this proxy to serialize your object in an implementation independent way.
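
Here is a sketch of how the pattern can be put together. TimeSpan and its fields are invented for this example; only writeReplace, readResolve and readObject are the real serialization hooks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TimeSpan implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long start;
    private final long end;

    TimeSpan(long start, long end) {
        if (start > end) {
            throw new IllegalArgumentException("start after end");
        }
        this.start = start;
        this.end = end;
    }

    long start() { return start; }
    long end() { return end; }

    // The proxy carries only the logical state of a TimeSpan.
    private static class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long start;
        private final long end;

        SerializationProxy(TimeSpan span) {
            this.start = span.start;
            this.end = span.end;
        }

        // On deserialization the proxy is swapped for a TimeSpan built by the normal constructor.
        private Object readResolve() {
            return new TimeSpan(start, end);
        }
    }

    // On serialization the proxy is written instead of this object.
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    // Rejects streams that try to deserialize a TimeSpan directly, bypassing the proxy.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    // Helper for trying it out: serializes to memory and reads the object back.
    static TimeSpan roundTrip(TimeSpan original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(original);
            out.close();
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return (TimeSpan) in.readObject();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TimeSpan copy = roundTrip(new TimeSpan(1, 5));
        System.out.println(copy.start() + ".." + copy.end());
    }
}
```

Because the copy is created through the constructor, the start/end check runs on deserialization as well.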

Java Puzzlers
There were some really intriguing things showcased here, and everything was Tiger-oriented. I didn't take any notes, since I had way too much fun. But I definitely recommend everyone to have a look at the presentation slides.

Super packages
Gilad Bracha had a small session about the new super packages proposed for Dolphin. As he constantly told us, none of this is really ready or finished. The JCP process will hash everything out later.

There are really two processes going on for modularity. One is for super packages, and regards the language changes necessary for this functionality. The other part is a module approach for packaging and distribution. The packaging has nothing to do with the language. The packaging only concerns tools and environment, more or less.

The problem with current packages concerns information hiding and encapsulation. It is really hard to do this in a good way in current Java. A few solutions have been proposed for this that are easier than a real language change.
* Don't document unexposed API
* Using static classes to provide access control to different classes
* Make a small language change that makes packages nested
The conclusion is that these don't suffice. They are not good enough, and very hackish solutions.

A real solution will solve the packaging problem, provide encapsulation and also allow separate compilation. All this will use separate module files for changing the semantics of a program, but still having the default way for modularity to look like current Java, for providing backwards compatibility. This is also the reason annotations won't be used for this, since it would change runtime semantics of a program, which annotations should not do.

To my eyes, the syntax and semantics Gilad showed us reminds me very much of Common Lisp packages.

Spring WebFlow
Classical web packages use free navigation, stateless systems. This is not always perfect. Some business scenarios are better represented with a controlled flow of actions. Traditionally this hasn't been the focus of Web tools. Instead, most of the current frameworks focus on providing easy to use solutions for the base case of free simple navigation. There are a few reasons for this, but the simplest reason is that controlled flow is really hard to get right.

In my opinion, WebFlow is a perfect example of how you should not solve this problem. It has the right ideas, but doesn't go far enough.

The idea in Spring WebFlow is basically to describe states and state progressions either declaratively with XML, or programmatically in code. When you've done this, WebFlow takes care of most boring stuff, like state and back buttons. It's really about inverting control to the controller, instead of having the client provide parameters that the web server uses to find out where in the flow they are.

This approach is a really good solution to the problem, but it doesn't go far enough, if you ask me. When I see executable XML I always get scared, and this case is no exception. Spring WebFlow seems to be more or less a (very) poor man's continuation server. Since you can actually have real continuation servers in Java, using an embedded script language like JavaScript or Ruby, this approach isn't good enough for me.

Groovy is like Java with some Python, Ruby and Smalltalk. It's object oriented and completely Java compatible. It has iterators, code blocks (closures), and many, many DWIM hacks.

Since the JVM is standardized and more general than Java, it can be used to innovate at the source code level. There are many scripting languages for Java.
Scripting seems to be a good way to glue business code together, since you really don't have that much business code in reality. There also is a drive to test code with dynamic languages. So scripting is just great glue; it works like programmer duct tape.

The reason for Groovy is to have something Java developers will instantly recognize: complete binary compatibility with Java, and the possibility of using Java without wrappers and cumbersome APIs.

Groovy is basically dynamic, but also supports static typing. There is native support for lists, maps, arrays and beans.
Regexps are also part of the language. There exists some operator overloading, but nothing really lethal. Groovy also adds lots of convenience methods to the JDK, for example lots of new String-methods.

It has BSF support.

There seemed to me to exist some really hairy magic, which means that it's very hard to know exactly what's going on under the covers. A typical example was the actionPerformed parameter to some of the swing builders, which found an ActionListener interface and found the method inside that, and implemented this interface with a closure added, via 3 or 4 levels of indirection.

In conclusion, Groovy looks good on the surface, but beneath, it feels very much like Perl (in the negative sense).

JRuby showcased many fun things; the best one was JRuby on Rails actually running. Is that cool or what? Apart from that, the talk was mostly aimed at people new to Ruby and JRuby. It was interesting to see that most of the people at the session hadn't heard about either Ruby or Rails one year ago! Major impact or what?

That was my day, so far. I'm off to the Java Certified Professional party! Part two comes later. Maybe much later, depending on how many free drinks there are at the party.

Wednesday, May 17, 2006

JavaOne, day 1: some diverse impressions.

So, JavaOne is finally here, and it starts big! This blog will talk some about the different sessions I've been to during the day, but first a few impressions. There are very many people here. John Gage said at the first general session that this is the largest JavaOne ever, and I believe him. I'm sad to say that the WiFi network is very spotty at best.

JavaOne has a new way of getting into sessions; you have to use their schedule builder to reserve places for a session, and then use your RFID chip to register when entering a specific session. I thought this was a mad idea, but it actually works really well, and I'm glad to say that the JavaOne team seems to have solved most of the overflow troubles from last year.

Another small reflection is that compatibility in the core platform seems to be a major focus this year. I've seen and heard the word more times than I can count.

General session
The first general session is always one of the more interesting times during JavaOne. Both Sun and other company leaders are just brimming with announcements. One of the more interesting quotes of the day came fairly early, from Rich Green: "It's not a question of whether, it's a question of how". As you may guess, this was an answer to the question of open sourcing Java. In plain language the situation is that Java will be open source, as soon as Sun finds a good way to do it.

Java Enterprise Edition 5 is now finished, released and of production quality, and these are some highlights from the release:
It's very focused on ease of development.
Contains much Web 2.0 support.
Interoperability with .NET has been greatly enhanced. (Apart from compatibility, interoperability is the major "-ility" today.)
SOA has been simplified.
They have simplified the programming model, mostly by using annotations.
New EJB version, using plain old Java objects.
A new (annotation-based) persistence API.

During the general session, many, many libraries and applications were open sourced, among these JMS. It's nice to see that Sun really wants open source to work.

Session on EJB 3.0
The idea for EJB 3 was to make it easier for the developer, by making the container harder to implement. This tradeoff seems reasonable in retrospect, but until now most of the EJB specification made it easy for the container.

The new model is based on letting the container provide requested services to the bean, and also using reasonable defaults for most operations.

Once again, the presentation pressed very hard on compatibility. Existing applications had to continue working. Clients using the new API should have no trouble connecting to servers using the old libraries and the other way around.

Most of EJB 3.0 is based on POJOs and declarations with either annotations or XML information.

Environment access is made easier with dependency injection or simple lookup.
Client code also uses dependency injection, which means the new model removes the need for home interfaces, PortableRemoteObject.narrow, RemoteException and other checked exceptions.

The EJB lifecycle doesn't use explicit callbacks anymore. Instead you can annotate the method that should get notifications about a lifecycle change and this will be taken care of by the container. You can also allow a separate interceptor as the notification and callback manager. Very neat.

All in all, EJB seems to be heading the right way, at last. Ease of development really matters, and configuration by reasonable defaults has already been shown to be a viable solution.

Technical general session
The technical session had some interesting information too. Most of the talk was about Mustang (Java SE 6) and Dolphin (Java SE 7), and what we could expect from these releases.

The projections for Mustang look good, and it is scheduled for release in October. They have been using a new, very open process for Mustang development, which has worked extremely well. The same system will most likely be used even more in Dolphin development.

So, some nuggets of good stuff in Mustang:
Many performance improvements. They showed some pretty convincing performance graphs, and I was duly impressed.
They've fixed the so-called gray rect problem in Swing, which dramatically improves perceived performance.
There are many improvements in the monitoring and management areas.
Scripting support will come, with JSR 223. Also, check out http://scripting.dev.java.net.
There are many desktop fixes, and Vista is the desktop focus for Mustang.

The talk went on to the future for Standard Edition:
Probably direct support for XML in the language.
Super packages, and new module system for packaging with versioning.
They're thinking about adding BeanShell to the scripting languages provided by the core language.
And more, more, more desktop stuff.

Hamilton went on to talk about scripting- and dynamic languages, and more or less recommended using them in the Web tier and in situations where a fast cycle of development is required. This layer can then use Java for the business logic. (And this is really what KI is heading for right now; using Ruby on Rails for web, and using SOAP to get at Java business logic exposed with Web Services.)

Another thing that's coming is JSR 292: the new bytecode for dynamic invocation.

There was also an interesting demo of a Visual Basic to Java compiler. The system is not a total clone, but will enable people used to Visual Basic to program for the JVM instead. It will not enable translation of existing applications, though.

Mustang and Dolphin session
This session started out with Mark commanding us to upgrade to Tiger now, there is really no reason not to do it.

So, what's new in Mustang?
Some class file changes.
A new Compiler API.
New annotation processors.
Scripting support.
Streaming API for XML.
Common Annotations.
and JAX-WS 2.0.

Mark's top ten list of new things in Mustang:
10. Attach-on-demand monitoring.
9. Plugin API for JConsole.
8. jhat OQL (a query language to explore heap dumps).
7. Solaris DTrace.
6. javac will now do annotation processing interleaved with compilation.
5. Classpath wildcards!
4. API for finding free disk space.
3. API for password prompting.
2. New GroupLayout for Swing.
1. That JAX-WS can do RESTful web services.
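
Two of the items on the list, the disk space API and password prompting, are small enough to try directly. The sketch below uses File.getUsableSpace() and Console.readPassword(), which are the Java SE 6 calls I take these items to refer to. The class and method names are made up.

```java
import java.io.Console;
import java.io.File;
import java.util.Arrays;

class MustangApiSketch {
    // Number of bytes available to this JVM on the partition that holds the given path.
    static long usableBytes(String path) {
        return new File(path).getUsableSpace();
    }

    public static void main(String[] args) {
        System.out.println("Usable bytes here: " + usableBytes("."));

        // System.console() can be null, for example when output is redirected.
        Console console = System.console();
        if (console != null) {
            char[] password = console.readPassword("Password: ");  // typed characters are not echoed
            if (password != null) {
                System.out.println("Read " + password.length + " characters");
                Arrays.fill(password, '\0');  // wipe the password from memory once it has been used
            }
        }
    }
}
```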

Mustang will also bundle Apache Derby, a small embedded database.

There are many ideas for what Dolphin will contain. There are some interesting things that can be really cool.
These are mostly core language changes:
Properties, improving getters and setters.
Real method references.
Block closures.
Native XML support.
The new bytecode for dynamic invocation.
Bundling BeanShell.
And beans binding for making Swing easier to use.

The session also had some very fun information about how the testing of the JDK is done. It's really quite amazing. The big trouble with creating new versions is the disconcerting fact that running a full test cycle takes 10 weeks.

Session on the new Concurrency features in Java 5
The session began with some talk about the rationale behind the new concurrency features in Java. The easy answer is that real concurrency is hard to do right, and the builtin primitives for threads and locks are, well, primitive.

The new concurrency packages have something for everyone. There are both easy-to-use utilities for almost anyone, and also some primitives for hard core programmers, that enable some things that just couldn't be done in Java before.

Java has always had thread-safe collections, but these are not concurrent. They also performed badly under contention. Therefore a few new concurrent collections have been added. Most of them allow unlimited reads and up to 16 simultaneous writes. Mostly, the semantics are the same, but they differ in a few areas. The most glaring difference is iterators. With the old collections you got a ConcurrentModificationException if someone updated the collection while you were iterating over it. This will not happen anymore. Tiger has a new Map called ConcurrentHashMap and Mustang adds a ConcurrentSkipListMap.
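
The iterator difference can be seen even in a single thread. In this sketch (names are made up) the same loop adds a key while iterating: a synchronized HashMap wrapper fails fast, while ConcurrentHashMap carries on.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentIterationSketch {
    // Returns true if adding a key during iteration makes the iterator throw.
    static boolean failsFast(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put("extra", 0);  // the first pass adds a new key while the iterator is live
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast(Collections.synchronizedMap(new HashMap<String, Integer>())));
        System.out.println(failsFast(new ConcurrentHashMap<String, Integer>()));
    }
}
```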

A new collection interface has been added to java.util, which is called Queue. This is a subset of the List functionality, that can be used for implementing high performance versions with just this restricted functionality.
The most interesting Queue is the BlockingQueue, which makes a producer-consumer relationship explicit. Almost all code already uses something like it, but this implementation is industrial strength and very easy to use.
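
A minimal producer-consumer sketch on top of ArrayBlockingQueue. The sentinel value used to signal the end of the stream is a convention of this example, not something BlockingQueue provides, and the class and method names are made up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ProducerConsumerSketch {
    private static final int DONE = -1;  // sentinel that tells the consumer to stop

    // A producer thread puts 1..items on the queue; the calling thread consumes and sums them.
    static int sumViaQueue(final int items) {
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(2);
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 1; i <= items; i++) {
                        queue.put(i);  // blocks while the queue is full
                    }
                    queue.put(DONE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        int sum = 0;
        try {
            // take() blocks while the queue is empty.
            for (int value = queue.take(); value != DONE; value = queue.take()) {
                sum += value;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumViaQueue(10));
    }
}
```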

They have added thread pools with a system of Executors and ExecutorServices. They are very easy to use and configure with factory methods.

Another interesting addition is Future and Callable, which are used to put an execution inside another thread and then get the value when it's finished. But it is not call-by-need, which was what I first thought. The value will always be calculated, even if it's not needed.
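
A small sketch combining a thread pool from the Executors factory with a Callable and a Future. FutureSketch and compute are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureSketch {
    static int compute() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // submit() hands the task to a pool thread right away;
            // it runs whether or not anyone ever calls get().
            Future<Integer> answer = pool.submit(new Callable<Integer>() {
                public Integer call() {
                    return 6 * 7;
                }
            });
            return answer.get();  // blocks until the result is available
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();  // lets the pool threads exit once the task is done
        }
    }

    public static void main(String[] args) {
        System.out.println(compute());
    }
}
```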

Scheduling primitives have been added, to replace Timer and TimerTask.

There are also some new advanced features for locking. There are some things you just can't do with the synchronized keyword in Java. Hand-over-hand locking is one example. The concurrency library adds Locks, Conditions and Semaphores for more advanced use. They are very complicated so they shouldn't be used if you don't really need them. One good reason for this is that you have to release locks manually.
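
The manual release is usually handled with try/finally. A minimal sketch with ReentrantLock follows; the counter class and its methods are invented for this example.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockSketch {
    private final Lock lock = new ReentrantLock();
    private int count;

    void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();  // released even if the body throws
        }
    }

    int count() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    // Starts the given number of threads, each incrementing a shared counter perThread times.
    static int runThreads(int threads, final int perThread) {
        final LockSketch counter = new LockSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < perThread; j++) {
                        counter.increment();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.count();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 1000));
    }
}
```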

JUnit next generation
JUnit has many warts and problems. How do you run one test in a test case, for example? It's not really good for anything but unit testing. There have been very few updates and the protocol is intrusive. It also uses a very static programming model, and it doesn't use the latest Java features.

TestNG is the new JUnit. It uses annotations for most configuration. There are test groups which can be dynamic depending on your needs.
You also have the possibility to have dependent tests, parallel testing, load testing and partial failures. It has also got a very nice plugin API.

All in all, it looks really good. My personal opinion is that there is no reason not to use TestNG instead of JUnit on all Java 5 projects.

Restructuring a web application with Hibernate and Spring
This was the first BOF for me this JavaOne and I was very disappointed. The presenters' use case was a web application that had been badly written from the beginning. They then decided to rewrite it with Hibernate and Spring, but there really didn't seem to be much better code written for this application. I guess it's a testament to how good Spring and Hibernate are, that they got a really good performance improvement anyway.

Testing a persistence layer
Testing a persistence layer is really hard, for a few different reasons. You want it to be fast, and easy to write. There are many different strategies to testing persistence, and the presentation talked about most of them.

There are a few different kinds of persistence layers. They can be SQL-based, Object/Relational-based or using the ActiveRecord pattern (which the presenter viewed as a special case of SQL-based persistence).

As noted above, there are many strategies available.
You can mock the DAOs for testing the business logic. This is easy if you use a dependency injection framework like Spring to initialize the DAO objects. Otherwise this is the main problem. It's very fast but the scaffolding can get hairy to write.

You can also mock the ORM framework, to test the DAOs. The presenter has written a utility called ORMUnit to facilitate this testing.
Regarding mock frameworks, the presenter prefers JMock, but is planning on migrating to EasyMock instead.

Another strategy is to test the metadata mapping and schema. This checks for stupid errors like forgetting to map a field. It can also check that all referenced tables and columns actually exist.

Of course, the standard way of testing persistence, by doing CRUD operations is also available, but it's complex, and very slow. You have to write much code to drop and add data for each test.

Another simple way is to check the generated queries directly, to see that they return correct data.

Using an in-process database can be very fast, but may also lead to trouble with incomplete SQL implementations.

A final strategy for testing things slightly faster is doing operations but never committing transactions. This has the advantage of being fast and also avoids actually changing the DB.

All in all, it was a busy first day. Much of it was very interesting, and I've learned many new things. I just regret missing the Scripting Languages BOF at the end of the night, but I was just too beat to manage going to it.

Tuesday, May 16, 2006

JavaOne, day 0: The Fireside Chat

The Fireside Chat is usually an interesting conversation between JavaOne alumni and some of the Java founders. This year it was James Gosling, Graham Hamilton, Jeff Jackson and a few others. These are more disjointed notes on what I am interested in than a running commentary on everything that was said.

The format for this chat is that someone in the audience asks one or more questions, and the panel tries to answer as much as possible. Nothing advanced.

There was a question about download time for Java applets: are there any plans for a Java Web Edition? Short answer: no.

Performance, specifically the startup time of the Java engine; Sun have tried, but right now it seems hard to get it any faster without removing significant functionality.

I gathered from some comments that one of Sun's primary priorities right now is good interoperability with Microsoft and .NET.

Regarding deprecated APIs, it seems likely they will never disappear, at least not until there is enough good tool support to actually remove and refactor all dependencies on such code.

AJAX: Sun is handling this in a few ways, with toolkits for generating JavaScript with servlets and stuff like that. JSF should create AJAX-aware components without us having to care about AJAX.

Then the panel was asked what one thing they'd like to remove from Java if they could. Gosling's immediate answer was java.awt. He got off on a tangent and talked a little about how the basic feature set of Java was decided, and that he only put in stuff that people really, really needed. The result of this is that he doesn't regret anything with the language. He seemed particularly glad that he didn't try to put in generics or enumerations from the beginning, because he would probably have failed doing it "right".

If he could add something to the language, that would be really lightweight objects. Some kind of structs, for implementing the canonical example of an object oriented number class, for example. Right now the most lightweight classes are still way to heavy. Also, Hamilton would like to improve getters and setters, but he doesn't know how yet.

There seems to be some interesting improvements to JavaBeans in Dolphin.

Hamilton also said that he would like to undeprecate java.util.Date, since it is very much more usable than java.util.Calendar most of the time.

Sun is also working on really good refactoring tools. Gosling has a vision of "lint on steroids". They have a few good prototypes, but nothing close to being released yet.

Another point in planning is better performance for JNI: making it possible to remove checks and stuff like that. It seemed that arrays were the big problem. You can already get quite good speed by avoiding arrays and using NIO as much as possible.

These were my opinions on the more interesting stuff from the Fireside Chat.

Translating Python to Ruby (the YAML case study)

I am not an experienced Python programmer. When I decided to port PyYAML3000 to Ruby I wasn't sure exactly what kind of problems I would run into. What I found out while trying to get more or less the same semantics in Ruby as in Python was actually quite interesting, and I have the notion that some of it may be of interest to others. At least I will get my thoughts on the translation straight.

The first phase of RbYAML was supposed to be a straightforward port of PyYAML from Python to Ruby. This ideal was of course impossible since some things really didn't work the same way in the two languages, but mostly the semantics translated cleanly. The biggest changes in architecture between the implementations were the Unicode support - which I had to remove for now - and the inheritance structure. PyYAML uses Python's multiple inheritance to combine pieces of functionality. Since Ruby doesn't have MI I implemented this using mixins instead. This is probably the first thing that I will change for phase two of RbYAML, since mixins (and modules) just don't work the same way as inheritance. Right now I'm thinking that I'll probably use the Composite pattern, where Loader and Dumper export the interface that's available, and all other operations will have to be done explicitly with the separate components.
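To make the difference concrete, here is a minimal sketch of the two designs. All names (ScannerMixin, ParserMixin, MixinLoader, Scanner, Parser, ComposedLoader) and the toy "parsing" are invented for illustration; this is not RbYAML's actual code.

```ruby
# Hypothetical names throughout; this is not RbYAML's real code.

# Mixin style: every module's methods end up in one shared namespace.
module ScannerMixin
  def scan(text)
    text.split
  end
end

module ParserMixin
  def parse(text)
    scan(text).map(&:upcase) # silently depends on whoever provides #scan
  end
end

class MixinLoader
  include ScannerMixin
  include ParserMixin
end

# Composition style: the loader owns separate components and
# exports only the interface it wants to offer.
class Scanner
  def scan(text)
    text.split
  end
end

class Parser
  def initialize(scanner)
    @scanner = scanner
  end

  def parse(text)
    @scanner.scan(text).map(&:upcase)
  end
end

class ComposedLoader
  def initialize
    @parser = Parser.new(Scanner.new)
  end

  def load(text)
    @parser.parse(text)
  end
end

p MixinLoader.new.parse("a b")   # #scan is also public on this object
p ComposedLoader.new.load("a b") # only #load is exposed
```

With the mixin version every helper becomes part of the loader's public surface, while the composed version keeps the components behind one explicit entry point.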

The next thing that caused some trouble was Python's notion of truth and falsehood. In Ruby, only the values nil and false are considered false; everything else is true. In Python a whole range of values are considered false: False, None, 0, the empty string, the empty list and the empty dict. A Python condition that relies on 0 or an empty collection being false therefore has to be spelled out explicitly in Ruby.
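A small sketch of the difference, assuming a helper I've called python_falsy? (a made-up name, not part of RbYAML) that only covers the values listed above:

```ruby
# Hypothetical helper: true when Python would treat the value as false.
# Only handles the cases listed in the text.
def python_falsy?(value)
  case value
  when nil, false then true
  when Numeric then value.zero?
  when String, Array, Hash then value.empty?
  else false
  end
end

# In plain Ruby every one of these except nil and false is truthy.
[nil, false, 0, "", [], {}, 1, "x"].each do |v|
  puts "#{v.inspect}: ruby_truthy=#{v ? true : false} python_falsy=#{python_falsy?(v)}"
end
```

When porting, a Python test like "if not tokens" usually becomes an explicit tokens.empty? check in Ruby rather than a call to a general helper like this one.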

Another big stumbling block for implementing the same architecture in Ruby as in Python was the heavy use of coroutines in a few parts of PyYAML. Right now I implemented this by just doing a really inefficient version that collects all nodes before sending them away. I could of course have implemented a coroutine engine with call/cc, but since JRuby doesn't support continuations yet, this would have defeated the purpose of the project. My plan for the next phase is to reorganize the code so that I can use Ruby's standard idioms for iterators, which should improve both performance and memory usage drastically.
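As a rough sketch of what that reorganization could look like, assuming a current Ruby and a made-up EventSource class (neither the class nor the event strings come from RbYAML):

```ruby
# Hypothetical event source; the names are invented for this sketch.
class EventSource
  include Enumerable

  def initialize(tokens)
    @tokens = tokens
  end

  # Eager version: builds the whole list before returning anything.
  def all_events
    events = []
    @tokens.each { |t| events << "event(#{t})" }
    events
  end

  # Iterator version: hands each event to the block as it is produced.
  # Without a block it returns an Enumerator instead.
  def each
    return enum_for(:each) unless block_given?
    @tokens.each { |t| yield "event(#{t})" }
  end
end

source = EventSource.new(%w[a b c])
p source.all_events # the whole array at once
p source.first(2)   # stops after two events have been yielded
```

Because each yields one event at a time, a consumer that only needs the first few events never forces the rest to be built.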

Some of the regular expressions weren't really compatible between the two languages, but this was pretty easy to fix.

The final trouble was that Python requires () to invoke a method; if a method is referenced without parentheses, you get the method object itself instead of a call. This was mostly a problem because of my unfamiliarity with Python.
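The Ruby side of this looks roughly as follows; Greeter and hello are placeholder names for the example:

```ruby
# Placeholder class for the example.
class Greeter
  def hello
    "hello"
  end
end

g = Greeter.new

# In Ruby a bare method name is already a call; parentheses are optional.
puts g.hello # same as g.hello()

# To get the method itself (what Python's g.hello would give you),
# ask for it explicitly and call it later.
m = g.method(:hello)
puts m.class
puts m.call
```

So a Python line that passes obj.handler around as a value has to become obj.method(:handler) (or a block) in Ruby, otherwise the method is simply invoked on the spot.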

So, in conclusion, porting Python to Ruby is quite easy, except for a few small areas. I would try it again if there existed an application that's needed for

This is the first part of a series of blog posts about RbYAML. The next entry will talk more about Rubyfying the code.

Saturday, May 13, 2006

SpocP and Lisp.

For a few years I've been involved with an open source project called SpocP, through work. SpocP is a framework and protocol for handling almost any authorization task an application can have. The base distribution is a server written in C which, when started with rule definitions, answers TCP (or socket) requests with YES or NO, attaching a blob if the rule says so. If you want to, you can write all your authorization information directly in SpocP rules. This is usually not that interesting, since the common case is that you have a data source somewhere that you'd like to use as the basis for authorization. Maybe an LDAP server with role information, maybe an RDBMS, or something entirely different.

The interesting thing about SpocP is the format used for Rules and Queries. (Rules are what SpocP uses to provide an answer; a Query is a specific authorization request translated into the SpocP Query language. In human terms a rule might be "All persons who have OU=itc in our LDAP server have access to all pages of the web application named 'itdoc'", and a typical query that might answer YES for this rule is "Can the person with uid 'olagus' access the page 'showLinks.do' for application 'itdoc'?") The format for queries and rules in SpocP is based on S-expressions (which is a fancy word for more-or-less-Lisp syntax). The reason for using S-expressions is that they have the mathematical property that one S-expression can be compared to another and judged to be less-than-or-equal to it or not. This is all that's needed to implement generic authorization facilities, and also some other fascinating possibilities. As an example, take a rule file looking like this:

(spocp (action view) (resource (app "itdoc") (page "admin.do")) (subject (uid)))
(spocp (action view) (resource (app "itdoc") (page)) (subject (uid)))
(spocp (action admin) (resource (app "kimkat")) (subject (uid) (role)))

and a query like this:

(spocp (action view) (resource (app "itdoc" "http://itdoc.it.ki.se") (page "showLinks.do")) (subject (uid "olagus") (role "user") (loggedIn 200505260115)))

When receiving this request, SpocP will walk through all rules until it finds one where the RULE is <= QUERY. If it finds a matching rule it will return "YES", otherwise "NO". In this case rule number 2 would make the answer "YES". You've now seen the first part of SpocP matching. But how do we involve external data sources? With something called boundary conditions. The idea is that for each matching rule, the attached boundary conditions are checked, and if these return positive answers too, the whole expression is deemed true. A typical boundary condition may look something like this:
urn:spocp:ldapset:ldap.ki.se;ou=people,o=ki.se;{uid & ${uid}}/ou & "itc"

which is more or less the same as the LDAP query (&(uid=$UID)(ou=itc)).

This boundary condition checks if the second element of the list with tag "uid" exists in our LDAP server, and if that entry also has an ou with value "itc". Boundary conditions can be chained and linked with and, or and not, and you can also predefine boundary conditions so you don't have to define the same one more than once.

I recommend using SpocP if you have a need to decouple your authorization from the application. As soon as you have defined what kind of queries a typical application may do, you can implement this on a central server, and when your business rules change, you can change the SpocP definitions without having to change the application. There is not much extra complexity involved and you buy yourself some very nice flexibility. There are SpocP client libraries available for most of the standard languages.

The fun thing with SpocP is that the fundamental operation of checking less-than-or-equal between two S-expressions has always been defined mathematically, and in some highly optimized C code. When fooling around with this, I found a simple executable definition of the SpocP core in Common Lisp. The whole thing looks like this:

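;; True when LIST is a star-form, i.e. a list starting with the symbol *.
(defun starform-p (list)
  (and (consp list)
       (eql (first list) '*)))

;; Match QUERY against a star-form RULE such as (* prefix "foo").
(defun starform-match (query rule)
  (let ((form (second rule))
        (data (cddr rule)))
    (case form
      (any (member query data :test #'equal))
      (prefix (string-equal (car data) (string query) :end2 (length (car data))))
      (suffix (string-equal (car data) (string query) :start2 (- (length (string query)) (length (car data)))))
      (range (member query data :test #'equal)) ;; not implemented
      (set (member query data :test #'equal))
      (t t))))

;; True when at least one of RULES matches QUERY.
(defun matches-p (query rules)
  (some #'(lambda (rule) (match-p query rule)) rules))

;; True when RULE <= QUERY.
(defun match-p (query rule)
  (cond ((and (atom query) (atom rule))
         (equal query rule)) ; EQUAL, so that strings compare by content
        ((starform-p rule)
         (starform-match query rule))
        ((or (atom query) (atom rule))
         nil) ; an atom never matches a list
        (t (every #'match-p query rule))))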

The only thing missing here is one of the so-called star-forms, range. Otherwise this is a complete definition of the SpocP core evaluation of rules. To use it, do something like this:

(matches-p '(spocp (action view) (resource "foo")) '((spocp (action admin) (resource) (subject (uid) (role "admin")))(spocp (action view) (resource "foo"))))
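For readers who don't read Lisp, the same idea can be sketched in Ruby with nested arrays, using the itdoc rules and query from earlier in this post. This is a simplified sketch, not SpocP's implementation: the method names are made up, star-forms are left out, and the rule that a list may never be longer than the query list it is compared against is my reading of "RULE <= QUERY".

```ruby
# Simplified sketch: symbols, strings and numbers are atoms, arrays are lists.
# Star-forms are left out, and a rule list longer than the query list never
# matches (an assumption, not taken from the SpocP specification).
def rule_matches?(query, rule)
  if rule.is_a?(Array) && query.is_a?(Array)
    rule.length <= query.length &&
      rule.each_with_index.all? { |part, i| rule_matches?(query[i], part) }
  elsif rule.is_a?(Array) || query.is_a?(Array)
    false # an atom never matches a list
  else
    query == rule
  end
end

def any_rule_matches?(query, rules)
  rules.any? { |rule| rule_matches?(query, rule) }
end

rules = [
  [:spocp, [:action, :view], [:resource, [:app, "itdoc"], [:page, "admin.do"]], [:subject, [:uid]]],
  [:spocp, [:action, :view], [:resource, [:app, "itdoc"], [:page]], [:subject, [:uid]]],
  [:spocp, [:action, :admin], [:resource, [:app, "kimkat"]], [:subject, [:uid], [:role]]]
]

query = [:spocp, [:action, :view],
         [:resource, [:app, "itdoc", "http://itdoc.it.ki.se"], [:page, "showLinks.do"]],
         [:subject, [:uid, "olagus"], [:role, "user"], [:loggedIn, 200505260115]]]

p any_rule_matches?(query, rules)
p rules.index { |rule| rule_matches?(query, rule) } # zero-based position of the first match
```

The less specific a rule is (fewer elements, shorter lists), the more queries it covers, which is why the second rule with its bare (page) matches while the first one, pinned to "admin.do", does not.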

Craig Larman at KI.

A few weeks ago, Craig Larman visited KI and had a one day presentation about agile development to some 30 people from different professions at KI. I was there and some parts were very interesting.

Of course, most of it is not new for the developers, but we got some good firepower for convincing management about some more realistic practices. KI hasn't been very good at embracing the new and hot in development methodology in recent years, but hopefully this is about to change.

So, what about the presentation? The first part, 'til lunch, was mostly statistics and "management-convincing" arguments for why agile is good practice. The only really interesting part was Larman's dissection of Royce's original paper on the waterfall method. It's really totally obvious to anyone who actually reads the paper instead of just looking at the graphs that Royce really is against the waterfall method, except for very small projects where all requirements are known upfront and cannot be changed.

The afternoon session contained much more of interest, at least for me. He talked some about good tools for practicing good Agile, naming continuous integration, test driven development, good version control, self organizing teams and good dissemination of information through wikis as key points. IT Center - that is, me and my 4 colleagues - at KI are pretty good at this. The only weak point is our use of wikis. I have some plans for this, since I totally agree with Craig about this. Wikis are unique in the way they combine simplicity with expressiveness and information processing.

After this I finally got something really worthwhile out of the presentation; Craig presented the first good argument for pair programming that I've ever heard. Regarding Pair Programming I'm probably like most programmers: very skeptical. I have tried it a few times and I don't like it. But anyway, Craig's premise was a situation where you had programmers working on several projects simultaneously, and also having maintenance work on applications they had the responsibility for.

In this case it's very easy for people to become unproductive, due to constant interruptions from other projects or immediate maintenance work. Usually most of the maintenance work is also person dependent. (This situation describes exactly my group's current work environment.) When dealing with issues like this it makes sense to create a rotating schedule where programmers work on one project at a time, or completely on maintenance. When something that is dependent on someone in a project shows up, one of the maintenance programmers pair programs with this person to fix the problem, which ensures that the next time this problem appears, there will be at least two persons capable of handling it.

All in all, it was an interesting session and I felt I got out of there knowing more than I knew before. The most important thing is still having someone to refer to when talking with management about adopting more agile methods.

Maybe I'm imagining this, but it feels like since Craig was here, we've been allowed much more slack in using technology that's not "industry standard", but more agile, to solve the problems it's suited for. For example, using Ruby on Rails to create internal test and development tools for bigger projects seems to be totally accepted now.

Sunday, May 07, 2006

Announcing RbYAML

This is an announcement of the availability of RbYAML, a pure Ruby YAML parser, based on the Python project PyYAML3000. Over the last two weeks I've converted this to Ruby, with more or less the same functionality. The glaring hole is Unicode, which isn't there right now.

The project will be updated regularly, and will also soon move to RubyForge.

Right now the homepage is here: http://rbyaml.ologix.com

In a few days I'll post a longer blog with some information about the Python-to-Ruby translation, which had some interesting surprises, at least for me.