A very interesting article from IBM Developerworks: Ruby off the Rails.
On a similar note, I propose that Jakarta Struts should be renamed to Java in Jail.
Wednesday, December 21, 2005
Sunday, December 18, 2005
The power of YAML and XML.
My colleagues and I have recently spent some time debating the readability and usability of YAML versus XML. As expected, the opinions vary wildly. Because of this, I've spent quite some time thinking about these issues. I really believe the question is one of power: readability, usability and succinctness are what define the power of a data format. This discussion is restricted to formats that are text based, readable by humans and computers, and mostly used for configuration and data interchange. XML is of course also used for RPC and build scripts, among other things, but those uses are generally agreed to be nonoptimal, for many reasons.
So, I will use a simple example in three different encodings. The example is the common task of configuring a data source for a library to use. This is intentionally a simple problem, but it tends to show the main differences between the encodings.
For this setup - without resorting to binary or compressed formats - the simplest encoding is probably a line-based, character-separated value file like this (where a hash marks a comment line):
(This example wraps awkwardly here. Just imagine it's only two lines.)
#name, type, database, server, uid, pwd, param1, param2, param3, param4, param5
development, postgres, my_dev, db.dontexist.com, mydev, secretpwd, encoding=LATIN1,,,,
The problem with this approach is apparent: there is no easy way of knowing which field means what. You need a comment, purely for the human reader, to provide that information. If you for some reason write the password and username in the wrong order, the data format will not catch it. Another very real problem is that as soon as you have to specify information specific to one instance of a database, you have to use a new language inside the field. (In this case the small language is only name=value, but it's still another concept to get used to, and it's not part of the regular data format.) Along the same lines, it's not possible to define lists or maps without inventing extra syntax for them.
The positive side of this encoding is that it's probably as short as it can be without being binary or compressed. It's very easy to parse, but it's not generic at all.
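To make both points concrete - the parsing really is trivial, but every field's meaning lives outside the format - here is a rough Java sketch of reading such a line. The field positions and the name=value convention are just the ones from the example above; nothing about them is enforced by the format itself.

import java.util.HashMap;
import java.util.Map;

public class CsvConfigSketch {
    public static void main(String[] args) {
        String line = "development, postgres, my_dev, db.dontexist.com, mydev, secretpwd, encoding=LATIN1,,,,";

        // Purely positional: nothing in the data says which field is which,
        // so swapping uid and pwd would go completely unnoticed.
        String[] fields = line.split(",", -1);
        String name = fields[0].trim();
        String server = fields[3].trim();
        String password = fields[5].trim();

        // The instance-specific parameters need their own little
        // name=value language inside the fields.
        Map<String, String> params = new HashMap<String, String>();
        for (int i = 6; i < fields.length; i++) {
            String[] pair = fields[i].trim().split("=", 2);
            if (pair.length == 2) {
                params.put(pair[0], pair[1]);
            }
        }

        System.out.println(name + " on " + server + " pwd=" + password + " " + params);
    }
}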
Example number 2 is a data source definition from JBoss:
<datasources>
  <local-tx-datasource>
    <jndi-name>Development</jndi-name>
    <connection-url>
      jdbc:oracle:thin:@db.dontexist.com:1521:orcl
    </connection-url>
    <driver-class>
      oracle.jdbc.OracleDriver
    </driver-class>
    <user-name>jbossdev</user-name>
    <password>secretpwd</password>
  </local-tx-datasource>
</datasources>
It's longer, of course, but it's still a very simple XML document. No namespaces, no entities, no DTDs, no attributes. It's just very basic XML. And sure, it's more readable than the custom format. And it is generic, in the sense that you can define different formats and have a validating parser read them for you. But is it easy to read? I don't think so; there's too much noise. Sure, the element end tags make it easy to see what ends where, but that's not a big deal anyway, since most XML documents actually use indentation to make them easy enough for a human being to read. Compare the previous document with this:
<datasources><local-tx-datasource>
<jndi-name>Development</jndi-name>
<connection-url>jdbc:oracle:thin:@db.dontexist.com:1521:orcl
</connection-url><driver-class>oracle.jdbc.OracleDriver
</driver-class><user-name>jbossdev</user-name>
<password>secretpwd</password></local-tx-datasource>
</datasources>
which is the exact same document. But, well, I wouldn't edit it without a good editor to check for validity, especially on a production machine.
As I've said, it's good for reading if it's your first time with the data format, but after a while it gets quite annoying trying to sort out the relevant information from all the end tags. Another problem is how to add simple non-standard parameters. For example, if I wanted to add an Oracle-specific parameter, I would in this example have to change the JDBC connection string. This is not the fault of XML, of course. The other route would have been to add some element like this
<param name="encoding">LATIN1</param>
to the XML schema. But this removes the benefit of end tags showing where each mapping ends: if long blobs of information are stored in these params, and there is more than one of them, you can't tell from the end tag which one you're looking at. So that advantage is lost here.
Another thing that is sometimes a problem is the lack of simple mappings and lists as datatypes in XML. Sure, you can define them in your own schema, or remap CDATA sections to your own format, but this is not part of the spec and will never be readable by a standard XML parser without help from you.
You will never be able to do something like ((List) node.getNodeValue()).iterator() in DOM, since getNodeValue is defined to return a String and there is no intrinsic sequence type in XML values.
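As a minimal Java illustration of that point (the element and the comma convention here are made up for the example, not part of any real schema): even when an element's text is conceptually a list, DOM hands back a String, and the application has to re-parse it with its own ad hoc syntax.

import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomNoListsSketch {
    public static void main(String[] args) throws Exception {
        // A made-up element whose text happens to hold several values.
        String xml = "<hosts>db1.dontexist.com,db2.dontexist.com</hosts>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element hosts = doc.getDocumentElement();

        // DOM only knows about text: the "list" comes back as one String...
        String raw = hosts.getTextContent();

        // ...so turning it into a real List is the application's job,
        // using a separator convention XML knows nothing about.
        List<String> values = Arrays.asList(raw.split(","));
        System.out.println(values);
    }
}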
Lastly, YAML. This example is taken from Rails:
development:
  adapter: postgresql
  database: xyz_dev
  host: db.dontexist.com
  username: railsdev
  password: secretpwd
  encoding: LATIN1
Notice that indentation and newlines matter in YAML. It's exactly this context sensitivity that makes it extremely readable for human beings while still being parseable by machines. One thing that is immediately noticeable is that the "encoding" parameter - which is PostgreSQL specific - looks exactly the same as the other parameters. No non-YAML construct is needed to represent it. The next interesting point is that even if you've never seen YAML before, you should be able to work out that something called "development" has a few properties attached to it, with the values shown.
In practice, what happens when a standard YAML parser reads this is that a map is created with one entry, keyed "development", whose value is another map containing the keys and values specified. YAML supports three datatypes: mapping, sequence and scalar. These types can be specialized to a specific implementation, which means that any object can be serialized as YAML without very much work. In the Ruby implementation every object gets a new method, to_yaml, which returns the YAML representation of that instance, and there is a module method, YAML::load, which returns the correct object for a serialized YAML stream.
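To make the map-of-maps shape concrete, here is the equivalent structure built by hand in Java; this is the kind of object graph one would expect a YAML binding to hand back for the example above (the construction is manual here, since doing it automatically is exactly what a YAML library is for):

import java.util.LinkedHashMap;
import java.util.Map;

public class YamlShapeSketch {
    public static void main(String[] args) {
        // The inner mapping: the properties attached to "development".
        Map<String, String> development = new LinkedHashMap<String, String>();
        development.put("adapter", "postgresql");
        development.put("database", "xyz_dev");
        development.put("host", "db.dontexist.com");
        development.put("username", "railsdev");
        development.put("password", "secretpwd");
        development.put("encoding", "LATIN1");

        // The outer mapping: one entry, keyed by the top-level name.
        Map<String, Map<String, String>> config =
                new LinkedHashMap<String, Map<String, String>>();
        config.put("development", development);

        System.out.println(config.get("development").get("host"));
    }
}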
All three formats have much more advanced features, of course. You can do whatever you want with each of them. XML has schemas, validation, XPath and much else. YAML has tags, aliases and a few more things. But looking at all this from the perspective of what you can easily represent inside the language, without inventing new meta-language constructs, while still retaining readability and understandability - my vote is on YAML. I have seen mismatched end tags in XML more times than I can count, and I still find it impractical to extract relevant information from convoluted documents. Have you ever tried reading a slightly complex WSDL definition? Changing it by hand? It isn't fun.
Of course, S-expressions are a good alternative, with many of the nice properties of YAML, while not depending on whitespace for structure. But the whitespace is fun. I look at the lists and text files I've scattered all over my computer, filled with notes to myself, and I realize that most of them could be read as YAML without a change. That is more or less how I write lists and mappings for myself, without ever intending them to be read by computers. I sure don't write notes to myself in XML.
I would never program in YAML, or make a Turing-complete language out of it, but for pure data representation that should be easily readable and writable by humans, while still being as succinct as possible, it's probably the best thing right now.
But should there ever be a need to use the data to drive programs, S-expressions are the way to go.
A last note: if you find YAML too complex, try out JSON. It's actually a proper subset of YAML and should be readable and writable by all YAML-compliant processors. The same goes for standard Java properties files.
Wednesday, December 14, 2005
YAML, Java and JRuby.
There is currently no good support for YAML in Java. YAML is a very nice data format for everyone tired of XML (and who isn't?). It's easy on the eyes, easy to understand and easy to write. It's also rapidly becoming the exchange format of choice for many dynamic languages like Ruby, Python and Perl. If you're interested in YAML, more information can be found at http://www.yaml.org.
Of course there should be a Java implementation of YAML. There exist some old half-hearted attempts, but nothing that actually works well. And since JRuby does not have YAML support yet, a Java implementation would help that situation too.
So I've started a new open source project - soon to be hosted at SourceForge - called JAML. It's very straightforward right now, and based on PyYAML, since Syck is write-only code and most other implementations were built on generated lexers and parsers. As soon as the project is public at SourceForge, I will post a note here. In the meantime, if you're interested in helping out, or in looking at the code, send me a mail at ola AT ologix.com.
Unorthodox Hibernate.
Today I did something slightly unorthodox with Hibernate. Or at least I believe it is. And I couldn't find a better way of doing it. You be the judge.
First, some background. Most of the system I'm working on is fairly normal, with JavaBeans and whatnot. But we do have twenty-odd of something we call types. These are supposed to be our own implementation of the type-safe enum pattern, with the added complexity that the values of these enums are saved in the database, and each value has one or many localized strings attached to it. So far, so good. The old solution was based on some SQL and some reflective calling of static getInstance methods. That was clearly not possible with Hibernate. Now, let me be clear about what really mattered here: getting Hibernate to return the same instance for the same key every time.
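For concreteness, here is a rough sketch of what such a database-backed type-safe enum tends to look like; all the names are invented for illustration, and the SQL that loads the rows is left out.

import java.util.HashMap;
import java.util.Map;

// A sketch of the pattern described above: one canonical instance per
// database key, looked up through a static getInstance method.
public final class AccountType {
    private static final Map<Integer, AccountType> INSTANCES =
            new HashMap<Integer, AccountType>();

    private final int key;
    private final Map<String, String> localizedNames; // locale code -> display string

    private AccountType(int key, Map<String, String> localizedNames) {
        this.key = key;
        this.localizedNames = localizedNames;
    }

    // Called once per row when the values are read from the database.
    static synchronized AccountType register(int key, Map<String, String> names) {
        AccountType type = new AccountType(key, names);
        INSTANCES.put(Integer.valueOf(key), type);
        return type;
    }

    // The canonical instance for a key - the identity the persistence
    // layer has to hand back for this to keep behaving like an enum.
    public static synchronized AccountType getInstance(int key) {
        return INSTANCES.get(Integer.valueOf(key));
    }

    public int getKey() {
        return key;
    }

    public String getName(String locale) {
        return localizedNames.get(locale);
    }
}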
After some fiddling, it turned out that the best solution was to grope around in the innards of Hibernate and implement my own EntityPersister for the task. (By the way, if you go down this route - and are using Hibernate 3 - be aware that the API docs are not totally up to date regarding demands on constructor signatures and such.)
According to all documentation I could find, implementing your own EntityPersister is not something a user of Hibernate should ever need to do. Clearly that is not so, or I'm missing something obvious. So my question is simple: how should I have managed this with existing Hibernate tools? And no, modifying the model objects is not an option. Too many dependencies.
Tuesday, December 13, 2005
Homegrown DAO to Hibernate to Rails.
We have an internal project called ITDoc, which is supposed to be used by the IT department to document internal resources. The system was created a few years ago by me, with a quite complex database, a strange homegrown DAO layer and a very nondeterministic homebuilt cache. As always, the prototype became the system, and now we sit here with code that really doesn't scale and crashes way too often.
But due to lack of resources (and time) we can't fix the code right now. So I had the idea to try something not-so-radical. I remodeled the database into better shape, cut away all the DAO and cache code, and had myself some model objects ripe for hibernation. But after I had begun writing some XML mapping definitions, I found my mind drifting, thinking about how I would convert the old database to the new format. And the XML writing went slowly, even though it was light years faster than writing DAO code from scratch.
Enter Rails. I decided - after making sure that the Hibernate approach could work - to write a spike in Ruby instead. Said and done. After an hour I had a rudimentary migration script written with ActiveRecord. Since the data model was quite complex, this was the hardest part to write. But once that was done, the Rails part went extremely smoothly. In one more hour I had the basic functionality that had been so hard to manage in the original system, and the code was probably less than 50 lines long.
But now comes decision time, when other people decide which approach to choose. Since I'm not going to be doing this project, it's not certain the choice will be Rails. Even so, it was an interesting endeavour, and it once again reinforced my opinion that Ruby is a fun language to write in.
Monday, December 12, 2005
The limitations of keytool.
I've always known that keytool is a very limited program. But right now I'm once again amazed at how few things are actually doable with it.
I've spent some time this weekend debugging our (that is, KI's) setup of a system called CWAA (Common Web Authentication Architecture). It can be found here. It's based on the idea of using certificates so that one university can vouch for the identity of an entity to another university. In our case it's used to make it possible for KI students to log in to the wireless network services at Stockholm University with their KI identity.
As it is certificate expiry time, there was some work to be done. No problem, I thought. I proceeded to update KI's CWAA certificate and add it in the correct places. It should be noted that we use PKCS12 keystores. As I looked inside the old p12 file, I couldn't find any other cert that needed updating, so I went ahead and tried the installation. No luck. After much time I noticed that, when using openssl to view the p12 file, it contained more certificates than keytool showed. It turns out that keytool can only view ONE entry in a PKCS12 keystore, and can't edit it at all. This limitation was not known to me until now. Once I found it out, updating the rest of the certificates was easy, and everything is working.
But right now I'm strongly considering either finding a better keytool (maybe BouncyCastle has a client program?) or writing myself a new one. Apparently the JDK version is unusable for any professional use.
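If it comes to writing one, the programmatic side is not the hard part. Here is a minimal sketch (file name and password are placeholders) that walks every alias the provider exposes in a PKCS12 keystore, instead of stopping at the first entry; whether the stock JDK provider shows everything in a given p12 file is a separate question, but the BouncyCastle provider can be plugged into the same KeyStore API.

import java.io.FileInputStream;
import java.security.KeyStore;
import java.util.Enumeration;

public class ListPkcs12Sketch {
    public static void main(String[] args) throws Exception {
        // Placeholders - point these at the real p12 file and password.
        String file = "keystore.p12";
        char[] password = "changeit".toCharArray();

        KeyStore store = KeyStore.getInstance("PKCS12");
        FileInputStream in = new FileInputStream(file);
        try {
            store.load(in, password);
        } finally {
            in.close();
        }

        // Walk every alias, not just the first one keytool shows.
        Enumeration<String> aliases = store.aliases();
        while (aliases.hasMoreElements()) {
            String alias = aliases.nextElement();
            String kind = store.isKeyEntry(alias) ? "key entry" : "certificate entry";
            System.out.println(alias + " (" + kind + ")");
        }
    }
}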