XML Schema vs RelaxNG

When I first read this quote by Tim Bray yesterday (yes, I know it was posted on Slashdot last week, I just happen to have a huge feed backlog and not much time to read), I was a little surprised. I have been using XML Schema for a while now and never had any problems with it. Of course, there is quite a lot of vocabulary to learn before you can write it, but it’s not that hard to read. Once you have written one schema, you can simply start from it for the next ones, and half of the burden imposed by the syntax is gone.

W3C XML Schemas (XSD) suck. They are hard to read, hard to write, hard to understand, have interoperability problems, and are unable to describe lots of things you want to do all the time in XML. Schemas based on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are backed by a rigorous formalism for interoperability, and can describe immensely more different XML constructs.

I never really bothered looking into Relax NG. I saw the name a couple of times, saw an XML sample once and figured it did the same thing as XML Schema, so I had no reason to bother. When I read this quote, I knew there was something I had missed about it. Really, the XML syntax of Relax NG is not more readable. For some reason, I would even say I prefer XML Schema’s xs:element tag with its minOccurs and maxOccurs attributes. I find it more readable than those “oneOrMore” tags.
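
For reference, this is roughly the XML Schema style I mean. It is only a sketch of how a small library of books could be described; the library, book, title and author names are the ones used in the Relax NG sample that follows, and bookType is a name I made up for the named type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The only global element, so "library" is the only valid document root. -->
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <!-- Zero or more books. -->
        <xs:element name="book" type="bookType"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- "bookType" is a hypothetical name chosen for this sketch. -->
  <xs:complexType name="bookType">
    <xs:sequence>
      <!-- Exactly one title, which must not be empty. -->
      <xs:element name="title">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:minLength value="1"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <!-- One or more authors (minOccurs defaults to 1). -->
      <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The same structure in Relax NG’s XML syntax looks like this: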

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="library"/>
  </start>
  <define name="library">
    <element name="library">
      <zeroOrMore>
        <ref name="book"/>
      </zeroOrMore>
    </element>
  </define>
  <define name="book">
    <element name="book">
      <element name="title">
        <data type="string">
          <param name="minLength">1</param>
        </data>
      </element>
      <oneOrMore>
        <element name="author">
          <text/>
        </element>
      </oneOrMore>
    </element>
  </define>
</grammar>

Then I figured out that this XML blurb could be generated from a much simpler syntax called Relax NG Compact. Seriously, I don’t even see a reason why they actually made a non-compact form. The syntax is a lot like the DTD syntax, without all the brackets and with more capabilities. It can even use the datatypes from XML Schema (which are defined in a separate specification) to perform some additional validation. I’ll let you guess what the following piece does. (Note: the Relax NG XML above was generated from this sample.)

grammar {
	start = library

	library = element library { book* }
	book = element book {
		element title { xsd:string {minLength = "1"} },
		element author { text }+
	}
}

Not only is the syntax a lot simpler than anything else out there for schema definition, it also has better documentation. The RELAX NG Compact Syntax Tutorial is a very good place to start. I don’t know if it can do more than XML Schema, but it sure can do about as much in a more efficient fashion. There are two XML Schema attributes I couldn’t find an equivalent for: minOccurs and maxOccurs. Relax NG only supports “zero or one”, “zero or more” and “one or more”, so unless you write out the minimum as required elements and pad up to the maximum with optional ones, there is no way to obtain the same behaviour.
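
To make that workaround concrete, here is a minimal sketch of a book that must have between two and four authors, which XML Schema would express as minOccurs="2" maxOccurs="4". The two-to-four limit and the separate author definition are made up for the illustration and are not part of the sample above. In the compact syntax, the same pattern would be written as author, author, (author, author?)?.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="book"/>
  </start>
  <define name="book">
    <element name="book">
      <element name="title">
        <text/>
      </element>
      <!-- The minimum: two required authors, written out one by one. -->
      <ref name="author"/>
      <ref name="author"/>
      <!-- Up to the maximum: a third author, and a fourth only if there is a third. -->
      <optional>
        <ref name="author"/>
        <optional>
          <ref name="author"/>
        </optional>
      </optional>
    </element>
  </define>
  <!-- Hypothetical helper definition so the author element is only described once. -->
  <define name="author">
    <element name="author">
      <text/>
    </element>
  </define>
</grammar>
```

It works, but the pattern grows with the bounds, so it is only practical for small limits.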

All validation tools I came across use the XML syntax to validate, so you need a way to convert. I used Trang, which was available from my distribution’s repository. Usage is very simple:

trang -Irnc -Orng in.rnc out.rng

Have fun.
