I recently had to do a lot of work related to XML documents, and I still have a lot more on the list. I had to touch just about every specification out there, including XML, XML Schema, XSL-T, XSL-FO and XPath. All of these technologies were released around Y2K, and support for them is getting there. For years, XML has been the buzzword to place on a resume, but there was nothing to it. Without all of its companions, XML really isn’t worth using.
The first one I had to play with was XML Schema. At first, I thought it was only a replacement for DTD. It partially is, but it does so much more. Not only does it validate the available tags and attributes, it also validates the content. Strings, numbers, complex types, it’s all in there. Validation can be applied to tag content or to attributes. Generic types can be defined, extended and applied to multiple nodes. The syntax is a little heavy, but since it only has to be written once, it’s not so bad. Once XForms actually makes it to browsers, validation will be a charm.
XSL-T is probably the most versatile specification out there. It was originally part of the XSL specification, which included XPath and XSL-FO too. An early decision was to split them up to allow usage in other contexts. XSL-T is itself a language, with conditions, loops, functions and variables. It’s used to apply a template to XML. Basically, it converts the raw data into other text formats. Those formats can be HTML or any other kind of XML file. In theory, it can do even more. It’s probably the best way to work with XML, as everything in it is made for that purpose. Using the document() function, it’s even possible to access other XML documents and integrate their content on the fly! Of course, that implies even more validation, as the external file itself has to be validated against its DTD or Schema.
XPath is one of my favorite specifications. It’s the shortest of them all, it’s not an XML format (which makes it a lot faster to write) and it’s extremely powerful. It’s halfway between regular expressions and SQL, but for XML documents. It’s very easy to pick up and it feels natural after a few minutes. Every expression fits on a line. It’s mainly used in XSL-T to select nodes and values.
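To give a feel for how short these expressions are, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree module. Keep in mind that ElementTree only supports a limited subset of XPath, so this shows the flavor of the language rather than the whole specification. The catalog document, its tag names and its attribute values are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
CATALOG = """\
<catalog>
  <book id="b1" lang="en">
    <title>Learning XML</title>
    <price>29.95</price>
  </book>
  <book id="b2" lang="fr">
    <title>XML par la pratique</title>
    <price>35.00</price>
  </book>
  <magazine id="m1" lang="en">
    <title>Markup Monthly</title>
  </magazine>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Every <title> that is a direct child of a <book>, anywhere below the root.
book_titles = [t.text for t in root.findall(".//book/title")]

# Any element carrying lang="en", whatever its tag name.
english_ids = [e.get("id") for e in root.findall(".//*[@lang='en']")]

# The <price> of the book whose id attribute is "b2".
price_b2 = root.findtext(".//book[@id='b2']/price")

print(book_titles)
print(english_ids)
print(price_b2)
```

Each query is a single line: a path to walk, plus an optional predicate in square brackets to filter the nodes. The same kind of expression is what goes into the select and match attributes of an XSL-T stylesheet.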
While XSL-T and XPath have been implemented in just about every XML library and browser, XSL-FO has been left behind. The specification focuses on printable data. Basically, it is a set of rules for XML->PDF conversion. The XSL-FO document has to be generated by XSL-T to be dynamic. Countless formatting options are in the specification. The vocabulary is simply terrific. There is currently no parser supporting the entire set. It’s still very usable from the moment you can actually understand it. The difficulties come from the fact that the specification is very complex, support is only partial, documentation could be a lot better and error reporting is not very effective. It does have a very bright future ahead. Once there are multiple complete implementations and a set of development tools, it will be one of the best ways to generate print output from data. It’s already possible to prepare an output template in a matter of hours or days depending on the complexity, which is much better than writing a Perl script with PDFLib.
The major problem is that the rendering part is very slow. Each block of content has its own set of attributes. During development, that’s not an issue, since everything is generated quickly with XSL-T. When it comes to the final parsing, though, the resulting file is huge and I don’t see many ways of optimizing the process. Generating a 200-page PDF can easily take over an hour. Forget about online PDF generation, at least for now.
Overall, when the technologies are implemented correctly, they are just great. W3C and its members have done a great job. As a side note, forget about Dave Pawson’s book (O’Reilly) on XSL-FO; it’s a piece of trash.