Background
As XML has become more and more pervasive, I have found myself writing more and more custom parsers. Each time I write one, it
seems to be a rewrite of the previous one, except that the tags and the XML structure itself have changed. However, the common
thread each time is that I'm the one defining the XML structure, and I'm the one defining the Java object structure (usually bean
objects). And most of the
time, the XML is used purely internally to the application I'm developing. Sometimes I'm using the XML as a more
file, and sometimes I'm using the XML as a data file, either because the data I wish to store is relatively small or because I'm
trying to build an application that is as lightweight as possible, thereby eliminating the need for a third-party database.
There are many third-party APIs out there that attempt to make this process simpler. J2EE includes JAXB, Apache Jakarta has
Commons Digester and XMLBeans, and J2SE has java.beans.XMLEncoder and java.beans.XMLDecoder. They all have their individual
advantages and disadvantages, but a common theme among them is that they are all somewhat heavyweight, and they try to be all things to
everyone.
JAXB has the advantage that it is part of the J2EE standard, so it is widely available. It typically enforces schemas, as the primary
way to use it is to provide an XML Schema from which it generates its beans. This is typically good when XML is being shared with
multiple applications, possibly across the internet. However, for lightweight applications that want simple XML parsing for internal
use as opposed to portable data, this is far too much work and overkill.
Apache Jakarta's Commons Digester seems to be a step in the right direction. My guess is that this API started as an attempt to solve
the problem I'm describing here, only it seems to have ballooned into a more heavyweight API that, again, attempts to be all things to
everyone. The biggest issue I have with Commons Digester is that it requires too much help. I'll explain in more detail below, as I'll
be comparing my proposed solution to Commons Digester.
XMLBeans is not very well documented. Oh, there's plenty of documentation, but it isn't very concise, and the examples are hard to
follow. From what I can tell, XMLBeans uses its own schema compiler that generates interfaces and classes to handle the parsing. If I
wanted to go through all of this work, I'd just use JAXB.
java.beans.XMLEncoder and java.beans.XMLDecoder
This is interesting, and seems relatively simple. Your beans must follow the JavaBeans conventions (a public no-argument constructor
and getter/setter pairs), and then Java will automatically convert the bean to XML and vice versa. However, the XML that is generated
is very verbose, and includes attributes that help the API determine how to later parse the XML back into the beans. It makes for very
clunky XML. But I applaud the attempt.
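To make the verbosity concrete, here is a minimal round-trip sketch using the standard java.beans.XMLEncoder and java.beans.XMLDecoder classes. The Computer bean, its property values, and the class name XmlEncoderSketch are assumptions made up for illustration. The encoded text wraps the bean in a java root element and describes it with object and void property entries rather than with tags named after the bean's fields.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class XmlEncoderSketch {

    // Hypothetical bean for illustration: public class, public no-argument
    // constructor, and a getter/setter pair per property.
    public static class Computer {
        private String processor;
        private String memory;
        public Computer() {}
        public String getProcessor() { return processor; }
        public void setProcessor(String processor) { this.processor = processor; }
        public String getMemory() { return memory; }
        public void setMemory(String memory) { this.memory = memory; }
    }

    // Serializes a bean to the XMLEncoder text format.
    public static String encode(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Reads the first object back out of XMLEncoder-formatted text.
    public static Object decode(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(bytes))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Computer computer = new Computer();
        computer.setProcessor("x86");      // placeholder values
        computer.setMemory("512MB");
        String xml = encode(computer);
        System.out.println(xml);           // inspect the generated markup
        Computer copy = (Computer) decode(xml);
        System.out.println(copy.getProcessor() + " / " + copy.getMemory());
    }
}
```

Running main prints the generated markup, which is a quick way to judge whether that format is acceptable for a hand-edited file.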
So why can't we have a simple, lightweight API that can parse the XML into beans with very little configuration or programming?
Commons Digester tries to do this, but it comes up short. By default, Commons Digester has to be told every little detail about how the
XML is to be parsed. It doesn't attempt to discover the structure on its own. That's where SLASH comes
in. SLASH is the Simple Lightweight Automatic Sax Handler. SLASH differs from Commons Digester in two
ways. First, it is an implementation of the DefaultHandler class provided by the SAX 2.0 API, so it plugs into any SAX 2.0 parser
and can be extended like any other handler. Second, SLASH's default behavior is to attempt to discover the XML structure and build up an
object map automatically with little or no help from things like the "rules" that are used by Commons Digester.
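As a rough illustration of what "being a DefaultHandler" buys you, the sketch below plugs a handler into the standard JAXP SAX parser and records the path of every tag it sees, with no rules supplied up front. This is not SLASH's code; the class name StructureDiscoverySketch, the discover method, and the sample XML are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: it only observes structure, it builds no beans.
public class StructureDiscoverySketch extends DefaultHandler {
    private final List<String> open = new ArrayList<>();   // currently open tags, root first
    private final List<String> paths = new ArrayList<>();  // one path per tag encountered

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.add(localName);
        paths.add(String.join("/", open));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.remove(open.size() - 1);
    }

    // Parses the XML with a stock SAX parser and returns the discovered tag paths.
    public static List<String> discover(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is populated
        StructureDiscoverySketch handler = new StructureDiscoverySketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.paths;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Computer><Processor>x86</Processor><Memory>512MB</Memory></Computer>";
        for (String path : discover(xml)) {
            System.out.println(path);
        }
    }
}
```

Because the handler is an ordinary DefaultHandler, the same class works unchanged with any SAX 2.0 parser.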
SLASH optionally uses "Hints", but they aren't required. SLASH Hints are provided to the Handler in the form of a java.util.Map where
each key is the local name of an XML tag, and the corresponding value is a java.lang.Class object that represents the bean that the
handler should use when it encounters the XML tag specified by the key. If the Class object provided represents any
java.util.Collection concrete class or interface, then SLASH will treat each XML tag directly nested inside the associated tag
as an object to be added to that collection. For instance, suppose I have XML in which a Computer tag contains a Processor tag, a
Memory tag, and two Drive tags, and I provide a hint of the form ("Computer", java.util.ArrayList.class). SLASH will then construct an
ArrayList every time it encounters a "Computer" tag, and add all subtag objects to that ArrayList. So in this example, it will produce
an ArrayList of Strings containing the values of the Processor, Memory, Drive, and Drive tags specified in the XML. Had I instead
provided hints for the Processor, Memory, and Drive tags as well, it would have constructed the appropriate objects and added them to
the ArrayList.
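The collection-hint behavior described above can be approximated in a few dozen lines. The following is a sketch under stated assumptions, not SLASH's implementation: the class CollectionHintSketch, its parse method, and the sample tag values are invented for illustration, and only the hints Map shape (tag local name to Class) comes from the description above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: a tag hinted with a Collection class collects the
// text of its direct subtags as Strings.
public class CollectionHintSketch extends DefaultHandler {
    private final Map<String, Class<?>> hints;
    private final List<Object> results = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Collection<Object> current; // collection being filled, or null
    private int depth;                  // nesting depth below the hinted tag

    public CollectionHintSketch(Map<String, Class<?>> hints) { this.hints = hints; }

    @Override
    @SuppressWarnings("unchecked")
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (current != null) {          // inside a hinted tag: track the subtag
            depth++;
            text.setLength(0);
            return;
        }
        Class<?> hint = hints.get(localName);
        if (hint != null && Collection.class.isAssignableFrom(hint)) {
            try {                       // requires a concrete class with a no-arg constructor
                current = (Collection<Object>) hint.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new SAXException(e);
            }
            depth = 0;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null && depth == 1) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) return;
        if (depth == 0) {               // the hinted tag itself is closing
            results.add(current);
            current = null;
            return;
        }
        if (depth == 1) current.add(text.toString().trim());
        depth--;
    }

    public static List<Object> parse(String xml, Map<String, Class<?>> hints) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // hints are keyed by local name
        CollectionHintSketch handler = new CollectionHintSketch(hints);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Class<?>> hints = new HashMap<>();
        hints.put("Computer", ArrayList.class);
        String xml = "<Computer><Processor>x86</Processor><Memory>512MB</Memory>"
                + "<Drive>C</Drive><Drive>D</Drive></Computer>"; // placeholder values
        System.out.println(parse(xml, hints));
    }
}
```

The sketch handles only one hinted level and String values; per the description above, SLASH itself would also consult hints for the subtags.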
Now if the hint provided is a JavaBean class with a default constructor (no parameters) and setter methods corresponding
to the subtag names, SLASH becomes much more powerful. SLASH will construct a new bean object each time it encounters that tag, and
then will assume that a setter method exists for each subtag. For instance, suppose I have XML in which a Computer tag contains a
Processor tag, a Memory tag, and a Drives tag that in turn contains several Drive tags, and I provide a hint of the form
("Computer", your.package.Computer.class), where Computer is some bean class with a default
constructor and methods setProcessor(String), setMemory(String), setDrives(Collection). SLASH will automatically parse this XML into a
Computer object. When SLASH encounters the Processor and Memory tags, it calls the corresponding setter methods. When SLASH
encounters the Drives tag, it inspects the Computer bean via reflection and determines that the Computer bean expects a Collection, so
it constructs a LinkedList (the default Collection implementation that SLASH prefers), and continues parsing the XML. As each Drive
subtag is encountered, it parses the value and adds it to the LinkedList as a String, and when it is done, it calls the setDrives
method on the Computer.
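The bean-hint flow described above (construct the bean, call a setter per subtag, build a LinkedList when the setter expects a Collection) might be sketched with plain reflection as follows. Again, this is an illustration under assumptions, not SLASH's source: BeanHintSketch, findSetter, parse, the Computer bean's getters, and the sample values are all hypothetical, and the sketch supports only String setters and one Collection level.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.*;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BeanHintSketch extends DefaultHandler {

    // Hypothetical bean with the setters named in the text (getters added for inspection).
    public static class Computer {
        private String processor, memory;
        private Collection<String> drives;
        public void setProcessor(String p) { processor = p; }
        public void setMemory(String m) { memory = m; }
        public void setDrives(Collection<String> d) { drives = d; }
        public String getProcessor() { return processor; }
        public String getMemory() { return memory; }
        public Collection<String> getDrives() { return drives; }
    }

    private final Map<String, Class<?>> hints;
    private final List<Object> results = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Object bean;                   // bean being populated, or null
    private Collection<Object> collection; // pending Collection property, or null
    private Method collectionSetter;
    private int depth;                     // nesting depth below the hinted tag

    public BeanHintSketch(Map<String, Class<?>> hints) { this.hints = hints; }

    // Finds a public one-argument method named "set" + tag, or returns null.
    private static Method findSetter(Class<?> type, String tag) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals("set" + tag) && m.getParameterCount() == 1) return m;
        }
        return null;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (bean == null) {
            Class<?> hint = hints.get(localName);
            if (hint == null) return;
            try {
                bean = hint.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new SAXException(e);
            }
            depth = 0;
            return;
        }
        depth++;
        text.setLength(0);
        if (depth == 1) {                  // direct subtag: does its setter want a Collection?
            Method setter = findSetter(bean.getClass(), localName);
            if (setter != null && Collection.class.isAssignableFrom(setter.getParameterTypes()[0])) {
                collection = new LinkedList<>();
                collectionSetter = setter;
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (bean != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (bean == null) return;
        try {
            if (depth == 0) { results.add(bean); bean = null; return; }
            if (depth == 2 && collection != null) {
                collection.add(text.toString().trim());      // e.g. each Drive value
            } else if (depth == 1 && collection != null) {
                collectionSetter.invoke(bean, collection);    // e.g. setDrives(...)
                collection = null;
            } else if (depth == 1) {
                Method setter = findSetter(bean.getClass(), localName);
                if (setter != null && setter.getParameterTypes()[0] == String.class) {
                    setter.invoke(bean, text.toString().trim());
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new SAXException(e);
        }
        depth--;
    }

    public static List<Object> parse(String xml, Map<String, Class<?>> hints) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // hints are keyed by local name
        BeanHintSketch handler = new BeanHintSketch(hints);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }
}
```

A caller would build a one-entry Map from "Computer" to Computer.class and pass it to parse along with the XML text; the single hint is the only configuration the sketch needs.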
In the examples above, all of the parsing is accomplished by providing only one hint to SLASH. In contrast, Commons Digester would have
required at least one "Rule" per tag.
There has to be an easier way to parse XML into bean objects. And now there is... it's called SLASH.