Parsing XML into Java Beans

Background

As XML has become more pervasive, I have found myself writing more and more custom parsers. Each one seems to be a rewrite of the last, except that the tags and the XML structure have changed. The common thread is that I am the one defining both the XML structure and the Java object structure (usually bean objects), and most of the time the XML is used purely internally by the application I'm developing. Sometimes I use the XML as a more structured config file. Other times I use it as a data file, either because the data I wish to store is relatively small or because I'm trying to keep the application as lightweight as possible and want to avoid a third-party database.

Problem

There are many third-party APIs out there that attempt to make this process simpler. J2EE includes JAXB, Apache has Commons Digester (Jakarta) and XMLBeans, and J2SE has java.beans.XMLEncoder and java.beans.XMLDecoder. They all have their individual advantages and disadvantages, but a common theme among them is that they are all somewhat heavyweight, and they try to be all things to everyone.

JAXB

JAXB has the advantage that it is part of the J2EE standard, and it is more universal. It typically enforces schemas, since the primary way to use it is to provide an XML Schema from which it generates its beans. This is good when XML is being shared among multiple applications, possibly across the internet. However, for lightweight applications that want simple XML parsing for internal use rather than portable data, this is too much work and overkill.

Commons Digester

Apache Jakarta's Commons Digester seems to be a step in the right direction. My guess is that this API started as an attempt to solve the problem I'm describing here, but it seems to have ballooned into a more heavyweight API that, again, attempts to be all things to everyone. The biggest issue I have with Commons Digester is that it requires too much help. I'll explain in more detail below, as I'll be comparing my proposed solution to Commons Digester.

Apache XMLBeans

XMLBeans is not very well documented. Oh, there's plenty of documentation, but it isn't very concise, and the examples are hard to follow. From what I can tell, XMLBeans uses its own schema compiler that generates interfaces and classes to handle the parsing. If I wanted to go through all of this work, I'd just use JAXB.

java.beans.XMLEncoder and java.beans.XMLDecoder

This is interesting, and seems relatively simple. Your beans must follow the JavaBeans conventions (a public no-argument constructor plus getter and setter pairs), and then Java will automatically convert the bean to XML and vice versa. However, the XML that is generated is very verbose, and includes attributes that help the API determine how to later parse the XML back into the beans. It makes for very clunky XML. But I applaud the attempt.

Solution

So why can't we have a simple, lightweight API that can parse XML into beans with very little configuration or programming? Commons Digester tries to do this, but it comes up short. By default, Commons Digester has to be told every little detail about how the XML is to be parsed. It doesn't attempt to discover the structure on its own. That's where SLASH comes in. SLASH is the Simple Lightweight Automatic Sax Handler. SLASH differs from Commons Digester in two ways. First, it is a subclass of the DefaultHandler class provided by the SAX 2.0 API, so it plugs into any SAX 2.0 parser and can itself be extended. Second, SLASH's default behavior is to attempt to discover the XML structure and build up an object map automatically, with little or no help from things like the "rules" that are used by Commons Digester.
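
To make the first point concrete, here is a minimal sketch of what "being a DefaultHandler" buys you. The LeafCollector class below is a hypothetical illustration written for this article, not part of SLASH: it simply records the text of every leaf element. What matters is that any subclass of DefaultHandler can be handed to a stock SAX parser in exactly this way.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example handler (not SLASH): records each leaf element as "name=text".
class LeafCollector extends DefaultHandler {
    final List<String> leaves = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean sawChild;   // true once the current element has closed a child

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);      // start collecting text for the new element
        sawChild = false;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!sawChild) {
            leaves.add(qName + "=" + text.toString().trim());
        }
        sawChild = true;        // the enclosing element now has at least one child
        text.setLength(0);
    }

    // Convenience wrapper: parse an XML string with a standard SAX parser.
    static List<String> parse(String xml) {
        try {
            LeafCollector handler = new LeafCollector();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.leaves;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The parser setup in parse() is plain JAXP, so a handler like SLASH would slot into the same call in place of LeafCollector.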

SLASH optionally uses "Hints", but they aren't required. Hints are provided to the handler as a java.util.Map in which each key is the local name of an XML tag and the corresponding value is the java.lang.Class of the bean that the handler should use when it encounters the tag specified by the key. If the Class object represents java.util.Collection or any of its subinterfaces or concrete implementations, then SLASH will treat every XML tag directly descended from the associated tag as an object to be added to that collection. For instance, if I have the XML:


<Computer>
   <Processor>1500mhz</Processor>
   <Memory>512MB</Memory>
   <Drive>100GB HD</Drive>
   <Drive>DVD RW</Drive>
</Computer>

And I provide a hint of the form ("Computer", java.util.ArrayList.class), then SLASH will construct an ArrayList every time it encounters a "Computer" tag, and add all subtag objects to that ArrayList. So in this example, it will produce an ArrayList of Strings containing the values of the Processor, Memory, Drive, and Drive tags specified in the XML. Had I also provided hints for the Processor, Memory, and Drive tags, it would have constructed the appropriate objects and added those to the ArrayList instead.
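
The collection behavior can be sketched in a few lines of plain SAX. The code below is my own simplified illustration of the idea, not SLASH's actual source: the class name CollectionHintHandler and its parse helper are made up for this example, it only handles hints that name a concrete Collection class, and it treats every other tag as plain text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch (not SLASH): a hinted tag becomes a collection of its subtag values.
class CollectionHintHandler extends DefaultHandler {
    private final Map<String, Class<?>> hints;
    private final Deque<Collection<Object>> open = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    Object result;

    CollectionHintHandler(Map<String, Class<?>> hints) { this.hints = hints; }

    private boolean isCollectionTag(String tag) {
        Class<?> hint = hints.get(tag);
        return hint != null && Collection.class.isAssignableFrom(hint);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        text.setLength(0);
        if (isCollectionTag(qName)) {
            try {
                // Assumes the hint is a concrete class with a no-argument constructor.
                open.push((Collection<Object>) hints.get(qName)
                        .getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new SAXException(e);
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // A hinted tag yields its finished collection; any other tag yields its text.
        Object value = isCollectionTag(qName) ? open.pop() : text.toString().trim();
        text.setLength(0);
        if (open.isEmpty()) {
            result = value;             // closed the outermost element
        } else {
            open.peek().add(value);     // add to the enclosing collection
        }
    }

    static Object parse(String xml, Map<String, Class<?>> hints) {
        try {
            CollectionHintHandler handler = new CollectionHintHandler(hints);
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling CollectionHintHandler.parse(xml, Map.of("Computer", java.util.ArrayList.class)) on the XML above returns an ArrayList holding the four strings in document order.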

Now if the hint provided is a java bean object such that it has a default constructor (no parameters) and setter methods corresponding to the subtag names, SLASH becomes much more powerful. SLASH will construct a new bean object each time it encounters that tag, and then will assume that a setter method exists for all subtags. For instance if I have the XML:


<Computer>
   <Processor>1500mhz</Processor>
   <Memory>512MB</Memory>
   <Drives>
      <Drive>100GB HD</Drive>
      <Drive>DVD RW</Drive>
   </Drives>
</Computer>

And I provide a hint of the form ("Computer", your.package.Computer.class) such that Computer.class is some bean object with a default constructor and methods setProcessor(String), setMemory(String), setDrives(Collection), SLASH will automatically parse this XML into a Computer object. When SLASH encounters the Processor and Memory tags, it calls the corresponding setter methods. Also, when SLASH encounters the Drives tag, it inspects the Computer bean via reflection and determines that the Computer bean expects a Collection, so it constructs a LinkedList (the default Collection implementation that SLASH prefers), and continues parsing the XML. As each Drive subtag is encountered, it parses the value and adds it to the LinkedList as a string, and then when it is done, it calls the setDrives method on the Computer.
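
The reflection steps just described can be sketched as follows. Again, this is a simplified illustration under my own assumptions rather than SLASH's real implementation: BeanHintHandler and its parse helper are hypothetical names, setters are looked up simply as "set" plus the tag name, and only String and Collection properties are handled.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedList;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Example bean with the setters named in the text.
class Computer {
    String processor, memory;
    Collection<?> drives;
    public Computer() { }
    public void setProcessor(String p) { processor = p; }
    public void setMemory(String m) { memory = m; }
    public void setDrives(Collection<?> d) { drives = d; }
}

// Hypothetical sketch (not SLASH): builds hinted beans and fills them through setters.
class BeanHintHandler extends DefaultHandler {
    private static final Object LEAF = new Object();   // marks a plain-text element
    private final Map<String, Class<?>> hints;
    private final Deque<Object> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    Object result;

    BeanHintHandler(Map<String, Class<?>> hints) { this.hints = hints; }

    // Returns the public one-argument method named "set" + tag, or null if none exists.
    private static Method findSetter(Object bean, String tag) {
        for (Method m : bean.getClass().getMethods()) {
            if (m.getName().equals("set" + tag) && m.getParameterCount() == 1) return m;
        }
        return null;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        text.setLength(0);
        Object parent = stack.peek();
        Class<?> hint = hints.get(qName);
        if (hint != null) {                             // hinted tag: construct the bean
            try {
                stack.push(hint.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new SAXException(e);
            }
            return;
        }
        boolean parentIsBean = parent != null && parent != LEAF && !(parent instanceof Collection);
        Method setter = parentIsBean ? findSetter(parent, qName) : null;
        if (setter != null) {
            Class<?> p = setter.getParameterTypes()[0];
            // The bean expects a Collection that a LinkedList can satisfy.
            if (Collection.class.isAssignableFrom(p) && p.isAssignableFrom(LinkedList.class)) {
                stack.push(new LinkedList<Object>());
                return;
            }
        }
        stack.push(LEAF);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void endElement(String uri, String localName, String qName) throws SAXException {
        Object frame = stack.pop();
        Object value = (frame == LEAF) ? text.toString().trim() : frame;
        text.setLength(0);
        Object parent = stack.peek();
        if (parent == null) {
            result = value;                             // closed the root element
        } else if (parent instanceof Collection) {
            ((Collection<Object>) parent).add(value);
        } else if (parent != LEAF) {
            Method setter = findSetter(parent, qName);
            if (setter != null) {
                try {
                    setter.invoke(parent, value);
                } catch (ReflectiveOperationException | IllegalArgumentException e) {
                    throw new SAXException(e);
                }
            }
        }
    }

    static Object parse(String xml, Map<String, Class<?>> hints) {
        try {
            BeanHintHandler handler = new BeanHintHandler(hints);
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the single hint Map.of("Computer", Computer.class), BeanHintHandler.parse returns a populated Computer whose drives property is a LinkedList of the two Drive values. A real handler would also need type conversion, attribute handling, and error reporting, which this sketch leaves out.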

In the examples above, all of the parsing is accomplished by providing only one hint to SLASH. In contrast, Commons Digester would have required at least one "Rule" per tag.

Summary

There has to be an easier way to parse XML into bean objects. And now there is... it's called SLASH.

Reader Comments:

02/19/2010 12:39 AM - CrazyHorse wrote:
Hello, i was surfing for java5 pros and cons. I find few things like annotations annoying to code. you usually do not code '@interface', which is not OO programming. to come back to the point :), there is one apache product Betwixt which did serve my purpose of converting xml to POJOs. Looking forward to more engaging articles. Regards

07/30/2009 10:41 AM - Rado wrote:
@Bharath: To read XML attributes, add this at line 275 of SlashHandler.java:

    if ((attributes != null) && (attributes.getLength() > 0)) {
        for (int i = 0; i < attributes.getLength(); i++) {
            // Call startElement/endElement recursively to initialize attributes as properties
            String aUri = attributes.getURI(i);
            String aLocalName = attributes.getLocalName(i);
            String aValue = attributes.getValue(i);
            String aQName = attributes.getQName(i);
            startElement(aUri, aLocalName, aQName, null);
            value = aValue;
            endElement(aUri, aLocalName, aQName);
        }
    }

07/30/2009 7:16 AM - R wrote:
Some example code showing how to map nodes to classes. The following will create a Computer object containing a LinkedList of Drives objects, and other properties.

The XML file:

    <?xml version="1.0" encoding="UTF-8" ?>
    <Computer>
       <Processor>1500mhz</Processor>
       <Memory>512MB</Memory>
       <Perifery>
          <Drives>
             <HDD>100GB HD</HDD>
             <CDROM>DVD RW</CDROM>
          </Drives>
          <Drives>
             <HDD>555GB HD</HDD>
             <CDROM>CD-R</CDROM>
          </Drives>
       </Perifery>
       <SW>
          <Pack>Star Office</Pack>
          <Pack>Firefox</Pack>
          <Pack>Vim Editor</Pack>
       </SW>
    </Computer>

From the main:

    try {
        //
        java.util.HashMap hints = new java.util.HashMap();
        hints.put("Computer", Computer.class);
        hints.put("Drives", Drives.class);
        SlashHandler handler = new SlashHandler(hints, true);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(f, handler);
        Computer result = (Computer) handler.getResult();
        System.out.println(result.toString());
    } catch ( ....

Drives.java:

    public class Drives {
        private String HDD;
        private String CDROM;

        @Override
        public String toString() {
            return "HDD: " + this.getHDD() + " CDROM: " + this.getCDROM() + "; ";
        }

        public Drives() { }

        public String getCDROM() { return CDROM; }
        public void setCDROM(String CDROM) { this.CDROM = CDROM; }
        public String getHDD() { return HDD; }
        public void setHDD(String HDD) { this.HDD = HDD; }
    }

Computer.java:

    // Serializable is not really needed, but who
    // knows... you might want to serialize it ...
    import java.io.Serializable;
    import java.util.Set;

    public class Computer implements Serializable {
        private String Processor;
        private String Memory;
        private Set<Drives> Perifery;
        private Set<String> SW;

        public Computer() { }

        public Computer(String Processor, String Memory) {
            this.Processor = Processor;
            this.Memory = Memory;
        }

        @Override
        public String toString() {
            return "\n CPU: " + this.getProcessor() + " Mem: " + Memory
                + "\n\tPerifery" + Perifery.toString()
                + "\n\tSW" + SW.toString() + "\n";
        }

        public String getMemory() { return Memory; }
        public void setMemory(String Memory) { this.Memory = Memory; }
        public String getProcessor() { return Processor; }
        public void setProcessor(String Processor) { this.Processor = Processor; }
        public Set<Drives> getPerifery() { return Perifery; }
        public void setPerifery(Set<Drives> Perifery) { this.Perifery = Perifery; }
        public Set<String> getSW() { return SW; }
        public void setSW(Set<String> SW) { this.SW = SW; }
    }

07/29/2009 10:51 AM - Rado wrote:
Wow! Sorry! I figured it out, it's in your DOC! one more Thx!

07/29/2009 10:22 AM - Rado wrote:
Hi, really good! thx! How to populate under nodes, as sub classes?

07/11/2009 7:43 PM - JC wrote:
A complete jaxb and xmlbeans. so, going to try slash

02/05/2009 3:41 PM - Bharath wrote:
Hi.. This is really a cool tool.. but can you please tell me how to automatically populate attributes for a node? I tried defining a settermethod for the attribute name but the method did not get call

 Last Updated: Tuesday June 10, 2014