X-Factor of XML parsers

Java XML parsers

I have used XML in Spring, Hibernate and some other frameworks. I never had to go through the pain of parsing and reading those configuration files, because the frameworks' APIs did that for me. About 10 days ago I attended an AXIOM training in my organization, and I sat there like a dumb person because I was completely unaware of things like push parsing and pull parsing. Anyhow, I had to complete that training and I did (but I don't even remember what AXIOM is ;)).

Then I started learning about these two things, push and pull parsing, and found them very interesting. So I thought of writing a tutorial on this topic. This is the very first tutorial of my life, so please give suggestions for improvements.
Basic Introduction:

Streaming (Event-based) versus DOM (Tree-based)

Generally speaking, there are two programming models for working with XML infosets: streaming and the document object model (DOM).

DOM (Tree-based): The DOM model involves creating in-memory objects that represent the entire document tree and the complete infoset state of an XML document.
· Tree-based APIs are easier to use, but they are less efficient because they read the whole document and hold it in memory.
· Tree APIs are therefore normally not practical for documents larger than a few megabytes, or in memory-constrained environments such as J2ME.
· Examples: DOM, JDOM.
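To make the tree-based model concrete, here is a minimal DOM sketch using the JDK's built-in JAXP API. The class name DomExample and the inline XML string (a trimmed copy of the test.xml used later) are my own choices for illustration. The whole document is parsed into a tree first, and only then can elements be read, in any order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomExample {

    // Reads the WHOLE document and builds the complete tree in memory.
    static Document load(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Once the tree exists, any element can be visited in any order.
    static String textOf(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = load("<Customers><Customer id=\"1\"><Name>ABC Pizza</Name>"
                + "<City>Simsbury</City></Customer></Customers>");
        System.out.println(textOf(doc, "City"));  // Simsbury: read the "later" element first
        System.out.println(textOf(doc, "Name"));  // ABC Pizza: then go back to an earlier one
        Element customer = (Element) doc.getElementsByTagName("Customer").item(0);
        System.out.println(customer.getAttribute("id")); // 1
    }
}
```

The convenience of jumping backwards and forwards is exactly what costs memory: the entire tree has to exist before the first lookup.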

Streaming (Event-based):

Streaming refers to a programming model in which XML infosets are transmitted and parsed serially at application runtime, often in real time, and often from dynamic sources whose contents are not precisely known beforehand.
· It provides a smaller memory footprint, reduced processor requirements, and higher performance in certain situations.
· Examples: StAX, SAX.

What does streaming mean?

A parser is streaming in the sense that it is lazy and does not read anything in until it is needed, and also in the sense that it reads everything forwards (never backwards). Such a system can be very memory efficient and easy to program with. This is what I thought StAX would be before I saw the StAX examples.
Streaming-based APIs in turn come in two different models:

(1) Streaming push parsing (example: SAX)
(2) Streaming pull parsing (example: StAX)
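A minimal sketch of this forward-only, lazy reading with the JDK's StAX cursor API (XMLStreamReader). The class and method names are hypothetical. The loop can only move forwards with next(), and it returns as soon as it finds the element it wants, so the application never asks for the rest of the document (the parser may still buffer some input ahead internally).

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class LazyStreamingExample {

    // Returns the text of the first element named `tag`, or null if there is none.
    static String firstText(String xml, String tag) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();   // forwards only; there is no "previous"
                    if (event == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(tag)) {
                        // Reads up to the matching end tag, then we stop pulling.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Customers><Customer id=\"1\"><Name>ABC Pizza</Name>"
                + "<City>Simsbury</City></Customer></Customers>";
        System.out.println(firstText(xml, "Name"));  // ABC Pizza
    }
}
```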

Streaming pull parsing refers to a programming model in which a client application calls methods on an XML parsing library when it needs to interact with an XML infoset–that is, the client only gets (pulls) XML data when it explicitly asks for it.

Streaming push parsing refers to a programming model in which an XML parser sends (pushes) XML data to the client as the parser encounters elements in an XML infoset–that is, the parser sends the data whether or not the client is ready to use it at that time.

Push Parsing vs. Pull Parsing:

(1) In pull parsing, the client application (not the parser) decides when the next event is read, so the client rather than the parser regulates parsing. In push parsing, control is in the hands of the parser, not the application.
(2) Pull parsing libraries can be smaller than push parsing libraries, and the client code that uses them is often simpler.
(3) Pull parsing clients can read multiple XML documents at the same time with a single thread.
(4) Pull parsing lets you filter XML documents and skip the parsing events you are not interested in, which shows that you have full control of the parsing process. A push parser, in contrast, keeps generating events until the end of the document.
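Points (3) and (4) can be sketched with the StAX cursor API. This is an illustration under my own assumptions (hypothetical class and method names, inline XML strings): nextStartElement() skips every event except start tags, and interleave() uses a single thread to pull from two documents in turn, which the callbacks of a push parser do not let you do directly on one thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullControlExample {

    // Point (4): pull past every event we do not care about and return the
    // local name of the next start tag, or null at the end of the document.
    static String nextStartElement(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                return reader.getLocalName();
            }
        }
        return null;
    }

    // Point (3): one thread alternates between two documents, one start tag at a time.
    static List<String> interleave(String xmlA, String xmlB) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        List<String> order = new ArrayList<>();
        try {
            XMLStreamReader a = factory.createXMLStreamReader(new StringReader(xmlA));
            XMLStreamReader b = factory.createXMLStreamReader(new StringReader(xmlB));
            String fromA = nextStartElement(a);
            String fromB = nextStartElement(b);
            while (fromA != null || fromB != null) {
                if (fromA != null) {
                    order.add("A:" + fromA);
                    fromA = nextStartElement(a);
                }
                if (fromB != null) {
                    order.add("B:" + fromB);
                    fromB = nextStartElement(b);
                }
            }
            a.close();
            b.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return order;
    }

    public static void main(String[] args) {
        // prints [A:x, B:p, A:y, B:q, B:r]
        System.out.println(interleave("<x><y/></x>", "<p><q/><r/></p>"));
    }
}
```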

In this tutorial I use SAX as the push parser and StAX as the pull parser. SAX is fast and efficient, but its event model makes it most useful for state-independent filtering. For example, a SAX parser calls one method in your application when an element tag is encountered and a different method when text is found. If the processing you're doing is state-independent (meaning that it does not depend on the elements that have come before), then SAX works fine.
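As a sketch of state-independent filtering with SAX, the handler below counts start tags by name. Each callback is handled on its own, without needing to know which elements came before it. TagCounter is a hypothetical name, and I use JAXP's SAXParserFactory to obtain the parser.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TagCounter extends DefaultHandler {

    // Tag name -> number of occurrences. The running totals are the only
    // state; no callback cares about its position in the document.
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum);
    }

    static Map<String, Integer> count(String xml) {
        TagCounter handler = new TagCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.counts;
    }

    public static void main(String[] args) {
        System.out.println(count("<a><b/><b/><c><b/></c></a>"));  // {a=1, b=3, c=1}
    }
}
```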

SAX provides event-driven XML processing that follows the push parsing model. In SAX, applications register listeners, in the form of handlers, with the parser and are notified through callback methods. The SAX parser takes control of the application thread and pushes events to the application.

Example of Push parsing using SAX:

package com.pushparser;

import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SAXXMLReader extends DefaultHandler
{
    // Counts every callback the parser pushes to this handler.
    static int eventCount = 0;

    public static void main (String args[])
        throws Exception
    {
        // XMLReaderFactory is deprecated since Java 9; a namespace-aware
        // JAXP SAXParserFactory provides the same XMLReader interface.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xr = factory.newSAXParser().getXMLReader();
        SAXXMLReader handler = new SAXXMLReader();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);

        // Parse test.xml from the working directory. An InputStream lets the
        // parser honor the encoding declared in the XML header. From here on
        // the parser is in control and calls back into the handler.
        try (InputStream in = new FileInputStream("test.xml")) {
            xr.parse(new InputSource(in));
        }

        System.out.println("count.......... " + eventCount);
    }

    public SAXXMLReader ()
    {
        super();
    }

    ////////////////////////////////////////////////////////////////////
    // Event handlers.
    ////////////////////////////////////////////////////////////////////

    @Override
    public void startDocument ()
    {
        eventCount++;
        System.out.println("Start document");
    }

    @Override
    public void endDocument ()
    {
        eventCount++;
        System.out.println("End document");
    }

    @Override
    public void startElement (String uri, String name,
                              String qName, Attributes atts)
    {
        eventCount++;
        if ("".equals (uri))
            System.out.println("Start element: " + qName);
        else
            System.out.println("Start element: {" + uri + "}" + name);
    }

    @Override
    public void endElement (String uri, String name, String qName)
    {
        eventCount++;
        if ("".equals (uri))
            System.out.println("End element: " + qName);
        else
            System.out.println("End element: {" + uri + "}" + name);
    }

    // Prints the text chunk, escaping backslashes, quotes and whitespace.
    @Override
    public void characters (char ch[], int start, int length)
    {
        eventCount++;
        System.out.print("Characters: \"");
        for (int i = start; i < start + length; i++) {
            switch (ch[i]) {
            case '\\':
                System.out.print("\\\\");
                break;
            case '"':
                System.out.print("\\\"");
                break;
            case '\n':
                System.out.print("\\n");
                break;
            case '\r':
                System.out.print("\\r");
                break;
            case '\t':
                System.out.print("\\t");
                break;
            default:
                System.out.print(ch[i]);
                break;
            }
        }
        System.out.print("\"\n");
    }

}

test.xml

<?xml version='1.0' encoding='utf-8'?>
<Customers>
  <Customer id="1">
    <Name>ABC Pizza</Name>
    <Address>1 Main Street</Address>
    <City>Simsbury</City>
    <State>CT</State>
    <Zip>06070</Zip>
  </Customer>
</Customers>

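This is a simple example of reading an XML file with a SAX parser. You override the callback methods you need, such as startDocument(), from the DefaultHandler class (which implements the ContentHandler interface). Because SAX is a push parser, the application does not control the parsing: the parser drives the process and walks through the whole document. For the test.xml above the handler typically receives 29 callbacks in total: startDocument and endDocument, 7 startElement and 7 endElement calls, and 13 characters calls (5 text values plus 8 runs of whitespace between tags). The number of characters calls can vary, because a parser is allowed to split a run of text across several calls.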
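Because the parser drives the process, standard SAX gives the handler no method to pause or stop parsing; the common workaround is to throw a SAXException from a callback once you have what you need. A minimal sketch, assuming a hypothetical handler StopEarlyHandler that looks for the first text-only element with a given name:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StopEarlyHandler extends DefaultHandler {

    private final String wanted;                      // element whose text we want
    private final StringBuilder text = new StringBuilder();
    private boolean inside;                           // currently inside <wanted>?
    private boolean found;                            // set just before we abort

    StopEarlyHandler(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        inside = qName.equals(wanted);                // assumes <wanted> holds only text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) {
            text.append(ch, start, length);           // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals(wanted)) {
            found = true;
            // There is no "stop" call in SAX; throwing is the way out.
            throw new SAXException("found <" + wanted + ">, stopping early");
        }
    }

    // Returns the text of the first <tag> element, or null if there is none.
    static String find(String xml, String tag) {
        StopEarlyHandler handler = new StopEarlyHandler(tag);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            if (!handler.found) {
                throw new IllegalArgumentException("Could not parse XML", e);
            }
            // Otherwise this is our own deliberate abort, not a real error.
        }
        return handler.found ? handler.text.toString() : null;
    }

    public static void main(String[] args) {
        System.out.println(find("<Customer><Name>ABC Pizza</Name><City>Simsbury</City></Customer>", "Name"));
    }
}
```

Using an exception for normal control flow is awkward, which is part of why pull parsing feels more natural when you only need part of a document.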

On the other hand, for state-dependent processing, where the program needs to do one thing with the data under element A but something different with the data under element B, a pull parser such as the Streaming API for XML (StAX) is a better choice.
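Here is a sketch of state-dependent processing with StAX. The class CustomerPullReader and its output format are hypothetical, and it assumes, as in test.xml, that the children of a Customer element contain only text. Because the application pulls events, it can hand the reader to a method when it reaches a Customer start tag; the "which customer am I in" state is then just that method's local variables, instead of flags kept across SAX callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CustomerPullReader {

    // Reads every <Customer> into a one-line summary "id: name (city)".
    static List<String> readCustomers(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals("Customer")) {
                    result.add(readCustomer(r));
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return result;
    }

    // Called with the reader on <Customer>; returns with it on </Customer>.
    private static String readCustomer(XMLStreamReader r) throws XMLStreamException {
        String id = r.getAttributeValue(null, "id");
        String name = "";
        String city = "";
        // nextTag() skips whitespace and stops at the next start or end tag.
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            String child = r.getLocalName();
            String text = r.getElementText();  // leaves the reader on the child's end tag
            if (child.equals("Name")) {
                name = text;
            } else if (child.equals("City")) {
                city = text;
            }
        }
        return id + ": " + name + " (" + city + ")";
    }

    public static void main(String[] args) {
        String xml = "<Customers>\n  <Customer id=\"1\">\n    <Name>ABC Pizza</Name>\n"
                + "    <Address>1 Main Street</Address>\n    <City>Simsbury</City>\n"
                + "    <State>CT</State>\n    <Zip>06070</Zip>\n  </Customer>\n</Customers>";
        System.out.println(readCustomers(xml));  // [1: ABC Pizza (Simsbury)]
    }
}
```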

StAX follows the pull parsing model: the application takes control of parsing the XML document by pulling (taking) events from the parser.

Example of Pull parsing using StAX:

package com.pullparser;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class SimpleXmlEventReader {
    public static void main(String[] args) throws IOException, XMLStreamException {
        String filename = "test.xml";

        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (InputStream in = new FileInputStream(filename)) {
            XMLEventReader reader = factory.createXMLEventReader(in);
            try {
                while (reader.hasNext()) {
                    /*
                     * Here is the main difference: you pull something from the
                     * XML when you are ready, but in the push parsing model the
                     * parser pushes the results to the client whether the client
                     * is ready to take them or not.
                     */
                    XMLEvent event = reader.nextEvent();

                    System.out.println("event.getEventType()..." + event.getEventType());

                    switch (event.getEventType()) {
                    case XMLEvent.START_ELEMENT:
                        StartElement se = event.asStartElement();
                        System.out.print("<" + se.getName());

                        Iterator<?> attributes = se.getAttributes();
                        while (attributes.hasNext()) {
                            Attribute attr = (Attribute) attributes.next();
                            System.out.print(" " + attr.getName() + "=\""
                                    + attr.getValue() + "\"");
                        }
                        System.out.print(">");

                        // Look ahead without consuming: if text follows the start
                        // tag, pull it now and print it unless it is only whitespace.
                        XMLEvent nextEvent = reader.peek();
                        if (nextEvent != null && nextEvent.isCharacters()) {
                            Characters c = reader.nextEvent().asCharacters();
                            if (!c.isWhiteSpace())
                                System.out.print(c.getData());
                        }
                        break;

                    case XMLEvent.END_ELEMENT:
                        EndElement ee = event.asEndElement();
                        System.out.print("</" + ee.getName() + ">");
                        System.out.println();
                        break;
                    }
                }
            } finally {
                // Close the reader only after all events have been pulled.
                reader.close();
            }
        }
    }
}

You can use the same XML file for this example. The key point is the comment inside the loop: here you pull the next event from the XML when you are ready for it, whereas in the push parsing model the parser pushes the results to the client whether or not the client is ready to take them.

You traverse the XML by getting the next event with the reader.nextEvent() method, so the application has full control.
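Because the application pulls every event itself, it can also decide to ignore whole parts of the document. A minimal sketch with the same XMLEventReader API, assuming a hypothetical rule of skipping everything inside an element with a given name (for example Address) by counting nesting depth:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class SkipSubtreeExample {

    // Collects start-tag names in document order, but when it meets an element
    // named `skip` it consumes that element's whole subtree without recording it.
    static List<String> namesSkipping(String xml, String skip) {
        List<String> names = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                String name = event.asStartElement().getName().getLocalPart();
                if (name.equals(skip)) {
                    skipToMatchingEnd(reader);
                } else {
                    names.add(name);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return names;
    }

    // Assumes the start tag was just consumed; pulls events until its matching end tag.
    private static void skipToMatchingEnd(XMLEventReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            XMLEvent next = reader.nextEvent();
            if (next.isStartElement()) {
                depth++;
            } else if (next.isEndElement()) {
                depth--;
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<Customer><Name>ABC Pizza</Name>"
                + "<Address><Line>1 Main Street</Line></Address><City>Simsbury</City></Customer>";
        System.out.println(namesSkipping(xml, "Address"));  // [Customer, Name, City]
    }
}
```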

This is the basic difference between the push and pull parsing models: in push parsing the parser controls the whole process, whereas in pull parsing the application controls it itself.

I have tried to make this difference clear in this tutorial. I am not sure whether I have succeeded, but I am sure I will get suggestions for improving it. Thanks!

Regards,
