This article explains XML Processors and two of the popular XML document parsing APIs namely SAX and DOM. Java example programs are also provided.
Introduction
XML processors are needed for the following reasons:
- The processor must check the basic syntax of the document for well-formedness.
- The processor must replace all occurrences of an entity with its definition.
- The processor must copy the default values for attributes in a XML document.
- The processor must check for the validity of the XML document if either a DTD or XML Schema is included.
Although an XML document exhibits a regular and elegant structure, that structure does not provide applications with convenient access to document’s data. This need led to the development of two standard APIs for XML processors: SAX (Simple API for XML) and DOM (Document Object Model).
SAX Approach
The SAX standard, released in May 1998, was developed by an XML user group, XML-DEV. SAX has been widely accepted as a de facto standard and is widely supported by XML processors.
The SAX approach to processing is known as event processing. The processor scans the document from beginning to end sequentially. Every time a syntactic structure like opening tag, attributes, text or a closing tag is recognized, the processor signals an event to the application by calling an event handler for the particular structure that was found. The interfaces that describe the event handlers form the SAX API.
Below is an example Java program which reads an XML document using SAX API:
//file.xml - XML document
<?xml version="1.0"?>
<company>
<staff>
<firstname>yong</firstname>
<lastname>mook kim</lastname>
<nickname>mkyong</nickname>
<salary>100000</salary>
</staff>
<staff>
<firstname>low</firstname>
<lastname>yin fong</lastname>
<nickname>fong fong</nickname>
<salary>200000</salary>
</staff>
</company>
//ReadXMLFile.java - Java File
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXMLFile {
public static void main(String argv[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bfname = false;
boolean blname = false;
boolean bnname = false;
boolean bsalary = false;
public void startElement(String uri, String localName,String qName,Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("FIRSTNAME")) {
bfname = true;
}
if (qName.equalsIgnoreCase("LASTNAME")) {
blname = true;
}
if (qName.equalsIgnoreCase("NICKNAME")) {
bnname = true;
}
if (qName.equalsIgnoreCase("SALARY")) {
bsalary = true;
}
}
public void endElement(String uri, String localName,String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length) throws SAXException {
if (bfname) {
System.out.println("First Name : " + new String(ch, start, length));
bfname = false;
}
if (blname) {
System.out.println("Last Name : " + new String(ch, start, length));
blname = false;
}
if (bnname) {
System.out.println("Nick Name : " + new String(ch, start, length));
bnname = false;
}
if (bsalary) {
System.out.println("Salary : " + new String(ch, start, length));
bsalary = false;
}
}
};
saxParser.parse("c:\\file.xml", handler);
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
Output:
Start Element :company
Start Element :staff
Start Element :firstname
First Name : yong
End Element :firstname
Start Element :lastname
Last Name : mook kim
End Element :lastname
Start Element :nickname
Nick Name : mkyong
End Element :nickname
Start Element :salary
Salary : 100000
End Element :salary
End Element :staff
Start Element :staff
Start Element :firstname
First Name : low
End Element :firstname
Start Element :lastname
Last Name : yin fong
End Element :lastname
Start Element :nickname
Nick Name : fong fong
End Element :nickname
Start Element :salary
Salary : 200000
End Element :salary
End Element :staff
End Element :company
DOM Approach
An alternative to SAX approach is DOM. In a XML processor, the parser builds the DOM tree for an XML document. The nodes of the tree are represented as objects that can be accessed and manipulated by the application.
The advantages of DOM over SAX are:
- DOM is better for accessing a part of an XML document more than once.
- DOM is better for rearranging (sorting) the elements in a XML document.
- DOM is best for random access over SAX’s sequential access.
- DOM can detect invalid nodes later in the document without any further processing.
The disadvantages of DOM over SAX are:
- DOM structure (tree) is stored entirely in the memory, so large XML documents require more memory.
- Large documents cannot be parsed using DOM.
- DOM is slower when compared to SAX.
Below is an example Java program which reads an XML document using DOM API:
//file.xml - XML document
<?xml version="1.0"?>
<company>
<staff id="1001">
<firstname>yong</firstname>
<lastname>mook kim</lastname>
<nickname>mkyong</nickname>
<salary>100000</salary>
</staff>
<staff id="2001">
<firstname>low</firstname>
<lastname>yin fong</lastname>
<nickname>fong fong</nickname>
<salary>200000</salary>
</staff>
</company>
//ReadXMLFile.java - Java File
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class ReadXMLFile {
public static void main(String argv[]) {
try {
File fXmlFile = new File("/Users/mkyong/staff.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("staff");
System.out.println("----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Staff id : " + eElement.getAttribute("id"));
System.out.println("First Name : " + eElement.getElementsByTagName("firstname").item(0).getTextContent());
System.out.println("Last Name : " + eElement.getElementsByTagName("lastname").item(0).getTextContent());
System.out.println("Nick Name : " + eElement.getElementsByTagName("nickname").item(0).getTextContent());
System.out.println("Salary : " + eElement.getElementsByTagName("salary").item(0).getTextContent());
}
}
}
catch (Exception e) {
e.printStackTrace();
}
}
}
Output:
Root element :company
—————————-
Current Element :staff
Staff id : 1001
First Name : yong
Last Name : mook kim
Nick Name : mkyong
Salary : 100000
Current Element :staff
Staff id : 2001
First Name : low
Last Name : yin fong
Nick Name : fong fong
Salary : 200000
Suryateja Pericherla, at present is a Research Scholar (full-time Ph.D.) in the Dept. of Computer Science & Systems Engineering at Andhra University, Visakhapatnam. Previously worked as an Associate Professor in the Dept. of CSE at Vishnu Institute of Technology, India.
He has 11+ years of teaching experience and is an individual researcher whose research interests are Cloud Computing, Internet of Things, Computer Security, Network Security and Blockchain.
He is a member of professional societies like IEEE, ACM, CSI and ISCA. He published several research papers which are indexed by SCIE, WoS, Scopus, Springer and others.
Leave a Reply