XML Processors

Suryateja Pericherla 01/16/2025 1:31pm GMT+0530 Categories: XML

This article explains XML processors and two popular APIs for parsing XML documents, SAX and DOM. Example Java programs are provided for both.

Contents

1 Introduction
2 SAX Approach
3 DOM Approach

Introduction

XML processors are needed for the following reasons:

The processor must check the basic syntax of the document for well-formedness.
The processor must replace every entity reference with the entity's definition.
The processor must supply the default values declared for attributes (in a DTD or XML Schema) wherever the XML document omits them.
A validating processor must also check the validity of the XML document when a DTD or an XML Schema is specified.

Although an XML document has a regular and elegant structure, that structure does not by itself give applications convenient access to the document's data. This need led to the development of two standard APIs for XML processors: SAX (Simple API for XML) and DOM (Document Object Model).
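The processor duties listed above can be observed with the parser that ships with the JDK. The following is a minimal sketch, not a definitive implementation: the class name ProcessorDuties, the helper names, the sample document, the &co; entity and the currency attribute are all made up for illustration. It uses the DOM API, which is described later in this article, and assumes the default (non-validating) JDK parser configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ProcessorDuties {
    // Hypothetical sample: the internal DTD subset declares an entity (&co;)
    // and a default value for the "currency" attribute.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE staff [\n"
        + "  <!ELEMENT staff (#PCDATA)>\n"
        + "  <!ATTLIST staff currency CDATA \"USD\">\n"
        + "  <!ENTITY co \"Example Corp\">\n"
        + "]>\n"
        + "<staff>works at &co;</staff>";

    // Parses the text and returns the root element.
    // Throws IllegalArgumentException if the text is not well-formed XML.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (SAXException e) {
            throw new IllegalArgumentException("Not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean isWellFormed(String xml) {
        try {
            parseRoot(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Element staff = parseRoot(XML);
        // Entity reference replaced by its definition.
        System.out.println(staff.getTextContent());          // works at Example Corp
        // Default attribute value supplied by the processor.
        System.out.println(staff.getAttribute("currency"));  // USD
        // Mismatched tags fail the well-formedness check.
        System.out.println(isWellFormed("<staff><name>yong</staff>")); // false
    }
}
```

Checking validity against a DTD would additionally require a validating parser (for example, calling setValidating(true) on the factory), which this sketch leaves out.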

SAX Approach

The SAX standard, released in May 1998, was developed by an XML user group, XML-DEV. SAX has become a de facto standard and is supported by most XML processors.

The SAX approach to processing is known as event processing. The processor scans the document sequentially from beginning to end. Every time it recognizes a syntactic structure such as an opening tag, an attribute, text, or a closing tag, the processor signals an event to the application by calling the event handler for that structure. The interfaces that describe the event handlers form the SAX API.

Below is an example Java program that reads an XML document using the SAX API:

//file.xml - XML document
<?xml version="1.0"?>
<company>
   <staff>
      <firstname>yong</firstname>
      <lastname>mook kim</lastname>
      <nickname>mkyong</nickname>
      <salary>100000</salary>
   </staff>
   <staff>
      <firstname>low</firstname>
      <lastname>yin fong</lastname>
      <nickname>fong fong</nickname>
      <salary>200000</salary>
   </staff>
</company>

//ReadXMLFile.java - Java File
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ReadXMLFile { 
   public static void main(String argv[]) { 
      try { 
         SAXParserFactory factory = SAXParserFactory.newInstance();
         SAXParser saxParser = factory.newSAXParser(); 
         DefaultHandler handler = new DefaultHandler() { 
            boolean bfname = false;
            boolean blname = false;
            boolean bnname = false;
            boolean bsalary = false; 
            public void startElement(String uri, String localName,String qName,Attributes attributes) throws SAXException { 
               System.out.println("Start Element :" + qName); 
               if (qName.equalsIgnoreCase("FIRSTNAME")) {
                  bfname = true;
               } 
               if (qName.equalsIgnoreCase("LASTNAME")) {
                  blname = true;
               } 
               if (qName.equalsIgnoreCase("NICKNAME")) {
                  bnname = true;
               } 
               if (qName.equalsIgnoreCase("SALARY")) {
                  bsalary = true;
               } 
            }

            public void endElement(String uri, String localName,String qName) throws SAXException { 
               System.out.println("End Element :" + qName); 
            }

            public void characters(char ch[], int start, int length) throws SAXException { 
               if (bfname) {
                  System.out.println("First Name : " + new String(ch, start, length));
                  bfname = false;
               } 
               if (blname) {
                  System.out.println("Last Name : " + new String(ch, start, length));
                  blname = false;
               } 
               if (bnname) {
                  System.out.println("Nick Name : " + new String(ch, start, length));
                  bnname = false;
               } 
               if (bsalary) {
                  System.out.println("Salary : " + new String(ch, start, length));
                  bsalary = false;
               } 
            } 
         };

         // Pass a File: the String overload of parse() expects a URI, not a file path.
         saxParser.parse(new java.io.File("c:\\file.xml"), handler);
      }
      catch (Exception e) 
      {
         e.printStackTrace();
      } 
   } 
}

Output:

Start Element :company
Start Element :staff
Start Element :firstname
First Name : yong
End Element :firstname
Start Element :lastname
Last Name : mook kim
End Element :lastname
Start Element :nickname
Nick Name : mkyong
End Element :nickname
Start Element :salary
Salary : 100000
End Element :salary
End Element :staff
Start Element :staff
Start Element :firstname
First Name : low
End Element :firstname
Start Element :lastname
Last Name : yin fong
End Element :lastname
Start Element :nickname
Nick Name : fong fong
End Element :nickname
Start Element :salary
Salary : 200000
End Element :salary
End Element :staff
End Element :company
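
One caveat about the program above: SAX allows a parser to deliver the text of a single element in several calls to characters(), so a handler that prints on the first call and then clears its flag can lose part of the text (this commonly happens around entity references such as &amp;). A more robust pattern is to buffer the text and use it in endElement(). The following is a minimal sketch of that pattern; the class name BufferedTextHandler, the helper nicknamesIn and the sample input are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> nicknames = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once for one run of text, so append.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("nickname")) {
            nicknames.add(text.toString().trim()); // the complete text is known here
        }
        text.setLength(0);
    }

    public List<String> getNicknames() {
        return nicknames;
    }

    // Parses the given XML text and returns the content of every <nickname>.
    static List<String> nicknamesIn(String xml) {
        try {
            BufferedTextHandler handler = new BufferedTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getNicknames();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<company>"
                   + "<staff><nickname>mkyong</nickname></staff>"
                   + "<staff><nickname>fong &amp; fong</nickname></staff>"
                   + "</company>";
        System.out.println(nicknamesIn(xml)); // [mkyong, fong & fong]
    }
}
```

Because the handler keeps only one buffer and the list of results, its memory use does not grow with the size of the document tree.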

DOM Approach

An alternative to the SAX approach is DOM. In the DOM approach, the parser in the XML processor builds a DOM tree for the whole XML document. The nodes of the tree are represented as objects that the application can access and manipulate.

The advantages of DOM over SAX are:

DOM is better when a part of an XML document must be accessed more than once.
DOM is better for rearranging (for example, sorting) the elements in an XML document.
DOM supports random access to any part of the document, whereas SAX provides only sequential access.
Because the parser reads the whole document before the application processes any of it, DOM avoids processing a document that is later found to be invalid.
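
The rearranging advantage can be sketched as follows. This is an illustrative example under stated assumptions, not part of the original programs: the class name SortStaff and its helpers are made up, and the input is assumed to use the same company/staff/nickname/salary structure as the sample documents in this article. It sorts the staff elements by salary, highest first, by moving nodes inside the DOM tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SortStaff {
    // Text of the first descendant element with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Reorders <staff> elements by salary (descending) inside the tree and
    // returns the nicknames in the new document order.
    static List<String> nicknamesBySalaryDescending(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element company = doc.getDocumentElement();

            // Copy the live NodeList into a plain list before moving nodes.
            NodeList nodes = company.getElementsByTagName("staff");
            List<Element> staff = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                staff.add((Element) nodes.item(i));
            }

            staff.sort(Comparator.comparingLong(
                    (Element member) -> Long.parseLong(text(member, "salary")))
                    .reversed());

            // appendChild moves a node that is already in the tree,
            // so this loop rearranges the elements in place.
            for (Element member : staff) {
                company.appendChild(member);
            }

            // Read the tree again to confirm the new order.
            List<String> result = new ArrayList<>();
            NodeList sorted = company.getElementsByTagName("staff");
            for (int i = 0; i < sorted.getLength(); i++) {
                result.add(text((Element) sorted.item(i), "nickname"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<company>"
            + "<staff><nickname>mkyong</nickname><salary>100000</salary></staff>"
            + "<staff><nickname>fong fong</nickname><salary>200000</salary></staff>"
            + "</company>";
        System.out.println(nicknamesBySalaryDescending(xml)); // [fong fong, mkyong]
    }
}
```

Doing the same with SAX would require the application to store the elements itself, since each event is seen only once and in document order.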

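The disadvantages of DOM compared with SAX are:

The DOM structure (tree) is stored entirely in memory, so large XML documents require a large amount of memory.
Very large documents may not fit in memory at all, in which case they cannot be parsed using DOM.
DOM is typically slower than SAX, because the whole tree must be built before the application can use it.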

Below is an example Java program that reads an XML document using the DOM API:

//file.xml - XML document
<?xml version="1.0"?>
<company>
   <staff id="1001">
      <firstname>yong</firstname>
      <lastname>mook kim</lastname>
      <nickname>mkyong</nickname>
      <salary>100000</salary>
   </staff>
   <staff id="2001">
      <firstname>low</firstname>
      <lastname>yin fong</lastname>
      <nickname>fong fong</nickname>
      <salary>200000</salary>
   </staff>
</company>

//ReadXMLFile.java - Java File
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;

public class ReadXMLFile { 
   public static void main(String argv[]) { 
      try {
         File fXmlFile = new File("c:\\file.xml"); // the file.xml document shown above
         DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
         DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
         Document doc = dBuilder.parse(fXmlFile);
         doc.getDocumentElement().normalize(); 
         System.out.println("Root element :" + doc.getDocumentElement().getNodeName()); 
         NodeList nList = doc.getElementsByTagName("staff"); 
         System.out.println("----------------------------"); 
         for (int temp = 0; temp < nList.getLength(); temp++) { 
            Node nNode = nList.item(temp); 
            System.out.println("\nCurrent Element :" + nNode.getNodeName()); 
            if (nNode.getNodeType() == Node.ELEMENT_NODE) { 
               Element eElement = (Element) nNode; 
               System.out.println("Staff id : " + eElement.getAttribute("id"));
               System.out.println("First Name : " + eElement.getElementsByTagName("firstname").item(0).getTextContent());
               System.out.println("Last Name : " + eElement.getElementsByTagName("lastname").item(0).getTextContent());
               System.out.println("Nick Name : " + eElement.getElementsByTagName("nickname").item(0).getTextContent());
               System.out.println("Salary : " + eElement.getElementsByTagName("salary").item(0).getTextContent()); 
            }
         }
      } 
      catch (Exception e) {
         e.printStackTrace();
      }
   } 
}

Output:

Root element :company
----------------------------

Current Element :staff
Staff id : 1001
First Name : yong
Last Name : mook kim
Nick Name : mkyong
Salary : 100000

Current Element :staff
Staff id : 2001
First Name : low
Last Name : yin fong
Nick Name : fong fong
Salary : 200000

Suryateja Pericherla

Suryateja Pericherla is at present a Research Scholar (full-time Ph.D.) in the Dept. of Computer Science & Systems Engineering at Andhra University, Visakhapatnam. He previously worked as an Associate Professor in the Dept. of CSE at Vishnu Institute of Technology, India.

He has 11+ years of teaching experience and is an individual researcher whose research interests are Cloud Computing, Internet of Things, Computer Security, Network Security and Blockchain.

He is a member of professional societies such as IEEE, ACM, CSI and ISCA. He has published several research papers, which are indexed by SCIE, WoS, Scopus, Springer and others.
