This article explains the Document Type Definition (DTD), one of the ways to specify the high-level syntax of an XML document. We will learn how to create DTDs and use them with XML documents.
Introduction
A Document Type Definition (DTD) is a set of structural rules, called declarations, that the elements, attributes and entities of an XML document must follow. A document can be tested against a DTD to determine whether it conforms to the rules that the DTD describes.
A DTD can be embedded in the XML document, in which case it is called an internal DTD, or it can be placed in a separate file that can be linked to several XML documents, in which case it is called an external DTD.
Syntactically, a DTD is a sequence of declarations, each of which has the form of a markup declaration as shown below:
<!keyword … >
The keyword can be any one of the following four keywords:
- ELEMENT – which defines a tag
- ATTLIST – which defines the attributes of a tag
- ENTITY – which defines an entity
- NOTATION – which declares a notation, that is, a name for the format of non-XML data
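As a rough illustration of the four keywords, the sketch below places one declaration of each kind in an internal DTD. The element name note, the attribute lang, the entity org and the notation png are made-up names chosen only for this example.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE note [
  <!-- ELEMENT: declares the note element and its content -->
  <!ELEMENT note (#PCDATA)>
  <!-- ATTLIST: declares an optional lang attribute on note -->
  <!ATTLIST note lang CDATA #IMPLIED>
  <!-- ENTITY: declares a general entity, referenced as &org; -->
  <!ENTITY org "Example Organization">
  <!-- NOTATION: names a non-XML data format (declared here but not used) -->
  <!NOTATION png SYSTEM "image/png">
]>
<note lang="en">Issued by &org;.</note>
```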
Declaring Elements
Each element declaration in a DTD specifies the structure of one category of elements. The declaration provides the element name along with the specification of the structure of that element.
An XML document can be thought of as a tree. An element is an internal node or a leaf node in the tree. The form of an element declaration for elements that contain other elements is as shown below:
<!ELEMENT element_name (list of names of child elements)>
For example, the declaration of a student element can be created as shown below:
<!ELEMENT student (name, regdno, branch, section)>
Multiple occurrences of a child element can be specified by appending one of the following modifiers to its name:
- + – one or more occurrences
- * – zero or more occurrences
- ? – zero or one occurrence
For example, consider the modified declaration of the above student element:
<!ELEMENT student (name, regdno, branch, section?, email*)>
In the above declaration, the section element can occur zero or one time and the email element can occur zero or more times.
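To see the effect of the modifiers, here is a small sketch of a document that should satisfy the modified student declaration. The students wrapper element, the second student and the email addresses are assumptions added for illustration only.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE students [
  <!ELEMENT students (student+)>
  <!ELEMENT student (name, regdno, branch, section?, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT regdno (#PCDATA)>
  <!ELEMENT branch (#PCDATA)>
  <!ELEMENT section (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<students>
  <!-- section omitted and no email: allowed by ? and * -->
  <student>
    <name>K.Ramesh</name>
    <regdno>12PA1A0501</regdno>
    <branch>CSE</branch>
  </student>
  <!-- section present once, email repeated twice (hypothetical data) -->
  <student>
    <name>A.Sita</name>
    <regdno>12PA1A0502</regdno>
    <branch>CSE</branch>
    <section>B</section>
    <email>sita@example.com</email>
    <email>sita.alt@example.com</email>
  </student>
</students>
```

A third email element would still be accepted, but a second section element in the same student would not.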
The leaf nodes of the DTD tree specify the type of content of their parent elements. Most often the content of a leaf element is PCDATA, which stands for parsed character data. It is a string of text in which the characters < and & cannot appear literally, because the parser treats them as the start of markup; they are written as the entity references &lt; and &amp; instead.
Two other content types that can be specified are EMPTY and ANY. The EMPTY type specifies that the element has no content, and the ANY type specifies that the element may contain any content.
For example, a leaf element that holds text is declared as shown below:
<!ELEMENT element_name (#PCDATA)>
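The EMPTY and ANY content types mentioned above could be used as in the following sketch. The element names page, title, break and notes are hypothetical and serve only to show the syntax.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE page [
  <!ELEMENT page (title, break, notes)>
  <!ELEMENT title (#PCDATA)>
  <!-- EMPTY: the element may carry attributes but never content -->
  <!ELEMENT break EMPTY>
  <!-- ANY: text and any declared elements are allowed, in any order -->
  <!ELEMENT notes ANY>
]>
<page>
  <title>Results</title>
  <break/>
  <notes>Free text mixed with a <title>nested title</title>.</notes>
</page>
```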
Declaring Attributes
The attributes of an element are declared separately from the element declaration. The declaration of an attribute is as shown below:
<!ATTLIST element_name attribute_name attribute_type default_value>
If more than one attribute is declared for a given element, such declarations can be combined as shown below:
<!ATTLIST element_name
attribute_name_1 attribute_type default_value_1
attribute_name_2 attribute_type default_value_2
...
attribute_name_n attribute_type default_value_n
>
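There are ten different attribute types. The most frequently used type is CDATA, which specifies character data (any string of characters except < and &, which must be written as &lt; and &amp;).
The default value of an attribute can be an actual value or a requirement for the value of the attribute in the XML document. The possible default values for an attribute are given below:
- A value – the quoted value is used when the attribute is omitted from the element
- #REQUIRED – the attribute must be specified in every element of that type
- #IMPLIED – the attribute is optional and no default value is provided
- #FIXED value – the attribute always has the quoted value, and any value given in the document must match it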
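As a minimal sketch of the default forms, the declaration below uses a literal default plus the keywords #REQUIRED, #IMPLIED and #FIXED. The attribute names email, country and version are invented for this example.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE student [
  <!ELEMENT student (#PCDATA)>
  <!ATTLIST student
    id      CDATA #REQUIRED
    email   CDATA #IMPLIED
    country CDATA "India"
    version CDATA #FIXED "1.0"
  >
]>
<!-- id must be given; email may be left out;
     country defaults to "India"; version is always "1.0" -->
<student id="1">K.Ramesh</student>
```

A validating parser would be expected to reject this document if the id attribute were removed, or if version were given any value other than 1.0.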
Declaring Entities
Entities can be defined so that they can be referenced anywhere in the content of an XML document, in which case they are called general entities. All predefined entities are general entities. Entities can also be defined so that they can be referenced only in DTDs. Such entities are called parameter entities.
An entity declaration is as shown below:
<!ENTITY [%] entity_name "entity_value">
The optional percent sign (%), when present in the entity declaration, denotes a parameter entity rather than a general entity.
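A parameter entity is typically used to share a piece of a content model between declarations. The fragment below is a sketch meant to be saved as an external DTD file, because XML does not allow parameter entity references inside other declarations in an internal DTD. The entity name basic.fields and the alumnus element are assumptions for illustration.

```xml
<!-- External DTD fragment; basic.fields is a hypothetical name -->
<!ENTITY % basic.fields "name, regdno, branch">

<!-- %basic.fields; is replaced by its text before the declaration is read -->
<!ELEMENT student (%basic.fields;, section)>
<!ELEMENT alumnus (%basic.fields;, email)>

<!ELEMENT name (#PCDATA)>
<!ELEMENT regdno (#PCDATA)>
<!ELEMENT branch (#PCDATA)>
<!ELEMENT section (#PCDATA)>
<!ELEMENT email (#PCDATA)>
```

Note that the parameter entity is referenced with a percent sign and a semicolon (%basic.fields;), whereas a general entity is referenced with an ampersand.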
When a document uses the phrase HyperText Markup Language many times, the phrase can be defined as an entity as shown below:
<!ENTITY html "HyperText Markup Language">
Any XML document that includes the DTD containing the above declaration can produce the complete phrase with just the reference &html;.
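For instance, a document with the html entity in its internal DTD might look like the sketch below. The article element and its text are made up for this example.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article [
  <!ELEMENT article (#PCDATA)>
  <!ENTITY html "HyperText Markup Language">
]>
<!-- The parser replaces each &html; with the entity's text -->
<article>&html; is used to build web pages. Browsers render &html; documents.</article>
```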
When the text of an entity is longer than a few words, it can be stored outside the DTD in a separate file. In such cases, the entity is called an external text entity. The declaration of an external entity is shown below:
<!ENTITY entity_name SYSTEM "file_location">
The keyword SYSTEM specifies that the definition of the entity is in a different file, which is specified as the string following SYSTEM.
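A sketch of an external text entity in use is given below. The file name disclaimer.txt is hypothetical and is assumed to sit next to the XML document and to contain plain text.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE report [
  <!ELEMENT report (#PCDATA)>
  <!-- disclaimer.txt is an assumed file in the same folder -->
  <!ENTITY disclaimer SYSTEM "disclaimer.txt">
]>
<!-- &disclaimer; is replaced by the contents of the file -->
<report>Quarterly summary. &disclaimer;</report>
```

Many parsers do not load external entities unless they are configured to, usually for security reasons, so the reference may stay unexpanded or cause an error depending on the parser settings.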
A Sample DTD
An external DTD is conventionally saved with the extension .dtd and a normal XML file is saved with the extension .xml.
Below is an example DTD which contains the specification for storing the details of students:
//students.dtd - DTD file
<?xml version="1.0" encoding="utf-8" ?>
<!ELEMENT students (student+)>
<!ELEMENT student (name, branch, section, regdno)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT branch (#PCDATA)>
<!ELEMENT section (#PCDATA)>
<!ELEMENT regdno (#PCDATA)>
<!ATTLIST student id CDATA #REQUIRED>
The XML code that conforms to the above DTD is given below:
//student.xml - XML file
<?xml version="1.0" encoding="utf-8"?>
<students>
<student id="1">
<name>K.Ramesh</name>
<branch>CSE</branch>
<section>A</section>
<regdno>12PA1A0501</regdno>
</student>
</students>
Internal and External DTDs
A DTD can be placed within the XML file or in a separate file. If the DTD is placed within the XML document, it is called an internal DTD. An internal DTD is specified right after the XML declaration, as the second line of the XML document, as shown below:
<!DOCTYPE root-element [ DTD declarations ]>
Below is an example for internal DTD:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE students[
<!ELEMENT students (student+)>
<!ELEMENT student (name, branch, section, regdno)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT branch (#PCDATA)>
<!ELEMENT section (#PCDATA)>
<!ELEMENT regdno (#PCDATA)>
<!ATTLIST student id CDATA #REQUIRED>
]>
<students>
<student id="1">
<name>K.Ramesh</name>
<branch>CSE</branch>
<section>A</section>
<regdno>12PA1A0501</regdno>
</student>
</students>
If the DTD is written separately in another file, it is called an external DTD. An external DTD is linked with an XML document as shown below:
<!DOCTYPE root-element SYSTEM "filename.dtd">
Below is an example for external DTD:
//students.dtd - DTD file
<?xml version="1.0" encoding="utf-8" ?>
<!ELEMENT students (student+)>
<!ELEMENT student (name, branch, section, regdno)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT branch (#PCDATA)>
<!ELEMENT section (#PCDATA)>
<!ELEMENT regdno (#PCDATA)>
<!ATTLIST student id CDATA #REQUIRED>
//student.xml - XML file
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE students SYSTEM "students.dtd">
<students>
<student id="1">
<name>K.Ramesh</name>
<branch>CSE</branch>
<section>A</section>
<regdno>12PA1A0501</regdno>
</student>
</students>
An XML document that is well-formed and conforms to its DTD, as checked by a validating XML parser, is known as a valid XML document.
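By contrast, the following sketch is well-formed but should not be reported as valid against the students.dtd file shown earlier, because it breaks two of the declared rules. The comments mark the assumed problems.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE students SYSTEM "students.dtd">
<students>
  <!-- Not valid: the required id attribute is missing -->
  <student>
    <name>K.Ramesh</name>
    <!-- Not valid: regdno must come after branch and section -->
    <regdno>12PA1A0501</regdno>
    <branch>CSE</branch>
    <section>A</section>
  </student>
</students>
```

A non-validating parser would normally accept this document, since every tag is properly nested and closed; only a validating parser compares it with the DTD.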
Suryateja Pericherla is at present a Research Scholar (full-time Ph.D.) in the Dept. of Computer Science & Systems Engineering at Andhra University, Visakhapatnam. He previously worked as an Associate Professor in the Dept. of CSE at Vishnu Institute of Technology, India.
He has 11+ years of teaching experience and is an individual researcher whose research interests are Cloud Computing, Internet of Things, Computer Security, Network Security and Blockchain.
He is a member of professional societies like IEEE, ACM, CSI and ISCA. He published several research papers which are indexed by SCIE, WoS, Scopus, Springer and others.