XML - An Overview



XML, a subset of Standard Generalized Markup Language (SGML), is a text-based markup language used to describe the data in a document. However, you cannot format data using XML. To do so, you need to use style sheets. You'll learn about style sheets later in this chapter.

You can create an XML document by using authoring tools that are simple and easily available. Similar to HTML documents, you can create an XML document in a Notepad file. To make this file an XML document, you need to save it with the .xml extension. You'll learn how to create an XML document later in this chapter.

Another significant advantage of XML is the ability to create tags with names that can be easily identified by users. XML allows you to create elements, attributes, and containers with meaningful names, as and when required by the user. For example:

 <Home_Page>
   <Employee_Details>
     <Employee_Name>John Roger</Employee_Name>
     <Employee_Name>George Smith</Employee_Name>
     <Employee_Name>Bob Murray</Employee_Name>
     <Employee_Name>Daniel Clark</Employee_Name>
   </Employee_Details>
 </Home_Page>

You can view the output of the preceding code by saving the Notepad file with the name Employee.xml and opening it in Internet Explorer (see Figure 23.1).

Figure 23.1: A sample XML document.
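
As a rough sketch of how a program might read the same data, the following snippet parses the employee document and collects the employee names. Python and its standard xml.etree.ElementTree module are assumptions made purely for illustration; the chapter itself works with ColdFusion MX.

```python
import xml.etree.ElementTree as ET

# The employee document from the example above, held in a string.
EMPLOYEE_XML = """\
<Home_Page>
  <Employee_Details>
    <Employee_Name>John Roger</Employee_Name>
    <Employee_Name>George Smith</Employee_Name>
    <Employee_Name>Bob Murray</Employee_Name>
    <Employee_Name>Daniel Clark</Employee_Name>
  </Employee_Details>
</Home_Page>
"""

root = ET.fromstring(EMPLOYEE_XML)

# iter() walks the whole tree and yields every element with the given tag.
employee_names = [element.text for element in root.iter("Employee_Name")]

print(root.tag)
print(employee_names)
```

Because the tag names are meaningful, the code can ask for Employee_Name directly instead of relying on the position of the data in the file.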

The ability to create meaningful custom tags makes XML a hardware- and software-independent markup language. Any computer running on any operating system can easily interpret a document created in XML. Therefore, XML is widely used as a markup language that transfers structured data over a network. In addition, XML is used to transfer structured data in high-level B2B transactions.

The following list summarizes the advantages of XML as a markup language.

  • Information sharing. Any data can be stored in the XML format and be used by various tools that read data, write data, and transform data between XML and other formats. This is possible because of the separation of data and presentation.

  • Content delivery. XML supports different users and channels, such as digital TV, Web, and multimedia. This helps in delivering e-business applications to customers through their chosen media.

  • Extensibility. As the name suggests, XML supports extensibility by enabling users to create new tags.

    start sidebar
    CSS AND XSL

    XML is used to describe data, but you cannot use it to format data. To format the data in an XML document, you can use style sheets, such as Extensible Style Sheet Language (XSL) or Cascading Style Sheets (CSS). These style sheets contain information about how to present the data in the document. As a result, you can present the same data in different formats in an XML document.

    • CSS. This is syntax used to present the text in an HTML document. You can use CSS to set the formatting properties for elements in an HTML document, such as color, font, font size, text spacing, and so on.

    • XSL. This is an XML-based language that's used to present the data in an XML document. In addition, XSL is used to transform an XML document into a document that can be understood by different media or browsers. For example, you can transform an XML document into another XML document or an HTML document. To do this, you can use Extensible Style Sheet Language Transformation (XSLT), which is a subset of XSL.

      When an XML document is transformed into some other type of document, the XSLT engine converts the structure of the document into an abstract form called the internal model. This internal model stores the data of the XML document in the form of elements of a tree. You can access the elements of a tree by using XPath, which is another subset of XSL.

    end sidebar

  • Semantic information. An HTML-based search engine cannot distinguish between a document authored by John Bright and a document that describes the star that shines bright. On the other hand, XML builds semantic information into the documents, making it possible to discern the meaning of a document.

Besides these benefits, XML is easy to read, easy to write, and tree-based.

Some of the common specifications related to XML include the following:

  • Document Type Definition (DTD). DTDs specify the rules for XML documents that make it easier for everyone to understand their structure and logic.

  • Document Object Model (DOM). DOM enables navigation and modification in an XML document, including adding, updating, or deleting the content of elements. This also allows you to access XML data programmatically.

  • XML schemas. XML schemas can be considered a superset of DTDs and are also used to define the structure of XML documents.

Overview of DTD

DTD is a set of rules to define the structure and logic of XML documents. The documents that store these rules are called DTD documents (or DTDs) and have the extension .dtd.

To better understand the concept of DTDs, compare them with the creation of tables in a database. When you create a table in a database system, you specify the columns, the datatypes for different columns, the validation rules for data within columns, and so on. Similarly, you can use a DTD to specify rules that can be used in XML documents, such as tags and attributes. DTDs can be considered rule-books for XML documents.

Tip

It's not essential for you to create a DTD for your XML documents, but it can help users understand the structure of your XML documents and create similar ones. These users can refer to your DTD to understand the structure and logic of your XML documents.

When you create a DTD for an XML document, the XML document is checked against the rules specified in the DTD. If the XML document adheres to all the DTD rules, the document is considered valid. Otherwise, a validating parser reports the document as invalid.

Here are the components of a DTD:

  • DOCTYPE declarations. The <!DOCTYPE> declaration contains the information about the location of the DTD.

  • Element declarations. An element is a logical component of a document. Every element that's contained in an XML document must have a corresponding declaration in the DTD. The element declaration is used to validate the elements in the document.

  • Attributes declaration. Attributes represent the characteristics of an element. An element can contain multiple attributes. For each element attribute that's used in an XML document, a corresponding attribute declaration must be specified in the DTD.

  • Content model. The content model is used to describe the content of an element.

  • Entity declaration. Entities are aliases associated with a group of data. They're used in a document to avoid typing long pieces of text repeatedly.

Here's the general structure of a DTD:

 <!DOCTYPE root-element-name [
   element declarations
   attribute declarations
 ]>

Element Declaration

An element declaration specifies a single markup element. Every tag used in the XML document must be defined with an element declaration in the corresponding DTD.

The syntax to declare an element is

 <!ELEMENT element-name (element content-type)> 

For example, consider a DTD called restaurant.dtd that's used to define the details about a restaurant:

  • RESTAURANT. Identifies the restaurant.

  • NAME. Identifies the name of the restaurant.

  • LOCATION. Identifies the location of the restaurant.

  • ADDRESS. Identifies the address of the restaurant.

  • PHONE. Provides the phone number of the restaurant.

  • REMARKS. Used for comments about the restaurant.

The declarations for these elements are

 <!ELEMENT RESTAURANT>
 <!ELEMENT NAME>
 <!ELEMENT LOCATION>
 <!ELEMENT ADDRESS>
 <!ELEMENT PHONE>
 <!ELEMENT REMARKS>

Attribute Declaration

An attribute declaration defines the set of attributes for an element. Every attribute used in an XML document must have a declaration in the corresponding DTD. Not every element needs attributes.

For example, in restaurant.dtd, attributes may be added to the RESTAURANT element, as shown:

 RESTAURANT:TYPE 

The values for TYPE can be Continental, Chinese, Indian, Mexican, and Multi-Cuisine.

The following is the declaration for this attribute:

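 <!ATTLIST RESTAURANT TYPE (INDIAN | CONTINENTAL | CHINESE | MEXICAN | MULTI-CUISINE) "CONTINENTAL">

The default value for an attribute is enclosed in quotation marks and is used whenever the attribute is omitted. Alternatively, the keyword #REQUIRED can be written in place of the default value to indicate that the attribute is mandatory and must be supplied each time the element is used in a document. A single declaration can't combine a default value with #REQUIRED.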
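
To make the effect of an enumerated attribute with a default value concrete, here is a hand-rolled sketch in Python. It only mimics what a validating parser would do with the TYPE attribute; the names ALLOWED_TYPES, DEFAULT_TYPE, and restaurant_type are hypothetical and introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Values listed in the ATTLIST declaration for the TYPE attribute.
ALLOWED_TYPES = {"INDIAN", "CONTINENTAL", "CHINESE", "MEXICAN", "MULTI-CUISINE"}
DEFAULT_TYPE = "CONTINENTAL"


def restaurant_type(xml_text):
    """Return the TYPE of a RESTAURANT element, or raise ValueError."""
    element = ET.fromstring(xml_text)
    # Fall back to the declared default when the attribute is omitted.
    value = element.get("TYPE", DEFAULT_TYPE)
    if value not in ALLOWED_TYPES:
        raise ValueError(f"TYPE={value!r} is not one of the declared values")
    return value


print(restaurant_type('<RESTAURANT TYPE="INDIAN"/>'))
print(restaurant_type('<RESTAURANT/>'))
```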

Content Model

A content model is part of the element declaration and is used to describe the content of the element. There are three different types of content:

  • Data content. This signifies text-based characters and is the most basic type of content. Data content is specified as #PCDATA (parsed character data), which indicates that the element contains text that's to be parsed by the parser, so entity references in it are expanded and markup characters are recognized. CDATA, by contrast, isn't a content type for elements: it's an attribute type used in ATTLIST declarations, and a CDATA section (<![CDATA[ ... ]]>) in a document marks text that's not to be parsed.

  • Element content. This specifies the child elements that are contained in the element. In addition, element content also specifies which of the child elements are required and the order in which these elements must appear in the document.

  • Mixed content. Mixed content signifies both the data and element content.

An element with data is declared as shown:

 <!ELEMENT element-name (data-type)> 

An element with a child element is declared as shown:

 <!ELEMENT element-name (child-element-name)> 

Multiple child elements can be separated with commas. In an XML document, the child elements must appear in the same sequence as they've been declared in the DTD. A question mark after a child element indicates that the element is optional.

In the restaurant.dtd, the RESTAURANT element contains all the other elements. Here's the restaurant.dtd after adding the content model information:

 <!ELEMENT RESTAURANT (NAME, LOCATION, ADDRESS, PHONE, REMARKS?)>
 <!ATTLIST RESTAURANT TYPE (INDIAN | CONTINENTAL | CHINESE | MEXICAN | MULTI-CUISINE) "CONTINENTAL">
 <!ELEMENT NAME (#PCDATA)>
 <!ELEMENT LOCATION EMPTY>
 <!ATTLIST LOCATION TYPE (SOUTH | NORTH | EAST | WEST) "SOUTH">
 <!ELEMENT ADDRESS (#PCDATA)>
 <!ELEMENT PHONE (#PCDATA)>
 <!ELEMENT REMARKS (#PCDATA)>

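The keyword EMPTY is used as the content type to specify that the element has no content at all, neither character data nor child elements. The LOCATION element is therefore written as a single empty-element tag, such as <LOCATION TYPE="SOUTH" />, rather than as a pair of start and end tags.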
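
The sequence rule in the RESTAURANT content model can be approximated in a few lines of Python. This is only a sketch of the idea, not a DTD validator: it rewrites the content model by hand as a regular expression over the child tag names, and the names RESTAURANT_MODEL and matches_content_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (NAME, LOCATION, ADDRESS, PHONE, REMARKS?) rewritten by hand as a regular
# expression over a space-terminated list of child tag names.
RESTAURANT_MODEL = re.compile(r"NAME LOCATION ADDRESS PHONE (REMARKS )?")


def matches_content_model(xml_text):
    """Check the order and presence of the direct children of the root."""
    root = ET.fromstring(xml_text)
    child_tags = "".join(child.tag + " " for child in root)
    return RESTAURANT_MODEL.fullmatch(child_tags) is not None


VALID = (
    '<RESTAURANT TYPE="CONTINENTAL"><NAME>Sensoi</NAME>'
    '<LOCATION TYPE="SOUTH"/><ADDRESS>West End</ADDRESS>'
    "<PHONE>91-011-6854672</PHONE></RESTAURANT>"
)
# Same children, but PHONE appears before ADDRESS.
OUT_OF_ORDER = (
    '<RESTAURANT TYPE="CONTINENTAL"><NAME>Sensoi</NAME>'
    '<LOCATION TYPE="SOUTH"/><PHONE>91-011-6854672</PHONE>'
    "<ADDRESS>West End</ADDRESS></RESTAURANT>"
)

print(matches_content_model(VALID))
print(matches_content_model(OUT_OF_ORDER))
```

The optional REMARKS element maps naturally onto the ? operator of the regular expression, which is why the two notations look so similar.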

Entity Declaration

Entities are used within a document to avoid typing long pieces of repetitive text. Such texts can be assigned an alias, which can be used in the document. When the document is processed, the alias is replaced by the specified text.

Table 23.2 lists some of the predefined entities in XML.

Table 23.2: Predefined Entities in XML

Entity    Character
&lt;      <
&gt;      >
&amp;     &
&quot;    "
&apos;    '
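
The following Python sketch shows the predefined entities in both directions: escaping text before it's placed in a document, and letting the parser turn the references back into characters. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() replaces &, < and > with their predefined entity references.
raw = "Fish & Chips <special>"
escaped = escape(raw)
print(escaped)

# When the document is parsed, the references become characters again.
round_trip = ET.fromstring(f"<REMARKS>{escaped}</REMARKS>").text
print(round_trip)

# All five predefined entities are recognized without any declaration.
all_five = ET.fromstring(
    "<REMARKS>&lt; &gt; &amp; &quot; &apos;</REMARKS>"
).text
print(all_five)
```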

Entities are of two types:

  • General entities. A general entity is declared as follows:

         <!ENTITY myaddress " 112 Vasant Enclave New Delhi -57"> 

    This is an example of an internal entity, where the text phrase being mapped is in the entity declaration itself. An external entity maps the unique name to a block of text stored outside of the document. A general entity is referenced with & before the entity name and a semicolon after it (for example, &myaddress;).

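  • Parameter entities. Parameter entities are declared with % before the entity name and referenced with % before the name and a semicolon after it (for example, %name;). These entities are similar to general entities but can be used only within the DTD.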

A DTD is used to validate the content in an XML document. When the data in an XML document is exchanged over a network, the receiving application can validate the structure of the XML document based on the rules defined in a DTD. However, to do so, the receiving application requires a parser.

start sidebar
XML PARSERS

An XML parser is used to parse or validate an XML document. This involves checking the XML document for any errors. For example, if you omit any brackets while creating an XML document, the parser can trace the error in the document.

There are two types of parsers used with XML documents, non-validating and validating:

  • Non-validating parsers are used with well-formed XML documents that don't have a DTD associated with them. As a result, a non-validating parser only ensures that an XML document is well formed.

  • Validating parsers validate the structure of an XML document against the rules defined in a DTD. Therefore, they can be used with valid XML documents only. After checking the structure and content of an XML document, the parser raises an error if the document isn't valid. However, if the document conforms to the rules in a DTD, the document is transferred to the receiving application.

end sidebar
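
As an illustration of what a non-validating parser does, the following Python sketch checks whether a document is well formed. It relies on the standard xml.etree.ElementTree module, which is built on the non-validating Expat parser, so it never checks a document against a DTD. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        # The error message includes the line and column of the problem.
        print(f"not well formed: {error}")
        return False
    return True


print(is_well_formed("<NAME>Sensoi</NAME>"))     # properly closed element
print(is_well_formed("<NAME>Sensoi</NAME"))      # closing bracket omitted
print(is_well_formed("<NAME>Sensoi</ADDRESS>"))  # mismatched end tag
```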

Note

In ColdFusion MX, you use the function xmlParse() to implement the parser functionality.

In addition to validating a document after it's created, you can also use DTDs with the authoring tools to ensure that the document you create conforms to the rules defined in a DTD. In other words, when you create an XML document by using an authoring tool that has a DTD associated with it, the authoring tool will ensure that you can use only the elements and attributes defined in a DTD in your document.

To use a DTD with an XML document, you first need to associate the XML document with a DTD. To do this, include the DOCTYPE declaration statement in the beginning of the XML document.

The Structure of an XML Document

An XML document consists of character data and the markup that describes the data. Here's a sample XML document based on restaurant.dtd:

 <?xml version="1.0"?>
 <RESTAURANT TYPE="CONTINENTAL">
   <NAME> Sensoi </NAME>
   <LOCATION TYPE="SOUTH" />
   <ADDRESS> West End, Wellingdon Street, New Delhi</ADDRESS>
   <PHONE>91-011-6854672</PHONE>
 </RESTAURANT>

An XML document has the following components:

  • XML declaration

  • Elements

  • Attributes

  • Entities

  • Comments

XML Declaration

An XML declaration is the first statement in an XML document and is used to identify it as an XML document. It's also used to specify processing instructions, such as whether the application should process only the XML document or the DTD as well. The XML declaration may include attributes, such as version and encoding. For example:

 <?xml version="1.0" encoding="UTF-8"?>

The <? and ?> delimiters are the same ones used for processing instructions, which pass messages to the application processing the XML document. Processing instructions can be placed anywhere in the document, but the XML declaration, when present, must be the very first thing in it.

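The version attribute specifies the version of XML that the document conforms to. The encoding attribute is used to specify the character encoding used by the author. UTF-8 is a variable-width encoding of Unicode in which each ASCII character occupies a single byte and other characters occupy two to four bytes.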
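
A short Python sketch can show how the encoding attribute relates to the bytes of a document. The sample characters and the restaurant name with an accented letter are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Number of bytes each character occupies once encoded as UTF-8.
# "\u00e9" is e with an acute accent; "\u20ac" is the euro sign.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in ("A", "\u00e9", "\u20ac")}
print(utf8_lengths)

# A document stored as UTF-8 bytes, with a matching encoding attribute.
document = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<NAME>Caf\u00e9 Sensoi</NAME>"
).encode("utf-8")

# The parser reads the declaration and decodes the bytes accordingly.
name = ET.fromstring(document).text
print(name)
```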

Elements

Elements are the main components of a markup language and are defined in the DTD. They're specified using tags. A tag is specified within angle brackets (<>). It can be a paired tag with a start tag (<element>) and an end tag (</element>). It can also be a singleton tag (<element />) that stands alone and therefore cannot contain any elements or data. Singleton tags are signified with the EMPTY keyword in the DTD.

Every XML document must have one root element that describes the function of the document, such as <RESTAURANT> in the restaurant.dtd example. The root element contains the other elements of the XML document.

In ColdFusion MX, you can access the root element through the XmlRoot property of an XML document object.

In a ColdFusion XML document object, each element is a structure that contains a number of keys, including XmlName, XmlText, XmlAttributes, and XmlChildren.
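
For readers who want to experiment outside ColdFusion, Python's standard xml.etree.ElementTree module exposes roughly comparable information. The mapping suggested in the comments is an informal analogy assumed for this sketch, not ColdFusion behavior.

```python
import xml.etree.ElementTree as ET

RESTAURANT_XML = """<?xml version="1.0"?>
<RESTAURANT TYPE="CONTINENTAL">
  <NAME>Sensoi</NAME>
  <LOCATION TYPE="SOUTH" />
  <ADDRESS>West End, Wellingdon Street, New Delhi</ADDRESS>
  <PHONE>91-011-6854672</PHONE>
</RESTAURANT>
"""

# fromstring() returns the root element (compare XmlRoot).
root = ET.fromstring(RESTAURANT_XML)

root_name = root.tag                             # compare XmlName
root_attributes = root.attrib                    # compare XmlAttributes
child_names = [child.tag for child in root]      # compare XmlChildren
restaurant_name = root.find("NAME").text         # compare XmlText
location_type = root.find("LOCATION").get("TYPE")

print(root_name, root_attributes)
print(child_names)
print(restaurant_name, location_type)
```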

Attributes

Attributes provide additional information about the elements and are embedded in the start tag. An attribute consists of an attribute name and an attribute value. In the preceding XML code, the RESTAURANT element contains a TYPE attribute that specifies the cuisine the restaurant specializes in.

Entities

Entities are used to specify an alias for text data that needs to be typed repeatedly. They must be declared before they're referenced in the XML document. Here's an example of an entity:

 <!ENTITY Poor " The restaurant has poor customer service"> 

This entity can be referenced as &Poor; (note the trailing semicolon). For example:

 <REMARKS> &Poor; </REMARKS>

In an XML document, all entities are declared within a DOCTYPE declaration. The <!DOCTYPE [] > declaration follows the XML declaration. For example:

 <?xml version="1.0"?>
 <!DOCTYPE RESTAURANT [
   <!ENTITY Poor " The restaurant has poor customer service">
 ]>
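
The following Python sketch shows an entity declared in the DOCTYPE declaration being expanded when the document is parsed. The RESTAURANT and REMARKS elements that follow the declaration are added here only to make a complete document, and the reference is written with its trailing semicolon, as XML requires.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE RESTAURANT [
  <!ENTITY Poor "The restaurant has poor customer service">
]>
<RESTAURANT>
  <REMARKS>&Poor;</REMARKS>
</RESTAURANT>
"""

root = ET.fromstring(DOCUMENT)

# The parser replaces the reference with the text from the declaration.
remarks = root.find("REMARKS").text
print(remarks)
```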

Comments

Here's the syntax to specify comments in an XML document:

 <!-- Comments -->

For example:

 <?xml version="1.0"?>
 <!-- This is a comment -->
 <RESTAURANT TYPE="CONTINENTAL">
   <NAME> Sensoi </NAME>
   <LOCATION TYPE="SOUTH" />
   <ADDRESS> West End, Wellingdon Street, New Delhi</ADDRESS>
   <PHONE>91-011-6854672</PHONE>
 </RESTAURANT>

The XML DOM

The XML DOM is used to access and manipulate the XML document programmatically. The DOM is an in-memory, cached tree representation of an XML document that enables navigation of and modification to a document, including adding, updating, or deleting the content of elements. The DOM represents data as a hierarchy of object nodes.

Consider a sample XML document:

 <book>
   <name> Gone with the Wind </name>
   <language> English </language>
 </book>

According to the XML DOM, this document contains the following objects:

  • Document object. This includes the entire document.

  • The book object. This is a root node that has two child elements, name and language.

  • The name object. This is an element object that has a sibling object, language, and a child text object.

  • Gone with the Wind. This is a text object and a child of the name object.

  • The language object. This is an object that has a sibling object name and a child text object.

  • English. This is a child of the language object.
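
The object hierarchy listed above can be explored with Python's standard xml.dom.minidom module, which is one DOM implementation among many and is assumed here only for illustration. The whitespace between tags is left out of the sample so that the tree contains exactly the objects in the list; with indentation, a DOM parser would also create text nodes for the whitespace.

```python
from xml.dom import minidom

document = minidom.parseString(
    "<book><name>Gone with the Wind</name><language>English</language></book>"
)

# The document object contains the root element, book.
book = document.documentElement

# book has two child elements, which are siblings of each other.
name, language = book.childNodes

print(document.nodeType == document.DOCUMENT_NODE)
print(book.tagName, [child.tagName for child in book.childNodes])
print(name.nextSibling is language, language.previousSibling is name)

# Each element holds its text in a child text node.
name_text = name.firstChild
language_text = language.firstChild
print(name_text.nodeType == name_text.TEXT_NODE, name_text.data)
print(language_text.data)
```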

Table 23.3 lists some of the commonly used XML DOM objects.

Table 23.3: XML DOM Objects

Object                         Description
DOMDocument                    Represents the top node of the XML DOM tree
XMLDOMNode                     Represents a single node in the DOM tree
XMLDOMAttribute                Represents an attribute object
XMLDOMCDATASection             Marks the text so that it isn't interpreted as markup language
XMLDOMDocumentType             Contains information associated with the document declaration
XMLDOMEntity                   Represents an entity in the XML document
XMLDOMProcessingInstruction    Represents a processing instruction
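
The DOM operations mentioned earlier (adding, updating, and deleting content) can be sketched with Python's standard xml.dom.minidom module. Its class names differ from the object names in Table 23.3, and the author element added here is an illustrative value that isn't part of the chapter's sample.

```python
from xml.dom import minidom

document = minidom.parseString(
    "<book><name>Gone with the Wind</name><language>English</language></book>"
)
book = document.documentElement

# Add: create an author element with a text child and append it to book.
author = document.createElement("author")
author.appendChild(document.createTextNode("Margaret Mitchell"))
book.appendChild(author)

# Update: change the text node held by the language element.
language = book.getElementsByTagName("language")[0]
language.firstChild.data = "French"

# Delete: remove the name element from its parent.
name = book.getElementsByTagName("name")[0]
book.removeChild(name)

# Serialize the modified tree back to markup.
result = book.toxml()
print(result)
```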




Macromedia ColdFusion MX. Professional Projects
ColdFusion MX Professional Projects
ISBN: 1592000126
EAN: 2147483647
Year: 2002
Pages: 200
