Chapter 16: Extensible Markup Language (XML)

XML stands for Extensible Markup Language. Like HTML, it is derived from the Standard Generalized Markup Language (SGML): XML is a simplified subset of SGML, while HTML was originally defined as an application of SGML. The XML specification is maintained by the World Wide Web Consortium (W3C), which also published the HTML recommendations; SGML itself is an ISO standard (ISO 8879). A page with details on XML can be found at http://www.w3.org/XML/. The XML standard was created to help tackle the challenges of electronic publishing because the limits of HTML were rapidly being reached. That original goal has gradually expanded so that XML now encompasses everything from electronic data interchange (EDI) applications to streaming news and stock quote feeds (such as Really Simple Syndication, or RSS) and Web Services.

How Does XML Differ from HTML?

While HTML and XML are derivations of the same SGML standard, they are used for entirely different purposes:

  • HTML has a limited element set that is predefined by the HTML standard itself. In contrast, XML has an unlimited, user-definable element set.

  • HTML is focused on the page presentation of data while XML focuses on the specification of data. In other words, the markup elements in HTML are concerned with describing how data to be displayed on a page will look. XML is concerned with describing what the data contained in an XML document is.

  • HTML documents are largely unstructured with respect to the data they carry. Their elements describe page layout, and the document is meant to be interpreted and rendered by a web browser in a top-to-bottom fashion. XML documents, however, have a tree structure that reflects the data itself: different elements have a hierarchical relationship to each other. You can search an XML document along these hierarchical paths to locate a specific piece of data, something HTML markup gives you no reliable way to do.

  • An XML document can be easily transformed into other common document formats, including simple text, PDF, and HTML, to name only a few. While it is possible to convert HTML into other formats using third-party tools, the W3C defines a standardized methodology for transforming XML documents (XSL Transformations, or XSLT).
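The point about searching along hierarchical paths can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. This is only an illustration: the CATALOG document and its element names are hypothetical and are not taken from this chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are invented for illustration.
DOC = """<CATALOG>
  <BOOK ID="1"><TITLE>First Book</TITLE><YEAR>2003</YEAR></BOOK>
  <BOOK ID="2"><TITLE>Second Book</TITLE><YEAR>2004</YEAR></BOOK>
</CATALOG>"""

root = ET.fromstring(DOC)

# Follow the hierarchy: every TITLE that is a child of a BOOK under the root.
titles = [title.text for title in root.findall("./BOOK/TITLE")]

# Select one BOOK by its ID attribute, then read one of its children.
second_book = root.find("./BOOK[@ID='2']")
second_year = second_book.findtext("YEAR")

print(titles)
print(second_year)
```

The path expressions work because every element has a well-defined parent, so a query such as ./BOOK/TITLE is unambiguous.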

While HTML is very good at displaying information, it is very difficult to do anything more with HTML than render it in a browser window. Information in standard HTML is not categorized or specified in any meaningful way. If you want to be able to manipulate and process data, then XML is the right tool.

 <?xml version="1.0" encoding="UTF-8"?>
 <ADDRESSBOOK>
      <ENTRY ID="1">
           <FIRSTNAME>John</FIRSTNAME>
           <LASTNAME>Doe</LASTNAME>
           <ADDRLINE1>100 Maple Lane</ADDRLINE1>
           <CITY>Dallas</CITY>
           <STATE>TX</STATE>
           <ZIP>75201</ZIP>
           <HOMEPHONE>555-555-1111</HOMEPHONE>
           <EMAIL>jdoe@someisp.com</EMAIL>
      </ENTRY>
      <ENTRY ID="2">
           <FIRSTNAME>Jane</FIRSTNAME>
           <LASTNAME>Doe</LASTNAME>
           <ADDRLINE1>1 Cherry Lane</ADDRLINE1>
           <ADDRLINE2>Apt 201</ADDRLINE2>
           <CITY>Buffalo</CITY>
           <STATE>NY</STATE>
           <ZIP>14201</ZIP>
           <HOMEPHONE>555-555-2222</HOMEPHONE>
           <EMAIL>doej@someisp.com</EMAIL>
      </ENTRY>
 </ADDRESSBOOK>

This listing could be, for instance, an address book exported from an e-mail program into an XML file. We'll get into the mechanics of the file later in the chapter, in the "XML Document Validation: Document Type Definition (DTD)" section, but for the moment we can make some initial observations about this XML document:

  • XML uses the familiar <elementname> </elementname> markup syntax that HTML uses. Each piece of data is denoted by starting and ending elements. Note that unlike in HTML, element names are case-sensitive, so a <FIRSTNAME> element is not equivalent to a <firstname> element.

  • The XML document has an XML declaration at the beginning of the file that designates that it is an XML document, what version of the XML specification it adheres to, and what character encoding is used to encode the data. You may encounter XML files in the real world that do not include this declaration (e.g., XML files used to hold configuration data for applications), but to follow the specification correctly, the declaration should be included.

  • The XML document has only one root or outermost element; in this case, it is <ADDRESSBOOK>. The root element is allowed to contain data itself, but it often only contains other child elements.

  • The elements are nested in a hierarchical fashion and no overlapping is allowed. The root element <ADDRESSBOOK> has <ENTRY> elements as its children, and each <ENTRY> element can have <FIRSTNAME>, <LASTNAME>, etc., elements as its children. Each element must remain in its proper scope. In other words, because a <FIRSTNAME> element starts under an <ENTRY> element, its closing tag (</FIRSTNAME>) must occur before the parent's closing tag (</ENTRY>) and before the opening tag of any other element at the same hierarchical level (<LASTNAME>, for instance).

  • The actual data is presented between the opening and closing tags of an element (for this example, the last name is Doe):

     <LASTNAME>Doe</LASTNAME> 
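
Note: Unlike some HTML elements, XML elements are always required to be closed off, either with a matching closing tag or, for an element with no content, with the self-closing form (for example, <ADDRLINE2/>).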

  • Elements may carry attributes. In the case of our example, there are multiple <ENTRY> elements. They can be distinguished from one another by the ID attribute included in the opening tag.
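The rules in this list (case-sensitive names, a single root, proper nesting, and mandatory closing tags) are what an XML parser enforces when it checks that a document is well formed. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the short test fragments are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, each breaking at most one of the rules above.
samples = {
    "valid": "<ENTRY><FIRSTNAME>John</FIRSTNAME><LASTNAME>Doe</LASTNAME></ENTRY>",
    # Names are case-sensitive, so </firstname> does not close <FIRSTNAME>.
    "case_mismatch": "<ENTRY><FIRSTNAME>John</firstname></ENTRY>",
    # <LASTNAME> opens inside <FIRSTNAME> but closes outside it.
    "overlapping": "<ENTRY><FIRSTNAME>John<LASTNAME></FIRSTNAME>Doe</LASTNAME></ENTRY>",
    # <FIRSTNAME> is never closed.
    "unclosed": "<ENTRY><FIRSTNAME>John</ENTRY>",
    # Only one root element is allowed.
    "two_roots": "<ENTRY/><ENTRY/>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}

for name, accepted in results.items():
    print(name, "accepted" if accepted else "rejected")
```

Only the first fragment is accepted. A parser rejects the others outright rather than guessing at the author's intent, which is a notable difference from how browsers treat sloppy HTML.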

If we were to store this data in an Oracle table with a structure that matches the data, such as:

 CREATE TABLE address_book (
      id          number(5)     CONSTRAINT address_book_pk PRIMARY KEY,
      first_name  varchar2(50),
      last_name   varchar2(50),
      addr_line1  varchar2(60),
      addr_line2  varchar2(60),
      city        varchar2(60),
      state_code  varchar2(2),
      zip_code    number(9),
      phone       number(12),
      email_addr  varchar2(50)
 );

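we would need to insert a row into the address_book table for each <ENTRY> element in the XML document. We can use the ID attribute on each <ENTRY> element to populate the primary key. Note that the phone column is numeric, so a value such as 555-555-1111 would need its hyphens removed before it could be stored, and an entry with no <ADDRLINE2> element would leave addr_line2 null.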
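The mapping from <ENTRY> elements to table rows can be sketched as follows. This is a minimal illustration under several assumptions, not a loading procedure for Oracle: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the Oracle table, the column types are simplified accordingly, and the names FIELD_TO_COLUMN and entry_to_row are hypothetical helpers introduced here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Abbreviated copy of the address book document from this chapter.
DOC = """<ADDRESSBOOK>
  <ENTRY ID="1">
    <FIRSTNAME>John</FIRSTNAME><LASTNAME>Doe</LASTNAME>
    <ADDRLINE1>100 Maple Lane</ADDRLINE1>
    <CITY>Dallas</CITY><STATE>TX</STATE><ZIP>75201</ZIP>
    <HOMEPHONE>555-555-1111</HOMEPHONE><EMAIL>jdoe@someisp.com</EMAIL>
  </ENTRY>
  <ENTRY ID="2">
    <FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME>
    <ADDRLINE1>1 Cherry Lane</ADDRLINE1><ADDRLINE2>Apt 201</ADDRLINE2>
    <CITY>Buffalo</CITY><STATE>NY</STATE><ZIP>14201</ZIP>
    <HOMEPHONE>555-555-2222</HOMEPHONE><EMAIL>doej@someisp.com</EMAIL>
  </ENTRY>
</ADDRESSBOOK>"""

# Hypothetical mapping from XML element names to address_book columns.
FIELD_TO_COLUMN = {
    "FIRSTNAME": "first_name", "LASTNAME": "last_name",
    "ADDRLINE1": "addr_line1", "ADDRLINE2": "addr_line2",
    "CITY": "city", "STATE": "state_code", "ZIP": "zip_code",
    "HOMEPHONE": "phone", "EMAIL": "email_addr",
}


def entry_to_row(entry):
    """Turn one <ENTRY> element into a dict keyed by column name."""
    row = {"id": int(entry.get("ID"))}  # the ID attribute becomes the primary key
    for tag, column in FIELD_TO_COLUMN.items():
        row[column] = entry.findtext(tag)  # None when the element is absent
    # The numeric columns need numbers, so convert and strip the hyphens.
    if row["zip_code"] is not None:
        row["zip_code"] = int(row["zip_code"])
    if row["phone"] is not None:
        row["phone"] = int(row["phone"].replace("-", ""))
    return row


rows = [entry_to_row(e) for e in ET.fromstring(DOC).findall("ENTRY")]

# Stand-in for the Oracle table: SQLite, in memory, with simplified types.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address_book (id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, addr_line1 TEXT, addr_line2 TEXT, city TEXT, "
    "state_code TEXT, zip_code INTEGER, phone INTEGER, email_addr TEXT)"
)
columns = ["id"] + list(FIELD_TO_COLUMN.values())
insert_sql = "INSERT INTO address_book ({}) VALUES ({})".format(
    ", ".join(columns), ", ".join(":" + c for c in columns)
)
conn.executemany(insert_sql, rows)  # one row per <ENTRY> element

stored = conn.execute(
    "SELECT id, first_name, addr_line2, phone FROM address_book ORDER BY id"
).fetchall()
print(stored)
```

Each <ENTRY> produces exactly one row, and the optional <ADDRLINE2> element simply becomes a null in the entry that lacks it. The named bind variables (:id, :first_name, and so on) keep the data separate from the SQL text.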

The XML code in Listing 1 illustrates the fundamental and most significant difference between HTML and XML: XML clearly describes the data contained within the document, rather than how it should be displayed.



Oracle Application Server 10g Web Development (Oracle Press)
ISBN: 0072255110
Year: 2004
Pages: 192
