Chapter 1: XML Syntax


Extensible Markup Language (XML) is now in widespread use. Many applications on the Internet or residing on individual computers use some form of XML to run or manage the processes of an application. Earlier books about XML commented that XML was to be the "next big thing." Now, it is "the big thing." In fact, there really isn't anything bigger.

For this reason, you want to understand XML and its various applications. This book focuses on some of the more common ways to apply XML to the work you are doing today. Whether you need Web services, searching, or application configuration, you can find immediate uses for XML. This book shows you how to apply this markup language to your work.

This first chapter looks at the basics of XML, why it exists, and what makes it so powerful. Finally, this chapter deals with XML namespaces and how to properly apply them to XML instance documents. If you are already pretty familiar with the basics of XML, feel free to skim this chapter before proceeding.

The Purpose of XML

Before you actually get into the basics of XML, you should understand why this markup language is one of the most talked about things in computing today. To do this, look back in time a bit.

During the days of mainframes, information technology might have seemed complicated, but it got a heck of a lot more complicated when we moved from mainframes to a client-server model. Users now accessed information remotely instead of sitting at the same machine where the data and logic were stored. This caused all sorts of problems, mainly involving how to present data stored on mainframes to remote clients. Another problem was application-to-application communication: how was an application running on one computer going to access data or logic residing on an entirely different computer?

Two problems had to be resolved. One dealt with computer-to-human communications of data and logic; another dealt with application-to-application communications. This is illustrated in Figure 1-1.

Figure 1-1

The first problem, computer-to-human communication of data and logic, was largely solved by the advent of HTML (HyperText Markup Language). This markup language packaged data and logic in a way that let users view it through applications designed specifically to present it (the birth of the browser as we know it). With HTML and browsers in place, end users could work with data and logic remotely without much trouble.

With that said, it really isn't all about humans, is it? Other servers, processes, and applications also needed to access and act upon data and logic stored elsewhere on a network or across the planet. This set off a search for the best way to move data and logic from point A to point B.

It was a tough task. The varying sources of data were often incompatible with the platforms where the data was to be served up, so a common way to structure and represent the data was needed. Of course, many solutions were proposed, some of which were pretty exciting.

The idea was to mark up a document in a way that enabled it to be understood across system boundaries. Many systems already existed for marking up documents so that other applications could easily understand them. Applying markup to a document means adding descriptive text around the items it contains so that another application, or another instance of the same application, can decipher the document's contents.

For instance, Microsoft Word provides markup around the contents of a document. Why is markup needed at all? As you type words into Microsoft Word, you are providing data to be housed in the document. The reason you don't simply use Microsoft Notepad is that Word gives you the extra capability to change the way the data is represented. In other words, you can apply metadata around the data points contained in the document. For instance, you can specify whether a word, paragraph, or page is bold, italic, or underlined. You can specify the size and color of the text. You can change how the data is presented quite a bit. Word takes your instructions and applies a markup language around the data.

Like Word, XML uses markup to provide metadata around data points contained within the document to further define the data element. XML provides such an easy means of creating and presenting markup that it has become the most popular way to apply metadata to data.

In its short lifetime, XML has become the standard for data representation. XML came into its own when the W3C (the World Wide Web Consortium) realized that it needed a markup language to represent data that could be used and consumed regardless of platform. When XML 1.0 became a W3C Recommendation in 1998, it was quickly hailed as the solution for data transfer and data representation across varying systems.

In the past, one way to represent data was to place the data within a comma-, tab-, or pipe-delimited text file. Listing 1-1 shows an example of this:

Listing 1-1: An example of a pipe-delimited data representation

    Bill|Evjen|Technical Architect|Lipper|10/04/2001|St. Louis, Missouri|3 

These kinds of data representations are still in use today. The individual pieces of data are separated by pipes, commas, tabs, or some other character. Looking at this collection of items, it is hard to tell what the data represents. You might get a better idea from the file name, but the meanings of the date and the number 3 are not evident.
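To see why positional data is hard to interpret, here is a minimal Python sketch that splits the record from Listing 1-1 on the pipe character. The only handle a program has on each value is its index; to attach meaning, the program must supply column names from outside the data. The column names below are hypothetical, borrowed from the element names used later in Listing 1-2.

```python
# The pipe-delimited record from Listing 1-1.
record = "Bill|Evjen|Technical Architect|Lipper|10/04/2001|St. Louis, Missouri|3"

fields = record.split("|")

# Each value can only be reached by its position. Nothing in the data
# says whether index 4 is a start date or a birth date, or what index 6 counts.
print(len(fields))   # 7
print(fields[4])     # 10/04/2001
print(fields[6])     # 3

# To give the fields meaning, the consumer must hard-code the layout.
# These column names are hypothetical; the record itself does not supply them.
COLUMNS = ["FirstName", "LastName", "JobTitle", "Company",
           "StartDate", "WorkLocation", "NumberOfDependents"]
employee = dict(zip(COLUMNS, fields))
print(employee["NumberOfDependents"])  # 3
```

If the producer of the file ever reorders or inserts a column, this hard-coded layout silently attaches the wrong names to the values.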

On the other hand, XML relates data in a self-describing manner so that any user, technical or otherwise, can decipher the data. Listing 1-2 shows how the same piece of data is represented using XML.

Listing 1-2: Representing the data in an XML document

    <?xml version="1.0" encoding="UTF-8" ?>
    <Employee>
       <FirstName>Bill</FirstName>
       <LastName>Evjen</LastName>
       <JobTitle>Technical Architect</JobTitle>
       <Company>Lipper</Company>
       <StartDate>10/04/2001</StartDate>
       <WorkLocation>St. Louis, Missouri</WorkLocation>
       <NumberOfDependents>3</NumberOfDependents>
    </Employee>

You can now tell, just by looking at the data in the file, what the data items mean and how they relate to one another. The data is laid out in such a simple format that it is quite possible for even a non-technical person to understand it. A computer process can also work with the data automatically.
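As a sketch of that automatic processing, the following Python snippet parses the Employee document with the standard library's `xml.etree.ElementTree` module (one of many XML parsers; the choice of Python and this module is ours, not the book's) and reads values by element name rather than by position.

```python
import xml.etree.ElementTree as ET

# The Employee document as bytes. The XML declaration must be the very
# first thing in the document, so the literal starts immediately with "<?xml".
EMPLOYEE_XML = b"""<?xml version="1.0" encoding="UTF-8" ?>
<Employee>
   <FirstName>Bill</FirstName>
   <LastName>Evjen</LastName>
   <JobTitle>Technical Architect</JobTitle>
   <Company>Lipper</Company>
   <StartDate>10/04/2001</StartDate>
   <WorkLocation>St. Louis, Missouri</WorkLocation>
   <NumberOfDependents>3</NumberOfDependents>
</Employee>
"""

root = ET.fromstring(EMPLOYEE_XML)

# Values are looked up by name, so the program reads the document
# the same way a person does.
start_date = root.findtext("StartDate")
dependents = int(root.findtext("NumberOfDependents"))

print(root.tag)      # Employee
print(start_date)    # 10/04/2001
print(dependents)    # 3

# Walk every child element generically, without knowing the names in advance.
for child in root:
    print(f"{child.tag}: {child.text}")
```

Unlike the pipe-delimited version, adding or reordering elements in the document does not break the `findtext()` lookups.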

When you look at this XML file, you may notice how similar XML is to HTML. The two markup languages are related, but HTML marks up text for presentation, whereas XML marks up text for data representation.

Both XML and HTML have their roots in the Standard Generalized Markup Language (SGML), which was standardized in 1986. SGML is a complex markup language that was also used for data representation. With the explosion of the Internet, however, the W3C realized that it needed a universal way to represent data that would be easier to use than SGML. That realization brought forth XML.

XML has distinct advantages over other forms of data representation. The following list gives some of the reasons XML has become as popular as it is today:

  • XML is easy to understand and read.

  • A large number of platforms support XML, and an even larger set of tools is available for reading, writing, and manipulating XML data.

  • XML can be used with many of the open standards available today.

  • XML allows developers to create their own data definitions and models of representation.

  • Because a large number of XML tools are available, XML is simpler to use than binary formats when you want to represent complex data structures.
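To illustrate the points about tooling and developer-defined data models, here is a minimal sketch that uses Python's standard `xml.etree.ElementTree` module to write the Employee document from Listing 1-2 and then read it back. The element names are chosen by the developer, not dictated by XML. The call to `ET.indent()` assumes Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Build the Employee document in code. The element names are our own
# choice; XML itself does not prescribe them.
employee = ET.Element("Employee")
for name, value in [
    ("FirstName", "Bill"),
    ("LastName", "Evjen"),
    ("JobTitle", "Technical Architect"),
    ("Company", "Lipper"),
    ("StartDate", "10/04/2001"),
    ("WorkLocation", "St. Louis, Missouri"),
    ("NumberOfDependents", "3"),
]:
    ET.SubElement(employee, name).text = value

# Add indentation whitespace for readability (Python 3.9+), then serialize.
ET.indent(employee, space="   ")
xml_text = ET.tostring(employee, encoding="unicode")
print(xml_text)

# Read the generated text back with the same module.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("Company"))  # Lipper
```

Because the same library both writes and reads the document, the round trip needs no hand-written parsing code.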




Professional XML
Professional XML (Programmer to Programmer)
ISBN: 0471777773
Year: 2004
Pages: 215
