Section 23.1. XML 101

23.1. XML 101

XML alone sounds pretty modest. It's often described as a format for storing information. For example, instead of saving data in Word documents (.doc files), Excel spreadsheets (.xls files), or ordinary text files, you can save data in an XML file. This simplicity is deceiving, and two factors make XML really special:

  • XML is flexible . You can tailor XML to store pretty much any type of information: pictures, product catalogs, invoice data, receipts, catalog listings, the maintenance spec's for every Dodge Minivan ever built, and on and on.

  • XML is widespread . Computer applications written in different programming languages (for example Java, Visual Basic, or C++), or running on different operating systems and computer hardware (for example, Windows, Mac, or Linux), can all use XML in exactly the same way. That makes XML a perfect solution for exchanging information between people, companies, and even computers that have been programmed to send data to one another automatically (it's features like this last one that cause supply-chain management types to start drooling when they talk about XML).

23.1.1. What Is XML, Really?

Contrary to what many people believe, XML is not a data format (as is, for example HTML, the format used to create Web pages). If XML were an ordinary data format, it wouldn't be nearly so useful because, no matter how good a format is, it can't suit everyone. For example, even though almost every company needs to create invoices, most companies wouldn't be happy with a generic format for storing invoice information. One company might need to track customer names , while another might track customer ID numbers . The bottom line is that most companies need to store slightly different data in slightly different ways. That means a one- size -fits-all solution is pretty much always doomed to failure.

So if XML isn't a data format, what is it? Technically, XML is a meta-language , which is a fancy way of saying that XML is a language for creating other languages. XML does this by setting out a few simple rules that let you build your own data format that's just right for your data.

For example, Acme Company can build an XML format for invoices and call it AcmeInvoice. Meanwhile, Budget Company can build its own XML invoice format and call it BudgetInvoice. Even though both these formats are designed to store invoice information, they can contain completely different kinds of data. XML's flexibility is its strength.

At the same time, XML's flexibility can create problems. Imagine that a bank named Worldwide Green sets up a system to process XML invoices in a specific format automatically. The system works smoothly until Acme Corporation sends along its own homegrown invoice. Even though Acme's invoice uses XML, it doesn't conform to the XML that the bank expects, and so it gums up the bank's automated invoice-processing application. Suddenly, XML doesn't look so useful.

The bottom line is this: XML holds the promise of universal data sharingbut if you don't create some rules and follow them, you're left with a bunch of incompatible formats.


Note: XML is really quite simple. However, there are a slew of other standards with names like XML Schema and XSLT that work in conjunction with XML and that provide solutions for validating XML, searching XML, transforming XML, and so on. These other standards are quite complex, and aren't discussed in this book. For more information, refer to a book like Learning XML by Erik Ray (O'Reilly, 2003), or the Web site www.w3schools.com/xml.

23.1.2. Three Rules of XML

To get a better understanding of how to configure Excel to handle XML, it helps to look at a simple example. Technically, you don't need to know what XML looks like in order to use the XML features in Excel, but the more you understand, the less confusing life will be. In this section, you'll learn the three most important rules that shape all XML documents. If you already know a little about XML, feel free to skip ahead.

By the way, good news before you even start: XML is written in a text-based, human-readable format. That means you can use a program like Notepad to crack open an existing XML file and get a basic idea of its format and structure. You can even write an XML file from scratch using Notepad. You can't do the same with the average Excel spreadsheet file because they're stored in a binary format that you can read only when you're looking at the file in Excel. (If you open an ordinary Excel file in Notepad, you'll see a jumble of indecipherable symbols.)

23.1.2.1 The prolog

All respectable XML documents start with something called a document prolog . This bit simply announces that what you're looking at is an XML document. It can also indicate the encoding of the document, which sometimes specifies that the document uses a special character set (for example, a non-English alphabet).

Here's a typical document prolog, indicating that this document uses Version 1.0 of the XML standard (the most prevalent version):

 <?xml version="1.0" ?> 

If you're creating an XML document by hand, you should make sure you place the document prolog as the very first line of the file.

23.1.2.2 Elements

The basic building block of any XML document is the element . Elements are information containers. For example, if you wanted to store a person's name, you might create an element called Name . (For more on the infinite variety of elements that anyone can create, see the box on Sidebar 23-SIDEBAR-1">Sidebar 23-1.

A typical element is composed of a start tag and an end tag. The actual information goes between these two tags. You can easily recognize start tags and end tags because they use angle brackets <>. For example, here's one possible start tag:

 <Name> 

This tag marks the start of the Name element. The end tag looks almost identical, except it begins with the characters </ instead of just <. Here's what you need to end the Name element:

 </Name> 

To actually store some information in an XML document, you just insert the content between the start and end tag of an element. For example, here's how you might store someone's name in an XML document:

 <Name>Patrick</Name> 

You could create a list of names by putting one <Name> element after the other, or you could add other elements that store different types of information, like address, title, employer, and so on. You put all these tags together in a file to make an XML document.

UP TO SPEED
A Closer Look at Tags

Tags follow fairly strict naming rules. They can be of any length, are case-sensitive, include any alphanumeric character and hyphens (-), underscores (_), and periods (.). You can't use other special characters, including spaces, and the tag name must start with an underscore or letter. XML documents also support characters from non-English alphabets.

The most important thing you should understand about tags is that it's up to you to create them. If you decide that you need to store a list of names, you might create an XML format that uses a <Name> tag. Meanwhile, someone else might decide to track name information by creating another XML format that uses elements like <firstName> and <lastName>. These two elements might store the same type of information as your <Name> element, but they're different, and a document written with the <firstName> and <lastName> tags isn't compatible with your documents.

Because there are so many possible XML formats, a lot of intelligent people have invested a lot of time and energy in trying to create ways to define and manage different XML formats. Also, companies and organizations have come together to define specific XML standards for different industries. (For example, if you search on the Internet, you'll find predefined XML formats for law, science, real estate, and much more.) When you use an XML document in Excel, you probably won't be responsible for creating it. Instead, you'll be given a document that's already in a specific XML format, and you'll just be interested in retrieving the information that it contains.


23.1.2.3 Nesting

So far, you've seen examples of XML elements that contain text. Another option is to create an element that contains one or more additional elements. This is a basic principle for organizing information in XML.

For example, imagine you want to keep track of the name and age of several people. The following format isn't especially clear because it's hard to tell which person connects to which age:

 <Name>Lisa Chen</Name> <Age>19</Age> <Name>Bill Harrison</Name> <Age>48</Age> 

A better solution is to group the <Name> and <Age> elements together for each person, and put them inside another element. Here's an example:

 <Person>     <Name>Lisa Chen</Name>     <Age>19</Age> </Person>     <Person>     <Name>Bill Harrison</Name>     <Age>48</Age> </Person> 

Here, the two <Person> elements each represent a distinct individual. Information about each person is stored in <Name> and <Age> elements that are nested inside the appropriate <Person> element.

There's no limit to how many layers deep you can nest information, making this method of organizing information extremely flexible. In fact, it's part of the reason that XML can work with so many different types of data.

XML imposes one more rule. Every document must start with a single element that you place right after the document prolog. You place all the other content inside this element, which is called the root or document element . So far, the examples you've seen are only excerpts of XML. The following listing shows a complete, valid XML documenta list with information about two peoplethat starts off with the document element <PeopleList>:

 <?xml version="1.0" ?> <PeopleList>     <Person>         <Name>Lisa Chen</Name>         <Age>19</Age>     </Person>         <Person>         <Name>Bill Harrison</Name>         <Age>48</Age>     </Person> </PeopleList> 

You could enhance this document by adding more <Person> elements or by adding different elements to track additional information about each person.

You've probably noticed that these XML examples indent each level of elements. That indentation makes the overall structure easier to read, but it's not required. In fact, applications that read XML (including Excel) ignore all the white space in between elements, so it doesn't matter if you add spaces, tabs, and blank lines. In fact, as far as computers are concerned , the document above is exactly the same as the following, much less human-friendly version:

 <?xml version="1.0" ?> <PeopleList><Person><Name>Lisa Chen</Name><Age>19</Age></Person><Person><Name>Bill Harrison</Name><Age>48</Age></Person> </PeopleList> 

23.1.3. XML Files and Schemas

As you've already learned, one place you can store XML documents is in a file. But you can just as easily place XML documents in databases or other storage locations. In fact, sometimes XML data isn't stored anywhereinstead, it's just used to send information between applications over the Internet. However, when you use XML with Excel, you're always using XML files (unless your company has created a custom solution using Excel's heavy-duty programming features). Most XML files have the extension .xml. For example, it makes perfect sense to take the person list document shown earlier and place it in a text file named PersonList.xml.


Tip: The Excel XML features make the most sense when you have information that's already in an XML format. In previous versions of Excel, there was no way to import information in XML files, and so you couldn't analyze it with formulas and charts . In Excel 2003, this barrier is gone. However, if you don't have XML information that you need to work with, there's really no reason to play around with Excel's XML features. For example, if your data is in a normal text file or a database, you can use the techniques in Chapter 22 to get at it instead.

There's another type of XML document that's extremely important: XML schemas . Schemas are designed to solve a common problemnamely, defining the rules for a specific XML-based format. For example, a schema indicates the element names you can use, how you can arrange the elements, and the type of information each element can contain. An XML-friendly application like Excel can use the schema to verify that an XML document uses the right structure and contains the appropriate content. In an ideal world, every time a company created an XML format, they'd write an XML schema that defines it. (You probably won't be surprised to learn this doesn't always happen.) Excel doesn't include schemas of its own, so it's up to you to supply the schema based on the XML you want to use.

This book doesn't look at XML schemas in detail because they are a more complicated standard than XML. However, even if you don't know how to build your own schemas, you can still use other people's schemas with Excel. In fact, you'll find that doing so actually simplifies your life! In order to use a schema, you simply need to have a copy of it in a file (schemas themselves are complex and ugly and beyond the scope of what a typical office needsor wantsto learn). Usually, schema files have the extension .xsd.


Note: For a more comprehensive beginner's introduction to XML and XML schemas, check out the excellent online tutorial provided by W3 Schools at www.w3schools.com/xml.


Excel. The Missing Manual
Excel 2010: The Missing Manual
ISBN: 1449382355
EAN: 2147483647
Year: 2003
Pages: 185

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net