19.4. Access and XML
One of the hottest buzzwords in the computer world is XML (the extensible markup language), an all-purpose way of exchanging information between different programs. Access 2007 supports XML with its import and export features, where XML shows up as just one more supported format. However, if you really want to understand how the Access XML features workand whether or not they really add anything newyou need to dig a little deeper.
19.4.1. What Is XML, Really?
XML alone sounds pretty modest. People often describe it as a format for storing information. For example, instead of saving data in Word documents, Excel spreadsheets, or ordinary text files, you can save data in an XML file. This simplicity's deceiving, and two factors make XML really special:
Contrary to what many people believe, XML is not a data format (like HTML, the format used to create Web pages). If XML were an ordinary data format, it wouldn't be nearly as useful because, no matter how good a format is, it can't suit everyone. For example, even though almost every company needs to create invoices, most companies wouldn't be happy with a generic format for storing invoice information. One company may need to track customer names , while another might track customer ID numbers . The bottom line is that most companies need to store slightly different data in slightly different ways. That means a one- size -fits-all solution is pretty much always doomed to failure.
So if XML isn't a data format, what is it? Technically, XML is a meta-language , which is a fancy way of saying that XML is a language for creating other languages. XML does this creating by setting out a few simple rules that let you build your own data format that's just right for your data.
For example, Acme Company can build an XML format for invoices, and call it AcmeInvoice. Meanwhile, Budget Company can build its own XML invoice format and call it BudgetInvoice. Even though both these formats are designed to store invoice information, they can contain completely different kinds of data. XML's flexibility is its strength.
At the same time, XML's flexibility can create problems. Suppose a bank named Worldwide Green sets up a system to automatically process XML invoices in a specific format. The system works smoothly until Acme Corporation sends along its own homegrown invoice. Even though Acme's invoice uses XML, it doesn't conform to the XML that the bank expects, and so it gums up the bank's automated invoice-processing application. Suddenly, XML doesn't look so useful.
The bottom line is: XML holds the promise of universal data sharingbut if you don't create some rules and follow them, then you're left with a bunch of incompatible formats.
Note: XML is really quite simple. However, there are a slew of other standards with names like XML Schema and XSLT that work in conjunction with XML and provide solutions for validating XML, searching XML, transforming XML, and so on. These other standards are quite complex and aren't discussed in this book. For more information, refer to a book like Learning XML by Erik Ray (O'Reilly), or the Web site www.w3schools.com/xml.
19.4.2. Three Rules of XML
To get a better understanding of how to configure Access to handle XML, look at a simple example. Technically, you don't need to know what XML looks like in order to use the XML features in Access, but the more you understand, the less confusing life will be. In this section, you'll learn the three most important rules that shape all XML documents. If you already know a little about XML, feel free to skip ahead.
By the way, good news before you even start: XML is written in a text-based, human-readable format. So you can use a program like Notepad to crack open an existing XML file, and get a basic idea of its format and structure. You can even write an XML file from scratch using Notepad. You can't do the same with the average Access database, because it's stored in a binary format that you can read only when you're looking at the data in Access. (If you try to open a database in Notepad, you'll see a jumble of indecipherable symbols.)
126.96.36.199. The prolog
All respectable XML documents start with something called a document prolog . This bit simply announces that what you're looking at is an XML document. It can also indicate the encoding of the document, which sometimes specifies that the document uses a special character set (like a non-English alphabet).
Here's a typical document prolog, indicating that this document uses Version 1.0 of the XML standard (the most prevalent version):
<?xml version="1.0" ?>
If you're creating an XML document by hand, then you should make sure you place the document prolog as the very first line of the file.
The basic building block of any XML document is the element . Elements are information containers. For example, if you wanted to store a person's name, you could create an element called Name . (For more on the infinite variety of elements that anyone can create, see the box in Section 188.8.131.52.)
A typical element's composed of a start tag and an end tag. The actual information goes between these two tags. You can easily recognize start tags and end tags because they use angle brackets <>. Here's one possible start tag:
This tag marks the start of the Name element. The end tag looks almost identical, except it begins with the characters </ instead of just <. Here's what you need to end the Name element:
To actually store some information in an XML document, you just insert the content between the start and end tag of an element. Here's how you might store someone's name in an XML document:
You could create a list of names by putting one <Name> element after the other, or you could add other elements that store different types of information, like address, title, employer, and so on. You put all these tags together in a file to make an XML document.
So far, you've seen examples of XML elements that contain text. You can also create an element that contains one or more additional elements. This is a basic principle for organizing information in XML.
Suppose you want to keep track of several people's names and ages. The following format isn't especially clear because it's hard to tell which person connects to which age:
<Name>Lisa Chen</Name> <Age>19</Age> <Name>Bill Harrison</Name> <Age>48</Age>
A better solution is to group the <Name> and <Age> elements together for each person, and put them inside another element. Here's an example:
<Person> <Name>Lisa Chen</Name> <Age>19</Age> </Person> <Person> <Name>Bill Harrison</Name> <Age>48</Age> </Person>
Here, the two <Person> elements each represent a distinct individual. Information about each person's stored in <Name> and <Age> elements that are nested inside the appropriate <Person> element.
There's no limit to how many layers deep you can nest information, making this method of organizing information extremely flexible. In fact, it's part of the reason that XML can work with so many different types of data.
XML imposes one more rule. Every document must start with a single element that you place right after the document prolog. You place all the other content inside this element, which is called the root or document element . So far, the examples you've seen are only excerpts of XML. The following listing shows a complete, valid XML documenta list with information about two peoplethat starts off with the document element <PeopleList>:
<?xml version="1.0" ?> <PeopleList> <Person> <Name>Lisa Chen</Name> <Age>19</Age> </Person> <Person> <Name>Bill Harrison</Name> <Age>48</Age> </Person> </PeopleList>
You could enhance this document by adding more <Person> elements or different elements to track additional information about each person.
You've probably noticed that these XML examples indent each level of elements. That indentation makes the overall structure easier to read, but it's not required. In fact, applications that read XML (including Access) ignore all the white space between elements, so it doesn't matter if you add spaces, tabs, and blank lines. In fact, as far as computers are concerned , the document above is exactly the same as the following, much less human-friendly version:
<?xml version="1.0" ?> <PeopleList><Person><Name>Lisa Chen</Name><Age>19</ Age></Person><Person><Name>Bill Harrison</Name><Age>48 </Age></Person></PeopleList>
19.4.3. XML Files and Schemas
As you've already learned, a file's one place you can store XML documents. But you can just as easily place XML documents in databases or other storage locations. In fact, sometimes XML data isn't stored anywhereinstead, people just use it to send information between applications over the Internet. However, when you use XML with Access, you're always using XML files (unless your company has created a custom solution using the heavy-duty programming features in Access).
Most XML files have the extension .xml. For example, it makes perfect sense to take the person list document shown earlier and place it in a text file named PersonList.xml.
Another type of XML document's extremely important: XML schemas . Schemas are designed to solve a common problemnamely, defining the rules for a specific XML-based format. For example, a schema indicates the element names you can use, how you can arrange the elements, and the type of information each element can contain. An XML-friendly application can use the schema to verify that an XML document uses the right structure and contains the appropriate content. In an ideal world, every time a company created an XML format, they'd write an XML schema that defines it. (You probably won't be surprised to learn this doesn't always happen.)
In order to use a schema, you simply need to have a copy of it in a file. (Schemas themselves are complex and ugly and beyond the scope of what a typical office needsor wantsto learn.) Usually, schema files have the extension .xsd.
Note: For a more comprehensive beginner's introduction to XML and XML schemas, check out the excellent online tutorial provided by W3 Schools at www.w3schools.com/xml.
19.4.4. The Access XML Story
XML is a great way to exchange data between different computer programs. But what does that have to do with Access, which already has its own perfectly good way of storing data? Here's the deal: More and more companies today use XML to pass data back and forth. When companies exchange business orders, for instance, or news organizations post stories, or real estate firms list properties for sale, chances are they're using an XML-based format. If you want to send your Access data to these systems, then you need a way to take it out of the specialized .accdb database format, and put it in clear-as-a-bell XML.
Unfortunately, the XML support in Access is still quite limited. The problem's that Access doesn't let you pick the XML format you want. Instead, it creates a custom format that closely matches your table. Consider the table in Figure 19-11. (When exporting XML, you always export a complete table.)
When you export this table, Access creates an XML document that looks like this:
<dataroot> <Product> <ProductID>371</ProductID> <Name>Thin-Jam Hex Nut 7</Name> <ProductNumber>HJ-7161</ProductNumber> <SafetyStockLevel>1000</SafetyStockLevel> <ReorderPoint>750</ReorderPoint> </Product> <Product> <ProductID>372</ProductID> <Name>Thin-Jam Hex Nut 8</Name> <ProductNumber>HJ-7162</ProductNumber> <SafetyStockLevel>1000</SafetyStockLevel> <ReorderPoint>750</ReorderPoint> </Product> </dataroot>
No matter what table you export, Access always follows the same rules:
There's nothing particularly wrong with structuring XML in this way. However, since you can't change the structure, you'll run into trouble if you want to use another program that expects XML in a different format. For example, your program may expect the root element to be named <ProductRecords> instead of <dataroot>, or it may assume a slightly different nesting. Minor quibbles like these can completely derail an XML-processing application.
Sadly, there's no way around this problem. In order to use Access XML, you must specifically design a program that recognizes this structure, or you must use another tool to convert the XML to the standard you really want. Access's XML export feature's enough to get you started, but it doesn't take your data all the way.
Note: If all you need to do is filter out records or fields that don't interest you, or give fields different names, then you can solve the problem with a query. Just create a query that presents the information the way you want it, and then export its results (rather than the whole table).
The same limitations appear when you import XML content. Access expects to find XML content in the rigid table-like format it expects. If you try to feed it a different type of XML, then you get an error.
19.4.5. Exporting to an XML File
Now that you've learned about XML and considered its limitations in Access, you're ready to try it out for yourself. The following steps lead you through the process:
19.4.6. Importing from an XML File
Access makes it just as easy to import XML data, provided it's in the structure Access expects. To try it out, take the table you just exported, and then reimport it into a new database. Here's how to do it: