Gaining an XML Vocabulary

SQL has formal grammar rules that depend on the structure of relational databases. XML has a hierarchical structure based on Standard Generalized Markup Language (SGML) for formatting text documents. XML-related "programming" languages, which manipulate XML documents, have a very complex grammar. Before you can begin to interpret, not to mention write and manipulate, XML documents, you need to know a few basic XML terms.

Note

The quotes around "programming" in the preceding paragraph are to contrast procedural code that you write in VBA or other familiar programming languages with XML code that manipulates XML documents. XML document-manipulation code, written in Extensible Stylesheet Language Transformations (XSLT), deals with the entire document as a single "chunk." An example near the end of this chapter illustrates the use of VBScript code to process XML data documents with XSLT.


Following are brief definitions of XML-related terms used in this and the following two chapters; some definitions include simple XML examples:

  • XML document Any document that follows all XML syntax rules is a well-formed document. An XML document must have at least one pair of tags that define the document root (<root>...</root>) and usually has an XML header. The tags can have any name, but the case must match; unlike HTML, XML is case-sensitive. In this and the following chapters, an XML document is assumed to be a document containing Jet 4.0 or SQL Server 2000 data, with or without an embedded XML schema. XML data documents usually carry an .xml extension; by default, Access names exported data files TableName.xml or QueryName.xml.

  • Element An element is a unit of an XML document that's enclosed between a pair of tags, as in <tag>element</tag>. Elements can be, and almost always are, nested within other elements to form a hierarchical document structure.

  • Attribute An attribute is a name="value" pair that follows the first tag name of an element, as in <tag attribName="attribValue">...</tag>. Attributes usually represent properties of an element.

  • XML header The <?xml version="1.0" ?> header technically is optional, but all XML documents should include the header, which usually specifies the encoding method, as in <?xml version="1.0" encoding="UTF-8" ?>. UTF-8 is an abbreviation for Universal Character Set Transformation Format 8-bit, a variable-width encoding of Unicode (one to four bytes per character) that's supported by most Web browsers.

  • Well-formed A well-formed document is one that an XML parsing tool, such as the MSXML parser included with IE 5+, can display without reporting syntax errors. (Mismatched case in a tag pair, for instance <ROOT>...</root>, is a syntax error.) Well-formed isn't the same as validated.

    XML

    Following is an example of the simplest well-formed XML data document that conveys some information:

     <?xml version="1.0" encoding="UTF-8" ?>
     <dataroot>
        This is data.
     </dataroot>

    If you type the preceding text in Notepad and save the file as Simple.xml, you can open it in IE 5+. IE's XML parser color-codes XML elements: The first line is blue, tags are brown, and values (This is data.) are black and boldface.
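
    As a rough cross-check of the well-formedness test described above, the following Python sketch substitutes the standard library's xml.etree.ElementTree parser for the MSXML parser that the chapter uses. The helper name is_well_formed is hypothetical, and the second document is an invented illustration of the mismatched-case error.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if the parser accepts xml_bytes without a syntax error."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


# The Simple.xml document from the text.
simple = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b"<dataroot>\n"
    b"   This is data.\n"
    b"</dataroot>\n"
)

# Tag names differ only in case, so the closing tag does not match.
mismatched = b"<ROOT>This is data.</root>"

print(is_well_formed(simple))      # True
print(is_well_formed(mismatched))  # False
```

    A check of this kind covers syntax only; it says nothing about whether the document conforms to a DTD or schema.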

  • Validated The original definition of a validated XML document is one that conforms to a predefined Document Type Definition (DTD), which is a holdover from SGML, on which XML (and HTML) is based. DTDs use an arcane syntax and are very difficult to write. An XML schema is more appropriate than a DTD for data documents, but is at least equally difficult to compose. Fortunately, Access generates the schemas for the XML documents you export. All XML document examples in this and the following chapters are well-formed and many are validated against XML schemas during import processing.

  • XML schema Schemas are metadata (data about data) that describe the structure of a table, query result set, or database. The content of Access's Relationships window is an example of a partial schema for a database. The schema is partial because field data types of the tables are missing. XML schemas define the structure of XML documents and the types of data they contain. For data documents, schemas include field data type definitions and, if a query returns data from more than one table, a description of the relationship between the tables. Access names exported schema files TableName.xsd or QueryName.xsd.

    Note

    Schemas exported by Access 2003 conform to the W3C's final recommendation of May 2, 2001, for XML Schema 1.0 (http://www.w3.org/TR/xmlschema-0/ for Part 0: Primer). The accepted file extension for schemas conforming to this recommendation is .xsd.

    Early SQL Server 2000 XML features and ActiveX Data Objects (ADO) 2.1+ use a Microsoft-designed schema called XML Data Reduced (XDR). ADO Recordsets saved to .xml files incorporate XDR schemas and use the attribute-centric format. When saved as schema files, the accepted file extension is .xdr.


  • XML namespace XML namespaces associate element and attribute names with a unique identifier to avoid tag-name ambiguity. The namespace attribute (xmlns) usually but not necessarily has a unique Uniform Resource Identifier (URI) as its value. There's a recent recommendation for XML Namespace attribute names and values at http://www.w3.org/TR/1999/REC-xml-names-19990114/.

    XML

    The following header and namespace declaration appears at the beginning of each XML data document/schema pair you export from an Access 2003 table or query:

     <?xml version="1.0" encoding="UTF-8" ?>
     <dataroot xmlns:od="urn:schemas-microsoft-com:officedata"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="TableOrQueryName.xsd">

    Namespaces for the entire document are declared as attributes of the document root. Access doesn't use the od (officedata) namespace prefix when exporting XML data documents; od appears in schemas. The xmlns:xsi=... line specifies a URI to indicate that the document has an associated schema. The xsi:noNamespaceSchemaLocation= line asserts that the document's data elements are defined by the specified XML schema file: TableOrQueryName.xsd. If the location doesn't have a path or an http://... URL, the .xsd file must be in the same folder as the .xml file.
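
    To make the role of the xsi prefix concrete, here is a minimal Python sketch, assuming the standard library's xml.etree.ElementTree rather than any Access or MSXML tooling. It reads the schema location from a root element like the one above. Shippers.xsd stands in for TableOrQueryName.xsd, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# URI bound to the xsi prefix in the exported document.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

doc = b"""<?xml version="1.0" encoding="UTF-8" ?>
<dataroot xmlns:od="urn:schemas-microsoft-com:officedata"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="Shippers.xsd">
</dataroot>"""

root = ET.fromstring(doc)

# ElementTree replaces the prefix with the namespace URI in braces,
# so the attribute is looked up by URI, not by the literal "xsi:".
schema_location = root.get("{%s}noNamespaceSchemaLocation" % XSI)

print(root.tag)          # dataroot
print(schema_location)   # Shippers.xsd
```

    The prefix itself (xsi) is only shorthand; the URI is what identifies the namespace, which is why the lookup uses the full URI.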

  • Element-centric XML documents in which each element contains a single value, such as a number or a block of text, are called element-centric. XML data documents exported by Access are element-centric. Element-centric XML typically stores table data in <row> elements with <column>value</column> subelements.

    XML

    The shortest XML document that you can generate by exporting an Access object is the Shippers.xml file:

     <?xml version="1.0" encoding="UTF-8" ?>
     <dataroot xmlns:od="urn:schemas-microsoft-com:officedata"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="Shippers.xsd">
         <Shippers>
             <ShipperID>1</ShipperID>
             <CompanyName>Speedy Express</CompanyName>
             <Phone>(503) 555-9831</Phone>
         </Shippers>
         <Shippers>
             <ShipperID>2</ShipperID>
             <CompanyName>United Package</CompanyName>
             <Phone>(503) 555-3199</Phone>
         </Shippers>
         <Shippers>
             <ShipperID>3</ShipperID>
             <CompanyName>Federal Shipping</CompanyName>
             <Phone>(503) 555-9931</Phone>
         </Shippers>
     </dataroot>

    Each sub-element (child) of the Shippers (parent) element consists of a single-valued piece of data that represents field values from the Shippers table.

  • Attribute-centric XML documents whose elements carry multiple values as attributes are called attribute-centric. For data documents, the attribute name usually is the field or column name, and the value is a text representation of the field value.

    XML

    Following is an edited version of the attribute-centric XML document for the Shippers table, which is created by saving an ADO Recordset in an XDR-related format:

    <xml xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'>
    <rs:data>
        <z:row ShipperID='1' CompanyName='Speedy Express' Phone='(503) 555-9831'/>
        <z:row ShipperID='2' CompanyName='United Package' Phone='(503) 555-3199'/>
        <z:row ShipperID='3' CompanyName='Federal Shipping' Phone='(503) 555-9931'/>
    </rs:data>
    </xml>

    Attribute-centric XML can hold the data for an entire row of a table or query result set in a single element. In this case, a set of z:row elements nest within a single rs:data element for the entire Recordset. The attribute-value pairs are FieldName='value'. Single or double quotes must enclose text values. (XML documents created by saving ADO Recordsets in XML format include the schema as a separate set of elements. The schema elements are removed from the preceding XML code.)

  • XML style sheets Extensible Stylesheet Language (XSL) can serve two purposes: defining the presentation of an XML document and transforming one XML document into another XML document with a different structure or into HTML. The most common use of XML style sheets is transforming XML data documents into HTML. XSLT is the language in which you, or more likely others at this point, write XML style sheets. Style sheets are stored in files that carry an .xsl extension (.xslt also is used).
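
The element-centric and attribute-centric layouts defined in the list above carry the same Shippers rows in different shapes. The following Python sketch, which assumes the standard library's xml.etree.ElementTree (not ADO or Access), reads both layouts into the same list of field-name/value dictionaries. The function names rows_from_elements and rows_from_attributes are hypothetical, and the namespace declarations are trimmed from the element-centric sample for brevity.

```python
import xml.etree.ElementTree as ET

# Element-centric: one child element per field (as exported by Access).
element_centric = b"""<dataroot>
  <Shippers>
    <ShipperID>1</ShipperID>
    <CompanyName>Speedy Express</CompanyName>
    <Phone>(503) 555-9831</Phone>
  </Shippers>
  <Shippers>
    <ShipperID>2</ShipperID>
    <CompanyName>United Package</CompanyName>
    <Phone>(503) 555-3199</Phone>
  </Shippers>
  <Shippers>
    <ShipperID>3</ShipperID>
    <CompanyName>Federal Shipping</CompanyName>
    <Phone>(503) 555-9931</Phone>
  </Shippers>
</dataroot>"""

# Attribute-centric: one attribute per field (as saved from an ADO Recordset).
attribute_centric = b"""<xml xmlns:rs='urn:schemas-microsoft-com:rowset'
     xmlns:z='#RowsetSchema'>
  <rs:data>
    <z:row ShipperID='1' CompanyName='Speedy Express' Phone='(503) 555-9831'/>
    <z:row ShipperID='2' CompanyName='United Package' Phone='(503) 555-3199'/>
    <z:row ShipperID='3' CompanyName='Federal Shipping' Phone='(503) 555-9931'/>
  </rs:data>
</xml>"""


def rows_from_elements(xml_bytes, row_tag):
    """Build one dict per row element from its child elements' text."""
    root = ET.fromstring(xml_bytes)
    return [{field.tag: field.text for field in row}
            for row in root.findall(row_tag)]


def rows_from_attributes(xml_bytes):
    """Build one dict per z:row element from its attributes."""
    root = ET.fromstring(xml_bytes)
    # The z prefix maps to the '#RowsetSchema' namespace URI.
    return [dict(row.attrib) for row in root.iter("{#RowsetSchema}row")]


element_rows = rows_from_elements(element_centric, "Shippers")
attribute_rows = rows_from_attributes(attribute_centric)

print(element_rows == attribute_rows)  # True
```

Both readers return every value as text; converting ShipperID to a number would require the field data types that a schema supplies.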

Note

The October 2001 W3C recommendation for Extensible Stylesheet Language (http://www.w3.org/TR/xsl/) was current when this book was written. XSLT 1.0's final recommendation of November 1999 is at http://www.w3.org/TR/xslt.html.


XSLT isn't limited to manipulating XML data documents. The only requirement is that the data processed by XSLT be structured as a tree of tagged nodes, starting with a root node, progressing through sub-nodes, and terminating in leaf nodes. A leaf node contains only text, although the text can comprise hundreds or thousands of lines of, for instance, VBScript code.
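
The tree structure that this requirement describes can be sketched in Python. The example below is not XSLT; it only walks a trimmed Shippers document with the standard library's xml.etree.ElementTree and labels each node as the root, a sub-node, or a leaf. The function name classify is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = b"""<dataroot>
  <Shippers>
    <ShipperID>1</ShipperID>
    <CompanyName>Speedy Express</CompanyName>
  </Shippers>
</dataroot>"""


def classify(node, depth=0, out=None):
    """Walk the tree depth-first, recording (depth, tag, kind) per node."""
    if out is None:
        out = []
    if depth == 0:
        kind = "root"
    elif len(node) == 0:      # no child elements: contains only text
        kind = "leaf"
    else:
        kind = "sub-node"
    out.append((depth, node.tag, kind))
    for child in node:
        classify(child, depth + 1, out)
    return out


nodes = classify(ET.fromstring(doc))
for depth, tag, kind in nodes:
    print("  " * depth + f"{tag} ({kind})")
```

An XSLT processor works against the same kind of tree, matching templates to nodes instead of visiting them in a hand-written loop.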



Special Edition Using Microsoft Office Access 2003
ISBN: 0789729520
EAN: 2147483647
Year: 2005
Pages: 417
