An XML document contains only text data. Therefore, an XML document can be easily created or edited using any simple text editor (although you can also purchase specialized XML editors or use the XML editor built in to Visual Studio .NET). An XML document does not contain any predefined tags, but XML adopts a strict syntax for defining data in the XML document. This section discusses the various rules you should follow while creating an XML document.
Well-Formed XML Document: If an XML document conforms to the XML syntax, it is known as a well-formed XML document.
Here's a concrete example to start with. This XML file represents data for two customers:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Customer list for Bob's Tractor Parts -->
<Customers>
  <Customer CustomerNumber="1">
    <CustomerName>Lambert Tractor Works</CustomerName>
    <CustomerCity>Millbank</CustomerCity>
    <CustomerState>WA</CustomerState>
  </Customer>
  <Customer CustomerNumber="2">
    <CustomerName><![CDATA[Joe's Garage]]></CustomerName>
    <CustomerCity>Doppel</CustomerCity>
    <CustomerState>OR</CustomerState>
  </Customer>
</Customers>
The first thing that you find in an XML file is the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
The declaration tells you three things about this document:
It's an XML document.
It conforms to the XML 1.0 specification of W3C.
It uses the UTF-8 character encoding (an encoding of the Unicode character set, which covers virtually all of the world's writing systems and is backward compatible with ASCII).
The XML declaration is optional. However, the W3C XML recommendation suggests that you include it, because it identifies the XML version and the character encoding used in the document. Constructs that begin with a <? string and end with a ?> string are called processing instructions (PIs). Processing instructions tell the application that is processing an XML document how to handle it. Strictly speaking, the XML specification treats the XML declaration as a construct of its own, although it uses the same syntax as a processing instruction.
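To illustrate how the encoding named in the declaration is used, here is a minimal Python sketch based on the standard library's xml.etree.ElementTree module. The one-element document and the city name are made up for this illustration and are not part of the sample document.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; the city name contains a non-ASCII letter.
text = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
        '<CustomerCity>M\u00e1laga</CustomerCity>')

# Encode the text with the same encoding that the declaration names.
data = text.encode("iso-8859-1")

# Given raw bytes, the parser reads the declaration to decide how to decode them.
root = ET.fromstring(data)
print(root.tag, root.text)
```

If the bytes and the declared encoding disagree, a parser may reject the document or decode the text incorrectly, which is one practical reason to include the declaration.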
An XML document can contain comments. Comments are set off by the opening string <!-- and the closing string -->. Here's an example:
<!-- Customer list for Bob's Tractor Parts -->
Even without knowing anything about XML, you can see some things just by looking at the sample document. In particular, XML consists of tags (that is, markup contained within angle brackets) and character data. The character data is the information stored in the XML file, and the tags record the structure of that information. XML tags usually describe the data they contain rather than its format or layout.
Tags appear in pairs, with each opening tag matched by a closing tag. The closing tag has the same text as the opening tag, prefixed with a forward slash. For example, if <CustomerCity> is the opening tag, </CustomerCity> is the closing tag.
An opening tag together with a closing tag and the content between them define an element. Tags in an XML document contain the name of the element. For example, here's a single element from the sample document:
<CustomerState>OR</CustomerState>
This defines an element whose name is CustomerState and whose data is OR.
Elements can sometimes contain no content. Such elements are called empty elements. For example, here's an element with no content:
<Citizen></Citizen>
This defines an element whose name is Citizen.
Empty elements can also be defined using a shorter syntax; for example,
<Citizen/>
In this case, you do not provide the closing tag, but you end the opening tag with />.
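As a quick check that the two ways of writing an empty element are equivalent, the following Python sketch parses both forms with the standard library and compares the results. The summarize helper is a name introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# The same empty element written with a closing tag and with the short syntax.
long_form = ET.fromstring("<Citizen></Citizen>")
short_form = ET.fromstring("<Citizen/>")

def summarize(element):
    """Return (name, character data, number of child elements)."""
    return (element.tag, element.text or "", len(element))

# Both forms produce an element named Citizen with no data and no children.
same = summarize(long_form) == summarize(short_form)
print(same)
```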
Elements can be nested, but they cannot overlap. So the following is legal XML, defining an element named Customer that has three child elements:
<Customer CustomerNumber="1">
  <CustomerName>Lambert Tractor Works</CustomerName>
  <CustomerCity>Millbank</CustomerCity>
  <CustomerState>WA</CustomerState>
</Customer>
But the following is not legal XML because the CustomerCity and CustomerState elements overlap:
<Customer CustomerNumber="1">
  <CustomerName>Lambert Tractor Works</CustomerName>
  <CustomerCity>Millbank<CustomerState>
  </CustomerCity>WA</CustomerState>
</Customer>
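The difference between the nested and the overlapping versions can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree parser, which raises ParseError for input that is not well formed; is_well_formed is a helper name chosen for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Properly nested child elements.
nested = """<Customer CustomerNumber="1">
  <CustomerName>Lambert Tractor Works</CustomerName>
  <CustomerCity>Millbank</CustomerCity>
  <CustomerState>WA</CustomerState>
</Customer>"""

# CustomerCity is closed while CustomerState is still open, so they overlap.
overlapping = """<Customer CustomerNumber="1">
  <CustomerName>Lambert Tractor Works</CustomerName>
  <CustomerCity>Millbank<CustomerState>
  </CustomerCity>WA</CustomerState>
</Customer>"""

print(is_well_formed(nested))
print(is_well_formed(overlapping))
```

A check like this confirms only well-formedness. It says nothing about whether the element names or values make sense for a particular application.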
Nested elements have a logical parent-child relationship. So, in the preceding example, the <Customer> element is the parent of the <CustomerName> element. A parent element can contain any number of child elements, but a child element belongs to only one parent element. Every element in an XML document is a child element except one: the root element, which contains all the other elements. The root element in the sample document is named Customers.
Taken together, these rules (nesting is allowed, overlapping is not, and there is a single root element) mean that any XML document can be represented as a tree of nodes. Figure B.1 illustrates the tree-like structure of the sample document.
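The tree structure can also be seen by walking the parsed document. The following Python sketch parses the sample document with the standard library and prints one indented line per element; the outline function is a name introduced for this illustration, and it shows only element names, not the full set of node types.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!-- Customer list for Bob's Tractor Parts -->
<Customers>
  <Customer CustomerNumber="1">
    <CustomerName>Lambert Tractor Works</CustomerName>
    <CustomerCity>Millbank</CustomerCity>
    <CustomerState>WA</CustomerState>
  </Customer>
  <Customer CustomerNumber="2">
    <CustomerName><![CDATA[Joe's Garage]]></CustomerName>
    <CustomerCity>Doppel</CustomerCity>
    <CustomerState>OR</CustomerState>
  </Customer>
</Customers>"""

def outline(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(SAMPLE)  # root is the Customers element
tree_lines = outline(root)
print("\n".join(tree_lines))
```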
Elements can contain attributes. An attribute is a piece of data that further describes an element. Attributes appear as name-value pairs in the opening tag of an element. For example, the sample document includes this opening tag for an element:
<Customer CustomerNumber="1">
This declares an element named Customer. The Customer element includes an attribute whose name is CustomerNumber and whose value is 1.
The value of an attribute must always be enclosed in either single or double quotation marks.
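As an illustration of how a program reads attributes, and of the quoting rule, here is a short Python sketch using the standard library parser. The variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

# Attribute value in double quotes, as in the sample document.
element = ET.fromstring('<Customer CustomerNumber="1"></Customer>')
number = element.get("CustomerNumber")  # attribute values are always strings

# Single quotes are equally acceptable.
single = ET.fromstring("<Customer CustomerNumber='1'/>")

# An unquoted value is not well formed, so the parser rejects it.
try:
    ET.fromstring("<Customer CustomerNumber=1/>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

print(number, single.get("CustomerNumber"), unquoted_ok)
```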
If you're familiar with HTML, you know that in HTML, some elements have names dictated by the HTML specification. For example, the <H1> tag specifies a first-level heading. XML takes a different approach. You can make up any name you like for an XML element or attribute, subject to some simple naming rules:
A name can contain any alphanumeric character.
A name can contain underscores, hyphens, and periods.
A name must not contain any whitespace.
A name must start with a letter or an underscore.
The names of XML elements and attributes can be in uppercase, lowercase, or both, but they are case sensitive. For example, the following is not legal XML because the opening and closing tags of the element do not match:
<CustomerState>OR</customerstate>
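The naming rules and the case-sensitivity rule described above can be sketched in Python as follows. The regular expression is a deliberately simplified, ASCII-only reading of the four rules in the list; the full XML specification also permits many non-ASCII letters and treats the colon specially, so this is an approximation rather than a complete validator. The names NAME_PATTERN, is_simple_xml_name, and parses are introduced for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only version of the naming rules: start with a letter or
# underscore, then letters, digits, underscores, hyphens, or periods.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_simple_xml_name(name):
    """Return True if name satisfies the simplified naming rules."""
    return NAME_PATTERN.fullmatch(name) is not None

def parses(xml_text):
    """Return True if the standard library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

for candidate in ["CustomerName", "_id", "customer-name.v2",
                  "1stCustomer", "Customer Name"]:
    print(candidate, is_simple_xml_name(candidate))

# Names are case sensitive, so these opening and closing tags do not match.
print(parses("<CustomerState>OR</customerstate>"))
```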
Some characters have special meaning for the programs that process XML. For example, the opening angle bracket indicates the beginning of a tag, quotation marks enclose attribute values, and so on. But sometimes you might also want to use these characters as part of the XML data.
XML offers two ways to deal with special characters in data. First, for individual characters, you can use entity references. Five entity references are defined in the XML standard:
&lt; Translates to < (opening angle bracket).
&gt; Translates to > (closing angle bracket).
&amp; Translates to & (ampersand).
&apos; Translates to ' (apostrophe).
&quot; Translates to " (quotation mark).
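You can also use a CDATA section to hold arbitrary data, whether or not the data contains special characters. The only sequence a CDATA section cannot contain is its own terminator, ]]>. The sample document uses this approach to store a customer name containing an apostrophe, as shown here:
<CustomerName><![CDATA[Joe's Garage]]></CustomerName>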
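Both approaches yield the same character data once the document is parsed. The Python sketch below demonstrates this with the standard library, and also uses xml.sax.saxutils.escape to produce entity references for text before it is placed in a document. The Note element and the raw string are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The same customer name stored with an entity reference and with CDATA.
with_entity = ET.fromstring("<CustomerName>Joe&apos;s Garage</CustomerName>")
with_cdata = ET.fromstring(
    "<CustomerName><![CDATA[Joe's Garage]]></CustomerName>")
print(with_entity.text == with_cdata.text)

# escape() replaces &, <, and > with their entity references.
raw = "Parts & Service <24h>"
escaped = escape(raw)

# Parsing the escaped text gives back the original characters.
round_trip = ET.fromstring("<Note>" + escaped + "</Note>").text
print(escaped)
print(round_trip)
```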
An XML document can contain one or more namespace declarations. The sample document does not declare a namespace. Here's an example of a namespace declaration:
The namespace is declared as part of the root tag for the document. In this particular case, the namespace declaration (introduced with the special xmlns attribute) defines the prefix tr for tags within the namespace. The URN (uniform resource name) is an arbitrary string whose purpose is to distinguish this namespace from other namespaces.
XML namespaces serve the same purpose as .NET namespaces: They help cut down on naming collisions. After declaring the tr namespace, an XML document could use a tag such as this:
<tr:CustomerState>OR</tr:CustomerState>
This example indicates that the CustomerState tag is from the tr namespace and should not be confused with any other CustomerState tag.
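To show how a parser keeps namespaced and plain tags apart, here is a minimal Python sketch using the standard library. The URN urn:example:tractor-repair is a placeholder invented for this illustration, not the one used with the sample document, and the small document below is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace name, assumed for this example only.
TR_URN = "urn:example:tractor-repair"

doc = f"""<Customers xmlns:tr="{TR_URN}">
  <Customer>
    <tr:CustomerState>OR</tr:CustomerState>
    <CustomerState>unqualified</CustomerState>
  </Customer>
</Customers>"""

root = ET.fromstring(doc)

# Map the prefix to the namespace name so the search path can use tr:.
qualified = root.find("Customer/tr:CustomerState", {"tr": TR_URN})

# Without a prefix, only the element outside the namespace is matched.
plain = root.find("Customer/CustomerState")

print(qualified.tag, qualified.text)
print(plain.tag, plain.text)
```

The parser identifies the qualified element by the namespace name rather than by the prefix, so a different document could bind another prefix to the same URN and still refer to the same tag.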