Section 21.1. Built-in datatypes | XML in Office 2003: Information Sharing with Desktop XML



21.1. Built-in datatypes

The XML Schema Datatypes spec defines two categories of datatype: primitive and derived. All primitive datatypes are defined in the spec and are therefore among the built-in datatypes. You may not create your own primitive datatype.

A derived datatype is defined in terms of one or more existing datatypes. It might be a specialized or extended version of another datatype. You may define your own derived datatypes. The spec also includes several derived datatypes among its built-in datatypes, so every implementation supports them.

Figure 21-1 illustrates the built-in datatypes, showing the derivations.

Figure 21-1. Built-in datatypes

21.1.1 Primitive datatypes

The primitive datatypes are the building blocks of all others. Most are also useful just by themselves.

21.1.1.1 Common programming datatypes

The first five we will cover are datatypes common to most programming languages and database management systems.

string

An arbitrarily long sequence of characters. You might use this for a title element's datatype.

boolean

True and false values. The data must be one of four strings: false or 0 for false values, and true or 1 for true values. You might use this datatype to record whether a checkbox in a graphical user interface should be turned on by default (true) or turned off by default (false).
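As a rough illustration of the four permitted strings, here is a minimal Python sketch. The function name parse_xsd_boolean is hypothetical, and stripping surrounding whitespace is an assumption of this sketch rather than something stated above.

```python
def parse_xsd_boolean(text: str) -> bool:
    """Map the four boolean strings (true, 1, false, 0) to a Python bool."""
    value = text.strip()  # assumption: surrounding whitespace is ignored
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    # Anything else, including "True" or "yes", is rejected.
    raise ValueError(f"not a valid boolean value: {text!r}")
```

Note that the comparison is case-sensitive in this sketch, so "TRUE" is rejected just like any other unexpected string.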

decimal

Arbitrary precision decimal numbers. These can represent anything numerical: height, weight, financial amounts, etc. Decimal numbers may have a fractional part that follows a period, as in 5.3. Decimal numbers may be preceded by a plus (+) or minus (-) sign to indicate whether they are positive or negative.
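The following Python sketch shows one way to check and read such a value. The function name parse_xsd_decimal and the pattern are illustrative assumptions, not taken from the spec text; in particular, rejecting exponent notation such as 2.5E5 reflects our reading that exponents belong to the floating point types only.

```python
import re
from decimal import Decimal

# Optional sign, then digits with an optional fractional part after a period.
DECIMAL_FORM = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def parse_xsd_decimal(text: str) -> Decimal:
    """Return an exact Decimal for a plain signed decimal string."""
    value = text.strip()
    if DECIMAL_FORM.fullmatch(value) is None:
        raise ValueError(f"not a plain decimal number: {text!r}")
    return Decimal(value)  # Decimal keeps every digit that was written

print(parse_xsd_decimal("5.3") + parse_xsd_decimal("-0.3"))  # 5.0
```

Python's Decimal type is used here because, like the decimal datatype, it does not round the digits you give it.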

float

Single precision 32-bit floating point numbers. This is a form of number that is very efficient for numerical computation on 32-bit computers. Floats may have a fractional part and may also be preceded by a plus or minus sign. Compared to decimals, floats are less precise. They are an approximation. Sometimes the number you put into the computer will not be exactly the number you get out later!^[1]

^[1] If you do use them, they may have exponents (preceded by "e" or "E", as in "2.5E5") and may take the values "NaN", "INF" and "-INF".

double

Double precision 64-bit floating point numbers. These numbers are more precise than single-precision floats but are still approximations.^[2]

^[2] And the same footnote applies!
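To make the approximation concrete, the following Python sketch stores 0.1 as a 32-bit and as a 64-bit floating point number and compares each with the exact decimal value. The helper name as_float32 is hypothetical; the struct module is used only to force single precision, since Python's own float is double precision.

```python
import struct
from decimal import Decimal

def as_float32(text: str) -> float:
    """Round a number to the nearest 32-bit float, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", float(text)))[0]

exact = Decimal("0.1")
single = Decimal(as_float32("0.1"))  # exact value held by the 32-bit float
double = Decimal(float("0.1"))       # exact value held by the 64-bit float

print(single == exact)                              # False
print(double == exact)                              # False
print(abs(double - exact) < abs(single - exact))    # True
```

Neither floating point form holds exactly 0.1, but the double lands much closer to it than the float does.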

21.1.1.2 XML datatypes

anyURI

This extremely important primitive datatype is used for URIs (typically URLs). These may include so-called fragment identifiers after a pound sign.

QName

This primitive datatype is based on the XML namespaces specification. It stands for a namespace-qualified name. In other words, a QName is a name that may have a colon in it. If it does, the text before the colon should be a namespace prefix for a namespace that has been declared. If it does not, the name should be interpreted as belonging to the default namespace. See Chapter 16, "Namespaces", on page 376 for more information on namespaces.
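The prefix rule can be sketched in a few lines of Python. Everything here is illustrative: resolve_qname is a hypothetical helper, the namespace declarations are passed in as a plain dictionary, and the key None stands for the default namespace.

```python
def resolve_qname(qname: str, namespaces: dict) -> tuple:
    """Split a QName and return (namespace name, local name)."""
    prefix, colon, local = qname.partition(":")
    if not colon:
        # No colon: the whole string is the local name, and it
        # belongs to the default namespace (stored under None here).
        return namespaces.get(None), qname
    if prefix not in namespaces:
        raise ValueError(f"undeclared namespace prefix: {prefix!r}")
    return namespaces[prefix], local

# Hypothetical declarations, as if read from xmlns attributes.
declared = {
    None: "http://example.com/default",
    "xs": "http://www.w3.org/2001/XMLSchema",
}
print(resolve_qname("xs:string", declared))
print(resolve_qname("title", declared))
```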

NOTATION

This datatype corresponds to the XML NOTATION attribute type.

21.1.1.3 Binary datatypes

There are two datatypes for representing binary data. The first is called hexBinary because it uses the hex notation, popular among practitioners of the occult and programmers of the UNIX operating system. There are sixteen hex digits.^[3] The digits 0 through 9 represent the same thing they do in ordinary decimal numbers. The letter a represents 10, b represents 11 and so forth through f representing 15. Hex is popular not only because of its mystical powers but also because a simple calculation turns two hex digits into a value between 0 and 255, which is exactly the range of values that can be held in a single byte.

^[3] In fact, hex is short for hexadecimal, which is the base-16 numbering system.
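The calculation for a pair of hex digits is sixteen times the first digit plus the second. A minimal Python sketch, with the hypothetical helper name hex_pair_to_byte:

```python
HEX_DIGITS = "0123456789abcdef"

def hex_pair_to_byte(pair: str) -> int:
    """Turn two hex digits into a value between 0 and 255."""
    # The position of each digit in HEX_DIGITS is its numeric value.
    high, low = (HEX_DIGITS.index(digit) for digit in pair.lower())
    return 16 * high + low

print(hex_pair_to_byte("0a"))  # 10
print(hex_pair_to_byte("ff"))  # 255
```

In practice Python's built-in int(pair, 16) gives the same result; the sketch spells out the arithmetic only to show why two digits fit one byte.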

The other datatype is called base64Binary because it represents base64-encoded data. Base64 is not as simple to translate into bytes without a computer but uses less space than hex. Hex doubles the size of the data. Base64 increases it by only a third, approximately.
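The size difference is easy to check with Python's standard library. This sketch encodes the same bytes both ways and compares the lengths; encoded_lengths is a hypothetical helper and the sample data is arbitrary.

```python
import base64

def encoded_lengths(data: bytes) -> tuple:
    """Return (hex length, base64 length) in characters for the same data."""
    hex_text = data.hex()                              # two characters per byte
    b64_text = base64.b64encode(data).decode("ascii")  # four characters per three bytes
    return len(hex_text), len(b64_text)

sample = bytes(range(256)) * 3   # 768 arbitrary bytes
print(encoded_lengths(sample))   # (1536, 1024)
```

For 768 bytes, hex needs 1536 characters (double) while base64 needs 1024 (one third more).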

21.1.1.4 Durations

The remaining primitive datatypes are all related to time: durations in this section and absolute dates and times in the next.

The duration datatype is based on the ISO 8601 date format standard. It represents a length of time, such as an hour or 3.56 seconds. Durations can be represented with a precision of seconds or fractions of a second. They can also span thousands of years.

A duration always starts with P; for example, P2Y. The 2Y that follows the P means two years (the gestation period for some elephants!).

Instead of measuring in years, you could measure in months: P3M would be three months (the amount of time it takes to get a new phone line installed in some cities).

You can also measure in years and months: P1Y3M means one year and three months (the shelf-life of a boy band). You can also count in days (with or without years or months): P3D (length of the Battle of Gettysburg).

But wait...that's not all. Durations can get as fine-grained as a fraction of a second. If you want to deal in time units shorter than a day, you use the designator T after you've listed all of the day/month/year parts (or directly after the P if you are interested only in hour/minute/second units).

For instance P1YT1M represents one year and one minute (which is the amount of time that a VCR with a P1Y warranty is likely to survive). As you can see, M represents both month and minute, which is why you must include the T designator that separates the date part from the time part. If you have an hour, a minute or a second part you must have the T.
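The layout of a duration can be sketched as a Python regular expression. This is a simplified illustration, not a full validator: parse_duration is a hypothetical helper, it ignores negative durations, and it accepts hour (H) and second (S) parts after the T in addition to minutes.

```python
import re
from decimal import Decimal

DURATION = re.compile(
    r"P"
    r"(?:(?P<years>[0-9]+)Y)?"
    r"(?:(?P<months>[0-9]+)M)?"
    r"(?:(?P<days>[0-9]+)D)?"
    r"(?:T(?=[0-9])"                 # T must be followed by a time part
    r"(?:(?P<hours>[0-9]+)H)?"
    r"(?:(?P<minutes>[0-9]+)M)?"
    r"(?:(?P<seconds>[0-9]+(?:\.[0-9]+)?)S)?"
    r")?"
)

def parse_duration(text: str) -> dict:
    """Split a duration such as P1YT1M into its named parts."""
    match = DURATION.fullmatch(text)
    # A bare "P" matches the pattern but names no part, so reject it too.
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"not a valid duration: {text!r}")
    return {name: Decimal(value or "0")
            for name, value in match.groupdict().items()}

print(parse_duration("P1YT1M")["minutes"])  # 1
print(parse_duration("P1Y3M")["months"])    # 3
```

Because the M before the T is captured as months and the M after it as minutes, P1M and PT1M come out as different durations.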

In addition to minutes you may use H for hour and S for second. S is the only part of the duration that may have a fractional part. PT3M43.13S represents three minutes and 43.13 seconds (the world record for running a

