Getting Started with Types

We hope by now you are convinced of the value of a static type system. Now that we know what a static type system can do, we will describe how it works. The following sections describe how the type system assigns a type to different kinds of expressions: literals and operators, variables, function calls, conditionals, paths, FLWORs, and element constructors. We begin with a description of XML Schema, XQuery types, and XQuery values, and then explain how they relate.

XML Schema and XQuery Types

The type system of XQuery is based on XML Schema, so to understand types in XQuery, we must first understand XML Schema. XML Schema has so many features, however, that we can describe only its basic ones: simple and complex types, anonymous types, global and local declarations, and derivation by restriction.

As we describe XML Schema's features, we show how they are represented in the type notation used in the XQuery formal semantics [XQ-FS], so that you can understand the type errors that may arise. The formal type notation is simpler and more concise than the XML Schema notation, in part because it serves a different purpose. XML Schema is designed for the user's convenience and provides many alternatives for modeling data and documents, whereas the XQuery type notation is designed for the type system's convenience and provides orthogonal (i.e., nonredundant) features for constructing types. Allowing only one way to construct a particular type makes it easier for the type system to compare types, which it must do when type-checking a query.

You do not need to learn all of the formal type notation, but you do need to understand that it exists. Later we describe sequence types, the subset of the XQuery type notation for referring to types in XQuery expressions. Chapter 5 describes the relationship between XML Schema and XQuery's type system in more detail.

In XML Schema, element declarations may be global (at the top level) or local (nested within a type declaration). The XML Schema for our auction data in Listing 4.2 contains both global element declarations (e.g., auction, users) and local element declarations (e.g., name, first, last). Local elements with the same name may have different content models. For example, the name element in element user has a complex type containing first and last elements, while the name element in article contains a string.

In the formal type notation, the globally declared elements user and rating are represented by the following declarations:

 define element user of type User
 define element rating of type xs:string

An element declaration associates an element name with a type. A type is either predefined or user-defined. The predefined types include the forty-four built-in datatypes defined in [SCHEMA-2] (e.g., xs:string, xs:integer, etc.) plus five additional types: xdt:yearMonthDuration, xdt:dayTimeDuration, xdt:untypedAtomic, xs:anySimpleType, and xs:anyType. ^[1] A user-defined type is any type defined in an imported XML Schema document.

^[1] The xdt namespace is used for special data types required by XPath and XQuery but not defined in XML Schema.

In XML Schema, a type may be globally named or may be specified in an element declaration without a name. An unnamed type is called an anonymous type. Our auction schema has four named types (User, Article, Freeform, and Bid) and four anonymous types, associated with the local elements User/name, Article/seller, Bid/userid, and Bid/articleno.

For example, here is how the declaration of the type User is expressed in the formal type notation:

 define type User {
   attribute id of type xs:ID ,
   element name of type AnonymousType1 ,
   element rating ?
 }
 define type AnonymousType1 {
   element first of type xs:string ? ,
   element last of type xs:string
 }

The translation from XML Schema into the formal type notation invents a unique name for each anonymous type; thus, every element has a named type.

A complex type declaration associates a name with a content model. The name User, for example, is associated with a content model containing one id attribute and one locally declared name element, followed by an optional global rating element. The name AnonymousType1 is associated with a content model containing an optional first element followed by one last element. Types can be combined with the operators for sequence ( , ) and union ( | ), in addition to the occurrence indicators zero or one ( ? ), one or more ( + ), and zero or more ( * ).
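To make these operators concrete, here is a minimal Python sketch of content-model matching. The classes Elem, Seq, Choice, and Occur and the functions ends and matches are hypothetical names invented for this illustration; they are not part of XQuery or of any processor. The sketch matches only the sequence of child element names and ignores attributes and element types.

```python
from dataclasses import dataclass
from typing import Union

# Toy model (hypothetical classes) of the formal notation's content-model
# operators. Attributes are left out; only child element names are matched.

@dataclass(frozen=True)
class Elem:
    name: str                 # element name, e.g. "first"

@dataclass(frozen=True)
class Seq:
    left: "Model"             # sequence operator:  left , right
    right: "Model"

@dataclass(frozen=True)
class Choice:
    left: "Model"             # union operator:  left | right
    right: "Model"

@dataclass(frozen=True)
class Occur:
    body: "Model"
    indicator: str            # "?", "+", or "*"

Model = Union[Elem, Seq, Choice, Occur]

def ends(model: Model, names: list, i: int) -> set:
    """Return every position j such that `model` matches names[i:j]."""
    if isinstance(model, Elem):
        return {i + 1} if i < len(names) and names[i] == model.name else set()
    if isinstance(model, Seq):
        return {k for j in ends(model.left, names, i)
                  for k in ends(model.right, names, j)}
    if isinstance(model, Choice):
        return ends(model.left, names, i) | ends(model.right, names, i)
    if isinstance(model, Occur):
        result = {i} if model.indicator in ("?", "*") else set()
        once = ends(model.body, names, i)
        if model.indicator == "?":
            return result | once
        # "+" and "*": keep repeating the body until no new positions appear
        seen, frontier = set(), once
        while frontier - seen:
            new = frontier - seen
            seen |= new
            frontier = {k for j in new for k in ends(model.body, names, j)}
        return result | seen
    raise TypeError(f"unknown model: {model!r}")

def matches(model: Model, names: list) -> bool:
    """True if the whole list of child element names matches `model`."""
    return len(names) in ends(model, names, 0)

# AnonymousType1 { element first ? , element last }
anonymous_type1 = Seq(Occur(Elem("first"), "?"), Elem("last"))
print(matches(anonymous_type1, ["first", "last"]))   # True
print(matches(anonymous_type1, ["first"]))           # False
```

Computing the set of reachable end positions, rather than committing to a single match, keeps the sketch correct when an optional or repeated part could match in more than one way.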

In XML Schema, a simple type declaration associates a name with an atomic type, a list type, or a union type; a simple type may also specify constraining facets. The atomic types include the forty-four built-in datatypes defined in [SCHEMA-2]. Here is an example of a list type.

 <xs:simpleType name="IntegerList">
   <xs:list itemType="xs:integer"/>
 </xs:simpleType>

In the formal type notation, list types are formed using the occurrence indicators ? , + , and * . Here is the declaration corresponding to the simple type above:

 define type IntegerList { xs:integer + }

Union types are formed using the choice operator ( | ). The XQuery type system does not represent constraining facets, because facets typically constrain values and therefore cannot be checked or enforced statically.

New simple and complex types may be derived by restriction. Listing 4.7 shows a type named NewUser that restricts the type User by requiring that the first and rating elements be present. The element newuser has type NewUser .

Listing 4.7 The `NewUser` Type Derived by Restriction of the Type `User`

 <xs:complexType name="NewUser">
   <xs:complexContent>
     <xs:restriction base="User">
       <xs:sequence>
         <xs:element name="name">
           <xs:complexType>
             <xs:sequence>
               <xs:element name="first" type="xs:string"/>
               <xs:element name="last" type="xs:string"/>
             </xs:sequence>
           </xs:complexType>
         </xs:element>
         <xs:element ref="rating"/>
       </xs:sequence>
       <xs:attribute name="id" type="xs:ID" use="required"/>
     </xs:restriction>
   </xs:complexContent>
 </xs:complexType>
 <xs:element name="newuser" type="NewUser"/>

Listing 4.8 shows the corresponding type declarations in the formal type notation.

Listing 4.8 Type Declarations for `NewUser` in the Formal Type Notation

 define type NewUser restricts User {
   attribute id of type xs:ID ,
   element name of type AnonymousType2 ,
   element rating
 }
 define type AnonymousType2 {
   element first of type xs:string ,
   element last of type xs:string
 }
 define element newuser of type NewUser

Derivation by restriction declares a relationship between two types. This relationship depends on both names and content models, in the sense that one type name may be derived by restriction from another only if every value that matches the content model of the first also matches the content model of the second. When one type is derived from another by restriction, a value of the restricted type may be passed wherever the base type is expected. For example, here is a function that takes as its argument an element with any name and with type User.

 define function user-name($user as element(*,User)) as xs:string {...}

This function can be applied to either a user or newuser element.

There is a type xs:anyType from which all other types are derived. If a type definition does not specify otherwise, it is considered a restriction of xs:anyType. There is also a type xs:anySimpleType that is derived from xs:anyType and from which all other simple types are derived. XML Schema also supports derivation by extension, which we do not discuss here (see the XQuery formal semantics [XQ-FS] and Chapter 5 for details).
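The following Python sketch models the derivation hierarchy as a table from each type name to its base, rooted at xs:anyType, and uses it to decide whether an element matches a test such as element(*,User). The table BASE_TYPE and the functions derives_from and element_test are hypothetical. The sketch trusts declared derivations and does not check that a restricted content model really is narrower than its base's.

```python
# Hypothetical derivation table: type name -> the type it is derived from.
# Only declared derivations are recorded; checking that a restriction's
# content model really is narrower than its base's is not modeled here.
BASE_TYPE = {
    "xs:anySimpleType": "xs:anyType",
    "xs:string": "xs:anySimpleType",
    "xs:decimal": "xs:anySimpleType",
    "xs:integer": "xs:decimal",
    "User": "xs:anyType",       # no base given, so a restriction of xs:anyType
    "NewUser": "User",          # derived by restriction (Listing 4.7)
}

def derives_from(type_name: str, base: str) -> bool:
    """True if type_name is base or is transitively derived from it."""
    current = type_name
    while current is not None:
        if current == base:
            return True
        current = BASE_TYPE.get(current)   # xs:anyType has no entry: stop
    return False

def element_test(name_test: str, type_test: str,
                 elem_name: str, elem_type: str) -> bool:
    """Sketch of element(N, T) matching; N may be "*" for any name."""
    name_ok = name_test == "*" or name_test == elem_name
    return name_ok and derives_from(elem_type, type_test)

# user-name($user as element(*,User)) accepts both kinds of element:
print(element_test("*", "User", "user", "User"))        # True
print(element_test("*", "User", "newuser", "NewUser"))  # True
```

The check is one-directional: a newuser element is acceptable where User is expected, but a user element would not match element(*,NewUser).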

Values

Every XQuery expression evaluates to a value in the XQuery data model. Every value in XQuery is a sequence of individual items. Sequences are central to XQuery's semantics, so much so that one item and a sequence containing that item are indistinguishable. An item is either an atomic value or a node. We have seen many examples of atomic values, including strings and integers. A node is either an element, attribute, document, text, comment, or processing instruction. ^[2]

^[2] Although namespace nodes exist in the XQuery data model, they are not really values; instead, they are used to define the scope of namespace prefixes. They do not contribute to an expression's type, so we do not discuss them here.
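The identification of an item with a one-item sequence can be pictured as a normalization step. In the Python sketch below, which is a toy model and not how any XQuery processor is required to work, a tuple stands for a sequence and anything else stands for a single item. The sketch also flattens nested tuples, reflecting the fact that XQuery sequences never nest. The function name as_sequence is hypothetical.

```python
def as_sequence(value) -> tuple:
    """Normalize a value into a flat tuple of items.

    In this toy model a Python tuple stands for an XQuery sequence and
    anything else stands for a single item (atomic value or node).
    """
    if isinstance(value, tuple):
        items = []
        for part in value:
            items.extend(as_sequence(part))   # nested sequences are flattened
        return tuple(items)
    return (value,)                            # a lone item is a one-item sequence

print(as_sequence(42) == as_sequence((42,)))    # True
print(as_sequence((1, (2, 3), ())))             # (1, 2, 3)
```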

Every atomic value, element node, and attribute node has an associated XQuery type. The process of XML Schema validation specifies how elements and attributes are labeled with types. Input data is validated, and elements and attributes created while processing a query may also be validated. Atomic values are labeled with one of the twenty-three XML Schema primitive types (e.g., xs:string, xs:integer, and xs:decimal). Element and attribute nodes are labeled with any of the forty-nine predefined types listed earlier or with any type defined in an XML Schema used for validation.

Here is a fragment of the input document in Listing 4.1.

 <user id="U02">
   <name><first>Mary</first><last>Doe</last></name>
   <rating>A</rating>
 </user>

After validation, this document fragment is represented by the following XQuery values. We use this notation to illustrate the labeling of elements and attributes; it is not a valid XQuery expression, because XQuery does not support the of type notation in element and attribute constructors.

 element user of type User {
   attribute id of type xs:ID { "U02" } ,
   element name of type AnonymousType1 {
     element first of type xs:string { "Mary" } ,
     element last of type xs:string { "Doe" }
   } ,
   element rating of type xs:string { "A" }
 }

The element user is labeled with type User, the attribute id is labeled with type xs:ID, and so on. We can use the fn:data function to extract the typed data. If we apply fn:data to the element named first above, we get the value xs:string("Mary"), which is an atomic value labeled with the type xs:string.
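The labeled tree above can be mirrored in a small Python model, with each node carrying its type annotation and, for nodes with simple content, its typed value. The classes Atomic and Node and the function data are hypothetical stand-ins. In particular, data only approximates fn:data for nodes with simple content; for any other node this sketch simply raises ValueError and makes no claim about what an XQuery processor does in that case.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Atomic:
    type: str                              # type label, e.g. "xs:string"
    value: object

@dataclass
class Node:
    kind: str                              # "element" or "attribute"
    name: str
    type: str                              # type annotation from validation
    typed_value: Optional[tuple] = None    # set only for simple content
    children: list = field(default_factory=list)

def data(node: Node) -> tuple:
    """Rough analogue of fn:data, limited to nodes with simple content."""
    if node.typed_value is None:
        raise ValueError(f"{node.name}: no typed value in this sketch")
    return node.typed_value

# The validated fragment from the text, rebuilt as labeled nodes.
first = Node("element", "first", "xs:string", (Atomic("xs:string", "Mary"),))
last = Node("element", "last", "xs:string", (Atomic("xs:string", "Doe"),))
name_el = Node("element", "name", "AnonymousType1", children=[first, last])
user = Node("element", "user", "User", children=[
    Node("attribute", "id", "xs:ID", (Atomic("xs:ID", "U02"),)),
    name_el,
    Node("element", "rating", "xs:string", (Atomic("xs:string", "A"),)),
])

print(data(first))   # one atomic value: "Mary" labeled xs:string
```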

Sequence Types

XQuery provides a notation for referring to types in queries, called a sequence type, which is a subset of the formal type notation. Sequence types can be used in several XQuery constructs, including function declarations, variable declarations in let and for expressions, and typeswitch, instance of, and treat as expressions. Table 4.1 provides some examples of sequence types.

A sequence type may contain an optional context for specifying a local element; if no context is given, then the element is global. Some examples are shown in Table 4.2.

Table 4.1. Examples of Sequence Types and What They Match

Sequence Type	What It Matches
`xs:decimal`	Atomic value of any type derived from `xs:decimal`
`empty()`	The empty sequence
`item()`	Any item
`node()`	Any node
`document-node()`	Any document node
`text()`	Any text node
`element()`	Any element
`element(x:td)`	Element `x:td` as defined in the imported XHTML Schema
`element(*,x:Block.type)`	Any element of type `x:Block.type`
`element(x:td,x:Block.type)`	Element with name `x:td` of type `x:Block.type`
`attribute()`	Any attribute
`attribute(@href)`	Attribute `href` as defined in the imported XHTML Schema
`attribute(@*,xs:anyURI)`	Any attribute of type `xs:anyURI`
`attribute(@href,xs:anyURI)`	Attribute with name `href` of type `xs:anyURI`
`comment()`	Any comment node
`processing-instruction()`	Any processing-instruction node

Table 4.2. Sequence Types for Local Elements

Sequence Type	What It Matches
`element(article)`	Global element `article`
`element(article/name)`	Local element `name` in global element `article`
`element(user)`	Global element `user`
`element(type(User)/name)`	Local element `name` in global type `User`
`element(type(User)/name/first)`	Local element `first` in element `name` in global type `User`
`element(type(User)/name/last)`	Local element `last` in element `name` in global type `User`

Any sequence type (other than empty()) may be followed by one of the occurrence indicators zero or one ( ? ), one or more ( + ), or zero or more ( * ). Here are some examples:

`xs:decimal?`	Matches the empty sequence or a decimal value
`element()+`	Matches any sequence of one or more elements
`item()*`	Matches any sequence
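A sequence type thus combines an item test with a cardinality. In the Python sketch below, each occurrence indicator becomes a minimum and maximum length, and every item must also pass the item test. The functions is_decimal, is_item, and matches_sequence_type are hypothetical. is_decimal treats Python Decimal values and Python ints (standing in for xs:integer, which is derived from xs:decimal) as matching xs:decimal.

```python
from decimal import Decimal

# Occurrence indicator -> (minimum, maximum) number of items; None = unbounded.
OCCURRENCE = {
    "":  (1, 1),       # no indicator: exactly one item
    "?": (0, 1),       # zero or one
    "+": (1, None),    # one or more
    "*": (0, None),    # zero or more
}

def is_decimal(item) -> bool:
    """Toy stand-in for xs:decimal: a Decimal, or an int playing xs:integer."""
    return isinstance(item, Decimal) or (
        isinstance(item, int) and not isinstance(item, bool))

def is_item(item) -> bool:
    """Toy stand-in for item(): every item matches."""
    return True

def matches_sequence_type(seq: tuple, item_test, indicator: str = "") -> bool:
    """Check the sequence length against the indicator, then every item."""
    low, high = OCCURRENCE[indicator]
    if len(seq) < low or (high is not None and len(seq) > high):
        return False
    return all(item_test(item) for item in seq)

print(matches_sequence_type((), is_decimal, "?"))                 # True
print(matches_sequence_type((Decimal("2.5"),), is_decimal, "?"))  # True
print(matches_sequence_type((1, 2), is_decimal, "?"))             # False
print(matches_sequence_type((1, "two"), is_item, "*"))            # True
```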

We saw earlier that the formal type notation permits us to combine types with the type operators for sequence ( , ) and union ( | ), in addition to the occurrence indicators ( ?, +, and * ). This additional expressiveness is necessary to enable us to assign accurate types to expressions. Here are some examples:

 (1, "two", 3.14e0)                  has type (xs:integer,xs:string,xs:double)
 if ($x<$y) then "three" else 4.00   has type (xs:string | xs:decimal)

Even though types like these cannot be written as sequence types, they may appear in error messages. We'll see more examples of such types in the next section.

Schema Import

A sequence type may refer to an element, attribute, or type defined in an XML Schema, if that schema has been imported in the prolog. The earlier section entitled "Validation" presented an example (the first two revised lines of Listing 4.4) with two schema imports.

 import schema default element namespace =
                      "http://www.example.com/auction" at "auction.xsd"
 import schema namespace x =
                      "http://www.w3.org/1999/xhtml" at "xhtml.xsd"

This imports two schemas defining two namespaces (as named by the URLs after the equal signs), defined by resources at the specified locations (as given by the at clauses). Just as in other uses of XML Schema, the location is a hint: The XQuery processor may use it to locate the schema to import, or it may find the schema to import by some other means. The namespace URL is required if the schema has a namespace; the location is optional unless the schema has no namespace. If both a namespace and a location are given, it is an error if the target namespace of the schema at the given location is not the given namespace.

The first import specifies that the first namespace is the default, and the second import binds the prefix x to the second namespace. Any reference to an element or type with no prefix refers to the default namespace. Thus, for example, article refers to an element in the first namespace, and x:h1 refers to an element in the second.
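The resolution rule just described can be sketched as a lookup. In the Python sketch below, the names DEFAULT_ELEMENT_NS, PREFIXES, and resolve_name are hypothetical; the two namespace URLs come from the imports above. We also bind the prefix xs, which XQuery predeclares for the XML Schema namespace; other predeclared prefixes are omitted.

```python
# Bindings established by the two schema imports (plus the predeclared xs).
DEFAULT_ELEMENT_NS = "http://www.example.com/auction"
PREFIXES = {
    "x": "http://www.w3.org/1999/xhtml",
    "xs": "http://www.w3.org/2001/XMLSchema",
}

def resolve_name(qname: str) -> tuple:
    """Map an element or type name to (namespace URI, local name)."""
    if ":" in qname:
        prefix, local = qname.split(":", 1)
        if prefix not in PREFIXES:
            raise KeyError(f"unbound prefix {prefix!r}")
        return PREFIXES[prefix], local
    return DEFAULT_ELEMENT_NS, qname     # no prefix: default namespace

print(resolve_name("article"))   # ('http://www.example.com/auction', 'article')
print(resolve_name("x:h1"))      # ('http://www.w3.org/1999/xhtml', 'h1')
```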

Relating Values and Types

Types are used for two related but distinct purposes in XQuery, connected with two phases of executing a query: analysis time (when static type analysis is performed) and evaluation time (when the query is evaluated to yield an answer).

At evaluation time, the value of a query depends upon the types that label values. For example, when comparing two values, the comparison operator used depends on whether the two values are labeled as strings or numbers; if one is labeled as a string and the other as a number, it is a type error. Arithmetic operations also depend upon these labels: The result of adding two integers is an integer, the result of adding two decimals is a decimal, and the result of adding an integer and a decimal is a decimal. It is also possible to refer explicitly to type labels, as in the following example:

 //element(*,x:Block.type)

This returns all elements in an XHTML document labeled with type x:Block.type .
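The label-dependent rules for addition and comparison described above can be written as small lookup functions. The Python sketch below is a toy model covering only the labels the text mentions (xs:integer, xs:decimal, and xs:string). The names NUMERIC_RANK, add_result_type, and comparison_kind are hypothetical, and other atomic types such as xdt:untypedAtomic are deliberately left out.

```python
# Toy model of label-dependent operators, limited to the labels in the text.
# A higher rank means "wider": an integer combined with a decimal is a decimal.
NUMERIC_RANK = {"xs:integer": 0, "xs:decimal": 1}

def add_result_type(left: str, right: str) -> str:
    """Type label of (left + right) for the numeric labels modeled here."""
    if left not in NUMERIC_RANK or right not in NUMERIC_RANK:
        raise TypeError(f"cannot add {left} and {right}")
    return max(left, right, key=NUMERIC_RANK.__getitem__)

def comparison_kind(left: str, right: str) -> str:
    """Which comparison applies to two labeled values; mixing is an error."""
    if left in NUMERIC_RANK and right in NUMERIC_RANK:
        return "numeric"
    if left == right == "xs:string":
        return "string"
    raise TypeError(f"cannot compare {left} with {right}")

print(add_result_type("xs:integer", "xs:integer"))  # xs:integer
print(add_result_type("xs:integer", "xs:decimal"))  # xs:decimal
```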

At analysis time, each expression is labeled with a type. XQuery queries are compositional, in that each query expression is built from smaller subexpressions. Just as evaluation combines the values of subexpressions to compute the value of the expression as a whole, static typing combines the types assigned to subexpressions to compute the type of the expression as a whole. In other words, both evaluation and type assignment proceed bottom-up.
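Bottom-up type assignment can be sketched as a recursive function over a tiny expression tree. In the Python sketch below, the classes Lit, Var, SeqExpr, Less, and If and the function type_of are hypothetical. Types are plain strings written in the formal notation. Unlike a real type system, the sketch does not check operand types or simplify results; for example, it would leave (xs:string | xs:string) as is. It reproduces the two example types given earlier in this chapter.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical expression tree for a tiny subset of XQuery.
@dataclass(frozen=True)
class Lit:
    value: object              # str, int, Decimal, or float literal

@dataclass(frozen=True)
class Var:
    name: str                  # $name

@dataclass(frozen=True)
class SeqExpr:
    items: tuple               # (e1, e2, ...)

@dataclass(frozen=True)
class Less:
    left: object               # e1 < e2
    right: object

@dataclass(frozen=True)
class If:
    cond: object               # if (cond) then e1 else e2
    then: object
    other: object

def literal_type(value) -> str:
    if isinstance(value, str):
        return "xs:string"
    if isinstance(value, bool):
        raise TypeError("no boolean literals in this sketch")
    if isinstance(value, int):
        return "xs:integer"
    if isinstance(value, Decimal):
        return "xs:decimal"    # e.g. 4.00
    if isinstance(value, float):
        return "xs:double"     # e.g. 3.14e0
    raise TypeError(f"unsupported literal {value!r}")

def type_of(expr, env: dict) -> str:
    """Type the subexpressions first, then combine their types."""
    if isinstance(expr, Lit):
        return literal_type(expr.value)
    if isinstance(expr, Var):
        return env[expr.name]  # variable types come from the environment
    if isinstance(expr, SeqExpr):
        return "(" + ",".join(type_of(e, env) for e in expr.items) + ")"
    if isinstance(expr, Less):
        type_of(expr.left, env)   # operands are typed, but not checked here
        type_of(expr.right, env)
        return "xs:boolean"
    if isinstance(expr, If):
        type_of(expr.cond, env)
        # either branch may be taken, so the result is the union of both
        return "(" + type_of(expr.then, env) + " | " + type_of(expr.other, env) + ")"
    raise TypeError(f"unsupported expression {expr!r}")

seq = SeqExpr((Lit(1), Lit("two"), Lit(3.14e0)))
cond = If(Less(Var("x"), Var("y")), Lit("three"), Lit(Decimal("4.00")))
print(type_of(seq, {}))                                       # (xs:integer,xs:string,xs:double)
print(type_of(cond, {"x": "xs:integer", "y": "xs:integer"}))  # (xs:string | xs:decimal)
```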

Evaluation-time types and analysis-time types are closely related. The evaluation-time type labeling a value must conform to the analysis-time type assigned to the expression that yields that value. For instance, if an expression is assigned the type xs:decimal , the possible values of that expression may be labeled with type xs:decimal or xs:integer or any other type derived from xs:decimal .

The formal semantics of XQuery, introduced in Chapter 5, list the rules for how to compute the value of an expression from the values of its subexpressions and how to compute the type assigned to an expression from the types of its subexpressions. In the remaining sections of this chapter, we explain by example the rules for computing the type of an expression.