Watershed Issues

Watershed Issues

A large fraction of the total effort invested in the design of XQuery by the working group has been focused on a relatively small number of "watershed issues." These are issues whose resolution had a significant global impact on the language by influencing a number of related decisions. All of these issues are complex, and some of them were quite contentious. Eight of these watershed issues are discussed here; they are listed below.

  1. Handling of untyped data . This issue deals with the kinds of operations that can be applied to data whose type is not known.

  2. Unknown and inapplicable data . This issue deals with how unknown values can be encoded in XML and how various operators should be defined on these values.

  3. What is a type ? This issue deals with how a datatype is specified in XQuery.

  4. Element constructors . This issue deals with the details of how to construct a new XML element by using a query expression.

  5. Static typing . This issue deals with the kinds of type-checking that can be performed on a query independently of any input data.

  6. Function resolution . This issue deals with how functions are defined and how function calls are processed .

  7. Error handling . This issue deals with the kinds of errors that can be encountered in queries, and whether query expressions are deterministic (that is, whether they always return the same result for a given input).

  8. Ordering operators . This issue deals with the kinds of operators that are provided in the language for controlling the order of results.

Issue 1: Handling of Untyped Data

Despite publication of XML Schema as a W3C Recommendation in May 2001, a great many XML documents exist that are not described by a schema. Some of these documents have Document Type Definitions (DTDs), and some do not have a formal description of any kind. In the absence of a schema, the character data in an XML document does not have a well-defined type. A DTD might describe this data as CDATA ("character data") or PCDATA ("parsed character data"), but in fact it might represent either text or numeric data.

Many of the operators of XQuery require data of a particular type ”for example, the arithmetic operators are defined on numeric data. It is highly desirable that these operators be usable for querying schema-less documents, even though the data in these documents is not explicitly declared to be of any particular type. The approach of requiring all untyped data to be explicitly cast into a specific type whenever it is referred to in a query was rejected as putting an unreasonable burden on the query writer. Instead, the designers of XQuery attempted to define a set of rules that allow types to be inferred for input data based on the context in which the data is used.

Consider the following fragment of an input document:

 <employee>    <name>Fred</name>    <salary>5000</salary>    <bonus>2000</bonus> </employee> 

The process of representing this input data in the Query data model causes each element node to receive a type annotation. If the input document is validated against a schema, the employee node might receive a type annotation of employee-type , and the salary and bonus nodes might receive a type annotation of xs:decimal .

Suppose that a query binds a variable named $fred to the employee element, and evaluates the expression $fred/salary + $fred/bonus . Each of the operands of the + operator evaluates to an element node whose type annotation is xs:decimal . The + operator extracts the decimal values from the nodes and adds them together. The result of the expression is an atomic value of type xs:decimal .

Now consider the following similar fragment from an input document that has no schema:

 <a>    <b>5</b>    <c>17</c> </a> 

Syntactically, this fragment consists of an element containing two nested elements. In the absence of a schema, the elements are given the generic type annotation xs:anyType , which indicates that no type information is available. An atomic value occurring inside an untyped element, such as 5 or 17 in the above example, is given the type annotation xdt:untypedAtomic , [1] which denotes an atomic value for which no type is available.

[1] The type xdt:untypedAtomic is closely related to the type xs:anySimpleType , which is the root of the simple type hierarchy in XML Schema [SCHEMA]. The type xdt:untypedAtomic is different from xs:anySimpleType in two ways: (1) xs:anySimpleType includes non-atomic values such as lists that are considered to be "simple" by XML Schema, whereas xdt:untypedAtomic includes only atomic values; and (2) xs:anySimpleType includes primitive types such as xs:string and xs:decimal as subtypes , whereas xdt:untypedAtomic has no subtypes. In general, an atomic value in a PSVI whose type property is xs:anySimpleType is represented in the Query data model with the type annotation xdt:untypedAtomic.

In the above example, suppose that a query binds a variable named $a to the outer element, and executes the expression $a/b + $a/c . Looking at the input document, we see that the nested elements b and c contain numbers, and it seems reasonable that these numbers should be added together. But it is necessary to define the rules under which this operation is performed. As in the earlier example, the + operator extracts the typed values from the b and c nodes. The value of the b node is 5, and the value of the c node is 17 , and the type of both values is xdt:untypedAtomic . Each operator in XQuery has a rule for how it handles values of type xdt:untypedAtomic . In the case of arithmetic operators such as + , the rule states that any operand of type xdt:untypedAtomic is cast to the type xs:double before the addition is performed. If a value of type xdt:untypedAtomic cannot be successfully cast to the type xs:double , the arithmetic operator raises an error. In our example expression, the operands are successfully cast to the double-precision values 5.0E0 and 1.7E1 , and the result of the expression is 2.2E1 .

Various operators in XQuery have different rules for handling of values of type xdt:untypedAtomic . As we have seen, arithmetic operators cast xdt:untypedAtomic values to xs:double . General comparison operators such as = and > , on the other hand, cast operands of type xdt:untypedAtomic to the type of the other operand, and if both operands are of type xdt:untypedAtomic , they are both cast to the type xs:string . These rules reflect the facts that addition is defined only for numbers, but comparison can be defined for any string value. Casting xdt:untypedAtomic values into numbers before comparing them would preclude comparison of non-numeric strings.

Suppose that, instead of $a/b + $a/c , our example expression had been $a/b > $a/c . Since the type of both operands is xdt:untypedAtomic , $a/b is cast to the string "5" and $a/c is cast to the string "17" , and the strings are compared using a default collation order. The result depends on the default collation, but the expression is likely to return the value true since in most collations the string "5" is greater than the string "17" .

Each XQuery operator has its own rule for processing operands based on their types. For some operators, such as the general comparison operators, the rule depends on the dynamic types of both operands. These rules make XQuery by definition a dynamically dispatched language, meaning that, in general, the semantics of an operation are determined at run-time. However, the rules are such that, when operating on well-typed data that is described by a schema, static analysis can often determine the semantics of an operation at compile type and can use this information to execute the query efficiently (avoiding run-time branching).

The working group considered the alternative of treating all untyped data as strings. In the example $a/b + $a/c above, this design would have resulted in the + operator having strings as its operands. In order for this useful example to work, it would have been necessary for arithmetic operators to cast their string operands into a numeric type (say, double ). Then 5 + "7" would no longer be a type error, even though it adds a number to a string.

Similarly, if untyped data were to be treated as strings, comparison operators between numbers and strings would need to be defined. For example, consider the expression age > 40 . In the absence of a schema, age is untyped. If untyped data is treated as a string, age > 40 will give the expected result only if its string operand is cast to a number. But then 5 = "5" and 5 = "05" would be true rather than type errors (but "5" = "05" would be false). Also, then 5 > 17 and "5" > 17 would be false but "5" > "17" would be true. It was felt that these rules would generate confusion and would compromise the strong typing of the language. Therefore, the working group chose to treat untyped data according to its context rather than always treating it as a string.

Issue 2: Unknown and Inapplicable Data

In the real world, data is sometimes unknown or inapplicable. Any data model needs to have a way to represent these states of knowledge.

In the relational data model, data is represented in a uniform way using rows and columns. Every row in a given table has the same columns , and there is exactly one value in every column. For example, if a table describing cars has a mileage column containing information about the expected miles per gallon of each car, then every row in that table must have some value in the mileage column, including rows that describe cars whose mileage is unknown and rows that describe electric cars for which the concept of mileage is inapplicable. This requirement gave rise to the "null" value in relational databases, an "out-of-range" value that is used to represent unknown or inapplicable data. In the relational query language SQL, arithmetic and comparison operators always return the null value if any of their operands is null. Since logical operators such as and and or must deal with the null value as well as the Boolean values true and false , SQL is said to use three-valued logic .

XML is considerably more flexible than the relational model when it comes to representing unknown or inapplicable data. In an XML Schema, a car element might be defined to contain make and year elements and an optional mileage element. The mileage element in turn might be defined to contain an optional decimal value. Then, any of the following is a valid XML representation of a car element:

The mileage element could be present and have a value:

 <car><make>Toyota</make><year>2002</year><mileage>26</mileage></car> 

The mileage element could be present but empty:

 <car><make>Ford</make><year>2002</year><mileage/></car> 

The mileage element could be absent:

 <car><make>Porsche</make><year>2002</year></car> 

XQuery leaves it up to the application to define the mapping of states of knowledge about cars onto various XML representations. For example, the absence of a mileage element might be used to represent the absence of any knowledge about mileage, whereas the presence of an empty mileage element might be used to represent the knowledge that mileage is inapplicable for the given car.

Because XML already provides a way to represent various states of knowledge about data, including unknown and inapplicable data, the designers of XQuery saw no need to introduce a special "out-of-range" value (which, in any case, would not have been consistent with XML Schema). Instead, XQuery focuses on defining the operators of the language in such a way that they return a reasonable and useful result for all of the possible syntactic states of their operands. This approach provides application developers with the tools they need to represent various knowledge states as they see fit.

The main operators of XQuery that operate on scalar values are arithmetic, comparison, and logical operators. These operators are also defined on nodes, but always begin by reducing their operands to scalar values by a process called atomization . The result of atomization on an absent or empty element is an empty sequence.

If arithmetic is performed on unknown or inapplicable data (for example, mileage div cost where mileage is unknown), the result should be unknown or inapplicable. Therefore, the arithmetic operators of XQuery are defined in such a way that if either of their operands (after atomization) is an empty sequence, the operator returns an empty sequence.

XQuery provides two sets of comparison operators. The general comparison operators, = , != , > , >= , < , and <= , which are inherited from XPath Version 1, have an implied existential quantifier on both operands. For example, the expression A = B , if A and B are sequences of multiple values, returns true if some item in sequence A is equal to some item in sequence B . If either of their operands is an empty sequence, the general comparison operators return false . These operators (as in XPath Version 1) have the interesting property that the predicates length > width , length < width , and length = width can all be simultaneously true (if length or width contains multiple values), and can also all be simultaneously false (if length or width contains no values).

As noted earlier, XQuery also provides a set of value comparison operators, eq , ne , gt , ge , lt , and le , which require both of their operands to consist of exactly one atomic value. These operators raise an error if either of their operands is an empty sequence. These operators are designed to be "building blocks" with well-defined mathematical properties that can serve as the basis for defining more complex logical operations.

Some consideration was given to defining three-valued comparison operators that return an empty sequence (representing the unknown truth value) if either of their operands is an empty sequence. This approach was rejected because such operators would not be transitive. For example, consider the expression salary eq 5 and bonus eq salary . A query optimizer might wish to transform this expression by adding an additional predicate bonus eq 5 , deduced by the transitivity property. This transformation is valid under two-valued logic but not under three-valued logic. For example, under three-valued logic, if salary is empty (unknown) and bonus has the value 7 , then the truth value of salary eq 5 and bonus eq salary would be unknown, but the truth value of bonus eq 5 would be false . Since transitive comparisons were considered necessary for defining other operations, such as sorting and grouping, the value comparison operators were defined using two-valued logic.

The semantics of the logical operations and , or , and not in the presence of empty sequences are constrained by compatibility with XPath Version 1. In XPath, a predicate expression that returns an empty sequence is interpreted as false (as a failed test for the existence of something). For example, the query //employee[secretary] searches for employee elements that have a secretary subelement; if the predicate expression secretary returns an empty sequence, the predicate is considered to be false . For compatibility with XPath Version 1, the and and or operators of XQuery (and the not function) treat the empty sequence as having a Boolean value of false .

In languages such as SQL that are based on three-valued logic, queries do not generally return data for which the search-condition evaluates to the unknown truth value. Therefore, the distinction between two-valued and three-valued logic is somewhat subtle. The difference can be illustrated by a query that performs negation on an intermediate result. Consider how the following two queries would operate on a car element that has no mileage subelement:

Query : Find cars whose mileage is less than or equal to 25.

 //car[mileage <= 25] 

Query : Find cars for which it is not true that mileage is greater than 25.

 //car[not(mileage > 25)] 

In SQL, if the value of mileage is null, the predicates mileage <= 25 and mileage > 25 both return the unknown truth value. The inverted predicate not(mileage > 25) also returns the unknown truth value. According to this logic, the car with no mileage subelement would not be returned by either query.

In XQuery, if mileage is an empty sequence, the predicates mileage <= 25 and mileage > 25 both return false , and not(mileage > 25) is true . Therefore the car with no mileage subelement would be returned by the second query above but not by the first query.

A language that wishes to define three-valued and , or , and not operators needs to specify the truth tables for these operators. One approach is to adopt the rule that and , or , and not return the unknown truth value if any of their operands is the unknown truth value. SQL, however, adopts a different approach, represented by the truth tables in Figure 2.3.

Figure 2.3. Truth Tables in SQL

graphics/02fig03.gif

If an XQuery application needs three-valued logic, it can easily be simulated by writing functions that use the built-in XQuery operators. The value comparison operator eq can be supplemented by a function, possibly named eq3 , that returns the empty sequence if either of its operands is empty. This function, and functions that implement the SQL truth tables of Figure 2.3, are illustrated in Listing 2.2.

Listing 2.2 Defined Function eq3 and Others Implementing the Truth Tables of Figure 2.3
 define function eq3($a as xdt:anyAtomicType?, $b as xdt:anyAtomicType?)    as xs:boolean?    {     if (empty($a) or empty($b)) then ()     else ($a eq $b)    } define function not3($a as xs:boolean?)    as xs:boolean?    {     if (empty($a)) then ()     else not($a)    } define function and3($a as xs:boolean?, $b as xs:boolean?)    as xs:boolean?    {     if ($a and $b) then true()     else if (not3($a) or not3($b)) then false()     else ()    } define function or3($a as xs:boolean?, $b as xs:boolean?)    as xs:boolean?    {     if ($a or $b) then true()     else if (not3($a) and not3($b)) then false()     else ()    } 

It is worth mentioning that XML Schema defines a special attribute named xsi:nil . This attribute affects only the process of Schema validation. It permits an element to be accepted as valid with no content, even though the element has a declared type that requires content. The xsi:nil attribute is treated by XQuery as an ordinary attribute. Its presence or absence does not affect the semantics of XQuery operators.

Issue 3: What Is a Type?

Query languages use types to check the correctness of queries and to ensure that operations are being applied to data in appropriate ways. The specification and handling of types has been one of the most complex and contentious parts of the XQuery design.

XML Schema provides a set of built-in types and a facility for defining and naming new types. In XML Schema, the name of an element or attribute is distinguished from the name of its type. For example, a schema might define an element named shipto with the type address . This means that the content of the shipto element is determined by the declaration of the address type ”possibly consisting of subelements such as street , city , state , and zipcode . However, not all elements have a named type. For example, a schema might define an element named offering to contain subelements named product and price without giving a name to the type of the offering element. In this case, the offering element is said to have an anonymous type. Furthermore, an offering element might be defined to have different types in different contexts ”for example, an offering in a university might be different from an offering in a catalog . XML Schema also allows an element, an attribute, and a type all to have the same name, so it is necessary to distinguish somehow among different usages of a name (for example, to distinguish an element named price from an attribute named price ). All the names defined in a schema belong to an XML namespace called the target namespace of the schema.

XQuery allows the names of elements, attributes, and types that are defined in a schema to be used in queries. The prolog of a query explicitly lists the schemas to be imported by the query, identifying each schema by its target namespace. Importing a schema makes all the names defined in that schema available for use in the query. An error results if two or more imported schemas define the same name with conflicting meanings. The prolog of a query can also bind a namespace prefix to the target namespace of an imported schema. For example, the following statement imports a schema and assigns the namespace prefix po to its target namespace:

 import schema namespace po = "http://example.org/purchaseOrder" 

As noted earlier, each element or attribute node in the Query data model has a type annotation that identifies the type of its content. For example, a salary element might have the type annotation xs:decimal , indicating that it contains a decimal number. A type annotation identifies either a built-in Schema type, such as xs:decimal , or a type that is defined in some imported schema. For example, a schema might define the type hatsize to be derived from xs:integer by restricting its values to a certain range. Element and attribute nodes acquire their type annotation through a process called schema validation , which involves comparing the content of the node to the schema declaration for nodes with that name. For example, the element named hat might be declared in a schema to have the type hatsize . If the content of a given element node with the name hat conforms to the definition of the type hatsize , the element node receives the type annotation hatsize during schema validation; otherwise , schema validation fails, and an error is reported . Schema validation may be performed on input documents before query processing, and on newly constructed elements and attributes during query processing. Validation of nodes against user -provided schemas is an optional feature of an XQuery implementation. In an implementation that does not support this feature, all element nodes have the generic type annotation xs:anyType .

Several places in the XQuery syntax call for a reference to a type. One of these is a function definition, which specifies the types of the function parameters and result. Other XQuery expressions that require type references include instance-of , typeswitch , and cast expressions. In these expressions, it is not always sufficient to refer to a type simply by its name. For example, when declaring the type of a function parameter, the user may wish to specify an element with a particular name, or with a particular type annotation, or both. Alternatively, the type of a function parameter might be an unrestricted element node, or some other kind of node such as an attribute or text node, or an unrestricted atomic value. Furthermore, since all XQuery values are sequences, we may want a type reference to indicate a permissible number of occurrences, such as exactly one, zero or one, zero or more, or one or more.

Specifying the syntax of a type reference in XQuery was a difficult and important part of the language design. Although XML Schema itself has all the necessary features, embedding fragments of XML Schema language in a query was not considered appropriate, because Schema syntax is quite verbose and is inconsistent in style with XQuery. The working group considered inventing a new type-definition syntax with power comparable to that of XML Schema, but rejected this approach because it would have added a great deal of complexity to XQuery and would have duplicated the features of an existing W3C recommendation.

In the end, the working group defined a type reference syntax containing a few keywords and symbols to give users a limited degree of flexibility in defining types. The main features of this type reference syntax are listed below:

  • An atomic type is referred to simply by its QName , such as xs:integer or po:price . The generic name xdt:anyAtomicType refers to a value of any atomic type.

  • Element nodes can be referred to in any of the following ways:

    • element( N , T ) denotes an element with name N and type annotation T . For example, element(shipto, address) denotes an element named shipto whose type annotation is address .

    • element( N , *) denotes an element with name N , regardless of its type annotation.

    • element(*, T ) denotes an element with type annotation T , regardless of its name.

    • element( N ) denotes an element whose name is N and whose type annotation is the (possibly anonymous) type declared in the imported schema for elements named N .

    • element( P ) , where P is a valid schema "path" beginning with the name of a global element or type and ending with one of its component elements, denotes an element whose name and type annotation conform to that schema path . For example, element (order/shipto/zipcode) denotes an element named zipcode that has been validated against the type definition in the schema context order/shipto .

    • element() denotes any element, regardless of its name or type annotation.

  • Attribute nodes can be referred to in a similar way to element nodes, except that attribute names are prefixed by an @ sign. For example, attribute(@ N , T ) denotes an attribute node with name N and type annotation T , and attribute() denotes any attribute node, regardless of its name or type annotation.

  • Other kinds of nodes can be referred to by keywords that indicate the kind of node, followed by empty parentheses. For example, text() , document() , comment() , and processing-instruction() refer to any node of the named kind; node() refers to a node of any kind; and item() refers to either a node or an atomic value.

  • Any type reference may be followed by one of the following occurrence indicators : * denotes a sequence of zero or more occurrences of the given type; + represents one or more occurrences; and ? represents zero or one occurrence. The absence of an occurrence indicator denotes exactly one occurrence of the given type. For example, xs:integer? denotes an optional integer value, and element(hatsize)* denotes a sequence of zero or more elements named hatsize .

Occurrence indicators are often useful in defining the parameter types of functions. For example, consider a function that classifies jobs into categories such as "Professional," "Clerical," and "Sales," based on criteria such as job title and salary. The definition of this function might be written as follows , indicating that the function accepts an element named job and returns a string:

 define function job-category($j as element(job)) as xs:string 

The job-category function defined above will raise an error if its argument is anything other than exactly one job element. This error might be encountered by the following query:

 for $p in //person return $p/name, job-category($p/job) 

For any person in the input stream that has no job or more than one job , this query will invoke the job-category function with an invalid argument. This error could be avoided by making the job-category function more flexible. The following definition uses the occurrence indicator * on the function parameter, allowing the function to accept a sequence of zero or more job elements:

 define function job-category($j as element(job)*) as xs:string 

Of course, the body of the job-category function must be written in such a way that it returns an appropriate result for any argument that matches the declared parameter type in the function definition.

If a function parameter is specified as an element node with a particular name, such as element(shipto) , the actual argument passed to the function can be an element that is in the substitution group of the named element. Similarly, if a function parameter is specified as an element node with a particular type, such as element(*, address) , the actual argument passed to the function can be an element whose type is derived (by restriction or by extension) from the named type.

Type references in XQuery are designed to be compatible extensions of the node tests that are defined in XPath Version 1. For example, in XPath Version 1, text() and comment() are valid node tests that can be used in path expressions to select text nodes or comment nodes. The following path expression, valid in XPath Version 1, finds all text nodes that are descendants of a particular contract element node:

 /contract[id="123"]//text() 

In XQuery, the following path expression uses an extended node test to find all element nodes of type address that are descendants of a particular contract element node. Note that the type reference syntax in XQuery is an extension of the node test syntax of XPath Version 1, and that XQuery allows a type reference to serve as a node test in a path expression.

 /contract[id="123"]//element(*, address) 

Issue 4: Element Constructors

One of the strengths of XQuery is its ability to construct new XML objects such as elements and attributes. The syntax used to do this has evolved considerably during the design of the language.

In the original Quilt proposal, a constructed element was denoted simply by an XML start tag and end tag, enclosing an expression that represented the content of the element. The expression enclosed between the tags was evaluated in the same way as any other Quilt expression. This design was criticized because it resembled XML notation but was not interpreted in the same way as XML notation, as illustrated by the following examples:

  • <greeting>Hello</greeting> . In XML this is an element containing character content. But in Quilt, Hello would be interpreted as a path expression and evaluated, unless it were denoted as a string by enclosing it in quotes as follows: <greeting>"Hello"</greeting>

  • <animal>A <color>black</ color > cat</animal> . In XML this is an element with mixed content. But in Quilt it would be a syntax error because the sequence of values inside the animal element is not separated by commas. The Quilt equivalent of the given XML expression would be as follows: <animal>"A ", <color>"black"</color>, " cat"</animal> .

The working group decided that XQuery should provide a form of element constructor that conforms more closely to XML notation. However, in the content of an element, it is necessary to distinguish literal text from an expression to be evaluated. Therefore, in an XQuery element constructor, an expression to be evaluated must be enclosed in curly braces. For example:

  • <price>123.45</price> is a constructed element named price , containing a literal value.

  • <price>{$retail * 0.85}</price> is a constructed element whose content is computed by evaluating an expression.

The syntax described above is not identical to XML notation because it uses the left-curly- brace character { to denote the beginning of an evaluated expression, whereas in XML { is an ordinary character. If a left-curly-brace is intended to be interpreted as an ordinary character in the content of a constructed element, XQuery requires it to be doubled ( {{ ).

In addition to an XML-like syntax for constructing elements, XQuery provides an alternative syntax called a computed constructor that can be used to construct elements with computed names as well as other kinds of nodes. The use of computed constructors is illustrated by the following examples:

  • element {$name} {$content} constructs an element node with a given name and content (of course, $name and $content can be replaced by any valid XQuery expression).

  • attribute {$name} {$content} constructs an attribute node with a given name and content.

  • document {$content} constructs a document node with a given content.

  • text {$content} constructs a text node with a given content.

The most complex and difficult issue in the design of XQuery constructors turned out to be determining the type of a constructed element. Since every node in the Query data model has a type annotation, it is necessary to attach a type annotation to the node produced by an element constructor. Since the type annotation of an element node indicates the type of the element's content, the first approach investigated by the working group was to derive the type annotation of the constructed node from the type of its content. For example, the following element constructor contains an integer-valued expression, so it seems natural to annotate the resulting element node with the type xs:integer :

 <a>{8}</a> 

The problem with this approach is that, by definition, elements are assigned their type annotations by the process of schema validation. The type assigned by schema validation may not be the same as the type of the data from which the element was constructed. For example, if the element in the above example were validated against a schema, it might receive an annotation that is more specific than xs:integer , such as hatsize .

In some cases, the type assigned to an element by schema validation is guaranteed to be different from the original type of its content. As an example, consider the following element constructor:

 <a>{1, "2"}</a> 

This element constructor contains an expression whose value is a sequence of an integer followed by a string. However, XML provides no way to represent this element in such a way that, after validation, its content is typed as an integer followed by a string. The XML representation of this element is as follows:

 <a>1 2</a> 

Depending on its definition in the relevant schema, this XML element might be validated as containing one string, or two integers, or possibly a more specific user-defined type such as two hatsizes. But there is no possible schema definition that will cause this element to be validated as containing an integer followed by a string.

The designers of XQuery were confronted with the fact that a type annotation based on the content of a constructed element is never guaranteed to be correct after validation of the element, and in some cases is guaranteed to be incorrect. The working group tried hard to find a reasonable rule by which the type of a constructed element could be derived from its content in "safe and obvious" cases. After much study and debate, the group found it impossible to define such a rule that satisfied everyone. As a result, they decided that each constructed element in XQuery should be automatically schema-validated as part of the construction process.

In order to perform schema validation on a constructed element, a validation mode must be specified. Validation mode can have one of the following three values:

  • strict requires that the given element name must be declared in an imported schema and that the content of the element must comply with its schema declaration.

  • skip indicates that no validation is attempted for the given element. In this mode, constructed elements are given the type annotation xs:anyType , and constructed attributes are given the type annotation xs:anySimpleType .

  • lax behaves like strict if the element name is declared in an imported schema, and like skip otherwise.

XQuery allows a query writer to specify the validation mode for a whole query or for any expression nested within a query. By specifying lax validation, a user can allow the construction of ad hoc elements that are not declared in any schema, and by specifying skip validation, a user can avoid the performance overhead of validating constructed elements entirely.

In addition to a validation mode, the process of schema validation for an element requires a validation context . Validation context controls how the name of the given element is matched against declarations in the imported schemas. If the context is global , the element name is matched against global (topmost) schema declarations. Alternatively, the context may specify a path, beginning with a global element or type declaration, within which the given element name is to be interpreted. For example, the element zipcode might not match a global schema declaration, but might match an element declaration within the context PurchaseOrder/customer/shipto .

The outermost element constructor in a query is validated in the global context unless the user specifies a different context. Each element constructor then adds the name of its element to the validation context for nested elements. For example, if the outermost constructed element in a query is named PurchaseOrder , then the immediate children of this element have a validation context of PurchaseOrder .

Issue 5: Static Typing

As noted earlier, one of the basic principles of XQuery design is that the processing of a query can include a static analysis phase in which some error checking and optimization can be performed. By definition, static analysis is performed on the query only and is independent of input data. Some kinds of static analysis are clearly essential, such as parsing and finding syntax errors. In addition, the working group saw some value in taking advantage of schema information to infer the types of query expressions at query analysis time. This kind of type inference is called static typing . Chapter 4, "Static Typing in XQuery," looks at this subject in some depth.

Static typing offers the following advantages: (1) It can guarantee in some cases that a query, if given valid input data, will generate a result that conforms to a given output schema; (2) It can be helpful in early detection of certain kinds of errors such as calling a function with the wrong type of parameter; (3) It can produce information that may be helpful in optimizing the execution of a query ”for example, it may be possible to prove by static analysis that the result of some expression is an empty sequence.

The static type of an expression is defined as the most specific type that can be deduced for that expression by examining the query only, in the absence of input data. The dynamic type of a value is the most specific type assigned to that value as the query is executed. Note that static type is a compile-time property of an expression, whereas dynamic type is a run-time property of a value.

Static typing in XQuery is based on a set of inference rules that are used to infer the static type of each expression, based on the static types of its operands. Some kinds of expressions have requirements for the types of their operands (for example, arithmetic operators are defined for numeric operands but not for strings). Static analysis starts with the static types of the "leaves" of the expression tree (simple constants and input data whose type can be inferred from the schema of the input document). It then uses inference rules to infer the static types of more complex expressions in the query, including the query itself. If, during this process, it is discovered that the static type of some expression is not appropriate for the context where it is used, a type error is raised.

Type-inference rules are written in such a way that any value that can be returned by an expression is guaranteed to conform to the static type inferred for the expression. This property of a type system is called type soundness . A consequence of this property is that a query that raises no type errors during static analysis will also raise no type errors during execution on valid input data. The importance of type soundness depends somewhat on which errors are classified as "type errors," as we will see below.

Figure 2.4 illustrates a type-inference rule used in XQuery.

Figure 2.4. A Typical Type-Inference Rule

graphics/02fig04.gif

Some inference rules use a notation ( Type1 Type2 ) to denote a type that includes all the instances of type Type1 as well as all the instances of type Type2 . This inferred type has no name. It is an example of the fact that, in many cases, the inferred static type of an expression can be given only a structural description. XQuery 1.0 Formal Semantics [XQ-FS] uses its own notation for the structural description of types, using operators such as (or), & (and), " , " (sequence), and ? (optional).

XML Schema also includes a notation for structural description of types. The working group decided not to use the notation of XML Schema in the type-inference rules in the XQuery formal semantic specification because XML Schema notation is very verbose, and because not all types inferred by the XQuery type inference rules are expressible in XML Schema notation. For example, the rule named "Schema Component Constraint: Element Declarations Consistent" [SCHEMA] prevents an XML Schema type from containing two subelements with the same name but different types. Similarly, the rule named "Schema Component Constraint: Unique Particle Attribution" prevents the definition of an XML Schema type that cannot be validated without lookahead . The XQuery type system does not have these constraints. As a result, unlike XML Schema, the XQuery type system has the closure property (any type inferred by the XQuery type inference rules is a valid type).

The structural type notation used in [XQ-FS] is used only in the formal semantic specification and is not part of the syntax of XQuery as seen by users. This decision was made to avoid adding complexity to the query language, and because the working group did not wish to introduce a user-level syntax that duplicated the facilities of XML Schema.

The working group spent considerable time on the issue of whether static typing should be based on validated type annotations or on structural comparison. As an illustration of this issue, consider a function named surgery that expects as one of its parameters an element of type surgeon . Suppose the type surgeon is defined to consist of a name and a schedule . Is it an error if the surgery function is invoked on an element that contains a name and a schedule but has not been schema-validated as a surgeon ? The working group decided that this should be a type error. In addition to assigning a specific type annotation to an element, schema validation may perform processing steps that are not otherwise supported in the XQuery type system, such as checking pattern facets and providing default attribute values.

Another question closely related to the preceding one is this: Is it a type error if the surgery function is invoked on an element that has been schema-validated as having the type brainSurgeon ? In XQuery, this function call is valid if the type brainSurgeon is declared in an imported schema to be a subtype of (derived from) the type surgeon . In other words, static type-checking in XQuery is based on validated type names and on the type hierarchy declared in imported schemas.

Although the benefits of static typing are well understood in the world of programming languages, extending this technology to the very complex world of XML Schema proved to be a challenging task. Many of the features of XML Schema, such as ordered sequences, union types , substitution groups, minimum and maximum cardinalities, pattern facets, and two different kinds of inheritance, were difficult to assimilate into a system of type-inference rules. Even after some simplifying approximations (such as limiting cardinalities to zero, one, or multiple), the resulting set of rules is quite complex.

XQuery is designed to be used in a variety of environments. Static typing of a query is of greatest benefit when the expected type of the query result is known in advance, the input documents and expected output are both described by schemas, and the query is to be executed repeatedly. This represents one usage scenario for XQuery, but not the only one. Some queries may be exploratory in nature, searching loosely structured data sources such as web pages for information of unknown types. Many XML documents are described by a DTD rather than by a schema, and many have neither a DTD nor a schema. Some XML documents may be described by structural notations other than XML Schema, such as RELAX-NG [RELAXNG]. In some cases, query inputs may be synthesized directly from information sources such as relational databases rather than from serialized XML documents.

To make XQuery adaptable for many usage environments, the language was organized as a required subset called Basic XQuery and two optional features called Schema Import and Static Typing. This approach keeps Basic XQuery relatively simple and lowers the implementation cost of the language in environments where the optional features are of limited benefit.

Basic XQuery includes all the kinds of expressions in the full language. However, a Basic XQuery implementation is not required to deal with user-defined types and is not required to raise static type errors. In Basic XQuery, the only types that are recognized are the built-in atomic types of XML Schema, such as xs:string , xs:integer , and xs:date ; the derived duration types xdt:yearMonthDuration and xdt:dayTimeDuration ; and predefined generic types such as xdt:untypedAtomic and xs:anyType . Basic XQuery might be used, for example, to query data that is exported in XML format from a relational database whose datatypes can be mapped into built-in XML types. When a Basic XQuery implementation is used to query a document that has been validated against an XML schema, the type annotations in the PSVI are mapped into their nearest built-in supertypes (complex types are mapped into xs:anyType , and derived atomic types such as hatsize are mapped into their built-in base types such as xs:integer ).

The first optional feature, Schema Import, provides a way for a query to import a schema and to take advantage of type definitions in the imported schema. For example, a query might import a schema containing the definition of the user-defined type hatsize , derived from xs:integer . This query could then define a function that accepts a parameter of type hatsize . Under the Schema Import feature, an attempt to call this function with an argument that has not been schema-validated as a hatsize is an error.

The second optional feature, Static Typing, defines a specific set of type errors and requires conforming implementations to detect and report these errors during query analysis. An XQuery implementation that does not claim conformance to this feature can still do static query analysis in its own way, for optimization or for error detection. Also, an implementation that does not claim conformance to Static Typing is still responsible for detecting and reporting dynamic type errors. Any condition that will necessarily lead to a run-time error can be reported as a static error at the discretion of the implementation.

It is important to understand that an expression that raises a static type error may nevertheless execute successfully on a given input document. This is because the rules for static type inference are conservative and require a static type error to be raised whenever it cannot be proven that no possibility of a type error exists. For example, consider the following function definition:

 define function pay($e as element(employee)) as xs:decimal?    { $e/salary + $e/bonus } 

A user may want to execute this query against an XML document containing records from a department in which every employee has at most one bonus . However, the only schema available may be a company-wide schema in which the declaration of the employee element allows multiple (zero or more) bonus subelements. In this case, a system that conforms to the Static Typing feature would be required to raise a static error, although execution of the query on the given input data would have been successful.

Conversely, a query that passes static analysis without error may still raise an error at run-time. For example, consider the following expression, where the variable $c is bound to a customer element:

 $c/balance + $c/address 

If the customer element has no schema declaration, the typed values of the balance and address subelements of a given customer will be of type xdt:untypedAtomic . Addition of two values of type xdt:untypedAtomic does not raise a static type error ”instead, at execution time the values are converted to the type xs:double . If, at run-time, the value of $c/balance or $c/address cannot be converted to xs:double , a dynamic error is raised. In order to preserve the claim of "type soundness," this kind of error is not classified as a type error.

Before leaving the subject of static typing, it is worth discussing the influence that static typing has had on the syntax of the XQuery language. Two kinds of XQuery expressions are present in the language solely in order to support better static type inference. The first kind is the typeswitch expression, which is illustrated by the example below. The expression in this example branches on the type of a variable named $employee , computing total pay in one of three ways depending on whether the employee is a salesperson, a manager, or a generic employee.

 typeswitch ($employee)   case $s as element(*, salesperson)      return $s/salary + $s/commission   case $m as element(*, manager)      return $m/salary + $m/bonus   default $d      return $d/salary 

The dynamic behavior of a typeswitch expression can always be obtained by using two other kinds of expression called an if-then-else expression and an instance-of expression. For example, the typeswitch expression in the preceding example might have been expressed as follows:

 if ($employee instance of element(*, salesperson)) then $employee/salary + $employee/commission else if ($employee instance of element(*, manager)) then $employee/salary + $employee/bonus else $employee/salary 

The dynamic (run-time) behavior of the two expressions above is identical. However, the first example will pass static type-checking because it can be proven statically that the variable $s can only be bound to an element of type salesperson , and therefore $s/commission will always return a valid result. The second example, on the other hand, is considered to be a static type error because static analysis cannot predict which branch of the conditional expression will be executed, nor can it prove that the subexpression $employee/commission will not be evaluated for some employee element that is not of type salesperson .

The second type of expression that was added to XQuery solely in support of static typing is the treat expression. To understand the use of treat , consider the following expression, in which the variable $a is bound to an element whose static type is Address :

 $a/zipcode 

Suppose that only an element of type USAddress , a subtype of Address , is guaranteed to have a zipcode . In this case, the above expression is a static error, since static analysis cannot guarantee that the element bound to $a has the type USAddress . To avoid this static error, the Static Typing feature requires this expression to be written as follows:

 ($a treat as element(*, USAddress))/zipcode 

At query analysis time, the static type of the treat expression is element(*, USAddress) , which enables the zipcode to be extracted without a static type error. The treat expression serves as a "promise" that the variable $a will be bound to an element of type USAddress at execution time. At execution time, if $a is not in fact bound to an element of type USAddress , a dynamic error is raised.

As in the case of typeswitch , the dynamic behavior of treat can be simulated by using a combination of an if-then-else expression and an instance-of expression. For example, the dynamic behavior of the preceding expression can be simulated as follows:

 if ($a instance of element(*, USAddress)) then $a/zipcode else error("Not a USAddress") 

Although this expression simulates the dynamic behavior of the preceding treat expression, it is nevertheless considered to be a static error because the compiler cannot predict which branch of the if-then-else expression will be executed. (Chapter 4 provides more details about static type-checking in XQuery.)

Issue 6: Function Resolution

The signature of a function is defined by its expected number of parameters and by the respective types of those parameters. Many modern query and programming languages support function overloading , which permits multiple functions to be defined with the same name but with different signatures. The process of choosing the best function for a given function call based on static analysis of the argument types is called function resolution . Some languages also support polymorphism , a form of function overloading in which static analysis of a function call does not necessarily choose a single function ”instead, it can result in a list of applicable functions from which the most specific function is selected at execution time on the basis of the dynamic types of the actual arguments.

The designers of XQuery made an early decision not to support function overloading. The reasons for this decision were as follows:

  • Despite the advent of schemas, much XML data is untyped. For the convenience of the user, XQuery automatically coerces untyped data to a specific type based on the context where it is used. In order for this feature to work, each function-name must correspond to a single, well-defined signature. For example, consider the untyped element <a>47</a> . If this element is passed to a function that expects an element, it retains its character as an element; if it is passed to a function that expects a decimal, the value 47 is extracted and treated as a decimal value; if it is passed to a function that expects a string, the value 47 is extracted and treated as a string. This flexibility in treatment of untyped data would not be possible if three functions could have the same name while one takes an element parameter, one takes a decimal parameter, and one takes a string parameter.

  • The type system of XQuery is based on XML Schema and is quite complex. It has two different kinds of type inheritance: derivation by restriction and derivation by extension. In addition, it has the concept of substitution groups, which allow one kind of element to be substituted for another on the basis of its name. The rules for casting one type into another and for promotion of numeric arguments are also complex. They are made even more complex by XPath Version 1 compatibility rules, which come into play when XPath Version 2 (a subset of XQuery) is used in "XPath Version 1 Compatibility Mode" (as it is when embedded in XSLT). The designers of XQuery were not eager to add another increment of complexity on top of the rules they inherited from XML Schema and XPath. In the end, it was decided that the added complexity of function overloading outweighed its benefit, at least for the first version of XQuery.

Despite the decision not to support function overloading, it was necessary for XQuery to support the XPath Version 1 core function library, which already contained some overloaded functions. Support for these functions was reconciled with the XQuery design in the following ways:

  • Some XPath Version 1 functions take an optional parameter. For example, the name function returns the name of a node, but if called with no argument, it returns the name of the context node. In XQuery, this kind of function is treated as two functions with the same name but with different arity (number of parameters). The built-in function library of XQuery contains several sets of functions that differ only in their arity. This form of overloading is permitted in the built-in function library but not among user-defined functions. It does not complicate the rules for function resolution, since a function is always uniquely identified by its name and its arity.

  • Some XPath Version 1 functions can accept several different parameter types. For example, the string function of XPath Version 1 can turn almost any kind of argument into a string. In XQuery, functions like string have a generic signature that encompasses all their possible parameter types. For example, the signature of the XQuery string function is string($s as item?) . The type item? denotes an optional occurrence of a node or atomic value. The description of the string function specifies how it operates on arguments of various types that satisfy its signature. The technique of creating a function with a generic signature is also available to users for creating user-defined functions. The body of such a function can branch on the dynamic type of the actual argument.

One painful consequence of the decision not to support function overloading was the loss of polymorphism. Polymorphism is convenient because it facilitates introducing new subtypes of existing types that inherit the behavior of their supertypes where appropriate and specialize it where necessary. A polymorphic language allows a query to iterate over a heterogeneous sequence of items, dispatching the appropriate function for each item based on its dynamic type. For example, in a polymorphic language, a generic function named postage might be defined for elements of type Address , with specialized postage functions for elements of type USAddress and UKAddress that are derived from Address .

Although XQuery does not support true polymorphism, the designers wished to make it possible and reasonably convenient for users to simulate a form of polymorphism. This was accomplished by the following rule: Any function can be invoked with an argument whose dynamic type is a subtype of the expected parameter type. When this occurs, the argument retains its original dynamic type within the body of the function.

This rule makes it possible for a user to write functions such as the postage function shown in Listing 2.3, which invokes one of several more specialized functions depending on the dynamic type of its argument:

Listing 2.3 Function Simulating a Form of Polymorphism
 define function postage($a as element(*, Address)) as xs:decimal { typeswitch($a)    case $usa as element(*, USAddress)       return us-postage($usa)    case $uka as element(*, UKAddress)       return uk-postage($uka)    default       return error("Unknown address type") } 

The postage function can be used in a query that iterates over a set of addresses of mixed type. However, if a new type named GermanAddress is introduced that is also derived from the Address type, it is necessary to add a new case to the body of the postage function, since XQuery will not automatically dispatch a specialized function for the new type.

It is worth considering whether function overloading could be introduced in a later version of XQuery. The main barrier to the introduction of this feature would be the implicit coercions of function parameters that are defined in XQuery. As noted above, untyped data can be coerced into either a string or a number, depending on context. If it were possible to define two functions with the same name that respectively take a string parameter and a numeric parameter, it would be necessary to define a priority order among the possible coercions, in order to resolve an ambiguous function call. This could certainly be done. If function overloading is introduced in a future version of XQuery, users will need to exercise care in the evolution of their function libraries to ensure that existing queries continue to run successfully.

Issue 7: Error Handling

In general, XQuery expressions return values and have no side effects. However, some special consideration is needed for error cases. For example, what should be the value of an expression that contains a division by zero?

One approach that the working group considered was to define a special error value in the Query data model, to be returned by all expressions that encounter an error. The error value would have its own error type that is different from all other types. In this approach, for example, the static type of a double-precision arithmetic expression would be (double error) rather than simply double . After some study, the group decided that including a special error value (and error type) in the Query data model complicated the semantics of the language and made it difficult to distinguish one kind of error from another.

The approach finally adopted for handling dynamic errors in XQuery is to allow an expression either to return a value or to raise a dynamic error. An expression that raises an error is not considered to return a value in the normal sense. Therefore dynamic errors are handled outside the scope of the type system. An error carries with it an ordinary value (of any type) that describes the error. A function named error is provided that raises an error and attaches the argument of the function to the error as a descriptive value.

Propagation of dynamic errors through XQuery expressions is handled by a general rule and some special rules that apply to specific kinds of expressions. The general rule is that if the operands of an expression raise one or more dynamic errors, the expression itself must also raise a dynamic error. Which of several operand errors is propagated by an expression is not specified. If one operand of an expression raises an error, the expression is permitted, but is not required, to evaluate its other operands. For example, in evaluating the expression "Hello" + (length * width * height) , an XQuery implementation can raise an error as soon as it discovers that "Hello" is not a number.

The special error-handling rules for certain kinds of expressions are intended to make these expressions easier to optimize and more efficient to implement. These rules have the effect of making the result of an XQuery expression non-deterministic in a certain limited sense. In some cases, an expression may either return a result or raise an error. However, if an expression returns a result, the result is always well defined by the semantics of the language. The circumstances in which non-deterministic handling of errors is permitted are as follows:

  • If one operand of and is false and the other operand raises an error, the and operator may either return false or raise an error.

  • If one operand of or is true and the other operand raises an error, the or operator may either return true or raise an error.

  • The result of a quantified expression may depend on the order of evaluation of its operands, which may vary from one implementation to another. As an example, consider the expression some variable in range-expr satisfies test-expr . If range-expr contains a value for which test-expr is true and another value for which test-expr raises an error, the quantified expression may either return true or raise an error. Similarly, in the universally quantified expression every variable in range-expr satisfies test-expr , if range-expr contains a value for which test-expr is false and another value for which test-expr raises an error, the quantified expression may either return false or raise an error.

  • Since some comparison operators, such as = , are defined to include an implicit existential quantifier, these operators are made non-deterministic by the above rules. For example, the expression (47, 3 div 0) = 47 , which compares a sequence of two values to a single value, might either return true or raise an error.

  • The Effective Boolean Value of an expression is defined to be true if the expression returns a sequence containing more than one item. But if an error is raised during evaluation of the items in such a sequence, the Effective Boolean Value computation may either return true or raise an error.

  • If a function can evaluate its body expression without evaluating one of its arguments, the function is allowed, but not required, to evaluate that argument. If such an argument contains an error, the function may either raise an error or return the value of its body expression.

An ideal language would be both deterministic and efficient in execution. As illustrated by some of the preceding examples, the presence of sequences in the Query data model leads to some tension between the goals of determinism and efficiency in XQuery. In general, this tension has been resolved in favor of efficiency for queries that contain errors; however, a query that contains no errors must always return a deterministic result.

Users can protect themselves against some kinds of errors by using an if-then-else expression, which evaluates one of two subexpressions (called "branches" here) depending on the Boolean value of a test expression. The if-then-else expression is defined in such a way that it evaluates only the branch that is selected by the test expression; therefore errors in the other branch are ignored. By using an if-then-else expression, a query can control the value that is returned when an error condition is encountered. For example, a user could guard against division by zero by the following expression, which returns an empty sequence rather than raising an error if the divisor is zero:

 if ($divisor ne 0) then $dividend div $divisor else ( ) 

The working group discussed a proposal for a "try/catch" mechanism similar to that used in the Java language, which would have allowed expressions at various levels of the query hierarchy to "catch" errors raised by nested expressions. An expression might specify "handlers" for various kinds of errors encountered in its operands, and in some cases might be able to recover from these errors rather than propagating them. In the end, this approach was considered to be too complex for the first version of XQuery. The working group may revisit this proposal in a future version of the language.

Issue 8: Ordering Operators

Since all values in XQuery are ordered sequences, operators for controlling the order of a sequence are of considerable importance to the language. All of the operators of XQuery return their results in a well-defined order. For example, path expressions and the union and intersect operators always return sequences of nodes in document order. Of course, the ordering of the sequences returned by various expressions in a query is not always essential to the meaning of the query. The cost of evaluating a sequence may depend strongly on the order in which it is evaluated, due to the organization of physical storage or the availability of access aids such as indexes. If the order of some result is not important, the query writer should have some way to say so, thus giving the implementation the flexibility to generate the result in the fastest or least expensive way.

For a time, the working group experimented with "unordered" versions of various operators. In the end, it was decided that the XQuery function library should include a general-purpose function named unordered . The unordered function can be applied to any sequence, and it returns the same sequence in a non-deterministic order. In effect, the unordered function signals an optimizing compiler that its argument expression can be materialized in any order that the optimizer finds convenient. Obviously, applying the unordered function to an expression E can have implications for the ordering properties of expressions that are nested inside E or that use E as an operand, and XQuery implementations are free to take advantage of these implications.

XQuery also needs to provide a way for users to specify a particular order for the items in a sequence. In fact, the result of a query may be a hierarchy of elements in which an ordering needs to be specified at several levels of the hierarchy. The working group explored two alternative approaches to this problem.

The first approach was to provide a general-purpose operator called sort by that could be applied to any expression to specify an order for the result of the expression. This might be called the "independent sorting" approach because sort by is an operator that can be applied to any sequence, and is independent of any other operator. In this approach, the meaning of E1 sort by E2 is defined as follows: For each item in the result of expression E1 , the expression E2 is evaluated with that item as the context item. The resulting values are called sort keys . The items returned by E1 are then arranged in an order that is controlled by their respective sort keys. Options are provided that allow the user to specify primary and secondary sort keys, ascending or descending order, placement of items whose sort key is empty, and other details.

The independent sorting approach is illustrated by the following example, in which a query searches its input for employees in a particular department, and returns them in alphabetical order by their names. In this example, the sort by operator is applied to a path expression.

 //employee[dept = "K55"] sort by name 

The independent sorting approach has two important disadvantages. First, it can only specify an ordering for a sequence based on sort keys that are actually present in the sequence. Suppose, for example, that a query is intended to return the names of all employees in a particular department, in the order of their salaries, but not to return the actual salaries. This query, while not impossible under the independent sorting approach, is not as straightforward as the previous example. It might be expressed as follows:

 for $e in //employee[dept = "K55"] sort by salary return $e/name 

The difficulty of expressing an ordering by the independent sorting approach becomes greater if the sort key is computed from two or more bound variables but is not included in the query result. As an example of this problem, a query might need to join employee elements with their respective department elements, and sort the result according to the ratio between the employee's salary and the department's budget, but not include this ratio in the final output. Such a query is difficult (but not impossible) to express with an independent sort by operator.

The second disadvantage of the independent sorting approach is that it separates iteration and ordering into two separate syntactic constructs. The result of an iteration might be processed in various ways, combined with other results, given a new name, and finally sorted. The most efficient way to process such a query might be to perform the initial iteration in the order required for the final result. However, the specification of this ordering may be syntactically distant from the iteration, and it may not be expressed in a way that is easy for an implementation to use in optimizing the iteration.

Another ordering approach that the working group investigated and ultimately adopted consists of combining the ordering expression with the expression that controls iteration. The iteration expression in XQuery was originally called a "FLWR" expression because of its keywords for , let , where , and return . An order by clause was added to this expression between the where and return clauses, changing the name of the expression to "FLWOR." Positioning the order by clause inside the FLWOR expression gives it access to the bound variables that control the iteration, even though the values bound to these variables may not ultimately be returned. Associating the ordering directly with the iteration also makes it easier for a system to implement the iteration efficiently. The disadvantage of this approach is that an ordering must always be expressed in the form of a FLWOR expression, even though it would not otherwise require an explicit iteration.

The advantages and disadvantages of the "ordered FLWOR" approach can be illustrated by repeating the two preceding example queries. The query that returns all the employees in a given department, sorted by name, can be expressed as follows (note that it now requires an explicit iteration, which was not needed in the "independent sorting" approach):

 for $e in //employee[dept = "K55"] order by $e/name return $e 

Similarly, the query that returns names of employees in a given department, ordered by their salaries but without returning the salaries, can be expressed as follows:

 for $e in //employee[dept = "K55"] order by $e/salary return $e/name 

The choice between the "independent sorting" approach and the "ordered FLWOR" approach is not obvious. The working group considered including both approaches, but decided not to do this on the grounds that the two alternatives are redundant and potentially confusing. In the end, the working group decided to include only the "ordered FLWOR" approach to control ordering of results in XQuery.



XQuery from the Experts(c) A Guide to the W3C XML Query Language
Beginning ASP.NET Databases Using VB.NET
ISBN: N/A
EAN: 2147483647
Year: 2005
Pages: 102

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net