Getting Started with the Formal Semantics | Beginning ASP.NET Databases Using VB.NET

The formal semantics of XQuery has three components : a dynamic semantics , a static semantics , and normalization rules . The dynamic semantics specifies precisely the relationship between input data, an XQuery expression, and output data. The static semantics specifies precisely the relationship between the type of the input data, an XQuery expression, and the type of the output data. The dynamic and static semantics are related by the important type soundness theorem . The normalization rules transform full XQuery into a small core language, which is easier to define, implement, and optimize. The dynamic and static semantics are defined in terms of the core language.

The XQuery formal semantics describes a processing model that relates query parsing, normalization, static analysis, and dynamic evaluation. This processing model works as an "idealized" implementation of the language. The parsing phase takes as input a query in XQuery's full syntax and either raises a parse error or produces a parse tree of the input query. Normalization implements XQuery's normalization rules ”it transforms the parse tree of the input query into an equivalent parse tree in the core language. Static type analysis implements XQuery's static semantics ”it takes the parse tree in the core language and assigns types to expressions in the parse tree. Static analysis either raises a static type error or produces a parse tree where each expression has been assigned a type. Dynamic evaluation implements XQuery's dynamic semantics ”it takes a parse tree in the core language (which may be the result of normalization or static analysis) and reduces expressions in the parse tree to XML values. Dynamic evaluation either raises a dynamic error or produces an XML value that is the result of the query.

The main technique we use to specify XQuery's static and dynamic semantics is commonly known as an operational semantics . An operational semantics is specified using inference rules ”you may already be familiar with inference rules from the study of logic. If you want to learn more about the mathematical foundations of semantics and inference rules, the concluding section of this chapter contains references to the foundational literature.

We introduce inference rules by specifying a tiny subset of XQuery, which has only two types ( xs:boolean and xs:integer ) and just a few constructs, including if and let expressions. We first apply the technique to describe the dynamic semantics of the language, and then the static semantics. We describe normalization last.

Dynamic Semantics

Evaluation is a process that takes an expression and returns a value. Here is how we write it in formal notation:

This is read, "Evaluation of expression Expr yields value Value ." We call this an evaluation judgment . Recall that dynamic evaluation takes a parse tree in the core language and produces an XML value. So you should read Expr as representing such a parse tree and Value as representing the resulting value. For the time being, we consider the very simple case of expressions without variables, so we don't need any environment to give the values of the variables ; we will consider environments later in this section.

Our first definition of values is simple; it includes just Boolean and integer literals.

graphics/05equ02.gif

To completely specify the integers we would need to use a grammar, but we don't go to that level of detail here. One of the tricks to effective use of formalism is to know when not to formalize!

Our first definition of expressions is also simple: literal values, one comparison operation (less than), one arithmetic operation (sum), and conditionals.

graphics/05equ03.gif

This grammar is called an abstract syntax . Issues like precedence or parenthesis are specified in the grammar of the language's concrete syntax and implemented by the parsing phase. We don't concern ourselves with concrete syntax here.

In general, evaluation relates the expression on the left of the arrow and the value on the right. For the simple case described here, every expression evaluates to at most one value. A sensible expression (like 1+2 ) yields one value, while a nonsensical expression (like if (1+2) then 3 else 4 ) yields no value.

Evaluation is described by five rules. Each inference rule has zero or more judgments above the line, called hypotheses , and one judgment below the line, called the conclusion . When all the hypotheses are true, then the conclusion must be true also. The following rules illustrate how literals, comparison, arithmetic, and conditional expressions are evaluated.

graphics/05equ05.gif

graphics/05equ06.gif

graphics/05equ07.gif

graphics/05equ08.gif

Inference rules can be read as deductions in logic. From some hypothesis (above the line), one can deduce a conclusion (below the line). Another way to read inference rules is as a program ”or an algorithm ”to compute a result from some input. In our rules, we compute a value from an expression. The inference rule explains how to find the value of a larger expression (below the line) in terms of the values of its subexpressions (above the line).

For example, here is a proof tree to evaluate the expression 1+2 . The proof tree constrains two instances of the rule ( VALUE ), with Value = 1 in the first instance and Value = 2 in the second instance; and one instance of the rule ( SUM ), with Expr = 1 , Expr ₁ = 2 , Integer = 1 , and Integer ₁ = 2 .

graphics/05equ09.gif

With some effort we have proved the obvious: The expression 1 yields the value 1 , and the expression 2 yields the value 2 , so the expression 1+2 yields their sum, which is the value 3 . Each instance of rule ( VALUE ) has no hypotheses, and so its conclusion follows immediately; and the instance of rule ( SUM ) has two hypotheses that are proved by the instances of rule ( VALUE ), and so its conclusion follows.

Another way to read the above is as a trace of an evaluation: To evaluate the expression 1+2 , evaluate the subexpression 1 to yield the value 1 , and evaluate the subexpression 2 to yield the value 2 , then add those values to yield the final value 3 .

As a second example, here is a proof tree concluding in an instance of the rule ( IF-TRUE ), where we choose Expr = 1 < 3 , Expr ₁ = 3+4 , Expr ₂ = 5+6 , and Value = 7 .

graphics/05equ10.gif

So evaluation of the expression if (1 < 2) then 3+4 else 5+6 yields the value 7 . Note that Expr ₂ = 5+6 did not appear in the hypotheses, which formalizes the intuition that the else branch is not evaluated when the condition is true .

Inference rules may be applied in any order: as long as the hypotheses are all true, we can draw the corresponding conclusion. Building a proof tree corresponds to a successful evaluation of an expression ”if you cannot build a proof tree for an expression, you cannot evaluate it!

We designed the rules so that there is at most one proof tree with a given expression in its conclusion. This means the result of evaluating the expression is deterministic. Later, when we discuss errors and evaluation order, we'll see cases where more than one proof tree exists for a given expression, which means the result of evaluating the expression is non-deterministic .

As language designers, we may write out proof trees to make sure that the semantics of an expression are what we expect them to be, or we may automate this process by building a theorem prover that applies the inference rules automatically. The rules are compact and, once you get used to them, easy to read. It is usually not difficult to convert these rules into executable code ”this is exactly what we have done in writing Galax, our implementation of XQuery [GALAX]. On the other hand, using executable code as the formal specification would be unsatisfactory, because such code contains too many low-level details that hinder readability.

Environments

Evaluation is defined as a judgment relating an expression to a value. Typically, the evaluation judgment also has a third component, an environment . Here is how we write it:

This is read, "In environment dynEnv , the evaluation of expression Expr yields the value Value ." Here dynEnv represents the dynamic environment, such as the values of variables, which captures the context available at query-evaluation time.

In general, an environment may have many components, each of which is given a name . For now, we require an environment, which we call dynEnv , with just one component, a map from variables to their values, which we call varValue . Later we will add other components to the environment.

We use the following notations to manipulate environments:

Table 5.1. Notations That Manipulate Environments

Notation	Meaning
˜	The initial environment with an empty map
dynEnv.varValue ( Var ₁ Value ₁ `,` . . . `,` Var _n Value _n )	The environment that maps Var _i to Value _i
dynEnv + varValue ( Var Value )	The environment identical to dynEnv except that Var maps to Value
dynEnv.varValue ( Var )	The value of Var in dynEnv
dom ( dynEnv.varValue )	The set of variables mapped in dynEnv

Binding a variable in the environment overrides any previous binding for the same variable. For example, consider the following environments:

graphics/05equ12.gif

Given these environments, we have dynEnv ₂ . varValue ( x ) = 1, dom ( dynEnv . varValue ) = ˜, and dom ( dynEnv ₃ . varValue ) = { x,y } . Note that x maps to a different value in dynEnv ₂ and dynEnv ₃ .

If you have ever implemented a compiler or interpreter, you have implemented some kind of environment. Typically, environments are implemented by stacks of dictionaries or associative arrays. But for the purposes of the formal semantics, we need not worry about these implementation details.

With these preliminaries out of the way, we can now formalize variables and let expressions. We extend the earlier syntax to include these.

graphics/05equ13.gif

The five rules we gave before need to be revised to mention the environment. Here is the revision of the first two rules.

graphics/05equ15.gif

The revisions of the other rules are similar.

There are also two new rules. The first rule illustrates how the value of a variable is retrieved from the dynamic environment.

If in dynEnv the value bound to Var is Value , then in dynEnv evaluating expression Var yields value Value . The rule implicitly requires that the variable be in the domain of the dynamic environment, since dynEnv . varValue ( Var ) makes sense only if Var dom ( dynEnv . varValue ). The formalism doesn't specify what happens if the variable is not in the dynamic environment. In an implementation, an undefined variable error would typically be caught prior to evaluation, so the dynamic semantics does not even mention it. Again, it is important to know when not to formalize. (Later, when some subtleties about errors arise, we will discuss how to formalize them.)

The second rule illustrates how environments are modified and how the modified environment is used in a let expression.

graphics/05equ17.gif

In the first hypothesis, the original environment dynEnv is used to evaluate expression Expr yielding value Value . In the second hypothesis, the environment is extended by binding variable Var to value Value , and the extended environment is used to evaluate expression Expr ₁ yielding value Value ₁ . The conclusion states that evaluating the let expression in the original environment dynEnv yields value Value ₁ .

A nested let expression may rebind variables. It is not always easy for a user to predict the result of a query when the same variable is bound several times. For instance, consider the following expression:

 let $x := 1 return   let $y := $x+$x return     let $x := $x+$y return       $x+$y

Proof Tree 5.1 shows a proof tree in which dynEnv , dynEnv ₁ , dynEnv ₂ , and dynEnv ₃ are defined as in the example at the beginning of this section. Applying the inference rules step by step in this proof clarifies the behavior of nested let expressions, in particular, that variables defined in inner let expressions hide variables of the same name defined in outer let expressions.

Proof Tree 5.1 Evaluation of a let Expression

graphics/05ptr01.gif

(When dealing with large proof trees, such as the one above, some readers prefer to start reading at the top with the first ( VALUE ) judgment, and some to start reading at the bottom with the final ( LET ) judgment.)

Now that we have shown a nontrivial proof tree, in what follows we often present just the rules and some example judgments, and omit the detailed proof trees. Regardless of the number or complexity of the rules that we present, however, proof trees can be used to clarify the evaluation of any expression.

Matching Values and Types

The relationship between values and types plays an important role in the formal semantics. This relationship can be illustrated by extending the grammar with a second form of let expression, which declares a type for the bound variable. The intention is that a type error is raised if the value does not match the specified type.

In addition to let expressions, type declarations occur in for , some , and every expressions and in function signatures, so once you understand the semantics of type declarations in let , you will understand them in those contexts as well.

For now, we use only two types:

Type declarations can be checked in two ways: statically , when the expression is analyzed , or dynamically , when the expression is evaluated. For now, we focus on dynamic checking; static checking is discussed in the section below entitled "Static Semantics."

We need a new matching judgment that matches a value against a given type. In what follows, judgments are always written in bold . Here is how we write the matching judgment:

This is read, "Value Value matches the type Type ."

The inference rules for literal expressions are straightforward. Every literal integer matches the type xs:integer , and every literal Boolean matches the type xs:boolean .

The modified evaluation rule for the let expression with type declarations shows how to check that the value bound to the variable matches the declared type.

graphics/05equ23.gif

The first and third hypotheses are the same as those in the rule for let without type declarations. The second hypothesis asserts that Value matches the declared type.

Since XQuery is a strongly typed language, matching values to types is central to its dynamic semantics. Later in the chapter, we define additional matches judgments relating complex XML values to complex types, but these additions don't affect the use of the matches judgment above ”the same hypothesis applies. The inference-rule formalism makes it possible to change and test out XQuery's semantics simply by adding (or deleting) rules, which illustrates another advantage of having a formal semantics.

Errors

So far, our semantics associates a value with every expression. Some expressions, however, have no value; instead they raise an error. We have seen one error ”the use of an undefined variable ”that we chose not to formalize. But the behavior of other errors can be subtle, and the prose XQuery description goes to some lengths to specify what freedom vendors have and do not have to raise errors. So it is worth the effort to formalize errors.

We introduce a new judgment to indicate that evaluation of an expression may raise an error.

This is read, "In the environment dynEnv , the evaluation of expression Expr raises the error Error ."

We classify errors as either type errors or dynamic errors.

These do not indicate the cause of the error (e.g., "divide by zero" or "value does not match required type"). The formal semantics could make this distinction, but we don't bother with it here.

To illustrate the semantics of errors, we extend our subset of XQuery with an operator for integer division.

Type errors are triggered if an operand's value does not match the operator's required type. We use the matching judgments from the preceding section, and write the following if none of the matches judgments above hold for a particular Value and Type :

The evaluation rule for division is similar to that for addition. The first two hypotheses are the same as those for addition, and the third hypo thesis guarantees that the divisor in Expr ₁ is not equal to zero:

graphics/05equ28.gif

We now add rules that indicate when errors should be raised. The next rule shows how evaluation of the integer division operator raises an error if the divisor is zero.

Notice that the rule does not require evaluation of the dividend in Expr to discover such an error.

The following rules show how type errors may be raised during evaluation of arithmetic and comparison operators. The less-than , sum, and division operators raise a type error if they act on a value that is not an integer. We give the two rules for less than; the other operators are similar.

graphics/05equ30.gif

graphics/05equ31.gif

The rule for the if expression is omitted but similar ”it raises a type error when the value of its condition expression is not a Boolean.

The let expression with a type declaration raises an error if the value is not of the declared type.

graphics/05equ32.gif

Recall that in XQuery, type declarations also occur in for , some , and every expressions and in function signatures ”the rules for raising type errors in these expressions are similar.

The following rules illustrate how an error is propagated from the subexpression in which it occurs to the containing expression. In XQuery, there is no expression for catching an error, so typically an error is propagated all the way back to the programming environment in which the query is evaluated. Here are the rules for propagating errors through the less-than operator. If either argument of an operator raises an error, then the operator raises an error. We give the two rules for less than, the rules for the other operators are similar.

The rules for an if expression are omitted, but similar ”an error is propagated if it arises in the condition, or if it arises in the branch that is evaluated.

In a let expression, an error is propagated if it arises in the first or second expression.

We bind the variable to its value before checking whether the second expression raises an error:

graphics/05equ36.gif

Here is an example of evaluation that raises an error.

graphics/05equ37.gif

where dynEnv = ˜, dynEnv ₁ = varValue ( x ).

By the rules above, sometimes an expression may raise two different errors. For example, the expression

may raise a dynamic error (for division by zero) or a type error (for using a Boolean as a summand). If the static type checking feature is in use, then all type errors are caught in the static analysis phase. But if the static type-checking feature is not in use, then a conforming implementation may raise either error for the expression above.

Note that there exist other kinds of errors in XQuery which are not detected by the type system. For instance, parse errors, which are typically detected even before static typing occurs, or recursive functions going into infinite loops , which are typically never detected by the system.

Static Semantics

The previous sections described the dynamic semantics , that is how expressions in core XQuery are evaluated. The next step is to explain the static semantics , how static types are associated with expressions in core XQuery. We reuse the inference-rule notation of the previous section to describe the static semantics.

Static typing is a remarkable innovation. If the static analysis phase concludes that an expression has a given type, then one can be sure that expression always yields a value that matches the given type. In effect, the static analysis has performed a small proof, ensuring that the expression always returns data that fits a certain format.

Static typing is a process that takes a static environment and an expression and returns a type. Here is how we write it:

This is read, "In environment statEnv , expression Expr has type Type ." We call this a typing judgment .

Here statEnv represents the static environment that captures the context available at query-analysis time, which includes variables and their types. The notation for the static environment is analogous to the notation for dynamic environments. We write ˜ for the initial static environment with an empty map. We write statEnv + varType ( Var : Type ) for the environment that is identical to statEnv except that Var maps to Type , and we write statEnv . varType ( Var ) for the type of Var in statEnv . We write dom ( statEnv . varType ) for the set of variables that are given a type in statEnv .

Here is the XQuery grammar that we have built so far:

graphics/05equ40.gif

Static typing is defined by the nine rules below, which correspond closely to the rules for dynamic evaluation on page 242 “243 and 246 “247. We are going to compare the static rules for each expression with the corresponding dynamic rules, because doing so makes crystal clear the meaning of each expression. We encourage you do make the same comparisons when you read the XQuery Formal Semantics [XQ-FS].

The static rules ( LT ), ( SUM ), ( IDIV ), ( VAR ), and ( LET ) look like the corresponding dynamic rules; we've replaced judgments of the form Expr Value by judgments of the form Expr : Type , taking care to ensure that the dynamic value has the static type. In effect, we've replaced concrete evaluation that yields values by abstract evaluation that yields types.

graphics/05equ41.gif

graphics/05equ42.gif

graphics/05equ43.gif

graphics/05equ45.gif

The dynamic evaluation rule for literal expressions ( VALUE ) breaks into one rule for each type, ( BOOLEAN ) and ( INTEGER ). We need two rules so we can assign the appropriate type to each literal expression.

The two dynamic rules for conditionals ( IF-TRUE ) and ( IF-FALSE ) coalesce into one static typing rule ( IF ).

graphics/05equ48.gif

Here, the formalism helps us distinguish clearly between the static and dynamic behavior of the language. In the static semantics, we do not examine the value of the condition expression as we do in the dynamic semantics, because the value is not known statically. Instead, we only examine the type of the condition, which must be Boolean, and we require that the branches have the same type. (XQuery, unlike many other languages, has a sufficiently expressive type system that we don't need to make this restriction; we'll return to this point later in the subsection entitled "Values and Types.")

The static rule for a let expression with a type declaration requires that the type of the first expression must match the declared type. The first judgment below ensures that Expr has type Type , and the second judgment extends the environment of variable types by adding a new mapping from the variable Var to its type Type , and then uses this new environment to type the subexpression Expr ₁ .

graphics/05equ49.gif

Recall that the two dynamic rules for a let expression with a type declaration perform a check at evaluation time to confirm that the value of the first expression matches the declared type; otherwise they raise a type error:

graphics/05equ50.gif

graphics/05equ51.gif

As we will see in the next section, an important consequence of static analysis is that we do not need to check for type errors at evaluation time, which means the matches judgment in the first dynamic rule above and the entire second rule are redundant when static typing is in effect.

Proof Tree 5.2 shows the static typing of a conditional expression, corresponding to the dynamic evaluation on page 248. Note that the static type computed for the expression is xs:integer and that the dynamic value computed for the expression is 7, which is indeed an integer.

Proof Tree 5.2 Static Typing of a Conditional Expression

graphics/05ptr02.gif

The XQuery static type rules are written so that the correspondence between an expression's type and its value is guaranteed for all expressions ”this guarantee is called type soundness , which we discuss next.

Type Soundness

Of course, the dynamic semantics and static semantics of an expression should be related. Say that an expression has a given type. Then we expect that the expression either yields a value that has that type or raises a dynamic error. This property is called type soundness . A consequence of type soundness is that, when the static typing feature of XQuery is implemented, if a query passes the static-analysis phase, then it cannot raise a type error during the evaluation phase.

Here is an example of successful evaluation.

graphics/05equ52.gif

where dynEnv = ˜, dynEnv ₁ = varValue ( x ).

As before, we write Value matches Type to indicate that the given value belongs to the given type. To phrase type soundness properly, we need to capture the relationship between a dynamic environment and the corresponding static environment. We write

if dom ( dynEnv . varValue ) = dom ( statEnv . varType ), that is, the dynamic and static environments contain the same variables, and for every variable x in that domain,

For example, the following two environments match.

We can now state two type soundness results, one for evaluation and one for errors. The first result states that dynamic and static evaluation in matching environments yield matching results.

Theorem 5.1 Type Soundness for Values

graphics/05equ56.gif

then

For example, assume statEnv ₁ and dynEnv ₁ are as above. We have

graphics/05equ58.gif

and hence

as we expect.

The type soundness theorem looks a little like an inference rule, in that it has hypotheses and a conclusion. But it differs from an inference rule in that it is not invoked as part of static analysis or dynamic evaluation; instead, it is a statement that relates those two processes.

The second result states that if an expression has a static type, then evaluation will not yield a type error.

Theorem 5.2 Type Soundness for Errors

graphics/05equ60.gif

then

Static checking guarantees the absence of type errors, but of course it cannot guarantee the absence of dynamic errors. For example, the following judgments are consistent with the type soundness for errors theorem, because the error that the expression raises is not a type error:

graphics/05equ62.gif

If evaluation of an expression raises a type error, then we know it cannot type check. For example,

This expression raises a type error when it attempts to apply addition to a Boolean. Hence we know that

does not hold for any Type .

The converse does not hold. An expression that does not raise a type error may still fail to statically type. For example,

does not raise a type error because the ill-typed expression is not evaluated. However, it still fails to type check, since

does not hold for any Type . (Recall $z has type xs:boolean .)

In other words, static typing is a conservative analysis ”passing static type checking guarantees the absence of type errors, but the absence of dynamic type errors does not guarantee that an expression will pass static type checking.

Type soundness has practical as well as theoretical implications. For implementers, type soundness determines how clever they can be when implementing the dynamic and static rules. If the implementation never performs static typing, then the dynamic-evaluation rules that raise type errors and all the matches judgments must be implemented. If the implementation always performs static typing, then the dynamic-evaluation rules that raise type errors and all the matches judgments need not be implemented. If the implementation optionally supports static typing, then the implementer may choose to implement the dynamic rules with type errors, knowing they are redundant, or skip evaluation of the type error rules and matches judgments when static typing is enabled. These last two implementation strategies are only possible when type soundness is guaranteed. If the type system is not sound, then an expression's type and value might not match, in which case elimination of run-time type checks of values would be disastrous.

XQuery has been carefully designed so that it satisfies these two soundness theorems. Usually, typed languages are designed to satisfy similar theorems. But there have been exceptions, where a language designed by one group of researchers and subsequently formalized by another was found not to satisfy an appropriate soundness theorem; this typically indicates some problem in the original language design.

Evaluation Order

A language specification should not specify too much and should give the implementer enough freedom to choose an efficient implementation. Fortunately, our formalism is precise yet flexible enough to express this freedom.

Consider the operator and on Boolean expressions.

An implementation should have the freedom to test the expressions in either order ”that is, it should be able to stop and return false if either expression evaluates to false without evaluating the other expression, and similarly, it should be able to raise an error if either expression raises an error.

We can easily capture these conditions in our evaluation and error rules. The first two rules for the conjunction ( and ) operator below yield false if either expression yields false , and the third rule yields true if both expressions yield true .

graphics/05equ70.gif

The next two rules for conjunction raise an error if either expression raises an error.

The rules for conjunction are the first ones we've seen with hypotheses that are not mutually exclusive, which means the result of evaluating a conjunction is non-deterministic. For example, the rules AND-LEFT-ERR and AND-RIGHT-FALSE are both applicable to an expression whose left-hand expression evaluates to false and whose right-hand expression raises an error. This choice means that an expression may either yield a value or raise an error. Consider the expression

Here is the proof that this expression may raise an error

graphics/05equ74.gif

And here is a proof that this expression may return false.

graphics/05equ75.gif

In such cases, the implementer is free to choose either behavior, but the rules make clear that these are the only two choices!

Note that the non-deterministic semantics for conjunction implies that the following two expressions are not equivalent, because they differ in their evaluation order.

The first expression can evaluate Expr and Expr ₁ in either order, but the second always evaluates Expr first, and only if it is true does it evaluate Expr ₁ . So the second expression is more constrained in its evaluation than the first; an optimizer might choose to replace the first expression by the second, but not conversely.

Normalization

We have covered the techniques used for the static and dynamic semantics in XQuery. The last step is to explain normalization. Earlier in the chapter, we explained that normalization rules transform full XQuery into a small core language, which is easier to define, implement, and optimize, and that the dynamic and static semantics are defined in terms of the core language.

Normalization is a process that takes an expression in full XQuery and returns an equivalent expression in core XQuery. Here is how we write it:

The Expr subscript indicates that any full XQuery expression can be normalized by the rule. Later in the chapter, we'll see rules that only normalize path expressions, in which case the subscript will be Path to indicate that restriction.

The subset of XQuery used in this section is so simple that none of the expressions require normalization. So to explain normalization, we add a variant of the let expression that includes a where clause ”this is a warm-up for the semantics of FLWOR, discussed in the section below entitled "FLWOR Expressions."

Here is the grammar for a full XQuery expression. It includes all the expressions we've seen so far, plus the new let expression.

Here's the normalization rule for the new let expression, which rewrites it into a simple let expression that returns the value of its return clause if the where condition is true; otherwise it returns the empty sequence.

Note that the rule above recursively applies normalization to the subexpressions Expr , Expr ₁ , and Expr ₂ , as these expressions may also contain let-where expressions.

The normalization rules for the other expressions are similar. For example, here is the normalization of the sum operator:

Even though the sum expression itself is a "core" expression (i.e., it's in our little language), its subexpressions might not be, so the rule must recursively normalize the subexpressions as well. The same holds true for the less-than operator and the other let and if-then-else expressions.

Literals and variables are the only expressions that do not normalize recursively. They simply normalize to themselves . Here's the rule for a variable:

Normalizing full XQuery into a smaller core allows us to provide useful syntactic constructs (like the where clause) without increasing the expressiveness or complexity of the language. It also makes formal specification easier because the dynamic and static semantics are defined on the smaller core language.

Finishing Getting Started

We have seen how to specify precisely the dynamic and static semantics of a tiny subset of XQuery and how to normalize a slightly larger subset. Even the small language covered required a fair bit of work, but in return for this, a number of details of evaluation have been precisely characterized: definition of variables in let expressions and lookup of variables in environments; matching of values against types; issues connected with order of evaluation; relating an expression's static type to its dynamic value; and where and what kind of errors may be raised.

The resulting semantics provides a good road-map for implementers; often, a naive implementation can be written as a straightforward transposition of the inference rules (see [PROG-LANG]). Even though most XQuery implementations will be quite complex and sophisticated, implementers can turn to the formal semantics as a good reference that is independent of any particular implementation strategy.

Now that you are familiar with the techniques and notations used in the formal semantics, you can go read XQuery 1.0 Formal Semantics [XQ-FS] or simply use it as a reference document. In the next section, we help you become familiar with the semantics of some key XQuery expressions, including FLWOR, path expressions, arithmetic, comparison, and constructor expressions. For the many simple expressions, such as sequences and conditionals, that we don't cover, you can go directly to the XQuery Formal Semantics.