The Benefits of a Formal Semantics
In their book
The Definition of Standard ML
, Robin Milner, Mads Tofte, and Robert Harper have the following to say about the value of
describing a programming language:
A precise description of a programming language is a prerequisite for its implementation and for its use. The description can take on many forms, each suited to a different purpose. A common form is a reference manual, which is usually a careful narrative description of each construct of the language, often
up with a formal presentation of the grammar (for example, in Backus-Naur form). This gives the programmer enough understanding for many of his purposes. But it is ill-suited for use by an implementer, or by someone who wants to
laws for equivalence of programs, or by a programmer who wants to design programs with mathematical rigor.
This document is a formal description of both the
of a language which is both designed for large projects and widely used. As such it aims to serve the whole community of people seriously
with the language. At a time when it is increasingly
that programs must withstand rigorous analysis, particularly for systems where safety is critical, a
language presentation is even important for
and contractors; for a robust program written in an
language is like a house built upon sand.
The XML Query working
has taken the unusual step of writing both a
and a formal description of XQuery.
These two descriptions benefit both language designers and implementers. For language designers, two descriptions offer twice as many opportunities to expose inconsistency, errors, or bad design. History has shown that it is useful to formalize a language's semantics before the language goes into widespread use and becomes hard to change. After the Java programming language was released, several formal semantics of aspects of the language were written. Some of these semantics revealed errors in the type system, which in
could lead to security holes in browsers that run Java programs. In some cases, revisions were then made to the Java release to plug the holes that were
We thank our colleagues in the XML Query Working Group,
the coeditors of
XQuery 1.0 Formal Semantics
[XQ-FS]. Particularly important contributions were made by Denise Draper, Peter Fankhauser, and Kristoffer Rose. Kristoffer Rose also provided invaluable help in typesetting. Some material in "Values and Types" and "Matching and Subtyping" in our second main section is taken from Sim on and Wadler's paper
The Essence of
XQuery has already benefited from its formal semantics. During the design of XQuery, the working group chose to make the type system depend more closely upon the type
assigned to elements and attributes by XML Schema validation. This change was written in prose in the XQuery language document and later written in the formal semantics. The process of formalization revealed ten issues that had not been addressed by the prose description.
For language implementers, a formal description provides more details than a prose description alone, and those details are necessary to develop complete, correct, and conforming implementations. The formal semantics contains the only complete description of the XQuery static type system, so implementers of that feature depend heavily on the formal semantics. Several early
of XQuery are actually
after the formal semantics. Our own implementation of XQuery, Galax [GALAX], corresponds so closely to the formal semantics that the formal-semantics rules that define an expression and the code that implements those rules often resemble each other.
One reason that XQuery is such an interesting language to us is that implementers are pursuing many alternative implementation strategies. For example, some relational database vendors implement XQuery within relational query engines extended with XML-aware operators, and some data-integration
focus on the difficult problem of
XQuery queries into queries that can be implemented by legacy database or business-process systems. We refer you to Chapters 6 “8, which investigate the relationship between XQuery and various evaluation environments. These alternatives make the formal semantics even more
to implementers, as it codifies XQuery's semantics independently of any particular implementation strategy.
This chapter is divided into two main sections: "Getting Started with the Formal Semantics" and "Learning More about XQuery." The first section introduces the key techniques used in the formal semantics and describes the formal rules and notations used by each technique. We
the techniques on a tiny subset of XQuery ”the focus here is on learning how to read and use the formal rules rather than learning the details of the language. The second section applies the techniques introduced in the first section to XQuery itself ”the focus here is on specific XQuery expressions and how the formal rules clarify the expression's semantics and enhance the prose description.