The Benefits of a Formal Semantics
In their book
The Definition of Standard ML
, Robin Milner, Mads Tofte, and Robert Harper have the following to say about the value of
formally
describing a programming language:
A precise description of a programming language is a prerequisite for its implementation and for its use. The description can take on many forms, each suited to a different purpose. A common form is a reference manual, which is usually a careful narrative description of each construct of the language, often
backed
up with a formal presentation of the grammar (for example, in Backus-Naur form). This gives the programmer enough understanding for many of his purposes. But it is ill-suited for use by an implementer, or by someone who wants to
formulate
laws for equivalence of programs, or by a programmer who wants to design programs with mathematical rigor.
This document is a formal description of both the
grammar
and the
meaning
of a language which is both designed for large projects and widely used. As such it aims to serve the whole community of people seriously
concerned
with the language. At a time when it is increasingly
understood
that programs must withstand rigorous analysis, particularly for systems where safety is critical, a
rigorous
language presentation is even important for
negotiators
and contractors; for a robust program written in an
insecure
language is like a house built upon sand.
The XML Query working
group
has taken the unusual step of writing both a
prose
and a formal description of XQuery.
[1]
These two descriptions benefit both language designers and implementers. For language designers, two descriptions offer twice as many opportunities to expose inconsistency, errors, or bad design. History has shown that it is useful to formalize a language's semantics before the language goes into widespread use and becomes hard to change. After the Java programming language was released, several formal semantics of aspects of the language were written. Some of these semantics revealed errors in the type system, which in
turn
could lead to security holes in browsers that run Java programs. In some cases, revisions were then made to the Java release to plug the holes that were
discovered
.
[1]
We thank our colleagues in the XML Query Working Group,
especially
the coeditors of
XQuery 1.0 Formal Semantics
[XQ-FS]. Particularly important contributions were made by Denise Draper, Peter Fankhauser, and Kristoffer Rose. Kristoffer Rose also provided invaluable help in typesetting. Some material in "Values and Types" and "Matching and Subtyping" in our second main section is taken from Sim on and Wadler's paper
The Essence of
XML
[ESSENCE].
XQuery has already benefited from its formal semantics. During the design of XQuery, the working group chose to make the type system depend more closely upon the type
names
assigned to elements and attributes by XML Schema validation. This change was written in prose in the XQuery language document and later written in the formal semantics. The process of formalization revealed ten issues that had not been addressed by the prose description.
For language implementers, a formal description provides more details than a prose description alone, and those details are necessary to develop complete, correct, and conforming implementations. The formal semantics contains the only complete description of the XQuery static type system, so implementers of that feature depend heavily on the formal semantics. Several early
implementations
of XQuery are actually
modeled
after the formal semantics. Our own implementation of XQuery, Galax [GALAX], corresponds so closely to the formal semantics that the formal-semantics rules that define an expression and the code that implements those rules often resemble each other.
One reason that XQuery is such an interesting language to us is that implementers are pursuing many alternative implementation strategies. For example, some relational database vendors implement XQuery within relational query engines extended with XML-aware operators, and some data-integration
vendors
focus on the difficult problem of
decomposing
XQuery queries into queries that can be implemented by legacy database or business-process systems. We refer you to Chapters 6 “8, which investigate the relationship between XQuery and various evaluation environments. These alternatives make the formal semantics even more
valuable
to implementers, as it codifies XQuery's semantics independently of any particular implementation strategy.
This chapter is divided into two main sections: "Getting Started with the Formal Semantics" and "Learning More about XQuery." The first section introduces the key techniques used in the formal semantics and describes the formal rules and notations used by each technique. We
illustrate
the techniques on a tiny subset of XQuery ”the focus here is on learning how to read and use the formal rules rather than learning the details of the language. The second section applies the techniques introduced in the first section to XQuery itself ”the focus here is on specific XQuery expressions and how the formal rules clarify the expression's semantics and enhance the prose description.
|