9.10. A Unification Algorithm

Unification [18] is a pattern-matching technique used in automated theorem proving, type-inference systems, computer algebra, and logic programming, e.g., Prolog [3].

A unification algorithm attempts to make two symbolic expressions equal by computing a unifying substitution for the expressions. A substitution is a function that replaces variables with other expressions. A substitution must treat all occurrences of a variable the same way, e.g., if it replaces one occurrence of the variable x by a, it must replace all occurrences of x by a. A unifying substitution, or unifier, for two expressions e₁ and e₂ is a substitution, σ, such that σ(e₁) = σ(e₂).

For example, the two expressions f(x) and f(y) can be unified by substituting x for y (or y for x). In this case, the unifier σ could be described as the function that replaces y with x and leaves other variables unchanged. On the other hand, the two expressions x + 1 and y + 2 cannot be unified. It might appear that substituting 3 for x and 2 for y would make both expressions equal to 4 and hence equal to each other. The symbolic expressions, 3 + 1 and 2 + 2, however, still differ.

Two expressions may have more than one unifier. For example, the expressions

f(x; y) and f(1; y) can be unified to f(1; y) with the substitution of 1 for x. They may also be unified to f(1; 5) with the substitution of 1 for x and 5 for y. The first substitution is preferable, since it does not commit to the unnecessary replacement of y. Unification algorithms typically produce the most general unifier, or mgu, for two expressions. The mgu for two expressions makes no unnecessary substitutions; all other unifiers for the expressions are special cases of the mgu. In the example above, the first substitution is the mgu and the second is a special case.

For the purposes of this program, a symbolic expression can be a variable, a constant, or a function application. Variables are represented by Scheme symbols, e.g., x; a function application is represented by a list with the function name in the first position and its arguments in the remaining positions, e.g., (f x); and constants are represented by zero-argument functions, e.g., (a).

The algorithm presented here finds the mgu for two terms, if it exists, using a continuation passing style, or CPS (see Section 3.4), approach to recursion on subterms. The procedure unify takes two terms and passes them to a help procedure, uni, along with an initial (identity) substitution, a success continuation, and a failure continuation. The success continuation returns the result of applying its argument, a substitution, to one of the terms, i.e., the unified result. The failure continuation simply returns its argument, a message. Because control passes by explicit continuation within unify (always with tail calls), a return from the success or failure continuation is a return from unify itself.

Substitutions are procedures. Whenever a variable is to be replaced by another term, a new substitution is formed from the variable, the term, and the existing substitution. Given a term as an argument, the new substitution replaces occurrences of its saved variable with its saved term in the result of invoking the saved substitution on the argument expression. Intuitively, a substitution is a chain of procedures, one for each variable in the substitution. The chain is terminated by the initial, identity substitution.

 (unify 'x 'y) ⇒ y (unify '(f x y) '(g x y)) ⇒ "clash" (unify '(f x (h)) '(f (h) y)) ⇒ (f (h) (h)) (unify '(f (g x) y) '(f y x)) ⇒ "cycle" (unify '(f (g x) y) '(f y (g x))) ⇒ (f (g x) (g x)) (unify '(f (g x) y) '(f y z)) ⇒ (f (g x) (g x)) (define unify #f) (let ()   ;; occurs? returns true if and only if u occurs in v   (define occurs?     (lambda (u v)       (and (pair? v)            (let f ((l (cdr v)))              (and (pair? l)                   (or (eq? u (car l))                       (occurs? u (car l))                       (f (cdr l))))))))   ;; sigma returns a new substitution procedure extending s by   ;;the substitution of u with v   (define sigma     (lambda (u v s)       (lambda (x)         (let f ((x (s x)))           (if (symbol? x)               (if (eq? x u) v x)               (cons (car x) (map f (cdr x))))))))   ;; try-subst tries to substitute u for v but may require a   ;; full unification if (s u) is not a variable, and it may   ;; fail if it sees that u occurs in v.   (define try-subst     (lambda (u v s ks kf)       (let ((u (s u)))         (if (not (symbol? u))             (uni u v s ks kf)             (let ((v (s v)))               (cond                 ((eq? u v) (ks s))                 ((occurs? u v) (kf "cycle"))                 (else (ks (sigma u v s)))))))))   ;; uni attempts to unify u and v with a continuation-passing   ;;style that returns a substitution to the success argument   ;;ks or an error message to the failure argument kf. The   ;;substitution itself is represented by a procedure from   ;; variables to terms.   (define uni     (lambda (u v s ks kf)       (cond         ((symbol? u) (try-subst u v s ks kf))         ((symbol? v) (try-subst v u s ks kf))         ((and (eq? (car u) (car v))               (= (length u) (length v)))         (let f ((u (cdr u)) (v (cdr v)) (s s))           (if (null? u)               (ks s)               (uni (car u)                    (car v)                    s                    (lambda (s) (f (cdr u) (cdr v) s))                    kf))))         (else (kf "clash")))))   ;; unify shows one possible interface to uni, where the initial   ;;substitution is the identity procedure, the initial success   ;;continuation returns the unified term, and the initial failure   ;;continuation returns the error message. (set! unify   (lambda (u v)     (uni u          v          (lambda (x) x)          (lambda (s) (s u))          (lambda (msg) msg)))))

Exercise 9.10.1.

Modify unify so that it returns its substitution rather than the unified term. Apply this substitution to both input terms to verify that it returns the same result for each.

Exercise 9.10.2.

As mentioned above, substitutions on a term are performed sequentially, requiring one entire pass through the input expression for each substituted variable. Represent the substitution differently so that only one pass through the expression need be made. Make sure that substitutions are performed not only on the input expression but also on any expressions you insert during substitution.

Exercise 9.10.3.

Extend the continuation-passing style unification algorithm into an entire continuation-passing style logic programming system.