10.1 First-Order Logic

The formal syntax of first-order logic is somewhat more complicated than that of propositional logic. The analogue in first-order logic of the set of primitive propositions is the (first-order) vocabulary 풯, which consists of relation symbols, function symbols, and constant symbols. Each relation symbol and function symbol in 풯 has some arity, which intuitively corresponds to the number of arguments it takes. If the arity is k, then the symbol is k-ary. In the earlier examples, Alice and Bob are constant symbols, American is a relation symbol of arity 1, Taller is a relation symbol of arity 2, and Father is a function symbol of arity 1. Because American is a relation symbol of arity 1, it does not make sense to write American(Alice, Bob): American takes only one argument. Similarly, it does not make sense to write Taller(Alice): Taller has arity 2 and takes two arguments. Intuitively, a relation symbol of arity 1 describes a property of an individual (is she an American or not?), a 2-ary relation symbol describes a relation between a pair of individuals, and so on. An example of a 3-ary relation symbol might be Parents(a, b, c): a and b are the parents of c. (1-ary, 2-ary, and 3-ary relations are usually called unary, binary, and ternary relations, respectively, and similarly for functions.)

Besides the symbols in the vocabulary, there is an infinite supply of variables, which are usually denoted x and y, possibly with subscripts. Constant symbols and variables are both used to denote individuals. More complicated terms denoting individuals can be formed by using function symbols. Formally, the set of terms is formed by starting with variables and constant symbols, and closing off under function application, so that if f is a k-ary function symbol and t₁, …, t_k are terms, then f(t₁, …, t_k) is a term. Terms are used in formulas. An atomic formula is either of the form P(t₁, …, t_k), where P is a k-ary relation symbol and t₁, …, t_k are terms, or of the form t₁ = t₂, where t₁ and t₂ are terms. Just as in propositional logic, more complicated formulas can be formed by closing off under negation and conjunction, so that if φ and ψ are formulas, then so are φ and φ ∧ ψ. But first-order logic is closed under one more feature: quantification. If φ is a formula and x is a variable, then ∃xφ is also a formula; ∀xφ is an abbreviation for ∃x φ. Call the resulting language ^fo(풯), or just ^fo; just as in the propositional case, I often suppress the 풯 if it does not play a significant role.

First-order logic can be used to reason about properties of addition and multiplication. The vocabulary of number theory consists of the binary function symbols + and , and the constant symbols 0 and 1. Examples of terms in this vocabulary are 1 + (1 + 1) and (1 + 1) (1 + 1). (Although I use infix notation, writing, for example, 1 + 1 rather than +(1, 1), it should be clear that + and are binary function symbols.) The term denoting the sum of k 1s is abbreviated as k. Thus, typical formulas of number theory include 2 + 3 = 5, 2 + 3 = 6, 2 + x = 6, and ∀x∀y(x + y = y + x). Clearly the first formula should be true, given the standard interpretation of the symbols, and the second to be false. It is not clear whether the third formula should be true or not, since the value of x is unknown. Finally, the fourth formula represents the fact that addition is commutative, so it should be true under the standard interpretation of these symbols. The following semantics captures these intuitions.

Semantics is given to first-order formulas using relational structures. Roughly speaking, a relational structure consists of a set of individuals, called the domain of the structure, and a way of associating with each of the elements of the vocabulary the corresponding entities over the domain. Thus, a constant symbol is associated with an element of the domain, a function symbol is associated with a function on the domain, and so on. More precisely, fix a vocabulary 풯. A relational 풯-structure (sometimes simply called a relational structure or just a structure) 풜 consists of a nonempty domain, denoted dom(풜), an assignment of a k-ary relation P^풜 ⊆ dom(풜)^k to each k-ary relation symbol P of 풯, an assignment of a k-ary function f^풜 : dom(풜)^k → dom(풜) to each k-ary function symbol f of 풯, and an assignment of a member c^풜 of the domain to each constant symbol c. P^풜, f^풜, and c^풜 are called the denotations of P, f, and c, respectively, in 풜.

For example, suppose that 풯 consists of one binary relation symbol E. In that case, a 풯-structure is simply a directed graph. (Recall that a directed graph consists of a set of nodes, some of which are connected by directed edges going one from node to another.) The domain is the set of nodes of the graph, and the interpretation of E is the edge relation of the graph, so that there is an edge from d₁ to d₂ exactly if (d₁, d₂) ∈ E^풜. As another example, consider the vocabulary of number theory discussed earlier. One relational structure for this vocabulary is the natural numbers, where 0, 1, +, and get their standard interpretation. Another is the real numbers, where, again, all the symbols get their standard interpretation. Of course, there are many other relational structures over which these symbols can be interpreted.

A relational structure does not provide an interpretation of the variables. Technically, it turns out to be convenient to have a separate function that does this. A valuation V on a structure 풜 is a function from variables to elements of dom(풜). Recall that terms are intended to represent elements in the domain. Given a structure 풜, a valuation V on 풜 can be extended in a straightforward way to a function V^풜 (I typically omit the superscript 풜 when it is clear from context) that maps terms to elements of dom(풜), simply by defining V^풜(c) = c^풜 for each constant symbol c and then extending the definition by induction on structure to arbitrary terms, by taking V^풜(f(t₁, …, t_k)) = f^풜(V^풜(t₁), …, V^풜(t)_k)).

I next want to define what it means for a formula to be true in a relational structure. Before I give the formal definition, consider a few examples. Suppose, as before, that American is a unary relation symbol, Taller is a binary relation symbol, and Alice and Bob are constant symbols. What does it mean for American(Alice) to be true in the structure 풜? If the domain of 풜 consists of people, then the interpretation American^풜 of the relation symbol American can be thought of as the set of all American people in dom(풜). Thus American(Alice) should be true in 풜 precisely if Alice^풜 ∊ American^풜. Similarly, Taller(Alice, Bob) should be true if Alice is taller than Bob under the interpretation of Taller in 풜; that is, if (Alice^풜, Bob^풜) ∈ Taller^풜.

What about quantification? The English reading suggests that a formula such as ∀xAmerican(x) should be true in the structure 풜 if every individual in dom(풜) is American, and ∃xAmerican(x) to be true if some individual in dom(풜) is an American. The truth conditions will enforce this.

Recall that a structure does not give an interpretation to the variables. Thus, a structure 풜 does not give us enough information to decide if a formula such as Taller(Alice, x) is true. That depends on the interpretation of x, which is given by a valuation. Thus, truth is defined relative to a pair (풜, V) consisting of an interpretation and a valuation: Taller(Alice, x) is true in structure 풜 under valuation V if (V(Alice), V(x)) = (Alice^풜, V(x)) ∈ Taller^풜.

As usual, the formal definition of truth in a structure 풜 under valuation V proceeds by induction on the structure of formulas. If V is a valuation, x is a variable, and d ∈ dom(풜), let V [x/d] be the valuation V′ such that V′(y) = V(y) for every variable y except x, and V′(x) = d. Thus, V [x/d] agrees with V except possibly on x and it assigns the value d to x.

(풜, V) ⊨ P(t₁, …, t_k), where P is a k-ary relation symbol and t₁, …, t_k are terms, iff (V(t₁), …, V(t_k)) ∈ P^풜;
(풜, V) ⊨ (t₁ = t₂), where t₁ and t₂ are terms, iff V(t₁) = V(t₂);
(풜, V) ⊨ φ iff (풜, V) ⊭ φ;
(풜, V) ⊨ φ₁ ∧ φ₂ iff (풜, V) ⊨ φ₁ and (풜, V) ⊨ φ₂;
(풜, V) ⊨ ∃xφ iff (풜, V [x/d]) ⊨ φ for some d ∈ dom(풜).

Recall that ∀xφ is an abbreviation for ∃x φ. It is easy to see that (풜, V) ⊨ ∀xφ iff (풜, V [x/d]) ⊨ φ for every d ∈ dom(풜) (Exercise 10.1). ∀ essentially acts as an infinite conjunction. For suppose that ψ(x) is a formula whose only free variable is x; let ψ(c) be the result of substituting c for x in ψ; that is, ψ(c) is ψ[x/c]. I sometimes abuse notation and write (풜, V) ⊨ φ(d) for d ∈ dom(풜) rather than (풜, V [x/d]) ⊨ φ. Abusing notation still further, note that (풜, V) ⊨ ∀xφ(x) iff (풜, V) ⊨ ∧_{d ∈ D} φ(d), so ∀ acts like an infinite conjunction. Similarly, (풜, V) ⊨ ∃xφ(x) iff (풜, V) ⊨ φ_{d ∈ D} φ(d), so ∃x acts like an infinite disjunction.

Returning to the examples in the language of number theory, let 풩 be the set of natural numbers, with the standard interpretation of the symbols 0, 1, +, and . Then (풩, V) ⊭ 2 + 3 = 5, (풩, V) ⊭ 2 + 3 = 6, and (풩, V) ⊨ ∀x∀y(x + y = y + x) for every valuation V, as expected. On the other hand, (풩, V) ⊨ 2 + x = 6 iff V(x) = 4; here the truth of the formula depends on the valuation. Identical results hold if 풩 is replaced by , the real numbers, again with the standard interpretation. On the other hand, let φ be the formula ∃x(x x = 2), which says that 2 has a square root. Then (풩, V) ⊨ φ and (풩, V) ⊭ φ for all valuations V.

Notice that while the truth of the formula 2 + x = 6 depends on the valuation, this is not the case for ∃x(x x = 2) or 2 + 3 = 5. Variables were originally introduced as a crutch, as "placeholders" to describe what was being quantified. It would be useful to understand when they really are acting as placeholders. Essentially, this is the case when all the variables are "bound" by quantifiers. Thus, although the valuation is necessary in determining the truth of 2 + x = 6, it is not necessary in determining the truth of ∃x(2 + x = 6), because the x in 2 + x = 6 is bound by the quantifier ∃x.

Roughly speaking, an occurrence of a variable x in φ is bound by the quantifier ∀x in a formula such as ∀xφ or by ∃x in ∃xφ; an occurrence of a variable in a formula is free if it is not bound. (A formal definition of what it means for an occurrence of a variable to be free is given in Exercise 10.2.) A formula in which no occurrences of variables are free is called a sentence. Observe that x is free in the formula Taller(c, x), but no variables are free in the formulas American(Alice) and ∃xAmerican(x), so the latter two formulas are sentences. It is not hard to show that the valuation does not affect the truth of a sentence. That is, if φ is a sentence, and V and V′ are valuations on the structure 풜, then (풜, V) ⊨ φ iff (풜, V′) ⊨ φ (Exercise 10.2). In other words, a sentence is true or false in a structure, independent of any valuation.

Satisfiability and validity for first-order logic can be defined in a manner analogous to propositional logic: a first-order formula φ is valid in 풜, written 풜 ⊨ φ if (풜, V) ⊨ φ for all valuations V; itis valid if 풜 ⊨ φ for all structures 풜; it is satisfiable if (풜, V) ⊨ φ for some structure 풜 and some valuation V.

Just as in the propositional case, φ is valid if and only if φ is not satisfiable. There are well-known sound and complete axiomatizations of first-order logic as well. Describing the axioms requires a little notation. Suppose that φ is a first-order formula in which some occurrences of x are free. Say that a term t is substitutable in φ if there is no subformula of φ of the form ∃yψ such that the variable y occurs in t. Thus, for example, f(y) is not substitutable in φ = P(A) ∧∃y(x, Q(y)), but f(x) is substitutable in φ.If f(y) is substituted for x in φ, then the resulting formula is P(A) ∧∃y(f(y), Q(y)). Notice that the y in f(y) is then bound by ∃y.If t is substitutable in φ, let φ[x/t]be the result of substituting t for all free occurrences of x. Let AX^fo consist of Prop and MP (for propositional reasoning), together with the following axioms and inference rules:

F1. ∀x(φ → ψ) → (∀xφ →∀xψ).
F2. ∀xφ → φ[x/t], where t is substitutable in φ.
F3. φ →∀xφ if x does not occur free in φ.
F4. x = x.
F5. x = y → (φ → φ′), where φ is a quantifier-free formula and φ′ is obtained from φ by replacing zero or more occurrences of x in φ by y.

UGen. From ℓ infer ∀xφ. F1, F2, and UGen can be viewed as analogues of K1, K2, and KGen, respectively, where ∀x plays the role of K_i. This analogy can be pushed further; in particular, it follows from F3 that analogues of K4 and K5 hold for ∀x (Exercise 10.4).

Theorem 10.1.1

AX^fo is a sound and complete axiomatization of first-order logic with respect to relational structures.

Proof Soundness is straightforward (Exercise 10.5); as usual, completeness is beyond the scope of this book.

In the context of propositional modal logic, it can be shown that there is no loss of generality in restricting to finite sets of worlds, at least as far as satisfiability and validity are concerned. There are finite-model theorems that show that if a formula is satisfiable at all, then it is satisfiable in a structure with only finitely many worlds. Thus, no new axioms are added by restricting to structures with only finitely many worlds. The situation is quite different in the first-order case. While there is no loss of generality in restricting to countable domains (at least, as far as satisfiability and validity are concerned), restricting to finite domains results in new axioms, as the following example shows:

Example 10.1.2

Suppose that 풯 consists of the constant symbol c and the unary function symbol f. Let φ be the following formula:

The first conjunct says that f is one-to-one; the second says that c is not in the range of f. It is easy to see that φ is satisfiable in the natural numbers: take c to be 0 and f to be the successor function (so that f(x) = x + 1). However, f is not satisfiable in a relational structure with a finite domain. For suppose that 풜 ⊨ φ for some relational structure 풜. (Since φ is a sentence, there is no need to mention the valuation.) An easy induction on k shows that c^풜, f^풜(c^풜), f^풜(f^풜(c^풜)), …, (f^풜)^k(c^풜) must all be distinct (Exercise 10.6). Thus, dom(풜) cannot be finite. It follows that φ is valid in relational structures with finite domains, although it is not valid in all relational structures (and hence is not provable in AX^fo).

Are there some reasonable axioms that can be added to AX^fo to obtain a complete axiomatization of first-order logic in finite relational structures? Somewhat surprisingly, the answer is no. The set of first-order formulas valid in finite structures is not recursively enumerable, that is, there is no program that will generate all and only the valid formulas. It follows that there cannot be a finite (or even recursively enumerable) axiom system that is sound and complete for first-order logic over finite structures. Essentially this says that there is no easy way to characterize finite domains in first-order logic. (By way of contrast, the set of formulas valid in all relational structures—finite or infinite—is recursively enumerable.)

Interestingly, in bounded domains (i.e., relational structures whose domain has cardinality at most N, for some fixed natural number N), there is a complete axiomatization. The following axiom characterizes structures whose domains have cardinality at most N, in that it is true in a structure 풜 iff dom(풜) has cardinality at most N (Exercise 10.7):

Let AX^fo_N be AX^fo together with FIN_N.

Theorem 10.1.3

AX^fo_N is a sound and complete axiomatization of first-order logic with respect to relational structures whose domain has cardinality at most N.

Proof Soundness is immediate from Exercises 10.5 and 10.7. Completeness is beyond the scope of this book (although it is in fact significantly easier to prove in the bounded case than in the unbounded case).

Propositional logic can be viewed as a very limited fragment of first-order logic, one without quantification, using only unary relations, and mentioning only one constant. Consider the propositional language ^Prop(Φ). Corresponding to Φ is the first-order vocabulary Φ* consisting of a unary relation symbol p* for every primitive proposition p in Φ and a constant symbol a. To every propositional formula φ in ^Prop(Φ), there is a corresponding first-order formula φ* over the vocabulary Φ* that results by replacing occurrences of a primitive proposition p in φ by the formula p*(a). Thus, for example, (p ∧ q)* is p*(a) ∧ q*(a). Intuitively, φ and φ* express the same proposition. More formally, there is a mapping associating with each truth assignment v over Φ a relational structure 풜_v over Φ*, where the domain of 풜_v consists of one element d, which is the interpretation of the constant symbol a, and

Proposition 10.1.4

For every propositional formula φ,

v ⊨ φ if and only if 풜_v ⊨ φ*;
φ is valid if and only if φ* is valid;
φ is satisfiable if and only if φ* is satisfiable.

Proof See Exercise 10.8.

Given that propositional logic is essentially a fragment of first-order logic, why is propositional logic of interest? Certainly, as a pedagogical matter, it is sometimes useful to focus on purely propositional formulas, without the overhead of functions, relations, and quantification. But there is a more significant reason. As I wrote in Chapters 1 and 7, increased expressive power comes at a price. For example, there is no algorithm for deciding whether a first-order formula is satisfiable. (Technically, this problem is undecidable.) It is easy to construct algorithms to check whether a propositional formula is satisfiable. (Technically, this problem is NP-complete, but that is much better than being undecidable!) If a problem can be modeled well using propositional logic, then it is worth sticking to propositional logic, rather than moving to first-order logic.

Not only can propositional logic be viewed as a fragment of first-order logic, but propositional epistemic logic can too (at least, as long as the language does not include common knowledge). Indeed, there is a translation of propositional epistemic logic that shows that, in a sense, the axioms for K_i can be viewed as consequences of the axioms for ∀x, although it is beyond the scope of this book to go into details (see the notes to this chapter for references).

Although first-order logic is more expressive than propositional logic, it is certainly far from the last word in expressive power. It can be extended in many ways. One way is to consider second-order logic. In first-order logic, there is quantification over individuals in the domain. Second-order logic allows, in addition, quantification over functions and predicates. Second-order logic is very expressive. For example, the induction axiom can be expressed in second-order logic using the language of number theory. If x is a variable ranging over natural numbers (the individuals in the domain) and P is a variable ranging over unary predicates, then the induction axiom becomes

This says that if a unary predicate P holds for 0 and holds for n + 1 whenever it holds for n, then it must hold for all the natural numbers. In this book, I do not consider second-order logic. Although it is very powerful, the increase in power does not seem that useful for reasoning about uncertainty.

Another way in which first-order logic can be extended is by allowing more general notions of quantification than just universal and existential quantifiers. For example, there can be a quantifier H standing for "at least half," so that a formula such as Hxφ(x) is true (at least in a finite domain) if at least half the elements in the domain satisfy φ. While I do not consider generalized quantifiers here, it turns out that some generalized quantifiers (such as "at least half") can in fact be captured in some of the extensions of first-order logic that I consider in Section 10.3.

Yet a third way to extend first-order logic is to add modalities, just as in propositional logic. That is the focus of this chapter.