11.1. Introduction to Lisp

You may have heard of Lisp as a language for artificial intelligence (AI). If you aren't into AI, don't worry. Lisp may have an unusual syntax, but many of its basic features are just like those of more conventional languages you may have seen, such as Java or Perl. We emphasize such features in this chapter. After introducing the basic Lisp concepts, we proceed by building up various example functions that you can actually use in Emacs. In order to try out the examples, you should be familiar with Emacs Lisp mode and Lisp interaction mode, which were discussed in Chapter 9.

11.1.1 Basic Lisp Entities

The basic elements in Lisp you need to be familiar with are functions, variables, and atoms. Functions are the only program units in Lisp; they cover the notions of procedures, subroutines, programs, and even operators in other languages.

Functions are defined as lists of the above entities, usually as lists of calls to other, existing functions. All functions have return values (as with Perl functions and non-void Java methods); a function's return value is simply the value of the last item in the list, usually the value returned by the last function called. A function call within another function is equivalent to a statement in other languages, and we use statement interchangeably with function call in this chapter. Here is the syntax for a function call:

(function-name argument1 argument2 ...)

which is equivalent to this:

method_name (argument1, argument2, ...);

in Java. This syntax is used for all functions, including those equivalent to arithmetic or comparison operators in other languages. For example, in order to add 2 and 4 in Java or Perl, you would use the expression 2 + 4, whereas in Lisp you would use the following:

(+ 2 4)

Similarly, where you would use 4 >= 2 (greater than or equal to), the Lisp equivalent is:

(>= 4 2)
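Because every operator is an ordinary function, calls nest naturally: an argument can itself be a function call. The expressions below are a small illustrative sketch, not taken from any particular program; you can evaluate each one in Lisp interaction mode.

```elisp
;; Arguments can themselves be function calls; inner calls are evaluated first.
(* (+ 2 4) 3)          ; (+ 2 4) is 6, so this returns 18
(>= (+ 2 4) (* 2 3))   ; compares 6 with 6, so this returns t
(+ 1 2 3 4)            ; + accepts any number of arguments; returns 10
```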

Variables in Lisp are similar to those in any other language, except that they do not have types. A Lisp variable can assume any type of value (values themselves do have types, but variables don't impose restrictions on what they can hold).
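As a quick illustration of untyped variables, the sketch below stores values of two different types in the same variable. It uses setq for assignment, and the variable name my-value is a hypothetical one invented for this example.

```elisp
;; `my-value' is a hypothetical name chosen for this sketch.
(setq my-value 42)           ; the variable holds an integer
(integerp my-value)          ; returns t
(setq my-value "forty-two")  ; now the same variable holds a string
(stringp my-value)           ; returns t
(integerp my-value)          ; returns nil
```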

Atoms are values of any type, including integers, floating point (real) numbers, characters, strings, Boolean truth values, symbols, and special Emacs types such as buffers, windows, and processes. The syntax for various kinds of atoms is:

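Integers are what you would expect: signed whole numbers. In older versions of Emacs they are limited to a fixed range such as -2²⁷ to 2²⁷-1; the actual limits depend on the Emacs version and platform and are stored in the variables most-negative-fixnum and most-positive-fixnum.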
Floating point numbers are real numbers that you can represent with decimal points and scientific notation (with lowercase "e" for the power of 10). For example, the number 5489 can be written 5489, 5.489e3, 548.9e1, and so on.
Characters are preceded by a question mark, for example, ?a. Esc, Newline, and Tab are abbreviated \e, \n, and \t respectively; other control characters are denoted with the prefix \C-, so that (for example) C-a is denoted as ?\C-a.^[3]
^[3] Integers are also allowed where characters are expected. The ASCII code is used on most machines. For example, the number 65 is interpreted as the character A on such a machine.
Strings are surrounded by double quotes; quote marks and backslashes within strings need to be preceded by a backslash. For example, "Jane said, \"See Dick run.\"" is a legal string. Strings can be split across multiple lines without any special syntax. Everything until the closing quote, including all the line breaks, is part of the string value.
Booleans use t for true and nil for false, though most of the time, if a Boolean value is expected, any non-nil value is assumed to mean true. nil is also used as a null or nonvalue in various situations, as we will see.
Symbols are names of things in Lisp, for example, names of variables or functions. Sometimes it is important to refer to the name of something instead of its value, and this is done by preceding the name with a single quote ('). For example, the define-key function, described in Chapter 10, uses the name of the command (as a symbol) rather than the command itself.
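The following expressions are a minimal sketch of the atom syntax just described. Each uses a standard predicate or comparison function; the particular values are chosen only for illustration, and the character codes assume ASCII.

```elisp
(integerp 5489)                           ; returns t
(floatp 5.489e3)                          ; returns t
(= 5489 5.489e3)                          ; returns t, same number in two notations
(= ?A 65)                                 ; returns t, a character is its integer code
(= ?\C-a 1)                               ; returns t, C-a is control character 1
(stringp "Jane said, \"See Dick run.\"")  ; returns t
(symbolp 'define-key)                     ; returns t, the quote yields the name itself
(if 0 'true 'false)                       ; returns true, since 0 is not nil
(if nil 'true 'false)                     ; returns false
```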

A simple example that ties many of these basic Lisp concepts together is the function setq.^[4] As you may have figured out from previous chapters, setq is a way of assigning values to variables, as in

^[4] We hope that Lisp purists will forgive us for calling setq a function, for the sake of simplicity, rather than a form, which it technically is.

(setq auto-save-interval 800)

Notice that setq is a function, unlike in other languages in which special syntax such as = or := is used for assignment. setq takes two arguments: a variable name and a value. In this example, the variable auto-save-interval (the number of keystrokes between auto-saves) is set to the value 800.

setq can actually be used to assign values to multiple variables, as in

(setq thisvar thisvalue
      thatvar thatvalue
      theothervar theothervalue)

The return value of setq is simply the last value assigned, in this case theothervalue. You can set the values of variables in other ways, as we'll see, but setq is the most widely applicable.
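A concrete version of the multiple-assignment form might look like the sketch below. The three variable names are hypothetical, chosen only to show that the whole form returns the last value assigned.

```elisp
;; The variable names below are hypothetical.
(setq my-width 72
      my-title "Draft"
      my-ratio 1.5)    ; the whole form returns 1.5, the last value assigned

my-width               ; returns 72
my-title               ; returns "Draft"
```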

11.1.2 Defining Functions

Now it's time for an example of a simple function definition. Start Emacs without any arguments; this puts you into the *scratch* buffer, an empty buffer in Lisp interaction mode (see Chapter 9), so that you can actually try this and subsequent examples.

Before we get to the example, however, some more comments on Lisp syntax are necessary. First, you will notice that the dash (-) is used as a "break" character to separate words in names of variables, functions, and so on. This practice is simply a widely used Lisp programming convention; thus the dash takes the place of the underscore (_) in languages like C and Ada. A more important issue has to do with all of the parentheses in Lisp code. Lisp is an old language that was designed before anyone gave much thought to language syntax (it was still considered amazing that you could use any language other than the native processor's binary instruction set), so its syntax is not exactly programmer-friendly. Yet Lisp's heavy use of lists and thus its heavy use of parentheses has its advantages, as we'll see toward the end of this chapter.

The main problem a programmer faces is how to keep all the parentheses balanced properly. Compounding this problem is the usual programming convention of putting multiple right parentheses at the end of a line, rather than the more readable technique of placing each right parenthesis directly below its matching left parenthesis. Your best defense against this is the support the Emacs Lisp modes give you, particularly the Tab key for proper indentation and the flash-matching-parenthesis feature.

Now we're ready for our example function. Suppose you are a student or journalist who needs to keep track of the number of words in a paper or story you are writing. Older versions of Emacs have no built-in way of counting the number of words in a buffer (recent versions provide count-words), so we'll write a Lisp function that does the job:

 1  (defun count-words-buffer ( )
 2    (let ((count 0))
 3      (save-excursion
 4        (goto-char (point-min))
 5        (while (< (point) (point-max))
 6          (forward-word 1)
 7          (setq count (1+ count)))
 8        (message "buffer contains %d words." count))))

Let's go through this function line by line and see what it does. (Of course, if you are trying this in Emacs, don't type the line numbers in.)

The defun on line 1 defines the function by its name and arguments. Notice that defun is itself a function, one that, when called, defines a new function. (defun returns the name of the function defined, as a symbol.) The function's arguments appear as a list of names inside parentheses; in this case, the function has no arguments. Arguments can be made optional by preceding them with the keyword &optional. If an argument is optional and not supplied when the function is called, its value is assumed to be nil.
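To make the &optional keyword concrete, here is a minimal sketch. The function my-greet and its argument names are hypothetical; the built-in function or is used to substitute a default when the optional argument is nil.

```elisp
;; `my-greet' is a hypothetical function; GREETING is an optional argument.
(defun my-greet (name &optional greeting)
  ;; `or' returns its first non-nil argument, so a missing (nil)
  ;; GREETING falls back to "Hello".
  (concat (or greeting "Hello") ", " name "!"))

(my-greet "Jane")                  ; returns "Hello, Jane!"
(my-greet "Jane" "Good morning")   ; returns "Good morning, Jane!"
```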

Line 2 contains a let construct, whose general form is:

(let ((var1 value1) (var2 value2) ... )
  statement-block)

The first thing let does is define the variables var1, var2, etc., and set them to the initial values value1, value2, etc. Then let executes the statement block, which is a sequence of function calls or values, just like the body of a function.

It is useful to think of let as doing three things:

Defining (or declaring) a list of variables
Setting the variables to initial values, as if with setq
Creating a block in which the variables are known; the let block is known as the scope of the variables

If a let is used to define a variable, its value can be reset later within the let block with setq. Furthermore, a variable defined with let can have the same name as a global variable; all setqs on that variable within the let block act on the local variable, leaving the global variable undisturbed. However, a setq on a variable that is not defined with a let affects the global environment. It is advisable to avoid using global variables as much as possible because their names might conflict with those of existing global variables and therefore your changes might have unexpected and inexplicable side effects later on.
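The scoping behavior described above can be checked with a short sketch. The variable name my-count is hypothetical; it is used first as a global variable and then as a local variable of the same name inside a let block.

```elisp
(setq my-count 10)                 ; a global variable (hypothetical name)

(let ((my-count 0))                ; a local variable with the same name
  (setq my-count (1+ my-count))    ; acts on the local variable only
  my-count)                        ; the let block returns 1

my-count                           ; returns 10, the global is undisturbed
```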

So, in our example function, we use let to define the local variable count and initialize it to 0. As we will see, this variable is used as a loop counter.

Lines 3 through 8 are the statements within the let block. The first of these calls the built-in Emacs function save-excursion, which is a way of being polite. The function is going to move the cursor around the buffer, so we don't want to disorient the user by jumping them to a strange place in their file just because they asked for a word count. Calling save-excursion tells Emacs to remember the location of the cursor at the beginning of the function, and go back there after executing any statements in its body. Notice how save-excursion is providing us with capability similar to let; you can think of it as a way of making the cursor location itself a local variable.

Line 4 calls goto-char. The argument to goto-char is a (nested) function call to the built-in function point-min. As we have mentioned before, point is Emacs's internal name for the position of the cursor, and we'll refer to the cursor as point throughout the remainder of this chapter. point-min returns the value of the first character position in the current buffer, which is almost always 1; then, goto-char is called with the value 1, which has the effect of moving point to the beginning of the buffer.
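One way to watch save-excursion and goto-char at work is the sketch below. It assumes the standard macro with-temp-buffer, which is not otherwise used in this chapter, to create a scratch buffer that is discarded afterward; the sample text is arbitrary.

```elisp
(with-temp-buffer
  (insert "one two three")       ; 13 characters, so point-max is 14
  (goto-char 5)                  ; put point just before "two"
  (save-excursion
    (goto-char (point-max)))     ; move point to the end inside the body
  (point))                       ; returns 5, because point was restored
```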

The next line sets up a while loop; Java and Perl have a similar construct. The while construct has the general form

(while condition
  statement-block)

Like let and save-excursion, while sets up another statement block. condition is a value (an atom, a variable, or a function returning a value). This value is tested; if it is nil, the condition is considered to be false, and the while loop terminates. If the value is other than nil, the condition is considered to be true, the statement block gets executed, the condition is tested again, and the process repeats.

Of course, it is possible to write an infinite loop. If you write a Lisp function with a while loop and try running it, and your Emacs session hangs, chances are that you have made this all-too-common mistake; just type C-g to abort it.
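As a small sketch of a loop that does terminate, the following adds up the integers from 1 to 5. The variable names n and total are arbitrary; the important detail is that the statement block changes n, so the condition eventually becomes nil.

```elisp
(let ((n 1)
      (total 0))
  (while (<= n 5)
    (setq total (+ total n))
    (setq n (1+ n)))     ; without this line, the loop would never end
  total)                 ; returns 15
```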

In our sample function, the condition is a call to the function <, a less-than function with two arguments, analogous to the < operator in Java or Perl. The first argument is another function call that returns the current character position of point; the second returns the maximum character position in the buffer, that is, the position just after the last character. The function <, like the other relational functions, returns a Boolean value, t or nil.

The loop's statement block consists of two statements. Line 6 moves point forward one word (i.e., as if you had typed M-f). Line 7 increments the loop counter by 1; the function 1+ is shorthand for (+ 1 variable-name). Notice that the third right parenthesis on line 7 matches the left parenthesis preceding while. So, the while loop causes Emacs to go through the current buffer a word at a time while counting the words.

The final statement in the function uses the built-in function message to print a message in the minibuffer saying how many words the buffer contains. The form of the message function will be familiar to C programmers. The first argument to message is a format string, which contains text and special formatting instructions of the form %x, where x is one of a few possible letters. For each of these instructions, in the order in which they appear in the format string, message reads the next argument and tries to interpret it according to the letter after the percent sign. Table 11-1 lists meanings for the letters in the format string.

Table 11-1. Message format strings
Format string	Meaning
`%s`	String or symbol
`%c`	Character
`%d`	Integer
`%e`	Floating point in scientific notation
`%f`	Floating point in decimal-point notation
`%g`	Floating point in whichever format yields the shortest string

For example:

(message "\"%s\" is a string, %d is a number, and %c is a character"
         "hi there" 142 ?q)

causes the message:

"hi there" is a string, 142 is a number, and q is a character

to appear in the minibuffer. This is analogous to the C code:

printf ("\"%s\" is a string, %d is a number, and %c is a character\n",
        "hi there", 142, 'q');

The floating-point-format characters are a bit more complicated. They assume a certain number of significant digits unless you tell them otherwise. For example, the following:

(message "This book was printed in %f, also known as %e." 2004 2004)

yields this:

This book was printed in 2004.000000, also known as 2.004000e+03.

But you can control the number of digits after the decimal point by inserting a period and the number of digits desired between the % and the e, f, or g. For example, this:

(message "This book was printed in %.3e, also known as %.0f." 2004 2004)

prints in the minibuffer:

This book was printed in 2.004e+03, also known as 2004.
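If you want to experiment with format strings without watching the minibuffer, the built-in function format accepts the same format string and arguments as message but returns the result as a string. The calls below are a sketch with arbitrary sample values.

```elisp
(format "%d words" 142)          ; returns "142 words"
(format "%s and %s" 'fred "hi")  ; returns "fred and hi"
(format "%c" ?q)                 ; returns "q"
(format "%.2f" 2004.0)           ; returns "2004.00"
(format "%.3e" 2004.0)           ; returns "2.004e+03"
(format "%g" 2004.0)             ; returns "2004"
```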

11.1.3 Turning Lisp Functions into Emacs Commands

The count-words-buffer function that we've just finished works, but it still isn't as convenient to use as the Emacs commands you work with daily. If you have typed it in, try it yourself. First you need to get Emacs to evaluate the lines you typed in, thereby actually defining the function. To do this, move your cursor to just after the last closing parenthesis in the function and type C-j (or Linefeed), the "evaluate" key in Lisp interaction mode, to tell Emacs to perform the function definition. You should see the name of the function appear again in the buffer; the return value of the defun function is the symbol that has been defined. (If instead you get an error message, double check that your function looks exactly like the example and that you haven't typed in the line numbers, and try again.)

Once the function is defined, you can execute it by typing (count-words-buffer) on its own line in your Lisp interaction window, and once again typing C-j after the closing parenthesis.

Now that you can execute the function correctly from a Lisp interaction window, try executing the function with M-x, as with any other Emacs command. Try typing M-x count-words-buffer Enter: you will get the error message [No match]. (You can type C-g to cancel this failed attempt.) You get this error message because you need to "register" a function with Emacs to make it available for interactive use. The function to do this is interactive, which has the form:

(interactive "prompt-string")

This statement should be the first in a function, that is, right after the line containing the defun and the documentation string (which we will cover shortly). Using interactive causes Emacs to register the function as a command and to prompt the user for the arguments declared in the defun statement. The prompt string is optional.
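The difference that interactive makes can be checked with the built-in predicate commandp, which reports whether a function is available as a command. The two function names in this sketch are hypothetical.

```elisp
;; Two hypothetical functions that differ only in the interactive statement.
(defun my-plain-function ()
  (message "hello"))

(defun my-command ()
  (interactive)
  (message "hello"))

(commandp 'my-plain-function)   ; returns nil, so M-x cannot find it
(commandp 'my-command)          ; returns t, so M-x my-command works
```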

The prompt string has a special format: for each argument you want to prompt the user for, you provide a section of prompt string. The sections are separated by newlines (\n). The first letter of each section is a code for the type of argument you want. There are many choices; the most commonly used are listed in Table 11-2.

Table 11-2. Argument codes for interactive functions
Code	User is prompted for:
`b`	Name of an existing buffer
`e`	Event (mouse action or function key press)
`f`	Name of an existing file
`n`	Number (integer)
`s`	String
	Most of these have uppercase variations
`B`	Name of a buffer that may not exist
`F`	Name of a file that may not exist
`N`	Number, unless command is invoked with a prefix argument, in which case use the prefix argument and skip this prompt
`S`	Symbol

With the b and f options, Emacs signals an error if the buffer or file given does not already exist. Another useful option to interactive is r, which we will see later. There are many other option letters; consult the documentation for the function interactive for the details. The rest of each section is the actual prompt that appears in the minibuffer.

The way interactive is used to fill in function arguments is somewhat complicated and best explained through an example. A simple example is in the function goto-percent, which we will see shortly. It contains the statement

(interactive "nPercent: ")

The n in the prompt string tells Emacs to prompt for an integer; the string Percent: appears in the minibuffer.
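As another sketch of a single-argument prompt, the hypothetical command below asks for a number and inserts that many asterisks at point. The name my-insert-stars and its prompt text are invented for this example; make-string and insert are standard functions.

```elisp
;; `my-insert-stars' is a hypothetical command used only for illustration.
(defun my-insert-stars (count)
  (interactive "nHow many stars: ")   ; n: read a number into COUNT
  (insert (make-string count ?*)))    ; build a string of COUNT asterisks
```

Typing M-x my-insert-stars Enter and answering 3 at the prompt inserts *** into the current buffer.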

As a slightly more complicated example, let's say we want to write our own version of the replace-string command. Here's how we would do the prompting:

(defun replace-string (from to)
  (interactive "sReplace string: \nsReplace string %s with: ")
  ...)

The prompt string consists of two sections, sReplace string: and sReplace string %s with:, separated by a Newline. The initial s in each means that a string is expected; the %s is a formatting operator (as in the previous message function) that Emacs replaces with the user's response to the first prompt. When applying formatting operators in a prompt, it is as if message has been called with a list of all responses read so far, so the first formatting operator is applied to the first response, and so on.

When this command is invoked, first the prompt Replace string: appears in the minibuffer. Assume the user types fred in response. After the user presses Enter, the prompt Replace string fred with: appears. The user types the replacement string and presses Enter again.

The two strings the user types are used as values of the function arguments from and to (in that order), and the command runs to completion. Thus, interactive supplies values to the function's arguments in the order of the sections of the prompt string.

The use of interactive does not preclude calling the function from other Lisp code; in this case, the calling function needs to supply values for all arguments. For example, if we were interested in calling our version of replace-string from another Lisp function that needs to replace all occurrences of "Bill" with "Deb" in a file, we would use

(replace-string "Bill" "Deb")

The function is not being called interactively in this case, so the interactive statement has no effect; the argument from is set to "Bill", and to is set to "Deb".
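The same pattern can be tried with a complete, runnable command. The sketch below defines a hypothetical command, my-insert-pair, whose prompt string has two sections; it can be run with M-x or called from Lisp with explicit arguments.

```elisp
;; `my-insert-pair' is a hypothetical command used only for illustration.
(defun my-insert-pair (open close)
  (interactive "sOpening text: \nsClosing text to match %s: ")
  (insert open close)                 ; insert both strings at point
  (backward-char (length close)))     ; leave point between them

;; Called from Lisp, the prompts are skipped and the arguments are
;; taken in order: OPEN is "(" and CLOSE is ")".
(with-temp-buffer
  (my-insert-pair "(" ")")
  (buffer-string))                    ; returns "()"
```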

Getting back to our count-words-buffer command: it has no arguments, so its interactive statement does not need a prompt string. The final modification we want to make to our command is to add a documentation string (or doc string for short), which is shown by online help facilities such as describe-function (C-h f). Doc strings are normal Lisp strings; they are optional and can be arbitrarily many lines long, although, by convention, the first line is a terse, complete sentence summarizing the command's functionality. Remember that any double quotes inside a string need to be preceded by backslashes.

With all of the fixes taken into account, the complete function looks like this:

(defun count-words-buffer ( )
  "Count the number of words in the current buffer;
print a message in the minibuffer with the result."
  (interactive)
  (save-excursion
    (let ((count 0))
      (goto-char (point-min))
      (while (< (point) (point-max))
        (forward-word 1)
        (setq count (1+ count)))
      (message "buffer contains %d words." count))))
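To experiment with the counting loop from Lisp code, it can help to have a variant that returns the count instead of printing it. The sketch below is one such variant under stated assumptions: the function name my-count-words-in-string is hypothetical, and it uses the standard macro with-temp-buffer to run the same loop on a string in a throwaway buffer.

```elisp
;; Hypothetical helper: the same loop as count-words-buffer, but it
;; works on a string and returns the count rather than printing it.
(defun my-count-words-in-string (text)
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((count 0))
      (while (< (point) (point-max))
        (forward-word 1)
        (setq count (1+ count)))
      count)))

(my-count-words-in-string "See Dick run")   ; returns 3
(my-count-words-in-string "")               ; returns 0
```

Note that the loop counts calls to forward-word rather than words found, so text that continues past the last word (a final period or newline, for example) appears to add one to the result.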
