11.5 Programming a Major Mode

After you get comfortable with Emacs Lisp programming, you may find that that "little extra something" you want Emacs to do takes the form of a major mode. In previous chapters, we covered major modes for text entry, word processor input, and programming languages. Many of these modes are quite complicated to program, so we'll provide a simple example of a major mode, from which you can learn the concepts needed to program your own. Then, in the following section, you will learn how you can customize existing major modes without changing any of the Lisp code that implements them.

We'll develop Calculator mode, a major mode for a calculator whose functionality will be familiar to you if you have used the Unix dc (desk calculator) command. It is a Reverse Polish (stack-based) calculator of the type made popular by Hewlett-Packard. After explaining some of the principal components of major modes and some interesting features of the calculator mode, we will give the mode's complete Lisp code.

11.5.1 Components of a Major Mode

A major mode has various components that integrate it into Emacs. Some are:

The symbol that is the name of the function that implements the mode
The name of the mode that appears in the mode line in parentheses
The local keymap that defines key bindings for commands in the mode
Variables and constants known only within the Lisp code for the mode
The special buffer the mode may use

Let's deal with these in order. The mode symbol is set by assigning the name of the function that implements the mode to the global variable major-mode, as in:

(setq major-mode 'calc-mode)

Similarly, the mode name is set by assigning an appropriate string to the global variable mode-name, as in:

(setq mode-name "Calculator")

The local keymap is defined using functions discussed in Chapter 10. In the case of the calculator mode, there is only one key sequence to bind (C-j), so we use a variant of the make-keymap function called make-sparse-keymap, which is more efficient when there are only a few key bindings. To use a keymap as the local map of a mode, we call the function use-local-map, as in:

(use-local-map calc-mode-map)
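Putting the three settings above together, a bare-bones mode function might look like the following sketch. The names demo-mode, demo-mode-map, and demo-hello are hypothetical and chosen only for illustration; the calculator's real code appears in full later in this section.

```elisp
;; Hypothetical names for illustration: demo-mode, demo-mode-map, demo-hello.
(defvar demo-mode-map
  (let ((map (make-sparse-keymap)))
    (define-key map "\C-j" 'demo-hello)   ; one binding, so a sparse keymap suffices
    map)
  "Local keymap for the illustrative demo mode.")

(defun demo-hello ()
  "Placeholder command run by C-j in demo mode."
  (interactive)
  (message "hello from demo mode"))

(defun demo-mode ()
  "Illustrative major mode that installs a symbol, a name, and a keymap."
  (interactive)
  (setq major-mode 'demo-mode)      ; the mode symbol
  (setq mode-name "Demo")           ; the string shown in the mode line
  (use-local-map demo-mode-map))    ; activate the local keymap
```

After M-x demo-mode, the mode line of the current buffer should show (Demo), and C-j should run demo-hello in that buffer only.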

As we just saw, variables can be defined by using setq to assign a value to them, or by using let to define local variables within a function. The more "official" way to define variables is the defvar function, which allows documentation for the variable to be integrated into online help facilities such as C-h v (for describe-variable). The format is the following:

(defvar varname initial-value "description of the variable")

A variation on this is defconst, with which you can define constant values (that never change). For example:

(defconst calc-operator-regexp "[-+*/%]"
  "Regular expression for recognizing operators.")

defines the regular expression to be used in searching for arithmetic operators. As you will see, we use calc- as a prefix for the names of all functions, variables, and constants that we define for the calculator mode. Other modes use this convention; for example, all names in C++ mode begin with c++-. Using this convention is a good idea because it helps avoid potential name clashes with the thousands of other functions, variables, and so on in Emacs.
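As a small illustration of defvar, the sketch below defines a documented variable and then shows one property worth knowing: defvar assigns the initial value only if the variable does not already have one, so re-evaluating the definition does not undo a later change. The variable demo-greeting is hypothetical.

```elisp
;; demo-greeting is a hypothetical variable used only for this sketch.
(defvar demo-greeting "hello"
  "Greeting used by an imaginary demo mode.")

;; The documentation string is stored with the symbol; C-h v displays it.
(documentation-property 'demo-greeting 'variable-documentation)

;; Some later code (an init file, say) changes the value...
(setq demo-greeting "howdy")

;; ...and evaluating the defvar again, as happens when the file is
;; reloaded, leaves that value alone.
(defvar demo-greeting "hello"
  "Greeting used by an imaginary demo mode.")
```

After these forms are loaded, demo-greeting still holds "howdy".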

Making variables local to the mode is also desirable so that they are known only within a buffer that is running the mode.^[8] To do this, use the make-local-variable function, as in:

(make-local-variable 'calc-stack)

^[8] Unfortunately, because such variables are defined before they are made local to the mode, there is still a problem with name clashes with global variables. Therefore, it is still important to use names that aren't already used for global variables. A good strategy for avoiding this is to use variable names that start with the name of the mode.

Notice that the name of the variable, not its value, is needed; therefore a single quote precedes the variable name, turning it into a symbol.
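To see what "local" means here, the following sketch sets a variable in one buffer after making it local and then reads it from a second buffer. The names demo-local-stack and demo-local-values are hypothetical; with-temp-buffer is a standard macro that creates a scratch buffer for the duration of its body.

```elisp
;; demo-local-stack is a hypothetical variable used only for this sketch.
(defvar demo-local-stack nil
  "Stack variable for the buffer-local sketch.")

(defun demo-local-values ()
  "Return a cons of `demo-local-stack' as seen in two different buffers."
  (with-temp-buffer
    (make-local-variable 'demo-local-stack)
    (setq demo-local-stack (list 1 2 3))   ; changes this buffer's copy only
    (let ((here demo-local-stack))
      (cons here
            ;; A second buffer still sees the global value, nil.
            (with-temp-buffer demo-local-stack)))))
```

The car of the result is (1 2 3), the value in the buffer where the variable was made local; the cdr is nil, the untouched global value.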

Finally, various major modes use special buffers that are not attached to files. For example, the C-x C-b (for list-buffers) command creates a buffer called *Buffer List*. To create a buffer in a new window, use the pop-to-buffer function, as in:

(pop-to-buffer "*Calc*")

There are a couple of useful variations on pop-to-buffer. We won't use them in our mode example, but they are handy in other circumstances.

switch-to-buffer: Same as the C-x b command covered in Chapter 4; can also be used with a buffer name argument in Lisp.
set-buffer: Used only within Lisp code to designate the buffer that subsequent editing operations act on, without displaying it; the best function to use for working in a temporary "work" buffer within a Lisp function.
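A minimal sketch of the "work buffer" idea follows. It assumes the standard functions get-buffer-create (to make the buffer), save-current-buffer (to restore the original buffer afterward), and unwind-protect (to clean up even on error); the function name and the buffer name " *demo-work*" are hypothetical.

```elisp
(defun demo-count-lines (text)
  "Return the number of lines in TEXT, using a throwaway work buffer."
  (let ((work (get-buffer-create " *demo-work*")))  ; hypothetical buffer name
    (unwind-protect
        (save-current-buffer
          (set-buffer work)            ; editing functions now act on WORK
          (erase-buffer)
          (insert text)
          (count-lines (point-min) (point-max)))
      (kill-buffer work))))            ; always discard the work buffer
```

Because set-buffer does not change what any window displays, the user never sees the work buffer; the buffer that was current before the call is current again when the function returns.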

11.5.2 More Lisp Basics: Lists

A Reverse Polish Notation (RPN) calculator uses a data structure called a stack. Think of a stack as being similar to a spring-loaded dish stack in a cafeteria. When you enter a number into an RPN calculator, you push it onto the stack. When you apply an operator such as plus or minus, you pop the top two numbers off the stack, add or subtract them, and push the result back on the stack.

The list, a fundamental concept of Lisp, is a natural for implementing stacks. The list is the main concept that sets Lisp apart from other programming languages. It is a data structure that has two parts: the head and tail. These are known in Lisp jargon, for purely historical reasons, as car and cdr respectively. Think of these terms as "the first thing in the list" and "the rest of the list." The functions car and cdr, when given a list argument, return the head and tail of it, respectively.^[9] Two functions are often used for making lists. cons (construct) takes two arguments, which become the head and tail of the list respectively. list takes any number of arguments and makes them into a list. For example, this:

(list 2 3 4 5)

makes a list of the numbers from 2 to 5, and this:

(cons 1 (list 2 3 4 5))

makes a list of the numbers from 1 to 5. car applied to that list would return 1, while cdr would return the list (2 3 4 5).

^[9] Experienced Lisp programmers should note that older versions of Emacs Lisp did not supply standard contractions like cadr and cdar; recent versions define them.
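The sketch below collects these operations in one place and adds two details that matter for the stack code later on: car and cdr can be combined to reach deeper elements, and in Emacs Lisp both return nil when given the empty list. The function name demo-parts is hypothetical.

```elisp
(defun demo-parts (lst)
  "Return a list of the head, the tail, and the second element of LST."
  (list (car lst)           ; head
        (cdr lst)           ; tail
        (car (cdr lst))))   ; head of the tail, i.e. the second element

(demo-parts (cons 1 (list 2 3 4 5)))   ; → (1 (2 3 4 5) 2)
(demo-parts nil)                       ; → (nil nil nil)
```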

These concepts are important because stacks, such as that used in the calculator mode, are easily implemented as lists. To push the value of x onto the stack calc-stack, we can just say this:

(setq calc-stack (cons x calc-stack))

If we want to get at the value at the top of the stack, the following returns that value:

(car calc-stack)

To pop the top value off the stack, we say this:

(setq calc-stack (cdr calc-stack))
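Wrapped up as functions, the three idioms above might look like the following sketch. The names demo-rpn-stack, demo-push, and demo-pop are hypothetical stand-ins; the calculator's own versions (calc-push, calc-top, and calc-pop) appear in the full listing and add error checking.

```elisp
;; Hypothetical names for this sketch: demo-rpn-stack, demo-push, demo-pop.
(defvar demo-rpn-stack nil
  "Stack for the sketch; the car is the top of the stack.")

(defun demo-push (x)
  "Push X onto `demo-rpn-stack'."
  (setq demo-rpn-stack (cons x demo-rpn-stack)))

(defun demo-pop ()
  "Remove and return the top of `demo-rpn-stack'."
  (let ((top (car demo-rpn-stack)))
    (setq demo-rpn-stack (cdr demo-rpn-stack))
    top))

(demo-push 4)
(demo-push 17)      ; stack is now (17 4)
(demo-pop)          ; → 17, stack is now (4)
```

The most recently pushed value is always the first to come back, which is exactly the behavior an RPN calculator needs.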

Bear in mind that the elements of a list can be anything, including other lists. (This is why a list is called a recursive data structure.) In fact (ready to be confused?) just about everything in Lisp that is not an atom is a list. This includes functions, which are basically lists of function name, arguments, and expressions to be evaluated. The idea of functions as lists will come in handy very soon.
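As a small illustration of code being an ordinary list, the sketch below builds a three-element list with list and then hands it to the standard function eval, which evaluates it as an expression. The helper demo-make-call is a hypothetical name.

```elisp
(defun demo-make-call (op a b)
  "Return the three-element list (OP A B)."
  (list op a b))

(demo-make-call '* 4 17)          ; → (* 4 17), an ordinary list
(eval (demo-make-call '* 4 17))   ; → 68, the same list treated as code
```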

11.5.3 The Calculator Mode

The complete Lisp code for the calculator mode appears at the end of this section; you should refer to it while reading the following explanation. If you download or type the code in, you can use the calculator by typing M-x calc-mode Enter. You will be put in the buffer *Calc*. You can type a line of numbers and operators and then type C-j to evaluate the line. Table 11-7 lists the three commands in calculator mode.

Table 11-7. Calculator mode commands
Command	Action
`=`	Print the value at the top of the stack.
`p`	Print the entire stack contents.
`c`	Clear the stack.

Blank spaces are not necessary, except to separate numbers. For example, typing this:

4 17*6-=

followed by C-j, evaluates (4 * 17) - 6 and causes the result, 62, to be printed.
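To make the order of operations concrete, here is the same calculation carried out by hand on a local list. This is only a trace, not the mode's code: it uses the standard push and pop macros in place of calc-push and calc-pop, and the function name demo-trace is hypothetical.

```elisp
(defun demo-trace ()
  "Evaluate 4 17 * 6 - by hand and return the top of the stack."
  (let ((stack nil))
    (push 4 stack)                      ; stack: (4)
    (push 17 stack)                     ; stack: (17 4)
    (let* ((op1 (pop stack))            ; 17
           (op2 (pop stack)))           ; 4
      (push (* op2 op1) stack))         ; stack: (68)
    (push 6 stack)                      ; stack: (6 68)
    (let* ((op1 (pop stack))            ; 6
           (op2 (pop stack)))           ; 68
      (push (- op2 op1) stack))         ; stack: (62)
    (car stack)))                       ; the value that = would print
```

Note that the first value popped is the right-hand operand, which is why the subtraction is written as op2 minus op1.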

The heart of the code for the calculator mode is the functions calc-eval and calc-next-token. (See the code at the end of this section for these.) calc-eval is bound to C-j in Calculator mode. Starting at the beginning of the line preceding C-j, it calls calc-next-token to grab each token (number, operator, or command letter) in the line and evaluate it.

calc-next-token uses a cond construct to see if there is a number, operator, or command letter at point by using the regular expressions calc-number-regexp, calc-operator-regexp, and calc-command-regexp. According to which regular expression was matched, it sets the variable calc-proc-fun to the name (symbol) of the function that should be run (either calc-push-number, calc-operate, or calc-command), and it sets tok to the result of the regular expression match.
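The classification step can be sketched on strings rather than buffer text. The following is an approximation under stated assumptions, not the mode's code: it uses string-match instead of looking-at, copies the three regular expressions from the listing at the end of this section, and anchors each one with \` so that it matches only at the start of the string. The function name demo-classify is hypothetical.

```elisp
(defun demo-classify (str)
  "Return a symbol naming the kind of token at the start of STR."
  (cond ((string-match
          "\\`-?\\([0-9]+\\.?\\|\\.\\)[0-9]*\\(e[0-9]+\\)?" str)
         'number)
        ((string-match "\\`[-+*/%]" str)
         'operator)
        ((string-match "\\`[c=ps]" str)
         'command)
        (t
         'invalid)))

(demo-classify "17*6")   ; → number
(demo-classify "*6")     ; → operator
(demo-classify "=")      ; → command
```

The order of the cond clauses matters: a minus sign followed by a digit is taken as part of a number, while a minus sign followed by anything else falls through to the operator clause.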

In calc-eval, we see where the idea of a function as a list comes in. The funcall function reflects the fact that there is little difference between code and data in Lisp. We can put together a list consisting of a symbol and a bunch of expressions and evaluate it as a function, using the symbol as the function name and the expressions as arguments; this is what funcall does. In this case, the following:

(funcall calc-proc-fun tok)

treats the symbol value of calc-proc-fun as the name of the function to be called and calls it with the argument tok. Then the function does one of three things:

If the token is a number, calc-push-number pushes the number onto the stack.
If the token is an operator, calc-operate performs the operation on the top two numbers on the stack (see below).
If the token is a command, calc-command performs the appropriate command.
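The dispatch pattern described above, in which a symbol is chosen first and called later, can be sketched in isolation. All three function names below are hypothetical, and the handlers are deliberately trivial.

```elisp
(defun demo-double (n)
  "Return twice N."
  (* 2 n))

(defun demo-negate (n)
  "Return N with its sign flipped."
  (- n))

(defun demo-dispatch (token)
  "Choose a handler for TOKEN, hold it in a variable, then call it on 21."
  (let ((handler (if (equal token "d") 'demo-double 'demo-negate)))
    (funcall handler 21)))

(demo-dispatch "d")   ; → 42
(demo-dispatch "n")   ; → -21
```

The variable handler plays the role of calc-proc-fun: it holds a symbol, and funcall looks up the function of that name at the moment of the call.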

The function calc-operate takes the idea of functions as lists of data a step further by converting the token from the user directly into a function (an arithmetic operator). This step is accomplished by the function read, which parses a character string as a Lisp expression; for a string such as "+", the result is the symbol +. Thus, calc-operate uses funcall and read in combination as follows:

(defun calc-operate (tok)
  (let ((op1 (calc-pop))
        (op2 (calc-pop)))
    (calc-push (funcall (read tok) op2 op1))))

This function takes the name of an arithmetic operator (as a string) as its argument. As we saw earlier, the string tok is a token extracted from the *Calc* buffer, in this case, an arithmetic operator such as + or *. The calc-operate function pops the top two arguments off the stack by using the calc-pop function, which is similar to the use of cdr earlier. read converts the token to a symbol, and thus to the name of an arithmetic function. So, if the operator is +, then funcall is called as here:

(funcall '+ op2 op1)

Thus, the function + is called with the two arguments, which is exactly equivalent to simply (+ op2 op1). Finally, the result of the function is pushed back onto the stack.
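The read and funcall combination can be tried on its own, without a stack. In this sketch the function name demo-apply-operator is hypothetical; read and funcall are the standard functions discussed above.

```elisp
(defun demo-apply-operator (tok a b)
  "Call the arithmetic function named by the string TOK on A and B."
  (funcall (read tok) a b))

(read "+")                          ; → the symbol +
(demo-apply-operator "*" 4 17)      ; → 68
(demo-apply-operator "-" 68 6)      ; → 62
```

Because read will parse any Lisp expression, not just operator names, it is sensible to check the token first, as the calculator does by matching it against calc-operator-regexp before calc-operate is ever called.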

All this voodoo is necessary so that, for example, the user can type a plus sign and Lisp automatically converts it into a plus function. We could have done the same thing less elegantly and less efficiently by writing calc-operate with a cond construct (as in calc-next-token), which would look like this:

(defun calc-operate (tok)
  (let ((op1 (calc-pop))
        (op2 (calc-pop)))
    (calc-push
     (cond ((equal tok "+")
            (+ op2 op1))
           ((equal tok "-")
            (- op2 op1))
           ((equal tok "*")
            (* op2 op1))
           ((equal tok "/")
            (/ op2 op1))
           (t
            (% op2 op1))))))

The final thing to notice in the calculator mode code is the function calc-mode, which starts the mode. It creates (and pops to) the *Calc* buffer. Then it kills all existing local variables in the buffer, initializes the stack to nil (empty), and creates the local variable calc-proc-fun (see the earlier discussion). Finally it sets Calculator mode as the major mode, sets the mode name, and activates the local keymap.

11.5.4 Lisp Code for the Calculator Mode

Now you should be able to understand all of the code for the calculator mode. You will notice that there really isn't that much code at all! This is testimony to the power of Lisp and the versatility of built-in Emacs functions. Once you understand how this mode works, you should be ready to start rolling your own. Without any further ado, here is the code:

;;    Calculator mode.
;;
;;    Supports the operators +, -, *, /, and % (remainder).
;;    Commands:
;;    c       clear the stack
;;    =       print the value at the top of the stack
;;    p       print the entire stack contents
;;

(defvar calc-mode-map nil
  "Local keymap for calculator mode buffers.")

; set up the calculator mode keymap with
; C-j (linefeed) as "eval" key
(if calc-mode-map
    nil
  (setq calc-mode-map (make-sparse-keymap))
  (define-key calc-mode-map "\C-j" 'calc-eval))

(defconst calc-number-regexp
  "-?\\([0-9]+\\.?\\|\\.\\)[0-9]*\\(e[0-9]+\\)?"
  "Regular expression for recognizing numbers.")

(defconst calc-operator-regexp "[-+*/%]"
  "Regular expression for recognizing operators.")

(defconst calc-command-regexp "[c=ps]"
  "Regular expression for recognizing commands.")

(defconst calc-whitespace "[ \t]"
  "Regular expression for recognizing whitespace.")

;; stack functions
(defun calc-push (num)
  (if (numberp num)
      (setq calc-stack (cons num calc-stack))))

(defun calc-top ()
  (if (not calc-stack)
      (error "stack empty.")
    (car calc-stack)))

(defun calc-pop ()
  (let ((val (calc-top)))
    (if val
        (setq calc-stack (cdr calc-stack)))
    val))

;; functions for user commands:
(defun calc-print-stack ()
  "Print entire contents of stack, from top to bottom."
  (if calc-stack
      (progn
        (insert "\n")
        (let ((stk calc-stack))
          (while calc-stack
            (insert (number-to-string (calc-pop)) " "))
          (setq calc-stack stk)))
    (error "stack empty.")))

(defun calc-clear-stack ()
  "Clear the stack."
  (setq calc-stack nil)
  (message "stack cleared."))

(defun calc-command (tok)
  "Given a command token, perform the appropriate action."
  (cond ((equal tok "c")
         (calc-clear-stack))
        ((equal tok "=")
         (insert "\n" (number-to-string (calc-top))))
        ((equal tok "p")
         (calc-print-stack))
        (t
         (message (concat "invalid command: " tok)))))

(defun calc-operate (tok)
  "Given an arithmetic operator (as string), pop two numbers
off the stack, perform operation tok (given as string), push
the result onto the stack."
  (let ((op1 (calc-pop))
        (op2 (calc-pop)))
    (calc-push (funcall (read tok) op2 op1))))

(defun calc-push-number (tok)
  "Given a number (as string), push it (as number)
onto the stack."
  (calc-push (string-to-number tok)))

(defun calc-invalid-tok (tok)
  (error (concat "Invalid token: " tok)))

(defun calc-next-token ()
  "Pick up the next token, based on regexp search.
As side effects, advance point one past the token,
and set name of function to use to process the token."
  (let (tok)
    (cond ((looking-at calc-number-regexp)
           (goto-char (match-end 0))
           (setq calc-proc-fun 'calc-push-number))
          ((looking-at calc-operator-regexp)
           (forward-char 1)
           (setq calc-proc-fun 'calc-operate))
          ((looking-at calc-command-regexp)
           (forward-char 1)
           (setq calc-proc-fun 'calc-command))
          ((looking-at ".")
           (forward-char 1)
           (setq calc-proc-fun 'calc-invalid-tok)))
    ;; pick up token and advance past it (and past whitespace)
    (setq tok (buffer-substring (match-beginning 0) (point)))
    (if (looking-at calc-whitespace)
        (goto-char (match-end 0)))
    tok))

(defun calc-eval ()
  "Main evaluation function for calculator mode.
Process all tokens on an input line."
  (interactive)
  (beginning-of-line)
  (while (not (eolp))
    (let ((tok (calc-next-token)))
      (funcall calc-proc-fun tok)))
  (insert "\n"))

(defun calc-mode ()
  "Calculator mode, using H-P style postfix notation.
Understands the arithmetic operators +, -, *, / and %,
plus the following commands:
    c   clear stack
    =   print top of stack
    p   print entire stack contents (top to bottom)
Linefeed (C-j) is bound to an evaluation function that
will evaluate everything on the current line. No
whitespace is necessary, except to separate numbers."
  (interactive)
  (pop-to-buffer "*Calc*" nil)
  (kill-all-local-variables)
  (make-local-variable 'calc-stack)
  (setq calc-stack nil)
  (make-local-variable 'calc-proc-fun)
  (setq major-mode 'calc-mode)
  (setq mode-name "Calculator")
  (use-local-map calc-mode-map))

The following are some possible extensions to the calculator mode, offered as exercises. If you try them, you will increase your understanding of the mode's code and Emacs Lisp programming in general.

Add an operator ^ for "power" (4 5 ^ evaluates to 1024). Emacs Lisp has no function named ^, so read and funcall alone will not work for this operator, but you can use the built-in function expt.
Add support for octal (base 8) and/or hexadecimal (base 16) numbers. An octal number has a leading "0," and a hexadecimal has a leading "0x"; thus, 017 equals decimal 15, and 0x17 equals decimal 23.
Add operators \+ and \* to add/multiply all of the numbers on the stack, not just the top two (e.g., 4 5 6 \+ evaluates to 15, and 4 5 6 \* evaluates to 120).^[10]
As an additional test of your knowledge of list handling in Lisp, complete the example (Example 5) from earlier in this chapter that searches compilation-error-regexp-alist for a match to a compiler error message. (Hint: make a copy of the list, then pick off the top element repeatedly until either a match is found or the list is exhausted.)

^[10] APL programmers will recognize these as variations of that language's "scan" operators.
