11.3. Useful Built-in Emacs Functions

Many of the Emacs functions that exist and that you may write involve searching and manipulating the text in a buffer. Such functions are particularly useful in specialized modes, like the programming language modes described in Chapter 9. Many built-in Emacs functions relate to text in strings and buffers; the most interesting ones take advantage of Emacs's regular expression facility, which we introduced in Chapter 3.

We first describe the basic functions relating to buffers and strings that don't use regular expressions. Afterwards, we discuss regular expressions in more depth than was the case in Chapter 3, concentrating on the features that are most useful to Lisp programmers, and we describe the functions that Emacs makes available for dealing with regular expressions.

11.3.1 Buffers, Text, and Regions

Table 11-4 shows some basic Emacs functions relating to buffers, text, and strings that are only useful to Lisp programmers and thus aren't bound to keystrokes. We already saw a couple of them in the count-words-buffer example. Notice that some of these are predicates, and their names reflect this.

Table 11-4. Buffer and text functions
Function	Value or action
point	Character position of point.
mark	Character position of mark.
point-min	Minimum character position (usually 1).
point-max	Maximum character position (usually size of buffer).
bolp	Whether point is at the beginning of the line (t or `nil`).
eolp	Whether point is at the end of the line.
bobp	Whether point is at the beginning of the buffer.
eobp	Whether point is at the end of the buffer.
insert	Insert any number of arguments (strings or characters) into the buffer after point.
number-to-string	Convert a numerical argument to a string.
string-to-number	Convert a string argument to a number (integer or floating point).
char-to-string	Convert a character argument to a string.
substring	Given a string and two integer indices start and end, return the substring starting after start and ending before end. Indices start at 0. For example, `(substring "appropriate" 2 5)` returns "`pro`".
aref	Array indexing function that can be used to return individual characters from strings; takes a string and an integer index and returns the character at that index as an integer, using the ASCII code (on most machines). For example, `(aref "appropriate" 3)` returns 114, the ASCII code for `r`.
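To see a few of these functions working together, here is a small sketch that can be evaluated in a Lisp buffer. The function name my-text-function-demo is made up for illustration, and with-temp-buffer is a standard macro (not listed in the table) that provides a throwaway buffer to work in.

```elisp
;; Hypothetical demo function; only the functions it calls are built in.
(defun my-text-function-demo ()
  "Return a list of values produced by some basic buffer and string functions."
  (with-temp-buffer
    ;; insert accepts any number of string (or character) arguments.
    (insert "appropriate" " " (number-to-string 42))
    (list (point-min)                          ; first position in the buffer
          (point-max)                          ; one past the last character
          (eobp)                               ; point is at the end after inserting
          (progn (goto-char (point-min))
                 (bolp))                       ; now point is at a line beginning
          (substring "appropriate" 2 5)        ; indices start at 0
          (char-to-string (aref "appropriate" 3))
          (string-to-number "3.5"))))
```

Evaluating (my-text-function-demo) should return (1 15 t t "pro" "r" 3.5): the temporary buffer holds the 14 characters of "appropriate 42", so point-max is 15.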

Many functions not included in the previous table deal with buffers and text, including some that you should be familiar with as user commands. Several commonly used Emacs functions use regions, which are areas of text within a buffer. When you are using Emacs, you delineate regions by setting the mark and moving the cursor. However, region-oriented functions (such as kill-region, indent-region, and shell-command-on-region; really, any function with region in its name) are actually more flexible when used within Emacs Lisp code. They typically take two integer arguments that are used as the character positions of the boundaries for the region on which they operate. These arguments default to the values of point and mark when the functions are called interactively.

Obviously, allowing point and mark as interactive defaults is a more general (and thus more desirable) approach than one in which only point and mark can be used to delineate regions. The r option to the interactive function makes it possible. For example, if we wanted to write the function translate-region-into-German, here is how we would start:

(defun translate-region-into-German (start end)
  (interactive "r")
  ...

The r option to interactive fills in the two arguments start and end when the function is called interactively, but if it is called from other Lisp code, both arguments must be supplied. The usual way to do this is:

(translate-region-into-German (point) (mark))

But you need not call it in this way. If you wanted to use this function to write another function called translate-buffer-into-German, you would only need to write the following as a "wrapper":

(defun translate-buffer-into-German ()
  (translate-region-into-German (point-min) (point-max)))
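Since translate-region-into-German is only outlined here, the same pattern can be tried with a function whose body is a single built-in call. This is a sketch under our own naming: my-shout-region and my-shout-buffer are hypothetical, while upcase-region is a standard Emacs function that takes the two region boundaries as arguments.

```elisp
;; Hypothetical region command: START and END come from point and mark
;; when called interactively, or from the caller when used in Lisp code.
(defun my-shout-region (start end)
  "Convert the text between START and END to uppercase."
  (interactive "r")
  (upcase-region start end))

;; Wrapper that applies the region command to the whole buffer.
(defun my-shout-buffer ()
  "Convert the entire buffer to uppercase."
  (interactive)
  (my-shout-region (point-min) (point-max)))
```

Because the region function takes explicit positions, the wrapper never has to move point or set the mark.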

In fact, it is best to avoid using point and mark within Lisp code unless doing so is really necessary; use local variables instead. Try not to write Lisp functions as lists of commands a user would invoke; that sort of behavior is better suited to macros (see Chapter 6).

11.3.2 Regular Expressions

Regular expressions (regexps) provide much more powerful ways of dealing with text. Although most beginning Emacs users tend to avoid commands that use regexps, like replace-regexp and re-search-forward, regular expressions are widely used within Lisp code. Such modes as Dired and the programming language modes would be unthinkable without them. Regular expressions require time and patience to become comfortable with, but doing so is well worth the effort for Lisp programmers, because they are one of the most powerful features of Emacs, and many things are not practical to implement in any other way.

One trick that can be useful when you are experimenting with regular expressions and trying to get the hang of them is to type some text into a scratch buffer that corresponds to what you're trying to match, and then use isearch-forward-regexp (C-M-s) to build up the regular expression. The interactive, immediate feedback of an incremental search can show you the pieces of the regular expression in action in a way that is completely unique to Emacs.

We introduce the various features of regular expressions by way of a few examples of search-and-replace situations; such examples are easy to explain without introducing lots of extraneous details. Afterward, we describe Lisp functions that go beyond simple search-and-replace capabilities with regular expressions. The following are examples of searching and replacing tasks that the normal search/replace commands can't handle or handle poorly:

1. You are developing code in C, and you want to combine the functionality of the functions read and readfile into a new function called get. You want to replace all references to these functions with references to the new one.
2. You are writing a troff document using outline mode, as described in Chapter 7. In outline mode, headers of document sections have lines that start with one or more asterisks. You want to write a function called remove-outline-marks to get rid of these asterisks so that you can run troff on your file.
3. You want to change all occurrences of program in a document, including programs and program's, to module/modules/module's, without changing programming to moduleming or programmer to modulemer.
4. You are working on documentation for some C software that is being rewritten in Java. You want to change all the filenames in the documentation from <filename>.c to <filename>.java, since .java is the extension the javac compiler uses.
5. You just installed a new C++ compiler that prints error messages in German. You want to modify the Emacs compile package so that it can parse the error messages correctly (see the end of Chapter 9).

We will soon show how to use regular expressions to deal with these examples, which we refer to by number. Note that this discussion of regular expressions, although more comprehensive than that in Chapter 3, does not cover every feature; those that it doesn't cover are redundant with other features or relate to concepts that are beyond the scope of this book. It is also important to note that the regular expression syntax described here is for use with Lisp strings only; there is an important difference between the regexp syntax for Lisp strings and the regexp syntax for user commands (like replace-regexp), as we will see.

11.3.2.1 Basic operators

Regular expressions began as an idea in theoretical computer science, but they have found their way into many nooks and crannies of everyday, practical computing. The syntax used to represent them may vary, but the concepts are much the same everywhere. You probably already know a subset of regular expression notation: the wildcard characters used by the Unix shell or Windows command prompt to match filenames. The Emacs notation is a bit different; it is similar to the notations used by the language Perl, by editors like ed and vi, and by Unix software tools like lex and grep. So let's start with the Emacs regular expression operators that resemble Unix shell wildcard characters, which are listed in Table 11-5.

Table 11-5. Basic regular expression operators
Emacs operator	Shell equivalent	Function
`.`	`?`	Matches any single character.
`.*`	`*`	Matches any string.
`[abc]`	`[abc]`	Matches `a`, `b`, or `c`.
`[a-z]`	`[a-z]`	Matches any lowercase letter.

For example, to match all filenames beginning with program in the Unix shell, you would specify program*. In Emacs, you would say program.*. To match all filenames beginning with a through e in the shell, you would use [a-e]* or [abcde]*; in Emacs, it's [a-e].* or [abcde].*. In other words, the dash within the brackets specifies a range of characters.^[6] We will provide more on ranges and bracketed character sets shortly.

^[6] Emacs uses ASCII codes (on most machines) to build ranges, but you shouldn't depend on this fact; it is better to stick to dependable things, like all-lowercase or all-uppercase alphabet subsets or [0-9] for digits, and avoid potentially nonportable items, like [A-z] and ranges involving punctuation characters.

To specify a character that is used as a regular expression operator, you need to precede it with a double backslash, as in \\* to match an asterisk. Why a double backslash? The reason has to do with the way Emacs Lisp reads and decodes strings. When Emacs reads a string in a Lisp program, it decodes the backslash-escaped characters and thus turns double backslashes into single backslashes. If the string is being used as a regular expression (that is, if it is being passed to a function that expects a regular expression argument), that function uses the single backslash as part of the regular expression syntax. For example, given the following line of Lisp:

(replace-regexp "fred\\*" "bob*")

the Lisp interpreter decodes the string fred\\* as fred\* and passes it to the replace-regexp command. The replace-regexp command understands fred\* to mean fred followed by a (literal) asterisk. Notice, however, that the second argument to replace-regexp is not a regular expression, so there is no need to backslash-escape the asterisk in bob* at all. Also notice that if you were to invoke this as a user command, you would not need to double the backslash; that is, you would type M-x replace-regexp Enter followed by fred\* and bob*. Emacs decodes strings read from the minibuffer differently.
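The decoding step is easy to check for yourself by evaluating a few expressions. The lines below are a sketch that uses string-match (covered later in this chapter) and regexp-quote, a built-in function not otherwise discussed here that adds the escapes for you; the sample strings are made up.

```elisp
;; The Lisp reader turns each \\ into one backslash before the regexp
;; engine ever sees the string.
(length "fred\\*")                        ; → 6, the characters f r e d \ *
(string-match "fred\\*" "ask fred* now")  ; → 4, a literal asterisk is required
(string-match "fred\\*" "ask fred now")   ; → nil, there is no asterisk to match
(string-match "fred*" "ask fre now")      ; → 4, unescaped: "fre" plus zero or more d's
(regexp-quote "bob*")                     ; → "bob\\*", escapes added automatically
```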

The * regular expression operator in Emacs (by itself) actually means something different from the * in the Unix shell: it means "zero or more occurrences of whatever is before the *." Thus, because . matches any character, .* means "zero or more occurrences of any character," that is, any string at all, including the empty string. Anything can precede a *: for example, read* matches "rea" followed by zero or more d's; file[0-9]* matches "file" followed by zero or more digits.

Two operators are closely related to *. The first is +, which matches one or more occurrences of whatever precedes it. Thus, read+ matches "read" and "readdddd" but not "rea," and file[0-9]+ requires that there be at least one digit after "file." The second is ?, which matches zero or one occurrence of whatever precedes it (i.e., makes it optional). html? matches "htm" or "html," and file[0-9]? matches "file" followed by one optional digit.
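The differences among *, +, and ? can be checked with a small helper that asks whether a regexp matches an entire string. This is only a sketch: my-matches-whole-p is a made-up name, and it relies on three operators that go beyond this section (\\` and \\' for the start and end of the string, and \\(?: ... \\) for grouping without saving).

```elisp
;; Hypothetical helper: t if REGEXP matches all of STRING, nil otherwise.
(defun my-matches-whole-p (regexp string)
  "Return t if REGEXP matches the whole of STRING."
  (and (string-match (concat "\\`\\(?:" regexp "\\)\\'") string) t))

(my-matches-whole-p "read*" "rea")          ; → t, zero d's is fine for *
(my-matches-whole-p "read+" "rea")          ; → nil, + needs at least one d
(my-matches-whole-p "read+" "readdddd")     ; → t
(my-matches-whole-p "file[0-9]*" "file")    ; → t
(my-matches-whole-p "file[0-9]+" "file")    ; → nil, at least one digit required
(my-matches-whole-p "html?" "htm")          ; → t, the l is optional
(my-matches-whole-p "file[0-9]?" "file12")  ; → nil, only one optional digit
```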

Before we move on to other operators, a few more comments about character sets and ranges are in order. First, you can specify more than one range within a single character set. The set [A-Za-z] can thus be used to specify all alphabetic characters; this is better than the nonportable [A-z]. Combining ranges with lists of characters in sets is also possible; for example, [A-Za-z_] means all alphabetic characters plus underscore, that is, all characters allowed in the names of identifiers in C. If you give ^ as the first character in a set, it acts as a "not" operator; the set matches all characters that aren't the characters after the ^. For example, [^A-Za-z] matches all nonalphabetic characters.

A ^ anywhere other than first in a character set has no special meaning; it's just the caret character. Conversely, - has no special meaning if it is given first in the set; the same is true for ]. Position is the only way to include these characters literally: unlike in many other regular expression dialects, a backslash has no special meaning inside an Emacs character set, so escaping them there does not work. Outside of character sets, a double backslash preceding a nonspecial character usually means just that character, but watch it! A few letters and punctuation characters are used as regular expression operators, some of which are covered in the following section. Later, we list the "booby trap" characters that become operators when double-backslash-escaped. The ^ character has a different meaning when used outside of character sets, as we'll see soon.
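A few quick evaluations illustrate these rules for character sets. They use string-match, described later in this chapter, which returns the index (starting at 0) where the match begins; the sample strings are invented for this sketch.

```elisp
;; Several ranges plus a single character in one set.
(string-match "[A-Za-z_]+" "42 max_len")   ; → 3, the match is "max_len"
;; A leading ^ negates the set.
(string-match "[^A-Za-z]" "abc-def")       ; → 3, the dash
;; Characters that are literal because of their position in the set.
(string-match "[]}]" "f(x)]")              ; → 4, ] given first is literal
(string-match "[-+]" "3-4")                ; → 1, - given first is literal
(string-match "[+^]" "2^8")                ; → 1, ^ not given first is literal
```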

11.3.2.2 Grouping and alternation

If you want to get *, +, or ? to operate on more than one character, you can use the \\( and \\) operators for grouping. Notice that, in this case (and others to follow), the backslashes are part of the operator. (All of the nonbasic regular expression operators include backslashes so as to avoid making too many characters "special." This is the most profound way in which Emacs regular expressions differ from those used in other environments, like Perl, so it's something to which you'll need to pay careful attention.) As we saw before, these characters need to be double-backslash-escaped so that Emacs decodes them properly. If one of the basic operators immediately follows \\), it works on the entire group inside the \\( and \\). For example, \\(read\\)* matches the empty string, "read," "readread," and so on, and read\\(file\\)? matches "read" or "readfile." Now we can handle Example 1, the first of the examples given at the beginning of this section, with the following Lisp code:

(replace-regexp "read\\(file\\)?" "get")

The alternation operator \\| is a "one or the other" operator; it matches either whatever precedes it or whatever comes after it. \\| treats parenthesized groups differently from the basic operators. Instead of requiring parenthesized groups to work with subexpressions of more than one character, its "power" goes out to the left and right as far as possible, until it reaches the beginning or end of the regexp, a \\(, a \\), or another \\|. Some examples should make this clearer:

read\\|get matches "read" or "get"
readfile\\|read\\|get matches "readfile," "read," or "get"
\\(read\\|get\\)file matches "readfile" or "getfile"

In the first example, the effect of the \\| extends to both ends of the regular expression. In the second, the effect of the first \\| extends to the beginning of the regexp on the left and to the second \\| on the right. In the third, it extends to the backslash-parentheses.
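The reach of grouping and alternation can be verified the same way as the basic operators. The sketch below defines a made-up helper, my-matches-whole-p, which wraps the regexp in operators not covered in this section (\\` and \\' for the start and end of the string, and \\(?: ... \\) for grouping without saving) so that only whole-string matches count.

```elisp
;; Hypothetical helper: t if REGEXP matches all of STRING, nil otherwise.
(defun my-matches-whole-p (regexp string)
  "Return t if REGEXP matches the whole of STRING."
  (and (string-match (concat "\\`\\(?:" regexp "\\)\\'") string) t))

(my-matches-whole-p "\\(read\\)*" "readread")         ; → t
(my-matches-whole-p "\\(read\\)*" "")                 ; → t, zero repetitions
(my-matches-whole-p "read\\(file\\)?" "readfile")     ; → t
(my-matches-whole-p "read\\|get" "get")               ; → t
(my-matches-whole-p "\\(read\\|get\\)file" "getfile") ; → t
(my-matches-whole-p "\\(read\\|get\\)file" "read")    ; → nil, "file" is required
;; Without the group, the alternatives are "read" and "getfile".
(my-matches-whole-p "read\\|getfile" "readfile")      ; → nil
```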

11.3.2.3 Context

Another important category of regular expression operators has to do with specifying the context of a string, that is, the text around it. In Chapter 3 we saw the word-search commands, which are invoked as options within incremental search. These are special cases of context specification; in this case, the context is word-separation characters, for example, spaces or punctuation, on both sides of the string.

The simplest context operators for regular expressions are ^ and $, two more basic operators that are used at the beginning and end of regular expressions respectively. The ^ operator causes the rest of the regular expression to match only if it is at the beginning of a line; $ causes the regular expression preceding it to match only if it is at the end of a line. In Example 2, we need a function that matches occurrences of one or more asterisks at the beginning of a line; this will do it:

(defun remove-outline-marks ()
  "Remove section header marks created in outline-mode."
  (interactive)
  (replace-regexp "^\\*+" ""))

This function finds lines that begin with one or more asterisks (the \\* is a literal asterisk and the + means "one or more"), and it replaces the asterisk(s) with the empty string "", thus deleting them.

Note that ^ and $ can't be used in the middle of regular expressions that are intended to match strings that span more than one line. Instead, you can put \n (for Newline) in your regular expressions to match such strings. Another such character you may want to use is \t for Tab. When ^ and $ are used with regular expression searches on strings instead of buffers, they match at the beginning and end of the string, respectively (as well as just after and just before any newline within the string); the function string-match, described later in this chapter, can be used to do regular expression search on strings.
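The context operators can be tried out on strings before committing them to a function that edits a buffer. The expressions below are a sketch with invented sample strings; each string-match call returns the index where the match starts, or nil.

```elisp
(string-match "^\\*+" "** Heading")    ; → 0, asterisks at the start
(match-end 0)                          ; → 2, both asterisks were matched
(string-match "^\\*+" "a * b")         ; → nil, the asterisk is not at the start
(string-match "done$" "all done")      ; → 4, "done" sits at the end
(string-match "one\ntwo" "one\ntwo")   ; → 0, \n matches the newline itself
(string-match "^two" "one\ntwo")       ; → 4, ^ also matches just after a newline
```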

Here is a real-life example of a complex regular expression that covers the operators we have seen so far: sentence-end, a variable Emacs uses to recognize the ends of sentences for sentence motion commands like forward-sentence (M-e). Its value is:

"[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"

Let's look at this piece by piece. The first character set, [.?!], matches a period, question mark, or exclamation mark (the first two of these are regular expression operators, but they have no special meaning within character sets). The next part, []\"')}]*, consists of a character set containing right bracket, double quote, single quote, right parenthesis, and right curly brace. A * follows the set, meaning that zero or more occurrences of any of the characters in the set matches. So far, then, this regexp matches a sentence-ending punctuation mark followed by zero or more ending quotes, parentheses, or curly braces. Next, there is the group \\($\\|\t\\|  \\), which matches any of the three alternatives $ (end of line), Tab, or two spaces. Finally, [ \t\n]* matches zero or more spaces, tabs, or newlines. Thus the sentence-ending characters can be followed by end-of-line or a combination of spaces (at least two), tabs, and newlines.

There are other context operators besides ^ and $; two of them can be used to make regular expression search act like word search. The operators \\< and \\> match the beginning and end of a word, respectively. With these we can go part of the way toward solving Example 3. The regular expression \\<program\\> matches "program" but not "programmer" or "programming" (it also won't match "microprogram"). So far so good; however, it won't match "program's" or "programs." For this, we need a more complex regular expression:

\\<program\\('s\\|s\\)?\\>

This expression means, "a word beginning with program followed optionally by apostrophe s or just s." This does the trick as far as matching the right words goes.
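To confirm which words this regexp picks out, one can collect every match from a test string. The function below is a sketch; my-find-program-words is a made-up name, and it uses the built-in functions string-match (with its optional starting index), match-end, and match-string, which are discussed later in this chapter or go slightly beyond it.

```elisp
;; Hypothetical helper that lists every match of the word regexp in TEXT.
(defun my-find-program-words (text)
  "Return a list of the substrings of TEXT matched by the word regexp."
  (let ((regexp "\\<program\\('s\\|s\\)?\\>")
        (start 0)
        (found '()))
    ;; Search repeatedly, each time starting where the last match ended.
    (while (string-match regexp text start)
      (push (match-string 0 text) found)
      (setq start (match-end 0)))
    (nreverse found)))

(my-find-program-words
 "program programs program's programmer microprogram")
;; → ("program" "programs" "program's")
```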

11.3.2.4 Retrieving portions of matches

There is still one piece missing: the ability to replace "program" with "module" while leaving any s or 's untouched. This leads to the final regular expression feature we will cover here: the ability to retrieve portions of the matched string for later use. The preceding regular expression is indeed the correct one to give as the search string for replace-regexp. As for the replace string, the answer is module\\1; in other words, the required Lisp code is:

(replace-regexp "\\<program\\('s\\|s\\)?\\>" "module\\1")

The \\1 means, in effect, "substitute the portion of the matched string that matched the subexpression inside the \\( and \\)." It is the only regular-expression-related operator that can be used in replacements. In this case, it means to use 's in the replace string if the match was "program's," s if the match was "programs," or nothing if the match was just "program." The result is the correct substitution of "module" for "program," "modules" for "programs," and "module's" for "program's."

Another example of this feature solves Example 4. To match filenames <filename>.c and replace them with <filename>.java, use the Lisp code:

(replace-regexp "\\([a-zA-Z0-9_]+\\)\\.c" "\\1.java")

Remember that \\. means a literal dot (.). Note also that the filename pattern (which matches a series of one or more alphanumerics or underscores) was surrounded by \\( and \\) in the search string for the sole purpose of retrieving it later with \\1.

Actually, the \\1 operator is only a special case of a more powerful facility (as you may have guessed). In general, if you surround a portion of a regular expression with \\( and \\), the string matching the parenthesized subexpression is saved. When you specify the replace string, you can retrieve the saved substrings with \\n, where n is the number of the parenthesized subexpression from left to right, starting with 1. Parenthesized expressions can be nested; their corresponding \\n numbers are assigned in order of their \\( delimiter from left to right.
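The effect of \\1 in a replacement can be checked without touching a real file by running the substitution in a temporary buffer. The sketch below uses a loop of re-search-forward and replace-match, which the Emacs documentation suggests for Lisp programs in place of replace-regexp; the helper name my-replace-in-text and the sample sentences are our own.

```elisp
;; Hypothetical helper: apply a regexp replacement to a string by way of
;; a temporary buffer and return the resulting text.
(defun my-replace-in-text (regexp replacement text)
  "Return a copy of TEXT with each match of REGEXP replaced by REPLACEMENT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (while (re-search-forward regexp nil t)
      ;; A second argument of t keeps the replacement's case as written;
      ;; \\1 in REPLACEMENT is still expanded to the saved subexpression.
      (replace-match replacement t))
    (buffer-string)))

(my-replace-in-text "\\<program\\('s\\|s\\)?\\>" "module\\1"
                    "program programs program's programming")
;; → "module modules module's programming"

(my-replace-in-text "\\([a-zA-Z0-9_]+\\)\\.c" "\\1.java"
                    "see main.c and io_util.c")
;; → "see main.java and io_util.java"
```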

Lisp code that takes full advantage of this feature tends to contain complicated regular expressions. The best example of this in Emacs's own Lisp code is compilation-error-regexp-alist, the list of regular expressions the compile package (discussed in Chapter 9) uses to parse error messages from compilers. Here is an excerpt, adapted from the Emacs source code (it's become much too long to reproduce in its entirety; see below for some hints on how to find the actual file to study in its full glory):

(defvar compilation-error-regexp-alist
  '(
    ;; NOTE!  See also grep-regexp-alist, below.

    ;; 4.3BSD grep, cc, lint pass 1:
    ;;  /usr/src/foo/foo.c(8): warning: w may be used before set
    ;; or GNU utilities:
    ;;  foo.c:8: error message
    ;; or HP-UX 7.0 fc:
    ;;  foo.f          :16    some horrible error message
    ;; or GNU utilities with column (GNAT 1.82):
    ;;   foo.adb:2:1: Unit name does not match file name
    ;; or with column and program name:
    ;;   jade:dbcommon.dsl:133:17:E: missing argument for function call
    ;;
    ;; We'll insist that the number be followed by a colon or closing
    ;; paren, because otherwise this matches just about anything
    ;; containing a number with spaces around it.

    ;; We insist on a non-digit in the file name
    ;; so that we don't mistake the file name for a command name
    ;; and take the line number as the file name.
    ("\\([a-zA-Z][-a-zA-Z._0-9]+: ?\\)?\
\\([a-zA-Z]?:?[^:( \t\n]*[^:( \t\n0-9][^:( \t\n]*\\)[:(][ \t]*\\([0-9]+\\)\
\\([) \t]\\|:\\(\\([0-9]+:\\)\\|[0-9]*[^:0-9]\\)\\)" 2 3 6)

    ;; Microsoft C/C++:
    ;;  keyboard.c(537) : warning C4005: 'min' : macro redefinition
    ;;  d:\tmp\test.c(23) : error C2143: syntax error : missing ';' before 'if'
    ;; This used to be less selective and allow characters other than
    ;; parens around the line number, but that caused confusion for
    ;; GNU-style error messages.
    ;; This used to reject spaces and dashes in file names,
    ;; but they are valid now; so I made it more strict about the error
    ;; message that follows.
    ("\\(\\([a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\)) \
: \\(error\\|warning\\) C[0-9]+:" 1 3)

    ;; Caml compiler:
    ;;  File "foobar.ml", lines 5-8, characters 20-155: blah blah
    ("^File \"\\([^,\" \n\t]+\\)\", lines? \\([0-9]+\\)[-0-9]*, characters? \
\\([0-9]+\\)" 1 2 3)

    ;; Cray C compiler error messages
    ("\\(cc\\| cft\\)-[0-9]+ c\\(c\\|f77\\): ERROR \\([^,\n]+, \\)* File = \
\\([^,\n]+\\), Line = \\([0-9]+\\)" 4 5)

    ;; Perl -w:
    ;; syntax error at automake line 922, near "':'"
    ;; Perl debugging traces
    ;; store::odrecall('File_A', 'x2') called at store.pm line 90
    (".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]" 1 2)

    ;; See http://ant.apache.org/faq.html
    ;; Ant Java: works for jikes
    ("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):\\([0-9]+\\):[0-9]+:[0-9]\
+:" 1 2 3)

    ;; Ant Java: works for javac
    ("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):" 1 2)
    ))

This is a list of elements that have at least three parts each: a regular expression and two numbers. The regular expression matches error messages in the format used by a particular compiler or tool. The first number tells Emacs which of the matched subexpressions contains the filename in the error message; the second number designates which of the subexpressions contains the line number. (There can also be additional parts at the end: a third number giving the position of the column number of the error, if any, and any number of format strings used to generate the true filename from the piece found in the error message, if needed. For more details about these, look at the actual file, as described below.)

For example, the element in the list dealing with Perl contains the regular expression:

".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]"

followed by 1 and 2, meaning that the first parenthesized subexpression contains the filename and the second contains the line number. So if you have Perl's warnings turned on (you always do, of course), you might get an error message such as this:

syntax error at monthly_orders.pl line 1822, near "$"

The regular expression ignores everything up to at. Then it finds monthly_orders.pl, the filename, as the match to the first subexpression "[^ \n]+" (one or more nonblank, nonnewline characters), and it finds 1822, the line number, as the match to the second subexpression "[0-9]+" (one or more digits).
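The same extraction can be reproduced by hand with string-match, match-beginning, and match-end, all of which are described later in this chapter. This is only a sketch of the idea, not how the compile package itself is written; my-parse-perl-error is a made-up name.

```elisp
;; Hypothetical helper that pulls the filename and line number out of a
;; Perl-style message using the regexp from the list element above.
(defun my-parse-perl-error (message-text)
  "Return (FILENAME . LINE-NUMBER) parsed from MESSAGE-TEXT, or nil."
  (when (string-match ".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]"
                      message-text)
    (cons (substring message-text (match-beginning 1) (match-end 1))
          (string-to-number
           (substring message-text (match-beginning 2) (match-end 2))))))

(my-parse-perl-error
 "syntax error at monthly_orders.pl line 1822, near \"$\"")
;; → ("monthly_orders.pl" . 1822)
```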

For the most part, these regular expressions are documented pretty well in their definitions. Understanding them in depth can still be a challenge, and writing them even more so! Suppose we want to tackle Example 5 by adding an element to this list for our new C++ compiler that prints error messages in German. In particular, it prints error messages like this:

Fehler auf Zeile linenum in filename: text of error message

Here is the element we would add to compilation-error-regexp-alist:

("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)

In this case, the second parenthesized subexpression matches the filename, and the first matches the line number.

To add this to compilation-error-regexp-alist, we need to put this line in .emacs:

(setq compilation-error-regexp-alist
      (cons '("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)
            compilation-error-regexp-alist))

Notice how this example resembles our example (from Chapter 9) of adding support for a new language mode to auto-mode-alist.
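Before relying on a new element like this one, it is worth testing the regular expression against a sample message. The sketch below does so with string-match and the built-in match-string; the function name my-parse-german-error and the sample message (including its filename and error text) are invented for illustration.

```elisp
;; Hypothetical helper that applies the German-compiler regexp to a string.
(defun my-parse-german-error (message-text)
  "Return (FILENAME . LINE-NUMBER) parsed from MESSAGE-TEXT, or nil."
  (when (string-match "Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):"
                      message-text)
    ;; Subexpression 2 holds the filename and subexpression 1 the line
    ;; number, which is why the list element ends with 2 1.
    (cons (match-string 2 message-text)
          (string-to-number (match-string 1 message-text)))))

(my-parse-german-error "Fehler auf Zeile 42 in parser.cc: Semikolon erwartet")
;; → ("parser.cc" . 42)
```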

11.3.2.5 Regular expression operator summary

Table 11-6 concludes our discussion of regular expression operators with a reference list of all the operators covered.

Table 11-6. Regular expression operators
Operator	Function
`.`	Match any character.
`*`	Match 0 or more occurrences of preceding char or group.
`+`	Match 1 or more occurrences of preceding char or group.
`?`	Match 0 or 1 occurrences of preceding char or group.
`[...]`	Set of characters; see below.
`\\(`	Begin a group.
`\\)`	End a group.
`\\|`	Match the subexpression before or after `\\|`.
`^`	At beginning of regexp, match beginning of line or string.
`$`	At end of regexp, match end of line or string.
`\n`	Match Newline within a regexp.
`\t`	Match Tab within a regexp.
`\\<`	Match beginning of word.
`\\>`	Match end of word.
The following operators are meaningful within character sets:
`^`	At beginning of set, treat set as chars not to match.
`-` (dash)	Specify range of characters.
The following is also meaningful in regexp replace strings:
`\\n`	Substitute portion of match within the `n`th `\\(` and `\\)`, counting `\\(` delimiters from left to right, starting with 1.

Finally, the following characters are operators (not discussed here) when double-backslash-escaped: b, B, c, C, w, W, s, S, =, _, ', and `. Thus, these are "booby traps" when double-backslash-escaped. Some of these behave similarly to the character class aliases you may have encountered in Perl and Java regular expressions.
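A few evaluations show how differently a letter behaves with and without the double backslash. This is a brief sketch with made-up strings, covering only three of the operators in the list; the results assume the standard syntax table, in which letters are word constituents and Tab is whitespace.

```elisp
(string-match "w+" "a www b")         ; → 2, a plain w is just the letter w
(string-match "\\w+" "  hello")       ; → 2, \\w matches any word-constituent character
(string-match "\\s-+" "tab\there")    ; → 3, \\s- matches whitespace such as Tab
(string-match "\\bcat" "concat cat")  ; → 7, \\b matches only at a word boundary
```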

11.3.3 A Treasure Trove of Examples

As mentioned above, the full compilation-error-regexp-alist has a lot more entries and documentation than fit in this book. The compile.el module in which it is defined also contains functions that use it. One of the best ways to learn how to use Emacs Lisp (as well as discovering things you might not have even realized you can do) is to browse through the implementations of standard modules that are similar to what you're trying to achieve, or that are simply interesting. But how do you find them?

The manual way is to look at the value of the variable load-path. This is the variable Emacs consults when it needs to load a library file itself, so any library you're looking for must be in one of these directories. (This variable is discussed further in the final section of this chapter.) The problem, as you will see if you look at the current value of the variable, is that it contains a large number of directories for you to wade through, which would be pretty tedious each time you're curious about a library. (An easy way to see the variable's value is through Help's "Describe variable" feature, C-h v.)

One of the authors wrote the command listed in Example 11-1 to address this problem and uses it regularly to easily snoop on the source files that make much of Emacs run. If you don't want to type this entire function into your .emacs by hand, you can download it from this book's web site, http://www.oreilly.com/catalog/gnu3.

Example 11-1. find-library-file

(defun find-library-file (library)
  "Takes a single argument LIBRARY, being a library file to search for.
Searches for LIBRARY directly (in case relative to current directory,
or absolute) and then searches directories in load-path in order.  It
will test LIBRARY with no added extension, then with .el, and finally
with .elc.  If a file is found in the search, it is visited.  If none
is found, an error is signaled.  Note that order of extension searching
is reversed from that of the load function."
  (interactive "sFind library file: ")
  (let ((path (cons "" load-path)) exact match elc test found)
    (while (and (not match) path)
      (setq test (concat (car path) "/" library)
            match (if (condition-case nil
                          (file-readable-p test)
                        (error nil))
                      test)
            path (cdr path)))
    (setq path (cons "" load-path))
    (or match
        (while (and (not elc) path)
          (setq test (concat (car path) "/" library ".elc")
                elc (if (condition-case nil
                            (file-readable-p test)
                          (error nil))
                        test)
                path (cdr path))))
    (setq path (cons "" load-path))
    (while (and (not match) path)
      (setq test (concat (car path) "/" library ".el")
            match (if (condition-case nil
                          (file-readable-p test)
                        (error nil))
                      test)
            path (cdr path)))
    (setq found (or match elc))
    (if found
        (progn
          (find-file found)
          (and match elc
               (message "(library file %s exists)" elc)
               (sit-for 1))
          (message "Found library file %s" found))
      (error "Library file \"%s\" not found." library))))

Once this command is defined, you can visit any library's implementation by typing M-x find-library-file Enter libraryname Enter. If you use it as often as this author does, you too may find it worth binding to a key sequence. We won't present a detailed discussion of how this function works because it goes a bit deeper than this chapter, but if you're curious about what some of the functions do, you can put your cursor in the function name in a Lisp buffer and use the Help system's "Describe function" (C-h f) feature to get more information about it.

If you find that most of the time when you ask for a library, you end up with a file containing a lot of cryptic numeric codes and no comments, check if the filename ends in .elc. If that is usually what you end up with, it means that only the byte-compiled versions of the libraries (see the discussion at the end of this chapter) have been installed on your system. Ask your system administrator if you can get the source installed; that's an important part of being able to learn and tweak the Emacs Lisp environment.
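One quick way to check which kind of file Emacs would load for a given library is the built-in function locate-library, which searches load-path and returns the full filename (or nil if nothing is found). The helper below is only a sketch; my-library-file-kind is a made-up name, and the answer you get depends on how Emacs was installed on your system.

```elisp
;; Hypothetical helper that classifies the file found for LIBRARY.
(defun my-library-file-kind (library)
  "Return a symbol describing the file Emacs finds for LIBRARY on load-path."
  (let ((file (locate-library library)))
    (cond ((null file) 'not-found)
          ;; $ matches at the end of the string, so this tests the extension.
          ((string-match "\\.elc$" file) 'byte-compiled)
          (t 'source-or-other))))

(my-library-file-kind "compile")
```

A result of byte-compiled for a library whose source you would like to read suggests that the .el file may need to be located or installed separately.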

11.3.4 Functions That Use Regular Expressions

The functions re-search-forward, re-search-backward, replace-regexp, query-replace-regexp, highlight-regexp, isearch-forward-regexp, and isearch-backward-regexp are all user commands that use regular expressions, and they can all be used within Lisp code (though it is hard to imagine incremental search being used within Lisp code). The section on customizing major modes later in this chapter contains an example function that uses re-search-forward. To find other commands that use regexps you can use the "apropos" help feature (C-h a regexp Enter).
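As a small illustration of calling one of these commands from Lisp, here is a minimal sketch that counts the matches of a regular expression. The name my-count-regexp-matches is our own invention, not a built-in function, and the sketch uses the with-temp-buffer macro (not covered so far) to get a scratch buffer to search in. It assumes the regexp never matches the empty string; otherwise the loop would not advance.

```elisp
;; Hypothetical helper, not part of Emacs: count the matches of
;; REGEXP in STRING by searching a temporary buffer.
(defun my-count-regexp-matches (regexp string)
  "Return the number of non-overlapping matches of REGEXP in STRING.
Assumes REGEXP never matches the empty string."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let ((count 0))
      ;; A non-nil third argument makes re-search-forward return nil
      ;; when nothing more is found, instead of signaling an error.
      (while (re-search-forward regexp nil t)
        (setq count (1+ count)))
      count)))

(my-count-regexp-matches "[0-9]+" "lines 12, 40 and 7")   ; returns 3
```

Each successful search leaves point just after the match, which is what lets the while loop walk through the buffer.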

Other such functions aren't available as user commands. Perhaps the most widely used one is looking-at. This function takes a regular expression argument and does the following: it returns t if the text after point matches the regular expression (nil otherwise); if there was a match, it saves the pieces surrounded by \\( and \\) for future use, as seen earlier. The function string-match is similar: it takes two arguments, a regexp and a string. It returns the starting index of the portion of the string that matches the regexp, or nil if there is no match.
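The difference between the two functions is easiest to see side by side. In the sketch below, my-text-starts-with-regexp-p is a made-up helper (not a built-in) that inserts a string into a temporary buffer so that looking-at has some text after point to examine; string-match needs no buffer at all.

```elisp
;; Hypothetical helper, not part of Emacs: test whether the beginning
;; of STRING matches REGEXP by using looking-at in a scratch buffer.
(defun my-text-starts-with-regexp-p (regexp string)
  "Return t if the text at the start of STRING matches REGEXP."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))          ; looking-at examines text after point
    (looking-at regexp)))

(my-text-starts-with-regexp-p "[a-z]+\\.c" "foo.c:12: warning")   ; returns t
(my-text-starts-with-regexp-p "[0-9]+" "foo.c:12: warning")       ; returns nil

;; string-match searches the whole string and returns a 0-based index.
(string-match "[0-9]+" "error on line 42")                        ; returns 14
(string-match "[0-9]+" "no digits here")                          ; returns nil
```

Note that looking-at only succeeds if the match begins exactly at point, whereas string-match finds the first match anywhere in the string.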

The functions match-beginning and match-end can be used to retrieve the saved portions of the matched string. Each takes as an argument the number of the matched expression (as in \\n in replace-regexp replace strings) and returns the character position in the buffer that marks the beginning (for match-beginning) or end (for match-end) of the matched string. With the argument 0, the character position that marks the beginning/end of the entire string matched by the regular expression is returned.
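To make the numbering concrete, the following sketch matches a compiler-style message against a regexp with two parenthesized groups and collects the saved positions. The function name my-match-positions and the sample text are assumptions made up for this illustration.

```elisp
;; Hypothetical helper, not part of Emacs: report where the whole match
;; and the first two groups begin and end.
(defun my-match-positions (regexp string)
  "Return positions of the whole match and of groups 1 and 2.
STRING is placed in a temporary buffer and REGEXP is matched at its
start.  Return nil if REGEXP does not match there."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (when (looking-at regexp)
      (list (match-beginning 0) (match-end 0)      ; entire match
            (match-beginning 1) (match-end 1)      ; first group
            (match-beginning 2) (match-end 2)))))  ; second group

(my-match-positions "\\([^:]+\\):\\([0-9]+\\)" "foo.c:12: warning")
;; returns (1 9 1 6 7 9)
```

Buffer positions start at 1, and match-end returns the position just after the last matched character, so the filename foo.c runs from 1 to 6 and the line number 12 from 7 to 9.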

Two more functions are needed to make the above useful: we need to know how to convert the text in a buffer to a string. No problem: buffer-string returns the entire buffer as a string; buffer-substring takes two integer arguments, marking the beginning and end positions of the substring desired, and returns the substring.
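A quick sketch of both functions, again using a temporary buffer so that the result does not depend on whatever buffer you happen to be in (my-buffer-text-demo is a made-up name for this illustration):

```elisp
;; Hypothetical demonstration function, not part of Emacs.
(defun my-buffer-text-demo ()
  "Return a list of the whole buffer text and its first five characters."
  (with-temp-buffer
    (insert "hello, world")
    (list (buffer-string)             ; the entire buffer
          (buffer-substring 1 6))))   ; positions 1 up to, not including, 6

(my-buffer-text-demo)   ; returns ("hello, world" "hello")
```

As with match-end, the second argument of buffer-substring is the position just after the last character you want.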

With these functions, we can write a bit of Lisp code that returns a string containing the portion of the buffer that matches the nth parenthesized subexpression:

(buffer-substring (match-beginning n) (match-end n))

In fact, this construct is used so often that Emacs has a built-in function, match-string, that acts as a shorthand; (match-string n) returns the same result as in the previous example.
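The following sketch shows the two forms producing the same text after a successful looking-at. The function name and the sample message are our own assumptions for illustration.

```elisp
;; Hypothetical demonstration function, not part of Emacs: extract the
;; second group both the long way and with match-string.
(defun my-second-group-both-ways (regexp string)
  "Return a list of group 2 of REGEXP in STRING, obtained two ways."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (when (looking-at regexp)
      (list (buffer-substring (match-beginning 2) (match-end 2))
            (match-string 2)))))

(my-second-group-both-ways "\\([^:]+\\):\\([0-9]+\\)" "foo.c:12: warning")
;; returns ("12" "12")
```

One detail worth knowing: after string-match (as opposed to a buffer search), match-string needs the searched string as a second argument, as in (match-string 2 line), because the saved positions then refer to the string rather than to the buffer.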

An example should show how this capability works. Assume you are writing the Lisp code that parses compiler error messages, as in our previous example. Your code goes through each element in compilation-error-regexp-alist, checking if the text in a buffer matches the regular expression. If it matches, your code needs to extract the filename and the line number, visit the file, and go to the line number.

Although the code for going down each element in the list is beyond what we have learned so far, the routine basically looks like this:

for each element in compilation-error-regexp-alist
  (let ((regexp the regexp in the element)
        (file-subexp the number of the filename subexpression)
        (line-subexp the number of the line number subexpression))
    (if (looking-at regexp)
        (let ((filename (match-string file-subexp))
              (linenum (string-to-number (match-string line-subexp))))
          (find-file-other-window filename)
          (goto-line linenum))
      (otherwise, try the next element in the list)))

The second let extracts the filename as the text matched by the file-subexp-th subexpression, and it extracts the line number similarly from the line-subexp-th subexpression (and converts it from a string to a number). Then the code visits the file (in another window, not the same one as the error message buffer) and goes to the line number where the error occurred.
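For readers who want to see the loop spelled out, here is one possible runnable version, written as a sketch under several assumptions. The variable my-error-regexp-alist is a simplified stand-in that we made up; the real compilation-error-regexp-alist has a more elaborate format. The function names are likewise hypothetical. We also use forward-line rather than goto-line to move to the line, since goto-line is meant mainly for interactive use.

```elisp
;; Hypothetical, simplified stand-in for compilation-error-regexp-alist:
;; each element is (REGEXP FILE-SUBEXP LINE-SUBEXP).
(defvar my-error-regexp-alist
  '(("\\([^ :\n]+\\):\\([0-9]+\\):" 1 2)               ; foo.c:12: message
    ("\"\\([^\"\n]+\\)\", line \\([0-9]+\\)" 1 2)))    ; "foo.c", line 12

(defun my-parse-error-at-point ()
  "Return (FILENAME . LINE) for the error message at point, or nil."
  (let ((alist my-error-regexp-alist)
        result)
    ;; Walk down the list until an element matches or the list runs out.
    (while (and alist (not result))
      (let* ((element (car alist))
             (regexp (nth 0 element))
             (file-subexp (nth 1 element))
             (line-subexp (nth 2 element)))
        (when (looking-at regexp)
          (setq result
                (cons (match-string file-subexp)
                      (string-to-number (match-string line-subexp))))))
      (setq alist (cdr alist)))
    result))

(defun my-visit-error-at-point ()
  "Visit the file and line named by the error message at point."
  (interactive)
  (let ((where (my-parse-error-at-point)))
    (if (null where)
        (error "No recognizable error message at point")
      (find-file-other-window (car where))
      (goto-char (point-min))
      (forward-line (1- (cdr where))))))
```

Separating the parsing from the visiting keeps the regexp logic easy to try out on its own: with point at the start of a line such as foo.c:12: warning, my-parse-error-at-point returns ("foo.c" . 12) without touching any files.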

The code for the calculator mode later in this chapter contains a few other examples of looking-at, match-beginning, and match-end.

11.3.5 Finding Other Built-in Functions

Emacs contains hundreds of built-in functions that may be of use to you in writing Lisp code. Yet finding which one to use for a given purpose is not so hard.

The first thing to realize is that you will often need to use functions that are already accessible as keyboard commands. You can use these by finding out what their function names are via the C-h k (for describe-key) command (see Chapter 14). This gives the command's full documentation, as opposed to C-h c (for describe-key-briefly), which gives only the command's name. Be careful: in a few cases, some common keyboard commands require an argument when used as Lisp functions. An example is forward-word; to get the equivalent of typing M-f, you have to use (forward-word 1).

Another powerful tool for getting the right function for the job is the command-apropos (C-h a) help function. Given a regular expression, this help function searches for all commands that match it and displays their key bindings (if any) and documentation in a *Help* window. This can be a great help if you are trying to find a command that does a certain "basic" thing. For example, if you want to know about commands that operate on words, type C-h a followed by word, and you will see documentation on about a dozen and a half commands having to do with words.

The limitation with command-apropos is that it gives information only on functions that can be used as keyboard commands. Even more powerful is apropos, which is not accessible via any of the help keys (you must type M-x apropos Enter). Given a regular expression, apropos displays all functions, variables, and other symbols that match it. Be warned, though: apropos can take a long time to run and can generate very long lists if you use it with a general enough concept (such as buffer).
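The same kind of lookup can be done from Lisp code. The sketch below relies on apropos-internal, the primitive that returns a list of symbols whose names match a regexp, optionally filtered by a predicate; the wrapper my-command-names-matching is a made-up name for this illustration.

```elisp
;; Hypothetical helper, not part of Emacs: list the names of all
;; commands (interactive functions) whose names match REGEXP.
(defun my-command-names-matching (regexp)
  "Return a list of names of commands whose names match REGEXP."
  (mapcar #'symbol-name
          (apropos-internal regexp #'commandp)))

;; Commands only, like the command-oriented help function:
(my-command-names-matching "word")

;; Every matching symbol (functions, variables, and so on), like apropos:
(apropos-internal "word")
```

The exact lists depend on your Emacs version and on which libraries are loaded, so we don't show the output here; evaluating the forms in a *scratch* buffer is the easiest way to see what your own Emacs provides.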

You should be able to use the apropos commands on a small number of well-chosen keywords and find the function(s) you need. If a function seems general and basic enough, the chances are excellent that Emacs has it built in.

After you find the function you are interested in, you may find that the documentation that apropos prints does not give you enough information about what the function does, what its arguments are, or how to use it. The best thing to do at this point is to search Emacs's Lisp source code for examples of the function's use. "A Treasure Trove of Examples" earlier in this chapter provides ways of finding out the names of directories Emacs loads libraries from and an easy way of looking at a library once you know its name. To search the contents of the library files, you'll need to use grep or some other search facility to find examples, then edit the files found to look at the surrounding context. If you're ambitious, you could put together the examples and concepts we've discussed so far to write an extension of the find-library-file command that searches the contents of the library files in each directory on the load path! Although most of Emacs's built-in Lisp code is not profusely documented, the examples of function use that it provides should be helpful and may even give you ideas for your own functions.

By now, you should have a framework of Emacs Lisp that should be sufficient for writing many useful Emacs commands. We have covered examples of various kinds of functions, both Lisp primitives and built-in Emacs functions. You should be able to extrapolate many others from the ones given in this chapter along with help techniques such as those just provided. In other words, you are well on your way to becoming a fluent Emacs Lisp programmer. To test yourself, start with the code for count-words-buffer and try writing the following functions:

count-lines-buffer: Print the number of lines in the buffer.
count-words-region: Print the number of words in a region.
what-line: Print the number of the line point is currently on.