9.3. C and C++ Support

Emacs automatically enters C mode when you visit a file whose suffix is .c, .h, .y (for yacc grammars), or .lex (lex specification files). Emacs invokes C++ mode when you visit a file whose suffix is .C, .H, .cc, .hh, .cpp, .cxx, .hxx, .c++, or .h++. You can also put any file in C mode manually by typing M-x c-mode Enter. Similarly, you can use c++-mode to put a buffer into C++ mode.

Both C and C++ modes are implemented in the same Emacs Lisp package, called cc-mode,^[7] which also includes a mode for the Objective-C language used in Mac OS X. C mode understands both ANSI C and the older Kernighan and Ritchie C syntax. We describe C mode functions, but you should assume that everything also applies to C++ mode. C++ mode has a small number of additional features, which we describe at the end of this section.

^[7] We know! There is no M-x cc-mode. It can be confusing. Just try to remember that the modes are named directly after the language they support.

We should also note that the Emacs mode for Perl is derived from an older version of C mode. If you program in Perl, you will find that virtually all of the motion, indentation, and formatting commands in C mode apply equally to Perl mode, with perl- replacing c- in their names. Emacs invokes Perl mode on files with suffix .pl. (However, to be honest we prefer CPerl mode, discussed later in this chapter.)

In C mode, Emacs understands the syntax elements described earlier in this chapter. The semicolon (;), colon (:), comma (,), curly braces ({ and }), and pound sign (#, for C preprocessor directives) are all electric, meaning that Emacs automatically indents the current line when you type them. C mode also supports syntax highlighting when you have font-lock mode turned on.

9.3.1 Motion Commands

In addition to the standard Emacs commands for words and sentences (which are mainly useful only inside multiline comments), C mode contains advanced commands that know about statements, functions,^[8] and preprocessor conditionals. A summary of these commands appears in Table 9-3.

^[8] The function commands have "defun" in their names because they are actually adaptations of analogous commands in Lisp mode; a defun is a function definition in Lisp.

Table 9-3. Advanced C motion commands
Keystrokes	Command name	Action
M-a	c-beginning-of-statement	Move to the beginning of the current statement.
M-e	c-end-of-statement	Move to the end of the current statement.
M-q	c-fill-paragraph	If in comment, fill the paragraph, preserving indentations and decorations.
C-M-a	beginning-of-defun	Move to the beginning of the body of the function surrounding the point.
C-M-e	end-of-defun	Move to the end of the function.
C-M-h	c-mark-function	Put the cursor at the beginning of the function, the mark at the end.
C-c C-q	c-indent-defun	Indent the entire function according to indentation style.
C-c C-u	c-up-conditional	Move to the beginning of the current preprocessor conditional.
C-c C-p	c-backward-conditional	Move to the previous preprocessor conditional.
C-c C-n	c-forward-conditional	Move to the next preprocessor conditional.

Notice that the statement motion commands have the same key bindings as backward-sentence and forward-sentence, respectively. In fact, they act as sentence commands if you use them within a C comment.

Similarly, M-q is normally the fill-paragraph command; C mode augments it with the ability to preserve indentations and decorative characters at the beginnings of lines. For example, if your cursor is anywhere in this comment:

/* This is
 * a
 * comment paragraph with wildly differing right
 *  margins.
 * It goes on     for a while,
 * then stops.  */

typing M-q has this result:

/* This is a comment paragraph with wildly differing right margins.
 * It goes on for a while, then stops. */

You will find that the preprocessor conditional motion commands are a godsend if you have to slog through someone else's voluminous code. Especially when you face code built to run on a variety of systems (like Emacs itself), the most important question you need answered is often, "What code is actually compiled?"

With C-c C-u, you can tell instantly what preprocessor conditional governs the code in question. Consider this code block:

#define LUCYX
#define BADEXIT -1
#ifdef LUCYX
    ...
    *ptyv = open ("/dev/ptc", O_RDWR | O_NDELAY, 0);
    if (*ptyv < 0)
        return BADEXIT;
    ...
#else
    ...
    fprintf (stderr, "You can't do that on this system!");
    ...
#endif

Imagine that the ellipses (...) represent hundreds of lines of code. Now suppose you are trying to determine under what conditions the file /dev/ptc is opened. If your cursor is on that line of code, you can type C-c C-u, and the cursor moves to the line #ifdef LUCYX, telling you that the code is compiled only when LUCYX is defined, that is, on a LUCYX system. If you want to skip the code that would not be compiled and go directly to the end of the conditional, type C-c C-n. We will see another command that is useful for dealing with C preprocessor code later in this section.
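To make the idea behind C-c C-u concrete, here is a rough sketch in C of the kind of backward scan it performs: walk upward from the current line, treating each #endif as the end of a nested conditional to skip, until reaching an #if, #ifdef, or #ifndef at the same nesting level. We also stop at an #else or #elif at that level, since it is the nearest directive governing the code. The function names up_conditional and is_directive are hypothetical; this is not cc-mode's code, and it ignores comments, string literals, and backslash-continued lines.

```c
#include <ctype.h>
#include <string.h>

/* Return nonzero if LINE is the preprocessor directive NAME, allowing
   blanks before and after the '#'.  "if" does not match "#ifdef". */
static int is_directive(const char *line, const char *name)
{
    size_t len = strlen(name);

    while (isspace((unsigned char)*line))
        line++;
    if (*line != '#')
        return 0;
    line++;
    while (isspace((unsigned char)*line))
        line++;
    return strncmp(line, name, len) == 0
           && !isalnum((unsigned char)line[len]) && line[len] != '_';
}

/* Hypothetical helper: scan backward from the line just above CUR and
   return the index of the directive that governs line CUR, or -1 if
   CUR is not inside any conditional. */
int up_conditional(const char *lines[], int cur)
{
    int depth = 0;  /* nested conditionals (closed below us) still to skip */

    for (int i = cur - 1; i >= 0; i--) {
        const char *l = lines[i];

        if (is_directive(l, "endif")) {
            depth++;                   /* entering a nested conditional from its end */
        } else if (is_directive(l, "if") || is_directive(l, "ifdef")
                   || is_directive(l, "ifndef")) {
            if (depth == 0)
                return i;              /* the opener that encloses CUR */
            depth--;                   /* finished skipping a nested conditional */
        } else if (depth == 0
                   && (is_directive(l, "else") || is_directive(l, "elif"))) {
            return i;                  /* nearest branch directive at our level */
        }
    }
    return -1;
}
```

With the LUCYX fragment stored one line per array element, calling up_conditional on the index of the open() line returns the index of #ifdef LUCYX, the line C-c C-u jumps to.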

C statement and statement block delimiter characters are bound to commands that, in addition to inserting the appropriate character, also provide proper indentation. These characters are {, }, ;, and : (for labels and switch cases). For example, if you are closing out a statement block or function body, you can press C-j (or Enter) and type }, and Emacs lines it up with its matching {. This eliminates the need for you to scroll back through the code to find out what column the { is in.

Because } is a parenthesis-type character, Emacs attempts to "flash" a matching { when you type }. If the matching { is outside of the text displayed in your window, Emacs instead prints the line containing the { in the minibuffer. Furthermore, if only whitespace (blanks or tabs) follows the { on its line, Emacs also prints a ^J (for C-j) followed by the next line, thus giving a better idea of the context of the {.

Recall the "times" example earlier in this chapter. Let's say you are typing a } to end the function, and the { that begins the function body is off-screen. Nothing follows the { on its line, so after you type } you see the following in the minibuffer:

Matches {^J  int i;
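The matching behavior described above boils down to two checks, sketched here in C under simplifying assumptions: find the { that pairs with a } by scanning backward with a depth counter, then test whether only blanks or tabs follow that { on its line (the case in which Emacs also shows ^J and the next line). The function names are hypothetical, and unlike Emacs, which consults the mode's syntax table, this sketch does not skip braces inside string literals, character constants, or comments.

```c
#include <stddef.h>

/* Hypothetical helper: given TEXT and the index CLOSE of a '}', scan
   backward with a depth counter and return the index of the matching
   '{', or -1 if there is none. */
long find_matching_open(const char *text, size_t close)
{
    int depth = 0;

    for (size_t i = close + 1; i-- > 0; ) {
        if (text[i] == '}')
            depth++;                   /* the '}' at CLOSE counts as depth 1 */
        else if (text[i] == '{' && --depth == 0)
            return (long)i;            /* back at the level we started from */
    }
    return -1;
}

/* Nonzero if only blanks or tabs follow position POS on its line. */
int only_whitespace_follows(const char *text, size_t pos)
{
    for (size_t i = pos + 1; text[i] != '\0' && text[i] != '\n'; i++)
        if (text[i] != ' ' && text[i] != '\t')
            return 0;
    return 1;
}
```

For a body like "{\n  int i;\n}", the first function finds the { at index 0 and the second reports that nothing follows it on its line, so a message in the style of "Matches {^J  int i;" would include the next line.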

9.3.2 Customizing Code Indentation Style

Coding style in C (or any programming language, for that matter) is a very personal thing. C programmers learn from various books or by studying other people's code; eventually they evolve a personal style that may or may not conform to the styles they learned from.

C mode provides a rich set of features for customizing its indentation behavior that mirrors this way of learning the language. At the simplest level, you can choose a coding style by name. Then, if you're not satisfied, you can customize your chosen style or even create your own from scratch. The latter tasks, however, require a fair amount of Emacs Lisp programming knowledge (see Chapter 11) and perhaps a bit of bravery.

You can choose a named coding style with the command M-x c-set-style. This command prompts you for the name of the style you want. The easiest thing to do at this point is to type Tab, the completion character (see Chapter 14), which brings up a *Completions* window that lists all of the choices. Type one of them and press Enter to select it.

By default, Emacs comes loaded with the styles shown in Table 9-4.

Table 9-4. Built-in cc-mode indentation styles
Style	Description
bsd	Style used in code for BSD-derived versions of Unix.
cc-mode	The default coding style, from which all others are derived.
ellemtel	Style used in C++ documentation from Ellemtel Telecommunication Systems Laboratories in Sweden.
gnu	Style used in C code for Emacs itself and other GNU-related programs.
java	Style used in Java code (the default for Java mode).
k&r	Style of the classic text on C, Kernighan and Ritchie's The C Programming Language.
linux	Style used in C code that is part of the Linux kernel.
python	Style used in C code for Python extensions.
stroustrup	C++ coding style of the standard reference work, Bjarne Stroustrup's The C++ Programming Language.
user	Customizations you make to .emacs or via Custom (see Chapter 10). All other styles inherit these customizations if you set them.
whitesmith	Style used in Whitesmith Ltd.'s documentation for their C and C++ compilers.

To show how some of these styles work, let's start with the C function example from earlier in this chapter:

int times (x, y)
int x, y;
{
int i;
int result = 0;
for (i = 0; i < x; i++)
{
result += y;
}
}

If you define a region around this code and you type C-M-\ (for indent-region), Emacs reformats the code in the default style like this:

int times (x, y)
    int x, y;
{
    int i;
    int result = 0;
    for (i = 0; i < x; i++)
        {
            result += y;
        }
}

If you type C-c . (for c-set-style), enter k&r, and then repeat the reformatting, the code looks like this:

int times (x, y)
int x, y;
{
     int i;
     int result = 0;
     for (i = 0; i < x; i++)
     {
          result += y;
     }
}

Or, if you want to switch to GNU-style indentation, choose the style gnu and reformat. The result is:

int times (x, y)
     int x, y;
{
  int i;
  int result = 0;
  for (i = 0; i < x; i++)
    {
      result += y;
    }
}
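For comparison with the K&R-style fragment above, which declares its parameter types after the parameter list and never returns result, here is a sketch of the same function in ANSI C, laid out in the gnu style just shown. Putting the function name in column one, on a line of its own, follows the GNU Coding Standards; adding the return statement is our change, not part of the original example.

```c
/* ANSI C version of times(): parameter types appear in the prototype,
   and the accumulated product is returned to the caller. */
int
times (int x, int y)
{
  int i;
  int result = 0;

  for (i = 0; i < x; i++)
    {
      result += y;   /* add y once per iteration, x times in all */
    }
  return result;
}
```

Since C mode understands both ANSI and K&R syntax, you can reindent this version with C-M-\ under any of the styles in Table 9-4 and compare the results.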

Once you decide on a coding style, you can set it up permanently by putting a line in your .emacs file that looks like this:

(add-hook 'c-mode-hook
          '(lambda ()
             (c-set-style "stylename")))

Unfortunately, we'll have to wait until Chapter 11 to understand exactly what this code does. For now, make sure that you insert a single quote (') before the (lambda in the second line.

Each coding style contains subtleties that make it nontrivial for Emacs to implement. Older versions of Emacs handled this by defining several variables that controlled various indentation levels; these were not easy to work with and, frankly, did not cover 100 percent of the nuances of each style. The current version of C mode, in contrast, uses a considerably larger set of variables, too large, in fact, for anyone other than hardy Emacs Lisp hackers to deal with directly.

C mode therefore keeps track of groups of these variables and their values under named styles. One huge variable, called c-style-alist, contains all of the styles and their associated information. You can customize this beast either by changing the values of variables within existing styles or by adding a style of your own. For further details, look in the file cc-mode.el and its companion files (the style definitions are in cc-styles.el) in your system's Emacs Lisp directory (see Chapter 11).

9.3.3 Additional C and C++ Mode Features

C mode contains a number of other useful features, ranging from the generally useful to the arcanely obscure. Perhaps the most interesting of these are two ways of adding additional electric functionality to certain keystrokes, called auto-newline and hungry-delete-key.^[9]

^[9] These emulate electric-c-mode in the old Gosling Emacs.

When auto-newline is enabled, it causes Emacs to add a newline character and indent the new line properly whenever you type a semicolon (;), curly brace ({ or }), or, at certain times, comma (,) or colon (:). These features can save you some time and help you format your code in a consistent style.

Auto-newline is off by default. To turn it on, type C-c C-a for c-toggle-auto-state. (Repeat the same command to turn it off again.) You will see the (C) in the mode line change to (C/a) as an indication. As an example of how it works, try typing in the code for our times() function. Type the first two lines up to the y on the second line:

int times (x, y)
int x, y

Now press the semicolon; notice that Emacs inserts a newline and brings you down to the next line:

int times (x, y)
int x, y;

Type the opening curly brace, and it happens again:

int times (x, y)
int x, y;
{

Of course, the number of spaces Emacs indents after you type the { depends on the indentation style you are using.

The other optional electric feature, hungry-delete-key, is also off by default. To toggle it on, type C-c C-d (for c-toggle-hungry-state). You will see the (C) on the mode line change to (C/h), or if you have auto-newline turned on, from (C/a) to (C/ah).

Turning on hungry-delete-key empowers the Del key to delete all whitespace to the left of the point. To go back to the previous example, assume you just typed the open curly brace. Then, if you press Del, Emacs deletes everything back to the curly brace:

int times (x, y)
int x, y;
{
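Hungry deletion is easy to picture as an operation on a text buffer. The sketch below uses a hypothetical function name and a plain character array standing in for an Emacs buffer: it deletes every blank, tab, and newline before point, and, as our assumption about the non-hungry case, falls back to deleting a single character when there is no whitespace to eat.

```c
#include <string.h>

/* Hypothetical helper: remove every blank, tab, and newline immediately
   before POINT in the NUL-terminated BUF, shifting the rest of the text
   left.  Returns the new position of point. */
size_t hungry_delete_backward(char *buf, size_t point)
{
    size_t start = point;

    while (start > 0 && (buf[start - 1] == ' ' || buf[start - 1] == '\t'
                         || buf[start - 1] == '\n'))
        start--;
    if (start == point && point > 0)
        start = point - 1;             /* no whitespace: delete one character */
    /* Close the gap, including the terminating NUL. */
    memmove(buf + start, buf + point, strlen(buf + point) + 1);
    return start;
}
```

Applied to "{\n    x" with point just before the x, the function leaves "{x" and returns 1, mirroring how Del removes the newline and indentation back to the curly brace.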

You can toggle the states of both auto-newline and hungry-delete-key with the command C-c C-t (for c-toggle-auto-hungry-state).

If you want either of these features on by default when you invoke Emacs, you can put lines like the following in your .emacs file:

(add-hook 'c-mode-hook
          '(lambda ()
             (c-toggle-auto-state)))

If you want to combine this customization with another C mode customization, such as the indentation style in the previous example, you need to combine the lines of Emacs Lisp code as follows:

(add-hook 'c-mode-hook
          '(lambda ()
             (c-set-style "stylename")
             (c-toggle-auto-state)))

Again, we will see what this hook construct means in "Customizing Existing Modes" in Chapter 11.

C mode also provides support for comments; earlier in the chapter, we saw examples of this support. There is, however, another feature. You can customize M-j (for indent-new-comment-line) so that Emacs continues the same comment on the next line instead of creating a new pair of delimiters. The variable comment-multi-line controls this feature: if it is set to nil (the default), Emacs generates a new comment on the next line, as in the example from earlier in the chapter:

result += y;                    /* add the multiplicand */
                                /* */

This outcome is the result of typing M-j after multiplicand, and it shows that the cursor is positioned so that you can type the text of the second comment line. However, if you set comment-multi-line to t (or any value other than nil), you get this outcome instead:

result += y;                    /* add the multiplicand
                                    */

The final feature we'll cover is C-c C-e (for c-macro-expand). Like the conditional compilation motion commands (e.g., C-c C-u for c-up-conditional), c-macro-expand helps you answer the often difficult question, "What code actually gets compiled?" when your source code contains a morass of preprocessor directives.

To use c-macro-expand, you must first define a region. Then, when you type C-c C-e, it takes the code within the region, passes it through the actual C preprocessor, and places the output in a window called *Macroexpansion*.

To see how this procedure works, let's go back to the code example from earlier in this chapter that contains C preprocessor directives:

#define LUCYX
#define BADEXIT -1
#ifdef LUCYX
    *ptyv = open ("/dev/ptc", O_RDWR | O_NDELAY, 0);
    if (*ptyv < 0)
        return BADEXIT;
#else
    fprintf (stderr, "You can't do that on this system!");
#endif

If you define a region around this chunk of code and type C-c C-e, you see the following message:

Invoking /lib/cpp -C on region...

followed by this:

done

Then you see a *Macroexpansion* window that contains this result:

    *ptyv = open ("/dev/ptc", O_RDWR | O_NDELAY, 0);
    if (*ptyv < 0)
        return -1;

If you want to use c-macro-expand with a different C preprocessor command, instead of the default /lib/cpp -C (the -C option means "preserve comments in the output"), you can set the variable c-macro-preprocessor. For example, if you want to use an experimental preprocessor whose filename is /usr/local/lib/cpp, put the following line in your .emacs file:

(setq c-macro-preprocessor "/usr/local/lib/cpp -C")

We strongly recommend keeping the -C option so that the preprocessor does not strip the comments from your code.

9.3.4 C++ Mode Differences

As we mentioned before, C++ mode uses the same Emacs Lisp package as C mode. When you're in C++ mode, Emacs understands C++ syntax, as opposed to C (or Objective-C) syntax. This changes how some of the commands discussed here behave internally, but usually not in ways you will notice.

There are few apparent differences between C++ and C mode. The most important is the Emacs Lisp code you need to put in your .emacs file to customize C++ mode: instead of c-mode-hook, you use c++-mode-hook. For example, if you want C++ mode's indentation style set to Stroustrup with automatic newlines instead of the default style, put the following in your .emacs file:

(add-hook 'c++-mode-hook
          '(lambda ()
             (c-set-style "stroustrup")
             (c-toggle-auto-state)))

Notice that you can set hooks for C mode and C++ mode separately this way, so that if you program in both languages, you can set up separate indentation styles for each.

C++ mode provides an additional command: C-c : (for c-scope-operator). This command inserts the C++ double-colon (::) scope operator. It's necessary because the colon (:) is normally bound to electric functionality that can reindent the line when you don't want that done. The scope operator can appear virtually anywhere in C++ code, whereas a single colon usually denotes a case label, which requires special indentation. The C-c : command may seem somewhat clumsy, but it's a necessary workaround for a syntactic clash in the C++ language.

Finally, both C and C++ mode contain the commands c-forward-into-nomenclature and c-backward-into-nomenclature, which aren't bound to any keystrokes by default. These are like forward-word and backward-word, respectively, but they treat capital letters in the middle of words as if they were starting new words. For example, they treat ThisVariableName as if it were three separate words, while the standard forward-word and backward-word commands treat it as one word. ThisTypeOfVariableName is a style favored by many C++ programmers, as opposed to this_type_of_variable_name, which is more common in traditional C code.
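The rule these commands apply can be approximated in a few lines of C. This sketch assumes that a "word" is an optional run of capital letters followed by a run of lowercase letters and digits, with any other character (including _) acting as a separator. The function names are hypothetical, and this is an approximation of the behavior, not cc-mode's own code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: from POS, skip separators, then a run of capitals,
   then a run of lowercase letters and digits.  Returns the new position,
   like one invocation of a forward-into-nomenclature motion. */
size_t forward_nomenclature(const char *s, size_t pos)
{
    while (s[pos] != '\0' && !isalnum((unsigned char)s[pos]))
        pos++;
    while (isupper((unsigned char)s[pos]))
        pos++;
    while (islower((unsigned char)s[pos]) || isdigit((unsigned char)s[pos]))
        pos++;
    return pos;
}

/* Count how many such motions it takes to cross the identifier S. */
int count_nomenclature_words(const char *s)
{
    int words = 0;
    size_t pos = 0;

    for (;;) {
        while (s[pos] != '\0' && !isalnum((unsigned char)s[pos]))
            pos++;                     /* trailing separators are not a word */
        if (s[pos] == '\0')
            return words;
        pos = forward_nomenclature(s, pos);
        words++;
    }
}
```

Under this rule, ThisVariableName counts as three words (the first motion stops after This, the second after Variable), and this_type_of_variable_name counts as five.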

C++ programmers may want to bind c-forward-into-nomenclature and c-backward-into-nomenclature to the keystrokes normally bound to the standard word motion commands. We show you how to do this in "Customizing Existing Modes" in Chapter 11.

We've covered the main features of C and C++ modes, but actually these modes include many more features, most of them quite obscure or intended only for hardcore Emacs Lisp-adept customizers. Look in the Emacs Lisp package cc-mode.el and the ever-expanding list of cc- helper packages for more details.
