Section 8.12. Interpreter Files

All contemporary UNIX systems support interpreter files. These files are text files that begin with a line of the form

     #! pathname [ optional-argument ]

The space between the exclamation point and the pathname is optional. The most common of these interpreter files begin with the line

    #!/bin/sh

The pathname is normally an absolute pathname, since no special operations are performed on it (i.e., PATH is not used). The recognition of these files is done within the kernel as part of processing the exec system call. The actual file that gets executed by the kernel is not the interpreter file, but the file specified by the pathname on the first line of the interpreter file. Be sure to differentiate between the interpreter file (a text file that begins with #!) and the interpreter, which is specified by the pathname on the first line of the interpreter file.

Be aware that systems place a size limit on the first line of an interpreter file. This limit includes the #!, the pathname, the optional argument, the terminating newline, and any spaces.

On FreeBSD 5.2.1, this limit is 128 bytes. Mac OS X 10.3 extends this limit to 512 bytes. Linux 2.4.22 supports a limit of 127 bytes, whereas Solaris 9 places the limit at 1,023 bytes.

Example

Let's look at an example to see what the kernel does with the arguments to the exec function, and with the optional argument on the first line of the interpreter file, when the file being executed is an interpreter file. The program in Figure 8.20 execs an interpreter file.

The following shows the contents of the one-line interpreter file that is executed and the result from running the program in Figure 8.20:

    $ cat /home/sar/bin/testinterp
    #!/home/sar/bin/echoarg foo
    $ ./a.out
    argv[0]: /home/sar/bin/echoarg
    argv[1]: foo
    argv[2]: /home/sar/bin/testinterp
    argv[3]: myarg1
    argv[4]: MY ARG2

The program echoarg (the interpreter) just echoes each of its command-line arguments. (This is the program from Figure 7.4.) Note that when the kernel execs the interpreter (/home/sar/bin/echoarg), argv[0] is the pathname of the interpreter, argv[1] is the optional argument from the interpreter file, and the remaining arguments are the pathname (/home/sar/bin/testinterp) and the second and third arguments from the call to execl in the program shown in Figure 8.20 (myarg1 and MY ARG2). Both argv[1] and argv[2] from the call to execl have been shifted right two positions. Note that the kernel takes the pathname from the execl call instead of the first argument (testinterp), on the assumption that the pathname might contain more information than the first argument.

Figure 8.20. A program that `exec`s an interpreter file

    #include "apue.h"
    #include <sys/wait.h>

    int
    main(void)
    {
        pid_t   pid;

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {          /* child */
            if (execl("/home/sar/bin/testinterp",
                      "testinterp", "myarg1", "MY ARG2", (char *)0) < 0)
                err_sys("execl error");
        }
        if (waitpid(pid, NULL, 0) < 0)  /* parent */
            err_sys("waitpid error");
        exit(0);
    }

Example

A common use for the optional argument following the interpreter pathname is to specify the -f option for programs that support this option. For example, an awk(1) program can be executed as

    awk -f myfile

which tells awk to read the awk program from the file myfile.

Systems derived from UNIX System V often include two versions of the awk language. On these systems, awk is often called "old awk" and corresponds to the original version distributed with Version 7. In contrast, nawk (new awk) contains numerous enhancements and corresponds to the language described in Aho, Kernighan, and Weinberger [1988]. This newer version provides access to the command-line arguments, which we need for the example that follows. Solaris 9 provides both versions.

The awk program is one of the utilities included by POSIX in its 1003.2 standard, which is now part of the base POSIX.1 specification in the Single UNIX Specification. This utility is also based on the language described in Aho, Kernighan, and Weinberger [1988].

The version of awk in Mac OS X 10.3 is based on the Bell Laboratories version that Lucent has placed in the public domain. FreeBSD 5.2.1 and Linux 2.4.22 ship with GNU awk, called gawk, which is linked to the name awk. The gawk version conforms to the POSIX standard, but also includes other extensions. Because they are more up-to-date, the version of awk from Bell Laboratories and gawk are preferred to either nawk or old awk. (The version of awk from Bell Laboratories is available at http://cm.bell-labs.com/cm/cs/awkbook/index.html.)

Using the -f option with an interpreter file lets us write

    #!/bin/awk -f
    (awk program follows in the interpreter file)

For example, Figure 8.21 shows /usr/local/bin/awkexample (an interpreter file).

If one of the path prefixes is /usr/local/bin, we can execute the program in Figure 8.21 (assuming that we've turned on the execute bit for the file) as

    $ awkexample file1 FILENAME2 f3
    ARGV[0] = awk
    ARGV[1] = file1
    ARGV[2] = FILENAME2
    ARGV[3] = f3

When /bin/awk is executed, its command-line arguments are

    /bin/awk -f /usr/local/bin/awkexample file1 FILENAME2 f3

The pathname of the interpreter file (/usr/local/bin/awkexample) is passed to the interpreter. The filename portion of this pathname (what we typed to the shell) isn't adequate, because the interpreter (/bin/awk in this example) can't be expected to use the PATH variable to locate files. When it reads the interpreter file, awk ignores the first line, since the pound sign is awk's comment character.

We can verify these command-line arguments with the following commands:

    $ /bin/su                              become superuser
    Password:                              enter superuser password
    # mv /bin/awk /bin/awk.save            save the original program
    # cp /home/sar/bin/echoarg /bin/awk    and replace it temporarily
    # suspend                              suspend the superuser shell using job control
    [1] + Stopped         /bin/su
    $ awkexample file1 FILENAME2 f3
    argv[0]: /bin/awk
    argv[1]: -f
    argv[2]: /usr/local/bin/awkexample
    argv[3]: file1
    argv[4]: FILENAME2
    argv[5]: f3
    $ fg                                   resume superuser shell using job control
    /bin/su
    # mv /bin/awk.save /bin/awk            restore the original program
    # exit                                 and exit the superuser shell

In this example, the -f option for the interpreter is required. As we said, this tells awk where to look for the awk program. If we remove the -f option from the interpreter file, an error message usually results when we try to run it. The exact text of the message varies, depending on where the interpreter file is stored and whether the remaining arguments represent existing files. This is because the command-line arguments in this case are

    /bin/awk /usr/local/bin/awkexample file1 FILENAME2 f3

and awk is trying to interpret the string /usr/local/bin/awkexample as an awk program. If we couldn't pass at least a single optional argument to the interpreter (-f in this case), these interpreter files would be usable only with the shells.

Figure 8.21. An `awk` program as an interpreter file

    #!/bin/awk -f
    BEGIN {
        for (i = 0; i < ARGC; i++)
            printf "ARGV[%d] = %s\n", i, ARGV[i]
        exit
    }

Are interpreter files required? Not really. They provide an efficiency gain for the user at some expense in the kernel (since it's the kernel that recognizes these files). Interpreter files are useful for the following reasons.

They hide that certain programs are scripts in some other language. For example, to execute the program in Figure 8.21, we just say
```
    awkexample optional-arguments 
```
instead of needing to know that the program is really an awk script that we would otherwise have to execute as
```
    awk -f awkexample optional-arguments 
```
Interpreter scripts provide an efficiency gain. Consider the previous example again. We could still hide that the program is an awk script, by wrapping it in a shell script:
```
    awk 'BEGIN {
        for (i = 0; i < ARGC; i++)
            printf "ARGV[%d] = %s\n", i, ARGV[i]
        exit
    }' $*
```
The problem with this solution is that more work is required. First, the shell reads the command and tries to execlp the filename. Because the shell script is an executable file, but isn't a machine executable, an error is returned, and execlp assumes that the file is a shell script (which it is). Then /bin/sh is executed with the pathname of the shell script as its argument. The shell correctly runs our script, but to run the awk program, the shell does a fork, exec, and wait. Thus, there is more overhead in replacing an interpreter script with a shell script.

Interpreter scripts let us write shell scripts using shells other than /bin/sh. When it finds an executable file that isn't a machine executable, execlp has to choose a shell to invoke, and it always uses /bin/sh. Using an interpreter script, however, we can simply write
```
    #!/bin/csh
    (C shell script follows in the interpreter file)
```
Again, we could wrap this all in a /bin/sh script (that invokes the C shell), as we described earlier, but more overhead is required.

None of this would work as we've shown if the three shells and awk didn't use the pound sign as their comment character.
