5.1 UNIX File System Navigation


Operating systems organize physical disks into file systems to provide high-level logical access to the actual bytes of a file. A file system is a collection of files together with attributes such as location and name. Instead of specifying the physical location of a file on disk, an application specifies a filename and an offset. The operating system translates this specification into the physical location of the data through its file systems.

A directory is a file containing directory entries that associate a filename with the physical location of a file on disk. When disks were small, a simple table of filenames and their positions was a sufficient representation for the directory. Larger disks require a more flexible organization, and most file systems organize their directories in a tree structure. This representation arises quite naturally when the directories themselves are files.

Figure 5.1 shows a tree-structured organization of a typical file system. The square nodes in this tree are directories, and the / designates the root directory of the file system. The root directory is at the top of the file system tree, and everything else is under it.

Figure 5.1. Tree structure of a file system.


The directory marked dirA in Figure 5.1 contains the files my1.dat, my2.dat and dirB. The dirB file is called a subdirectory of dirA because dirB is a directory contained in dirA of the file system tree. Notice that dirB also contains a file named my1.dat. Clearly, the filename is not enough to uniquely specify a file.

The absolute or fully qualified pathname specifies all of the nodes in the file system tree on the path from the root to the file itself. The absolute path starts with a slash (/) to designate the root node and then lists the names of the nodes down the path to the file within the file system tree. The successive names are separated by slashes. The file my1.dat in dirA in Figure 5.1 has the fully qualified pathname /dirA/my1.dat, and my1.dat in dirB has the fully qualified pathname /dirA/dirB/my1.dat.

5.1.1 The current working directory

A program does not always have to specify files by fully qualified pathnames. At any time, each process has an associated directory, called the current working directory, that it uses for pathname resolution. If a pathname does not start with /, the system resolves it as though the fully qualified path of the current working directory had been prepended to it. Hence, pathnames that do not begin with / are sometimes called relative pathnames because they are specified relative to the fully qualified pathname of the current directory. A dot (.) specifies the current directory, and a dot-dot (..) specifies the parent directory, the directory above the current directory. The root directory has both dot and dot-dot pointing to itself.

Example 5.1

After you enter the following command, your shell process has the current working directory /dirA/dirB.

 cd /dirA/dirB

Exercise 5.2

Suppose the current working directory of a process is the /dirA/dirB directory of Figure 5.1. State three ways by which the process can refer to the file my1.dat in directory dirA. State three ways by which the process can refer to the file my1.dat in directory dirB. What about the file my3.dat in dirC?

Answer:

Since the current working directory is /dirA/dirB, the process can use /dirA/my1.dat, ../my1.dat or even ./../my1.dat for the my1.dat file in dirA. Some of the ways by which the process can refer to the my1.dat file of dirB include my1.dat, /dirA/dirB/my1.dat, ./my1.dat, or ../dirB/my1.dat. The file my3.dat in dirC can be referred to as /dirC/my3.dat or ../../dirC/my3.dat.

The PWD environment variable is normally set by the shell to the current working directory, but it is an ordinary environment variable and chdir does not update it. Do not change this variable directly. Instead, use the getcwd function to retrieve the current working directory and the chdir function to change the current working directory within a process.

The chdir function causes the directory specified by path to become the current working directory for the calling process.

```
SYNOPSIS

  #include <unistd.h>

  int chdir(const char *path);
                                          POSIX
```

If successful, chdir returns 0. If unsuccessful, chdir returns -1 and sets errno. The following table lists the mandatory errors for chdir.

`errno`	cause
`EACCES`	search permission on a `path` component denied
`ELOOP`	a loop exists in resolution of `path`
`ENAMETOOLONG`	the length of `path` exceeds `PATH_MAX`, or a pathname component is longer than `NAME_MAX`
`ENOENT`	a component of `path` does not name an existing directory, or `path` is an empty string
`ENOTDIR`	a component of the pathname is not a directory

Example 5.3

The following code changes the current working directory of the process to /tmp.

```c
const char *directory = "/tmp";

if (chdir(directory) == -1)
    perror("Failed to change current working directory to /tmp");
```

Exercise 5.4

Why do ENOENT and ENOTDIR represent different error conditions for chdir ?

Answer:

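ENOENT means that a component of path does not exist, while ENOTDIR means that a component exists but is not a directory (for example, a regular file). The distinction is not always visible from the pathname itself, because some of the components of path may represent symbolic links that have to be followed to get the true components of the pathname. (See Section 5.4 for a discussion of symbolic links.)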

The getcwd function returns the pathname of the current working directory. The buf parameter of getcwd is a user-supplied buffer for holding the pathname of the current working directory. The size parameter specifies the size of buf in bytes, that is, the maximum pathname length that buf can accommodate, including the trailing string terminator.

```
SYNOPSIS

  #include <unistd.h>

  char *getcwd(char *buf, size_t size);
                                          POSIX
```

If successful, getcwd returns a pointer to buf . If unsuccessful, getcwd returns NULL and sets errno . The following table lists the mandatory errors for getcwd .

`errno`	cause
`EINVAL`	`size` is 0
`ERANGE`	`size` is greater than 0 but smaller than the length of the pathname plus 1

If buf is not NULL, getcwd copies the name into buf. If buf is NULL, POSIX states that the behavior of getcwd is undefined. In some implementations, getcwd uses malloc to create a buffer to hold the pathname. Do not rely on this behavior.

You should always supply getcwd with a buffer large enough to fit a string containing the pathname. Program 5.1 shows a program that uses PATH_MAX as the buffer size. PATH_MAX is an optional POSIX constant specifying the maximum length of a pathname (including the terminating null byte) for the implementation. The PATH_MAX constant may or may not be defined in limits.h . The optional POSIX constants can be omitted from limits.h if their values are indeterminate but larger than the required POSIX minimum. For PATH_MAX , the _POSIX_PATH_MAX constant specifies that an implementation must accommodate pathname lengths of at least 255. A vendor might allow PATH_MAX to depend on the amount of available memory space on a specific instance of a specific implementation.

Program 5.1 `getcwdpathmax.c`

A complete program to output the current working directory.

```c
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 255
#endif

int main(void) {
    char mycwd[PATH_MAX];

    if (getcwd(mycwd, PATH_MAX) == NULL) {
        perror("Failed to get current working directory");
        return 1;
    }
    printf("Current working directory: %s\n", mycwd);
    return 0;
}
```

A more flexible approach uses the pathconf function to determine the real value for the maximum path length at run time. The pathconf function is one of a family of functions that allows a program to determine system and runtime limits in a platform-independent way. For example, Program 2.10 uses the sysconf member of this family to calculate the number of seconds that a program runs. The sysconf function takes a single argument, which is the name of a configurable systemwide limit such as the number of clock ticks per second ( _SC_CLK_TCK ) or the maximum number of processes allowed per user ( _SC_CHILD_MAX ).

The pathconf and fpathconf functions report limits associated with a particular file or directory. The fpathconf function takes a file descriptor and the limit designator as parameters, so the file must be opened before a call to fpathconf. The pathconf function takes a pathname and a limit designator as parameters, so it can be called without the program actually opening the file. The sysconf function returns the current value of a configurable system limit that is not associated with files. Its name parameter designates the limit.

```
SYNOPSIS

  #include <unistd.h>

  long fpathconf(int fildes, int name);
  long pathconf(const char *path, int name);
  long sysconf(int name);
                                          POSIX
```

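If successful, these functions return the value of the limit. If unsuccessful, these functions return -1 and set errno. If the requested limit is indeterminate, POSIX specifies that they return -1 without changing errno, so a program that needs to distinguish the two cases should set errno to 0 before the call. The following table lists the mandatory errors.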

`errno`	cause
`EINVAL`	`name` has an invalid value
`ELOOP`	a loop exists in resolution of `path` ( `pathconf` )

Program 5.2 shows a program that avoids the PATH_MAX problem by first calling pathconf to find the maximum pathname length. Since the program does not know the length of the path until run time, it allocates the buffer for the path dynamically.

Program 5.2 `getcwdpathconf.c`

A program that uses pathconf to output the current working directory

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
   long maxpath;
   char *mycwdp;

   if ((maxpath = pathconf(".", _PC_PATH_MAX)) == -1) {
      perror("Failed to determine the pathname length");
      return 1;
   }
   if ((mycwdp = (char *) malloc(maxpath)) == NULL) {
      perror("Failed to allocate space for pathname");
      return 1;
   }
   if (getcwd(mycwdp, maxpath) == NULL) {
      perror("Failed to get current working directory");
      free(mycwdp);
      return 1;
   }
   printf("Current working directory: %s\n", mycwdp);
   free(mycwdp);
   return 0;
}
```

5.1.2 Search paths

A user executes a program in a UNIX shell by typing the pathname of the file containing the executable. Most commonly used programs and utilities (e.g., vi, cc) are not in the user's current working directory. Imagine how inconvenient it would be if you actually had to know the locations of all system executables to execute them. Fortunately, UNIX has a method of looking for executables in a systematic way. If only a name is given for an executable, the shell searches for it in each of the directories listed in the PATH environment variable, in order. PATH contains the fully qualified pathnames of these directories separated by colons.

Example 5.5

The following is a typical value of the PATH environment variable.

 /usr/bin:/etc:/usr/local/bin:/usr/ccs/bin:/home/robbins/bin:.

This specification says that when you enter a command your shell should search /usr/bin first. If it does not find the command there, the shell should next examine the /etc directory and so on.

Remember that the shell does not search subdirectories of directories in the PATH unless they are also explicitly specified in the PATH . If in doubt about which version of a particular program you are actually executing, use which to get the fully qualified pathname of the executable. The which command is not part of POSIX, but it is available on most systems. Section 5.5 describes how you can write your own version of which .

It is common for programmers to create a bin directory for executables as a subdirectory of their home directories. The PATH of Example 5.5 contains the /home/robbins/bin directory. The bin directory appears before dot (.), the current directory, in the search path, which leads to the problem discussed in the next exercise.

Exercise 5.6

A user develops a program called calhit in the subdirectory progs of his or her home directory and puts a copy of the executable in the bin directory of the same account. The user later modifies calhit in the progs directory without copying it to the bin directory. What happens when the programmer tries to test the new version?

Answer:

The result depends on the value of the PATH environment variable. If the user's PATH is set up in the usual way, the shell searches the bin directory first and executes the old version of the program. You can test the new version with ./calhit .

Resist the temptation to put the dot ( . ) at the beginning of the PATH in spite of the problem mentioned in Exercise 5.6. Such a PATH specification is regarded as a security risk and may lead to strange results when your shell executes local programs instead of the standard system programs of the same name.
