29.5 NFS Protocol

NFS provides transparent file access for clients to files and filesystems on a server. This differs from FTP (Chapter 27), which provides file transfer. With FTP a complete copy of the file is made. NFS accesses only the portions of a file that a process references, and a goal of NFS is to make this access transparent. This means that any client application that works with a local file should work with an NFS file, without any program changes whatsoever.

NFS is a client-server application built using Sun RPC. NFS clients access files on an NFS server by sending RPC requests to the server. While this could be done using normal user processes (that is, the NFS client could be a user process that makes explicit RPC calls to the server, and the server could also be a user process), NFS is normally not implemented this way for two reasons. First, accessing an NFS file must be transparent to the client. Therefore the NFS client calls are performed by the client operating system, on behalf of client user processes. Second, NFS servers are implemented within the operating system on the server for efficiency. If the NFS server were a user process, every client request and server reply (including the data being read or written) would have to cross the boundary between the kernel and the user process, which is expensive.

In this section we look at version 2 of NFS, as documented in RFC 1094 [Sun Microsystems 1988b]. A better description of Sun RPC, XDR, and NFS is given in [X/Open 1991]. Details on using and administering NFS are in [Stern 1991]. The specification for version 3 of the NFS protocol was released in 1993; we cover it in Section 29.7.

Figure 29.3 shows the typical arrangement of an NFS client and an NFS server. There are many subtle points in this figure.

Figure 29.3. Typical arrangement of NFS client and NFS server.

It is transparent to the client whether it's accessing a local file or an NFS file. The kernel determines this when the file is opened. After the file is opened, the kernel passes all references to local files to the box labeled "local file access," and all references to an NFS file are passed to the "NFS client" box.
The NFS client sends RPC requests to the NFS server through its TCP/IP module. NFS is used predominantly with UDP, but newer implementations can also use TCP.
The NFS server receives client requests as UDP datagrams on port 2049. Although the NFS server could use an ephemeral port that it then registers with the port mapper, UDP port 2049 is hardcoded into most implementations.
When the NFS server receives a client request, the request is passed to its local file access routines, which access a local disk on the server.
It can take the NFS server a while to handle a client's request, since the local filesystem is normally accessed. During this time, the server does not want to block other client requests from being serviced. To handle this, most NFS servers are multithreaded; that is, there are really multiple NFS servers running inside the server kernel. How this is handled depends on the operating system. Since most Unix kernels are not multithreaded, a common technique is to start multiple instances of a user process (often called nfsd), each of which performs a single system call and remains inside the kernel as a kernel process.
Similarly, it can take the NFS client a while to handle a request from a user process on the client host: an RPC is issued to the server host, and the client must wait for the reply. To provide more concurrency to the user processes on the client host that are using NFS, there are normally multiple NFS clients running inside the client kernel. Again, the implementation depends on the operating system. Unix systems often use a technique similar to the NFS server technique: multiple instances of a user process named biod, each of which performs a single system call and remains inside the kernel as a kernel process.
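The benefit of running several nfsd or biod instances can be imitated at user level with a pool of threads that drain a shared request queue: while one thread waits on a slow request, the others keep serving. The sketch below is only an analogy. The thread pool, the hypothetical `handle_request` function, and its simulated disk delay stand in for kernel processes and real filesystem access.

```python
import queue
import threading
import time

def handle_request(req):
    """Hypothetical handler: sleep to simulate a slow local-disk access."""
    client, delay = req
    time.sleep(delay)                # stands in for filesystem I/O on the server
    return f"reply to {client}"

def serve(requests, nworkers=4):
    """Handle requests with nworkers threads, like multiple nfsd instances."""
    work = queue.Queue()
    replies = []
    lock = threading.Lock()

    def worker():
        while True:
            req = work.get()
            if req is None:          # sentinel: no more requests for this worker
                return
            reply = handle_request(req)
            with lock:
                replies.append(reply)

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for req in requests:
        work.put(req)
    for _ in threads:
        work.put(None)               # one sentinel per worker, queued after all requests
    for t in threads:
        t.join()
    return replies
```

With four workers, four requests that each take 0.2 seconds finish in roughly 0.2 seconds rather than 0.8. With `nworkers=1` they would be serialized, which is the situation that multiple server instances avoid.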

Most Unix hosts can operate as an NFS client, an NFS server, or both. Most PC (MS-DOS) implementations provide only NFS client functions. Most IBM mainframe implementations provide only NFS server functions.

NFS really consists of more than just the NFS protocol. Figure 29.4 shows the various RPC programs normally used with NFS.

Figure 29.4. Various RPC programs used with NFS.

The versions we show in this figure are the ones found on systems such as SunOS 4.1.3. Newer implementations provide newer versions of some of the programs. Solaris 2.2, for example, also supports versions 3 and 4 of the port mapper, and version 2 of the mount daemon. SVR4 also supports version 3 of the port mapper.

The mount daemon is called by the NFS client host before the client can access a filesystem on the server. We discuss this below.

The lock manager and status monitor allow clients to lock portions of files that reside on an NFS server. These two programs are independent of the NFS protocol because locking requires state on both the client and server, and NFS itself is stateless on the server. (We say more about NFS's statelessness later.) Chapters 9, 10, and 11 of [X/Open 1991] document the procedures used by the lock manager and status monitor for file locking with NFS.

File Handles

A fundamental concept in NFS is the file handle. It is an opaque object used to reference a file or directory on the server. The term opaque denotes that the server creates the file handle, passes it back to the client, and then the client uses the file handle when accessing the file. The client never looks at the contents of the file handle; its contents only make sense to the server.

Each time a client process opens a file that is really a file on an NFS server, the NFS client obtains a file handle for that file from the NFS server. Each time the NFS client reads or writes that file for the user process, the file handle is sent back to the server to identify the file being accessed.

Normal user processes never deal with file handles; it is the NFS client code and the NFS server code that pass them back and forth. In version 2 of NFS a file handle occupies 32 bytes, although with version 3 this changes from a fixed-length field to a variable-length field of up to 64 bytes.

Unix servers normally store the following information in the file handle: the filesystem identifier (the major and minor device numbers of the filesystem), the i-node number (a unique number within a filesystem), and an i-node generation number (a number that changes each time an i-node is reused for a different file).
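The following sketch shows how a Unix server might pack those three pieces of information into a 32-byte version 2 file handle. The field widths, byte order, and zero padding are assumptions made for illustration. Each server chooses its own layout, which is why the client must treat the handle as opaque. The `is_stale` check shows what the generation number is for: once a deleted file's i-node is reused, handles for the old file no longer match.

```python
import struct

FHSIZE = 32  # NFS version 2 file handles are a fixed 32 bytes

# Hypothetical layout: major device, minor device, i-node number, and
# i-node generation number as 32-bit big-endian unsigned integers
# (16 bytes), zero-padded to FHSIZE.
_LAYOUT = struct.Struct(">IIII")

def make_fhandle(major, minor, inode, generation):
    """Server side: build an opaque handle from filesystem and i-node data."""
    return _LAYOUT.pack(major, minor, inode, generation).ljust(FHSIZE, b"\0")

def parse_fhandle(fh):
    """Server side only; to the client these bytes are meaningless."""
    if len(fh) != FHSIZE:
        raise ValueError("bad file handle length")
    return _LAYOUT.unpack_from(fh)

def is_stale(fh, current_generation):
    """A reused i-node has a new generation number, so old handles stop matching."""
    _, _, _, generation = parse_fhandle(fh)
    return generation != current_generation
```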

Mount Protocol

The client must use the NFS mount protocol to mount a server's filesystem before the client can access files on that filesystem. This is normally done when the client is bootstrapped. The end result is for the client to obtain a file handle for the server's filesystem.

Figure 29.5 shows the sequence of steps that takes place when a Unix client issues the mount(8) command, specifying an NFS mount.

Figure 29.5. Mount protocol used by Unix `mount` command.

The following steps take place.

The port mapper is started on the server, normally when the server bootstraps.
The mount daemon (mountd) is started on the server, after the port mapper. It creates a TCP end point and a UDP end point, and assigns an ephemeral port number to each. It then registers these port numbers with the port mapper.
The mount command is executed on the client and it issues an RPC call to the port mapper on the server to obtain the port number of the server's mount daemon. Either TCP or UDP can be used for this client exchange with the port mapper, but UDP is normally used.
The port mapper replies with the port number.
The mount command issues an RPC call to the mount daemon to mount a filesystem on the server. Again, either TCP or UDP can be used, but UDP is typical. The server can now validate the client, using the client's IP address and port number, to see if the server lets this client mount the specified filesystem.
The mount daemon replies with the file handle for the given filesystem.
The mount command issues the mount system call on the client to associate the file handle returned by the mount daemon with a local mount point on the client. This file handle is stored in the NFS client code, and from this point on any references by user processes to files on that server's filesystem will use that file handle as the starting point.

This implementation technique puts all the mount processing, other than the mount system call on the client, in user processes, instead of the kernel. The three programs we show (the mount command, the port mapper, and the mount daemon) are all user processes.
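The mount exchange can be simulated in a single process, with method calls standing in for the RPCs. The `PortMapper` and `MountDaemon` classes, the export list, and the `client_mount` function are hypothetical. Two details do come from the Sun RPC conventions: 100005 is the mount daemon's program number, and the port mapper replies with port 0 for a program that is not registered. Server-side validation is reduced here to checking the client's address against an export list.

```python
MOUNT_PROG = 100005   # Sun RPC program number of the mount daemon
MOUNT_VERS = 1
IPPROTO_UDP = 17

class PortMapper:
    """Hypothetical stand-in for the server's port mapper."""
    def __init__(self):
        self._ports = {}

    def register(self, prog, vers, proto, port):
        # mountd registers the ephemeral port it was assigned
        self._ports[(prog, vers, proto)] = port

    def getport(self, prog, vers, proto):
        # a reply of 0 means the program is not registered
        return self._ports.get((prog, vers, proto), 0)

class MountDaemon:
    """Hypothetical mountd: checks an export list and returns a file handle."""
    def __init__(self, exports, handles):
        self.exports = exports       # server path -> set of allowed client addresses
        self.handles = handles       # server path -> opaque file handle (bytes)

    def mnt(self, path, client_addr):
        if client_addr not in self.exports.get(path, set()):
            raise PermissionError(f"{client_addr} may not mount {path}")
        return self.handles[path]

def client_mount(pmap, daemons_by_port, server_path, mount_point,
                 client_addr, mount_table):
    """What the mount command does: ask the port mapper, then call mountd."""
    port = pmap.getport(MOUNT_PROG, MOUNT_VERS, IPPROTO_UDP)
    if port == 0:
        raise ConnectionError("mount daemon not registered")
    fh = daemons_by_port[port].mnt(server_path, client_addr)
    # stands in for the mount system call: remember the handle for this mount point
    mount_table[mount_point] = fh
    return fh
```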

As an example, on our host sun (the NFS client) we execute

 sun #  mount -t nfs bsdi:/usr /nfs/bsdi/usr

This mounts the directory /usr on the host bsdi (the NFS server) as the local filesystem /nfs/bsdi/usr. Figure 29.6 shows the result.

Figure 29.6. Mounting the `bsdi:/usr` directory as `/nfs/bsdi/usr` on the host `sun`.

When we reference the file /nfs/bsdi/usr/rstevens/hello.c on the client sun, we are really referencing the file /usr/rstevens/hello.c on the server bsdi.
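Here is a sketch of what the NFS client code might do with such a reference. It finds the longest mount point that is a prefix of the path, then walks the remaining components one at a time, starting from the file handle saved at mount time. In version 2, LOOKUP takes a directory's file handle and a single filename, so each component costs one RPC. The `lookup` callable and the handle values used with it are hypothetical.

```python
def split_mount(path, mount_table):
    """Return (mount point, remaining components) for the longest matching mount."""
    best = None
    for mp in mount_table:
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) > len(best):
                best = mp
    if best is None:
        return None, []
    rest = path[len(best):].strip("/")
    return best, [c for c in rest.split("/") if c]

def resolve(path, mount_table, lookup):
    """Walk the path with one LOOKUP(directory handle, name) per component."""
    mp, components = split_mount(path, mount_table)
    if mp is None:
        raise FileNotFoundError(f"{path} is not on an NFS mount")
    fh = mount_table[mp]             # file handle obtained when the filesystem was mounted
    for name in components:
        fh = lookup(fh, name)        # one RPC per pathname component
    return fh
```

For /nfs/bsdi/usr/rstevens/hello.c, the client would issue LOOKUP for rstevens in the mounted directory, then LOOKUP for hello.c in the directory handle that first call returned.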

NFS Procedures

The NFS server provides 15 procedures, which we now describe. (The numbers we use are not the same as the NFS procedure numbers, since we have grouped them according to functionality.) Although NFS was designed to work between different operating systems, and not just Unix systems, some of the procedures provide Unix functionality that might not be supported by other operating systems (e.g., hard links, symbolic links, group owner, and execute permission). Chapter 4 of [Stevens 1992] contains additional information on the properties of Unix filesystems, some of which are assumed by NFS.

GETATTR. Return the attributes of a file: type of file (regular file, directory, etc.), permissions, size of file, owner of file, last-access time, and so on.
SETATTR. Set the attributes of a file. Only a subset of the attributes can be set: permissions, owner, group owner, size, last-access time, and last-modification time.
STATFS. Return the status of a filesystem: amount of available space, optimal size for transfer, and so on. Used by the Unix df command, for example.
LOOKUP. Look up a file. This is the procedure called by the client each time a user process opens a file that's on an NFS server. A file handle is returned, along with the attributes of the file.
READ. Read from a file. The client specifies the file handle, starting byte offset, and maximum number of bytes to read (up to 8192).
WRITE. Write to a file. The client specifies the file handle, starting byte offset, number of bytes to write, and the data to write.

NFS writes are required to be synchronous. The server cannot respond OK until it has successfully written the data (and any other file information that gets updated) to disk.
CREATE. Create a file.
REMOVE. Delete a file.
RENAME. Rename a file.
LINK. Make a hard link to a file. A hard link is a Unix concept whereby a given file on disk can have any number of directory entries (i.e., names, also called hard links) that point to the file.
SYMLINK. Create a symbolic link to a file. A symbolic link is a file that contains the name of another file. Most operations that reference the symbolic link (e.g., open) really reference the file pointed to by the symbolic link.
READLINK. Read a symbolic link, that is, return the name of the file to which the symbolic link points.
MKDIR. Create a directory.
RMDIR. Delete a directory.
READDIR. Read a directory. Used by the Unix ls command, for example.

These procedure names actually begin with the prefix NFSPROC_, which we've dropped.
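For reference, the enumeration below gives the procedure numbers that RFC 1094 assigns. The RFC also defines NULL, ROOT (obsolete), and WRITECACHE (reserved for a later protocol revision), which the list above omits. The two handlers are hypothetical and skip XDR encoding entirely; they also take a local pathname where a real server would take a file handle. They illustrate only two rules: the 8192-byte READ limit, and the synchronous WRITE rule, with `os.fsync` forcing the data to disk before the reply.

```python
import os
from enum import IntEnum

class NfsProc(IntEnum):
    """Version 2 procedure numbers from RFC 1094 (NFSPROC_ prefix dropped)."""
    NULL = 0
    GETATTR = 1
    SETATTR = 2
    ROOT = 3          # obsolete
    LOOKUP = 4
    READLINK = 5
    READ = 6
    WRITECACHE = 7    # reserved for a later protocol revision
    WRITE = 8
    CREATE = 9
    REMOVE = 10
    RENAME = 11
    LINK = 12
    SYMLINK = 13
    MKDIR = 14
    RMDIR = 15
    READDIR = 16
    STATFS = 17

MAXDATA = 8192  # largest data count in a version 2 READ or WRITE

def nfs_read(path, offset, count):
    """Hypothetical READ handler: never return more than MAXDATA bytes."""
    count = min(count, MAXDATA)
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(count)

def nfs_write(path, offset, data):
    """Hypothetical WRITE handler: reply only after the data reaches the disk."""
    if len(data) > MAXDATA:
        raise ValueError("write larger than MAXDATA")
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, offset)
        os.fsync(fd)              # flush data and metadata before answering OK
    finally:
        os.close(fd)
    return "OK"
```

A client that wants 20,000 bytes therefore issues three READs, at offsets 0, 8192, and 16384.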

UDP or TCP?

NFS was originally written to use UDP, and that's what all vendors provide. Newer implementations, however, also support TCP. TCP support is provided for use on wide area networks, which are getting faster over time. NFS is no longer restricted to local area use.

The network dynamics can change drastically when going from a LAN to a WAN. The round-trip times can vary widely and congestion is more frequent. These characteristics of WANs led to the algorithms we examined with TCP: slow start and congestion avoidance. Since UDP does not provide anything like these algorithms, either the same algorithms must be put into the NFS client and server or TCP should be used.
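Over UDP, the client must supply its own reliability. It retransmits a request if no reply arrives within a timeout, and it should back off so that it does not add load to a congested path. Here is a minimal sketch of an exponential-backoff schedule and a retransmitting call. The initial timeout, cap, and retry count are assumed values, not ones mandated by NFS. The scheme is also much cruder than TCP's round-trip-time estimation and congestion window.

```python
import socket

def retransmit_schedule(initial=0.7, maximum=60.0, retries=5):
    """Timeout before each transmission, doubled after every loss and capped."""
    timeouts = []
    t = initial
    for _ in range(retries + 1):     # first transmission plus retries
        timeouts.append(t)
        t = min(t * 2, maximum)
    return timeouts

def call_with_retry(sock, request, addr, schedule):
    """Send a request over UDP, retransmitting whenever a timeout expires.

    A real RPC client would also match the reply's transaction ID (XID)
    to the request, so a late reply to an earlier copy is recognized.
    """
    for timeout in schedule:
        sock.sendto(request, addr)
        sock.settimeout(timeout)
        try:
            reply, _ = sock.recvfrom(65535)
            return reply
        except socket.timeout:
            continue                 # no reply in time: retransmit and wait longer
    raise TimeoutError("server not responding")
```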

NFS Over TCP

The Berkeley Net/2 implementation of NFS supports either UDP or TCP. [Macklem 1991] describes this implementation. Let's look at the differences when TCP is used.

When the server bootstraps, it starts an NFS server that does a passive open on TCP port 2049, waiting for client connection requests. This is usually in addition to the normal NFS UDP server that waits for incoming datagrams to UDP port 2049.
When the client mounts the server's filesystem using TCP, it does an active open to TCP port 2049 on the server. This results in a TCP connection between the client and server for this filesystem. If the same client mounts another filesystem on the same server, another TCP connection is created.
Both the client and server set TCP's keepalive option on their ends of the connection (Chapter 23). This lets either end detect if the other end crashes, or crashes and reboots.
All applications on the client that use this server's filesystem share the single TCP connection for this filesystem. For example, in Figure 29.6 if there were another directory named smith beneath /usr on bsdi, references to files in /nfs/bsdi/usr/rstevens and /nfs/bsdi/usr/smith would share the same TCP connection.
If the client detects that the server has crashed, or crashed and rebooted (by receiving a TCP error of either "connection timed out" or "connection reset by peer"), it tries to reconnect to the server. The client does another active open to reestablish the TCP connection with the server for this filesystem. Any client requests that timed out on the previous connection are reissued on the new connection.
If the client crashes, so do the applications that are running when it crashes. When the client reboots, it will probably remount the server's filesystem using TCP, resulting in another TCP connection to the server. The previous connection between this client and server for this filesystem is half-open (the server thinks it's still open), but since the server set the keepalive option, this half-open connection will be terminated when the next keepalive probe is sent by the server's TCP.
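The client side of the TCP scheme just described can be sketched as a table that holds one connection per mounted (server, filesystem) pair. Each connection is created by an active open to port 2049 with SO_KEEPALIVE set, and a reconnect path reissues outstanding requests. The `ConnectionTable` class, its methods, and the injectable `connect` function are hypothetical; only the socket calls are standard.

```python
import socket

NFS_PORT = 2049

def open_nfs_connection(server):
    """Active open to the server's NFS port, with keepalive enabled."""
    s = socket.create_connection((server, NFS_PORT))
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return s

class ConnectionTable:
    """One connection per mounted (server, filesystem), shared by all applications."""
    def __init__(self, connect=open_nfs_connection):
        self._connect = connect
        self._conns = {}

    def get(self, server, fs):
        key = (server, fs)
        if key not in self._conns:          # first use of this mounted filesystem
            self._conns[key] = self._connect(server)
        return self._conns[key]

    def reconnect(self, server, fs, pending):
        """After "connection timed out" or "connection reset by peer"."""
        old = self._conns.pop((server, fs), None)
        if old is not None:
            old.close()
        conn = self.get(server, fs)         # another active open
        for request in pending:             # reissue requests that timed out
            conn.sendall(request)
        return conn
```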

Over time, additional vendors plan to support NFS over TCP.
