NLM and Kernel Lock Arbitration


Let's look at how NLM works in conjunction with the kernel lock arbitration methods on the NFS clients (the cluster nodes). Remember that we want to support existing multiuser applications that rely on one or more of these kernel lock arbitration methods, so that we do not have to rewrite them to run on the cluster.

NLM and Kernel BSD Flock

The Linux kernel currently does not pass BSD flock requests for whole file locks to the NLM. As such, this method of file locking will not work on the Linux Enterprise Cluster when access to shared data is required across all cluster nodes.

Because BSD flocks can only lock whole files, your existing multiuser applications aren't likely to use them to share user data. BSD flocks are more commonly used by applications that fork child processes in order to prevent the child processes from doing things that would cause conflicts with each other. For example, the LPRng printing system creates child processes for sending print jobs, and the child processes create temporary control files in the /var/spool/lpd directory to ensure that print jobs are sent in the correct order.

In fact, most daemons that use BSD flocks create temporary files in a subdirectory underneath the /var directory, so when you build a Linux Enterprise Cluster you should not mount the /var directory over NFS. By using a local /var directory on each cluster node, you can continue to run existing applications that use BSD flocks for temporary files.[8]
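To illustrate the pattern, here is a minimal sketch of how a daemon might take an exclusive BSD flock on a local lock file. (The path /var/run/mydaemon.lock is hypothetical, not a file any particular daemon uses.)

    /* flock-example.c -- illustrative sketch only; the lock file
     * name is hypothetical. Compile with: gcc -o flock-example flock-example.c
     */
    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    int main(void)
    {
        /* Open (or create) a lock file on a LOCAL filesystem such as /var. */
        int fd = open("/var/run/mydaemon.lock", O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Request an exclusive whole-file lock; this blocks until the
         * lock is granted. On a file mounted over NFS, this request
         * would never reach the NLM. */
        if (flock(fd, LOCK_EX) < 0) {
            perror("flock");
            return 1;
        }

        /* ... critical section: only one process holds the lock here ... */

        flock(fd, LOCK_UN);   /* release the lock */
        close(fd);
        return 0;
    }

Because the lock file lives on a local disk, each cluster node arbitrates only among its own processes, which is exactly what daemons like LPRng need.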

Note 

In the Linux Enterprise Cluster, NFS is used to share user data, not operating system files.

NLM and Kernel System V lockf

The System V lockf method is a wrapper around the Posix fcntl method of lock arbitration. See the next section for details.
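To make the relationship concrete, here is a minimal sketch (the file name /tmp/demo.dat is hypothetical) that places the same 100-byte exclusive lock first with lockf() and then with the equivalent fcntl call:

    /* lockf-example.c -- a sketch showing that lockf() maps onto
     * fcntl record locks. The file name is hypothetical. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/demo.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* Lock the first 100 bytes from the current offset with lockf(). */
        if (lockf(fd, F_LOCK, 100) < 0) { perror("lockf"); return 1; }
        lockf(fd, F_ULOCK, 100);      /* release */

        /* The roughly equivalent fcntl record lock: an exclusive
         * (write) lock on the same range, waiting until granted. */
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_CUR,
                            .l_start = 0, .l_len = 100 };
        if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("fcntl"); return 1; }
        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);      /* release */

        close(fd);
        return 0;
    }

Because both calls exercise the same kernel locking mechanism, everything said in the next section about fcntl locks over NFS applies to lockf as well.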

NLM and Kernel Posix fcntl

When an application running on an NFS client issues an fcntl lock operation on a file stored on an NFS-mounted filesystem, several things happen:

  1. Any file data or file attribute information stored in the local NFS client computer's cache is flushed.

  2. The NFS server is contacted[9] to check the attributes of the file again, in part to make sure that the permissions and access mode settings on the file still allow the process to gain access to the file.

  3. lockd on the NFS server is contacted to determine if the lock will be granted.

  4. If this is the first lock made by the particular NFS client, the statd daemon on both the NFS client and the NFS server records the fact that this client has made a lock request.

Posix fcntl locks that go through the NLM (for files stored on NFS) are very slow compared to locks granted locally by the kernel. Also notice that, because of the way the NFS client cache works, the only way to be sure that the data in your application program's memory is the same as the data stored on the NFS server is to lock the data before reading it. If your application reads the data without locking it, another program can modify the data on the NFS server and render the copy in your program's memory stale.
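Here is a minimal sketch of this lock-before-read pattern (the path /mnt/nfs/data.dat is a hypothetical NFS-mounted file). Taking the fcntl read lock first triggers the client cache flush and attribute check described in steps 1 and 2 above, so the bytes read afterward match the copy on the NFS server.

    /* fcntl-read-example.c -- illustrative lock-before-read over NFS.
     * The path /mnt/nfs/data.dat is a hypothetical NFS-mounted file. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[512];
        int fd = open("/mnt/nfs/data.dat", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Take a shared (read) lock on the whole file before reading.
         * Over NFS this sets steps 1-4 in motion: the client cache is
         * flushed, the file attributes are rechecked, and lockd on the
         * server decides whether to grant the lock. */
        struct flock fl = { .l_type = F_RDLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = 0 /* whole file */ };
        if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("fcntl"); return 1; }

        /* The data read here now matches the copy on the NFS server. */
        if (read(fd, buf, sizeof(buf)) < 0)
            perror("read");

        /* Release the lock. */
        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
    }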

We'll discuss the NLM performance issues more shortly.

[8]When you use the LPRng printing system, printer arbitration is done on a single print spool server (called the cluster node manager in a Linux Enterprise Cluster), so lock arbitration between child LPRng printing processes running on the cluster nodes is not needed. We'll discuss LPRng printing in more detail in Chapter 19.

[9]Using the GETATTR call.


