The Lock Arbitrator | Linux Enterprise Cluster: Build a Highly Available Cluster with Commodity Hardware and Free Software

We can divide cooperative lock arbitration into three categories based on who or what is acting as the lock arbitrator:

Kernel lock arbitration

The program asks the kernel for a lock on a particular file or a portion of a file. This is perhaps the most common method used by programmers when developing applications such as an order-entry system in a typical multiuser environment such as Linux or Unix.

File lock arbitration

The program creates a new file called a lock file or a dotlock file on stable storage to indicate that it would like exclusive access to a data file. Programs that access the same data must look for this file or examine its contents.^[2] This method is typically used in order to avoid subtle differences in kernel lock arbitration implementations (in different versions of Unix for example) when programmers want to develop software that will run on a variety of different operating systems.
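As a rough illustration of the dotlock idea, here is a minimal Python sketch. The function names and the `orders.dat` file name are hypothetical, and the `.lock` suffix is only an assumed convention. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists, and it writes the PID into the lock file as described in footnote 2.

```python
import os
import tempfile


def acquire_dotlock(data_path):
    """Try to create data_path + '.lock'; return True if we now own the lock."""
    lock_path = data_path + ".lock"  # assumed naming convention
    try:
        # O_CREAT | O_EXCL fails if the lock file already exists, so the
        # "does it exist?" check and the creation happen as a single step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(f"{os.getpid()}\n")  # record the owner's PID
    return True


def dotlock_owner(data_path):
    """Return the PID stored in the lock file."""
    with open(data_path + ".lock") as lock_file:
        return int(lock_file.read().strip())


def release_dotlock(data_path):
    """Remove the lock file so another program can take the lock."""
    os.remove(data_path + ".lock")


def dotlock_demo():
    with tempfile.TemporaryDirectory() as directory:
        data = os.path.join(directory, "orders.dat")  # hypothetical data file
        first = acquire_dotlock(data)    # lock file created
        second = acquire_dotlock(data)   # refused: lock file already exists
        owner_is_us = dotlock_owner(data) == os.getpid()
        release_dotlock(data)
        third = acquire_dotlock(data)    # succeeds again after the release
        return first, second, owner_is_us, third


if __name__ == "__main__":
    print(dotlock_demo())
```

Nothing stops a program from ignoring the lock file, which is why this method, like the others in this section, is cooperative.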

External lock arbitration daemon

The program asks a daemon to track lock and unlock requests. This type of locking is normally used for shared storage. Examples of this type of lock arbitration include sophisticated database applications, distributed lock managers (that can run on more than one node at the same time and share lock information), and the Network Lock Manager (NLM) used in NFSv3, which will be discussed shortly.

Note

Additional lock arbitration methods for cluster file systems are provided in Appendix E.

Our cluster environment should use a cooperative lock arbitration method that works in conjunction with shared storage while still honoring the classic Unix or Linux kernel lock arbitration requests (so we do not have to rewrite all of the user applications that will share data in the cluster). Fortunately, Linux supports a shared storage external lock arbitration daemon called lockd. But before we get into the details of NFS and lockd, let's more closely examine the kernel lock arbitration methods used by existing multiuser applications.

The Existing Kernel Lock Arbitration Methods

The three most common cooperative kernel lock arbitration methods available on Linux are BSD flock, System V lockf, and Posix fcntl.

BSD flock

This is considered an antiquated method of locking files because it only supports locking an entire file, not a range of bytes (called a byte range) within a file. There are two types of flocks: shared and exclusive. As their names imply, many processes may hold a shared lock on one file, but only one process may hold an exclusive lock. (A file cannot be locked with both shared and exclusive locks at the same time.) Because this method only supports locking the entire file, your existing multiuser applications are probably not using this locking mechanism to arbitrate access to shared user data. We'll discuss this further in "The Network Lock Manager (NLM)."
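The shared and exclusive rules can be observed with a short Python sketch that uses `fcntl.flock`. The helper names are hypothetical. The sketch assumes Linux behavior, where a flock is attached to the open file description, so three separate `open()` calls in one process can stand in for three programs. `LOCK_NB` is added so that a refused request returns immediately instead of waiting.

```python
import fcntl
import tempfile


def try_flock(fileobj, mode):
    """Attempt a non-blocking flock; return True if the kernel grants it."""
    try:
        fcntl.flock(fileobj, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def flock_demo():
    with tempfile.NamedTemporaryFile() as tmp:
        # Each open() yields its own open file description, so each one
        # competes for the whole-file lock like a separate program would.
        a = open(tmp.name, "rb")
        b = open(tmp.name, "rb")
        c = open(tmp.name, "rb")
        results = {}
        results["shared_a"] = try_flock(a, fcntl.LOCK_SH)
        results["shared_b"] = try_flock(b, fcntl.LOCK_SH)
        # An exclusive lock is refused while any shared lock is held.
        results["exclusive_c_while_shared"] = try_flock(c, fcntl.LOCK_EX)
        fcntl.flock(a, fcntl.LOCK_UN)
        fcntl.flock(b, fcntl.LOCK_UN)
        results["exclusive_c_after_unlock"] = try_flock(c, fcntl.LOCK_EX)
        # A shared lock is refused while the exclusive lock is held.
        results["shared_a_while_exclusive"] = try_flock(a, fcntl.LOCK_SH)
        for handle in (a, b, c):
            handle.close()
        return results


if __name__ == "__main__":
    for name, granted in flock_demo().items():
        print(name, granted)
```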

System V lockf

The System V locking method is called lockf. On a Linux system, lockf is really just an interface to the Posix fcntl method discussed next.

Posix fcntl

The Posix-compliant fcntl system call used on Linux does a variety of things, but for the moment we are only interested in the fact that it allows a program to lock a byte-range portion of a file.^[3] Posix fcntl locking supports the same two types of locks as the BSD flock method but uses the terms read and write instead of shared and exclusive. Multiple read locks are allowed at the same time, but when a program asks the kernel for a write lock, no other programs may hold either type of lock (read or write) for the same range of bytes within the file.
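The following Python sketch illustrates these read and write rules on a byte range. It uses `fcntl.lockf`, which the Python documentation describes as a wrapper around the fcntl locking calls, with `LOCK_SH` for a read lock and `LOCK_EX` for a write lock. The helper names are hypothetical, and the competing programs are simulated with `os.fork`, because the kernel tracks these locks per process.

```python
import fcntl
import os
import tempfile


def try_range_lock(fd, mode, start, length):
    """Non-blocking byte-range lock request; True if granted."""
    try:
        fcntl.lockf(fd, mode | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return True
    except (BlockingIOError, PermissionError):
        return False


def probe_in_child(path, mode, start, length):
    """Fork a child that opens the file itself and reports its lock result."""
    pid = os.fork()
    if pid == 0:
        code = 2  # stays 2 if something unexpected fails in the child
        try:
            fd = os.open(path, os.O_RDWR)
            code = 0 if try_range_lock(fd, mode, start, length) else 1
        finally:
            os._exit(code)  # the child's locks vanish when it exits
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def range_lock_demo():
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        # The parent holds a read lock on bytes 1000 through 1050 (51 bytes).
        parent_ok = try_range_lock(fd, fcntl.LOCK_SH, 1000, 51)
        results = {
            "parent_read_lock": parent_ok,
            "second_reader_same_range": probe_in_child(
                tmp.name, fcntl.LOCK_SH, 1000, 51),
            "writer_same_range": probe_in_child(
                tmp.name, fcntl.LOCK_EX, 1000, 51),
            "writer_other_range": probe_in_child(
                tmp.name, fcntl.LOCK_EX, 0, 100),
        }
        os.close(fd)
        return results


if __name__ == "__main__":
    for name, granted in range_lock_demo().items():
        print(name, granted)
```

The last probe shows the point of byte-range locking: a writer working on a different part of the file is not held up by the reader.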

When using Posix fcntl, the programmer decides what happens when the kernel cannot grant a lock immediately: the call to fcntl can either return at once with an error, or wait until the lock is granted. If the choice is to wait, the fcntl call does not return and the kernel places the request^[4] into a blocked queue. When the program that was holding the lock releases it and the reason for the blocked lock no longer exists, the kernel will reply to the fcntl request, waking up the program from a "sleep" state with the good news that its request for a lock has been granted.
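The waiting behavior can be sketched in Python as follows. The function name is hypothetical. The child calls `fcntl.lockf` without `LOCK_NB`, so its request sleeps until the parent releases the conflicting write lock. A pipe is used only so the child can report that its call has returned.

```python
import fcntl
import os
import select
import tempfile


def blocking_lock_demo():
    with tempfile.NamedTemporaryFile() as tmp:
        parent_fd = os.open(tmp.name, os.O_RDWR)
        # The parent takes a write lock on bytes 1000 through 1050.
        fcntl.lockf(parent_fd, fcntl.LOCK_EX, 51, 1000)
        read_end, write_end = os.pipe()
        pid = os.fork()
        if pid == 0:
            code = 1
            try:
                os.close(read_end)
                child_fd = os.open(tmp.name, os.O_RDWR)
                # No LOCK_NB here: the call does not return until the
                # kernel grants the lock.
                fcntl.lockf(child_fd, fcntl.LOCK_EX, 51, 1000)
                os.write(write_end, b"granted")
                code = 0
            finally:
                os._exit(code)
        os.close(write_end)
        # While the parent holds the lock, the child has nothing to report.
        ready, _, _ = select.select([read_end], [], [], 0.3)
        child_still_waiting = not ready
        # Releasing the lock lets the kernel wake the sleeping child.
        fcntl.lockf(parent_fd, fcntl.LOCK_UN, 51, 1000)
        message = os.read(read_end, 16)
        os.waitpid(pid, 0)
        os.close(read_end)
        os.close(parent_fd)
        return child_still_waiting, message


if __name__ == "__main__":
    print(blocking_lock_demo())
```

A program that would rather not sleep adds `LOCK_NB` to the request and handles the resulting error itself.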

For example, three processes may all hold a Posix fcntl read lock on the byte range 1,000–1,050 of the same file at the same time. But when a fourth process comes along and wants to acquire a Posix fcntl write lock on the same range of bytes, its request is placed into the blocked queue. As long as any one of the original three processes continues to hold its read lock, the fourth process's request for a write lock remains in the blocked queue. However, if a fifth process asks for a Posix fcntl read lock, the lock is granted immediately. The fourth process is still waiting for its write lock, and it may have to wait forever if new processes keep coming along and acquiring new read locks. This possibility of writer starvation makes it more difficult to write a program that uses both Posix fcntl read and Posix fcntl write locks.
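To make the sequence of events easier to follow, here is a toy Python model of a single byte range. It is not the kernel's implementation: the class, its methods, and the process names p1 through p5 are all hypothetical, and it encodes only the grant rules described in the text (a read lock is granted unless a write lock is held; a write lock is granted only when no lock is held).

```python
class RangeLockModel:
    """Toy model of the locks on one byte range (not the kernel's code)."""

    def __init__(self):
        self.readers = set()   # owners currently holding a read lock
        self.writer = None     # owner currently holding the write lock
        self.blocked = []      # waiting (owner, kind) requests, in order

    def _can_grant(self, kind):
        if kind == "read":
            return self.writer is None
        return self.writer is None and not self.readers

    def _grant(self, owner, kind):
        if kind == "read":
            self.readers.add(owner)
        else:
            self.writer = owner

    def request(self, owner, kind):
        if self._can_grant(kind):
            self._grant(owner, kind)
            return "granted"
        self.blocked.append((owner, kind))
        return "blocked"

    def release(self, owner):
        self.readers.discard(owner)
        if self.writer == owner:
            self.writer = None
        # Re-examine the blocked queue now that a lock has gone away.
        waiting, self.blocked = self.blocked, []
        for blocked_owner, kind in waiting:
            if self._can_grant(kind):
                self._grant(blocked_owner, kind)
            else:
                self.blocked.append((blocked_owner, kind))


def starvation_demo():
    model = RangeLockModel()
    log = [model.request(p, "read") for p in ("p1", "p2", "p3")]
    log.append(model.request("p4", "write"))  # blocked by the three readers
    log.append(model.request("p5", "read"))   # granted despite waiting writer
    for p in ("p1", "p2", "p3"):
        model.release(p)
    writer_after_three_releases = model.writer  # p5 still holds a read lock
    model.release("p5")
    return log, writer_after_three_releases, model.writer


if __name__ == "__main__":
    print(starvation_demo())
```

In this model the writer p4 gets its lock only after p5 releases, and a steady stream of new readers would keep it in the blocked list indefinitely.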

Also of note, when considering Posix fcntl locks:

If a process holds a Posix fcntl lock and forks a child process, the child process (running under a new process id or pid) will not inherit the lock from its parent.
A program that holds a lock may ask the kernel for this same lock a second time and the kernel will "grant" it again without considering this a conflict. (More about this in a moment when we discuss the Network Lock Manager and lockd.)
The Linux kernel currently does not check for collisions between Posix fcntl byte-range locks and BSD file flocks. The two locking methods do not know about each other.
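The three points above can be checked with a small Python sketch. The helper names are hypothetical, `fcntl.lockf` stands in for the Posix fcntl locking call, and the last result assumes a local Linux file system; other systems may behave differently.

```python
import fcntl
import os
import tempfile


def try_lock(lock_call, fd, mode, *args):
    """Non-blocking request through fcntl.lockf or fcntl.flock."""
    try:
        lock_call(fd, mode | fcntl.LOCK_NB, *args)
        return True
    except (BlockingIOError, PermissionError):
        return False


def child_result(path, action):
    """Fork a child, let it open the file and run action(fd); True if granted."""
    pid = os.fork()
    if pid == 0:
        code = 2  # stays 2 if something unexpected fails in the child
        try:
            fd = os.open(path, os.O_RDWR)
            code = 0 if action(fd) else 1
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def fcntl_notes_demo():
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        results = {}
        # The parent takes a write lock on bytes 1000 through 1050.
        fcntl.lockf(fd, fcntl.LOCK_EX, 51, 1000)
        # Asking for the same lock again is not treated as a conflict.
        results["second_request_granted"] = try_lock(
            fcntl.lockf, fd, fcntl.LOCK_EX, 51, 1000)
        # The forked child did not inherit the lock, so its own request
        # for the same range collides with the parent's lock.
        results["child_request_granted"] = child_result(
            tmp.name,
            lambda cfd: try_lock(fcntl.lockf, cfd, fcntl.LOCK_EX, 51, 1000))
        # A BSD flock requested by the child is not checked against the
        # parent's fcntl lock (assumes a local Linux file system).
        results["child_flock_granted"] = child_result(
            tmp.name,
            lambda cfd: try_lock(fcntl.flock, cfd, fcntl.LOCK_EX))
        os.close(fd)
        return results


if __name__ == "__main__":
    for name, granted in fcntl_notes_demo().items():
        print(name, granted)
```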

Note

There are at least three additional kernel lock arbitration methods available under Linux: whole file leases, share modes (similar to Windows share modes), and mandatory locks. If your application relies on one of these methods for lock arbitration, you must use NFS version 4.

Now that we've talked about the existing kernel lock arbitration methods that work on a single, monolithic server, let's examine a locking method that allows more than one server to share locking information: the Network Lock Manager.

^[2]The PID of the process that created the dotlock file is usually placed into the file.

^[3]If the byte range covers the entire file, then this is equivalent to a whole file lock.

^[4]Identified by the process id of the calling program as well as the file and byte range to be locked.