13.10. Security

We mentioned in Section 13.9 that a suite of security protocols was developed as part of IPv6. These protocols were written to be independent of any particular version of IP, so they have been integrated into both IPv4 and IPv6. At the network layer, security mechanisms have been added to provide authentication so that one host can know with whom it is communicating. Encryption has been added so that data can be hidden from untrusted entities as they cross the Internet. The protocols that collectively provide security within the network layer are referred to as IPSec.

Placing the security protocols at the network layer within the protocol stack was not an arbitrary decision. It is possible to place security at just about any layer within a communication system. For example, the secure sockets layer (SSL) supports communication security at the application layer and allows a client and a server to communicate securely over an arbitrary network. At the opposite end of the spectrum are the various protocols that secure wireless networks at the data-link layer. The decision to put security at the network layer was made for several reasons:

The IP protocols act as a uniform platform into which to place the security protocols. Differences in underlying hardware, such as different types of network media, did not have to be taken into account when designing and implementing IPSec because if a piece of hardware could send and receive IP datagrams, then it could also support IPSec.
Users need not do any work to use the security protocols. Because IPSec is implemented at the network layer instead of the application layer, users who run network programs are automatically working securely as long as their system administrators have properly configured the system.
Key management can be handled automatically by system daemons. The hardest problem in deploying network security protocols is distributing and revoking the keys used to encrypt the data. Since IPSec is handled in the kernel and is not usually dealt with by users, it is possible to write daemons to handle the management of keys.

Security within the context of IPSec means several things:

The ability to trust that a host is who it claims to be (authentication)
Protection against the replay of old data
Confidentiality of data (encryption)

Providing a security architecture for the Internet protocols is a complex problem. The relevant protocols are covered in several RFCs, and an overview is given in Kent & Atkinson [1998a].

FreeBSD contains two implementations of IPSec. One is derived from the KAME code base, and the other, known as Fast IPSec, is a reworking of the KAME code base so that it can work with the OpenBSD cryptographic subsystem [Leffler, 2003a]. The largest difference between the two code bases is that the Fast IPSec code does not have any cryptographic algorithms built into it and depends wholly on the cryptographic subsystem to handle the work of encrypting, decrypting, and otherwise manipulating data. Although the KAME code was the first available implementation of the IPSec protocols, and is still the most widely used, we discuss the Fast IPSec code because it allows us to explain the hardware cryptography subsystem that has been added to FreeBSD.

IPSec Overview

The protocols that make up IPSec provide a security framework for use by hosts and routers on the Internet. Security services, such as authentication and encryption, are available between two hosts, a host and a router, or two routers. When any two entities on the network (hosts or routers) are using IPSec for secure communication, they are said to have a security association (SA) between them. Each SA is unidirectional, which means that traffic is secured only in the direction in which the SA has been set up. To secure traffic in both directions, two SAs are required, one for each direction.

SAs are uniquely identified by their destination address, the security protocol being used, and a security-parameter index (SPI), which is a 32-bit value that distinguishes among multiple SAs terminating at the same host or router. The SPI is used as the index to look up relevant information in the security-association database that is maintained by each system running IPSec.
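To make the three-part identifier concrete, here is a minimal sketch of finding an SA by destination address, security protocol, and SPI. The structure and function names (sa_entry, sadb_lookup) are hypothetical, IPv4 addresses are simplified to 32-bit integers, and a linear scan stands in for whatever indexing the kernel really uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified SA entry: only the identifying triple and a label. */
struct sa_entry {
    uint32_t    dst;     /* destination IPv4 address (host byte order) */
    uint8_t     proto;   /* security protocol: 50 = ESP, 51 = AH */
    uint32_t    spi;     /* security-parameter index */
    const char *name;    /* stand-in for keys, algorithms, and so on */
};

/* Return the SA matching (dst, proto, spi), or NULL if there is none.
 * A linear scan is used purely for illustration. */
const struct sa_entry *
sadb_lookup(const struct sa_entry *db, size_t n,
    uint32_t dst, uint8_t proto, uint32_t spi)
{
    for (size_t i = 0; i < n; i++) {
        if (db[i].dst == dst && db[i].proto == proto && db[i].spi == spi)
            return &db[i];
    }
    return NULL;
}
```

The same SPI may appear in two entries for the same destination when the protocols differ, which is why all three fields are needed to select a single SA.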

An SA can be used in two modes. In transport mode, a portion of the IP header is protected as well as the IPSec header and the data. The IP header is only partially protected because it must be inspected by intermediate routers along the path between two hosts, and it is not possible, or desirable, to require every possible router to run the IPSec protocols. One reason to run security protocols end to end is so intermediate routers do not have to be trusted with the data they are handling. Another reason is that security protocols are often computationally expensive and intermediate routers often do not have the computational power to decrypt and reencrypt every packet before it is forwarded.

Since only a part of the IP header is protected in transport mode, this type of SA provides protection only to upper-layer protocols, those that are completely encapsulated within the data section of the packet, such as UDP and TCP. Figure 13.20 shows a transport-mode SA from Host A to Host D as well as the packet that would result. Host A sets up a normal IP packet with a destination of Host D. It then adds the IPSec header and data. Finally, it applies whatever security protocol has been selected by the user and sends the packet, which travels through Router B to Router C and finally to Host D. Host D decrypts the packet by looking up the security protocol and keys in its security-association database.

Figure 13.20. Security association in transport mode. Key: AH, authentication header; ESP, encapsulating-security payload; SPI, security-parameter index.

The other mode is tunnel mode, shown in Figure 13.21, where the entire packet is placed within an IP-over-IP tunnel [Simpson, 1995]. In tunneling, the entire packet, including all the headers and data, is placed as data within another packet and sent between two locations. Host A wants to send a packet to Host D. When the packet reaches Router B, it is placed in a secure tunnel between Router B and Router C. The entire original packet is placed inside a new packet and secured. The outer IP header identifies only the endpoints of the tunnel (Router B and Router C) and does not give away any of the original packet's header information. When the packet reaches the end of the tunnel at Router C, it is decrypted and then sent on to its original destination of Host D. In this example, neither Host A nor Host D knows that the data have been encrypted, nor do they have to be running the IPSec protocols to participate in this secure communication.

Figure 13.21. Security association in tunnel mode. Key: AH, authentication header; ESP, encapsulating-security payload; SPI, security-parameter index.

Tunnel mode is only used for host-to-router or router-to-router communications and is most often seen in the implementation of virtual private networks that connect two private networks or connect users to a corporate LAN over the public Internet.

Security Protocols

There are two security protocols specified for use with IPSec: the authentication header (AH) and the encapsulating-security payload (ESP), each of which provides different security services [Kent & Atkinson, 1998b; Kent & Atkinson, 1998c]. Both protocols are used with IPv4 and IPv6 without changes to their headers. This dual usage is possible because the packet headers are really IPv6 extension headers that properly encode information about the other protocols following them in the packet.

The AH protocol provides a packet-based authentication service as well as protection against an attacker attempting to replay old data. To understand how AH provides security, it is easiest to look at its packet header, shown in Figure 13.22. The next-header field identifies the type of packet that follows the current header. The next-header field uses the same value as the one that appears in the protocol field of an IPv4 packet: 6 for TCP, 17 for UDP, and 1 for ICMP. The payload length specifies the number of 32-bit words that are contained in the authentication header minus 2. The fudge factor of removing 2 from this number comes from the specification for IPv6 extension headers. The SPI was just explained and is simply a 32-bit number that is used by each endpoint to look up relevant information about the security association.

Figure 13.22. Authentication header.
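The payload-length rule is easiest to check with numbers. The sketch below lays out the fixed part of the header (next header, payload length, a reserved field, SPI, and sequence number, 12 bytes in all) and computes the payload-length field for a given amount of authentication data. The names ah_fixed and ah_payload_len are hypothetical, byte order is ignored, and the 12-byte case corresponds to a truncated 96-bit HMAC, which is a common choice but only an example here.

```c
#include <stdint.h>

/* Fixed part of the authentication header; the variable-length
 * authentication data (the ICV) follows it. */
struct ah_fixed {
    uint8_t  next_header;  /* e.g. 6 = TCP, 17 = UDP, 1 = ICMP */
    uint8_t  payload_len;  /* AH length in 32-bit words, minus 2 */
    uint16_t reserved;
    uint32_t spi;          /* security-parameter index */
    uint32_t seq;          /* sequence number, for antireplay */
};

/* Payload-length value for an AH carrying icv_bytes of authentication
 * data; icv_bytes is assumed to be a multiple of 4. */
uint8_t
ah_payload_len(unsigned icv_bytes)
{
    unsigned total = (unsigned)sizeof(struct ah_fixed) + icv_bytes; /* bytes */
    return (uint8_t)(total / 4 - 2);
}
```

With 12 bytes of authentication data the header is 24 bytes, or six 32-bit words, so the field holds 4.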

Authentication is provided by computing an integrity-check value (ICV) over the packet. If an AH is used in transport mode, then only parts of the IP header are protected because some of the fields are modified by intermediate routers in transit and the changes are not predictable at the sender. In tunnel mode, the whole header is protected because it is encapsulated in another packet, and the ICV is computed over the original packet. The ICV is computed using the algorithm specified by the SA that the SPI identifies, and the result is stored in the authentication-data field of the authentication header. The receiver uses the same algorithm, identified by the SPI, to compute the ICV on the packet it received and compares this value with the one found in the packet's authentication-data field. If the values are the same, then the packet is accepted; otherwise, it is discarded.

One possible attack on a communication channel is to capture data sent by the authentic source and send it again later as if it were new, which is called a replay attack. To guard against a replay attack, the AH protocol uses a sequence-number field to uniquely identify each packet that is transmitted across an SA. This sequence-number field is distinct from the field of the same name in TCP. When an SA is established, both the sender and receiver set the sequence number to zero. The sender increments the sequence number before transmitting a packet. The receiver implements a fixed-size sliding window, with its left edge being the lowest sequence number that it has seen and validated and the right edge being the highest. When a new packet is received, its sequence number is checked against the window with three possible results:

The packet's sequence number is less than the one on the left edge of the window, so the packet is discarded.
The packet's sequence number is within the window. The packet is checked to see if it is a duplicate, and if so is discarded. If the packet is not a duplicate, it is inserted into the window.
The packet's sequence number is to the right of the current window. The ICV is verified, and, if correct, the window is moved to the right to encompass the new sequence number value.
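The three cases above can be sketched with a 64-packet bitmap, a common way to implement a fixed-size window. This is a minimal sketch, not the FreeBSD code: the names are hypothetical, the window is anchored at the highest validated sequence number, and the caller is assumed to pass in whether the ICV verified.

```c
#include <stdbool.h>
#include <stdint.h>

#define REPLAY_WINDOW 64   /* window size in packets; illustrative choice */

struct replay_window {
    uint32_t last;     /* highest validated sequence number (right edge) */
    uint64_t bitmap;   /* bit i set => packet (last - i) already seen */
};

/* Decide whether to accept a packet with sequence number seq.
 * icv_ok says whether its integrity-check value verified correctly.
 * The window is updated only for packets that are accepted. */
bool
replay_accept(struct replay_window *w, uint32_t seq, bool icv_ok)
{
    if (seq == 0)
        return false;                      /* senders start counting at 1 */
    if (seq > w->last) {                   /* right of the window */
        if (!icv_ok)
            return false;
        uint32_t shift = seq - w->last;
        w->bitmap = (shift < REPLAY_WINDOW) ? (w->bitmap << shift) | 1 : 1;
        w->last = seq;                     /* slide the window right */
        return true;
    }
    uint32_t offset = w->last - seq;
    if (offset >= REPLAY_WINDOW)
        return false;                      /* left of the window: too old */
    if (w->bitmap & ((uint64_t)1 << offset))
        return false;                      /* duplicate */
    if (!icv_ok)
        return false;
    w->bitmap |= (uint64_t)1 << offset;    /* record it inside the window */
    return true;
}
```

Checking the ICV before moving the window matters: otherwise a forged packet with a huge sequence number could slide the window and cause legitimate packets to be rejected as too old.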

When the sequence number rolls over, after over 4 billion packets, the security association must be torn down and restarted. This restart is only a slight inconvenience because at gigabit Ethernet rates of 83,000 packets per second it takes over 14 hours for the security sequence number to roll over.
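The 14-hour figure follows directly from the size of the sequence-number field: about 4.29 billion packets divided by 83,000 packets per second is roughly 51,700 seconds, or about 14.4 hours. A one-line helper (hypothetical name) makes the arithmetic easy to redo for other packet rates.

```c
#include <stdint.h>

/* Seconds until a 32-bit sequence number wraps at a given packet rate. */
double
seq_rollover_seconds(double packets_per_second)
{
    return (double)UINT32_MAX / packets_per_second;
}
```

At ten times the packet rate the interval shrinks to well under 2 hours, so the cost of restarting SAs grows with link speed.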

All senders assume that a receiver is using the antireplay service and always increment the sequence number, but the receiver is not required to implement the antireplay service, and it may be turned off at the discretion of the operator of the receiving system.

In addition to the services provided by the AH, the ESP also provides confidentiality using encryption. As with the AH it is easiest to understand the ESP if we examine its packet header, shown in Figure 13.23. The ESP header contains all the same fields as were found in the AH header, but it adds three more. The encrypted data sent using an ESP is stored in the payload-data field of the packet. The padding field that follows the payload data may be used for three purposes:

The encryption algorithm might require that the data to be encrypted be a multiple of some block size. Padding is added to the data so that the chunk of data to be encrypted is the correct size.
Padding might be required to properly align some part of the packet. For example, the pad-length and next-header fields must be right-aligned within a 4-byte word, so that the authentication-data field that follows them is aligned on a 4-byte boundary.
The padding may also be used to obscure the original size of the payload in an attempt to prevent an attacker from gaining information by watching the traffic flow.

Figure 13.23. Encapsulating-security-payload header.
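The first two padding purposes can be combined into one calculation: the payload, the padding, and the 1-byte pad-length and 1-byte next-header fields together must fill a whole number of cipher blocks and end on a 4-byte boundary. The sketch below computes the minimum padding under the assumption that the block size is either 1 (no blocking) or a multiple of 4, as it is for 8- and 16-byte block ciphers; the function name is hypothetical.

```c
#include <stddef.h>

/* Minimum number of padding bytes so that payload + padding + the
 * pad-length and next-header bytes is a multiple of both the cipher
 * block size and 4. cipher_block is assumed to be 1 or a multiple of 4. */
size_t
esp_pad_len(size_t payload_len, size_t cipher_block)
{
    size_t align = cipher_block > 4 ? cipher_block : 4;
    size_t used = payload_len + 2;          /* + pad-length + next-header */
    return (align - used % align) % align;
}
```

For a 100-byte payload and a 16-byte block cipher, 10 bytes of padding bring the total to 112 bytes. Padding used to hide the payload size, the third purpose, would add further multiples of the alignment on top of this minimum.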

Key Management

User-level applications cannot use IPSec in the same way that they use transport protocols like UDP and TCP. For example, an application cannot open a secure socket to another endpoint using IPSec. Instead, all SAs are kept in the kernel and managed using a new domain and protocol family, PF_KEY, whose version 2 interface, PF_KEY_V2, is described in [McDonald et al., 1998].

The automated distribution of keys for use in IPSec is handled by the Internet key exchange (IKE) protocol [Harkins & Carrel, 1998]. User-level daemons that implement the IKE protocol, such as Racoon, interact with the kernel using PF_KEY_V2 sockets [Sakane, 2001]. As these daemons are not implemented in the kernel, they are beyond the scope of this book.

User-level applications interact with the security database by opening a socket in the PF_KEY protocol family. There is no corresponding AF_KEY address family. Key sockets are based on the routing-socket implementation and function much like a routing socket. Whereas the routing-socket API manipulates the kernel routing table, the key-socket API manages security associations and policies. Key sockets support a connectionless-datagram facility between user applications and the kernel. User-level applications send commands in packets to the kernel's security database. Applications can also receive messages about changes to the security database, such as the expiration of security associations, by reading from a key socket.
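Opening a key socket looks like opening any other socket. RFC 2367 specifies the PF_KEY family, the SOCK_RAW type, and PF_KEY_V2 as the protocol; the fallback definition of PF_KEY_V2 below is only there because this sketch does not include the platform's PF_KEY header. The helper name is hypothetical, and the call typically fails unless the caller is privileged.

```c
#include <sys/socket.h>
#include <unistd.h>

#ifndef PF_KEY_V2
#define PF_KEY_V2 2   /* protocol version number from RFC 2367 */
#endif

/* Open a key socket. Returns a descriptor, or -1 if the platform lacks
 * PF_KEY or the kernel refuses (for example, for lack of privilege). */
int
open_key_socket(void)
{
#ifdef PF_KEY
    return socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
#else
    return -1;
#endif
}
```

Commands are then sent with write() or send() on the returned descriptor, and replies and kernel notifications are read back from it.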

The messages that can be sent using a key socket are shown in Table 13.6. Two groups of messages are defined for key sockets: a base set of messages whose names all begin with SADB_ and a set of extension messages whose names begin with SADB_X_. The type of the message is given by the remainder of the name. In FreeBSD, the extension messages manipulate a security-policy database (SPDB) that is separate from the security-association database (SADB).

Table 13.6. PF_KEY messages.
Message type	Description
SADB_GETSPI	retrieve a unique security index from the kernel
SADB_UPDATE	update an existing security association
SADB_ADD	add a new security association with a known security index
SADB_DELETE	delete an existing security association
SADB_GET	retrieve information on a security association
SADB_ACQUIRE	sent to user-level daemons when kernel needs more information
SADB_REGISTER	tell the kernel this application can supply security information
SADB_EXPIRE	sent from the kernel to the application when an SA expires
SADB_FLUSH	tell the kernel to flush all SAs of a particular type
SADB_DUMP	tell the kernel to dump all SA information to the calling socket
SADB_X_PROMISC	this application wants to see all PF_KEY messages
SADB_X_PCHANGE	message sent to passive listeners
SADB_X_SPDUPDATE	update the security policy database (SPDB)
SADB_X_SPDADD	add an entry to the SPDB
SADB_X_SPDDELETE	delete an entry from the SPDB by policy index
SADB_X_SPDGET	get an entry from the SPDB
SADB_X_SPDACQUIRE	message sent by kernel to acquire an SA and policy
SADB_X_SPDDUMP	tell the kernel to dump its policy database to the calling socket
SADB_X_SPDFLUSH	flush the policy database
SADB_X_SPDSETIDX	add an entry in the SPDB by its policy index
SADB_X_SPDEXPIRE	tell the listening socket that an SPDB entry has expired
SADB_X_SPDDELETE2	delete an SPDB entry by policy identifier
Key: SA security association; SADB security-association database; SPDB security-policy database.

Key-socket messages are made up of a base header, shown in Figure 13.24, and a set of extension headers. The base header contains information that is common to all messages. The version ensures that the application will work with the version of the key-socket module in the kernel. The command being sent is encoded in the message-type field. Errors are sent to the calling socket using the same set of headers that are used to send down commands. Applications cannot depend on all errors being returned by a send or write system call made on the socket, and they must check the error number of any returned message on the socket for proper error handling. The errno field is set to an appropriate error number before the message is sent to the listening socket. The type of security association that the application wants to manipulate is placed in the SA-type field of the packet. The length of the entire message, including the base header, all extension headers, and any padding that has been inserted, is stored in the length field in units of 64-bit words. Each message is uniquely identified by its sequence and PID fields, which are used to match responses to requests. When the kernel sends a message to a listening process, the PID is set to 0.

Figure 13.24. PF_KEY base header.
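As an illustration, the sketch below mirrors the base-header layout from RFC 2367 (struct sadb_msg in the system headers) and fills in a header-only request to dump all SAs of one type. The structure, macro, and function names are hypothetical stand-ins chosen so they do not clash with the real header; the numeric values for version 2, SADB_DUMP (10), and the ESP SA type (3) come from RFC 2367. Real programs should include the platform's PF_KEY header instead.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Local mirror of the RFC 2367 base header. */
struct key_base_hdr {
    uint8_t  version;    /* PF_KEY_V2 */
    uint8_t  type;       /* message type, e.g. SADB_DUMP */
    uint8_t  error;      /* error number in replies */
    uint8_t  satype;     /* type of SA, e.g. AH or ESP */
    uint16_t len;        /* total message length in 64-bit words */
    uint16_t reserved;
    uint32_t seq;        /* echoed in replies to match them to requests */
    uint32_t pid;        /* sender's PID; 0 in kernel-originated messages */
};

#define KEY_V2          2    /* values from RFC 2367 */
#define KEY_SADB_DUMP   10
#define KEY_SATYPE_ESP  3

/* Fill in a request, consisting of only the base header, asking the
 * kernel to dump all SAs of the given type. */
void
key_build_dump(struct key_base_hdr *m, uint8_t satype, uint32_t seq)
{
    memset(m, 0, sizeof(*m));
    m->version = KEY_V2;
    m->type = KEY_SADB_DUMP;
    m->satype = satype;
    m->len = sizeof(*m) / 8;     /* 16 bytes = 2 words */
    m->seq = seq;
    m->pid = (uint32_t)getpid();
}
```

The filled-in header would be written to a key socket as a single datagram; the kernel's replies carry the same sequence number and PID.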

The security-association database and security-policy database cannot be changed using only the base header. To make changes, the application adds one or more extension headers to its message. Each extension header begins with a length and a type so that the entire message can be easily traversed by the kernel or an application. An association extension is shown in Figure 13.25. The association extension makes changes to a single security association, such as specifying the authentication or encryption algorithm to be used.

Figure 13.25. PF_KEY association extension.
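Because every extension header starts with a length (in 64-bit words) and a type, a receiver can walk a message without knowing every extension type in advance. The sketch below counts the extensions that follow the base header and rejects a zero or overrunning length; the names are hypothetical, and the two-field prefix mirrors struct sadb_ext from RFC 2367.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Common prefix of every extension header. */
struct key_ext_hdr {
    uint16_t len;    /* extension length in 64-bit words, header included */
    uint16_t type;   /* extension type, e.g. association or address */
};

/* Walk the extensions in buf (the bytes after the base header) and
 * count them; returns -1 if a length field is zero or overruns buf. */
int
key_count_extensions(const uint8_t *buf, size_t buflen)
{
    size_t off = 0;
    int count = 0;

    while (off + sizeof(struct key_ext_hdr) <= buflen) {
        struct key_ext_hdr ext;
        memcpy(&ext, buf + off, sizeof(ext));   /* avoid unaligned access */
        size_t bytes = (size_t)ext.len * 8;
        if (bytes == 0 || off + bytes > buflen)
            return -1;                          /* malformed message */
        count++;
        off += bytes;                           /* jump to the next header */
    }
    return off == buflen ? count : -1;          /* no trailing fragments */
}
```

The check for a zero length is essential: without it, a malformed message would make the loop spin forever on the same header.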

Whenever an association extension is used, an address extension must be present as well, since each security association is identified by the network addresses of the communicating endpoints. An address extension, shown in Figure 13.26, stores information on the IPv4 or IPv6 addresses using sockaddr structures.

Figure 13.26. PF_KEY address extension.
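Since all extension lengths are counted in 64-bit words, the sockaddr inside an address extension is padded to an 8-byte boundary. The sketch assumes an 8-byte fixed part (length, type, protocol, prefix length, and a reserved field, as in RFC 2367's struct sadb_address) and computes the extension length for IPv4 and IPv6 addresses; the helper names are hypothetical.

```c
#include <stddef.h>
#include <netinet/in.h>

/* Round n up to a multiple of 8, as PF_KEY requires for extensions. */
static size_t
align8(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

/* Length, in 64-bit words, of an address extension carrying a sockaddr
 * of salen bytes: an 8-byte fixed part plus the padded sockaddr. */
unsigned
key_addr_ext_words(size_t salen)
{
    return (unsigned)((8 + align8(salen)) / 8);
}
```

A 16-byte sockaddr_in needs no padding and yields 3 words; a 28-byte sockaddr_in6 is padded to 32 bytes and yields 5 words.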

One problem with the current PF_KEY implementation is that it is a datagram protocol, and the message size is limited to 64 kilobytes. A 64-kilobyte limit is not important to users with small databases, but when a system using IPSec is deployed in a large enterprise, with hundreds and possibly thousands of simultaneous security associations, the SADB will grow large, and this limitation makes it more difficult to write user-level daemons to manage the kernel's security databases.

The purpose of key sockets is to manage the security-association database stored in the kernel. Like many other data structures in FreeBSD, security-association structures are really objects implemented in C. Each security-association structure contains all the data related to a specific security association as well as the set of functions necessary to operate on packets associated with it.

The security-association database is stored as a doubly linked list of security-association structures. A security-association structure is shown in Figure 13.27. Each security association can be shared by more than one entity in the system, which is why each structure contains a reference count. Security associations can be in four states: LARVAL, MATURE, DYING, and DEAD. When an SA is first being created, it is put into the LARVAL state, which indicates that it is not currently usable but is still being set up. Once an SA is usable, it moves to the MATURE state. An SA remains in the MATURE state until some event, such as the SA exceeding its lifetime, moves it to the DYING state. SAs in the DYING state can be revived if an application makes a request to use an SA with the same parameters before it is marked as DEAD.

Figure 13.27. Security-association structure.
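The life cycle just described can be written as a small transition check. This sketch permits only the transitions the text mentions and assumes that a revived DYING SA returns to MATURE; a real implementation may allow others (for instance, discarding a LARVAL SA whose setup never completes). The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* The four SA states described in the text. */
enum sa_state { SA_LARVAL, SA_MATURE, SA_DYING, SA_DEAD };

/* Return true if moving an SA from 'from' to 'to' is one of the
 * transitions described in the text. */
bool
sa_transition_ok(enum sa_state from, enum sa_state to)
{
    switch (from) {
    case SA_LARVAL:
        return to == SA_MATURE;                  /* setup finished */
    case SA_MATURE:
        return to == SA_DYING;                   /* e.g. lifetime exceeded */
    case SA_DYING:
        return to == SA_MATURE || to == SA_DEAD; /* revived, or expired */
    case SA_DEAD:
        return false;                            /* can only be reclaimed */
    }
    return false;
}
```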

The security-association structure contains all the information on a particular SA, including the algorithms used, the SPI, and the key data. All this information is used in processing packets for a particular association. The lifetime fields limit the usage of a particular SA. Although an SA is not required to have a lifetime, and so might not expire, recommended practice is to set a lifetime. Lifetimes can be given a time limit using the addtime and usetime fields. Lifetimes can be given a data-processing limit using the bytes field. The three lifetime structures pointed to by the security association encode the current usage for the association as well as its hard and soft limits. When reached, the soft-lifetime value puts the SA into the DYING state to show that its useful life is about to end. When reached, the hard-lifetime value indicates that the SA is no longer usable at all. Once an SA passes the hard-lifetime limit, it is set to the DEAD state and can be reclaimed. The current-lifetime structure contains the present usage values for the SA: for example, how many bytes have been processed since the SA was created.
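A minimal sketch of comparing current usage against the soft and hard limits follows. To keep it short, it assumes that a zero limit means "no limit" and that the time fields hold elapsed seconds; the kernel's own structures may store timestamps instead. All names are hypothetical.

```c
#include <stdint.h>

/* Simplified lifetime record; in a limit, zero means "no limit". */
struct sa_lifetime {
    uint64_t bytes;     /* bytes processed */
    uint64_t addtime;   /* seconds since the SA was created */
    uint64_t usetime;   /* seconds since the SA was first used */
};

enum life_verdict { LIFE_OK, LIFE_SOFT, LIFE_HARD };

/* Nonzero if cur has reached a nonzero limit. */
static int
reached(uint64_t cur, uint64_t limit)
{
    return limit != 0 && cur >= limit;
}

static int
any_reached(const struct sa_lifetime *cur, const struct sa_lifetime *lim)
{
    return reached(cur->bytes, lim->bytes) ||
        reached(cur->addtime, lim->addtime) ||
        reached(cur->usetime, lim->usetime);
}

/* Compare current usage with both limits; the hard limit takes priority. */
enum life_verdict
sa_check_lifetime(const struct sa_lifetime *cur,
    const struct sa_lifetime *soft, const struct sa_lifetime *hard)
{
    if (any_reached(cur, hard))
        return LIFE_HARD;    /* SA should become DEAD */
    if (any_reached(cur, soft))
        return LIFE_SOFT;    /* SA should become DYING */
    return LIFE_OK;
}
```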

Each security-association structure has several tables of functions that point to routines that do the work on packets handled by that association. The tdb_xform table contains pointers to functions that implement the initialization and input and output functions for a particular security protocol such as ESP or AH. The other three tables are specific to a protocol and contain pointers to the appropriate cryptographic functions for handling the protocol being used by the SA. The reason for having this plethora of tables is that the cryptographic subsystem ported from OpenBSD used these tables to encapsulate the functions that do the real work of cryptography. To simplify the maintenance of the code, this set of interfaces and tables was retained during the port. A useful side effect of having these tables is that it makes adding new protocols or cryptographic routines simple. We describe how these tables are used later in this section.
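The value of these function tables is that callers go through a fixed set of entry points regardless of the protocol behind them. The sketch below is a hypothetical, stripped-down switch in the spirit of tdb_xform: each protocol supplies the same entry points, and adding a protocol means adding a row. The placeholder routines only report which one ran, and the lookup by name exists only for the demonstration; in the kernel, the SA holds a pointer to its switch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-protocol switch: every protocol fills in the
 * same set of entry points. */
struct xform_sketch {
    const char *name;
    const char *(*input)(void);    /* would verify or decrypt a packet */
    const char *(*output)(void);   /* would authenticate or encrypt one */
};

/* Placeholder entry points that just report which routine ran. */
static const char *ah_input(void)   { return "ah_input"; }
static const char *ah_output(void)  { return "ah_output"; }
static const char *esp_input(void)  { return "esp_input"; }
static const char *esp_output(void) { return "esp_output"; }

/* Adding a protocol means adding one row; callers do not change. */
static const struct xform_sketch xforms[] = {
    { "ah",  ah_input,  ah_output  },
    { "esp", esp_input, esp_output },
};

/* Find the switch for a protocol name, or NULL if none is registered. */
const struct xform_sketch *
xform_find(const char *name)
{
    for (size_t i = 0; i < sizeof(xforms) / sizeof(xforms[0]); i++)
        if (strcmp(xforms[i].name, name) == 0)
            return &xforms[i];
    return NULL;
}
```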

Key sockets are implemented in the same way as other socket types. There is a domain structure, keydomain; a protocol-switch structure, keysw; a set of user-request routines, key_usrreqs; and an output routine, key_output(). Only those routines necessary for a connectionless-datagram type of protocol are implemented in the key_usrreqs structure. Any attempt to use a key socket in a connection-oriented way, for instance by calling connect() on a key socket, will result in the kernel returning EINVAL to the caller.

When an application writes to a key socket, the message is eventually transferred down into the kernel and is handled by the key_output() routine. After some rudimentary error checking the message is passed to key_parse(), which does more error checks, and then is finally shuttled off through a function-pointer switch called key_types. The functions pointed to by key_types are those that do the manipulation of the security-association and security-policy databases.
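A function-pointer switch indexed by message type keeps this dispatch step small. The sketch below is hypothetical, not the kernel's key_types table: it registers handlers only for types 3 and 4 (SADB_ADD and SADB_DELETE in RFC 2367), and it returns EINVAL for anything it does not handle, a choice made for the sketch rather than taken from the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler type: each receives the parsed message. */
typedef int (*key_handler)(const void *msg);

static int handle_add(const void *msg)    { (void)msg; return 0; } /* would add an SA */
static int handle_delete(const void *msg) { (void)msg; return 0; } /* would delete one */

/* Table indexed by message type; unlisted entries are NULL. */
static const key_handler handlers[] = {
    [3] = handle_add,      /* SADB_ADD */
    [4] = handle_delete,   /* SADB_DELETE */
};

/* Dispatch a message by type; unknown or unsupported types get EINVAL. */
int
key_dispatch(uint8_t type, const void *msg)
{
    if (type >= sizeof(handlers) / sizeof(handlers[0]) || handlers[type] == NULL)
        return EINVAL;
    return handlers[type](msg);
}
```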

If the kernel needs to send a message to listening applications because of changes in the security databases, it uses the key_sendup_mbuf() routine to copy the message to one or more listening sockets. Each socket receives its own copy of the message.

IPSec Implementation

The IPSec protocols affect all areas of packet handling in the IPv4 and IPv6 protocol stacks. In some places, IPSec uses the existing networking framework, and in others, direct callouts are made to do some part of the security processing. We will look at three of the possible paths through the IPv4 stack: inbound, outbound, and forwarding.

One twist that IPSec adds to normal packet processing is the need to process some packets more than once. An example is the arrival of an encrypted packet bound for the current system. The packet will be processed once in its encrypted form and then a second time, by the same routines, after it has been decrypted. This multipass processing is unlike regular TCP or UDP processing, where the IP header is stripped from the packet and the result is handed to the TCP or UDP modules for processing and eventual delivery to a socket. This continuation style of processing packets is one reason that the IPSec software makes extensive use of packet tags.

Another reason to use packet tags is that parts of IPSec, namely the cryptographic algorithms, can be supported by special-purpose hardware accelerators. A hardware accelerator may do all or part of the security processing, such as checking a packet's authentication information or decrypting the packet payload, and then pass the resulting packet into the protocol stack for final delivery to a waiting socket. The hardware needs some way to tell the protocol stack that it has completed the necessary work. It is neither possible nor desirable to store this information in the headers or data of the packet. Adding such information to a packet's header is an obvious security hole because a malicious sender could simply set the appropriate field and bypass the security processing. It would have been possible to extend the mbuf structure to handle this functionality, but packet tags are a more flexible way of adding metadata to packets without modifying a key data structure in the network stack. The tags used by IPSec are described in Table 13.7.

Table 13.7. IPSec packet tags.
Tag	Description
IPSEC_IN_DONE	Inbound IPSec processing complete.
IPSEC_OUT_DONE	Outbound IPSec processing complete.
IPSEC_IN_CRYPTO_DONE	Inbound IPSec processing handled by hardware.
IPSEC_OUT_CRYPTO_DONE	Outbound IPSec processing handled by hardware.
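The sketch below shows why tags work well here: they live beside the packet rather than in its bytes, so a remote sender cannot set them. It models a packet as a hypothetical struct with a linked list of tags and checks for IPSEC_OUT_DONE before deciding whether the packet still needs IPSec output processing. The tag names come from Table 13.7, but their numeric values, the structures, and the helper names are invented for illustration; the kernel attaches real tags to mbufs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Tag types from Table 13.7 (numeric values are arbitrary here). */
enum ipsec_tag { IPSEC_IN_DONE = 1, IPSEC_OUT_DONE,
    IPSEC_IN_CRYPTO_DONE, IPSEC_OUT_CRYPTO_DONE };

/* Hypothetical packet with a list of metadata tags kept outside the
 * packet data, standing in for an mbuf and its packet tags. */
struct tag { enum ipsec_tag type; struct tag *next; };
struct pkt { struct tag *tags; };

static bool
pkt_has_tag(const struct pkt *p, enum ipsec_tag t)
{
    for (const struct tag *tg = p->tags; tg != NULL; tg = tg->next)
        if (tg->type == t)
            return true;
    return false;
}

/* Attach a caller-supplied tag to the front of the packet's list. */
void
pkt_add_tag(struct pkt *p, struct tag *tg, enum ipsec_tag t)
{
    tg->type = t;
    tg->next = p->tags;
    p->tags = tg;
}

/* A packet already tagged IPSEC_OUT_DONE has been through IPSec and
 * can be transmitted like any other packet. */
bool
needs_ipsec_output(const struct pkt *p)
{
    return !pkt_has_tag(p, IPSEC_OUT_DONE);
}
```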

As we saw in Section 13.3, when an IPv4 packet is received by the kernel, it is initially processed by ip_input(). The ip_input() routine does two checks on packets that are related to IPSec. The first is to see if the packet is really part of a tunnel. If a packet is being tunneled and has already been processed by the IPSec software, it can bypass any filtering by filter hooks or the kernel firewall code. The second check is done when a packet is to be forwarded. Routers can implement security policies on packets that are forwarded. Before a packet is passed to ip_forward(), it is checked by calling the ipsec_getpolicy() function to see if there is a policy that is associated with the packet itself. The ipsec_getpolicybyaddr() function is called to check if there is a policy associated with the address of the packet. If either function returns a pointer to a policy routine, the packet is passed to that policy routine to be checked. If the packet is rejected, it is silently dropped and no error is returned to the sender.

When ip_input() has determined that the packet is valid and is destined for the local machine, the protocol-stack framework takes over. The packet is passed to the appropriate input routine using the pr_input field of the inetsw structure. Although packets using different protocols have different entry points, they eventually wind up being passed to a single routine, ipsec_common_input(), for processing. The ipsec_common_input() routine attempts to find the appropriate security-association structure for the packet based on its destination address, the security protocol it is using, and the SPI. If an appropriate association is found, then control is passed to the input routine contained in the SA's xform-switch structure. The security protocol's input routine extracts all the relevant data from the packet (for example, the key being used) and creates a cryptography-operation descriptor. This descriptor is then passed into the cryptographic routines. When the cryptographic routines have completed their work, they call a protocol-specific callback routine, which modifies the mbufs associated with the packet so that it may now be passed, unencrypted, back into the protocol stack via the ip_input() routine.

Applications do not know that they are using IPSec to communicate with other hosts in the Internet. For outbound packets, the use of IPSec is really controlled from within the ip_output() routine. When an outbound packet reaches the ip_output() routine, a check is made to see if there is a security policy that applies to the packet, either because of its destination address or because of the socket that sent it. If a security policy is found, then the packet is passed into the IPSec code via the ipsec4_process_packet() routine. If a security association has not been set up for this particular destination, one is created in the security-association database for it. The ipsec4_process_packet() routine uses the output() routine from the xform switch in the security association to pass the packet to the security protocol's output routine. The security protocol's output routine uses the appropriate cryptographic routine to modify the packet for transmission. Once the packet has been modified appropriately, it is passed again into ip_output(), but with the tag PACKET_TAG_IPSEC_OUT_DONE attached to it. This tag marks the packet as having completed IPSec processing, showing that it can now be transmitted like any other packet.

Cryptography Subsystem

Underlying all the security protocols provided by IPSec is a set of APIs and libraries that support cryptography. The cryptographic subsystem in FreeBSD supports both symmetric and asymmetric cryptography. Symmetric cryptography, used by IPSec, uses the same key to encrypt data as it does to decrypt it. Asymmetric cryptography, which implements public key encryption, uses one key to encrypt data and another key to decrypt it. This section describes how symmetric cryptography is implemented as it relates to a specific client, IPSec.

The cryptographic subsystem was ported from OpenBSD and optimized for a fully preemptive, SMP kernel [Leffler, 2003b]. In FreeBSD, cryptographic algorithms exist either in software or special-purpose hardware. The software module that provides support for cryptography is implemented in exactly the same way as the drivers for cryptographic hardware. This similarity means that, from the cryptography subsystem's point of view, the software and hardware drivers are the same. Upper-level users of the cryptography subsystem, such as IPSec, are all presented with the same API whether the cryptographic operations they request are being done in hardware or software.

The cryptography subsystem is implemented by two sets of APIs and two kernel threads. One set of APIs is used by software that wishes to use cryptography; the other set is used by device-driver writers to provide an interface to their hardware. The model of computation supported by the cryptographic subsystem is one of job submission and callbacks where users submit work to be done to a queue and supply a pointer to a function that will be called when the job is completed.

Before a cryptography user can submit work to the cryptography subsystem, it must first create a session. A session is a way of encapsulating information about the type of work that the user is requesting. It is also a way of controlling the amount of resources consumed on the device, since some devices have a limit on the amount of concurrent work they can support. A user creates a session using the crypto_newsession() routine, which returns either a valid session identifier or an error.

Once the user has a proper session identifier, they then request a cryptographic descriptor, shown in Figure 13.28. The user fills in the fields of the cryptographic descriptor, including supplying an appropriate callback in the crp_callback element. When the descriptor is ready, it is handed to the cryptographic subsystem via the crypto_dispatch() routine that puts it on a queue to be processed. When the work is complete, the callback is invoked. All callbacks are of the form:

    int (*crp_callback)(struct cryptop *arg);

Figure 13.28. Cryptographic descriptor.

If an error has occurred, the error code is contained in the crp_etype field of the cryptographic descriptor that is passed to the callback.
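The session, dispatch, and callback model can be imitated in user space to see the control flow. Everything below is a hypothetical mock, not the kernel API: mock_cryptop stands in for struct cryptop, mock_dispatch() only queues the request, as crypto_dispatch() is described as doing, and mock_run_queue() plays the role of the thread that later performs the work and invokes each callback, reporting failures through the etype field.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cryptop: just enough fields to
 * show the submit-then-callback model. */
struct mock_cryptop {
    int   sid;                                /* session identifier */
    int   etype;                              /* error code, 0 on success */
    int (*callback)(struct mock_cryptop *);   /* called when work is done */
    struct mock_cryptop *next;                /* queue linkage */
};

static struct mock_cryptop *queue_head, **queue_tail = &queue_head;
static int next_sid = 1;

/* Create a session; a real subsystem would choose a driver here. */
int mock_newsession(void) { return next_sid++; }

/* Queue a request; the work happens later, not in this call. */
void
mock_dispatch(struct mock_cryptop *crp)
{
    crp->next = NULL;
    *queue_tail = crp;
    queue_tail = &crp->next;
}

/* Drain the queue: "perform" each request, then run its callback.
 * Requests naming an unknown session fail with etype = EINVAL. */
int
mock_run_queue(void)
{
    int n = 0;
    while (queue_head != NULL) {
        struct mock_cryptop *crp = queue_head;
        queue_head = crp->next;
        if (queue_head == NULL)
            queue_tail = &queue_head;
        crp->etype = (crp->sid > 0 && crp->sid < next_sid) ? 0 : EINVAL;
        crp->callback(crp);
        n++;
    }
    return n;
}

/* Example callback: record the outcome for inspection. */
static int last_etype = -1;
static int
record_result(struct mock_cryptop *crp)
{
    last_etype = crp->etype;
    return 0;
}
```

Note that nothing happens when the request is dispatched; the caller learns the outcome only when its callback runs, which is why the error code travels in the descriptor rather than as a return value.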

A set of device drivers provides the low-level interface to specialized cryptographic hardware. Each driver provides three function pointers to the cryptographic subsystem when it registers itself. Driver registration is done via a call to the crypto_register() routine.

    crypto_register(
        u_int32_t driverid,
        int alg,
        u_int16_t maxoplen,
        u_int32_t flags,
        int (*newsession)(void *, u_int32_t *, struct cryptoini *),
        int (*freesession)(void *, u_int64_t),
        int (*process)(void *, struct cryptop *, int),
        void *arg);

The newsession() routine is called by the cryptographic subsystem whenever the crypto_newsession() routine is called by a user. The freesession() routine is called whenever the crypto_freesession() routine is called by a user, and the process() routine is called by the crypto_proc() kernel thread to pass operations into the device.

The lower half of the cryptographic subsystem uses two software interrupt threads and two queues to control the underlying hardware. Whenever there are requests on the crp_q queue, the crypto_proc() thread dequeues them and sends them to the underlying device using the crypto_invoke() routine. Once invoked, the underlying hardware has the responsibility to handle the request. The only requirement is that when the hardware has completed its work, the device driver associated with the hardware must invoke crypto_done(), which either enqueues the callback on the crp_ret_q queue or, more rarely, directly calls the user's callback. The crp_ret_q queue is provided because the crypto_done() routine often will be called from an interrupt context, and running the user's callback with interrupts locked out will degrade the interactive performance of the system. When running in an interrupt context, the callback will be queued and then handled later by the crypto_ret_proc software interrupt thread. This use of queues and software interrupt threads effectively decouples the kernel from any possible performance issues introduced by a variety of cryptographic hardware.
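The choice made at completion time can be sketched as follows. In this hypothetical model, sketch_crypto_done() runs the callback immediately when it is not called from interrupt context or when the request is flagged as having a cheap callback, and otherwise defers it to a return queue that a separate routine, standing in for the crypto_ret_proc thread, drains later. The flag, structure, and function names are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request: a flag marks callbacks short enough to run
 * directly, even from interrupt context. */
struct req {
    bool direct_callback;
    void (*callback)(struct req *);
    struct req *next;
};

static struct req *ret_q;   /* stand-in for crp_ret_q (LIFO for brevity) */

/* Called by a "driver" when the hardware finishes a request. */
void
sketch_crypto_done(struct req *r, bool in_interrupt)
{
    if (!in_interrupt || r->direct_callback) {
        r->callback(r);     /* cheap enough, or safe, to run right now */
    } else {
        r->next = ret_q;    /* defer to the return thread */
        ret_q = r;
    }
}

/* What the return thread does later: run the deferred callbacks. */
int
sketch_ret_proc(void)
{
    int n = 0;
    while (ret_q != NULL) {
        struct req *r = ret_q;
        ret_q = r->next;
        r->callback(r);
        n++;
    }
    return n;
}

/* Example callback that counts completions. */
static int completed;
static void count_done(struct req *r) { (void)r; completed++; }
```

The deferred path costs an extra context switch per request, which is exactly the overhead that motivates letting short callbacks take the direct path.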

Unfortunately, there are several problems with the system just described:

Using multiple threads requires two context switches per cryptographic operation. The context switches are nontrivial and severely degrade throughput.
Some callback routines do little work, and so moving all callbacks out of the device driver's interrupt service routine adds another context switch that is expensive and unnecessary.
The dispatch queue batches operations, but many users of the cryptographic subsystem, including IPSec, do not batch operations, so this shunting of work into the dispatch queue is unnecessary overhead.

To address these performance problems, several changes were made to the cryptographic subsystem. Cryptographic drivers are now supplied a hint, when work is submitted to them, about whether more work will follow. The drivers can decide whether to batch work based on this hint and, where requests are not batched, completely bypass the crp_q queue. Cryptographic requests whose callback routines are short are marked so that the underlying device invokes their callbacks directly instead of queueing them on the crp_ret_q queue. The optimization of bypassing the crp_ret_q queue is especially useful to users of the /dev/crypto device, whose callback routine awakens only the thread that wrote to it. All these optimizations are described more fully in Leffler [2003b].