Section 3.3. Common Internet File System and Server Message Blocks | Inside Windows Storage: Server Storage Technologies for Windows 2000, Windows Server 2003 and Beyond

3.3 Common Internet File System and Server Message Blocks

The Common Internet File System (CIFS) has its roots in the Server Message Block technology first introduced in the days of MS-DOS 3.3. Server Message Block is popularly referred to as SMB. SMB defines a protocol for a client to send file-system-oriented requests (open file, read, write, lock, and close) to a file server.

Before we dive into the technical details of CIFS and SMB, a small digression explaining the political difference between the two is in order. Originally there was only SMB technology, used as a client/server file system protocol in the PC world. In the mid-1990s Microsoft rechristened its implementation of SMB as CIFS and positioned CIFS as a competitor to both WebNFS and NFS. Microsoft provided an informational RFC (Request for Comments) to the Internet Engineering Task Force (IETF) ^[2] and subsequently let it expire without ever attempting to move the RFC onto any kind of IETF specifications track.

^[2] Yours truly was a coauthor of the RFC.

Independently of Microsoft's work, network-attached storage vendors started a movement to create a CIFS specification and organized CIFS trade shows and plug fests. The Storage Networking Industry Association (SNIA) took on the task of publishing a CIFS specification. Microsoft also made available a CIFS specification (called "Common Internet File System File Access Protocol") on a royalty-free basis (see the reference list at the end of the book).

The SNIA CIFS specification and the Microsoft CIFS specification are fairly similar, and both cover the protocol used by Windows NT 4.0 clients to interact with Windows NT servers. Neither specification covers new SMBs used by newer versions of Windows (such as Windows 2000 client-side caching, described in Section 3.2.6), and neither covers all the server-to-server communications protocols. The newer SMBs, not included in the specification that is available without any royalty, are covered in an SMB specification that Microsoft has voluntarily made available on a royalty basis as part of its legal proceedings with the EU and the United States government. See the reference titled "Microsoft Settlement Program: Communications Protocol Program" on the Microsoft Web site (http://www.microsoft.com/legal/protocols) for further details.

To summarize, Microsoft now appears once again to be referring to its own implementation as SMB, a proprietary protocol that is a superset of the industry standard CIFS.

It is also worth noting the historical association between SMB/CIFS and NetBIOS. NetBIOS is a session-layer network application programming interface that is now relegated to historical significance. The API provides an abstraction, allowing an application to run with a variety of network protocols, such as TCP/IP, NetWare, or the now historical transport protocol XNS (Xerox Network System). The need for an API that provides the ability to write network-aware applications in a transport-independent manner still exists. However, this need is now largely filled by the sockets interface in general and, in the Windows world, the Winsock interface in particular.

Microsoft originally used NetBIOS for name resolution (resolving a server name to a network address), but now it uses the industry-standard Domain Name System (DNS) for this purpose.

The original Microsoft implementation did not use TCP/IP as the transport protocol beneath the NetBIOS layer. Microsoft next moved to using TCP/IP as the transport protocol (NetBIOS over TCP/IP), which reduced, but did not remove, the dependence on NetBIOS. With the assignment of a dedicated TCP port for SMB file servers (port 445), the dependence on NetBIOS was eliminated, at least as far as the core protocol was concerned. However, the situation remained confusing because some secondary services used by Windows clients and servers were still NetBIOS based. A good example is how servers announce their presence and the services they offer, and how other servers amalgamate these announcements for the benefit of clients. Over time these services were redesigned, and with Windows 2000 the dependence on NetBIOS was finally eliminated altogether.

Finally, the SMB roots can be seen in the fact that every CIFS request and response must begin with the byte 0xFF, followed by the ASCII characters "SMB".

3.3.1 CIFS Flavors

There really is no concrete definition of a CIFS standard. Various flavors of SMB protocols are referred to as dialects. Here are just a few:

A flavor used by DOS and Windows 3.X clients
A flavor used to connect to non-Windows servers
A flavor used by Windows NT clients

In general, a client sends out a negotiate request to the server and lists all the flavors it implements. The server picks the implementation with the highest functionality it can support and sends an appropriate response. Depending on the flavor negotiated, certain requests and their corresponding responses may or may not be legal. To make things even more confusing, the flavor negotiated does not completely define the implementation. Certain bits can be turned on or off in the response to indicate whether or not a particular functionality is supported. In other words, even with a certain protocol negotiated, variations exist. For example, one bit indicates whether or not long file names are supported.
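The negotiation just described can be sketched in a few lines of Python. This is a minimal illustration, not Microsoft's implementation: the dialect strings are ones that appear in the CIFS specifications, but the server's preference ranking and the function name `pick_dialect` are assumptions made for the example. The server answers with the index of the chosen dialect in the client's list, or 0xFFFF if none of the offered dialects is acceptable.

```python
# Dialect strings as they appear in SMB negotiate requests.
# The ranking below (least to most capable) is an illustrative assumption.
SERVER_PREFERENCE = [
    "PC NETWORK PROGRAM 1.0",
    "LANMAN1.0",
    "LM1.2X002",
    "LANMAN2.1",
    "NT LM 0.12",
]

NO_DIALECT = 0xFFFF  # returned when no offered dialect is acceptable


def pick_dialect(client_dialects: list[str],
                 server_supported: list[str] = SERVER_PREFERENCE) -> int:
    """Return the index (into the client's list) of the most capable
    dialect the server also supports, or NO_DIALECT if there is none."""
    best_index = NO_DIALECT
    best_rank = -1
    for index, name in enumerate(client_dialects):
        if name in server_supported:
            rank = server_supported.index(name)
            if rank > best_rank:
                best_index, best_rank = index, rank
    return best_index
```

For a client offering `["PC NETWORK PROGRAM 1.0", "LANMAN1.0", "NT LM 0.12"]`, the sketch returns 2, selecting the NT dialect. The capability bits that refine the chosen dialect would then travel in the negotiate response itself.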

As defined by Microsoft in the informational RFC ^[3] (which, according to the IETF's rules, is now technically out of date), the CIFS protocol defines client and server interaction for file access and manipulation. Other functionality, such as printing and server announcements, is outside the purview of CIFS.

^[3] Yours truly was a coauthor of the RFC.

SNIA is working on providing a CIFS specification. SNIA also holds an annual CIFS conference and hosts several interoperability events that include CIFS interoperability.

SMB has been an Open Group standard since 1992 (X/Open CAE Specification C209). The Open Group defined SMB as a specification for the purpose of providing interoperability among DOS, Windows, OS/2, and UNIX.

3.3.2 CIFS Protocol Description

CIFS requests and responses have a basic structure that is well defined. The fields within the SMB themselves are well defined, with some variations depending on the CIFS dialect negotiated and the capabilities implemented by the client and the server.

Table 3.1 shows the overall structure of an SMB. Note that only the parts that are present in all SMBs are shown. The details of each individual SMB are outside the scope of the discussion.

Some of the fields in Table 3.1 bear more explanation than what is provided in the "Description" column of the table.

The command field is 1 byte long and indicates the nature of the request, and the server faithfully copies this value into the response, allowing the client to analyze the response. The CIFS specification lists the values and definitions for this 1-byte field. The commands specified allow for operations such as opening a file, reading a file, writing a file, and locking a byte range in a file. All of these operations are initiated in response to an application request.

In addition, some CIFS client requests (and their corresponding server responses) are initiated by the redirector code without any explicit participation by the application. Examples are requests related to caching and opportunistic locking, which are explained in Section 3.3.5. The CIFS RFC and the SNIA specification, as well as the Open Group specification, define the values and semantics for the 1-byte CIFS command code.

Table 3.1. SMB Header Structure

Field	Size	Description
0xFFSMB	4 bytes	Protocol identifier; always the byte 0xFF followed by the ASCII characters "SMB".
Command	1 byte	Indicates the nature of the request.
Status	4 bytes	Either a 32-bit Windows NT-style error code (preferred; generated by Windows NT servers and returned to clients that understand 32-bit error codes) or, for the benefit of older clients that do not understand 32-bit error codes, an old-style error structure consisting of: an 8-bit error class indicating success or the nature of the error (an error reported by the server operating system, an error reported by the server, a hardware error, or an SMB protocol error); 8 ignored bits; and a 16-bit error code that is meaningful only if the error class indicates an error of some kind.
Flags	1 byte	Semantics explained in Table 3.2.
Flags2	2 bytes	Semantics explained in Table 3.3.
Pad/Signature	12 bytes	Pad/Signature; described in the text.
Tid	2 bytes	Used to identify the share/server resource that the request is for; established via the TreeConnect SMB request.
Pid or Process Id	2 bytes but can optionally be 4 bytes	Set by the client; indicates the client process that is making the request; used by the server to track file open mode and locks; echoed back by the server and, together with Mid, uniquely identifies which one of multiple outstanding requests the server response is for.
Mid or Multiplexer Id	2 bytes	Set and used by the client; server faithfully echoes Mid in response; the client uses Mid and Pid to identify which one of multiple pending requests the response is for.
Uid	2 bytes	Assigned by the server once the client has been authenticated; the client needs to use this in all requests.
Parameters	variable	Consists of an 8-bit word count indicating the number of 16-bit words that follow, followed by those words. For each SMB command, this count is usually fixed, with one word count for the request and a second word count for the response. The word count is usually small, at most a few dozen words.
Data	variable	Consists of a 16-bit count indicating the number of bytes (8-bit bytes) of data that follow, followed by that data. Compared to the Parameters field, the Data field can hold a much larger amount, for example in the kilobyte range or even more. For a read or write SMB, this data is the actual file data being read or written.
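As a sketch of how a receiver might walk the variable-length part of an SMB, the following Python function reads the Parameters and Data fields that follow the fixed header. It assumes, as the CIFS specifications do, a 32-byte header, a single-byte word count, and a 16-bit little-endian byte count. The function name is illustrative, and the sketch ignores chained (AndX) requests.

```python
import struct

HEADER_SIZE = 32  # fixed-size SMB header preceding Parameters and Data


def split_parameters_and_data(smb: bytes) -> tuple[list[int], bytes]:
    """Return (parameter_words, data_bytes) for a single, unchained SMB."""
    offset = HEADER_SIZE
    word_count = smb[offset]                    # number of 16-bit words
    offset += 1
    words = list(struct.unpack_from(f"<{word_count}H", smb, offset))
    offset += 2 * word_count
    (byte_count,) = struct.unpack_from("<H", smb, offset)  # number of data bytes
    offset += 2
    data = smb[offset:offset + byte_count]
    if len(data) != byte_count:
        raise ValueError("SMB truncated: data shorter than byte count")
    return words, data
```

Because the word count tells the receiver exactly where the Data field begins, a parser can locate the data even for commands whose parameter layout it does not understand.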

Keep in mind that new values and semantics for the 1-byte command field may appear without notice as the protocol evolves with future versions of Windows.

There are multiple commands to achieve the same basic operation; for example, multiple commands exist for opening a file as well as for reading and writing a file. Some are no longer used; in other cases, different commands may be used, depending on the protocol dialect negotiated.

Examples of the range of functionality specified by this command field include the following:

Negotiating an SMB dialect.
Establishing a session.
Traversing a directory and enumerating a file or directory.
Opening, creating, closing, or deleting a file.
Byte range locking and unlocking.
Multiple flavors of read and write operations.
Printing operations.
File and directory change notifications.
Transaction operations in which data, parameters, and a transaction operation are specified. The CIFS server performs the requested operation and returns the result, data, and parameters. Examples of transaction operations include, but are not limited to, a distributed file system (Dfs) referral and manipulation of extended attributes.
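To tie the list above to concrete command bytes, the following Python enumeration maps several of these operations to their command codes. The numeric values follow the public CIFS specifications, but the selection is illustrative rather than complete, and the member names are shortened from the specification's `SMB_COM_...` names. Change notifications, directory searches, and Dfs referrals do not appear separately because they are carried as subcommands of the transaction commands.

```python
from enum import IntEnum


class SmbCommand(IntEnum):
    """A small, illustrative subset of SMB/CIFS command codes."""
    # Dialect negotiation, sessions, and tree (share) connections
    NEGOTIATE = 0x72
    SESSION_SETUP_ANDX = 0x73
    LOGOFF_ANDX = 0x74
    TREE_CONNECT_ANDX = 0x75
    TREE_DISCONNECT = 0x71
    # Opening, closing, and deleting files
    OPEN_ANDX = 0x2D
    NT_CREATE_ANDX = 0xA2
    CLOSE = 0x04
    DELETE = 0x06
    # Several flavors of read and write
    READ = 0x0A
    WRITE = 0x0B
    READ_ANDX = 0x2E
    WRITE_ANDX = 0x2F
    # Byte-range locking and unlocking
    LOCKING_ANDX = 0x24
    # Transactions (carry directory searches, Dfs referrals, change notify, ...)
    TRANSACTION = 0x25
    TRANSACTION2 = 0x32
    NT_TRANSACT = 0xA0
    # Printing
    OPEN_PRINT_FILE = 0xC0
    WRITE_PRINT_FILE = 0xC1
    CLOSE_PRINT_FILE = 0xC2


def command_name(code: int) -> str:
    """Return a readable name for a command byte, or a hex placeholder."""
    try:
        return SmbCommand(code).name
    except ValueError:
        return f"UNKNOWN_0x{code:02X}"
```

Returning a placeholder instead of raising reflects the warning above that new command values may appear as the protocol evolves.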

Table 3.2 describes the functionality of the Flags field listed in Table 3.1.

The Flags2 field in Table 3.1 indicates even more optional functionality. This functionality is summarized in Table 3.3.

The Pad/Signature field originally started out being just pad bytes. Over the years, this field has evolved. The Pad/Signature field can consist of the following:

2 bytes of a Process Id in order to allow a 32-bit Process Id
8 bytes used for storing the signature when SMB signing (see the Flags2 description in Table 3.3 and Section 3.3.3) is turned on
2 unused bytes

3.3.3 CIFS Security

CIFS enforces security at the server. An administrator can disable this security if so desired, but this is hardly ever done and the default option enforces security.

Older CIFS dialects allowed a client to send a plain-text password, but that is now strongly discouraged. CIFS allows a resource at the server to be protected by a user-specific password (called user-level security). This is the preferred method of security. For backward compatibility, CIFS servers also allow a resource to be protected by a single password that is the same irrespective of the user. Because the resource in question is deemed to be offered as a "share" by the server, this is referred to as share-level security. Share-level security is highly discouraged and in fact has been removed from Windows 2000 Server. The first SMB that a client sends to a server is always the SMB_NEGOTIATE_PROTOCOL SMB, which is used to negotiate the CIFS dialect to be used. The response to the SMB_NEGOTIATE_PROTOCOL SMB indicates whether the server has been configured for user-level or share-level security.

Table 3.2. Flags Field Semantics

Value	Description
0x01	Reserved; used by obsolete requests
0x02	Reserved; must be zero
0x04	Reserved; must be zero
0x08	Indicates that pathnames and file names should be treated as case insensitive; if clear, they are case sensitive
0x10	Reserved; used by obsolete requests
0x20	Reserved, used by obsolete requests
0x40	Reserved; used by obsolete requests
0x80	Indicates that this is an SMB response

Table 3.3. Flags2 Field Semantics

Value	Description
0x0001	Client understands long file names; server may return long file names
0x0002	Client understands OS/2 extended attributes
0x0004	SMB signing is turned on
0x0008	Reserved
0x0010	Reserved
0x0020	Reserved
0x0040	Any path name in the request is a long name
0x0080	Reserved
0x0100	Reserved
0x0200	Reserved
0x0400	Reserved
0x0800	Indicates extended security, which is described in Section 3.3.4
0x1000	Pathnames in the request should be resolved with Dfs
0x2000	Paging I/O indicating that reads should be allowed if client has execute access
0x4000	Indicates 32-bit error code being returned; if clear, indicates old DOS-style error
0x8000	If set, indicates that pathnames in SMB are Unicode; if clear, pathnames are ASCII
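A receiver typically turns these bit masks into named features. The sketch below decodes the Flags2 bits from Table 3.3 plus the response bit (0x80) from Table 3.2, using Python's `IntFlag`. The member names are abbreviations chosen for the example, and reserved bits are deliberately left out.

```python
from enum import IntFlag


class Flags2(IntFlag):
    """Flags2 bits from Table 3.3 (reserved bits omitted)."""
    LONG_NAMES = 0x0001
    EAS = 0x0002
    SECURITY_SIGNATURE = 0x0004
    IS_LONG_NAME = 0x0040
    EXTENDED_SECURITY = 0x0800
    DFS = 0x1000
    PAGING_IO = 0x2000
    NT_STATUS = 0x4000
    UNICODE = 0x8000


FLAGS_REPLY = 0x80  # Flags bit (Table 3.2): set in every SMB response


def describe(flags: int, flags2: int) -> list[str]:
    """List the direction of the SMB and the Flags2 features it signals."""
    names = ["RESPONSE" if flags & FLAGS_REPLY else "REQUEST"]
    names += [bit.name for bit in Flags2 if flags2 & bit]
    return names
```

For example, `describe(0x80, 0xC001)` yields `["RESPONSE", "LONG_NAMES", "NT_STATUS", "UNICODE"]`: a response carrying a 32-bit status code and Unicode path names from a peer that understands long file names.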

Starting with Windows NT 4.0 SP3, and continuing in Windows 2000, Microsoft offered the capability to place a digital signature on SMBs exchanged between a client and a server. A server may be configured to require that the client implement this; otherwise the client will be denied access. The signing does place an overhead on both the client and the server, but that is the price one must pay for security. Note that this signing and verification is two-way: the client signs the SMB requests it sends and the server verifies this signature; likewise, the server signs each SMB response it sends and the client verifies this signature. Referring back to Table 3.1, the Pad/Signature field of the SMB header is where the signature is carried in each SMB.
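The signature computation itself is not spelled out here. The following Python function is a hedged sketch based on the publicly documented CIFS signing scheme. Under that scheme, the 8-byte signature slot temporarily holds a per-message sequence number, and the signature is the first 8 bytes of an MD5 hash over a session-derived key followed by the whole message. The sketch shows how a sender could fill in, and a receiver could check, the signature carried in the Pad/Signature field. How `signing_key` is derived during authentication, and how each side tracks sequence numbers, is assumed rather than shown.

```python
import hashlib
import hmac
import struct

SIGNATURE_OFFSET = 14   # 4 (protocol) + 1 + 4 + 1 + 2 + 2 (high half of Pid)
SIGNATURE_LEN = 8


def compute_signature(signing_key: bytes, smb: bytes, seq: int) -> bytes:
    """First 8 bytes of MD5(key + message), computed with the signature
    slot temporarily holding the sequence number."""
    msg = bytearray(smb)
    msg[SIGNATURE_OFFSET:SIGNATURE_OFFSET + SIGNATURE_LEN] = struct.pack("<II", seq, 0)
    return hashlib.md5(signing_key + bytes(msg)).digest()[:SIGNATURE_LEN]


def sign(signing_key: bytes, smb: bytes, seq: int) -> bytes:
    """Return a copy of the SMB with its signature field filled in."""
    msg = bytearray(smb)
    msg[SIGNATURE_OFFSET:SIGNATURE_OFFSET + SIGNATURE_LEN] = compute_signature(
        signing_key, smb, seq)
    return bytes(msg)


def verify(signing_key: bytes, smb: bytes, seq: int) -> bool:
    """Receiver side: recompute and compare in constant time."""
    received = smb[SIGNATURE_OFFSET:SIGNATURE_OFFSET + SIGNATURE_LEN]
    return hmac.compare_digest(received, compute_signature(signing_key, smb, seq))
```

Because the sequence number participates in the hash, a replayed or reordered message fails verification at a receiver that expects the next number, and any modification of the message body changes the hash.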

Again, the SMB_NEGOTIATE_PROTOCOL SMB response is used to convey information (by the server to the client) about

Whether or not the server can support SMB signing
Whether or not the server requires SMB signing
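In the NT LM 0.12 dialect, the security settings mentioned so far travel as bits of a one-byte SecurityMode field in the negotiate response. The bit values below are those documented in the CIFS specifications. The decoding function and the signing policy in `session_will_sign` are a simplified, hypothetical illustration of the rule that a client unable to sign is denied access by a server that requires signing.

```python
from dataclasses import dataclass

# SecurityMode bits in the negotiate response (CIFS specifications)
NEGOTIATE_USER_SECURITY = 0x01                 # clear means share-level security
NEGOTIATE_ENCRYPT_PASSWORDS = 0x02             # challenge/response supported
NEGOTIATE_SECURITY_SIGNATURES_ENABLED = 0x04
NEGOTIATE_SECURITY_SIGNATURES_REQUIRED = 0x08


@dataclass
class ServerSecurity:
    user_level: bool
    challenge_response: bool
    signing_supported: bool
    signing_required: bool


def decode_security_mode(mode: int) -> ServerSecurity:
    return ServerSecurity(
        user_level=bool(mode & NEGOTIATE_USER_SECURITY),
        challenge_response=bool(mode & NEGOTIATE_ENCRYPT_PASSWORDS),
        # a server that requires signing necessarily supports it
        signing_supported=bool(mode & (NEGOTIATE_SECURITY_SIGNATURES_ENABLED
                                       | NEGOTIATE_SECURITY_SIGNATURES_REQUIRED)),
        signing_required=bool(mode & NEGOTIATE_SECURITY_SIGNATURES_REQUIRED),
    )


def session_will_sign(server: ServerSecurity, client_supports: bool,
                      client_requires: bool) -> bool:
    """Simplified policy (an assumption): sign when both sides can; refuse
    the session when one side requires signing and the other cannot sign."""
    if server.signing_required and not client_supports:
        raise PermissionError("server requires SMB signing; access denied")
    if client_requires and not server.signing_supported:
        raise PermissionError("client requires SMB signing; server cannot sign")
    return server.signing_supported and client_supports
```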

3.3.4 CIFS Authentication

The CIFS protocol provides for security negotiation between the client and the server. A server may be configured to reject a client offer to negotiate a level of security deemed unacceptably low.

CIFS offers some authentication mechanisms that a server can use to authenticate a client. CIFS optionally also provides the means for a client to authenticate a server. At the simplest level of authentication, a client can supply a user identity and password in plain text. For obvious reasons, this approach is highly discouraged. Indeed, a server can be configured to prohibit access to clients that send cleartext passwords.

The authentication can be accomplished via a mechanism called the Challenge/Response protocol. When a client sends an SMB_NEGOTIATE_PROTOCOL SMB to negotiate the CIFS dialect to be used, a flag bit value in the server response indicates whether or not the server supports the Challenge/Response protocol. If the server does support the Challenge/Response protocol, the 8-byte challenge is included in the response to the SMB_NEGOTIATE_PROTOCOL SMB. The challenge is simply a random value with a very low probability of being repeated. Both the client and the server derive a key from the user password. They both then encrypt the challenge, using the key and the DES (Data Encryption Standard) algorithm. The client sends its response to the server, which compares the response with its own computed value. If the two match, the client has demonstrated knowledge of the password and has thus authenticated itself.
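The exchange can be sketched in Python as follows. The sketch mirrors the structure of the NTLM (version 1) response: a 16-byte password hash is padded to 21 bytes and split into three 7-byte DES keys, and each key encrypts the 8-byte challenge, giving a 24-byte response. Python's standard library has no DES, so the block cipher is passed in as a parameter (`des_encrypt` is an assumed callable, not a library function). The password hash is likewise taken as input rather than computed here, and the function names are illustrative.

```python
import hmac
import secrets
from typing import Callable

# Assumed primitive: encrypt one 8-byte block under an 8-byte DES key.
BlockCipher = Callable[[bytes, bytes], bytes]


def make_challenge() -> bytes:
    """Server side: an 8-byte random value that is unlikely to repeat."""
    return secrets.token_bytes(8)


def expand_des_key(key7: bytes) -> bytes:
    """Spread 56 key bits over 8 bytes, 7 bits per byte, odd parity in bit 0."""
    bits = int.from_bytes(key7, "big")
    out = bytearray()
    for i in range(8):
        byte = ((bits >> (49 - 7 * i)) & 0x7F) << 1
        if bin(byte).count("1") % 2 == 0:
            byte |= 1                      # make the number of 1 bits odd
        out.append(byte)
    return bytes(out)


def challenge_response(password_hash: bytes, challenge: bytes,
                       des_encrypt: BlockCipher) -> bytes:
    """Encrypt the challenge under three keys derived from the password hash."""
    if len(password_hash) != 16 or len(challenge) != 8:
        raise ValueError("expected a 16-byte hash and an 8-byte challenge")
    padded = password_hash + bytes(5)      # 21 bytes = three 7-byte keys
    return b"".join(
        des_encrypt(expand_des_key(padded[i:i + 7]), challenge)
        for i in range(0, 21, 7)
    )


def server_accepts(stored_hash: bytes, challenge: bytes, client_reply: bytes,
                   des_encrypt: BlockCipher) -> bool:
    """Server recomputes the expected reply and compares in constant time."""
    expected = challenge_response(stored_hash, challenge, des_encrypt)
    return hmac.compare_digest(expected, client_reply)
```

Note that the password itself never crosses the network. Only the encrypted challenge does, and because the challenge changes with each negotiation, a captured response cannot simply be replayed later.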

CIFS also supports an authentication mechanism called Extended Security. (No prizes for guessing that the capability to perform Extended Security is indicated by the server in its response to the SMB_NEGOTIATE_PROTOCOL SMB.) Extended Security provides a means of supporting arbitrary authentication protocols within CIFS. When Extended Security is negotiated, the first security blob is in the SMB_NEGOTIATE_PROTOCOL response. Security blobs are opaque to the CIFS protocol. CIFS relies on a mechanism at the client and the server to generate and interpret these blobs. Subsequent blobs may be exchanged with further SMB traffic.

Using Extended Security, Microsoft has introduced support for Kerberos in Windows 2000 and later products. The Windows 2000 Kerberos implementation introduces some Microsoft proprietary wrinkles. In particular, Microsoft uses some fields in Kerberos tickets to pass security information about the groups to which a client belongs. The Microsoft Kerberos implementation allows for mutual authentication, meaning that not only can the server authenticate the client, but the client can also authenticate the server.

Microsoft defines another way, called Netlogon, for a client to negotiate a session with a server using machine credentials (as opposed to user credentials). Netlogon is used to establish a secure RPC (remote procedure call) session, and the protocol is a little richer in that it supplies user access tokens that the CIFS-defined logon does not. Netlogon is typically used for server-to-server communication (one server acts as a client to another). The expired Microsoft CIFS RFC does not document Netlogon.

Finally, a CIFS server need not implement the authentication mechanisms itself. CIFS also allows for pass-through authentication, whereby the server simply requests a challenge from another server (the authentication server), passes the challenge to the client, and passes the client's challenge response to the authentication server. If the authentication server responds positively, the server allows the client the desired access.
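The message flow of pass-through authentication can be sketched with two hypothetical Python classes. The real response computation is whatever challenge/response scheme the servers share. Here HMAC-MD5 over the challenge stands in for it, purely so the example runs. All class, method, and function names are assumptions for illustration.

```python
import hashlib
import hmac
import secrets


class AuthServer:
    """Hypothetical authentication server that knows each user's secret."""

    def __init__(self, secrets_by_user: dict[str, bytes]):
        self._secrets = secrets_by_user
        self._pending: dict[str, bytes] = {}   # outstanding challenge per user

    def issue_challenge(self, user: str) -> bytes:
        challenge = secrets.token_bytes(8)
        self._pending[user] = challenge
        return challenge

    def check(self, user: str, response: bytes) -> bool:
        challenge = self._pending.pop(user, None)   # each challenge used once
        secret = self._secrets.get(user)
        if challenge is None or secret is None:
            return False
        expected = hmac.new(secret, challenge, hashlib.md5).digest()  # stand-in
        return hmac.compare_digest(expected, response)


class PassThroughFileServer:
    """File server that stores no credentials and relays to the AuthServer."""

    def __init__(self, auth: AuthServer):
        self._auth = auth

    def challenge_for(self, user: str) -> bytes:
        return self._auth.issue_challenge(user)   # obtained from auth server

    def login(self, user: str, response: bytes) -> bool:
        return self._auth.check(user, response)   # relayed unchanged


def client_response(secret: bytes, challenge: bytes) -> bytes:
    """Client side of the stand-in scheme."""
    return hmac.new(secret, challenge, hashlib.md5).digest()
```

The design point is that `PassThroughFileServer` never sees a password or a password hash. It only forwards a challenge in one direction and a response in the other.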

3.3.5 CIFS Optimization Features

CIFS defines several features that provide for efficient optimized communication. Sections 3.3.5.1 and 3.3.5.2 describe these features.

3.3.5.1 The CIFS AndX SMB

CIFS defines a way to chain requests in a dependent manner, allowing for optimizations in that two operations are achieved with a single round-trip between client and server. This feature is called AndX; NFS version 4 introduces a similar feature in the form of the COMPOUND procedure. An example is a CIFS client sending an OpenAndRead request to a server or a client sending a WriteAndClose request to a CIFS server. The idea is that instead of sending two requests (for example, Open followed by Read) and receiving two responses, a single OpenAndRead request is sent and a single response is received. This is particularly useful when a high round-trip latency time is involved.
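As a sketch of how a receiver might follow such a chain, assume the layout that AndX commands use in the CIFS specifications. Under that layout, each AndX parameter block begins with a one-byte AndXCommand (0xFF meaning no further command), a reserved byte, and a 16-bit AndXOffset. The AndXOffset locates the next command's parameter block, measured from the start of the SMB header. The Python function below lists the commands carried in one message. It trusts the caller to pass only messages whose blocks are AndX blocks and performs only minimal validation.

```python
import struct

HEADER_SIZE = 32
COMMAND_OFFSET = 4          # command byte follows the 0xFF 'SMB' identifier
NO_ANDX_COMMAND = 0xFF


def andx_chain(smb: bytes) -> list[int]:
    """Return the command codes carried in one (possibly chained) SMB."""
    commands = [smb[COMMAND_OFFSET]]
    offset = HEADER_SIZE
    while True:
        word_count = smb[offset]
        if word_count < 2:
            break                               # too short to be an AndX block
        next_cmd, _reserved, next_offset = struct.unpack_from("<BBH", smb, offset + 1)
        if next_cmd == NO_ANDX_COMMAND:
            break                               # end of the chain
        if next_offset <= offset:
            raise ValueError("malformed AndX chain: offset does not advance")
        commands.append(next_cmd)
        offset = next_offset
    return commands
```

Requiring each offset to move forward guarantees that the walk terminates even on a corrupted message. A server would dispatch each listed command in turn and build one combined response.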

3.3.5.2 Opportunistic Locking

CIFS supports a performance enhancement feature called opportunistic locking. Opportunistic locks are also referred to as oplocks, which distinguishes them from regular locks. There are two basic goals behind opportunistic locking.

The first goal is to lock a file when possible and start caching the file locally at the client. When the lock conditions can no longer be maintained, the protocol allows a "grace period" within which the client needs to flush its cache. The locking and unlocking happens transparently to an application; all the work is done within the CIFS implementation at the client and the server. The application need not be modified to take advantage of the resulting enhanced performance.

Consider an application that opens a file on a network server for read/write access and writes 128-byte records to the file. Without oplocks, each 128-byte write would cause network traffic to flow. With oplocks, the client would cache the file locally, and multiple write operations would be coalesced into a single write operation that flows across the network. For example, assume that the client uses 4,096-byte buffers and that the application writes 128 bytes at a time in a serial fashion. The first buffer would accommodate data from 32 write operations (4,096 ÷ 128 = 32), and all of the data from the first 32 write operations would flow across the network via a single write request. If the write operations could not be cached, 32 write operations (instead of one) would flow across the network. Reducing 32 network write operations to a single one is a substantial gain in efficiency.
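The arithmetic in this example can be checked with a small simulation. The sketch below counts how many write requests cross the network with a local write-behind buffer. The 4,096-byte buffer and 128-byte records come from the text. The class itself is a hypothetical stand-in for the redirector's cache.

```python
class WriteBehindBuffer:
    """Coalesce small sequential writes into buffer-sized network writes."""

    def __init__(self, buffer_size: int = 4096):
        self.buffer_size = buffer_size
        self.pending = 0          # bytes cached locally, not yet sent
        self.network_writes = 0   # write requests that crossed the network

    def write(self, nbytes: int) -> None:
        self.pending += nbytes
        while self.pending >= self.buffer_size:
            self.network_writes += 1          # one full buffer goes out
            self.pending -= self.buffer_size

    def flush(self) -> None:
        """Send whatever is left, e.g. when the oplock is broken."""
        if self.pending:
            self.network_writes += 1
            self.pending = 0


cached = WriteBehindBuffer()
for _ in range(32):
    cached.write(128)
cached.flush()
uncached_writes = 32              # without an oplock, every write goes out
print(cached.network_writes, uncached_writes)   # 1 32
```

The `flush` method corresponds to the grace period described earlier: when an oplock break arrives, any partially filled buffer must be written out before the break is acknowledged.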

The second goal of opportunistic locking is to increase the number of scenarios in which it can be applied. When oplocks can be applied, caching efficiency can be improved. Increasing the number of scenarios in which opportunistic locking can be achieved thus provides a distinct advantage. Assume that an instance of an application opens a file (on a network server) for read/write access and an oplock is requested and granted. The client-side code can cache file writes. Assume that another instance of the same application is run from a different client. One way to deal with this is to break the oplock and have both applications use network I/O to immediately commit any file write requests done by the application. Another way to deal with this is to delay the oplock break until the second instance of the application actually attempts a write operation. Very often the application may not attempt a write operation at all.

When conditions at the server change, the server sends an oplock break notification to the client; for example, the server sends an oplock break when another client requests access to the file or when a client writes to the file. The server implements logic to time out and clean up the server state (including closing the client session) if a client does not respond to an oplock break request. The client requests an oplock only when it is desirable; for example, if the application requests that the file be opened for exclusive access, there is no point in requesting an oplock.

Specifically, oplocks are implemented in three scenarios:

Exclusive oplock
Batch oplock
Level II oplock

These scenarios are described in the sections that follow.

Exclusive Oplock

An exclusive oplock is requested by the CIFS mini-redirector when an application opens a file with read or write access. The server grants the oplock if the file is not yet opened by any other client. The sequence of operations is illustrated in Figure 3.3.

Figure 3.3. Exclusive Oplock Exchange Sequence

To begin with, the first client sends a file open request, also asking for an exclusive oplock. The server performs the checks and grants the oplock. The first client happily starts caching the file, performing read-ahead and write-behind operations. Sometime later, another client (say, client 2, not shown in Figure 3.3) sends an open request for the same file. The server notices that client 1 has an exclusive oplock on the requested file and sends an oplock break notification to client 1. Client 1 flushes its buffers, sending write and lock requests as appropriate. Once all the state has been flushed, client 1 sends an indication that it is done with its oplock break notification processing. At this point, the server sends an open response to client 2, allowing it, too, to open the file. Client 1 continues, taking care not to cache the file locally. The assumption here is that client 1 opened the file in a mode indicating a willingness to let other clients open the file as well.

Level II Oplock

Very often clients open a file in read/write mode and never write to the file before closing it. Level II oplocks are designed to facilitate sharing and caching of files in this situation. Exclusive oplocks and batch oplocks (described in the next section) are always acquired because of a request by a client. A level II oplock, however, is never requested by a client. A client starts off by requesting an exclusive oplock. If that is granted, the server may, under the circumstances to be described here, demote the exclusive oplock to a level II oplock.

Consider Figure 3.4. Client 1 starts off requesting an exclusive oplock and starts caching the file locally. In particular, client 1 does read-ahead caching and also caches locks locally. Remember, in this case clients do not write to the file. At some point, client 2 (again client 2 is not shown in the figure) requests access to the same file. The server sends client 1 a notification indicating that client 1 should demote itself from an exclusive oplock to a level II oplock. Client 1 flushes its locks and indicates that it has finished processing the oplock notification. At that point the server sends a successful open response to client 2 and also grants it a level II oplock. Again, the assumption is that client 1 opened the file indicating that it was willing to let other clients open the file at the same time.

Figure 3.4. Level II Oplock

With a level II oplock, clients are not allowed to buffer locks. The advantage of this scheme is that implementing coherence at the server is simplified, and clients can still buffer read data, cutting down on network traffic. When any client writes, the server breaks the level II oplocks to no oplocks at all. Because no client buffers locks while it holds a level II oplock, a successful write implies that the write was done in a region of the file that no other client had locked. When level II oplocks are broken, clients can no longer buffer read data.
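The server-side rules for exclusive and level II oplocks can be summarized in a small state machine. This Python sketch is simplified and hypothetical: it tracks a single file and ignores batch oplocks and sharing modes. It records each break synchronously instead of waiting for the client's acknowledgment. The `level2_supported` parameter stands for whether the holder of an exclusive oplock is demoted to level II (the Figure 3.4 case) or must give up caching entirely (the Figure 3.3 case).

```python
from enum import Enum


class Oplock(Enum):
    NONE = "none"
    LEVEL_II = "level II"
    EXCLUSIVE = "exclusive"


class OplockTable:
    """Oplock state for a single file on the server (simplified)."""

    def __init__(self) -> None:
        self.holders: dict[str, Oplock] = {}
        self.breaks: list[tuple[str, Oplock]] = []   # (client, new level) sent

    def _break(self, client: str, new_level: Oplock) -> None:
        # A real server waits here for the client to flush and acknowledge.
        self.breaks.append((client, new_level))
        self.holders[client] = new_level

    def open(self, client: str, level2_supported: bool = True) -> Oplock:
        """Open the file for `client`, which asks for an exclusive oplock."""
        if not self.holders:
            granted = Oplock.EXCLUSIVE           # nobody else has it open
        else:
            granted = Oplock.LEVEL_II if level2_supported else Oplock.NONE
            for other, level in list(self.holders.items()):
                if level is Oplock.EXCLUSIVE:
                    self._break(other, granted)  # demote before answering
        self.holders[client] = granted
        return granted

    def write(self) -> None:
        """A write by any client breaks all level II oplocks to none."""
        for holder, level in list(self.holders.items()):
            if level is Oplock.LEVEL_II:
                self._break(holder, Oplock.NONE)
```

In the Figure 3.4 scenario, `open("c1")` returns `EXCLUSIVE`, a later `open("c2")` demotes client 1 and returns `LEVEL_II`, and any subsequent `write()` leaves both clients with no oplock.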

Batch Oplock

Batch oplocks are used to optimize performance while batch files are being executed. The batch command processor typically opens a file, seeks to the proper position within the file, reads a command line, closes the file, and executes the command line. Next it opens the file, seeks to the next line, reads it, closes the file, and executes that line. This cycle continues until the processing ends.

Figure 3.5 shows the sequence of operations. Client 1 opens a batch file and requests a batch oplock. Assume that the server grants the batch oplock, since nobody else is writing to the batch file. Client 1 seeks to a particular position in the file and performs a read operation. The batch command processor executes the line it read. Then the batch command processor closes the file. The CIFS mini-redirector takes no action on the close request (essentially doing a delayed close). The batch command processor opens the file, and the CIFS mini-redirector takes no action, other than canceling the delayed close that it has queued up. The batch command processor reads the next line by doing a seek and read. The CIFS mini-redirector sends the seek and read requests.
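The redirector's delayed-close behavior can be sketched as follows. The class and its `send` callback are hypothetical, and the sketch keeps only the open/close bookkeeping. While a batch oplock is held, a close is queued rather than sent, a reopen of the same file cancels the queued close, and the close goes out only when the oplock is broken. The seek is modeled simply as the offset carried by the read request.

```python
from typing import Callable


class BatchOplockRedirector:
    """Hypothetical client-side bookkeeping for one file under a batch oplock."""

    def __init__(self, send: Callable[[str], None]):
        self.send = send                  # puts a request on the wire (assumed)
        self.has_batch_oplock = False
        self.close_pending = False

    def open(self) -> None:
        if self.close_pending:
            self.close_pending = False    # cancel the delayed close; nothing sent
            return
        self.send("OPEN")
        self.has_batch_oplock = True      # assumption: the server grants it

    def seek_and_read(self, offset: int) -> None:
        self.send(f"READ at offset {offset}")   # reads still reach the server

    def close(self) -> None:
        if self.has_batch_oplock:
            self.close_pending = True     # delayed close: keep the server handle
        else:
            self.send("CLOSE")

    def oplock_break(self) -> None:
        """Server revoked the oplock: a queued close must now really be sent."""
        self.has_batch_oplock = False
        if self.close_pending:
            self.close_pending = False
            self.send("CLOSE")
```

Running three open/read/close cycles against this class sends one OPEN and three READ requests and no CLOSE until the oplock is broken. This matches the reduction in open and close traffic described below.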

Figure 3.5. Batch Oplock

The advantage is that network traffic is reduced (there are fewer close and open requests). Server resources are also optimized because the server does not have to process a close request immediately followed by a request to open the same file it just closed.
