The remaining sections in this chapter discuss various configuration and implementation issues to consider when determining how to best optimize the performance of your network storage. Specifically, these sections discuss the following:
Determining What Data to Store Remotely

The performance of local file I/O is usually better than when the same files are accessed through a network file system. Local I/O does not consume network resources such as routers and Ethernet segments. That said, when the server has a large amount of RAM available, sufficient for caching the commonly accessed files, there are cases in which network file system speeds can exceed local noncached file access. Storage is also easier to manage centrally: keeping a few copies of large, infrequently accessed files on a server avoids filling every enterprise desktop to overflowing with duplicates of the same data.

SAN Versus Network File Systems/NAS

As a general rule, storage area networks can transfer data faster than network file systems over networks of similar speed, because less processing is involved in parsing requests at the block (rather than the file) level. In addition, storage area networks often use specialized high-speed Fibre Channel switches and network hardware. Network file systems, however, provide additional security as well as the capability to back up and manage files more intuitively, advantages that often outweigh the performance advantages of a SAN. Additionally, SAN cabling restrictions (and concerns about security when accessing data over corporate networks) can limit the appeal of SANs to large server rooms.

The Network File System Protocol

Network file system protocols come in all shapes and sizes. Some require complex clients to manage complex state information (such as AFS), whereas others are stateless, with idempotent operations (such as NFS version 3). They vary from OS/2-centric (such as SMB) to Windows-centric (such as CIFS) and UNIX-centric (such as NFS). They vary in their security models, performance, and, of course, complexity. SANs move data as blocks on disk, whereas most network file systems move data based on the filename or file identifier.
Making sense of this maze of protocols requires looking back and categorizing the protocols into families. The following list groups related network file system protocols into families to make it easier to understand their characteristics:
A more detailed description of the more popular network file systems follows. The file systems are listed in order of the approximate size of their installed base:
We have discussed multiple popular network file systems (including NFSv3, NFSv4, CIFS, HTTP/WebDAV, and AFS). Network file systems have been around for almost 20 years, as the history above shows, so why haven't we converged on one dominant network file system (as we have converged on TCP/IP for the lower layers)? Because network and SAN file systems present unique design problems that are not perfectly addressed by any one network file system. Why are network file systems so hard to design? The following are some reasons:
NFS version 4 and CIFS/Samba (CIFS kernel client and Samba server) on Linux address most of the issues listed here and are commonly used. But is one clearly better? Is NFS better than CIFS? Not always. The trade-offs to consider are as follows:
Note that NFS is not popular on Windows, because most Windows versions do not include an NFS server or client in the operating system itself, and the NFS protocol maps better to the simpler UNIX VFS interface than to the complex, functionally rich Windows IFS. Microsoft does offer a simple NFS (version 2 and 3) server as a free download as part of its Services for UNIX. NFS version 4, due to new security and caching features, will be appealing in the future, especially for Linux-to-Linux network file system access, but because it is not well supported on most Windows clients and servers, its adoption has been slow. Its Linux implementation in the 2.6 kernel is as yet unproven and is missing some optional features. NFSv3 performance from Linux clients over Gigabit Ethernet can be spectacular, especially to NFS servers based on the 2.6 kernel. NFS version 3 (over UDP, at least) has received the most testing and is probably the most stable choice for network file systems when mounting from Linux clients to Linux servers.

Client and Server Implementation Choices

For NFS, the implementation choice is simple: The most popular client and server implementation is the one available in the kernel itself. However, there are choices for the RPC (SunRPC) daemon. The choices available when building the Linux 2.6 kernel are whether to enable support for the following:
For SMB/CIFS, there is a choice of two clients: the legacy smbfs and the newer CIFS VFS. For the server, by far the most popular choice is the Samba network file server, which provides not just SMB/CIFS network file serving but also the following:
Tuning the Linux Client

Some Key Concepts

This section covers some of the key concepts you need to consider when tuning the Linux client. These concepts include the following:
Protocol Layering

The following trace, taken from the Ethereal network analyzer, shows the typical 20 network frames (requests and responses) that occur at mount time (see Figure 16-2). The trace was taken from the mount command using CIFS VFS version 1.0.3 running on a Linux kernel 2.4.23 based client.

Figure 16-2. The typical 20 network frames that occur at mount time.

Figure 16-3 shows a more detailed view of a particular frame; in this case, the Tree Connect request clearly shows the layering of an SMB request (the mount data, with the SMB header) inside an RFC 1001 (NetBIOS session service) frame, inside a TCP/IP frame, inside an Ethernet frame.

Figure 16-3. A detailed view of the Tree Connect request.

In this example, the SMB request is 86 bytes (including 32 bytes of SMB header), preceded by a 4-byte RFC 1001 header (length), preceded by 32 bytes of TCP header, 20 bytes of IP header, and 14 bytes of Ethernet header. This layering is much like nesting envelopes: enclosing an envelope with a letter for a child inside a larger envelope addressed to the child's parent.

Opportunistic Locking

Figure 16-4 shows the protocol flow involved in opportunistic locking (oplock) handling for CIFS. CIFS has two types of opportunistic locks that are used to control distributed access to files. By contrast, AFS and DFS have more capable but heavyweight token-management mechanisms. NFS versions 2 and 3 have no locking mechanism in the protocol itself and therefore relax UNIX file semantics at the risk of data integrity.

Figure 16-4. Protocol flow in opportunistic locking. (Source: www.microsoft.com/Mind/1196/CIFS.htm)

The first type of opportunistic lock is the whole file lock (exclusive oplock), which allows the client to do aggressive write-behind and read-ahead caching on the file, often greatly improving performance. The second type of oplock is a read oplock. A read oplock allows multiple clients to read, but not update, a file.
Attempts by one client to update such a file cause all clients with that file open to lose their caching privileges for that file. Distributed caching encounters a problem on Linux because files that are closed cannot be safely cached with the oplock mechanism alone. (Although not particularly common, standard POSIX file semantics allow a memory-mapped file to have no open file instances while the data associated with the inode can still be read.) A third type of oplock, the batch oplock, is rarer. It is used to address certain performance problems associated with the line-by-line interpretation of DOS batch files (scripts) by allowing limited read-ahead caching of batch files by the client. These distributed caching mechanisms in the Linux client are in addition to, and unrelated to, the caching done in the server file system's page manager, and again in the server's disk controllers.

Metadata

File and directory timestamps reflect the time of the last update to a file or directory. The CIFS network file system client uses timestamps to determine whether data in its page cache needs to be discarded when reopening a file that has been closed, but this caching is transparent to the user. AFS and DFS clients have a much more complex token-management approach to achieving cache consistency, which can safely cache network files on disk on the client for long periods. NFS versions 2 and 3 generally have looser data consistency and cache read-ahead and write-behind data based on timers.

File Change Notification

File change notification is available in NFS version 4 and CIFS as a way of allowing a client application to be notified of changes to certain files and directories. File change notification could be used to augment the client file system's distributed caching facility, although this has not been proven to be efficient and is not currently implemented for this purpose in the Linux clients.
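The timestamp-based revalidation described above can be sketched in a few lines of shell. This is an illustrative sketch only: the real check happens inside the kernel CIFS client's reopen path, not in user space, and the temporary file here merely stands in for a file on a CIFS mount.

```shell
# Sketch of timestamp-based cache revalidation (illustrative only; the
# real check happens inside the kernel CIFS client at file reopen time).
f=$(mktemp)                      # stand-in for a file on a CIFS mount
cached_mtime=$(stat -c %Y "$f")  # mtime recorded when pages were cached

# ... later, when the file is reopened ...
current_mtime=$(stat -c %Y "$f")
if [ "$current_mtime" -ne "$cached_mtime" ]; then
    echo "stale: discard cached pages and re-read from the server"
else
    echo "fresh: serve the reopen from the page cache"
fi
```

The trade-off is visible in the sketch: a one-second timestamp granularity means a write that lands within the same second as the cached copy goes undetected, which is exactly why this caching is weaker than oplock- or token-based consistency.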
Read-Only Volumes and Read-Only Files

Clients can cache aggressively if the mounted volume is known to contain read-only data (such as a mount to a server that is exporting a read-only CD-ROM or DVD). However, this provides little benefit over oplock, which in effect allows the same thing. The legacy smbfs client does not do safe distributed caching (oplock). Instead, it relies on timers to determine the invalidation frequency for client-cached network file data and has limited performance adjustments. A significant improvement in smbfs performance was obtained midway through the 2.4 kernel development when the smbfs network request buffer size was increased from 512 bytes to approximately one page (4096 bytes). The Linux CIFS client implements oplock, which is enabled by default (oplock can be disabled by setting /proc/fs/cifs/OplockEnabled to 0 on the client). The Linux CIFS client attempts to negotiate a buffer size of about 16K. If the server supports this buffer size, more efficient read and write transfers with fewer network round trips can be achieved. As with the NFS client, the default read and write sizes can be reduced (by specifying rsize and wsize as options on mount) in an attempt to minimize TCP packet fragmentation, but this usually slows performance. The lookup caching mechanism in the CIFS VFS, as in smbfs, is based on a timer rather than on the CIFS FindNotify API. This caching of inode metadata, even for short periods, improves performance over the alternative (revalidating inode metadata on every lookup), but with the risk that the client's view of stat information (such as file size and timestamps) on a file will be out of date more often. CIFS lookup caching can be disabled by setting /proc/fs/cifs/LookupCacheEnabled to 0. A limited set of statistics is kept by the CIFS client in /proc/fs/cifs/Stats.
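The CIFS client knobs discussed above are all adjusted either through /proc or at mount time. The following is a configuration sketch (run as root on a client with the cifs module loaded); the server name, share, mount point, and buffer sizes are placeholders to experiment with, not recommendations:

```shell
# Disable oplock-based caching on the CIFS client (1 = enabled, the default)
echo 0 > /proc/fs/cifs/OplockEnabled

# Disable timer-based lookup (inode metadata) caching
echo 0 > /proc/fs/cifs/LookupCacheEnabled

# Mount with explicit read/write sizes to experiment with transfer size
# (//server/share, /mnt/cifs, and the sizes are placeholders)
mount -t cifs //server/share /mnt/cifs -o user=guest,rsize=16384,wsize=16384

# Inspect the per-share statistics kept by the client
cat /proc/fs/cifs/Stats
```

Because these /proc settings are not persistent, any values found useful in testing would need to be reapplied from a startup script.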
Enabling debugging or tracing (setting /proc/fs/cifs/cifsFYI to 1, or setting /proc/fs/cifs/traceSMB to 1) can slow client performance slightly. Unlike the current 2.6 smbfs and cifs client file system modules, the NFS client does a good job of dispatching multiple read-ahead requests in parallel to a single server from a single client. This parallelism helps keep the server disk busy. The difference is even more significant when writing large files from a single process on the client to the server. A Linux client can copy files over Gigabit Ethernet to a lightly loaded Linux server much faster using NFS version 3 than using CIFS to a Linux/Samba server. This is due to the efficient implementation of multipage asynchronous write-behind in the NFS client. The differences in read performance between NFS and CIFS are not as dramatic because both implement multipage read-ahead (readpages). The CIFS client is in turn faster than the smbfs client for file copies from the same Samba server, because the CIFS VFS can use a larger read size (16K versus 4K) and can read more than one page at a time (via the new readpages function, an optional feature that 2.6 kernel file systems may implement). Future versions of the CIFS VFS client should be able to narrow the performance gap with NFS. However, exceeding NFS performance for large file copies will require a redesign of the SMB dispatching mechanism of the Samba server, as is being done for Samba version 4. When multiple processes copy different files to the same server, CIFS benefits from the capability to queue as many as 50 simultaneous read or write requests to the server.

Linux File Server Tuning

Network file system tuning is complex. Bottlenecks can be caused by high CPU, disk, or network usage, and network latency can dramatically influence throughput. The following list summarizes some general principles to use when evaluating performance improvements:
A common approach for evaluating potential performance improvements due to altered configuration settings is the following:
NFS

For the NFS client, the default rsize and wsize can be specified on mount. Typical values are 4K to 32K. The maximum is constrained by the server; the Linux NFS server was changed to support 32K along with the implementation of NFS over TCP. The Linux client supports up to a 32K rsize/wsize. Setting an rsize larger than the MTU (typically 1500 bytes) results in fragmentation and reassembly of the higher-level SunRPC frames across multiple network frames, which in some cases slows performance. You can experiment with IOzone and Bonnie to determine optimal values. The netstat and nfsstat tools can be used to get useful TCP and NFS statistics, respectively, to correlate with benchmark throughput and timing results, and tracepath can be used to determine network frame sizes (which can be changed via the ifconfig MTU option). The 2.6 kernel adds the capability to configure NFS/SunRPC over TCP, which is somewhat slower than the default (NFS/SunRPC over UDP). NFS over UDP is often used on local area networks, but if the timeouts reported by the nfsstat command are excessive, consider increasing the values of the NFS mount options retrans and timeo. The number of server instances of nfsd can greatly affect performance and can be adjusted in the Linux server system's startup script. Linux NFS servers can export data using the "sync" or "async" flag, with the latter yielding better performance due to write-behind at the server, at the risk of data integrity problems if the server fails.

Samba

Three configuration settings significantly impact Samba performance and should be examined in the server's smb.conf file:
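As a sketch, the NFS tuning points above map onto concrete commands and files roughly as follows. The server name, export path, counts, and option values are placeholders to experiment with, not recommendations, and the startup-script variable name varies by distribution:

```shell
# Client: explicit rsize/wsize, a longer timeout, and more retransmits
mount -t nfs server:/export /mnt/nfs \
      -o rsize=32768,wsize=32768,timeo=14,retrans=5

# Client: check RPC retransmission counts after a workload run
nfsstat -rc

# Server: an /etc/exports line using async for faster (but riskier) writes:
#   /export  *(rw,async)
# then re-export:
exportfs -ra

# Server: raise the number of nfsd instances (often set persistently via
# the startup script, e.g. RPCNFSDCOUNT on some distributions)
rpc.nfsd 16
```

A reasonable workflow is to change one value at a time and rerun the same IOzone or Bonnie benchmark between changes, correlating the result with nfsstat and netstat output.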
In addition to the preceding, Samba performance can be reduced by enabling kernel oplocks (rather than letting the Samba server manage oplocks internally) and by enabling ACLs and ACL inheritance in the file system (which puts additional load on the server's local file system to retrieve xattrs). Samba performance is also sensitive to changes in TCP socket options (such as TCP_NODELAY, which can be specified in the smb.conf parameter "socket options").

Performance Measurement

Many tools for performance measurement exist and can be used in conjunction with server utilization information. Some of this information is conveniently viewable as text via pseudo files in the /proc directory, to help you evaluate performance trade-offs. The most commonly used tools to measure file I/O performance are as follows:
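As an illustration, the performance-sensitive settings discussed above appear in the [global] section of smb.conf roughly as follows. These values are common starting points for experimentation, not recommendations; validate any edited configuration with testparm:

```
[global]
   # let Samba manage oplocks itself rather than deferring to the kernel
   kernel oplocks = no
   # skip ACL retrieval (xattr lookups) if Windows ACL semantics are not needed
   nt acl support = no
   # TCP socket options; TCP_NODELAY generally helps on LANs
   socket options = TCP_NODELAY
```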
These tools are discussed in detail in Chapter 6, "Benchmarks as an Aid to Understanding Workload Performance."

Load Measurement for Improved Capacity Planning

The CIFS client can measure the number of common requests by enabling CIFS statistics in the kernel configuration and by examining /proc/fs/cifs/Stats. This can be useful for determining when a server is responding slowly. An example of the statistics follows:

Resources in use
CIFS Session: 2
Share (unique mount targets): 2
SMB Request/Response Buffer: 2
Operations (MIDs): 0
0 session 0 share reconnects
Total vfs operations: 550378 maximum at one time: 6

1) \\localhost\stevef
SMBs: 11956 Oplock Breaks: 0
Reads: 89 Bytes 1145705
Writes: 3962 Bytes: 1888452
Opens: 868 Deletes: 934
Mkdirs: 118 Rmdirs: 118
Renames: 263 T2 Renames 0
2) \\192.168.0.4\c$
SMBs: 365570 Oplock Breaks: 0
Reads: 124712 Bytes 456637519
Writes: 152198 Bytes: 613673810
Opens: 3 Deletes: 0
Mkdirs: 0 Rmdirs: 0
Renames: 0 T2 Renames 0

Print Server Performance

Print server performance is affected by four major factors:
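For capacity planning, it helps to reduce the Stats text to a few numbers that can be logged over time. The sketch below embeds two lines from the example output above so that it is self-contained; on a real client you would read /proc/fs/cifs/Stats instead:

```shell
# Extract byte counts from CIFS statistics lines (sample data taken from
# the second share above; on a real client: stats=$(cat /proc/fs/cifs/Stats))
stats='Reads: 124712 Bytes 456637519
Writes: 152198 Bytes: 613673810'

read_bytes=$(echo "$stats"  | awk '/^Reads/  {print $NF}')
write_bytes=$(echo "$stats" | awk '/^Writes/ {print $NF}')
echo "read bytes:  $read_bytes"
echo "write bytes: $write_bytes"
```

Sampling these counters before and after a measurement interval (and subtracting) gives per-interval throughput per mounted share, which is more useful for spotting a slow server than the raw lifetime totals.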
Because of the breadth of Windows print driver support, when Windows systems print to Linux servers it is common for the print job to be processed mostly on the client (rather than partially on the client and partially on the server, as is often the case when Windows clients print to Windows servers). When a print job is rendered on the client and sent as a raw print file to the server, the amount of network traffic required to print the job is much larger, and the file may use significant disk space on the server, but less server CPU is required to print the job. When Samba in particular is used as a print server, print jobs usually pass through multiple additional layers on the server (the CUPS subsystem, then Ghostscript, and then a print driver), which can slow performance. Print drivers on Linux vary widely in quality of implementation, but the OMNI and CUPS projects are bringing more consistency to the Linux print architecture, which should be reflected in improved Linux print driver performance over time.