The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003 conference, held in Goettingen, Germany, in April 2003. Material has been added from other sources, but it was Jeremy who inspired the structure that follows.

28.2.1 The Ultimate Goal

All clustering technologies aim to achieve one or more of the following:
A clustered file server ideally has the following properties:
28.2.2 Why Is This So Hard?

In short, the problem is one of state.
28.2.2.1 The Front-End Challenge

To make a cluster of file servers appear as a single server with one name and one IP address, the incoming TCP data streams from clients must be processed by the front-end virtual server. This server must de-multiplex the incoming packets at the SMB protocol layer and then feed each SMB packet to the appropriate server in the cluster.

One could direct all IPC$ connections and RPC calls to one server to handle printing and user lookup requirements. RPC printing handles are shared between different IPC$ sessions, which makes it hard to split this work across clustered servers. Conceptually speaking, all other servers would then provide only file services. This is a simpler problem to concentrate on.

28.2.2.2 De-multiplexing SMB Requests

De-multiplexing of SMB requests requires knowledge of SMB state information, all of which must be held by the front-end virtual server. This is a perplexing and complicated problem to solve.

Windows XP and later have changed semantics so that state information (vuid, tid, fid) must match for an operation to succeed. This makes things simpler than before and is a positive step forward.

SMB requests would be sent by vuid to their associated server. No code exists today to effect this solution. The problem is conceptually similar to that of correctly handling multiple requests from Windows 2000 Terminal Server in Samba.

One possibility is to start by exposing the server pool to clients directly. This could eliminate the de-multiplexing step.

28.2.2.3 The Distributed File System Challenge

There exist many distributed file systems for UNIX and Linux. Many could be adopted as the backend for our cluster, so long as SMB semantics are kept in mind (share modes, locking, and oplock issues in particular). Common free distributed file systems include:
The server pool (cluster) can use any distributed file system backend if all SMB semantics are performed within this pool.

28.2.2.4 Restrictive Constraints on Distributed File Systems

Where a clustered server provides purely SMB services, oplock handling may be done within the server pool without needing to pass it on to the backend file system pool.

On the other hand, where the server pool also provides NFS or other file services, the implementation must be oplock-aware so that it can interoperate with SMB services. This is a significant challenge today. Failure to provide this will result in a significant loss of performance that will be sorely noted by users of Microsoft Windows clients.

Last, all state information must be shared across the server pool.

28.2.2.5 Server Pool Communications

Most backend file systems support POSIX file semantics. This makes it difficult to push SMB semantics back into the file system. POSIX locks have different properties and semantics from SMB locks.

All smbd processes in the server pool must of necessity communicate very quickly. The current tdb file structure that Samba uses is not suitable for use across a network, so clustered smbd processes must use something else.

28.2.2.6 Server Pool Communications Demands

High-speed inter-server communication in the server pool is a design prerequisite for a fully functional system. Possibilities for this include:
We have yet to identify metrics for the performance demands that must be met for this to work effectively.

28.2.2.7 Required Modifications to Samba

Samba needs to be significantly modified to work with a high-speed server interconnect system to permit transparent fail-over clustering.

Particular functions inside Samba that will be affected include:
28.2.3 A Simple Solution

Allowing fail-over servers to handle different functions within the exported file system removes the problem of requiring a distributed locking protocol.

If only one server is active in a pair, the need for a high-speed server interconnect is avoided. This allows the use of existing high-availability solutions instead of inventing a new one. This simpler solution comes at a price: the need to manage a more complex file name space. Since there is no longer a single file system, administrators must remember where all services are located, a complexity not easily dealt with.

The virtual server is still needed to redirect requests to backend servers. Backend file space integrity is the responsibility of the administrator.

28.2.4 High Availability Server Products

Fail-over servers must communicate in order to handle resource fail-over. This is essential for high-availability services. The use of a dedicated heartbeat is a common technique to introduce some intelligence into the fail-over process. This is often done over a dedicated link (LAN or serial).

Many fail-over solutions (like Red Hat Cluster Manager, as well as Microsoft Wolfpack) can use a shared SCSI or Fiber Channel disk storage array for fail-over communication. Information regarding Red Hat high-availability solutions for Samba may be obtained from www.redhat.com. [1]
The Linux High Availability project is a resource worth consulting if you want to build a highly available Samba file server solution. Please consult the home page at www.linux-ha.org/. [2]
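The dedicated heartbeat technique mentioned in this section can be illustrated with a minimal sketch. It is not taken from Red Hat Cluster Manager, Linux-HA, or any other product; the class name `HeartbeatMonitor`, its parameters, and the "three missed beats" threshold are all hypothetical choices made for illustration. The sketch covers only the decision logic (when should the standby server assume the active one has failed?) and leaves out the transport over the LAN or serial link.

```python
import time


class HeartbeatMonitor:
    """Decides whether a fail-over peer is still alive.

    Hypothetical illustration only; not the API of any real
    high-availability product.
    """

    def __init__(self, interval=1.0, missed_limit=3, clock=time.monotonic):
        self.interval = interval          # seconds between expected heartbeats
        self.missed_limit = missed_limit  # missed beats tolerated before fail-over
        self.clock = clock                # injectable so the logic can be tested
        self.last_seen = clock()          # treat start-up as the first heartbeat

    def record_heartbeat(self):
        """Call whenever a heartbeat arrives on the dedicated link."""
        self.last_seen = self.clock()

    def missed_beats(self):
        """Whole heartbeat intervals elapsed since the last heartbeat."""
        return int((self.clock() - self.last_seen) // self.interval)

    def peer_alive(self):
        """True while fewer than missed_limit heartbeats have been missed."""
        return self.missed_beats() < self.missed_limit


def should_take_over(monitor):
    """The standby server takes over resources once the peer looks dead."""
    return not monitor.peer_alive()
```

In a real deployment the standby would call `record_heartbeat()` from whatever code reads the dedicated link and poll `should_take_over()` periodically. Tolerating a few missed beats rather than one is a common way to avoid failing over on a single lost packet, though the right threshold depends on the link and the service.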
Front-end server complexity remains a challenge for high availability because the front end must deal gracefully with backend failures while still providing continuity of service to all network clients.

28.2.5 MS-DFS: The Poor Man's Cluster

MS-DFS links can be used to redirect clients to disparate backend servers. This pushes complexity back to the network client, which already includes the necessary support from Microsoft. MS-DFS creates the illusion of a simple, continuous file system name space that even works at the file level.

Above all, at the cost of management complexity, a distributed pseudo-cluster can be created using existing Samba functionality.

28.2.6 Conclusions