Remote Procedure Call


Many programs are designed to work in a client-server mode. These are applications that have been written as two components: a server and a client. Each component has the ability to run on separate machines across the network if required. Often, the server contains the main "guts" of the program, maybe controlling a centralized database, while the client might be a relatively small program that takes user requests and passes them on to the server.

There are many ways that client-server programs can be written; a common way is to use socket-based libraries. Another is to base them on RPC, as in the case of NFS.

Let's look at some of the features of RPC that make this possible.

Rpcbind

This program is the "RPC server." It is started on every machine at boot-time by the start-up script /etc/init.d/rpc. It is commonly known as the "portmapper" and is essentially a program that controls mapping between an RPC program and a network port number.

To understand the reason why we need to perform this mapping, let's quickly look at how "normal" socket-based client-server programs communicate with each other.

Let's assume that we start the telnet daemon on a server. It will determine the port number it should listen on by querying the /etc/services file, which will be port 23. Telnet is a "well-known" service, so every machine that runs the telnet daemon will start it on that port.

When a client machine starts a telnet session, it will also query the services file and determine that it should connect to port 23 on the server, where the telnet daemon is listening for incoming connections. Once the client has successfully bound to the server, the two can communicate.
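For example, we can confirm the well-known telnet port by checking the services file directly (the host name below is purely illustrative):

  hydrogen# grep telnet /etc/services
  telnet          23/tcp
  hydrogen#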

When RPC was introduced, it was decided to make the port allocation dynamic. One of the reasons for this was to avoid the port conflicts that might otherwise arise as more and more people used RPC to create client-server applications.

The way that dynamic allocation works is that RPC-based programs contain a "program number" and a "version number." The program number is different for every program (although the client and server portions of the same program will have the same number).

When an RPC-based server process starts, it contacts rpcbind and requests a port that it can use for communication. Rpcbind obtains the program and version number from the server and registers them, along with the port that it has assigned to the process.

Now, when a client starts, it will send a request to rpcbind, also including its program and version number, asking for the port that it can contact its server on. Rpcbind will check which server it has registered with those details and inform the client of the correct port.

Rpcbind itself is classed as a "well-known" service and as such is defined in /etc/services to run on port 111, as shown below. This enables clients on remote machines to easily access it:

  hydrogen# grep rpcbind /etc/services
  sunrpc    111/udp    rpcbind
  sunrpc    111/tcp    rpcbind
  hydrogen#

As an example, we may see the following conversation taking place between the client, server, and rpcbind:

  • server -> rpcbind (port 111): Hello, my program number is 200001230, my version number is 1, what port can I use?

  • rpcbind -> server: Hi, you can use port number 12345.

  • client -> rpcbind (port 111): Hello, my program number is 200001230, my version number is 1, where can I find my server?

  • rpcbind -> client: Hi, your server is at port number 12345.

  • client -> server (port 12345): Ah, there you are!
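If the hypothetical server above had registered itself in this way, we could list rpcbind's current registrations with rpcinfo -p, which prints the program number, version, transport, and port for every registered program. The output below is illustrative and trimmed; only the ports of well-known services such as rpcbind and nfs are fixed:

  hydrogen# rpcinfo -p hydrogen
     program vers proto   port  service
      100000    4   tcp    111  rpcbind
      100000    4   udp    111  rpcbind
      100003    3   udp   2049  nfs
      100005    1   udp  34567  mountd
   200001230    1   udp  12345
  hydrogen#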

Program and Version Numbers

So, how do we know which program number to use? To avoid confusion between RPC-based programs, the program number should be selected from a specific range. Table 18.1 shows the values, along with who is responsible for administering numbers from each range.

Table 18.1. The RPC Program Number Range

  Number Range (Hex)       Allocated By          Used For
  0        - 1FFFFFFF      Sun                   "Well-known" products, such as NFS
  20000000 - 3FFFFFFF      Developers            Third-party/in-house software
  40000000 - 5FFFFFFF      Transient programs    Temporary numbers for new programs, or ones being debugged
  60000000 - 7FFFFFFF      Reserved
  80000000 - 9FFFFFFF      Reserved
  A0000000 - BFFFFFFF      Reserved
  C0000000 - DFFFFFFF      Reserved
  E0000000 - FFFFFFFF      Reserved

The version number is much easier to select. It is simply the version number of the program and is incremented with every release.

For example, we saw from the earlier "conversation" between the machines that we have assigned a program number of "200001230," and, as it's the first version, it has a version number of "1."

Transport and Machine Independence

"Transport independent" means that RPC isn't concerned with what it is running on; it simply uses whatever transport the application requests. It also means that RPC itself isn't concerned with errors; either the application or the underlying transport will need to cater for them, and react to any errors as it sees fit.

For example, if the application uses a reliable transport such as TCP/IP, the error checking will be handled there. If, on the other hand, an unreliable transport such as User Datagram Protocol (UDP) is being used, the application will have to handle its own errors.
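From the client side we can see this transport flexibility with rpcinfo: the -u option makes a test call (to procedure 0) over UDP, while -t does the same over TCP. Assuming a server named helium is running NFS Version 3 over both transports, we might see something like this:

  hydrogen# rpcinfo -u helium nfs 3
  program 100003 version 3 ready and waiting
  hydrogen# rpcinfo -t helium nfs 3
  program 100003 version 3 ready and waiting
  hydrogen#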

External Data Representation

If an application is written to work across many types of platforms and operating systems, it must be able to work around problems such as different byte-ordering schemes. To do this, the server encodes data in a format known as External Data Representation (XDR). This includes the specifics of any data structures that need to be passed to the client. When the client receives the data, it decodes it and uses it accordingly.

RPC Database

When we query rpcbind, we can either use the program number or the program name. This is possible because certain "well-known" RPC-based services are included in the RPC database file named /etc/rpc. This file simply provides a name, and any aliases, for a program number, in a way similar to the hosts and services files. For example, the NFS mountd process that we will use later has the following entry:

  helium# grep mountd /etc/rpc
  mountd        100005 mount showmount
  helium#

This allows the name (mountd), the number (100005), or any of the aliases (mount, showmount) to be used as arguments to commands such as rpcinfo (which we'll come across later).
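For example, the following two queries are equivalent; the first uses the name from /etc/rpc and the second the raw program number (the host name is illustrative, and the versions reported will depend on what the server is actually running):

  hydrogen# rpcinfo -u helium mountd 1
  program 100005 version 1 ready and waiting
  hydrogen# rpcinfo -u helium 100005 1
  program 100005 version 1 ready and waiting
  hydrogen#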

NFS Daemons

Now that we know how RPC works in general, let's look at the part of NFS that uses it. Listed in Table 18.2 are the NFS daemons, along with the type of machine they run on.

Table 18.2. NFS Daemons

  Daemon    Runs On                Started By
  mountd    Servers                /etc/init.d/nfs.server
  nfsd      Servers                /etc/init.d/nfs.server
  statd     Servers and clients    /etc/init.d/nfs.client
  lockd     Servers and clients    /etc/init.d/nfs.client

The main ones we are interested in here are mountd and nfsd. These run on the NFS servers and are responsible for answering client requests, providing the NFS mounts, and passing the data back to the clients.

We won't deal with the client processes (statd and lockd) in any detail, as there isn't anything we can really do with them other than check that they are responding. All we really need to know here is that they are used for the NFS file-locking services.
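That said, if we ever need to confirm that they are responding, rpcinfo can again be used; statd registers under the name "status" (program 100024) and lockd under "nlockmgr" (program 100021), both of which appear in /etc/rpc. The host name and version numbers below are just illustrative:

  helium# rpcinfo -u hydrogen status 1
  program 100024 version 1 ready and waiting
  helium# rpcinfo -u hydrogen nlockmgr 4
  program 100021 version 4 ready and waiting
  helium#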

Resources and Filehandles

Although it is called the "Network Filesystem," NFS doesn't really think of filesystems in the way we think of them. The reason for this is that NFS is written to work with machines on different platforms, which have different ideas of what a filesystem is! Instead, NFS deals with "resources," which are actually the system's files and directories.

The next problem we have is that the server needs to be able to provide a consistent way of referring to a given file at any time, even when it is communicating with different platforms.

For example, a PC might mount /extra_disk/some_dir as its G drive. When it wants to read something, say G:\demo_file, the Solaris server must know that the file it is referring to is actually /extra_disk/some_dir/demo_file. It also needs to be able to cope with any other differences such as physical drive details, path name separators, illegal characters, and so forth.

To achieve this, the server allocates each resource something known as a "filehandle." This is a string of characters built up from details such as the filesystem the file resides on and the inode number of the file. It acts in a way similar to a file descriptor for local files, in that all NFS transactions use the filehandle rather than the actual file name. By doing this, the problems we mentioned earlier can be overcome because the server is the only one that needs to know the low-level details of the file and to be able to decode the filehandle.

In NFS Version 2, the filehandle was a fixed array of 32 bytes, while in Version 3 it's a variable-length array of up to 64 bytes. A filehandle may look similar to the one shown below:

  • 73a 1 a0000 f105 50b037ce a0000 2 5a8ad1ed

Decoding Filehandles

A command named showfh used to be supplied with SunOS (Solaris 1.x) to determine the file details for any given filehandle. This was useful when any NFS errors were seen because, as the example below shows, the filename is not shown:

 NFS write error on host helium: Permission denied (filehandle: 800025 2 a0000 ce620 95b3f69b a0000 2 5987f29a) 

This command is no longer supplied, but its functionality has been reproduced as a script that dissects the filehandle passed in to determine the file details. The script is widely available as fhfind, but we have included it below, removing nonfunctional lines to save space:

  #!/bin/sh
  #
  if [ $# -ne 8 ]; then
    echo "Usage: fhfind <filehandle> e.g."
    echo "fhfind 1540002 2 a0000 4df07 48df4455 a0000 2 25d1121d"
    exit 1
  fi

  fileSystemID=$1
  # bc needs upper-case hex digits
  fileID=`echo $4 | tr '[a-z]' '[A-Z]'`

  #
  # Use the device id to find the /etc/mnttab
  # entry and thus the mountpoint for the filesystem.
  #
  E=`grep $fileSystemID /etc/mnttab`
  if [ "$E" = "" ]; then
    echo "Cannot find filesystem for devid $fileSystemID"
    exit 0
  fi

  set - $E
  mountPoint=$2

  #
  # alter the inode number from hex to decimal
  # (the ibase setting and the value must be on separate lines for bc)
  #
  inodeNum=`echo "ibase=16
  $fileID" | bc`

  echo "Now searching $mountPoint for inode number $inodeNum"
  echo
  find $mountPoint -mount -inum $inodeNum -print 2>/dev/null
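To show how the script might be used, the filehandle from the earlier write error can be passed in as its eight arguments. The mount point and file reported depend entirely on what the device id matches in /etc/mnttab on the machine in question; here we assume the filesystem turns out to be /extra_disk and that the inode belongs to our earlier demonstration file:

  helium# ./fhfind 800025 2 a0000 ce620 95b3f69b a0000 2 5987f29a
  Now searching /extra_disk for inode number 845344

  /extra_disk/some_dir/demo_file
  helium#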

Client-Server Communication

Once a server has been defined and its resources selected, they need to be advertised as being available. To do this, the server "shares" (or "exports," as it was originally known) the resources using DFS administration files and commands.

The client is then able to access the remote data. For this, it needs to talk to the NFS-related server processes, mountd and nfsd. As these are RPC-based programs, we'll see the same steps being followed that we outlined in the section on RPC, which will be as follows:

  1. The client issues a mount request for the resources it requires.

  2. The client's mount will first contact rpcbind on the server to find out which port its mountd can be found on.

  3. The client's mount will contact mountd on the server and request the filehandle for the mount point it is trying to access.

  4. The resource is mounted on the client and /etc/mnttab is updated. This contains the list of currently mounted devices.

  5. When the device has successfully mounted, the server's mountd will update its own /etc/rmtab. This contains a list of resources that have been remotely mounted, along with the name of the machine that has the resources mounted.

  6. The client can now try to access this data. To do this, it will contact rpcbind again to determine the nfsd port.

  7. The client will contact nfsd, passing it the filehandle of any data it needs.

  8. The server's nfsd will work with the data, either writing it to disk or passing data back to the client for any reads.
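A minimal example of these steps in action, as seen from the command line, might look like the following; here the server helium has already shared /extra_disk/some_dir, and the client is hydrogen (host names and paths are illustrative):

  hydrogen# mount -F nfs helium:/extra_disk/some_dir /mnt
  hydrogen#

  helium# cat /etc/rmtab
  hydrogen:/extra_disk/some_dir
  helium#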

NFS Versions

We mentioned earlier that RPC uses whatever transports the application requests. Originally NFS was written to only use UDP, but it has since been modified to support both TCP and UDP. Table 18.3 highlights this and other changes between the releases, along with the release of Solaris that the NFS version was available in.

Table 18.3. NFS Versions

  NFS Version    Solaris Release    Notes
  1              Never released
  2              Pre Solaris 2.5    UDP support only
  3              Solaris 2.5        TCP and UDP support; files < 2 GB
  3              Solaris 2.6+       TCP and UDP support; files > 2 GB; caching improved
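To confirm which NFS version and transport a client has actually negotiated for a particular mount, nfsstat -m can be run on the client; amongst other things, its Flags line reports vers= and proto= values. The mount and flag values shown below are illustrative:

  hydrogen# nfsstat -m
  /mnt from helium:/extra_disk/some_dir
   Flags: vers=3,proto=tcp,sec=sys,hard,intr,rsize=32768,wsize=32768
  hydrogen#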

DFS Files

We stated earlier that the DFS package is used to administer the NFS configuration. A number of commands are available for controlling which resources are available. We'll look at these in a moment, but first let's look at how DFS determines which resources it should control. If we take a look in /etc/dfs, we'll see the following files listed:

  helium# ls /etc/dfs
  dfstab    dfstypes    sharetab
  helium#

Dfstypes contains the list of available distributed filesystem types; in other words, the types of filesystems that can be controlled with DFS. By default, it only contains an entry for NFS, as shown below:

  helium# cat /etc/dfs/dfstypes
  nfs NFS UTILITIES
  helium#

The dfstab file contains entries for each resource that can be shared, along with the DFS type and any constraints that may apply. For example, to share a file named /file/to/share using NFS, we could use the following entry in dfstab:

  helium# cat /etc/dfs/dfstab
  share -F nfs /file/to/share
  helium#

Entries in the dfstab file are also what cause the NFS server daemons to be launched at boot-time. For example, to share the file above (once the daemons are running), we could run the following command (assuming the path is valid; otherwise, we would see an error indicating that the file doesn't exist):

 helium# share -F nfs /file/to/share 

This would update the final file in this directory, sharetab. This lists any resources that are currently shared, and in this case would contain the following entry:

  helium# cat /etc/dfs/sharetab
  /file/to/share    -    nfs    rw
  helium#
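The list of shared resources can also be viewed from any machine on the network using the dfshares command, which queries a server for its currently shared NFS resources (the exact column layout may vary between releases):

  hydrogen# dfshares helium
  RESOURCE                      SERVER ACCESS    TRANSPORT
  helium:/file/to/share         helium  -         -
  hydrogen#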

Similarly, we can use unshare to prevent the resource from being shared out, as shown below:

  helium# unshare -F nfs /file/to/share
  helium#

It's also possible to share or unshare a whole set of resources at once, which is achieved by running the commands shareall and unshareall. By default, these commands will use the first entry in /etc/dfs/dfstypes as the filesystem type to work on. This is normally NFS so it shouldn't cause you any problems, but if you have a number of other DFS types available you may need to run the following for NFS:

  helium# shareall -F nfs
  helium# unshareall -F nfs
  helium#
