6.3 GPFS installation and basic configuration


Here we describe the steps we took to install GPFS on our cluster. We used the NSD model, with one internal disk per server node; both nodes act as both client and server. Although our two-node cluster is not a realistic production configuration, bringing up a real production GPFS system would follow the same path.

6.3.1 Planning

GPFS is becoming more and more dynamic in the way it handles its components. You can add or remove disks and nodes and change file system settings without stopping GPFS. However, careful planning is still a very important step.

You need to consider the following areas:

  • Hardware

    Verify that the hardware (especially the storage devices) you are using is supported by GPFS.

    Refer to the FAQ page for more information:

    http://publib.boulder.ibm.com/clresctr/library/gpfs_faq.html

  • Software

    Ensure that you have the right levels of GPFS at hand. GPFS 2.2 for pLinux requires SLES8 Service Pack 3 or later. (RHAS 3 is not yet supported for GPFS.)

  • Performance

    This is the time to design your GPFS setup to satisfy the planned file system performance. The following areas should be examined carefully:

    - The number, capacity and speed of the disks

    - The number of disk adapters

    - The number and capacity of server nodes

    - The number and types of network adapters and switches, if you plan to use the Network Shared Disk (NSD) model

    An understanding of future file access patterns, as well as of the file size distribution in the file system, is also invaluable.

    GPFS stripes file data across the disks and uses a large block size to maximize I/O efficiency. If you can change the I/O block size of your application, make it a multiple of the GPFS block size (see the sketch after this list).

  • Resilience

    You must define the level of resilience that you need to satisfy your operational requirements, and then proceed accordingly. Carefully review the failure groups and the number of replicas that you need to configure in order to satisfy your high availability requirements.
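As a simple illustration of what "a multiple of the GPFS block size" means in practice, the following sketch checks the block size of a file system and issues writes in that unit. It is only an example: it assumes a file system named gpfs0 mounted under /bigfs, as created later in this section.

    # Query the block size attribute of the gpfs0 file system
    mmlsfs gpfs0 -B
    # If the reported block size is 256 KB, issue I/O in 256 KB units (or multiples of it)
    dd if=/dev/zero of=/bigfs/testfile bs=256k count=100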

6.3.2 Software installation

The following RPMs must be installed:

  • gpfs.gpl-2.2.0

  • gpfs.msg.en_US-2.2.0

  • gpfs.docs-2.2.0

  • gpfs.base-2.2.0

The kernel source, development tools, and cross-compilers need to be installed in order to be able to compile the portability layer. The imake command, which is part of the xdevel package, is also needed.
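For example, assuming the GPFS RPM files are in the current directory (the exact file names depend on your media and fix level, and are assumptions here), the packages can be installed as follows:

    # Install the GPFS packages (adjust file names to your media)
    rpm -ivh gpfs.base-2.2.0*.rpm gpfs.gpl-2.2.0*.rpm gpfs.msg.en_US-2.2.0*.rpm gpfs.docs-2.2.0*.rpm
    # Verify that imake (from the xdevel package) is available for the portability layer build
    which imake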

If all your nodes are identical, you may install these development tools on one node only and then copy the binaries to the other nodes; this technique is described in 6.3.3, "Compiling the portability layer" on page 291.

6.3.3 Compiling the portability layer

An explanation of how to compile the portability layer is detailed in a README file located under /usr/lpp/mmfs/src. You can build as a regular user (which requires this user to have write permission in the /usr/lpp/mmfs/src directory and read permission for all files under the /usr/src/linux/ directory).

All the action takes place in the /usr/lpp/mmfs/src/config directory. The first step in compiling the portability layer is to set the environment, as shown in Example 6-11.

Example 6-11. Compilation of the portability layer
 r01n33:/usr/lpp/mmfs/src/config # export SHARKCLONEROOT=/usr/lpp/mmfs/src
 r01n33:/usr/lpp/mmfs/src/config # cp site.mcr.proto site.mcr

Use the site.mcr.proto file as a template. The real file is site.mcr, and it needs to be edited to suit your needs; this is well documented inside the file.

Example 6-12 on page 292 shows the differences between the template file and a file that worked for us.

Example 6-12. diff site.mcr.proto site.mcr
 r01n33:/usr/lpp/mmfs/src/config # diff site.mcr.proto site.mcr
 13c13
 < #define GPFS_ARCH_I386
 ---
 > /* #define GPFS_ARCH_I386 */
 15c15
 < /* #define GPFS_ARCH_PPC64 */
 ---
 > #define GPFS_ARCH_PPC64
 34c34
 < LINUX_DISTRIBUTION = REDHAT_LINUX
 ---
 > /* LINUX_DISTRIBUTION = REDHAT_LINUX */
 36c36
 < /* LINUX_DISTRIBUTION = SUSE_LINUX */
 ---
 > LINUX_DISTRIBUTION = SUSE_LINUX
 55c55
 < #define LINUX_KERNEL_VERSION 2041900
 ---
 > #define LINUX_KERNEL_VERSION 2042183

The differences are quite easy to see. The architecture needs to be changed to PPC64, and the name of the distribution and the Linux kernel version need to be changed, too.

To determine which kernel version you are running, use the command shown in Example 6-13; the relevant part of the output is the version string, here 2.4.21-83.

Example 6-13. Get to know your kernel version
 r01n33:/usr/lpp/mmfs/src/config # cat /proc/version
 Linux version 2.4.21-83-pseries64 (root@PowerPC64-pSeries.suse.de) (gcc version 3.2.2) #1 SMP Tue Sep 30 11:30:48 UTC 2003

Next, you need to configure the installed kernel source tree. The source tree is not configured properly when the kernel-source RPM is installed; this is true for both the SLES8 and RHAS 3 distributions.

Several commands are needed to rectify the situation; these commands are shown in Example 6-14 on page 293 for SLES8 (RHAS 3 is not supported at the time of writing).

Adjust the names of the files in the /boot directory to your environment, if needed. Root authority is required.

Example 6-14. Configure the kernel source tree under SLES8
 r01n33:/usr/lpp/mmfs/src/config # cd /usr/src/linux-2.4.21-83
 r01n33:/usr/src/linux-2.4.21-83 # sh make_ppc64.sh distclean
 r01n33:/usr/src/linux-2.4.21-83 # cp /boot/vmlinuz-2.4.21.config .config
 r01n33:/usr/src/linux-2.4.21-83 # sh make_ppc64.sh oldconfig
 $(/bin/pwd)/include/linux/version.h update-modverfile

Once this is done, move back to the /usr/lpp/mmfs/src directory to build and install the portability layer, as shown in Example 6-15.

Example 6-15. Build and install the portability layer
 r01n33:/usr/lpp/mmfs/src # make World
 ...
 Checking Destination Directories....
 \c
 \c
 \c
 touch install.he
 \c
 \c
 \c
 touch install.ti
 make[1]: Leaving directory `/usr/lpp/mmfs/src/misc'
 r01n33:/usr/lpp/mmfs/src # make InstallImages
 cd gpl-linux; /usr/bin/make InstallImages;
 make[1]: Entering directory `/usr/lpp/mmfs/src/gpl-linux'
   mmfslinux
   mmfs25
   lxtrace
   dumpconv
   tracedev
 make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux'

The five files listed at the end of Example 6-15 (mmfslinux, mmfs25, lxtrace, dumpconv, and tracedev) are copied into the /usr/lpp/mmfs/bin directory. If all the nodes in your cluster are identical and run the same Linux kernel, you can simply copy these files to the /usr/lpp/mmfs/bin directory on all nodes.
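If that is your case, a sketch along the following lines (run on the build node; gr01n34 is our second node, and the file names match the InstallImages output) copies the binaries across:

    # Copy the freshly built portability layer binaries to the other node
    cd /usr/lpp/mmfs/bin
    scp mmfslinux mmfs25 lxtrace dumpconv tracedev gr01n34:/usr/lpp/mmfs/bin/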

6.3.4 Configuring and bringing up GPFS

For an NSD-based configuration, this proceeds in seven steps:

  1. Create the GPFS cluster.

  2. Create a GPFS nodeset inside this cluster.

  3. Start up the GPFS nodeset.

  4. Create the local disk partitions.

  5. Create the Network Shared Disks.

  6. Create a GPFS file system.

  7. Mount the GPFS file system.

Create the GPFS cluster

We defined the GPFS cluster on top of the RSCT peer domain created during 6.2, "RSCT peer domain setup" on page 285. First, you need to create the file that describes the nodes that will be part of the GPFS cluster. Its contents are shown in Example 6-16. The syntax is one node per line.

Example 6-16. Example of a GPFS nodes file
 r01n33:~ # cat /tmp/gpfsnodes
 gr01n33
 gr01n34

To create the GPFS cluster, use the mmcrcluster command as shown in Example 6-17 on page 294. You do not need to specify the RSCT peer domain; the GPFS cluster will be defined on the current peer domain, which must be online when you issue the command.

Note that, in the example, we specified the use of ssh and scp. In order for the command to succeed, root must be able to execute ssh from all nodes, to all nodes, using all interfaces, without being prompted for a password. Refer to 3.12, "ssh" on page 144 for a discussion on how to set up ssh.
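A quick way to check this requirement is a loop like the following, run as root on each node; every invocation should print the remote host name without a password prompt (the host names are ours, and the list should be extended to cover every interface name used in the cluster):

    # Each ssh must complete without prompting for a password
    for h in gr01n33 gr01n34; do ssh $h hostname; done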

You have to designate a primary and a secondary server. Here, "primary" and "secondary" refer to the nodes that hold the GPFS cluster data, not the file data, which will be spread over all the nodes in the nodesets defined later.

Note also the use of the lc type for the cluster. lc (loose clusters) is the only type supported for Linux on pSeries.

In Example 6-17, we also show the use of the mmlscluster command to list the properties of the newly created cluster. mmlscluster is correct in reporting that the two nodes do not belong to any nodeset.

Example 6-17. mmcrcluster command
 r01n33:~ # mmcrcluster -t lc -p gr01n33 -s gr01n34 -r /usr/bin/ssh -R /usr/bin/scp -n /tmp/gpfsnodes 2>&1
 Thu Oct 30 15:12:35 PST 2003: mmcrcluster: Processing node gr01n33
 Thu Oct 30 15:12:36 PST 2003: mmcrcluster: Processing node gr01n34
 mmcrcluster: Command successfully completed
 mmcrcluster: Propagating the changes to all affected nodes. This is an asynchronous process.
 r01n33:~ # mmlscluster

 GPFS cluster information
 ========================
   GPFS cluster type:         lc
   GPFS cluster id:           gpfs1067555555
   RSCT peer domain name:     itso
   Remote shell command:      /usr/bin/ssh
   Remote file copy command:  /usr/bin/scp

 GPFS cluster data repository servers:
 -------------------------------------
   Primary server:   gr01n33
   Secondary server: gr01n34

 Cluster nodes that are not assigned to a nodeset:
 -------------------------------------------------
    1   (1)   gr01n33     10.10.10.33      gr01n33
    2   (2)   gr01n34     10.10.10.34      gr01n34

Create a GPFS nodeset

To create a GPFS nodeset, you need a simple file that lists, one line per node, the name of the nodes to include in the nodeset. This has to be a subset of the nodes composing the cluster. To create the nodeset, use the mmconfig command as shown in Example 6-18 on page 295. Use mmlsconfig to display the properties of the nodeset.

Example 6-18. Creating the GPFS nodeset
 r01n33:~ # cat /tmp/gpfsnodeset
 gr01n33
 gr01n34
 r01n33:~ # mmconfig -n /tmp/gpfsnodeset
 mmconfig: Command successfully completed
 mmconfig: Propagating the changes to all affected nodes. This is an asynchronous process.
 r01n33:~ # mmlsconfig
 Configuration data for nodeset 1:
 ---------------------------------
 clusterType lc
 comm_protocol TCP
 multinode yes
 autoload no
 useSingleNodeQuorum no
 useDiskLease yes
 group Gpfs.1
 recgroup GpfsRec.1
 maxFeatureLevelAllowed 700

 File systems in nodeset 1:
 --------------------------
 (none)

mmlsconfig shows the GPFS nodeset id that was assigned (in our case, 1). You can choose the id yourself at mmconfig time; the important thing is to remember it, as it is used to start up GPFS on the nodeset.

Note

Be aware that the nodeset id is only significant if you want to have multiple nodesets (which is optional, and not common for lc clusters). If there is only one nodeset in the cluster, there is no need to know its id, as you never have to supply it: by default, GPFS uses the nodeset to which the node executing the command belongs.


Start up the GPFS nodeset

We are now ready to start up GPFS. This is done with the mmstartup command, to which we must give the GPFS nodeset id as returned by mmlsconfig. This is described in Example 6-19.

It is useful to check the status of Topology Services and Group Services after GPFS has been started. To do so, use the lssrc command with the -ls flag on each of the two subsystems.

Example 6-19. Starting up the GPFS nodeset
 r01n33:~ # mmstartup -C 1
 Thu Oct 30 15:14:31 PST 2003: mmstartup: Starting GPFS ...
 gr01n33:  0513-059 The mmfs Subsystem has been started. Subsystem PID is 18224.
 gr01n34:  0513-059 The mmfs Subsystem has been started. Subsystem PID is 8145.
 r01n33:~ # lssrc -ls cthats
 Subsystem         Group            PID     Status
  cthats           cthats           17262   active
 Network Name   Indx Defd Mbrs St Adapter ID      Group ID
 CG1            [ 0]    2    2  S 10.10.10.33     10.10.10.34
 CG1            [ 0] eth1         0`47a19a84      0`47a19a7e
 HB Interval = 1 secs. Sensitivity = 4 missed beats
 Missed HBs: Total: 0 Current group: 0
 Packets sent    : 261 ICMP 0 Errors: 0 No mbuf: 0
 Packets received: 317 ICMP 0 Dropped: 0
 NIM's PID: 17321
 CG2            [ 1]    2    2  S 129.40.34.33    129.40.34.34
 CG2            [ 1] eth0         0`47a19a82      0`47a19a7d
 HB Interval = 1 secs. Sensitivity = 4 missed beats
 Missed HBs: Total: 0 Current group: 0
 Packets sent    : 262 ICMP 0 Errors: 0 No mbuf: 0
 Packets received: 317 ICMP 0 Dropped: 0
 NIM's PID: 17324
   2 locally connected Clients with PIDs:
 rmcd( 17378) hagsd( 17266)
   Configuration Instance = 1067555418
   Default: HB Interval = 1 secs. Sensitivity = 8 missed beats
   Daemon employs no security
   Segments pinned: Text Data Stack.
   Text segment size: 131611 KB. Static data segment size: 595 KB.
   Dynamic data segment size: 939. Number of outstanding malloc: 130
   User time 0 sec. System time 0 sec.
   Number of page faults: 1100. Process swapped out 0 times.
   Number of nodes up: 2. Number of nodes down: 0.
 r01n33:~ # lssrc -ls cthags
 Subsystem         Group            PID     Status
  cthags           cthags           17266   active
 3 locally-connected clients.  Their PIDs:
 17200(IBM.ConfigRMd) 17378(rmcd) 18344(mmfsd)
 HA Group Services domain information:
 Domain established by node 1
 Number of groups known locally: 5
                    Number of   Number of local
 Group name         providers   providers/subscribers
 GpfsRec.1                2           1           0
 Gpfs.1                   2           1           0
 rmc_peers                2           1           0
 NsdGpfs.1                2           1           0
 IBM.ConfigRM             2           1           0

If the portability layer was not properly compiled and installed on all the nodes, mmstartup will fail.

You may also see, in the GPFS log file (/var/mmfs/gen/mmfslog), entries such as those shown in Example 6-20.

Example 6-20. mmfsd will not start without the portability layer
 # cat /var/mmfs/gen/mmfslog
 Wed Oct 29 08:07:10 PST 2003 runmmfs starting
 Removing old /var/adm/ras/mmfs.log.* files:
 /bin/mv: cannot stat `/var/adm/ras/mmfs.log.previous': No such file or directory
 Unloading modules from /usr/lpp/mmfs/bin
 Error: /usr/lpp/mmfs/bin/mmfslinux kernel extension does not exist.
        Please compile a custom mmfslinux module for your kernel.
        See /usr/lpp/mmfs/src/README for directions.
 Error: unable to verify kernel/module configuration
 Loading modules from /usr/lpp/mmfs/bin
 Error: /usr/lpp/mmfs/bin/mmfslinux kernel extension does not exist.
        Please compile a custom mmfslinux module for your kernel.
        See /usr/lpp/mmfs/src/README for directions.
 Error: unable to verify kernel/module configuration
 Wed Oct 29 08:07:10 PST 2003 runmmfs: error in loading or unloading the mmfs kernel extension
 Wed Oct 29 08:07:10 PST 2003 runmmfs: stopping SRC ...
 0513-044 The mmfs Subsystem was requested to stop.
 Wed Oct 29 08:07:10 PST 2003 runmmfs: received an SRC stop request; exiting

In our case, it was compiled and installed properly, and the GPFS kernel modules show up as illustrated in Example 6-21.

Example 6-21. GPFS kernel modules loaded
 r01n33:~ # lsmod
 Module                  Size  Used by     Tainted: PF
 mmfs                 1106392   1
 mmfslinux             219800   1  [mmfs]
 tracedev               14880   1  [mmfs mmfslinux]
 ipv6                  481800  -1  (autoclean)
 key                   102936   0  (autoclean) [ipv6]
 e1000                 152368   1
 e100                  106696   1
 lvm-mod               110912   0  (autoclean)

Create the local disk partitions (optional)

GPFS will accept anything that looks like a block device, whether it is a partition or an entire disk.

In our example, we set aside a partition on each node, as listed in Example 6-22. On the first node, we used /dev/sda4, and on the second node, we used /dev/sdb4.

Example 6-22. Local disk partitions
 r01n33:~ # fdisk /dev/sda

 The number of cylinders for this disk is set to 34715.
 There is nothing wrong with that, but this is larger than 1024,
 and could in certain setups cause problems with:
 1) software that runs at boot time (e.g., old versions of LILO)
 2) booting and partitioning software from other OSs
    (e.g., DOS FDISK, OS/2 FDISK)

 Command (m for help): p

 Disk /dev/sda: 64 heads, 32 sectors, 34715 cylinders
 Units = cylinders of 2048 * 512 bytes

    Device Boot    Start       End      Blocks   Id  System
 /dev/sda1   *         1         4        4080   41  PPC PReP Boot
 /dev/sda2             5      1029     1049600   82  Linux swap
 /dev/sda3          1030     10246     9438208   83  Linux
 /dev/sda4         10247     26631    16778240   83  Linux

 r01n34:~ # fdisk /dev/sdb

 The number of cylinders for this disk is set to 34715.
 There is nothing wrong with that, but this is larger than 1024,
 and could in certain setups cause problems with:
 1) software that runs at boot time (e.g., old versions of LILO)
 2) booting and partitioning software from other OSs
    (e.g., DOS FDISK, OS/2 FDISK)

 Command (m for help): p

 Disk /dev/sdb: 64 heads, 32 sectors, 34715 cylinders
 Units = cylinders of 2048 * 512 bytes

    Device Boot    Start       End      Blocks   Id  System
 /dev/sdb1   *         1         4        4080   41  PPC PReP Boot
 /dev/sdb2             5      1029     1049600   82  Linux swap
 /dev/sdb3          1030      9222     8389632   83  Linux
 /dev/sdb4          9223     25607    16778240   83  Linux

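Because GPFS only needs block devices, a quick sanity check before writing the NSD descriptor file is to confirm that the chosen partitions are visible on the nodes that will serve them. A minimal sketch, using our device and node names:

    # Each command should list a block device entry on the serving node
    ssh gr01n33 ls -l /dev/sda4
    ssh gr01n34 ls -l /dev/sdb4
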
Create the Network Shared Disks (NSDs)

It is now time to create the Network Shared Disks (NSDs) that we will use to store our file data (this step could have been done after cluster creation).

We used the mmcrnsd command to perform this operation, giving it a description file, listed in Example 6-23. The mmlsnsd command is used to display the NSDs just created.

The purpose of this step is to prepare all the NSDs and to assign each NSD a name that is unique across the cluster; a PVID is stored in the NSD itself. We need a unified naming scheme because each node names its local partitions independently of the other nodes.

As Example 6-23 on page 300 shows, we have two NSDs (one per node), each capable of storing file system data and metadata, and each served by a single server (we have no twin-tailed disks here). Each disk belongs to its own failure group, as there is no common point of failure for the two disks, which are on separate machines. Refer to the GPFS documentation for a full description of the syntax.

Example 6-23. Create the NSDs and check them
 r01n33:~ # cat /tmp/gpfsnsd
 /dev/sda4:gr01n33::dataAndMetadata:1
 /dev/sdb4:gr01n34::dataAndMetadata:2
 r01n33:~ # mmcrnsd -F /tmp/gpfsnsd
 mmcrnsd: Propagating the changes to all affected nodes. This is an asynchronous process.
 r01n33:~ # mmlsnsd

  File system   Disk name    Primary node            Backup node
 ---------------------------------------------------------------------------
  (free disk)   gpfs1nsd     gr01n33
  (free disk)   gpfs2nsd     gr01n34

Create a file system

GPFS is in operation, and we have NSDs available for receiving data. Now we can create a file system. We do this using the mmcrfs command, as shown in Example 6-24 on page 300.

mmcrnsd is clever enough to modify the disk description file that we gave it in "Create the Network Shared Disks (NSDs)" on page 299, converting it to a format suitable for the mmcrfs command: the local partition names are replaced by the global, cluster-wide NSD names. Refer to the mmcrfs man page for a detailed explanation of its syntax. In our example, we use no data replication, and we chose to name the file system /dev/gpfs0 and to mount it automatically (-A 1) under /bigfs.

The mmlsdisk command lists the current usage of the NSD disks by GPFS. df -h shows that the file system is indeed mounted and the dd command demonstrates that we can write into the GPFS file system. The file system is mounted on all the nodes in the nodeset.

Example 6-24. GPFS file system creation
 r01n33: # cat /tmp/gpfsnsd
 # /dev/sda4:gr01n33::dataAndMetadata:1
 gpfs1nsd:::dataAndMetadata:1
 # /dev/sdb4:gr01n34::dataAndMetadata:2
 gpfs2nsd:::dataAndMetadata:2
 r01n33: # mmcrfs /bigfs gpfs0 -F /tmp/gpfsnsd -C 1

 The following disks of gpfs0 will be formatted on node r01n33:
     gpfs1nsd: size 16778240 KB
     gpfs2nsd: size 16778240 KB
 Formatting file system ...
 Creating Inode File
 Creating Allocation Maps
 Clearing Inode Allocation Map
 Clearing Block Allocation Map
 Flushing Allocation Maps
 Completed creation of file system /dev/gpfs0.
 mmcrfs: Propagating the changes to all affected nodes. This is an asynchronous process.
 r01n33:~ # mmlsdisk gpfs0
 disk         driver   sector failure holds    holds
 name         type       size   group metadata data  status        availability
 ------------ -------- ------ ------- -------- ----- ------------- ------------
 gpfs1nsd     nsd         512       1 yes      yes   ready         up
 gpfs2nsd     nsd         512       2 yes      yes   ready         up
 r01n33:~ # df -h
 Filesystem            Size  Used Avail Use% Mounted on
 /dev/sda3             9.1G  4.5G  4.6G  50% /
 shmfs                 7.3G     0  7.3G   0% /dev/shm
 /dev/gpfs0             33G   43M   32G   1% /bigfs
 r01n33:~ # dd if=/dev/zero of=/bigfs/junk bs=1024k count=1024
 1024+0 records in
 1024+0 records out
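To confirm that the file system is indeed mounted on the other node as well, a quick check such as the following can be run (gr01n34 is our second node):

    # /bigfs should appear on every node in the nodeset
    ssh gr01n34 df -h /bigfs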

6.3.5 Shutting down and restarting GPFS

To shut down GPFS, the first step is to unmount the GPFS file systems on all the nodes where they are mounted. Then the mmshutdown command is issued on one of the nodes as shown in Example 6-25 on page 301.
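The unmount step can be scripted; a minimal sketch for our two-node setup follows (mmshutdown -a also forces the unmount, as Example 6-25 shows, but unmounting cleanly first is good practice):

    # Unmount the GPFS file system on every node before shutting GPFS down
    for h in gr01n33 gr01n34; do ssh $h umount /bigfs; done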

Example 6-25. Shutting down GPFS
 r01n33:~ # mmshutdown -a
 Thu Oct 30 14:57:19 PST 2003: mmshutdown: Starting force unmount of GPFS file systems
 Thu Oct 30 14:57:24 PST 2003: mmshutdown: Shutting down GPFS daemons
 r01n33:  0513-044 The mmfs Subsystem was requested to stop.
 r01n33:  Shutting down!
 r01n33:  Unloading modules from /usr/lpp/mmfs/bin
 r01n33:  Unloading module mmfs
 r01n33:  Unloading module mmfslinux
 r01n33:  Unloading module tracedev
 r01n34:  0513-044 The mmfs Subsystem was requested to stop.
 r01n34:  Shutting down!
 r01n34:  Unloading modules from /usr/lpp/mmfs/bin
 r01n34:  Unloading module mmfs
 r01n34:  Unloading module mmfslinux
 r01n34:  Unloading module tracedev
 Thu Oct 30 14:57:33 PST 2003: mmshutdown: Finished

To restart GPFS, simply issue the mmstartup -C nodesetID command as shown in Example 6-19 on page 296. If you created your GPFS file systems with the automatic mount-on-startup option, they should be mounted now.
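As a minimal sketch, a restart in our setup (nodeset id 1, mount point /bigfs as used earlier) and a quick verification would look like this:

    # Restart GPFS on nodeset 1 and confirm the file system comes back
    mmstartup -C 1
    df -h /bigfs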
