At some point in your career as a Red Hat Enterprise Linux administrator, maybe even on the Red Hat exams, you're going to be faced with a system that will not boot. It will be up to you to determine the cause of the problem and implement a fix. Sometimes, the problem may be due to hardware failure: for example, the system in question may have a bad power supply or may have experienced a hard disk crash.
Quite often, however, the failure of a system to boot can be traced back to the actions of a user: you, the system administrator! When you are editing certain system configuration files, typographical errors can render your system unbootable.
Any time you plan to make any substantial modifications to your system or change key configuration files, back them up first. Then, after making changes, you should actually reboot your system rather than assume that it will boot up the next time you need a reboot. It's much better to encounter problems while you can still remember exactly which changes you made. It is even better if you can go back to a working configuration file.
As described earlier in this chapter, the main tool is the linux rescue environment provided by the first installation CD. Per the Exam Prep guide, you also need to know how to diagnose and correct boot failures arising from bootloader, module, and filesystem errors. The discussion that follows is broken down into three sections accordingly. In addition, some of the key tools to diagnose and correct problems with network services, as described in previous chapters, are summarized here. This section also reviews the key tools that allow you to add, remove, and resize logical volumes. Finally, to diagnose and correct networking services problems where SELinux contexts are interfering with proper operation, use the Setroubleshoot browser described in Chapter 15.
The boot loader associated with Red Hat Enterprise Linux 5 is GRUB. For an extensive discussion, see Chapter 3. It can help you to know how to:
Associate the root directive with the partition that contains the /boot directory.
Boot into the desired, non-default runlevel.
Access the GRUB command line.
Test different GRUB commands.
Use command completion to find and use the exact names of your kernel and initial RAM disk.
While it isn't necessary to know all these skills, they can help you diagnose problems more quickly during your exam.
Exercise 16-6: Troubleshooting the Boot Loader
For this exercise, you'll need a partner. Have your partner make changes to your system. As your partner works to create a boot problem for you to solve on your computer, look away until the computer is rebooting.
It's most helpful if you have a VMware snapshot of your RHEL system. Problems like those created in this exercise have caused administrators to mess up their systems in other ways. You'll also need the first RHEL installation CD.
Back up the configuration file associated with the boot loader, /boot/grub/grub.conf. Make sure to back up this file to a non-standard location, in case your partner also backs up any files before changing them.
Open the /boot/grub/grub.conf configuration file in a text editor. Focus on the kernel command line, which might look like one of the following:
kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/VolGroup00/LogVol00
or
kernel /vmlinuz-2.6.18-8.el5 ro root=LABEL=/
Introduce a typographical error in the root directive in the kernel command line.
Reboot the system, and let your partner return to the computer. Tell him or her to address the error message shown. Give your partner the first RHEL 5 installation CD.
Make sure to tell your partner to back up any files that he or she might change to the appropriate home directory.
Whatever happens, restore the original /boot/grub/grub.conf configuration file when your partner is finished with this exercise. (Alternatively, you can restore the configuration from a VMware snapshot.)
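When you're the one fixing the problem, it helps to compare the root= value on each kernel line with the devices and labels that actually exist. The following is a minimal sketch that assumes a GRUB legacy grub.conf like the one shown in this exercise. The function name show_kernel_roots is hypothetical, and the function only reads the file. In the linux rescue environment, pass it /mnt/sysimage/boot/grub/grub.conf.

```shell
# show_kernel_roots (hypothetical helper): print each GRUB stanza title with the
# root= value from its kernel line, so a mistyped root device stands out.
show_kernel_roots() {
  awk '
    # Remember the most recent stanza title.
    $1 == "title" { title = $0; sub(/^[ \t]*title[ \t]*/, "", title) }
    # For each kernel line, find the root= argument (if any).
    $1 == "kernel" {
      root = "(no root= option)"
      for (i = 2; i <= NF; i++)
        if ($i ~ /^root=/) root = substr($i, 6)
      print title ": " root
    }
  ' "${1:-/boot/grub/grub.conf}"
}
```

You can then compare each printed value with what e2label reports for your partitions, or with the LVs listed under /dev/VolGroup00.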
Most kernels are compiled with loadable modules. Current Linux distributions, including RHEL, include key modules in the initial RAM disk, which is packaged as an initrd-* file in the /boot directory. As you can see in the GRUB configuration file, the initial RAM disk is normally associated with the last line in a GRUB configuration stanza. As described in Chapter 8, you can create a new initial RAM disk with the mkinitrd command. But errors are also possible, as you'll see in the following exercise.
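In the linux rescue environment, the running kernel comes from the installation CD, so its uname -r value may not match the installed kernel whose initial RAM disk you want to rebuild. The following minimal sketch uses a hypothetical function name, mkinitrd_cmds. It prints one mkinitrd command for each vmlinuz-* file it finds, but runs none of them, so you can check the version before you run the command yourself after chroot /mnt/sysimage. The -f switch lets mkinitrd overwrite an existing image.

```shell
# mkinitrd_cmds (hypothetical helper): for each kernel image in a boot
# directory, print the mkinitrd command that would rebuild its initrd.
# Nothing is executed; review the output, then run the command you need.
mkinitrd_cmds() {
  bootdir=${1:-/boot}
  for kernel in "$bootdir"/vmlinuz-*; do
    [ -e "$kernel" ] || continue          # no match: the glob stays literal
    version=${kernel##*/vmlinuz-}          # e.g. 2.6.18-8.el5
    echo "mkinitrd -f $bootdir/initrd-$version.img $version"
  done
}
```

Inside chroot /mnt/sysimage, running mkinitrd_cmds with no argument lists commands for the kernels in /boot.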
Exercise 16-7: Troubleshooting Boot Loader Modules
For this exercise, you'll need a partner. Have your partner make changes to your system. As your partner works to create a boot problem for you to solve on your computer, look away until the computer is rebooting.
It's most helpful if you have a VMware snapshot of your RHEL system. Problems like those created in this exercise have caused administrators to mess up their systems in other ways. You'll also need the first RHEL installation CD.
Back up the configuration file associated with the boot loader, /boot/grub/grub.conf. Make sure to back up this file to a non-standard location, in case your partner also backs up any files before changing them.
Open the /boot/grub/grub.conf configuration file in a text editor. Focus on the initrd command line, which might look like the following:
initrd /initrd-2.6.18-8.el5.img
Misspell both initrd words in this line.
Reboot the system, and let your partner return to the computer. Tell him or her to address the error message shown. Give your partner the first RHEL 5 installation CD.
Make sure to tell your partner to back up any files that he or she might change to the appropriate home directory.
Whatever happens, restore the original /boot/grub/grub.conf configuration file when your partner is finished with this exercise. (Alternatively, you can restore the configuration from a VMware snapshot.)
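If you're the one fixing this problem, a quick way to spot a mangled initrd line is to go through grub.conf one stanza at a time. Check that each stanza has an initrd line and that the file it names really exists. This is a minimal sketch, and the function name is hypothetical. The second argument is the directory where GRUB's root partition is mounted. It defaults to /boot, which assumes a separate /boot partition, so a path such as /initrd-2.6.18-8.el5.img in grub.conf corresponds to /boot/initrd-2.6.18-8.el5.img. From the rescue environment, pass /mnt/sysimage/boot/grub/grub.conf and /mnt/sysimage/boot.

```shell
# check_boot_images (hypothetical helper): for each GRUB stanza, report whether
# it has an initrd line and whether the named image file exists.
#   $1: grub.conf path (default /boot/grub/grub.conf)
#   $2: directory where GRUB's root partition is mounted (default /boot)
check_boot_images() {
  awk -v bootdir="${2:-/boot}" '
    function report() {
      if (title == "") return                      # nothing seen yet
      if (initrd == "")
        print title ": no initrd line"
      else if (system("test -e \"" bootdir initrd "\"") != 0)
        print title ": missing " initrd
      else
        print title ": ok"
    }
    $1 == "title" {
      report()                                      # finish the previous stanza
      title = $0; sub(/^[ \t]*title[ \t]*/, "", title); initrd = ""
    }
    $1 == "initrd" { initrd = $2 }
    END { report() }
  ' "${1:-/boot/grub/grub.conf}"
}
```

A misspelled initrd keyword shows up as "no initrd line", and a misspelled file name shows up as "missing".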
Although there are potentially many things that will prevent a system from booting, these problems can be generally categorized as either hardware problems or software and configuration problems. The most common hardware-related problem you will probably encounter is a bad hard drive; like all mechanical devices with moving parts, these have a finite lifetime and will eventually fail. Fortunately, the Red Hat exams do not require you to address hardware failures.
Software and configuration problems, however, can be a little more difficult. At first glance, they can look just like regular hardware problems.
In addition to knowing how to mount disk partitions, edit files, and manipulate files, you will need to know how to use several other commands to fix problems from rescue mode or single-user mode. The most useful of these are the df, fdisk, and fsck commands. To diagnose a problem, you need to know how these commands work at least at a rudimentary level.
The Linux df command was covered briefly in Chapter 4. When you use df, you can find mounted directories, the capacity of each partition, and the percentage of each partition that's filled with files. The output shown back in Figure 16-8 lists sizes in kilobytes. There are a couple of simple variations; the following commands report usage in megabytes and in inodes, respectively:
# df -m
# df -i
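The df output is easy to scan by eye on a small system. If you'd rather have the shell flag nearly full filesystems, here is a minimal sketch. The function name fs_over is hypothetical. The -P switch asks df for POSIX output, which keeps each filesystem on a single line even when a long device name, such as an LVM path, would otherwise wrap.

```shell
# fs_over PERCENT (hypothetical helper): read `df -P` or `df -Pi` output on
# stdin; print the mount point and use% of each filesystem at or above PERCENT.
fs_over() {
  awk -v limit="$1" '
    NR > 1 {                      # skip the header line
      use = $5; sub(/%/, "", use) # "95%" -> "95"
      if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

For example, df -P | fs_over 90 checks space, and df -Pi | fs_over 90 checks inodes.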
The Linux fdisk and parted utilities were covered briefly in Chapter 4. When you use fdisk or parted, you can find the partitions you have available for mounting. For example, the fdisk -l /dev/hda (or parted /dev/hda print) command lists available partitions on the first IDE hard disk:
# fdisk -l /dev/hda
Disk /dev/hda: 15.0GB, 15020457984 bytes
240 heads, 63 sectors/track, 1940 cylinders
Units = cylinders of 15120 * 512 = 7741440 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1         949     7174408+   b  Win95 FAT32
/dev/hda2             950         963      105840   83  Linux
/dev/hda3             964        1871     6864480   83  Linux
/dev/hda4            1872        1940      521640    f  Win95 Ext'd (LBA)
/dev/hda5            1872        1940      521608+  82  Linux swap
Looking at the output from fdisk, it's easy to identify the partitions configured with Linux partition types: /dev/hda2, /dev/hda3, and /dev/hda5 (swap). Given the size of each partition, it is reasonable to conclude that /dev/hda2 is associated with /boot, and /dev/hda3 is associated with root (/). Here's a fairly complex output from parted:
# parted /dev/sda print
Model: ATA HDS728080PLA380 (scsi)
Disk /dev/sda: 82.3GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start   End     Size    Type      File system  Flags
 1      32.3kB  197MB   197MB   primary   ext3         boot
 2      197MB   15.2GB  15.0GB  primary   ext3
 3      15.2GB  16.2GB  1003MB  primary   linux-swap
 4      16.2GB  82.3GB  66.1GB  extended
 5      16.2GB  16.3GB  107MB   logical   ext3
 6      16.3GB  26.8GB  10.5GB  logical   ext3
 7      26.8GB  41.8GB  15.0GB  logical   fat32        lba
 8      41.8GB  51.8GB  10.0GB  logical   ext3
 9      51.8GB  82.3GB  30.5GB  logical   ext3
Information: Don't forget to update /etc/fstab, if necessary.
In this example, it's easy to identify the Linux swap partition, /dev/sda3. Since /boot partitions are small and normally configured toward the front of a drive (with a boot flag), it's reasonable to associate /boot with /dev/sda1.
For simple partitioning schemes, this is easy. It gets far more complicated when you have lots of partitions, which is why you should always keep documentation that clearly identifies your partition layout. Consider the following example:
# fdisk -l /dev/hda
Disk /dev/hda: 26.8 GB, 26843545600
255 heads, 63 sectors/track, 3263 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   83  Linux
/dev/hda2              14         268     2048287+   b  Win95 FAT32
/dev/hda3             269         396     1028160   83  Linux
/dev/hda4             397        3263    23029177+   f  Win95 Ext'd (LBA)
/dev/hda5             397        1097     5630751   83  Linux
/dev/hda6            1098        1734     5116671   83  Linux
/dev/hda7            1735        1989     2048256   83  Linux
/dev/hda8            1990        2244     2048256   83  Linux
/dev/hda9            2245        2372     1028218+  83  Linux
/dev/hda10           2373        2499     1020096   82  Linux swap
/dev/hda11           2500        2626     1020096   83  Linux
/dev/hda12           2627        2753     1020096   83  Linux
/dev/hda13           2754        2880     1020096   83  Linux
/dev/hda14           2881        3007     1020096   83  Linux
/dev/hda15           3008        3134     1020096   83  Linux
/dev/hda16           3135        3236     1020096   83  Linux
In this example, it's easy to identify the Linux swap partition, /dev/hda10. Since /boot partitions are small and normally configured toward the front of a drive, it's reasonable to associate /boot with /dev/hda1.
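With a listing this long, it can help to let the shell pick out partitions by their Id column. This is a minimal sketch that reads fdisk -l output and allows for the extra * field that appears in the Boot column. The function name fdisk_by_id is hypothetical. Id 83 is Linux, 82 is Linux swap, and 8e is Linux LVM.

```shell
# fdisk_by_id ID (hypothetical helper): read `fdisk -l` output on stdin and
# print each partition whose Id column matches ID (e.g. 83, 82, or 8e).
fdisk_by_id() {
  awk -v want="$1" '
    $1 ~ /^\/dev\// {
      # A "*" in the Boot column shifts the remaining fields right by one.
      id = ($2 == "*") ? $6 : $5
      if (id == want) print $1
    }'
}
```

For the listing above, fdisk -l /dev/hda | fdisk_by_id 82 prints /dev/hda10.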
However, that is just a guess; some trial and error may be required. For example, after mounting /dev/hda1 on an empty directory, you would want to check the contents of that directory for the typical contents of /boot.
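A read-only mount means that this trial and error won't change anything on the candidate partition. This minimal sketch assumes an empty /mnt/test directory, which is a hypothetical name. It also uses a hypothetical helper, looks_like_boot. The helper checks the top of the mounted partition for a kernel image and a GRUB configuration file, which is what a separate /boot partition normally contains.

```shell
# looks_like_boot (hypothetical helper): succeed if DIR holds at least one
# vmlinuz-* kernel image and a grub/grub.conf file.
looks_like_boot() {
  dir=$1
  ls "$dir"/vmlinuz-* >/dev/null 2>&1 || return 1
  [ -f "$dir/grub/grub.conf" ]
}

# Typical use on the candidate partition (run as root):
#   mkdir -p /mnt/test
#   mount -o ro /dev/hda1 /mnt/test
#   looks_like_boot /mnt/test && echo "/dev/hda1 looks like /boot"
#   umount /mnt/test
```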
Based on the previous output from fdisk -l, you could probably use a little help to identify the filesystems associated with the other partitions. The e2label command can help. When you set up a new filesystem, the associated partition is normally marked with a label. For example, the following command tells you that /dev/hda5 carries the label /usr, which suggests that it is normally mounted on the /usr directory.
# e2label
Usage: e2label device [newlabel]
# e2label /dev/hda5
/usr
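To see the labels of several partitions at once, you can loop over them. This is a minimal sketch with a hypothetical function name. It assumes that e2label prints an empty line for an unlabeled ext2/ext3 filesystem and fails for other partition types, such as swap or FAT.

```shell
# list_labels DEVICE... (hypothetical helper): print each device with its
# ext2/ext3 label, or a note when there is no label or e2label fails.
list_labels() {
  for dev in "$@"; do
    if label=$(e2label "$dev" 2>/dev/null); then
      [ -n "$label" ] || label="(no label)"
    else
      label="(not ext2/ext3)"
    fi
    printf '%s\t%s\n' "$dev" "$label"
  done
}
```

For the 16-partition layout shown earlier, list_labels /dev/hda[0-9]* covers every partition in one pass.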
You can get a lot more information on each partition with the dumpe2fs command, as shown in Figure 16-9.
Figure 16-9: The dumpe2fs command provides a lot of information.
The dumpe2fs command not only does the job of e2label but also tells you about the format, whether it has a journal, and the block size. Proceed further through the output, and you'll find the locations for backup superblocks, which can help you use the fsck or e2fsck command to select the appropriate superblock for your Linux partition.
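To pull just the backup superblock locations out of that long listing, you can filter the dumpe2fs output. This is a minimal sketch, and the function name is hypothetical. It relies on the "Backup superblock at N" lines that dumpe2fs prints for each block group that holds a copy. Because dumpe2fs normally reads the primary superblock, it's worth recording this list while the filesystem is still healthy.

```shell
# backup_superblocks DEVICE (hypothetical helper): list the block numbers of
# the backup superblocks that dumpe2fs reports for an ext2/ext3 filesystem.
backup_superblocks() {
  dumpe2fs "$1" 2>/dev/null |
    sed -n 's/.*Backup superblock at \([0-9][0-9]*\).*/\1/p'
}
```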
On the Job: fsck is a "front end" for e2fsck, which is used to check partitions formatted with the ext2 and ext3 filesystems.
You should also know how to use the fsck command. This command is a front end for most of the filesystem formats available in Linux, such as ext2, ext3, reiserfs, vfat, and more. This command is used to check the filesystem on a partition for consistency. In order to use the fsck command effectively, you need to understand something about how filesystems are laid out on disk partitions.
When you format a disk partition under Linux using the mkfs command, it sets aside a certain portion of the disk to use for storing inodes, which are data structures that contain the actual disk block addresses that point to file data on a disk. The mkfs command also stores information about the size of the filesystem, the filesystem label, and the number of inodes in a special location at the start of the partition called the superblock. If the superblock is corrupted or destroyed, the remaining information on the disk is unreadable. Because the superblock is so vital to the integrity of the data on a partition, the mkfs command makes duplicate copies of the superblock at fixed intervals on the partition, which you can find with the dumpe2fs command described earlier.
The fsck command checks for and corrects problems with filesystem consistency by looking for things such as disk blocks that are marked as free but are actually in use (and vice versa), inodes that don't have a corresponding directory entry, inodes with incorrect link counts, and a number of other problems. The fsck command will also fix a corrupted superblock. If fsck fails due to a corrupt superblock, you can use the fsck command with the -b option to specify an alternative superblock. For example, the following command performs a consistency check on the filesystem on disk partition /dev/hda5, using the superblock located at disk block 8193:
# fsck -b 8193 /dev/hda5
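The value 8193 in that example is not arbitrary. With mkfs defaults, each block group holds 8 × blocksize blocks (one bitmap block's worth of bits), and the first backup superblock sits at the start of group 1. On a filesystem with 1024-byte blocks, block numbering starts at 1, which gives 8192 + 1 = 8193. With 4096-byte blocks, the first backup is at 32768. The following minimal sketch does that arithmetic under the assumption of a default layout, and its function name is hypothetical. Take the block size you pass from dumpe2fs or your own notes. When you use -b, the e2fsck -B option supplies the matching block size.

```shell
# first_backup_sb BLOCKSIZE (hypothetical helper): print the block number of
# the first backup superblock for an ext2/ext3 filesystem with default layout.
first_backup_sb() {
  bs=$1                                   # block size in bytes
  blocks_per_group=$((bs * 8))            # one bitmap block covers 8*bs blocks
  if [ "$bs" -eq 1024 ]; then
    first_data_block=1                    # block 0 is the boot block here
  else
    first_data_block=0
  fi
  echo $((blocks_per_group + first_data_block))
}

# Example (the filesystem must be unmounted):
#   e2fsck -b "$(first_backup_sb 4096)" -B 4096 /dev/hda5
```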
Get to know the key commands and the associated options for checking disks and partitions: fdisk, e2label, dumpe2fs, and fsck. Practice using these commands to check your partitions, but only on a test computer! (Some of these commands can destroy data.)
You may find corruption in some key files or commands such as bash or init. If you do, one option is to reload the files from the original RPMs. For example, if the init command were corrupted or erased, you could reload it from the SysVinit RPM.
When you boot your system into the linux rescue environment, you'll still need to connect to the network source associated with the installation RPMs. So you should set up networking as described earlier when you boot into the linux rescue environment. Assume the /bin/bash command is corrupt or missing and has to be replaced. With that in mind, take the following steps:
Run the df command. You should see how the linux rescue environment mounted your partitions.
If possible, mount the network source. If it's an NFS share in the /inst directory on the server.example.com system, you can mount it on the /mnt/source directory (create /mnt/source if required) with the following command:
# mount -t nfs server.example.com:/inst /mnt/source
Copy the bash RPM from the /mnt/source/Server directory to the current directory. It isn't strictly necessary, but it can help if your connection is dropped. Use the following command:
# cp /mnt/source/Server/bash-*.rpm .
Install the bash RPM, forcing installation over current files:
# rpm -Uvh --force --root=/mnt/sysimage bash-*.rpm
Run the following chroot command to move into the standard directory tree:
# chroot /mnt/sysimage
Check the status of the bash command.
# rpm -Vf /bin/bash
If you see no output, you'll know that there is no longer a problem with the bash command. (You can also use this command at the start of the process to see if there is a problem.) You should now be able to run the exit command twice to reboot your computer; at least this problem should be solved.
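When rpm -V does produce output, each line starts with a string of test flags, with a dot for each test that passed. The following minimal sketch spells the flags out, and its function name is hypothetical. The flags are S for size, M for mode, 5 for digest, D for device, L for link, U for user, G for group, and T for modification time. Newer rpm versions add P for capabilities. A ? means the test couldn't be performed. The same check works for other files, such as /sbin/init, and rpm -qf /sbin/init names the package that owns that file.

```shell
# explain_rpm_verify (hypothetical helper): read `rpm -V` output on stdin and
# name the attributes that differ for each file (the path is the last field).
explain_rpm_verify() {
  awk '
    BEGIN { n = split("size mode digest device link user group mtime capabilities", name, " ") }
    $1 == "missing" { print $NF ": missing"; next }
    {
      out = ""
      for (i = 1; i <= n; i++) {
        c = substr($1, i, 1)              # flag character at position i
        if (c == "?")
          out = out " " name[i] "(untested)"
        else if (c != "." && c != "")
          out = out " " name[i]
      }
      print $NF ":" out
    }'
}
```

For example, rpm -V bash | explain_rpm_verify turns a line such as S.5....T /bin/bash into a list of the attributes that changed.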
The Red Hat Exam Prep guide says that you may have to diagnose and correct problems with network services during the Troubleshooting and System Maintenance portion of the RHCE exam. These are the same network services that you may need to configure during the Installation and Configuration portion of the same exam. Example scenarios and solutions are shown as follows. Needless to say, the solutions are far from complete: for example, firewalls and network configuration issues can prevent communication to any network service.
SCENARIO | SOLUTION |
---|---|
A connection to a local Web server results in a 404 "file not found" error. | Make sure there is an appropriate index.html file in the DocumentRoot directory, and that its SELinux context is appropriate for Web content; compare it with the context of the directory itself, as shown by an ls -Zd /var/www/html command. |
You're unable to connect to your Samba home directory on a remote system. | Check the [homes] share in /etc/samba/smb.conf. Use testparm to check that configuration file for errors. |
You have trouble connecting to a remote NFS share. | Make sure the directory is shared properly in /etc/exports, that it has been exported and is visible from a showmount -e hostname command, and that the appropriate services are running. |
You're unable to log into the remote FTP server anonymously. | Make sure anonymous access is enabled in /etc/vsftpd/vsftpd.conf. |
You can't connect through the Web proxy server. | Make sure appropriate access is enabled to the right network in /etc/squid/squid.conf. |
You're unable to send outgoing e-mail through a local network server. | Make sure one outgoing mail server (sendmail or Postfix) is active and properly configured. |
You're unable to receive secure POP3 e-mail through a local network server. | Make sure the Dovecot service is properly configured in /etc/dovecot.conf. If you use an alternative service, edit its configuration file instead. |
You want to limit SSH access to specific users. | Use the AllowUsers directive in /etc/ssh/sshd_config. |
You may need to add, remove, and/or resize a logical volume (LV) during the Troubleshooting and System Maintenance portion of the RHCE exam. As you'll recall from Chapter 8, working with LVs involves commands that manage physical volumes (PVs), LVs, and volume groups (VGs). You can manage this process from the command line or the LVM Configuration Tool. For detailed information on each process, see Chapter 8. In this section, we'll review the basics. (If you're not already familiar with logical volumes, reread Chapter 8. Otherwise, this section will probably confuse you to no end.)
First, to check the current status of your LVs, PVs, and VGs, the lvdisplay, pvdisplay, and vgdisplay commands can help. Together with the mount command and the contents of /etc/fstab, those commands can confirm your configuration and the currently mounted LVs.
If you need to add an LV, and you don't have sufficient free space in your PVs/VGs, you'll need to create a new partition, configured to the appropriate partition type (Linux LVM, type 8e in fdisk) in a tool such as fdisk or parted. This section examines the process of creating a new LV. If you have room in existing PVs or VGs, you can skip ahead and add an LV more quickly.
To create a new PV from a properly configured partition, you can use the pvcreate command. For example, if you've just created /dev/hda9 and /dev/hda10 for this purpose, use the following command:
# pvcreate /dev/hda9 /dev/hda10
You can reverse the process with the pvremove command-in this case:
# pvremove /dev/hda9 /dev/hda10
If you've already assigned space from the noted partitions to an LV, the pvremove command will give you an error message.
To create a new VG from one or more unused PVs, use the vgcreate command. For example, if you wanted to create a VG named myvg from the aforementioned PVs, use the following command:
# vgcreate myvg /dev/hda9 /dev/hda10
If this command works, you'll find a VG device in /dev/myvg.
To create a new LV from an available VG, the lvcreate command can help. For example, if you wanted to create a new LV of 1000MB from the VG named myvg, run the following command:
# lvcreate -L 1000 myvg
You'll see the name of the new LV in the output; normally it'll be something like lvol0. In this case, the LV device is /dev/myvg/lvol0.
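The steps above can be collected into one sequence. This minimal sketch only prints the commands, so you can review them before typing them in. Both the function name and the LV name data are hypothetical. Instead of accepting the default lvol0, it names the LV with the lvcreate -n switch, and it formats the LV as ext3 with mkfs.ext3. You'd still need to mount the new LV and add it to /etc/fstab.

```shell
# plan_new_lv VG LV SIZE_MB PV... (hypothetical helper): print, in order, the
# commands that turn unused partitions into a formatted ext3 LV.
plan_new_lv() {
  vg=$1 lv=$2 size=$3
  shift 3                                 # the remaining arguments are PVs
  echo "pvcreate $*"
  echo "vgcreate $vg $*"
  echo "lvcreate -L ${size}M -n $lv $vg"
  echo "mkfs.ext3 /dev/$vg/$lv"
}
```

For the example in this section, run plan_new_lv myvg data 1000 /dev/hda9 /dev/hda10.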
Expanding the space available from an LV is a two-step process. First, you'll extend the LV into available free space, perhaps from a PV that you've just created. Then you'll expand the filesystem to fill the LV with the resize2fs command.
To expand an existing LV, use the lvextend command. The VG needs free space. If that space comes from a new PV, such as /dev/sda2, first add the PV to the VG with the vgextend command (for example, vgextend myvg /dev/sda2). Then, if you're increasing the size of the aforementioned LV by 500MB, use the following command:
# lvextend -L +500 /dev/myvg/lvol0 /dev/sda2
You can then resize ext2 or ext3 formatted filesystems with the resize2fs command. For the noted device, the command would be:
# resize2fs /dev/myvg/lvol0
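Putting the growth steps together gives a sequence like the following minimal sketch, which includes the vgextend step needed when the free space comes from a new partition. The function name is hypothetical. The sketch assumes that /dev/sda2 is a new, unused partition of type 8e, and it only prints the commands for review.

```shell
# plan_grow_lv VG LV ADD_MB NEW_PV (hypothetical helper): print the commands
# that add a new PV to the VG, extend the LV into it, and grow the filesystem.
plan_grow_lv() {
  vg=$1 lv=$2 add=$3 pv=$4
  echo "pvcreate $pv"
  echo "vgextend $vg $pv"
  echo "lvextend -L +${add}M /dev/$vg/$lv $pv"
  echo "resize2fs /dev/$vg/$lv"
}
```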
If you're decreasing the size of an LV, the process is more complex. You must shrink the filesystem before you shrink the LV, and an ext2 or ext3 filesystem can't be shrunk while it's mounted. You'll need to unmount the LV and check it with e2fsck -f. Then apply the resize2fs command to the LV device file, with the desired smaller size at the end of the command. Only then should you reduce the LV to that size with the lvreduce command.
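When shrinking, the order matters: the filesystem comes first and the LV second. The following minimal sketch prints the sequence for review. The function name and the /data mount point are hypothetical. Note that lvreduce asks for confirmation before it reduces the LV. Because LVM sizes are rounded to whole extents, some administrators shrink the filesystem a little below the target and then reduce the LV. They then run resize2fs once more with no size, so the filesystem fills the LV exactly.

```shell
# plan_shrink_lv VG LV NEW_MB MOUNTPOINT (hypothetical helper): print the
# shrink sequence; the filesystem is reduced before the LV underneath it.
plan_shrink_lv() {
  vg=$1 lv=$2 size=$3 mnt=$4
  dev=/dev/$vg/$lv
  echo "umount $mnt"
  echo "e2fsck -f $dev"
  echo "resize2fs $dev ${size}M"
  echo "lvreduce -L ${size}M $dev"
  echo "mount $dev $mnt"
}
```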
Removing an LV is easy. As long as the data in the LV has been saved and the LV is unmounted, the lvremove command works well. For example, if you've created a second LV in the same VG, you might use the following command:
# lvremove /dev/myvg/lvol1
The final Troubleshooting and System Maintenance item in the RHCE part of the Exam Prep guide is the ability to diagnose and correct networking services problems where SELinux contexts are interfering with proper operation.
In most cases, this is simpler than it looks. SELinux log messages are stored in /var/log/messages (or in /var/log/audit/audit.log when the audit daemon is running) with an avc label. Better yet, the Setroubleshoot browser can identify SELinux issues, describe causes, and even suggest solutions. Watch it for suggested commands, such as chcon to change SELinux contexts and setsebool to set SELinux booleans. All you need to do is open the browser in a GUI with the sealert -b command and browse the most recent errors. For more information, see Chapter 15.
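You may want a quick text summary before opening the browser, or instead of it. In that case you can pull the key fields out of each AVC denial: the program (comm), the permission that was denied, and the target's context (tcontext). This minimal sketch uses a hypothetical function name and assumes the usual single-line AVC message format. Comparing tcontext with the context the service expects often points to the chcon command or boolean you need.

```shell
# avc_summary (hypothetical helper): read log lines on stdin and print
# "program: denied-permission -> target-context" for each AVC denial.
avc_summary() {
  sed -n 's/.*avc: *denied *{ *\([^}]*[^ }]\) *}.*comm="\([^"]*\)".*tcontext=\([^ ]*\).*/\2: \1 -> \3/p'
}
```

For example, avc_summary < /var/log/audit/audit.log lists each denial on its own line.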