At some point in your career as a Red Hat Enterprise Linux administrator, maybe even on the Red Hat exams, you're going to be faced with a system that will not boot. It will be up to you to determine the cause of the problem and implement a fix. Sometimes, the problem may be due to hardware failure: the system in question has a bad power supply or has experienced a hard disk crash.
Quite often, however, the failure of a system to boot can be traced back to the actions of a user: you, the system administrator! When you are editing certain system configuration files, typographical errors can render your system unbootable.
Any time you plan to make any substantial modifications to your system or change key configuration files, back them up first. Then, after making changes, you should actually reboot your system rather than assume that it will boot up the next time you need a reboot. It's much better to encounter problems while you can still remember exactly which changes you made. It is even better if you can go back to a working configuration file.
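One way to follow this advice is to keep a dated copy of each file before you edit it. The following is a minimal sketch, not a Red Hat tool: the helper name backup_config and the .bak-DATE suffix are assumptions made for illustration.

```shell
# backup_config FILE: copy FILE to FILE.bak-YYYYMMDD-HHMMSS before editing.
# The helper name and the suffix format are assumptions for illustration.
backup_config() {
    src=$1
    if [ ! -f "$src" ]; then
        echo "backup_config: $src is not a regular file" >&2
        return 1
    fi
    dest="$src.bak-$(date +%Y%m%d-%H%M%S)"
    # -p preserves mode, ownership, and timestamps where permitted
    cp -p "$src" "$dest" && echo "$dest"
}

# Typical use, as root, before editing key files:
#   backup_config /etc/fstab
#   backup_config /boot/grub/grub.conf
```

The function prints the name of the copy, so you know which file to restore if the next reboot fails.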
Exam Watch | Know every detail that you can about the linux rescue environment for the RHCE exam. |
To prepare for boot failures, you should make sure you have a valid boot floppy for your system. But boot floppies can be lost. So it's also important to know how to use the Red Hat installation boot disk or CD to get to the linux rescue environment, first discussed in Chapter 2. Refer to that chapter for more information on creating an installation boot disk.
While most of this section applies to the RHCE exam, the RHCT part of the Exam Prep guide suggests that you need to know how to boot Linux into different runlevels, which you can learn about near the end of this part of this chapter.
When you installed RHEL 3, the last screen may have asked whether you wanted a boot disk. If you answered No to this prompt, you can still create a valid boot floppy for your computer using the mkbootdisk command. This command reads the selected kernel images in /boot and the default boot loader, GRUB or LILO, to create a LILO-style boot image on a floppy disk. For example, if the current version of the RHEL 3 kernel is 2.4.21-4.EL, use this command:
# mkbootdisk 2.4.21-4.EL
By booting from this disk, you may be able to fix a few problems, such as an accidentally deleted master boot record.
The mkbootdisk command in Red Hat Enterprise Linux 3 may not work as described. In a desktop environment, it worked perfectly. On my notebook computer, however, it created a syslinux.cfg file on the floppy, with the following two lines at the end of the file:
append initrd=initrd.img ro ro root=/dev/hda2
This actually causes a kernel panic. You can find out more in the Red Hat bug database at bugzilla.redhat.com. In this database, bug number 109834 suggests that this is also a problem on Red Hat Linux 9. But as shown in this bug and in bug 116446, the syslinux.cfg file is easily fixed. In this case, I'd combine these two lines into the following:
append initrd=initrd.img ro root=/dev/hda2
With this fix, the associated boot disk will now work.
If the kernel can't locate the root filesystem, or if the root filesystem is damaged, the Linux kernel will issue a kernel panic with messages similar to the following:
Creating root device
Mounting root filesystem
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystems with ordered data mode.
pivotroot: pivot_root (/sysroot,/sysroot/initrd) failed: 2)
Freeing unused kernel memory: 272k freed
Kernel panic: No init found. Try passing init= option to kernel
Although this may look very bad the first time you encounter it, often the problem can easily be fixed from the linux rescue environment with a little bit of work. Other problems may also require the use of the linux rescue environment, as described in the following section.
As discussed in Chapter 3, you can start Linux in rescue mode from the Red Hat Enterprise Linux installation CD or boot disk. When you type linux rescue at the installation boot prompt and go through the steps, the installation disks install a compact version of a root filesystem. As this information has to fit on two 1.44MB floppy disks, it includes a minimal set of utilities that will allow you to mount a disk and either repair the problem with the disk or edit the broken files on the disk.
Exam Watch | The RHCE portion of the Red Hat Exam Prep guide explicitly states that you need to know how to boot into linux rescue mode from the first RHEL 3 installation CD. |
To boot into rescue mode, first boot your system either using your boot floppy or directly with the first installation CD in a bootable CD-ROM drive, as shown in Figure 11-2.
Figure 11-2: Booting into linux rescue mode
If you've booted from the first RHEL 3 installation CD, you have two options at the boot prompt: you can type linux rescue or linux rescue askmethod and press ENTER. If you don't need access to the installation RPMs, or have booted from the RHEL 3 boot floppy, linux rescue is sufficient. If you may need the installation RPMs, linux rescue askmethod allows you to connect to the network installation server that you used in Chapter 2, which may also be available to you during the Red Hat exams.
On The Job | When booting from the RHEL 3 1.44 MB boot floppy, the linux rescue command is functionally equivalent to the linux rescue askmethod command when booting from the first RHEL 3 installation CD. |
When you run the linux rescue askmethod command, it's as if rescue mode isn't working; you're taken through the first steps of RHEL 3 installation process in text mode. You'll need to enter a language, a keyboard type, and the location of the RHEL 3 installation files, as shown in Figure 11-3.
Exam Watch | The Red Hat Exam Prep guide suggests that RHCE candidates need to know how to start rescue mode from the first RHEL 3 installation CD. If you need access to the RHEL 3 installation files on a remote network server, run the linux rescue askmethod command when you see the prompt shown in Figure 11-2. Otherwise, the linux rescue command is sufficient. |
Next, you're asked to configure an IP address, network mask, gateway, and primary (DNS) nameserver for the local computer. Follow any relevant instructions on your exam carefully. Then, as described in Chapter 2, you're asked to point to the network installation server name or IP address, as well as the directory that contains the Red Hat installation files. Once the files associated with the linux rescue environment are loaded, you'll see the screen shown in Figure 11-4.
Figure 11-4: Three choices in the linux rescue environment
As you can see, you now have three options. I address each option in detail in the following sections:
Continue will search through and mount the available filesystems.
Read-Only performs the same tasks as Continue, except all filesystems that are found are mounted read-only.
Skip does not try to look through the available filesystems. Instead, it proceeds directly to a root shell prompt.
When you select Continue from the screen shown in Figure 11-4, you're taken through the standard linux rescue environment. The rescue files search for your root directory (/) filesystem. If found, your standard root directory (/) is mounted on /mnt/sysimage. All of your other regular filesystems are subdirectories of root; for example, your /etc directory will be found on /mnt/sysimage/etc.
Not all of your filesystems may mount properly. You may see error messages such as:
Error mounting filesystem on sdb1: Invalid argument
This suggests that at least the filesystem on the /dev/sdb1 partition isn't working for some reason. If the linux rescue environment can mount your root directory (/), you'll see a message noting that your system has been mounted, as shown in Figure 11-5.
Figure 11-5: The linux rescue environment has found your root directory (/)
Click OK. You should see the following prompt messages:
Your system is mounted under the /mnt/sysimage directory.
When finished please exit from the shell and your system will reboot.
sh-2.05b#
You'll use the chroot /mnt/sysimage command shortly. Now you can work on repairing any files or filesystems that might be damaged. First, check for unmounted filesystems. Run a df command. The output should look similar to Figure 11-6.
Figure 11-6: Labels, filesystems, and partitions
Compare the result to the /mnt/sysimage/etc/fstab configuration file. If some filesystem is not mounted, it may be configured incorrectly in the fstab file. Alternatively, the label associated with a partition may not match the filesystem shown in your fstab file. For example, to find the label associated with /dev/sda1, run the following command:
# e2label /dev/sda1
which should return the name of a filesystem to be mounted on that partition such as /boot.
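To make that comparison less error prone, you could first list the labels that fstab expects. This is a sketch only: the function name fstab_labels is hypothetical, and it assumes that awk is available, which is more likely after chroot /mnt/sysimage than in the bare rescue shell.

```shell
# fstab_labels FILE: print "LABEL MOUNTPOINT" for every LABEL= entry in an
# fstab-style file. Commented-out lines are ignored because their first
# field does not start with LABEL=. The function name is hypothetical.
fstab_labels() {
    awk '$1 ~ /^LABEL=/ {
        sub(/^LABEL=/, "", $1)
        print $1, $2
    }' "$1"
}

# In the rescue environment you might then compare each expected label
# against the label the partition actually carries:
#   fstab_labels /mnt/sysimage/etc/fstab
#   e2label /dev/sda1
```

If e2label reports a label that does not appear in the list, either the fstab entry or the partition label needs to change.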
Sometimes an unmounted filesystem just needs a little cleaning; remember, a command such as the following cleans the /dev/sdb1 partition.
# fsck /dev/sdb1
The fsck command works only on an unmounted filesystem. For example, if you get a message such as:
WARNING!!! Running e2fsck on a mounted filesystem may cause SEVERE filesystem damage.
unmount the subject filesystem with a command such as umount /dev/sdb1. If that doesn't work, restart the rescue process. When you get to the screen shown in Figure 11-4, select Skip and read the 'No Mount linux rescue Environment' section later in this chapter.
Remember the message in Figure 11-5? It includes an important clue. All you need to do to restore the original filesystem structure is to run the following command:
# chroot /mnt/sysimage
When you use the rescue disk, your standard root directory (/) is actually mounted on the /mnt/sysimage directory. This command resets your standard root directory (/), so you don't have to go to the /mnt/sysimage subdirectory.
When you've made your changes, run the sync command, twice, to make sure any changes you've made are written to disk. Type the exit command, twice. Linux should automatically run the sync command again when you exit, making sure any changes are written to disk. Then it stops, allowing you to reboot or restart your computer.
On The Job | Normally, it should not be necessary to run the sync command. However, running it several times does make sure that any pending data is actually written to your floppy and hard disks. |
When you select the Read-Only option shown in Figure 11-4, you'll get the same basic prompt. There is little difference between regular and read-only rescue mode. The rescue system attempts to do everything that it would under regular mode, except all partitions are mounted read-only.
This is appropriate if you have a large number of mounted filesystems; it can help you cull through what is and isn't working with less risk of overwriting key configuration files.
When you select the Skip option shown in Figure 11-4, the installation files load a minimal root image into a RAM disk created by the kernel, and take you to a root shell prompt (#) as shown:
When finished, please exit from the shell and your system will reboot.
-/bin/sh-2.05b#
At this point, you have access to a basic set of commands. You can mount filesystems, create directories, move files, and edit files using vi. You can apply the fdisk and fsck commands to various hard disks and partitions. A few other basic commands are also available.
The greatest difficulty in operating from the rescue environment is that you are working with a minimal version of the Linux operating system. Many of the commands you are used to having at your disposal are not available at this level. If your root partition has not been completely destroyed, you may be able to mount this partition to your temporary root directory in memory and access commands from there.
But you may need a little help identifying the partitions on your system. As I'll show you shortly, the fdisk -l /dev/hda command lists the configured partitions on the first IDE hard drive. You can create a new directory such as /mnt/sysimage, mount a partition such as /dev/hda2 on that directory, and check the result with the following commands:
# mkdir /mnt/sysimage
# mount /dev/hda2 /mnt/sysimage
# ls /mnt/sysimage
If you can verify that you've mounted the standard root directory (/) filesystem on the /mnt/sysimage directory, you can run the chroot /mnt/sysimage command. You can then have full access to the commands and configuration files available under that mounted partition.
On The Job | If you mount partitions from your hard drive in rescue mode and then make changes to files on those partitions, remember to use the sync command. This writes your files to disk so the information isn't lost if you hit the power button on your computer. Alternatively, a umount command applied to any partition also writes data to disk. |
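Before you run chroot on a partition that you mounted by hand, it helps to confirm that it really holds the root directory (/) filesystem rather than, say, /boot. The check below is a rough sketch; the function name looks_like_root and the short list of directories it tests are assumptions, not a definitive test.

```shell
# looks_like_root DIR: succeed if DIR contains the directories and files
# you would expect at the top of a Linux root filesystem. The name and the
# list of checks are assumptions made for this sketch.
looks_like_root() {
    dir=$1
    for d in bin sbin etc; do
        [ -d "$dir/$d" ] || return 1
    done
    [ -f "$dir/etc/fstab" ]
}

# After mounting a candidate partition in the Skip environment:
#   mkdir /mnt/sysimage
#   mount /dev/hda2 /mnt/sysimage
#   looks_like_root /mnt/sysimage && chroot /mnt/sysimage
```

A mounted /boot partition fails this check because it has no etc directory, which tells you to unmount it and try the next partition.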
At the boot loader prompt, you can start Linux at a different runlevel. This may be useful for two purposes. If your default runlevel in /etc/inittab is 5, your system normally boots into the GUI. If you're having problems booting into the GUI, you can start RHEL into the standard text mode, runlevel 3.
Exam Watch | The current Red Hat Exam Prep guide states that RHCTs (and therefore also RHCEs) should be able to boot systems into different runlevels. The method is described in this section. |
One other option to help rescue a damaged Linux system is single-user mode. This is appropriate if your system can find at least the root filesystem (/). Your system may not have problems finding its root partition and starting the boot process, but it may encounter problems such as damaged configuration files or an inability to boot into one of the higher runlevels. When you boot into single-user mode, your options are similar to the standard linux rescue environment where the system has already been mounted and the chroot /mnt/sysimage command has been applied.
To boot into a different runlevel, first let us assume that you're using the default RHEL 3 boot loader, GRUB. In that case, press p (lowercase) at the GRUB menu to enter the GRUB password, if required. Then press a (lowercase) to modify the kernel arguments. When you see a line similar to
grub append> ro root=LABEL=/
add one of the following options (single, 1, or init=/bin/sh) to the end of that line:
grub append> ro root=LABEL=/ single
grub append> ro root=LABEL=/ 1
grub append> ro root=LABEL=/ init=/bin/sh
Alternatively, if you're using LILO, the linux single command will do nicely. Any of these commands will boot Linux into a minimal runtime environment, and you will receive a bash shell prompt (bash#).
Naturally, you can use the same technique to boot into another runlevel. For example, to boot from the GRUB boot loader into runlevel 3, navigate to where you can modify the kernel arguments, and add 3 to the end of the line:
grub append> ro root=LABEL=/ 3
On The Job | The terms boot loader and bootloader are used interchangeably. For the purpose of this book, I've used the term 'boot loader,' as that seems to be the direction of the Red Hat documentation. However, the term 'bootloader' is still common even in Red Hat documentation. |
When you boot into single-user mode, no password is required to access the system. Running your system in single-user mode is somewhat similar to running a system booted into rescue mode. Many of the commands and utilities you normally use are unavailable. You may have to mount additional drives or partitions and specify the full pathname when running some commands.
When you have corrected the problem, you can reboot the system. Alternatively, you can type the exit command to boot into the default runlevel as defined in /etc/inittab, probably runlevel 3 or 5.
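If you are not sure which runlevel the system will enter after exit, you can read it from the initdefault line in /etc/inittab. The helper below is a small sketch; the function name default_runlevel is hypothetical, and it assumes the usual id:N:initdefault: line format.

```shell
# default_runlevel FILE: print the runlevel named on the first active
# (uncommented) initdefault line of an inittab-style file.
# The function name is hypothetical.
default_runlevel() {
    awk -F: '$1 !~ /^#/ && $3 == "initdefault" { print $2; exit }' "$1"
}

# From single-user mode:
#   default_runlevel /etc/inittab
# From the linux rescue environment before chroot:
#   default_runlevel /mnt/sysimage/etc/inittab
```

An empty result suggests that the initdefault line is missing or commented out, which is itself worth fixing before you reboot.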
On The Job | In single-user mode, any user can change the root password. You do not want people rebooting your computer to go into single-user mode to change your root password. Therefore, it's important to keep your server in a secure location. Alternatively, you can password-protect GRUB to keep anyone with physical access to your computer from booting it in single-user mode. |
Although there are potentially many things that will cause a system not to boot, they can roughly be categorized as either hardware problems or software and configuration problems. The most common hardware-related problem you will probably encounter is a bad hard disk drive; like all mechanical devices with moving parts, these have a finite lifetime and will eventually fail. Fortunately, the Red Hat exams do not require you to address hardware failures.
Software and configuration problems, however, can be a little more difficult. At first glance, they can look just like regular hardware problems.
In addition to knowing how to mount disk partitions, edit files, and manipulate files, you will need to know how to use several other commands in order to be able to fix problems from rescue mode or single-user mode. The most useful of these are the df, fdisk, and the fsck commands. To diagnose a problem, you need to know how these commands work at least at a rudimentary level.
The Linux df command was covered briefly in Chapter 3. When you use df, you can find mounted directories, the capacity of each partition, and the percentage of each partition that's filled with files. Figure 11-6 shows the output in kilobytes. There are a couple of simple variations; the following commands give the result in megabytes and inodes:
# df -m
# df -i
The Linux fdisk command was covered briefly in Chapter 3. When you use fdisk, you can find the partitions you have available for mounting. For example, the fdisk -l /dev/hda command lists available partitions on the first IDE hard disk:
# fdisk -l /dev/hda
Disk /dev/hda: 15.0GB, 15020457984 bytes
240 heads, 63 sectors/track, 1940 cylinders
Units = cylinders of 15120 * 512 = 7741440 bytes
   Device Boot   Start       End    Blocks   Id  System
/dev/hda1  *         1       949   7174408+   b  Win95 FAT32
/dev/hda2          950       963    105840   83  Linux
/dev/hda3          964      1871   6864480   83  Linux
/dev/hda4         1872      1940    521640    f  Win95 Ext'd (LBA)
/dev/hda5         1872      1940    521608+  82  Linux swap
Looking at the output from fdisk, it's easy to identify the partitions configured with a Linux format, /dev/hda2, /dev/hda3, and /dev/hda5. Given the size of each partition, it is reasonable to conclude that /dev/hda2 is associated with /boot, and /dev/hda3 is associated with root (/).
For simple partitioning schemes, this is easy. It gets far more complicated when you have lots of partitions, as in this next example. You should always have some documentation available that clearly identifies your partition layout within your filesystem:
# fdisk -l /dev/hda
Disk /dev/hda: 26.8 GB, 26843545600
255 heads, 63 sectors/track, 3263 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot   Start       End    Blocks   Id  System
/dev/hda1  *         1        13    104391   83  Linux
/dev/hda2           14       268   2048287+   b  Win95 FAT32
/dev/hda3          269       396   1028160   83  Linux
/dev/hda4          397      3263  23029177+   f  Win95 Ext'd (LBA)
/dev/hda5          397      1097   5630751   83  Linux
/dev/hda6         1098      1734   5116671   83  Linux
/dev/hda7         1735      1989   2048256   83  Linux
/dev/hda8         1990      2244   2048256   83  Linux
/dev/hda9         2245      2372   1028218+  83  Linux
/dev/hda10        2373      2499   1020096   82  Linux swap
/dev/hda11        2500      2626   1020096   83  Linux
/dev/hda12        2627      2753   1020096   83  Linux
/dev/hda13        2754      2880   1020096   83  Linux
/dev/hda14        2881      3007   1020096   83  Linux
/dev/hda15        3008      3134   1020096   83  Linux
/dev/hda16        3135      3236   1020096   83  Linux
In this example, it's easy to identify the Linux swap partition. Since /boot partitions are small and normally configured toward the front of a drive, it's reasonable to associate it with /dev/hda1.
However, that is just a guess; some trial and error may be required. For example, after mounting /dev/hda1 on an empty directory, you would want to check the contents of that directory for the typical contents of /boot.
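With a long partition table, it can help to filter the fdisk output down to the Linux and swap partitions before you start guessing. The filter below is a sketch that assumes the column layout shown in the listings above; the function name linux_partitions is hypothetical.

```shell
# linux_partitions: read `fdisk -l` output on standard input and print each
# partition whose Id is 83 (Linux) or 82 (Linux swap), with a short tag.
# The function name is hypothetical.
linux_partitions() {
    awk '$1 ~ /^\/dev\// {
        # The Boot column holds "*" only for the active partition, so the
        # Id field shifts by one position on that row.
        id = ($2 == "*") ? $6 : $5
        if (id == "83") print $1, "linux"
        else if (id == "82") print $1, "swap"
    }'
}

# In rescue mode or single-user mode:
#   fdisk -l /dev/hda | linux_partitions
```

The result is only a list of candidates; you still need e2label or a trial mount to learn which directory each partition holds.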
Based on the previous example, you probably could use a little help to identify the filesystems associated with the other partitions. That's where the e2label command can help. When you set up a new filesystem, the associated partition is normally marked with a label. For example, the following command tells you that the /usr filesystem is normally mounted on /dev/hda5.
# e2label
Usage: e2label device [newlabel]
# e2label /dev/hda5
/usr
You can get a lot more information on each partition with the dumpe2fs command, as shown in Figure 11-7.
Figure 11-7: The dumpe2fs command gives a lot of information.
The dumpe2fs command not only does the job of e2label but also tells you about the format, whether it has a journal, and the block size. Proceed further down this list, and you'll find the locations for backup superblocks, which can help you select the appropriate superblock when you use the fsck or e2fsck commands to check your Linux partition.
On The Job | fsck is a 'front end' for e2fsck, which is used to check partitions formatted to the ext2 and ext3 filesystems. |
You should also know how to use the fsck command. This command is a front end for most of the filesystem formats available in Linux, such as ext2, ext3, reiserfs, vfat, and more. This command is used to check the filesystem on a partition for consistency. In order to effectively use the fsck command, you need to understand something about how filesystems are laid out on disk partitions.
When you format a disk partition under Linux using the mkfs command, it sets aside a certain portion of the disk to use for storing inodes, which are data structures that contain the actual disk block addresses that point to file data on a disk. The mkfs command also stores information about the size of the filesystem, the filesystem label, and the number of inodes in a special location at the start of the partition called the superblock. If the superblock is corrupted or destroyed, the remaining information on the disk is unreadable. Because the superblock is so vital to the integrity of the data on a partition, the mkfs command makes duplicate copies of the superblock at fixed intervals on the partition, which you can find with the dumpe2fs command described earlier.
The fsck command checks for, and corrects problems with, filesystem consistency by looking for things such as disk blocks that are marked as free but are actually in use (and vice versa), inodes that don't have a corresponding directory entry, inodes with incorrect link counts, and a number of other problems. The fsck command will also fix a corrupted superblock. If fsck fails due to a corrupt superblock, you can use the fsck command with the -b option to specify an alternative superblock. For example, the command:
# fsck -b 8193 /dev/hda5
tells fsck to perform a consistency check on the filesystem on disk partition /dev/hda5, using the superblock located at disk block 8193.
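To find a value to pass to -b, you can pull the backup superblock locations out of the dumpe2fs output. The filter below is a sketch that assumes the usual "Backup superblock at N," line format printed by dumpe2fs; the function name backup_superblocks is hypothetical.

```shell
# backup_superblocks: read dumpe2fs output on standard input and print the
# block number of each backup superblock, one per line.
# The function name is hypothetical.
backup_superblocks() {
    sed -n 's/^ *Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}

# On an unmounted partition:
#   dumpe2fs /dev/hda5 | backup_superblocks
#   fsck -b 8193 /dev/hda5
```

The first number printed is normally the one to try; if that copy is also damaged, move on to the next.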
Exam Watch | Get to know the key commands and the associated options for checking disks and partitions: fdisk, e2label, dumpe2fs, and fsck. Practice using these commands to check your partitions, but only on a test computer! (Some of these commands can destroy data.) |
There are two boot loaders, GRUB and LILO. While you may be more familiar with LILO, Red Hat Enterprise Linux has adopted GRUB as the default boot loader. One of the benefits is that any changes that you make to the GRUB configuration file, /boot/grub/grub.conf, need not be written to your hard disk's Master Boot Record (MBR). However, if your MBR has been overwritten by an MS-DOS or Windows NT/2000/2003 boot loader, you can restore GRUB to the MBR with the grub-install command. For example, if the /boot directory is on the first SCSI hard drive, you would run the following command:
# grub-install /dev/sda
Alternatively, if you are using LILO, you need to run the lilo command whenever you rebuild your Linux kernel or change the disk partition associated with the /boot directory. Otherwise, LILO may not be able to find your boot files. In this case, you will have to use the linux rescue environment to fix the problem.
In either case, errors in the boot loader configuration file are a common problem that can keep Linux from booting properly.
You may find corruption in some key files or commands such as mount or init. If you do, one option is to reload the files from the original RPMs. For example, if the mount command were to be corrupted or erased, you can reload it from the mount RPM.
When you boot your system into the linux rescue environment, using a network source, you have access to those network source files in the /mnt/source directory. After your computer boots into the linux rescue environment, you'll want to take the following steps:
Run the df command. You should see how the linux rescue environment mounted your partitions. You should also see your network source on the /mnt/source directory.
Copy the mount RPM from the /mnt/source directory. This allows you to reinstall the mount RPM later with files in the correct locations. Use the following command:
# cp /mnt/source/RedHat/RPMS/mount-*.rpm /mnt/sysimage/root/
Run the following chroot command to move into the standard directory tree:
# chroot /mnt/sysimage
Install the mount RPM, forcing installation over current files.
# rpm -Uvh --force /root/mount-*.rpm
Check the status of the mount command.
# rpm -Vf /bin/mount
If you see no output, you'll know that there is no longer a problem with the mount command. (You can also use this command at the start of the process to see if there is a problem.) You should now be able to run the exit command twice to reboot your computer, and at least this problem should be solved.
Two places where you are likely to make errors that result in an unbootable system are in the boot loader and filesystem configuration files, /boot/grub/grub.conf and /etc/fstab. In each case, identifying the wrong partition as the root partition (/) can lead to a kernel panic. Other configuration errors in /boot/grub/grub.conf can also cause a kernel panic when you boot Linux. Whenever you make changes to these files, the only way to test them out is to reboot Linux.
Exam Watch | As a Red Hat Enterprise Linux administrator, you will be expected to know how to fix improperly configured boot files. For this reason, a substantial portion of the exam is devoted to testing your troubleshooting and analysis skills. |
The following Scenario & Solution lists some problems that you may encounter during the boot process, along with possible solutions. It is far from comprehensive. The solutions that I've listed work on my computer, as I've configured it. There may be (and often is) more than one possible cause. These solutions may not work for you on your computer or on the Red Hat exams. To know what else to try, you need more experience.
To get the equivalent of more experience, try additional scenarios as proposed in the following Scenario & Solution. Once you're familiar with the linux rescue environment, test these scenarios. For the first scenario shown, change the name of the grub.conf file so it can't be loaded. See what it does on your system. Use the linux rescue environment to boot into RHEL 3 and use the noted solution to fix your system. Two of the possible error messages are shown in Figure 11-8 and Figure 11-9.
Figure 11-8: One possible error message
Figure 11-9: A second possible error message
If you have a problem during the boot process, get as much information as you can. Use the experience that you have to recognize or diagnose the problem. Then boot into your system using a different method, confirm and then fix the problem (naturally, that's the hard part).
Exam Watch | Whenever you're working in rescue mode or single-user mode, always remember to run the sync command to save changes to your drives before halting or rebooting your system. |
The easiest way to boot into an unbootable system is with a customized boot disk. Alternatively, you may be able to boot into your system at a different runlevel such as 1 or 3. If that is not available or appropriate to your problem, you'll also need to know how to use the linux rescue environment to rescue a system, using the following basic steps:
Boot using a Red Hat Enterprise Linux installation floppy or the first installation CD.
Know the location of your installation files, such as from a CD or over a network. You are taken to single-user mode.
At the rescue shell prompt, use fdisk -l diskdevice to identify your partitions.
If filesystem problems are suspected or indicated, run fsck on the afflicted partitions.
If the problem is with a configuration file:
Create (a) temporary mount point(s), if necessary.
Mount the appropriate partition(s), if necessary.
Use the vi editor to fix the problem in the broken file(s).
Sync your changes to the drive.
Type the exit command as needed to restart the system.
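The steps above can be sketched as a pair of shell helpers. This is an illustration under stated assumptions, not a tested recovery script: the names run, rescue_mount, and rescue_finish are hypothetical, and the partition and mount point are placeholders that you would replace with what fdisk -l reports on your system. Setting DRY_RUN=1 prints each command instead of running it, which is a safe way to rehearse.

```shell
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rescue_mount PARTITION MOUNTPOINT (hypothetical helper):
# check the partition, then mount it on a temporary mount point.
rescue_mount() {
    part=$1
    mnt=$2
    run mkdir -p "$mnt" || return 1
    run fsck -y "$part"
    rc=$?
    # fsck exit status is a bit mask; 4 or more means errors remain
    # or the check itself failed, so do not mount in that case.
    if [ "$rc" -ge 4 ]; then
        echo "fsck reported unresolved problems on $part" >&2
        return 1
    fi
    run mount "$part" "$mnt"
}

# rescue_finish MOUNTPOINT: flush pending writes, then unmount.
rescue_finish() {
    run sync
    run umount "$1"
}

# Rehearsal:
#   DRY_RUN=1; rescue_mount /dev/hda2 /mnt/sysimage
# Then edit the broken file with vi under /mnt/sysimage, and finally:
#   rescue_finish /mnt/sysimage
```

Editing the file stays a manual step on purpose, since the fix depends on what you find in it.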
When you boot RHEL 3, you see a grub> prompt in place of the standard GRUB boot menu. | You may have a problem that prevents the boot loader from reading the GRUB configuration file, grub.conf. The file may be missing or corrupt. |
When you boot your computer, you see the following message: 'Missing operating system' | Your Master Boot Record (MBR) has been erased, and you'll need to go into the Linux rescue environment and run grub-install /dev/hda (or /dev/sda) to reload GRUB on the MBR. |
During the boot process, you see the following message: 'Cannot open file '/proc/mounts' for reading (no such file or directory)' | You may have a problem with a corrupt mount command. You'll need to reload it from the mount RPM. |
You see the following prompt: init-2.05b# | Check the current directory tree. If you see the standard directories, your init command may be corrupt. Try reloading it from the SysVinit RPM. |
You see the following message: 'INIT: No inittab file found' | This is straightforward: there is something wrong with your /etc/inittab file; it may be erased. You'll want to go into the linux rescue environment and restore it from backup or the initscripts RPM. |
You see a message such as what's shown in Figure 11-8. | You may have a problem with the integrity of the /etc/fstab file. Start the linux rescue environment, check the integrity of /etc/fstab. If there is still a problem, run the steps described earlier to check the superblock and more. (For more experience, try including additional errors in /etc/fstab.) |
You see a message similar to what's shown in Figure 11-9. Take careful note of the last file cited in the message. | RHEL 3 has encountered some problem when reading the grub.conf configuration file. Start the Linux rescue environment and check this file. Alternatively, check the files noted in grub.conf referenced in the /boot directory. For example, you can create a new initrd file with the mkinitrd command. (For more experience, try introducing other errors in the /boot directory.) |
When you boot RHEL 3 into runlevel 5, you don't see a graphical login screen; a text login screen flashes periodically. | You may have some problem with the X Font Server. Boot into a different runlevel that does not require the X server, such as 1, 2, or 3. Check the xfs service script to see that it's set to run in runlevel 5. If yes, check for related error messages. |
You see a Welcome to Kudzu screen during the boot process; afterwards, it stops after detecting a network card. | This is symptomatic of a problem with one of the boot scripts, such as /etc/inittab or /etc/rc.d/rc.sysinit. Boot into the linux rescue environment and check the files associated with the boot process. |
Exercise 11-4: Performing an Emergency Boot Procedure
To do this exercise, you should have a test system at your disposal. Any data on the computer where you do this procedure is at risk. In this exercise, you will 'break' your system by purposely misconfiguring a file and then reboot into the linux rescue environment to fix the problem. You'll be configuring a boot option in your GRUB menu which makes Linux look for boot files in the wrong partition.
Install the mkbootdisk RPM from a network source if required:
# rpm -q mkbootdisk
# rpm -Uvh /mnt/inst/RedHat/RPMS/mkbootdisk*
Even if you already have a boot disk, make one. Insert a floppy into the disk drive and type the following (the `uname -r` command substitution inserts the version number of the current kernel):
# mkbootdisk `uname -r`
Edit the file /boot/grub/grub.conf and make a copy of your boot stanza. Title this stanza badboot. Change the location of the root device to point to an invalid partition. For example, if your original grub.conf looks like this:
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Red Hat Enterprise Linux ES (2.4.21-4.EL)
        root=(hd0,0)
        kernel /vmlinuz-2.4.21-4.EL ro root=LABEL=/
        initrd /initrd-2.4.21-4.EL.img
your new version should look like this:
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Red Hat Enterprise Linux ES (2.4.21-4.EL)
        root=(hd0,0)
        kernel /vmlinuz-2.4.21-4.EL ro root=LABEL=/
        initrd /initrd-2.4.21-4.EL.img
title badboot
        root=(hd0,1)
        kernel /vmlinuz-2.4.21-4.EL ro root=LABEL=/
        initrd /initrd-2.4.21-4.EL.img
Reboot your system. In the GRUB menu, select badboot. GRUB will return a 'File not found' message.
Since you left a valid boot stanza, your system isn't really broken. To fix the problem, however, you're going to boot into rescue mode. Insert your first RHEL 3 installation CD, and reboot the system. At the prompt, type linux rescue askmethod.
Proceed through the first steps of the Red Hat Enterprise Linux installation process.
When you see the Rescue menu, select Skip. None of your partitions will be mounted.
Although you know the source of the problem, once you boot into rescue mode, you should familiarize yourself with some of the repair utilities:
-/bin/sh-2.05b# fdisk -l
   Device Boot   Start       End    Blocks   Id  System
/dev/hda1  *         1        13    104391   83  Linux
/dev/hda2           14       474   3702982+  83  Linux
/dev/hda3          475       522    385560   82  Linux swap
The output of the following command will vary.
-/bin/sh-2.05b# fsck -y /dev/hda1
fsck 1.32 (09-Nov-2002)
WARNING: couldn't open /etc/fstab: No such file or directory
e2fsck 1.32 (09-Nov-2002)
/boot: clean, 47/26104 files, 18339/104391 blocks
Create (a) temporary mount point(s) for your /boot and root directory (/) partitions, and mount those partitions (if they are not already mounted). If the output from fdisk -l is different for you, revise the mounted devices accordingly.
-/bin/sh-2.05b# mkdir /tmpmnt
-/bin/sh-2.05b# mkdir /tmpmnt/boot
-/bin/sh-2.05b# mount /dev/hda1 /tmpmnt/boot
Edit the bad stanza in grub.conf and fix the problems:
-/bin/sh-2.05b# vi /tmpmnt/boot/grub/grub.conf
Your new version should look like this:
title badboot
        root=(hd0,0)
        kernel /vmlinuz-2.4.21-4.EL ro root=LABEL=/
        initrd /initrd-2.4.21-4.EL.img
Save your changes to the grub.conf file.
Unmount any mounted partitions and sync your changes:
# umount /dev/hda1
# sync
Remove any boot media from your disk drives. Type exit to unmount all drives and restart the system. You should now be able to boot from the badboot stanza.
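If you want to double-check an edit like this before you unmount and restart, you could list the GRUB root device for each stanza. The following is a sketch under assumptions: the function name grub_stanzas is hypothetical, it assumes awk is available in your environment, and it accepts both the root=(hd0,0) form used in this exercise and the root (hd0,0) form.

```shell
# grub_stanzas FILE: print "TITLE -> ROOT" for each stanza in a grub.conf
# file. The function name is hypothetical.
grub_stanzas() {
    awk '
        $1 == "title" {
            sub(/^[ \t]*title[ \t]+/, "")
            title = $0
            next
        }
        # Match "root (hd0,0)" or "root=(hd0,0)", but not rootnoverify
        $1 ~ /^root(=|$)/ && title != "" {
            line = $0
            sub(/^[ \t]*root[ \t=]+/, "", line)
            print title " -> " line
        }
    ' "$1"
}

# With /boot mounted as in this exercise:
#   grub_stanzas /tmpmnt/boot/grub/grub.conf
```

Any stanza whose root device differs from the partition that holds /boot is a likely source of a 'File not found' message.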