4.5 Linux rescue methods

Problems with boot loaders, file systems, configuration files, and so on are common, and some of them prevent the system from booting at all, which makes them harder to fix or debug. This section describes the problems you are most likely to encounter and how to recover from them. Figure 4-7 shows the problem determination flow.

Figure 4-7. Flow chart of debugging Linux for pSeries


4.5.1 Boot loader corruption

The most common problem that administrators face with Linux for pSeries is a system that cannot boot after installation. The same thing can happen if you accidentally overwrite the boot loader. If yaboot, the boot loader stored in the PReP partition, is corrupted, you need to create a new PReP partition and reactivate it.

PReP boot loader corrupted

Before diagnosing boot loader corruption, make sure the system or LPAR boots as far as E1F1 on the LED panel and then proceeds to load the boot loader from the disk.

  1. Boot from the network or from CD-ROM and load the appropriate driver that you may need for your system. Refer to 2.1.4, "Review your choices" on page 23 for the basic device drivers used by adapters and Linux on pSeries.

  2. Load all the modules that your system needs, as shown in Figure 4-8. For example, if you plan to boot the installation from an external SCSI CD-ROM, you need to load the driver for the SCSI card.

    Figure 4-8. Loading modules from the SuSE Installer


  3. After loading the required modules, select Start Rescue System, shown in Figure 4-9 on page 186, and then select the location of your kernel. You can boot from CD, network, hard disk, or floppy. When you boot into rescue mode, the SLES installer provides a small Linux system that runs from a ramdisk and gives you a "Rescue" prompt with full root access.

    Figure 4-9. Booting into rescue mode for recovery


  4. Do a file system check on your disk:

      Rescue :/ # fsck.reiserfs /dev/<disk>

  5. Create a new partition with the fdisk command, set its type to PReP Boot (ID 41), and mark it as the active boot partition. The partition should not exceed 8 MB, because it only contains the image that is used to boot the system.

      Rescue :/ # fdisk /dev/<disk>  

    Select option "n" to add a primary partition. If a PReP partition already exists, delete it first with option "d". Keep the new partition under 8 MB, then use option "t" to change the partition type to 41, the ID for PPC PReP Boot. In fdisk, option "a" toggles the bootable (active) flag and option "w" writes the new table to disk.

  6. Recreate the PReP Boot image using the dd command:

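      Rescue :/ # dd if=/boot/yaboot.chrp of=/dev/<disk> bs=4k

    In this command, /dev/<disk> must be the PReP partition created in step 5 (for example, /dev/sda1), not the whole disk. Writing the image to the whole-disk device would overwrite the partition table.

  7. Reboot the system. If everything went well, the system should now boot normally.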
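
Before rebooting, it can help to confirm that the partition table really contains a PReP Boot partition that is active and within the 8 MB limit. The following is a minimal sketch, not part of the original procedure: check_prep is a hypothetical helper name, and it assumes the column layout of the fdisk -l output shown in this chapter (Blocks counted in 1 KB units, so 8 MB is 8192 blocks).

```shell
#!/bin/sh
# Sketch: scan `fdisk -l` output on stdin for PReP Boot partitions (ID 41)
# and report whether each one is active and no larger than 8 MB.
# check_prep is a hypothetical name; the column layout is assumed to match
# the fdisk listings shown in this chapter.
check_prep() {
    awk '
        /PPC PReP Boot/ {
            dev = $1
            active = ($2 == "*") ? "active" : "inactive"
            # When the boot flag "*" is present, Blocks shifts one column right.
            blocks = ($2 == "*") ? $5 : $4
            sub(/\+$/, "", blocks)      # drop the trailing "+" fdisk may append
            size = (blocks + 0 <= 8192) ? "size-ok" : "too-large"
            print dev, active, blocks, size
            found = 1
        }
        END {
            if (!found) {
                print "no PReP Boot partition found"
                exit 1
            }
        }
    '
}
```

A possible invocation from the rescue prompt is fdisk -l /dev/<disk> | check_prep. The function only reads text, so it is safe to run at any point in the procedure.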

Tip

If you have a DHCP and NFS server, you can place the zImage.initrd.ppc64-2.4.21 kernel file on it. The file is available on the first SLES8 CD. Set the server or LPAR to boot from this kernel image. This lets you boot and rescue the system even if the PReP boot partition is corrupted.


4.5.2 File system corruption

When a server is not shut down properly, its file systems or individual files are at risk of corruption. A journaled file system helps in many cases, but it is not foolproof; sometimes you need to rebuild the logs and the database structure.

File system corruption

  1. If a file system or a configuration file is corrupted, you can boot the system into single user mode. If the root partition itself is corrupted, skip this step and proceed to step 3.

    If you choose not to use yaboot for booting automatically (for example, on dual-boot systems), you should still create the PReP boot partition, but it must not be active. In that case, boot the system to the Open Firmware prompt and then pass the required parameters at the yaboot prompt:

      0> boot disk  

    When the system reaches the yaboot prompt, enter the following:

      yaboot : linux single console=hvc0  

    Figure 4-10 on page 188 shows the boot sequence from the Open Firmware prompt to the yaboot prompt.

    Figure 4-10. Boot up system into rescue from open firmware


  2. Run a file system check on the affected file system:

      (none):~ # fsck.reiserfs /dev/<disk>

  3. If this does not fix the problem, boot into rescue mode from CD1 of SLES. Once in rescue mode, rerun the fsck.reiserfs command. If you are using a different file system, the command will differ.

    After that, create a new mount point in the rescue system and then mount your file systems over it.

  4. Mount the root file system into a mount point:

      Rescue:/ # mount -t reiserfs /dev/sda2 /mnt/<mount_point>

  5. If necessary, modify /etc/fstab so that the system can boot correctly.
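
Step 3 notes that the check command differs for other file systems. The sketch below maps a file system type to the checker that usually handles it. fsck_cmd_for is a hypothetical helper name, and the mapping is an assumption based on the standard tool names (reiserfsprogs, e2fsprogs, jfsutils, xfsprogs); whether a given tool is present in the rescue system depends on the distribution.

```shell
#!/bin/sh
# Sketch: print the checker commonly used for a given file system type.
# fsck_cmd_for is a hypothetical name; it only prints a command name and
# never runs a check itself.
fsck_cmd_for() {
    case "$1" in
        reiserfs)  echo "fsck.reiserfs" ;;
        ext2|ext3) echo "fsck.$1" ;;
        jfs)       echo "fsck.jfs" ;;
        xfs)       echo "xfs_repair" ;;   # XFS is repaired with xfs_repair, not fsck
        *)         echo "unknown file system type: $1" >&2
                   return 1 ;;
    esac
}
```

For example, fsck_cmd_for ext3 prints fsck.ext3, which you would then run against the unmounted partition from the rescue prompt.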

Should you need to reset the root password, change the root directory to the mounted root file system and then set a new root password:

  Rescue:/ # chroot /mnt/root
  Rescue:/ # passwd root

4.5.3 RHAS 3 rescue mode

For RHAS 3, you can boot the installation disk in rescue mode to get quick access to your disk partitions and to recover or repair a corrupted Linux system. To enter rescue mode, boot from the CD-ROM and, at the yaboot prompt, enter:

  yaboot : linux rescue  

If no CD-ROM is attached to the system, boot the system to the Open Firmware prompt and run the following command. You can also reach the Open Firmware prompt by pressing the 8 key when the LED shows E1F1.

  0 > boot net rescue  

Once the kernel is loaded, select the language and the location of the rescue image. The installation program then attempts to mount the disk partitions on your system and presents you with a shell prompt, where you can perform the necessary recovery tasks. To exit, type exit 0; this automatically reboots the system.

Refer to 2.3.3, "Unattended installation" on page 67 for information about how to set up the network boot.

4.5.4 Using /proc file systems

The /proc file system in Linux provides real-time information about the kernel and the hardware devices that are present in the server. Some entries are read-only; others are read-write, which allows you to modify or tune the system for better performance. Refer to "File system tuning" on page 208 for information on how to tune your system using /proc.
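
To see whether a particular /proc entry is read-only or read-write, you can inspect its permission bits. The following is a minimal sketch, assuming the owner write bit reported by ls -ld is a good enough indicator; proc_access is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: classify a file as read-only or read-write from its mode string.
# proc_access is a hypothetical name. It looks at the permission bits rather
# than using `[ -w ]`, because root can appear to have write access to
# everything.
proc_access() {
    if [ ! -e "$1" ]; then
        echo "missing"
        return 1
    fi
    # Character 3 of the `ls -ld` mode string is the owner write bit.
    case "$(ls -ld "$1" | cut -c3)" in
        w) echo "read-write" ;;
        *) echo "read-only" ;;
    esac
}
```

On a typical system, proc_access /proc/cpuinfo reports a read-only entry and proc_access /proc/sys/kernel/hostname reports a read-write one.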

Some commonly used commands on Linux are listed in Table 4-4 on page 190.

Table 4-4. Commands used on Linux

Command                      Description
---------------------------  --------------------------------------------------
# procinfo                   Provides a brief overview of the system.
# cat /proc/cpuinfo          Displays CPU information, including clock speed.
# cat /proc/ppc64/lparcfg    Displays a snapshot of the current LPAR
                             configuration; this is useful for a server
                             attached to an HMC. Sample output follows the
                             table.
# hwscan --list              Lists the system devices and adapters, classified
                             by type of device.
# hwinfo                     Displays detailed information about all the
                             hardware currently in the server. This is very
                             useful for administrators trying to analyze and
                             debug hardware problems.

Sample output of cat /proc/ppc64/lparcfg:

  # cat /proc/ppc64/lparcfg
  serial_number=IBM,xxxxxxxx
  system_type=IBM,7038-6M2
  partition_id=1
  system_active_processors=2
  system_potential_processors=2
  partition_active_processors=2
  partition_potential_processors=2
  partition_entitled_capacity=200
  partition_max_entitled_capacity=400
  shared_processor_mode=0
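
Because /proc/ppc64/lparcfg is a list of key=value lines, individual settings are easy to pull out in a script. The following is a minimal sketch under that assumption; lparcfg_get is a hypothetical helper name, and it reads the data from standard input so that it can also be used on a saved copy of the file.

```shell
#!/bin/sh
# Sketch: print the value of one key from key=value text on stdin.
# lparcfg_get is a hypothetical name. It exits non-zero if the key is absent.
lparcfg_get() {
    awk -v key="$1" '
        # Match only lines that start with "<key>=".
        index($0, key "=") == 1 {
            print substr($0, length(key) + 2)   # everything after the "="
            found = 1
        }
        END { exit !found }
    '
}
```

A possible invocation is lparcfg_get partition_id < /proc/ppc64/lparcfg, which prints the partition ID of the LPAR you are logged in to.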

In addition to displaying details about the devices in /proc, you can also use /proc to remove and add SCSI devices on the fly. To list the SCSI devices on your system, run the command cat /proc/scsi/scsi. Example 4-14 shows the output of the command.

Example 4-14. Display SCSI devices from /proc
  lpar7:/proc/scsi # cat /proc/scsi/scsi
  Attached devices:
  Host: scsi0 Channel: 00 Id: 01 Lun: 00
    Vendor: IBM      Model: CDRM00203     !K Rev: 1_06
    Type:   CD-ROM                           ANSI SCSI revision: 02
  Host: scsi0 Channel: 00 Id: 08 Lun: 00
    Vendor: IBM      Model: IC35L146UCDY10-0 Rev: S25F
    Type:   Direct-Access                    ANSI SCSI revision: 03
  Host: scsi0 Channel: 00 Id: 09 Lun: 00
    Vendor: IBM      Model: IC35L146UCDY10-0 Rev: S25F
    Type:   Direct-Access                    ANSI SCSI revision: 03
  Host: scsi0 Channel: 00 Id: 14 Lun: 00
    Vendor: IBM      Model: HSBPM2   PU2SCSI Rev: 0015
    Type:   Enclosure                        ANSI SCSI revision: 02
  Host: scsi0 Channel: 00 Id: 15 Lun: 00
    Vendor: IBM      Model: HSBPD4M  PU3SCSI Rev: 0015
    Type:   Enclosure                        ANSI SCSI revision: 02

The output in Example 4-14 lists the attached devices. For each device, the first line describes how the hardware is connected (host adapter, channel, target ID, and LUN), followed by the vendor and model, and then the type of device. An existing device can be removed with the command echo "scsi remove-single-device <h> <b> <t> <l>" > /proc/scsi/scsi, where <h> is the host adapter, <b> the channel ID, <t> the SCSI target ID, and <l> the LUN. Afterward, run the command cat /proc/scsi/scsi to check that the removal was successful. Example 4-15 shows the removal of a single SCSI disk (/dev/sdb).

Example 4-15. Removing a single device in Linux
  lpar7:/proc/scsi # fdisk -l

  Disk /dev/sda: 255 heads, 63 sectors, 17849 cylinders
  Units = cylinders of 16065 * 512 bytes

     Device Boot    Start      End     Blocks   Id  System
  /dev/sda1   *         1        1       8001   41  PPC PReP Boot
  /dev/sda3            15      537    4200997+  83  Linux
  /dev/sda4           538    17848  139050607+   5  Extended
  /dev/sda5           538      799    2104483+  82  Linux swap
  /dev/sda6           800    17848  136946061   fd  Linux raid autodetect

  Disk /dev/sdb: 255 heads, 63 sectors, 17849 cylinders
  Units = cylinders of 16065 * 512 bytes

     Device Boot    Start       End    Blocks   Id  System

  lpar7:/proc/scsi # echo "scsi remove-single-device 0 0 9 0" > /proc/scsi/scsi
  lpar7:/proc/scsi # fdisk -l

  Disk /dev/sda: 255 heads, 63 sectors, 17849 cylinders
  Units = cylinders of 16065 * 512 bytes

     Device Boot    Start      End     Blocks   Id  System
  /dev/sda1   *         1        1       8001   41  PPC PReP Boot
  /dev/sda3            15      537    4200997+  83  Linux
  /dev/sda4           538    17848  139050607+   5  Extended
  /dev/sda5           538      799    2104483+  82  Linux swap
  /dev/sda6           800    17848  136946061   fd  Linux raid autodetect
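
The four numbers passed to remove-single-device come straight from the Host line of each entry in /proc/scsi/scsi. The sketch below extracts them, one device per line, so that the right values are easy to pick out. scsi_list is a hypothetical helper name, and the parsing assumes the layout shown in Example 4-14.

```shell
#!/bin/sh
# Sketch: read /proc/scsi/scsi-style text on stdin and print one line per
# device in the form "<h> <b> <t> <l> <type>".
# scsi_list is a hypothetical name; the field positions are assumed to match
# the layout shown in Example 4-14.
scsi_list() {
    awk '
        $1 == "Host:" {
            h = $2
            sub(/^scsi/, "", h)                  # "scsi0" -> "0"
            b = $4 + 0; t = $6 + 0; l = $8 + 0   # "09" -> 9, and so on
        }
        $1 == "Type:" { print h, b, t, l, $2 }
    '
}
```

Running scsi_list < /proc/scsi/scsi on the system in Example 4-14 would include the line 0 0 9 0 Direct-Access for the disk that is removed in Example 4-15.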

You can also add a new SCSI device by using the command echo "scsi add-single-device <h> <b> <t> <l>" > /proc/scsi/scsi. In Example 4-16, we add the SCSI disk we removed in Example 4-15 back to the system.

Example 4-16. Adding a SCSI disk
  lpar7:/proc/scsi # echo "scsi add-single-device 0 0 9 0" > /proc/scsi/scsi
  lpar7:/proc/scsi # fdisk -l

  Disk /dev/sda: 255 heads, 63 sectors, 17849 cylinders
  Units = cylinders of 16065 * 512 bytes

     Device Boot    Start      End     Blocks   Id  System
  /dev/sda1   *         1        1       8001   41  PPC PReP Boot
  /dev/sda3            15      537    4200997+  83  Linux
  /dev/sda4           538    17848  139050607+   5  Extended
  /dev/sda5           538      799    2104483+  82  Linux swap
  /dev/sda6           800    17848  136946061   fd  Linux raid autodetect

  Disk /dev/sdb: 255 heads, 63 sectors, 17849 cylinders
  Units = cylinders of 16065 * 512 bytes

     Device Boot    Start      End     Blocks   Id  System
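
A typing mistake in the add or remove command can detach the wrong device, so it may be worth wrapping the echo in a small function that validates its arguments first. The following is a minimal sketch; scsi_hotplug is a hypothetical helper name, and the optional last argument (the file to write to) is an assumption added so that the function can be tried against an ordinary file before it is pointed at /proc/scsi/scsi.

```shell
#!/bin/sh
# Sketch: scsi_hotplug <add|remove> <h> <b> <t> <l> [target-file]
# Builds the add-single-device / remove-single-device command string and
# writes it to the target (default: /proc/scsi/scsi).
# scsi_hotplug and the target-file argument are hypothetical additions.
scsi_hotplug() {
    action=$1 h=$2 b=$3 t=$4 l=$5
    target=${6:-/proc/scsi/scsi}
    case "$action" in
        add|remove) ;;
        *) echo "usage: scsi_hotplug add|remove <h> <b> <t> <l> [file]" >&2
           return 2 ;;
    esac
    # Reject empty or non-numeric host, channel, target, and LUN values.
    for n in "$h" "$b" "$t" "$l"; do
        case "$n" in
            ''|*[!0-9]*) echo "not a number: '$n'" >&2
                         return 2 ;;
        esac
    done
    echo "scsi ${action}-single-device $h $b $t $l" > "$target"
}
```

For example, scsi_hotplug add 0 0 9 0 performs the same write as the first command in Example 4-16, and scsi_hotplug remove 0 0 9 0 /tmp/dryrun writes the command to a scratch file so that you can inspect it first.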


Quintero - Deploying Linux on IBM E-Server Pseries Clusters
Year: 2003
Pages: 108
