Partition Table/Master Boot Record: Backup

Now that we have defined the location of the master bootloader and some of its functions and limits, we need to discuss how this region is backed up. As would be expected, the partition table is the most important disk region because it defines the location of data. Although the boot code and the master partition table both reside in the MBR, the boot code is much easier to repair than the partition table. Recovering data from a failing boot drive does not require a successful restore of the bootloader: the boot code can be bypassed by booting from a repair CD or simply by booting from a different drive. Successful data recovery in this scenario requires only that the partition table be intact. Losing the partition table renders the data inaccessible.

To recover a partition table, we need an MBR backup. Loaders such as LILO write a backup MBR by default. This backup is usually found in /boot in a file called /boot/boot.XXXX, which is a raw copy of the boot sector. Another way to create this backup is by running the following command as root:

    dd if=/dev/disk_device_file of=/boot/boot.XXXX bs=512 count=1

Whichever method created the backup, the MBR can be restored by issuing the following command:

    dd if=/boot/boot.XXXX of=/dev/disk_device_file bs=512 count=1

However, in the event of a partition loss, recovering the raw MBR file from the /boot filesystem becomes a daunting task. This task is further compounded by the fact that we can recover only the primary partition table, not the logical tables throughout the drive.

Partition Recovery Walkthrough

After a backup is obtained, restoration can begin. We need to be aware of the expected results when an MBR is destroyed and the steps necessary for recovery. In the following section, we discuss the destruction and restoration of the MBR with detailed examples. First, we must confirm that the partition table is correct.
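Before the walkthrough, the one-sector dd backup described in the previous section can be wrapped in a small helper that also checks the copy for the 55 AA boot signature. This is only a sketch: the function name backup_mbr and its return codes are hypothetical, and writing the backup somewhere other than the disk being protected is an assumption drawn from the warning above about recovering the file from /boot.

```shell
#!/bin/sh
# Hypothetical helper (not from the original text): copy the first sector
# of a disk to a file and check that the copy ends with the 55 AA signature.
# Usage: backup_mbr /dev/disk_device_file /path/on/another/device/mbr.bak
backup_mbr() {
    disk=$1
    out=$2

    # Copy exactly one 512-byte sector: boot code, partition table, signature.
    dd if="$disk" of="$out" bs=512 count=1 2>/dev/null || return 1

    # Bytes 510 and 511 of a valid MBR hold the signature 55 AA.
    sig=$(dd if="$out" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$sig" != "55aa" ]; then
        echo "warning: $out has no 55 AA signature (found '$sig')" >&2
        return 2
    fi

    echo "saved MBR of $disk to $out"
}
```

A return code of 2 means the sector was copied but does not look like an MBR, which is worth knowing before the backup is ever needed.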
In our example, a simple partition table has been defined and is displayed using cfdisk. Note that the next listing shows four partitions (set off by parentheses in the hex dump of sector 0), all marked primary, and the first is bootable.

    [root@localhost root]# cfdisk -P rst /dev/hde
    Disk Drive: /dev/hde
    Sector 0:
    0x000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ~~~~~~~ Skip to save space~~~~
    0x1A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x1B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 (80 01   <Boot
    0x1C0: 01 00 83 FE 3F 00 3F 00 00 00 82 3E 00 00) (00 00  <-pri 2
    0x1D0: 01 01 83 FE 3F 01 C1 3E 00 00 C1 3E 00 00) (00 00  <-pri 3
    0x1E0: 01 02 83 FE 3F 02 82 7D 00 00 C1 3E 00 00) (00 00  <-pri 4
    0x1F0: 01 03 83 FE 3F 0E 43 BC 00 00 0C F1 02 00) [55 AA] END

    Partition Table for /dev/hde

                  First      Last
     # Type      Sector    Sector Offset    Length Filesystem Type (ID) Flags
    -- ------- -------- --------- ------ --------- -------------------- ---------
     1 Primary        0     16064     63     16065 Linux (83)           Boot (80)
     2 Primary    16065     32129      0     16065 Linux (83)           None (00)
     3 Primary    32130     48194      0     16065 Linux (83)           None (00)
     4 Primary    48195    240974      0    192780 Linux (83)           None (00)
       None      240975  12594959      0  12353985 Unusable             None (00)

    Partition Table for /dev/hde

             ---Starting---      ----Ending----    Start Number of
     # Flags Head Sect  Cyl   ID Head Sect  Cyl   Sector   Sectors
    -- ----- ---- ---- ---- ---- ---- ---- ---- -------- ---------
     1  0x80    1    1    0 0x83  254   63    0       63     16002
     2  0x00    0    1    1 0x83  254   63    1    16065     16065
     3  0x00    0    1    2 0x83  254   63    2    32130     16065
     4  0x00    0    1    3 0x83  254   63   14    48195    192780

Next, we create a filesystem on the first partition and mount it to demonstrate its availability to the end user.
    [root@localhost root]# mke2fs -j /dev/hde1
    mke2fs 1.34 (25-Jul-2003)
    Filesystem label=
    OS type: Linux
    Block size=1024 (log=0)
    Fragment size=1024 (log=0)
    2000 inodes, 8000 blocks
    400 blocks (5.00%) reserved for the super user
    First data block=1
    1 block group
    8192 blocks per group, 8192 fragments per group
    2000 inodes per group

    Writing inode tables: done
    Creating journal (1024 blocks): done
    Writing superblocks and filesystem accounting information: done

    This filesystem will be automatically checked every 30 mounts or
    180 days, whichever comes first.  Use tune2fs -c or -i to override.

    [root@localhost root]# mount /dev/hde1 /hde-test/
    [root@localhost root]# df
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/ide/host2/bus0/target0/lun0/part1
                          7.6M  1.1M  6.2M  15% /hde-test
    <---- Confirmation that our filesystem and partition are available.

Demonstrating a Failure

Now that we have a valid partition table with a filesystem on partition 1, we need to demonstrate a failure. Next, we unmount the filesystem, create an MBR backup, and destroy the MBR. We then confirm that the MBR is damaged by trying to view the partition table with cfdisk.

    [root@localhost root]# umount /hde-test
    [root@localhost root]# dd if=/dev/hde of=/tmp/hde.mbr.primary.part bs=512 count=1
    1+0 records in
    1+0 records out
    [root@localhost root]# dd if=/dev/zero of=/dev/hde bs=512 count=1
    1+0 records in
    1+0 records out
    [root@localhost root]# cfdisk -P rst /dev/hde
    [root@localhost root]# echo $?
    3

cfdisk returns an error code of 3. According to the cfdisk(8) man page of this era, the exit codes for cfdisk include the following:

    0: No errors
    1: Invocation error
    2: I/O error
    3: Cannot get geometry
    4: Bad partition table on disk
Although we have shown that the MBR/partition table has been destroyed, we have neither rebooted the OS nor updated the kernel's in-memory structures for the device. Because the kernel has not been updated to reflect the cleared MBR, we can still mount the partition. For example:

    [root@localhost root]# mount /dev/hde1 /hde-test/
    [root@localhost root]# df
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/ide/host2/bus0/target0/lun0/part1
                          7.6M  1.1M  6.2M  15% /hde-test
    <--- Filesystem/partition mounted even though no table exists to
         instruct the kernel of a partition location.
    [root@localhost root]# umount /hde-test

To understand this example, we just need to remember that the running kernel's memory still contains all the partition information for the filesystem. Until we rescan the partition table, this data structure remains constant. In our example, we simply disconnect the running drive by removing its driver from the kernel (rmmod). After a few seconds, we reload the driver (insmod), which initiates a rescan of the drive. The kernel is unable to find a usable partition table in the first 512 bytes of the drive or at any other LBA location, so mounting the filesystem fails. It is important to understand that the filesystem is still intact, but it is lying on a disk with no boundaries.

Mounting a Partition

Next, we demonstrate the mounting of partition 1 after the drive has been removed and added back to the running kernel.

    [root@localhost root]# mount /dev/hde1 /hde-test/
    /dev/hde1: Invalid argument
    mount: you must specify the filesystem type

Note that /dev/hde1 is an invalid argument to the mount command because no partitions are defined.

    [root@localhost root]# cfdisk -P rst /dev/hde
    [root@localhost root]# echo $?
    3     <--- Review previous notes to determine this error return code.

The next step is to recover the partition table.
    [root@localhost root]# dd if=/tmp/hde.mbr.primary.part of=/dev/hde bs=512 count=1
    1+0 records in
    1+0 records out
    [root@localhost root]# cfdisk -P rst /dev/hde
    Disk Drive: /dev/hde
    Sector 0:
    0x000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ~~~~~~~ Skip to save space~~~~
    0x1A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x1B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 01
    0x1C0: 01 00 83 FE 3F 00 3F 00 00 00 82 3E 00 00 00 00
    0x1D0: 01 01 83 FE 3F 01 C1 3E 00 00 C1 3E 00 00 00 00
    0x1E0: 01 02 83 FE 3F 02 82 7D 00 00 C1 3E 00 00 00 00
    0x1F0: 01 03 83 FE 3F 0E 43 BC 00 00 0C F1 02 00 55 AA

Now we mount the filesystem located at the first partition. Remember that the running kernel is not aware of the new partition table, so the mount should fail.

    [root@localhost root]# mount /dev/hde1 /hde-test/
    /dev/hde1: Invalid argument
    mount: you must specify the filesystem type

As expected, the mount failed. To work around this issue, a scan must be initiated to update the kernel memory. Perform the same steps as before: rmmod the driver controlling the external or internal device, and insmod it again after a few seconds.

    [root@localhost root]# mount /dev/hde1 /hde-test/
    [root@localhost root]# df
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/hde1             7.6M  1.1M  6.2M  15% /hde-test

The same procedure can be used for logical partitions. However, you must know their locations because they are relative to the last partition, as mentioned earlier in this chapter.

Recovering Superblock and Inode Table on ext Filesystems

Filesystem superblock recovery is very similar to partition table recovery. Without the superblock on an ext2/ext3 filesystem, and on many other filesystems, locating the data within the filesystem becomes a daunting challenge. In the following exercise, we set up a simple partition table and filesystem, and we demonstrate the steps to find, back up, and destroy a superblock. To begin, choose a tool to create a small partition.
Results should look something like this:

    [root@localhost root]# cfdisk -P rst /dev/hde
    Disk Drive: /dev/hde
    Sector 0:
    0x000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ~~~~~~~ Skip to save space~~~~
    0x1B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 01
    0x1C0: 01 00 83 0E 3F CE 3F 00 00 00 E0 FB 02 00 00 00
    0x1D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x1E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x1F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 AA

    Partition Table for /dev/hde

                  First      Last
     # Type      Sector    Sector Offset    Length Filesystem Type (ID) Flags
    -- ------- -------- --------- ------ --------- -------------------- ---------
     1 Primary        0    195614     63    195615 Linux (83)           Boot (80)
       Pri/Log   195615  12594959      0  12399345 Free Space           None (00)

    Partition Table for /dev/hde

             ---Starting---      ----Ending----    Start Number of
     # Flags Head Sect  Cyl   ID Head Sect  Cyl   Sector   Sectors
    -- ----- ---- ---- ---- ---- ---- ---- ---- -------- ---------
     1  0x80    1    1    0 0x83   14   63  206       63    195552
     2  0x00    0    0    0 0x00    0    0    0        0         0
     3  0x00    0    0    0 0x00    0    0    0        0         0
     4  0x00    0    0    0 0x00    0    0    0        0         0

Build a filesystem on the created partition.

    mkfs.ext3 /dev/hde1
    mke2fs 1.34 (25-Jul-2003)
    Filesystem label=
    OS type: Linux
    Block size=1024 (log=0)
    Fragment size=1024 (log=0)
    24480 inodes, 97776 blocks
    4888 blocks (5.00%) reserved for the super user
    First data block=1
    12 block groups
    8192 blocks per group, 8192 fragments per group
    2040 inodes per group
    Superblock backups stored on blocks:     <--- Note the superblock locations...
            8193, 24577, 40961, 57345, 73729

    Writing inode tables: done
    Creating journal (4096 blocks): done
    Writing superblocks and filesystem accounting information: done

Note that the block size can differ depending on the size of the actual filesystem.
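The backup locations that mke2fs prints follow a pattern that can be reproduced by hand when the original output is no longer available. The sketch below assumes the filesystem uses the sparse_super feature, under which backup superblocks are kept in block group 1 and in groups whose numbers are powers of 3, 5, or 7. The function names are hypothetical, and the three arguments (group count, blocks per group, first data block) are taken from the mke2fs output above.

```shell
#!/bin/sh
# True if $1 equals $2 raised to some power k >= 1.
is_power_of() {
    n=$1
    base=$2
    while [ "$n" -gt 1 ] && [ $(( n % base )) -eq 0 ]; do
        n=$(( n / base ))
    done
    [ "$n" -eq 1 ] && [ "$1" -gt 1 ]
}

# Print the block number of each backup superblock, one per line.
# Assumes sparse_super; args: block groups, blocks per group, first data block.
backup_superblocks() {
    groups=$1
    per_group=$2
    first=$3
    g=1
    while [ "$g" -lt "$groups" ]; do
        if [ "$g" -eq 1 ] || is_power_of "$g" 3 ||
           is_power_of "$g" 5 || is_power_of "$g" 7; then
            # The backup sits in the first block of group g.
            echo $(( g * per_group + first ))
        fi
        g=$(( g + 1 ))
    done
}
```

Running backup_superblocks 12 8192 1 prints 8193, 24577, 40961, 57345, and 73729, the same blocks mke2fs reported. While the primary superblock is still readable, dumpe2fs lists the backup locations directly, so a calculation like this is only a fallback.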
In this example, the first superblock (SB) can be copied out with the following command:

    dd if=/dev/hde of=/tmp/hde_sb.out bs=512 count=8 skip=65
    8+0 records in
    8+0 records out

Remember to skip the first 63 sectors to reach the start of partition one. The SB resides at block 1 of the filesystem, that is, 1024 bytes (one filesystem block, or two 512-byte sectors) into the partition, which gives skip=65. The superblock itself is 1024 bytes in size. Next, we destroy the primary superblock by overwriting the same region with zeros.

    [root@localhost root]# dd if=/dev/zero of=/dev/hde count=8 bs=512 seek=65
    8+0 records in
    8+0 records out
    [root@localhost root]# df
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/vg01/home        2.0G  1.7G  290M  86% /home

The df output confirms that /hde-test is not mounted.

    [root@localhost root]# mount /dev/hde1 /hde-test/
    mount: you must specify the filesystem type
    [root@localhost root]# tune2fs -l /dev/hde1
    tune2fs 1.34 (25-Jul-2003)
    tune2fs: Bad magic number in super-block while trying to open /dev/hde1
    Couldn't find valid filesystem superblock.

We have successfully destroyed the superblock. The next step is to recover it by using the alternate superblock.

    [root@localhost root]# e2fsck -b 8193 /dev/hde1
    e2fsck 1.34 (25-Jul-2003)
    /dev/hde1 was not cleanly unmounted, check forced.
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Block bitmap differences: +(1--4387)
    Fix<y>? yes

    Free blocks count wrong for group #0 (3806, counted=3805).
    Fix<y>? yes

    Free blocks count wrong (90552, counted=90551).
    Fix<y>? yes

    Inode bitmap differences: +(1--12)
    Fix<y>? yes

    Free inodes count wrong for group #0 (2029, counted=2028).
    Fix<y>? yes

    Free inodes count wrong (24469, counted=24468).
    Fix<y>? yes

    /dev/hde1: ***** FILE SYSTEM WAS MODIFIED *****
    /dev/hde1: 12/24480 files (0.0% non-contiguous), 7225/97776 blocks

Now the true test: is the filesystem available to be mounted? Next, we prove that the filesystem is restored and that the data is available.
    [root@localhost root]# mount /dev/hde1 /hde-test/     <--- mount successful
    [root@localhost root]# ll /hde-test/
    total 13
    -rw-r--r--  1 root root    65 Sep  8 19:17 greg_greg_greg_.txt   <--- File exists...
    drwx------  2 root root 12288 Sep  8 19:11 lost+found/

These steps show how to restore a superblock and confirm the availability of the data. Other methods exist for making backups of superblocks. Confirming the location of the superblocks is only half the battle. The other half is knowing how to back them up. If an alternate superblock resides at block 8193 of a filesystem with 1024-byte blocks, on a partition that starts at a 63-sector offset, the following command can be used to grab the superblock:

    dd if=/dev/hde of=/tmp/hde_sb.out2 bs=512 count=8 skip=16449

With backups made of the MBR, the filesystem's superblock, and the data within the filesystem, we should cover one last hurdle. Filesystem capacity is restricted in more ways than just raw capacity. The superblock records two basic limits: raw capacity and the number of inodes. When troubleshooting filesystem capacity errors, partition tables and superblocks are usually the last resort. Many Linux users encounter the simple inode limit when millions of small files reside in a filesystem. As shown next, a while loop creates thousands of files, each taking up an available inode, until the filesystem's inode count, not its raw capacity, is exhausted.

    #!/bin/sh
    count=1
    total=0
    while [ $total -ne $1 ]
    do
        total=`expr $total + $count`
        echo "$total" > /hde-test/$total
    done

    ./count_greg.sh 50000
    ./count_greg.sh: line 7: /hde-test/24437: No space left on device

This program creates thousands of files, which occupy all available inodes yet leave plenty of capacity in the filesystem.
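One way to spot this kind of imbalance is to compare block usage and inode usage side by side. The following sketch reads the output of tune2fs -l on standard input and reports both as percentages. The function name usage_summary is hypothetical, and the field labels it matches ("Inode count", "Free inodes", "Block count", "Free blocks") are assumed to appear as they do in the tune2fs 1.34 output used in this chapter.

```shell
#!/bin/sh
# Hypothetical helper: summarize block and inode usage from the output
# of `tune2fs -l <device>` supplied on standard input.
usage_summary() {
    awk -F: '
        /^Inode count/ { icount = $2 + 0 }
        /^Free inodes/ { ifree  = $2 + 0 }
        /^Block count/ { bcount = $2 + 0 }
        /^Free blocks/ { bfree  = $2 + 0 }
        END {
            # Refuse to divide by zero if the expected fields were missing.
            if (bcount == 0 || icount == 0) {
                print "could not find block/inode counts" > "/dev/stderr"
                exit 1
            }
            printf "blocks used: %d/%d (%d%%)\n",
                   bcount - bfree, bcount, (bcount - bfree) * 100 / bcount
            printf "inodes used: %d/%d (%d%%)\n",
                   icount - ifree, icount, (icount - ifree) * 100 / icount
        }'
}
```

It would be used as tune2fs -l /dev/hde1 | usage_summary. Inode usage at 100 percent alongside much lower block usage points to inode exhaustion rather than a full disk. The block percentage counts reserved blocks as free, so it can differ slightly from the figure df reports.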
    [greg@localhost tmp]$ df
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/hde1              93M   29M   60M  33% /hde-test

    tune2fs -l /dev/hde1
    tune2fs 1.34 (25-Jul-2003)
    Filesystem volume name:   <none>
    Last mounted on:          <not available>
    Filesystem UUID:          19826da5-0597-47e2-955b-b5aa81fcca55
    Filesystem magic number:  0xEF53
    Filesystem revision #:    1 (dynamic)
    Filesystem features:      has_journal filetype needs_recovery sparse_super
    Default mount options:    (none)
    Filesystem state:         clean with errors
    Errors behavior:          Continue
    Filesystem OS type:       Linux
    Inode count:              24480
    Block count:              97776
    Reserved block count:     4888
    Free blocks:              65737
    Free inodes:              0      <--- Zero inodes left, so the filesystem has
                                          no available pointers though space remains.
    First block:              1
    Block size:               1024
    Fragment size:            1024
    Blocks per group:         8192
    Fragments per group:      8192
    Inodes per group:         2040
    Inode blocks per group:   255
    Filesystem created:       Wed Sep  8 19:11:15 2004
    Last mount time:          Wed Sep  8 22:21:40 2004
    Last write time:          Wed Sep  8 22:30:57 2004
    Mount count:              2
    Maximum mount count:      21
    Last checked:             Wed Sep  8 21:36:09 2004
    Check interval:           15552000 (6 months)
    Next check after:         Mon Mar  7 20:36:09 2005
    Reserved blocks uid:      0 (user root)
    Reserved blocks gid:      0 (group root)
    First inode:              11
    Inode size:               128
    Journal inode:            8
    Default directory hash:   tea
    Directory Hash Seed:      64a819b6-d567-49d7-bd11-f50c35d961fb

It's important to back up superblocks, especially for extremely large filesystems over 2TB. If an application fails and the superblock is overwritten or left in an unstable state, the data may still be valid, but with no pointers to the data, recovery becomes time consuming.
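When backing up superblocks with dd against the whole-disk device, the skip value has to be worked out from the partition's starting sector and the filesystem block number. The arithmetic behind the skip=65 and skip=16449 values used earlier can be sketched as follows. The function name sb_sector is hypothetical, and the block size is assumed to be a multiple of 512 bytes.

```shell
#!/bin/sh
# Convert a filesystem block number into a 512-byte sector offset
# measured from the start of the whole disk.
# Args: partition start sector, filesystem block size in bytes, block number.
sb_sector() {
    part_start=$1
    block_size=$2
    block=$3
    # Each filesystem block spans block_size/512 sectors.
    echo $(( part_start + block * (block_size / 512) ))
}

# Example use (device and output path are illustrative only):
#   dd if=/dev/hde of=/tmp/hde_sb.out2 bs=512 count=8 \
#      skip="$(sb_sector 63 1024 8193)"
```

With a partition starting at sector 63 and 1024-byte blocks, sb_sector 63 1024 1 gives 65 for the primary superblock and sb_sector 63 1024 8193 gives 16449 for the first backup, matching the dd commands shown earlier.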