I've reported on strange disk errors before and wasn't able to pinpoint the problem. Luckily, I have now found what the problem is, but sadly I still don't know what caused it to begin with...
Every now and then I run into a server that won't (re)boot and fails with multiple reports of missing disks and missing ext2 or ext3 superblocks, along with the suggestion to try e2fsck -b 8193. The result: the server cannot load one of its LVM Volume Groups (VG), so the Logical Volume (LV) in it can't be found. Basically, it looks as if you've lost a disk. The errors are similar to this:
"Couldn't find all physical volumes for volume group vgdata."
"Couldn't find device with uuid '56pgEk-0zLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'."
'Volume group "vgdata" not found'
LVM has several backup methods, so not all is lost. Chances are that only the LVM metadata is corrupt, so LVM doesn't know what to do with the physical disk and can't find its LVM signature either. Thankfully, Novell has a CoolSolution for Recovering a Lost LVM Volume Disk.
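To check which of those backups actually exist on a server, a rough sketch like the one below can help. It assumes the default LVM layout, where the latest metadata copy is kept in /etc/lvm/backup/<vgname> and older versions in /etc/lvm/archive/. The name list_lvm_metadata_backups is a made-up helper, not an LVM tool, and the optional second argument is only there so you can point it at a copy of the directory.

```shell
#!/bin/sh
# Hypothetical helper: print the metadata backup files LVM keeps for one VG.
# $1 = volume group name, $2 = LVM config directory (assumed default: /etc/lvm).
list_lvm_metadata_backups() {
    vg=$1
    lvm_dir=${2:-/etc/lvm}
    for f in "$lvm_dir/backup/$vg" "$lvm_dir/archive/${vg}_"*.vg; do
        # An unmatched glob stays literal, so only print files that exist.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example (read-only, safe to run):
#   list_lvm_metadata_backups vgdata
```

If the LVM tools themselves still work, vgcfgrestore --list vgdata should report much the same information, including when each backup was taken.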
My problem server (a VM) has two disks, sda and sdb. Both are in LVM; the first contains the root system (vgsystem) and the (Oracle) software, the second is a data disk (vgdata). Somehow the LVM metadata of /dev/sdb was corrupted. This meant vgdata was missing, and accordingly lvdata as well. That in turn meant the volume listed in /etc/fstab could not be mounted, which caused the boot to fail. Why? Because /etc/init.d/boot.lvm runs very early in the startup cycle and causes several follow-on failures, so you are dumped in the rescue and maintenance shell... make sense so far?
Just comment out (i.e. disable) the /etc/fstab entry that mounts the missing LVM volume and reboot. That should get the server to boot again. Then follow the steps to repair the corrupt LVM metadata in Solution 2!
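Commenting out the entry can be done in any editor, but here is a minimal sketch of the same step as a script. It assumes the fstab entry names the volume as /dev/vgdata/lvdata in its first field (yours may use /dev/mapper/vgdata-lvdata or a label instead), and comment_out_fstab_entry is a hypothetical helper name. It keeps a .bak copy of the file before changing anything.

```shell
#!/bin/sh
# Hypothetical helper: comment out every fstab line whose first field is the
# given device. $1 = fstab file, $2 = device path as written in that file.
comment_out_fstab_entry() {
    fstab_file=$1
    device=$2
    # Keep a backup copy next to the original before editing.
    cp -p "$fstab_file" "$fstab_file.bak" || return 1
    # Prefix matching lines with '#'; lines already commented out do not
    # match because their first field starts with '#'.
    awk -v dev="$device" '$1 == dev { print "#" $0; next } { print }' \
        "$fstab_file.bak" > "$fstab_file"
}

# Example (run as root from the maintenance shell, device path is an assumption):
#   comment_out_fstab_entry /etc/fstab /dev/vgdata/lvdata
```

Once the LVM metadata is repaired and the volume group is back, restoring the entry is just a matter of removing the leading # again or copying the .bak file back.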