
Recovering a Lost LVM Disk or Volume Group

I've reported on strange disk errors before and wasn't able to pinpoint the problem. Well, luckily I found out what the problem is, but sadly I still don't know what caused it to begin with...

Every now and then I run into a server that won't (re)boot and fails with multiple reports of missing disks, complaints that ext2 or ext3 superblocks are missing, and a suggestion to try e2fsck -b 8193. Result: the server cannot load some LVM Volume Group (VG), so its Logical Volume (LV) can't be found. Basically it means you've lost a disk. You get errors similar to this:
"Couldn't find all physical volumes for volume group vgdata."
"Couldn't find device with uuid '56pgEk-0zLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'."
'Volume group "vgdata" not found'
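
To repair anything later you need two facts from these messages: the name of the broken VG and the UUID of the missing physical volume. A minimal Python sketch that pulls both out of a boot log, assuming the messages are worded exactly like the three above (other LVM versions may phrase them differently; `parse_lvm_errors` is a made-up helper name):

```python
import re

# Patterns modelled on the three messages quoted above (an assumption:
# other LVM versions may word them differently).
UUID_RE = re.compile(r"Couldn't find device with uuid '?([A-Za-z0-9-]+)")
VG_MISSING_PV_RE = re.compile(r"volume group (\S+?)\.?$")
VG_NOT_FOUND_RE = re.compile(r'Volume group "([^"]+)" not found')


def parse_lvm_errors(lines):
    """Return (vg_names, pv_uuids) as two sets collected from log lines."""
    vgs, uuids = set(), set()
    for raw in lines:
        # Drop whitespace and any quotes wrapped around the whole message.
        line = raw.strip().strip("\"'")
        m = UUID_RE.search(line)
        if m:
            uuids.add(m.group(1))
            continue
        m = VG_NOT_FOUND_RE.search(line)
        if m:
            vgs.add(m.group(1))
            continue
        if "Couldn't find all physical volumes" in line:
            m = VG_MISSING_PV_RE.search(line)
            if m:
                vgs.add(m.group(1))
    return vgs, uuids


if __name__ == "__main__":
    sample = [
        "Couldn't find all physical volumes for volume group vgdata.",
        "Couldn't find device with uuid '56pgEk-0zLS-cKBc-z9vJ-kP65-DUBI-hwZPSu'.",
        'Volume group "vgdata" not found',
    ]
    print(parse_lvm_errors(sample))
```

Write the UUID down; it is what ties the disk back to its entry in the LVM metadata backup.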

LVM has several backup methods, so not all is lost. Chances are that only the LVM metadata is corrupt, so LVM doesn't know what to do with the physical disk and can't find its LVM signatures either. Thankfully, Novell has a CoolSolution for Recovering a Lost LVM Volume Disk.
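
For orientation, here is a hedged sketch of the usual LVM2 way to restore metadata from such a backup. It is not copied from the Novell article, so treat the exact sequence as my assumption. It also assumes the automatic backup lives in /etc/lvm/backup/<vgname> (the LVM2 default) and that you know which device lost its metadata. The function only builds the command lines and prints them; it runs nothing, because `pvcreate` rewrites the PV label and you want to read the commands before typing them. `metadata_restore_commands` is a made-up helper name.

```python
import shlex


def metadata_restore_commands(vg, pv_uuid, device, backup_dir="/etc/lvm/backup"):
    """Build (but do not run) the commands to restore a PV label and VG metadata.

    backup_dir is the LVM2 default location for automatic metadata backups;
    adjust it if lvm.conf on your system points elsewhere.
    """
    backup_file = f"{backup_dir}/{vg}"
    return [
        # Dry run first: --test makes LVM go through the motions without writing.
        ["pvcreate", "--test", "--uuid", pv_uuid, "--restorefile", backup_file, device],
        # Recreate the PV label with the UUID the VG metadata expects.
        ["pvcreate", "--uuid", pv_uuid, "--restorefile", backup_file, device],
        # Write the saved VG metadata back.
        ["vgcfgrestore", "--file", backup_file, vg],
        # Activate the logical volumes in the restored VG.
        ["vgchange", "-ay", vg],
    ]


if __name__ == "__main__":
    commands = metadata_restore_commands(
        "vgdata", "56pgEk-0zLS-cKBc-z9vJ-kP65-DUBI-hwZPSu", "/dev/sdb"
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Check that the UUID in the error message also appears in the backup file before running any of this for real.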

My problem server (a VM) has two disks, sda and sdb. Both are in LVM: the first contains the root system (vgsystem) and the (Oracle) software, the second is a data disk (vgdata). Somehow the LVM metadata of /dev/sdb was corrupted. This meant vgdata was missing, and accordingly lvdata as well. That in turn meant the /etc/fstab entry for that volume could not be mounted, which caused the boot to fail. Why? Because /etc/init/boot.lvm runs very early in the startup cycle; when it cannot activate vgdata, the mounts that depend on it fail and you are dumped in the rescue and maintenance shell... make sense so far?

Just comment out (i.e. ignore) the /etc/fstab entry for the missing LVM volume and reboot. That should work. Then follow the steps to repair corrupt LVM metadata in Solution 2 of the Novell CoolSolution!
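
If you would rather script the fstab change than edit it by hand in the rescue shell, here is a small Python sketch. The sample entries are illustrative: vgsystem, vgdata and lvdata are the names from my server, while `lvroot`, the `/data` mount point and the mount options are made up. Your fstab may also refer to the volume as /dev/mapper/vgdata-lvdata, in which case pass that prefix instead.

```python
def comment_out_mounts(fstab_text, device_prefix):
    """Return fstab_text with every active entry whose device starts with
    device_prefix commented out. Lines already commented are left alone."""
    result = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0].startswith(device_prefix):
            result.append("#" + line)
        else:
            result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    # Illustrative entries only; lvroot and /data are hypothetical names.
    sample = (
        "/dev/vgsystem/lvroot  /      ext3  defaults  1 1\n"
        "/dev/vgdata/lvdata    /data  ext3  defaults  1 2\n"
    )
    print(comment_out_mounts(sample, "/dev/vgdata/"), end="")
```

Keep a copy of the original /etc/fstab so you can restore the entry once vgdata is back.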
