
Linux won't boot from SAN after install

I had another issue with Dell's PowerEdge 2950 and the combination of Oracle Enterprise Linux (OEL) 4 update 5 (4U5), QLogic 2460 single-port HBAs and an EMC DMX SAN. I was trying to install a Linux server to boot from SAN... The problem seems to be caused by Dell's BIOS and Linux device enumeration, about which I've reported earlier.
Dell uses DRAC to service its servers remotely. Using DRAC, you can mount virtual devices: a floppy and/or a CD (image). With these, you can install driver updates, install an OS or apply updates to a machine as if you were sitting right in front of it. Very handy, but...
These virtual media get recognized as USB/SCSI devices, or so it seems, and this causes Linux to assign device names to them, just like ordinary (SCSI) disks. In my case, the Dell Virtual Floppy gets enumerated as /dev/sda, which bumps the internal PERC controller and my QLogic HBA up by one. The boot device name therefore changes from /dev/sda to /dev/sdb, or from sdb to sdc. As a result, your freshly installed system, which was fine until the reboot, won't boot again: its disks appear to be gone and it can't find the root partition.
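
To make the renaming concrete, here is a minimal Python sketch of the effect. It is a simplified model, not how the kernel actually works: it just hands out sda, sdb, ... in detection order. The device labels and the helper names `assign_sd_names` and `name_of` are made up for illustration.

```python
import string

def assign_sd_names(devices):
    """Hand out sda, sdb, ... in detection order (simplified model, max 26 devices)."""
    return {f"sd{string.ascii_lowercase[i]}": dev for i, dev in enumerate(devices)}

def name_of(mapping, device):
    """Return the sdX name that a given device ended up with."""
    return next(name for name, dev in mapping.items() if dev == device)

# Order seen during installation (hypothetical labels).
at_install = assign_sd_names(["PERC local disk", "SAN LUN"])

# Order after reboot, with the DRAC virtual floppy detected first.
after_reboot = assign_sd_names(["Dell Virtual Floppy", "PERC local disk", "SAN LUN"])

print(name_of(at_install, "SAN LUN"))    # sdb
print(name_of(after_reboot, "SAN LUN"))  # sdc
```

Anything that was configured with the install-time name (sdb here) now points at the wrong disk.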

I did find a workaround, though, and I hope it helps someone. The first thing you'll need to do is install everything inside LVM. I did, and it seems LVM can handle partitions moving around because it labels each physical volume with a unique ID (UUID) and scans the disks for that ID. If a device moves to a different drive "letter", LVM will still be able to find it. The boot cycle will take a bit longer because it needs to scan for the IDs, but your system will boot.
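
The idea of looking a disk up by its ID instead of by its name can be sketched in a few lines of Python. This is only an illustration, assuming LVM identifies each physical volume by a UUID stored on the disk itself; the real LVM tools read that label from the device, whereas this sketch uses plain dictionaries. The UUID strings and the function `find_device_by_uuid` are hypothetical.

```python
def find_device_by_uuid(scan_result, wanted_uuid):
    """Return the device whose physical-volume UUID matches, or None."""
    for device, uuid in scan_result.items():
        if uuid == wanted_uuid:
            return device
    return None

# Placeholder IDs, not real LVM UUIDs.
ROOT_PV_UUID = "example-uuid-root"
LOCAL_PV_UUID = "example-uuid-local"

# What a scan would see at install time and after the reboot.
at_install = {"/dev/sda": LOCAL_PV_UUID, "/dev/sdb": ROOT_PV_UUID}
after_reboot = {
    "/dev/sda": None,            # virtual floppy, carries no PV label
    "/dev/sdb": LOCAL_PV_UUID,
    "/dev/sdc": ROOT_PV_UUID,
}

print(find_device_by_uuid(at_install, ROOT_PV_UUID))    # /dev/sdb
print(find_device_by_uuid(after_reboot, ROOT_PV_UUID))  # /dev/sdc
```

The lookup key never changes, so it does not matter which name the disk ends up with.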

Next, boot your machine, enter the BIOS and set the "USB Drive Virtual Type" to 'harddisk' instead of (the intuitive) 'floppy'. This moves the virtual floppy into the hard disk boot sequence menu, and you can use that menu to set the order of the disk controllers. I set my HBA first, the PERC second and the virtual floppy last.

If you now (re)install Linux according to taste, using LVM to create a VolumeGroup and LogicalVolumes for your system partitions (lvroot, lvvar, lvtmp, etc.), you'll be fine. One more thing: after partitioning the disk(s), check the "advanced boot options" checkbox and set the boot order of your devices: put the LUN (here: sdb) before the local disk (here: sda).
Upon reboot the virtual floppy still gets inserted as /dev/sda, but the driver ignores it and LVM will find your root partition on /dev/sdc (vs /dev/sdb when you installed it).

More on Linux Enumeration of NICs.

Update: If you're fine with the new order, you can also solve this by editing the GRUB menu and pointing GRUB at the proper new disk, i.e. (hdX,0) instead of (hd0,0), where sda will be 0, sdb will be 1, etc. See man grub.
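
As a quick aid for that translation, here is a small Python sketch that turns a Linux disk name into the GRUB (legacy) notation. It assumes the simple letter-to-index mapping described above (sda is 0, sdb is 1, and so on) and that GRUB counts partitions from 0; on a real system GRUB follows the BIOS drive order, so check the result before relying on it. The function name `grub_legacy_device` is made up for this example.

```python
import string

def grub_legacy_device(sd_name, partition=1):
    """Map e.g. 'sdb', partition 1 to '(hd1,0)' using the sda=0, sdb=1 rule."""
    name = sd_name.split("/")[-1]  # accept both 'sdb' and '/dev/sdb'
    if len(name) != 3 or not name.startswith("sd") or name[2] not in string.ascii_lowercase:
        raise ValueError(f"expected a name like 'sda', got {sd_name!r}")
    if partition < 1:
        raise ValueError("Linux partition numbers start at 1")
    disk_index = ord(name[2]) - ord("a")
    return f"(hd{disk_index},{partition - 1})"

print(grub_legacy_device("sda"))       # (hd0,0)
print(grub_legacy_device("/dev/sdc"))  # (hd2,0)
```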
