Strange I/O errors with SAN storage

We've run into some really strange I/O errors when using LUNs on our DMX-3 SAN (QLogic QLE2460 HBAs, firmware 1.24). One HBA was faulty, so we replaced it. However, upon restoring the OS and reinstalling it, more problems appeared. The new HBA would not boot at all using the existing disks, so we disabled it in the BIOS and booted from the other (original) HBA. Both HBAs have the same firmware and the same settings.

Upon booting, anything involving the disks (we boot from SAN and have data disks there as well) is extremely sluggish. Letting the server do its thing, I got a ton of I/O errors, first during disk discovery and then again while mounting the file systems.

ERROR: ddf1: reading /dev/sdb[Input/output error]
ERROR: hpt37x: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: pdc: reading /dev/sdb[Input/output error]
ERROR: sil: reading /dev/sdb[Input/output error]
ERROR: ddf1: reading /dev/sdc[Input/output error]
ERROR: hpt37x: reading /dev/sdc[Input/output error]
ERROR: pdc: reading /dev/sdc[Input/output error]
ERROR: pdc: reading /dev/sdc[Input/output error]
ERROR: pdc: reading /dev/sdc[Input/output error]
ERROR: pdc: reading /dev/sdc[Input/output error]
ERROR: pdc: reading /dev/sdc[Input/output error]
ERROR: sil: reading /dev/sdc[Input/output error]
ERROR: ddf1: reading /dev/sdd[Input/output error]
ERROR: hpt37x: reading /dev/sdd[Input/output error]
ERROR: pdc: reading /dev/sdd[Input/output error]
ERROR: pdc: reading /dev/sdd[Input/output error]
ERROR: pdc: reading /dev/sdd[Input/output error]
ERROR: pdc: reading /dev/sdd[Input/output error]
ERROR: pdc: reading /dev/sdd[Input/output error]
ERROR: sil: reading /dev/sdd[Input/output error]
...

and so on for all disks (LUNs) attached.
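
To see at a glance which devices and which probes are affected, the log can be tallied with a few lines of Python. This is only a rough sketch: the function name `summarize_errors` and the regular expression are my own, and the sample is a shortened copy of the lines above. The prefixes (ddf1, hpt37x, pdc, sil) look like the names of dmraid metadata format handlers, but that is an assumption, since nothing in the output itself names the tool that printed it.

```python
import re
from collections import Counter, defaultdict

# Matches lines such as: ERROR: pdc: reading /dev/sdb[Input/output error]
# The pattern is an assumption based only on the log lines quoted in this post.
LINE_RE = re.compile(
    r"^ERROR: (?P<probe>[\w-]+): reading (?P<device>/dev/\w+)\[(?P<reason>[^\]]*)\]$"
)


def summarize_errors(lines):
    """Count read errors per device, broken down by probe prefix."""
    summary = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # silently skip anything that is not an ERROR line
            summary[match["device"]][match["probe"]] += 1
    return {device: dict(counts) for device, counts in summary.items()}


# Shortened sample taken from the boot output above.
sample = [
    "ERROR: ddf1: reading /dev/sdb[Input/output error]",
    "ERROR: hpt37x: reading /dev/sdb[Input/output error]",
    "ERROR: pdc: reading /dev/sdb[Input/output error]",
    "ERROR: pdc: reading /dev/sdb[Input/output error]",
    "ERROR: sil: reading /dev/sdb[Input/output error]",
    "ERROR: ddf1: reading /dev/sdc[Input/output error]",
]

if __name__ == "__main__":
    for device, counts in sorted(summarize_errors(sample).items()):
        print(device, counts)
```

If every SAN device shows the same set of probes failing the same number of times, that suggests the reads fail on the path to the LUNs rather than on one particular disk.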

Searching the web gave me a few hits but no solutions (see 1|2|3). All of those reports, however, concerned local RAID setups using ATA/SATA disks. I am not using local RAID. We have Dell PowerEdge 2950 servers with two QLE2460 HBAs. The internal PERC5/i is enabled because it provides the swap disk space, but it does nothing else. Furthermore, sdb, sdc and so on are SAN disks. So why do I get RAID errors from them? Could this point to motherboard errors? PCI bus errors? Broken FC cables? A bad FC switch configuration, or simply damaged LUNs on the SAN?
