[SATLUG] Adventures in MD RAID + LVM...

John Pappas j at jvpappas.net
Sat Feb 19 16:09:45 CST 2011

So over the last week or so, I have been reconfiguring my storage server.  I
had the following configuration:

OS RAID1 on 2x 350GB:
/dev/md0 :: EXT3 formatted boot - 256MB partitions on /dev/sd[ab]1
/dev/md3 :: SWAP - 4GB partitions on /dev/sd[ab]2
/dev/md1 :: Remaining space - LVM VG "sys" on /dev/sd[ab]3

Data RAID5 on 5x500G partitions
/dev/md2 :: LVM VG "data" on 5x500G partitions on /dev/sd[cdefg]1
Actual component disks were 1x1TB, 4x500GB

I also had LVM VG "data2" with data on remaining space on 1TB /dev/sdc2 and
a stand-alone 1TB VG "data3" on /dev/sdh1

I bought 4x2TB disks to upgrade the data VG (may buy a 5th to even it out).

So here's how it went:

   1. I did a `mdadm /dev/md2 --fail /dev/sd_1 --remove /dev/sd_1` where _
   is the disk that I am upgrading.
   2. I inserted the new disk, and fdisk'ed it so that the new disk had a
   0xfd (Linux RAID Autodetect) 1TB primary partition 1 and 1TB 0x8e primary
   partition 2 (LVM)
   3. a `mdadm -a /dev/md2 /dev/sd_1` added the now 1TB partition to the
   RAID group.
   4. After rebuild (about 160 Minutes or ~3 Hours) I did steps 1-3 on
   another of the 500GB spindles.
   5. After another rebuild (~3 hours), repeat steps 1-3, except create only
   1 0xfd primary partition on new disk
   6. After another rebuild (~3 hours), repeat steps 1-3, except create only
   1 0xfd primary partition on new disk
   (so now I have 1x1TB disk with 2x500GB partitions, 2x2TB disks with
   1TB RAID + 1TB LVM partitions, and 2x2TB disks with a single 2TB 0xfd
   partition)
   7. In order to reclaim a 1TB partition from the remaining 1TB disk I have
   to evacuate the existing LVM PVs on that disk so:
   8. `pvcreate /dev/sdc2`, `vgextend data2 /dev/sdc2`, `pvmove /dev/sdf2
   /dev/sdc2`.  The pvmove took about 4 hours.
   9. `vgreduce data2 /dev/sdf2` and then steps 1-3 on /dev/sdf, except the
   whole 1TB disk becomes a single 0xfd partition 1
   10. After the rebuild I was left with component disks of 1x1TB, 2x(1TB
   RAID + 1TB LVM), and 2x2TB -- and the array is now a 5x1TB RAID 5 (4TB
   usable)
   11. `mdadm --grow /dev/md2 -z max` grows md2 to claim the new space on
   the RAID components (now 5x1TB rather than the original 5x500GB)
   12. After the rebuild/reshape, consolidate the 3 different VGs to just
   one: "data"
   13. In order to do this, the losing VG has to be taken offline for a
   moment, so `umount /data/data2/*` and `vgchange -a n data2`
   14. `vgmerge data data2` merges the LVs into one VG, and `vgchange -a y
   data` brings the new LVs online; then edit /etc/fstab and remount the
   merged LVs from data2
   15. `pvmove /dev/sdc2 /dev/md2` moves the extents off the old data2 PV
   onto md2
   16. After the move (takes a bit, about 6 hours) I can reclaim the space
   from the now-evacuated PV: `vgreduce data /dev/sdc2`; `pvremove /dev/sdc2`
   17. Can now (re)do steps 1-3 on /dev/sdc to reformat to 1x2TB primary
   18. After the rebuild (which now takes about 340 min or ~6 hours), I am
   left with 4x2TB and 1x1TB component disks, and ~750G of new free space.
   19. I then do steps 13-16 on VG data3, remove that disk and put it in my
   Dish DVR in my RV so that I can increase PVR space.
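
The per-disk swap cycle in steps 1-3 can be sketched as a small script.  The
device names are just examples, and by default it only echoes the mdadm
commands (a dry run); set RUN= to execute them for real:

```shell
#!/bin/sh
# Sketch of the steps 1-3 swap cycle; /dev/md2 and /dev/sdd1 are examples.
# RUN defaults to echo, so this is a dry run that just prints the commands.
RUN="${RUN:-echo}"

swap_member() {
    md="$1"     # the RAID device, e.g. /dev/md2
    part="$2"   # the component partition, e.g. /dev/sdd1

    # Step 1: fail and remove the old component from the array
    $RUN mdadm "$md" --fail "$part" --remove "$part"
    # Step 2 happens by hand: physically swap the disk and fdisk a
    # 0xfd (Linux RAID autodetect) partition onto the new one.
    # Step 3: add the new, larger partition; md rebuilds onto it
    $RUN mdadm -a "$md" "$part"
}

swap_member /dev/md2 /dev/sdd1
```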

Note that no reboot or system downtime was incurred, other than the
umount/mount during the vgmerge.  Also note that 4TB of the disks in use is
unused (the second half of the 2TB disks).  In order to reclaim that space I
have 2 options:

   1. Replace singleton 1TB with a 2TB and do a reshape to grow to 5x2TB
   resulting in 8TB usable in VG data
   2. Upgrade mdadm to version 3.1 and do a reshape where I shrink the
   array to 4 devices (the 5th becomes a spare), then re-grow to reclaim the
   space, resulting in a 4x2TB RAID 5 and 6TB usable in VG data

I am leaning toward #1, as 2TB spindles are under $100, and #2 would take a
really long time and is fairly complex, whereas a swap-n-grow is really
simple and takes 1/3 the rebuild/resync cycles.
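
Option #1 would boil down to roughly the following.  /dev/sdg1 is just an
example name for the remaining 1TB member, and the final `pvresize` (not
mentioned above) is what lets the PV on md2 pick up the grown array.  As
before, this is a dry-run sketch that echoes the commands:

```shell
#!/bin/sh
# Dry-run sketch of option #1; /dev/sdg1 is an example name for the
# remaining 1TB member.  Set RUN= to execute for real.
RUN="${RUN:-echo}"

grow_to_2tb() {
    # retire the last 1TB member, then swap in a 2TB disk by hand
    # (one 0xfd partition spanning the whole disk)
    $RUN mdadm /dev/md2 --fail /dev/sdg1 --remove /dev/sdg1
    $RUN mdadm -a /dev/md2 /dev/sdg1       # rebuild onto the new member
    # once all five members are 2TB, claim the space and grow the PV
    $RUN mdadm --grow /dev/md2 -z max      # 5x2TB RAID 5 -> 8TB usable
    $RUN pvresize /dev/md2                 # VG "data" sees the new space
}

grow_to_2tb
```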

With the now freed 4x500GB disks, I did a swap and replace on each of the
2x350GB spindles that I was using for /boot (/dev/md0), swap (md3), and root
VG "sys" (md1).

This is where I fubar'ed it.  I forgot to re-install GRUB on the replacement
500GB boot RAID spindles, so when I finally did reboot the other day (kernel
and other updates), the system would not boot: neither of the two boot
drives /dev/sd[ab] had a GRUB boot sector.

All I needed to do to avoid this situation was to install GRUB on both boot
spindles, so that in the case of a failure I can boot off either disk:

   1. `grub` to enter grub shell
   2. grub> root (hdX,0); where X is the hd number of the primary disk, in
   this case 0
   3. grub> setup (hdX,0)
   4. grub> root (hdY,0); where Y is the hd number of the redundant disk, in
   this case 1
   5. grub> setup (hdY,0)
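
For what it's worth, GRUB legacy can take that same session non-interactively
via `--batch`.  Here the command stream is just printed as a dry run; piping
it into `grub --batch` would actually install (hd0/hd1 match the X=0, Y=1
example above):

```shell
#!/bin/sh
# The same grub-shell session as above, expressed as a command stream
# for `grub --batch` (GRUB legacy).  hd0/hd1 are the example disk numbers.
grub_cmds='root (hd0,0)
setup (hd0,0)
root (hd1,0)
setup (hd1,0)'

echo "$grub_cmds"
# to actually install:  echo "$grub_cmds" | grub --batch
```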

Sadly, doing this via a rescue CD or other live CD against the installed OS
seems to be above my head, so I am not sure how the recovery is going to end
up, but the data VG is intact, and once I get the OS rebuilt on the boot
RAID1 spindles I will be back in business, having grown the sys VG by 150GB
(350G->500G) and the data VG by 2.5TB (5x500G->5x1T).
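
For completeness, one common live-CD recipe looks roughly like the following.
The root LV name (/dev/sys/root) and the grub-install targets are assumptions
for illustration, not from the actual setup; as with the earlier sketches,
this is a dry run that echoes the commands:

```shell
#!/bin/sh
# Dry-run sketch of reinstalling GRUB from a rescue/live CD.
# /dev/sys/root is an ASSUMED LV name in VG "sys"; adjust to taste.
RUN="${RUN:-echo}"

reinstall_grub() {
    $RUN mdadm --assemble --scan        # bring up the md arrays
    $RUN vgchange -a y sys              # activate the root VG
    $RUN mount /dev/sys/root /mnt       # assumed root LV
    $RUN mount /dev/md0 /mnt/boot       # the RAID1 /boot
    $RUN mount --bind /dev /mnt/dev
    $RUN chroot /mnt grub-install /dev/sda
    $RUN chroot /mnt grub-install /dev/sdb
}

reinstall_grub
```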
