[SATLUG] 1U Server and SAN

Mark McCoy realmcking at gmail.com
Fri Oct 6 23:22:23 CDT 2006


Warning! Heavy ZFS evangelism follows!!

On 10/6/06, Travis H. <solinym at gmail.com> wrote:
> On 9/20/06, Greg Willden <gwillden at gmail.com> wrote:
> > For the SAN have you looked at www.coraid.com?  The ATA over Ethernet
> > (AoE) is really cool.  A coworker is using one for a project and it's
> > pretty neat.
>
> Yeah, seconded -- if you can afford the ~ 2x markup, it's probably worth it.
> You get block-level access to the drives, so it doesn't matter if your client
> wants to write to NTFS volumes but the NAS server only speaks Linux.
>
> I read the storagemojo.com site quite a bit, and it seems this is the thing,
> unless you can run ZFS, but if the clients don't talk it, I don't know about
> running NFS over ZFS.

ZFS shares over NFS _very_ nicely (NFS came from Sun, remember).

>
> Their suggestion was actually to create the file _systems_ as regular files,
> so that you can copy them, grow them, shrink them, back them up using
> something other than dd or LVM (incidentally, none of the resizing tools
> work with LVM, so if you have to move data you need space to copy it,
> and you're in linearland so you need contiguous space).
>
> But then again, if you're already going to go through the mapping from
> block level to file to block level again, why not use the loopback device
> thingie (OpenBSD calls it vnd, not sure if it's called losetup or -o loop or
> what in GNUspeak).  You can then export the filesystems via iSCSI,
> without paying 100% markup, with a little sweat-of-the-brow.

OpenSolaris already supports using ZFS (zvols, actually) as the backing
store for iSCSI, and Solaris 10 11/06 (aka Update 3) will as well.  That
exports all of ZFS's features to any system (instant snapshots and
transparent compression of your NTFS volumes, anyone?).
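
Why can a snapshot of a whole volume be "instant"?  Here is a toy
copy-on-write volume in Python to show the idea.  CowVolume and its
methods are hypothetical names, and this is a simplification, not ZFS's
actual on-disk design: logical blocks point at physical blocks that are
never overwritten, so a snapshot only has to save the pointer table.

```python
# Toy copy-on-write volume illustrating why snapshots can be "instant".
# Hypothetical names; this is NOT ZFS's on-disk design.

class CowVolume:
    """Logical blocks map to physical blocks that are never overwritten."""

    def __init__(self, nblocks, block_size=512):
        self.block_size = block_size
        self.physical = [bytes(block_size)]  # physical block 0: all zeros
        self.table = [0] * nblocks           # logical block -> physical block
        self.snapshots = {}                  # snapshot name -> frozen table

    def write(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        # Copy-on-write: append a new physical block and repoint the live
        # table at it; blocks referenced by snapshots are left untouched.
        self.physical.append(bytes(data))
        self.table[lba] = len(self.physical) - 1

    def read(self, lba, snapshot=None):
        table = self.table if snapshot is None else self.snapshots[snapshot]
        return self.physical[table[lba]]

    def snapshot(self, name):
        # Copies only the block-pointer table; no data blocks are copied.
        self.snapshots[name] = tuple(self.table)


vol = CowVolume(nblocks=4, block_size=4)
vol.write(0, b"old!")
vol.snapshot("before")
vol.write(0, b"new!")
print(vol.read(0))            # b'new!'
print(vol.read(0, "before"))  # b'old!'
```

The toy never frees anything; a real implementation reclaims a physical
block once neither the live volume nor any snapshot references it, and
a COW filesystem can skip even the table copy by keeping the old root
of its block tree.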

>
> So far as I know, the only solution that allows for really dynamic growing
> of the overall pool while maintaining hardware failure tolerance is ZFS,
> and I suspect that ZFS doesn't have all the bugs worked out that you
> would expect if you're betting your corporation's data on it.  But if
> something goes wrong, you can blame Sun.

Actually, ZFS goes through the most torturous stress test imaginable
every day.  The current build is automatically checked out and built,
and the test suite is run against it.  That is more filesystem pain
than most filesystems see in a lifetime, and the tests take only about
20 seconds!  http://blogs.sun.com/bill/date/20051116

>
> Note that only one client can use a LUN/target over iSCSI.  Not sure
> about AoE, probably true of that too.  And any RAID has to be done
> below that level (at the actual physical block level).
>
> One more thing.  You might be tempted to try RAID 5.  The advice I
> wish I had taken was to NOT do that, and to spend a little more for
> the extra capacity to do RAID 1.  The parity calculations mean another
> round trip from user to kernel mode if using software, and they mean
> a parity disk bottleneck even in hardware.  With RAID 5, a write takes
> 2 reads, a parity calculation, and 3 writes.  With RAID 1, a write takes
> 2 writes.  Plus recovery is simplified; just use the data on one drive
> until you can get another.  And no vendor lock-in on an outboard RAID
> card that probably runs an 80186 anyway.  Makes sense to Google,
> makes sense to me.

This always depends on your data and your access patterns.  Any
blanket "RAID x is better than RAID y" statement will have exceptions.
There is always a trade-off between I/O speed and data integrity.
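
To make the small-write comparison concrete, here is a toy model in
Python (Mirror and Raid5 are hypothetical classes, not a real RAID
driver).  A mirror write goes to every copy; a RAID 5 small write done
as read-modify-write reads the old data and old parity, then writes the
new data and a new parity computed as old parity XOR old data XOR new
data.  Real controllers may do full-stripe writes or cache parity
instead, which changes the counts, so treat this as an illustration of
the trade-off, not a benchmark.

```python
# Toy small-write cost model: RAID 1 mirror vs. RAID 5 read-modify-write.
# Hypothetical classes; each "disk" is a list of equal-size byte blocks.

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


class Mirror:
    def __init__(self, ndisks, nblocks, bs):
        self.disks = [[bytes(bs)] * nblocks for _ in range(ndisks)]
        self.reads = self.writes = 0

    def write(self, lba, data):
        for disk in self.disks:  # every copy receives the new data
            disk[lba] = data
            self.writes += 1


class Raid5:
    """One parity block per stripe; the parity disk rotates per stripe."""

    def __init__(self, ndisks, nstripes, bs):
        self.n = ndisks
        self.disks = [[bytes(bs)] * nstripes for _ in range(ndisks)]
        self.reads = self.writes = 0

    def locate(self, lba):
        # Map a logical block to (stripe, data disk, parity disk).
        stripe, idx = divmod(lba, self.n - 1)
        parity = stripe % self.n
        data = idx if idx < parity else idx + 1  # skip the parity disk
        return stripe, data, parity

    def write(self, lba, new):
        stripe, d, p = self.locate(lba)
        old = self.disks[d][stripe]
        old_parity = self.disks[p][stripe]
        self.reads += 2
        # New parity = old parity XOR old data XOR new data.
        self.disks[p][stripe] = xor_blocks(xor_blocks(old_parity, old), new)
        self.disks[d][stripe] = new
        self.writes += 2

    def stripe_ok(self, stripe):
        # Data XOR parity across the whole stripe must be all zeros.
        acc = bytes(len(self.disks[0][stripe]))
        for disk in self.disks:
            acc = xor_blocks(acc, disk[stripe])
        return not any(acc)


m, r = Mirror(2, 8, 4), Raid5(4, 8, 4)
m.write(5, b"abcd")
r.write(5, b"abcd")
print(m.reads, m.writes)  # 0 2
print(r.reads, r.writes)  # 2 2
print(r.stripe_ok(1))     # True
```

If a crash lands between the two RAID 5 writes in write(), the stripe's
parity no longer matches its data; that window is the "write hole".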

If I were to build a new array for general use (with ZFS, natch), I
would stripe my pool across several raidz arrays with a couple of hot
spares.  (raidz is ZFS's version of RAID 5 without the RAID 5 "write
hole", where a crash between the data write and the parity write
leaves a stripe's parity inconsistent.)  This gives a good trade-off
between data integrity, performance, and capacity.  Adding space is
easy as well: just add another raidz array to the pool.
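
Some back-of-envelope arithmetic for that layout, as a Python sketch.
RaidzGroup and Pool are hypothetical names, and the five-disk groups of
500 GB drives are assumptions picked only for the example.  It ignores
metadata and allocation overhead, so real usable space will be somewhat
lower.  In ZFS itself the layout is built with `zpool create` and grown
with `zpool add`.

```python
# Back-of-envelope capacity and fault-tolerance arithmetic for a pool
# striped across single-parity raidz groups. Ignores metadata and
# allocation overhead, so real usable space will be somewhat lower.
from dataclasses import dataclass, field


@dataclass
class RaidzGroup:
    disks: int    # disks in this raidz group
    disk_gb: int  # size of the smallest disk in the group

    def usable_gb(self):
        # Single parity costs one disk's worth of space per group.
        return (self.disks - 1) * self.disk_gb


@dataclass
class Pool:
    groups: list = field(default_factory=list)
    spares: int = 0  # hot spares add no usable capacity

    def add(self, group):
        # Growing the pool: stripe another raidz group into it.
        self.groups.append(group)

    def usable_gb(self):
        return sum(g.usable_gb() for g in self.groups)

    def raw_gb(self):
        return sum(g.disks * g.disk_gb for g in self.groups)

    def survives(self, failed_per_group):
        # Each group tolerates one failed disk at a time; losing any one
        # group loses the pool, because data is striped across all groups.
        return all(f <= 1 for f in failed_per_group)


pool = Pool(spares=2)
pool.add(RaidzGroup(disks=5, disk_gb=500))
pool.add(RaidzGroup(disks=5, disk_gb=500))
print(pool.usable_gb(), pool.raw_gb())  # 4000 5000
pool.add(RaidzGroup(disks=5, disk_gb=500))
print(pool.usable_gb())                 # 6000
print(pool.survives([1, 1, 0]))         # True
print(pool.survives([2, 0, 0]))         # False
```

The arithmetic shows the knob you are turning: more, smaller raidz
groups tolerate more total failures (one per group) but spend more
disks on parity, while fewer, wider groups give more capacity per disk.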




-- 
Mark McCoy -- Professional Unix geek

Here in America we are descended in blood and in spirit from
revolutionists and rebels - men and women who dared to dissent from
accepted doctrine. As their heirs, may we never confuse honest dissent
with disloyal subversion. -- Dwight D. Eisenhower

