[SATLUG] Enterprise Storage On a Shoestring
Robert Pearson
e2eiod at gmail.com
Fri Dec 1 02:47:39 CST 2006
Table of Contents
1) Enterprise Storage On a Shoestring, Pt. I
2) HPC
===========================================
1) Enterprise Storage On a Shoestring, Pt. I
StorageMojo post entitled "Enterprise Storage On a Shoestring, Pt. I":
<<http://storagemojo.com/?p=318>>
Download and read the PDF at:
"FAB: enterprise storage systems on a shoestring (pdf)"
<<http://www.hpl.hp.com/research/ssp/papers/FAB-HOTOS03.pdf>>
"Google has *almost* created enterprise-class storage from
commodities. Microsoft has Boxwood. Amazon has, but they aren't
telling.
Now HP has people saying "we can build enterprise class storage from
commodity components." And they've done it. They said it, and a lot
more, in this paper, "FAB: enterprise storage systems on a shoestring
(pdf)", by Svend Frølund, Arif Merchant, Yasushi Saito, Susan Spence
and Alistair Veitch."
This approach should scale well both vertically and horizontally.
2) HPC
On 11/28/06, Robert Pearson <e2eiod at gmail.com> wrote:
> On 11/28/06, Borries Demeler <demeler at biochem.uthscsa.edu> wrote:
> When I went to SC'06 I saw one vendor (Microway) who was selling a
> Lustre rack based on SATA drives with a switched InfiniBand interconnect.
> I believe it was an 8 TB unit, using Lustre as the parallel file system.
> They were maxing out the InfiniBand connection during reads and writes.
Microway reports some impressive bandwidth numbers at:
<<http://www.microway.com/interconnects/>>
using:
QLogic InfiniPath HTX Adapter Features and Specifications:
* World's best cluster interconnect performance
  <<http://www.pathscale.com/infinipath-perf.html>>
  o Lowest MPI latency (1.29 µs with HTX, 1.67 µs for PCI Express)
  o 88 byte n½ message size (streaming)
  o 1884 MB/s bi-directional peak bandwidth (streaming)
  o TCP/IP bandwidth of 583 MB/s
> Think of Lustre as software RAID for networked drives. I talked to the
> Lustre folks and they told me they are working on a RAID-5-like redundancy
> layer. Right now what we have is just a little better than RAID-0: it has
> all the speedup and the bundling of disks into a single storage unit, but
> it also has a one-up on software RAID-0. If one drive fails, only the
> part that is stored on this drive is inaccessible, which in the case of
> a smaller file may be the entire file. In the case of software or hardware
> RAID-0, if one drive fails, the entire system is hosed. Here, you can continue
> to run, although without access to the data stored on the bad drive; ditto
> for one of the nodes going down.
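The failure behaviour described in the quote can be illustrated with a small
Python sketch. This is not Lustre code: the round-robin placement, the stripe
size, and the names stripe_targets and fully_readable_files are assumptions
made up for the example.

```python
def stripe_targets(file_size, stripe_size, num_drives, first_drive=0):
    """Return the set of drive indices that hold at least one stripe of a file.

    Assumes simple round-robin placement starting at first_drive
    (an illustrative assumption, not Lustre's actual allocator).
    """
    if file_size <= 0:
        return set()
    num_stripes = -(-file_size // stripe_size)  # ceiling division
    used = min(num_stripes, num_drives)
    return {(first_drive + i) % num_drives for i in range(used)}


def fully_readable_files(files, stripe_size, num_drives, failed_drive):
    """Names of files that have no stripe on the failed drive.

    files maps a file name to a (size_in_bytes, first_drive) tuple.
    """
    readable = []
    for name, (size, first_drive) in files.items():
        targets = stripe_targets(size, stripe_size, num_drives, first_drive)
        if failed_drive not in targets:
            readable.append(name)
    return readable


MIB = 1 << 20
example_files = {
    "small.dat": (1 * MIB, 0),   # one stripe, lives entirely on drive 0
    "other.dat": (1 * MIB, 2),   # one stripe, lives entirely on drive 2
    "big.dat": (16 * MIB, 1),    # enough stripes to touch all four drives
}
survivors = fully_readable_files(example_files, MIB, 4, failed_drive=0)
```

With drive 0 failed, only the file that never touched drive 0 stays fully
readable; under plain RAID-0 nothing would be readable at all.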
The trend in HPC may be away from RAID, because the rebuild time kills
you. My guess is they will go to the "shelf" approach and write each
piece of data to multiple locations for redundancy. How to do this
still seems to be undecided.
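As a rough back-of-the-envelope check on the rebuild-time concern, the Python
sketch below divides drive capacity by a sustained rebuild rate. The function
name rebuild_hours and the capacities and rate in the loop are illustrative
assumptions, not measurements from any of the systems mentioned here.

```python
def rebuild_hours(capacity_gb, rebuild_mb_per_s):
    """Hours needed to rewrite a whole drive at a sustained rate.

    Uses decimal units (1 GB = 1000 MB), as drive vendors do.
    """
    if rebuild_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    seconds = capacity_gb * 1000 / rebuild_mb_per_s
    return seconds / 3600


# Hypothetical drive sizes and a hypothetical sustained rebuild rate.
for capacity_gb in (250, 500, 750):
    hours = rebuild_hours(capacity_gb, rebuild_mb_per_s=50)
    print(f"{capacity_gb} GB at 50 MB/s: {hours:.1f} h")
```

The time grows linearly with capacity, and a real array that keeps serving
I/O during the rebuild will usually sustain a lower rate than an idle one.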
> In the future, you will be able to simply plug in a new drive and the
> system rebuilds automatically. Lustre is open source and has commercial
> support.
I expect "rebuilding" to become extinct. It takes too long. If you
keep n+1 copies at separate locations (minimum of 2) and you lose one
because a drive fails, the storage system will automatically create a
new copy at a new location. Disks are cheap.
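A minimal Python sketch of that idea follows. It is not taken from FAB,
Lustre, or any other system mentioned above; the function re_replicate, the
placement dictionary, and the shelf names are all hypothetical.

```python
def re_replicate(placement, failed, all_locations, copies=2):
    """Return a new placement with copies lost on `failed` recreated elsewhere.

    placement maps an object name to the set of locations holding a copy.
    The first healthy location without a copy is chosen; a real system
    would also balance load and free space.
    """
    healthy = [loc for loc in all_locations if loc != failed]
    new_placement = {}
    for name, locations in placement.items():
        survivors = set(locations) - {failed}
        if not survivors:
            raise RuntimeError(f"all copies of {name} were lost")
        for loc in healthy:
            if len(survivors) >= copies:
                break
            survivors.add(loc)  # no-op if this location already has a copy
        new_placement[name] = survivors
    return new_placement


shelves = ["s1", "s2", "s3", "s4"]
before = {"a": {"s1", "s2"}, "b": {"s2", "s3"}}
after = re_replicate(before, failed="s2", all_locations=shelves)
```

Only the objects that had a copy on the failed location are copied again, so
the work is proportional to the data lost rather than to the size of the
whole array.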