[SATLUG] 67 TB for less than $8K

Brad Knowles brad at shub-internet.org
Tue Oct 9 22:30:51 CDT 2012

On Oct 9, 2012, at 7:36 PM, Bruce Dubbs <bruce.dubbs at gmail.com> wrote:

> "We are currently seeing failures in less than 1 percent of the Hitachi Deskstar 5K3000 HDS5C3030ALA630 drives that we’re installing in pod 2.0."

Right, but I didn't see any stats on how many pod 1.0 units they had versus the pod 2.0 units.  I got the impression that they did not have that many pod 2.0 units installed yet, so it's hard to say how the long-term failure rates on the new units will work out relative to the entire population.

> They also say:  "We have yet to see any drives die because of old age" but that doesn't seem consistent with the statement with the 5%/year failure across their entire 9K drive inventory.

I think they're expecting to see a "bathtub curve" where lots and lots of drives all start dying around the same time as the drives get older and older, and the "old age mortality" rate starts looking a lot like the "infant mortality" rate.

The way I would take their comments is that they have not yet seen the significant uptick in drive failures that they would expect as the first large batches of drives start reaching their operational age limits.
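
For anyone who hasn't run into the term, a bathtub curve can be sketched roughly as below. This is only an illustration of the shape: every parameter here (the Weibull shapes and scales, and using the quoted 5%/year figure as the flat middle of the curve) is an assumption picked to draw the curve, not Backblaze data, and the function names are made up.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (k/s) * (t/s)**(k-1), valid for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t_years):
    """Failure rate per year at drive age t_years (illustrative only)."""
    # shape < 1 gives a falling rate: "infant mortality" (assumed values)
    infant = weibull_hazard(t_years, shape=0.5, scale=20.0)
    # flat floor of random failures (assumed; borrows the 5%/year figure)
    random_failures = 0.05
    # shape > 1 gives a rising rate: "old age mortality" (assumed values)
    wear_out = weibull_hazard(t_years, shape=5.0, scale=6.0)
    return infant + random_failures + wear_out

# High at the start, low in the middle, high again at the end.
for age in (0.1, 2, 7):
    print(age, round(bathtub_hazard(age), 3))
```

With these made-up numbers the rate is high for brand-new drives, bottoms out after a year or two, and climbs steeply once the wear-out term takes over, which is the uptick they would be waiting for.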

But there's a lot of reading-between-the-lines here.

> That does appear to be the case, but I've never seen a disk drive that takes two power inputs.  That could be engineered around though.

Disk drives wouldn't take two power inputs themselves, but it's not too hard to imagine power rails that can take inputs from more than one power supply and then distribute that power back out.

In that case, I'd expect to see an n+1 power supply configuration.

> I do think that networking is the real bottleneck, not the drive setup.  They said they can easily saturate a 1GB network connection.

Check the benchmarking that Chris shows at <http://bioteam.net/2011/08/backblaze-performance/>.  Specifically, they benchmarked a single NFS client writing at 90MB/s, versus 135MB/s for pure local disk writes.  They can saturate a Gig-E network, but keeping all the traffic purely on local disk is only roughly 50% faster.

I don't think there's any way that they could saturate a dual/bonded Gig-E network.  Not with this design.  If they had used SAS-to-SATA expanders, maybe.  But not SATA-to-SATA.
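
A back-of-the-envelope check of those numbers, as a minimal sketch. The 90MB/s and 135MB/s figures are the ones from the benchmark cited above; the line rates are just the theoretical maximum (1000 Mbit/s divided by 8, ignoring all protocol overhead), and the helper names are made up for illustration.

```python
# Rough arithmetic on the quoted figures; this is not a benchmark.
NFS_WRITE_MB_S = 90.0      # single NFS client write, from the cited benchmark
LOCAL_WRITE_MB_S = 135.0   # pure local disk write, from the cited benchmark

def line_rate_mb_s(gigabits_per_s):
    """Theoretical line rate in MB/s, ignoring protocol overhead."""
    return gigabits_per_s * 1000.0 / 8.0

def percent_faster(fast, slow):
    """How much faster 'fast' is than 'slow', in percent."""
    return (fast / slow - 1.0) * 100.0

gige = line_rate_mb_s(1)     # single Gig-E link
bonded = line_rate_mb_s(2)   # dual/bonded Gig-E, best case

print("local vs NFS: %.0f%% faster" % percent_faster(LOCAL_WRITE_MB_S, NFS_WRITE_MB_S))
print("local disk as fraction of Gig-E:  %.2f" % (LOCAL_WRITE_MB_S / gige))
print("local disk as fraction of bonded: %.2f" % (LOCAL_WRITE_MB_S / bonded))
```

By this arithmetic the local-disk figure is only a little above what one Gig-E link can carry in theory, and a bit over half of what a bonded pair could carry.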

The blog comment at <http://storagemojo.com/2011/07/20/open-source-storage-array/#comment-217808> points to a 50-drive (5x5x2) "next-generation" storage server from the Facebook Open Compute Project at <http://www.theregister.co.uk/2011/06/28/facebook_open_compute_2_preview/page2.html>, and that looks likely to me to be a better option.  But even their older 30-disk model at <http://opencompute.org/projects/open-vault-storage/> seems like it would be a better choice than the Backblaze pod.

> I doubt that an organization that needs 50+ TB of storage thinks that an $8K expenditure for HW is expensive.  The engineering time is more expensive, but a commercial solution can be even more expensive.

Oh, commercial solutions can be much more expensive -- EMC and all the storage vendors are very good proof of that.

I'm just saying that I think there are better alternatives out there that cost less money than the Backblaze pod, and have lower O&M costs as well.

Heck, I'd be willing to bet that the OCP units have a better ratio on price/storage as well as price/performance.  But I'm just guessing.

Besides, we're not really talking about $8k pods, we're talking about $12k pods (see the price breakdown by Chris at <http://bioteam.net/2011/08/real-world-backblaze-costs/>).  With the design of the Backblaze pods as it officially stands, the only real redundancy is if you have more than one of them -- so, now you're talking about $24k.
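
The arithmetic, as a rough sketch. The 67 TB and $8k figures come from the subject line and the $12k figure from the breakdown cited above. The assumptions are that the second pod is a full mirror of the first (so usable capacity does not grow) and that all 67 TB is usable, which overstates it once any RAID overhead is taken out.

```python
POD_TB = 67.0               # raw capacity per pod, from the subject line
ADVERTISED_COST = 8000.0    # "less than $8K", from the subject line
REAL_COST = 12000.0         # real-world estimate from the cited breakdown

def cost_per_tb(total_cost, usable_tb):
    """Dollars per usable terabyte."""
    return total_cost / usable_tb

advertised = cost_per_tb(ADVERTISED_COST, POD_TB)
single = cost_per_tb(REAL_COST, POD_TB)
# Assumption: two pods, one mirroring the other, so still 67 TB usable.
mirrored = cost_per_tb(2 * REAL_COST, POD_TB)

print("advertised: $%.0f/TB" % advertised)
print("one pod:    $%.0f/TB" % single)
print("two pods:   $%.0f/TB" % mirrored)
```

So the redundant configuration works out to roughly three times the cost per TB that the headline number suggests.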

If you're going to spend $24k, there are lots of options out there that should likewise be considered.

> Experience is always valuable.  I don't know how to learn lessons the easy way.

True, but you don't necessarily always have to repeat all the mistakes that others have already made, in order to gain useful experience.

Sometimes you can learn from their mistakes without having to repeat them.

Brad Knowles <brad at shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>
