[SATLUG] Open Source deduplication capabilities?

Brad Knowles brad at shub-internet.org
Wed Jun 18 23:12:29 CDT 2008

On 6/18/08, John Pappas wrote:

>  Agreed.  I am not a TSM guy, so I am not sure how a restore works (how
>  many tapes are needed for a complete restore in an incremental forever?).

I think that depends on your D2D2T environment.  In the case of my 
current employer, we don't actually use tape at all right now -- 
everything we back up is exclusively disk-to-disk only, and it stops 
there without ever going to tape.

There has recently been an acquisition of a really expensive 
mainframe tape library that can be shared cross-platform, and our 
group has been given use of two tape drives and ~200 tapes per 
library.  We're still trying to figure out the best way to make all 
that work with our current environment, and I imagine that de-dupe is 
going to play a role in that.

>  Most environments can get away with replacing tape with a dedupe
>  methodology (or single instance storage) of any type and see a good
>  benefit (think fileshares and OS backups).  My comments were based
>  on a first-hand demonstration in a large datacenter with real data
>  and replication concerns.

Interestingly enough, we just had a presentation this evening at the 
Austin Sun User Group (see <http://www.austinsug.org/>) by a senior 
consultant working for the second-largest Sun VAR in the country, and 
he still has not seen a de-dupe environment that he would use in a 
primary storage function.  Backup, sure.  But not primary.  I've 
heard the same from the founder of the Austin chapter of the Storage 
Networking User Group, and other experts in the area.

>  I am not Curtis Preston, but I do know some things, and for the near-line
>  or D2D2T needs, DD is a good fit.  SIR (unless it has DB knowledge built
>  in) falters on DB tablespace storage as the files are "different", where
>  block dedupe can account for the tablespace index changes and whatnot.
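
To make the distinction quoted above a bit more concrete, here is a rough 
Python sketch comparing whole-file single-instance storage with 
fixed-size block dedupe on two nightly copies of a tablespace-like file 
in which one block was rewritten in place.  Everything in it is an 
assumption made for illustration (the 4 KiB block size, the 
make_block helper, the synthetic "tablespace" contents); it is not how 
any particular product works.

```python
import hashlib

BLOCK = 4096  # assumed fixed block size, purely illustrative


def make_block(tag):
    """Hypothetical helper: build one deterministic, unique 4 KiB block."""
    return hashlib.sha256(tag.encode()).digest() * (BLOCK // 32)


def file_level_bytes(files):
    """Single-instance storage: keep one copy per distinct whole-file hash."""
    store = {}
    for data in files:
        store.setdefault(hashlib.sha256(data).hexdigest(), data)
    return sum(len(d) for d in store.values())


def block_level_bytes(files):
    """Block dedupe: keep one copy per distinct fixed-size block hash."""
    store = {}
    for data in files:
        for off in range(0, len(data), BLOCK):
            chunk = data[off:off + BLOCK]
            store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return sum(len(c) for c in store.values())


# Synthetic 1 MiB "tablespace" made of 256 distinct blocks.
monday = b"".join(make_block(f"row-{i}") for i in range(256))

# The next night, exactly one block has been rewritten in place.
tuesday = monday[:10 * BLOCK] + make_block("row-10-updated") + monday[11 * BLOCK:]

file_bytes = file_level_bytes([monday, tuesday])
block_bytes = block_level_bytes([monday, tuesday])

print("file-level store :", file_bytes, "bytes")
print("block-level store:", block_bytes, "bytes")
```

The two files hash differently as a whole, so the file-level store 
keeps both in full, while the block-level store keeps the original 256 
blocks plus the one changed block.  Note that this only covers in-place 
overwrites; an insertion that shifts every later offset would defeat 
fixed-size blocks, which is one reason some systems use 
variable-size (content-defined) chunking instead.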

I think you have to be careful regardless of how you reduce the 
amount of data being stored.  The simple fact that you will have fewer 
physical copies of that data around means that you have to protect 
those copies even more; otherwise, the cost of losing that one single 
copy you've got of the critical data on your network is ... 

Brad Knowles <brad at shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>
