[SATLUG] Open Source deduplication capabilities?

herb cee hc at lookcee.com
Wed Jun 18 21:03:56 CDT 2008


John Pappas wrote:
> On Sun, Jun 15, 2008 at 5:39 PM, Brad Knowles <brad at shub-internet.org>
> wrote:
>
>   
>> On 6/15/08, John Pappas wrote:
>>
>>> IMHO, dedupe is the only way to bring
>>> offsite data replication within reach of any SMB where bandwidth is the
>>> bottleneck.
>>>
>> Anyone seriously considering de-duplication technologies should look at the
>> series of articles at <http://www.backupcentral.com/content/view/175/1/>
>> written by Curtis Preston (author of the O'Reilly books "Unix Backup &
>> Restore", the 2nd edition "Backup and Restore" which covers Unix and
>> Unix-like OSes as well as others, and "Using SANs and NAS").
>>
>>     
>
> Sure, but dedupe does not AFAIK imply data integrity.  Ultimately, the
> design goals of your initiative define what (if any) dedupe methodologies
> should be used.  I like DD as it has data integrity measures built in
> (hashes, ECC, etc).
>
>> Keep in mind that a lot depends on the kinds of other tools you're using.
>> For example, in a TSM "incrementals forever" type of environment, the fact
>> that you're taking only incremental backups means that you're effectively
>> doing a lot of de-duplication on the input side, and further de-duplication
>> is not likely to buy you as much.  See
>> <http://www.backupcentral.com/content/view/136/47/>.
>>
>
>
> Agreed.  I am not a TSM guy, so I am not sure how a restore works (how many
> tapes are needed for a complete restore in an incremental forever?).  Most
> environments can get away with replacing tape with a dedupe methodology (or
> single instance storage) of any type and see a good benefit (think
> fileshares and OS backups).  My comments were based on a first-hand
> demonstration in a large datacenter with real data and replication
> concerns.  I am not Curtis Preston, but I do know some things, and for the
> near-line or D2D2T needs, DD is a good fit.  SIR (unless it has DB knowledge
> built in) falters on DB tablespace storage as the files are "different",
> where block dedupe can account for the tablespace index changes and whatnot.
>
> Thoughts?
> jp
>   
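
To make the SIR versus block dedupe point quoted above concrete, here is a
rough sketch. Everything in it is an assumption for illustration and not
how DD or any other product works: fixed-size 4096-byte blocks, SHA-256
hashes, and a made-up make_tablespace() helper that fakes a database file
in which a single block changes between two backups.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def file_level_stored_bytes(files):
    """Single-instance storage: keep one copy per distinct whole-file hash."""
    seen = {}
    for data in files:
        seen.setdefault(hashlib.sha256(data).hexdigest(), len(data))
    return sum(seen.values())


def block_level_stored_bytes(files, block_size=BLOCK_SIZE):
    """Block dedupe: keep one copy per distinct fixed-size block hash."""
    seen = {}
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            seen.setdefault(hashlib.sha256(block).hexdigest(), len(block))
    return sum(seen.values())


def make_tablespace(num_blocks, changed_block=None):
    """Fake 'tablespace' made of distinct blocks; optionally alter one block."""
    blocks = [bytes([i % 256]) * BLOCK_SIZE for i in range(num_blocks)]
    if changed_block is not None:
        blocks[changed_block] = b"\xff" * BLOCK_SIZE
    return b"".join(blocks)


# Two nightly backups of the same file; only block 42 differs.
monday = make_tablespace(100)
tuesday = make_tablespace(100, changed_block=42)

print("file-level bytes stored: ", file_level_stored_bytes([monday, tuesday]))
print("block-level bytes stored:", block_level_stored_bytes([monday, tuesday]))
```

With these made-up inputs the file-level pass keeps both copies in full
(819200 bytes), because the two files hash differently, while the
block-level pass keeps 101 blocks (413696 bytes): the original 100 plus the
one that changed.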

John, if you do not mind a curious question: what volumes of data would
put a company in a position to use this tech? I know for sure it is not
me, but I am very curious.
Thanks,
herb



