[SATLUG] Open Source deduplication capabilities?
hc at lookcee.com
Wed Jun 18 21:03:56 CDT 2008
John Pappas wrote:
> On Sun, Jun 15, 2008 at 5:39 PM, Brad Knowles <brad at shub-internet.org>
>> On 6/15/08, John Pappas wrote:
>> IMHO, dedupe is the only way to
>>> offsite data replication within reach of any SMB where bandwidth is the
>> Anyone seriously considering de-duplication technologies should look at the
>> series of articles at <http://www.backupcentral.com/content/view/175/1/>
>> written by Curtis Preston (author of the O'Reilly books "Unix Backup &
>> Restore", the 2nd edition "Backup and Restore" which covers Unix and
>> Unix-like OSes as well as others, and "Using SANs and NAS").
> Sure, but dedupe does not, AFAIK, imply data integrity. Ultimately, the
> design goals of your initiative define what (if any) dedupe methodologies
> should be used. I like DD as it has data integrity measures built in
> (hashes, ECC, etc).
>> Keep in mind that a lot depends on the kinds of other tools you're using.
>> For example, in a TSM "incrementals forever" type of environment, the fact
>> that you're taking only incremental backups means that you're effectively
>> doing a lot of de-duplication on the input side, and further de-duplication
>> is not likely to buy you as much. See <
> Agreed. I am not a TSM guy, so I am not sure how a restore works (how many
> tapes are needed for a complete restore in an incremental forever?). Most
> environments can get away with replacing tape with a dedupe methodology (or
> single instance storage) of any type and see a good benefit (think
> fileshares and OS backups). My comments were based on a first-hand
> demonstration in a large datacenter with real data and replication
> concerns. I am not Curtis Preston, but I do know some things, and for the
> near-line or D2D2T needs, DD is a good fit. SIR (unless it has DB knowledge
> built in) falters on DB tablespace storage as the files are "different",
> where block dedupe can account for the tablespace index changes and whatnot.
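
The SIR versus block-dedupe point quoted above can be illustrated with a
minimal Python sketch. It is not how DD or any other product works
internally; the fixed 4 KiB block size, the make_tablespace helper and the
100-block "tablespace" are assumptions made up for the illustration, and
real products often use variable-size chunking instead.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


def file_level_stored_bytes(files):
    """Single-instance storage: keep one copy per distinct whole-file hash."""
    seen = {}
    for data in files:
        seen.setdefault(hashlib.sha256(data).hexdigest(), len(data))
    return sum(seen.values())


def block_level_stored_bytes(files, block_size=BLOCK_SIZE):
    """Block dedupe: keep one copy per distinct block hash."""
    seen = {}
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            seen.setdefault(hashlib.sha256(block).hexdigest(), len(block))
    return sum(seen.values())


def make_tablespace(num_blocks, changed=None):
    """Build a hypothetical tablespace image whose blocks all differ.

    If `changed` is given, that one block gets different content, standing
    in for an index or row update between two backups.
    """
    blocks = []
    for i in range(num_blocks):
        tag = b"v2" if i == changed else b"v1"
        header = b"%d:%s:" % (i, tag)
        blocks.append(header.ljust(BLOCK_SIZE, b"\0"))
    return b"".join(blocks)


# Two nightly backups of the same tablespace; one block changed in between.
monday = make_tablespace(100)
tuesday = make_tablespace(100, changed=42)

sir = file_level_stored_bytes([monday, tuesday])
blk = block_level_stored_bytes([monday, tuesday])
print("file-level (SIR) bytes stored:", sir)
print("block-level bytes stored:     ", blk)
```

Under these assumptions the two files hash differently as a whole, so
file-level SIR keeps both copies in full (819200 bytes), while block-level
dedupe keeps the 100 original blocks plus the one changed block (413696
bytes).
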
John, if you do not mind a curious query: what volumes of data would
put a company in a position to use this tech? I know for sure it is not me,
but I am very curious.