[SATLUG] Open Source deduplication capabilities?

Henry Pugsley henry.pugsley at gmail.com
Wed Jun 4 10:28:17 CDT 2008

On Tue, Jun 3, 2008 at 9:55 PM, Brad Knowles <brad at shub-internet.org> wrote:

> On 6/3/08, Jeremy Mann wrote:
>
>>  I just read a great article in this month's Information Week about a
>>  new backup technology called 'data deduplication'. Anybody out there
>>  actually using it and if so, how much has it dropped your storage
>>  requirements for your backups?
>
> It's not really that new.  Network Appliance and EMC have been selling
> software to do this on their systems for years now.
>
> It's primarily used in backups, where you're doing a disk-to-disk-to-tape
> kind of environment (a.k.a. D2D2T), and you're backing up a lot of the same
> data from multiple clients (think full backups, including OS, of hundreds of
> desktop PCs).  In those environments, it can save you a huge amount of
> space.

NetApp is pushing de-duping for live data, especially in virtualized
environments.  Their argument is that if you have four Windows servers in a
virtual environment, there are several gigs of data that can be de-duped in
the OS portion alone.  Apparently the de-duping works at the block level, so
the de-duped data doesn't even have to come from the same file types.  In an
ideal world, with four identical installs, three of every four OS blocks are
duplicates, so you could save nearly 75% on storage space the day they are
installed.  I don't know how these numbers change after the machines have
been running for a while and the data changes.  NetApp does not recommend
de-duping on volumes that store large amounts of video, graphics, or other
large binary files, mostly because of performance issues.  I don't imagine
this type of content would de-dupe well.
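
To make the 75% figure concrete, here is a rough Python sketch of the
block-level idea.  It is not NetApp's implementation; the 4 KB block size,
the SHA-256 fingerprint, and the names dedupe_stats, savings, and os_image
are all assumptions made up for illustration.  It just counts how many
fixed-size blocks are unique across several "disks":

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def dedupe_stats(images, block_size=BLOCK_SIZE):
    """Return (total_blocks, unique_blocks) across all byte strings in images."""
    seen = set()
    total = 0
    for data in images:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            # Identical blocks hash to the same fingerprint, whatever file
            # or machine they came from.
            seen.add(hashlib.sha256(block).digest())
            total += 1
    return total, len(seen)


def savings(total, unique):
    """Fraction of blocks that would not need to be stored a second time."""
    return 1 - unique / total if total else 0.0


# Hypothetical OS image: 256 distinct blocks, built deterministically.
os_image = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(256))

# Four servers freshly installed from the same image.
total, unique = dedupe_stats([os_image] * 4)
print(f"{total} blocks, {unique} unique, {savings(total, unique):.0%} saved")
```

With four identical images only one copy of each block has to be kept, which
is where the ideal-case 75% comes from.  Once each server starts changing its
own blocks, the unique count goes up and the savings shrink accordingly.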

