[SATLUG] Doublekiller bash clone for Linux

Brad Knowles brad at shub-internet.org
Tue Sep 25 01:44:26 CDT 2007


On 9/24/07, Jonathan Hull wrote:

>  Well, this could be done with some fancy use of find and md5sum I
>  would think. My head hurts too much to write it though, heh.

You'd definitely want to use MD-5 or SHA-1 to calculate the hash, as 
opposed to CRC-32.  There are just way, way too many chances of 
collisions with CRC-32.  Heck, there's a chance of collisions with 
MD-5 too, but they're rare enough that you could follow up any match 
with a binary comparison to see if the files really are identical.
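
As a rough illustration of the hash-then-compare idea, here is a minimal 
Python sketch (not the find/md5sum one-liner Jonathan had in mind).  The 
function names file_digest and find_duplicates are made up for this 
example, and SHA-1 is just one of the two hashes mentioned above.  Files 
are grouped by digest, and each group is then confirmed byte for byte 
with filecmp.cmp(..., shallow=False).

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files; symbolic links are skipped in this sketch.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)

    groups = []
    for paths in by_digest.values():
        # A matching hash only makes the files candidates; confirm byte for byte.
        remaining = paths
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            same = [first] + [p for p in rest
                              if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                groups.append(sorted(same))
            remaining = [p for p in rest if p not in same]
    return groups
```

Calling find_duplicates("/some/dir") returns a list of groups, each group 
being a list of paths with identical contents.  Hashing every file is the 
simple approach; a real tool would probably compare file sizes first and 
only hash files whose sizes match.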

You also want to check the inode number to make sure that you don't 
have two links to the same inode -- i.e., the same file, as known by 
two (or more) different names.  Then, once you get your list of 
candidates, you want to go back and do a real-time comparison to make 
sure that the files haven't changed since your first sweep.
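
The inode check and the later re-check might look something like the 
following Python sketch.  The helper names group_by_inode, snapshot and 
unchanged are hypothetical, and comparing size plus modification time is 
only an assumed, cheap stand-in for "has this file changed?"; a final 
byte-for-byte comparison just before acting would be the stricter test.

```python
import os
from collections import defaultdict


def group_by_inode(paths):
    """Map (device, inode) to the names that point at it.

    Hard links to the same file share one (st_dev, st_ino) pair, so they
    collapse into a single entry instead of looking like duplicates.
    """
    inodes = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        inodes[(st.st_dev, st.st_ino)].append(path)
    return inodes


def snapshot(path):
    """Record size and modification time as seen during the first sweep."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def unchanged(path, earlier):
    """Return True if the file still matches the earlier snapshot."""
    try:
        return snapshot(path) == earlier
    except FileNotFoundError:
        # The file vanished between the sweep and the re-check.
        return False
```

The idea would be to take one representative name per inode before 
hashing, store a snapshot() for each candidate, and call unchanged() 
again right before reporting or deleting anything.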

Doing this kind of thing right is a rather involved development process.


You could also use Google and type in "Linux duplicate finder", and 
hit the "I'm Feeling Lucky" button, which will take you to 
<http://www.pixelbeat.org/fslint/>, which even includes links to 
reviews of the software (e.g., 
<http://www.linuxjournal.com/node/1000198>).

Then there's "zsDuplicateHunter" (see 
<http://www.zizasoft.com/products/zsDuplicateHunter/index.shtml>), 
which is available for multiple platforms, including Linux.  Then 
there's a Java-based Duplicate File Finder program at 
<http://www.linux.org/apps/AppId_8359.html>.  And DuMP3 at 
<http://www.bestsoftware4download.com/software/t-free-dump3-for-linux-gtk-ppc-download-keyaacpo.html>. 
And the Puppy Linux page at 
<http://www.murga-linux.com/puppy/viewtopic.php?t=18502&sid=5672a4da5370d339432a6c8ae6eff9fc> 
mentions fdupes and rdfind, and also links back to fslint.

That's it for the first page of actual related results as returned by Google.

-- 
Brad Knowles <brad at shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>

