[SATLUG] General question -- determining file integrity

Alan Lesmerises alesmerises at satx.rr.com
Mon Sep 8 23:01:04 CDT 2014

I know this isn't necessarily a Linux question, but I think the topic 
(and hopefully the answers, if there are any) could be of interest to 
some members of the group.

I recently had to migrate all my files from one computer to another, and 
while I was at it, I decided to undertake the long-postponed task of 
doing a major digital clean-up.  This consisted of capturing or deleting 
files from my hard drive, a boatload of old floppies, CDs, etc.  I 
literally had thousands of files left over from older systems going back 
more than 20 years.  I also did a thorough de-duplication (not in the 
backup sense, but in the sense of finding multiple copies of the same 
file stored in different locations).

However, the last task presented me with a problem that I'm not sure how 
to tackle.  Before deleting duplicate files, I would run a utility to 
compare the files in two parallel folders.  There were several instances 
where all the files SHOULD have matched.  But when I performed the file 
comparison, there were instances where some 10% of the files showed 
differences (such as in my substantial library of digital music files).
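
A comparison of this kind could be sketched in Python roughly as 
follows.  This is not the utility mentioned above (which isn't named); 
it is only a minimal illustration, and the names sha256_of and 
compare_folders are made up for the example.  It assumes the two 
folders share the same relative layout and only reports files that 
exist in both.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_folders(folder_a, folder_b):
    """Return relative paths found in both folders whose contents differ."""
    a, b = Path(folder_a), Path(folder_b)
    differing = []
    for file_a in sorted(p for p in a.rglob("*") if p.is_file()):
        rel = file_a.relative_to(a)
        file_b = b / rel
        # Files missing from folder_b are skipped, not reported.
        if file_b.is_file() and sha256_of(file_a) != sha256_of(file_b):
            differing.append(rel.as_posix())
    return differing
```

Note that a difference reported this way only says the two copies are 
not byte-identical; it does not say which copy is the damaged one.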

Now I'm not fully versed in all the capabilities of some of the Linux 
file systems (whether they compute any checksums on files, etc.), and 
I'm not looking to get into a discussion about that now.

What I'm somewhat stumped by is how I can determine which file (of a 
pair that I am comparing) is the corrupted one, and which one is ok.  Or 
are they both corrupted?  And is there a way to compare 3 or more copies 
(the assumption being that if n-1 files all match, then the nth file is 
the bad one)?  And since I have a rather large number of files to deal 
with, any kind of manual (non-automated) method would not really be 
viable.
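
To make the "n-1 files match" idea concrete, here is a rough Python 
sketch of a majority vote over several copies of the same file.  The 
function name vote is hypothetical, and the sketch reads each file 
fully into memory to hash it, which is an assumption that suits 
music-sized files but not very large ones.  It also assumes that the 
copies in the majority are the good ones, which is plausible but not 
guaranteed.

```python
import hashlib
from collections import Counter
from pathlib import Path


def vote(copies):
    """Majority-vote over supposedly identical copies of one file.

    copies: list of paths.  Returns (winning_digest, suspects), where
    suspects are the paths that disagree with the majority.  If no
    digest is shared by more than half of the copies, returns
    (None, all copies), since no copy can be trusted over another.
    """
    digests = {p: hashlib.sha256(Path(p).read_bytes()).hexdigest()
               for p in copies}
    winner, count = Counter(digests.values()).most_common(1)[0]
    if count <= len(copies) / 2:
        return None, list(copies)
    return winner, [p for p in copies if digests[p] != winner]
```

With only two copies that differ, there is no majority, so the function 
returns None and flags both: a checksum alone cannot say which of a 
pair is the bad one without a third copy or a known-good reference.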

Any thoughts?

Al Lesmerises
