[SATLUG] checking for data file type and deleting (metadata search?)

Brad Knowles brad at shub-internet.org
Sat Jun 2 22:05:05 CDT 2007


On 6/2/07, Thomas King wrote:

>>  file `find . -type f` | awk '{if ($2=="DOS") print $1}' | tr -d : | xargs rm

	[ ... deletia ... ]

>  Anyone ambitious enough to break that command down for us? I'm sure there are
>  folks, like me, that haven't seen some of these commands. :)

It's not hard.  Let's first break this down by the stages of the 
pipeline.  The first stage is:

	file `find . -type f`

What this does is run the "find" command in the current directory 
(and below) and have it print out all files of type "f" (i.e., 
regular files, as opposed to device special files, directories, 
symbolic links, etc.), and then hand that list to the "file" command.  
It uses back-tick command substitution: the shell runs "find" first, 
and then substitutes the output of that command as arguments to the 
"file" command in the current shell.
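To see the substitution on its own, here is a small sketch.  It works 
in a scratch directory with made-up file names, and "echo" stands in 
for "file" so that the substituted arguments are simply printed back:

```shell
# Scratch directory so nothing real is touched (names are hypothetical).
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a.txt" "$demo/sub/b.txt"

# The shell runs find first, then pastes its output (split on whitespace)
# into the argument list of the outer command; echo stands in for file.
args=$(cd "$demo" && echo `find . -type f | LC_ALL=C sort`)
echo "$args"

rm -r "$demo"
```

The "sort" is only there to make the order predictable; the original 
command does not need it.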

Stage two is to take the output of the "file" command and use the 
programming language "awk" to look for any line where the second 
whitespace-separated field is exactly "DOS".  Since "file" prints 
each name followed by a colon and a description, that field is the 
first word of the description.  For those lines, it prints the first 
field, which is the name of the DOS file with the colon still attached.
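A quick way to try the awk stage without touching any files is to 
feed it a couple of lines shaped like "file" output.  The two sample 
lines below are illustrative, not captured from a real run, and the 
exact wording "file" uses can vary between versions:

```shell
# Two lines shaped like typical "file" output (illustrative only).
sample='./run.bat: DOS batch file, ASCII text
./notes.txt: ASCII text'

# awk splits each line on whitespace: $1 is "name:", $2 is the first
# word of the description.
matches=$(printf '%s\n' "$sample" | awk '{if ($2=="DOS") print $1}')
echo "$matches"
```

Only the first line passes the test, and what comes out still has the 
trailing colon on it.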

The third stage uses the "tr" command to delete colons from the 
output file name.
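As a sketch of what that does, with made-up names: "tr -d" removes 
every occurrence of the character, not just the trailing one, so a 
name that contains a colon of its own gets changed too.

```shell
# tr -d removes every occurrence of the listed characters.
cleaned=$(printf '%s\n' './run.bat:' | tr -d :)
echo "$cleaned"

# Caveat: a colon inside the name itself is removed as well.
mangled=$(printf '%s\n' './a:b.bat:' | tr -d :)
echo "$mangled"
```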

The fourth and final stage is to feed all those files to "xargs", 
which in this case will take as many filenames as fit onto a 
single command line and run the "rm" command on them.  If there are 
any files left to process, it will handle the next chunk the same 
way, and so on until there are no more DOS files to be deleted.
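The chunking is easier to see if you force small chunks.  In this 
sketch the file names are made up, "-n 2" caps each batch at two 
arguments, and "echo rm" prints the commands instead of running them:

```shell
# -n 2 caps each invocation at two arguments to make the batching
# visible; without it xargs packs in as many as the system allows.
batches=$(printf '%s\n' a.bat b.bat c.bat d.bat e.bat | xargs -n 2 echo rm)
echo "$batches"
```

Putting "echo" in front of "rm" like this is also a cheap dry run 
before letting the real pipeline delete anything.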


However, there are some problems here.  First, not all versions of 
"find" will automatically do a "-print" for you as part of the 
command.  Second, if any file name had a space or other special 
character in it, that would mess up the whole rest of the process; 
you'd need to do a "-print0" (paired with "xargs -0") to get around 
that.  Third, the back-ticks put every regular file that "find" turns 
up, DOS or not, onto a single "file" command line, so a large enough 
tree would exceed the limit on command-line length and you'd just 
have the whole thing fail to work.
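The whitespace problem is easy to reproduce.  This sketch assumes a 
scratch directory whose own path has no spaces in it, and uses a 
made-up file name:

```shell
demo=$(mktemp -d)
touch "$demo/my file.bat"

# Unquoted back-tick substitution splits on whitespace, so one name
# with a space in it arrives as two separate arguments.
set -- `find "$demo" -type f`
split_count=$#
echo "$split_count"

rm -r "$demo"
```

As for the length limit, on most systems "getconf ARG_MAX" will tell 
you how many bytes of arguments and environment a single command can 
be given.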

Let's try to re-work this and see if we can improve it (wrapped for 
readability):

	find . -type f -print0 | xargs -0 file | \
	awk '{if ($2=="DOS") print $1}' | tr -d : | xargs rm

Now we're taking all the normal files located in the current 
directory (and below), then doing a null-terminated print of those 
file names.  That output is fed to xargs with a particular option to 
tell it that all strings are null terminated, and we feed these files 
in chunks to the "file" command.  The rest is the same.
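Here is a minimal sketch of just the null-terminated hand-off, again 
in a scratch directory with made-up names.  "echo" stands in for 
"file", and "-n 1" is only there so each name lands on its own line 
and can be counted:

```shell
demo=$(mktemp -d)
touch "$demo/my file.bat" "$demo/plain.bat"

# find ends each name with a NUL byte and xargs -0 splits only on NUL,
# so the name with a space in it stays in one piece.
safe_count=$(find "$demo" -type f -print0 | xargs -0 -n 1 echo | wc -l)
echo "$safe_count"

rm -r "$demo"
```

Two files go in and two names come out, where the back-tick version 
would have produced three arguments.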

Since we take the output of "find" and send that output to xargs, we 
don't run the risk of generating so many files that we overrun the 
system's limit on command-line length.  We also use null-terminated 
strings so that "file" sees each name intact, even with special 
characters in it.  Note that the awk, tr, and final xargs stages 
still split on whitespace and strip every colon, so such names can 
still be mangled before they reach "rm".

This version is more robust, although it does use one more stage in 
the pipeline.
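If you want names kept intact all the way to the end, one option is to 
skip the text parsing and let "find" hand the names straight to a 
small shell loop.  This is only a sketch, not what was posted: the 
function name and the demo files are made up, it relies on "file -b" 
(brief mode, description only), and it prints the matches rather than 
deleting them.  Whether the sample batch file is actually reported as 
DOS depends on your version of "file".

```shell
# list_dos_files DIR: print regular files under DIR whose "file -b"
# description starts with "DOS".  (Hypothetical helper name.)
list_dos_files() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            case $(file -b "$f") in
                DOS*) printf "%s\n" "$f" ;;
            esac
        done' sh {} +
}

# Try it on a scratch directory with spaces in the names.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/plain notes.txt"
printf '@echo off\r\n' > "$demo/old run.bat"
result=$(list_dos_files "$demo")
echo "$result"
rm -r "$demo"
```

Once the list looks right, the printf inside the loop could be 
swapped for "rm".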

-- 
Brad Knowles <brad at shub-internet.org>, Consultant & Author
LinkedIn Profile: <http://tinyurl.com/y8kpxu>
Slides from Invited Talks: <http://tinyurl.com/tj6q4>

09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

