[SATLUG] checking for data file type and deleting (metadata search?)
Brad Knowles
brad at shub-internet.org
Sat Jun 2 22:05:05 CDT 2007
On 6/2/07, Thomas King wrote:
>> file `find . -type f` | awk '{if ($2=="DOS") print $1}' | tr -d : | xargs rm
[ ... deletia ... ]
> Anyone ambitious enough to break that command down for us? I'm sure there are
> folks, like me, that haven't seen some of these commands. :)
It's not hard. Let's first break this down by the stages of the
pipeline. The first stage is:
file `find . -type f`
What this does is run the "find" command in the current directory
(and below) and have it print out all files of type "f" (i.e., normal
files as opposed to device special files, directories, symbolic
links, etc...), and then hand those names to the "file" command. It
uses back-tick command substitution: the shell runs "find" in a
subshell and substitutes its output as the arguments to the "file"
command in the current shell.
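To make the substitution step concrete, here is a small sketch run in
a scratch directory. The directory and file names are made up for the
example, and I'm assuming a "mktemp" that supports "-d".

```shell
# Work in a scratch directory so the example touches nothing real.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt"
mkdir "$demo/sub"

# Back-ticks and $( ) are two spellings of the same substitution:
# the inner command runs first and its output replaces the expression.
old_style=`find "$demo" -type f | sort`
new_style=$(find "$demo" -type f | sort)

# Only the two regular files are listed; the directory "sub" is not.
echo "$new_style"

rm -r "$demo"
```

Both variables end up holding the same two path names, which is what
"file" would then receive as its arguments.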
Stage two is to take the output of the "file" command and use the
programming language "awk" to look for any line where the second
whitespace-separated field is exactly "DOS". For those lines, it
prints out the first field, which would presumably be the name (with
a trailing colon) of a file that "file" identified as a DOS file.
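Here is a sketch of what the awk stage sees. The two sample lines are
only shaped like "file" output; the file names are hypothetical and
the exact wording "file" prints varies between versions.

```shell
# Two lines shaped like "file" output (hypothetical file names).
sample='./RUN.BAT: DOS batch file, ASCII text
./notes.txt: ASCII text'

# awk splits each line on whitespace: $1 is "name:", $2 is the next word.
matches=$(printf '%s\n' "$sample" | awk '{if ($2=="DOS") print $1}')

echo "$matches"    # prints ./RUN.BAT:
```

Note that the colon is still attached to the name at this point.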
The third stage uses the "tr" command to delete colons from the
output, which strips the colon that "file" appends to each file name.
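A quick sketch of that stage on its own, using made-up names. It also
shows a side effect worth knowing about: "tr -d :" removes every
colon, not just the trailing one.

```shell
# Strip the trailing colon left over from the "file" output.
name=$(printf '%s\n' './RUN.BAT:' | tr -d :)
echo "$name"       # prints ./RUN.BAT

# A colon inside the file name itself is deleted as well.
mangled=$(printf '%s\n' './a:b.bat:' | tr -d :)
echo "$mangled"    # prints ./ab.bat
```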
The fourth and final stage is to feed all those file names to
"xargs", which in this case will take as many names as can fit onto a
single command line and run the "rm" command on them. If there are
any names left to process, it will handle the next chunk, and so on
until there are no more DOS files to be deleted.
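The chunking is easier to see with a small, harmless sketch. The file
names are made up, "echo rm" prints each command instead of running
it, and "-n 2" artificially limits each chunk to two names (normally
xargs packs in as many as the system allows).

```shell
# "echo rm" prints each command instead of deleting anything.
# -n 2 forces tiny chunks of two names so the batching is visible.
batches=$(printf '%s\n' a.bat b.bat c.bat d.bat e.bat | xargs -n 2 echo rm)

echo "$batches"
# prints:
#   rm a.bat b.bat
#   rm c.bat d.bat
#   rm e.bat
```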
However, there are some problems here. First, not all versions of
"find" will automatically do a "-print" for you as part of the
command. Second, if any file name had a space or other special
character in it, that would mess up the whole rest of the process.
You'd need to do a "-print0" (paired with "xargs -0") to get around
that. Third, because the back-ticks put every file name that "find"
prints onto one command line, this runs the risk of finding so many
files that you would exceed the limit on the length of a single
command line, and you'd just have the whole thing fail to work.
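The problem with spaces can be reproduced with a short sketch. It
assumes the default IFS and a "mktemp" that supports "-d"; the helper
"count_args" and the file name are made up for the illustration.

```shell
# Hypothetical helper: report how many arguments it was given.
count_args() { echo $#; }

demo=$(mktemp -d)
touch "$demo/my file.bat"     # one file, with a space in its name

# The unquoted back-tick substitution is split on whitespace, so the
# single name "./my file.bat" arrives as two separate arguments.
split_count=$(cd "$demo" && count_args `find . -type f`)

echo "$split_count"           # prints 2, although there is only one file

rm -r "$demo"
```

Neither "./my" nor "file.bat" exists, so whatever command receives
them would be working on the wrong names.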
Let's try to re-work this and see if we can improve it (wrapped for
readability):
find . -type f -print0 | xargs -0 file | \
awk '{if ($2=="DOS") print $1}' | tr -d : | xargs rm
Now we're taking all the normal files located in the current
directory (and below), then doing a null-terminated print of those
file names. That output is fed to xargs with a particular option to
tell it that all strings are null terminated, and we feed these files
in chunks to the "file" command. The rest is the same.
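Since we take the output of "find" and send that output to xargs, we
don't run the risk of generating so many file names that we overrun
the system's limit on command-line length. We also use
null-terminated strings so that the "find" and "file" stages don't
choke on files that have special characters in their names. Note
that the later "awk" and "xargs rm" stages still split on whitespace,
so names containing spaces are still not handled correctly there.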
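As a rough check of the null-terminated hand-off, the sketch below
counts how many arguments xargs passes along with and without "-0".
The file names are made up, and it assumes a "find" with "-print0"
and an "xargs" with "-0" (both are extensions, though widely
available).

```shell
demo=$(mktemp -d)
touch "$demo/my file.bat" "$demo/plain.txt"   # two files, one with a space

# Newline-separated names: xargs splits "./my file.bat" at the space,
# so the child shell sees three arguments.
unsafe_count=$(cd "$demo" && find . -type f | xargs sh -c 'echo $#' sh)

# Null-terminated names: each file name arrives as exactly one argument.
safe_count=$(cd "$demo" && find . -type f -print0 | xargs -0 sh -c 'echo $#' sh)

echo "$unsafe_count $safe_count"              # prints 3 2

rm -r "$demo"
```

The "sh -c 'echo $#' sh" part just stands in for "file" so the
argument count is visible.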
This version is simpler and more robust, although it does use more
stages to the pipeline.
--
Brad Knowles <brad at shub-internet.org>, Consultant & Author
LinkedIn Profile: <http://tinyurl.com/y8kpxu>
Slides from Invited Talks: <http://tinyurl.com/tj6q4>
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0