
find is one of those programs which continues to offer staggering features.

It's hard to put into words just how useful find is, but to say that it exercises every aspect of the file system might sum it up. What's more, I can't think of anything close to its capabilities on an MS operating system. For example, how would you search for files that match a given permission and belong to a given user, in directories more than 4 deep, then pipe the results into a compression program, every hour?
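
To sketch what that might look like in practice (the path, user name and archive location here are purely illustrative), a single pipeline covers everything except the scheduling:

find /srv/data -mindepth 5 -type f -user alice -perm -g=w -print0 | tar --null -czf /var/tmp/alice-group-writable.tar.gz -T -

Dropping that line into an hourly cron job or a systemd timer takes care of the "every hour" part.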

A quick note in case these commands don't behave as described for you: on some systems the GNU versions of find/tar/xargs are installed under different names, such as gnufind/gnutar/gnuxargs.

Some of the features that I use on a regular basis are:

conditions

condition definition
-depth process a directory's contents before the directory itself. This is useful if you want to delete all entries with rm, for example, since rm cannot remove a directory while it still contains files unless it is told to work recursively.
-maxdepth n do not descend into directories deeper than n.
-newerat timestamp match entries whose access time (atime) is later than timestamp. This is one of the -newerXY tests; the letters can also select Birth, inode change (c) and modification times. The winner for me is that the timestamp can be given as a unix epoch, simply add the @ prefix for find to understand this.
-user/-group name match only entries owned by the given user or group name.
-iname name case insensitive match on a name, '*.txt' for example.
-iregex '.*regex.*' case insensitive regular expression match against the whole path.
-perm mode match on permissions; this is quite useful, mainly as -perm -g=w to match files with group write. Negation is useful too, prefixing it with \! as in \! -perm -g=w, and a numeric mode can also be given. (Examples follow this list.)
-L -type l because -L makes find follow symlinks, anything still reported as type l is broken, so this reports only broken symlinks.
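
To make a couple of those concrete (the paths are only illustrative): the first command lists files under /var/log accessed since the given epoch, the second lists files under /srv/www that are not group writable.

find /var/log -type f -newerat @1600000000
find /srv/www -type f \! -perm -g=w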

It's worth noting that conditions are easy to get wrong in some cases. I'd like to draw your attention to something like the following command:

$ find /etc -maxdepth 1 -type f \! -iregex '.*skel.*' -o \! -iregex '.*shadow.*'

If we run this we would normally expect not to see the skel directory written to standard out, or any other directory for that matter, as we clearly specified -type f.

However, as we can clearly see:

$ find /etc -maxdepth 1 -type f \! -iregex '.*skel.*' -o \! -iregex '.*shadow.*' | grep skel
/etc/skel

So, why is that? Well, it's due to the way that the boolean operators are handled. We're asking:

  1. Find within /etc
    1. maximum depth of 1
    2. type f
    3. not matching .*skel.*
    4. or
    5. not matching .*shadow.*

Without knowing how that is evaluated you'd assume it's all one long condition, but that's not how find works: the implied -a between adjacent tests binds more tightly than -o, so we have to use parentheses to group the two regex tests. It's also worth noting that [[bash]] has its own handling of parentheses, so we need to escape them, like so:

\( and \)

So, what we end up with is the following:

$ find /etc -maxdepth 1 -type f \( \! -iregex '.*skel.*' -o \! -iregex '.*shadow.*' \) | grep skel
$

Which does exactly what we're expecting.

actions

action definition
-printf format works just like printf in C or Perl; it takes a format string which can include details such as the various parts of a timestamp (%C+, for example, prints the inode change time as a pre-formatted date and time string). Often I give the format '%s\t%u\t%g\t%p\n' to show who is using what: size, user, group and path. (An example follows this list.)
-exec command any command can be given for find to execute on each matching file. However, it is often more efficient to hand the file list to xargs instead: some commands (grep has historically been a culprit) cannot cope with a very large argument list, and xargs can cap the number of arguments it passes per invocation with -n number.
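
As an illustration of that -printf format (the path is only an example), this shows size, user, group and path for the ten largest files under /var/log:

find /var/log -type f -printf '%s\t%u\t%g\t%p\n' | sort -n | tail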

One of the most useful things you can do with find involves xargs: you can pipe a list of file names to xargs, which spawns far fewer instances of a program than -exec ... \; would, passing as many of the names as possible to each invocation as parameters.

find . -print0 | xargs --null echo

for example. -print0 uses a null (ASCII 0, \0) character to separate filenames. The null character is not permitted in file names (at the very least, applications treat it as the string terminator), so the find/xargs pairing is resistant to maliciously crafted file names containing spaces, newlines or shell metacharacters.

Another powerful combination is find with [[tar]]; just like with xargs, you can feed tar null-separated file names on stdin.

find . -print0 | tar --null -cvf files.tar -T -

tar is incredibly useful for grafting files with permissions from directory to directory or host to host.
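
A sketch of that grafting, first locally and then over ssh; /src, /dest and remotehost are placeholders, and -p preserves permissions on extraction:

tar -C /src -cf - . | tar -C /dest -xpf -
tar -C /src -cf - . | ssh remotehost 'tar -C /dest -xpf -'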

Of course, sweeping file system operations like these generate a lot of IO, so be sure to [[ionice]] them.

ionice -c 3 find . -mindepth 1 -type f -print0 | ionice -c 3 xargs --null rm

empty

There's a rather useful condition that can be given to find named -empty. It tests true when a file or directory is empty. Supposing you have to clean out empty directories and free some inodes after a recent purge of content, you may find yourself doing something along the lines of:

$ find . -depth -type d -empty -exec rmdir {} \;

This instructs find to descend to the deepest leaves first, test each directory against -empty, and run rmdir on each match before ascending back up the tree.

It's really up to you whether building a list of directories and handing it to xargs to rmdir is more or less efficient than executing one rmdir per directory, as shown below.

Personally I feel that creating the list and piping it to xargs is more efficient, but the number of passes needed may not be known beforehand, since removing the deepest directories can leave their parents empty for the next round.
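
For reference, a single pass of that list-and-xargs approach looks like this; -r is the GNU xargs option that skips running rmdir when the list turns out to be empty:

find . -depth -mindepth 1 -type d -empty -print0 | xargs -0 -r rmdir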

Using an intermediate file is probably the best solution:

while true ; do
    # collect the currently-empty directories, deepest first, into a list
    find . -depth -mindepth 1 -type d -empty -print0 > ../file
    if [[ -s ../file ]] ; then
        # something was found: show the list size and remove the directories
        ls -l ../file
        xargs -0 rmdir < ../file
    else
        # nothing left to remove, so we're done
        break
    fi
done