
Project 17. Get Clever Finding Files

"What files have I modified today, and do I have any larger than 20 MB?"

This project shows you how to search the file system for specific files based on file type, size, timestamp, and permissions. It uses find to find files and perform simple processing on them. Project 15 uses locate and find to search for files by name. Project 18 shows how to process the files that were found. Project 20 gives some handy find tips.

Search Criteria

The find command is very powerful. It recursively searches an entire directory structure for files that match a given set of criteria. The criteria are based on:

  • File name and pathname with pattern matching (see Project 15)

  • File type and size

  • Timestamp for last access and modification times

  • Ownership and permissions, including file owner, associated group, permission bits, and flags

  • Complex conditions that combine criteria with AND, OR, NOT, and parentheses

The find command generates a list of files that match the specified criteria. Sometimes this is all you want, but often you'll need to process either the list itself or each file named in the list. There are several ways in which you can do this:

  • Pipe to another command to process the file list

  • Use find itself to process the files

  • Use -exec to invoke another command to process each file (see Project 18)

  • Use pipelining and xargs to invoke another command to process each file (see Project 18)

Find Criteria

When working with Unix files and directories, it's often helpful (or necessary) to identify files that meet one or more criteria, either as a means of simply locating desired information or content, or as the first step in a process that involves comparing, sorting, or performing other operations based on those criteria. The key to many criteria-based searches is in the option settings for the find command.

Find by Filename and Pathname

Refer to Project 15, which shows you how to search the file system for specific files based on filename and pathname.

Tip

Using find with option -d causes a depth-first traversal, in which the contents of each directory are listed before the directory itself. This can be useful sometimes.


Find by Type

The find command normally considers all files, but if you want to limit the search to a specific type of file, use the primary -type. The type can be a regular file, a directory, a symbolic link, or a special type such as a socket.

To search for directories only (-type d), type

$ find . -type d -iname test
./test
./Trial/One/Test
./Trial2/version1/test


To search for files only (-type f), type

$ find . -type f -iname test
./test/test
./Trial2/test
./Trial2/version1/test/a/test
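
The type tests can be tried safely in a scratch directory. What follows is a minimal sketch, not part of the original project: the names test, notes.txt, and shortcut are made up for illustration, and -type l (symbolic link) is the one type value not demonstrated above.

```shell
# Scratch directory holding one directory, one regular file, and one
# symbolic link. All names here are hypothetical.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir test
touch notes.txt
ln -s notes.txt shortcut

dirs=$(find . -type d -name test)   # directories named test
files=$(find . -type f)             # regular files only
links=$(find . -type l)             # symbolic links only

echo "directories: $dirs"
echo "files:       $files"
echo "links:       $links"

# Tidy up the scratch directory.
cd / && rm -rf "$demo"
```

The symbolic link is reported only by -type l, because find examines the link itself rather than the file it points to unless told otherwise.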


Find by Size

The find command can search for files whose size is equal to, greater than, or less than a given size. Precede the given size with plus to search for files bigger than that size and minus to search for files smaller than that size. The size is specified as the number of 512-byte blocks (that is, it's specified in units of 512 bytes). To give some examples:

  • -size 2 means 1K bytes (2 times 512); 1K bytes equals 1024 bytes.

  • -size 2048 means 1M bytes; 1M bytes equals 1024K bytes.

  • -size +2048 means greater than 1M bytes.

  • -size -2048 means less than 1M bytes.

Small Files

The size of a file is always rounded up to the nearest 512-byte block. Therefore, a file of 1 byte will be treated as though it were one 512-byte block. In particular, the file will not be found by specifying -size 0 (as expected) or -size -1 (perhaps not as expected).


To find all pictures bigger than 10M bytes, we would type

$ find ~/Pictures -size +20480
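
The block arithmetic can be left to the shell. This is a sketch under the assumption of a POSIX shell with $(( )) arithmetic; the function name bigger_than_mb is hypothetical, chosen only for this illustration. It relies on the fact that 1M bytes is 2048 blocks of 512 bytes.

```shell
# Print regular files under directory $1 that are bigger than $2 megabytes.
# -size counts 512-byte blocks, and 1 MB is 1024 * 1024 / 512 = 2048 blocks.
bigger_than_mb() {
    find "$1" -type f -size +"$(( $2 * 2048 ))"
}

# The 20 MB from the opening question, expressed in blocks.
blocks=$(( 20 * 2048 ))
echo "20 MB is $blocks blocks"
```

With the function defined, bigger_than_mb ~ 20 would list the files in your home directory that are larger than 20 MB.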


To find all empty files, we could use either of the following.

$ find . -size 0
$ find . -size -1


Note

Primary -print tells find to print (display) the name of each file it finds. Modern versions of find do this anyway, so it's not usually required.


For teensy-weensy files, for which blocks are too coarse a measure, specify the size in bytes (characters) by appending c to the size. Here, we find all files of exactly 19 bytes.

$ find . -size 19c -ls
996343  8 -rw-r--r--  1 saruman saruman ... ./Im19Bytes


The primary -ls tells find to list the file's details instead of just its name.
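
The rounding rule is easier to see with files of known size. The following sketch builds three such files in a scratch directory; the filenames are invented for the illustration, and -type f is added so that the directory itself is never counted.

```shell
# Scratch directory with an empty file, a 1-byte file, and a 19-byte file.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
: > empty
printf 'x' > one-byte
printf '%019d' 0 > nineteen-bytes   # nineteen zero characters

zero_blocks=$(find . -type f -size 0)        # empty files only
under_one=$(find . -type f -size -1)         # again, empty files only
one_block=$(find . -type f -size 1 | sort)   # 1 to 512 bytes rounds up to one block
bytes_19=$(find . -type f -size 19c)         # exactly 19 bytes

echo "-size 0:   $zero_blocks"
echo "-size -1:  $under_one"
echo "-size 1:   $one_block"
echo "-size 19c: $bytes_19"

# Tidy up the scratch directory.
cd / && rm -rf "$demo"
```

Both small files match -size 1 because each is rounded up to a single block, whereas only the byte count given with c tells them apart.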

Find by Timestamp

The find command can search for files based on their time stamp, either

  • Time of last access

  • Time of last modification

  • Time of creation, which seems always to be the time of last modification in Mac OS X's HFS+ file system

The find command works by units of time. A unit can be either a minute or a day (24 hours). Time is calculated as the difference between the time stamp of a file and the time find itself was started. The difference is rounded up to the next unit when testing for equality (last modified one minute ago) but is not rounded otherwise (last modified less than one minute ago). Therefore, we can specify criteria such as "last accessed one minute ago," "last modified less than two days ago," or "last modified more than seven days ago."

The primaries are -amin, -mmin, -atime, and -mtime. a means access time, and m means modification time; min means units of minutes, and time means units of days. As with size, units of time that are preceded by plus mean a number of units greater than the amount specified, and units of time that are preceded by a minus mean fewer units than the amount specified.

As an example, let's create two files and check that they were both modified/created less than one minute ago. The primary -mmin -1 is formed by m, meaning modified, and min, meaning units of minutes. The value -1 means a time less than one unit before find was invoked.

$ touch f-mod f-access
$ find . -mmin -1
./f-access
./f-mod


Repeat the find command until it reports no files; then proceed. (If you're a very slow typist, change the time period to two minutes.)

Now let's use the command touch -a to access (but not modify) file f-access and thereby change its access time stamp. Then we'll check to see which files were accessed, and which files were modified, less than one minute ago.

$ touch -a f-access
$ find . -mmin -1
$ find . -amin -1
./f-access


Next, we use the command touch -m to change the modification time of the file f-mod (we could have achieved the same thing by editing it, but touching it is easier) and then check which files were accessed, and which files were modified, less than one minute ago. Depending on how long we take to type the commands, the file f-access may or may not be reported as being accessed less than a minute ago.

$ touch -m f-mod
$ find . -mmin -1
./f-mod
$ find . -amin -1
./f-access


Note

Read the man page for the touch command.


Finally, wait a few minutes and try again.

$ find . -mmin -1
$ find . -amin -1


Here are a couple more examples.

Learn More

Projects 7 and 8 cover users, groups, and permissions.


Find all files in your home directory modified within the last 24 hours, which is useful when you want to perform a daily backup.

$ find ~ -mtime 1


Find files you've forgotten about.

$ find ~ -atime +1000
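
To try the time tests without waiting, a file can be given an old time stamp on purpose. This sketch assumes that touch accepts the -t option with a time in the form CCYYMMDDhhmm; the filenames fresh and stale are hypothetical.

```shell
# Scratch directory with one file modified now and one backdated file.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch fresh                     # modification time is now
touch -t 200001010000 stale     # modification time is 1 January 2000

recent=$(find . -type f -mmin -5)   # modified less than five minutes ago
today=$(find . -type f -mtime -1)   # modified less than one day ago
old=$(find . -type f -mtime +7)     # modified more than seven days ago

echo "less than 5 minutes: $recent"
echo "less than 1 day:     $today"
echo "more than 7 days:    $old"

# Tidy up the scratch directory.
cd / && rm -rf "$demo"
```

The -mtime -1 test is one way to answer the first half of the question that opens this project, "What files have I modified today?"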


Find by Owner, Associated Group, and Permissions

The find command accepts search criteria including file owner (or user owner, in Unix terminology), associated group, and permissions. Here are a few examples of how they work.

To find all files owned by the user saruman in the directory /Users/saruman, we use the primary -user as our search criterion.

$ find /Users/saruman -user saruman


(This is user saruman's home directory, so we'll omit the long list of matching files.)

To find all files in the same directory that aren't owned by the user saruman, we use the primary -not to invert the sense of any criteria that follow (in this case, the primary -user again). Not surprisingly, the results list is much shorter this time.

$ find /Users/saruman -not -user saruman
/Users/saruman/Development/c32-1


Let's find all pictures that are not associated with the group saruman. A group criterion is introduced by the primary -group.

$ find /Users/saruman/Pictures -not -group saruman -ls
871501 6368 -rw-r--r-- 1 saruman admin 320911 Mar 12 09:41 /Users/saruman/Pictures/people/Domi/sledges2photo1.psd
823934 320 -rw-r--r-- 1 saruman admin 123297 Feb 10 19:06 /Users/saruman/Pictures/web-site/jan/home


As you may have noticed, the preceding command uses yet another primary of the find command called -ls. Not to be confused with Unix command ls, it instructs find to display a file's details instead of just its name.

To specify permissions as search criteria, we use the primary -perm and express permissions in the octal or symbolic formats expected by command chmod. (If you are unfamiliar with these concepts, refer to Project 8.)

The following examples use find in a directory containing just one file, xxx. Each example uses permissions as search criteria to see whether file xxx matches. Follow them carefully to understand how the criteria are matched.

First, we'll use ls to display the permissions for file xxx. You'll see that the permissions grant write access to owner saruman and read access to group saruman and everyone else (others):

$ ls -l xxx
--w-r--r-- 1 saruman saruman 0 20 May 23:42 xxx


Our first example seeks files with permissions set exactly as stated in our search criteria. Files will match only if their permission settings match the -perm criteria and all unspecified permissions are unset. The permissions in this example match those of xxx exactly.

$ find . -perm u+w,g+r,o+r
./xxx


If we modify the search criteria, additionally specifying write access to others, find no longer locates file xxx. Its permissions no longer match the search criteria exactly.

$ find . -perm u+w,g+r,o+rw
$


We can find files that have one or more of the stated permissions set by preceding the permissions with a plus sign (+). File xxx now matches again.

$ find . -perm +u+w,g+r,o+r
./xxx


We can find files that have all of the stated permissions set (but may also have others set) by preceding the permissions with a minus sign (-).

$ find . -perm -u+w,g+r,o+r
./xxx


Changing the permissions on file xxx to remove "others read" will cause find to report nothing. The file no longer has all of the stated permissions.

$ chmod 240 xxx
$ ls -l xxx
--w-r----- 1 saruman saruman 0 20 May 23:42 xxx
$ find . -perm -u+w,g+r,o+r
$


This example will find file xxx because a plus sign means one or more permissions, not all permissions.

$ find . -perm +u+w,g+r,o+r
./xxx


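A symbolic mode may also take a permission away: o-r, for example, leaves "other read" out of the mode being tested. In an exact match, the file is found only if "other read" is not set. With a leading minus sign, find checks only that the stated permissions are set, so o-r adds no further requirement; file xxx matches because it grants user write and group read.

$ find . -perm u+w,g+r,o-r
./xxx
$ find . -perm -u+w,g+r,o-r
./xxx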
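
The same tests can be written with octal modes, which some readers find easier to check by eye. This is a sketch only: it recreates a file named xxx in a scratch directory and uses 244 for u+w,g+r,o+r, 246 for u+w,g+r,o+rw, and 044 for g+r,o+r.

```shell
# Scratch directory with a single file whose permissions we control.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch xxx
chmod 244 xxx                             # --w-r--r--

exact=$(find . -type f -perm 244)         # exact match: found
exact_miss=$(find . -type f -perm 246)    # xxx lacks others-write: not found
all_set=$(find . -type f -perm -044)      # group-read and others-read both set: found

chmod 240 xxx                             # --w-r-----
all_miss=$(find . -type f -perm -044)     # others-read now missing: not found

echo "exact 244:      $exact"
echo "exact 246:      $exact_miss"
echo "all of 044:     $all_set"
echo "after chmod 240: $all_miss"

# Tidy up the scratch directory.
cd / && rm -rf "$demo"
```

Only the exact form and the minus-sign form are used here; the plus-sign form is left out of the sketch because not every version of find accepts it.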


Use Complex Conditions

The find command has a very powerful syntax that lets you combine primaries into expressions by using AND (-and), OR (-or), and NOT (! or -not) operators, thereby creating complex search criteria. You enclose expressions in parentheses, which must be escaped from the shell. Here are some examples:

Find all .html and .ws files in ~/Sites:

$ find ~/Sites -name "*.ws" -or -name "*.html"


Find files modified less than one day ago AND bigger than 5 MB:

$ find . -mtime -1 -and -size +10240


Find files modified more than one day ago AND smaller than 5 MB:

$ find . -mtime +1 -and -size -10240


Find files modified less than one day ago AND bigger than 5 MB, OR modified more than one day ago AND smaller than 5 MB:

(The following command must be on one line, with a space between the first expression in parentheses and the -or operator.)

$ find . \( -mtime -1 -and -size +10240 \) -or \( -mtime +1 -and -size -10240 \)


Note the use of parentheses (escaped from the shell by backslash symbols) to ensure that the AND and OR expressions are evaluated in the correct order: The two ANDs will be evaluated; then their results will be ORed.

Learn More

Project 23 tells you more about the grep command.


When primaries are grouped in an expression, find assumes by default that AND is the intended operator, so the expressions in this case can be shortened to

$ find . \( -mtime -1 -size +10240 \) -or \( -mtime +1 -size -10240 \)


Also, find evaluates AND operators before OR operators, allowing us to omit the parentheses too.

$ find . -mtime -1 -size +10240 -or -mtime +1 -size -10240
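
Leaving out the parentheses is safe only when the default order is the one you want. The sketch below shows a case in which it is not; the filenames, and the directory named archive.html, are invented to make the difference visible.

```shell
# Scratch directory: three files, plus a directory whose name ends in .html.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch index.html style.ws notes.txt
mkdir archive.html

# Regular files only. The parentheses make -type f apply to both names.
grouped=$(find . -type f \( -name "*.ws" -or -name "*.html" \) | sort)

# Without parentheses, AND is evaluated first, so the test becomes
# (-type f AND *.ws) OR *.html, and the directory slips through.
ungrouped=$(find . -type f -name "*.ws" -or -name "*.html" | sort)

echo "grouped:"
echo "$grouped"
echo "ungrouped:"
echo "$ungrouped"

# Tidy up the scratch directory.
cd / && rm -rf "$demo"
```

The grouped search reports the two files, whereas the ungrouped search also reports the directory archive.html.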


Process Each File

You may process the list of files returned by find in one of three ways:

  • Process the list itself, not the files or their contents.

  • Process each file named in the list with one of find's built-in file-processing primaries.

  • Process each file named in the list with another command.

It's important to be aware of these three different methods and how to realize each. The next three sections illustrate each technique using a simple example.

Process the List

We may want to process the list of filenames by sorting it alphabetically or filtering it with grep. To display each filename that includes the text hello, we can use the following:

$ find ~ -iname "*.txt" | grep -i "hello"
/Users/saruman/Documents/Letters/hello.txt
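
Sorting and counting work the same way, because in each case the pipeline sees only the list of names. Here is a minimal sketch in a scratch directory; the filenames are hypothetical, and grep -c is used to count matching lines instead of printing them.

```shell
# Scratch directory with three text files.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch hello.txt say-hello.txt goodbye.txt

sorted=$(find . -iname "*.txt" | sort)                 # the list, alphabetically
count=$(find . -iname "*.txt" | grep -i -c "hello")    # how many names contain hello

echo "$sorted"
echo "names containing hello: $count"

# Tidy up the scratch directory.
cd / && rm -rf "$demo"
```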


Process the Files with find

The find command itself has a couple of primaries for processing files directly. The first is -ls to display file details, used in some of the examples above. The second is -delete to delete every file in a list. Naturally, caution is recommended with this one. Let's delete all files in our home directory that match the pattern *letter.txt.

Dry-run the command first.

$ find ~ -iname "*letter.txt"
/Users/saruman/Documents/Letters/my_letter.txt


Learn More

Project 18 shows how to process the files returned by find by using the -exec primary and the xargs command.


Now delete the files, and check what's left.

$ find ~ -iname "*letter.txt" -delete
$ find ~ -iname "*letter.txt"
$
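
If you would rather rehearse the sequence somewhere harmless, the following sketch repeats it in a scratch directory. The filenames are made up, and -type f is added as an extra precaution so that only regular files are ever deleted.

```shell
# Scratch directory with two letters and one file that should survive.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch my_letter.txt old-letter.txt keep-me.txt

# Dry run: list what would be deleted.
dry_run=$(find . -type f -iname "*letter.txt" | sort)
echo "would delete:"
echo "$dry_run"

# Same criteria, now with -delete, followed by a check of what is left.
find . -type f -iname "*letter.txt" -delete
left=$(find . -type f)
echo "left behind: $left"

# Tidy up the scratch directory.
cd / && rm -rf "$demo"
```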


Process the Files with -exec and xargs

If we wish to search the contents of each file for the text hello, we must hand each filename off to an external command such as grep. In this example, grep is given a list of files to search. Compare this with the first example, in which the text of the list itself was searched by grep, not the contents of each file in the list.

$ find ~ -iname "*.txt" | xargs grep "hello"
/Users/saruman/Documents/Letters/letter-to-jan.txt:Hello Jan,
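
One caution applies to this pipeline: xargs splits its input at spaces as well as at line ends, so a filename containing a space is handed to grep in pieces. The sketch below shows one common remedy, which is not covered in this project: the -print0 primary of find paired with the -0 option of xargs, so that names are separated by a null character instead. The filenames are hypothetical.

```shell
# Scratch directory with a file whose name contains spaces.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
printf 'Hello Jan,\n' > "letter to jan.txt"
printf 'Dear Sir,\n' > formal.txt

# -print0 ends each name with a null character, and xargs -0 splits on it,
# so the spaces in the name survive. grep -l prints only the matching names.
matches=$(find . -iname "*.txt" -print0 | xargs -0 grep -l "Hello")
echo "$matches"

# Tidy up the scratch directory.
cd / && rm -rf "$demo"
```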





Mac OS X UNIX 101 Byte-Sized Projects
ISBN: 0321374118
Year: 2003
Pages: 153
Authors: Adrian Mayo
