Project 11. Globbing with [^*?]

"How do I list all JPEG files?"

This project illustrates globbing and introduces the pattern-matching operators. It uses filename patterns written to match specific sets of filenames.

The Power of Pattern Matching

The widely used Unix technique of globbing lets you specify an ambiguous filename and has the shell find all files with names that match it. Globbing lets you operate on multiple files at the same time without having to name them individually (or even know their names in the first place).

We specify a filename pattern such as *.jpg by using pattern-matching operators (also known as wildcards) such as *, ?, and [ ] (with ^ to negate a bracketed list). The shell performs an operation known as globbing or wildcard expansion, expanding the pattern into a list of filenames that match the pattern. Each wildcard operator has its own matching rules. A filename pattern may contain more than one wildcard, including multiple instances of the same one.

Star Globbing

Use the star pattern-matching operator to select all files. Bash expands a command line such as file * by replacing star with a list of all files in the current directory. In the example that follows, the filenames are passed to the file command, a handy utility that reports what type a file is (directory, executable, text, empty, and so on).

$ cd ~
$ file *
Desktop:     directory
Documents:   directory
Library:     directory
...


List All JPEG Files

Star is matched by any sequence of characters, including none. The filename pattern in the command line

$ ls *.jpg


is matched by all filenames that end in .jpg.

Tip

Star is not matched by filenames that begin with dot (hidden files). That's usually what you want, but if not, match the dot explicitly by using dot-star.

$ file .*
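As a rough illustration of the dot rule, the sketch below builds a scratch directory and compares what star matches with and without an explicit leading dot. The filenames are made up for the example, and mktemp -d is assumed to be available on your system.

```shell
# Work in a throwaway directory so no real files are involved.
demo=$(mktemp -d) && cd "$demo" || exit 1
touch a.jpg b.jpg notes.txt .hidden.jpg   # illustrative filenames

visible=$(echo *.jpg)     # star skips names that begin with a dot
hidden=$(echo .*.jpg)     # the explicit leading dot matches them

echo "*.jpg   expands to: $visible"
echo ".*.jpg  expands to: $hidden"

cd / && rm -rf "$demo"    # clean up the scratch directory
```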


"Dotglob" in Project 12 shows you how to change this behavior.


A star can be used anywhere and many times. The pattern A*B*C* is matched by ABC and AxBCyy but not AxCyy, because it has no B to match against.
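You can try the A*B*C* example without creating any files. The sketch below leans on the fact that Bash's case statement applies the same wildcard rules to a string; matches is a hypothetical helper written for this illustration, not a standard command.

```shell
# matches PATTERN NAME: succeeds when NAME matches the glob PATTERN.
# (Hypothetical helper; the unquoted $1 is treated by case as a pattern.)
matches() {
    case $2 in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

for name in ABC AxBCyy AxCyy; do
    if matches 'A*B*C*' "$name"; then
        echo "$name matches A*B*C*"
    else
        echo "$name does not match A*B*C*"
    fi
done
```

The pattern is quoted when it is passed to matches so that the shell does not try to expand it against the current directory first.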

Warning

Be very careful not to add a stray space in a command such as

$ rm *.jpg


An erroneous space after the star causes rm to remove all nonhidden files (because a lone star matches all filenames) and then to try to remove the file .jpg. I've seen it done!
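To see the difference without deleting anything, the following sketch records the argument list that rm would receive in each case. It runs in a scratch directory with made-up filenames, and uses set -- only as a harmless way to capture the expansion.

```shell
# Work in a throwaway directory so no real files are involved.
demo=$(mktemp -d) && cd "$demo" || exit 1
touch a.jpg b.jpg report.txt              # illustrative filenames

set -- *.jpg                              # what you meant to type
intended="$*"
set -- * .jpg                             # the same line with a stray space
mistaken="$*"

echo "rm *.jpg   would receive: $intended"
echo "rm * .jpg  would receive: $mistaken"

cd / && rm -rf "$demo"                    # clean up the scratch directory
```

A similar habit that costs nothing is to run the line with echo in front of rm first, and read the expanded result before committing to it.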


Glob the Path

Globbing is not limited to filenames; you may use wildcards anywhere in the pathname. In the next example, the star pattern-matching operator is used in two directory names as well as the filename.

$ ls 101*/dir*/file*
101-projects/dir1/file1    101-projects/dir1/file2
101-projects/dir2/file3    101-projects/dir3/file4
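A small sketch of the same idea, using a scratch directory tree whose names are invented for the illustration: each star is matched within its own path component, so a directory or file that fails any part of the pattern is left out.

```shell
# Build a small tree in a throwaway directory.
demo=$(mktemp -d) && cd "$demo" || exit 1
mkdir -p 101-projects/dir1 101-projects/dir2 101-projects/other
touch 101-projects/dir1/file1 101-projects/dir1/notes \
      101-projects/dir2/file3 101-projects/other/file9

# notes fails file*, and other/ fails dir*, so neither appears.
found=$(echo 101*/dir*/file*)
echo "$found"

cd / && rm -rf "$demo"    # clean up the scratch directory
```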


Match One Character

The query (?) pattern-matching operator matches exactly one character. All characters match query. The two listings that follow first show all files in the directory and then just those that match the filename pattern. The second listing is space-padded to make it easier for you to see which files matched.

$ ls
baag   bag     bags    bfg    bg    big    blag    bug
$ ls b?g
       bag             bfg          big            bug
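If you would like to reproduce the listing, this sketch recreates the same eight filenames in a scratch directory. The second pattern, b??g, is an extra added here to show that each query stands for exactly one character.

```shell
# Recreate the example files in a throwaway directory.
demo=$(mktemp -d) && cd "$demo" || exit 1
touch baag bag bags bfg bg big blag bug

one=$(echo b?g)      # one character between b and g
two=$(echo b??g)     # two characters between b and g

echo "b?g   expands to: $one"
echo "b??g  expands to: $two"

cd / && rm -rf "$demo"    # clean up the scratch directory
```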


What Globs?

The shell does the globbing when it interprets a command line, searching the current directory (or the given pathname) for all filenames that match the pattern. It expands the command line by replacing a pattern with the filenames that match the pattern. The command itself never sees the pattern, only the filenames. If a command cannot process many filenames, globbing will not work with it.
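One way to convince yourself that the command never sees the pattern is to count the arguments it receives. In this sketch, count_args is a hypothetical stand-in for any command, and the filenames are invented.

```shell
# Work in a throwaway directory so no real files are involved.
demo=$(mktemp -d) && cd "$demo" || exit 1
touch one.txt two.txt three.txt

# count_args (hypothetical) reports how many arguments it was given.
count_args() { echo "$#"; }

expanded=$(count_args *.txt)      # the shell expands the pattern first
quoted=$(count_args "*.txt")      # quoting stops the expansion

echo "unquoted pattern: $expanded argument(s)"
echo "quoted pattern:   $quoted argument(s)"

cd / && rm -rf "$demo"            # clean up the scratch directory
```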


A list of characters in square brackets is matched by any one of those characters (and exactly one, not zero and not two). The examples below should illustrate this.

$ ls
baag   bag     bags    bfg    bg    big    blag    bug
$ ls b[aeiou]g
       bag                          big            bug
$ ls b[aeiou]g*
       bag     bags                 big            bug


Tip

To include a ] symbol in a list of characters, put it first in the list. Pattern x[][]z matches x[z and x]z but nothing else.

To include a dash symbol in a list of characters, put it first or last in the list. To include both a dash and ], put the bracket first and the dash last. Pattern x[][-]z matches x-z, x[z, and x]z.
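The bracket and dash rules are easy to get wrong, so here is a sketch that checks them against plain strings rather than files. It assumes that case uses the same bracket syntax as filename globbing; matches is a hypothetical helper written for this illustration.

```shell
# matches PATTERN NAME: succeeds when NAME matches the glob PATTERN.
# (Hypothetical helper; the unquoted $1 is treated by case as a pattern.)
matches() {
    case $2 in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# ] comes first in the list and - comes last, as the tip recommends.
for name in 'x[z' 'x]z' 'x-z' 'xaz'; do
    if matches 'x[][-]z' "$name"; then
        echo "$name matches x[][-]z"
    else
        echo "$name does not match x[][-]z"
    fi
done
```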


Reject specific characters by starting the list with a caret (^) operator. Here, any character other than those listed matches.

$ ls
baag   bag     bags    bfg    bg    big    blag    bug
$ ls b[^aeiou]g
                       bfg
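The sketch below repeats the experiment in a scratch directory. It also tries an exclamation mark in place of the caret: Bash accepts both, and ! is the spelling that the POSIX standard defines, so it is the safer choice if a script might run under another shell.

```shell
# Recreate the example files in a throwaway directory.
demo=$(mktemp -d) && cd "$demo" || exit 1
touch baag bag bags bfg bg big blag bug

caret=$(echo b[^aeiou]g)   # Bash: any single character except a vowel
bang=$(echo b[!aeiou]g)    # same meaning, POSIX spelling

echo "b[^aeiou]g expands to: $caret"
echo "b[!aeiou]g expands to: $bang"

cd / && rm -rf "$demo"     # clean up the scratch directory
```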


Note

Although all shells have the same basic globbing rules, this project is written specifically for Bash. You might encounter different behavior in other shells; check their man pages.


Match Character Ranges and Classes

You may shorten a continuous sequence of characters by using a range. Specify the start and end characters of the range separated by a dash (-). For example:

  • [a-z] matches any lowercase letter and is equivalent to [abcdefghi...xyz].

  • [0-9] matches any digit and is equivalent to [0123456789].

  • [A-Z0-9] matches any uppercase letter or digit.

Pattern-matching operators may be combined. Here, we match filenames that start with either m or t ([mt]), followed by any number of intervening characters (*), ending in literally day followed by exactly one digit ([0-9]).

$ ls -l [mt]*day[0-9]
-rw-r--r--  1 saruman  saruman  0 18 May 13:05 monday1
-rw-r--r--  1 saruman  saruman  0 18 May 13:05 thursday1
-rw-r--r--  1 saruman  saruman  0 18 May 13:05 tuesday3
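Here is a sketch of the combined pattern run against a few extra, invented filenames that nearly match, to show where each operator draws the line. Setting LC_ALL=C is an assumption made for this example so that ranges and sort order follow plain ASCII; in other locales a range may behave differently.

```shell
# Assumption: use the C locale so ranges and ordering are plain ASCII.
LC_ALL=C; export LC_ALL

demo=$(mktemp -d) && cd "$demo" || exit 1
touch monday1 tuesday3 thursday1 friday2 monday today12 tday5

# friday2 : does not start with m or t
# monday  : has no digit after "day"
# today12 : "day" is not followed by exactly one final digit
# tday5   : matches, because star may match nothing at all
found=$(echo [mt]*day[0-9])
echo "$found"

cd / && rm -rf "$demo"    # clean up the scratch directory
```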


Tip

If you wish to know exactly which characters are included in a particular class, check out the Section 3 man page for the corresponding library function. The library function is named like the class but starts with is. To read about character class [:space:], for example, look at the man page for isspace by typing

$ man 3 isspace



Bash provides character classes using the syntax [[:class-name:]]. Character classes can be used in place of ranges. The following pattern matches any letter followed by exactly two digits.

$ ls [[:alpha:]][[:digit:]][[:digit:]]
A23


Bash character classes include alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, and xdigit.
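To get a feel for a few of these classes, the sketch below sorts some sample names with a case statement, which is assumed here to use the same class syntax as filename globbing. The classify function and the sample names are hypothetical.

```shell
# classify NAME: report the first class-based pattern that NAME matches.
# (Hypothetical helper written for this illustration.)
classify() {
    case $1 in
        [[:alpha:]][[:digit:]][[:digit:]]) echo "letter plus two digits" ;;
        [[:upper:]]*)                      echo "starts with an uppercase letter" ;;
        *[[:space:]]*)                     echo "contains whitespace" ;;
        *)                                 echo "no class pattern matched" ;;
    esac
}

for name in A23 Readme "my file" notes; do
    echo "$name: $(classify "$name")"
done
```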

Note

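A pattern that is badly formed, or that is well formed but does not mean what you intended, is not reported as an error. When nothing matches, the shell passes the pattern to the command unchanged, as if it were a filename, so the only message you are likely to see is the command's own "No such file or directory." Consider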

$ ls [A-Z][0-9][0-9]
A23


versus

$ ls [A-Z0-90-9]
ls: [A-Z0-90-9]: No such file or directory


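Similarly, [[:alpha:]][[:digit:]] matches a letter followed by a digit, whereas [[:alpha:][:digit:]] is a single bracket expression that matches just one character (a letter or a digit), which is probably not what was intended.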
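The sketch below probes what Bash actually does with these patterns. As far as the matching rules go, a bracket expression that holds two classes, or two ranges, is accepted and stands for a single character, so it simply fails to match a longer name. The helper functions matches and report are hypothetical, and the last part assumes Bash's default settings, under which an unmatched pattern is left as typed.

```shell
# matches PATTERN NAME: succeeds when NAME matches the glob PATTERN.
matches() { case $2 in $1) return 0 ;; *) return 1 ;; esac; }

# report PATTERN NAME: print the outcome in words.
report() {
    if matches "$1" "$2"; then
        echo "$1 matches $2"
    else
        echo "$1 does not match $2"
    fi
}

report '[[:alpha:]][[:digit:]]' A7    # two bracket expressions: letter, then digit
report '[[:alpha:][:digit:]]'   A7    # one bracket expression: one character only
report '[[:alpha:][:digit:]]'   7     # a single digit is enough
report '[A-Z0-90-9]'            A23   # also one character, so A23 is too long

# With no matching file, the pattern reaches the command as a literal word.
demo=$(mktemp -d) && cd "$demo" || exit 1
set -- [A-Z0-90-9]
literal=$1
echo "argument passed to the command: $literal"
cd / && rm -rf "$demo"                # clean up the scratch directory
```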


Escape from Globbing

You may wish to specify a file that contains one or more pattern-matching operators as part of its name. Filenames may be protected from shell interpretation by escaping either the entire filename or the particular characters that have a special meaning to the shell. Special characters include the pattern-matching operators and the escape characters themselves: single quote, double quotes, and backslash. Here are some examples applied to the following (oddly named) files.

$ ls a*
a"*'b  a"b  a'b  a*b  a\b


Escape a backslash:

$ ls "a\b"
a\b
$ ls a\\b
a\b


Escape double quotes and single quote:

$ ls 'a"b'
a"b
$ ls "a'b"
a'b
$ ls a\"b
a"b


Learn More

Project 12 shows you how to control and customize globbing in Bash.


Finally, escape the combination of special characters in the filename a"*'b by using a combination of techniques. The double quote is escaped by backslash, whereas the star and single quote are escaped by enclosing them in double quotes.

$ ls a\""*'b"
a"*'b
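If you want to experiment with these escapes safely, the following sketch creates the five oddly named files in a scratch directory and then names each one using the techniques described in this section. It assumes that ls prints names unmodified when its output is captured rather than shown on a terminal.

```shell
# Create the oddly named files in a throwaway directory.
demo=$(mktemp -d) && cd "$demo" || exit 1
touch 'a"b' "a'b" 'a*b' 'a\b' "a\"*'b"

backslash_dq=$(ls "a\b")      # backslash protected by double quotes
backslash_bs=$(ls a\\b)       # backslash protected by another backslash
dquote_sq=$(ls 'a"b')         # double quote protected by single quotes
squote_dq=$(ls "a'b")         # single quote protected by double quotes
dquote_bs=$(ls a\"b)          # double quote protected by backslash
combined=$(ls a\""*'b")       # backslash for the quote, double quotes for the rest

printf '%s\n' "$backslash_dq" "$backslash_bs" "$dquote_sq" \
              "$squote_dq" "$dquote_bs" "$combined"

cd / && rm -rf "$demo"        # clean up the scratch directory
```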


Some commands, such as find, do their own expansion of pattern-matching operators. When using such a command, you must escape the operators to prevent the shell from expanding them before they are passed to the command. Here, the shell expands an unescaped star, confusing find.

$ find . -name *
find: bag: unknown expression primary


Escaping the star allows it to be passed to find.

$ find . -name "*"
.
./baag
./bag
...
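An unescaped pattern does not always produce an error, which makes the mistake harder to spot. In this sketch (scratch directory, invented filenames) exactly one file in the current directory matches, so the shell quietly rewrites the command and find returns an incomplete answer.

```shell
# Build a small tree in a throwaway directory.
demo=$(mktemp -d) && cd "$demo" || exit 1
mkdir sub
touch top.jpg sub/deep.jpg sub/notes.txt

# Quoted: find receives the pattern and applies it at every level.
quoted=$(find . -name '*.jpg' | LC_ALL=C sort)

# Unquoted: the shell expands *.jpg to top.jpg before find runs,
# so find looks only for files named top.jpg.
unquoted=$(find . -name *.jpg)

echo "quoted pattern found:"
echo "$quoted"
echo "unquoted pattern found:"
echo "$unquoted"

cd / && rm -rf "$demo"    # clean up the scratch directory
```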


Shrink Big Command Lines

When a pattern matches many files, on the order of thousands, the expanded command line can grow too big and the command fails with an error. Suppose you have a command scale that you want to apply to the thousands of JPEG files in the current directory. You may issue the command

$ scale *.jpg
bash: /usr/local/bin/scale: Argument list too long


Note: The scale command is given as an example of a command you might employ; you won't necessarily have the command on your system.

Learn More

Projects 15, 17, and 18 tell you all about using commands find and xargs.

Project 6 explains how to pipe output from one command to another.


You'll notice that Bash has thrown an error because *.jpg expanded to too many files. We use the find command to solve this problem. The command does its own wildcard expansion independent of the shell (remember to escape pattern-matching operators). Pipe the output from find (that's the list of files) to the xargs command. The xargs command constructs an argument list from its input and passes the list to the target command.

$ find . -name "*.jpg" | xargs scale


Run this command line as is, and you may wonder how it improves on our original: It has merely pushed the long-argument-list problem onto xargs, which will be forced to construct a command line with thousands of arguments. That's fine, however, because xargs has a few tricks up its sleeve. Using its option -n, you can specify the maximum number of arguments to use in each invocation of the target command, thus limiting the maximum size of a command line. In this example, we invoke scale repeatedly in batches of 1,000 until all arguments have been processed.

$ find . -name "*.jpg" -print0 | xargs -0 -n1000 scale


The extra options -print0 and -0 make find and xargs separate filenames with a null character instead of whitespace, so names that contain spaces are passed through intact.
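You can watch the batching at work on a small scale. In the sketch below, a one-line sh -c script stands in for the hypothetical scale command and simply reports how many filenames it received; the batch size of 2 and the filenames are chosen only for the illustration.

```shell
# Five files in a throwaway directory, one with a space in its name.
demo=$(mktemp -d) && cd "$demo" || exit 1
touch one.jpg two.jpg three.jpg 'holiday snap.jpg' five.jpg

# -n2 limits each invocation to two filenames; the trailing "sh" fills $0
# so that the filenames arrive as the script's arguments.
batches=$(find . -name '*.jpg' -print0 |
          xargs -0 -n2 sh -c 'echo "batch of $#"' sh)
echo "$batches"

cd / && rm -rf "$demo"    # clean up the scratch directory
```

Because the filenames are null-separated, the name containing a space is counted as a single argument rather than two.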




Mac OS X UNIX 101 Byte-Sized Projects
ISBN: 0321374118
Year: 2003
Pages: 153
Authors: Adrian Mayo
