Project 11. Globbing with [^*?]

"How do I list all JPEG files?"

This project illustrates globbing and introduces the pattern-matching operators. It uses filename patterns written to match specific sets of filenames.

The Power of Pattern Matching

The widely used Unix technique of globbing lets you specify an ambiguous filename and have the shell find all files with names that match it. Globbing lets you operate on multiple files at the same time without having to name them individually (or even know their names in the first place).

You specify a filename pattern such as *.jpg by using pattern-matching operators (also known as wildcards) such as *, ^, and ?. The shell then performs an operation known as globbing or wildcard expansion, expanding the pattern into a list of the filenames that match it. Each wildcard operator has its own matching rules. A filename pattern may contain more than one wildcard, including multiple instances of the same one.

Star Globbing

Use the star (*) pattern-matching operator to select all files. Bash expands a command line such as file * by replacing the star with a list of all files in the current directory (by default, names that begin with a dot are not included). In the example that follows, the filenames are passed to the file command, a handy utility that determines what type a file is (directory, executable, text, empty, and so on).

    $ cd ~
    $ file *
    Desktop:   directory
    Documents: directory
    Library:   directory
    ...

List All JPEG Files

Star is matched by any sequence of characters, including none. The filename pattern in the command line

    $ ls *.jpg

is matched by all filenames that end in .jpg.
A star can be used anywhere in a pattern and as many times as needed. The pattern A*B*C* is matched by ABC and AxBCyy but not by AxCyy, which has no B to match against.
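The multiple-star rule can be checked without creating any files, because the shell's case statement uses the same pattern-matching operators as filename expansion. The following is a minimal sketch; the function name matches_abc and the variables r1 to r3 are illustrative and not part of the project.

```shell
# Hypothetical helper: report whether a name matches the pattern A*B*C*.
matches_abc() {
    case "$1" in
        A*B*C*) echo "match" ;;
        *)      echo "no match" ;;
    esac
}

r1=$(matches_abc ABC)     # each star matches the empty sequence
r2=$(matches_abc AxBCyy)  # stars match "x", nothing, and "yy"
r3=$(matches_abc AxCyy)   # there is no B, so the pattern fails

echo "ABC:    $r1"
echo "AxBCyy: $r2"
echo "AxCyy:  $r3"
```

The first two names report a match and the third does not, which agrees with the explanation above.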
Glob the Path

Globbing is not limited to filenames; you may use wildcards anywhere in the pathname. In the next example, the star pattern-matching operator is used in two directory names as well as the filename.

    $ ls 101*/dir*/file*
    101-projects/dir1/file1
    101-projects/dir1/file2
    101-projects/dir2/file3
    101-projects/dir3/file4

Match One Character

The query (?) pattern-matching operator matches exactly one character. Any character matches query. The two listings that follow first show all files in the directory and then just those that match the filename pattern. The second listing is space-padded to make it easier for you to see which files matched.

    $ ls
    baag  bag  bags  bfg  bg  big  blag  bug
    $ ls b?g
          bag        bfg      big        bug
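The single-character match can be reproduced safely in a scratch directory. This is a sketch that assumes the mktemp command is available; the directory and the variable names scratch, count, and matched are illustrative.

```shell
# Build a scratch directory holding the same filenames as the listing above.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch baag bag bags bfg bg big blag bug

# set -- stores the expanded pattern as the positional parameters,
# one filename per parameter, so $# is the number of matches.
set -- b?g
count=$#
matched="$*"
echo "$count files matched: $matched"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

Only the three-letter names that begin with b and end with g are counted; bg has too few characters and baag, bags, and blag have too many.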
A list of characters in square brackets is matched by any one of those characters (and exactly one, not zero and not two). The following examples illustrate this.

    $ ls
    baag  bag  bags  bfg  bg  big  blag  bug
    $ ls b[aeiou]g
    bag  big  bug
    $ ls b[aeiou]g*
    bag  bags  big  bug
Reject specific characters by starting the list with a caret (^) operator. Here, any character other than those listed matches.

    $ ls
    baag  bag  bags  bfg  bg  big  blag  bug
    $ ls b[^aeiou]g
    bfg
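Both forms of the bracket list can be tried side by side in a scratch directory. This is a sketch that assumes mktemp is available; the variable names are illustrative. Bash accepts either ! or ^ to negate a list, and the sketch uses !, the spelling that POSIX specifies, so that it also runs in other shells.

```shell
# Recreate the example files in a scratch directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch baag bag bags bfg bg big blag bug

# Exactly one vowel between b and g.
vowel=$(echo b[aeiou]g)
# Exactly one character that is NOT a vowel between b and g.
not_vowel=$(echo b[!aeiou]g)

echo "vowel:     $vowel"
echo "not vowel: $not_vowel"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```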
Match Character Ranges and Classes

You may shorten a continuous sequence of characters by using a range. Specify the start and end characters of the range separated by a dash (-). For example, [0-9] matches any single digit.
Pattern-matching operators may be combined. Here, we match filenames that start with either m or t ([mt]), followed by any number of intervening characters (*), ending in the literal text day followed by exactly one digit ([0-9]).

    $ ls -l [mt]*day[0-9]
    -rw-r--r--  1 saruman  saruman  0 18 May 13:05 monday1
    -rw-r--r--  1 saruman  saruman  0 18 May 13:05 thursday1
    -rw-r--r--  1 saruman  saruman  0 18 May 13:05 tuesday3
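To see which names the combined pattern rejects as well as which it accepts, it helps to add a few near misses. The sketch below assumes mktemp is available; the extra filenames friday2, monday, and today are invented for illustration and do not appear in the listing above.

```shell
# Scratch directory with three matching names and three near misses.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch monday1 thursday1 tuesday3 friday2 monday today

# friday2 starts with f, not m or t; monday and today lack the final digit.
days=$(echo [mt]*day[0-9])
echo "$days"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```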
Bash provides character classes using the syntax [[:class-name:]]. Character classes can be used in place of ranges. The following pattern matches any letter followed by exactly two digits.

    $ ls [[:alpha:]][[:digit:]][[:digit:]]
    A23

Bash character classes include alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, and xdigit.
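The character-class pattern can also be exercised with a case statement, which applies the same matching rules to a string without touching the filesystem. This is a minimal sketch; the function name letter_two_digits and the test names AB3 and A234 are illustrative.

```shell
# Hypothetical helper: test a name against "one letter, then two digits".
letter_two_digits() {
    case "$1" in
        [[:alpha:]][[:digit:]][[:digit:]]) echo "match" ;;
        *)                                 echo "no match" ;;
    esac
}

c1=$(letter_two_digits A23)   # letter, digit, digit
c2=$(letter_two_digits AB3)   # second character is not a digit
c3=$(letter_two_digits A234)  # one character too many

echo "A23:  $c1"
echo "AB3:  $c2"
echo "A234: $c3"
```

Each class stands for exactly one character, so a name that is longer or shorter than three characters cannot match.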
Escape from Globbing

You may wish to specify a file whose name contains one or more pattern-matching operators. Filenames may be protected from shell interpretation by escaping either the entire filename or just the characters that have a special meaning to the shell. Special characters include the pattern-matching operators and the escape characters themselves: single quote, double quote, and backslash. Here are some examples applied to the following (oddly named) files.

    $ ls a*
    a"*'b  a"b  a'b  a*b  a\b

Escape a backslash:

    $ ls "a\b"
    a\b
    $ ls a\\b
    a\b

Escape double quotes and single quote:

    $ ls 'a"b'
    a"b
    $ ls "a'b"
    a'b
    $ ls a\"b
    a"b
Finally, escape the combination of special characters in the filename a"*'b by using a combination of techniques. The double quote is escaped by a backslash, whereas the star and single quote are escaped by enclosing them in double quotes.

    $ ls a\""*'b"
    a"*'b

Some commands, such as find, do their own expansion of pattern-matching operators. When using such a command, you must escape the operators to prevent the shell from expanding them before they are passed to the command. Here, the shell expands an unescaped star, confusing find.

    $ find . -name *
    find: bag: unknown expression primary

Escaping the star allows it to be passed to find.

    $ find . -name "*"
    .
    ./baag
    ./bag
    ...

Shrink Big Command Lines

When many files, on the order of thousands, match a pattern, the expanded command line becomes too big, and the command fails with an error. Suppose you have a command scale that you want to apply to the thousands of JPEG files in the current directory. You may issue the command

    $ scale *.jpg
    bash: /usr/local/bin/scale: Argument list too long

Note: The scale command is given as an example of a command you might employ; you won't necessarily have the command on your system.
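The "Argument list too long" error reflects a system limit on the combined size of the arguments and environment handed to a new command. On systems that provide the getconf utility, that limit can be inspected as sketched below; the value differs from system to system, and the variable name limit is illustrative.

```shell
# Ask the system for ARG_MAX, the maximum combined size in bytes of the
# arguments and environment that can be passed to a newly started command.
limit=$(getconf ARG_MAX)
echo "Argument space available to a new command: $limit bytes"
```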
You'll notice that Bash has reported an error because *.jpg expanded to too many files. We use the find command to solve this problem. The command does its own wildcard expansion independent of the shell (remember to escape pattern-matching operators). Pipe the output from find (that's the list of files) to the xargs command. The xargs command constructs an argument list from its input and passes the list to the target command.

    $ find . -name "*.jpg" | xargs scale

Run this command line as is, and you may wonder how it improves on our original: It appears merely to have pushed the long-argument-list problem onto xargs, which must now construct a command line with thousands of arguments. That's fine, however, because xargs has a few tricks up its sleeve. It splits its input across as many invocations of the target command as it needs, and by using its option -n, you can specify the maximum number of arguments to use in each invocation, thus limiting the maximum size of a command line. In this example, we invoke scale repeatedly in batches of 1,000 until all arguments have been processed.

    $ find . -name "*.jpg" -print0 | xargs -0 -n1000 scale

The extra options -print0 and -0 are there to cope with spaces in filenames: find separates the names with null characters instead of newlines, and xargs splits its input only at those null characters.
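The batching behavior is easy to observe on a small scale. In the sketch below, echo stands in for the scale command, the batch size is reduced to 2, and the filenames are invented for illustration; it assumes mktemp is available and that find and xargs support the -print0 and -0 options used above.

```shell
# Scratch directory with five JPEG names (two containing spaces) and one
# file that should not match.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch "a 1.jpg" "b 2.jpg" c3.jpg d4.jpg e5.jpg notes.txt

# echo is a stand-in for the target command. Each invocation prints one
# line, so counting lines counts how many times xargs ran the command.
batches=$(find . -name "*.jpg" -print0 | xargs -0 -n2 echo | wc -l | tr -d ' ')
echo "$batches invocations"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

Five matching files in batches of at most two require three invocations, and the names containing spaces are each passed as a single argument.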