14.5. File Name Globbing

Most Linux users take it for granted that running ls *.c does not tell them all about the file in the current directory called *.c. Instead, they expect to see a list of all the file names in the current directory whose names end with .c. This file-name expansion from *.c to ladsh.c dircontents.c (for example) is normally handled by the shell, which globs all the parameters to programs it runs. Programs that help users manipulate files often need to glob file names, as well. There are two common ways to glob file names from inside a program.

14.5.1. Use a Subprocess

The oldest method is simply to run a shell as a child process and let it glob the file names for you. The standard popen()[3] function makes this simple: just run the command ls *.c through popen() and read the results. Although this may seem simplistic, it is an easy solution to the globbing problem and is highly portable (which is why applications like Perl use this approach).

[3] See page 132 for information on popen().

Here is a program that globs all its arguments and displays all of the matches:

    /* popenglob.c */

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, const char ** argv) {
        char buf[1024];
        FILE * ls;
        int result;
        int i;

        /* build the command "ls arg1 arg2 ..." for the shell to glob */
        strcpy(buf, "ls ");

        for (i = 1; i < argc; i++) {
            /* leave room for the argument, a space, and the '\0' */
            if (strlen(buf) + strlen(argv[i]) + 2 > sizeof(buf)) {
                fprintf(stderr, "arguments too long\n");
                return 1;
            }
            strcat(buf, argv[i]);
            strcat(buf, " ");
        }

        ls = popen(buf, "r");
        if (!ls) {
            perror("popen");
            return 1;
        }

        /* copy the output of ls to standard output */
        while (fgets(buf, sizeof(buf), ls))
            printf("%s", buf);

        result = pclose(ls);

        if (!WIFEXITED(result)) return 1;

        return 0;
    }


14.5.2. Internal Globbing

If you need to glob many file names, running many subshells through popen() may be too inefficient. The glob() function allows you to glob file names without running any subprocesses, at the price of increased complexity and reduced portability. Although glob() is specified by POSIX.2, many Unix variants do not yet support it.

    #include <glob.h>

    int glob(const char *pattern, int flags,
             int errfunc(const char * epath, int eerrno), glob_t * pglob);


The first parameter, pattern, specifies the pattern that file names must match. This function understands the *, ?, and [] globbing operators, and optionally also the {, }, and ~ globbing operators, and treats them identically to the standard shells. The final parameter is a pointer to a structure that gets filled in with the results of the glob. The structure is defined like this:

    #include <glob.h>

    typedef struct {
        size_t gl_pathc;   /* number of paths in gl_pathv */
        char **gl_pathv;   /* list of gl_pathc matched pathnames */
        size_t gl_offs;    /* slots to reserve in gl_pathv for GLOB_DOOFFS */
    } glob_t;


The flags argument is one or more of the following values bitwise OR'ed together:

GLOB_ERR

Causes glob() to return if an error occurs (if the function cannot read the contents of a directory due to permissions problems, for example).

GLOB_MARK

If the pattern matches a directory name, that directory name will have a / appended to it on return.

GLOB_NOSORT

Normally, the returned pathnames are sorted alphabetically. If this flag is specified, they are not sorted.

GLOB_DOOFFS

If set, the first pglob->gl_offs entries in the returned list of pathnames are left empty (set to NULL); the caller must set pglob->gl_offs before calling glob(). This allows glob() to be used while building a set of arguments that will be passed directly to execv().

GLOB_NOCHECK

If no file names match the pattern, the pattern itself is returned as the sole match (usually, no matches are returned). In either case, if the pattern does not contain any globbing operators, the pattern is returned.

GLOB_APPEND

pglob is assumed to be a valid result from a previous call to glob(), and any results from this invocation are appended to the results from the previous call. This makes it easy to glob multiple patterns.

GLOB_NOESCAPE

Usually, if a backslash (\) precedes a globbing operator, the operator is taken as a normal character instead of being assigned its special meaning. For example, the pattern a\* usually matches only a file named a*. If GLOB_NOESCAPE is specified, \ loses this special meaning, and a\* matches any file name that begins with the characters a\. In this case, a\ and a\bcd would be matched, but arachnid would not because it does not contain a \.

GLOB_PERIOD

Most shells do not allow glob operators to match files whose names begin with a . (try ls * in your home directory and compare the result with ls -a). The glob() function generally behaves this way, but GLOB_PERIOD allows the globbing operators to match a leading . character. GLOB_PERIOD is not defined by POSIX.

GLOB_BRACE

Many shells (following the lead of csh) expand sequences with braces as alternatives; for example, the pattern "{a,b}" is expanded to "a b", and the pattern "a{,b,c}" to "a ab ac". The GLOB_BRACE flag enables this behavior. GLOB_BRACE is not defined by POSIX.

GLOB_NOMAGIC

Acts just like GLOB_NOCHECK except that it appends the pattern to the list of results only if it contains no special characters. GLOB_NOMAGIC is not defined by POSIX.

GLOB_TILDE

Turns on tilde expansion, in which ~ or the substring ~/ is expanded to the path to the current user's home directory, and ~user is expanded to the path to user's home directory. GLOB_TILDE is not defined by POSIX.

GLOB_ONLYDIR

Matches only directories, not any other type of file. GLOB_ONLYDIR is not defined by POSIX.


Often, glob() encounters directories to which the process does not have access, which causes an error. If glob() simply returns when this happens (as GLOB_ERR requests), there is no way to resume the globbing operation at the point where the error occurred. Because this makes it difficult both to handle errors that occur during a glob() and to complete the glob, glob() can instead report errors to a function of the caller's choice, which is specified in the third parameter to glob(). It should be prototyped as follows:

 int globerr(const char * pathname, int globerrno); 


The function is passed the pathname that caused the error and the errno value that resulted from one of opendir(), readdir(), or stat(). If the error function returns nonzero, glob() returns with an error. Otherwise, the globbing operation is continued.

The results of the glob are stored in the glob_t structure referenced by pglob. It includes the following members, which allow the caller to find the matched file names:

gl_pathc

The number of pathnames that matched the pattern

gl_pathv

Array of pathnames that matched the pattern


After the returned glob_t has been used, the memory it uses should be freed by passing it to globfree().

 void globfree(glob_t * pglob); 


glob() returns GLOB_NOSPACE if it ran out of memory, GLOB_ABORTED (called GLOB_ABEND in older versions of the GNU C library) if a read error caused the function to fail, GLOB_NOMATCH if no matches were found, or 0 if the function succeeded and found matches.

To help illustrate glob(), here is a program called globit, which accepts multiple patterns as arguments, globs them all, and displays the result. If an error occurs, a message describing the error is displayed, but the glob operation is continued.

    /* globit.c */

    #include <errno.h>
    #include <glob.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* This is the error function we pass to glob(). It just displays
       an error and returns success, which allows the glob() to
       continue. */
    int errfn(const char * pathname, int theerr) {
        fprintf(stderr, "error accessing %s: %s\n", pathname,
                strerror(theerr));

        /* We want the glob operation to continue, so return 0 */
        return 0;
    }

    int main(int argc, const char ** argv) {
        glob_t result;
        int i, rc, flags;

        if (argc < 2) {
            printf("at least one argument must be given\n");
            return 1;
        }

        /* set flags to 0; it gets changed to GLOB_APPEND later */
        flags = 0;

        /* iterate over all of the command-line arguments */
        for (i = 1; i < argc; i++) {
            rc = glob(argv[i], flags, errfn, &result);

            /* GLOB_ABEND can't happen thanks to errfn */
            if (rc == GLOB_NOSPACE) {
                fprintf(stderr, "out of space during glob operation\n");
                return 1;
            }

            flags |= GLOB_APPEND;
        }

        if (!result.gl_pathc) {
            fprintf(stderr, "no matches\n");
            rc = 1;
        } else {
            for (i = 0; i < result.gl_pathc; i++)
                puts(result.gl_pathv[i]);
            rc = 0;
        }

        /* the glob structure uses memory from the malloc() pool, which
           needs to be freed */
        globfree(&result);

        return rc;
    }



       


    Linux Application Development
    Linux Application Development (paperback) (2nd Edition)
    ISBN: 0321563220
    EAN: 2147483647
    Year: 2003
    Pages: 168
