14.5. Command-Line Processing
Providing a consistent set of command-line arguments across all applications helps the users of the suite, but it can also help the implementers and the maintainers. If a collection of programs all use consistent command-line arguments, then each program can use the same approach to parsing those arguments.

Defining a consistent command-line interface makes the programs easier to write in the first place, because once the command-line processing has been set up for the first application, the universal components of it can be refactored into a separate module and reused by subsequent programs (as described under "Interapplication Consistency" later in this chapter). This approach also makes the suite much more maintainable, as debugging or enhancing that one module automatically fixes or extends the command-line processing of perhaps dozens of individual applications.

There are plenty of inappropriate ways to parse command lines. For example, Perl has a built-in -s option (as documented in the perlrun manpage) that will happily unpack your command line for you, as Example 14-1 demonstrates.

Example 14-1. Command-line parsing via perl -s

    #!/usr/bin/perl -s
    # Use the -s shebang line option to handle command lines of the form:
    #
    #   > orchestrate -in=source.txt -out=dest.orc -verbose -len=24

    # The -s automatically parses the command line into these package variables...
    use vars qw( $in $out $verbose $len );

    # Handle meta-options (which will appear in package variables whose names
    # start with a dash. Oh, the humanity!!!)...
    no strict qw( refs );
    X::Version->throw( )  if ${-version};
    X::Usage->throw( )    if ${-usage};
    X::Help->throw( )     if ${-help};
    X::Man->throw( )      if ${-man};

    # Report intended behaviour...
    if ($verbose) {
        print "Loading first $len chunks of file: $in\n"
    }

    # etc.

Under -s, every command-line argument of the form -argname is converted to a package variable ${argname}. The use of a package variable is a problem in itself, but it gets worse.
The interpreter names each of these variables by simply removing the leading dash of the corresponding command-line flag. So the leading dash of -h is removed to create ${h}, and the leading dash of -help is removed to generate ${help}. Unfortunately, when a mandatory meta-option like --help appears on the command line, only its first leading dash is removed, producing the variable ${-help}, which is legal only under no strict 'refs'.

A better solution, though much more complex, would be to define a regular expression for each valid option, in whatever form you wished them to take. Then you would test each regex against the command line using iterated /gc pattern matches (see Chapter 12). An argument that doesn't match any of your regexes could be caught at the end of the outer loop and reported as an error. Example 14-2 illustrates exactly that approach.

Example 14-2. Command-line parsing via a hand-coded parser

    # Handle command lines of the form:
    #
    #   > orchestrate -in=source.txt -out dest.orc --verbose

    use Carp qw( croak );

    # Create table describing argument flags, default values,
    # and how to match the remainder of each argument...
    my @options = (
        { flag=>'-in',       val=>'-', pat=>qr/ \s* =? \s* (\S*) /xms },
        { flag=>'-out',      val=>'-', pat=>qr/ \s* =? \s* (\S*) /xms },
        { flag=>'-len',      val=>24,  pat=>qr/ \s* =? \s* (\d+) /xms },
        { flag=>'--verbose', val=>0,   pat=>qr/                   /xms },
    );

    # Initialize hash for arguments...
    my %arg = map { $_->{flag} => $_->{val} } @options;

    # Create table of meta-options and associated regex...
    my %meta_option = (
        '--version' => sub { X::Version->throw( ) },
        '--usage'   => sub { X::Usage->throw( )   },
        '--help'    => sub { X::Help->throw( )    },
        '--man'     => sub { X::Man->throw( )     },
    );
    my $meta_option = join '|', reverse sort keys %meta_option;

    # Reconstruct full command line, and start matching at the start
    # ($SPACE is assumed to be a constant holding a single space)...
    my $cmdline = join $SPACE, @ARGV;
    pos $cmdline = 0;

    # Step through cmdline...
    ARG:
    while (pos $cmdline < length $cmdline) {
        # Checking for a meta-option each time
        # (the \G anchors the match at the current position)...
        if (my ($meta) = $cmdline =~ m/\G \s* ($meta_option) \b /gcxms) {
            $meta_option{$meta}->( );
        }

        # Then trying each option...
        for my $opt_ref ( @options ) {
            # Seeing whether that option matches at this point in the cmdline...
            if (my ($val)
                    = $cmdline =~ m/\G \s* $opt_ref->{flag} $opt_ref->{pat} /gcxms) {
                # And, if so, storing the value and moving on...
                $arg{$opt_ref->{flag}} = $val;
                next ARG;
            }
        }

        # Otherwise, extract the next chunk of text at the current position
        # and report it as an unknown flag...
        my ($unknown) = $cmdline =~ m/\G \s* (\S*) /xms;
        croak "Unknown cmdline flag: $unknown";
    }

    # Report intended behaviour...
    if ($arg{'--verbose'}) {
        print "Loading first $arg{-len} chunks of file: $arg{-in}\n"
    }

    # etc.

Using a table-driven approach here is important, both because it makes it easier to add extra options as the program develops, and because data-driven solutions are much easier to factor out into a separate module that can later be shared by your entire application suite.

And, of course, many people have already done exactly that: factored out their table-driven command-line processors into modules. Such modules are traditionally created within the Getopt:: namespace, and Perl's standard library comes with two of them: Getopt::Std and Getopt::Long.

The Getopt::Std module can recognize only single-character flags (except for help and version) and so is not recommended. Getopt::Long, on the other hand, is a much cleaner and more powerful tool. For example, the earlier command-line processing examples could be simplified to the version shown in Example 14-3.

Example 14-3. Command-line parsing via Getopt::Long

That's noticeably shorter than the regex-based version in Example 14-2, and much more robust than the version in Example 14-1. It's also neatly table-driven, so you could refactor it out into your own module, to be reused across all your applications. And it uses a core module, so your program will be portable to any Perl platform.
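As a rough illustration of the Getopt::Long style described above, here is a minimal sketch that handles the same options as the earlier examples. The parse_cmdline() helper, the hash keys, and the usage text are hypothetical names chosen for this sketch. It also makes one simplifying assumption: the meta-options are merely recorded as flags for the caller to act on, rather than throwing the X::* exceptions used elsewhere in this chapter.

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical helper: parse a list of arguments and return a reference
# to a hash of settings (defaults filled in for anything not specified)...
sub parse_cmdline {
    my @args = @_;

    # Default values, matching the earlier examples...
    my %arg = ( in => '-', out => '-', len => 24, verbose => 0 );

    # Specify cmdline options and process the argument list...
    my $options_okay = GetOptionsFromArray(
        \@args,

        # Application-specific options...
        'in=s'    => \$arg{in},         # expects a string
        'out=s'   => \$arg{out},        # expects a string
        'len=i'   => \$arg{len},        # expects an integer
        'verbose' => \$arg{verbose},    # boolean flag

        # Meta-options (assumption: only recorded here, acted on by the caller)...
        'version' => \$arg{version},
        'usage'   => \$arg{usage},
        'help'    => \$arg{help},
        'man'     => \$arg{man},
    );

    # Fail if unknown or malformed arguments were encountered...
    die "Usage: orchestrate [-in file] [-out file] [-len n] [-verbose]\n"
        if !$options_okay;

    return \%arg;
}

# Process the real command line...
my $arg_ref = parse_cmdline(@ARGV);

# Report intended behaviour...
if ($arg_ref->{verbose}) {
    print "Loading first $arg_ref->{len} chunks of file: $arg_ref->{in}\n";
}
```

Wrapping the call in a subroutine that takes a list, instead of reading @ARGV directly, is one way to make the parser easy to move into a shared module and to exercise with test data.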
Getopt::Long is probably more than adequate for most developers' command-line processing needs. And while its feature set is still limited, those very limitations may actually be an advantage, as they tend to discourage the creation of "adventurous" interfaces. However, if your applications do have more advanced requirements, such as mutually exclusive options (verbose vs taciturn), or options that can be used only with other options (-bak being valid only if insitu is in effect), or options that imply other options (garrulous implying verbose), then there are dozens of other Getopt:: modules on the CPAN to choose from[*].
One of the most powerful and adaptable of these is Getopt::Clade. With it, the command-line processing implemented in the previous examples could be implemented as in Example 14-4.

Example 14-4. Command-line parsing via Getopt::Clade

To create an interface using Getopt::Clade, you simply load the module and pass it the usage message you'd like to see. It then extracts the various options you've specified, builds a parser for them, parses the command line, and then does any appropriate type-checking on what it finds.

For example, the -i flag's <file> slot is specified with the suffix :in, indicating that it's supposed to be an input file. So Getopt::Clade checks whether any string in that slot is the name of a readable file. Likewise, the :+int marker in -l <l:+int> causes the module to accept only a positive integer in that slot. Once the command line has been parsed and verified, the module fills in any missing defaults, and puts results in the standard %ARGV hash[*].
Notice that there are no specifications for help, usage, version, or man flags; they're always generated automatically. Likewise, there's no need for explicit error-handling code: if command-line parsing fails, Getopt::Clade generates the appropriate error message automatically, piecing together a full usage line from the options you specified. The module has many other features, and is definitely worth considering when implementing complex command-line interfaces.
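For comparison, the kinds of checks mentioned in this section (a readable input file, a positive integer, and options that exclude, require, or imply one another) can be approximated by hand once any parser has filled in a hash of arguments. The following is only a sketch under stated assumptions: check_args() and the hash keys are hypothetical names, an input of '-' is assumed to mean standard input, and the code neither uses nor reproduces Getopt::Clade's interface.

```perl
use strict;
use warnings;

# Hypothetical helper: apply implied options to the argument hash,
# then return a list of error messages (empty if everything is valid)...
sub check_args {
    my ($arg_ref) = @_;
    my @errors;

    # An option that implies another: garrulous turns on verbose...
    $arg_ref->{verbose} = 1 if $arg_ref->{garrulous};

    # Mutually exclusive options...
    if ($arg_ref->{verbose} && $arg_ref->{taciturn}) {
        push @errors, 'verbose and taciturn cannot be used together';
    }

    # An option that is valid only alongside another...
    if (defined $arg_ref->{bak} && !$arg_ref->{insitu}) {
        push @errors, 'bak is valid only if insitu is in effect';
    }

    # A slot that must hold a positive integer...
    if (defined $arg_ref->{len}) {
        my $len = $arg_ref->{len};
        if ($len !~ m/\A \d+ \z/xms || $len <= 0) {
            push @errors, "len must be a positive integer (got '$len')";
        }
    }

    # A slot that must name a readable file
    # (assumption: '-' stands for standard input and is always accepted)...
    if (defined $arg_ref->{in} && $arg_ref->{in} ne '-' && !-r $arg_ref->{in}) {
        push @errors, "cannot read input file '$arg_ref->{in}'";
    }

    return @errors;
}
```

A caller might run my @errors = check_args(\%arg) immediately after parsing and report every message at once, instead of stopping at the first problem. Keeping these rules in one subroutine also makes them a candidate for the same shared module as the parser itself.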