Hack 33. Presolve Module Paths


Make programs on complex installations start more quickly.

In certain circumstances, one of Perl's major strengths can be a weakness. Even though you can manipulate where Perl looks for modules (@INC) at runtime according to your needs [Hack #29], and even though you can use thousands of modules from the CPAN, your system has to find and load these modules.

For a short-running, frequently repeated program, this can be expensive, especially if you have many paths in @INC from custom testing paths, sitewide paths, staging servers, business-wide repositories, and the like. Fortunately, there's more than one way to solve this. One approach is to resolve all of the paths just once, and then use your program as normal.

The Hack

"Trace All Used Modules" [Hack #74] shows how putting a code reference into @INC allows you to execute code every time you use or require a module. That works here, too.

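package Devel::Presolve;

use strict;
use warnings;

my @track;

# Install the hook first in @INC so that it sees every later request.
BEGIN { unshift @INC, \&resolve_path }

# Record the requested file name (such as 'CGI/Util.pm'), then return
# nothing so that Perl continues searching the rest of @INC as usual.
sub resolve_path
{
    my ($code, $module) = @_;
    push @track, $module;
    return;
}

# Runs after compilation finishes and before the main program starts.
INIT
{
    print "BEGIN\n{\n";

    for my $tracked (@track)
    {
        # Skip requests that never resolved, such as a failed optional require.
        next unless defined $INC{$tracked};
        print "\trequire( \$INC{'$tracked'} = '$INC{$tracked}' );\n";
    }

    print "}\n1;\n";
    exit;
}

1;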

Devel::Presolve's resolve_path( ) captures every request to load a module, stores the requested file name, and returns an empty list, so Perl goes on to load the module as normal. After the entire program has finished compiling, but before it starts to run [Hack #70], the INIT block prints to STDOUT a BEGIN block that loads all of the modules by absolute file path, and then exits the program.
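To see what such a hook receives, here is a minimal standalone sketch. It assumes File::Basename (a core module, chosen only for illustration) has not been loaded yet; the @seen array is a name invented for this example.

```perl
use strict;
use warnings;

my @seen;

# An @INC hook is called with itself and the requested file name
# (for example 'File/Basename.pm'). Returning an empty list declines
# the request, so Perl moves on to the next entry in @INC.
unshift @INC, sub {
    my ($self, $file) = @_;
    push @seen, $file;
    return;
};

require File::Basename;

# Every request the hook saw, with the path Perl finally resolved it to.
print "$_ => $INC{$_}\n" for grep { defined $INC{$_} } @seen;
```

The list usually includes more than the one module you asked for, because File::Basename loads modules of its own. The exact names and paths depend on your installation.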

Running the Hack

Put Devel::Presolve somewhere in your module search path (@INC). Then run your slow-starting program while loading the module. Redirect the output to a file of your choosing:

$ perl -MDevel::Presolve slow_program.pl > preload.pm


preload.pm will contain something similar to:

BEGIN
{
    require( $INC{'CGI.pm'}      = '/usr/lib/perl5/5.8.7/CGI.pm' );
    require( $INC{'CGI/Util.pm'} = '/usr/lib/perl5/5.8.7/CGI/Util.pm' );
    require( $INC{'vars.pm'}     = '/usr/lib/perl5/5.8.7/vars.pm' );
    require( $INC{'constant.pm'} = '/usr/lib/perl5/5.8.7/constant.pm' );
    require( $INC{'overload.pm'} = '/usr/lib/perl5/5.8.7/overload.pm' );
}

1;

You can either include the contents of this file at the start of slow_program.pl or load it as the first module. If you do the latter, put the file in a directory at the front of @INC, lest you erase any performance gains.

Note that the trick of assigning to %INC within the require avoids a potentially nasty module-reloading bug, where Perl doesn't see require '/usr/lib/perl5/5.8.7/CGI.pm' as loading the same file as use CGI; does.
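The following sketch reproduces that reloading behavior with two throwaway modules written to a temporary directory. The module names, the write_module( ) helper, and the %load_count hash are all invented for this demonstration.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Spec;

our %load_count;    # incremented by each demo module whenever its file runs

my $dir = tempdir( CLEANUP => 1 );
unshift @INC, $dir;

# Hypothetical helper: write a tiny module that counts its own loads.
sub write_module
{
    my ($name) = @_;
    my $path   = File::Spec->catfile( $dir, "$name.pm" );

    open my $fh, '>', $path or die "Cannot write $path: $!";
    printf {$fh} "package %s;\n\$main::load_count{'%s'}++;\n1;\n", $name, $name;
    close $fh or die "Cannot close $path: $!";

    return $path;
}

my $plain_path  = write_module('PresolveDemoPlain');
my $primed_path = write_module('PresolveDemoPrimed');

# Absolute path only: %INC gains a key for the path, not for the module
# file name, so the later name-based require runs the file again.
require $plain_path;
require PresolveDemoPlain;

# Absolute path plus the %INC assignment: the name-based require finds
# its key already present and does nothing.
require( $INC{'PresolveDemoPrimed.pm'} = $primed_path );
require PresolveDemoPrimed;

printf "plain:  %d load(s)\n", $load_count{PresolveDemoPlain};
printf "primed: %d load(s)\n", $load_count{PresolveDemoPrimed};
```

The plain module should report two loads and the primed module one. With a real module, the second load typically brings a flurry of "Subroutine redefined" warnings.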

Hacking the Hack

Pre-resolving paths likely won't help long-running programs. For short-running programs where startup time can dwarf calculation time, it may, depending on how complex your @INC is. Be aware that upgrading Perl or installing new versions of modules may invalidate this cache (and it is a cache) and cause strange errors. For that reason, this technique likely suits deploying a program to a production system better than developing or testing one.



Hack 34. Create a Standard Module Toolkit

Curb your addiction to explicit use statements.

Most experienced Perl programmers rely on a core set of modules and subroutines that they use in just about every application they create. For example, if you work with XML documents on a daily basis (and you certainly have our deepest sympathy there), then you probably use either XML::Parser or XML::SAX or XML::We::Built::Our::Own::Damn::Solution all the time.

If those documents contain lists of files that you need to manipulate, then you probably use File::Spec or File::Spec::Functions as well, and perhaps File::Find too. Maybe you need to verify and manipulate dates and times on those files, so you regularly pull in half a dozen of the DateTime modules.

If the application has an interactive component, you might continually need to use the prompt( ) subroutine from IO::Prompt [Hack #14]. Likewise, you might frequently make use of the efficient slurp( ) function from File::Slurp. You might also like to have Smart::Comments instantly available [Hack #54] to simplify debugging. Of course, you always specify use strict and use warnings, and probably use Carp as well.

A Mess of Modules

This adds up to a tediously long list of standard modules, most of which you need to load every time you write a new application:

#! /usr/bin/perl

use strict;
use warnings;
use Carp;
use Smart::Comments;
use XML::Parser;
use File::Spec;
use IO::Prompt qw( prompt );
use File::Spec::Functions;
use File::Slurp qw( slurp );
use DateTime;
use DateTime::Duration;
use DateTime::TimeZone;
use DateTime::TimeZone::Antarctica::Mawson;
# etc.
# etc.

It would be great if you could shove all these usual suspects in a single file:

package Std::Modules;

use strict;
use warnings;
use Carp;
use Smart::Comments;
use XML::Parser;
use File::Spec;
use IO::Prompt qw( prompt );
use File::Spec::Functions;
use File::Slurp qw( slurp );
use DateTime;
use DateTime::Duration;
use DateTime::TimeZone;
use DateTime::TimeZone::Antarctica::Mawson;
# etc.

1;

and just use that one module instead:

#! /usr/bin/perl

use Std::Modules;

Of course, that fails dismally. Using a module that uses other modules isn't the same as using those other modules directly. In most cases, you'd be importing the components you need into the wrong namespace (into Std::Modules instead of main) or into the wrong lexical scope (for use strict and use warnings).
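A small sketch makes the namespace problem concrete. StdDemo is a hypothetical stand-in for Std::Modules, written to a temporary directory so that the example is self-contained, and List::Util stands in for the longer list of modules. The script uses require rather than use only because the file does not exist until runtime.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Spec;

# Write a hypothetical wrapper module that imports sum() for itself.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'StdDemo.pm' );

open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} <<'END_MODULE';
package StdDemo;
use strict;
use List::Util qw( sum );
1;
END_MODULE
close $fh or die "Cannot close $path: $!";

unshift @INC, $dir;
require StdDemo;

# sum() landed in the wrapper's namespace, not in the caller's.
printf "StdDemo can sum: %s\n", StdDemo->can('sum') ? 'yes' : 'no';
printf "main can sum:    %s\n", main->can('sum')    ? 'yes' : 'no';
```

The wrapper ends up with sum( ) and the program that loaded it does not. Its use strict likewise stops at the end of StdDemo.pm.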

The Hack

What you really need is a way to create a far more cunning module: one that cuts-and-pastes any use statements inside it into any file that uses the module. The easiest way to accomplish that kind of sneakiness is with the Filter::Macro CPAN module. As its name suggests, this module is a source filter that converts what follows it into a macro. Perl then replaces any subsequent use of that macro-ized module with the contents of the module. For example:

package Std::Modules;
use Filter::Macro;     # <-- The magic happens here

use strict;
use warnings;
use Carp;
use Smart::Comments;
use XML::Parser;
use File::Spec;
use IO::Prompt qw( prompt );
use File::Spec::Functions;
use File::Slurp qw( slurp );
use DateTime;
use DateTime::Duration;
use DateTime::TimeZone;
use DateTime::TimeZone::Antarctica::Mawson;
# etc.
# etc.

1;

Now, whenever you write:

#! /usr/bin/perl

use Std::Modules;

all of those other use statements inside Std::Modules are pasted into your code, in place of the use Std::Modules statement itself.
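For comparison only, the sketch below shows the two effects such a toolkit module needs, pragmas in the caller's lexical scope and symbols in the caller's namespace, done by hand in an import( ) method rather than by a source filter. This is not how Filter::Macro works. Std::Demo is a hypothetical package defined inline, and Carp's croak( ) stands in for the longer list of imports.

```perl
{
    package Std::Demo;    # hypothetical module, defined inline for the demo

    use strict;
    use warnings;
    use Carp ();

    sub import
    {
        my $caller = caller;

        # Pragmas apply to whatever scope is being compiled when import()
        # runs, which is the file that loads this module.
        strict->import;
        warnings->import;

        # Install a function into the caller's namespace by hand.
        no strict 'refs';
        *{"${caller}::croak"} = \&Carp::croak;
        return;
    }
}

# Stands in for "use Std::Demo;" because the package lives in this file.
BEGIN { Std::Demo->import }

# An undeclared variable no longer compiles, so strict reached this scope.
my $ok = eval q{ $undeclared = 1; 1 };
print $ok ? "strict is off here\n" : "strict is on here\n";
print defined &main::croak ? "croak imported\n" : "croak missing\n";
```

The hand-written approach means maintaining the import( ) method yourself and handling each module's export style individually, which is the tedium the macro approach avoids.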

Hacking the Hack

There's also a more modular and powerful variation on this idea available. The Toolkit module (also on CPAN) allows you to specify a collection of standard module inclusions as separate files in a standard directory structure. Once you have them set up, you can automatically use them all just by writing:

#! /usr/bin/perl

use Toolkit;

The advantage of this approach is that you can also set up "conditional usages": files that tell Toolkit to import specific subroutines from specific modules, but only when something actually uses those subroutines. For example, you can tell Toolkit not to always load:

use IO::Prompt qw( prompt );
use File::Slurp qw( slurp );

but only to load the IO::Prompt module if something actually uses the prompt( ) subroutine, and likewise to defer loading File::Slurp for slurp( ) until actually necessary.

That way, you can safely specify dozens of handy subroutines and modules in your standard toolkit, but only pay the loading costs for those you actually use.
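The general idea of paying for a module only on first use can be sketched with a hand-written stub. This is an illustration of deferred loading under stated assumptions, not a description of how Toolkit implements it. File::Basename is a core module chosen only so that the example runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical lazy stub: File::Basename is not loaded until the first
# call to basename().
sub basename
{
    require File::Basename;

    # Replace this stub with the real function for all later calls.
    no warnings 'redefine';
    *basename = \&File::Basename::basename;

    # Hand the current arguments to the real function.
    goto &File::Basename::basename;
}

print exists $INC{'File/Basename.pm'} ? "loaded\n" : "not loaded yet\n";
print basename('/tmp/report.txt'), "\n";
print exists $INC{'File/Basename.pm'} ? "loaded\n" : "not loaded yet\n";
```

A program that never calls basename( ) never loads File::Basename at all, which is the trade-off the conditional usages describe.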