1.9 CPAN Modules


The Comprehensive Perl Archive Network (CPAN, http://www.cpan.org) is an impressively large collection of Perl code (mostly Perl modules). CPAN is easily accessible and searchable on the Web, and you can use its modules for a variety of programming tasks.

By now you should have the basic idea of how modules are defined and used, so let's take some time to explore CPAN to see what goodies are available.

There are two important points about CPAN. First, a large number of the things you might want your programs to do have already been programmed and are easily obtained in downloadable modules. You just have to go find them at CPAN, install them on your computer, and call them from your program. We'll take a look at an example of exactly that in this section.

Second, all code on CPAN is free of charge and available for use under a very unrestrictive copyright declaration. Sound good? Keep reading.

CPAN includes convenient ways to search for useful modules, and there's a CPAN.pm module built into Perl that makes downloading and installing modules quite easy (when things work well, which they usually do). If you can't find CPAN.pm, you should consider updating your version of Perl.

You can find more information by typing the following at the command line:

 perldoc CPAN 

You can also check the Frequently Asked Questions (FAQ) available at the CPAN web site.
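If you're not sure whether CPAN.pm is present on your system, a short script can tell you. The following is a minimal sketch, not part of CPAN itself; the subroutine name cpan_info is made up for this illustration. It loads CPAN.pm at run time and reports its version and location:

```perl
use strict;
use warnings;

# Return the version and file path of CPAN.pm, or an empty list if the
# module cannot be loaded from the directories in @INC.
# (cpan_info is a hypothetical helper name used only in this sketch.)
sub cpan_info {
    return unless eval { require CPAN; 1 };
    return (CPAN->VERSION, $INC{'CPAN.pm'});
}

my ($version, $path) = cpan_info();
if (defined $version) {
    print "CPAN.pm version $version found at $path\n";
}
else {
    print "CPAN.pm was not found; consider updating Perl\n";
}
```

Wrapping require in eval keeps the script from dying when the module is missing, so it can print a friendlier message instead.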

1.9.1 What's Available at CPAN?

The CPAN web site offers several "views" of the CPAN collection of modules and several alternate ways of searching (by module name, category, full text search of the module documentation, etc.). Here is the top-level organization of the modules by overall category:

 Development Support
 Operating System Interfaces
 Networking Devices IPC
 Data Type Utilities
 Database Interfaces
 User Interfaces
 Language Interfaces
 File Names Systems Locking
 String Lang Text Proc
 Opt Arg Param Proc
 Internationalization Locale
 Security and Encryption
 World Wide Web HTML HTTP CGI
 Server and Daemon Utilities
 Archiving and Compression
 Images Pixmaps Bitmaps
 Mail and Usenet News
 Control Flow Utilities
 File Handle Input Output
 Microsoft Windows Modules
 Miscellaneous Modules
 Commercial Software Interfaces
 Not In Modulelist

1.9.2 Searching CPAN

CPAN's main web page has a few ways to search the contents. Let's say you need to perform some statistics and are looking for code that's already available. We'll go through the steps necessary to search for the code, download and install it, and use the module in a program.

At the main CPAN page, look for "searching" and click on search.cpan.org. If you search for "statistics" in all locations, you'll get over 300 hits, so you should restrict your search to modules with the pull-down menu. You'll get 25 hits (more by the time you read this); here's what you'll see:

  1. Statistics::Candidates
     Statistics-MaxEntropy-0.9 - 26 Nov 1998 - Hugo WL ter Doest
  2. Statistics::ChiSquare
     How random is your data?
     Statistics-ChiSquare-0.3 - 23 Nov 2001 - Jon Orwant
  3. Statistics::Contingency
     Calculate precision, recall, F1, accuracy, etc.
     Statistics-Contingency-0.03 - 09 Aug 2002 - Ken Williams
  4. Statistics::DEA
     Discontiguous Exponential Averaging
     Statistics-DEA-0.04 - 17 Aug 2002 - Jarkko Hietaniemi
  5. Statistics::Descriptive
     Module of basic descriptive statistical functions.
     Statistics-Descriptive-2.4 - 26 Apr 1999 - Colin Kuskie
  6. Statistics::Distributions
     Perl module for calculating critical values of common statistical distributions
     Statistics-Distributions-0.07 - 22 Jun 2001 - Michael Kospach
  7. Statistics::Frequency
     simple counting of elements
     Statistics-Frequency-0.02 - 24 Apr 2002 - Jarkko Hietaniemi
  8. Statistics::GaussHelmert
     General weighted least squares estimation
     Statistics-GaussHelmert-0.05 - 18 Apr 2002 - Stephan Heuel
  9. Statistics::LTU
     An implementation of Linear Threshold Units
     Statistics-LTU-2.8 - 27 Feb 1997 - Tom Fawcett
 10. Statistics::Lite
     Small stats stuff.
     Statistics-Lite-1.02 - 15 Apr 2002 - Brian Lalonde
 11. Statistics::MaxEntropy
     Statistics-MaxEntropy-0.9 - 26 Nov 1998 - Hugo WL ter Doest
 12. Statistics::OLS
     perform ordinary least squares and associated statistics, v 0.07.
     Statistics-OLS-0.07 - 13 Oct 2000 - Sanford Morton
 13. Statistics::ROC
     receiver-operator-characteristic (ROC) curves with nonparametric confidence bounds
     Statistics-ROC-0.01 - 22 Jul 1998 - Hans A. Kestler
 14. Statistics::Regression
     weighted linear regression package (line+plane fitting)
     StatisticsRegression - 26 May 2001 - ivo welch
 15. Statistics::SparseVector
     Perl5 extension for manipulating sparse bitvectors
     Statistics-MaxEntropy-0.9 - 26 Nov 1998 - Hugo WL ter Doest
 16. Statistics::Descriptive::Discrete
     Compute descriptive statistics for discrete data sets.
     Statistics-Descriptive-Discrete-0.07 - 13 Jun 2002 - Rhet Turnbull
 17. Bio::Tree::Statistics
     Calculate certain statistics for a Tree
     bioperl-1.0.2 - 16 Jul 2002 - Ewan Birney
 18. Device::ISDN::OCLM::Statistics
     OCLM statistics superclass
     Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
 19. Device::ISDN::OCLM::CurrentStatistics
     OCLM current call statistics
     Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
 20. Device::ISDN::OCLM::ISDNStatistics
     OCLM ISDN statistics
     Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
 21. Device::ISDN::OCLM::Last10Statistics
     OCLM Last10 call statistics
     Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
 22. Device::ISDN::OCLM::LastStatistics
     OCLM last call statistics
     Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
 23. Device::ISDN::OCLM::ManualStatistics
     OCLM manual call statistics
     Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
 24. Device::ISDN::OCLM::SPStatistics
     OCLM service provider statistics
     Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
 25. Device::ISDN::OCLM::SystemStatistics
     OCLM system statistics
     Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes

Let's check out the Statistics::ChiSquare module.

First, click on the link to Statistics::ChiSquare; you'll see a summary of the module, complete with a description, overview, discussion of the method, examples of use, and information about the author.

This module looks interesting; let's download and install it. How big is the source code? If you click on the source link, you'll find that the module is really just one short subroutine with the documentation defined right in the module. Here's the subroutine definition part of the module:

 package Statistics::ChiSquare;

 # ChiSquare.pm
 #
 # Jon Orwant, orwant@media.mit.edu
 #
 # 31 Oct 95, revised Mon Oct 18 12:16:47 1999, and again November 2001
 # to fix an off-by-one error
 #
 # Copyright 1995, 1999, 2001 Jon Orwant.  All rights reserved.
 # This program is free software; you can redistribute it and/or
 # modify it under the same terms as Perl itself.
 #
 # Version 0.3.  Module list status is "Rdpf"

 use strict;
 use vars qw($VERSION @ISA @EXPORT);

 require Exporter;
 require AutoLoader;

 @ISA = qw(Exporter AutoLoader);
 # Items to export into callers namespace by default. Note: do not export
 # names by default without a very good reason. Use EXPORT_OK instead.
 # Do not simply export all your public functions/methods/constants.
 @EXPORT = qw(chisquare);

 $VERSION = '0.3';

 my @chilevels = (100, 99, 95, 90, 70, 50, 30, 10, 5, 1);
 my %chitable = (  );

 # assume the expected probability distribution is uniform
 sub chisquare {
     my @data = @_;
     @data = @{$data[0]} if @data == 1 and ref($data[0]);
     my $degrees_of_freedom = scalar(@data) - 1;
     my ($chisquare, $num_samples, $expected, $i) = (0, 0, 0, 0);
     if (! exists($chitable{$degrees_of_freedom})) {
         return "I can't handle ", scalar(@data),
          " choices without a better table.";
     }
     foreach (@data) { $num_samples += $_ }
     $expected = $num_samples / scalar(@data);
     return "There's no data!" unless $expected;
     foreach (@data) {
         $chisquare += (($_ - $expected) ** 2) / $expected;
     }
     foreach (@{$chitable{$degrees_of_freedom}}) {
         if ($chisquare < $_) {
             return
              "There's a <$chilevels[$i+1]% and <$chilevels[$i]% chance that this data is random.";
         }
         $i++;
     }
     return "There's a <$chilevels[$#chilevels]% chance that this data is random.";
 }

 $chitable{1} = [0.00016, 0.0039, 0.016, 0.15, 0.46, 1.07, 2.71, 3.84, 6.64];
 $chitable{2} = [0.020,   0.10,   0.21,  0.71, 1.39, 2.41, 4.60, 5.99, 9.21];
 $chitable{3} = [0.12,    0.35,   0.58,  1.42, 2.37, 3.67, 6.25, 7.82, 11.34];
 $chitable{4} = [0.30,    0.71,   1.06,  2.20, 3.36, 4.88, 7.78, 9.49, 13.28];
 $chitable{5} = [0.55,    1.14,   1.61,  3.00, 4.35, 6.06, 9.24, 11.07, 15.09];
 $chitable{6} = [0.87,    1.64,   2.20,  3.83, 5.35, 7.23, 10.65, 12.59, 16.81];
 $chitable{7} = [1.24,    2.17,   2.83,  4.67, 6.35, 8.38, 12.02, 14.07, 18.48];
 $chitable{8} = [1.65,    2.73,   3.49,  5.53, 7.34, 9.52, 13.36, 15.51, 20.09];
 $chitable{9} = [2.09,    3.33,   4.17, 6.39, 8.34, 10.66, 14.68, 16.92, 21.67];
 $chitable{10} = [2.56,   3.94,   4.86, 7.27, 9.34, 11.78, 15.99, 18.31, 23.21];
 $chitable{11} = [3.05,   4.58,  5.58, 8.15, 10.34, 12.90, 17.28, 19.68, 24.73];
 $chitable{12} = [3.57,   5.23, 6.30, 9.03, 11.34, 14.01, 18.55, 21.03, 26.22];
 $chitable{13} = [4.11,   5.89, 7.04, 9.93, 12.34, 15.12, 19.81, 22.36, 27.69];
 $chitable{14} = [4.66,   6.57, 7.79, 10.82, 13.34, 16.22, 21.06, 23.69, 29.14];
 $chitable{15} = [5.23,   7.26, 8.55, 11.72, 14.34, 17.32, 22.31, 25.00, 30.58];
 $chitable{16} = [5.81,   7.96, 9.31, 12.62, 15.34, 18.42, 23.54, 26.30, 32.00];
 $chitable{17} = [6.41,  8.67, 10.09, 13.53, 16.34, 19.51, 24.77, 27.59, 33.41];
 $chitable{18} = [7.00,  9.39, 10.87, 14.44, 17.34, 20.60, 25.99, 28.87, 34.81];
 $chitable{19} = [7.63, 10.12, 11.65, 15.35, 18.34, 21.69, 27.20, 30.14, 36.19];
 $chitable{20} = [8.26, 10.85, 12.44, 16.27, 19.34, 22.78, 28.41, 31.41, 37.57];

 1;

Some of this code will look familiar; some may not. Check out the use of package, use strict, and require Exporter; they're parts of Perl you've just seen.

You'll also see references to version, AutoLoader, use vars, and an initialization of a multidimensional data structure, %chitable (a hash whose values are references to arrays), all of which will be covered later. For now, you may want to take a quick read-through of the code and get some personal satisfaction at how much of it makes sense.
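To get a feel for how the table in the module is laid out, here is a minimal sketch that copies just the first two rows of it into a stand-alone script. Only the two rows come from the module; the print statements are illustrative:

```perl
use strict;
use warnings;

# Two rows copied from the module's table. Each key is a number of
# degrees of freedom; each value is a reference to an array of
# chi-square critical values.
my %chitable = (
    1 => [0.00016, 0.0039, 0.016, 0.15, 0.46, 1.07, 2.71, 3.84, 6.64],
    2 => [0.020,   0.10,   0.21,  0.71, 1.39, 2.41, 4.60, 5.99, 9.21],
);

my $row = $chitable{2};                       # an array reference
print "Row type: ", ref($row), "\n";
print "Number of entries: ", scalar(@{$row}), "\n";
print "First entry: $chitable{2}[0]\n";       # index directly through the hash
print "Last entry: $chitable{2}[-1]\n";

# The module uses exists() in the same way to reject data sets
# for which it has no row.
print "Row for 25 degrees of freedom? ",
    (exists $chitable{25} ? "yes" : "no"), "\n";
```

The chisquare subroutine walks one such row from left to right and stops at the first critical value that is larger than the computed statistic.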

Indeed, one of the really nice things about most modules is that you don't really have to read the code very often. Usually you can just install the module, read enough of the documentation to see how to call it from your program, and you're off and running. Let's take that approach now.

1.9.3 Installing Modules Using CPAN.pm

Our next task is to install the module using CPAN.pm. This section contains a log from when I installed Statistics::ChiSquare on my Linux computer using CPAN.pm.

In fact, to make things easy, here's the section of the CPAN FAQ that addresses installing modules:

 How do I install Perl modules?

 Installing a new module can be as simple as typing
 perl -MCPAN -e 'install Chocolate::Belgian'. The CPAN.pm documentation has
 more complete instructions on how to use this convenient tool.

 If you are uncomfortable with having something take that much control over
 your software installation, or it otherwise doesn't work for you, the
 perlmodinstall documentation covers module installation for UNIX, Windows
 and Macintosh in more familiar terms.

 Finally, if you're using ActivePerl on Windows, the PPM (Perl Package
 Manager) has much of the same functionality as CPAN.pm.
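The one-line command in the FAQ can also be written as a small script, which is handy if you want to install several modules at once. This is a sketch only: CPAN::Shell->install is described in the CPAN.pm documentation as part of its programmer's interface, while the subroutine name install_modules and the script name used below are made up for this example.

```perl
use strict;
use warnings;

# Install each named module through CPAN.pm's programmer's interface.
# (install_modules is a hypothetical helper name.)
sub install_modules {
    my @modules = @_;
    require CPAN;                         # load CPAN.pm only when needed
    CPAN::Shell->install($_) for @modules;
    return scalar @modules;               # how many installs were requested
}

if (@ARGV) {
    install_modules(@ARGV);
}
else {
    print "usage: $0 Module::Name [Another::Module ...]\n";
}
```

Saved as, say, install_modules.pl, it would be run as perl install_modules.pl Statistics::ChiSquare. As with the one-liner, CPAN.pm may ask some configuration questions the first time it runs, and you need permission to write to the installation directories.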

The following is my install log. Notice that all I have to do is type a couple of lines, and everything else that follows is automatic!

 [tisdall@coltrane tisdall]$ perl -MCPAN -e 'install Statistics::ChiSquare'
 CPAN: Storable loaded ok
 mkdir /root/.cpan: Permission denied at /usr/local/lib/perl5/5.6.1/CPAN.pm line 2218
 [tisdall@coltrane tisdall]$ su
 Password:
 [root@coltrane tisdall]# perl -MCPAN -e 'install Statistics::ChiSquare'
 CPAN: Storable loaded ok
 Going to read /root/.cpan/Metadata
   Database was generated on Wed, 20 Mar 2002 00:39:29 GMT
 CPAN: LWP::UserAgent loaded ok
 Fetching with LWP:
   ftp://cpan.cse.msu.edu/authors/01mailrc.txt.gz
 Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
 CPAN: Compress::Zlib loaded ok
 Fetching with LWP:
   ftp://cpan.cse.msu.edu/modules/02packages.details.txt.gz
 Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
   Database was generated on Mon, 26 Aug 2002 00:22:07 GMT
   There's a new CPAN.pm version (v1.62) available!
   [Current version is v1.59_54]
   You might want to try
     install Bundle::CPAN
     reload cpan
   without quitting the current session. It should be a seamless upgrade
   while we are running...
 Fetching with LWP:
   ftp://cpan.cse.msu.edu/modules/03modlist.data.gz
 Going to read /root/.cpan/sources/modules/03modlist.data.gz
 Going to write /root/.cpan/Metadata
 Running install for module Statistics::ChiSquare
 Running make for J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz
 Fetching with LWP:
   ftp://cpan.cse.msu.edu/authors/id/J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz
 CPAN: MD5 loaded ok
 Fetching with LWP:
   ftp://cpan.cse.msu.edu/authors/id/J/JO/JONO/CHECKSUMS
 Checksum for /root/.cpan/sources/authors/id/J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz ok
 Scanning cache /root/.cpan/build for sizes
 Deleting from cache: /root/.cpan/build/IO-stringy-2.108 (21.4>20.0 MB)
 Deleting from cache: /root/.cpan/build/XML-Node-0.11 (20.8>20.0 MB)
 Deleting from cache: /root/.cpan/build/bioperl-0.7.2 (20.7>20.0 MB)
 Statistics/ChiSquare-0.3/
 Statistics/ChiSquare-0.3/ChiSquare.pm
 Statistics/ChiSquare-0.3/Makefile.PL
 Statistics/ChiSquare-0.3/test.pl
 Statistics/ChiSquare-0.3/Changes
 Statistics/ChiSquare-0.3/MANIFEST
 Package seems to come without Makefile.PL.
   (The test -f "/root/.cpan/build/Statistics/Makefile.PL" returned false.)
   Writing one on our own (setting NAME to StatisticsChiSquare)
   CPAN.pm: Going to build J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz
 Checking if your kit is complete...
 Looks good
 Writing Makefile for Statistics::ChiSquare
 Writing Makefile for StatisticsChiSquare
 make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
 cp ChiSquare.pm ../blib/lib/Statistics/ChiSquare.pm
 AutoSplitting ../blib/lib/Statistics/ChiSquare.pm (../blib/lib/auto/Statistics/ChiSquare)
 Manifying ../blib/man3/Statistics::ChiSquare.3
 make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
   /usr/bin/make  -- OK
 Running make test
 make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
 make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
 make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
 PERL_DL_NONLAZY=1 /usr/bin/perl -I../blib/arch -I../blib/lib -I/usr/local/lib/perl5/5.6.1/i686-linux -I/usr/local/lib/perl5/5.6.1 test.pl
 1..2
 ok 1
 ok 2
 make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
   /usr/bin/make test -- OK
 Running make install
 make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
 make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
 Installing /usr/local/lib/perl5/site_perl/5.6.1/Statistics/ChiSquare.pm
 Installing /usr/local/lib/perl5/site_perl/5.6.1/auto/Statistics/ChiSquare/autosplit.ix
 Installing /usr/local/man/man3/Statistics::ChiSquare.3
 Writing /usr/local/lib/perl5/site_perl/5.6.1/i686-linux/auto/StatisticsChiSquare/.packlist
 Appending installation info to /usr/local/lib/perl5/5.6.1/i686-linux/perllocal.pod
   /usr/bin/make install UNINST=1 -- OK
 [root@coltrane tisdall]#

This may seem like a confusing amount of output, but, again, all you have to do is type a couple of lines, and the installation follows automatically.

You may get something like the following message when you try to install a CPAN module:

 [tisdall@coltrane tisdall]$ perl -MCPAN -e 'install Statistics::ChiSquare'
 CPAN: Storable loaded ok
 mkdir /root/.cpan: Permission denied at /usr/local/lib/perl5/5.6.1/CPAN.pm line 2218

As you can see, it didn't work, and it produced an error message. On Unix machines, it's often necessary to become root to install things. [2] In that case, use the Unix su command and try the CPAN command again:

[2] You may need to contact your system administrator about getting root permission. The CPAN documentation discusses how to do a non-root installation. If you're not on a Unix or Linux machine and are using ActiveState's Perl on a Windows machine, for instance, you need to consult that documentation.

 [tisdall@coltrane tisdall]$ su
 Password:
 [root@coltrane tisdall]# perl -MCPAN -e 'install Statistics::ChiSquare'

Great, it worked. If you look over the rather verbose output, you'll see that it finds the module, installs it, tests it, and logs the installation.

Pretty easy, huh?

It's usually this easy, but not always. Occasionally, errors result, and the module may not be installed. In that case, the error messages may be enough to explain the problem; for instance, the module may depend on another module you have to install first. Another problem is that some modules haven't been tested on, or even designed to work on, all operating systems; if you try to install a Windows-specific module on Linux, it is likely to complain. In extreme cases, you can contact the author; the module documentation usually provides an email address.
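If you'd like to know ahead of time whether you're likely to hit the permission problem shown earlier, you can ask Perl where it installs site modules and test whether you can write there. This is a rough sketch under a few assumptions: the directory names come from Perl's standard Config module, the subroutine name writable_report is invented for this example, and a writable directory doesn't guarantee that every step of an installation will succeed.

```perl
use strict;
use warnings;
use Config;    # exports %Config, the settings this perl was built with

# For each directory, report whether it exists and the current user can
# write to it. Returns a list of [directory, status] pairs.
# (writable_report is a hypothetical helper name.)
sub writable_report {
    my @dirs = @_;
    my @report;
    for my $dir (@dirs) {
        my $status = (-d $dir && -w _) ? 'writable' : 'not writable';
        push @report, [$dir, $status];
    }
    return @report;
}

# Directories where site-installed modules and their manual pages usually go.
my @targets = grep { defined && length }
    @Config{qw(installsitelib installsitearch installsiteman3dir)};

for my $entry (writable_report(@targets)) {
    printf "%-14s %s\n", $entry->[1], $entry->[0];
}
```

If the directories are reported as not writable, you'll probably need root permission or a non-root installation, as discussed in the CPAN documentation.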

1.9.4 Using the Newly Installed CPAN Module

Now comes the payoff. Let's look again at the documentation for the module and see if we can use it from our own Perl code.

Now that the module is installed, you can see the documentation by typing:

 perldoc Statistics::ChiSquare 

You can also simply go back to the web documentation found at http://search.cpan.org. Either way, you'll find the following example using this ChiSquare module:

 NAME
        "Statistics::ChiSquare" - How random is your data?

 SYNOPSIS
         use Statistics::Chisquare;

         print chisquare(@array_of_numbers);

         Statistics::ChiSquare is available at a CPAN site near
         you.

 DESCRIPTION
         Suppose you flip a coin 100 times, and it turns up heads
         70 times.  Is the coin fair?

         Suppose you roll a die 100 times, and it shows 30 sixes.
         Is the die loaded?

         In statistics, the chi-square test calculates "how random"
         a series of numbers is.  But it doesn't simply say "yes"
         or "no".  Instead, it gives you a confidence interval,
         which sets upper and lower bounds on the likelihood that
         the variation in your data is due to chance.  See the
         examples below.
 ...

The documentation continues with more discussion and some concrete examples that use the module and interpret the results.

Very often, the SYNOPSIS part of the documentation is all you need to look at. It shows you specific examples of how to call the code in the module. In this case, because it's a very simple module, there is just one subroutine that can be used. As you see from the documentation excerpt, you just need to pass the chisquare subroutine an array of numbers and print out the return value to use the code. Let's try it. We'll take as our input an array of numbers that corresponds to the stops of the Broadway-7th Avenue local subway train on the west side of Manhattan, from 14th Street up to 137th Street in Harlem. (We'll assume you didn't run fast enough and missed the A train.) Let's see how random these stops really are:

 use strict;
 use warnings;
 use Statistics::ChiSquare;

 my(@subwaystops) = (14, 18, 23, 28, 34, 42, 50, 59, 66, 72, 79, 86, 96, 103, 110,
                     116, 125, 137);

 print chisquare(@subwaystops);

This produces the output:

 There's a <1% chance that this data is random. 

(Knowing firsthand the feelings of long-suffering New York City Subway riders, I predict that this result might provoke some spirited discussion. Nevertheless, we seem to have working code.)
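To see where that answer comes from, you can redo the module's arithmetic by hand. The sketch below doesn't use the installed module; it repeats the calculation from the module's source shown earlier, in which each number is treated as an observed count and the expected count is the average of all the numbers. The subroutine name chisquare_statistic is made up for this illustration.

```perl
use strict;
use warnings;

# Chi-square statistic against a uniform expectation: the expected
# value for every item is the mean of all the values.
# (chisquare_statistic is a hypothetical name, not part of the module.)
sub chisquare_statistic {
    my @observed = @_;
    return unless @observed;

    my $total = 0;
    $total += $_ for @observed;
    my $expected = $total / @observed;
    return unless $expected;

    my $statistic = 0;
    $statistic += ($_ - $expected) ** 2 / $expected for @observed;
    return $statistic;
}

my @subwaystops = (14, 18, 23, 28, 34, 42, 50, 59, 66, 72, 79, 86, 96, 103, 110,
                   116, 125, 137);

printf "degrees of freedom:   %d\n",   @subwaystops - 1;
printf "chi-square statistic: %.2f\n", chisquare_statistic(@subwaystops);
```

With 18 stops there are 17 degrees of freedom, and the statistic comes out at about 368.67. That is far larger than 33.41, the last entry in the module's row for 17 degrees of freedom, so the loop in chisquare never finds a larger table value and the subroutine falls through to its final "<1%" message.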

1.9.5 Problems with CPAN Modules

Actually, the sharp-eyed reader may have noticed a problem in our mad dash uptown. In the first line of the SYNOPSIS section, there's the following:

 use Statistics::Chisquare; 

The name of the module is spelled Chisquare, whereas in all other places in the documentation the module is spelled ChiSquare with a capital S. In Perl, the case of a letter, uppercase or lowercase, is important, and this looks suspiciously like a typographical error in the documentation. If you try use Statistics::Chisquare, you'll discover that the module can't be found, whereas if you try use Statistics::ChiSquare, the module is there. This is a minor bug, but some modules have poor documentation, and it can be a time-consuming problem, especially if you are forced to wade into the module code or try various tests to figure out how the module works.

Apart from bugs, I've also mentioned the problem that some modules are not tested, or designed, for all operating systems. In addition, many modules require other modules to be present. It's possible to configure CPAN to automatically install all the required modules a requested module uses, as described in the CPAN documentation, but you may need to intervene personally. It's useful to remember that if you have a program that uses a certain module running on one computer, and you move the program to another computer, you may have to install the required modules on the new computer as well.
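When you move a program to another computer, one way to find out what's missing is to try loading each module the program needs and list the ones that fail. The following is a minimal sketch; the subroutine name missing_modules and the list of required modules are assumptions made for this example.

```perl
use strict;
use warnings;

# Return the names from the list that cannot be loaded on this machine.
# (missing_modules is a hypothetical helper name.)
sub missing_modules {
    my @wanted = @_;
    my @missing;
    for my $module (@wanted) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# An illustrative list; replace it with the modules your program uses.
my @required = qw(List::Util Statistics::ChiSquare);

my @missing = missing_modules(@required);
if (@missing) {
    print "Missing: $_ (try: perl -MCPAN -e 'install $_')\n" for @missing;
}
else {
    print "All required modules are installed\n";
}
```

Because module names are case-sensitive, a misspelling such as Statistics::Chisquare would normally show up in the missing list too, although a file system that ignores case may behave differently.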

Saving the worst for last, it's also important to remember that contributing to CPAN is open to one and all, and not all the code there is well-written or well-tested. The heavily used modules are, but counterexamples can be found. So, don't bet the farm on your code just because it uses a CPAN module; you should still carefully read the documentation for the module and test your program.

The CPAN FAQ explains in detail the way to be a good citizen when it comes to testing and reporting bugs that you discover in CPAN code.



Mastering Perl for Bioinformatics
ISBN: 0596003072
Year: 2003
Pages: 156
