3.7 Gene2.pm: A Second Example of a Perl Class


Gene1 demonstrated the fundamentals of a Perl class. Now, I'll build a more realistic example, which also includes a few additional standard Perl techniques.

My goal is to present an example that you can imitate in order to begin developing your own OO software. I'm going to build the example in three more stages, expanding upon the Gene1.pm module. First, I'll add mutators, which are methods that alter the data in an object. I'll also add a method that gives information about the class as a whole, returning the count of how many objects of the class exist in the running program. This depends on the use of closures, subroutines that use variables declared outside themselves. This is the new material in the Gene2.pm module.

After that step, I introduce the AUTOLOAD mechanism, which provides a single class method called AUTOLOAD that can define large numbers of other methods and significantly reduce the amount of code you need to write to develop a more complex object (among other benefits to be described later). That will be the Gene3.pm module.

We'll end up with a Gene.pm module you can use as a basis for your own Perl module development. It will add a mechanism to specify what properties each attribute has (which can prevent improper data manipulation, for instance). It will show how to initialize an object with class defaults and how to clone an existing object. Finally, Gene.pm will show you how to incorporate the documentation for a class right in the Perl code for the class.

Here is the code for the intermediate Gene2.pm module. Following the Gene2.pm module is an example of the code and output of a small test program that drives the module. Take a minute to look at these two code examples, especially at the comments. The module Gene2.pm contains several new details that will be discussed following the code. The test program should be fairly easy to read and understand.

    package Gene2;

    #
    # A second version of the Gene.pm module
    #

    use strict;
    use warnings;
    use Carp;

    # Class data and methods, that refer to the collection of all objects
    # in the class, not just one specific object
    {
        my $_count = 0;
        sub get_count {
            $_count;
        }
        sub _incr_count {
            ++$_count;
        }
        sub _decr_count {
            --$_count;
        }
    }

    # The constructor for the class
    sub new {
        my ($class, %arg) = @_;
        my $self = bless {
            _name        => $arg{name}       || croak("Error: no name"),
            _organism    => $arg{organism}   || croak("Error: no organism"),
            _chromosome  => $arg{chromosome} || "????",
            _pdbref      => $arg{pdbref}     || "????",
        }, $class;
        $class->_incr_count();
        return $self;
    }

    # Accessors, for reading the values of data in an object
    sub get_name        { $_[0] -> {_name}       }
    sub get_organism    { $_[0] -> {_organism}   }
    sub get_chromosome  { $_[0] -> {_chromosome} }
    sub get_pdbref      { $_[0] -> {_pdbref}     }

    # Mutators, for writing the values of object data
    sub set_name {
        my ($self, $name) = @_;
        $self -> {_name} = $name if $name;
    }
    sub set_organism {
        my ($self, $organism) = @_;
        $self -> {_organism} = $organism if $organism;
    }
    sub set_chromosome {
        my ($self, $chromosome) = @_;
        $self -> {_chromosome} = $chromosome if $chromosome;
    }
    sub set_pdbref {
        my ($self, $pdbref) = @_;
        $self -> {_pdbref} = $pdbref if $pdbref;
    }

    1;

Here is the small test program testGene2 that demonstrates how to use the objects and methods in this version, Gene2, of our OO class:

    #!/usr/bin/perl

    #
    # Test the second version of the Gene module
    #

    use strict;
    use warnings;

    # Change this line to show the folder where you store Gene2.pm
    use lib "/home/tisdall/MasteringPerlBio/development/lib";

    use Gene2;

    #
    # Create object, print values
    #
    print "Object 1:\n\n";

    my $obj1 = Gene2->new(
            name          => "Aging",
            organism      => "Homo sapiens",
            chromosome    => "23",
            pdbref        => "pdb9999.ent"
    );

    print $obj1->get_name, "\n";
    print $obj1->get_organism, "\n";
    print $obj1->get_chromosome, "\n";
    print $obj1->get_pdbref, "\n";

    #
    # Create another object, print values ... some will be unset
    #
    print "\n\nObject 2:\n\n";

    my $obj2 = Gene2->new(
            organism    => "Homo sapiens",
            name        => "Aging",
    );

    print $obj2->get_name, "\n";
    print $obj2->get_organism, "\n";
    print $obj2->get_chromosome, "\n";
    print $obj2->get_pdbref, "\n";

    #
    # Reset some of the values, print them
    #
    $obj2->set_name("RapidAging");
    $obj2->set_chromosome("22q");
    $obj2->set_pdbref("pdf9876.ref");

    print "\n\n";

    print $obj2->get_name, "\n";
    print $obj2->get_organism, "\n";
    print $obj2->get_chromosome, "\n";
    print $obj2->get_pdbref, "\n";

    print "\nCount is ", Gene2->get_count, "\n\n";

    #
    # Create another object, print values: but this fails
    # because the "name" value is required (see the "new"
    # constructor in Gene2.pm)
    #
    print "\n\nObject 3:\n\n";

    my $obj3 = Gene2->new(
            organism      => "Homo sapiens",
            chromosome    => "23",
            pdbref        => "pdb9999.ent"
    );

    print "\nCount is ", Gene2->get_count, "\n\n";

Finally, here's the output from the test program testGene2:

    Object 1:

    Aging
    Homo sapiens
    23
    pdb9999.ent


    Object 2:

    Aging
    Homo sapiens
    ????
    ????


    RapidAging
    Homo sapiens
    22q
    pdf9876.ref

    Count is 2



    Object 3:

    Error: no name at testGene2 line 68

It's a good idea to take a moment to read through this Gene2.pm module, the test program testGene2, and the output. Compare this new Gene2 module with the earlier Gene1 module. In particular, notice where the methods are defined in the module, and then how they are actually used in the test program. Don't get hung up on the details in this first reading; just look at the overall picture. Notice that the definitions are all in the module Gene2.pm, which is then loaded at the beginning of the test program testGene2; it is testGene2 that actually creates the module's objects and uses the module's methods on those objects. In other words, testGene2 is a program; Gene2.pm is a definition of a class that is used in testGene2.

Let's begin examining the module code.

3.7.1 Closures

A closure keeps track of class data. Class data refers not to a particular object, but to several, possibly all, objects of a class that have been created during the running of your program. This is frequently important to do. For instance, say you have a DNA sequencing pipeline that can handle only 20 sequences at any one time. You'd want your controlling program to block any attempt to create more than 20 sequence objects until the pipeline is ready to receive more. To do this, you would keep a count of how many sequence objects your controlling program has created. Closures are a way to program such class data.

A closure is a subroutine that uses a variable defined outside the subroutine. By surrounding such a variable and some closures that use that variable within a block, you can use the closures to access the variable from anywhere in the program, and the variable will never go out of scope and lose its value. This section will explain how this works and how to use it in your code.

The following code is new in Gene2.pm:

    # Class data and methods, that refer to the collection of all objects
    # in the class, not just one specific object
    {
        my $_count = 0;
        sub get_count {
            $_count;
        }
        sub _incr_count {
            ++$_count;
        }
        sub _decr_count {
            --$_count;
        }
    }

This code creates a variable $_count. $_count is a lexical my variable in a block of curly braces, and therefore is hidden from all parts of the code except within the block. The three methods that are also defined in the same block use the variable $_count. This variable persists throughout the life of the program because the subroutines defined with it are closures. For example, in the code for the class module Gene2.pm, I use $_count to keep a count of how many objects have been created. Notice that the method names _incr_count and _decr_count begin with a leading underscore, as does the variable name $_count. They aren't meant to be called by the user of the class but are internal to the module. On the other hand, the remaining method get_count doesn't begin with a leading underscore and is meant to be called whenever the user of the class wants to know what the count is.

The previous section of code implements a closure. It is surrounded by curly braces, creating a Perl block. You've seen many blocks associated with loops and conditionals as you learned the fundamentals of Perl. The block here stands on its own without being a part of another programming construct.

Any block, this one included, creates a new scope for the variables that occur within it. my variables (also called lexical variables) within a block exist only while the program is executing the statements within that block. When a program leaves a block by passing beyond its closing curly brace, the my variables within it go out of scope. In other words, they cease to exist, and disappear from the program until the program reenters the block, and they are created anew.
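The "created anew" behavior is easy to observe with a block that is entered several times. The following is a small illustrative sketch (the variable names @seen and $bases are mine, not from Gene2.pm): a loop body is a block, and the my variable declared inside it starts out fresh on every pass.

```perl
use strict;
use warnings;

my @seen;    # collects the value observed on each pass

for my $pass (1 .. 3) {
    # $bases is a brand-new variable each time the block is entered
    my $bases;
    $bases .= 'ACGT';
    push @seen, $bases;
}

# Nothing accumulated between passes, because the old $bases
# went out of scope at the closing brace each time
print join(',', @seen), "\n";
```

If $bases survived from one pass to the next, the later entries would grow to ACGTACGT and so on; instead each entry is just ACGT.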

The preceding paragraph is correct; however, there is one important "but."

Subroutine definitions don't go out of scope in the way that lexically scoped (my) variables do. It is also possible for a subroutine definition to affect the behavior of a lexically scoped variable. Aha. Read on.

To repeat: subroutine definitions aren't subject to the same constraints as variables with regard to my and blocks. In fact, a subroutine definition is global to the entire package in which it's declared. Perl looks for subroutine definitions at compile time, before actually running the program, and makes a subroutine definition available to an entire package no matter where the subroutine is declared, even if it's declared in a conditional block that's never reached at runtime, when the program code is actually executed.

As an example, here is a small program with a subroutine definition:

    #
    # A program to demonstrate the global nature of subroutine definitions
    #

    my $dna = 'ACGT';

    if ($dna eq 'ACGT') {
            print "This statement gets executed\n";
            print "Here's the subroutine call:\n";
            isdna($dna);
    } else {
            print "This statement does not get executed\n";

            #
            # The following subroutine definition is in a block which is
            # never executed at runtime.
            #
            sub isdna {
                    # Print the argument if it is DNA
                    if($_[0] =~ /^[ACGT]+$/i) {
                            print $_[0], "\n";
                    } else {
                            return 0;
                    }
            }
    }

This produces the following output:

    This statement gets executed
    Here's the subroutine call:
    ACGT

As you see, even though the subroutine definition is buried in a block that's never entered, not even once, it is still available to the program. Perl scans the program at compile-time, reads in any subroutine definition no matter where it is, and the subroutine definition is then available to be called from anywhere in the program at runtime.

Continuing on, in the code from Gene2.pm under consideration, there's the variable definition:

 my $_count = 0; 

which occurs outside the following subroutine definitions such as:

    sub _incr_count {
        ++$_count;
    }

The variable $_count is declared outside the subroutine _incr_count, but the subroutine uses the variable. Therefore, by definition, the subroutine _incr_count is a closure.

There's just one more piece to the puzzle. Consider again the code fragment from Gene2.pm, which I repeat here:

    # Class data and methods, that refer to the collection of all objects
    # in the class, not just one specific object
    {
        my $_count = 0;
        sub get_count {
            $_count;
        }
        sub _incr_count {
            ++$_count;
        }
        sub _decr_count {
            --$_count;
        }
    }

It seems that when the program leaves the block that encloses this code, the variable $_count should go out of scope and no longer be available to the program. However, in Gene2.pm the $_count variable doesn't cease to exist.

Because the subroutine definitions in this block are global, and because they also reference the variable $_count, Perl knows that at any point in the program you can put in a call to, say, get_count, which in turn needs the variable $_count to execute. Perl doesn't cause the variable $_count to cease to exist because it sees the closures and avoids destroying the variable they reference at runtime. At any point in the program, the value of $_count can be obtained by calling the subroutine. However, the value of $_count can't be accessed in any other way than by get_count or another closure defined within the same block.
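Both halves of this claim can be checked in a few lines. The sketch below is a standalone illustration rather than part of Gene2.pm: it repeats a trimmed version of the block in the main package, updates the count through one closure, reads it through another, and then uses a string eval to show that naming $_count directly outside the block fails to compile under use strict. The variables $direct and $error are introduced only for this demonstration.

```perl
use strict;
use warnings;

{
    my $_count = 0;                 # visible only inside this block
    sub get_count   { $_count }
    sub _incr_count { ++$_count }
}

# The closures share one persistent variable
_incr_count() for 1 .. 3;
print "Count via closure: ", get_count(), "\n";

# Outside the block the name $_count is unknown. Under "use strict"
# this string fails to compile, so eval returns undef and sets $@.
my $direct = eval q{ $_count };
my $error  = $@;
print "Direct access failed: $error" if $error;
```

The first line printed reports a count of 3; the second reports Perl's complaint that the global symbol $_count requires an explicit package name.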

To summarize, by defining a variable and a closure that uses that variable within a block, a program can limit access to that variable to calls by the closures. This is exactly what I want to do in setting up class methods that refer to the collection of all objects that are in use.

In Gene2.pm , I want to initialize the count of objects to 0 when the program starts and then increment it by one each time a new object is created. By defining _incr_count as a closure, I can call it from within the new object constructor, ensuring that the variable $_count will keep an accurate count of the number of objects that are created.
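The sequencing-pipeline scenario mentioned at the start of this section could be handled with the same technique. The following is a minimal sketch, not code from Gene2.pm: the class name SeqJob, the helper _has_room, and the limit variable $_max are all hypothetical names chosen for illustration, and returning nothing from the constructor is just one possible way to refuse a request.

```perl
use strict;
use warnings;

package SeqJob;    # hypothetical class, not part of Gene2.pm

{
    my $_count = 0;     # sequence objects created so far
    my $_max   = 20;    # assumed pipeline capacity
    sub get_count   { $_count }
    sub _has_room   { $_count < $_max }
    sub _incr_count { ++$_count }
}

sub new {
    my ($class, %arg) = @_;
    # Refuse to build an object once the pipeline is full
    return unless $class->_has_room;
    my $self = bless { _sequence => $arg{sequence} }, $class;
    $class->_incr_count();
    return $self;
}

package main;

# Ask for 25 objects; only the first 20 requests succeed
my @jobs    = map { SeqJob->new(sequence => 'ACGT') } 1 .. 25;
my $created = grep { defined } @jobs;
print "Requested 25, created $created\n";
print "Count is ", SeqJob->get_count, "\n";
```

Because $_max lives in the same block as $_count, the limit is just as protected from outside code as the count itself.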

3.7.2 Tracking Class Data from the Constructor Method

In this second version of the class, I just have to make a small change to the constructor method, the subroutine new.

Here is the modified new method constructor:

    # The constructor for the class
    sub new {
        my ($class, %arg) = @_;
        my $self = bless {
            _name        => $arg{name}       || croak("Error: no name"),
            _organism    => $arg{organism}   || croak("Error: no organism"),
            _chromosome  => $arg{chromosome} || "????",
            _pdbref      => $arg{pdbref}     || "????",
        }, $class;
        $class->_incr_count();
        return $self;
    }

First, I create the object by blessing (and initializing) an anonymous hash, as before. This time, however, I'll save the object in the lexical (my) variable $self. This allows me to add a call to the class method _incr_count in order to keep track of the total number of objects created. I'll then return the object $self from the subroutine.
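Gene2.pm defines _decr_count, but nothing in this version calls it, so the count only ever goes up. One plausible use, which is an assumption on my part rather than something this section shows, is a DESTROY method: Perl calls DESTROY automatically when the last reference to an object goes away. The sketch below uses a hypothetical stand-in class named CountedGene with a single attribute to keep it short.

```perl
use strict;
use warnings;

package CountedGene;    # hypothetical stand-in for Gene2

{
    my $_count = 0;
    sub get_count   { $_count }
    sub _incr_count { ++$_count }
    sub _decr_count { --$_count }
}

sub new {
    my ($class, %arg) = @_;
    my $self = bless { _name => $arg{name} }, $class;
    $class->_incr_count();
    return $self;
}

# Called by Perl when an object's last reference disappears
sub DESTROY {
    my ($self) = @_;
    $self->_decr_count();
}

package main;

my $kept = CountedGene->new(name => 'Aging');
{
    my $temp = CountedGene->new(name => 'RapidAging');
    print "Inside block: ", CountedGene->get_count, "\n";
}
# $temp went out of scope, so its DESTROY has already run
print "After block: ", CountedGene->get_count, "\n";
```

With a destructor like this, get_count would report the number of objects currently in existence rather than the number ever created.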

3.7.3 Accessor and Mutator Methods

In the first version, Gene1.pm, I printed the values stored in an object by calling simple accessor methods such as get_name.

In this new version, Gene2.pm, I have the same specific methods for each attribute for which I may want to see the value. I also include mutators, which are subroutines that enable the user of the class to alter the values of attributes of an object.

Here are the accessor and mutator methods for Gene2.pm:

    # Accessors, for reading the values of data in an object
    sub get_name        { $_[0] -> {_name}       }
    sub get_organism    { $_[0] -> {_organism}   }
    sub get_chromosome  { $_[0] -> {_chromosome} }
    sub get_pdbref      { $_[0] -> {_pdbref}     }

    # Mutators, for writing the values of object data
    sub set_name {
        my ($self, $name) = @_;
        $self -> {_name} = $name if $name;
    }
    sub set_organism {
        my ($self, $organism) = @_;
        $self -> {_organism} = $organism if $organism;
    }
    sub set_chromosome {
        my ($self, $chromosome) = @_;
        $self -> {_chromosome} = $chromosome if $chromosome;
    }
    sub set_pdbref {
        my ($self, $pdbref) = @_;
        $self -> {_pdbref} = $pdbref if $pdbref;
    }

The mutators collect two arguments. The first is the reference to the object, which as before, is passed automatically to the method when it is invoked (using the method set_name as an example):

 $obj->set_name('hairy'); 

The second argument collected is then the first argument given to the call, in this case, setting the gene name to hairy.

The work of the subroutine is accomplished by the line:

 $self -> {_name} = $name if $name; 

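It simply sets the internal _name attribute to the supplied name (hairy in this example) if the argument $name is supplied. If it's not supplied, the subroutine does nothing. Note that the test if $name checks for a true value, so an empty string or the string "0" is ignored just as a missing argument is.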
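Because the mutators test their argument for truth, a false value such as the string "0" is silently ignored, exactly as if no argument had been given. The sketch below contrasts the Gene2.pm style with a variant that tests defined instead. The class GeneSketch and the method set_chromosome_defined are hypothetical names used only for this comparison; whether "0" should count as a legitimate value depends on your data.

```perl
use strict;
use warnings;

package GeneSketch;    # hypothetical cut-down class for illustration

sub new {
    my ($class, %arg) = @_;
    return bless { _chromosome => $arg{chromosome} }, $class;
}
sub get_chromosome { $_[0]->{_chromosome} }

# As in Gene2.pm: ignores any false value ("", "0", 0, undef)
sub set_chromosome {
    my ($self, $chromosome) = @_;
    $self->{_chromosome} = $chromosome if $chromosome;
}

# Variant: ignores only a missing (undefined) argument
sub set_chromosome_defined {
    my ($self, $chromosome) = @_;
    $self->{_chromosome} = $chromosome if defined $chromosome;
}

package main;

my $gene = GeneSketch->new(chromosome => '23');

$gene->set_chromosome('0');            # false value: no change
print $gene->get_chromosome, "\n";     # still 23

$gene->set_chromosome_defined('0');    # defined value: stored
print $gene->get_chromosome, "\n";     # now 0
```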

Again, you see that the internal representation of the attributes of the object is hidden from the class's user. Altering an object's attributes is done with methods; the class author is then free to alter the way in which the attributes are stored, without changing the Application Programming Interface (API), the interface of the class to the outside world. If you use this class, you don't have to change your code when a new version of the class is written.

The test program testGene2 is similar to testGene1, with the addition of examples of the class mutators.
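To make the point about the API concrete, here is a minimal sketch of two interchangeable implementations. Both class names, GeneAsHash and GeneAsArray, are hypothetical: the first stores the name in a hash, as Gene2.pm does, while the second stores it in an array. The calling code is the same for both because it touches the object only through new, get_name, and set_name.

```perl
use strict;
use warnings;

package GeneAsHash;     # hypothetical: attributes stored in a hash
sub new      { my ($class, %arg) = @_; bless { _name => $arg{name} }, $class }
sub get_name { $_[0]->{_name} }
sub set_name { my ($self, $name) = @_; $self->{_name} = $name if $name }

package GeneAsArray;    # hypothetical: same API, attributes in an array
sub new      { my ($class, %arg) = @_; bless [ $arg{name} ], $class }
sub get_name { $_[0]->[0] }
sub set_name { my ($self, $name) = @_; $self->[0] = $name if $name }

package main;

# The calling code is identical for both implementations
my @names;
for my $class ('GeneAsHash', 'GeneAsArray') {
    my $gene = $class->new(name => 'Aging');
    $gene->set_name('hairy');
    push @names, $gene->get_name;
    print "$class: ", $gene->get_name, "\n";
}
```

Code that reached into the object directly, for example with $gene->{_name}, would break when run against the array-based version; code that uses only the methods would not.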



Mastering Perl for Bioinformatics
ISBN: 0596003072
Year: 2003
Pages: 156