The time has now come to write the class that creates an object out of a sequence, enzyme name (s), and the map of the location(s) of the enzyme recognition sites in the sequence. This module depends a great deal on the module Rebase developed in the previous section, but it is fairly short because it just tries to do a small job. This new Restriction class takes a Rebase object (which has the Rebase database translated to regular expressions), some sequence, and a list of enzymes, and uses the regular expressions to find the recognition sites in the sequence. (Note that it doesn't use inheritance; it simply creates a Rebase object to use.) In this module, the restriction map (the list of locations where the enzymes have recognition sites in the sequence) is obtained through an auxiliary method map_enzyme that simply lists the locations. Clearly, a more graphical display would be easier and more useful. I'll consider that as the book progresses. 5.3.1 The Restriction.pm ModuleHere is the Restriction.pm module: package Restriction; # # A class to find locations of restriction enzyme recognition sites in # DNA sequence data. # use strict; use warnings; use Carp; # Class data and methods { # A list of all attributes with default values. # "enzyme" is given as an argument possibly multiple time, set as key to _map hash my %_attributes = ( _rebase => { }, # A Rebase.pm hash-based object # key = restriction enzyme name # value = space-separated string of recognition sites => regular expressions _sequence => '', # DNA sequence data in raw format (only bases) _map => { },# a hash: keys are enzyme names, # values are arrays of locations _enzyme => '', # space- or comma-separated enzyme names, # set as key to _map hash ); # Global variable to keep count of existing objects my $_count = 0; # Return a list of all attributes sub _all_attributes { keys %_attributes; } # Manage the count of existing objects sub get_count { $_count; } sub _incr_count { ++$_count; } sub _decr_count { --$_count; } } This opening block shows that Restriction is also a class with a small and simple set of attributes. One is just the DNA sequence data, _sequence . The attribute _map is a hash that stores the computed restriction map with a key for each restriction enzyme and a value consisting of an array of the positions in the sequence where recognition sites for that enzyme occur. The attribute _enzyme is one or more enzyme names separated by spaces or commas. The remaining attribute _rebase is an object of the class I developed in the last section, the class Rebase . Recall that a Rebase object gives you the ability to get regular expressions for recognition sites of restriction enzymes. When you call Restriction->new you must include as one of its arguments a Rebase object, as is shown in the example given in the POD documentation later in this section. Since a program can have many restriction maps, I also maintain the count of Restriction objects. 5.3.1.1 Initializing Restriction objectsNext , the new constructor method creates and initializes Restriction objects: # The "new" constructor method, called from class, e.g. sub new { my ($class, %args) = @_; # Create a new object my $self = bless { }, $class; # Set the attributes for the provided arguments foreach my $attribute ($self->_all_attributes( )) { # E.g. attribute = "_name", argument = "name" my($argument) = ($attribute =~ /^_(.*)/); if (exists $args{$argument}) { if($argument eq 'enzyme') { # permit space or comma separated enzyme names $args{$argument} =~ s/,/ /g; } $self->{$attribute} = $args{$argument}; } } # Check that the correct arguments are given if( not defined $self->{_rebase} ) { croak "A Rebase object must be given as an argument"; }elsif( ref($self->{_rebase}) ne 'Rebase' ) { croak "The argument to rebase is not a Rebase object"; }elsif( not defined $self->{_sequence} ) { croak "A sequence must be given as an argument"; } # Calculate the locations for each enzyme, store in _map hash attribute foreach my $enzyme (split(" ", $self->{_enzyme})) { $self->map_enzyme($enzyme); } $self->_incr_count; return $self; } # For this simple class I have no AUTOLOAD or DESTROY # No get_rebase method, I don't want to pass around a huge hash # No set mutators: all initialization done by way of "new" constructor # No clone method. Each sequence and set of enzymes can be easily calculated # by means of a "new" command. 5.3.1.2 The methods explainedThe new constructor method checks that the proper and required arguments are provided: a sequence, enzyme(s), and a Rebase object. Then the new constructor performs the required computation by calling the method map_enzyme for each enzyme to determine the (possibly empty) array of locations in which the enzyme can cut the sequence. These enzyme-array key/value pairs are stored in the _map attribute. As you can see in the next segment of code from Restriction.pm , the method map_enzyme works by getting the regular expressions that have been calculated for the enzyme and then finding the positions where the regular expressions match the sequence. The get_regular_expressions method accesses the Rebase object that was passed to the Restriction->new constructor method and appears as the $self->{_rebase} attribute. In that object, the attribute _rebase is the hash of the database, which looks for the value of the key $enzyme . The match_positions method puts a match of a regular expression in a while loop, where it loops once for each location it finds. The offset of the matching sequence is available in the special variable $-[0] after a successful regular-expression match (see the perlvar section of the Perl documentation for more details). sub map_enzyme { my($self, $enzyme) = @_; my(@positions) = ( ); my(@res) = $self->get_regular_expressions($enzyme); foreach my $re (@res) { push @positions, $self->match_positions($re); } @{$self->{_map}{$enzyme}} = @positions; return @positions; } sub get_regular_expressions { my($self, $enzyme) = @_; my(%sites) = split(' ', $self->{_rebase}{_rebase}{$enzyme}); # May have duplicate values return values %sites; } # Find positions of a regular expression in the sequence sub match_positions { my($self, $regexp) = @_; my @positions = ( ); # Determine positions of regular expression matches while ( $self->{_sequence} =~ /$regexp/ig ) { push @positions, ($-[0] + 1 ); } return(@positions); } Finally, here are some helper methods that report on the attributes of sequence and of the calculated map for a given enzyme (or the entire hash that contains the map of all the enzymes). I didn't use AUTOLOAD here because there are only a handful of attributes that return a few different data types: array, scalar string, and hash. sub get_enzyme_map { my($self, $enzyme) = @_; @{$self->{_map}{$enzyme}}; } sub get_enzyme_names { my($self) = @_; keys %{$self->{_map}}; } sub get_sequence { my($self) = @_; $self->{_sequence}; } sub get_map { my($self) = @_; %{$self->{_map}}; } 5.3.1.3 DocumentationHere's the (bare bones) documentation for the class embedded right in the module using the POD documentation language: =head1 Restriction Restriction: Given a Rebase object, sequence, and list of restriction enzyme names, return the locations of the recognition sites in the sequence =head1 Synopsis use Restriction; use Rebase; use strict; use warnings; my $rebase = Rebase->new( dbmfile => 'BIONET', bionetfile => 'bionet.212' ); my $restrict = Restriction->new( rebase => $rebase, enzyme => 'EcoRI, HindIII', sequence => 'ACGAATTCCGGAATTCG', ); print "Locations for EcoRI are ", join(' ', $restrict->get_enzyme_map('EcoRI')), "\n"; =head1 AUTHOR James Tisdall =head1 COPYRIGHT Copyright (c) 2003, James Tisdall =cut 1; Saving the Synopsis example in a file and running it (making sure the required bionet.212 file is in place) gives the output: EcoRI data in Rebase is GAATTC GAATTC Sequence is ACGAATTCCGGAATTCG Locations for EcoRI are 3 11 |