5.6 Exercises | Mastering Perl for Bioinformatics

Exercise 5.1

Why use the object-oriented approach for the interface to the Rebase database at all? What are the benefits and detriments of going to the object-oriented style?

Exercise 5.2

The Restriction.pm module uses another module in a new way. Instead of inheriting the Rebase.pm class, it requires that a Rebase object be passed to the constructor Restriction->new to become one of the attributes of the Restriction object.

Consider alternative ways to write this code. Can Restriction inherit Rebase and achieve the same functionality? If so, write the code. Or, can the same functionality be achieved by some method that avoids having a Rebase object passed as an argument to a Restriction object? If so, write the code.
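As a starting point for comparing the two designs, here is a minimal sketch of the difference between holding an object as an attribute and inheriting from its class. The classes ToyDatabase, ToyMapperHasA, and ToyMapperIsA are hypothetical stand-ins invented for this illustration; they are not the book's Rebase.pm or Restriction.pm and make no assumptions about how those modules are written.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database class; not the book's Rebase.pm.
package ToyDatabase;
sub new {
    my ($class, %sites) = @_;
    return bless { _sites => {%sites} }, $class;
}
sub get_site {
    my ($self, $enzyme) = @_;
    return $self->{_sites}{$enzyme};
}

# Composition ("has-a"): the database object is passed in and stored
# as an attribute, then consulted through that attribute.
package ToyMapperHasA;
sub new {
    my ($class, %arg) = @_;
    return bless { _db => $arg{db} }, $class;
}
sub site_for {
    my ($self, $enzyme) = @_;
    return $self->{_db}->get_site($enzyme);
}

# Inheritance ("is-a"): the mapper is itself a kind of database and
# inherits both the constructor and get_site.
package ToyMapperIsA;
our @ISA = ('ToyDatabase');
sub site_for {
    my ($self, $enzyme) = @_;
    return $self->get_site($enzyme);
}

package main;

my $db    = ToyDatabase->new(EcoRI => 'GAATTC');
my $has_a = ToyMapperHasA->new(db => $db);
my $is_a  = ToyMapperIsA->new(EcoRI => 'GAATTC');

print $has_a->site_for('EcoRI'), "\n";
print $is_a->site_for('EcoRI'), "\n";
```

In the composition version, several mapper objects can share one database object; in the inheritance version, each mapper carries its own copy of the data. Which trade-off suits Restriction.pm is the question the exercise asks you to work out.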

Exercise 5.3

Go to CPAN and read the documentation about the MLDBM module. It allows you to use a DBM file to store and retrieve complex data. Rewrite the Rebase.pm module to use MLDBM and replace my use of space-separated strings of recognition sites and regular expressions.

Exercise 5.4

As discussed in the text, there are some interesting considerations involved in parsing the data that relates to how the restriction enzymes actually work, such as handling reverse complements of recognition sites and cut sites. The logic used here to handle reverse complements might not be ideal for all situations. Review carefully the logic of the parse_rebase subroutine. Can you find any problems its logic might cause when you try to use the software to support a particular experiment?

Exercise 5.5

It would be nice to be able to ask some method in Restriction.pm if a particular restriction enzyme produces sticky ends at its cut site. It would also be useful to know what other enzymes create sticky ends that will anneal with the sticky ends of this enzyme. Check to see if this information appears in any of the datafiles of the Rebase database. Can you design a method that returns this information, given the name of a restriction enzyme? What changes do you have to make to your database; do you need any more datafiles from the Rebase distribution?

Exercise 5.6

Describe in detail how the logic for map_enzyme works. Can you devise a different way to accomplish the same thing?

Exercise 5.7

The code in this chapter uses the class Restriction as a base class for the class Restrictionmap, which lets you make a graphic display of the restriction map. Would it be a better idea just to add the graphics capabilities to the Restriction class instead of inheriting it into a new class? Rewrite Restriction to add the graphics capability to it. What are the pros and cons of these two different ways of writing and organizing the code?

Exercise 5.8

In the method _formatrestrictionmap, some lines of code are commented out that shorten the output by not printing extra blank lines. Try it out both ways. Do you think it makes the output less lengthy at the expense of making it more difficult to read? What is the tradeoff here? Do you prefer the longer or shorter version? Defend your preference.

Exercise 5.9

Add position numbers to the output of Restrictionmap. Add the position of the first base in each line or the position of each restriction enzyme.

Exercise 5.10

The _drawmap_text method of the Restrictionmap class is a bit lengthy and involved. See if you can improve the method. Either alter the code in the book or start from scratch. Improve it by making it faster, simpler, or easier to read. Try making its output better or add options to make the output more flexible. Try any combination of the above.

Exercise 5.11

String copying is a great way to slow down a program. Consider the code I gave for the following subroutine:

    sub complementIUB {
        my($seq) = @_;
        (my $com = $seq) =~ tr [ACGTRYMKSWBDHVNacgtrymkswbdhvn]
                               [TGCAYRKMWSVHDBNtgcayrkmwsvhdbn];
        return $com;
    }

Explain why the subroutine is written in this somewhat slow way. Now, rewrite this subroutine to eliminate a string copy. (Extra challenge: there are actually two string copies here. Rewrite the subroutine another way to eliminate a string copy. Can you eliminate both string copies? Why or why not?) Also, what's with those square brackets around the arguments to the tr function?
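To experiment with the delimiter question, the following small sketch applies the same translation twice, once with conventional slashes and once with bracket pairs. The test sequence and the variable names are made up for this illustration, and only the uppercase half of the book's table is used to keep the lines short.

```perl
use strict;
use warnings;

# Hypothetical test sequence containing a few IUB ambiguity codes.
my $seq = 'ACGTRYN';

# Conventional slash delimiters.
(my $with_slashes = $seq) =~ tr/ACGTRYMKSWBDHVN/TGCAYRKMWSVHDBN/;

# Bracketing delimiters: each list gets its own pair of brackets, and
# whitespace (even a newline) may separate the two pairs.
(my $with_brackets = $seq) =~ tr [ACGTRYMKSWBDHVN]
                                 [TGCAYRKMWSVHDBN];

print "$with_slashes\n";    # TGCAYRN
print "$with_brackets\n";   # TGCAYRN
```

Both statements still copy $seq into a new variable before translating, so this sketch says nothing about the string-copy part of the exercise.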

Exercise 5.12

Consider the following lines from the subroutine IUB_to_regexp:

    # Remove the ^ signs from the recognition sites
    $iub =~ s/\^//g;

This operation is redundant because the caret ^ was removed from the recognition site in the subroutine parse_rebase . Why is it included here?

Exercise 5.13

Consider the following last two lines from the subroutine map_enzyme in Restriction.pm:

    @{$self->{_map}{$enzyme}} = @positions;
    return @positions;

How does the subroutine behave differently if the first line is changed to:

    $self->{_map}{$enzyme} = \@positions;

Why does the subroutine return the array @positions since the return value isn't used in any of the code and the positions are saved in the object anyway?
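One way to explore the first question is to try both assignments outside the module. The sketch below uses a plain hash %map as a hypothetical stand-in for $self->{_map} and a made-up list of positions; it shows only the general Perl behavior of copying into an array versus storing a reference, and whether that difference matters inside map_enzyme depends on how @positions is declared and used there.

```perl
use strict;
use warnings;

my %map;                          # hypothetical stand-in for $self->{_map}
my @positions = (10, 20, 30);     # made-up cut positions

# Form 1: copy the elements into the array that the hash entry refers to
# (the array is autovivified here because the entry does not exist yet).
@{ $map{copied} } = @positions;

# Form 2: store a reference to @positions itself.
$map{referenced} = \@positions;

# A later change to @positions shows the difference.
push @positions, 40;

print "copied:     @{ $map{copied} }\n";       # copied:     10 20 30
print "referenced: @{ $map{referenced} }\n";   # referenced: 10 20 30 40
```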

Exercise 5.14

There is a difference in behavior and readability between the looping constructs for(;;) and for( ) or its synonym foreach( ). Try writing some small test programs that use these different loops and time them using the Perl modules Benchmark or Devel::DProf. Clearly, for and foreach are most useful when iterating through arrays, and for(;;) is most useful when iterating through numbers. However, there are places in the code presented in this chapter in which for(;;) iterates through an array using a scalar variable as a subscript counter (as $i in $array[$i]). Try finding and rewriting such loops using foreach; benchmark the two versions.
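A small test program of the kind described might look like the following sketch. It uses cmpthese from the core Benchmark module; the test array, the subroutine names, and the labels are all invented for this illustration, and the timings you see will depend on your machine and your version of Perl.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: an array of 10,000 integers.
my @array = (1 .. 10_000);

# Sum the array with a C-style loop and a subscript counter.
sub sum_c_style {
    my $sum = 0;
    for (my $i = 0; $i < @array; ++$i) {
        $sum += $array[$i];
    }
    return $sum;
}

# Sum the array by iterating over its elements directly.
sub sum_foreach {
    my $sum = 0;
    foreach my $element (@array) {
        $sum += $element;
    }
    return $sum;
}

# A negative count asks Benchmark to run each version for at least that
# many CPU seconds; cmpthese then prints a comparison table.
cmpthese(-1, {
    'c_style'      => \&sum_c_style,
    'foreach_loop' => \&sum_foreach,
});
```

Before comparing speeds, check that the two versions return the same value, so that you are timing equivalent work.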