1.10 Exercises



Exercise 1.1

What are the problems that might arise when dividing program code into separate module files?

Exercise 1.2

What are the differences between libraries, modules, packages, and namespaces?

Exercise 1.3

Write a module that finds modules on your computer.

Exercise 1.4

Where do the standard Perl distribution modules live on your computer?

Exercise 1.5

Research how Perl manages its namespaces.

Exercise 1.6

When might it be necessary to export names from a module? When might it be useful? When might it be convenient? When might it be a very bad idea?

Exercise 1.7

The program testGeneticcode contains the following loop:

# Translate each three-base codon to an amino acid, and append to a protein 
for(my $i=0; $i < (length($dna) - 2) ; $i += 3) {
        $protein .= Geneticcode::codon2aa( substr($dna,$i,3) );
}

Here's another way to accomplish that loop:

# Translate each three-base codon to an amino acid, and append to a protein
# $i starts at -3 because it is incremented before each substr call
my $i = -3;
while (my $codon = substr($dna, $i += 3, 3) ) {
        $protein .= Geneticcode::codon2aa( $codon );
}

Compare the two methods. Which is easier to understand? Which is easier to maintain? Which is faster? Why?

Exercise 1.8

The subroutine codon2aa causes the entire program to halt when it encounters a "bad" codon in the data. It is usually better for a subroutine to return some indication that it encountered a problem and let the calling program decide how to handle it. A subroutine that doesn't always halt the program is more generally useful (although halting is sometimes what you want).

Rewrite codon2aa and the calling program testGeneticcode so that the subroutine returns some error (perhaps the value undef) and the calling program checks for that error and performs some action.

Exercise 1.9

Write a separate module for each of the following: reading a file, extracting FASTA sequence data, and printing sequence data to the screen.

Exercise 1.10

Download, install, and use a module from CPAN.


Chapter 2. Data Structures and String Algorithms

So far in this book, I've used the standard Perl data structures of scalars, arrays, and hashes. However, it is often necessary to handle data with a more complex structure than what those basics allow. For instance, it is frequently useful to have a two-dimensional array.

In this chapter, you'll learn how to define and use references and complex data structures. After you learn the fundamentals, you'll apply the new techniques to implement a biologically important algorithm. These techniques are also fundamental to the implementation of object-oriented programming, as you'll see in Chapter 3.

The algorithm we'll study is called approximate string matching. It lets you find the closest match for a peptide fragment in a protein, for instance. It uses an algorithmic technique called dynamic programming, an essential tool for many similar biological tasks, such as aligning biological sequences. In this chapter, you'll see how Perl references can be used to write programs for data problems with more complex relationships. References are also used for the objects of object-oriented programming.


2.1 Basic Perl Data Types

Before tackling references, let's review the basic Perl data types:

Scalar

A scalar value is a string or any one of several kinds of numbers such as integers, floating-point (decimal) numbers, or numbers in scientific notation such as 2.3E23. A scalar variable begins with the dollar sign $, as in $dna.
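As a small illustration of the kinds of scalar values just listed, here is a minimal sketch. Only the name $dna and the literal 2.3E23 come from the text; the other variable names and values are assumptions chosen for the example.

```perl
use strict;
use warnings;

# $dna and 2.3E23 appear in the text; everything else is hypothetical
my $dna      = 'ACGTTGCA';   # a string
my $count    = 42;           # an integer
my $fraction = 0.25;         # a floating-point (decimal) number
my $large    = 2.3E23;       # a number in scientific notation

# Perl converts between numbers and strings as the context requires
my $doubled = $count * 2;          # numeric context
my $label   = 'count=' . $count;   # string context

print "$dna $doubled $label\n";
```

Every one of these is a single value held in a variable whose name starts with $, whatever kind of string or number it contains.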

Array

An array is an ordered collection of scalar values. An array variable begins with an at sign @, as in @peptides. An array can be initialized by a list such as @peptides = ('zeroth', 'first', 'second'). Individual scalar elements of an array are referred to by first preceding the array name with a dollar sign (an individual element of an array is a scalar value) and then following the array name with the position of the desired element in square brackets. Thus the first element of the @peptides array is referenced by $peptides[0] and has the value 'zeroth'. (Note that array elements are given the positions 0, 1, 2, ..., n-1, where n is the number of elements in the array.)
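The indexing rules above can be sketched as follows. The array @peptides and its three values come from the text; the other variable names are assumptions introduced for the example.

```perl
use strict;
use warnings;

# The same list as in the text
my @peptides = ('zeroth', 'first', 'second');

# An individual element is a scalar, so it is written with $ and an index
my $first_element = $peptides[0];    # position 0 is the first element
my $last_element  = $peptides[-1];   # a negative index counts from the end

# In scalar context an array gives its number of elements, n;
# $#peptides is the last valid position, n - 1
my $count      = scalar @peptides;
my $last_index = $#peptides;

print "$first_element $last_element $count $last_index\n";
# prints: zeroth second 3 2
```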

Recall that printing an array within double quotes causes the elements to be separated by spaces; without the double quotes, the elements are printed one after the other without separations. This snippet:

@pentamers = ('cggca', 'tgatc', 'ttggc');

print "@pentamers", "\n";
print @pentamers, "\n";

produces the output:

cggca tgatc ttggc
cggcatgatcttggc

Hash

A hash is an unordered collection of key/value pairs of scalar values. Each scalar key is associated with a scalar value. A hash variable begins with the percent sign %, as in %geneticmarkers. A hash can be initialized like an array, except that each pair of scalars is taken as a key with its value, as in:
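The following is a minimal sketch of such an initialization. Only the pair 'hairless' => 'no' is taken from the text; the keys 'curly' and 'spotted' and their values are assumptions added for illustration. The second hash, %same, is also hypothetical and shows the same list written with plain commas.

```perl
use strict;
use warnings;

# Only 'hairless' => 'no' comes from the text; the other pairs are assumed
my %geneticmarkers = (
    'curly'    => 'yes',
    'hairless' => 'no',
    'spotted'  => 'yes',
);

# The same list with ordinary commas: items are paired off as key, value
my %same = ('curly', 'yes', 'hairless', 'no', 'spotted', 'yes');

# A single value is a scalar, so it is written with $ and the key in braces
print $geneticmarkers{'hairless'}, "\n";   # prints: no
print $same{'hairless'}, "\n";             # prints: no
```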

The => symbol is just a synonym for a comma that makes it easier to see the key/value pairs in such lists. [1] An individual scalar value is retrieved by preceding the hash name with a dollar sign (an individual value is a scalar value) and following the hash name with the key in curly braces, as in $geneticmarkers{'hairless'}, which, because of how it's initialized, has the value 'no'.

[1] It also forces the left side to be interpreted as a string.
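The quoting behavior mentioned in the footnote can be seen in a short sketch. The hash %markers is a hypothetical name used only for this illustration; the key and value mirror the 'hairless' example in the text.

```perl
use strict;
use warnings;

# %markers is hypothetical; the key is a bareword, which => treats as a string
my %markers = (hairless => 'no');

# A simple bareword inside the curly braces is treated as a string as well
print $markers{hairless}, "\n";     # prints: no
print $markers{'hairless'}, "\n";   # prints: no
```

This works even under use strict, which would otherwise reject an unquoted bareword. The automatic quoting applies only to simple identifiers, so keys containing spaces or other punctuation still need explicit quotes.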