8.2 References

  • Kent, W. James. 2002. BLAT—The BLAST-Like Alignment Tool. Genome Research 12 (4):656-664.

    Main page: http://genome.ucsc.edu/cgi-bin/hgBlat?command=start

    User guide: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html

    Download: http://www.soe.ucsc.edu/~kent/exe/

Chapter 9. ClustalW

ClustalW is a general-purpose multiple sequence alignment program for nucleotide or protein sequences. The alignments can be either global (whole sequences) or local (restricted to subsequences). ClustalW calculates the best match for the selected sequences and lines them up so that the identities, similarities, and differences can be seen. For details, see Section 9.2 at the end of this chapter. This chapter describes Version 1.82 of ClustalW.

An example of a ClustalW command-line entry:

clustalw -infile=file.txt -align

where file.txt contains the FASTA-formatted sequences.
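
The same command can be assembled and launched from a script. The following is a minimal Python sketch, not part of ClustalW itself: the function names `alignment_command` and `run_alignment` are hypothetical, and the executable is assumed to be installed under the name `clustalw`.

```python
import shutil
import subprocess


def alignment_command(infile, executable="clustalw"):
    """Return the argument list for a full multiple alignment of `infile`."""
    # Mirrors the example above: clustalw -infile=file.txt -align
    return [executable, f"-infile={infile}", "-align"]


def run_alignment(infile, executable="clustalw"):
    """Run the alignment if the executable is on PATH; return True if it ran."""
    if shutil.which(executable) is None:
        # Assumption: a missing executable is reported rather than raised.
        return False
    subprocess.run(alignment_command(infile, executable), check=True)
    return True


# Show the command that would be executed, without running it.
print(" ".join(alignment_command("file.txt")))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.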

9.1 Command-Line Options

The ClustalW options are summarized in Tables 9-1 through 9-10.

Table 9-1. ClustalW verb options

Option            Definition
-align            Do full multiple alignment.
-bootstrap(=n)    Bootstrap an NJ tree (n = number of bootstraps; default = 1000).
-convert          Output the input sequences in a different file format.
-help or -check   Outline the command-line parameters.
-options          List the command-line parameters.
-tree             Calculate an NJ tree.

Table 9-2. ClustalW data options

Option               Definition
-infile=file.ext     Input sequences.
-profile1=file.ext   Profiles.
-profile2=file.ext   Profiles (old alignment).

Table 9-3. ClustalW parameters—general settings

Option         Definition
-case          LOWER or UPPER (for GDE output only).
-interactive   Read command line, then enter normal interactive menus.
-negative      Protein alignment with negative values in matrix.
-outfile=      Sequence alignment file name.
-output=       GCG, GDE, PHYLIP, or PIR.
-outorder=     INPUT or ALIGNED.
-quicktree     Use FAST algorithm for the alignment guide tree.
-seqnos=       OFF or ON (for ClustalW output only).
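
The options in the tables take two shapes: bare flags such as `-quicktree` and `-name=value` settings such as `-output=PHYLIP`. The sketch below shows one way to turn a dictionary into that syntax. The helper `format_options` and the output file name `file.phy` are illustrative assumptions, not part of ClustalW.

```python
def format_options(options):
    """Convert a dict of option names to ClustalW-style arguments.

    A value of None produces a bare flag (-name); any other value
    produces a setting (-name=value). Insertion order is preserved.
    """
    args = []
    for name, value in options.items():
        if value is None:
            args.append(f"-{name}")
        else:
            args.append(f"-{name}={value}")
    return args


# General settings from Table 9-3 combined with the -align verb.
general = {
    "infile": "file.txt",
    "align": None,
    "output": "PHYLIP",     # one of GCG, GDE, PHYLIP, PIR
    "outorder": "INPUT",    # INPUT or ALIGNED
    "outfile": "file.phy",  # hypothetical output file name
}
print("clustalw " + " ".join(format_options(general)))
```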

Table 9-4. ClustalW parameters—fast pairwise alignments

Option        Definition
-ktuple=n     Word size.
-pairgap=n    Gap penalty.
-score        PERCENT or ABSOLUTE.
-topdiags=n   Number of best diagonals.
-window=n     Window around best diagonals.

Table 9-5. ClustalW parameters—slow pairwise alignments

Option          Definition
-pwdnamatrix=   DNA weight matrix: IUB, ClustalW, or filename.
-pwgapopen=f    Gap opening penalty.
-pwgapext=f     Gap extension penalty.
-pwmatrix=      Protein weight matrix: BLOSUM, PAM, GONNET, ID, or filename.

Table 9-6. ClustalW parameters—multiple alignments

Option           Definition
-dnamatrix=      DNA weight matrix: IUB, ClustalW, or filename.
-endgaps         No end gap separation penalty.
-gapdist=n       Gap separation penalty range.
-gapext=f        Gap extension penalty.
-gapopen=f       Gap opening penalty.
-hgapresidues=   List of hydrophilic residues.
-matrix=         Protein weight matrix: BLOSUM, PAM, GONNET, ID, or filename.
-maxdiv=n        Percentage identity for delay.
-newtree=        File for new guide tree.
-nohgap          Hydrophilic gaps off.
-nopgap          Residue-specific gaps off.
-transweight     Transitions weighted.
-type=           PROTEIN or DNA.
-usetree=        File for old guide tree.
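
Because the weight-matrix option depends on the sequence type (`-matrix=` for proteins, `-dnamatrix=` for DNA), it can help to keep the multiple-alignment settings together and validate them before building a command. The following is a sketch under stated assumptions: the class `MultipleAlignmentSettings` is hypothetical, and the gap penalty values are illustrative rather than recommended.

```python
from dataclasses import dataclass


@dataclass
class MultipleAlignmentSettings:
    seq_type: str = "PROTEIN"  # -type=: PROTEIN or DNA
    matrix: str = "BLOSUM"     # weight matrix name or a file name
    gapopen: float = 10.0      # -gapopen=f (illustrative value)
    gapext: float = 0.2        # -gapext=f (illustrative value)

    def to_args(self):
        """Return the multiple-alignment options from Table 9-6 as arguments."""
        if self.seq_type not in ("PROTEIN", "DNA"):
            raise ValueError("seq_type must be PROTEIN or DNA")
        # Proteins use -matrix=, DNA uses -dnamatrix=.
        matrix_option = "matrix" if self.seq_type == "PROTEIN" else "dnamatrix"
        return [
            f"-type={self.seq_type}",
            f"-{matrix_option}={self.matrix}",
            f"-gapopen={self.gapopen}",
            f"-gapext={self.gapext}",
        ]


print(" ".join(MultipleAlignmentSettings().to_args()))
print(" ".join(MultipleAlignmentSettings(seq_type="DNA", matrix="IUB").to_args()))
```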

Table 9-7. ClustalW parameters—profile alignments

Option       Definition
-newtree1=   File for new guide tree for profile1.
-newtree2=   File for new guide tree for profile2.
-profile     Merge two alignments by profile alignment.
-usetree1=   File for old guide tree for profile1.
-usetree2=   File for old guide tree for profile2.

Table 9-8. ClustalW parameters—sequence to profile alignments

Option       Definition
-newtree=    File for new guide tree.
-sequences   Sequentially add profile2 sequences to profile1 alignment.
-usetree=    File for old guide tree.
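
Tables 9-7 and 9-8 describe two ways to combine the files given with `-profile1=` and `-profile2=`: merging two alignments (`-profile`) or adding the second file's sequences one at a time (`-sequences`). A minimal sketch of building either command follows. The function `profile_command` and the file names `old.aln` and `new.aln` are hypothetical.

```python
def profile_command(profile1, profile2, mode="profile", executable="clustalw"):
    """Build a profile or sequence-to-profile alignment command.

    mode="profile"   merges two alignments by profile alignment.
    mode="sequences" sequentially adds profile2 sequences to profile1.
    """
    if mode not in ("profile", "sequences"):
        raise ValueError("mode must be 'profile' or 'sequences'")
    return [
        executable,
        f"-profile1={profile1}",
        f"-profile2={profile2}",
        f"-{mode}",
    ]


print(" ".join(profile_command("old.aln", "new.aln")))
print(" ".join(profile_command("old.aln", "new.aln", mode="sequences")))
```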

Table 9-9. ClustalW parameters—structure alignments

Option           Definition
-helixendin=n    Number of residues inside helix to be treated as terminal.
-helixgap=n      Gap penalty for helix core residues.
-helixendout=n   Number of residues outside helix to be treated as terminal.
-loopgap=n       Gap penalty for loop regions.
-nosecstr1       Do not use secondary structure-gap penalty mask for profile 1.
-nosecstr2       Do not use secondary structure-gap penalty mask for profile 2.
-secstrout=      STRUCTURE, MASK, BOTH, or NONE output in alignment file.
-strandgap=n     Gap penalty for strand core residues.
-strandendin=n   Number of residues inside strand to be treated as terminal.
-strandendout=n  Number of residues outside strand to be treated as terminal.
-terminalgap=n   Gap penalty for structure termini.

Table 9-10. ClustalW parameters—trees

Option         Definition
-kimura        Use Kimura's correction.
-outputtree=   nj, phylip, or dist.
-seed=n        Seed number for bootstraps.
-tossgaps      Ignore positions with gaps.
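
The tree options combine with the `-tree` and `-bootstrap(=n)` verbs from Table 9-1. The sketch below builds such a command. It assumes the input file already holds an alignment; the function `tree_command`, the file name `file.aln`, and the seed value are illustrative.

```python
def tree_command(infile, bootstraps=None, seed=None, kimura=False,
                 tossgaps=False, outputtree=None, executable="clustalw"):
    """Build an NJ tree command, optionally with bootstrapping."""
    args = [executable, f"-infile={infile}"]
    if bootstraps is None:
        args.append("-tree")
    else:
        args.append(f"-bootstrap={bootstraps}")
        if seed is not None:
            # -seed=n only matters when bootstrapping.
            args.append(f"-seed={seed}")
    if kimura:
        args.append("-kimura")
    if tossgaps:
        args.append("-tossgaps")
    if outputtree is not None:
        # Values as listed in Table 9-10.
        if outputtree not in ("nj", "phylip", "dist"):
            raise ValueError("outputtree must be nj, phylip, or dist")
        args.append(f"-outputtree={outputtree}")
    return args


print(" ".join(tree_command("file.aln", bootstraps=1000, seed=42,
                            kimura=True, tossgaps=True)))
```

Fixing the seed makes a bootstrap run repeatable, which is useful when comparing trees produced with different settings.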