6.3 References

  • Gilbert, D. G. 1999. Readseq Version 2, an improved biosequence conversion tool, written in the Java language. Bionet.Software (August).

    Main page

    http://iubio.bio.indiana.edu/soft/molbio/readseq/

    README

    http://iubio.bio.indiana.edu/soft/molbio/readseq/Readme

    Download

    http://iubio.bio.indiana.edu/soft/molbio/readseq/classic/

    http://iubio.bio.indiana.edu/soft/molbio/readseq/java/

Chapter 7. BLAST

BLAST (Basic Local Alignment Search Tool) is probably the best-known program in sequence analysis. It compares two sequences by trying to align them, and it is also used to look up sequences in a database. The algorithm starts by looking for exact matches, then extends the aligned regions by allowing for mismatches. For details, see Section 7.1 at the end of this chapter.
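The two steps just described (find exact matches, then extend them while tolerating mismatches) can be illustrated with a toy sketch. This is not the BLAST implementation: it is ungapped only, and the function names, the word size of 4, and the drop-off value of 5 are assumptions chosen for illustration. The match reward of 1 and mismatch penalty of -3 are the blastn defaults listed for -r and -q later in this chapter.

```python
def find_seeds(query, subject, word_size=4):
    """Yield (q_pos, s_pos) for every exact word shared by both sequences."""
    index = {}
    for s_pos in range(len(subject) - word_size + 1):
        index.setdefault(subject[s_pos:s_pos + word_size], []).append(s_pos)
    for q_pos in range(len(query) - word_size + 1):
        for s_pos in index.get(query[q_pos:q_pos + word_size], []):
            yield q_pos, s_pos


def extend(query, subject, q_pos, s_pos, word_size=4,
           reward=1, penalty=-3, x_drop=5):
    """Extend one seed in both directions without gaps.

    Returns (score, q_start, s_start, length) of the best segment found.
    """
    def walk(q, s, step):
        # Move away from the seed, remembering the best running score.
        # Stop once the score falls more than x_drop below that best.
        score = best = best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += reward if query[q] == subject[s] else penalty
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break
            q += step
            s += step
        return best, best_len

    right_score, right_len = walk(q_pos + word_size, s_pos + word_size, 1)
    left_score, left_len = walk(q_pos - 1, s_pos - 1, -1)
    score = word_size * reward + left_score + right_score
    length = left_len + word_size + right_len
    return score, q_pos - left_len, s_pos - left_len, length
```

For two 11-base sequences that differ at a single position, every seed on the main diagonal extends across the mismatch to cover the full length, scoring 10 matches minus one mismatch penalty.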

This chapter contains a guide to the command-line options used in BLAST programs. The programs are listed in the order you might expect to use them. Each entry includes a brief program description, a command-line entry example, and a table summarizing any available options. We're using Version 2.2.5 of BLAST.

formatdb

formatdb formats protein or nucleotide source databases so that they can be searched by blastall, blastpgp, or MegaBLAST.

An example of a formatdb command-line entry:

formatdb -i fastafile -p F -o T
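When formatdb is called from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the -i, -p, -o, -n, and -t options documented in this section; the function name and its keyword arguments are hypothetical, and the code only builds the command without running it.

```python
import shlex


def formatdb_command(fasta_path, protein=True, parse_seqids=False,
                     name=None, title=None):
    """Return the formatdb argument list for one input file."""
    args = [
        "formatdb",
        "-i", fasta_path,                    # input file (required)
        "-p", "T" if protein else "F",       # T = protein, F = nucleotide
        "-o", "T" if parse_seqids else "F",  # T = parse SeqId, build indexes
    ]
    if name is not None:
        args += ["-n", name]                 # base name for BLAST files
    if title is not None:
        args += ["-t", title]                # title for database file
    return args


# Hypothetical file name; shlex.join renders the list as a shell command.
print(shlex.join(formatdb_command("est.fasta", protein=False,
                                  parse_seqids=True)))
```

The resulting list could be passed to subprocess.run on a machine where formatdb is installed.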

The following table summarizes the formatdb options.

| Option | Definition | Type | Default |
|---|---|---|---|
| -a | Input file is database in ASN.1 format (otherwise FASTA is expected). | [T/F] | F |
| -b | ASN.1 database in binary mode: T = Binary; F = Text mode. | [T/F] | F |
| -e | Input is a Seq-entry. | [T/F] | F |
| -i | Input file for formatting (this parameter must be set). | [File In] | |
| -l | Logfile name. | [File Out] | formatdb.log |
| -n | Base name for BLAST files. | [String] | |
| -o | Parse options: T = Parse SeqId and create indexes; F = Do not parse SeqId, do not create indexes. | [T/F] | F |
| -p | Type of file: T = Protein; F = Nucleotide. | [T/F] | T |
| -s | Create indexes limited only to accessions (sparse). | [T/F] | F |
| -t | Title for database file. | [String] | |
| -v | Number of sequence bases to be created in the volume. | [Integer] | 0 |
| -A | Create ASN.1 structured deflines. | [T/F] | F |
| -B | Binary GIfile produced from the GIfile. This option should be used with the -F option. | [File Out] | |
| -F | GIfile (file containing list of GIs). | [File In] | |
| -L | Create an alias file with this name. | [File Out] | |

blastall

blastall allows use of all BLAST programs (blastn, blastp, blastx, tblastx, and tblastn). The following table summarizes the query, database sequence, and alignment types for the various BLAST commands.

| Program | Query sequence type | Database sequence type | Alignment sequence type |
|---|---|---|---|
| blastn | nucleotide | nucleotide | nucleotide |
| blastp | protein | protein | protein |
| blastx | nucleotide | protein | protein |
| tblastn | protein | nucleotide | protein |
| tblastx | nucleotide | nucleotide | protein |
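The table above can also be read in reverse: given the type of query and the type of database, which programs apply? A small lookup sketch follows; the dictionary contents come from the table, while the names BLAST_PROGRAMS and programs_for are hypothetical.

```python
# program: (query type, database type, alignment type), as in the table
BLAST_PROGRAMS = {
    "blastn":  ("nucleotide", "nucleotide", "nucleotide"),
    "blastp":  ("protein",    "protein",    "protein"),
    "blastx":  ("nucleotide", "protein",    "protein"),
    "tblastn": ("protein",    "nucleotide", "protein"),
    "tblastx": ("nucleotide", "nucleotide", "protein"),
}


def programs_for(query_type, database_type):
    """Return the program names that accept this query/database pairing."""
    return sorted(
        name
        for name, (query, database, _alignment) in BLAST_PROGRAMS.items()
        if query == query_type and database == database_type
    )


print(programs_for("nucleotide", "nucleotide"))  # ['blastn', 'tblastx']
print(programs_for("protein", "nucleotide"))     # ['tblastn']
```

A nucleotide query against a nucleotide database has two candidates because tblastx compares the sequences at the protein level, whereas blastn aligns the nucleotides directly.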

An example of a blastall command-line entry:

blastall -p programname -d databasefilename -i queryfilename -o outputfilename
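The same entry can be assembled programmatically. This sketch mirrors the example above and adds two commonly adjusted options from the blastall options table in this chapter, -e (expectation value) and -m (alignment view). The function name and its parameters are hypothetical, the file and database names are placeholders, and nothing is executed.

```python
import shlex


def blastall_command(program, databases, query_path, output_path,
                     evalue=None, view=None):
    """Return the blastall argument list for one search."""
    if isinstance(databases, str):
        databases = [databases]
    args = [
        "blastall",
        "-p", program,
        # Several databases are passed as one space-separated string.
        "-d", " ".join(databases),
        "-i", query_path,
        "-o", output_path,
    ]
    if evalue is not None:
        args += ["-e", str(evalue)]  # expectation value (E)
    if view is not None:
        args += ["-m", str(view)]    # alignment view, for example 8 = tabular
    return args


cmd = blastall_command("blastn", ["db1", "db2"], "query.fasta",
                       "hits.txt", view=8)
print(shlex.join(cmd))
```

Keeping the command as a list means the multi-database string stays a single argument, so the quoting shown for -d is handled by shlex.join (or is unnecessary when the list is handed to subprocess.run).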

The following table summarizes the blastall options.

| Option | Definition | Type | Default |
|---|---|---|---|
| -a | Number of processors to use. | [Integer] | 1 |
| -b | Number of database sequences to show alignments for (B). | [Integer] | 250 |
| -d | Database. Multiple database names are enclosed in quotation marks, for example: -d "db1 db2 db3". | [String] | nr |
| -e | Expectation value (E). | [Real] | 10.0 |
| -f | Threshold for extending hits (0 invokes default behavior). | [Integer] | 0 |
| -g | Perform gapped alignment (not available with tblastx). | [T/F] | T |
| -i | Query file. | [File In] | stdin |
| -l | Restrict search of database to list of GIs. | [String] | Optional |
| -m | Alignment view options: 0 = Pairwise; 1 = Query-anchored, showing identities; 2 = Query-anchored, no identities; 3 = Flat query-anchored, show identities; 4 = Flat query-anchored, no identities; 5 = Query-anchored, no identities and blunt ends; 6 = Flat query-anchored, no identities and blunt ends; 7 = XML BLAST output; 8 = Tabular. | [Integer] | 0 |
| -n | MegaBLAST search. | [T/F] | F |
| -o | BLAST report output file. | [File Out] | stdout |
| -p | Program name. | [String] | |
| -q | Penalty for a nucleotide mismatch (blastn only). | [Integer] | -3 |
| -r | Reward for a nucleotide match (blastn only). | [Integer] | 1 |
| -v | Number of database sequences to show one-line descriptions for (V). | [Integer] | 500 |
| -y | Dropoff (X) for BLAST extensions in bits (0.0 invokes default behavior). | [Real] | 0.0 |
| -z | Effective length of the database (use 0 for the real size). | [Real] | 0 |
| -A | Multiple hits window size (0 for single-hit algorithm). | [Integer] | 40 |
| -D | DB genetic code (for tblast[nx] only). | [Integer] | 1 |
| -E | Cost to extend a gap (0 invokes default behavior). | [Integer] | 0 |
| -F | Filter query sequence (DUST with blastn, SEG with others). | [String] | T |
| -G | Cost to open a gap (0 invokes default behavior). | [Integer] | 0 |
| -I | Show GIs in deflines. | [T/F] | F |
| -J | Believe the query defline. | [T/F] | F |
| -K | Number of best hits from a region to keep (off by default; if used, a value of 100 is recommended). | [Integer] | 0 |
| -L | Location on query sequence. | [String] | |
| -M | Matrix. | [String] | BLOSUM62 |
| -O | SeqAlign file. | [File Out] | |
| -P | 0 = Multiple hits, 1-pass; 1 = Single hit, 1-pass; 2 = 2-pass. | [Integer] | 0 |
| -Q | Query genetic code to use. | [Integer] | 1 |
| -R | PSI-TBLASTN checkpoint file. | [File In] | |
| -S | Query strands to search against database (for blast[nx] and tblastx): 3 = both; 1 = top; 2 = bottom. | [Integer] | 3 |
| -T | Produce HTML output. | [T/F] | F |
| -U | Use lowercase filtering of FASTA sequence. | [T/F] | F |
| -W | Word size (0 invokes default behavior). | [Integer] | 0 |
| -X | X dropoff value for gapped alignment, in bits (0 invokes default behavior). | [Integer] | 0 |
| -Y | Effective length of the search space (use 0 for the real size). | [Real] | 0 |
| -Z | X dropoff value for final gapped alignment, in bits. | [Integer] | 0 |
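The tabular view (-m 8) is the easiest of the alignment views to process in a script. The sketch below parses such output under an assumption that is not stated in the table above: that each line holds twelve tab-separated fields in the order query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, and bit score. The names Hit and parse_tabular are hypothetical, and the sample line is made up for illustration.

```python
from collections import namedtuple

# Assumed column order of the -m 8 tabular view (see the note above).
Hit = namedtuple(
    "Hit",
    "query subject identity length mismatches gap_opens "
    "q_start q_end s_start s_end evalue bit_score",
)


def parse_tabular(lines):
    """Parse tab-separated hit lines into Hit records."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError("expected 12 columns, got %d" % len(fields))
        hits.append(Hit(
            fields[0], fields[1], float(fields[2]),
            *map(int, fields[3:10]),      # length through subject end
            float(fields[10]), float(fields[11]),
        ))
    return hits


# Made-up sample line, not real BLAST output.
sample = ["q1\ts1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t370"]
for hit in parse_tabular(sample):
    print(hit.query, hit.subject, hit.identity, hit.evalue)
```

Raising an error on an unexpected column count is a deliberate choice here: it makes a mismatch between the assumed layout and the actual output visible immediately.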