megablast

megablast uses a greedy algorithm for nucleotide sequence alignment searches. It concatenates many queries into a single search, which reduces the time spent scanning the database.

An example of a megablast command-line entry:

megablast -d databasefilename -i queryfilename -D 2 -o outputfilename
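
The command line above can also be assembled programmatically. The following is a minimal Python sketch, not part of the BLAST distribution: the function name `build_megablast_command` and its parameters are hypothetical, and only flags documented in this section (-d, -i, -D, -o) are used. It builds the argument list without running anything.

```python
import shlex


def build_megablast_command(database, query, output, output_type=2):
    """Return the megablast argument list for the documented flags.

    Hypothetical helper for illustration; it only builds the list.
    """
    # -D accepts the output types 0 through 3 listed in the options table.
    if output_type not in (0, 1, 2, 3):
        raise ValueError("-D must be 0, 1, 2, or 3")
    return [
        "megablast",
        "-d", database,          # database to search
        "-i", query,             # query file
        "-D", str(output_type),  # type of output
        "-o", output,            # output file
    ]


cmd = build_megablast_command("databasefilename", "queryfilename", "outputfilename")
print(shlex.join(cmd))
```

On a system where megablast is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.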

The following table summarizes the MegaBLAST options.

| Option | Definition |
| --- | --- |
| -b | Maximal number of reported alignments for a given database sequence. This option is meaningful only in conjunction with -D 2. |
| -e | The cutoff expectation value. |
| -f | Show full IDs in the output. |
| -p | Cutoff by percentage of identity. |
| -s | Minimal hit score to report. By default this value is set to W (the word size; see -W). |
| -v | Maximal number of database sequences to report alignments from. This option is meaningful only in conjunction with -D 2. |
| -D | Type of the MegaBLAST output: 0 = Produce one-line output for each alignment, in the form subject-ID==<[+-]query-ID> (s_off q_off s_end q_end) score; 1 = Show the same output as level 0, plus the endpoints and percentage of identical nucleotides for each ungapped segment in the alignment; 2 = Show the traditional BLAST (blastn) output; 3 = Show one-line output for each alignment, with the following fields tab-separated: query ID, subject ID, percent identity, alignment length, number of mismatches (not including gaps), number of gap openings, start of alignment in query, end of alignment in query, start of alignment in subject, end of alignment in subject, expected value, bit score. |
| -F | Filtering. The available filters for nucleotide BLAST or MegaBLAST searches are: D = Dust; R = Human repeats; V = Vector screen; L = Low complexity (equivalent to D). If the letter "m" is included in the filter string, every filter masks the query sequence regions only during the word-finding stage and does not affect the extension stage. |
| -G, -E | Affine gapping penalties. The affine version of MegaBLAST requires significantly more memory, so it should be avoided if possible, especially when some of the query or database sequences are very long. |
| -J | Believe the query defline. The default is T (TRUE) for all types of output except -D 2. Note: If the sequence IDs in the FASTA file are not unique, this option must be set to F (FALSE). |
| -M | Maximal total length of queries to be concatenated for a single MegaBLAST search. |
| -O | ASN.1 seqalign file. It is meaningful only in conjunction with -D 2. |
| -P | Maximal number of positions for a hash value. This can be useful when running very long unmasked sequences. |
| -Q | Masked query output. The output is written to the file specified by the -Q option. It can be used only in conjunction with -D 2. |
| -U | Use lowercase filtering of FASTA sequences. The default for this option is FALSE. |
| -W | Word size. |
| -X | X-dropoff value. |
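
The tab-separated output produced with -D 3 is convenient to post-process. The following is a minimal Python sketch of a parser for the twelve fields listed under -D 3 above. The names `TabularHit` and `parse_tabular` are hypothetical, the sample line is invented for illustration, and skipping lines that begin with "#" is an assumption about comment lines in the output rather than something this section documents.

```python
from typing import Iterable, Iterator, NamedTuple


class TabularHit(NamedTuple):
    """One alignment line, with fields in the order listed for -D 3."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> Iterator[TabularHit]:
    """Yield one TabularHit per non-empty, non-comment line."""
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and "#" comment lines carry no hit data.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 tab-separated fields, got {len(fields)}")
        yield TabularHit(
            fields[0],
            fields[1],
            float(fields[2]),
            *(int(value) for value in fields[3:10]),
            float(fields[10]),
            float(fields[11]),
        )


# Invented sample line, for illustration only.
sample = ["query1\tsubject1\t98.50\t200\t3\t0\t1\t200\t1001\t1200\t1e-100\t370\n"]
hits = list(parse_tabular(sample))
print(hits[0].subject_id, hits[0].percent_identity, hits[0].evalue)
```

A named tuple keeps the field order identical to the documented column order while still allowing access by name, for example to filter hits by `percent_identity`.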

blastpgp

blastpgp performs gapped blastp searches and can be used to perform iterative searches in PSI-BLAST and PHI-BLAST modes.

An example of a blastpgp command-line entry:

blastpgp -i queryfilename -B alignmentfilename -j 2 -d databasefilename

The following table summarizes the blastpgp options.

| Option | Definition | Type | Default |
| --- | --- | --- | --- |
| -a | Number of processors to use. | [Integer] | 1 |
| -b | Number of database sequences to show alignments for (B). | [Integer] | 250 |
| -c | Constant in pseudocounts for multipass version. | [Integer] | 9 |
| -d | Database. | [String] | nr |
| -e | Expectation value (E). | [Real] | 10.0 |
| -f | Threshold for extending hits. | [Integer] | 0 |
| -g | Gapped. | [T/F] | T |
| -h | E-value threshold for inclusion in multipass model. | [Real] | 0.005 |
| -i | Query file. | [File In] | stdin |
| -j | Maximum number of passes to use in multipass version. | [Integer] | 1 |
| -k | Hit file for PHI-BLAST. | [File In] | hit_file |
| -l | Restrict search of database to list of GIs. | [String] | |
| -m | Alignment view options: 0 = Pairwise; 1 = Query-anchored, showing identities; 2 = Query-anchored, no identities; 3 = Flat query-anchored, show identities; 4 = Flat query-anchored, no identities; 5 = Query-anchored, no identities and blunt ends; 6 = Flat query-anchored, no identities and blunt ends; 7 = XML Blast output; 8 = Tabular output. | [Integer] | 0 |
| -o | Output file for alignment. | [File Out] | stdout |
| -p | Program option for PHI-BLAST. | [String] | blastpgp |
| -s | Compute locally optimal Smith-Waterman alignments. | [T/F] | F |
| -t | Tweak Lambda, K, and score matrix for each match. | [T/F] | T |
| -v | Number of database sequences to show one-line descriptions for (V). | [Integer] | 500 |
| -y | Dropoff (X) for blast extensions in bits (default if 0). | [Real] | 7.0 |
| -z | Effective length of the database (use 0 for the real size). | [Integer] | 0 |
| -A | Multiple hits window size (0 for single-hit algorithm). | [Integer] | 40 |
| -B | Input alignment file for PSI-BLAST restart. | [File In] | |
| -C | Output file for PSI-BLAST checkpointing. | [File Out] | |
| -E | Cost to extend a gap. | [Integer] | 1 |
| -F | Filter query sequence with SEG. | [String] | F |
| -G | Cost to open a gap. | [Integer] | 11 |
| -H | End of required region in query (-1 indicates end of query). | [Integer] | -1 |
| -I | Show GIs in deflines. | [T/F] | F |
| -J | Believe the query defline. | [T/F] | F |
| -K | Number of best hits from a region to keep. | [Integer] | 0 |
| -L | Cost to decline alignment (disabled when 0). | [Integer] | 0 |
| -M | Matrix. | [String] | BLOSUM62 |
| -N | Number of bits to trigger gapping. | [Real] | 22.0 |
| -O | SeqAlign file ("Believe the query defline" must be TRUE). | [File Out] | Optional |
| -P | 0 = Multiple hits, 1-pass; 1 = Single hit, 1-pass; 2 = 2-pass. | [Integer] | 0 |
| -Q | Output file for PSI-BLAST matrix in ASCII. | [File Out] | |
| -R | Input file for PSI-BLAST restart. | [File In] | |
| -S | Start of required region in query. | [Integer] | 1 |
| -T | Produce HTML output. | [T/F] | F |
| -U | Use lowercase filtering of FASTA sequence. | [T/F] | F |
| -W | Word size, default if 0. | [Integer] | 0 |
| -X | X dropoff value for gapped alignment (in bits). | [Integer] | 15 |
| -Y | Effective length of the search space (use 0 for the real size). | [Real] | 0 |
| -Z | X dropoff value for final gapped alignment (in bits). | [Integer] | 25 |
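
Because most blastpgp options have defaults, a wrapper script only needs to pass the values that differ from them. The following is a minimal Python sketch under that assumption. The helper `build_blastpgp_command` and the `DEFAULTS` mapping are hypothetical; the mapping copies a handful of defaults from the table above, and boolean options are rendered as T or F to match the [T/F] type. The code builds the argument list and does not run blastpgp.

```python
# A small subset of the defaults listed in the blastpgp options table.
DEFAULTS = {
    "-e": 10.0,        # expectation value
    "-h": 0.005,       # e-value threshold for inclusion in multipass model
    "-j": 1,           # maximum number of passes
    "-m": 0,           # alignment view
    "-s": False,       # Smith-Waterman alignments (T/F)
    "-G": 11,          # cost to open a gap
    "-E": 1,           # cost to extend a gap
    "-M": "BLOSUM62",  # matrix
}


def format_value(value):
    """Render booleans as T/F, everything else with str()."""
    if isinstance(value, bool):
        return "T" if value else "F"
    return str(value)


def build_blastpgp_command(query, database, options=None):
    """Return a blastpgp argument list, omitting options left at their default."""
    args = ["blastpgp", "-i", query, "-d", database]
    for flag, value in sorted((options or {}).items()):
        if flag not in DEFAULTS:
            raise KeyError(f"option {flag} is not covered by this sketch")
        if value != DEFAULTS[flag]:
            args += [flag, format_value(value)]
    return args


cmd = build_blastpgp_command(
    "queryfilename",
    "databasefilename",
    {"-j": 3, "-h": 0.001, "-e": 10.0, "-s": True},
)
print(" ".join(cmd))
```

In this example -e is dropped because 10.0 is already the default, while -h, -j, and -s are passed through. Sorting the flags keeps the generated command line stable from run to run.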