megablast | Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases

megablast

megablast uses a greedy algorithm for nucleotide sequence alignment searches and concatenates many queries into a single search to reduce the time spent scanning the database.

An example of a megablast command-line entry:

megablast -d databasefilename -i queryfilename -D 2 -o outputfilename
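When megablast is driven from a script, it can help to assemble the same command as an argument list instead of a single string. The following is a minimal Python sketch of that idea. The function name `build_megablast_command` and its parameters are hypothetical, chosen for illustration, and the sketch assumes the `megablast` executable is on the PATH if the command is actually run.

```python
import shlex


def build_megablast_command(database, query, output, output_type=2, extra_options=None):
    """Return the megablast command as a list of arguments.

    This helper is illustrative only; it is not part of BLAST.
    """
    # -D accepts the four output types 0 through 3.
    if output_type not in (0, 1, 2, 3):
        raise ValueError("output_type must be 0, 1, 2, or 3")
    command = [
        "megablast",
        "-d", database,
        "-i", query,
        "-D", str(output_type),
        "-o", output,
    ]
    # Optional extra flags, for example {"-e": 1e-10, "-W": 28}.
    for flag, value in (extra_options or {}).items():
        command += [flag, str(value)]
    return command


cmd = build_megablast_command("databasefilename", "queryfilename", "outputfilename")
print(shlex.join(cmd))
# To run the search (requires megablast to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when file names contain spaces.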

The following table summarizes the MegaBLAST options.

Option	Definition
-b	Maximal number of reported alignments for a given database sequence. This option is meaningful only in conjunction with -D 2.
-e	The cutoff expectation value.
-f	Show full IDs in the output.
-p	Cutoff by percentage of identity.
-s	Minimal hit score to report. By default, this value is set to the word size (-W).
-v	Maximal number of database sequences to report alignments from. This option is meaningful only in conjunction with -D 2.
-D	Type of MegaBLAST output. 0: one-line output for each alignment, in the form 'subject-ID'=='[+-]query-ID' (s_off q_off s_end q_end) score. 1: the same output as level 0, plus the endpoints and percentage of identical nucleotides for each ungapped segment in the alignment. 2: the traditional BLAST (blastn) output. 3: one-line output for each alignment, with the following tab-separated fields: query ID, subject ID, percent identity, alignment length, number of mismatches (not including gaps), number of gap openings, start of alignment in query, end of alignment in query, start of alignment in subject, end of alignment in subject, expected value, bit score.
-F	Filtering. The available filters for nucleotide BLAST or MegaBLAST searches are: D = DUST; R = human repeats; V = vector screen; L = low complexity (equivalent to D). If the letter "m" is included in the filter string, the filters mask the query sequence only during the word-finding stage and do not affect the extension stage.
-G, -E	Affine gapping penalties: -G is the cost to open a gap and -E is the cost to extend a gap. The affine version of MegaBLAST requires significantly more memory, so avoid it when possible, especially when some of the query or database sequences are very long.
-J	Believe the query defline. The default is T (TRUE) for all types of output except -D 2. Note: If the sequence IDs in the FASTA file are not unique, this option must be set to F (FALSE).
-M	Maximal total length of queries to be concatenated for a single MegaBLAST search.
-O	ASN.1 seqalign file. It is only meaningful in conjunction with -D 2.
-P	Maximal number of positions for a hash value. This can be useful when running very long unmasked sequences.
-Q	Masked query output. The output is written to the file specified by the -Q option. It can be used only in conjunction with -D 2.
-U	Use lower case filtering of FASTA sequences. The default for this option is set to FALSE.
-W	Word size.
-X	X-dropoff value.
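
The tab-separated output produced with -D 3 is straightforward to read in a script. The following Python sketch parses the twelve fields listed for that output type into named records. The `Hit` class, the `parse_tabular` function, and the sample line are invented for illustration, and skipping lines that begin with "#" is an assumption about how comment or header lines appear in the output.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """One alignment from -D 3 output, in the documented field order."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield a Hit for each 12-field, tab-separated line."""
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and lines starting with "#" carry no alignment.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 fields, found {len(fields)}: {line!r}")
        yield Hit(
            fields[0],
            fields[1],
            float(fields[2]),
            *(int(value) for value in fields[3:10]),
            float(fields[10]),
            float(fields[11]),
        )


# Invented sample line, not real megablast output.
sample = "query1\tsubject1\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-100\t381\n"
hits = list(parse_tabular([sample]))
print(hits[0].subject_id, hits[0].percent_identity)
```

To read a real results file, pass the open file object to the parser, for example `parse_tabular(open("outputfilename"))`.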