11.2 MAST

MAST (Motif Alignment and Search Tool) is a tool for searching biological sequence databases for sequences that contain one or more of a group of known motifs.

A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences. Motifs are represented as position-dependent scoring matrices that describe the score of each possible letter at each position in the pattern. Individual motifs may not contain gaps. Patterns with variable-length gaps must be split into two or more separate motifs before being submitted as input to MAST.
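To make the idea of a position-dependent scoring matrix concrete, the following minimal Python sketch scores ungapped windows of a sequence against a small matrix. The three-position DNA motif and its scores are invented for illustration, and the function names are hypothetical. MAST itself goes on to convert such scores into p-values, which this sketch does not attempt.

```python
# Hypothetical 3-position DNA motif: one dict of letter scores per position.
# The numbers are made up for illustration only.
MOTIF = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -2.0},
    {"A": -1.0, "C": 1.5, "G": -2.0, "T": -1.0},
    {"A": -2.0, "C": -1.0, "G": 2.0, "T": -1.0},
]


def score_window(motif, window):
    """Sum the per-position scores of the letters in one ungapped window."""
    if len(window) != len(motif):
        raise ValueError("window length must equal motif width")
    return sum(column[letter] for column, letter in zip(motif, window))


def best_match(motif, sequence):
    """Return (score, offset) of the highest-scoring window in the sequence."""
    width = len(motif)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (score_window(motif, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    print(best_match(MOTIF, "TTACGT"))
```

Because every window has exactly the width of the matrix, there is no way to express a gap inside a motif, which is why gapped patterns have to be split into separate motifs.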

MAST takes as input a file containing the descriptions of one or more motifs and searches a sequence database that you select for sequences that match the motifs. The motif file can be the output of the MEME motif discovery tool or any file in the appropriate format. For details, see Section 11.3 at the end of this chapter.

11.2.1 Examples

The following examples assume that the file meme.results is the output of a MEME run containing at least three motifs, that SwissProt is a local copy of the SWISS-PROT database, and that DNA_DB is a local copy of a DNA database.

Annotate the training set (with no database given, MAST reads the database specified inside the motif file):

mast meme.results

Find and annotate the sequences in the SWISS-PROT database that match the motifs:

mast meme.results -d SwissProt

Show sequences with weaker combined matches to motifs:

mast meme.results -d SwissProt -ev 200

Indicate weaker matches to single motifs in the annotation so that sequences with weak matches to the motifs (but perhaps with the "correct" order and spacing) can be seen:

mast meme.results -d SwissProt -w

Include a nominal order and spacing of the first three motifs in the calculation of the sequence p-values to increase the sensitivity of the search for matching sequences:

mast meme.results -d SwissProt -diag "9-[2]-61-[1]-62-[3]-91"
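The diagram string is easier to read once it is split into its parts. The Python sketch below assumes that bracketed numbers are motif numbers and that bare numbers are spacer lengths between them; that reading, and the function name parse_diagram, are assumptions made for illustration. The sketch handles only the unsigned form shown in the example above.

```python
import re


def parse_diagram(diag):
    """Split a block diagram into ('spacer', n) and ('motif', n) elements.

    Assumes elements are separated by '-' and that motifs are written
    as an unsigned number in square brackets, as in the example above.
    """
    elements = []
    for token in diag.split("-"):
        bracketed = re.fullmatch(r"\[(\d+)\]", token)
        if bracketed:
            elements.append(("motif", int(bracketed.group(1))))
        elif token.isdigit():
            elements.append(("spacer", int(token)))
        else:
            raise ValueError(f"unrecognized diagram element: {token!r}")
    return elements


if __name__ == "__main__":
    for kind, value in parse_diagram("9-[2]-61-[1]-62-[3]-91"):
        print(kind, value)
```

Under that reading, the example diagram places motif 2 first, then motif 1, then motif 3, with spacers of 9, 61, 62, and 91 positions around them.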

Use only the first and third motifs in the search:

mast meme.results -d SwissProt -m 1 -m 3

Use only the first two motifs in the search:

mast meme.results -d SwissProt -c 2

Search DNA sequences using protein motifs, adjusting p-values and E-values for each sequence by that sequence's composition:

mast meme.results -d DNA_DB -dna -comp

11.2.2 Command-Line Options

The usage for MAST is as follows:

mast mfile [optional arguments ...]

where mfile is a file containing motifs to use. This may be a MEME output file, or a file with the format described in the MAST manpage at http://meme.sdsc.edu/meme/website/meme-download.html.
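When MAST is driven from a script, it can help to assemble the argument list programmatically. The following Python sketch builds such a list using only options that appear in this section (-d, -m, -c, -ev, and -text). The helper name build_mast_command and its parameters are hypothetical, and the sketch only constructs the command; it does not run MAST.

```python
import shlex


def build_mast_command(mfile, database=None, motifs=(), count=None,
                       ev=None, text=False):
    """Return a MAST argument list: the motif file first, then options."""
    args = ["mast", mfile]
    if database is not None:
        args += ["-d", database]
    for number in motifs:
        # -m may be repeated, once per motif number.
        args += ["-m", str(number)]
    if count is not None:
        args += ["-c", str(count)]
    if ev is not None:
        args += ["-ev", str(ev)]
    if text:
        args.append("-text")
    return args


if __name__ == "__main__":
    command = build_mast_command("meme.results", database="SwissProt",
                                 motifs=[1, 3])
    print(shlex.join(command))
```

The resulting list can be passed to subprocess.run on a system where mast is installed and on the PATH.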

Table 11-2 summarizes the command-line options for MAST.

Table 11-2. MAST options

Option                Definition
mfile                 File containing motifs to use; may be a MEME output file or a file in a supported format.
[database]            Database containing motifs to use.
[-d database]         Database to search with motifs.
[-stdin]              Read database from standard input; default reads database specified inside mfile.
[-c count]            Only use the first count motifs.
[-a alphabet]         mfile is assumed to contain motifs in the format output by bin/make_logodds and alphabet is their alphabet; -d database or -stdin must be specified when this option is used.
[-stdout]             Print output to standard output instead of a file.
[-text]               Output in text (ASCII) format; default is hypertext (HTML) format.
[-sep]                Score reverse complement DNA strand as a separate sequence.
[-norc]               Do not score reverse complement DNA strand.
[-dna]                Translate DNA sequences to protein.
[-comp]               Adjust p-values and E-values for sequence composition.
[-rank rank]          Print results starting with rank best; default is 1.
[-smax smax]          Print results for no more than smax sequences; default is all.
[-ev ev]              Print results for sequences with E-value less than ev; default is 10.
[-mt mt]              Show motif matches with p-value less than mt; default is 0.0001.
[-w]                  Show weak matches (mt<p-value<mt*10) in angle brackets.
[-bfile bfile]        Read background frequencies from bfile.
[-seqp]               Use SEQUENCE p-values for motif thresholds (default: use POSITION p-values).
[-mf mf]              Print mf as motif file name.
[-df df]              Print df as database name.
[-minseqs minseqs]    Lower bound on number of sequences in the database.
[-mev mev]+           Use only motifs with E-values less than mev.
[-m m]+               Use only motif(s) number m (overrides -mev).
[-diag diag]          Nominal order and spacing of motifs.
[-best]               Include only the best motif in diagrams.
[-remcorr]            Remove highly correlated motifs from query.
[-brief]              Brief output; do not print documentation.
[-b]                  Print only sections I and II.
[-nostatus]           Do not print progress report.
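The interaction between the -mt, -w, and -ev thresholds can be sketched in a few lines of Python. The function names are hypothetical, the defaults mirror those listed in Table 11-2, and the handling of a p-value exactly equal to mt is an assumption, since the table gives only strict inequalities.

```python
def classify_match(p_value, mt=0.0001):
    """Classify a single motif match by its p-value.

    'strong' : p-value < mt          (shown by default)
    'weak'   : mt <= p-value < mt*10 (shown only with -w; the treatment
               of p-value == mt is an assumption of this sketch)
    'none'   : anything larger       (not shown)
    """
    if p_value < mt:
        return "strong"
    if p_value < mt * 10:
        return "weak"
    return "none"


def passes_ev(e_value, ev=10):
    """True if a sequence's E-value is below the -ev reporting threshold."""
    return e_value < ev


if __name__ == "__main__":
    for p in (0.00005, 0.0005, 0.01):
        print(p, classify_match(p))
    print(passes_ev(50), passes_ev(50, ev=200))
```

Raising -ev, as in the earlier example that uses -ev 200, admits sequences with weaker combined matches, while -w only changes which individual motif matches are drawn in the annotation.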