MAST (Motif Alignment and Search Tool) searches biological sequence databases for sequences that contain one or more of a group of known motifs.
A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences. Motifs are represented as position-dependent scoring matrices that describe the score of each possible letter at each position in the pattern. Individual motifs may not contain gaps. Patterns with variable-length gaps must be split into two or more separate motifs before being submitted as input to MAST.
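To make the idea of a position-dependent scoring matrix concrete, the following is a minimal sketch, not MAST's actual scoring code. The matrix values, the names TOY_MATRIX, score_window, and best_match, and the test sequence are all hypothetical and chosen only for illustration. It scores every ungapped window of a sequence against a width-3 DNA motif and reports the best one.

```python
# Toy position-dependent scoring matrix for a DNA motif of width 3.
# Each entry is one motif position; the scores are made-up values,
# not taken from any real MEME or MAST output.
TOY_MATRIX = [
    {"A": 2, "C": -1, "G": -1, "T": -2},
    {"A": -2, "C": 3, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]


def score_window(matrix, window):
    """Sum the per-position scores of one ungapped window."""
    return sum(column[letter] for column, letter in zip(matrix, window))


def best_match(matrix, sequence):
    """Return (start, score) of the highest-scoring ungapped window.

    Ties are broken in favor of the leftmost window.
    """
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    candidates = [
        (score_window(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    score, start = max(candidates, key=lambda pair: (pair[0], -pair[1]))
    return start, score


if __name__ == "__main__":
    print(best_match(TOY_MATRIX, "TTACGTT"))
```

Because each window has exactly the motif's width, the sketch also shows why a single motif cannot contain gaps: a pattern with a variable-length gap has no fixed set of columns to sum over, so it has to be split into separate motifs.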
MAST takes as input a file containing the descriptions of one or more motifs and searches a sequence database that you select for sequences that match the motifs. The motif file can be the output of the MEME motif discovery tool or any file in the appropriate format. For details, see Section 11.3 at the end of this chapter.
The following examples assume that the file meme.results is the output of a MEME run containing at least three motifs, that the file SwissProt is a copy of the SWISS-PROT database on your local disk, and that DNA_DB is a copy of a DNA database on your local disk.
Annotate the training set:
mast meme.results
Find sequences matching the motifs and annotate them in the SWISS-PROT database:
mast meme.results -d SwissProt
Show sequences with weaker combined matches to motifs:
mast meme.results -d SwissProt -ev 200
Indicate weaker matches to single motifs in the annotation so that sequences with weak matches to the motifs (but perhaps with the "correct" order and spacing) can be seen:
mast meme.results -d SwissProt -w
Include a nominal order and spacing of the first three motifs in the calculation of the sequence p-values to increase the sensitivity of the search for matching sequences:
mast meme.results -d SwissProt -diag "9-[1]-61-[2]-62-[3]-91"
Use only the first and third motifs in the search:
mast meme.results -d SwissProt -m 1 -m 3
Use only the first two motifs in the search:
mast meme.results -d SwissProt -c 2
Search DNA sequences using protein motifs, adjusting p-values and E-values for each sequence by that sequence's composition:
mast meme.results -d DNA_DB -dna -comp
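When several of these searches need to be run from a script, the command lines can be assembled programmatically. The following is a minimal sketch that uses only the options shown in the examples above (-d, -ev, -w, -diag, -m, -c, -dna, -comp). The helper names build_mast_command and run_mast and their parameter names are hypothetical, and the sketch assumes that a mast executable is available on your PATH.

```python
import shutil
import subprocess


def build_mast_command(mfile, database=None, ev=None, weak=False,
                       diag=None, motifs=(), count=None,
                       dna=False, comp=False):
    """Assemble a MAST argument list from the options used in this section."""
    cmd = ["mast", mfile]
    if database is not None:
        cmd += ["-d", database]        # sequence database to search
    if ev is not None:
        cmd += ["-ev", str(ev)]        # show weaker combined matches
    if weak:
        cmd.append("-w")               # indicate weak single-motif matches
    if diag is not None:
        cmd += ["-diag", diag]         # nominal motif order and spacing
    for number in motifs:
        cmd += ["-m", str(number)]     # use only the listed motifs
    if count is not None:
        cmd += ["-c", str(count)]      # use only the first `count` motifs
    if dna:
        cmd.append("-dna")             # search DNA with protein motifs
    if comp:
        cmd.append("-comp")            # adjust for sequence composition
    return cmd


def run_mast(cmd):
    """Run the command, failing early if mast is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("mast executable not found on PATH")
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it (dry run).
    print(" ".join(build_mast_command("meme.results",
                                      database="SwissProt",
                                      motifs=[1, 3])))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems, which matters for values such as the -diag string that contain characters the shell may interpret.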
11.2.2 Command-Line Options
The usage for MAST is as follows:
mast mfile [optional arguments ...]
where mfile is a file containing motifs to use. This may be a MEME output file, or a file with the format described in the MAST manpage at http://meme.sdsc.edu/meme/website/meme-download.html.
Table 11-2 summarizes the command-line options for MAST.