showseq | Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases

showseq

showseq displays a protein or a nucleic acid sequence in a style suitable for publication.

Here is a sample session with showseq. By default, the output appears on standard output (the terminal) but can be saved to a file. We only look at a small section of the sequence to save space:

% showseq tembl:eclac -sbeg 1 -send 100
Display a sequence with features, translation etc..
Things to display
         0 : Enter your own list of things to display
         1 : Sequence only
         2 : Default sequence with features
         3 : Pretty sequence
         4 : One frame translation
         5 : Three frame translations
         6 : Six frame translations
         7 : Restriction enzyme map
         8 : Baroque
Display format [2]:
Output file [eclac.showseq]: stdout
ECLAC
E.coli lactose operon with lacI, lacZ, lacY and lacA genes.

        10        20        30        40        50        60
----:----|----:----|----:----|----:----|----:----|----:----|
gacaccatcgaatggcgcaaaacctttcgcggtatggcatgatagcgcccggaagagagt
               |
               variation note="c in wild-type; t in 'up' promoter mutant I-Q [11]"
                                                  |=  =  =  =  =  =  =
                                                  mRNA note="lacI (repressor) mRNA; preferred in vivo 3' end [12],[29]"

        70        80        90        100       110       120
----:----|----:----|----:----|----:----|----:----|----:----|
caattcagggtggtgaatgtgaaaccagtaacgttatacgatgtcgcagagtatgccggt
=  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =  =
mRNA note="lacI (repressor) mRNA; preferred in vivo 3' end [12],[29]"
                  |========================================
                  CDS codon_start="1" db_xref="SWISS-PROT:P03023" note="lac repressor p

Note that although we asked for the sequence display to end at position "100", it has displayed the sequence up to the end of the line - position "120". This is a feature of this program to make the display of things like restriction enzyme cutting sites easier.
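The rounding can be illustrated with a small Python sketch. It assumes the display starts at position 1, uses 60 residues per line as in the sample output, and that the sequence is long enough to fill the last line. The function name is hypothetical and is not part of showseq.

```python
def displayed_end(send: int, width: int = 60) -> int:
    """Return the last position shown when the final line is filled to full width."""
    # Round the requested end position up to the next multiple of the line width.
    full_lines = -(-send // width)  # ceiling division
    return full_lines * width


print(displayed_end(100))  # a request for position 100 fills the line up to 120
```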

The standard list of output formats is only a small selection of the possible ways in which a sequence might be displayed. For precise control over the output, select -format 0 (option 0 in the list of things to display) and supply your own list of things. For example:

% showseq tembl:eclac -sbeg 1 -send 120
Display a sequence with features, translation etc..
Things to display
         0 : Enter your own list of things to display
         1 : Sequence only
         2 : Default sequence with features
         3 : Pretty sequence
         4 : One frame translation
         5 : Three frame translations
         6 : Six frame translations
         7 : Restriction enzyme map
         8 : Baroque
Display format [2]: 0
Specify your own things to display
         S : Sequence
         B : Blank line
         1 : Frame1 translation
         2 : Frame2 translation
         3 : Frame3 translation
        -1 : CompFrame1 translation
        -2 : CompFrame2 translation
        -3 : CompFrame3 translation
         T : Ticks line
         N : Number ticks line
         C : Complement sequence
         F : Features
         R : Restriction enzyme cut sites in forward sense
        -R : Restriction enzyme cut sites in reverse sense
         A : Annotation
Enter a list of things to display [B N T S A F]: b,s,t,c
Output file [eclac.showseq]: stdout
ECLAC
E.coli lactose operon with lacI, lacZ, lacY and lacA genes.

          gacaccatcgaatggcgcaaaacctttcgcggtatggcatgatagcgcccggaagagagt
          ----:----|----:----|----:----|----:----|----:----|----:----|
          ctgtggtagcttaccgcgttttggaaagcgccataccgtactatcgcgggccttctctca

          caattcagggtggtgaatgtgaaaccagtaacgttatacgatgtcgcagagtatgccggt
          ----:----|----:----|----:----|----:----|----:----|----:----|
          gttaagtcccaccacttacactttggtcattgcaatatgctacagcgtctcatacggcca

By choosing format "0" and specifying that we want to display the things: "b,s,t,c", we will output the sequence in the following way.

For each line of sequence, the display contains first a blank line ("b"), then the sequence itself ("s"), a ticks line marked every 10 characters ("t"), and the complement of the sequence ("c"). Note that this is the complement written in the same direction as the sequence, not the reverse complement. Subsequent lines of the sequence repeat this format.
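To make the layout concrete, here is a minimal Python sketch that mimics the "b,s,t,c" display for a DNA sequence. It is not showseq's implementation: the `render` function is hypothetical, only the four codes B, S, T, and C are handled, and a line width of 60 is assumed.

```python
# Complement table for unambiguous DNA bases; other characters are left unchanged.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def render(seq: str, things: list[str], width: int = 60) -> list[str]:
    """Return the display lines for seq, one output line per thing code per block."""
    ticks = "----:----|" * (width // 10)
    lines = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        for code in things:
            code = code.upper()
            if code == "B":
                lines.append("")                            # blank line
            elif code == "S":
                lines.append(chunk)                         # sequence
            elif code == "T":
                lines.append(ticks[:len(chunk)])            # ticks line
            elif code == "C":
                lines.append(chunk.translate(COMPLEMENT))   # complement, same direction
            else:
                raise ValueError(f"unsupported thing code: {code}")
    return lines


seq = ("gacaccatcgaatggcgcaaaacctttcgcggtatggcatgatagcgcccggaagagagt"
       "caattcagggtggtgaatgtgaaaccagtaacgttatacgatgtcgcagagtatgccggt")
print("\n".join(render(seq, "b,s,t,c".split(","))))
```

The order of the codes in the list is the order of the lines in each block, which is why changing "b,s,t,c" to "s,c,t,b" would reorder the display without changing its content.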

The "thing" codes used in the list of standard formats are:

Sequence only:	S A
Default sequence:	B N T S A F
Pretty sequence:	B N T S A
One frame translation:	B N T S B 1 A F
Three frame translations:	B N T S B 1 2 3 A F
Six frame translations:	B N T S B 1 2 3 T -3 -2 -1 A F
Restriction enzyme map:	B R S N T C -R B 1 2 3 T -3 -2 -1 A
Baroque:	B 1 2 3 N T R S T C -R T -3 -2 -1 A F

The following are some examples of different formats.

Just sequence:

% showseq embl:eclac stdout -sbeg 1 -send 120 -noname -nodesc -format 0 -thing S
Display a sequence with features, translation etc..

          gacaccatcgaatggcgcaaaacctttcgcggtatggcatgatagcgcccggaagagagt
          caattcagggtggtgaatgtgaaaccagtaacgttatacgatgtcgcagagtatgccggt

Protein sequence displayed in three-letter codes. The codes are displayed downwards, so the first code is "Met":

% showseq tsw:rs24_fugru stdout -three -format 2
RS24_FUGRU
40S RIBOSOMAL PROTEIN S24.

                  10        20        30        40        50        60
          ----:----|----:----|----:----|----:----|----:----|----:----|
          MAATVTVATALPMTAALLGALGMVVAVLHPGLATVPLTGIAGLLALMTLTTPAVVPVPGP
          esshaharhryhehsreelryleaasaeirlylharyhllrlyelyeyyhhrsaahahlh
          tnprlrlgrgsetrnguungsntllplusoysarlosruegusuastrsrroplleleye

                  70        80        90        100       110       120
          ----:----|----:----|----:----|----:----|----:----|----:----|
          ATGPGGGLTTGPAMVTASLATALLAGPLHALAAHGLPGLLLTSALGALGALAAMLLVAGT
          rhlhlllyhhlhleayseesylyyslryirelrilehlyyyherylrylrysreyyarlh
          grneyyysrryeatlrpruprassnuossguagsyueusssrrgsngsugsngtsslgyr

                  130       140       150       160       170       180
          ----:----|----:----|----:----|----:----|----:----|----:----|
          LLASVGASLLLA
          yylealleyyys
          ssarlyarsssp

Number the sequence lines in the margin:

% showseq tembl:mmam stdout -format 1 -number
Display a sequence with features, translation etc..
MMAM
Mus musculus (cell line C3H/F2-11) chromosome 12 anti-DNA antibody heavy chain mRNA.

        1 gagnnccagctgcagcagtctggacctgagctggtaaagcctggggcttcagtgaagatg 60
       61 tcctgcaaggcttctggatacacattcactagctatgttatgcactgggtgaatcagaag 120
      121 cctgggcagggccttgagtggattggatatattaatccttacaatgatggtactaactac 180
      181 aatgagaagttcaaaggcaaggccacactgacttcagacaaatcctccagcacagcctac 240
      241 atggagttcagcagcctgacctctgaggactctgcggtctattactgtgcaagaaaaact 300
      301 tcctactatagtaacctatattactttgactactggggccaaggcaccactctcacagtc 360
      361 tcctca                                                       366

Start the numbering at a specified value ("123" in this case):

% showseq tembl:mmam stdout -format 1 -number -offset 123
Display a sequence with features, translation etc..
MMAM
Mus musculus (cell line C3H/F2-11) chromosome 12 anti-DNA antibody heavy chain mRNA.

      123 gagnnccagctgcagcagtctggacctgagctggtaaagcctggggcttcagtgaagatg 182
      183 tcctgcaaggcttctggatacacattcactagctatgttatgcactgggtgaatcagaag 242
      243 cctgggcagggccttgagtggattggatatattaatccttacaatgatggtactaactac 302
      303 aatgagaagttcaaaggcaaggccacactgacttcagacaaatcctccagcacagcctac 362
      363 atggagttcagcagcctgacctctgaggactctgcggtctattactgtgcaagaaaaact 422
      423 tcctactatagtaacctatattactttgactactggggccaaggcaccactctcacagtc 482
      483 tcctca                                                       488
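The margin numbers follow directly from the offset and the line width. The Python sketch below reproduces the arithmetic; `numbered_lines` is a hypothetical helper, and a dummy 366-residue string stands in for the tembl:mmam entry.

```python
def numbered_lines(seq, offset=1, width=60):
    """Yield (first, chunk, last) for each display line, numbering from `offset`."""
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        first = offset + start            # number shown in the left margin
        last = first + len(chunk) - 1     # number shown in the right margin
        yield first, chunk, last


# Placeholder sequence with the same length as the example entry (366 bases).
seq = "a" * 366
for first, chunk, last in numbered_lines(seq, offset=123):
    print(f"{first:>9} {chunk:<60} {last}")
```

With an offset of 123 the first line runs from 123 to 182 and the last from 483 to 488, matching the session above.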

Make selected regions uppercase. Use -slower to force the rest of the sequence to be lowercase:

% showseq tembl:mmam stdout -format 1 -slower -upper '25-45,101-203,333-362'
Display a sequence with features, translation etc..
MMAM
Mus musculus (cell line C3H/F2-11) chromosome 12 anti-DNA antibody heavy chain mRNA.

          gagnnccagctgcagcagtctggaCCTGAGCTGGTAAAGCCTGGGgcttcagtgaagatg
          tcctgcaaggcttctggatacacattcactagctatgttaTGCACTGGGTGAATCAGAAG
          CCTGGGCAGGGCCTTGAGTGGATTGGATATATTAATCCTTACAATGATGGTACTAACTAC
          AATGAGAAGTTCAAAGGCAAGGCcacactgacttcagacaaatcctccagcacagcctac
          atggagttcagcagcctgacctctgaggactctgcggtctattactgtgcaagaaaaact
          tcctactatagtaacctatattactttgactaCTGGGGCCAAGGCACCACTCTCACAGTC
          TCctca
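The combined effect of -slower and -upper can be sketched in Python as follows. This is only an illustration of the behavior seen in the output above, with 1-based inclusive ranges; `apply_case` is a hypothetical name.

```python
def apply_case(seq, upper_ranges, force_lower=True):
    """Uppercase the 1-based inclusive ranges; optionally lowercase everything else first."""
    chars = list(seq.lower() if force_lower else seq)
    for start, end in upper_ranges:
        # Convert to 0-based indices and stay within the sequence.
        for i in range(max(start, 1) - 1, min(end, len(chars))):
            chars[i] = chars[i].upper()
    return "".join(chars)


line1 = "gagnnccagctgcagcagtctggacctgagctggtaaagcctggggcttcagtgaagatg"
print(apply_case(line1, [(25, 45)]))
```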

Translate selected regions:

% showseq tembl:mmam stdout -format 4 -send 120 -trans 25-49,66-76
Display a sequence with features, translation etc..
MMAM
Mus musculus (cell line C3H/F2-11) chromosome 12 anti-DNA antibody heavy chain mRNA.

                  10        20        30        40        50        60
          ----:----|----:----|----:----|----:----|----:----|----:----|
          gagnnccagctgcagcagtctggacctgagctggtaaagcctggggcttcagtgaagatg
                                  P  E  L  V  K  P  G  A  S

                  70        80        90        100       110       120
          ----:----|----:----|----:----|----:----|----:----|----:----|
          tcctgcaaggcttctggatacacattcactagctatgttatgcactgggtgaatcagaag
                 R  L  L
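The residues in this output are consistent with the two regions being joined end to end before translation, so that the last base of 25-49 and the first two bases of 66-76 form the codon for S. That reading is inferred from the sample output, not taken from the showseq documentation. The Python sketch below reproduces it with the standard genetic code; `translate_regions` is a hypothetical helper.

```python
# Standard genetic code, built from the usual compact TCAG ordering.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_regions(seq, regions):
    """Join the 1-based inclusive regions, then translate the result in frame 1."""
    joined = "".join(seq[start - 1:end] for start, end in regions).lower()
    usable = len(joined) - len(joined) % 3   # ignore a trailing partial codon
    # Codons containing ambiguity codes such as 'n' are reported as X.
    return "".join(CODON_TABLE.get(joined[i:i + 3], "X") for i in range(0, usable, 3))


seq = ("gagnnccagctgcagcagtctggacctgagctggtaaagcctggggcttcagtgaagatg"
       "tcctgcaaggcttctggatacacattcactagctatgttatgcactgggtgaatcagaag")
print(translate_regions(seq, [(25, 49), (66, 76)]))
```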

Add your own annotation to the display:

% showseq tembl:mmam stdout -format 2 -send 120 -annotation '13-26 binding site 15-15 SNP'
Display a sequence with features, translation etc..
MMAM
Mus musculus (cell line C3H/F2-11) chromosome 12 anti-DNA antibody heavy chain mRNA.

                  10        20        30        40        50        60
          ----:----|----:----|----:----|----:----|----:----|----:----|
          gagnnccagctgcagcagtctggacctgagctggtaaagcctggggcttcagtgaagatg
                      |------------|
                      binding site
                        |
                        SNP

                  70        80        90        100       110       120
          ----:----|----:----|----:----|----:----|----:----|----:----|
          tcctgcaaggcttctggatacacattcactagctatgttatgcactgggtgaatcagaag

Mandatory qualifiers (bold if not always prompted):

[-sequence] (seqall): Sequence database USA.
-format (menu): Display format.
-things (menu): Specify a list of one or more code characters in the order in which you want things to be displayed. If you want to see things displayed in the order: sequence, complement sequence, ticks line, frame 1 translation, and blank line, enter: S,C,T,1,B.
[-outfile] (outfile): If you enter the name of a file here, this program will write the sequence details into that file.

Optional qualifiers:

-translate (range)

Regions to translate (if translating). If this is left blank the complete sequence is translated. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are:

24-45, 56-78
1:45, 67=99;765..888
1,5,8,10,23,45,57,99

-uppercase (range)

Regions to put in uppercase. If this is left blank, the sequence case is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are:

24-45, 56-78
1:45, 67=99;765..888
1,5,8,10,23,45,57,99
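Because any non-digit, non-alpha character acts as a separator, a region specification reduces to a flat list of integers read in pairs. A minimal Python sketch of that rule follows; `parse_ranges` is a hypothetical name, and rejecting an odd number of positions is an assumption rather than documented showseq behavior.

```python
import re


def parse_ranges(spec):
    """Parse a region specification into a list of (start, end) pairs."""
    # Every run of digits is a position; whatever lies between them is a separator.
    numbers = [int(n) for n in re.findall(r"\d+", spec)]
    if len(numbers) % 2:
        raise ValueError("positions must come in start/end pairs")
    return list(zip(numbers[0::2], numbers[1::2]))


for spec in ("24-45, 56-78", "1:45, 67=99;765..888", "1,5,8,10,23,45,57,99"):
    print(spec, "->", parse_ranges(spec))
```

Note that the last example is read as four regions (1-5, 8-10, 23-45, 57-99), not as eight single positions.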

-highlight (range)

Regions to color if formatting for HTML. If this is left blank, the sequence is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are followed by any valid HTML font color. Examples of region specifications are:

24-45 blue 56-78 orange
1-100 green 120-156 red

A file of ranges to color (one range per line) can be specified as @filename.

-annotation (range)

Regions to annotate by marking. If this is left blank, no annotation is added. A set of regions is specified by a set of pairs of positions followed by optional text. The positions are integers. They are followed by any text (but not digits when on the command-line). Examples of region specifications are:

24-45 new domain 56-78 match to Mouse
1-100 First part 120-156 oligo

A file of ranges to annotate (one range per line) can be specified as @filename.
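Ranges that carry text, as used by -highlight and -annotation, can be parsed in a similar way. The Python sketch below assumes that the text after each pair of positions runs up to the next digit, which matches the note that annotation text cannot contain digits on the command line; `parse_labelled_ranges` is a hypothetical helper.

```python
import re

# Two positions separated by non-alphanumeric characters, followed by digit-free text.
RANGE_WITH_TEXT = re.compile(r"(\d+)[^0-9A-Za-z]+(\d+)([^0-9]*)")


def parse_labelled_ranges(spec):
    """Parse 'start-end text start-end text ...' into (start, end, text) tuples."""
    result = []
    for match in RANGE_WITH_TEXT.finditer(spec):
        start, end, text = match.groups()
        # Collapse runs of whitespace so wrapped command lines give clean labels.
        result.append((int(start), int(end), " ".join(text.split())))
    return result


print(parse_labelled_ranges("13-26 binding site 15-15 SNP"))
print(parse_labelled_ranges("24-45 blue 56-78 orange"))
```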

-enzymes (string)

The argument all reads all enzyme names from the REBASE database. You can specify enzymes by giving their names separated by commas, such as: HincII,hinfI,ppiI,hindiii. Enzyme names are not case-sensitive. You may also read the enzyme names from a file by prefixing the name of the file with an @ character; for example, @enz.list. Blank lines and lines starting with a comment character (# or !) within the file are ignored; all other lines are joined with commas and treated as the list of enzymes to search for. A file containing enzyme names might look like this:

! my enzymes
HincII, ppiII
! other enzymes
hindiii
HinfI
PpiI
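The file rules above (skip blank lines and comment lines, join the rest with commas) can be sketched in Python as shown here. `read_enzyme_list` is a hypothetical helper that takes the file contents as a string; trimming whitespace around each name is an added normalization, not something the description states.

```python
def read_enzyme_list(text):
    """Turn the contents of an enzyme list file into one comma-separated string."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and lines that start with a comment character.
        if not line or line[0] in "#!":
            continue
        # A line may itself hold several comma-separated names.
        names.extend(name.strip() for name in line.split(",") if name.strip())
    return ",".join(names)


example = """! my enzymes
HincII, ppiII
! other enzymes
hindiii
HinfI
PpiI
"""
print(read_enzyme_list(example))
```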

-table (menu)

Genetic code to use for translation. See the fuzztran description for the list of codes.

-matchsource (string)

By default, any feature source in the feature table is shown. You can set this to match any feature source you want to show. The source name is usually the name of the program that detected the feature, or the feature table (e.g., EMBL) that the feature came from. The source may be wildcarded by using *. If you want to show more than one source, separate their names with the | character. For example:

gene* | embl

-matchtype (string)

By default, any feature type in the feature table is shown. You can set this to match any feature type you want to show. See Chapter 2 for a list of the EMBL feature types, and Chapter 3 for a list of the SWISS-PROT feature types. The type may be wildcarded by using the * character. If you want to show more than one type, separate their names with the | character. For example:

*UTR | intron
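The pattern syntax shared by -matchsource and -matchtype (alternatives separated by |, each optionally wildcarded with *) can be mimicked with Python's standard fnmatch module. This sketch is not how showseq itself matches; in particular, the case-insensitive comparison is an assumption, and `matches` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def matches(value, pattern_spec):
    """Return True if value matches any of the '|'-separated wildcard patterns."""
    patterns = [p.strip() for p in pattern_spec.split("|")]
    # Assumption: compare case-insensitively by lowercasing both sides.
    return any(fnmatchcase(value.lower(), p.lower()) for p in patterns if p)


for feature_type in ("5'UTR", "intron", "CDS"):
    print(feature_type, matches(feature_type, "*UTR | intron"))
```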

-matchsense (integer)

By default, features in any sense are shown. You can set this to match the feature sense you want to show: 0 = any sense, 1 = forward sense, and -1 = reverse sense.

-minscore (float)

Minimum score of a feature to show. If this is greater than or equal to the maximum score, any score is permitted.

-maxscore (float)

Maximum score of a feature to show. If this is less than or equal to the minimum score, any score is permitted.

-matchtag (string)

Tags are the types of extra values that a feature may have. For example, in the EMBL feature table, a CDS type of feature may have the tags /codon, /codon_start, /db_xref, /EC_number, /evidence, /exception, /function, /gene, /label, /map, /note, /number, /partial, /product, /protein_id, /pseudo, /standard_name, /translation, /transl_except, /transl_table, or /usedin. Some of these tags also have values (e.g., /gene can have the value of the gene name). By default, any feature tag in the feature table is extracted. You can set this to match any feature tag you want to show. The tag may be wildcarded by using *. If you want to extract more than one tag, separate their names with the | character. For example:

gene | label

-matchvalue (string)

Tag values are the values associated with a feature tag. Tags are the types of extra values that a feature may have. For example, in the EMBL feature table, a CDS type of feature may have the tags /codon, /codon_start, /db_xref, /EC_number, /evidence, /exception, /function, /gene, /label, /map, /note, /number, /partial, /product, /protein_id, /pseudo, /standard_name, /translation, /transl_except, /transl_table, or /usedin. Some of these tags also have values (e.g., /gene can have the value of the gene name). By default, any feature tag value in the feature table is shown. You can set this to match any feature tag value you want to show. The tag value may be wildcarded by using *. If you want to show more than one tag value, separate the values with the | character. For example:

pax* | 10

Advanced qualifiers:

-orfminsize (integer): Minimum size of Open Reading Frames (ORFs) to display in the translations.
-flatreformat (boolean): Display restriction enzyme sites in flat format.
-mincuts (integer): Minimum cuts per restriction enzyme.
-maxcuts (integer): Maximum cuts per restriction enzyme.
-sitelen (integer): Minimum recognition site length.
-single (boolean): Force single restriction enzyme site only cuts.
-[no]blunt (boolean): Allow blunt end restriction enzyme cutters.
-[no]sticky (boolean): Allow sticky end restriction enzyme cutters.
-[no]ambiguity (boolean): Allow ambiguous restriction enzyme matches.
-plasmid (boolean): Allow circular DNA.
-[no]commercial (boolean): Only use restriction enzymes with suppliers.
-[no]limit (boolean): Limits restriction enzyme hits to one isoschizomer.
-preferred (boolean): Report preferred isoschizomers.
-threeletter (boolean): Display protein sequences in three-letter code.
-number (boolean): Number the sequences.
-width (integer): Width of sequence to display.
-length (integer): Line length of page (0 for indefinite).
-margin (integer): Margin around sequence for numbering.
-[no]name (boolean): Set this to false if you do not want to display the ID name of the sequence.
-[no]description (boolean): Set this to false if you do not want to display the description of the sequence.
-offset (integer): Offset to start numbering the sequence from.
-html (boolean): Use HTML formatting.