1.2 NCBI's Non-Redundant Database Syntax

You should be aware of one additional syntax, which NCBI uses for its non-redundant database. Because the whole point of the database is to list each sequence only once, the description line syntax allows more than one set of identifier and description per entry. The sets are delimited by Ctrl-A characters. Here's what NCBI has to say about it:

These files are all non-redundant; identical sequences are merged into one entry. To be merged two sequences must have identical lengths and every residue (or basepair) at every position must be the same. The FASTA deflines for the different entries that belong to one sequence are separated by control-A's (^A). In the following example, both entries gi|1469284 and gi|1477453 have the same sequence, in every respect.

>gi|1469284 (U05042) afuC gene product [Actinobacillus pleuropneumoniae]^Agi|1477453 (U04954) afuC gene product [Actinobacillus pleuropneumoniae]
MNNDFLVLKNITKSFGKATVIDNLDLVIKRGTMVTLLGPSGCGKTTVLRLVAGLENPTSGQIFIDGEDVT
KSSIQNRDICIVFQSYALFPHMSIGDNVGYGLRMQGVSNEERKQRVKEALELVDLAGFADRFVDQISGGQ
QQRVALARALVLKPKVLILDEPLSNLDANLRRSMREKIRELQQRLGITSLYVTHDQTEAFAVSDEVIVMN
KGTIMQKARQKIFIYDRILYSLRNFMGESTICDGNLNQGTVSIGDYRFPLHNAADFSVADGACLVGVRPE
AIRLTATGETSQRCQIKSAVYMGNHWEIVANWNGKDVLINANPDQFDPDATKAFIHFTEQGIFLLNKE
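To make the delimiter rule concrete, here is a minimal Python sketch that splits a description line of this kind into its identifier and description sets. The function name split_nr_defline and its caret_notation flag are hypothetical, introduced only for illustration. The sketch assumes that each set is an identifier followed by a space and a free-text description, as in the example above, and that a real file holds the actual Ctrl-A byte (0x01) rather than the printed "^A".

```python
CTRL_A = "\x01"  # the Ctrl-A (^A) delimiter, ASCII 0x01


def split_nr_defline(defline, caret_notation=False):
    """Split an nr-style description line into (identifier, description) pairs.

    Hypothetical helper for illustration; assumes each set looks like
    "<identifier> <description>".
    """
    # Drop the leading ">" that marks a FASTA description line.
    text = defline[1:] if defline.startswith(">") else defline
    if caret_notation:
        # Printed examples show the control character as the two
        # characters "^A"; convert them to the real delimiter first.
        text = text.replace("^A", CTRL_A)
    entries = []
    for entry in text.split(CTRL_A):
        entry = entry.strip()
        if not entry:
            continue
        # The identifier runs up to the first space; the rest is description.
        identifier, _, description = entry.partition(" ")
        entries.append((identifier, description.strip()))
    return entries


defline = (
    ">gi|1469284 (U05042) afuC gene product [Actinobacillus pleuropneumoniae]"
    "\x01gi|1477453 (U04954) afuC gene product [Actinobacillus pleuropneumoniae]"
)
for identifier, description in split_nr_defline(defline):
    print(identifier, "->", description)
```

Passing caret_notation=True lets the same function handle text copied from documentation, where the delimiter appears as a literal "^A". A description that itself contains "^A" would be split incorrectly in that mode, so it is best left off for real database files.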


Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases
ISBN: 059600494X
Year: 2005
Pages: 312
