2.4 GenBank/DDBJ Field Definitions

The field terms in GenBank/DDBJ sequence flat files organize each entry's information for both human readability and machine parsing. A flat file contains many of these fields, and the two repositories share the same field definitions. Table 2-1 summarizes each field definition.
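Because each field begins with a keyword at a fixed position, a program can group the lines of an entry by field. The following is a minimal sketch, assuming the usual GenBank column layout: top-level keywords in column 1, subkeywords such as ORGANISM or AUTHORS in column 3, the first 12 columns reserved for the keyword, and continuation lines indented. That layout is an assumption not stated in this section, and the `KeywordBlock` and `group_fields` names are hypothetical. For real work, an established parser such as Biopython's `SeqIO.parse(handle, "genbank")` is the safer choice.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KeywordBlock:
    """One keyword (e.g. LOCUS) or subkeyword (e.g. ORGANISM) with its data."""
    keyword: str
    lines: list[str] = field(default_factory=list)
    subfields: list[KeywordBlock] = field(default_factory=list)


def group_fields(record_text: str) -> list[KeywordBlock]:
    """Group the lines of one flat-file entry by keyword.

    Assumed layout (a simplification): keywords start in column 1,
    subkeywords start in column 3, the first 12 columns hold the
    keyword, and continuation lines are indented.
    """
    blocks: list[KeywordBlock] = []
    current: KeywordBlock | None = None  # block that continuation lines extend
    for raw in record_text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw.startswith("//"):
            blocks.append(KeywordBlock("//"))  # entry terminator
            current = None
        elif not raw[0].isspace():
            # Top-level keyword in column 1 (LOCUS, SOURCE, ORIGIN, ...).
            current = KeywordBlock(raw[:12].strip(), [raw[12:].strip()])
            blocks.append(current)
        elif raw[:2] == "  " and not raw[2].isspace() and blocks:
            # Subkeyword in column 3, attached to the latest keyword.
            current = KeywordBlock(raw[:12].strip(), [raw[12:].strip()])
            blocks[-1].subfields.append(current)
        elif current is not None:
            # Continuation line, feature-table line, or sequence line.
            current.lines.append(raw.strip())
    return blocks


# Hypothetical demonstration entry (not a real record).
sample = "\n".join([
    "LOCUS".ljust(12) + "DEMO0001  12 bp  DNA  linear",
    "DEFINITION".ljust(12) + "Hypothetical demonstration sequence.",
    "SOURCE".ljust(12) + "synthetic construct",
    "  ORGANISM".ljust(12) + "synthetic construct",
    " " * 12 + "other sequences; artificial sequences.",
    "ORIGIN",
    "        1 gattacagat ta",
    "//",
])

for block in group_fields(sample):
    print(block.keyword, block.lines, [s.keyword for s in block.subfields])
```

Keeping subkeywords nested under their parent keyword mirrors the table's distinction between keywords and subkeywords; continuation lines always extend the most recent block, so the taxonomy lines after ORGANISM stay with ORGANISM rather than SOURCE.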

Table 2-1. GenBank/DDBJ field definitions

LOCUS
A short mnemonic name for the entry, chosen to suggest the sequence's definition. Mandatory keyword/exactly one record.

DEFINITION
A concise description of the sequence. Mandatory keyword/one or more records.

ACCESSION
The primary accession number is a unique, unchanging code assigned to each entry. Mandatory keyword/one or more records.

VERSION
A compound identifier consisting of the primary accession number and a numeric version number associated with the current version of the sequence data in the record. This is followed by an integer key (a "GI") assigned to the sequence by NCBI. Mandatory keyword/exactly one record.

NID
An alternative method of presenting the NCBI GI identifier (described above). The NID is obsolete and was removed from the GenBank flat file format in December 1999.

KEYWORDS
Short phrases describing gene products and other information about an entry. Mandatory keyword in all annotated entries/one or more records.

SEGMENT
Information on the order in which this entry appears in a series of discontinuous sequences from the same molecule. Optional keyword (only in segmented entries)/exactly one record.

SOURCE
Common name of the organism or the name most frequently used in the literature. Mandatory keyword in all annotated entries/one or more records/includes one subkeyword.

  ORGANISM
Formal scientific name of the organism (first line) and taxonomic classification levels (second and subsequent lines). Mandatory subkeyword in all annotated entries/two or more records.

REFERENCE
Citations for all articles containing data reported in this entry. Includes four subkeywords and may repeat. Mandatory keyword/one or more records.

  AUTHORS
Lists the authors of the citation. Mandatory subkeyword/one or more records.

  TITLE
Full title of citation. Optional subkeyword (present in all but unpublished citations)/one or more records.

  JOURNAL
Lists the journal name, volume, year, and page numbers of the citation. Mandatory subkeyword/one or more records.

  MEDLINE
Provides the Medline unique identifier for a citation. Optional subkeyword/one record.

  PUBMED
Provides the PubMed unique identifier for a citation. Optional subkeyword/one record.

  REMARK
Specifies the relevance of a citation to an entry. Optional subkeyword/one or more records.

COMMENT
Cross-references to other sequence entries, comparisons to other collections, notes of changes in LOCUS names, and other remarks. Optional keyword/one or more records/may include blank records.

FEATURES
Table containing information on portions of the sequence that code for proteins and RNA molecules and information on experimentally determined sites of biological significance. Optional keyword/one or more records.

BASE COUNT
Summary of the number of occurrences of each base code in the sequence. Mandatory keyword/exactly one record.

ORIGIN
Specification of how the first base of the reported sequence is operationally located within the genome. Where possible, this includes its location within a larger genetic map. Mandatory keyword/exactly one record.

//
Entry termination symbol. Mandatory at the end of an entry/exactly one record.
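Some of these definitions describe values a program can check mechanically: VERSION combines the accession, a version number, and (in the format described here) a GI, while BASE COUNT summarizes the bases listed in the sequence lines that follow ORIGIN. The following is a minimal sketch, assuming whitespace-separated data such as `AF123456.1  GI:1234567` and `5 a 1 c 2 g 4 t` (hypothetical values, not from a real record). It splits VERSION into its parts and checks a BASE COUNT summary against the sequence. All function names are hypothetical, and the GI is treated as optional.

```python
from __future__ import annotations

import re
from collections import Counter


def parse_version(data: str) -> tuple[str, int, int | None]:
    """Split VERSION data such as 'AF123456.1  GI:1234567' into
    (accession, version, gi). The GI token is treated as optional."""
    tokens = data.split()
    accession, _, version = tokens[0].rpartition(".")
    gi = None
    for tok in tokens[1:]:
        if tok.startswith("GI:"):
            gi = int(tok[3:])
    return accession, int(version), gi


def sequence_from_origin(seq_lines: list[str]) -> str:
    """Join the sequence lines that follow ORIGIN, dropping the position
    numbers and spaces so only the base letters remain."""
    return "".join(re.sub(r"[^A-Za-z]", "", line) for line in seq_lines).lower()


def parse_base_count(data: str) -> dict[str, int]:
    """Turn BASE COUNT data such as '5 a 1 c 2 g 4 t' into {'a': 5, ...}."""
    tokens = data.split()
    return {base.lower(): int(n) for n, base in zip(tokens[0::2], tokens[1::2])}


def base_count_matches(base_count_data: str, seq_lines: list[str]) -> bool:
    """Compare the BASE COUNT summary with the bases actually present.
    Only single-letter codes are checked; an 'others' entry is skipped."""
    observed = Counter(sequence_from_origin(seq_lines))
    expected = parse_base_count(base_count_data)
    return all(observed[b] == n for b, n in expected.items() if len(b) == 1)


# Hypothetical values, not taken from a real record.
print(parse_version("AF123456.1  GI:1234567"))  # ('AF123456', 1, 1234567)
seq_lines = ["        1 gattacagat ta"]
print(base_count_matches("5 a 1 c 2 g 4 t", seq_lines))  # True
```

Splitting on the last period with `rpartition` keeps any periods inside the accession itself intact, and stripping every non-letter character from the sequence lines discards the position numbers without needing to know their exact column widths.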

Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases
ISBN: 059600494X
Year: 2005
Pages: 312