Part I: Data Formats | Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases

Bioinformatics as we know it today exists because of the vast number of sequence databases created in the last fifteen years. Many of these databases were constructed by scientists who needed a way to organize and annotate the data generated by their efficient large-scale sequencing machines. Because these sequence files had to be readable by both computers and humans, most sequence databases were designed around a flat file format. Many sequence formats are available, but the flat file format is the one most commonly used in sequence analysis. In this section, we explain the more popular flat file formats (GenBank, EMBL, etc.) and describe their sometimes cryptic content in detail. For easy comparison, we use the same sequence (cyclin-dependent kinase 2) in each of the flat file examples. To give a complete picture of the chosen databases, we also summarize the feature terms used in the selected sequence flat files.

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5