3.2 SWISS-PROT Field Definitions

Each line of a SWISS-PROT (or TrEMBL) sequence flat file begins with a two-letter field code (line code). These codes organize the information for both human readability and machine-based parsing. Table 3-1 summarizes the contents of each field code.

Table 3-1. SWISS-PROT field definitions

Line code   Content
ID          Identification
AC          Accession number(s)
DT          Date
DE          Description
GN          Gene name(s)
OS          Organism species
OG          Organelle
OC          Organism classification
OX          Taxonomy cross-reference(s)
RN          Reference number
RP          Reference position
RC          Reference comment(s)
RX          Reference cross-reference(s)
RA          Reference authors
RT          Reference title
RL          Reference location
CC          Comments or notes
DR          Database cross-references
KW          Keywords
FT          Feature table data
SQ          Sequence header
(blanks)    Sequence data
//          Termination line
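The line codes in Table 3-1 lend themselves to a simple grouping step. The following is a minimal Python sketch, not an official parser: it assumes the line code occupies the first two columns and that the data starts at column 6. The sample entry, the function name `group_by_line_code`, and the `"(blanks)"` key used for sequence data are all made up for illustration.

```python
from collections import defaultdict

# Made-up entry for illustration only; not a real SWISS-PROT record.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN  STANDARD;      PRT;    10 AA.
AC   X00000;
DE   HYPOTHETICAL EXAMPLE PROTEIN.
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
"""

def group_by_line_code(entry_text):
    """Group the lines of one entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.strip():
            continue                    # skip empty lines
        if line.startswith("//"):
            break                       # termination line ends the entry
        code = line[:2]
        if code.strip() == "":
            code = "(blanks)"           # sequence data lines have a blank code
        # Assumption: the data portion starts at column 6.
        fields[code].append(line[5:].rstrip())
    return dict(fields)

fields = group_by_line_code(SAMPLE_ENTRY)
print(fields["AC"])
print(fields["(blanks)"])
```

Keeping a list per line code preserves multi-line fields (for example, several CC or FT lines) in their original order.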

3.3 SWISS-PROT Feature Table

Each feature is identified by a feature key: a single word or abbreviation indicating a functional role or region associated with a sequence. A list of SWISS-PROT features (organized by feature type) is presented below. An example FT line, giving the key, the start and end positions, and an optional description, is included for most features to illustrate how a sequence location or region is described.

3.3.1 Change Indicators

CONFLICT

Different papers report differing sequences:

FT   CONFLICT    304    304       MISSING (IN REF. 3).
MUTAGEN

Indicates an experimentally altered site:

FT   MUTAGEN      65     65       H->F: 100% ACTIVITY LOSS.
VARIANT

Authors report that sequence variants exist:

FT   VARIANT     136    136       M -> I.
VARSPLIC

Describes sequence variants produced by alternative splicing:

FT   VARSPLIC     33     49       MISSING (IN SHORT ISOFORM).
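
The FT lines shown above all follow the same layout: the line code, a feature key, a start position, an end position, and an optional description. The following Python sketch splits such a line on whitespace. It is an illustration under stated assumptions rather than a complete parser: the names `Feature` and `parse_ft_line` are hypothetical, positions are kept as strings because this section does not say whether they are always plain integers, and continuation lines are not handled.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    key: str          # feature key, e.g. CONFLICT
    start: str        # "from" position, kept as text
    end: str          # "to" position, kept as text
    description: str  # free text; empty if the line has none

def parse_ft_line(line):
    """Split one FT line into key, start, end, and description."""
    parts = line.split(None, 4)          # at most five whitespace-separated parts
    if len(parts) < 4 or parts[0] != "FT":
        raise ValueError(f"not a feature table line: {line!r}")
    description = parts[4].strip() if len(parts) == 5 else ""
    return Feature(parts[1], parts[2], parts[3], description)

examples = [
    "FT   CONFLICT    304    304       MISSING (IN REF. 3).",
    "FT   MUTAGEN      65     65       H->F: 100% ACTIVITY LOSS.",
    "FT   VARIANT     136    136       M -> I.",
    "FT   VARSPLIC     33     49       MISSING (IN SHORT ISOFORM).",
]
for line in examples:
    print(parse_ft_line(line))
```

Splitting on whitespace rather than on fixed column offsets makes the sketch tolerant of spacing differences, at the cost of not validating the column layout.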

3.3.2 Amino Acid Modifications

BINDING

Binding site for chemical group (co-enzyme, prosthetic group, etc.):

FT   BINDING      14     14       HEME (COVALENT).
CARBOHYD

Glycosylation site:

FT   CARBOHYD     53     53       N-LINKED (GLCNAC...) (POTENTIAL).
DISULFID

Disulfide bond:

FT   DISULFID     23     84       PROBABLE.
LIPID

Covalent binding of a lipid moiety:

FT   LIPID         2      2       MYRISTATE.

Table 3-2 lists the attached groups that are currently defined.

Table 3-2. SWISS-PROT lipid moiety attached groups

Attached group      Description
MYRISTATE           Myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue.
PALMITATE           Palmitate group attached through a thioester bond to a cysteine residue or through an ester bond to a serine or threonine residue.
FARNESYL            Farnesyl group attached through a thioether bond to a cysteine residue.
GERANYL-GERANYL     Geranyl-geranyl group attached through a thioether bond to a cysteine residue.
GPI-ANCHOR          Glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein.
N-ACYL DIGLYCERIDE  N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages.

METAL

Binding site for a metal ion:

FT   METAL        28     28       COPPER (POTENTIAL).
MOD_RES

Posttranslational modification of a residue:

FT   MOD_RES     686    686       PHOSPHORYLATION (BY PKC).

Table 3-3 lists the most frequent modifications.

Table 3-3. Frequently used SWISS-PROT amino acid modifications

Modification                 Description
ACETYLATION                  N-terminal or other.
AMIDATION                    Generally at the C-terminal of a mature active peptide.
BLOCKED                      Undetermined N- or C-terminal blocking group.
FORMYLATION                  Of the N-terminal methionine.
GAMMA-CARBOXYGLUTAMIC ACID   Of glutamate.
HYDROXYLATION                Of asparagine, aspartic acid, proline or lysine.
METHYLATION                  Generally of lysine or arginine.
PHOSPHORYLATION              Of serine, threonine, tyrosine, aspartic acid or histidine.
PYRROLIDONE CARBOXYLIC ACID  N-terminal glutamate which has formed an internal cyclic lactam. This is also called "pyro-Glu".
SULFATION                    Generally of tyrosine.

SE_CYS

Selenocysteine:

FT   SE_CYS       52     52
THIOETH

Thioether bond.

THIOLEST

Thiolester bond.
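
Several of the example descriptions in this section end with a qualifier such as (POTENTIAL) or PROBABLE, marking the annotation as not experimentally confirmed. The following Python sketch separates such a trailing qualifier from the rest of the description. It is only an illustration: it covers just the two qualifiers that appear in the examples above, and the function name `split_qualifier` is hypothetical.

```python
import re

# Only the qualifiers seen in this section's examples; an assumption,
# not a complete list.
_QUALIFIER_RE = re.compile(r"\(?\b(POTENTIAL|PROBABLE)\b\)?\.?\s*$")

def split_qualifier(description):
    """Return (text, qualifier); qualifier is None when none is found."""
    match = _QUALIFIER_RE.search(description)
    if match is None:
        return description, None
    return description[:match.start()].strip(), match.group(1)

for text in [
    "N-LINKED (GLCNAC...) (POTENTIAL).",
    "PROBABLE.",
    "COPPER (POTENTIAL).",
    "HEME (COVALENT).",
]:
    print(split_qualifier(text))
```

Descriptions without a recognized qualifier, such as HEME (COVALENT)., are returned unchanged so that no annotation text is lost.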

3.3.3 Regions

CA_BIND

Extent of a calcium-binding region:

FT   CA_BIND     759    770       EF-HAND 1 (POTENTIAL).
CHAIN

Extent of a polypeptide chain in the mature protein:

FT   CHAIN        21    119       BETA-2 MICROGLOBULIN.
DNA_BIND

Extent of a DNA-binding region:

FT   DNA_BIND     69    128       HOMEOBOX.
DOMAIN

Extent of a domain of interest on the sequence:

FT   DOMAIN       22    788       EXTRACELLULAR (POTENTIAL).
NP_BIND

Extent of a nucleotide phosphate-binding region:

FT   NP_BIND      13     25       ATP.
PEPTIDE

Extent of a released active peptide:

FT   PEPTIDE      13    107       NEUROPHYSIN 2.
PROPEP

Extent of a propeptide:

FT   PROPEP      550    574       REMOVED IN MATURE FORM.
REPEAT

Extent of an internal sequence repetition:

FT   REPEAT      225    307       1.
SIGNAL

Extent of a signal sequence (prepeptide).

SIMILAR

Extent of a similarity with another protein sequence:

FT   SIMILAR     139    153       STRONG WITH CA-BINDING EF-HAND SEQUENCE.
TRANSIT

Extent of a transit peptide (mitochondrial, chloroplastic, thylakoid, cyanelle or for a microbody):

FT   TRANSIT       1     25       MITOCHONDRION.
TRANSMEM

Extent of a transmembrane region.

ZN_FING

Extent of a zinc finger region:

FT   ZN_FING     319    343       GATA-TYPE.

3.3.4 Secondary Structure

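Secondary structure describes the local conformation of the polypeptide backbone, such as helices, strands, and turns. Table 3-4 lists the single-letter secondary structure codes (the classes assigned by the DSSP program) and the feature type under which each is recorded in the feature table.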

Table 3-4. SWISS-PROT secondary structure codes

Abbreviation  Description                                     Type
B             Residue in an isolated beta-bridge              STRAND
E             Hydrogen-bonded beta-strand (extended strand)   STRAND
G             3(10) helix                                     HELIX
H             Alpha-helix                                     HELIX
I             Pi-helix                                        HELIX
S             Bend (five-residue bend centered at residue i)  Not specified
T             H-bonded turn (3-turn, 4-turn or 5-turn)        TURN

For example:

FT   HELIX         4     14
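
The mapping in Table 3-4 can be written as a small lookup table. The following Python sketch collapses a per-residue string of single-letter codes into runs of HELIX, STRAND, and TURN and prints them in the FT layout shown above. The input string, the function names, and the choice to merge adjacent codes of the same type into one run are assumptions made for illustration; this section does not describe how the database itself derives these features.

```python
from itertools import groupby

# Code-to-type mapping taken from Table 3-4; "S" has no type and is skipped.
CODE_TO_FEATURE_KEY = {
    "B": "STRAND", "E": "STRAND",
    "G": "HELIX", "H": "HELIX", "I": "HELIX",
    "T": "TURN",
}

def structure_runs(codes):
    """Return (key, start, end) tuples with 1-based inclusive positions."""
    runs = []
    position = 1
    # Consecutive residues that map to the same type form one run.
    for key, group in groupby(codes, key=CODE_TO_FEATURE_KEY.get):
        length = sum(1 for _ in group)
        if key is not None:              # unmapped codes produce no feature
            runs.append((key, position, position + length - 1))
        position += length
    return runs

def format_ft(key, start, end):
    """Format a run in the column layout of the FT examples."""
    return f"FT   {key:<8} {start:>6} {end:>6}"

# Hypothetical per-residue assignment: 3 bends, 11 helix, 2 turn, 4 strand.
codes = "SSS" + "H" * 11 + "TT" + "EEEE"
for run in structure_runs(codes):
    print(format_ft(*run))
```

With this input the first run covers residues 4 through 14, which reproduces the HELIX example line above.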

3.3.5 Others

ACT_SITE

Amino acid(s) involved in the activity of an enzyme:

FT   ACT_SITE    193    193       ACCEPTS A PROTON DURING CATALYSIS.
INIT_MET

Initiator methionine:

FT   INIT_MET      0      0
NON_CONS

Non-consecutive residues:

FT   NON_CONS   1683   1684
NON_TER

The residue at an extremity of the sequence is not the terminal residue:

FT   NON_TER     129    129
SITE

Any other interesting site on the sequence:

FT   SITE        759    760       CLEAVAGE (BY THROMBIN).
UNSURE

Uncertainties in the sequence.