4.2 Pfam Field Definitions

The field codes found in a Pfam flat file make each entry easier for people to read and for programs to parse. A typical entry contains several two-letter Pfam field codes. Table 4-1 provides definitions and descriptions of these codes.

Table 4-1. Pfam field definitions

Field  Definition                  Description
-----  --------------------------  ----------------------------------------
AC     Accession number            PFxxxxx or PBxxxxxx.
ID     Identification              15 characters or less.
DE     Definition                  80 characters or less.
AU     Author                      Author of the entry.
AL     Alignment method of seed    Method used to align the seed members.
                                   Approved AL lines are: Clustalv, Clustalw,
                                   Clustalw_mask_xxxx, Domainer,
                                   HMM_built_from_alignment,
                                   HMM_simulated_annealing, Manual,
                                   Prosite_pattern, Prodom,
                                   Structure_superposition, pftools, Unknown.
BM     HMM building command lines
SE     Source of seed              The source suggesting seed members belong
                                   to a family.
GA     Gathering threshold         Search threshold to build the full
                                   alignment.
NC     Noise cutoff                Bit scores of the highest scoring match
                                   not in the full alignment.
TC     Trusted cutoff              Bit scores of the lowest scoring match in
                                   the full alignment.
TP     Type field                  Compulsory field describing the type of
                                   family. At present it can be one of:
                                   Family, Domain, Repeat, Motif.
PI     Previous IDs
DC     Database Comment            Comment for database reference.
DR     Database Reference          Reference to external database.
RC     Reference Comment           Comment for literature reference.
RN     Reference Number            Digit in square brackets.
RM     Reference Medline           Eight-digit number.
RT     Reference Title             Title of paper.
RA     Reference Author            Author of paper.
RL     Reference Location          Location of paper.
CC     Comment                     Comment lines provide annotation and other
                                   information.
NE     Pfam accession              Indicates those cases where there is a
                                   nested domain.
SQ     Sequence                    Number of sequences; start of alignment.
//     End of alignment
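The constraints listed in Table 4-1 lend themselves to a simple sanity check. The following Python sketch is one possible way to group lines by field code and test a few of the documented limits. It rests on several assumptions that the table does not confirm: each line is taken to start with the two-letter code followed by whitespace and the value, the "x" characters in PFxxxxx and PBxxxxxx are taken to be digits, and the names `parse_fields` and `check_record` as well as the sample values are hypothetical.

```python
import re

# Field codes and definitions copied from Table 4-1.
PFAM_FIELDS = {
    "AC": "Accession number", "ID": "Identification", "DE": "Definition",
    "AU": "Author", "AL": "Alignment method of seed",
    "BM": "HMM building command lines", "SE": "Source of seed",
    "GA": "Gathering threshold", "NC": "Noise cutoff", "TC": "Trusted cutoff",
    "TP": "Type field", "PI": "Previous IDs", "DC": "Database Comment",
    "DR": "Database Reference", "RC": "Reference Comment",
    "RN": "Reference Number", "RM": "Reference Medline",
    "RT": "Reference Title", "RA": "Reference Author",
    "RL": "Reference Location", "CC": "Comment", "NE": "Pfam accession",
    "SQ": "Sequence",
}

# Assumption: every "x" in PFxxxxx / PBxxxxxx stands for one digit.
AC_PATTERN = re.compile(r"^(PF\d{5}|PB\d{6})$")
TP_VALUES = {"Family", "Domain", "Repeat", "Motif"}


def parse_fields(lines):
    """Group values by field code, assuming a 'CODE   value' line layout."""
    record = {}
    for line in lines:
        line = line.rstrip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and the end-of-alignment marker
        code, _, value = line.partition(" ")
        if code not in PFAM_FIELDS:
            raise ValueError(f"unknown field code: {code!r}")
        record.setdefault(code, []).append(value.strip())
    return record


def check_record(record):
    """Return a list of problems found against the limits in Table 4-1."""
    problems = []
    for ac in record.get("AC", []):
        if not AC_PATTERN.match(ac):
            problems.append(f"AC does not look like PFxxxxx or PBxxxxxx: {ac}")
    for ident in record.get("ID", []):
        if len(ident) > 15:
            problems.append(f"ID longer than 15 characters: {ident}")
    for desc in record.get("DE", []):
        if len(desc) > 80:
            problems.append(f"DE longer than 80 characters: {desc}")
    for tp in record.get("TP", []):
        if tp not in TP_VALUES:
            problems.append(f"TP is not an approved type: {tp}")
    return problems


# Hand-written sample lines (not taken from a real Pfam release).
sample = ["AC   PF99999", "ID   example_fam", "DE   Example family", "TP   Domain", "//"]
print(check_record(parse_fields(sample)))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in an entry at once.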

4.3 References

  • Bateman, A., E. Birney, L. Cerruti, R. Durbin, L. Etwiller, S. R. Eddy, S. Griffiths-Jones, K. L. Howe, M. Marshall, and E. L. L. Sonnhammer. 2002. The Pfam Protein Families Database. Nucleic Acids Research 30 (1):275-280.

  • Main page: http://pfam.wustl.edu/

  • Release notes: ftp://ftp.genetics.wustl.edu/pub/Pfam/relnotes.txt

  • Help pages: http://pfam.wustl.edu/help.shtml

  • Download: ftp://ftp.genetics.wustl.edu/pub/Pfam/

Chapter 5. PROSITE

PROSITE is one of the many popular databases of sequence profiles, patterns, and motifs. It is one of our favorite databases for sequence analysis, and we hope you find it just as useful. The database was created so that computer-based tools could quickly identify sequences containing any known protein motif, which makes its patterns and profiles a key resource for basic sequence analysis and protein function determination. We're using PROSITE Release 17.

5.1 PROSITE Example Flat File

Example 5-1 contains a sample pattern entry from a PROSITE flat file. This entry contains examples of the PROSITE Field Definitions, discussed later in this chapter.

Example 5-1. Sample PROSITE pattern entry
ID   PPASE; PATTERN.
AC   PS00387;
DT   NOV-1990 (CREATED); NOV-1995 (DATA UPDATE); NOV-1995 (INFO UPDATE).
DE   Inorganic pyrophosphatase signature.
PA   D-[SGN]-D-[PE]-[LIVM]-D-[LIVMGC].
NR   /RELEASE=32,49340;
NR   /TOTAL=16(16); /POSITIVE=11(11); /UNKNOWN=0(0); /FALSE_POS=5(5);
NR   /FALSE_NEG=0; /PARTIAL=2;
CC   /TAXO-RANGE=A?EP?; /MAX-REPEAT=1;
CC   /SITE=1,magnesium; /SITE=3,magnesium; /SITE=6,magnesium;
DR   P21216, IPYR_ARATH, T; P37980, IPYR_BOVIN, T; P17288, IPYR_ECOLI, T;
DR   P44529, IPYR_HAEIN, T; P13998, IPYR_KLULA, T; P19117, IPYR_SCHPO, T;
DR   P37981, IPYR_THEAC, T; P19514, IPYR_THEP3, T; P38576, IPYR_THETH, T;
DR   P00817, IPYR_YEAST, T; P28239, IPY2_YEAST, T;
DR   P19371, IPYR_DESVH, P; P21616, IPYR_PHAAU, P;
DR   P09167, AERA_AERHY, F; P12351, CYP1_YEAST, F; P24653, Y101_NPVOP, F;
DR   P37904, YCEI_ECOLI, F; P39303, YJFU_ECOLI, F;
3D   1PYP;
DO   PDOC00325;
//
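The PA line in Example 5-1 can be turned into a regular expression for scanning sequences. The sketch below assumes the usual PROSITE pattern conventions: elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, `<` and `>` for the sequence ends, and a trailing period. The function name `prosite_to_regex` and the test sequence are hypothetical, and less common forms such as `>` inside square brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE PA pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split the element into its core and an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(\(\d+(?:,\d+)?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                           # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"       # excluded residues
        # A single residue or a [...] class is already valid regex syntax.
        if repeat:
            core += "{" + repeat[1:-1] + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


# PA line from Example 5-1, without the leading field code.
regex = prosite_to_regex("D-[SGN]-D-[PE]-[LIVM]-D-[LIVMGC].")
print(regex)

# Made-up sequence fragment used only to exercise the expression.
print(re.search(regex, "MKDSDPLDGA") is not None)
```

Because the translated pattern is an ordinary regular expression, `re.finditer` can be used in the same way to report the position of every match in a longer sequence.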

Example 5-2 contains a sample profile (matrix) entry from a PROSITE flat file. This entry contains further examples of the PROSITE Field Definitions described later in this chapter.

Example 5-2. Sample PROSITE profile (matrix)
ID   GLOBIN; MATRIX.
AC   PS01033;
DT   JUN-1994 (CREATED); DEC-2001 (DATA UPDATE); DEC-2001 (INFO UPDATE).
DE   Globins profile.
MA   /GENERAL_SPEC: ALPHABET='ABCDEFGHIKLMNPQRSTVWYZ'; LENGTH=154;
MA   /DISJOINT: DEFINITION=PROTECT; N1=1; N2=154;
MA   /NORMALIZATION: MODE=1; FUNCTION=LINEAR; R1=-0.8705306; R2=0.0209303; TEXT='-LogE';
MA   /CUT_OFF: LEVEL=0; SCORE=424; N_SCORE=8.0; MODE=1; TEXT='!';
MA   /CUT_OFF: LEVEL=-1; SCORE=353; N_SCORE=6.5; MODE=1; TEXT='?';
MA   /DEFAULT: D=-20; I=-20; MI=-210; MD=-210; IM=0; DM=0;
MA   /I: I=-6;
MA   /M: SY='A'; M=7,-7,-8,-10,-10,-8,3,-12,-4,-8,-6,-4,-6,-10,-10,-10,3,4,3,-14,-10,-10; D=-6;
MA   /I: I=-6; MI=-59; MD=-59;
MA   /M: SY='H'; M=1,-3,-21,0,-6,-20,0,2,-16,-10,-16,-10,-4,0,-8,-12,-2,-9,-11,-23,-13,-8; D=-6;
   
<deleted for brevity>
   
MA   /M: SY='H'; M=-1,4,-18,5,3,-19,-10,9,-20,8,-16,-9,3,-10,2,8,-2,-7,-14,-19,-6,1; D=-5;
MA   /I: I=0; MI=*;
NR   /RELEASE=40.7,103373;
NR   /TOTAL=797(796); /POSITIVE=796(795); /UNKNOWN=0(0); /FALSE_POS=1(1);
NR   /FALSE_NEG=0; /PARTIAL=3;
CC   /MATRIX_TYPE=protein_domain;
CC   /SCALING_DB=reversed;
CC   /AUTHOR=P_Bucher;
CC   /TAXO-RANGE=??EP?; /MAX-REPEAT=9;
CC   /FT_KEY=DOMAIN; /FT_DESC=GLOBIN;
DR   P04252, BAHG_VITST, T; Q03331, FHP_CANNO , T; P39676, FHP_YEAST , T;
DR   P02212, GLB1_ANABR, T; P19363, GLB1_ARTSX, T; P14805, GLB1_CALSO, T;
DR   P02221, GLB1_CHITH, T; P02216, GLB1_GLYDI, T; P20412, GLB1_LAMSP, T;
DR   P41260, GLB1_LUCPE, T; P08924, GLB1_LUMTE, T; P21197, GLB1_MORMR, T;
   
<deleted for brevity>
   
DR   P42430, YKYB_BACSU, F;
3D   1VHB; 2VHB; 3VHB; 1HBG; 2HBG; 1B0B; 1EBT; 1FLP; 1MOH; 1HBI; 2HBI; 3HBI;
3D   3SDH; 4HBI; 4SDH; 5HBI; 6HBI; 7HBI; 1ECA; 1ECD; 1ECN; 1ECO; 1VRE; 1VRF;
3D   2LHB; 3LHB; 1DM1; 1MBA; 2FAL; 2FAM; 3MBA; 4MBA; 5MBA; 1SCT; 1HLB; 1HLM;
3D   1OUT; 1OUU; 1A4F; 1FSX; 1HDA; 1CG5; 1CG8; 1IBE; 2DHB; 2MHB; 1A00; 1A01;
3D   1A0U; 1A0V; 1A0W; 1A0X; 1A0Y; 1A0Z; 1A3N; 1A3O; 1A9W; 1ABW; 1ABY; 1AJ9;
3D   1AXF; 1B86; 1BAB; 1BBB; 1BIJ; 1BUW; 1BZ0; 1BZ1; 1BZZ; 1CLS; 1CMY; 1COH;
3D   1DSH; 1DXT; 1DXU; 1DXV; 1FDH; 1GBU; 1GBV; 1GLI; 1HAB; 1HAC; 1HBA; 1HBB;
3D   1HBS; 1HCO; 1HDB; 1HGA; 1HGB; 1HGC; 1HHO; 1NIH; 1QI8; 1QSH; 1QSI; 1RVW;
3D   1SDK; 1SDL; 1THB; 1VWT; 2HBC; 2HBD; 2HBE; 2HBF; 2HBS; 2HCO; 2HHB; 2HHD;
3D   2HHE; 3HHB; 4HHB; 6HBW; 1SPG; 1HDS; 1HBH; 1PBX; 1QPW; 2PGH; 1HBR; 1CBL;
3D   1CBM; 1ITH; 1D8U; 1CQX; 1GDI; 1GDJ; 1GDK; 1GDL; 1LH1; 1LH2; 1LH3; 1LH5;
3D   1LH6; 1LH7; 2GDM; 2LH1; 2LH2; 2LH3; 2LH5; 2LH6; 2LH7; 1BIN; 1FSL; 1LHS;
3D   1LHT; 1EMY; 1MBS; 1AZI; 1BJE; 1DWR; 1DWS; 1DWT; 1HRM; 1HSY; 1RSE; 1WLA;
3D   1XCH; 1YMA; 1YMB; 1YMC; 2MM1; 101M; 102M; 103M; 104M; 105M; 106M; 107M;
3D   108M; 109M; 110M; 111M; 112M; 1A6G; 1A6K; 1A6M; 1A6N; 1ABS; 1AJG; 1AJH;
3D   1BVC; 1BVD; 1BZ6; 1BZP; 1BZR; 1CH1; 1CH2; 1CH3; 1CH5; 1CH7; 1CH9; 1CIK;
3D   1CIO; 1CO8; 1CO9; 1CP0; 1CP5; 1CPW; 1CQ2; 1DO1; 1DO3; 1DO4; 1DO7; 1DTI;
3D   1DTM; 1DUK; 1DUO; 1DXC; 1DXD; 1EBC; 1F63; 1F65; 1F6H; 1FCS; 1HJT; 1IOP;
3D   1IRC; 1JDO; 1LTW; 1MBC; 1MBD; 1MBI; 1MBN; 1MBO; 1MCY; 1MGN; 1MLF; 1MLG;
3D   1MLH; 1MLJ; 1MLK; 1MLL; 1MLM; 1MLN; 1MLO; 1MLQ; 1MLR; 1MLS; 1MLU; 1MOA;
3D   1MOB; 1MOC; 1MOD; 1MTI; 1MTJ; 1MTK; 1MYF; 1MYM; 1OBM; 1OFJ; 1OFK; 1SPE;
3D   1SWM; 1TES; 1VXA; 1VXB; 1VXC; 1VXD; 1VXE; 1VXF; 1VXG; 1VXH; 1YOG; 1YOH;
3D   1YOI; 2CMM; 2MB5; 2MBW; 2MGA; 2MGB; 2MGC; 2MGD; 2MGE; 2MGF; 2MGG; 2MGH;
3D   2MGI; 2MGJ; 2MGK; 2MGL; 2MGM; 2MYA; 2MYB; 2MYC; 2MYD; 2MYE; 2SPL; 2SPM;
3D   2SPN; 2SPO; 4MBN; 5MBN; 1M6C; 1M6M; 1MDN; 1MNH; 1MNI; 1MNJ; 1MNK; 1MNO;
3D   1MWC; 1MWD; 1MYG; 1MYH; 1MYI; 1MYJ; 1PMB; 1YCA; 1YCB; 1MYT; 1ASH;
DO   PDOC00793;
//
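Each `/M:` line in Example 5-2 carries 22 match scores, the same number as the characters in the ALPHABET string on the `/GENERAL_SPEC` line. The following sketch pairs the two, on the assumption that the scores are listed in alphabet order. The function names `parse_alphabet` and `parse_match_scores` are hypothetical, and the code expects each MA record on a single line.

```python
import re


def parse_alphabet(general_spec_line):
    """Extract the ALPHABET string from an MA /GENERAL_SPEC line."""
    m = re.search(r"ALPHABET='([A-Z]+)'", general_spec_line)
    if m is None:
        raise ValueError("no ALPHABET found")
    return m.group(1)


def parse_match_scores(match_line, alphabet):
    """Map each alphabet symbol to its score on an MA /M: line.

    Assumes the M= list follows the order of the alphabet.
    """
    m = re.search(r"\bM=([-\d,]+);", match_line)
    if m is None:
        raise ValueError("no M= score list found")
    scores = [int(value) for value in m.group(1).split(",")]
    if len(scores) != len(alphabet):
        raise ValueError(
            f"expected {len(alphabet)} scores, found {len(scores)}"
        )
    return dict(zip(alphabet, scores))


# Lines copied from Example 5-2.
spec = "MA   /GENERAL_SPEC: ALPHABET='ABCDEFGHIKLMNPQRSTVWYZ'; LENGTH=154;"
last_match = (
    "MA   /M: SY='H'; M=-1,4,-18,5,3,-19,-10,9,-20,8,-16,-9,3,-10,2,8,"
    "-2,-7,-14,-19,-6,1; D=-5;"
)

alphabet = parse_alphabet(spec)
scores = parse_match_scores(last_match, alphabet)
best = max(scores, key=scores.get)
print(best, scores[best])
```

For this line the highest-scoring symbol agrees with the `SY='H'` annotation, which is consistent with the ordering assumption but does not prove it.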