3.4 References

  • Bairoch, A., and R. Apweiler. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research 28:45-48.

    Main page

    http://us.expasy.org/sprot/

    Release notes

    http://us.expasy.org/sprot/relnotes/

    User manual

    http://us.expasy.org/sprot/userman.html

    Download

    ftp://us.expasy.org/databases/swiss-prot

Chapter 4. Pfam

While many databases are dedicated to organizing protein families and protein domains, Pfam is our preferred database for predicting the function of newly discovered proteins. Pfam is a manually curated database of protein families derived from multiple sequence alignments and profile hidden Markov models. It is a key database for understanding protein function and structure, and it is used in many methods, including phylogenetic analysis, secondary structure prediction, and sequence annotation. We use Pfam Release 7.8 in this chapter.

4.1 Pfam Example Flat File

Example 4-1 shows a Pfam flat file in Stockholm format for the 14-3-3 protein family (accession PF00244). The entry uses field codes from the Pfam Field Definitions, discussed later in this chapter.

Example 4-1. Sample Pfam flat file
# STOCKHOLM 1.0
#=GF ID   14-3-3
#=GF AC   PF00244
#=GF DE   14-3-3 proteins
#=GF AU   Finn RD
#=GF AL   Clustalw
#=GF SE   Prosite
#=GF GA   25 25
#=GF TC   35.40 35.40
#=GF NC   19.10 19.10
#=GF BM   hmmbuild -f HMM SEED
#=GF BM   hmmcalibrate --seed 0 HMM
#=GF RN   [1]
#=GF RM   95327195
#=GF RT   Structure of a 14-3-3 protein and implications for
#=GF RT   coordination of multiple signalling pathways. 
#=GF RA   Xiao B, Smerdon SJ, Jones DH, Dodson GG, Soneji Y, Aitken
#=GF RA   A, Gamblin SJ; 
#=GF RL   Nature 1995;376:188-191.
#=GF RN   [2]
#=GF RM   95327196
#=GF RT   Crystal structure of the zeta isoform of the 14-3-3
#=GF RT   protein. 
#=GF RA   Liu D, Bienkowska J, Petosa C, Collier RJ, Fu H, Liddington
#=GF RA   R; 
#=GF RL   Nature 1995;376:191-194.
#=GF RN   [3]
#=GF RM   96182649
#=GF RT   Interaction of 14-3-3 with signaling proteins is mediated
#=GF RT   by the recognition of phosphoserine. 
#=GF RA   Muslin AJ, Tanner JW, Allen PM, Shaw AS; 
#=GF RL   Cell 1996;84:889-897.
#=GF RN   [4]
#=GF RM   97424374
#=GF RT   The 14-3-3 protein binds its target proteins with a common
#=GF RT   site located towards the C-terminus. 
#=GF RA   Ichimura T, Ito M, Itagaki C, Takahashi M, Horigome T,
#=GF RA   Omata S, Ohno S, Isobe T 
#=GF RL   FEBS Lett 1997;413:273-276.
#=GF RN   [5]
#=GF RM   96394689
#=GF RT   Molecular evolution of the 14-3-3 protein family. 
#=GF RA   Wang W, Shakes DC 
#=GF RL   J Mol Evol 1996;43:384-398.
#=GF RN   [6]
#=GF RM   96300316
#=GF RT   Function of 14-3-3 proteins. 
#=GF RA   Jin DY, Lyu MS, Kozak CA, Jeang KT 
#=GF RL   Nature 1996;382:308-308.
#=GF DR   PROSITE; PDOC00633;
#=GF DR   SMART; 14_3_3;
#=GF DR   PRINTS; PR00305;
#=GF DR   SCOP; 1a4o; fa;
#=GF DR   PDB; 1a37 A; 3; 228;
#=GF DR   PDB; 1a37 B; 3; 228;
#=GF DR   PDB; 1a38 A; 3; 228;
#=GF DR   PDB; 1a38 B; 3; 228;
#=GF DR   PDB; 1a4o A; 3; 228;
#=GF DR   PDB; 1a4o B; 3; 228;
#=GF DR   PDB; 1a4o C; 3; 228;
#=GF DR   PDB; 1a4o D; 3; 228;
#=GF DR   PDB; 1qja B; 3; 229;
#=GF DR   PDB; 1qja A; 3; 230;
#=GF DR   PDB; 1qjb A; 3; 232;
#=GF DR   PDB; 1qjb B; 3; 232;
#=GF DR   INTERPRO; IPR000308;
#=GF SQ   148
#=GS O61131/11-251      AC O61131
<deleted for brevity>
#=GS 143Z_HUMAN/3-236 DR PDB; 1qjb B; 3; 232;
O61131/11-251                RSDCTYRSKLAEQAERYDEMADAMRTLVEQCVnn.......
dkdELTVEERNLLSVAYKNAVGARRASWRIISSVEQKEMSKA.NVHNKNIAATYRKKVEEELNNIC.QDILN.
LLTKKLIPNT..SESESKVFYYKMKGDYYRYISEFS.CDE.
GKKEASNFAQEAYQKATDIAENELPSTHPIRLGLALNYSVFFY..EILNQPHQACEMAKRAF...DDAITEFDNV..
SEDS..YKDSTLI.MQLLRDNLTLWTSDLQGDQ
   
<deleted for brevity>
   
Q9XZV0/2-235                 KEELLNRCKLNDLIENYGEMFEYLKELSHIKI............
DLQPDELDLITRCTKCYIGHKRGQYRKILTLIDKDKIVD.NQKNSALLEILRKKLSEEILLLC.NSTIE.LSQNFLNNNV.
.FPKKTQLFFTKIIADHYRYIYEIN.GKE.DIKLKAKEYYE--KGLQTIKTCKYNSTETAYLTFYLNYSVFLH..
DTMRNTEESIKVSKACL...YEALKDTEDI..VDNS..QKDIVLL.CQMLKDNISLWKTETNEDN
#=GC SS_cons                 HHHHHHHHHHHHHTTCHHHHHHHHHHHHTTSC............
CCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCTTT--.CCHHHHHHHHHHHHHHHHHHHHH.HHHHH.HHHHTTTTCC.
.CSCHHHHHHHHHHHHHHHHHHHHC.CSC.HHHHHHHHHHHHHHHHHHHHHCHCCTTCHCHHHHHHHHHHHHC..
HTSCCHHHCHHHHHHHH...HHHHTTCGGC..CTTT..HHHHHHH.HHHHHHHHHHCTCCCXXXX
#=GC SA_cons                 26310320300350512510050022003352............
4045500400120033002310402420152179179--.38752510440144014203510.43002.0035201642.
.754403000010100011100201.867.7465125302500340252067635113122100001001127..
31372485135106412...5415867932..3994..6651462.142043126627759XXXX
//
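The per-family annotation in Example 4-1 sits on lines that begin with #=GF, followed by a two-letter field code and a value. As a rough illustration of how such lines might be read, the following Python sketch collects them into a dictionary keyed by field code. The function name parse_gf_fields and the shortened sample string are hypothetical and used only for illustration; the sketch ignores the #=GS, #=GC, and sequence lines and is not a complete Stockholm parser.

```python
from collections import defaultdict


def parse_gf_fields(text):
    """Collect the #=GF lines of one Stockholm entry into {code: [values]}.

    Hypothetical helper for illustration only: it stops at the '//'
    terminator and skips every line that is not a #=GF annotation.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end of the entry
        if not line.startswith("#=GF "):
            continue  # skip header, #=GS, #=GC, and sequence lines
        parts = line.split(None, 2)  # marker, field code, rest of line
        if len(parts) < 2:
            continue
        code = parts[1]
        value = parts[2].strip() if len(parts) > 2 else ""
        fields[code].append(value)  # codes such as BM or RT may repeat
    return dict(fields)


# A shortened copy of a few lines from Example 4-1.
sample = """# STOCKHOLM 1.0
#=GF ID   14-3-3
#=GF AC   PF00244
#=GF DE   14-3-3 proteins
#=GF GA   25 25
#=GF BM   hmmbuild -f HMM SEED
#=GF BM   hmmcalibrate --seed 0 HMM
#=GF SQ   148
//
"""

fields = parse_gf_fields(sample)
print(fields["AC"][0])
print(fields["BM"])
```

Values are kept as lists because some codes, such as BM and the reference fields, occur more than once in a single entry.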