Sample Evolutionary Traces

The following sample Evolutionary Traces were utilized in:

Yao, H., D. M. Kristensen, I. Mihalek, M. E. Sowa, C. Shaw, M. Kimmel, L. Kavraki, and O. Lichtarge. 2003. "An Accurate, Sensitive, and Scalable Method to Identify Functional Sites in Protein Structures." J Mol Biol. 326:255-261.

For each protein, the table shows its Protein Data Bank (PDB) code, Enzyme Commission (EC) number, size, number of sequences in the trace alignment, minimum percent sequence identity of the alignment, evolutionary breadth (E=Eukaryota, B=Bacteria, A=Archaea, V=Viruses), and, for each of the four statistics, the percentage of ranks with significant trace clusters that also have a significant overlap with the functional site. TCR=Total Connected Residues; LCO=Largest Cluster Overlap; AO=Average Overlap; HG=Hypergeometric Distribution. In the structure images, functional sites are shown in green, and trace clusters are colored individually: red=largest, blue=second largest, etc. Single-residue clusters are black. Ligands are colored yellow/orange, or blue if they are peptides. No ligands are shown for the EC set.
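
The HG statistic is named after the hypergeometric distribution. As an illustration only (a sketch, not the exact procedure used by Yao et al.), the probability that a randomly chosen set of residues overlaps the functional site at least as much as an observed trace cluster does can be computed as below. The function and parameter names are hypothetical, and the model assumes residues are drawn uniformly at random, without replacement, from the whole protein.

```python
from math import comb


def hypergeometric_overlap_pvalue(protein_size: int, site_size: int,
                                  trace_size: int, overlap: int) -> float:
    """Return P(X >= overlap) for X ~ Hypergeometric(protein_size, site_size, trace_size).

    X counts how many of `trace_size` residues, drawn uniformly at random
    without replacement from a protein of `protein_size` residues, fall in a
    functional site of `site_size` residues. (Hypothetical helper.)
    """
    if not (0 <= site_size <= protein_size and 0 <= trace_size <= protein_size):
        raise ValueError("site and trace sizes must lie between 0 and the protein size")
    total = comb(protein_size, trace_size)  # number of possible residue subsets
    largest = min(site_size, trace_size)    # maximum achievable overlap
    tail = sum(
        # ways to pick i site residues and (trace_size - i) non-site residues;
        # math.comb returns 0 when more non-site residues are requested than exist
        comb(site_size, i) * comb(protein_size - site_size, trace_size - i)
        for i in range(max(overlap, 0), largest + 1)
    )
    return tail / total


# Hypothetical numbers: a 415-residue protein, a 20-residue functional site,
# and a 40-residue trace cluster that shares 10 residues with the site.
p = hypergeometric_overlap_pvalue(415, 20, 40, 10)
print(f"P(overlap >= 10) = {p:.3g}")
```

A small tail probability means the cluster overlaps the functional site more than random residue selection would be expected to; the significance threshold applied in the tables is not assumed here.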

 

Protein-ligand set:

Name | PDB code | EC # (where applicable) | Protein size (residues) | No. of seq. | Minimum sequence identity | Evolutionary breadth | TCR | LCO | AO | HG
Phosphoglycerate kinase | 16pk | 2.7.2.3 | 415 | 93 | 37% | E+B | 100% | 100% | 100% | 100%
c-Src tyrosine kinase; SH2 | 1a09 | 2.7.1.112 | 106 | 96 | 43% | E+V | 100% | 100% | 100% | 93%
Growth hormone | 1a22-A | - | 180 | 84 | 29% | E | 33% | 0% | 0% | 0%
Growth hormone receptor | 1a22-B | - | 192 | 70 | 27% | E | 92% | 0% | 0% | 0%
Galectin-3 CRD | 1a3k | - | 137 | 94 | 27% | E+V | 100% | 100% | 100% | 100%
Indole-3-glycerol phosphate synthase | 1a53 | 4.1.1.48 | 247 | 94 | 25% | E+B+A | 100% | 100% | 100% | 100%
Citrate synthase | 1a59 | 4.1.3.7 | 377 | 97 | 26% | E+B+A | 97% | 100% | 100% | 100%
Myoglobin | 1a6m | - | 151 | 88 | 23% | E | 100% | 100% | 100% | 100%
Serine/Threonine phosphatase | 1a6q | 3.1.3.16 | 363 | 76 | 25% | E+B+V | 100% | 100% | 100% | 100%
2,5-diketo-D-gluconic acid reductase A | 1a80 | 1.1.1.- | 277 | 87 | 30% | E+B | 100% | 100% | 100% | 100%
Acyl CoA binding protein | 1aca | - | 86 | 65 | 23% | E+B | 57% | 0% | 14% | 0%
Dihydropteroate synthase | 1aj2 | 2.5.1.15 | 282 | 94 | 31% | E+B+A | 100% | 100% | 100% | 100%
Adenylate kinase | 1aky | 2.7.4.3 | 218 | 82 | 33% | E+B+A | 100% | 82% | 76% | 100%
HSP-90 * | 1am1 | - | 213 | 98 | 66% | E+B | NA | NA | NA | NA
Triosephosphate isomerase | 1amk | 5.3.1.1 | 250 | 92 | 36% | E+B+A | 96% | 100% | 100% | 100%
Peroxidase | 1aru | 1.11.1.7 | 336 | 80 | 13% | E | 100% | 55% | 85% | 65%
Astacin | 1ast | - | 200 | 90 | 25% | E+B | 100% | 100% | 100% | 100%
Annexin III | 1axn | - | 323 | 83 | 39% | E | 40% | 0% | 0% | 20%
Alpha amylase | 1bag | 3.2.1.1 | 425 | 98 | 19% | E+B+A | 100% | 100% | 100% | 100%
Biotinyl domain ** | 1bdo | 6.4.1.2 | 80 | 50 | 33% | E+B+A | NA | NA | NA | NA
Pseudoazurin *** | 1bqk | - | 124 | 71 | 15% | E+B+A | NA | NA | NA | NA
HIV Reverse transcriptase * | 1c1b | 2.7.7.49 | 536 | 78 | 93% | V | NA | NA | NA | NA
Poly-A binding protein | 1cvj | - | 169 | 97 | 15% | E+B | 17% | 100% | 100% | 100%
Tpr2a domain of Hop | 1elr | - | 128 | 98 | 14% | E+B+A | 100% | 100% | 100% | 50%
Tpr1 domain of Hop | 1elw | - | 117 | 97 | 14% | E+B+A | 17% | 83% | 100% | 33%
Thioredoxin reductase | 1f6m | 1.6.4.5 | 320 | 95 | 26% | E+B+A | 100% | 100% | 100% | 100%
Rhodopsin * | 1f88 | - | 338 | 95 | 69% | E | NA | NA | NA | NA
Cyclins | 1fin-A | - | 298 | 93 | 52% | E | 67% | 100% | 100% | 100%
Cyclins | 1fin-B | - | 260 | 97 | 31% | E | 70% | 70% | 100% | 20%
Protein phosphatase-1 | 1fjm | 3.1.3.16 | 294 | 94 | 37% | E | 90% | 100% | 76% | 100%
Regulator of G-protein signaling | 1fqj | 3.1.4.17 | 133 | 99 | 27% | E | 100% | 100% | 100% | 100%
Signal sequence recognition protein | 1ng1 | - | 294 | 95 | 24% | E+B+A | 100% | 100% | 88% | 100%
c-Src tyrosine kinase; SH3 | 1nlo | 2.7.1.112 | 56 | 99 | 25% | E+V | 100% | 83% | 100% | 33%
Pyruvate decarboxylase | 1pvd | 4.1.1.1 | 537 | 80 | 14% | E+B+A | 100% | 100% | 100% | 100%
Endonuclease IV | 1qum | 3.1.21.2 | 279 | 39 | 22% | E+B+A+V | 100% | 100% | 100% | 100%
Deacetoxycephalosporin C synthase | 1rxg | - | 275 | 35 | 14% | E+B | 100% | 100% | 100% | 100%
Mannose binding protein | 2msb | - | 113 | 91 | 21% | E | 92% | 42% | 42% | 17%

 

SGI set:

Name | PDB code | EC # (where applicable) | Protein size (residues) | No. of seq. | Minimum sequence identity | Evolutionary breadth | TCR | LCO | AO | HG
Phosphoribosylaminoimidazole-Succinocarboxamide Synthase | 1a48 | 6.3.2.6 | 298 | 70 | 13% | E+B+A | 100% | 100% | 100% | 100%
Yeast Hypothetical Protein | 1b54 | - | 230 | 54 | 22% | E+B | 93% | 100% | 53% | 100%
5,10-Methylenetetrahydrofolate Dehydrogenase | 1ee9 | 1.5.1.15 | 317 | 83 | 18% | E+B+A | 56% | 33% | 33% | 44%
Modification Methylase RsrI | 1eg2 | 2.1.1.72 | 262 | 78 | 10% | B+A+V | 94% | 100% | 100% | 100%
Nicotinamide Mononucleotide Adenylyltransferase | 1ej2 | 2.7.7.1 | 167 | 22 | 17% | B+A | 67% | 100% | 100% | 100%
FMN-Binding Protein | 1eje | - | 192 | 21 | 19% | B+A | 100% | 100% | 100% | 100%
Ribonuclease HII | 1ekeA | 3.1.26.4 | 219 | 40 | 19% | E+B+A+V | 100% | 63% | 63% | 19%
Protein Maf | 1ex2A | - | 185 | 56 | 19% | E+B+A | 90% | 90% | 100% | 100%
Protein Maf | 1excA | - | 185 | 56 | 19% | E+B+A | 100% | 44% | 31% | 94%
Phosphoserine Phosphatase (PSP) | 1f5sA | 3.1.3.3 | 209 | 57 | 10% | E+B+A | 100% | 100% | 100% | 100%
Cag-alpha | 1g6oA | - | 317 | 64 | 17% | B+A | 100% | 100% | 100% | 100%
Isopentenyl-Diphosphate Isomerase | 1i9aA | 5.3.3.2 | 175 | 49 | 21% | E+B+A | 100% | 100% | 100% | 100%
Selenocysteine Lyase | 1jf9 | 4.4.1.16 | 405 | 96 | 16% | E+B+A | 95% | 100% | 100% | 100%
Transcription Regulator NC2 alpha Chain | 1jfiA | - | 63 | 45 | 18% | E | NA | NA | NA | NA
Transcription Regulator NC2 beta Chain | 1jfiB | - | 135 | 64 | 15% | E | 88% | 13% | 63% | 0%
TATA-Box-Binding Protein (TBP) | 1jfiC | - | 183 | 97 | 17% | E+A | 100% | 100% | 100% | 100%
L-Allo-Threonine Aldolase | 1jg8A | 4.1.2.5 | 342 | 26 | 28% | E+B+A | 100% | 91% | 100% | 100%
ATP-Binding Domain of Protein MJ0577 | 1mjhA | - | 143 | 18 | 18% | E+B+A | NA | NA | NA | NA
Superoxide Dismutase 1 Copper Chaperone | 1qupA | - | 219 | 13 | 22% | E | 100% | 100% | 0% | 0%
Pyrophosphatase | 2mjpA | 3.6.1.15 | 184 | 60 | 22% | E+B+A | 100% | 100% | 100% | 100%

 

EC set:

Name | PDB code | EC # (where applicable) | Protein size (residues) | No. of seq. | Minimum sequence identity | Evolutionary breadth | TCR | LCO | AO | HG
Adenosine deaminase | 1a4mA | 3.5.4.4 | 349 | 32 | 19% | E+B | 100% | 100% | 100% | 100%
Aspartate aminotransferase | 1ars | 2.6.1.1 | 396 | 87 | 29% | E+B | 100% | 96% | 96% | 88%
Taq DNA Polymerase | 1bgxT | 2.7.7.7 | 828 | 21 | 33% | E+B+V | 100% | 20% | 100% | 100%
Cutinase | 1cex | 3.1.1.- | 197 | 23 | 21% | E+B | 100% | 67% | 67% | 67%
Creatine Amidinohydrolase | 1chmA | 3.5.3.3 | 401 | 57 | 18% | E+B+A | 100% | 100% | 68% | 68%
Cytochrome P450-terp | 1cpt | 1.14.-.- | 412 | 96 | 17% | E+B+A | 100% | 79% | 46% | 46%
Human Beta-1 Alcohol Dehydrogenase | 1dehA | 1.1.1.1 | 374 | 90 | 44% | E+B | 100% | 29% | 0% | 0%
Pyruvate Phosphate Dikinase | 1dik | 2.7.9.1 | 869 | 64 | 19% | E+B+A | 100% | 100% | 100% | 100%
Fumarase C | 1fuqA | 4.2.1.2 | 456 | 93 | 33% | E+B+A | 100% | 67% | 100% | 93%
Glycolate Oxidase | 1gox | 1.1.3.1 | 350 | 73 | 25% | E+B+A | 100% | 100% | 100% | 100%
Thioltransferase | 1kte | - | 105 | 88 | 17% | E+B+A+V | 100% | 78% | 56% | 56%
N-Acetylneuraminate Lyase | 1nal1 | 4.1.3.3 | 291 | 92 | 15% | E+B+A | 100% | 100% | 100% | 100%
4-Chlorobenzoyl Coenzyme A Dehalogenase | 1nzyA | 3.8.1.6 | 269 | 91 | 18% | E+B+A | 82% | 65% | 65% | 53%
Enolase | 1oneA | 4.2.1.11 | 436 | 80 | 56% | E+B+A | 100% | 100% | 100% | 100%
Cyclodextrin Glucanotransferase | 1pamA | 2.4.1.19 | 686 | 82 | 14% | E+B+A | 100% | 100% | 100% | 100%
N-(5'-Phosphoribosyl)Anthranilate Isomerase | 1pii | 4.1.1.48 | 447 | 100 | 22% | E+B+A | 87% | 35% | 26% | 0%
Pyruvate Oxidase (Mutant) | 1poxA | 1.2.3.3 | 585 | 81 | 22% | E+B+A | 100% | 48% | 17% | 97%
Phosphoglycerate Mutase | 1qhfA | 5.4.2.1 | 240 | 58 | 16% | E+B | 100% | 94% | 94% | 94%
Rusticyanin | 1rcy | - | 151 | 13 | 14% | B+A | 100% | 100% | 100% | 100%
Rieske Iron-Sulfur Protein | 1rie | 1.10.2.2 | 127 | 57 | 27% | E+B+A | 100% | 100% | 100% | 100%
Xenobiotic Acetyltransferase | 1xat | 2.3.1.28 | 208 | 63 | 17% | E+B+A | 100% | 43% | 0% | 0%
Protein R2 of Ribonucleotide Reductase | 1xikA | 1.17.4.1 | 340 | 88 | 14% | E+B+A+V | 100% | 100% | 100% | 100%
Yersinia Protein Tyrosine Phosphatase | 1ytn | 3.1.3.48 | 283 | 100 | 10% | E+B | 100% | 100% | 100% | 100%
Endonuclease III | 2abk | 4.2.99.18 | 211 | 95 | 15% | E+B+A | 100% | 100% | 48% | 81%
NADH-Dependent Nitrate Reductase (Cytochrome b Reductase Fragment) | 2cnd | 1.6.6.1 | 260 | 92 | 28% | E+B | 75% | 75% | 75% | 0%
Carboxypeptidase A | 2ctc | 3.4.17.1 | 307 | 78 | 21% | E+B | 96% | 96% | 96% | 96%
Phosphotransferase System, Enzyme I | 2ezb | 2.7.3.9 | 249 | 80 | 17% | E+B+A | 100% | 60% | 0% | 0%
Glutathione Reductase | 3grs | 1.6.4.2 | 461 | 91 | 30% | E+B+A | 100% | 34% | 66% | 93%
D-Alanyl-D-Alanine Carboxypeptidase/Transpeptidase | 3pte | 3.4.16.4 | 347 | 92 | 11% | E+B+A+V | 100% | 0% | 0% | 0%

"NA" indicates that no significant ranks exist, either because of a lack of sequence diversity (*) or because the protein is too small (**); see Madabushi et al., JMB 2002. For 1bqk (***), the size of the largest cluster was significant but the number of clusters was not (at the rank shown, there are five single-residue clusters in addition to the large cluster shown).
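
Read together with the footnote above, each percentage in the TCR, LCO, AO and HG columns is a ratio taken over the ranks whose trace clusters are significant, and "NA" marks the case where that set is empty. A minimal sketch of that bookkeeping, assuming each rank has already been labelled with two hypothetical yes/no flags (clusters significant, overlap significant); how those flags are derived is described in the cited papers and is not reproduced here:

```python
from typing import Optional, Sequence, Tuple


def percent_ranks_with_site_overlap(
        ranks: Sequence[Tuple[bool, bool]]) -> Optional[float]:
    """Percentage of ranks with significant trace clusters that also
    significantly overlap the functional site.

    Each element of `ranks` is a hypothetical (clusters_significant,
    overlap_significant) pair for one trace rank. Returns None, shown as
    "NA" in the tables, when no rank has significant clusters.
    """
    # Keep only the overlap flags of ranks whose clusters are significant.
    overlaps = [overlap_sig for clusters_sig, overlap_sig in ranks if clusters_sig]
    if not overlaps:
        return None  # no significant ranks: the percentage is undefined ("NA")
    return 100.0 * sum(overlaps) / len(overlaps)


# Hypothetical example: four ranks, three with significant clusters,
# two of which also overlap the functional site significantly.
example = [(True, True), (True, False), (True, True), (False, True)]
value = percent_ranks_with_site_overlap(example)
print("NA" if value is None else f"{value:.0f}%")  # prints 67%
```

Returning None rather than 0 keeps "no significant ranks" distinct from "significant ranks, but none overlapping the site", which the tables report as 0%.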


The following sample Evolutionary Traces are presented in the format used in:

Madabushi, S., H. Yao, M. Marsh, D. M. Kristensen, A. Philippi, M. E. Sowa, and O. Lichtarge. 2002. "Structural Clusters of Evolutionary Trace Residues are Statistically Significant." J Mol Biol. 316(1):139-154.

For each protein, the table lists its full name, PDB identifier, class, size (amino acids), known function, number of sequences in the multiple sequence alignment, minimum percent identity between sequences in the alignment and the selected protein structure, the best significance level achieved with the "Number of Clusters" statistical method, the best significance level achieved with the "Size of Largest Cluster" method, and the evolutionary breadth of the protein family tree (E = eukaryotic, P = prokaryotic, V = viral).

Name | PDB code | Class | Size | Function | No. of seq. | Min % identity | Sig. of no. of clusters | Sig. of cluster size | Evolutionary breadth
Regulator of G-protein signaling | 1fqi | all alpha proteins | 133 | regulator of G-protein signaling | 43 | 43 | 0.3 | 0.3 | E
Rhodopsin | 1f88 | membrane and cell surface protein | 338 | signaling protein | 59 | 33 | 0.3 | 0.3 | E
Annexin III | 1axn | all alpha proteins | 323 | calcium/phospholipid binding protein | 70 | 40 | 0.3 | 5 | E
Alpha amylase | 1bag | all beta proteins | 425 | alpha-amylase | 55 | 24 | 0.3 | 0.3 | E+P
HIV Reverse Transcriptase | 1c1b | alpha and beta proteins | 536 | reverse transcriptase | 278 | 61 | 1 | 5 | V
Phosphoglycerate kinase | 16pk | alpha and beta proteins | 415 | kinase | 95 | 41 | 0.3 | 0.3 | E+P
Citrate synthase | 1a59 | all alpha proteins | 377 | synthase | 63 | 32 | 0.3 | 0.3 | E+P
DD-Peptidase Penicillin-Target Enzyme | 3pte | | | | | | | |
Serine/Threonine phosphatase | 1a6q | alpha and beta proteins | 363 | hydrolase | 58 | 38 | 0.3 | 0.3 | E
Peroxidase | 1aru | all alpha proteins | 336 | peroxidase | 29 | 55 | 0.3 | 0.3 | E
Protein phosphatase-1 | 1fjm | alpha and beta proteins | 294 | hydrolase | 68 | 65 | 0.3 | 0.3 | E
Cyclins | 1fin-A | all alpha proteins | 298 | transferase | 37 | 64 | 0.3 | 0.3 | E
Signal sequence recognition protein | 1ng1 | all alpha proteins | 294 | | 73 | 45 | 0.3 | 0.3 | E+P
Endonuclease IV | 1qum | alpha and beta proteins | 279 | endonuclease | 27 | 39 | 0.3 | 0.3 | E+P
2,5-diketo-D-gluconic acid reductase A | 1a80 | alpha and beta proteins | 277 | Oxidoreductase | 83 | 46 | 0.3 | 0.3 | E+P
Deacetoxycephalosporin C synthase | 1rxg | all beta proteins | 275 | Oxidoreductase | 24 | 25 | 0.3 | 1 | E+P
Cyclins | 1fin-B | all alpha proteins | 260 | transferase | 23 | 34 | 0.3 | 0.3 | E
Triosephosphate isomerase | 1amk | alpha and beta proteins | 250 | gluconeogenesis | 73 | 47 | 0.3 | 0.3 | E+P
Indole-3-glycerol phosphate synthase | 1a53 | alpha and beta proteins | 247 | Synthase | 19 | 30 | 0.3 | 0.3 | E+P
Estrogen receptor | 3ert | all alpha proteins | 247 | Nuclear receptor | 93 | 44 | 0.3 | 0.3 | E
Adenylate kinase | 1aky | alpha and beta proteins | 218 | phosphotransferase | 42 | 45 | 0.3 | 0.3 | E+P
HSP-90 | 1am1 | alpha and beta proteins | 213 | chaperone | 78 | 55 | 0.3 | 0.3 | E+P
Dihydropteroate Synthase | 1aj2 | alpha and beta proteins | 282 | Synthase | 42 | 37 | 0.3 | 0.3 | P
Astacin | 1ast | alpha and beta proteins | 200 | Metalloproteinase (hydrolase) | 38 | 44 | 0.3 | 0.3 | E
Growth hormone receptor | 1a22-B | all alpha proteins | 192 | Growth hormone receptor | 21 | 30 | 1 | 0.3 | E
Growth hormone | 1a22-A | all alpha proteins | 180 | Growth hormone | 67 | 36 | 0.3 | 0.3 | E
Tpr2a domain of Hop | 1elr | all alpha proteins | 128 | Chaperone | 41 | 30 | 0.3 | 1 | E+P
Pseudoazurin | 1bqk | all beta proteins | 124 | electron transport | 29 | 37 | 1 | 10 | E+P
Acyl CoA binding protein | 1aca | all alpha proteins | 86 | Binding protein | 38 | 46 | 5 | 0.3 | E
Biotinyl domain | 1bdo | all beta proteins | 80 | Carboxylase | 37 | 45 | >30 | 20 | E+P
Poly-A binding protein | 1cvj | alpha and beta proteins | 169 | Gene regulation | 73 | 26 | 0.3 | 0.3 | E
Myoglobin | 1a6m | all alpha proteins | 151 | Oxygen transport | 171 | 35 | 5 | 0.3 | E
Galectin-3 CRD | 1a3k | all beta proteins | 137 | Galectin carbohydrate recognition domain | 70 | 32 | 1 | 1 | E
Tpr1 domain of Hop | 1elw | all alpha proteins | 117 | Chaperone | 42 | 41 | 1 | 1 | E
Pyruvate decarboxylase | 1pvd | alpha and beta proteins | 537 | Carbon-Carbon lyase | 43 | 37 | 0.3 | 0.3 | E+P
Mannose binding protein | 2msb | alpha and beta proteins | 113 | Binds Mannose | 71 | 34 | 5 | 0.3 | E
c-Src tyrosine kinase; SH3 | 1nlo | all beta proteins | 56 | Tyrosine Kinase | 71 | 37 | 5 | 0.3 | E
c-Src tyrosine kinase; SH2 | 1a09 | alpha and beta proteins | 106 | Tyrosine Kinase | 137 | 34 | 1 | 0.3 | E