Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities.
Venner E, Lisewski AM, Erdin S, Ward RW, Amin S and Lichtarge O.
PLoS ONE - In Press - November, 2010
As a result of high-throughput Structural Genomics, many protein structures have no known molecular function. This study aims to recover these molecular functions in three steps that compare small but evolutionarily important structural motifs among all protein structures. First, Evolutionary Trace Annotation (ETA) identifies which proteins have local similarities in both their evolutionary and structural features. Next, all the proteins with ETA similarities are linked together to create a structural proteomic network. Finally, competing functional labels diffuse link by link over the whole network, starting from proteins with known functions. As a result, every node is assigned a significance z-score for each function it can be linked to; the most significant function at each node, if any, wins and defines its annotation. This diffusion process is maximally smooth locally and does not lose initial annotations. In high-throughput controls it is 99% and 97% accurate at predicting Enzyme Commission (EC) numbers to the third and fourth levels, respectively, at half coverage. False positive rates are 4-fold lower than those of nearest-neighbor annotation and 5-fold lower than those of sequence-based annotation. Direct experimental validation of a predicted carboxylesterase (EC 3.1.1.1) in a protein from the increasingly drug-resistant bacterium Staphylococcus aureus illustrates the effectiveness of this approach. This study is a further step in linking molecular function to a few specific, recognizable, evolutionarily important residues identified by Evolutionary Trace. It also points specifically to the power of competitive global network diffusion to raise both specificity and sensitivity. A web server is under construction at http://mammoth.bcm.tmc.edu/networks.
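The abstract describes the diffusion step only qualitatively. One plausible reading of "maximally smooth locally and does not lose initial annotations" is a harmonic function on the weighted network with the annotated proteins held fixed, which is the classic label-propagation setup. The Python sketch below assumes exactly that. It further assumes that the significance z-scores come from a null model in which the known labels are shuffled among the annotated proteins. Neither assumption is stated in the abstract. All function names, the toy graph, the edge weights (standing in for ETA similarity, which is not modeled here), the permutation count and the `z_min` threshold are hypothetical.

```python
import random
import statistics


def build_graph(edges):
    """Symmetric weighted adjacency dict from (u, v, weight) triples.
    Hypothetical: weights stand in for ETA similarity strength."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def harmonic_diffusion(adj, seeds, label, iters=100):
    """Jacobi iteration towards the harmonic function for one label.
    Seed nodes are clamped (1.0 if annotated with `label`, else 0.0);
    every other node becomes the weighted mean of its neighbours."""
    f = {n: 0.0 for n in adj}
    for n, lab in seeds.items():
        f[n] = 1.0 if lab == label else 0.0
    for _ in range(iters):
        new = {}
        for n, nbrs in adj.items():
            if n in seeds:
                new[n] = f[n]  # clamped: initial annotations are never lost
                continue
            total = sum(nbrs.values())
            new[n] = sum(w * f[m] for m, w in nbrs.items()) / total if total else 0.0
        f = new
    return f


def z_scores(adj, seeds, n_perm=100, rng_seed=0):
    """Per-node, per-label z-scores against an ASSUMED null model in which
    the known labels are shuffled among the annotated (seed) nodes."""
    labels = sorted(set(seeds.values()))
    seed_nodes = list(seeds)
    rng = random.Random(rng_seed)
    observed = {lab: harmonic_diffusion(adj, seeds, lab) for lab in labels}
    null = {lab: {n: [] for n in adj} for lab in labels}
    for _ in range(n_perm):
        shuffled = [seeds[n] for n in seed_nodes]
        rng.shuffle(shuffled)
        perm_seeds = dict(zip(seed_nodes, shuffled))
        for lab in labels:
            for n, v in harmonic_diffusion(adj, perm_seeds, lab).items():
                null[lab][n].append(v)
    z = {n: {} for n in adj}
    for lab in labels:
        for n in adj:
            mu = statistics.fmean(null[lab][n])
            sd = statistics.pstdev(null[lab][n])
            z[n][lab] = (observed[lab][n] - mu) / sd if sd > 0 else 0.0
    return z


def annotate(adj, seeds, z_min=2.0, **kwargs):
    """Competitive step: the label with the highest z-score wins at each
    unannotated node, but only if it clears the (hypothetical) z_min cutoff."""
    z = z_scores(adj, seeds, **kwargs)
    result = {}
    for n in adj:
        if n in seeds:
            result[n] = seeds[n]  # known functions are kept as-is
            continue
        best = max(z[n], key=z[n].get)
        result[n] = best if z[n][best] >= z_min else None
    return result


# Toy network: two clusters of annotated proteins joined by a weak bridge,
# plus one node (u3) tied equally to both functions.
edges = [("a1", "u1", 1.0), ("a2", "u1", 1.0), ("a3", "u1", 1.0),
         ("b1", "u2", 1.0), ("b2", "u2", 1.0), ("b3", "u2", 1.0),
         ("u1", "u2", 0.1), ("a1", "u3", 1.0), ("b1", "u3", 1.0)]
adj = build_graph(edges)
seeds = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
result = annotate(adj, seeds, z_min=1.0)
print(result)
```

In this toy network u1 sits among three proteins labelled A and u2 among three labelled B. Each should therefore take its cluster's label with a z-score of roughly 2. By contrast, u3 is tied equally to one A and one B protein, so it scores near zero for both and is left unannotated. Clamping the seeds is what guarantees that initial annotations survive. The permutation null is what turns raw diffusion scores into significance values that can be compared across competing labels.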