The Evolutionary Trace Annotation (ETA) server is designed to annotate protein structures, particularly those from structural genomics projects, with likely enzymatic functions. Beginning with a protein structure of interest, ETA uses the Evolutionary Trace (ET) to identify a likely functional site in the query protein. Based on this functional site, a template picker uses heuristics to choose a 3D template that captures the determinants of function. This template is searched against a non-redundant subset of the PDB (no two proteins share more than 90% sequence identity) using a geometric matching algorithm to identify structural matches (a one-to-many search). Simultaneously, templates from those proteins are matched against the query protein (a many-to-one search). Matches that occur in both directions (reciprocal matches) are the most reliable. A support vector machine (SVM) filters the matches to identify those with the most geometric and evolutionary similarity to the template. The EC numbers of the remaining matches suggest possible functions, and plurality voting selects the function seen most often as the predicted function.
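The reciprocal-match idea can be illustrated with a small sketch. This is not the server's code: the function name and the two placeholder hits are assumptions made for illustration, and a real search would carry geometric and ET data alongside each hit.

```python
def reciprocal_matches(one_to_many_hits, many_to_one_hits):
    """Return the proteins matched in both search directions, sorted by name."""
    return sorted(set(one_to_many_hits) & set(many_to_one_hits))


# Hypothetical hit lists: only 1nytD and 1npyA come from the example in this
# document; the other two names are placeholders.
one_to_many_hits = ["1nytD", "1npyA", "otm_only_hit"]
many_to_one_hits = ["1npyA", "1nytD", "mto_only_hit"]

print(reciprocal_matches(one_to_many_hits, many_to_one_hits))
# ['1npyA', '1nytD']
```

Using a set intersection keeps only proteins seen in both directions, which mirrors why reciprocal matches are treated as the most reliable.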
The server can perform these steps automatically, but many steps can be customized: the Evolutionary Trace analysis, the choice of residues, and the choice of allowed amino acid labels. The process is explained with an example in the sections below.
As an example, we will use Methanococcus jannaschii shikimate dehydrogenase.
To begin the ETA analysis, enter the PDB code of the protein structure where prompted, including a 1-character chain identifier (use an "_" for a null chain): in this case, "1nvtA". Click "submit" to continue.
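The input format can be sketched as a small parser. The function `parse_query` is hypothetical and not part of the server; it assumes a 4-character PDB code followed by one chain character, and the server may validate input differently.

```python
def parse_query(query):
    """Split a query such as '1nvtA' into (pdb_code, chain).

    An underscore in the chain position means a null chain, returned as None.
    """
    if len(query) != 5:
        raise ValueError("expected a 4-character PDB code plus 1 chain character")
    pdb_code, chain = query[:4], query[4]
    if not pdb_code.isalnum():
        raise ValueError("PDB code must be alphanumeric")
    return pdb_code.lower(), (None if chain == "_" else chain)


print(parse_query("1nvtA"))  # ('1nvt', 'A')
print(parse_query("1nvt_"))  # ('1nvt', None)
```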
An ET analysis provides information on the evolutionary importance of each residue. In this case the analysis is cached, so the server continues to step 2. If no analysis is available, the server runs a trace automatically with default parameters.
To get more control over the ET analysis, such as a custom selection of sequences, multiple sequence alignment, tree, or parameters, create a custom analysis with ET Wizard (part of the ET Viewer package). The zip file output by ET Wizard can then be uploaded to the server by clicking "Browse..." to locate the file and "Upload" to submit it, rather than submitting a PDB code.
2. Template Residues
With the protein structure selected and ET data available, the server predicts a functional site by identifying a cluster of evolutionarily important residues on the surface of the protein (10 in this case), derives a template based on the six most important (71, 73, 75, 96, 111, 259), and renders an image of the template. This template may be explored in depth by clicking on the image to download a PyMOL session file. Click "Submit Template" to continue with the analysis.
Alternatively, templates may be customized. To substitute residue 70 for residue 71, uncheck the box by "71", enter "70" in one of the boxes in the "Custom Residues" table, and click the check box beside it. The template picture and PyMOL session update dynamically to match the selection. The final template must be composed of exactly six residues.
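The customization rule (swap residues, but end with exactly six) can be expressed as a short check. This is only a sketch of the constraint described above; `customize_template` and its parameters are hypothetical names, not part of the server.

```python
TEMPLATE_SIZE = 6  # the final template must contain exactly six residues


def customize_template(default_residues, remove=(), add=()):
    """Drop unchecked residues, add custom ones, and enforce the size rule."""
    removed = set(remove)
    selected = [r for r in default_residues if r not in removed]
    for residue in add:
        if residue not in selected:
            selected.append(residue)
    if len(selected) != TEMPLATE_SIZE:
        raise ValueError(
            f"template has {len(selected)} residues, expected {TEMPLATE_SIZE}"
        )
    return sorted(selected)


default_template = [71, 73, 75, 96, 111, 259]
print(customize_template(default_template, remove=[71], add=[70]))
# [70, 73, 75, 96, 111, 259]
```

Removing a residue without adding a replacement raises an error, which corresponds to the requirement that the final template have six residues.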
3. Amino Acid Labels
The server next identifies possible amino acid types for each template residue based on the multiple sequence alignment used by ET. Each unique combination is listed, along with the number of times it occurs in the alignment. In this case, there are six unique combinations, four of which are pre-selected because they occur more than once. These include the native amino acids (T71, P73, K75, N96, D111, Q259), which occur 397 times and are shown in boldface, a combination that occurs 69 times (T71, P73, K75, N96, D111, Q259), and two combinations that each occur twice. Combinations may be turned on or off using their check boxes. Click "Find Matches" to begin the template search.
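Counting label combinations can be sketched as follows. The toy alignment, the column indices, and the function names are all assumptions made for illustration; the sketch also ignores gaps and any mapping from residue numbers to alignment columns, which a real implementation would need.

```python
from collections import Counter


def label_combinations(alignment, columns):
    """Count each amino acid combination seen at the template columns."""
    counts = Counter(tuple(seq[c] for c in columns) for seq in alignment)
    return counts.most_common()  # most frequent combination first


def preselected(combinations, min_count=2):
    """Keep combinations that occur more than once, as described above."""
    return [combo for combo, count in combinations if count >= min_count]


# Toy alignment: each string holds only the six template positions.
alignment = ["TPKNDQ"] * 3 + ["SPKNDQ"] * 2 + ["TPKNEQ"]
combos = label_combinations(alignment, columns=range(6))

for combo, count in combos:
    print(" ".join(combo), count)
print(len(preselected(combos)))  # 2
```

In this toy case the combination that occurs only once is listed but not pre-selected, matching the behavior described for the check boxes.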
Custom amino acid labels can also be added. Click the "Add Custom Labels" button and enter the new amino acid types. For example, it might be desirable to have the default amino acid types, except replacing the aspartate with glutamate. To do so, enter "T P K N E Q" in the appropriate fields. More combinations can be added in the same manner.
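A custom label set such as "T P K N E Q" pairs one amino acid type with each template residue. The sketch below shows one way to check such input; `parse_custom_labels` is a hypothetical helper, and restricting labels to the 20 standard one-letter codes is an assumption.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_custom_labels(text, template_residues):
    """Map each template residue to the one-letter label entered for it."""
    labels = text.upper().split()
    if len(labels) != len(template_residues):
        raise ValueError("need exactly one label per template residue")
    for label in labels:
        if label not in STANDARD_AMINO_ACIDS:
            raise ValueError(f"unknown amino acid label: {label!r}")
    return dict(zip(template_residues, labels))


template = [71, 73, 75, 96, 111, 259]
print(parse_custom_labels("T P K N E Q", template))
# {71: 'T', 73: 'P', 75: 'K', 96: 'N', 111: 'E', 259: 'Q'}
```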
While the search is running, status messages are printed to the screen.
At any point, to return to an earlier step and make changes, click on the header for that step.
The results page begins with reciprocal predictions, that is, those based on reciprocal matches. The most likely function is the one that appears most often among the matched proteins. It is listed first, as a 3-digit EC number: 1.1.1. Clicking on this takes you to its definition on the EC web page, "oxidoreductase acting on the CH-OH group of donors with NAD+ or NADP+ as acceptors", which is the correct function. Also shown is the number of matches with that function: 4 in this case. Next, the number of additional predictions is shown; in this case, none.
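Plurality voting over 3-digit EC numbers can be sketched in a few lines. The four full EC numbers below are invented for illustration (the document reports only that four matches share the function 1.1.1), and how the server breaks ties is not stated, so this sketch simply keeps the order produced by `Counter.most_common`.

```python
from collections import Counter


def truncate_ec(ec_number, digits=3):
    """Reduce an EC number such as '1.1.1.25' to its first three fields."""
    return ".".join(ec_number.split(".")[:digits])


def plurality_vote(match_ec_numbers):
    """Return (top prediction, additional predictions) as (EC, count) pairs."""
    votes = Counter(truncate_ec(ec) for ec in match_ec_numbers)
    ranked = votes.most_common()
    if not ranked:
        return None, []
    return ranked[0], ranked[1:]


# Illustrative EC numbers for four matches; not taken from the server output.
top, additional = plurality_vote(["1.1.1.25", "1.1.1.25", "1.1.1.25", "1.1.1.1"])
print(top)         # ('1.1.1', 4)
print(additional)  # []
```

Because all four example matches share the same first three fields, there is one prediction with four supporting matches and no additional predictions.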
Below that is the initially-hidden "Details" section. Click on it to view more information about the predictions and matches. Each predicted function is listed. For each function, the matches with that function are also listed. In this case, among other matches there is a match to 1nytD, which is revealed to be an E. coli shikimate dehydrogenase by clicking on its PDB code to see the corresponding entry in PDBSum. For each such reciprocal match, data is listed for the one-to-many (OTM) and many-to-one (MTO) components. This includes the match RMSD (a low 0.372 and 0.527 angstroms respectively), which describes the geometric similarity of the matches; the average difference in ET scores between source and target residues (also low at 0.017 and 0.026), which describes the similarity of their evolutionary importance; and the template and match residues. These residues are colored red for residues which appear only in the OTM search, blue for residues which appear only in the MTO search, and purple for residues which appear reciprocally. In this case there are a total of eight residues: two red residues from the 1nvtA template that match in 1nytD; two blue residues from the 1nytD template that match in 1nvtA; and four purple residues that are found in both templates.
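The red, blue, and purple scheme amounts to simple set logic over the residues seen in each search direction. In the sketch below, the OTM list is the 1nvtA template from this example, while the MTO list is hypothetical, chosen only to reproduce the 2 red, 2 blue, 4 purple split; `color_residues` is not a server function.

```python
def color_residues(otm_residues, mto_residues):
    """Assign red (OTM only), blue (MTO only), or purple (both) to each residue."""
    otm, mto = set(otm_residues), set(mto_residues)
    colors = {}
    for residue in sorted(otm | mto):
        if residue in otm and residue in mto:
            colors[residue] = "purple"
        elif residue in otm:
            colors[residue] = "red"
        else:
            colors[residue] = "blue"
    return colors


otm_residues = [71, 73, 75, 96, 111, 259]   # 1nvtA template residues
mto_residues = [73, 75, 96, 111, 130, 200]  # hypothetical residue numbers

colors = color_residues(otm_residues, mto_residues)
print(colors)
print(len(colors))  # 8
```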
After this is a "Create Images" link, which will render and load images of the template and match residues in the source and target proteins. Residues are colored using the same scheme as the residue numbers above. Links to the PyMOL sessions for these images are also provided.
Matches which do not have a known enzymatic function are also grouped together and listed. In this case there is one such match, to 1npyA, a Haemophilus influenzae protein of unknown function.
Next is the section "All Predictions". This contains functions predicted not just by the reciprocal matches, but also those that were identified using only an OTM or MTO search. These predictions may be less reliable than the reciprocal predictions, but may include relevant functions not suggested by reciprocal matching. Data is presented in this section as in the "Reciprocal Predictions" section. In the case of 1nvtA, there is no new data here.