BMC Bioinformatics
2006 Mar 06;7:152. doi: 10.1186/1471-2105-7-152.
Domain-based small molecule binding site annotation.
Snyder KA, Feldman HJ, Dumontier M, Salama JJ, Hogue CW.
Abstract
Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but annotation of small molecule binding sites on protein sequences is sparse. The Small Molecule Interaction Database (SMID), a database of protein domain-small molecule interactions, was created using structural data from the Protein Data Bank (PDB). More importantly, it provides a means to predict small molecule binding sites on proteins of known or unknown structure and, unlike prior approaches, removes large numbers of false positive hits arising from transitive alignment errors, biologically insignificant small molecules and crystallographic conditions that overpredict ion binding sites.
Using a set of co-crystallized protein-small molecule structures as a starting point, SMID interactions were generated by identifying protein domains that bind small molecules with NCBI's Reverse Position Specific BLAST (RPS-BLAST) algorithm. SMID records are available for viewing at http://smid.blueprint.org. The SMID-BLAST tool provides accurate transitive annotation of small molecule binding sites for proteins not found in the PDB. Given a protein sequence, SMID-BLAST identifies domains using RPS-BLAST and then lists potential small molecule ligands based on SMID records, together with their aligned binding sites. A heuristic ligand score, calculated from the E-value, ligand residue identity and domain entropy, assigns a level of confidence to each hit. SMID-BLAST predictions were validated against a set of 793 experimental small molecule interactions from the PDB: 472 (60%) of the predicted interactions identically matched the experimental small molecule, and of these, 344 had more than 80% of the binding site residues correctly identified. Further, we estimate that 45% of the predictions not observed in the PDB validation set may be true positives.
By focusing on protein domain-small molecule interactions, SMID is able to cluster similar interactions and detect subtle binding patterns that would not otherwise be obvious. Using SMID-BLAST, small molecule targets can be predicted for any protein sequence, the only limitation being that the small molecule must be present in the PDB. The validation results and specific examples presented here illustrate that SMID-BLAST predicts both the small molecule ligand and the binding site residue positions of a query protein with a high degree of accuracy.
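SMID-BLAST reports, for each hit, the binding site aligned onto the query sequence. The core of that transfer can be sketched as walking a gapped pairwise alignment and carrying template residue numbers over to query residue numbers. This is a minimal illustration only, assuming the PDB template and the query are already aligned as two equal-length rows with '-' marking gaps; the function name `map_binding_site` and this representation are hypothetical and are not taken from SMID-BLAST itself, which works through RPS-BLAST domain alignments.

```python
from __future__ import annotations


def map_binding_site(template_aln: str, query_aln: str,
                     template_start: int, query_start: int,
                     template_site: set[int]) -> dict[int, int]:
    """Map template binding-site residue numbers onto query residue numbers.

    template_aln and query_aln are the two gapped rows of one pairwise
    alignment (equal length, '-' marks a gap). template_start and
    query_start are the residue numbers of the first non-gap character
    in each row. Site residues aligned to a query gap are not mapped.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("alignment rows must have equal length")
    mapping: dict[int, int] = {}
    t_pos, q_pos = template_start, query_start
    for t_char, q_char in zip(template_aln, query_aln):
        # Transfer only columns where both rows carry a residue.
        if t_char != "-" and q_char != "-" and t_pos in template_site:
            mapping[t_pos] = q_pos
        # Advance each residue counter only past non-gap characters.
        if t_char != "-":
            t_pos += 1
        if q_char != "-":
            q_pos += 1
    return mapping
```

For instance, with template row `AC-DEF` starting at residue 10, query row `ACQD-F` starting at residue 1 and template site {11, 12, 13}, the function returns {11: 2, 12: 4}: residue 13 is dropped because it aligns to a gap in the query, which is one way a transferred site can end up smaller than the template site.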
Figure 1. SMID record as viewed from the SMID web interface. This record was derived from PDB entry 1HG1, which shows an interaction between an Asparaginase domain (residues 15–322 of chain A, identified by RPS-BLAST with an E-value of 1.34e-103) and D-Aspartate. The GI for 1HG1 chain A is 15825850. For the SMID record shown, seven of the eight residues of the binding site are located within the Asparaginase domain.
Figure 2. SMID small molecule information page, as viewed from the SMID web interface. The small molecule page shown here indicates that 8 SMID records involve the molecule D-Aspartate.
Figure 3. A CDD domain family multiple alignment. All sequences from a CDD domain family are listed, including the consensus. In addition, the sequence of the PDB protein from which the SMID interaction was derived is included, with its PDB code highlighted in red. Lowercase residues do not align with the consensus and represent insertions or deletions relative to it. Small molecule binding site residues are mapped from the parent PDB sequence to the domain family sequences using the following colour-coding scheme: red for conserved residues, blue for similar residues and yellow for non-conserved residues. Where a binding site residue aligns to a gap in the consensus, conservation cannot be measured, so no coloured residue is displayed. Note that binding site residues beyond those of the parent PDB sequence may also be highlighted if redundant interactions from other PDB files share a similar binding site. This alignment has been truncated for clarity.
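The colour rule in this caption can be expressed as a small function over one alignment column. This is a sketch only: the caption does not say how a residue is judged "similar", so the residue groups in `SIMILAR_GROUPS` below are a hypothetical stand-in, and `site_colour` is an illustrative name rather than part of SMID.

```python
from __future__ import annotations

# Hypothetical similarity groups; SMID's actual similarity criterion
# is not stated in the caption.
SIMILAR_GROUPS = [set("ILMV"), set("FWY"), set("KRH"),
                  set("DE"), set("ST"), set("NQ")]


def is_similar(a: str, b: str) -> bool:
    """True if both residues fall in the same (hypothetical) group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def site_colour(residue: str, consensus: str) -> str | None:
    """Colour for a binding-site residue aligned to one consensus column."""
    if consensus == "-" or residue == "-":
        return None  # no aligned pair: conservation cannot be measured
    r, c = residue.upper(), consensus.upper()
    if r == c:
        return "red"     # conserved
    if is_similar(r, c):
        return "blue"    # similar
    return "yellow"      # non-conserved
```

Returning `None` for a consensus gap mirrors the caption's rule that no coloured residue is displayed in that case.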
Figure 4. A 3-D SMID interaction. The X-ray crystallographic structure of Erwinia chrysanthemi L-Asparaginase in complex with D-Aspartate (PDB ID: 1HG1), as viewed in Cn3D. The structure was annotated by SMID to highlight the domain residues (purple), the domain residues contacting the D-Aspartate molecule (green) and the non-domain residues (grey). The D-Aspartate ligand is shown in space-fill format. The sequence/alignment viewer provides sequences for all chains in the PDB record; for the sequence involved in the small molecule interaction, residues are colour-coded using the same scheme as the structural model.
Figure 5. SMID-BLAST validation final ligand score distributions. a) Distribution of predictions in the validation set as a function of final ligand score. The solid line represents the percent of correct predictions, while the dotted line represents predictions that were not observed in the PDB validation set; these latter interactions comprise both false positives and true positives that simply have not yet been observed. For example, 12% of correct predictions had a final ligand score below 100, while 21% of unvalidated predictions had a final ligand score below 100. The dashed line represents an estimate of the distribution of final ligand scores for false positives, as outlined in the text. b) Coverage as a function of final ligand score for the predictions that were observed in the PDB validation set. Coverage is defined as the percentage of true binding site residues included in the predicted binding site.
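Both quantities in this figure follow directly from the caption's definitions: coverage per prediction (panel b) and the share of predictions scoring below a threshold (the "below 100" figures in panel a). The functions below are a hypothetical sketch with illustrative names; binding sites are represented as sets of residue numbers and final ligand scores as plain floats.

```python
from __future__ import annotations


def coverage(true_site: set[int], predicted_site: set[int]) -> float:
    """Percent of true binding-site residues included in the predicted site."""
    if not true_site:
        raise ValueError("true binding site is empty")
    return 100.0 * len(true_site & predicted_site) / len(true_site)


def percent_below(scores: list[float], threshold: float) -> float:
    """Percent of predictions whose final ligand score is below threshold."""
    if not scores:
        raise ValueError("no scores given")
    return 100.0 * sum(s < threshold for s in scores) / len(scores)
```

For example, a true site {1, 2, 3, 4, 5} and a predicted site {2, 3, 4, 5, 6, 7} give a coverage of 80.0. The extra predicted residues 6 and 7 do not lower coverage, since the measure only asks how much of the true site was recovered.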
Figure 6. Selected chemical structures. Chemical structures of selected SMID-BLAST small molecule hits from query proteins MiaB, Phosphoglycerate Mutase, TrpRS and TyrRS.
Figure 7. Binding sites predicted by SMID-BLAST. a) A comparative model of the predicted Elp3 domain of MiaB. The iron-sulfur cluster (orange) and SAM (CPK stick model) have had their coordinates transferred from the modelling template, PDB 1OLT chain A, to illustrate how they might bind. The predicted Fe-S binding site residues are indicated in red, the predicted SAM binding residues in purple, and the three cysteine residues that interact with the Fe-S cluster in yellow. A mixture of red and purple is used for residues common to both binding sites. b) Structural alignment of PDB 1RII chain A (phosphoglycerate mutase from M. tuberculosis, blue) and 2BIF chain A (6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase from Rattus norvegicus, yellow). The small molecules from 2BIF are also shown along with their PDB short labels. Purple molecules associate with the N-terminal domain of 2BIF chain A, while blue molecules associate with the C-terminal domain. Note that BOG was part of the crystallization buffer in this example. Structures were aligned with Swiss-PdbViewer.
Figure 8. An overview of how protein-small molecule interactions are identified from the PDB and used to generate SMID records. The process of 'Interaction Tagging' identifies protein-small molecule interactions that involve (i) single atom contacts, (ii) an unknown protein sequence, (iii) a biologically irrelevant small molecule, or (iv) false contacts with biologically relevant ions, the last detected using a Support Vector Machine. See text for details.
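The tagging steps listed in this caption can be sketched as a sequence of checks that either tag an interaction or let it through. This is an illustration under stated assumptions: the `Interaction` fields, the small-molecule and ion code lists, and the reading of "single atom contact" as a single protein-ligand atom-atom contact are all hypothetical, and the Support Vector Machine is represented only by a caller-supplied `ion_is_real` predicate rather than by any particular SVM library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical exclusion list; SMID's actual list of biologically
# irrelevant small molecules is not given here.
IRRELEVANT = {"HOH", "GOL", "EDO", "PEG", "SO4"}
# Hypothetical codes for biologically relevant ions checked by the classifier.
IONS = {"CA", "MG", "ZN", "NA", "K", "CL"}


@dataclass
class Interaction:
    het_code: str          # PDB short label of the small molecule
    n_contacts: int        # number of protein-ligand atom-atom contacts
    sequence_known: bool   # False if the protein sequence is unknown
    features: list[float]  # inputs for the ion classifier (hypothetical)


def tag(inter: Interaction,
        ion_is_real: Callable[[list[float]], bool]) -> str | None:
    """Return the reason an interaction is tagged, or None to keep it."""
    if inter.n_contacts <= 1:
        return "single-atom contact"                     # (i)
    if not inter.sequence_known:
        return "unknown protein sequence"                # (ii)
    if inter.het_code in IRRELEVANT:
        return "biologically irrelevant small molecule"  # (iii)
    if inter.het_code in IONS and not ion_is_real(inter.features):
        return "false ion contact"                       # (iv) classifier says no
    return None
```

The checks run in the caption's order (i) to (iv) and the first matching reason is returned; whether SMID applies them in exactly this order is not stated in this section.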