Background: Metalloproteins are highly involved in many biological processes, including
catalysis, recognition, transport, transcription, and signal transduction. The metal ions they bind usually
play enzymatic or structural roles in mediating these diverse functions. Thus, the systematic
analysis and prediction of metal-binding sites using sequence and/or structural information are crucial
for understanding the sequence-structure-function relationships of metalloproteins.
Objective: The objective of this work is to develop a new computational algorithm for improved
prediction of major types of metal-binding sites.
Method: We propose MetalExplorer (http://metalexplorer.erc.monash.edu.au/), a new machine
learning-based method for predicting eight different types of metal-binding sites (Ca, Co, Cu, Fe, Ni,
Mg, Mn, and Zn) in proteins. Our approach combines heterogeneous sequence-, structure-, and residue
contact network-based features in a random forest machine-learning framework.
Results: The predictive performance of MetalExplorer was tested by cross-validation and independent
tests using non-redundant datasets of known structures. This method applies a two-step feature selection
approach based on the maximum relevance minimum redundancy and forward feature selection to
identify the most informative features that contribute to the prediction performance. At a precision of
60%, MetalExplorer achieved high recall values, ranging from 59% to 88% across the eight metal ion
types in fivefold cross-validation tests. Moreover, the common and type-specific features in the optimal
subsets of all metal ions were characterized in terms of their contributions to the overall performance.
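
The two-step selection described above (a maximum relevance minimum redundancy ranking followed by forward selection) can be sketched roughly as follows. This is an illustrative sketch only, not the MetalExplorer implementation: it assumes discrete-valued features, uses the difference form of the mRMR criterion, and keeps a feature during the forward pass only if a caller-supplied score improves. The names `mutual_information`, `mrmr_rank`, `forward_select`, `table_accuracy`, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr_rank(columns, labels):
    """Greedy mRMR ordering: relevance to the labels minus the mean
    redundancy with the features already ranked (difference criterion)."""
    relevance = {name: mutual_information(col, labels) for name, col in columns.items()}
    ranked, remaining = [], set(columns)
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in ranked
            ) / len(ranked)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def forward_select(ranked, evaluate):
    """Walk the ranked list, keeping a feature only if evaluate(subset) improves."""
    selected, best = [], float("-inf")
    for name in ranked:
        trial = evaluate(selected + [name])
        if trial > best:
            selected.append(name)
            best = trial
    return selected, best


# Toy data (hypothetical): "b" duplicates "a", and "c" is unrelated to the labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 0, 0, 1, 1, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}


def table_accuracy(subset):
    """Stand-in scorer: training accuracy of a majority-vote lookup table.
    A real pipeline would use cross-validated classifier performance instead."""
    groups = {}
    for i, y in enumerate(labels):
        key = tuple(columns[name][i] for name in subset)
        groups.setdefault(key, Counter())[y] += 1
    return sum(max(c.values()) for c in groups.values()) / len(labels)


ranked = mrmr_rank(columns, labels)
selected, best_score = forward_select(ranked, table_accuracy)
print(ranked)                # ['a', 'c', 'b']
print(selected, best_score)  # ['a'] 0.875
```

In this toy case the exact duplicate "b" is pushed to the end of the ranking by the redundancy penalty, and the forward pass retains only "a" because neither remaining feature raises the score.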
Conclusion: On both the benchmark and independent datasets, at the 60% precision control level,
MetalExplorer compared favorably with an existing metalloprotein prediction tool, SitePredict.
MetalExplorer is expected to be a powerful tool for the accurate prediction of potential metal-binding
sites, and it should facilitate the functional analysis and rational design of novel metalloproteins.