Boosting Granular Support Vector Machines for the Accurate Prediction of Protein-Nucleotide Binding Sites

Author(s): Yi-Heng Zhu, Jun Hu, Yong Qi, Xiao-Ning Song, Dong-Jun Yu*.

Journal Name: Combinatorial Chemistry & High Throughput Screening
Accelerated Technologies for Biotechnology, Bioassays, Medicinal Chemistry and Natural Products Research

Volume 22, Issue 7, 2019


Aim and Objective: The accurate identification of protein-ligand binding sites helps elucidate protein function and facilitates the design of new drugs. Machine-learning-based methods have been widely used to predict protein-ligand binding sites. Nevertheless, severe class imbalance, where nonbinding (majority) residues far outnumber binding (minority) residues, degrades the performance of such machine-learning-based predictors.

Materials and Methods: In this study, we aim to relieve the negative impact of class imbalance by Boosting Multiple Granular Support Vector Machines (BGSVM). In BGSVM, each base SVM is trained on a granular training subset consisting of all minority samples and some reasonably selected majority samples. The efficacy of BGSVM in dealing with class imbalance was validated by benchmarking it against several typical imbalance learning algorithms. We further implemented a protein-nucleotide binding site predictor, called BGSVM-NUC, using the BGSVM algorithm.
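
The idea of a granular training subset (all minority samples plus a selection of majority samples) can be illustrated with a minimal sketch. This is not the BGSVM procedure itself: the abstract does not state how majority samples are selected or how the base SVMs are boosted, so the sketch assumes plain random undersampling at a 1:1 ratio as a stand-in. The function name `granular_subsets`, its parameters, and the toy data are hypothetical.

```python
import random

def granular_subsets(samples, labels, n_subsets, seed=0):
    """Build training subsets that each keep every minority sample.

    Assumption: majority samples are drawn uniformly at random, one per
    minority sample (1:1 ratio). The paper's own selection rule is not
    given in the abstract, so this is only a placeholder for it.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]  # binding residues
    majority = [i for i, y in enumerate(labels) if y == 0]  # nonbinding residues
    k = min(len(minority), len(majority))

    subsets = []
    for _ in range(n_subsets):
        picked = rng.sample(majority, k)  # random draw without replacement
        idx = minority + picked
        subsets.append(([samples[i] for i in idx], [labels[i] for i in idx]))
    return subsets

# Toy data: 5 binding residues among 100, each with a one-dimensional feature.
labels = [1] * 5 + [0] * 95
samples = [[float(i)] for i in range(100)]
subsets = granular_subsets(samples, labels, n_subsets=3)
```

Each returned subset is balanced and could be used to train one base classifier; combining those classifiers (by boosting, as in BGSVM, or by any other ensemble rule) is outside the scope of this sketch.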

Results: Rigorous cross-validation and independent validation tests for five types of protein-nucleotide interactions demonstrated that the proposed BGSVM-NUC achieves promising prediction performance and outperforms several popular sequence-based protein-nucleotide binding site predictors. The BGSVM-NUC web server is freely available for academic use.

Keywords: Imbalance learning, granular computing, support vector machine, classifier ensemble, protein-nucleotide binding sites.

Article Details

Year: 2019
Page: [455 - 469]
Pages: 15
DOI: 10.2174/1386207322666190925125524
