
Current Proteomics


ISSN (Print): 1570-1646
ISSN (Online): 1875-6247

Research Article

Identification of DNA-Binding Proteins by Multiple Kernel Support Vector Machine and Sequence Information

Author(s): Yijie Ding, Feng Chen, Xiaoyi Guo*, Jijun Tang and Hongjie Wu*

Volume 17, Issue 4, 2020

Page: [302 - 310] Pages: 9

DOI: 10.2174/1570164616666190417100509



Background: DNA-binding proteins play an important role in multiple biomolecular functions. However, traditional experimental methods for identifying DNA-binding proteins are still time-consuming and extremely expensive.

Objective: In the past several years, various computational methods have been developed to detect DNA-binding proteins. However, most of them do not integrate multiple sources of information.

Methods: In this study, we propose a novel computational method that predicts DNA-binding proteins from sequence information using a two-step Multiple Kernel Support Vector Machine (MK-SVM). First, we extract several features and construct multiple kernels. Then, the kernels are linearly combined by Multiple Kernel Learning (MKL). Finally, an SVM model built on the combined kernel is used to predict DNA-binding proteins.
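
The kernel-combination step described above can be illustrated with a minimal sketch. It assumes RBF kernels and a simple kernel-target alignment heuristic for the weights; the abstract does not specify the MKL procedure, so the weighting rule, the toy feature views, the `gamma` values, and all function names here are illustrative assumptions rather than the authors' implementation.

```python
import math

def rbf_kernel(X, gamma):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(X)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]

def frobenius(A, B):
    """Frobenius inner product <A, B> of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def alignment(K, y):
    """Kernel-target alignment between K and the ideal kernel y y^T."""
    Y = [[yi * yj for yj in y] for yi in y]
    return frobenius(K, Y) / math.sqrt(frobenius(K, K) * frobenius(Y, Y))

def combine_kernels(kernels, y):
    """Weight each kernel by its (non-negative) alignment and sum them."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0.0:
        # Fall back to uniform weights if no kernel aligns with the labels.
        weights = [1.0 / len(kernels)] * len(kernels)
    else:
        weights = [s / total for s in scores]
    n = len(y)
    combined = [[sum(w * K[i][j] for w, K in zip(weights, kernels))
                 for j in range(n)] for i in range(n)]
    return weights, combined

# Hypothetical toy data: two feature views of four proteins.
# view_a separates the classes; view_b is close to uninformative.
view_a = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
view_b = [[0.5], [0.1], [0.4], [0.2]]
labels = [1, 1, -1, -1]  # +1 = DNA-binding, -1 = non-binding

kernels = [rbf_kernel(view_a, gamma=1.0), rbf_kernel(view_b, gamma=1.0)]
weights, combined = combine_kernels(kernels, labels)
print("kernel weights:", weights)
```

The combined Gram matrix could then be passed to any SVM implementation that accepts a precomputed kernel (for example, scikit-learn's `SVC(kernel="precomputed")`) to train the final classifier.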

Results: The proposed method is tested on two benchmark data sets. Compared with existing methods, our approach achieves comparable performance, and on some data sets it performs better.

Conclusion: We conclude that MK-SVM is more suitable than a common SVM as the classifier for identifying DNA-binding proteins.

Keywords: DNA-binding proteins, feature extraction, support vector machine, multiple kernel learning, kernel alignment, binding sites.


© 2023 Bentham Science Publishers