Computational Methods for Predicting DNA Binding Proteins

Gaofeng Pan; Jiandong Wang; Liang Zhao; William Hoskins; Jijun Tang
Abstract

Background: DNA-binding proteins play key roles in many biomolecular functions. Because traditional experimental methods for identifying them are expensive and time-consuming, computational methods that predict whether a protein binds DNA are very helpful to researchers. Machine learning has been widely used in many research areas, and many researchers have proposed machine learning methods for DNA-binding protein prediction; this paper highlights their advantages and disadvantages.
Objective: Many computational methods can predict DNA-binding proteins, and each uses its own combination of features and classification algorithms. This paper reviews these methods to identify common procedures that can help researchers develop more accurate methods.
Methods: First, the information stored in protein sequences and gene sequences is presented; this information is the basis for finding the patterns that lead to binding. Then, feature extraction methods and classification algorithms are discussed. Finally, some commonly used benchmark datasets are analyzed and used to evaluate the methods.
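As a concrete illustration of sequence-based feature extraction, the sketch below computes Chou's pseudo amino acid composition (PseAAC, listed in the keywords) in its parallel-correlation (Type-1) form: 20 normalized amino acid frequencies followed by λ sequence-order correlation factors, all divided by a common normalizer so that the vector sums to 1. It is a minimal sketch under stated assumptions, not the configuration of any reviewed method: it uses a single property (hydrophobicity) instead of the three (hydrophobicity, hydrophilicity, side-chain mass) of Chou's original formulation, the values in `HYDROPHOBICITY` are illustrative and should be checked against a curated source such as AAindex, and the defaults `lam=5` and `w=0.05` are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative hydrophobicity scale (assumption: verify against a curated
# source such as AAindex before real use).
HYDROPHOBICITY = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def standardize(prop):
    """Rescale a property table to zero mean and unit population std over the 20 residues."""
    values = [prop[aa] for aa in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (prop[aa] - mean) / std for aa in AMINO_ACIDS}


def pseaac(seq, properties, lam=5, w=0.05):
    """Type-1 PseAAC vector of length 20 + lam (lam and w defaults are assumptions)."""
    # Keep only the 20 standard residues.
    seq = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in properties]
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]
    # theta_j: mean squared property difference between residues j positions apart,
    # averaged over all supplied properties.
    thetas = []
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(len(seq) - j):
            a, b = seq[i], seq[i + j]
            total += sum((p[b] - p[a]) ** 2 for p in props) / len(props)
        thetas.append(total / (len(seq) - j))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Additional properties can be passed as extra dictionaries in `properties`; each is standardized over the 20 residues before the correlation factors are averaged.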
Conclusion: In this review, we analyzed several popular computational methods for predicting DNA-binding proteins. From these methods, we highlighted many features needed to build an accurate DNA-binding protein classifier, which can also help researchers build more useful computational tools. Several current machine learning methods already perform well at predicting DNA-binding proteins, and their performance can be further improved by using different kinds of features and classifiers.
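Performance on benchmark datasets is usually summarized with accuracy (ACC), sensitivity (Sn), specificity (Sp) and the Matthews correlation coefficient (MCC). The helper below is a minimal sketch of those standard definitions for a binary task; the label convention (1 = DNA-binding, 0 = non-binding) and the choice to return 0.0 when a denominator vanishes are assumptions.

```python
import math


def binary_metrics(y_true, y_pred):
    """ACC, Sn, Sp and MCC for labels 1 (DNA-binding) and 0 (non-binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    # Guard against empty classes: report 0.0 instead of dividing by zero.
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}
```

For example, `binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])` gives ACC = Sn = Sp = 2/3 and MCC = 1/3. MCC is often preferred over ACC when the positive and negative sets are unbalanced.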
Keywords: DNA-binding protein, machine learning, feature extraction, PseAAC, DWT, benchmark dataset.
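The discrete wavelet transform (DWT), also listed in the keywords, can be applied to a numeric signal derived from a protein, such as a physicochemical property scale mapped along the sequence, to summarize it at several resolutions. The sketch below uses the Haar wavelet, chosen here only for simplicity (an assumption; the reviewed methods may use other wavelets and decomposition depths), and turns each decomposition level into four summary statistics so that proteins of different lengths yield fixed-length feature vectors. How the sequence is encoded as a signal is left to the caller.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; an odd trailing sample is dropped."""
    n = len(signal) - len(signal) % 2
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
    return approx, detail


def _summary(coeffs):
    """Mean, population std, min and max of a list of coefficients."""
    mean = sum(coeffs) / len(coeffs)
    std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / len(coeffs))
    return [mean, std, min(coeffs), max(coeffs)]


def dwt_features(signal, levels=3):
    """Summaries of the detail coefficients at each level plus the final approximation."""
    feats = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            raise ValueError("signal too short for the requested number of levels")
        approx, detail = haar_dwt(approx)
        feats.extend(_summary(detail))
    feats.extend(_summary(approx))
    return feats  # length 4 * (levels + 1), independent of the signal length
```

Because each level halves the signal, `dwt_features` needs `len(signal) >= 2 ** levels` and raises `ValueError` otherwise.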
Current Proteomics (Bentham Science Publishers)
DOI: https://dx.doi.org/10.2174/1570164616666190722141129
Print ISSN: 1570-1646; Online ISSN: 1875-6247
