The Applications of Clustering Methods in Predicting Protein Functions

Author(s): Weiyang Chen*, Weiwei Li, Guohua Huang*, Matthew Flavel

Journal Name: Current Proteomics

Volume 16, Issue 5, 2019

Background: Understanding protein function is essential to the study of biological processes. However, predicting protein function remains a difficult task in bioinformatics, which has led many researchers to develop computational methods to address it.

Objective: In this review, we introduce recently developed computational methods for protein function prediction and assess their validity. We then describe how clustering methods are applied to predicting protein functions.

Keywords: Clustering, protein function prediction, protein-protein interaction, protein complexes, computational methods, topology.

Article Details

Year: 2019
Page: [354 - 358]
Pages: 5
DOI: 10.2174/1570164616666181212114612