Identification of Essential Proteins in Yeast Using Mean Weighted Average and Recursive Feature Elimination

Author(s): Sivagnanam Rajamanickam Mani Sekhar*, Siddesh Gaddadevara Matt, Sunilkumar S. Manvi, Srinivasa Krishnarajanagar Gopalalyengar

Journal Name: Recent Patents on Computer Science
Continued as Recent Advances in Computer Science and Communications

Volume 12, Issue 1, 2019

Background: Essential proteins are important for drug design, cell development, and the survival of living organisms. Different methods have been developed to predict essential proteins using topological and biological features.

Objective: Predicting essential proteins effectively and in a timely manner remains a challenging task, because the availability of protein-protein interaction data depends on the correctness of the network.

Methods: The proposed solution uses two approaches, Mean Weighted Average and Recursive Feature Elimination, to predict essential proteins and compares them to select the better one. In Mean Weighted Average, data from consecutive slots are aggregated to obtain the nearest value, which is taken as the prescription of the best proteins for the slot, whereas in Recursive Feature Elimination the whole data set is split into different slots and the essential proteins for each slot are determined.
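
The Recursive Feature Elimination idea can be illustrated with a minimal sketch for a single slot. The abstract does not specify the estimator, the features, or how slots are formed, so everything below is an assumption: a small logistic regression fitted by gradient descent stands in as the ranking model, the feature columns and labels are toy values, and the names `fit_logistic` and `rfe` are hypothetical.

```python
import math


def fit_logistic(X, y, epochs=300, lr=0.5):
    """Fit a logistic regression by batch gradient descent; return feature weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # prediction minus target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def rfe(X, y, n_keep):
    """Refit on the surviving features and drop the one with the smallest
    absolute weight, repeating until n_keep features remain."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        w = fit_logistic(sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active


# Toy data (assumed, not from the paper): one row per protein.
# Column 0 tracks the label, column 1 carries no signal, column 2 is weak.
X = [
    [1.0, 0.0, 0.2],
    [0.9, 0.0, 0.8],
    [0.8, 0.0, 0.5],
    [0.1, 0.0, 0.4],
    [0.2, 0.0, 0.9],
    [0.0, 0.0, 0.3],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = essential, 0 = non-essential (illustrative labels)

selected = rfe(X, y, n_keep=1)
print(selected)
```

The returned list holds the indices of the surviving feature columns. In a slot-wise setting, one would call `rfe` once per slot and then rank proteins with the retained features; how the paper does this is not stated in the abstract.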

Results: The results show that the accuracy of Recursive Feature Elimination is at least nine percent higher than that of Mean Weighted Average and Betweenness centrality.
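
For readers unfamiliar with the Betweenness centrality baseline, the following is a minimal sketch of Brandes' algorithm for an unweighted, undirected interaction network. The node names and the toy network are hypothetical, and the abstract does not say which implementation or normalisation the authors used; the function name `betweenness` is likewise an assumption.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised betweenness centrality for an undirected, unweighted
    graph given as {node: [neighbours]} (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}


# Hypothetical toy network: one hub protein interacting with three others.
network = {
    "HUB": ["P1", "P2", "P3"],
    "P1": ["HUB"],
    "P2": ["HUB"],
    "P3": ["HUB"],
}
scores = betweenness(network)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy network every shortest path between two peripheral proteins passes through the hub, so the hub receives the highest score; a centrality-based predictor would rank it as the most likely essential protein.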

Conclusion: Essential proteins are encoded by genes that are essential for the survival of living organisms, and they are important for drug design. Different approaches have been proposed to predict essential proteins using either experimental or computational methods. The experimental results show that the proposed work performs better than the other approaches.

Keywords: Proteins, essential proteins, weighted average, recursive feature elimination, machine learning, yeast, PPI, protein-protein interaction.

Article Details

Year: 2019
Published on: 10 January, 2019
Page: [5 - 10]
Pages: 6
DOI: 10.2174/2213275911666180918155521
