Using a Novel AdaBoost Algorithm and Chou's Pseudo Amino Acid Composition for Predicting Protein Subcellular Localization

Jie Lin; Yan Wang

Abstract

An important characteristic of a protein is its location, or compartment, within the cell, because a protein must reside in its proper location to perform its biological function. Predicting protein subcellular location is therefore an important and challenging task in current molecular and cellular biology. In this paper, a new computational method for identifying protein subcellular location is developed, based on the AdaBoost.ME algorithm and Chou's PseAAC (pseudo amino acid composition). AdaBoost.ME is an improved version of the AdaBoost algorithm that directly extends the original algorithm to multi-class cases without reducing them to multiple two-class problems. Some previous studies represented protein samples by the conventional amino acid composition. To take sequence-order effects into account, this study instead represents protein samples by Chou's PseAAC. To demonstrate that AdaBoost.ME is a robust and efficient model for predicting protein subcellular location, the same protein dataset used by Cedano et al. (Journal of Molecular Biology, 1997, 266: 594-600) is adopted. The computed results show that the accuracy achieved by our method is higher than that of the methods developed by previous investigators.

Keywords: AdaBoost, AdaBoost.ME, Multi-class, Subcellular Localization, PseAAC, SWISS-PROT, AAC, GO (gene ontology), algorithm processes, MCC, jackknife cross-validation, ProtLoc, iLoc-Euk, prokaryotic and eukaryotic cells, hydrophobicity
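
To make the contrast between the conventional amino acid composition and PseAAC concrete, the following is a minimal sketch of a PseAAC-style feature vector, not the implementation used in this paper. It appends `lam` sequence-order correlation factors to the 20 amino acid frequencies. The function name `pseaac`, the parameters `lam` and `weight` and their default values, and the use of a single Kyte-Doolittle hydropathy scale are all assumptions made for illustration; Chou's formulation averages the correlation over several physicochemical properties, and the abstract does not state which parameter values were used.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values: an illustrative stand-in for the
# physicochemical properties used in Chou's PseAAC (assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale property values to zero mean and unit standard deviation."""
    mu = mean(scale.values())
    sigma = pstdev(scale.values())
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def pseaac(sequence, lam=3, weight=0.05, scale=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional PseAAC-style vector (hypothetical helper).

    The first 20 components are weighted amino acid frequencies; the last
    lam components are sequence-order correlation factors for lags 1..lam.
    """
    # Drop any character that is not one of the 20 standard residues.
    seq = [aa for aa in sequence.upper() if aa in scale]
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")

    h = standardize(scale)

    # Conventional amino acid composition (order-independent part).
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]

    # theta_k: mean squared property difference between residues k apart.
    thetas = []
    for k in range(1, lam + 1):
        total = sum((h[seq[i + k]] - h[seq[i]]) ** 2 for i in range(n - k))
        thetas.append(total / (n - k))

    # Joint normalization so that all 20 + lam components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

For example, `pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)` returns 23 components that sum to 1. Setting `weight=0` makes the last `lam` components zero and recovers the plain amino acid composition, which is why PseAAC can be read as an extension of it rather than a replacement.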