Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Suitability of Sequence-Based Feature Vector for Classification Algorithm Improves Accuracy of Human Protein-Protein Interaction Prediction: A Red Blood Cell Case Study

Author(s): Afsaneh Maali, Mahmood A. Mahdavi and Reza Gheshlaghi

Volume 11, Issue 2, 2016

Pages: 291-300 (10 pages)

DOI: 10.2174/1574893610666151026215233

Abstract

Supervised learning algorithms are used to classify human protein-protein interaction (PPI) data and to consolidate existing data. These algorithms require a feature vector to build a prediction model, and such vectors can be constructed from various kinds of input data. A feature vector suited to the classification algorithm yields a more predictive model and higher prediction accuracy, even with low-dimensional vectors. To investigate which combinations of feature sets and algorithms work well, three feature vectors (AA Frequency, AA Graphical Parameter, and AA Triplex) were constructed solely from the primary structure of human red blood cell proteins and then applied to five different classification methods. The support vector machine (SVM) produced the highest accuracy, 84.65%, with the AA Graphical Parameter feature set, whereas it reached 80.65% with the AA Triplex feature set. Random forest (RF) achieved a high average accuracy of 83.69% across all three feature sets. Among the Bayesian classifiers, tree-augmented naive Bayes (TAN) outperformed naive Bayes (NB) with all three feature sets. The artificial neural network (ANN) classifier showed the lowest average accuracy, 76%; however, with the AA Triplex feature set its performance was comparable with that of TAN, at an accuracy of 77.90%. These results demonstrate that selecting a feature set appropriate to the classification algorithm yields higher accuracy, with the added advantage of using low-dimensional feature vectors constructed from simpler data.
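As a rough illustration of how sequence-only feature vectors for a protein pair might be built, the sketch below encodes two of the feature sets under stated assumptions. "AA Frequency" is taken to mean the 20-dimensional amino acid composition, and "AA Triplex" is modeled on the conjoint triad encoding (7 physicochemical residue classes, 7^3 = 343 triplet counts). These definitions, the hypothetical function names `aa_frequency`, `aa_triad`, and `pair_vector`, and the concatenation of the two proteins' vectors are illustrative choices, not the authors' method. The AA Graphical Parameter set is not sketched, since its construction is not described here.

```python
# Standard 20 amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: 7-class reduced alphabet borrowed from the conjoint triad
# method; the paper's "AA Triplex" feature set may be defined differently.
TRIAD_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: i for i, group in enumerate(TRIAD_CLASSES) for aa in group}


def aa_frequency(seq: str) -> list[float]:
    """Hypothetical 'AA Frequency': 20-dim relative amino acid composition."""
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def aa_triad(seq: str) -> list[float]:
    """Hypothetical 'AA Triplex': relative counts of 343 class triplets."""
    counts = [0] * 343
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        # Skip windows containing non-standard residues (e.g. 'X').
        if all(aa in CLASS_OF for aa in window):
            a, b, c = (CLASS_OF[aa] for aa in window)
            counts[a * 49 + b * 7 + c] += 1  # base-7 index of the triplet
    total = sum(counts) or 1
    return [x / total for x in counts]


def pair_vector(seq_a: str, seq_b: str, encoder) -> list[float]:
    """Feature vector for a protein pair: concatenate per-protein vectors."""
    return encoder(seq_a) + encoder(seq_b)


if __name__ == "__main__":
    print(len(pair_vector("MKTAYIAKQR", "GAVLIPFMW", aa_frequency)))  # 40
    print(len(pair_vector("MKTAYIAKQR", "GAVLIPFMW", aa_triad)))      # 686
```

Concatenation makes the pair vector order-dependent (A, B differs from B, A); summing or otherwise symmetrizing the two per-protein vectors is a common alternative. The resulting fixed-length vectors can be passed to any standard classifier, such as an SVM or a random forest.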

Keywords: classification algorithms; protein-protein interaction prediction; sequence-based feature vectors; machine learning; human protein-protein interaction; accuracy of interaction prediction.

© 2024 Bentham Science Publishers