Classification of Small GTPases with Hybrid Protein Features and Advanced Machine Learning Techniques

Author(s): Zhijun Liao, Shixiang Wan, Yan He, Quan Zou*

Journal Name: Current Bioinformatics

Volume 13, Issue 5, 2018

Objective: Small GTPases are important molecular switches that play a key role in numerous signal transduction pathways. The aim of this study is to explore features for their binary classification with machine learning algorithms.

Methods: Small GTPase and non-small GTPase sequences were clustered separately to remove similar entries and then divided into 10 datasets, each containing equal numbers of small GTPases and non-small GTPases. Three feature vectors were extracted from these datasets: 188-dimensional (188D), 400D, and motif-based (608D) features. Classification was then performed with scikit-learn-based software that integrates 12 classifiers, and conserved motifs were finally discovered with the MEME suite.
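
The dataset construction and one of the feature vectors can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how the 10 balanced datasets were drawn, so the sketch assumes every dataset reuses all positives and takes a disjoint random sample of negatives; it also assumes the 400D vector is the dipeptide composition (20 x 20 amino acid pairs), which the abstract does not confirm. The function names `make_balanced_datasets` and `dipeptide_composition` are hypothetical.

```python
import random
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered amino acid pairs (assumed meaning of "400D").
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for one sequence."""
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(sequence) - 1):
        idx = DIPEPTIDE_INDEX.get(sequence[i:i + 2])
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def make_balanced_datasets(positives, negatives, n_datasets=10, seed=0):
    """Build n_datasets (positives, negatives) pairs of equal size.

    Assumption: all positives are reused in every dataset, while the
    negatives are shuffled and cut into disjoint chunks of the same size.
    """
    size = len(positives)
    if len(negatives) < size * n_datasets:
        raise ValueError("not enough negatives for disjoint balanced subsets")
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    datasets = []
    for k in range(n_datasets):
        chunk = shuffled[k * size:(k + 1) * size]
        datasets.append((list(positives), chunk))
    return datasets
```

Each resulting dataset could then be converted to feature vectors with `dipeptide_composition` and passed to any classifier; the fixed `seed` only makes the sampling reproducible.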

Results: The three best-performing classifiers were logistic regression (LR), gradient boosting decision tree (GBDT), and bagging for the 400D features; LibSVM, GBDT, and bagging for the 188D features; and GBDT, bagging, and AdaBoost for the 608D features. Judged across the commonly used evaluation indices as a whole, the top four classifiers were GBDT, bagging, LR, and AdaBoost. GBDT obtained the highest area under the curve (AUC) value, 88.61%. The 400D features performed better than the 188D and 608D features. Five conserved G-box motifs were discovered in the sequences of human small GTPases.
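
For readers unfamiliar with the AUC index reported above, the following minimal sketch computes it from scratch as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; the study itself presumably relied on scikit-learn's evaluation utilities.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 1 (positive) or 0 (negative)
    scores: classifier scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable on large datasets.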

Conclusion: This study is the first to report that the GBDT algorithm performs best for small GTPase classification.

Keywords: Small GTPase, binary-class classification, feature vector, gradient boosting decision tree (GBDT), scikit-learn method, motif.

Article Details

Year: 2018
Published on: 21 November, 2017
Page: [492 - 500]
Pages: 9
DOI: 10.2174/1574893612666171121162552
