Citrullination Site Prediction by Incorporating Sequence Coupled Effects into PseAAC and Resolving Data Imbalance Issue

Author(s): Md. Al Mehedi Hasan, Md Khaled Ben Islam, Julia Rahman*, Shamim Ahmad.

Journal Name: Current Bioinformatics

Volume 15, Issue 3, 2020


Abstract:

Background: Post-translational modification is a biomolecular mechanism in living organisms that introduces functional diversity into proteins and regulates cellular processes. The conversion of an arginine residue to citrulline in a protein is one such modification.

Objective: Our objective is to identify citrullinated arginine residue sites quickly and accurately.

Method: In this study, a novel computational tool, predCitru-Site, has been developed to predict citrullination sites. The method incorporates the sequence-coupling effect of the amino acids surrounding arginine residues and optimizes the skewed citrullination training dataset to improve prediction quality. For comparability with existing tools, the performance of predCitru-Site was measured as the average of 5 complete runs of the 10-fold cross-validation test.
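The evaluation protocol described in the Method (5 complete runs of 10-fold cross-validation, averaged) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `evaluate` callback, and the random seed are hypothetical, and the splits are plain random partitions rather than class-stratified ones.

```python
import random
from statistics import mean


def repeated_kfold_indices(n_samples, n_folds=10, n_runs=5, seed=0):
    """Yield (run, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for run in range(n_runs):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for each complete run
        folds = [order[i::n_folds] for i in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            # Training set is every sample that is not in the held-out fold.
            train_idx = [i for j, f in enumerate(folds) if j != fold for i in f]
            yield run, fold, train_idx, test_idx


def averaged_cv_score(n_samples, evaluate, n_folds=10, n_runs=5, seed=0):
    """Average a per-fold score over all runs.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains a
    model on the training indices and returns a score on the test indices.
    """
    scores = [
        evaluate(train_idx, test_idx)
        for _, _, train_idx, test_idx in repeated_kfold_indices(
            n_samples, n_folds, n_runs, seed
        )
    ]
    return mean(scores)
```

In practice `evaluate` would wrap classifier training and scoring; averaging over several complete runs reduces the dependence of the reported figure on any single random partition.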

Results and Conclusion: predCitru-Site achieved 97.6% sensitivity, 98.9% specificity, and an overall accuracy of 98.5%, with a Matthews correlation coefficient of 0.967 and an area under the receiver operating characteristic curve (AUC) of 0.997. On the same benchmark dataset, predCitru-Site significantly outperforms existing tools. It also shows significant improvement on independent tests in all performance metrics (around 50% higher in AUC). These results suggest that our method is promising and can be used as a complementary technique for fast exploration of citrullination at arginine residues. A user-friendly web server has also been deployed at http://research.ru.ac.bd/predCitru-Site/ for the convenience of experimental scientists.

Keywords: Citrullination Sites Prediction, Sequence-coupling Model, General PseAAC, Data Imbalance Issue, Support Vector Machine.



Article Details

Page: [235 - 245]
Pages: 11
DOI: 10.2174/1574893614666191202152328
