Title:DBP-PSSM: Combination of Evolutionary Profiles with the XGBoost Algorithm to Improve the Identification of DNA-binding Proteins
VOLUME: 23
Author(s):Yanping Zhang*, Pengcheng Chen, Ya Gao, Jianwei Ni and Xiaosheng Wang
Affiliation:School of Mathematics and Physics Science and Engineering, Hebei University of Engineering, Handan 056038
Keywords:DNA-binding proteins, Local_DPP, PSSM400, Sliding window and Smoothing window, mRMR, XGBoost.
Abstract:Aim and Objective: Given the rapidly increasing amount of molecular biology data available, computational
methods of low complexity are necessary to infer protein structure, function, and evolution.
Method: In this work, we propose a novel method, FermatS, which extracts feature information from protein sequences
based on global position information from the curve and a local position representation from normalized moments of
inertia. Furthermore, we use the features generated by FermatS to analyze the similarity/dissimilarity of nine
ND5 proteins and to establish a prediction model for DNA-binding proteins based on logistic regression with 5-fold cross-validation.
Results: In the similarity/dissimilarity analysis of nine ND5 proteins, the results are consistent with evolutionary theory.
Moreover, this method can effectively predict the DNA-binding proteins in realistic situations.
Conclusion: The findings demonstrate that the proposed method is effective for comparing, recognizing and predicting protein
sequences. The main code and datasets can be downloaded from https://github.com/GaoYa1122/FermatS.
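
The evaluation protocol named in the Method paragraph (logistic regression with 5-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional toy features stand in for FermatS features, and all function names, the learning rate, and the epoch count are hypothetical choices made only for illustration. The folds here are shuffled but not stratified.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(model, xi):
    # Classify as positive when the predicted probability is at least 0.5.
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0


def k_fold_indices(n, k, seed=0):
    # Shuffle the sample indices once, then deal them into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=5, seed=0):
    """Return the held-out accuracy of each of the k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_logistic([X[i] for i in train_idx],
                             [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


def make_toy_data(n_per_class=50, seed=1):
    # Hypothetical stand-in for extracted sequence features: two Gaussian blobs.
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
fold_accuracies = cross_validate(X, y, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

Each sample is held out exactly once, so `mean_accuracy` estimates performance on data not seen during fitting; with real feature vectors, `make_toy_data` would be replaced by the feature extraction step.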