Background: Detecting DNA-binding proteins (DBPs) with biological and chemical
methods is time-consuming and expensive.
Objective: In recent years, the rise of computational biology methods based on Machine Learning
(ML) has greatly improved the detection efficiency of DBPs.
Methods: In this study, the Multiple Kernel-based Fuzzy SVM Model with Support Vector Data
Description (MK-FSVM-SVDD) is proposed to predict DBPs. Firstly, six features are extracted
from the protein sequence. Secondly, multiple kernels are constructed via these sequence features.
Then, multiple kernels are integrated by Centered Kernel Alignment-based Multiple Kernel
Learning (CKA-MKL). Next, fuzzy membership scores of training samples are calculated with
Support Vector Data Description (SVDD). Finally, a fuzzy SVM (FSVM) is trained and used to detect new DBPs.
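
The kernel-combination and fuzzy-membership steps can be illustrated with a minimal sketch. It is not the authors' implementation and rests on two simplifying assumptions: each kernel is weighted by its normalised centered alignment with the ideal kernel yy^T (a heuristic stand-in for the CKA-MKL optimisation), and membership is derived from the kernel-space distance to the class centre (a simplified stand-in for the SVDD hypersphere). All function names, the `delta` parameter, and the toy data are hypothetical.

```python
import math

def center(K):
    """Center a square kernel matrix: Kc = H K H, with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]

def frob(A, B):
    """Frobenius inner product of two matrices of equal shape."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def alignment(K, y):
    """Centered kernel alignment between K and the ideal kernel y y^T.
    Assumes both classes are present and K is not constant (non-zero norms)."""
    Kc = center(K)
    Yc = center([[yi * yj for yj in y] for yi in y])
    return frob(Kc, Yc) / math.sqrt(frob(Kc, Kc) * frob(Yc, Yc))

def combine(kernels, y):
    """Assumed heuristic: weight kernels by non-negative alignment, sum to 1."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    s = sum(scores)
    if s == 0:  # fall back to uniform weights if no kernel aligns
        weights = [1.0 / len(kernels)] * len(kernels)
    else:
        weights = [a / s for a in scores]
    n = len(y)
    K = [[sum(w * Km[i][j] for w, Km in zip(weights, kernels))
          for j in range(n)] for i in range(n)]
    return weights, K

def fuzzy_membership(K, idx, delta=1e-3):
    """Membership of the samples in idx (one class), decreasing with the
    kernel-space distance to the class centre (simplified SVDD proxy)."""
    n = len(idx)
    mean_all = sum(K[a][b] for a in idx for b in idx) / (n * n)
    dists = []
    for i in idx:
        d2 = K[i][i] - 2 * sum(K[i][j] for j in idx) / n + mean_all
        dists.append(math.sqrt(max(d2, 0.0)))
    radius = max(dists)
    return [1 - d / (radius + delta) for d in dists]

# Toy data (hypothetical): four samples, two classes, two candidate kernels.
y = [1, 1, -1, -1]
K_block = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
K_ident = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
weights, K_mix = combine([K_block, K_ident], y)

# Linear kernel on 1-D points with one outlier, all from a single class.
x = [0.0, 0.1, 5.0]
K_lin = [[a * b for b in x] for a in x]
memberships = fuzzy_membership(K_lin, [0, 1, 2])
```

In this toy setting the block-structured kernel matches the labels exactly and therefore receives the larger weight, while the outlier at 5.0 receives the smallest membership. In a full pipeline, such memberships would typically scale each training sample's penalty in the SVM objective; in scikit-learn, for example, this could be approximated by passing them as `sample_weight` to `SVC.fit` with `kernel="precomputed"`.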
Results: Our model is evaluated on several benchmark datasets. Compared with other methods,
MK-FSVM-SVDD achieves the best Matthews Correlation Coefficient (MCC) on PDB186 (0.7250) and
Conclusion: MK-FSVM-SVDD is more suitable than a standard SVM as the
classifier for identifying DNA-binding proteins.