Title:MK-FSVM-SVDD: A Multiple Kernel-based Fuzzy SVM Model for Predicting DNA-binding Proteins via Support Vector Data Description
VOLUME: 15
Author(s):Yi Zou, Hongjie Wu, Xiaoyi Guo, Li Peng, Yijie Ding*, Jijun Tang and Fei Guo
Affiliation:School of Internet of Things Engineering, Jiangnan University, Wuxi, 214122, School of Electronic and Information Engineering, Suzhou University of Science and Technology, No. 1 Kerui Road, 215009, Suzhou, Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, 214000, Wuxi, School of Internet of Things Engineering, Jiangnan University, Wuxi, 214122, School of Electronic and Information Engineering, Suzhou University of Science and Technology, No. 1 Kerui Road, 215009, Suzhou, School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, 300350, Tianjin, School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, 300350, Tianjin
Keywords:DNA-binding proteins, Fuzzy support vector machine, Multiple kernel learning, Support vector data description, Membership function.
Abstract:Background: Detecting DNA-binding proteins (DBPs) with biological and chemical methods is time-consuming and
expensive.
Objective: In recent years, the rise of computational biology methods based on Machine Learning (ML) has greatly improved the detection
efficiency of DBPs.
Method: In this study, a Multiple Kernel-based Fuzzy SVM Model with Support Vector Data Description (MK-FSVM-SVDD) is proposed to
predict DBPs. First, six features are extracted from the protein sequences. Second, multiple kernels are constructed from these sequence features.
Then, the kernels are integrated by Centered Kernel Alignment-based Multiple Kernel Learning (CKA-MKL). Next, fuzzy membership
scores of the training samples are calculated with Support Vector Data Description (SVDD). Finally, a Fuzzy SVM (FSVM) is trained and employed to detect new DBPs.
Results: Our model is tested on several benchmark datasets. Compared with other methods, MK-FSVM-SVDD achieves the best Matthews
Correlation Coefficient (MCC) on PDB186 (0.7250) and PDB2272 (0.5476).
Conclusion: We conclude that MK-FSVM-SVDD is more suitable than the standard SVM as a classifier for DNA-binding protein
identification.
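
Illustrative sketch: the pipeline summarized in the Method paragraph (several feature-specific kernels, alignment-based kernel weighting, per-sample fuzzy membership) can be outlined roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The data are random toy feature views; the kernel weights come from a simple heuristic (each kernel's centered alignment with the ideal kernel yy^T, normalized to sum to one) rather than the optimization a full CKA-MKL would solve; and the membership score uses the distance to the class mean in kernel space as a simplified stand-in for a fitted SVDD hypersphere. All function names and parameters (rbf_kernel, cka_weights, center_distance_membership, floor) are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K1, K2):
    """Centered kernel alignment between two kernel matrices."""
    K1c, K2c = center(K1), center(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def cka_weights(kernels, y):
    """Heuristic weights (assumption): alignment of each kernel with yy^T."""
    target = np.outer(y, y)
    a = np.array([max(alignment(K, target), 0.0) for K in kernels])
    if a.sum() == 0.0:
        return np.full(len(kernels), 1.0 / len(kernels))
    return a / a.sum()

def center_distance_membership(K, y, floor=0.1):
    """Fuzzy membership from the kernel-space distance to the class mean.

    Simplified stand-in for SVDD: a real SVDD would solve a quadratic
    program for the sphere center instead of using the plain class mean.
    """
    s = np.empty(len(y), dtype=float)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        Kc = K[np.ix_(idx, idx)]
        # ||phi(x_i) - mean||^2 = K_ii - (2/m) sum_j K_ij + (1/m^2) sum_jk K_jk
        d2 = np.diag(Kc) - 2.0 * Kc.mean(axis=1) + Kc.mean()
        d = np.sqrt(np.maximum(d2, 0.0))
        # Samples far from the class center get a lower membership score.
        s[idx] = 1.0 - (1.0 - floor) * d / (d.max() + 1e-12)
    return s

# Toy data (assumption): six random feature views of 40 labelled samples.
rng = np.random.default_rng(0)
y = np.array([1] * 20 + [-1] * 20)
views = [rng.normal(size=(40, 5)) + 0.2 * (k + 1) * y[:, None] for k in range(6)]

kernels = [rbf_kernel(X, gamma=1.0 / X.shape[1]) for X in views]
weights = cka_weights(kernels, y)
K = sum(w * Kk for w, Kk in zip(weights, kernels))  # combined kernel
membership = center_distance_membership(K, y)
```

In a complete pipeline, the combined kernel K and the membership scores would then be handed to an SVM trainer that supports a precomputed kernel and per-sample weights, so that low-membership samples (likely outliers or noise) contribute less to the decision boundary.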