Prediction and Identification of Krüppel-like Transcription Factors by Machine Learning Method (E-pub Ahead of Print)
The Krüppel-like factors (KLFs) are a family of zinc finger (ZF) motif-containing transcription factors with 18 members in the human genome. KLFs have diverse physiological functions and are associated with numerous cancers and other diseases. Here we perform a binary classification of KLFs versus non-KLFs and analyze the conserved motifs of human KLFs. We search for and cluster protein sequences and separate them into a training dataset and a test dataset (the latter containing only negative samples). After extracting 188-dimensional (188D) feature vectors, we perform classification with four classifiers: gradient boosting decision tree (GBDT), a library for support vector machines (libSVM), random forest (RF), and k-nearest neighbors (k-NN), using 10-fold cross-validation. For the human KLFs, we further explore their evolutionary relationships and motif distribution, and finally analyze the conserved amino acid residues of the three ZFs. The results show that the classifier models were well constructed on the training dataset: the highest specificity, 99.83%, was achieved with libSVM, and the correct classification rates on the test dataset exceeded 70%. The 18 human KLFs can be further divided into 7 groups, and their ZF domains are located at the C-terminus. Many conserved sequence features, including the Cys and His residues and their span and interval, were consistent across the three ZF domains. In conclusion, we have built two-class classification models for KLF prediction using novel machine learning methods.
Keywords: Krüppel-like factor; binary-class classification; phylogenetic analysis; motif; a library for support vector machine; machine learning method
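The abstract does not spell out the 188D features. The sketch below assumes the commonly used 188D layout: 20 amino acid composition values plus a 21-value composition/transition/distribution (CTD) block for each of eight physicochemical properties (20 + 8 × 21 = 188). Only one grouping is supplied here, a three-group hydrophobicity partition, and the paper's exact groupings and distribution convention may differ. All names (`HYDROPHOBICITY_GROUPS`, `ctd`, `feature_vector`, `specificity`) are hypothetical. The `specificity` helper also shows why the test-set result is a specificity: when a test set contains only negative samples, its correct classification rate is TN / (TN + FP).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed three-group partition for ONE property (hydrophobicity: polar,
# neutral, hydrophobic). The 188D vector needs eight such properties; the
# groupings used in the paper are not given and may differ.
HYDROPHOBICITY_GROUPS = ("RKEDQN", "GASTPHY", "CLVIMFW")


def aa_composition(seq):
    """20-D amino acid composition over the standard residues only."""
    counts = Counter(seq)
    n = sum(counts[aa] for aa in AMINO_ACIDS)
    return [counts[aa] / n for aa in AMINO_ACIDS]


def ctd(seq, groups):
    """21-D composition/transition/distribution block for one property."""
    # Map each residue to its group index (0, 1, 2); skip non-standard residues.
    lookup = {aa: g for g, members in enumerate(groups) for aa in members}
    labels = [lookup[aa] for aa in seq if aa in lookup]
    n = len(labels)

    # Composition (3 values): fraction of residues in each group.
    composition = [labels.count(g) / n for g in range(3)]

    # Transition (3 values): fraction of adjacent pairs that switch between
    # groups 1<->2, 1<->3 and 2<->3 (direction ignored).
    pairs = list(zip(labels, labels[1:]))
    transition = [
        sum(1 for a, b in pairs if {a, b} == {x, y}) / max(len(pairs), 1)
        for x, y in ((0, 1), (0, 2), (1, 2))
    ]

    # Distribution (15 values): for each group, the position of its first,
    # 25%, 50%, 75% and last residue as a percentage of sequence length
    # (one common convention; absent groups contribute zeros).
    distribution = []
    for g in range(3):
        positions = [i + 1 for i, lab in enumerate(labels) if lab == g]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                distribution.append(0.0)
                continue
            k = max(1, math.ceil(frac * len(positions)))
            distribution.append(100.0 * positions[k - 1] / n)

    return composition + transition + distribution


def feature_vector(seq, property_groups):
    """Composition (20) followed by one 21-D CTD block per property."""
    vec = aa_composition(seq)
    for groups in property_groups:
        vec += ctd(seq, groups)
    return vec


def specificity(y_true, y_pred):
    """TN / (TN + FP), with label 0 meaning non-KLF (negative)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp)


if __name__ == "__main__":
    v = ctd("RGC", HYDROPHOBICITY_GROUPS)
    print(len(v), v[3:6])
```

With eight real property groupings, `feature_vector` returns 188 values per protein. Those vectors would then be passed to each classifier under 10-fold cross-validation, for example with a stratified k-fold splitter, so that every fold keeps the KLF/non-KLF ratio.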