Features Identification for Phenotypic Classification Based on Genes and Gene Pairs

Author(s): Yansen Su, Yanxin Li*, Zheng Zhang, Linqiang Pan*.

Journal Name: Current Bioinformatics

Volume 13, Issue 5, 2018

Abstract:

Background: The classification of phenotypes from microarray data has drawn much attention in the last few years. Known methods have mainly focused on selecting or constructing features based on either genes or gene pairs from continuous-value gene expression data. However, few studies have attempted to identify useful features based on both genes and gene pairs from binary-value gene expression data.

Objective: In this work, we propose a new algorithm, called FSGGP, that selects both feature genes and feature gene pairs from binary-value gene expression data to improve two-phenotype classification.

Method: We calculated the uncertainty coefficient, which measures how well a phenotype is described by a gene or gene pair under a candidate relationship; the exact relationship between the gene or gene pair and the phenotype was then identified from the value of the uncertainty coefficient. Next, the closeness between genes or gene pairs and phenotypes was calculated, and the genes or gene pairs closely related to the phenotypes were selected. The redundancy among genes and gene pairs as features was measured by cross entropy on the binary data, and redundant feature genes or gene pairs were eliminated. Finally, the optimal feature sets were obtained by wrapper-based forward feature selection for three classical classifiers.
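The first step of the method can be illustrated with a minimal Python sketch, under stated assumptions: the uncertainty coefficient is taken in its standard information-theoretic form `U(P|X) = (H(P) - H(P|X)) / H(P)` over binary vectors, and the candidate relationships for a gene pair are assumed to be all 16 two-input Boolean functions. The helper names (`uncertainty_coefficient`, `best_pair_relationship`, `is_higher_logic_pair`), the truth-table encoding, and the `margin` threshold are hypothetical and not taken from FSGGP; the closeness, cross-entropy redundancy, and wrapper steps are not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def uncertainty_coefficient(phenotype, feature):
    """U(P | X) = (H(P) - H(P | X)) / H(P); 0 = uninformative, 1 = X determines P."""
    h_p = entropy(phenotype)
    if h_p == 0:
        return 0.0
    # H(P | X) = H(P, X) - H(X)
    h_cond = entropy(list(zip(phenotype, feature))) - entropy(feature)
    return (h_p - h_cond) / h_p


def boolean_function(index):
    """Two-input Boolean function whose truth table is the 4-bit integer `index`.

    Assumed encoding: the output for input (a, b) is bit number 2*a + b of `index`.
    """
    return lambda a, b: (index >> (2 * a + b)) & 1


def best_pair_relationship(phenotype, gene_a, gene_b):
    """Return (U, index) for the candidate function f maximizing U(P | f(a, b))."""
    best_u, best_idx = 0.0, None
    for idx in range(16):
        f = boolean_function(idx)
        combined = [f(a, b) for a, b in zip(gene_a, gene_b)]
        u = uncertainty_coefficient(phenotype, combined)
        if u > best_u:  # strict comparison: ties keep the first function found
            best_u, best_idx = u, idx
    return best_u, best_idx


def is_higher_logic_pair(phenotype, gene_a, gene_b, margin=0.0):
    """Hypothetical criterion: the pair explains P better than either gene alone."""
    u_pair, _ = best_pair_relationship(phenotype, gene_a, gene_b)
    u_single = max(uncertainty_coefficient(phenotype, gene_a),
                   uncertainty_coefficient(phenotype, gene_b))
    return u_pair > u_single + margin


if __name__ == "__main__":
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [0, 1, 0, 1, 0, 1, 0, 1]
    p = [x ^ y for x, y in zip(a, b)]  # toy phenotype: a XOR b
    print(uncertainty_coefficient(p, a))    # 0.0
    print(best_pair_relationship(p, a, b))  # (1.0, 6)
    print(is_higher_logic_pair(p, a, b))    # True
```

In the toy example, neither gene alone carries any information about the phenotype (U = 0), while the XOR function (truth-table index 6) describes it exactly (U = 1), so the pair is flagged. A function and its negation (here XOR and XNOR) score identically, so only the first one found is reported.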

Results: The algorithm was evaluated experimentally on four public datasets. The results showed that FSGGP outperformed four known feature selection algorithms based on either genes or gene pairs in terms of average classification error rate.

Conclusion: We developed an algorithm that selects both feature genes and feature gene pairs from binary-value gene expression data, where feature gene pairs are selected by identifying higher logical relationships between gene pairs and phenotypes. The comparison with four known feature selection algorithms suggests that feature selection based on both genes and gene pairs can outperform feature selection based on either genes or gene pairs alone, and that identifying higher logical relationships is an effective approach for selecting feature gene pairs.

Keywords: Classification, phenotype, gene, gene pair, microarray, binary-value gene expression data.

Article Details

VOLUME: 13
ISSUE: 5
Year: 2018
Pages: 468-478 (11 pages)
DOI: 10.2174/1574893612666171122151625