In recent years, bioinformatics research has increasingly focused on the problem of class imbalance. A
classification task is said to be class-imbalanced when the number of instances belonging to one or several classes
substantially exceeds that of the other classes. Classifiers trained on such data often perform poorly on the minority classes. This article
provides a review of the most widely used class imbalance learning methods and their applications in various
bioinformatic problems, including disease diagnosis based on gene expression data and protein mass spectrometry data,
translation initiation site recognition based on DNA sequences, protein function classification using amino acid
sequences, activity prediction of drug molecules, and recognition of precursor microRNAs (pre-miRNAs). This article
also summarizes the current challenges and possible future trends of class imbalance learning methods in bioinformatics.
Keywords: Activities prediction of drug molecules, bioinformatics, class imbalance, gene expression, protein function
classification, protein mass spectrometry, recognition of precursor microRNA, translation initiation site recognition.