
Current Metabolomics


ISSN (Print): 2213-235X
ISSN (Online): 2213-2368

Research Article

Significant Metabolites and Outlier-Robust Classifier Identification for Breast Cancer Prediction

Author(s): Nishith Kumar*, Md. Aminul Hoque, Md. Shahjaman, S.M. Shahinul Islam and Md. Nurul Haque Mollah

Volume 6, Issue 2, 2018

Page: [147 - 154] Pages: 8

DOI: 10.2174/2213235X06666180131155010


Abstract

Background: Metabolomics is a relatively new and dominant branch of bioinformatics. Metabolite expression levels control the phenotypic characteristics of an organism. Breast cancer is the leading type of cancer in women worldwide, accounting for 25% of all cases. In 2012, there were 1.68 million breast cancer cases and 522,000 deaths. Therefore, identifying metabolites significantly associated with breast cancer and correctly classifying breast cancer status are important tasks in metabolomics data analysis, both for drug discovery and for early prediction of disease status.

Objective: The main objective of this paper is to identify significant metabolites (p-value < 0.05) and the state-of-the-art classification technique for breast cancer prediction using a metabolomics dataset.

Methods: Although several techniques exist for identifying significant metabolites, we used Student's t-test and the Kruskal-Wallis test. To predict breast cancer status, we considered five modern classification techniques: (i) Naive Bayes (NB), (ii) Support Vector Machine (SVM), (iii) Linear Discriminant Analysis (LDA), (iv) k-nearest neighbors (kNN), and (v) Random Forest (RF). We measured the performance of these techniques using accuracy, sensitivity, specificity, the Receiver Operating Characteristic (ROC) curve, and the area under the ROC curve, among other measures.
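The performance measures named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the area under the ROC curve is computed through its rank-based (Mann-Whitney) interpretation rather than by tracing the curve.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def classification_measures(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive sample scores
    higher than a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cancer, 0 = control (not from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

measures = classification_measures(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is one reason both kinds of measure are usually reported together.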

Results: The performance measures showed that the random forest classifier produced higher accuracy, sensitivity, specificity, and area under the ROC curve than the other classification techniques for breast cancer prediction using the metabolomics dataset. The analysis also identified 24 significant (adjusted p-value < 0.05) metabolites influencing breast cancer.
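The abstract does not state how the p-values were adjusted. As one common possibility, the Python sketch below applies the Benjamini-Hochberg procedure to a list of raw p-values and selects those below 0.05. The choice of Benjamini-Hochberg, the function name, and the example p-values are assumptions for illustration only, not the authors' method or data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: this procedure is used purely as an example; the
    abstract does not say which adjustment the authors applied.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four metabolites (not from the paper).
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
```

In this toy example three metabolites have raw p-values below 0.05, but only one remains below 0.05 after adjustment, which shows why the raw and adjusted thresholds can give different counts.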

Conclusion: The experimental results indicate that 24 metabolites influence breast cancer and that, among the five classification techniques considered, random forest is the state-of-the-art, outlier-robust classifier for breast cancer prediction and metabolomics data analysis.

Keywords: Naive Bayes, support vector machine, linear discriminant analysis, k-nearest neighbors algorithm, random forest, ROC curve.


© 2024 Bentham Science Publishers