Significant Metabolites and Outlier-Robust Classifier Identification for Breast Cancer Prediction

Nishith Kumar; Md. Aminul Hoque; Md. Shahjaman; S.M. Shahinul Islam; Md. Nurul Haque Mollah

Abstract

Background: Metabolomics is a relatively new and prominent branch of bioinformatics. Metabolite expression levels shape the phenotypic characteristics of an organism. Breast cancer is the leading type of cancer in women across the world, accounting for 25% of all cases. In 2012, there were 1.68 million cases of breast cancer and 522,000 deaths. Identifying the metabolites that are significant for breast cancer, and correctly classifying breast cancer status, are therefore very important tasks in metabolomics data analysis, both for drug discovery and for early prediction of disease status.

Objective: The main objective of this paper is to identify significant metabolites (p-value < 0.05) and a state-of-the-art classification technique for breast cancer prediction using a metabolomics dataset.

Methods: Although several techniques exist for identifying significant metabolites, we used Student's t-test and the Kruskal-Wallis test. For breast cancer prediction, we considered five modern classification techniques: (i) Naive Bayes (NB), (ii) Support Vector Machine (SVM), (iii) Linear Discriminant Analysis (LDA), (iv) k-nearest neighbors (kNN) and (v) Random Forest (RF). We measured the performance of the classification techniques through accuracy, sensitivity, specificity, the Receiver Operating Characteristic (ROC) curve and the area under the ROC curve, among other measures.
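As an illustration of the performance measures named above, the following is a minimal sketch in plain Python. It is not the authors' code: the function names, the label coding (1 = cancer, 0 = control) and the small example vectors are all assumptions made for this sketch, and the AUC is computed with the standard pairwise-ranking (Mann-Whitney) formulation rather than any specific routine used in the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy_sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) from predicted labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Sensitivity: fraction of true positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of true negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels, predictions and classifier scores (not from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

acc, sens, spec = accuracy_sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Applied to the predictions of each of the five classifiers on the same held-out samples, these four numbers would give the kind of side-by-side comparison the abstract describes.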

Results: The performance measures showed that the random forest classifier produced higher accuracy, sensitivity, specificity and area under the ROC curve than the other classification techniques for breast cancer prediction using the metabolomics dataset. The analysis also showed that 24 metabolites significantly (adjusted p-value < 0.05) influence breast cancer.
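The abstract reports adjusted p-values but does not say which adjustment was used. Purely as an illustration, the sketch below assumes the Benjamini-Hochberg false discovery rate procedure, one common choice when many metabolites are tested at once. The function name and the raw p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as an example of p-value
    adjustment; the source does not name its adjustment method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five metabolites (not from the paper).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)

# Indices of metabolites that remain significant after adjustment.
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
```

In this made-up example four raw p-values fall below 0.05, but only the first two remain below 0.05 after adjustment, which is why a count based on adjusted p-values can be smaller than one based on raw p-values.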

Conclusion: On the basis of the experimental results, there are 24 metabolites that influence breast cancer, and random forest is the state-of-the-art and outlier-robust classifier among the five classification techniques for breast cancer prediction as well as metabolomics data analysis.

Keywords: Naive Bayes, support vector machine, linear discriminant analysis, k-nearest neighbors algorithm, random forest, ROC curve.

DOI: https://dx.doi.org/10.2174/2213235X06666180131155010
Print ISSN: 2213-235X
Online ISSN: 2213-2368
Publisher Name: Bentham Science Publisher
Journal: Current Metabolomics
