Current State of the Art for Survival Prediction in Cancer Using Data Mining Techniques

Authors: M.N. Doja, Ishleen Kaur (corresponding author), Tanvir Ahmad

Journal Name: Current Bioinformatics

Volume 15, Issue 3, 2020


Background: Cancer treatment is expensive and causes many side effects, so survival prediction matters to patients and clinicians alike. Data mining has been used in the medical domain to extract useful information from clinical data, and cancer prognosis is one such application.

Objective: This study identifies the techniques used in recent years to predict the survival of cancer patients. Supervised, semi-supervised and unsupervised techniques have all been applied successfully to survival prediction for different types of cancer.
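Supervised survival prediction needs a class label for every patient. One common framing in registry-based work is binary n-year survivability: a patient counts as a survivor if observed alive beyond a fixed horizon, as a non-survivor if the cancer caused death before it, and is excluded if follow-up ended before the horizon. The Python sketch below illustrates only that labelling step, under stated assumptions: the `PatientRecord` fields, the 60-month horizon, and the rule for excluding censored records are hypothetical choices for illustration, not the procedure of any particular study covered by this review.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PatientRecord:
    # Hypothetical, simplified fields; real registries use their own schemas.
    features: Tuple[float, ...]  # e.g. encoded age, stage, tumour size
    survival_months: int         # months from diagnosis to death or last follow-up
    died_of_cancer: bool         # True if death was attributed to the cancer


def survivability_label(rec: PatientRecord, horizon_months: int = 60) -> Optional[int]:
    """Return 1 (survived past the horizon), 0 (died of cancer before it),
    or None when the record is censored before the horizon."""
    if rec.survival_months >= horizon_months:
        return 1  # observed alive at or beyond the horizon
    if rec.died_of_cancer:
        return 0  # cancer death before the horizon
    return None   # follow-up ended early: true outcome unknown


def build_training_set(records: List[PatientRecord], horizon_months: int = 60):
    """Collect (features, label) pairs, skipping records with no usable label."""
    X, y = [], []
    for rec in records:
        label = survivability_label(rec, horizon_months)
        if label is None:
            continue  # assumption: censored records are dropped, not imputed
        X.append(rec.features)
        y.append(label)
    return X, y


records = [
    PatientRecord((54.0, 2.0), 72, False),
    PatientRecord((68.0, 4.0), 18, True),
    PatientRecord((47.0, 1.0), 30, False),  # censored at 30 months
]
X, y = build_training_set(records)
print(y)  # [1, 0]
```

The resulting `X` and `y` can be passed to any supervised classifier. Dropping censored records is the simplest choice; a semi-supervised method could instead treat them as unlabelled examples.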

Methods: A systematic literature review was conducted to uncover gaps in recent studies and to identify directions for future research.

Results and Conclusion: The review found that current systems lack structured patient information. In addition, survival prediction remains unexplored for many cancer types, mainly because sufficient data are not available. Considerable improvement is therefore possible if researchers can obtain the data they need.

Keywords: Cancer, cancer prognosis, data mining, medical, machine learning, preprocessing, survival prediction.

Article Details

Year: 2020
Pages: 174-186 (13 pages)
DOI: 10.2174/1574893614666190902152142