Importance of Feature Selection and Data Visualization Towards Prediction of Breast Cancer

Author(s): Rajalakshmi Krishnamurthi*, Niyati Aggrawal, Lokendra Sharma, Diva Srivastava, Shivangi Sharma

Journal Name: Recent Patents on Computer Science
Continued as Recent Advances in Computer Science and Communications

Volume 12, Issue 4, 2019

Background: Breast cancer is one of the most common forms of cancer among women and the leading cause of death among them. Countries like the United States, England and Canada report a high number of breast cancer patients every year, and this number is continuously increasing due to detection at later stages. Hence, it is very important to create awareness among women and to develop algorithms that help detect malignant cancer. Several research studies have been conducted to analyze breast cancer data.

Objective: This paper presents an effective method for predicting breast cancer and its stage, and analyzes the performance of the supervised learning techniques used for prediction, such as the Random Classifier and the Chi-square (Chi2) test. The paper focuses on three aspects: feature selection, the corresponding data visualisation, and prediction with different machine learning models.

Methods: The dataset used for this work is the Breast Cancer Wisconsin data taken from the UCI Machine Learning Repository. The dataset is used to show which of its 32 features are important and how this can be established using data visualisation. After feature selection, different machine learning models are applied.
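
The abstract does not spell out how the Chi2 test scores features, so the following is only a minimal sketch of one common count-based form of the statistic, written in plain Python. It assumes non-negative feature values, and the function name `chi2_scores` and the four-row toy data are hypothetical illustrations, not data or code from the paper. For each feature, the observed per-class sum is compared with the sum expected if the feature were independent of the class label.

```python
def chi2_scores(rows, labels):
    """Return one chi-square score per feature column.

    rows   : list of equal-length lists of non-negative feature values
    labels : list of class labels, one per row
    A higher score means the feature's total is distributed across the
    classes less like the class proportions, i.e. it is more class-dependent.
    """
    n_samples = len(rows)
    n_features = len(rows[0])
    classes = sorted(set(labels))

    # Fraction of samples belonging to each class.
    class_prob = {c: labels.count(c) / n_samples for c in classes}

    scores = []
    for j in range(n_features):
        feature_total = sum(row[j] for row in rows)
        score = 0.0
        for c in classes:
            # Observed: sum of feature j over the samples of class c.
            observed = sum(row[j] for row, lab in zip(rows, labels) if lab == c)
            # Expected under independence: class share of the feature total.
            expected = class_prob[c] * feature_total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
toy_rows = [[9, 5], [8, 4], [1, 5], [2, 4]]
toy_labels = ["M", "M", "B", "B"]
toy_scores = chi2_scores(toy_rows, toy_labels)
```

On the toy data, feature 0 scores 9.8 and feature 1 scores 0, so a selector that keeps the highest-scoring features would retain feature 0 and drop feature 1.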

Conclusion: The machine learning models involved are Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Random Forest, Principal Component Analysis (PCA) and Neural Network using Perceptron (NNP). The models were compared to determine which performs better under which conditions. At different stages, several charts were plotted and eliminated based on relative comparison. The results show that the Random Tree classifier combined with the Chi-square (Chi2) test is an efficient choice.
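
As a rough illustration of the kind of comparison described above, the sketch below runs chi-square feature selection followed by three of the named classifiers using scikit-learn. It is an assumption-laden example rather than the authors' pipeline: the choice of 10 selected features, 5-fold cross-validation, the hyperparameters and the helper name `compare_models` are all hypothetical. Note that scikit-learn's bundled copy of the Wisconsin diagnostic data exposes 30 numeric features, without the ID and diagnosis columns of the raw file.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(k_features=10, folds=5):
    """Return mean cross-validated accuracy per model (illustrative settings)."""
    X, y = load_breast_cancer(return_X_y=True)

    # Hyperparameters here are assumptions, not values reported in the paper.
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "svm": SVC(kernel="rbf"),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }

    scores = {}
    for name, model in models.items():
        # chi2 needs non-negative inputs, so selection runs before scaling.
        pipeline = make_pipeline(
            SelectKBest(chi2, k=k_features),
            StandardScaler(),
            model,
        )
        scores[name] = cross_val_score(pipeline, X, y, cv=folds).mean()
    return scores


if __name__ == "__main__":
    for name, accuracy in compare_models().items():
        print(f"{name}: {accuracy:.3f}")
```

Placing the selector inside the pipeline means the features are re-selected on each training fold, which avoids leaking information from the held-out fold into the feature ranking.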

Keywords: Breast cancer, machine learning, data mining, classification, prediction, data visualization.

Article Details

Year: 2019
Published on: 19 August, 2019
Page: [317 - 328]
Pages: 12
DOI: 10.2174/2213275912666190101121058
