Background: Breast cancer is one of the most common cancers among women
and a leading cause of cancer death among them. Countries such as the United States, England, and Canada
report a high number of breast cancer patients every year, and this number keeps increasing,
with many cases detected only at later stages. Hence, it is very important to raise awareness among women
and to develop algorithms that help detect malignant cancer. Several research studies have
been conducted to analyze breast cancer data.
Objective: This paper presents an effective method for predicting breast cancer and its stage, and
analyzes the performance of different supervised learning techniques used for prediction, such as the
Random Forest classifier and the chi-square (Chi2) test. The paper focuses on three aspects:
feature selection, the corresponding data visualisation, and prediction with different
machine learning models.
Method: The dataset used for this work is the Breast Cancer Wisconsin dataset taken from the UCI Machine
Learning Repository. First, data visualisation is used to examine the 32 features of the dataset and
to assess their importance. Second, after feature selection, different machine learning
models are applied.
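The feature-selection and prediction steps described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn and its bundled copy of the Wisconsin Diagnostic dataset, which exposes 30 numeric features (the raw UCI file has 32 columns because it also carries an ID and the diagnosis label). The choice of `k=10` selected features, the 70/30 split, and the forest size are illustrative assumptions, not values taken from the paper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Bundled copy of the Wisconsin Diagnostic Breast Cancer data (30 numeric features).
data = load_breast_cancer()
X, y = data.data, data.target

# Hold out 30% of the samples for testing (illustrative split, not from the paper).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The chi-square score requires non-negative features, which holds for this dataset.
# k=10 is an assumed value chosen only for illustration.
selector = SelectKBest(score_func=chi2, k=10)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Names of the features kept by the chi-square ranking.
selected_names = data.feature_names[selector.get_support()]

# Fit a random forest on the selected features only.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train_sel, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test_sel))

print("Selected features:", list(selected_names))
print(f"Test accuracy: {accuracy:.3f}")
```

Fitting the selector on the training split only, and then reusing it on the test split, keeps the test labels from influencing which features are chosen.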
Conclusion: The machine learning models involved are Support Vector Machine (SVM), K-Nearest
Neighbour (KNN), Random Forest, Principal Component Analysis (PCA), and Neural Network
using Perceptron (NNP). They were compared to check which type of model performs better under which conditions.
At different stages, several charts were plotted and eliminated based on relative comparison.
Results show that the Random Forest classifier combined with the chi-square (Chi2) test proves to be an efficient