Background: Cervical cancer is a major cause of mortality in developing countries, and it is one
of the most common forms of cancer worldwide. Machine learning techniques have been shown to
identify cervical cancer more accurately than manual screening methods such as Pap smear and
Liquid Cytology Based (LCB) tests.
Objective: Most of these machine learning techniques use images of the cervix for cervical cancer
risk analysis; in this article, demographic data and medical records of patients are used instead
to identify the major causes of cervical cancer. Standard classification methods are generally
suitable when a dataset is balanced. This dataset, however, contains far more negative cases than
positive cases, so traditional binary classifiers are not sufficient to classify the examples of
cervical cancer correctly.
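
To illustrate why class imbalance makes plain accuracy misleading, the following minimal sketch
uses hypothetical counts (950 negative and 50 positive cases, chosen for illustration and not
taken from the study's data) together with a trivial classifier that always predicts the negative
class.

```python
# Hypothetical label counts, for illustration only (not the study's data).
n_negative, n_positive = 950, 50
y_true = [0] * n_negative + [1] * n_positive

# A trivial baseline that labels every patient as negative.
y_pred = [0] * len(y_true)

true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (true_pos + true_neg) / len(y_true)
recall = true_pos / n_positive          # sensitivity on the positive class
specificity = true_neg / n_negative     # fraction of negatives correctly identified
balanced_accuracy = (recall + specificity) / 2

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}, "
      f"balanced accuracy={balanced_accuracy:.2f}")
# prints: accuracy=0.95, recall=0.00, balanced accuracy=0.50
```

The baseline reaches 95% accuracy without detecting a single positive case, which is why metrics
such as recall or balanced accuracy are usually reported alongside accuracy on data of this kind.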
Methods: We identified the major causes of cervical cancer by employing multiple machine learning
feature selection algorithms. After this selection, we trained several machine learning models,
including Decision Trees (DTs), Support Vector Machines (SVMs) and Ensemble Learners, both on all
features and on the selected features only.
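
The workflow described in the Methods can be sketched roughly as follows with scikit-learn. This
is an illustrative outline under stated assumptions, not the study's implementation: the data are
synthetic (generated with `make_classification`), the ANOVA F-test (`f_classif`) with `k=10`
stands in for the unspecified feature selection algorithms, and all hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the patient records (assumption): 30 features,
# roughly 10% positive cases to mimic an unbalanced dataset.
X, y = make_classification(n_samples=600, n_features=30, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)

# One representative of each model family named in the Methods.
models = {
    "decision_tree": DecisionTreeClassifier(class_weight="balanced", random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(class_weight="balanced")),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    # Baseline: train on all features.
    all_score = cross_val_score(model, X, y, cv=cv,
                                scoring="balanced_accuracy").mean()
    # Selection sits inside the pipeline so it is refit on each training fold,
    # which avoids leaking information from the held-out fold.
    selected = make_pipeline(SelectKBest(f_classif, k=10), model)
    sel_score = cross_val_score(selected, X, y, cv=cv,
                                scoring="balanced_accuracy").mean()
    results[name] = (all_score, sel_score)
    print(f"{name}: all features={all_score:.3f}, selected features={sel_score:.3f}")
```

Scoring with balanced accuracy rather than plain accuracy is a choice made here for the sketch,
so that the majority class does not dominate the comparison between models.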
Results and Conclusion: AdaBoost classified the instances of this unbalanced dataset into healthy
and unhealthy classes with 96% accuracy. Based on this model and the significant causes of
cervical cancer, we aimed to develop a technique for self-risk assessment of cervical cancer,
which women can use to estimate their risk of developing cervical cancer after answering some
questions about their demographics and medical history.