Abstract
Background: Hepatocellular carcinoma (HCC) is a serious disease and the third leading cause of cancer death worldwide. Hepatitis B virus (HBV) infection can lead to HCC. The virus introduces its genetic material into the host, damages DNA, and interferes with apoptotic and tumor-suppressor activity, which can trigger oncogene formation. However, most HCC cases are discovered only after the cancer has reached stage three or four.
Objective: To detect HCC early using a machine learning approach applied to a data set of HBx DNA sequences from the hepatitis B virus.
Methods: This study develops a Support Vector Machine (SVM) classifier for carcinoma detection. A large data volume and an imbalanced class distribution can reduce the accuracy and sensitivity of a conventional SVM. Therefore, this paper proposes a hybrid of Hierarchical k-Means clustering and SVM to detect HCC from HBx DNA sequences. In this method, the data are first partitioned with Hierarchical k-Means, and an SVM is then applied within each resulting cluster.
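As a rough illustration of the cluster-then-classify idea described in Methods, the sketch below partitions the training data and fits one SVM per cluster. It is not the authors' implementation: plain k-means from scikit-learn stands in for Hierarchical k-Means, the feature vectors are assumed to have been extracted from the HBx sequences already, and the class name `ClusteredSVM`, the number of clusters, and the RBF kernel are hypothetical choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


class ClusteredSVM:
    """Hypothetical cluster-then-classify model: one SVM per k-means cluster."""

    def __init__(self, n_clusters=3, random_state=0):
        # Assumption: plain k-means replaces the paper's Hierarchical k-Means.
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=random_state)
        self.models = {}

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=int)
        clusters = self.kmeans.fit_predict(X)
        for c in np.unique(clusters):
            mask = clusters == c
            classes = np.unique(y[mask])
            if len(classes) == 1:
                # A single-class cluster needs no SVM; store the constant label.
                self.models[int(c)] = int(classes[0])
            else:
                # class_weight="balanced" is one common way to offset imbalance.
                svm = SVC(kernel="rbf", class_weight="balanced")
                self.models[int(c)] = svm.fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        clusters = self.kmeans.predict(X)
        y_pred = np.empty(len(X), dtype=int)
        for c in np.unique(clusters):
            mask = clusters == c
            model = self.models[int(c)]
            if isinstance(model, SVC):
                y_pred[mask] = model.predict(X[mask])
            else:
                y_pred[mask] = model
        return y_pred
```

Each new sample is routed to its nearest cluster centre and classified by that cluster's model, so every SVM only has to separate the classes within a smaller, more homogeneous subset of the data.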
Results: The experiments showed an accuracy of 97.18%, a sensitivity of 98.9%, and an AUC of 0.918. These figures represent improvements of 9.52%, 95.3%, and 0.4, respectively, over the conventional SVM method.
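For reference, the three reported metrics can be computed as in the minimal sketch below. The function names and the toy labels are illustrative only and are not taken from the study; the positive class (here `1`) is assumed to denote HCC.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: TP / (TP + FN) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not study data).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6]
print(accuracy(y_true, y_pred), sensitivity(y_true, y_pred), auc(y_true, scores))
```

Under a strongly imbalanced class distribution, accuracy alone can look high while sensitivity stays low, which is why all three metrics are reported together.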
Conclusion: HCC can be detected using a clustering-based SVM algorithm. The proposed hybrid of Hierarchical k-Means and SVM improved classification performance for HCC detection.
Keywords: Hepatocellular carcinoma, HBx, DNA sequence, clustering, hierarchical k-means, SVM.