Background: Hepatocellular carcinoma (HCC) is a serious disease and the third leading
cause of cancer-related death worldwide. Hepatitis B virus infection can lead to HCC. The virus
introduces its genetic material into the host, damages DNA, and interferes with apoptosis and
tumor-suppressor activity, which can trigger oncogene formation. However, most cases are
discovered only after the cancer has reached stage three or four.
Objective: To detect HCC early using a machine learning approach applied to a data set of
HBx DNA sequences from the hepatitis B virus.
Methods: This study develops a Support Vector Machine (SVM) classifier for carcinoma
detection. A large data volume and an imbalanced class distribution can decrease accuracy and
sensitivity. Therefore, this paper proposes a hybrid of Hierarchical k-Means clustering and SVM
algorithms to detect HCC using HBx DNA sequences. In this method, the SVM algorithm was
applied within each cluster produced by Hierarchical k-Means clustering.
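The cluster-then-classify idea described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the k-mer frequency encoding, the plain k-means routine (standing in for the hierarchical variant, whose details are not given here), the linear SVM trained by subgradient descent, and all function names are hypothetical choices made for the example.

```python
import random
from itertools import product

def kmer_features(seq, k=2):
    """Encode a DNA string as normalized k-mer frequencies (assumed encoding)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = max(len(windows), 1)
    return [windows.count(m) / total for m in kmers]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nearest(x, centers):
    """Index of the centroid closest to x (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(x, centers[c])))

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; a stand-in for the hierarchical variant."""
    centers = random.Random(seed).sample(X, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[nearest(x, centers)].append(x)
        # Keep the old centroid if a cluster ends up empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return centers

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100, seed=0):
    """Linear SVM via subgradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * (dot(w, X[i]) + b) < 1
            w = [wj - lr * lam * wj for wj in w]      # L2 shrinkage
            if violated:                              # hinge-loss step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def fit_cluster_svm(X, y, k=2, seed=0):
    """Cluster X, then train one SVM per cluster; labels must be -1 or +1."""
    centers = kmeans(X, k, seed=seed)
    majority = max(set(y), key=y.count)  # fallback label for empty clusters
    models = []
    for c in range(k):
        idx = [i for i, x in enumerate(X) if nearest(x, centers) == c]
        labels = {y[i] for i in idx}
        if len(labels) < 2:
            # Pure or empty cluster: store a constant label instead of an SVM.
            models.append(labels.pop() if labels else majority)
        else:
            models.append(train_linear_svm([X[i] for i in idx],
                                           [y[i] for i in idx], seed=seed))
    return centers, models

def predict(x, centers, models):
    """Route x to its nearest cluster and apply that cluster's model."""
    m = models[nearest(x, centers)]
    if isinstance(m, int):
        return m
    w, b = m
    return 1 if dot(w, x) + b >= 0 else -1

# Toy strings for illustration only; these are not real HBx sequences.
toy_seqs = ["ACGTACGTAC", "ACGTTCGTAC", "GGGCCCGGGC", "GGGCCCGGCC"]
toy_labels = [1, -1, 1, -1]
toy_X = [kmer_features(s) for s in toy_seqs]
toy_centers, toy_models = fit_cluster_svm(toy_X, toy_labels, k=2)
toy_preds = [predict(x, toy_centers, toy_models) for x in toy_X]
```

Training one classifier per cluster keeps each SVM problem small, and a cluster that contains only one class needs no SVM at all, which is one way such a hybrid can cope with large, imbalanced data.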
Results: The experiments showed an accuracy of 97.18%, a sensitivity of 98.9%, and an
AUC of 0.918. These values are 9.52%, 95.3%, and 0.4 higher, respectively, than those of the
conventional SVM method.
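For reference, the three reported metrics can be computed as in the sketch below. The function names and the toy labels are assumptions made for illustration; the numbers are not data from the study. The toy case shows why accuracy alone can mislead on imbalanced data: a classifier that always predicts the majority class scores high accuracy while its sensitivity is zero and its AUC is at chance level.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: TP / (TP + FN)."""
    preds_on_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_on_pos) / len(preds_on_pos)

def auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative.

    Ties count as one half; this equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy imbalanced case (illustrative only): 1 positive, 9 negatives,
# and a classifier that always predicts the negative class.
y_true = [1] + [-1] * 9
y_pred = [-1] * 10
scores = [0.0] * 10  # constant scores carry no ranking information

toy_accuracy = accuracy(y_true, y_pred)
toy_sensitivity = sensitivity(y_true, y_pred)
toy_auc = auc(y_true, scores)
```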
Conclusion: HCC can be detected using a clustering-based SVM algorithm. The proposed
hybrid of Hierarchical k-Means and SVM improved classification performance for HCC
detection.