Background: Some studies have shown that human papillomavirus (HPV) is strongly associated with cervical cancer. As we all know, cervical cancer still remains the fourth most common cancer, affecting women worldwide. Thus, it is both challenging and essential to detect risk types of human papillomaviruses.
Methods: In order to discriminate whether HPV type is highly risky or not, many epidemiological and experimental methods have been proposed recently. For HPV risk type prediction, there also have been a few computational studies which are all based on machine learning (ML) techniques, but adopt different feature extraction methods. Therefore, we conclude and discuss several classical approaches which have got a better result for the risk type prediction of HPV.
Results: This review summarizes the common methods to detect human papillomavirus. The main methods are sequence-derived features, text-based classification, gap-kernel method, ensemble SVM, Word statistical model, position-specific statistical model and mismatch kernel method (SVM). Among these methods, position-specific statistical model get a relatively high accuracy rate (accuracy=97.18%). Word statistical model is also a novel approach, which extracted the information of HPV from the protein “sequence space” with word statistical model to predict high-risk types of HPVs (accuracy=95.59%). These methods could potentially be used to improve prediction of high-risk types of HPVs.
Conclusion: From the prediction accuracy, we get that the classification results are more accurate by establishing mathematical models. Thus, adopting mathematical methods to predict risk type of HPV will be the main goal of research in the future.