Background: Human Papillomavirus is considered a necessary cause of cervical cancer, the second most common cancer in women worldwide. Genotyping Human Papillomavirus in individual patients can provide essential information for improving the diagnosis and medical treatment of infected patients.
Objective: This paper focuses on predicting the significant Human Papillomavirus genotypes that are mainly associated with cervical cancer.
Method: Partial coding sequences of the genotypes were transformed into coordinates in chaos game representations, which were then partitioned into 8×8 equal sub-regions. The probability distribution over the sub-regions was extracted in the form of tri-nucleotide frequencies. Two-fold cross-validation was then used to separate the training and testing sets. In each fold, the RReliefF algorithm was used to select the significant features, and the corresponding genotypes were then predicted with the fuzzy k-nearest neighbor technique.
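The feature-extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: the corner assigned to each nucleotide, the starting point (0.5, 0.5), the skipping of ambiguous bases, and the function names `cgr_points` and `grid_frequencies` are all assumptions made for illustration. Each chaos game point is the midpoint between the previous point and the corner of the current nucleotide, so from the third nucleotide onward the 8×8 cell containing a point is fixed by the last three nucleotides. The 64 cell frequencies are therefore tri-nucleotide frequencies.

```python
# Assumed corner layout; the abstract does not state which corner each base uses.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos game representation coordinates of a DNA sequence."""
    x, y = 0.5, 0.5  # assumed starting point: centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # assumption: ambiguous symbols such as N are skipped
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the corner of this base.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def grid_frequencies(seq, grid=8):
    """Return grid*grid relative frequencies of CGR points per sub-region."""
    # Drop the first two points: only from the third point on is the cell
    # determined by a complete tri-nucleotide.
    points = cgr_points(seq)[2:]
    counts = [[0] * grid for _ in range(grid)]
    for x, y in points:
        col = min(int(x * grid), grid - 1)  # clamp guards against x == 1.0
        row = min(int(y * grid), grid - 1)
        counts[row][col] += 1
    total = len(points)
    return [c / total if total else 0.0 for row in counts for c in row]


features = grid_frequencies("ACGTACGT")
print(len(features))  # 64 features, one per sub-region
```

In a full pipeline, a feature vector of this kind would be computed for every sequence before feature selection and classification. Those later steps are not shown here.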
Results: The experimental results showed that the proposed method achieves higher performance than two related methods, and that the RReliefF algorithm reduces the 64 extracted features to 29 significant features. The results also differ significantly from those of the method of Nair et al. for almost all genotypes.
Conclusion: The algorithm based on chaos game representation and the fuzzy k-nearest neighbor technique can therefore efficiently predict Human Papillomavirus genotypes.