Background: Human Papillomavirus is considered a necessary cause of cervical cancer,
which is the second most common cancer in women worldwide. At present, individual
genotyping of Human Papillomavirus can provide essential information for improving the
diagnosis and medical treatment of infected patients.
Objective: This paper focuses on predicting the significant Human Papillomavirus
genotypes that are mainly associated with cervical cancer.
Method: Partial coding sequences of the genotypes were transformed into coordinates in a
chaos game representation, which was then partitioned into 8×8 equal sub-regions.
The probability of points falling in each sub-region was extracted as a tri-nucleotide frequency.
Two-fold cross-validation was then used to separate the training and testing sets. For each fold,
the RReliefF algorithm was used to select significant features, and the corresponding genotypes
were then predicted with the fuzzy k-nearest neighbor technique.
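
The feature extraction and classification steps of the Method can be illustrated with a minimal sketch. It is not the authors' implementation. The corner assignment (A, C, G, T), the skipping of ambiguous symbols, the Keller-style distance weighting with fuzzifier m, and the names cgr_grid_frequencies and fuzzy_knn_memberships are all assumptions made for illustration. Cross-validation and RReliefF feature selection are omitted.

```python
import math

# Corner assignment is a common CGR convention, assumed here.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_grid_frequencies(seq, grid=8):
    """Return grid*grid relative frequencies of CGR points, row by row."""
    # Floats suffice for a sketch; rounding may misplace points at cell
    # borders after very long runs of a single base.
    x, y = 0.5, 0.5  # start at the centre of the unit square
    counts = [0] * (grid * grid)
    seen = 0
    for base in seq.upper():
        if base not in CORNERS:
            continue  # assumption: ambiguous symbols such as N are skipped
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # move halfway to the corner
        seen += 1
        if seen < 3:
            continue  # in an 8x8 grid a cell encodes the last three bases
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        counts[row * grid + col] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def fuzzy_knn_memberships(train_x, train_y, query, k=3, m=2.0):
    """Class memberships of `query` from its k nearest training vectors.

    Assumes crisp training labels and inverse-distance weights with
    exponent 2 / (m - 1), where m > 1 is the fuzzifier.
    """
    pairs = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    nearest = sorted(pairs, key=lambda p: p[0])[:k]
    weights = {}
    for dist, label in nearest:
        w = 1.0 / max(dist, 1e-12) ** (2.0 / (m - 1.0))  # guard zero distance
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}
```

Because each chaos game step halves the distance to a corner, every cell of the 8×8 grid collects the points whose last three bases form one particular tri-nucleotide, which is why the grid yields 64 features. The predicted genotype would be the label with the largest membership value.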
Results: The proposed method achieved higher performance than two related methods, and the
RReliefF algorithm reduced the 64 extracted features to 29 significant features. Additionally,
our experimental results were significantly different from those of the method of Nair et al.
for almost all genotypes.
Conclusion: The algorithm based on chaos game representation and the fuzzy k-nearest neighbor
technique can efficiently predict Human Papillomavirus genotypes.