Machine learning is a reliable technology for automated prediction of the subcellular localization of viral proteins within
a host cell or virus-infected cell. One challenge is that viral protein samples not only may reside at multiple location sites
but are also class-imbalanced, and an imbalanced dataset often degrades prediction performance. To address this
challenge, this paper proposes a novel approach, named imbalance-weighted multi-label K-nearest neighbor, to predict the
subcellular locations of viral proteins with multiple sites. Experimental results from the jackknife test indicate that the presented
algorithm achieves better performance than existing methods and has great potential in protein science.
Keywords: Class-imbalance, K-nearest neighbor, multi-label learning, pseudo amino acid composition, subcellular localization