Classifying sequences is one of the central problems in computational biosciences. Several tools have been released
to map an unknown molecular entity to one of the known classes using solely its sequence data. However, all of the
existing tools are problem-specific and restricted to an alphabet constrained by the relevant biological structure. Here, we introduce
TRAINER, a new online tool designed to serve as a generic sequence classification platform that enables users to provide
their own training data with any alphabet defined therein. TRAINER allows users to select among several feature
representation schemes and supervised machine learning methods, each with its relevant parameters. Trained models can be saved
so that other users can reuse them without retraining. Two case studies demonstrate effective use of the system on DNA and
protein sequences: candidate effector prediction and nucleolar localization signal prediction. The biological relevance of the
results is discussed.
Keywords: Sequence classification, web server, k-nearest neighbors, naive Bayes classifier, support vector machine.
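
As a rough illustration of alphabet-agnostic sequence classification, the sketch below pairs a k-mer count representation with a k-nearest neighbors classifier. It is not TRAINER's implementation: the k-mer features, the Euclidean distance, the function names, and the toy sequences and labels are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=2):
    # Count overlapping k-mers. The k-mers are read from the sequence itself,
    # so no alphabet has to be fixed in advance (assumed feature scheme).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def distance(a, b):
    # Euclidean distance between two sparse count vectors.
    # A Counter returns 0 for k-mers it has not seen.
    keys = set(a) | set(b)
    return sqrt(sum((a[key] - b[key]) ** 2 for key in keys))


def knn_predict(train, query, k_neighbors=1, k=2):
    # Majority vote among the k_neighbors training sequences closest to the query.
    q = kmer_counts(query, k)
    ranked = sorted(train, key=lambda item: distance(kmer_counts(item[0], k), q))
    votes = Counter(label for _, label in ranked[:k_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (sequence, class label) pairs.
training = [
    ("ACGTACGT", "mixed"),
    ("AAAAAAAA", "poly_a"),
    ("ACGTTGCA", "mixed"),
    ("AAAATAAA", "poly_a"),
]

prediction = knn_predict(training, "AAAAAAAT", k_neighbors=1)
print(prediction)
```

The same functions accept protein sequences or any other symbol set unchanged, because the feature space is built from whatever k-mers occur in the data.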