Evidence is accumulating that small open reading frames (sORFs, < 100 codons) play key roles in many important biological processes. Yet they are generally ignored in gene annotation, even though they are far more abundant than genes with more than 100 codons. Here, we demonstrate that popular homology search and codon-index techniques perform more poorly for small genes than for larger genes, while a method dedicated to sORF discovery achieves a level of accuracy similar to that of homology search. This result is largely due to the small dataset of experimentally verified sORFs available for homology search and for training ab initio techniques. It highlights the urgent need for both experimental and computational studies to further advance the accuracy of sORF prediction.
Keywords: Small open reading frames, Homology search, Ab initio prediction, Method assessment, Codons, Energy metabolism, Stress proteins, Transcriptional regulators, Nucleases, Ribosomal proteins