Pattern Matching in Indeterminate and Arc-Annotated Sequences
Md Tanvir Islam Aumi, Tanaeem M. Moosa and M. Sohel Rahman
Affiliation: AℓEDA Group, Department of CSE, BUET, Dhaka-1000, Bangladesh.
In this paper, we present efficient algorithms for finding indeterminate Arc-Annotated patterns in indeterminate
Arc-Annotated references. Our algorithms run in O(m + nm/w) time, where n and m are respectively the lengths of our
reference and pattern strings and w is the target machine word size. Here we have assumed the alphabet size to be
constant because indeterminate Arc-Annotated sequences are used to model biological sequences. Clearly, for short
patterns, our algorithms run in linear time, and efficient algorithms for matching short patterns to reference genomes have
many applications in practical settings. We have also applied our algorithms to scan for ncRNAs without pseudoknots. We
scanned three whole human chromosomes, and scanning one whole chromosome for an ncRNA family took only 2.5 to 4
minutes. Some relevant patents are discussed in [1, 2].
Keywords: Indeterminate Sequence, Arc-Annotated Sequence, Sequence Matching, Bioinformatics, Patent.
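
To illustrate why a running time that depends on the machine word size w is natural for this problem, the following is a minimal sketch of bit-parallel (Shift-And style) matching on indeterminate sequences. It is not the algorithm of this paper: it ignores arc annotations entirely, and the function name shift_and_indeterminate, the IUPAC table (a small illustrative subset), and the rule that two positions match when their symbol sets share a base are all assumptions made for the example. Python integers stand in for the machine words, so the cost of about m/w word operations per reference position is only implicit here.

```python
# Illustrative subset of IUPAC nucleotide codes (an assumption for this sketch):
# each code maps to the concrete bases it may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def shift_and_indeterminate(text, pattern):
    """Return the start indices where pattern matches text.

    Two positions match when their base sets intersect. Arc annotations
    are NOT handled by this sketch.
    """
    m = len(pattern)
    if m == 0:
        return []

    # masks[b] has bit i set when pattern position i can be base b.
    masks = {b: 0 for b in "ACGT"}
    for i, code in enumerate(pattern):
        for b in IUPAC[code]:
            masks[b] |= 1 << i

    accept = 1 << (m - 1)  # bit that signals a full-length match
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []
    for j, code in enumerate(text):
        # An indeterminate text symbol is compatible with every pattern
        # position that shares at least one base with it.
        step = 0
        for b in IUPAC[code]:
            step |= masks[b]
        state = ((state << 1) | 1) & step
        if state & accept:
            hits.append(j - m + 1)
    return hits


if __name__ == "__main__":
    print(shift_and_indeterminate("ACGTAG", "RN"))
```

With a constant alphabet, building the masks takes time proportional to m, and each reference position costs one shift, one OR, and one AND over an m-bit state, which is where a bound of the form O(m + nm/w) comes from when the state is packed into w-bit words.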