Background: Genomes contain a variety of repeated structures of various lengths and types,
either interspersed or tandem. Tandem repeats play an important role in molecular biology because
they are related to the genetic background of inherited diseases, and they can also serve as markers
for DNA mapping and DNA fingerprinting. Improving the efficiency of algorithms that search for
tandem repeats in DNA sequences can lead to many useful applications in genomics.
Objective: We introduce an efficient O(n) algorithm for finding maximum-length exact tandem
repeats in genomes.
Method: The algorithm is based on the Enhanced Suffix Array (ESA), which consists of a Suffix
Array (SA) and a Longest Common Prefix (LCP) array. The SA lists the starting positions of all
suffixes of a string in lexicographic order, and the LCP array stores the length of the longest
common prefix of each pair of consecutive suffixes in the SA.
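To make the two arrays concrete, the following is a minimal Python sketch and not the TR-ESA implementation. It builds the SA by naive sorting (not linear time), derives the LCP array with Kasai's algorithm, and then applies one simple, non-exhaustive check: if two suffixes that are adjacent in the SA start p positions apart and share a prefix of length at least p, the string contains an exact tandem repeat with period p. The function names and the example string are illustrative assumptions.

```python
def build_suffix_array(s):
    # Naive construction for illustration: sort start positions by suffix text.
    return sorted(range(len(s)), key=lambda i: s[i:])


def build_lcp_array(s, sa):
    # Kasai's algorithm: lcp[k] is the length of the longest common prefix
    # of the suffixes starting at sa[k - 1] and sa[k]; lcp[0] is 0.
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def adjacent_tandem_repeats(s):
    # Hypothetical helper: reports (start, period) pairs found by checking
    # only suffixes adjacent in the SA, so it can miss repeats.
    sa = build_suffix_array(s)
    lcp = build_lcp_array(s, sa)
    found = []
    for k in range(1, len(s)):
        i, j = sorted((sa[k - 1], sa[k]))
        period = j - i
        if lcp[k] >= period:
            found.append((i, period))
    return found


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(build_lcp_array(text, sa))
    print(adjacent_tandem_repeats(text))
```

For a reported pair (start, period), the repeated unit is the substring of length period beginning at start, immediately followed by an identical copy.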
Results: We compare the results of our computation with those of an existing application for
searching exact tandem repeats, the Burrows Wheeler Tandem Repeat Searcher (BWtrs). We provide an
open-source standalone application called TR-ESA (available at:
www.algorithms-akgec-shivika.in/tandem), which implements the search for exact maximum-length
tandem repeats.
Conclusion: The tool is efficient and powerful, and it allows complete genomes to be analyzed for
exact tandem repeats.