Background: A robust guide tree is necessary as a first step for the multiple sequence
alignment of proteins. The guide tree is normally generated using an initial distance matrix based on a
particular distance metric.
Objective: A new tool for generating guide trees for multiple protein sequence alignment is presented.
Method: The progressive alignment is initialized with a distance matrix computed using a novel metric
termed Radial Distance, which estimates the variation around symbols in two sequences. After the initial
distance matrix is generated, a guide tree is created using the neighbor-joining algorithm. The guide
trees generated with our tool were then fed independently into MUSCLE and Clustal Omega (as these
methods can accept external guide trees) to produce the final alignments.
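As an illustration of the tree-building step, the following is a minimal sketch of neighbor joining applied to a precomputed distance matrix. It returns only the tree topology as a Newick string (no branch lengths). The function name `neighbor_joining` and the toy matrix are hypothetical, and the matrix stands in for any pairwise distances: the Radial Distance metric is not defined in this abstract and is not implemented here.

```python
def neighbor_joining(dist, labels):
    """Build a guide-tree topology (Newick, no branch lengths) by neighbor joining.

    dist   -- symmetric matrix (list of lists) of pairwise distances
    labels -- sequence names, in the same order as the rows of dist
    """
    nodes = list(labels)
    # Store distances in a nested dict so merged nodes can be added by name.
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}

    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each active node to all other active nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        new = f"({a},{b})"
        d[new] = {new: 0.0}
        # Distance from the merged node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    return f"({nodes[0]},{nodes[1]});"


# Toy additive distances in which A pairs with B and C pairs with D.
toy_labels = ["A", "B", "C", "D"]
toy_dist = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
toy_tree = neighbor_joining(toy_dist, toy_labels)
print(toy_tree)
```

A Newick string of this kind is the usual interchange format for passing an external guide tree to an alignment program.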
Results: The alignments obtained with our approach were compared with those from MUSCLE and
Clustal Omega (with their original guide trees) on the BAliBASE, SABRE, and PREFAB protein
sequence databases. To score the alignments, we computed the sum-of-pairs score and the column
score against the reference alignments of the benchmark databases. The alignments produced using
the guide trees generated by SARELI obtained statistically superior sum-of-pairs and column scores
compared with those using the original guide trees from MUSCLE and Clustal Omega on the SABRE
and PREFAB databases.
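For readers unfamiliar with the two scores, the sketch below shows one common way to compute them. The sum-of-pairs score is taken as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and the column score as the fraction of reference columns reproduced exactly. The function names are hypothetical, gaps are assumed to be written as "-", and only reference columns containing at least two residues are counted (an assumption of this sketch). The benchmark scoring programs may differ in details such as restricting scoring to core blocks.

```python
from itertools import combinations


def residue_columns(alignment):
    """Describe each column as a tuple of per-sequence residue indices (None = gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for i, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[i])
                counters[i] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(columns):
    """Set of residue pairs ((seq_i, res_i), (seq_j, res_j)) sharing a column."""
    pairs = set()
    for col in columns:
        present = [(i, r) for i, r in enumerate(col) if r is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference-aligned residue pairs also aligned in the test."""
    ref_pairs = aligned_pairs(residue_columns(reference))
    test_pairs = aligned_pairs(residue_columns(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs) if ref_pairs else 0.0


def column_score(test, reference):
    """Fraction of reference columns (with >= 2 residues) found intact in the test."""
    ref_cols = [c for c in residue_columns(reference)
                if sum(r is not None for r in c) >= 2]
    test_cols = set(residue_columns(test))
    hits = sum(1 for c in ref_cols if c in test_cols)
    return hits / len(ref_cols) if ref_cols else 0.0


# Toy example: the test alignment misplaces the final G of the first sequence.
reference = ["AC-G", "ACTG"]
test = ["ACG-", "ACTG"]
sp = sum_of_pairs_score(test, reference)
cs = column_score(test, reference)
print(sp, cs)
```

In the toy example, two of the three reference residue pairs and two of the three reference columns are recovered, so both scores equal 2/3.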
Conclusion: Our proposed approach can generate guide trees that can be used by established multiple
sequence alignment methods for proteins.