Background: A robust guide tree is necessary as a first step for the multiple sequence alignment of proteins. The guide tree is normally generated using an initial distance matrix based on a particular distance metric.
Objective: To present SARELI, a new tool for generating guide trees for multiple protein sequence alignment.
Method: The initial distance matrix that seeds the progressive alignment is computed with a novel metric, termed Radial Distance, that estimates the variation around symbols in two sequences; once this matrix is generated, a guide tree is built from it using the neighbor-joining algorithm. The guide trees generated with our tool were then fed independently into MUSCLE and Clustal Omega (both of which accept external guide trees) to produce the final alignments.
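As an illustration of the tree-building step only, the following is a minimal sketch of neighbor joining over a precomputed distance matrix. It does not implement Radial Distance, whose definition is not given here; the matrix is assumed to come from any pairwise metric. The function name `neighbor_joining`, the four-decimal Newick output, and the choice to place the root at the midpoint of the last remaining edge are assumptions made for this example, not details of the tool.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build a Newick string by neighbor joining from a symmetric distance matrix."""
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    # Work on copies so the caller's data is left untouched.
    labels = list(names)
    d = [row[:] for row in dist]
    while len(labels) > 2:
        n = len(labels)
        totals = [sum(row) for row in d]
        # Pick the pair (i, j) that minimises the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        merged = f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f})"
        # Distances from the new node to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        # Drop rows and columns i and j, then append the new node.
        d = [[d[a][b] for b in keep] for a in keep]
        for row, x in zip(d, new_row):
            row.append(x)
        d.append(new_row + [0.0])
        labels = [labels[k] for k in keep] + [merged]
    # Assumption: split the last edge in half to obtain a rooted tree.
    half = d[0][1] / 2
    return f"({labels[0]}:{half:.4f},{labels[1]}:{half:.4f});"
```

The returned Newick string could be written to a file and supplied to an aligner that accepts an external guide tree. Neighbor joining itself yields an unrooted tree, so the rooting used above is only one possible convention.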
Results: Alignments produced with our guide trees were compared with those produced by MUSCLE and Clustal Omega using their original guide trees, on the BAliBASE, SABRE, and PREFAB protein sequence databases. Each alignment was scored with the sum-of-pairs score and the column score against the reference alignments of the benchmark database used. On the SABRE and PREFAB databases, the alignments produced using the guide trees generated by SARELI obtained statistically significantly higher sum-of-pairs and column scores than those using the original guide trees from MUSCLE and Clustal Omega.
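For readers unfamiliar with the two scores, the sketch below computes them under their usual definitions: the sum-of-pairs score is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and the column score is the fraction of reference columns reproduced exactly. It assumes both alignments list the same sequences in the same order with `-` as the gap character, and it scores every column rather than only benchmark-specific core regions. The function names are hypothetical and this is not the scoring program used in the study.

```python
from itertools import combinations
from typing import List, Set, Tuple


def column_ids(alignment: List[str]) -> List[Tuple[int, ...]]:
    """Describe each column by the residue index of every sequence (-1 for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        ids = []
        for s, ch in enumerate(col):
            if ch == "-":
                ids.append(-1)
            else:
                ids.append(counters[s])
                counters[s] += 1
        columns.append(tuple(ids))
    return columns


def aligned_pairs(columns: List[Tuple[int, ...]]) -> Set[Tuple[int, int, int, int]]:
    """Collect every pair of residues that share a column."""
    pairs = set()
    for col in columns:
        for a, b in combinations(range(len(col)), 2):
            if col[a] >= 0 and col[b] >= 0:
                pairs.add((a, col[a], b, col[b]))
    return pairs


def sp_and_column_score(test: List[str], reference: List[str]) -> Tuple[float, float]:
    """Return (sum-of-pairs score, column score) of `test` against `reference`."""
    ref_cols = column_ids(reference)
    test_cols = column_ids(test)
    ref_pairs = aligned_pairs(ref_cols)
    # Fraction of reference residue pairs recovered by the test alignment.
    sp = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)
    # Fraction of reference columns that appear unchanged in the test alignment.
    test_set = set(test_cols)
    cs = sum(col in test_set for col in ref_cols) / len(ref_cols)
    return sp, cs


reference = ["AC-G", "ACTG", "A-TG"]
test = ["A-CG", "ACTG", "A-TG"]
print(sp_and_column_score(test, reference))  # (0.875, 0.5)
```

In the toy example, shifting one residue in the first sequence breaks a single reference pair out of eight and changes two of the four columns, which is why the column score drops more sharply than the sum-of-pairs score.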
Conclusion: Our proposed approach can generate guide trees that can be used by established multiple sequence alignment methods for proteins.