A large number of alignment-free techniques of graphical representation and
numerical characterization (GRANCH) of bio-molecular sequences have been proposed in
recent years, but the relative efficacy of these methods in determining the degree of similarities and dissimilarities
of such sequences has not been ascertained.
Objective: Our objective is to assess the relative efficacy of these methods in determining
the degree of similarities and dissimilarities of bio-molecular sequences.
Method: We have chosen 7 published/communicated methods that represent various classes of
GRANCH techniques and computed the descriptors that are expected to characterize similarities and
dissimilarities in several sets of gene sequences. We critically appraise the different methods and determine
which of these yield non-redundant structural information that could be used to compute different
properties of the sequences, and which are sufficiently correlated with one another that using the simplest
representative of the group would suffice. We also perform a principal component analysis (PCA) to determine
how the variances in the calculated sequence descriptors are explained by the computed principal
components.
Results: We found that some of the descriptors are strongly correlated, implying a commonality of structural
information encoded by them, while others are distinctly separate. The PCA results show that the
first three PCs explain >97% of the variances.
Conclusion: We found that some mathematical DNA descriptors calculated by a few of these techniques
correlate strongly with one another, implying a redundancy in the structural information quantified
by those descriptors; others are not strongly correlated with one another, suggesting that they encode
non-redundant sequence information. From this and our PCA results, our recommendation would
be to use a minimally correlated set of descriptors, or orthogonal descriptors such as PCs derived from the descriptor
set, for the characterization of nucleic acid structure and function.
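
The correlation and PCA steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the descriptor matrix is synthetic, the function names `correlation_and_pca` and `select_low_correlation` are hypothetical, and the 0.9 correlation cutoff is an assumed value chosen only for the example. Each row stands for one sequence and each column for one numerical descriptor.

```python
import numpy as np

def correlation_and_pca(X):
    """Correlation matrix and PCA of a (n_sequences, n_descriptors) array."""
    X = np.asarray(X, dtype=float)
    # Standardize each descriptor so that PCA works on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    # The correlation matrix is symmetric, so eigh is appropriate.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # fraction of variance per PC
    scores = Z @ eigvecs                       # orthogonal PC descriptors
    return corr, explained, scores

def select_low_correlation(corr, threshold=0.9):
    """Greedily keep descriptors whose |r| with every kept one is below threshold."""
    keep = []
    for j in range(corr.shape[0]):
        if all(abs(corr[j, k]) < threshold for k in keep):
            keep.append(j)
    return keep

# Synthetic illustration (assumed data, not results from the study):
# descriptor 1 is almost a rescaled copy of descriptor 0; descriptor 2 is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
X_demo = np.column_stack([
    base[:, 0],
    2.0 * base[:, 0] + 0.01 * rng.normal(size=50),
    base[:, 1],
])

corr, explained, scores = correlation_and_pca(X_demo)
keep = select_low_correlation(corr, threshold=0.9)
print("explained variance ratio:", np.round(explained, 4))
print("retained descriptor columns:", keep)
```

In this toy setting the nearly duplicated descriptor is dropped by the greedy filter, while the PC scores provide an alternative set of mutually uncorrelated descriptors built from all columns.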