Background: The hybridization stability of single- and double-stranded DNA sequences has
been studied extensively, and its impact on bio-computing, bio-sensing and bio-quantification
technologies such as microarrays, real-time PCR and DNA sequencing is significant. In many
bioinformatics applications, DNA duplex hybridization is traditionally estimated using two approaches:
GC-content and melting temperature calculations based on the sequence base composition.
Objective: In this study we examine whether the two approaches are equivalent when estimating DNA
sequence hybridization, and we show that GC-content is a far-from-perfect predictor of DNA strand
hybridization strength when compared with experimentally determined melting temperatures.
Method: To test the assumption that the GC-content of a DNA strand is a good indicator of its melting
temperature, we formulate a research hypothesis and use the Pearson product-moment correlation
coefficient to measure the strength of the linear association between GC-content and melting temperature.
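
As a minimal sketch of the statistic named above, the Pearson product-moment correlation coefficient of two paired samples can be computed as follows. The function name pearson_r and its argument names are illustrative assumptions, not taken from the study; the inputs would be the GC-content values and the measured melting temperatures of the same strands.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two paired samples.

    Hypothetical helper for illustration: xs could hold GC-content
    values and ys the matching melting temperatures.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired observations are required")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations.
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))

    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / (dev_x * dev_y)
```

The coefficient ranges from -1 to 1. Values near 1 or -1 indicate a strong linear association, and values near 0 indicate a weak one.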
Results: We built a manually curated set of 373 experimental data points collected from 21
publications, each point representing a DNA strand with length between 4 and 35 nucleotides and its
corresponding experimentally determined melting temperature measured under specific sequence and
salt concentrations. For each data point we calculated the corresponding GC-content, and we separated
the set into 12 subsets to minimize the variability of experimental conditions within each subset.
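
The two preparation steps described above can be sketched as follows, under stated assumptions. GC-content is taken as the percentage of G and C bases in the strand. The grouping key "conditions" is a hypothetical placeholder for whatever experimental conditions define a subset, and the record layout is likewise illustrative rather than the format used in the study.

```python
from collections import defaultdict


def gc_content(sequence):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected bases: {sorted(invalid)}")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def split_by_conditions(records):
    """Group records that share the same experimental conditions.

    Each record is assumed to be a dict with a 'sequence' key and a
    hashable 'conditions' key (a hypothetical field for illustration).
    """
    subsets = defaultdict(list)
    for record in records:
        subsets[record["conditions"]].append(record)
    return dict(subsets)
```

Within each subset, the GC-content values can then be paired with the measured melting temperatures and passed to a correlation routine.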
Conclusion: Based on the calculated Pearson product-moment correlation coefficients, we conclude that
GC-content only seldom correlates well with experimentally determined melting temperatures, and thus
that it is not a strictly necessary constraint for controlling the uniformity of DNA strands.