Background: Residue-residue interactions play important roles in the functional and spatial relationships
of proteins. These interactions are usually related to the sequence and appear as close proximity
within the three-dimensional structure. In the past few years, identifying residue-residue contacts in proteins
has become an important prediction problem.
Objective: Many methods extract contact information from multiple sequence alignments (MSAs) of
homologous protein sequences. However, these methods need a large number of homologous
sequences, several thousand on average, for residue-residue contact prediction.
Method: In this article, we use both phylogenetic information and amino acid frequency to predict residue-residue contacts
from small MSAs. To better reflect evolutionary information, we combine the evolutionary distance
matrix with the similarity matrix and, based on amino acid frequency, produce a novel score that filters out some noise. We use
this information to estimate the correlation coefficient between each pair of sites in a target protein family, and extract
binding sites with high final correlation scores.
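The idea of correlating pairs of sites can be illustrated with a minimal sketch. This is not the scoring scheme of this article: it swaps in a plain residue-identity matrix per alignment column in place of the combined evolutionary distance and similarity matrices, and it omits the frequency-based noise filter. The function names `site_similarity_matrix` and `site_pair_correlation` and the toy alignment are hypothetical.

```python
import numpy as np

def site_similarity_matrix(column):
    """Sequence-by-sequence matrix for one alignment column:
    1.0 where two sequences share the same residue, else 0.0.
    (Assumption: a stand-in for the article's combined matrices.)"""
    col = np.array(list(column))
    return (col[:, None] == col[None, :]).astype(float)

def site_pair_correlation(msa, i, j):
    """Pearson correlation between the upper triangles of the
    per-site matrices of columns i and j."""
    n = len(msa)
    upper = np.triu_indices(n, k=1)  # each sequence pair counted once
    a = site_similarity_matrix([seq[i] for seq in msa])[upper]
    b = site_similarity_matrix([seq[j] for seq in msa])[upper]
    if a.std() == 0 or b.std() == 0:
        return 0.0  # a fully conserved column carries no covariation signal
    return float(np.corrcoef(a, b)[0, 1])

# Toy alignment (hypothetical): columns 0 and 1 vary together, column 2 is conserved.
toy_msa = ["ADC", "ADC", "GEC", "GEC"]
covarying = site_pair_correlation(toy_msa, 0, 1)
conserved = site_pair_correlation(toy_msa, 0, 2)
```

In this sketch, site pairs would be ranked by the returned correlation, and the highest-ranked pairs taken as candidate contacts.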
Results: First, we present a statistical analysis of the correlation relationships in residue-residue contacts. Second, we evaluate our
method on 150 benchmark proteins for residue-residue contact prediction. Third, we identify protein-protein interactions in bacterial
signal transduction. Experiments show that our method is effective in real applications.
Conclusion: When fewer protein sequences are available, experimental results confirm that our method performs better
than some currently popular methods. Because we reduce the number of homologous proteins required, the computing time
to construct phylogenetic trees decreases significantly. On 150 benchmark proteins, our method achieves overall precisions
of 68%, 64%, 54% and 45% for the top L/10, L/5, L/2 and L ranked predictions, respectively, where L is the protein length.
Our method outperforms normalized Mutual Information scoring with sequence weighting and the Bayesian approach of Burger &
van Nimwegen (B&vN).
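For readers unfamiliar with the top L/k metric, the following minimal sketch shows how such a precision is commonly computed: predicted residue pairs are ranked by score, the top L/k are kept (L being the protein length), and the fraction that are true contacts is reported. The function name `top_fraction_precision`, the input layout, and the toy data are assumptions for illustration, not taken from the article.

```python
def top_fraction_precision(scores, true_contacts, length, divisor):
    """Precision among the top (length // divisor) highest-scoring pairs.

    scores: dict mapping a residue pair (i, j) to its predicted score
    true_contacts: set of residue pairs that are contacts in the structure
    length: protein length L
    divisor: 10, 5, 2 or 1 for top L/10, L/5, L/2 or L
    """
    k = max(1, length // divisor)  # always keep at least one prediction
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)

# Toy example (hypothetical data) for a protein of length 10.
toy_scores = {(0, 5): 0.9, (1, 7): 0.8, (2, 9): 0.1}
toy_contacts = {(0, 5), (2, 9)}
precision_l10 = top_fraction_precision(toy_scores, toy_contacts, 10, 10)
precision_l5 = top_fraction_precision(toy_scores, toy_contacts, 10, 5)
```

Here the top L/10 list holds one pair, which is a true contact, while the top L/5 list holds two pairs, of which one is a true contact.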