A Simple Protein Evolutionary Classification Method Based on the Mutual Relations Between Protein Sequences

(E-pub Ahead of Print)

Author(s): Xiaogeng Wan*, Xinying Tan

Journal Name: Current Bioinformatics

Abstract:

Aims: This paper presents a simple and efficient method for protein evolutionary classification.

Background: Proteins are diverse in their sequences, structures and functions, and it is important to understand the relations among the three. Many methods have been developed for protein evolutionary classification, including machine learning methods such as LibSVM and feature methods such as the natural vector method and the protein map. Machine learning methods use pre-labeled training sets to classify protein sequences into disjoint classes. Feature methods such as the natural vector and the protein map convert protein sequences into feature vectors and build phylogenetic trees from the distances between those vectors.

Objective: In this paper, we propose a simple method that classifies the evolutionary relations of protein sequences using distance maps defined on the mutual relations between the sequences. The new method is unsupervised and model-free, and it is efficient for the evolutionary classification of proteins.

Method: To quantify the mutual relations and the homology of protein sequences, we compute normalized mutual information rates between the sequences. We define two distance maps that convert the normalized mutual information rates into 'distances', and we use UPGMA trees to present the resulting evolutionary classifications.
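
The pipeline described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the definition of the normalized mutual information rate or of the two distance maps, so the position-wise plug-in estimator `normalized_mi` and the maps `distance_linear` and `distance_log` below are hypothetical stand-ins chosen only to show the flow from similarity to distance to UPGMA tree.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(counts, total):
    """Shannon entropy in bits of an empirical distribution."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def normalized_mi(seq_a, seq_b):
    """Plug-in normalized mutual information between two sequences.

    Assumption: sequences are compared position by position after
    truncating to the shorter length. This is an illustrative estimator,
    not the mutual information rate used in the paper.
    """
    n = min(len(seq_a), len(seq_b))
    a, b = seq_a[:n], seq_b[:n]
    h_a = entropy(Counter(a), n)
    h_b = entropy(Counter(b), n)
    h_ab = entropy(Counter(zip(a, b)), n)
    denom = max(h_a, h_b)
    if denom == 0:
        return 0.0  # both sequences are constant: no information to share
    # Clamp to [0, 1] to absorb floating-point round-off.
    return min(1.0, max(0.0, (h_a + h_b - h_ab) / denom))

# Two hypothetical distance maps (assumed forms, not the paper's).
def distance_linear(nmi):
    return 1.0 - nmi

def distance_log(nmi, eps=1e-12):
    return -math.log(max(nmi, eps))

def upgma(labels, dist):
    """UPGMA clustering; dist maps frozenset({i, j}) to a distance.

    Returns the tree topology as nested tuples of labels.
    """
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (tree, size)
    d = dict(dist)
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        pair = min((frozenset(p) for p in combinations(clusters, 2)),
                   key=lambda p: d[p])
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Size-weighted average distance to every remaining cluster.
        for k in clusters:
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree

def build_tree(seqs, to_distance=distance_linear):
    """seqs: dict mapping a sequence name to its amino-acid string."""
    labels = list(seqs)
    dist = {
        frozenset((i, j)): to_distance(
            normalized_mi(seqs[labels[i]], seqs[labels[j]]))
        for i, j in combinations(range(len(labels)), 2)
    }
    return upgma(labels, dist)
```

Calling `build_tree` on a dictionary of named sequences returns a nested-tuple topology; passing `to_distance=distance_log` swaps in the second assumed map. Branch lengths are omitted to keep the sketch short.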

Result: We demonstrate the new method on four classical protein evolutionary classification examples and compare the results with traditional methods such as the natural vector and the protein map. We use the area under the precision-recall curve (AUPRC) to evaluate the classification quality of the new method and of the traditional methods. We found that the new method with the two distance maps classifies the classical examples efficiently and outperforms the natural vector and the protein map.
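
One way an AUPRC evaluation of a distance-based classification could be set up is sketched below. The abstract does not describe the authors' protocol, so the pairwise framing here is an assumption: every pair of sequences is treated as one prediction, a pair counts as positive when both sequences share a known family label, and a smaller distance means a more confident positive. The names `average_precision` and `pairwise_auprc` are hypothetical.

```python
from itertools import combinations

def average_precision(scores, labels):
    """Area under the precision-recall curve via the average-precision sum.

    Tied scores keep their input order, so results with many ties
    depend on that order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive pair")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, ap = 0, 0.0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos

def pairwise_auprc(families, distance):
    """families: name -> family label; distance: frozenset({a, b}) -> float."""
    scores, labels = [], []
    for a, b in combinations(sorted(families), 2):
        scores.append(-distance[frozenset((a, b))])  # closer pair, higher score
        labels.append(families[a] == families[b])
    return average_precision(scores, labels)
```

Under this framing, the same family labels can be scored against the distances produced by each method, which gives one number per method to compare.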

Conclusion: The normalized mutual information rates, combined with the two distance maps, are efficient for protein evolutionary classification and outperform some classical methods.

Other: The results are compared with traditional protein evolutionary classification methods such as the natural vector and the protein map, and AUPRC is computed for both the new method and the traditional methods to assess classification accuracy.

Keywords: Protein evolutionary classification, mutual information rate, protein sequence.

Article Details

DOI: 10.2174/1574893615666200305090055