Protein Sequence Annotation by Means of Community Detection
Pier Luigi Martelli,
In the postgenomic era, different computational procedures are available for protein sequence
annotation, the process of enriching any protein with structural and functional features after its electronic
translation from the corresponding gene or mRNA. The demand for reliable annotation systems is
particularly urgent given the volume of genomic data produced daily by next-generation
sequencing machines. In this paper we present a procedure that enhances the annotation performance
of the previously described Bologna Annotation Resource (BAR+). BAR+ is based on clustering graphs that represent
the similarity among a large number of protein sequences; here we apply community detection algorithms to detect
subclusters within each graph. When a cluster is endowed with specific Gene Ontology (GO) terms associated with both
Biological Process and Molecular Function, our procedure allows fine-tuning of the annotation
process and generates subclusters that group proteins sharing strictly related GO terms.
Keywords: Clustering, community detection, protein sequence annotation.
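The idea of splitting a similarity cluster into subclusters can be illustrated with a minimal sketch. It assumes a toy cluster of six hypothetical proteins (P1 to P6) whose edges mark pairs deemed similar, and it uses naive greedy modularity maximization as a stand-in community detection method; the abstract does not state which algorithms the authors apply, so the method, the names, and the data here are illustrative assumptions only.

```python
from itertools import combinations


def modularity(adj, communities):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    q = 0.0
    for comm in communities:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        degree = sum(len(adj[u]) for u in comm)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


def greedy_communities(adj):
    """Start from singletons and repeatedly apply the merge that most increases modularity."""
    communities = [frozenset([n]) for n in sorted(adj)]
    best_q = modularity(adj, communities)
    while len(communities) > 1:
        best_merge = None
        for a, b in combinations(communities, 2):
            candidate = [c for c in communities if c not in (a, b)] + [a | b]
            q = modularity(adj, candidate)
            if q > best_q + 1e-12:  # accept only strict improvements
                best_q, best_merge = q, candidate
        if best_merge is None:  # no merge improves modularity: stop
            break
        communities = best_merge
    return communities


# Hypothetical cluster: two tightly connected groups joined by a single edge.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
         ("P4", "P5"), ("P4", "P6"), ("P5", "P6"),
         ("P3", "P4")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

subclusters = greedy_communities(adj)
for sub in subclusters:
    print(sorted(sub))
```

In this toy graph the two triangles end up as separate subclusters, because merging them across the single bridging edge would lower modularity. In an annotation setting, each subcluster could then be checked for shared GO terms; that step is not shown here.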