Combinatorial Chemistry & High Throughput Screening

Rathnam Chaguturu 
iDDPartners, 3 Edith Court
Princeton Junction
NJ 08550
USA

Building a Biological Space Based on Protein Sequence Similarities and Biological Ontologies

Author(s): Paul Kersey, David Lonsdale, Nicky J. Mulder, Robert Petryszak and Rolf Apweiler

Affiliation: EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Keywords: CluSTr, clustering, genomes, GO, InterPro, metagenomes, orthology, paralogy

Abstract:

Assignment of function to protein sequence is a task of growing importance in the life sciences, as new high-throughput DNA sequencing technologies generate ever-increasing quantities of genomic and metagenomic data. Patterns within the sequence space, caused by the evolutionary conservation and assembly of protein domains, make it possible to infer function from sequence similarity. Clustering similar sequences is a useful technique for finding conserved sequences; the CluSTr database is a publicly available resource that arranges proteins in a hierarchy structured by similarity. The protein classification tool InterProScan builds on this approach by applying a range of methods to detect proteins that contain signatures indicative of the presence of particular conserved domains. The use of ontologies to describe protein function provides a flexible and abstract language for classifying proteins. Together, these techniques can provide an understanding of the shape of the protein space, and can be used to explore the uncharted waters of the emerging metagenomic world.
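
The idea of arranging proteins in a hierarchy structured by similarity can be illustrated with a minimal sketch. It assumes that pairwise similarity scores have already been computed and uses single-linkage grouping at several score thresholds. The linkage rule, the function names, the protein identifiers and the scores are all hypothetical choices made for illustration; the abstract does not specify how CluSTr builds its hierarchy.

```python
def single_linkage_clusters(items, scores, threshold):
    """Group items so that any pair scoring >= threshold shares a cluster.

    `scores` maps (item_a, item_b) tuples to a similarity score.
    Single linkage is an illustrative assumption, not the documented method.
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in clusters.values())


def similarity_hierarchy(items, scores, thresholds):
    """Return clusters at each threshold; stricter thresholds give finer clusters."""
    return {t: single_linkage_clusters(items, scores, t) for t in sorted(thresholds)}


# Hypothetical toy data: four proteins and made-up pairwise similarity scores.
proteins = ["P1", "P2", "P3", "P4"]
scores = {
    ("P1", "P2"): 90, ("P3", "P4"): 85, ("P2", "P3"): 40,
    ("P1", "P3"): 10, ("P1", "P4"): 5, ("P2", "P4"): 8,
}
hierarchy = similarity_hierarchy(proteins, scores, [30, 80, 100])
for threshold, clusters in hierarchy.items():
    print(threshold, clusters)
```

In this toy example, the loosest threshold merges all four proteins into one cluster, the intermediate threshold splits them into two pairs, and the strictest leaves each protein on its own. Because each stricter level only subdivides the clusters of the looser one, the levels nest into a hierarchy.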

Article Details

VOLUME: 11
ISSUE: 8
Page: [653 - 660]
Pages: 8
DOI: 10.2174/138620708785739925