Building a Biological Space Based on Protein Sequence Similarities and Biological Ontologies

Author(s): Paul Kersey, David Lonsdale, Nicky J. Mulder, Robert Petryszak, Rolf Apweiler

Journal Name: Combinatorial Chemistry & High Throughput Screening
Accelerated Technologies for Biotechnology, Bioassays, Medicinal Chemistry and Natural Products Research

Volume 11, Issue 8, 2008


Assignment of function to protein sequences is a task of growing importance in the life sciences, as new high-throughput DNA sequencing technologies generate ever-increasing quantities of genomic and metagenomic data. Patterns within the sequence space, caused by the evolutionary conservation and assembly of protein domains, make it possible to infer function from sequence similarity. Clustering similar sequences is a useful technique for finding conserved sequences; the CluSTr database is a publicly available database that arranges proteins in a hierarchy structured by similarity. The protein classification tool InterProScan builds on this approach by applying a range of methods to detect proteins that contain signatures indicative of the presence of particular conserved domains. The use of ontologies to describe protein function provides a flexible and abstract language for classifying proteins. Together, these techniques can provide an understanding of the shape of the protein space, and can be used to explore the uncharted waters of the emerging metagenomic world.
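
The idea of arranging proteins in a hierarchy structured by similarity can be illustrated with a minimal sketch. It is not the CluSTr method: the similarity score here is a toy stand-in (Python's `difflib.SequenceMatcher` ratio rather than an alignment score), the four sequences are invented, and the names `similarity`, `cluster` and `SEQS` are hypothetical. Clusters are formed by single linkage at a chosen threshold, and lowering the threshold merges clusters into larger ones, giving nested levels.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Toy similarity in [0, 1]; an assumed stand-in for a real alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: any pair scoring >= threshold is linked."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent pointers to the cluster representative (union-find).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if similarity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Sort clusters by their member names so the output order is stable.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Invented toy sequences, not real proteins.
SEQS = {
    "p1": "AAAAAAAAAA",
    "p2": "AAAAAAAAAC",
    "p3": "AAAAACCCCC",
    "p4": "GGGGGGGGGG",
}

if __name__ == "__main__":
    # Each lower threshold gives a coarser level of the hierarchy.
    for t in (0.8, 0.5, 0.0):
        print(t, cluster(SEQS, t))
```

With this toy data, `p1` and `p2` group together at the strictest threshold, `p3` joins them at the intermediate one, and all four sequences fall into a single cluster when the threshold is zero.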

Keywords: CluSTr, clustering, genomes, GO, InterPro, metagenomes, orthology, paralogy

Article Details

Year: 2008
Page: [653 - 660]
Pages: 8
DOI: 10.2174/138620708785739925