Search | BVS Regional Portal

Genome and proteome annotation: organization, interpretation and integration.

Reeves, Gabrielle A; Talavera, David; Thornton, Janet M.

J R Soc Interface ; 6(31): 129-47, 2009 Feb 06.

Article in English | MEDLINE | ID: mdl-19019817

ABSTRACT

Recent years have seen a huge increase in the generation of genomic and proteomic data. This has been due to improvements in current biological methodologies, the development of new experimental techniques and the use of computers as support tools. All these raw data are useless if they cannot be properly analysed, annotated, stored and displayed. Consequently, a vast number of resources have been created to present the data to the wider community. Annotation tools and databases provide the means to disseminate these data and to comprehend their biological importance. This review examines the various aspects of annotation: type, methodology and availability. Moreover, it places special emphasis on novel annotation fields, such as that of phenotypes, and highlights the recent efforts focused on integrating annotations.

Subjects

Genome; Genomics/methods; Proteome; Proteomics/methods; Animals; Databases, Genetic; Databases, Protein; Humans; Molecular Sequence Data

The Protein Feature Ontology: a tool for the unification of protein feature annotations.

Reeves, Gabrielle A; Eilbeck, Karen; Magrane, Michele; O'Donovan, Claire; Montecchi-Palazzi, Luisa; Harris, Midori A; Orchard, Sandra; Jimenez, Rafael C; Prlic, Andreas; Hubbard, Tim J P; Hermjakob, Henning; Thornton, Janet M.

Bioinformatics ; 24(23): 2767-72, 2008 Dec 01.

Article in English | MEDLINE | ID: mdl-18936051

ABSTRACT

MOTIVATION: The advent of sequencing and structural genomics projects has provided a dramatic boost in the number of uncharacterized protein structures and sequences. Consequently, many computational tools have been developed to help elucidate protein function. However, such services are spread throughout the world, often with standalone web pages. Integration of these methods is needed, and so far this has not been possible because no common vocabulary was available that could be used as a standard language. RESULTS: The Protein Feature Ontology has been developed to provide a structured controlled vocabulary for features on a protein sequence or structure. It comprises approximately 100 positional terms, now integrated into the Sequence Ontology (SO), and 40 non-positional terms which describe features relating to the whole-protein sequence. In addition, post-translational modifications are described by using a pre-existing ontology, the Protein Modification Ontology (MOD). This ontology is being used to integrate over 150 distinct annotations provided by the BioSapiens Network of Excellence, a consortium comprising 19 partner sites in Europe. AVAILABILITY: The Protein Feature Ontology can be browsed by accessing the ontology lookup service at the European Bioinformatics Institute (http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS).

Subjects

Computational Biology/methods; Proteins/chemistry; Software; Vocabulary, Controlled; Databases, Protein; Internet; Proteins/metabolism; Proteome/genetics

Integrating biological data--the Distributed Annotation System.

Jenkinson, Andrew M; Albrecht, Mario; Birney, Ewan; Blankenburg, Hagen; Down, Thomas; Finn, Robert D; Hermjakob, Henning; Hubbard, Tim J P; Jimenez, Rafael C; Jones, Philip; Kähäri, Andreas; Kulesha, Eugene; Macías, José R; Reeves, Gabrielle A; Prlic, Andreas.

BMC Bioinformatics ; 9 Suppl 8: S3, 2008 Jul 22.

Article in English | MEDLINE | ID: mdl-18673527

ABSTRACT

BACKGROUND: The Distributed Annotation System (DAS) is a widely adopted protocol for dynamically integrating a wide range of biological data from geographically diverse sources. DAS continues to expand its applicability and evolve in response to new challenges facing integrative bioinformatics. RESULTS: Here we describe the various infrastructure components of DAS and present a new extended version of the DAS specification. Version 1.53E incorporates several recent developments, including its extension to serve new data types and an ontology for protein features. CONCLUSION: Our extensions to the DAS protocol have facilitated the integration of new data types, and our improvements to the existing DAS infrastructure have addressed recent challenges. The steadily increasing number of available data sources demonstrates further adoption of the DAS protocol.

Subjects

Database Management Systems; Databases, Genetic; Information Storage and Retrieval/methods; Computational Biology/methods; Systems Integration
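
As a rough illustration of the kind of exchange the DAS abstract above describes, the following Python sketch composes a `features` query and parses a DASGFF-style XML reply. The server URL, source name, segment identifier, feature values and the sample XML are all hypothetical placeholders. The URL pattern and element names follow the general DAS 1.5x `features` layout as an assumption and should be checked against the specification before use.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_features_url(server, source, segment, start, stop):
    """Compose a DAS-style 'features' request URL for one segment range.

    Assumes the conventional <server>/das/<source>/features?segment=ID:start,stop
    pattern; the server and source names are supplied by the caller.
    """
    # Keep ':' and ',' unescaped so the segment range stays readable.
    query = urlencode({"segment": f"{segment}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


# Hypothetical reply used in place of a live server (illustrative values only).
SAMPLE_DASGFF = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features">
    <SEGMENT id="P00000" start="1" stop="200">
      <FEATURE id="feat1" label="example domain">
        <TYPE id="SO:0000417">polypeptide_domain</TYPE>
        <METHOD id="demo_method">demo_method</METHOD>
        <START>10</START>
        <END>50</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Extract segment id, feature id, type id and coordinates from a reply."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            type_el = feature.find("TYPE")
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type_id": type_el.get("id") if type_el is not None else None,
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


if __name__ == "__main__":
    print(build_features_url("http://example.org", "demo", "P00000", 1, 200))
    for feat in parse_features(SAMPLE_DASGFF):
        print(feat)
```

Parsing an embedded sample rather than calling a live server keeps the sketch self-contained; in practice the URL would be fetched and the response body passed to `parse_features`.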

The implications of alternative splicing in the ENCODE protein complement.

Tress, Michael L; Martelli, Pier Luigi; Frankish, Adam; Reeves, Gabrielle A; Wesselink, Jan Jaap; Yeats, Corin; Olason, Páll Isólfur; Albrecht, Mario; Hegyi, Hedi; Giorgetti, Alejandro; Raimondo, Domenico; Lagarde, Julien; Laskowski, Roman A; López, Gonzalo; Sadowski, Michael I; Watson, James D; Fariselli, Piero; Rossi, Ivan; Nagy, Alinda; Kai, Wang; Størling, Zenia; Orsini, Massimiliano; Assenov, Yassen; Blankenburg, Hagen; Huthmacher, Carola; Ramírez, Fidel; Schlicker, Andreas; Denoeud, France; Jones, Phil; Kerrien, Samuel; Orchard, Sandra; Antonarakis, Stylianos E; Reymond, Alexandre; Birney, Ewan; Brunak, Søren; Casadio, Rita; Guigo, Roderic; Harrow, Jennifer; Hermjakob, Henning; Jones, David T; Lengauer, Thomas; Orengo, Christine A; Patthy, László; Thornton, Janet M; Tramontano, Anna; Valencia, Alfonso.

Proc Natl Acad Sci U S A ; 104(13): 5495-500, 2007 Mar 27.

Article in English | MEDLINE | ID: mdl-17372197

ABSTRACT

Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.

Subjects

Alternative Splicing; RNA Precursors; Databases, Genetic; Gene Expression Regulation; Genome, Human; Humans; Internet; Models, Molecular; Protein Conformation; Protein Isoforms; Protein Sorting Signals; Protein Structure, Tertiary; Proteins/chemistry; RNA Splicing

Structural diversity of domain superfamilies in the CATH database.

Reeves, Gabrielle A; Dallman, Timothy J; Redfern, Oliver C; Akpor, Adrian; Orengo, Christine A.

J Mol Biol ; 360(3): 725-41, 2006 Jul 14.

Article in English | MEDLINE | ID: mdl-16780872

ABSTRACT

The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well populated domain structure superfamilies, each containing at least three sequence diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation but for a much larger dataset and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that for 56% of highly populated superfamilies (>9 sequence diverse relatives), there are twofold or more increases in the numbers of secondary structures in some relatives. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions or embellishments in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence they were often co-located in 3D resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur highly in the genomes more frequently exploit these types of embellishments to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets. Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).

Subjects

Databases, Protein; Protein Structure, Tertiary; Proteins/chemistry; Proteins/classification; Amino Acid Sequence; Azurin/chemistry; Carbohydrates/chemistry; Conserved Sequence; Galectins/chemistry; Laccase/chemistry; Mutation/genetics; Protein Structure, Secondary; Structural Homology, Protein

Integrating biological data through the genome.

Reeves, Gabrielle A; Thornton, Janet M.

Hum Mol Genet ; 15 Spec No 1: R81-7, 2006 Apr 15.

Article in English | MEDLINE | ID: mdl-16651373

ABSTRACT

Owing to the ongoing success of the genome sequencing and structural genomics projects, the increase in both sequence and structural data is rapid. The development of tools for the annotation of sequence and structural data has become more important in the hope of keeping up with this data explosion. Scientists in this field have addressed these issues over the last 10 years and there now exists a wealth of methods and approaches to help interpret these data. However, there is no current way in which these methods can be incorporated easily so that the resulting annotations can be viewed together. This review discusses the development of these annotation methods and introduces the BioSapiens Network of Excellence, which has been formed in order to integrate the methods which have been developed in Europe.

Subjects

Computational Biology/methods; Genome; Genomics/methods; Animals; Databases, Nucleic Acid; Humans; Models, Biological; Structure-Activity Relationship

Exploiting protein structure data to explore the evolution of protein function and biological complexity.

Marsden, Russell L; Ranea, Juan A G; Sillero, Antonio; Redfern, Oliver; Yeats, Corin; Maibaum, Michael; Lee, David; Addou, Sarah; Reeves, Gabrielle A; Dallman, Timothy J; Orengo, Christine A.

Philos Trans R Soc Lond B Biol Sci ; 361(1467): 425-40, 2006 Mar 29.

Article in English | MEDLINE | ID: mdl-16524831

ABSTRACT

New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.

Subjects

Evolution, Molecular; Proteins/chemistry; Proteins/metabolism; Algorithms; Computational Biology; Databases, Factual; Protein Conformation

The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis.

Pearl, Frances; Todd, Annabel; Sillitoe, Ian; Dibley, Mark; Redfern, Oliver; Lewis, Tony; Bennett, Christopher; Marsden, Russell; Grant, Alistair; Lee, David; Akpor, Adrian; Maibaum, Michael; Harrison, Andrew; Dallman, Timothy; Reeves, Gabrielle; Diboun, Ilhem; Addou, Sarah; Lise, Stefano; Johnston, Caroline; Sillero, Antonio; Thornton, Janet; Orengo, Christine.

Nucleic Acids Res ; 33(Database issue): D247-51, 2005 Jan 01.

Article in English | MEDLINE | ID: mdl-15608188

ABSTRACT

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43,229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616,470 domain sequences classified into 23,876 sequence families. This results in the significant expansion of the CATH HMM model library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.

Subjects

Databases, Nucleic Acid; Databases, Protein; Genomics; Protein Structure, Tertiary; Proteins/classification; Sequence Analysis, Protein; Databases, Protein/statistics & numerical data; Internet; Proteins/genetics; Sequence Homology, Amino Acid; Systems Integration; User-Computer Interface
