1.
Nucleic Acids Res ; 50(D1): D693-D700, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34755880

ABSTRACT

Rhea (https://www.rhea-db.org) is an expert-curated knowledgebase of biochemical reactions based on the chemical ontology ChEBI (Chemical Entities of Biological Interest) (https://www.ebi.ac.uk/chebi). In this paper, we describe a number of key developments in Rhea since our last report in the database issue of Nucleic Acids Research in 2019. These include improved reaction coverage in Rhea, the adoption of Rhea as the reference vocabulary for enzyme annotation in the UniProt knowledgebase UniProtKB (https://www.uniprot.org), the development of a new Rhea website, and the designation of Rhea as an ELIXIR Core Data Resource. We hope that these and other developments will enhance the utility of Rhea as a reference resource to study and engineer enzymes and the metabolic systems in which they function.


Subjects
Chemical Phenomena; Databases, Factual; Software; Animals; Humans; Internet; Knowledge Bases
2.
Bioinformatics ; 36(6): 1896-1901, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31688925

ABSTRACT

MOTIVATION: To provide high-quality, computationally tractable enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions that describes reaction participants using the ChEBI (Chemical Entities of Biological Interest) ontology. RESULTS: We replaced existing textual descriptions of biochemical reactions in UniProtKB with their equivalents from Rhea, which is now the standard for annotation of enzymatic reactions in UniProtKB. We developed improved search and query facilities for the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that Rhea and ChEBI provide. AVAILABILITY AND IMPLEMENTATION: UniProtKB at https://www.uniprot.org; UniProt REST API at https://www.uniprot.org/help/api; UniProt SPARQL endpoint at https://sparql.uniprot.org/; Rhea at https://www.rhea-db.org.


Subjects
Rheiformes; Animals; Databases, Protein; Knowledge Bases
3.
Nucleic Acids Res ; 47(D1): D596-D600, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30272209

ABSTRACT

Rhea (http://www.rhea-db.org) is a comprehensive and non-redundant resource of over 11 000 expert-curated biochemical reactions that uses chemical entities from the ChEBI ontology to represent reaction participants. Originally designed as an annotation vocabulary for the UniProt Knowledgebase (UniProtKB), Rhea also provides reaction data for a range of other core knowledgebases and data repositories including ChEBI and MetaboLights. Here we describe recent developments in Rhea, focusing on a new Resource Description Framework (RDF) representation of Rhea reaction data and a SPARQL endpoint (https://sparql.rhea-db.org/sparql) that provides access to it. We demonstrate how federated queries that combine the Rhea SPARQL endpoint and other SPARQL endpoints such as that of UniProt can provide improved metabolite annotation and support integrative analyses that link the metabolome through the proteome to the transcriptome and genome. These developments will significantly boost the utility of Rhea as a means to link chemistry and biology for a more holistic understanding of biological systems and their function in health and disease.


Subjects
Databases, Chemical; Databases, Protein; Metabolomics/methods; Software/standards; Humans; Knowledge Bases; Systems Biology/methods
4.
Nucleic Acids Res ; 45(D1): D415-D418, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27789701

ABSTRACT

Rhea (http://www.rhea-db.org) is a comprehensive and non-redundant resource of expert-curated biochemical reactions designed for the functional annotation of enzymes and the description of metabolic networks. Rhea describes enzyme-catalyzed reactions covering the IUBMB Enzyme Nomenclature list as well as additional reactions, including spontaneously occurring reactions, using entities from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Here we describe developments in Rhea since our last report in the database issue of Nucleic Acids Research. These include the first implementation of a simple hierarchical classification of reactions, improved coverage of the IUBMB Enzyme Nomenclature list and additional reactions through continuing expert curation, and the development of a new website to serve this improved dataset.

5.
Nucleic Acids Res ; 44(D1): D523-6, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527720

ABSTRACT

MetaNetX is a repository of genome-scale metabolic networks (GSMNs) and biochemical pathways from a number of major resources imported into a common namespace of chemical compounds, reactions, cellular compartments (namely MNXref) and proteins. The MetaNetX.org website (http://www.metanetx.org/) provides access to these integrated data as well as a variety of tools that allow users to import their own GSMNs, map them to the MNXref reconciliation, and manipulate, compare, analyze, simulate (using flux balance analysis) and export the resulting GSMNs. MNXref and MetaNetX are regularly updated and freely available.


Subjects
Databases, Chemical; Genome; Metabolic Networks and Pathways/genetics; Molecular Structure; Software
6.
Nucleic Acids Res ; 43(Database issue): D459-64, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25332395

ABSTRACT

Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive and non-redundant resource of expert-curated biochemical reactions described using species from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Rhea has been designed for the functional annotation of enzymes and the description of genome-scale metabolic networks, providing stoichiometrically balanced enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list and additional reactions), transport reactions and spontaneously occurring reactions. Rhea reactions are extensively curated with links to source literature and are mapped to other publicly available enzyme and pathway databases such as Reactome, BioCyc, KEGG and UniPathway, through manual curation and computational methods. Here we describe developments in Rhea since our last report in the 2012 database issue of Nucleic Acids Research. These include significant growth in the number of Rhea reactions and the inclusion of reactions involving complex macromolecules such as proteins, nucleic acids and other polymers that lie outside the scope of ChEBI. Together these developments will significantly increase the utility of Rhea as a tool for the description, analysis and reconciliation of genome-scale metabolic models.


Subjects
Databases, Chemical; Enzymes/metabolism; Metabolic Networks and Pathways; Biochemical Phenomena; Biopolymers/metabolism; Genomics; Internet; Metabolic Networks and Pathways/genetics
7.
Environ Microbiol ; 18(10): 3403-3424, 2016 10.
Article in English | MEDLINE | ID: mdl-26913973

ABSTRACT

By the time the complete genome sequence of the soil bacterium Pseudomonas putida KT2440 was published in 2002 (Nelson et al., ) this bacterium was considered a potential agent for environmental bioremediation of industrial waste and a good colonizer of the rhizosphere. However, neither the annotation tools available at that time nor the scarcely available omics data, let alone metabolic modeling and other now-common systems biology approaches, allowed the authors to anticipate the astonishing capacities that are encoded in the genetic complement of this unique microorganism. In this work we have adopted a suite of state-of-the-art genomic analysis tools to revisit the functional and metabolic information encoded in the chromosomal sequence of strain KT2440. We identified 242 new protein-coding genes and re-annotated the functions of 1548 genes, which are linked to almost 4900 PubMed references. Catabolic pathways for 92 compounds (carbon, nitrogen and phosphorus sources) that could not be accommodated by the previously constructed metabolic models were also predicted. The resulting examination not only accounts for some of the stress tolerance traits known in P. putida but also recognizes the capacity of this bacterium to perform difficult redox reactions, thereby multiplying its value as a platform microorganism for industrial biotechnology.


Subjects
Genome, Bacterial; Pseudomonas putida/genetics; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Carbon/metabolism; Genomics; Nitrogen/metabolism; Pseudomonas putida/metabolism
8.
Brief Bioinform ; 15(1): 123-35, 2014 Jan.
Article in English | MEDLINE | ID: mdl-23172809

ABSTRACT

Genome-scale metabolic network reconstructions are now routinely used in the study of metabolic pathways, their evolution and design. The development of such reconstructions involves the integration of information on reactions and metabolites from the scientific literature as well as public databases and existing genome-scale metabolic models. The reconciliation of discrepancies between data from these sources generally requires significant manual curation, which constitutes a major obstacle in efforts to develop and apply genome-scale metabolic network reconstructions. In this work, we discuss some of the major difficulties encountered in the mapping and reconciliation of metabolic resources and review three recent initiatives that aim to accelerate this process, namely BKM-react, MetRxn and MNXref (presented in this article). Each of these resources provides a pre-compiled reconciliation of many of the most commonly used metabolic resources. By reducing the time required for manual curation of metabolite and reaction discrepancies, these resources aim to accelerate the development and application of high-quality genome-scale metabolic network reconstructions and models.


Subjects
Metabolic Networks and Pathways; Computational Biology; Computer Simulation; Databases, Factual/statistics & numerical data; Genomics/statistics & numerical data; Metabolic Networks and Pathways/genetics; Models, Biological; Molecular Structure; Software
9.
Nucleic Acids Res ; 40(Database issue): D761-9, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22102589

ABSTRACT

UniPathway (http://www.unipathway.org) is a fully manually curated resource for the representation and annotation of metabolic pathways. UniPathway provides explicit representations of enzyme-catalyzed and spontaneous chemical reactions, as well as a hierarchical representation of metabolic pathways. This hierarchy uses linear subpathways as the basic building block for the assembly of larger and more complex pathways, including species-specific pathway variants. All of the pathway data in UniPathway has been extensively cross-linked to existing pathway resources such as KEGG and MetaCyc, as well as sequence resources such as the UniProt KnowledgeBase (UniProtKB), for which UniPathway provides a controlled vocabulary for pathway annotation. We introduce here the basic concepts underlying the UniPathway resource, with the aim of allowing users to fully exploit the information provided by UniPathway.


Subjects
Databases, Factual; Metabolic Networks and Pathways; Databases, Protein; Enzymes/metabolism; Lysine/biosynthesis; Molecular Sequence Annotation
11.
Nucleic Acids Res ; 40(Database issue): D754-60, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22135291

ABSTRACT

Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive resource of expert-curated biochemical reactions. Rhea provides a non-redundant set of chemical transformations for use in a broad spectrum of applications, including metabolic network reconstruction and pathway inference. Rhea includes enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list), transport reactions and spontaneously occurring reactions. Rhea reactions are described using chemical species from the Chemical Entities of Biological Interest ontology (ChEBI) and are stoichiometrically balanced for mass and charge. They are extensively manually curated with links to source literature and other public resources on metabolism including enzyme and pathway databases. This cross-referencing facilitates the mapping and reconciliation of common reactions and compounds between distinct resources, which is a common first step in the reconstruction of genome scale metabolic networks and models.


Subjects
Biochemical Phenomena; Databases, Factual; Enzymes/metabolism; Internet; Metabolic Networks and Pathways; Software
12.
Sci Data ; 11(1): 982, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39251610

ABSTRACT

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts where enzymes and the chemical reactions they catalyze are annotated using identifiers from the protein knowledgebase UniProtKB and the chemical ontology ChEBI. We show that fine-tuning language models with EnzChemRED significantly boosts their ability to identify proteins and chemicals in text (86.30% F1 score) and to extract the chemical conversions (86.66% F1 score) and the enzymes that catalyze those conversions (83.79% F1 score). We apply our methods to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea.


Subjects
Enzymes; Natural Language Processing; Enzymes/chemistry; PubMed; Databases, Protein; Knowledge Bases
13.
ArXiv ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38903736

ABSTRACT

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 score of 86.30% for NER, 86.66% for RE for chemical conversion pairs, and 83.79% for RE for chemical conversion pairs and linked enzymes. We combine the best performing methods after fine-tuning using EnzChemRED to create an end-to-end pipeline for knowledge extraction from text and apply this to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea. The EnzChemRED corpus is freely available at https://ftp.expasy.org/databases/rhea/nlp/.
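
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score is derived from counts of correct and incorrect extractions. It uses the standard definition (harmonic mean of precision and recall); the function name and the example counts are hypothetical and are not taken from the EnzChemRED evaluation itself.

```python
from typing import Tuple


def precision_recall_f1(true_positives: int,
                        false_positives: int,
                        false_negatives: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw prediction counts.

    Hypothetical helper for illustration only; the counts would come from
    comparing predicted entities or relations against expert annotations.
    """
    predicted = true_positives + false_positives  # everything the model proposed
    actual = true_positives + false_negatives     # everything the curators annotated
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts: 8 correct extractions, 2 spurious, 2 missed.
    p, r, f1 = precision_recall_f1(8, 2, 2)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays low if either precision or recall is low, which is why it is commonly reported for extraction tasks such as NER and RE.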

14.
Microbiology (Reading) ; 159(Pt 4): 757-770, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23429746

ABSTRACT

Continuous updating of the genome sequence of Bacillus subtilis, the model of the Firmicutes, is a basic requirement needed by the biology community. In this work new genomic objects have been included (toxin/antitoxin genes and small RNA genes) and the metabolic network has been entirely updated. The curated view of the validated metabolic pathways present in the organism as of 2012 shows several significant differences from pathways present in the other bacterial reference, Escherichia coli: variants in synthesis of cofactors (thiamine, biotin, bacillithiol), amino acids (lysine, methionine), branched-chain fatty acids, tRNA modification and RNA degradation. In this new version, gene products that are enzymes or transporters are explicitly linked to the biochemical reactions of the RHEA reaction resource (http://www.ebi.ac.uk/rhea/), while novel compound entries have been created in the database Chemical Entities of Biological Interest (http://www.ebi.ac.uk/chebi/). The newly annotated sequence is deposited at the International Nucleotide Sequence Data Collaboration with accession number AL009126.4.


Subjects
Bacillus subtilis/metabolism; Bacterial Proteins/metabolism; Genome, Bacterial; Metabolic Networks and Pathways/genetics; Bacillus subtilis/genetics; Bacterial Proteins/genetics; Genomics; Molecular Sequence Annotation; Molecular Sequence Data; Sequence Analysis, DNA
15.
Database (Oxford) ; 2022, 2022 08 12.
Article in English | MEDLINE | ID: mdl-35961013

ABSTRACT

Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.


Subjects
Genomics; Proteins; Base Sequence; Computational Biology; Genome; Molecular Sequence Annotation
16.
Metabolites ; 11(1), 2021 Jan 12.
Article in English | MEDLINE | ID: mdl-33445429

ABSTRACT

The UniProt Knowledgebase (UniProtKB) is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support both for the study of natural products and for their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.

17.
Infect Genet Evol ; 8(4): 459-66, 2008 Jul.
Article in English | MEDLINE | ID: mdl-17644446

ABSTRACT

Ehrlichia ruminantium is the causative agent of heartwater, a major tick-borne disease of livestock in Africa introduced in the Caribbean and threatening to emerge and spread in the American mainland. Complete genome sequencing was done for two isolates of E. ruminantium of differing phenotype, isolates Gardel (Erga) from Guadeloupe Island and Welgevonden (Erwe) originating from South Africa and maintained in Guadeloupe. The type strain of E. ruminantium (Erwo), previously isolated and sequenced in South Africa, is identical to Erwe with respect to target genes; together they form the Erwe/Erwo complex. Comparative analysis of the genomes shows the presence of 49 unique CDS and 28 truncated CDS differentiating Erga from Erwe/Erwo. Three regions of accumulated differences (RAD) acting as mutational hot spots were identified in E. ruminantium. Ten CDS (six unique and four truncated) corresponding to major genomic changes (deletions or extensive mutations) were considered as targets for differential diagnosis on four isolates of E. ruminantium: Erga, Erwe/Erwo, Senegal and Umpala. Pairs of PCR primers were developed for each target gene. PCR analysis of the target genes generated strain-specific patterns on Erga and Erwe/Erwo as predicted by comparative genomics, but also for isolates Senegal and Umpala. The target genes identified by bacterial comparative genomics are shown to be highly efficient for strain-specific PCR diagnosis of E. ruminantium and further vaccine management tools.


Subjects
Ehrlichia ruminantium/isolation & purification; Heartwater Disease/diagnosis; Heartwater Disease/microbiology; Animals; Cattle; Cattle Diseases/diagnosis; Cattle Diseases/microbiology; Cells, Cultured; DNA, Bacterial/analysis; DNA, Bacterial/isolation & purification; Ehrlichia ruminantium/genetics; Female; Genome, Bacterial; Geography; Goats; Mice; Sheep; Species Specificity
18.
Ann N Y Acad Sci ; 1081: 417-33, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17135545

ABSTRACT

The tick-borne bacterium Ehrlichia ruminantium (E. ruminantium), a member of the Rickettsiales, is the causative agent of heartwater in Africa and the Caribbean. Heartwater, responsible for major livestock losses in Africa, also represents a threat to the American mainland. Three complete genomes corresponding to two different groups of differing phenotypes, Gardel and Welgevonden, have been recently described. One genome (Erga) represents the Gardel group from Guadeloupe Island and two genomes (Erwo and Erwe) belong to the Welgevonden group. Erwo, isolated in South Africa, is the parental strain of Erwe, which was maintained for 18 years in Guadeloupe under different culture conditions than Erwo. The three strains display genomes of differing sizes with 1,499,920 bp, 1,512,977 bp, and 1,516,355 bp for Erga, Erwe, and Erwo, respectively. Gene sequences and order are highly conserved between the three strains, although several gene truncations could be pinpointed, most of them occurring within three regions of accumulated differences (RAD). E. ruminantium displays a strong leading/lagging compositional bias inducing a strand-specific codon usage. Finally, a striking feature of E. ruminantium is the presence of long intergenic regions containing tandem repeats. These repeats are at the origin of an active process, specific to E. ruminantium, of genome expansion/contraction based on the addition or removal of tandem units.


Subjects
Ehrlichia ruminantium/genetics; Evolution, Molecular; Genome, Bacterial; Tandem Repeat Sequences/genetics; Animals; Conserved Sequence; Molecular Sequence Data; Molecular Weight; Species Specificity
19.
Curr Opin Drug Discov Devel ; 6(3): 346-52, 2003 May.
Article in English | MEDLINE | ID: mdl-12833667

ABSTRACT

The development of genomic and post-genomic technologies has created an explosion in the quantity, diversity and availability of both biological data and methods of analysis. Biologists are currently facing the problem of using all these resources to convert raw data into new valuable knowledge. This review presents software platforms designed to handle data and/or methods in the context of genome analysis.


Subjects
Database Management Systems; Databases, Genetic; Genome; Sequence Analysis, DNA/methods; Animals; Database Management Systems/trends; Databases, Genetic/trends; Genome, Human; Humans; Sequence Analysis, DNA/trends
20.
J Bacteriol ; 188(7): 2533-42, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16547041

ABSTRACT

Ehrlichia ruminantium is the causative agent of heartwater, a major tick-borne disease of livestock in Africa that has been introduced in the Caribbean and is threatening to emerge and spread on the American mainland. We sequenced the complete genomes of two strains of E. ruminantium of differing phenotypes, strains Gardel (Erga; 1,499,920 bp), from the island of Guadeloupe, and Welgevonden (Erwe; 1,512,977 bp), originating in South Africa and maintained in Guadeloupe in a different cell environment. Comparative genomic analysis of these two strains was performed with the recently published parent strain of Erwe (Erwo) and other Rickettsiales (Anaplasma, Wolbachia, and Rickettsia spp.). Gene order is highly conserved between the E. ruminantium strains and with A. marginale. In contrast, there is very little conservation of gene order with members of the Rickettsiaceae. However, gene order may be locally conserved, as illustrated by the tuf operons. Eighteen truncated protein-encoding sequences (CDSs) differentiate Erga from Erwe/Erwo, whereas four other truncated CDSs differentiate Erwe from Erwo. Moreover, E. ruminantium displays the lowest coding ratio observed among bacteria due to unusually long intergenic regions. This is related to an active process of genome expansion/contraction targeted at tandem repeats in noncoding regions and based on the addition or removal of ca. 150-bp tandem units. This process seems to be specific to E. ruminantium and is not observed in the other Rickettsiales.


Subjects
Ehrlichia ruminantium/classification; Ehrlichia ruminantium/genetics; Evolution, Molecular; Genetic Variation/genetics; Genome, Bacterial; Mutagenesis/genetics; Conserved Sequence; Gene Order; Molecular Sequence Data; Phenotype; Species Specificity