1.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36484697

ABSTRACT

MOTIVATION: To provide high quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small molecule ligands. RESULTS: We structured the data model for cognate ligand binding site annotations in UniProtKB and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from ChEBI, which we now use as the reference vocabulary for all such annotations. We developed improved search and query facilities for cognate ligands in the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that ChEBI provides. AVAILABILITY AND IMPLEMENTATION: Binding site annotations for cognate ligands described using ChEBI are available for UniProtKB protein sequence records in several formats (text, XML and RDF) and are freely available to query and download through the UniProt website (www.uniprot.org), REST API (www.uniprot.org/help/api), SPARQL endpoint (sparql.uniprot.org/) and FTP site (https://ftp.uniprot.org/pub/databases/uniprot/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
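ChEBI identifiers are stable accessions written as `CHEBI:<number>`. As a minimal sketch, assuming the OBO Foundry PURL form `http://purl.obolibrary.org/obo/CHEBI_<number>` is the IRI used for ChEBI terms in RDF, a CURIE can be validated and expanded as below. The function names are hypothetical and are not part of any UniProt or ChEBI library.

```python
import re

# ChEBI CURIEs look like "CHEBI:15377"; the numeric part is the stable accession.
_CHEBI_CURIE = re.compile(r"^CHEBI:(\d+)$")

# Assumed IRI base: the OBO Foundry PURL pattern for ChEBI terms.
_OBO_BASE = "http://purl.obolibrary.org/obo/CHEBI_"


def chebi_curie_to_iri(curie: str) -> str:
    """Expand a ChEBI CURIE such as 'CHEBI:15377' into its OBO PURL IRI."""
    match = _CHEBI_CURIE.match(curie.strip())
    if match is None:
        raise ValueError(f"not a ChEBI CURIE: {curie!r}")
    return _OBO_BASE + match.group(1)


def chebi_iri_to_curie(iri: str) -> str:
    """Reverse mapping: OBO PURL IRI back to a 'CHEBI:<n>' CURIE."""
    suffix = iri[len(_OBO_BASE):]
    # An empty or non-numeric suffix is rejected ("".isdigit() is False).
    if not iri.startswith(_OBO_BASE) or not suffix.isdigit():
        raise ValueError(f"not a ChEBI OBO IRI: {iri!r}")
    return "CHEBI:" + suffix


if __name__ == "__main__":
    # CHEBI:15377 is water.
    print(chebi_curie_to_iri("CHEBI:15377"))
```

Keeping the numeric accession as the only variable part means the same identifier can be used unchanged in text, XML and RDF serializations.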


Subject(s)
Knowledge Bases , Databases, Protein , Ligands , Amino Acid Sequence , Binding Sites , Molecular Sequence Annotation
2.
Nucleic Acids Res ; 50(D1): D693-D700, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34755880

ABSTRACT

Rhea (https://www.rhea-db.org) is an expert-curated knowledgebase of biochemical reactions based on the chemical ontology ChEBI (Chemical Entities of Biological Interest) (https://www.ebi.ac.uk/chebi). In this paper, we describe a number of key developments in Rhea since our last report in the database issue of Nucleic Acids Research in 2019. These include improved reaction coverage in Rhea, the adoption of Rhea as the reference vocabulary for enzyme annotation in the UniProt knowledgebase UniProtKB (https://www.uniprot.org), the development of a new Rhea website, and the designation of Rhea as an ELIXIR Core Data Resource. We hope that these and other developments will enhance the utility of Rhea as a reference resource to study and engineer enzymes and the metabolic systems in which they function.


Subject(s)
Chemical Phenomena , Databases, Factual , Software , Animals , Humans , Internet , Knowledge Bases
3.
Brief Bioinform ; 22(2): 642-663, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33147627

ABSTRACT

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.


Subject(s)
COVID-19/prevention & control , Computational Biology , SARS-CoV-2/isolation & purification , Biomedical Research , COVID-19/epidemiology , COVID-19/virology , Genome, Viral , Humans , Pandemics , SARS-CoV-2/genetics
4.
Bioinformatics ; 36(6): 1896-1901, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31688925

ABSTRACT

MOTIVATION: To provide high quality computationally tractable enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions which describes reaction participants using the ChEBI (Chemical Entities of Biological Interest) ontology. RESULTS: We replaced existing textual descriptions of biochemical reactions in UniProtKB with their equivalents from Rhea, which is now the standard for annotation of enzymatic reactions in UniProtKB. We developed improved search and query facilities for the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that Rhea and ChEBI provide. AVAILABILITY AND IMPLEMENTATION: UniProtKB at https://www.uniprot.org; UniProt REST API at https://www.uniprot.org/help/api; UniProt SPARQL endpoint at https://sparql.uniprot.org/; Rhea at https://www.rhea-db.org.
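Because UniProtKB catalytic-activity annotations now point at Rhea reactions, they can be retrieved by reaction through the SPARQL endpoint. The sketch below only builds the query and the SPARQL 1.1 Protocol GET URL; it does not contact the server. The `up:` class and property names and the `http://rdf.rhea-db.org/<number>` IRI form are assumptions modelled on UniProt's published example queries and should be checked against them, and `RHEA:10000` is a placeholder accession.

```python
import re
from urllib.parse import urlencode

# UniProt SPARQL endpoint named in the text; the "/sparql" path is assumed.
UNIPROT_SPARQL = "https://sparql.uniprot.org/sparql"

# Assumption: catalytic-activity annotations link to Rhea reaction IRIs
# via up:catalyticActivity / up:catalyzedReaction.
QUERY_TEMPLATE = """PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein WHERE {{
  ?protein up:annotation ?annotation .
  ?annotation a up:Catalytic_Activity_Annotation ;
              up:catalyticActivity ?activity .
  ?activity up:catalyzedReaction <http://rdf.rhea-db.org/{rhea_number}> .
}}"""


def proteins_for_rhea_query(rhea_id: str) -> str:
    """Build a SPARQL query listing UniProtKB entries annotated with a Rhea ID."""
    match = re.fullmatch(r"RHEA:(\d+)", rhea_id.strip())
    if match is None:
        raise ValueError(f"not a Rhea identifier: {rhea_id!r}")
    # Doubled braces in the template become literal SPARQL braces here.
    return QUERY_TEMPLATE.format(rhea_number=match.group(1))


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    q = proteins_for_rhea_query("RHEA:10000")  # placeholder accession
    print(sparql_get_url(UNIPROT_SPARQL, q))
```

The resulting URL can be fetched with any HTTP client; adding an `Accept` header for a results format such as JSON or TSV is left to the caller.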


Subject(s)
Rheiformes , Animals , Databases, Protein , Knowledge Bases
6.
Nucleic Acids Res ; 47(D1): D596-D600, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30272209

ABSTRACT

Rhea (http://www.rhea-db.org) is a comprehensive and non-redundant resource of over 11 000 expert-curated biochemical reactions that uses chemical entities from the ChEBI ontology to represent reaction participants. Originally designed as an annotation vocabulary for the UniProt Knowledgebase (UniProtKB), Rhea also provides reaction data for a range of other core knowledgebases and data repositories including ChEBI and MetaboLights. Here we describe recent developments in Rhea, focusing on a new Resource Description Framework (RDF) representation of Rhea reaction data and a SPARQL endpoint (https://sparql.rhea-db.org/sparql) that provides access to it. We demonstrate how federated queries that combine the Rhea SPARQL endpoint with other SPARQL endpoints, such as that of UniProt, can provide improved metabolite annotation and support integrative analyses that link the metabolome through the proteome to the transcriptome and genome. These developments will significantly boost the utility of Rhea as a means to link chemistry and biology for a more holistic understanding of biological systems and their function in health and disease.
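A federated query of the kind described above can be expressed with the standard SPARQL 1.1 `SERVICE` keyword: the Rhea endpoint evaluates the reaction part and delegates the protein part to UniProt. This is a sketch that only assembles the query text and GET URL. The `rh:` path (`side/contains/compound/chebi`) and the `up:` path are assumptions based on the public example queries of both endpoints and should be verified there before use.

```python
import re
from urllib.parse import urlencode

# Rhea endpoint as given in the abstract; UniProt endpoint path is assumed.
RHEA_SPARQL = "https://sparql.rhea-db.org/sparql"
UNIPROT_SPARQL = "https://sparql.uniprot.org/sparql"

# Assumed vocabularies: Rhea's rh: namespace and UniProt's up: namespace.
FEDERATED_TEMPLATE = """PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX CHEBI: <http://purl.obolibrary.org/obo/CHEBI_>
SELECT DISTINCT ?reaction ?protein WHERE {{
  # Evaluated by Rhea: reactions with the compound on either side.
  ?reaction rdfs:subClassOf rh:Reaction ;
            rh:side/rh:contains/rh:compound/rh:chebi CHEBI:{chebi_number} .
  # Delegated to UniProt: proteins whose catalytic activity is that reaction.
  SERVICE <{uniprot}> {{
    ?protein up:annotation/up:catalyticActivity/up:catalyzedReaction ?reaction .
  }}
}}"""


def federated_query(chebi_curie: str) -> str:
    """Build a Rhea-to-UniProt federated query for one ChEBI compound."""
    match = re.fullmatch(r"CHEBI:(\d+)", chebi_curie.strip())
    if match is None:
        raise ValueError(f"not a ChEBI CURIE: {chebi_curie!r}")
    return FEDERATED_TEMPLATE.format(chebi_number=match.group(1), uniprot=UNIPROT_SPARQL)


def rhea_get_url(query: str) -> str:
    """Encode the query for a SPARQL 1.1 Protocol GET against Rhea."""
    return RHEA_SPARQL + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(rhea_get_url(federated_query("CHEBI:15377")))  # CHEBI:15377 is water
```

Sending the query to Rhea rather than UniProt keeps the selective chemistry filter local, so only matching reaction IRIs need to be joined against the remote service.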


Subject(s)
Databases, Chemical , Databases, Protein , Metabolomics/methods , Software/standards , Humans , Knowledge Bases , Systems Biology/methods
7.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.
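The abstract distinguishes sequence coverage (whether a protein has any InterPro match) from residue coverage (how much of each sequence is covered). One plausible way to compute both, assuming matches are given as 1-based inclusive (start, end) intervals, is sketched below: overlapping matches are merged first so that no residue is counted twice. This is an illustration, not InterPro's published procedure.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def merge_intervals(matches: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent match intervals into disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(matches):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def residue_coverage(length: int, matches: Iterable[Interval]) -> float:
    """Fraction of residues in a sequence of `length` covered by any match."""
    if length <= 0:
        raise ValueError("sequence length must be positive")
    covered = 0
    for start, end in merge_intervals(matches):
        # Clip each merged interval to the sequence bounds.
        start, end = max(start, 1), min(end, length)
        if start <= end:
            covered += end - start + 1
    return covered / length


def sequence_coverage(matches_per_protein: Dict[str, List[Interval]]) -> float:
    """Fraction of proteins with at least one match."""
    if not matches_per_protein:
        return 0.0
    hit = sum(1 for matches in matches_per_protein.values() if matches)
    return hit / len(matches_per_protein)


if __name__ == "__main__":
    # Hypothetical 100-residue protein with two overlapping matches and one more.
    print(residue_coverage(100, [(1, 30), (20, 50), (80, 90)]))  # 0.61
```

In the example, the overlapping matches merge into residues 1 to 50, which together with residues 80 to 90 cover 61 of 100 positions.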


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface
8.
Nucleic Acids Res ; 45(D1): D415-D418, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27789701

ABSTRACT

Rhea (http://www.rhea-db.org) is a comprehensive and non-redundant resource of expert-curated biochemical reactions designed for the functional annotation of enzymes and the description of metabolic networks. Rhea describes enzyme-catalyzed reactions covering the IUBMB Enzyme Nomenclature list as well as additional reactions, including spontaneously occurring reactions, using entities from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Here we describe developments in Rhea since our last report in the database issue of Nucleic Acids Research. These include the first implementation of a simple hierarchical classification of reactions, improved coverage of the IUBMB Enzyme Nomenclature list and additional reactions through continuing expert curation, and the development of a new website to serve this improved dataset.

9.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Interaction Domains and Motifs , Software , Humans , Molecular Sequence Annotation , Phylogeny
10.
Nucleic Acids Res ; 43(Database issue): D459-64, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25332395

ABSTRACT

Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive and non-redundant resource of expert-curated biochemical reactions described using species from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Rhea has been designed for the functional annotation of enzymes and the description of genome-scale metabolic networks, providing stoichiometrically balanced enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list and additional reactions), transport reactions and spontaneously occurring reactions. Rhea reactions are extensively curated with links to source literature and are mapped to other publicly available enzyme and pathway databases such as Reactome, BioCyc, KEGG and UniPathway, through manual curation and computational methods. Here we describe developments in Rhea since our last report in the 2012 database issue of Nucleic Acids Research. These include significant growth in the number of Rhea reactions and the inclusion of reactions involving complex macromolecules such as proteins, nucleic acids and other polymers that lie outside the scope of ChEBI. Together these developments will significantly increase the utility of Rhea as a tool for the description, analysis and reconciliation of genome-scale metabolic models.
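Rhea reactions are described as stoichiometrically balanced. A minimal sketch of what that check involves: atoms of each element and the net charge must agree on both sides. The formulas and charges below are written by hand for the hexokinase reaction (D-glucose + ATP = D-glucose 6-phosphate + ADP + H+), assuming the ionization states of the major microspecies at physiological pH; the helper names are hypothetical and the parser handles only simple formulas without parentheses.

```python
import re
from collections import Counter
from typing import List, Tuple

# Simple formulas only (no parentheses or hydrates), e.g. "C6H12O6".
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")

Species = Tuple[int, str, int]  # (stoichiometric coefficient, formula, charge)


def parse_formula(formula: str) -> Counter:
    """Count atoms per element in a simple molecular formula."""
    counts: Counter = Counter()
    pos = 0
    for match in _ELEMENT.finditer(formula):
        if match.start() != pos:  # unparsed characters between elements
            raise ValueError(f"unsupported formula: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if pos != len(formula):  # trailing characters the regex did not consume
        raise ValueError(f"unsupported formula: {formula!r}")
    return counts


def side_totals(side: List[Species]) -> Tuple[Counter, int]:
    """Total atoms per element and net charge for one side of a reaction."""
    atoms: Counter = Counter()
    charge = 0
    for coeff, formula, z in side:
        for element, n in parse_formula(formula).items():
            atoms[element] += coeff * n
        charge += coeff * z
    return atoms, charge


def is_balanced(left: List[Species], right: List[Species]) -> bool:
    """True if both sides have the same atom counts and the same net charge."""
    return side_totals(left) == side_totals(right)


# D-glucose + ATP(4-) = D-glucose 6-phosphate(2-) + ADP(3-) + H(+)
LEFT = [(1, "C6H12O6", 0), (1, "C10H12N5O13P3", -4)]
RIGHT = [(1, "C6H11O9P", -2), (1, "C10H12N5O10P2", -3), (1, "H", 1)]

if __name__ == "__main__":
    print(is_balanced(LEFT, RIGHT))       # True
    print(is_balanced(LEFT, RIGHT[:-1]))  # False: proton omitted
```

Dropping the proton leaves one hydrogen and one positive charge unaccounted for, which is exactly the kind of error a balance check catches.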


Subject(s)
Databases, Chemical , Enzymes/metabolism , Metabolic Networks and Pathways , Biochemical Phenomena , Biopolymers/metabolism , Genomics , Internet , Metabolic Networks and Pathways/genetics
11.
Nucleic Acids Res ; 43(Database issue): D1064-70, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348399

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins, available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Sequence Homology, Amino Acid , Humans , Internet , Proteins/classification
12.
Nucleic Acids Res ; 43(Database issue): D213-21, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428371

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36,766 member database signatures integrated into 26,238 InterPro entries, an increase of over 3993 entries (5081 signatures) since 2012.


Subject(s)
Databases, Protein , Proteins/classification , Bacteria/metabolism , Gene Ontology , Protein Structure, Tertiary , Proteins/genetics , Sequence Analysis, Protein , Software
13.
Bioinformatics ; 31(11): 1875-7, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25638809

ABSTRACT

MOTIVATION: On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use its own Internationalized Resource Identifier (IRI) for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. RESULTS: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate multiple identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. AVAILABILITY AND IMPLEMENTATION: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql.
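The core idea of the service described above, generating identifier variants from registry patterns and matching records through a shared local identifier, can be illustrated offline. The sketch below uses a small, hand-written and hypothetical table of URI prefixes for one collection; it does not call the Identifiers.org service or reproduce its registry, and `P12345` is only an example accession.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry excerpt: alternative URI prefixes under which
# different providers may publish the same UniProt accessions.
PREFIXES: Dict[str, List[str]] = {
    "uniprot": [
        "http://identifiers.org/uniprot/",
        "http://purl.uniprot.org/uniprot/",
        "http://www.uniprot.org/uniprot/",
    ],
}


def variants(collection: str, local_id: str) -> List[str]:
    """All IRIs that denote `local_id` in `collection`, one per known prefix."""
    return [prefix + local_id for prefix in PREFIXES[collection]]


def local_id_of(iri: str) -> Optional[Tuple[str, str]]:
    """Strip a known prefix, returning (collection, local_id), or None."""
    for collection, prefixes in PREFIXES.items():
        for prefix in prefixes:
            if iri.startswith(prefix):
                return collection, iri[len(prefix):]
    return None


def same_record(iri_a: str, iri_b: str) -> bool:
    """True if two IRIs resolve to the same collection and local identifier."""
    a, b = local_id_of(iri_a), local_id_of(iri_b)
    return a is not None and a == b


def convert(iri: str, target_prefix: str) -> Optional[str]:
    """Rewrite an IRI into the variant that uses `target_prefix`, if known."""
    parsed = local_id_of(iri)
    if parsed is None:
        return None
    collection, local_id = parsed
    for candidate in variants(collection, local_id):
        if candidate.startswith(target_prefix):
            return candidate
    return None


if __name__ == "__main__":
    print(variants("uniprot", "P12345"))
    print(convert("http://www.uniprot.org/uniprot/P12345", "http://purl.uniprot.org/uniprot/"))
```

In a federated query, the variant list plays the role of a join bridge: a source IRI is expanded into all known forms so that at least one of them matches the IRI used by the target endpoint.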


Subject(s)
Databases, Factual , Biological Science Disciplines , Internet , Semantics , Systems Integration
14.
Bioinformatics ; 30(9): 1338-9, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24413672

ABSTRACT

MOTIVATION: Resource Description Framework (RDF) is an emerging technology for describing, publishing and linking life science data. As a major provider of bioinformatics data and services, the European Bioinformatics Institute (EBI) is committed to making data readily accessible to the community in ways that meet existing demand. The EBI RDF platform has been developed to meet an increasing demand to coordinate RDF activities across the institute and provides a new entry point to querying and exploring integrated resources available at the EBI.


Subject(s)
Computational Biology/methods , Databases, Genetic , Academies and Institutes , Biomedical Research , Internet
15.
Nucleic Acids Res ; 41(Database issue): D584-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193261

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins, available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Proteins/classification , Eukaryota/genetics , Internet
16.
Hum Mutat ; 35(8): 927-35, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24848695

ABSTRACT

During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.


Subject(s)
Databases, Protein/statistics & numerical data , Genetic Association Studies , Genetics, Medical , Knowledge Bases , Proteome , Software , Amino Acid Sequence , Genetic Variation , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Internet , Molecular Sequence Annotation , Molecular Sequence Data , Terminology as Topic
18.
Nucleic Acids Res ; 40(Web Server issue): W597-603, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22661580

ABSTRACT

ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved into an extensible and integrative portal giving access to many scientific resources, databases and software tools in different areas of life sciences. Scientists can now seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics and transcriptomics. The individual resources (databases, web-based and downloadable software tools) are hosted in a 'decentralized' way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across 'selected' resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.


Subject(s)
Computational Biology , Proteomics , Software , Computer Graphics , Genomics , Internet , Systems Integration , User-Computer Interface
19.
ArXiv ; 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38903736

ABSTRACT

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 scores of 86.30% for NER, 86.66% for RE for chemical conversion pairs, and 83.79% for RE for chemical conversion pairs and linked enzymes. We combine the best performing methods after fine-tuning using EnzChemRED to create an end-to-end pipeline for knowledge extraction from text and apply this to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea. The EnzChemRED corpus is freely available at https://ftp.expasy.org/databases/rhea/nlp/.
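The results above are reported as F1 scores, the harmonic mean of precision and recall. A minimal sketch of set-based scoring follows, assuming predictions and gold annotations are compared as exact-match sets of hypothetical (start, end, type) spans; the abstract does not specify EnzChemRED's matching criteria or how the averages were computed.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(predicted: Set[Hashable], gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between two annotation sets."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical NER spans: (start offset, end offset, entity type).
    gold = {(0, 7, "Chemical"), (12, 20, "Protein"), (30, 38, "Chemical"), (40, 45, "Chemical")}
    predicted = {(0, 7, "Chemical"), (12, 20, "Protein"), (30, 38, "Protein")}
    print(precision_recall_f1(predicted, gold))  # P = 2/3, R = 1/2, F1 = 4/7
```

The same function applies to relation extraction if each relation is represented as a hashable tuple, for example a (substrate, product) pair, or a (substrate, product, enzyme) triple when linked enzymes are scored.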

20.
Sci Data ; 11(1): 982, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39251610

ABSTRACT

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts where enzymes and the chemical reactions they catalyze are annotated using identifiers from the protein knowledgebase UniProtKB and the chemical ontology ChEBI. We show that fine-tuning language models with EnzChemRED significantly boosts their ability to identify proteins and chemicals in text (86.30% F1 score) and to extract the chemical conversions (86.66% F1 score) and the enzymes that catalyze those conversions (83.79% F1 score). We apply our methods to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea.


Subject(s)
Enzymes , Natural Language Processing , Enzymes/chemistry , PubMed , Databases, Protein , Knowledge Bases