Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 20 de 49
Filter
1.
Nat Methods ; 20(2): 193-204, 2023 02.
Article in English | MEDLINE | ID: mdl-36543939

ABSTRACT

Progress in mass spectrometry lipidomics has led to a rapid proliferation of studies across biology and biomedicine. These generate extremely large raw datasets requiring sophisticated solutions to support automated data processing. To address this, numerous software tools have been developed and tailored for specific tasks. However, for researchers, deciding which approach best suits their application relies on ad hoc testing, which is inefficient and time consuming. Here we first review the data processing pipeline, summarizing the scope of available tools. Next, to support researchers, LIPID MAPS provides an interactive online portal listing open-access tools with a graphical user interface. This guides users towards appropriate solutions within major areas in data processing, including (1) lipid-oriented databases, (2) mass spectrometry data repositories, (3) analysis of targeted lipidomics datasets, (4) lipid identification and (5) quantification from untargeted lipidomics datasets, (6) statistical analysis and visualization, and (7) data integration solutions. Detailed descriptions of functions and requirements are provided to guide customized data analysis workflows.


Subject(s)
Computational Biology , Lipidomics , Computational Biology/methods , Software , Informatics , Lipids/chemistry
2.
Nucleic Acids Res ; 52(D1): D817-D821, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37897348

ABSTRACT

ViralZone (http://viralzone.expasy.org) is a knowledge repository for viruses that links biological knowledge and databases. It contains data on virion structure, genome, proteome, replication cycle and host-virus interactions. The new update provides better access to the data through contextual popups and higher resolution images in Scalable Vector Graphics (SVG) format. These images are designed to be dynamic and interactive with human viruses to give users better access to the data. In addition, a new coronavirus-specific resource provides regularly updated data on variants and molecular biology of SARS-CoV-2. Other virus-specific resources have been added to the database, particularly for HIV, herpesviruses and poxviruses.


Subject(s)
Knowledge Bases , Viruses , Humans , Virion/chemistry , Virion/genetics , Virion/growth & development , Viruses/chemistry , Viruses/genetics , Viruses/growth & development
3.
Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide a more user friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.


Subject(s)
Databases, Protein , Humans , Amino Acid Sequence , Artificial Intelligence , Internet , Proteins/chemistry , Software
4.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36484697

ABSTRACT

MOTIVATION: To provide high quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small molecule ligands. RESULTS: We structured the data model for cognate ligand binding site annotations in UniProtKB and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from ChEBI, which we now use as the reference vocabulary for all such annotations. We developed improved search and query facilities for cognate ligands in the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that ChEBI provides. AVAILABILITY AND IMPLEMENTATION: Binding site annotations for cognate ligands described using ChEBI are available for UniProtKB protein sequence records in several formats (text, XML and RDF) and are freely available to query and download through the UniProt website (www.uniprot.org), REST API (www.uniprot.org/help/api), SPARQL endpoint (sparql.uniprot.org/) and FTP site (https://ftp.uniprot.org/pub/databases/uniprot/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Knowledge Bases , Databases, Protein , Ligands , Amino Acid Sequence , Binding Sites , Molecular Sequence Annotation
5.
Metabolomics ; 20(1): 15, 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38267595

ABSTRACT

INTRODUCTION: Lipids are key compounds in the study of metabolism and are increasingly studied in biology projects. It is a very broad family that encompasses many compounds, and the name of the same compound may vary depending on the community where they are studied. OBJECTIVES: In addition, their structures are varied and complex, which complicates their analysis. Indeed, the structural resolution does not always allow a complete level of annotation so the actual compound analysed will vary from study to study and should be clearly stated. For all these reasons the identification and naming of lipids is complicated and very variable from one study to another, it needs to be harmonized. METHODS & RESULTS: In this position paper we will present and discuss the different way to name lipids (with chemoinformatic and semantic identifiers) and their importance to share lipidomic results. CONCLUSION: Homogenising this identification and adopting the same rules is essential to be able to share data within the community and to map data on functional networks.


Subject(s)
Lipidomics , Metabolomics , Lipids
6.
Nucleic Acids Res ; 50(D1): D693-D700, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34755880

ABSTRACT

Rhea (https://www.rhea-db.org) is an expert-curated knowledgebase of biochemical reactions based on the chemical ontology ChEBI (Chemical Entities of Biological Interest) (https://www.ebi.ac.uk/chebi). In this paper, we describe a number of key developments in Rhea since our last report in the database issue of Nucleic Acids Research in 2019. These include improved reaction coverage in Rhea, the adoption of Rhea as the reference vocabulary for enzyme annotation in the UniProt knowledgebase UniProtKB (https://www.uniprot.org), the development of a new Rhea website, and the designation of Rhea as an ELIXIR Core Data Resource. We hope that these and other developments will enhance the utility of Rhea as a reference resource to study and engineer enzymes and the metabolic systems in which they function.


Subject(s)
Chemical Phenomena , Databases, Factual , Software , Animals , Humans , Internet , Knowledge Bases
7.
Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.


Subject(s)
Databases, Protein , Proteins/chemistry , Amino Acid Sequence , COVID-19/metabolism , Internet , Molecular Sequence Annotation , Protein Domains , Protein Interaction Maps , SARS-CoV-2/metabolism , Sequence Alignment
8.
Bioinformatics ; 36(6): 1896-1901, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31688925

ABSTRACT

MOTIVATION: To provide high quality computationally tractable enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions which describes reaction participants using the ChEBI (Chemical Entities of Biological Interest) ontology. RESULTS: We replaced existing textual descriptions of biochemical reactions in UniProtKB with their equivalents from Rhea, which is now the standard for annotation of enzymatic reactions in UniProtKB. We developed improved search and query facilities for the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that Rhea and ChEBI provide. AVAILABILITY AND IMPLEMENTATION: UniProtKB at https://www.uniprot.org; UniProt REST API at https://www.uniprot.org/help/api; UniProt SPARQL endpoint at https://sparql.uniprot.org/; Rhea at https://www.rhea-db.org.


Subject(s)
Rheiformes , Animals , Databases, Protein , Knowledge Bases
9.
Metabolomics ; 17(6): 55, 2021 06 06.
Article in English | MEDLINE | ID: mdl-34091802

ABSTRACT

BACKGROUND: Improvements in mass spectrometry (MS) technologies coupled with bioinformatics developments have allowed considerable advancement in the measurement and interpretation of lipidomics data in recent years. Since research areas employing lipidomics are rapidly increasing, there is a great need for bioinformatic tools that capture and utilize the complexity of the data. Currently, the diversity and complexity within the lipidome is often concealed by summing over or averaging individual lipids up to (sub)class-based descriptors, losing valuable information about biological function and interactions with other distinct lipids molecules, proteins and/or metabolites. AIM OF REVIEW: To address this gap in knowledge, novel bioinformatics methods are needed to improve identification, quantification, integration and interpretation of lipidomics data. The purpose of this mini-review is to summarize exemplary methods to explore the complexity of the lipidome. KEY SCIENTIFIC CONCEPTS OF REVIEW: Here we describe six approaches that capture three core focus areas for lipidomics: (1) lipidome annotation including a resolvable database identifier, (2) interpretation via pathway- and enrichment-based methods, and (3) understanding complex interactions to emphasize specific steps in the analytical process and highlight challenges in analyses associated with the complexity of lipidome data.


Subject(s)
Computational Biology , Lipidomics , Databases, Factual , Lipids , Mass Spectrometry
10.
Nucleic Acids Res ; 47(D1): D596-D600, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30272209

ABSTRACT

Rhea (http://www.rhea-db.org) is a comprehensive and non-redundant resource of over 11 000 expert-curated biochemical reactions that uses chemical entities from the ChEBI ontology to represent reaction participants. Originally designed as an annotation vocabulary for the UniProt Knowledgebase (UniProtKB), Rhea also provides reaction data for a range of other core knowledgebases and data repositories including ChEBI and MetaboLights. Here we describe recent developments in Rhea, focusing on a new resource description framework representation of Rhea reaction data and an SPARQL endpoint (https://sparql.rhea-db.org/sparql) that provides access to it. We demonstrate how federated queries that combine the Rhea SPARQL endpoint and other SPARQL endpoints such as that of UniProt can provide improved metabolite annotation and support integrative analyses that link the metabolome through the proteome to the transcriptome and genome. These developments will significantly boost the utility of Rhea as a means to link chemistry and biology for a more holistic understanding of biological systems and their function in health and disease.


Subject(s)
Databases, Chemical , Databases, Protein , Metabolomics/methods , Software/standards , Humans , Knowledge Bases , Systems Biology/methods
11.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms on new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface
12.
PLoS Comput Biol ; 14(8): e1006390, 2018 08.
Article in English | MEDLINE | ID: mdl-30102703

ABSTRACT

Manually curating biomedical knowledge from publications is necessary to build a knowledge based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine-learning assisted triage method. We collect previously curated publications from two databases UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and used them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.


Subject(s)
Data Curation/methods , Information Storage and Retrieval/methods , Data Curation/statistics & numerical data , Databases, Genetic , Databases, Protein , Deep Learning , Genomics , Knowledge Bases , Machine Learning , Publications
13.
Nucleic Acids Res ; 45(D1): D415-D418, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27789701

ABSTRACT

Rhea (http://www.rhea-db.org) is a comprehensive and non-redundant resource of expert-curated biochemical reactions designed for the functional annotation of enzymes and the description of metabolic networks. Rhea describes enzyme-catalyzed reactions covering the IUBMB Enzyme Nomenclature list as well as additional reactions, including spontaneously occurring reactions, using entities from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Here we describe developments in Rhea since our last report in the database issue of Nucleic Acids Research. These include the first implementation of a simple hierarchical classification of reactions, improved coverage of the IUBMB Enzyme Nomenclature list and additional reactions through continuing expert curation, and the development of a new website to serve this improved dataset.

14.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Interaction Domains and Motifs , Software , Humans , Molecular Sequence Annotation , Phylogeny
15.
Nucleic Acids Res ; 44(D1): D523-6, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527720

ABSTRACT

MetaNetX is a repository of genome-scale metabolic networks (GSMNs) and biochemical pathways from a number of major resources imported into a common namespace of chemical compounds, reactions, cellular compartments--namely MNXref--and proteins. The MetaNetX.org website (http://www.metanetx.org/) provides access to these integrated data as well as a variety of tools that allow users to import their own GSMNs, map them to the MNXref reconciliation, and manipulate, compare, analyze, simulate (using flux balance analysis) and export the resulting GSMNs. MNXref and MetaNetX are regularly updated and freely available.


Subject(s)
Databases, Chemical , Genome , Metabolic Networks and Pathways/genetics , Molecular Structure , Software
16.
Nucleic Acids Res ; 43(Database issue): D1064-70, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348399

ABSTRACT

HAMAP (High-quality Automated and Manual Annotation of Proteins--available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Sequence Homology, Amino Acid , Humans , Internet , Proteins/classification
17.
Nucleic Acids Res ; 43(Database issue): D459-64, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25332395

ABSTRACT

Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive and non-redundant resource of expert-curated biochemical reactions described using species from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Rhea has been designed for the functional annotation of enzymes and the description of genome-scale metabolic networks, providing stoichiometrically balanced enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list and additional reactions), transport reactions and spontaneously occurring reactions. Rhea reactions are extensively curated with links to source literature and are mapped to other publicly available enzyme and pathway databases such as Reactome, BioCyc, KEGG and UniPathway, through manual curation and computational methods. Here we describe developments in Rhea since our last report in the 2012 database issue of Nucleic Acids Research. These include significant growth in the number of Rhea reactions and the inclusion of reactions involving complex macromolecules such as proteins, nucleic acids and other polymers that lie outside the scope of ChEBI. Together these developments will significantly increase the utility of Rhea as a tool for the description, analysis and reconciliation of genome-scale metabolic models.


Subject(s)
Databases, Chemical , Enzymes/metabolism , Metabolic Networks and Pathways , Biochemical Phenomena , Biopolymers/metabolism , Genomics , Internet , Metabolic Networks and Pathways/genetics
18.
Brief Bioinform ; 15(1): 123-35, 2014 Jan.
Article in English | MEDLINE | ID: mdl-23172809

ABSTRACT

Genome-scale metabolic network reconstructions are now routinely used in the study of metabolic pathways, their evolution and design. The development of such reconstructions involves the integration of information on reactions and metabolites from the scientific literature as well as public databases and existing genome-scale metabolic models. The reconciliation of discrepancies between data from these sources generally requires significant manual curation, which constitutes a major obstacle in efforts to develop and apply genome-scale metabolic network reconstructions. In this work, we discuss some of the major difficulties encountered in the mapping and reconciliation of metabolic resources and review three recent initiatives that aim to accelerate this process, namely BKM-react, MetRxn and MNXref (presented in this article). Each of these resources provides a pre-compiled reconciliation of many of the most commonly used metabolic resources. By reducing the time required for manual curation of metabolite and reaction discrepancies, these resources aim to accelerate the development and application of high-quality genome-scale metabolic network reconstructions and models.


Subject(s)
Metabolic Networks and Pathways , Computational Biology , Computer Simulation , Databases, Factual/statistics & numerical data , Genomics/statistics & numerical data , Metabolic Networks and Pathways/genetics , Models, Biological , Molecular Structure , Software
19.
Bioinformatics ; 31(17): 2860-6, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-25943471

ABSTRACT

MOTIVATION: Lipids are a large and diverse group of biological molecules with roles in membrane formation, energy storage and signaling. Cellular lipidomes may contain tens of thousands of structures, a staggering degree of complexity whose significance is not yet fully understood. High-throughput mass spectrometry-based platforms provide a means to study this complexity, but the interpretation of lipidomic data and its integration with prior knowledge of lipid biology suffers from a lack of appropriate tools to manage the data and extract knowledge from it. RESULTS: To facilitate the description and exploration of lipidomic data and its integration with prior biological knowledge, we have developed a knowledge resource for lipids and their biology-SwissLipids. SwissLipids provides curated knowledge of lipid structures and metabolism which is used to generate an in silico library of feasible lipid structures. These are arranged in a hierarchical classification that links mass spectrometry analytical outputs to all possible lipid structures, metabolic reactions and enzymes. SwissLipids provides a reference namespace for lipidomic data publication, data exploration and hypothesis generation. The current version of SwissLipids includes over 244 000 known and theoretically possible lipid structures, over 800 proteins, and curated links to published knowledge from over 620 peer-reviewed publications. We are continually updating the SwissLipids hierarchy with new lipid categories and new expert curated knowledge. AVAILABILITY: SwissLipids is freely available at http://www.swisslipids.org/. CONTACT: alan.bridge@isb-sib.ch SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Databases, Factual , Knowledge Bases , Lipid Metabolism , Lipids/chemistry , Lipids/physiology , Mass Spectrometry/methods , Humans , Lipids/analysis
20.
Nat Methods ; 9(4): 345-50, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22453911

ABSTRACT

The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices.


Subject(s)
Databases, Protein , Protein Interaction Mapping , Proteins/metabolism , Periodicals as Topic , Protein Binding , Proteins/chemistry , Quality Control
SELECTION OF CITATIONS
SEARCH DETAIL