1.
Nat Immunol ; 2024 May 28.
Article in English | MEDLINE | ID: mdl-38806708

ABSTRACT

Inflammatory pain results from the heightened sensitivity and reduced threshold of nociceptor sensory neurons due to exposure to inflammatory mediators. However, the cellular and transcriptional diversity of immune cell and sensory neuron types makes it challenging to decipher the immune mechanisms underlying pain. Here we used single-cell transcriptomics to determine the immune gene signatures associated with pain development in three skin inflammatory pain models in mice: zymosan injection, skin incision and ultraviolet burn. We found that macrophage and neutrophil recruitment closely mirrored the kinetics of pain development and identified cell-type-specific transcriptional programs associated with pain and its resolution. Using a comprehensive list of potential interactions mediated by receptors, ligands, ion channels and metabolites to generate injury-specific neuroimmune interactomes, we also uncovered that thrombospondin-1 upregulated by immune cells upon injury inhibited nociceptor sensitization. This study lays the groundwork for identifying the neuroimmune axes that modulate pain in diverse disease contexts.

4.
Bioinformatics ; 39(3), 2023 03 01.
Article in English | MEDLINE | ID: mdl-36882166

ABSTRACT

MOTIVATION: The investigation of sets of genes using biological pathways is a common task for researchers and is supported by a wide variety of software tools. This type of analysis generates hypotheses about the biological processes that are active or modulated in a specific experimental context. RESULTS: The Network Data Exchange Integrated Query (NDEx IQuery) is a new tool for network and pathway-based gene set interpretation that complements or extends existing resources. It combines novel sources of pathways, integration with Cytoscape, and the ability to store and share analysis results. The NDEx IQuery web application performs multiple gene set analyses based on diverse pathways and networks stored in NDEx. These include curated pathways from WikiPathways and SIGNOR, published pathway figures from the last 27 years, machine-assembled networks using the INDRA system, and the new NCI-PID v2.0, an updated version of the popular NCI Pathway Interaction Database. NDEx IQuery's integration with MSigDB and cBioPortal now provides pathway analysis in the context of these two resources. AVAILABILITY AND IMPLEMENTATION: NDEx IQuery is available at https://www.ndexbio.org/iquery and is implemented in Javascript and Java.


Subject(s)
Computational Biology , Software , Computational Biology/methods , Protein Interaction Maps , Publications , Databases, Factual , Internet
5.
Mol Syst Biol ; 19(5): e11325, 2023 05 09.
Article in English | MEDLINE | ID: mdl-36938926

ABSTRACT

The analysis of omic data depends on machine-readable information about protein interactions, modifications, and activities as found in protein interaction networks, databases of post-translational modifications, and curated models of gene and protein function. These resources typically depend heavily on human curation. Natural language processing systems that read the primary literature have the potential to substantially extend knowledge resources while reducing the burden on human curators. However, machine-reading systems are limited by high error rates and commonly generate fragmentary and redundant information. Here, we describe an approach to precisely assemble molecular mechanisms at scale using multiple natural language processing systems and the Integrated Network and Dynamical Reasoning Assembler (INDRA). INDRA identifies full and partial overlaps in information extracted from published papers and pathway databases, uses predictive models to improve the reliability of machine reading, and thereby assembles individual pieces of information into non-redundant and broadly usable mechanistic knowledge. Using INDRA to create high-quality corpora of causal knowledge we show it is possible to extend protein-protein interaction databases and explain co-dependencies in the Cancer Dependency Map.


Subject(s)
Data Mining , Natural Language Processing , Humans , Reproducibility of Results , Databases, Factual
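
The assembly idea in the abstract above (finding full and partial overlaps among extracted statements) can be illustrated with a toy sketch. This is not INDRA's code or API: the `Stmt` type, the `assemble` and `refines` helpers, and the source names are all hypothetical and use only the Python standard library.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Stmt(NamedTuple):
    """Hypothetical, simplified statement: subject, relation, object, site."""
    subj: str
    rel: str
    obj: str
    site: Optional[str] = None  # None means the site was not specified


def assemble(extractions):
    """Full overlap: group identical statements and collect their sources."""
    evidence = defaultdict(list)
    for stmt, source in extractions:
        evidence[stmt].append(source)
    return dict(evidence)


def refines(specific, general):
    """Partial overlap: same core claim, but `specific` adds a site."""
    return (
        specific[:3] == general[:3]
        and general.site is None
        and specific.site is not None
    )


# Illustrative input: two readers report the same claim, and one
# database entry reports a more detailed version of it.
extractions = [
    (Stmt("MAP2K1", "phosphorylates", "MAPK1"), "reader_a"),
    (Stmt("MAP2K1", "phosphorylates", "MAPK1"), "reader_b"),
    (Stmt("MAP2K1", "phosphorylates", "MAPK1", "T185"), "database_x"),
]

unique = assemble(extractions)
partial = [(s, g) for s in unique for g in unique if refines(s, g)]

for stmt, sources in unique.items():
    print(stmt, "supported by", len(sources), "source(s)")
```

In this sketch the three extractions collapse to two unique statements, and the site-specific one is recorded as a refinement of the general one. The predictive reliability models mentioned in the abstract are not covered.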
6.
Bioinformatics ; 39(4), 2023 04 03.
Article in English | MEDLINE | ID: mdl-36916735

ABSTRACT

MOTIVATION: Biomedical identifier resources (such as ontologies, taxonomies, and controlled vocabularies) commonly overlap in scope and contain equivalent entries under different identifiers. Maintaining mappings between these entries is crucial for interoperability and the integration of data and knowledge. However, there are substantial gaps in available mappings motivating their semi-automated curation. RESULTS: Biomappings implements a curation workflow for missing mappings which combines automated prediction with human-in-the-loop curation. It supports multiple prediction approaches and provides a web-based user interface for reviewing predicted mappings for correctness, combined with automated consistency checking. Predicted and curated mappings are made available in public, version-controlled resource files on GitHub. Biomappings currently makes available 9274 curated mappings and 40 691 predicted ones, providing previously missing mappings between widely used identifier resources covering small molecules, cell lines, diseases, and other concepts. We demonstrate the value of Biomappings on case studies involving predicting and curating missing mappings among cancer cell lines as well as small molecules tested in clinical trials. We also present how previously missing mappings curated using Biomappings were contributed back to multiple widely used community ontologies. AVAILABILITY AND IMPLEMENTATION: The data and code are available under the CC0 and MIT licenses at https://github.com/biopragmatics/biomappings.


Subject(s)
Data Curation , Vocabulary, Controlled , Humans , Data Curation/methods , Software , User-Computer Interface
7.
bioRxiv ; 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36778477

ABSTRACT

Inflammatory pain associated with tissue injury and infections results from the heightened sensitivity of the peripheral terminals of nociceptor sensory neurons in response to exposure to inflammatory mediators. Targeting immune-derived inflammatory ligands, like prostaglandin E2, has been effective in alleviating inflammatory pain. However, the diversity of immune cells and the vast array of ligands they produce make it challenging to systematically map all neuroimmune pathways that contribute to inflammatory pain. Here, we constructed a comprehensive and updatable database of receptor-ligand pairs and complemented it with single-cell transcriptomics of immune cells and sensory neurons in three distinct inflammatory pain conditions to generate injury-specific neuroimmune interactomes. We identified cell-type-specific neuroimmune axes that are common, as well as unique, to different injury types. This approach successfully predicts neuroimmune pathways with established roles in inflammatory pain as well as ones not previously described. We found that thrombospondin-1 produced by myeloid cells in all three conditions is a negative regulator of nociceptor sensitization, revealing a non-canonical role of immune ligands as endogenous reducers of peripheral sensitization. This computational platform lays the groundwork to identify novel mechanisms of immune-mediated peripheral sensitization and the specific disease contexts in which they act.

8.
Front Immunol ; 14: 1282859, 2023.
Article in English | MEDLINE | ID: mdl-38414974

ABSTRACT

Introduction: The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 Institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing. Methods: Extensive community work allowed an impressive step forward in building interfaces between Systems Biology tools and platforms. Our framework can link biomolecules from omics data analysis and computational modelling to dysregulated pathways in a cell-, tissue- or patient-specific manner. Drug repurposing using text mining and AI-assisted analysis identified potential drugs, chemicals and microRNAs that could target the identified key factors. Results: Results revealed drugs already tested for anti-COVID-19 efficacy, providing a mechanistic context for their mode of action, and drugs already in clinical trials for treating other diseases, never tested against COVID-19. Discussion: The key advance is that the proposed framework is versatile and expandable, offering a significant upgrade in the arsenal for virus-host interactions and other complex pathologies.


Subject(s)
COVID-19 , Humans , SARS-CoV-2 , Drug Repositioning , Systems Biology , Computer Simulation
9.
Sci Data ; 9(1): 714, 2022 11 19.
Article in English | MEDLINE | ID: mdl-36402838

ABSTRACT

The standardized identification of biomedical entities is a cornerstone of interoperability, reuse, and data integration in the life sciences. Several registries have been developed to catalog resources maintaining identifiers for biomedical entities such as small molecules, proteins, cell lines, and clinical trials. However, existing registries have struggled to provide sufficient coverage and metadata standards that meet the evolving needs of modern life sciences researchers. Here, we introduce the Bioregistry, an integrative, open, community-driven metaregistry that synthesizes and substantially expands upon 23 existing registries. The Bioregistry addresses the need for a sustainable registry by leveraging public infrastructure and automation, and employing a progressive governance model centered around open code and open data to foster community contribution. The Bioregistry can be used to support the standardized annotation of data, models, ontologies, and scientific literature, thereby promoting their interoperability and reuse. The Bioregistry can be accessed through https://bioregistry.io and its source code and data are available under the MIT and CC0 Licenses at https://github.com/biopragmatics/bioregistry .

11.
Database (Oxford) ; 2022, 2022 08 12.
Article in English | MEDLINE | ID: mdl-35961013

ABSTRACT

Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.


Subject(s)
Genomics , Proteins , Base Sequence , Computational Biology , Genome , Molecular Sequence Annotation
12.
Database (Oxford) ; 2022, 2022 05 25.
Article in English | MEDLINE | ID: mdl-35616100

ABSTRACT

Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec.


Subject(s)
Metadata , Semantic Web , Data Management , Databases, Factual , Workflow
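
The table-based format described in the SSSOM abstract above can be read with ordinary tabular tooling. The following is a minimal sketch using only the Python standard library. The column names follow the SSSOM specification as best understood here, while the `EXA`/`EXB` prefixes, the rows, and the helper functions are invented for illustration and are not real mappings.

```python
import csv
import io

# Made-up example table. Lines starting with "#" stand in for the
# commented metadata header that can precede the tab-separated rows.
SSSOM_TSV = """\
# curie_map:
#   EXA: https://example.org/a/
#   EXB: https://example.org/b/
subject_id\tpredicate_id\tobject_id\tmapping_justification
EXA:001\tskos:exactMatch\tEXB:101\tsemapv:ManualMappingCuration
EXA:002\tskos:broadMatch\tEXB:102\tsemapv:LexicalMatching
EXA:003\tskos:exactMatch\tEXB:103\tsemapv:LexicalMatching
"""


def read_mappings(text):
    """Parse the mapping rows, skipping commented metadata lines."""
    rows = (line for line in io.StringIO(text) if not line.startswith("#"))
    return list(csv.DictReader(rows, delimiter="\t"))


def curated_equivalences(mappings):
    """Keep only exact matches that were manually curated."""
    return [
        (m["subject_id"], m["object_id"])
        for m in mappings
        if m["predicate_id"] == "skos:exactMatch"
        and m["mapping_justification"] == "semapv:ManualMappingCuration"
    ]


mappings = read_mappings(SSSOM_TSV)
print(curated_equivalences(mappings))
```

Filtering on both the predicate and the justification reflects the abstract's point that explicit metadata lets a high-precision use case separate equivalent terms from merely related ones, and curated mappings from automated ones.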
13.
Environ Health Perspect ; 130(3): 37002, 2022 03.
Article in English | MEDLINE | ID: mdl-35238605

ABSTRACT

BACKGROUND: Mechanistic data is increasingly used in hazard identification of chemicals. However, the volume of data is large, challenging the efficient identification and clustering of relevant data. OBJECTIVES: We investigated whether evidence identification for hazard assessment can become more efficient and informed through an automated approach that combines machine reading of publications with network visualization tools. METHODS: We chose 13 chemicals that were evaluated by the International Agency for Research on Cancer (IARC) Monographs program incorporating the key characteristics of carcinogens (KCCs) approach. Using established literature search terms for KCCs, we retrieved and analyzed literature using Integrated Network and Dynamical Reasoning Assembler (INDRA). INDRA combines large-scale literature processing with pathway databases and extracts relationships between biomolecules, bioprocesses, and chemicals into statements (e.g., "benzene activates DNA damage"). These statements were subsequently assembled into networks and compared with the KCC evaluation by the IARC, to evaluate the informativeness of our approach. RESULTS: We found, in general, larger networks for those chemicals which the IARC has evaluated the evidence to be strong for KCC induction. Larger networks were not directly linked to publication count, given that we retrieved small networks for several chemicals with little support for KCC activation according to the IARC, despite the significant volume of literature for these specific chemicals. In addition, interpreting networks for genotoxicity and DNA repair showed concordance with the IARC KCC evaluation. DISCUSSION: Our method is an automated approach to condense mechanistic literature into searchable and interpretable networks based on an a priori ontology. 
The approach is no replacement of expert evaluation but, instead, provides an informed structure for experts to quickly identify which statements are made in which papers and how these could connect. We focused on the KCCs because these are supported by well-described search terms. The method needs to be tested in other frameworks as well to demonstrate its generalizability. https://doi.org/10.1289/EHP9112.


Subject(s)
Carcinogens , Neoplasms , Benzene , Carcinogens/toxicity , Databases, Factual , Humans , Neoplasms/chemically induced , Neoplasms/epidemiology , Risk Assessment
14.
Bioinformatics ; 38(6): 1648-1656, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34986221

ABSTRACT

MOTIVATION: The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models. However, representations based on a single modality are inherently limited. RESULTS: To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs (KGs). This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations in a shared embedding space. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against three baseline models trained on either one of the modalities (i.e. text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the more challenging tasks with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.084 (i.e. from 0.881 to 0.965). Finally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications. AVAILABILITY AND IMPLEMENTATION: We make the source code and the Python package of STonKGs available at GitHub (https://github.com/stonkgs/stonkgs) and PyPI (https://pypi.org/project/stonkgs/). 
The pre-trained STonKGs models and the task-specific classification models are respectively available at https://huggingface.co/stonkgs/stonkgs-150k and https://zenodo.org/communities/stonkgs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Pattern Recognition, Automated , Software , Machine Learning , Natural Language Processing , Publications
15.
Bioinform Adv ; 2(1): vbac034, 2022.
Article in English | MEDLINE | ID: mdl-36699362

ABSTRACT

Summary: Gilda is a software tool and web service that implements a scored string matching algorithm for names and synonyms across entries in biomedical ontologies covering genes, proteins (and their families and complexes), small molecules, biological processes and diseases. Gilda integrates machine-learned disambiguation models to choose between ambiguous strings given relevant surrounding text as context, and supports species-prioritization in case of ambiguity. Availability and implementation: The Gilda web service is available at http://grounding.indra.bio with source code, documentation and tutorials available via https://github.com/indralab/gilda. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
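
As a rough illustration of scored string matching against names and synonyms, the sketch below ranks lexicon entries by how closely they match a query string. It is not Gilda's algorithm or API: the lexicon, the `EX:` identifiers, the three score levels, and the `ground` function are hypothetical choices made for this example.

```python
import re

# Hypothetical lexicon of (name or synonym, identifier) pairs.
# "ER" is deliberately ambiguous between two entries.
LEXICON = [
    ("MAPK1", "EX:0001"),
    ("ERK2", "EX:0001"),
    ("ER", "EX:0002"),  # e.g. a receptor
    ("ER", "EX:0003"),  # e.g. an organelle
]


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def score(query, name):
    """Assumed scoring: exact > case-insensitive > normalized match."""
    if query == name:
        return 1.0
    if query.lower() == name.lower():
        return 0.9
    if normalize(query) == normalize(name):
        return 0.8
    return 0.0


def ground(query):
    """Return (score, identifier, matched name) tuples, best first."""
    matches = [
        (score(query, name), ident, name)
        for name, ident in LEXICON
        if score(query, name) > 0
    ]
    return sorted(matches, key=lambda m: m[0], reverse=True)


print(ground("erk-2"))
print(ground("ER"))
```

The query "ER" returns two equally scored candidates. This is the situation in which, according to the abstract, Gilda applies context-based disambiguation models, a step this sketch does not attempt.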

16.
Elife ; 10, 2021 12 03.
Article in English | MEDLINE | ID: mdl-34860157

ABSTRACT

Making the knowledge contained in scientific papers machine-readable and formally computable would allow researchers to take full advantage of this information by enabling integration with other knowledge sources to support data analysis and interpretation. Here we describe Biofactoid, a web-based platform that allows scientists to specify networks of interactions between genes, their products, and chemical compounds, and then translates this information into a representation suitable for computational analysis, search and discovery. We also report the results of a pilot study to encourage the wide adoption of Biofactoid by the scientific community.


Subject(s)
Computational Biology/methods , Genomics/methods , Computational Biology/instrumentation , Databases, Factual , Genomics/instrumentation , Pilot Projects
17.
Genome Biol ; 22(1): 55, 2021 02 02.
Article in English | MEDLINE | ID: mdl-33526072

ABSTRACT

A bottleneck in high-throughput functional genomics experiments is identifying the most important genes and their relevant functions from a list of gene hits. Gene Ontology (GO) enrichment methods provide insight at the gene set level. Here, we introduce GeneWalk ( github.com/churchmanlab/genewalk ) that identifies individual genes and their relevant functions critical for the experimental setting under examination. After the automatic assembly of an experiment-specific gene regulatory network, GeneWalk uses representation learning to quantify the similarity between vector representations of each gene and its GO annotations, yielding annotation significance scores that reflect the experimental context. By performing gene- and condition-specific functional analysis, GeneWalk converts a list of genes into data-driven hypotheses.


Subject(s)
Databases, Genetic , Gene Regulatory Networks , Animals , Biflavonoids , Brain , Gene Ontology , High-Throughput Nucleotide Sequencing , Humans , Mice , RNA-Seq , Transcriptome
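
The similarity step mentioned in the GeneWalk abstract above, comparing a gene's vector representation with those of its GO annotations, can be sketched with plain cosine similarity. The vectors and term labels below are invented, and this is not GeneWalk's code. How similarities are turned into significance scores is not described in the abstract and is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings for one gene and three of its GO annotations.
gene_vec = [1.0, 0.0, 1.0]
go_vecs = {
    "GO:term_a": [1.0, 0.0, 1.0],
    "GO:term_b": [0.0, 1.0, 0.0],
    "GO:term_c": [1.0, 1.0, 0.0],
}

# Rank annotations from most to least similar to the gene.
ranked = sorted(
    go_vecs,
    key=lambda term: cosine(gene_vec, go_vecs[term]),
    reverse=True,
)

for term in ranked:
    print(term, round(cosine(gene_vec, go_vecs[term]), 3))
```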
19.
PLoS Comput Biol ; 16(5): e1007573, 2020 05.
Article in English | MEDLINE | ID: mdl-32365103

ABSTRACT

Biological systems are acknowledged to be robust to perturbations but a rigorous understanding of this has been elusive. In a mathematical model, perturbations often exert their effect through parameters, so sizes and shapes of parametric regions offer an integrated global estimate of robustness. Here, we explore this "parameter geography" for bistability in post-translational modification (PTM) systems. We use the previously developed "linear framework" for timescale separation to describe the steady-states of a two-site PTM system as the solutions of two polynomial equations in two variables, with eight non-dimensional parameters. Importantly, this approach allows us to accommodate enzyme mechanisms of arbitrary complexity beyond the conventional Michaelis-Menten scheme, which unrealistically forbids product rebinding. We further use the numerical algebraic geometry tools Bertini, Paramotopy, and alphaCertified to statistically assess the solutions to these equations at ∼10^9 parameter points in total. Subject to sampling limitations, we find no bistability when substrate amount is below a threshold relative to enzyme amounts. As substrate increases, the bistable region acquires 8-dimensional volume which increases in an apparently monotonic and sigmoidal manner towards saturation. The region remains connected but not convex, albeit with a high visibility ratio. Surprisingly, the saturating bistable region occupies a much smaller proportion of the sampling domain under mechanistic assumptions more realistic than the Michaelis-Menten scheme. We find that bistability is compromised by product rebinding and that unrealistic assumptions on enzyme mechanisms have obscured its parametric rarity. The apparent monotonic increase in volume of the bistable region remains perplexing because the region itself does not grow monotonically: parameter points can move back and forth between monostability and bistability. 
We suggest mathematical conjectures and questions arising from these findings. Advances in theory and software now permit insights into parameter geography to be uncovered by high-dimensional, data-centric analysis.


Subject(s)
Computational Biology/methods , Protein Processing, Post-Translational/physiology , Algorithms , Gene Expression/genetics , Gene Expression/physiology , Gene Regulatory Networks/genetics , Gene Regulatory Networks/physiology , Models, Biological , Models, Theoretical , Protein Processing, Post-Translational/genetics