Results 1 - 20 of 48
1.
Brief Bioinform ; 23(3), 2022 05 13.
Article in English | MEDLINE | ID: mdl-35453140

ABSTRACT

Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors can impact the results of such an analysis, which may not be accounted for. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or default settings. Despite ongoing efforts to establish set guidelines, meaningful results are still hampered by a lack of consensus or gold standards around how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview on the influence of these factors. Our work covers a broad spectrum of factors, spanning from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges, which originate from the outlined factors.
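
To make concrete how a single analysis choice can change an enrichment result, the sketch below implements over-representation analysis with a one-sided hypergeometric test, one common family of enrichment methods. It is a minimal illustration, not a method taken from the review; the function names and the toy gene counts are hypothetical. The same overlap between a hit list and a pathway yields a different p-value once the background gene universe changes.

```python
from math import comb


def hypergeom_pvalue(universe_size, pathway_size, hits_size, overlap):
    """One-sided p-value: P(X >= overlap) when hits_size genes are drawn
    without replacement from a universe containing pathway_size pathway genes."""
    upper = min(pathway_size, hits_size)
    total = comb(universe_size, hits_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, hits_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def over_representation(hits, pathways, universe):
    """Return {pathway name: p-value}; genes outside the universe are ignored."""
    universe = set(universe)
    hits = set(hits) & universe
    results = {}
    for name, genes in pathways.items():
        genes = set(genes) & universe
        overlap = len(hits & genes)
        results[name] = hypergeom_pvalue(len(universe), len(genes), len(hits), overlap)
    return results


# Same overlap (3 of 3 pathway genes), two hypothetical background sizes.
p_small_background = hypergeom_pvalue(10, 3, 3, 3)
p_large_background = hypergeom_pvalue(20, 3, 3, 3)
```

No multiple-testing correction is applied here; in practice a correction step would normally follow.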


Subject(s)
Factual Databases , Statistical Factor Analysis , Longitudinal Studies
2.
Brief Bioinform ; 23(6), 2022 11 19.
Article in English | MEDLINE | ID: mdl-36384050

ABSTRACT

Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at least one benchmark in order to be published. Despite this, dozens of novel architectures are published every year, making it challenging for users, even within the field, to deduce the most suitable configuration for a given application. A typical biomedical application of KGEMs is drug-disease prediction in the context of drug discovery, in which a KGEM is trained to predict triples linking drugs and diseases. These predictions can be later tested in clinical trials following extensive experimental validation. However, given the infeasibility of evaluating each of these predictions and that only a minimal number of candidates can be experimentally tested, models that yield higher precision on the top prioritized triples are preferred. In this paper, we apply the concept of ensemble learning on KGEMs for drug discovery to assess whether combining the predictions of several models can lead to an overall improvement in predictive performance. First, we trained and benchmarked 10 KGEMs to predict drug-disease triples on two independent biomedical KGs designed for drug discovery. Next, we applied different ensemble methods that aggregate the predictions of these models by leveraging the distribution or the position of the predicted triple scores. We then demonstrate how the ensemble models can achieve better results than the original KGEMs by benchmarking the precision (i.e., number of true positives prioritized) of their top predictions. Lastly, we released the source code presented in this work at https://github.com/enveda/kgem-ensembles-in-drug-discovery.
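
The position-based aggregation and the precision of top predictions mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the linked repository: every model is assumed to score the same set of triples, the ensemble is a plain mean of ranks, and all names and scores are hypothetical.

```python
def rank_positions(scores):
    """Map each triple to its 1-based rank; a higher score gives a better rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {triple: pos for pos, triple in enumerate(ordered, start=1)}


def mean_rank_ensemble(model_scores):
    """Average the rank of each triple across models (lower is better).
    Assumes every model scored exactly the same triples."""
    ranks = [rank_positions(scores) for scores in model_scores]
    return {t: sum(r[t] for r in ranks) / len(ranks) for t in ranks[0]}


def precision_at_k(ranking, positives, k):
    """Fraction of the k best-ranked triples that are known positives."""
    top = sorted(ranking, key=ranking.get)[:k]
    return sum(t in positives for t in top) / k


# Hypothetical scores from two models for four drug-disease triples.
model_a = {("d1", "treats", "x"): 0.90, ("d2", "treats", "x"): 0.80,
           ("d3", "treats", "x"): 0.10, ("d4", "treats", "x"): 0.05}
model_b = {("d2", "treats", "x"): 0.90, ("d4", "treats", "x"): 0.80,
           ("d1", "treats", "x"): 0.70, ("d3", "treats", "x"): 0.10}
ensemble = mean_rank_ensemble([model_a, model_b])
```

Rank averaging ignores the scale of each model's scores, which is convenient when models produce scores in different ranges; a distribution-based variant would normalize the scores instead.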


Subject(s)
Drug Discovery , Automated Pattern Recognition , Knowledge , Machine Learning , Software
3.
J Nat Prod ; 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38970498

ABSTRACT

Natural products (NPs) or their derivatives represent a large proportion of drugs that successfully progress through clinical trials to approval. This study explores the presence of NPs in both early- and late-stage drug discovery to determine their success rate, and the factors or features of natural products that contribute to such success. As a proxy for early drug development stages, we analyzed patent applications over several decades, finding a consistent proportion of NP, NP-derived, and synthetic-compound-based patent documents, with the latter group outnumbering NP and NP-derived ones (approximately 77% vs 23%). We next assessed clinical trial data, where we observed a steady increase in NP and NP-derived compounds from clinical trial phases I to III (from approximately 35% in phase I to 45% in phase III), with an inverse trend observed in synthetics (from approximately 65% in phase I to 55% in phase III). Finally, in vitro and in silico toxicity studies revealed that NPs and their derivatives were less toxic alternatives to their synthetic counterparts. These discoveries offer valuable insights for successful NP-based drug development, highlighting the potential benefits of prioritizing NPs and their derivatives as starting points.

4.
BMC Bioinformatics ; 24(1): 207, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37208587

ABSTRACT

Better understanding the transcriptomic response produced by a compound perturbing its targets can shed light on the underlying biological processes regulated by the compound. However, establishing the relationship between the induced transcriptomic response and the target of a compound is non-trivial, partly because targets are rarely differentially expressed. Therefore, connecting both modalities requires orthogonal information (e.g., pathway or functional information). Here, we present a comprehensive study aimed at exploring this relationship by leveraging thousands of transcriptomic experiments and target data for over 2000 compounds. Firstly, we confirm that compound-target information does not correlate as expected with the transcriptomic signatures induced by a compound. However, we reveal how the concordance between both modalities increases by connecting pathway and target information. Additionally, we investigate whether compounds that target the same proteins induce a similar transcriptomic response and conversely, whether compounds with similar transcriptomic responses share the same target proteins. While our findings suggest that this is generally not the case, we did observe that compounds with similar transcriptomic profiles are more likely to share at least one protein target and common therapeutic applications. Finally, we demonstrate how to exploit the relationship between both modalities for mechanism of action deconvolution by presenting a case scenario involving a few compound pairs with high similarity.
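
One simple way to compare the two modalities discussed in this abstract is to compute, for a pair of compounds, the overlap of their target sets and the similarity of their transcriptomic signatures. The sketch below uses Jaccard overlap and cosine similarity; both measures, the function names and the toy data layout are assumptions for illustration and are not taken from the study.

```python
from math import sqrt


def jaccard(targets_a, targets_b):
    """Overlap of two target sets: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(targets_a), set(targets_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def signature_cosine(sig_a, sig_b):
    """Cosine similarity of two signatures ({gene: value}) over shared genes."""
    genes = set(sig_a) & set(sig_b)
    dot = sum(sig_a[g] * sig_b[g] for g in genes)
    norm_a = sqrt(sum(sig_a[g] ** 2 for g in genes))
    norm_b = sqrt(sum(sig_b[g] ** 2 for g in genes))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compare_pair(targets, signatures, c1, c2):
    """Return (target overlap, signature similarity) for one compound pair."""
    return jaccard(targets[c1], targets[c2]), signature_cosine(signatures[c1], signatures[c2])
```

A pair with high signature similarity but low target overlap is the kind of case where additional pathway information would be needed to connect the two views.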


Subject(s)
Gene Expression Profiling , Transcriptome , Proteins
5.
Bioinformatics ; 38(6): 1648-1656, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34986221

ABSTRACT

MOTIVATION: The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models. However, representations based on a single modality are inherently limited. RESULTS: To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs (KGs). This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations in a shared embedding space. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against three baseline models trained on either one of the modalities (i.e. text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the more challenging tasks with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.084 (i.e. from 0.881 to 0.965). Finally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications. AVAILABILITY AND IMPLEMENTATION: We make the source code and the Python package of STonKGs available at GitHub (https://github.com/stonkgs/stonkgs) and PyPI (https://pypi.org/project/stonkgs/). The pre-trained STonKGs models and the task-specific classification models are respectively available at https://huggingface.co/stonkgs/stonkgs-150k and https://zenodo.org/communities/stonkgs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Automated Pattern Recognition , Software , Machine Learning , Natural Language Processing , Publications
6.
Bioinformatics ; 38(15): 3850-3852, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35652780

ABSTRACT

MOTIVATION: The importance of clinical data in understanding the pathophysiology of complex disorders has prompted the launch of multiple initiatives designed to generate patient-level data from various modalities. While these studies can reveal important findings relevant to the disease, each study captures different yet complementary aspects and modalities which, when combined, generate a more comprehensive picture of disease etiology. However, achieving this requires a global integration of data across studies, which proves to be challenging given the lack of interoperability of cohort datasets. RESULTS: Here, we present the Data Steward Tool (DST), an application that allows for semi-automatic semantic integration of clinical data into ontologies and global data models and data standards. We demonstrate the applicability of the tool in the field of dementia research by establishing a Clinical Data Model (CDM) in this domain. The CDM currently consists of 277 common variables covering demographics (e.g. age and gender), diagnostics, neuropsychological tests and biomarker measurements. The DST combined with this disease-specific data model shows how interoperability between multiple, heterogeneous dementia datasets can be achieved. AVAILABILITY AND IMPLEMENTATION: The DST source code and Docker images are respectively available at https://github.com/SCAI-BIO/data-steward and https://hub.docker.com/r/phwegner/data-steward. Furthermore, the DST is hosted at https://data-steward.bio.scai.fraunhofer.de/data-steward. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Dementia , Semantics , Humans , Software , Dementia/diagnosis
7.
PLoS Comput Biol ; 18(2): e1009909, 2022 02.
Article in English | MEDLINE | ID: mdl-35213534

ABSTRACT

Network-based approaches are becoming increasingly popular for drug discovery as they provide a systems-level overview of the mechanisms underlying disease pathophysiology. They have demonstrated significant early promise over other methods of biological data representation, such as in target discovery, side effect prediction and drug repurposing. In parallel, an explosion of -omics data for the deep characterization of biological systems routinely uncovers molecular signatures of disease for similar applications. Here, we present RPath, a novel algorithm that prioritizes drugs for a given disease by reasoning over causal paths in a knowledge graph (KG), guided by both drug-perturbed as well as disease-specific transcriptomic signatures. First, our approach identifies the causal paths that connect a drug to a particular disease. Next, it reasons over these paths to identify those that correlate with the transcriptional signatures observed in a drug-perturbation experiment, and anti-correlate to signatures observed in the disease of interest. The paths which match this signature profile are then proposed to represent the mechanism of action of the drug. We demonstrate how RPath consistently prioritizes clinically investigated drug-disease pairs on multiple datasets and KGs, achieving better performance over other similar methodologies. Furthermore, we present two case studies showing how one can deconvolute the predictions made by RPath as well as predict novel targets.
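
The reasoning step described in this abstract (keep paths that agree with the drug signature and oppose the disease signature) can be illustrated with a much simplified check on a single path. The sketch below is not the RPath implementation: the encoding of edges as +1 (activates) or -1 (inhibits), the encoding of signatures as +1 (up) or -1 (down), and the decision to ignore nodes missing from a signature are all assumptions made for illustration.

```python
def predicted_effects(path_nodes, edge_signs):
    """Propagate the drug's effect along a path.
    path_nodes[0] is the drug; edge_signs[i] is the sign of the edge
    from path_nodes[i] to path_nodes[i + 1] (+1 activates, -1 inhibits)."""
    effects = {}
    current = 1
    for node, sign in zip(path_nodes[1:], edge_signs):
        current *= sign
        effects[node] = current
    return effects


def path_is_concordant(path_nodes, edge_signs, drug_signature, disease_signature):
    """True if every measured node on the path matches the drug signature
    and is opposite to the disease signature. Unmeasured nodes are skipped."""
    for node, effect in predicted_effects(path_nodes, edge_signs).items():
        if node in drug_signature and drug_signature[node] != effect:
            return False
        if node in disease_signature and disease_signature[node] != -effect:
            return False
    return True


# Hypothetical path: the drug inhibits A, A activates B, B activates the disease.
path = ["drug", "A", "B", "disease"]
signs = [-1, 1, 1]
```

Under these assumptions, a path that passes the check is a candidate explanation of how the drug could counteract the disease signature.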


Subject(s)
Automated Pattern Recognition , Transcriptome , Algorithms , Drug Discovery/methods , Drug Repositioning/methods , Transcriptome/genetics
8.
Nucleic Acids Res ; 49(14): 7939-7953, 2021 08 20.
Article in English | MEDLINE | ID: mdl-34197603

ABSTRACT

We attempt to address a key question in the joint analysis of transcriptomic data: can we correlate the patterns we observe in transcriptomic datasets to known interactions and pathway knowledge to broaden our understanding of disease pathophysiology? We present a systematic approach that sheds light on the patterns observed in hundreds of transcriptomic datasets from over sixty indications by using pathways and molecular interactions as a template. Our analysis employs transcriptomic datasets to construct dozens of disease specific co-expression networks, alongside a human protein-protein interactome network. Leveraging the interoperability between these two network templates, we explore patterns both common and particular to these diseases on three different levels. Firstly, at the node-level, we identify most and least common proteins across diseases and evaluate their consistency against the interactome as a proxy for their prevalence in the scientific literature. Secondly, we overlay both network templates to analyze common correlations and interactions across diseases at the edge-level. Thirdly, we explore the similarity between patterns observed at the disease-level and pathway knowledge to identify signatures associated with specific diseases and indication areas. Finally, we present a case scenario in schizophrenia, where we show how our approach can be used to investigate disease pathophysiology.
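
The edge-level overlay described in this abstract can be pictured as two set operations: counting in how many disease networks each correlation appears, and checking which of those recurrent edges are also known interactions. The sketch below is a hypothetical illustration with undirected edges stored as node pairs; the function names and the `min_diseases` cut-off are assumptions, not values from the paper.

```python
from collections import Counter


def normalize(edge):
    """Treat edges as undirected: ("B", "A") and ("A", "B") are the same edge."""
    return tuple(sorted(edge))


def edge_prevalence(disease_networks):
    """Count, for each edge, the number of disease networks that contain it."""
    counts = Counter()
    for edges in disease_networks.values():
        counts.update({normalize(e) for e in edges})
    return counts


def supported_by_interactome(counts, interactome_edges, min_diseases=2):
    """Edges seen in at least min_diseases networks that are also in the interactome."""
    known = {normalize(e) for e in interactome_edges}
    return sorted(e for e, n in counts.items() if n >= min_diseases and e in known)
```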


Subject(s)
Disease/genetics , Gene Expression Profiling/methods , Gene Regulatory Networks , Genetic Predisposition to Disease/genetics , Signal Transduction/genetics , Transcriptome/genetics , Algorithms , Cluster Analysis , Humans , Schizophrenia/genetics
9.
BMC Bioinformatics ; 23(1): 231, 2022 Jun 15.
Article in English | MEDLINE | ID: mdl-35705903

ABSTRACT

Distinct gene expression patterns within cells are foundational for the diversity of functions and unique characteristics observed in specific contexts, such as human tissues and cell types. Though some biological processes commonly occur across contexts, by harnessing the vast amounts of available gene expression data, we can decipher the processes that are unique to a specific context. Therefore, with the goal of developing a portrait of context-specific patterns to better elucidate how they govern distinct biological processes, this work presents a large-scale exploration of transcriptomic signatures across three different contexts (i.e., tissues, cell types, and cell lines) by leveraging over 600 gene expression datasets categorized into 98 subcontexts. The strongest pairwise correlations between genes from these subcontexts are used for the construction of co-expression networks. Using a network-based approach, we then pinpoint patterns that are unique and common across these subcontexts. First, we focused on patterns at the level of individual nodes and evaluated their functional roles using a human protein-protein interactome as a referential network. Next, within each context, we systematically overlaid the co-expression networks to identify specific and shared correlations as well as relations already described in scientific literature. Additionally, in a pathway-level analysis, we overlaid node and edge sets from co-expression networks against pathway knowledge to identify biological processes that are related to specific subcontexts or groups of them. Finally, we have released our data and scripts at https://zenodo.org/record/5831786 and https://github.com/ContNeXt/ , respectively and developed ContNeXt ( https://contnext.scai.fraunhofer.de/ ), a web application to explore the networks generated in this work.
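
Building a co-expression network from the strongest pairwise correlations, as this abstract describes, can be sketched in a few lines. The example below uses Pearson correlation and keeps a fixed top fraction of gene pairs ranked by absolute correlation; the correlation measure, the `top_fraction` cut-off and the data layout are assumptions for illustration and may differ from the ContNeXt scripts.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def coexpression_edges(expression, top_fraction=0.01):
    """expression maps gene -> list of values across samples.
    Returns the strongest gene pairs as (gene1, gene2, |r|) tuples."""
    scored = [
        (abs(pearson(expression[g1], expression[g2])), g1, g2)
        for g1, g2 in combinations(sorted(expression), 2)
    ]
    scored.sort(reverse=True)
    keep = max(1, int(len(scored) * top_fraction))
    return [(g1, g2, r) for r, g1, g2 in scored[:keep]]
```

The all-pairs loop is quadratic in the number of genes, so a real analysis over thousands of genes would normally rely on a vectorized correlation matrix instead.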


Subject(s)
Gene Regulatory Networks , Transcriptome , Gene Expression Profiling , Humans , Software
10.
Bioinformatics ; 37(1): 137-139, 2021 Apr 09.
Article in English | MEDLINE | ID: mdl-33367476

ABSTRACT

SUMMARY: High-throughput screening yields vast amounts of biological data which can be highly challenging to interpret. In response, knowledge-driven approaches emerged as possible solutions to analyze large datasets by leveraging prior knowledge of biomolecular interactions represented in the form of biological networks. Nonetheless, given their size and complexity, their manual investigation quickly becomes impractical. Thus, computational approaches, such as diffusion algorithms, are often employed to interpret and contextualize the results of high-throughput experiments. Here, we present MultiPaths, a framework consisting of two independent Python packages for network analysis. While the first package, DiffuPy, comprises numerous commonly used diffusion algorithms applicable to any generic network, the second, DiffuPath, enables the application of these algorithms on multi-layer biological networks. To facilitate its usability, the framework includes a command line interface, reproducible examples and documentation. To demonstrate the framework, we conducted several diffusion experiments on three independent multi-omics datasets over disparate networks generated from pathway databases, thus, highlighting the ability of multi-layer networks to integrate multiple modalities. Finally, the results of these experiments demonstrate how the generation of harmonized networks from disparate databases can improve predictive performance with respect to individual resources. AVAILABILITY AND IMPLEMENTATION: DiffuPy and DiffuPath are publicly available under the Apache License 2.0 at https://github.com/multipaths. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
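
To show what a diffusion algorithm does on a network, the sketch below implements random walk with restart, one widely used diffusion method. It is written in plain Python and does not use or reproduce the DiffuPy API; the function name, the `restart` value and the fixed iteration count are assumptions. Seed nodes (for example, measured hits) receive the initial signal, which then spreads to neighboring nodes.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, iterations=100):
    """adjacency maps node -> list of neighbors (each neighbor must also be a key,
    and every node is assumed to have at least one neighbor).
    Returns a score per node; scores sum to 1 under those assumptions."""
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        # Part of the signal jumps back to the seeds at every step.
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                continue
            # The rest is split evenly among the neighbors of each node.
            share = (1.0 - restart) * scores[n] / len(neighbors)
            for m in neighbors:
                updated[m] += share
        scores = updated
    return scores


# Hypothetical path-shaped network A - B - C - D with a single seed.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
diffused = random_walk_with_restart(network, seeds={"A"})
```

In this toy network the score decreases with distance from the seed, which is the behavior that makes diffusion scores useful for prioritizing nodes near measured hits.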

11.
Bioinformatics ; 37(19): 3311-3318, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33964127

ABSTRACT

SUMMARY: As machine learning and artificial intelligence increasingly attain a larger number of applications in the biomedical domain, at their core, their utility depends on the data used to train them. Due to the complexity and high dimensionality of biomedical data, there is a need for approaches that combine prior knowledge around known biological interactions with patient data. Here, we present CLinical Embedding of Patients (CLEP), a novel approach that generates new patient representations by leveraging both prior knowledge and patient-level data. First, given a patient-level dataset and a knowledge graph containing relations across features that can be mapped to the dataset, CLEP incorporates patients into the knowledge graph as new nodes connected to their most characteristic features. Next, CLEP employs knowledge graph embedding models to generate new patient representations that can ultimately be used for a variety of downstream tasks, ranging from clustering to classification. We demonstrate how using new patient representations generated by CLEP significantly improves performance in classifying between patients and healthy controls for a variety of machine learning models, as compared to the use of the original transcriptomics data. Furthermore, we also show how incorporating patients into a knowledge graph can foster the interpretation and identification of biological features characteristic of a specific disease or patient subgroup. Finally, we released CLEP as an open source Python package together with examples and documentation. AVAILABILITY AND IMPLEMENTATION: CLEP is available to the bioinformatics community as an open source Python package at https://github.com/hybrid-kg/clep under the Apache 2.0 License. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
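
The first step described in this abstract, adding patients to a knowledge graph as nodes linked to their most characteristic features, can be illustrated as follows. This is a hypothetical sketch and not the CLEP package: "most characteristic" is approximated here by a z-score against control samples, and the threshold and relation names are assumptions.

```python
from statistics import mean, stdev


def patient_edges(patients, controls, threshold=2.0):
    """patients maps patient id -> {feature: value};
    controls maps feature -> list of control values (at least two per feature).
    Returns (patient, relation, feature) triples for extreme feature values."""
    edges = []
    for feature, control_values in controls.items():
        mu, sigma = mean(control_values), stdev(control_values)
        if sigma == 0:
            continue  # a constant feature cannot be standardized
        for patient, values in patients.items():
            z = (values[feature] - mu) / sigma
            if z >= threshold:
                edges.append((patient, "positively_associated", feature))
            elif z <= -threshold:
                edges.append((patient, "negatively_associated", feature))
    return edges
```

The resulting triples could then be appended to the knowledge graph's edge list before an embedding model is trained on the combined graph.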

12.
Bioinformatics ; 37(9): 1332-1334, 2021 06 09.
Article in English | MEDLINE | ID: mdl-32976572

ABSTRACT

SUMMARY: The COVID-19 crisis has elicited a global response by the scientific community that has led to a burst of publications on the pathophysiology of the virus. However, without coordinated efforts to organize this knowledge, it can remain hidden away from individual research groups. By extracting and formalizing this knowledge in a structured and computable form, as in the form of a knowledge graph, researchers can readily reason and analyze this information on a much larger scale. Here, we present the COVID-19 Knowledge Graph, an expansive cause-and-effect network constructed from scientific literature on the new coronavirus that aims to provide a comprehensive view of its pathophysiology. To make this resource available to the research community and facilitate its exploration and analysis, we also implemented a web application and released the KG in multiple standard formats. AVAILABILITY AND IMPLEMENTATION: The COVID-19 Knowledge Graph is publicly available under CC-0 license at https://github.com/covid19kg and https://bikmi.covid19-knowledgespace.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
COVID-19 , Software , Humans , Automated Pattern Recognition , Publications , SARS-CoV-2
13.
PLoS Comput Biol ; 16(12): e1008464, 2020 12.
Article in English | MEDLINE | ID: mdl-33264280

ABSTRACT

Elucidating the causal mechanisms responsible for disease can reveal potential therapeutic targets for pharmacological intervention and, accordingly, guide drug repositioning and discovery. In essence, the topology of a network can reveal the impact a drug candidate may have on a given biological state, leading the way for enhanced disease characterization and the design of advanced therapies. Network-based approaches, in particular, are highly suited for these purposes as they hold the capacity to identify the molecular mechanisms underlying disease. Here, we present drug2ways, a novel methodology that leverages multimodal causal networks for predicting drug candidates. Drug2ways implements an efficient algorithm which reasons over causal paths in large-scale biological networks to propose drug candidates for a given disease. We validate our approach using clinical trial information and demonstrate how drug2ways can be used for multiple applications to identify: i) single-target drug candidates, ii) candidates with polypharmacological properties that can optimize multiple targets, and iii) candidates for combination therapy. Finally, we make drug2ways available to the scientific community as a Python package that enables conducting these applications on multiple standard network formats.
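
Reasoning over causal paths between a drug and a disease can be illustrated by enumerating paths in a signed directed graph and multiplying the edge signs along each path. The sketch below is not the drug2ways algorithm or its API: it only considers simple paths (no repeated nodes) up to a maximum number of edges, and the graph encoding and the `inhibition_ratio` summary are assumptions for illustration.

```python
def signed_paths(graph, source, target, max_length):
    """Yield (path, sign) for simple paths from source to target with at most
    max_length edges. graph[u] maps each successor v to +1 (activates) or -1 (inhibits)."""
    stack = [(source, [source], 1)]
    while stack:
        node, path, sign = stack.pop()
        if node == target:
            yield path, sign
            continue
        if len(path) > max_length:
            continue  # the path already uses max_length edges
        for successor, edge_sign in graph.get(node, {}).items():
            if successor not in path:
                stack.append((successor, path + [successor], sign * edge_sign))


def inhibition_ratio(graph, drug, disease, max_length=4):
    """Fraction of drug-to-disease paths whose overall effect is inhibitory."""
    signs = [sign for _, sign in signed_paths(graph, drug, disease, max_length)]
    return sum(s == -1 for s in signs) / len(signs) if signs else 0.0


# Hypothetical causal network with three routes from the drug to the disease.
causal_graph = {
    "drug": {"T1": -1, "T2": 1},
    "T1": {"disease": 1},
    "T2": {"X": 1, "disease": 1},
    "X": {"disease": -1},
}
```

A drug whose paths mostly inhibit the disease node would rank higher as a candidate under this simplified scoring.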


Subject(s)
Drug Discovery/methods , Drug Repositioning/methods , Biological Models , Algorithms , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Computer Simulation , Drug Therapy , Humans , Neoplasms/drug therapy , Phenotype , Polypharmacology
14.
BMC Bioinformatics ; 21(1): 231, 2020 Jun 05.
Article in English | MEDLINE | ID: mdl-32503412

ABSTRACT

BACKGROUND: During the last decade, there has been a surge towards computational drug repositioning owing to constantly increasing -omics data in the biomedical research field. While numerous existing methods focus on the integration of heterogeneous data to propose candidate drugs, it is still challenging to substantiate their results with mechanistic insights of these candidate drugs. Therefore, there is a need for more innovative and efficient methods which can enable better integration of data and knowledge for drug repositioning. RESULTS: Here, we present a customizable workflow (PS4DR) which not only integrates high-throughput data such as genome-wide association study (GWAS) data and gene expression signatures from disease and drug perturbations but also takes pathway knowledge into consideration to predict drug candidates for repositioning. We have collected and integrated publicly available GWAS data and gene expression signatures for several diseases and hundreds of FDA-approved drugs or those under clinical trial in this study. Additionally, different pathway databases were used for mechanistic knowledge integration in the workflow. Using this systematic consolidation of data and knowledge, the workflow computes pathway signatures that assist in the prediction of new indications for approved and investigational drugs. CONCLUSION: We showcase PS4DR with applications demonstrating how this tool can be used for repositioning and identifying new drugs as well as proposing drugs that can simulate disease dysregulations. We were able to validate our workflow by demonstrating its capability to predict FDA-approved drugs for their known indications for several diseases. Further, PS4DR returned many potential drug candidates for repositioning that were backed up by epidemiological evidence extracted from scientific literature. Source code is freely available at https://github.com/ps4dr/ps4dr.


Subject(s)
Pharmaceutical Preparations/metabolism , User-Computer Interface , Clinical Trials as Topic , Computational Biology/methods , Drug Repositioning , Genome-Wide Association Study , Humans , Transcriptome , Workflow
15.
Bioinformatics ; 35(18): 3538-3540, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30768158

ABSTRACT

SUMMARY: Knowledge graph embeddings (KGEs) have received significant attention in other domains due to their ability to predict links and create dense representations for graphs' nodes and edges. However, the software ecosystem for their application to bioinformatics remains limited and inaccessible for users without expertise in programming and machine learning. Therefore, we developed BioKEEN (Biological KnowlEdge EmbeddiNgs) and PyKEEN (Python KnowlEdge EmbeddiNgs) to facilitate their easy use through an interactive command line interface. Finally, we present a case study in which we used a novel biological pathway mapping resource to predict links that represent pathway crosstalks and hierarchies. AVAILABILITY AND IMPLEMENTATION: BioKEEN and PyKEEN are open source Python packages publicly available under the MIT License at https://github.com/SmartDataAnalytics/BioKEEN and https://github.com/SmartDataAnalytics/PyKEEN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
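
To illustrate how an embedding model predicts links, the sketch below scores triples with the TransE formulation, a well-known knowledge graph embedding model in which a true triple should satisfy head + relation ≈ tail. This is plain Python for illustration only and does not use the BioKEEN or PyKEEN interfaces; the embeddings are hand-written toy vectors rather than trained ones, and the function names are hypothetical.

```python
from math import sqrt


def transe_score(head, relation, tail):
    """TransE plausibility: negative Euclidean distance between head + relation and tail.
    Scores closer to zero indicate a more plausible triple."""
    return -sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, entity_embeddings):
    """Order candidate tail entities from most to least plausible."""
    return sorted(
        entity_embeddings,
        key=lambda entity: transe_score(head, relation, entity_embeddings[entity]),
        reverse=True,
    )


# Toy two-dimensional embeddings (hypothetical values, not trained).
entities = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [5.0, 5.0]}
relation_vector = [1.0, 0.0]
ranking = rank_tails(entities["A"], relation_vector, entities)
```

In a trained model the entity and relation vectors are learned so that observed triples receive higher scores than corrupted ones; link prediction then amounts to the ranking step shown here.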


Subject(s)
Automated Pattern Recognition , Software , Ecosystem , Gene Library , Machine Learning
16.
BMC Bioinformatics ; 20(1): 243, 2019 May 15.
Article in English | MEDLINE | ID: mdl-31092193

ABSTRACT

BACKGROUND: The complexity of representing biological systems is compounded by an ever-expanding body of knowledge emerging from multi-omics experiments. A number of pathway databases have facilitated pathway-centric approaches that assist in the interpretation of molecular signatures yielded by these experiments. However, the lack of interoperability between pathway databases has hindered the ability to harmonize these resources and to exploit their consolidated knowledge. Such a unification of pathway knowledge is imperative in enhancing the comprehension and modeling of biological abstractions. RESULTS: Here, we present PathMe, a Python package that transforms pathway knowledge from three major pathway databases into a unified abstraction using Biological Expression Language as the pivotal, integrative schema. PathMe is complemented by a novel web application (freely available at https://pathme.scai.fraunhofer.de/ ) which allows users to comprehensively explore pathway crosstalk and compare areas of consensus and discrepancies. CONCLUSIONS: This work has harmonized three major pathway databases and transformed them into a unified schema in order to gain a holistic picture of pathway knowledge. We demonstrate the utility of the PathMe framework in: i) integrating pathway landscapes at the database level, ii) comparing the degree of consensus at the pathway level, and iii) exploring pathway crosstalk and investigating consensus at the molecular level.


Subject(s)
Signal Transduction , Software , Computational Biology , Databases as Topic , Factual Databases , Humans , TOR Serine-Threonine Kinases/metabolism
17.
Bioinformatics ; 33(22): 3679-3681, 2017 Nov 15.
Article in English | MEDLINE | ID: mdl-28651363

ABSTRACT

MOTIVATION: The concept of a 'mechanism-based taxonomy of human disease' is currently replacing the outdated paradigm of diseases classified by clinical appearance. We have tackled the paradigm of mechanism-based patient subgroup identification in the challenging area of research on neurodegenerative diseases. RESULTS: We have developed a knowledge base representing essential pathophysiology mechanisms of neurodegenerative diseases. Together with dedicated algorithms, this knowledge base forms the basis for a 'mechanism-enrichment server' that supports the mechanistic interpretation of multiscale, multimodal clinical data. AVAILABILITY AND IMPLEMENTATION: NeuroMMSig is available at http://neurommsig.scai.fraunhofer.de/. CONTACT: martin.hofmann-apitius@scai.fraunhofer.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Knowledge Bases , Neurodegenerative Diseases/metabolism , Neurodegenerative Diseases/physiopathology , Humans , Internet , Biological Models , Neurodegenerative Diseases/genetics , Software
18.
J Am Soc Mass Spectrom ; 35(2): 266-274, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38271611

ABSTRACT

Calculating spectral similarity is a fundamental step in MS/MS data analysis in untargeted metabolomics experiments, as it facilitates the identification of related spectra and the annotation of compounds. To improve matching accuracy when querying an experimental mass spectrum against a spectral library, previous approaches have proposed increasing peak intensities for high m/z ranges. These high m/z values tend to be smaller in magnitude, yet they offer more crucial information for identifying the chemical structure. Here, we evaluate the impact of using these weights for identifying structurally related compounds and mass spectral library searches. Additionally, we propose a weighting approach that (i) takes into account the frequency of the m/z values within a spectral library in order to assign higher importance to the most common peaks and (ii) increases the intensity of lower peaks, similar to previous approaches. To demonstrate our approach, we applied weighting preprocessing to modified cosine, entropy, and fidelity distance metrics and benchmarked it against previously reported weights. Our results demonstrate how weighting-based preprocessing can assist in annotating the structure of unknown spectra as well as identifying structurally similar compounds. Finally, we examined scenarios in which the utilization of weights resulted in diminished performance, pinpointing spectral features where the application of weights might be detrimental.
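
The idea of weighting peaks before computing spectral similarity can be sketched with the classic form of weighting, in which each intensity is raised to a power and multiplied by a power of its m/z. The example below applies such weights before a plain cosine similarity with greedy peak matching. It illustrates the earlier m/z-based weighting that the abstract refers to, not the frequency-based weights proposed in the paper; the exponents, the tolerance and the function names are assumptions, and the modified cosine, entropy and fidelity metrics are not covered.

```python
from math import sqrt


def weight_peaks(peaks, mz_power=1.0, intensity_power=0.5):
    """peaks is a list of (mz, intensity). Returns (mz, weighted intensity),
    where the weight is intensity ** intensity_power * mz ** mz_power."""
    return [(mz, (intensity ** intensity_power) * (mz ** mz_power)) for mz, intensity in peaks]


def cosine_similarity(peaks_a, peaks_b, tolerance=0.01):
    """Cosine similarity after matching each peak in A to the closest
    still-unmatched peak in B whose m/z lies within the tolerance."""
    used = set()
    dot = 0.0
    for mz_a, value_a in peaks_a:
        best = None
        for j, (mz_b, _) in enumerate(peaks_b):
            if j in used or abs(mz_a - mz_b) > tolerance:
                continue
            if best is None or abs(mz_a - mz_b) < abs(mz_a - peaks_b[best][0]):
                best = j
        if best is not None:
            used.add(best)
            dot += value_a * peaks_b[best][1]
    norm_a = sqrt(sum(v * v for _, v in peaks_a))
    norm_b = sqrt(sum(v * v for _, v in peaks_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two hypothetical spectra that share only a weak peak at high m/z.
spectrum_a = [(100.0, 100.0), (500.0, 10.0)]
spectrum_b = [(150.0, 100.0), (500.0, 10.0)]
plain = cosine_similarity(spectrum_a, spectrum_b)
weighted = cosine_similarity(weight_peaks(spectrum_a), weight_peaks(spectrum_b))
```

In this toy case the shared high-m/z peak dominates once weights are applied, so the weighted similarity is much higher than the unweighted one; the abstract notes that such weighting can also hurt performance for some spectra.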


Subject(s)
Metabolomics , Tandem Mass Spectrometry , Metabolomics/methods , Ions
19.
Heliyon ; 9(11): e21502, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38027969

ABSTRACT

Objectives: Knowledge graphs and ontologies in the biomedical domain provide rich contextual knowledge for a variety of challenges. Employing this knowledge for knowledge-driven NLP tasks such as gene-disease association prediction represents a promising way to increase the predictive power of a model. Methods: We investigated the power of infusing the embedding of two aligned ontologies as prior knowledge to the NLP models. We evaluated the performance of different models on some large-scale gene-disease association datasets and compared it with a model without incorporating contextualized knowledge (BERT). Results: The experiments demonstrated that the knowledge-infused model slightly outperforms BERT by creating a small number of bridges. This indicates that incorporating cross-references across ontologies can enhance the performance of base models without the need for more complex and costly training. However, further research is needed to explore the generalizability of the model. We expected that adding more bridges would bring further improvement based on the trend we observed in the experiments. In addition, the use of state-of-the-art knowledge graph embedding methods on a joint graph from connecting OGG and DOID with bridges also yielded promising results. Conclusion: Our work shows that allowing language models to leverage structured knowledge from ontologies comes with clear performance advantages. In addition, the annotation stage described in this paper remains of reasonable complexity.

20.
J Cheminform ; 15(1): 107, 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37950325

ABSTRACT

Plants are one of the primary sources of natural products for drug development. However, despite centuries of research, only a limited region of the phytochemical space has been studied. To understand the scope of what is explored versus unexplored in the phytochemical space, we begin by reconstructing the known chemical space of the plant kingdom, mapping the distribution of secondary metabolites, chemical classes, and plants traditionally used for medicinal purposes (i.e., medicinal plants) across various levels of the taxonomy. We identify hotspot taxonomic clades occupied by a large proportion of medicinal plants and characterized secondary metabolites, as well as clades requiring further characterization with regard to their chemical composition. In a complementary analysis, we build a chemotaxonomy which has a high level of concordance with the taxonomy at the genus level, highlighting the close relationship between chemical profiles and evolutionary relationships within the plant kingdom. Next, we delve into regions of the phytochemical space with known bioactivity that have been used in modern drug discovery. While we find that the vast majority of approved drugs from phytochemicals are derived from known medicinal plants, we also show that medicinal and non-medicinal plants do not occupy distinct regions of the known phytochemical landscape and their phytochemicals exhibit properties similar to bioactive compounds. Moreover, we also reveal that only a few thousand phytochemicals have been screened for bioactivity and that there are hundreds of known bioactive compounds present in both medicinal and non-medicinal plants, suggesting that non-medicinal plants also have potential therapeutic applications. Overall, these results support the hypothesis that there are many plants with medicinal properties awaiting discovery.
