ABSTRACT
Natural products (NPs) or their derivatives represent a large proportion of drugs that successfully progress through clinical trials to approval. This study explores the presence of NPs in both early- and late-stage drug discovery to determine their success rate, and the factors or features of natural products that contribute to such success. As a proxy for early drug development stages, we analyzed patent applications over several decades, finding a consistent proportion of NP, NP-derived, and synthetic-compound-based patent documents, with the latter group outnumbering NP and NP-derived ones (approximately 77% vs 23%). We next assessed clinical trial data, where we observed a steady increase in NP and NP-derived compounds from clinical trial phases I to III (from approximately 35% in phase I to 45% in phase III), with an inverse trend observed in synthetics (from approximately 65% in phase I to 55% in phase III). Finally, in vitro and in silico toxicity studies revealed that NPs and their derivatives were less toxic alternatives to their synthetic counterparts. These discoveries offer valuable insights for successful NP-based drug development, highlighting the potential benefits of prioritizing NPs and their derivatives as starting points.
Subjects
Biological Products; Drug Development; Biological Products/chemistry; Biological Products/pharmacology; Humans; Clinical Trials as Topic; Drug Discovery; Molecular Structure
ABSTRACT
Plants are one of the primary sources of natural products for drug development. However, despite centuries of research, only a limited region of the phytochemical space has been studied. To understand the scope of what is explored versus unexplored in the phytochemical space, we begin by reconstructing the known chemical space of the plant kingdom, mapping the distribution of secondary metabolites, chemical classes, and plants traditionally used for medicinal purposes (i.e., medicinal plants) across various levels of the taxonomy. We identify hotspot taxonomic clades occupied by a large proportion of medicinal plants and characterized secondary metabolites, as well as clades requiring further characterization with regard to their chemical composition. In a complementary analysis, we build a chemotaxonomy which has a high level of concordance with the taxonomy at the genus level, highlighting the close relationship between chemical profiles and evolutionary relationships within the plant kingdom. Next, we delve into regions of the phytochemical space with known bioactivity that have been used in modern drug discovery. While we find that the vast majority of approved drugs from phytochemicals are derived from known medicinal plants, we also show that medicinal and non-medicinal plants do not occupy distinct regions of the known phytochemical landscape and their phytochemicals exhibit properties similar to bioactive compounds. Moreover, we also reveal that only a few thousand phytochemicals have been screened for bioactivity and that there are hundreds of known bioactive compounds present in both medicinal and non-medicinal plants, suggesting that non-medicinal plants also have potential therapeutic applications. Overall, these results support the hypothesis that there are many plants with medicinal properties awaiting discovery.
ABSTRACT
For millennia, numerous cultures and civilizations have relied on traditional remedies derived from plants to treat a wide range of conditions and ailments. Here, we systematically analyzed ethnobotanical patterns across taxonomically related plants, demonstrating that congeneric medicinal plants are more likely to be used for treating similar indications. Next, we reconstructed the phytochemical space covered by medicinal plants to reveal that (i) taxonomically related medicinal plants cover a similar phytochemical space, and (ii) chemical similarity correlates with similar therapeutic usage. Lastly, we present several case scenarios illustrating how mining this information can be used for drug discovery applications, including: (i) investigating taxonomic hotspots around particular indications, (ii) exploring shared patterns of congeneric plants located in different geographic areas, but which have been used to treat the same indications, and (iii) showing the concordance between ethnobotanical patterns among non-taxonomically related plants and the presence of shared bioactive phytochemicals.
ABSTRACT
Schizophrenia and bipolar disorder are characterized by highly similar neuropsychological signatures, implying shared neurobiological mechanisms between these two disorders. These disorders also share comorbidities, such as type 2 diabetes mellitus (T2DM). To date, the mechanisms that mediate the link between these two disorders remain incompletely understood. In this work, we identify and investigate shared patterns across multiple schizophrenia, bipolar disorder and T2DM gene expression datasets through multiple strategies. Firstly, we investigate dysregulation patterns at the gene level and compare our findings against disease-specific knowledge graphs (KGs). Secondly, we analyze the concordance of co-expression patterns across datasets to identify disease-specific as well as common pathways. Thirdly, we examine enriched pathways across datasets and disorders to identify common biological mechanisms between them. Lastly, we investigate the correspondence of shared genetic variants between these two disorders and T2DM as well as the disease-specific KGs. In conclusion, our work reveals several shared candidate genes and pathways, particularly those related to the immune system (such as the TNF, IL-17 and NF-kappa B signaling pathways) and the nervous system (such as the dopaminergic synapse and GABAergic synapse pathways), which we propose mediate the link between schizophrenia and bipolar disorder and their shared comorbidity, T2DM.
Subjects
Bipolar Disorder; Diabetes Mellitus, Type 2; Schizophrenia; Humans; Bipolar Disorder/psychology; Schizophrenia/epidemiology; Schizophrenia/genetics; Comorbidity; Signal Transduction
ABSTRACT
Excess labile heme, occurring under hemolytic conditions, acts as a versatile modulator in the blood coagulation system. As such, heme provokes prothrombotic states, either by binding to plasma proteins or through interaction with participating cell types. However, despite several independent reports on these effects, apparently contradictory observations and significant knowledge gaps characterize this relationship, which hampers a complete understanding of heme-driven coagulopathies and the development of suitable and specific treatment options. Thus, a computational exploration of the complex network of heme-triggered effects in the blood coagulation system is presented herein. Combining hemostasis- and heme-specific terminology, the knowledge available thus far was curated and modeled in a mechanistic interactome. Further, these data were incorporated in the earlier established heme knowledge graph, "HemeKG", to better comprehend the knowledge surrounding heme biology. Finally, a pathway enrichment analysis of these data provided deep insights into so far unknown links and novel experimental targets within the blood coagulation cascade and platelet activation pathways for further investigation of the prothrombotic nature of heme. In summary, this study allows, for the first time, a detailed network analysis of the effects of heme in the blood coagulation system.
ABSTRACT
Distinct gene expression patterns within cells are foundational for the diversity of functions and unique characteristics observed in specific contexts, such as human tissues and cell types. Though some biological processes commonly occur across contexts, by harnessing the vast amounts of available gene expression data, we can decipher the processes that are unique to a specific context. Therefore, with the goal of developing a portrait of context-specific patterns to better elucidate how they govern distinct biological processes, this work presents a large-scale exploration of transcriptomic signatures across three different contexts (i.e., tissues, cell types, and cell lines) by leveraging over 600 gene expression datasets categorized into 98 subcontexts. The strongest pairwise correlations between genes from these subcontexts are used for the construction of co-expression networks. Using a network-based approach, we then pinpoint patterns that are unique and common across these subcontexts. First, we focused on patterns at the level of individual nodes and evaluated their functional roles using a human protein-protein interactome as a reference network. Next, within each context, we systematically overlaid the co-expression networks to identify specific and shared correlations as well as relations already described in scientific literature. Additionally, in a pathway-level analysis, we overlaid node and edge sets from co-expression networks against pathway knowledge to identify biological processes that are related to specific subcontexts or groups of them. Finally, we have released our data and scripts at https://zenodo.org/record/5831786 and https://github.com/ContNeXt/ , respectively, and developed ContNeXt ( https://contnext.scai.fraunhofer.de/ ), a web application to explore the networks generated in this work.
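The network construction step mentioned above (keeping only the strongest pairwise correlations between genes) can be sketched roughly as follows. This is an illustrative sketch only: the toy expression values, the gene names, the choice of Pearson correlation, and the `min_abs_r` cut-off are assumptions made for the example and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_network(expression, min_abs_r=0.9):
    """Keep gene pairs whose absolute correlation reaches the cut-off.

    `min_abs_r` is a hypothetical threshold chosen for illustration.
    """
    edges = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= min_abs_r:
            edges[(gene_a, gene_b)] = r
    return edges


# Toy expression matrix: gene -> expression values across four samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 2.0, 4.0],
}

network = coexpression_network(expression)
for (gene_a, gene_b), r in network.items():
    print(f"{gene_a} -- {gene_b}: r = {r:+.2f}")
```

Using the absolute value keeps both strongly positive and strongly negative correlations; whether negative correlations should be retained is a design choice that depends on the downstream analysis.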
Subjects
Gene Regulatory Networks; Transcriptome; Gene Expression Profiling; Humans; Software
ABSTRACT
Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors can impact the results of such an analysis, which may not be accounted for. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or default settings. Despite ongoing efforts to establish set guidelines, meaningful results are still hampered by a lack of consensus or gold standards around how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview of the influence of these factors. Our work covers a broad spectrum of factors, spanning from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges that originate from the outlined factors.
Subjects
Databases, Factual; Factor Analysis, Statistical; Longitudinal Studies
ABSTRACT
Network-based approaches are becoming increasingly popular for drug discovery as they provide a systems-level overview of the mechanisms underlying disease pathophysiology. They have demonstrated significant early promise over other methods of biological data representation, such as in target discovery, side effect prediction and drug repurposing. In parallel, an explosion of -omics data for the deep characterization of biological systems routinely uncovers molecular signatures of disease for similar applications. Here, we present RPath, a novel algorithm that prioritizes drugs for a given disease by reasoning over causal paths in a knowledge graph (KG), guided by both drug-perturbed as well as disease-specific transcriptomic signatures. First, our approach identifies the causal paths that connect a drug to a particular disease. Next, it reasons over these paths to identify those that correlate with the transcriptional signatures observed in a drug-perturbation experiment, and anti-correlate to signatures observed in the disease of interest. The paths which match this signature profile are then proposed to represent the mechanism of action of the drug. We demonstrate how RPath consistently prioritizes clinically investigated drug-disease pairs on multiple datasets and KGs, achieving better performance over other similar methodologies. Furthermore, we present two case studies showing how one can deconvolute the predictions made by RPath as well as predict novel targets.
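The path-reasoning idea described above can be illustrated with a small sketch. It is a simplified reading of the abstract, not RPath's actual implementation or API: the toy graph, the node names, the encoding of edges as +1 (activation) and -1 (inhibition), the encoding of signatures as +1 (up) and -1 (down), and the function names are all assumptions made for the example.

```python
def simple_paths(graph, source, target, max_length):
    """Yield simple paths from source to target with at most max_length edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= max_length:
            continue
        for neighbor in graph.get(node, {}):
            if neighbor not in path:
                stack.append((neighbor, path + [neighbor]))


def path_matches(graph, path, drug_signature, disease_signature):
    """Check every intermediate node of a path against both signatures.

    The predicted effect of the drug on a node is the product of the edge
    signs leading to it. The path is kept only if that effect agrees with
    the drug-perturbation signature and opposes the disease signature.
    """
    sign = 1
    for previous, current in zip(path, path[1:-1]):
        sign *= graph[previous][current]
        if drug_signature.get(current) != sign:
            return False
        if disease_signature.get(current) != -sign:
            return False
    return True


# Hypothetical causal graph: node -> {neighbor: +1 (activates) / -1 (inhibits)}.
graph = {
    "drug": {"GENE_A": -1, "GENE_B": 1},
    "GENE_A": {"disease": 1},
    "GENE_B": {"disease": 1},
}
drug_signature = {"GENE_A": -1, "GENE_B": -1}    # observed after drug treatment
disease_signature = {"GENE_A": 1, "GENE_B": 1}   # observed in the disease

supported = [
    path
    for path in simple_paths(graph, "drug", "disease", max_length=3)
    if path_matches(graph, path, drug_signature, disease_signature)
]
print(supported)
```

In this toy case only the path through `GENE_A` is retained, because the graph predicts that the drug lowers `GENE_A`, the drug signature confirms this, and the disease signature shows the opposite direction.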
Subjects
Pattern Recognition, Automated; Transcriptome; Algorithms; Drug Discovery/methods; Drug Repositioning/methods; Transcriptome/genetics
ABSTRACT
The utility of pathway signatures lies in their capability to determine whether a specific pathway or biological process is dysregulated in a given patient. These signatures have been widely used in machine learning (ML) methods for a variety of applications including precision medicine, drug repurposing, and drug discovery. In this work, we leverage highly predictive ML models for drug response simulation in individual patients by calibrating the pathway activity scores of disease samples. Using these ML models and an intuitive scoring algorithm to modify the signatures of patients, we evaluate whether a given sample that was formerly classified as diseased could be predicted as normal following drug treatment simulation. We then use this technique as a proxy for the identification of potential drug candidates. Furthermore, we demonstrate the ability of our methodology to successfully identify approved and clinically investigated drugs for four different cancers, outperforming six comparable state-of-the-art methods. We also show how this approach can deconvolute a drug's mechanism of action and propose combination therapies. Taken together, our methodology could be promising to support clinical decision-making in personalized medicine by simulating a drug's effect on a given patient.
Subjects
Biological Phenomena; Machine Learning; Algorithms; Computer Simulation; Humans; Precision Medicine
ABSTRACT
The past decades have brought a steady growth of pathway databases and enrichment methods. However, the advent of pathway data has not been accompanied by an improvement in interoperability across databases, hampering the use of pathway knowledge from multiple databases for enrichment analysis. While integrative databases have attempted to address this issue, they often do not account for redundant information across resources. Furthermore, the majority of studies that employ pathway enrichment analysis still rely upon a single database or enrichment method, though the use of another could yield differing results. These shortcomings call for approaches that investigate the differences and agreements across databases and methods as their selection in the design of a pathway analysis can be a crucial step in ensuring the results of such an analysis are meaningful. Here we present DecoPath, a web application to assist in the interpretation of the results of pathway enrichment analysis. DecoPath provides an ecosystem to run enrichment analysis or directly upload results and facilitate the interpretation of results with custom visualizations that highlight the consensus and/or discrepancies at the pathway- and gene-levels. DecoPath is available at https://decopath.scai.fraunhofer.de, and its source code and documentation can be found on GitHub at https://github.com/DecoPath/DecoPath.
ABSTRACT
We attempt to address a key question in the joint analysis of transcriptomic data: can we correlate the patterns we observe in transcriptomic datasets to known interactions and pathway knowledge to broaden our understanding of disease pathophysiology? We present a systematic approach that sheds light on the patterns observed in hundreds of transcriptomic datasets from over sixty indications by using pathways and molecular interactions as a template. Our analysis employs transcriptomic datasets to construct dozens of disease-specific co-expression networks, alongside a human protein-protein interactome network. Leveraging the interoperability between these two network templates, we explore patterns both common and particular to these diseases on three different levels. Firstly, at the node level, we identify the most and least common proteins across diseases and evaluate their consistency against the interactome as a proxy for their prevalence in the scientific literature. Secondly, we overlay both network templates to analyze common correlations and interactions across diseases at the edge level. Thirdly, we explore the similarity between patterns observed at the disease level and pathway knowledge to identify signatures associated with specific diseases and indication areas. Finally, we present a case scenario in schizophrenia, where we show how our approach can be used to investigate disease pathophysiology.
Subjects
Disease/genetics; Gene Expression Profiling/methods; Gene Regulatory Networks; Genetic Predisposition to Disease/genetics; Signal Transduction/genetics; Transcriptome/genetics; Algorithms; Cluster Analysis; Humans; Schizophrenia/genetics
ABSTRACT
SUMMARY: As machine learning and artificial intelligence increasingly attain a larger number of applications in the biomedical domain, at their core, their utility depends on the data used to train them. Due to the complexity and high dimensionality of biomedical data, there is a need for approaches that combine prior knowledge around known biological interactions with patient data. Here, we present CLinical Embedding of Patients (CLEP), a novel approach that generates new patient representations by leveraging both prior knowledge and patient-level data. First, given a patient-level dataset and a knowledge graph containing relations across features that can be mapped to the dataset, CLEP incorporates patients into the knowledge graph as new nodes connected to their most characteristic features. Next, CLEP employs knowledge graph embedding models to generate new patient representations that can ultimately be used for a variety of downstream tasks, ranging from clustering to classification. We demonstrate how using new patient representations generated by CLEP significantly improves performance in classifying between patients and healthy controls for a variety of machine learning models, as compared to the use of the original transcriptomics data. Furthermore, we also show how incorporating patients into a knowledge graph can foster the interpretation and identification of biological features characteristic of a specific disease or patient subgroup. Finally, we released CLEP as an open source Python package together with examples and documentation. AVAILABILITY AND IMPLEMENTATION: CLEP is available to the bioinformatics community as an open source Python package at https://github.com/hybrid-kg/clep under the Apache 2.0 License. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
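The first step described above, adding patients to a knowledge graph as new nodes linked to their most characteristic features, might look roughly like the sketch below. It does not use the CLEP package or reproduce its API. The z-score criterion, the `threshold` value, the relation labels, and all sample and gene names are assumptions chosen for illustration, and the subsequent knowledge graph embedding step is omitted.

```python
from statistics import mean, stdev


def patient_triples(patients, controls, threshold=2.0):
    """Link each patient to features that deviate strongly from the controls.

    A feature counts as "characteristic" here when its z-score relative to
    the control samples reaches the (hypothetical) threshold in either direction.
    """
    features = sorted(next(iter(controls.values())))
    triples = []
    for feature in features:
        values = [sample[feature] for sample in controls.values()]
        center, spread = mean(values), stdev(values)
        for patient, sample in sorted(patients.items()):
            z = (sample[feature] - center) / spread
            if z >= threshold:
                triples.append((patient, "upregulated", feature))
            elif z <= -threshold:
                triples.append((patient, "downregulated", feature))
    return triples


# Toy measurements: sample -> {feature: value}.
controls = {
    "control_1": {"GENE_A": 1.0, "GENE_B": 5.0},
    "control_2": {"GENE_A": 1.2, "GENE_B": 5.2},
    "control_3": {"GENE_A": 0.8, "GENE_B": 4.8},
}
patients = {
    "patient_1": {"GENE_A": 3.0, "GENE_B": 5.1},
    "patient_2": {"GENE_A": 1.1, "GENE_B": 2.0},
}

# Prior knowledge as (subject, relation, object) triples.
knowledge_graph = [("GENE_A", "increases", "GENE_B")]

# The extended graph now contains patient nodes and could be passed to a
# knowledge graph embedding model to obtain patient representations.
extended_graph = knowledge_graph + patient_triples(patients, controls)
for triple in extended_graph:
    print(triple)
```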
ABSTRACT
BACKGROUND: Neuroimaging markers provide quantitative insight into brain structure and function in neurodegenerative diseases, such as Alzheimer's disease, where we lack mechanistic insights to explain pathophysiology. These mechanisms are often mediated by genes and genetic variations and are often studied through the lens of genome-wide association studies. Linking these two disparate layers (i.e., imaging and genetic variation) through causal relationships between biological entities involved in the disease's etiology would pave the way to large-scale mechanistic reasoning and interpretation. OBJECTIVE: We explore how genetic variants may lead to functional alterations of intermediate molecular traits, which can further impact neuroimaging hallmarks over a series of biological processes across multiple scales. METHODS: We present an approach in which knowledge pertaining to single nucleotide polymorphisms and imaging readouts is extracted from the literature, encoded in Biological Expression Language, and used in a novel workflow to assist in the functional interpretation of SNPs in a clinical context. RESULTS: We demonstrate our approach in a case scenario which proposes KANSL1 as a candidate gene that accounts for the clinically reported correlation between the incidence of the genetic variants and hippocampal atrophy. We find that the workflow prioritizes multiple mechanisms reported in the literature through which KANSL1 may have an impact on hippocampal atrophy such as through the dysregulation of cell proliferation, synaptic plasticity, and metabolic processes. CONCLUSION: We have presented an approach that enables pinpointing relevant genetic variants as well as investigating their functional role in biological processes spanning across several, diverse biological scales.
Subjects
Alzheimer Disease/genetics; Genetic Predisposition to Disease/genetics; Neuroimaging; Systems Biology; Alzheimer Disease/diagnostic imaging; Biomarkers/metabolism; Brain/metabolism; Brain/pathology; Genome-Wide Association Study/methods; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics; Systems Biology/methods
ABSTRACT
SUMMARY: High-throughput screening yields vast amounts of biological data which can be highly challenging to interpret. In response, knowledge-driven approaches emerged as possible solutions to analyze large datasets by leveraging prior knowledge of biomolecular interactions represented in the form of biological networks. Nonetheless, given their size and complexity, their manual investigation quickly becomes impractical. Thus, computational approaches, such as diffusion algorithms, are often employed to interpret and contextualize the results of high-throughput experiments. Here, we present MultiPaths, a framework consisting of two independent Python packages for network analysis. While the first package, DiffuPy, comprises numerous commonly used diffusion algorithms applicable to any generic network, the second, DiffuPath, enables the application of these algorithms on multi-layer biological networks. To facilitate its usability, the framework includes a command line interface, reproducible examples and documentation. To demonstrate the framework, we conducted several diffusion experiments on three independent multi-omics datasets over disparate networks generated from pathway databases, thus highlighting the ability of multi-layer networks to integrate multiple modalities. Finally, the results of these experiments demonstrate how the generation of harmonized networks from disparate databases can improve predictive performance with respect to individual resources. AVAILABILITY AND IMPLEMENTATION: DiffuPy and DiffuPath are publicly available under the Apache License 2.0 at https://github.com/multipaths. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
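To give a sense of what a network diffusion algorithm does, the sketch below implements a random walk with restart by power iteration in plain Python. It is a generic illustration and does not use or reproduce the DiffuPy or DiffuPath APIs. The toy network, the seed choice, and the `restart` and `iterations` parameters are assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, iterations=100):
    """Diffuse scores from seed nodes over an undirected network.

    At each step a node passes (1 - restart) of its score equally to its
    neighbors, while a fraction `restart` of the total score returns to the
    seeds. Every node is assumed to have at least one neighbor.
    """
    nodes = sorted(adjacency)
    start = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in nodes}
    scores = dict(start)
    for _ in range(iterations):
        updated = {node: restart * start[node] for node in nodes}
        for node in nodes:
            neighbors = adjacency[node]
            share = (1.0 - restart) * scores[node] / len(neighbors)
            for neighbor in neighbors:
                updated[neighbor] += share
        scores = updated
    return scores


# Toy undirected network (a chain A - B - C - D) as adjacency lists.
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}

# "A" plays the role of a measured entity, e.g. a dysregulated gene.
scores = random_walk_with_restart(adjacency, seeds={"A"})
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

Nodes closer to the seed receive higher scores, which is the property such methods rely on when ranking unmeasured entities by their proximity to experimental hits.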
Elucidating the causal mechanisms responsible for disease can reveal potential therapeutic targets for pharmacological intervention and, accordingly, guide drug repositioning and discovery. In essence, the topology of a network can reveal the impact a drug candidate may have on a given biological state, leading the way for enhanced disease characterization and the design of advanced therapies. Network-based approaches, in particular, are highly suited for these purposes as they hold the capacity to identify the molecular mechanisms underlying disease. Here, we present drug2ways, a novel methodology that leverages multimodal causal networks for predicting drug candidates. Drug2ways implements an efficient algorithm which reasons over causal paths in large-scale biological networks to propose drug candidates for a given disease. We validate our approach using clinical trial information and demonstrate how drug2ways can be used for multiple applications to identify: i) single-target drug candidates, ii) candidates with polypharmacological properties that can optimize multiple targets, and iii) candidates for combination therapy. Finally, we make drug2ways available to the scientific community as a Python package that enables conducting these applications on multiple standard network formats.
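One simple way to reason over causal paths between a drug and a disease is to enumerate the paths and tally their net effect. The sketch below does this on a toy network. It is an assumption-laden illustration and not the drug2ways package or its API: the graph, the +1 (activation) and -1 (inhibition) edge encoding, the `max_length` limit, and the function name are all hypothetical.

```python
def count_signed_paths(graph, source, target, max_length):
    """Count simple paths by net effect (product of edge signs along the path)."""
    counts = {"activating": 0, "inhibiting": 0}

    def walk(node, visited, sign, depth):
        if node == target:
            counts["activating" if sign > 0 else "inhibiting"] += 1
            return
        if depth == max_length:
            return
        for neighbor, edge_sign in graph.get(node, {}).items():
            if neighbor not in visited:
                walk(neighbor, visited | {neighbor}, sign * edge_sign, depth + 1)

    walk(source, {source}, 1, 0)
    return counts


# Hypothetical causal network: node -> {neighbor: +1 (activates) / -1 (inhibits)}.
graph = {
    "drug": {"TARGET_1": -1, "TARGET_2": 1},
    "TARGET_1": {"PROCESS": 1},
    "TARGET_2": {"PROCESS": -1, "disease": 1},
    "PROCESS": {"disease": 1},
}

counts = count_signed_paths(graph, "drug", "disease", max_length=3)
print(counts)

# A drug whose paths predominantly inhibit the disease node could be
# treated as a candidate worth closer inspection.
total = counts["activating"] + counts["inhibiting"]
print(f"fraction of inhibiting paths: {counts['inhibiting'] / total:.2f}")
```

The same tally could be repeated over several disease-related nodes to look for candidates that act on multiple targets at once.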
Subjects
Drug Discovery/methods; Drug Repositioning/methods; Models, Biological; Algorithms; Antineoplastic Combined Chemotherapy Protocols/therapeutic use; Computer Simulation; Drug Therapy; Humans; Neoplasms/drug therapy; Phenotype; Polypharmacology
ABSTRACT
[This corrects the article DOI: 10.3389/fgene.2019.01203.].
ABSTRACT
PURPOSE OF REVIEW: With the advancement of computational approaches and abundance of biomedical data, a broad range of neurodegenerative disease models have been developed. In this review, we argue that computational models can be both relevant and useful in neurodegenerative disease research and although the current established models have limitations in clinical practice, artificial intelligence has the potential to overcome deficiencies encountered by these models, which in turn can improve our understanding of disease. RECENT FINDINGS: In recent years, diverse computational approaches have been used to shed light on different aspects of neurodegenerative disease models. For example, linear and nonlinear mixed models, self-modeling regression, differential equation models, and event-based models have been applied to provide a better understanding of disease progression patterns and biomarker trajectories. Additionally, the Cox-regression technique, Bayesian network models, and deep-learning-based approaches have been used to predict the probability of future incidence of disease, whereas nonnegative matrix factorization, nonhierarchical cluster analysis, hierarchical agglomerative clustering, and deep-learning-based approaches have been employed to stratify patients based on their disease subtypes. Furthermore, the interpretation of neurodegenerative disease data is possible through knowledge-based models which use prior knowledge to complement data-driven analyses. These knowledge-based models can include pathway-centric approaches to establish pathways perturbed in a given condition, as well as disease-specific knowledge maps, which elucidate the mechanisms involved in a given disease. Collectively, these established models have revealed highly granular details and insights into neurodegenerative disease models. SUMMARY: In conjunction with increasingly advanced computational approaches, a wide spectrum of neurodegenerative disease models, which can be broadly categorized into data-driven and knowledge-driven, have been developed. We review the state-of-the-art data- and knowledge-driven models and discuss the steps that are vital to bring them into clinical application.
Subjects
Data Science; Neurodegenerative Diseases/epidemiology; Algorithms; Humans; Models, Statistical
ABSTRACT
Pathway-centric approaches are widely used to interpret and contextualize -omics data. However, databases contain different representations of the same biological pathway, which may lead to different results of statistical enrichment analysis and predictive models in the context of precision medicine. We have performed an in-depth benchmarking of the impact of pathway database choice on statistical enrichment analysis and predictive modeling. We analyzed five cancer datasets using three major pathway databases and developed an approach to merge several databases into a single integrative one: MPath. Our results show that equivalent pathways from different databases yield disparate results in statistical enrichment analysis. Moreover, we observed a significant dataset-dependent impact on the performance of machine learning models on different prediction tasks. In some cases, MPath significantly improved prediction performance and also reduced the variance of prediction performances. Furthermore, MPath yielded more consistent and biologically plausible results in statistical enrichment analyses. In summary, this benchmarking study demonstrates that pathway database choice can influence the results of statistical enrichment analysis and predictive modeling. Therefore, we recommend the use of multiple pathway databases or integrative ones.
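The effect of database choice on statistical enrichment analysis can be illustrated with a small over-representation test. The sketch below runs a one-sided hypergeometric test on two hypothetical representations of the same pathway and on their union. The gene sets, the universe size, and the use of a plain set union to merge equivalent pathways are assumptions made for the example and do not describe how MPath was actually constructed.

```python
from math import comb


def hypergeom_pvalue(hits, query_size, pathway_size, universe_size):
    """One-sided p-value P(X >= hits) of the hypergeometric distribution."""
    total = comb(universe_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, query_size - k)
        for k in range(hits, min(query_size, pathway_size) + 1)
    )
    return tail / total


def enrichment(query, pathway, universe_size):
    """Over-representation p-value of a query gene set in a pathway."""
    hits = len(query & pathway)
    return hypergeom_pvalue(hits, len(query), len(pathway), universe_size)


# Two hypothetical representations of the "same" pathway in different databases.
pathway_db_a = {"g1", "g2", "g3"}
pathway_db_b = {"g2", "g3", "g4", "g5"}
merged_pathway = pathway_db_a | pathway_db_b  # naive merge by set union

query = {"g1", "g2", "g4", "g5", "g10"}  # e.g. differentially expressed genes
universe_size = 20                        # number of genes measured

results = {
    "database A": enrichment(query, pathway_db_a, universe_size),
    "database B": enrichment(query, pathway_db_b, universe_size),
    "merged": enrichment(query, merged_pathway, universe_size),
}
for name, p_value in results.items():
    print(f"{name}: p = {p_value:.4f}")
```

Even in this tiny example the three representations give clearly different p-values for the same query, which is the kind of discrepancy the benchmark examines at scale.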
ABSTRACT
BACKGROUND: The complexity of representing biological systems is compounded by an ever-expanding body of knowledge emerging from multi-omics experiments. A number of pathway databases have facilitated pathway-centric approaches that assist in the interpretation of molecular signatures yielded by these experiments. However, the lack of interoperability between pathway databases has hindered the ability to harmonize these resources and to exploit their consolidated knowledge. Such a unification of pathway knowledge is imperative in enhancing the comprehension and modeling of biological abstractions. RESULTS: Here, we present PathMe, a Python package that transforms pathway knowledge from three major pathway databases into a unified abstraction using Biological Expression Language as the pivotal, integrative schema. PathMe is complemented by a novel web application (freely available at https://pathme.scai.fraunhofer.de/ ) which allows users to comprehensively explore pathway crosstalk and compare areas of consensus and discrepancies. CONCLUSIONS: This work has harmonized three major pathway databases and transformed them into a unified schema in order to gain a holistic picture of pathway knowledge. We demonstrate the utility of the PathMe framework in: i) integrating pathway landscapes at the database level, ii) comparing the degree of consensus at the pathway level, and iii) exploring pathway crosstalk and investigating consensus at the molecular level.