Search | BVS Nicaragua

1.

Current and future directions in network biology.

Zitnik, Marinka; Li, Michelle M; Wells, Aydin; Glass, Kimberly; Morselli Gysi, Deisy; Krishnan, Arjun; Murali, T M; Radivojac, Predrag; Roy, Sushmita; Baudot, Anaïs; Bozdag, Serdar; Chen, Danny Z; Cowen, Lenore; Devkota, Kapil; Gitter, Anthony; Gosline, Sara J C; Gu, Pengfei; Guzzi, Pietro H; Huang, Heng; Jiang, Meng; Kesimoglu, Ziynet Nesibe; Koyuturk, Mehmet; Ma, Jian; Pico, Alexander R; Przulj, Natasa; Przytycka, Teresa M; Raphael, Benjamin J; Ritz, Anna; Sharan, Roded; Shen, Yang; Singh, Mona; Slonim, Donna K; Tong, Hanghang; Yang, Xinan Holly; Yoon, Byung-Jun; Yu, Haiyuan; Milenkovic, Tijana.

Bioinform Adv ; 4(1): vbae099, 2024.

Article in English | MEDLINE | ID: mdl-39143982

ABSTRACT

Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.

2.

A foundation model for clinician-centered drug repurposing.

Huang, Kexin; Chandak, Payal; Wang, Qianwen; Havaldar, Shreyas; Vaid, Akhil; Leskovec, Jure; Nadkarni, Girish; Glicksberg, Benjamin S; Gehlenborg, Nils; Zitnik, Marinka.

medRxiv ; 2024 Aug 07.

Article in English | MEDLINE | ID: mdl-39148855

ABSTRACT

Drug repurposing (identifying new therapeutic uses for approved drugs) is often serendipitous and opportunistic, expanding the use of drugs for new diseases. The clinical utility of drug repurposing AI models remains limited because the models focus narrowly on diseases for which some drugs already exist. Here, we introduce TxGNN, a graph foundation model for zero-shot drug repurposing, identifying therapeutic candidates even for diseases with limited treatment options or no existing drugs. Trained on a medical knowledge graph, TxGNN utilizes a graph neural network and metric-learning module to rank drugs as potential indications and contraindications across 17,080 diseases. When benchmarked against eight methods, TxGNN improves prediction accuracy for indications by 49.2% and contraindications by 35.1% under stringent zero-shot evaluation. To facilitate model interpretation, TxGNN's Explainer module offers transparent insights into multi-hop medical knowledge paths that form TxGNN's predictive rationales. Human evaluation of TxGNN's Explainer showed that TxGNN's predictions and explanations perform encouragingly on multiple axes of performance beyond accuracy. Many of TxGNN's novel predictions align with off-label prescriptions clinicians make in a large healthcare system. TxGNN's drug repurposing predictions are accurate, consistent with off-label drug use, and can be investigated by human experts through multi-hop interpretable rationales.

3.

Actionable Predictions of Human Pharmacokinetics at the Drug Design Stage.

Komissarov, Leonid; Manevski, Nenad; Groebke Zbinden, Katrin; Schindler, Torsten; Zitnik, Marinka; Sach-Peltason, Lisa.

Mol Pharm ; 21(9): 4356-4371, 2024 Sep 02.

Article in English | MEDLINE | ID: mdl-39132855

ABSTRACT

We present a novel computational approach for predicting human pharmacokinetics (PK) that addresses the challenges of early stage drug design. Our study introduces and describes a large-scale data set of 11 clinical PK end points, encompassing over 2700 unique chemical structures to train machine learning models. To that end multiple advanced training strategies are compared, including the integration of in vitro data and a novel self-supervised pretraining task. In addition to the predictions, our final model provides meaningful epistemic uncertainties for every data point. This allows us to successfully identify regions of exceptional predictive performance, with an absolute average fold error (AAFE/geometric mean fold error) of less than 2.5 across multiple end points. Together, these advancements represent a significant leap toward actionable PK predictions, which can be utilized early on in the drug design process to expedite development and reduce reliance on nonclinical studies.
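The fold-error metric named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition, AAFE = 10 raised to the mean of |log10(predicted / observed)|; the abstract does not spell out the exact computation, and the function name below is hypothetical.

```python
import math


def absolute_average_fold_error(predicted, observed):
    """Return 10 ** mean(|log10(predicted / observed)|).

    Assumed definition (not quoted from the paper). All values must be
    strictly positive, as PK parameters such as clearance normally are.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(p <= 0 or o <= 0 for p, o in zip(predicted, observed)):
        raise ValueError("fold error is only defined for positive values")
    # Absolute log-ratio treats over- and under-prediction symmetrically.
    log_errors = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(log_errors) / len(log_errors))
```

Under this definition a value of 1 means perfect agreement, and a value of 2 means predictions are off by a factor of two on average, in either direction.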

Subject(s)

Drug Design; Machine Learning; Humans; Pharmacokinetics; Pharmaceutical Preparations/chemistry

4.

Contextual AI models for single-cell protein biology.

Li, Michelle M; Huang, Yepeng; Sumathipala, Marissa; Liang, Man Qing; Valdeolivas, Alberto; Ananthakrishnan, Ashwin N; Liao, Katherine; Marbach, Daniel; Zitnik, Marinka.

Nat Methods ; 21(8): 1546-1557, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39039335

ABSTRACT

Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here we introduce PINNACLE, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multiorgan single-cell atlas, PINNACLE learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. PINNACLE's embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs' effects across cell types. PINNACLE outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases and pinpoints cell type contexts with higher predictive capability than context-free models. PINNACLE's ability to adjust its outputs on the basis of the context in which it operates paves the way for large-scale context-specific predictions in biology.

Subject(s)

Deep Learning; Single-Cell Analysis; Humans; Single-Cell Analysis/methods; Algorithms; Protein Interaction Maps; Proteins/metabolism; Proteins/chemistry; Computational Biology/methods

5.

TDC-2: Multimodal Foundation for Therapeutic Science.

Velez-Arce, Alejandro; Huang, Kexin; Li, Michelle M; Lin, Xiang; Gao, Wenhao; Fu, Tianfan; Kellis, Manolis; Pentelute, Bradley L; Zitnik, Marinka.

bioRxiv ; 2024 Jun 21.

Article in English | MEDLINE | ID: mdl-38948789

ABSTRACT

Therapeutics Data Commons (tdcommons.ai) is an open science initiative with unified datasets, AI models, and benchmarks to support research across therapeutic modalities and drug discovery and development stages. The Commons 2.0 (TDC-2) is a comprehensive overhaul of Therapeutics Data Commons to catalyze research in multimodal models for drug discovery by unifying single-cell biology of diseases, biochemistry of molecules, and effects of drugs through multimodal datasets, AI-powered API endpoints, new multimodal tasks and model frameworks, and comprehensive benchmarks. TDC-2 introduces over 1,000 multimodal datasets spanning approximately 85 million cells, pre-calculated embeddings from 5 state-of-the-art single-cell models, and a biomedical knowledge graph. TDC-2 drastically expands the coverage of ML tasks across therapeutic pipelines and 10+ new modalities, spanning but not limited to single-cell gene expression data, clinical trial data, peptide sequence data, peptidomimetics protein-peptide interaction data regarding newly discovered ligands derived from AS-MS spectroscopy, novel 3D structural data for proteins, and cell-type-specific protein-protein interaction networks at single-cell resolution. TDC-2 introduces multimodal data access under an API-first design using the model-view-controller paradigm. TDC-2 introduces 7 novel ML tasks with fine-grained biological contexts: contextualized drug-target identification, single-cell chemical/genetic perturbation response prediction, protein-peptide binding affinity prediction task, and clinical trial outcome prediction task, which introduce antigen-processing-pathway-specific, cell-type-specific, peptide-specific, and patient-specific biological contexts. TDC-2 also releases benchmarks evaluating 15+ state-of-the-art models across 5+ new learning tasks evaluating models on diverse biological contexts and sampling approaches. Among these, TDC-2 provides the first benchmark for context-specific learning. TDC-2, to our knowledge, is also the first to introduce a protein-peptide binding interaction benchmark.

6.

On knowing a gene: A distributional hypothesis of gene function.

Kwon, Jason J; Pan, Joshua; Gonzalez, Guadalupe; Hahn, William C; Zitnik, Marinka.

Cell Syst ; 15(6): 488-496, 2024 Jun 19.

Article in English | MEDLINE | ID: mdl-38810640

ABSTRACT

As words can have multiple meanings that depend on sentence context, genes can have various functions that depend on the surrounding biological system. This pleiotropic nature of gene function is limited by ontologies, which annotate gene functions without considering biological contexts. We contend that the gene function problem in genetics may be informed by recent technological leaps in natural language processing, in which representations of word semantics can be automatically learned from diverse language contexts. In contrast to efforts to model semantics as "is-a" relationships in the 1990s, modern distributional semantics represents words as vectors in a learned semantic space and fuels current advances in transformer-based models such as large language models and generative pre-trained transformers. A similar shift in thinking of gene functions as distributions over cellular contexts may enable a similar breakthrough in data-driven learning from large biological datasets to inform gene function.

Subject(s)

Natural Language Processing; Semantics; Humans; Genes/genetics; Gene Ontology; Computational Biology/methods; Animals

7.

Graph Artificial Intelligence in Medicine.

Johnson, Ruth; Li, Michelle M; Noori, Ayush; Queen, Owen; Zitnik, Marinka.

Annu Rev Biomed Data Sci ; 7(1): 345-368, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38749465

ABSTRACT

In clinical artificial intelligence (AI), graph representation learning, mainly through graph neural networks and graph transformer architectures, stands out for its capability to capture intricate relationships and structures within clinical datasets. With diverse data, from patient records to imaging, graph AI models process data holistically by viewing modalities and entities within them as nodes interconnected by their relationships. Graph AI facilitates model transfer across clinical tasks, enabling models to generalize across patient populations without additional parameters and with minimal to no retraining. However, the importance of human-centered design and model interpretability in clinical decision-making cannot be overstated. Since graph AI models capture information through localized neural transformations defined on relational datasets, they offer both an opportunity and a challenge in elucidating model rationale. Knowledge graphs can enhance interpretability by aligning model-driven insights with medical knowledge. Emerging graph AI models integrate diverse data modalities through pretraining, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way toward clinically meaningful predictions.

Subject(s)

Artificial Intelligence; Neural Networks, Computer; Humans; Computer Graphics

8.

Efficient Generation of Protein Pockets with PocketGen.

Zhang, Zaixi; Shen, Wanxiang; Liu, Qi; Zitnik, Marinka.

bioRxiv ; 2024 Jul 26.

Article in English | MEDLINE | ID: mdl-38464121

ABSTRACT

Designing protein-binding proteins plays an important role in drug discovery. However, AI-based design of such proteins is challenging due to complex ligand-protein interactions, flexibility of ligand molecules and amino acid side chains, and sequence-structure dependencies. We introduce PocketGen, a deep generative model that produces both the residue sequence and atom structure of the protein regions where interactions with ligand molecules occur. PocketGen ensures sequence-structure consistency by using a graph transformer for structural encoding and a sequence refinement module based on a protein language model. The bilevel graph transformer captures interactions at multiple granularities across atom, residue, and ligand levels. To enhance sequence refinement, PocketGen integrates a structural adapter with the protein language model, ensuring consistency between structure-based and sequence-based predictions. Results show that PocketGen can generate high-fidelity protein pockets with superior binding affinity and structural validity. It is ten times faster than physics-based methods and achieves a 95% success rate, defined as the percentage of generated pockets with higher binding affinity than reference pockets, along with achieving an amino acid recovery rate exceeding 64%.
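The two evaluation figures quoted in this abstract can be sketched as simple ratios. This is an illustration only: the function names are hypothetical, affinities are assumed to be expressed so that a larger number means stronger binding (docking scores where lower is better would need to be negated first), and amino acid recovery is assumed to be the fraction of positions where the generated residue matches the reference.

```python
def success_rate(generated_affinities, reference_affinities):
    """Fraction of generated pockets with higher affinity than their reference.

    Assumes paired lists and that larger values mean stronger binding.
    """
    if len(generated_affinities) != len(reference_affinities) or not generated_affinities:
        raise ValueError("inputs must be non-empty and of equal length")
    wins = sum(g > r for g, r in zip(generated_affinities, reference_affinities))
    return wins / len(generated_affinities)


def amino_acid_recovery(generated_sequence, reference_sequence):
    """Fraction of pocket positions where the generated residue equals the reference."""
    if len(generated_sequence) != len(reference_sequence) or not reference_sequence:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(g == r for g, r in zip(generated_sequence, reference_sequence))
    return matches / len(reference_sequence)
```

Both functions return a value between 0 and 1; multiplying by 100 gives percentages comparable to the ones reported above.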

9.

Evaluating generalizability of artificial intelligence models for molecular datasets.

Ektefaie, Yasha; Shen, Andrew; Bykova, Daria; Marin, Maximillian; Zitnik, Marinka; Farhat, Maha.

bioRxiv ; 2024 Feb 28.

Article in English | MEDLINE | ID: mdl-38464295

ABSTRACT

Deep learning has made rapid advances in modeling molecular sequencing data. Despite achieving high performance on benchmarks, it remains unclear to what extent deep learning models learn general principles and generalize to previously unseen sequences. Benchmarks traditionally interrogate model generalizability by generating metadata based (MB) or sequence-similarity based (SB) train and test splits of input data before assessing model performance. Here, we show that this approach mischaracterizes model generalizability by failing to consider the full spectrum of cross-split overlap, i.e., similarity between train and test splits. We introduce Spectra, a spectral framework for comprehensive model evaluation. For a given model and input data, Spectra plots model performance as a function of decreasing cross-split overlap and reports the area under this curve as a measure of generalizability. We apply Spectra to 18 sequencing datasets with associated phenotypes ranging from antibiotic resistance in tuberculosis to protein-ligand binding to evaluate the generalizability of 19 state-of-the-art deep learning models, including large language models, graph neural networks, diffusion models, and convolutional neural networks. We show that SB and MB splits provide an incomplete assessment of model generalizability. With Spectra, we find as cross-split overlap decreases, deep learning models consistently exhibit a reduction in performance in a task- and model-dependent manner. Although no model consistently achieved the highest performance across all tasks, we show that deep learning models can generalize to previously unseen sequences on specific tasks. Spectra paves the way toward a better understanding of how foundation models generalize in biology.
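The summary statistic described here, the area under the curve of performance plotted against cross-split overlap, can be sketched as follows. This is a minimal illustration and not the Spectra implementation: the function name is hypothetical, the trapezoidal rule and the normalization by the overlap range are assumptions, and how the overlap values are generated is outside what the abstract describes.

```python
def area_under_performance_curve(overlaps, scores):
    """Trapezoidal area under score-vs-overlap, divided by the overlap range.

    `overlaps` are cross-split overlap values (for example, fractions in [0, 1])
    and `scores` are the model's performance measured at each overlap level.
    Normalizing by the range keeps the result on the same scale as the scores.
    """
    if len(overlaps) != len(scores) or len(overlaps) < 2:
        raise ValueError("need at least two (overlap, score) pairs")
    points = sorted(zip(overlaps, scores))  # integrate left to right
    width = points[-1][0] - points[0][0]
    if width <= 0:
        raise ValueError("overlap values must span a non-zero range")
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / width
```

With this normalization, a model whose score is constant across all overlap levels gets exactly that score back, while a model that degrades as overlap shrinks gets a lower value than its best-case benchmark number.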

10.

Combinatorial prediction of therapeutic perturbations using causally-inspired neural networks.

Gonzalez, Guadalupe; Herath, Isuru; Veselkov, Kirill; Bronstein, Michael; Zitnik, Marinka.

bioRxiv ; 2024 Jan 08.

Article in English | MEDLINE | ID: mdl-38260532

ABSTRACT

As an alternative to target-driven drug discovery, phenotype-driven approaches identify compounds that counteract the overall disease effects by analyzing phenotypic signatures. Our study introduces a novel approach to this field, aiming to expand the search space for new therapeutic agents. We introduce PDGrapher, a causally-inspired graph neural network model designed to predict arbitrary perturbagens (sets of therapeutic targets) capable of reversing disease effects. Unlike existing methods that learn responses to perturbations, PDGrapher solves the inverse problem, which is to infer the perturbagens necessary to achieve a specific response, i.e., directly predicting perturbagens by learning which perturbations elicit a desired response. Experiments across eight datasets of genetic and chemical perturbations show that PDGrapher successfully predicted effective perturbagens in up to 9% additional test samples and ranked therapeutic targets up to 35% higher than competing methods. A key innovation of PDGrapher is its direct prediction capability, which contrasts with the indirect, computationally intensive models traditionally used in phenotype-driven drug discovery that only predict changes in phenotypes due to perturbations. The direct approach enables PDGrapher to train up to 30 times faster, representing a significant leap in efficiency. Our results suggest that PDGrapher can advance phenotype-driven drug discovery, offering a fast and comprehensive approach to identifying therapeutically useful perturbations.

11.

Contextualizing protein representations using deep learning on protein networks and single-cell data.

Li, Michelle M; Huang, Yepeng; Sumathipala, Marissa; Liang, Man Qing; Valdeolivas, Alberto; Ananthakrishnan, Ashwin N; Liao, Katherine; Marbach, Daniel; Zitnik, Marinka.

bioRxiv ; 2024 Jan 18.

Article in English | MEDLINE | ID: mdl-37503080

ABSTRACT

Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across diverse biological contexts, such as tissues and cell types, remains a significant challenge for existing algorithms. We introduce Pinnacle, a flexible geometric deep learning approach that is trained on contextualized protein interaction networks to generate context-aware protein representations. Leveraging a human multi-organ single-cell transcriptomic atlas, Pinnacle provides 394,760 protein representations split across 156 cell type contexts from 24 tissues and organs. Pinnacle's contextualized representations of proteins reflect cellular and tissue organization and Pinnacle's tissue representations enable zero-shot retrieval of the tissue hierarchy. Pretrained Pinnacle's protein representations can be adapted for downstream tasks: to enhance 3D structure-based protein representations for important protein interactions in immuno-oncology (PD-1/PD-L1 and B7-1/CTLA-4) and to study the effects of drugs across cell type contexts. Pinnacle outperforms state-of-the-art, yet context-free, models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases, and can pinpoint cell type contexts that predict therapeutic targets better than context-free models (29 out of 156 cell types in rheumatoid arthritis; 13 out of 152 cell types in inflammatory bowel diseases). Pinnacle is a graph-based contextual AI model that dynamically adjusts its outputs based on biological contexts in which it operates.

12.

Multimodal learning with graphs.

Ektefaie, Yasha; Dasoulas, George; Noori, Ayush; Farhat, Maha; Zitnik, Marinka.

Nat Mach Intell ; 5(4): 340-350, 2023 Apr.

Article in English | MEDLINE | ID: mdl-38076673

ABSTRACT

Artificial intelligence for graphs has achieved remarkable success in modeling complex systems, ranging from dynamic networks in biology to interacting particle systems in physics. However, the increasingly heterogeneous graph datasets call for multimodal methods that can combine different inductive biases (the sets of assumptions that algorithms use to make predictions for inputs they have not encountered during training). Learning on multimodal datasets presents fundamental challenges because the inductive biases can vary by data modality and graphs might not be explicitly given in the input. To address these challenges, multimodal graph AI methods combine different modalities while leveraging cross-modal dependencies using graphs. Diverse datasets are combined using graphs and fed into sophisticated multimodal architectures, specified as image-intensive, knowledge-grounded and language-intensive models. Using this categorization, we introduce a blueprint for multimodal graph learning, use it to study existing methods and provide guidelines to design new models.

13.

Scientific discovery in the age of artificial intelligence.

Wang, Hanchen; Fu, Tianfan; Du, Yuanqi; Gao, Wenhao; Huang, Kexin; Liu, Ziming; Chandak, Payal; Liu, Shengchao; Van Katwyk, Peter; Deac, Andreea; Anandkumar, Anima; Bergen, Karianne; Gomes, Carla P; Ho, Shirley; Kohli, Pushmeet; Lasenby, Joan; Leskovec, Jure; Liu, Tie-Yan; Manrai, Arjun; Marks, Debora; Ramsundar, Bharath; Song, Le; Sun, Jimeng; Tang, Jian; Velickovic, Petar; Welling, Max; Zhang, Linfeng; Coley, Connor W; Bengio, Yoshua; Zitnik, Marinka.

Nature ; 620(7972): 47-60, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37532811

ABSTRACT

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.

Subject(s)

Artificial Intelligence; Research Design; Artificial Intelligence/standards; Artificial Intelligence/trends; Datasets as Topic; Deep Learning; Research Design/standards; Research Design/trends; Unsupervised Machine Learning

14.

Publisher Correction: Scientific discovery in the age of artificial intelligence.

Wang, Hanchen; Fu, Tianfan; Du, Yuanqi; Gao, Wenhao; Huang, Kexin; Liu, Ziming; Chandak, Payal; Liu, Shengchao; Van Katwyk, Peter; Deac, Andreea; Anandkumar, Anima; Bergen, Karianne; Gomes, Carla P; Ho, Shirley; Kohli, Pushmeet; Lasenby, Joan; Leskovec, Jure; Liu, Tie-Yan; Manrai, Arjun; Marks, Debora; Ramsundar, Bharath; Song, Le; Sun, Jimeng; Tang, Jian; Velickovic, Petar; Welling, Max; Zhang, Linfeng; Coley, Connor W; Bengio, Yoshua; Zitnik, Marinka.

Nature ; 621(7978): E33, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37648871

15.

Multimodal learning on graphs for disease relation extraction.

Lin, Yucong; Lu, Keming; Yu, Sheng; Cai, Tianxi; Zitnik, Marinka.

J Biomed Inform ; 143: 104415, 2023 07.

Article in English | MEDLINE | ID: mdl-37276949

ABSTRACT

Disease knowledge graphs have emerged as a powerful tool for artificial intelligence to connect, organize, and access diverse information about diseases. Relations between disease concepts are often distributed across multiple datasets, including unstructured plain text datasets and incomplete disease knowledge graphs. Extracting disease relations from multimodal data sources is thus crucial for constructing accurate and comprehensive disease knowledge graphs. We introduce REMAP, a multimodal approach for disease relation extraction. The REMAP machine learning approach jointly embeds a partial, incomplete knowledge graph and a medical language dataset into a compact latent vector space, aligning the multimodal embeddings for optimal disease relation extraction. Additionally, REMAP utilizes a decoupled model structure to enable inference in single-modal data, which can be applied under missing modality scenarios. We apply the REMAP approach to a disease knowledge graph with 96,913 relations and a text dataset of 1.24 million sentences. On a dataset annotated by human experts, REMAP improves language-based disease relation extraction by 10.0% (accuracy) and 17.2% (F1-score) by fusing disease knowledge graphs with language information. Furthermore, REMAP leverages text information to recommend new relationships in the knowledge graph, outperforming graph-based methods by 8.4% (accuracy) and 10.4% (F1-score). REMAP is a flexible multimodal approach for extracting disease relations by fusing structured knowledge and language information. This approach provides a powerful model to easily find, access, and evaluate relations between disease concepts.

Subject(s)

Artificial Intelligence; Machine Learning; Humans; Unified Medical Language System; Language; Natural Language Processing

16.

Metapaths: similarity search in heterogeneous knowledge graphs via meta-paths.

Noori, Ayush; Li, Michelle M; Tan, Amelia L M; Zitnik, Marinka.

Bioinformatics ; 39(5), 2023 05 04.

Article in English | MEDLINE | ID: mdl-37140542

ABSTRACT

SUMMARY: Heterogeneous knowledge graphs (KGs) have enabled the modeling of complex systems, from genetic interaction graphs and protein-protein interaction networks to networks representing drugs, diseases, proteins, and side effects. Analytical methods for KGs rely on quantifying similarities between entities, such as nodes, in the graph. However, such methods must consider the diversity of node and edge types contained within the KG via, for example, defined sequences of entity types known as meta-paths. We present metapaths, the first R software package to implement meta-paths and perform meta-path-based similarity search in heterogeneous KGs. The metapaths package offers various built-in similarity metrics for node pair comparison by querying KGs represented as either edge or adjacency lists, as well as auxiliary aggregation methods to measure set-level relationships. Indeed, evaluation of these methods on an open-source biomedical KG recovered meaningful drug and disease-associated relationships, including those in Alzheimer's disease. The metapaths framework facilitates the scalable and flexible modeling of network similarities in KGs with applications across KG learning. AVAILABILITY AND IMPLEMENTATION: The metapaths R package is available via GitHub at https://github.com/ayushnoori/metapaths and is released under MPL 2.0 (Zenodo DOI: 10.5281/zenodo.7047209). Package documentation and usage examples are available at https://www.ayushnoori.com/metapaths.
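The idea of a meta-path-based similarity search can be sketched in a few lines. This is an illustration in Python, not the API of the metapaths R package: the toy graph, node names, and function names are all hypothetical. It counts the paths between two nodes that follow a given sequence of node types, then applies PathSim, a normalization known from the heterogeneous-network literature for symmetric meta-paths, as one example of a similarity metric.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn an undirected edge list into an adjacency map of node -> neighbours."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def count_metapath_instances(adjacency, node_types, start, end, metapath):
    """Count paths from `start` to `end` whose node types follow `metapath`."""
    if node_types[start] != metapath[0]:
        return 0
    counts = {start: 1}  # number of partial paths ending at each node
    for next_type in metapath[1:]:
        next_counts = defaultdict(int)
        for node, n_paths in counts.items():
            for neighbour in adjacency.get(node, ()):
                if node_types[neighbour] == next_type:
                    next_counts[neighbour] += n_paths
        counts = next_counts
    return counts.get(end, 0)


def pathsim(adjacency, node_types, x, y, metapath):
    """PathSim: 2 * paths(x, y) / (paths(x, x) + paths(y, y)), symmetric meta-paths only."""
    denominator = (count_metapath_instances(adjacency, node_types, x, x, metapath)
                   + count_metapath_instances(adjacency, node_types, y, y, metapath))
    if denominator == 0:
        return 0.0
    return 2 * count_metapath_instances(adjacency, node_types, x, y, metapath) / denominator


# Hypothetical toy knowledge graph: two drugs sharing one of two gene targets.
toy_types = {"drug_a": "drug", "drug_b": "drug", "gene_1": "gene", "gene_2": "gene"}
toy_graph = build_adjacency([("drug_a", "gene_1"), ("drug_a", "gene_2"), ("drug_b", "gene_1")])
drug_gene_drug = ("drug", "gene", "drug")
toy_similarity = pathsim(toy_graph, toy_types, "drug_a", "drug_b", drug_gene_drug)
```

In the toy graph the two drugs are connected by one drug-gene-drug path through the shared gene, so the raw path count is 1 and the normalized similarity is below 1 because the drugs do not share all their targets.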

Subject(s)

Alzheimer Disease; Pattern Recognition, Automated; Humans; Software; Protein Interaction Maps

17.

Evaluating explainability for graph neural networks.

Agarwal, Chirag; Queen, Owen; Lakkaraju, Himabindu; Zitnik, Marinka.

Sci Data ; 10(1): 144, 2023 03 18.

Article in English | MEDLINE | ID: mdl-36934095

ABSTRACT

As explanations are increasingly used to understand the behavior of graph neural networks (GNNs), evaluating the quality and reliability of GNN explanations is crucial. However, assessing the quality of GNN explanations is challenging as existing graph datasets have no or unreliable ground-truth explanations. Here, we introduce a synthetic graph data generator, SHAPEGGEN, which can generate a variety of benchmark datasets (e.g., varying graph sizes, degree distributions, homophilic vs. heterophilic graphs) accompanied by ground-truth explanations. The flexibility to generate diverse synthetic datasets and corresponding ground-truth explanations allows SHAPEGGEN to mimic the data in various real-world areas. We include SHAPEGGEN and several real-world graph datasets in a graph explainability library, GRAPHXAI. In addition to synthetic and real-world graph datasets with ground-truth explanations, GRAPHXAI provides data loaders, data processing functions, visualizers, GNN model implementations, and evaluation metrics to benchmark GNN explainability methods.

18.

Building a knowledge graph to enable precision medicine.

Chandak, Payal; Huang, Kexin; Zitnik, Marinka.

Sci Data ; 10(1): 67, 2023 02 02.

Article in English | MEDLINE | ID: mdl-36732524

ABSTRACT

Developing personalized diagnostic strategies and targeted treatments requires a deep understanding of disease biology and the ability to dissect the relationship between molecular and genetic factors and their phenotypic consequences. However, such knowledge is fragmented across publications, non-standardized repositories, and evolving ontologies describing various scales of biological organization between genotypes and clinical phenotypes. Here, we present PrimeKG, a multimodal knowledge graph for precision medicine analyses. PrimeKG integrates 20 high-quality resources to describe 17,080 diseases with 4,050,249 relationships representing ten major biological scales, including disease-associated protein perturbations, biological processes and pathways, anatomical and phenotypic scales, and the entire range of approved drugs with their therapeutic action, considerably expanding previous efforts in disease-rooted knowledge graphs. PrimeKG contains an abundance of 'indications', 'contraindications', and 'off-label use' drug-disease edges that are lacking in other knowledge graphs and can support AI analyses of how drugs affect disease-associated networks. We supplement PrimeKG's graph structure with language descriptions of clinical guidelines to enable multimodal analyses and provide instructions for continual updates of PrimeKG as new data become available.

Subject(s)

Pattern Recognition, Automated; Precision Medicine; Humans

19.

Multimodal representation learning for predicting molecule-disease relations.

Wen, Jun; Zhang, Xiang; Rush, Everett; Panickan, Vidul A; Li, Xingyu; Cai, Tianrun; Zhou, Doudou; Ho, Yuk-Lam; Costa, Lauren; Begoli, Edmon; Hong, Chuan; Gaziano, J Michael; Cho, Kelly; Lu, Junwei; Liao, Katherine P; Zitnik, Marinka; Cai, Tianxi.

Bioinformatics ; 39(2), 2023 02 03.

Article in English | MEDLINE | ID: mdl-36805623

ABSTRACT

MOTIVATION: Predicting molecule-disease indications and side effects is important for drug development and pharmacovigilance. Comprehensively mining molecule-molecule, molecule-disease and disease-disease semantic dependencies can potentially improve prediction performance. METHODS: We introduce a Multi-Modal REpresentation Mapping Approach to Predicting molecular-disease relations (M2REMAP) by incorporating clinical semantics learned from electronic health records (EHR) of 12.6 million patients. Specifically, M2REMAP first learns a multimodal molecule representation that synthesizes chemical property and clinical semantic information by mapping molecule chemicals via a deep neural network onto the clinical semantic embedding space shared by drugs, diseases and other common clinical concepts. To infer molecule-disease relations, M2REMAP combines multimodal molecule representation and disease semantic embedding to jointly infer indications and side effects. RESULTS: We extensively evaluate M2REMAP on molecule indications, side effects and interactions. Results show that incorporating EHR embeddings improves performance significantly, for example, attaining an improvement over the baseline models by 23.6% in PRC-AUC on indications and 23.9% on side effects. Further, M2REMAP overcomes the limitation of existing methods and effectively predicts drugs for novel diseases and emerging pathogens. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/celehs/M2REMAP, and prediction results are provided at https://shiny.parse-health.org/drugs-diseases-dev/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Drug-Related Side Effects and Adverse Reactions , Humans , Drug Development , Electronic Health Records , Neural Networks, Computer , Pharmacovigilance

20.

Extending the Nested Model for User-Centric XAI: A Design Study on GNN-based Drug Repurposing.

Wang, Qianwen; Huang, Kexin; Chandak, Payal; Zitnik, Marinka; Gehlenborg, Nils.

IEEE Trans Vis Comput Graph ; 29(1): 1266-1276, 2023 01.

Article in English | MEDLINE | ID: mdl-36223348

ABSTRACT

Whether AI explanations can help users achieve specific tasks efficiently (i.e., usable explanations) is significantly influenced by their visual presentation. While many techniques exist to generate explanations, it remains unclear how to select and visually present AI explanations based on the characteristics of domain users. This paper aims to understand this question through a multidisciplinary design study for a specific problem: explaining graph neural network (GNN) predictions to domain experts in drug repurposing, i.e., reuse of existing drugs for new diseases. Building on the nested design model of visualization, we incorporate XAI design considerations from a literature review and from our collaborators' feedback into the design process. Specifically, we discuss XAI-related design considerations for usable visual explanations at each design layer: target user, usage context, domain explanation, and XAI goal at the domain layer; format, granularity, and operation of explanations at the abstraction layer; encodings and interactions at the visualization layer; and XAI and rendering algorithm at the algorithm layer. We present how the extended nested model motivates and informs the design of DrugExplorer, an XAI tool for drug repurposing. Based on our domain characterization, DrugExplorer provides path-based explanations and presents them both as individual paths and meta-paths for two key XAI operations, why and what else. DrugExplorer offers a novel visualization design called MetaMatrix with a set of interactions to help domain users organize and compare explanation paths at different levels of granularity to generate domain-meaningful insights. We demonstrate the effectiveness of the selected visual presentation and DrugExplorer as a whole via a usage scenario, a user study, and expert interviews. From these evaluations, we derive insightful observations and reflections that can inform the design of XAI visualizations for other scientific applications.
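The distinction between individual explanation paths and meta-paths can be made concrete with a small sketch. It assumes the common definition of a meta-path as the sequence of node types along a path; the abstract does not spell out how DrugExplorer represents paths, so the data layout, the function `group_by_meta_path`, and the placeholder node names are all hypothetical.

```python
from collections import defaultdict

def group_by_meta_path(paths):
    """Group paths by their sequence of node types (assumed meta-path definition).

    Each path is a list of (node_type, node_name) pairs.
    """
    groups = defaultdict(list)
    for path in paths:
        meta_path = tuple(node_type for node_type, _ in path)
        groups[meta_path].append([name for _, name in path])
    return dict(groups)

# Placeholder explanation paths linking one drug to one disease.
paths = [
    [("drug", "drug_A"), ("gene", "gene_1"), ("disease", "disease_X")],
    [("drug", "drug_A"), ("gene", "gene_2"), ("disease", "disease_X")],
    [("drug", "drug_A"), ("drug", "drug_B"), ("disease", "disease_X")],
]

groups = group_by_meta_path(paths)
for meta_path, members in groups.items():
    print(" -> ".join(meta_path), f"({len(members)} paths)")
```

Grouping in this way lets a reader inspect explanations at two levels of granularity: the meta-path as a summary, and its member paths as the detail.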

Subject(s)

Computer Graphics , Drug Repositioning , Neural Networks, Computer , Algorithms
