Results 1 - 11 of 11
1.
Bioinformatics ; 40(9), 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39288310

ABSTRACT

MOTIVATION: Large language models (LLMs) are being adopted at an unprecedented rate, yet still face challenges in knowledge-intensive domains such as biomedicine. Solutions such as pretraining and domain-specific fine-tuning add substantial computational overhead, requiring further domain-expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b, GPT-3.5-Turbo, and GPT-4, to generate meaningful biomedical text rooted in established knowledge. RESULTS: Compared to the existing RAG technique for Knowledge Graphs, the proposed method utilizes minimal graph schema for context extraction and uses embedding methods for context pruning. This optimization in context extraction results in more than 50% reduction in token consumption without compromising the accuracy, making a cost-effective and robust RAG implementation on proprietary LLMs. KG-RAG consistently enhanced the performance of LLMs across diverse biomedical prompts by generating responses rooted in established knowledge, accompanied by accurate provenance and statistical evidence (if available) to substantiate the claims. Further benchmarking on human curated datasets, such as biomedical true/false and multiple-choice questions (MCQ), showed a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5 and GPT-4. In summary, the proposed framework combines explicit and implicit knowledge of KG and LLM in a token optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a cost-effective fashion. 
AVAILABILITY AND IMPLEMENTATION: SPOKE KG can be accessed at https://spoke.rbvi.ucsf.edu/neighborhood.html. It can also be accessed using REST-API (https://spoke.rbvi.ucsf.edu/swagger/). KG-RAG code is made available at https://github.com/BaranziniLab/KG_RAG. Biomedical benchmark datasets used in this study are made available to the research community in the same GitHub repository.


Subject(s)
Natural Language Processing, Computational Biology/methods, Algorithms, Humans
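
The abstract of entry 1 mentions using embedding methods for context pruning. A minimal sketch of that general idea follows, assuming a toy bag-of-words embedding in place of a real sentence-embedding model. The function names (`toy_embed`, `prune_context`), the thresholds, and the example statements are hypothetical illustrations and are not taken from the KG-RAG code or from SPOKE.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune_context(question, statements, min_similarity=0.2, max_statements=2):
    """Keep only the statements most similar to the question.

    The similarity cutoff and the cap on statements are assumed values,
    chosen only to make the example easy to follow.
    """
    query = toy_embed(question)
    scored = [(cosine(query, toy_embed(s)), s) for s in statements]
    kept = [pair for pair in scored if pair[0] >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [statement for _, statement in kept[:max_statements]]


# Illustrative statements written for this example, not retrieved from a graph.
statements = [
    "multiple sclerosis associates gene HLA-DRB1",
    "multiple sclerosis presents symptom fatigue",
    "compound vorinostat inhibits histone deacetylase",
]
pruned = prune_context("which gene associates with multiple sclerosis", statements)
```

Only the pruned statements would then be placed in the prompt, which is how a pruning step of this kind can reduce token consumption.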
2.
Bioinformatics ; 39(2), 2023 02 03.
Article in English | MEDLINE | ID: mdl-36759942

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are being adopted in industry, commerce and academia. Biomedical KG presents a challenge due to the complexity, size and heterogeneity of the underlying information. RESULTS: In this work, we present the Scalable Precision Medicine Open Knowledge Engine (SPOKE), a biomedical KG connecting millions of concepts via semantically meaningful relationships. SPOKE contains 27 million nodes of 21 different types and 53 million edges of 55 types downloaded from 41 databases. The graph is built on the framework of 11 ontologies that maintain its structure, enable mappings and facilitate navigation. SPOKE is built weekly by python scripts which download each resource, check for integrity and completeness, and then create a 'parent table' of nodes and edges. Graph queries are translated by a REST API and users can submit searches directly via an API or a graphical user interface. Conclusions/Significance: SPOKE enables the integration of seemingly disparate information to support precision medicine efforts. AVAILABILITY AND IMPLEMENTATION: The SPOKE neighborhood explorer is available at https://spoke.rbvi.ucsf.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Automated Pattern Recognition, Precision Medicine, Factual Databases
3.
AI Mag ; 43(1): 46-58, 2022.
Article in English | MEDLINE | ID: mdl-36093122

ABSTRACT

Knowledge representation and reasoning (KR&R) has been successfully implemented in many fields to enable computers to solve complex problems with AI methods. However, its application to biomedicine has been lagging in part due to the daunting complexity of molecular and cellular pathways that govern human physiology and pathology. In this article we describe concrete uses of SPOKE, an open knowledge network that connects curated information from 37 specialized and human-curated databases into a single property graph, with 3 million nodes and 15 million edges to date. Applications discussed in this article include drug discovery, COVID-19 research and chronic disease diagnosis and management.

4.
Proc Natl Acad Sci U S A ; 114(40): 10713-10718, 2017 10 03.
Article in English | MEDLINE | ID: mdl-28893978

ABSTRACT

The gut microbiota regulates T cell functions throughout the body. We hypothesized that intestinal bacteria impact the pathogenesis of multiple sclerosis (MS), an autoimmune disorder of the CNS and thus analyzed the microbiomes of 71 MS patients not undergoing treatment and 71 healthy controls. Although no major shifts in microbial community structure were found, we identified specific bacterial taxa that were significantly associated with MS. Akkermansia muciniphila and Acinetobacter calcoaceticus, both increased in MS patients, induced proinflammatory responses in human peripheral blood mononuclear cells and in monocolonized mice. In contrast, Parabacteroides distasonis, which was reduced in MS patients, stimulated antiinflammatory IL-10-expressing human CD4+CD25+ T cells and IL-10+FoxP3+ Tregs in mice. Finally, microbiota transplants from MS patients into germ-free mice resulted in more severe symptoms of experimental autoimmune encephalomyelitis and reduced proportions of IL-10+ Tregs compared with mice "humanized" with microbiota from healthy controls. This study identifies specific human gut bacteria that regulate adaptive autoimmune responses, suggesting therapeutic targeting of the microbiota as a treatment for MS.


Subject(s)
Animal Disease Models, Experimental Autoimmune Encephalomyelitis/immunology, Gastrointestinal Microbiome, Mononuclear Leukocytes/immunology, Multiple Sclerosis/immunology, Regulatory T-Lymphocytes/immunology, T-Lymphocytes/immunology, Animals, Cultured Cells, Experimental Autoimmune Encephalomyelitis/microbiology, Experimental Autoimmune Encephalomyelitis/pathology, Female, Humans, Mononuclear Leukocytes/microbiology, Mononuclear Leukocytes/pathology, Male, Mice, Multiple Sclerosis/microbiology, Multiple Sclerosis/pathology, T-Lymphocytes/microbiology, T-Lymphocytes/pathology
5.
Pac Symp Biocomput ; 28: 97-108, 2023.
Article in English | MEDLINE | ID: mdl-36540968

ABSTRACT

Meaningful representations of clinical data using embedding vectors is a pivotal step to invoke any machine learning (ML) algorithm for data inference. In this article, we propose a time-aware embedding approach of electronic health records onto a biomedical knowledge graph for creating machine readable patient representations. This approach not only captures the temporal dynamics of patient clinical trajectories, but also enriches it with additional biological information from the knowledge graph. To gauge the predictivity of this approach, we propose an ML pipeline called TANDEM (Temporal and Non-temporal Dynamics Embedded Model) and apply it on the early detection of Parkinson's disease. TANDEM results in a classification AUC score of 0.85 on unseen test dataset. These predictions are further explained by providing a biological insight using the knowledge graph. Taken together, we show that temporal embeddings of clinical data could be a meaningful predictive representation for downstream ML pipelines in clinical decision-making.


Subject(s)
Computational Biology, Automated Pattern Recognition, Humans, Computational Biology/methods, Algorithms, Machine Learning, Electronic Health Records
6.
Front Med (Lausanne) ; 10: 1081087, 2023.
Article in English | MEDLINE | ID: mdl-37250641

ABSTRACT

Introduction: Early diagnosis of Parkinson's disease (PD) is important to identify treatments to slow neurodegeneration. People who develop PD often have symptoms before the disease manifests and may be coded as diagnoses in the electronic health record (EHR). Methods: To predict PD diagnosis, we embedded EHR data of patients onto a biomedical knowledge graph called Scalable Precision medicine Open Knowledge Engine (SPOKE) and created patient embedding vectors. We trained and validated a classifier using these vectors from 3,004 PD patients, restricting records to 1, 3, and 5 years before diagnosis, and 457,197 non-PD group. Results: The classifier predicted PD diagnosis with moderate accuracy (AUC = 0.77 ± 0.06, 0.74 ± 0.05, 0.72 ± 0.05 at 1, 3, and 5 years) and performed better than other benchmark methods. Nodes in the SPOKE graph, among cases, revealed novel associations, while SPOKE patient vectors revealed the basis for individual risk classification. Discussion: The proposed method was able to explain the clinical predictions using the knowledge graph, thereby making the predictions clinically interpretable. Through enriching EHR data with biomedical associations, SPOKE may be a cost-efficient and personalized way to predict PD diagnosis years before its occurrence.

7.
J Am Med Inform Assoc ; 29(3): 424-434, 2022 01 29.
Article in English | MEDLINE | ID: mdl-34915552

ABSTRACT

OBJECTIVE: Early identification of chronic diseases is a pillar of precision medicine as it can lead to improved outcomes, reduction of disease burden, and lower healthcare costs. Predictions of a patient's health trajectory have been improved through the application of machine learning approaches to electronic health records (EHRs). However, these methods have traditionally relied on "black box" algorithms that can process large amounts of data but are unable to incorporate domain knowledge, thus limiting their predictive and explanatory power. Here, we present a method for incorporating domain knowledge into clinical classifications by embedding individual patient data into a biomedical knowledge graph. MATERIALS AND METHODS: A modified version of the Page rank algorithm was implemented to embed millions of deidentified EHRs into a biomedical knowledge graph (SPOKE). This resulted in high-dimensional, knowledge-guided patient health signatures (ie, SPOKEsigs) that were subsequently used as features in a random forest environment to classify patients at risk of developing a chronic disease. RESULTS: Our model predicted disease status of 5752 subjects 3 years before being diagnosed with multiple sclerosis (MS) (AUC = 0.83). SPOKEsigs outperformed predictions using EHRs alone, and the biological drivers of the classifiers provided insight into the underpinnings of prodromal MS. CONCLUSION: Using data from EHR as input, SPOKEsigs describe patients at both the clinical and biological levels. We provide a clinical use case for detecting MS up to 5 years prior to their documented diagnosis in the clinic and illustrate the biological features that distinguish the prodromal MS state.


Subject(s)
Electronic Health Records, Multiple Sclerosis, Algorithms, Humans, Machine Learning, Multiple Sclerosis/diagnosis, Precision Medicine/methods
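
The abstract of entry 7 describes embedding EHRs into a knowledge graph with a modified PageRank algorithm, but it does not say what the modification is. As a rough illustration only, the sketch below runs plain personalized PageRank (random walk with restart) on a tiny undirected graph, restarting at the nodes that stand for a patient's record. The graph, the node names, and the `personalized_pagerank` helper are hypothetical. This is not the authors' method, and the resulting vector is only loosely analogous to a SPOKEsig.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Power iteration for PageRank with restart at the seed nodes.

    Assumes an undirected graph given as (node, node) pairs and that
    every seed appears in at least one edge.
    """
    nodes = sorted({node for edge in edges for node in edge})
    neighbors = {node: [] for node in nodes}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # The restart distribution puts equal mass on each seed node.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        updated = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            # Each node spreads its damped rank evenly over its neighbors.
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                updated[m] += share
        rank = updated
    return rank


# Toy graph with placeholder node names, not real SPOKE content.
edges = [
    ("ehr_code_A", "gene_1"),
    ("gene_1", "disease_X"),
    ("disease_X", "symptom_Y"),
    ("gene_2", "symptom_Y"),
]
signature = personalized_pagerank(edges, seeds={"ehr_code_A"})
```

The returned dictionary assigns every graph node a weight relative to the seed nodes. A vector of such weights could then serve as features for a classifier, in the spirit of the random forest step mentioned in the abstract.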
8.
Life (Basel) ; 11(1), 2021 Jan 12.
Article in English | MEDLINE | ID: mdl-33445483

ABSTRACT

There has long been an interest in understanding how the hazards from spaceflight may trigger or exacerbate human diseases. With the goal of advancing our knowledge on physiological changes during space travel, NASA GeneLab provides an open-source repository of multi-omics data from real and simulated spaceflight studies. Alone, this data enables identification of biological changes during spaceflight, but cannot infer how that may impact an astronaut at the phenotypic level. To bridge this gap, Scalable Precision Medicine Oriented Knowledge Engine (SPOKE), a heterogeneous knowledge graph connecting biological and clinical data from over 30 databases, was used in combination with GeneLab transcriptomic data from six studies. This integration identified critical symptoms and physiological changes incurred during spaceflight.

9.
Nat Commun ; 10(1): 3045, 2019 07 10.
Article in English | MEDLINE | ID: mdl-31292438

ABSTRACT

In order to advance precision medicine, detailed clinical features ought to be described in a way that leverages current knowledge. Although data collected from biomedical research is expanding at an almost exponential rate, our ability to transform that information into patient care has not kept at pace. A major barrier preventing this transformation is that multi-dimensional data collection and analysis is usually carried out without much understanding of the underlying knowledge structure. Here, in an effort to bridge this gap, Electronic Health Records (EHRs) of individual patients are connected to a heterogeneous knowledge network called Scalable Precision Medicine Oriented Knowledge Engine (SPOKE). Then an unsupervised machine-learning algorithm creates Propagated SPOKE Entry Vectors (PSEVs) that encode the importance of each SPOKE node for any code in the EHRs. We argue that these results, alongside the natural integration of PSEVs into any EHR machine-learning platform, provide a key step toward precision medicine.


Subject(s)
Data Analysis, Data Collection/methods, Unsupervised Machine Learning, Biomedical Research/statistics & numerical data, Electronic Health Records/statistics & numerical data, Precision Medicine/methods
10.
Biol Open ; 7(7), 2018 Jul 23.
Article in English | MEDLINE | ID: mdl-30037883

ABSTRACT

Although the primary protein sequence of ubiquitin (Ub) is extremely stable over evolutionary time, it is highly tolerant to mutation during selection experiments performed in the laboratory. We have proposed that this discrepancy results from the difference between fitness under laboratory culture conditions and the selective pressures in changing environments over evolutionary timescales. Building on our previous work (Mavor et al., 2016), we used deep mutational scanning to determine how twelve new chemicals (3-Amino-1,2,4-triazole, 5-fluorocytosine, Amphotericin B, CaCl2, Cerulenin, Cobalt Acetate, Menadione, Nickel Chloride, p-Fluorophenylalanine, Rapamycin, Tamoxifen, and Tunicamycin) reveal novel mutational sensitivities of ubiquitin residues. Collectively, our experiments have identified eight new sensitizing conditions for Lys63 and uncovered a sensitizing condition for every position in Ub except Ser57 and Gln62. By determining the ubiquitin fitness landscape under different chemical constraints, our work helps to resolve the inconsistencies between deep mutational scanning experiments and sequence conservation over evolutionary timescales.

11.
Eur J Cancer ; 48(17): 3278-87, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22459762

ABSTRACT

Histone deacetylase (HDAC) inhibitors have shown promising anticancer effects in clinical trials. However, a proportion of patients do not respond to HDAC inhibitor therapy. We have previously demonstrated that tissue transglutaminase (TG2) is one of the genes commonly up-regulated by HDAC inhibitors in vitro and in vivo, and that two structurally distinct TG2 protein isoforms, the full-length (TG2-L) and the short form (TG2-S), exert opposing effects on cell differentiation due to difference in transamidation activity. Here we show that the HDAC inhibitor suberoylanilide hydroxamic acid (SAHA) transcriptionally activates the expression of both TG2-L and TG2-S, and that up-regulation of TG2-L renders neuroblastoma cells less sensitive to SAHA-induced cytotoxicity. Combination therapy with SAHA and the transamidation activator Naringenin, a natural product found in citrus fruits, synergistically enhanced transamidation activity and SAHA-induced cytotoxicity in neuroblastoma cells, but not in normal non-malignant cells. In tumour-bearing N-Myc transgenic mice, SAHA and Naringenin synergistically suppressed tumour progression. Taken together, our data demonstrate that SAHA-induced TG2-L over-expression renders cancer cells less sensitive to SAHA therapy, and suggest the addition of Naringenin to SAHA and probably also other HDAC inhibitors in future clinical trials in cancer patients.


Subject(s)
Antineoplastic Agents/pharmacology, Histone Deacetylase Inhibitors/pharmacology, Hydroxamic Acids/pharmacology, Transglutaminases/physiology, Animals, Tumor Cell Line, Flavanones/pharmacology, GTP-Binding Proteins, myc Genes, Humans, Mice, Transgenic Mice, Neuroblastoma/drug therapy, Neuroblastoma/pathology, Protein Glutamine gamma Glutamyltransferase 2, Messenger RNA/analysis, Transglutaminases/genetics, Vorinostat