ABSTRACT
MOTIVATION: Answering and solving complex problems with a large language model (LLM) in a specific domain such as biomedicine is a challenging task that requires both factual consistency and logical reasoning. LLMs often suffer from major limitations, such as hallucinating false or irrelevant information or being influenced by noisy data. These issues can compromise the trustworthiness, accuracy, and compliance of LLM-generated text and insights. RESULTS: Knowledge Retrieval Augmented Generation ENgine (KRAGEN) is a new tool that combines knowledge graphs, Retrieval Augmented Generation (RAG), and advanced prompting techniques to solve complex problems with natural language. KRAGEN converts knowledge graphs into a vector database and uses RAG to retrieve relevant facts from it. KRAGEN uses an advanced prompting technique, graph-of-thoughts (GoT), to dynamically break down a complex problem into smaller subproblems. It then solves each subproblem using the relevant knowledge retrieved through the RAG framework, which limits hallucinations, and finally consolidates the subproblem solutions into an overall solution. KRAGEN's graph visualization allows the user to interact with and evaluate the quality of the solution's GoT structure and logic. AVAILABILITY AND IMPLEMENTATION: KRAGEN is deployed by running its custom Docker containers. KRAGEN is available as open source from GitHub at: https://github.com/EpistasisLab/KRAGEN.
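The retrieval step described in this abstract (knowledge-graph facts stored as vectors, then looked up by similarity to a question) can be illustrated with a minimal sketch. This is not KRAGEN's implementation: the example triples, the toy bag-of-words `embed` function (standing in for a learned embedding model), and every function name below are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical knowledge-graph triples (subject, relation, object); illustrative only.
TRIPLES = [
    ("donepezil", "treats", "Alzheimer disease"),
    ("APOE", "associated with", "Alzheimer disease"),
    ("levodopa", "treats", "Parkinson disease"),
]

def embed(text):
    """Toy bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(triples):
    """Serialize each triple as a sentence and store it with its vector."""
    facts = [" ".join(triple) for triple in triples]
    return [(fact, embed(fact)) for fact in facts]

def retrieve(index, question, k=2):
    """Return the k stored facts most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [fact for fact, _ in ranked[:k]]

def build_prompt(question, facts):
    """Place the retrieved facts in front of the question for the LLM."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Use only these facts:\n{context}\n\nQuestion: {question}"

index = build_index(TRIPLES)
question = "which drug treats Alzheimer disease"
prompt = build_prompt(question, retrieve(index, question, k=1))
```

Grounding the prompt in retrieved facts is what the abstract credits with limiting hallucinations; in a graph-of-thoughts setting, a lookup of this kind would presumably run once per subproblem.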
Subject(s)
Software; Natural Language Processing; Problem Solving; Algorithms; Information Storage and Retrieval/methods; Humans; Computational Biology/methods; Databases, Factual
ABSTRACT
BACKGROUND: As global populations age and become susceptible to neurodegenerative illnesses, new therapies for Alzheimer disease (AD) are urgently needed. Existing data resources for drug discovery and repurposing fail to capture relationships central to the disease's etiology and response to drugs. OBJECTIVE: We designed the Alzheimer's Knowledge Base (AlzKB) to address this need by providing a comprehensive knowledge representation of AD etiology and candidate therapeutics. METHODS: We designed AlzKB as a large, heterogeneous graph knowledge base assembled from 22 diverse external data sources describing biological and pharmaceutical entities at different levels of organization (e.g., chemicals, genes, anatomy, and diseases). AlzKB uses a Web Ontology Language 2 ontology to enforce semantic consistency and allow for ontological inference. We provide a public version of AlzKB and allow users to run and modify local versions of the knowledge base. RESULTS: AlzKB is freely available on the web and currently contains 118,902 entities with 1,309,527 relationships between those entities. To demonstrate its value, we used graph data science and machine learning to (1) propose new therapeutic targets based on similarities of AD to Parkinson disease and (2) repurpose existing drugs that may treat AD. For each use case, AlzKB recovers known therapeutic associations while proposing biologically plausible new ones. CONCLUSIONS: AlzKB is a new, publicly available knowledge resource that enables researchers to discover complex translational associations for AD drug discovery. Through 2 use cases, we show that it is a valuable tool for proposing novel therapeutic hypotheses based on public biomedical knowledge.
Subject(s)
Alzheimer Disease; Humans; Alzheimer Disease/drug therapy; Alzheimer Disease/genetics; Pattern Recognition, Automated; Knowledge Bases; Machine Learning; Knowledge
ABSTRACT
The concept of a digital twin came from the engineering, industrial, and manufacturing domains, where virtual objects or machines are created to inform the design and development of real objects. This idea is appealing for precision medicine, where digital twins of patients could help inform healthcare decisions. We have developed a methodology for generating and using digital twins for clinical outcome prediction. We introduce a new approach, SynTwin, that combines synthetic data and network science to create digital twins for precision medicine. First, our approach estimates the distance between all subjects based on their available features. Second, the distances are used to construct a network with subjects as nodes and with edges connecting subjects whose distance is less than the percolation threshold. Third, communities or cliques of subjects are defined. Fourth, a large population of synthetic patients is generated using a synthetic data generation algorithm that models the correlation structure of the data. Fifth, digital twins are selected from the synthetic patient population that are within a given distance defining a subject community in the network. Finally, we compare and contrast community-based prediction of clinical endpoints using real subjects, digital twins, or both within and outside of the community. Key to this approach are the digital twins defined using patient similarity, which represent hypothetical unobserved patients with patterns similar to nearby real patients as defined by network distance and community structure. We apply our SynTwin approach to predicting mortality in a population-based cancer registry (n=87,674) from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (USA). Our results demonstrate that nearest network neighbor prediction of mortality in this study is significantly improved with digital twins (AUROC=0.864, 95% CI=0.857-0.872) over using real data alone (AUROC=0.791, 95% CI=0.781-0.800). These results suggest that a network-based digital twin strategy using synthetic patients may add value to precision medicine efforts.
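The steps listed in this abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the SynTwin code: Euclidean distance, connected components as a stand-in for community detection, a fixed `threshold` in place of an estimated percolation threshold, and synthetic patients supplied as plain input rather than produced by a generative model. All names and data points are hypothetical.

```python
import math

def build_network(points, threshold):
    """Nodes are subjects; an edge joins two subjects closer than the threshold."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def communities(adj):
    """Connected components, used here as a simple stand-in for community detection."""
    seen, found = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        found.append(sorted(component))
    return found

def select_twins(community, points, synthetic, radius):
    """Keep synthetic patients lying within `radius` of any community member."""
    return [s for s in synthetic
            if any(math.dist(s, points[i]) <= radius for i in community)]

def knn_risk(target, pool, k=3):
    """Mean outcome of the k nearest (features, outcome) pairs in the pool."""
    nearest = sorted(pool, key=lambda item: math.dist(target, item[0]))[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)

# Hypothetical two-feature subjects and synthetic candidates.
real = [(0, 0), (0, 1), (5, 5), (5, 6)]
synthetic = [(0.2, 0.5), (5.1, 5.4), (10, 10)]

groups = communities(build_network(real, threshold=2))
twins = select_twins(groups[0], real, synthetic, radius=1)
pool = [((0, 0), 1), ((0, 1), 1), ((0.2, 0.5), 0), ((5, 5), 0)]
risk = knn_risk((0.1, 0.4), pool, k=3)
```

Pooling real subjects and their twins before the nearest-neighbor vote mirrors the "real subjects, digital twins, or both" comparison described in the abstract.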
Subject(s)
Algorithms; Computational Biology; Humans; Cluster Analysis; Precision Medicine
ABSTRACT
BACKGROUND: The optimal management of blunt thoracic aortic injury (BTAI) remains controversial, with experienced centers offering therapy ranging from medical management to TEVAR. We investigated the utility of a machine learning (ML) algorithm to develop a prognostic model of risk factors for mortality in patients with BTAI. METHODS: The Aortic Trauma Foundation registry was used to examine demographics, injury characteristics, management, and outcomes of patients with BTAI. A STREAMLINE (A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison) model and a logistic regression (LR) analysis with imputation using chained equations were developed and compared. RESULTS: From a total of 1018 patients in the registry, 702 patients were included in the final analysis. Of the 258 (37%) patients who were medically managed, 44 (17%) died during admission, of whom 14 (5.4%) had aortic-related deaths. Four hundred forty-four (63%) patients underwent TEVAR, of whom 343 underwent TEVAR within 24 hours of admission. Among TEVAR patients, 39 (8.8%) died and 7 (1.6%) had aortic-related deaths (Table 1). Comparison of the STREAMLINE and LR models showed no significant difference in ROC curves and high AUCs of 0.869 (95% confidence interval, 0.813-0.925) and 0.840 (95% confidence interval, 0.779-0.900), respectively, in predicting in-hospital mortality. Unexpectedly, however, the variables prioritized differed between the models. The top 3 variables identified from the LR model were similar to those in the existing literature. The STREAMLINE model, however, prioritized location of the injury along the lesser curve, age, and aortic injury grade. CONCLUSION: Machine learning provides insight on prioritization of variables not typically identified in standard multivariable logistic regression. Further investigation and validation in other aortic injury cohorts are needed to delineate the utility of ML models. LEVEL OF EVIDENCE: Prognostic and Epidemiological; Level III.
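The AUC figures quoted in this abstract summarize how well each model ranks patients who died above those who survived. As a hedged illustration only (the function name and toy data are assumptions, and this is neither the STREAMLINE pipeline nor the study's analysis), the AUC can be computed as the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative case."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical outcomes (1 = died) and predicted risks for four patients.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Two models can reach similar AUCs while ranking predictors differently, which is the pattern the abstract reports for STREAMLINE and logistic regression.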
Subject(s)
Aorta, Thoracic; Machine Learning; Registries; Wounds, Nonpenetrating; Humans; Wounds, Nonpenetrating/mortality; Wounds, Nonpenetrating/therapy; Wounds, Nonpenetrating/diagnosis; Wounds, Nonpenetrating/surgery; Male; Female; Aorta, Thoracic/injuries; Aorta, Thoracic/surgery; Adult; Middle Aged; Prognosis; Endovascular Procedures; Injury Severity Score; Risk Factors; Retrospective Studies; Vascular System Injuries/mortality; Vascular System Injuries/surgery; Vascular System Injuries/diagnosis; Vascular System Injuries/therapy; Hospital Mortality; Logistic Models; Algorithms; Thoracic Injuries/mortality; Thoracic Injuries/therapy; Thoracic Injuries/diagnosis; Thoracic Injuries/surgery
ABSTRACT
Organelles play important roles in human health and disease, such as maintaining homeostasis, regulating growth and aging, and generating energy. Organelle diversity exists not only between cell types but also between individual cells. Therefore, studying the distribution of organelles at the single-cell level is important for understanding cellular function. Mesenchymal stem cells (MSCs) are multipotent cells that have been explored as a therapeutic method for treating a variety of diseases. Studying how organelles are structured in these cells can answer questions about their characteristics and potential. Herein, rapid multiplexed immunofluorescence (RapMIF) was performed to understand the spatial organization of 10 organelle proteins and the interactions between them in bone marrow (BM) and umbilical cord (UC) MSCs. Spatial correlation, colocalization, clustering, statistical, texture, and morphological analyses were conducted at the single-cell level, shedding light on the interrelations between the organelles and on differences between the two MSC subtypes. These analytical toolsets indicated that UC MSCs exhibited higher organelle expression and a more spatially spread distribution of mitochondria, accompanied by several other organelles, compared to BM MSCs. This data-driven single-cell approach provided by rapid subcellular proteomic imaging enables personalized stem cell therapeutics.
Subject(s)
Mesenchymal Stem Cells; Proteomics; Humans; Bone Marrow Cells; Cell Differentiation/physiology; Umbilical Cord; Mitochondria
ABSTRACT
Protein-protein interaction networks are altered in multi-gene dysregulations in many disorders. Image-based protein multiplexing sheds light on signaling pathways to dissect cell-to-cell heterogeneity, previously masked by bulk assays. Herein, we present a rapid multiplexed immunofluorescence (RapMIF) method measuring up to 25-plex spatial protein maps from cultures and tissues at subcellular resolution, providing combinatorial 272 pairwise and 1,360 tri-protein signaling states across 33 multiplexed pixel-level clusters. The RapMIF pipeline automated staining, bleaching, and imaging of the biospecimens in a single platform. RapMIF showed that WNT/β-catenin signaling was upregulated upon inhibition of the AKT/mTOR pathway. Subcellular protein images demonstrated translocation patterns, spatial receptor discontinuity, and subcellular signaling clusters in single cells. Signaling networks exhibited spatial redistribution of signaling proteins in drug-responsive cultures. Machine learning analysis predicted phosphorylated β-catenin expression from interconnected signaling protein images. RapMIF is an ideal signaling discovery approach for precision therapy design.
ABSTRACT
3D visualization technologies such as virtual reality (VR), augmented reality (AR), and mixed reality (MR) have gained popularity over the past decade. Digital extended reality (XR) technologies have been adopted in various domains ranging from entertainment to education because of their accessibility and affordability. XR modalities create an immersive experience, enabling 3D visualization of content without the constraint of a conventional 2D display. Here, we provide a perspective on XR in current biomedical applications and demonstrate case studies using cell biology concepts, multiplexed proteomics images, surgical data for heart operations, and cardiac 3D models. Emerging challenges associated with XR technologies in the context of adverse health effects and a cost comparison of distinct platforms are discussed. The presented XR platforms will be useful for biomedical education, medical training, surgical guidance, and molecular data visualization to enhance trainees' and students' learning, medical operation accuracy, and the comprehensibility of complex biological systems.
Subject(s)
Augmented Reality; Biomedical Technology; Virtual Reality; Biomedical Technology/economics; Costs and Cost Analysis; Emotions; Humans; Learning
ABSTRACT
The outbreak of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which began in Wuhan City, Hubei Province, China, in 2019, has become an ongoing global health emergency. The emerging virus, SARS-CoV-2, causes coughing, fever, muscle ache, and shortness of breath (dyspnea) in symptomatic patients. The pathogenic particles generated by coughing and sneezing remain suspended in the air in aerosol form or attach to surfaces, facilitating transmission. This review focuses on recent trends in pandemic biology, diagnostic methods, prevention tools, and policies for COVID-19 management. To meet the growing demand for medical supplies during the COVID-19 era, a variety of personal protective equipment (PPE) and ventilators have been developed using do-it-yourself (DIY) manufacturing. COVID-19 diagnosis and the prediction of virus transmission are analyzed by machine learning algorithms, simulations, and digital monitoring. Until a clinically approved vaccine for COVID-19 is discovered, pandemics remain a public concern. Therefore, technological developments, biomedical research, and policy development are needed to decipher the coronavirus mechanism and epidemiological characteristics, prevent transmission, and develop therapeutic drugs.
ABSTRACT
Conventional posters are effective in disseminating progress reports at scientific meetings, but they fail to meet the need for visualizing dynamic biological data, and they become costly with the increasing number of conferences and the need to reprint for emerging research. Here we present digital posters that repurpose digital frames from the art community, and we experiment with multiplexed imaging movies of cells as a demonstration of the digital poster concept, providing an interactive and low-cost tool for next-generation sharing platforms.