Search | VHL Regional Portal

1.

Towards Digital Quantification of Ploidy from Pan-Cancer Digital Pathology Slides using Deep Learning.

Carrillo-Perez, Francisco; Cramer, Eric M; Pizurica, Marija; Andor, Noemi; Gevaert, Olivier.

bioRxiv ; 2024 Aug 20.

Article in English | MEDLINE | ID: mdl-39229200

ABSTRACT

Abnormal DNA ploidy, found in numerous cancers, is increasingly recognized as a contributor to chromosomal instability, genome evolution, and the heterogeneity that fuels cancer cell progression. Furthermore, it has been linked with poor prognosis in cancer patients. While next-generation sequencing can be used to approximate tumor ploidy, it has a high error rate for near-euploid states, is costly, and is time-consuming, motivating alternative rapid quantification methods. We introduce PloiViT, a transformer-based model for tumor ploidy quantification that outperforms traditional machine learning models, enabling rapid and cost-effective quantification directly from pathology slides. We trained PloiViT on a dataset of fifteen cancer types from The Cancer Genome Atlas and validated its performance in multiple independent cohorts. Additionally, we explored the impact of self-supervised feature extraction on performance. PloiViT, using self-supervised features, achieved the lowest prediction error in multiple independent cohorts, exhibiting better generalization capabilities. Our findings demonstrate that PloiViT predicts higher ploidy values in aggressive cancer groups and in patients with specific mutations, validating PloiViT's potential as a complement to next-generation sequencing data for ploidy assessment. To further promote its use, we release our models as a user-friendly inference application and a Python package.

2.

Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects.

Warner, Elisa; Lee, Joonsang; Hsu, William; Syeda-Mahmood, Tanveer; Kahn, Charles E; Gevaert, Olivier; Rao, Arvind.

Int J Comput Vis ; 132(9): 3753-3769, 2024.

Article in English | MEDLINE | ID: mdl-39211895

ABSTRACT

Machine learning (ML) applications in medical artificial intelligence (AI) systems have shifted from traditional and statistical methods to increasing application of deep learning models. This survey navigates the current landscape of multimodal ML, focusing on its profound impact on medical image analysis and clinical decision support systems. Emphasizing challenges and innovations in addressing multimodal representation, fusion, translation, alignment, and co-learning, the paper explores the transformative potential of multimodal models for clinical predictions. It also highlights the need for principled assessments and practical implementation of such models, bringing attention to the dynamics between decision support systems and healthcare providers and personnel. Despite advancements, challenges such as data biases and the scarcity of "big data" in many biomedical domains persist. We conclude with a discussion on principled innovation and collaborative efforts to further the mission of seamless integration of multimodal ML models into biomedical practice.

3.

Extraction of Unstructured Electronic Health Records to Evaluate Glioblastoma Treatment Patterns.

Swaminathan, Akshay; Ren, Alexander L; Wu, Janet Y; Bhargava-Shah, Aarohi; Lopez, Ivan; Srivastava, Ujwal; Alexopoulos, Vassilis; Pizzitola, Rebecca; Bui, Brandon; Alkhani, Layth; Lee, Susan; Mohit, Nathan; Seo, Noel; Macedo, Nicholas; Cheng, Winson; Wang, William; Tran, Edward; Thomas, Reena; Gevaert, Olivier.

JCO Clin Cancer Inform ; 8: e2300091, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38857465

ABSTRACT

PURPOSE: Data on lines of therapy (LOTs) for cancer treatment are important for clinical oncology research, but LOTs are not explicitly recorded in electronic health records (EHRs). We present an efficient approach for clinical data abstraction and a flexible algorithm to derive LOTs from EHR-based medication data on patients with glioblastoma multiforme (GBM). METHODS: Nonclinicians were trained to abstract the diagnosis of GBM from EHRs, and their accuracy was compared with abstraction performed by clinicians. The resulting data were used to build a cohort of patients with a confirmed GBM diagnosis. An algorithm was developed to derive LOTs using structured medication data, accounting for the addition and discontinuation of therapies and for drug class. Descriptive statistics were calculated, and time-to-next-treatment (TTNT) analysis was performed using the Kaplan-Meier method. RESULTS: Treating clinicians as the gold standard, nonclinicians abstracted GBM diagnosis with a sensitivity of 0.98, specificity of 1.00, positive predictive value of 1.00, and negative predictive value of 0.90, suggesting that nonclinician abstraction of GBM diagnosis was comparable with clinician abstraction. Of 693 patients with a confirmed diagnosis of GBM, 246 had structured information about the types of medications received. Of these, 165 (67.1%) received a first-line therapy (1L) of temozolomide, and the median TTNT from the start of 1L was 179 days. CONCLUSION: We described a workflow for extracting GBM diagnosis and LOTs from EHR data that combines nonclinician abstraction with algorithmic processing, demonstrating accuracy comparable with clinician abstraction and highlighting the potential for scalable and efficient EHR-based oncology research.
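The median TTNT reported above comes from a Kaplan-Meier analysis. As a minimal sketch of that estimator (not the authors' pipeline), assume each patient contributes a follow-up time in days from the start of 1L and an event flag that is 1 if a next line of therapy started and 0 if the patient was censored; the function names `kaplan_meier` and `median_time` are hypothetical.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate as (time, survival) steps at each event time.

    times  -- follow-up per patient, e.g. days from the start of 1L
    events -- 1 if the next line of therapy started, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    curve = []
    # The estimate only changes at distinct times where an event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)  # includes patients censored at t
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve


def median_time(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival reaches 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up
```

For example, with hypothetical times `[1, 2, 2, 3, 4]` and events `[1, 1, 0, 1, 1]`, the survival estimate falls to 0.8, 0.6 and then 0.3, so `median_time` returns 3. A patient censored at an event time is kept in the risk set for that time, which is the usual product-limit convention.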

Subject(s)

Algorithms , Electronic Health Records , Glioblastoma , Humans , Glioblastoma/diagnosis , Glioblastoma/drug therapy , Glioblastoma/therapy , Glioblastoma/pathology , Female , Male , Middle Aged , Aged , Brain Neoplasms/drug therapy , Brain Neoplasms/diagnosis , Adult

4.

Tumor metabolic activity is associated with subcutaneous adipose tissue radiodensity and survival in non-small cell lung cancer.

Sun, Yan; Deng, Min; Gevaert, Olivier; Aberle, Merel; Olde Damink, Steven W; van Dijk, David; Rensen, Sander S.

Clin Nutr ; 43(7): 1809-1815, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38870661

ABSTRACT

BACKGROUND: Cachexia-associated body composition alterations and tumor metabolic activity are both associated with survival of cancer patients. Recently, subcutaneous adipose tissue properties have emerged as particularly prognostic body composition features. We hypothesized that tumors with higher metabolic activity instigate cachexia-related peripheral metabolic alterations, and investigated whether tumor metabolic activity is associated with body composition and survival in patients with non-small cell lung cancer (NSCLC), focusing on subcutaneous adipose tissue. METHODS: A retrospective analysis was performed on a cohort of 173 patients with NSCLC. 18F-fluorodeoxyglucose positron emission tomography-computed tomography (PET-CT) scans obtained before treatment were used to analyze tumor metabolic activity (standardized uptake value (SUV) and SUV normalized by lean body mass (SUL)) as well as body composition variables (subcutaneous and visceral adipose tissue radiodensity (SAT/VAT radiodensity) and area; skeletal muscle radiodensity (SM radiodensity) and area). Subjects were divided into groups with high or low SAT radiodensity based on the Youden index of the receiver operating characteristic (ROC) curve. Associations between tumor metabolic activity, body composition variables, and survival were analyzed by Mann-Whitney tests, Cox regression, and Kaplan-Meier analysis. RESULTS: The overall prevalence of high SAT radiodensity was 50.9% (88/173). Patients with high SAT radiodensity had shorter survival than patients with low SAT radiodensity (mean: 45.3 vs. 50.5 months, p = 0.026). High SAT radiodensity was independently associated with shorter overall survival (multivariate Cox regression HR = 1.061, 95% CI: 1.022-1.101, p = 0.002). SAT radiodensity also correlated with tumor metabolic activity (SULpeak rs = 0.421, p = 0.029; SUVpeak rs = 0.370, p = 0.048).
In contrast, the cross-sectional areas of SM, SAT, and VAT were not associated with tumor metabolic activity or survival. CONCLUSION: Higher SAT radiodensity is associated with higher tumor metabolic activity and shorter survival in patients with NSCLC. This may suggest that tumors with higher metabolic activity induce subcutaneous adipose tissue alterations such as decreased lipid density, increased fibrosis, or browning.
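The split into high and low SAT radiodensity groups used the Youden index. A minimal sketch, assuming a binary outcome label per patient (the abstract does not say which outcome defined the ROC curve, so the labels here are hypothetical) and treating higher radiodensity as the positive direction: every observed value is tried as a cutoff, and the one maximizing J = sensitivity + specificity - 1 is kept.

```python
from typing import List, Tuple


def youden_cutoff(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A case is called positive when value >= cutoff; each observed value is
    tried as a candidate cutoff, and ties keep the lowest cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

With hypothetical values `[1, 2, 3, 4]` and labels `[0, 0, 1, 1]`, the cutoff 3 separates the classes perfectly and `youden_cutoff` returns `(3, 1.0)`.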

Subject(s)

Body Composition , Cachexia , Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Positron Emission Tomography Computed Tomography , Subcutaneous Fat , Humans , Carcinoma, Non-Small-Cell Lung/mortality , Carcinoma, Non-Small-Cell Lung/metabolism , Carcinoma, Non-Small-Cell Lung/diagnostic imaging , Male , Female , Retrospective Studies , Subcutaneous Fat/diagnostic imaging , Subcutaneous Fat/metabolism , Lung Neoplasms/mortality , Lung Neoplasms/metabolism , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/pathology , Aged , Positron Emission Tomography Computed Tomography/methods , Middle Aged , Cachexia/metabolism , Cachexia/mortality , Cachexia/diagnostic imaging , Fluorodeoxyglucose F18 , Prognosis

5.

Multimodal data fusion using sparse canonical correlation analysis and cooperative learning: a COVID-19 cohort study.

Er, Ahmet Gorkem; Ding, Daisy Yi; Er, Berrin; Uzun, Mertcan; Cakmak, Mehmet; Sadee, Christoph; Durhan, Gamze; Ozmen, Mustafa Nasuh; Tanriover, Mine Durusu; Topeli, Arzu; Aydin Son, Yesim; Tibshirani, Robert; Unal, Serhat; Gevaert, Olivier.

NPJ Digit Med ; 7(1): 117, 2024 May 07.

Article in English | MEDLINE | ID: mdl-38714751

ABSTRACT

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: intensive care unit (ICU) admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (cor(Xu1, Zv1) = 0.596, p value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the most positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively.
Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
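The correlation cor(Xu1, Zv1) above is the canonical correlation of the first pair of sparse canonical vectors. The abstract does not describe the optimization, so the following is a simplified sketch: a rank-one sparse CCA computed by alternating soft-thresholded power iterations on the cross-product matrix X^T Z, in the spirit of penalized matrix decomposition, with fixed penalties `lam_u` and `lam_v` instead of L1-norm bounds. All function names are hypothetical; this is neither the authors' code nor a library API.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = patients, columns = features of one modality


def standardize(x: Matrix) -> Matrix:
    """Center each column and scale it to unit (population) variance."""
    n, p = len(x), len(x[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in x]
        mean = sum(col) / n
        sd = math.sqrt(sum((c - mean) ** 2 for c in col) / n) or 1.0  # guard constant columns
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd
    return out


def soft_threshold(a: List[float], lam: float) -> List[float]:
    """Shrink each entry toward zero by lam; small entries become exactly 0."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in a]


def unit(a: List[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in a))
    if norm == 0.0:
        raise ValueError("penalty removed every coefficient; lower it")
    return [v / norm for v in a]


def sparse_cca(x: Matrix, z: Matrix, lam_u: float = 0.0, lam_v: float = 0.0,
               n_iter: int = 100) -> Tuple[List[float], List[float], float]:
    """First pair of sparse canonical vectors (u, v) and cor(Xu, Zv)."""
    xs, zs = standardize(x), standardize(z)
    n, p, q = len(xs), len(xs[0]), len(zs[0])
    # Cross-product matrix C = X^T Z (p x q).
    c = [[sum(xs[i][a] * zs[i][b] for i in range(n)) for b in range(q)]
         for a in range(p)]
    v = unit([1.0] * q)
    for _ in range(n_iter):
        # Alternate: u <- S(Cv), then v <- S(C^T u), each renormalized.
        u = unit(soft_threshold([sum(c[a][b] * v[b] for b in range(q)) for a in range(p)], lam_u))
        v = unit(soft_threshold([sum(c[a][b] * u[a] for a in range(p)) for b in range(q)], lam_v))
    xu = [sum(row[a] * u[a] for a in range(p)) for row in xs]
    zv = [sum(row[b] * v[b] for b in range(q)) for row in zs]
    # Both score vectors are centered, so correlation is a normalized dot product.
    corr = sum(s * t for s, t in zip(xu, zv)) / math.sqrt(
        sum(s * s for s in xu) * sum(t * t for t in zv))
    return u, v, corr
```

Larger `lam_u` or `lam_v` values push more coefficients of `u` or `v` to exactly zero, which is what makes the selected radiomics or laboratory features interpretable; if the penalty removes every coefficient, `unit` raises an error rather than returning a zero vector.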

6.

regionalpcs: improved discovery of DNA methylation associations with complex traits.

Eulalio, Tiffany; Sun, Min Woo; Gevaert, Olivier; Greicius, Michael D; Montine, Thomas J; Nachun, Daniel; Montgomery, Stephen B.

bioRxiv ; 2024 May 01.

Article in English | MEDLINE | ID: mdl-38746367

ABSTRACT

We have developed the regional principal components (rPCs) method, a novel approach for summarizing gene-level methylation. rPCs address the challenge of deciphering complex epigenetic mechanisms in diseases like Alzheimer's disease (AD). In contrast to traditional averaging, rPCs leverage principal components analysis to capture complex methylation patterns across gene regions. Our method demonstrated a 54% improvement in sensitivity over averaging in simulations, offering a robust framework for identifying subtle epigenetic variations. Applying rPCs to the AD brain methylation data in ROSMAP, combined with cell type deconvolution, we uncovered 838 differentially methylated genes associated with neuritic plaque burden, significantly outperforming conventional methods. Integrating methylation quantitative trait loci (meQTL) with genome-wide association studies (GWAS) identified 17 genes with potential causal roles in AD, including MS4A4A and PICALM. Our approach is available in the Bioconductor package regionalpcs, opening avenues for research and facilitating a deeper understanding of the epigenetic landscape in complex diseases.
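The contrast between averaging and principal components can be seen on a toy gene region in which two CpGs move in opposite directions: the average is flat across samples, while the first principal component still separates them. This is a minimal sketch assuming a samples-by-CpGs matrix of beta values for a single region; it keeps only PC1 and does not reproduce how the regionalpcs package defines gene regions or decides how many PCs to retain. The function names are hypothetical and are not the package's API.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = CpGs in one gene region


def region_mean(betas: Matrix) -> List[float]:
    """Conventional summary: average methylation across the region's CpGs."""
    return [sum(row) / len(row) for row in betas]


def region_pc1(betas: Matrix, n_iter: int = 200) -> List[float]:
    """Per-sample scores on the first principal component of the region."""
    n, p = len(betas), len(betas[0])
    means = [sum(row[j] for row in betas) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in betas]
    # Sample covariance matrix of the CpGs (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    w = [1.0 + j for j in range(p)]  # arbitrary non-symmetric start vector
    for _ in range(n_iter):  # power iteration converges to the leading eigenvector
        w = [sum(cov[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # region shows no variation
        w = [x / norm for x in w]
    return [sum(r[j] * w[j] for j in range(p)) for r in centered]
```

For example, with hypothetical beta values `[[0.1, 0.4], [0.2, 0.3], [0.3, 0.2], [0.4, 0.1]]`, `region_mean` returns 0.25 for every sample, whereas `region_pc1` returns scores that span roughly 0.42, so an association carried by these CpGs would be lost by averaging but not by PC1.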

7.

AI-Based Denoising of Head Impact Kinematics Measurements With Convolutional Neural Network for Traumatic Brain Injury Prediction.

Zhan, Xianghao; Liu, Yuzhe; Cecchi, Nicholas J; Callan, Ashlyn A; Le Flao, Enora; Gevaert, Olivier; Zeineh, Michael M; Grant, Gerald A; Camarillo, David B.

IEEE Trans Biomed Eng ; 71(9): 2759-2770, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38683703

ABSTRACT

OBJECTIVE: Wearable devices have been developed to measure head impact kinematics, but their measurements are intrinsically noisy because of the imperfect interface with the human body. This study aimed to improve the head impact kinematics measurements obtained from instrumented mouthguards using deep learning to enhance traumatic brain injury (TBI) risk monitoring. METHODS: We developed one-dimensional convolutional neural network (1D-CNN) models to denoise mouthguard kinematics measurements for tri-axial linear acceleration and tri-axial angular velocity from 163 laboratory dummy head impacts. The performance of the denoising models was evaluated on three levels: kinematics, brain injury criteria, and tissue-level strain and strain rate. Additionally, we performed a blind test on an on-field dataset of 118 college football impacts and a test on 413 post-mortem human subject (PMHS) impacts. RESULTS: On the dummy head impacts, the denoised kinematics showed better correlation with reference kinematics, with relative reductions of 36% for pointwise root mean squared error and 56% for peak absolute error. Absolute errors in six brain injury criteria were reduced by a mean of 82%. For maximum principal strain and maximum principal strain rate, the mean error reduction was 35% and 69%, respectively. On the PMHS impacts, similar denoising effects were observed and the peak kinematics after denoising were more accurate (relative error reduction for the 10% noisiest impacts was 75.6%). CONCLUSION: The 1D-CNN denoising models effectively reduced errors in mouthguard-derived kinematics measurements on dummy and PMHS impacts. SIGNIFICANCE: This study provides a novel approach for denoising head kinematics measurements in dummy and PMHS impacts, which can be further validated on more real-human kinematics data before real-world application.
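The error reductions reported above compare raw and denoised traces with the reference kinematics. A minimal sketch of those metrics follows, assuming equally sampled traces of one kinematic channel and defining the peak as the maximum absolute value over the trace; the abstract does not give exact definitions, so this is an assumption, and the 1D-CNN itself is not reproduced. The function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(signal: Sequence[float], reference: Sequence[float]) -> float:
    """Pointwise root-mean-squared error between two equally sampled traces."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(signal, reference)) / len(reference))


def peak_abs_error(signal: Sequence[float], reference: Sequence[float]) -> float:
    """Absolute difference between the peak magnitudes of the two traces (assumed definition)."""
    return abs(max(abs(s) for s in signal) - max(abs(r) for r in reference))


def relative_reduction(error_before: float, error_after: float) -> float:
    """Fraction of the original error removed, e.g. 0.36 for a 36% reduction."""
    return (error_before - error_after) / error_before
```

If a raw trace has an RMSE of 0.5 against the reference and the denoised trace an RMSE of 0.25, `relative_reduction(0.5, 0.25)` gives 0.5, i.e. a 50% reduction.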

Subject(s)

Brain Injuries, Traumatic , Head , Neural Networks, Computer , Humans , Biomechanical Phenomena/physiology , Brain Injuries, Traumatic/physiopathology , Male , Mouth Protectors , Football/injuries , Wearable Electronic Devices , Deep Learning , Adult

8.

Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models.

Carrillo-Perez, Francisco; Pizurica, Marija; Zheng, Yuanning; Nandi, Tarak Nath; Madduri, Ravi; Shen, Jeanne; Gevaert, Olivier.

Nat Biomed Eng ; 2024 Mar 21.

Article in English | MEDLINE | ID: mdl-38514775

ABSTRACT

Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.
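For orientation, the standard denoising diffusion (DDPM) formulation is shown below, with a conditioning vector c standing in for the latent RNA-seq representation. This is the textbook form, not necessarily the paper's parameterization: the abstract does not specify the noise schedule (the β values), the loss, or how the conditioning is injected. In a cascade, a first model generates a low-resolution tile and later models apply the same kind of objective at higher resolution, additionally conditioned on the upsampled output of the previous stage.

```latex
q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\mathbf{x}_t;\ \sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\,\mathbf{I}\big),
\qquad \bar\alpha_t = \prod_{s=1}^{t} (1-\beta_s)

\mathcal{L} = \mathbb{E}_{t,\,\mathbf{x}_0,\,\boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0},\mathbf{I})}
\Big\| \boldsymbol\epsilon - \boldsymbol\epsilon_\theta\big(\sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon,\ t,\ \mathbf{c}\big) \Big\|^2
```

At sampling time the trained noise predictor is applied in reverse, starting from pure noise, so changing c (for example, by altering gene expression) changes the generated tile.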

9.

Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI.

Abbasian, Mahyar; Khatibi, Elahe; Azimi, Iman; Oniani, David; Shakeri Hossein Abad, Zahra; Thieme, Alexander; Sriram, Ram; Yang, Zhongqi; Wang, Yanshan; Lin, Bryant; Gevaert, Olivier; Li, Li-Jia; Jain, Ramesh; Rahmani, Amir M.

NPJ Digit Med ; 7(1): 82, 2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38553625

ABSTRACT

Generative Artificial Intelligence is set to revolutionize healthcare delivery by transforming traditional patient care into a more personalized, efficient, and proactive process. Chatbots, serving as interactive conversational models, will probably drive this patient-centered transformation in healthcare. Through the provision of various services, including diagnosis, personalized lifestyle recommendations, dynamic scheduling of follow-ups, and mental health support, the objective is to substantially augment patient health outcomes, all the while mitigating the workload burden on healthcare providers. The life-critical nature of healthcare applications necessitates establishing a unified and comprehensive set of evaluation metrics for conversational models. Existing evaluation metrics proposed for various generic large language models (LLMs) demonstrate a lack of comprehension regarding medical and health concepts and their significance in promoting patients' well-being. Moreover, these metrics neglect pivotal user-centered aspects, including trust-building, ethics, personalization, empathy, user comprehension, and emotional support. The purpose of this paper is to explore state-of-the-art LLM-based evaluation metrics that are specifically applicable to the assessment of interactive conversational models in healthcare. Subsequently, we present a comprehensive set of evaluation metrics designed to thoroughly assess the performance of healthcare chatbots from an end-user perspective. These metrics encompass an evaluation of language processing abilities, impact on real-world clinical tasks, and effectiveness in user-interactive conversations. Finally, we engage in a discussion concerning the challenges associated with defining and implementing these metrics, with particular emphasis on confounding factors such as the target audience, evaluation methods, and prompt techniques involved in the evaluation process.

10.

A 3D lung lesion variational autoencoder.

Li, Yiheng; Sadée, Christoph Y; Carrillo-Perez, Francisco; Selby, Heather M; Thieme, Alexander H; Gevaert, Olivier.

Cell Rep Methods ; 4(2): 100695, 2024 Feb 26.

Article in English | MEDLINE | ID: mdl-38278157

ABSTRACT

In this study, we develop a 3D beta variational autoencoder (beta-VAE) to advance lung cancer imaging analysis, countering the constraints of conventional radiomics methods. The autoencoder extracts information from public lung computed tomography (CT) datasets without additional labels. It reconstructs 3D lung nodule images with high quality (structural similarity: 0.774, peak signal-to-noise ratio: 26.1, and mean-squared error: 0.0008). The model effectively encodes lesion sizes in its latent embeddings, with a significant correlation with lesion size found after applying uniform manifold approximation and projection (UMAP) for dimensionality reduction. Additionally, the beta-VAE can synthesize new lesions of varying sizes by manipulating the latent features. The model can predict multiple clinical endpoints, including pathological N stage or KRAS mutation status, on the Stanford radiogenomics lung cancer dataset. Comparisons with other methods show that the beta-VAE performs equally well in these tasks, suggesting its potential as a pretrained model for predicting patient outcomes in medical imaging.
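For reference, the beta-VAE objective (to be maximized) weights the KL term of the standard VAE evidence lower bound by β. With a diagonal Gaussian encoder q(z|x) = N(μ, diag σ²) and a standard normal prior p(z) = N(0, I), the KL term has the closed form on the second line, summed over the d latent dimensions. The value of β, the latent dimension d, and the reconstruction likelihood used in the paper are not stated in the abstract.

```latex
\mathcal{L}(\theta,\phi;\mathbf{x}) =
  \mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{x})}\big[\log p_\theta(\mathbf{x}\mid\mathbf{z})\big]
  - \beta\, D_{\mathrm{KL}}\big(q_\phi(\mathbf{z}\mid\mathbf{x}) \,\|\, p(\mathbf{z})\big)

D_{\mathrm{KL}}\big(q_\phi(\mathbf{z}\mid\mathbf{x}) \,\|\, p(\mathbf{z})\big)
  = -\tfrac{1}{2}\sum_{j=1}^{d}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```

Setting β = 1 recovers the ordinary VAE; β > 1 trades some reconstruction quality for a more strongly regularized latent space, a choice commonly used to encourage more disentangled latent factors.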

Subject(s)

Image Processing, Computer-Assisted , Lung Neoplasms , Humans , Lung Neoplasms/diagnostic imaging , Mutation , Projection , Radiomics

11.

Brain Deformation Estimation With Transfer Learning for Head Impact Datasets Across Impact Types.

Zhan, Xianghao; Liu, Yuzhe; Cecchi, Nicholas J; Gevaert, Olivier; Zeineh, Michael M; Grant, Gerald A; Camarillo, David B.

IEEE Trans Biomed Eng ; 71(6): 1853-1863, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38224520

ABSTRACT

OBJECTIVE: The machine-learning head model (MLHM) was developed to accelerate the calculation of brain strain and strain rate, which are predictors of traumatic brain injury (TBI). However, the model's accuracy was found to decrease sharply when the training and test datasets came from different head impact types (e.g., car crashes and college football), which limits the applicability of MLHMs to different types of head impacts and sports. In particular, target datasets for specific impact types, with only tens of impacts, may be too small to train an accurate impact-type-specific MLHM. METHODS: To overcome this, we propose data fusion and transfer learning to develop a series of MLHMs to predict the maximum principal strain (MPS) and maximum principal strain rate (MPSR). RESULTS: The strategies were tested on American football (n = 338), mixed martial arts (n = 457), reconstructed car crash (n = 48), and reconstructed American football (n = 36) impacts, and we found that the MLHMs developed with transfer learning are significantly more accurate in estimating MPS and MPSR than other models, with a mean absolute error (MAE) smaller than 0.03 in predicting MPS and smaller than [Formula: see text] in predicting MPSR on all target impact datasets. High performance in concussion detection was observed based on the MPS and MPSR estimated by the transfer-learning-based models. CONCLUSION: The MLHMs can be applied to various head impact types for rapidly and accurately calculating brain strain and strain rate. SIGNIFICANCE: This study enables the development of MLHMs for head impact types with limited data availability and will accelerate the application of MLHMs.

Subject(s)

Brain , Machine Learning , Humans , Brain/diagnostic imaging , Brain/physiopathology , Football/injuries , Brain Injuries, Traumatic/physiopathology , Head/physiology , Accidents, Traffic , Biomechanical Phenomena/physiology , Models, Biological

12.

Digital profiling of cancer transcriptomes from histology images with grouped vision attention.

Zheng, Yuanning; Pizurica, Marija; Carrillo-Perez, Francisco; Noor, Humaira; Yao, Wei; Wohlfart, Christian; Marchal, Kathleen; Vladimirova, Antoaneta; Gevaert, Olivier.

bioRxiv ; 2024 Jan 19.

Article in English | MEDLINE | ID: mdl-37808782

ABSTRACT

Cancer is a heterogeneous disease that demands precise molecular profiling for better understanding and management. Recently, deep learning has demonstrated potential for cost-efficient prediction of molecular alterations from histology images. While transformer-based deep learning architectures have enabled significant progress in non-medical domains, their application to histology images remains limited due to small dataset sizes coupled with the explosion of trainable parameters. Here, we develop SEQUOIA, a transformer model to predict cancer transcriptomes from whole-slide histology images. To realize the full potential of transformers, we first pre-train the model using data from 1,802 normal tissues. Then, we fine-tune and evaluate the model in 4,331 tumor samples across nine cancer types. The prediction performance is assessed at the individual-gene and pathway levels through Pearson correlation analysis and root mean square error. The generalization capacity is validated across two independent cohorts comprising 1,305 tumors. In predicting the expression levels of 25,749 genes, the highest performance is observed in cancers from breast, kidney and lung, where SEQUOIA accurately predicts the expression of 11,069, 10,086 and 8,759 genes, respectively. The accurately predicted genes are associated with the regulation of inflammatory response, the cell cycle and metabolism. While the model is trained at the tissue level, we showcase its potential in predicting spatial gene expression patterns using spatial transcriptomics datasets. Leveraging the prediction performance, we develop a digital gene expression signature that predicts the risk of recurrence in breast cancer. SEQUOIA deciphers clinically relevant gene expression patterns from histology images, opening avenues for improved cancer management and personalized therapies.
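Gene-level performance above is measured with Pearson correlation and RMSE between predicted and measured expression across test samples. A minimal sketch of that evaluation follows; the dictionary layout is an assumption, and the threshold `min_r` used to count a gene as accurately predicted is hypothetical, since the abstract does not state the criterion behind the 11,069, 10,086 and 8,759 counts.

```python
import math
from typing import Dict, List

Scores = Dict[str, Dict[str, float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; NaN when either vector has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def per_gene_scores(pred: Dict[str, List[float]], obs: Dict[str, List[float]]) -> Scores:
    """Pearson r and RMSE per gene, computed across the samples of a test set."""
    scores = {}
    for gene, y_hat in pred.items():
        y = obs[gene]
        rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_hat, y)) / len(y))
        scores[gene] = {"r": pearson(y_hat, y), "rmse": rmse}
    return scores


def count_well_predicted(scores: Scores, min_r: float = 0.5) -> int:
    """Number of genes whose correlation reaches a (hypothetical) threshold."""
    return sum(1 for s in scores.values() if s["r"] >= min_r)
```

With hypothetical predictions `[1, 2, 3]` for two genes whose measured values are `[2, 4, 6]` and `[3, 2, 1]`, the correlations are 1.0 and -1.0, so one gene passes the default threshold. Correlation ignores scale and offset, which is why RMSE is reported alongside it.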

13.

GeNNius: an ultrafast drug-target interaction inference method based on graph neural networks.

Veleiro, Uxía; de la Fuente, Jesús; Serrano, Guillermo; Pizurica, Marija; Casals, Mikel; Pineda-Lucena, Antonio; Vicent, Silve; Ochoa, Idoia; Gevaert, Olivier; Hernaez, Mikel.

Bioinformatics ; 40(1), 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38134424

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction is a relevant but challenging task in the drug repurposing field. In-silico approaches have drawn particular attention as they can reduce the costs and time commitment associated with traditional methodologies. Yet, current state-of-the-art methods present several limitations: existing DTI prediction approaches are computationally expensive, which hinders the use of large networks and the exploitation of available datasets; and the generalization of DTI prediction methods to unseen datasets remains unexplored, although studying it could improve the accuracy and robustness of DTI inference approaches. RESULTS: In this work, we introduce GeNNius (Graph Embedding Neural Network Interaction Uncovering System), a Graph Neural Network (GNN)-based method that outperforms state-of-the-art models in terms of both accuracy and time efficiency across a variety of datasets. We also demonstrated its ability to uncover new interactions by evaluating previously unknown DTIs for each dataset. We further assessed the generalization capability of GeNNius by training and testing it on different datasets, showing that this framework can potentially improve the DTI prediction task by training on large datasets and testing on smaller ones. Finally, we qualitatively investigated the embeddings generated by GeNNius, revealing that the GNN encoder maintains biological information after the graph convolutions while diffusing this information through nodes, eventually distinguishing protein families in the node embedding space. AVAILABILITY AND IMPLEMENTATION: GeNNius code is available at https://github.com/ubioinformat/GeNNius.
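GeNNius frames DTI prediction as predicting edges in a drug-protein graph from node embeddings produced by graph convolutions. The abstract does not specify the layer type, so the following is a generic sketch rather than the GeNNius architecture (its actual code is in the repository linked above): one mean-aggregation message-passing layer in the style of GraphSAGE, followed by a dot-product edge decoder. The weight matrices, feature vectors and function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # rows = output dimensions, columns = input dimensions


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def mean_aggregation_layer(h: Dict[str, Vector], neighbors: Dict[str, List[str]],
                           w_self: Matrix, w_neigh: Matrix) -> Dict[str, Vector]:
    """One message-passing step: h_v' = ReLU(W_self h_v + W_neigh mean_{u in N(v)} h_u)."""
    out = {}
    for node, feat in h.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            agg = [sum(h[u][k] for u in nbrs) / len(nbrs) for k in range(len(feat))]
        else:
            agg = [0.0] * len(feat)  # isolated node: no incoming messages
        z = [a + b for a, b in zip(matvec(w_self, feat), matvec(w_neigh, agg))]
        out[node] = [max(0.0, v) for v in z]  # ReLU
    return out


def edge_score(h: Dict[str, Vector], drug: str, protein: str) -> float:
    """Probability-like score for a drug-target edge: sigmoid of a dot product."""
    logit = sum(a * b for a, b in zip(h[drug], h[protein]))
    return 1.0 / (1.0 + math.exp(-logit))
```

Stacking several such layers lets information from nodes beyond the immediate neighbors reach each embedding, which is the kind of diffusion through nodes that the abstract describes.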

Subject(s)

Drug Delivery Systems , Drug Repositioning , Drug Interactions , Diffusion , Neural Networks, Computer

14.

Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19.

Er, Ahmet Gorkem; Ding, Daisy Yi; Er, Berrin; Uzun, Mertcan; Cakmak, Mehmet; Sadee, Christoph; Durhan, Gamze; Ozmen, Mustafa Nasuh; Tanriover, Mine Durusu; Topeli, Arzu; Son, Yesim Aydin; Tibshirani, Robert; Unal, Serhat; Gevaert, Olivier.

Res Sq ; 2023 Nov 20.

Article in English | MEDLINE | ID: mdl-38045288

ABSTRACT

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the most positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively.
Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.

15.

Loss of p53-DREAM-mediated repression of cell cycle genes as a driver of lymph node metastasis in head and neck cancer.

Brennan, Kevin; Espín-Pérez, Almudena; Chang, Serena; Bedi, Nikita; Saumyaa, Saumyaa; Shin, June Ho; Plevritis, Sylvia K; Gevaert, Olivier; Sunwoo, John B; Gentles, Andrew J.

Genome Med ; 15(1): 98, 2023 Nov 17.

Article in English | MEDLINE | ID: mdl-37978395

ABSTRACT

BACKGROUND: The prognosis for patients with head and neck cancer (HNC) is poor and has improved little in recent decades, partly owing to a lack of therapeutic options. To identify effective therapeutic targets, we sought to identify the molecular pathways that drive metastasis and HNC progression through large-scale systematic analyses of transcriptomic data. METHODS: We performed a meta-analysis across 29 gene expression studies, including 2074 primary HNC biopsies, to identify genes and transcriptional pathways associated with survival and lymph node metastasis (LNM). To understand the biological roles of these genes in HNC, we identified their associated cancer pathways, as well as the cell types that express them within HNC tumor microenvironments, by integrating single-cell RNA-seq and bulk RNA-seq from sorted cell populations. RESULTS: Patient survival-associated genes were heterogeneous and included drivers of diverse tumor biological processes: tumor-intrinsic processes such as epithelial dedifferentiation and epithelial-to-mesenchymal transition (EMT), as well as tumor microenvironmental factors such as T cell-mediated immunity and cancer-associated fibroblast activity. Unexpectedly, LNM-associated genes were almost universally associated with epithelial dedifferentiation within malignant cells. Genes negatively associated with LNM consisted of regulators of squamous epithelial differentiation that are expressed within well-differentiated malignant cells, while those positively associated with LNM represented cell cycle regulators that are normally repressed by the p53-DREAM pathway. These pro-LNM genes are overexpressed in proliferating malignant cells of TP53-mutated and HPV-positive HNCs and are strongly associated with stemness, suggesting that they represent markers of pre-metastatic cancer stem-like cells. LNM-associated genes are deregulated in high-grade oral precancerous lesions, further deregulated in primary HNCs with advancing tumor grade, and further deregulated still in lymph node metastases. CONCLUSIONS: In HNC, patient survival is affected by multiple biological processes and is strongly influenced by the tumor immune and stromal microenvironments. In contrast, LNM appears to be driven primarily by malignant cell plasticity, characterized by epithelial dedifferentiation coupled with EMT-independent proliferation and stemness. Our findings lead us to postulate that LNM is initially caused by loss of p53-DREAM-mediated repression of cell cycle genes during early tumorigenesis.
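The abstract does not say how per-gene results were combined across the 29 cohorts. One common choice for this kind of gene-level meta-analysis is a sample-size-weighted Stouffer's Z, sketched below under that assumption. The per-study z-scores (for example, a Cox regression coefficient divided by its standard error), the sqrt(n) weights, and the function names are hypothetical, not the authors' documented method.

```python
import math
from statistics import NormalDist


def weighted_stouffer_z(z_scores, sample_sizes):
    """Combine per-study z-scores for one gene into a single meta z-score.

    Assumption: each study is weighted by sqrt(n); the paper's actual
    weighting scheme is not stated in the abstract.
    """
    if not z_scores or len(z_scores) != len(sample_sizes):
        raise ValueError("need one sample size per z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical example: one gene's survival z-scores in three cohorts of equal size.
meta_z = weighted_stouffer_z([2.0, 1.5, -0.5], [100, 100, 100])
meta_p = two_sided_p(meta_z)
```

With equal cohort sizes the weights cancel, so the result reduces to the unweighted Stouffer's Z, (2.0 + 1.5 - 0.5) / sqrt(3), about 1.73, with a two-sided p of roughly 0.08.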

Subject(s)

cdc Genes, Head and Neck Neoplasms, Humans, Epithelial-Mesenchymal Transition/genetics, Head and Neck Neoplasms/genetics, Lymphatic Metastasis, Tumor Microenvironment/genetics, Tumor Suppressor Protein p53/genetics

16.

Natural language processing system for rapid detection and intervention of mental health crisis chat messages.

Swaminathan, Akshay; López, Iván; Mar, Rafael Antonio Garcia; Heist, Tyler; McClintock, Tom; Caoili, Kaitlin; Grace, Madeline; Rubashkin, Matthew; Boggs, Michael N; Chen, Jonathan H; Gevaert, Olivier; Mou, David; Nock, Matthew K.

NPJ Digit Med ; 6(1): 213, 2023 Nov 21.

Article in English | MEDLINE | ID: mdl-37990134

ABSTRACT

Patients experiencing mental health crises often seek help through messaging-based platforms but may face long wait times due to limited message triage capacity. Here, we build and deploy a machine-learning-enabled system to improve response times to crisis messages in a large, national telehealth provider network. We train a two-stage natural language processing (NLP) system (keyword filtering followed by logistic regression) on 721 electronic medical record chat messages, of which 32% are potential crises (suicidal/homicidal ideation, domestic violence, or non-suicidal self-injury). Model performance is evaluated on a retrospective test set (4/1/21-4/1/22, N = 481) and a prospective test set (10/1/22-10/31/22, N = 102,471). In the retrospective test set, the model has an AUC of 0.82 (95% CI: 0.78-0.86), sensitivity of 0.99 (95% CI: 0.96-1.00), and PPV of 0.35 (95% CI: 0.309-0.4). In the prospective test set, the model has an AUC of 0.98 (95% CI: 0.966-0.984), sensitivity of 0.98 (95% CI: 0.96-0.99), and PPV of 0.66 (95% CI: 0.626-0.692). The daily median time from message receipt to crisis specialist triage ranges from 8 to 13 min, compared with 9 h before the system was deployed. We demonstrate that an NLP-based machine learning model can reliably identify potential crisis chat messages in a telehealth setting. Our system integrates into existing clinical workflows, suggesting that with appropriate training, humans can successfully leverage ML systems to facilitate triage of crisis messages.
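The two-stage structure described above (a keyword filter, then logistic regression on the messages that pass it) can be sketched with the standard library alone. This is a toy illustration under stated assumptions: the keyword list, bag-of-words features, training messages, and all function names are hypothetical, and the deployed system's actual terms, features, and training data are not given in the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical stage-1 keyword list; the deployed system's terms are not published here.
CRISIS_KEYWORDS = {"suicide", "die", "kill", "hurt", "unsafe"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def passes_keyword_filter(text):
    """Stage 1: only messages containing at least one keyword reach the classifier."""
    return any(token in CRISIS_KEYWORDS for token in tokenize(text))


def featurize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocab]


def sigmoid(x):
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Stage 2: batch gradient descent on the logistic (cross-entropy) loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def is_potential_crisis(text, vocab, w, b, threshold=0.5):
    """Run both stages; messages rejected by the filter never reach the model."""
    if not passes_keyword_filter(text):
        return False
    x = featurize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


# Toy training set of keyword-positive messages (1 = potential crisis).
train = [
    ("i keep thinking about suicide", 1),
    ("he threatened to hurt me and i feel unsafe", 1),
    ("i want to die", 1),
    ("my knee still hurt after the visit", 0),
    ("this copay will kill my budget", 0),
]
vocab = sorted({tok for text, _ in train for tok in tokenize(text)})
X = [featurize(text, vocab) for text, _ in train]
y = [label for _, label in train]
w, b = train_logistic_regression(X, y)
```

The cheap keyword stage keeps the classifier focused on a smaller, harder pool of messages, which is one plausible reason to pair a high-recall filter with a learned model; the sketch does not claim this was the authors' rationale.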

17.

Selective prediction for extracting unstructured clinical data.

Swaminathan, Akshay; Lopez, Ivan; Wang, William; Srivastava, Ujwal; Tran, Edward; Bhargava-Shah, Aarohi; Wu, Janet Y; Ren, Alexander L; Caoili, Kaitlin; Bui, Brandon; Alkhani, Layth; Lee, Susan; Mohit, Nathan; Seo, Noel; Macedo, Nicholas; Cheng, Winson; Liu, Charles; Thomas, Reena; Chen, Jonathan H; Gevaert, Olivier.

J Am Med Inform Assoc ; 31(1): 188-197, 2023 Dec 22.

Article in English | MEDLINE | ID: mdl-37769323

ABSTRACT

OBJECTIVE: Current approaches to handling unstructured clinical data, such as manual abstraction and structured proxy variables, may be time-consuming, unscalable, and imprecise. This article aims to determine whether selective prediction, which gives a model the option to abstain from generating a prediction, can improve the accuracy and efficiency of unstructured clinical data abstraction. MATERIALS AND METHODS: We trained selective classifiers (logistic regression, random forest, and support vector machine) to extract 5 variables from clinical notes: depression (n = 1563), glioblastoma (GBM, n = 659), rectal adenocarcinoma (DRA, n = 601), abdominoperineal resection (APR, n = 601) of adenocarcinoma, and low anterior resection (LAR, n = 601) of adenocarcinoma. We varied the costs of false positives (FP), false negatives (FN), and abstained notes, and measured the total misclassification cost. RESULTS: The depression selective classifiers abstained on anywhere from 0% to 97% of notes, and the change in total misclassification cost ranged from -58% to 9%. Selective classifiers abstained on 5%-43% of notes across the GBM and colorectal cancer models. The GBM selective classifier abstained on 43% of notes, which led to improvements in sensitivity (0.94 to 0.96), specificity (0.79 to 0.96), PPV (0.89 to 0.98), and NPV (0.88 to 0.91) compared with both a non-selective classifier and structured proxy variables. DISCUSSION: We showed that selective classifiers outperformed both non-selective classifiers and structured proxy variables for extracting data from unstructured clinical notes. CONCLUSION: Selective prediction should be considered when abstaining is preferable to making an incorrect prediction.
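To make the cost trade-off concrete, the sketch below assumes the simplest abstention rule: a predicted probability is accepted only outside an uncertainty band, and each false positive, false negative, and abstained note carries its own cost. The band limits, cost values, toy data, and function names are hypothetical; the abstract does not specify how the published classifiers decided when to abstain.

```python
def selective_predict(prob, lower=0.3, upper=0.7):
    """Return 1 or 0 for confident predictions, or None to abstain."""
    if prob >= upper:
        return 1
    if prob <= lower:
        return 0
    return None


def total_misclassification_cost(probs, labels, cost_fp=1.0, cost_fn=1.0,
                                 cost_abstain=0.5, lower=0.3, upper=0.7):
    """Total cost = cost_fp * FP + cost_fn * FN + cost_abstain * abstentions."""
    fp = fn = abstained = 0
    for prob, label in zip(probs, labels):
        pred = selective_predict(prob, lower, upper)
        if pred is None:
            abstained += 1  # the note would be routed to manual abstraction
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    return cost_fp * fp + cost_fn * fn + cost_abstain * abstained


# Hypothetical predicted probabilities and true labels for six notes.
probs = [0.9, 0.8, 0.5, 0.2, 0.1, 0.6]
labels = [1, 0, 1, 0, 1, 0]

selective = total_misclassification_cost(probs, labels, cost_fp=2.0, cost_fn=2.0)
# Setting lower == upper == 0.5 removes the band, giving a non-selective classifier.
non_selective = total_misclassification_cost(probs, labels, cost_fp=2.0, cost_fn=2.0,
                                             lower=0.5, upper=0.5)
```

In this toy case, abstaining on the two borderline notes lowers the total cost from 6.0 to 5.0; raising `cost_abstain` above 1.0 would reverse that ordering, which is the kind of trade-off explored by varying the costs.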

Subject(s)

Adenocarcinoma, Support Vector Machine, Humans, Logistic Models

18.

Synthetic whole-slide image tile generation with gene expression profile-infused deep generative models.

Carrillo-Perez, Francisco; Pizurica, Marija; Ozawa, Michael G; Vogel, Hannes; West, Robert B; Kong, Christina S; Herrera, Luis Javier; Shen, Jeanne; Gevaert, Olivier.

Cell Rep Methods ; 3(8): 100534, 2023 Aug 28.

Article in English | MEDLINE | ID: mdl-37671024

ABSTRACT

In this work, we propose an approach to generate whole-slide image (WSI) tiles using deep generative models infused with matched gene expression profiles. First, we train a variational autoencoder (VAE) that learns a latent, lower-dimensional representation of multi-tissue gene expression profiles. Then, we use this representation to infuse generative adversarial networks (GANs) that generate lung and brain cortex tissue tiles, resulting in a new model that we call RNA-GAN. Expert pathologists preferred tiles generated by RNA-GAN over tiles generated by traditional GANs; in addition, RNA-GAN needs fewer training epochs to generate high-quality tiles. Finally, RNA-GAN was able to generalize to gene expression profiles outside the training set, showing imputation capabilities. A web-based quiz in which users try to distinguish real from synthetic tiles is available at https://rna-gan.stanford.edu/, and the code for RNA-GAN is available at https://github.com/gevaertlab/RNA-GAN.
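The pipeline above has two steps: a VAE compresses each expression profile into a latent code, and that code is fed to the GAN. The sketch below shows the two standard pieces this implies, the VAE reparameterization step and a generator input that combines random noise with the latent code. Conditioning by concatenation is an assumption, and the encoder outputs, dimensions, and function names are hypothetical; the RNA-GAN repository linked above holds the actual implementation.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """VAE reparameterization trick: z = mu + exp(0.5 * log_var) * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def conditioned_generator_input(expression_latent, noise_dim, rng):
    """Concatenate random GAN noise with the gene-expression latent code.

    Assumption: conditioning by concatenation; a generator network (not shown)
    would map this vector to an image tile.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + list(expression_latent)


rng = random.Random(0)
# Hypothetical encoder outputs for one sample's expression profile (latent size 3).
mu = [0.2, -1.0, 0.5]
log_var = [-2.0, -2.0, -2.0]
z = reparameterize(mu, log_var, rng)
g_input = conditioned_generator_input(z, noise_dim=4, rng=rng)
```

Sampling z through the reparameterization keeps the VAE differentiable during its own training; at generation time, swapping in the latent code of a different expression profile is what would let a conditioned generator produce tiles for unseen profiles.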

Subject(s)

Brain, Transcriptome, Cerebral Cortex, Learning, RNA

19.

Multimodal data fusion for cancer biomarker discovery with deep learning.

Steyaert, Sandra; Pizurica, Marija; Nagaraj, Divya; Khandelwal, Priya; Hernandez-Boussard, Tina; Gentles, Andrew J; Gevaert, Olivier.

Nat Mach Intell ; 5(4): 351-362, 2023 Apr.

Article in English | MEDLINE | ID: mdl-37693852

ABSTRACT

Technological advances now make it possible to study a patient from multiple angles with high-dimensional, high-throughput multi-scale biomedical data. In oncology, massive amounts of data are being generated, ranging from molecular, histopathology, and radiology data to clinical records. The introduction of deep learning has significantly advanced the analysis of biomedical data. However, most approaches focus on single data modalities, leading to slow progress in methods that integrate complementary data types. Developing effective multimodal fusion approaches is becoming increasingly important, as a single modality might not be consistent and sufficient to capture the heterogeneity of complex diseases, tailor medical care, and improve personalised medicine. Many initiatives now focus on integrating these disparate modalities to unravel the biological processes involved in multifactorial diseases such as cancer. However, many obstacles remain, including the lack of usable data and of methods for clinical validation and interpretation. Here, we cover these current challenges and reflect on opportunities through deep learning to tackle data sparsity and scarcity, multimodal interpretability, and standardisation of datasets.

20.

Performance of alternative manual and automated deep learning segmentation techniques for the prediction of benign and malignant lung nodules.

Selby, Heather M; Mukherjee, Pritam; Parham, Christopher; Malik, Sachin B; Gevaert, Olivier; Napel, Sandy; Shah, Rajesh P.

J Med Imaging (Bellingham) ; 10(4): 044006, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37564098

ABSTRACT

Purpose: We aim to evaluate the performance of radiomic biopsy (RB), best-fit bounding box (BB), and a deep-learning-based segmentation method called no-new-U-Net (nnU-Net), compared with the standard full manual (FM) segmentation method, for predicting benign and malignant lung nodules using a computed tomography (CT) radiomic machine learning model. Materials and Methods: A total of 188 CT scans of lung nodules from 2 institutions were used in our study. One radiologist identified and delineated all 188 lung nodules, whereas a second radiologist segmented a subset (n=20) of these nodules. Both radiologists employed the FM and RB segmentation methods. BB segmentations were generated computationally from the FM segmentations. The nnU-Net, a deep-learning-based segmentation method, performed automatic nodule detection and segmentation. The time radiologists took to perform segmentations was recorded. Radiomic features were extracted from the segmentations produced by each method, and models to predict benign and malignant lung nodules were developed. The Kruskal-Wallis and DeLong tests were used to compare segmentation times and areas under the curve (AUC), respectively. Results: For the delineation of the FM, RB, and BB segmentations, the two radiologists required a median time (IQR) of 113 (54 to 251.5), 21 (9.25 to 38), and 16 (12 to 64.25) s, respectively (p=0.04). In dataset 1, the mean AUCs (95% CI) of the FM, RB, BB, and nnU-Net models were 0.964 (0.96 to 0.968), 0.985 (0.983 to 0.987), 0.961 (0.956 to 0.965), and 0.878 (0.869 to 0.888), respectively. In dataset 2, the mean AUCs (95% CI) of the FM, RB, BB, and nnU-Net models were 0.717 (0.705 to 0.729), 0.919 (0.913 to 0.924), 0.699 (0.687 to 0.711), and 0.644 (0.632 to 0.657), respectively. Conclusion: Radiomic biopsy-based models outperformed FM and BB models in predicting benign and malignant lung nodules in two independent datasets, while deep-learning segmentation-based models performed similarly to FM and BB models. RB could be a more efficient segmentation method, but further validation is needed.
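The abstract states that BB segmentations were generated computationally from the FM segmentations but does not give the rule. A minimal sketch, assuming "best-fit" means the tightest axis-aligned box around every voxel of the manual mask, is shown below; the function name and toy mask are hypothetical, and the study may instead have fitted boxes per slice or added padding.

```python
def bounding_box_segmentation(mask):
    """Derive an axis-aligned bounding-box mask from a 3D binary mask.

    mask is indexed as mask[z][y][x]; voxels inside the tightest box that
    encloses every nonzero voxel are set to 1, all others to 0.
    """
    coords = [
        (z, y, x)
        for z, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        raise ValueError("segmentation mask is empty")
    zs, ys, xs = zip(*coords)
    lo = (min(zs), min(ys), min(xs))
    hi = (max(zs), max(ys), max(xs))
    box = [
        [
            [
                1 if lo[0] <= z <= hi[0] and lo[1] <= y <= hi[1] and lo[2] <= x <= hi[2] else 0
                for x in range(len(row))
            ]
            for y, row in enumerate(plane)
        ]
        for z, plane in enumerate(mask)
    ]
    return lo, hi, box


# Hypothetical single-slice "manual" mask with two foreground voxels.
fm_mask = [[[0, 1, 0],
            [0, 0, 0],
            [0, 0, 1]]]
lo, hi, bb_mask = bounding_box_segmentation(fm_mask)
```

Because the box always contains the full manual mask plus surrounding background, radiomic features computed from it will include some non-nodule tissue, which is one reason BB and FM results can differ.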
