1.
JCO Clin Cancer Inform ; 8: e2300091, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38857465

ABSTRACT

PURPOSE: Data on lines of therapy (LOTs) for cancer treatment are important for clinical oncology research, but LOTs are not explicitly recorded in electronic health records (EHRs). We present an efficient approach for clinical data abstraction and a flexible algorithm to derive LOTs from EHR-based medication data on patients with glioblastoma multiforme (GBM). METHODS: Nonclinicians were trained to abstract the diagnosis of GBM from EHRs, and their accuracy was compared with abstraction performed by clinicians. The resulting data were used to build a cohort of patients with confirmed GBM diagnosis. An algorithm was developed to derive LOTs using structured medication data, accounting for the addition and discontinuation of therapies and drug class. Descriptive statistics were calculated and time-to-next-treatment (TTNT) analysis was performed using the Kaplan-Meier method. RESULTS: Treating clinicians as the gold standard, nonclinicians abstracted GBM diagnosis with a sensitivity of 0.98, specificity 1.00, positive predictive value 1.00, and negative predictive value 0.90, suggesting that nonclinician abstraction of GBM diagnosis was comparable with clinician abstraction. Of 693 patients with a confirmed diagnosis of GBM, 246 had structured information about the types of medications received. Of these, 165 (67.1%) received a first-line therapy (1L) of temozolomide, and the median TTNT from the start of 1L was 179 days. CONCLUSION: We described a workflow for extracting diagnosis of GBM and LOT from EHR data that combines nonclinician abstraction with algorithmic processing, demonstrating comparable accuracy with clinician abstraction and highlighting the potential for scalable and efficient EHR-based oncology research.


Subject(s)
Algorithms , Electronic Health Records , Glioblastoma , Humans , Glioblastoma/diagnosis , Glioblastoma/drug therapy , Glioblastoma/therapy , Glioblastoma/pathology , Female , Male , Middle Aged , Aged , Brain Neoplasms/drug therapy , Brain Neoplasms/diagnosis , Adult
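
The four accuracy figures reported in this abstract (sensitivity, specificity, positive and negative predictive value) all follow from a 2x2 confusion matrix of nonclinician labels against the clinician reference. A minimal sketch of that arithmetic is given here; the function name `diagnostic_metrics` and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among reference positives
        "specificity": tn / (tn + fp),  # true negatives among reference negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = diagnostic_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the diagnosis is in the abstracted sample, so they would likely shift if the same abstractors worked on a cohort with a different prevalence.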
2.
Clin Nutr ; 43(7): 1809-1815, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38870661

ABSTRACT

BACKGROUND: Cachexia-associated body composition alterations and tumor metabolic activity are both associated with survival of cancer patients. Recently, subcutaneous adipose tissue properties have emerged as particularly prognostic body composition features. We hypothesized that tumors with higher metabolic activity instigate cachexia related peripheral metabolic alterations, and investigated whether tumor metabolic activity is associated with body composition and survival in patients with non-small-cell lung cancer (NSCLC), focusing on subcutaneous adipose tissue. METHODS: A retrospective analysis was performed on a cohort of 173 patients with NSCLC. 18F-fluorodeoxyglucose positron emission tomography-computed tomography (PET-CT) scans obtained before treatment were used to analyze tumor metabolic activity (standardized uptake value (SUV) and SUV normalized by lean body mass (SUL)) as well as body composition variables (subcutaneous and visceral adipose tissue radiodensity (SAT/VAT radiodensity) and area; skeletal muscle radiodensity (SM radiodensity) and area). Subjects were divided into groups with high or low SAT radiodensity based on Youden Index of Receiver Operator Characteristics (ROC). Associations between tumor metabolic activity, body composition variables, and survival were analyzed by Mann-Whitney tests, Cox regression, and Kaplan-Meier analysis. RESULTS: The overall prevalence of high SAT radiodensity was 50.9% (88/173). Patients with high SAT radiodensity had shorter survival compared with patients with low SAT radiodensity (mean: 45.3 vs. 50.5 months, p = 0.026). High SAT radiodensity was independently associated with shorter overall survival (multivariate Cox regression HR = 1.061, 95% CI: 1.022-1.101, p = 0.002). SAT radiodensity also correlated with tumor metabolic activity (SULpeak rs = 0.421, p = 0.029; SUVpeak rs = 0.370, p = 0.048). In contrast, the cross-sectional areas of SM, SAT, and VAT were not associated with tumor metabolic activity or survival. CONCLUSION: Higher SAT radiodensity is associated with higher tumor metabolic activity and shorter survival in patients with NSCLC. This may suggest that tumors with higher metabolic activity induce subcutaneous adipose tissue alterations such as decreased lipid density, increased fibrosis, or browning.


Subject(s)
Body Composition , Cachexia , Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Positron Emission Tomography Computed Tomography , Subcutaneous Fat , Humans , Carcinoma, Non-Small-Cell Lung/mortality , Carcinoma, Non-Small-Cell Lung/metabolism , Carcinoma, Non-Small-Cell Lung/diagnostic imaging , Male , Female , Retrospective Studies , Subcutaneous Fat/diagnostic imaging , Subcutaneous Fat/metabolism , Lung Neoplasms/mortality , Lung Neoplasms/metabolism , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/pathology , Aged , Positron Emission Tomography Computed Tomography/methods , Middle Aged , Cachexia/metabolism , Cachexia/mortality , Cachexia/diagnostic imaging , Fluorodeoxyglucose F18 , Prognosis
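
The split into high and low SAT radiodensity groups relies on the Youden index, J = sensitivity + specificity - 1, maximized over candidate cutoffs on the ROC curve. The sketch below shows one way such a cutoff could be found. It assumes a binary outcome label and that higher values indicate the positive class; the abstract does not state either detail, and the values, labels and the function name `youden_cutoff` are hypothetical.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when value >= cutoff (an assumption).
    Ties are resolved in favor of the lowest cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 0)
        # sensitivity - (1 - specificity) equals sensitivity + specificity - 1
        j = tp / positives - fp / negatives
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical radiodensity values and outcome labels, for illustration only.
values = [-105, -100, -98, -95, -90, -88, -85, -80]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = youden_cutoff(values, labels)
print(cutoff, j)
```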
3.
NPJ Digit Med ; 7(1): 117, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714751

ABSTRACT

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: Intensive care unit admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (cor(Xu1, Zv1) = 0.596, p value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
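
Word2Vec operates on sequences of discrete tokens, so a genome has to be split into "words" before it can be embedded. The abstract does not say how this was done; the sketch below assumes overlapping k-mers, a common choice for nucleotide sequences. The function name `kmer_tokens`, the value of k and the example sequence are all hypothetical.

```python
def kmer_tokens(sequence, k=3):
    """Split a nucleotide sequence into overlapping k-mers (stride 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical short sequence; a real genome would yield tens of thousands of tokens.
tokens = kmer_tokens("atggcgta", k=3)
print(tokens)
```

Each genome then becomes one "sentence" of k-mer tokens that a Word2Vec implementation could take as training input.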

4.
bioRxiv ; 2024 May 01.
Article in English | MEDLINE | ID: mdl-38746367

ABSTRACT

We have developed the regional principal components (rPCs) method, a novel approach for summarizing gene-level methylation. rPCs address the challenge of deciphering complex epigenetic mechanisms in diseases like Alzheimer's disease (AD). In contrast to traditional averaging, rPCs leverage principal components analysis to capture complex methylation patterns across gene regions. Our method demonstrated a 54% improvement in sensitivity over averaging in simulations, offering a robust framework for identifying subtle epigenetic variations. Applying rPCs to the AD brain methylation data in ROSMAP, combined with cell type deconvolution, we uncovered 838 differentially methylated genes associated with neuritic plaque burden, significantly outperforming conventional methods. Integrating methylation quantitative trait loci (meQTL) with genome-wide association studies (GWAS) identified 17 genes with potential causal roles in AD, including MS4A4A and PICALM. Our approach is available in the Bioconductor package regionalpcs, opening avenues for research and facilitating a deeper understanding of the epigenetic landscape in complex diseases.

5.
IEEE Trans Biomed Eng ; PP, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38683703

ABSTRACT

OBJECTIVE: Wearable devices are developed to measure head impact kinematics but are intrinsically noisy because of the imperfect interface with human bodies. This study aimed to improve the head impact kinematics measurements obtained from instrumented mouthguards using deep learning to enhance traumatic brain injury (TBI) risk monitoring. METHODS: We developed one-dimensional convolutional neural network (1D-CNN) models to denoise mouthguard kinematics measurements for tri-axial linear acceleration and tri-axial angular velocity from 163 laboratory dummy head impacts. The performance of the denoising models was evaluated on three levels: kinematics, brain injury criteria, and tissue-level strain and strain rate. Additionally, we performed a blind test on an on-field dataset of 118 college football impacts and a test on 413 post-mortem human subject (PMHS) impacts. RESULTS: On the dummy head impacts, the denoised kinematics showed better correlation with reference kinematics, with relative reductions of 36% for pointwise root mean squared error and 56% for peak absolute error. Absolute errors in six brain injury criteria were reduced by a mean of 82%. For maximum principal strain and maximum principal strain rate, the mean error reduction was 35% and 69%, respectively. On the PMHS impacts, similar denoising effects were observed and the peak kinematics after denoising were more accurate (relative error reduction for 10% noisiest impacts was 75.6%). CONCLUSION: The 1D-CNN denoising models effectively reduced errors in mouthguard-derived kinematics measurements on dummy and PMHS impacts. SIGNIFICANCE: This study provides a novel approach for denoising head kinematics measurements in dummy and PMHS impacts, which can be further validated on more real-human kinematics data before real-world applications.
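
The kinematics-level results are expressed as relative reductions in pointwise root mean squared error and peak absolute error. The sketch below shows how such quantities could be computed for a single signal channel. The definition of peak absolute error used here (the absolute difference between peak magnitudes) is an assumption, and the signals and function names are hypothetical.

```python
import math


def pointwise_rmse(signal, reference):
    """Root mean squared error between two equally sampled signals."""
    squared = [(s - r) ** 2 for s, r in zip(signal, reference)]
    return math.sqrt(sum(squared) / len(squared))


def peak_absolute_error(signal, reference):
    """Absolute difference between peak magnitudes (assumed definition)."""
    return abs(max(abs(s) for s in signal) - max(abs(r) for r in reference))


def relative_reduction(before, after):
    """Fraction by which an error shrinks, e.g. 0.36 for a 36% reduction."""
    return (before - after) / before


# Hypothetical single-axis traces, for illustration only.
reference = [0.0, 2.0, 4.0, 2.0, 0.0]
noisy = [1.0, 3.0, 6.0, 1.0, 0.0]
denoised = [0.0, 2.0, 5.0, 2.0, 0.0]

rmse_reduction = relative_reduction(
    pointwise_rmse(noisy, reference), pointwise_rmse(denoised, reference)
)
peak_reduction = relative_reduction(
    peak_absolute_error(noisy, reference), peak_absolute_error(denoised, reference)
)
print(f"RMSE reduction: {rmse_reduction:.3f}, peak error reduction: {peak_reduction:.3f}")
```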

6.
Nat Biomed Eng ; 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38514775

ABSTRACT

Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.

7.
NPJ Digit Med ; 7(1): 82, 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38553625

ABSTRACT

Generative Artificial Intelligence is set to revolutionize healthcare delivery by transforming traditional patient care into a more personalized, efficient, and proactive process. Chatbots, serving as interactive conversational models, will probably drive this patient-centered transformation in healthcare. Through the provision of various services, including diagnosis, personalized lifestyle recommendations, dynamic scheduling of follow-ups, and mental health support, the objective is to substantially augment patient health outcomes, all the while mitigating the workload burden on healthcare providers. The life-critical nature of healthcare applications necessitates establishing a unified and comprehensive set of evaluation metrics for conversational models. Existing evaluation metrics proposed for various generic large language models (LLMs) demonstrate a lack of comprehension regarding medical and health concepts and their significance in promoting patients' well-being. Moreover, these metrics neglect pivotal user-centered aspects, including trust-building, ethics, personalization, empathy, user comprehension, and emotional support. The purpose of this paper is to explore state-of-the-art LLM-based evaluation metrics that are specifically applicable to the assessment of interactive conversational models in healthcare. Subsequently, we present a comprehensive set of evaluation metrics designed to thoroughly assess the performance of healthcare chatbots from an end-user perspective. These metrics encompass an evaluation of language processing abilities, impact on real-world clinical tasks, and effectiveness in user-interactive conversations. Finally, we engage in a discussion concerning the challenges associated with defining and implementing these metrics, with particular emphasis on confounding factors such as the target audience, evaluation methods, and prompt techniques involved in the evaluation process.

8.
Cell Rep Methods ; 4(2): 100695, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38278157

ABSTRACT

In this study, we develop a 3D beta variational autoencoder (beta-VAE) to advance lung cancer imaging analysis, countering the constraints of conventional radiomics methods. The autoencoder extracts information from public lung computed tomography (CT) datasets without additional labels. It reconstructs 3D lung nodule images with high quality (structural similarity: 0.774, peak signal-to-noise ratio: 26.1, and mean-squared error: 0.0008). The model effectively encodes lesion sizes in its latent embeddings, with a significant correlation with lesion size found after applying uniform manifold approximation and projection (UMAP) for dimensionality reduction. Additionally, the beta-VAE can synthesize new lesions of varying sizes by manipulating the latent features. The model can predict multiple clinical endpoints, including pathological N stage or KRAS mutation status, on the Stanford radiogenomics lung cancer dataset. Comparisons with other methods show that the beta-VAE performs equally well in these tasks, suggesting its potential as a pretrained model for predicting patient outcomes in medical imaging.


Subject(s)
Image Processing, Computer-Assisted , Lung Neoplasms , Humans , Lung Neoplasms/diagnostic imaging , Mutation , Projection , Radiomics
9.
IEEE Trans Biomed Eng ; 71(6): 1853-1863, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38224520

ABSTRACT

OBJECTIVE: Machine-learning head models (MLHMs) were developed to accelerate the calculation of brain strain and strain rate, which are predictors of traumatic brain injury (TBI). However, model accuracy was found to decrease sharply when the training and test datasets came from different head impact types (e.g., car crash, college football), which limits the applicability of MLHMs to different types of head impacts and sports. In particular, small target datasets for specific impact types, with only tens of impacts, may not be enough to train an accurate impact-type-specific MLHM. METHODS: To overcome this, we propose data fusion and transfer learning to develop a series of MLHMs to predict the maximum principal strain (MPS) and maximum principal strain rate (MPSR). RESULTS: The strategies were tested on American football (338), mixed martial arts (457), reconstructed car crash (48) and reconstructed American football (36) impacts. We found that the MLHMs developed with transfer learning are significantly more accurate in estimating MPS and MPSR than other models, with a mean absolute error (MAE) smaller than 0.03 in predicting MPS and smaller than [Formula: see text] in predicting MPSR on all target impact datasets. High performance in concussion detection was observed based on the MPS and MPSR estimated by the transfer-learning-based models. CONCLUSION: The MLHMs can be applied to various head impact types for rapidly and accurately calculating brain strain and strain rate. SIGNIFICANCE: This study enables developing MLHMs for head impact types with limited availability of data, and will accelerate the applications of MLHMs.


Subject(s)
Brain , Machine Learning , Humans , Brain/diagnostic imaging , Brain/physiopathology , Football/injuries , Brain Injuries, Traumatic/physiopathology , Head/physiology , Accidents, Traffic , Biomechanical Phenomena/physiology , Models, Biological
10.
bioRxiv ; 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-37808782

ABSTRACT

Cancer is a heterogeneous disease that demands precise molecular profiling for better understanding and management. Recently, deep learning has demonstrated potential for cost-efficient prediction of molecular alterations from histology images. While transformer-based deep learning architectures have enabled significant progress in non-medical domains, their application to histology images remains limited due to small dataset sizes coupled with the explosion of trainable parameters. Here, we develop SEQUOIA, a transformer model to predict cancer transcriptomes from whole-slide histology images. To enable the full potential of transformers, we first pre-train the model using data from 1,802 normal tissues. Then, we fine-tune and evaluate the model in 4,331 tumor samples across nine cancer types. The prediction performance is assessed at individual gene levels and pathway levels through Pearson correlation analysis and root mean square error. The generalization capacity is validated across two independent cohorts comprising 1,305 tumors. In predicting the expression levels of 25,749 genes, the highest performance is observed in cancers from breast, kidney and lung, where SEQUOIA accurately predicts the expression of 11,069, 10,086 and 8,759 genes, respectively. The accurately predicted genes are associated with the regulation of inflammatory response, the cell cycle and metabolism. While the model is trained at the tissue level, we showcase its potential in predicting spatial gene expression patterns using spatial transcriptomics datasets. Leveraging the prediction performance, we develop a digital gene expression signature that predicts the risk of recurrence in breast cancer. SEQUOIA deciphers clinically relevant gene expression patterns from histology images, opening avenues for improved cancer management and personalized therapies.

11.
Bioinformatics ; 40(1), 2024 01 02.
Article in English | MEDLINE | ID: mdl-38134424

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction is a relevant but challenging task in the drug repurposing field. In-silico approaches have drawn particular attention as they can reduce associated costs and time commitment of traditional methodologies. Yet, current state-of-the-art methods present several limitations: existing DTI prediction approaches are computationally expensive, thereby hindering the ability to use large networks and exploit available datasets and, the generalization to unseen datasets of DTI prediction methods remains unexplored, which could potentially improve the development processes of DTI inferring approaches in terms of accuracy and robustness. RESULTS: In this work, we introduce GeNNius (Graph Embedding Neural Network Interaction Uncovering System), a Graph Neural Network (GNN)-based method that outperforms state-of-the-art models in terms of both accuracy and time efficiency across a variety of datasets. We also demonstrated its prediction power to uncover new interactions by evaluating not previously known DTIs for each dataset. We further assessed the generalization capability of GeNNius by training and testing it on different datasets, showing that this framework can potentially improve the DTI prediction task by training on large datasets and testing on smaller ones. Finally, we investigated qualitatively the embeddings generated by GeNNius, revealing that the GNN encoder maintains biological information after the graph convolutions while diffusing this information through nodes, eventually distinguishing protein families in the node embedding space. AVAILABILITY AND IMPLEMENTATION: GeNNius code is available at https://github.com/ubioinformat/GeNNius.


Subject(s)
Drug Delivery Systems , Drug Repositioning , Drug Interactions , Diffusion , Neural Networks, Computer
12.
Res Sq ; 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-38045288

ABSTRACT

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.

13.
NPJ Digit Med ; 6(1): 213, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37990134

ABSTRACT

Patients experiencing mental health crises often seek help through messaging-based platforms, but may face long wait times due to limited message triage capacity. Here we build and deploy a machine-learning-enabled system to improve response times to crisis messages in a large, national telehealth provider network. We train a two-stage natural language processing (NLP) system with keyword filtering followed by logistic regression on 721 electronic medical record chat messages, of which 32% are potential crises (suicidal/homicidal ideation, domestic violence, or non-suicidal self-injury). Model performance is evaluated on a retrospective test set (4/1/21-4/1/22, N = 481) and a prospective test set (10/1/22-10/31/22, N = 102,471). In the retrospective test set, the model has an AUC of 0.82 (95% CI: 0.78-0.86), sensitivity of 0.99 (95% CI: 0.96-1.00), and PPV of 0.35 (95% CI: 0.309-0.4). In the prospective test set, the model has an AUC of 0.98 (95% CI: 0.966-0.984), sensitivity of 0.98 (95% CI: 0.96-0.99), and PPV of 0.66 (95% CI: 0.626-0.692). The daily median time from message receipt to crisis specialist triage ranges from 8 to 13 min, compared to 9 h before the deployment of the system. We demonstrate that an NLP-based machine learning model can reliably identify potential crisis chat messages in a telehealth setting. Our system integrates into existing clinical workflows, suggesting that with appropriate training, humans can successfully leverage ML systems to facilitate triage of crisis messages.
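
The two-stage design described in this abstract (a keyword filter followed by logistic regression) can be sketched as follows. This is only an illustration of the control flow, not the deployed system: the keyword set, weights, bias and threshold are all hypothetical, and a real model would learn its weights from labeled messages.

```python
import math

# Hypothetical stage-1 keywords and stage-2 logistic-regression parameters.
CRISIS_KEYWORDS = {"unsafe", "crisis", "emergency"}
WEIGHTS = {"unsafe": 1.5, "crisis": 1.8, "emergency": 1.2, "not": -2.0}
BIAS = -1.0


def triage(message, threshold=0.5):
    """Return True when a message should be routed to a crisis specialist."""
    words = message.lower().split()
    # Stage 1: messages without any keyword are never scored.
    if not CRISIS_KEYWORDS.intersection(words):
        return False
    # Stage 2: bag-of-words logistic regression on the remaining messages.
    score = BIAS + sum(WEIGHTS.get(word, 0.0) for word in words)
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability >= threshold


print(triage("i feel unsafe at home"))      # keyword hit, positive score
print(triage("not unsafe just checking"))   # keyword hit, negative score
print(triage("need a refill"))              # filtered out at stage 1
```

A permissive first stage keeps sensitivity high, while the second stage is what raises PPV, which fits the pattern of very high sensitivity and more modest PPV reported above.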

14.
Genome Med ; 15(1): 98, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37978395

ABSTRACT

BACKGROUND: The prognosis for patients with head and neck cancer (HNC) is poor and has improved little in recent decades, partially due to lack of therapeutic options. To identify effective therapeutic targets, we sought to identify molecular pathways that drive metastasis and HNC progression, through large-scale systematic analyses of transcriptomic data. METHODS: We performed meta-analysis across 29 gene expression studies including 2074 primary HNC biopsies to identify genes and transcriptional pathways associated with survival and lymph node metastasis (LNM). To understand the biological roles of these genes in HNC, we identified their associated cancer pathways, as well as the cell types that express them within HNC tumor microenvironments, by integrating single-cell RNA-seq and bulk RNA-seq from sorted cell populations. RESULTS: Patient survival-associated genes were heterogeneous and included drivers of diverse tumor biological processes: these included tumor-intrinsic processes such as epithelial dedifferentiation and epithelial to mesenchymal transition, as well as tumor microenvironmental factors such as T cell-mediated immunity and cancer-associated fibroblast activity. Unexpectedly, LNM-associated genes were almost universally associated with epithelial dedifferentiation within malignant cells. Genes negatively associated with LNM consisted of regulators of squamous epithelial differentiation that are expressed within well-differentiated malignant cells, while those positively associated with LNM represented cell cycle regulators that are normally repressed by the p53-DREAM pathway. These pro-LNM genes are overexpressed in proliferating malignant cells of TP53-mutated and HPV-positive HNCs and are strongly associated with stemness, suggesting that they represent markers of pre-metastatic cancer stem-like cells. LNM-associated genes are deregulated in high-grade oral precancerous lesions, and deregulated further in primary HNCs with advancing tumor grade and deregulated further still in lymph node metastases. CONCLUSIONS: In HNC, patient survival is affected by multiple biological processes and is strongly influenced by the tumor immune and stromal microenvironments. In contrast, LNM appears to be driven primarily by malignant cell plasticity, characterized by epithelial dedifferentiation coupled with EMT-independent proliferation and stemness. Our findings postulate that LNM is initially caused by loss of p53-DREAM-mediated repression of cell cycle genes during early tumorigenesis.


Subject(s)
Genes, cdc , Head and Neck Neoplasms , Humans , Epithelial-Mesenchymal Transition/genetics , Head and Neck Neoplasms/genetics , Lymphatic Metastasis , Tumor Microenvironment/genetics , Tumor Suppressor Protein p53/genetics
15.
Cell Rep Methods ; 3(8): 100534, 2023 08 28.
Article in English | MEDLINE | ID: mdl-37671024

ABSTRACT

In this work, we propose an approach to generate whole-slide image (WSI) tiles by using deep generative models infused with matched gene expression profiles. First, we train a variational autoencoder (VAE) that learns a latent, lower-dimensional representation of multi-tissue gene expression profiles. Then, we use this representation to infuse generative adversarial networks (GANs) that generate lung and brain cortex tissue tiles, resulting in a new model that we call RNA-GAN. Tiles generated by RNA-GAN were preferred by expert pathologists compared with tiles generated using traditional GANs, and in addition, RNA-GAN needs fewer training epochs to generate high-quality tiles. Finally, RNA-GAN was able to generalize to gene expression profiles outside of the training set, showing imputation capabilities. A web-based quiz is available for users to play a game distinguishing real and synthetic tiles: https://rna-gan.stanford.edu/, and the code for RNA-GAN is available here: https://github.com/gevaertlab/RNA-GAN.


Subject(s)
Brain , Transcriptome , Cerebral Cortex , Learning , RNA
16.
Nat Mach Intell ; 5(4): 351-362, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37693852

ABSTRACT

Technological advances now make it possible to study a patient from multiple angles with high-dimensional, high-throughput multi-scale biomedical data. In oncology, massive amounts of data are being generated ranging from molecular, histopathology, radiology to clinical records. The introduction of deep learning has significantly advanced the analysis of biomedical data. However, most approaches focus on single data modalities leading to slow progress in methods to integrate complementary data types. Development of effective multimodal fusion approaches is becoming increasingly important as a single modality might not be consistent and sufficient to capture the heterogeneity of complex diseases to tailor medical care and improve personalised medicine. Many initiatives now focus on integrating these disparate modalities to unravel the biological processes involved in multifactorial diseases such as cancer. However, many obstacles remain, including lack of usable data as well as methods for clinical validation and interpretation. Here, we cover these current challenges and reflect on opportunities through deep learning to tackle data sparsity and scarcity, multimodal interpretability, and standardisation of datasets.

17.
J Am Med Inform Assoc ; 31(1): 188-197, 2023 12 22.
Article in English | MEDLINE | ID: mdl-37769323

ABSTRACT

OBJECTIVE: While there are currently approaches to handle unstructured clinical data, such as manual abstraction and structured proxy variables, these methods may be time-consuming, not scalable, and imprecise. This article aims to determine whether selective prediction, which gives a model the option to abstain from generating a prediction, can improve the accuracy and efficiency of unstructured clinical data abstraction. MATERIALS AND METHODS: We trained selective classifiers (logistic regression, random forest, support vector machine) to extract 5 variables from clinical notes: depression (n = 1563), glioblastoma (GBM, n = 659), rectal adenocarcinoma (DRA, n = 601), and abdominoperineal resection (APR, n = 601) and low anterior resection (LAR, n = 601) of adenocarcinoma. We varied the cost of false positives (FP), false negatives (FN), and abstained notes and measured total misclassification cost. RESULTS: The depression selective classifiers abstained on anywhere from 0% to 97% of notes, and the change in total misclassification cost ranged from -58% to 9%. Selective classifiers abstained on 5%-43% of notes across the GBM and colorectal cancer models. The GBM selective classifier abstained on 43% of notes, which led to improvements in sensitivity (0.94 to 0.96), specificity (0.79 to 0.96), PPV (0.89 to 0.98), and NPV (0.88 to 0.91) when compared to a non-selective classifier and when compared to structured proxy variables. DISCUSSION: We showed that selective classifiers outperformed both non-selective classifiers and structured proxy variables for extracting data from unstructured clinical notes. CONCLUSION: Selective prediction should be considered when abstaining is preferable to making an incorrect prediction.


Subject(s)
Adenocarcinoma , Support Vector Machine , Humans , Logistic Models
18.
Cell Rep Methods ; 3(7): 100515, 2023 07 24.
Article in English | MEDLINE | ID: mdl-37533639

ABSTRACT

DNA methylation (DNAme) is a major epigenetic factor influencing gene expression with alterations leading to cancer and immunological and cardiovascular diseases. Recent technological advances have enabled genome-wide profiling of DNAme in large human cohorts. There is a need for analytical methods that can more sensitively detect differential methylation profiles present in subsets of individuals from these heterogeneous, population-level datasets. We developed an end-to-end analytical framework named "EpiMix" for population-level analysis of DNAme and gene expression. Compared with existing methods, EpiMix showed higher sensitivity in detecting abnormal DNAme that was present in only small patient subsets. We extended the model-based analyses of EpiMix to cis-regulatory elements within protein-coding genes, distal enhancers, and genes encoding microRNAs and long non-coding RNAs (lncRNAs). Using cell-type-specific data from two separate studies, we discover epigenetic mechanisms underlying childhood food allergy and survival-associated, methylation-driven ncRNAs in non-small cell lung cancer.


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Child , DNA Methylation/genetics , Carcinoma, Non-Small-Cell Lung/genetics , Epigenomics/methods , Lung Neoplasms/diagnosis , Epigenesis, Genetic
19.
J Med Imaging (Bellingham) ; 10(4): 044006, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37564098

ABSTRACT

Purpose: We aim to evaluate the performance of radiomic biopsy (RB), best-fit bounding box (BB), and a deep-learning-based segmentation method called no-new-U-Net (nnU-Net), compared to the standard full manual (FM) segmentation method for predicting benign and malignant lung nodules using a computed tomography (CT) radiomic machine learning model. Materials and Methods: A total of 188 CT scans of lung nodules from 2 institutions were used for our study. One radiologist identified and delineated all 188 lung nodules, whereas a second radiologist segmented a subset (n=20) of these nodules. Both radiologists employed FM and RB segmentation methods. BB segmentations were generated computationally from the FM segmentations. The nnU-Net, a deep-learning-based segmentation method, performed automatic nodule detection and segmentation. The time radiologists took to perform segmentations was recorded. Radiomic features were extracted from each segmentation method, and models to predict benign and malignant lung nodules were developed. The Kruskal-Wallis and DeLong tests were used to compare segmentation times and areas under the curve (AUC), respectively. Results: For the delineation of the FM, RB, and BB segmentations, the two radiologists required a median time (IQR) of 113 (54 to 251.5), 21 (9.25 to 38), and 16 (12 to 64.25) s, respectively (p=0.04). In dataset 1, the mean AUC (95% CI) of the FM, RB, BB, and nnU-Net models were 0.964 (0.96 to 0.968), 0.985 (0.983 to 0.987), 0.961 (0.956 to 0.965), and 0.878 (0.869 to 0.888). In dataset 2, the mean AUC (95% CI) of the FM, RB, BB, and nnU-Net models were 0.717 (0.705 to 0.729), 0.919 (0.913 to 0.924), 0.699 (0.687 to 0.711), and 0.644 (0.632 to 0.657). Conclusion: Radiomic biopsy-based models outperformed FM and BB models in prediction of benign and malignant lung nodules in two independent datasets, while deep-learning segmentation-based models performed similarly to FM and BB. RB could be a more efficient segmentation method, but further validation is needed.
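The abstract states that BB segmentations were generated computationally from the FM segmentations but does not say how. One plausible reading, sketched below under stated assumptions, is the smallest axis-aligned box enclosing the manual mask. The function names are hypothetical, and the example works on a single 2D slice stored as nested lists, whereas the study's CT data are volumetric.

```python
def bounding_box(mask):
    """Return (row_min, row_max, col_min, col_max), inclusive, for a 2D 0/1 mask.

    Returns None when the mask contains no foreground pixels.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    width = len(mask[0])
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    return rows[0], rows[-1], cols[0], cols[-1]


def box_mask(mask):
    """Build a new mask in which every pixel inside the bounding box is foreground."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    box = bounding_box(mask)
    if box is None:
        return out
    row_min, row_max, col_min, col_max = box
    for r in range(row_min, row_max + 1):
        for c in range(col_min, col_max + 1):
            out[r][c] = 1
    return out


# Toy stand-in for one slice of a full manual (FM) segmentation.
fm_slice = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
bb_slice = box_mask(fm_slice)
```

The box mask always contains the manual mask and usually adds surrounding tissue, so radiomic features extracted from the two can differ.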

20.
Heliyon ; 9(7): e17934, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37483733

ABSTRACT

In response to the unprecedented global healthcare crisis of the COVID-19 pandemic, the scientific community has joined forces to tackle the challenges and prepare for future pandemics. Multiple modalities of data have been investigated to understand the nature of COVID-19. In this paper, MIDRC investigators present an overview of the state-of-the-art development of multimodal machine learning for COVID-19 and model assessment considerations for future studies. We begin with a discussion of the lessons learned from radiogenomic studies for cancer diagnosis. We then summarize the multi-modality COVID-19 data investigated in the literature including symptoms and other clinical data, laboratory tests, imaging, pathology, physiology, and other omics data. Publicly available multimodal COVID-19 data provided by MIDRC and other sources are summarized. After an overview of machine learning developments using multimodal data for COVID-19, we present our perspectives on the future development of multimodal machine learning models for COVID-19.
