ABSTRACT
Background: The incidence of oropharyngeal cancer (OPC) is increasing, due mainly to a rise in human papillomavirus (HPV)-mediated disease. HPV-mediated OPC has a significantly better prognosis than HPV-negative OPC, stimulating interest in treatment de-intensification approaches to reduce long-term sequelae. Routine clinical testing frequently utilises immunohistochemistry to detect upregulation of p16 as a surrogate marker of HPV mediation. However, this does not detect discordant p16-/HPV+ cases and incorrectly assigns p16+/HPV- cases, which, given their inferior prognosis compared to p16+/HPV+, may have important clinical implications. The biology underlying the poorer prognosis of p16/HPV discordant OPC requires exploration. Methods: GeoMx digital spatial profiling was used to compare the expression patterns of selected immuno-oncology-related genes/gene families (n=73) within the tumour and stromal compartments of formalin-fixed, paraffin-embedded OPC tumour tissues (n=12) representing the three subgroups, p16+/HPV+, p16+/HPV- and p16-/HPV-. Results: Keratin (multi KRT) and HIF1A, a key regulator of hypoxia adaptation, were upregulated in both p16+/HPV- and p16-/HPV- tumours relative to p16+/HPV+. Several genes associated with tumour cell proliferation and survival (CCND1, AKT1 and CD44) were more highly expressed in p16-/HPV- tumours relative to p16+/HPV+. Conversely, multiple genes with potential roles in anti-tumour immune responses (immune cell recruitment/trafficking, antigen processing and presentation), such as CXCL9, CXCL10, ITGB2, PSMB10, CD74, HLA-DRB and B2M, were more highly expressed in the tumour and stromal compartments of p16+/HPV+ OPC versus p16-/HPV- and p16+/HPV-. CXCL9 was the only gene showing significant differential expression between p16+/HPV- and p16-/HPV- tumours, being upregulated within the stromal compartment of the former.
Conclusions: In terms of immuno-oncology-related gene expression, discordant p16+/HPV- OPCs are much more closely aligned with p16-/HPV- OPCs and quite distinct from p16+/HPV+ tumours. This is consistent with previously described prognostic patterns (p16+/HPV+ >> p16+/HPV- > p16-/HPV-) and underlines the need for dual p16 and HPV testing to guide clinical decision making.
ABSTRACT
Abnormal DNA ploidy, found in numerous cancers, is increasingly recognized as a contributor to chromosomal instability, genome evolution, and the heterogeneity that fuels cancer cell progression. Furthermore, it has been linked with poor prognosis in cancer patients. While next-generation sequencing can be used to approximate tumor ploidy, it has a high error rate for near-euploid states, is costly, and is time-consuming, motivating alternative rapid quantification methods. We introduce PloiViT, a transformer-based model for tumor ploidy quantification that outperforms traditional machine learning models, enabling rapid and cost-effective quantification directly from pathology slides. We trained PloiViT on a dataset of fifteen cancer types from The Cancer Genome Atlas and validated its performance in multiple independent cohorts. Additionally, we explored the impact of self-supervised feature extraction on performance. PloiViT, using self-supervised features, achieved the lowest prediction error in multiple independent cohorts, exhibiting better generalization capabilities. Our findings demonstrate that PloiViT predicts higher ploidy values in aggressive cancer groups and in patients with specific mutations, supporting PloiViT's potential as a complement to next-generation sequencing for ploidy assessment. To further promote its use, we release our models as a user-friendly inference application and a Python package.
ABSTRACT
Machine learning (ML) applications in medical artificial intelligence (AI) systems have shifted from traditional and statistical methods to increasing application of deep learning models. This survey navigates the current landscape of multimodal ML, focusing on its profound impact on medical image analysis and clinical decision support systems. Emphasizing challenges and innovations in addressing multimodal representation, fusion, translation, alignment, and co-learning, the paper explores the transformative potential of multimodal models for clinical predictions. It also highlights the need for principled assessments and practical implementation of such models, bringing attention to the dynamics between decision support systems and healthcare providers and personnel. Despite advancements, challenges such as data biases and the scarcity of "big data" in many biomedical domains persist. We conclude with a discussion on principled innovation and collaborative efforts to further the mission of seamless integration of multimodal ML models into biomedical practice.
ABSTRACT
BACKGROUND: Cachexia-associated body composition alterations and tumor metabolic activity are both associated with survival of cancer patients. Recently, subcutaneous adipose tissue properties have emerged as particularly prognostic body composition features. We hypothesized that tumors with higher metabolic activity instigate cachexia-related peripheral metabolic alterations, and investigated whether tumor metabolic activity is associated with body composition and survival in patients with non-small-cell lung cancer (NSCLC), focusing on subcutaneous adipose tissue. METHODS: A retrospective analysis was performed on a cohort of 173 patients with NSCLC. 18F-fluorodeoxyglucose positron emission tomography-computed tomography (PET-CT) scans obtained before treatment were used to analyze tumor metabolic activity (standardized uptake value (SUV) and SUV normalized by lean body mass (SUL)) as well as body composition variables (subcutaneous and visceral adipose tissue radiodensity (SAT/VAT radiodensity) and area; skeletal muscle radiodensity (SM radiodensity) and area). Subjects were divided into groups with high or low SAT radiodensity based on the Youden index of the receiver operating characteristic (ROC) curve. Associations between tumor metabolic activity, body composition variables, and survival were analyzed by Mann-Whitney tests, Cox regression, and Kaplan-Meier analysis. RESULTS: The overall prevalence of high SAT radiodensity was 50.9% (88/173). Patients with high SAT radiodensity had shorter survival compared with patients with low SAT radiodensity (mean: 45.3 vs. 50.5 months, p = 0.026). High SAT radiodensity was independently associated with shorter overall survival (multivariate Cox regression HR = 1.061, 95% CI: 1.022-1.101, p = 0.002). SAT radiodensity also correlated with tumor metabolic activity (SULpeak rs = 0.421, p = 0.029; SUVpeak rs = 0.370, p = 0.048).
In contrast, the cross-sectional areas of SM, SAT, and VAT were not associated with tumor metabolic activity or survival. CONCLUSION: Higher SAT radiodensity is associated with higher tumor metabolic activity and shorter survival in patients with NSCLC. This may suggest that tumors with higher metabolic activity induce subcutaneous adipose tissue alterations such as decreased lipid density, increased fibrosis, or browning.
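The Youden-index dichotomization mentioned in the methods can be illustrated with a minimal Python sketch. It assumes the ROC curve is built against a binary outcome (the abstract does not say which one), and the function name and radiodensity values are hypothetical, not taken from the study.

```python
def youden_cutoff(values, events):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    values: one continuous measurement per patient (e.g. SAT radiodensity).
    events: 1 if the binary outcome occurred, else 0 (assumed outcome).
    A patient is classified as 'high' when value >= cutoff.
    """
    positives = sum(events)
    negatives = len(events) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best_cutoff, best_j = None, -1.0
    # Every observed value is a candidate threshold on the ROC curve.
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, e in zip(values, events) if v >= cutoff and e)
        true_neg = sum(1 for v, e in zip(values, events) if v < cutoff and not e)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical radiodensities (Hounsfield units) and outcomes.
example_values = [-105, -100, -98, -95, -90, -88]
example_events = [0, 0, 0, 1, 1, 1]
print(youden_cutoff(example_values, example_events))
```

Patients at or above the returned cutoff would form the "high" group and the rest the "low" group.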
Subjects
Body Composition , Cachexia , Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Positron Emission Tomography Computed Tomography , Subcutaneous Fat , Humans , Carcinoma, Non-Small-Cell Lung/mortality , Carcinoma, Non-Small-Cell Lung/metabolism , Carcinoma, Non-Small-Cell Lung/diagnostic imaging , Male , Female , Retrospective Studies , Subcutaneous Fat/diagnostic imaging , Subcutaneous Fat/metabolism , Lung Neoplasms/mortality , Lung Neoplasms/metabolism , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/pathology , Aged , Positron Emission Tomography Computed Tomography/methods , Middle Aged , Cachexia/metabolism , Cachexia/mortality , Cachexia/diagnostic imaging , Fluorodeoxyglucose F18 , Prognosis
ABSTRACT
PURPOSE: Data on lines of therapy (LOTs) for cancer treatment are important for clinical oncology research, but LOTs are not explicitly recorded in electronic health records (EHRs). We present an efficient approach for clinical data abstraction and a flexible algorithm to derive LOTs from EHR-based medication data on patients with glioblastoma multiforme (GBM). METHODS: Nonclinicians were trained to abstract the diagnosis of GBM from EHRs, and their accuracy was compared with abstraction performed by clinicians. The resulting data were used to build a cohort of patients with a confirmed GBM diagnosis. An algorithm was developed to derive LOTs using structured medication data, accounting for the addition and discontinuation of therapies and drug class. Descriptive statistics were calculated, and time-to-next-treatment (TTNT) analysis was performed using the Kaplan-Meier method. RESULTS: Treating clinicians as the gold standard, nonclinicians abstracted GBM diagnosis with a sensitivity of 0.98, specificity of 1.00, positive predictive value of 1.00, and negative predictive value of 0.90, suggesting that nonclinician abstraction of GBM diagnosis was comparable with clinician abstraction. Of 693 patients with a confirmed diagnosis of GBM, 246 had structured information about the types of medications received. Of them, 165 (67.1%) received a first-line therapy (1L) of temozolomide, and the median TTNT from the start of 1L was 179 days. CONCLUSION: We described a workflow for extracting diagnosis of GBM and LOT from EHR data that combines nonclinician abstraction with algorithmic processing, demonstrating comparable accuracy with clinician abstraction and highlighting the potential for scalable and efficient EHR-based oncology research.
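To make the idea of deriving LOTs from medication records concrete, here is a deliberately simplified Python sketch. It is not the authors' algorithm: the single rule used (a new agent first seen more than `window_days` after the start of the current line opens the next line) and the 28-day default are assumptions for illustration, and discontinuation and drug-class handling, which the abstract says the real algorithm includes, are left out.

```python
def derive_lots(records, window_days=28):
    """Group medication administrations into lines of therapy (LOTs).

    records: iterable of (day, drug) tuples, day counted from diagnosis.
    Returns a list of dicts with the start day and drug set of each line.
    Hypothetical rule: a drug first seen within `window_days` of the line
    start joins that line; a new drug seen later opens the next line.
    """
    lots = []
    for day, drug in sorted(records):
        if not lots:
            lots.append({"start": day, "drugs": {drug}})
            continue
        current = lots[-1]
        if drug in current["drugs"]:
            continue  # continuation of the current regimen
        if day - current["start"] <= window_days:
            current["drugs"].add(drug)  # early addition: same line
        else:
            lots.append({"start": day, "drugs": {drug}})  # next line
    return lots


def time_to_next_treatment(lots):
    """Days from the start of line 1 to the start of line 2, or None."""
    if len(lots) < 2:
        return None
    return lots[1]["start"] - lots[0]["start"]


# Hypothetical patient: three temozolomide cycles, then bevacizumab.
patient = [(0, "temozolomide"), (28, "temozolomide"), (56, "temozolomide"),
           (120, "bevacizumab"), (134, "bevacizumab")]
lines = derive_lots(patient)
print(len(lines), time_to_next_treatment(lines))
```

Patients with no second line would be treated as censored in a Kaplan-Meier TTNT analysis; the sketch simply returns `None` for them.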
Subjects
Algorithms , Electronic Health Records , Glioblastoma , Humans , Glioblastoma/diagnosis , Glioblastoma/drug therapy , Glioblastoma/therapy , Glioblastoma/pathology , Female , Male , Middle Aged , Aged , Brain Neoplasms/drug therapy , Brain Neoplasms/diagnosis , Adult
ABSTRACT
Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: intensive care unit admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (cor(Xu1, Zv1) = 0.596, p value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively.
Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
ABSTRACT
We have developed the regional principal components (rPCs) method, a novel approach for summarizing gene-level methylation. rPCs address the challenge of deciphering complex epigenetic mechanisms in diseases like Alzheimer's disease (AD). In contrast to traditional averaging, rPCs leverage principal components analysis to capture complex methylation patterns across gene regions. Our method demonstrated a 54% improvement in sensitivity over averaging in simulations, offering a robust framework for identifying subtle epigenetic variations. Applying rPCs to the AD brain methylation data in ROSMAP, combined with cell type deconvolution, we uncovered 838 differentially methylated genes associated with neuritic plaque burden, significantly outperforming conventional methods. Integrating methylation quantitative trait loci (meQTL) with genome-wide association studies (GWAS) identified 17 genes with potential causal roles in AD, including MS4A4A and PICALM. Our approach is available in the Bioconductor package regionalpcs, opening avenues for research and facilitating a deeper understanding of the epigenetic landscape in complex diseases.
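The contrast between averaging and a principal-component summary can be seen in a toy Python sketch. This is an illustration of the general idea only, not the regionalpcs implementation: the function names are hypothetical, only the first component is computed (by power iteration), and the methylation values are invented so that two CpG sites in one region move in opposite directions.

```python
import math


def region_average(matrix):
    """Per-sample mean methylation across the CpG sites of one region."""
    return [sum(row) / len(row) for row in matrix]


def first_pc_scores(matrix, iterations=200):
    """Per-sample scores on the first principal component of one region.

    matrix: rows are samples, columns are CpG sites in the region.
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Sample covariance matrix of the CpG sites (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration from an arbitrary non-uniform start vector.
    v = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in this direction; keep the current vector
        v = [x / norm for x in w]
    # Project each centered sample onto the leading direction.
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


# Invented region: site 1 gains methylation across samples, site 2 loses it.
region = [[0.1, 0.4], [0.2, 0.3], [0.3, 0.2], [0.4, 0.1]]
print(region_average(region))   # identical for every sample
print(first_pc_scores(region))  # varies across samples
```

In this constructed case the average is flat across samples while the first component separates them, which is the kind of pattern averaging cannot represent. The sign of a principal component is arbitrary, so only relative differences between scores are meaningful.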
ABSTRACT
OBJECTIVE: Wearable devices are developed to measure head impact kinematics but are intrinsically noisy because of the imperfect interface with human bodies. This study aimed to improve the head impact kinematics measurements obtained from instrumented mouthguards using deep learning to enhance traumatic brain injury (TBI) risk monitoring. METHODS: We developed one-dimensional convolutional neural network (1D-CNN) models to denoise mouthguard kinematics measurements for tri-axial linear acceleration and tri-axial angular velocity from 163 laboratory dummy head impacts. The performance of the denoising models was evaluated on three levels: kinematics, brain injury criteria, and tissue-level strain and strain rate. Additionally, we performed a blind test on an on-field dataset of 118 college football impacts and a test on 413 post-mortem human subject (PMHS) impacts. RESULTS: On the dummy head impacts, the denoised kinematics showed better correlation with reference kinematics, with relative reductions of 36% for pointwise root mean squared error and 56% for peak absolute error. Absolute errors in six brain injury criteria were reduced by a mean of 82%. For maximum principal strain and maximum principal strain rate, the mean error reduction was 35% and 69%, respectively. On the PMHS impacts, similar denoising effects were observed and the peak kinematics after denoising were more accurate (relative error reduction for 10% noisiest impacts was 75.6%). CONCLUSION: The 1D-CNN denoising models effectively reduced errors in mouthguard-derived kinematics measurements on dummy and PMHS impacts. SIGNIFICANCE: This study provides a novel approach for denoising head kinematics measurements in dummy and PMHS impacts, which can be further validated on more real-human kinematics data before real-world applications.
Subjects
Brain Injuries, Traumatic , Head , Neural Networks, Computer , Humans , Biomechanical Phenomena/physiology , Brain Injuries, Traumatic/physiopathology , Male , Mouth Protectors , Football/injuries , Wearable Electronic Devices , Deep Learning , Adult
ABSTRACT
Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.
ABSTRACT
Generative Artificial Intelligence is set to revolutionize healthcare delivery by transforming traditional patient care into a more personalized, efficient, and proactive process. Chatbots, serving as interactive conversational models, will probably drive this patient-centered transformation in healthcare. Through the provision of various services, including diagnosis, personalized lifestyle recommendations, dynamic scheduling of follow-ups, and mental health support, the objective is to substantially augment patient health outcomes, all the while mitigating the workload burden on healthcare providers. The life-critical nature of healthcare applications necessitates establishing a unified and comprehensive set of evaluation metrics for conversational models. Existing evaluation metrics proposed for various generic large language models (LLMs) demonstrate a lack of comprehension regarding medical and health concepts and their significance in promoting patients' well-being. Moreover, these metrics neglect pivotal user-centered aspects, including trust-building, ethics, personalization, empathy, user comprehension, and emotional support. The purpose of this paper is to explore state-of-the-art LLM-based evaluation metrics that are specifically applicable to the assessment of interactive conversational models in healthcare. Subsequently, we present a comprehensive set of evaluation metrics designed to thoroughly assess the performance of healthcare chatbots from an end-user perspective. These metrics encompass an evaluation of language processing abilities, impact on real-world clinical tasks, and effectiveness in user-interactive conversations. Finally, we engage in a discussion concerning the challenges associated with defining and implementing these metrics, with particular emphasis on confounding factors such as the target audience, evaluation methods, and prompt techniques involved in the evaluation process.
ABSTRACT
OBJECTIVE: The machine-learning head model (MLHM) was developed to accelerate the calculation of brain strain and strain rate, which are predictors of traumatic brain injury (TBI). However, model accuracy was found to decrease sharply when the training and test datasets came from different head impact types (e.g., car crash, college football), which limits the applicability of MLHMs across impact types and sports. In particular, a small target dataset for a specific impact type, with only tens of impacts, may not be enough to train an accurate impact-type-specific MLHM. METHODS: To overcome this, we propose data fusion and transfer learning to develop a series of MLHMs to predict the maximum principal strain (MPS) and maximum principal strain rate (MPSR). RESULTS: The strategies were tested on American football (338), mixed martial arts (457), reconstructed car crash (48) and reconstructed American football (36) impacts. We found that the MLHMs developed with transfer learning are significantly more accurate in estimating MPS and MPSR than other models, with a mean absolute error (MAE) smaller than 0.03 in predicting MPS and smaller than [Formula: see text] in predicting MPSR on all target impact datasets. High performance in concussion detection was observed based on the MPS and MPSR estimated by the transfer-learning-based models. CONCLUSION: The MLHMs can be applied to various head impact types for rapidly and accurately calculating brain strain and strain rate. SIGNIFICANCE: This study enables the development of MLHMs for head impact types with limited data availability and will accelerate the application of MLHMs.
Subjects
Brain , Machine Learning , Humans , Brain/diagnostic imaging , Brain/physiopathology , Football/injuries , Brain Injuries, Traumatic/physiopathology , Head/physiology , Accidents, Traffic , Biomechanical Phenomena/physiology , Models, Biological
ABSTRACT
In this study, we develop a 3D beta variational autoencoder (beta-VAE) to advance lung cancer imaging analysis, countering the constraints of conventional radiomics methods. The autoencoder extracts information from public lung computed tomography (CT) datasets without additional labels. It reconstructs 3D lung nodule images with high quality (structural similarity: 0.774, peak signal-to-noise ratio: 26.1, and mean-squared error: 0.0008). The model effectively encodes lesion sizes in its latent embeddings, with a significant correlation with lesion size found after applying uniform manifold approximation and projection (UMAP) for dimensionality reduction. Additionally, the beta-VAE can synthesize new lesions of varying sizes by manipulating the latent features. The model can predict multiple clinical endpoints, including pathological N stage or KRAS mutation status, on the Stanford radiogenomics lung cancer dataset. Comparisons with other methods show that the beta-VAE performs equally well in these tasks, suggesting its potential as a pretrained model for predicting patient outcomes in medical imaging.
Subjects
Image Processing, Computer-Assisted , Lung Neoplasms , Humans , Lung Neoplasms/diagnostic imaging , Mutation , Projection , Radiomics
ABSTRACT
Cancer is a heterogeneous disease that demands precise molecular profiling for better understanding and management. Recently, deep learning has demonstrated potential for cost-efficient prediction of molecular alterations from histology images. While transformer-based deep learning architectures have enabled significant progress in non-medical domains, their application to histology images remains limited due to small dataset sizes coupled with the explosion of trainable parameters. Here, we develop SEQUOIA, a transformer model to predict cancer transcriptomes from whole-slide histology images. To enable the full potential of transformers, we first pre-train the model using data from 1,802 normal tissues. Then, we fine-tune and evaluate the model in 4,331 tumor samples across nine cancer types. The prediction performance is assessed at the individual gene level and the pathway level through Pearson correlation analysis and root mean square error. The generalization capacity is validated across two independent cohorts comprising 1,305 tumors. In predicting the expression levels of 25,749 genes, the highest performance is observed in cancers from breast, kidney and lung, where SEQUOIA accurately predicts the expression of 11,069, 10,086 and 8,759 genes, respectively. The accurately predicted genes are associated with the regulation of inflammatory response, the cell cycle and metabolism. While the model is trained at the tissue level, we showcase its potential in predicting spatial gene expression patterns using spatial transcriptomics datasets. Leveraging the prediction performance, we develop a digital gene expression signature that predicts the risk of recurrence in breast cancer. SEQUOIA deciphers clinically relevant gene expression patterns from histology images, opening avenues for improved cancer management and personalized therapies.
ABSTRACT
MOTIVATION: Drug-target interaction (DTI) prediction is a relevant but challenging task in the drug repurposing field. In-silico approaches have drawn particular attention as they can reduce the costs and time commitment of traditional methodologies. Yet current state-of-the-art methods have several limitations. First, existing DTI prediction approaches are computationally expensive, which hinders the use of large networks and the exploitation of available datasets. Second, the generalization of DTI prediction methods to unseen datasets remains unexplored, although studying it could improve the accuracy and robustness of DTI inference approaches. RESULTS: In this work, we introduce GeNNius (Graph Embedding Neural Network Interaction Uncovering System), a Graph Neural Network (GNN)-based method that outperforms state-of-the-art models in terms of both accuracy and time efficiency across a variety of datasets. We also demonstrated its power to uncover new interactions by evaluating previously unknown DTIs for each dataset. We further assessed the generalization capability of GeNNius by training and testing it on different datasets, showing that this framework can potentially improve the DTI prediction task by training on large datasets and testing on smaller ones. Finally, we qualitatively investigated the embeddings generated by GeNNius, revealing that the GNN encoder maintains biological information after the graph convolutions while diffusing this information through nodes, eventually distinguishing protein families in the node embedding space. AVAILABILITY AND IMPLEMENTATION: GeNNius code is available at https://github.com/ubioinformat/GeNNius.
Subjects
Drug Delivery Systems , Drug Repositioning , Drug Interactions , Diffusion , Neural Networks, Computer
ABSTRACT
BACKGROUND: The prognosis for patients with head and neck cancer (HNC) is poor and has improved little in recent decades, partially due to a lack of therapeutic options. To identify effective therapeutic targets, we sought to identify molecular pathways that drive metastasis and HNC progression, through large-scale systematic analyses of transcriptomic data. METHODS: We performed a meta-analysis across 29 gene expression studies including 2074 primary HNC biopsies to identify genes and transcriptional pathways associated with survival and lymph node metastasis (LNM). To understand the biological roles of these genes in HNC, we identified their associated cancer pathways, as well as the cell types that express them within HNC tumor microenvironments, by integrating single-cell RNA-seq and bulk RNA-seq from sorted cell populations. RESULTS: Patient survival-associated genes were heterogeneous and included drivers of diverse tumor biological processes: these included tumor-intrinsic processes such as epithelial dedifferentiation and epithelial to mesenchymal transition, as well as tumor microenvironmental factors such as T cell-mediated immunity and cancer-associated fibroblast activity. Unexpectedly, LNM-associated genes were almost universally associated with epithelial dedifferentiation within malignant cells. Genes negatively associated with LNM consisted of regulators of squamous epithelial differentiation that are expressed within well-differentiated malignant cells, while those positively associated with LNM represented cell cycle regulators that are normally repressed by the p53-DREAM pathway. These pro-LNM genes are overexpressed in proliferating malignant cells of TP53-mutated and HPV-positive HNCs and are strongly associated with stemness, suggesting that they represent markers of pre-metastatic cancer stem-like cells.
LNM-associated genes are deregulated in high-grade oral precancerous lesions, deregulated further in primary HNCs with advancing tumor grade, and deregulated further still in lymph node metastases. CONCLUSIONS: In HNC, patient survival is affected by multiple biological processes and is strongly influenced by the tumor immune and stromal microenvironments. In contrast, LNM appears to be driven primarily by malignant cell plasticity, characterized by epithelial dedifferentiation coupled with EMT-independent proliferation and stemness. Our findings suggest that LNM is initially caused by loss of p53-DREAM-mediated repression of cell cycle genes during early tumorigenesis.
Subjects
Genes, cdc , Head and Neck Neoplasms , Humans , Epithelial-Mesenchymal Transition/genetics , Head and Neck Neoplasms/genetics , Lymphatic Metastasis , Tumor Microenvironment/genetics , Tumor Suppressor Protein p53/genetics
ABSTRACT
Patients experiencing mental health crises often seek help through messaging-based platforms, but may face long wait times due to limited message triage capacity. Here we build and deploy a machine-learning-enabled system to improve response times to crisis messages in a large, national telehealth provider network. We train a two-stage natural language processing (NLP) system with keyword filtering followed by logistic regression on 721 electronic medical record chat messages, of which 32% are potential crises (suicidal/homicidal ideation, domestic violence, or non-suicidal self-injury). Model performance is evaluated on a retrospective test set (4/1/21-4/1/22, N = 481) and a prospective test set (10/1/22-10/31/22, N = 102,471). In the retrospective test set, the model has an AUC of 0.82 (95% CI: 0.78-0.86), sensitivity of 0.99 (95% CI: 0.96-1.00), and PPV of 0.35 (95% CI: 0.309-0.4). In the prospective test set, the model has an AUC of 0.98 (95% CI: 0.966-0.984), sensitivity of 0.98 (95% CI: 0.96-0.99), and PPV of 0.66 (95% CI: 0.626-0.692). The daily median time from message receipt to crisis specialist triage ranges from 8 to 13 min, compared to 9 h before the deployment of the system. We demonstrate that an NLP-based machine learning model can reliably identify potential crisis chat messages in a telehealth setting. Our system integrates into existing clinical workflows, suggesting that with appropriate training, humans can successfully leverage ML systems to facilitate triage of crisis messages.
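The two-stage design described above (a keyword filter followed by logistic regression) can be sketched in Python as follows. This shows only the inference path under stated assumptions: the keyword list, weights, and bias are invented for illustration, whereas in the study the logistic regression was trained on labelled chat messages, and the deployed system's features and thresholds are not given in the abstract.

```python
import math

# Hypothetical stage-1 filter terms; not the study's keyword list.
KEYWORDS = {"suicide", "kill", "hurt", "die"}

# Hypothetical stage-2 logistic regression parameters (normally learned).
WEIGHTS = {"suicide": 2.5, "kill": 2.0, "hurt": 1.0, "die": 1.5, "joking": -3.0}
BIAS = -1.0


def tokenize(message):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    return [token.strip(".,!?;:").lower() for token in message.split()]


def crisis_probability(message):
    """Return the estimated probability that a message is a potential crisis.

    Stage 1: messages without any keyword are screened out (probability 0).
    Stage 2: remaining messages are scored by a bag-of-words logistic model.
    """
    tokens = tokenize(message)
    if not KEYWORDS.intersection(tokens):
        return 0.0
    z = BIAS + sum(WEIGHTS.get(token, 0.0) for token in tokens)
    return 1.0 / (1.0 + math.exp(-z))


for text in ["Can I reschedule my appointment?", "I want to die"]:
    print(text, round(crisis_probability(text), 3))
```

Messages whose probability exceeds a chosen threshold would be routed to a crisis specialist. Lowering that threshold favours sensitivity over PPV, the trade-off the reported metrics reflect.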
ABSTRACT
OBJECTIVE: While there are currently approaches to handle unstructured clinical data, such as manual abstraction and structured proxy variables, these methods may be time-consuming, not scalable, and imprecise. This article aims to determine whether selective prediction, which gives a model the option to abstain from generating a prediction, can improve the accuracy and efficiency of unstructured clinical data abstraction. MATERIALS AND METHODS: We trained selective classifiers (logistic regression, random forest, support vector machine) to extract 5 variables from clinical notes: depression (n = 1563), glioblastoma (GBM, n = 659), rectal adenocarcinoma (DRA, n = 601), abdominoperineal resection (APR, n = 601) of adenocarcinoma, and low anterior resection (LAR, n = 601) of adenocarcinoma. We varied the cost of false positives (FP), false negatives (FN), and abstained notes and measured total misclassification cost. RESULTS: The depression selective classifiers abstained on anywhere from 0% to 97% of notes, and the change in total misclassification cost ranged from -58% to 9%. Selective classifiers abstained on 5%-43% of notes across the GBM and colorectal cancer models. The GBM selective classifier abstained on 43% of notes, which led to improvements in sensitivity (0.94 to 0.96), specificity (0.79 to 0.96), PPV (0.89 to 0.98), and NPV (0.88 to 0.91) when compared to a non-selective classifier and when compared to structured proxy variables. DISCUSSION: We showed that selective classifiers outperformed both non-selective classifiers and structured proxy variables for extracting data from unstructured clinical notes. CONCLUSION: Selective prediction should be considered when abstaining is preferable to making an incorrect prediction.
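A minimal Python sketch of selective prediction and the total misclassification cost it is judged by is given below. The abstention rule (abstain when the predicted probability falls strictly between two thresholds), the threshold values, and the cost values are assumptions chosen for illustration; the abstract does not specify how the study's classifiers decided to abstain.

```python
def selective_predict(probability, lower=0.3, upper=0.7):
    """Return 1, 0, or None (abstain) for one predicted probability.

    Assumed rule: predict only when the model is confident, i.e. when the
    probability lies outside the open interval (lower, upper).
    """
    if probability >= upper:
        return 1
    if probability <= lower:
        return 0
    return None


def total_cost(probabilities, labels, lower, upper,
               cost_fp=1.0, cost_fn=1.0, cost_abstain=0.25):
    """Sum the costs of false positives, false negatives, and abstentions."""
    cost = 0.0
    for probability, label in zip(probabilities, labels):
        prediction = selective_predict(probability, lower, upper)
        if prediction is None:
            cost += cost_abstain  # e.g. the note goes to manual review
        elif prediction == 1 and label == 0:
            cost += cost_fp
        elif prediction == 0 and label == 1:
            cost += cost_fn
    return cost


# Invented probabilities and true labels for four notes.
probs = [0.95, 0.60, 0.45, 0.10]
truth = [1, 0, 1, 0]
print(total_cost(probs, truth, lower=0.5, upper=0.5))  # never abstains
print(total_cost(probs, truth, lower=0.3, upper=0.7))  # abstains when unsure
```

Setting `lower` equal to `upper` recovers an ordinary non-selective classifier, so the two calls compare the same model with and without the option to abstain. Whether abstaining pays off depends on how `cost_abstain` compares with the error costs, which is why the study varied all three.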
Subjects
Adenocarcinoma, Support Vector Machine, Humans, Logistic Models
ABSTRACT
In this work, we propose an approach to generate whole-slide image (WSI) tiles by using deep generative models infused with matched gene expression profiles. First, we train a variational autoencoder (VAE) that learns a latent, lower-dimensional representation of multi-tissue gene expression profiles. Then, we use this representation to infuse generative adversarial networks (GANs) that generate lung and brain cortex tissue tiles, resulting in a new model that we call RNA-GAN. Tiles generated by RNA-GAN were preferred by expert pathologists compared with tiles generated using traditional GANs, and in addition, RNA-GAN needs fewer training epochs to generate high-quality tiles. Finally, RNA-GAN was able to generalize to gene expression profiles outside of the training set, showing imputation capabilities. A web-based quiz is available for users to play a game distinguishing real and synthetic tiles: https://rna-gan.stanford.edu/, and the code for RNA-GAN is available here: https://github.com/gevaertlab/RNA-GAN.
Subjects
Brain, Transcriptome, Cerebral Cortex, Learning, RNA
ABSTRACT
Technological advances now make it possible to study a patient from multiple angles with high-dimensional, high-throughput multi-scale biomedical data. In oncology, massive amounts of data are being generated, ranging from molecular, histopathology, and radiology data to clinical records. The introduction of deep learning has significantly advanced the analysis of biomedical data. However, most approaches focus on single data modalities, leading to slow progress in methods to integrate complementary data types. Development of effective multimodal fusion approaches is becoming increasingly important, as a single modality might not be consistent and sufficient to capture the heterogeneity of complex diseases to tailor medical care and improve personalised medicine. Many initiatives now focus on integrating these disparate modalities to unravel the biological processes involved in multifactorial diseases such as cancer. However, many obstacles remain, including a lack of usable data as well as of methods for clinical validation and interpretation. Here, we cover these current challenges and reflect on opportunities through deep learning to tackle data sparsity and scarcity, multimodal interpretability, and standardisation of datasets.