ABSTRACT
Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: intensive care unit admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (cor(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively.
Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
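The reported quantity cor(Xu1, Zv1) is the correlation between two projected data views. The following is a minimal sketch of how such a value could be computed once sparse weight vectors are available; it does not fit the sparse CCA itself, and the toy matrices, the weight vectors, and the function names are hypothetical, not taken from the study.

```python
import math

def project(rows, weights):
    """Return one canonical variate: each row's dot product with the weights."""
    return [sum(x * w for x, w in zip(row, weights)) for row in rows]

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def canonical_correlation(X, Z, u, v):
    """cor(Xu, Zv) for two views X, Z (rows = patients) and weights u, v."""
    return pearson(project(X, u), project(Z, v))

# Hypothetical toy data: 4 patients, 3 features in view X, 2 in view Z.
X = [[1.0, 0.0, 2.0], [2.0, 1.0, 0.0], [3.0, 0.0, 1.0], [4.0, 2.0, 3.0]]
Z = [[2.0, 5.0], [4.0, 1.0], [6.0, 3.0], [8.0, 0.0]]
# Sparse weights: most entries are zero, so only a few features contribute.
u = [1.0, 0.0, 0.0]
v = [1.0, 0.0]

print(canonical_correlation(X, Z, u, v))
```

In a sparse CCA, the zeros in u and v are what make the result interpretable: only features with nonzero coefficients contribute to the correlated pair of variates.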
ABSTRACT
In this study, we develop a 3D beta variational autoencoder (beta-VAE) to advance lung cancer imaging analysis, countering the constraints of conventional radiomics methods. The autoencoder extracts information from public lung computed tomography (CT) datasets without additional labels. It reconstructs 3D lung nodule images with high quality (structural similarity: 0.774, peak signal-to-noise ratio: 26.1, and mean-squared error: 0.0008). The model effectively encodes lesion sizes in its latent embeddings: after dimensionality reduction with uniform manifold approximation and projection (UMAP), the embeddings correlate significantly with lesion size. Additionally, the beta-VAE can synthesize new lesions of varying sizes by manipulating the latent features. The model can predict multiple clinical endpoints, including pathological N stage or KRAS mutation status, on the Stanford radiogenomics lung cancer dataset. Comparisons with other methods show that the beta-VAE performs equally well in these tasks, suggesting its potential as a pretrained model for predicting patient outcomes in medical imaging.
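For orientation, a beta-VAE is generally trained on a reconstruction term plus a KL divergence term weighted by beta. The sketch below shows that generic objective on flat lists of numbers; the mean-squared-error reconstruction term, the value beta = 4.0, and the function names are assumptions for illustration and are not stated in the abstract.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, exp(logvar)) to N(0, 1), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Generic beta-VAE objective: reconstruction error + beta * KL.

    x, x_hat: flattened input and reconstruction (assumed MSE reconstruction).
    mu, logvar: encoder outputs describing the latent Gaussian.
    beta: hypothetical weight; beta = 1 recovers a standard VAE.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return recon + beta * kl_to_standard_normal(mu, logvar)

# Hypothetical example: one voxel reconstructed badly, one latent dimension.
print(beta_vae_loss(x=[1.0], x_hat=[0.0], mu=[1.0], logvar=[0.0], beta=2.0))
```

A larger beta pushes the latent distribution closer to the standard normal prior at the cost of reconstruction quality, which is the usual trade-off when tuning this kind of model.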
Subjects
Computer-Assisted Image Processing, Lung Neoplasms, Humans, Lung Neoplasms/diagnostic imaging, Mutation, Projection, Radiomics
ABSTRACT
Recent advances in artificial intelligence research have led to an increase in the development of algorithms for detecting malignancies from clinical and dermoscopic images of skin diseases. These methods are dependent on the collection of training and testing data. There are important considerations when acquiring skin images and data for translational artificial intelligence research. In this paper, we discuss the best practices and challenges for light photography image data collection, covering ethics, image acquisition, labeling, curation, and storage. The purpose of this work is to improve artificial intelligence for malignancy detection by supporting intentional data collection and collaboration between subject matter experts, such as dermatologists and data scientists.
Subjects
Melanoma, Skin Neoplasms, Humans, Artificial Intelligence, Skin Neoplasms/diagnostic imaging, Skin Neoplasms/pathology, Melanoma/pathology, Dermatologists, Dermoscopy/methods, Algorithms
ABSTRACT
Sequencing of the human genome in the early 2000s enabled probing of the genetic basis of disease on a scale previously unimaginable. Now, two decades later, after interrogating millions of markers in thousands of individuals, a significant portion of disease heritability still remains hidden. Recent efforts to unravel this 'missing heritability' have focused on garnering new insight from merging different data types, including medical imaging. Imaging offers promising intermediate phenotypes to bridge the gap between genetic variation and disease pathology. In this review we outline this fusion and provide examples of imaging genomics in a range of diseases, from oncology to cardiovascular and neurodegenerative disease. Finally, we discuss how ongoing revolutions in data science and sharing are primed to advance the field.
Subjects
Genetic Variation, Neurodegenerative Diseases, Humans, Genetic Predisposition to Disease, Imaging Genomics, Phenotype, Genome-Wide Association Study
ABSTRACT
Genomic methods have been valuable for identifying RNA-binding proteins (RBPs) and the genes, pathways, and processes they regulate. Nevertheless, standard motif descriptions cannot be used to predict all RNA targets or test quantitative models for cellular interactions and regulation. We present a complete thermodynamic model for RNA binding to the S. cerevisiae Pumilio protein PUF4 derived from direct binding data for 6180 RNAs measured using the RNA on a massively parallel array (RNA-MaP) platform. The PUF4 model is highly similar to that of the related RBPs, human PUM2 and PUM1, with one marked exception: a single favorable site of base flipping for PUF4, such that PUF4 preferentially binds to a non-contiguous series of residues. These results are foundational for developing and testing cellular models of RNA-RBP interactions and function, for engineering RBPs, and for understanding the biophysical nature of RBP binding and the evolutionary landscape of RNAs and RBPs.
Subjects
Saccharomyces cerevisiae Proteins, Saccharomyces cerevisiae, Fungal Proteins/metabolism, Humans, Protein Binding, RNA/metabolism, RNA-Binding Proteins/metabolism, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/metabolism, Thermodynamics
ABSTRACT
The involvement of immunoglobulin (Ig) G3 in the humoral immune response to SARS-CoV-2 infection has been implicated in the pathogenesis of acute respiratory distress syndrome (ARDS) in COVID-19. The exact molecular mechanism is unknown, but it is thought to involve this IgG subtype's differential ability to fix complement and stimulate cytokine release. We examined the binding of convalescent patient antibodies to immobilized nucleocapsids and spike proteins by matrix-assisted laser desorption/ionization-time of flight (MALDI-ToF) mass spectrometry. IgG3 was a major immunoglobulin found in all samples. Differential analysis of the spectral signatures found for the nucleocapsid versus the spike protein demonstrated that the predominant humoral immune response to the nucleocapsid was IgG3, whilst for the spike protein it was IgG1. However, the spike protein itself displayed a strong affinity for IgG3: it bound IgG3 from control plasma samples as well as from samples of individuals previously infected with SARS-CoV-2, similar to the way protein G binds IgG1. Furthermore, detailed spectral analysis indicated that a mass shift consistent with hyper-glycosylation or glycation was a characteristic of the IgG3 captured by the spike protein.
Subjects
COVID-19, Coronavirus Spike Glycoprotein, Viral Antibodies, Humans, Immunoglobulin G, Nucleocapsid, SARS-CoV-2
ABSTRACT
The immune response to SARS-CoV-2 infection requires antibody recognition of the spike protein. In a study designed to examine the molecular features of anti-spike and anti-nucleocapsid antibodies, patient plasma proteins binding to pre-fusion stabilised complete spike and nucleocapsid proteins were isolated and analysed by matrix-assisted laser desorption ionisation-time of flight (MALDI-ToF) mass spectrometry. Amongst the immunoglobulins, a high affinity for human serum albumin was evident in the anti-spike preparations. Careful mass comparison revealed the preferential capture of advanced glycation end product (AGE) forms of glycated human serum albumin by the pre-fusion spike protein. The ability of bacteria and viruses to surround themselves with serum proteins is a recognised immune evasion and pathogenic process. The preference of SARS-CoV-2 for AGE forms of glycated serum albumin may in part explain the severity and pathology of acute respiratory distress and the bias towards the elderly and those with (pre)diabetic and atherosclerotic/metabolic disease.
Subjects
COVID-19, Type 2 Diabetes Mellitus, Prediabetic State, Aged, Viral Antibodies, Humans, SARS-CoV-2, Serum Albumin, Human Serum Albumin, Coronavirus Spike Glycoprotein/metabolism
ABSTRACT
The relationship between DNA sequence, biochemical function, and molecular evolution is relatively well-described for protein-coding regions of genomes, but far less clear in noncoding regions, particularly in eukaryote genomes. In part, this is because we lack a complete description of the essential noncoding elements in a eukaryote genome. To contribute to this challenge, we used saturating transposon mutagenesis to interrogate the Schizosaccharomyces pombe genome. We generated 31 million transposon insertions, a theoretical coverage of 2.4 insertions per genomic site. We applied a five-state hidden Markov model (HMM) to distinguish insertion-depleted regions from insertion biases. Both raw insertion-density and HMM-defined fitness estimates showed significant quantitative relationships to gene knockout fitness, genetic diversity, divergence, and expected functional regions based on transcription and gene annotations. Through several analyses, we conclude that transposon insertions produced fitness effects in 66-90% of the genome, including substantial portions of the noncoding regions. Based on the HMM, we estimate that 10% of the insertion-depleted sites in the genome showed no signal of conservation between species and were weakly transcribed, demonstrating limitations of comparative genomics and transcriptomics to detect functional units. In this species, 3'- and 5'-untranslated regions were the most prominent insertion-depleted regions that were not represented in measures of constraint from comparative genomics. We conclude that the combination of transposon mutagenesis, evolutionary, and biochemical data can provide new insights into the relationship between genome function and molecular evolution.
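As a back-of-envelope illustration of why insertion-free stretches carry information at this coverage, suppose insertions landed uniformly and independently across sites. This is an idealisation: the abstract itself mentions insertion biases, which the HMM is meant to separate from true depletion. Under that assumed Poisson model, the chance of seeing no insertions falls off quickly with the length of the stretch. The function name and the run length of 10 are hypothetical.

```python
import math

INSERTIONS_PER_SITE = 2.4  # theoretical coverage reported in the abstract

def prob_empty_run(run_length, mean_per_site=INSERTIONS_PER_SITE):
    """Chance that run_length adjacent sites all receive zero insertions,
    assuming independent Poisson-distributed insertions (an idealisation)."""
    return math.exp(-mean_per_site * run_length)

p_single = prob_empty_run(1)
p_run_of_10 = prob_empty_run(10)
print(f"P(no insertion at one site) = {p_single:.3f}")
print(f"P(10 adjacent empty sites)  = {p_run_of_10:.2e}")
```

Under this simplified model roughly 9% of individual sites would be empty by chance, so single empty sites are weak evidence, whereas longer empty runs are very unlikely without selection or insertion bias.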
Subjects
Genetic Fitness, Fungal Genome, Schizosaccharomyces/genetics, Genetic Models, Insertional Mutagenesis
ABSTRACT
The TGFβ pathway has essential roles in embryonic development, organ homeostasis, tissue repair and disease. These diverse effects are mediated through the intracellular effectors SMAD2 and SMAD3 (hereafter SMAD2/3), whose canonical function is to control the activity of target genes by interacting with transcriptional regulators. Therefore, a complete description of the factors that interact with SMAD2/3 in a given cell type would have broad implications for many areas of cell biology. Here we describe the interactome of SMAD2/3 in human pluripotent stem cells. This analysis reveals that SMAD2/3 is involved in multiple molecular processes in addition to its role in transcription. In particular, we identify a functional interaction with the METTL3-METTL14-WTAP complex, which mediates the conversion of adenosine to N6-methyladenosine (m6A) on RNA. We show that SMAD2/3 promotes binding of the m6A methyltransferase complex to a subset of transcripts involved in early cell fate decisions. This mechanism destabilizes specific SMAD2/3 transcriptional targets, including the pluripotency factor gene NANOG, priming them for rapid downregulation upon differentiation to enable timely exit from pluripotency. Collectively, these findings reveal the mechanism by which extracellular signalling can induce rapid cellular responses through regulation of the epitranscriptome. These aspects of TGFβ signalling could have far-reaching implications in many other cell types and in diseases such as cancer.
Subjects
Adenosine/analogs & derivatives, Cell Differentiation/genetics, Pluripotent Stem Cells/metabolism, Messenger RNA/metabolism, Smad2 Protein/metabolism, Smad3 Protein/metabolism, Transforming Growth Factor beta/metabolism, Activins/metabolism, Adenosine/metabolism, Animals, Cell Cycle Proteins, Genetic Epigenesis, Humans, Methylation, Methyltransferases/chemistry, Methyltransferases/metabolism, Multiprotein Complexes/chemistry, Multiprotein Complexes/metabolism, Nanog Homeobox Protein/metabolism, Nodal Protein/metabolism, Nuclear Proteins/metabolism, Pluripotent Stem Cells/cytology, Protein Binding, RNA Splicing Factors, Messenger RNA/chemistry, Messenger RNA/genetics, Signal Transduction, Transcriptome
ABSTRACT
In this work, we investigate chemo-thermotherapy, a recently clinically approved post-surgery treatment of non-muscle-invasive urothelial bladder carcinoma. We developed a mathematical model and numerically simulated the physical processes related to this treatment. The model is based on the conductive Maxwell's equations, used to simulate the therapy administration, and the convection-diffusion equation for an incompressible fluid, used to study heat propagation through the bladder tissue. The model parameters correspond to the data provided by the thermotherapy device manufacturer. We base our computational domain on a CT image of a human bladder. Our numerical simulations can be applied to further research on the effects of chemo-thermotherapy on the bladder and surrounding tissues and to treatment personalization, in order to maximize the effect of the therapy while avoiding burning of the bladder.
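For readers unfamiliar with the heat-transport part of such a model, the generic textbook form of the convection-diffusion equation for temperature in an incompressible fluid is shown below. This is only the standard form, not necessarily the exact formulation used in the work: the symbols T (temperature), u (fluid velocity), and alpha (thermal diffusivity, assumed constant) are assumptions, and the heat source that would couple this equation to the electromagnetic field is omitted.

```latex
\frac{\partial T}{\partial t} + \mathbf{u} \cdot \nabla T = \alpha \nabla^{2} T,
\qquad \nabla \cdot \mathbf{u} = 0
```

The first equation balances advection of heat by the fluid against diffusion, and the second expresses incompressibility of the flow.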