ABSTRACT
The relationship between DNA sequence, biochemical function, and molecular evolution is relatively well described for protein-coding regions of genomes but far less clear for noncoding regions, particularly in eukaryotic genomes. In part, this is because we lack a complete description of the essential noncoding elements in a eukaryotic genome. To help address this challenge, we used saturating transposon mutagenesis to interrogate the Schizosaccharomyces pombe genome. We generated 31 million transposon insertions, a theoretical coverage of 2.4 insertions per genomic site. We applied a five-state hidden Markov model (HMM) to distinguish insertion-depleted regions from insertion biases. Both raw insertion density and HMM-defined fitness estimates showed significant quantitative relationships to gene knockout fitness, genetic diversity, divergence, and expected functional regions based on transcription and gene annotations. Through several analyses, we conclude that transposon insertions produced fitness effects in 66-90% of the genome, including substantial portions of the noncoding regions. Based on the HMM, we estimate that 10% of the insertion-depleted sites in the genome showed no signal of conservation between species and were weakly transcribed, demonstrating the limitations of comparative genomics and transcriptomics for detecting functional units. In this species, 3'- and 5'-untranslated regions were the most prominent insertion-depleted regions that were not represented in measures of constraint from comparative genomics. We conclude that the combination of transposon mutagenesis, evolutionary, and biochemical data can provide new insights into the relationship between genome function and molecular evolution.
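The reported coverage of 2.4 insertions per genomic site can be put in context with a back-of-the-envelope calculation. The sketch below is not the study's five-state HMM. It assumes, purely for illustration, that insertions land uniformly and independently across sites (a Poisson model), and the function names are hypothetical. Under that assumption a single empty site is unremarkable, whereas a long run of empty sites is very unlikely to arise by chance.

```python
import math


def poisson_zero_fraction(mean_insertions_per_site: float) -> float:
    """Probability that one site receives zero insertions.

    Assumes insertions follow a Poisson distribution with the given mean
    (an illustrative assumption; real insertion data show site biases).
    """
    return math.exp(-mean_insertions_per_site)


def prob_empty_run(mean_insertions_per_site: float, run_length: int) -> float:
    """Probability that `run_length` consecutive sites are all empty.

    Assumes sites are independent, which is also an idealization.
    """
    return math.exp(-mean_insertions_per_site * run_length)


if __name__ == "__main__":
    coverage = 2.4  # insertions per genomic site, as reported in the abstract
    print(f"P(single site empty)  = {poisson_zero_fraction(coverage):.4f}")
    print(f"P(10 sites all empty) = {prob_empty_run(coverage, 10):.3e}")
```

Because real transposon insertions are not uniform, this idealized calculation only motivates why a model that separates insertion bias from genuine depletion is needed; it does not reproduce the authors' estimates.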
Subjects
Genetic Fitness, Fungal Genome, Schizosaccharomyces/genetics, Genetic Models, Insertional Mutagenesis
ABSTRACT
In this study, we develop a 3D beta variational autoencoder (beta-VAE) to advance lung cancer imaging analysis and address the constraints of conventional radiomics methods. The autoencoder extracts information from public lung computed tomography (CT) datasets without additional labels. It reconstructs 3D lung nodule images with high quality (structural similarity: 0.774, peak signal-to-noise ratio: 26.1, and mean-squared error: 0.0008). The model effectively encodes lesion size in its latent embeddings: after dimensionality reduction with uniform manifold approximation and projection (UMAP), the embeddings correlate significantly with lesion size. Additionally, the beta-VAE can synthesize new lesions of varying sizes by manipulating the latent features. The model can predict multiple clinical endpoints, including pathological N stage or KRAS mutation status, on the Stanford radiogenomics lung cancer dataset. Comparisons with other methods show that the beta-VAE performs equally well in these tasks, suggesting its potential as a pretrained model for predicting patient outcomes in medical imaging.
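For readers unfamiliar with the beta-VAE, the standard training objective is a reconstruction term plus a KL-divergence term weighted by beta. The sketch below shows that generic objective on flat lists of numbers. It is not the authors' implementation: the use of mean-squared error as the reconstruction term, the default `beta=4.0`, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import math
from typing import Sequence


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean-squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL divergence from N(mu, diag(exp(logvar))) to the standard normal N(0, I)."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


def beta_vae_loss(
    x: Sequence[float],
    x_hat: Sequence[float],
    mu: Sequence[float],
    logvar: Sequence[float],
    beta: float = 4.0,  # hypothetical weight; the abstract does not report it
) -> float:
    """Generic beta-VAE objective: reconstruction error + beta * KL term."""
    return mse(x, x_hat) + beta * gaussian_kl(mu, logvar)


if __name__ == "__main__":
    # Toy values standing in for voxel intensities and latent statistics.
    loss = beta_vae_loss(
        x=[0.0, 1.0], x_hat=[0.1, 0.9], mu=[0.5, -0.5], logvar=[0.0, 0.0]
    )
    print(f"toy beta-VAE loss: {loss:.4f}")
```

Setting `beta` to 1 recovers the ordinary VAE objective; larger values weight the KL term more heavily, which is the usual mechanism for encouraging more disentangled latent features.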
Subjects
Computer-Assisted Image Processing, Lung Neoplasms, Humans, Lung Neoplasms/diagnostic imaging, Mutation, Projection, Radiomics
ABSTRACT
Sequencing of the human genome in the early 2000s enabled probing of the genetic basis of disease on a scale previously unimaginable. Now, two decades later, after interrogating millions of markers in thousands of individuals, a significant portion of disease heritability still remains hidden. Recent efforts to unravel this 'missing heritability' have focused on garnering new insight from merging different data types, including medical imaging. Imaging offers promising intermediate phenotypes to bridge the gap between genetic variation and disease pathology. In this review we outline this fusion and provide examples of imaging genomics in a range of diseases, from oncology to cardiovascular and neurodegenerative disease. Finally, we discuss how ongoing revolutions in data science and sharing are primed to advance the field.
Subjects
Genetic Variation, Neurodegenerative Diseases, Humans, Genetic Predisposition to Disease, Imaging Genomics, Phenotype, Genome-Wide Association Study
ABSTRACT
Recent advances in artificial intelligence research have led to an increase in the development of algorithms for detecting malignancies from clinical and dermoscopic images of skin diseases. These methods are dependent on the collection of training and testing data. There are important considerations when acquiring skin images and data for translational artificial intelligence research. In this paper, we discuss the best practices and challenges for light photography image data collection, covering ethics, image acquisition, labeling, curation, and storage. The purpose of this work is to improve artificial intelligence for malignancy detection by supporting intentional data collection and collaboration between subject matter experts, such as dermatologists and data scientists.