RESUMEN
Copy number variants (CNVs)-gains and losses of genomic sequences-are an important source of genetic variation underlying rapid adaptation and genome evolution. However, despite their central role in evolution little is known about the factors that contribute to the structure, size, formation rate, and fitness effects of adaptive CNVs. Local genomic sequences are likely to be an important determinant of these properties. Whereas it is known that point mutation rates vary with genomic location and local DNA sequence features, the role of genome architecture in the formation, selection, and the resulting evolutionary dynamics of CNVs is poorly understood. Previously, we have found that the GAP1 gene in Saccharomyces cerevisiae undergoes frequent and repeated amplification and selection under long-term experimental evolution in glutamine-limiting conditions. The GAP1 gene has a unique genomic architecture consisting of two flanking long terminal repeats (LTRs) and a proximate origin of DNA replication (autonomously replicating sequence, ARS), which are likely to promote rapid GAP1 CNV formation. To test the role of these genomic elements on CNV-mediated adaptive evolution, we performed experimental evolution in glutamine-limited chemostats using engineered strains lacking either the adjacent LTRs, ARS, or all elements. Using a CNV reporter system and neural network simulation-based inference (nnSBI) we quantified the formation rate and fitness effect of CNVs for each strain. We find that although GAP1 CNVs repeatedly form and sweep to high frequency in strains with modified genome architecture, removal of local DNA elements significantly impacts the rate and fitness effect of CNVs and the rate of adaptation. We performed genome sequence analysis to define the molecular mechanisms of CNV formation for 177 CNV lineages. We find that across all four strain backgrounds, between 26% and 80% of all GAP1 CNVs are mediated by Origin Dependent Inverted Repeat Amplification (ODIRA) which results from template switching between the leading and lagging strand during DNA synthesis. In the absence of the local ARS, a distal ARS can mediate CNV formation via ODIRA. In the absence of local LTRs, homologous recombination mechanisms still mediate gene amplification following de novo insertion of retrotransposon elements at the locus. Our study demonstrates the remarkable plasticity of the genome and reveals that template switching during DNA replication is a frequent source of adaptive CNVs.
RESUMEN
Cells respond to environmental and developmental stimuli by remodeling their transcriptomes through regulation of both mRNA transcription and mRNA decay. A central goal of biology is identifying the global set of regulatory relationships between factors that control mRNA production and degradation and their target transcripts and construct a predictive model of gene expression. Regulatory relationships are typically identified using transcriptome measurements and causal inference algorithms. RNA kinetic parameters are determined experimentally by employing run-on or metabolic labeling (e.g. 4-thiouracil) methods that allow transcription and decay rates to be separately measured. Here, we develop a deep learning model, trained with single-cell RNA-seq data, that both infers causal regulatory relationships and estimates RNA kinetic parameters. The resulting in silico model predicts future gene expression states and can be perturbed to simulate the effect of transcription factor changes. We acquired model training data by sequencing the transcriptomes of 175,000 individual Saccharomyces cerevisiae cells that were subject to an external perturbation and continuously sampled over a one hour period. The rate of change for each transcript was calculated on a per-cell basis to estimate RNA velocity. We then trained a deep learning model with transcriptome and RNA velocity data to calculate time-dependent estimates of mRNA production and decay rates. By separating RNA velocity into transcription and decay rates, we show that rapamycin treatment causes existing ribosomal protein transcripts to be rapidly destabilized, while production of new transcripts gradually slows over the course of an hour. The neural network framework we present is designed to explicitly model causal regulatory relationships between transcription factors and their genes, and shows superior performance to existing models on the basis of recovery of known regulatory relationships. We validated the predictive power of the model by perturbing transcription factors in silico and comparing transcriptome-wide effects with experimental data. Our study represents the first step in constructing a complete, predictive, biophysical model of gene expression regulation.
RESUMEN
Large-scale genomic changes, including copy number variations (CNVs), are frequently observed in long-term evolution experiments (LTEEs). We have previously reported the detection of recurrent CNVs in Saccharomyces cerevisiae populations adapting to glutamine-limited conditions over hundreds of generations. Here, we present the whole-genome sequencing (WGS) assemblies of 7 LTEE strains and their ancestor.