ABSTRACT
The 3' end of a gene, often called a terminator, modulates mRNA stability, localization, translation, and polyadenylation. Here, we adapted Plant STARR-seq, a massively parallel reporter assay, to measure the activity of over 50,000 terminators from the plants Arabidopsis thaliana and Zea mays. We characterize thousands of plant terminators, including many that outperform bacterial terminators commonly used in plants. Terminator activity is species-specific, differing in tobacco leaf and maize protoplast assays. While recapitulating known biology, our results reveal the relative contributions of polyadenylation motifs to terminator strength. We built a computational model to predict terminator strength and used it to conduct in silico evolution that generated optimized synthetic terminators. Additionally, we discover alternative polyadenylation sites across tens of thousands of terminators; however, the strongest terminators tend to have a dominant cleavage site. Our results establish features of plant terminator function and identify strong naturally occurring and synthetic terminators.
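The in silico evolution step mentioned above can be illustrated with a minimal sketch. It is not the authors' model or procedure: `toy_score` is a hypothetical stand-in for a trained predictor (it only counts the canonical AATAAA polyadenylation signal), and the greedy single-substitution search in `evolve` is one simple assumed strategy for climbing a model's predicted score.

```python
BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical scoring function standing in for a trained model.

    Counts non-overlapping occurrences of the AATAAA motif; a real
    predictor would be a learned model, not a motif count.
    """
    return float(seq.count("AATAAA"))


def evolve(seq: str, score, rounds: int = 3):
    """Greedy in silico evolution by single-base substitution.

    Each round scores every single-base variant of the current sequence
    and keeps the best one, stopping early when no variant improves.
    Returns the final sequence and its score.
    """
    current, best = seq, score(seq)
    for _ in range(rounds):
        candidates = [
            current[:i] + base + current[i + 1:]
            for i in range(len(current))
            for base in BASES
            if base != current[i]
        ]
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score <= best:
            break  # no single substitution improves the score
        current, best = top, top_score
    return current, best


evolved, evolved_score = evolve("AATCAACCGG", toy_score)
```

Swapping `toy_score` for any callable that maps a sequence to a number leaves the search loop unchanged, which is the main reason to keep the scorer as a parameter.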
Subject(s)
Arabidopsis; Polyadenylation; Zea mays; Zea mays/genetics; Zea mays/metabolism; Arabidopsis/genetics; Arabidopsis/metabolism; Gene Expression Regulation, Plant; Terminator Regions, Genetic/genetics; Nicotiana/genetics; Nicotiana/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism
ABSTRACT
Clonal propagation is frequently used in commercial plant breeding and biotechnology programs because it minimizes genetic variation, yet it is not uncommon to observe clonal plants with stable phenotypic changes, a phenomenon known as somaclonal variation. Several studies have linked epigenetic modifications induced during regeneration with this newly acquired phenotypic variation. However, the factors that determine the extent of somaclonal variation and the molecular changes underpinning this process remain poorly understood. To address this gap in our knowledge, we compared clonally propagated Arabidopsis thaliana plants derived from somatic embryogenesis using two different embryonic transcription factors, RWP-RK DOMAIN-CONTAINING 4 (RKD4) or LEAFY COTYLEDON2 (LEC2), and from two epigenetically distinct founder tissues. We found that both the (epi)genetic status of the explant and the regeneration protocol employed play critical roles in shaping the molecular and phenotypic landscape of clonal plants. Phenotypic variation in regenerated plants can be largely explained by the inheritance of tissue-specific DNA methylation imprints, which are associated with specific transcriptional and metabolic changes in sexual progeny of clonal plants. For instance, regenerants were particularly affected by the inheritance of root-specific epigenetic imprints, which were associated with an increased accumulation of salicylic acid in leaves and accelerated plant senescence. Collectively, our data reveal specific pathways underpinning the phenotypic and molecular variation that arises and accumulates in clonal plant populations.
Subject(s)
Epigenomics; Transcription Factors; Transcription Factors/genetics
ABSTRACT
Accessible chromatin regions (ACRs) are critical components of gene regulation, but modeling them directly from sequence remains challenging, especially in plants, whose mechanisms of chromatin remodeling are less well understood than those of animals. We trained an existing deep-learning architecture, DanQ, on data from 12 angiosperm species to predict the leaf chromatin accessibility of sequence windows within and across species. We also trained DanQ on DNA methylation data from 10 angiosperms because unmethylated regions have been shown to overlap significantly with ACRs in some plants. The across-species models have comparable or even superior performance to a model trained within species, suggesting strong conservation of chromatin mechanisms across angiosperms. Testing a maize (Zea mays L.) held-out model on a multi-tissue chromatin accessibility panel revealed that our models are best at predicting constitutively accessible chromatin regions, with diminishing performance as cell-type specificity increases. Using a combination of interpretation methods, we ranked JASPAR motifs by their importance to each model and saw that the TCP and AP2/ERF transcription factor (TF) families consistently ranked highly. We embedded the top three JASPAR motifs for each model at all possible positions on both strands in our sequence window and observed position- and strand-specific patterns in their importance to the model. With our publicly available across-species 'a2z' model, it is now feasible to predict the chromatin accessibility and methylation landscape of any angiosperm genome.
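The motif-embedding experiment described above (placing a motif at every possible position on both strands of a sequence window) can be sketched as follows. This is an illustrative assumption rather than the authors' code: the motif is represented as a plain consensus string instead of a JASPAR position weight matrix, and `embed_motif` and `reverse_complement` are hypothetical helper names. Each generated sequence would then be scored by a trained model to obtain a position- and strand-resolved importance profile.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def embed_motif(background: str, motif: str):
    """Yield (position, strand, sequence) for every possible placement.

    The motif overwrites the background at each start position, first as
    given ("+") and then as its reverse complement ("-"), so every
    output has the same length as the background window.
    """
    placements = (("+", motif), ("-", reverse_complement(motif)))
    for strand, oriented in placements:
        for pos in range(len(background) - len(oriented) + 1):
            yield (
                pos,
                strand,
                background[:pos] + oriented + background[pos + len(oriented):],
            )


variants = list(embed_motif("AAAAAAAA", "GGC"))
```

Overwriting rather than inserting keeps the window length fixed, which matters for models that expect a constant input size.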
Subject(s)
Chromatin; Magnoliopsida; Animals; Genome; Magnoliopsida/genetics; Neural Networks, Computer; Transcription Factors/genetics; Zea mays/genetics
ABSTRACT
Targeted engineering of plant gene expression holds great promise for ensuring food security and for producing biopharmaceuticals in plants. However, this engineering requires thorough knowledge of cis-regulatory elements to precisely control either endogenous or introduced genes. To generate this knowledge, we used a massively parallel reporter assay to measure the activity of nearly complete sets of promoters from Arabidopsis, maize and sorghum. We demonstrate that core promoter elements (notably the TATA box), as well as promoter GC content and promoter-proximal transcription factor binding sites, influence promoter strength. By performing the experiments in two assay systems, leaves of the dicot tobacco and protoplasts of the monocot maize, we detect species-specific differences in the contributions of GC content and transcription factors to promoter strength. Using these observations, we built computational models to predict promoter strength in both assay systems, allowing us to design highly active promoters comparable in activity to the viral 35S minimal promoter. Our results establish a promising experimental approach to optimize native promoter elements and generate synthetic ones with desirable features.
Subject(s)
Arabidopsis/genetics; Promoter Regions, Genetic; Sorghum/genetics; Zea mays/genetics; 5' Untranslated Regions; Binding Sites; Enhancer Elements, Genetic; Gene Expression Regulation, Plant; Genes, Reporter; Genetic Techniques; Genome, Plant; Light; Plant Leaves/genetics; Plants, Genetically Modified; Regulatory Sequences, Nucleic Acid; TATA Box; Nicotiana/genetics
ABSTRACT
In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed is reported in this manuscript.
ABSTRACT
BACKGROUND: Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated. Inexpensive and deep sequencing technology has made it affordable to apply population genetic methods to whole genomes with methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging due to the repetitive nature of these sequences, which hampers both the sensitivity and specificity of analysis tools. METHODS: We have developed the tool RelocaTE2 for identification of TE insertion sites at high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second-generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects target site duplication (TSD) of TE insertions, allowing it to report TE polymorphism loci with single base pair precision. RESULTS AND DISCUSSION: The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when the sequence coverage is not shallow. In comparison to other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity. In particular, RelocaTE2 performs best in prediction of TSDs for TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with less than 0.1% false positive rate using 10-fold genome coverage resequencing data.
RelocaTE2 provides a robust solution to identify TE insertion sites and can be incorporated into analysis workflows in support of describing the complete genotype from light coverage genome sequencing.
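The role of target site duplication in locating an insertion with single base pair precision can be illustrated with a minimal sketch. It is not RelocaTE2's implementation: `simulate_insertion` and `infer_tsd` are hypothetical helpers, coordinates are assumed to be 0-based and half-open, and the flank alignments are assumed to be already known. The idea is that the genomic part of a read crossing the upstream TE junction ends at the end of the TSD, while the genomic part of a read crossing the downstream junction starts at the start of the TSD, so the two flanks overlap on the reference by exactly the duplicated bases.

```python
def simulate_insertion(reference: str, site: int, tsd_len: int, te: str) -> str:
    """Insert `te` at `site`, duplicating reference[site:site + tsd_len].

    The result has the layout: upstream + TSD + TE + TSD + downstream.
    """
    tsd = reference[site:site + tsd_len]
    return reference[:site + tsd_len] + te + tsd + reference[site + tsd_len:]


def infer_tsd(reference: str, left_flank_end: int, right_flank_start: int):
    """Infer the TSD from where the two junction flanks align.

    left_flank_end: exclusive reference end of the flank upstream of the TE.
    right_flank_start: reference start of the flank downstream of the TE.
    Returns (start, end, tsd_sequence), or None if the flanks do not overlap.
    """
    if right_flank_start >= left_flank_end:
        return None  # no overlap, so no duplication is supported
    return (
        right_flank_start,
        left_flank_end,
        reference[right_flank_start:left_flank_end],
    )


reference = "ACGTTGCAAGGCTA"
mutated = simulate_insertion(reference, site=4, tsd_len=3, te="NNNN")
tsd_call = infer_tsd(reference, left_flank_end=7, right_flank_start=4)
```

In this toy case the upstream flank aligns to reference positions ending at 7 and the downstream flank starts at 4, so the overlap recovers the three duplicated bases and pins the insertion to a single locus.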