ABSTRACT
The cistrome is the complete set of transcription factor (TF) binding sites (cis-elements) in an organism, while an epicistrome incorporates tissue-specific DNA chemical modifications and TF-specific chemical sensitivities into these binding profiles. Robust methods to construct comprehensive cistrome and epicistrome maps are critical for elucidating complex transcriptional networks that underlie growth, behavior, and disease. Here, we describe DNA affinity purification sequencing (DAP-seq), a high-throughput TF binding site discovery method that interrogates genomic DNA with in-vitro-expressed TFs. Using DAP-seq, we defined the Arabidopsis cistrome by resolving motifs and peaks for 529 TFs. Because genomic DNA used in DAP-seq retains 5-methylcytosines, we determined that >75% (248/327) of Arabidopsis TFs surveyed were methylation sensitive, a property that strongly impacts the epicistrome landscape. DAP-seq datasets also yielded insight into the biology and binding site architecture of numerous TFs, demonstrating the value of DAP-seq for cost-effective cistromic and epicistromic annotation in any organism.
Subject(s)
Arabidopsis/genetics , Plant DNA/genetics , Plant Genome , Response Elements , DNA Sequence Analysis/methods , Transcription Factors/metabolism , Amino Acid Motifs , Plant DNA/metabolism , Genetic Epigenesis , Indoleacetic Acids/metabolism , Plant Proteins/genetics
ABSTRACT
Sun-loving plants have the ability to detect and avoid shading through sensing of both blue and red light wavelengths. Higher plant cryptochromes (CRYs) control how plants modulate growth in response to changes in blue light. For growth under a canopy, where blue light is diminished, CRY1 and CRY2 perceive this change and respond by directly contacting two bHLH transcription factors, PIF4 and PIF5. These factors are also known to be controlled by phytochromes, the red/far-red photoreceptors; however, transcriptome analyses indicate that the gene regulatory programs induced by the different light wavelengths are distinct. Our results indicate that CRYs signal by modulating PIF activity genome wide and that these factors integrate binding of different plant photoreceptors to facilitate growth changes under different light conditions.
Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Basic Helix-Loop-Helix Transcription Factors/metabolism , Cryptochromes/metabolism , Arabidopsis/growth & development , Arabidopsis/radiation effects , Gene Expression , Hypocotyl/growth & development , Light , Phytochrome B/metabolism
ABSTRACT
The epigenome orchestrates genome accessibility, functionality, and three-dimensional structure. Because epigenetic variation can impact transcription and thus phenotypes, it may contribute to adaptation. Here, we report 1,107 high-quality single-base resolution methylomes and 1,203 transcriptomes from the 1001 Genomes collection of Arabidopsis thaliana. Although the genetic basis of methylation variation is highly complex, geographic origin is a major predictor of genome-wide DNA methylation levels and of altered gene expression caused by epialleles. Comparison to cistrome and epicistrome datasets identifies associations between transcription factor binding sites, methylation, nucleotide variation, and co-expression modules. Physical maps for nine of the most diverse genomes reveal how transposons and other structural variants shape the epigenome, with dramatic effects on immunity genes. The 1001 Epigenomes Project provides a comprehensive resource for understanding how variation in DNA methylation contributes to molecular and non-molecular phenotypes in natural populations of the most studied model plant.
Subject(s)
Arabidopsis/genetics , Genetic Epigenesis , DNA Methylation , Epigenomics , Plant Gene Expression Regulation , Plant Genome , Transcriptome
ABSTRACT
Variegation is a rare type of mosaicism not fully studied in plants, especially fruits. We examined red and white sections of grape (Vitis vinifera cv. 'Béquignol') variegated berries and found that accumulation of products from branches of the phenylpropanoid and isoprenoid pathways showed an opposite tendency. Light-responsive flavonol and monoterpene levels increased in anthocyanin-depleted areas in correlation with increasing MYB24 expression. Cistrome analysis suggested that MYB24 binds to the promoters of 22 terpene synthase (TPS) genes, as well as 32 photosynthesis/light-related genes, including carotenoid pathway members, the flavonol regulator HY5 HOMOLOGUE (HYH), and other radiation response genes. Indeed, TPS35, TPS09, the carotenoid isomerase gene CRTISO2, and HYH were activated in the presence of MYB24 and MYC2. We suggest that MYB24 modulates ultraviolet and high-intensity visible light stress responses that include terpene and flavonol synthesis and potentially affects carotenoids. The MYB24 regulatory network is developmentally triggered after the onset of berry ripening, while the absence of anthocyanin sunscreens accelerates its activation, likely in a dose-dependent manner due to increased radiation exposure. Anthocyanins and flavonols in variegated berry skins act as effective sunscreens but for different wavelength ranges. The expression patterns of stress marker genes in red and white sections of 'Béquignol' berries strongly suggest that MYB24 promotes light stress amelioration but only partly succeeds during late ripening.
Subject(s)
Vitis , Vitis/genetics , Vitis/metabolism , Anthocyanins/metabolism , Fruit/genetics , Fruit/metabolism , Terpenes/metabolism , Sunscreening Agents , Flavonols/metabolism , Carotenoids/metabolism , Plant Gene Expression Regulation
ABSTRACT
Gene regulatory networks (GRNs) drive organism structure and functions, so the discovery and characterization of GRNs is a major goal in biological research. However, accurate identification of causal regulatory connections and inference of GRNs using gene expression datasets, more recently from single-cell RNA-seq (scRNA-seq), has been challenging. Here we employ the innovative method of Causal Inference Using Composition of Transactions (CICT) to uncover GRNs from scRNA-seq data. The basis of CICT is that if all gene expressions were random, a non-random regulatory gene should induce its targets at levels different from the background random process, resulting in distinct patterns in the whole relevance network of gene-gene associations. CICT proposes novel network features derived from a relevance network, which enable any machine learning algorithm to predict causal regulatory edges and infer GRNs. We evaluated CICT using simulated and experimental scRNA-seq data in a well-established benchmarking pipeline and showed that CICT outperformed existing network inference methods representing diverse approaches with many-fold higher accuracy. Furthermore, we demonstrated that GRN inference with CICT was robust to different levels of sparsity in scRNA-seq data, the characteristics of data and ground truth, the choice of association measure and the complexity of the supervised machine learning algorithm. Our results suggest that aiming to predict causality directly, in order to recover regulatory relationships in complex biological networks, substantially improves accuracy in GRN inference.
Subject(s)
Algorithms , Gene Regulatory Networks , Gene Expression
ABSTRACT
The stilbenoid pathway is responsible for the production of resveratrol in grapevine (Vitis vinifera L.). A few transcription factors (TFs) have been identified as regulators of this pathway but the extent of this control has not been deeply studied. Here we show how DNA affinity purification sequencing (DAP-Seq) allows for the genome-wide TF-binding site interrogation in grape. We obtained 5190 and 4443 binding events assigned to 4041 and 3626 genes for MYB14 and MYB15, respectively (approximately 40% of peaks located within −10 kb of transcription start sites). DAP-Seq of MYB14/MYB15 was combined with aggregate gene co-expression networks (GCNs) built from more than 1400 transcriptomic datasets from leaves, fruits, and flowers to narrow down bound genes to a set of high confidence targets. The analysis of MYB14, MYB15, and MYB13, a third uncharacterized member of Subgroup 2 (S2), showed that in addition to the few previously known stilbene synthase (STS) targets, these regulators bind to 30 of 47 STS family genes. Moreover, all three MYBs bind to several PAL, C4H, and 4CL genes, in addition to shikimate pathway genes, the WRKY03 stilbenoid co-regulator and resveratrol-modifying gene candidates among which ROMT2-3 were validated enzymatically. A high proportion of DAP-Seq bound genes were induced in the activated transcriptomes of transient MYB15-overexpressing grapevine leaves, validating our methodological approach for delimiting TF targets. Overall, Subgroup 2 R2R3-MYBs appear to play a key role in binding and directly regulating several primary and secondary metabolic steps leading to an increased flux towards stilbenoid production. The integration of DAP-Seq and reciprocal GCNs offers a rapid framework for gene function characterization using genome-wide approaches in the context of non-model plant species and stands up as a valid first approach for identifying gene regulatory networks of specialized metabolism.
Subject(s)
Plant Gene Expression Regulation , Stilbenes , Plant Gene Expression Regulation/genetics , Gene Regulatory Networks , Plant Proteins/genetics , Plant Proteins/metabolism , Shikimic Acid , Stilbenes/metabolism
ABSTRACT
Protein microarrays enable investigation of diverse biochemical properties for thousands of proteins in a single experiment, an unparalleled capacity. Using a high-density system called HaloTag nucleic acid programmable protein array (HaloTag-NAPPA), we created high-density protein arrays comprising 12,000 Arabidopsis ORFs. We used these arrays to query protein-protein interactions for a set of 38 transcription factors and transcriptional regulators (TFs) that function in diverse plant hormone regulatory pathways. The resulting transcription factor interactome network, TF-NAPPA, contains thousands of novel interactions. Validation in a benchmarked in vitro pull-down assay revealed that a random subset of TF-NAPPA validated at the same rate of 64% as a positive reference set of literature-curated interactions. Moreover, using a bimolecular fluorescence complementation (BiFC) assay, we confirmed in planta several interactions of biological interest and determined the interaction localizations for seven pairs. The application of HaloTag-NAPPA technology to plant hormone signaling pathways allowed the identification of many novel transcription factor-protein interactions and led to the development of a proteome-wide plant hormone TF interactome network.
Subject(s)
Arabidopsis Proteins/metabolism , Plant Growth Regulators/metabolism , Transcription Factors/metabolism , Arabidopsis/metabolism , Protein Array Analysis , Protein Interaction Mapping
ABSTRACT
Cellular signal transduction generally involves cascades of post-translational protein modifications that rapidly catalyze changes in protein-DNA interactions and gene expression. High-throughput measurements are improving our ability to study each of these stages individually, but do not capture the connections between them. Here we present an approach for building a network of physical links among these data that can be used to prioritize targets for pharmacological intervention. Our method recovers the critical missing links between proteomic and transcriptional data by relating changes in chromatin accessibility to changes in expression and then uses these links to connect proteomic and transcriptome data. We applied our approach to integrate epigenomic, phosphoproteomic and transcriptome changes induced by the variant III mutation of the epidermal growth factor receptor (EGFRvIII) in a cell line model of glioblastoma multiforme (GBM). To test the relevance of the network, we used small molecules to target highly connected nodes implicated by the network model that were not detected by the experimental data in isolation and we found that a large fraction of these agents alter cell viability. Among these are two compounds, ICG-001, targeting CREB binding protein (CREBBP), and PKF118-310, targeting β-catenin (CTNNB1), which have not been tested previously for effectiveness against GBM. At the level of transcriptional regulation, we used chromatin immunoprecipitation sequencing (ChIP-Seq) to experimentally determine the genome-wide binding locations of p300, a transcriptional co-regulator highly connected in the network. Analysis of p300 target genes suggested its role in tumorigenesis. We propose that this general method, in which experimental measurements are used as constraints for building regulatory networks from the interactome while taking into account noise and missing data, should be applicable to a wide range of high-throughput datasets.
Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Oncogenes , Protein Interaction Maps , Signal Transduction , Tumor Cell Line , Cell Survival/drug effects , Drug Discovery , Glioblastoma/genetics , Glioblastoma/metabolism , Humans , Reproducibility of Results , Transcriptome
ABSTRACT
High-throughput technologies including transcriptional profiling, proteomics and reverse genetics screens provide detailed molecular descriptions of cellular responses to perturbations. However, it is difficult to integrate these diverse data to reconstruct biologically meaningful signaling networks. Previously, we have established a framework for integrating transcriptional, proteomic and interactome data by searching for the solution to the prize-collecting Steiner tree problem. Here, we present a web server, SteinerNet, to make this method available in a user-friendly format for a broad range of users with data from any species. At a minimum, a user only needs to provide a set of experimentally detected proteins and/or genes and the server will search for connections among these data from the provided interactomes for yeast, human, mouse, Drosophila melanogaster and Caenorhabditis elegans. More advanced users can upload their own interactome data as well. The server provides interactive visualization of the resulting optimal network and downloadable files detailing the analysis and results. We believe that SteinerNet will be useful for researchers who would like to integrate their high-throughput data for a specific condition or cellular response and to find biologically meaningful pathways. SteinerNet is accessible at http://fraenkel.mit.edu/steinernet.
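As an illustration of the optimization the abstract refers to, the sketch below scores a candidate tree under the textbook prize-collecting Steiner tree objective: the prizes of nodes left out of the tree plus the costs of the edges used. It is not SteinerNet's code or API. The function name, the `beta` weight, and the toy interactome are all hypothetical, and the function only evaluates a given edge set. It does not search for the optimum or check that the edges form a connected tree.

```python
def pcst_objective(prizes, edge_costs, tree_edges, beta=1.0):
    """Return beta * (prizes of omitted nodes) + (costs of edges in the tree).

    prizes      -- dict mapping node name to a non-negative prize
    edge_costs  -- dict mapping frozenset({u, v}) to a non-negative cost
    tree_edges  -- list of (u, v) pairs chosen for the candidate tree
    beta        -- hypothetical weight trading prizes against edge costs
    """
    # Nodes covered by the candidate tree are the endpoints of its edges.
    tree_nodes = {node for edge in tree_edges for node in edge}
    # Every prize node that is left out is paid for as a penalty.
    omitted = sum(p for node, p in prizes.items() if node not in tree_nodes)
    # Every edge that is used is paid for at its cost.
    used = sum(edge_costs[frozenset(edge)] for edge in tree_edges)
    return beta * omitted + used


# Toy data (hypothetical): A, B, C are detected proteins; X is an
# undetected intermediate that cheaply links A and B.
prizes = {"A": 5.0, "B": 3.0, "C": 1.0}
edge_costs = {
    frozenset(("A", "X")): 1.0,
    frozenset(("X", "B")): 1.0,
    frozenset(("B", "C")): 4.0,
}

small_tree = [("A", "X"), ("X", "B")]             # skips C, pays its prize
full_tree = [("A", "X"), ("X", "B"), ("B", "C")]  # includes C, pays the costly edge

print(pcst_objective(prizes, edge_costs, small_tree))
print(pcst_objective(prizes, edge_costs, full_tree))
```

In this toy case the smaller tree scores lower, because reaching the low-prize node C costs more than omitting it. This is the trade-off that lets such a formulation discard weakly supported data while recruiting undetected intermediates such as X.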
Subject(s)
Gene Regulatory Networks , Proteomics/methods , Software , Animals , Caenorhabditis elegans/genetics , Caenorhabditis elegans/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Gene Expression Profiling , Humans , Internet , Mice , Signal Transduction , Yeasts/genetics , Yeasts/metabolism
ABSTRACT
SQUAMOSA Promoter-Binding Protein-Like (SPL) transcription factors play vital roles in plant development and stress responses. In this study, we report a comprehensive DNA Affinity Purification sequencing (DAP-seq) analysis for 14 of the 16 SPL transcription factors in Arabidopsis thaliana, providing valuable insights into their DNA-binding specificities. We performed Gene Ontology (GO) analysis of the target genes to reveal their convergent and diverse biological functions among SPL family proteins. Comparative analysis between the paralogs AtSPL9 and AtSPL15 revealed differences in their binding motifs, suggesting divergent regulatory functions. Additionally, we expanded our investigation to homologs of AtSPL9/15 in Zea mays (ZmSBP8/30) and Triticum aestivum (TaSPL7/13), identifying conserved and unique DNA-binding patterns across species. These findings provide key resources for understanding the molecular mechanisms of SPL transcription factors in regulating plant development and evolution across different species.
ABSTRACT
Regulatory elements are important constituents of plant genomes that have shaped ancient and modern crops. Their identification, function, and diversity in crop genomes however are poorly characterized, thus limiting our ability to harness their power for further agricultural advances using induced or natural variation. Here, we use DNA affinity purification-sequencing (DAP-seq) to map transcription factor (TF) binding events for 200 maize TFs belonging to 30 distinct families and heterodimer pairs in two distinct inbred lines historically used for maize hybrid plant production, providing empirical binding site annotation for 5.3% of the maize genome. TF binding site comparison in B73 and Mo17 inbreds reveals widespread differences, driven largely by structural variation, that correlate with gene expression changes. TF binding site presence-absence variation helps clarify complex QTL such as vgt1, an important determinant of maize flowering time, and DICE, a distal enhancer involved in herbivore resistance. Modification of TF binding regions via CRISPR-Cas9 mediated editing alters target gene expression and phenotype. Our functional catalog of maize TF binding events enables collective and comparative TF binding analysis, and highlights its value for agricultural improvement.
ABSTRACT
Many eukaryotic transcription factors (TFs) form homodimer or heterodimer complexes to regulate gene expression. Dimerization of BASIC LEUCINE ZIPPER (bZIP) TFs is critical for their functions, but the molecular mechanism underlying the DNA binding and functional specificity of homo- versus heterodimers remains elusive. To address this gap, we present the double DNA Affinity Purification-sequencing (dDAP-seq) technique that maps heterodimer binding sites on endogenous genomic DNA. Using dDAP-seq we profile twenty pairs of C/S1 bZIP heterodimers and S1 homodimers in Arabidopsis and show that heterodimerization significantly expands the DNA binding preferences of these TFs. Analysis of dDAP-seq binding sites reveals the function of bZIP9 in abscisic acid response and the role of bZIP53 heterodimer-specific binding in seed maturation. The C/S1 heterodimers show distinct preferences for the ACGT elements recognized by plant bZIPs and motifs resembling the yeast GCN4 cis-elements. This study demonstrates the potential of dDAP-seq in deciphering the DNA binding specificities of interacting TFs that are key for combinatorial gene regulation.
Subject(s)
Arabidopsis Proteins , Arabidopsis , Basic-Leucine Zipper Transcription Factors/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Binding Sites , DNA/metabolism
ABSTRACT
Cross-regulation between hormone signaling pathways is indispensable for plant growth and development. However, the molecular mechanisms by which multiple hormones interact and co-ordinate activity need to be understood. Here, we generated a cross-regulation network explaining how hormone signals are integrated from multiple pathways in etiolated Arabidopsis (Arabidopsis thaliana) seedlings. To do so, we comprehensively characterized transcription factor activity during plant hormone responses and reconstructed dynamic transcriptional regulatory models for six hormones: abscisic acid, brassinosteroid, ethylene, jasmonic acid, salicylic acid and strigolactone/karrikin. These models incorporated target data for hundreds of transcription factors and thousands of protein-protein interactions. Each hormone recruited different combinations of transcription factors, a subset of which were shared between hormones. Hub target genes existed within hormone transcriptional networks, exhibiting transcription factor activity themselves. In addition, a group of MITOGEN-ACTIVATED PROTEIN KINASES (MPKs) was identified as potential key points of cross-regulation between multiple hormones. Accordingly, the loss of function of one of these (MPK6) disrupted the global proteome, phosphoproteome and transcriptome during hormone responses. Lastly, we determined that all hormones drive substantial alternative splicing that has distinct effects on the transcriptome compared with differential gene expression, acting in early hormone responses. These results provide a comprehensive understanding of the common features of plant transcriptional regulatory pathways and how cross-regulation between hormones acts upon gene expression.
ABSTRACT
Transcription factors (TFs) play a critical role in determining cell fate decisions by integrating developmental and environmental signals through binding to specific cis-regulatory modules and regulating spatio-temporal specificity of gene expression patterns. Precise identification of functional TF binding sites in time and space not only will revolutionize our understanding of regulatory networks governing cell fate decisions but is also instrumental to uncover how genetic variations cause morphological diversity or disease. In this review, we discuss recent advances in mapping TF binding sites and characterizing the various parameters underlying the complexity of binding site recognition by TFs.
Subject(s)
DNA , Transcription Factors , Binding Sites , Biology , DNA/metabolism , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
The Plant Cell Atlas (PCA) community hosted a virtual symposium on December 9 and 10, 2021 on single cell and spatial omics technologies. The conference gathered almost 500 academic, industry, and government leaders to identify the needs and directions of the PCA community and to explore how establishing a data synthesis center would address these needs and accelerate progress. This report details the presentations and discussions focused on the possibility of a data synthesis center for a PCA and the expected impacts of such a center on advancing science and technology globally. Community discussions focused on topics such as data analysis tools and annotation standards; computational expertise and cyber-infrastructure; modes of community organization and engagement; methods for ensuring a broad reach in the PCA community; recruitment, training, and nurturing of new talent; and the overall impact of the PCA initiative. These targeted discussions facilitated dialogue among the participants to gauge whether PCA might be a vehicle for formulating a data synthesis center. The conversations also explored how online tools can be leveraged to help broaden the reach of the PCA (i.e., online contests, virtual networking, and social media stakeholder engagement) and decrease costs of conducting research (e.g., virtual REU opportunities). Major recommendations for the future of the PCA included establishing standards, creating dashboards for easy and intuitive access to data, and engaging with a broad community of stakeholders. The discussions also identified the following as being essential to the PCA's success: identifying homologous cell-type markers and their biocuration, publishing datasets and computational pipelines, utilizing online tools for communication (such as Slack), and user-friendly data visualization and data sharing. 
In conclusion, the development of a data synthesis center will help the PCA community achieve these goals by providing a centralized repository for existing and new data, a platform for sharing tools, and new analytical approaches through collaborative, multidisciplinary efforts. A data synthesis center will help the PCA reach milestones, such as community-supported data evaluation metrics, accelerating plant research necessary for human and environmental health.
ABSTRACT
Single-cell genomics, particularly single-cell transcriptome profiling by RNA sequencing, has transformed the possibilities to relate genes to functions, structures, and eventually phenotypes. We can now observe changes in each cell's transcriptome and among its neighborhoods, interrogate the sequence of transcriptional events, and assess their influence on subsequent events. This paradigm shift in biology enables us to infer causal relationships in these events with high accuracy. Here we review the latest single-cell studies in plants that uncover how cellular phenotypes emerge as a result of transcriptome processes such as waves of expression, trajectories of development and responses to the environment, and spatial information. With an eye on the advances made in animal and human studies, we further highlight some of the needed areas for future research and development, including computational methods.
Subject(s)
Computational Biology , Single-Cell Analysis , Animals , Genomics , Phenotype , RNA Sequence Analysis
ABSTRACT
With growing populations and pressing environmental problems, future economies will be increasingly plant-based. Now is the time to reimagine plant science as a critical component of fundamental science, agriculture, environmental stewardship, energy, technology and healthcare. This effort requires a conceptual and technological framework to identify and map all cell types, and to comprehensively annotate the localization and organization of molecules at cellular and tissue levels. This framework, called the Plant Cell Atlas (PCA), will be critical for understanding and engineering plant development, physiology and environmental responses. A workshop was convened to discuss the purpose and utility of such an initiative, resulting in a roadmap that acknowledges the current knowledge gaps and technical challenges, and underscores how the PCA initiative can help to overcome them.
Subject(s)
Plant Cells , Agriculture , Chlamydomonas reinhardtii , Chloroplasts , Computational Biology , Computer-Assisted Image Processing , Plant Cells/physiology , Plant Development , Plants/classification , Plants/genetics , Zea mays
ABSTRACT
Next-generation sequencing (NGS)-based methods are revolutionizing biology. Their prevalence requires biologists to be increasingly knowledgeable about computational methods to manage the enormous scale of data. As such, early introduction to NGS analysis and conceptual connection to wet-lab experiments is crucial for training young scientists. However, significant challenges impede the introduction of these methods into the undergraduate classroom, including the need for specialized computer programs and knowledge of computer coding. Here, we describe a semester-long, course-based undergraduate research experience at a liberal arts college combining RNA-sequencing (RNA-seq) analysis with student-driven, wet-lab experiments to investigate plant responses to light. Students derived hypotheses based on analysis of RNA-seq data and designed follow-up studies of gene expression and plant growth. Our assessments indicate that students acquired knowledge of big data analysis and computer coding; however, earlier exposure to computational methods may be beneficial. Our course requires minimal prior knowledge of plant biology, is easy to replicate, and can be modified to a shorter, directed-inquiry module. This framework promotes exploration of the links between gene expression and phenotype using examples that are clear and tractable and improves computational skills and bioinformatics self-efficacy to prepare students for the "big data" era of modern biology.
Subject(s)
Big Data , Gene Expression Profiling , Students , Universities , Arabidopsis/genetics , Plant Gene Expression Regulation , Humans , Learning , Phenotype
ABSTRACT
To enable low-cost, high-throughput generation of cistrome and epicistrome maps for any organism, we developed DNA affinity purification sequencing (DAP-seq), a transcription factor (TF)-binding site (TFBS) discovery assay that couples affinity-purified TFs with next-generation sequencing of a genomic DNA library. The method is fast, inexpensive, and more easily scaled than chromatin immunoprecipitation sequencing (ChIP-seq). DNA libraries are constructed using native genomic DNA from any source of interest, preserving cell- and tissue-specific chemical modifications that are known to affect TF binding (such as DNA methylation) and providing increased specificity as compared with in silico predictions based on motifs from methods such as protein-binding microarrays (PBMs) and systematic evolution of ligands by exponential enrichment (SELEX). The resulting DNA library is incubated with an affinity-tagged in vitro-expressed TF, and TF-DNA complexes are purified using magnetic separation of the affinity tag. Bound genomic DNA is eluted from the TF and sequenced using next-generation sequencing. Sequence reads are mapped to a reference genome, identifying genome-wide binding locations for each TF assayed, from which sequence motifs can then be derived. A researcher with molecular biology experience should be able to follow this protocol, processing up to 400 samples per week.
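The final analysis step described above, going from mapped reads to genome-wide binding locations, can be illustrated with a deliberately simplified sketch. It is not the protocol's pipeline: real analyses rely on a read aligner and a dedicated peak caller. Here it is assumed that read start coordinates on one chromosome are already available for a TF sample and for a control library. The function names, bin size, fold threshold, and pseudocount are all hypothetical choices made for illustration.

```python
from collections import Counter


def bin_counts(read_starts, bin_size):
    """Count reads per fixed-width genomic bin (bin index = start // bin_size)."""
    return Counter(pos // bin_size for pos in read_starts)


def enriched_bins(sample_starts, control_starts, bin_size=100,
                  min_fold=4.0, pseudocount=1.0):
    """Return (start, end, fold) for bins enriched in the sample over the control."""
    sample = bin_counts(sample_starts, bin_size)
    control = bin_counts(control_starts, bin_size)
    # Scale control counts to the sample's sequencing depth.
    scale = len(sample_starts) / max(len(control_starts), 1)
    hits = []
    for b in sorted(sample):
        # The pseudocount avoids division by zero in bins with no control reads.
        fold = (sample[b] + pseudocount) / (control[b] * scale + pseudocount)
        if fold >= min_fold:
            hits.append((b * bin_size, (b + 1) * bin_size, round(fold, 2)))
    return hits


# Toy coordinates (hypothetical): the sample piles up reads in 100-200,
# while the control library is spread evenly.
sample_reads = [120, 130, 140, 150, 160, 170, 180, 510, 920]
control_reads = [20, 130, 310, 510, 620, 700, 810, 920, 990]

hits = enriched_bins(sample_reads, control_reads)
print(hits)
```

Sequences under the enriched windows would then be passed to a motif discovery tool to derive the TF's binding motif, as the abstract describes.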