1.
PLoS Biol ; 22(5): e3002612, 2024 May.
Article in English | MEDLINE | ID: mdl-38781246

ABSTRACT

Scientific advances due to conceptual or technological innovations can be revealed by examining how research topics have evolved. But such topical evolution is difficult to uncover and quantify because of the large body of literature and the need for expert knowledge in a wide range of areas in a field. Using plant biology as an example, we used machine learning and language models to classify plant science citations into topics representing interconnected, evolving subfields. The changes in prevalence of topical records over the last 50 years reflect shifts in major research trends and recent radiation of new topics, as well as turnover of model species and vastly different plant science research trajectories among countries. Our approaches readily summarize the topical diversity and evolution of a scientific field with hundreds of thousands of relevant papers, and they can be applied broadly to other fields.


Subjects
Plants , Research/trends , Machine Learning , Botany/trends , Botany/methods
2.
Proc Natl Acad Sci U S A ; 120(19): e2219469120, 2023 05 09.
Article in English | MEDLINE | ID: mdl-37126718

ABSTRACT

Basic helix-loop-helix (bHLH) proteins are one of the largest families of transcription factor (TF) in eukaryotes, and ~30% of all flowering plants' bHLH TFs contain the aspartate kinase, chorismate mutase, and TyrA (ACT)-like domain at variable distances C-terminal from the bHLH. However, the evolutionary history and functional consequences of the bHLH/ACT-like domain association remain unknown. Here, we show that this domain association is unique to the plantae kingdom with green algae (chlorophytes) harboring a small number of bHLH genes with variable frequency of ACT-like domain's presence. bHLH-associated ACT-like domains form a monophyletic group, indicating a common origin. Indeed, phylogenetic analysis results suggest that the association of ACT-like and bHLH domains occurred early in Plantae by recruitment of an ACT-like domain in a common ancestor with widely distributed ACT DOMAIN REPEAT (ACR) genes by an ancestral bHLH gene. We determined the functional significance of this association by showing that Chlamydomonas reinhardtii ACT-like domains mediate homodimer formation and negatively affect DNA binding of the associated bHLH domains. We show that, while ACT-like domains have experienced faster selection than the associated bHLH domain, their rates of evolution are strongly and positively correlated, suggesting that the evolution of the ACT-like domains was constrained by the bHLH domains. This study proposes an evolutionary trajectory for the association of ACT-like and bHLH domains with the experimental characterization of the functional consequence in the regulation of plant-specific processes, highlighting the impacts of functional domain coevolution.


Subjects
Basic Helix-Loop-Helix Transcription Factors , Plants , Basic Helix-Loop-Helix Transcription Factors/metabolism , Phylogeny , Plants/genetics , Transcription Factors/metabolism , Helix-Loop-Helix Motifs
3.
Plant Cell ; 34(2): 867-888, 2022 02 03.
Article in English | MEDLINE | ID: mdl-34865154

ABSTRACT

Plants respond to wounding stress by changing gene expression patterns and inducing the production of hormones including jasmonic acid. This wounding transcriptional response activates specialized metabolism pathways such as the glucosinolate pathways in Arabidopsis thaliana. While the regulatory factors and sequences controlling a subset of wound-response genes are known, it remains unclear how wound response is regulated globally. Here, we examine how these responses are regulated by incorporating putative cis-regulatory elements, known transcription factor binding sites, in vitro DNA affinity purification sequencing, and DNase I hypersensitive sites to predict genes with different wound-response patterns using machine learning. We observed that regulatory sites and regions of open chromatin differed between genes upregulated at early and late wounding time-points as well as between genes induced by jasmonic acid and those not induced. Expanding on what we currently know, we identified cis-elements that improved model predictions of expression clusters over known binding sites. Using a combination of genome editing, in vitro DNA-binding assays, and transient expression assays using native and mutated cis-regulatory elements, we experimentally validated four of the predicted elements, three of which were not previously known to function in wound-response regulation. Our study provides a global model predictive of wound response and identifies new regulatory sequences important for wounding without requiring prior knowledge of the transcriptional regulators.


Subjects
Arabidopsis/physiology , Gene Expression Regulation, Plant , Plant Growth Regulators/physiology , Arabidopsis/drug effects , Arabidopsis/genetics , Cyclopentanes/pharmacology , Metabolic Networks and Pathways , Models, Biological , Oxylipins/pharmacology , Plant Growth Regulators/pharmacology , Plants, Genetically Modified , Regulatory Sequences, Nucleic Acid , Reproducibility of Results , Transcription Factors/genetics
4.
Proc Natl Acad Sci U S A ; 119(35): e2208795119, 2022 08 30.
Article in English | MEDLINE | ID: mdl-36001691

ABSTRACT

The superior photosynthetic efficiency of C4 leaves over C3 leaves is owing to their unique Kranz anatomy, in which the vein is surrounded by one layer of bundle sheath (BS) cells and one layer of mesophyll (M) cells. Kranz anatomy development starts from three contiguous ground meristem (GM) cells, but its regulators and underlying molecular mechanism are largely unknown. To identify the regulators, we obtained the transcriptomes of 11 maize embryonic leaf cell types from five stages of pre-Kranz cells starting from median GM cells and six stages of pre-M cells starting from undifferentiated cells. Principal component and clustering analyses of transcriptomic data revealed rapid pre-Kranz cell differentiation in the first two stages but slow differentiation in the last three stages, suggesting early Kranz cell fate determination. In contrast, pre-M cells exhibit a more prolonged transcriptional differentiation process. Differential gene expression and coexpression analyses identified gene coexpression modules, one of which included 3 auxin transporter and 18 transcription factor (TF) genes, including known regulators of Kranz anatomy and/or vascular development. In situ hybridization of 11 TF genes validated their expression in early Kranz development. We determined the binding motifs of 15 TFs, predicted TF target gene relationships among the 18 TF and 3 auxin transporter genes, and validated 67 predictions by electrophoresis mobility shift assay. From these data, we constructed a gene regulatory network for Kranz development. Our study sheds light on the regulation of early maize leaf development and provides candidate leaf development regulators for future study.


Subjects
Gene Expression Regulation, Developmental , Gene Expression Regulation, Plant , Plant Leaves , Transcriptome , Zea mays , Indoleacetic Acids/metabolism , Laser Capture Microdissection , Photosynthesis/genetics , Plant Leaves/embryology , Plant Leaves/genetics , Zea mays/enzymology , Zea mays/genetics
5.
Plant Cell ; 33(2): 224-247, 2021 04 17.
Article in English | MEDLINE | ID: mdl-33681966

ABSTRACT

The broad host range of Fusarium virguliforme represents a unique comparative system to identify and define differentially induced responses between an asymptomatic monocot host, maize (Zea mays), and a symptomatic eudicot host, soybean (Glycine max). Using a temporal, comparative transcriptome-based approach, we observed that early gene expression profiles of root tissue from infected maize suggest that pathogen tolerance coincides with the rapid induction of senescence dampening transcriptional regulators, including ANACs (Arabidopsis thaliana NAM/ATAF/CUC protein) and Ethylene-Responsive Factors. In contrast, the expression of senescence-associated processes in soybean was coincident with the appearance of disease symptom development, suggesting pathogen-induced senescence as a key pathway driving pathogen susceptibility in soybean. Based on the analyses described herein, we posit that root senescence is a primary contributing factor underlying colonization and disease progression in symptomatic versus asymptomatic host-fungal interactions. This process also supports the lifestyle and virulence of F. virguliforme during biotrophy to necrotrophy transitions. Further support for this hypothesis lies in comprehensive co-expression and comparative transcriptome analyses, and in total, supports the emerging concept of necrotrophy-activated senescence. We propose that F. virguliforme conditions an environment within symptomatic hosts, which favors susceptibility through transcriptomic reprogramming, and as described herein, the induction of pathways associated with senescence during the necrotrophic stage of fungal development.


Subjects
Fusarium/physiology , Glycine max/microbiology , Host-Pathogen Interactions/genetics , Plant Diseases/genetics , Plant Diseases/microbiology , Transcription, Genetic , Zea mays/microbiology , Colony Count, Microbial , Fusarium/growth & development , Gene Expression Regulation, Plant , Glycine max/genetics , Time Factors , Transcription Factors/metabolism , Transcriptome/genetics , Zea mays/genetics
6.
Trends Genet ; 36(6): 442-455, 2020 06.
Article in English | MEDLINE | ID: mdl-32396837

ABSTRACT

Because of its ability to find complex patterns in high dimensional and heterogeneous data, machine learning (ML) has emerged as a critical tool for making sense of the growing amount of genetic and genomic data available. While the complexity of ML models is what makes them powerful, it also makes them difficult to interpret. Fortunately, efforts to develop approaches that make the inner workings of ML models understandable to humans have improved our ability to make novel biological insights. Here, we discuss the importance of interpretable ML, different strategies for interpreting ML models, and examples of how these strategies have been applied. Finally, we identify challenges and promising future directions for interpretable ML in genetics and genomics.


Subjects
Computational Biology/methods , Genetics, Medical , Genetics, Population , Genome, Human , Machine Learning , Humans
7.
Plant Physiol ; 190(4): 2539-2556, 2022 11 28.
Article in English | MEDLINE | ID: mdl-36156105

ABSTRACT

A signaling complex comprising members of the LORELEI (LRE)-LIKE GPI-anchored protein (LLG) and Catharanthus roseus RECEPTOR-LIKE KINASE 1-LIKE (CrRLK1L) families perceives RAPID ALKALINIZATION FACTOR (RALF) peptides and regulates growth, reproduction, immunity, and stress responses in Arabidopsis (Arabidopsis thaliana). Genes encoding these proteins are members of multigene families in most angiosperms and could generate thousands of signaling complex variants. However, the links between expansion of these gene families and the functional diversification of this critical signaling complex as well as the evolutionary factors underlying the maintenance of gene duplicates remain unknown. Here, we investigated LLG gene family evolution by sampling land plant genomes and explored the function and expression of angiosperm LLGs. We found that LLG diversity within major land plant lineages is primarily due to lineage-specific duplication events, and that these duplications occurred both early in the history of these lineages and more recently. Our complementation and expression analyses showed that expression divergence (i.e. regulatory subfunctionalization), rather than functional divergence, explains the retention of LLG paralogs. Interestingly, all but one monocot and all eudicot species examined had an LLG copy with preferential expression in male reproductive tissues, while the other duplicate copies showed highest levels of expression in female or vegetative tissues. The single LLG copy in Amborella trichopoda is expressed at much higher levels in male than in female reproductive or vegetative tissues. We propose that expression divergence plays an important role in retention of LLG duplicates in angiosperms.


Subjects
Arabidopsis , Embryophyta , Magnoliopsida , Arabidopsis/metabolism , Multigene Family , Phosphotransferases/genetics , Seeds/metabolism , Embryophyta/genetics , Magnoliopsida/genetics , Magnoliopsida/metabolism , Proteins/genetics , Gene Duplication , Evolution, Molecular , Phylogeny
8.
Plant Cell ; 32(1): 139-151, 2020 01.
Article in English | MEDLINE | ID: mdl-31641024

ABSTRACT

The ability to predict traits from genome-wide sequence information (i.e., genomic prediction) has improved our understanding of the genetic basis of complex traits and transformed breeding practices. Transcriptome data may also be useful for genomic prediction. However, it remains unclear how well transcript levels can predict traits, particularly when traits are scored at different development stages. Using maize (Zea mays) genetic markers and transcript levels from seedlings to predict mature plant traits, we found that transcript and genetic marker models have similar performance. When the transcripts and genetic markers with the greatest weights (i.e., the most important) in those models were used in one joint model, performance increased. Furthermore, genetic markers important for predictions were not close to or identified as regulatory variants for important transcripts. These findings demonstrate that transcript levels are useful for predicting traits and that their predictive power is not simply due to genetic variation in the transcribed genomic regions. Finally, genetic marker models identified only 1 of 14 benchmark flowering-time genes, while transcript models identified 5. These data highlight that, in addition to being useful for genomic prediction, transcriptome data can provide a link between traits and variation that cannot be readily captured at the sequence level.


Subjects
Genome, Plant/genetics , Multifactorial Inheritance , Transcriptome , Zea mays/genetics , Genetic Markers , Genetic Variation , Genome-Wide Association Study , Genomics , Models, Genetic , Phenotype
9.
Proc Natl Acad Sci U S A ; 117(13): 7482-7493, 2020 03 31.
Article in English | MEDLINE | ID: mdl-32170020

ABSTRACT

Plants balance their competing requirements for growth and stress tolerance via a sophisticated regulatory circuitry that controls responses to the external environments. We have identified a plant-specific gene, COST1 (constitutively stressed 1), that is required for normal plant growth but negatively regulates drought resistance by influencing the autophagy pathway. An Arabidopsis thaliana cost1 mutant has decreased growth and increased drought tolerance, together with constitutive autophagy and increased expression of drought-response genes, while overexpression of COST1 confers drought hypersensitivity and reduced autophagy. The COST1 protein is degraded upon plant dehydration, and this degradation is reduced upon treatment with inhibitors of the 26S proteasome or autophagy pathways. The drought resistance of a cost1 mutant is dependent on an active autophagy pathway, but independent of other known drought signaling pathways, indicating that COST1 acts through regulation of autophagy. In addition, COST1 colocalizes to autophagosomes with the autophagosome marker ATG8e and the autophagy adaptor NBR1, and affects the level of ATG8e protein through physical interaction with ATG8e, indicating a pivotal role in direct regulation of autophagy. We propose a model in which COST1 represses autophagy under optimal conditions, thus allowing plant growth. Under drought, COST1 is degraded, enabling activation of autophagy and suppression of growth to enhance drought tolerance. Our research places COST1 as an important regulator controlling the balance between growth and stress responses via the direct regulation of autophagy.


Subjects
Arabidopsis Proteins/physiology , Arabidopsis/physiology , Stress, Physiological/physiology , Arabidopsis/cytology , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Autophagosomes/metabolism , Autophagy/physiology , Autophagy-Related Protein 8 Family/metabolism , Carrier Proteins/metabolism , Droughts , Genes, Plant , Signal Transduction , Stress, Physiological/genetics
10.
Proc Natl Acad Sci U S A ; 117(35): 21747-21756, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32817425

ABSTRACT

Arabidopsis AINTEGUMENTA (ANT), an AP2 transcription factor, is known to control plant growth and floral organogenesis. In this study, our transcriptome analysis and in situ hybridization assays of maize embryonic leaves suggested that maize ANT1 (ZmANT1) regulates vascular development. To better understand ANT1 functions, we determined the binding motif of ZmANT1 and then showed that ZmANT1 binds the promoters of millet SCR1, GNC, and AN3, which are key regulators of Kranz anatomy, chloroplast development, and plant growth, respectively. We generated a mutant with a single-codon deletion and two frameshift mutants of the ANT1 ortholog in the C4 millet Setaria viridis by the CRISPR/Cas9 technique. The two frameshift mutants displayed reduced photosynthesis efficiency and growth rate, smaller leaves, and lower grain yields than wild-type (WT) plants. Moreover, their leaves sporadically exhibited distorted Kranz anatomy and vein spacing. Conducting transcriptomic analysis of developing leaves in the WT and the three mutants, we identified differentially expressed genes (DEGs) in the two frameshift mutant lines and found many down-regulated DEGs enriched in photosynthesis, heme, tetrapyrrole binding, and antioxidant activity. In addition, we predicted many target genes of ZmANT1 and chose 13 of them to confirm binding of ZmANT1 to their promoters. Based on the above observations, we proposed a model for ANT1 regulation of cell proliferation and leaf growth, vascular and vein development, chloroplast development, and photosynthesis through its target genes. Our study revealed biological roles of ANT1 in several developmental processes beyond its known roles in plant growth and floral organogenesis.


Subjects
Adenine Nucleotide Translocator 1/metabolism , Zea mays/growth & development , Zea mays/genetics , Adenine Nucleotide Translocator 1/physiology , Amino Acid Transport Systems, Neutral/genetics , Amino Acid Transport Systems, Neutral/metabolism , Chloroplasts/metabolism , Flowers/genetics , Flowers/growth & development , Gene Expression Profiling , Gene Expression Regulation, Plant/genetics , Millets/genetics , Millets/metabolism , Organogenesis, Plant/genetics , Photosynthesis/genetics , Photosynthesis/physiology , Plant Development/genetics , Plant Leaves/metabolism , Plant Proteins/genetics , Transcription Factors/metabolism , Transcriptome
11.
Mol Biol Evol ; 38(8): 3397-3414, 2021 07 29.
Article in English | MEDLINE | ID: mdl-33871641

ABSTRACT

Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant or not in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including posttranslational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance based on holdout, testing phenotype data. Among the most important features in predicting gene pairs as redundant were having a paralog(s) from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and limitations of our studies. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs, and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies.


Subjects
Arabidopsis/genetics , Biological Evolution , Gene Duplication , Machine Learning , Models, Genetic
12.
New Phytol ; 234(4): 1521-1533, 2022 05.
Article in English | MEDLINE | ID: mdl-35218008

ABSTRACT

Revealing the contributions of genes to plant phenotype is frequently challenging because loss-of-function effects may be subtle or masked by varying degrees of genetic redundancy. Such effects can potentially be detected by measuring plant fitness, which reflects the cumulative effects of genetic changes over the lifetime of a plant. However, fitness is challenging to measure accurately, particularly in species with high fecundity and relatively small propagule sizes such as Arabidopsis thaliana. An image segmentation-based method using the software ImageJ and an object detection-based method using the Faster Region-based Convolutional Neural Network (R-CNN) algorithm were used for measuring two Arabidopsis fitness traits: seed and fruit counts. The segmentation-based method was error-prone (correlation between true and predicted seed counts, r² = 0.849) because seeds touching each other were undercounted. By contrast, the object detection-based algorithm yielded near perfect seed counts (r² = 0.9996) and highly accurate fruit counts (r² = 0.980). Comparing seed counts for wild-type and 12 mutant lines revealed fitness effects for three genes; fruit counts revealed the same effects for two genes. Our study provides analysis pipelines and models to facilitate the investigation of Arabidopsis fitness traits and demonstrates the importance of examining fitness traits when studying gene functions.


Subjects
Arabidopsis , Algorithms , Arabidopsis/genetics , Neural Networks, Computer , Phenotype , Seeds/genetics
13.
Proc Natl Acad Sci U S A ; 116(6): 2344-2353, 2019 02 05.
Article in English | MEDLINE | ID: mdl-30674669

ABSTRACT

Plant specialized metabolism (SM) enzymes produce lineage-specific metabolites with important ecological, evolutionary, and biotechnological implications. Using Arabidopsis thaliana as a model, we identified distinguishing characteristics of SM and GM (general metabolism, traditionally referred to as primary metabolism) genes through a detailed study of features including duplication pattern, sequence conservation, transcription, protein domain content, and gene network properties. Analysis of multiple sets of benchmark genes revealed that SM genes tend to be tandemly duplicated, coexpressed with their paralogs, narrowly expressed at lower levels, less conserved, and less well connected in gene networks relative to GM genes. Although the values of each of these features significantly differed between SM and GM genes, any single feature was ineffective at predicting SM from GM genes. Using machine learning methods to integrate all features, a prediction model was established with a true positive rate of 87% and a true negative rate of 71%. In addition, 86% of known SM genes not used to create the machine learning model were predicted. We also demonstrated that the model could be further improved when we distinguished between SM, GM, and junction genes responsible for reactions shared by SM and GM pathways, indicating that topological considerations may further improve the SM prediction model. Application of the prediction model led to the identification of 1,220 A. thaliana genes with previously unknown functions, each assigned a confidence measure called an SM score, providing a global estimate of SM gene content in a plant genome.

14.
Proc Natl Acad Sci U S A ; 116(8): 3091-3099, 2019 02 19.
Article in English | MEDLINE | ID: mdl-30718437

ABSTRACT

Time-series transcriptomes of a biological process obtained under different conditions are useful for identifying the regulators of the process and their regulatory networks. However, such data are 3D (gene expression, time, and condition), and there is currently no method that can deal with their full complexity. Here, we developed a method that avoids time-point alignment and normalization between conditions. We applied it to analyze time-series transcriptomes of developing maize leaves under light-dark cycles and under total darkness and obtained eight time-ordered gene coexpression networks (TO-GCNs), which can be used to predict upstream regulators of any genes in the GCNs. One of the eight TO-GCNs is light-independent and likely includes all genes involved in the development of Kranz anatomy, which is a structure crucial for the high efficiency of photosynthesis in C4 plants. Using this TO-GCN, we predicted and experimentally validated a regulatory cascade upstream of SHORTROOT1, a key Kranz anatomy regulator. Moreover, we applied the method to compare transcriptomes from maize and rice leaf segments and identified regulators of maize C4 enzyme genes and RUBISCO SMALL SUBUNIT2. Our study provides not only a powerful method but also novel insights into the regulatory networks underlying Kranz anatomy development and C4 photosynthesis.


Subjects
Gene Regulatory Networks/genetics , Photosynthesis/genetics , Plant Leaves/genetics , Transcriptome/genetics , Gene Expression Regulation, Plant/genetics , Oryza/genetics , Photoperiod , Plant Proteins , Ribulose-Bisphosphate Carboxylase/genetics , Zea mays/genetics
15.
BMC Genomics ; 22(1): 99, 2021 Feb 02.
Article in English | MEDLINE | ID: mdl-33530937

ABSTRACT

BACKGROUND: Availability of plant genome sequences has led to significant advances. However, with few exceptions, the great majority of existing genome assemblies are derived from short read sequencing technologies with highly uneven read coverages indicative of sequencing and assembly issues that could significantly impact any downstream analysis of plant genomes. In tomato, for example, 0.6% (5.1 Mb) and 9.7% (79.6 Mb) of short-read based assembly had significantly higher and lower coverage compared to background, respectively. RESULTS: To understand what the causes may be for such uneven coverage, we first established machine learning models capable of predicting genomic regions with variable coverages and found that high coverage regions tend to have higher simple sequence repeat and tandem gene densities compared to background regions. To determine if the high coverage regions were misassembled, we examined a recently available tomato long-read based assembly and found that 27.8% (1.41 Mb) of high coverage regions were potentially misassembled duplicate sequences, compared to 1.4% in background regions. In addition, using a predictive model that can distinguish correctly and incorrectly assembled high coverage regions, we found that misassembled, high coverage regions tend to be flanked by simple sequence repeats, pseudogenes, and transposon elements. CONCLUSIONS: Our study provides insights on the causes of variable coverage regions and a quantitative assessment of factors contributing to plant genome misassembly when using short reads; the generality of these causes and factors should be tested further in other species.


Subjects
Genome, Plant , High-Throughput Nucleotide Sequencing , DNA Transposable Elements/genetics , Genomics , Sequence Analysis, DNA
16.
New Phytol ; 231(1): 475-489, 2021 07.
Article in English | MEDLINE | ID: mdl-33749860

ABSTRACT

Plant metabolites from diverse pathways are important for plant survival, human nutrition and medicine. The pathway memberships of most plant enzyme genes are unknown. While co-expression is useful for assigning genes to pathways, expression correlation may exist only under specific spatiotemporal and conditional contexts. Utilising > 600 tomato (Solanum lycopersicum) expression data combinations, three strategies for predicting memberships in 85 pathways were explored. Optimal predictions for different pathways require distinct data combinations indicative of pathway functions. Naive prediction (i.e. identifying pathways with the most similarly expressed genes) is error prone. In 52 pathways, unsupervised learning performed better than supervised approaches, possibly due to limited training data availability. Using gene-to-pathway expression similarities led to prediction models that outperformed those based simply on expression levels. Using 36 experimentally validated genes, the pathway-best model prediction accuracy is 58.3%, significantly better compared with that for predicting annotated genes without experimental evidence (37.0%) or random guess (1.2%), demonstrating the importance of data quality. Our study highlights the need to extensively explore expression-based features and prediction strategies to maximise the accuracy of metabolic pathway membership assignment. The prediction framework outlined here can be applied to other species and serves as a baseline model for future comparisons.


Subjects
Metabolic Networks and Pathways , Solanum lycopersicum , Gene Expression , Genes, Plant , Solanum lycopersicum/genetics , Metabolic Networks and Pathways/genetics
17.
Plant Physiol ; 182(3): 1420-1439, 2020 03.
Article in English | MEDLINE | ID: mdl-31937681

ABSTRACT

Plant iron deficiency (-Fe) activates a complex regulatory network that coordinates root Fe uptake and distribution to sink tissues. In Arabidopsis (Arabidopsis thaliana), FER-LIKE FE DEFICIENCY-INDUCED TRANSCRIPTION FACTOR (FIT), a basic helix-loop-helix (bHLH) transcription factor (TF), regulates root Fe acquisition genes. Many other -Fe-induced genes are FIT independent, and instead regulated by other bHLH TFs and by yet unknown TFs. The cis-regulatory code, that is, the cis-regulatory elements (CREs) and their combinations that regulate plant -Fe-responses, remains largely elusive. Using Arabidopsis root transcriptome data and coexpression clustering, we identified over 100 putative CREs (pCREs) that predicted -Fe-induced gene expression in computational models. To assess pCRE properties and possible functions, we used large-scale in vitro TF binding data, positional bias, and evolutionary conservation. As one example, our approach uncovered pCREs resembling IDE1 (iron deficiency-responsive element 1), a known grass -Fe response CRE. Arabidopsis IDE1-likes were associated with FIT-dependent gene expression, more specifically with biosynthesis of Fe-chelating compounds. Thus, IDE1 seems to be conserved in grass and nongrass species. Our pCREs matched among others in vitro binding sites of B3, NAC, bZIP, and TCP TFs, which might be regulators of -Fe responses. Altogether, our findings provide a comprehensive source of cis-regulatory information for -Fe-responsive genes that advance our mechanistic understanding and inform future efforts in engineering plants with more efficient Fe uptake or transport systems.


Subjects
Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Plant Roots/metabolism , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Gene Expression Regulation, Plant , Plant Roots/genetics , Regulatory Sequences, Nucleic Acid/genetics
18.
Plant Cell ; 30(7): 1445-1460, 2018 07.
Article in English | MEDLINE | ID: mdl-29743197

ABSTRACT

The evolution of transcriptional regulatory mechanisms is central to how stress response and tolerance differ between species. However, it remains largely unknown how divergence in cis-regulatory sites and, subsequently, transcription factor (TF) binding specificity contribute to stress-responsive expression divergence, particularly between wild and domesticated species. By profiling wound-responsive gene transcriptomes in wild Solanum pennellii and domesticated S. lycopersicum, we found extensive wound response divergence and identified 493 S. lycopersicum and 278 S. pennellii putative cis-regulatory elements (pCREs) that were predictive of wound-responsive gene expression. Only 24-52% of these wound response pCREs (depending on wound response patterns) were consistently enriched in the putative promoter regions of wound-responsive genes across species. In addition, between these two species, their differences in pCRE site sequences were significantly and positively correlated with differences in wound-responsive gene expression. Furthermore, ∼11-39% of pCREs were specific to only one of the species and likely bound by TFs from different families. These findings indicate substantial regulatory divergence in these two plant species that diverged ∼3-7 million years ago. Our study provides insights into the mechanistic basis of how the transcriptional response to wounding is regulated and, importantly, the contribution of cis-regulatory components to variation in wound-responsive gene expression between a wild and a domesticated plant species.


Subjects
Solanum lycopersicum/genetics , Gene Expression Profiling , Gene Expression Regulation, Plant/genetics , Plant Proteins/genetics , Plant Proteins/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
19.
BMC Genomics ; 21(1): 159, 2020 Feb 13.
Article in English | MEDLINE | ID: mdl-32054475

ABSTRACT

BACKGROUND: Gene expression is regulated by DNA-binding transcription factors (TFs). Together with their target genes, these factors and their interactions collectively form a gene regulatory network (GRN), which is responsible for producing patterns of transcription, including cyclical processes such as genome replication and cell division. However, identifying how this network regulates the timing of these patterns, including important interactions and regulatory motifs, remains a challenging task. RESULTS: We employed four in vivo and in vitro regulatory data sets to investigate the regulatory basis of expression timing and phase-specific patterns of cell-cycle expression in Saccharomyces cerevisiae. Specifically, we considered interactions based on direct binding between TF and target gene, indirect effects of TF deletion on gene expression, and computational inference. We found that the source of regulatory information significantly impacts the accuracy and completeness of recovering known cell-cycle expressed genes. The best approach involved combining TF-target and TF-TF interaction features from multiple data sets in a single model. In addition, TFs important to multiple phases of cell-cycle expression also have the greatest impact on individual phases. Important TFs regulating a cell-cycle phase also tend to form modules in the GRN, including two sub-modules composed entirely of unannotated cell-cycle regulators (STE12-TEC1 and RAP1-HAP1-MSN4). CONCLUSION: Our findings illustrate the importance of integrating both multiple omics data and regulatory motifs in order to understand the significance of regulatory interactions involved in timing gene expression. This integrated approach allowed us to recover both known cell-cycle interactions and the overall pattern of phase-specific expression across the cell cycle better than any single data set. Likewise, by looking at regulatory motifs in the form of TF-TF interactions, we identified sets of TFs whose co-regulation of target genes was important for cell-cycle expression, even when regulation by individual TFs was not. Overall, this demonstrates the power of integrating multiple data sets and models of interaction in order to understand the regulatory basis of established biological processes and their associated gene regulatory networks.


Subjects
Gene Expression Regulation, Fungal , Gene Regulatory Networks , Genes, cdc , Genomics , Saccharomyces cerevisiae/genetics , Computational Biology/methods , Genomics/methods , Machine Learning , Protein Binding , Protein Interaction Mapping , Protein Interaction Maps , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism
20.
Plant Physiol ; 181(4): 1739-1751, 2019 12.
Article in English | MEDLINE | ID: mdl-31551359

ABSTRACT

Multicellular organisms have diverse cell types with distinct roles in development and responses to the environment. At the transcriptional level, the differences in the environmental response between cell types are due to differences in regulatory programs. In plants, although cell-type environmental responses have been examined, it is unclear how these responses are regulated. Here, we identify a set of putative cis-regulatory elements (pCREs) enriched in the promoters of genes responsive to high-salinity stress in six Arabidopsis (Arabidopsis thaliana) root cell types. We then use these pCREs to establish cis-regulatory codes (i.e. machine learning models predicting, for each cell type, whether a gene is responsive to high salinity). These pCRE-based models outperform models using in vitro binding data of 758 Arabidopsis transcription factors. Surprisingly, organ pCREs identified based on the whole-root high-salinity response can predict cell-type responses as well as pCREs derived from cell-type data, because organ and cell-type pCREs predict complementary subsets of high-salinity response genes. Our findings not only advance our understanding of the regulatory mechanisms of the plant spatial transcriptional response through cis-regulatory codes but also suggest broad applicability of the approach to any species, particularly those with little or no trans-regulatory data.


Subjects
Plant Cells/metabolism , Regulatory Sequences, Nucleic Acid/genetics , Salinity , Base Sequence , Gene Expression Regulation, Plant , Machine Learning , Organ Specificity/genetics , Plant Roots/genetics , Protein Binding , Transcription Factors/metabolism , Transcription, Genetic , Up-Regulation/genetics