ABSTRACT
BACKGROUND: RNA-sequencing analysis is increasingly utilized to study gene expression in non-model organisms without sequenced genomes. Aethionema arabicum (Brassicaceae) exhibits seed dimorphism as a bet-hedging strategy, producing both a less dormant mucilaginous (M+) seed morph and a more dormant non-mucilaginous (NM) seed morph. Here, we compared de novo and reference-genome based transcriptome assemblies to investigate Ae. arabicum seed dimorphism and to evaluate the reference-free versus -dependent approach for identifying differentially expressed genes (DEGs). RESULTS: A de novo transcriptome assembly was generated using sequences from M+ and NM Ae. arabicum dry seed morphs. The transcripts of the de novo assembly contained 63.1% complete Benchmarking Universal Single-Copy Orthologs (BUSCO) compared to 90.9% for the transcripts of the reference genome. DEG detection used the strict consensus of three methods (DESeq2, edgeR and NOISeq). Only 37% of 1533 differentially expressed de novo assembled transcripts paired with 1876 genome-derived DEGs. Gene Ontology (GO) terms distinguished the seed morphs: the terms translation and nucleosome assembly were overrepresented in DEGs higher in abundance in M+ dry seeds, whereas terms related to mRNA processing and transcription were overrepresented in DEGs higher in abundance in NM dry seeds. DEGs amongst these GO terms included ribosomal proteins and histones (higher in M+), RNA polymerase II subunits and related transcription and elongation factors (higher in NM). Expression of the inferred DEGs and other genes associated with seed maturation (e.g. those encoding late embryogenesis abundant proteins and transcription factors regulating seed development and maturation such as ABI3, FUS3, LEC1 and WRI1 homologs) were put in context with Arabidopsis thaliana seed maturation and indicated that M+ seeds may desiccate and mature faster than NM. The 1901 transcriptomic DEG set GO-terms had almost 90% overlap with the 2191 genome-derived DEG GO-terms. CONCLUSIONS: Whilst there was only modest overlap of DEGs identified in reference-free versus -dependent approaches, the resulting GO analysis was concordant in both approaches. The identified differences in dry seed transcriptomes suggest mechanisms underpinning previously identified contrasts between morphology and germination behaviour of M+ and NM seeds.
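The "strict consensus of three methods" mentioned in the abstract can be read as a set intersection: a gene counts as a DEG only if every method calls it. The following is a minimal sketch under that assumption, not the authors' pipeline. The function names and the toy gene identifiers are hypothetical, and the sketch assumes each tool's significant calls have already been exported as lists of identifiers.

```python
def strict_consensus(calls_by_method):
    """Return the IDs that every method reports as differentially expressed."""
    sets = [set(ids) for ids in calls_by_method.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


def paired_fraction(query_ids, reference_ids):
    """Fraction of query IDs that also occur in the reference set."""
    query = set(query_ids)
    if not query:
        return 0.0
    return len(query & set(reference_ids)) / len(query)


# Toy identifiers, purely illustrative (not data from the study).
calls = {
    "DESeq2": {"g1", "g2", "g3", "g4"},
    "edgeR": {"g2", "g3", "g4", "g5"},
    "NOISeq": {"g3", "g4", "g6"},
}
consensus = strict_consensus(calls)
share = paired_fraction(consensus, {"g4", "g9"})
print(sorted(consensus), share)
```

The same `paired_fraction` helper expresses the kind of comparison reported in the abstract (the share of de novo DEGs that pair with genome-derived DEGs), provided both sets use a common identifier space.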
Subject(s)
Brassicaceae/growth & development , Brassicaceae/genetics , Gene Expression Regulation, Plant , Seeds/growth & development , Seeds/genetics , Transcriptome , Gene Expression Profiling , Gene Ontology , Genome, Plant , Germination , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Plant Proteins/genetics
ABSTRACT
Coevolutionary interactions are thought to have spurred the evolution of key innovations and driven the diversification of much of life on Earth. However, the genetic and evolutionary basis of the innovations that facilitate such interactions remains poorly understood. We examined the coevolutionary interactions between plants (Brassicales) and butterflies (Pieridae), and uncovered evidence for an escalating evolutionary arms-race. Although gradual changes in trait complexity appear to have been facilitated by allelic turnover, key innovations are associated with gene and genome duplications. Furthermore, we show that the origins of both chemical defenses and of molecular counter adaptations were associated with shifts in diversification rates during the arms-race. These findings provide an important connection between the origins of biodiversity, coevolution, and the role of gene and genome duplications as a substrate for novel traits.
Subject(s)
Brassicaceae/genetics , Butterflies/genetics , Gene Duplication , Genome, Insect/genetics , Genome, Plant/genetics , Animals , Bayes Theorem , Biodiversity , Brassicaceae/classification , Brassicaceae/parasitology , Butterflies/classification , Butterflies/physiology , Evolution, Molecular , Gene Expression , Genes, Insect/genetics , Genes, Plant/genetics , Genetic Variation , Host-Parasite Interactions/genetics , Insect Proteins/genetics , Phylogeny , Plant Proteins/genetics , Species Specificity
ABSTRACT
The Brassicaceae, including Arabidopsis thaliana and Brassica crops, is unmatched among plants in its wealth of genomic and functional molecular data and has long served as a model for understanding gene, genome, and trait evolution. However, genome information from a phylogenetic outgroup that is essential for inferring directionality of evolutionary change has been lacking. We therefore sequenced the genome of the spider flower (Tarenaya hassleriana) from the Brassicaceae sister family, the Cleomaceae. By comparative analysis of the two lineages, we show that genome evolution following ancient polyploidy and gene duplication events affects reproductively important traits. We found an ancient genome triplication in Tarenaya (Th-α) that is independent of the Brassicaceae-specific duplication (At-α) and nested Brassica (Br-α) triplication. To showcase the potential of sister lineage genome analysis, we investigated the state of floral developmental genes and show that Brassica retains twice as many floral MADS (for minichromosome maintenance1, AGAMOUS, DEFICIENS and serum response factor) genes as Tarenaya, which likely contribute to morphological diversity in Brassica. We also performed synteny analysis of gene families that confer self-incompatibility in Brassicaceae and found that the critical serine receptor kinase receptor gene is derived from a lineage-specific tandem duplication. The T. hassleriana genome will facilitate future research toward elucidating the evolutionary history of Brassicaceae genomes.
Subject(s)
Brassicaceae/genetics , Evolution, Molecular , Genome, Plant/genetics , Quantitative Trait, Heritable , Flowers/genetics , Flowers/growth & development , Gene Expression Regulation, Plant , Genes, Plant/genetics , MADS Domain Proteins/genetics , MADS Domain Proteins/metabolism , Molecular Sequence Annotation , Phylogeny , Physical Chromosome Mapping , Polyploidy , Reproduction/genetics , Self-Incompatibility in Flowering Plants/genetics , Sequence Analysis, DNA , Synteny/genetics , Time Factors
ABSTRACT
PREMISE OF THE STUDY: Glucosinolates (GS) are a class of plant secondary metabolites that provide defense against herbivores and may play an important role in pollinator attraction. Through coevolution with plant-interacting organisms, glucosinolates have diversified into a variety of chemotypes through gene sub- and neofunctionalization. Polyploidy has been of major importance in the evolutionary history of these gene families and the development of chemically separate GS types. Here we study the effects of polyploidy in Tarenaya hassleriana (Cleomaceae) on the genes underlying GS biosynthesis. METHODS: We established putative orthologs of all gene families involved in GS biosynthesis through sequence comparison and their duplication method through calculation of synonymous substitution ratios, phylogenetic gene trees, and synteny comparison. We drew expression data from previously published work of the identified genes and compared expression in several tissues. KEY RESULTS: We show that the majority of gene family expansion in T. hassleriana has taken place through the retention of polyploid duplicates, together with tandem and transpositional duplicates. We also show that the large majority (>75%) is actively expressed either globally or in specific tissues. We show that MAM and CYP83 gene families, which are crucial to GS diversification in Brassicaceae, are also recruited into specific tissue expression pathways in Cleomaceae. CONCLUSIONS: Many GS genes have expanded through polyploidy, gene transposition duplication, and tandem duplication in Cleomaceae. Duplicate retention through these mechanisms is similar to A. thaliana, but based on the expression of GS genes, Cleomaceae-specific diversification of GS genes has taken place.
Subject(s)
Brassicaceae/genetics , Flowers/genetics , Genome, Plant/genetics , Glucosinolates/genetics , Magnoliopsida/genetics , Polyploidy , Arabidopsis/genetics , Biological Evolution , Biosynthetic Pathways , Gene Duplication
ABSTRACT
BACKGROUND: Recent advances in DNA sequencing techniques resulted in more than forty sequenced plant genomes representing a diverse set of taxa of agricultural, energy, medicinal and ecological importance. However, gene family curation is often only inferred from DNA sequence homology and lacks insights into evolutionary processes contributing to gene family dynamics. In a comparative genomics framework, we integrated multiple lines of evidence provided by gene synteny, sequence homology and protein-based Hidden Markov Modelling to extract homologous super-clusters composed of multi-domain resistance (R)-proteins of the NB-LRR type (for NUCLEOTIDE BINDING/LEUCINE-RICH REPEATS) that are involved in plant innate immunity. RESULTS: To assess the diversity of R-proteins within and between species, we screened twelve eudicot plant genomes including six major crops and found a total of 2,363 NB-LRR genes. Our curated R-protein set shows a 50% average for tandem duplicates and a 22% fraction of gene copies retained from ancient polyploidy events (ohnologs). We provide evidence for strong positive selection and show significant differences in molecular evolution rates (Ka/Ks-ratio) among tandem (mean = 1.59), ohnolog (mean = 1.36) and singleton (mean = 1.22) R-gene duplicates. To foster the process of gene-edited plant breeding, we report species-specific presence/absence of all 140 NB-LRR genes present in the model plant Arabidopsis and describe four distinct clusters of NB-LRR "gatekeeper" loci sharing syntenic orthologs across all analyzed genomes. CONCLUSION: By curating a near-complete set of multi-domain R-protein clusters at a eudicot-wide scale, our analysis offers significant insight into evolutionary dynamics underlying diversification of the plant innate immune system. Furthermore, our methods provide a blueprint for future efforts to identify and more rapidly clone functional NB-LRR genes from any plant species.
Subject(s)
Disease Resistance , Evolution, Molecular , Multigene Family , Plant Proteins/genetics , Plants/genetics , Protein Interaction Domains and Motifs , Cluster Analysis , Computational Biology , Conserved Sequence , Gene Duplication , Genetic Loci , Genome, Plant , Genomics , Molecular Sequence Annotation , Plant Proteins/chemistry , Plants/classification , Protein Interaction Domains and Motifs/genetics , Tandem Repeat Sequences
ABSTRACT
The comparative analysis of plant gene families in a phylogenetic framework has greatly accelerated due to advances in next generation sequencing. In this study, we provide an evolutionary analysis of the L-type lectin receptor kinase and L-type lectin domain proteins (L-type LecRKs and LLPs) that are considered as components in plant immunity, in the plant family Brassicaceae and related outgroups. We combine several lines of evidence provided by sequence homology, HMM-driven protein domain annotation, phylogenetic analysis, and gene synteny for large-scale identification of L-type LecRK and LLP genes within nine core-eudicot genomes. We show that both polyploidy and local duplication events (tandem duplication and gene transposition duplication) have played a major role in L-type LecRK and LLP gene family expansion in the Brassicaceae. We also find significant differences in rates of molecular evolution based on the mode of duplication. Additionally, we show that LLPs share a common evolutionary origin with L-type LecRKs and provide a consistent gene family nomenclature. Finally, we demonstrate that the largest and most diverse L-type LecRK clades are lineage-specific. Our evolutionary analyses of these plant immune components provide a framework to support future plant resistance breeding.
Subject(s)
Brassicaceae/genetics , Evolution, Molecular , Gene Duplication , Genome, Plant , Multigene Family , Plant Proteins/genetics , Protein Serine-Threonine Kinases/genetics , Arabidopsis Proteins/classification , Arabidopsis Proteins/genetics , Genes, Duplicate , Phylogeny , Protein Structure, Tertiary/genetics
ABSTRACT
An important component of plant evolution is the plethora of pathways producing more than 200,000 biochemically diverse specialized metabolites with pharmacological, nutritional and ecological significance. To unravel dynamics underlying metabolic diversification, it is critical to determine lineage-specific gene family expansion in a phylogenomics framework. However, robust functional annotation is often only available for core enzymes catalyzing committed reaction steps within few model systems. In a genome informatics approach, we extracted information from early-draft gene-space assemblies and non-redundant transcriptomes to identify protein families involved in isoprenoid biosynthesis. Isoprenoids comprise terpenoids with various roles in plant-environment interaction, such as pollinator attraction or pathogen defense. Combining lines of evidence provided by synteny, sequence homology and Hidden-Markov-Modelling, we screened 17 genomes including 12 major crops and found evidence for 1,904 proteins associated with terpenoid biosynthesis. Our terpenoid gene set contains evidence for 840 core terpene-synthases and 338 triterpene-specific synthases. We further identified 190 prenyltransferases, 39 isopentenyl-diphosphate isomerases as well as 278 and 219 proteins involved in mevalonate and methylerythritol pathways, respectively. Assessing the impact of gene and genome duplication to lineage-specific terpenoid pathway expansion, we illustrated key events underlying terpenoid metabolic diversification within 250 million years of flowering plant radiation. By quantifying Angiosperm-wide versatility and phylogenetic relationships of pleiotropic gene families in terpenoid modular pathways, our analysis offers significant insight into evolutionary dynamics underlying diversification of plant secondary metabolism. Furthermore, our data provide a blueprint for future efforts to identify and more rapidly clone terpenoid biosynthetic genes from any plant species.
Subject(s)
Genome, Plant , Magnoliopsida/genetics , Multigene Family , Phylogeny , Plant Proteins/genetics , Terpenes/metabolism , Alkyl and Aryl Transferases/genetics , Alkyl and Aryl Transferases/metabolism , Biological Evolution , Carbon-Carbon Double Bond Isomerases/genetics , Carbon-Carbon Double Bond Isomerases/metabolism , Dimethylallyltranstransferase/genetics , Dimethylallyltranstransferase/metabolism , Hemiterpenes , Isoenzymes/genetics , Isoenzymes/metabolism , Magnoliopsida/classification , Magnoliopsida/metabolism , Metabolic Networks and Pathways/genetics , Metabolomics , Mevalonic Acid/metabolism , Molecular Sequence Annotation , Plant Proteins/metabolism
ABSTRACT
Plants share a common history of successive whole-genome duplication (WGD) events retaining genomic patterns of duplicate gene copies (ohnologs) organized in conserved syntenic blocks. Duplication was often proposed to affect the origin of novel traits during evolution. However, genetic evidence linking WGD to pathway diversification is scarce. We show that WGD and tandem duplication (TD) accelerated genetic versatility of plant secondary metabolism, exemplified with the glucosinolate (GS) pathway in the mustard family. GS biosynthesis is a well-studied trait, employing at least 52 biosynthetic and regulatory genes in the model plant Arabidopsis. In a phylogenomics approach, we identified 67 GS loci in Aethionema arabicum of the tribe Aethionemeae, sister group to all mustard family members. All but one of the Arabidopsis GS gene families evolved orthologs in Aethionema and all but one of the orthologous sequence pairs exhibit synteny. The 45% fraction of duplicates among all protein-coding genes in Arabidopsis was increased to 95% and 97% for Arabidopsis and Aethionema GS pathway inventory, respectively. Compared with the 22% average for all protein-coding genes in Arabidopsis, 52% and 56% of Aethionema and Arabidopsis GS loci align to ohnolog copies dating back to the last common WGD event. Although 15% of all Arabidopsis genes are organized in tandem arrays, 45% and 48% of GS loci in Arabidopsis and Aethionema descend from TD, respectively. We describe a sequential combination of TD and WGD events driving gene family extension, thereby expanding the evolutionary playground for functional diversification and thus potential novelty and success.
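The percentages quoted above can be expressed as fold enrichments of GS loci over the Arabidopsis genome-wide background. This is only an illustrative piece of arithmetic on the figures stated in the abstract. The helper name `fold_enrichment` is hypothetical, and using the Arabidopsis background as the denominator for Aethionema is an assumption made here for comparison, since the abstract gives no Aethionema genome-wide values.

```python
def fold_enrichment(pathway_pct, background_pct):
    """Ratio of a pathway-level percentage to a genome-wide background percentage."""
    if background_pct <= 0:
        raise ValueError("background percentage must be positive")
    return pathway_pct / background_pct


# (Arabidopsis genome-wide background %, GS-pathway % per species), as quoted in the abstract.
figures = {
    "any duplicate": (45, {"Arabidopsis": 95, "Aethionema": 97}),
    "ohnolog (WGD)": (22, {"Arabidopsis": 56, "Aethionema": 52}),
    "tandem duplicate": (15, {"Arabidopsis": 45, "Aethionema": 48}),
}

for category, (background, by_species) in figures.items():
    for species, pct in by_species.items():
        ratio = fold_enrichment(pct, background)
        print(f"{category:17s} {species:12s} {ratio:.2f}x")
```

On these figures, tandem duplicates show the largest relative excess among GS loci (about threefold), and ohnologs and duplicates overall are each more than twofold above background.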