ABSTRACT
Poplar (Populus) is a well-established model system for tree genomics and molecular breeding, and hybrid poplar is widely used in forest plantations. However, distinguishing its diploid homologous chromosomes is difficult, which complicates advanced functional studies of specific alleles. In this study, we applied a trio-binning design and PacBio high-fidelity long-read sequencing to obtain haplotype-phased telomere-to-telomere genome assemblies for the 2 parents of the well-studied F1 hybrid "84K" (Populus alba × Populus tremula var. glandulosa). Almost all chromosomes, including the telomeres and centromeres, were completely assembled for each haplotype subgenome, apart from 2 small gaps on one chromosome. By incorporating information from these haplotype assemblies and extensive RNA-seq data, we analyzed gene expression patterns between the 2 subgenomes and between alleles. No transcription bias was detected at the subgenome level, but extensive expression differences were detected between alleles. We developed machine-learning (ML) models to predict allele-specific expression (ASE) with high accuracy and identified the underlying genome features that most strongly influence ASE. One of our models, with 15 predictor variables, achieved 77% accuracy on the training set and 74% accuracy on the testing set. The ML models identified gene body CHG methylation, sequence divergence, and transposon occupancy both upstream and downstream of alleles as important factors for ASE. Our haplotype-phased genome assemblies and ML strategy highlight an avenue for functional studies in Populus and provide additional tools for studying ASE and heterosis in hybrids.
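As a rough illustration of what "expression differences between alleles" can mean in practice, the sketch below labels a gene as allele-biased from the log2 ratio of its two allelic expression values. The function names, the pseudocount of 1, and the two-fold cutoff are assumptions chosen for illustration; the abstract does not state how ASE was called.

```python
import math


def allelic_log2_ratio(expr_a, expr_b, pseudocount=1.0):
    """log2 ratio of allele A to allele B expression.

    The pseudocount (an assumed value) avoids division by zero
    when one allele is not expressed.
    """
    return math.log2((expr_a + pseudocount) / (expr_b + pseudocount))


def classify_ase(expr_a, expr_b, min_abs_log2=1.0):
    """Label an allele pair as biased toward A, biased toward B, or balanced.

    min_abs_log2=1.0 corresponds to a two-fold difference; this
    cutoff is hypothetical, not taken from the study.
    """
    ratio = allelic_log2_ratio(expr_a, expr_b)
    if ratio >= min_abs_log2:
        return "A-biased"
    if ratio <= -min_abs_log2:
        return "B-biased"
    return "balanced"


# Toy expression values (e.g. normalized read counts) for three genes.
for gene, (a, b) in {"g1": (30, 5), "g2": (10, 12), "g3": (0, 7)}.items():
    print(gene, classify_ase(a, b))
```

Labels of this kind could then serve as the response variable for a classifier trained on genomic features such as methylation or sequence divergence.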
Subjects
Alleles , Plant Genome , Populus , Populus/genetics , Plant Genome/genetics , Plant Gene Expression Regulation , Haplotypes/genetics , Genetic Hybridization , Machine Learning
ABSTRACT
Long non-coding RNAs (lncRNAs) play essential roles in various biological processes, such as chromatin remodeling, post-transcriptional regulation, and epigenetic modifications. Despite their critical functions in regulating plant growth, root development, and seed dormancy, the identification of plant lncRNAs remains a challenge due to the scarcity of specific and extensively tested identification methods. Most mainstream machine learning-based methods used for plant lncRNA identification were initially developed using human or other animal datasets, and their accuracy and effectiveness in predicting plant lncRNAs have not been fully evaluated or exploited. To overcome this limitation, we retrained several models, including CPAT, PLEK, and LncFinder, using plant datasets and compared their performance with mainstream lncRNA prediction tools such as CPC2, CNCI, RNAplonc, and LncADeep. Retraining these models significantly improved their performance, and two of the retrained models, LncFinder-plant and CPAT-plant, alongside their ensemble, emerged as the most suitable tools for plant lncRNA identification. This underscores the importance of model retraining in tackling the challenges associated with plant lncRNA identification. Finally, we developed a pipeline (Plant-LncPipe) that incorporates an ensemble of the two best-performing models and covers the entire data analysis process, including read mapping, transcript assembly, and lncRNA identification, classification, and origin analysis, for the efficient identification of lncRNAs in plants. Plant-LncPipe is available at: https://github.com/xuechantian/Plant-LncRNA-pipline.
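One simple way to combine two classifiers is a consensus rule: keep only the transcripts that both tools label as non-coding. The sketch below assumes each tool's output has already been parsed into a dictionary of transcript ID to label; the function name, the label strings, and the consensus rule itself are illustrative assumptions, not a description of how Plant-LncPipe actually builds its ensemble.

```python
def consensus_lncrna(calls_a, calls_b, noncoding_label="noncoding"):
    """Return sorted transcript IDs that both classifiers call non-coding.

    calls_a and calls_b map transcript ID -> predicted label.
    Transcripts scored by only one classifier are excluded.
    """
    shared_ids = calls_a.keys() & calls_b.keys()
    return sorted(
        tid
        for tid in shared_ids
        if calls_a[tid] == noncoding_label and calls_b[tid] == noncoding_label
    )


# Toy predictions from two hypothetical classifiers.
tool_1 = {"tx1": "noncoding", "tx2": "coding", "tx3": "noncoding", "tx4": "noncoding"}
tool_2 = {"tx1": "noncoding", "tx2": "noncoding", "tx3": "noncoding", "tx5": "noncoding"}

print(consensus_lncrna(tool_1, tool_2))
```

A consensus rule trades sensitivity for precision; a union or a score-averaging rule would shift that balance the other way.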
ABSTRACT
Wood decay resistance (WDR) is a key determinant of the value of wood. Many trees of the Lauraceae have exceptional WDR, as evidenced by their use in ancient royal palace buildings in China. However, the genetics of WDR remain elusive. Here, through comparative genomics, we reveal unique characteristics related to the high WDR of Lauraceae trees. We present a 1.27-Gb chromosome-level assembly for Lindera megaphylla (Lauraceae). Comparative genomics integrating major groups of angiosperms revealed that Lauraceae species extensively share gene microsynteny associated with the biosynthesis of specialized metabolites, such as isoquinoline alkaloids, flavonoids, lignins, and terpenoids, which play significant roles in WDR. In Lauraceae genomes, tandem and proximal duplications (TD/PD) significantly expanded the coding space of key enzymes in biosynthesis pathways related to WDR, which may enhance the decay resistance of wood by increasing the accumulation of these compounds. Among Lauraceae species, genes of WDR-related biosynthesis pathways showed remarkable expansion by TD/PD and carried unique and conserved motifs in their promoter and protein sequences, suggesting that conserved gene collinearity, gene expansion, and gene regulation support the high WDR. Our study thus reveals genomic profiles related to biochemical transitions among major plant groups and the genomic basis of WDR in the Lauraceae.
ABSTRACT
Terpenes and terpenoids are key natural compounds for plant defense, development, and the composition of plant oils. The synthesis and accumulation of a myriad of volatile terpenoid compounds in oil-producing plants may dramatically alter the quality and flavor of their oils, which gives these plants great commercial value. Terpene synthases (TPSs) are important enzymes responsible for terpene diversity. Investigating the differentiation of the TPS gene family could provide valuable theoretical support for the genetic improvement of oil-producing plants. While the origin and function of TPS genes have been extensively studied, it remains uncertain whether the initial gene fusion event occurred in plants or in microbes. Furthermore, a comprehensive exploration of TPS gene differentiation is still lacking. Here, phylogenetic analysis revealed that the fusion of the TPS gene likely occurred in the ancestor of land plants, following the acquisition of individual C- and N-terminal domains. Potential mutual transfer of TPS genes was observed among microbes and plants. Gene synteny analysis disclosed a differential divergence pattern between the TPS-c and TPS-e/f subfamilies, which are involved in primary metabolism, and the TPS-a/b/d/g/h subfamilies, which are crucial for secondary metabolites. Biosynthetic gene cluster (BGC) analysis suggested a correlation between lineage divergence and potential natural selection in structuring terpene diversity. This study provides fresh perspectives on the origin and evolution of the TPS gene family.
ABSTRACT
Coriaria nepalensis Wall. (Coriariaceae) is a nitrogen-fixing shrub which forms root nodules with the actinomycete Frankia. Oils and extracts of C. nepalensis have been reported to be bacteriostatic and insecticidal, and C. nepalensis bark provides a valuable tannin resource. Here, by combining PacBio HiFi sequencing and Hi-C scaffolding techniques, we generated a haplotype-resolved chromosome-scale genome assembly for C. nepalensis. This genome assembly is approximately 620 Mb in size with a contig N50 of 11 Mb, with 99.9% of the total assembled sequences anchored to 40 pseudochromosomes. We predicted 60,862 protein-coding genes of which 99.5% were annotated from databases. We further identified 939 tRNAs, 7,297 rRNAs, and 982 ncRNAs. The chromosome-scale genome of C. nepalensis is expected to be a significant resource for understanding the genetic basis of root nodulation with Frankia, toxicity, and tannin biosynthesis.
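For readers unfamiliar with the contig N50 statistic quoted above, it is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch with toy contig lengths (not data from this assembly):

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths in arbitrary units; total is 20, so the half-way
# point (10) is reached after the contigs of length 6 and 5.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))  # 5
```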
Subjects
Plant Genome , Magnoliopsida , Haplotypes , Magnoliopsida/genetics , Molecular Sequence Annotation , Phylogeny , Plant Chromosomes
ABSTRACT
Xanthoceras sorbifolium (yellowhorn) is a woody oil plant with strong stress resistance and excellent oil characteristics. Yellowhorn oil can be used as a biofuel and as an edible oil with high nutritional and medicinal value. However, genetic studies on yellowhorn are still in their infancy, and fundamental biological questions regarding its very long-chain fatty acid (VLCFA) biosynthesis pathway remain largely unanswered. In this study, we reconstructed the VLCFA biosynthesis pathway and annotated 137 genes encoding relevant enzymes. We identified four oleosin genes that package triacylglycerols (TAGs) and are specifically expressed in fruits, likely playing key roles in yellowhorn oil production. In particular, by examining a time-ordered gene co-expression network (TO-GCN) constructed from fruit and leaf development, we identified key enzymatic genes and potential regulatory transcription factors involved in VLCFA synthesis. In fruits, we further inferred a hierarchical regulatory network with MYB-related (XS03G0296800) and B3 (XS02G0057600) transcription factors as top-tier regulators, providing clues to the factors controlling carbon flux into fatty acids. Our results offer new insights into key genes and transcriptional regulators governing fatty acid production in yellowhorn, laying the foundation for efforts to optimize oil content and fatty acid composition. Moreover, the gene expression patterns and putative regulatory relationships identified here will inform metabolic engineering and molecular breeding approaches tailored to meet biofuel and bioproduct demands.
ABSTRACT
The genus Rhododendron (Ericaceae), with more than 1000 species that are highly diverse in flower color, offers distinct ornamental value and serves as a model system for flower color studies. Here, we investigated the divergence between two parental species with different flower colors that are widely used in azalea breeding. A gapless genome assembly was generated for the yellow-flowered azalea, Rhododendron molle. Comparative genomics found that recent proliferation of long terminal repeat retrotransposons (LTR-RTs), especially Gypsy elements, has resulted in a 125 Mb (19%) genome size increase in species-specific regions, as well as large numbers of dispersed gene duplicates (13,402) and pseudogenes (17,437). Metabolomic assessment revealed that yellow flower coloration is attributable to dynamic changes in carotenoid/flavonol biosynthesis and chlorophyll degradation. Time-ordered gene co-expression networks (TO-GCNs) and their comparison corroborated the metabolomic findings and uncovered the specific gene regulatory changes underpinning the distinct flower pigmentation. B3 and ERF TFs were found to dominate the regulation of carotenoid/flavonol-based pigmentation in R. molle, while WRKY, ERF, WD40, C2H2, and NAC TFs collectively regulated anthocyanin-based pigmentation in the red-flowered R. simsii. This study employed a multi-omics strategy to disentangle the complex divergence between two important azaleas and provides a reference for further functional genetics and molecular breeding.
ABSTRACT
Sour (wild) jujube fruits and dried seeds are popular foods worldwide. In this study, we report a high-quality genome assembly of sour jujube (Ziziphus jujuba Mill. var. spinosa), with a size of 406 Mbp and a scaffold N50 of 30.3 Mbp. The genome experienced only the γ hexaploidization event, with no recent genome duplication. Population structure analysis identified four jujube subgroups (two domesticated ones, i.e., D1 in West China and D2 in East/Southeast China, plus semi-wild and wild), which underwent a significant decline in effective population size during the Last Glacial Period. Selection signatures specific to three subgroups were discovered, such as strong peaks on chromosome 3 in D1, chromosome 1 in D2, and chromosome 4 in the wild subgroup. Transcriptomic analysis confirmed that genes under the most significant selection on chromosome 4 in the wild subgroup are involved in fruit variation among jujube accessions. Our study offers novel insights into jujube population structure and domestication and provides valuable genomic resources for the future improvement of stress response and fruit flavor in jujube.
ABSTRACT
Polyploidization plays a key role in plant evolution, but the forces driving the fate of homoeologs in polyploid genomes, i.e., paralogs resulting from a whole-genome duplication (WGD) event, remain to be elucidated. Here, we present a chromosome-scale genome assembly of tetraploid scarlet sage (Salvia splendens), one of the most diverse ornamental plants. We found evidence for three WGD events following an older WGD event shared by most eudicots (the γ event). A comprehensive, spatiotemporal, genome-wide analysis of homoeologs from the most recent WGD unveiled expression asymmetries, which could be associated with genomic rearrangements, transposable element proximity discrepancies, coding sequence variation, selection pressure, and transcription factor binding site differences. The observed differences between homoeologs may reflect the first step toward sub- and/or neofunctionalization. This assembly provides a powerful tool for understanding WGD and gene and genome evolution and is useful in developing functional genomics and genetic engineering strategies for scarlet sage and other Lamiaceae species.
ABSTRACT
Ginger (Zingiber officinale) is one of the most valued spice plants worldwide; it is prized for its culinary and folk medicinal applications and is therefore of high economic and cultural importance. Here, we present a haplotype-resolved, chromosome-scale assembly for diploid ginger anchored to 11 pseudochromosome pairs with a total length of 3.1 Gb. Remarkable structural variation was identified between haplotypes, and two inversions larger than 15 Mb on chromosome 4 may be associated with ginger infertility. We performed a comprehensive, spatiotemporal, genome-wide analysis of allelic expression patterns, revealing that most alleles are coordinately expressed. The alleles that exhibited the largest differences in expression showed closer proximity to transposable elements, greater coding sequence divergence, more relaxed selection pressure, and more transcription factor binding site differences. We also predicted the transcription factors potentially regulating 6-gingerol biosynthesis. Our allele-aware assembly provides a powerful platform for future functional genomics, molecular breeding, and genome editing in ginger.
ABSTRACT
Azaleas (Ericaceae) are among the most diverse ornamental plants, renowned for their cultural and economic importance. We present a chromosome-scale genome assembly for Rhododendron simsii, the primary ancestor of azalea cultivars. Genome analyses unveil the remnants of an ancient whole-genome duplication preceding the radiation of most Ericaceae, likely contributing to the genomic architecture of flowering time. Small-scale gene duplications contribute to the expansion of gene families involved in azalea pigment biosynthesis. We reconstruct entire metabolic pathways for anthocyanins and carotenoids and their potential regulatory networks by detailed analysis of time-ordered gene co-expression networks. MYB, bHLH, and WD40 transcription factors may collectively regulate anthocyanin accumulation in R. simsii, particularly at the initial stages of flower coloration, with WRKY transcription factors controlling progressive flower coloring at later stages. This work provides a cornerstone for understanding the genetics governing flowering time and coloration and could accelerate selective breeding in azalea.