ABSTRACT
Mutant populations are crucial for functional genomics and for discovering novel traits for crop breeding. Sorghum, a drought- and heat-tolerant C4 species, requires a large-scale, annotated, and sequenced mutant resource to enhance crop improvement through functional genomics research. Here, we report a large-scale sequenced sorghum mutant population, generated in the inbred line BTx623, with 9.5 million ethyl methane sulfonate (EMS)-induced mutations that cover 98% of sorghum's annotated genes. Remarkably, a total of 610 320 mutations within the promoter and enhancer regions of 18 000 and 11 790 genes, respectively, can be leveraged for novel research on cis-regulatory elements. A comparison of the distribution of mutations in the large-scale mutant library and the sorghum association panel (SAP) provides insights into the influence of selection. EMS-induced mutations appeared to be random across different regions of the genome, without significant enrichment in different sections of a gene, including the 5'-UTR, gene body, and 3'-UTR. In contrast, there was low variation density in the coding and UTR regions in the SAP. Based on the Ka/Ks value, the mutant library (~1) experienced little selection, unlike the SAP (0.40), which has been strongly selected through breeding. All mutation data are publicly searchable through SorbMutDB (https://www.depts.ttu.edu/igcast/sorbmutdb.php) and SorghumBase (https://sorghumbase.org/). This large-scale sequence-indexed sorghum mutant population is a crucial resource that enriches the sorghum gene pool with novel diversity and a highly valuable tool for the Poaceae family that will advance plant biology research and crop breeding.
Subjects
Sorghum , Sorghum/genetics , Reverse Genetics , Plant Breeding , Mutation , Phenotype , Edible Grain/genetics , Ethyl Methanesulfonate/pharmacology , Genome, Plant/genetics
ABSTRACT
Cuticular wax (CW) is the first defensive barrier of plants: it forms a waterproof layer, protects the plant from desiccation, and defends against insects, pathogens, and UV radiation. Sorghum, an important grass crop with high heat and drought tolerance, exhibits a much higher wax load than other grasses and the model plant Arabidopsis. In this study, we explored the regulation of sorghum CW biosynthesis using a bloomless mutant. The CW on leaf sheaths of the bloomless 41 (bm41) mutant showed significantly reduced very long-chain fatty acids (VLCFAs), triterpenoids, alcohols, and other wax components, with an overall 86% decrease in total wax content compared with the wild type. Notably, the 28-carbon and 30-carbon VLCFAs were decreased in the mutants. Using bulk segregant analysis, we identified the causal gene of the bloomless phenotype as a leucine-rich repeat transmembrane protein kinase. Transcriptome analysis of the wild-type and bm41 mutant leaf sheaths revealed BM41 as a positive regulator of lipid biosynthesis and steroid metabolism. BM41 may regulate CW biosynthesis by regulating the expression of the gene encoding 3-ketoacyl-CoA synthase 6. Identification of BM41 as a new regulator of CW biosynthesis provides fundamental knowledge for improving grass crops' heat and drought tolerance by increasing CW.
Subjects
Plant Proteins , Sorghum , Waxes , Waxes/metabolism , Sorghum/genetics , Sorghum/metabolism , Sorghum/physiology , Plant Proteins/metabolism , Plant Proteins/genetics , Gene Expression Regulation, Plant , Plant Epidermis/metabolism , Plant Epidermis/genetics
ABSTRACT
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
Subjects
Genome, Plant/genetics , High-Throughput Nucleotide Sequencing/methods , Single Molecule Imaging/methods , Zea mays/genetics , Centromere/genetics , Chromosomes, Plant/genetics , Contig Mapping , Crops, Agricultural/genetics , DNA Transposable Elements/genetics , DNA, Intergenic/genetics , Genes, Plant/genetics , Molecular Sequence Annotation , Optics and Photonics , Phylogeny , RNA, Messenger/analysis , RNA, Messenger/genetics , Reference Standards , Sorghum/genetics
ABSTRACT
Gramene (http://www.gramene.org), a knowledgebase founded on comparative functional analyses of genomic and pathway data for model plants and major crops, supports agricultural researchers worldwide. The resource is committed to open access and reproducible science based on the FAIR data principles. Since the last NAR update, we made nine releases; doubled the genome portal's content; expanded curated genes, pathways and expression sets; and implemented the Domain Informational Vocabulary Extraction (DIVE) algorithm for extracting gene function information from publications. The current release, #63 (October 2020), hosts 93 reference genomes comprising over 3.9 million genes in 122 947 families with orthologous and paralogous classifications. Plant Reactome portrays pathway networks using a combination of manual biocuration in rice (320 reference pathways) and orthology-based projections to 106 species. The Reactome platform facilitates comparison between reference and projected pathways, gene expression analyses and overlays of gene-gene interactions. Gramene integrates ontology-based protein structure-function annotation; information on genetic, epigenetic, expression, and phenotypic diversity; and gene functional annotations extracted from plant-focused journals using DIVE. We train plant researchers in biocuration of genes and pathways; host curated maize gene structures as tracks in the maize genome browser; and integrate curated rice genes and pathways in the Plant Reactome.
Subjects
Databases, Genetic , Gene Expression Regulation, Plant , Genome, Plant , Genomics/methods , Plant Proteins/genetics , Plants/genetics , Crops, Agricultural , DNA Transposable Elements , Gene Duplication , Gene Ontology , Gene Regulatory Networks , Internet , Knowledge Bases , Metabolic Networks and Pathways , Molecular Sequence Annotation , Oryza/genetics , Oryza/metabolism , Plant Proteins/metabolism , Plants/classification , Plants/metabolism , Polyploidy , Protein Interaction Mapping , Software , Zea mays/genetics , Zea mays/metabolism
ABSTRACT
MAIN CONCLUSION: SorghumBase provides a community portal that integrates genetic, genomic, and breeding resources for sorghum germplasm improvement. Public research and development in agriculture rely on proper data and resource sharing within stakeholder communities. For plant breeders, agronomists, molecular biologists, geneticists, and bioinformaticians, centralizing desirable data into a user-friendly hub for crop systems is essential for successful collaborations and breakthroughs in germplasm development. Here, we present the SorghumBase web portal (https://www.sorghumbase.org), a resource for the sorghum research community. SorghumBase hosts a wide range of sorghum genomic information in a modular framework, built with open-source software, to provide a sustainable platform. This initial release of SorghumBase includes: (1) five sorghum reference genome assemblies in a pan-genome browser; (2) genetic variant information for natural diversity panels and ethyl methanesulfonate (EMS)-induced mutant populations; (3) search interface and integrated views of various data types; (4) links supporting interconnectivity with other repositories including genebank, QTL, and gene expression databases; and (5) a content management system to support access to community news and training materials. SorghumBase offers sorghum investigators improved data collation and access that will facilitate the growth of a robust research community to support genomics-assisted breeding.
Subjects
Sorghum , Databases, Genetic , Edible Grain , Genome, Plant/genetics , Genomics , Internet , Plant Breeding , Sorghum/genetics
ABSTRACT
SUMMARY: With the advance of next-generation sequencing technologies and reductions in the costs of these techniques, bulked segregant analysis (BSA) has become not only a powerful tool for mapping quantitative trait loci but also a useful way to identify causal gene mutations underlying phenotypes of interest. However, due to the presence of background mutations and errors in sequencing, genotyping, and reference assembly, it is often difficult to distinguish true causal mutations from background mutations. In this study, we developed the BSAseq workflow, which includes an automated bioinformatics analysis pipeline with a probabilistic model for estimating the linked region (the region linked to the causal mutation) and an interactive Shiny web application for visualizing the results. We deeply sequenced a sorghum male-sterile parental line (ms8) to capture the majority of background mutations in our bulked F2 data. We applied the workflow to 11 bulked sorghum F2 populations and 1 rice F2 population and identified the true causal mutation in each population. The workflow is intuitive and straightforward, facilitating its adoption by users without bioinformatics analysis skills. We anticipate that the BSAseq workflow will be broadly applicable to the identification of causal mutations for many phenotypes of interest. AVAILABILITY AND IMPLEMENTATION: BSAseq is freely available on https://www.sciapps.org/page/bsa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
High-Throughput Nucleotide Sequencing , Quantitative Trait Loci , Internet , Mutation , Sorghum/genetics , Workflow
ABSTRACT
The maize (Zea mays) mutant Unstable factor for orange1 (Ufo1) has been implicated in the epigenetic modifications of pericarp color1 (p1), which regulates the production of the flavonoid pigments phlobaphenes. Here, we show that the ufo1 gene maps to a genetically recalcitrant region near the centromere of chromosome 10. Transcriptome analysis of Ufo1-1 mutant and wild-type plants identified a candidate gene in the mapping region using a comparative sequence-based approach. The candidate gene, GRMZM2G053177, is overexpressed by >45-fold in multiple tissues of Ufo1-1, explaining the dominance of Ufo1-1 and its phenotypes. In the mutant stock, GRMZM2G053177 has a unique transcript originating within a CACTA transposon inserted in its first intron, and it is missing the first four codons of the wild-type transcript. GRMZM2G053177 expression is regulated by the DNA methylation status of the CACTA transposon, explaining the incomplete penetrance and poor expressivity of Ufo1-1. Transgenic overexpression lines of GRMZM2G053177 (Ufo1-1) phenocopy the p1-induced pigmentation in coleoptiles, tassels, leaf sheaths, husks, pericarps, and cob glumes. Transcriptome analysis of Ufo1 versus wild-type tissues revealed changes in several pathways related to abiotic and biotic stress. Thus, this study addresses the enigma of Ufo1 identity in maize, which had gone unsolved for more than 50 years.
Subjects
Plant Proteins/metabolism , Zea mays/metabolism , DNA Methylation/genetics , DNA Methylation/physiology , DNA Transposable Elements/genetics , Epigenesis, Genetic/genetics , Gene Expression Regulation, Plant/genetics , Phenotype , Plant Proteins/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Zea mays/genetics
ABSTRACT
Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversity; and pathway associations. Gramene's Plant Reactome provides a knowledgebase of cellular-level plant pathway networks. Specifically, it uses curated rice reference pathways to derive pathway projections for an additional 66 species based on gene orthology, and facilitates display of gene expression, gene-gene interactions, and user-defined omics data in the context of these pathways. As a community portal, Gramene integrates best-of-class software and infrastructure components including the Ensembl genome browser, Reactome pathway browser, and Expression Atlas widgets, and undergoes periodic data and software upgrades. Via powerful, intuitive search interfaces, users can easily query across various portals and interactively analyze search results by clicking on diverse features such as genomic context, highly augmented gene trees, gene expression anatomograms, associated pathways, and external informatics resources. All data in Gramene are accessible through both visual and programmatic interfaces.
Subjects
Databases, Genetic , Gene Expression Regulation, Plant , Genomics/methods , Knowledge Bases , Plants/genetics , Epigenesis, Genetic , Gene Ontology , Genetic Research , Genetic Variation , Genome, Plant , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Plants/metabolism , Software , User-Computer Interface
ABSTRACT
Sorghum (Sorghum bicolor) is a versatile C4 crop and a model for research in family Poaceae. High-quality genome sequence is available for the elite inbred line BTx623, but functional validation of genes remains challenging due to the limited genomic and germplasm resources available for comprehensive analysis of induced mutations. In this study, we generated 6400 pedigreed M4 mutant pools from EMS-mutagenized BTx623 seeds through single-seed descent. Whole-genome sequencing of 256 phenotyped mutant lines revealed >1.8 million canonical EMS-induced mutations, affecting >95% of genes in the sorghum genome. The vast majority (97.5%) of the induced mutations were distinct from natural variations. To demonstrate the utility of the sequenced sorghum mutant resource, we performed reverse genetics to identify eight genes potentially affecting drought tolerance, three of which had allelic mutations and two of which exhibited exact cosegregation with the phenotype of interest. Our results establish that a large-scale resource of sequenced pedigreed mutants provides an efficient platform for functional validation of genes in sorghum, thereby accelerating sorghum breeding. Moreover, findings made in sorghum could be readily translated to other members of the Poaceae via integrated genomics approaches.
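As an illustration of what "canonical EMS-induced mutations" means in practice, the following is a minimal sketch that flags variants matching the G-to-A and C-to-T transitions conventionally attributed to EMS. The abstract does not describe the authors' filtering pipeline, so the `Variant` record, the function names, and the example coordinates are all hypothetical.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal SNP record (not the authors' data model)."""
    chrom: str
    pos: int
    ref: str
    alt: str


# EMS alkylates guanine, so induced changes are predominantly G/C-to-A/T
# transitions: G>A on one strand, seen as C>T on the other.
CANONICAL_EMS = {("G", "A"), ("C", "T")}


def is_canonical_ems(variant: Variant) -> bool:
    """Return True if the SNP is a G>A or C>T transition."""
    return (variant.ref.upper(), variant.alt.upper()) in CANONICAL_EMS


def canonical_fraction(variants: Iterable[Variant]) -> float:
    """Fraction of variants that are canonical EMS transitions (0.0 if empty)."""
    variants = list(variants)
    if not variants:
        return 0.0
    return sum(is_canonical_ems(v) for v in variants) / len(variants)


# Made-up example calls, for illustration only.
example = [
    Variant("Chr01", 1200, "G", "A"),  # canonical
    Variant("Chr01", 5300, "C", "T"),  # canonical
    Variant("Chr02", 880, "A", "G"),   # not canonical
    Variant("Chr03", 42, "T", "C"),    # not canonical
]
```

A real analysis would also need to subtract background variation already present in the parental line, which this sketch does not attempt.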
Subjects
Sorghum/genetics , Genome, Plant/genetics , Genotype , Mutation/genetics , Phenotype , Plants, Genetically Modified/genetics , Plants, Genetically Modified/physiology , Poaceae/genetics , Poaceae/physiology , Sorghum/physiology
ABSTRACT
As in other cereal crops, the panicles of sorghum (Sorghum bicolor (L.) Moench) comprise two types of floral spikelets (grass flowers). Only sessile spikelets (SSs) are capable of producing viable grains, whereas pedicellate spikelets (PSs) cease development after initiation and eventually abort. Consequently, grain number per panicle (GNP) is lower than the total number of flowers produced per panicle. The mechanism underlying this differential fertility is not well understood. To investigate this issue, we isolated a series of ethyl methane sulfonate (EMS)-induced multiseeded (msd) mutants that result in full spikelet fertility, effectively doubling GNP. Previously, we showed that MSD1 is a TCP (Teosinte branched/Cycloidea/PCF) transcription factor that regulates jasmonic acid (JA) biosynthesis, and ultimately floral sex organ development. Here, we show that MSD2 encodes a lipoxygenase (LOX) that catalyzes the first committed step of JA biosynthesis. Further, we demonstrate that MSD1 binds to the promoters of MSD2 and other JA pathway genes. Together, these results show that a JA-induced module regulates sorghum panicle development and spikelet fertility. The findings advance our understanding of inflorescence development and could lead to new strategies for increasing GNP and grain yield in sorghum and other cereal crops.
Subjects
Cyclopentanes/metabolism , Fertility , Oxylipins/metabolism , Plant Development , Sorghum/physiology , Amino Acid Sequence , Binding Sites , Edible Grain , Gene Expression Profiling , Gene Expression Regulation, Plant , Metabolic Networks and Pathways , Phylogeny , Plant Proteins/chemistry , Plant Proteins/genetics , Plant Proteins/metabolism , Promoter Regions, Genetic , Protein Binding , Sorghum/classification , Transcription Factors/metabolism
ABSTRACT
Grain number per panicle is an important component of grain yield in sorghum (Sorghum bicolor (L.)) and other cereal crops. Previously, we reported that mutations in multi-seeded 1 (MSD1) and MSD2 genes result in a two-fold increase in grain number per panicle due to the restoration of the fertility of the pedicellate spikelets, which invariably abort in natural sorghum accessions. Here, we report the identification of another gene, MSD3, which is also involved in the regulation of grain numbers in sorghum. Four bulked F2 populations from crosses between BTx623 and each of the independent msd mutants p6, p14, p21, and p24 were sequenced to 20× coverage of the whole genome on a HiSeq 2000 system. Bioinformatic analyses of the sequence data showed that one gene, Sorbi_3001G407600, harbored homozygous mutations in all four populations. This gene encodes a plastidial ω-3 fatty acid desaturase that catalyzes the conversion of linoleic acid (18:2) to linolenic acid (18:3), a substrate for jasmonic acid (JA) biosynthesis. The msd3 mutants had reduced levels of linolenic acid in both leaves and developing panicles that in turn decreased the levels of JA. Furthermore, the msd3 panicle phenotype was reversed by treatment with methyl-JA (MeJA). Our characterization of MSD1, MSD2, and now MSD3 demonstrates that JA-regulated processes are critical to the msd phenotype. The identification of the MSD3 gene reveals a new target that could be manipulated to increase grain number per panicle in sorghum, and potentially other cereal crops, through the genomic editing of MSD3 functional orthologs.
Subjects
Crops, Agricultural/enzymology , Cyclopentanes/metabolism , Fatty Acid Desaturases/genetics , Fatty Acid Desaturases/metabolism , Oxylipins/metabolism , Sorghum/enzymology , Alleles , Crops, Agricultural/drug effects , Crops, Agricultural/genetics , Crops, Agricultural/growth & development , Cyclopentanes/pharmacology , Edible Grain/drug effects , Edible Grain/genetics , Edible Grain/growth & development , High-Throughput Nucleotide Sequencing , Linoleic Acid/chemistry , Linoleic Acid/metabolism , Mutation , Oxylipins/pharmacology , Phenotype , Seeds/drug effects , Seeds/genetics , Seeds/growth & development , Sorghum/genetics , Sorghum/metabolism , alpha-Linolenic Acid/biosynthesis , alpha-Linolenic Acid/chemistry
ABSTRACT
Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ~200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.
Subjects
Databases, Genetic , Genome, Plant , Plants/metabolism , Gene Expression , Genetic Variation , Genomics , Internet , Metabolic Networks and Pathways , Molecular Sequence Annotation , Plants/genetics
ABSTRACT
Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
Subjects
Databases, Genetic , Genome, Plant , Genomics , Crops, Agricultural/genetics , Genetic Variation , Internet , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Plants/genetics , Plants/metabolism
ABSTRACT
Although genetic imprinting was discovered in maize 40 years ago, its exact extent in the triploid endosperm remains unknown. Here, we have analyzed global patterns of allelic gene expression in developing maize endosperms from reciprocal crosses between inbreds B73 and Mo17. We have defined an imprinted gene as one in which the relative expression of the maternal and paternal alleles differs at least fivefold in both hybrids of the reciprocal crosses. We found that at least 179 genes (1.6% of protein-coding genes) expressed in the endosperm are imprinted, with 68 of them showing maternal preferential expression and 111 paternal preferential expression. Additionally, 38 long noncoding RNAs were imprinted. The latter are transcribed in either sense or antisense orientation from intronic regions of normal protein-coding genes or from intergenic regions. Imprinted genes show a clear pattern of clustering around the genome, with a number of imprinted genes being adjacent to each other. Analysis of allele-specific methylation patterns of imprinted loci in the hybrid endosperm identified 21 differentially methylated regions (DMRs) of several hundred base pairs in length, corresponding to both imprinted genes and noncoding transcripts. All DMRs identified are uniformly hypomethylated in maternal alleles and hypermethylated in paternal alleles, regardless of the imprinting direction of their corresponding loci. Our study indicates highly extensive and complex regulation of genetic imprinting in maize endosperm, a mechanism that can potentially function in the balancing of the gene dosage of this triploid tissue.
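The fivefold criterion stated in the abstract can be sketched as a simple decision rule. This is only an illustration: the function names, the read-count inputs, and the pseudocount are assumptions, and the abstract does not say how counts were normalized (for example, whether the 2:1 maternal-to-paternal genome dosage of the endosperm was taken into account).

```python
from typing import Optional, Tuple

Counts = Tuple[float, float]  # (maternal_allele_reads, paternal_allele_reads)


def allelic_ratio(maternal: float, paternal: float, pseudocount: float = 1.0) -> float:
    """Maternal/paternal expression ratio; the pseudocount (an assumption)
    avoids division by zero when one allele has no reads."""
    return (maternal + pseudocount) / (paternal + pseudocount)


def call_imprinting(cross_1: Counts, cross_2: Counts, fold: float = 5.0) -> Optional[str]:
    """Classify a gene from allele counts in the two reciprocal hybrids.

    Returns "maternal" or "paternal" when the same parent-of-origin bias of
    at least `fold` is seen in BOTH crosses, otherwise None. A bias that
    flips between the crosses follows the inbred genotype, not the parent
    of origin, so it is not called as imprinting.
    """
    r1 = allelic_ratio(*cross_1)
    r2 = allelic_ratio(*cross_2)
    if r1 >= fold and r2 >= fold:
        return "maternal"
    if r1 <= 1.0 / fold and r2 <= 1.0 / fold:
        return "paternal"
    return None


# Made-up counts for illustration: (maternal, paternal) reads per cross.
maternal_like = call_imprinting((100, 5), (120, 10))
paternal_like = call_imprinting((5, 100), (8, 90))
genotype_bias = call_imprinting((100, 5), (10, 100))
```

Requiring agreement in both reciprocal hybrids is what separates a parent-of-origin effect from an ordinary cis-regulatory difference between B73 and Mo17.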
Subjects
Endosperm/embryology , Endosperm/genetics , Genomic Imprinting/genetics , Open Reading Frames/genetics , RNA, Untranslated/genetics , Zea mays/embryology , Zea mays/genetics , Alleles , Cluster Analysis , DNA Methylation/genetics , Genome, Plant/genetics , Introns/genetics , Reproducibility of Results
ABSTRACT
Cereal seeds are vital for food, feed, and agricultural sustainability because they store and provide essential nutrients to human and animal food and feed systems. Unraveling molecular processes in seed development is crucial for enhancing cereal grain yield and quality. We analyze spatiotemporal transcriptome and metabolome profiles during sorghum seed development in the inbred line 'BTx623'. Morphological and molecular analyses identify the key stages of seed maturation, specifying starch biosynthesis onset at 5 days post-anthesis (dpa) and protein at 10 dpa. Transcriptome profiling from 1 to 25 dpa reveals dynamic gene expression pathways, shifting from cellular growth and embryo development (1-5 dpa) to cell division, fatty acid biosynthesis (5-25 dpa), and seed storage compound synthesis in the endosperm (5-25 dpa). Network analysis identifies 361 and 207 hub genes linked to starch and protein synthesis in the endosperm, respectively, which will help breeders enhance sorghum grain quality. The availability of these data in the sorghum reference genome line establishes a baseline for future studies as new pangenomes emerge, which will consider copy number and presence-absence variation in functional food traits.
Subjects
Gene Expression Regulation, Plant , Metabolome , Seeds , Sorghum , Transcriptome , Sorghum/genetics , Sorghum/metabolism , Seeds/metabolism , Seeds/genetics , Seeds/growth & development , Gene Regulatory Networks , Gene Expression Profiling , Endosperm/metabolism , Endosperm/genetics , Starch/biosynthesis , Starch/metabolism , Edible Grain/genetics , Edible Grain/metabolism
ABSTRACT
Sorghum bicolor (L.) Moench is a significant grass crop globally, known for its genetic diversity. High-quality genome sequences are needed to capture this diversity. We constructed high-quality, chromosome-level genome assemblies for two vital sorghum inbred lines, Tx2783 and RTx436. Through advanced single-molecule techniques, long-read sequencing and optical maps, we improved average sequence continuity 19-fold and 11-fold relative to the existing BTx623 v3.0 reference genome and obtained 19 and 18 scaffolds (N50 of 25.6 and 14.4) for Tx2783 and RTx436, respectively. Our gene annotation efforts resulted in 29 612 protein-coding genes for the Tx2783 genome and 29 265 protein-coding genes for the RTx436 genome. Comparative analyses with 26 plant genomes, which included 18 sorghum genomes and 8 outgroup species, identified around 31 210 protein-coding gene families, with about 13 956 specific to sorghum. Using representative models from gene trees across the 18 sorghum genomes, a total of 72 579 pan-genes were identified, with 14% core, 60% softcore and 26% shell genes. We identified 99 genes in Tx2783 and 107 genes in RTx436 that showed functional enrichment specifically in binding and metabolic processes, as revealed by the GO enrichment Pearson Chi-Square test. We detected 36 potential large inversions in the comparison between the BTx623 Bionano map and the BTx623 v3.1 reference sequence. Strikingly, these inversions were notably absent when comparing Tx2783 or RTx436 with the BTx623 Bionano map. These inversions were mostly in the pericentromeric region, which is known to contain low-complexity regions and to be harder to assemble, suggesting the presence of potential artifacts in the public BTx623 reference assembly. Furthermore, in comparison to Tx2783, RTx436 exhibited 324 883 additional Single Nucleotide Polymorphisms (SNPs) and 16 506 more Insertions/Deletions (INDELs) when using BTx623 as the reference genome. We also characterized approximately 348 nucleotide-binding leucine-rich repeat (NLR) disease resistance genes in the two genomes. These high-quality genomes serve as valuable resources for discovering agronomic traits and structural variation studies.
ABSTRACT
Sorghum (Sorghum bicolor) is the fifth most important cereal crop worldwide; however, its utilization in food products can be limited due to reduced nutritional quality related to amino acid composition and protein digestibility in cooked products. Low essential amino acid levels and digestibility are influenced by the composition of the sorghum seed storage proteins, kafirins. In this study, we report a core collection of 206 sorghum mutant lines with altered seed storage proteins. Wet lab chemistry analysis was conducted to evaluate the total protein content and 23 amino acids, including 19 protein-bound and 4 non-protein amino acids. We identified mutant lines with diverse compositions of essential and non-essential amino acids. The highest total protein content in these lines was almost double that of the wild-type (BTx623). The mutants identified in this study can be used as a genetic resource to improve the sorghum grain quality and determine the molecular mechanisms underlying the biosynthesis of storage protein and starch in sorghum seeds.
ABSTRACT
BACKGROUND: As a newly identified category of DNA transposon, helitrons have been found in a large number of eukaryotic genomes. Helitrons have contributed significantly to the intra-specific genome diversity in maize. Although many characteristics of helitrons in the maize genome have been well documented, the sequence of an intact autonomous helitron has not been identified in maize. In addition, the process of gene fragment capture during helitron transposition has not been characterized. RESULTS: The whole-genome sequence of maize inbred line B73 was analyzed, and 1,649 helitron-like transposons, including 1,515 helAs and 134 helBs, were identified. ZmhelA1, ZmhelB1 and ZmhelB2 all encode an open reading frame (ORF) with an intact replication initiator (Rep) motif and a DNA helicase (Hel) domain, which are similar to previously reported autonomous helitrons in other organisms. The putative autonomous ZmhelB1 and ZmhelB2 contain an extra replication factor-a protein1 (RPA1) transposase (RPA-TPase) including three single strand DNA-binding domains (DBD)-A/-B/-C in the ORF. Over ninety percent of the maize helitrons identified have captured gene fragments. HelAs and helBs carry 4,645 and 249 gene fragments, corresponding to 2,507 and 187 different genes, respectively. Many helitrons contain multiple terminal sequences, but only one 3'-terminal sequence had an intact "CTAG" motif. There were no significant differences in the 5'-terminal sequence between the veritas terminal sequence and the pseudo sequence. Helitrons not only capture fragments but were also shown to lose internal sequences during transposition. CONCLUSIONS: Three putative autonomous elements were identified, which encoded an intact Rep motif and a DNA helicase domain, suggesting that autonomous helitrons may exist in modern maize. The results indicate that the capture of gene fragments during the transposition of many helitrons happens in a stepwise way, with multiple gene fragments within one helitron resulting from several sequential transpositions. In addition, we have proposed a potential mechanism regarding how helitrons with multiple termini are generated.
Subjects
Genome, Plant , Zea mays/genetics , Amino Acid Sequence , Base Sequence , Gene Dosage , Molecular Sequence Data , Phylogeny , Sequence Homology, Amino Acid , Sequence Homology, Nucleic Acid
ABSTRACT
BACKGROUND: miRNAs are known to play important regulatory roles throughout plant development. Until recently, nearly all the miRNAs in maize were identified by comparative analysis to miRNA sequences of other plant species, such as rice and Arabidopsis. RESULTS: To find new miRNAs in this important crop, small RNAs from mixed tissues were sequenced, resulting in over 15 million unique sequences. Our sequencing effort validated 23 of the 28 known maize miRNA families, including 49 unique miRNAs. Using a newly established criterion, based on the precision of miRNA processing from precursors, we identified 66 novel miRNAs in maize. These miRNAs can be grouped into 58 families, 54 of which have not been identified in any other species. Five new miRNAs were validated by northern blot. Moreover, we found targets for 23 of the 66 new miRNAs. The targets of two of these newly identified miRNAs were confirmed by 5'RACE. CONCLUSION: We have implemented a novel method of identifying miRNAs by measuring the precision of miRNA processing from precursors. Using this method, 66 novel miRNAs and 50 potential miRNAs have been identified in maize.
Subjects
MicroRNAs/genetics , RNA Processing, Post-Transcriptional , RNA, Plant/genetics , Sequence Analysis, RNA/methods , Zea mays/genetics , Genome, Plant
ABSTRACT
Haplotype phasing of maize genetic variants is important for genome interpretation, population genetic analysis and functional analysis of allelic activity. We performed an isoform-level phasing study using two maize inbred lines and their reciprocal crosses, based on single-molecule, full-length cDNA sequencing. To phase and analyze transcripts between hybrids and parents, we developed IsoPhase. Using this tool, we validated the majority of SNPs called against matching short-read data from embryo, endosperm and root tissues, and identified allele-specific, gene-level and isoform-level differential expression between the inbred parental lines and hybrid offspring. After phasing 6907 genes in the reciprocal hybrids, we annotated the SNPs and identified large-effect genes. In addition, we identified parent-of-origin isoforms, distinct novel isoforms in maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase accuracy in studies of allelic expression.