ABSTRACT
Hybrid potato breeding will transform the crop from a clonally propagated tetraploid to a seed-reproducing diploid. Historical accumulation of deleterious mutations in potato genomes has hindered the development of elite inbred lines and hybrids. Utilizing a whole-genome phylogeny of 92 Solanaceae and its sister clade species, we employ an evolutionary strategy to identify deleterious mutations. The deep phylogeny reveals the genome-wide landscape of highly constrained sites, comprising ~2.4% of the genome. Based on a diploid potato diversity panel, we infer 367,499 deleterious variants, of which 50% occur at non-coding and 15% at synonymous sites. Counterintuitively, diploid lines with relatively high homozygous deleterious burden can be better starting material for inbred-line development, despite showing less vigorous growth. Inclusion of inferred deleterious mutations increases genomic-prediction accuracy for yield by 24.7%. Our study generates insights into the genome-wide incidence and properties of deleterious mutations and their far-reaching consequences for breeding.
Subjects
Plant Breeding, Solanum tuberosum, Diploidy, Mutation, Phylogeny, Solanum tuberosum/genetics
ABSTRACT
Rational design of plant cis-regulatory DNA sequences without expert intervention or prior domain knowledge is still a daunting task. Here, we developed PhytoExpr, a deep learning framework capable of predicting both mRNA abundance and plant species using the proximal regulatory sequence as the sole input. PhytoExpr was trained over 17 species representative of major clades of the plant kingdom to enhance its generalizability. Via input perturbation, quantitative functional annotation of the input sequence was achieved at single-nucleotide resolution, revealing an abundance of predicted high-impact nucleotides in conserved noncoding sequences and transcription factor binding sites. Evaluation of maize HapMap3 single-nucleotide polymorphisms (SNPs) by PhytoExpr demonstrates an enrichment of predicted high-impact SNPs in cis-eQTL. Additionally, we provided two algorithms that harnessed the power of PhytoExpr in designing functional cis-regulatory variants, and de novo creation of species-specific cis-regulatory sequences through in silico evolution of random DNA sequences. Our model represents a general and robust approach for functional variant discovery in population genetics and rational design of regulatory sequences for genome editing and synthetic biology.
Subjects
Single Nucleotide Polymorphism, Nucleic Acid Regulatory Sequences, Zea mays, Nucleic Acid Regulatory Sequences/genetics, Zea mays/genetics, Quantitative Trait Loci, Algorithms, Plant Gene Expression Regulation, Deep Learning, Plants/genetics, Transcription Factors/genetics, Transcription Factors/metabolism, Genetic Models, Plant Genes, Binding Sites/genetics
ABSTRACT
Understanding the quantitative genetics of crops has been and will continue to be central to maintaining and improving global food security. We outline four stages that plant breeding either has already achieved or will probably soon achieve. Top-of-the-line breeding programs are currently in Breeding 3.0, where inexpensive, genome-wide data coupled with powerful algorithms allow us to start breeding on predicted instead of measured phenotypes. We focus on three major questions that must be answered to move from current Breeding 3.0 practices to Breeding 4.0: (a) How do we adapt crops to better fit agricultural environments? (b) What is the nature of the diversity upon which breeding can act? (c) How do we deal with deleterious variants? Answering these questions and then translating them to actual gains for farmers will be a significant part of achieving global food security in the twenty-first century.
Subjects
Agricultural Crops/genetics, Plant Genome/genetics, Plant Breeding, Quantitative Trait Loci/genetics, Genomics, Humans
ABSTRACT
Pleiotropy (when a single gene controls two or more seemingly unrelated traits) has been shown to impact genes with effects on flowering time, leaf architecture, and inflorescence morphology in maize. However, the genome-wide impact of biological pleiotropy across all maize phenotypes is largely unknown. Here, we investigate the extent to which biological pleiotropy impacts phenotypes within maize using GWAS summary statistics reanalyzed from previously published metabolite, field, and expression phenotypes across the Nested Association Mapping population and Goodman Association Panel. Through phenotypic saturation of 120,597 traits, we obtain over 480 million significant quantitative trait nucleotides. We estimate that only 1.56-32.3% of intervals show some degree of pleiotropy. We then assess the relationship between pleiotropy and various biological features such as gene expression, chromatin accessibility, sequence conservation, and enrichment for gene ontology terms. We find very little relationship between pleiotropy and these variables when compared to permuted pleiotropy. We hypothesize that biological pleiotropy of common alleles is not widespread in maize and is highly impacted by nuisance terms such as population structure and linkage disequilibrium. Natural selection on large standing natural variation in maize populations may target wide and large effect variants, leaving the prevalence of detectable pleiotropy relatively low.
Subjects
Genome-Wide Association Study, Zea mays, Chromosome Mapping, Zea mays/genetics, Phenotype, Linkage Disequilibrium, Single Nucleotide Polymorphism, Genetic Pleiotropy
ABSTRACT
Drought tolerance is a highly complex trait controlled by numerous interconnected pathways with substantial variation within and across plant species. This complexity makes it difficult to distill individual genetic loci underlying tolerance, and to identify core or conserved drought-responsive pathways. Here, we collected drought physiology and gene expression datasets across diverse genotypes of the C4 cereals sorghum and maize and searched for signatures defining water-deficit responses. Differential gene expression identified few overlapping drought-associated genes across sorghum genotypes, but using a predictive modeling approach, we found a shared core drought response across development, genotype, and stress severity. Our model had similar robustness when applied to datasets in maize, reflecting a conserved drought response between sorghum and maize. The top predictors are enriched in functions associated with various abiotic stress-responsive pathways as well as core cellular functions. These conserved drought response genes were less likely to contain deleterious mutations than other gene sets, suggesting that core drought-responsive genes are under evolutionary and functional constraints. Our findings support a broad evolutionary conservation of drought responses in C4 grasses regardless of innate stress tolerance, which could have important implications for developing climate resilient cereals.
Subjects
Sorghum, Zea mays, Zea mays/genetics, Sorghum/genetics, Droughts, Edible Grain/genetics, Poaceae
ABSTRACT
Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication-informed collinear anchor identification between genomes and performs base pair-resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor-binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation.
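As a rough illustration of the two-piece affine gap cost strategy named in the abstract above, the sketch below scores a gap as the cheaper of two affine penalties: one with a low opening cost and a steep per-base extension (suited to short indels), and one with a high opening cost and a shallow extension (suited to multikilobase indels such as TE insertions). The function name and all penalty values are hypothetical choices for illustration and are not taken from AnchorWave.

```python
def two_piece_affine_gap_cost(length, open1=4, extend1=2, open2=80, extend2=1):
    """Return the penalty for a gap of `length` bases under a two-piece affine model.

    The penalty values are illustrative assumptions, not AnchorWave defaults.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    # Piece 1: cheap to open, expensive to extend (favours short gaps).
    short_piece = open1 + extend1 * length
    # Piece 2: expensive to open, cheap to extend (favours long gaps).
    long_piece = open2 + extend2 * length
    # The aligner is charged whichever piece is cheaper for this gap length.
    return min(short_piece, long_piece)


if __name__ == "__main__":
    for gap_length in (1, 10, 76, 5000):
        print(gap_length, two_piece_affine_gap_cost(gap_length))
```

With these assumed values the two pieces cross at a gap length of 76 bases, so longer gaps grow at the shallower rate. This is why a very long indel can remain a single gap rather than being broken into many short, spurious alignments.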
Subjects
Plant Genome, Genetic Polymorphism, Sequence Alignment, Software, Zea mays/genetics
ABSTRACT
Significance: Proteins are the machinery that executes essential cellular functions. However, measuring their abundance within an organism can be difficult and resource-intensive. Cells use a variety of mechanisms to control protein synthesis from mRNA, including short open reading frames (uORFs) that lie upstream of the main coding sequence. Ribosomes can preferentially translate uORFs instead of the main coding sequence, leading to reduced translation of the main protein. In this study, we show that uORF sequence variation between individuals can lead to different rates of protein translation and thus variable protein abundances. We also demonstrate that natural variation in uORFs occurs frequently and can be linked to whole-plant phenotypes, indicating that uORF sequence variation likely contributes to plant adaptation.
Subjects
Protein Biosynthesis, Zea mays, 5' Untranslated Regions, Open Reading Frames/genetics, Protein Biosynthesis/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Ribosomes/genetics, Ribosomes/metabolism, Zea mays/genetics, Zea mays/metabolism
ABSTRACT
Native Americans domesticated maize (Zea mays ssp. mays) from lowland teosinte parviglumis (Zea mays ssp. parviglumis) in the warm Mexican southwest and brought it to the highlands of Mexico and South America where it was exposed to lower temperatures that imposed strong selection on flowering time. Phospholipids are important metabolites in plant responses to low-temperature and phosphorus availability and have been suggested to influence flowering time. Here, we combined linkage mapping with genome scans to identify High PhosphatidylCholine 1 (HPC1), a gene that encodes a phospholipase A1 enzyme, as a major driver of phospholipid variation in highland maize. Common garden experiments demonstrated strong genotype-by-environment interactions associated with variation at HPC1, with the highland HPC1 allele leading to higher fitness in highlands, possibly by hastening flowering. The highland maize HPC1 variant resulted in impaired function of the encoded protein due to a polymorphism in a highly conserved sequence. A meta-analysis across HPC1 orthologs indicated a strong association between the identity of the amino acid at this position and optimal growth in prokaryotes. Mutagenesis of HPC1 via genome editing validated its role in regulating phospholipid metabolism. Finally, we showed that the highland HPC1 allele entered cultivated maize by introgression from the wild highland teosinte Zea mays ssp. mexicana and has been maintained in maize breeding lines from the Northern United States, Canada, and Europe. Thus, HPC1 introgressed from teosinte mexicana underlies a large metabolic QTL that modulates phosphatidylcholine levels and has an adaptive effect at least in part via induction of early flowering time.
Subjects
Physiological Adaptation, Flowers, Gene-Environment Interaction, Phosphatidylcholines, Phospholipases A1, Plant Proteins, Zea mays, Alleles, Chromosome Mapping, Flowers/genetics, Flowers/metabolism, Plant Genes, Genetic Linkage, Phosphatidylcholines/metabolism, Phospholipases A1/classification, Phospholipases A1/genetics, Phospholipases A1/metabolism, Plant Proteins/classification, Plant Proteins/genetics, Plant Proteins/metabolism, Zea mays/genetics, Zea mays/growth & development
ABSTRACT
Thousands of species will be sequenced in the next few years; however, understanding how their genomes work, without an unlimited budget, requires both molecular and novel evolutionary approaches. We developed a sensitive sequence alignment pipeline to identify conserved noncoding sequences (CNSs) in the Andropogoneae tribe (multiple crop species descended from a common ancestor ~18 million years ago). The Andropogoneae share similar physiology while being tremendously genomically diverse, harboring a broad range of ploidy levels, structural variation, and transposons. These contribute to the potential of Andropogoneae as a powerful system for studying CNSs and are factors we leverage to understand the function of maize CNSs. We found that 86% of CNSs were comprised of annotated features, including introns, UTRs, putative cis-regulatory elements, chromatin loop anchors, noncoding RNA (ncRNA) genes, and several transposable element superfamilies. CNSs were enriched in active regions of DNA replication in the early S phase of the mitotic cell cycle and showed different DNA methylation ratios compared to the genome-wide background. More than half of putative cis-regulatory sequences (identified via other methods) overlapped with CNSs detected in this study. Variants in CNSs were associated with gene expression levels, and CNS absence contributed to loss of gene expression. Furthermore, the evolution of CNSs was associated with the functional diversification of duplicated genes in the context of maize subgenomes. Our results provide a quantitative understanding of the molecular processes governing the evolution of CNSs in maize.
ABSTRACT
Vitamin A deficiency remains prevalent in parts of Asia, Latin America, and sub-Saharan Africa where maize (Zea mays) is a food staple. Extensive natural variation exists for carotenoids in maize grain. Here, to understand its genetic basis, we conducted a joint linkage and genome-wide association study of the US maize nested association mapping panel. Eleven of the 44 detected quantitative trait loci (QTL) were resolved to individual genes. Six of these were correlated expression and effect QTL (ceeQTL), showing strong correlations between RNA-seq expression abundances and QTL allelic effect estimates across six stages of grain development. These six ceeQTL also had the largest percentage of phenotypic variance explained, and in major part comprised the three to five loci capturing the bulk of genetic variation for each trait. Most of these ceeQTL had strongly correlated QTL allelic effect estimates across multiple traits. These findings provide an in-depth genome-level understanding of the genetic and molecular control of carotenoids in plants. In addition, these findings provide a roadmap to accelerate breeding for provitamin A and other priority carotenoid traits in maize grain that should be readily extendable to other cereals.
Subjects
Carotenoids/metabolism, Seeds/genetics, Zea mays/genetics, Zea mays/metabolism, Genetic Epistasis, Genetic Variation, Genome-Wide Association Study, Phenotype, Plant Proteins/genetics, Quantitative Trait Loci, Seeds/metabolism
ABSTRACT
Here we report a multi-tissue gene expression resource that represents the genotypic and phenotypic diversity of modern inbred maize, and includes transcriptomes in an average of 255 lines in seven tissues. We mapped expression quantitative trait loci and characterized the contribution of rare genetic variants to extremes in gene expression. Some of the new mutations that arise in the maize genome can be deleterious; although selection acts to keep deleterious variants rare, their complete removal is impeded by genetic linkage to favourable loci and by finite population size. Modern maize breeders have systematically reduced the effects of this constant mutational pressure through artificial selection and self-fertilization, which have exposed rare recessive variants in elite inbred lines. However, the ongoing effect of these rare alleles on modern inbred maize is unknown. By analysing this gene expression resource and exploiting the extreme diversity and rapid linkage disequilibrium decay of maize, we characterize the effect of rare alleles and evolutionary history on the regulation of expression. Rare alleles are associated with the dysregulation of expression, and we correlate this dysregulation to seed-weight fitness. We find enrichment of ancestral rare variants among expression quantitative trait loci mapped in modern inbred lines, which suggests that historic bottlenecks have shaped regulation. Our results suggest that one path for further genetic improvement in agricultural species lies in purging the rare deleterious variants that have been associated with crop fitness.
Subjects
Alleles, Plant Gene Expression Regulation/genetics, Genetic Fitness/genetics, Zea mays/genetics, Agricultural Crops/genetics, Genetic Variation/genetics, Plant Genome/genetics, Genotype, Linkage Disequilibrium, Phenotype, Population Density, Quantitative Trait Loci/genetics, Plant RNA/genetics, Seeds/genetics, RNA Sequence Analysis
ABSTRACT
Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes and it has generated little insight about the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype-associated RNA expression (HARE), study its transferability across tissues, and evaluate genomic prediction models within and across populations. HARE focuses on tightly linked cis-acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. Therefore, HARE estimates were more transferable across different tissues and populations compared to measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models evaluated within and across two diverse maize panels, a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel), for predicting 26 complex traits. HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy as compared to measured expression values. The accuracy achieved by measured expression was variable across tissues, whereas accuracy by HARE was more stable across tissues. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.
Subjects
Haplotypes/genetics, Quantitative Trait Loci/genetics, RNA/genetics, Zea mays/genetics, Alleles, Chromosome Mapping/methods, Genome-Wide Association Study/methods, Genomics/methods, Genotype, Linkage Disequilibrium/genetics, Phenotype, Single Nucleotide Polymorphism/genetics
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pgen.1007019.].
ABSTRACT
Inbreeding depression is the reduction in fitness and vigor resulting from mating of close relatives observed in many plant and animal species. The extent to which the genetic load of mutations contributing to inbreeding depression is due to large-effect mutations versus variants with very small individual effects is unknown and may be affected by population history. We compared the effects of outcrossing and self-fertilization on 18 traits in a landrace population of maize, which underwent a population bottleneck during domestication, and a neighboring population of its wild relative teosinte. Inbreeding depression was greater in maize than teosinte for 15 of 18 traits, congruent with the greater segregating genetic load in the maize population that we predicted from sequence data. Parental breeding values were highly consistent between outcross and selfed offspring, indicating that additive effects determine most of the genetic value even in the presence of strong inbreeding depression. We developed a novel linkage scan to identify quantitative trait loci (QTL) representing large-effect rare variants carried by only a single parent, which were more important in teosinte than maize. Teosinte also carried more putative juvenile-acting lethal variants identified by segregation distortion. These results suggest a mixture of mostly polygenic, small-effect partially recessive effects in linkage disequilibrium underlying inbreeding depression, with an additional contribution from rare larger-effect variants that was more important in teosinte but depleted in maize following the domestication bottleneck. Purging associated with the maize domestication bottleneck may have selected against some large effect variants, but polygenic load is harder to purge and overall segregating mutational burden increased in maize compared to teosinte.
Subjects
Domestication, Inbreeding Depression/genetics, Quantitative Trait Loci/genetics, Zea mays/genetics, Plant Genes, Genetic Variation/genetics, Phenotype, Plant Breeding, Plant Proteins/genetics, Genetic Selection/genetics, Zea mays/growth & development
ABSTRACT
Very little is known about how domestication was constrained by the quantitative genetic architecture of crop progenitors and how quantitative genetic architecture was altered by domestication. Yang et al. [C. J. Yang et al., Proc. Natl. Acad. Sci. U.S.A. 116, 5643-5652 (2019)] drew multiple conclusions about how genetic architecture influenced and was altered by maize domestication based on one sympatric pair of teosinte and maize populations. To test the generality of their conclusions, we assayed the structure of genetic variances, genetic correlations among traits, strength of selection during domestication, and diversity in genetic architecture within teosinte and maize. Our results confirm that additive genetic variance is decreased, while dominance genetic variance is increased, during maize domestication. The genetic correlations are moderately conserved among traits between teosinte and maize, while the genetic variance-covariance matrices (G-matrices) of teosinte and maize are quite different, primarily due to changes in the submatrix for reproductive traits. The inferred long-term selection intensities during domestication were weak, and the neutral hypothesis was rejected for reproductive and environmental response traits, suggesting that they were targets of selection during domestication. The G-matrix of teosinte imposed considerable constraint on selection during the early domestication process, and constraint increased further along the domestication trajectory. Finally, we assayed variation among populations and observed that genetic architecture is generally conserved among populations within teosinte and maize but is radically different between teosinte and maize. While selection drove changes in essentially all traits between teosinte and maize, selection explains little of the difference in domestication traits among populations within teosinte or maize.
Subjects
Agricultural Crops/genetics, Plant Genes, Zea mays/genetics, Molecular Evolution, Flowers, Gene-Environment Interaction, Reproduction, Zea mays/physiology
ABSTRACT
Linking genotype with phenotype is a fundamental goal in biology and requires robust data for both. Recent advances in plant-genome sequencing have expedited comparisons among multiple related individuals. The abundance of structural genomic within-species variation that has been discovered indicates that a single reference genome cannot represent the complete sequence diversity of a species, leading to the expansion of the pan-genome concept. For high-resolution forward genetics, this unprecedented access to genomic variation should be paralleled and integrated with phenotypic characterization of genetic diversity. We developed a multi-parental framework for trait dissection in melon (Cucumis melo), leveraging a novel pan-genome constructed for this highly variable cucurbit crop. A core subset of 25 diverse founders (MelonCore25), consisting of 24 accessions from the two widely cultivated subspecies of C. melo, encompassing 12 horticultural groups, and 1 feral accession was sequenced using a combination of short- and long-read technologies, and their genomes were assembled de novo. The construction of this melon pan-genome exposed substantial variation in genome size and structure, including detection of ~300 000 structural variants and ~9 million SNPs. A half-diallel derived set of 300 F2 populations, representing all possible MelonCore25 parental combinations, was constructed as a framework for trait dissection through integration with the pan-genome. We demonstrate the potential of this unified framework for genetic analysis of various melon traits, including rind color intensity and pattern, fruit sugar content, and resistance to fungal diseases. We anticipate that utilization of this integrated resource will enhance genetic dissection of important traits and accelerate melon breeding.
Subjects
Cucumis melo, Cucurbitaceae, Cucumis melo/genetics, Cucurbitaceae/genetics, Plant Breeding, Chromosome Mapping, Phenotype
ABSTRACT
It has been just over a decade since the release of the maize (Zea mays) Nested Association Mapping (NAM) population. The NAM population has been and continues to be an invaluable resource for the maize genetics community and has yielded insights into the genetic architecture of complex traits. The parental lines have become some of the most well-characterized maize germplasm, and their de novo assemblies were recently made publicly available. As we enter an exciting new stage in maize genomics, this retrospective will summarize the design and intentions behind the NAM population; its application, the discoveries it has enabled, and its influence in other systems; and use the past decade of hindsight to consider whether and how it will remain useful in a new age of genomics.
Subjects
Plant Breeding, Quantitative Trait Loci, Zea mays/genetics, Chromosome Mapping, Agricultural Crops
ABSTRACT
The genetics of domestication has been extensively studied ever since the rediscovery of Mendel's law of inheritance and much has been learned about the genetic control of trait differences between crops and their ancestors. Here, we ask how domestication has altered genetic architecture by comparing the genetic architecture of 18 domestication traits in maize and its ancestor teosinte using matched populations. We observed a strongly reduced number of QTL for domestication traits in maize relative to teosinte, which is consistent with the previously reported depletion of additive variance by selection during domestication. We also observed more dominance in maize than teosinte, likely a consequence of selective removal of additive variants. We observed that large effect QTL have low minor allele frequency (MAF) in both maize and teosinte. Regions of the genome that are strongly differentiated between teosinte and maize (high FST) explain less quantitative variation in maize than teosinte, suggesting that, in these regions, allelic variants were brought to (or near) fixation during domestication. We also observed that genomic regions of high recombination explain a disproportionately large proportion of heritable variance both before and after domestication. Finally, we observed that about 75% of the additive variance in both teosinte and maize is "missing" in the sense that it cannot be ascribed to detectable QTL and only 25% of variance maps to specific QTL. This latter result suggests that morphological evolution during domestication is largely attributable to very large numbers of QTL of very small effect.
Subjects
Genetic Variation, Quantitative Trait Loci, Zea mays/genetics, Domestication, Gene Flow, Gene Frequency, Plant Genes, Population Genetics, Heritable Quantitative Trait, Genetic Selection, Zea mays/classification
ABSTRACT
Maximizing soil exploration through modifications of the root system is a strategy for plants to overcome phosphorus (P) deficiency. Genome-wide association with 561 tropical maize inbred lines from Embrapa and DTMA panels was undertaken for root morphology and P acquisition traits under low- and high-P concentrations, with 353,540 SNPs. P supply modified root morphology traits, biomass and P content in the global maize panel, but root length and root surface area changed differentially in Embrapa and DTMA panels. This suggests that different root plasticity mechanisms exist for maize adaptation to low-P conditions. A total of 87 SNPs were associated with phenotypic traits in both P conditions at -log10(p-value) ≥ 5, whereas only seven SNPs reached Bonferroni significance. Among these SNPs, S9_137746077, which is located upstream of the gene GRMZM2G378852 that encodes a MAPKKK protein kinase, was significantly associated with total seedling dry weight, with the same allele increasing root length and root surface area under P deficiency. The C allele of S8_88600375, mapped within GRMZM2G044531 that encodes an AGC kinase, significantly enhanced root length under low P, positively affecting root surface area and seedling weight. The broad genetic diversity evaluated in this panel suggests that candidate genes and favorable alleles could be exploited to improve P efficiency in maize breeding programs of Africa and Latin America.
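To make the gap between the two significance cut-offs in this abstract concrete, the sketch below computes a Bonferroni-corrected threshold for the 353,540 SNPs tested. The family-wise error rate of 0.05 is an assumption, since the abstract does not state the value used, and the function name is hypothetical.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Return the per-test p-value cut-off and its -log10 value.

    `alpha` is an assumed family-wise error rate; the abstract does not report one.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    # Bonferroni correction: divide the family-wise error rate by the number of tests.
    per_test_alpha = alpha / n_tests
    return per_test_alpha, -math.log10(per_test_alpha)


if __name__ == "__main__":
    n_snps = 353_540  # number of SNPs reported in the abstract
    per_test_alpha, neg_log10 = bonferroni_threshold(n_snps)
    print(f"per-test p-value cut-off: {per_test_alpha:.3e}")
    print(f"-log10(p) cut-off: {neg_log10:.2f}")
```

Under that assumption the corrected cut-off is a -log10(p-value) of roughly 6.85, well above the exploratory threshold of 5. This is consistent with 87 SNPs passing the looser threshold while only seven reach Bonferroni significance.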