ABSTRACT
Pod dehiscence is a major source of yield loss in legumes, which is exacerbated by aridity. Disruptive mutations in "Pod indehiscent 1" (PDH1), a pod sclerenchyma-specific lignin biosynthesis gene, have been linked to significant reductions in dehiscence in several legume species. We compared syntenic PDH1 regions across 12 legumes and two outgroups to uncover key historical evolutionary trends at this important locus. Our results clarified the extent to which PDH1 orthologs are present in legumes, showing the typical genomic context surrounding PDH1 has only arisen relatively recently in certain phaseoloid species (Vigna, Phaseolus, Glycine). The notable absence of PDH1 in Cajanus cajan may be a major contributor to its indehiscent phenotype compared with other phaseoloids. In addition, we identified a novel PDH1 ortholog in Vigna angularis and detected remarkable increases in PDH1 transcript abundance during Vigna unguiculata pod development. Investigation of the shared genomic context of PDH1 revealed it lies in a hotspot of transcription factors and signaling gene families that respond to abscisic acid and drought stress, which we hypothesize may be an additional factor influencing expression of PDH1 under specific environmental conditions. Our findings provide key insights into the evolutionary history of PDH1 and lay the foundation for optimizing the pod dehiscence role of PDH1 in major and understudied legume species.
Subjects
Phaseolus, Vigna, Vigna/genetics, Quantitative Trait Loci, Plant Genome/genetics, Phaseolus/genetics, Genomics
ABSTRACT
Crop production systems need to expand their outputs sustainably to feed a burgeoning human population. Advances in genome sequencing technologies combined with efficient trait mapping procedures accelerate the availability of beneficial alleles for breeding and research. Enhanced interoperability between different omics and phenotyping platforms, leveraged by evolving machine learning tools, will help provide mechanistic explanations for complex plant traits. Targeted and rapid assembly of beneficial alleles using optimized breeding strategies and precise genome editing techniques could deliver ideal crops for the future. Realizing desired productivity gains in the field is imperative for securing an adequate future food supply for 10 billion people.
Subjects
Plant Genome, Plant Breeding, Agricultural Crops/genetics, Gene Editing/methods, Plant Genome/genetics, Humans, Phenotype, Plant Breeding/methods
ABSTRACT
SUMMARY: Genome-wide association studies (GWAS) excel at harnessing dense genomic variant datasets to identify candidate regions responsible for producing a given phenotype. However, GWAS and traditional fine-mapping methods do not provide insight into the complex local landscape of linkage that contains and has been shaped by the causal variant(s). Here, we present crosshap, an R package that performs robust density-based clustering of variants based on their linkage profiles to capture haplotype structures in a local genomic region of interest. Following this, crosshap is equipped with visualization tools for choosing optimal clustering parameters (ε) before producing an intuitive figure that provides an overview of the complex relationships between linked variants, haplotype combinations, phenotype, and metadata traits. AVAILABILITY AND IMPLEMENTATION: The crosshap package is freely available under the MIT license and can be downloaded directly from CRAN with R >4.0.0. The development version is available on GitHub alongside issue support (https://github.com/jacobimarsh/crosshap). Tutorial vignettes and documentation are available (https://jacobimarsh.github.io/crosshap/).
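The idea of density-based clustering on linkage profiles can be illustrated with a minimal sketch. It is written in Python rather than R, it does not use or reproduce the crosshap API, and the pairwise r² values, the neighbourhood radius `eps` and the `min_pts` setting are invented for illustration. Each variant is represented by its row of pairwise r² values, and a textbook DBSCAN groups variants whose rows lie close together.

```python
import math


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one cluster label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points (including i itself) within eps of point i.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical pairwise r^2 matrix for six variants (assumed values, not real data).
# Row i is the "linkage profile" of variant i.
r2 = [
    [1.00, 0.95, 0.90, 0.05, 0.10, 0.30],
    [0.95, 1.00, 0.92, 0.08, 0.06, 0.25],
    [0.90, 0.92, 1.00, 0.04, 0.09, 0.35],
    [0.05, 0.08, 0.04, 1.00, 0.93, 0.40],
    [0.10, 0.06, 0.09, 0.93, 1.00, 0.45],
    [0.30, 0.25, 0.35, 0.40, 0.45, 1.00],
]

labels = dbscan(r2, eps=0.3, min_pts=2)
print(labels)  # variants 1-3 and 4-5 form two groups; variant 6 is left as noise
```

A smaller `eps` splits groups more finely and leaves more variants unassigned, which is why the choice of this parameter benefits from visual inspection.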
Subjects
Documentation, Genome-Wide Association Study, Cluster Analysis, Haplotypes, Phenotype
ABSTRACT
Heavy and costly use of phosphorus (P) fertiliser is often needed to achieve high crop yields, but only a small amount of applied P fertiliser is available to most crop plants. Hakea prostrata (Proteaceae) is endemic to the P-impoverished landscape of southwest Australia and has several P-saving traits. We identified 16 members of the Phosphate Transporter 1 (PHT1) gene family (HpPHT1;1-HpPHT1;12d) in a long-read genome assembly of H. prostrata. Based on phylogenetics, sequence structure and expression patterns, we classified HpPHT1;1 as potentially involved in Pi uptake from soil and HpPHT1;8 and HpPHT1;9 as potentially involved in Pi uptake and root-to-shoot translocation. Three genes, HpPHT1;4, HpPHT1;6 and HpPHT1;8, lacked regulatory PHR1-binding sites (P1BS) in the promoter regions. Available expression data for HpPHT1;6 and HpPHT1;8 indicated they are not responsive to changes in P supply, potentially contributing to the high P sensitivity of H. prostrata. We also discovered a Proteaceae-specific clade of closely-spaced PHT1 genes that lacked conserved genetic architecture among genera, indicating an evolutionary hot spot within the genome. Overall, the genome assembly of H. prostrata provides a much-needed foundation for understanding the genetic mechanisms of novel adaptations to low P soils in southwest Australian plants.
ABSTRACT
Narrow-leafed lupin (NLL; Lupinus angustifolius) is a key rotational crop for sustainable farming systems, whose grain is high in protein content. It is a gluten-free, non-genetically modified, alternative protein source to soybean (Glycine max) and as such has gained interest as a human food ingredient. Here, we present a chromosome-length reference genome for the species and a pan-genome assembly comprising 55 NLL lines, including Australian and European cultivars, breeding lines and wild accessions. We present the core and variable genes for the species and report on the absence of essential mycorrhizal associated genes. The genome and pan-genomes of NLL and its close relative white lupin (Lupinus albus) are compared. Furthermore, we provide additional evidence supporting LaRAP2-7 as the key alkaloid regulatory gene for NLL and demonstrate the NLL genome is underrepresented in classical NLR disease resistance genes compared to other sequenced legume species. The NLL genomic resources generated here coupled with previously generated RNA sequencing datasets provide new opportunities to fast-track lupin crop improvement.
Subjects
Lupinus, Australia, Chromosomes, Genomics, Humans, Lupinus/genetics, Plant Breeding
ABSTRACT
The pangenome refers to a collection of genomic sequence found in the entire species or population rather than in a single individual; the sequence can be core, present in all individuals, or accessory (variable or dispensable), found in a subset of individuals only. While pangenomic studies were first undertaken in bacterial species, developments in genome sequencing and assembly approaches have allowed construction of pangenomes for eukaryotic organisms, fungi, plants, and animals, including two large-scale human pangenome projects. Analysis of these pangenomes revealed key differences, most likely stemming from divergent evolutionary histories, but also surprising similarities.
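The core/accessory distinction defined above can be made concrete with a small sketch. The individuals and gene names are hypothetical, and real pangenome pipelines work from sequence alignments or gene-presence calls rather than hand-written sets.

```python
def classify_genes(presence):
    """Split genes into core (in every individual) and accessory (in only some)."""
    all_genes = set().union(*presence.values())
    core = {g for g in all_genes if all(g in genes for genes in presence.values())}
    accessory = all_genes - core
    return core, accessory


# Hypothetical gene content of three individuals (assumed data).
presence = {
    "line_A": {"gene1", "gene2", "gene3"},
    "line_B": {"gene1", "gene2", "gene4"},
    "line_C": {"gene1", "gene2"},
}

core, accessory = classify_genes(presence)
print(sorted(core))       # genes shared by all three lines
print(sorted(accessory))  # genes missing from at least one line
```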
Subjects
Biological Evolution, Bacterial Genome/genetics, Genomics, Plants/genetics, Animals, Bacteria/genetics, Humans, Phylogeny
ABSTRACT
Brassica rapa is grown worldwide as an economically important vegetable and oilseed crop. However, its production is challenged by yield-limiting pathogens. The sustainable control of these pathogens mainly relies on the deployment of genetic resistance primarily driven by resistance gene analogues (RGAs). While several studies have identified RGAs in B. rapa, these were mainly based on a single genome reference and do not represent the full range of RGA diversity in B. rapa. In this study, we utilized the B. rapa pangenome, constructed from 71 lines encompassing 12 morphotypes, to describe a comprehensive repertoire of RGAs in B. rapa. We show that 309 RGAs were affected by presence-absence variation (PAV) and 223 RGAs were missing from the reference genome. The transmembrane leucine-rich repeat (TM-LRR) RGA class had more core gene types than variable genes, while the opposite was observed for nucleotide-binding site leucine-rich repeats (NLRs). Comparative analysis with the B. napus pangenome revealed significant RGA conservation (93%) between the two species. We identified 138 candidate RGAs located within known B. rapa disease resistance QTL, of which the majority were under negative selection. Using blackleg gene homologues, we demonstrated how these genes in B. napus were derived from B. rapa. This further clarifies the genetic relationship of these loci, which may be useful in narrowing down candidate blackleg resistance genes. This study provides a novel genomic resource towards the identification of candidate genes for breeding disease resistance in B. rapa and its relatives.
Subjects
Brassica napus, Brassica rapa, Brassica rapa/genetics, Plant Genes/genetics, Disease Resistance/genetics, Leucine, Plant Breeding, Brassica napus/genetics
ABSTRACT
Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.
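To illustrate what an explicit statistical threshold looks like in a manual pipeline, the sketch below filters BLAST tabular output (`-outfmt 6` with its default twelve columns) by e-value and query coverage. It is not the procedure proposed in the article: the cut-off values, the query length and the hit lines are assumptions chosen only for the example, and appropriate thresholds depend on the gene family and dataset.

```python
def filter_hits(lines, query_lengths, max_evalue=1e-5, min_query_cov=0.5):
    """Keep BLAST outfmt 6 hits that pass both an e-value and a query-coverage cut-off.

    Default columns: qseqid sseqid pident length mismatch gapopen
                     qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        # Fraction of the query sequence covered by this alignment.
        coverage = (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if evalue <= max_evalue and coverage >= min_query_cov:
            kept.append((qseqid, sseqid, evalue, round(coverage, 2)))
    return kept


# Hypothetical hits for one 210-residue query (assumed data).
hits = [
    "queryA\tsubj1\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-60\t250",  # strong, long hit
    "queryA\tsubj2\t40.0\t50\t30\t0\t10\t59\t1\t50\t1e-6\t45",      # short partial hit
    "queryA\tsubj3\t30.0\t180\t126\t0\t5\t184\t1\t180\t0.5\t30",    # weak e-value
]

kept = filter_hits(hits, {"queryA": 210})
print(kept)
```

Hits that pass such a filter are only candidates; as the abstract notes, sequence similarity alone does not establish homology, so curation and orthology assignment would still follow.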
Subjects
Genome, Software, Biological Evolution
ABSTRACT
Recent growth in crop genomic and trait data has opened opportunities for the application of novel approaches to accelerate crop improvement. Machine learning and deep learning are at the forefront of prediction-based data analysis. However, few approaches for genotype to phenotype prediction compare machine learning with deep learning and further interpret the models that support the predictions. This study uses genome-wide molecular markers and traits across 1110 soybean individuals to develop accurate prediction models. For 13/14 sets of predictions, XGBoost or random forest outperformed deep learning models in prediction performance. Top ranked SNPs by F-score were identified from XGBoost, and further investigation found that they overlap with significantly associated loci identified from GWAS and previous literature. Feature importance rankings were used to reduce marker input by up to 90%, and subsequent models maintained or improved their prediction performance. These findings support interpretable machine learning as an approach for genomic-based prediction of traits in soybean and other crops.
Subjects
Deep Learning, Glycine max, Genotype, Machine Learning, Phenotype, Glycine max/genetics
ABSTRACT
Polyploidy has the potential to allow organisms to outcompete their diploid progenitor(s) and occupy new environments. Shark Bay, Western Australia, is a World Heritage Area dominated by temperate seagrass meadows including Poseidon's ribbon weed, Posidonia australis. This seagrass is at the northern extent of its natural geographic range and experiences extremes in temperature and salinity. Our genomic and cytogenetic assessments of 10 meadows identified geographically restricted, diploid clones (2n = 20) in a single location, and a single widespread, high-heterozygosity, polyploid clone (2n = 40) in all other locations. The polyploid clone spanned at least 180 km, making it the largest known example of a clone in any environment on earth. Whole-genome duplication through polyploidy, combined with clonality, may have provided the mechanism for P. australis to expand into new habitats and adapt to new environments that became increasingly stressful for its diploid progenitor(s). The new polyploid clone probably formed in shallow waters after the inundation of Shark Bay less than 8500 years ago and subsequently expanded via vegetative growth into newly submerged habitats.
Subjects
Alismatales, Sharks, Animals, Diploidy, Ecosystem, Polyploidy
ABSTRACT
High-throughput phenotyping (HTP) platforms are capable of monitoring the phenotypic variation of plants through multiple types of sensors, such as red, green and blue (RGB) cameras, hyperspectral sensors, and computed tomography, which can be associated with environmental and genotypic data. Because of the wide range of information provided, HTP datasets represent a valuable asset to characterize crop phenotypes. As HTP becomes widely employed with more tools and data being released, it is important that researchers are aware of these resources and how they can be applied to accelerate crop improvement. Researchers may exploit these datasets either for phenotype comparison or employ them as a benchmark to assess tool performance and to support the development of tools that are better at generalizing between different crops and environments. In this review, we describe the use of image-based HTP for yield prediction, root phenotyping, development of climate-resilient crops, detecting pathogen and pest infestation, and quantitative trait measurement. We emphasize the need for researchers to share phenotypic data, and offer a comprehensive list of available datasets to assist crop breeders and tool developers to leverage these resources in order to accelerate crop breeding.
Subjects
Agricultural Crops/genetics, Genomics/methods, High-Throughput Screening Assays/methods, Information Dissemination/methods, Phenotype, Plant Breeding/methods
ABSTRACT
KEY MESSAGE: The major soy protein QTL, cqProt-003, was analysed for haplotype diversity and global distribution, and results indicate that a 304 bp deletion and variable tandem repeats in protein coding regions are likely causal candidates. Here, we present association and linkage analysis of 985 wild, landrace and cultivar soybean accessions in a pan genomic dataset to characterize the major high-protein/low-oil associated locus cqProt-003 located on chromosome 20. A significant trait-associated region within a 173 kb linkage block was identified, and variants in the region were characterized, identifying 34 high confidence SNPs, 4 insertions, 1 deletion and a larger 304 bp structural variant in the high-protein haplotype. Trinucleotide tandem repeats of variable length present in the second exon of gene Glyma.20G085100 are strongly correlated with the high-protein phenotype and likely represent causal variation. Structural variation has previously been found in the same gene, for which we report the global distribution of the 304 bp deletion and have identified additional nested variation present in high-protein individuals. Mapping variation at the cqProt-003 locus across demographic groups suggests that the high-protein haplotype is common in wild accessions (94.7%), rare in landraces (10.6%) and near absent in cultivated breeding pools (4.1%), suggesting its decrease in frequency primarily correlates with domestication and continued during subsequent improvement. However, the variation that has persisted in under-utilized wild and landrace populations holds high breeding potential for breeders willing to forego seed oil to maximize protein content. The results of this study include the identification of distinct haplotype structures within the high-protein population, and a broad characterization of the genomic context and linkage patterns of cqProt-003 across global populations, supporting future functional characterization and modification.
Subjects
Fabaceae, Glycine max, Fabaceae/genetics, Haplotypes, Plant Breeding, Single Nucleotide Polymorphism, Quantitative Trait Loci, Seeds/metabolism, Glycine max/genetics, Glycine max/metabolism
ABSTRACT
Brassica juncea (AABB), Indian mustard, is a source of disease resistance genes for a wide range of pathogens. The availability of reference genome sequences for B. juncea has made it possible to characterise the genomic structure and distribution of these disease resistance genes. Potentially functional disease resistance genes can be identified by co-localization with genetically mapped disease resistance quantitative trait loci (QTL). Here we identify and characterise disease resistance gene analogs (RGAs), including nucleotide-binding site-leucine-rich repeat (NLR), receptor-like kinase (RLK) and receptor-like protein (RLP) classes, and investigate their association with disease resistance QTL intervals. The molecular genetic marker sequences for four white rust (Albugo candida) disease resistance QTL, six blackleg (Leptosphaeria maculans) disease resistance QTL and BjCHI1, a gene cloned from B. juncea for hypocotyl rot disease, were extracted from previously published studies and used to compare with candidate RGAs. Our results highlight the complications for the identification of functional resistance genes, including the duplicated appearance of genetic markers for several resistance loci, including Ac2(t), AcB1-A4.1, AcB1-A5.1, Rlm6 and PhR2 in both the A and B genomes, due to the presence of homoeologous regions. Furthermore, the white rust loci, Ac2(t) and AcB1-A4.1, mapped to the same position on chromosome A04 and may be different alleles of the same gene. Despite these challenges, a total of nine candidate genomic regions hosting 14 RLPs, 28 NLRs and 115 RLKs were identified. This study facilitates the mapping and cloning of functional resistance genes for applications in crop improvement programs. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-022-01309-5.
ABSTRACT
Pangenomes aim to represent the complete repertoire of the genome diversity present within a species or cohort of species, capturing the genomic structural variance between individuals. This genomic information coupled with phenotypic data can be applied to identify genes and alleles involved with abiotic stress tolerance, disease resistance, and other desirable traits. The characterisation of novel structural variants from pangenomes can support genome editing approaches such as Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR associated protein Cas (CRISPR-Cas), providing functional information on gene sequences and new target sites in variant-specific genes with increased efficiency. This review discusses the application of pangenomes in genome editing and crop improvement, focusing on the potential of pangenomes to accurately identify target genes for CRISPR-Cas editing of plant genomes while avoiding adverse off-target effects. We consider the limitations of applying CRISPR-Cas editing with pangenome references and potential solutions to overcome these limitations.
Subjects
CRISPR-Cas Systems/genetics, Agricultural Crops/genetics, Plant Genome/genetics, Gene Editing/methods, Phenotype, Plant Breeding/methods, Genetically Modified Plants/genetics
ABSTRACT
Pangenomes are a rich resource to examine the genomic variation observed within a species or genus, supporting population genetics studies, with applications for the improvement of crop traits. Major crop species such as maize (Zea mays), rice (Oryza sativa), Brassica (Brassica spp.), and soybean (Glycine max) have had pangenomes constructed and released, and this has led to the discovery of valuable genes associated with disease resistance and yield components. However, pangenome data are not available for many less prominent crop species that are currently under-utilised. Despite many under-utilised species being important food sources in regional populations, the scarcity of genomic data for these species hinders their improvement. Here, we assess several under-utilised crops and review the pangenome approaches that could be used to build resources for their improvement. Many of these under-utilised crops are cultivated in arid or semi-arid environments, suggesting that novel genes related to drought tolerance may be identified and used for introgression into related major crop species. In addition, we discuss how previously collected data could be used to enrich pangenome functional analysis in genome-wide association studies (GWAS) based on studies in major crops. Considering the technological advances in genome sequencing, pangenome references for under-utilised species are becoming more obtainable, offering the opportunity to identify novel genes related to agro-morphological traits in these species.
Subjects
Genome-Wide Association Study, Oryza, Chromosome Mapping, Agricultural Crops/genetics, Plant Genome, Oryza/genetics, Plant Breeding, Glycine max/genetics, Zea mays/genetics
ABSTRACT
Rye (Secale cereale) is a climate-resilient cereal grown extensively as a grain or forage crop in Northern and Eastern Europe. In addition to being an important crop, it has been used to improve wheat through introgression of genomic regions for improved yield and disease resistance. Understanding the genomic diversity of rye will both assist the improvement of this crop and facilitate the introgression of more valuable traits into wheat. Here, we isolated and sequenced the short arm of rye chromosome 7 (7RS) from Triticale 380SD using flow cytometry and compared it to the public Lo7 rye whole genome reference assembly. We identify 2747 Lo7 genes present on the isolated chromosome arm and two clusters containing seven and sixty-five genes that are present on Triticale 380SD 7RS, but absent from Lo7 7RS. We identified 29 genes that are not assigned to chromosomal locations in the Lo7 assembly but are present on Triticale 380SD 7RS, suggesting a chromosome arm location for these genes. Our study supports the Lo7 reference assembly and provides a repertoire of genes on Triticale 7RS.
Subjects
Secale, Triticale, Plant Chromosomes/genetics, Disease Resistance/genetics, Edible Grain/genetics, Secale/genetics, Triticale/genetics, Triticum/genetics
ABSTRACT
Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever-greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.
Subjects
DNA Copy Number Variations, Genomics, Genetic Variation, Genomic Structural Variation, Phenotype
ABSTRACT
Brassica rapa displays a wide range of morphological diversity which is exploited for a variety of food crops. Here we present a high-quality genome assembly for pak choi (Brassica rapa L. subsp. chinensis), an important non-heading leafy vegetable, and comparison with the genomes of heading type Chinese cabbage and the oilseed form, yellow sarson. Gene presence-absence variation (PAV) and genomic structural variations (SV) were identified, together with single nucleotide polymorphisms (SNPs). The structure and expression of genes for leaf morphology and flowering were compared between the three morphotypes revealing candidate genes for these traits in B. rapa. The pak choi genome assembly and its comparison with other B. rapa genome assemblies provide a valuable resource for the genetic improvement of this important vegetable crop and a model to understand the diversity of morphological variation across Brassica species.
Subjects
Brassica rapa, Brassica, Brassica/genetics, Brassica rapa/genetics, China, Phenotype, Plant Leaves/genetics
ABSTRACT
Plant genomes demonstrate significant presence/absence variation (PAV) within a species; however, the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidization, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.
Subjects
Brassica napus, Brassica, Brassica/genetics, Brassica napus/genetics, Diploidy, Plant Genome/genetics, Polyploidy
ABSTRACT
EMBL Australia Bioinformatics Resource (EMBL-ABR) is a developing national research infrastructure, providing bioinformatics resources and support to life science and biomedical researchers in Australia. EMBL-ABR comprises 10 geographically distributed national nodes with one coordinating hub, with current funding provided through Bioplatforms Australia and the University of Melbourne for its initial 2-year development phase. The EMBL-ABR mission is to: (1) increase Australia's capacity in bioinformatics and data sciences; (2) contribute to the development of training in bioinformatics skills; (3) showcase Australian data sets at an international level and (4) enable engagement in international programs. The activities of EMBL-ABR are focussed in six key areas, aligning with comparable international initiatives such as ELIXIR, CyVerse and NIH Commons. These key areas (Tools, Data, Standards, Platforms, Compute and Training) are described in this article.