ABSTRACT
Genome structural variation (SV) contributes strongly to trait variation in eukaryotic species and may have an even higher functional significance than single-nucleotide polymorphism (SNP). In recent years, a number of studies have associated large, chromosomal-scale SV, ranging from hundreds of kilobases up to a few megabases, with key agronomic traits in plant genomes. However, there has been little or no effort towards cataloguing small- (30-10 000 bp) to mid-scale (10 000-30 000 bp) SV and their impact on evolution and adaptation-related traits in plants. This might be attributed to the complex and highly duplicated nature of plant genomes, which makes them difficult to assess using high-throughput genome screening methods. Here, we describe how long-read sequencing technologies can overcome this problem, revealing a surprisingly high level of widespread, small- to mid-scale SV in a major allopolyploid crop species, Brassica napus. We found that up to 10% of all genes were affected by small- to mid-scale SV events. Nearly half of these SV events ranged between 100 bp and 1000 bp, which makes them challenging to detect using short-read Illumina sequencing. Examples demonstrating the contribution of such SV towards eco-geographical adaptation and disease resistance in oilseed rape suggest that revisiting complex plant genomes using medium-coverage long-read sequencing might reveal unexpected levels of functional gene variation, with major implications for trait regulation and crop improvement.
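The size classes named in this abstract can be illustrated with a short sketch. It only restates the stated ranges (small: 30-10 000 bp, mid: 10 000-30 000 bp); the function name, the labels, and the choice to place exactly 10 000 bp in the mid-scale class are assumptions made for illustration and are not taken from the study.

```python
# Hypothetical helper: bins an SV length into the size classes quoted in the abstract.
# Assumption: the shared 10 000 bp boundary is assigned to the mid-scale class.
SMALL_MIN_BP = 30
SMALL_MAX_BP = 10_000
MID_MAX_BP = 30_000


def classify_sv(length_bp: int) -> str:
    """Return a size-class label for a structural variant of the given length."""
    if length_bp < 0:
        raise ValueError("SV length must be non-negative")
    if length_bp < SMALL_MIN_BP:
        return "below SV range"
    if length_bp < SMALL_MAX_BP:
        return "small"
    if length_bp <= MID_MAX_BP:
        return "mid"
    return "large"


if __name__ == "__main__":
    # Example lengths are illustrative only.
    for length in (25, 288, 15_000, 500_000):
        print(length, classify_sv(length))
```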
Subject(s)
Brassica napus , Polyploidy , Brassica napus/genetics , Disease Resistance/genetics , Genome, Plant/genetics , Humans , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Plant genomes demonstrate significant presence/absence variation (PAV) within a species; however, the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidization, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.
Subject(s)
Brassica napus , Brassica , Brassica/genetics , Brassica napus/genetics , Diploidy , Genome, Plant/genetics , Polyploidy
ABSTRACT
KEY MESSAGE: A novel structural variant was discovered in the FLOWERING LOCUS T orthologue BnaFT.A02 by long-read sequencing. Nested association mapping in an elite winter oilseed rape population revealed that this 288 bp deletion associates with early flowering, putatively by modification of binding-sites for important flowering regulation genes. Perfect timing of flowering is crucial for optimal pollination and high seed yield. Extensive previous studies of flowering behavior in Brassica napus (canola, rapeseed) identified mutations in key flowering regulators which differentiate winter, semi-winter and spring ecotypes. However, because these are generally fixed in locally adapted genotypes, they have only limited relevance for fine adjustment of flowering time in elite cultivar gene pools. In crosses between ecotypes, the ecotype-specific major-effect mutations mask minor-effect loci of interest for breeding. Here, we investigated flowering time in a multiparental mapping population derived from seven elite winter oilseed rape cultivars which are fixed for major-effect mutations separating winter-type rapeseed from other ecotypes. Association mapping revealed eight genomic regions on chromosomes A02, C02 and C03 associating with fine modulation of flowering time. Long-read genomic resequencing of the seven parental lines identified seven structural variants coinciding with candidate genes for flowering time within chromosome regions associated with flowering time. Segregation patterns for these variants in the elite multiparental population and a diversity set of winter types using locus-specific assays revealed significant associations with flowering time for three deletions on chromosome A02. One of these was a previously undescribed 288 bp deletion within the second intron of FLOWERING LOCUS T on chromosome A02, emphasizing the advantage of long-read sequencing for detection of structural variants in this size range. 
Detailed analysis revealed the impact of this specific deletion on flowering-time modulation under extreme environments and varying day lengths in elite, winter-type oilseed rape.
Subject(s)
Brassica napus/growth & development , Flowers/growth & development , Plant Proteins/genetics , Quantitative Trait Loci , Seasons , Brassica napus/genetics , Brassica napus/metabolism , Chromosome Mapping , Flowers/genetics , Flowers/metabolism , Genomics , Plant Breeding , Plant Proteins/metabolism
ABSTRACT
There is an increasing understanding that variation in gene presence-absence plays an important role in the heritability of agronomic traits; however, there have been relatively few studies on variation in gene presence-absence in crop species. Hexaploid wheat is one of the most important food crops in the world and intensive breeding has reduced the genetic diversity of elite cultivars. Major efforts have produced draft genome assemblies for the cultivar Chinese Spring, but it is unknown how well this represents the genome diversity found in current modern elite cultivars. In this study we build an improved reference for Chinese Spring and explore gene diversity across 18 wheat cultivars. We predict a pangenome size of 140 500 ± 102 genes, a core genome of 81 070 ± 1631 genes and an average of 128 656 genes in each cultivar. Functional annotation of the variable gene set suggests that it is enriched for genes that may be associated with important agronomic traits. In addition to variation in gene presence, more than 36 million intervarietal single nucleotide polymorphisms were identified across the pangenome. This study of the wheat pangenome provides insight into genome diversity in elite wheat as a basis for genomics-based improvement of this important crop. A wheat pangenome, GBrowse, is available at http://appliedbioinformatics.com.au/cgi-bin/gb2/gbrowse/WheatPan/, and data are available to download from http://wheatgenome.info/wheat_genome_databases.php.
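The distinction between pangenome, core and variable genes can be sketched as simple set operations on a gene presence/absence table. This is a minimal illustration only: the cultivar and gene names are hypothetical toy data, and the sketch counts genes directly rather than reproducing the modelling used in the study to predict pangenome and core sizes.

```python
# Minimal sketch: split genes into pangenome, core and variable sets.
# Input maps each cultivar to the set of genes detected as present in it.
def split_core_variable(presence):
    if not presence:
        raise ValueError("at least one cultivar is required")
    gene_sets = [set(genes) for genes in presence.values()]
    pangenome = set().union(*gene_sets)      # present in at least one cultivar
    core = set.intersection(*gene_sets)      # present in every cultivar
    variable = pangenome - core              # absent from at least one cultivar
    return pangenome, core, variable


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the wheat pangenome.
    toy = {
        "cultivar_A": {"g1", "g2", "g3"},
        "cultivar_B": {"g1", "g2", "g4"},
        "cultivar_C": {"g1", "g2"},
    }
    pan, core, variable = split_core_variable(toy)
    print(len(pan), len(core), len(variable))
```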
Subject(s)
Genome, Plant/genetics , Triticum/genetics , Chromosomes, Plant/genetics , Genetic Variation/genetics , Polymorphism, Single Nucleotide/genetics
ABSTRACT
SUMMARY: We developed runBNG, an open-source software package which wraps BioNano genomic analysis tools into a single script that can be run on the command line. runBNG can complete analyses including quality control of single-molecule maps, optical map de novo assembly, comparisons between different optical maps, super-scaffolding and structural variation detection. Compared to the existing software BioNano IrysView and the KSU scripts, the major advantages of runBNG are that the whole pipeline runs on a single platform and that it is highly customizable. AVAILABILITY AND IMPLEMENTATION: runBNG is written in bash and requires the BioNano IrysSolve packages, GCC, Perl and Python. It is freely available at https://github.com/appliedbioinformatics/runBNG. CONTACT: dave.edwards@uwa.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genomics/methods , Software
ABSTRACT
Seagrasses are marine angiosperms that live fully submerged in the sea. They evolved from land plant ancestors, with multiple species representing at least three independent return-to-the-sea events. This raises the question of whether these marine angiosperms followed the same adaptation pathway to allow them to live and reproduce under the hostile marine conditions. To compare the basis of marine adaptation between seagrass lineages, we generated genomic data for Halophila ovalis and compared this with recently published genomes for two members of Zosteraceae, as well as genomes of five non-marine plant species (Arabidopsis, Oryza sativa, Phoenix dactylifera, Musa acuminata, and Spirodela polyrhiza). Halophila and Zosteraceae represent two independent seagrass lineages separated by around 30 million years. Genes that were lost or conserved in both lineages were identified. All three species lost genes associated with ethylene and terpenoid biosynthesis, and retained genes related to salinity adaptation, such as those for osmoregulation. In contrast, the loss of the NADH dehydrogenase-like complex is unique to H. ovalis. Through comparison of two independent return-to-the-sea events, this study further describes marine adaptation characteristics common to seagrass families, identifies species-specific gene loss, and provides molecular evidence for convergent evolution in seagrass lineages.
Subject(s)
Evolution, Molecular , Genomics , Hydrocharitaceae/genetics , Magnoliopsida/genetics , Zosteraceae/genetics , Adaptation, Physiological , Ecosystem , Species Specificity
ABSTRACT
As an increasing number of plant genome sequences become available, it is clear that gene content varies between individuals, and the challenge arises to predict the gene content of a species. However, genome comparison is often confounded by variation in assembly and annotation. Differentiating between true gene absence and variation in assembly or annotation is essential for the accurate identification of conserved and variable genes in a species. Here, we present the de novo assembly of the Brassica napus cultivar Tapidor and a comparison with an improved assembly of the B. napus cultivar Darmor-bzh. Both cultivars were annotated using the same method to allow comparison of gene content. We identified genes unique to each cultivar and differentiated these from artefacts due to variation in the assembly and annotation. We demonstrate that using a common annotation pipeline can result in different gene predictions, even for closely related cultivars, and that repeat regions which collapse during assembly impact whole-genome comparison. After accounting for differences in assembly and annotation, we demonstrate that the genome of Darmor-bzh contains a greater number of genes than the genome of Tapidor. Our results are the first step towards comparison of the true differences between B. napus genomes and highlight the potential sources of error in future production of a B. napus pangenome.
Subject(s)
Genome, Plant , Brassica napus/genetics , Expressed Sequence Tags , Genes, Plant , Molecular Sequence Annotation , Repetitive Sequences, Nucleic Acid
ABSTRACT
Seagrasses are marine angiosperms that evolved from land plants but returned to the sea around 140 million years ago during the early evolution of monocotyledonous plants. They successfully adapted to abiotic stresses associated with growth in the marine environment, and today, seagrasses are distributed in coastal waters worldwide. Seagrass meadows are an important oceanic carbon sink and provide food and breeding grounds for diverse marine species. Here, we report the assembly and characterization of the genome of Zostera muelleri, a southern hemisphere temperate species. Compared with terrestrial or floating aquatic plants, multiple genes associated with adaptation to life in the ocean were lost or modified in Z. muelleri. These include genes for hormone biosynthesis and signaling and cell wall catabolism. There is evidence of whole-genome duplication in Z. muelleri; however, an ancient pan-commelinid duplication event is absent, highlighting the early divergence of this species from the main monocot lineages.
Subject(s)
Adaptation, Physiological/genetics , Ecosystem , Genome, Plant/genetics , Zosteraceae/genetics , Aquatic Organisms/genetics , Gene Duplication , Gene Ontology , Genes, Plant/genetics , Molecular Sequence Annotation , Oceans and Seas , Plant Proteins/genetics , Sequence Analysis, RNA
ABSTRACT
BACKGROUND: Some health websites provide a public forum for consumers to post ratings and reviews on drugs. Drug reviews are easily accessible and comprehensible, unlike clinical trials and published literature. Because the public increasingly uses the Internet as a source of medical information, it is important to know whether such information is reliable. OBJECTIVE: We aim to examine whether Web-based consumer drug ratings and reviews can be used as a resource to compare drug performance. METHODS: We analyzed 103,411 consumer-generated reviews on 615 drugs used to treat 249 disease conditions from the health website WebMD. Statistical analysis identified 427 drug pairs from 24 conditions for which two drugs treating the same condition had significantly and substantially different satisfaction ratings (with at least a half-point difference between Web-based ratings and P<.01). PubMed and Google Scholar were searched for publications that were assessed for concordance with findings online. RESULTS: Scientific literature was found for 77 out of the 427 drug pairs and compared to findings online. Nearly two-thirds (48/77, 62%) of the online drug trends with at least a half-point difference in online ratings were supported by published literature (P=.02). For a 1-point online rating difference, the concordance rate increased to 68% (15/22) (P=.07). The discrepancies between scientific literature and findings online were further examined to obtain more insights into the usability of Web-based consumer-generated reviews. We discovered that (1) drugs with FDA black box warnings or used off-label were rated poorly in Web-based reviews, (2) drugs with addictive properties were rated higher than their counterparts in Web-based reviews, and (3) second-line or alternative drugs were rated higher. In addition, Web-based ratings indicated drug delivery problems. 
If FDA black box warning labels are used to resolve disagreements between publications and online trends, the concordance rate increases to 71% (55/77) (P<.001) for a half-point rating difference and to 82% (18/22) (P=.002) for a 1-point rating difference. Our results suggest that Web-based reviews can be used to inform patients' drug choices, with certain caveats. CONCLUSIONS: Web-based reviews can be viewed as an orthogonal source of information for consumers, physicians, and drug manufacturers to assess the performance of a drug. However, one should be cautious about relying solely on consumer reviews, as ratings can be strongly influenced by the consumer experience.
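The concordance rates quoted in this abstract are simple proportions and can be recomputed from the reported counts. The sketch below does only that arithmetic; the function name and the dictionary labels are illustrative assumptions, and the P values are not recomputed because the abstract does not state which test produced them.

```python
# Recompute the concordance percentages from the counts reported in the abstract.
def concordance_rate(supported: int, total: int) -> float:
    """Fraction of drug pairs whose online trend is supported by the literature."""
    if total <= 0:
        raise ValueError("total must be positive")
    return supported / total


# (supported, total) pairs as reported; the labels are descriptive only.
REPORTED = {
    "half-point difference": (48, 77),
    "1-point difference": (15, 22),
    "half-point, black box warnings applied": (55, 77),
    "1-point, black box warnings applied": (18, 22),
}

if __name__ == "__main__":
    for label, (supported, total) in REPORTED.items():
        rate = concordance_rate(supported, total)
        print(f"{label}: {supported}/{total} = {rate:.0%}")
```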
Subject(s)
Internet , Patient Satisfaction , Pharmaceutical Preparations , Attitude to Health , Health Resources , Humans , Physicians , Publications
ABSTRACT
Tweetable abstract: Monitoring changes in methylation heterogeneity can be powerful in detecting disease progression early. This editorial highlights the importance of profiling methylation heterogeneity and identifies existing measures and research gaps.
Subject(s)
DNA Methylation , Epigenesis, Genetic , Humans , Epigenomics
ABSTRACT
The addition of a methyl group to cytosine has been identified as one of several heritable epigenetic mechanisms. In plants, DNA methylation is involved in mediating response to stress, plant development, polyploidy, and domestication through regulation of gene expression. The correlation of epigenetic variation with phenotypic traits expands our understanding of plant evolution and provides a new source for targeted manipulation in crop improvement. To address the increasing interest in mapping the methylation landscape in plant species, this chapter describes methods to analyze bisulfite sequencing data and identify epigenetic variation between samples. We also provide detailed guidelines that highlight possible optimizations, as well as ways to tailor parameters according to data and biological variability.
Subject(s)
Cytosine , Sulfites , Cytosine/metabolism , DNA Methylation , Epigenesis, Genetic , Sulfites/metabolism
ABSTRACT
In a cross between two homozygous Brassica napus plants of synthetic and natural origin, we demonstrate that novel structural genome variants from the synthetic parent cause immediate genome diversification among F1 offspring. Long-read sequencing in twelve F1 sister plants revealed five large-scale structural rearrangements where both parents carried different homozygous alleles but the F1 genomes were not identical heterozygotes as expected. Such spontaneous rearrangements were part of homoeologous exchanges or segmental deletions and were identified in different, individual F1 plants. The variants caused deletions, gene copy-number variations, diverging methylation patterns and other structural changes in large numbers of genes and may have been causal for unexpected phenotypic variation between individual F1 sister plants, for example strong divergence of plant height and leaf area. This example supports the hypothesis that spontaneous de novo structural rearrangements after de novo polyploidization can rapidly overcome intense allopolyploidization bottlenecks to re-expand crop genetic diversity for ecogeographical expansion and human selection. The findings imply that natural genome restructuring in allopolyploid plants from interspecific hybridization, a common approach in plant breeding, can have a considerably more drastic impact on genetic diversity in agricultural ecosystems than extremely precise, biotechnological genome modifications.
ABSTRACT
Blackleg is one of the major fungal diseases in oilseed rape/canola worldwide. Most commercial cultivars carry R gene-mediated qualitative resistances that confer a high level of race-specific protection against Leptosphaeria maculans, the causal fungus of blackleg disease. However, monogenic resistances of this kind can potentially be rapidly overcome by mutations in the pathogen's avirulence genes. To counteract pathogen adaptation in this evolutionary arms race, there is a tremendous demand for quantitative background resistance to enhance durability and efficacy of blackleg resistance in oilseed rape. In this study, we characterized genomic regions contributing to quantitative L. maculans resistance by genome-wide association studies in a multiparental mapping population derived from six parental elite varieties exhibiting quantitative resistance, which were all crossed to one common susceptible parental elite variety. Resistance was screened using a fungal isolate with no corresponding avirulence (AvrLm) to major R genes present in the parents of the mapping population. Genome-wide association studies revealed eight significantly associated quantitative trait loci (QTL) on chromosomes A07 and A09, with small effects explaining 3-6% of the phenotypic variance. Unexpectedly, the qualitative blackleg resistance gene Rlm9 was found to be located within a resistance-associated haploblock on chromosome A07. Furthermore, long-range sequence data spanning this haploblock revealed high levels of single-nucleotide and structural variants within the Rlm9 coding sequence among the parents of the mapping population. The results suggest that novel variants of Rlm9 could play a previously unknown role in expression of quantitative disease resistance in oilseed rape.
ABSTRACT
Rapeseed (Brassica napus), the second most important oilseed crop globally, originated from an interspecific hybridization between B. rapa and B. oleracea. After this genome collision, B. napus underwent extensive genome restructuring, via homoeologous chromosome exchanges, resulting in widespread segmental deletions and duplications. Illicit pairing among genetically similar homoeologous chromosomes during meiosis is common in recent allopolyploids like B. napus, and post-polyploidization restructuring compounds the difficulties of assembling a complex polyploid plant genome. Specifically, genomic rearrangements between highly similar chromosomes are challenging to detect due to the limitation of sequencing read length and ambiguous alignment of reads. Recent advances in long read sequencing technologies provide promising new opportunities to unravel the genome complexities of B. napus by encompassing breakpoints of genomic rearrangements with high specificity. Moreover, recent evidence revealed ongoing genomic exchanges in natural B. napus, highlighting the need for multiple reference genomes to capture structural variants between accessions. Here we report the first long-read genome assembly of a winter B. napus cultivar. We sequenced the German winter oilseed rape accession 'Express 617' using 54.5x of long reads. Short reads, linked reads, optical map data and high-density genetic maps were used to further correct and scaffold the assembly to form pseudochromosomes. The assembled Express 617 genome provides another valuable resource for Brassica genomics in understanding the genetic consequences of polyploidization, crop domestication, and breeding of recently-formed crop species.
ABSTRACT
We report the first annotated chromosome-level reference genome assembly for pea, Gregor Mendel's original genetic model. Phylogenetics and paleogenomics show genomic rearrangements across legumes and suggest a major role for repetitive elements in pea genome evolution. Compared to other sequenced Leguminosae genomes, the pea genome shows intense gene dynamics, most likely associated with genome size expansion when the Fabeae diverged from its sister tribes. During Pisum evolution, translocation and transposition differentially occurred across lineages. This reference sequence will accelerate our understanding of the molecular basis of agronomically important traits and support crop improvement.
Subject(s)
Chromosomes, Plant/genetics , Evolution, Molecular , Fabaceae/genetics , Genome, Plant , Pisum sativum/genetics , Plant Proteins/genetics , Quantitative Trait Loci , Chromosome Mapping , Fabaceae/classification , Gene Expression Regulation, Plant , Genetic Variation , Genomics , Phenotype , Phylogeny , Reference Standards , Repetitive Sequences, Nucleic Acid , Seed Storage Proteins/genetics , Whole Genome Sequencing
ABSTRACT
Individual cells in an organism are variable, which strongly impacts cellular processes. Advances in sequencing technologies have enabled single-cell genomic analysis to become widespread, addressing shortcomings of analyses conducted on populations of bulk cells. While the field of single-cell plant genomics is in its infancy, there is great potential to gain insights into cell lineage and functional cell types to help understand complex cellular interactions in plants. In this review, we discuss current approaches for single-cell plant genomic analysis, with a focus on single-cell isolation, DNA amplification, next-generation sequencing, and bioinformatics analysis. We outline the technical challenges of analysing material from a single plant cell, and then examine applications of single-cell genomics and the integration of this approach with genome editing. Finally, we indicate future directions we expect in the rapidly developing field of plant single-cell genomic analysis.