RESUMO
Structural variations (SVs) are larger polymorphisms (> 50 bp in length), which consist of insertions, deletions, inversions, duplications, and translocations. They can have a strong impact on agronomical traits and play an important role in environmental adaptation. The development of long-read sequencing technologies, including Oxford Nanopore, allows for comprehensive SV discovery and characterization even in complex polyploid crop genomes. However, many of the SV discovery pipeline benchmarks do not include complex plant genome datasets. In this study, we benchmarked insertion and deletion detection by popular long-read alignment-based SV detection tools for crop plant genomes. We used real and simulated Oxford Nanopore reads for two crops, allotetraploid Brassica napus (oilseed rape) and diploid Solanum lycopersicum (tomato), and evaluated several read aligners and SV callers across 5×, 10×, and 20× coverages typically used in re-sequencing studies. We further validated our findings using maize and soybean datasets. Our benchmarks provide a useful guide for designing Oxford Nanopore re-sequencing projects and SV discovery pipelines for crop plants.
Assuntos
Benchmarking , Nanoporos , Análise de Sequência de DNA , Sequenciamento de Nucleotídeos em Larga Escala , Genoma de PlantaRESUMO
Since the first reported crop pangenome in 2014, advances in high-throughput and cost-effective DNA sequencing technologies facilitated multiple such studies including the pangenomes of oilseed rape (Brassica napus L.), soybean [Glycine max (L.) Merr.], rice (Oryza sativa L.), wheat (Triticum aestivum L.), and barley (Hordeum vulgare L.). Compared with single-reference genomes, pangenomes provide a more accurate representation of the genetic variation present in a species. By combining the genomic data of multiple accessions, pangenomes allow for the detection and annotation of complex DNA polymorphisms such as structural variations (SVs), one of the major determinants of genetic diversity within a species. In this review we summarize the current literature on crop pangenomics, focusing on their application to find candidate SVs involved in traits of agronomic interest. We then highlight the potential of pangenomes in the discovery and functional characterization of noncoding regulatory sequences and their variations. We conclude with a summary and outlook on innovative data structures representing the complete content of plant pangenomes including annotations of coding and noncoding elements and outcomes of transcriptomic and epigenomic experiments.