ABSTRACT
Horizontal gene transfer accelerates microbial evolution. The marine picocyanobacterium Prochlorococcus exhibits high genomic plasticity, yet the underlying mechanisms are elusive. Here, we report a novel family of DNA transposons, "tycheposons", some of which are viral satellites while others carry cargo, such as nutrient-acquisition genes, which shape the genetic variability in this globally abundant genus. Tycheposons share distinctive mobile-lifecycle-linked hallmark genes, including a deep-branching site-specific tyrosine recombinase. Their excision and integration at tRNA genes appear to drive the remodeling of genomic islands, key reservoirs for flexible genes in bacteria. In a selection experiment, tycheposons harboring a nitrate assimilation cassette were dynamically gained and lost, thereby promoting chromosomal rearrangements and host adaptation. Vesicles and phage particles harvested from seawater are enriched in tycheposons, providing a means for their dispersal in the wild. Similar elements are found in microbes co-occurring with Prochlorococcus, suggesting a common mechanism for microbial diversification in the vast oligotrophic oceans.
Subjects
Ecosystem , Bacterial Genome , Bacterial Genome/genetics , Phylogeny , Oceans and Seas , Genomics
ABSTRACT
Structural variations (SVs) and gene copy number variations (gCNVs) have contributed to crop evolution, domestication, and improvement. Here, we assembled 31 high-quality genomes of genetically diverse rice accessions. Coupling with two existing assemblies, we developed pan-genome-scale genomic resources including a graph-based genome, providing access to rice genomic variations. Specifically, we discovered 171,072 SVs and 25,549 gCNVs and used an Oryza glaberrima assembly to infer the derived states of SVs in the Oryza sativa population. Our analyses of SV formation mechanisms, impacts on gene expression, and distributions among subpopulations illustrate the utility of these resources for understanding how SVs and gCNVs shaped rice environmental adaptation and domestication. Our graph-based genome enabled genome-wide association study (GWAS)-based identification of phenotype-associated genetic variations undetectable when using only SNPs and a single reference assembly. Our work provides rich population-scale resources paired with easy-to-access tools to facilitate rice breeding as well as plant functional genomics and evolutionary biology research.
Subjects
Ecotype , Genetic Variation , Plant Genome , Oryza/genetics , Physiological Adaptation/genetics , Agriculture , Domestication , Gene Expression Profiling , Plant Gene Expression Regulation , Plant Genes , Genomic Structural Variation , Molecular Sequence Annotation , Phenotype
ABSTRACT
Soybean is one of the most important vegetable oil and protein feed crops. To capture its entire genomic diversity, a complete high-quality pan-genome must be constructed from diverse soybean accessions. In this study, we performed individual de novo genome assemblies for 26 representative soybeans that were selected from 2,898 deeply sequenced accessions. Using these assembled genomes together with three previously reported genomes, we constructed a graph-based genome and performed pan-genome analysis, which identified numerous genetic variations that cannot be detected by direct mapping of short sequence reads onto a single reference genome. The structural variations from the 2,898 accessions that were genotyped based on the graph-based genome and the RNA sequencing (RNA-seq) data from the representative 26 accessions helped to link genetic variations to candidate genes that are responsible for important traits. This pan-genome resource will promote evolutionary and functional genomics studies in soybean.
Subjects
Plant Genome , Glycine max/growth & development , Glycine max/genetics , Base Sequence , Plant Chromosomes/genetics , Domestication , Ecotype , Gene Duplication , Plant Gene Expression Regulation , Gene Fusion , Geography , Molecular Sequence Annotation , Phylogeny , Single Nucleotide Polymorphism/genetics , Polyploidy
ABSTRACT
There is an urgent need to improve wheat for upcoming challenges, including biotic and abiotic stresses. Sustainable wheat improvement requires the introduction of new genes and alleles in high-yielding wheat cultivars. Using new approaches, tools, and technologies to identify and introduce new genes in wheat cultivars is critical. High-quality genomes, transcriptomes, and pangenomes provide essential resources and tools to examine wheat closely to identify and manipulate new and targeted genes and alleles. Wheat genomics has advanced considerably in the past 5 years, generating multiple genomes, pangenomes, and transcriptomes. Leveraging these resources allows us to accelerate our crop improvement pipelines. This review summarizes the progress made in wheat genomics and trait discovery in the past 5 years.
ABSTRACT
With broad genetic diversity and as a source of key agronomic traits, wild grape species (Vitis spp.) are crucial to enhance viticulture's climatic resilience and sustainability. This review discusses how recent breakthroughs in the genome assembly and analysis of wild grape species have led to discoveries on grape evolution, from wild species' adaptation to environmental stress to grape domestication. We detail how diploid chromosome-scale genomes from wild Vitis spp. have enabled the identification of candidate disease-resistance and flower sex determination genes and the creation of the first Vitis graph-based pangenome. Finally, we explore how wild grape genomics can impact grape research and viticulture, including aspects such as data sharing, the development of functional genomics tools, and the acceleration of genetic improvement.
Subjects
Plant Genome , Genomics , Vitis , Vitis/genetics , Genomics/methods , Plant Genome/genetics , Genetic Variation , Disease Resistance/genetics , Domestication , Molecular Evolution
ABSTRACT
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Subjects
Human Genome , Human Genome Project , Humans , Genetic Variation , Genomics/methods , DNA Sequence Analysis/methods , Telomere/genetics
ABSTRACT
Genomic islands are hotspots for horizontal gene transfer (HGT) in bacteria, but, for Prochlorococcus, an abundant marine cyanobacterium, how these islands form has puzzled scientists. With the discovery of tycheposons, a new family of transposons, Hackl et al. provide evidence for elegant new mechanisms of gene rearrangement and transfer among Prochlorococcus and bacteria more broadly.
Subjects
Bacteriophages , Cyanobacteria , Bacteriophages/genetics , Horizontal Gene Transfer/genetics , Cyanobacteria/genetics , Transfer RNA/genetics , Genomic Islands
ABSTRACT
Potato (Solanum sp., family Solanaceae) is the most important noncereal food crop globally. It has over 100 wild relatives in the Solanum section Petota, which features species with both sexual and asexual reproduction and varying ploidy levels. A pangenome of Solanum section Petota composed of 296 accessions was constructed including diploids and polyploids compared via presence/absence variation (PAV). The Petota core (genes shared by at least 97% of the accessions) and shell genomes (shared by 3 to 97%) are enriched in basic molecular and cellular functions, while the cloud genome (genes present in less than 3% of the member accessions) showed enrichment in transposable elements (TEs). Comparison of PAV in domesticated vs. wild accessions was made, and a phylogenetic tree was constructed based on PAVs, grouping accessions into different clades, similar to previous phylogenies produced using DNA markers. A cladewise pangenome approach identified abiotic stress response among the core genes in clade 1+2 and clade 3, and flowering/tuberization among the core genes in clade 4. The TE content differed between the clades, with clade 1+2, which is composed of species from North and Central America with reproductive isolation from species in other clades, having much lower TE content compared to other clades. In contrast, accessions with in vitro propagation history were identified and found to have high levels of TEs. Results indicate a role for TEs in adaptation to new environments, both natural and artificial, for Solanum section Petota.
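The core, shell, and cloud thresholds stated in this abstract can be illustrated with a minimal sketch. It assumes a presence/absence matrix stored as a dictionary of 0/1 calls per gene; the function name `classify_pangenome`, the gene labels, and the data layout are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, List


def classify_pangenome(
    pav: Dict[str, List[int]],
    core_min: float = 0.97,   # "shared by at least 97% of the accessions"
    cloud_max: float = 0.03,  # "present in less than 3% of the member accessions"
) -> Dict[str, str]:
    """Assign each gene to core, shell, or cloud from its presence frequency.

    `pav` maps a gene identifier to a list of 0/1 calls, one per accession
    (a hypothetical layout chosen for illustration).
    """
    categories: Dict[str, str] = {}
    for gene, calls in pav.items():
        if not calls:
            raise ValueError(f"no accessions recorded for gene {gene!r}")
        freq = sum(calls) / len(calls)  # fraction of accessions carrying the gene
        if freq >= core_min:
            categories[gene] = "core"
        elif freq < cloud_max:
            categories[gene] = "cloud"
        else:
            categories[gene] = "shell"
    return categories


# Toy example with 100 accessions per gene (made-up data).
toy_pav = {
    "g_core": [1] * 100,
    "g_shell": [1] * 50 + [0] * 50,
    "g_cloud": [1] * 2 + [0] * 98,
}
toy_result = classify_pangenome(toy_pav)
```

Keeping the two cut-offs as parameters makes it easy to compare against studies that use other conventions for the core and cloud boundaries.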
Subjects
Solanum tuberosum , Solanum , DNA Transposable Elements , Phylogeny , Ploidies
ABSTRACT
Tea, one of the most widely consumed beverages globally, exhibits remarkable genomic diversity in its underlying flavour and health-related compounds. In this study, we present the construction and analysis of a tea pangenome comprising a total of 11 genomes, with a focus on three newly sequenced genomes: the purple-leaved assamica cultivar "Zijuan", the temperature-sensitive sinensis cultivar "Anjibaicha" and the wild accession "L618", whose assemblies achieved excellent quality scores thanks to the latest sequencing technologies. Our analysis incorporates a detailed investigation of transposon complement across the tea pangenome, revealing shared patterns of transposon distribution among the studied genomes and improved transposon resolution with long read technologies, as shown by long terminal repeat (LTR) Assembly Index analysis. Furthermore, our gene-centric exploration of the pangenome examines the genomic landscape of the catechin pathway, providing insights into copy number alterations and gene-centric variants, especially for anthocyanidin synthases. We constructed a gene-centric pangenome by structurally and functionally annotating all available genomes using an identical pipeline, which both increased gene completeness and allowed for a high functional annotation rate. This improved and consistently annotated gene set will allow for a better comparison between tea genomes. We used this improved pangenome to capture the core and dispensable gene repertoire, elucidating the functional diversity present within the tea species. This pangenome might serve as a valuable resource for understanding the fundamental genetic basis of traits such as flavour, stress tolerance, and disease resistance, with implications for tea breeding programmes.
Subjects
Camellia sinensis , DNA Transposable Elements , Plant Genome , Camellia sinensis/genetics , Plant Genome/genetics , DNA Transposable Elements/genetics , Genetic Variation , Tea/genetics , Genomics , Catechin/genetics
ABSTRACT
BACKGROUND: Transposable elements (TEs) have a profound influence on the trajectory of plant evolution, driving genome expansion and catalyzing phenotypic diversification. The pangenome, a comprehensive genetic pool encompassing all variations within a species, serves as an invaluable tool, unaffected by the confounding factors of intraspecific diversity. This allows for a more nuanced exploration of plant TE evolution. RESULTS: Here, we constructed a pangenome for diploid A-genome cotton using 344 accessions from representative geographical regions, including 223 from China as the main component. We found 511 Mb of non-reference sequences (NRSs) and revealed the presence of 5479 previously undiscovered protein-coding genes. Our comprehensive approach enabled us to decipher the genetic underpinnings of the distinct geographic distributions of cotton. Notably, we identified 3301 presence-absence variations (PAVs) that are closely tied to gene expression patterns within the pangenome, among which 2342 novel expression quantitative trait loci (eQTLs) were found residing in NRSs. Our investigation also unveiled contrasting patterns of transposon proliferation between diploid and tetraploid cotton, with long terminal repeat (LTR) retrotransposons exhibiting a synchronized surge in polyploids. Furthermore, the invasion of LTR retrotransposons from the A subgenome to the D subgenome triggered a substantial expansion of the latter following polyploidization. In addition, we found that TE insertions were responsible for the loss of 36.2% of species-specific genes, as well as the generation of entirely new species-specific genes. CONCLUSIONS: Our pangenome analyses provide new insights into cotton genomics and subgenome dynamics after polyploidization and demonstrate the power of pangenome approaches for elucidating transposon impacts and genome evolution.
Subjects
DNA Transposable Elements , Molecular Evolution , Plant Genome , Gossypium , Gossypium/genetics , DNA Transposable Elements/genetics , Quantitative Trait Loci
ABSTRACT
BACKGROUND: Kiwifruit, belonging to the genus Actinidia, represents a unique fruit crop characterized by its modern cultivars being genetically diverse and exhibiting remarkable variations in morphological traits and adaptability to harsh environments. However, the genetic mechanisms underlying such morphological diversity remain largely elusive. RESULTS: We report the high-quality genomes of five Actinidia species, including Actinidia longicarpa, A. macrosperma, A. polygama, A. reticulata, and A. rufa. Through comparative genomics analyses, we identified three whole genome duplication events shared by the Actinidia genus and uncovered rapidly evolving gene families implicated in the development of characteristic kiwifruit traits, including vitamin C (VC) content and fruit hairiness. A range of structural variations were identified, potentially contributing to the phenotypic diversity in kiwifruit. Notably, phylogenomic analyses revealed 76 cis-regulatory elements within the Actinidia genus, predominantly associated with stress responses, metabolic processes, and development. Among these, five motifs did not exhibit similarity to known plant motifs, suggesting the presence of possible novel cis-regulatory elements in kiwifruit. Construction of a pan-genome encompassing the nine Actinidia species facilitated the identification of gene DTZ79_23g14810 specific to species exhibiting extraordinarily high VC content. Expression of DTZ79_23g14810 is significantly correlated with the dynamics of VC concentration, and its overexpression in the transgenic roots of kiwifruit plants resulted in increased VC content. CONCLUSIONS: Collectively, the genomes and pan-genome of diverse Actinidia species not only enhance our understanding of fruit development but also provide a valuable genomic resource for facilitating the genome-based breeding of kiwifruit.
Subjects
Actinidia , Plant Genome , Phylogeny , Actinidia/genetics , Actinidia/growth & development , Fruit/genetics , Fruit/growth & development , Plant Genes
ABSTRACT
BACKGROUND: Fungal plant pathogens have dynamic genomes that allow them to rapidly adapt to adverse conditions and overcome host resistance. One way by which this dynamic genome plasticity is expressed is through effector gene loss, which enables plant pathogens to overcome recognition by cognate resistance genes in the host. However, the exact nature of these losses remains elusive in many fungi. This includes the tomato pathogen Cladosporium fulvum, which is the first fungal plant pathogen from which avirulence (Avr) genes were ever cloned and in which loss of Avr genes is often reported as a means of overcoming recognition by cognate tomato Cf resistance genes. A recent near-complete reference genome assembly of C. fulvum isolate Race 5 revealed a compartmentalized genome architecture and the presence of an accessory chromosome, thereby creating a basis for studying genome plasticity in fungal plant pathogens and its impact on avirulence genes. RESULTS: Here, we obtained near-complete genome assemblies of four additional C. fulvum isolates. The genome assemblies had similar sizes (66.96 to 67.78 Mb), number of predicted genes (14,895 to 14,981), and estimated completeness (98.8 to 98.9%). Comparative analysis that included the genome of isolate Race 5 revealed high levels of synteny and colinearity, which extended to the density and distribution of repetitive elements and of repeat-induced point (RIP) mutations across homologous chromosomes. Nonetheless, structural variations, likely mediated by transposable elements and effecting the deletion of the avirulence genes Avr4E, Avr5, and Avr9, were also identified. The isolates further shared a core set of 13 chromosomes, but two accessory chromosomes were identified as well. Accessory chromosomes were significantly smaller in size, and one carried pseudogenized copies of two effector genes. 
Whole-genome alignments further revealed genomic islands of near-zero nucleotide diversity interspersed with islands of high nucleotide diversity that co-localized with repeat-rich regions. These regions were likely generated by RIP, which generally asymmetrically affected the genome of C. fulvum. CONCLUSIONS: Our results reveal new evolutionary aspects of the C. fulvum genome and provide new insights on the importance of genomic structural variations in overcoming host resistance in fungal plant pathogens.
Subjects
Ascomycota , Solanum lycopersicum , Solanum lycopersicum/genetics , DNA Transposable Elements/genetics , Fungal Genes , Cladosporium/genetics , Cladosporium/metabolism , Plants/metabolism , Chromosomes/metabolism , Nucleotides , Plant Diseases/genetics , Plant Diseases/microbiology , Fungal Proteins/metabolism
ABSTRACT
BACKGROUND: White clover (Trifolium repens) is a globally important perennial forage legume. This species also serves as an eco-evolutionary model system for studying within-species chemical defense variation; it features a well-studied polymorphism for cyanogenesis (HCN release following tissue damage), with higher frequencies of cyanogenic plants favored in warmer locations worldwide. Using a newly generated haplotype-resolved genome and two other long-read assemblies, we tested the hypothesis that copy number variants (CNVs) at cyanogenesis genes play a role in the ability of white clover to rapidly adapt to local environments. We also examined questions on subgenome evolution in this recently evolved allotetraploid species and on chromosomal rearrangements in the broader IRLC legume clade. RESULTS: Integration of PacBio HiFi, Omni-C, Illumina, and linkage map data yielded a completely de novo genome assembly for white clover (created without a priori sequence assignment to subgenomes). We find that white clover has undergone extensive transposon diversification since its origin but otherwise shows highly conserved genome organization and composition with its diploid progenitors. Unlike some other clover species, its chromosomal structure is conserved with other IRLC legumes. We further find extensive evidence of CNVs at the major cyanogenesis loci; these contribute to quantitative variation in the cyanogenic phenotype and to local adaptation across wild North American populations. CONCLUSIONS: This work provides a case study documenting the role of CNVs in local adaptation in a plant species, and it highlights the value of pan-genome data for identifying contributions of structural variants to adaptation in nature.
Subjects
DNA Copy Number Variations , Plant Genome , Trifolium , Physiological Adaptation/genetics , Trifolium/genetics
ABSTRACT
Clostridium butyricum is a Gram-positive anaerobic bacterium known for its ability to produce butyrate. In this study, we conducted whole-genome sequencing and assembly of 14 C. butyricum industrial strains collected from various parts of China. We performed a pan-genome comparative analysis of the 14 assembled strains and 139 strains downloaded from NCBI. We found that the genes related to critical industrial production pathways were primarily present in the core and soft-core gene categories. The phylogenetic analysis revealed that strains from the same clade of the phylogenetic tree possessed similar antibiotic resistance and virulence factors, with most of these genes present in the shell and cloud gene categories. Finally, we predicted the genes producing bacteriocins and botulinum toxins as well as CRISPR systems responsible for host defense. In conclusion, our research provides a desirable pan-genome database for the industrial production, food application, and genetic research of C. butyricum.
Subjects
Clostridium butyricum , Bacterial Genome , Phylogeny , Clostridium butyricum/genetics , Clostridium butyricum/metabolism , Whole Genome Sequencing , Bacteriocins/genetics , Bacteriocins/biosynthesis , Industrial Microbiology , Botulinum Toxins/genetics , Virulence Factors/genetics
ABSTRACT
MOTIVATION: Alignment of reads to a reference genome sequence is one of the key steps in the analysis of human whole-genome sequencing data obtained through Next-generation sequencing (NGS) technologies. The quality of the subsequent steps of the analysis, such as the results of clinical interpretation of genetic variants or the results of a genome-wide association study, depends on the correct identification of the position of the read as a result of its alignment. The amount of human NGS whole-genome sequencing data is constantly growing. There are a number of human genome sequencing projects worldwide that have resulted in the creation of large-scale databases of genetic variants of sequenced human genomes. Such information about known genetic variants can be used to improve the quality of alignment at the read alignment stage when analysing sequencing data obtained for a new individual, for example, by creating a genomic graph. While existing methods for aligning reads to a linear reference genome have high alignment speed, methods for aligning reads to a genomic graph have greater accuracy in variable regions of the genome. The development of a read alignment method that takes into account known genetic variants in the linear reference sequence index allows combining the advantages of both sets of methods. RESULTS: In this paper, we present the minimap2_index_modifier tool, which enables the construction of a modified index of a reference genome using known single nucleotide variants and insertions/deletions (indels) specific to a given human population. The use of the modified minimap2 index improves variant calling quality without modifying the bioinformatics pipeline and without significant additional computational overhead. 
Using the PrecisionFDA Truth Challenge V2 benchmark data (for HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14) it was demonstrated that the number of false negative genetic variants decreased by more than 9500, and the number of false positives decreased by more than 7000 when modifying the index with genetic variants from the Human Pangenome Reference Consortium.
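The parameters k and w quoted above are the k-mer length and window size of a minimizer index. As a rough illustration of that general idea only, the sketch below selects (w, k)-minimizers from a sequence. It is a simplification under stated assumptions: it orders k-mers lexicographically and looks at the forward strand only, the function name `minimizers` is hypothetical, and it is not the implementation used by minimap2 or by minimap2_index_modifier.

```python
from typing import List, Tuple


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    For every window of w consecutive k-mers, the lexicographically smallest
    k-mer is kept (leftmost on ties). Lexicographic ordering is an assumption
    made here for readability.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal element, i.e. the leftmost on ties
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


# Toy example (made-up sequence, small k and w for readability).
toy_minimizers = minimizers("ACGTACGT", k=3, w=2)
```

In this simplified picture, a variant-aware index could additionally store minimizers taken from sequences carrying known alternative alleles, so that reads with those alleles still find seed matches; how the tool described in the abstract does this is not specified here.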
Subjects
Genetic Variation , Human Genome , Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Genetic Variation/genetics , High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism/genetics , Sequence Alignment/methods , Software , Algorithms , Genome-Wide Association Study/methods
ABSTRACT
Deep sequencing is a term that has become embedded in the plant genomic literature in recent years and with good reason. A torrent of (largely) high-quality genomic and transcriptomic data has been collected and most of this has been publicly released. Indeed, almost 1000 plant genomes have been reported (www.plabipd.de) and the 2000 Plant Transcriptomes Project has long been completed. The EarthBioGenome project will dwarf even these milestones. That said, massive progress in understanding plant physiology, evolution, and crop domestication has been made by sequencing broadly (across a species) as well as deeply (within a single individual). We will outline the current state of the art in genome and transcriptome sequencing before we briefly review the most visible of these broad approaches, namely genome-wide association and transcriptome-wide association studies, as well as the compilation of pangenomes. This will include both (i) the most commonly used methods reliant on single nucleotide polymorphisms and short InDels and (ii) more recent examples which consider structural variants. We will subsequently present case studies exemplifying how their application has brought insight into either plant physiology or evolution and crop domestication. Finally, we will provide conclusions and an outlook as to the perspective for the extension of such approaches to different species, tissues, and biological processes.
Subjects
Domestication , Genome-Wide Association Study , Plant Genome/genetics , Genomics , Plants
ABSTRACT
The advent of the pangenome era has unraveled previously unknown genetic variation existing within diverse crop plants, including rice. This untapped genetic variation is believed to account for a major portion of phenotypic variation existing in crop plants. However, the use of conventional single reference-guided genotyping often fails to capture a large portion of this genetic variation, leading to a reference bias. This makes it difficult to identify and utilize novel population/cultivar-specific genes for crop improvement. Thus, we developed a Rice Pangenome Genotyping Array (RPGA) harboring probes assaying 80K single-nucleotide polymorphisms (SNPs) and presence-absence variants spanning the entire 3K rice pangenome. This array provides a simple, user-friendly and cost-effective (60-80 USD per sample) solution for rapid pangenome-based genotyping in rice. The genome-wide association study (GWAS) conducted using RPGA-SNP genotyping data of a rice diversity panel detected a total of 42 loci, including previously known as well as novel genomic loci regulating grain size/weight traits in rice. Eight of these identified trait-associated loci (dispensable loci) could not be detected with conventional single reference genome-based GWAS. A WD repeat-containing protein 12 gene underlying one such dispensable locus on chromosome 7 (qLWR7), along with other non-dispensable loci, was subsequently detected using high-resolution quantitative trait loci mapping, confirming the authenticity of RPGA-led GWAS. This demonstrates the potential of RPGA-based genotyping to overcome reference bias. The application of RPGA-based genotyping for population structure analysis, hybridity testing, ultra-high-density genetic map construction and chromosome-level genome assembly, and marker-assisted selection was also demonstrated. 
A web application (http://www.rpgaweb.com) was further developed to provide an easy to use platform for the imputation of RPGA-based genotyping data using 3K rice reference panel and subsequent GWAS.
Subjects
Genome-Wide Association Study , Oryza , Chromosome Mapping , Oryza/genetics , Genotype , Quantitative Trait Loci/genetics , Single Nucleotide Polymorphism/genetics
ABSTRACT
BACKGROUND: The concept of pangenomics and the importance of structural variants are gaining recognition within the plant genomics community. Due to advancements in sequencing and computational technology, it has become feasible to sequence the entire genome of numerous individuals of a single species at a reasonable cost. Pangenomes have been constructed for many major diploid crops, including rice, maize, soybean, sorghum, pearl millet, peas, sunflower, grapes, and mustards. However, pangenomes for polyploid species are relatively scarce and are available in only a few crops including wheat, cotton, rapeseed, and potatoes. MAIN BODY: In this review, we explore the various methods used in crop pangenome development, discussing the challenges and implications of these techniques based on insights from published pangenome studies. We offer a systematic guide and discuss the tools available for constructing a pangenome and conducting downstream analyses. Alfalfa, a highly heterozygous, cross-pollinated and autotetraploid forage crop species, is used as an example to discuss the concerns and challenges offered by polyploid crop species. We conducted a comparative analysis using linear and graph-based methods by constructing an alfalfa graph pangenome using three publicly available genome assemblies. To illustrate the intricacies captured by pangenome graphs for a complex crop genome, we used five different gene sequences and aligned them against the three graph-based pangenomes. The comparison of the three graph pangenome methods reveals notable variations in the genomic variation captured by each pipeline. CONCLUSION: Pangenome resources are proving invaluable by offering insights into core and dispensable genes, novel gene discovery, and genome-wide patterns of variation. Developing user-friendly online portals for linear pangenome visualization has made these resources accessible to the broader scientific and breeding community. 
However, challenges remain with graph-based pangenomes including compatibility with other tools, extraction of sequence for regions of interest, and visualization of genetic variation captured in pangenome graphs. These issues necessitate further refinement of tools and pipelines to effectively address the complexities of polyploid, highly heterozygous, and cross-pollinated species.
Subjects
Agricultural Crops , Plant Genome , Medicago sativa , Medicago sativa/genetics , Agricultural Crops/genetics , Genomics/methods , Polyploidy
ABSTRACT
Graph-based pangenomes are becoming more popular than linear pangenomes because they store more comprehensive information on variation. However, the traditional linear genome browser has its own advantages, especially the tremendous resources accumulated historically. With the fast-growing number of individual genomes and their annotations available, demand is rising for a genome browser that visualizes genome annotations for many individuals together with a graph-based pangenome. Here we report PPanG, a precise pangenome browser enabling nucleotide-level comparison of individual genome annotations together with a graph-based pangenome. Nine rice genomes with annotations are provided by default as potential references, and any individual genome can be selected as the reference. Our pangenome browser provides unprecedented insights into genome variations at different levels from base to gene, and reveals how the structure of a gene can differ between individuals. PPanG can be applied to any species with multiple individual genomes available and it is available at https://cgm.sjtu.edu.cn/PPanG .
Subjects
Genomics , Genomics/methods , Oryza/genetics , Molecular Sequence Annotation , Plant Genome , Genetic Variation , Software , Web Browser , Genetic Databases , Nucleotides/genetics , Genome
ABSTRACT
BACKGROUND: The genus Geobacillus and its associated taxa have been the focal point of numerous thermophilic biotechnological investigations, both at the whole cell and enzyme level. By contrast, comparatively little research has been done on its recently delineated sister genus, Parageobacillus. Here we performed pan-genomic analyses on a subset of publicly available Parageobacillus and Saccharococcus genomes to elucidate their biotechnological potential. RESULTS: Phylogenomic analysis delineated the compared taxa into two distinct genera, Parageobacillus and Saccharococcus, with P. caldoxylosilyticus isolates clustering with S. thermophilus in the latter genus. Both genera present open pan-genomes, with the species P. toebii being characterized with the highest novel gene accrual. Diversification of the two genera is driven through the variable presence of plasmids, bacteriophages and transposable elements. Both genera present a range of potentially biotechnologically relevant features, including a source of novel antimicrobials, thermostable enzymes including DNA-active enzymes, carbohydrate active enzymes, proteases, lipases and carboxylesterases. Furthermore, they present a number of metabolic pathways pertinent to degradation of complex hydrocarbons and xenobiotics and for green energy production. CONCLUSIONS: Comparative genomic analyses of Parageobacillus and Saccharococcus suggest that taxa in both of these genera can serve as a rich source of biotechnologically and industrially relevant secondary metabolites, thermostable enzymes and metabolic pathways that warrant further investigation.