ABSTRACT
Much of the profound interspecific variation in genome content has been attributed to transposable elements (TEs). To explore the extent of TE variation within species, we developed an optimized open-source algorithm, panEDTA, to de novo annotate TEs in a pangenome context. We then generated a unified TE annotation for a maize pangenome derived from 26 reference-quality genomes, which reveals an excess of 35.1 Mb of TE sequences per genome in tropical maize relative to temperate maize. A small number (n = 216) of TE families, mainly LTR retrotransposons, drive these differences. Evidence from the methylome, transcriptome, LTR age distribution, and LTR insertional polymorphisms reveals that 64.7% of the variability is contributed by LTR families that are young, less methylated, and more expressed in tropical maize, whereas 18.5% is driven by LTR families with removal or loss in temperate maize. Additionally, we find enrichment for Young LTR families adjacent to nucleotide-binding and leucine-rich repeat (NLR) clusters of varying copy number across lines, suggesting TE activity may be associated with disease resistance in maize.
Subject(s)
DNA Transposable Elements , Plant Genome , Retroelements , Terminal Repeat Sequences , Zea mays , Zea mays/genetics , Retroelements/genetics , Genetic Variation , Molecular Sequence Annotation , Tropical Climate , DNA Methylation
ABSTRACT
Selfish genetic elements contribute to hybrid incompatibility and bias or 'drive' their own transmission [1,2]. Chromosomal drive typically functions in asymmetric female meiosis, whereas gene drive is normally post-meiotic and typically found in males. Here, using single-molecule and single-pollen genome sequencing, we describe Teosinte Pollen Drive, an instance of gene drive in hybrids between maize (Zea mays ssp. mays) and teosinte mexicana (Z. mays ssp. mexicana) that depends on RNA interference (RNAi). 22-nucleotide small RNAs from a non-coding RNA hairpin in mexicana depend on Dicer-like 2 (Dcl2) and target Teosinte Drive Responder 1 (Tdr1), which encodes a lipase required for pollen viability. Dcl2, Tdr1 and the hairpin are in tight pseudolinkage on chromosome 5, but only when transmitted through the male. Introgression of mexicana into early cultivated maize is thought to have been critical to its geographical dispersal throughout the Americas [3], and a tightly linked inversion in mexicana spans a major domestication sweep in modern maize [4]. A survey of maize traditional varieties and sympatric populations of teosinte mexicana reveals correlated patterns of admixture among unlinked genes required for RNAi on at least four chromosomes that are also subject to gene drive in pollen from synthetic hybrids. Teosinte Pollen Drive probably had a major role in maize domestication and diversification, and offers an explanation for the widespread abundance of 'self' small RNAs in the germ lines of plants and animals.
Subject(s)
Domestication , Gene Drive Technology , RNA Interference , Zea mays , Genetic Introgression/genetics , Plant Genome/genetics , Genetic Hybridization/genetics , Pollen/enzymology , Pollen/genetics , Zea mays/classification , Zea mays/genetics , Lipase/genetics , Lipase/metabolism , Single Molecule Imaging
ABSTRACT
The patterns by which primary tumors spread to metastatic sites remain poorly understood. Here, we define patterns of metastatic seeding in prostate cancer using a novel injection-based mouse model, EvoCaP (Evolution in Cancer of the Prostate), featuring aggressive metastatic cancer to bone, liver, lungs, and lymph nodes. To define migration histories between primary and metastatic sites, we used our EvoTraceR pipeline to track distinct tumor clones containing recordable barcodes. We detected widespread intratumoral heterogeneity from the primary tumor in metastatic seeding, with a few clonal populations instigating most migration. Metastasis-to-metastasis seeding was uncommon, as most cells remained confined within the tissue. Migration patterns in our model were congruent with human prostate cancer seeding topologies. Our findings support the view of metastatic prostate cancer as a systemic disease driven by waves of aggressive clones expanding their niche, infrequently overcoming constraints that otherwise keep them confined in the primary or metastatic site. Significance: Defining the kinetics of prostate cancer metastasis is critical for developing novel therapeutic strategies. This study uses CRISPR/Cas9-based barcoding technology and an in-house developed analytical pipeline (EvoTraceR) to accurately define tumor clonal patterns and routes of migration in a novel somatically engineered mouse model (EvoCaP) that recapitulates human prostate cancer.
Subject(s)
Neoplasm Metastasis , Prostatic Neoplasms , Male , Prostatic Neoplasms/pathology , Prostatic Neoplasms/genetics , Animals , Mice , Humans , Cell Movement , Animal Disease Models
ABSTRACT
Interpreting function and fitness effects in diverse plant genomes requires transferable models. Language models (LMs) pre-trained on large-scale biological sequences can learn evolutionary conservation and offer better cross-species prediction than supervised models when fine-tuned on limited labeled data. We introduce PlantCaduceus, a plant DNA LM based on the Caduceus and Mamba architectures, pre-trained on a curated dataset of 16 angiosperm genomes. Fine-tuning PlantCaduceus on limited labeled Arabidopsis data for four tasks, including predicting translation initiation/termination sites and splice donor and acceptor sites, demonstrated high transferability to maize, which diverged from Arabidopsis 160 million years ago, outperforming the best existing DNA LM by 1.45- to 7.23-fold. PlantCaduceus is competitive with state-of-the-art protein LMs in terms of deleterious mutation identification, and is threefold better than PhyloP. Additionally, PlantCaduceus successfully identifies well-known causal variants in both Arabidopsis and maize. Overall, PlantCaduceus is a versatile DNA LM that can accelerate plant genomics and crop breeding applications.
ABSTRACT
Structural variations (SVs) are a major source of domestication and improvement traits. We present the first duck pan-genome constructed using five genome assemblies capturing ~40.98 Mb of new sequences. This pan-genome together with high-depth sequencing data (~46.5×) identified 101,041 SVs, of which substantial proportions were derived from transposable element (TE) activity. Many TE-derived SVs anchoring in a gene body or regulatory region are linked to duck domestication and improvement. By combining quantitative genetics with molecular experiments, we, for the first time, unraveled a 6945 bp Gypsy insertion as a functional mutation of the major gene IGF2BP1 associated with duck bodyweight. This Gypsy insertion, to our knowledge, explains the largest effect on bodyweight among avian species (27.61% of phenotypic variation). In addition, we examined another 6634 bp Gypsy insertion in an MITF intron, which triggers a novel transcript of MITF, thereby contributing to the development of white plumage. Our findings highlight the importance of using a pan-genome as a reference in genomics studies and illuminate the impact of transposons in trait formation and livestock breeding.
ABSTRACT
Inversions, a type of chromosomal structural variation, significantly influence plant adaptation and gene functions by impacting gene expression and recombination rates. However, compared with other structural variations, their roles in functional biology and crop improvement remain largely unexplored. In this review, we highlight technological and methodological advancements that have allowed a comprehensive understanding of inversion variants through the pangenome framework and machine learning algorithms. Genome editing is an efficient method for inducing or reversing inversion mutations in plants, providing an effective mechanism to modify local recombination rates. Given the potential of inversions in crop breeding, we anticipate increasing attention on inversions from the scientific community in future research and breeding applications.
Subject(s)
Gene Editing , Plant Breeding , Plant Breeding/methods , Gene Editing/methods , Plants/genetics , Chromosome Inversion/genetics , Plant Genome/genetics
ABSTRACT
AlphaMissense is a recently developed method that is designed to classify missense variants into pathogenic, benign, or ambiguous categories across the entire human proteome. Asparagine Synthetase Deficiency (ASNSD) is a developmental disorder associated with severe symptoms, including congenital microcephaly, seizures, and premature death. Diagnosing ASNSD relies on identifying mutations in the asparagine synthetase (ASNS) gene through DNA sequencing and determining whether these variants are pathogenic or benign. Pathogenic ASNS variants are predicted to disrupt the protein's structure and/or function, leading to asparagine depletion within cells and inhibition of cell growth. AlphaMissense offers a promising solution for the rapid classification of ASNS variants established by DNA sequencing and provides a community resource of pathogenicity scores and classifications for newly diagnosed ASNSD patients. Here, we assessed AlphaMissense's utility in ASNSD by benchmarking it against known critical residues in ASNS and evaluating its performance against a list of previously reported ASNSD-associated variants. We also present a pipeline to calculate AlphaMissense scores for any protein in the UniProt database. AlphaMissense accurately attributed a high average pathogenicity score to known critical residues within the two ASNS active sites and the connecting intramolecular tunnel. The program successfully categorized 78.9% of known ASNSD-associated missense variants as pathogenic. The remaining variants were primarily labeled as ambiguous, with a smaller proportion classified as benign. This study underscores the potential role of AlphaMissense in classifying ASNS variants in suspected cases of ASNSD, potentially providing clarity to patients and their families grappling with ongoing diagnostic uncertainty.
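The three-way classification described in this abstract can be sketched as a simple thresholding step. This is a minimal illustration only: the cutoffs (0.34 and 0.564) are the commonly cited AlphaMissense class boundaries and are treated here as assumptions, and the function names and example scores are hypothetical, not taken from the study's pipeline.

```python
from collections import Counter

# Assumed class boundaries (commonly cited for AlphaMissense; not stated in the abstract).
BENIGN_MAX = 0.34
PATHOGENIC_MIN = 0.564


def classify_missense(score, benign_max=BENIGN_MAX, pathogenic_min=PATHOGENIC_MIN):
    """Map a pathogenicity score in [0, 1] to 'benign', 'ambiguous', or 'pathogenic'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < benign_max:
        return "benign"
    if score > pathogenic_min:
        return "pathogenic"
    return "ambiguous"


def class_fractions(scores):
    """Return the fraction of variants falling in each class."""
    counts = Counter(classify_missense(s) for s in scores)
    total = len(scores)
    return {label: counts.get(label, 0) / total
            for label in ("benign", "ambiguous", "pathogenic")}


if __name__ == "__main__":
    # Hypothetical scores for illustration; these are not real ASNS variant scores.
    example_scores = [0.95, 0.80, 0.45, 0.10]
    print(class_fractions(example_scores))
```

Keeping the cutoffs as parameters makes it straightforward to check how sensitive a reported pathogenic fraction is to the chosen boundaries.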
ABSTRACT
Bats are exceptional among mammals for their powered flight, extended lifespans, and robust immune systems and therefore have been of particular interest in comparative genomics. Using the Oxford Nanopore Technologies long-read platform, we sequenced the genomes of two bat species with key phylogenetic positions, the Jamaican fruit bat (Artibeus jamaicensis) and the Mesoamerican mustached bat (Pteronotus mesoamericanus), and carried out a comprehensive comparative genomic analysis with a diverse collection of bats and other mammals. The high-quality, long-read genome assemblies revealed a contraction of interferon (IFN)-α at the immunity-related type I IFN locus in bats, resulting in a shift in relative IFN-ω and IFN-α copy numbers. Contradicting previous hypotheses of constitutive expression of IFN-α being a feature of the bat immune system, three bat species lost all IFN-α genes. This shift to IFN-ω could contribute to the increased viral tolerance that has made bats a common reservoir for viruses that can be transmitted to humans. Antiviral genes stimulated by type I IFNs also showed evidence of rapid evolution, including a lineage-specific duplication of IFN-induced transmembrane genes and positive selection in IFIT2. In addition, 33 tumor suppressors and 6 DNA-repair genes showed signs of positive selection, perhaps contributing to increased longevity and reduced cancer rates in bats. The robust immune systems of bats rely on both bat-wide and lineage-specific evolution in the immune gene repertoire, suggesting diverse immune strategies. Our study provides new genomic resources for bats and sheds new light on the extraordinary molecular evolution in this critically important group of mammals.
Subject(s)
Chiroptera , Neoplasms , Humans , Animals , Chiroptera/genetics , Phylogeny , Molecular Evolution , Genomics , Longevity , Neoplasms/genetics , Neoplasms/veterinary
ABSTRACT
Meiotic drivers subvert Mendelian expectations by manipulating reproductive development to bias their own transmission. Chromosomal drive typically functions in asymmetric female meiosis, while gene drive is normally postmeiotic and typically found in males. Using single-molecule and single-pollen genome sequencing, we describe Teosinte Pollen Drive, an instance of gene drive in hybrids between maize (Zea mays ssp. mays) and teosinte mexicana (Zea mays ssp. mexicana) that depends on RNA interference (RNAi). 22-nt small RNAs from a non-coding RNA hairpin in mexicana depend on Dicer-Like 2 (Dcl2) and target Teosinte Drive Responder 1 (Tdr1), which encodes a lipase required for pollen viability. Dcl2, Tdr1, and the hairpin are in tight pseudolinkage on chromosome 5, but only when transmitted through the male. Introgression of mexicana into early cultivated maize is thought to have been critical to its geographical dispersal throughout the Americas, and a tightly linked inversion in mexicana spans a major domestication sweep in modern maize. A survey of maize landraces and sympatric populations of teosinte mexicana reveals correlated patterns of admixture among unlinked genes required for RNAi on at least 4 chromosomes that are also subject to gene drive in pollen from synthetic hybrids. Teosinte Pollen Drive likely played a major role in maize domestication and diversification, and offers an explanation for the widespread abundance of "self" small RNAs in the germlines of plants and animals.
ABSTRACT
Many plants exchanged in the global redistribution of species in the last 200 years, particularly between South Africa and Australia, have become threatening invasive species in their introduced range. Refining our understanding of the genetic diversity and population structure of native and alien populations, introduction pathways, propagule pressure, naturalization, and initial spread, can transform the effectiveness of management and prevention of further introductions. We used 20,221 single nucleotide polymorphisms to reconstruct the invasion of a coastal shrub, Chrysanthemoides monilifera ssp. rotundata (bitou bush) from South Africa, into eastern Australia (EAU), and Western Australia (WAU). We determined genetic diversity and population structure across the native and introduced ranges and compared hypothesized invasion scenarios using Bayesian modeling. We detected considerable genetic structure in the native range, as well as differentiation between populations in the native and introduced range. Phylogenetic analysis showed the introduced samples to be most closely related to the southern-most native populations, although Bayesian analysis inferred introduction from a ghost population. We detected strong genetic bottlenecks during the founding of both the EAU and WAU populations. It is likely that the WAU population was introduced from EAU, possibly involving an unsampled ghost population. The number of private alleles and polymorphic SNPs successively decreased from South Africa to EAU to WAU, although heterozygosity remained high. That bitou bush remains an invasion threat in EAU, despite reduced genetic diversity, provides a cautionary biosecurity message regarding the risk of introduction of potentially invasive species via shipping routes.
ABSTRACT
Alignments of multiple genomes are a cornerstone of comparative genomics, but generating these alignments remains technically challenging and often impractical. We developed the msa_pipeline workflow (https://bitbucket.org/bucklerlab/msa_pipeline) to allow practical and sensitive multiple alignment of diverged plant genomes and calculation of conservation scores with minimal user inputs. As high repeat content and genomic divergence are substantial challenges in plant genome alignment, we also explored the effect of different masking approaches and parameters of the LAST aligner using genome assemblies of 33 grass species. Compared with conventional masking with RepeatMasker, a masking approach based on k-mers (nucleotide sequences of length k) increased the alignment rate of coding sequence and noncoding functional regions by 25% and 14%, respectively. We further found that default alignment parameters generally perform well, but parameter tuning can increase the alignment rate for noncoding functional regions by over 52% compared with default LAST settings. Finally, by increasing alignment sensitivity from the default baseline, parameter tuning can increase the number of noncoding sites that can be scored for conservation by over 76%. Overall, tuning of masking and alignment parameters can generate optimized multiple alignments to drive biological discovery in plants.
Subject(s)
Plant Genome , Genomics , Base Sequence , Workflow
ABSTRACT
Brassica napus (oilseed rape, canola) seedling resistance to Leptosphaeria maculans, the causal agent of blackleg (stem canker) disease, follows a gene-for-gene relationship. The avirulence genes AvrLmS and AvrLep2 were described to be perceived by the resistance genes RlmS and LepR2, respectively, present in B. napus 'Surpass 400'. Here we report cloning of AvrLmS and AvrLep2 using two independent methods. AvrLmS was cloned using combined in vitro crossing between avirulent and virulent isolates with sequencing of DNA bulks from avirulent or virulent progeny (bulked segregant sequencing). AvrLep2 was cloned using a biparental cross of avirulent and virulent L. maculans isolates and a classical map-based cloning approach. Taking these two approaches independently, we found that AvrLmS and AvrLep2 are the same gene. Complementation of virulent isolates with this gene confirmed its role in inducing resistance on Surpass 400, Topas-LepR2, and an RlmS-line. The gene, renamed AvrLmS-Lep2, encodes a small cysteine-rich protein of unknown function with an N-terminal secretory signal peptide, which is a common feature of the majority of effectors from extracellular fungal plant pathogens. The AvrLmS-Lep2/LepR2 interaction phenotype was found to vary from a typical hypersensitive response through intermediate resistance sometimes towards susceptibility, depending on the inoculation conditions. AvrLmS-Lep2 was nevertheless sufficient to significantly slow the systemic growth of the pathogen and reduce the stem lesion size on plant genotypes with LepR2, indicating the potential efficiency of this resistance to control the disease in the field.
Subject(s)
Ascomycota , Brassica napus , Ascomycota/genetics , Brassica napus/genetics , Brassica napus/microbiology , Molecular Cloning , Leptosphaeria , Plant Diseases/microbiology
ABSTRACT
In nature, single-strand breaks (SSBs) in DNA occur more frequently (by orders of magnitude) than double-strand breaks (DSBs). SSBs induced by the CRISPR/Cas9 nickase at a distance of 50-100 bp on opposite strands are highly mutagenic, leading to insertions/deletions (InDels), with insertions mainly occurring as direct tandem duplications. As short tandem repeats are overrepresented in plant genomes, this mechanism seems to be important for genome evolution. We investigated the distance at which paired 5'-overhanging SSBs are mutagenic and which DNA repair pathways are essential for insertion formation in Arabidopsis thaliana. We were able to detect InDel formation up to a distance of 250 bp, although with much reduced efficiency. Surprisingly, the loss of the classical nonhomologous end joining (NHEJ) pathway factors KU70 or DNA ligase 4 completely abolished tandem repeat formation. The microhomology-mediated NHEJ factor POLQ was required only for patch-like insertions, which are well-known from DSB repair as templated insertions from ectopic sites. As SSBs can also be repaired using homology, we furthermore asked whether the classical homologous recombination (HR) pathway is involved in this process in plants. The fact that RAD54 is not required for homology-mediated SSB repair demonstrates that the mechanisms for DSB- and SSB-induced HR differ in plants.
Subject(s)
Arabidopsis/genetics , Single-Stranded DNA Breaks , DNA Repair , Plant DNA/genetics , Plant Genome , Plant DNA/chemistry
ABSTRACT
Plant genomes demonstrate significant presence/absence variation (PAV) within a species; however, the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidization, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.
Subject(s)
Brassica napus , Brassica , Brassica/genetics , Brassica napus/genetics , Diploidy , Plant Genome/genetics , Polyploidy
ABSTRACT
Domestication and breeding have reshaped the genomic architecture of chicken, but the retention and loss of genomic elements during these evolutionary processes remain unclear. We present the first chicken pan-genome constructed using 664 individuals, which identified approximately 66.5 Mb of additional sequence absent from the reference genome (GRCg6a). The constructed pan-genome encoded 20,491 predicted protein-coding genes, among which conserved genes showed higher expression levels than dispensable genes. Presence/absence variation (PAV) analyses demonstrated that gene PAV in chicken was shaped by selection, genetic drift, and hybridization. PAV-based genome-wide association studies identified numerous candidate mutations related to growth, carcass composition, meat quality, or physiological traits. Among them, a deletion in the promoter region of IGF2BP1 affecting chicken body size is reported, which is supported by functional studies and additional samples. This is the first report of the causal variant for the repeatedly reported chicken body size quantitative trait locus on chromosome 27. Therefore, the chicken pan-genome is a useful resource for biological discovery and breeding. It improves our understanding of chicken genome diversity and provides materials to unveil the evolutionary history of chicken domestication.
Subject(s)
Chickens , Genome-Wide Association Study , Animals , Body Size/genetics , Chickens/genetics , Single Nucleotide Polymorphism , Genetic Promoter Regions , Quantitative Trait Loci
ABSTRACT
Climate change is increasingly impacting ecosystems globally. Understanding adaptive genetic diversity and whether it will keep pace with projected climatic change is necessary to assess species' vulnerability and design efficient mitigation strategies such as assisted adaptation. Kelp forests are the foundations of temperate reefs globally but are declining in many regions due to climate stress. A lack of knowledge of kelp's adaptive genetic diversity hinders assessment of vulnerability under extant and future climates. Using 4245 single nucleotide polymorphisms (SNPs), we characterized patterns of neutral and putative adaptive genetic diversity for the dominant kelp in the southern hemisphere (Ecklonia radiata) from ~1000 km of coastline off Western Australia. Strong population structure and isolation-by-distance were underpinned by significant signatures of selection related to temperature and light. Gradient forest analysis of temperature-linked SNPs under selection revealed a strong association with mean annual temperature range, suggesting adaptation to local thermal environments. Critically, modelling revealed that predicted climate-mediated temperature changes will probably result in high genomic vulnerability via a mismatch between current and future predicted genotype-environment relationships, such that kelp forests off Western Australia will need to adapt significantly to keep pace with projected climate change. Proactive management techniques such as assisted adaptation to boost resilience may be required to secure the future of these kelp forests and the immense ecological and economic values they support.
Subject(s)
Kelp , Climate Change , Ecosystem , Forests , Genotype , Kelp/genetics
ABSTRACT
The innate and adaptive immune responses are regulated by biological clocks, and circulating lymphocytes are lowest at sunrise. Accordingly, severity of disease in mouse models is highly dependent on the time of day of viral infection. Here, we explore whether circadian immunity contributes significantly to seasonality of respiratory viruses, including influenza and SARS-CoV-2. Susceptibility-Infection-Recovery-Susceptibility (SIRS) models of influenza and SIRS-derived models of COVID-19 suggest that local sunrise time is a better predictor of the basic reproductive number (R0) than climate, even when day length is taken into account. Moreover, these models predict a window of susceptibility when local sunrise time corresponds to the morning commute and contact rate is expected to be high. Counterfactual modeling suggests that retaining daylight saving time in the fall would reduce the length of this window, and substantially reduce seasonal waves of respiratory infections.
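For readers unfamiliar with the SIRS framework mentioned in this abstract, the following is a minimal sketch of a generic SIRS compartmental model integrated with forward Euler steps. It assumes constant rates (transmission beta, recovery gamma, immunity loss xi) and the textbook relation R0 = beta / gamma; it does not reproduce the study's sunrise-time or climate forcing, and all parameter values and function names are hypothetical.

```python
def basic_reproductive_number(beta, gamma):
    """Textbook R0 for a simple SIR/SIRS model: transmission rate over recovery rate."""
    return beta / gamma


def simulate_sirs(beta, gamma, xi, s0, i0, r0, days, dt=0.1):
    """Integrate SIRS equations on population fractions (S + I + R = 1).

    dS/dt = -beta*S*I + xi*R
    dI/dt =  beta*S*I - gamma*I
    dR/dt =  gamma*I  - xi*R
    Returns a list of (S, I, R) tuples, one per Euler step, including the start.
    """
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    steps = int(round(days / dt))
    for _ in range(steps):
        new_infections = beta * s * i   # S -> I
        recoveries = gamma * i          # I -> R
        waning = xi * r                 # R -> S (loss of immunity)
        s += dt * (waning - new_infections)
        i += dt * (new_infections - recoveries)
        r += dt * (recoveries - waning)
        trajectory.append((s, i, r))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    traj = simulate_sirs(beta=0.5, gamma=0.25, xi=0.01,
                         s0=0.99, i0=0.01, r0=0.0, days=200)
    peak_infected = max(i for _, i, _ in traj)
    print("R0 =", basic_reproductive_number(0.5, 0.25))
    print("peak infected fraction =", round(peak_infected, 3))
```

In a seasonally forced variant, beta would vary with time (for example with local sunrise time), which is the kind of dependence the abstract evaluates.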
ABSTRACT
BACKGROUND: Brassica napus is an important oilseed crop cultivated worldwide. During domestication and breeding of B. napus, flowering time has been a target of selection because of its substantial impact on yield. Here we use double digest restriction-site associated DNA sequencing (ddRAD) to investigate the genetic basis of flowering in B. napus. An F2 mapping population was derived from a cross between an early-flowering spring type and a late-flowering winter type. RESULTS: Flowering time in the mapping population differed by up to 25 days between individuals. High genotype error rates persisted after initial quality controls, as suggested by a genotype discordance of ~12% between biological sequencing replicates. After genotype error correction, a linkage map spanning 3981.31 cM and comprising 14,630 single nucleotide polymorphisms (SNPs) was constructed. A quantitative trait locus (QTL) on chromosome C2 was detected, covering eight flowering time genes including FLC. CONCLUSIONS: These findings demonstrate the effectiveness of the ddRAD approach to sample the B. napus genome. Our results also suggest that ddRAD genotype error rates can be higher than expected in F2 populations. Quality filtering and genotype correction and imputation can substantially reduce these error rates and allow effective linkage mapping and QTL analysis.
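The replicate-based error estimate mentioned in the RESULTS can be illustrated with a small calculation. The sketch below assumes one plausible definition of genotype discordance (the fraction of sites called in both replicates whose calls differ); the abstract does not specify the exact definition used, and the function name and example genotypes are hypothetical.

```python
def genotype_discordance(replicate_a, replicate_b):
    """Fraction of jointly called sites where two replicates disagree.

    Each replicate is a sequence of genotype calls (e.g. "AA", "AB", "BB"),
    with None marking a missing call. Sites missing in either replicate
    are excluded from the comparison.
    """
    if len(replicate_a) != len(replicate_b):
        raise ValueError("replicates must cover the same sites")
    compared = 0
    discordant = 0
    for call_a, call_b in zip(replicate_a, replicate_b):
        if call_a is None or call_b is None:
            continue  # skip sites without a call in both replicates
        compared += 1
        if call_a != call_b:
            discordant += 1
    if compared == 0:
        raise ValueError("no jointly called sites to compare")
    return discordant / compared


if __name__ == "__main__":
    # Hypothetical calls at five SNP sites for two replicates of one individual.
    rep1 = ["AA", "AB", "BB", None, "AB"]
    rep2 = ["AA", "BB", "BB", "AB", "AB"]
    print(genotype_discordance(rep1, rep2))
```

A discordance well above the expected sequencing error rate, as reported here, is a signal that genotype correction or stricter filtering is needed before linkage mapping.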