ABSTRACT
The accurate and complete assembly of both haplotype sequences of a diploid organism is essential to understanding the role of variation in genome functions, phenotypes and diseases [1]. Here, using a trio-binning approach, we present a high-quality, diploid reference genome, with both haplotypes assembled independently at the chromosome level, for the common marmoset (Callithrix jacchus), a primate model system that is widely used in biomedical research [2,3]. The full spectrum of heterozygosity between the two haplotypes involves 1.36% of the genome, much higher than the 0.13% indicated by the standard estimation based on single-nucleotide heterozygosity alone. The de novo mutation rate is 0.43 × 10⁻⁸ per site per generation, and the paternally inherited genome acquired twice as many mutations as the maternal one. Our diploid assembly enabled us to discover a recent expansion of the sex-differentiation region and unique evolutionary changes in the marmoset Y chromosome. In addition, we identified many genes with signatures of positive selection that might have contributed to the evolution of Callithrix biological features. Brain-related genes were highly conserved between marmosets and humans, although several genes experienced lineage-specific copy number variations or diversifying selection, with implications for the use of marmosets as a model system.
Subject(s)
Callithrix/genetics, Diploidy, Molecular Evolution, Genome/genetics, Genomics/standards, Animals, Biomedical Research, DNA Copy Number Variations, Female, Germ-Line Mutation/genetics, Haplotypes/genetics, Heterozygote, Humans, INDEL Mutation/genetics, Male, Reference Standards, Genetic Selection, Sex Differentiation/genetics, Y Chromosome/genetics
ABSTRACT
BACKGROUND: The red junglefowl, the wild outgroup of domestic chickens, has historically served as a reference for genomic studies of domestic chickens. These studies have provided insight into the etiology of traits of commercial importance. However, the use of a single reference genome does not capture diversity present among modern breeds, many of which have accumulated molecular changes due to drift and selection. While reference-based resequencing is well-suited to cataloging simple variants such as single-nucleotide changes and short insertions and deletions, it is mostly inadequate to discover more complex structural variation in the genome. METHODS: We present a pangenome for the domestic chicken consisting of thirty assemblies of chickens from different breeds and research lines. RESULTS: We demonstrate how this pangenome can be used to catalog structural variants present in modern breeds and untangle complex nested variation. We show that alignment of short reads from 100 diverse wild and domestic chickens to this pangenome reduces reference bias by 38%, which affects downstream genotyping results. This approach also allows for the accurate genotyping of a large and complex pair of structural variants at the K feathering locus using short reads, which would not be possible using a linear reference. CONCLUSIONS: We expect that this new paradigm of genomic reference will allow better pinpointing of exact mutations responsible for specific phenotypes, which will in turn be necessary for breeding chickens that meet new sustainability criteria and are resilient to quickly evolving pathogen threats.
Subject(s)
Chickens, Genome, Animals, Chickens/genetics, Genotype, DNA Sequence Analysis, Genomics
ABSTRACT
Once abandoned, urban and post-industrial lands can undergo a re-greening, the natural regeneration and succession that leads to surprisingly healthy plant communities, but this process is dependent upon microbial activity and the health of the parent soil. This study aimed to evaluate the effects of arbuscular mycorrhizal fungi (AMF) in facilitating plant production in post-industrial soils. In so doing, we helped to resolve the mechanism through which AMF ameliorate environmental stress in terrestrial plants. An experiment was established in which rye grass (Lolium perenne) was grown in two heavy metal-contaminated soils from an urban brownfield in New Jersey, USA, and one non-contaminated control soil. One set of the treatments received an AMF inoculum (four species in a commercial mix: Glomus intraradices, G. mosseae, G. etunicatum and G. aggregatum) and the other did not. Upon harvest, dried plant biomass, root/shoot ratio, AMF colonization, and extracellular soil phosphatase activity, a proxy for soil microbial functioning, were all measured. Plant biomass increased across all treatments inoculated with AMF, with a significantly higher average shoot and root mass compared to non-inoculated treatments. AMF colonization of the roots in contaminated soil was significantly higher than colonization in control soil, and the root/shoot ratio of plants in contaminated soils was also higher when colonized by AMF. Mycorrhizal infection may help plants to overcome the production limits of post-industrial soils as is seen here with increased infection and growth. The application of this mechanistic understanding to remediation and restoration strategies will improve soil health and plant production in urban environments.
Subject(s)
Heavy Metals, Mycorrhizae, Soil Pollutants, Mycorrhizae/chemistry, Soil, Heavy Metals/analysis, Plants/microbiology, Biomass, Plant Roots/microbiology, Soil Pollutants/analysis
ABSTRACT
The Aeolian wall lizard, Podarcis raffonei, is an endangered species endemic to the Aeolian archipelago, Italy, where it is present only on 3 tiny islets and a narrow promontory of a larger island. Because of the extremely limited area of occupancy, severe population fragmentation and observed decline, it has been classified as Critically Endangered by the International Union for Conservation of Nature (IUCN). Using Pacific Biosciences (PacBio) High Fidelity (HiFi) long-read sequencing, Bionano optical mapping and Arima chromatin conformation capture sequencing (Hi-C), we produced a high-quality, chromosome-scale reference genome for the Aeolian wall lizard, including the Z and W sex chromosomes. The final assembly spans 1.51 Gb across 28 scaffolds with a contig N50 of 61.4 Mb, a scaffold N50 of 93.6 Mb, and a BUSCO completeness score of 97.3%. This genome constitutes a valuable resource for the species to guide potential conservation efforts and, more generally, for the squamate reptiles, which are underrepresented in terms of available high-quality genomic resources.
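Several of these abstracts report contig or scaffold N50 as a contiguity measure. As a minimal sketch of how that statistic is commonly defined (the length such that sequences at least that long cover half of the total assembly), the function below computes it from a list of lengths. The function name `n50` and the example lengths are illustrative assumptions and are not taken from any of the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half is the N50.
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (illustrative only, not real data).
example_lengths = [100, 80, 60, 40, 20]
example_n50 = n50(example_lengths)
```

Because the statistic depends only on the sorted lengths, the same function applies to contigs and scaffolds alike; only the input list changes.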
Subject(s)
Genome, Lizards, Animals, Chromosomes/genetics, Genomics, Molecular Sequence Annotation, Lizards/genetics, Sex Chromosomes
ABSTRACT
The European mink Mustela lutreola (Mustelidae) ranks among the most endangered mammalian species globally, experiencing a rapid and severe decline in population size, density, and distribution. Given the critical need for effective conservation strategies, understanding its genomic characteristics becomes paramount. To address this challenge, the platinum-quality, chromosome-level reference genome assembly for the European mink was successfully generated under the project of the European Mink Centre consortium. Leveraging PacBio HiFi long reads, we obtained a 2586.3 Mbp genome comprising 25 scaffolds, with an N50 length of 154.1 Mbp. Through Hi-C data, we clustered and ordered the majority of the assembly (>99.9%) into 20 chromosomal pseudomolecules, including heterosomes, ranging from 6.8 to 290.1 Mbp. The newly sequenced genome displays a GC base content of 41.9%. Additionally, we successfully assembled the complete mitochondrial genome, spanning 16.6 kbp in length. The assembly achieved a BUSCO (Benchmarking Universal Single-Copy Orthologs) completeness score of 98.2%. This high-quality reference genome serves as a valuable genomic resource for future population genomics studies concerning the European mink and related taxa. Furthermore, the newly assembled genome holds significant potential in addressing key conservation challenges faced by M. lutreola. Its applications encompass potential revision of management units, assessment of captive breeding impacts, resolution of phylogeographic questions, and facilitation of monitoring and evaluating the efficiency and effectiveness of dedicated conservation strategies for the European mink. This species serves as an example that highlights the paramount importance of prioritizing endangered species in genome sequencing projects due to the race against time, which necessitates the comprehensive exploration and characterization of their genomic resources before their populations face extinction.
Subject(s)
Endangered Species, Mink, Animals, Mink/genetics, Platinum, Conservation of Natural Resources, Genomics
ABSTRACT
RNA isoforms influence cell identity and function. However, a comprehensive brain isoform map was lacking. We analyze single-cell RNA isoforms across brain regions, cell subtypes, developmental time points and species. For 72% of genes, full-length isoform expression varies along one or more axes. Splicing, transcription start and polyadenylation sites vary strongly between cell types, influence protein architecture and associate with disease-linked variation. Additionally, neurotransmitter transport and synapse turnover genes harbor cell-type variability across anatomical regions. Regulation of cell-type-specific splicing is pronounced in the postnatal day 21-to-postnatal day 28 adolescent transition. Developmental isoform regulation is stronger than regional regulation for the same cell type. Cell-type-specific isoform regulation in mice is mostly maintained in the human hippocampus, allowing extrapolation to the human brain. Conversely, the human brain harbors additional cell-type specificity, suggesting gain-of-function isoforms. Together, this detailed single-cell atlas of full-length isoform regulation across development, anatomical regions and species reveals an unappreciated degree of isoform variability across multiple axes.
Subject(s)
Brain, Single-Cell Analysis, Animals, Humans, Mice, Brain/metabolism, Brain/growth & development, Single-Cell Analysis/methods, RNA Splicing/genetics, RNA Isoforms/genetics, Alternative Splicing/genetics, Male, Inbred C57BL Mice
ABSTRACT
Vocal rhythm plays a fundamental role in sexual selection and species recognition in birds, but little is known of its genetic basis due to the confounding effect of vocal learning in model systems. Uncovering its genetic basis could facilitate identifying genes potentially important in speciation. Here we investigate the genomic underpinnings of rhythm in vocal non-learning Pogoniulus tinkerbirds using 135 individual whole genomes distributed across a southern African hybrid zone. We find rhythm speed is associated with two genes that are also known to affect human speech, Neurexin-1 and Coenzyme Q8A. Models leveraging ancestry reveal these candidate loci also impact rhythmic stability, a trait linked with motor performance which is an indicator of quality. Character displacement in rhythmic stability suggests possible reinforcement against hybridization, supported by evidence of asymmetric assortative mating in the species producing faster, more stable rhythms. Because rhythm is omnipresent in animal communication, candidate genes identified here may shape vocal rhythm across birds and other vertebrates.
Subject(s)
Animal Vocalization, Animals, Animal Vocalization/physiology, Male, Genomics, Genome/genetics, Female, Songbirds/genetics, Songbirds/physiology, Birds/genetics, Birds/physiology
ABSTRACT
Suncus etruscus is one of the world's smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew's small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with the 2.472 Gbp primary pseudohaplotype and 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.
Subject(s)
Chromosomes, Shrews, Animals, Mice, Chromosomes/genetics, Genome, Genomics, Molecular Sequence Annotation, Shrews/genetics
ABSTRACT
Sex-limited polymorphism has evolved in many species including our own. Yet, we lack a detailed understanding of the underlying genetic variation and evolutionary processes at work. The brood parasitic common cuckoo (Cuculus canorus) is a prime example of female-limited color polymorphism, where adult males are monochromatic gray and females exhibit either gray or rufous plumage. This polymorphism has been hypothesized to be governed by negative frequency-dependent selection whereby the rarer female morph is protected against harassment by males or from mobbing by parasitized host species. Here, we show that female plumage dichromatism maps to the female-restricted genome. We further demonstrate that, consistent with balancing selection, ancestry of the rufous phenotype is shared with the likewise female dichromatic sister species, the oriental cuckoo (Cuculus optatus). This study shows that sex-specific polymorphism in trait variation can be resolved by genetic variation residing on a sex-limited chromosome and be maintained across species boundaries.
Subject(s)
Genetic Polymorphism, Animals, Female, Male, Birds/genetics, Phenotype, Biological Evolution, Pigmentation/genetics, Sex Characteristics, Molecular Evolution
ABSTRACT
RNA isoforms influence cell identity and function. Until recently, technological limitations prevented a genome-wide appraisal of isoform influence on cell identity in various parts of the brain. Using enhanced long-read single-cell isoform sequencing, we comprehensively analyze RNA isoforms in multiple mouse brain regions, cell subtypes, and developmental timepoints from postnatal day 14 (P14) to adult (P56). For 75% of genes, full-length isoform expression varies along one or more axes of phenotypic origin, underscoring the pervasiveness of isoform regulation across multiple scales. As expected, splicing varies strongly between cell types. However, certain gene classes, including neurotransmitter release and reuptake as well as synapse turnover, harbor significant variability in the same cell type across anatomical regions, suggesting that differences in network activity may influence cell-type identity. Glial brain-region specificity in isoform expression includes strong poly(A)-site regulation, whereas neurons have stronger TSS regulation. Furthermore, developmental patterns of cell-type-specific splicing are especially pronounced in the murine adolescent transition from P21 to P28. The same cell type traced across development shows more isoform variability than across adult anatomical regions, indicating a coordinated modulation of functional programs dictating neural development. As most cell-type-specific exons in P56 mouse hippocampus behave similarly in newly generated data from human hippocampi, these principles may be extrapolated to the human brain. However, human brains have evolved additional cell-type specificity in splicing, suggesting gain-of-function isoforms. Taken together, we present a detailed single-cell atlas of full-length brain isoform regulation across development and anatomical regions, revealing a previously unappreciated degree of isoform variability across multiple scales of the brain.
ABSTRACT
Chub mackerels (Scomber japonicus) are a migratory marine fish widely distributed in the Indo-Pacific Ocean. They are globally consumed for their high Omega-3 content, but their population is declining due to global warming. Here, we generated the first chromosome-level genome assembly of chub mackerel (fScoJap1) using the Vertebrate Genomes Project assembly pipeline with PacBio HiFi genomic sequencing and Arima Hi-C chromosome contact data. The final assembly is 828.68 Mb with 24 chromosomes, nearly all containing telomeric repeats at their ends. We annotated 31,656 genes and discovered that approximately 2.19% of the genome contained DNA transposon elements repressed within duplicated genes. Analyzing 5-methylcytosine (5mC) modifications using HiFi reads, we observed open/close chromatin patterns at gene promoters, including the FADS2 gene involved in Omega-3 production. This chromosome-level reference genome provides unprecedented opportunities for advancing our knowledge of chub mackerels in biology, industry, and conservation.
Subject(s)
Cyprinidae, Genome, Perciformes, Animals, Chromosomes, Cyprinidae/genetics, Pacific Ocean, Perciformes/genetics
ABSTRACT
Amidst the current biodiversity crisis, the availability of genomic resources for declining species can provide important insights into the factors driving population decline. In the early 1990s, the black-legged kittiwake (Rissa tridactyla), a pelagic gull widely distributed across the arctic, subarctic, and temperate zones, suffered a steep population decline following an abrupt warming of sea surface temperature across its distribution range and is currently listed as Vulnerable by the International Union for the Conservation of Nature. Kittiwakes have long been the focus for field studies of physiology, ecology, and ecotoxicology and are primary indicators of fluctuating ecological conditions in arctic and subarctic marine ecosystems. We present a high-quality chromosome-level reference genome and annotation for the black-legged kittiwake using a combination of Pacific Biosciences HiFi sequencing, Bionano optical maps, Hi-C reads, and RNA-Seq data. The final assembly spans 1.35 Gb across 32 chromosomes, with a scaffold N50 of 88.21 Mb and a BUSCO completeness of 97.4%. This genome assembly substantially improves the quality of a previous draft genome, showing an approximately 5× increase in contiguity and a more complete annotation. Using this new chromosome-level reference genome and three more chromosome-level assemblies of Charadriiformes, we uncover several lineage-specific chromosome fusions and fissions, but find no shared rearrangements, suggesting that interchromosomal rearrangements have been commonplace throughout the diversification of Charadriiformes. This new high-quality genome assembly will enable population genomic, transcriptomic, and phenotype-genotype association studies in a widely studied sentinel species, which may provide important insights into the impacts of global change on marine systems.
Subject(s)
Charadriiformes, Animals, Charadriiformes/genetics, Ecosystem, Gene Rearrangement, Genomics, Chromosomes/genetics
ABSTRACT
Sharks occupy diverse ecological niches and play critical roles in marine ecosystems, often acting as apex predators. They are considered a slow-evolving lineage and have been suggested to exhibit exceptionally low cancer rates. These two features could be explained by a low nuclear mutation rate. Here, we provide a direct estimate of the nuclear mutation rate in the epaulette shark (Hemiscyllium ocellatum). We generate a high-quality reference genome, and resequence the whole genomes of parents and nine offspring to detect de novo mutations. Using stringent criteria, we estimate a mutation rate of 7 × 10⁻¹⁰ per base pair, per generation. This represents one of the lowest directly estimated mutation rates for any vertebrate clade, indicating that this basal vertebrate group is indeed a slowly evolving lineage whose ability to restore genetic diversity following a sustained population bottleneck may be hampered by a low mutation rate.
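The abstract does not say how the rate was calculated. As a rough sketch, pedigree-based estimates are commonly obtained by dividing the number of de novo mutations by the number of haploid sites surveyed (two genome copies per offspring). The function name and all input values below are hypothetical and chosen only to make the arithmetic visible; they are not figures from the study, and real analyses typically also correct for false-positive and false-negative calls.

```python
def per_generation_mutation_rate(n_de_novo, callable_sites, n_offspring):
    """Estimate mutations per base pair per generation from pedigree data.

    Each offspring carries two haploid genome copies (one per parent), so
    the number of haploid sites surveyed is 2 * callable_sites * n_offspring.
    """
    if callable_sites <= 0 or n_offspring <= 0:
        raise ValueError("callable_sites and n_offspring must be positive")
    return n_de_novo / (2 * callable_sites * n_offspring)


# Hypothetical inputs (illustrative only, not values reported in the abstract).
example_rate = per_generation_mutation_rate(
    n_de_novo=18,                  # de novo mutations summed over all offspring
    callable_sites=1_000_000_000,  # callable base pairs per haploid genome
    n_offspring=9,
)
```

With these made-up inputs the estimate is 18 / (2 × 10⁹ × 9) = 1 × 10⁻⁹ per base pair per generation.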
Subject(s)
Mutation Rate, Sharks, Animals, Sharks/genetics, Ecosystem
ABSTRACT
Programmed DNA loss is a gene silencing mechanism that is employed by several vertebrate and nonvertebrate lineages, including all living jawless vertebrates and songbirds. Reconstructing the evolution of somatically eliminated (germline-specific) sequences in these species has proven challenging due to a high content of repeats and gene duplications in eliminated sequences and a corresponding lack of highly accurate and contiguous assemblies for these regions. Here, we present an improved assembly of the sea lamprey (Petromyzon marinus) genome that was generated using recently standardized methods that increase the contiguity and accuracy of vertebrate genome assemblies. This assembly resolves highly contiguous, somatically retained chromosomes and at least one germline-specific chromosome, permitting new analyses that reconstruct the timing, mode, and repercussions of recruitment of genes to the germline-specific fraction. These analyses reveal major roles of interchromosomal segmental duplication, intrachromosomal duplication, and positive selection for germline functions in the long-term evolution of germline-specific chromosomes.
Subject(s)
Petromyzon, Animals, Petromyzon/genetics, Chromosomes/genetics, DNA/genetics, Genome, Vertebrates/genetics, Germ Cells, Molecular Evolution, Phylogeny
ABSTRACT
Insights into the evolution of non-model organisms are limited by the lack of reference genomes of high accuracy, completeness, and contiguity. Here, we present a chromosome-level, karyotype-validated reference genome and pangenome for the barn swallow (Hirundo rustica). We complement these resources with a reference-free multialignment of the reference genome with other bird genomes and with the most comprehensive catalog of genetic markers for the barn swallow. We identify potentially conserved and accelerated genes using the multialignment and estimate genome-wide linkage disequilibrium using the catalog. We use the pangenome to infer core and accessory genes and to detect variants using it as a reference. Overall, these resources will foster population genomics studies in the barn swallow, enable detection of candidate genes in comparative genomics studies, and help reduce bias toward a single reference genome.
Subject(s)
Swallows, Animals, Swallows/genetics, Metagenomics, Genome/genetics, Genomics, Chromosomes
ABSTRACT
Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).
ABSTRACT
BACKGROUND: Studies in vertebrate genomics require sampling from a broad range of tissue types, taxa, and localities. Recent advancements in long-read and long-range genome sequencing have made it possible to produce high-quality chromosome-level genome assemblies for almost any organism. However, adequate tissue preservation for the requisite ultra-high molecular weight DNA (uHMW DNA) remains a major challenge. Here we present a comparative study of preservation methods for field and laboratory tissue sampling, across vertebrate classes and different tissue types. RESULTS: We find that storage temperature was the strongest predictor of uHMW fragment lengths. While immediate flash-freezing remains the sample preservation gold standard, samples preserved in 95% EtOH or 20-25% DMSO-EDTA showed little fragment length degradation when stored at 4°C for 6 hours. Samples in 95% EtOH or 20-25% DMSO-EDTA kept at 4°C for 1 week after dissection still yielded adequate amounts of uHMW DNA for most applications. Tissue type was a significant predictor of total DNA yield but not fragment length. Preservation solution had a smaller but significant influence on both fragment length and DNA yield. CONCLUSION: We provide sample preservation guidelines that ensure sufficient DNA integrity and amount required for use with long-read and long-range sequencing technologies across vertebrates. Our best practices generated the uHMW DNA needed for the high-quality reference genomes for phase 1 of the Vertebrate Genomes Project, whose ultimate mission is to generate chromosome-level reference genome assemblies of all ~70,000 extant vertebrate species.
Subject(s)
Benchmarking, Dimethyl Sulfoxide, Animals, DNA/genetics, Edetic Acid, High-Throughput Nucleotide Sequencing/methods, Molecular Weight, DNA Sequence Analysis/methods
ABSTRACT
Single-nuclei RNA sequencing characterizes cell types at the gene level. However, compared to single-cell approaches, many single-nuclei cDNAs are purely intronic, lack barcodes and hinder the study of isoforms. Here we present single-nuclei isoform RNA sequencing (SnISOr-Seq). Using microfluidics, PCR-based artifact removal, target enrichment and long-read sequencing, SnISOr-Seq increased barcoded, exon-spanning long reads 7.5-fold compared to naive long-read single-nuclei sequencing. We applied SnISOr-Seq to adult human frontal cortex and found that exons associated with autism exhibit coordinated and highly cell-type-specific inclusion. We found two distinct combination patterns: those distinguishing neural cell types, enriched in TSS-exon, exon-polyadenylation-site and non-adjacent exon pairs, and those with multiple configurations within one cell type, enriched in adjacent exon pairs. Finally, we observed that human-specific exons are almost as tightly coordinated as conserved exons, implying that coordination can be rapidly established during evolution. SnISOr-Seq enables cell-type-specific long-read isoform analysis in human brain and in any frozen or hard-to-dissociate sample.
Subject(s)
Brain, RNA, Alternative Splicing/genetics, Brain/metabolism, Exons/genetics, Humans, Protein Isoforms/genetics, RNA/genetics, RNA Sequence Analysis
ABSTRACT
BACKGROUND: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. RESULTS: As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization. CONCLUSIONS: Our results indicate that even in the "simple" case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.