RESUMO
The complete assembly of each human chromosome is essential for understanding human biology and evolution1,2. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the ß-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
Assuntos
Cromossomos Humanos Par 8/química , Cromossomos Humanos Par 8/genética , Evolução Molecular , Animais , Linhagem Celular , Centrômero/química , Centrômero/genética , Centrômero/metabolismo , Cromossomos Humanos Par 8/fisiologia , Metilação de DNA , DNA Satélite/genética , Epigênese Genética , Feminino , Humanos , Macaca mulatta/genética , Masculino , Repetições Minissatélites/genética , Pan troglodytes/genética , Filogenia , Pongo abelii/genética , Telômero/química , Telômero/genética , Telômero/metabolismoRESUMO
Satellite DNAs (SatDNA) are ubiquitously present in eukaryotic genomes and have been recently associated with several biological roles. Understanding the evolution and significance of SatDNA requires an extensive comparison across multiple phylogenetic depths. We combined the RepeatExplorer pipeline and cytogenetic approaches to conduct a comprehensive identification and analysis of the satellitome in 37 species from the genus Drosophila. We identified 188 SatDNA-like families, 112 of them being characterized for the first time. Repeat analysis within a phylogenetic framework has revealed the deeply divergent nature of SatDNA sequences in the Drosophila genus. The SatDNA content varied from 0.54% of the D. arizonae genome to 38.8% of the D. albomicans genome, with the SatDNA content often following a phylogenetic signal. Monomer size and guanine-cytosine-content also showed extreme variation ranging 2-570â bp and 9.1-71.4%, respectively. SatDNA families are shared among closely related species, consistent with the SatDNA library hypothesis. However, we uncovered the emergence of species-specific SatDNA families through amplification of unique or low abundant sequences in a lineage. Finally, we found that genome sizes of the Sophophora subgenus are positively correlated with transposable element content, whereas genome size in the Drosophila subgenus is positively correlated with SatDNA. This finding indicates genome size could be driven by different categories of repetitive elements in each subgenus. Altogether, we conducted the most comprehensive satellitome analysis in Drosophila from a phylogenetic perspective and generated the largest catalog of SatDNA sequences to date, enabling future discoveries in SatDNA evolution and Drosophila genome architecture.
Assuntos
Drosophila , Evolução Molecular , Animais , Elementos de DNA Transponíveis/genética , DNA Satélite/genética , Drosophila/genética , Humanos , FilogeniaRESUMO
Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.
Assuntos
Epigênese Genética , Genoma Humano , Sequências Repetitivas de Ácido Nucleico , Telômero/genética , Transcrição Gênica , HumanosRESUMO
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Assuntos
Genoma Humano , Projeto Genoma Humano , Análise de Sequência de DNA/normas , Linhagem Celular , Cromossomos Artificiais Bacterianos/genética , Cromossomos Humanos/genética , Humanos , Valores de ReferênciaRESUMO
Satellite DNAs (satDNAs) are a ubiquitous feature of eukaryotic genomes and are usually the major components of constitutive heterochromatin. The 1.688 satDNA, also known as the 359 bp satellite, is one of the most abundant repetitive sequences in Drosophila melanogaster and has been linked to several different biological functions. We investigated the presence and evolution of the 1.688 satDNA in 16 Drosophila genomes. We find that the 1.688 satDNA family is much more ancient than previously appreciated, being shared among part of the melanogaster group that diverged from a common ancestor â¼27 Mya. We found that the 1.688 satDNA family has two major subfamilies spread throughout Drosophila phylogeny (â¼360 bp and â¼190 bp). Phylogenetic analysis of â¼10,000 repeats extracted from 14 of the species revealed that the 1.688 satDNA family is present within heterochromatin and euchromatin. A high number of euchromatic repeats are gene proximal, suggesting the potential for local gene regulation. Notably, heterochromatic copies display concerted evolution and a species-specific pattern, whereas euchromatic repeats display a more typical evolutionary pattern, suggesting that chromatin domains may influence the evolution of these sequences. Overall, our data indicate the 1.688 satDNA as the most perduring satDNA family described in Drosophila phylogeny to date. Our study provides a strong foundation for future work on the functional roles of 1.688 satDNA across many Drosophila species.
Assuntos
DNA Satélite , Drosophila , Animais , DNA Satélite/genética , Drosophila/genética , Drosophila melanogaster/genética , Evolução Molecular , Filogenia , Sequências Repetitivas de Ácido NucleicoRESUMO
Eukaryote genomes are replete with repetitive DNAs. This class includes tandemly repeated satellite DNAs (satDNA) which are among the most abundant, fast evolving (yet poorly studied) genomic components. Here, we used high-throughput sequencing data from three cactophilic Drosophila species, D. buzzatii, D. seriema, and D. mojavensis, to access and study their whole satDNA landscape. In total, the RepeatExplorer software identified five satDNAs, three previously described (pBuM, DBC-150 and CDSTR198) and two novel ones (CDSTR138 and CDSTR130). Only pBuM is shared among all three species. The satDNA repeat length falls within only two classes, between 130 and 200 bp or between 340 and 390 bp. FISH on metaphase and polytene chromosomes revealed the presence of satDNA arrays in at least one of the following genomic compartments: centromeric, telomeric, subtelomeric, or dispersed along euchromatin. The chromosomal distribution ranges from a single chromosome to almost all chromosomes of the complement. Fiber-FISH and sequence analysis of contigs revealed interspersion between pBuM and CDSTR130 in the microchromosomes of D. mojavensis Phylogenetic analyses showed that the pBuM satDNA underwent concerted evolution at both interspecific and intraspecific levels. Based on RNA-seq data, we found transcription activity for pBuM (in D. mojavensis) and CDSTR198 (in D. buzzatii) in all five analyzed developmental stages, most notably in pupae and adult males. Our data revealed that cactophilic Drosophila present the lowest amount of satDNAs (1.9-2.9%) within the Drosophila genus reported so far. We discuss how our findings on the satDNA location, abundance, organization, and transcription activity may be related to functional aspects.
Assuntos
DNA Satélite/genética , Drosophila/genética , Genoma de Inseto , Análise de Sequência de DNA , Animais , Cactaceae , Centrômero/metabolismo , Sondas de DNA/metabolismo , Elementos de DNA Transponíveis/genética , Evolução Molecular , Hibridização in Situ Fluorescente , Filogenia , Cromossomos Politênicos/genética , Sequências Repetitivas de Ácido Nucleico/genética , Especificidade da Espécie , Telômero/genética , Transcrição GênicaRESUMO
Cactophilic Drosophila species provide a valuable model to study gene-environment interactions and ecological adaptation. Drosophila buzzatii and Drosophila mojavensis are two cactophilic species that belong to the repleta group, but have very different geographical distributions and primary host plants. To investigate the genomic basis of ecological adaptation, we sequenced the genome and developmental transcriptome of D. buzzatii and compared its gene content with that of D. mojavensis and two other noncactophilic Drosophila species in the same subgenus. The newly sequenced D. buzzatii genome (161.5 Mb) comprises 826 scaffolds (>3 kb) and contains 13,657 annotated protein-coding genes. Using RNA sequencing data of five life-stages we found expression of 15,026 genes, 80% protein-coding genes, and 20% noncoding RNA genes. In total, we detected 1,294 genes putatively under positive selection. Interestingly, among genes under positive selection in the D. mojavensis lineage, there is an excess of genes involved in metabolism of heterocyclic compounds that are abundant in Stenocereus cacti and toxic to nonresident Drosophila species. We found 117 orphan genes in the shared D. buzzatii-D. mojavensis lineage. In addition, gene duplication analysis identified lineage-specific expanded families with functional annotations associated with proteolysis, zinc ion binding, chitin binding, sensory perception, ethanol tolerance, immunity, physiology, and reproduction. In summary, we identified genetic signatures of adaptation in the shared D. buzzatii-D. mojavensis lineage, and in the two separate D. buzzatii and D. mojavensis lineages. Many of the novel lineage-specific genomic features are promising candidates for explaining the adaptation of these species to their distinct ecological niches.