Results 1 - 20 of 31
1.
bioRxiv ; 2024 Jul 14.
Article in English | MEDLINE | ID: mdl-39026747

ABSTRACT

Since their initial discovery in maize, transposable elements (TEs) have emerged as integral to the evolution of maize, accounting for 80% of its genome. However, the repetitive nature of TEs has hindered our understanding of their regulatory potential. Here, we demonstrate that long-read chromatin fiber sequencing (Fiber-seq) permits the comprehensive annotation of the regulatory potential of maize TEs. We uncover that only 94 LTR retrotransposons contain the functional epigenetic architecture required for mobilization within maize leaves. This epigenetic architecture degenerates with evolutionary age, resulting in solo TE enhancers being preferentially marked by simultaneous hyper-CpG methylation and chromatin accessibility, an architecture markedly divergent from canonical enhancers. We find that TEs shape maize gene regulation by creating novel promoters within the TE itself as well as through TE-mediated gene amplification. Lastly, we uncover a pervasive epigenetic code directing TEs to specific loci, including the locus that sparked McClintock's discovery of TEs.

2.
bioRxiv ; 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39026856

ABSTRACT

Accurately quantifying the functional consequences of non-coding mosaic variants requires the pairing of DNA sequence with both accessible and closed chromatin architectures along individual DNA molecules, a pairing that cannot be achieved using traditional fragmentation-based chromatin assays. We demonstrate that targeted single-molecule chromatin fiber sequencing (Fiber-seq) achieves this, permitting single-molecule, long-read genomic and epigenomic profiling across targeted >100 kilobase loci with ~10-fold enrichment over untargeted sequencing. Targeted Fiber-seq reveals that pathogenic expansions of the DMPK CTG repeat that underlie Myotonic Dystrophy 1 are characterized by somatic instability and disruption of multiple nearby regulatory elements, both of which are repeat length-dependent. Furthermore, we reveal that therapeutic adenine base editing of the segmentally duplicated γ-globin (HBG1/HBG2) promoters in primary human hematopoietic cells induced towards an erythroblast lineage increases the accessibility of the HBG1 promoter as well as neighboring regulatory elements. Overall, we find that these non-protein coding mosaic variants can have complex impacts on chromatin architectures, including extending beyond the regulatory element harboring the variant.

3.
Genome Res ; 2024 Oct 28.
Article in English | MEDLINE | ID: mdl-38849157

ABSTRACT

Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation and the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as coprocessing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semisupervised convolutional neural network for fast and accurate identification of m6A-marked bases using Pacific Biosciences (PacBio) single-molecule long-read sequencing, as well as the coprocessing of long-read genetic and epigenetic data produced using either the PacBio or Oxford Nanopore Technologies (ONT) sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kb long DNA molecules with an ∼1000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.

4.
Nat Genet ; 56(5): 877-888, 2024 May.
Article in English | MEDLINE | ID: mdl-38714869

ABSTRACT

Thyrotropin (TSH) is the master regulator of thyroid gland growth and function. Resistance to TSH (RTSH) describes conditions with reduced sensitivity to TSH. Dominantly inherited RTSH has been linked to a locus on chromosome 15q, but its genetic basis has remained elusive. Here we show that non-coding mutations in a (TTTG)4 short tandem repeat (STR) underlie dominantly inherited RTSH in all 82 affected participants from 12 unrelated families. The STR is contained in a primate-specific Alu retrotransposon with thyroid-specific cis-regulatory chromatin features. Fiber-seq and RNA-seq studies revealed that the mutant STR activates a thyroid-specific enhancer cluster, leading to haplotype-specific upregulation of the bicistronic MIR7-2/MIR1179 locus 35 kb downstream and overexpression of its microRNA products in the participants' thyrocytes. An imbalance in signaling pathways targeted by these microRNAs provides a working model for this cause of RTSH. This finding broadens our current knowledge of genetic defects altering pituitary-thyroid feedback regulation.


Subjects
Chromosomes, Human, Pair 15; Enhancer Elements, Genetic; MicroRNAs; Microsatellite Repeats; Mutation; Thyrotropin; Animals; Female; Humans; Male; Chromosomes, Human, Pair 15/genetics; MicroRNAs/genetics; Microsatellite Repeats/genetics; Pedigree; Primates/genetics; Thyroid Gland/metabolism; Thyrotropin/genetics
5.
bioRxiv ; 2023 Sep 27.
Article in English | MEDLINE | ID: mdl-37808736

ABSTRACT

Resolving the molecular basis of a Mendelian condition (MC) remains challenging owing to the diverse mechanisms by which genetic variants cause disease. To address this, we developed a synchronized long-read genome, methylome, epigenome, and transcriptome sequencing approach, which enables accurate single-nucleotide, insertion-deletion, and structural variant calling and diploid de novo genome assembly, and permits the simultaneous elucidation of haplotype-resolved CpG methylation, chromatin accessibility, and full-length transcript information in a single long-read sequencing run. Application of this approach to an Undiagnosed Diseases Network (UDN) participant with a chromosome X;13 balanced translocation of uncertain significance revealed that this translocation disrupted the functioning of four separate genes (NBEA, PDK3, MAB21L1, and RB1) previously associated with single-gene MCs. Notably, the function of each gene was disrupted via a distinct mechanism that required integration of the four 'omes' to resolve. These included nonsense-mediated decay, fusion transcript formation, enhancer adoption, transcriptional readthrough silencing, and inappropriate X chromosome inactivation of autosomal genes. Overall, this highlights the utility of synchronized long-read multi-omic profiling for mechanistically resolving complex phenotypes.

6.
Genome Biol ; 24(1): 157, 2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37403156

ABSTRACT

BACKGROUND: The first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) release is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplication, and other complex regions. The current human genome reference (GRCh38) has been widely used in various human genomic studies. However, the large-scale genomic differences between these two important genome assemblies have not yet been characterized in detail. RESULTS: Here, in addition to the previously reported "non-syntenic" regions, we find 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed website tool called SynPlotter. The discrepant regions (~21.6 Mbp) excluding telomeric and centromeric regions are highly structurally polymorphic in humans, where the deletions or duplications are likely associated with various human diseases, such as immune and neurodevelopmental disorders. The analyses of a newly identified discrepant region, the KLRC gene cluster, show that the depletion of KLRC2 by a single-deletion event is associated with natural killer cell differentiation in ~20% of humans. Meanwhile, the rapid amino acid replacements observed within KLRC3 are probably a result of natural selection in primate evolution. CONCLUSION: Our study provides a foundation for understanding the large-scale structural genomic differences between the two crucial human reference genomes, and is thereby important for future human genomics studies.


Subjects
Genome, Human; Genomics; Animals; Humans; Segmental Duplications, Genomic; Multigene Family; Centromere/genetics; NK Cell Lectin-Like Receptor Subfamily C/genetics
7.
Genome Res ; 33(4): 496-510, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37164484

ABSTRACT

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still contains more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.


Subjects
DNA, Satellite; Polymorphism, Genetic; Humans; DNA, Satellite/genetics; Haplotypes; Segmental Duplications, Genomic; Sequence Analysis, DNA
8.
bioRxiv ; 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-37131601

ABSTRACT

Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation and the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as co-processing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semi-supervised convolutional neural network for fast and accurate identification of m6A-marked bases using PacBio single-molecule long-read sequencing, as well as the co-processing of long-read genetic and epigenetic data produced using either PacBio or Oxford Nanopore sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kilobase long DNA molecules with a ~1,000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.

9.
Nature ; 617(7960): 325-334, 2023 May.
Article in English | MEDLINE | ID: mdl-37165237

ABSTRACT

Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].


Subjects
Gene Conversion; Mutation; Segmental Duplications, Genomic; Humans; Gene Conversion/genetics; Genome, Human/genetics; Polymorphism, Single Nucleotide/genetics; Haplotypes/genetics; Exons/genetics; Cytosine/chemistry; Guanine/chemistry; CpG Islands/genetics
10.
Nature ; 617(7960): 312-324, 2023 May.
Article in English | MEDLINE | ID: mdl-37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.


Subjects
Genome, Human; Genomics; Humans; Diploidy; Genome, Human/genetics; Haplotypes/genetics; Sequence Analysis, DNA; Genomics/standards; Reference Standards; Cohort Studies; Alleles; Genetic Variation
11.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subjects
Chromosome Mapping; Diploidy; Genome, Human; Genomics; Humans; Chromosome Mapping/standards; Genome, Human/genetics; Haplotypes/genetics; High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/standards; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/standards; Reference Standards; Genomics/methods; Genomics/standards; Chromosomes, Human/genetics; Genetic Variation/genetics
12.
Science ; 376(6588): eabl4178, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35357911

ABSTRACT

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.


Subjects
Centromere/genetics; Chromosome Mapping; Epigenesis, Genetic; Genome, Human; Evolution, Molecular; Genomics; Humans; Repetitive Sequences, Nucleic Acid
13.
Science ; 376(6588): eabj5089, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35357915

ABSTRACT

The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.


Subjects
CpG Islands; DNA Methylation; Epigenesis, Genetic; Genome, Human; Centromere/genetics; Centromere/metabolism; Disease/genetics; Genetic Loci; Genomics/standards; Humans; Reference Standards; Sequence Analysis, DNA
14.
Science ; 376(6588): eabj6965, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35357917

ABSTRACT

Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.


Subjects
DNA Copy Number Variations; Gene Duplication; Genome, Human; Segmental Duplications, Genomic; Evolution, Molecular; GTPase-Activating Proteins/genetics; Humans; Polymorphism, Single Nucleotide; Proto-Oncogene Proteins/genetics
15.
Science ; 376(6588): 44-53, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.


Subjects
Genome, Human; Human Genome Project; Sequence Analysis, DNA/standards; Cell Line; Chromosomes, Artificial, Bacterial/genetics; Chromosomes, Human/genetics; Humans; Reference Values
16.
Science ; 376(6588): eabk3112, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35357925

ABSTRACT

Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.


Subjects
Epigenesis, Genetic; Genome, Human; Repetitive Sequences, Nucleic Acid; Telomere/genetics; Transcription, Genetic; Humans
17.
Science ; 376(6588): eabl3533, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35357935

ABSTRACT

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.


Subjects
Genetic Variation; Genome, Human; Genomics/standards; Sequence Analysis, DNA/standards; Humans; Reference Standards
18.
Bioinformatics ; 38(7): 2049-2051, 2022 Mar 28.
Article in English | MEDLINE | ID: mdl-35020798

ABSTRACT

SUMMARY: The visualization and analysis of genomic repeats is typically accomplished using dot plots; however, the emergence of telomere-to-telomere assemblies with multi-megabase repeats requires new visualization strategies. Here, we introduce StainedGlass, which can generate publication-quality figures and interactive visualizations that depict the identity and orientation of multi-megabase tandem repeat structures at a genome-wide scale. The tool can rapidly reveal higher-order structures and improve the inference of evolutionary history for some of the most complex regions of genomes. AVAILABILITY AND IMPLEMENTATION: StainedGlass is implemented using Snakemake and available open source under the MIT license at https://mrvollger.github.io/StainedGlass/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics; Software; Tandem Repeat Sequences; Biological Evolution
19.
Nat Commun ; 12(1): 5118, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-34433829

ABSTRACT

TRP channel-associated factor 1/2 (TCAF1/TCAF2) proteins antagonistically regulate the cold-sensor protein TRPM8 in multiple human tissues. Understanding their significance has been complicated given the locus spans a gap-ridden region with complex segmental duplications in GRCh38. Using long-read sequencing, we sequence-resolve the locus, annotate full-length TCAF models in primate genomes, and show substantial human-specific TCAF copy number variation. We identify two human super haplogroups, H4 and H5, and establish that TCAF duplications originated ~1.7 million years ago but diversified only in Homo sapiens by recurrent structural mutations. Conversely, in all archaic-hominin samples the fixation for a specific H4 haplotype without duplication is likely due to positive selection. Here, our results of TCAF copy number expansion, selection signals in hominins, and differential TCAF2 expression between haplogroups and high TCAF2 and TRPM8 expression in liver and prostate in modern-day humans imply TCAF diversification among hominins potentially in response to cold or dietary adaptations.


Subjects
Gene Duplication; Hominidae/genetics; Membrane Proteins/genetics; Selection, Genetic; Animals; DNA Copy Number Variations; Evolution, Molecular; Genome, Human; Haplotypes; Humans; Neanderthals; Phylogeny
20.
Nature ; 593(7857): 101-107, 2021 May.
Article in English | MEDLINE | ID: mdl-33828295

ABSTRACT

The complete assembly of each human chromosome is essential for understanding human biology and evolution [1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.


Subjects
Chromosomes, Human, Pair 8/chemistry; Chromosomes, Human, Pair 8/genetics; Evolution, Molecular; Animals; Cell Line; Centromere/chemistry; Centromere/genetics; Centromere/metabolism; Chromosomes, Human, Pair 8/physiology; DNA Methylation; DNA, Satellite/genetics; Epigenesis, Genetic; Female; Humans; Macaca mulatta/genetics; Male; Minisatellite Repeats/genetics; Pan troglodytes/genetics; Phylogeny; Pongo abelii/genetics; Telomere/chemistry; Telomere/genetics; Telomere/metabolism