Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 4 de 4
Filtrar
Más filtros











Intervalo de año de publicación
1.
Nature ; 611(7936): 519-531, 2022 Nov.
Artículo en Inglés | MEDLINE | ID: mdl-36261518

RESUMEN

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Asunto(s)
Mapeo Cromosómico , Diploidia , Genoma Humano , Genómica , Humanos , Mapeo Cromosómico/normas , Genoma Humano/genética , Haplotipos/genética , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/normas , Análisis de Secuencia de ADN/métodos , Análisis de Secuencia de ADN/normas , Estándares de Referencia , Genómica/métodos , Genómica/normas , Cromosomas Humanos/genética , Variación Genética/genética
2.
Nature ; 585(7823): 79-84, 2020 09.
Artículo en Inglés | MEDLINE | ID: mdl-32663838

RESUMEN

After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist1,2. Here we present a human genome assembly that surpasses the continuity of GRCh382, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome3, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.


Asunto(s)
Cromosomas Humanos X/genética , Genoma Humano/genética , Telómero/genética , Centrómero/genética , Islas de CpG/genética , Metilación de ADN , ADN Satélite/genética , Femenino , Humanos , Mola Hidatiforme/genética , Masculino , Embarazo , Reproducibilidad de los Resultados , Testículo/metabolismo
3.
Nat Methods ; 16(1): 88-94, 2019 01.
Artículo en Inglés | MEDLINE | ID: mdl-30559433

RESUMEN

We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA ) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33-79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.


Asunto(s)
Biología Computacional , Duplicaciones Segmentarias en el Genoma , Análisis de Secuencia de ADN/métodos , Genoma Humano , Humanos , Anotación de Secuencia Molecular
4.
Genome Res ; 24(12): 2066-76, 2014 12.
Artículo en Inglés | MEDLINE | ID: mdl-25373144

RESUMEN

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.


Asunto(s)
Genoma Humano , Haplotipos , Mola Hidatiforme/genética , Alelos , Mapeo Cromosómico , Cromosomas Artificiales Bacterianos , Biología Computacional/métodos , Femenino , Genómica/métodos , Heterocigoto , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos , Polimorfismo de Nucleótido Simple , Embarazo , Secuencias Repetitivas de Ácidos Nucleicos , Duplicaciones Segmentarias en el Genoma , Análisis de Secuencia de ADN
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA