Results 1 - 20 of 32
1.
Cell ; 176(4): 743-756.e17, 2019 02 07.
Article in English | MEDLINE | ID: mdl-30735633

ABSTRACT

Direct comparisons of human and non-human primate brains can reveal molecular pathways underlying remarkable specializations of the human brain. However, chimpanzee tissue is inaccessible during neocortical neurogenesis when differences in brain size first appear. To identify human-specific features of cortical development, we leveraged recent innovations that permit generating pluripotent stem cell-derived cerebral organoids from chimpanzee. Despite metabolic differences, organoid models preserve gene regulatory networks related to primary cell types and developmental processes. We further identified 261 differentially expressed genes in human compared to both chimpanzee organoids and macaque cortex, enriched for recent gene duplications, and including multiple regulators of PI3K-AKT-mTOR signaling. We observed increased activation of this pathway in human radial glia, dependent on two receptors upregulated specifically in human: INSR and ITGB8. Our findings establish a platform for systematic analysis of molecular changes contributing to human brain development and evolution.


Subject(s)
Cerebral Cortex/cytology , Organoids/metabolism , Animals , Biological Evolution , Brain/cytology , Cell Culture Techniques/methods , Cell Differentiation/genetics , Cerebral Cortex/metabolism , Gene Regulatory Networks/genetics , Humans , Induced Pluripotent Stem Cells/cytology , Macaca , Neurogenesis/genetics , Organoids/growth & development , Pan troglodytes , Pluripotent Stem Cells/cytology , Single-Cell Analysis , Species Specificity , Transcriptome/genetics
2.
Cell ; 173(6): 1356-1369.e22, 2018 05 31.
Article in English | MEDLINE | ID: mdl-29856954

ABSTRACT

Genetic changes causing brain size expansion in human evolution have remained elusive. Notch signaling is essential for radial glia stem cell proliferation and is a determinant of neuronal number in the mammalian cortex. We find that three paralogs of human-specific NOTCH2NL are highly expressed in radial glia. Functional analysis reveals that different alleles of NOTCH2NL have varying potencies to enhance Notch signaling by interacting directly with NOTCH receptors. Consistent with a role in Notch signaling, NOTCH2NL ectopic expression delays differentiation of neuronal progenitors, while deletion accelerates differentiation into cortical neurons. Furthermore, NOTCH2NL genes provide the breakpoints in 1q21.1 distal deletion/duplication syndrome, where duplications are associated with macrocephaly and autism and deletions with microcephaly and schizophrenia. Thus, the emergence of human-specific NOTCH2NL genes may have contributed to the rapid evolution of the larger human neocortex, accompanied by loss of genomic stability at the 1q21.1 locus and resulting recurrent neurodevelopmental disorders.


Subject(s)
Brain/embryology , Cerebral Cortex/physiology , Neurogenesis/physiology , Notch2 Receptor/metabolism , Signal Transduction , Animals , Cell Differentiation , Embryonic Stem Cells/metabolism , Female , Gene Deletion , Reporter Genes , Gorilla gorilla , HEK293 Cells , Humans , Neocortex/cytology , Neural Stem Cells/metabolism , Neuroglia/metabolism , Neurons/metabolism , Pan troglodytes , Notch2 Receptor/genetics , RNA Sequence Analysis
3.
Nature ; 594(7861): 77-81, 2021 06.
Article in English | MEDLINE | ID: mdl-33953399

ABSTRACT

The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3-5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.


Subject(s)
Molecular Evolution , Genome/genetics , Genomics , Pan paniscus/genetics , Phylogeny , Animals , Eukaryotic Initiation Factor-4A/genetics , Female , Genes , Gorilla gorilla/genetics , Molecular Sequence Annotation/standards , Pan troglodytes/genetics , Pongo/genetics , Genomic Segmental Duplications , DNA Sequence Analysis
4.
Nature ; 587(7833): 246-251, 2020 11.
Article in English | MEDLINE | ID: mdl-33177663

ABSTRACT

New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1-3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.


Subject(s)
Genome/genetics , Genomics/methods , Sequence Alignment/methods , Software , Vertebrates/genetics , Amnion , Animals , Computer Simulation , Genomics/standards , Haplotypes , Humans , Quality Control , Sequence Alignment/standards , Software/standards
5.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
COVID-19/prevention & control , Computational Biology/methods , Genetic Databases , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , Long Noncoding RNA/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Genetic Transcription/genetics
6.
Genome Res ; 29(4): 635-645, 2019 04.
Article in English | MEDLINE | ID: mdl-30894395

ABSTRACT

Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long range information while maintaining the advantages of short reads. Starting from ∼1 ng of high molecular weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow for the barcoded short reads to be associated with their original long molecules producing a novel data type known as "Linked-Reads". This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes including disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.


Subject(s)
Genome-Wide Association Study/methods , Genetic Polymorphism , Whole Genome Sequencing/methods , Cell Line , Human Genome , Humans , Intercellular Signaling Peptides and Proteins , Membrane Proteins/genetics , Survival of Motor Neuron 1 Protein/genetics , Survival of Motor Neuron 2 Protein/genetics
7.
Genome Res ; 28(7): 1029-1038, 2018 07.
Article in English | MEDLINE | ID: mdl-29884752

ABSTRACT

The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants, even in genomes as well studied as rat and the great apes, and how these annotations improve cross-species RNA expression experiments.


Subject(s)
Human Genome/genetics , Algorithms , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Molecular Sequence Annotation/methods , RNA/genetics , Rats
8.
Bioinformatics ; 36(12): 3905-3906, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32330223

ABSTRACT

SUMMARY: Bulk RNA sequencing studies have demonstrated that human leukocyte antigen (HLA) genes may be expressed in a cell type-specific and allele-specific fashion. Single-cell gene expression assays have the potential to further resolve these expression patterns, but currently available methods do not perform allele-specific quantification at the molecule level. Here, we present scHLAcount, a post-processing workflow for single-cell RNA-seq data that computes allele-specific molecule counts of the HLA genes based on a personalized reference constructed from the sample's HLA genotypes. AVAILABILITY AND IMPLEMENTATION: scHLAcount is available under the MIT license at https://github.com/10XGenomics/scHLAcount. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Single-Cell Analysis , Software , Alleles , Gene Expression , Humans , RNA Sequence Analysis , Workflow
9.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Subject(s)
Genetic Databases , Human Genome/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software
10.
Nucleic Acids Res ; 46(D1): D762-D769, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29106570

ABSTRACT

The UCSC Genome Browser (https://genome.ucsc.edu) provides a web interface for exploring annotated genome assemblies. The assemblies and annotation tracks are updated on an ongoing basis; 12 assemblies and more than 28 tracks were added in the past year. Two recent additions are a display of CRISPR/Cas9 guide sequences and an interactive navigator for gene interactions. Other upgrades from the past year include a command-line version of the Variant Annotation Integrator, support for Human Genome Variation Society variant nomenclature input and output, and a revised highlighting tool that now supports multiple simultaneous regions and colors.


Subject(s)
Genetic Databases , Genome , Web Browser , CRISPR-Cas Systems , Data Display , Gene Regulatory Networks , Human Genome , Humans , Molecular Sequence Annotation , Terminology as Topic , User-Computer Interface
11.
Hum Genet ; 138(7): 715-721, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31087184

ABSTRACT

Sequences encoding Olduvai (DUF1220) protein domains show the largest human-specific increase in copy number of any coding region in the genome and have been linked to human brain evolution. Most human-specific copies of Olduvai (119/165) are encoded by three NBPF genes that are adjacent to three human-specific NOTCH2NL genes that have been shown to promote cortical neurogenesis. Here, employing genomic, phylogenetic, and transcriptomic evidence, we show that these NOTCH2NL/NBPF gene pairs evolved jointly, as two-gene units, very recently in human evolution, and are likely co-regulated. Remarkably, while three NOTCH2NL paralogs were added, adjacent Olduvai sequences hyper-amplified, adding 119 human-specific copies. The data suggest that human-specific Olduvai domains and adjacent NOTCH2NL genes may function in a coordinated, complementary fashion to promote neurogenesis and human brain expansion in a dosage-related manner.


Subject(s)
Biological Evolution , Brain/growth & development , Brain/metabolism , Carrier Proteins/genetics , Human Genome , Notch2 Receptor/genetics , Genomics , Humans , Phylogeny , Protein Domains
12.
Nat Methods ; 12(4): 351-6, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25686389

ABSTRACT

Speed, single-base sensitivity and long read lengths make nanopores a promising technology for high-throughput sequencing. We evaluated and optimized the performance of the MinION nanopore sequencer using M13 genomic DNA and used expectation maximization to obtain robust maximum-likelihood estimates for insertion, deletion and substitution error rates (4.9%, 7.8% and 5.1%, respectively). Over 99% of high-quality 2D MinION reads mapped to the reference at a mean identity of 85%. We present a single-nucleotide-variant detection tool that uses maximum-likelihood parameter estimates and marginalization over many possible read alignments to achieve precision and recall of up to 99%. By pairing our high-confidence alignment strategy with long MinION reads, we resolved the copy number for a cancer-testis gene family (CT47) within an unresolved region of human chromosome Xq24.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Nanopores , Algorithms , Gene Dosage , Humans , Neoplasms/genetics
13.
Genome Biol Evol ; 16(3)2024 03 02.
Article in English | MEDLINE | ID: mdl-38319079

ABSTRACT

Reptiles exhibit a variety of modes of sex determination, including both temperature-dependent and genetic mechanisms. Among those species with genetic sex determination, sex chromosomes of varying heterogamety (XX/XY and ZZ/ZW) have been observed with different degrees of differentiation. Karyotype studies have demonstrated that Gila monsters (Heloderma suspectum) have ZZ/ZW sex determination and this system is likely homologous to the ZZ/ZW system in the Komodo dragon (Varanus komodoensis), but little else is known about their sex chromosomes. Here, we report the assembly and analysis of the Gila monster genome. We generated a de novo draft genome assembly for a male using 10X Genomics technology. We further generated and analyzed short-read whole genome sequencing and whole transcriptome sequencing data for three males and three females. By comparing female and male genomic data, we identified four putative Z chromosome scaffolds. These putative Z chromosome scaffolds are homologous to Z-linked scaffolds identified in the Komodo dragon. Further, by analyzing RNAseq data, we observed evidence of incomplete dosage compensation between the Gila monster Z chromosome and autosomes and a lack of balance in Z-linked expression between the sexes. In particular, we observe lower expression of the Z in females (ZW) than males (ZZ) on a global basis, though we find evidence suggesting local gene-by-gene compensation. This pattern has been observed in most other ZZ/ZW systems studied to date and may represent a general pattern for female heterogamety in vertebrates.


Subject(s)
Poisonous Animals , Heloderma suspectum , Lizards , Animals , Male , Female , Lizards/genetics , Sex Chromosomes/genetics , Karyotype , Dosage Compensation (Genetic)
14.
bioRxiv ; 2023 Apr 28.
Article in English | MEDLINE | ID: mdl-37163099

ABSTRACT

Reptiles exhibit a variety of modes of sex determination, including both temperature-dependent and genetic mechanisms. Among those species with genetic sex determination, sex chromosomes of varying heterogamety (XX/XY and ZZ/ZW) have been observed with different degrees of differentiation. Karyotype studies have demonstrated that Gila monsters (Heloderma suspectum) have ZZ/ZW sex determination and this system is likely homologous to the ZZ/ZW system in the Komodo dragon (Varanus komodoensis), but little else is known about their sex chromosomes. Here, we report the assembly and analysis of the Gila monster genome. We generated a de novo draft genome assembly for a male using 10X Genomics technology. We further generated and analyzed short-read whole genome sequencing and whole transcriptome sequencing data for three males and three females. By comparing female and male genomic data, we identified four putative Z-chromosome scaffolds. These putative Z-chromosome scaffolds are homologous to Z-linked scaffolds identified in the Komodo dragon. Further, by analyzing RNAseq data, we observed evidence of incomplete dosage compensation between the Gila monster Z chromosome and autosomes and a lack of balance in Z-linked expression between the sexes. In particular, we observe lower expression of the Z in females (ZW) than males (ZZ) on a global basis, though we find evidence suggesting local gene-by-gene compensation. This pattern has been observed in most other ZZ/ZW systems studied to date and may represent a general pattern for female heterogamety in vertebrates.

15.
Science ; 376(6588): 44-53, 2022 04.
Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.


Subject(s)
Human Genome , Human Genome Project , DNA Sequence Analysis/standards , Cell Line , Bacterial Artificial Chromosomes/genetics , Human Chromosomes/genetics , Humans , Reference Values
16.
Cell Genom ; 2(5)2022 May.
Article in English | MEDLINE | ID: mdl-36452119

ABSTRACT

Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. It identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.

17.
Commun Biol ; 3(1): 318, 2020 06 25.
Article in English | MEDLINE | ID: mdl-32587328

ABSTRACT

We performed shallow single-cell sequencing of genomic DNA across 1475 cells from a cell-line, COLO829, to resolve overall complexity and clonality. This melanoma tumor-line has been previously characterized by multiple technologies and is a benchmark for evaluating somatic alterations. In some of these studies, COLO829 has shown conflicting and/or indeterminate copy number and, thus, single-cell sequencing provides a tool for gaining insight. Following shallow single-cell sequencing, we first identified at least four major sub-clones by discriminant analysis of principal components of single-cell copy number data. Based on clustering, break-point and loss of heterozygosity analysis of aggregated data from sub-clones, we identified distinct hallmark events that were validated within bulk sequencing and spectral karyotyping. In summary, COLO829 exhibits a classical Dutrillaux's monosomic/trisomic pattern of karyotype evolution with endoreduplication, where consistent sub-clones emerge from the loss/gain of abnormal chromosomes. Overall, our results demonstrate how shallow copy number profiling can uncover hidden biological insights.


Subject(s)
Melanoma/genetics , Melanoma/pathology , Single-Cell Analysis/methods , Tumor Cell Line , DNA Copy Number Variations , Humans , Karyotyping , Loss of Heterozygosity , DNA Sequence Analysis
18.
Nat Biotechnol ; 38(8): 954-961, 2020 08.
Article in English | MEDLINE | ID: mdl-32231336

ABSTRACT

Single-cell CRISPR screens enable the exploration of mammalian gene function and genetic regulatory networks. However, use of this technology has been limited by reliance on indirect indexing of single-guide RNAs (sgRNAs). Here we present direct-capture Perturb-seq, a versatile screening approach in which expressed sgRNAs are sequenced alongside single-cell transcriptomes. Direct-capture Perturb-seq enables detection of multiple distinct sgRNA sequences from individual cells and thus allows pooled single-cell CRISPR screens to be easily paired with combinatorial perturbation libraries that contain dual-guide expression vectors. We demonstrate the utility of this approach for high-throughput investigations of genetic interactions and, leveraging this ability, dissect epistatic interactions between cholesterol biogenesis and DNA repair. Using direct capture Perturb-seq, we also show that targeting individual genes with multiple sgRNAs per cell improves efficacy of CRISPR interference and activation, facilitating the use of compact, highly active CRISPR libraries for single-cell screens. Last, we show that hybridization-based target enrichment permits sensitive, specific sequencing of informative transcripts from single-cell RNA-seq experiments.


Subject(s)
CRISPR-Cas Systems , Nucleic Acid Amplification Techniques/methods , Kinetoplastida Guide RNA/genetics , Gene Expression Regulation , Gene Targeting , HEK293 Cells , High-Throughput Nucleotide Sequencing , Humans , Single-Cell Analysis , Transcriptome
20.
Nat Biotechnol ; 38(11): 1347-1355, 2020 11.
Article in English | MEDLINE | ID: mdl-32541955

ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.


Subject(s)
Germ-Line Mutation/genetics , INDEL Mutation/genetics , Diploidy , Genomic Structural Variation , Humans , Molecular Sequence Annotation , DNA Sequence Analysis