Results 1 - 20 of 48
2.
Nat Biotechnol ; 39(9): 1129-1140, 2021 09.
Article in English | MEDLINE | ID: mdl-34504351

ABSTRACT

Assessing the reproducibility, accuracy and utility of massively parallel DNA sequencing platforms remains an ongoing challenge. Here the Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study benchmarks the performance of a set of sequencing instruments (HiSeq/NovaSeq/paired-end 2 × 250-bp chemistry, Ion S5/Proton, PacBio circular consensus sequencing (CCS), Oxford Nanopore Technologies PromethION/MinION, BGISEQ-500/MGISEQ-2000 and GS111) on human and bacterial reference DNA samples. Among short-read instruments, HiSeq 4000 and X10 provided the most consistent, highest genome coverage, while BGI/MGISEQ provided the lowest sequencing error rates. The long-read instrument PacBio CCS had the highest reference-based mapping rate and lowest non-mapping rate. The two long-read platforms PacBio CCS and PromethION/MinION showed the best sequence mapping in repeat-rich areas and across homopolymers. NovaSeq 6000 using 2 × 250-bp read chemistry was the most robust instrument for capturing known insertion/deletion events. This study serves as a benchmark for current genomics technologies, as well as a resource to inform experimental design and next-generation sequencing variant calling.


Subjects
High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/standards; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/standards; Base Pair Mismatch; Benchmarking; DNA/genetics; DNA, Bacterial/genetics; Genome, Bacterial; Genome, Human; Humans
4.
Genet Med ; 23(9): 1673-1680, 2021 09.
Article in English | MEDLINE | ID: mdl-34007000

ABSTRACT

PURPOSE: To evaluate the impact of technically challenging variants on the implementation, validation, and diagnostic yield of commonly used clinical genetic tests. Such variants include large indels, small copy-number variants (CNVs), complex alterations, and variants in low-complexity or segmentally duplicated regions. METHODS: An interlaboratory pilot study used synthetic specimens to assess detection of challenging variant types by various next-generation sequencing (NGS)-based workflows. One well-performing workflow was further validated and used in clinician-ordered testing of more than 450,000 patients. RESULTS: In the interlaboratory study, only 2 of 13 challenging variants were detected by all 10 workflows, and just 3 workflows detected all 13. Limitations were also observed among 11 less-challenging indels. In clinical testing, 21.6% of patients carried one or more pathogenic variants, of which 13.8% (17,561) were classified as technically challenging. These variants were of diverse types, affecting 556 of 1,217 genes across hereditary cancer, cardiovascular, neurological, pediatric, reproductive carrier screening, and other indicated tests. CONCLUSION: The analytic and clinical sensitivity of NGS workflows can vary considerably, particularly for prevalent, technically challenging variants. This can have important implications for the design and validation of tests (by laboratories) and the selection of tests (by clinicians) for a wide range of clinical indications.


Subjects
Genetic Testing; High-Throughput Nucleotide Sequencing; Child; DNA Copy Number Variations/genetics; Humans; INDEL Mutation/genetics; Pilot Projects
5.
Nat Biotechnol ; 39(3): 309-312, 2021 03.
Article in English | MEDLINE | ID: mdl-33288905

ABSTRACT

Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with minimum contig length needed to cover 50% of the known genome (NG50) up to 25 Mb and phased ~99.5% of heterozygous sites at 98-99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for the discovery of structural variants (SVs), including thousands of new transposon insertions, and of highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.
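
The NG50 statistic defined in this abstract can be illustrated with a short calculation. What follows is a minimal Python sketch, not code from DipAsm: the function name `ng50` and the contig lengths are hypothetical, chosen only to show how the value falls out of a list of contig lengths and an assumed genome size.

```python
def ng50(contig_lengths, genome_size):
    """Length of the contig at which the running total of contigs, sorted
    longest first, reaches half of the assumed genome size.

    Returns None when the assembly covers less than half of the genome.
    """
    target = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return None


# Hypothetical contig lengths in Mb, for an assumed 120 Mb genome.
contigs_mb = [40, 30, 20, 10, 5]
print(ng50(contigs_mb, genome_size=120))  # prints 30
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the known genome size, so an incomplete assembly cannot inflate the figure.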


Subjects
Chromosomes, Human; Genome, Human; Haplotypes; Algorithms; Heterozygote; Humans; Polymorphism, Single Nucleotide
6.
Nat Commun ; 11(1): 4794, 2020 09 22.
Article in English | MEDLINE | ID: mdl-32963235

ABSTRACT

Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful - the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align them to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC, covering 94% of the MHC and 22368 variants smaller than 50 bp, 49% more variants than a mapping-based benchmark. This benchmark reliably identifies errors in mapping-based callsets, and enables performance assessment in regions with much denser, complex variation than regions covered by previous benchmarks.


Subjects
Diploidy; Major Histocompatibility Complex/genetics; Benchmarking; Cell Line; Genetic Variation; Genome, Human; Haplotypes; Humans
8.
Nat Biotechnol ; 38(9): 1044-1053, 2020 09.
Article in English | MEDLINE | ID: mdl-32686750

ABSTRACT

De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled 11 highly contiguous human genomes de novo in 9 d. We achieved roughly 63× coverage, 42-kb read N50 values and 6.5× coverage in reads >100 kb using three flow cells per sample. Shasta produced a complete haploid human genome assembly in under 6 h on a single commercial compute node. MarginPolish and HELEN polished haploid assemblies to more than 99.9% identity (Phred quality score QV = 30) with nanopore reads alone. Addition of proximity-ligation sequencing enabled near chromosome-level scaffolds for all 11 genomes. We compare our assembly performance to existing methods for diploid, haploid and trio-binned human samples and report superior accuracy and speed.
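
The correspondence between 99.9% identity and QV = 30 quoted in this abstract follows from the standard Phred definition, QV = -10 · log10(error rate). The Python sketch below only illustrates that arithmetic; the function names are hypothetical and it is not part of Shasta, MarginPolish or HELEN.

```python
import math


def identity_to_qv(identity):
    """Convert a fractional sequence identity (e.g. 0.999) to a Phred-scaled QV."""
    error_rate = 1.0 - identity
    if error_rate <= 0.0:
        raise ValueError("identity must be below 1.0 for a finite QV")
    return -10.0 * math.log10(error_rate)


def qv_to_identity(qv):
    """Convert a Phred-scaled QV back to a fractional identity."""
    return 1.0 - 10.0 ** (-qv / 10.0)


# 99.9% identity corresponds to one error per 1,000 bases.
print(round(identity_to_qv(0.999), 2))
print(round(qv_to_identity(30), 4))
```

Because the scale is logarithmic, each additional 10 QV points means a tenfold lower error rate: QV 40 would correspond to 99.99% identity.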


Subjects
Genome, Human/genetics; High-Throughput Nucleotide Sequencing/methods; Nanopore Sequencing; Sequence Analysis, DNA/methods; Algorithms; Benchmarking; Chromosomes, Human/genetics; Deep Learning; Genomics; HLA Antigens/genetics; Haploidy; High-Throughput Nucleotide Sequencing/standards; Humans; Sequence Analysis, DNA/standards
9.
Genome Biol ; 21(1): 129, 2020 06 02.
Article in English | MEDLINE | ID: mdl-32487205

ABSTRACT

BACKGROUND: Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple human populations to avoid potential biases. RESULTS: Here, we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are > 99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Eleven genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~ 1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes. CONCLUSIONS: The Ash1 genome is presented as a reference for any genetic studies involving Ashkenazi Jewish individuals.


Subjects
Genome, Human; Humans; Molecular Sequence Annotation; Reference Values; Translocation, Genetic
10.
PLoS Comput Biol ; 16(6): e1007933, 2020 06.
Article in English | MEDLINE | ID: mdl-32559231

ABSTRACT

A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
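
The abstract does not say how concordance between curators was computed. As one plausible reading, the Python sketch below treats it as plain percent agreement over the events that two curators both labeled; the function name, event identifiers and labels are all hypothetical and are not taken from SVCurator.

```python
def concordance(labels_a, labels_b):
    """Fraction of shared events on which two curators assigned the same label.

    Each argument maps an event identifier to a label. Events labeled by only
    one curator are ignored. This is simple percent agreement, an assumption,
    not necessarily the measure used in the study.
    """
    shared_events = labels_a.keys() & labels_b.keys()
    if not shared_events:
        raise ValueError("the two curators share no labeled events")
    matches = sum(1 for event in shared_events if labels_a[event] == labels_b[event])
    return matches / len(shared_events)


# Hypothetical labels for four putative SVs.
curator = {"sv1": "DEL", "sv2": "INS", "sv3": "DEL", "sv4": "INS"}
expert = {"sv1": "DEL", "sv2": "INS", "sv3": "INS", "sv4": "INS"}
print(concordance(curator, expert))  # prints 0.75
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would be stricter.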


Subjects
Genome, Human; Genomic Structural Variation; Heuristics; Humans; INDEL Mutation
11.
Nat Biotechnol ; 38(11): 1347-1355, 2020 11.
Article in English | MEDLINE | ID: mdl-32541955

ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.


Subjects
Germ-Line Mutation/genetics; INDEL Mutation/genetics; Diploidy; Genomic Structural Variation; Humans; Molecular Sequence Annotation; Sequence Analysis, DNA
12.
Nat Biotechnol ; 37(10): 1155-1162, 2019 10.
Article in English | MEDLINE | ID: mdl-31406327

ABSTRACT

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.


Subjects
DNA, Circular/genetics; Genome, Human; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Base Sequence; Genetic Variation; Haplotypes; Humans
13.
Sci Data ; 6(1): 91, 2019 06 14.
Article in English | MEDLINE | ID: mdl-31201313

ABSTRACT

Single-molecule long-read sequencing datasets were generated for a son-father-mother trio of Han Chinese descent that is part of the Genome in a Bottle (GIAB) consortium portfolio. The dataset was generated using the Pacific Biosciences Sequel System. The son and each parent were sequenced to an average coverage of 60 and 30, respectively, with N50 subread lengths between 16 and 18 kb. Raw reads and reads aligned to both the GRCh37 and GRCh38 are available at the NCBI GIAB ftp site (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/). The GRCh38 aligned read data are archived in NCBI SRA (SRX4739017, SRX4739121, and SRX4739122). This dataset is available for anyone to develop and evaluate long-read bioinformatics methods.


Subjects
Asian Continental Ancestry Group/genetics; Databases, Genetic; Genome, Human; Nuclear Family; China; Female; High-Throughput Nucleotide Sequencing; Humans; Male; Sequence Analysis, DNA
14.
Nat Biotechnol ; 37(5): 561-566, 2019 05.
Article in English | MEDLINE | ID: mdl-30936564

ABSTRACT

Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.


Subjects
Benchmarking; Computational Biology/trends; Genome, Human/genetics; Genomics/trends; Genetic Variation/genetics; High-Throughput Nucleotide Sequencing; Humans; INDEL Mutation/genetics; Polymorphism, Single Nucleotide; Software/trends
15.
Nat Biotechnol ; 37(5): 555-560, 2019 05.
Article in English | MEDLINE | ID: mdl-30858580

ABSTRACT

Standardized benchmarking approaches are required to assess the accuracy of variants called from sequence data. Although variant-calling tools and the metrics used to assess their performance continue to improve, important challenges remain. Here, as part of the Global Alliance for Genomics and Health (GA4GH), we present a benchmarking framework for variant calling. We provide guidance on how to match variant calls with different representations, define standard performance metrics, and stratify performance by variant type and genome context. We describe limitations of high-confidence calls and regions that can be used as truth sets (for example, single-nucleotide variant concordance of two methods is 99.7% inside versus 76.5% outside high-confidence regions). Our web-based app enables comparison of variant calls against truth sets to obtain a standardized performance report. Our approach has been piloted in the PrecisionFDA variant-calling challenges to identify the best-in-class variant-calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and evaluating the results.
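
The abstract refers to standard performance metrics without listing them. The Python sketch below shows the textbook definitions of precision, recall and F1 from counts of true positives, false positives and false negatives. It is a simplified illustration with hypothetical names and counts, not the GA4GH tooling, and it assumes a single true-positive count shared by both metrics.

```python
def benchmark_metrics(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) for a callset compared against a truth set.

    true_pos:  calls that match a truth-set variant
    false_pos: calls inside the benchmark regions with no matching truth variant
    false_neg: truth-set variants that were not called
    """
    called = true_pos + false_pos
    in_truth = true_pos + false_neg
    precision = true_pos / called if called else 0.0
    recall = true_pos / in_truth if in_truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical counts for one stratum, e.g. indels in homopolymer regions.
precision, recall, f1 = benchmark_metrics(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Computing the same three numbers separately for each variant type and genome context is what stratifying performance amounts to in practice.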


Subjects
Benchmarking; Exome/genetics; Genome, Human/genetics; High-Throughput Nucleotide Sequencing; Algorithms; Genomics/trends; Germ Cells; Humans; Polymorphism, Single Nucleotide/genetics; Software
16.
Nat Biotechnol ; 37(5): 567, 2019 05.
Article in English | MEDLINE | ID: mdl-30899106

ABSTRACT

In the version of this article initially published online, two pairs of headings were switched with each other in Table 4: "Recall (PCR free)" was switched with "Recall (with PCR)," and "Precision (PCR free)" was switched with "Precision (with PCR)." The error has been corrected in the print, PDF and HTML versions of this article.

17.
J Mol Diagn ; 21(2): 318-329, 2019 03.
Article in English | MEDLINE | ID: mdl-30610921

ABSTRACT

Orthogonal confirmation of next-generation sequencing (NGS)-detected germline variants is standard practice, although published studies have suggested that confirmation of the highest-quality calls may not always be necessary. The key question is how laboratories can establish criteria that consistently identify those NGS calls that require confirmation. Most prior studies addressing this question have had limitations: they have been generally of small scale, omitted statistical justification, and explored limited aspects of underlying data. The rigorous definition of criteria that separate high-accuracy NGS calls from those that may or may not be true remains a crucial issue. We analyzed five reference samples and over 80,000 patient specimens from two laboratories. Quality metrics were examined for approximately 200,000 NGS calls with orthogonal data, including 1662 false positives. A classification algorithm used these data to identify a battery of criteria that flag 100% of false positives as requiring confirmation (CI lower bound, 98.5% to 99.8%, depending on variant type) while minimizing the number of flagged true positives. These criteria identify false positives that the previously published criteria miss. Sampling analysis showed that smaller data sets resulted in less effective criteria. Our methodology for determining test- and laboratory-specific criteria can be generalized into a practical approach that can be used by laboratories to reduce the cost and time burdens of confirmation without affecting clinical accuracy.


Subjects
Genetic Testing/methods; High-Throughput Nucleotide Sequencing/methods; Algorithms; Genetic Variation/genetics; Humans; Sequence Analysis, DNA
18.
F1000Res ; 8: 1751, 2019.
Article in English | MEDLINE | ID: mdl-34386196

ABSTRACT

In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed is reported in this manuscript.

19.
J Mol Diagn ; 20(5): 583-590, 2018 09.
Article in English | MEDLINE | ID: mdl-29959024

ABSTRACT

The National Institute of Standards and Technology has developed reference materials for five human genomes. DNA aliquots are available for purchase, and the data, analyses, and high-confidence small variant and homozygous reference calls are freely available on the web. These reference materials are useful for evaluating whole-genome sequencing methods and also can be used to benchmark targeted sequencing panels, which are used commonly in clinical settings. This article describes how to use the Genome in a Bottle samples to obtain performance metrics on any germline-targeted sequencing panel of interest, as well as the limitations of the reference materials. These materials are useful for understanding the limitations of, and optimizing, targeted sequencing panels and associated bioinformatics pipelines. Example figures are presented to illustrate ways to access the performance metrics of targeted sequencing panels, and a table of best practices is included.


Subjects
High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/standards; Base Sequence; Female; Genetic Loci; Humans; INDEL Mutation/genetics; Male; Polymorphism, Single Nucleotide/genetics; Reference Standards
20.
F1000Res ; 6: 1795, 2017.
Article in English | MEDLINE | ID: mdl-29123647

ABSTRACT

The impact of structural variants (SVs) on a variety of organisms and diseases like cancer has become increasingly evident. Methods for SV detection when studying genomic differences across cells, individuals or populations are being actively developed. Currently, just a few methods are available to compare different SV callsets, and no specialized methods are available to annotate SVs that account for the unique characteristics of these variant types. Here, we introduce SURVIVOR_ant, a tool that compares types and breakpoints for candidate SVs from different callsets and enables fast comparison of SVs to genomic features such as genes and repetitive regions, as well as to previously established SV datasets such as from the 1000 Genomes Project. As proof of concept we compared 16 SV callsets generated by different SV calling methods on a single genome, the Genome in a Bottle sample HG002 (Ashkenazi son), and annotated the SVs with gene annotations, 1000 Genomes Project SV calls, and four different types of repetitive regions. Computation time to annotate 134,528 SVs with 33,954 annotations was 22 seconds on a laptop.

...