Search | BVS Regional Portal

Assessing reproducibility of inherited variants detected with short-read whole genome sequencing.

Pan, Bohu; Ren, Luyao; Onuchic, Vitor; Guan, Meijian; Kusko, Rebecca; Bruinsma, Steve; Trigg, Len; Scherer, Andreas; Ning, Baitang; Zhang, Chaoyang; Glidewell-Kenney, Christine; Xiao, Chunlin; Donaldson, Eric; Sedlazeck, Fritz J; Schroth, Gary; Yavas, Gokhan; Grunenwald, Haiying; Chen, Haodong; Meinholz, Heather; Meehan, Joe; Wang, Jing; Yang, Jingcheng; Foox, Jonathan; Shang, Jun; Miclaus, Kelci; Dong, Lianhua; Shi, Leming; Mohiyuddin, Marghoob; Pirooznia, Mehdi; Gong, Ping; Golshani, Rooz; Wolfinger, Russ; Lababidi, Samir; Sahraeian, Sayed Mohammad Ebrahim; Sherry, Steve; Han, Tao; Chen, Tao; Shi, Tieliu; Hou, Wanwan; Ge, Weigong; Zou, Wen; Guo, Wenjing; Bao, Wenjun; Xiao, Wenzhong; Fan, Xiaohui; Gondo, Yoichi; Yu, Ying; Zhao, Yongmei; Su, Zhenqiang; Liu, Zhichao.

Genome Biol ; 23(1): 2, 2022 01 03.

Article in English | MEDLINE | ID: mdl-34980216

ABSTRACT

BACKGROUND: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. RESULTS: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. CONCLUSIONS: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
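One simple way to quantify the reproducibility described above is the Jaccard index between the call sets of two replicates, computed separately for each variant type. The Python below is a minimal sketch and not the study's actual metric or pipeline. The tuple representation of a variant, the function names, and the class labels are assumptions made for illustration; the 5 bp threshold only mirrors the "> 5 bp" split mentioned in the abstract.

```python
from collections import defaultdict

# A variant is represented as (chrom, pos, ref, alt). This tuple layout is an
# illustrative assumption, not the format used in the study.

def variant_class(ref, alt, large_indel_bp=5):
    """Classify a variant as 'snv', 'indel<=5bp', 'indel>5bp', or 'other'."""
    if len(ref) == len(alt):
        # Equal-length substitutions longer than 1 bp are set aside as 'other'.
        return "snv" if len(ref) == 1 else "other"
    size = abs(len(ref) - len(alt))
    return "indel>5bp" if size > large_indel_bp else "indel<=5bp"

def jaccard_by_class(calls_a, calls_b):
    """Return {class: |A & B| / |A | B|} for two replicate call sets."""
    groups = defaultdict(lambda: (set(), set()))
    for v in calls_a:
        groups[variant_class(v[2], v[3])][0].add(v)
    for v in calls_b:
        groups[variant_class(v[2], v[3])][1].add(v)
    # Every group holds at least one variant, so the union is never empty.
    return {cls: len(a & b) / len(a | b) for cls, (a, b) in groups.items()}

if __name__ == "__main__":
    rep1 = {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"),
            ("chr1", 300, "ACGTACGT", "A")}
    rep2 = {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"),
            ("chr1", 400, "A", "ACGTACGT")}
    for cls, score in sorted(jaccard_by_class(rep1, rep2).items()):
        print(cls, score)
```

Exact tuple matching is used here for brevity; a real comparison would first normalize variant representations so that equivalent indels written differently are not counted as discordant.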

Subject(s)

Human Genome , Single Nucleotide Polymorphism , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Reproducibility of Results , Whole Genome Sequencing

An open resource for accurately benchmarking small variant and reference calls.

Zook, Justin M; McDaniel, Jennifer; Olson, Nathan D; Wagner, Justin; Parikh, Hemang; Heaton, Haynes; Irvine, Sean A; Trigg, Len; Truty, Rebecca; McLean, Cory Y; De La Vega, Francisco M; Xiao, Chunlin; Sherry, Stephen; Salit, Marc.

Nat Biotechnol ; 37(5): 561-566, 2019 05.

Article in English | MEDLINE | ID: mdl-30936564

ABSTRACT

Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.

Subject(s)

Benchmarking , Computational Biology/trends , Human Genome/genetics , Genomics/trends , Genetic Variation/genetics , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation/genetics , Single Nucleotide Polymorphism , Software/trends

Author Correction: Best practices for benchmarking germline small-variant calls in human genomes.

Krusche, Peter; Trigg, Len; Boutros, Paul C; Mason, Christopher E; De La Vega, Francisco M; Moore, Benjamin L; Gonzalez-Porta, Mar; Eberle, Michael A; Tezak, Zivana; Lababidi, Samir; Truty, Rebecca; Asimenos, George; Funke, Birgit; Fleharty, Mark; Chapman, Brad A; Salit, Marc; Zook, Justin M.

Nat Biotechnol ; 37(5): 567, 2019 05.

Article in English | MEDLINE | ID: mdl-30899106

ABSTRACT

In the version of this article initially published online, two pairs of headings were switched with each other in Table 4: "Recall (PCR free)" was switched with "Recall (with PCR)," and "Precision (PCR free)" was switched with "Precision (with PCR)." The error has been corrected in the print, PDF and HTML versions of this article.

Best practices for benchmarking germline small-variant calls in human genomes.

Nat Biotechnol ; 37(5): 555-560, 2019 05.

Article in English | MEDLINE | ID: mdl-30858580

ABSTRACT

Standardized benchmarking approaches are required to assess the accuracy of variants called from sequence data. Although variant-calling tools and the metrics used to assess their performance continue to improve, important challenges remain. Here, as part of the Global Alliance for Genomics and Health (GA4GH), we present a benchmarking framework for variant calling. We provide guidance on how to match variant calls with different representations, define standard performance metrics, and stratify performance by variant type and genome context. We describe limitations of high-confidence calls and regions that can be used as truth sets (for example, single-nucleotide variant concordance of two methods is 99.7% inside versus 76.5% outside high-confidence regions). Our web-based app enables comparison of variant calls against truth sets to obtain a standardized performance report. Our approach has been piloted in the PrecisionFDA variant-calling challenges to identify the best-in-class variant-calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and evaluating the results.
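The standard performance metrics referred to above are commonly taken to be precision, recall, and their harmonic mean (F1), derived from true-positive, false-positive, and false-negative counts. The Python below is a minimal sketch under that assumption. It is not the GA4GH tooling: the function name and the exact-match comparison of `(chrom, pos, ref, alt)` tuples are illustrative choices, and the framework's matching of variants with different representations is not attempted.

```python
import math

def benchmark_metrics(truth, query):
    """Compare a query call set with a truth set by exact tuple matching.

    Both arguments are sets of (chrom, pos, ref, alt) tuples, assumed to be
    already restricted to the high-confidence regions of the truth set.
    """
    tp = len(truth & query)   # calls present in both sets
    fp = len(query - truth)   # calls made but absent from the truth set
    fn = len(truth - query)   # truth variants that were missed
    precision = tp / (tp + fp) if tp + fp else math.nan
    recall = tp / (tp + fn) if tp + fn else math.nan
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else math.nan
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

if __name__ == "__main__":
    truth = {("chr1", p, "A", "G") for p in (10, 20, 30, 40)}
    query = {("chr1", p, "A", "G") for p in (10, 20, 30, 50, 60)}
    print(benchmark_metrics(truth, query))
```

Running the same function on subsets of the calls, for example SNVs only or calls inside a given genome context, gives a simple form of the stratified reporting the abstract describes.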

Subject(s)

Benchmarking , Exome/genetics , Human Genome/genetics , High-Throughput Nucleotide Sequencing , Algorithms , Genomics/trends , Germ Cells , Humans , Single Nucleotide Polymorphism/genetics , Software

Joint variant and de novo mutation identification on pedigrees from high-throughput sequencing data.

Cleary, John G; Braithwaite, Ross; Gaastra, Kurt; Hilbush, Brian S; Inglis, Stuart; Irvine, Sean A; Jackson, Alan; Littin, Richard; Nohzadeh-Malakshah, Sahar; Rathod, Mehul; Ware, David; Trigg, Len; De La Vega, Francisco M.

J Comput Biol ; 21(6): 405-19, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24874280

ABSTRACT

The analysis of whole-genome or exome sequencing data from trios and pedigrees has been successfully applied to the identification of disease-causing mutations. However, most methods used to identify and genotype genetic variants from next-generation sequencing data ignore the relationships between samples, resulting in significant Mendelian errors, false positives and negatives. Here we present a Bayesian network framework that jointly analyzes data from all members of a pedigree simultaneously using Mendelian segregation priors, yet providing the ability to detect de novo mutations in offspring, and is scalable to large pedigrees. We evaluated our method by simulations and analysis of whole-genome sequencing (WGS) data from a 17-individual, 3-generation CEPH pedigree sequenced to 50× average depth. Compared with singleton calling, our family caller produced more high-quality variants and eliminated spurious calls as judged by common quality metrics such as Ti/Tv, Het/Hom ratios, and dbSNP/SNP array data concordance, and by comparing to ground truth variant sets available for this sample. We identify all previously validated de novo mutations in NA12878, concurrent with a 7× precision improvement. Our results show that our method is scalable to large genomics and human disease studies.
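The Mendelian errors mentioned above can be illustrated with a basic consistency check on a trio: a child's genotype is consistent if one allele can come from the father and the other from the mother. The Python below is a minimal sketch of that check for autosomal, diploid genotypes. It is far simpler than the Bayesian network described in the abstract and is not the authors' method; the function names and the encoding of genotypes as pairs of allele indices are assumptions for illustration.

```python
def is_mendelian_consistent(child, father, mother):
    """Return True if the child's diploid genotype can be inherited.

    Each genotype is a pair of allele indices, e.g. (0, 1) for a
    heterozygote. Only autosomal sites are considered in this sketch.
    """
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)

def mendelian_error_rate(trios):
    """Fraction of (child, father, mother) genotype triples that are inconsistent."""
    trios = list(trios)
    if not trios:
        return 0.0
    errors = sum(not is_mendelian_consistent(c, f, m) for c, f, m in trios)
    return errors / len(trios)

if __name__ == "__main__":
    sites = [
        ((0, 1), (0, 0), (0, 1)),  # consistent: alt allele from the mother
        ((0, 1), (0, 0), (0, 0)),  # inconsistent: candidate de novo or error
        ((1, 1), (0, 1), (0, 1)),  # consistent: one alt allele from each parent
        ((1, 1), (1, 1), (0, 0)),  # inconsistent: mother has no alt allele
    ]
    print(mendelian_error_rate(sites))
```

An inconsistent site is either a genotyping error or a genuine de novo mutation. A hard check like this cannot tell the two apart, which is the gap a joint probabilistic model with segregation priors is meant to address.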

Subject(s)

Human Genome , High-Throughput Nucleotide Sequencing , Mutation , Pedigree , DNA Mutational Analysis/methods , Humans

Data mining in bioinformatics using Weka.

Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H.

Bioinformatics ; 20(15): 2479-81, 2004 Oct 12.

Article in English | MEDLINE | ID: mdl-15073010

ABSTRACT

The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. AVAILABILITY: http://www.cs.waikato.ac.nz/ml/weka.

Subject(s)

Algorithms , Artificial Intelligence , Computational Biology/methods , Database Management Systems , Factual Databases , Information Storage and Retrieval/methods , User-Computer Interface , Natural Language Processing , Software
