Pesquisa | Secretaria de Estado da Saúde

Benchmarking variant callers in next-generation and third-generation sequencing analysis.

Pei, Surui; Liu, Tao; Ren, Xue; Li, Weizhong; Chen, Chongjian; Xie, Zhi.

Brief Bioinform ; 22(3)2021 05 20.

Artigo em Inglês | MEDLINE | ID: mdl-32698196

RESUMO

DNA variants represent an important source of genetic variations among individuals. Next- generation sequencing (NGS) is the most popular technology for genome-wide variant calling. Third-generation sequencing (TGS) has also recently been used in genetic studies. Although many variant callers are available, no single caller can call both types of variants on NGS or TGS data with high sensitivity and specificity. In this study, we systematically evaluated 11 variant callers on 12 NGS and TGS datasets. For germline variant calling, we tested DNAseq and DNAscope modes from Sentieon, HaplotypeCaller mode from GATK and WGS mode from DeepVariant. All the four callers had comparable performance on NGS data and 30× coverage of WGS data was recommended. For germline variant calling on TGS data, we tested DNAseq mode from Sentieon, HaplotypeCaller mode from GATK and PACBIO mode from DeepVariant. All the three callers had similar performance in SNP calling, while DeepVariant outperformed the others in InDel calling. TGS detected more variants than NGS, particularly in complex and repetitive regions. For somatic variant calling on NGS, we tested TNscope and TNseq modes from Sentieon, MuTect2 mode from GATK, NeuSomatic, VarScan2, and Strelka2. TNscope and Mutect2 outperformed the other callers. A higher proportion of tumor sample purity (from 10 to 20%) significantly increased the recall value of calling. Finally, computational costs of the callers were compared and Sentieon required the least computational cost. These results suggest that careful selection of a tool and parameters is needed for accurate SNP or InDel calling under different scenarios.

Assuntos

Bases de Dados de Ácidos Nucleicos , Sequenciamento de Nucleotídeos em Larga Escala , Mutação INDEL , Benchmarking , Linhagem Celular , Biologia Computacional , Feminino , Genoma Humano , Humanos

Analyzing Low-Level mtDNA Heteroplasmy-Pitfalls and Challenges from Bench to Benchmarking.

Fazzini, Federica; Fendt, Liane; Schönherr, Sebastian; Forer, Lukas; Schöpf, Bernd; Streiter, Gertraud; Losso, Jamie Lee; Kloss-Brandstätter, Anita; Kronenberg, Florian; Weissensteiner, Hansi.

Int J Mol Sci ; 22(2)2021 Jan 19.

Artigo em Inglês | MEDLINE | ID: mdl-33477827

RESUMO

Massive parallel sequencing technologies are promising a highly sensitive detection of low-level mutations, especially in mitochondrial DNA (mtDNA) studies. However, processes from DNA extraction and library construction to bioinformatic analysis include several varying tasks. Further, there is no validated recommendation for the comprehensive procedure. In this study, we examined potential pitfalls on the sequencing results based on two-person mtDNA mixtures. Therefore, we compared three DNA polymerases, six different variant callers in five mixtures between 50% and 0.5% variant allele frequencies generated with two different amplification protocols. In total, 48 samples were sequenced on Illumina MiSeq. Low-level variant calling at the 1% variant level and below was performed by comparing trimming and PCR duplicate removal as well as six different variant callers. The results indicate that sensitivity, specificity, and precision highly depend on the investigated polymerase but also vary based on the analysis tools. Our data highlight the advantage of prior standardization and validation of the individual laboratory setup with a DNA mixture model. Finally, we provide an artificial heteroplasmy benchmark dataset that can help improve somatic variant callers or pipelines, which may be of great interest for research related to cancer and aging.

Assuntos

Envelhecimento/genética , DNA Mitocondrial/genética , DNA Polimerase Dirigida por DNA/genética , Heteroplasmia/genética , Benchmarking , Predisposição Genética para Doença , Variação Genética/genética , Genoma Mitocondrial/genética , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Mitocôndrias/genética , Mutação/genética , Análise de Sequência de DNA

Evaluation of variant calling algorithms for wastewater-based epidemiology using mixed populations of SARS-CoV-2 variants in synthetic and wastewater samples.

Bassano, Irene; Ramachandran, Vinoy K; Khalifa, Mohammad S; Lilley, Chris J; Brown, Mathew R; van Aerle, Ronny; Denise, Hubert; Rowe, William; George, Airey; Cairns, Edward; Wierzbicki, Claudia; Pickwell, Natalie D; Carlile, Matthew; Holmes, Nadine; Payne, Alexander; Loose, Matthew; Burke, Terry A; Paterson, Steve; Wade, Matthew J; Grimsley, Jasmine M S.

Microb Genom ; 9(4)2023 04.

Artigo em Inglês | MEDLINE | ID: mdl-37074153

RESUMO

Wastewater-based epidemiology has been used extensively throughout the COVID-19 (coronavirus disease 19) pandemic to detect and monitor the spread and prevalence of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) and its variants. It has proven an excellent, complementary tool to clinical sequencing, supporting the insights gained and helping to make informed public-health decisions. Consequently, many groups globally have developed bioinformatics pipelines to analyse sequencing data from wastewater. Accurate calling of mutations is critical in this process and in the assignment of circulating variants; yet, to date, the performance of variant-calling algorithms in wastewater samples has not been investigated. To address this, we compared the performance of six variant callers (VarScan, iVar, GATK, FreeBayes, LoFreq and BCFtools), used widely in bioinformatics pipelines, on 19 synthetic samples with known ratios of three different SARS-CoV-2 variants of concern (VOCs) (Alpha, Beta and Delta), as well as 13 wastewater samples collected in London between the 15th and 18th December 2021. We used the fundamental parameters of recall (sensitivity) and precision (specificity) to confirm the presence of mutational profiles defining specific variants across the six variant callers. Our results show that BCFtools, FreeBayes and VarScan found the expected variants with higher precision and recall than GATK or iVar, although the latter identified more expected defining mutations than other callers. LoFreq gave the least reliable results due to the high number of false-positive mutations detected, resulting in lower precision. Similar results were obtained for both the synthetic and wastewater samples.

Assuntos

COVID-19 , SARS-CoV-2 , Humanos , SARS-CoV-2/genética , COVID-19/epidemiologia , Vigilância Epidemiológica Baseada em Águas Residuárias , Águas Residuárias , Algoritmos

Using whole genome sequence to compare variant callers and breed differences of US sheep.

Stegemiller, Morgan R; Redden, Reid R; Notter, David R; Taylor, Todd; Taylor, J Bret; Cockett, Noelle E; Heaton, Michael P; Kalbfleisch, Theodore S; Murdoch, Brenda M.

Front Genet ; 13: 1060882, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-36685812

RESUMO

As whole genome sequence (WGS) data sets have become abundant and widely available, so has the need for variant detection and scoring. The aim of this study was to compare the accuracy of commonly used variant calling programs, Freebayes and GATK HaplotypeCaller (GATK-HC), and to use U.S. sheep WGS data sets to identify novel breed-associated SNPs. Sequence data from 145 sheep consisting of 14 U.S. breeds were filtered and biallelic single nucleotide polymorphisms (SNPs) were retained for genotyping analyses. Genotypes from both programs were compared to each other and to genotypes from bead arrays. The SNPs from WGS were compared to the bead array data with breed heterozygosity, principal component analysis and identifying breed associated SNPs to analyze genetic diversity. The average sequence read depth was 2.78 reads greater with 6.11% more SNPs being identified in Freebayes compared to GATK-HC. The genotype concordance of the variant callers to bead array data was 96.0% and 95.5% for Freebayes and GATK-HC, respectively. Genotyping with WGS identified 10.5 million SNPs from all 145 sheep. This resulted in an 8% increase in measured heterozygosity and greater breed separation in the principal component analysis compared to the bead array analysis. There were 1,849 SNPs identified in only the Romanov sheep where all 10 rams were homozygous for one allele and the remaining 135 sheep from 13 breeds were homozygous for the opposite allele. Both variant calling programs had greater than 95% concordance of SNPs with bead array data, and either was suitably accurate for ovine WGS data sets. The use of WGS SNPs improved the resolution of PCA analysis and was critical for identifying Romanov breed-associated SNPs. Subsets of such SNPs could be used to estimate germplasm composition in animals without pedigree information.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa