QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles.
Van der Borght, Koen; Thys, Kim; Wetzels, Yves; Clement, Lieven; Verbist, Bie; Reumers, Joke; van Vlijmen, Herman; Aerssens, Jeroen.
Affiliation
  • Van der Borght K; Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium. kvdborgh@its.jnj.com.
  • Thys K; Interuniversity Institute for Biostatistics and statistical Bioinformatics, Katholieke Universiteit Leuven, B-3000, Leuven, Belgium. kvdborgh@its.jnj.com.
  • Wetzels Y; Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium. kthys@its.jnj.com.
  • Clement L; Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium. ywetzel@its.jnj.com.
  • Verbist B; Ghent University, Applied Mathematics, Informatics and Statistics, B-9000, Ghent, Belgium. lieven.clement@ugent.be.
  • Reumers J; Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium. bverbis2@its.jnj.com.
  • van Vlijmen H; Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium. jreumers@its.jnj.com.
  • Aerssens J; Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium. hvvlijme@its.jnj.com.
BMC Bioinformatics; 16: 379, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26554718
ABSTRACT

BACKGROUND:

Next-generation sequencing enables the study of heterogeneous viral populations in infections. When sequencing is performed at high coverage depth ("deep sequencing"), low-frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier developed for Illumina sequencing platforms that uses the quantiles of the quality scores to distinguish true single nucleotide variants (SNVs) from sequencing errors, based on the estimated SNV probability. To train the model, we created a dataset from an in silico mixture of five HIV-1 plasmids. We compared our method with the existing methods LoFreq, ShoRAH, and V-Phaser 2 on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset.
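The abstract does not spell out the model's features or coefficients, so the following is only a minimal Python sketch of the general idea, assuming that each candidate variant is scored from a few quantiles (here, the quartiles) of the Phred quality scores of the bases supporting it, and that a logistic function maps a weighted sum of those quantiles to an SNV probability. The helper names (`quantile`, `sigmoid`, `snv_probability`), the choice of quartiles, and the coefficients are hypothetical and are not QQ-SNV's fitted model.

```python
import math


def quantile(values, p):
    """Quantile of `values` at probability p (0..1), linear interpolation."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def snv_probability(variant_quals, intercept, weights, probs=(0.25, 0.5, 0.75)):
    """Hypothetical logistic model: P(true SNV) from quality-score quantiles.

    variant_quals: Phred scores of the bases supporting the candidate variant.
    intercept, weights: illustrative coefficients, NOT QQ-SNV's fitted values.
    """
    if not variant_quals:
        raise ValueError("need at least one quality score")
    # Feature vector: one quality quantile per entry in `probs`.
    features = [quantile(variant_quals, p) for p in probs]
    z = intercept + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


if __name__ == "__main__":
    # Hypothetical coefficients chosen only to make the example readable.
    b0, w = -10.0, (0.1, 0.2, 0.1)
    print(snv_probability([38] * 20, b0, w))  # variant supported by high-quality bases
    print(snv_probability([10] * 20, b0, w))  # variant supported by low-quality bases
```

Because the output is a probability, calling a variant reduces to choosing a cutoff on it; a variant supported mostly by low-quality bases receives a low score and looks more like a sequencing error.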

RESULTS:

For default application of QQ-SNV, variants were called using an SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve sensitivity, we used an SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, called SNVs were discarded when their frequency was below the 80th percentile of the distribution of error frequencies (QQ-SNV(HS-P80)). On the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches, while QQ-SNV(HS) was more sensitive on all test sets but produced more false positives. By balancing sensitivity and specificity, QQ-SNV(HS-P80) was the most accurate method across all test sets. When applied to a paired-end HCV sequencing study with a lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) achieved a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. Finally, on a clinical sample, QQ-SNV(HS-P80) consistently detected four putative true variants with frequencies below 0.5% in data from different generations of Illumina sequencers.
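The three calling modes can be sketched as threshold rules applied to the classifier output. This minimal Python sketch assumes that each candidate already carries an estimated SNV probability and an observed variant frequency, and that the error-frequency distribution is supplied by the caller; the abstract does not say how QQ-SNV builds that distribution. The `Candidate` type and the `percentile` and `call_snvs` helpers are hypothetical, the percentile uses linear interpolation (which may differ from QQ-SNV's definition), and treating both cutoffs as inclusive is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    position: int
    base: str
    probability: float  # estimated SNV probability from the classifier
    frequency: float    # observed variant frequency at this position (0..1)


def percentile(values, pct):
    """Percentile (0..100) with linear interpolation between closest ranks."""
    s = sorted(values)
    pos = pct / 100 * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def call_snvs(candidates, mode, error_freqs=None):
    """Hypothetical calling rules for the D, HS and HS-P80 modes."""
    if mode == "D":
        cutoff, use_p80 = 0.5, False
    elif mode == "HS":
        cutoff, use_p80 = 0.0001, False
    elif mode == "HS-P80":
        cutoff, use_p80 = 0.0001, True
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Step 1: keep candidates whose SNV probability reaches the cutoff.
    calls = [c for c in candidates if c.probability >= cutoff]
    if use_p80:
        if not error_freqs:
            raise ValueError("HS-P80 needs a non-empty error-frequency distribution")
        # Step 2: discard calls whose frequency is below the 80th percentile
        # of the (caller-supplied) error-frequency distribution.
        threshold = percentile(error_freqs, 80)
        calls = [c for c in calls if c.frequency >= threshold]
    return calls


if __name__ == "__main__":
    cands = [
        Candidate(101, "A", 0.9, 0.02),
        Candidate(102, "G", 0.01, 0.006),
        Candidate(103, "T", 0.001, 0.003),
        Candidate(104, "C", 0.00001, 0.05),
    ]
    errs = [0.001, 0.002, 0.003, 0.004, 0.005]  # illustrative error frequencies
    for mode in ("D", "HS", "HS-P80"):
        print(mode, [c.position for c in call_snvs(cands, mode, errs)])
```

Lowering the probability cutoff admits more candidates (higher sensitivity), and the percentile filter then removes low-frequency calls that sit within the range of typical error frequencies, which is the trade-off the HS-P80 mode is described as balancing.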

CONCLUSIONS:

We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
Subjects

Full text: 1 | Database: MEDLINE | Main subject: Software / HIV Infections / HIV-1 / Hepatitis C / Hepacivirus / Polymorphism, Single Nucleotide / High-Throughput Nucleotide Sequencing | Study type: Diagnostic studies / Prognostic studies | Limits: Humans | Language: English | Publication year: 2015 | Document type: Article
