Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers.

Hofmann, Ariane L; Behr, Jonas; Singer, Jochen; Kuipers, Jack; Beisel, Christian; Schraml, Peter; Moch, Holger; Beerenwinkel, Niko

Affiliation

Hofmann AL; Department of Biosystems Science and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland.
Behr J; Swiss Institute of Bioinformatics, Mattenstr. 26, 4058 Basel, Switzerland.
Singer J; Department of Biosystems Science and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland.
Kuipers J; Swiss Institute of Bioinformatics, Mattenstr. 26, 4058 Basel, Switzerland.
Beisel C; Department of Biosystems Science and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland.
Schraml P; Swiss Institute of Bioinformatics, Mattenstr. 26, 4058 Basel, Switzerland.
Moch H; Department of Biosystems Science and Engineering, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland.
Beerenwinkel N; Swiss Institute of Bioinformatics, Mattenstr. 26, 4058 Basel, Switzerland.

BMC Bioinformatics; 18(1): 8, 2017 Jan 03.

Article in English | MEDLINE | ID: mdl-28049408

ABSTRACT

BACKGROUND: Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Sequencing costs have dropped tremendously, allowing the sequencing of the whole exome of tumors for just a fraction of the total treatment costs. However, clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations in a cancer tissue sample with very low frequencies, which may be clinically relevant.

RESULTS: Using simulations based on kidney tumor data, we compared the performance of nine state-of-the-art variant callers, namely deepSNV, GATK HaplotypeCaller, GATK UnifiedGenotyper, JointSNVMix2, MuTect, SAMtools, SiNVICT, SomaticSniper, and VarScan2. The comparison was done as a function of variant allele frequencies and coverage. Our analysis revealed that deepSNV and JointSNVMix2 perform very well, especially in the low-frequency range. We attributed false positive and false negative calls of the nine tools to specific error sources and assigned them to processing steps of the pipeline. All of these errors can be expected to occur in real data sets. We found that modifying certain steps of the pipeline or parameters of the tools can lead to substantial improvements in performance. Furthermore, a novel integration strategy that combines the ranks of the variants yielded the best performance. More precisely, the rank-combination of deepSNV, JointSNVMix2, MuTect, SiNVICT and VarScan2 reached a sensitivity of 78% when fixing the precision at 90%, and outperformed all individual tools, where the maximum sensitivity was 71% with the same precision.

CONCLUSIONS: The choice of well-performing tools for alignment and variant calling is crucial for the correct interpretation of exome sequencing data obtained from mixed samples, and common pipelines are suboptimal. We were able to relate observed substantial differences in performance to the underlying statistical models of the tools, and to pinpoint the error sources of false positive and false negative calls. These findings might inspire new software developments that improve exome sequencing pipelines and further the field of precision cancer treatment.
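The rank-combination idea and the "sensitivity at fixed precision" metric mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how ranks are combined, so the mean-rank rule, the penalty rank for variants a caller did not report, and all function names, caller names, and scores below are assumptions made for illustration only, not the authors' method.

```python
from collections import defaultdict


def rank_variants(scores):
    """Rank one caller's variants: rank 1 is the highest score.

    `scores` maps a variant id to a confidence score (higher = more confident).
    Ties are broken by variant id so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda v: (-scores[v], v))
    return {variant: position + 1 for position, variant in enumerate(ordered)}


def combine_ranks(caller_scores):
    """Combine callers by mean rank (an assumed rule, see lead-in).

    A variant that a caller did not report receives that caller's
    worst rank plus one as a penalty.
    """
    all_variants = set().union(*(s.keys() for s in caller_scores.values()))
    totals = defaultdict(float)
    for scores in caller_scores.values():
        ranks = rank_variants(scores)
        penalty = len(ranks) + 1
        for variant in all_variants:
            totals[variant] += ranks.get(variant, penalty)
    n_callers = len(caller_scores)
    return {variant: totals[variant] / n_callers for variant in all_variants}


def sensitivity_at_precision(combined, truth, min_precision):
    """Best sensitivity among top-k cutoffs whose precision >= min_precision."""
    ordered = sorted(combined, key=lambda v: (combined[v], v))
    best = 0.0
    true_positives = 0
    for k, variant in enumerate(ordered, start=1):
        if variant in truth:
            true_positives += 1
        if true_positives / k >= min_precision:
            best = max(best, true_positives / len(truth))
    return best


# Toy, made-up data: two hypothetical callers and a simulated truth set.
caller_scores = {
    "callerA": {"chr1:100": 0.9, "chr1:200": 0.8, "chr2:50": 0.3},
    "callerB": {"chr1:100": 0.7, "chr2:50": 0.6, "chr3:10": 0.2},
}
truth = {"chr1:100", "chr2:50"}

combined = combine_ranks(caller_scores)
sensitivity = sensitivity_at_precision(combined, truth, min_precision=0.9)
print(sorted(combined.items(), key=lambda item: item[1]))
print(sensitivity)
```

In a simulation setting like the one described, the truth set is known by construction, which is what makes a precision-constrained sensitivity of this kind computable for each caller and for the combination.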

Subjects

Exome/genetics; Kidney Neoplasms/genetics; Algorithms; DNA, Neoplasm/chemistry; DNA, Neoplasm/metabolism; Genomics; High-Throughput Nucleotide Sequencing; Humans; Kidney Neoplasms/pathology; Polymorphism, Single Nucleotide; Sequence Analysis, DNA

Keywords

Cancer genomics; Exome sequencing; SNV; Variant caller integration; Variant calling

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Exome / Kidney Neoplasms | Type of study: Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year of publication: 2017 | Document type: Article | Country of affiliation: Switzerland
