GATK hard filtering: tunable parameters to improve variant calling for next generation sequencing targeted gene panel data.
De Summa, Simona; Malerba, Giovanni; Pinto, Rosamaria; Mori, Antonio; Mijatovic, Vladan; Tommasi, Stefania.
Affiliations
  • De Summa S; IRCCS-Istituto Tumori "Giovanni Paolo II", Molecular Genetics Laboratory, viale Orazio Flacco, 65, 70124, Bari, Italy.
  • Malerba G; Department of Neuroscience, Biomedicine and Movement Sciences, Section of Biology and Genetics, University of Verona, Strada Le Grazie 8, 37135, Verona, Italy. giovanni.malerba@univr.it.
  • Pinto R; IRCCS-Istituto Tumori "Giovanni Paolo II", Molecular Genetics Laboratory, viale Orazio Flacco, 65, 70124, Bari, Italy.
  • Mori A; Department of Neuroscience, Biomedicine and Movement Sciences, Section of Biology and Genetics, University of Verona, Strada Le Grazie 8, 37135, Verona, Italy.
  • Mijatovic V; Department of Neuroscience, Biomedicine and Movement Sciences, Section of Biology and Genetics, University of Verona, Strada Le Grazie 8, 37135, Verona, Italy.
  • Tommasi S; IRCCS-Istituto Tumori "Giovanni Paolo II", Molecular Genetics Laboratory, viale Orazio Flacco, 65, 70124, Bari, Italy.
BMC Bioinformatics ; 18(Suppl 5): 119, 2017 Mar 23.
Article in English | MEDLINE | ID: mdl-28361668
BACKGROUND: NGS technology is a powerful alternative to standard Sanger sequencing in the clinical setting. The proprietary software generally used for variant calling often relies on preset parameters that may not suit every gene. GATK, which is widely used in academia, offers many tunable parameters for variant calling. However, GATK's self-adjusting parameter calibration requires data from a large number of exomes. When such data are not available, which is the usual situation in a diagnostic laboratory, the parameters must be set by the operator (hard filtering). The aim of the present paper was to establish a procedure for choosing the best parameters for GATK hard filtering. To this end, we applied classification trees to true and false variant calls obtained from sequences simulated on the basis of a real dataset.

RESULTS: We simulated two datasets with different coverages, each including all the sequence alterations identified in a real dataset at their observed frequencies. The simulated sequences were aligned with standard protocols, and regression trees were then built to identify the most reliable parameters and cutoff values for discriminating true from false variant calls. Moreover, we analyzed the flanking sequences of regions with a high rate of false-positive calls and observed that these sequences have low complexity.

CONCLUSIONS: Our results show that GATK hard-filtering parameter values can be tailored to the DNA region of interest through a simulation study, improving the accuracy of variant calling.
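The cutoff-selection idea can be illustrated with a minimal sketch, assuming that each simulated call carries numeric annotations and a truth label (true if the variant was planted in the simulated reads). `best_cutoff` finds the threshold on one annotation that minimizes weighted Gini impurity, i.e. the split a classification tree would place at its root if that annotation were the only feature. The annotation names (such as `QD`) are standard GATK INFO fields used here only as placeholders; the abstract does not say which annotations or impurity measure the authors used. `Call`, `best_cutoff`, `passes_hard_filter`, the `("<", cutoff)` rule format, and the use of Shannon entropy as a low-complexity measure are all hypothetical illustrations, not the paper's method.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Call:
    """Hypothetical variant call: annotation values plus a truth label
    known from the simulation (True = variant was planted in the reads)."""
    annotations: Dict[str, float]
    is_true: bool


def gini(n_true: int, n_false: int) -> float:
    """Gini impurity of a node holding n_true true and n_false false calls."""
    n = n_true + n_false
    if n == 0:
        return 0.0
    p = n_true / n
    return 2.0 * p * (1.0 - p)


def best_cutoff(calls: List[Call], param: str) -> Tuple[Optional[float], float]:
    """Return (cutoff, weighted impurity) of the best single split on `param`.

    Calls lacking `param` are ignored. Returns (None, parent impurity)
    when no split improves on the unsplit node.
    """
    usable = [c for c in calls if param in c.annotations]
    n_true = sum(c.is_true for c in usable)
    best: Tuple[Optional[float], float] = (None, gini(n_true, len(usable) - n_true))
    values = sorted({c.annotations[param] for c in usable})
    # Candidate cutoffs are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0
        left = [c for c in usable if c.annotations[param] < cut]
        right = [c for c in usable if c.annotations[param] >= cut]
        lt = sum(c.is_true for c in left)
        rt = sum(c.is_true for c in right)
        impurity = (len(left) * gini(lt, len(left) - lt)
                    + len(right) * gini(rt, len(right) - rt)) / len(usable)
        if impurity < best[1]:
            best = (cut, impurity)
    return best


def passes_hard_filter(call: Call, rules: Dict[str, Tuple[str, float]]) -> bool:
    """Apply hypothetical hard-filter rules: ("<", x) fails a call whose
    value is below x, (">", x) fails a call whose value is above x.
    Missing annotations are skipped (an assumption of this sketch)."""
    for param, (op, cutoff) in rules.items():
        value = call.annotations.get(param)
        if value is None:
            continue
        if op == "<":
            if value < cutoff:
                return False
        elif op == ">":
            if value > cutoff:
                return False
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return True


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of a sequence; low values suggest
    low-complexity flanking sequence. One possible measure, assumed here."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq.upper())
    return -sum((k / n) * math.log2(k / n) for k in counts.values())
```

For example, with four toy calls whose `QD` values are 1.0 and 1.5 (false) and 10.0 and 12.0 (true), `best_cutoff(calls, "QD")` returns `(5.75, 0.0)`, and the rule `{"QD": ("<", 5.75)}` then rejects the two false calls. A real tree would repeat this search over every annotation and recurse into each child node; in practice, the chosen cutoffs would be passed to GATK's VariantFiltration tool.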
Database: MEDLINE | Main subjects: Polymorphism, Genetic / Software / Sequence Analysis, DNA / High-Throughput Nucleotide Sequencing / Mutation | Study type: Guideline / Prognostic_studies | Limits: Humans | Language: English | Publication year: 2017 | Document type: Article

Texto completo: 1 Base de dados: MEDLINE Assunto principal: Polimorfismo Genético / Software / Análise de Sequência de DNA / Sequenciamento de Nucleotídeos em Larga Escala / Mutação Tipo de estudo: Guideline / Prognostic_studies Limite: Humans Idioma: En Ano de publicação: 2017 Tipo de documento: Article