Using variant databases for variant prioritization and to detect erroneous genotype-phenotype associations.

Broeckx, Bart J G; Peelman, Luc; Saunders, Jimmy H; Deforce, Dieter; Clement, Lieven

Affiliation

Broeckx BJG; Laboratory of Animal Genetics, Faculty of Veterinary Medicine, Ghent University, Heidestraat 19, B-9820, Merelbeke, Belgium. bart.broeckx@ugent.be.
Peelman L; Laboratory of Animal Genetics, Faculty of Veterinary Medicine, Ghent University, Heidestraat 19, B-9820, Merelbeke, Belgium.
Saunders JH; Department of Medical Imaging and Orthopedics, Faculty of Veterinary Medicine, Ghent University, Merelbeke, Belgium.
Deforce D; Laboratory of Pharmaceutical Biotechnology, Faculty of Pharmaceutical Sciences, Ghent University, Ghent, Belgium.
Clement L; Department of Applied Mathematics, Computer Science and Statistics, Faculty of Sciences, Ghent University, Ghent, Belgium.

BMC Bioinformatics; 18(1): 535, 2017 Dec 01.

Article in English | MEDLINE | ID: mdl-29191167

ABSTRACT

BACKGROUND: In the search for novel causal mutations, public and/or private variant databases are nearly always used because they massively reduce the number of putative variants in a single step. In practice, variant filtering is often done by filtering against either all variants in the variant database (the absence-approach, which assumes that disease-causing variants do not reside in variant databases) or only the subset of variants with an allele frequency > 1% (the 1%-approach). We investigate the validity of these two approaches in terms of false negatives (the true disease-causing variant does not pass all filters) and false positives (a harmless mutation passes all filters and is erroneously retained in the list of putative disease-causing variants) and compare them with a novel approach, which we named the quantile-based approach. This approach applies variable instead of static frequency thresholds, which are calculated from prior knowledge of disease prevalence, inheritance model, database size and database characteristics.
RESULTS: Based on real-life data, we demonstrate that the quantile-based approach outperforms the absence-approach in terms of false negatives. At the same time, the quantile-based approach deals more appropriately than the 1%-approach with the variable allele frequencies of disease-causing alleles in variant databases, and as such allows better control of the number of false positives. We also introduce an alternative application of variant databases and the quantile-based approach: flagging disease-causing variants whose frequency in variant databases deviates substantially from the theoretical expectation calculated with the quantile-based approach. For the variants flagged in this way, the association between genotype and phenotype had to be reconsidered in 12 out of 13 cases.
CONCLUSIONS: We developed a novel method and demonstrated that this so-called quantile-based approach is a highly suitable method for variant filtering. In addition, the quantile-based approach can also be used for variant flagging. For user friendliness, lookup tables and easy-to-use R calculators are provided.
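
The idea of a variable, prevalence-dependent threshold can be illustrated with a minimal Python sketch. It is not the authors' R calculator and does not reproduce their method in detail; it only assumes a fully penetrant disease caused by a single allele, Hardy-Weinberg proportions, and a database that is a random sample of the population, so that the allele count in the database follows a binomial distribution. All function names and the default 95% quantile are hypothetical choices made for illustration.

```python
from math import exp, lgamma, log, log1p, sqrt


def allele_frequency(prevalence: float, recessive: bool = True) -> float:
    """Disease-allele frequency implied by prevalence under Hardy-Weinberg.

    Assumption: one fully penetrant causal allele; prevalence lies in (0, 1).
    """
    if recessive:
        return sqrt(prevalence)          # prevalence = q^2
    return 1.0 - sqrt(1.0 - prevalence)  # prevalence = 1 - (1 - q)^2


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log1p(-p))


def quantile_threshold(prevalence: float, n_individuals: int,
                       quantile: float = 0.95, recessive: bool = True) -> int:
    """Smallest allele count k with P(count <= k) >= quantile.

    Assumption: the database holds 2 * n_individuals randomly sampled alleles.
    """
    q = allele_frequency(prevalence, recessive)
    n_alleles = 2 * n_individuals
    cumulative = 0.0
    for k in range(n_alleles + 1):
        cumulative += binom_pmf(k, n_alleles, q)
        if cumulative >= quantile:
            return k
    return n_alleles


def passes_filter(db_allele_count: int, threshold_count: int) -> bool:
    """Keep a candidate variant unless the database holds more copies than
    the threshold allows for a true disease-causing allele."""
    return db_allele_count <= threshold_count


if __name__ == "__main__":
    # Hypothetical inputs: recessive disease, prevalence 1 in 10,000,
    # database of 1,000 individuals.
    threshold = quantile_threshold(prevalence=1e-4, n_individuals=1000)
    print("maximum tolerated allele count:", threshold)
    print("variant seen once is kept:", passes_filter(1, threshold))
```

Under these illustrative inputs, a candidate observed once in the database would be discarded by the absence-approach but retained here, whereas the threshold itself moves with prevalence, inheritance model and database size instead of being fixed at 1%.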

Subjects

Databases, Genetic; Genetic Association Studies; Alleles; Congenital Abnormalities/genetics; Congenital Abnormalities/pathology; Gene Frequency; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide

Keywords

1000 Genomes project variant database; Allele frequency; HapMap; Variant database; Variant filtering; dbSNP

Full text: 1 | Database: MEDLINE | Main subject: Databases, Genetic / Genetic Association Studies | Type of study: Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Year of publication: 2017 | Document type: Article
