The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity.

Grimm, Dominik G; Azencott, Chloé-Agathe; Aicheler, Fabian; Gieraths, Udo; MacArthur, Daniel G; Samocha, Kaitlin E; Cooper, David N; Stenson, Peter D; Daly, Mark J; Smoller, Jordan W; Duncan, Laramie E; Borgwardt, Karsten M

Affiliation

Grimm DG; Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, Tübingen, Germany; Zentrum für Bioinformatik, Eberhard Karls Universität Tübingen, Tübingen, Germany; Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.

Hum Mutat; 36(5): 513-23, 2015 May.

Article in English | MEDLINE | ID: mdl-25684150

ABSTRACT

Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies for exploring complex and Mendelian diseases. A large number of in silico tools have been employed for the task of pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion, LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Given the wealth of these methods, an important practical question is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. Here we demonstrate, in a study of 10 tools on five datasets, that such a comparative evaluation is hindered by two types of circularity: (1) the same variants or (2) different variants from the same protein occur both in the datasets used for training and in those used for evaluating these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate among all tools, and may even outperform optimized combinations of tools.
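The two types of circularity named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not describe. The `Variant` record, the function names, and the toy variants are all hypothetical and serve only to show how overlap between a training set and an evaluation set might be detected at the variant level (type 1) and at the protein level (type 2).

```python
from collections import namedtuple

# Hypothetical record for a missense variant; the field names are assumptions.
Variant = namedtuple("Variant", ["protein", "position", "ref", "alt"])


def type1_overlap(train, test):
    """Type 1 circularity: identical variants present in both sets."""
    return set(train) & set(test)


def type2_overlap(train, test):
    """Type 2 circularity: test variants that are absent from training
    themselves but come from a protein that contributes training variants."""
    train_set = set(train)
    train_proteins = {v.protein for v in train_set}
    return {v for v in test
            if v.protein in train_proteins and v not in train_set}


def protein_disjoint_test(train, test):
    """Keep only test variants whose protein never occurs in training.
    This drops both type 1 and type 2 overlaps."""
    train_proteins = {v.protein for v in train}
    return [v for v in test if v.protein not in train_proteins]


# Toy data, invented for illustration only.
train = [
    Variant("P1", 10, "A", "V"),
    Variant("P1", 55, "G", "R"),
    Variant("P2", 7, "L", "P"),
]
test = [
    Variant("P1", 10, "A", "V"),   # same variant as in training (type 1)
    Variant("P1", 99, "S", "F"),   # new variant, same protein (type 2)
    Variant("P3", 3, "K", "E"),    # protein unseen in training
]

shared_variants = type1_overlap(train, test)
shared_proteins = type2_overlap(train, test)
clean_test = protein_disjoint_test(train, test)
```

Filtering at the protein level is the stricter of the two checks, since any variant shared between the sets necessarily shares its protein as well.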

Subject(s)

Computational Biology/methods; Mutation, Missense; Software; Datasets as Topic; Humans; Internet; Reproducibility of Results; Web Browser

Keywords

exome sequencing; pathogenicity prediction tools

Full text: 1; Collection: 01-international; Database: MEDLINE; Main subject: Software / Computational Biology / Mutation, Missense; Type of study: Prognostic_studies / Risk_factors_studies; Limits: Humans; Language: En; Journal: Hum Mutat; Journal subject: MEDICAL GENETICS; Year: 2015; Document type: Article; Affiliation country: Switzerland
