Improving variant calling using population data and deep learning.
BMC Bioinformatics; 24(1): 197, 2023 May 12.
Article in English | MEDLINE | ID: mdl-37173615
Large-scale population variant data is often used to filter and aid interpretation of variant calls in a single sample. These approaches do not incorporate population information directly into the process of variant calling, and are often limited to filtering, which trades recall for precision. In this study, we develop population-aware DeepVariant models with a new channel encoding allele frequencies from the 1000 Genomes Project. This model reduces variant calling errors, improving both precision and recall in single samples, and reduces rare homozygous and pathogenic ClinVar calls cohort-wide. We assess the use of population-specific or diverse reference panels and find the greatest accuracy with diverse panels, suggesting that large, diverse panels are preferable to individual populations, even when the population matches the sample's ancestry. Finally, we show that this benefit generalizes to samples whose ancestry differs from the training data, even when that ancestry is also excluded from the reference panel.
Databases: MEDLINE
Main subject: Deep Learning
Type of study: Prognostic studies
Limits: Humans
Language: English
Journal: BMC Bioinformatics
Journal subject: Medical Informatics
Year of publication: 2023
Document type: Article
Country of affiliation: United States