Evaluation of vicinity-based hidden Markov models for genotype imputation.

Wang, Su; Kim, Miran; Jiang, Xiaoqian; Harmanci, Arif Ozgun

Affiliation

Wang S; Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA.
Kim M; Department of Mathematics, Hanyang University, Seoul, 04763, Republic of Korea.
Jiang X; Center for Secure Artificial Intelligence For hEalthcare (SAFE), School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA.
Harmanci AO; Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA. arif.o.harmanci@uth.tmc.edu.

BMC Bioinformatics ; 23(1): 356, 2022 Aug 29.

Article in English | MEDLINE | ID: mdl-36038834

ABSTRACT

BACKGROUND: The decreasing cost of DNA sequencing has greatly increased our knowledge of genetic variation. While population-scale projects bring important insight into genotype-phenotype relationships, the cost of performing whole-genome sequencing on large samples is still prohibitive. In silico genotype imputation coupled with array-based genotyping is a cost-effective and accurate alternative for genotyping common and uncommon variants. Imputation methods compare the genotypes of the typed variants with large population-specific reference panels and estimate the genotypes of untyped variants by making use of linkage disequilibrium patterns. The most accurate imputation methods are based on the Li-Stephens hidden Markov model (HMM), which treats the sequence of each chromosome as a mosaic of the haplotypes from the reference panel.

RESULTS: Here we assess the accuracy of vicinity-based HMMs, where each untyped variant is imputed using the typed variants in a small window around itself (as small as 1 centimorgan). Locality-based imputation has recently been used by machine learning-based genotype imputation approaches. We assess how the parameters of the vicinity-based HMMs impact imputation accuracy in a comprehensive set of benchmarks and show that vicinity-based HMMs can accurately impute common and uncommon variants.

CONCLUSIONS: Our results indicate that locality-based imputation models can be effectively used for genotype imputation. The parameter settings that we identified can be used in future methods, and vicinity-based HMMs can be used for restructuring and parallelizing new imputation methods. The source code for the vicinity-based HMM implementations is publicly available at https://github.com/harmancilab/LoHaMMer.
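
The vicinity-based idea described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the LoHaMMer source code: the function name, argument names, and default values (`n_e`, `err`, `window_cm`) are hypothetical, and the switch probability uses the common Li-Stephens form 1 - exp(-4 * Ne * d / K), which is assumed here rather than confirmed by the abstract. The sketch imputes one untyped variant of a single haplotype by running the forward-backward algorithm only over the typed variants that fall inside a window centered on that variant.

```python
import math


def vicinity_impute(ref_haps, pos_cm, observed, target,
                    window_cm=1.0, n_e=10000.0, err=1e-4):
    """Posterior probability that the untyped `target` variant carries allele 1.

    ref_haps : reference haplotypes, each a list of 0/1 alleles over all variants
    pos_cm   : genetic position (cM) of every variant
    observed : {variant index: allele} for the typed variants of the query
    All parameter names and defaults are illustrative assumptions.
    """
    k = len(ref_haps)
    half = window_cm / 2.0
    # Vicinity: typed variants inside the window, plus the target itself.
    sites = sorted(
        [i for i in observed
         if i != target and abs(pos_cm[i] - pos_cm[target]) <= half]
        + [target]
    )

    def emission(site, state):
        if site == target:
            return 1.0  # untyped site: no observation, so no evidence
        match = ref_haps[state][site] == observed[site]
        return 1.0 - err if match else err

    def switch_prob(a, b):
        # Assumed Li-Stephens form, with distance converted from cM to Morgans.
        d_morgan = (pos_cm[b] - pos_cm[a]) / 100.0
        return 1.0 - math.exp(-4.0 * n_e * d_morgan / k)

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass, rescaled at every site to avoid underflow.
    fwd = [normalize([emission(sites[0], s) for s in range(k)])]
    for t in range(1, len(sites)):
        tau = switch_prob(sites[t - 1], sites[t])
        prev = fwd[-1]
        jump = tau / k * sum(prev)
        fwd.append(normalize(
            [emission(sites[t], s) * ((1.0 - tau) * prev[s] + jump)
             for s in range(k)]
        ))

    # Backward pass, also rescaled.
    bwd = [[1.0] * k for _ in sites]
    for t in range(len(sites) - 2, -1, -1):
        tau = switch_prob(sites[t], sites[t + 1])
        weighted = [emission(sites[t + 1], s) * bwd[t + 1][s] for s in range(k)]
        jump = tau / k * sum(weighted)
        bwd[t] = normalize([(1.0 - tau) * w + jump for w in weighted])

    # Posterior over reference haplotypes at the target, summed per allele.
    ti = sites.index(target)
    post = normalize([fwd[ti][s] * bwd[ti][s] for s in range(k)])
    return sum(p for s, p in enumerate(post) if ref_haps[s][target] == 1)
```

When no typed variant falls inside the window, the sketch falls back to the reference-panel frequency of allele 1 at the target. Because each target uses only its own window, separate targets can be processed independently of one another.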

Subjects

Polymorphism, Single Nucleotide; Software; Genome-Wide Association Study/methods; Genotype; Haplotypes; Linkage Disequilibrium; Sequence Analysis, DNA/methods

Keywords

Forward-backward algorithm; Genotype imputation; Hidden Markov models; Viterbi algorithm

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Software / Polymorphism, Single Nucleotide | Type of study: Health_economic_evaluation | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year of publication: 2022 | Document type: Article | Affiliation country: United States
