Efficient HLA imputation from sequential SNPs data by transformer.

Tanaka, Kaho; Kato, Kosuke; Nonaka, Naoki; Seita, Jun

Affiliation

Tanaka K; Faculty of Engineering, Kyoto University, Kyoto, Japan.
Kato K; Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Tokyo, Japan.
Nonaka N; Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Tokyo, Japan.
Seita J; Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Tokyo, Japan.

J Hum Genet ; 69(10): 533-540, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39095607

ABSTRACT

Human leukocyte antigen (HLA) genes are associated with a variety of diseases, yet direct typing of HLA alleles is both time-consuming and costly. Consequently, various imputation methods that use sequential single nucleotide polymorphism (SNP) data have been proposed, employing either statistical or deep learning models, such as the convolutional neural network (CNN)-based model DEEP*HLA. However, these methods show limited imputation efficiency for infrequent alleles and require a large reference dataset. In this context, we developed a Transformer-based model for HLA allele imputation, named "HLA Reliable IMputatioN by Transformer (HLARIMNT)", designed to exploit the sequential nature of SNP data. We evaluated HLARIMNT's performance using two distinct reference panels, the Pan-Asian reference panel (n = 530) and the Type 1 Diabetes Genetics Consortium (T1DGC) reference panel (n = 5225), alongside a combined panel (n = 1060). HLARIMNT demonstrated higher accuracy than DEEP*HLA across several indices, particularly for infrequent alleles. Furthermore, we explored the impact of varying training data sizes on imputation accuracy, finding that HLARIMNT consistently outperformed DEEP*HLA across all data sizes. These findings suggest that Transformer-based models can efficiently impute not only HLA types but potentially other gene types from sequential SNP data.

Subject(s)

Alleles; HLA Antigens; Polymorphism, Single Nucleotide; Humans; HLA Antigens/genetics; Gene Frequency; Diabetes Mellitus, Type 1/genetics; Algorithms; Neural Networks, Computer; Genome-Wide Association Study/methods

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Polymorphism, Single Nucleotide / Alleles / HLA Antigens | Limits: Humans | Language: En | Journal: J Hum Genet / J. hum. genet / Journal of human genetics | Journal subject: MEDICAL GENETICS | Year: 2024 | Document type: Article | Affiliation country: Japan
