Evaluating NetMHCpan performance on non-European HLA alleles not present in training data.
Atkins, Thomas Karl; Solanki, Arnav; Vasmatzis, George; Cornette, James; Riedel, Marc.
Affiliation
  • Atkins TK; Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, United States.
  • Solanki A; Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, United States.
  • Vasmatzis G; Biomarker Discovery Group, Mayo Clinic, Center for Individualized Medicine, Rochester, MN, United States.
  • Cornette J; Department of Mathematics, Iowa State University, Ames, IA, United States.
  • Riedel M; Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, United States.
Front Immunol ; 14: 1288105, 2023.
Article in En | MEDLINE | ID: mdl-38292493
ABSTRACT
Bias in the datasets used to train neural network models has been observed to decrease prediction accuracy for groups underrepresented in the training data. Thus, investigating the composition of training datasets used in machine learning models with healthcare applications is vital to ensure equity. Two such machine learning models are NetMHCpan-4.1 and NetMHCIIpan-4.0, used to predict antigen binding scores to major histocompatibility complex class I and II molecules, respectively. As antigen presentation is a critical step in mounting the adaptive immune response, previous work has used these or similar prediction models in a broad array of applications, from explaining asymptomatic viral infection to cancer neoantigen prediction. However, these models have also been shown to be biased toward hydrophobic peptides, suggesting the networks could also contain other sources of bias. Here, we report that the composition of the networks' training datasets is heavily biased toward European Caucasian individuals and against Asian and Pacific Islander individuals. We test the ability of NetMHCpan-4.1 and NetMHCIIpan-4.0 to distinguish true binders from randomly generated peptides on alleles not included in the training datasets. Unexpectedly, we fail to find evidence that the disparities in training data lead to a meaningful difference in prediction quality for alleles not present in the training data. We attempt to explain this result by mapping the HLA sequence space to determine the sequence diversity of the training dataset. Furthermore, we link the residues that have the greatest impact on NetMHCpan predictions to structural features for three alleles (HLA-A*3401, HLA-C*0403, HLA-DRB1*1202).
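The binder-versus-random-peptide test mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not call NetMHCpan: the helper names (`random_peptides`, `roc_auc`), the 9-residue peptide length, and the example scores are assumptions made only for illustration. It draws decoy peptides uniformly from the 20 standard amino acids and summarizes how well a set of prediction scores separates known binders from decoys using the area under the ROC curve.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptides(n, length=9, seed=0):
    """Generate n decoy peptides with residues drawn uniformly at random.

    The default length of 9 and the uniform residue distribution are
    assumptions for this sketch, not details taken from the abstract.
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(n)
    ]


def roc_auc(binder_scores, decoy_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen binder receives a
    higher score than a randomly chosen decoy (ties count as one half).
    Assumes higher scores mean stronger predicted binding.
    """
    if not binder_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for b in binder_scores:
        for d in decoy_scores:
            if b > d:
                wins += 1.0
            elif b == d:
                wins += 0.5
    return wins / (len(binder_scores) * len(decoy_scores))


# Hypothetical scores standing in for predictor output on one allele.
example_binder_scores = [0.9, 0.8, 0.4]
example_decoy_scores = [0.5, 0.3, 0.1]
example_auc = roc_auc(example_binder_scores, example_decoy_scores)
```

An AUC near 1 indicates that binders are ranked above decoys almost always, while a value near 0.5 indicates no separation. Comparing such per-allele values between population groups is one way a difference in prediction quality could be checked.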

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Genes, MHC Class I / Histocompatibility Antigens Class I Type of study: Prognostic_studies Limits: Humans Language: En Journal: Front Immunol Year: 2023 Document type: Article Affiliation country: United States