Improving antibody language models with native pairing.
Patterns (N Y); 5(5): 100967, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38800360
ABSTRACT
Existing antibody language models are limited by their use of unpaired antibody sequence data. A recently published dataset of ~1.6 × 10⁶ natively paired human antibody sequences offers a unique opportunity to evaluate how antibody language models are improved by training with native pairs. We trained three baseline antibody language models (BALM), using natively paired (BALM-paired), randomly paired (BALM-shuffled), or unpaired (BALM-unpaired) sequences from this dataset. To address the paucity of paired sequences, we additionally fine-tuned ESM (evolutionary scale modeling)-2 with natively paired antibody sequences (ft-ESM). We provide evidence that training with native pairs allows the model to learn immunologically relevant features that span the light and heavy chains, which cannot be simulated by training with random pairs. We additionally show that training with native pairs improves model performance on a variety of metrics, including the ability of the model to classify antibodies by pathogen specificity.
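The three training regimes described above (native pairs, random pairs, unpaired chains) can be sketched as a small data-preparation step. This is a minimal illustration, not the paper's actual preprocessing: the separator token `<sep>` and the function name `build_corpora` are assumptions for the example, and the published models were trained on far larger corpora with their own tokenization.

```python
import random

def build_corpora(pairs, seed=0):
    """Build illustrative corpora for the three training regimes.

    `pairs` is a list of (heavy_chain, light_chain) amino-acid strings.
    The `<sep>` joining token is a hypothetical choice for this sketch.
    """
    rng = random.Random(seed)

    # Natively paired: heavy and light chains from the same B cell,
    # presented together so the model can learn cross-chain features.
    paired = [f"{h}<sep>{l}" for h, l in pairs]

    # Randomly paired: identical sequence content, but light chains are
    # shuffled across cells, destroying the native cross-chain signal.
    lights = [l for _, l in pairs]
    rng.shuffle(lights)
    shuffled = [f"{h}<sep>{l}" for (h, _), l in zip(pairs, lights)]

    # Unpaired: each chain is an independent training example.
    unpaired = [h for h, _ in pairs] + [l for _, l in pairs]

    return paired, shuffled, unpaired
```

The random-pair corpus is the key control: it matches the paired corpus in length and composition, so any performance gap between the two can be attributed to native pairing rather than to seeing both chain types.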
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Language: English
Journal: Patterns (N Y)
Year: 2024
Document type: Article
Country of affiliation: United States