ABSTRACT
MOTIVATION: Neoantigen vaccines make use of tumor-specific mutations to enable the patient's immune system to recognize and eliminate cancer. Selecting vaccine elements, however, is a complex task which needs to take into account not only the underlying antigen presentation pathway but also tumor heterogeneity. RESULTS: Here, we present NeoAgDT, a two-step approach consisting of: (i) simulating individual cancer cells to create a digital twin of the patient's tumor cell population and (ii) optimizing the vaccine composition by integer linear programming based on this digital twin. NeoAgDT shows improved selection of experimentally validated neoantigens over ranking-based approaches in a study of seven patients. AVAILABILITY AND IMPLEMENTATION: The NeoAgDT code is published on Github: https://github.com/nec-research/neoagdt.
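The selection step in (ii) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the NeoAgDT code: each simulated cell is reduced to the set of candidate vaccine elements it presents, the objective is taken to be the number of simulated cells covered by at least one selected element, and the budget is a simple cap on the number of elements. For brevity the sketch enumerates all subsets by brute force instead of formulating an integer linear program, which is only feasible for very small inputs. The function name `select_vaccine` is hypothetical.

```python
from itertools import combinations


def select_vaccine(cells, budget):
    """Pick at most `budget` elements covering the most simulated cells.

    cells: list of sets; each set holds the candidate elements that one
           simulated cell presents (toy stand-in for a digital twin).
    Brute-force enumeration, not an ILP: assumed here for illustration only.
    """
    candidates = sorted(set().union(*cells))
    best, best_covered = (), 0
    for size in range(1, min(budget, len(candidates)) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            # A cell counts as covered if it presents any chosen element.
            covered = sum(1 for presented in cells if presented & chosen)
            if covered > best_covered:
                best, best_covered = subset, covered
    return best, best_covered


# Made-up toy population of six simulated cells; the last presents nothing.
cells = [{"A", "B"}, {"B"}, {"C"}, {"A"}, {"C", "D"}, set()]
chosen, covered = select_vaccine(cells, budget=2)
print(chosen, covered)
```

In this toy population no single element covers more than two cells, while a pair of complementary elements covers four. This hints at why optimizing over the whole cell population can differ from ranking elements one at a time.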
Subject(s)
Antigens, Neoplasm , Cancer Vaccines , Neoplasms , Software , Humans , Cancer Vaccines/immunology , Neoplasms/immunology , Antigens, Neoplasm/immunology , Mutation , Computer Simulation , Computational Biology/methods , Algorithms
ABSTRACT
MOTIVATION: We present a multi-sequence generalization of Variational Information Bottleneck and call the resulting model Attentive Variational Information Bottleneck (AVIB). Our AVIB model leverages multi-head self-attention to implicitly approximate a posterior distribution over latent encodings conditioned on multiple input sequences. We apply AVIB to a fundamental immuno-oncology problem: predicting the interactions between T-cell receptors (TCRs) and peptides. RESULTS: Experimental results on various datasets show that AVIB significantly outperforms state-of-the-art methods for TCR-peptide interaction prediction. Additionally, we show that the latent posterior distribution learned by AVIB is particularly effective for the unsupervised detection of out-of-distribution amino acid sequences. AVAILABILITY AND IMPLEMENTATION: The code and the data used for this study are publicly available at: https://github.com/nec-research/vibtcr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Peptides , Software , Amino Acid Sequence , Receptors, Antigen, T-Cell/genetics
ABSTRACT
SUMMARY: The ability of a T cell to recognize foreign peptides is defined by a single α and a single β hypervariable complementarity determining region (CDR3), which together form the T-cell receptor (TCR) heterodimer. In ~30-35% of T cells, two α chains are expressed at the mRNA level but only one α chain is part of the functional TCR. This effect can also be observed for β chains, although it is less common. The identification of functional α/β chain pairs is instrumental in high-throughput characterization of therapeutic TCRs. TCRpair is the first method that predicts whether an α and β chain pair forms a functional, HLA-A*02:01-specific TCR without requiring the sequence of a recognized peptide. By taking additional amino acids flanking the CDR3 regions into account, TCRpair achieves an AUC of 0.71. AVAILABILITY AND IMPLEMENTATION: TCRpair is implemented in Python using TensorFlow 2.0 and is freely available at https://www.github.com/amoesch/TCRpair. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Receptors, Antigen, T-Cell, alpha-beta , Receptors, Antigen, T-Cell , Amino Acid Sequence , Receptors, Antigen, T-Cell, alpha-beta/chemistry , Receptors, Antigen, T-Cell, alpha-beta/genetics , Receptors, Antigen, T-Cell, alpha-beta/metabolism , Receptors, Antigen, T-Cell/chemistry , T-Lymphocytes/metabolism , Complementarity Determining Regions/chemistry , Complementarity Determining Regions/genetics , Peptides , HLA-A Antigens/metabolism
ABSTRACT
BACKGROUND: Adoptive immunotherapy offers great potential for treating many types of cancer but its clinical application is hampered by cross-reactive T cell responses in healthy human tissues, representing serious safety risks for patients. We previously developed a computational tool called Expitope for assessing cross-reactivity (CR) of antigens based on tissue-specific gene expression. However, transcript abundance only indirectly indicates protein expression. The recent availability of proteome-wide human protein abundance information now facilitates a more direct approach for CR prediction. Here we present a new version 2.0 of Expitope, which computes all naturally possible epitopes of a peptide sequence and the corresponding CR indices using both protein and transcript abundance levels weighted by a proposed hierarchy of importance of various human tissues. RESULTS: We tested the tool in two case studies: The first study quantitatively assessed the potential CR of the epitopes used for cancer immunotherapy. The second study evaluated HLA-A*02:01-restricted epitopes obtained from the Immune Epitope Database for different disease groups and demonstrated for the first time that there is a high variation in the background CR depending on the disease state of the host: compared to a healthy individual the CR index is on average two-fold higher for the autoimmune state, and five-fold higher for the cancer state. CONCLUSIONS: The ability to predict potential side effects in normal tissues helps in the development and selection of safer antigens, enabling more successful immunotherapy of cancer and other diseases.
Subject(s)
Databases, Protein , Disease , Epitopes, T-Lymphocyte/immunology , Immunotherapy , Proteins/immunology , Software , T-Lymphocytes/immunology , Histocompatibility Antigens Class I/immunology , Humans , Internet , Peptide Fragments/immunology
ABSTRACT
Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets that include peptide sequences also present in the training set. In this work, we investigate how state-of-the-art deep learning models for TCR-peptide/-pMHC binding prediction generalize to unseen peptides. We create a dataset including positive samples from IEDB, VDJdb, McPAS-TCR, and the MIRA set, as well as negative samples from both randomization and 10X Genomics assays. We name this collection of samples TChard. We propose the hard split, a simple heuristic for training/test split, which ensures that test samples exclusively present peptides that do not belong to the training set. We investigate the effect of different training/test splitting techniques on the models' test performance, as well as the effect of training and testing the models using mismatched negative samples generated randomly, in addition to the negative samples derived from assays. Our results show that modern deep learning methods fail to generalize to unseen peptides. We provide an explanation of why this happens and verify our hypothesis on the TChard dataset. We then conclude that robust prediction of TCR recognition is still far from being solved.
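The idea behind the hard split can be sketched in a few lines. This is a minimal illustration under stated assumptions, not necessarily the authors' procedure: samples are assumed to be `(tcr, peptide, label)` tuples, whole peptide groups are assigned to the test set until it reaches a target fraction of the samples, and the function name `hard_split`, its parameters, and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def hard_split(samples, test_fraction=0.2, seed=0):
    """Split samples so that no test peptide also occurs in training.

    samples: list of (tcr, peptide, label) tuples (assumed layout).
    Peptides are shuffled, then whole peptide groups go to the test set
    until it holds roughly `test_fraction` of all samples.
    """
    by_peptide = defaultdict(list)
    for sample in samples:
        by_peptide[sample[1]].append(sample)

    peptides = sorted(by_peptide)
    random.Random(seed).shuffle(peptides)

    target = test_fraction * len(samples)
    train, test = [], []
    for peptide in peptides:
        # Never split one peptide's samples across the two sets.
        bucket = test if len(test) < target else train
        bucket.extend(by_peptide[peptide])
    return train, test


# Made-up toy rows for illustration; they are not taken from TChard.
samples = [
    ("TCR_1", "PEPTIDE_A", 1),
    ("TCR_2", "PEPTIDE_A", 0),
    ("TCR_3", "PEPTIDE_B", 1),
    ("TCR_4", "PEPTIDE_B", 0),
    ("TCR_5", "PEPTIDE_C", 1),
    ("TCR_6", "PEPTIDE_C", 0),
]
train, test = hard_split(samples, test_fraction=0.3)
train_peptides = {peptide for _, peptide, _ in train}
test_peptides = {peptide for _, peptide, _ in test}
print(sorted(test_peptides), "held out from", sorted(train_peptides))
```

Because assignment happens per peptide, the realized test fraction only approximates `test_fraction`. A random split over individual samples would instead leak test peptides into training, which is the situation the abstract warns about.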
Subject(s)
Peptides , Receptors, Antigen, T-Cell , Receptors, Antigen, T-Cell/metabolism , Protein Binding , Peptides/metabolism
ABSTRACT
In recent years, immunotherapies have shown tremendous success as treatments for multiple types of cancer. However, there are still many obstacles to overcome in order to increase response rates and identify effective therapies for every individual patient. Since there are many possibilities to boost a patient's immune response against a tumor and not all can be covered, this review is focused on T cell receptor-mediated therapies. CD8+ T cells can detect and destroy malignant cells by binding to peptides presented on cell surfaces by MHC (major histocompatibility complex) class I molecules. CD4+ T cells can also mediate powerful immune responses, but their recognition of peptides presented by MHC class II molecules is more complex, which is why the attention has been focused on CD8+ T cells. Therapies based on the power of T cells can, on the one hand, enhance T cell recognition by introducing TCRs that preferentially direct T cells to tumor sites (so-called TCR-T therapy) or through vaccination to induce T cells in vivo. On the other hand, T cell activity can be improved by immune checkpoint inhibition or other means that help create a microenvironment favorable for cytotoxic T cell activity. The manifold ways in which the immune system and cancer interact with each other not only require the use of large omics datasets from gene, to transcript, to protein, and to peptide but also make the application of machine learning methods inevitable. Currently, discovering and selecting suitable TCRs is a very costly and labor-intensive in vitro process. To facilitate this process and to additionally allow for highly personalized therapies that can simultaneously target multiple patient-specific antigens, especially neoepitopes, breakthrough computational methods for predicting antigen presentation and TCR binding are urgently required. In particular, potential cross-reactivity is a major consideration since off-target toxicity can pose a major threat to patient safety.
The speed at which datasets grow and become publicly available, and at which new machine learning methods evolve, ensures that computational approaches will be able to help solve the problems that immunotherapies still face.
ABSTRACT
BACKGROUND: Human endogenous retroviruses (HERVs) are flanked by long terminal repeats (LTRs), which possess promoter activity and can therefore influence the expression of neighboring genes. HERV involvement in different types of cancer has already been thoroughly documented. However, so far there has been no systematic study of HERV expression patterns in a multitude of cell types in health and disease. In particular, the publication of the comprehensive ENCODE dataset has already facilitated many gene expression studies, but none so far focusing exclusively on HERVs. RESULTS: We present a comprehensive differential analysis of HERV expression based on ENCODE Tier 1 and Tier 2 RNA-seq data produced by Cold Spring Harbor Laboratories and the California Institute of Technology. This analysis was conducted for individual HERV loci and for entire HERV families in twelve different cell lines, of which six correspond to the normal condition and the other six represent cancer cell types. Although the principal component analysis revealed that the two groups of cells show distinguishable expression patterns, we were not able to link these differences to one or more particular HERV families. Two samples exhibit expression patterns that are not similar to those of the corresponding cell lines from the other producing lab. Instead, they show signs of cancer formation and expression of the pluripotency marker HERVH, despite being classified as a normal cell line and a differentiated cell, respectively. CONCLUSIONS: Our study demonstrates that ENCODE data are generally comparable between the different contributing labs and that the analysis of HERV elements can provide novel insights into the differentiation and disease state of a cell that are easily overlooked when focusing on protein-coding genes. Our findings hint at a change in HERV expression during carcinogenesis.