ABSTRACT
T cell alloreactivity against minor histocompatibility antigens (mHAgs), polymorphic peptides resulting from donor-recipient (D-R) disparity at sites of genetic polymorphisms, is at the core of the therapeutic effect of allogeneic hematopoietic cell transplantation (allo-HCT). Despite the crucial role of mHAgs in graft-versus-leukemia (GvL) and graft-versus-host disease (GvHD) reactions, it remains challenging to consistently link patient-specific mHAg repertoires to clinical outcomes. Here we devise an analytic framework to systematically identify mHAgs, including their detection on HLA class I ligandomes and functional verification of their immunogenicity. The method relies on the integration of polymorphism detection by whole-exome sequencing of germline DNA from D-R pairs with organ-specific transcriptional- and proteome-level expression. Application of this pipeline to 220 HLA-matched allo-HCT D-R pairs demonstrated that total and organ-specific mHAg load could independently predict the occurrence of acute GvHD and chronic pulmonary GvHD, respectively, and defined promising GvL targets, confirmed in a validation cohort of 58 D-R pairs, for the prevention or treatment of post-transplant disease recurrence.
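The first step described above, finding sites where the recipient carries an allele the donor lacks and then weighting by organ-specific expression, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the genotype representation, the function names `gvh_disparities` and `mhag_load`, and the toy variant identifiers are hypothetical, and the sketch omits the peptide-level consequence calling, HLA ligandome detection, and immunogenicity verification that the abstract describes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: variant site -> set of alleles carried (germline).
Genotype = Dict[str, Set[str]]


def gvh_disparities(donor: Genotype, recipient: Genotype) -> List[Tuple[str, str]]:
    """Return (site, allele) pairs present in the recipient but absent from the donor.

    Only sites genotyped in BOTH individuals are compared, so that a missing
    call is not mistaken for a disparity.
    """
    found = []
    for site in sorted(recipient):
        if site not in donor:
            continue  # no donor call at this site: skip rather than guess
        for allele in sorted(recipient[site] - donor[site]):
            found.append((site, allele))
    return found


def mhag_load(disparities: List[Tuple[str, str]], expressed_sites: Set[str]) -> int:
    """Count disparities that fall in genes expressed in a given organ (toy filter)."""
    return sum(1 for site, _ in disparities if site in expressed_sites)


if __name__ == "__main__":
    # Toy genotypes; the site names are made up.
    donor = {"chr1:100": {"A"}, "chr2:200": {"C", "T"}, "chr3:300": {"G"}}
    recipient = {
        "chr1:100": {"A", "G"},
        "chr2:200": {"C"},
        "chr3:300": {"T"},
        "chr4:400": {"A"},  # not genotyped in the donor, so ignored
    }
    pairs = gvh_disparities(donor, recipient)
    print("total load:", len(pairs))
    print("organ-specific load:", mhag_load(pairs, expressed_sites={"chr3:300"}))
```

The comparison is deliberately one-directional (recipient minus donor), since that is the direction in which donor T cells could see a recipient peptide as foreign.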
ABSTRACT
The organization of immune cells in human tumors is not well understood. Immunogenic tumors harbor spatially localized multicellular 'immunity hubs' defined by expression of the T cell-attracting chemokines CXCL10/CXCL11 and abundant T cells. Here, we examined immunity hubs in human pre-immunotherapy lung cancer specimens and found an association with beneficial response to PD-1 blockade. Critically, we discovered the stem-immunity hub, a subtype of immunity hub strongly associated with favorable PD-1-blockade outcome. This hub is distinct from mature tertiary lymphoid structures and is enriched for stem-like TCF7+PD-1+CD8+ T cells, activated CCR7+LAMP3+ dendritic cells and CCL19+ fibroblasts as well as chemokines that organize these cells. Within the stem-immunity hub, we find preferential interactions between CXCL10+ macrophages and TCF7−CD8+ T cells as well as between mature regulatory dendritic cells and TCF7+CD4+ and regulatory T cells. These results provide a picture of the spatial organization of the human intratumoral immune response and its relevance to patient immunotherapy outcomes.
Subject(s)
Lung Neoplasms, Humans, CD8-Positive T-Lymphocytes, Programmed Cell Death 1 Receptor, Chemokines/metabolism, Immunotherapy/methods, Tumor Microenvironment
ABSTRACT
Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.
Subject(s)
High-Throughput Nucleotide Sequencing, RNA Isoforms, Complementary DNA/genetics, RNA Isoforms/genetics, High-Throughput Nucleotide Sequencing/methods, Protein Isoforms/genetics, RNA Sequence Analysis/methods, Transcriptome, Gene Expression Profiling/methods, RNA/genetics
ABSTRACT
Targeted synthetic vaccines have the potential to transform our response to viral outbreaks, yet the design of these vaccines requires a comprehensive knowledge of viral immunogens. Here, we report severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) peptides that are naturally processed and loaded onto human leukocyte antigen-II (HLA-II) complexes in infected cells. We identify over 500 unique viral peptides from canonical proteins as well as from overlapping internal open reading frames. Most HLA-II peptides colocalize with known CD4+ T cell epitopes in coronavirus disease 2019 patients, including 2 reported immunodominant regions in the SARS-CoV-2 membrane protein. Overall, our analyses show that HLA-I and HLA-II pathways target distinct viral proteins, with the structural proteins accounting for most of the HLA-II peptidome and nonstructural and noncanonical proteins accounting for the majority of the HLA-I peptidome. These findings highlight the need for a vaccine design that incorporates multiple viral elements harboring CD4+ and CD8+ T cell epitopes to maximize vaccine effectiveness.
Subject(s)
COVID-19, SARS-CoV-2, Humans, T-Lymphocyte Epitopes, Histocompatibility Antigens Class I, HLA Antigens, Histocompatibility Antigens, CD8-Positive T-Lymphocytes, Peptides
ABSTRACT
Targeted synthetic vaccines have the potential to transform our response to viral outbreaks, yet the design of these vaccines requires a comprehensive knowledge of viral immunogens, including T-cell epitopes. Having previously mapped the SARS-CoV-2 HLA-I landscape, here we report viral peptides that are naturally processed and loaded onto HLA-II complexes in infected cells. We identified over 500 unique viral peptides from canonical proteins, as well as from overlapping internal open reading frames (ORFs), revealing, for the first time, the contribution of internal ORFs to the HLA-II peptide repertoire. Most HLA-II peptides co-localized with known CD4+ T cell epitopes in COVID-19 patients. We also observed that two reported immunodominant regions in the SARS-CoV-2 membrane protein are formed at the level of HLA-II presentation. Overall, our analyses show that HLA-I and HLA-II pathways target distinct viral proteins, with the structural proteins accounting for most of the HLA-II peptidome and non-structural and non-canonical proteins accounting for the majority of the HLA-I peptidome. These findings highlight the need for a vaccine design that incorporates multiple viral elements harboring CD4+ and CD8+ T cell epitopes to maximize vaccine effectiveness.
ABSTRACT
The organization of immune cells in human tumors is not well understood. Immunogenic tumors harbor spatially localized multicellular 'immunity hubs' defined by expression of the T cell-attracting chemokines CXCL10/CXCL11 and abundant T cells. Here, we examined immunity hubs in human pre-immunotherapy lung cancer specimens and found that they were associated with beneficial responses to PD-1 blockade. Immunity hubs were enriched for many interferon-stimulated genes, T cells in multiple differentiation states, and CXCL9/10/11+ macrophages that preferentially interact with CD8 T cells. Critically, we discovered the stem-immunity hub, a subtype of immunity hub strongly associated with favorable PD-1-blockade outcomes, distinct from mature tertiary lymphoid structures, and enriched for stem-like TCF7+PD-1+ CD8 T cells and activated CCR7+LAMP3+ dendritic cells, as well as chemokines that organize these cells. These results elucidate the spatial organization of the human intratumoral immune response and its relevance to patient immunotherapy outcomes.
ABSTRACT
Protein-ligand binding prediction is a fundamental problem in AI-driven drug discovery. Prior work focused on supervised learning methods using a large set of binding affinity data for small molecules, but it is hard to apply the same strategy to other drug classes like antibodies as labelled data is limited. In this paper, we explore unsupervised approaches and reformulate binding energy prediction as a generative modeling task. Specifically, we train an energy-based model on a set of unlabelled protein-ligand complexes using SE(3) denoising score matching and interpret its log-likelihood as binding affinity. Our key contribution is a new equivariant rotation prediction network called Neural Euler's Rotation Equations (NERE) for SE(3) score matching. It predicts a rotation by modeling the force and torque between protein and ligand atoms, where the force is defined as the gradient of an energy function with respect to atom coordinates. We evaluate NERE on protein-ligand and antibody-antigen binding affinity prediction benchmarks. Our model outperforms all unsupervised baselines (physics-based and statistical potentials) and matches supervised learning methods in the antibody case.
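The force-and-torque idea in this abstract can be made concrete with a small sketch. The code below is not the NERE network: it swaps in a hand-written toy pairwise energy (a quadratic penalty around an assumed preferred distance `d0`), takes the force on each ligand atom as the negative gradient of that energy (the sign convention is an assumption following standard physics usage), and sums the resulting torque about the ligand center. All function names and parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def pair_energy(protein: Sequence[Vec], ligand: Sequence[Vec], d0: float = 4.0) -> float:
    """Toy energy: sum of (distance - d0)^2 over all protein-ligand atom pairs."""
    return sum((math.dist(p, l) - d0) ** 2 for p in protein for l in ligand)


def ligand_forces(protein: Sequence[Vec], ligand: Sequence[Vec], d0: float = 4.0) -> List[Vec]:
    """Force on each ligand atom, taken as the negative gradient of pair_energy.

    Assumes no protein and ligand atoms coincide (distance > 0).
    """
    forces = []
    for l in ligand:
        f = [0.0, 0.0, 0.0]
        for p in protein:
            d = math.dist(p, l)
            # dE/dl = 2 * (d - d0) * (l - p) / d, so the force carries a minus sign.
            coef = -2.0 * (d - d0) / d
            for k in range(3):
                f[k] += coef * (l[k] - p[k])
        forces.append((f[0], f[1], f[2]))
    return forces


def cross(a: Vec, b: Vec) -> Vec:
    """Standard 3D cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def net_torque(ligand: Sequence[Vec], forces: Sequence[Vec]) -> Vec:
    """Total torque about the ligand's geometric center: sum of (x_i - center) x f_i."""
    n = len(ligand)
    center = tuple(sum(atom[k] for atom in ligand) / n for k in range(3))
    tau = [0.0, 0.0, 0.0]
    for atom, f in zip(ligand, forces):
        r = (atom[0] - center[0], atom[1] - center[1], atom[2] - center[2])
        c = cross(r, f)
        for k in range(3):
            tau[k] += c[k]
    return (tau[0], tau[1], tau[2])


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0)]
    ligand = [(3.0, 1.0, 0.5), (5.0, -1.0, 0.0)]
    forces = ligand_forces(protein, ligand)
    print("forces:", forces)
    print("torque:", net_torque(ligand, forces))
```

In a rigid-body setting the torque would next be converted into an angular velocity through the ligand's inertia matrix, which is the step Euler's rotation equations describe; that conversion and the learned energy function are left out of this sketch.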
ABSTRACT
Cancers avoid immune surveillance through an array of mechanisms, including perturbation of HLA class I antigen presentation. Merkel cell carcinoma (MCC) is an aggressive, HLA-I-low, neuroendocrine carcinoma of the skin often caused by the Merkel cell polyomavirus (MCPyV). Through the characterization of 11 newly generated MCC patient-derived cell lines, we identified transcriptional suppression of several class I antigen presentation genes. To systematically identify regulators of HLA-I loss in MCC, we performed parallel, genome-scale, gain- and loss-of-function screens in a patient-derived MCPyV-positive cell line and identified MYCL and the non-canonical Polycomb repressive complex 1.1 (PRC1.1) as HLA-I repressors. We observed physical interaction of MYCL with the MCPyV small T viral antigen, supporting a mechanism of virally mediated HLA-I suppression. We further identify the PRC1.1 component USP7 as a pharmacologic target to restore HLA-I expression in MCC.
Subject(s)
Merkel Cell Carcinoma, Merkel Cell Polyomavirus, Polyomavirus Infections, Skin Neoplasms, Viral Tumor Antigens/genetics, Viral Tumor Antigens/metabolism, Merkel Cell Carcinoma/genetics, Merkel Cell Carcinoma/pathology, Genetic Epigenesis, Humans, Merkel Cell Polyomavirus/genetics, Merkel Cell Polyomavirus/metabolism, Polyomavirus Infections/genetics, Skin Neoplasms/pathology, Ubiquitin-Specific Peptidase 7/metabolism
ABSTRACT
Tumor-associated epitopes presented on MHC-I that can activate the immune system against cancer cells are typically identified from annotated protein-coding regions of the genome, but whether peptides originating from novel or unannotated open reading frames (nuORFs) can contribute to antitumor immune responses remains unclear. Here we show that peptides originating from nuORFs detected by ribosome profiling of malignant and healthy samples can be displayed on MHC-I of cancer cells, acting as additional sources of cancer antigens. We constructed a high-confidence database of translated nuORFs across tissues (nuORFdb) and used it to detect 3,555 translated nuORFs from MHC-I immunopeptidome mass spectrometry analysis, including peptides that result from somatic mutations in nuORFs of cancer samples as well as tumor-specific nuORFs translated in melanoma, chronic lymphocytic leukemia and glioblastoma. NuORFs are an unexplored pool of MHC-I-presented, tumor-specific peptides with potential as immunotherapy targets.
Subject(s)
Immunotherapy, Melanoma, Neoplasm Antigens, Histocompatibility Antigens Class I/genetics, Histocompatibility Antigens Class I/metabolism, Humans, Immunotherapy/methods, Mass Spectrometry, Melanoma/genetics, Peptides
ABSTRACT
Colorectal cancers (CRCs) arise from precursor polyps whose cellular origins, molecular heterogeneity, and immunogenic potential may reveal diagnostic and therapeutic insights when analyzed at high resolution. We present a single-cell transcriptomic and imaging atlas of the two most common human colorectal polyps, conventional adenomas and serrated polyps, and their resulting CRC counterparts. Integrative analysis of 128 datasets from 62 participants reveals adenomas arise from WNT-driven expansion of stem cells, while serrated polyps derive from differentiated cells through gastric metaplasia. Metaplasia-associated damage is coupled to a cytotoxic immune microenvironment preceding hypermutation, driven partly by antigen-presentation differences associated with tumor cell-differentiation status. Microsatellite unstable CRCs contain distinct non-metaplastic regions where tumor cells acquire stem cell properties and cytotoxic immune cells are depleted. Our multi-omic atlas provides insights into malignant progression of colorectal polyps and their microenvironment, serving as a framework for precision surveillance and prevention of CRC.
Subject(s)
Colonic Polyps/pathology, Colorectal Neoplasms/pathology, Tumor Microenvironment, Adaptive Immunity, Adenoma/genetics, Adenoma/pathology, Adult, Aged, Animals, Carcinogenesis/genetics, Carcinogenesis/pathology, Cell Death, Cell Differentiation, Colonic Polyps/genetics, Colonic Polyps/immunology, Colorectal Neoplasms/genetics, Colorectal Neoplasms/immunology, Disease Progression, Female, Neoplastic Gene Expression Regulation, Gene Regulatory Networks, Genetic Heterogeneity, Humans, Male, Mice, Middle Aged, Mutation/genetics, Neoplastic Stem Cells/metabolism, Neoplastic Stem Cells/pathology, RNA-Seq, Reproducibility of Results, Single-Cell Analysis, Tumor Microenvironment/immunology
ABSTRACT
MS is the most effective method to directly identify peptides presented on human leukocyte antigen (HLA) molecules. However, current standard approaches often use 500 million or more cells as input to achieve high coverage of the immunopeptidome, and therefore, these methods are not compatible with the often limited amounts of tissue available from clinical tumor samples. Here, we evaluated microscaled basic reversed-phase fractionation to separate HLA peptide samples offline followed by ion mobility coupled to LC-MS/MS for analysis. The combination of these two separation methods enabled identification of 20% to 50% more peptides compared with samples analyzed without either prior fractionation or use of ion mobility alone. We demonstrate coverage of HLA immunopeptidomes with up to 8107 distinct peptides starting with as few as 100 million cells. The increased sensitivity obtained using our methods can provide data useful to improve HLA-binding prediction algorithms as well as to enable detection of clinically relevant epitopes such as neoantigens.
Subject(s)
Neoplasm Antigens/analysis, Histocompatibility Antigens Class I/analysis, Peptides/analysis, Cell Line, Chemical Fractionation, Liquid Chromatography, Humans, Ion Mobility Spectrometry, Neoplasms/chemistry, Tandem Mass Spectrometry
ABSTRACT
Immune responses to cancer are highly variable, with mismatch repair-deficient (MMRd) tumors exhibiting more anti-tumor immunity than mismatch repair-proficient (MMRp) tumors. To understand the rules governing these varied responses, we transcriptionally profiled 371,223 cells from colorectal tumors and adjacent normal tissues of 28 MMRp and 34 MMRd individuals. Analysis of 88 cell subsets and their 204 associated gene expression programs revealed extensive transcriptional and spatial remodeling across tumors. To discover hubs of interacting malignant and immune cells, we identified expression programs in different cell types that co-varied across tumors from affected individuals and used spatial profiling to localize coordinated programs. We discovered a myeloid cell-attracting hub at the tumor-luminal interface associated with tissue damage and an MMRd-enriched immune hub within the tumor, with activated T cells together with malignant and myeloid cells expressing T cell-attracting chemokines. By identifying interacting cellular programs, we reveal the logic underlying spatially organized immune-malignant cell networks.
Subject(s)
Colorectal Neoplasms/immunology, Colorectal Neoplasms/pathology, Bone Morphogenetic Proteins/metabolism, Cancer-Associated Fibroblasts/metabolism, Cancer-Associated Fibroblasts/pathology, Cell Compartmentation, Tumor Cell Line, Chemokines/metabolism, Cohort Studies, Colorectal Neoplasms/genetics, DNA Mismatch Repair/genetics, Endothelial Cells/metabolism, Neoplastic Gene Expression Regulation, Humans, Immunity, Inflammation/pathology, Monocytes/pathology, Myeloid Cells/pathology, Neutrophils/pathology, Stromal Cells/metabolism, T-Lymphocytes/metabolism, Genetic Transcription
ABSTRACT
T cell-mediated immunity plays an important role in controlling SARS-CoV-2 infection, but the repertoire of naturally processed and presented viral epitopes on class I human leukocyte antigen (HLA-I) remains uncharacterized. Here, we report the first HLA-I immunopeptidome of SARS-CoV-2 in two cell lines at different times post infection using mass spectrometry. We found HLA-I peptides derived not only from canonical open reading frames (ORFs) but also from internal out-of-frame ORFs in spike and nucleocapsid not captured by current vaccines. Some peptides from out-of-frame ORFs elicited T cell responses in a humanized mouse model and individuals with COVID-19 that exceeded responses to canonical peptides, including some of the strongest epitopes reported to date. Whole-proteome analysis of infected cells revealed that early expressed viral proteins contribute more to HLA-I presentation and immunogenicity. These biological insights, as well as the discovery of out-of-frame ORF epitopes, will facilitate selection of peptides for immune monitoring and vaccine development.
Subject(s)
T-Lymphocyte Epitopes/immunology, Histocompatibility Antigens Class I/immunology, Open Reading Frames/genetics, Peptides/immunology, Proteome/immunology, SARS-CoV-2/immunology, A549 Cells, Alleles, Amino Acid Sequence, Animals, Antigen Presentation/immunology, COVID-19/immunology, COVID-19/virology, Female, HEK293 Cells, Humans, Kinetics, Male, Mice, Peptides/chemistry, T-Lymphocytes/immunology
ABSTRACT
Immunotherapies have emerged to treat diseases by selectively modulating a patient's immune response. Although the roles of T and B cells in adaptive immunity have been well studied, it remains difficult to select targets for immunotherapeutic strategies. Because human leukocyte antigen class II (HLA-II) peptides activate CD4+ T cells and regulate B cell activation, proliferation, and differentiation, these peptide antigens represent a class of potential immunotherapy targets and biomarkers. To better understand the molecular basis of how HLA-II antigen presentation is involved in disease progression and treatment, systematic HLA-II peptidomics combined with multiomic analyses of diverse cell types in healthy and diseased states is required. For this reason, MS-based innovations that facilitate investigations into the interplay between disease pathologies and the presentation of HLA-II peptides to CD4+ T cells will aid in the development of patient-focused immunotherapies.
Subject(s)
Histocompatibility Antigens Class II/immunology, Immunotherapy, Peptides/immunology, Animals, Antigen Presentation, Genomics, Histocompatibility Antigens Class II/genetics, Humans, Mass Spectrometry, Peptides/genetics
ABSTRACT
Birdshot Uveitis (BU) is a blinding inflammatory eye condition that only affects HLA-A29-positive individuals. Genetic association studies linked BU with ERAP2, an aminopeptidase that trims peptides before their presentation by HLA class I at the cell surface, suggesting that ERAP2-dependent peptide presentation by HLA-A29 drives the pathogenesis of BU. However, it remains poorly understood whether the effects of ERAP2 on the HLA-A29 peptidome are distinct from its effects on other HLA allotypes. To address this, we focused on the effects of ERAP2 on the immunopeptidome in patient-derived antigen-presenting cells. Using complementary HLA-A29-based and pan-class I immunopurifications, isotope-labeled naturally processed and presented HLA-bound peptides were sequenced by mass spectrometry. We show that the effects of ERAP2 on the N-terminus of ligands of HLA-A29 are shared across endogenous HLA allotypes, but discover and replicate that one peptide motif generated in the presence of ERAP2 is specifically bound by HLA-A29. This motif can be found in the amino acid sequence of putative autoantigens. We further show evidence for internal sequence specificity for ERAP2 imprinted in the immunopeptidome. These results reveal that ERAP2 can generate an HLA-A29-specific antigen repertoire, supporting the view that antigen presentation is a key disease pathway in BU.
Subject(s)
Aminopeptidases/metabolism, Antigen-Presenting Cells/enzymology, Autoantigens/metabolism, Autoimmunity, Birdshot Chorioretinopathy/enzymology, HLA-A Antigens/metabolism, Aged 80 and over, Amino Acid Motifs, Aminopeptidases/genetics, Antigen-Presenting Cells/immunology, Autoantigens/genetics, Autoantigens/immunology, Birdshot Chorioretinopathy/diagnosis, Birdshot Chorioretinopathy/genetics, Birdshot Chorioretinopathy/immunology, Cell Line, Female, HLA-A Antigens/genetics, HLA-A Antigens/immunology, Humans
ABSTRACT
Personal neoantigen vaccines have been envisioned as an effective approach to induce, amplify and diversify antitumor T cell responses. To define the long-term effects of such a vaccine, we evaluated the clinical outcome and circulating immune responses of eight patients with surgically resected stage IIIB/C or IVM1a/b melanoma, at a median of almost 4 years after treatment with NeoVax, a long-peptide vaccine targeting up to 20 personal neoantigens per patient (NCT01970358). All patients were alive and six were without evidence of active disease. We observed long-term persistence of neoantigen-specific T cell responses following vaccination, with ex vivo detection of neoantigen-specific T cells exhibiting a memory phenotype. We also found diversification of neoantigen-specific T cell clones over time, with emergence of multiple T cell receptor clonotypes exhibiting distinct functional avidities. Furthermore, we detected evidence of tumor infiltration by neoantigen-specific T cell clones after vaccination and epitope spreading, suggesting on-target vaccine-induced tumor cell killing. Personal neoantigen peptide vaccines thus induce T cell responses that persist over years and broaden the spectrum of tumor-specific cytotoxicity in patients with melanoma.
Subject(s)
Neoplasm Antigens/genetics, Cancer Vaccines/immunology, Epitopes/immunology, Immunologic Memory, Melanoma/immunology, Humans, Melanoma/pathology
ABSTRACT
T cell-mediated immunity may play a critical role in controlling and establishing protective immunity against SARS-CoV-2 infection; yet the repertoire of viral epitopes responsible for T cell response activation remains mostly unknown. Identification of viral peptides presented on class I human leukocyte antigen (HLA-I) can reveal epitopes for recognition by cytotoxic T cells and potential incorporation into vaccines. Here, we report the first HLA-I immunopeptidome of SARS-CoV-2 in two human cell lines at different times post-infection using mass spectrometry. We found HLA-I peptides derived not only from canonical ORFs, but also from internal out-of-frame ORFs in Spike and Nucleoprotein not captured by current vaccines. Proteomics analyses of infected cells revealed that SARS-CoV-2 may interfere with antigen processing and immune signaling pathways. Based on the endogenously processed and presented viral peptides that we identified, we estimate that a pool of 24 peptides would provide one or more peptides for presentation by at least one HLA allele in 99% of the human population. These biological insights and the list of naturally presented SARS-CoV-2 peptides will facilitate data-driven selection of peptides for immune monitoring and vaccine development.
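The population-coverage estimate mentioned above (a peptide pool presented by at least one HLA allele in 99% of people) can be approximated with a short calculation. The sketch below is an assumption about how such a figure could be derived, not the authors' stated procedure: it assumes Hardy-Weinberg proportions at each locus and independence between loci (which ignores the strong linkage disequilibrium in the HLA region), and the allele frequencies and the function name `population_coverage` are made up for illustration.

```python
from typing import Dict, Set


def population_coverage(
    allele_freqs: Dict[str, Dict[str, float]],
    presented_alleles: Set[str],
) -> float:
    """Fraction of individuals carrying at least one allele that presents a pool peptide.

    allele_freqs: locus -> {allele: population frequency}.
    presented_alleles: alleles for which the pool contains at least one peptide.
    Assumes two independent draws per locus (Hardy-Weinberg) and independent loci.
    """
    p_uncovered = 1.0
    for freqs in allele_freqs.values():
        covered = sum(f for allele, f in freqs.items() if allele in presented_alleles)
        # Both chromosomes at this locus must carry a non-covered allele.
        p_uncovered *= (1.0 - covered) ** 2
    return 1.0 - p_uncovered


if __name__ == "__main__":
    # Hypothetical frequencies, chosen only to make the arithmetic easy to follow.
    freqs = {
        "HLA-A": {"A*02:01": 0.30, "A*01:01": 0.20, "A*03:01": 0.10},
        "HLA-B": {"B*07:02": 0.25, "B*08:01": 0.15, "B*44:02": 0.10},
    }
    pool_alleles = {"A*02:01", "A*01:01", "B*07:02", "B*08:01"}
    # HLA-A covers 0.5 (uncovered 0.25); HLA-B covers 0.4 (uncovered 0.36).
    print(round(population_coverage(freqs, pool_alleles), 4))  # 1 - 0.25 * 0.36 = 0.91
```

Adding peptides for further alleles raises the covered frequency at each locus, and because the uncovered probability is squared per locus and multiplied across loci, coverage climbs quickly as the pool grows.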
ABSTRACT
Massively parallel single-cell and single-nucleus RNA sequencing has opened the way to systematic tissue atlases in health and disease, but as the scale of data generation is growing, so is the need for computational pipelines for scaled analysis. Here we developed Cumulus, a cloud-based framework for analyzing large-scale single-cell and single-nucleus RNA sequencing datasets. Cumulus combines the power of cloud computing with improvements in algorithm and implementation to achieve high scalability, low cost, user-friendliness and integrated support for a comprehensive set of features. We benchmark Cumulus on the Human Cell Atlas Census of Immune Cells dataset of bone marrow cells and show that it substantially improves efficiency over conventional frameworks, while maintaining or improving the quality of results, enabling large-scale studies.
Subject(s)
Cloud Computing/economics, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Computational Biology/economics, High-Throughput Nucleotide Sequencing/economics, RNA Sequence Analysis/economics
ABSTRACT
Tumor-associated macrophages (TAMs) are regulators of extracellular matrix (ECM) remodeling and metastatic progression, the main cause of cancer-associated death. We found that disabled homolog 2 mitogen-responsive phosphoprotein (DAB2) is highly expressed in tumor-infiltrating TAMs and that its genetic ablation significantly impairs lung metastasis formation. DAB2-expressing TAMs, mainly localized along the tumor-invasive front, participate in integrin recycling, ECM remodeling, and directional migration in a tridimensional matrix. DAB2+ macrophages escort the invasive dissemination of cancer cells by a mechanosensing pathway requiring the transcription factor YAP. In human lobular breast and gastric carcinomas, DAB2+ TAMs correlated with a poor clinical outcome, identifying DAB2 as a potential prognostic biomarker for stratification of patients with cancer. DAB2 is therefore central for the prometastatic activity of TAMs. SIGNIFICANCE: DAB2 expression in macrophages is essential for metastasis formation but not primary tumor growth. Mechanosensing cues, activating the complex YAP-TAZ, regulate DAB2 in macrophages, which in turn controls integrin recycling and ECM remodeling in 3-D tissue matrix. The presence of DAB2+ TAMs in patients with cancer correlates with worse prognosis. This article is highlighted in the In This Issue feature, p. 1611.
Subject(s)
Signal Transducing Adaptor Proteins/antagonists & inhibitors, Apoptosis Regulatory Proteins/antagonists & inhibitors, Neoplasms/genetics, Tumor-Associated Macrophages/metabolism, Tumor Cell Line, Humans
ABSTRACT
Prediction of HLA epitopes is important for the development of cancer immunotherapies and vaccines. However, current prediction algorithms have limited predictive power, in part because they were not trained on high-quality epitope datasets covering a broad range of HLA alleles. To enable prediction of endogenous HLA class I-associated peptides across a large fraction of the human population, we used mass spectrometry to profile >185,000 peptides eluted from 95 HLA-A, -B, -C and -G mono-allelic cell lines. We identified canonical peptide motifs per HLA allele, unique and shared binding submotifs across alleles and distinct motifs associated with different peptide lengths. By integrating these data with transcript abundance and peptide processing, we developed HLAthena, providing allele-and-length-specific and pan-allele-pan-length prediction models for endogenous peptide presentation. These models predicted endogenous HLA class I-associated ligands with 1.5-fold improvement in positive predictive value compared with existing tools and correctly identified >75% of HLA-bound peptides that were observed experimentally in 11 patient-derived tumor cell lines.
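The positive predictive value (PPV) comparison quoted above can be illustrated with a generic calculation: rank candidate peptides by predicted score, take the top n, and report the fraction that were truly observed as HLA-bound. The sketch below is a plain PPV-at-top-n computation on made-up scores; the cutoff n, the decoy ratio, and the function names are assumptions, not details taken from the abstract.

```python
from typing import Sequence


def ppv_at_top_n(scores: Sequence[float], labels: Sequence[int], n: int) -> float:
    """PPV among the n highest-scoring peptides.

    labels: 1 for an experimentally observed HLA-bound peptide, 0 for a decoy.
    """
    if n <= 0 or len(scores) != len(labels) or n > len(scores):
        raise ValueError("n must be in 1..len(scores) and inputs must align")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n])
    return hits / n


def fold_improvement(ppv_new: float, ppv_baseline: float) -> float:
    """Ratio of two PPVs, e.g. a new model versus an existing tool."""
    return ppv_new / ppv_baseline


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0]                   # toy ground truth
    new_model = [0.9, 0.2, 0.8, 0.7, 0.4, 0.1]    # hypothetical scores
    baseline = [0.9, 0.8, 0.3, 0.7, 0.4, 0.1]     # hypothetical scores
    ppv_new = ppv_at_top_n(new_model, labels, n=3)   # top 3 are all true hits
    ppv_old = ppv_at_top_n(baseline, labels, n=3)    # top 3 contain 2 true hits
    print(ppv_new, ppv_old, fold_improvement(ppv_new, ppv_old))
```

Setting n equal to the number of true binders in the evaluation set is a common choice because PPV then equals recall at that cutoff, but any fixed n works as long as both models are scored the same way.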