
1.

nf-core/airrflow: An adaptive immune receptor repertoire analysis workflow employing the Immcantation framework.

Gabernet, Gisela; Marquez, Susanna; Bjornson, Robert; Peltzer, Alexander; Meng, Hailong; Aron, Edel; Lee, Noah Y; Jensen, Cole; Ladd, David; Polster, Mark; Hanssen, Friederike; Heumos, Simon; Yaari, Gur; Kowarik, Markus C; Nahnsen, Sven; Kleinstein, Steven H.

PLoS Comput Biol ; 20(7): e1012265, 2024 Jul 26.

Article in English | MEDLINE | ID: mdl-39058741

ABSTRACT

Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow which integrates the Immcantation Framework following BCR and TCR sequencing data analysis best practices. The Immcantation Framework is a comprehensive toolset, which allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19 infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets.

2.

Resolving haplotype variation and complex genetic architecture in the human immunoglobulin kappa chain locus in individuals of diverse ancestry.

Engelbrecht, Eric; Rodriguez, Oscar L; Shields, Kaitlyn; Schultze, Steven; Tieri, David; Jana, Uddalok; Yaari, Gur; Lees, William D; Smith, Melissa L; Watson, Corey T.

Genes Immun ; 2024 Jun 06.

Article in English | MEDLINE | ID: mdl-38844673

ABSTRACT

Immunoglobulins (IGs), critical components of the human immune system, are composed of heavy and light protein chains encoded at three genomic loci. The IG Kappa (IGK) chain locus consists of two large, inverted segmental duplications. The complexity of the IG loci has hindered use of standard high-throughput methods for characterizing genetic variation within these regions. To overcome these limitations, we use long-read sequencing to create haplotype-resolved IGK assemblies in an ancestrally diverse cohort (n = 36), representing the first comprehensive description of IGK haplotype variation. We identify extensive locus polymorphism, including novel single nucleotide variants (SNVs) and novel structural variants harboring functional IGKV genes. Among 47 functional IGKV genes, we identify 145 alleles, 67 of which were not previously curated. We report inter-population differences in allele frequencies for 10 IGKV genes, including alleles unique to specific populations within this dataset. We identify haplotypes carrying signatures of gene conversion that associate with SNV enrichment in the IGK distal region, and a haplotype with an inversion spanning the proximal and distal regions. These data provide a critical resource of curated genomic reference information from diverse ancestries, laying a foundation for advancing our understanding of population-level genetic variation in the IGK locus.

3.

Guidelines for reproducible analysis of adaptive immune receptor repertoire sequencing data.

Peres, Ayelet; Klein, Vered; Frankel, Boaz; Lees, William; Polak, Pazit; Meehan, Mark; Rocha, Artur; Correia Lopes, João; Yaari, Gur.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38752856

ABSTRACT

Enhancing the reproducibility and comprehension of adaptive immune receptor repertoire sequencing (AIRR-seq) data analysis is critical for scientific progress. This study presents guidelines for reproducible AIRR-seq data analysis, and a collection of ready-to-use pipelines with comprehensive documentation. To this end, ten common pipelines were implemented using ViaFoundry, a user-friendly interface for pipeline management and automation. This is accompanied by versioned containers, documentation and archiving capabilities. The automation of pre-processing analysis steps and the ability to modify pipeline parameters according to specific research needs are emphasized. AIRR-seq data analysis is highly sensitive to varying parameters and setups; using the guidelines presented here, the ability to reproduce previously published results is demonstrated. This work promotes transparency, reproducibility, and collaboration in AIRR-seq data analysis, serving as a model for handling and documenting bioinformatics pipelines in other research domains.

Subject(s)

Computational Biology , Software , Humans , Computational Biology/methods , Reproducibility of Results , Receptors, Immunologic/genetics , High-Throughput Nucleotide Sequencing/methods , Adaptive Immunity/genetics , Guidelines as Topic

4.

Digger: directed annotation of immunoglobulin and T cell receptor V, D, and J gene sequences and assemblies.

Lees, William D; Saha, Swati; Yaari, Gur; Watson, Corey T.

Bioinformatics ; 40(3), 2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38478393

ABSTRACT

SUMMARY: Knowledge of immunoglobulin and T cell receptor encoding genes is derived from high-quality genomic sequencing. High-throughput sequencing is delivering large volumes of data, and precise, high-throughput approaches to annotation are needed. Digger is an automated tool that identifies coding and regulatory regions of these genes, with results comparable to those obtained by current expert curational methods. AVAILABILITY AND IMPLEMENTATION: Digger is published under open source license at https://github.com/williamdlees/Digger and is available as a Python package and a Docker container.

Subject(s)

Receptors, Antigen, T-Cell , Software , Receptors, Antigen, T-Cell/genetics , Chromosome Mapping , Immunoglobulins/genetics , High-Throughput Nucleotide Sequencing/methods

5.

nf-core/airrflow: an adaptive immune receptor repertoire analysis workflow employing the Immcantation framework.

Gabernet, Gisela; Marquez, Susanna; Bjornson, Robert; Peltzer, Alexander; Meng, Hailong; Aron, Edel; Lee, Noah Yann; Jensen, Cole; Ladd, David; Hanssen, Friederike; Heumos, Simon; Yaari, Gur; Kowarik, Markus C; Nahnsen, Sven; Kleinstein, Steven H.

bioRxiv ; 2024 Jan 28.

Article in English | MEDLINE | ID: mdl-38293151

ABSTRACT

Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow which integrates the Immcantation Framework following BCR and TCR sequencing data analysis best practices. The Immcantation Framework is a comprehensive toolset, which allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19 infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets. nf-core/airrflow is available free of charge under the MIT license on GitHub (https://github.com/nf-core/airrflow). Detailed documentation and example results are available on the nf-core website (https://nf-co.re/airrflow).

6.

IGHV allele similarity clustering improves genotype inference from adaptive immune receptor repertoire sequencing data.

Peres, Ayelet; Lees, William D; Rodriguez, Oscar L; Lee, Noah Y; Polak, Pazit; Hope, Ronen; Kedmi, Meirav; Collins, Andrew M; Ohlin, Mats; Kleinstein, Steven H; Watson, Corey T; Yaari, Gur.

Nucleic Acids Res ; 51(16): e86, 2023 09 08.

Article in English | MEDLINE | ID: mdl-37548401

ABSTRACT

In adaptive immune receptor repertoire analysis, determining the germline variable (V) allele associated with each T- and B-cell receptor sequence is a crucial step. This process is highly impacted by allele annotations. Aligning sequences, assigning them to specific germline alleles, and inferring individual genotypes are challenging when the repertoire is highly mutated, or sequence reads do not cover the whole V region. Here, we propose an alternative naming scheme for the V alleles, as well as a novel method to infer individual genotypes. We demonstrate the strengths of the two by comparing their outcomes to other genotype inference methods. We validate the genotype approach with independent genomic long-read data. The naming scheme is compatible with current annotation tools and pipelines. Analysis results can be converted from the proposed naming scheme to the nomenclature determined by the International Union of Immunological Societies (IUIS). Both the naming scheme and the genotype procedure are implemented in a freely available R package (PIgLET https://bitbucket.org/yaarilab/piglet). To allow researchers to further explore the approach on real data and to adapt it for their uses, we also created an interactive website (https://yaarilab.github.io/IGHV_reference_book).

Subject(s)

Genomics , Immunoglobulin Heavy Chains , Receptors, Antigen, B-Cell , Alleles , Genotype , Receptors, Antigen, B-Cell/genetics , Immunoglobulin Heavy Chains/genetics

7.

Polyclonal lymphoid expansion drives paraneoplastic autoimmunity in neuroblastoma.

Rosenberg, Miriam I; Greenstein, Erez; Buchkovich, Martin; Peres, Ayelet; Santoni-Rugiu, Eric; Yang, Lei; Mikl, Martin; Vaksman, Zalman; Gibbs, David L; Reshef, Dan; Salovin, Amy; Irwin, Meredith S; Naranjo, Arlene; Ulitsky, Igor; de Alarcon, Pedro A; Matthay, Katherine K; Weigman, Victor; Yaari, Gur; Panzer, Jessica A; Friedman, Nir; Maris, John M.

Cell Rep ; 42(8): 112879, 2023 08 29.

Article in English | MEDLINE | ID: mdl-37537844

ABSTRACT

Neuroblastoma is a lethal childhood solid tumor of developing peripheral nerves. Two percent of children with neuroblastoma develop opsoclonus myoclonus ataxia syndrome (OMAS), a paraneoplastic disease characterized by cerebellar and brainstem-directed autoimmunity but typically with outstanding cancer-related outcomes. We compared tumor transcriptomes and tumor-infiltrating T and B cell repertoires from 38 OMAS subjects with neuroblastoma to 26 non-OMAS-associated neuroblastomas. We found greater B and T cell infiltration in OMAS-associated tumors compared to controls and showed that both were polyclonal expansions. Tertiary lymphoid structures (TLSs) were enriched in OMAS-associated tumors. We identified significant enrichment of the major histocompatibility complex (MHC) class II allele HLA-DOB*01:01 in OMAS patients. OMAS severity scores were associated with the expression of several candidate autoimmune genes. We propose a model in which polyclonal auto-reactive B lymphocytes act as antigen-presenting cells and drive TLS formation, thereby supporting both sustained polyclonal T cell-mediated anti-tumor immunity and paraneoplastic OMAS neuropathology.

Subject(s)

Neuroblastoma , Opsoclonus-Myoclonus Syndrome , Child , Humans , Autoimmunity , Neuroblastoma/complications , Neuroblastoma/metabolism , Opsoclonus-Myoclonus Syndrome/complications , Opsoclonus-Myoclonus Syndrome/pathology , Autoantibodies , Genes, MHC Class II , Ataxia

8.

A novel approach to T-cell receptor beta chain (TCRB) repertoire encoding using lossless string compression.

Konstantinovsky, Thomas; Yaari, Gur.

Bioinformatics ; 39(7), 2023 07 01.

Article in English | MEDLINE | ID: mdl-37417959

ABSTRACT

MOTIVATION: T-cell receptor beta chain (TCRB) repertoires are crucial for understanding immune responses. However, their high diversity and complexity present significant challenges in representation and analysis. The main motivation of this study is to develop a unified and compact representation of a TCRB repertoire that can efficiently capture its inherent complexity and diversity and allow for direct inference. RESULTS: We introduce a novel approach to TCRB repertoire encoding and analysis, leveraging the Lempel-Ziv 76 algorithm. This approach allows us to create a graph-like model, identify specific sequence features, and produce a new encoding approach for an individual's repertoire. The proposed representation enables various applications, including generation probability inference, informative feature vector derivation, sequence generation, a new measure for diversity estimation, and a new sequence centrality measure. The approach was applied to four large-scale public TCRB sequencing datasets, demonstrating its potential for a wide range of applications in big biological sequencing data. AVAILABILITY AND IMPLEMENTATION: A Python package implementing the approach is available at https://github.com/MuteJester/LZGraphs.
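The abstract names the Lempel-Ziv 76 algorithm as the basis of the encoding. As a rough illustration of what an LZ76-style decomposition of a sequence looks like, the Python sketch below implements the textbook LZ76 (Kaspar-Schuster) parsing, in which each new sub-pattern is the shortest substring at the current position that cannot be copied from the text before it. This is an assumed stand-in, not the LZGraphs implementation: the function name `lz76_phrases` is hypothetical, and LZGraphs may define its sub-patterns differently.

```python
def lz76_phrases(seq: str) -> list[str]:
    """Split seq into LZ76 phrases (textbook Kaspar-Schuster parsing).

    Hypothetical helper for illustration; not the LZGraphs API.
    """
    phrases: list[str] = []
    i, n = 0, len(seq)
    while i < n:
        length = 1
        # Extend the candidate phrase while it can still be copied from
        # earlier text (the prefix up to, but excluding, its last symbol).
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        # The first non-copyable extension closes the phrase; at the end of
        # the sequence the slice is simply truncated.
        phrases.append(seq[i:i + length])
        i += length
    return phrases


# Hypothetical CDR3-like amino acid string, used only as an example.
print(lz76_phrases("CASSLGQAYEQYF"))
```

The number of phrases is the classic LZ76 complexity of the sequence; for the made-up string above the parse yields 10 sub-patterns (C, A, S, SL, G, Q, AY, E, QY, F), and concatenating them always reproduces the input.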

Subject(s)

Data Compression , Receptors, Antigen, T-Cell, alpha-beta , Receptors, Antigen, T-Cell, alpha-beta/genetics , Algorithms , Receptors, Antigen, T-Cell/genetics

9.

AIRR community curation and standardised representation for immunoglobulin and T cell receptor germline sets.

Lees, William D; Christley, Scott; Peres, Ayelet; Kos, Justin T; Corrie, Brian; Ralph, Duncan; Breden, Felix; Cowell, Lindsay G; Yaari, Gur; Corcoran, Martin; Karlsson Hedestam, Gunilla B; Ohlin, Mats; Collins, Andrew M; Watson, Corey T; Busse, Christian E.

Immunoinformatics (Amst) ; 10, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37388275

ABSTRACT

Analysis of an individual's immunoglobulin or T cell receptor gene repertoire can provide important insights into immune function. High-quality analysis of adaptive immune receptor repertoire sequencing data depends upon accurate and relatively complete germline sets, but current sets are known to be incomplete. Established processes for the review and systematic naming of receptor germline genes and alleles require specific evidence and data types, but the discovery landscape is rapidly changing. To exploit the potential of emerging data, and to provide the field with improved state-of-the-art germline sets, an intermediate approach is needed that will allow the rapid publication of consolidated sets derived from these emerging sources. These sets must use a consistent naming scheme and allow refinement and consolidation into genes as new information emerges. Name changes should be minimised, but, where changes occur, the naming history of a sequence must be traceable. Here we outline the current issues and opportunities for the curation of germline IG/TR genes and present a forward-looking data model for building out more robust germline sets that can dovetail with current established processes. We describe interoperability standards for germline sets, and an approach to transparency based on principles of findability, accessibility, interoperability, and reusability.

10.

Altered somatic hypermutation patterns in COVID-19 patients classifies disease severity.

Safra, Modi; Tamari, Zvi; Polak, Pazit; Shiber, Shachaf; Matan, Moshe; Karameh, Hani; Helviz, Yigal; Levy-Barda, Adva; Yahalom, Vered; Peretz, Avi; Ben-Chetrit, Eli; Brenner, Baruch; Tuller, Tamir; Gal-Tanamy, Meital; Yaari, Gur.

Front Immunol ; 14: 1031914, 2023.

Article in English | MEDLINE | ID: mdl-37153628

ABSTRACT

Introduction: The success of the human body in fighting SARS-CoV2 infection relies on lymphocytes and their antigen receptors. Identifying and characterizing clinically relevant receptors is of utmost importance. Methods: We report here the application of a machine learning approach, utilizing B cell receptor repertoire sequencing data from severely and mildly infected individuals with SARS-CoV2 compared with uninfected controls. Results: In contrast to previous studies, our approach successfully distinguishes infected from non-infected individuals and stratifies infected individuals by disease severity. The features that drive this classification are based on somatic hypermutation patterns, and point to alterations in the somatic hypermutation process in COVID-19 patients. Discussion: These features may be used to build and adapt therapeutic strategies to COVID-19, in particular to quantitatively assess potential diagnostic and therapeutic antibodies. These results constitute a proof of concept for future epidemiological challenges.

Subject(s)

B-Lymphocytes , COVID-19 , Humans , Receptors, Antigen, B-Cell/genetics , RNA, Viral , SARS-CoV-2/genetics , Patient Acuity

11.

FLAIRR-Seq: A Method for Single-Molecule Resolution of Near Full-Length Antibody H Chain Repertoires.

Ford, Easton E; Tieri, David; Rodriguez, Oscar L; Francoeur, Nancy J; Soto, Juan; Kos, Justin T; Peres, Ayelet; Gibson, William S; Silver, Catherine A; Deikus, Gintaras; Hudson, Elizabeth; Woolley, Cassandra R; Beckmann, Noam; Charney, Alexander; Mitchell, Thomas C; Yaari, Gur; Sebra, Robert P; Watson, Corey T; Smith, Melissa L.

J Immunol ; 210(10): 1607-1619, 2023 05 15.

Article in English | MEDLINE | ID: mdl-37027017

ABSTRACT

Current Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) strategies that use short-read sequencing resolve expressed Ab transcripts with limited resolution of the C region. In this article, we present the near-full-length AIRR-seq (FLAIRR-seq) method that uses targeted amplification by 5' RACE, combined with single-molecule, real-time sequencing to generate highly accurate (99.99%) human Ab H chain transcripts. FLAIRR-seq was benchmarked by comparing H chain V (IGHV), D (IGHD), and J (IGHJ) gene usage, complementarity-determining region 3 length, and somatic hypermutation to matched datasets generated with standard 5' RACE AIRR-seq using short-read sequencing and full-length isoform sequencing. Together, these data demonstrate robust FLAIRR-seq performance using RNA samples derived from PBMCs, purified B cells, and whole blood, which recapitulated results generated by commonly used methods, while additionally resolving H chain gene features not documented in IMGT at the time of submission. FLAIRR-seq data provide, for the first time, to our knowledge, simultaneous single-molecule characterization of IGHV, IGHD, IGHJ, and IGHC region genes and alleles, allele-resolved subisotype definition, and high-resolution identification of class switch recombination within a clonal lineage. In conjunction with genomic sequencing and genotyping of IGHC genes, FLAIRR-seq of the IgM and IgG repertoires from 10 individuals resulted in the identification of 32 unique IGHC alleles, 28 (87%) of which were previously uncharacterized. Together, these data demonstrate the capabilities of FLAIRR-seq to characterize IGHV, IGHD, IGHJ, and IGHC gene diversity for the most comprehensive view of bulk-expressed Ab repertoires to date.

Subject(s)

Complementarity Determining Regions , Humans , Complementarity Determining Regions/genetics , Base Sequence

12.

B cell class switch recombination is regulated by DYRK1A through MSH6 phosphorylation.

Stoler-Barak, Liat; Harris, Ethan; Peres, Ayelet; Hezroni, Hadas; Kuka, Mirela; Di Lucia, Pietro; Grenov, Amalie; Gurwicz, Neta; Kupervaser, Meital; Yip, Bon Ham; Iannacone, Matteo; Yaari, Gur; Crispino, John D; Shulman, Ziv.

Nat Commun ; 14(1): 1462, 2023 03 16.

Article in English | MEDLINE | ID: mdl-36927854

ABSTRACT

Protection from viral infections depends on immunoglobulin isotype switching, which endows antibodies with effector functions. Here, we find that the protein kinase DYRK1A is essential for B cell-mediated protection from viral infection and effective vaccination through regulation of class switch recombination (CSR). Dyrk1a-deficient B cells are impaired in CSR activity in vivo and in vitro. Phosphoproteomic screens and kinase-activity assays identify MSH6, a DNA mismatch repair protein, as a direct substrate for DYRK1A, and deletion of a single phosphorylation site impaired CSR. After CSR and germinal center (GC) seeding, DYRK1A is required for attenuation of B cell proliferation. These findings demonstrate DYRK1A-mediated biological mechanisms of B cell immune responses that may be used for therapeutic manipulation in antibody-mediated autoimmunity.

Subject(s)

B-Lymphocytes , Immunoglobulin Class Switching , Phosphorylation , Immunoglobulin Class Switching/genetics , Germinal Center , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism

13.

High-Resolution Genomic Profiling of Liver Cancer Links Etiology With Mutation and Epigenetic Signatures.

Perez, Shira; Lavi-Itzkovitz, Anat; Gidoni, Moriah; Domovitz, Tom; Dabour, Roba; Khurana, Ishant; Davidovich, Ateret; Tobar, Ana; Livoff, Alejandro; Solomonov, Evgeny; Maman, Yaakov; El-Osta, Assam; Tsai, Yishan; Yu, Ming-Lung; Stemmer, Salomon M; Haviv, Izhak; Yaari, Gur; Gal-Tanamy, Meital.

Cell Mol Gastroenterol Hepatol ; 16(1): 63-81, 2023.

Article in English | MEDLINE | ID: mdl-36965814

ABSTRACT

BACKGROUND & AIMS: Hepatocellular carcinoma (HCC) is a model of a diverse spectrum of cancers because it is induced by well-known etiologies, mainly hepatitis C virus (HCV) and hepatitis B virus. Here, we aimed to identify HCV-specific mutational signatures and explored the link between the HCV-related regional variation in mutation rates and HCV-induced alterations in genome-wide chromatin organization. METHODS: To identify an HCV-specific mutational signature in HCC, we performed high-resolution targeted sequencing to detect passenger mutations in 64 HCC samples from 3 etiology groups: hepatitis B virus, HCV, or other. To explore the link between the genomic signature and genome-wide chromatin organization we performed chromatin immunoprecipitation sequencing for the transcriptionally permissive H3K4Me3, H3K9Ac, and suppressive H3K9Me3 modifications after HCV infection. RESULTS: Regional variation in mutation rate analysis showed significant etiology-dependent regional mutation rates in 12 genes: LRP2, KRT84, TMEM132B, DOCK2, DMD, INADL, JAK2, DNAH6, MTMR9, ATM, SLX4, and ARSD. We found an enrichment of C->T transversion mutations in the HCV-associated HCC cases. Furthermore, these cases showed regional variation in mutation rates associated with genomic intervals in which HCV infection dictated epigenetic alterations. This signature may be related to the HCV-induced decreased expression of genes encoding key enzymes in the base excision repair pathway. CONCLUSIONS: We identified novel distinct HCV etiology-dependent mutation signatures in HCC associated with HCV-induced alterations in histone modification. This study presents a link between cancer-causing mutagenesis and the increased predisposition to liver cancer in chronic HCV-infected individuals, and unveils novel etiology-specific mechanisms leading to HCC and cancer in general.

Subject(s)

Carcinoma, Hepatocellular , Hepatitis C , Liver Neoplasms , Humans , Liver Neoplasms/pathology , Carcinoma, Hepatocellular/pathology , Hepatitis C/complications , Hepatitis C/genetics , Mutation/genetics , Hepacivirus/genetics , Hepatitis B virus/genetics , Epigenesis, Genetic/genetics , Chromatin , Genomics , Protein Tyrosine Phosphatases, Non-Receptor/genetics , Keratins, Type II/genetics , Keratins, Hair-Specific/genetics

14.

A somatic hypermutation-based machine learning model stratifies individuals with Crohn's disease and controls.

Safra, Modi; Werner, Lael; Peres, Ayelet; Polak, Pazit; Salamon, Naomi; Schvimer, Michael; Weiss, Batia; Barshack, Iris; Shouval, Dror S; Yaari, Gur.

Genome Res ; 33(1): 71-79, 2023 01.

Article in English | MEDLINE | ID: mdl-36526432

ABSTRACT

Crohn's disease (CD) is a chronic relapsing-remitting inflammatory disorder of the gastrointestinal tract that is characterized by altered innate and adaptive immune function. Although massively parallel sequencing studies of the T cell receptor repertoire identified oligoclonal expansion of unique clones, much less is known about the B cell receptor (BCR) repertoire in CD. Here, we present a novel BCR repertoire sequencing data set from ileal biopsies from pediatric patients with CD and controls, and identify CD-specific somatic hypermutation (SHM) patterns, revealed by a machine learning (ML) algorithm trained on BCR repertoire sequences. Moreover, ML classification of a different data set from blood samples of adults with CD versus controls identified that V gene usage, clusters, or mutation frequencies yielded excellent results in classifying the disease (F1 > 90%). In summary, we show that an ML algorithm enables the classification of CD based on unique BCR repertoire features with high accuracy.
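F1, the score quoted above, is the harmonic mean of precision and recall, so a value above 90% requires both to be high. A small Python sketch of the standard definition computed from confusion-matrix counts; the function name and the counts in the example are hypothetical and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score from true positives, false positives and false negatives."""
    if tp == 0:
        # Precision and recall are both zero (or undefined); report F1 as 0.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted cases that are correct
    recall = tp / (tp + fn)     # fraction of true cases that are recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 45 patients correctly classified, 3 controls
# misclassified as patients, 5 patients missed.
print(round(f1_score(tp=45, fp=3, fn=5), 3))  # prints 0.918
```

Equivalently, F1 = 2TP / (2TP + FP + FN), which for the hypothetical counts is 90 / 98.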

Subject(s)

Crohn Disease , Adult , Humans , Child , Crohn Disease/genetics , Machine Learning , Biopsy , Algorithms , Chronic Disease

15.

AIRR-C IG Reference Sets: curated sets of immunoglobulin heavy and light chain germline genes.

Collins, Andrew M; Ohlin, Mats; Corcoran, Martin; Heather, James M; Ralph, Duncan; Law, Mansun; Martínez-Barnetche, Jesus; Ye, Jian; Richardson, Eve; Gibson, William S; Rodriguez, Oscar L; Peres, Ayelet; Yaari, Gur; Watson, Corey T; Lees, William D.

Front Immunol ; 14: 1330153, 2023.

Article in English | MEDLINE | ID: mdl-38406579

ABSTRACT

Introduction: Analysis of an individual's immunoglobulin (IG) gene repertoire requires the use of high-quality germline gene reference sets. When sets only contain alleles supported by strong evidence, AIRR sequencing (AIRR-seq) data analysis is more accurate, and studies of the evolution of IG genes, their allelic variants and the expressed immune repertoire are therefore facilitated. Methods: The Adaptive Immune Receptor Repertoire Community (AIRR-C) IG Reference Sets have been developed by including only human IG heavy and light chain alleles that have been confirmed by evidence from multiple high-quality sources. To further improve AIRR-seq analysis, some alleles have been extended to deal with short 3' or 5' truncations that can lead them to be overlooked by alignment utilities. To avoid other challenges for analysis programs, exact paralogs (e.g. IGHV1-69*01 and IGHV1-69D*01) are only represented once in each set, though alternative sequence names are noted in accompanying metadata. Results and discussion: The Reference Sets include fewer than half the previously recognised IG alleles (e.g. just 198 IGHV sequences), and also include a number of novel alleles: 8 IGHV alleles, 2 IGKV alleles and 5 IGLV alleles. Despite their smaller sizes, erroneous calls were eliminated, and excellent coverage was achieved when a set of repertoires comprising over 4 million V(D)J rearrangements from 99 individuals were analyzed using the Sets. The version-tracked AIRR-C IG Reference Sets are freely available at the OGRDB website (https://ogrdb.airr-community.org/germline_sets/Human) and will be regularly updated to include newly observed and previously reported sequences that can be confirmed by new high-quality data.
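Representing exact paralogs only once while noting alternative names in metadata amounts to keeping one record per distinct sequence and listing the remaining names as aliases. A minimal Python sketch, assuming the germline set is a simple name-to-sequence mapping; the function `collapse_identical`, the record fields, and the toy names and sequences are hypothetical placeholders, not the AIRR-C data model or real IG alleles.

```python
def collapse_identical(germline: dict[str, str]) -> list[dict]:
    """Keep one record per distinct sequence; other names become aliases.

    Hypothetical helper for illustration only.
    """
    names_by_seq: dict[str, list[str]] = {}
    for name, seq in germline.items():
        # Group names by their case-normalized nucleotide sequence. Dicts keep
        # insertion order, so the first name seen becomes the primary name.
        names_by_seq.setdefault(seq.upper(), []).append(name)
    return [
        {"name": names[0], "sequence": seq, "alt_names": names[1:]}
        for seq, names in names_by_seq.items()
    ]


# Toy input: V1*01 and V1D*01 stand in for a pair of exact paralogs.
toy_set = {"V1*01": "CAGGTGCAG", "V1D*01": "caggtgcag", "V2*01": "GAGGTGCAG"}
for record in collapse_identical(toy_set):
    print(record)
```

Which name becomes the primary one is a policy choice; this sketch simply keeps the first name encountered and records the others under `alt_names`.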

Subject(s)

Genes, Immunoglobulin , Immunoglobulins , Humans , Immunoglobulins/genetics , Alleles , V(D)J Recombination/genetics , Germ Cells

16.

A BALB/c IGHV Reference Set, Defined by Haplotype Analysis of Long-Read VDJ-C Sequences From F1 (BALB/c x C57BL/6) Mice.

Jackson, Katherine J L; Kos, Justin T; Lees, William; Gibson, William S; Smith, Melissa Laird; Peres, Ayelet; Yaari, Gur; Corcoran, Martin; Busse, Christian E; Ohlin, Mats; Watson, Corey T; Collins, Andrew M.

Front Immunol ; 13: 888555, 2022.

Article in English | MEDLINE | ID: mdl-35720344

ABSTRACT

The immunoglobulin genes of inbred mouse strains that are commonly used in models of antibody-mediated human diseases are poorly characterized. This compromises data analysis. To infer the immunoglobulin genes of BALB/c mice, we used long-read SMRT sequencing to amplify VDJ-C sequences from F1 (BALB/c x C57BL/6) hybrid animals. Strain variations were identified in the Ighm and Ighg2b genes, and analysis of VDJ rearrangements led to the inference of 278 germline IGHV alleles. 169 alleles are not present in the C57BL/6 genome reference sequence. To establish a set of expressed BALB/c IGHV germline gene sequences, we computationally retrieved IGHV haplotypes from the IgM dataset. Haplotyping led to the confirmation of 162 BALB/c IGHV gene sequences. A musIGHV398 pseudogene variant also appears to be present in the BALB/cByJ substrain, while a functional musIGHV398 gene is highly expressed in the BALB/cJ substrain. Only four of the BALB/c alleles were also observed in the C57BL/6 haplotype. The full set of inferred BALB/c sequences has been used to establish a BALB/c IGHV reference set, hosted at https://ogrdb.airr-community.org. We assessed whether assemblies from the Mouse Genome Project (MGP) are suitable for the determination of the genes of the IGH loci. Only 37 (43.5%) of the 85 confirmed IMGT-named BALB/c IGHV and 33 (42.9%) of the 77 confirmed non-IMGT IGHV were found in a search of the MGP BALB/cJ genome assembly. This suggests that current MGP assemblies are unsuitable for the comprehensive documentation of germline IGHVs and more efforts will be needed to establish strain-specific reference sets.

Subject(s)

Immunoglobulin Heavy Chains , Immunoglobulin Variable Region , Animals , Haplotypes , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Variable Region/genetics , Mice , Mice, Inbred BALB C , Mice, Inbred C57BL , Sequence Analysis, DNA

17.

Ontogeny of the B Cell Receptor Repertoire and Microbiome in Mice.

Gilboa, Amit; Hope, Ronen; Ben Simon, Shira; Polak, Pazit; Koren, Omry; Yaari, Gur.

J Immunol ; 208(12): 2713-2725, 2022 06 15.

Article in English | MEDLINE | ID: mdl-35623663

ABSTRACT

The immune system matures throughout childhood to achieve full functionality in protecting our bodies against threats. The immune system has a strong reciprocal symbiosis with the host bacterial population and the two systems co-develop, shaping each other. Despite their fundamental role in health physiology, the ontogeny of these systems is poorly characterized. In this study, we investigated the development of the BCR repertoire by analyzing high-throughput sequencing of their receptors at several time points in young C57BL/6J mice. In parallel, we explored the development of the gut microbiome. We discovered that the gut IgA repertoires change from birth to adolescence, including an increase in CDR3 lengths and somatic hypermutation levels. This contrasts with the spleen IgM repertoires that remain stable and distinct from the IgA repertoires in the gut. We also discovered that large clones that germinate in the gut are initially confined to a specific gut compartment, then expand to nearby compartments and later on expand also to the spleen and remain there. Finally, we explored the associations between diversity indices of the B cell repertoires and the microbiome, as well as associations between bacterial and BCR clusters. Our results shed light on the ontogeny of the adaptive immune system and the microbiome, providing a baseline for future research.

Subject(s)

Microbiota , Animals , High-Throughput Nucleotide Sequencing , Immunoglobulin A/genetics , Mice , Mice, Inbred C57BL , Receptors, Antigen, B-Cell/genetics

18.

Tumor-reactive antibodies evolve from non-binding and autoreactive precursors.

Mazor, Roei D; Nathan, Nachum; Gilboa, Amit; Stoler-Barak, Liat; Moss, Lihee; Solomonov, Inna; Hanuna, Assaf; Divinsky, Yalin; Shmueli, Merav D; Hezroni, Hadas; Zaretsky, Irina; Mor, Michael; Golani, Ofra; Sabah, Gad; Jakobson-Setton, Ariella; Yanichkin, Natalia; Feinmesser, Meora; Tsoref, Daliah; Salman, Lina; Yeoshoua, Effi; Peretz, Eyal; Erlich, Inna; Cohen, Netta Mendelson; Gershoni, Jonathan M; Freund, Natalia; Merbl, Yifat; Yaari, Gur; Eitan, Ram; Sagi, Irit; Shulman, Ziv.

Cell ; 185(7): 1208-1222.e21, 2022 03 31.

Article in English | MEDLINE | ID: mdl-35305314

ABSTRACT

The tumor microenvironment hosts antibody-secreting cells (ASCs) associated with a favorable prognosis in several types of cancer. Patient-derived antibodies have diagnostic and therapeutic potential, yet it remains unclear how antibodies gain autoreactivity and target tumors. Here, we found that somatic hypermutations (SHMs) promote antibody antitumor reactivity against surface autoantigens in high-grade serous ovarian carcinoma (HGSOC). Patient-derived tumor cells were frequently coated with IgGs. Intratumoral ASCs in HGSOC were both mutated and clonally expanded and produced tumor-reactive antibodies that targeted MMP14, which is abundantly expressed on the tumor cell surface. Reverting monoclonal antibodies to their germline configuration revealed two classes: one dependent on SHMs for tumor binding and a second with germline-encoded autoreactivity. Thus, tumor-reactive autoantibodies are either naturally occurring or evolve through an antigen-driven selection process. These findings highlight the origin and potential applicability of autoantibodies directed at surface antigens for tumor targeting in cancer patients.

Subject(s)

Antibodies, Neoplasm , Ovarian Neoplasms , Antibodies, Monoclonal , Autoantibodies , Autoantigens , Female , Humans , Ovarian Neoplasms/genetics , Tumor Microenvironment

19.

T cell receptor beta germline variability is revealed by inference from repertoire data.

Omer, Aviv; Peres, Ayelet; Rodriguez, Oscar L; Watson, Corey T; Lees, William; Polak, Pazit; Collins, Andrew M; Yaari, Gur.

Genome Med ; 14(1): 2, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34991709

ABSTRACT

BACKGROUND: T and B cell receptor (TCR, BCR) repertoires constitute the foundation of adaptive immunity. Adaptive immune receptor repertoire sequencing (AIRR-seq) is a common approach to study immune system dynamics. Understanding the genetic factors influencing the composition and dynamics of these repertoires is of major scientific and clinical importance. The chromosomal loci encoding the variable regions of TCRs and BCRs are challenging to decipher because of repetitive elements and undocumented structural variants. METHODS: To confront this challenge, AIRR-seq-based methods have recently been developed for B cells, enabling genotype and haplotype inference and the discovery of undocumented alleles. However, this approach relies on complete coverage of the receptors' variable regions, whereas most T cell studies sequence only a small fraction of that region. Here, we adapted a B cell pipeline for undocumented allele discovery and genotype and haplotype inference to full and partial AIRR-seq TCR data sets. The pipeline also handles gene assignment ambiguities, which are especially important in the analysis of data sets of partial sequences. RESULTS: From the full and partial AIRR-seq TCR data sets, we identified 39 undocumented polymorphisms in T cell receptor Beta V (TRBV) and 31 undocumented 5' UTR sequences. A subset of these inferences was also observed using independent genomic approaches. We found that a single nucleotide polymorphism differentiating between the two documented T cell receptor Beta D2 (TRBD2) alleles is strongly associated with dramatic changes in the expressed repertoire. CONCLUSIONS: We reveal a rich picture of germline variability and demonstrate how a single nucleotide polymorphism dramatically affects the composition of the whole repertoire. Our findings provide a basis for annotation of TCR repertoires for future basic and clinical studies.

Subject(s)

High-Throughput Nucleotide Sequencing , Receptors, Antigen, T-Cell, alpha-beta , Alleles , Germ Cells , High-Throughput Nucleotide Sequencing/methods , Humans , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell, alpha-beta/genetics

20.

simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods.

Kanduri, Chakravarthi; Scheffer, Lonneke; Pavlovic, Milena; Rand, Knut Dagestad; Chernigovskaya, Maria; Pirvandy, Oz; Yaari, Gur; Greiff, Victor; Sandve, Geir K.

Gigascience ; 12, 2022 12 28.

Article in English | MEDLINE | ID: mdl-37848619

ABSTRACT

BACKGROUND: Machine learning (ML) has gained significant attention for classifying immune states in adaptive immune receptor repertoires (AIRRs) to support the advancement of immunodiagnostics and therapeutics. Simulated data are crucial for the rigorous benchmarking of AIRR-ML methods. Existing approaches to generating synthetic benchmarking datasets produce naive repertoires that lack a key feature of antigen-experienced repertoires: many shared receptor sequences (selected for common antigens). RESULTS: We demonstrate that a common approach to generating simulated AIRR benchmark datasets can introduce biases, which may be exploited for undesired shortcut learning by certain ML methods. To prevent undesirable access to true signals in simulated AIRR datasets, we devised a simulation strategy (simAIRR) that constructs antigen-experienced-like repertoires with a realistic overlap of receptor sequences. simAIRR can be used for constructing AIRR-level benchmarks based on a range of assumptions (or experimental data sources) about what constitutes receptor-level immune signals. This includes the option of making, or not making, prior assumptions about the similarity or commonality of the immune state-associated sequences that will be used as true signals. We demonstrate the realism of our proposed simulation approach by showing that basic ML strategies perform similarly on simAIRR-generated and real-world experimental AIRR datasets. CONCLUSIONS: This study sheds light on the shortcut learning opportunities for ML methods that can arise with current state-of-the-art approaches to simulating AIRR datasets. simAIRR is available as a Python package: https://github.com/KanduriC/simAIRR.
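The key property simAIRR targets is realistic overlap of receptor sequences between repertoires. One simple way to quantify that overlap is pairwise Jaccard similarity of the unique sequences in each repertoire. The sketch below is an illustrative measure only: it does not use or reproduce simAIRR's API, and the function names and toy CDR3 strings are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared fraction of unique sequences: |A intersect B| / |A union B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(repertoires: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap for every pair of repertoires, keyed by (name1, name2)."""
    unique = {name: set(seqs) for name, seqs in repertoires.items()}
    return {
        (n1, n2): jaccard(unique[n1], unique[n2])
        for n1, n2 in combinations(sorted(unique), 2)
    }


# Hypothetical toy repertoires of CDR3 amino acid sequences.
toy_repertoires = {
    "rep1": ["CASSLGQAYEQYF", "CASSPGTGELFF", "CASSLAGGNEQFF"],
    "rep2": ["CASSLGQAYEQYF", "CASSQDRGTEAFF"],
    "rep3": ["CASSPGTGELFF", "CASSLGQAYEQYF", "CASSIRSSYEQYF"],
}
overlaps = pairwise_overlap(toy_repertoires)
for pair, value in overlaps.items():
    print(pair, value)
```

Comparing the distribution of such overlap values between simulated and experimental repertoires is one way to judge whether a simulation reproduces realistic sequence sharing.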

Subject(s)

Benchmarking , Computer Simulation
