ABSTRACT
Enhancing the reproducibility and comprehension of adaptive immune receptor repertoire sequencing (AIRR-seq) data analysis is critical for scientific progress. This study presents guidelines for reproducible AIRR-seq data analysis and a collection of ready-to-use pipelines with comprehensive documentation. To this end, ten common pipelines were implemented using ViaFoundry, a user-friendly interface for pipeline management and automation, together with versioned containers, documentation, and archiving capabilities. The automation of pre-processing analysis steps and the ability to modify pipeline parameters according to specific research needs are emphasized. Because AIRR-seq data analysis is highly sensitive to parameter choices and setup, we use the guidelines presented here to demonstrate that previously published results can be reproduced. This work promotes transparency, reproducibility, and collaboration in AIRR-seq data analysis, and can serve as a model for handling and documenting bioinformatics pipelines in other research domains.
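One ingredient of reproducibility is recording exactly what was run. The sketch below writes a deterministic run manifest containing the pipeline version, container image, parameters, and input checksums. It is a minimal illustration only: the field names and the `run_manifest` helper are hypothetical, and it is not ViaFoundry's actual format.

```python
import hashlib
import json
from typing import Any, Dict


def sha256_hex(data: bytes) -> str:
    """Content checksum so a rerun can confirm it used identical inputs."""
    return hashlib.sha256(data).hexdigest()


def run_manifest(
    pipeline: str,
    pipeline_version: str,
    container_image: str,
    parameters: Dict[str, Any],
    inputs: Dict[str, bytes],
) -> str:
    """Serialize what is needed to rerun an analysis as deterministic JSON.

    All field names here are hypothetical, not a ViaFoundry schema.
    """
    manifest = {
        "pipeline": pipeline,
        "pipeline_version": pipeline_version,
        # Ideally pinned by digest rather than a mutable tag.
        "container_image": container_image,
        "parameters": parameters,
        "input_sha256": {name: sha256_hex(data) for name, data in inputs.items()},
    }
    # sort_keys makes the output byte-identical for identical runs.
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Two runs with the same inputs and parameters produce identical manifests. Changing any parameter, such as a quality cutoff, changes the manifest, which makes silent parameter drift visible.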
Subjects
Computational Biology, Software, Humans, Computational Biology/methods, Reproducibility of Results, Receptors, Immunologic/genetics, High-Throughput Nucleotide Sequencing/methods, Adaptive Immunity/genetics, Guidelines as Topic
ABSTRACT
The reconstruction of clonal families (CFs) in B-cell receptor (BCR) repertoire analysis is a crucial step in understanding the adaptive immune system and how it responds to antigens. The BCR repertoire of an individual is formed throughout life and is diverse due to several factors, such as gene recombination and somatic hypermutation. Adaptive immune receptor repertoire sequencing (AIRR-seq) with next-generation sequencing has enabled the generation of full BCR repertoires that also include rare CFs. Reconstructing CFs from AIRR-seq data is challenging, and several approaches have been developed to solve this problem. Currently, most methods use only the heavy chain (HC), as it is more variable than the light chain (LC). CF reconstruction options include the definition of appropriate sequence similarity measures, the use of shared mutations among sequences, and the possibility of reconstruction without preliminary clustering based on V- and J-gene annotation. In this study, we aimed to systematically evaluate different approaches for CF reconstruction and to determine their impact on various outcome measures, such as the number of CFs derived, the size of the CFs, and the accuracy of the reconstruction. The methods were compared with each other, with a method that groups sequences based on identical junction sequences, and with a method that only determines subclones. We found that, after accounting for data set variability, in particular sequencing depth and mutation load, the reconstruction approach affects some of the outcome measures, including the number of CFs. Simulations indicate that unique junctions and subclones should not be used as substitutes for CFs and that more complex methods do not outperform simpler methods. We also conclude that the approaches differ in their ability to correctly reconstruct CFs when the LC is not considered, and in their ability to identify shared CFs.
The results showed the effect of different approaches on the reconstruction of CFs and highlighted the importance of choosing an appropriate method.
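The following is a minimal sketch of one common, simple style of CF reconstruction. Sequences are first partitioned by V gene, J gene, and junction length. Junctions within each partition are then grouped by single-linkage clustering on length-normalized Hamming distance. The `Rearrangement` record and the 0.15 threshold are hypothetical choices. This is not any specific method evaluated in the study.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Rearrangement(NamedTuple):
    sequence_id: str
    v_gene: str    # V gene without allele, e.g. "IGHV1-2"
    j_gene: str    # J gene without allele, e.g. "IGHJ4"
    junction: str  # junction nucleotide sequence


def _find(parent: List[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def clonal_families(seqs: List[Rearrangement], threshold: float = 0.15) -> List[List[str]]:
    """Partition by (V gene, J gene, junction length), then single-linkage
    cluster junctions whose length-normalized Hamming distance <= threshold."""
    partitions: Dict[Tuple[str, str, int], List[Rearrangement]] = defaultdict(list)
    for s in seqs:
        partitions[(s.v_gene, s.j_gene, len(s.junction))].append(s)

    families: List[List[str]] = []
    for members in partitions.values():
        parent = list(range(len(members)))
        length = len(members[0].junction)  # identical within a partition
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                mismatches = sum(a != b for a, b in zip(members[i].junction, members[j].junction))
                if length == 0 or mismatches / length <= threshold:
                    parent[_find(parent, i)] = _find(parent, j)
        groups: Dict[int, List[str]] = defaultdict(list)
        for i, member in enumerate(members):
            groups[_find(parent, i)].append(member.sequence_id)
        families.extend(groups.values())
    return families
```

Setting `threshold=0.0` collapses this to grouping identical junctions within each V/J partition. That approximates the identical-junction baseline which the abstract cautions against using as a substitute for CFs.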
Subjects
B-Lymphocytes, Receptors, Antigen, B-Cell, Humans, Mutation, Receptors, Antigen, B-Cell/genetics, High-Throughput Nucleotide Sequencing
ABSTRACT
INTRODUCTION: Dual-expressing lymphocytes (DEs) are unique immune cells that express both B cell receptors (BCRs, surface antibody) and T cell receptors (TCRs). In type 1 diabetes, DE antibodies are dominated by a single antibody (x-mAb), an IgM monoclonal antibody with a germline-encoded CDR3 that recognizes self-reactive TCRs. We explored whether x-mAb and its interacting TCRs have distinct structural features. METHODS: Using bioinformatics, we compared x-mAb and its most common interacting TCRαβ with billions of antigen receptor sequences to determine whether they were unique or randomly generated. RESULTS: X-mAb represents a unique class of human antibodies with a conserved CDR3 sequence (CARx1-4DTAMVYYFDYW), consisting of a fixed DJH motif (DTAMVYYFDYW) paired with various VH genes. A public TCRβ clonotype (CASSPGTEAFF) associated with x-mAb on DEs features two invariant segments, VβD (CASSPGT) and DJβ (PGTEAFF). These segments are key to two large families of public TCRβ clonotypes, CASSPGT-Jβx and Vβx-PGTEAFF, formed by recombining the VβD motif with Jβ genes and the DJβ motif with Vβ genes, respectively. B cells also use CASSPGT as a VHD motif for public IGH clonotypes (CASSPGT-Jβx). DISCUSSION: DEs, unlike conventional T and B cells, use invariant motifs to create public antibodies and TCRs, a trait previously seen only in cartilaginous fish.
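The motifs described above can be searched for in CDR3 amino acid sequences with simple pattern matching. This sketch makes two assumptions. First, it reads "CARx1-4DTAMVYYFDYW" as "CAR", then 1 to 4 arbitrary residues, then the fixed DJH motif. Second, it treats the VβD and DJβ motifs as a CDR3 prefix and suffix, respectively. The helper names are hypothetical, and this is not the search procedure used in the study.

```python
import re
from typing import Set

# Assumption: "CARx1-4DTAMVYYFDYW" means "CAR", then 1 to 4 arbitrary amino
# acids, then the fixed DJH motif DTAMVYYFDYW.
X_MAB_CDR3 = re.compile(r"CAR[ACDEFGHIKLMNPQRSTVWY]{1,4}DTAMVYYFDYW")

VBD_MOTIF = "CASSPGT"  # invariant VβD segment (start of the CDR3)
DJB_MOTIF = "PGTEAFF"  # invariant DJβ segment (end of the CDR3)


def is_x_mab_like(cdr3_aa: str) -> bool:
    """True if a heavy-chain CDR3 (amino acids) matches the x-mAb pattern."""
    return X_MAB_CDR3.fullmatch(cdr3_aa) is not None


def tcrb_motif_families(cdr3_aa: str) -> Set[str]:
    """Which invariant-motif families a TCRβ CDR3 falls into (possibly both)."""
    families = set()
    if cdr3_aa.startswith(VBD_MOTIF):
        families.add("VbD")  # VβD motif joined to some Jβ gene
    if cdr3_aa.endswith(DJB_MOTIF):
        families.add("DJb")  # DJβ motif joined to some Vβ gene
    return families
```

The public clonotype CASSPGTEAFF contains both motifs, so it falls into both families.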
Subjects
Antibodies, Monoclonal, Humans, Antibodies, Monoclonal/immunology, Complementarity Determining Regions/genetics, Complementarity Determining Regions/immunology, Diabetes Mellitus, Type 1/immunology, Diabetes Mellitus, Type 1/genetics, Computational Biology/methods, Receptors, Antigen, B-Cell/immunology, Receptors, Antigen, B-Cell/genetics, Receptors, Antigen, B-Cell/metabolism, Amino Acid Motifs, Immunoglobulin M/immunology, Receptors, Antigen, T-Cell/immunology, Receptors, Antigen, T-Cell/genetics, Receptors, Antigen, T-Cell/metabolism, Receptors, Antigen, T-Cell, alpha-beta/genetics, Receptors, Antigen, T-Cell, alpha-beta/immunology, Receptors, Antigen, T-Cell, alpha-beta/metabolism, Amino Acid Sequence
ABSTRACT
The genomes of classical inbred mouse strains include genes derived from all three major subspecies of the house mouse, Mus musculus. We recently posited that genetic diversity in the immunoglobulin heavy chain (IGH) gene loci of C57BL/6 and BALB/c mice reflects differences in subspecies origin. To investigate this hypothesis, we conducted high-throughput sequencing of IGH gene rearrangements to document IGH variable (IGHV), joining (IGHJ), and diversity (IGHD) genes in four wild-derived inbred mouse strains (CAST/EiJ, LEWES/EiJ, MSM/MsJ, and PWD/PhJ) and a single disease model strain (NOD/ShiLtJ), collectively representing the genetic backgrounds of several major mouse subspecies. A total of 341 germline IGHV sequences were inferred in the wild-derived strains, including 247 not curated in the international ImMunoGeneTics information system. By contrast, 83 of 84 inferred NOD IGHV genes had previously been observed in C57BL/6 mice. Among the strains examined, variability was observed for only a single IGHJ gene, for which a novel allele was described. By contrast, unexpected variation was found in the IGHD gene loci, where four previously unreported IGHD gene sequences were documented. Very few IGHV sequences of C57BL/6 and BALB/c mice were shared with strains representing major subspecies, suggesting that their IGH loci may be complex mosaics of genes of disparate origins. A similar level of diversity is therefore likely to be present in the IGH loci of other classical inbred strains. This diversity must now be documented if we are to properly understand interstrain variation in models of antibody-mediated disease.
Subjects
Immunoglobulin Heavy Chains/genetics, Immunoglobulin Variable Region/genetics, Animals, Base Sequence, Databases, Genetic, Germ Cells/metabolism, Mice, Inbred C57BL, Mice, Inbred NOD
ABSTRACT
BACKGROUND: Although B-cell depleting therapy in rheumatoid arthritis (RA) is clearly effective, response is variable and does not correlate with B-cell depletion itself. METHODS: The B-cell receptor (BCR) repertoire was prospectively analyzed in peripheral blood samples from twenty-eight RA patients undergoing rituximab therapy. The timepoints at which BCR depletion and BCR repopulation were achieved were defined based on the percentage of unmutated BCRs in the repertoire. The predictive value of early BCR depletion (within one month post-treatment) and early BCR repopulation (within 6 months post-treatment) for clinical response was assessed. RESULTS: We observed changes in the peripheral blood BCR repertoire after rituximab treatment, i.e., increased clonal expansion, decreased clonal diversification, and increased mutation load. These changes persisted up to 12 months after treatment but started to revert at month 6. Early BCR depletion was not associated with early clinical response, but late depleters did show an early response. Patients with early repopulation with unmutated BCRs showed a significant decrease in disease activity in the interval from 6 to 12 months. Development of anti-drug antibodies showed a non-significant correlation with greater BCR repopulation. CONCLUSION: Our findings indicate that it is repopulation with unmutated BCRs, possibly from naïve B cells, rather than BCR depletion that induces remission. This suggests that (pre-existing) differences in B-cell turnover between patients explain the interindividual differences in early clinical effect.
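The abstract defines depletion and repopulation timepoints from the percentage of unmutated BCRs, but it does not give the exact cutoffs. The sketch below shows one way such a rule could be computed. The 20% and 50% thresholds, the month-indexed input, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def percent_unmutated(v_identities: List[float]) -> float:
    """Percentage of BCRs whose V segment is identical to germline.

    `v_identities` are per-sequence V-gene identities in [0, 1]
    (1.0 means no somatic mutations).
    """
    if not v_identities:
        raise ValueError("no sequences")
    unmutated = sum(1 for identity in v_identities if identity >= 1.0)
    return 100.0 * unmutated / len(v_identities)


def depletion_and_repopulation(
    series: Dict[int, float],
    depletion_cutoff: float = 20.0,     # hypothetical threshold (%)
    repopulation_cutoff: float = 50.0,  # hypothetical threshold (%)
) -> Tuple[Optional[int], Optional[int]]:
    """Return (month depleted, month repopulated) from % unmutated BCRs per month.

    Month 0 is the pre-treatment baseline and is skipped. Depletion is the first
    month the unmutated fraction drops below `depletion_cutoff`; repopulation is
    the first later month it rises to at least `repopulation_cutoff`.
    """
    depleted_at: Optional[int] = None
    for month in sorted(series):
        if month == 0:
            continue
        pct = series[month]
        if depleted_at is None:
            if pct < depletion_cutoff:
                depleted_at = month
        elif pct >= repopulation_cutoff:
            return depleted_at, month
    return depleted_at, None
```

With the study's windows, a patient would count as an early depleter if the first value is at most 1. The same patient would count as an early repopulator if the second value is at most 6.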
Subjects
Antirheumatic Agents, Arthritis, Rheumatoid, Humans, Rituximab/therapeutic use, Rituximab/pharmacology, Antirheumatic Agents/therapeutic use, Antirheumatic Agents/pharmacology, Arthritis, Rheumatoid/drug therapy, Arthritis, Rheumatoid/genetics, B-Lymphocytes, Receptors, Antigen, B-Cell/genetics, Receptors, Antigen, B-Cell/therapeutic use
ABSTRACT
Analysis of an individual's immunoglobulin or T cell receptor gene repertoire can provide important insights into immune function. High-quality analysis of adaptive immune receptor repertoire sequencing data depends upon accurate and relatively complete germline sets, but current sets are known to be incomplete. Established processes for the review and systematic naming of receptor germline genes and alleles require specific evidence and data types, but the discovery landscape is rapidly changing. To exploit the potential of emerging data, and to provide the field with improved state-of-the-art germline sets, an intermediate approach is needed that will allow the rapid publication of consolidated sets derived from these emerging sources. These sets must use a consistent naming scheme and allow refinement and consolidation into genes as new information emerges. Name changes should be minimised, but, where changes occur, the naming history of a sequence must be traceable. Here we outline the current issues and opportunities for the curation of germline IG/TR genes and present a forward-looking data model for building out more robust germline sets that can dovetail with current established processes. We describe interoperability standards for germline sets, and an approach to transparency based on principles of findability, accessibility, interoperability, and reusability.
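The requirement that "the naming history of a sequence must be traceable" can be illustrated with a small record type. Each germline sequence keeps its current label plus an append-only list of past renames. This is a hypothetical sketch, not the data model the authors propose. The class names and labels such as "IGHV-HYP1" are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameChange:
    old_label: str
    new_label: str
    reason: str


@dataclass
class GermlineSequence:
    sequence: str
    label: str
    history: List[NameChange] = field(default_factory=list)

    def rename(self, new_label: str, reason: str) -> None:
        """Change the label, recording the old one; no-op if unchanged."""
        if new_label == self.label:
            return
        self.history.append(NameChange(self.label, new_label, reason))
        self.label = new_label

    def all_labels(self) -> List[str]:
        """Every label this sequence has carried, oldest first."""
        return [change.old_label for change in self.history] + [self.label]

    def known_as(self, label: str) -> bool:
        """True if the sequence has ever carried `label`."""
        return label in self.all_labels()
```

Keeping history on the sequence rather than in a separate lookup table means a result annotated with an old label can still be resolved to the current one.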
ABSTRACT
Introduction: The success of the human body in fighting SARS-CoV-2 infection relies on lymphocytes and their antigen receptors. Identifying and characterizing clinically relevant receptors is of utmost importance. Methods: We report here the application of a machine learning approach to B cell receptor repertoire sequencing data from individuals with severe or mild SARS-CoV-2 infection, compared with uninfected controls. Results: In contrast to previous studies, our approach successfully distinguishes non-infected from infected individuals and stratifies infected individuals by disease severity. The features that drive this classification are based on somatic hypermutation patterns and point to alterations in the somatic hypermutation process in COVID-19 patients. Discussion: These features may be used to build and adapt therapeutic strategies for COVID-19, in particular to quantitatively assess potential diagnostic and therapeutic antibodies. These results constitute a proof of concept for future epidemiological challenges.
Subjects
B-Lymphocytes, COVID-19, Humans, Receptors, Antigen, B-Cell/genetics, RNA, Viral, SARS-CoV-2/genetics, Patient Acuity
ABSTRACT
Vaccination against SARS-CoV-2 with either BNT162b2 or mRNA-1273 carries a low incidence of induced myocarditis. Here we report on the use of adaptive immune receptor repertoire sequencing (AIRR-seq) to assess the specificity of tissue-infiltrating immune cells.
ABSTRACT
Objectives: To characterize the T cell receptor β (TCRβ) repertoire in peripheral blood and muscle tissue of treatment-naïve patients with newly diagnosed idiopathic inflammatory myopathies (IIMs). Methods: High-throughput RNA sequencing of the TCRβ chain was performed on peripheral blood and muscle tissue from twenty newly diagnosed, treatment-naïve IIM patients (9 DM, 5 NM/OM, 5 IMNM, and 1 ASyS) and from healthy controls. The results were correlated with markers of disease activity. Results: Muscle tissue of IIM patients shows more expansion of TCRβ clones and decreased diversity compared with peripheral blood of IIM patients and of healthy controls (both p=0.0001). Several expanded TCRβ clones in muscle are tissue-restricted and cannot be retrieved from peripheral blood. These clones have significantly longer CDR3 regions than clones (also) found in the circulation (p=0.0002), and their CDR3 regions are more hydrophobic (p<0.01). Network analysis shows that clonal TCRβ signatures are shared between patients. Increased clonal expansion in muscle tissue is significantly correlated with increased creatine kinase (CK) levels (p=0.03) and tends to correlate with decreased muscle strength (p=0.08). Conclusion: Network analysis of clones in muscle of IIM patients shows shared clusters of sequences across patients. Muscle-restricted TCRβ clones show specific structural features in their CDR3 regions. Our results indicate that clonal TCRβ expansion in muscle tissue might be associated with disease activity. Collectively, these findings support a role for specific clonal T cell responses in muscle tissue in the pathogenesis of the IIM subtypes studied.
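Clonal expansion, diversity, and CDR3 hydrophobicity can each be summarized with a single number per sample or per clone. The sketch below uses two common choices, both of which are assumptions: clonality defined as 1 minus normalized Shannon entropy, and mean Kyte-Doolittle hydropathy per CDR3. The abstract does not state which metrics the study used.

```python
import math
from collections import Counter
from typing import Iterable

# Kyte-Doolittle hydropathy scale (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def clonality(clone_ids: Iterable[str]) -> float:
    """1 - normalized Shannon entropy of clone sizes.

    0 means all clones are equally sized (maximal evenness); values near 1
    mean a few clones dominate (clonal expansion).
    """
    counts = Counter(clone_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences")
    if len(counts) < 2:
        return 1.0  # a single clone is treated as maximally clonal here
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(len(counts))


def mean_hydrophobicity(cdr3_aa: str) -> float:
    """Average Kyte-Doolittle value over the CDR3 amino acids."""
    return sum(KYTE_DOOLITTLE[aa] for aa in cdr3_aa) / len(cdr3_aa)
```

Clonality is computed on clone counts (one clone ID per sequenced read or cell), so sequencing depth should be comparable across samples before comparing values.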
Subjects
Muscles, Myositis, Humans, Clone Cells, High-Throughput Nucleotide Sequencing, Receptors, Antigen, T-Cell/genetics
ABSTRACT
BACKGROUND: T and B cell receptor (TCR, BCR) repertoires constitute the foundation of adaptive immunity. Adaptive immune receptor repertoire sequencing (AIRR-seq) is a common approach to study immune system dynamics. Understanding the genetic factors influencing the composition and dynamics of these repertoires is of major scientific and clinical importance. The chromosomal loci encoding the variable regions of TCRs and BCRs are challenging to decipher due to repetitive elements and undocumented structural variants. METHODS: To confront this challenge, AIRR-seq-based methods have recently been developed for B cells, enabling genotype and haplotype inference and the discovery of undocumented alleles. However, this approach relies on complete coverage of the receptors' variable regions, whereas most T cell studies sequence only a small fraction of that region. Here, we adapted a B cell pipeline for undocumented allele discovery and genotype and haplotype inference to full and partial AIRR-seq TCR data sets. The pipeline also deals with gene assignment ambiguities, which is especially important in the analysis of data sets of partial sequences. RESULTS: From the full and partial AIRR-seq TCR data sets, we identified 39 undocumented polymorphisms in T cell receptor beta variable (TRBV) genes and 31 undocumented 5′ UTR sequences. A subset of these inferences was also observed using independent genomic approaches. We found that a single nucleotide polymorphism differentiating between the two documented T cell receptor beta diversity 2 (TRBD2) alleles is strongly associated with dramatic changes in the expressed repertoire. CONCLUSIONS: We reveal a rich picture of germline variability and demonstrate how a single nucleotide polymorphism dramatically affects the composition of the whole repertoire. Our findings provide a basis for annotation of TCR repertoires for future basic and clinical studies.
Subjects
High-Throughput Nucleotide Sequencing, Receptors, Antigen, T-Cell, alpha-beta, Alleles, Germ Cells, High-Throughput Nucleotide Sequencing/methods, Humans, Receptors, Antigen, T-Cell/genetics, Receptors, Antigen, T-Cell, alpha-beta/genetics
ABSTRACT
Adaptive immune receptor repertoires (AIRRs) are rich with information that can be mined for insights into the workings of the immune system. Gene usage, CDR3 properties, clonal lineage structure, and sequence diversity are all capable of revealing the dynamic immune response to perturbation by disease, vaccination, or other interventions. Here we focus on a conceptual introduction to the many aspects of repertoire analysis and orient the reader toward the uses and advantages of each. Along the way, we note some of the many software tools that have been developed for these investigations and link the ideas discussed to chapters on methods provided elsewhere in this volume.
Subjects
Receptors, Immunologic, Software, Receptors, Immunologic/genetics
ABSTRACT
The development of high-throughput sequencing of adaptive immune receptor repertoires (AIRR-seq of IG and TR rearrangements) has provided a new frontier for in-depth analysis of the immune system. The last decade has witnessed an explosion in protocols, experimental methodologies, and computational tools. In this chapter, we discuss the major considerations in planning a successful AIRR-seq experiment together with basic strategies for controlling and evaluating the outcome of the experiment. Members of the AIRR Community have authored several chapters in this edition, which cover step-by-step instructions to successfully conduct, analyze, and share an AIRR-seq project.
Subjects
High-Throughput Nucleotide Sequencing, Receptors, Immunologic, High-Throughput Nucleotide Sequencing/methods, Receptors, Immunologic/genetics
ABSTRACT
High-throughput sequencing of adaptive immune receptor repertoires (AIRR, i.e., IG and TR) has revolutionized the ability to carry out large-scale experiments to study the adaptive immune response. Since the method was first introduced in 2009, AIRR sequencing (AIRR-seq) has been applied to survey the immune state of individuals, identify antigen-specific or immune-state-associated signatures of immune responses, study the development of the antibody immune response, and guide the development of vaccines and antibody therapies. Recent technological advances include sequencing at the single-cell level and in parallel with gene expression, which enables multi-omics approaches to understand the adaptive immune response in detail. Analyzing AIRR-seq data can prove challenging even with high-quality sequencing, in part due to the many steps involved and the need to parameterize each step. In this chapter, we outline key factors to consider when preprocessing raw AIRR-seq data and annotating the genetic origins of the rearranged receptors. We also highlight a number of common difficulties in AIRR-seq data processing and provide strategies to address them.
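Read quality filtering is one example of a parameterized preprocessing step. The sketch below makes three assumptions: simple four-line FASTQ records, Phred+33 quality encoding, and a hypothetical mean-quality cutoff of 20. It is not the procedure of any particular toolkit. Real pipelines also handle primer masking, paired-read assembly, and UMI consensus.

```python
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from simple four-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:].split()[0], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Sanger/Illumina 1.8+ encoding (ASCII offset 33)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def quality_filter(fastq_text: str, min_mean_q: float = 20.0) -> List[str]:
    """IDs of reads whose mean quality is at least `min_mean_q` (a tunable parameter)."""
    return [rid for rid, _seq, qual in parse_fastq(fastq_text) if mean_phred(qual) >= min_mean_q]
```

Exposing `min_mean_q` as an explicit argument, rather than hard-coding it, is what makes the parameter visible and reportable.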
Subjects
Genes, Immunoglobulin, High-Throughput Nucleotide Sequencing, Antibodies/genetics, Humans, Molecular Sequence Annotation, Receptors, Immunologic/genetics
ABSTRACT
AIRR-seq data sets are usually large and require specialized analysis methods and software tools. A typical Illumina MiSeq sequencing run generates 20-30 million 2 × 300 bp paired-end sequence reads, which roughly corresponds to 15 GB of sequence data to be processed. Other platforms, such as NextSeq, which is useful in projects where the full V gene is not needed, generate about 400 million 2 × 150 bp paired-end reads. Because of the size of the data sets, the analysis can be computationally expensive, particularly the early analysis steps, such as preprocessing and gene annotation, that process the majority of the sequence data. A standard desktop PC may take 3-5 days of constant processing for a single MiSeq run, so dedicated high-performance computational resources may be required.
VDJServer provides free access to high-performance computing (HPC) at the Texas Advanced Computing Center (TACC) through a graphical user interface (Christley et al. Front Immunol 9:976, 2018). VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene assignment, repertoire characterization, and repertoire comparison. Furthermore, VDJServer parallelizes the execution of tools such as IgBLAST, so more compute resources are used as the size of the input data grows. Analysis that takes days on a desktop PC might take only a few hours on VDJServer. VDJServer is a free, publicly available, and open-source licensed resource. Here, we describe the workflow for performing immune repertoire analysis on VDJServer's high-performance computing.
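The MiSeq and NextSeq figures quoted above can be sanity-checked by counting bases. This worked example assumes 25 million read pairs for a MiSeq 2 × 300 run, the midpoint of the quoted range. It also assumes the NextSeq figure means 400 million read pairs. These are base counts. FASTQ files on disk are larger, because every base also carries a quality character and each read has a header line.

```latex
\begin{aligned}
\text{MiSeq: } & 25 \times 10^{6}\ \text{pairs} \times 2 \times 300\ \text{bp} = 1.5 \times 10^{10}\ \text{bp} \approx 15\ \text{Gb} \\
& \left(20\text{-}30 \times 10^{6}\ \text{pairs} \;\Rightarrow\; 12\text{-}18\ \text{Gb}\right) \\
\text{NextSeq: } & 400 \times 10^{6}\ \text{pairs} \times 2 \times 150\ \text{bp} = 1.2 \times 10^{11}\ \text{bp} \approx 120\ \text{Gb}
\end{aligned}
```

On this reading, a single NextSeq run carries roughly eight times as many bases as a MiSeq run. This is one reason parallelized annotation matters more as input size grows.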
Subjects
Computing Methodologies, Software, High-Throughput Nucleotide Sequencing, Workflow
ABSTRACT
High-throughput sequencing of adaptive immune receptor repertoires (AIRR, i.e., IG and TR) has revolutionized the ability to study the adaptive immune response via large-scale experiments. Since 2009, AIRR sequencing (AIRR-seq) has been widely applied to survey the immune state of individuals (see "The AIRR Community Guide to Repertoire Analysis" chapter for details). One of the goals of the AIRR Community is to make the resulting AIRR-seq data FAIR (Findable, Accessible, Interoperable, and Reusable) (Wilkinson et al. Sci Data 3:1-9, 2016), with a primary goal of making it easy for the research community to reuse AIRR-seq data (Breden et al. Front Immunol 8:1418, 2017; Scott and Breden. Curr Opin Syst Biol 24:71-77, 2020). The basis for this is the MiAIRR data standard (Rubelt et al. Nat Immunol 18:1274-1278, 2017). For long-term preservation, it is recommended that researchers store their sequence read data in an INSDC repository. At the same time, the AIRR Community has established the AIRR Data Commons (Christley et al. Front Big Data 3:22, 2020), a distributed set of AIRR-compliant repositories that store the critically important annotated AIRR-seq data based on the MiAIRR standard. This makes the data findable and interoperable and, because the data are annotated, more valuable for reuse. Here, we build on the other AIRR Community chapters and illustrate how these principles and standards can be incorporated into AIRR-seq data analysis workflows. We discuss the importance of careful curation of metadata to ensure reproducibility and facilitate data sharing and reuse, and we illustrate how data can be shared via the AIRR Data Commons.
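Before annotated data are shared, a quick check that a rearrangement file carries the AIRR-required columns catches many interoperability problems early. The sketch below hard-codes our understanding of the required fields of the AIRR Rearrangement schema (v1.x). Treat that list as an assumption to verify against the current AIRR Standards documentation. For real submissions, the AIRR Community's reference `airr` library provides official validation utilities.

```python
import csv
import io
from typing import List

# Assumed required fields of the AIRR Rearrangement schema (v1.x);
# verify against the current AIRR Standards documentation.
REQUIRED_REARRANGEMENT_FIELDS = (
    "sequence_id", "sequence", "rev_comp", "productive",
    "v_call", "d_call", "j_call",
    "sequence_alignment", "germline_alignment",
    "junction", "junction_aa",
    "v_cigar", "d_cigar", "j_cigar",
)


def missing_required_fields(tsv_text: str) -> List[str]:
    """Return required AIRR columns absent from the TSV header, in schema order."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])
    present = set(header)
    return [f for f in REQUIRED_REARRANGEMENT_FIELDS if f not in present]
```

This checks only column presence, not value types or MiAIRR study metadata. Both still need proper validation before deposition in an AIRR Data Commons repository.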
Subjects
Information Dissemination, Research Design, High-Throughput Nucleotide Sequencing, Humans, Information Dissemination/methods, Reproducibility of Results, Workflow
ABSTRACT
The role of B cells in the tumor microenvironment (TME) remains largely under-investigated, and data on the antibody repertoire encoded by B cells in the TME and the adjacent lymphoid organs are scarce. Here, we used B cell receptor high-throughput sequencing (BCR-Seq) to profile the antibody repertoire signature of tumor-infiltrating B lymphocytes (TIL-Bs), in comparison with B cells from three anatomic compartments, in a mouse model of triple-negative breast cancer. We found that TIL-Bs exhibit distinct antibody repertoire measures, including high clonal polarization and elevated somatic hypermutation rates, suggesting a local antigen-driven B-cell response. Importantly, TIL-Bs were highly mutated but not class-switched, suggesting that class-switch recombination may be inhibited in the TME. Tracing the distribution of TIL-B clones across compartments indicated that they migrate to and from the TME. These data thus suggest that antibody repertoire signatures can serve as indicators for identifying tumor-reactive B cells.
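Tracing clones across compartments usually starts from a pairwise overlap measure on clone identifiers. The sketch below uses the Jaccard index, an assumed choice since the abstract does not name its metric. The compartment names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared clones divided by all clones seen in either compartment."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_clone_overlap(compartments: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard overlap of clone IDs for every pair of compartments (names sorted)."""
    return {
        (x, y): jaccard(compartments[x], compartments[y])
        for x, y in combinations(sorted(compartments), 2)
    }
```

Clone IDs must come from a joint clonal assignment run across all compartments together. Otherwise the same lineage can receive different IDs in different samples and the overlap is underestimated.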
Subjects
Antibody Diversity, B-Lymphocyte Subsets/immunology, Immunoglobulin Heavy Chains/genetics, Lymphocytes, Tumor-Infiltrating/immunology, Mammary Neoplasms, Experimental/immunology, Receptors, Antigen, B-Cell/immunology, Triple Negative Breast Neoplasms/immunology, Tumor Microenvironment/immunology, Animals, Blood Cells/immunology, Bone Marrow/pathology, Cell Line, Tumor/transplantation, Cell Movement, Female, High-Throughput Nucleotide Sequencing, Humans, Immunoglobulin G/genetics, Immunoglobulin G/immunology, Immunoglobulin M/genetics, Immunoglobulin M/immunology, Immunoglobulin Variable Region/genetics, Lymph Nodes/pathology, Mammary Neoplasms, Experimental/pathology, Mice, Mice, Inbred BALB C, Organ Specificity, Receptors, Antigen, B-Cell/genetics, Somatic Hypermutation, Immunoglobulin, Triple Negative Breast Neoplasms/pathology
ABSTRACT
To better understand how the immune system interacts with environmental triggers to produce organ-specific disease, we here address the hypothesis that B and plasma cells are free to migrate through the mucosal surfaces of the upper and lower respiratory tracts. We further hypothesize that their total antibody repertoire is modified in a common respiratory tract disease, in this case atopic asthma. Using adaptive immune receptor repertoire sequencing (AIRR-seq), we catalogued the antibody repertoires of B cell clones retrieved near-contemporaneously from multiple sites in the upper and lower respiratory tract mucosa of adult volunteers with atopic asthma and of non-atopic controls, and traced their migration. We show that the lower and upper respiratory tracts are immunologically connected, with trafficking of B cells directionally biased from the upper to the lower respiratory tract, and with points of selection during migration from the nasal mucosa into the bronchial mucosa. The repertoires are characterized by both IgD-only B cells and others undergoing class switch recombination, with a restriction of the antibody repertoire that differs between asthmatics and controls. We conclude that B cells and plasma cells migrate freely throughout the respiratory tract and exhibit distinct antibody repertoires in health and disease.
Subjects
Antigens/immunology, Asthma/immunology, B-Lymphocytes/immunology, Antibodies/immunology, Bronchi/immunology, Cell Movement/immunology, Humans, Immunoglobulin D/immunology, Nasal Mucosa/immunology, Plasma Cells/immunology
ABSTRACT
Immunoglobulin genes are rarely considered as disease susceptibility genes, despite their obvious and central contributions to immune function. This appears to be a consequence of historical views on antibody repertoire formation that no longer stand, and of difficulties that until recently surrounded the documentation of the suite of antibody genes in any individual. If these important genes are to be accessible to genome-wide association studies (GWAS), allelic variation within the human population needs to be better documented, and a curated set of genomic variations associated with antibody genes needs to be formulated. Repertoire studies arising from the COVID-19 pandemic provide an opportunity to meet these needs and may provide insights into the profound variability seen in outcomes of this infection.
ABSTRACT
The adaptive immune receptor repertoire (AIRR) contains information on an individual's immune past, present, and potential in the form of the evolving sequences that encode the B cell receptor (BCR) repertoire. AIRR sequencing (AIRR-seq) studies rely on databases of known BCR germline variable (V), diversity (D), and joining (J) genes to detect somatic mutations in AIRR-seq data via comparison with the best-aligning database alleles. However, these databases have been shown to be far from complete, leading to systematic misidentification of mutated positions in subsets of sample sequences. We previously presented TIgGER, a computational method to identify subject-specific V gene genotypes, including the presence of novel V gene alleles, directly from AIRR-seq data. However, the original algorithm was unable to detect alleles that differed by more than 5 single nucleotide polymorphisms (SNPs) from a database allele. Here we present and apply an improved version of the TIgGER algorithm that can detect alleles differing by any number of SNPs from the nearest database allele and can construct subject-specific genotypes with minimal prior information. TIgGER predictions are validated both computationally (using a leave-one-out strategy) and experimentally (using genomic sequencing), resulting in the addition of three new immunoglobulin heavy chain V (IGHV) gene alleles to the IMGT repertoire. Finally, we develop a Bayesian strategy to provide a confidence estimate associated with genotype calls. Together, these methods allow much higher accuracy in germline allele assignment, an essential step in AIRR-seq studies.
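The intuition behind mutation-pattern-based novel allele detection can be shown in a few lines. A true SNP in an unrecorded allele appears "mutated" even in sequences carrying almost no other mutations. Somatic hypermutation at a position, by contrast, becomes more frequent as the overall mutation load rises. The sketch below bins sequences by mutation count and regresses per-position mutation frequency on that count. It then flags positions with a high intercept. This is a simplified illustration written in the spirit of that idea, not TIgGER's actual implementation (the R package `tigger`). The 0.125 cutoff and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def mutated_positions(seq: str, germline: str) -> Set[int]:
    """0-based positions where an aligned read differs from the germline allele.

    Gaps ('.') and ambiguous bases ('N') are ignored; reads are assumed to be
    pre-aligned to the germline with equal length.
    """
    return {
        i for i, (s, g) in enumerate(zip(seq, germline))
        if s != g and "N" not in (s, g) and "." not in (s, g)
    }


def _intercept(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares intercept of y = a + b*x (mean y if all x are equal)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x


def candidate_polymorphisms(
    seqs: List[str], germline: str, max_mutations: int = 10, intercept_cutoff: float = 0.125
) -> List[int]:
    """Flag positions that look mutated even in lightly mutated sequences."""
    bins: Dict[int, List[Set[int]]] = defaultdict(list)
    for seq in seqs:
        muts = mutated_positions(seq, germline)
        if len(muts) <= max_mutations:
            bins[len(muts)].append(muts)
    if not bins:
        return []
    xs = sorted(bins)
    flagged = []
    for pos in range(len(germline)):
        # Fraction of sequences in each mutation-count bin mutated at `pos`.
        ys = [sum(pos in m for m in bins[k]) / len(bins[k]) for k in xs]
        if _intercept(xs, ys) > intercept_cutoff:
            flagged.append(pos)
    return flagged
```

A flagged position is only a candidate. Confirmation needs the kinds of evidence described above, such as leave-one-out checks, genomic sequencing, or a confidence estimate on the genotype call.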
Subjects
Immunoglobulins/genetics, Algorithms, Alleles, Bayes Theorem, Genotype, Humans, Myasthenia Gravis/immunology, Sequence Analysis, DNA
ABSTRACT
Immunoglobulins or antibodies are the main effector molecules of the B-cell lineage and are encoded by hundreds of variable (V), diversity (D), and joining (J) germline genes, which recombine to generate enormous IG diversity. Recently, high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) of recombined V-(D)-J genes has offered unprecedented insights into the dynamics of IG repertoires in health and disease. Faithful biological interpretation of AIRR-seq studies depends upon the annotation of raw AIRR-seq data, using reference germline gene databases to identify the germline genes within each rearrangement. Existing reference databases are incomplete, as shown by recent AIRR-seq studies that have inferred the existence of many previously unreported polymorphisms. Completing the documentation of genetic variation in germline gene databases is therefore of crucial importance. Lymphocyte receptor genes and alleles are currently assigned by the Immunoglobulins, T cell Receptors and Major Histocompatibility Nomenclature Subcommittee of the International Union of Immunological Societies (IUIS) and managed in IMGT®, the international ImMunoGeneTics information system® (IMGT). In 2017, the IMGT Group reached agreement with a group of AIRR-seq researchers on the principles of a streamlined process for identifying and naming inferred allelic sequences, for their incorporation into IMGT®. These researchers represented the AIRR Community, a network of over 300 researchers whose objective is to promote all aspects of immunoglobulin and T-cell receptor repertoire studies, including the standardization of experimental and computational aspects of AIRR-seq data generation and analysis. The Inferred Allele Review Committee (IARC) was established by the AIRR Community to devise policies, criteria, and procedures to perform this function. 
Formalized evaluations of novel inferred sequences have now begun, and submissions are invited via a new dedicated portal (https://ogrdb.airr-community.org). Here, we summarize recommendations developed by the IARC, focusing initially on human IGHV genes, with the goal of facilitating the acceptance of inferred allelic variants of germline IGHV genes. We believe that this initiative will improve the quality of AIRR-seq studies by facilitating the description of human IG germline gene variation, and that, in time, it will expand to the documentation of TR and IG genes in many vertebrate species.