RESUMO
Introduction: Analysis of an individual's immunoglobulin (IG) gene repertoire requires the use of high-quality germline gene reference sets. When sets only contain alleles supported by strong evidence, AIRR sequencing (AIRR-seq) data analysis is more accurate and studies of the evolution of IG genes, their allelic variants and the expressed immune repertoire is therefore facilitated. Methods: The Adaptive Immune Receptor Repertoire Community (AIRR-C) IG Reference Sets have been developed by including only human IG heavy and light chain alleles that have been confirmed by evidence from multiple high-quality sources. To further improve AIRR-seq analysis, some alleles have been extended to deal with short 3' or 5' truncations that can lead them to be overlooked by alignment utilities. To avoid other challenges for analysis programs, exact paralogs (e.g. IGHV1-69*01 and IGHV1-69D*01) are only represented once in each set, though alternative sequence names are noted in accompanying metadata. Results and discussion: The Reference Sets include less than half the previously recognised IG alleles (e.g. just 198 IGHV sequences), and also include a number of novel alleles: 8 IGHV alleles, 2 IGKV alleles and 5 IGLV alleles. Despite their smaller sizes, erroneous calls were eliminated, and excellent coverage was achieved when a set of repertoires comprising over 4 million V(D)J rearrangements from 99 individuals were analyzed using the Sets. The version-tracked AIRR-C IG Reference Sets are freely available at the OGRDB website (https://ogrdb.airr-community.org/germline_sets/Human) and will be regularly updated to include newly observed and previously reported sequences that can be confirmed by new high-quality data.
Assuntos
Genes de Imunoglobulinas , Imunoglobulinas , Humanos , Imunoglobulinas/genética , Alelos , Recombinação V(D)J/genética , Células GerminativasRESUMO
The formation of the B cell receptor (BCR) heavy chain variable region is derived from the germline V(D)J gene rearrangement according to the "12/23" rule and the "beyond 12/23" rule. The usage frequency of each V(D)J gene in the peripheral BCR repertoires is related to the initial recombination, self-tolerance selection, and the clonal proliferative response. However, their specific differences and possible mechanisms are still unknown. We analyzed in-frame and out-of-frame BCR-H repertoires from human samples with normal physiological and various pathological conditions by high-throughput sequencing. Our results showed that IGHJ gene frequency follows a similar pattern which is previously known, where IGHJ4 is used at high frequency (>40%), IGHJ6/IGHJ3/IGHJ5 is used at medium frequencies (10â¼20%), and IGH2/IGHJ1 is used at low frequency (<4%) under whether normal physiological or various pathological conditions. However, our analysis of the recombination signal sequences suggested that the conserved non-amer and heptamer and certain 23 bp spacer length may affect the initial IGHD-IGHJ recombination, which results in different frequencies of IGHJ genes among the initial BCR-H repertoire. Based on this "initial repertoire," we recommend that re-evaluation and further investigation are needed when analyzing the significance and mechanism of IGHJ gene frequency in self-tolerance selection and the clonal proliferative response.
RESUMO
The genomes of classical inbred mouse strains include genes derived from all three major subspecies of the house mouse, Mus musculus. We recently posited that genetic diversity in the immunoglobulin heavy chain (IGH) gene loci of C57BL/6 and BALB/c mice reflects differences in subspecies origin. To investigate this hypothesis, we conducted high-throughput sequencing of IGH gene rearrangements to document IGH variable (IGHV), joining (IGHJ) and diversity (IGHD) genes in four inbred wild-derived mouse strains (CAST/EiJ, LEWES/EiJ, MSM/MsJ and PWD/PhJ) and a single disease model strain (NOD/ShiLtJ), collectively representing genetic backgrounds of several major mouse subspecies. A total of 341 germline IGHV sequences were inferred in the wild-derived strains, including 247 not curated in the international ImMunoGeneTics information system. By contrast, 83/84 inferred NOD IGHV genes had previously been observed in C57BL/6 mice. Variability among the strains examined was observed for only a single IGHJ gene, involving a description of a novel allele. By contrast, unexpected variation was found in the IGHD gene loci, with four previously unreported IGHD gene sequences being documented. Very few IGHV sequences of C57BL/6 and BALB/c mice were shared with strains representing major subspecies, suggesting that their IGH loci may be complex mosaics of genes of disparate origins. This suggests a similar level of diversity is likely present in the IGH loci of other classical inbred strains. This must now be documented if we are to properly understand interstrain variation in models of antibody-mediated disease.
Assuntos
Cadeias Pesadas de Imunoglobulinas/genética , Região Variável de Imunoglobulina/genética , Animais , Sequência de Bases , Bases de Dados Genéticas , Células Germinativas/metabolismo , Camundongos Endogâmicos C57BL , Camundongos Endogâmicos NODAssuntos
Regulação Neoplásica da Expressão Gênica , Cadeias Pesadas de Imunoglobulinas , Região Variável de Imunoglobulina , Linfoma de Células B , Proteínas de Neoplasias , Receptores de Antígenos de Linfócitos B , Fator Reumatoide , Afinidade de Anticorpos , Linhagem Celular Tumoral , Humanos , Cadeias Pesadas de Imunoglobulinas/biossíntese , Cadeias Pesadas de Imunoglobulinas/genética , Região Variável de Imunoglobulina/biossíntese , Região Variável de Imunoglobulina/genética , Linfoma de Células B/genética , Linfoma de Células B/metabolismo , Proteínas de Neoplasias/biossíntese , Proteínas de Neoplasias/genética , Receptores de Antígenos de Linfócitos B/biossíntese , Receptores de Antígenos de Linfócitos B/genéticaRESUMO
The human and mouse antibody repertoires are formed by identical processes, but like all small animals, mice only have sufficient lymphocytes to express a small part of the potential antibody repertoire. In this study, we determined how the heavy chain repertoires of two mouse strains are generated. Analysis of IgM- and IgG-associated VDJ rearrangements generated by high-throughput sequencing confirmed the presence of 99 functional immunoglobulin heavy chain variable (IGHV) genes in the C57BL/6 genome, and inferred the presence of 164 IGHV genes in the BALB/c genome. Remarkably, only five IGHV sequences were common to both strains. Compared with humans, little N nucleotide addition was seen in the junctions of mouse VDJ genes. Germline human IgG-associated IGHV genes are rare, but many murine IgG-associated IGHV genes were unmutated. Together these results suggest that the expressed mouse repertoire is more germline-focused than the human repertoire. The apparently divergent germline repertoires of the mouse strains are discussed with reference to reports that inbred mouse strains carry blocks of genes derived from each of the three subspecies of the house mouse. We hypothesize that the germline genes of BALB/c and C57BL/6 mice may originally have evolved to generate distinct germline-focused antibody repertoires in the different mouse subspecies.