1.

Guidelines for reproducible analysis of adaptive immune receptor repertoire sequencing data.

Peres, Ayelet; Klein, Vered; Frankel, Boaz; Lees, William; Polak, Pazit; Meehan, Mark; Rocha, Artur; Correia Lopes, João; Yaari, Gur.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article En | MEDLINE | ID: mdl-38752856

Enhancing the reproducibility and comprehension of adaptive immune receptor repertoire sequencing (AIRR-seq) data analysis is critical for scientific progress. This study presents guidelines for reproducible AIRR-seq data analysis, and a collection of ready-to-use pipelines with comprehensive documentation. To this end, ten common pipelines were implemented using ViaFoundry, a user-friendly interface for pipeline management and automation. This is accompanied by versioned containers, documentation and archiving capabilities. The automation of pre-processing analysis steps and the ability to modify pipeline parameters according to specific research needs are emphasized. AIRR-seq data analysis is highly sensitive to varying parameters and setups; using the guidelines presented here, the ability to reproduce previously published results is demonstrated. This work promotes transparency, reproducibility, and collaboration in AIRR-seq data analysis, serving as a model for handling and documenting bioinformatics pipelines in other research domains.

Computational Biology , Software , Humans , Computational Biology/methods , Reproducibility of Results , Receptors, Immunologic/genetics , High-Throughput Nucleotide Sequencing/methods , Adaptive Immunity/genetics , Guidelines as Topic

2.

Digger: directed annotation of immunoglobulin and T cell receptor V, D, and J gene sequences and assemblies.

Lees, William D; Saha, Swati; Yaari, Gur; Watson, Corey T.

Bioinformatics ; 40(3), 2024 Mar 04.

Article En | MEDLINE | ID: mdl-38478393

SUMMARY: Knowledge of immunoglobulin and T cell receptor encoding genes is derived from high-quality genomic sequencing. High-throughput sequencing is delivering large volumes of data, and precise, high-throughput approaches to annotation are needed. Digger is an automated tool that identifies coding and regulatory regions of these genes, with results comparable to those obtained by current expert curational methods. AVAILABILITY AND IMPLEMENTATION: Digger is published under an open source license at https://github.com/williamdlees/Digger and is available as a Python package and a Docker container.

Receptors, Antigen, T-Cell , Software , Receptors, Antigen, T-Cell/genetics , Chromosome Mapping , Immunoglobulins/genetics , High-Throughput Nucleotide Sequencing/methods

3.

IGHV allele similarity clustering improves genotype inference from adaptive immune receptor repertoire sequencing data.

Peres, Ayelet; Lees, William D; Rodriguez, Oscar L; Lee, Noah Y; Polak, Pazit; Hope, Ronen; Kedmi, Meirav; Collins, Andrew M; Ohlin, Mats; Kleinstein, Steven H; Watson, Corey T; Yaari, Gur.

Nucleic Acids Res ; 51(16): e86, 2023 09 08.

Article En | MEDLINE | ID: mdl-37548401

In adaptive immune receptor repertoire analysis, determining the germline variable (V) allele associated with each T- and B-cell receptor sequence is a crucial step. This process is highly impacted by allele annotations. Aligning sequences, assigning them to specific germline alleles, and inferring individual genotypes are challenging when the repertoire is highly mutated, or sequence reads do not cover the whole V region. Here, we propose an alternative naming scheme for the V alleles, as well as a novel method to infer individual genotypes. We demonstrate the strengths of the two by comparing their outcomes to other genotype inference methods. We validate the genotype approach with independent genomic long-read data. The naming scheme is compatible with current annotation tools and pipelines. Analysis results can be converted from the proposed naming scheme to the nomenclature determined by the International Union of Immunological Societies (IUIS). Both the naming scheme and the genotype procedure are implemented in a freely available R package (PIgLET https://bitbucket.org/yaarilab/piglet). To allow researchers to further explore the approach on real data and to adapt it for their uses, we also created an interactive website (https://yaarilab.github.io/IGHV_reference_book).

Genomics , Immunoglobulin Heavy Chains , Receptors, Antigen, B-Cell , Alleles , Genotype , Receptors, Antigen, B-Cell/genetics , Immunoglobulin Heavy Chains/genetics

4.

AIRR community curation and standardised representation for immunoglobulin and T cell receptor germline sets.

Lees, William D; Christley, Scott; Peres, Ayelet; Kos, Justin T; Corrie, Brian; Ralph, Duncan; Breden, Felix; Cowell, Lindsay G; Yaari, Gur; Corcoran, Martin; Karlsson Hedestam, Gunilla B; Ohlin, Mats; Collins, Andrew M; Watson, Corey T; Busse, Christian E.

Immunoinformatics (Amst) ; 10, 2023 Jun.

Article En | MEDLINE | ID: mdl-37388275

Analysis of an individual's immunoglobulin or T cell receptor gene repertoire can provide important insights into immune function. High-quality analysis of adaptive immune receptor repertoire sequencing data depends upon accurate and relatively complete germline sets, but current sets are known to be incomplete. Established processes for the review and systematic naming of receptor germline genes and alleles require specific evidence and data types, but the discovery landscape is rapidly changing. To exploit the potential of emerging data, and to provide the field with improved state-of-the-art germline sets, an intermediate approach is needed that will allow the rapid publication of consolidated sets derived from these emerging sources. These sets must use a consistent naming scheme and allow refinement and consolidation into genes as new information emerges. Name changes should be minimised, but, where changes occur, the naming history of a sequence must be traceable. Here we outline the current issues and opportunities for the curation of germline IG/TR genes and present a forward-looking data model for building out more robust germline sets that can dovetail with current established processes. We describe interoperability standards for germline sets, and an approach to transparency based on principles of findability, accessibility, interoperability, and reusability.

5.

AIRR-C IG Reference Sets: curated sets of immunoglobulin heavy and light chain germline genes.

Collins, Andrew M; Ohlin, Mats; Corcoran, Martin; Heather, James M; Ralph, Duncan; Law, Mansun; Martínez-Barnetche, Jesus; Ye, Jian; Richardson, Eve; Gibson, William S; Rodriguez, Oscar L; Peres, Ayelet; Yaari, Gur; Watson, Corey T; Lees, William D.

Front Immunol ; 14: 1330153, 2023.

Article En | MEDLINE | ID: mdl-38406579

Introduction: Analysis of an individual's immunoglobulin (IG) gene repertoire requires the use of high-quality germline gene reference sets. When sets only contain alleles supported by strong evidence, AIRR sequencing (AIRR-seq) data analysis is more accurate, and studies of the evolution of IG genes, their allelic variants and the expressed immune repertoire are therefore facilitated. Methods: The Adaptive Immune Receptor Repertoire Community (AIRR-C) IG Reference Sets have been developed by including only human IG heavy and light chain alleles that have been confirmed by evidence from multiple high-quality sources. To further improve AIRR-seq analysis, some alleles have been extended to deal with short 3' or 5' truncations that can lead them to be overlooked by alignment utilities. To avoid other challenges for analysis programs, exact paralogs (e.g. IGHV1-69*01 and IGHV1-69D*01) are only represented once in each set, though alternative sequence names are noted in accompanying metadata. Results and discussion: The Reference Sets include fewer than half the previously recognised IG alleles (e.g. just 198 IGHV sequences), and also include a number of novel alleles: 8 IGHV alleles, 2 IGKV alleles and 5 IGLV alleles. Despite their smaller sizes, erroneous calls were eliminated, and excellent coverage was achieved when a set of repertoires comprising over 4 million V(D)J rearrangements from 99 individuals was analyzed using the Sets. The version-tracked AIRR-C IG Reference Sets are freely available at the OGRDB website (https://ogrdb.airr-community.org/germline_sets/Human) and will be regularly updated to include newly observed and previously reported sequences that can be confirmed by new high-quality data.

Genes, Immunoglobulin , Immunoglobulins , Humans , Immunoglobulins/genetics , Alleles , V(D)J Recombination/genetics , Germ Cells
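
The abstract above notes that exact paralogs are represented once per set, with alternative names kept in metadata. The following is a minimal sketch of that idea, not the AIRR-C implementation: the function name, the output layout, and the toy sequences are all hypothetical placeholders (the sequences are not real germline sequences), and picking the alphabetically first name as the representative is an assumption made only for illustration.

```python
from collections import defaultdict


def collapse_exact_duplicates(alleles):
    """Keep one entry per distinct sequence and record the other names as aliases.

    `alleles` maps allele name -> nucleotide sequence (hypothetical toy input).
    Returns a dict: representative name -> {"sequence": str, "aliases": list}.
    """
    # Group allele names by their (case-normalised) sequence.
    names_by_sequence = defaultdict(list)
    for name, sequence in alleles.items():
        names_by_sequence[sequence.upper()].append(name)

    collapsed = {}
    for sequence, names in names_by_sequence.items():
        names.sort()  # assumption: the alphabetically first name represents the group
        collapsed[names[0]] = {"sequence": sequence, "aliases": names[1:]}
    return collapsed


# Placeholder sequences, NOT real germline sequences.
toy_alleles = {
    "IGHV1-69*01": "CAGGTGCAGCTGGTG",
    "IGHV1-69D*01": "caggtgcagctggtg",  # identical to IGHV1-69*01 apart from case
    "IGHV3-23*01": "GAGGTGCAGCTGTTG",
}

result = collapse_exact_duplicates(toy_alleles)
for name, record in result.items():
    print(name, record["aliases"])
```

Keeping the aliases alongside the representative entry means calls made against the collapsed set can still be reported under either name.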

6.

An entropic safety catch controls hepatitis C virus entry and antibody resistance.

Stejskal, Lenka; Kalemera, Mphatso D; Lewis, Charlotte B; Palor, Machaela; Walker, Lucas; Daviter, Tina; Lees, William D; Moss, David S; Kremyda-Vlachou, Myrto; Kozlakidis, Zisis; Gallo, Giulia; Bailey, Dalan; Rosenberg, William; Illingworth, Christopher J R; Shepherd, Adrian J; Grove, Joe.

Elife ; 11, 2022 07 07.

Article En | MEDLINE | ID: mdl-35796426

E1 and E2 (E1E2), the fusion proteins of Hepatitis C Virus (HCV), are unlike those of any other virus yet described, and the detailed molecular mechanisms of HCV entry/fusion remain unknown. Hypervariable region-1 (HVR-1) of E2 is a putative intrinsically disordered protein tail. Here, we demonstrate that HVR-1 has an autoinhibitory function that suppresses the activity of E1E2 on free virions; this is dependent on its conformational entropy. Thus, HVR-1 is akin to a safety catch that prevents premature triggering of E1E2 activity. Crucially, this mechanism is turned off by host receptor interactions at the cell surface to allow entry. Mutations that reduce conformational entropy in HVR-1, or genetic deletion of HVR-1, turn off the safety catch to generate hyper-reactive HCV that exhibits enhanced virus entry but is thermally unstable and acutely sensitive to neutralising antibodies. Therefore, the HVR-1 safety catch controls the efficiency of virus entry and maintains resistance to neutralising antibodies. This discovery provides an explanation for the ability of HCV to persist in the face of continual immune assault and represents a novel regulatory mechanism that is likely to be found in other viral fusion machinery.

Hepacivirus , Hepatitis C , Antibodies, Neutralizing , Entropy , Hepacivirus/genetics , Hepacivirus/metabolism , Humans , Viral Envelope Proteins/metabolism , Virus Internalization

7.

A BALB/c IGHV Reference Set, Defined by Haplotype Analysis of Long-Read VDJ-C Sequences From F1 (BALB/c x C57BL/6) Mice.

Jackson, Katherine J L; Kos, Justin T; Lees, William; Gibson, William S; Smith, Melissa Laird; Peres, Ayelet; Yaari, Gur; Corcoran, Martin; Busse, Christian E; Ohlin, Mats; Watson, Corey T; Collins, Andrew M.

Front Immunol ; 13: 888555, 2022.

Article En | MEDLINE | ID: mdl-35720344

The immunoglobulin genes of inbred mouse strains that are commonly used in models of antibody-mediated human diseases are poorly characterized. This compromises data analysis. To infer the immunoglobulin genes of BALB/c mice, we used long-read SMRT sequencing to amplify VDJ-C sequences from F1 (BALB/c x C57BL/6) hybrid animals. Strain variations were identified in the Ighm and Ighg2b genes, and analysis of VDJ rearrangements led to the inference of 278 germline IGHV alleles. 169 alleles are not present in the C57BL/6 genome reference sequence. To establish a set of expressed BALB/c IGHV germline gene sequences, we computationally retrieved IGHV haplotypes from the IgM dataset. Haplotyping led to the confirmation of 162 BALB/c IGHV gene sequences. A musIGHV398 pseudogene variant also appears to be present in the BALB/cByJ substrain, while a functional musIGHV398 gene is highly expressed in the BALB/cJ substrain. Only four of the BALB/c alleles were also observed in the C57BL/6 haplotype. The full set of inferred BALB/c sequences has been used to establish a BALB/c IGHV reference set, hosted at https://ogrdb.airr-community.org. We assessed whether assemblies from the Mouse Genome Project (MGP) are suitable for the determination of the genes of the IGH loci. Only 37 (43.5%) of the 85 confirmed IMGT-named BALB/c IGHV and 33 (42.9%) of the 77 confirmed non-IMGT IGHV were found in a search of the MGP BALB/cJ genome assembly. This suggests that current MGP assemblies are unsuitable for the comprehensive documentation of germline IGHVs and more efforts will be needed to establish strain-specific reference sets.

Immunoglobulin Heavy Chains , Immunoglobulin Variable Region , Animals , Haplotypes , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Variable Region/genetics , Mice , Mice, Inbred BALB C , Mice, Inbred C57BL , Sequence Analysis, DNA
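
The counts quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is only a consistency check on the published figures; the combined recovery rate at the end is derived here and is not stated in the abstract.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Figures taken from the abstract.
imgt_named_confirmed = 85   # confirmed IMGT-named BALB/c IGHV sequences
non_imgt_confirmed = 77     # confirmed non-IMGT IGHV sequences
imgt_named_found = 37       # of those, found in the MGP BALB/cJ assembly
non_imgt_found = 33

# The two categories add up to the 162 haplotype-confirmed sequences.
total_confirmed = imgt_named_confirmed + non_imgt_confirmed
print(total_confirmed)

print(percent(imgt_named_found, imgt_named_confirmed))  # IMGT-named recovery
print(percent(non_imgt_found, non_imgt_confirmed))      # non-IMGT recovery

# Derived here, not reported in the abstract: overall recovery across both groups.
overall = percent(imgt_named_found + non_imgt_found, total_confirmed)
print(overall)
```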

8.

Adaptive Immune Receptor Repertoire (AIRR) Community Guide to TR and IG Gene Annotation.

Babrak, Lmar; Marquez, Susanna; Busse, Christian E; Lees, William D; Miho, Enkelejda; Ohlin, Mats; Rosenfeld, Aaron M; Stervbo, Ulrik; Watson, Corey T; Schramm, Chaim A.

Methods Mol Biol ; 2453: 279-296, 2022.

Article En | MEDLINE | ID: mdl-35622332

High-throughput sequencing of adaptive immune receptor repertoires (AIRR, i.e., IG and TR) has revolutionized the ability to carry out large-scale experiments to study the adaptive immune response. Since the method was first introduced in 2009, AIRR sequencing (AIRR-seq) has been applied to survey the immune state of individuals, identify antigen-specific or immune-state-associated signatures of immune responses, study the development of the antibody immune response, and guide the development of vaccines and antibody therapies. Recent advancements in the technology include sequencing at the single-cell level and in parallel with gene expression, which allows the introduction of multi-omics approaches to understand in detail the adaptive immune response. Analyzing AIRR-seq data can prove challenging even with high-quality sequencing, in part due to the many steps involved and the need to parameterize each step. In this chapter, we outline key factors to consider when preprocessing raw AIRR-seq data and annotating the genetic origins of the rearranged receptors. We also highlight a number of common difficulties with AIRR-seq data processing and provide strategies to address them.

Genes, Immunoglobulin , High-Throughput Nucleotide Sequencing , Antibodies/genetics , Humans , Molecular Sequence Annotation , Receptors, Immunologic/genetics

9.

Adaptive Immune Receptor Repertoire (AIRR) Community Guide to Repertoire Analysis.

Marquez, Susanna; Babrak, Lmar; Greiff, Victor; Hoehn, Kenneth B; Lees, William D; Luning Prak, Eline T; Miho, Enkelejda; Rosenfeld, Aaron M; Schramm, Chaim A; Stervbo, Ulrik.

Methods Mol Biol ; 2453: 297-316, 2022.

Article En | MEDLINE | ID: mdl-35622333

Adaptive immune receptor repertoires (AIRRs) are rich with information that can be mined for insights into the workings of the immune system. Gene usage, CDR3 properties, clonal lineage structure, and sequence diversity are all capable of revealing the dynamic immune response to perturbation by disease, vaccination, or other interventions. Here we focus on a conceptual introduction to the many aspects of repertoire analysis and orient the reader toward the uses and advantages of each. Along the way, we note some of the many software tools that have been developed for these investigations and link the ideas discussed to chapters on methods provided elsewhere in this volume.

Receptors, Immunologic , Software , Receptors, Immunologic/genetics

10.

T cell receptor beta germline variability is revealed by inference from repertoire data.

Omer, Aviv; Peres, Ayelet; Rodriguez, Oscar L; Watson, Corey T; Lees, William; Polak, Pazit; Collins, Andrew M; Yaari, Gur.

Genome Med ; 14(1): 2, 2022 01 07.

Article En | MEDLINE | ID: mdl-34991709

BACKGROUND: T and B cell receptor (TCR, BCR) repertoires constitute the foundation of adaptive immunity. Adaptive immune receptor repertoire sequencing (AIRR-seq) is a common approach to study immune system dynamics. Understanding the genetic factors influencing the composition and dynamics of these repertoires is of major scientific and clinical importance. The chromosomal loci encoding the variable regions of TCRs and BCRs are challenging to decipher due to repetitive elements and undocumented structural variants. METHODS: To confront this challenge, AIRR-seq-based methods have recently been developed for B cells, enabling genotype and haplotype inference and discovery of undocumented alleles. However, this approach relies on complete coverage of the receptors' variable regions, whereas most T cell studies sequence a small fraction of that region. Here, we adapted a B cell pipeline for undocumented alleles, genotype, and haplotype inference for full and partial AIRR-seq TCR data sets. The pipeline also deals with gene assignment ambiguities, which is especially important in the analysis of data sets of partial sequences. RESULTS: From the full and partial AIRR-seq TCR data sets, we identified 39 undocumented polymorphisms in T cell receptor Beta V (TRBV) and 31 undocumented 5' UTR sequences. A subset of these inferences was also observed using independent genomic approaches. We found that a single nucleotide polymorphism differentiating between the two documented T cell receptor Beta D2 (TRBD2) alleles is strongly associated with dramatic changes in the expressed repertoire. CONCLUSIONS: We reveal a rich picture of germline variability and demonstrate how a single nucleotide polymorphism dramatically affects the composition of the whole repertoire. Our findings provide a basis for annotation of TCR repertoires for future basic and clinical studies.

High-Throughput Nucleotide Sequencing , Receptors, Antigen, T-Cell, alpha-beta , Alleles , Germ Cells , High-Throughput Nucleotide Sequencing/methods , Humans , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell, alpha-beta/genetics

11.

Commentary on Population matched (pm) germline allelic variants of immunoglobulin (IG) loci: relevance in infectious diseases and vaccination studies in human populations.

Collins, Andrew M; Peres, Ayelet; Corcoran, Martin M; Watson, Corey T; Yaari, Gur; Lees, William D; Ohlin, Mats.

Genes Immun ; 22(7-8): 335-338, 2021 12.

Article En | MEDLINE | ID: mdl-34667305

12.

Diversity in immunogenomics: the value and the challenge.

Peng, Kerui; Safonova, Yana; Shugay, Mikhail; Popejoy, Alice B; Rodriguez, Oscar L; Breden, Felix; Brodin, Petter; Burkhardt, Amanda M; Bustamante, Carlos; Cao-Lormeau, Van-Mai; Corcoran, Martin M; Duffy, Darragh; Fuentes-Guajardo, Macarena; Fujita, Ricardo; Greiff, Victor; Jönsson, Vanessa D; Liu, Xiao; Quintana-Murci, Lluis; Rossetti, Maura; Xie, Jianming; Yaari, Gur; Zhang, Wei; Abedalthagafi, Malak S; Adekoya, Khalid O; Ahmed, Rahaman A; Chang, Wei-Chiao; Gray, Clive; Nakamura, Yusuke; Lees, William D; Khatri, Purvesh; Alachkar, Houda; Scheepers, Cathrine; Watson, Corey T; Karlsson Hedestam, Gunilla B; Mangul, Serghei.

Nat Methods ; 18(6): 588-591, 2021 06.

Article En | MEDLINE | ID: mdl-34002093

Genomics , Immunogenetics , B-Lymphocytes/immunology , Databases, Genetic , Germ Cells , Humans , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, B-Cell/immunology , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell/immunology , T-Lymphocytes/immunology , Whole Genome Sequencing

13.

Coastal heritage, global climate change, public engagement, and citizen science.

Dawson, Tom; Hambly, Joanna; Kelley, Alice; Lees, William; Miller, Sarah.

Proc Natl Acad Sci U S A ; 117(15): 8280-8286, 2020 04 14.

Article En | MEDLINE | ID: mdl-32284415

Climate change is threatening an uncalculated number of archaeological sites globally, totaling perhaps hundreds of thousands of culturally and paleoenvironmentally significant resources. As with all archaeological sites, they provide evidence of humanity's past and help us understand our place in the present world. Coastal sites, clustered at the water's edge, are already experiencing some of the most dramatic damage due to anthropogenic climate change, and the situation is predicted to worsen in the future. In the face of catastrophic loss, organizations around the world are developing new ways of working with this threatened coastal resource. This paper uses three examples from Scotland, Florida, and Maine to highlight how new partnerships and citizen science approaches are building communities of practice to better manage threatened coastal heritage. It compares methods on either side of the Atlantic and highlights challenges and solutions. The approaches are applicable to the increasing number of heritage sites everywhere at risk from climate change; the study of coastal sites thus helps society prepare for climate change impacts to heritage worldwide.

Citizen Science , Climate Change , Archaeology , Conservation of Natural Resources , Florida , Humans , Maine , Scotland

14.

Flexibility and intrinsic disorder are conserved features of hepatitis C virus E2 glycoprotein.

Stejskal, Lenka; Lees, William D; Moss, David S; Palor, Machaela; Bingham, Richard J; Shepherd, Adrian J; Grove, Joe.

PLoS Comput Biol ; 16(2): e1007710, 2020 02.

Article En | MEDLINE | ID: mdl-32109245

The glycoproteins of hepatitis C virus, E1E2, are unlike any other viral fusion machinery yet described, and are the current focus of immunogen design in HCV vaccine development, making E1E2 both scientifically and medically important. We used pre-existing, but fragmentary, structures to model a complete ectodomain of the major glycoprotein E2 from three strains of HCV. We then performed molecular dynamics simulations to explore the conformational landscape of E2, revealing a number of important features. Despite high sequence divergence, and subtle differences in the models, E2 from different strains behave similarly, possessing a stable core flanked by highly flexible regions, some of which perform essential functions such as receptor binding. Comparison with sequence data suggests that this consistent behaviour is conferred by a network of conserved residues that act as hinge and anchor points throughout E2. The variable regions (HVR-1, HVR-2 and VR-3) exhibit particularly high flexibility, and bioinformatic analysis suggests that HVR-1 is a putative intrinsically disordered protein region. Dynamic cross-correlation analyses demonstrate intramolecular communication and suggest that specific regions, such as HVR-1, can exert influence throughout E2. To support our computational approach we performed small-angle X-ray scattering with purified E2 ectodomain; this data was consistent with our MD experiments, suggesting a compact globular core with peripheral flexible regions. This work captures the dynamic behaviour of E2 and has direct relevance to the interaction of HCV with cell-surface receptors and neutralising antibodies.

Hepatitis C/virology , Viral Envelope Proteins/chemistry , Virus Internalization , Antibodies, Neutralizing/immunology , Antibodies, Viral/immunology , Computer Simulation , Epitopes/immunology , Glycosylation , HEK293 Cells , Humans , Molecular Dynamics Simulation , Protein Binding , Protein Domains , Scattering, Radiation , X-Rays

15.

Germline immunoglobulin genes: disease susceptibility genes hidden in plain sight?

Collins, Andrew M; Yaari, Gur; Shepherd, Adrian J; Lees, William; Watson, Corey T.

Curr Opin Syst Biol ; 24: 100-108, 2020 Dec.

Article En | MEDLINE | ID: mdl-37008538

Immunoglobulin genes are rarely considered as disease susceptibility genes despite their obvious and central contributions to immune function. This appears to be a consequence of historical views on antibody repertoire formation that no longer stand, and of difficulties that until recently surrounded the documentation of the suite of antibody genes in any individual. If these important genes are to be accessible to GWAS studies, allelic variation within the human population needs to be better documented, and a curated set of genomic variations associated with antibody genes needs to be formulated. Repertoire studies arising from the COVID-19 pandemic provide an opportunity to meet these needs, and may provide insights into the profound variability that is seen in outcomes to this infection.

16.

OGRDB: a reference database of inferred immune receptor genes.

Lees, William; Busse, Christian E; Corcoran, Martin; Ohlin, Mats; Scheepers, Cathrine; Matsen, Frederick A; Yaari, Gur; Watson, Corey T; Collins, Andrew; Shepherd, Adrian J.

Nucleic Acids Res ; 48(D1): D964-D970, 2020 01 08.

Article En | MEDLINE | ID: mdl-31566225

High-throughput sequencing of the adaptive immune receptor repertoire (AIRR-seq) is providing unprecedented insights into the immune response to disease and into the development of immune disorders. The accurate interpretation of AIRR-seq data depends on the existence of comprehensive germline gene reference sets. Current sets are known to be incomplete and unrepresentative of the degree of polymorphism and diversity in human and animal populations. A key issue is the complexity of the genomic regions in which they lie, which, because of the presence of multiple repeats, insertions and deletions, have not proved tractable with short-read whole genome sequencing. Recently, tools and methods for inferring such gene sequences from AIRR-seq datasets have become available, and a community approach has been developed for the expert review and publication of such inferences. Here, we present OGRDB, the Open Germline Receptor Database (https://ogrdb.airr-community.org), a public resource for the submission, review and publication of previously unknown receptor germline sequences together with supporting evidence.

Computational Biology/methods , Databases, Genetic , Genomics , Receptors, Immunologic/genetics , Genomics/methods , Humans , Software , Web Browser

17.

VDJbase: an adaptive immune receptor genotype and haplotype database.

Omer, Aviv; Shemesh, Or; Peres, Ayelet; Polak, Pazit; Shepherd, Adrian J; Watson, Corey T; Boyd, Scott D; Collins, Andrew M; Lees, William; Yaari, Gur.

Nucleic Acids Res ; 48(D1): D1051-D1056, 2020 01 08.

Article En | MEDLINE | ID: mdl-31602484

VDJbase is a publicly available database that offers easy searching of data describing the complete sets of gene sequences (genotypes and haplotypes) inferred from adaptive immune receptor repertoire sequencing datasets. VDJbase is designed to act as a resource that will allow the scientific community to explore the genetic variability of the immunoglobulin (Ig) and T cell receptor (TR) gene loci. It can also assist in the investigation of Ig- and TR-related genetic predispositions to diseases. Our database includes web-based query and online tools to assist in visualization and analysis of the genotype and haplotype data. It enables users to detect those alleles and genes that are significantly over-represented in a particular population, in terms of genotype, haplotype and gene expression. The database website can be freely accessed at https://www.vdjbase.org/, and no login is required. The data and code use Creative Commons licenses and are freely downloadable from https://bitbucket.org/account/user/yaarilab/projects/GPHP.

Computational Biology/methods , Databases, Genetic , Genotype , Haplotypes , Receptors, Immunologic/genetics , V(D)J Recombination , Humans , Molecular Sequence Annotation , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, T-Cell/genetics , Software , Software Design , Web Browser , Workflow

18.

Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences.

Smakaj, Erand; Babrak, Lmar; Ohlin, Mats; Shugay, Mikhail; Briney, Bryan; Tosoni, Deniz; Galli, Christopher; Grobelsek, Vendi; D'Angelo, Igor; Olson, Branden; Reddy, Sai; Greiff, Victor; Trück, Johannes; Marquez, Susanna; Lees, William; Miho, Enkelejda.

Bioinformatics ; 36(6): 1731-1739, 2020 03 01.

Article En | MEDLINE | ID: mdl-31873728

SUMMARY: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. There are currently multiple tools available for the annotation of antibody sequences. All downstream analyses such as choosing lead drug candidates depend on the correct annotation of these sequences; however, a thorough comparison of the performance of these tools has not been performed. Here, we benchmark the performance of commonly used immunoinformatic tools, i.e. IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed using simulated and experimental high-throughput sequencing datasets. We analyzed changes in the IMGT reference germline database in the last 10 years in order to assess the reproducibility of the annotation output. We found that only 73/183 (40%) V, D and J human genes were shared between the reference germline sets used by the tools. We found that the annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average frequency of gene mishits (0.02) and IgBLAST the lowest (0.004). Reproducibility in the output of complementarity-determining region 3 (CDR3) amino acids ranged from 4.3% to 77.6% with preprocessed data. In addition, run time of the tools was assessed: MiXCR was the fastest tool for number of sequences processed per unit of time. These results indicate that immunoinformatic analyses greatly depend on the choice of bioinformatics tool. Our results support informed decision-making by immunoinformaticians based on repertoire composition and sequencing platforms. AVAILABILITY AND IMPLEMENTATION: All tools utilized in the paper are free for academic use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Benchmarking , High-Throughput Nucleotide Sequencing , Antibodies , Humans , Reproducibility of Results
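
The "73/183 (40%)" figure in the abstract above describes the overlap between the germline reference sets used by the benchmarked tools. One plausible way to compute such a figure is sketched below. It assumes the numerator is the intersection of all sets and the denominator is their union, which the abstract does not spell out. The tool labels and gene lists are hypothetical toy data, not the sets actually shipped with any tool.

```python
def shared_fraction(reference_sets):
    """Return (shared, total, fraction) for genes present in every reference set.

    `reference_sets` maps a tool label -> iterable of gene names.
    Assumption: 'total' is the union of all sets.
    """
    gene_sets = [set(genes) for genes in reference_sets.values()]
    shared = set.intersection(*gene_sets)
    total = set.union(*gene_sets)
    return len(shared), len(total), len(shared) / len(total)


# Hypothetical toy reference sets, for illustration only.
toy_sets = {
    "tool_a": ["IGHV1-2", "IGHV3-23", "IGHD3-10", "IGHJ4"],
    "tool_b": ["IGHV1-2", "IGHV3-23", "IGHJ4", "IGHJ6"],
    "tool_c": ["IGHV1-2", "IGHJ4", "IGHV4-34"],
}

n_shared, n_total, fraction = shared_fraction(toy_sets)
print(n_shared, n_total, round(100 * fraction, 1))

# The published figure itself: 73 shared genes out of 183.
published_percent = round(100 * 73 / 183)
print(published_percent)
```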

19.

sumrep: A Summary Statistic Framework for Immune Receptor Repertoire Comparison and Model Validation.

Olson, Branden J; Moghimi, Pejvak; Schramm, Chaim A; Obraztsova, Anna; Ralph, Duncan; Vander Heiden, Jason A; Shugay, Mikhail; Shepherd, Adrian J; Lees, William; Matsen, Frederick A.

Front Immunol ; 10: 2533, 2019.

Article En | MEDLINE | ID: mdl-31736960

The adaptive immune system generates an incredible diversity of antigen receptors for B and T cells to keep dangerous pathogens at bay. The DNA sequences coding for these receptors arise by a complex recombination process followed by a series of productivity-based filters, as well as affinity maturation for B cells, giving considerable diversity to the circulating pool of receptor sequences. Although these datasets hold considerable promise for medical and public health applications, the complex structure of the resulting adaptive immune receptor repertoire sequencing (AIRR-seq) datasets makes analysis difficult. In this paper we introduce sumrep, an R package that efficiently performs a wide variety of repertoire summaries and comparisons, and show how sumrep can be used to perform model validation. We find that summaries vary in their ability to differentiate between datasets, although many are able to distinguish between covariates such as donor, timepoint, and cell type for BCR and TCR repertoires. We show that deletion and insertion lengths resulting from V(D)J recombination tend to be more discriminative characterizations of a repertoire than summaries that describe the amino acid composition of the CDR3 region. We also find that state-of-the-art generative models excel at recapitulating gene usage and recombination statistics in a given experimental repertoire, but struggle to capture many physicochemical properties of real repertoires.

Models, Statistical , Receptors, Immunologic , Software , Data Interpretation, Statistical , Humans
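
To make the idea of comparing repertoires through summary statistics concrete, the sketch below summarises two repertoires by their CDR3 length distributions and compares them. It is written in Python for illustration and does not use or reproduce sumrep, which is an R package. Jensen-Shannon divergence is assumed here as the comparison measure, since the abstract does not name one, and the CDR3 strings are hypothetical placeholders.

```python
import math
from collections import Counter


def length_distribution(cdr3_sequences):
    """Summarise a repertoire as the relative frequency of each CDR3 length."""
    counts = Counter(len(seq) for seq in cdr3_sequences)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns 0 for identical distributions and 1 for non-overlapping ones.
    """
    support = set(p) | set(q)
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl(a):
        # Kullback-Leibler divergence of `a` from the mixture, skipping zero terms.
        return sum(a[k] * math.log2(a[k] / mixture[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical placeholder CDR3 amino acid strings.
repertoire_a = ["CARDYW", "CARGGYW", "CARDGYW", "CARW"]
repertoire_b = ["CARDYW", "CARGGYW", "CARDGYFDYW", "CARGSSGWYFDYW"]

dist_a = length_distribution(repertoire_a)
dist_b = length_distribution(repertoire_b)
print(js_divergence(dist_a, dist_a))
print(js_divergence(dist_a, dist_b))
```

The same pattern extends to other summaries mentioned in the abstract, such as insertion or deletion lengths, by swapping the function that builds the distribution.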

20.

Inferred Allelic Variants of Immunoglobulin Receptor Genes: A System for Their Evaluation, Documentation, and Naming.

Ohlin, Mats; Scheepers, Cathrine; Corcoran, Martin; Lees, William D; Busse, Christian E; Bagnara, Davide; Thörnqvist, Linnea; Bürckert, Jean-Philippe; Jackson, Katherine J L; Ralph, Duncan; Schramm, Chaim A; Marthandan, Nishanth; Breden, Felix; Scott, Jamie; Matsen IV, Frederick A; Greiff, Victor; Yaari, Gur; Kleinstein, Steven H; Christley, Scott; Sherkow, Jacob S; Kossida, Sofia; Lefranc, Marie-Paule; van Zelm, Menno C; Watson, Corey T; Collins, Andrew M.

Front Immunol ; 10: 435, 2019.

Article En | MEDLINE | ID: mdl-30936866

Immunoglobulins or antibodies are the main effector molecules of the B-cell lineage and are encoded by hundreds of variable (V), diversity (D), and joining (J) germline genes, which recombine to generate enormous IG diversity. Recently, high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) of recombined V-(D)-J genes has offered unprecedented insights into the dynamics of IG repertoires in health and disease. Faithful biological interpretation of AIRR-seq studies depends upon the annotation of raw AIRR-seq data, using reference germline gene databases to identify the germline genes within each rearrangement. Existing reference databases are incomplete, as shown by recent AIRR-seq studies that have inferred the existence of many previously unreported polymorphisms. Completing the documentation of genetic variation in germline gene databases is therefore of crucial importance. Lymphocyte receptor genes and alleles are currently assigned by the Immunoglobulins, T cell Receptors and Major Histocompatibility Nomenclature Subcommittee of the International Union of Immunological Societies (IUIS) and managed in IMGT®, the international ImMunoGeneTics information system® (IMGT). In 2017, the IMGT Group reached agreement with a group of AIRR-seq researchers on the principles of a streamlined process for identifying and naming inferred allelic sequences, for their incorporation into IMGT®. These researchers represented the AIRR Community, a network of over 300 researchers whose objective is to promote all aspects of immunoglobulin and T-cell receptor repertoire studies, including the standardization of experimental and computational aspects of AIRR-seq data generation and analysis. The Inferred Allele Review Committee (IARC) was established by the AIRR Community to devise policies, criteria, and procedures to perform this function. Formalized evaluations of novel inferred sequences have now begun and submissions are invited via a new dedicated portal (https://ogrdb.airr-community.org). Here, we summarize recommendations developed by the IARC, focusing to begin with on human IGHV genes, with the goal of facilitating the acceptance of inferred allelic variants of germline IGHV genes. We believe that this initiative will improve the quality of AIRR-seq studies by facilitating the description of human IG germline gene variation, and that in time, it will expand to the documentation of TR and IG genes in many vertebrate species.

Alleles , Genes, Immunoglobulin , Genetic Variation/genetics , Terminology as Topic , V(D)J Recombination , Base Sequence , Databases, Genetic , Datasets as Topic , Gene Library , Germ-Line Mutation , High-Throughput Nucleotide Sequencing , Humans , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Variable Region/genetics , Polymerase Chain Reaction/methods , Sequence Alignment , Sequence Homology, Nucleic Acid , VDJ Exons/genetics