ABSTRACT
SUMMARY: Knowledge of immunoglobulin and T cell receptor encoding genes is derived from high-quality genomic sequencing. High-throughput sequencing is delivering large volumes of data, and precise, high-throughput approaches to annotation are needed. Digger is an automated tool that identifies coding and regulatory regions of these genes, with results comparable to those obtained by current expert curation methods. AVAILABILITY AND IMPLEMENTATION: Digger is published under an open source license at https://github.com/williamdlees/Digger and is available as a Python package and a Docker container.
Subject(s)
T-Cell Antigen Receptors, Software, T-Cell Antigen Receptors/genetics, Chromosome Mapping, Immunoglobulins/genetics, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
In adaptive immune receptor repertoire analysis, determining the germline variable (V) allele associated with each T- and B-cell receptor sequence is a crucial step. This process is highly impacted by allele annotations. Aligning sequences, assigning them to specific germline alleles, and inferring individual genotypes are challenging when the repertoire is highly mutated, or sequence reads do not cover the whole V region. Here, we propose an alternative naming scheme for the V alleles, as well as a novel method to infer individual genotypes. We demonstrate the strengths of the two by comparing their outcomes to other genotype inference methods. We validate the genotype approach with independent genomic long-read data. The naming scheme is compatible with current annotation tools and pipelines. Analysis results can be converted from the proposed naming scheme to the nomenclature determined by the International Union of Immunological Societies (IUIS). Both the naming scheme and the genotype procedure are implemented in a freely available R package (PIgLET https://bitbucket.org/yaarilab/piglet). To allow researchers to further explore the approach on real data and to adapt it for their uses, we also created an interactive website (https://yaarilab.github.io/IGHV_reference_book).
Subject(s)
Genomics, Immunoglobulin Heavy Chains, B-Cell Antigen Receptors, Alleles, Genotype, B-Cell Antigen Receptors/genetics, Immunoglobulin Heavy Chains/genetics
ABSTRACT
Immunoglobulins (IGs), critical components of the human immune system, are composed of heavy and light protein chains encoded at three genomic loci. The IG Kappa (IGK) chain locus consists of two large, inverted segmental duplications. The complexity of the IG loci has hindered use of standard high-throughput methods for characterizing genetic variation within these regions. To overcome these limitations, we use long-read sequencing to create haplotype-resolved IGK assemblies in an ancestrally diverse cohort (n = 36), representing the first comprehensive description of IGK haplotype variation. We identify extensive locus polymorphism, including novel single nucleotide variants (SNVs) and novel structural variants harboring functional IGKV genes. Among 47 functional IGKV genes, we identify 145 alleles, 67 of which were not previously curated. We report inter-population differences in allele frequencies for 10 IGKV genes, including alleles unique to specific populations within this dataset. We identify haplotypes carrying signatures of gene conversion that associate with SNV enrichment in the IGK distal region, and a haplotype with an inversion spanning the proximal and distal regions. These data provide a critical resource of curated genomic reference information from diverse ancestries, laying a foundation for advancing our understanding of population-level genetic variation in the IGK locus.
Subject(s)
Haplotypes, Immunoglobulin kappa-Chains, Single Nucleotide Polymorphism, Humans, Immunoglobulin kappa-Chains/genetics, Gene Frequency, Alleles
ABSTRACT
The glycoproteins of hepatitis C virus, E1E2, are unlike any other viral fusion machinery yet described and are the current focus of immunogen design in HCV vaccine development, making E1E2 both scientifically and medically important. We used pre-existing, but fragmentary, structures to model a complete ectodomain of the major glycoprotein E2 from three strains of HCV. We then performed molecular dynamics simulations to explore the conformational landscape of E2, revealing a number of important features. Despite high sequence divergence, and subtle differences in the models, E2 from different strains behaves similarly, possessing a stable core flanked by highly flexible regions, some of which perform essential functions such as receptor binding. Comparison with sequence data suggests that this consistent behaviour is conferred by a network of conserved residues that act as hinge and anchor points throughout E2. The variable regions (HVR-1, HVR-2 and VR-3) exhibit particularly high flexibility, and bioinformatic analysis suggests that HVR-1 is a putative intrinsically disordered protein region. Dynamic cross-correlation analyses demonstrate intramolecular communication and suggest that specific regions, such as HVR-1, can exert influence throughout E2. To support our computational approach we performed small-angle X-ray scattering with purified E2 ectodomain; these data were consistent with our MD experiments, suggesting a compact globular core with peripheral flexible regions. This work captures the dynamic behaviour of E2 and has direct relevance to the interaction of HCV with cell-surface receptors and neutralising antibodies.
Subject(s)
Hepatitis C/virology, Viral Envelope Proteins/chemistry, Virus Internalization, Neutralizing Antibodies/immunology, Antiviral Antibodies/immunology, Computer Simulation, Epitopes/immunology, Glycosylation, HEK293 Cells, Humans, Molecular Dynamics Simulation, Protein Binding, Protein Domains, Radiation Scattering, X-Rays
Subject(s)
Genomics, Immunogenetics, B-Lymphocytes/immunology, Genetic Databases, Germ Cells, Humans, B-Cell Antigen Receptors/genetics, B-Cell Antigen Receptors/immunology, T-Cell Antigen Receptors/genetics, T-Cell Antigen Receptors/immunology, T-Lymphocytes/immunology, Whole Genome Sequencing
ABSTRACT
The extent of the role of N-linked glycans (N-glycans) in shielding influenza A hemagglutinin (HA) against host antibodies has proved controversial, with different authors making widely different assumptions. One common assumption is that N-glycans physically shield surface residues that are near to glycosylation sites, thereby preventing antibodies from binding to them. However, it is unclear, from existing experimental evidence, whether antibodies that bind close to N-glycans are a rare or commonplace feature of human herd immune responses to influenza A HA. The aim of this paper is to present a computational analysis of mutations in the vicinity of N-glycans that will facilitate a better understanding of their protective role. We identify, from an analysis of over 6000 influenza A H3N2 sequences, a set of residues adjacent to N-glycosylation sites that are highly likely to be involved in antigenic escape from host antibodies. Fifteen of these residues occur within 10 Å of an N-glycosylation site. Hence, we conclude that it is relatively common for antibodies to bind in close proximity to N-glycans on the surface of HA, with any shielding effect largely attributable to the inability of host antibodies to bind across an N-glycan attachment site, rather than to the physical masking of neighboring residues.
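As a rough illustration of the two ingredients this abstract relies on (locating N-glycosylation sites and asking which residues lie within 10 Å of one), the following Python sketch may help. It is not the authors' pipeline. It assumes the standard N-X-S/T sequon (X is any residue except proline) as the site definition, and it assumes one representative atom position per residue; the function names and the toy data are hypothetical.

```python
import math
import re

# N-X-S/T sequon, where X is any residue except proline. The lookahead lets
# overlapping sequons (e.g. "NNSS") both be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of the Asn in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


def residues_near(coords: dict[int, tuple[float, float, float]],
                  site: int, cutoff: float = 10.0) -> list[int]:
    """Return residues (other than `site`) within `cutoff` angstroms of `site`.

    `coords` maps residue number to one representative atom position
    (for example the C-alpha); that choice is an assumption of this sketch.
    """
    origin = coords[site]
    return sorted(r for r, xyz in coords.items()
                  if r != site and math.dist(origin, xyz) <= cutoff)


if __name__ == "__main__":
    toy_sequence = "ANGSNPTNAT"  # hypothetical peptide, not real HA
    print(find_sequons(toy_sequence))
    toy_coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (0.0, 0.0, 11.0)}
    print(residues_near(toy_coords, site=1))
```

A sequon only marks a potential glycosylation site; whether a glycan is actually attached has to come from experimental or structural evidence.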
Subject(s)
Antiviral Antibodies/chemistry, Influenza Virus Hemagglutinin Glycoproteins/chemistry, Influenza A Virus H3N2 Subtype/chemistry, Mutation, Polysaccharides/chemistry, Amino Acid Motifs, Amino Acid Substitution, Binding Sites, Carbohydrate Conformation, Glycosylation, Influenza Virus Hemagglutinin Glycoproteins/genetics, Humans, Influenza A Virus H3N2 Subtype/genetics, Molecular Models, Molecular Sequence Data, Protein Binding
ABSTRACT
Recently, a number of broad-spectrum human antibodies binding to the stalk region of influenza A haemagglutinin (HA) have been isolated. As this region tends to develop substitutions at a slower rate than other regions of HA, a vaccine eliciting such antibodies could have a longer effective life. But this raises a question: is the stalk resistant to change even in the face of evolutionary pressure? In this paper, we analysed the known epitopes in the H3 stalk and, utilizing a collection of 3440 sequences, present a novel approach for detecting putative B-cell epitopes in regions such as this, in which mutations occur infrequently. We concluded that there have been periods of activity in the stalk that are consistent with the evolution of antigenic escape. This work casts light on the presence of stalk-binding antibodies in the population as a whole and, through the analysis of antigenically active regions in the stalk, may contribute to the identification of epitopes that are refractory to change and hence useful for vaccine development.
Subject(s)
Genetic Variation, Influenza Virus Hemagglutinin Glycoproteins/genetics, Influenza Virus Hemagglutinin Glycoproteins/immunology, Influenza A Virus/genetics, Influenza A Virus/immunology, Influenza Vaccines/immunology, Human Influenza/virology, Antiviral Antibodies/immunology, Computational Biology, B-Lymphocyte Epitopes/genetics, B-Lymphocyte Epitopes/immunology, Molecular Evolution, Humans, Human Influenza/prevention & control, DNA Sequence Analysis
ABSTRACT
Introduction: Analysis of an individual's immunoglobulin (IG) gene repertoire requires the use of high-quality germline gene reference sets. When sets only contain alleles supported by strong evidence, AIRR sequencing (AIRR-seq) data analysis is more accurate, and studies of the evolution of IG genes, their allelic variants and the expressed immune repertoire are therefore facilitated. Methods: The Adaptive Immune Receptor Repertoire Community (AIRR-C) IG Reference Sets have been developed by including only human IG heavy and light chain alleles that have been confirmed by evidence from multiple high-quality sources. To further improve AIRR-seq analysis, some alleles have been extended to deal with short 3' or 5' truncations that can lead them to be overlooked by alignment utilities. To avoid other challenges for analysis programs, exact paralogs (e.g. IGHV1-69*01 and IGHV1-69D*01) are only represented once in each set, though alternative sequence names are noted in accompanying metadata. Results and discussion: The Reference Sets include fewer than half the previously recognised IG alleles (e.g. just 198 IGHV sequences), and also include a number of novel alleles: 8 IGHV alleles, 2 IGKV alleles and 5 IGLV alleles. Despite their smaller sizes, erroneous calls were eliminated, and excellent coverage was achieved when a set of repertoires comprising over 4 million V(D)J rearrangements from 99 individuals was analyzed using the Sets. The version-tracked AIRR-C IG Reference Sets are freely available at the OGRDB website (https://ogrdb.airr-community.org/germline_sets/Human) and will be regularly updated to include newly observed and previously reported sequences that can be confirmed by new high-quality data.
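The rule that exact paralogs appear once per set, with alternative names kept in metadata, can be sketched in a few lines of Python. This is an illustration only, not the AIRR-C tooling. The function name, the metadata layout, the choice of the alphabetically first name as the primary one, and the toy sequences are all assumptions; only the two allele names come from the abstract.

```python
def collapse_identical(alleles: dict[str, str]) -> dict[str, dict]:
    """Keep one entry per distinct sequence, recording other names as metadata."""
    # Group allele names by their (case-normalised) nucleotide sequence.
    by_sequence: dict[str, list[str]] = {}
    for name, seq in alleles.items():
        by_sequence.setdefault(seq.upper(), []).append(name)

    collapsed = {}
    for seq, names in by_sequence.items():
        # Assumption: the alphabetically first name becomes the primary name.
        primary, *alternatives = sorted(names)
        collapsed[primary] = {"sequence": seq, "alternative_names": alternatives}
    return collapsed


if __name__ == "__main__":
    # Toy sequences for illustration; they are not real germline sequences.
    toy = {
        "IGHV1-69*01": "caggtgcag",
        "IGHV1-69D*01": "CAGGTGCAG",
        "IGHV3-23*01": "GAGGTGCAG",
    }
    for name, entry in collapse_identical(toy).items():
        print(name, entry["alternative_names"])
```

Keeping the alternative names alongside the retained entry means that calls made against the collapsed set can still be reported under either name.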
Subject(s)
Immunoglobulin Genes, Immunoglobulins, Humans, Immunoglobulins/genetics, Alleles, V(D)J Recombination/genetics, Germ Cells
ABSTRACT
Analysis of an individual's immunoglobulin or T cell receptor gene repertoire can provide important insights into immune function. High-quality analysis of adaptive immune receptor repertoire sequencing data depends upon accurate and relatively complete germline sets, but current sets are known to be incomplete. Established processes for the review and systematic naming of receptor germline genes and alleles require specific evidence and data types, but the discovery landscape is rapidly changing. To exploit the potential of emerging data, and to provide the field with improved state-of-the-art germline sets, an intermediate approach is needed that will allow the rapid publication of consolidated sets derived from these emerging sources. These sets must use a consistent naming scheme and allow refinement and consolidation into genes as new information emerges. Name changes should be minimised, but, where changes occur, the naming history of a sequence must be traceable. Here we outline the current issues and opportunities for the curation of germline IG/TR genes and present a forward-looking data model for building out more robust germline sets that can dovetail with current established processes. We describe interoperability standards for germline sets, and an approach to transparency based on principles of findability, accessibility, interoperability, and reusability.
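The requirement that a sequence's naming history stay traceable can be pictured with a small record type. The Python sketch below is hypothetical: the class, its fields, and the placeholder names are invented for illustration and are not the data model or schema proposed by the authors.

```python
from dataclasses import dataclass, field


@dataclass
class GermlineRecord:
    """A germline sequence with its current name and every earlier name."""
    sequence: str
    name: str
    previous_names: list[str] = field(default_factory=list)

    def rename(self, new_name: str) -> None:
        """Assign a new name while preserving the old one in the history."""
        if new_name == self.name:
            return  # no change, so nothing is added to the history
        self.previous_names.append(self.name)
        self.name = new_name

    def was_known_as(self, name: str) -> bool:
        """True if `name` is the current name or appears in the history."""
        return name == self.name or name in self.previous_names


if __name__ == "__main__":
    # Placeholder identifiers, chosen only to show the mechanics.
    record = GermlineRecord(sequence="CAGGTGCAG", name="TEMP-0001")
    record.rename("EXAMPLE-GENE*01")
    print(record.name, record.previous_names)
```

Because old names are appended rather than overwritten, an analysis annotated with an earlier label can still be mapped to the current entry.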
ABSTRACT
In this paper we undertake an analysis of the antigenicity of influenza A virus hemagglutinin. We developed a novel computational approach to the identification of antigenically active regions and showed that the amino acid substitutions between successive predominant seasonal strains form clusters that are consistent, in terms of both their location and their size, with the properties of B-cell epitopes in general and with those epitopes that have been identified experimentally in influenza A virus hemagglutinin to date. Such an interpretation provides a biologically plausible framework for an understanding of the location of antigenically important substitutions that is more specific than the canonical "antigenic site" model and provides an effective basis for deriving models that predict antigenic escape in the H3N2 subtype. Our results support recent indications that antibodies binding to the "stalk" region of hemagglutinin are found in the human population and exert evolutionary pressure on the virus. Our computational approach provides a possible method for identifying antigenic escape through evolution in this region, which in some cases will not be identified by the hemagglutination inhibition assay.
Subject(s)
Viral Antigens/immunology, B-Lymphocyte Epitopes/immunology, Influenza Virus Hemagglutinin Glycoproteins/immunology, Influenza A Virus H1N1 Subtype/immunology, Influenza A Virus H3N2 Subtype/immunology, Amino Acid Substitution/genetics, Viral Antigens/genetics, Computational Biology/methods, B-Lymphocyte Epitopes/genetics, Influenza Virus Hemagglutinin Glycoproteins/genetics, Humans, Immune Evasion, Influenza A Virus H1N1 Subtype/genetics, Influenza A Virus H3N2 Subtype/genetics, Molecular Models, Missense Mutation
ABSTRACT
High-throughput sequencing of adaptive immune receptor repertoires (AIRR, i.e., IG and TR) has revolutionized the ability to carry out large-scale experiments to study the adaptive immune response. Since the method was first introduced in 2009, AIRR sequencing (AIRR-seq) has been applied to survey the immune state of individuals, identify antigen-specific or immune-state-associated signatures of immune responses, study the development of the antibody immune response, and guide the development of vaccines and antibody therapies. Recent advancements in the technology include sequencing at the single-cell level and in parallel with gene expression, which allows the introduction of multi-omics approaches to understand in detail the adaptive immune response. Analyzing AIRR-seq data can prove challenging even with high-quality sequencing, in part due to the many steps involved and the need to parameterize each step. In this chapter, we outline key factors to consider when preprocessing raw AIRR-seq data and annotating the genetic origins of the rearranged receptors. We also highlight a number of common difficulties with AIRR-seq data processing and provide strategies to address them.
Subject(s)
Immunoglobulin Genes, High-Throughput Nucleotide Sequencing, Antibodies/genetics, Humans, Molecular Sequence Annotation, Immunologic Receptors/genetics
ABSTRACT
Adaptive immune receptor repertoires (AIRRs) are rich with information that can be mined for insights into the workings of the immune system. Gene usage, CDR3 properties, clonal lineage structure, and sequence diversity are all capable of revealing the dynamic immune response to perturbation by disease, vaccination, or other interventions. Here we focus on a conceptual introduction to the many aspects of repertoire analysis and orient the reader toward the uses and advantages of each. Along the way, we note some of the many software tools that have been developed for these investigations and link the ideas discussed to chapters on methods provided elsewhere in this volume.
Subject(s)
Immunologic Receptors, Software, Immunologic Receptors/genetics
ABSTRACT
E1 and E2 (E1E2), the fusion proteins of Hepatitis C Virus (HCV), are unlike those of any other virus yet described, and the detailed molecular mechanisms of HCV entry/fusion remain unknown. Hypervariable region-1 (HVR-1) of E2 is a putative intrinsically disordered protein tail. Here, we demonstrate that HVR-1 has an autoinhibitory function that suppresses the activity of E1E2 on free virions; this is dependent on its conformational entropy. Thus, HVR-1 is akin to a safety catch that prevents premature triggering of E1E2 activity. Crucially, this mechanism is turned off by host receptor interactions at the cell surface to allow entry. Mutations that reduce conformational entropy in HVR-1, or genetic deletion of HVR-1, turn off the safety catch to generate hyper-reactive HCV that exhibits enhanced virus entry but is thermally unstable and acutely sensitive to neutralising antibodies. Therefore, the HVR-1 safety catch controls the efficiency of virus entry and maintains resistance to neutralising antibodies. This discovery provides an explanation for the ability of HCV to persist in the face of continual immune assault and represents a novel regulatory mechanism that is likely to be found in other viral fusion machinery.
Subject(s)
Hepacivirus, Hepatitis C, Neutralizing Antibodies, Entropy, Hepacivirus/genetics, Hepacivirus/metabolism, Humans, Viral Envelope Proteins/metabolism, Virus Internalization
ABSTRACT
MOTIVATION: Modelling antigenic shift in influenza A H3N2 can help to predict the efficiency of vaccines. The virus is known to exhibit sudden jumps in antigenic distance, and prediction of such novel strains from amino acid sequence differences remains a challenge. RESULTS: From analysis of 6624 amino acid sequences of wild-type H3, we propose updates to the frequently referenced list of 131 amino acids located at or near the five identified antibody binding regions in haemagglutinin (HA). We introduce a class of predictive models based on the analysis of amino acid changes in these binding regions, and extend the principle to changes in HA1 as a whole by dividing the molecule into regional bands. Our results show that a range of simple models based on banded changes give better predictive performance than models based on the established five canonical regions and can identify a higher proportion of vaccine escape candidates among novel strains than a current state-of-the-art model.
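To make the idea of counting changes per regional band concrete, here is a minimal Python sketch. The abstract does not say how the bands are defined, so this example assumes, purely for illustration, that a band is a contiguous run of aligned sequence positions of fixed width; the function name and the toy sequences are likewise hypothetical and do not reproduce the published models.

```python
def changes_per_band(seq_a: str, seq_b: str, band_width: int) -> list[int]:
    """Count amino acid differences between two aligned sequences, per band.

    Band i covers positions [i * band_width, (i + 1) * band_width); the last
    band may be shorter. Fixed-width positional bands are an assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if band_width <= 0:
        raise ValueError("band_width must be positive")

    n_bands = -(-len(seq_a) // band_width)  # ceiling division
    counts = [0] * n_bands
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            counts[i // band_width] += 1
    return counts


if __name__ == "__main__":
    # Toy aligned peptides, not real HA1 sequences.
    print(changes_per_band("ACDEFGHI", "ACNEFGKL", band_width=3))
```

The resulting vector of per-band counts is the kind of feature a simple predictive model could take as input, for instance by weighting each band differently.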
Subject(s)
Antigenic Variation/genetics, Computational Biology/methods, Influenza Virus Hemagglutinin Glycoproteins/genetics, Influenza A Virus H3N2 Subtype/genetics, Influenza A Virus H3N2 Subtype/immunology, Amino Acid Sequence, Antibody Binding Sites, Humans, Human Influenza/immunology, Human Influenza/virology, Molecular Models, Protein Conformation
ABSTRACT
Immunoglobulins or antibodies are the main effector molecules of the B-cell lineage and are encoded by hundreds of variable (V), diversity (D), and joining (J) germline genes, which recombine to generate enormous IG diversity. Recently, high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) of recombined V-(D)-J genes has offered unprecedented insights into the dynamics of IG repertoires in health and disease. Faithful biological interpretation of AIRR-seq studies depends upon the annotation of raw AIRR-seq data, using reference germline gene databases to identify the germline genes within each rearrangement. Existing reference databases are incomplete, as shown by recent AIRR-seq studies that have inferred the existence of many previously unreported polymorphisms. Completing the documentation of genetic variation in germline gene databases is therefore of crucial importance. Lymphocyte receptor genes and alleles are currently assigned by the Immunoglobulins, T cell Receptors and Major Histocompatibility Nomenclature Subcommittee of the International Union of Immunological Societies (IUIS) and managed in IMGT®, the international ImMunoGeneTics information system® (IMGT). In 2017, the IMGT Group reached agreement with a group of AIRR-seq researchers on the principles of a streamlined process for identifying and naming inferred allelic sequences, for their incorporation into IMGT®. These researchers represented the AIRR Community, a network of over 300 researchers whose objective is to promote all aspects of immunoglobulin and T-cell receptor repertoire studies, including the standardization of experimental and computational aspects of AIRR-seq data generation and analysis. The Inferred Allele Review Committee (IARC) was established by the AIRR Community to devise policies, criteria, and procedures to perform this function. 
Formalized evaluations of novel inferred sequences have now begun and submissions are invited via a new dedicated portal (https://ogrdb.airr-community.org). Here, we summarize recommendations developed by the IARC (focusing, to begin with, on human IGHV genes) with the goal of facilitating the acceptance of inferred allelic variants of germline IGHV genes. We believe that this initiative will improve the quality of AIRR-seq studies by facilitating the description of human IG germline gene variation, and that in time, it will expand to the documentation of TR and IG genes in many vertebrate species.
Subject(s)
Alleles, Immunoglobulin Genes, Genetic Variation/genetics, Terminology as Topic, V(D)J Recombination, Base Sequence, Genetic Databases, Datasets as Topic, Gene Library, Germ-Line Mutation, High-Throughput Nucleotide Sequencing, Humans, Immunoglobulin Heavy Chains/genetics, Immunoglobulin Variable Region/genetics, Polymerase Chain Reaction/methods, Sequence Alignment, Nucleic Acid Sequence Homology, VDJ Exons/genetics
ABSTRACT
Next-generation sequencing is making it possible to study the antibody repertoire of an organism in unprecedented detail, and, by so doing, to characterize its behavior in the response to infection and in pathological conditions such as autoimmunity and cancer. The polymorphic nature of the repertoire poses unique challenges that rule out the use of many commonly used NGS methods and require tradeoffs to be made when considering experimental design. We outline the main contexts in which antibody repertoire analysis has been used, and summarize the key tools that are available. The humoral immune response to vaccination has been a particular focus of repertoire analyses, and we review the key conclusions and methods used in these studies.
Subject(s)
Antibodies/genetics, High-Throughput Nucleotide Sequencing/methods, Animals, Humans, Humoral Immunity/genetics
ABSTRACT
In studying the binding of host antibodies to the surface antigens of pathogens, the structural and functional characterization of antibody-antigen complexes by X-ray crystallography and binding assay is important. However, the characterization requires experiments that are typically time consuming and expensive: thus, many antibody-antigen complexes are under-characterized. For vaccine development and disease surveillance, it is often vital to assess the impact of amino acid substitutions on antibody binding. For example, are there antibody substitutions capable of improving binding without a loss of breadth, or antigen substitutions that lead to antigenic escape? The questions cannot be answered reliably from sequence variation alone, exhaustive substitution assays are usually impractical, and alanine scans provide at best an incomplete identification of the critical residue-residue interactions. Here, we show that, given an initial structure of an antibody bound to an antigen, molecular dynamics simulations using the energy method molecular mechanics with Generalized Born surface area (MM/GBSA) can model the impact of single amino acid substitutions on antibody-antigen binding energy. We apply the technique to three broad-spectrum antibodies to influenza A hemagglutinin and examine both previously characterized and novel variant strains observed in the human population that may give rise to antigenic escape. We find that in some cases the impact of a substitution is local, while in others it causes a reorientation of the antibody with wide-ranging impact on residue-residue interactions: this explains, in part, why the change in chemical properties of a residue can be, on its own, a poor predictor of overall change in binding energy. Our estimates are in good agreement with experimental results; indeed, they approximate the degree of agreement between different experimental techniques. 
Simulations were performed on commodity computer hardware; hence, this approach has the potential to be widely adopted by those undertaking infectious disease research. Novel aspects of this research include the application of MM/GBSA to investigate binding between broadly binding antibodies and a viral glycoprotein; the development of an approach for visualizing substrate-ligand interactions; and the use of experimental assay data to rescale our predictions, allowing us to make inferences about absolute, as well as relative, changes in binding energy.
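The quantity at the heart of this abstract, the change in binding energy caused by a substitution, can be written down in a few lines. The Python sketch below uses the usual MM/GBSA bookkeeping (binding energy as complex minus receptor minus ligand, averaged over trajectory frames) and takes the difference between mutant and wild type. The abstract does not detail the authors' protocol or their rescaling against assay data, so the function names, the "receptor"/"ligand" labels for the two binding partners, and the toy per-frame energies are assumptions for illustration.

```python
from statistics import mean


def binding_energy(complex_e: list[float],
                   receptor_e: list[float],
                   ligand_e: list[float]) -> float:
    """Mean over frames of G_complex - G_receptor - G_ligand (e.g. kcal/mol)."""
    if not (len(complex_e) == len(receptor_e) == len(ligand_e)):
        raise ValueError("all three series need one value per frame")
    return mean(c - r - l for c, r, l in zip(complex_e, receptor_e, ligand_e))


def delta_delta_g(wild_type: float, mutant: float) -> float:
    """Change in binding energy due to a substitution.

    With this sign convention a positive value means the substitution
    weakens binding, and a negative value means it strengthens binding.
    """
    return mutant - wild_type


if __name__ == "__main__":
    # Toy per-frame energies in kcal/mol, invented for illustration.
    wt = binding_energy([-110.0, -112.0], [-60.0, -61.0], [-20.0, -21.0])
    mut = binding_energy([-106.0, -108.0], [-60.0, -61.0], [-20.0, -21.0])
    print(wt, mut, delta_delta_g(wt, mut))
```

Averaging over frames is what lets a reorientation of the antibody during the simulation show up in the estimate, which a single static structure would miss.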
ABSTRACT
There are at present few tools available to assist with the determination and analysis of B-cell lineage trees from next-generation sequencing data. Here we present two utilities that support automated large-scale analysis and the creation of publication-quality results. The tools are available on the web and are also available for download so that they can be integrated into an automated pipeline. Critically, and in contrast to previously published tools, these utilities can be used with any suitable phylogenetic inference method and with any antibody germline library and hence are species-independent.