ABSTRACT
The emergence of current severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of concern (VOCs) and potential future spillovers of SARS-like coronaviruses into humans pose a major threat to human health and the global economy. Development of broadly effective coronavirus vaccines that can mitigate these threats is needed. Here, we utilized a targeted donor selection strategy to isolate a large panel of human broadly neutralizing antibodies (bnAbs) to sarbecoviruses. Many of these bnAbs are remarkably effective in neutralizing a diversity of sarbecoviruses and against most SARS-CoV-2 VOCs, including the Omicron variant. Neutralization breadth is achieved by bnAb binding to epitopes on a relatively conserved face of the receptor-binding domain (RBD). Consistent with targeting of conserved sites, select RBD bnAbs exhibited protective efficacy against diverse SARS-like coronaviruses in a prophylaxis challenge model in vivo. These bnAbs provide new opportunities and choices for next-generation antibody prophylactic and therapeutic applications and provide a molecular basis for effective design of pan-sarbecovirus vaccines.
Subjects
COVID-19 , SARS-CoV-2 , Neutralizing Antibodies , Viral Antibodies , Broadly Neutralizing Antibodies , COVID-19/prevention & control , Humans , Coronavirus Spike Glycoprotein
ABSTRACT
Pan-betacoronavirus neutralizing antibodies may hold the key to developing broadly protective vaccines against novel pandemic coronaviruses and to more effectively respond to SARS-CoV-2 variants. The emergence of Omicron and subvariants of SARS-CoV-2 illustrates the limitations of solely targeting the receptor-binding domain (RBD) of the spike (S) protein. Here, we isolated a large panel of broadly neutralizing antibodies (bnAbs) from SARS-CoV-2 recovered-vaccinated donors, which targets a conserved S2 region in the betacoronavirus spike fusion machinery. Select bnAbs showed broad in vivo protection against all three deadly betacoronaviruses, SARS-CoV-1, SARS-CoV-2, and MERS-CoV, which have spilled over into humans in the past two decades. Structural studies of these bnAbs delineated the molecular basis for their broad reactivity and revealed common antibody features targetable by broad vaccination strategies. These bnAbs provide new insights and opportunities for antibody-based interventions and for developing pan-betacoronavirus vaccines.
Subjects
COVID-19 , SARS-CoV-2 , Humans , Broadly Neutralizing Antibodies , Neutralizing Antibodies , Viral Antibodies
ABSTRACT
The V(D)J recombination process rearranges the variable (V), diversity (D), and joining (J) genes in the immunoglobulin (IG) loci to generate antibody repertoires. Annotation of these loci across various species and predicting the V, D, and J genes (IG genes) are critical for studies of the adaptive immune system. However, because the standard gene finding algorithms are not suitable for predicting IG genes, they have been semimanually annotated in very few species. We developed the IGDetective algorithm for predicting IG genes and applied it to species with the assembled IG loci. IGDetective generated the first large collection of IG genes across many species and enabled their evolutionary analysis, including the analysis of the "bat IG diversity" hypothesis. This analysis revealed extremely conserved V genes in evolutionary distant species, indicating that these genes may be subjected to the same selective pressure, for example, pressure driven by common pathogens. IGDetective also revealed extremely diverged V genes and a new family of evolutionary conserved V genes in bats with unusual noncanonical cysteines. Moreover, unlike all other previously reported antibodies, these cysteines are located within complementarity-determining regions. Because cysteines form disulfide bonds, we hypothesize that these cysteine-rich V genes might generate antibodies with noncanonical conformations and could potentially form a unique part of the immune repertoire in bats. We also analyzed the diversity landscape of the recombination signal sequences and revealed their features that trigger the high/low usage of the IG genes.
Subjects
Antibody Diversity , V(D)J Recombination , Antibodies , Complementarity Determining Regions/genetics , Immunoglobulin Genes
ABSTRACT
An important challenge in vaccine development is to figure out why a vaccine succeeds in some individuals and fails in others. Although antibody repertoires hold the key to answering this question, there have been very few personalized immunogenomics studies so far aimed at revealing how variations in immunoglobulin genes affect a vaccine response. We conducted an immunosequencing study of 204 calves vaccinated against bovine respiratory disease (BRD) with the goal to reveal variations in immunoglobulin genes and somatic hypermutations that impact the efficacy of vaccine response. Our study represents the largest longitudinal personalized immunogenomics study reported to date across all species, including humans. To analyze the generated data set, we developed an algorithm for identifying variations of the immunoglobulin genes (as well as frequent somatic hypermutations) that affect various features of the antibody repertoire and titers of neutralizing antibodies. In contrast to relatively short human antibodies, cattle have a large fraction of ultralong antibodies that have opened new therapeutic opportunities. Our study reveals that ultralong antibodies are a key component of the immune response against the costliest disease of beef cattle in North America. The detected variants of the cattle immunoglobulin genes, which are implicated in the success/failure of the BRD vaccine, have the potential to direct the selection of individual cattle for ongoing breeding programs.
Subjects
Cattle Diseases , Vaccines , Animals , Antibodies , Cattle , Cattle Diseases/prevention & control , North America , Vaccines/genetics
ABSTRACT
Ab "ultralong" third H chain complementarity-determining regions (CDR H3) appear unique to bovine Abs and may enable binding to difficult epitopes that shorter CDR H3 regions cannot easily access. Diversity is concentrated in the "knob" domain of the CDR H3, which is encoded by the DH gene segment and sits atop a β-ribbon "stalk" that protrudes far from the Ab surface. Knob region cysteine content is quite diverse in terms of total number of cysteines, sequence position, and disulfide bond pattern formation. We investigated the role of germline cysteines in production of a diverse CDR H3 structural repertoire. The relationship between DH polymorphisms and deletions relative to germline at the nucleotide level, as well as diversity in cysteine and disulfide bond content at the structural level, was ascertained. Structural diversity is formed through (1) DH polymorphisms with altered cysteine positions, (2) DH deletions, and (3) new cysteines that arise through somatic hypermutation that form new, unique disulfide bonds to alter the knob structure. Thus, a combination of mechanisms at both the germline and somatic immunogenetic levels results in diversity in knob region cysteine content, contributing to remarkable complexity in knob region disulfide patterns, loops, and Ag binding surface.
Subjects
Cysteine , Germ Cells , Animals , Cattle , Cysteine/genetics , Genetic Polymorphism , Complementarity Determining Regions/genetics , Disulfides
ABSTRACT
The V(DD)J recombination is currently viewed as an aberrant and inconsequential variant of the canonical V(D)J recombination. Moreover, since the classical 12/23 rule for the V(D)J recombination fails to explain the V(DD)J recombination, the molecular mechanism of tandem D-D fusions has remained unknown since they were discovered three decades ago. Revealing this mechanism is a biomedically important goal since tandem fusions contribute to broadly neutralizing antibodies with ultralong CDR3s. We reveal previously overlooked cryptic nonamers in the recombination signal sequences of human IGHD genes and demonstrate that these nonamers explain the vast majority of tandem fusions in human repertoires. We further reveal large clonal lineages formed by tandem fusions in antigen-stimulated immunosequencing data sets, suggesting that such data sets contain many more tandem fusions than previously thought and that about a quarter of large clonal lineages with unusually long CDR3s are generated through tandem fusions. Finally, we developed the SEARCH-D algorithm for identifying D genes in mammalian genomes and applied it to the recently completed Vertebrate Genomes Project assemblies, nearly doubling the number of mammalian species with known D genes. Our analysis revealed cryptic nonamers in RSSs of many mammalian genomes, thus demonstrating that the V(DD)J recombination is not a "bug" but an important feature preserved throughout mammalian evolution.
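To make concrete why the classical 12/23 rule cannot account for tandem D-D fusions, here is a minimal sketch of the rule itself. It assumes the textbook spacer lengths for the human IGH locus (23 bp after V, 12 bp on both sides of D, 23 bp before J). The table `RSS_SPACERS` and the function `obeys_12_23_rule` are hypothetical names introduced for illustration. The cryptic nonamers reported in the abstract are not modeled here.

```python
# Textbook RSS spacer lengths (bp) flanking human IGH gene segments.
# "left" is the RSS on the 5' side of a segment, "right" the RSS on its 3' side.
# This table is an illustrative assumption, not data from the paper.
RSS_SPACERS = {
    "V": {"right": 23},
    "D": {"left": 12, "right": 12},
    "J": {"left": 23},
}


def obeys_12_23_rule(upstream, downstream):
    """Return True if joining upstream->downstream pairs a 12-bp with a 23-bp spacer."""
    a = RSS_SPACERS[upstream].get("right")
    b = RSS_SPACERS[downstream].get("left")
    return {a, b} == {12, 23}


# Check the joins relevant to V(D)J and V(DD)J recombination.
joins = {
    pair: obeys_12_23_rule(*pair)
    for pair in [("V", "D"), ("D", "J"), ("D", "D"), ("V", "J")]
}
for pair, ok in joins.items():
    print(pair, "allowed" if ok else "not allowed")
```

Under these assumptions the V-D and D-J joins satisfy the rule, while a D-D join pairs two 12-bp spacers and does not, which is the gap the abstract refers to.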
Subjects
Complementarity Determining Regions/genetics , V(D)J Recombination , Algorithms , Animals , Antigens , Immunoglobulin Heavy Chain Genes , Humans , Mammals/genetics , Tandem Repeat Sequences
ABSTRACT
Immunoglobulin genes are formed through V(D)J recombination, which joins the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics focuses on finding alleles of germline genes across various patients. Although reconstruction of V and J genes is a well-studied problem, the more challenging task of reconstructing D genes remained open until the IgScout algorithm was developed in 2019. In this work, we address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction, apply it to hundreds of immunosequencing datasets from multiple species, and validate the newly inferred D genes by analyzing diverse whole genome sequencing datasets and haplotyping heterozygous V genes.
Subjects
Computational Biology/methods , Immunoglobulin Genes/genetics , Immunoglobulin D/genetics , Algorithms , Animals , Genetic Databases , Humans , Immunity/genetics
ABSTRACT
The problem of reconstructing a string from its error-prone copies, the trace reconstruction problem, was introduced by Vladimir Levenshtein two decades ago. While there has been considerable theoretical work on trace reconstruction, practical solutions have only recently started to emerge in the context of two rapidly developing research areas: immunogenomics and DNA data storage. In immunogenomics, traces correspond to mutated copies of genes, with mutations generated naturally by the adaptive immune system. In DNA data storage, traces correspond to noisy copies of DNA molecules that encode digital data, with errors being artifacts of the data retrieval process. In this paper, we introduce several new trace generation models and open questions relevant to trace reconstruction for immunogenomics and DNA data storage, survey theoretical results on trace reconstruction, and highlight their connections to computational biology. Throughout, we discuss the applicability and shortcomings of known solutions and suggest future research directions.
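As a rough illustration of the trace reconstruction setting, the sketch below generates traces under two simple noise models and reconstructs the source in the easiest case. The function names, the DNA alphabet, and the error rates are assumptions made for this example. Column-wise majority voting only applies to substitution-only traces of equal length. It is not one of the methods surveyed in the paper, and it does not work for the deletion channel, where traces lose their alignment to the source.

```python
import random
from collections import Counter


def substitution_trace(source, error_rate, rng, alphabet="ACGT"):
    """Copy source, replacing each symbol with a different one with probability error_rate."""
    out = []
    for ch in source:
        if rng.random() < error_rate:
            out.append(rng.choice([c for c in alphabet if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


def deletion_trace(source, deletion_rate, rng):
    """Copy source, dropping each symbol independently with probability deletion_rate."""
    return "".join(ch for ch in source if rng.random() >= deletion_rate)


def majority_consensus(traces):
    """Column-wise majority vote over equal-length traces (substitution-only case)."""
    if not traces or any(len(t) != len(traces[0]) for t in traces):
        raise ValueError("traces must be non-empty and of equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*traces))


rng = random.Random(0)            # fixed seed so the run is repeatable
source = "ACGTTGCAACGT"           # toy source string
traces = [substitution_trace(source, 0.1, rng) for _ in range(25)]
estimate = majority_consensus(traces)
shortened = deletion_trace(source, 0.2, rng)
print(estimate, shortened)
```

With a low substitution rate and enough traces the vote usually recovers the source. The deletion trace is only guaranteed to be a subsequence of the source, which is why that setting needs more elaborate algorithms.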
Subjects
Genomics , Immunogenetics , B-Lymphocytes/immunology , Genetic Databases , Germ Cells , Humans , B-Cell Antigen Receptors/genetics , B-Cell Antigen Receptors/immunology , T-Cell Antigen Receptors/genetics , T-Cell Antigen Receptors/immunology , T-Lymphocytes/immunology , Whole Genome Sequencing
ABSTRACT
Transforming error-prone immunosequencing datasets into Ab repertoires is a fundamental problem in immunogenomics, and a prerequisite for studies of immune responses. Although various repertoire reconstruction algorithms were released in the last 3 y, it remains unclear how to benchmark them and how to assess the accuracy of the reconstructed repertoires. We describe an accurate IgReC algorithm for constructing Ab repertoires from high-throughput immunosequencing datasets and a new framework for assessing the quality of reconstructed repertoires. Surprisingly, Ab repertoires constructed by IgReC from barcoded immunosequencing datasets in the blind mode (without using information about unique molecular identifiers) improved upon the repertoires constructed by the state-of-the-art tools that use barcoding. This finding suggests that IgReC may alleviate the need to generate repertoires using the barcoding technology (the workhorse of current immunogenomics efforts) because our computational approach to error correction of immunosequencing data is nearly as powerful as the experimental approach based on barcoding.
Subjects
Algorithms , Antibodies/genetics , Protein Sequence Analysis/methods , Animals , Humans
ABSTRACT
Populations of different species vary in the amounts of genetic diversity they possess. Nucleotide diversity π, the fraction of nucleotides that are different between two randomly chosen genotypes, has been known to range in eukaryotes between 0.0001 in Lynx lynx and 0.16 in Caenorhabditis brenneri. Here, we report the results of a comparative analysis of 24 haploid genotypes (12 from the United States and 12 from European Russia) of a split-gill fungus Schizophyllum commune. The diversity at synonymous sites is 0.20 in the American population of S. commune and 0.13 in the Russian population. This exceptionally high level of nucleotide diversity also leads to extreme amino acid diversity of protein-coding genes. Using whole-genome resequencing of 2 parental and 17 offspring haploid genotypes, we estimate that the mutation rate in S. commune is high, at 2.0 × 10⁻⁸ (95% CI: 1.1 × 10⁻⁸ to 4.1 × 10⁻⁸) per nucleotide per generation. Therefore, the high diversity of S. commune is primarily determined by its elevated mutation rate, although high effective population size likely also plays a role. Small genome size, ease of cultivation and completion of the life cycle in the laboratory, free-living haploid life stages and exceptionally high variability of S. commune make it a promising model organism for population, quantitative, and evolutionary genetics.
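The definition of nucleotide diversity π given above can be sketched in a few lines. This is a minimal illustration that assumes aligned haploid sequences of equal length with no gaps or missing data. The function name and the toy sequences are made up for the example and are not data from the study.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Mean fraction of differing sites over all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    pair_fractions = [
        sum(a != b for a, b in zip(s1, s2)) / length
        for s1, s2 in combinations(sequences, 2)
    ]
    return sum(pair_fractions) / len(pair_fractions)


# Toy haplotypes (hypothetical): pairwise differences are 1, 1, and 2 sites out of 10.
toy_haplotypes = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy_haplotypes)
print(round(pi, 4))
```

For the toy input the three pairwise fractions are 0.1, 0.1, and 0.2, so π is their mean, 4/30. Estimates restricted to synonymous sites, as reported in the abstract, would additionally require a codon-aware choice of sites.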
Subjects
Agaricales/genetics , Genetic Variation , Wood/microbiology , Nucleotides/genetics , Genetic Polymorphism
ABSTRACT
MOTIVATION: The recent introduction of next-generation sequencing technologies to antibody studies has resulted in a growing number of immunoinformatics tools for antibody repertoire analysis. However, benchmarking these newly emerging tools remains problematic since the gold standard datasets that are needed to validate these tools are typically not available. RESULTS: Since simulating antibody repertoires is often the only feasible way to benchmark new immunoinformatics tools, we developed the IgSimulator tool that addresses various complications in generating realistic antibody repertoires. IgSimulator's code has a modular structure and can be easily adapted to new simulation requirements. AVAILABILITY AND IMPLEMENTATION: IgSimulator is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from yana-safonova.github.io/ig_simulator. CONTACT: safonova.yana@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Algorithms , Antibodies/genetics , Computational Biology/methods , Computer Simulation , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Factual Databases , Human Genome , Humans , Software
ABSTRACT
UNLABELLED: The analysis of concentrations of circulating antibodies in serum (antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires [next generation sequencing (NGS) and mass spectrometry (MS)] present difficult computational challenges since antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutations. Therefore, the protein database required for the interpretation of spectra from circulating antibodies is custom for each individual. Although such a database can be constructed via NGS, the reads generated by NGS are error-prone and even a single nucleotide error precludes identification of a peptide by the standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm that performs error-correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires. AVAILABILITY AND IMPLEMENTATION: IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from http://bioinf.spbau.ru/igtools. CONTACT: ppevzner@ucsd.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Algorithms , Antibodies/genetics , Human Genome , High-Throughput Nucleotide Sequencing/methods , Immunoglobulins/analysis , Proteome/analysis , DNA Sequence Analysis/methods , Software , Factual Databases , Humans , Mass Spectrometry/methods , Peptide Fragments/analysis
ABSTRACT
New high-quality human genome assemblies derived from lymphoblastoid cell lines (LCLs) provide reference genomes and pangenomes for genomics studies. However, the characteristics of LCLs pose technical challenges to profiling immunoglobulin (IG) genes. IG loci in LCLs contain a mixture of germline and somatically recombined haplotypes, making them difficult to genotype or assemble accurately. To address these challenges, we introduce IGLoo, a software tool that implements novel methods for analyzing sequence data and genome assemblies derived from LCLs. IGLoo characterizes somatic V(D)J recombination events in the sequence data and identifies the breakpoints and missing IG genes in the LCL-based assemblies. Furthermore, IGLoo implements a novel reassembly framework to improve germline assembly quality by integrating information about somatic events and population structural variants in the IG loci. We applied IGLoo to study the assemblies from the Human Pangenome Reference Consortium, providing new insights into the mechanisms, gene usage, and patterns of V(D)J recombination, causes of assembly fragmentation in the IG heavy chain (IGH) locus, and improved representation of the IGH assemblies.
ABSTRACT
Long-read sequencing technologies have revolutionized genome assembly producing near-complete chromosome assemblies for numerous organisms, which are invaluable to research in many fields. However, regions with complex repetitive structure continue to represent a challenge for genome assembly algorithms, particularly in areas with high heterozygosity. Robust and comprehensive solutions for the assessment of assembly accuracy and completeness in these regions do not exist. In this study we focus on the assembly of biomedically important antibody-encoding immunoglobulin (IG) loci, which are characterized by complex duplications and repeat structures. High-quality full-length assemblies for these loci are critical for resolving haplotype-level annotations of IG genes, without which, functional and evolutionary studies of antibody immunity across vertebrates are not tractable. To address these challenges, we developed a pipeline, "CloseRead", that generates multiple assembly verification metrics for analysis and visualization. These metrics expand upon those of existing quality assessment tools and specifically target complex and highly heterozygous regions. Using CloseRead, we systematically assessed the accuracy and completeness of IG loci in publicly available assemblies of 74 vertebrate species, identifying problematic regions. We also demonstrated that inspecting assembly graphs for problematic regions can both identify the root cause of assembly errors and illuminate solutions for improving erroneous assemblies. For a subset of species, we were able to correct assembly errors through targeted reassembly. Together, our analysis demonstrated the utility of assembly assessment in improving the completeness and accuracy of IG loci across species.
ABSTRACT
We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. We achieve chromosome-level contiguity with unparalleled sequence accuracy (<1 error in 500,000 base pairs), completely sequencing 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, providing more in-depth evolutionary insights. Comparative analyses, including human, allow us to investigate the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference. This includes newly minted gene families within lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes, and subterminal heterochromatin. This resource should serve as a definitive baseline for all future evolutionary studies of humans and our closest living ape relatives.
ABSTRACT
Variation in the antibody response has been linked to differential outcomes in disease, and suboptimal vaccine and therapeutic responsiveness, the determinants of which have not been fully elucidated. Countering models that presume antibodies are generated largely by stochastic processes, we demonstrate that polymorphisms within the immunoglobulin heavy chain locus (IGH) impact the naive and antigen-experienced antibody repertoire, indicating that genetics predisposes individuals to mount qualitatively and quantitatively different antibody responses. We pair recently developed long-read genomic sequencing methods with antibody repertoire profiling to comprehensively resolve IGH genetic variation, including novel structural variants, single nucleotide variants, and genes and alleles. We show that IGH germline variants determine the presence and frequency of antibody genes in the expressed repertoire, including those enriched in functional elements linked to V(D)J recombination, and overlapping disease-associated variants. These results illuminate the power of leveraging IGH genetics to better understand the regulation, function, and dynamics of the antibody response in disease.
Subjects
Immunoglobulin Heavy Chain Genes , Immunoglobulin Genes , Humans , Immunoglobulin Heavy Chain Genes/genetics , Alleles , Germ-Line Mutation , Immunoglobulin Heavy Chains/genetics
ABSTRACT
Developing broad coronavirus vaccines requires identifying and understanding the molecular basis of broadly neutralizing antibody (bnAb) spike sites. In our previous work, we identified sarbecovirus spike RBD group 1 and 2 bnAbs. We have now shown that many of these bnAbs can still neutralize highly mutated SARS-CoV-2 variants, including XBB.1.5. Structural studies revealed that group 1 bnAbs use recurrent germline-encoded CDRH3 features to interact with a conserved RBD region that overlaps with the class 4 bnAb site. Group 2 bnAbs recognize a less well-characterized "site V" on the RBD and destabilize the spike trimer. Site V has remained largely unchanged in SARS-CoV-2 variants and is highly conserved across diverse sarbecoviruses, making it a promising target for broad coronavirus vaccine development. Our findings suggest that targeted vaccine strategies may be needed to induce effective B cell responses to escape-resistant subdominant spike RBD bnAb sites.
ABSTRACT
Affinity maturation (AM) of B cells through somatic hypermutations (SHMs) enables the immune system to evolve to recognize diverse pathogens. The accumulation of SHMs leads to the formation of clonal lineages of antibody-secreting B cells that have evolved from a common naïve B cell. Advances in high-throughput sequencing have enabled deep scans of B cell receptor repertoires, paving the way for reconstructing clonal trees. However, it is not clear if clonal trees, which capture microevolutionary time scales, can be reconstructed using traditional phylogenetic reconstruction methods with adequate accuracy. In fact, several clonal tree reconstruction methods have been developed to fix supposed shortcomings of phylogenetic methods. Nevertheless, no consensus has been reached regarding the relative accuracy of these methods, partially because evaluation is challenging. Benchmarking the performance of existing methods and developing better methods would both benefit from realistic models of clonal lineage evolution specifically designed for emulating B cell evolution. In this paper, we propose a model of B cell clonal lineage evolution and use this model to benchmark several existing clonal tree reconstruction methods. Our model, designed to be extensible, has several features: by evolving the clonal tree and sequences simultaneously, it allows modeling selective pressure due to changes in affinity binding; it enables scalable simulations of large numbers of cells; it enables several rounds of infection by an evolving pathogen; and it models the building of memory. In addition, we also suggest a set of metrics for comparing clonal trees and measuring their properties.
Our results show that while maximum likelihood phylogenetic reconstruction methods can fail to capture key features of clonal tree expansion if applied naively, a simple post-processing of their results, where short branches are contracted, leads to inferences that are better than alternative methods.
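The post-processing step mentioned above, contracting short branches of an inferred tree, can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the `Node` class, the length threshold, the choice to contract only internal branches (leaves are kept), and the choice to add the contracted length to the grandchildren are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    length: float = 0.0                      # length of the branch to the parent
    children: List["Node"] = field(default_factory=list)


def contract_short_branches(node, threshold):
    """Collapse internal branches shorter than threshold, in place (post-order)."""
    new_children = []
    for child in node.children:
        contract_short_branches(child, threshold)
        if child.children and child.length < threshold:
            # Remove the internal child and attach its children to this node,
            # carrying over the contracted length to preserve root-to-tip distances.
            for grandchild in child.children:
                grandchild.length += child.length
                new_children.append(grandchild)
        else:
            new_children.append(child)
    node.children = new_children
    return node


# Toy tree (hypothetical): the internal node X sits on a near-zero branch.
tree = Node("root", children=[
    Node("X", 0.001, [Node("A", 0.1), Node("B", 0.2)]),
    Node("C", 0.3),
])
contract_short_branches(tree, threshold=0.01)
print([(c.name, round(c.length, 3)) for c in tree.children])
```

After contraction the root has three children (A, B, C), so the poorly supported binary split becomes a multifurcation, which is the kind of structure clonal trees are expected to contain.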