Results 1 - 20 of 76
1.
Genes Immun ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38844673

ABSTRACT

Immunoglobulins (IGs), critical components of the human immune system, are composed of heavy and light protein chains encoded at three genomic loci. The IG Kappa (IGK) chain locus consists of two large, inverted segmental duplications. The complexity of the IG loci has hindered use of standard high-throughput methods for characterizing genetic variation within these regions. To overcome these limitations, we use long-read sequencing to create haplotype-resolved IGK assemblies in an ancestrally diverse cohort (n = 36), representing the first comprehensive description of IGK haplotype variation. We identify extensive locus polymorphism, including novel single nucleotide variants (SNVs) and novel structural variants harboring functional IGKV genes. Among 47 functional IGKV genes, we identify 145 alleles, 67 of which were not previously curated. We report inter-population differences in allele frequencies for 10 IGKV genes, including alleles unique to specific populations within this dataset. We identify haplotypes carrying signatures of gene conversion that associate with SNV enrichment in the IGK distal region, and a haplotype with an inversion spanning the proximal and distal regions. These data provide a critical resource of curated genomic reference information from diverse ancestries, laying a foundation for advancing our understanding of population-level genetic variation in the IGK locus.

2.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38752856

ABSTRACT

Enhancing the reproducibility and comprehension of adaptive immune receptor repertoire sequencing (AIRR-seq) data analysis is critical for scientific progress. This study presents guidelines for reproducible AIRR-seq data analysis, and a collection of ready-to-use pipelines with comprehensive documentation. To this end, ten common pipelines were implemented using ViaFoundry, a user-friendly interface for pipeline management and automation. This is accompanied by versioned containers, documentation and archiving capabilities. The automation of pre-processing analysis steps and the ability to modify pipeline parameters according to specific research needs are emphasized. AIRR-seq data analysis is highly sensitive to varying parameters and setups; using the guidelines presented here, the ability to reproduce previously published results is demonstrated. This work promotes transparency, reproducibility, and collaboration in AIRR-seq data analysis, serving as a model for handling and documenting bioinformatics pipelines in other research domains.


Subject(s)
Computational Biology; Software; Humans; Computational Biology/methods; Reproducibility of Results; Receptors, Immunologic/genetics; High-Throughput Nucleotide Sequencing/methods; Adaptive Immunity/genetics; Guidelines as Topic
3.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38478393

ABSTRACT

SUMMARY: Knowledge of immunoglobulin and T cell receptor encoding genes is derived from high-quality genomic sequencing. High-throughput sequencing is delivering large volumes of data, and precise, high-throughput approaches to annotation are needed. Digger is an automated tool that identifies coding and regulatory regions of these genes, with results comparable to those obtained by current expert curational methods. AVAILABILITY AND IMPLEMENTATION: Digger is published under an open source license at https://github.com/williamdlees/Digger and is available as a Python package and a Docker container.


Subject(s)
Receptors, Antigen, T-Cell; Software; Receptors, Antigen, T-Cell/genetics; Chromosome Mapping; Immunoglobulins/genetics; High-Throughput Nucleotide Sequencing/methods
4.
bioRxiv ; 2024 Jan 28.
Article in English | MEDLINE | ID: mdl-38293151

ABSTRACT

Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow which integrates the Immcantation Framework following BCR and TCR sequencing data analysis best practices. The Immcantation Framework is a comprehensive toolset, which allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19 infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets. nf-core/airrflow is available free of charge, under the MIT license on GitHub (https://github.com/nf-core/airrflow). 
Detailed documentation and example results are available on the nf-core website (https://nf-co.re/airrflow).

5.
Cell Rep ; 42(8): 112879, 2023 08 29.
Article in English | MEDLINE | ID: mdl-37537844

ABSTRACT

Neuroblastoma is a lethal childhood solid tumor of developing peripheral nerves. Two percent of children with neuroblastoma develop opsoclonus myoclonus ataxia syndrome (OMAS), a paraneoplastic disease characterized by cerebellar and brainstem-directed autoimmunity but typically with outstanding cancer-related outcomes. We compared tumor transcriptomes and tumor-infiltrating T and B cell repertoires from 38 OMAS subjects with neuroblastoma to 26 non-OMAS-associated neuroblastomas. We found greater B and T cell infiltration in OMAS-associated tumors compared to controls and showed that both were polyclonal expansions. Tertiary lymphoid structures (TLSs) were enriched in OMAS-associated tumors. We identified significant enrichment of the major histocompatibility complex (MHC) class II allele HLA-DOB*01:01 in OMAS patients. OMAS severity scores were associated with the expression of several candidate autoimmune genes. We propose a model in which polyclonal auto-reactive B lymphocytes act as antigen-presenting cells and drive TLS formation, thereby supporting both sustained polyclonal T cell-mediated anti-tumor immunity and paraneoplastic OMAS neuropathology.


Subject(s)
Neuroblastoma; Opsoclonus-Myoclonus Syndrome; Child; Humans; Autoimmunity; Neuroblastoma/complications; Neuroblastoma/metabolism; Opsoclonus-Myoclonus Syndrome/complications; Opsoclonus-Myoclonus Syndrome/pathology; Autoantibodies; Genes, MHC Class II; Ataxia
6.
Nucleic Acids Res ; 51(16): e86, 2023 09 08.
Article in English | MEDLINE | ID: mdl-37548401

ABSTRACT

In adaptive immune receptor repertoire analysis, determining the germline variable (V) allele associated with each T- and B-cell receptor sequence is a crucial step. This process is highly impacted by allele annotations. Aligning sequences, assigning them to specific germline alleles, and inferring individual genotypes are challenging when the repertoire is highly mutated, or sequence reads do not cover the whole V region. Here, we propose an alternative naming scheme for the V alleles, as well as a novel method to infer individual genotypes. We demonstrate the strengths of the two by comparing their outcomes to other genotype inference methods. We validate the genotype approach with independent genomic long-read data. The naming scheme is compatible with current annotation tools and pipelines. Analysis results can be converted from the proposed naming scheme to the nomenclature determined by the International Union of Immunological Societies (IUIS). Both the naming scheme and the genotype procedure are implemented in a freely available R package (PIgLET https://bitbucket.org/yaarilab/piglet). To allow researchers to further explore the approach on real data and to adapt it for their uses, we also created an interactive website (https://yaarilab.github.io/IGHV_reference_book).


Subject(s)
Genomics; Immunoglobulin Heavy Chains; Receptors, Antigen, B-Cell; Alleles; Genotype; Receptors, Antigen, B-Cell/genetics; Immunoglobulin Heavy Chains/genetics
7.
Bioinformatics ; 39(7), 2023 07 01.
Article in English | MEDLINE | ID: mdl-37417959

ABSTRACT

MOTIVATION: T-cell receptor beta chain (TCRB) repertoires are crucial for understanding immune responses. However, their high diversity and complexity present significant challenges in representation and analysis. The main motivation of this study is to develop a unified and compact representation of a TCRB repertoire that can efficiently capture its inherent complexity and diversity and allow for direct inference. RESULTS: We introduce a novel approach to TCRB repertoire encoding and analysis, leveraging the Lempel-Ziv 76 algorithm. This approach allows us to create a graph-like model, identify specific sequence features, and produce a new encoding approach for an individual's repertoire. The proposed representation enables various applications, including generation probability inference, informative feature vector derivation, sequence generation, a new measure for diversity estimation, and a new sequence centrality measure. The approach was applied to four large-scale public TCRB sequencing datasets, demonstrating its potential for a wide range of applications in big biological sequencing data. AVAILABILITY AND IMPLEMENTATION: A Python package implementing the approach is available at https://github.com/MuteJester/LZGraphs.
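
As a rough illustration of the Lempel-Ziv 76 parsing that the abstract above builds on, the sketch below splits a sequence into LZ76 phrases, where each phrase is the shortest substring that has not yet occurred in the preceding text. The function name `lz76_phrases` and the CDR3-like example string are assumptions made for illustration; this is not the LZGraphs package API, and the authors' graph construction may differ in detail.

```python
from __future__ import annotations


def lz76_phrases(seq: str) -> list[str]:
    """Split seq into Lempel-Ziv 76 phrases.

    Each phrase is the shortest substring starting at the current position
    that does not occur in the text preceding its own last character.
    The final phrase may be a repeat if the sequence ends first.
    """
    phrases = []
    i, n = 0, len(seq)
    while i < n:
        length = 1
        # Grow the candidate while it can still be copied from earlier text.
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        phrases.append(seq[i:i + length])
        i += length
    return phrases


# Classic binary example from the LZ76 literature.
binary_phrases = lz76_phrases("0001101001000101")

# Hypothetical CDR3-like amino acid string, used only as a toy input.
cdr3_phrases = lz76_phrases("CASSLGQGNTEAFF")
```

In a graph-style encoding, consecutive phrases of each sequence could serve as nodes joined by directed edges, with edge counts accumulated over a repertoire; that step is left out here because the abstract does not spell it out.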


Subject(s)
Data Compression; Receptors, Antigen, T-Cell, alpha-beta; Receptors, Antigen, T-Cell, alpha-beta/genetics; Algorithms; Receptors, Antigen, T-Cell/genetics
8.
Article in English | MEDLINE | ID: mdl-37388275

ABSTRACT

Analysis of an individual's immunoglobulin or T cell receptor gene repertoire can provide important insights into immune function. High-quality analysis of adaptive immune receptor repertoire sequencing data depends upon accurate and relatively complete germline sets, but current sets are known to be incomplete. Established processes for the review and systematic naming of receptor germline genes and alleles require specific evidence and data types, but the discovery landscape is rapidly changing. To exploit the potential of emerging data, and to provide the field with improved state-of-the-art germline sets, an intermediate approach is needed that will allow the rapid publication of consolidated sets derived from these emerging sources. These sets must use a consistent naming scheme and allow refinement and consolidation into genes as new information emerges. Name changes should be minimised, but, where changes occur, the naming history of a sequence must be traceable. Here we outline the current issues and opportunities for the curation of germline IG/TR genes and present a forward-looking data model for building out more robust germline sets that can dovetail with current established processes. We describe interoperability standards for germline sets, and an approach to transparency based on principles of findability, accessibility, interoperability, and reusability.

9.
Front Immunol ; 14: 1031914, 2023.
Article in English | MEDLINE | ID: mdl-37153628

ABSTRACT

Introduction: The success of the human body in fighting SARS-CoV-2 infection relies on lymphocytes and their antigen receptors. Identifying and characterizing clinically relevant receptors is of utmost importance. Methods: We report here the application of a machine learning approach, utilizing B cell receptor repertoire sequencing data from severely and mildly infected individuals with SARS-CoV-2 compared with uninfected controls. Results: In contrast to previous studies, our approach successfully stratifies non-infected from infected individuals, as well as level of disease severity. The features that drive this classification are based on somatic hypermutation patterns, and point to alterations in the somatic hypermutation process in COVID-19 patients. Discussion: These features may be used to build and adapt therapeutic strategies to COVID-19, in particular to quantitatively assess potential diagnostic and therapeutic antibodies. These results constitute a proof of concept for future epidemiological challenges.


Subject(s)
B-Lymphocytes; COVID-19; Humans; Receptors, Antigen, B-Cell/genetics; RNA, Viral; SARS-CoV-2/genetics; Patient Acuity
10.
J Immunol ; 210(10): 1607-1619, 2023 05 15.
Article in English | MEDLINE | ID: mdl-37027017

ABSTRACT

Current Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) methods using short-read sequencing strategies resolve expressed Ab transcripts with limited resolution of the C region. In this article, we present the near-full-length AIRR-seq (FLAIRR-seq) method that uses targeted amplification by 5' RACE, combined with single-molecule, real-time sequencing to generate highly accurate (99.99%) human Ab H chain transcripts. FLAIRR-seq was benchmarked by comparing H chain V (IGHV), D (IGHD), and J (IGHJ) gene usage, complementarity-determining region 3 length, and somatic hypermutation to matched datasets generated with standard 5' RACE AIRR-seq using short-read sequencing and full-length isoform sequencing. Together, these data demonstrate robust FLAIRR-seq performance using RNA samples derived from PBMCs, purified B cells, and whole blood, which recapitulated results generated by commonly used methods, while additionally resolving H chain gene features not documented in IMGT at the time of submission. FLAIRR-seq data provide, for the first time, to our knowledge, simultaneous single-molecule characterization of IGHV, IGHD, IGHJ, and IGHC region genes and alleles, allele-resolved subisotype definition, and high-resolution identification of class switch recombination within a clonal lineage. In conjunction with genomic sequencing and genotyping of IGHC genes, FLAIRR-seq of the IgM and IgG repertoires from 10 individuals resulted in the identification of 32 unique IGHC alleles, 28 (87%) of which were previously uncharacterized. Together, these data demonstrate the capabilities of FLAIRR-seq to characterize IGHV, IGHD, IGHJ, and IGHC gene diversity for the most comprehensive view of bulk-expressed Ab repertoires to date.


Subject(s)
Complementarity Determining Regions; Humans; Complementarity Determining Regions/genetics; Base Sequence
11.
Cell Mol Gastroenterol Hepatol ; 16(1): 63-81, 2023.
Article in English | MEDLINE | ID: mdl-36965814

ABSTRACT

BACKGROUND & AIMS: Hepatocellular carcinoma (HCC) is a model of a diverse spectrum of cancers because it is induced by well-known etiologies, mainly hepatitis C virus (HCV) and hepatitis B virus. Here, we aimed to identify HCV-specific mutational signatures and explored the link between the HCV-related regional variation in mutation rates and HCV-induced alterations in genome-wide chromatin organization. METHODS: To identify an HCV-specific mutational signature in HCC, we performed high-resolution targeted sequencing to detect passenger mutations on 64 HCC samples from 3 etiology groups: hepatitis B virus, HCV, or other. To explore the link between the genomic signature and genome-wide chromatin organization we performed chromatin immunoprecipitation sequencing for the transcriptionally permissive H3K4Me3, H3K9Ac, and suppressive H3K9Me3 modifications after HCV infection. RESULTS: Regional variation in mutation rate analysis showed significant etiology-dependent regional mutation rates in 12 genes: LRP2, KRT84, TMEM132B, DOCK2, DMD, INADL, JAK2, DNAH6, MTMR9, ATM, SLX4, and ARSD. We found an enrichment of C->T transversion mutations in the HCV-associated HCC cases. Furthermore, these cases showed regional variation in mutation rates associated with genomic intervals in which HCV infection dictated epigenetic alterations. This signature may be related to the HCV-induced decreased expression of genes encoding key enzymes in the base excision repair pathway. CONCLUSIONS: We identified novel distinct HCV etiology-dependent mutation signatures in HCC associated with HCV-induced alterations in histone modification. This study presents a link between cancer-causing mutagenesis and the increased predisposition to liver cancer in chronic HCV-infected individuals, and unveils novel etiology-specific mechanisms leading to HCC and cancer in general.


Subject(s)
Carcinoma, Hepatocellular; Hepatitis C; Liver Neoplasms; Humans; Liver Neoplasms/pathology; Carcinoma, Hepatocellular/pathology; Hepatitis C/complications; Hepatitis C/genetics; Mutation/genetics; Hepacivirus/genetics; Hepatitis B virus/genetics; Epigenesis, Genetic/genetics; Chromatin; Genomics; Protein Tyrosine Phosphatases, Non-Receptor/genetics; Keratins, Type II/genetics; Keratins, Hair-Specific/genetics
12.
Nat Commun ; 14(1): 1462, 2023 03 16.
Article in English | MEDLINE | ID: mdl-36927854

ABSTRACT

Protection from viral infections depends on immunoglobulin isotype switching, which endows antibodies with effector functions. Here, we find that the protein kinase DYRK1A is essential for B cell-mediated protection from viral infection and effective vaccination through regulation of class switch recombination (CSR). Dyrk1a-deficient B cells are impaired in CSR activity in vivo and in vitro. Phosphoproteomic screens and kinase-activity assays identify MSH6, a DNA mismatch repair protein, as a direct substrate for DYRK1A, and deletion of a single phosphorylation site impaired CSR. After CSR and germinal center (GC) seeding, DYRK1A is required for attenuation of B cell proliferation. These findings demonstrate DYRK1A-mediated biological mechanisms of B cell immune responses that may be used for therapeutic manipulation in antibody-mediated autoimmunity.


Subject(s)
B-Lymphocytes; Immunoglobulin Class Switching; Phosphorylation; Immunoglobulin Class Switching/genetics; Germinal Center; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism
13.
Genome Res ; 33(1): 71-79, 2023 01.
Article in English | MEDLINE | ID: mdl-36526432

ABSTRACT

Crohn's disease (CD) is a chronic relapsing-remitting inflammatory disorder of the gastrointestinal tract that is characterized by altered innate and adaptive immune function. Although massively parallel sequencing studies of the T cell receptor repertoire identified oligoclonal expansion of unique clones, much less is known about the B cell receptor (BCR) repertoire in CD. Here, we present a novel BCR repertoire sequencing data set from ileal biopsies from pediatric patients with CD and controls, and identify CD-specific somatic hypermutation (SHM) patterns, revealed by a machine learning (ML) algorithm trained on BCR repertoire sequences. Moreover, ML classification of a different data set from blood samples of adults with CD versus controls identified that V gene usage, clusters, or mutation frequencies yielded excellent results in classifying the disease (F1 > 90%). In summary, we show that an ML algorithm enables the classification of CD based on unique BCR repertoire features with high accuracy.
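
The F1 figure quoted above is the harmonic mean of precision and recall. The sketch below shows that arithmetic for a binary CD-versus-control classifier. The function name `f1_score` and the label vectors are hypothetical; they are not the study's data, and the abstract does not say how the authors computed or averaged their scores.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = CD, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
```

Because F1 ignores true negatives, it is usually reported alongside the class balance of the data set.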


Subject(s)
Crohn Disease; Adult; Humans; Child; Crohn Disease/genetics; Machine Learning; Biopsy; Algorithms; Chronic Disease
14.
Front Immunol ; 14: 1330153, 2023.
Article in English | MEDLINE | ID: mdl-38406579

ABSTRACT

Introduction: Analysis of an individual's immunoglobulin (IG) gene repertoire requires the use of high-quality germline gene reference sets. When sets only contain alleles supported by strong evidence, AIRR sequencing (AIRR-seq) data analysis is more accurate and studies of the evolution of IG genes, their allelic variants and the expressed immune repertoire are therefore facilitated. Methods: The Adaptive Immune Receptor Repertoire Community (AIRR-C) IG Reference Sets have been developed by including only human IG heavy and light chain alleles that have been confirmed by evidence from multiple high-quality sources. To further improve AIRR-seq analysis, some alleles have been extended to deal with short 3' or 5' truncations that can lead them to be overlooked by alignment utilities. To avoid other challenges for analysis programs, exact paralogs (e.g. IGHV1-69*01 and IGHV1-69D*01) are only represented once in each set, though alternative sequence names are noted in accompanying metadata. Results and discussion: The Reference Sets include fewer than half the previously recognised IG alleles (e.g. just 198 IGHV sequences), and also include a number of novel alleles: 8 IGHV alleles, 2 IGKV alleles and 5 IGLV alleles. Despite their smaller sizes, erroneous calls were eliminated, and excellent coverage was achieved when a set of repertoires comprising over 4 million V(D)J rearrangements from 99 individuals was analyzed using the Sets. The version-tracked AIRR-C IG Reference Sets are freely available at the OGRDB website (https://ogrdb.airr-community.org/germline_sets/Human) and will be regularly updated to include newly observed and previously reported sequences that can be confirmed by new high-quality data.
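
The rule that exact paralogs appear only once, with alternative names kept in metadata, can be sketched as a small deduplication step. The function name `collapse_exact_paralogs`, the record layout, and the short nucleotide strings below are all assumptions for illustration; they are not real germline sequences and not the OGRDB data model.

```python
from __future__ import annotations


def collapse_exact_paralogs(alleles: dict[str, str]) -> dict[str, dict]:
    """Keep one entry per distinct sequence and record other names as aliases.

    The first name seen for a sequence becomes the primary name.
    """
    by_sequence: dict[str, dict] = {}
    for name, sequence in alleles.items():
        key = sequence.upper()  # compare case-insensitively
        if key in by_sequence:
            by_sequence[key]["aliases"].append(name)
        else:
            by_sequence[key] = {"name": name, "sequence": key, "aliases": []}
    return {entry["name"]: entry for entry in by_sequence.values()}


# Toy placeholder sequences, NOT real germline alleles.
toy_alleles = {
    "IGHV1-69*01": "CAGGTGCAGCTG",
    "IGHV1-69D*01": "caggtgcagctg",  # identical to IGHV1-69*01
    "IGHV1-69*02": "CAGGTCCAGCTG",
}
reference_set = collapse_exact_paralogs(toy_alleles)
```

Keeping aliases alongside the primary name means an aligner sees each sequence once, while downstream reports can still map a call back to either paralog name.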


Subject(s)
Genes, Immunoglobulin; Immunoglobulins; Humans; Immunoglobulins/genetics; Alleles; V(D)J Recombination/genetics; Germ Cells
15.
Front Immunol ; 13: 888555, 2022.
Article in English | MEDLINE | ID: mdl-35720344

ABSTRACT

The immunoglobulin genes of inbred mouse strains that are commonly used in models of antibody-mediated human diseases are poorly characterized. This compromises data analysis. To infer the immunoglobulin genes of BALB/c mice, we used long-read SMRT sequencing to amplify VDJ-C sequences from F1 (BALB/c x C57BL/6) hybrid animals. Strain variations were identified in the Ighm and Ighg2b genes, and analysis of VDJ rearrangements led to the inference of 278 germline IGHV alleles. 169 alleles are not present in the C57BL/6 genome reference sequence. To establish a set of expressed BALB/c IGHV germline gene sequences, we computationally retrieved IGHV haplotypes from the IgM dataset. Haplotyping led to the confirmation of 162 BALB/c IGHV gene sequences. A musIGHV398 pseudogene variant also appears to be present in the BALB/cByJ substrain, while a functional musIGHV398 gene is highly expressed in the BALB/cJ substrain. Only four of the BALB/c alleles were also observed in the C57BL/6 haplotype. The full set of inferred BALB/c sequences has been used to establish a BALB/c IGHV reference set, hosted at https://ogrdb.airr-community.org. We assessed whether assemblies from the Mouse Genome Project (MGP) are suitable for the determination of the genes of the IGH loci. Only 37 (43.5%) of the 85 confirmed IMGT-named BALB/c IGHV and 33 (42.9%) of the 77 confirmed non-IMGT IGHV were found in a search of the MGP BALB/cJ genome assembly. This suggests that current MGP assemblies are unsuitable for the comprehensive documentation of germline IGHVs and more efforts will be needed to establish strain-specific reference sets.


Subject(s)
Immunoglobulin Heavy Chains; Immunoglobulin Variable Region; Animals; Haplotypes; Immunoglobulin Heavy Chains/genetics; Immunoglobulin Variable Region/genetics; Mice; Mice, Inbred BALB C; Mice, Inbred C57BL; Sequence Analysis, DNA
16.
J Immunol ; 208(12): 2713-2725, 2022 06 15.
Article in English | MEDLINE | ID: mdl-35623663

ABSTRACT

The immune system matures throughout childhood to achieve full functionality in protecting our bodies against threats. The immune system has a strong reciprocal symbiosis with the host bacterial population and the two systems co-develop, shaping each other. Despite their fundamental role in health physiology, the ontogeny of these systems is poorly characterized. In this study, we investigated the development of the BCR repertoire by high-throughput sequencing of B cell receptors at several time points in young C57BL/6J mice. In parallel, we explored the development of the gut microbiome. We discovered that the gut IgA repertoires change from birth to adolescence, including an increase in CDR3 lengths and somatic hypermutation levels. This contrasts with the spleen IgM repertoires that remain stable and distinct from the IgA repertoires in the gut. We also discovered that large clones that germinate in the gut are initially confined to a specific gut compartment, then expand to nearby compartments, and later also expand to the spleen and remain there. Finally, we explored the associations between diversity indices of the B cell repertoires and the microbiome, as well as associations between bacterial and BCR clusters. Our results shed light on the ontogeny of the adaptive immune system and the microbiome, providing a baseline for future research.


Subject(s)
Microbiota; Animals; High-Throughput Nucleotide Sequencing; Immunoglobulin A/genetics; Mice; Mice, Inbred C57BL; Receptors, Antigen, B-Cell/genetics
17.
Cell ; 185(7): 1208-1222.e21, 2022 03 31.
Article in English | MEDLINE | ID: mdl-35305314

ABSTRACT

The tumor microenvironment hosts antibody-secreting cells (ASCs) associated with a favorable prognosis in several types of cancer. Patient-derived antibodies have diagnostic and therapeutic potential; yet, it remains unclear how antibodies gain autoreactivity and target tumors. Here, we found that somatic hypermutations (SHMs) promote antibody antitumor reactivity against surface autoantigens in high-grade serous ovarian carcinoma (HGSOC). Patient-derived tumor cells were frequently coated with IgGs. Intratumoral ASCs in HGSOC were both mutated and clonally expanded and produced tumor-reactive antibodies that targeted MMP14, which is abundantly expressed on the tumor cell surface. The reversion of monoclonal antibodies to their germline configuration revealed two classes: one dependent on SHMs for tumor binding and a second with germline-encoded autoreactivity. Thus, tumor-reactive autoantibodies are either naturally occurring or evolve through an antigen-driven selection process. These findings highlight the origin and potential applicability of autoantibodies directed at surface antigens for tumor targeting in cancer patients.


Subject(s)
Antibodies, Neoplasm; Ovarian Neoplasms; Antibodies, Monoclonal; Autoantibodies; Autoantigens; Female; Humans; Ovarian Neoplasms/genetics; Tumor Microenvironment
18.
Genome Med ; 14(1): 2, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34991709

ABSTRACT

BACKGROUND: T and B cell receptor (TCR, BCR) repertoires constitute the foundation of adaptive immunity. Adaptive immune receptor repertoire sequencing (AIRR-seq) is a common approach to study immune system dynamics. Understanding the genetic factors influencing the composition and dynamics of these repertoires is of major scientific and clinical importance. The chromosomal loci encoding the variable regions of TCRs and BCRs are challenging to decipher due to repetitive elements and undocumented structural variants. METHODS: To confront this challenge, AIRR-seq-based methods have recently been developed for B cells, enabling genotype and haplotype inference and discovery of undocumented alleles. However, this approach relies on complete coverage of the receptors' variable regions, whereas most T cell studies sequence a small fraction of that region. Here, we adapted a B cell pipeline for undocumented alleles, genotype, and haplotype inference for full and partial AIRR-seq TCR data sets. The pipeline also deals with gene assignment ambiguities, which is especially important in the analysis of data sets of partial sequences. RESULTS: From the full and partial AIRR-seq TCR data sets, we identified 39 undocumented polymorphisms in T cell receptor Beta V (TRBV) and 31 undocumented 5' UTR sequences. A subset of these inferences was also observed using independent genomic approaches. We found that a single nucleotide polymorphism differentiating between the two documented T cell receptor Beta D2 (TRBD2) alleles is strongly associated with dramatic changes in the expressed repertoire. CONCLUSIONS: We reveal a rich picture of germline variability and demonstrate how a single nucleotide polymorphism dramatically affects the composition of the whole repertoire. Our findings provide a basis for annotation of TCR repertoires for future basic and clinical studies.


Subject(s)
High-Throughput Nucleotide Sequencing; Receptors, Antigen, T-Cell, alpha-beta; Alleles; Germ Cells; High-Throughput Nucleotide Sequencing/methods; Humans; Receptors, Antigen, T-Cell/genetics; Receptors, Antigen, T-Cell, alpha-beta/genetics
19.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: mdl-37848619

ABSTRACT

BACKGROUND: Machine learning (ML) has gained significant attention for classifying immune states in adaptive immune receptor repertoires (AIRRs) to support the advancement of immunodiagnostics and therapeutics. Simulated data are crucial for the rigorous benchmarking of AIRR-ML methods. Existing approaches to generating synthetic benchmarking datasets result in the generation of naive repertoires missing the key feature of many shared receptor sequences (selected for common antigens) found in antigen-experienced repertoires. RESULTS: We demonstrate that a common approach to generating simulated AIRR benchmark datasets can introduce biases, which may be exploited for undesired shortcut learning by certain ML methods. To mitigate undesirable access to true signals in simulated AIRR datasets, we devised a simulation strategy (simAIRR) that constructs antigen-experienced-like repertoires with a realistic overlap of receptor sequences. simAIRR can be used for constructing AIRR-level benchmarks based on a range of assumptions (or experimental data sources) for what constitutes receptor-level immune signals. This includes the possibility of making or not making any prior assumptions regarding the similarity or commonality of immune state-associated sequences that will be used as true signals. We demonstrate the real-world realism of our proposed simulation approach by showing that basic ML strategies perform similarly on simAIRR-generated and real-world experimental AIRR datasets. CONCLUSIONS: This study sheds light on the potential shortcut learning opportunities for ML methods that can arise with the state-of-the-art way of simulating AIRR datasets. simAIRR is available as a Python package: https://github.com/KanduriC/simAIRR.


Subject(s)
Benchmarking; Computer Simulation
20.
iScience ; 24(10): 103192, 2021 Oct 22.
Article in English | MEDLINE | ID: mdl-34693229

ABSTRACT

Inference of germline polymorphisms in immunoglobulin genes from B cell receptor repertoires is complicated by somatic hypermutations, sequencing/PCR errors, and by varying length of reference alleles. The light chain inference is particularly challenging owing to large gene duplications and absence of D genes. We analyzed the light chain cDNA sequences from naïve B cell receptor repertoires from 100 individuals. We optimized light chain allele inference by tweaking parameters of the TIgGER functions, extending the germline reference sequences, and establishing mismatch frequency patterns at polymorphic positions to filter out false-positive candidates. We identified 48 previously unreported variants of light chain variable genes. We selected 14 variants for validation and successfully validated 11 by Sanger sequencing. Clustering of light chain 5'UTR, L-PART1, and L-PART2 revealed partial intron retention in 11 kappa and 9 lambda V alleles. Our results provide insight into germline variation in human light chain immunoglobulin loci.
