1.
Brief Bioinform ; 25(6), 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39438077

ABSTRACT

Adaptive immune receptors, such as antibodies and T-cell receptors, recognize foreign threats with exquisite specificity. A major challenge in adaptive immunology is discovering the rules governing immune receptor-antigen binding in order to predict the antigen binding status of previously unseen immune receptors. Many studies assume that the antigen binding status of an immune receptor may be determined by the presence of a short motif in the complementarity determining region 3 (CDR3), disregarding other amino acids. To test this assumption, we present a method to discover short motifs which show high precision in predicting antigen binding and generalize well to unseen simulated and experimental data. Our analysis of a mutagenesis-based antibody dataset reveals 11 336 position-specific, mostly gapped motifs of 3-5 amino acids that retain high precision on independently generated experimental data. Using a subset of only 178 motifs, a simple classifier was made that on the independently generated dataset outperformed a deep learning model proposed specifically for such datasets. In conclusion, our findings support the notion that for some antibodies, antigen binding may be largely determined by a short CDR3 motif. As more experimental data emerge, our methodology could serve as a foundation for in-depth investigations into antigen binding signals.


Subjects
Amino Acid Motifs , Antigens , Complementarity Determining Regions , Complementarity Determining Regions/chemistry , Complementarity Determining Regions/immunology , Complementarity Determining Regions/genetics , Antigens/immunology , Antigens/chemistry , Antigens/metabolism , Humans , Antibodies/immunology , Antibodies/chemistry , Antibodies/metabolism , Deep Learning , Protein Binding , Computational Biology/methods
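The idea of a position-specific, gapped motif from the abstract in entry 1 can be illustrated with a minimal sketch. It is not the authors' method: the motif representation (pairs of 0-based position and amino acid), the function names, and the example sequences are all hypothetical, and the classifier rule "binder if any motif matches" is an assumption made for illustration.

```python
from typing import Optional, Sequence

# Hypothetical representation: a motif is a tuple of
# (0-based CDR3 position, required amino acid) pairs.
# Positions that are not listed are gaps and may hold any residue.
Motif = tuple[tuple[int, str], ...]


def matches(cdr3: str, motif: Motif) -> bool:
    """Return True if every fixed position of the motif occurs in the sequence."""
    return all(pos < len(cdr3) and cdr3[pos] == aa for pos, aa in motif)


def predict_binder(cdr3: str, motifs: Sequence[Motif]) -> bool:
    """Assumed rule: call a sequence a binder if at least one motif matches."""
    return any(matches(cdr3, m) for m in motifs)


def motif_precision(motif: Motif, labelled: Sequence[tuple[str, bool]]) -> Optional[float]:
    """Fraction of motif-carrying sequences that are labelled binders.

    Returns None when the motif matches no sequence at all.
    """
    hits = [is_binder for seq, is_binder in labelled if matches(seq, motif)]
    return sum(hits) / len(hits) if hits else None
```

Here a motif such as `((3, "W"), (5, "Y"))` requires W at position 3 and Y at position 5 and ignores position 4, which is what makes it gapped.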
2.
Genome Res ; 31(12): 2209-2224, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34815307

ABSTRACT

The process of recombination between variable (V), diversity (D), and joining (J) immunoglobulin (Ig) gene segments determines an individual's naive Ig repertoire and, consequently, (auto)antigen recognition. VDJ recombination follows probabilistic rules that can be modeled statistically. So far, it remains unknown whether VDJ recombination rules differ between individuals. If these rules differed, identical (auto)antigen-specific Ig sequences would be generated with individual-specific probabilities, signifying that the available Ig sequence space is individual specific. We devised a sensitivity-tested distance measure that enables inter-individual comparison of VDJ recombination models. We discovered, accounting for several sources of noise as well as allelic variation in Ig sequencing data, that not only unrelated individuals but also human monozygotic twins and even inbred mice possess statistically distinguishable immunoglobulin recombination models. This suggests that, in addition to genetic, there is also nongenetic modulation of VDJ recombination. We demonstrate that population-wide individualized VDJ recombination can result in orders of magnitude of difference in the probability to generate (auto)antigen-specific Ig sequences. Our findings have implications for immune receptor-based individualized medicine approaches relevant to vaccination, infection, and autoimmunity.

3.
Brief Bioinform ; 23(2), 2022 03 10.
Article in English | MEDLINE | ID: mdl-35062022

ABSTRACT

T-cell receptor (TCR) sequencing has enabled the development of innovative diagnostic tests for cancers, autoimmune diseases and other applications. However, the rarity of many T-cell clonotypes presents a detection challenge, which may lead to misdiagnosis if diagnostically relevant TCRs remain undetected. To address this issue, we developed TCRpower, a novel computational pipeline for quantifying the statistical detection power of TCR sequencing methods. TCRpower calculates the probability of detecting a TCR sequence as a function of several key parameters: in-vivo TCR frequency, T-cell sample count, read sequencing depth and read cutoff. To calibrate TCRpower, we selected unique TCRs of 45 T-cell clones (TCCs) as spike-in TCRs. We sequenced the spike-in TCRs from TCCs, together with TCRs from peripheral blood, using a 5' RACE protocol. The 45 spike-in TCRs covered a wide range of sample frequencies, ranging from 5 per 100 to 1 per 1 million. The resulting spike-in TCR read counts and ground truth frequencies allowed us to calibrate TCRpower. In our TCR sequencing data, we observed a consistent linear relationship between sample and sequencing read frequencies. We were also able to reliably detect spike-in TCRs with frequencies as low as one per million. By implementing an optimized read cutoff, we eliminated most of the falsely detected sequences in our data (TCR α-chain 99.0% and TCR β-chain 92.4%), thereby improving diagnostic specificity. TCRpower is publicly available and can be used to optimize future TCR sequencing experiments, and thereby enable reliable detection of disease-relevant TCRs for diagnostic applications.


Subjects
T-Cell Antigen Receptors , Humans , T-Cell Antigen Receptors/genetics , Alpha-Beta T-Cell Antigen Receptors/genetics , T-Lymphocytes
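The four parameters named in the abstract of entry 3 (in-vivo TCR frequency, T-cell sample count, read sequencing depth, read cutoff) can be combined in a toy detection-power calculation. The sketch below is not TCRpower's model or API. It assumes, purely for illustration, that the number of sampled cells carrying the TCR is Poisson distributed, that the read count given those cells is also Poisson distributed with a mean proportional to the clone's share of the sample, and that a TCR is detected when its read count reaches the cutoff. All function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def prob_at_least(cutoff: int, lam: float) -> float:
    """P(X >= cutoff) for X ~ Poisson(lam)."""
    below = sum(poisson_pmf(i, lam) for i in range(cutoff))
    return max(0.0, 1.0 - below)


def detection_probability(frequency: float, n_cells: int,
                          read_depth: int, read_cutoff: int) -> float:
    """Toy estimate of P(TCR read count >= read_cutoff).

    Assumed two-stage model (not taken from the paper):
      cells carrying the TCR ~ Poisson(frequency * n_cells)
      reads given k cells    ~ Poisson(read_depth * k / n_cells)
    """
    mean_cells = frequency * n_cells
    # Truncate the sum over k well beyond the bulk of the Poisson mass.
    max_cells = int(mean_cells + 10 * math.sqrt(mean_cells) + 20)
    total = 0.0
    for k in range(max_cells + 1):
        mean_reads = read_depth * k / n_cells
        total += poisson_pmf(k, mean_cells) * prob_at_least(read_cutoff, mean_reads)
    return min(1.0, total)
```

Under these assumptions, raising the read cutoff can only lower the detection probability, which mirrors the trade-off between sensitivity and specificity that the abstract describes.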
4.
Bioinformatics ; 38(17): 4230-4232, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35852318

ABSTRACT

MOTIVATION: Adaptive immune receptor (AIR) repertoires (AIRRs) record past immune encounters with exquisite specificity. Therefore, identifying identical or similar AIR sequences across individuals is a key step in AIRR analysis for revealing convergent immune response patterns that may be exploited for diagnostics and therapy. Existing methods for quantifying AIRR overlap scale poorly with increasing dataset numbers and sizes. To address this limitation, we developed CompAIRR, which enables ultra-fast computation of AIRR overlap, based on either exact or approximate sequence matching. RESULTS: CompAIRR improves computational speed 1000-fold relative to the state of the art and uses only one-third of the memory: on the same machine, the exact pairwise AIRR overlap of 10^4 AIRRs with 10^5 sequences is found in ∼17 min, while the fastest alternative tool requires 10 days. CompAIRR has been integrated with the machine learning ecosystem immuneML to speed up commonly used AIRR-based machine learning applications. AVAILABILITY AND IMPLEMENTATION: CompAIRR code and documentation are available at https://github.com/uio-bmi/compairr. Docker images are available at https://hub.docker.com/r/torognes/compairr. The code to replicate the synthetic datasets, scripts for benchmarking and creating figures, and all raw data underlying the figures are available at https://github.com/uio-bmi/compairr-benchmarking. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Ecosystem , Software , Humans , Machine Learning , Benchmarking
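What "AIRR overlap based on exact or approximate sequence matching" in entry 4 means can be shown with a small sketch. This is not CompAIRR's implementation or command-line interface. It is a plain-Python illustration in which the function names, the repertoire names, and the choice of "Hamming distance at most 1" as the approximate criterion are assumptions. The masking trick also assumes that sequences never contain the `*` character.

```python
from itertools import combinations


def exact_overlap(repertoires: dict[str, list[str]]) -> dict[tuple[str, str], int]:
    """Number of distinct sequences shared by each pair of repertoires."""
    sets = {name: set(seqs) for name, seqs in repertoires.items()}
    return {(a, b): len(sets[a] & sets[b]) for a, b in combinations(sorted(sets), 2)}


def masked_keys(seq: str) -> set[str]:
    """All variants of seq with exactly one position replaced by '*'."""
    return {seq[:i] + "*" + seq[i + 1:] for i in range(len(seq))}


def approx_overlap(a: list[str], b: list[str]) -> int:
    """Distinct sequences in a that have a sequence in b within Hamming distance <= 1.

    Two equal-length sequences differ in at most one position exactly when
    they share a masked key, so a hash lookup replaces all-vs-all comparison.
    """
    b_keys: set[str] = set()
    for seq in set(b):
        b_keys |= masked_keys(seq)
    return sum(1 for seq in set(a) if masked_keys(seq) & b_keys)
```

For example, `exact_overlap({"r1": ["CASSL", "CASSF"], "r2": ["CASSL"]})` counts one shared sequence for the pair `("r1", "r2")`, while `approx_overlap` would also count `"CASSF"` because it is one substitution away from `"CASSL"`.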
6.
Gigascience ; 11, 2022 05 25.
Article in English | MEDLINE | ID: mdl-35639633

ABSTRACT

BACKGROUND: Machine learning (ML) methodology development for the classification of immune states in adaptive immune receptor repertoires (AIRRs) has seen a recent surge of interest. However, so far, there does not exist a systematic evaluation of scenarios where classical ML methods (such as penalized logistic regression) already perform adequately for AIRR classification. This hinders investigative reorientation to those scenarios where method development of more sophisticated ML approaches may be required. RESULTS: To identify those scenarios where a baseline ML method is able to perform well for AIRR classification, we generated a collection of synthetic AIRR benchmark data sets encompassing a wide range of data set architecture-associated and immune state-associated sequence patterns (signal) complexity. We trained ≈1,700 ML models with varying assumptions regarding immune signal on ≈1,000 data sets with a total of ≈250,000 AIRRs containing ≈46 billion TCRβ CDR3 amino acid sequences, thereby surpassing the sample sizes of current state-of-the-art AIRR-ML setups by two orders of magnitude. We found that L1-penalized logistic regression achieved high prediction accuracy even when the immune signal occurs only in 1 out of 50,000 AIR sequences. CONCLUSIONS: We provide a reference benchmark to guide new AIRR-ML classification methodology by (i) identifying those scenarios characterized by immune signal and data set complexity, where baseline methods already achieve high prediction accuracy, and (ii) facilitating realistic expectations of the performance of AIRR-ML models given training data set properties and assumptions. Our study serves as a template for defining specialized AIRR benchmark data sets for comprehensive benchmarking of AIRR-ML methods.


Subjects
Machine Learning , Immunologic Receptors
7.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: mdl-37848619

ABSTRACT

BACKGROUND: Machine learning (ML) has gained significant attention for classifying immune states in adaptive immune receptor repertoires (AIRRs) to support the advancement of immunodiagnostics and therapeutics. Simulated data are crucial for the rigorous benchmarking of AIRR-ML methods. Existing approaches to generating synthetic benchmarking datasets result in the generation of naive repertoires missing the key feature of many shared receptor sequences (selected for common antigens) found in antigen-experienced repertoires. RESULTS: We demonstrate that a common approach to generating simulated AIRR benchmark datasets can introduce biases, which may be exploited for undesired shortcut learning by certain ML methods. To mitigate undesirable access to true signals in simulated AIRR datasets, we devised a simulation strategy (simAIRR) that constructs antigen-experienced-like repertoires with a realistic overlap of receptor sequences. simAIRR can be used for constructing AIRR-level benchmarks based on a range of assumptions (or experimental data sources) for what constitutes receptor-level immune signals. This includes the possibility of making or not making any prior assumptions regarding the similarity or commonality of immune state-associated sequences that will be used as true signals. We demonstrate the real-world realism of our proposed simulation approach by showing that basic ML strategies perform similarly on simAIRR-generated and real-world experimental AIRR datasets. CONCLUSIONS: This study sheds light on the potential shortcut learning opportunities for ML methods that can arise with the state-of-the-art way of simulating AIRR datasets. simAIRR is available as a Python package: https://github.com/KanduriC/simAIRR.


Subjects
Benchmarking , Computer Simulation
8.
MAbs ; 14(1): 2031482, 2022.
Article in English | MEDLINE | ID: mdl-35377271

ABSTRACT

Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D-structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of antibody design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one dimensional: 1D) data can be used to design conformational (three dimensional: 3D) epitope-specific antibodies, matching, or exceeding the training dataset in affinity and developability parameter value variety. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.


Subjects
Antigen-Antibody Reactions , Machine Learning , Monoclonal Antibodies/chemistry , Antibody Binding Sites , Epitopes
9.
Nat Comput Sci ; 2(12): 845-865, 2022 Dec.
Article in English | MEDLINE | ID: mdl-38177393

ABSTRACT

Machine learning (ML) is a key technology for accurate prediction of antibody-antigen binding. Two orthogonal problems hinder the application of ML to antibody-specificity prediction and the benchmarking thereof: the lack of a unified ML formalization of immunological antibody-specificity prediction problems and the unavailability of large-scale synthetic datasets to benchmark real-world relevant ML methods and dataset design. Here we developed the Absolut! software suite that enables parameter-based unconstrained generation of synthetic lattice-based three-dimensional antibody-antigen-binding structures with ground-truth access to conformational paratope, epitope and affinity. We formalized common immunological antibody-specificity prediction problems as ML tasks and confirmed that for both sequence- and structure-based tasks, accuracy-based rankings of ML methods trained on experimental data hold for ML methods trained on Absolut!-generated data. The Absolut! framework has the potential to enable real-world relevant development and benchmarking of ML strategies for biotherapeutics design.


Subjects
Antibodies , Antigen-Antibody Reactions , Antibody Specificity , Epitopes/chemistry , Machine Learning
10.
Cell Rep ; 34(11): 108856, 2021 03 16.
Article in English | MEDLINE | ID: mdl-33730590

ABSTRACT

Antibody-antigen binding relies on the specific interaction of amino acids at the paratope-epitope interface. The predictability of antibody-antigen binding is a prerequisite for de novo antibody and (neo-)epitope design. A fundamental premise for the predictability of antibody-antigen binding is the existence of paratope-epitope interaction motifs that are universally shared among antibody-antigen structures. In a dataset of non-redundant antibody-antigen structures, we identify structural interaction motifs, which together compose a commonly shared structure-based vocabulary of paratope-epitope interactions. We show that this vocabulary enables the machine learnability of antibody-antigen binding on the paratope-epitope level using generative machine learning. The vocabulary (1) is compact, less than 10^4 motifs; (2) distinct from non-immune protein-protein interactions; and (3) mediates specific oligo- and polyreactive interactions between paratope-epitope pairs. Our work leverages combined structure- and sequence-based learning to demonstrate that machine-learning-driven predictive paratope and epitope engineering is feasible.


Subjects
Antigen-Antibody Reactions/immunology , Antibody Binding Sites/immunology , Epitopes/immunology , Amino Acid Motifs , Amino Acid Sequence , Antibodies/chemistry , Antibodies/immunology , Complementarity Determining Regions/chemistry , Epitopes/chemistry , Machine Learning , Protein Binding
11.
Nat Mach Intell ; 3(11): 936-944, 2021 Nov.
Article in English | MEDLINE | ID: mdl-37396030

ABSTRACT

Adaptive immune receptor repertoires (AIRR) are key targets for biomedical research as they record past and ongoing adaptive immune responses. The capacity of machine learning (ML) to identify complex discriminative sequence patterns renders it an ideal approach for AIRR-based diagnostic and therapeutic discovery. To date, widespread adoption of AIRR ML has been inhibited by a lack of reproducibility, transparency, and interoperability. immuneML (immuneml.uio.no) addresses these concerns by implementing each step of the AIRR ML process in an extensible, open-source software ecosystem that is based on fully specified and shareable workflows. To facilitate widespread user adoption, immuneML is available as a command-line tool and through an intuitive Galaxy web interface, and extensive documentation of workflows is provided. We demonstrate the broad applicability of immuneML by (i) reproducing a large-scale study on immune state prediction, (ii) developing, integrating, and applying a novel deep learning method for antigen specificity prediction, and (iii) showcasing streamlined interpretability-focused benchmarking of AIRR ML.

12.
Sci Rep ; 8(1): 8538, 2018 06 04.
Article in English | MEDLINE | ID: mdl-29867163

ABSTRACT

Brucellosis is a rarely encountered infection in Norway. The aim of this study was to explore all Brucella melitensis isolates collected in Norway from 1999 to 2016 in relation to origin of infection and antimicrobial resistance patterns. A total of 23 isolates were analysed by whole-genome sequencing and compared with selected sequences of B. melitensis available from NCBI. Additionally, SNP analysis in antibiotic resistance determining genes was performed. The majority belonged to the East Mediterranean clade (genotype II), while the remaining isolates belonged to the African clade (genotype III). These results indicate that human brucellosis in Norway is related to travel or migration from the Middle East, Asia or Africa, in accordance with results from Germany, Denmark and Sweden. Antibiotic susceptibility patterns were determined by broth microdilution method and/or gradient strip method. All isolates were susceptible to all tested antibiotics, except for rifampicin, where phenotypical results indicated resistance or intermediate resistance in all isolates based on broth microdilution method, and in four isolates based on gradient strip testing. In contrast, screening of the rpoB gene did not reveal any mutations in the previously described rpoB "hot spot" regions related to rifampicin resistance, indicating overestimation of resistance based on phenotypical results.


Subjects
Brucella melitensis/genetics , Brucellosis/genetics , Single Nucleotide Polymorphism , Whole Genome Sequencing , Brucella melitensis/drug effects , Brucellosis/epidemiology , Bacterial Drug Resistance , Female , Humans , Male , Microbial Sensitivity Tests , Norway/epidemiology , Rifampin/pharmacology
13.
Genome Announc ; 6(26), 2018 Jun 28.
Article in English | MEDLINE | ID: mdl-29954913

ABSTRACT

We report here the draft genome sequence of a Streptococcus species belonging to the S. mitis group. While a clear species identification cannot be made for the isolate, it appears that its most recent common ancestor is the species S. pseudopneumoniae.

14.
Genome Biol ; 17(1): 238, 2016 11 25.
Article in English | MEDLINE | ID: mdl-27887642

ABSTRACT

Genome-wide association studies (GWAS) have become indispensable in human medicine and genomics, but very few have been carried out on bacteria. Here we introduce Scoary, an ultra-fast, easy-to-use, and widely applicable software tool that scores the components of the pan-genome for associations to observed phenotypic traits while accounting for population stratification, with minimal assumptions about evolutionary processes. We call our approach pan-GWAS to distinguish it from traditional, single nucleotide polymorphism (SNP)-based GWAS. Scoary is implemented in Python and is available under an open source GPLv3 license at https://github.com/AdmiralenOla/Scoary .
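The basic question behind a pan-genome association score in entry 14, whether the presence of a gene co-occurs with a trait more often than chance would predict, can be sketched with a 2x2 contingency table and a two-sided Fisher's exact test. This is only a naive association step written for illustration. It does not reproduce Scoary's algorithm, makes no correction for population stratification, and uses hypothetical function names and isolate labels.

```python
from math import comb


def gene_trait_table(presence: dict[str, bool],
                     trait: dict[str, bool]) -> tuple[int, int, int, int]:
    """Counts (gene+trait+, gene+trait-, gene-trait+, gene-trait-) over shared isolates."""
    a = b = c = d = 0
    for isolate in presence.keys() & trait.keys():
        if presence[isolate] and trait[isolate]:
            a += 1
        elif presence[isolate]:
            b += 1
        elif trait[isolate]:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x gene+trait+ isolates with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)
```

A test of this kind treats every isolate as independent, which is the assumption that breaks down for clonal bacterial populations and the reason the abstract stresses accounting for population stratification.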
