Results 1 - 20 of 34
1.
PLoS Comput Biol ; 20(3): e1011881, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38442111

ABSTRACT

Antibody-based therapeutics must not undergo chemical modifications that would impair their efficacy or hinder their developability. A commonly used technique to de-risk lead biotherapeutic candidates annotates chemical liability motifs on their sequence. By analyzing sequences from all major sources of data (therapeutics, patents, GenBank, literature, and next-generation sequencing outputs), we find that almost all antibodies contain an average of 3-4 such liability motifs in their paratopes, irrespective of the source dataset. This is in line with the common wisdom that liability motif annotation is over-predictive. Therefore, we have compiled three computational flags to prioritize liability motifs for removal from lead drug candidates: 1. germline, to reflect naturally occurring motifs, 2. therapeutic, reflecting chemical liability motifs found in therapeutic antibodies, and 3. surface, indicative of structural accessibility for chemical modification. We show that these flags annotate approximately 60% of liability motifs as benign, that is, the flagged liabilities have a smaller probability of undergoing degradation as benchmarked on two experimental datasets covering deamidation, isomerization, and oxidation. We combined the liability detection and flags into a tool called Liability Antibody Profiler (LAP), publicly available at lap.naturalantibody.com. We anticipate that LAP will save time and effort in de-risking therapeutic molecules.
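
The motif annotation step described in this abstract can be illustrated with a minimal sketch. The motif set below is an assumption chosen for illustration (a few commonly cited sequence liabilities) and is not taken from LAP; the function name `annotate_liabilities` is likewise hypothetical, and the germline, therapeutic, and surface flags are not modelled.

```python
import re

# Hypothetical motif set for illustration only; not LAP's actual definitions.
LIABILITY_MOTIFS = {
    "N-glycosylation": r"N[^P][ST]",
    "deamidation": r"N[GS]",
    "isomerization": r"D[GS]",
    "oxidation": r"[MW]",
}


def annotate_liabilities(sequence):
    """Return sorted (start, motif_name, matched_text) tuples, 0-based start."""
    hits = []
    for name, pattern in LIABILITY_MOTIFS.items():
        # A lookahead with a capture group reports overlapping occurrences too.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Toy CDR-like fragment, not a real antibody sequence.
    for start, name, text in annotate_liabilities("ARDGSNGMYW"):
        print(start, name, text)
```

Even this short fragment yields several hits, which is consistent with the abstract's point that plain motif scanning over-predicts and needs additional flags to prioritize.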


Subjects
Antibodies , High-Throughput Nucleotide Sequencing , Antibodies/therapeutic use , Probability
2.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35830864

ABSTRACT

Antibodies are versatile molecular binders with an established and growing role as therapeutics. Computational approaches to developing and designing these molecules are being increasingly used to complement traditional lab-based processes. Nowadays, in silico methods fill multiple elements of the discovery stage, such as characterizing antibody-antigen interactions and identifying developability liabilities. Recently, computational methods tackling such problems have begun to follow machine learning paradigms, in many cases deep learning specifically. This paradigm shift offers improvements in established areas such as structure or binding prediction and opens up new possibilities such as language-based modeling of antibody repertoires or machine-learning-based generation of novel sequences. In this review, we critically examine the recent developments in (deep) machine learning approaches to therapeutic antibody design with implications for fully computational antibody design.


Subjects
Deep Learning , Antibodies/therapeutic use , Feasibility Studies , Machine Learning
3.
Nucleic Acids Res ; 50(D1): D1273-D1281, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34747487

ABSTRACT

Nanobodies, a subclass of antibodies found in camelids, are versatile molecular binding scaffolds composed of a single polypeptide chain. The small size of nanobodies bestows multiple therapeutic advantages (stability, tumor penetration) with the first therapeutic approval in 2018 cementing the clinical viability of this format. Structured data and sequence information of nanobodies will enable the accelerated clinical development of nanobody-based therapeutics. Though the nanobody sequence and structure data are deposited in the public domain at an accelerating pace, the heterogeneity of sources and lack of standardization hamper reliable harvesting of nanobody information. We address this issue by creating the Integrated Database of Nanobodies for Immunoinformatics (INDI, http://naturalantibody.com/nanobodies). INDI collates nanobodies from all the major public outlets of biological sequences: patents, GenBank, next-generation sequencing repositories, structures and scientific publications. We equip INDI with powerful nanobody-specific sequence and text search facilitating access to >11 million nanobody sequences. INDI should facilitate development of novel nanobody-specific computational protocols helping to deliver on the therapeutic promise of this drug format.


Subjects
Camelidae/immunology , Databases, Genetic , Neoplasms/therapy , Single-Domain Antibodies/immunology , Amino Acid Sequence/genetics , Animals , Antibodies/classification , Antibodies/immunology , Camelidae/classification , Humans , Immunotherapy/classification , Neoplasms/immunology , Single-Domain Antibodies/classification
4.
Bioinformatics ; 38(3): 875-877, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34636883

ABSTRACT

MOTIVATION: Liquid-chromatography mass-spectrometry (LC-MS) is the established standard for analyzing the proteome in biological samples by identification and quantification of thousands of proteins. Machine learning (ML) promises to considerably improve the analysis of the resulting data; however, no tool yet mediates the path from raw data to modern ML applications. More specifically, ML applications are currently hampered by three major limitations: (i) absence of balanced training data with large sample size; (ii) unclear definition of sufficiently information-rich data representations for e.g. peptide identification; (iii) lack of benchmarking of ML methods on specific LC-MS problems. RESULTS: We created the MS2AI pipeline that automates the process of gathering vast quantities of MS data for large-scale ML applications. The software retrieves raw data from either in-house sources or from the proteomics identifications database, PRIDE. Subsequently, the raw data are stored in a standardized format amenable for ML, encompassing MS1/MS2 spectra and peptide identifications. This tool bridges the gap between MS and AI, and to this effect we also present an ML application in the form of a convolutional neural network for the identification of oxidized peptides. AVAILABILITY AND IMPLEMENTATION: An open-source implementation of the software can be found at https://gitlab.com/roettgerlab/ms2ai. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Peptides , Tandem Mass Spectrometry , Chromatography, Liquid/methods , Tandem Mass Spectrometry/methods , Peptides/analysis , Software , Proteome/chemistry
5.
Bioinformatics ; 38(9): 2628-2630, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35274671

ABSTRACT

MOTIVATION: Rational design of therapeutic antibodies can be improved by harnessing the natural sequence diversity of these molecules. Our understanding of the diversity of antibodies has recently been greatly facilitated through the deposition of hundreds of millions of human antibody sequences in next-generation sequencing (NGS) repositories. Contrasting a query therapeutic antibody sequence to naturally observed diversity in similar antibody sequences from NGS can provide a mutational roadmap for antibody engineers designing biotherapeutics. Because of the sheer scale of the antibody NGS datasets, performing queries across them is computationally challenging. RESULTS: To facilitate harnessing antibody NGS data, we developed AbDiver (http://naturalantibody.com/abdiver), a free portal allowing users to compare their query sequences to those observed in the natural repertoires. AbDiver offers three antibody-specific use-cases: (i) compare a query antibody to positional variability statistics precomputed from multiple independent studies, (ii) retrieve close full variable sequence matches to a query antibody and (iii) retrieve CDR3 or clonotype matches to a query antibody. We applied our system to a set of 742 therapeutic antibodies, demonstrating that for each use-case our system can retrieve relevant results for most sequences. AbDiver facilitates the navigation of vast antibody mutation space for the purpose of rational therapeutic antibody design. AVAILABILITY AND IMPLEMENTATION: AbDiver is freely accessible at http://naturalantibody.com/abdiver. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
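
The first use-case in this abstract, comparing a query against positional variability statistics, can be sketched as follows. This is a toy illustration under stated assumptions: the reference sequences are assumed to be pre-aligned to equal length, the function names and the 5% threshold are hypothetical, and nothing here reflects how AbDiver itself is implemented.

```python
from collections import Counter


def positional_frequencies(aligned_sequences):
    """Per-column residue frequencies for equal-length aligned sequences."""
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        profile.append({aa: n / len(column) for aa, n in counts.items()})
    return profile


def rare_positions(query, profile, threshold=0.05):
    """0-based positions where the query residue is below `threshold` frequency."""
    return [
        i for i, aa in enumerate(query)
        if profile[i].get(aa, 0.0) < threshold
    ]


# Toy reference set standing in for a natural repertoire (assumption).
repertoire = ["QVQLV", "QVQLQ", "EVQLV", "QVKLV"]
profile = positional_frequencies(repertoire)
print(rare_positions("QVRLV", profile))
```

Positions flagged this way mark residues in the query that are unusual relative to the reference set, which is the kind of signal an engineer might inspect when considering mutations.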


Subjects
Antibodies , High-Throughput Nucleotide Sequencing , Humans , Antibodies/therapeutic use , Antibodies/genetics , Software
6.
Brief Bioinform ; 21(5): 1549-1567, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31626279

ABSTRACT

Antibodies are proteins that recognize the molecular surfaces of potentially noxious molecules to mount an adaptive immune response or, in the case of autoimmune diseases, molecules that are part of healthy cells and tissues. Due to their binding versatility, antibodies are currently the largest class of biotherapeutics, with five monoclonal antibodies ranked in the top 10 blockbuster drugs. Computational advances in protein modelling and design can have a tangible impact on antibody-based therapeutic development. Antibody-specific computational protocols currently benefit from an increasing volume of data provided by next generation sequencing and application to related drug modalities based on traditional antibodies, such as nanobodies. Here we present a structured overview of available databases, methods and emerging trends in computational antibody analysis and contextualize them towards the engineering of candidate antibody therapeutics.


Subjects
Antibodies, Monoclonal/chemistry , Antibodies, Monoclonal/immunology , Antibodies, Monoclonal/therapeutic use , Computational Biology/methods , Databases, Protein , Molecular Docking Simulation , Protein Conformation
7.
Proc Natl Acad Sci U S A ; 116(10): 4025-4030, 2019 03 05.
Article in English | MEDLINE | ID: mdl-30765520

ABSTRACT

Therapeutic mAbs must not only bind to their target but must also be free from "developability issues" such as poor stability or high levels of aggregation. While small-molecule drug discovery benefits from Lipinski's rule of five to guide the selection of molecules with appropriate biophysical properties, there is currently no in silico analog for antibody design. Here, we model the variable domain structures of a large set of post-phase-I clinical-stage antibody therapeutics (CSTs) and calculate in silico metrics to estimate their typical properties. In each case, we contextualize the CST distribution against a snapshot of the human antibody gene repertoire. We describe guideline values for five metrics thought to be implicated in poor developability: the total length of the complementarity-determining regions (CDRs), the extent and magnitude of surface hydrophobicity, positive charge and negative charge in the CDRs, and asymmetry in the net heavy- and light-chain surface charges. The guideline cutoffs for each property were derived from the values seen in CSTs, and a flagging system is proposed to identify nonconforming candidates. On two mAb drug discovery sets, we were able to selectively highlight sequences with developability issues. We make available the Therapeutic Antibody Profiler (TAP), a computational tool that builds downloadable homology models of variable domain sequences, tests them against our five developability guidelines, and reports potential sequence liabilities and canonical forms. TAP is freely available at opig.stats.ox.ac.uk/webapps/sabdab-sabpred/TAP.php.


Subjects
Complementarity Determining Regions , Computer Simulation , Models, Molecular , Antibodies, Monoclonal/chemistry , Antibodies, Monoclonal/genetics , Complementarity Determining Regions/chemistry , Complementarity Determining Regions/genetics , Drug Discovery , Humans
8.
Bioinformatics ; 36(6): 1750-1756, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31693112

ABSTRACT

MOTIVATION: Over the last few years, the field of protein structure prediction has been transformed by increasingly accurate contact prediction software. These methods are based on the detection of coevolutionary relationships between residues from multiple sequence alignments (MSAs). However, despite speculation, there is little evidence of a link between contact prediction and the physico-chemical interactions which drive amino-acid coevolution. Furthermore, existing protocols predict only a fraction of all protein contacts and it is not clear why some contacts are favoured over others. Using a dataset of 863 protein domains, we assessed the physico-chemical interactions of contacts predicted by CCMpred, MetaPSICOV and DNCON2, as examples of direct coupling analysis, meta-prediction and deep learning. RESULTS: We considered correctly predicted contacts and compared their properties against the protein contacts that were not predicted. Predicted contacts tend to form more bonds than non-predicted contacts, which suggests these contacts may be more important than contacts that were not predicted. Comparing the contacts predicted by each method, we found that MetaPSICOV and DNCON2 favour accuracy, whereas CCMpred detects contacts with more bonds. This suggests that the push for higher accuracy may lead to a loss of physico-chemically important contacts. These results underscore the connection between protein physico-chemistry and the coevolutionary couplings that can be derived from MSAs. This relationship is likely to be relevant to protein structure prediction and functional analysis of protein structure and may be key to understanding their utility for different problems in structural biology. AVAILABILITY AND IMPLEMENTATION: We use publicly available databases. Our code is available for download at https://opig.stats.ox.ac.uk/. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.


Subjects
Computational Biology , Sequence Analysis, Protein , Algorithms , Protein Conformation , Proteins/genetics , Sequence Alignment , Software
9.
J Med Internet Res ; 23(6): e28253, 2021 06 02.
Article in English | MEDLINE | ID: mdl-33900934

ABSTRACT

BACKGROUND: Before the advent of an effective vaccine, nonpharmaceutical interventions, such as mask-wearing, social distancing, and lockdowns, have been the primary measures to combat the COVID-19 pandemic. Such measures are highly effective when there is high population-wide adherence, which requires information on current risks posed by the pandemic alongside a clear exposition of the rules and guidelines in place. OBJECTIVE: Here we analyzed online news media coverage of COVID-19. We quantified the total volume of COVID-19 articles, their sentiment polarization, and leading subtopics to act as a reference to inform future communication strategies. METHODS: We collected 26 million news articles from the front pages of 172 major online news sources in 11 countries (available online at SciRide). Using topic detection, we identified COVID-19-related content to quantify the proportion of total coverage the pandemic received in 2020. The sentiment analysis tool VADER was employed to stratify the emotional polarity of COVID-19 reporting. Further topic detection and sentiment analysis were performed on COVID-19 coverage to reveal the leading themes in pandemic reporting and their respective emotional polarizations. RESULTS: We found that COVID-19 coverage accounted for approximately 25.3% of all front-page online news articles between January and October 2020. Sentiment analysis of English-language sources revealed that overall COVID-19 coverage was not exclusively negatively polarized, suggesting wide heterogeneous reporting of the pandemic. Within this heterogeneous coverage, 16% of COVID-19 news articles (or 4% of all English-language articles) can be classified as highly negatively polarized, citing issues such as death, fear, or crisis. CONCLUSIONS: The goal of COVID-19 public health communication is to increase understanding of distancing rules and to maximize the impact of governmental policy. The extent to which the quantity and quality of information from different communication channels (eg, social media, government pages, and news) influence public understanding of public health measures remains to be established. Here we conclude that a quarter of all reporting in 2020 covered COVID-19, which is indicative of information overload. In this capacity, our data and analysis form a quantitative basis for informing health communication strategies along traditional news media channels to minimize the risks of COVID-19 while vaccination is rolled out.


Subjects
COVID-19/epidemiology , Data Mining/methods , Mass Media/statistics & numerical data , Public Health/methods , Social Media/statistics & numerical data , Health Resources , Humans , Pandemics , SARS-CoV-2/isolation & purification
11.
J Immunol ; 201(12): 3694-3704, 2018 12 15.
Article in English | MEDLINE | ID: mdl-30397033

ABSTRACT

Next-generation sequencing of the Ig gene repertoire (Ig-seq) produces large volumes of information at the nucleotide sequence level. Such data have improved our understanding of immune systems across numerous species and have already been successfully applied in vaccine development and drug discovery. However, the high-throughput nature of Ig-seq means that it is afflicted by high error rates. This has led to the development of error-correction approaches. Computational error-correction methods use sequence information alone, primarily designating sequences as likely to be correct if they are observed frequently. In this work, we describe an orthogonal method for filtering Ig-seq data, which considers the structural viability of each sequence. A typical natural Ab structure requires the presence of a disulfide bridge within each of its variable chains to maintain the fold. Our Ab Sequence Selector (ABOSS) uses the presence/absence of this bridge as a way of both identifying structurally viable sequences and estimating the sequencing error rate. On simulated Ig-seq datasets, ABOSS is able to identify more than 99% of structurally viable sequences. Applying our method to six independent Ig-seq datasets (one mouse and five human), we show that our error calculations are in line with previous experimental and computational error estimates. We also show how ABOSS is able to identify structurally impossible sequences missed by other error-correction methods.
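
The structural filter described in this abstract can be sketched as a check for the two conserved cysteines that form the intra-domain disulfide bridge. The sketch assumes sequences have already been numbered in the IMGT scheme (where those cysteines sit at positions 23 and 104) and are supplied as position-to-residue dictionaries; the function names are hypothetical, and the failing fraction computed here is only a crude proxy, not the error estimator used by ABOSS.

```python
# IMGT positions of the two conserved cysteines in an antibody variable domain.
CONSERVED_CYS_POSITIONS = (23, 104)


def has_disulfide_cysteines(numbered_chain):
    """True if both conserved positions are present and occupied by cysteine."""
    return all(numbered_chain.get(pos) == "C" for pos in CONSERVED_CYS_POSITIONS)


def nonviable_fraction(numbered_chains):
    """Fraction of chains lacking at least one conserved cysteine."""
    if not numbered_chains:
        return 0.0
    failed = sum(not has_disulfide_cysteines(chain) for chain in numbered_chains)
    return failed / len(numbered_chains)


# Toy input: only the relevant positions are shown (assumption for brevity).
chains = [
    {23: "C", 104: "C"},
    {23: "C", 104: "Y"},  # cysteine replaced, e.g. by a sequencing error
    {23: "C"},            # position 104 missing after numbering
    {23: "C", 104: "C"},
]
print(nonviable_fraction(chains))
```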


Subjects
High-Throughput Nucleotide Sequencing/methods , Immunoglobulins/genetics , Software , Vaccines/immunology , Algorithms , Animals , Computational Biology , Databases as Topic , Drug Development , Humans , Mice , Protein Conformation , Quality Control , Scientific Experimental Error , Structure-Activity Relationship
12.
J Immunol ; 201(8): 2502-2509, 2018 10 15.
Article in English | MEDLINE | ID: mdl-30217829

ABSTRACT

Abs are immune system proteins that recognize noxious molecules for elimination. Their sequence diversity and binding versatility have made Abs the primary class of biopharmaceuticals. Recently, it has become possible to query their immense natural diversity using next-generation sequencing of Ig gene repertoires (Ig-seq). However, Ig-seq outputs are currently fragmented across repositories and tend to be presented as raw nucleotide reads, which means nontrivial effort is required to reuse the data for analysis. To address this issue, we have collected Ig-seq outputs from 55 studies, covering more than half a billion Ab sequences across diverse immune states, organisms (primarily human and mouse), and individuals. We have sorted, cleaned, annotated, translated, and numbered these sequences and make the data available via our Observed Antibody Space (OAS) resource at http://antibodymap.org. The data within OAS will be regularly updated with newly released Ig-seq datasets. We believe OAS will facilitate data mining of immune repertoires for improved understanding of the immune system and development of better biotherapeutics.


Subjects
Antibodies/genetics , Data Mining/methods , Immunoglobulins/genetics , Immunotherapy/methods , Animals , Antibody Diversity , Databases, Genetic , High-Throughput Nucleotide Sequencing , Humans , Immunity, Humoral/genetics , Mice , Molecular Sequence Annotation
13.
Nucleic Acids Res ; 46(D1): D406-D412, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29087479

ABSTRACT

The Structural T-cell Receptor Database (STCRDab; http://opig.stats.ox.ac.uk/webapps/stcrdab) is an online resource that automatically collects and curates TCR structural data from the Protein Data Bank. For each entry, the database provides annotations, such as the α/β or γ/δ chain pairings, major histocompatibility complex details, and where available, antigen binding affinities. In addition, the orientation between the variable domains and the canonical forms of the complementarity-determining region loops are also provided. Users can select, view, and download individual or bulk sets of structures based on these criteria. Where available, STCRDab also finds antibody structures that are similar to TCRs, helping users explore the relationship between TCRs and antibodies.


Subjects
Antigens/chemistry , Complementarity Determining Regions/chemistry , Databases, Protein , Receptors, Antigen, T-Cell/chemistry , Software , Amino Acid Sequence , Antigens/immunology , Antigens/metabolism , Binding Sites , Complementarity Determining Regions/metabolism , Humans , Internet , Major Histocompatibility Complex/genetics , Major Histocompatibility Complex/immunology , Models, Molecular , Molecular Sequence Annotation , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Receptors, Antigen, T-Cell/immunology , Receptors, Antigen, T-Cell/metabolism , Sequence Alignment , Sequence Homology, Amino Acid , T-Lymphocytes/cytology , T-Lymphocytes/immunology
14.
Bioinformatics ; 34(1): 41-48, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29028987

ABSTRACT

Availability and implementation: The code together with examples and tutorials are available from http://www.cs.ox.ac.uk/mosaics. Contact: peter.minary@cs.ox.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computer Simulation , DNA Methylation , Epigenomics/methods , Models, Molecular , Nucleic Acid Conformation , Software , Computational Biology/methods , DNA/chemistry , DNA/metabolism
15.
Brief Bioinform ; 17(1): 117-31, 2016 Jan.
Article in English | MEDLINE | ID: mdl-25971595

ABSTRACT

The majority of biological processes are mediated via protein-protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming and thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field.


Subjects
Protein Interaction Domains and Motifs , Amino Acid Sequence , Antigen-Antibody Complex/chemistry , Binding Sites , Computational Biology/methods , Computational Biology/trends , Databases, Protein/statistics & numerical data , Epitopes/chemistry , Humans , Imaging, Three-Dimensional , Machine Learning , Models, Molecular , Protein Binding , Protein Conformation , Protein Interaction Domains and Motifs/genetics , Protein Interaction Mapping/methods , Protein Interaction Mapping/statistics & numerical data , Proteins/chemistry , Proteins/genetics , Proteins/metabolism
16.
Nucleic Acids Res ; 44(W1): W474-8, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27131379

ABSTRACT

SAbPred is a server that makes predictions of the properties of antibodies focusing on their structures. Antibody informatics tools can help improve our understanding of immune responses to disease and aid in the design and engineering of therapeutic molecules. SAbPred is a single platform containing multiple applications which can: number and align sequences; automatically generate antibody variable fragment homology models; annotate such models with estimated accuracy alongside sequence and structural properties including potential developability issues; predict paratope residues; and predict epitope patches on protein antigens. The server is available at http://opig.stats.ox.ac.uk/webapps/sabpred.


Subjects
Antibodies/chemistry , Antibodies/immunology , Internet , Software , Algorithms , Antigens/chemistry , Antigens/immunology , Binding Sites, Antibody/immunology , Epitopes/chemistry , Epitopes/immunology , Immunoglobulin Variable Region/chemistry , Immunoglobulin Variable Region/immunology , Models, Molecular , Molecular Sequence Annotation , User-Computer Interface
17.
J Chem Inf Model ; 56(9): 1746-54, 2016 09 26.
Article in English | MEDLINE | ID: mdl-27500460

ABSTRACT

HIV-1 replication requires binding to occur between Trans-activation Response Element (TAR) RNA and the TAT protein. This TAR-TAT binding depends on the conformation of TAR, and therapeutic development has attempted to exploit this dynamic behavior. Here we simulate TAR dynamics in the context of mutations inhibiting TAR binding. We find that two tertiary elements, the apical loop and the bulge, can interact directly, and this interaction may be linked to the affinity of TAR for TAT.


Subjects
HIV-1/genetics , HIV-1/metabolism , RNA, Viral/genetics , RNA, Viral/metabolism , Response Elements/genetics , tat Gene Products, Human Immunodeficiency Virus/metabolism , HIV-1/physiology , Models, Molecular , Mutation , Nucleic Acid Conformation , Protein Binding , RNA, Viral/chemistry , Virus Replication
18.
Nucleic Acids Res ; 42(Database issue): D1140-6, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24214988

ABSTRACT

Structural antibody database (SAbDab; http://opig.stats.ox.ac.uk/webapps/sabdab) is an online resource containing all the publicly available antibody structures annotated and presented in a consistent fashion. The data are annotated with several properties including experimental information, gene details, correct heavy and light chain pairings, antigen details and, where available, antibody-antigen binding affinity. The user can select structures, according to these attributes as well as structural properties such as complementarity determining region loop conformation and variable domain orientation. Individual structures, datasets and the complete database can be downloaded.


Subjects
Antibodies/chemistry , Databases, Protein , Antibodies/genetics , Antibody Affinity , Binding Sites, Antibody , Complementarity Determining Regions , Internet , Protein Conformation , Terminology as Topic
19.
Bioinformatics ; 30(16): 2288-94, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-24753488

ABSTRACT

MOTIVATION: Antibodies are currently the most important class of biopharmaceuticals. Development of such antibody-based drugs depends on costly and time-consuming screening campaigns. Computational techniques such as antibody-antigen docking hold the potential to facilitate the screening process by rapidly providing a list of initial poses that approximate the native complex. RESULTS: We have developed a new method, EpiPred, to identify the epitope region on the antigen given the structures of the antibody and the antigen. The method combines conformational matching of the antibody-antigen structures and a specific antibody-antigen score. We have tested the method on both a large non-redundant set of antibody-antigen complexes and on homology models of the antibodies and/or the unbound antigen structure. On a non-redundant test set, our epitope prediction method achieves 44% recall at 14% precision against 23% recall at 14% precision for a background random distribution. We use our epitope predictions to rescore the global docking results of two rigid-body docking algorithms: ZDOCK and ClusPro. In both cases, including our epitope prediction increases the number of near-native poses found among the top decoys. AVAILABILITY AND IMPLEMENTATION: Our software is available from http://www.stats.ox.ac.uk/research/proteins/resources.
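
The recall and precision figures quoted in this abstract are residue-level metrics. A minimal sketch of how such values are computed for one complex is shown below; the residue indices are invented for illustration and the function name is hypothetical.

```python
def precision_recall(predicted, actual):
    """Residue-level precision and recall for a predicted epitope."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented antigen residue indices, for illustration only.
predicted_epitope = {10, 11, 12, 40, 41, 42, 43, 44}
true_epitope = {11, 12, 13, 14}
print(precision_recall(predicted_epitope, true_epitope))
```

Low precision alongside moderate recall, as reported in the abstract, corresponds to a predicted patch that covers a good share of the true epitope while also including many non-epitope residues.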


Subjects
Antigen-Antibody Complex/chemistry , Epitopes, B-Lymphocyte/chemistry , Molecular Docking Simulation/methods , Algorithms , Humans , Software
20.
Bioinform Adv ; 4(1): vbae033, 2024.
Article in English | MEDLINE | ID: mdl-38560554

ABSTRACT

Motivation: Nanobodies are a subclass of immunoglobulins whose binding site consists of only one peptide chain, bestowing favorable biophysical properties. Recently, the first nanobody therapy was approved, paving the way for further clinical applications of this antibody format. Further development of nanobody-based therapeutics could be streamlined by computational methods. One such method is infilling: positional prediction of biologically feasible mutations in nanobodies. Being able to identify possible positional substitutions based on sequence context facilitates functional design of such molecules. Results: Here we present nanoBERT, a nanobody-specific transformer to predict amino acids in a given position in a query sequence. We demonstrate the need to develop such a machine-learning-based protocol as opposed to gene-specific positional statistics, since an appropriate genetic reference is not available. We benchmark nanoBERT with respect to human-based language models and ESM-2, demonstrating the benefit of domain-specific language models. We also demonstrate the benefit of employing nanobody-specific predictions for fine-tuning on an experimentally measured thermostability dataset. We hope that nanoBERT will help engineers in a range of predictive tasks for designing therapeutic nanobodies. Availability and implementation: https://huggingface.co/NaturalAntibody/.
