Search | Portal Regional da BVS

DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations.

Høie, Magnus Haraldson; Gade, Frederik Steensgaard; Johansen, Julie Maria; Würtzen, Charlotte; Winther, Ole; Nielsen, Morten; Marcatili, Paolo.

Front Immunol ; 15: 1322712, 2024.

Article in English | MEDLINE | ID: mdl-38390326

ABSTRACT

Accurate computational identification of B-cell epitopes is crucial for the development of vaccines, therapies, and diagnostic tools. However, current structure-based prediction methods face limitations due to the dependency on experimentally solved structures. Here, we introduce DiscoTope-3.0, a markedly improved B-cell epitope prediction tool that innovatively employs inverse folding structure representations and a positive-unlabelled learning strategy, and is adapted for both solved and predicted structures. Our tool demonstrates a considerable improvement in performance over existing methods, accurately predicting linear and conformational epitopes across multiple independent datasets. Most notably, DiscoTope-3.0 maintains high predictive performance across solved, relaxed and predicted structures, alleviating the need for experimental structures and extending the general applicability of accurate B-cell epitope prediction by 3 orders of magnitude. DiscoTope-3.0 is made widely accessible on two web servers, processing over 100 structures per submission, and as a downloadable package. In addition, the servers interface with RCSB and AlphaFoldDB, facilitating large-scale prediction across over 200 million cataloged proteins. DiscoTope-3.0 is available at: https://services.healthtech.dtu.dk/service.php?DiscoTope-3.0.

Subjects

B-Lymphocyte Epitopes, Molecular Conformation

Comparative Structure Analysis of the Multi-Domain, Cell Envelope Proteases of Lactic Acid Bacteria.

Christensen, Lise Friis; Høie, Magnus Haraldson; Bang-Berthelsen, Claus Heiner; Marcatili, Paolo; Hansen, Egon Bech.

Microorganisms ; 11(9), 2023 Sep 08.

Article in English | MEDLINE | ID: mdl-37764099

ABSTRACT

Lactic acid bacteria (LAB) have an extracellular proteolytic system that includes a multi-domain, cell envelope protease (CEP) with a subtilisin homologous protease domain. These CEPs have different proteolytic activities despite having similar protein sequences. Structural characterization has previously been limited to CEP homologs of dairy- and human-derived LAB strains, excluding CEPs of plant-derived LAB strains. CEP structures are a challenge to determine experimentally due to their large size and attachment to the cell envelope. This study aims to clarify the prevalence and structural diversity of CEPs by using the structure prediction software AlphaFold 2. Domain boundaries are clarified based on a comparative analysis of 21 three-dimensional structures, revealing novel domain architectures of CEP homologs that are not necessarily restricted to specific LAB species or ecological niches. The C-terminal flanking region of the protease domain is divided into fibronectin type-III-like domains with various structural traits. The analysis also emphasizes the existence of two distinct domains for cell envelope attachment that are preceded by an intrinsically disordered cell wall spanning domain. The domain variants and their combinations provide CEPs with different stability, proteolytic activity, and potentially adhesive properties, making CEPs targets for steering proteolytic activity with relevance for both food development and human health.

Widespread amyloidogenicity potential of multiple myeloma patient-derived immunoglobulin light chains.

Sternke-Hoffmann, Rebecca; Pauly, Thomas; Norrild, Rasmus K; Hansen, Jan; Tucholski, Florian; Høie, Magnus Haraldson; Marcatili, Paolo; Dupré, Mathieu; Duchateau, Magalie; Rey, Martial; Malosse, Christian; Metzger, Sabine; Boquoi, Amelie; Platten, Florian; Egelhaaf, Stefan U; Chamot-Rooke, Julia; Fenk, Roland; Nagel-Steger, Luitgard; Haas, Rainer; Buell, Alexander K.

BMC Biol ; 21(1): 21, 2023 02 03.

Article in English | MEDLINE | ID: mdl-36737754

ABSTRACT

BACKGROUND: In a range of human disorders such as multiple myeloma (MM), immunoglobulin light chains (IgLCs) can be produced at very high concentrations. This can lead to pathological aggregation and deposition of IgLCs in different tissues, which in turn leads to severe and potentially fatal organ damage. However, IgLCs can also be highly soluble and non-toxic. It is generally thought that the cause for this differential solubility behaviour is solely found within the IgLC amino acid sequences, and a variety of individual sequence-related biophysical properties (e.g. thermal stability, dimerisation) have been proposed in different studies as major determinants of the aggregation in vivo. Here, we investigate biophysical properties underlying IgLC amyloidogenicity. RESULTS: We introduce a novel and systematic workflow, Thermodynamic and Aggregation Fingerprinting (ThAgg-Fip), for detailed biophysical characterisation, and apply it to nine different MM patient-derived IgLCs. Our set of pathogenic IgLCs spans the entire range of values in those parameters previously proposed to define in vivo amyloidogenicity; however, none actually forms amyloid in patients. Even more surprisingly, we were able to show that all our IgLCs are able to form amyloid fibrils readily in vitro under the influence of proteolytic cleavage by co-purified cathepsins. CONCLUSIONS: We show that (I) in vivo aggregation behaviour is unlikely to be mechanistically linked to any single biophysical or biochemical parameter and (II) amyloidogenic potential is widespread in IgLC sequences and is not confined to those sequences that form amyloid fibrils in patients. Our findings suggest that protein sequence, environmental conditions and presence and action of proteases all determine the ability of light chains to form amyloid fibrils in patients.

Subjects

Immunoglobulin Light Chains, Multiple Myeloma, Humans, Immunoglobulin Light Chains/chemistry, Immunoglobulin Light Chains/metabolism, Amyloid/metabolism, Amino Acid Sequence, Proteolysis

BepiPred-3.0: Improved B-cell epitope prediction using protein language models.

Clifford, Joakim Nøddeskov; Høie, Magnus Haraldson; Deleuran, Sebastian; Peters, Bjoern; Nielsen, Morten; Marcatili, Paolo.

Protein Sci ; 31(12): e4497, 2022 12.

Article in English | MEDLINE | ID: mdl-36366745

ABSTRACT

B-cell epitope prediction tools are of great medical and commercial interest due to their practical applications in vaccine development and disease diagnostics. The introduction of protein language models (LMs), trained on unprecedentedly large datasets of protein sequences and structures, taps into a powerful numeric representation that can be exploited to accurately predict local and global protein structural features from amino acid sequences only. In this paper, we present BepiPred-3.0, a sequence-based epitope prediction tool that, by exploiting LM embeddings, greatly improves the prediction accuracy for both linear and conformational epitope prediction on several independent test sets. Furthermore, by carefully selecting additional input variables and an epitope residue annotation strategy, performance was further improved, thus achieving unprecedented predictive power. Our tool can predict epitopes across hundreds of sequences in minutes. It is freely available as a web server and a standalone package at https://services.healthtech.dtu.dk/service.php?BepiPred-3.0 with a user-friendly interface to navigate the results.

Subjects

B-Lymphocyte Epitopes, Language, B-Lymphocyte Epitopes/chemistry, Amino Acid Sequence, Epitope Mapping/methods, Computational Biology/methods

NetSurfP-3.0: accurate and fast prediction of protein structural features by protein language models and deep learning.

Høie, Magnus Haraldson; Kiehl, Erik Nicolas; Petersen, Bent; Nielsen, Morten; Winther, Ole; Nielsen, Henrik; Hallgren, Jeppe; Marcatili, Paolo.

Nucleic Acids Res ; 50(W1): W510-W515, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35648435

ABSTRACT

Recent advances in machine learning and natural language processing have made it possible to profoundly advance our ability to accurately predict protein structures and their functions. While such improvements are significantly impacting the fields of biology and biotechnology at large, such methods have the downside of high demands in terms of computing power and runtime, hampering their applicability to large datasets. Here, we present NetSurfP-3.0, a tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence. This NetSurfP update exploits recent advances in pre-trained protein language models to drastically improve the runtime of its predecessor by two orders of magnitude, while displaying similar prediction performance. We assessed the accuracy of NetSurfP-3.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features, with a runtime that is up to 600 times faster than the most commonly available methods performing the same tasks. The tool is freely available as a web server with a user-friendly interface to navigate the results, as well as a standalone downloadable package.

Subjects

Deep Learning, Natural Language Processing, Protein Secondary Structure, Proteins, Amino Acid Sequence, Proteins/chemistry, Proteins/metabolism, Datasets as Topic, Solvents/chemistry, Time Factors, Internet, Computers, Software

Predicting and interpreting large-scale mutagenesis data using analyses of protein stability and conservation.

Høie, Magnus Haraldson; Cagiada, Matteo; Beck Frederiksen, Anders Haagen; Stein, Amelie; Lindorff-Larsen, Kresten.

Cell Rep ; 38(2): 110207, 2022 01 11.

Article in English | MEDLINE | ID: mdl-35021073

ABSTRACT

Understanding and predicting the functional consequences of single amino acid changes is central in many areas of protein science. Here, we collect and analyze experimental measurements of effects of >150,000 variants in 29 proteins. We use biophysical calculations to predict changes in stability for each variant and assess them in light of sequence conservation. We find that the sequence analyses give more accurate prediction of variant effects than predictions of stability and that about half of the variants that show loss of function do so due to stability effects. We construct a machine learning model to predict variant effects from protein structure and sequence alignments and show how the two sources of information support one another and enable mechanistic interpretations. Together, our results show how one can leverage large-scale experimental assessments of variant effects to gain deeper and general insights into the mechanisms that cause loss of function.

Subjects

Forecasting/methods, Protein Stability, DNA Sequence Analysis/methods, Amino Acid Substitution, Animals, Computational Biology/methods, Humans, Machine Learning, Mutation/genetics, Mutation/physiology, Proteins/metabolism, Sequence Alignment/methods
