Enhancing predictions of protein stability changes induced by single mutations using MSA-based Language Models.
Cuturello, Francesca; Celoria, Marco; Ansuini, Alessio; Cazzaniga, Alberto.
Affiliation
  • Cuturello F; AREA Science Park, Trieste, 34149, Italy.
  • Celoria M; AREA Science Park, Trieste, 34149, Italy.
  • Ansuini A; CINECA National Supercomputing Center, Bologna, 40033, Italy.
  • Cazzaniga A; AREA Science Park, Trieste, 34149, Italy.
Bioinformatics ; 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39012369
ABSTRACT
MOTIVATION:

Protein Language Models offer a new perspective for addressing challenges in structural biology, while relying solely on sequence information. Recent studies have investigated their effectiveness in forecasting shifts in thermodynamic stability caused by single amino acid mutations, a task known for its complexity due to the sparse availability of data, constrained by experimental limitations. To tackle this problem, we introduce two key novelties: leveraging a Protein Language Model that incorporates Multiple Sequence Alignments to capture evolutionary information, and using a recently released mega-scale dataset with rigorous data pre-processing to mitigate overfitting.

RESULTS:

We ensure comprehensive comparisons by fine-tuning various pre-trained models, taking advantage of analyses such as ablation studies and baselines evaluation. Our methodology introduces a stringent policy to reduce the widespread issue of data leakage, rigorously removing sequences from the training set when they exhibit significant similarity with the test set. The MSA Transformer emerges as the most accurate among the models under investigation, given its capability to leverage co-evolution signals encoded in aligned homologous sequences. Moreover, the optimized MSA Transformer outperforms existing methods and exhibits enhanced generalization power, leading to a notable improvement in predicting changes in protein stability resulting from point mutations.

AVAILABILITY AND IMPLEMENTATION:

Code and data at https://github.com/RitAreaSciencePark/PLM4Muts.

SUPPLEMENTARY INFORMATION:

Supplementary Information is available at Bioinformatics online.
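
The leakage-reduction policy mentioned in the Results (dropping training sequences that are too similar to any test sequence) can be illustrated with a minimal sketch. The abstract does not state which similarity measure or cutoff the authors use, so everything here is an assumption for illustration: `similarity` uses Python's `difflib.SequenceMatcher` ratio as a crude stand-in for alignment-based sequence identity, and `filter_training_set`, the 0.5 threshold, and the toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1].

    Assumption: a stand-in for alignment-based sequence identity,
    not the measure used in the paper.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_training_set(train, test, threshold=0.5):
    """Keep training sequences whose similarity to every test
    sequence is strictly below the (hypothetical) threshold."""
    return [
        seq for seq in train
        if all(similarity(seq, t) < threshold for t in test)
    ]


# Toy sequences, invented for illustration only.
train = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSGGSGGSG"]
test = ["MKTAYIAKQR"]

kept = filter_training_set(train, test, threshold=0.5)
print(kept)
```

In this toy case the exact duplicate and the single-substitution variant of the test sequence are both removed, while the unrelated sequence is kept. A real pipeline would typically replace `similarity` with a dedicated sequence search or clustering tool.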

Full text: 1 | Collection: 01-international | Database: MEDLINE | Language: En | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2024 | Document type: Article | Affiliation country: Italy