E-SNPs&GO: embedding of protein sequence and function improves the annotation of human pathogenic variants.
Manfredi, Matteo; Savojardo, Castrense; Martelli, Pier Luigi; Casadio, Rita.
Affiliation
  • Manfredi M; Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy.
  • Savojardo C; Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy.
  • Martelli PL; Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy.
  • Casadio R; Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy.
Bioinformatics; 38(23): 5168-5174, 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36227117
MOTIVATION: Massive DNA sequencing technologies are producing a huge number of human single-nucleotide polymorphisms that occur in protein-coding regions and may change the encoded protein sequences. Discriminating harmful protein variations from neutral ones is one of the crucial challenges in precision medicine. Computational tools based on artificial intelligence provide models for protein sequence encoding, bypassing database searches for evolutionary information. We leverage these new encoding schemes for an efficient annotation of protein variants.
RESULTS: E-SNPs&GO is a novel method that, given an input protein sequence and a single amino acid variation, predicts whether the variation is disease-related or not. The method adopts an input encoding based entirely on protein language models and embedding techniques, specifically devised to encode protein sequences and GO functional annotations. We trained our model on a newly generated dataset of 101 146 human protein single amino acid variants in 13 661 proteins, derived from public resources. When tested on a blind set comprising 10 266 variants, our method compares well with recent approaches in the literature for the same task, reaching a Matthews Correlation Coefficient score of 0.72. We propose E-SNPs&GO as a suitable, efficient and accurate large-scale annotator of protein variant datasets.
AVAILABILITY AND IMPLEMENTATION: The method is available as a webserver at https://esnpsandgo.biocomp.unibo.it. Datasets and predictions are available at https://esnpsandgo.biocomp.unibo.it/datasets.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
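For readers unfamiliar with the Matthews Correlation Coefficient reported in the abstract, the following is a minimal sketch of how the score is computed from a binary confusion matrix (pathogenic versus neutral). The function name `mcc_from_counts` and the counts used in the example are hypothetical and chosen only for illustration; they are not taken from the paper or its datasets.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; by the usual convention, returns 0.0 when
    any marginal total is zero and the ratio is undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for illustration only (NOT results from the paper).
example_mcc = mcc_from_counts(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {example_mcc:.2f}")
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, which makes it a common choice when the pathogenic and neutral classes are imbalanced.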
Subject(s)

Full text: 1 | Databases: MEDLINE | Main subject: Artificial Intelligence / Polymorphism, Single Nucleotide | Type of study: Clinical_trials / Prognostic_studies | Limit: Humans | Language: En | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2022 | Document type: Article | Country of affiliation: Italy
