TransPPMP: predicting pathogenicity of frameshift and non-sense mutations by a Transformer based on protein features.
Nie, Liangpeng; Quan, Lijun; Wu, Tingfang; He, Ruji; Lyu, Qiang.
Affiliation
  • Nie L; School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
  • Quan L; School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
  • Wu T; Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China.
  • He R; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China.
  • Lyu Q; School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
Bioinformatics; 38(10): 2705-2711, 2022 May 13.
Article in English | MEDLINE | ID: mdl-35561183
ABSTRACT
MOTIVATION:

Frameshift and non-sense mutations at specific positions in a protein sequence can severely disrupt protein structure, yet such mutations are also found in healthy individuals. A method that distinguishes neutral from potentially disease-associated frameshift and non-sense mutations is therefore of practical and fundamental importance. It would allow researchers to rapidly screen for potentially pathogenic sites among a large number of mutated genes and then use these sites as drug targets to speed up diagnosis and improve access to treatment. Nevertheless, the problem of distinguishing neutral from potentially disease-associated frameshift and non-sense mutations remains under-researched.

RESULTS:

We built a Transformer-based neural network model, named TransPPMP, to predict the pathogenicity of frameshift and non-sense mutations from protein features. The input combines the feature matrix of contextual sequences computed by the ESM pre-trained model, the type of the mutated residue, and auxiliary features that include structure and function information. A focal loss function is used to address the sample imbalance problem during training. In 10-fold cross-validation and on an independent blind test set, TransPPMP performed robustly and outperformed four other advanced methods (ENTPRISE-X, VEST-indel, DDIG-in and CADD) on all evaluation metrics. In addition, we demonstrate the usefulness of the multi-head attention mechanism in the Transformer for predicting the pathogenicity of mutations: not only can multiple self-attention heads learn local and global interactions, but attention focus can also capture functional sites with a large influence on the mutated residue. These findings could offer useful clues for studying the pathogenicity mechanisms of complex human diseases, for which traditional machine learning methods fall short.

AVAILABILITY AND IMPLEMENTATION:

TransPPMP is available at https://github.com/lennylv/TransPPMP.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
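
As a rough illustration of the focal loss mentioned under RESULTS, the sketch below shows how such a loss down-weights easy examples so that a minority class contributes more to training. It follows the commonly used binary focal loss form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The abstract does not state the exact formulation or hyperparameters used in TransPPMP, so the function names and the values gamma = 2.0 and alpha = 0.25 are assumptions chosen for illustration only.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for a single example (illustrative, not the authors' code).

    p     : predicted probability of the positive (pathogenic) class
    y     : true label, 1 for positive and 0 for negative
    gamma : focusing parameter (assumed value); gamma = 0 removes the focusing term
    alpha : weight given to the positive class (assumed value)
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0 - eps)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of probabilities and labels."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    easy = binary_focal_loss(0.9, 1)  # confident and correct
    hard = binary_focal_loss(0.1, 1)  # confident and wrong
    print(f"easy example: {easy:.6f}, hard example: {hard:.6f}")
```

With gamma set to 0 and alpha set to 0.5, the expression reduces to half of the ordinary binary cross-entropy, which is a convenient sanity check when adapting the sketch.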
Subjects

Full text: 1 Database: MEDLINE Main subject: Software / Frameshift Mutation Study type: Prognostic_studies / Risk_factors_studies Limit: Humans Language: En Publication year: 2022 Document type: Article
