Exploiting protein language models for the precise classification of ion channels and ion transporters.
Ghazikhani, Hamed; Butler, Gregory.
Affiliation
  • Ghazikhani H; Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada.
  • Butler G; Centre for Structural and Functional Genomics, Concordia University, Montréal, Québec, Canada.
Proteins ; 92(8): 998-1055, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38656743
ABSTRACT
This study introduces TooT-PLM-ionCT, a comprehensive framework that consolidates three distinct systems, each tailored to one of the following tasks: distinguishing ion channels (ICs) from membrane proteins (MPs), separating ion transporters (ITs) from MPs, and differentiating ICs from ITs. Drawing upon the strengths of six Protein Language Models (PLMs), including ProtBERT, ProtBERT-BFD, ESM-1b, ESM-2 (650M parameters), and ESM-2 (15B parameters), TooT-PLM-ionCT employs a combination of traditional classifiers and deep learning models for nuanced protein classification. Initially validated on an existing dataset from previous researchers, our systems demonstrated superior performance in separating ITs from MPs and distinguishing ICs from ITs, with the IC-MP discrimination achieving state-of-the-art results. In light of recommendations for additional validation, we introduced a new dataset, which strengthened the evidence for the robustness and generalization of our models. This new evaluation underscored the effectiveness of TooT-PLM-ionCT in adapting to novel data while maintaining high classification accuracy. Furthermore, this study explores critical factors affecting classification accuracy, such as dataset balancing, the impact of using frozen versus fine-tuned PLM representations, and the difference between half and full precision in floating-point computations. To facilitate broader application and accessibility, a web server (https://tootsuite.encs.concordia.ca/service/TooT-PLM-ionCT) has been developed, allowing users to evaluate unknown protein sequences through our specialized systems for the IC-MP, IT-MP, and IC-IT classification tasks.
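The abstract names dataset balancing as one factor affecting classification accuracy but does not say which balancing method was used. As an illustration only, the following minimal Python sketch shows random undersampling, one common way to balance a binary task such as IC versus MP. The function name `undersample`, the toy sequences, and the label strings are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def undersample(records, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    `records` is a list of (sequence, label) pairs. This is an assumed,
    generic balancing strategy, not necessarily the one used in the study.
    """
    # Group the records by their class label.
    by_label = defaultdict(list)
    for sequence, label in records:
        by_label[label].append((sequence, label))

    # Every class is reduced to the size of the rarest class.
    target = min(len(items) for items in by_label.values())

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Toy, made-up data: 8 membrane proteins (MP) and 3 ion channels (IC).
toy = [("MKT" + "A" * i, "MP") for i in range(8)]
toy += [("MSV" + "L" * i, "IC") for i in range(3)]

balanced = undersample(toy)
counts = {label: sum(1 for _, l in balanced if l == label) for label in ("IC", "MP")}
print(counts)
```

Undersampling discards majority-class sequences, so in practice it is usually applied to the training split only, leaving the test split at its natural class ratio.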

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Computational Biology / Deep Learning / Ion Channels Limits: Humans Language: En Journal: Proteins Journal subject: BIOQUIMICA Year: 2024 Document type: Article Affiliation country: Canada
