
Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models.

Chu, Hongkang; Liu, Taigang.

Int J Mol Sci; 25(8), 2024 Apr 19.

Article in English | MEDLINE | ID: mdl-38674091

ABSTRACT

Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods, which support drug discovery by building predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned Evolutionary Scale Modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we carefully compared the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on a modified Generative Pre-trained Transformer 2 (GPT-2). To our knowledge, this is the first time the large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model.
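
As an illustration of the general idea only, the sketch below shows one common way per-residue embeddings of variable-length proteins can be reduced to fixed-length feature vectors for a downstream classifier, and how accuracy is computed. The abstract does not state how the embeddings were pooled, so mean pooling is an assumption here; the function names `mean_pool` and `accuracy` and the toy numbers are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors (L x D) into one fixed-length vector (D).

    Assumption: mean pooling is used; the abstract does not specify this.
    """
    if not residue_embeddings:
        raise ValueError("protein has no residues")
    dim = len(residue_embeddings[0])
    pooled = [0.0] * dim
    for vec in residue_embeddings:
        if len(vec) != dim:
            raise ValueError("inconsistent embedding dimension")
        for i, x in enumerate(vec):
            pooled[i] += x
    n = len(residue_embeddings)
    return [s / n for s in pooled]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy data: two "proteins" of different lengths with 2-dimensional embeddings.
protein_a = [[1.0, 2.0], [3.0, 4.0]]              # 2 residues
protein_b = [[0.0, 0.0], [3.0, 6.0], [6.0, 0.0]]  # 3 residues

features = [mean_pool(protein_a), mean_pool(protein_b)]
# Both proteins now map to vectors of the same length, whatever their size.
print(features)

# Hypothetical labels: 1 = druggable, 0 = non-druggable.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a real pipeline the per-residue vectors would come from a protein language model such as ESM-2 rather than hand-written lists, and the pooled vectors would be passed to a trained classifier.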

Subjects

Proteins; Proteins/metabolism; Computational Biology/methods; Drug Discovery/methods; Position-Specific Scoring Matrices; Databases, Protein; Humans; Algorithms
