Results 1 - 2 of 2
1.
Structure; 32(8): 1260-1268.e3, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-38701796

ABSTRACT

Despite lacking a rigid structure, intrinsically disordered regions (IDRs) in proteins play important roles in cellular function, including mediating protein-protein interactions, so accurate computational annotation of IDRs matters. In this study, we present Disordered Region prediction using Bidirectional Encoder Representations from Transformers (DR-BERT), a compact protein language model. Unlike most popular tools, DR-BERT is pretrained on unannotated protein sequences and then trained to predict IDRs without relying on explicit evolutionary or biophysical features. Despite this, it improves significantly over existing methods on the Critical Assessment of protein Intrinsic Disorder (CAID) evaluation dataset, and on CAID 2 it outperforms competing methods on two of the four test cases while remaining competitive on the others. This performance stems from the information learned during pretraining and from DR-BERT's ability to exploit sequence context.
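To make the setup concrete, here is a minimal sketch of per-residue disorder scoring with a BERT-style token-classification head, using the HuggingFace transformers API. The checkpoint name Rostlab/prot_bert is a stand-in for any ProtBert-style protein language model; DR-BERT's actual weights, tokenization, and trained head are not reproduced here, and the classification head below is freshly initialized, so it would need fine-tuning on disorder labels before producing meaningful scores.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Stand-in checkpoint: any BERT-style protein language model with a
# space-separated amino-acid vocabulary (the ProtBert convention).
checkpoint = "Rostlab/prot_bert"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=2  # label 0 = ordered, 1 = disordered
)

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
# ProtBert-style tokenizers expect residues separated by spaces.
inputs = tokenizer(" ".join(sequence), return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, length + 2 specials, 2)

# Per-residue disorder probability, dropping the [CLS]/[SEP] positions.
probs = logits.softmax(dim=-1)[0, 1:-1, 1]
for residue, p in zip(sequence, probs.tolist()):
    print(residue, round(p, 2))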


Subject(s)
Intrinsically Disordered Proteins; Intrinsically Disordered Proteins/chemistry; Intrinsically Disordered Proteins/metabolism; Databases, Protein; Models, Molecular; Computational Biology/methods; Protein Conformation; Molecular Sequence Annotation; Algorithms
2.
J Comput Biol; 30(1): 95-111, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35950958

ABSTRACT

The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be characterized experimentally. Promising deep learning approaches for protein prediction tasks have emerged, but they are either computationally limited or designed for a single task. We present a Transformer neural network that pre-trains task-agnostic sequence representations and is then fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction. Our method matches existing state-of-the-art approaches for protein family classification while being much more general than other architectures, and it outperforms other approaches for protein interaction prediction in two of the three scenarios we generated. These results offer a promising framework for fine-tuning the pre-trained sequence representations for further protein prediction tasks.
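As an illustration of the pretrain-then-fine-tune pattern the abstract describes, here is a minimal sketch of a pair classifier for protein interaction prediction built on a pre-trained encoder via the HuggingFace transformers API. The checkpoint Rostlab/prot_bert is again a stand-in, and the mean-pooling-plus-concatenation head is an assumption for illustration, not the paper's architecture.

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Stand-in checkpoint: any pre-trained protein language model encoder.
checkpoint = "Rostlab/prot_bert"

class InteractionClassifier(nn.Module):
    def __init__(self, checkpoint: str):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size
        # Concatenate the two pooled sequence embeddings, then classify
        # into interacting / non-interacting.
        self.head = nn.Linear(2 * hidden, 2)

    def embed(self, batch):
        # Mean-pool token embeddings as a simple sequence representation
        # (an assumption; the paper's pooling may differ).
        out = self.encoder(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1)
        return (out * mask).sum(1) / mask.sum(1)

    def forward(self, batch_a, batch_b):
        pair = torch.cat([self.embed(batch_a), self.embed(batch_b)], dim=-1)
        return self.head(pair)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = InteractionClassifier(checkpoint)
a = tokenizer(" ".join("MKTAYIAKQR"), return_tensors="pt")
b = tokenizer(" ".join("GSHMTEYKLV"), return_tensors="pt")
logits = model(a, b)  # fine-tune with cross-entropy on labeled pairs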


Asunto(s)
Neural Networks, Computer; Proteins; Amino Acid Sequence