Results 1 - 2 of 2
1.
Structure; 32(8): 1260-1268.e3, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-38701796

ABSTRACT

Despite their lack of a rigid structure, intrinsically disordered regions (IDRs) in proteins play important roles in cellular functions, including mediating protein-protein interactions. Therefore, it is important to computationally annotate IDRs with high accuracy. In this study, we present Disordered Region prediction using Bidirectional Encoder Representations from Transformers (DR-BERT), a compact protein language model. Unlike most popular tools, DR-BERT is pretrained on unannotated proteins and trained to predict IDRs without relying on explicit evolutionary or biophysical data. Despite this, DR-BERT demonstrates significant improvement over existing methods on the Critical Assessment of protein Intrinsic Disorder (CAID) evaluation dataset and outperforms competitors on two out of four test cases in the CAID 2 dataset, while maintaining competitiveness in the others. This performance is due to the information learned during pretraining and DR-BERT's ability to use contextual information.
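As a rough, hypothetical illustration of the workflow this abstract describes (a pretrained BERT-style protein language model applied to per-residue disorder prediction), the sketch below uses the Hugging Face transformers token-classification API. The checkpoint name "example/dr-bert", the space-separated residue tokenization, and the assumption that label index 1 means "disordered" are placeholders, not details taken from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder checkpoint name (assumption): substitute the actual
# published DR-BERT weights where available.
checkpoint = "example/dr-bert"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint)
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # example amino-acid sequence
# Many protein language models tokenize one residue per token,
# often written space-separated (assumption).
inputs = tokenizer(" ".join(sequence), return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_tokens, num_labels)

# Assume label index 1 = "disordered"; take its per-residue probability.
probs = torch.softmax(logits, dim=-1)[0, :, 1]
# Skip the leading special token ([CLS]) when aligning to residues.
for residue, p in zip(sequence, probs[1:len(sequence) + 1]):
    print(f"{residue}\t{p:.3f}")
```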


Subject(s)
Intrinsically Disordered Proteins; Intrinsically Disordered Proteins/chemistry; Intrinsically Disordered Proteins/metabolism; Databases, Protein; Models, Molecular; Computational Biology/methods; Protein Conformation; Molecular Sequence Annotation; Algorithms
2.
J Comput Biol; 30(1): 95-111, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35950958

ABSTRACT

The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. Promising deep learning approaches for protein prediction tasks have emerged, yet they are either computationally limited or designed to solve a single task. We present a Transformer neural network that learns task-agnostic sequence representations through pre-training and is then fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction. Our method is comparable to existing state-of-the-art approaches for protein family classification while being much more general than other architectures, and it outperforms other approaches for protein interaction prediction in two of the three scenarios we generated. These results offer a promising framework for fine-tuning the pre-trained sequence representations on other protein prediction tasks.
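A minimal, hypothetical sketch of the pretrain-then-fine-tune pattern this abstract describes appears below: one task-agnostic Transformer encoder shared by a task-specific head, here for protein family classification. The 20-residue vocabulary, layer sizes, and mean pooling are illustrative assumptions, not the authors' architecture; an interaction-prediction head could be attached to the same encoder in the same way.

```python
import torch
import torch.nn as nn

class ProteinEncoder(nn.Module):
    """Task-agnostic Transformer encoder over amino-acid tokens."""
    def __init__(self, vocab_size=21, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, tokens):                # tokens: (batch, seq_len)
        h = self.encoder(self.embed(tokens))  # (batch, seq_len, d_model)
        return h.mean(dim=1)                  # mean-pool into one vector

class FamilyClassifier(nn.Module):
    """Task-specific head fine-tuned on top of the shared encoder."""
    def __init__(self, encoder, d_model, num_families):
        super().__init__()
        self.encoder = encoder                # pretrained weights loaded here
        self.head = nn.Linear(d_model, num_families)

    def forward(self, tokens):
        return self.head(self.encoder(tokens))

# Usage sketch: in practice the encoder would first be pre-trained on
# unlabeled sequences, then fine-tuned jointly with the head.
encoder = ProteinEncoder()
model = FamilyClassifier(encoder, d_model=128, num_families=100)
tokens = torch.randint(1, 21, (8, 64))  # batch of 8 sequences, length 64
logits = model(tokens)                  # (8, 100) family logits
```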


Subject(s)
Neural Networks, Computer; Proteins; Amino Acid Sequence