Results 1 - 20 of 551
1.
Sci Rep ; 14(1): 20692, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237735

ABSTRACT

Embeddings from protein Language Models (pLMs) are replacing evolutionary information from multiple sequence alignments (MSAs) as the most successful input for protein prediction. Is this because embeddings capture evolutionary information? We tested various approaches to explicitly incorporate evolutionary information into embeddings on various protein prediction tasks. While older pLMs (SeqVec, ProtBert) significantly improved through MSAs, the more recent pLM ProtT5 did not benefit. For most tasks, pLM-based outperformed MSA-based methods, and the combination of both even decreased performance for some (intrinsic disorder). We highlight the effectiveness of pLM-based methods and find limited benefits from integrating MSAs.


Subject(s)
Evolution, Molecular , Proteins , Sequence Alignment , Proteins/metabolism , Proteins/genetics , Proteins/chemistry , Sequence Alignment/methods , Computational Biology/methods , Algorithms , Software , Sequence Analysis, Protein/methods
2.
J Mol Biol ; 436(17): 168531, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39237204

ABSTRACT

Accurate models of protein tertiary structures are now available from numerous advanced prediction methods, although the accuracy of each method often varies depending on the specific protein target. Additionally, many models may still contain significant local errors. Therefore, reliable, independent model quality estimates are essential both for identifying errors and selecting the very best models for further biological investigations. ModFOLD9 is a leading independent server for detecting the local errors in models produced by any method, and it can accurately discriminate between high-quality models from multiple alternative approaches. ModFOLD9 incorporates several new scores from deep learning-based approaches, leading to greatly improved prediction accuracy compared with earlier versions of the server. ModFOLD9 is continuously independently benchmarked, and it is shown to be highly competitive with other public servers. ModFOLD9 is freely available at https://www.reading.ac.uk/bioinf/ModFOLD/.


Subject(s)
Internet , Models, Molecular , Protein Conformation , Proteins , Software , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Deep Learning
3.
J Comput Biol ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39246251

ABSTRACT

The identification of intrinsically disordered proteins and their functional roles is largely dependent on the performance of computational predictors, necessitating a high standard of accuracy in these tools. In this context, we introduce a novel series of computational predictors, termed PDFll (Predictors of Disorder and Function of proteins from the Language of Life), which are designed to offer precise predictions of protein disorder and associated functional roles based on protein sequences. PDFll is developed through a two-step process. Initially, it leverages large-scale protein language models (pLMs), trained on an extensive dataset comprising billions of protein sequences. Subsequently, the embeddings derived from pLMs are integrated into streamlined, yet sophisticated, deep-learning models to generate predictions. These predictions notably surpass the performance of existing state-of-the-art predictors, particularly those that forecast disorder and function without utilizing evolutionary information.

4.
Front Mol Biosci ; 11: 1414916, 2024.
Article in English | MEDLINE | ID: mdl-39139810

ABSTRACT

Proteins, as the primary executors of physiological activity, serve as a key factor in disease diagnosis and treatment. Research into their structures, functions, and interactions is essential to better understand disease mechanisms and potential therapies. DeepMind's AlphaFold2, a deep-learning protein structure prediction model, has proven to be remarkably accurate, and it is widely employed in various aspects of diagnostic research, such as the study of disease biomarkers, microorganism pathogenicity, antigen-antibody structures, and missense mutations. Thus, AlphaFold2 serves as an exceptional tool to bridge fundamental protein research with breakthroughs in disease diagnosis, developments in diagnostic strategies, and the design of novel therapeutic approaches and enhancements in precision medicine. This review outlines the architecture, highlights, and limitations of AlphaFold2, placing particular emphasis on its applications within diagnostic research grounded in disciplines such as immunology, biochemistry, molecular biology, and microbiology.

5.
Clin Transl Med ; 14(8): e1789, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39090739

ABSTRACT

Recent advancements in artificial intelligence (AI) have accelerated the prediction of unknown protein structures. However, accurately predicting the three-dimensional (3D) structures of fusion proteins remains a difficult task because the current AI-based protein structure predictions are focused on the WT proteins rather than on the newly fused proteins in nature. Following the central dogma of biology, fusion proteins are translated from fusion transcripts, which are made by transcribing the fusion genes between two different loci through the chromosomal rearrangements in cancer. Accurately predicting the 3D structures of fusion proteins is important for understanding the functional roles and mechanisms of action of new chimeric proteins. However, predicting their 3D structure using a template-based model is challenging because known template structures are often unavailable in databases. Deep learning (DL) models that utilize multi-level protein information have revolutionized the prediction of protein 3D structures. In this review paper, we highlighted the latest advancements and ongoing challenges in predicting the 3D structure of fusion proteins using DL models. We aim to explore both the advantages and challenges of employing AlphaFold2, RoseTTAFold, tr-Rosetta and D-I-TASSER for modelling the 3D structures. HIGHLIGHTS: This review provides the overall pipeline and landscape of the prediction of the 3D structure of fusion protein. This review provides the factors that should be considered in predicting the 3D structures of fusion proteins using AI approaches in each step. This review highlights the latest advancements and ongoing challenges in predicting the 3D structure of fusion proteins using deep learning models. This review explores the advantages and challenges of employing AlphaFold2, RoseTTAFold, tr-Rosetta, and D-I-TASSER to model 3D structures.


Subject(s)
Artificial Intelligence , Humans , Protein Conformation , Deep Learning
6.
Proc Natl Acad Sci U S A ; 121(34): e2315002121, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39133843

ABSTRACT

Two years on from the initial release of AlphaFold, we have seen its widespread adoption as a structure prediction tool. Here, we discuss some of the latest work based on AlphaFold, with a particular focus on its use within the structural biology community. This encompasses use cases like speeding up structure determination itself, enabling new computational studies, and building new tools and workflows. We also look at the ongoing validation of AlphaFold, as its predictions continue to be compared against large numbers of experimental structures to further delineate the model's capabilities and limitations.

7.
Br J Pharmacol ; 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39209310

ABSTRACT

G protein-coupled receptors (GPCRs) play a crucial role in cell function by transducing signals from the extracellular environment to the inside of the cell. They mediate the effects of various stimuli, including hormones, neurotransmitters, ions, photons, food tastants and odorants, and are renowned drug targets. Advancements in structural biology techniques, including X-ray crystallography and cryo-electron microscopy (cryo-EM), have driven the elucidation of an increasing number of GPCR structures. These structures reveal novel features that shed light on receptor activation, dimerization and oligomerization, dichotomy between orthosteric and allosteric modulation, and the intricate interactions underlying signal transduction, providing insights into diverse ligand-binding modes and signalling pathways. However, a substantial portion of the GPCR repertoire and their activation states remain structurally unexplored. Future efforts should prioritize capturing the full structural diversity of GPCRs across multiple dimensions. To do so, the integration of structural biology with biophysical and computational techniques will be essential. We describe in this review the progress of nuclear magnetic resonance (NMR) to examine GPCR plasticity and conformational dynamics, of atomic force microscopy (AFM) to explore the spatial-temporal dynamics and kinetic aspects of GPCRs, and the recent breakthroughs in artificial intelligence for protein structure prediction to characterize the structures of the entire GPCRome. In summary, the journey through GPCR structural biology provided in this review illustrates how far we have come in decoding these essential proteins' architecture and function. Looking ahead, integrating cutting-edge biophysics and computational tools offers a path to navigating the GPCR structural landscape, ultimately advancing GPCR-based applications.

8.
Top Curr Chem (Cham) ; 382(3): 23, 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38965117

ABSTRACT

In recent years, there has been a notable increase in the scientific community's interest in rational protein design. The prospect of designing an amino acid sequence that can reliably fold into a desired three-dimensional structure and exhibit the intended function is captivating. However, a major challenge in this endeavor lies in accurately predicting the resulting protein structure. The exponential growth of protein databases has fueled the advancement of the field, while newly developed algorithms have pushed the boundaries of what was previously achievable in structure prediction. In particular, using deep learning methods instead of brute force approaches has emerged as a faster and more accurate strategy. These deep-learning techniques leverage the vast amount of data available in protein databases to extract meaningful patterns and predict protein structures with improved precision. In this article, we explore the recent developments in the field of protein structure prediction. We delve into the newly developed methods that leverage deep learning approaches, highlighting their significance and potential for advancing our understanding of protein design.


Subject(s)
Deep Learning , Protein Conformation , Proteins , Proteins/chemistry , Proteins/metabolism , Databases, Protein , Algorithms
9.
J Comput Biol ; 31(7): 691-702, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38979621

ABSTRACT

Proteins are essential to life, and understanding their intrinsic roles requires determining their structure. The field of proteomics has opened up new opportunities by applying deep learning algorithms to large databases of solved protein structures. With the availability of large data sets and advanced machine learning methods, the prediction of protein residue interactions has greatly improved. Protein contact maps provide empirical evidence of the interacting residue pairs within a protein sequence. Template-free protein structure prediction systems rely heavily on this information. This article proposes UNet-CON, an attention-integrated UNet architecture, trained to predict residue-residue contacts in protein sequences. With the predicted contacts being more accurate than state-of-the-art methods on the PDB25 test set, the model paves the way for the development of more powerful deep learning algorithms for predicting protein residue interactions.


Subject(s)
Algorithms , Computational Biology , Databases, Protein , Proteins , Proteins/chemistry , Proteins/genetics , Computational Biology/methods , Deep Learning , Protein Conformation , Models, Molecular , Machine Learning
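
The contact maps mentioned in the abstract of record 9 can be illustrated with a minimal sketch. It is not the UNet-CON method, which predicts contacts from sequence; it only shows what a contact map is when coordinates are already known. The 8 Å cutoff on C-alpha distances, the function name `contact_map`, and the toy coordinates are assumptions chosen for illustration.

```python
import math

def contact_map(ca_coords, threshold=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix: entry [i][j] is 1 when residues i and j
    lie within `threshold` angstroms and are at least `min_separation`
    positions apart in the sequence."""
    n = len(ca_coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= threshold:
                contacts[i][j] = contacts[j][i] = 1
    return contacts

# Toy C-alpha coordinates (hypothetical, not taken from any real structure).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
```

A contact predictor is typically scored against a matrix like this one, often after raising `min_separation` so that trivially close sequence neighbours are ignored.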
10.
Front Neurol ; 15: 1365787, 2024.
Article in English | MEDLINE | ID: mdl-39011359

ABSTRACT

Hereditary spastic paraplegia (HSP) is a rare neurodegenerative disease prominently characterized by slowly progressive lower limb weakness and spasticity. The significant genotypic and phenotypic heterogeneity of this disease makes its accurate diagnosis challenging. In this study, we identified the NM_001168272: c.2714A > G (chr3.hg19: g.4716912A > G, N905S) variant in the ITPR1 gene in a three-generation Chinese family with multiple individuals affected by HSP, which we believed to be associated with HSP pathogenesis. To confirm, we performed whole exome sequencing, copy number variant assays, dynamic mutation analysis of the entire family, and protein structure prediction. The variant identified in this study was in the coupling domain, and this is the first corroborated report assigning ITPR1 variants to HSP. These findings expand the clinical and genetic spectrum of HSP and provide important data for its genetic analysis and diagnosis.

11.
Methods Mol Biol ; 2780: 149-162, 2024.
Article in English | MEDLINE | ID: mdl-38987469

ABSTRACT

Protein-protein interactions are involved in almost all processes in a living cell and determine the biological functions of proteins. To obtain mechanistic understandings of protein-protein interactions, the tertiary structures of protein complexes have been determined by biophysical experimental methods, such as X-ray crystallography and cryogenic electron microscopy. However, as experimental methods are costly in resources, many computational methods have been developed that model protein complex structures. One of the difficulties in computational protein complex modeling (protein docking) is to select the most accurate models among many models that are usually generated by a docking method. This article reviews advances in protein docking model assessment methods, focusing on recent developments that apply deep learning to several network architectures.


Subject(s)
Deep Learning , Molecular Docking Simulation , Proteins , Molecular Docking Simulation/methods , Proteins/chemistry , Proteins/metabolism , Protein Binding , Computational Biology/methods , Protein Interaction Mapping/methods , Software , Protein Conformation , Crystallography, X-Ray/methods
12.
Methods Mol Biol ; 2836: 235-252, 2024.
Article in English | MEDLINE | ID: mdl-38995544

ABSTRACT

AlphaFold2 (AF2) has emerged in recent years as a groundbreaking innovation that has revolutionized several scientific fields, in particular structural biology, drug design, and the elucidation of disease mechanisms. Many scientists now use AF2 on a daily basis, including non-specialist users. This chapter is aimed at the latter. Tips and tricks for getting the most out of AF2 to produce a high-quality biological model are discussed here. We suggest to non-specialist users how to maintain a critical perspective when working with AF2 models and provide guidelines on how to properly evaluate them. After showing how to perform our own structure prediction using ColabFold, we list several ways to improve AF2 models by adding information that is missing from the original AF2 model. By using software such as AlphaFill to add cofactors and ligands to the models, or MODELLER to add disulfide bridges between cysteines, we guide users to build a high-quality biological model suitable for applications such as drug design, protein interaction, or molecular dynamics studies.


Subject(s)
Models, Molecular , Protein Conformation , Proteins , Software , Proteins/chemistry , Computational Biology/methods , Protein Folding , Algorithms , Humans
13.
Protein Sci ; 33(8): e5112, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39031445

ABSTRACT

The missense tolerance ratio (MTR) was developed as a novel approach to assess the deleteriousness of variants. Its three-dimensional successor, MTR3D, proved powerful at discriminating pathogenic from benign variants. However, its reliance on experimental structures and homologs limited its coverage of the proteome. We have now utilized AlphaFold2 models to develop MTR3D-AF2, which covers 89.31% of proteins and 85.39% of residues across the human proteome. This work has improved MTR3D's ability to distinguish clinically established pathogenic from benign variants. MTR3D-AF2 is freely available as an interactive web server at https://biosig.lab.uq.edu.au/mtr3daf2/.


Subject(s)
Mutation, Missense , Proteome , Humans , Proteome/chemistry , Proteome/genetics , Proteome/analysis , Proteome/metabolism , Software , Models, Molecular , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Databases, Protein
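
The abstract of record 13 does not state how the ratio is computed. As a rough sketch, assuming the commonly cited definition of the missense tolerance ratio (the observed missense fraction in a region divided by the missense fraction expected from all possible substitutions), the arithmetic looks like this. The function name, the argument names, and the counts are hypothetical, and how the region is chosen (a sequence window or a 3D neighbourhood) is outside this sketch.

```python
def missense_tolerance_ratio(obs_missense, obs_synonymous,
                             exp_missense, exp_synonymous):
    """Observed missense fraction divided by the expected missense fraction.

    Values below 1 indicate fewer missense variants than expected,
    i.e. a region that appears intolerant to missense change."""
    observed = obs_missense / (obs_missense + obs_synonymous)
    expected = exp_missense / (exp_missense + exp_synonymous)
    return observed / expected

# Hypothetical counts for one region.
mtr = missense_tolerance_ratio(obs_missense=6, obs_synonymous=6,
                               exp_missense=30, exp_synonymous=10)
print(round(mtr, 3))
```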
14.
Sci Rep ; 14(1): 16488, 2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39020005

ABSTRACT

Secondary structure prediction is a key step in understanding protein function and biological properties and is highly important in the fields of new drug development, disease treatment, bioengineering, etc. Accurately predicting the secondary structure of proteins helps to reveal how proteins are folded and how they function in cells. The application of deep learning models in protein structure prediction is particularly important because of their ability to process complex sequence information and extract meaningful patterns and features, thus significantly improving the accuracy and efficiency of prediction. In this study, a combined model integrating an improved temporal convolutional network (TCN), bidirectional long short-term memory (BiLSTM), and a multi-head attention (MHA) mechanism is proposed to enhance the accuracy of protein prediction in both eight-state and three-state structures. One-hot encoding features and word vector representations of physicochemical properties are incorporated. A significant emphasis is placed on knowledge distillation techniques utilizing the ProtT5 pretrained model, leading to performance improvements. The improved TCN, achieved through multiscale fusion and bidirectional operations, allows for better extraction of amino acid sequence features than traditional TCN models. The model demonstrated excellent prediction performance on multiple datasets. For the TS115, CB513 and PDB (2018-2020) datasets, the prediction accuracy of the eight-state structure of the six datasets in this paper reached 88.2%, 84.9%, and 95.3%, respectively, and the prediction accuracy of the three-state structure reached 91.3%, 90.3%, and 96.8%, respectively. 
This study not only improves the accuracy of protein secondary structure prediction but also provides a tool for understanding protein structure and function that is particularly applicable to resource-constrained contexts.


Subject(s)
Protein Structure, Secondary , Proteins , Proteins/chemistry , Deep Learning , Neural Networks, Computer , Computational Biology/methods , Databases, Protein , Models, Molecular
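
The eight-state and three-state accuracies quoted in the abstract of record 14 are per-residue agreement scores. The sketch below shows how such scores can be computed, assuming one widely used reduction from eight DSSP-style states to three (H, G, I to helix; E, B to strand; everything else to coil). The mapping, the use of "C" for coil, and the toy strings are assumptions; the paper may use a different reduction.

```python
# One common 8-state to 3-state reduction (an assumption; variants exist).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "C": "C",   # turns, bends, coil
}

def to_three_state(ss8):
    """Collapse an eight-state string into three states."""
    return "".join(EIGHT_TO_THREE[s] for s in ss8)

def accuracy(predicted, reference):
    """Fraction of residues whose predicted state matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)

# Toy labels (hypothetical).
reference_q8 = "CHHHGTEEBS"
predicted_q8 = "CHHHHTEEES"
q8 = accuracy(predicted_q8, reference_q8)
q3 = accuracy(to_three_state(predicted_q8), to_three_state(reference_q8))
print(q8, q3)
```

Because several eight-state labels collapse to the same three-state label, the three-state score can never be lower than the eight-state score for the same prediction under this mapping.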
15.
Comput Biol Med ; 179: 108815, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38986287

ABSTRACT

Predicting protein structure is both fascinating and formidable, playing a crucial role in structure-based drug discovery and unraveling diseases with elusive origins. The Critical Assessment of Protein Structure Prediction (CASP) serves as a biennial battleground where global scientists converge to untangle the intricate relationships within amino acid chains. Two primary methods, Template-Based Modeling (TBM) and Template-Free (TF) strategies, dominate protein structure prediction. The trend has shifted towards Template-Free predictions due to their broader sequence coverage with fewer templates. The predictive process can be broadly classified into contact map, binned-distance, and real-valued distance predictions, each with distinctive strengths and limitations manifested through tailored loss functions. We have also introduced revolutionary end-to-end and all-atom diffusion-based techniques that have transformed protein structure predictions. Recent advancements in deep learning techniques have significantly improved prediction accuracy, although the effectiveness is contingent upon the quality of input features derived from natural bio-physiochemical attributes and Multiple Sequence Alignments (MSA). Hence, the generation of high-quality MSA data holds paramount importance in harnessing informative input features for enhanced prediction outcomes. Remarkable successes have been achieved in protein structure prediction accuracy; however, they are not yet sufficient for what structural knowledge was intended to deliver, which implies a need for development in other aspects of the predictions. In this regard, scientists have opened other frontiers for protein structural prediction. The utilization of subsampling in multiple sequence alignment (MSA) and protein language modeling appears to be particularly promising in enhancing the accuracy and efficiency of predictions, ultimately aiding in drug discovery efforts. The exploration of predicting protein complex structure also opens up exciting opportunities to deepen our knowledge of molecular interactions and design therapeutics that are more effective. In this article, we have discussed the vicissitudes that the scientists have gone through to improve prediction accuracy, and examined the effective policies in predicting from different aspects, including the construction of high-quality MSA, the provision of informative input features, and progress in deep learning approaches. We have also briefly touched upon transitioning from predicting single-chain protein structures to predicting protein complex structures. Our findings point towards promoting open research environments to support the objectives of protein structure prediction.


Subject(s)
Protein Conformation , Proteins , Proteins/chemistry , Models, Molecular , Computational Biology/methods , Humans , Sequence Analysis, Protein/methods , Deep Learning , Databases, Protein
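
The abstract of record 15 distinguishes contact-map, binned-distance, and real-valued distance prediction. A minimal sketch of how one inter-residue distance can be turned into each of these three training targets follows. The cutoff, bin start, bin width, and number of bins are illustrative assumptions, not values taken from the review.

```python
import math

def distance_targets(distance, contact_cutoff=8.0,
                     bin_start=2.0, bin_width=2.0, n_bins=10):
    """Derive three target types from one inter-residue distance (angstroms).

    contact  : 1 if the pair is within the cutoff, else 0 (binary target)
    bin      : index of the distance bin (classification target);
               bin 0 holds everything below `bin_start`, the last bin
               holds everything beyond the binned range
    distance : the raw value itself (regression target)
    """
    contact = int(distance <= contact_cutoff)
    raw_index = math.floor((distance - bin_start) / bin_width) + 1
    bin_index = min(max(raw_index, 0), n_bins - 1)
    return {"contact": contact, "bin": bin_index, "distance": distance}

print(distance_targets(6.5))
print(distance_targets(25.0))
```

The three targets pair naturally with different loss functions: a binary cross-entropy for contacts, a categorical cross-entropy over bins, and a regression loss for real-valued distances.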
16.
bioRxiv ; 2024 Feb 18.
Article in English | MEDLINE | ID: mdl-38903115

ABSTRACT

Microproteins encoded by small open reading frames (smORFs) comprise the "dark matter" of proteomes. Although functional microproteins were identified in diverse organisms from all three domains of life, bacterial smORFs remain poorly characterized. In this comprehensive study of intergenic smORFs (ismORFs, 15-70 codons) in 5,668 bacterial genomes of the family Enterobacteriaceae, we identified 67,297 clusters of ismORFs subject to purifying selection. The ismORFs mainly code for hydrophobic, potentially transmembrane, unstructured, or minimally structured microproteins. Using AlphaFold Multimer, we predicted interactions of some of the predicted microproteins encoded by transcribed ismORFs with proteins encoded by neighboring genes, revealing the potential of microproteins to regulate the activity of various proteins, particularly, under stress. We compiled a catalog of predicted microprotein families with different levels of evidence from synteny analysis, structure prediction, and transcription and translation data. This study offers a resource for investigation of biological functions of microproteins.

17.
J Mol Biol ; 436(17): 168593, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38718922

ABSTRACT

We develop a novel database, Alpha&ESMhFolds, which allows the direct comparison of AlphaFold2 and ESMFold predicted models for 42,942 proteins of the Reference Human Proteome, and when available, their comparison with 2,900 directly associated PDB structures with a structure-to-sequence coverage of at least 70%. Statistics indicate that good quality models tend to overlap with a TM-score >0.6 as long as some PDB structural information is available. As expected, a direct model superimposition to the PDB structure highlights that AlphaFold2 models are slightly superior to ESMFold ones. However, some 55% of the database is endowed with models overlapping with TM-score <0.6. This highlights the different outputs of the two methods. The database is freely available for usage at https://alpha-esmhfolds.biocomp.unibo.it/.


Subject(s)
Proteome , Software , Humans , Databases, Protein , Models, Molecular , Protein Folding , Internet , Computational Biology/methods , Protein Conformation
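
The TM-score thresholds quoted in the abstract of record 17 refer to a length-normalised similarity score between 0 and 1. The sketch below evaluates the standard TM-score sum for a single, already given superposition; the full score is the maximum over superpositions, which this sketch does not search for. The function names and the 0.5 Å floor on the distance scale for very short chains are assumptions for illustration.

```python
def tm_d0(target_length, floor=0.5):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.

    The floor for short chains is an assumption made so the sketch
    stays well defined when L <= 15."""
    if target_length <= 15:
        return floor
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, floor)

def tm_score(distances, target_length):
    """TM-score term for one fixed superposition.

    `distances` holds the distance in angstroms between each aligned
    residue pair; unaligned residues contribute nothing to the sum."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# Toy inputs (hypothetical): a 100-residue target.
perfect = tm_score([0.0] * 100, 100)   # every residue aligned exactly
half = tm_score([0.0] * 50, 100)       # only half the residues aligned
print(perfect, half)
```

Normalising by the target length rather than by the number of aligned pairs is what lets the score penalise partial overlaps.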
18.
J Integr Bioinform ; 21(2), 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38797876

ABSTRACT

Protein structure determination has made progress with the aid of deep learning models, enabling the prediction of protein folding from protein sequences. However, obtaining accurate predictions becomes essential in certain cases where the protein structure remains undescribed. This is particularly challenging when dealing with rare, diverse structures and complex sample preparation. Different metrics assess prediction reliability and offer insights into result strength, providing a comprehensive understanding of protein structure by combining different models. In a previous study, two proteins named ARM58 and ARM56 were investigated. These proteins contain four domains of unknown function and are present in Leishmania spp. ARM refers to an antimony resistance marker. The study's main objective is to assess the accuracy of the model's predictions, thereby providing insights into the complexities and supporting metrics underlying these findings. The analysis also extends to the comparison of predictions obtained from other species and organisms. Notably, one of these proteins shares an ortholog with Trypanosoma cruzi and Trypanosoma brucei, lending further significance to our analysis. This attempt underscored the importance of evaluating the diverse outputs from deep learning models, facilitating comparisons across different organisms and proteins. This becomes particularly pertinent in cases where no previous structural information is available.


Subject(s)
Protein Folding , Protozoan Proteins , Protozoan Proteins/chemistry , Protozoan Proteins/metabolism , Trypanosoma cruzi , Leishmania , Deep Learning , Trypanosoma brucei brucei/metabolism , Models, Molecular , Computational Biology/methods
19.
Structure ; 32(8): 1260-1268.e3, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-38701796

ABSTRACT

Despite their lack of a rigid structure, intrinsically disordered regions (IDRs) in proteins play important roles in cellular functions, including mediating protein-protein interactions. Therefore, it is important to computationally annotate IDRs with high accuracy. In this study, we present Disordered Region prediction using Bidirectional Encoder Representations from Transformers (DR-BERT), a compact protein language model. Unlike most popular tools, DR-BERT is pretrained on unannotated proteins and trained to predict IDRs without relying on explicit evolutionary or biophysical data. Despite this, DR-BERT demonstrates significant improvement over existing methods on the Critical Assessment of protein Intrinsic Disorder (CAID) evaluation dataset and outperforms competitors on two out of four test cases in the CAID 2 dataset, while maintaining competitiveness in the others. This performance is due to the information learned during pretraining and DR-BERT's ability to use contextual information.


Subject(s)
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Databases, Protein , Models, Molecular , Computational Biology/methods , Protein Conformation , Molecular Sequence Annotation , Algorithms