
1.

Disentangling Glycan-Protein Interactions: Nuclear Magnetic Resonance (NMR) to the Rescue.

Bertuzzi, Sara; Poveda, Ana; Ardá, Ana; Gimeno, Ana; Jiménez-Barbero, Jesús.

J Vis Exp ; (207)2024 May 17.

Article En | MEDLINE | ID: mdl-38829120

The interactions of glycans with proteins modulate many events related to health and disease. The establishment of these recognition events and their biological consequences are intimately related to the three-dimensional structures of both partners, to their dynamic features, and to their presentation on the corresponding cell compartments. NMR techniques are uniquely suited to disentangling these characteristics, and diverse NMR-based methodologies have been developed and applied to monitor the binding of glycans to their associated receptors. This protocol outlines the procedures to acquire, process and analyze two of the most powerful NMR methodologies employed in NMR glycobiology, ¹H saturation transfer difference (STD) and ¹H,¹⁵N heteronuclear single quantum coherence (HSQC) titration experiments, which provide complementary information from the glycan and protein perspectives, respectively. When combined, they offer a powerful toolkit for elucidating both the structural and dynamic aspects of molecular recognition processes. This comprehensive approach enhances our understanding of glycan-protein interactions and contributes to advancing research in chemical glycobiology.
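Once spectra are processed, both read-outs reduce to simple arithmetic. The sketch below assumes peak intensities (STD) and per-residue chemical shift changes (HSQC titration) have already been measured. It computes the relative STD epitope map, where each proton's fractional saturation (I0 - I_sat)/I0 is normalized to the largest value, and the weighted combined chemical shift perturbation often used to summarize ¹H,¹⁵N-HSQC titrations. Function names, proton labels and the ¹⁵N weighting factor are illustrative assumptions, not details taken from the protocol.

```python
import math


def std_epitope(i_ref, i_sat):
    """Relative STD (%) per ligand proton, normalized to the most saturated one.

    i_ref: off-resonance (reference) intensities I0, keyed by proton label
    i_sat: on-resonance intensities I_sat for the same protons
    """
    # fractional STD, (I0 - I_sat) / I0, for each proton
    frac = {h: (i_ref[h] - i_sat[h]) / i_ref[h] for h in i_ref}
    top = max(frac.values())
    return {h: 100.0 * v / top for h, v in frac.items()}


def csp(d_h, d_n, alpha=0.14):
    """Combined 1H,15N chemical shift perturbation (ppm) of one backbone amide.

    alpha down-weights the 15N shift change; 0.14 is one common choice and is
    an assumption here, not a value taken from the protocol.
    """
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


# Hypothetical intensities for two glycan protons
epitope = std_epitope({"H1": 100.0, "H2": 100.0}, {"H1": 90.0, "H2": 95.0})
print(epitope)            # H1 becomes the 100 % reference
print(csp(0.05, 0.40))    # shift changes (ppm) for a single residue
```

Plotting `csp` per residue against sequence position is a typical way to map the binding interface on the protein, while the STD epitope map shows which glycan protons sit closest to it.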

Polysaccharides , Polysaccharides/chemistry , Polysaccharides/metabolism , Nuclear Magnetic Resonance, Biomolecular/methods , Proteins/chemistry , Proteins/metabolism

2.

GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling.

Li, Bin; Ming, Dengming.

BMC Bioinformatics ; 25(1): 204, 2024 Jun 01.

Article En | MEDLINE | ID: mdl-38824535

BACKGROUND: Protein solubility is a critically important physicochemical property closely related to protein expression. For example, it is one of the main factors to be considered in the design and production of antibody drugs and a prerequisite for realizing various protein functions. Although several solubility prediction models have emerged in recent years, many of them capture only information embedded in one-dimensional amino acid sequences, resulting in unsatisfactory predictive performance. RESULTS: In this study, we introduce GATSol, a novel Graph Attention network-based protein Solubility model that represents the 3D structure of a protein as a protein graph. In addition to amino acid node features extracted by a state-of-the-art protein large language model, GATSol uses amino acid distance maps generated with AlphaFold. Rigorous testing on the independent eSOL and Saccharomyces cerevisiae test datasets showed that GATSol outperforms most recently introduced models, especially with respect to the coefficient of determination R², which reaches 0.517 and 0.424, respectively. It outperforms the current state-of-the-art GraphSol by 18.4% on the S. cerevisiae test set. CONCLUSIONS: GATSol captures 3D structural features of proteins by building protein graphs, which significantly improves the accuracy of protein solubility prediction. Recent advances in protein structure modeling allow our method to incorporate spatial features extracted from predicted structures while requiring only protein sequences as input, which simplifies the entire graph neural network prediction process and makes it more user-friendly and efficient. As a result, GATSol may help prioritize highly soluble proteins, ultimately reducing the cost and effort of experimental work. The source code and data of the GATSol model are freely available at https://github.com/binbinbinv/GATSol .
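The abstract does not describe GATSol's layers beyond a graph attention network over a structure-derived protein graph. As a hedged illustration of that idea, the sketch below builds a residue contact graph from a distance map (the 10 Å cutoff is an assumption) and applies one generic graph-attention head, in the style of Veličković et al., to per-residue features. Random coordinates and features stand in for AlphaFold-derived distances and language-model embeddings.

```python
import numpy as np


def contact_graph(dist, cutoff=10.0):
    """Adjacency (with self-loops) from a residue distance map in Å."""
    return (dist <= cutoff).astype(float)


def gat_layer(h, adj, w, a, slope=0.2):
    """One graph-attention head.

    h: (N, F) node features; adj: (N, N) adjacency; w: (F, F') projection;
    a: (2F',) attention vector. Returns updated features and attention weights.
    """
    z = h @ w                                   # projected node features
    f_out = z.shape[1]
    e = (z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :]  # raw scores e_ij
    e = np.where(e > 0, e, slope * e)           # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)           # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)        # stabilize the softmax
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)  # softmax over each neighbourhood
    return att @ z, att


rng = np.random.default_rng(0)
coords = rng.normal(scale=8.0, size=(20, 3))   # toy Cα coordinates (Å)
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
adj = contact_graph(dist)
h = rng.normal(size=(20, 16))                  # toy per-residue embeddings
out, att = gat_layer(h, adj, rng.normal(size=(16, 8)), rng.normal(size=(16,)))
print(out.shape, out.mean(axis=0).shape)       # pooled vector could feed a regressor
```

A solubility regressor would typically pool the node outputs (for example, by averaging over residues) and pass them to a small output layer; how GATSol does this is not stated in the abstract.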

Proteins , Solubility , Proteins/chemistry , Proteins/metabolism , Protein Conformation , Databases, Protein , Computational Biology/methods , Software , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/chemistry , Algorithms , Models, Molecular , Amino Acid Sequence

3.

VISH-Pred: an ensemble of fine-tuned ESM models for protein toxicity prediction.

Mall, Raghvendra; Singh, Ankita; Patel, Chirag N; Guirimand, Gregory; Castiglione, Filippo.

Brief Bioinform ; 25(4)2024 May 23.

Article En | MEDLINE | ID: mdl-38842509

Peptide- and protein-based therapeutics are becoming a promising treatment regimen for myriad diseases. Toxicity is the primary hurdle for protein-based therapies. Thus, there is an urgent need for accurate in silico methods for identifying toxic proteins to filter the pool of potential candidates. At the same time, it is imperative to precisely identify non-toxic proteins to expand the possibilities for protein-based biologics. To address this challenge, we propose an ensemble framework, called VISH-Pred, comprising models built by fine-tuning ESM2 transformer models on a large, experimentally validated, curated dataset of protein and peptide toxicities. VISH-Pred estimates protein toxicity from the protein sequence alone: it employs an undersampling technique to handle the severe class imbalance in the data and learns representations from fine-tuned ESM2 protein language models, which are then fed to machine learning techniques such as LightGBM and XGBoost. The VISH-Pred framework correctly identifies both peptides/proteins with potential toxicity and non-toxic proteins, achieving Matthews correlation coefficients of 0.737, 0.716 and 0.322 and F1-scores of 0.759, 0.696 and 0.713 on three non-redundant blind test sets, respectively, and outperforming other methods by over 10% on these quality metrics. Moreover, VISH-Pred achieved the best accuracy and area under the receiver operating characteristic curve on these independent test sets, highlighting the robustness and generalization capability of the framework. By making VISH-Pred available as an easy-to-use web server, we expect it to serve as a valuable asset for future endeavors aimed at discerning the toxicity of peptides and enabling efficient protein-based therapeutics.
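The abstract names two generic ingredients that are easy to make concrete: random undersampling of the majority class, and the Matthews correlation coefficient (MCC) and F1-score used for evaluation. The sketch below is a plain-Python illustration of those standard definitions, not VISH-Pred's code; the function names are hypothetical.

```python
import math
import random


def undersample_indices(labels, seed=0):
    """Return sorted indices of a class-balanced subset (labels are 0/1).

    All minority-class samples are kept; an equal number of majority-class
    samples is drawn at random without replacement.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    small, large = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(small + rng.sample(large, len(small)))


def mcc_f1(y_true, y_pred):
    """Matthews correlation coefficient and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return mcc, f1


labels = [1, 0, 0, 0, 0, 1, 0, 0]          # toy, heavily imbalanced
print(undersample_indices(labels))          # 2 toxic + 2 randomly kept non-toxic
print(mcc_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

MCC uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when classes are as imbalanced as toxic versus non-toxic proteins.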

Proteins , Proteins/metabolism , Proteins/chemistry , Machine Learning , Databases, Protein , Computational Biology/methods , Humans , Peptides/toxicity , Peptides/chemistry , Computer Simulation , Algorithms , Software

4.

MechanoProDB: a web-based database for exploring the mechanical properties of proteins.

Mesbah, Ismahene; Habermann, Bianca; Rico, Felix.

Database (Oxford) ; 20242024 Jun 05.

Article En | MEDLINE | ID: mdl-38837788

The mechanical stability of proteins is crucial for biological processes. To understand the mechanical functions of proteins, it is important to know their structures and mechanical properties. Protein mechanics is usually investigated through force spectroscopy experiments and simulations that probe the forces required to unfold the protein of interest. While the literature contains a wealth of data from force spectroscopy experiments and steered molecular dynamics simulations of forced protein unfolding, this information is scattered and difficult for non-experts to access. Here, we introduce MechanoProDB, a novel web-based database for collecting and mining data obtained from experimental and computational studies. MechanoProDB provides a curated repository for a wide range of proteins, including muscle proteins, adhesion molecules and membrane proteins. The database incorporates parameters that provide insights into the mechanical and conformational stability of proteins, such as unfolding forces, energy landscape parameters and contour lengths of unfolding steps. Additionally, it provides intuitive annotations of the unfolding pathway of each protein, allowing users to explore the individual steps of mechanical unfolding. The user-friendly interface of MechanoProDB allows researchers to efficiently navigate, search and download data pertaining to specific protein folds or experimental conditions. Users can visualize protein structures using interactive tools integrated within the database, such as Mol*, and plot available data with integrated plotting tools. To ensure data quality and reliability, we have manually verified and curated the data currently available in MechanoProDB. Furthermore, the database features an interface that enables users to contribute new data and annotations, promoting community-driven comprehensiveness.
The freely available MechanoProDB aims to streamline and accelerate research in the field of mechanobiology and biophysics by offering a unique platform for data sharing and analysis. MechanoProDB is freely available at https://mechanoprodb.ibdm.univ-amu.fr.
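Among the energy landscape parameters such a database can hold, the distance to the transition state Δx and the zero-force unfolding rate k0 are commonly extracted by fitting the Bell-Evans model, in which the most probable unfolding force grows logarithmically with the loading rate r: F* = (kBT/Δx) ln(r Δx / (k0 kBT)). The sketch below evaluates that standard expression; the parameter values are hypothetical, and the function is not part of MechanoProDB.

```python
import math

KB = 1.380649e-23  # Boltzmann constant, J/K


def bell_evans_force(loading_rate, dx_nm, k0, temp_k=298.0):
    """Most probable unfolding force (pN) in the Bell-Evans model.

    loading_rate: pN/s; dx_nm: distance to the transition state (nm);
    k0: unfolding rate at zero force (1/s). Only meaningful when the log
    argument exceeds 1, i.e. when the predicted force is positive.
    """
    kbt = KB * temp_k * 1e21  # thermal energy in pN*nm (1 J = 1e21 pN*nm)
    return (kbt / dx_nm) * math.log(loading_rate * dx_nm / (k0 * kbt))


# Hypothetical domain: dx = 0.25 nm, k0 = 1e-3 1/s
for r in (1e3, 1e4, 1e5):
    print(f"{r:.0e} pN/s -> {bell_evans_force(r, 0.25, 1e-3):.0f} pN")
```

Each tenfold increase in loading rate adds (kBT/Δx)·ln 10 to F*, which is why Δx can be read from the slope of F* versus ln r and k0 from the intercept.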

Databases, Protein , Internet , Proteins , Proteins/chemistry , Proteins/metabolism , User-Computer Interface , Protein Unfolding

5.

A microfluidic platform for the synthesis of polymer and polymer-protein-based protocells.

O'Callaghan, Jessica Ann; Kamat, Neha P; Vargo, Kevin B; Chattaraj, Rajarshi; Lee, Daeyeon; Hammer, Daniel A.

Eur Phys J E Soft Matter ; 47(6): 37, 2024 Jun 03.

Article En | MEDLINE | ID: mdl-38829453

In this study, we demonstrate the fabrication of polymersomes, protein-blended polymersomes, and polymeric microcapsules using droplet microfluidics. Polymersomes with uniform, single bilayers and controlled diameters are assembled from water-in-oil-in-water double-emulsion droplets. This technique relies on adjusting the interfacial energies of the droplet to completely separate the polymer-stabilized inner core from the oil shell. Protein-blended polymersomes are prepared by dissolving protein in the inner and outer phases of polymer-stabilized droplets. Cell-sized polymeric microcapsules are assembled by size reduction in the inner core through osmosis followed by evaporation of the middle phase. All methods are developed and validated using the same glass-capillary microfluidic apparatus. This integrative approach not only demonstrates the versatility of our setup, but also holds significant promise for standardizing and customizing the production of polymer-based artificial cells.
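The osmotic size-reduction step can be estimated with an ideal-osmometer (Boyle-van 't Hoff) approximation: if the shell passes water but retains solute, the amount of solute c·V in the core is conserved and water leaves until the inner and outer osmolarities match, so the diameter scales with the cube root of the concentration ratio. This is a back-of-the-envelope sketch under those assumptions (no non-osmotic volume, dilute solutions); the actual osmolarities used in the study are not given in the abstract.

```python
def osmotic_diameter(d0_um, c_inner, c_outer):
    """Equilibrium core diameter after osmotic deflation (ideal osmometer).

    c_inner, c_outer: initial inner and (constant) outer osmolarities, same units.
    Assumes a water-permeable, solute-impermeable shell and no non-osmotic
    volume, so c_inner * V0 = c_outer * V_final.
    """
    if c_inner <= 0 or c_outer <= 0:
        raise ValueError("osmolarities must be positive")
    return d0_um * (c_inner / c_outer) ** (1.0 / 3.0)


def outer_osmolarity_for(d0_um, d_target_um, c_inner):
    """Outer osmolarity needed to shrink a core from d0 to d_target."""
    return c_inner * (d0_um / d_target_um) ** 3


# Hypothetical: a 100 µm core at 50 mOsm placed in 400 mOsm shrinks to about 50 µm
print(osmotic_diameter(100.0, 50.0, 400.0))
print(outer_osmolarity_for(100.0, 50.0, 50.0))   # 400.0, the inverse calculation
```

The cubic dependence means that halving the diameter already needs an eightfold osmotic ratio, which illustrates why large size reductions are demanding by osmosis alone.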

Artificial Cells , Polymers , Artificial Cells/chemistry , Polymers/chemistry , Polymers/chemical synthesis , Emulsions/chemistry , Capsules/chemistry , Microfluidics/methods , Water/chemistry , Microfluidic Analytical Techniques , Proteins/chemistry

6.

SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues.

Zhang, Bin; Hou, Zilong; Yang, Yuning; Wong, Ka-Chun; Zhu, Haoran; Li, Xiangtao.

Commun Biol ; 7(1): 679, 2024 Jun 03.

Article En | MEDLINE | ID: mdl-38830995

Proteins and nucleic acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic-acid-binding residues in proteins can contribute to a better understanding of protein function. However, the gap between the abundance of protein sequence information and the scarcity of experimentally obtained structural and functional data renders most current computational models ineffective. Therefore, it is vital to design computational models that identify nucleic-acid-binding sites in proteins from sequence information. Here, we present SOFB, an ensemble deep learning method for identifying nucleic-acid-binding residues in proteins. SOFB characterizes protein sequences by learning the semantics of biological dynamic contexts and then uses an ensemble deep learning-based sequence network to learn feature representations and perform classification by explicitly modeling dynamic semantic information. Within this framework, the language learning model, transferred from natural language to biological language, captures the underlying relationships in protein sequences, and the ensemble sequence network, consisting of different convolutional layers together with a Bi-LSTM, refines various features for optimal performance. To address class imbalance, we use ensemble learning to train multiple models and then combine them. Our experimental results on several DNA- and RNA-binding residue datasets demonstrate that our proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic-acid-binding residues based on the attention weights of the language learning model, revealing novel insights into the dynamic semantic information that supports the identified binding residues. 
SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452 .
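The abstract says only that multiple models are trained and combined to cope with class imbalance, since binding residues are a small minority. One common way to do this, sketched below, is a balanced-subset ensemble: split the majority class into chunks, pair each chunk with all minority samples, fit one model per balanced subset and average their scores. The nearest-centroid base learner and toy 2-D features are placeholders for SOFB's convolutional/Bi-LSTM network and language-model features; whether SOFB partitions the data this way is an assumption.

```python
import numpy as np


def balanced_ensemble(x, y, seed=0):
    """Fit one nearest-centroid model per balanced subset.

    The majority class (label 0) is shuffled and split into chunks roughly the
    size of the minority class (label 1); each chunk is paired with all
    minority samples.
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = rng.permutation(np.flatnonzero(y == 0))
    n_models = min(len(neg), max(1, len(neg) // max(1, len(pos))))
    c_pos = x[pos].mean(axis=0)                # minority centroid, shared
    return [(c_pos, x[chunk].mean(axis=0)) for chunk in np.array_split(neg, n_models)]


def predict_proba(models, x):
    """Average a sigmoid of the centroid-distance margin over all members."""
    scores = []
    for c_pos, c_neg in models:
        margin = np.linalg.norm(x - c_neg, axis=1) - np.linalg.norm(x - c_pos, axis=1)
        scores.append(1.0 / (1.0 + np.exp(-margin)))   # > 0.5 when closer to c_pos
    return np.mean(scores, axis=0)


rng = np.random.default_rng(1)
x = np.vstack([rng.normal(2.0, 0.5, (10, 2)), rng.normal(-2.0, 0.5, (90, 2))])
y = np.array([1] * 10 + [0] * 90)              # 10 "binding" vs 90 "non-binding"
models = balanced_ensemble(x, y)
print(len(models), predict_proba(models, np.array([[2.0, 2.0], [-2.0, -2.0]])))
```

Every member sees each minority example, so no binding residue is discarded, while the majority class is still covered in full across the ensemble.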

Deep Learning , Binding Sites , Nucleic Acids/metabolism , Nucleic Acids/chemistry , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Protein Binding , Computational Biology/methods

7.

Diversifying de novo TIM barrels by hallucination.

Beck, Julian; Shanmugaratnam, Sooruban; Höcker, Birte.

Protein Sci ; 33(6): e5001, 2024 Jun.

Article En | MEDLINE | ID: mdl-38723111

De novo protein design expands the protein universe by creating new sequences, with the future goal of producing tailor-made enzymes. A promising topology for implementing diverse enzyme functions is the ubiquitous TIM-barrel fold. Since the initial de novo design of an idealized four-fold symmetric TIM barrel, the family of de novo TIM barrels has been expanding rapidly. Despite this, and in contrast to natural TIM barrels, these novel proteins lack the cavities and structural elements essential for incorporating binding sites or enzymatic functions. In this work, we diversified a de novo TIM barrel by extending multiple βα-loops using constrained hallucination. Experimentally tested designs were soluble upon expression in Escherichia coli and well-behaved. Biochemical characterization and crystal structures revealed successful extensions with defined α-helical structures. These diversified de novo TIM barrels provide a framework to explore a broad spectrum of functions, drawing on the potential shown by natural TIM barrels.
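In published protein hallucination work, a sequence is iteratively mutated and mutations are accepted by a Metropolis criterion that favours sequences a structure predictor is confident about. The sketch below shows only that search loop, restricting mutations to a designable window as a crude stand-in for the constraints used when extending βα-loops. The scoring function is a hypothetical placeholder (the fraction of helix-favouring residues in the window), not a structure-prediction loss, and none of the names come from the paper.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_helix_score(seq, region=range(10, 20)):
    """Hypothetical objective: fraction of A/E/L/K residues in the region."""
    return sum(seq[i] in "AELK" for i in region) / len(region)


def hallucinate(seq, designable, score_fn, n_steps=2000, t0=0.1, seed=0):
    """Metropolis search with linear cooling over the designable positions.

    Positions outside `designable` are never mutated. Higher score is better.
    Returns the best sequence seen and its score.
    """
    rng = random.Random(seed)
    cur = list(seq)
    cur_score = score_fn(seq)
    best, best_score = seq, cur_score
    for step in range(n_steps):
        temp = t0 * (1.0 - step / n_steps) + 1e-3
        pos = rng.choice(designable)
        old = cur[pos]
        cur[pos] = rng.choice(AMINO_ACIDS)          # propose a point mutation
        new_score = score_fn("".join(cur))
        delta = new_score - cur_score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            cur_score = new_score                   # accept
            if cur_score > best_score:
                best, best_score = "".join(cur), cur_score
        else:
            cur[pos] = old                          # reject and revert


    return best, best_score


start = "G" * 30                                   # hypothetical scaffold
best, score = hallucinate(start, list(range(10, 20)), toy_helix_score)
print(best, score)
```

In a real design run, `score_fn` would wrap a structure predictor's confidence or a structure-based loss, which dominates the computational cost of each step.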

Models, Molecular , Escherichia coli/genetics , Escherichia coli/metabolism , Crystallography, X-Ray , Protein Folding , Protein Engineering/methods , Proteins/chemistry , Proteins/metabolism

8.

TransPTM: a transformer-based model for non-histone acetylation site prediction.

Meng, Lingkuan; Chen, Xingjian; Cheng, Ke; Chen, Nanjun; Zheng, Zetian; Wang, Fuzhou; Sun, Hongyan; Wong, Ka-Chun.

Brief Bioinform ; 25(3)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38725156

Protein acetylation is one of the most extensively studied post-translational modifications (PTMs) owing to its significant roles across a myriad of biological processes. Although many computational tools for acetylation site identification have been developed, there is a lack of benchmark datasets and bespoke predictors for non-histone acetylation site prediction. To address these problems, we contribute both a new dataset and a benchmarked predictor in this study. First, we construct a non-histone acetylation site benchmark dataset, NHAC, which includes 11 subsets according to sequence length, ranging from 11 to 61 amino acids. For each sequence length, there are 886 positive samples and 4707 negative samples in total. Second, we propose TransPTM, a transformer-based neural network model for non-histone acetylation site prediction. During the data representation phase, per-residue contextualized embeddings are extracted using ProtT5 (an existing pre-trained protein language model). This is followed by a graph neural network framework, which consists of three TransformerConv layers for feature extraction and a multilayer perceptron module for classification. The benchmark results show that TransPTM performs competitively for non-histone acetylation site prediction against three state-of-the-art tools. It improves our comprehension of the PTM mechanism and provides a theoretical basis for developing drug targets for diseases. Moreover, the created PTM datasets fill the gap in non-histone acetylation site datasets and will benefit the related communities. The source code and data used by TransPTM are accessible at https://www.github.com/TransPTM/TransPTM.
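Building residue-level samples is the step most easily made concrete: each lysine becomes a candidate site, represented by a window of flanking residues. The sketch below extracts symmetric windows of length 2·flank + 1 and pads the termini with "X". Both the padding symbol and the symmetric centring are assumptions, as the abstract does not say how NHAC windows of each length are built. In TransPTM these windows would then be embedded per residue with ProtT5.

```python
def lysine_windows(seq, flank, pad="X"):
    """Return (index, window) for every lysine (K) in seq.

    index is 0-based; window has length 2 * flank + 1 with the lysine at its
    centre, padded with `pad` where it runs past either terminus.
    """
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, aa in enumerate(seq):
        if aa == "K":
            # seq[i] sits at padded[i + flank], so this slice is centred on it
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


def label_windows(windows, acetylated):
    """Attach 1/0 labels using a set of known acetylated lysine indices."""
    return [(w, int(i in acetylated)) for i, w in windows]


# Hypothetical sequence with one known acetylated lysine at index 1
print(label_windows(lysine_windows("MKAAK", 2), {1}))   # [('XMKAA', 1), ('AAKXX', 0)]
```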

Neural Networks, Computer , Protein Processing, Post-Translational , Acetylation , Computational Biology/methods , Databases, Protein , Software , Algorithms , Humans , Proteins/chemistry , Proteins/metabolism

9.

Treatment of flexibility of protein backbone in simulations of protein-ligand interactions using steered molecular dynamics.

Truong, Duc Toan; Ho, Kiet; Pham, Dinh Quoc Huy; Chwastyk, Mateusz; Nguyen-Minh, Thai; Nguyen, Minh Tho.

Sci Rep ; 14(1): 10475, 2024 05 07.

Article En | MEDLINE | ID: mdl-38714683

To ensure that an external force can break the interaction between a protein and a ligand, steered molecular dynamics simulations require a harmonic restraining potential applied to the protein backbone. A common practice is to fix all, or a certain number, of the protein's heavy atoms or Cα atoms by restraining them with a small force. This study shows that fixing either all heavy atoms or all Cα atoms is not a good approach, whereas fixing too few atoms sometimes cannot prevent the protein from rotating under the influence of the bulk water layer, so that the pulled molecule may collide with the wall of the active site. We found that restraining Cα atoms under certain conditions is more appropriate. We therefore propose an alternative solution in which only the Cα atoms of the protein located more than 1.2 nm from the ligand are restrained. A more flexible, but not too flexible, protein is expected to lead to a more natural release of the ligand.
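The proposed restraint rule, restraining only Cα atoms more than 1.2 nm from the ligand, is a simple geometric selection. The sketch below assumes coordinates in nm and measures each Cα's distance to the nearest ligand atom in the starting bound pose. Whether the authors measured to the nearest ligand atom or to the ligand centre of mass is not stated in the abstract, so that choice is an assumption.

```python
import numpy as np


def restrained_ca_indices(ca_xyz, ligand_xyz, cutoff_nm=1.2):
    """Indices of Cα atoms farther than cutoff_nm from every ligand atom.

    ca_xyz: (N, 3) Cα coordinates; ligand_xyz: (M, 3) ligand coordinates (nm).
    """
    # (N, M) matrix of Cα-ligand distances
    d = np.linalg.norm(ca_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    return np.flatnonzero(d.min(axis=1) > cutoff_nm)


ca = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
ligand = np.array([[0.0, 0.0, 0.0]])
print(restrained_ca_indices(ca, ligand))   # only the Cα at 2.0 nm is restrained
```

The resulting indices would then be passed to the MD engine's own position-restraint mechanism, leaving residues near the binding site free to adapt as the ligand is pulled.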

Molecular Dynamics Simulation , Protein Binding , Proteins , Ligands , Proteins/chemistry , Proteins/metabolism , Protein Conformation

10.

Powerful 'nanopore' DNA sequencing method tackles proteins too.

Seydel, Caroline.

Nature ; 629(8011): 492-493, 2024 May.

Article En | MEDLINE | ID: mdl-38720035

Nanopore Sequencing , Proteins , Proteomics , Sequence Analysis, DNA , Nanopore Sequencing/methods , Nanopore Sequencing/trends , Proteins/analysis , Proteins/chemistry , Sequence Analysis, DNA/methods , Proteomics/methods , Proteomics/trends

11.

Protein function prediction through multi-view multi-label latent tensor reconstruction.

Armah-Sekum, Robert Ebo; Szedmak, Sandor; Rousu, Juho.

BMC Bioinformatics ; 25(1): 174, 2024 May 02.

Article En | MEDLINE | ID: mdl-38698340

BACKGROUND: Over the last two decades, the use of high-throughput sequencing technologies has accelerated the pace of discovery of proteins. However, due to the time and resource limitations of rigorous experimental functional characterization, the functions of a vast majority of them remain unknown. As a result, computational methods offering accurate, fast and large-scale assignment of functions to new and previously unannotated proteins are sought after. Leveraging the underlying associations between the multiplicity of features that describe proteins could reveal functional insights into the diverse roles of proteins and improve performance on the automatic function prediction task. RESULTS: We present GO-LTR, a multi-view multi-label prediction model that relies on a high-order tensor approximation of model weights combined with non-linear activation functions. The model is capable of learning high-order relationships between multiple input views representing the proteins and predicting high-dimensional multi-label output consisting of protein functional categories. We demonstrate the competitiveness of our method on various performance measures. Experiments show that GO-LTR learns polynomial combinations between different protein features, resulting in improved performance. Additional investigations establish GO-LTR's practical potential in assigning functions to proteins under diverse challenging scenarios: very low sequence similarity to previously observed sequences, and rarely observed and highly specific terms in the gene ontology. IMPLEMENTATION: The code and data used for training GO-LTR are available at https://github.com/aalto-ics-kepaco/GO-LTR-prediction .
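One way to read "a high-order tensor approximation of model weights" is a CP (rank-R) factorization of a multilinear map from several feature views to many labels: each view is projected onto R latent components, the projections are multiplied elementwise, and a final matrix maps them to per-term scores. Multiplying the views is what produces polynomial cross-view interactions. The sketch below is a generic version of that idea under assumed shapes; GO-LTR's exact parameterization, activations and training procedure are not given in the abstract.

```python
import numpy as np


def cp_multiview_predict(views, factors, out_weights):
    """Multi-label probabilities from a rank-R CP-factorized multilinear model.

    views: list of feature vectors x_v, shape (d_v,)
    factors: list of matrices U_v, shape (d_v, R), one per view
    out_weights: matrix W, shape (R, L), for L labels (e.g. GO terms)
    The implicit weight tensor is sum_r U_1[:, r] x ... x U_V[:, r] x W[r, :].
    """
    latent = np.ones(out_weights.shape[0])
    for x, u in zip(views, factors):
        latent = latent * (x @ u)          # multiplicative cross-view coupling
    logits = latent @ out_weights
    return 1.0 / (1.0 + np.exp(-logits))   # independent sigmoid per label


rng = np.random.default_rng(0)
x_seq, x_net = rng.normal(size=4), rng.normal(size=3)   # two toy "views"
u_seq, u_net = rng.normal(size=(4, 5)), rng.normal(size=(3, 5))
w = rng.normal(size=(5, 6))                              # rank 5, 6 labels
print(cp_multiview_predict([x_seq, x_net], [u_seq, u_net], w))
```

The factorized form never materializes the full d_1 × d_2 × L weight tensor, which is what keeps such models tractable when views are high-dimensional and the label space spans thousands of GO terms.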

Computational Biology , Proteins , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Databases, Protein , Algorithms

12.

Identifying and avoiding radiation damage in macromolecular crystallography.

Shelley, Kathryn L; Garman, Elspeth F.

Acta Crystallogr D Struct Biol ; 80(Pt 5): 314-327, 2024 May 01.

Article En | MEDLINE | ID: mdl-38700059

Radiation damage remains one of the major impediments to accurate structure solution in macromolecular crystallography. The artefacts of radiation damage can manifest as structural changes that result in incorrect biological interpretations being drawn from a model, they can reduce the resolution to which data can be collected, and they can even prevent structure solution entirely. In this article, we discuss how to identify and mitigate the effects of radiation damage at each stage of the macromolecular crystal structure-solution pipeline.

Macromolecular Substances , Crystallography, X-Ray/methods , Macromolecular Substances/chemistry , Models, Molecular , Proteins/chemistry

13.

Exploring the fragmentation efficiency of proteins analyzed by MALDI-TOF-TOF tandem mass spectrometry using computational and statistical analyses.

Park, Jihyun; Fagerquist, Clifton K.

PLoS One ; 19(5): e0299287, 2024.

Article En | MEDLINE | ID: mdl-38701058

Matrix-assisted laser desorption/ionization time-of-flight-time-of-flight (MALDI-TOF-TOF) tandem mass spectrometry (MS/MS) is a rapid technique for identifying intact proteins from unfractionated mixtures by top-down proteomic analysis. MS/MS allows isolation of specific intact protein ions prior to fragmentation, allowing fragment ions to be attributed to a specific precursor ion. However, the fragmentation efficiency of mature, intact protein ions by MS/MS post-source decay (PSD) varies widely, and the biochemical and structural factors of the protein that contribute to it are poorly understood. With the advent of protein structure prediction algorithms such as AlphaFold2, we now have access to predicted structures for proteins for which no crystal structure exists. In this work, we use a statistical approach to explore the properties of bacterial proteins that can affect their gas-phase dissociation via PSD. We extract various protein properties from AlphaFold2 predictions and analyze their effect on fragmentation efficiency. Our results show that the fragmentation efficiencies from cleavage of the polypeptide backbone on the C-terminal side of glutamic acid (E) and asparagine (N) residues were nearly equal. In addition, we found that rearrangement and cleavage on the C-terminal side of aspartic acid (D) residues resulting from the aspartic acid effect (AAE) were more efficient than for E and N residues. From residue interaction network analysis, we identified several local centrality measures and discuss their implications for the AAE. We also confirmed the selective cleavage of the backbone at D-proline bonds in proteins and extended this observation to N-proline bonds. Finally, we note an enhancement of the AAE mechanism when the residue on the C-terminal side of D, E and N residues is glycine. To the best of our knowledge, this is the first report of this phenomenon. 
Our study demonstrates the value of using statistical analyses of protein sequences and their predicted structures to better understand the fragmentation of the intact protein ions in the gas phase.
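The residue-level bookkeeping behind such an analysis can be sketched as enumerating, for each sequence, the backbone bonds C-terminal to D, E and N residues and recording whether the next residue is proline or glycine. This only lists candidate sites; the paper's statistical modelling of AlphaFold2-derived properties and network centralities is not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def candidate_cleavages(seq, residues="DEN"):
    """Bonds C-terminal to residues in `residues` as (position, residue, next).

    position is 1-based; the bond lies between residue `position` and
    `position + 1`. The C-terminal residue has no following bond.
    """
    return [(i + 1, seq[i], seq[i + 1])
            for i in range(len(seq) - 1) if seq[i] in residues]


def tally_neighbours(sites):
    """Count sites by residue type and whether the next residue is P or G."""
    return Counter((res, nxt if nxt in "PG" else "other") for _, res, nxt in sites)


sites = candidate_cleavages("MDPKEGANVD")   # hypothetical sequence
print(sites)                                # [(2, 'D', 'P'), (5, 'E', 'G'), (8, 'N', 'V')]
print(tally_neighbours(sites))
```

Comparing such counts with the fragment ions actually observed at each site is one simple way to estimate residue-specific fragmentation efficiency.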

Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Tandem Mass Spectrometry , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Tandem Mass Spectrometry/methods , Bacterial Proteins/chemistry , Proteomics/methods , Algorithms , Proteins/chemistry , Proteins/analysis

14.

DeepSS2GO: protein function prediction from secondary structure.

Song, Fu V; Su, Jiaqi; Huang, Sixing; Zhang, Neng; Li, Kaiyue; Ni, Ming; Liao, Maofu.

Brief Bioinform ; 25(3)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38701416

Predicting protein function is crucial for understanding biological processes, preventing diseases and developing new drug targets. In recent years, methods based on sequence, structure and biological networks for protein function annotation have been extensively researched. Although obtaining a protein's three-dimensional structure through experimental or computational methods enhances the accuracy of function prediction, the sheer volume of proteins sequenced by high-throughput technologies presents a significant challenge. To address this issue, we introduce DeepSS2GO (Secondary Structure to Gene Ontology), a deep neural network predictor that incorporates secondary structure features along with primary sequence and homology information. The algorithm combines the speed of sequence-based information with the accuracy of structure-based features, while streamlining the redundant data in primary sequences and bypassing time-consuming tertiary structure analysis. The results show that its prediction performance surpasses that of state-of-the-art algorithms. It can predict key functions by effectively utilizing secondary structure information, rather than broadly predicting general Gene Ontology terms. Additionally, DeepSS2GO predicts five times faster than advanced algorithms, making it highly applicable to massive sequencing data. The source code and trained models are available at https://github.com/orca233/DeepSS2GO.
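Secondary structure enters such a model as an additional per-residue sequence. A common preprocessing step, shown below, collapses the eight DSSP states to three (helix H, strand E, coil C) and one-hot encodes them. Whether DeepSS2GO uses three or eight states, and this particular mapping, are assumptions not stated in the abstract.

```python
# One common 8-to-3 reduction: H, G, I -> helix; E, B -> strand; all else -> coil
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_ss(dssp):
    """Map an 8-state DSSP string to the 3-state alphabet H/E/C."""
    return "".join(DSSP_TO_3.get(c, "C") for c in dssp)


def one_hot(ss3, alphabet="HEC"):
    """One-hot encode a 3-state string as a list of per-residue vectors."""
    return [[1 if c == a else 0 for a in alphabet] for c in ss3]


ss = reduce_ss("HHGIEBTS-")   # -> "HHHHEECCC"
print(ss, one_hot(ss)[:2])
```

A secondary-structure string is much shorter to compute and store than atomic coordinates, which is the kind of speed advantage the abstract attributes to avoiding tertiary structure analysis.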

Algorithms , Computational Biology , Neural Networks, Computer , Protein Structure, Secondary , Proteins , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Computational Biology/methods , Databases, Protein , Gene Ontology , Sequence Analysis, Protein/methods , Software

15.

Scoring alignments by embedding vector similarity.

Ashrafzadeh, Sepehr; Golding, G Brian; Ilie, Silvana; Ilie, Lucian.

Brief Bioinform ; 25(3)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38695119

Sequence similarity is of paramount importance in biology, as similar sequences tend to have similar functions and share common ancestry. Scoring matrices, such as PAM or BLOSUM, play a crucial role in bioinformatics algorithms for identifying similarities, but they have the drawback of being fixed and independent of context. We propose a new, context-dependent scoring method for amino acid similarity that remedies this weakness. It relies on recent advances in deep learning architectures that employ self-supervised learning to leverage enormous amounts of unlabelled data and generate contextual embeddings, which are vector representations of words. These ideas have been applied to protein sequences, producing embedding vectors for protein residues. We define the E-score between two residues as the cosine similarity between their embedding vectors. Thorough testing on a wide variety of reference multiple sequence alignments indicates that the alignments produced using the new E-score method, especially ProtT5-score, are significantly better than those obtained using BLOSUM matrices. The new method changes the way alignments are computed, with far-reaching implications for all areas of textual data that use sequence similarity. The program to compute alignments based on various E-scores is available as a web server at e-score.csd.uwo.ca. The source code is freely available for download from github.com/lucian-ilie/E-score.
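The E-score itself is the cosine similarity between two residues' embedding vectors, so it can replace a BLOSUM lookup inside any dynamic-programming aligner. The sketch below plugs it into a Needleman-Wunsch global alignment with a linear gap penalty. The gap value and the toy one-hot "embeddings" are assumptions (real inputs would be per-residue vectors from a model such as ProtT5), and the published tool's gap model may differ.

```python
import numpy as np


def e_score(u, v):
    """Cosine similarity between two residue embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def align(emb_a, emb_b, gap=-0.5):
    """Needleman-Wunsch global alignment scored with E-scores.

    emb_a: (n, D), emb_b: (m, D) per-residue embeddings. Returns the optimal
    score and the list of aligned (i, j) residue index pairs.
    """
    n, m = len(emb_a), len(emb_b)
    dp = np.zeros((n + 1, m + 1))
    dp[:, 0] = gap * np.arange(n + 1)      # leading gaps in b
    dp[0, :] = gap * np.arange(m + 1)      # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i, j] = max(dp[i - 1, j - 1] + e_score(emb_a[i - 1], emb_b[j - 1]),
                           dp[i - 1, j] + gap,
                           dp[i, j - 1] + gap)
    # Traceback: recover which move produced each optimum
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if np.isclose(dp[i, j], dp[i - 1, j - 1] + e_score(emb_a[i - 1], emb_b[j - 1])):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(dp[i, j], dp[i - 1, j] + gap):
            i -= 1
        else:
            j -= 1
    return float(dp[n, m]), pairs[::-1]


a = np.eye(3)                 # toy embeddings: three mutually orthogonal residues
score, pairs = align(a, a[[0, 2]])
print(score, pairs)           # the middle residue of `a` is placed opposite a gap
```

Because each E-score depends on the surrounding sequence through the embedding, the same amino acid pair can score differently in different contexts, which is the property a fixed matrix lacks.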

Algorithms , Computational Biology , Sequence Alignment , Sequence Alignment/methods , Computational Biology/methods , Software , Sequence Analysis, Protein/methods , Amino Acid Sequence , Proteins/chemistry , Proteins/genetics , Deep Learning , Databases, Protein

16.

Freeprotmap: waiting-free prediction method for protein distance map.

Huang, Jiajian; Li, Jinpeng; Chen, Qinchang; Wang, Xia; Chen, Guangyong; Tang, Jin.

BMC Bioinformatics ; 25(1): 176, 2024 May 04.

Article En | MEDLINE | ID: mdl-38704533

BACKGROUND: Protein residue-residue distance maps are used for remote homology detection, protein information estimation, and protein structure research. However, existing prediction approaches are time-consuming, and hundreds of millions of proteins are discovered each year, necessitating the development of a rapid and reliable prediction method for protein residue-residue distances. Moreover, because many proteins lack known homologous sequences, a waiting-free and alignment-free deep learning method is needed. RESULT: In this study, we propose a learning framework named FreeProtMap. In terms of protein representation processing, the proposed group pooling in FreeProtMap effectively mitigates issues arising from high-dimensional sparseness in protein representation. In terms of model architecture, we made several deliberate design choices. First, the model is designed around the locality of protein structures and triangle-inequality distance constraints to improve prediction accuracy. Second, inference speed is improved by using additive attention and a lightweight design. In addition, generalization is improved by using bottlenecks and a neural network block named local microformer. As a result, FreeProtMap can predict protein residue-residue distances in tens of milliseconds and has higher precision than the best structure prediction method. CONCLUSION: Several groups of comparative experiments and ablation experiments verify the effectiveness of the designs. The results demonstrate that FreeProtMap significantly outperforms other state-of-the-art methods in accurate protein residue-residue distance prediction, which benefits many areas of protein research. 
Notably, FreeProtMap could be used to scan all proteins discovered each year for structurally similar proteins in a short time, because structure-similarity calculations based on distance maps are much less time-consuming than algorithms based on 3D structures.
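Two ideas from this abstract are simple to make concrete: a residue-residue distance map must satisfy the triangle inequality d_ij ≤ d_ik + d_kj, which FreeProtMap uses as a design constraint, and two distance maps can be compared far more cheaply than two 3D structures. The sketch below computes a map from coordinates, counts triangle-inequality violations in a (possibly predicted) map, and scores equal-length maps by contact overlap. The 8 Å contact cutoff, the sequence-separation filter and the Jaccard-style overlap are illustrative assumptions, not the paper's similarity measure.

```python
import numpy as np


def distance_map(coords):
    """(N, N) Euclidean distance map from (N, 3) residue coordinates."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)


def triangle_violations(dmap, tol=1e-6):
    """Count ordered triples (i, j, k) with d_ij > d_ik + d_kj + tol.

    Uses O(N^3) memory, which is fine for a sketch but not for long proteins.
    """
    lhs = dmap[:, :, None]                        # d_ij, indexed [i, j, k]
    rhs = dmap[:, None, :] + dmap.T[None, :, :]   # d_ik + d_kj, indexed [i, j, k]
    return int(np.count_nonzero(lhs > rhs + tol))


def contact_overlap(d1, d2, cutoff=8.0, min_sep=6):
    """Jaccard overlap of contacts (d < cutoff, |i - j| >= min_sep) in two equal-length maps."""
    i, j = np.triu_indices(d1.shape[0], k=min_sep)
    c1, c2 = d1[i, j] < cutoff, d2[i, j] < cutoff
    union = np.count_nonzero(c1 | c2)
    return np.count_nonzero(c1 & c2) / union if union else 1.0


rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(scale=2.2, size=(40, 3)), axis=0)  # toy chain (Å)
dmap = distance_map(coords)
noisy = dmap + rng.normal(scale=1.0, size=dmap.shape)            # mock "prediction"
noisy = np.abs((noisy + noisy.T) / 2)
np.fill_diagonal(noisy, 0.0)
print(triangle_violations(dmap), triangle_violations(noisy))
print(contact_overlap(dmap, noisy))
```

Comparing proteins of different lengths would additionally require a residue correspondence (an alignment), which this sketch deliberately leaves out.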

Proteins , Proteins/chemistry , Computational Biology/methods , Databases, Protein , Protein Conformation , Algorithms , Sequence Analysis, Protein/methods , Neural Networks, Computer

17.

Studying protein stability in crowded environments by NMR.

Xu, Guohua; Cheng, Kai; Liu, Maili; Li, Conggang.

Prog Nucl Magn Reson Spectrosc ; 140-141: 42-48, 2024.

Article En | MEDLINE | ID: mdl-38705635

Most proteins perform their functions in crowded and complex cellular environments where weak interactions are ubiquitous between biomolecules. These complex environments can modulate the protein folding energy landscape and hence affect protein stability. NMR is a nondestructive and effective method to quantify the kinetics and equilibrium thermodynamic stability of proteins at an atomic level within crowded environments and living cells. Here, we review NMR methods that can be used to measure protein stability, as well as findings of studies on protein stability in crowded environments mimicked by polymer and protein crowders and in living cells. The important effects of chemical interactions on protein stability are highlighted and compared to spatial excluded volume effects.
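A widely used NMR read-out of stability, and a plausible example of the kind of method reviewed here, is amide hydrogen exchange. Under EX2 conditions, the observed exchange rate k_obs of a protected amide relates to the intrinsic (unprotected) rate k_int through the protection factor P = k_int/k_obs, and the opening free energy is ΔG = RT ln P. The sketch below applies that relation to compare a dilute and a crowded condition; the rates are hypothetical, and the EX2 assumption must be checked experimentally (for example, from the pH dependence of k_obs).

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def delta_g_hx(k_int, k_obs, temp_k=298.15):
    """Opening free energy (kJ/mol) from amide H/D exchange in the EX2 limit.

    k_int: intrinsic exchange rate of the unprotected amide (1/s)
    k_obs: measured exchange rate in the protein (1/s)
    """
    protection = k_int / k_obs
    return R * temp_k * math.log(protection)


def delta_delta_g(k_int, k_obs_dilute, k_obs_crowded, temp_k=298.15):
    """Stability change (kJ/mol) on crowding; positive means stabilization."""
    return delta_g_hx(k_int, k_obs_crowded, temp_k) - delta_g_hx(k_int, k_obs_dilute, temp_k)


# Hypothetical amide: k_int = 10 1/s, k_obs = 1e-4 1/s in buffer, 5e-5 1/s with crowder
print(f"{delta_g_hx(10.0, 1e-4):.1f} kJ/mol")
print(f"{delta_delta_g(10.0, 1e-4, 5e-5):.2f} kJ/mol")
```

Since k_int cancels in ΔΔG, crowding effects can be read directly from the ratio of observed exchange rates, as long as the crowder does not itself alter intrinsic exchange chemistry.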

Nuclear Magnetic Resonance, Biomolecular , Protein Stability , Proteins , Proteins/chemistry , Nuclear Magnetic Resonance, Biomolecular/methods , Thermodynamics , Humans , Protein Folding , Kinetics , Magnetic Resonance Spectroscopy/methods

18.

Evaluating large language models for annotating proteins.

Vitale, Rosario; Bugnon, Leandro A; Fenoy, Emilio Luis; Milone, Diego H; Stegmayer, Georgina.

Brief Bioinform ; 25(3)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38706315

To date, more than 251 million proteins have been deposited in UniProtKB. However, only 0.25% of them have been annotated with one of the more than 15,000 possible Pfam family domains. The current annotation protocol integrates knowledge from manually curated family domains, obtained using sequence alignments and hidden Markov models. This approach has been successful at automatically growing the Pfam annotations, but at a low rate compared with protein discovery. A few years ago, deep learning models were proposed for automatic Pfam annotation. However, these models demand a considerable amount of training data, which can be a challenge for poorly populated families. To address this issue, we propose and evaluate here a novel protocol based on transfer learning. This requires the use of protein large language models (LLMs), trained with self-supervision on big unannotated datasets, to obtain sequence embeddings. The embeddings can then be used with supervised learning on a small annotated dataset for a specialized task. In this protocol, we evaluated several cutting-edge protein LLMs together with machine learning architectures to improve the current prediction of protein domain annotations. Results are significantly better than the state of the art for protein family classification, reducing the prediction error by 60% compared with standard methods. We explain how LLM embeddings can be used for protein annotation in a concrete and easy way, and provide the pipeline in a GitHub repository. Full source code and data are available at https://github.com/sinc-lab/llm4pfam.

Databases, Protein , Proteins , Proteins/chemistry , Molecular Sequence Annotation/methods , Computational Biology/methods , Machine Learning
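
The transfer-learning protocol has two stages: a self-supervised protein LLM produces embeddings, and a supervised model trained on a small labeled set maps them to Pfam families. The sketch below assumes the embeddings are already computed (toy 2-D vectors stand in for real LLM outputs), mean-pools per-residue vectors into one vector per sequence, and uses a nearest-centroid classifier as a deliberately simple stand-in for the architectures the authors evaluated. The labels `family_A`/`family_B` and all function names are hypothetical; the authors' actual pipeline is in their repository.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Column-wise mean: turns per-residue embeddings into one per-sequence embedding."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_centroids(embeddings: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Supervised step: one centroid per family from a small labeled set."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for emb, label in zip(embeddings, labels):
        grouped[label].append(emb)
    return {label: mean_pool(members) for label, members in grouped.items()}


def predict(centroids: Dict[str, Vector], embedding: Vector) -> str:
    """Assign the family whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], embedding))


# Hypothetical 2-D per-residue embeddings; a real protein LLM returns far larger vectors.
train_seqs = [
    [[1.0, 0.0], [0.8, 0.2]],  # family_A
    [[0.9, 0.1]],              # family_A
    [[0.0, 1.0], [0.2, 0.8]],  # family_B
]
train_labels = ["family_A", "family_A", "family_B"]
centroids = fit_centroids([mean_pool(s) for s in train_seqs], train_labels)
query = mean_pool([[0.7, 0.3], [0.9, 0.1]])
print(predict(centroids, query))  # family_A
```

Because the LLM stays frozen, only the small supervised stage is fitted, which is what makes the approach usable for families with few labeled members.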

19.

GSScore: a novel Graphormer-based shell-like scoring method for protein-ligand docking.

Guo, Linyuan; Wang, Jianxin.

Brief Bioinform ; 25(3)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38706316

Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery. However, because experimental methods are complex and costly, there is great demand for computational approaches that recognize PLI patterns, such as protein-ligand docking. In recent years, a growing number of machine learning models have been developed to directly predict the root mean square deviation (RMSD) of a ligand docking pose with respect to its native binding pose. However, new scoring methods are still urgently needed for more accurate RMSD prediction. We present GSScore, a new deep learning-based scoring method for RMSD prediction of protein-ligand docking poses, built on the Graphormer method and a shell-like graph architecture. To recognize near-native conformations among a set of poses, GSScore takes atoms as nodes and represents the protein-ligand docking interface as multiple bipartite graphs within different shell ranges. Benefiting from the Graphormer and shell-like graph architecture, GSScore can effectively capture the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses without extra information. GSScore was extensively evaluated on diverse test sets, including a subset of PDBbind version 2019, CASF-2016, and DUD-E, and obtained significant improvements over existing methods in terms of RMSE, R (Pearson correlation coefficient), Spearman correlation coefficient, and docking power.

Molecular Docking Simulation , Proteins , Ligands , Proteins/chemistry , Proteins/metabolism , Protein Binding , Software , Algorithms , Computational Biology/methods , Protein Conformation , Databases, Protein , Deep Learning
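
GSScore regresses the RMSD of a docking pose from a graph of the protein-ligand interface split into distance shells. Two ingredients can be sketched without the Graphormer model itself: the RMSD label being predicted, and the partition of ligand-protein atom pairs into one bipartite graph per shell. The shell boundaries, the half-open interval convention, and the function names are assumptions for illustration; GSScore's actual cutoffs and atom features are not given in the abstract.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Shell = Tuple[float, float]


def pose_rmsd(pose: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD (Å) between matched ligand atoms, both already in the receptor frame.

    No superposition is applied, since docking poses share the protein's
    coordinate frame; atom symmetry is ignored for simplicity.
    """
    if len(pose) != len(native):
        raise ValueError("pose and native must list the same atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose, native))
    return math.sqrt(squared / len(pose))


def shell_bipartite_edges(
    ligand: Sequence[Coord], protein: Sequence[Coord], shells: Sequence[Shell]
) -> Dict[Shell, List[Tuple[int, int]]]:
    """Group ligand-protein atom pairs into one bipartite edge list per distance shell."""
    graphs: Dict[Shell, List[Tuple[int, int]]] = {shell: [] for shell in shells}
    for i, lig_atom in enumerate(ligand):
        for j, prot_atom in enumerate(protein):
            d = math.dist(lig_atom, prot_atom)
            for lo, hi in shells:
                if lo <= d < hi:  # half-open shell [lo, hi)
                    graphs[(lo, hi)].append((i, j))
                    break
    return graphs


# Hypothetical shells in Å and toy coordinates.
shells = [(0.0, 4.0), (4.0, 6.0), (6.0, 8.0)]
ligand = [(0.0, 0.0, 0.0)]
protein = [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0), (7.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(shell_bipartite_edges(ligand, protein, shells))
print(pose_rmsd([(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 1.0
```

Protein atoms beyond the outermost shell (here the atom at 9 Å) get no edge, so each graph stays limited to the interface region.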

20.

Biodynamer Nano-Complexes and -Emulsions for Peptide and Protein Drug Delivery.

Liu, Yun; Hamm, Timo; Eichinger, Thomas Ralf; Kamm, Walter; Wieland, Heike Andrea; Loretz, Brigitta; Hirsch, Anna K H; Lee, Sangeun; Lehr, Claus-Michael.

Int J Nanomedicine ; 19: 4429-4449, 2024.

Article En | MEDLINE | ID: mdl-38784761

Background: Therapeutic proteins and peptides offer great advantages over traditional synthetic molecular drugs. However, stable protein loading and precise control of protein release pose significant challenges due to the wide range of physicochemical properties inherent to proteins. Given the diverse nature of therapeutic proteins, a comprehensive protein delivery strategy is imperative. Methods: Biodynamers are amphiphilic proteoid dynamic polymers consisting of amino acid derivatives connected through pH-responsive dynamic covalent chemistry. Taking advantage of the amphiphilic nature of the biodynamers, PNCs and DEs could be prepared and were investigated to compare their delivery efficiency in terms of drug loading, stability, and cell uptake. Results: The optimized PNCs showed 3-fold higher encapsulation (<90%) and 5-fold higher loading capacity (30%) than DE-NPs. PNCs enhanced delivery efficiency into cells but aggregated easily on the cell membrane owing to their limited stability. Although DE-NPs had lower loading capacity than PNCs, they exhibited superior stability and adaptability for delivering a wider range of proteins. Conclusion: Our study highlights the potential of formulating both PNCs and DE-NPs from the same biodynamers, providing a comparative view of protein delivery efficacy across formulation methods.

Emulsions , Peptides , Peptides/chemistry , Peptides/administration & dosage , Peptides/pharmacokinetics , Emulsions/chemistry , Humans , Proteins/chemistry , Proteins/administration & dosage , Proteins/pharmacokinetics , Drug Delivery Systems/methods , Polymers/chemistry , Nanoparticles/chemistry , Hydrogen-Ion Concentration , Amino Acids/chemistry , Drug Carriers/chemistry , Drug Carriers/pharmacokinetics , Drug Liberation , Cell Survival/drug effects
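
The abstract compares formulations by encapsulation and loading capacity. Definitions vary between studies and the abstract does not state which were used; a common convention, assumed here, is EE = encapsulated protein / protein added × 100 and LC = encapsulated protein / (carrier + encapsulated protein) × 100. The sketch computes both and the fold changes between two formulations; all masses and function names are hypothetical and not taken from the study.

```python
def encapsulation_efficiency(encapsulated: float, total_added: float) -> float:
    """Percentage of the protein added to the formulation that ends up encapsulated."""
    return 100.0 * encapsulated / total_added


def loading_capacity(encapsulated: float, carrier_mass: float) -> float:
    """Encapsulated protein as a percentage of total particle mass (carrier + protein)."""
    return 100.0 * encapsulated / (carrier_mass + encapsulated)


# Hypothetical masses in micrograms for two formulations of the same protein.
ee_a = encapsulation_efficiency(80.0, 100.0)
ee_b = encapsulation_efficiency(40.0, 100.0)
lc_a = loading_capacity(80.0, 320.0)
lc_b = loading_capacity(40.0, 760.0)
print(f"EE fold change: {ee_a / ee_b:.1f}, LC fold change: {lc_a / lc_b:.1f}")
```

Some studies instead divide by carrier mass alone for LC, which gives larger values for the same data, so fold changes are only comparable when both formulations use the same definition.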