Results 1 - 20 of 41,569
1.
Biotechnol J ; 19(8): e2400203, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39115336

ABSTRACT

Through iterative rounds of mutation and selection, proteins can be engineered to enhance their desired biological functions. Nevertheless, identifying optimal mutation sites for directed evolution remains challenging due to the vastness of the protein sequence landscape and the epistatic mutational effects across residues. To address this challenge, we introduce MLSmut, a deep learning-based approach that leverages multi-level structural features of proteins. MLSmut extracts salient information from protein co-evolution, sequence semantics, and geometric features to predict the mutational effect. Extensive benchmark evaluations on 10 single-site and two multi-site deep mutation scanning datasets demonstrate that MLSmut surpasses existing methods in predicting mutational outcomes. To overcome the limited training data availability, we employ a two-stage training strategy: initial coarse-tuning on a large corpus of unlabeled protein data followed by fine-tuning on a curated dataset of 40-100 experimental measurements. This approach enables our model to achieve satisfactory performance on downstream protein prediction tasks. Importantly, our model holds the potential to predict the mutational effects of any protein sequence. Collectively, these findings suggest that our approach can substantially reduce the reliance on laborious wet lab experiments and deepen our understanding of the intricate relationships between mutations and protein function.


Subject(s)
Deep Learning , Mutation , Proteins , Proteins/genetics , Proteins/chemistry , Computational Biology/methods , Databases, Protein , Protein Engineering/methods
2.
Nat Commun ; 15(1): 6699, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39107330

ABSTRACT

Post-translational modifications (PTMs) are pivotal in modulating protein functions and influencing cellular processes like signaling, localization, and degradation. The complexity of these biological interactions necessitates efficient predictive methodologies. In this work, we introduce PTMGPT2, an interpretable protein language model that utilizes prompt-based fine-tuning to improve its accuracy in precisely predicting PTMs. Drawing inspiration from recent advancements in GPT-based architectures, PTMGPT2 adopts unsupervised learning to identify PTMs. It utilizes a custom prompt to guide the model through the subtle linguistic patterns encoded in amino acid sequences, generating tokens indicative of PTM sites. To provide interpretability, we visualize attention profiles from the model's final decoder layer to elucidate sequence motifs essential for molecular recognition and analyze the effects of mutations at or near PTM sites to offer deeper insights into protein functionality. Comparative assessments reveal that PTMGPT2 outperforms existing methods across 19 PTM types, underscoring its potential in identifying disease associations and drug targets.


Subject(s)
Protein Processing, Post-Translational , Proteins/metabolism , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Humans , Computational Biology/methods , Algorithms , Databases, Protein
3.
Nat Commun ; 15(1): 7400, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39191788

ABSTRACT

Significant research progress has been made in the field of protein structure and fitness prediction. In particular, single-sequence-based structure prediction methods like ESMFold and OmegaFold achieve a balance between inference speed and prediction accuracy, showing promise for many downstream prediction tasks. Here, we propose SPIRED, a single-sequence-based structure prediction model that exhibits comparable performance to the state-of-the-art methods but with approximately 5-fold acceleration in inference and at least one order of magnitude reduction in training consumption. By integrating SPIRED with downstream neural networks, we compose an end-to-end framework named SPIRED-Fitness for the rapid prediction of both protein structure and fitness from a single sequence with satisfactory accuracy. Moreover, SPIRED-Stab, a derivative of SPIRED-Fitness, achieves state-of-the-art performance in predicting the mutational effects on protein stability.


Subject(s)
Neural Networks, Computer , Protein Conformation , Proteins , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Computational Biology/methods , Algorithms , Protein Stability , Models, Molecular , Sequence Analysis, Protein/methods , Mutation
4.
Hum Genomics ; 18(1): 89, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39192324

ABSTRACT

We describe the machine learning tool that we applied in the CAGI 6 experiment to predict whether single-residue mutations in proteins are deleterious or benign. This tool was trained using only single sequences, i.e., without multiple sequence alignments or structural information. Instead, we used global characterizations of the protein sequence. Training and testing data for human gene mutations were obtained from ClinVar (ncbi.nlm.nih.gov/pub/ClinVar/), and for non-human gene mutations from UniProt (www.uniprot.org). Testing was done on post-training data from ClinVar. This testing yielded high AUC and Matthews correlation coefficient (MCC) for well-trained examples but low generalizability. For genes with either sparse or unbalanced training data, the prediction accuracy is poor. The resulting prediction server is available online at http://www.mamiris.com/Shoni.cagi6.
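The abstract reports MCC alongside AUC and notes that unbalanced training data hurts accuracy. As a minimal sketch of why MCC is informative in that setting, the standard formula can be computed from confusion-matrix counts as below. The counts are hypothetical and are not taken from the paper; returning 0.0 for a zero denominator is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (an assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical unbalanced gene: 95 benign and 5 deleterious variants,
# scored by a classifier that calls every variant benign.
tp, tn, fp, fn = 0, 95, 0, 5
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("accuracy:", accuracy)                       # 0.95, looks good
print("MCC:", matthews_corrcoef(tp, tn, fp, fn))   # 0.0, no real signal
```

In this illustration accuracy is 0.95 while MCC is 0.0, which is why MCC is often preferred when one class dominates.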


Subject(s)
Machine Learning , Mutation, Missense , Humans , Mutation, Missense/genetics , Software , Computational Biology/methods , Proteins/genetics
5.
J Chem Inf Model ; 64(16): 6676-6683, 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39116039

ABSTRACT

AlphaFold 3 (AF3), the latest version of protein structure prediction software, goes beyond its predecessors by predicting protein-protein complexes. It could revolutionize drug discovery and protein engineering, marking a major step toward comprehensive, automated protein structure prediction. However, independent validation of AF3's predictions is necessary. In this work, we evaluate AF3 complex structures using the SKEMPI 2.0 database, which comprises 317 protein-protein complexes and 8338 mutations. AF3 complex structures, when applied to the most advanced topological deep learning (TDL) model, MT-TopLap (MultiTask-Topological Laplacian), give a very good Pearson correlation coefficient of 0.86 for predicting protein-protein binding free energy changes upon mutation, which is slightly less than the 0.88 achieved earlier with the Protein Data Bank (PDB) structures. Nonetheless, AF3 complex structures led to an 8.6% increase in the prediction RMSE compared to original PDB complex structures. Additionally, some of AF3's complex structures have large errors, which were not captured in its ipTM performance metric. Finally, it is found that AF3's complex structures are not reliable for intrinsically flexible regions or domains.
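The abstract reports both a Pearson correlation and an RMSE, and the two can move independently. The sketch below, using made-up binding free energy changes rather than any SKEMPI 2.0 data, shows a prediction with a constant offset: the correlation stays perfect while the RMSE does not.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical experimental values (kcal/mol) and a prediction that is
# systematically 1.0 kcal/mol too high.
experimental = [0.5, 1.2, -0.3, 2.0]
predicted = [value + 1.0 for value in experimental]

print("Pearson r:", pearson_r(experimental, predicted))
print("RMSE:", rmse(experimental, predicted))
```

A small drop in correlation together with a larger relative rise in RMSE, as reported for AF3 structures, is therefore not contradictory.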


Subject(s)
Databases, Protein , Mutation , Protein Binding , Proteins , Software , Thermodynamics , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Protein Conformation , Models, Molecular
6.
Nat Commun ; 15(1): 6839, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39122697

ABSTRACT

There has been a dramatic increase in the identification of non-canonical translation and a significant expansion of the protein-coding genome. Among the strategies used to identify unannotated small Open Reading Frames (smORFs) that encode microproteins, Ribosome profiling (Ribo-Seq) is the gold standard for the annotation of novel coding sequences by reporting on smORF translation. In Ribo-Seq, ribosome-protected footprints (RPFs) that map to multiple genomic sites are removed since they cannot be unambiguously assigned to a specific genomic location. Furthermore, RPFs necessarily result in short (25-34 nucleotides) reads, increasing the chance of multi-mapping alignments, such that smORFs residing in these regions cannot be identified by Ribo-Seq. Moreover, it has been challenging to identify protein evidence for Ribo-Seq. To solve this, we developed Rp3, a pipeline that integrates proteogenomics and Ribosome profiling to provide unambiguous evidence for a subset of microproteins missed by current Ribo-Seq pipelines. Here, we show that Rp3 maximizes proteomics detection and confidence of microprotein-encoding smORFs.


Subject(s)
Open Reading Frames , Proteogenomics , Ribosomes , Ribosomes/metabolism , Ribosomes/genetics , Proteogenomics/methods , Open Reading Frames/genetics , Protein Biosynthesis , Humans , Proteomics/methods , Proteins/genetics , Proteins/metabolism , Ribosome Profiling
7.
Int J Mol Sci ; 25(15), 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39125888

ABSTRACT

Statistical analyses of homologous protein sequences can identify amino acid residue positions that co-evolve to generate family members with different properties. Based on the hypothesis that the coevolution of residue positions is necessary for maintaining protein structure, coevolutionary traits revealed by statistical models provide insight into residue-residue interactions that are important for understanding protein mechanisms at the molecular level. With the rapid expansion of genome sequencing databases that facilitate statistical analyses, this sequence-based approach has been used to study a broad range of protein families. An emerging application of this approach is to design hybrid transcriptional regulators as modular genetic sensors for novel wiring between input signals and genetic elements to control outputs. Among many allosterically regulated regulator families, the members contain structurally conserved and functionally independent protein domains, including a DNA-binding module (DBM) for interacting with a specific genetic element and a ligand-binding module (LBM) for sensing an input signal. By hybridizing a DBM and an LBM from two different family members, a hybrid regulator can be created with a new combination of signal-detection and DNA-recognition properties not present in natural systems. In this review, we present recent advances in the development of hybrid regulators and their applications in cellular engineering, especially focusing on the use of statistical analyses for characterizing DBM-LBM interactions and hybrid regulator design. Based on these studies, we then discuss the current limitations and potential directions for enhancing the impact of this sequence-based design approach.


Subject(s)
Evolution, Molecular , Models, Statistical , Protein Engineering/methods , Humans , Amino Acid Sequence , Proteins/genetics , Proteins/chemistry , Proteins/metabolism
8.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39129360

ABSTRACT

The genetic blueprint for the essential functions of life is encoded in DNA, which is translated into proteins-the engines driving most of our metabolic processes. Recent advancements in genome sequencing have unveiled a vast diversity of protein families, but compared with the massive search space of all possible amino acid sequences, the set of known functional families is minimal. One could say nature has a limited protein "vocabulary." A major question for computational biologists, therefore, is whether this vocabulary can be expanded to include useful proteins that went extinct long ago or have never evolved (yet). By merging evolutionary algorithms, machine learning, and bioinformatics, we can develop highly customized "designer proteins." We dub the new subfield of computational evolution, which employs evolutionary algorithms with DNA string representations, biologically accurate molecular evolution, and bioinformatics-informed fitness functions, Evolutionary Algorithms Simulating Molecular Evolution.


Subject(s)
Algorithms , Computational Biology , Evolution, Molecular , Computational Biology/methods , Proteins/genetics , Proteins/chemistry , Proteins/metabolism , Computer Simulation
9.
NPJ Syst Biol Appl ; 10(1): 87, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39134558

ABSTRACT

Network controllability is unifying the traditional control theory with the structural network information rooted in many large-scale biological systems of interest, from intracellular networks in molecular biology to brain neuronal networks. In controllability approaches, the set of minimum driver nodes is not unique, and critical nodes are the most important control elements because they appear in all possible solution sets. On the other hand, a common but largely unexplored feature in network control approaches is the probabilistic failure of edges or the uncertainty in the determination of interactions between molecules. This is particularly true when directed probabilistic interactions are considered. Until now, no efficient algorithm existed to determine critical nodes in probabilistic directed networks. Here we present a probabilistic control model based on a minimum dominating set framework that integrates the probabilistic nature of directed edges between molecules and determines the critical control nodes that drive the entire network functionality. The proposed algorithm, combined with the developed mathematical tools, offers practical efficiency in determining critical control nodes in large probabilistic networks. The method is then applied to the human intracellular signal transduction network, revealing that critical control nodes are associated with important biological features and perturbed sets of genes in human diseases, including SARS-CoV-2 target proteins and rare disorders. We believe that the proposed methodology can be useful to investigate multiple biological systems in which directed edges are probabilistic in nature, whether in natural systems or when determined with large uncertainties in silico.
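The notion of a critical node, one that appears in every minimum solution set, can be illustrated on a toy graph. The sketch below is a deterministic, brute-force simplification and is not the authors' probabilistic algorithm: edge probabilities are ignored, and a node is assumed to dominate itself and its out-neighbours. The graph and the function names are hypothetical.

```python
from itertools import combinations


def minimum_dominating_sets(nodes, edges):
    """Return all minimum dominating sets of a small directed graph.

    Brute force over subsets, so only suitable for toy examples.
    """
    covers = {v: {v} for v in nodes}       # each node covers itself
    for src, dst in edges:
        covers[src].add(dst)               # and every out-neighbour
    everything = set(nodes)
    for size in range(1, len(nodes) + 1):
        found = [
            set(candidate)
            for candidate in combinations(nodes, size)
            if set().union(*(covers[v] for v in candidate)) == everything
        ]
        if found:                          # smallest feasible size reached
            return found
    return []


def critical_nodes(nodes, edges):
    """Nodes that belong to every minimum dominating set."""
    solutions = minimum_dominating_sets(nodes, edges)
    return set.intersection(*solutions) if solutions else set()


# Hypothetical four-node signalling graph.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")]

print(minimum_dominating_sets(nodes, edges))
print(critical_nodes(nodes, edges))
```

Here the two minimum sets are {A, C} and {A, D}, so only A is critical. Enumeration of this kind scales exponentially, which is why an efficient algorithm matters for large networks.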


Subject(s)
Algorithms , COVID-19 , SARS-CoV-2 , Signal Transduction , Humans , Signal Transduction/physiology , Signal Transduction/genetics , Computational Biology/methods , Proteins/metabolism , Proteins/genetics , Probability , Models, Biological , Models, Statistical , Systems Biology/methods
10.
Bioinformatics ; 40(8), 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39152995

ABSTRACT

MOTIVATION: Spaln is the earliest practical tool for self-sufficient genome mapping and spliced alignment of protein query sequences onto a mammalian-sized eukaryotic genomic sequence. However, its computational speed has become inadequate for the analysis of rapidly growing genomic and transcript sequence data. RESULTS: The dynamic programming calculation of Spaln has been sped up in two ways: (i) the introduction of the multi-intermediate unidirectional Hirschberg method and (ii) SIMD-based vectorization. The new version, Spaln3, is ∼7 times faster than the latest Spaln version 2, and its gene prediction accuracy is consistently higher than that of Miniprot. AVAILABILITY AND IMPLEMENTATION: https://github.com/ogotoh/spaln.
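For readers unfamiliar with the Hirschberg method, its basic building block is a dynamic programming pass that keeps only one row of the score matrix, so memory grows linearly with sequence length. The sketch below shows that building block for plain global alignment. It is not Spaln3's multi-intermediate unidirectional variant, it does no SIMD vectorization or spliced alignment, and the scoring parameters are arbitrary assumptions.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-1):
    """Last row of the Needleman-Wunsch score matrix in O(len(b)) memory.

    Entry j is the best global alignment score of all of `a` against
    the first j characters of `b`.
    """
    prev = [j * gap for j in range(len(b) + 1)]    # row for the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        cur = [i * gap]                            # aligning a[:i] to nothing
        for j, char_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if char_a == char_b else mismatch)
            cur.append(max(diagonal, prev[j] + gap, cur[j - 1] + gap))
        prev = cur                                 # discard the older row
    return prev


# Identical sequences score one match per residue.
print(nw_last_row("GATTACA", "GATTACA")[-1])
```

The full Hirschberg algorithm runs this pass forwards on one half of the first sequence and backwards on the other half, then recurses at the column where the two score rows sum to the maximum.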


Subject(s)
Chromosome Mapping , Software , Chromosome Mapping/methods , Sequence Alignment/methods , RNA Splicing , Algorithms , Animals , Humans , Genome , Proteins/genetics , Proteins/chemistry , Genomics/methods
11.
BMC Bioinformatics ; 25(1): 282, 2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39198740

ABSTRACT

BACKGROUND: Thermostability is a fundamental property that allows proteins to maintain their biological functions. Predicting protein stability changes upon mutation is important for understanding protein structure-function relationships, and is also of great interest in protein engineering and pharmaceutical design. RESULTS: Here we present mutDDG-SSM, a deep learning-based framework that uses the geometric representations encoded in protein structure to predict mutation-induced protein stability changes. mutDDG-SSM consists of two parts: a graph attention network-based protein structural feature extractor that is trained with a self-supervised learning scheme using large-scale high-resolution protein structures, and an eXtreme Gradient Boosting model-based stability change predictor, which helps alleviate overfitting. The performance of mutDDG-SSM was tested on several widely used independent datasets. Then, myoglobin and p53 were used as case studies to illustrate the effectiveness of the model in predicting protein stability changes upon mutations. Our results show that mutDDG-SSM achieved high performance in estimating the effects of mutations on protein stability. In addition, mutDDG-SSM exhibited good unbiasedness, in that the prediction accuracy on the inverse mutations is as good as that on the direct mutations. CONCLUSION: Meaningful features can be extracted from our pre-trained model to build downstream tasks, and our model may serve as a valuable tool for protein engineering and drug design.
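The comparison between direct and inverse mutations rests on a thermodynamic identity that can be written out briefly. The notation below is an assumption for illustration and does not come from the abstract: $\Delta G_X$ is the folding free energy of variant $X$, and $\widehat{\Delta\Delta G}$ is a model prediction.

```latex
\begin{align}
  \Delta\Delta G_{A \to B} &= \Delta G_B - \Delta G_A, \\
  \Delta\Delta G_{B \to A} &= \Delta G_A - \Delta G_B = -\,\Delta\Delta G_{A \to B}.
\end{align}
% An unbiased predictor should therefore approximately satisfy
\begin{equation}
  \widehat{\Delta\Delta G}_{A \to B} + \widehat{\Delta\Delta G}_{B \to A} \approx 0 .
\end{equation}
```

A predictor trained mostly on destabilizing direct mutations can violate this antisymmetry, which is why accuracy on inverse mutations is commonly reported as a separate check.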


Subject(s)
Mutation , Protein Stability , Proteins , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Myoglobin/chemistry , Myoglobin/genetics , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/metabolism , Computational Biology/methods , Deep Learning , Supervised Machine Learning , Databases, Protein , Protein Conformation
12.
Proc Natl Acad Sci U S A ; 121(34): e2314999121, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39133844

ABSTRACT

Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.


Subject(s)
Epistasis, Genetic , Mutation , Evolution, Molecular , Proteins/genetics , Proteins/chemistry , Proteins/metabolism , Catalytic Domain , Protein Engineering/methods
13.
J Chem Inf Model ; 64(15): 6216-6229, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39092854

ABSTRACT

The critical importance of accurately predicting mutations in protein metal-binding sites for advancing drug discovery and enhancing disease diagnostic processes cannot be overstated. In response to this imperative, MetalTrans emerges as an accurate predictor for disease-associated mutations in protein metal-binding sites. The core innovation of MetalTrans lies in its seamless integration of multifeature splicing with the Transformer framework, a strategy that ensures exhaustive feature extraction. Central to MetalTrans's effectiveness is its deep feature combination strategy, which merges evolutionary-scale modeling amino acid embeddings with ProtTrans embeddings, thus shedding light on the biochemical properties of proteins. Employing the Transformer component, MetalTrans leverages the self-attention mechanism to delve into higher-level representations. Utilizing mutation site information for feature fusion not only enriches the feature set but also sidesteps the common pitfall of overestimation linked to protein sequence-based predictions. This nuanced approach to feature fusion is a key differentiator, enabling MetalTrans to outperform existing methods significantly, as evidenced by comparative analyses. Our evaluations across varied metal-binding site data sets (specifically Zn, Ca, Mg, and Mix) underscore MetalTrans's superior performance, with average AUC values of 0.971, 0.965, 0.980, and 0.945, respectively, on multiple 5-fold cross-validation. Remarkably, against the multichannel convolutional neural network method on a benchmark independent test set, MetalTrans demonstrated unparalleled robustness and superiority, achieving an AUC score of 0.998 on multiple 5-fold cross-validation. Our comprehensive examination of the predicted outcomes further confirms the effectiveness of the model. The source codes, data sets, and prediction results for MetalTrans can be accessed for academic usage at https://github.com/EduardWang/MetalTrans.


Subject(s)
Metals , Mutation , Binding Sites , Metals/chemistry , Metals/metabolism , Humans , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Models, Molecular , Computational Biology/methods , Databases, Protein
14.
Int J Mol Sci ; 25(15), 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39125949

ABSTRACT

Proteins, as crucial macromolecules performing diverse biological roles, are central to numerous biological processes. The ability to predict changes in protein thermal stability due to mutations is vital for both biomedical research and industrial applications. However, existing experimental methods are often costly and labor-intensive, while structure-based prediction methods demand significant computational resources. In this study, we introduce PON-Tm, a novel sequence-based method for predicting mutation-induced thermal stability variations in proteins. PON-Tm not only incorporates features predicted by a protein language model from protein sequences but also considers environmental factors such as pH and the thermostability of the wild-type protein. To evaluate the effectiveness of PON-Tm, we compared its performance to four well-established methods, and PON-Tm exhibited superior predictive capabilities. Furthermore, to facilitate easy access and utilization, we have developed a web server.


Subject(s)
Mutation, Missense , Protein Stability , Proteins , Proteins/chemistry , Proteins/genetics , Computational Biology/methods , Amino Acid Sequence , Software
15.
Nat Commun ; 15(1): 6462, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39085232

ABSTRACT

Epithelial ovarian cancer (EOC) is a deadly disease with limited diagnostic biomarkers and therapeutic targets. Here we conduct a comprehensive proteomic profiling of ovarian tissue and plasma samples from 813 patients with different histotypes and therapeutic regimens, covering the expression of 10,715 proteins. We identify eight proteins associated with tumor malignancy in the tissue specimens, which are further validated as potential circulating biomarkers in plasma. Targeted proteomics assays are developed for 12 tissue proteins and 7 blood proteins, and machine learning models are constructed to predict one-year recurrence, which are validated in an independent cohort. These findings contribute to the understanding of EOC pathogenesis and provide potential biomarkers for early detection and monitoring of the disease. Additionally, by integrating mutation analysis with proteomic data, we identify multiple proteins related to DNA damage in recurrent resistant tumors, shedding light on the molecular mechanisms underlying treatment resistance. This study provides a multi-histotype proteomic landscape of EOC, advancing our knowledge for improved diagnosis and treatment strategies.


Subject(s)
Carcinoma, Ovarian Epithelial , Proteins , Proteome , Carcinoma, Ovarian Epithelial/diagnosis , Carcinoma, Ovarian Epithelial/genetics , Carcinoma, Ovarian Epithelial/pathology , Biomarkers, Tumor/blood , Machine Learning , Mutation , Humans , Female , Adult , Middle Aged , Prognosis , DNA Repair/genetics , Proteins/genetics , Proteins/metabolism , China
16.
Biol Pharm Bull ; 47(7): 1376-1382, 2024.
Article in English | MEDLINE | ID: mdl-39085077

ABSTRACT

Shwachman-Diamond syndrome (SDS) is an autosomal recessive disease caused by mutation in the Shwachman-Bodian-Diamond syndrome (SBDS) gene. SDS has a variety of clinical features, including exocrine pancreatic insufficiency and hematological dysfunction. Neutropenia is the most common symptom in patients with SDS. SDS is also associated with an elevated risk of developing myelodysplastic syndromes and acute myeloid leukemia. The SBDS protein is involved in ribosome biogenesis, ribosomal RNA metabolism, stabilization of mitotic spindles and cellular stress responses, yet the detailed function of SBDS is still incompletely understood. Given the diverse functions of SBDS, its effects appear to differ between cells and tissues. In this study, we established a 32Dcl3 myeloid cell line carrying a common pathogenic SBDS variant in intron 2, 258+2T>C, on both alleles, and examined the resulting cellular damage. We found that protein synthesis was markedly decreased in the mutant cells. Furthermore, reactive oxygen species (ROS) production was increased, and oxidation of the mitochondrial membrane lipids and DNA damage were induced. These findings provide new insights into the cellular and molecular pathology caused by SBDS deficiency in myeloid cells.


Subject(s)
DNA Damage , Mitochondrial Membranes , Mutation , Reactive Oxygen Species , Reactive Oxygen Species/metabolism , Animals , Mice , Mitochondrial Membranes/metabolism , Cell Line , Oxidation-Reduction , Myeloid Cells/metabolism , Proteins/metabolism , Proteins/genetics , Shwachman-Diamond Syndrome
17.
Nat Commun ; 15(1): 6170, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39043654

ABSTRACT

Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves state-of-the-art (SOTA) performance on accurately identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time, such as: Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, and a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000X fewer proteins and has 548X fewer parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.


Subject(s)
Mutation , Protein Stability , Proteins , Thermodynamics , Proteins/genetics , Proteins/chemistry , Protein Engineering/methods , Models, Molecular , Algorithms , Neural Networks, Computer , Protein Conformation , Computational Biology/methods
18.
Wiley Interdiscip Rev RNA ; 15(4): e1867, 2024.
Article in English | MEDLINE | ID: mdl-39048533

ABSTRACT

The mechanics of how proteins are generated from mRNA is increasingly well understood. However, much less is known about how protein production is coordinated and orchestrated within the crowded intracellular environment, especially in eukaryotic cells. Recent studies suggest that localized sites exist for the coordinated production of specific proteins. These sites have been termed "translation factories" and roles in protein complex formation, protein localization, inheritance, and translation regulation have been postulated. In this article, we review the evidence supporting the translation of mRNA at these sites, the details of their mechanism of formation, and their likely functional significance. Finally, we consider the key uncertainties regarding these elusive structures in cells. This article is categorized under: Translation; Translation > Mechanisms; RNA Export and Localization > RNA Localization; Translation > Regulation.


Subject(s)
Protein Biosynthesis , RNA, Messenger , RNA, Messenger/metabolism , RNA, Messenger/genetics , Animals , Humans , Proteins/metabolism , Proteins/genetics
19.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-39038934

ABSTRACT

From the catalytic breakdown of nutrients to signaling, interactions between metabolites and proteins play an essential role in cellular function. An important case is cell-cell communication, where metabolites, secreted into the microenvironment, initiate signaling cascades by binding to intra- or extracellular receptors of neighboring cells. Protein-protein cell-cell communication interactions are routinely predicted from transcriptomic data. However, inferring metabolite-mediated intercellular signaling remains challenging, partially due to the limited size of intercellular prior knowledge resources focused on metabolites. Here, we leverage knowledge-graph infrastructure to integrate generalistic metabolite-protein with curated metabolite-receptor resources to create MetalinksDB. MetalinksDB is an order of magnitude larger than existing metabolite-receptor resources and can be tailored to specific biological contexts, such as diseases, pathways, or tissue/cellular locations. We demonstrate MetalinksDB's utility in identifying deregulated processes in renal cancer using multi-omics bulk data. Furthermore, we infer metabolite-driven intercellular signaling in acute kidney injury using spatial transcriptomics data. MetalinksDB is a comprehensive and customizable database of intercellular metabolite-protein interactions, accessible via a web interface (https://metalinks.omnipathdb.org/) and programmatically as a knowledge graph (https://github.com/biocypher/metalinks). We anticipate that by enabling diverse analyses tailored to specific biological contexts, MetalinksDB will facilitate the discovery of disease-relevant metabolite-mediated intercellular signaling processes.


Subject(s)
Signal Transduction , Humans , Cell Communication , Kidney Neoplasms/metabolism , Kidney Neoplasms/genetics , Acute Kidney Injury/metabolism , Acute Kidney Injury/genetics , Computational Biology/methods , Proteins/metabolism , Proteins/genetics , Software , Transcriptome
20.
Sheng Wu Gong Cheng Xue Bao ; 40(7): 2087-2099, 2024 Jul 25.
Article in Chinese | MEDLINE | ID: mdl-39044577

ABSTRACT

With increasing computing power and the rapid expansion of biological data, the application of bioinformatics tools has become the mainstream approach to addressing biological problems. The accurate identification of protein function by bioinformatics tools is crucial for both biomedical research and drug discovery, making it a hot topic of research. In this paper, we group bioinformatics-based protein function prediction methods into three categories: protein sequence-based methods, protein structure-based methods, and protein interaction network-based methods. We further analyze the specific algorithms in each category, highlighting the latest research advancements and providing valuable references for the application of bioinformatics-based protein function prediction in biomedical research and drug discovery.


Subject(s)
Algorithms , Computational Biology , Proteins , Computational Biology/methods , Proteins/genetics , Proteins/metabolism , Proteins/chemistry , Protein Conformation , Protein Interaction Maps , Sequence Analysis, Protein , Amino Acid Sequence , Drug Discovery