1.
Res Sq ; 2024 Feb 23.
Article En | MEDLINE | ID: mdl-38464127

Designing proteins with improved functions requires a deep understanding of how sequence and function are related, a vast space that is hard to explore. The ability to efficiently compress this space by identifying functionally important features is extremely valuable. Here, we first establish a method called EvoScan to comprehensively segment and scan the high-fitness sequence space to obtain anchor points that capture its essential features, especially in high dimensions. Our approach is compatible with any biomolecular function that can be coupled to a transcriptional output. We then develop deep learning and large language models to accurately reconstruct the space from these anchors, allowing computational prediction of novel, highly fit sequences without prior homology-derived or structural information. We apply this hybrid experimental-computational method, which we call EvoAI, to a repressor protein and find that only 82 anchors are sufficient to compress the high-fitness sequence space with a compression ratio of 1048. The extreme compressibility of the space informs both applied biomolecular design and understanding of natural evolution.

2.
BMC Med Imaging ; 24(1): 20, 2024 Jan 19.
Article En | MEDLINE | ID: mdl-38243288

BACKGROUND: To explore the diagnostic value of multidetector computed tomography (MDCT) extramural vascular invasion (EMVI) in preoperative N staging of gastric cancer patients. METHODS: Based on the MR-defined EMVI scoring standard for rectal cancer, we developed a 5-point scoring system to evaluate the status of CT-detected extramural vascular invasion (ctEMVI): 0-2 points indicated ctEMVI-negative status, and 3-4 points indicated ctEMVI-positive status. Patients were divided into a ctEMVI-positive group and a ctEMVI-negative group. The correlation between ctEMVI and clinical features was analyzed. Receiver operating characteristic (ROC) curves were used to evaluate the diagnostic efficacy of ctEMVI for pathological metastatic lymph nodes and N staging. The sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) of pathological N staging using ctEMVI and short-axis diameter were generated and compared. RESULTS: The occurrence rate of lymphovascular invasion (LVI) and the proportion of tumors with a greatest diameter > 6 cm were higher in the ctEMVI-positive group than in the ctEMVI-negative group (P < 0.05). Spearman correlation analysis showed a positive correlation between ctEMVI and LVI, N stage, and tumor size (P < 0.05). For ctEMVI scores ≥ 3, the AUCs of ctEMVI for diagnosing lymph node metastasis, N stage ≥ N2, and N3 stage were 0.857, 0.802, and 0.758, respectively. The sensitivity, NPV, and accuracy of ctEMVI for diagnosing N stage ≥ N2 were superior to those of short-axis diameter (P < 0.05), while the sensitivity, specificity, PPV, NPV, and accuracy of ctEMVI for diagnosing N3 stage were superior to those of short-axis diameter (P < 0.05). CONCLUSION: ctEMVI has important value in diagnosing metastatic lymph nodes and advanced N staging. As an important imaging marker, ctEMVI can be included in the preoperative imaging evaluation of patients, providing important assistance for clinical guidance and treatment.


Multidetector Computed Tomography , Stomach Neoplasms , Humans , Stomach Neoplasms/diagnostic imaging , Stomach Neoplasms/surgery , Stomach Neoplasms/pathology , Neoplasm Invasiveness/diagnostic imaging , Neoplasm Invasiveness/pathology , Retrospective Studies , Lymph Nodes/pathology , Neoplasm Staging
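
The dichotomization and the diagnostic measures named in the abstract above can be illustrated with a short Python sketch. The 0-2 / 3-4 cut-off comes from the abstract; the function names and the toy data are hypothetical and are not taken from the study.

```python
def ctemvi_positive(score: int) -> bool:
    """Dichotomize a 0-4 ctEMVI score: 0-2 negative, 3-4 positive (cut-off from the abstract)."""
    if not 0 <= score <= 4:
        raise ValueError("ctEMVI score must be between 0 and 4")
    return score >= 3


def diagnostic_metrics(predicted, actual):
    """Return sensitivity, specificity, PPV, NPV and accuracy for paired boolean lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)              # true positives
    tn = sum((not p) and (not a) for p, a in pairs)  # true negatives
    fp = sum(p and (not a) for p, a in pairs)        # false positives
    fn = sum((not p) and a for p, a in pairs)        # false negatives
    # Note: a denominator of zero (e.g. no positive cases) is not handled here.
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data, for illustration only.
scores = [0, 1, 3, 4, 2, 3, 4, 0]
node_positive = [False, False, True, True, True, False, True, False]
metrics = diagnostic_metrics([ctemvi_positive(s) for s in scores], node_positive)
```

With these toy inputs each of the five measures evaluates to 0.75; real cohorts would of course give different values for each.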
3.
Eur J Radiol ; 171: 111303, 2024 Feb.
Article En | MEDLINE | ID: mdl-38215532

PURPOSE: The objective of this study was to establish and validate a preoperative risk scoring system that incorporated both clinical and computed tomography (CT) variables to predict recurrence-free survival (RFS) in gastric cancer (GC) patients who underwent curative resection. METHOD: We retrospectively included consecutive patients with surgically confirmed GC who underwent preoperative CT scans between October 2017 and January 2022. Multivariate Cox regression analysis was employed in the derivation set to identify clinical and CT variables associated with RFS and to construct a risk score. This risk score was subsequently validated in an independent test set. RESULTS: A total of 346 patients were included in the study, with 213 in the derivation set and 133 in the test set. Five variables, namely ctEMVI, ctBorrmann, visceral obesity, sarcopenia, and NLR, were independently associated with RFS. In the test set, the preoperative risk score exhibited a c-index of 0.741, which outperformed the predictive accuracy of pathological tumor staging (c-index of 0.673, p = 0.021) at various time points. The preoperative risk score effectively stratified patients into low and high-risk groups. CONCLUSION: The developed preoperative risk scoring system demonstrated the ability to predict RFS following curative resection in GC patients.


Stomach Neoplasms , Humans , Stomach Neoplasms/diagnostic imaging , Stomach Neoplasms/surgery , Prognosis , Retrospective Studies , Risk Factors , Tomography, X-Ray Computed
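
The c-index reported in the abstract above is a concordance measure. As a rough illustration, the following Python sketch computes Harrell's concordance index for right-censored data. It assumes the common pairwise definition (ties in risk count as one half); the abstract does not state which estimator the authors used, and the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time per patient
    events: True if recurrence was observed, False if censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, for illustration only.
c_index = harrell_c_index(
    times=[5, 10, 15, 20],
    events=[True, True, False, True],
    risk_scores=[0.9, 0.4, 0.6, 0.1],
)
```

In this toy case four of the five comparable pairs are ordered correctly, so the index is 0.8; a value of 0.5 corresponds to random ordering.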
4.
Cell Metab ; 35(6): 961-978.e10, 2023 06 06.
Article En | MEDLINE | ID: mdl-37178684

Metabolic alterations in the microenvironment significantly modulate tumor immunosensitivity, but the underlying mechanisms remain obscure. Here, we report that tumors depleted of fumarate hydratase (FH) exhibit inhibition of functional CD8+ T cell activation, expansion, and efficacy, with enhanced malignant proliferative capacity. Mechanistically, FH depletion in tumor cells accumulates fumarate in the tumor interstitial fluid, and increased fumarate can directly succinate ZAP70 at C96 and C102 and abrogate its activity in infiltrating CD8+ T cells, resulting in suppressed CD8+ T cell activation and anti-tumor immune responses in vitro and in vivo. Additionally, fumarate depletion by increasing FH expression strongly enhances the anti-tumor efficacy of anti-CD19 CAR T cells. Thus, these findings demonstrate a role for fumarate in controlling TCR signaling and suggest that fumarate accumulation in the tumor microenvironment (TME) is a metabolic barrier to CD8+ T cell anti-tumor function. Fumarate depletion could therefore be an important strategy for tumor immunotherapy.


CD8-Positive T-Lymphocytes , Neoplasms , Humans , Fumarates/pharmacology , Fumarates/metabolism , Tumor Microenvironment , Neoplasms/metabolism , Signal Transduction
5.
J Chem Theory Comput ; 18(7): 4529-4543, 2022 Jul 12.
Article En | MEDLINE | ID: mdl-35723447

Proteins usually need to transit between different conformational states to fulfill their biological functions. In the mechanistic study of such transition processes by molecular dynamics simulations, identification of the minimum free energy path (MFEP) can substantially reduce the sampling space, thus enabling rigorous thermodynamic evaluation of the process. Conventionally, the MFEP is derived by iterative local optimization from an initial path, which is typically generated by simple brute-force techniques such as targeted molecular dynamics (tMD). Therefore, the quality of the initial path determines the success of MFEP estimation. In this work, we propose a method to improve derivation of the initial path. Through iterative relaxation-biasing simulations in a bidirectional manner, this method can construct a feasible transition pathway connecting two known states for a protein. Evaluation on small, fast-folding proteins against long equilibrium trajectories supports the good sampling efficiency of our method. When applied to larger proteins including the catalytic domain of human c-Src kinase as well as the converter domain of myosin VI, the paths generated by our method deviate significantly from those computed with the generic tMD approach. More importantly, free energy profiles and intermediate states obtained from our paths exhibit remarkable improvements over those from tMD paths with respect to both physical rationality and consistency with a priori knowledge.


Molecular Dynamics Simulation , Proteins , Humans , Molecular Conformation , Protein Folding , Thermodynamics
6.
Brief Bioinform ; 23(3)2022 05 13.
Article En | MEDLINE | ID: mdl-35348602

Proteins with desired functions and properties are important in fields like nanotechnology and biomedicine. De novo protein design enables the production of previously unseen proteins from the ground up and is believed to be key to handling real social challenges. The recent introduction of deep learning into design methods has had a transformative influence and is expected to represent a promising and exciting future direction. In this review, we look back on the major aspects of current advances in deep-learning-based design procedures and illustrate their novelty in comparison with conventional knowledge-based approaches through notable cases. We not only describe deep learning developments in structure-based protein design and direct sequence design, but also highlight recent applications of deep reinforcement learning in protein design. The future perspectives on design goals, challenges and opportunities are also comprehensively discussed.


Deep Learning , Knowledge Bases , Proteins
7.
Bioinformatics ; 38(3): 648-654, 2022 01 12.
Article En | MEDLINE | ID: mdl-34643684

MOTIVATION: As one of the most important post-translational modifications (PTMs), protein lysine crotonylation (Kcr) has attracted wide attention because it is involved in important physiological activities, such as cell differentiation and metabolism. However, experimental methods for Kcr identification are expensive and time-consuming. Instead, computational methods can predict Kcr sites in silico with high efficiency and low cost. RESULTS: In this study, we proposed a novel predictor, BERT-Kcr, for protein Kcr site prediction, which was developed by using a transfer learning method with pre-trained bidirectional encoder representations from transformers (BERT) models. These models were originally used for natural language processing (NLP) tasks, such as sentence classification. Here, we converted each amino acid into a word as the input information to the pre-trained BERT model. The features encoded by BERT were extracted and then fed to a BiLSTM network to build our final model. Compared with the models built by other machine learning and deep learning classifiers, BERT-Kcr achieved the best performance with an AUROC of 0.983 for 10-fold cross validation. Further evaluation on the independent test set indicates that BERT-Kcr outperforms the state-of-the-art model Deep-Kcr with an improvement of about 5% for AUROC. The results of our experiment indicate that the direct use of sequence information and advanced pre-trained models of NLP could be an effective way of identifying PTM sites of proteins. AVAILABILITY AND IMPLEMENTATION: The BERT-Kcr model is publicly available on http://zhulab.org.cn/BERT-Kcr_models/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Lysine , Machine Learning , Lysine/metabolism , Language , Natural Language Processing , Protein Processing, Post-Translational
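
The abstract above describes treating each amino acid as a word before passing a sequence to a pre-trained BERT model. A minimal Python sketch of that preprocessing idea follows. The window size (`flank=15`), the padding symbol `X`, and the function names are assumptions chosen for illustration; the abstract does not specify them, and no model is loaded here.

```python
def lysine_windows(sequence, flank=15, pad="X"):
    """Yield (1-based position, window) for every lysine (K) in the sequence.

    Each window is centred on the lysine and padded at the termini so that
    all windows have the same length, 2 * flank + 1. Both `flank` and `pad`
    are hypothetical defaults, not values taken from the paper.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i + 1, padded[i : i + 2 * flank + 1]


def to_words(window):
    """Turn a peptide window into a space-separated 'sentence' of residues."""
    return " ".join(window)


# Hypothetical toy peptide, for illustration only.
windows = list(lysine_windows("MKTAYK", flank=2))
sentences = [to_words(w) for _, w in windows]
```

Each resulting sentence, such as `X M K T A`, could then be handed to a word-level tokenizer; which tokenizer and vocabulary the authors used is not stated in the abstract.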
8.
Adv Theory Simul ; 4(10): 2100152, 2021 Oct.
Article En | MEDLINE | ID: mdl-34901736

SARS-CoV-2 is the virus responsible for the COVID-19 pandemic. Early viral infection is mediated by the SARS-CoV-2 homo-trimeric Spike (S) protein with its receptor binding domains (RBDs) in the receptor-accessible state. A molecular dynamics simulation of the S protein with a focus on the function of its N-terminal domains (NTDs) is performed. The study reveals that the NTD acts as a "wedge" and plays a crucial regulatory role in the conformational changes of the S protein. The complete RBD structural transition is allowed only when the neighboring NTD, which typically prohibits the RBD's movements as a wedge, detaches and swings away. Based on this NTD "wedge" model, it is proposed that the NTD-RBD interface should be a potential drug target.

9.
Cancer Imaging ; 21(1): 40, 2021 May 26.
Article En | MEDLINE | ID: mdl-34039436

BACKGROUND: To establish and validate a high-resolution magnetic resonance imaging (HRMRI)-based radiomic nomogram for prediction of preoperative perineural invasion (PNI) of rectal cancer (RC). METHODS: Our retrospective study included 140 subjects with RC (99 in the training cohort and 41 in the validation cohort) who underwent a preoperative HRMRI scan between December 2016 and December 2019. All subjects underwent radical surgery, and then PNI status was evaluated by a qualified pathologist. A total of 396 radiomic features were extracted from oblique axial T2 weighted images, and optimal features were selected to construct a radiomic signature. A combined nomogram was established by incorporating the radiomic signature, HRMRI findings, and clinical risk factors selected by using multivariable logistic regression. RESULTS: The predictive nomogram of PNI included a radiomic signature, and MRI-reported tumor stage (mT-stage). Clinical risk factors failed to increase the predictive value. Favorable discrimination was achieved between PNI-positive and PNI-negative groups using the radiomic nomogram. The area under the curve (AUC) was 0.81 (95% confidence interval [CI], 0.71-0.91) in the training cohort and 0.75 (95% CI, 0.58-0.92) in the validation cohort. Moreover, our result highlighted that the radiomic nomogram was clinically beneficial, as evidenced by a decision curve analysis. CONCLUSIONS: HRMRI-based radiomic nomogram could be helpful in the prediction of preoperative PNI in RC patients.


Magnetic Resonance Imaging/methods , Nerve Sheath Neoplasms/etiology , Radiometry/methods , Rectal Neoplasms/diagnostic imaging , Rectal Neoplasms/radiotherapy , Adult , Aged , Aged, 80 and over , Female , Humans , Logistic Models , Male , Middle Aged , Nerve Sheath Neoplasms/pathology , Nomograms , Retrospective Studies
10.
Bioinformatics ; 37(22): 4075-4082, 2021 11 18.
Article En | MEDLINE | ID: mdl-34042965

MOTIVATION: Gradient descent-based protein modeling is a popular protein structure prediction approach that takes as input the predicted inter-residue distances and other necessary constraints and folds protein structures by minimizing protein-specific energy potentials. The constraints from multiple predicted protein properties provide redundant and sometimes conflicting information that can trap the optimization process in local minima and impair the modeling efficiency. RESULTS: To address these issues, we developed a self-adaptive protein modeling framework, SAMF. It eliminates redundancy of constraints and resolves conflicts, folds protein structures in an iterative way, and picks the best structures by a deep quality analysis system. Without a large amount of complicated domain knowledge and numerous patches as barriers, SAMF achieves state-of-the-art performance by exploiting the power of cutting-edge deep learning techniques. SAMF has a modular design and can be easily customized and extended. As the quality of input constraints is ever growing, the superiority of SAMF will be amplified over time. AVAILABILITY AND IMPLEMENTATION: The source code and data for reproducing the results are available at https://msracb.blob.core.windows.net/pub/psp/SAMF.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Proteins , Software , Proteins/metabolism
11.
Adv Sci (Weinh) ; 7(19): 2001314, 2020 Oct.
Article En | MEDLINE | ID: mdl-33042750

Predicting protein structure from the amino acid sequence has been a challenge with theoretical and practical significance in biophysics. Despite the recent progress elicited by improved inter-residue contact prediction, contact-based structure prediction has gradually reached the performance ceiling. New methods have been proposed to predict the inter-residue distance, but all of them simplify the real-valued distance prediction into a multiclass classification problem. Here, a lightweight regression-based distance prediction method is shown, which adopts the generative adversarial network to capture the delicate geometric relationship between residue pairs and thus could predict the continuous, real-valued inter-residue distance rapidly and satisfactorily. The predicted residue distance map allows quick structure modeling by the CNS suite, and the constructed models approach the same level of quality as the other state-of-the-art protein structure prediction methods when tested on CASP13 targets. Moreover, this method can be used directly for the structure prediction of membrane proteins without transfer learning.

12.
J Chem Theory Comput ; 16(8): 4813-4821, 2020 Aug 11.
Article En | MEDLINE | ID: mdl-32585102

Traditional molecular dynamics (MD) simulations have difficulties in tracking the slow molecular motions, at least partially due to the waste of sampling in already sampled regions. Here, we proposed a new enhanced sampling method, frontier expansion sampling (FEXS), to improve the sampling efficiency of molecular simulations by iteratively selecting seed structures diversely distributed at the "frontier" of an already sampled region to initiate new simulations. Different from other enhanced sampling methods, FEXS identifies the "frontier" seeds by integrating the Gaussian mixture model and the convex hull algorithm, which effectively improves the structural variation among the selected seeds and thus the descendant simulations. Validation in three protein systems, including the folding of chignolin, open-to-closed transition of maltodextrin binding protein, and internal conformational change of bovine pancreatic trypsin inhibitor, confirmed the effectiveness of this novel method in enhancing the sampling of conventional MD simulations to observe the large-scale protein conformational changes. When compared with other enhanced sampling methods like the structural dissimilarity sampling (SDS), FEXS reached at least the same level of sampling efficiency but was capable of providing complementary information in the three tested protein systems.


Molecular Dynamics Simulation , Proteins/chemistry , Algorithms , Protein Conformation , Protein Folding
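
The abstract above describes picking "frontier" seeds by combining a Gaussian mixture model with a convex hull algorithm. The Python sketch below illustrates only the convex hull half of that idea, and only for sampled conformations already projected onto two collective variables: the hull vertices are taken as candidate frontier seeds. The Gaussian mixture step is omitted, the two-dimensional projection is an assumption, and the function names are hypothetical; this is not the authors' implementation.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical 2D projection of sampled conformations, for illustration only.
sampled = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0.5)]
frontier_seeds = convex_hull(sampled)
```

Interior points such as `(1, 1)` are never selected, which is the property that steers new simulations away from regions that have already been sampled.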
13.
BMC Bioinformatics ; 21(1): 133, 2020 Apr 03.
Article En | MEDLINE | ID: mdl-32245403

BACKGROUND: Despite the great advance of protein structure prediction, accurate prediction of the structures of mainly β proteins is still highly challenging, but could be assisted by the knowledge of residue-residue pairing in β strands. Previously, we proposed a ridge-detection-based algorithm RDb2C that adopted a multi-stage random forest framework to predict the β-β pairing given the amino acid sequence of a protein. RESULTS: In this work, we developed a second version of this algorithm, RDb2C2, by employing the residual neural network to further enhance the prediction accuracy. In the benchmark test, this new algorithm improves the F1-score by > 10 percentage points, reaching impressively high values of ~ 72% and ~ 73% in the BetaSheet916 and BetaSheet1452 sets, respectively. CONCLUSION: Our new method promotes the prediction accuracy of β-β pairing to a new level and the prediction results could better assist the structure modeling of mainly β proteins. We prepared an online server of RDb2C2 at http://structpred.life.tsinghua.edu.cn/rdb2c2.html.


Algorithms , Protein Conformation, beta-Strand , Sequence Analysis, Protein/methods , Neural Networks, Computer
14.
Genomics Proteomics Bioinformatics ; 17(5): 478-495, 2019 10.
Article En | MEDLINE | ID: mdl-32035227

Accurate identification of compound-protein interactions (CPIs) in silico may deepen our understanding of the underlying mechanisms of drug action and thus remarkably facilitate drug discovery and development. Conventional similarity- or docking-based computational methods for predicting CPIs rarely exploit latent features from currently available large-scale unlabeled compound and protein data and often limit their usage to relatively small-scale datasets. In the present study, we propose DeepCPI, a novel general and scalable computational framework that combines effective feature embedding (a technique of representation learning) with powerful deep learning methods to accurately predict CPIs at a large scale. DeepCPI automatically learns the implicit yet expressive low-dimensional features of compounds and proteins from a massive amount of unlabeled data. Evaluations of the measured CPIs in large-scale databases, such as ChEMBL and BindingDB, as well as of the known drug-target interactions from DrugBank, demonstrated the superior predictive performance of DeepCPI. Furthermore, several interactions among small-molecule compounds and three G protein-coupled receptor targets (glucagon-like peptide-1 receptor, glucagon receptor, and vasoactive intestinal peptide receptor) predicted using DeepCPI were experimentally validated. The present study suggests that DeepCPI is a useful and powerful tool for drug discovery and repositioning. The source code of DeepCPI can be downloaded from https://github.com/FangpingWan/DeepCPI.


Deep Learning , User-Computer Interface , Area Under Curve , Databases, Chemical , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Proteins/chemistry , Proteins/metabolism , ROC Curve
15.
Comput Struct Biotechnol J ; 16: 503-510, 2018.
Article En | MEDLINE | ID: mdl-30505403

Information on residue-residue contacts is essential for understanding the mechanism of protein folding, and has been successfully applied as special topological restraints to simplify the conformational sampling in de novo protein structure prediction. Prediction of protein residue contacts has made remarkably rapid progress recently, with prediction accuracy approaching impressively high levels in the past two years. In this work, we introduce a second version of our residue contact predictor, DeepConPred2, which exhibits substantially improved performance and sufficiently reduced running time after model re-optimization and feature updates. When tested on the CASP12 free modeling targets, our program reaches at least the same level of prediction accuracy as the best contact predictors so far and provides information complementary to other state-of-the-art methods in contact-assisted folding.

16.
Science ; 362(6412)2018 10 19.
Article En | MEDLINE | ID: mdl-30190309

Voltage-gated sodium (Nav) channels, which are responsible for action potential generation, are implicated in many human diseases. Despite decades of rigorous characterization, the lack of a structure of any human Nav channel has hampered mechanistic understanding. Here, we report the cryo-electron microscopy structure of the human Nav1.4-β1 complex at 3.2-Å resolution. Accurate model building was made for the pore domain, the voltage-sensing domains, and the β1 subunit, providing insight into the molecular basis for Na+ permeation and kinetic asymmetry of the four repeats. Structural analysis of reported functional residues and disease mutations corroborates an allosteric blocking mechanism for fast inactivation of Nav channels. The structure provides a path toward mechanistic investigation of Nav channels and drug discovery for Nav channelopathies.


NAV1.4 Voltage-Gated Sodium Channel/chemistry , Voltage-Gated Sodium Channel beta-4 Subunit/chemistry , Allosteric Regulation , Amino Acid Sequence , Channelopathies/genetics , Channelopathies/metabolism , Cryoelectron Microscopy , Drug Discovery , HEK293 Cells , Humans , Mutation , NAV1.4 Voltage-Gated Sodium Channel/genetics , NAV1.4 Voltage-Gated Sodium Channel/ultrastructure , Protein Domains , Voltage-Gated Sodium Channel beta-4 Subunit/genetics , Voltage-Gated Sodium Channel beta-4 Subunit/ultrastructure
17.
BMC Bioinformatics ; 19(1): 146, 2018 04 19.
Article En | MEDLINE | ID: mdl-29673311

BACKGROUND: Despite the rapid progress of protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information of residue pairing in β strands could be extracted from a noisy contact map, due to the presence of characteristic contact patterns in β-β interactions. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we propose a novel ridge-detection-based β-β contact predictor to identify residue pairing in β strands from any predicted residue contact map. RESULTS: Our algorithm RDb2C adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then utilizes a novel multi-stage random forest framework to integrate the ridge information and additional features for prediction. Starting from the predicted contact map of CCMpred, RDb2C remarkably outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), and achieves F1-scores of ~ 62% and ~ 76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb2C achieves impressively higher performance, with F1-scores reaching ~ 76% and ~ 86% at the residue level and strand level, respectively. In a test of structural modeling using the top 1 L predicted contacts as constraints, for 61 mainly β proteins, the average TM-score achieves 0.442 when using the raw RaptorX-Contact prediction, but increases to 0.506 when using the improved prediction by RDb2C. CONCLUSION: Our method can significantly improve the prediction of β-β contacts from any predicted residue contact maps. Prediction results of our algorithm could be directly applied to effectively facilitate the practical structure prediction of mainly β proteins. AVAILABILITY: All source data and codes are available at http://166.111.152.91/Downloads.html or the GitHub address of https://github.com/wzmao/RDb2C.


Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry , Algorithms , Models, Molecular , Protein Conformation, beta-Strand , Protein Structure, Tertiary , Reproducibility of Results
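
The residue-level F1-scores quoted in the abstract above combine precision and recall over predicted residue pairs. The following Python sketch shows that calculation under the assumption that a pairing is an unordered pair of residue indices; the function name and the toy pairs are hypothetical, and the exact evaluation protocol of the paper is not given in the abstract.

```python
def pairing_f1(predicted_pairs, true_pairs):
    """F1-score of predicted residue pairings against a reference set.

    Pairs are treated as unordered, so (3, 20) and (20, 3) are the same pairing.
    """
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    actual = {tuple(sorted(p)) for p in true_pairs}
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy pairings, for illustration only.
f1 = pairing_f1(
    predicted_pairs=[(3, 20), (4, 19), (5, 18), (7, 30)],
    true_pairs=[(20, 3), (4, 19), (6, 17)],
)
```

Here two of four predictions are correct (precision 0.5) and two of three true pairings are recovered (recall 2/3), giving an F1-score of 4/7, or about 57%.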
20.
J R Soc Interface ; 14(137)2017 12.
Article En | MEDLINE | ID: mdl-29212760

The glycocalyx has a prominent role in orchestrating multiple biological processes occurring at the plasma membrane. In this paper, an all-atom flow/glycocalyx system is constructed with the bulk flow velocity in the physiologically relevant ranges for the first time. The system is simulated by molecular dynamics using 5.8 million atoms. Flow dynamics and statistics in the presence of the glycocalyx are presented and discussed. Complex dynamic behaviours of the glycocalyx, particularly the sugar chains, are observed in response to blood flow. In turn, the motion of the glycocalyx, including swing and swirling, disturbs the flow by altering the velocity profiles and modifying the vorticity distributions. As a result, the initially one-dimensional forcing is spread to all directions in the region near the endothelial cell surface. Furthermore, the coupled dynamics exist not only between the flow and the glycocalyx but also within the glycocalyx molecular constituents. Shear stress distributions in the one-dimer and three-dimer cases are also compared. Finally, potential force transmission pathways are discussed based on the dynamics of the glycocalyx constituents, which provides new insight into the mechanism of mechanotransduction of the glycocalyx. These findings have relevance in the pathologies of glycocalyx-related diseases, for example in renal or cardiovascular conditions.


Endothelial Cells/chemistry , Glycocalyx/chemistry , Models, Biological , Molecular Dynamics Simulation , Computer Simulation , Lipid Bilayers/chemistry
...