Search | VHL Regional Portal

1.

Recent advances and challenges in protein complex model accuracy estimation.

Liang, Fang; Sun, Meng; Xie, Lei; Zhao, Xuanfeng; Liu, Dong; Zhao, Kailong; Zhang, Guijun.

Comput Struct Biotechnol J ; 23: 1824-1832, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38707538

ABSTRACT

Estimation of model accuracy (EMA) plays a crucial role in protein structure prediction, aiming to evaluate the quality of predicted protein structure models accurately and objectively. This process is not only key to screening candidate models that are close to the real structure, but also provides guidance for further optimization of protein structures. With the significant advances made by AlphaFold2 in monomer structure prediction, the problem of single-domain protein structure prediction has been largely solved. Correspondingly, the importance of assessing the quality of single-domain protein models has decreased, and the research focus has shifted to estimating the model accuracy of protein complexes. In this review, our goal is to provide a comprehensive overview of the reference and statistical metrics, representative methods, and current challenges within four distinct facets (Topology Global Score, Interface Total Score, Interface Residue-Wise Score, and Tertiary Residue-Wise Score) in the field of complex EMA.

2.

SPDesign: protein sequence designer based on structural sequence profile using ultrafast shape recognition.

Wang, Hui; Liu, Dong; Zhao, Kailong; Wang, Yajun; Zhang, Guijun.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38600663

ABSTRACT

Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in protein structure prediction, we explored whether the representation of a structural sequence profile can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profiles using ultrafast shape recognition. Given an input backbone structure, SPDesign utilizes ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. Combined with structural pre-trained knowledge and geometric features, the profile is then fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms state-of-the-art methods, such as ProteinMPNN, PiFold and LM-Design, leading to 21.89%, 15.54% and 11.4% accuracy gains in sequence recovery rate on the CATH 4.2 benchmark, respectively. Encouraging results have also been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis conducted with the PDBench tool suggests that SPDesign performs well in subdivided structures. More interestingly, we found that SPDesign can reconstruct well the sequences of some proteins that have similar structures but different sequences. Finally, the structural modeling verification experiment indicates that the sequences designed by SPDesign can fold into the native structures more accurately.

Subject(s)

Neural Networks, Computer , Proteins , Sequence Alignment , Amino Acid Sequence , Proteins/chemistry , Sequence Analysis, Protein/methods
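
Note on the abstract above: it relies on ultrafast shape recognition (USR) vectors to speed up structure search. The sketch below shows one common formulation of a USR descriptor (distance distributions from four reference points, three moments each) and the usual similarity score. It is an illustration only, not SPDesign's implementation; the function names, the exact moment definitions, and the toy coordinates are assumptions.

```python
import math

def _moments(values):
    """Mean, variance, and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, var, skew]

def usr_descriptor(coords):
    """Return a 12-number shape descriptor for a list of (x, y, z) points."""
    n = len(coords)
    ctd = tuple(sum(p[i] for p in coords) / n for i in range(3))  # centroid
    d_ctd = [math.dist(p, ctd) for p in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # point closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # point farthest from the centroid
    d_fct = [math.dist(p, fct) for p in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # point farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(p, ref) for p in coords]))
    return descriptor

def usr_similarity(a, b):
    """Score in (0, 1]; 1.0 means identical descriptors."""
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_abs_diff)

# Toy coordinates (hypothetical, not taken from any real structure).
toy = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
toy_descriptor = usr_descriptor(toy)
```

Because the descriptor is built only from internal distances, it does not change under translation or rotation, which is what makes it cheap to compare against a large structure database before running a full structure alignment.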

3.

Improving the Performance of Si/PEDOT:PSS Hybrid Solar Cells with More Economical and Environmentally Friendly Alcohol Ether Solvents.

Zhang, Guijun; Peng, Hua; Wei, Qianwen; Zhou, Zheng; Wu, Haixia; Luo, Jingjing; Wang, Juan; Wen, Xiaoming; Yang, Yu.

ACS Omega ; 9(13): 15040-15051, 2024 Apr 02.

Article in English | MEDLINE | ID: mdl-38585058

ABSTRACT

The photoelectric characteristics of poly(3,4-ethylenedioxythiophene):polystyrene sulfonate (PEDOT:PSS) films significantly affect the power conversion efficiency and stability of Si/PEDOT:PSS hybrid solar cells. In this paper, we investigated PEDOT:PSS modification with alcohol ether solvents (dipropylene glycol methyl ether (DPM) and propylene glycol phenyl ether (PPH)). The reduction of PSS content and the transformation of the PEDOT chain from a benzene to a quinone structure in PEDOT:PSS, induced by doping with DPM or PPH, are the reasons for the improved conductivity of PEDOT:PSS films. DPM and PPH doping improves the quality of the silicon/PEDOT:PSS heterojunction and the silicon surface passivation, thereby reducing the surface recombination of charge carriers, which improves the photovoltaic performance of Si/PEDOT:PSS solar cells. A comparison of the power conversion efficiency (PCE) and air stability of Si/PEDOT:PSS solar cells with DPM (13.24%), PPH (13.51%), ethylene glycol (EG, 13.07%), and dimethyl sulfoxide (DMSO, 12.62%) suggests that doping with DPM and PPH can replace DMSO and EG to enhance the performance of Si/PEDOT:PSS solar cells. The EG and DMSO solvents not only have a certain toxicity to the human body but also are not environmentally friendly. In comparison to DMSO and EG, DPM and PPH are more economical and environmentally friendly, helping to reduce the manufacturing cost of Si/PEDOT:PSS solar cells and making them better suited to commercial applications.

4.

Regulating Surface Defects to Achieve More Positive Light Soaking Effect in Perovskite Solar Cells.

Zhang, Guijun; Wei, Qianwen; Liu, Guangsheng; Li, Qi; Lu, Junlin; Ghasemi, Mehri; Wang, Juan; Yang, Yu; Jia, Baohua; Wen, Xiaoming.

ACS Appl Mater Interfaces ; 16(11): 14263-14274, 2024 Mar 20.

Article in English | MEDLINE | ID: mdl-38441548

ABSTRACT

Dynamic defect tolerance under light soaking is a crucial aspect of halide perovskites. However, the underlying physics of light soaking remains elusive and is subject to debate, with both positive and negative effects reported. In this investigation, we demonstrated that surface defects in perovskite films significantly impact the performance and stability of perovskite solar cells and are closely correlated with light soaking behaviors. When the top surface layer is removed with adhesive tape, the surface defect density noticeably decreases, leading to enhanced photoluminescence (PL) efficiency, prolonged carrier lifetime, and higher conductivity. Consequently, the power conversion efficiency (PCE) of solar cells improves from 17.70% to 20.5%. Furthermore, we confirmed a positive correlation between surface defects and the light soaking effect. Perovskite films with low surface defects surprisingly exhibit a 3-fold increase in PL intensity and an 85% increase in carrier lifetime under 500 s of continuous illumination at an intensity of 100 mW/cm². Beyond the conventional strategy of suppressing defect trapping, we propose increasing the capability of dynamic defect tolerance as an effective strategy to enhance the optoelectronic properties and performance of perovskite solar cells.

5.

Effects of electroacupuncture on mitophagy mediated by SIRT3/PINK1/Parkin pathway in Parkinson's disease mice.

Zhang, Gui-Jun; Wang, Yao; Li, Jun-Ling; Ma, Jun; Wang, Yan-Chun.

Zhen Ci Yan Jiu ; 49(3): 221-230, 2024 Mar 25.

Article in English, Chinese | MEDLINE | ID: mdl-38500318

ABSTRACT

OBJECTIVES: To observe the effects of electroacupuncture (EA) at "Fengfu" (GV16), "Taichong" (LR3), and "Zusanli" (ST36) on mitophagy mediated by sirtuin 3 (SIRT3)/PTEN-induced putative kinase 1 (PINK1)/PARK2 gene coding protein (Parkin) in the midbrain substantia nigra of Parkinson's disease (PD) mice, and to explore the potential mechanisms of EA in treating PD. METHODS: C57BL/6 mice were randomly divided into the control, model, EA, and sham EA groups, with 12 mice in each group. The PD mouse model was established by intraperitoneal injection of 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP). The EA group received EA stimulation at GV16, LR3 and ST36, while the sham EA group received shallow needling 1 mm away from the above acupoints without electrical stimulation. The motor ability of mice in each group was evaluated using an open field experiment. Immunohistochemistry was used to detect the expression of tyrosine hydroxylase (TH) and α-synuclein (α-syn) in the substantia nigra of mice. The ultrastructure of neurons in the substantia nigra was observed by transmission electron microscopy (TEM). Immunofluorescence was used to detect the expression of the autophagy marker microtubule-associated protein light chain 3 (LC3). The mRNA and protein expression levels of TH, α-syn, SIRT3, PINK1, Parkin, P62, Beclin-1, and LC3-II were detected by PCR and Western blot. RESULTS: Compared with the control group, mice in the model group showed a decrease in the total exercise distance, time, movement speed and times of crossing the central region (P<0.01); the positive expressions of TH and LC3 were decreased (P<0.01), while the positive expression of α-syn increased (P<0.01), accompanied by mitochondrial swelling, mitochondrial cristae fragmentation and decrease, and decreased lysosome count; the expression levels of TH, SIRT3, PINK1, Parkin, Beclin-1, and LC3-II mRNA and protein in the midbrain substantia nigra were decreased (P<0.01), while the expression levels of α-syn and P62 mRNA and protein were increased (P<0.01, P<0.05). Compared with the model group, the mice in the EA group showed a significant increase in the total exercise distance, time, movement speed and times of crossing the central region (P<0.01, P<0.05); the positive expressions of TH and LC3 were increased (P<0.01, P<0.05), while the positive expression of α-syn was decreased (P<0.01), accompanied by an increase in mitochondrial count, appearance of autophagic vacuoles, and a decrease in swelling; the expression levels of TH, SIRT3, PINK1, Parkin, Beclin-1 and LC3-II mRNA and protein in the midbrain substantia nigra were increased (P<0.01, P<0.05), while the mRNA and protein expression levels of α-syn and P62 were decreased (P<0.01); the sham EA group showed an increase in the total exercise distance and time (P<0.05), with an increase in the positive expression of TH (P<0.05) and a decrease in the positive expression of α-syn (P<0.05); some mitochondria exhibited swelling, and no autophagic vacuoles were observed; the protein expression levels of TH, SIRT3, Parkin and LC3-II were increased (P<0.01, P<0.05), the expression levels of P62 mRNA and of α-syn mRNA and protein were decreased (P<0.01, P<0.05), and LC3-II mRNA expression was increased (P<0.05). In comparison to the sham EA group, the EA group showed an extension in the total exercise time (P<0.01); the positive expression and mRNA expression levels of α-syn were decreased (P<0.01, P<0.05), while the expression levels of TH, SIRT3, PINK1, Parkin mRNA and SIRT3 protein were increased (P<0.05). CONCLUSIONS: EA at GV16, LR3, and ST36 can exert a neuroprotective function and improve the motor ability of PD mice by activating the SIRT3/PINK1/Parkin pathway to enhance the expression of TH and reduce α-syn aggregation in the substantia nigra of PD mice.

Subject(s)

Electroacupuncture , Parkinson Disease , Sirtuin 3 , Mice , Animals , Parkinson Disease/genetics , Parkinson Disease/therapy , Sirtuin 3/genetics , Mitophagy/genetics , Protein Kinases/genetics , Beclin-1 , Mice, Inbred C57BL , Ubiquitin-Protein Ligases/genetics , Ubiquitin-Protein Ligases/metabolism , RNA, Messenger

6.

DEMO-EM2: assembling protein complex structures from cryo-EM maps through intertwined chain and domain fitting.

Zhang, Ziying; Cai, Yaxian; Zhang, Biao; Zheng, Wei; Freddolino, Lydia; Zhang, Guijun; Zhou, Xiaogen.

Brief Bioinform ; 25(2)2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38517699

ABSTRACT

The breakthrough in cryo-electron microscopy (cryo-EM) technology has led to an increasing number of density maps of biological macromolecules. However, constructing accurate protein complex atomic structures from cryo-EM maps remains a challenge. In this study, we extend our previously developed DEMO-EM to present DEMO-EM2, an automated method for constructing protein complex models from cryo-EM maps through an iterative assembly procedure intertwining chain- and domain-level matching and fitting for predicted chain models. The method was carefully evaluated on 27 cryo-electron tomography (cryo-ET) maps and 16 single-particle EM maps, where DEMO-EM2 models achieved an average TM-score of 0.92, outperforming those of state-of-the-art methods. The results demonstrate an efficient method that enables the rapid and reliable solution of challenging cryo-EM structure modeling problems.

Subject(s)

Cryoelectron Microscopy , Cryoelectron Microscopy/methods , Models, Molecular , Protein Conformation
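
Note on the abstract above: model quality is reported as a TM-score. As a worked illustration of how that number is formed, the sketch below evaluates the standard TM-score expression for one fixed superposition, given the distances between aligned residue pairs. It is a simplification: the real TM-score searches over superpositions for the maximum, which is omitted here. The function names and the lower bound of 0.5 Å on d0 for very short chains are assumptions of this sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms)."""
    if target_length <= 15:
        return 0.5  # guard assumed here; the cube-root term is not meaningful
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # lower bound for short chains (assumption of this sketch)

def tm_score_fixed_superposition(aligned_distances, target_length):
    """TM-score term for one superposition.

    aligned_distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the reference (native) structure.
    Unaligned residues contribute zero, so the score is at most
    len(aligned_distances) / target_length.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length
```

A perfect match (all distances zero, all residues aligned) gives 1.0, and every aligned pair sitting exactly at distance d0 contributes one half, which is why scores around 0.5 are usually read as "same overall fold".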

7.

Protein Multiple Conformation Prediction Using Multi-Objective Evolution Algorithm.

Hou, Minghua; Jin, Sirong; Cui, Xinyue; Peng, Chunxiang; Zhao, Kailong; Song, Le; Zhang, Guijun.

Interdiscip Sci ; 2024 Jan 08.

Article in English | MEDLINE | ID: mdl-38190097

ABSTRACT

The breakthrough of AlphaFold2 and the publication of AlphaFold DB represent a significant advance in the field of predicting static protein structures. However, AlphaFold2 models tend to represent a single static structure, and multiple-conformation prediction remains a challenge. In this work, we proposed a method named MultiSFold, which uses a distance-based multi-objective evolutionary algorithm to predict multiple conformations. To begin, multiple energy landscapes are constructed using different competing constraints generated by deep learning. Subsequently, an iterative modal exploration and exploitation strategy is designed to sample conformations, incorporating multi-objective optimization, geometric optimization and structural similarity clustering. Finally, the final population is generated using a loop-specific sampling strategy to adjust the spatial orientations. MultiSFold was evaluated against state-of-the-art methods using a benchmark set containing 80 protein targets, each characterized by two representative conformational states. Based on the proposed metric, MultiSFold achieves a remarkable success ratio of 56.25% in predicting multiple conformations, while AlphaFold2 only achieves 10.00%, which may indicate that conformational sampling combined with knowledge gained through deep learning has the potential to generate conformations spanning the range between different conformational states. In addition, MultiSFold was tested on 244 human proteins with low structural accuracy in AlphaFold DB to test whether it could further improve the accuracy of static structures. The experimental results demonstrate the performance of MultiSFold, with a TM-score better than that of AlphaFold2 by 2.97% and RoseTTAFold by 7.72%. The online server is at http://zhanglab-bioinf.com/MultiSFold .

8.

Recent Advances and Challenges in Protein Structure Prediction.

Peng, Chun-Xiang; Liang, Fang; Xia, Yu-Hao; Zhao, Kai-Long; Hou, Ming-Hua; Zhang, Gui-Jun.

J Chem Inf Model ; 64(1): 76-95, 2024 Jan 08.

Article in English | MEDLINE | ID: mdl-38109487

ABSTRACT

Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict three-dimensional structures of numerous unknown proteins with accuracy levels comparable to those of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function as well as accelerating drug discovery and other applications in the field of biology and medicine. Despite the remarkable achievements of artificial intelligence in the field, there are still some challenges and limitations. In this Review, we discuss the recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be conducted.

Subject(s)

Artificial Intelligence , Drug Discovery , Protein Folding , Research Design

9.

Multi-domain and complex protein structure prediction using inter-domain interactions from deep learning.

Xia, Yuhao; Zhao, Kailong; Liu, Dong; Zhou, Xiaogen; Zhang, Guijun.

Commun Biol ; 6(1): 1221, 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38040847

ABSTRACT

Accurately capturing domain-domain interactions is key to understanding protein function and designing structure-based drugs. Although AlphaFold2 has made a breakthrough on single-domain proteins, structure modeling for multi-domain proteins and complexes remains a challenge. In this study, we developed a multi-domain and complex structure assembly protocol, named DeepAssembly, based on domain segmentation and single-domain modeling algorithms. Firstly, DeepAssembly uses a population-based evolutionary algorithm to assemble multi-domain proteins using inter-domain interactions inferred from a purpose-built deep learning network. Secondly, protein complexes are assembled by means of domains rather than chains using DeepAssembly. Experimental results show that on 219 multi-domain proteins, the average inter-domain distance precision by DeepAssembly is 22.7% higher than that of AlphaFold2. Moreover, DeepAssembly improves accuracy by 13.1% for 164 multi-domain structures with low confidence deposited in the AlphaFold database. We apply DeepAssembly to the prediction of 247 heterodimers. We find that DeepAssembly successfully predicts the interface (DockQ ≥ 0.23) for 32.4% of the dimers, suggesting a lighter way to assemble complex structures by treating domains as assembly units and using inter-domain interactions learned from monomer structures.

Subject(s)

Deep Learning , Proteins/chemistry , Algorithms

10.

Assessing protein model quality based on deep graph coupled networks using protein language model.

Liu, Dong; Zhang, Biao; Liu, Jun; Li, Hui; Song, Le; Zhang, Guijun.

Brief Bioinform ; 25(1)2023 Nov 22.

Article in English | MEDLINE | ID: mdl-38018909

ABSTRACT

Model quality evaluation is a crucial part of protein structural biology. How to distinguish high-quality models from low-quality models, and how to assess which high-quality models have relatively incorrect regions for improvement, remain a challenge. More importantly, the quality assessment of multimer models is a hot topic for structure prediction. In this study, we propose GraphCPLMQA, a novel approach for evaluating residue-level model quality that combines graph coupled networks and embeddings from protein language models. GraphCPLMQA consists of a graph encoding module and a transform-based convolutional decoding module. In the encoding module, the underlying relational representations of sequence and high-dimensional geometry structure are extracted by protein language models with Evolutionary Scale Modeling. In the decoding module, the mapping connection between structure and quality is inferred from the representations and low-dimensional features. Specifically, the triangular location and residue-level contact order features are designed to enhance the association between the local structure and the overall topology. Experimental results demonstrate that GraphCPLMQA using single-sequence embedding achieves the best performance compared with the CASP15 residue-level interface evaluation methods among 9108 models in the local residue interface test set of CASP15 multimers. In the CAMEO blind test (20 May 2022 to 13 August 2022), GraphCPLMQA ranked first compared with other servers (https://www.cameo3d.org/quality-estimation). GraphCPLMQA also outperforms state-of-the-art methods on 19,035 models in the CASP13 and CASP14 monomer test sets.

Subject(s)

Computational Biology , Neural Networks, Computer , Computational Biology/methods , Proteins/chemistry , Language

11.

E2EDA: Protein Domain Assembly Based on End-to-End Deep Learning.

Zhu, Hai-Tao; Xia, Yu-Hao; Zhang, Gui-Jun.

J Chem Inf Model ; 63(20): 6451-6461, 2023 Oct 23.

Article in English | MEDLINE | ID: mdl-37788318

ABSTRACT

With the development of deep learning, almost all single-domain proteins can be predicted at experimental resolution. However, the structure prediction of multi-domain proteins remains a challenge. Achieving end-to-end protein domain assembly and further improving the accuracy of the full-chain modeling by accurately predicting inter-domain orientation while improving the assembly efficiency will provide significant insights into structure-based drug discovery. In this work, we propose an End-to-End Domain Assembly method based on deep learning, named E2EDA. We first develop RMNet, an EfficientNetV2-based deep learning model that fuses multiple features using an attention mechanism to predict inter-domain rigid motion. Then, the predicted rigid motions are transformed into inter-domain spatial transformations to directly assemble the full-chain model. Finally, the scoring strategy RMscore is designed to select the best model from multiple assembled models. The experimental results show that the average TM-score of the model assembled by E2EDA on the benchmark set (282) is 0.827, which is better than those of other domain assembly methods SADA (0.792) and DEMO (0.730). Meanwhile, on our constructed multi-domain data set from AlphaFold DB, the model reassembled by E2EDA is 7.0% higher in TM-score compared to the full-chain model predicted by AlphaFold2, indicating that E2EDA can capture more accurate inter-domain orientations to improve the quality of the model predicted by AlphaFold2. Furthermore, compared to SADA and AlphaFold2, E2EDA reduced the average runtime on the benchmark by 64.7% and 19.2%, respectively, indicating that E2EDA can significantly improve assembly efficiency through an end-to-end approach. The online server is available at http://zhanglab-bioinf.com/E2EDA.

Subject(s)

Deep Learning , Protein Domains , Proteins/chemistry
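
Note on the abstract above: predicted inter-domain rigid motions are turned into spatial transformations that place one domain relative to another. The sketch below shows only the generic last step, applying a rotation and a translation to a domain's coordinates. How RMNet parameterizes its output is not stated in the abstract, so the rotation-about-z helper, the function names, and the toy coordinates are all hypothetical.

```python
import math

def rotation_about_z(angle_rad):
    """3x3 rotation matrix for a rotation about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def apply_rigid_motion(coords, rotation, translation):
    """Return R @ p + t for every point p; internal distances are preserved."""
    moved = []
    for p in coords:
        moved.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return moved

# Toy second domain (hypothetical coordinates), rotated 90 degrees about z
# and shifted one unit along z relative to a fixed first domain.
domain_b = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
placed_b = apply_rigid_motion(domain_b, rotation_about_z(math.pi / 2), (0.0, 0.0, 1.0))
```

Because the motion is rigid, each domain keeps its predicted internal geometry and only the relative orientation changes, which is consistent with the abstract's description of direct assembly without iterative sampling.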

12.

Determinants of acute and subacute case-fatality in elderly patients with hypertensive intracerebral hemorrhage.

Zhu, Zhao-Ying; Hao, Li-Fang; Gao, Li-Chuan; Li, Xiao-Long; Zhao, Jie-Yi; Zhang, Tao; Zhang, Gui-Jun; You, Chao; Wang, Xiao-Yu.

Heliyon ; 9(10): e20781, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37876416

ABSTRACT

Background: Given that few reports have described the survival and risk factors for elderly patients with hypertensive intracerebral hemorrhage (HICH), we aimed to develop a valid but simple prediction nomogram for the survival of HICH patients. Methods: All elderly patients ≥65 years old who were diagnosed with HICH between January 2011 and December 2019 were identified. We applied the least absolute shrinkage and selection operator (Lasso) to the Cox regression model with the R package glmnet. A concordance index was used to quantify the nomogram discrimination, and calibration curves and decision curves were graphically evaluated by depicting the observed rates against the probabilities predicted by the nomogram. Results: A total of 204 eligible patients were analyzed, and over 20% of the population was above the age of 80 (65-79 years old, n = 161; 80+ years old, n = 43). A hematoma volume ≥13.64 cm³ was associated with higher 7-day mortality (OR = 6.773, 95% CI = 2.622-19.481; p < 0.001) and higher 90-day mortality (OR = 3.955, 95% CI = 1.611-10.090; p = 0.003). A GCS score between 13 and 15 at admission was associated with a 7-day favorable outcome (OR = 0.025, 95% CI = 0.005-0.086; p < 0.001) and a 90-day favorable outcome (OR = 0.033, 95% CI = 0.010-0.099; p < 0.001). Conclusions: Our nomogram models were visualized and accurate. Neurosurgeons could use them to assess the prognostic factors and provide advice to patients and their relatives.

13.

Recent Advances in Protein Folding Pathway Prediction through Computational Methods.

Zhao, Kailong; Liang, Fang; Xia, Yuhao; Hou, Minghua; Zhang, Guijun.

Curr Med Chem ; 2023 Oct 11.

Article in English | MEDLINE | ID: mdl-37828669

ABSTRACT

Protein folding mechanisms are crucial to understanding the fundamental processes of life and solving many biological and medical problems. By studying the folding process, we can reveal how proteins achieve their biological functions through specific structures, providing insights into the treatment and prevention of diseases. With the advancement of AI technology in the field of protein structure prediction, computational methods have become increasingly important and promising for studying protein folding mechanisms. In this review, we survey the current progress in the computational study of protein folding mechanisms from four perspectives: simulation of an inverse folding pathway from the native state to the unfolded state; prediction of early folding residues by machine learning; exploration of protein folding pathways through conformational sampling; and prediction of protein folding intermediates based on templates. Finally, the challenges and future perspectives of computational approaches to the protein folding problem are also discussed.

14.

DeepUMQA3: a web server for accurate assessment of interface residue accuracy in protein complexes.

Liu, Jun; Liu, Dong; Zhang, Gui-Jun.

Bioinformatics ; 39(10)2023 Oct 03.

Article in English | MEDLINE | ID: mdl-37740296

ABSTRACT

MOTIVATION: Model quality assessment is a crucial part of protein structure prediction and a gateway to proper usage of models in biomedical applications. Many methods have been proposed for assessing the quality of structural models of protein monomers, but few for evaluating protein complex models. As protein complex structure prediction becomes a new challenge, there is an urgent need for model quality assessment methods that can accurately assess the accuracy of interface residues of complex structures. RESULTS: Here, we present DeepUMQA3, a web server for evaluating the accuracy of interface residues of protein complex structures using deep neural networks. For an input complex structure, features are extracted at three levels (overall complex, intra-monomer, and inter-monomer), and an improved deep residual neural network is used to predict per-residue lDDT and interface residue accuracy. DeepUMQA3 ranks first in the blind test of interface residue accuracy estimation in CASP15, with Pearson, Spearman, and AUC of 0.564, 0.535, and 0.755 under the lDDT measurement, which are 17.6%, 23.6%, and 10.9% higher than the second-best method, respectively. DeepUMQA3 can also assess the accuracy of all residues in the entire complex and distinguish high- and low-precision residues. AVAILABILITY AND IMPLEMENTATION: The DeepUMQA3 web server is freely available at http://zhanglab-bioinf.com/DeepUMQA_server/.

15.

Pathfinder: Protein folding pathway prediction based on conformational sampling.

Huang, Zhaohong; Cui, Xinyue; Xia, Yuhao; Zhao, Kailong; Zhang, Guijun.

PLoS Comput Biol ; 19(9): e1011438, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37695768

ABSTRACT

The study of protein folding mechanisms is a challenge in molecular biology and is of great significance for revealing the movement rules of biological macromolecules, understanding the pathogenic mechanism of folding diseases, and designing protein engineering materials. Based on the hypothesis that the conformational sampling trajectory contains information about the folding pathway, we propose a protein folding pathway prediction algorithm named Pathfinder. Firstly, Pathfinder performs large-scale sampling of the conformational space and clusters the decoys obtained in the sampling. The heterogeneous conformations obtained by clustering are named seed states. Then, a resampling algorithm that is not constrained by the local energy basin is designed to obtain the transition probabilities of seed states. Finally, protein folding pathways are inferred from the maximum transition probabilities of seed states. The proposed Pathfinder is tested on our developed test set (34 proteins). For 11 widely studied proteins, we correctly predicted their folding pathways and specifically analyzed 5 of them. For 13 proteins, we predicted their folding pathways to be further verified by biological experiments. For 6 proteins, we analyzed the reasons for the low prediction accuracy. For the other 4 proteins without biological experiment results, potential folding pathways were predicted to provide new insights into protein folding mechanisms. The results reveal that structural analogs may have different folding pathways to express different biological functions, homologous proteins may contain common folding pathways, and α-helices may be more prone to early protein folding than β-strands.

Subject(s)

Algorithms , Molecular Biology , Cluster Analysis , Molecular Conformation , Protein Folding
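
Note on the abstract above: pathways are inferred from the maximum transition probabilities between seed states. One simple reading of that step is a greedy walk that always moves to the unvisited state with the highest transition probability. The sketch below implements that reading only; it is not claimed to be Pathfinder's procedure, and the state labels, probabilities, and function name are hypothetical.

```python
def greedy_pathway(transitions, start, goal):
    """Follow the highest-probability transition from start until goal.

    transitions: dict mapping a state to {next_state: probability}.
    Returns the list of visited states, or None if the walk dead-ends.
    """
    path = [start]
    current = start
    while current != goal:
        # Only consider states not yet visited, to avoid cycling.
        candidates = {
            state: prob
            for state, prob in transitions.get(current, {}).items()
            if state not in path and prob > 0.0
        }
        if not candidates:
            return None
        current = max(candidates, key=candidates.get)
        path.append(current)
    return path

# Hypothetical seed states: U (unfolded), A and B (intermediates), N (native).
toy_transitions = {
    "U": {"A": 0.7, "B": 0.3},
    "A": {"B": 0.2, "N": 0.8},
    "B": {"N": 1.0},
}
toy_path = greedy_pathway(toy_transitions, "U", "N")
```

A greedy walk can miss the globally most probable route; a shortest-path search over negative log probabilities would be the natural alternative if that matters.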

16.

Improving DNA 6mA Site Prediction via Integrating Bidirectional Long Short-Term Memory, Convolutional Neural Network, and Self-Attention Mechanism.

Hu, Jun; Tang, Yu-Xuan; Zhou, Yu; Li, Zhe; Rao, Bing; Zhang, Gui-Jun.

J Chem Inf Model ; 63(17): 5689-5700, 2023 Sep 11.

Article in English | MEDLINE | ID: mdl-37603823

ABSTRACT

Identifying DNA N6-methyladenine (6mA) sites is important for understanding the function of DNA. Many deep learning-based methods have been developed to improve the performance of 6mA site prediction. In this study, to further improve the performance of 6mA site prediction, we propose a new meta method, called Co6mA, that integrates bidirectional long short-term memory (BiLSTM), convolutional neural networks (CNNs), and self-attention mechanisms (SAM) by assembling two different deep learning-based models. The first model developed in this study is called CBi6mA, which is composed of CNN, BiLSTM, and fully connected modules. The second model is borrowed from LA6mA, which is an existing 6mA prediction method based on BiLSTM and SAM modules. Experimental results on two independent testing sets of different model organisms, i.e., Arabidopsis thaliana and Drosophila melanogaster, demonstrate that Co6mA can achieve an average accuracy of 91.8%, covering 89% of all 6mA samples while achieving an average Matthews correlation coefficient value (0.839), which is higher than the second-best method DeepM6A.

Subject(s)

Arabidopsis , Drosophila melanogaster , Animals , Memory, Short-Term , DNA , Neural Networks, Computer
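
Note on the abstract above: it reports accuracy, the fraction of true 6mA sites recovered (sensitivity), and the Matthews correlation coefficient (MCC). The sketch below computes all three from binary confusion-matrix counts using their standard definitions. The function names are illustrative, and returning 0.0 when the MCC denominator is zero is a common convention assumed here, not something stated in the abstract.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Fraction of true positive sites that are recovered."""
    return tp / (tp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; 0.0 is returned when it is undefined (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when positive and negative sites are imbalanced.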

17.

Estimating protein complex model accuracy based on ultrafast shape recognition and deep learning in CASP15.

Liu, Jun; Liu, Dong; He, Guangxing; Zhang, Guijun.

Proteins ; 91(12): 1861-1870, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37553848

ABSTRACT

This article reports and analyzes the results of protein complex model accuracy estimation by our methods (DeepUMQA3 and GraphGPSM) in the 15th Critical Assessment of techniques for protein Structure Prediction (CASP15). The new deep learning-based multimeric complex model accuracy estimation methods are proposed based on the ensemble of three-level features coupling with deep residual/graph neural networks. For the input multimeric complex model, we describe it from three levels: overall complex features, intra-monomer features, and inter-monomer features. We designed an overall ultrafast shape recognition (USR) to characterize the relationship between local residues and the overall complex topology, and an inter-monomer USR to characterize the relationship between the residues of one monomer and the topology of other monomers. DeepUMQA3 (Group name: GuijunLab-RocketX) ranked first in the interface residue accuracy estimation of CASP15. The Pearson correlation between the interface residue Local Distance Difference Test (lDDT) predicted by DeepUMQA3 and the real lDDT is 0.570, the only method that exceeds 0.5. Among the top 5 methods, DeepUMQA3 achieved the highest Pearson correlation of lDDT on 25 out of 39 targets. GraphGPSM (Group name: GuijunLab-PAthreader) has TM-score Pearson correlations greater than 0.9 on 14 targets, showing a good ability to estimate the overall fold accuracy. The DeepUMQA3 server is available at http://zhanglab-bioinf.com/DeepUMQA/ and the GraphGPSM server is available at http://zhanglab-bioinf.com/GraphGPSM/.

Subject(s)

Deep Learning , Protein Conformation , Computational Biology/methods , Proteins/chemistry , Neural Networks, Computer

18.

GraphGPSM: a global scoring model for protein structure using graph neural networks.

He, Guangxing; Liu, Jun; Liu, Dong; Zhang, Guijun.

Brief Bioinform ; 24(4)2023 07 20.

Article in English | MEDLINE | ID: mdl-37317619

ABSTRACT

The scoring models used for protein structure modeling and ranking are mainly divided into unified field and protein-specific scoring functions. Although protein structure prediction has made tremendous progress since CASP14, the modeling accuracy still falls short of requirements in some cases. In particular, accurate modeling of multi-domain and orphan proteins remains a challenge. Therefore, an accurate and efficient deep learning-based protein scoring model is urgently needed to guide protein structure folding and ranking. In this work, we propose a protein structure global scoring model based on an equivariant graph neural network (EGNN), named GraphGPSM, to guide protein structure modeling and ranking. We construct an EGNN architecture and design a message passing mechanism to update and transmit information between the nodes and edges of the graph. Finally, the global score of the protein model is output through a multilayer perceptron. Residue-level ultrafast shape recognition is used to describe the relationship between residues and the overall structure topology, and distance and direction encoded by Gaussian radial basis functions are designed to represent the overall topology of the protein backbone. These two features are combined with Rosetta energy terms, backbone dihedral angles, and inter-residue distances and orientations to represent the protein model, and are embedded into the nodes and edges of the graph neural network. The experimental results on the CASP13, CASP14 and CAMEO test sets show that the scores of GraphGPSM correlate strongly with the TM-score of the models, and significantly better than those of the unified field score function REF2015 and state-of-the-art local lDDT-based scoring models such as ModFOLD8, ProQ3D and DeepAccNet. The modeling experimental results on 484 test proteins demonstrate that GraphGPSM can greatly improve the modeling accuracy. GraphGPSM is further used to model 35 orphan proteins and 57 multi-domain proteins. The results show that the average TM-score of the models predicted by GraphGPSM is 13.2% and 7.1% higher, respectively, than that of the models predicted by AlphaFold2. GraphGPSM also participated in CASP15 and achieved competitive performance in global accuracy estimation.

Subject(s)

Algorithms , Proteins , Protein Conformation , Databases, Protein , Proteins/chemistry , Neural Networks, Computer

19.

Identification and analysis of differentially expressed trihelix genes in maize (Zea mays) under abiotic stresses.

Zhao, Dongbo; Gao, Fengju; Guan, Peiyan; Gao, Jiansheng; Guo, Zhihui; Guo, Jianjun; Cui, Huini; Li, Yongjun; Zhang, Guijun; Li, Zhao; Guo, Lianghai.

PeerJ ; 11: e15312, 2023.

Article in English | MEDLINE | ID: mdl-37151290

ABSTRACT

Background: Trihelix transcription factors play important roles in triggering plant growth and imparting tolerance against biotic and abiotic stresses. However, a systematic analysis of the trihelix transcription factor family under heat and drought stresses in maize has not been reported. Methods: PlantTFDB and TBtools were employed to identify the trihelix domain-containing genes in the maize genome. The heat-regulated transcriptome data for maize were obtained from NCBI to screen differentially expressed ZmTHs genes through statistical analysis. The basic protein sequences, chromosomal localization, and subcellular localization were analyzed using Maize GDB, Expasy, SOMPA, TBtools, and Plant-mPLoc. The conserved motifs, evolutionary relationships, and cis-elements were analyzed by MEME, MEGA7.0 and PlantCARE software, respectively. The tissue expression patterns of ZmTHs and their expression profiles under heat and drought stress were detected using quantitative real-time PCR (qRT-PCR). Results: A total of 44 trihelix family members were discovered, distributed over 10 chromosomes in the maize genome. A total of 11 genes were identified that were regulated by heat stress; these were unevenly distributed on chromosomes 1, 2, 4, 5, and 10. ZmTHs encoded a total of 16 proteins, all of which were located in the nucleus; however, ZmTH04.1 was also distributed in the chloroplast. The protein length varied from 206 to 725 amino acids; the molecular weight ranged from 22.63 to 76.40 kD; and the theoretical isoelectric point (pI) ranged from 5.24 to 11.2. The proteins' secondary structures were mainly random coils and α-helices, with fewer extended strands and β-turns. Phylogenetic relationship analysis showed that these can be divided into five sub-groups. The conserved domain of ZmTHs was GT1 or MyB_DNA-Bind_4. The protein and gene structure of ZmTHs differed greatly among the subfamilies, while the structures within the subfamilies were similar. The promoter of ZmTHs contained abundant tissue-specific expression cis-acting elements and abiotic stress response elements. qRT-PCR analysis showed that ZmTHs expression levels were significantly different in different tissues. Furthermore, the expression of ZmTH08 was dramatically up-regulated by heat stress, while the expression of ZmTH03, ZmTH04, ZmTH05, ZmTH06, ZmTH07, ZmTH09, ZmTH10, and ZmTH11 was down-regulated by heat stress. Upon PEG-simulated drought stress, ZmTH06 was significantly up-regulated, while ZmTH01 and ZmTH07 were down-regulated. Conclusions: We performed a genome-wide, systematic identification and analysis of differentially expressed trihelix genes under heat and drought stresses in maize.

Subject(s)

Gene Expression Profiling , Zea mays , Zea mays/genetics , Phylogeny , Plant Proteins/genetics , Transcription Factors/genetics , Stress, Physiological/genetics

20.

Density and row spacing of short-season cotton suitable for machine picking in the cotton region of Yellow River Basin.

Li, Feng-Rui; Zhao, Wen-Chao; Zhang, Dong-Lou; Dong, Ling-Yan; Wang, Ru-Ming; Qi, Hong-Xin; Zhang, Chao; Zhang, Gui-Jun; Yang, Xiu-Feng; Shi, Jia-Liang.

Ying Yong Sheng Tai Xue Bao ; 34(4): 1002-1008, 2023 Apr.

Article in English | MEDLINE | ID: mdl-37078319

ABSTRACT

To determine the suitable planting density and row spacing of short-season cotton suitable for machine picking in the Yellow River Basin of China, we conducted a two-year field experiment in Dezhou during 2018-2019. The experiment followed a split-plot design, with planting density (82500 plants·hm⁻² and 112500 plants·hm⁻²) as the main plots and row spacing (equal row spacing of 76 cm, wide-narrow row spacing of 66 cm+10 cm, equal row spacing of 60 cm) as the subplots. We examined the effects of planting density and row spacing on growth and development, canopy structure, seed cotton yield and fiber quality of short-season cotton. The results showed that plant height and LAI under high density treatment were significantly greater than those under low density treatment. The transmittance of the bottom layer under high density treatment was significantly lower than that under low density treatment. Plant height under 76 cm equal row spacing was significantly higher than that under 60 cm equal row spacing, while that under wide-narrow row spacing (66 cm+10 cm) was significantly smaller than that under 60 cm equal row spacing at the peak bolling stage. The effects of row spacing on LAI varied between the two years, densities, and growth stages. On the whole, the LAI under the wide-narrow row spacing (66 cm+10 cm) was higher, with the curve declining gently after the peak, and at harvest time it was higher than that under the two equal row spacings. The change in transmittance of the bottom layer presented the opposite trend. Density, row spacing, and their interaction had significant effects on seed cotton yield and its components. In both years, seed cotton yield was the highest (3832 kg·hm⁻² in 2018, 3235 kg·hm⁻² in 2019) under wide-narrow row spacing (66 cm+10 cm), and it was more stable at high densities. Fiber quality was less affected by density and row spacing. To sum up, the optimal density and row spacing of short-season cotton were a density of 112500 plants·hm⁻² and wide-narrow row spacing (66 cm+10 cm).

Subject(s)

Rivers , Seeds , Seasons , Biomass , Gossypium
