Results 1 - 11 of 11
1.
Brief Bioinform ; 24(1), 2023 01 19.
Article in English | MEDLINE | ID: mdl-36502369

ABSTRACT

Recently reported machine learning- or deep learning-based scoring functions (SFs) have shown exciting performance in predicting protein-ligand binding affinities, with promising application prospects. However, differentiating between highly similar ligand conformations, including the native binding pose (the global energy minimum state), remains challenging, and solving this problem could greatly enhance docking. In this work, we propose a fully differentiable, end-to-end framework for ligand pose optimization based on a hybrid SF called DeepRMSD+Vina, which combines a multi-layer perceptron (DeepRMSD) with the traditional AutoDock Vina SF. DeepRMSD+Vina, which combines (1) the root mean square deviation (RMSD) of the docking pose with respect to the native pose and (2) the AutoDock Vina score, is fully differentiable and is thus capable of optimizing the ligand binding pose toward the lowest-energy conformation. Evaluated on the CASF-2016 docking power dataset, DeepRMSD+Vina reaches a success rate of 94.4%, which outperforms most SFs reported to date. We evaluated the ligand conformation optimization framework in practical molecular docking scenarios (redocking and cross-docking tasks), revealing the high potential of this framework in drug design and discovery. Structural analysis shows that this framework is able to identify key physical interactions in protein-ligand binding, such as hydrogen bonding. Our work provides a paradigm for optimizing ligand conformations based on deep learning algorithms. The DeepRMSD+Vina model and the optimization framework are available at the GitHub repository https://github.com/zchwang/DeepRMSD-Vina_Optimization.


Subject(s)
Deep Learning , Ligands , Molecular Docking Simulation , Proteins/chemistry , Drug Design , Protein Binding
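The 94.4% figure in the abstract above is a docking success rate. As a rough illustration of how such a number is typically computed, the sketch below assumes the commonly used convention that a complex counts as a success when its best-scored pose lies within 2 Å RMSD of the native pose. The function names and the input layout are hypothetical and are not taken from the DeepRMSD+Vina repository.

```python
import math

def rmsd(pose, native):
    """RMSD between two poses given as equally ordered lists of (x, y, z) atoms."""
    if len(pose) != len(native) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum((a - b) ** 2 for p, q in zip(pose, native) for a, b in zip(p, q))
    return math.sqrt(squared / len(pose))

def docking_success_rate(complexes, cutoff=2.0):
    """Fraction of complexes whose best-scored pose is within `cutoff` angstroms.

    `complexes` maps a complex ID to a list of (score, rmsd_to_native) pairs.
    Assumption: a lower score means a better pose, as with Vina-style scores.
    """
    if not complexes:
        raise ValueError("no complexes given")
    hits = 0
    for poses in complexes.values():
        # Pick the pose the scoring function ranks first.
        _, top_rmsd = min(poses, key=lambda pair: pair[0])
        if top_rmsd <= cutoff:
            hits += 1
    return hits / len(complexes)
```

In this sketch a scoring function is rewarded only when the pose it ranks first is near-native, so a well-sampled pose that is badly ranked does not count.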
2.
Brief Bioinform ; 23(3), 2022 05 13.
Article in English | MEDLINE | ID: mdl-35289359

ABSTRACT

Scoring functions are important components in molecular docking for structure-based drug discovery. Traditional scoring functions, generally empirical- or force field-based, are robust and have proven useful for hit identification and lead optimization. Although multiple highly accurate deep learning- or machine learning-based scoring functions have been developed, their direct applications for docking and screening are limited. We describe a novel strategy to develop a reliable protein-ligand scoring function by augmenting the traditional Vina score with a correction term (OnionNet-SFCT). The correction term is developed based on an AdaBoost random forest model, utilizing multiple layers of contacts formed between protein residues and ligand atoms. When combined with the Vina score, the model considerably enhances the AutoDock Vina prediction abilities for docking and screening tasks based on different benchmarks (such as the cross-docking dataset, CASF-2016, DUD-E and DUD-AD). Furthermore, our model could be combined with multiple docking applications to increase pose selection accuracy and screening ability, indicating its wide usability for structure-based drug discovery. In addition, in a reverse practice, the combined scoring strategy successfully identified multiple known receptors of a plant hormone. To summarize, the results show that the combination of a data-driven model (OnionNet-SFCT) and an empirical scoring function (Vina score) is a good scoring strategy that could be useful for structure-based drug discovery and potentially for target fishing in the future.


Subject(s)
Drug Discovery , Proteins , Drug Discovery/methods , Ligands , Machine Learning , Molecular Docking Simulation , Protein Binding , Proteins/chemistry
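The combined scoring idea in the abstract above (an empirical score augmented by a learned correction term) can be sketched as a simple reranking step. The additive form and the `weight` parameter below are assumptions made for illustration; the abstract does not state how OnionNet-SFCT weights the two terms, and none of these names come from the published code.

```python
def combined_score(vina_score, correction, weight=0.5):
    """Hypothetical combination: empirical score plus a weighted correction term."""
    return vina_score + weight * correction

def rerank(poses, weight=0.5):
    """Sort poses from best to worst by the combined score.

    `poses` is a list of (pose_id, vina_score, correction) tuples.
    Assumption: lower combined scores indicate better poses.
    """
    return sorted(poses, key=lambda p: combined_score(p[1], p[2], weight))
```

With `weight=0` the ranking reduces to the plain Vina ordering, which makes it easy to check how much the correction term changes pose selection on a given benchmark.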
3.
Bioinformatics ; 38(14): 3574-3581, 2022 07 11.
Article in English | MEDLINE | ID: mdl-35652719

ABSTRACT

MOTIVATION: Protein secondary structure prediction (PSSP) is one of the fundamental and challenging problems in the field of computational biology. Accurate PSSP relies on sufficient homologous protein sequences to build the multiple sequence alignment (MSA). Unfortunately, many proteins lack homologous sequences, which results in low-quality MSAs and poor performance. In this article, we propose the novel dynamic scoring matrix (DSM)-Distil to tackle this issue, which takes advantage of a pretrained BERT and exploits knowledge distillation on the newly designed DSM features. Specifically, we propose the DSM to replace the widely used profile and PSSM (position-specific scoring matrix) features. DSM can automatically search for the suitable feature for each residue, based on the original profile. Namely, DSM-Distil not only adapts to low-homology proteins but is also compatible with high-homology ones. Thanks to this dynamic property, DSM can adapt to the input data much better and achieve higher performance. Moreover, to compensate for low-quality MSAs, we propose to generate a pseudo-DSM from a pretrained BERT model and aggregate it with the original DSM by adaptive residue-wise fusion, which helps to build richer and more complete input features. In addition, we propose to supervise the learning of low-quality DSM features using high-quality ones. To achieve this, a novel teacher-student model is designed to distill knowledge from proteins with many homologous sequences to those with few. Combining all the proposed methods, our model achieves new state-of-the-art performance for low-homology proteins. RESULTS: Compared with the previous state-of-the-art method 'Bagging', DSM-Distil achieves improvements of about 5% and 7.3% for proteins with MSA count ≤30 and extremely low-homology cases, respectively. We also compare DSM-Distil with Alphafold2, which is a state-of-the-art framework for protein structure prediction. DSM-Distil outperforms Alphafold2 by 4.1% on extremely low-quality MSAs on 8-state secondary structure prediction. Moreover, we release a large-scale, up-to-date test dataset, BC40, for evaluating structure prediction on low-quality MSAs. AVAILABILITY AND IMPLEMENTATION: BC40 dataset: https://drive.google.com/drive/folders/15vwRoOjAkhhwfjDk6-YoKGf4JzZXIMC. HardCase dataset: https://drive.google.com/drive/folders/1BvduOr2b7cObUHy6GuEWk-aUkKJgzTUv. Code: https://github.com/qinwang-ai/DSM-Distil.


Subject(s)
Computational Biology , Neural Networks, Computer , Humans , Protein Structure, Secondary , Sequence Alignment , Computational Biology/methods , Position-Specific Scoring Matrices , Proteins/chemistry
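The "adaptive residue-wise fusion" mentioned in the abstract above is not specified in detail there. One plausible reading, shown purely as an assumption, is a per-residue gate that blends the original DSM row with the pseudo-DSM row. The function name and the convex-combination form are hypothetical and may differ from what DSM-Distil actually does.

```python
def fuse_residue_wise(dsm, pseudo_dsm, gates):
    """Blend two per-residue feature matrices with one gate value per residue.

    fused[i][k] = gates[i] * dsm[i][k] + (1 - gates[i]) * pseudo_dsm[i][k]

    Assumption: each gate lies in [0, 1]; in a trained model it would be
    predicted per residue, whereas here it is simply passed in.
    """
    if not (len(dsm) == len(pseudo_dsm) == len(gates)):
        raise ValueError("all inputs must cover the same number of residues")
    fused = []
    for row, pseudo_row, g in zip(dsm, pseudo_dsm, gates):
        if len(row) != len(pseudo_row):
            raise ValueError("feature rows must have equal width")
        fused.append([g * a + (1.0 - g) * b for a, b in zip(row, pseudo_row)])
    return fused
```

A gate near 1 trusts the MSA-derived features for that residue, while a gate near 0 falls back on the language-model-derived ones, which is the intuition behind compensating for low-quality MSAs.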
4.
Nat Methods ; 16(2): 199-204, 2019 02.
Article in English | MEDLINE | ID: mdl-30664775

ABSTRACT

We present a robust, computationally efficient method ( https://github.com/kussell-lab/mcorr ) for inferring the parameters of homologous recombination in bacteria, which can be applied in diverse datasets, from whole-genome sequencing to metagenomic shotgun sequencing data. Using correlation profiles of synonymous substitutions, we determine recombination rates and diversity levels of the shared gene pool that has contributed to a given sample. We validated the recombination parameters using data from laboratory experiments. We determined the recombination parameters for a wide range of bacterial species, and inferred the distribution of shared gene pools for global Helicobacter pylori isolates. Using metagenomics data of the infant gut microbiome, we measured the recombination parameters of multidrug-resistant Escherichia coli ST131. Lastly, we analyzed ancient samples of bacterial DNA from the Copper Age 'Iceman' mummy and from 14th century victims of the Black Death, obtaining measurements of bacterial recombination rates and gene pool diversity of earlier eras.


Subject(s)
Computational Biology/methods , DNA, Ancient , Drug Resistance, Bacterial/genetics , Metagenomics/methods , Recombination, Genetic , Sequence Analysis, DNA , Computer Simulation , DNA, Bacterial , Databases, Genetic , Escherichia coli/genetics , Gastrointestinal Microbiome , Genetic Techniques , Genetic Variation , Helicobacter pylori/genetics , History, Medieval , Humans , Models, Genetic , Mutation , Plague/history , Plague/microbiology , Yersinia pestis/genetics
5.
Plant Cell ; 23(3): 911-22, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21441435

ABSTRACT

Predicted interactions are a valuable complement to experimentally reported interactions in molecular mechanism studies, particularly for higher organisms, for which reported experimental interactions represent only a small fraction of their total interactomes. With careful engineering consideration of the lessons from previous efforts, the predicted Arabidopsis interactome resource (PAIR) presents 149,900 potential molecular interactions, which are expected to cover approximately 24% of the entire interactome with approximately 40% precision. This study demonstrates that, although PAIR still has limited coverage, it is rich enough to capture many significant functional linkages within and between higher-order biological systems, such as pathways and biological processes. These inferred interactions can power several network topology-based systems biology analyses, such as gene set linkage analysis, protein function prediction, and identification of regulatory genes demonstrating insignificant expression changes. The drastically expanded molecular network in PAIR has considerably improved the capability of these analyses to integrate existing knowledge and suggest novel insights into the function and coordination of genes and gene networks.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Databases, Protein , Gene Expression Profiling , Protein Interaction Mapping/methods , Arabidopsis/metabolism , Arabidopsis Proteins/metabolism , Computational Biology , Gene Expression Regulation, Plant , Genetic Linkage , Metabolic Networks and Pathways , Software
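The coverage and precision figures in the abstract above jointly imply a rough size for the whole interactome. The back-of-envelope sketch below assumes that precision applies uniformly to all 149,900 listed interactions and that coverage means true positives divided by the total number of real interactions; both are simplifying assumptions, and the function name is hypothetical.

```python
def implied_interactome_size(n_predicted, precision, coverage):
    """Estimate total interactome size from a predicted set.

    expected true positives = n_predicted * precision
    total interactions      = expected true positives / coverage
    """
    if not (0 < precision <= 1 and 0 < coverage <= 1):
        raise ValueError("precision and coverage must be in (0, 1]")
    true_positives = n_predicted * precision
    return true_positives / coverage

# Figures quoted in the abstract: 149,900 interactions, ~40% precision, ~24% coverage.
estimate = implied_interactome_size(149_900, 0.40, 0.24)
print(f"expected true positives: {149_900 * 0.40:,.0f}")
print(f"implied interactome size: {estimate:,.0f}")
```

Under these assumptions roughly 60,000 of the listed interactions would be real, implying an interactome on the order of 250,000 interactions.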
6.
Nucleic Acids Res ; 39(Database issue): D1134-40, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20952401

ABSTRACT

The predicted Arabidopsis interactome resource (PAIR, http://www.cls.zju.edu.cn/pair/), comprising 5990 experimentally reported molecular interactions in Arabidopsis thaliana together with 145,494 predicted interactions, is currently the most comprehensive data set of the Arabidopsis interactome with high reliability. PAIR predicts interactions by a fine-tuned support vector machine model that integrates indirect evidence for interaction, such as gene co-expression, domain interactions, shared GO annotations, co-localization, phylogenetic profile similarities and homologous interactions in other organisms (interologs). These predictions were expected to cover 24% of the entire Arabidopsis interactome, and their reliability was estimated to be 44%. Two independent example data sets were used to rigorously validate the prediction accuracy. PAIR features a user-friendly query interface, providing rich annotation on the relationships between two proteins. A graphical interaction network browser has also been integrated into the PAIR web interface to facilitate mining of specific pathways.


Subject(s)
Arabidopsis Proteins/metabolism , Databases, Protein , Protein Interaction Mapping , Computer Graphics , User-Computer Interface
7.
Elife ; 11, 2022 07 08.
Article in English | MEDLINE | ID: mdl-35801696

ABSTRACT

Recombination is essential to microbial evolution, and is involved in the spread of antibiotic resistance, antigenic variation, and adaptation to the host niche. However, assessing the impact of homologous recombination on accessory genes which are only present in a subset of strains of a given species remains challenging due to their complex phylogenetic relationships. Quantifying homologous recombination for accessory genes (which are important for niche-specific adaptations) in comparison to core genes (which are present in all strains and have essential functions) is critical to understanding how selection acts on variation to shape species diversity and genome structures of bacteria. Here, we apply a computationally efficient, non-phylogenetic approach to measure homologous recombination rates in the core and accessory genome using >100,000 whole genome sequences from Streptococcus pneumoniae and several additional species. By analyzing diverse sets of sequence clusters, we show that core genes often have higher recombination rates than accessory genes, and for some bacterial species the associated effect sizes for these differences are pronounced. In a subset of species, we find that gene frequency and homologous recombination rate are positively correlated. For S. pneumoniae and several additional species, we find that while the recombination rate is higher for the core genome, the mutational divergence is lower, indicating that divergence-based homologous recombination barriers could contribute to differences in recombination rates between the core and accessory genome. Homologous recombination may therefore play a key role in increasing the efficiency of selection in the most conserved parts of the genome.


Subject(s)
Evolution, Molecular , Genome, Bacterial , Bacteria/genetics , Genome, Bacterial/genetics , Homologous Recombination/genetics , Phylogeny , Streptococcus pneumoniae/genetics
8.
Am J Transl Res ; 13(3): 1717-1725, 2021.
Article in English | MEDLINE | ID: mdl-33841694

ABSTRACT

OBJECTIVE: To analyze the effect of predictive nursing on postoperative rehabilitation indices and complications in patients on maintenance hemodialysis after hip replacement. METHODS: A total of 81 patients who underwent hip replacement and maintenance hemodialysis in our hospital were selected as research subjects and, in a retrospective analysis, divided into a study group (n=41) and a control group (n=40) according to the intervention method. Patients in the study group received predictive nursing, while patients in the control group received routine nursing. Hip function and activity, duration of walking with and without crutches, adverse emotions, pain and the incidence of various complications after intervention were compared between the two groups. RESULTS: There was no significant difference in Harris score between the two groups at 7 days after intervention (P>0.05); the Harris scores of the study group were significantly higher than those of the control group (P<0.05) at 1, 3 and 6 months after intervention. Before intervention, there was no significant difference in hip activity between the two groups (P>0.05); 3 months after surgery, the hip extension, abduction and rotation angles of the study group were significantly greater than those of the control group (P<0.05); the duration of walking with and without crutches in the study group was significantly shorter than that in the control group (P<0.05); the scores for adverse emotions and pain and the incidence of complications in the study group were significantly lower than those in the control group (P<0.05). CONCLUSION: Implementing predictive nursing for patients who have undergone hip replacement and maintenance hemodialysis can improve hip activity and joint function after surgery, accelerate postoperative recovery, relieve postoperative pain symptoms, and reduce the incidence of various complications.

9.
Genetics ; 205(2): 891-917, 2017 02.
Article in English | MEDLINE | ID: mdl-28007887

ABSTRACT

Inferring the rate of homologous recombination within a bacterial population remains a key challenge in quantifying the basic parameters of bacterial evolution. Due to the high sequence similarity within a clonal population, and unique aspects of bacterial DNA transfer processes, detecting recombination events based on phylogenetic reconstruction is often difficult, and estimating recombination rates using coalescent model-based methods is computationally expensive, and often infeasible for large sequencing data sets. Here, we present an efficient solution by introducing a set of mutational correlation functions computed using pairwise sequence comparison, which characterize various facets of bacterial recombination. We provide analytical expressions for these functions, which precisely recapitulate simulation results of neutral and adapting populations under different coalescent models. We used these to fit correlation functions measured at synonymous substitutions using whole-genome data on Escherichia coli and Streptococcus pneumoniae populations. We calculated and corrected for the effect of sample selection bias, i.e., the uneven sampling of individuals from natural microbial populations that exists in most datasets. Our method is fast and efficient, and does not employ phylogenetic inference or other computationally intensive numerics. By simply fitting analytical forms to measurements from sequence data, we show that recombination rates can be inferred, and the relative ages of different samples can be estimated. Our approach, which is based on population genetic modeling, is broadly applicable to a wide variety of data, and its computational efficiency makes it particularly attractive for use in the analysis of large sequencing datasets.


Subject(s)
Genome, Bacterial , Homologous Recombination , Models, Genetic , Mutation , Selection, Genetic , Algorithms , Escherichia coli/genetics , Evolution, Molecular , Streptococcus pneumoniae/genetics
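The abstract above describes mutational correlation functions computed from pairwise sequence comparison. As a minimal sketch of that idea, the code below marks the sites where two aligned sequences differ and estimates, for each distance l, the probability of a difference at site i+l given a difference at site i. It is an illustration only: it treats every site alike instead of restricting to synonymous positions, it uses a single sequence pair, and it is not the mcorr implementation.

```python
def substitution_vector(seq_a, seq_b):
    """Return 1 where two aligned sequences differ and 0 where they match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [1 if a != b else 0 for a, b in zip(seq_a, seq_b)]

def correlation_profile(sigma, max_lag):
    """Estimate P(l) = P(sigma[i+l] = 1 | sigma[i] = 1) for l = 1..max_lag.

    Computed as the joint frequency of differences at both sites divided by
    the overall difference frequency d.
    """
    n = len(sigma)
    d = sum(sigma) / n if n else 0.0
    profile = {}
    if d == 0:
        return profile  # identical sequences carry no correlation signal
    for lag in range(1, max_lag + 1):
        pairs = n - lag
        if pairs <= 0:
            break
        joint = sum(sigma[i] * sigma[i + lag] for i in range(pairs)) / pairs
        profile[lag] = joint / d
    return profile
```

In the approach described in the abstract, analytical forms are then fitted to such measured profiles to infer recombination parameters; the fitting step is outside the scope of this sketch.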
10.
Plant Physiol ; 151(1): 34-46, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19592425

ABSTRACT

Knowledge of the protein interaction network is useful to assist molecular mechanism studies. Several major repositories have been established to collect and organize reported protein interactions. Many interactions have been reported in several model organisms, yet a very limited number of plant interactions can thus far be found in these major databases. Computational identification of potential plant interactions, therefore, is desired to facilitate relevant research. In this work, we constructed a support vector machine model to predict potential Arabidopsis (Arabidopsis thaliana) protein interactions based on a variety of indirect evidence. In a 100-iteration bootstrap evaluation, the confidence of our predicted interactions was estimated to be 48.67%, and these interactions were expected to cover 29.02% of the entire interactome. The sensitivity of our model was validated with an independent evaluation data set consisting of newly reported interactions that did not overlap with the examples used in model training and testing. Results showed that our model successfully recognized 28.91% of the new interactions, similar to its expected sensitivity (29.02%). Applying this model to all possible Arabidopsis protein pairs resulted in 224,206 potential interactions, which is the largest and most accurate set of predicted Arabidopsis interactions at present. In order to facilitate the use of our results, we present the Predicted Arabidopsis Interactome Resource, with detailed annotations and more specific per interaction confidence measurements. This database and related documents are freely accessible at http://www.cls.zju.edu.cn/pair/.


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Computer Simulation , Gene Expression Regulation, Plant/physiology , Arabidopsis Proteins/genetics , Meiosis/physiology , Recombination, Genetic
11.
Proteomics ; 7(23): 4255-63, 2007 Dec.
Article in English | MEDLINE | ID: mdl-17963289

ABSTRACT

Current drug discovery and development approaches rely extensively on the identification and validation of appropriate targets; for example, those with marketable and robust therapeutics. Wide-ranging efforts have been directed at this problem and various approaches have been developed to identify disease-associated genes as candidates. In this work, we show with statistical significance that successful drug targets, in addition to their linkage to disease, share common characteristics that are disease-independent. For example, marked differences in functional category, tissue specificity, and sequence variability are observed between known targets and average proteins. These results lead to an interesting hypothesis: potentially good drug targets shall have some desired properties, which we refer to as "drug target-likeness" that are beyond their disease-associations. Because of the limited availability of comprehensive protein characteristics data, we tried to learn the drug target-likeness property at the sequence level. Results show that a support vector machine model is able to accurately distinguish targets from nontargets entirely with sequence features. It is our hope that these encouraging results will invite future systematic proteomic scale experiments to gather necessary protein characteristics data for the accurate and predictive definition of "drug target-likeness", providing a new perspective toward understanding and pursuing effective therapeutics.


Subject(s)
Artificial Intelligence , Computational Biology/methods , Models, Statistical , Pharmaceutical Preparations/chemistry , Proteins/chemistry , Algorithms , Amino Acid Sequence , Databases, Protein , Gene Expression , Genetic Variation , Humans , Pharmaceutical Preparations/metabolism , Protein Binding , Proteins/genetics , Proteins/metabolism , ROC Curve , Reproducibility of Results
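The abstract above reports that a support vector machine separates drug targets from nontargets using sequence features alone, without naming the features. One common sequence-level descriptor is amino acid composition, shown below as an assumed example of what such a feature vector could look like; the function name is hypothetical and the original study may have used different features.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def amino_acid_composition(sequence):
    """Return a 20-element vector of residue frequencies for a protein sequence.

    Non-standard characters (for example X or gaps) are ignored.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]
```

A vector like this has a fixed length regardless of protein size, which is what makes it usable as input to a classifier such as a support vector machine.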