1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38436560

ABSTRACT

RNA is a complex macromolecule that plays central roles in the cell. While it is well known that its structure is directly related to its functions, understanding and predicting RNA structures is challenging. Assessing the real or predictive quality of a structure is also at stake with the complex 3D possible conformations of RNAs. Metrics have been developed to measure model quality while scoring functions aim at assigning quality to guide the discrimination of structures without a known and solved reference. Throughout the years, many metrics and scoring functions have been developed, and no unique assessment is used nowadays. Each developed assessment method has its specificity and might be complementary to understanding structure quality. Therefore, to evaluate RNA 3D structure predictions, it would be important to calculate different metrics and/or scoring functions. For this purpose, we developed RNAdvisor, a comprehensive automated software that integrates and enhances the accessibility of existing metrics and scoring functions. In this paper, we present our RNAdvisor tool, as well as state-of-the-art existing metrics, scoring functions and a set of benchmarks we conducted for evaluating them. Source code is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr.


Subjects
Benchmarking , RNA , Models, Structural , RNA/genetics , Software
2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581420

ABSTRACT

Protein-ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein-ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein-ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein-ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.


Subjects
Deep Learning , Proteins , Proteins/chemistry , Protein Binding , Ligands , Drug Design
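
The abstract above defines pKd as the negative logarithm of the binding affinity. A minimal sketch of that conversion follows, assuming the affinity is a dissociation constant Kd in mol/L and that the logarithm is base 10 (the usual convention; the abstract states neither). The function names are illustrative and are not taken from IGModel.

```python
import math


def pkd_from_kd(kd_molar: float) -> float:
    """Return pKd = -log10(Kd), with Kd assumed to be in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)


def kd_from_pkd(pkd: float) -> float:
    """Invert the conversion: Kd = 10 ** (-pKd), in mol/L."""
    return 10.0 ** (-pkd)


if __name__ == "__main__":
    # A nanomolar binder (Kd = 1e-9 mol/L) has a pKd of about 9.
    print(pkd_from_kd(1e-9))
```

On this scale a larger pKd means tighter binding, which is why a predicted pKd can be read directly as binding strength.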
3.
Brief Bioinform ; 24(5)2023 Sep 20.
Article in English | MEDLINE | ID: mdl-37738401

ABSTRACT

Cracking the entangling code of protein-ligand interaction (PLI) is of great importance to structure-based drug design and discovery. Different physical and biochemical representations can be used to describe PLI such as energy terms and interaction fingerprints, which can be analyzed by machine learning (ML) algorithms to create ML-based scoring functions (MLSFs). Here, we propose the ML-based PLI capturer (ML-PLIC), a web platform that automatically characterizes PLI and generates MLSFs to identify the potential binders of a specific protein target through virtual screening (VS). ML-PLIC comprises five modules, including Docking for ligand docking, Descriptors for PLI generation, Modeling for MLSF training, Screening for VS and Pipeline for the integration of the aforementioned functions. We validated the MLSFs constructed by ML-PLIC in three benchmark datasets (Directory of Useful Decoys-Enhanced, Active as Decoys and TocoDecoy), demonstrating accuracy outperforming traditional docking tools and competitive performance to the deep learning-based SF, and provided a case study of the Serine/threonine-protein kinase WEE1 in which MLSFs were developed by using the ML-based VS pipeline in ML-PLIC. Underpinning the latest version of ML-PLIC is a powerful platform that incorporates physical and biological knowledge about PLI, leveraging PLI characterization and MLSF generation into the design of structure-based VS pipeline. The ML-PLIC web platform is now freely available at http://cadd.zju.edu.cn/plic/.


Subjects
Algorithms , Benchmarking , Ligands , Drug Design , Machine Learning
4.
Brief Bioinform ; 24(1)2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36627113

ABSTRACT

Protein-ligand binding affinity prediction is an important task in structural bioinformatics for drug discovery and design. Although various scoring functions (SFs) have been proposed, it remains challenging to accurately evaluate the binding affinity of a protein-ligand complex with the known bound structure because of the potential preference of scoring system. In recent years, deep learning (DL) techniques have been applied to SFs without sophisticated feature engineering. Nevertheless, existing methods cannot model the differential contribution of atoms in various regions of proteins, and the relationship between atom properties and intermolecular distance is also not fully explored. We propose a novel empirical graph neural network for accurate protein-ligand binding affinity prediction (EGNA). Graphs of protein, ligand and their interactions are constructed based on different regions of each bound complex. Proteins and ligands are effectively represented by graph convolutional layers, enabling the EGNA to capture interaction patterns precisely by simulating empirical SFs. The contributions of different factors on binding affinity can thus be transparently investigated. EGNA is compared with the state-of-the-art machine learning-based SFs on two widely used benchmark data sets. The results demonstrate the superiority of EGNA and its good generalization capability.


Subjects
Neural Networks, Computer , Proteins , Ligands , Proteins/chemistry , Protein Binding , Algorithms
5.
Brief Bioinform ; 24(1)2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36502369

ABSTRACT

Recently reported machine learning- or deep learning-based scoring functions (SFs) have shown exciting performance in predicting protein-ligand binding affinities, with promising application prospects. However, differentiating between highly similar ligand conformations, including the native binding pose (the global energy minimum state), remains challenging, although solving this problem could greatly improve docking. In this work, we propose a fully differentiable, end-to-end framework for ligand pose optimization based on a hybrid SF called DeepRMSD+Vina, which combines a multi-layer perceptron (DeepRMSD) with the traditional AutoDock Vina SF. DeepRMSD+Vina, which combines (1) the root mean square deviation (RMSD) of the docking pose with respect to the native pose and (2) the AutoDock Vina score, is fully differentiable and is therefore capable of optimizing the ligand binding pose toward the lowest-energy conformation. Evaluated on the CASF-2016 docking power dataset, DeepRMSD+Vina reaches a success rate of 94.4%, which outperforms most SFs reported to date. We evaluated the ligand conformation optimization framework in practical molecular docking scenarios (redocking and cross-docking tasks), revealing the high potential of this framework in drug design and discovery. Structural analysis shows that this framework can identify key physical interactions in protein-ligand binding, such as hydrogen bonding. Our work provides a paradigm for optimizing ligand conformations based on deep learning algorithms. The DeepRMSD+Vina model and the optimization framework are available at the GitHub repository https://github.com/zchwang/DeepRMSD-Vina_Optimization.


Subjects
Deep Learning , Ligands , Molecular Docking Simulation , Proteins/chemistry , Drug Design , Protein Binding
6.
Brief Bioinform ; 24(1)2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36573474

ABSTRACT

Covalent inhibitors have received extensive attention in the past few decades because of their long residence time, high binding efficiency and strong selectivity. It is therefore valuable to develop computational tools such as molecular docking for modeling covalent protein-ligand interactions or screening potential covalent drugs. To meet this need, we propose HCovDock, an efficient docking algorithm for covalent protein-ligand interactions that integrates an incremental-construction ligand sampling method with a scoring function that includes a covalent bond-based energy term. Tested on a benchmark of 207 diverse protein-ligand complexes, HCovDock performs significantly better than seven other state-of-the-art covalent docking programs (AutoDock, Cov_DOX, CovDock, FITTED, GOLD, ICM-Pro and MOE). With a ligand root-mean-square deviation < 2.0 Å as the success criterion, HCovDock reproduces experimentally observed structures with success rates of 70.5% and 93.2% for the top 1 and top 10 predictions, respectively. HCovDock is also validated in virtual screening against 10 receptors of three proteins. HCovDock is computationally efficient: the average running time for docking a ligand is only 5 min, ranging from as little as 1 s for ligands with one rotatable bond to about 18 min for ligands with 23 rotatable bonds. HCovDock can be freely accessed at http://huanglab.phys.hust.edu.cn/hcovdock/.


Subjects
Algorithms , Proteins , Molecular Docking Simulation , Ligands , Proteins/chemistry , Protein Binding
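
The RMSD-based success criterion (< 2.0 Å for the top 1 and top 10 predictions) quoted in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not HCovDock code: atoms are assumed to be already matched one-to-one, no superposition or symmetry correction is applied, and the function names are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Root-mean-square deviation over matched atoms (no superposition)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((p[i] - r[i]) ** 2 for p, r in zip(pred, ref) for i in range(3))
    return math.sqrt(squared / len(pred))


def success_rate(ranked_rmsds: Sequence[Sequence[float]],
                 top_n: int,
                 cutoff: float = 2.0) -> float:
    """Fraction of complexes with at least one of the top_n poses below cutoff.

    ranked_rmsds[i] holds the RMSD values (in Å) of the poses predicted for
    complex i, ordered from best-scored to worst-scored.
    """
    if not ranked_rmsds:
        raise ValueError("no complexes given")
    hits = sum(1 for poses in ranked_rmsds
               if any(value < cutoff for value in poses[:top_n]))
    return hits / len(ranked_rmsds)


if __name__ == "__main__":
    # Three hypothetical complexes, two ranked poses each.
    example = [[1.5, 3.0], [2.5, 1.0], [4.0, 5.0]]
    print(success_rate(example, top_n=1), success_rate(example, top_n=2))
```

Because a complex counts as a success if any of its top_n poses passes the cutoff, the top 10 rate can never be lower than the top 1 rate.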
7.
Brief Bioinform ; 24(2)2023 Mar 19.
Article in English | MEDLINE | ID: mdl-36681903

ABSTRACT

Binding affinity prediction largely determines the discovery efficiency of lead compounds in drug discovery. Recently, machine learning (ML)-based approaches have attracted much attention in hopes of enhancing the predictive performance of traditional physics-based approaches. In this study, we evaluated the impact of structural dynamic information on the binding affinity prediction by comparing the models trained on different dimensional descriptors, using three targets (i.e. JAK1, TAF1-BD2 and DDR1) and their corresponding ligands as the examples. Here, 2D descriptors are traditional ECFP4 fingerprints, 3D descriptors are the energy terms of the Smina and NNscore scoring functions and 4D descriptors contain the structural dynamic information derived from the trajectories based on molecular dynamics (MD) simulations. We systematically investigate the MD-refined binding affinity prediction performance of three classical ML algorithms (i.e. RF, SVR and XGB) as well as two common virtual screening methods, namely Glide docking and MM/PBSA. The outcomes of the ML models built using various dimensional descriptors and their combinations reveal that the MD refinement with the optimized protocol can improve the predictive performance on the TAF1-BD2 target with considerable structural flexibility, but not for the less flexible JAK1 and DDR1 targets, when taking docking poses as the initial structure instead of the crystal structures. The results highlight the importance of the initial structures to the final performance of the model through conformational analysis on the three targets with different flexibility.


Subjects
Molecular Dynamics Simulation , Proteins , Ligands , Proteins/chemistry , Protein Binding , Machine Learning , Molecular Docking Simulation
8.
Brief Bioinform ; 24(6)2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37874948

ABSTRACT

Proteases contribute to a broad spectrum of cellular functions. Given a relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites were developed, these efforts are outpaced by the growth of the protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for new protease types, and instead it might be better to provide a computational platform that helps users to quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.


Subjects
Machine Learning , Peptide Hydrolases , Peptide Hydrolases/metabolism , Substrate Specificity , Algorithms
9.
Brief Bioinform ; 24(1)2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36642412

ABSTRACT

Machine learning-based scoring functions (MLSFs) have become a very favorable alternative to classical scoring functions because of their potential superior screening performance. However, the information of negative data used to construct MLSFs was rarely reported in the literature, and meanwhile the putative inactive molecules recorded in existing databases usually have obvious bias from active molecules. Here we proposed an easy-to-use method named AMLSF that combines active learning using negative molecular selection strategies with MLSF, which can iteratively improve the quality of inactive sets and thus reduce the false positive rate of virtual screening. We chose energy auxiliary terms learning as the MLSF and validated our method on eight targets in the diverse subset of DUD-E. For each target, we screened the IterBioScreen database by AMLSF and compared the screening results with those of the four control models. The results illustrate that the number of active molecules in the top 1000 molecules identified by AMLSF was significantly higher than those identified by the control models. In addition, the free energy calculation results for the top 10 molecules screened out by the AMLSF, null model and control models based on DUD-E also proved that more active molecules can be identified, and the false positive rate can be reduced by AMLSF.


Subjects
Proteins , Proteins/metabolism , Databases, Factual , Ligands , Molecular Docking Simulation , Protein Binding
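
The abstract above compares models by the number of active molecules among the top-ranked compounds. A minimal sketch of that count, together with the related enrichment factor, is given below. It assumes that a higher score means a more likely binder and that labels are 1 for active and 0 for inactive. The function names are hypothetical and the code is not part of AMLSF.

```python
from typing import Sequence


def actives_in_top_k(scores: Sequence[float], labels: Sequence[int], k: int) -> int:
    """Count active molecules (label 1) among the k highest-scored molecules."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k])


def enrichment_factor(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Hit rate in the top k divided by the hit rate in the whole library."""
    total_actives = sum(labels)
    if total_actives == 0 or k <= 0:
        raise ValueError("need at least one active and a positive k")
    k = min(k, len(scores))
    top_hit_rate = actives_in_top_k(scores, labels, k) / k
    overall_hit_rate = total_actives / len(labels)
    return top_hit_rate / overall_hit_rate


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3]
    labels = [1, 1, 0, 0, 0, 1]
    print(actives_in_top_k(scores, labels, 2), enrichment_factor(scores, labels, 2))
```

An enrichment factor above 1 means the ranking places actives near the top more often than random selection would.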
10.
BMC Bioinformatics ; 25(1): 129, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38532339

ABSTRACT

BACKGROUND: The RNA-Recognition motif (RRM) is a protein domain that binds single-stranded RNA (ssRNA) and is present in as much as 2% of the human genome. Despite this important role in biology, RRM-ssRNA interactions are very challenging to study on the structural level because of the remarkable flexibility of ssRNA. In the absence of atomic-level experimental data, the only method able to predict the 3D structure of protein-ssRNA complexes with any degree of accuracy is ssRNA'TTRACT, an ssRNA fragment-based docking approach using ATTRACT. However, since ATTRACT parameters are not ssRNA-specific and were determined in 2010, there is substantial opportunity for enhancement. RESULTS: Here we present HIPPO, a composite RRM-ssRNA scoring potential derived analytically from contact frequencies in near-native versus non-native docking models. HIPPO consists of a consensus of four distinct potentials, each extracted from a distinct reference pool of protein-trinucleotide docking decoys. To score a docking pose with one potential, for each pair of RNA-protein coarse-grained bead types, each contact is awarded or penalised according to the relative frequencies of this contact distance range among the correct and incorrect poses of the reference pool. Validated on a fragment-based docking benchmark of 57 experimentally solved RRM-ssRNA complexes, HIPPO achieved a threefold or higher enrichment for half of the fragments, versus only a quarter with the ATTRACT scoring function. In particular, HIPPO drastically improved the chance of very high enrichment (12-fold or higher), a scenario where the incremental modelling of entire ssRNA chains from fragments becomes viable. However, for the latter result, more research is needed to make it directly practically applicable. Regardless, our approach already improves upon the state of the art in RRM-ssRNA modelling and is in principle extendable to other types of protein-nucleic acid interactions.


Subjects
Proteins , RNA , Humans , Protein Binding , Proteins/chemistry , RNA/chemistry , Molecular Docking Simulation , Protein Conformation
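
The scoring idea described in the abstract above, rewarding or penalising each contact according to its relative frequency among correct and incorrect poses, can be illustrated with a simple log-ratio potential. This is a simplified sketch and not the published HIPPO parameterisation. The contact encoding (protein bead type, RNA bead type, distance bin), the pseudocount and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Sequence

# A contact is (protein bead type, RNA bead type, distance-bin index).
Contact = tuple[str, str, int]


def contact_frequencies(poses: Iterable[Sequence[Contact]]) -> dict[Contact, float]:
    """Relative frequency of each contact type over a pool of poses."""
    counts = Counter(contact for pose in poses for contact in pose)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {contact: n / total for contact, n in counts.items()}


def derive_potential(near_native: Sequence[Sequence[Contact]],
                     non_native: Sequence[Sequence[Contact]],
                     pseudocount: float = 1e-6) -> dict[Contact, float]:
    """Log-ratio of contact frequencies in correct versus incorrect poses."""
    f_nat = contact_frequencies(near_native)
    f_non = contact_frequencies(non_native)
    return {
        contact: math.log((f_nat.get(contact, 0.0) + pseudocount)
                          / (f_non.get(contact, 0.0) + pseudocount))
        for contact in set(f_nat) | set(f_non)
    }


def score_pose(pose: Sequence[Contact], potential: dict[Contact, float]) -> float:
    """Sum the per-contact terms; unseen contact types contribute zero."""
    return sum(potential.get(contact, 0.0) for contact in pose)


if __name__ == "__main__":
    good = [[("A", "P", 0), ("A", "P", 0)], [("A", "P", 0), ("B", "P", 1)]]
    bad = [[("B", "P", 1), ("B", "P", 1)], [("A", "P", 0), ("B", "P", 1)]]
    potential = derive_potential(good, bad)
    print(score_pose([("A", "P", 0), ("A", "P", 0)], potential))
```

In this sketch a positive score marks a pose whose contacts resemble those of the near-native pool. The abstract describes a consensus of four such potentials, each derived from a different reference pool, which this example does not reproduce.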
11.
J Comput Chem ; 2024 Sep 26.
Article in English | MEDLINE | ID: mdl-39325045

ABSTRACT

Human dihydroorotate dehydrogenase (hDHODH) is a flavin mononucleotide-dependent enzyme that can limit de novo pyrimidine synthesis, making it a therapeutic target for diseases such as autoimmune disorders and cancer. In this study, using the docking structures of complexes generated by AutoDock Vina, we integrate interaction features and ligand features, and employ support vector regression to develop a target-specific scoring function for hDHODH (TSSF-hDHODH). The Pearson correlation coefficient values of TSSF-hDHODH in the cross-validation and external validation are 0.86 and 0.74, respectively, both of which are far superior to those of classic scoring function AutoDock Vina and random forest (RF) based generic scoring function RF-Score. TSSF-hDHODH is further used for the virtual screening of potential inhibitors in the FDA-Approved & Pharmacopeia Drug Library. In conjunction with the results from molecular dynamics simulations, crizotinib is identified as a candidate for subsequent structural optimization. This study can be useful for the discovery of hDHODH inhibitors and the development of scoring functions for additional targets.
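
The Pearson correlation coefficient reported above (0.86 in cross-validation, 0.74 in external validation) measures the linear agreement between predicted and experimental affinities. A minimal pure-Python sketch of the standard formula follows. The function name is illustrative and the code is independent of the TSSF-hDHODH implementation.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    predicted = [5.1, 6.3, 7.0, 8.2]      # hypothetical predicted affinities
    experimental = [5.0, 6.0, 7.5, 8.0]   # hypothetical measured affinities
    print(pearson_r(predicted, experimental))
```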

12.
J Comput Chem ; 45(27): 2333-2346, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-38900052

ABSTRACT

Classical scoring functions may exhibit low accuracy in determining ligand binding affinity for proteins. The availability of both protein-ligand structures and affinity data make it possible to develop machine-learning models focused on specific protein systems with superior predictive performance. Here, we report a new methodology named SAnDReS that combines AutoDock Vina 1.2 with 54 regression methods available in Scikit-Learn to calculate binding affinity based on protein-ligand structures. This approach allows exploration of the scoring function space. SAnDReS generates machine-learning models based on crystal, docked, and AlphaFold-generated structures. As a proof of concept, we examine the performance of SAnDReS-generated models in three case studies. For all three cases, our models outperformed classical scoring functions. Also, SAnDReS-generated models showed predictive performance close to or better than other machine-learning models such as KDEEP, CSM-lig, and ΔVinaRF20. SAnDReS 2.0 is available to download at https://github.com/azevedolab/sandres.


Subjects
Machine Learning , Proteins , Proteins/chemistry , Proteins/metabolism , Ligands , Software , Molecular Docking Simulation
13.
J Comput Chem ; 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39223071

ABSTRACT

Predicting protein-ligand binding affinity is a crucial and challenging task in structure-based drug discovery. With the accumulation of complex structures and binding affinity data, various machine-learning scoring functions, particularly those based on deep learning, have been developed for this task, exhibiting superiority over their traditional counterparts. A fusion model sequentially connecting a graph neural network (GNN) and a convolutional neural network (CNN) to predict protein-ligand binding affinity is proposed in this work. In this model, the intermediate outputs of the GNN layers, as supplementary descriptors of atomic chemical environments at different levels, are concatenated with the input features of CNN. The model demonstrates a noticeable improvement in performance on CASF-2016 benchmark compared to its constituent CNN models. The generalization ability of the model is evaluated by setting a series of thresholds for ligand extended-connectivity fingerprint similarity or protein sequence similarity between the training and test sets. Masking experiment reveals that model can capture key interaction regions. Furthermore, the fusion model is applied to a virtual screening task for a novel target, PI5P4Kα. The fusion strategy significantly improves the ability of the constituent CNN model to identify active compounds. This work offers a novel approach to enhancing the accuracy of deep learning models in predicting binding affinity through fusion strategies.

14.
Brief Bioinform ; 23(3)2022 May 13.
Article in English | MEDLINE | ID: mdl-35289359

ABSTRACT

Scoring functions are important components in molecular docking for structure-based drug discovery. Traditional scoring functions, generally empirical- or force field-based, are robust and have proven to be useful for identifying hits and lead optimizations. Although multiple highly accurate deep learning- or machine learning-based scoring functions have been developed, their direct applications for docking and screening are limited. We describe a novel strategy to develop a reliable protein-ligand scoring function by augmenting the traditional scoring function Vina score using a correction term (OnionNet-SFCT). The correction term is developed based on an AdaBoost random forest model, utilizing multiple layers of contacts formed between protein residues and ligand atoms. In addition to the Vina score, the model considerably enhances the AutoDock Vina prediction abilities for docking and screening tasks based on different benchmarks (such as cross-docking dataset, CASF-2016, DUD-E and DUD-AD). Furthermore, our model could be combined with multiple docking applications to increase pose selection accuracies and screening abilities, indicating its wide usage for structure-based drug discoveries. Furthermore, in a reverse practice, the combined scoring strategy successfully identified multiple known receptors of a plant hormone. To summarize, the results show that the combination of data-driven model (OnionNet-SFCT) and empirical scoring function (Vina score) is a good scoring strategy that could be useful for structure-based drug discoveries and potentially target fishing in future.


Subjects
Drug Discovery , Proteins , Drug Discovery/methods , Ligands , Machine Learning , Molecular Docking Simulation , Protein Binding , Proteins/chemistry
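
The strategy described above, augmenting the Vina score with a data-driven correction term, can be sketched as a re-ranking step. The additive form and the weight below are assumptions made for illustration: the abstract does not state how the two terms are combined. The function names are hypothetical and the code is not taken from OnionNet-SFCT.

```python
from typing import Sequence

# A pose is (pose identifier, Vina score, correction term); lower is better.
Pose = tuple[str, float, float]


def combined_score(vina_score: float, correction: float, weight: float = 0.5) -> float:
    """Assumed additive combination: Vina score plus a weighted correction."""
    return vina_score + weight * correction


def rerank(poses: Sequence[Pose], weight: float = 0.5) -> list[Pose]:
    """Sort poses by the combined score, best (lowest) first."""
    return sorted(poses, key=lambda pose: combined_score(pose[1], pose[2], weight))


if __name__ == "__main__":
    poses = [("pose_a", -8.0, 2.0), ("pose_b", -7.5, 0.0)]
    # With the correction applied, pose_b overtakes pose_a.
    print([pose_id for pose_id, _, _ in rerank(poses)])
```

Setting the weight to zero recovers the plain Vina ranking, which makes it easy to see how much the correction term changes pose selection.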
15.
J Comput Aided Mol Des ; 38(1): 15, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38532176

ABSTRACT

Here, we introduce the use of ANI-ML potentials as a rescoring function for host-guest interactions in molecular docking. Our results show that the "docking power" of ANI potentials can compete with current scoring functions at the same level of computational cost. Benchmarking on the CASF-2016 dataset showed that ANI ranks in the top 5 scoring functions among the other 34 tested. In particular, ANI-predicted interaction energies, when used in conjunction with the GOLD-PLP scoring function, can make the top-ranked solution the one closest to the X-ray structure. Rapid and accurate calculation of interaction energies between ligand and protein also enables screening of millions of drug candidates and docking poses. Using a protocol that combines docking by GOLD-PLP, rescoring by ANI-ML potentials, and extensive MD simulations with end-state free energy methods, we screened FDA-approved drugs against the SARS-CoV-2 main protease (Mpro). The top six drug molecules suggested by the consensus of these free energy methods are already in clinical trials or have been proposed as potential drug molecules in previous theoretical and experimental studies, supporting the validity and accuracy of our screening method.


Subjects
COVID-19 , SARS-CoV-2 , Humans , Molecular Docking Simulation , Protein Binding , Benchmarking , Protease Inhibitors
16.
Molecules ; 29(15)2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39125005

ABSTRACT

Polarization and charge-transfer interactions play an important role in ligand-receptor complexes containing metals, and only quantum mechanics methods can adequately describe their contribution to the binding energy. In this work, we selected a set of benzenesulfonamide ligands of human Carbonic Anhydrase II (hCA II)-an important druggable target containing a Zn2+ ion in the active site-as a case study to predict the binding free energy in metalloprotein-ligand complexes and designed specialized computational methods that combine the ab initio fragment molecular orbital (FMO) method and GRID approach. To reproduce the experimental binding free energy in these systems, we adopted a machine-learning approach, here named formula generator (FG), considering different FMO energy terms, the hydrophobic interaction energy (computed by GRID) and logP. The main advantage of the FG approach is that it can find nonlinear relations between the energy terms used to predict the binding free energy, explicitly showing their mathematical relation. This work showed the effectiveness of the FG approach, and therefore, it might represent an important tool for the development of new scoring functions. Indeed, our scoring function showed a high correlation with the experimental binding free energy (R2 = 0.76-0.95, RMSE = 0.34-0.18), revealing a nonlinear relation between energy terms and highlighting the relevant role played by hydrophobic contacts. These results, along with the FMO characterization of ligand-receptor interactions, represent important information to support the design of new and potent hCA II inhibitors.


Subjects
Carbonic Anhydrase II , Carbonic Anhydrase Inhibitors , Protein Binding , Ligands , Carbonic Anhydrase II/antagonists & inhibitors , Carbonic Anhydrase II/chemistry , Carbonic Anhydrase II/metabolism , Humans , Carbonic Anhydrase Inhibitors/chemistry , Carbonic Anhydrase Inhibitors/pharmacology , Thermodynamics , Hydrophobic and Hydrophilic Interactions , Sulfonamides/chemistry , Sulfonamides/pharmacology , Metalloproteins/chemistry , Metalloproteins/antagonists & inhibitors , Metalloproteins/metabolism , Models, Molecular , Machine Learning , Benzenesulfonamides , Binding Sites
17.
Brief Bioinform ; 22(3)2021 May 20.
Article in English | MEDLINE | ID: mdl-32484221

ABSTRACT

Machine learning-based scoring functions (MLSFs) have attracted extensive attention recently and are expected to be potential rescoring tools for structure-based virtual screening (SBVS). However, a major concern is whether MLSFs trained for generic use rather than for a given target can be applied consistently to VS. In this study, a systematic assessment was carried out to re-evaluate the effectiveness of 14 reported MLSFs in VS. Overall, most of these MLSFs could hardly achieve satisfactory results for any dataset, and they could not even outperform the baseline of classical SFs such as Glide SP. An exception was observed for RFscore-VS trained on the Directory of Useful Decoys-Enhanced dataset, which was superior for most targets. However, in most cases, it showed rather limited performance on targets that were dissimilar to the proteins in the corresponding training sets. We also used the top three docking poses rather than the top one for rescoring and retrained the models with updated versions of the training set, but only minor improvements were observed. Taken together, generic MLSFs may generalize too poorly to be applicable to real VS campaigns. Therefore, caution is warranted when using this type of method for VS.


Subjects
Drug Discovery/methods , Machine Learning , User-Computer Interface , Datasets as Topic , Molecular Docking Simulation , Molecular Structure , Protein Binding
18.
Brief Bioinform ; 22(5)2021 Sep 02.
Article in English | MEDLINE | ID: mdl-33758923

ABSTRACT

Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still a challenge to accurately predict the binding affinity of an arbitrary molecule to a drug target and to prioritize top ligands from an SBVS. In this study, we developed a novel method that uses ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models and significantly improves the screening performance in SBVSs. Such a prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods used before interaction energy calculation and different ML algorithms. Using six drug targets, each with hundreds of known ligands, we critically evaluated the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Their scoring power, ranking power and screening power significantly outperformed those of the Glide SF. First, compared with Glide, the average values of mean absolute error and root mean square error of GBDT/MIN + GB decreased by about 38% and 36%, respectively. Second, the mean values of squared correlation coefficient and predictive index increased by about 225% and 73%, respectively. Third, more encouragingly, the average area under the receiver operating characteristic curve for the six targets obtained by GBDT, 0.87, is significantly better than that obtained by Glide, which is only 0.71. Thus, we expect IP-SFs to have broad and promising applications in SBVSs.


Subjects
Deep Learning , Drug Discovery/methods , Molecular Docking Simulation/methods , Protein Kinases/metabolism , Receptors, G-Protein-Coupled/metabolism , Algorithms , Crystallization , Databases, Protein , Drug Evaluation, Preclinical/methods , Humans , Ligands , Molecular Structure , Protein Binding , Protein Kinases/chemistry , Receptors, G-Protein-Coupled/chemistry
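
The evaluation metrics named in the abstract above (mean absolute error, root mean square error, and area under the receiver operating characteristic curve) can be computed as in the following sketch. It uses the standard textbook definitions and is not code from the study. The AUC is computed as the probability that a randomly chosen active scores higher than a randomly chosen inactive, assuming that higher scores indicate actives and that labels are 1 or 0.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison of actives and inactives (ties count 0.5)."""
    actives = [s for s, label in zip(scores, labels) if label == 1]
    inactives = [s for s, label in zip(scores, labels) if label == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = sum(1.0 if a > i else 0.5 if a == i else 0.0
               for a in actives for i in inactives)
    return wins / (len(actives) * len(inactives))


if __name__ == "__main__":
    print(mae([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
    print(rmse([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise AUC is quadratic in the number of molecules, which is fine for small sets. For large screening libraries a rank-based implementation would be the more practical choice.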
19.
Brief Bioinform ; 22(1): 497-514, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-31982914

ABSTRACT

How to accurately estimate protein-ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, it has been shown that the binding affinities predicted by classical scoring functions (SFs) cannot correlate well with experimentally measured biological activities. In the past few years, machine learning (ML)-based SFs have gradually emerged as potential alternatives and outperformed classical SFs in a series of studies. In this study, to better recognize the potential of classical SFs, we have conducted a comparative assessment of 25 commonly used SFs. Accordingly, the scoring power was systematically estimated by using the state-of-the-art ML methods that replaced the original multiple linear regression method to refit individual energy terms. The results show that the newly-developed ML-based SFs consistently performed better than classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly-developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarities were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed when sufficient similar targets were contained in the training set. Moreover, the effect of the combinations of features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs could yield the best scoring power. However, it was not applicable to derive a generic target-specific SF or SF combination.


Subjects
Drug Development/methods , Machine Learning/standards , Proteomics/methods , Animals , Drug Development/standards , Humans , Ligands , Protein Binding , Proteome/metabolism , Proteomics/standards
20.
Brief Bioinform ; 22(3)2021 May 20.
Article in English | MEDLINE | ID: mdl-32496540

ABSTRACT

Scoring functions (SFs) based on complex machine learning (ML) algorithms have gradually emerged as a promising alternative to overcome the weaknesses of classical SFs. However, extensive efforts have been devoted to the development of SFs based on new protein-ligand interaction representations and advanced alternative ML algorithms instead of the energy components obtained by the decomposition of existing SFs. Here, we propose a new method named energy auxiliary terms learning (EATL), in which the scoring components are extracted and used as the input for the development of three levels of ML SFs including EATL SFs, docking-EATL SFs and comprehensive SFs with ascending VS performance. The EATL approach not only outperforms classical SFs for the absolute performance (ROC) and initial enrichment (BEDROC) but also yields comparable performance compared with other advanced ML-based methods on the diverse subset of Directory of Useful Decoys: Enhanced (DUD-E). The test on the relatively unbiased actives as decoys (AD) dataset also proved the effectiveness of EATL. Furthermore, the idea of learning from SF components to yield improved screening power can also be extended to other docking programs and SFs available.


Subjects
Drug Discovery , Machine Learning , Molecular Docking Simulation , Proteins/chemistry , Protein Binding