Results 1 - 20 of 96
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38385872

ABSTRACT

Drug discovery and development constitute a laborious and costly undertaking. The success of a drug hinges not only on good efficacy but also on acceptable absorption, distribution, metabolism, elimination, and toxicity (ADMET) properties. Overall, up to 50% of drug development failures have been attributed to undesirable ADMET profiles. Because ADMET optimization is a multi-parameter objective, it is extremely challenging owing to the vast chemical space and limited human expert knowledge. In this study, a freely available platform called Chemical Molecular Optimization, Representation and Translation (ChemMORT) is developed for optimizing multiple ADMET endpoints without loss of potency (https://cadd.nscc-tj.cn/deploy/chemmort/). ChemMORT contains three modules: Simplified Molecular Input Line Entry System (SMILES) Encoder, Descriptor Decoder and Molecular Optimizer. The SMILES Encoder generates a 512-dimensional vector representation of a molecule, and the Descriptor Decoder translates this representation back into the corresponding molecular structure with high accuracy. Based on this reversible molecular representation and a particle swarm optimization strategy, the Molecular Optimizer can effectively optimize undesirable ADMET properties without loss of bioactivity, which essentially accomplishes inverse QSAR design. The constrained multi-objective optimization of a poly(ADP-ribose) polymerase-1 inhibitor is provided as a case study to explore the utility of ChemMORT.


Subject(s)
Deep Learning , Humans , Drug Development , Drug Discovery , Poly(ADP-ribose) Polymerase Inhibitors
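ChemMORT's Molecular Optimizer is described as searching the reversible 512-dimensional representation with particle swarm optimization (PSO). The sketch below shows a basic global-best PSO over a continuous vector, under stated assumptions: `toy_penalty` and `TARGET` are hypothetical placeholders for the decode-and-score step (ADMET and potency predictors), and the dimension is reduced to 8 for speed. It is not ChemMORT's implementation.

```python
import random

def particle_swarm_optimize(objective, dim, n_particles=20, n_iters=100,
                            inertia=0.7, c_personal=1.5, c_global=1.5,
                            bounds=(-1.0, 1.0), seed=0):
    """Minimize `objective` over a `dim`-dimensional box with global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward swarm best.
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                                    + c_global * r2 * (global_best[d] - positions[i][d]))
                # Clamp positions so particles stay inside the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score

# Hypothetical objective: squared distance from a made-up "desirable" latent vector.
# A real objective would decode the vector and combine predicted properties.
TARGET = [0.3] * 8

def toy_penalty(vector):
    return sum((v - t) ** 2 for v, t in zip(vector, TARGET))

best_vector, best_penalty = particle_swarm_optimize(toy_penalty, dim=8, n_iters=200)
print(round(best_penalty, 4))
```

PSO only needs a scalar score for each candidate vector, so it can search a learned latent space without gradients of the property predictors.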
2.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36681902

ABSTRACT

Identifying potential targets for known bioactive compounds and novel synthetic analogs is of considerable significance. In silico target fishing (TF) has become an alternative strategy because wet-lab experiments are expensive and laborious, and because of the explosive growth of bioactivity data and the rapid development of high-throughput technologies. However, TF methods are based on different algorithms, molecular representations and training datasets, which may lead to different results when predicting the same query molecules. This can be confusing for practitioners in practical applications. Therefore, this study systematically evaluated nine popular ligand-based TF methods using target-based and ligand-target pair-based statistical strategies, which will help practitioners choose among multiple TF methods. The evaluation showed that SwissTargetPrediction produced the most reliable predictions while enriching more targets. The high-recall similarity ensemble approach (SEA) found real targets for more compounds than the other TF methods. Therefore, SwissTargetPrediction and SEA can be considered primary choices in future studies. In addition, the results showed that k = 5 was the optimal number of experimental candidate targets. Finally, a novel ensemble TF method based on consensus voting is proposed to improve prediction performance. The ensemble TF method achieves higher precision than the individual TF methods, indicating that it can more effectively identify real targets within a given top-k threshold. The results of this study can serve as a reference to guide practitioners in selecting the most effective methods in computational drug discovery.


Subject(s)
Algorithms , Ligands
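The abstract proposes an ensemble TF method based on consensus voting within a top-k threshold. A minimal sketch of one plausible voting scheme, assuming each method returns a ranked target list: each method votes for its own top-k targets, and ties in vote count are broken by mean rank. The method names and the tie-break rule are assumptions, not the paper's exact procedure.

```python
from collections import defaultdict

def consensus_targets(ranked_predictions, k=5):
    """Combine ranked target lists from several TF methods by voting.

    ranked_predictions: dict mapping method name -> list of target IDs, best first.
    Each method votes for its own top-k targets; targets are ordered by vote
    count, then by mean rank among the voting methods, then by ID.
    """
    votes = defaultdict(int)
    ranks = defaultdict(list)
    for targets in ranked_predictions.values():
        for rank, target in enumerate(targets[:k], start=1):
            votes[target] += 1
            ranks[target].append(rank)
    ordered = sorted(votes, key=lambda t: (-votes[t], sum(ranks[t]) / len(ranks[t]), t))
    return ordered[:k]

# Hypothetical ranked outputs from three TF methods for one query molecule.
predictions = {
    "method_a": ["T1", "T2", "T3"],
    "method_b": ["T2", "T1", "T4"],
    "method_c": ["T2", "T5", "T1"],
}
print(consensus_targets(predictions, k=2))  # ['T2', 'T1']
```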
3.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37080761

ABSTRACT

Advances in spatially resolved transcriptomics (ST) technologies help biologists comprehensively understand organ function and the tissue microenvironment. Accurate identification of spatial domains is the foundation for delineating genome heterogeneity and cellular interactions. Motivated by this, a graph deep learning (GDL)-based spatial clustering approach is constructed in this paper. First, a deep graph infomax module embedded with a residual gated graph convolutional neural network is used to process the gene expression profiles and spatial positions in ST data. Then, a Bayesian Gaussian mixture model is applied to the latent embeddings to generate spatial domains. Experiments show that the presented method outperforms other state-of-the-art GDL-enabled techniques on multiple ST datasets. The code and dataset used in this manuscript are available at https://github.com/narutoten520/SCGDL.


Subject(s)
Deep Learning , Transcriptome , Bayes Theorem , Gene Expression Profiling , Cell Communication
4.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37344167

ABSTRACT

Adverse drug events (ADEs) are common in clinical practice and can cause significant harm to patients and increase resource use. Natural language processing (NLP) has been applied to automate ADE detection, but NLP systems become less adaptable when drug entities are missing or multiple medications are specified in clinical narratives. Additionally, no Chinese-language NLP system has been developed for ADE detection, owing to the complexity of Chinese semantics, despite >10 million cases of drug-related adverse events occurring annually in China. To address these challenges, we propose DKADE, a deep learning and knowledge graph-based framework for identifying ADEs. DKADE infers missing drug entities and evaluates their correlations with ADEs by combining medication orders and existing drug knowledge. Moreover, DKADE can automatically screen for new adverse drug reactions. Experimental results show that DKADE achieves an overall F1-score of 91.13%. Furthermore, the adaptability of DKADE is validated using real-world external clinical data. In summary, DKADE is a powerful tool for studying drug safety and automating adverse event monitoring.


Subject(s)
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Humans , Pattern Recognition, Automated , Semantics , Natural Language Processing
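DKADE's performance is reported as an overall F1-score. The abstract does not state the averaging scheme; the sketch below assumes micro-averaging over hypothetical (record, ADE) pairs, which is one common way to obtain a single overall score.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over sets of (record_id, ade_label) pairs."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # pairs found in both annotation and prediction
    fp = len(predicted - gold)   # predicted pairs not in the annotation
    fn = len(gold - predicted)   # annotated pairs the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotations: (clinical record, adverse event) pairs.
gold = {("note1", "rash"), ("note1", "nausea"), ("note2", "headache")}
predicted = {("note1", "rash"), ("note2", "headache"), ("note2", "dizziness")}
print(round(micro_f1(gold, predicted), 4))  # 0.6667
```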
5.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36642412

ABSTRACT

Machine learning-based scoring functions (MLSFs) have become a very favorable alternative to classical scoring functions because of their potentially superior screening performance. However, information on the negative data used to construct MLSFs is rarely reported in the literature, and the putative inactive molecules recorded in existing databases are usually obviously biased relative to active molecules. Here we propose an easy-to-use method named AMLSF that combines active learning using negative molecular selection strategies with an MLSF, which can iteratively improve the quality of inactive sets and thus reduce the false positive rate of virtual screening. We chose energy auxiliary terms learning as the MLSF and validated our method on eight targets in the diverse subset of DUD-E. For each target, we screened the IterBioScreen database with AMLSF and compared the screening results with those of the four control models. The results show that the number of active molecules among the top 1000 molecules identified by AMLSF was significantly higher than the number identified by the control models. In addition, free energy calculations for the top 10 molecules screened out by AMLSF, the null model and the control models based on DUD-E also showed that AMLSF can identify more active molecules and reduce the false positive rate.


Subject(s)
Proteins , Proteins/metabolism , Databases, Factual , Ligands , Molecular Docking Simulation , Protein Binding
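AMLSF is compared with control models by counting active molecules among the top 1000 ranked molecules. A minimal sketch of that count, assuming higher scores mean "more likely active" (negate docking-style scores where lower is better); the example scores and labels are made up.

```python
def actives_in_top_n(scores, is_active, n=1000):
    """Count true actives among the n highest-scoring molecules."""
    # Indices sorted from best (highest score) to worst.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:n] if is_active[i])

# Hypothetical screening output for four molecules.
scores = [0.9, 0.1, 0.8, 0.4]
is_active = [True, False, False, True]
print(actives_in_top_n(scores, is_active, n=2))  # 1
```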
6.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37401373

ABSTRACT

Recent advances and achievements of artificial intelligence (AI), including deep and graph learning models, have established their usefulness in biomedical applications, especially in predicting drug-drug interactions (DDIs). A DDI is a change in the effect of one drug in the presence of another drug in the human body, and DDIs play an essential role in drug discovery and clinical research. Predicting DDIs through traditional clinical trials and experiments is expensive and time-consuming. To apply advanced AI and deep learning correctly, developers and users face various challenges, such as the availability and encoding of data resources and the design of computational methods. This review summarizes chemical structure-based, network-based, natural language processing-based and hybrid methods, providing an updated and accessible guide for the broad research and development community with different domain knowledge. We introduce widely used molecular representations and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods through comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDI prediction.


Subject(s)
Artificial Intelligence , Neural Networks, Computer , Humans , Drug Interactions , Natural Language Processing , Drug Discovery
7.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38243703

ABSTRACT

MOTIVATION: Spatial clustering is essential and challenging in spatial transcriptomics data analysis for unraveling the tissue microenvironment and biological function. Graph neural networks are promising for processing gene expression profiles and spatial location information in spatial transcriptomics to generate latent representations. However, choosing an appropriate graph deep learning module and graph neural network requires further exploration and investigation. RESULTS: In this article, we present GRAPHDeep, which assembles a spatial clustering framework for heterogeneous spatial transcriptomics data. By integrating 2 graph deep learning modules and 20 graph neural networks, the most appropriate combination is determined for each dataset. The constructed spatial clustering method is compared with state-of-the-art algorithms to demonstrate its effectiveness and superiority. The significant new findings include: (i) the number of genes or proteins in spatial omics data is crucial for spatial clustering algorithms; (ii) the variational graph autoencoder is more suitable for spatial clustering tasks than the deep graph infomax module; (iii) UniMP, SAGE, SuperGAT, GATv2, GCN, and TAG are the recommended graph neural networks for spatial clustering tasks; and (iv) the graph neural networks used in existing spatial clustering frameworks are not the best candidates. This study could serve as guidance for choosing an appropriate graph neural network for spatial clustering. AVAILABILITY AND IMPLEMENTATION: The source code of GRAPHDeep is available at https://github.com/narutoten520/GRAPHDeep. The studied spatial omics data are available at https://zenodo.org/record/8141084.


Subject(s)
Algorithms , Gene Expression Profiling , Neural Networks, Computer , Software , Cluster Analysis
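GRAPHDeep selects, for each dataset, the best combination of graph deep learning module and graph neural network. The abstract does not name the selection metric; the sketch assumes the adjusted Rand index (ARI) against reference domain labels, a common choice in spatial clustering benchmarks, and uses hypothetical predicted labels keyed by (module, GNN).

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two flat clusterings of the same spots."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate partitions (e.g. one cluster each)
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

def pick_best_combination(clusterings, labels_true):
    """clusterings: dict mapping (module, gnn) -> predicted domain labels."""
    return max(clusterings,
               key=lambda combo: adjusted_rand_index(labels_true, clusterings[combo]))

# Hypothetical reference domains and outputs of two module/GNN combinations.
truth = [0, 0, 1, 1, 2, 2]
clusterings = {
    ("DGI", "GCN"): [0, 0, 1, 1, 2, 2],
    ("VGAE", "GATv2"): [0, 1, 0, 1, 2, 2],
}
print(pick_best_combination(clusterings, truth))  # ('DGI', 'GCN')
```

ARI ignores label names, so a clustering that matches the reference up to relabeling still scores 1.0.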
8.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35212357

ABSTRACT

In most scientific documents, the structural information of chemical compounds is conveyed by images, which cannot easily be understood and manipulated by computers. This dilemma makes optical chemical structure recognition (OCSR) an essential tool for automatically mining knowledge from the enormous body of literature. However, existing OCSR methods fall far short of practical requirements because of their poor recovery accuracy. In this paper, we developed a deep neural network model named ABC-Net (Atom and Bond Center Network) to predict graph structures directly. Based on the divide-and-conquer principle, we propose to model each atom or bond as a single point at its center. In this way, we can leverage a fully convolutional neural network (CNN) to generate a series of heatmaps that identify these points and predict relevant properties, such as atom types, atom charges, bond types and other properties. The molecular structure can then be recovered by assembling the detected atoms and bonds. Our approach integrates all the detection and property prediction tasks into a single fully convolutional network, which is scalable and can process molecular images quite efficiently. Experimental results demonstrate that our method achieves a significant improvement in recognition performance compared with publicly available tools. The proposed method could be considered a promising solution to OCSR problems and a starting point for acquiring molecular information from the literature.


Subject(s)
Deep Learning , Molecular Structure , Neural Networks, Computer
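ABC-Net predicts heatmaps in which each atom or bond appears as a peak at its center. Turning a heatmap into discrete detections needs a peak-extraction step; the sketch below keeps cells that are the maximum of their 3x3 neighbourhood and exceed a threshold. The threshold, the window size and the heatmap values are assumptions, not ABC-Net's published post-processing.

```python
def find_centers(heatmap, threshold=0.5):
    """Return (row, col, score) for local maxima at or above `threshold`.

    A cell counts as a peak if no cell in its 3x3 neighbourhood (clipped at
    the borders, including the cell itself) has a larger value. Equal-valued
    plateaus can therefore yield several adjacent peaks.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbourhood = [heatmap[rr][cc]
                             for rr in range(max(0, r - 1), min(rows, r + 2))
                             for cc in range(max(0, c - 1), min(cols, c + 2))]
            if value >= max(neighbourhood):
                peaks.append((r, c, value))
    return peaks

# Hypothetical 4x4 atom-center heatmap with two clear peaks.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.7],
    [0.0, 0.0, 0.1, 0.4],
]
print(find_centers(heatmap))  # [(1, 1, 0.9), (2, 3, 0.7)]
```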
9.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34849567

ABSTRACT

MOTIVATION: Understanding chemical-gene interactions (CGIs) is crucial for screening drugs. Wet experiments are usually costly and laborious, which limits relevant studies to a small scale. By contrast, computational studies enable efficient in silico exploration. For the CGI prediction problem, a common method is to perform systematic analyses on a heterogeneous network involving various biomedical entities. Recently, graph neural networks have become popular in the field of relation prediction. However, the inherent heterogeneous complexity of biological interaction networks and the massive amount of data pose enormous challenges. This paper aims to develop a data-driven model that can learn latent information from the interaction network and make correct predictions. RESULTS: We developed BioNet, a deep biological network model with a graph encoder-decoder architecture. The graph encoder uses graph convolution to learn latent information embedded in complex interactions among chemicals, genes, diseases and biological pathways. The learning process consists of two consecutive steps. The embedded information learnt by the encoder is then employed to make multi-type interaction predictions between chemicals and genes with a tensor decomposition decoder based on the RESCAL algorithm. BioNet includes 79 325 entities as nodes and 34 005 501 relations as edges. To train such a massive deep graph model, BioNet introduces a parallel training algorithm that uses multiple Graphics Processing Units (GPUs). The evaluation experiments indicated that BioNet exhibits outstanding prediction performance, with a best area under the receiver operating characteristic (ROC) curve of 0.952, which significantly surpasses state-of-the-art methods. For further validation, the top CGIs predicted by BioNet for cancer and COVID-19 were verified against external curated data and published literature.


Subject(s)
Computational Biology , Computer Simulation , Models, Biological , Neural Networks, Computer
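BioNet decodes chemical-gene interactions with a tensor decomposition decoder based on RESCAL, in which each relation type r has its own matrix W_r and a triple (h, r, t) is scored as e_h^T W_r e_t. The sketch computes that bilinear score in plain Python and, as an assumption, maps it to a probability with a logistic sigmoid; the embeddings, relation names and matrices are hypothetical placeholders for encoder outputs.

```python
import math

def rescal_score(head, relation_matrix, tail):
    """RESCAL bilinear score head^T W_r tail for one (head, relation, tail) triple."""
    return sum(head[i] * relation_matrix[i][j] * tail[j]
               for i in range(len(head))
               for j in range(len(tail)))

def interaction_probability(head, relation_matrix, tail):
    """Map the score to (0, 1) with a logistic sigmoid (an assumed link function)."""
    return 1.0 / (1.0 + math.exp(-rescal_score(head, relation_matrix, tail)))

# Hypothetical 2-dimensional embeddings and one matrix per interaction type.
chemical = [1.0, 2.0]
gene = [3.0, 4.0]
relations = {
    "increases_expression": [[1.0, 0.0], [0.0, 1.0]],
    "binds": [[0.0, 1.0], [0.0, 0.0]],  # asymmetric: direction matters
}
for name, matrix in relations.items():
    print(name, rescal_score(chemical, matrix, gene))
```

Giving every interaction type its own matrix is what lets a single decoder make the multi-type predictions described in the abstract.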
10.
J Chem Inf Model ; 64(8): 3222-3236, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38498003

ABSTRACT

Liver microsomal stability, a crucial aspect of metabolic stability, significantly affects practical drug discovery. However, current models for predicting liver microsomal stability are based on limited molecular information from a single species. To address this limitation, we constructed the largest public database of compounds from three common species: human, rat, and mouse. Subsequently, we developed a series of classification models using both traditional descriptor-based and classic graph-based machine learning (ML) algorithms. The best-performing models for the three species achieved Matthews correlation coefficients (MCCs) of 0.616, 0.603, and 0.574, respectively, on the test set. Furthermore, by constructing consensus models from these individual models, we demonstrated superior predictive performance compared with existing models of the same type. To explore the similarities and differences in liver microsomal stability across species, we conducted preliminary interpretive analyses of the models and misclassified molecules using the Shapley additive explanations (SHAP) and atom heatmap approaches. Additionally, we investigated representative structural modifications and substructures that decrease liver microsomal stability in different species using matched molecular pair analysis (MMPA) and substructure extraction techniques. The established prediction models, along with the interpretive insights into liver microsomal stability, will significantly contribute to improving the efficiency of practical drug discovery and development.


Subject(s)
Artificial Intelligence , Microsomes, Liver , Microsomes, Liver/metabolism , Animals , Mice , Rats , Humans , Machine Learning , Drug Discovery/methods , Pharmaceutical Preparations/metabolism , Pharmaceutical Preparations/chemistry
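The liver microsomal stability models are evaluated with the Matthews correlation coefficient (MCC). The sketch computes MCC from binary labels and, as an assumed illustration of a consensus model, averages the stability probabilities of several models before thresholding at 0.5; the paper's actual consensus rule and label encoding are not given in the abstract.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = stable, 0 = unstable; the encoding is an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def consensus_labels(model_probabilities, threshold=0.5):
    """Average per-compound probabilities across models, then threshold.

    model_probabilities: one list of probabilities per model, aligned by compound.
    """
    n_models = len(model_probabilities)
    averaged = [sum(column) / n_models for column in zip(*model_probabilities)]
    return [1 if p >= threshold else 0 for p in averaged]

# Hypothetical probabilities from three models for two compounds.
probs = [[0.9, 0.2], [0.6, 0.4], [0.3, 0.1]]
print(consensus_labels(probs))  # [1, 0]
```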
11.
J Chem Inf Model ; 64(1): 96-109, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38132638

ABSTRACT

Detecting drug-drug interactions (DDIs) is an essential step in drug development and drug administration. Given the shortcomings of current experimental methods, machine learning (ML) approaches have become a reliable alternative, attracting extensive attention from academia and industry. With the rapid development of computational science and the growing popularity of cross-disciplinary research, a large number of DDI prediction studies based on ML methods have been published in recent years. To give insight into the current situation and future direction of DDI prediction research, we systematically review these studies from three aspects: (1) the classic DDI databases, mainly including databases of drugs, side effects, and DDI information; (2) commonly used drug attributes, focusing on chemical, biological, and phenotypic attributes for representing drugs; (3) popular ML approaches for DDI detection, such as shallow learning-based, deep learning-based, recommender system-based, and knowledge graph-based methods. For each aspect, related studies are described, summarized, and compared. Finally, we summarize the research status of DDI prediction based on ML methods and point out existing issues, future challenges, potential opportunities, and subsequent research directions.


Subject(s)
Knowledge Bases , Machine Learning , Drug Interactions , Pharmaceutical Preparations , Databases, Factual
12.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32496540

ABSTRACT

Scoring functions (SFs) based on complex machine learning (ML) algorithms have gradually emerged as a promising alternative for overcoming the weaknesses of classical SFs. However, extensive efforts have been devoted to developing SFs based on new protein-ligand interaction representations and advanced ML algorithms, rather than on the energy components obtained by decomposing existing SFs. Here, we propose a new method named energy auxiliary terms learning (EATL), in which the scoring components are extracted and used as input to develop three levels of ML SFs, namely EATL SFs, docking-EATL SFs and comprehensive SFs, with ascending virtual screening (VS) performance. The EATL approach not only outperforms classical SFs in absolute performance (ROC) and initial enrichment (BEDROC) but also performs comparably to other advanced ML-based methods on the diverse subset of the Directory of Useful Decoys: Enhanced (DUD-E). A test on the relatively unbiased actives as decoys (AD) dataset also demonstrated the effectiveness of EATL. Furthermore, the idea of learning from SF components to improve screening power can be extended to other available docking programs and SFs.


Subject(s)
Drug Discovery , Machine Learning , Molecular Docking Simulation , Proteins/chemistry , Protein Binding
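EATL reports absolute screening performance as ROC AUC. A minimal sketch of AUC as the probability that a randomly chosen active outscores a randomly chosen decoy (ties count one half), assuming higher scores mean better predicted binding; BEDROC, which adds early-recognition weighting, is not shown, and the example data are made up.

```python
def roc_auc(scores, labels):
    """Pairwise ROC AUC: fraction of (active, decoy) pairs ranked correctly."""
    actives = [s for s, y in zip(scores, labels) if y]
    decoys = [s for s, y in zip(scores, labels) if not y]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5  # ties count half
    return wins / (len(actives) * len(decoys))

# Hypothetical screening scores (1 = active, 0 = decoy).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75
```

The double loop is quadratic and only meant to make the definition explicit; a rank-based computation is preferable for large libraries.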
13.
Brief Bioinform ; 22(1): 474-484, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31885044

ABSTRACT

BACKGROUND: With the continuing development of biotechnology and information technology, publicly available data in chemistry and biology are growing explosively. The wealth of information in these resources needs to be extracted and transformed into useful knowledge by various data mining methods. However, a major computational challenge is how to effectively represent or encode the molecular objects under investigation, such as chemicals, proteins, DNAs and even complicated interactions, when data mining methods are employed. To further explore these complicated data, an integrated toolkit that represents different types of molecular objects and supports various data mining algorithms is urgently needed. RESULTS: We developed a freely available R/CRAN package, called BioMedR, for molecular representations of chemicals, proteins, DNAs and pairwise samples of their interactions. The current version of BioMedR can calculate 293 molecular descriptors and 13 kinds of molecular fingerprints for small molecules; 9920 protein descriptors based on protein sequences and six types of generalized scale-based descriptors for proteochemometric modeling; more than 6000 DNA descriptors from nucleotide sequences; and six types of interaction descriptors using three different combining strategies. Moreover, the package implements five similarity calculation methods and four powerful clustering algorithms, as well as several useful auxiliary tools, with the aim of building an integrated analysis pipeline for data acquisition, data checking, descriptor calculation and data modeling. CONCLUSION: BioMedR provides a comprehensive and uniform R package that links different representations of molecular objects with each other and will benefit cheminformatics, bioinformatics and other biomedical users. It is available at: https://CRAN.R-project.org/package=BioMedR and https://github.com/wind22zhu/BioMedR/.


Subject(s)
Computational Biology/methods , Database Management Systems , Data Management/methods , Databases, Chemical , Databases, Genetic , Humans
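BioMedR offers several similarity calculation methods for molecular fingerprints; the abstract does not list them. As an illustration, the sketch uses the Tanimoto coefficient on fingerprints represented as sets of on-bit indices, written in Python rather than R. It is not BioMedR's API, and treating two empty fingerprints as identical is an assumed convention.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Assumed convention: two empty fingerprints count as identical.
    return len(a & b) / union if union else 1.0

def most_similar(query_fp, library, top=3):
    """Rank library entries (name -> fingerprint) by Tanimoto similarity to the query."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in library.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored[:top]]

# Hypothetical fingerprints.
library = {"a": {1, 2, 3}, "b": {7, 8}, "c": {1, 2, 3, 4}}
print(most_similar({1, 2, 3, 4}, library, top=2))  # [('c', 1.0), ('a', 0.75)]
```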
14.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32892221

ABSTRACT

BACKGROUND: High-throughput screening (HTS) and virtual screening (VS) have been widely used to identify potential hits from large chemical libraries. However, the frequent occurrence of 'noisy compounds' in the screened libraries, such as compounds with poor drug-likeness, poor selectivity or potential toxicity, has greatly weakened the enrichment capability of HTS and VS campaigns. Therefore, comprehensive and credible tools to detect noisy compounds in chemical libraries are urgently needed in the early stages of drug discovery. RESULTS: In this study, we developed a freely available integrated Python library for negative design, called Scopy, which supports data preparation, calculation of descriptors, scaffolds and screening filters, and data visualization. The current version of Scopy can calculate 39 basic molecular properties, 3 comprehensive molecular evaluation scores, 2 types of molecular scaffolds, 6 types of substructure descriptors and 2 types of fingerprints. Scopy also provides a number of important screening rules, including 15 drug-likeness rules (13 drug-likeness rules and 2 building block rules), 8 frequent hitter rules (four assay interference substructure filters and four promiscuous compound substructure filters), and 11 toxicophore filters (five human-related toxicity substructure filters, three environment-related toxicity substructure filters and three comprehensive toxicity substructure filters). Moreover, the library supports four visualization functions to help users better understand the screened data: a basic feature radar chart, a feature-feature-related scatter diagram, a functional group marker gram and a cloud gram. CONCLUSION: Scopy provides a comprehensive Python package to filter out compounds with undesirable properties or substructures, which will benefit the design of high-quality chemical libraries for drug design and discovery. It is freely available at https://github.com/kotori-y/Scopy.


Subject(s)
Databases, Pharmaceutical/statistics & numerical data , Drug Design , Drug Development/methods , High-Throughput Screening Assays/methods , Small Molecule Libraries , Biological Products/chemistry , Computational Biology/methods , Drug Discovery/methods , Drug Stability , Humans , Molecular Structure , Pharmaceutical Preparations/chemistry , Reproducibility of Results , Research Design
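Scopy's screening rules include drug-likeness rules. As an example of how such a rule filters compounds, the sketch applies Lipinski's rule of five (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors, with at most one violation allowed) to precomputed properties. It does not use Scopy's API, and the compound names and property values are hypothetical.

```python
def lipinski_violations(mol_weight, logp, h_bond_donors, h_bond_acceptors):
    """Count rule-of-five violations from precomputed properties."""
    checks = [
        mol_weight > 500,       # molecular weight in daltons
        logp > 5,               # octanol-water partition coefficient
        h_bond_donors > 5,
        h_bond_acceptors > 10,
    ]
    return sum(checks)  # True counts as 1

def passes_rule_of_five(properties, max_violations=1):
    return lipinski_violations(**properties) <= max_violations

# Hypothetical compounds with made-up property values.
library = {
    "cpd_1": {"mol_weight": 320.4, "logp": 2.1, "h_bond_donors": 2, "h_bond_acceptors": 5},
    "cpd_2": {"mol_weight": 612.7, "logp": 6.3, "h_bond_donors": 3, "h_bond_acceptors": 9},
}
kept = [name for name, props in library.items() if passes_rule_of_five(props)]
print(kept)  # ['cpd_1']
```

In practice the properties would come from a cheminformatics toolkit; keeping the rule separate from property calculation makes it easy to stack several such filters.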
15.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34427296

ABSTRACT

Computational methods have become indispensable tools for accelerating the drug discovery process and alleviating the excessive dependence on time-consuming and labor-intensive experiments. Traditional feature-engineering approaches rely heavily on expert knowledge to devise useful features, which can be costly and sometimes biased. Emerging deep learning (DL) methods offer a data-driven way to automatically learn expressive representations from complex raw data. Inspired by this, researchers have attempted to apply various deep neural network models to simplified molecular input line entry specification (SMILES) strings, which contain all the compositional and structural information of molecules. However, current models usually suffer from a scarcity of labeled data. This results in low generalization ability of SMILES-based DL models, which prevents them from competing with state-of-the-art computational methods. In this study, we used the BiLSTM (bidirectional long short-term memory) attention network (BAN), in which we employed a novel multi-step attention mechanism to facilitate the extraction of key features from SMILES strings. Meanwhile, SMILES enumeration was used as a data augmentation method in the training phase to substantially increase the amount of labeled data and the likelihood of mining more patterns from complex SMILES. We also used SMILES enumeration in the prediction phase to rectify model prediction bias and provide more accurate predictions. Combined with the BAN model, our strategies can greatly improve the quality of latent features learned from SMILES strings. On 11 canonical absorption, distribution, metabolism, excretion and toxicity (ADMET)-related tasks, our method outperformed state-of-the-art approaches.


Subject(s)
Cheminformatics/methods , Deep Learning , Drug Discovery/methods , Software , Algorithms , Drug Development , Research Design
16.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33418563

ABSTRACT

Matched molecular pair analysis (MMPA) has become a powerful tool for automatically and systematically identifying medicinal chemistry transformations from compound/property datasets. However, accurate determination of matched molecular pair (MMP) transformations largely depends on the size and quality of existing experimental data. A lack of high-quality experimental data severely hampers the extraction of more effective medicinal chemistry knowledge. Here, we developed a new strategy called quantitative structure-activity relationship (QSAR)-assisted-MMPA to expand the number of chemical transformations, and took the logD7.4 property endpoint as an example to demonstrate the reliability of the new method. A reliable logD7.4 consensus prediction model was first established, and its applicability domain was strictly assessed. By applying this model to screen two chemical databases and defining a strict applicability domain threshold, we obtained more high-quality logD7.4 data. Then, MMPA was performed on the predicted and experimental data to derive more chemical rules. To validate the reliability of the chemical rules, we compared the magnitude and directionality of the property changes of the predicted rules with those of the measured rules. We then compared the novel chemical rules generated by our approach with published chemical rules and found that the magnitude and directionality of the property changes were consistent, indicating that the proposed QSAR-assisted-MMPA approach has the potential to enrich the collection of rule types or even identify completely novel rules. Finally, we found that the number of MMP rules derived from the experimental data could be amplified by the predicted data, which helps in analyzing medicinal chemistry rules in a local chemical environment. In summary, the proposed QSAR-assisted-MMPA approach could be regarded as a very promising strategy for expanding the chemical transformation space for lead optimization, especially when there are not enough experimental data to support MMPA.


Subject(s)
Chemistry Techniques, Synthetic/methods , Chemistry, Pharmaceutical/methods , Drug Discovery/methods , Drugs, Investigational/chemical synthesis , Models, Statistical , Biotransformation , Databases, Chemical , Datasets as Topic , Drug Discovery/statistics & numerical data , Drugs, Investigational/metabolism , Humans , Molecular Structure , Quantitative Structure-Activity Relationship , Reproducibility of Results
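In QSAR-assisted-MMPA, predictions are kept only inside a strict applicability domain, and rules are then characterized by the magnitude and direction of the property change. The sketch assumes matched pairs have already been generated (molecule fragmentation is omitted) and that each prediction carries a hypothetical applicability-domain distance; it summarizes each transformation by pair count, median change in logD7.4 and fraction of pairs with an increase. The transformation labels and values are made up.

```python
from collections import defaultdict
from statistics import median

def within_domain(predictions, max_distance):
    """Keep predicted values whose applicability-domain distance is at most max_distance.

    predictions: dict molecule_id -> (predicted_logd, ad_distance); the distance
    measure itself is a hypothetical placeholder.
    """
    return {mol: value for mol, (value, distance) in predictions.items()
            if distance <= max_distance}

def summarize_transformations(pairs, min_pairs=3):
    """Summarize matched pairs as rules.

    pairs: iterable of (transformation, delta) with delta = property(B) - property(A).
    Returns transformation -> (n_pairs, median_delta, fraction_increasing).
    """
    grouped = defaultdict(list)
    for transformation, delta in pairs:
        grouped[transformation].append(delta)
    rules = {}
    for transformation, deltas in grouped.items():
        if len(deltas) < min_pairs:  # too few pairs to call it a rule
            continue
        increases = sum(1 for d in deltas if d > 0)
        rules[transformation] = (len(deltas), median(deltas), increases / len(deltas))
    return rules

predictions = {"m1": (1.2, 0.3), "m2": (2.5, 0.9)}
pairs = [("H->F", 0.2), ("H->F", 0.4), ("H->F", -0.1), ("H->OH", -0.8), ("H->OH", -0.6)]
print(within_domain(predictions, max_distance=0.5))  # {'m1': 1.2}
print(summarize_transformations(pairs))
```

Comparing the median and direction fraction of a rule derived from predicted data with the same rule from measured data mirrors the validation step described in the abstract.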
17.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33709154

ABSTRACT

BACKGROUND: Substructure screening is widely applied to evaluate the molecular potency and ADMET properties of compounds in drug discovery pipelines, and it can also be used to interpret QSAR models for the design of new compounds with desirable physicochemical and biological properties. With the continuous accumulation of experimental data, data-driven computational systems that can derive representative substructures from large chemical libraries are attracting more attention. Therefore, an integrated and convenient tool to generate and apply representative substructures is urgently needed. RESULTS: In this study, we developed PySmash, a user-friendly and powerful tool to generate different types of representative substructures. The current version of PySmash provides both a Python package and a standalone executable program, enabling ease of operation and pipeline integration. Three types of substructure generation algorithms are provided: circular, path-based and functional group-based algorithms. During execution, users can conveniently customize their own requirements for substructure size, accuracy and coverage, statistical significance and parallel computation. PySmash also supports screening of external data. CONCLUSION: PySmash, a user-friendly and integrated tool for the automatic generation and implementation of representative substructures, is presented. Three screening examples, namely toxicophore derivation, privileged motif detection and the integration of substructures with machine learning (ML) models, illustrate the utility of PySmash in safety profile evaluation, therapeutic activity exploration and molecular optimization, respectively. Its executable program and Python package are available at https://github.com/kotori-y/pySmash.


Subject(s)
Computational Biology/methods , Drug Discovery/methods , Machine Learning , Software , Carcinogenicity Tests/methods , Carcinogens , Drug Screening Assays, Antitumor/methods , Humans
18.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33951729

ABSTRACT

MOTIVATION: Accurate and efficient prediction of molecular properties is a fundamental problem in drug design and discovery pipelines. Traditional feature engineering-based approaches require extensive expertise in feature design and selection. With the development of artificial intelligence (AI) technologies, data-driven methods have shown clear advantages over feature engineering-based methods in various domains. Nevertheless, when applied to molecular property prediction, AI models usually suffer from a scarcity of labeled data and generalize poorly. RESULTS: In this study, we proposed molecular graph BERT (MG-BERT), which integrates the local message-passing mechanism of graph neural networks (GNNs) into the BERT model to facilitate learning from molecular graphs. We also proposed an effective self-supervised learning strategy, masked atoms prediction, to pretrain MG-BERT on a large amount of unlabeled data and mine context information in molecules. We found that, after pretraining, MG-BERT generates context-sensitive atomic representations and transfers the learned knowledge to the prediction of a variety of molecular properties. The experimental results show that the pretrained MG-BERT model, with little additional fine-tuning, consistently outperforms state-of-the-art methods on all 11 ADMET datasets. Moreover, MG-BERT uses attention mechanisms to focus on the atomic features essential to the target property, which makes the trained model highly interpretable. MG-BERT requires no hand-crafted features as input and is more reliable owing to its interpretability, providing a novel framework for developing state-of-the-art models for a wide range of drug discovery tasks.
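The masked atoms prediction task described above can be sketched as a corruption step over a molecule's atom tokens. This sketch assumes the 15% selection rate and the 80/10/10 mask/random/keep split from the original BERT recipe; the abstract does not state MG-BERT's exact ratios, and the `mask_atoms` function and the token vocabulary are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

MASK = "[MASK]"  # special token standing in for a hidden atom


def mask_atoms(
    atoms: List[str],
    vocab: List[str],
    rate: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], Dict[int, str]]:
    """BERT-style corruption of atom tokens (assumed ratios, see lead-in).

    Returns the corrupted sequence and a {position: original atom} map that
    serves as the prediction targets during pretraining.
    """
    rng = rng or random.Random()
    n_select = max(1, round(len(atoms) * rate))
    positions = rng.sample(range(len(atoms)), n_select)
    corrupted = list(atoms)
    for pos in positions:
        r = rng.random()
        if r < 0.8:
            corrupted[pos] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            corrupted[pos] = rng.choice(vocab)  # 10%: replace with a random atom
        # remaining 10%: leave the atom unchanged
    targets = {pos: atoms[pos] for pos in positions}
    return corrupted, targets


atoms = ["C", "C", "O", "C", "N", "C", "C", "O", "C", "C",
         "N", "C", "C", "O", "C", "C", "S", "C", "C", "C"]  # 20 toy atom tokens
corrupted, targets = mask_atoms(atoms, vocab=["C", "N", "O", "S"], rng=random.Random(0))
print(corrupted)
print(targets)
```

During pretraining, the model sees `corrupted` and is trained to recover the atoms in `targets`; atoms outside the selected positions contribute no loss.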


Subject(s)
Models, Theoretical , Neural Networks, Computer
19.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33201188

ABSTRACT

BACKGROUND: Fluorescent detection methods are indispensable tools for chemical biology. However, the frequent occurrence of potentially fluorescent compounds greatly interferes with the recognition of compounds with genuine activity. Such fluorescence interference is especially difficult to identify because it is reproducible and concentration-dependent. A credible screening tool to detect fluorescent compounds in chemical libraries is therefore urgently needed in the early stages of drug discovery. RESULTS: In this study, we developed ChemFLuo, a webserver for fluorescent compound detection, based on two large, high-quality training datasets containing 4906 blue and 8632 green fluorescent compounds. These molecules were used to construct a group of prediction models combining three machine learning algorithms with seven types of molecular representations. The best blue fluorescence prediction model achieved a balanced accuracy (BA) of 0.858 and an area under the receiver operating characteristic curve (AUC) of 0.931 on the validation set, and BA = 0.823 and AUC = 0.903 on the test set. The best green fluorescence prediction model achieved BA = 0.810 and AUC = 0.887 on the validation set, and BA = 0.771 and AUC = 0.852 on the test set. In addition to the prediction models, 22 blue and 16 green representative fluorescent substructures were summarized for screening potential fluorescent compounds. Comparison with other fluorescence detection tools and application to external validation sets and large molecule libraries demonstrated the reliability of the prediction models for fluorescent compound detection. CONCLUSION: ChemFLuo is a public webserver for filtering out compounds with undesirable fluorescent properties, which will benefit the design of high-quality chemical libraries for drug discovery. It is freely available at http://admet.scbdd.com/chemfluo/index/.
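For reference, the two metrics reported for ChemFLuo can be computed as follows. Balanced accuracy is the mean of sensitivity and specificity, and AUC is computed here through its rank interpretation: the probability that a randomly chosen fluorescent compound scores higher than a randomly chosen non-fluorescent one, with ties counted as one half. This is a generic sketch with toy labels, not code from the ChemFLuo webserver.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]  # 1 = fluorescent, 0 = non-fluorescent (toy labels)
y_pred = [1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(balanced_accuracy(y_true, y_pred), roc_auc(y_true, scores))  # 0.75 0.75
```

The pairwise loop is quadratic and only meant to make the definition explicit; for libraries of thousands of compounds, a sort-based implementation such as scikit-learn's `roc_auc_score` is the usual choice. Balanced accuracy is preferred over plain accuracy here because fluorescent and non-fluorescent compounds are typically imbalanced.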


Subject(s)
Drug Discovery , Fluorescent Dyes/chemistry , Machine Learning , Models, Chemical , Small Molecule Libraries , Fluorescence
20.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33940596

ABSTRACT

Poly (ADP-ribose) polymerase-1 (PARP1) has been regarded as a vital target in recent years, and PARP1 inhibitors are used in ovarian and breast cancer therapy. However, most PARP1 inhibitors suffer from low solubility and permeability. Discovering molecules with novel frameworks would therefore create opportunities to apply PARP1 inhibitors in broader clinical settings. In the present study, multiple virtual screening (VS) methods were employed to evaluate the screening efficiency of ligand-based, structure-based and data fusion methods on the PARP1 target. The VS methods include 2D similarity screening, structure-activity relationship (SAR) models, docking and complex-based pharmacophore screening. In addition, sum rank, sum score and reciprocal rank were adopted as data fusion methods. The evaluation results show that similarity searching based on the Torsion fingerprint, six SAR models, Glide docking and pharmacophore screening using Phase have excellent screening performance. The best data fusion method is reciprocal rank, although sum score also performs well in framework enrichment. In general, the ligand-based VS methods perform better for PARP1 inhibitor screening. These findings confirm that adding ligand-based methods to the early screening stage will greatly improve screening efficiency and enrich more highly active PARP1 inhibitors with diverse structures.
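The three data fusion rules named above can be sketched over per-method score tables. The exact formulas used in the study are not given in the abstract. This sketch assumes that each method's scores are oriented so that higher is better (docking scores, for instance, would need their sign flipped first), that sum score uses min-max normalization per method, and that reciprocal rank is a plain sum of 1/rank without the smoothing constant used in reciprocal rank fusion. The function names and compound IDs are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # compound ID -> score from one VS method (higher = better)


def to_ranks(scores: Scores) -> Dict[str, int]:
    """Rank 1 goes to the highest-scoring compound."""
    ordered = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    return {cid: rank for rank, cid in enumerate(ordered, start=1)}


def min_max(scores: Scores) -> Scores:
    """Rescale one method's scores to [0, 1] so methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {cid: (s - lo) / span for cid, s in scores.items()}


def fuse(methods: List[Scores], rule: str) -> List[Tuple[str, float]]:
    """Combine several methods' results; returns (compound, fused value), best first."""
    compounds = list(methods[0])
    if rule == "sum_rank":
        ranks = [to_ranks(m) for m in methods]
        fused = {c: float(sum(r[c] for r in ranks)) for c in compounds}
        return sorted(fused.items(), key=lambda kv: kv[1])  # lower total rank is better
    if rule == "reciprocal_rank":
        ranks = [to_ranks(m) for m in methods]
        fused = {c: sum(1.0 / r[c] for r in ranks) for c in compounds}
    elif rule == "sum_score":
        normed = [min_max(m) for m in methods]
        fused = {c: sum(n[c] for n in normed) for c in compounds}
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


similarity = {"c1": 0.9, "c2": 0.5, "c3": 0.1}
pharmacophore = {"c1": 0.2, "c2": 0.8, "c3": 0.6}
for rule in ("sum_rank", "sum_score", "reciprocal_rank"):
    print(rule, [cid for cid, _ in fuse([similarity, pharmacophore], rule)])
```

Rank-based rules ignore how far apart the raw scores are, while sum score keeps that information but depends on the normalization chosen; reciprocal rank additionally rewards compounds that any single method places near the top.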


Subject(s)
Databases, Chemical , Molecular Docking Simulation , Poly (ADP-Ribose) Polymerase-1/antagonists & inhibitors , Poly(ADP-ribose) Polymerase Inhibitors/chemistry , Drug Evaluation, Preclinical , Humans , Poly (ADP-Ribose) Polymerase-1/chemistry , Structure-Activity Relationship