Results 1-20 of 266
1.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37903414

ABSTRACT

The drug discovery process can be significantly improved by applying deep reinforcement learning (RL) methods that learn to generate compounds with desired pharmacological properties. Nevertheless, RL-based methods typically condense the evaluation of sampled compounds into a single scalar value, making it difficult for the generative agent to learn the optimal policy. This work combines self-attention mechanisms and RL to generate promising molecules. The idea is to evaluate the relative significance of each atom and functional group in their interaction with the target, and to use this information to optimize the Generator. The framework for de novo drug design is therefore composed of a Generator that samples new compounds, combined with a Transformer-encoder and a biological affinity Predictor that evaluate the generated structures. Moreover, it takes advantage of the knowledge encapsulated in the Transformer's attention weights to evaluate each token individually. We compared the performance of two output prediction strategies for the Transformer: standard and masked language model (MLM). The results show that the MLM Transformer is more effective in optimizing the Generator than state-of-the-art approaches. Additionally, the evaluation models identified the most important regions of each molecule for the biological interaction with the target. As a case study, we generated synthesizable hit compounds that are putative inhibitors of the enzyme ubiquitin-specific protein 7 (USP7).
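
The abstract does not specify how the attention weights are turned into token-level feedback. A minimal sketch, assuming one simple rule that is not necessarily the authors': split the Predictor's scalar reward across tokens in proportion to the total attention each token receives, then use those per-token rewards in a REINFORCE-style loss. The `heads` input layout and both function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def token_rewards(scalar_reward: float, heads: List[Matrix]) -> List[float]:
    """Spread one scalar reward over tokens in proportion to the attention they receive.

    heads[h][q][k] is the (hypothetical) attention weight from query token q to
    key token k in head h; each row is assumed to sum to 1.
    """
    n = len(heads[0])
    received = [0.0] * n
    for head in heads:
        for row in head:
            for k, w in enumerate(row):
                received[k] += w  # total attention paid to token k
    total = sum(received)
    return [scalar_reward * r / total for r in received]


def reinforce_loss(log_probs: List[float], rewards: List[float]) -> float:
    """REINFORCE-style loss using a reward per token instead of one episode reward."""
    return -sum(lp * r for lp, r in zip(log_probs, rewards))
```

Tokens that the encoder attends to more strongly receive a larger share of the reward, so the Generator gets a denser signal than a single number per molecule.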


Subject(s)
Drug Design, Learning, Drug Discovery
2.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38033290

ABSTRACT

Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, inversibility for generative applications and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.


Subject(s)
Drug Discovery, Intuition, Humans, Learning
3.
BMC Bioinformatics ; 25(1): 225, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38926641

ABSTRACT

PURPOSE: Large Language Models (LLMs) such as Generative Pre-trained Transformer (GPT) from OpenAI and LLaMA (Large Language Model Meta AI) from Meta AI are increasingly recognized for their potential in cheminformatics, particularly in understanding the Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs can also encode SMILES strings into vector representations. METHOD: We compare how well GPT and LLaMA embed SMILES strings for downstream tasks against models pre-trained on SMILES, focusing on two key applications: molecular property prediction and drug-drug interaction (DDI) prediction. RESULTS: We find that SMILES embeddings generated using LLaMA outperform those from GPT in both molecular property and DDI prediction tasks. Notably, LLaMA-based SMILES embeddings show results comparable to models pre-trained on SMILES in molecular property prediction tasks and outperform them in DDI prediction tasks. CONCLUSION: The performance of LLMs in generating SMILES embeddings shows great potential for further investigation of these models for molecular embedding. We hope our study bridges the gap between LLMs and molecular embedding, motivating additional research into the potential of LLMs in the molecular representation field. GitHub: https://github.com/sshaghayeghs/LLaMA-VS-GPT .


Subject(s)
Cheminformatics, Cheminformatics/methods, Drug Interactions, Molecular Structure
4.
BMC Bioinformatics ; 25(1): 255, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39090573

ABSTRACT

BACKGROUND: Drug discovery and development is the extremely costly and time-consuming process of identifying new molecules that can interact with a biomarker target to interrupt the disease pathway of interest. In addition to binding the target, a drug candidate needs to satisfy multiple properties affecting absorption, distribution, metabolism, excretion, and toxicity (ADMET). Artificial intelligence approaches provide an opportunity to improve each step of the drug discovery and development process, and the first question is how to represent a molecule informatively so that in-silico solutions are optimized. RESULTS: This study introduces a novel hybrid SMILES-fragment tokenization method, coupled with two pre-training strategies, using a Transformer-based model. We investigate the efficacy of hybrid tokenization in improving the performance of ADMET prediction tasks. Our approach leverages MTL-BERT, an encoder-only Transformer model that achieves state-of-the-art ADMET predictions, and contrasts the standard SMILES tokenization with our hybrid method across a spectrum of fragment library cutoffs. CONCLUSION: The findings reveal that while an excess of fragments can impede performance, hybrid tokenization with high-frequency fragments improves results beyond the base SMILES tokenization. This advancement underscores the potential of integrating fragment- and character-level molecular features within the training of Transformer models for ADMET property prediction.
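
Hybrid tokenization can be sketched as greedy longest-match against a fragment vocabulary, falling back to atom-level SMILES tokens where no fragment matches. This is a hypothetical sketch, not MTL-BERT's code: fragments are treated as plain SMILES substrings, the frequency cutoff is the illustrative `min_count` parameter, and the regular expression is a commonly used atom-level SMILES pattern.

```python
import re
from collections import Counter
from typing import List

# Commonly used atom-level SMILES pattern: bracket atoms, Br/Cl, aromatic atoms, bonds, ring labels.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def char_tokens(smiles: str) -> List[str]:
    """Standard (character/atom-level) SMILES tokenization."""
    return SMILES_REGEX.findall(smiles)


def build_fragment_vocab(fragment_counts: Counter, min_count: int) -> List[str]:
    """Keep high-frequency fragments only, longest first for greedy matching."""
    kept = [f for f, c in fragment_counts.items() if c >= min_count]
    return sorted(kept, key=len, reverse=True)


def hybrid_tokenize(smiles: str, fragments: List[str]) -> List[str]:
    """Emit a whole fragment when one starts here, otherwise a single atom-level token."""
    tokens, i = [], 0
    while i < len(smiles):
        frag = next((f for f in fragments if smiles.startswith(f, i)), None)
        if frag is not None:
            tokens.append(frag)
            i += len(frag)
            continue
        m = SMILES_REGEX.match(smiles, i)
        if m is None:
            raise ValueError(f"cannot tokenize {smiles!r} at position {i}")
        tokens.append(m.group(0))
        i = m.end()
    return tokens
```

With `["c1ccccc1", "C(=O)O"]` as the vocabulary, aspirin (`CC(=O)Oc1ccccc1C(=O)O`) becomes `['C', 'C(=O)O', 'c1ccccc1', 'C(=O)O']`. Lowering `min_count` admits more, rarer fragments, which is roughly analogous to the fragment-library cutoffs varied in the study.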


Subject(s)
Drug Discovery, Drug Discovery/methods, Artificial Intelligence, Small Molecule Libraries/chemistry, Small Molecule Libraries/pharmacology, Computer Simulation, Pharmaceutical Preparations/metabolism, Pharmaceutical Preparations/chemistry
5.
BMC Bioinformatics ; 25(1): 47, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38291362

ABSTRACT

Drug-drug interactions (DDI) are a critical concern in healthcare due to their potential to cause adverse effects and compromise patient safety. Supervised machine learning models for DDI prediction need to be optimized to learn abstract, transferable features and to generalize to larger chemical spaces, primarily because high-quality labeled DDI data are scarce. Inspired by recent advances in computer vision, we present SMR-DDI, a self-supervised framework that leverages contrastive learning to embed drugs into a scaffold-based feature space. Molecular scaffolds represent the core structural motifs that drive pharmacological activities, making them valuable for learning informative representations. Specifically, we pre-trained SMR-DDI on a large-scale unlabeled molecular dataset. We generated augmented views of each molecule via SMILES enumeration and optimized the embedding process by minimizing a contrastive loss between views. This enables the model to capture relevant and robust molecular features while reducing noise. We then transfer the learned representations to the downstream prediction of DDI. Experiments show that the new feature space has expressivity comparable to state-of-the-art molecular representations and achieves competitive DDI prediction results while training on less data. Additional investigations also revealed that pre-training on larger and more diverse unlabeled molecular datasets improved the model's capability to embed molecules effectively. Our results highlight contrastive learning as a promising approach for DDI prediction that can identify potentially hazardous drug combinations using only structural information.
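
The abstract says only that a contrastive loss is minimized between SMILES-enumerated views. A common choice for this setup is the NT-Xent loss popularized by SimCLR; the sketch below assumes it (NumPy, forward pass only, no encoder or training loop), with row `i` of `z1` and `z2` embedding two enumerations of the same molecule and every other row in the batch acting as a negative.

```python
import numpy as np


def nt_xent(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """NT-Xent loss for paired embeddings of shape (batch, dim)."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors, so dot product = cosine
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a view is never its own positive or negative
    n = z1.shape[0]
    # Positive partner of row i is i + n (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

When the two views of each molecule embed close together, the loss is small; unrelated pairs push it towards `log(2 * batch - 1)`.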


Subject(s)
Drug-Related Side Effects and Adverse Reactions, Humans, Drug Interactions, Supervised Machine Learning
6.
J Comput Chem ; 45(27): 2308-2317, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-38850166

ABSTRACT

Here, TS-tools is presented, a Python package that automates the localization of transition states (TS) from a textual reaction SMILES input. TS searches can be performed at either the xTB or the DFT level of theory; the former yields guesses at marginal computational cost, and the latter directly yields accurate structures at greater expense. On a benchmarking dataset of mono- and bimolecular reactions, TS-tools already reaches an excellent success rate of 95% at the xTB level of theory. For tri- and multimolecular reaction pathways (which are typically not benchmarked when developing new automated TS search approaches, yet are relevant for various types of reactivity, cf. solvent- and autocatalysis and enzymatic reactivity), TS-tools retains its ability to identify TS geometries, though a DFT treatment becomes essential in many cases. Throughout the presented applications, particular emphasis is placed on solvation-induced mechanistic changes, another issue that has so far received limited attention in the automated TS search literature.

7.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35062019

ABSTRACT

In the past few decades, chronic hepatitis B caused by hepatitis B virus (HBV) has been one of the most serious threats to human health. The development of innovative systems is essential for preventing the complex pathogenesis of hepatitis B and reducing drug side effects. HBV inhibitory drugs have been developed from various compounds, but their development is often limited by routine experimental screening, which delays drug development. More recently, virtual screening of compounds, backed by strong computational capability, has gradually been adopted in drug research and applied to anti-HBV drug screening, facilitating a reliable drug screening process. However, the lack of structural information in traditional compound analysis is an important cause of unsatisfactory drug screening efficiency. Here, a natural language processing technique was adopted to analyze compound simplified molecular input line entry system (SMILES) strings. By pretraining a targeted, optimized word2vec model, we can accurately represent the relationship between a compound and its substructures. A machine learning model built on the pretraining results can effectively predict the inhibitory effect of compounds on HBV and their liver toxicity. The reliability of the model is verified by wet-lab experiments. In addition, a tool has been published to predict potential compounds. Hence, this article provides a new perspective on the prediction of compound properties for anti-HBV drugs that can help improve hepatitis B diagnosis and human health in the future.


Subject(s)
Hepatitis B virus, Hepatitis B, Antiviral Agents/pharmacology, Antiviral Agents/therapeutic use, Drug Discovery/methods, Hepatitis B/drug therapy, Humans, Reproducibility of Results
8.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35325050

ABSTRACT

DNA N6-methyladenine (6mA) is produced by methylation of adenine at the N6 position, occurs at the molecular level, and is involved in numerous vital biological processes in the rice genome. Given the shortcomings of biological experiments, researchers have developed many computational methods to predict 6mA sites and achieved good performance. However, the existing methods do not consider the occurrence mechanism of 6mA when extracting features from the molecular structure. In this paper, a novel deep learning method for 6mA site prediction in rice, named MGF6mARice, is proposed, built on a DNA molecular graph feature and a residual block structure. First, the DNA sequence is converted into simplified molecular input line entry system (SMILES) format, which reflects the chemical molecular structure. Second, from the molecular structure data, we construct the DNA molecular graph feature based on the principle of graph convolutional networks. Then, the residual block is designed to extract higher-level, distinguishable features from the molecular graph features. Finally, the prediction module determines whether a site is a 6mA site. In 10-fold cross-validation, MGF6mARice outperforms state-of-the-art approaches. Multiple experiments show that the molecular graph feature and residual block improve the performance of MGF6mARice in 6mA prediction. To the best of our knowledge, this is the first time a feature of DNA sequence has been derived by considering the chemical molecular structure. We hope that MGF6mARice will help researchers analyze 6mA sites in rice.


Subject(s)
Delayed Emergence from Anesthesia, Oryza, Adenine, DNA/genetics, DNA Methylation, Delayed Emergence from Anesthesia/genetics, Oryza/genetics
9.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35438145

ABSTRACT

Molecular property prediction models based on machine learning algorithms have become important tools to triage unpromising lead molecules in the early stages of drug discovery. Compared with the mainstream descriptor- and graph-based methods for molecular property prediction, SMILES-based methods can directly extract molecular features from SMILES without human expert knowledge, but they require more powerful algorithms for feature extraction and more training data, which makes SMILES-based methods less popular. Here, we show the great potential of pre-training for improving the prediction of important pharmaceutical properties. Using three pre-training tasks based on atom feature prediction, molecular feature prediction and contrastive learning, we developed a new pre-training method, K-BERT, which can extract chemical information from SMILES as chemists do. Results on 15 pharmaceutical datasets show that K-BERT outperforms well-established descriptor-based (XGBoost) and graph-based (Attentive FP and HRGCN+) models. In addition, we found that the contrastive learning pre-training task enables K-BERT to 'understand' SMILES beyond canonical SMILES. Moreover, the general fingerprints K-BERT-FP generated by K-BERT exhibit predictive power comparable to MACCS on the 15 pharmaceutical datasets and can also capture molecular size and chirality information that traditional binary fingerprints cannot capture. Our results illustrate the great potential of K-BERT in practical applications of molecular property prediction in drug discovery.


Subject(s)
Algorithms, Machine Learning, Humans, Knowledge Bases, Pharmaceutical Preparations, Research Design
10.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36002937

ABSTRACT

The ability of a compound to permeate across the blood-brain barrier (BBB) is a significant factor in central nervous system drug development. Thus, to speed up the drug discovery process, it is crucial to perform high-throughput screening to predict the BBB permeability of candidate compounds. Although experimental methods can determine BBB permeability, they are costly and time-consuming. To complement the shortcomings of existing methods, we present a deep learning-based multi-model framework, called Deep-B3, to predict the BBB permeability of candidate compounds. In Deep-B3, the samples are encoded as three kinds of features, namely molecular descriptors and fingerprints, the molecular graph and simplified molecular input line entry system (SMILES) text notation. Pre-trained models were built to extract latent features from the molecular graph and SMILES. These features depict the compounds as tabular data, image and text, respectively. Validation results on the independent dataset demonstrated that Deep-B3 outperforms state-of-the-art models. Hence, Deep-B3 holds the potential to become a useful tool for drug development. A freely available online web-server for Deep-B3 was established at http://cbcb.cdutcm.edu.cn/deepb3/, and the source code and dataset of Deep-B3 are available at https://github.com/GreatChenLab/Deep-B3.


Subject(s)
Blood-Brain Barrier, Deep Learning, Biological Transport, Central Nervous System Agents/pharmacology, Permeability
11.
Pharm Res ; 41(3): 493-500, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38337105

ABSTRACT

PURPOSE: To ensure that drug administration is safe during pregnancy, it is crucial to be able to predict the placental permeability of drugs in humans. The most widely used experimental method for this purpose is in vitro human placental perfusion, although the approach is highly expensive and time-consuming. Quantitative structure-activity relationship (QSAR) modeling is a powerful tool for assessing drug placental transfer and can serve as an alternative to in vitro experiments. METHODS: The conformation-independent QSAR models covered in the present study were developed using SMILES notation descriptors and local molecular graph invariants. In addition, Monte Carlo optimization was used as the model-building method on the training and test sets of three independent molecular splits. RESULTS: A range of statistical parameters was used to validate the developed QSAR model, including the standard error of estimation, mean absolute error, root-mean-square error (RMSE), correlation coefficient, cross-validated correlation coefficient, Fisher ratio, MAE-based metrics and the index of ideality of correlation. These statistics demonstrated the excellent predictive potential and robustness of the developed QSAR model. In addition, the molecular fragments derived from the SMILES notation descriptors that account for increases or decreases in the investigated activity were identified. CONCLUSION: The presented QSAR modeling can be an invaluable tool for high-throughput screening of the placental permeability of drugs.
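
The abstract does not list the descriptor details. A minimal sketch of the general idea, assuming a simplified attribute set (single SMILES atoms and adjacent pairs) rather than the study's actual SMILES descriptors and local graph invariants: the descriptor DCW is the sum of per-attribute correlation weights, a Monte Carlo search perturbs one weight at a time and keeps the move only if the training-set correlation improves, and a straight line maps DCW to the endpoint. All names and parameters here are hypothetical.

```python
import random
import re
from typing import Dict, List, Tuple

# One SMILES "atom": bracket atom, Br/Cl, two-digit ring label, or any single character.
TOKEN = re.compile(r"\[[^\]]+]|Br|Cl|%[0-9]{2}|.")


def attributes(smiles: str) -> List[str]:
    """Single SMILES atoms plus adjacent pairs (simplified stand-in descriptors)."""
    toks = TOKEN.findall(smiles)
    return toks + [a + b for a, b in zip(toks, toks[1:])]


def dcw(smiles: str, cw: Dict[str, float]) -> float:
    """Descriptor = sum of the correlation weights of the molecule's attributes."""
    return sum(cw.get(a, 0.0) for a in attributes(smiles))


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def optimize_weights(train: List[Tuple[str, float]], epochs: int = 100,
                     step: float = 0.5, seed: int = 0):
    """Monte Carlo search over correlation weights; returns (weights, r_start, r_best)."""
    rng = random.Random(seed)
    keys = sorted({a for s, _ in train for a in attributes(s)})
    cw = {k: rng.uniform(-1.0, 1.0) for k in keys}
    ys = [y for _, y in train]
    r_start = best = pearson([dcw(s, cw) for s, _ in train], ys)
    for _ in range(epochs):
        for k in keys:
            old = cw[k]
            cw[k] = old + rng.uniform(-step, step)
            r = pearson([dcw(s, cw) for s, _ in train], ys)
            if r > best:
                best = r  # accept the move
            else:
                cw[k] = old  # reject the move
    return cw, r_start, best


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares intercept c0 and slope c1 for endpoint = c0 + c1 * DCW."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    c1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - c1 * mx, c1
```

Because only improving moves are accepted, the training correlation never decreases; in practice the weights would be judged on held-out splits, as the abstract's three independent splits imply.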


Subject(s)
Placenta, Quantitative Structure-Activity Relationship, Female, Pregnancy, Humans, Models, Molecular, Monte Carlo Method, Permeability
12.
J Chem Inf Model ; 64(16): 6259-6280, 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39136669

ABSTRACT

Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pretraining data, optimal architecture selections, and promising pretraining objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.
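
As one concrete form of standardized splitting, a deterministic scaffold split keeps all molecules that share a scaffold in the same partition, so test molecules are structurally unseen during training. The sketch assumes scaffold strings (for example, Bemis-Murcko scaffolds) were computed beforehand with a cheminformatics toolkit, and follows the common largest-group-first assignment; it illustrates the idea and is not a procedure prescribed by the review.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def scaffold_split(scaffolds: Sequence[str], frac_train: float = 0.8,
                   frac_valid: float = 0.1) -> Tuple[List[int], List[int], List[int]]:
    """Deterministic scaffold split; scaffolds[i] is the precomputed scaffold of molecule i."""
    groups = defaultdict(list)
    for i, s in enumerate(scaffolds):
        groups[s].append(i)
    # Largest scaffold groups first; ties broken by scaffold string so the split is reproducible.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    n = len(scaffolds)
    train: List[int] = []
    valid: List[int] = []
    test: List[int] = []
    for _, idx in ordered:
        if len(train) + len(idx) <= frac_train * n:
            train.extend(idx)
        elif len(valid) + len(idx) <= frac_valid * n:
            valid.extend(idx)
        else:
            test.extend(idx)
    return sorted(train), sorted(valid), sorted(test)
```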


Subject(s)
Machine Learning, Drug Discovery/methods, Deep Learning
13.
Arch Toxicol ; 98(8): 2647-2658, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38619593

ABSTRACT

Cytochrome P450 enzymes are a superfamily of enzymes responsible for the metabolism of a variety of medicines and xenobiotics. Within the cytochrome P450 family, five isozymes, 1A2, 2C9, 2C19, 2D6 and 3A4, are the most important for the metabolism of xenobiotics. Inhibition of any of these five CYP isozymes can cause drug-drug interactions with significant pharmacological and toxicological effects. Predicting whether these isozymes will be inhibited is therefore of great importance. Many techniques based on machine learning and deep learning algorithms are currently used for this prediction. In this study, three different molecular or substructural fingerprints of the molecules, namely Morgan, combined MACCS and Morgan, and RDKit, are used to train a separate SVM model for each isozyme (1A2, 2C9, 2C19, 2D6 and 3A4). On the independent dataset, Morgan fingerprints provided the best results, while combined MACCS and Morgan fingerprints achieved comparable results in terms of balanced accuracy (BA), sensitivity (Sn) and Matthews correlation coefficient (MCC). With Morgan fingerprints, BA, MCC and Sn across the five isozymes (1A2, 2C9, 2C19, 2D6 and 3A4) on the independent dataset ranged from 0.81 to 0.85, 0.61 to 0.70 and 0.72 to 0.83, respectively. With combined MACCS and Morgan fingerprints, BA, MCC and Sn on the independent dataset ranged from 0.79 to 0.85, 0.59 to 0.69 and 0.69 to 0.82, respectively.
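
For reference, the reported metrics follow directly from the confusion matrix of a binary inhibitor/non-inhibitor classifier. A small pure-Python sketch; the function name and the 1 = inhibitor label convention are assumptions for illustration.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, balanced accuracy and MCC for binary labels (1 = inhibitor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall on inhibitors)
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity (recall on non-inhibitors)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "BA": (sn + sp) / 2, "MCC": mcc}
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated when one class dominates the dataset, which is why it is reported alongside MCC.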


Subject(s)
Cytochrome P-450 Enzyme Inhibitors, Cytochrome P-450 Enzyme System, Machine Learning, Cytochrome P-450 Enzyme Inhibitors/pharmacology, Cytochrome P-450 Enzyme System/metabolism, Humans, Isoenzymes/metabolism, Drug Interactions, Xenobiotics/toxicity, Xenobiotics/metabolism, Support Vector Machine
14.
Toxicol Mech Methods ; 34(7): 737-742, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38572596

ABSTRACT

Models of toxicity to tadpoles have been developed as single-parameter models based on special descriptors, which are sums of correlation weights of molecular features and experimental conditions. This information is represented by quasi-SMILES. Fragments of local symmetry (FLS) are involved in developing the models, and using FLS correlation weights improves their predictive potential. In addition, the index of ideality of correlation (IIC) and the correlation intensity index (CII) are compared. These two criteria of predictive potential were tested in models built through Monte Carlo optimization. The CII was more effective than the IIC for the models considered here.
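
The IIC compares the mean absolute error of compounds on either side of the regression line and scales the correlation coefficient by that ratio, penalizing models whose errors are lopsided. A sketch assuming the commonly cited form IIC = R * min(MAE_neg, MAE_pos) / max(MAE_neg, MAE_pos), with residuals taken as observed minus predicted and zero residuals counted on the positive side (that tie-breaking rule is an assumption). The CII is not sketched here.

```python
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def iic(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Index of ideality of correlation: R scaled by the balance of negative/positive MAE."""
    resid = [o - p for o, p in zip(observed, predicted)]
    neg = [-d for d in resid if d < 0]
    pos = [d for d in resid if d >= 0]
    mae_neg = sum(neg) / len(neg) if neg else 0.0
    mae_pos = sum(pos) / len(pos) if pos else 0.0
    r = pearson(observed, predicted)
    top = max(mae_neg, mae_pos)
    return r if top == 0 else r * min(mae_neg, mae_pos) / top
```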


Subject(s)
Larva, Monte Carlo Method, Quantitative Structure-Activity Relationship, Larva/drug effects, Larva/growth & development, Animals, Anura
15.
Angew Chem Int Ed Engl ; : e202412320, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39225193

ABSTRACT

Circularly polarized luminescence (CPL) from chiral molecules is attracting much attention due to its potential in optical materials. However, formulating CPL emitters as molecular solids typically deteriorates photophysical properties in the aggregated state, leading to quenching and unpredictable changes in CPL behavior that impede materials development. To circumvent these shortcomings, a supramolecular approach can be used to isolate cationic dyes in a lattice of cyanostar-anion complexes that suppress aggregation-caused quenching and that, we hypothesize, can preserve chiroptical properties. Herein, we verify for the first time that supramolecular assembly of small-molecule ionic isolation lattices (SMILES) allows translation of molecular ECD and CPL properties to solids. A series of cationic helicenes that display increasing chiroptical response is investigated. Crystal structures of three different packing motifs all show spatial isolation of the dyes by the anion complexes. We observe that the photophysical and chiroptical properties of all helicenes are seamlessly translated to water-soluble nanoparticles by the SMILES method. Also, a DMQA helicene is used as a FRET acceptor in SMILES nanoparticles of intensely absorbing rhodamine antennae to generate an 18-fold boost in CPL brightness. These features offer promise for reliably accessing bright materials with programmable CPL properties.

16.
Angew Chem Int Ed Engl ; 63(37): e202408154, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-38887967

ABSTRACT

The radical Truce-Smiles rearrangement is a straightforward strategy for incorporating aryl groups into organic molecules, but asymmetric variants remain rare. By employing a readily available and inexpensive chiral auxiliary, we developed a highly efficient asymmetric photocatalytic acyl and alkyl radical Truce-Smiles rearrangement of α-substituted acrylamides, using tetrabutylammonium decatungstate (TBADT) as a hydrogen atom-transfer photocatalyst together with aldehydes or C-H-containing precursors. The rearranged products exhibited excellent diastereoselectivities (7 : 1 to >98 : 2 d.r.), and the chiral auxiliary was easily removed. Mechanistic studies elucidated the transformation, with density functional theory (DFT) calculations providing insights into the stereochemistry-determining step.

17.
Angew Chem Int Ed Engl ; 63(17): e202319158, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38506603

ABSTRACT

An efficient asymmetric remote arylation of C(sp3)-H bonds under photoredox conditions is described. The reaction features the addition of radicals to a double bond followed by a site-selective radical translocation (1,n-hydrogen atom transfer) and a stereocontrolled aryl migration via sulfinyl-Smiles rearrangement, furnishing a wide range of chiral α-arylated amides with up to >99 : 1 er. Mechanistic studies indicate that the sulfinamide group governs the stereochemistry of the product, with the aryl migration being the rate-determining step preceded by a kinetically favored 1,n-HAT process.

18.
J Comput Chem ; 44(2): 76-92, 2023 01 15.
Article in English | MEDLINE | ID: mdl-36264601

ABSTRACT

Chemical yield is the percentage of the reactants converted to the desired products. Chemists use predictive algorithms to select high-yielding reactions and score synthesis routes, saving time and reagents. This study proposes a novel graph neural network architecture for chemical yield prediction. The network combines structural information about the participants of the transformation with molecular and reaction-level descriptors. It works with incomplete chemical reactions and generates reactant-product atom mapping. We show that the network benefits from this additional information by comparing it with several machine learning models and molecular representations. Models included logistic regression, support vector machine, CatBoost, and Bidirectional Encoder Representations from Transformers. Molecular representations included extended-connectivity fingerprints, Morgan fingerprints, SMILESVec embeddings, and textual representations. Classification and regression objectives were assessed for each model and feature set. The goal of each classification model was to separate zero- and non-zero-yielding reactions. The models were trained and evaluated on a proprietary dataset of 10 reaction types. The models were also benchmarked on two public single-reaction-type datasets. The study was supplemented with an analysis of data, results, and errors, as well as of the impact of steric factors, side reactions, isolation, and purification efficiency. The supplementary code is available at https://github.com/SoftServeInc/yield-paper.


Subject(s)
Algorithms, Neural Networks, Computer, Humans, Machine Learning, Support Vector Machine
19.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34427296

ABSTRACT

Computational methods have become indispensable tools to accelerate the drug discovery process and alleviate the excessive dependence on time-consuming and labor-intensive experiments. Traditional feature-engineering approaches rely heavily on expert knowledge to devise useful features, which can be costly and sometimes biased. Emerging deep learning (DL) methods offer a data-driven way to automatically learn expressive representations from complex raw data. Inspired by this, researchers have applied various deep neural network models to simplified molecular input line entry specification (SMILES) strings, which contain all the composition and structure information of molecules. However, current models usually suffer from the scarcity of labeled data. This results in low generalization ability of SMILES-based DL models, which prevents them from competing with state-of-the-art computational methods. In this study, we used the BiLSTM (bidirectional long short-term memory) attention network (BAN), in which we employed a novel multi-step attention mechanism to facilitate the extraction of key features from SMILES strings. Meanwhile, SMILES enumeration was used as a data augmentation method in the training phase to substantially increase the amount of labeled data and the probability of mining more patterns from complex SMILES. We again took advantage of SMILES enumeration in the prediction phase to correct model prediction bias and provide more accurate predictions. Combined with the BAN model, our strategies can greatly improve the performance of latent features learned from SMILES strings. In 11 canonical absorption, distribution, metabolism, excretion and toxicity-related tasks, our method outperformed state-of-the-art approaches.
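
SMILES enumeration at training and prediction time can be sketched independently of the network. The enumerator is passed in as a function; in practice it might call RDKit's `Chem.MolToSmiles(mol, doRandom=True)` repeatedly, but that dependency, the `model` callable and the function names here are assumptions. Predictions are averaged over the variants, which is one simple way to reduce dependence on any single SMILES spelling of a molecule.

```python
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical enumerator: enumerate_fn(smiles, n) -> up to n SMILES strings for the same molecule.
Enumerator = Callable[[str, int], List[str]]


def augment_training_set(data: List[Tuple[str, float]], enumerate_fn: Enumerator,
                         n_variants: int) -> List[Tuple[str, float]]:
    """Training-time augmentation: every enumerated SMILES inherits its molecule's label."""
    return [(variant, y) for smi, y in data for variant in enumerate_fn(smi, n_variants)]


def predict_with_enumeration(smiles: str, enumerate_fn: Enumerator,
                             model: Callable[[str], float], n_variants: int) -> float:
    """Prediction-time averaging of model outputs over SMILES variants of one molecule."""
    return mean(model(v) for v in enumerate_fn(smiles, n_variants))
```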


Subject(s)
Cheminformatics/methods, Deep Learning, Drug Discovery/methods, Software, Algorithms, Drug Development, Research Design
20.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33003203

ABSTRACT

Quorum sensing interference (QSI), the disruption and manipulation of quorum sensing (QS) in the dynamic control of bacterial populations, could be widely applied in synthetic biology to realize dynamic metabolic control and develop potential clinical therapies. Conventionally, only a limited number of QSI molecules (QSIMs) have been developed, based on molecular structures or for specific QS receptors, and they are in short supply for the various interferences and manipulations of QS systems. In this study, we developed QSIdb (http://qsidb.lbci.net/), a specialized repository of 633 reported QSIMs and 73 073 expanded QSIMs, including both QS agonists and antagonists. We collected all reported QSIMs from the literature, focusing on modifications of N-acyl homoserine lactones, natural QSIMs and synthetic QS analogues. Moreover, we developed a pipeline with SMILES-based similarity assessment algorithms and docking-based validations to mine potential QSIMs from the 138 805 608 compounds in the PubChem database. In addition, we proposed a new measure, pocketedit, for assessing the similarity of active protein pockets or QSIM crosstalk, and obtained 273 potential broad-spectrum QSIMs. We provide user-friendly browsing and searching facilities for easy data retrieval and comparison. QSIdb could assist the scientific community in understanding QS-related therapeutics, manipulating QS-based genetic circuits in metabolic engineering, developing potential broad-spectrum QSIMs and identifying new ligands for other receptors.
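
The abstract does not describe QSIdb's SMILES-based similarity algorithms. One well-known SMILES-only option is a LINGO-style comparison: split each string into overlapping 4-character substrings (with ring-closure digits set to 0) and take a multiset Tanimoto ratio, the sum of minimum counts over the sum of maximum counts. The sketch below assumes that scheme plus a simple top-k ranking; it is not QSIdb's pipeline, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List


def lingos(smiles: str, q: int = 4) -> Counter:
    """Overlapping q-character substrings with ring-closure digits normalized to 0."""
    # Ring labels are arbitrary between SMILES spellings; note this also rewrites
    # digits inside bracket atoms, a simplification.
    s = re.sub(r"[0-9]", "0", smiles)
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def lingo_similarity(a: str, b: str, q: int = 4) -> float:
    """Multiset Tanimoto over substring counts: sum of minima / sum of maxima."""
    ca, cb = lingos(a, q), lingos(b, q)
    keys = ca.keys() | cb.keys()
    inter = sum(min(ca[k], cb[k]) for k in keys)
    union = sum(max(ca[k], cb[k]) for k in keys)
    return inter / union if union else 0.0


def rank_candidates(query: str, library: List[str], k: int) -> List[str]:
    """Return the k library SMILES most similar to the query."""
    return sorted(library, key=lambda s: lingo_similarity(query, s), reverse=True)[:k]
```

String-level similarity like this is cheap enough to scan very large libraries, which is why it suits a first filtering pass before slower docking-based validation.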


Subject(s)
Bacteria/chemistry, Databases, Chemical, Quorum Sensing, 4-Butyrolactone/analogs & derivatives, 4-Butyrolactone/chemistry, 4-Butyrolactone/metabolism, Bacteria/metabolism