Results 1 - 20 of 33
1.
Bioinformatics ; 38(19): 4573-4580, 2022 Sep 30.
Article in English | MEDLINE | ID: mdl-35961025

ABSTRACT

MOTIVATION: Extracting useful molecular features is essential for molecular property prediction. Atom-level representation is a common way to represent molecules, but it ignores sub-structure and branch information to some extent; the substring-level representation has the opposite limitation. Both atom-level and substring-level representations may lose the neighborhood or spatial information of molecules. Molecular graph representation aggregates the neighborhood information of a molecule, but it is weak at expressing chiral molecules and symmetrical structures. In this article, we aim to exploit the advantages of representations at different granularities simultaneously for molecular property prediction. To this end, we propose a fusion model named MultiGran-SMILES, which integrates the molecular features of atoms, sub-structures and graphs from the input. Compared with a single-granularity representation of molecules, our method combines the advantages of the different granularities and adaptively adjusts the contribution of each type of representation for molecular property prediction. RESULTS: The experimental results show that our MultiGran-SMILES method achieves state-of-the-art performance on the BBBP, LogP, HIV and ClinTox datasets. For the BACE, FDA and Tox21 datasets, the results are comparable with the state-of-the-art models. Moreover, the gains of our proposed method are larger for molecules with obvious functional groups or branches. AVAILABILITY AND IMPLEMENTATION: The code and data underlying this work are available on GitHub at https://github.com/Jiangjing0122/MultiGran. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

2.
Appl Intell (Dordr) ; 53(12): 15246-15260, 2023.
Article in English | MEDLINE | ID: mdl-36405344

ABSTRACT

Molecular property prediction is an essential but challenging task in drug discovery. The recurrent neural network (RNN) and Transformer are the mainstream methods for sequence modeling, and both have been successfully applied independently for molecular property prediction. As the local information and global information of molecules are very important for molecular properties, we aim to integrate the bi-directional gated recurrent unit (BiGRU) into the original Transformer encoder, together with self-attention to better capture local and global molecular information simultaneously. To this end, we propose the TranGRU approach, which encodes the local and global information of molecules by using the BiGRU and self-attention, respectively. Then, we use a gate mechanism to reasonably fuse the two molecular representations. In this way, we enhance the ability of the proposed model to encode both local and global molecular information. Compared to the baselines and state-of-the-art methods when treating each task as a single-task classification on Tox21, the proposed approach outperforms the baselines on 9 out of 12 tasks and state-of-the-art methods on 5 out of 12 tasks. TranGRU also obtains the best ROC-AUC scores on BBBP, FDA, LogP, and Tox21 (multitask classification) and has a comparable performance on ToxCast, BACE, and ecoli. On the whole, TranGRU achieves better performance for molecular property prediction. The source code is available in GitHub: https://github.com/Jiangjing0122/TranGRU.
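
As a rough illustration of the gate mechanism mentioned in this abstract, the sketch below fuses a "local" and a "global" vector with an element-wise sigmoid gate. The function name `gated_fusion`, the per-dimension weights and the toy inputs are assumptions made for the example; the abstract does not give TranGRU's actual gating formula, so this should not be read as the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_local, h_global, w_local, w_global, bias):
    """Element-wise gated fusion of two equally sized vectors.

    gate_i  = sigmoid(w_local_i * h_local_i + w_global_i * h_global_i + bias_i)
    fused_i = gate_i * h_local_i + (1 - gate_i) * h_global_i
    """
    fused = []
    for hl, hg, wl, wg, b in zip(h_local, h_global, w_local, w_global, bias):
        gate = sigmoid(wl * hl + wg * hg + b)
        fused.append(gate * hl + (1.0 - gate) * hg)
    return fused


# Toy vectors standing in for a BiGRU (local) and a self-attention (global) output.
h_local = [0.5, -1.0, 2.0]
h_global = [1.5, 0.0, -2.0]
w_local = [1.0, 1.0, 1.0]   # hypothetical learned weights
w_global = [1.0, 1.0, 1.0]  # hypothetical learned weights
bias = [0.0, 0.0, 0.0]

fused = gated_fusion(h_local, h_global, w_local, w_global, bias)
```

Because the gate lies strictly between 0 and 1, each fused component is a weighted average of the two inputs, which is one simple way for a model to balance local against global information per feature.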

3.
J Chem Inf Model ; 62(17): 4122-4133, 2022 09 12.
Article in English | MEDLINE | ID: mdl-36036609

ABSTRACT

To develop a realistic electrostatic model that allows for the anisotropy of the atomic electron density, high-rank atomic multipole moments computed by quantum chemical calculations have been studied extensively. However, it is hard to process huge RNA systems relying only on quantum chemical calculations because of their high computational cost. In this study, we employ five machine learning methods, namely Gaussian process regression with automatic relevance determination (ARDGPR), Kriging, radial basis function neural networks, Bagging, and generalized regression neural networks, to predict atomic multipole moments. Atom-atom electrostatic interaction energies are subsequently computed using the predicted atomic multipole moments in the pilot system, the pentose of RNA. The performance of the five methods is compared in terms of both the multipole moment prediction errors and the electrostatic energy prediction errors. For the predicted high-rank multipole moments of the four elements (O, C, N, and H) in capped pentose, ARDGPR and Kriging consistently outperform the other three methods. The multipole moments predicted by these two methods are therefore used to predict the electrostatic interaction energy of each pentose. The absolute average energy errors of ARDGPR and Kriging are 1.83 and 4.33 kJ mol⁻¹, respectively; compared to Kriging, the ARDGPR method achieves a 58% decrease in the absolute average energy error. These results demonstrate that the ARDGPR method, with its strong feature extraction ability, can predict the electrostatic interaction energy of pentose in RNA correctly and reliably.
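
The quoted 58% decrease can be checked directly from the two reported errors, taking the Kriging error as the reference value:

```latex
\frac{4.33 - 1.83}{4.33} = \frac{2.50}{4.33} \approx 0.577 \approx 58\%
```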


Subject(s)
Pentoses; RNA; Machine Learning; Normal Distribution; Static Electricity
4.
J Comput Chem ; 42(11): 771-786, 2021 04 30.
Article in English | MEDLINE | ID: mdl-33586809

ABSTRACT

Molecular dynamics (MD) simulations that rely on force field methods have been widely used to explore the structure and function of RNAs. However, the commonly used force fields are limited by an electrostatic description based on atomic charges, dipole moments and at most quadrupole moments, and so fail to capture the anisotropy of electronic features. In fact, the distribution of electrons around atomic nuclei is not spherically symmetric but geometry dependent. A multipolar electrostatic model based on high-rank multipole moments is described in this work, which allows us to combine polarizability and anisotropy of electron density. RNA secondary structure was taken as the research system, and its substructures, including the stem, loops (hairpin loop, bulge loop, internal loop, and multi-branch loop), and pseudoknots (H-type and K-type), as well as the hairpin, were investigated. First, the atom-atom electrostatic properties derived from one chain of the duplex RNA 2MVY in our previous work (Ref. 58) were tested on the pilot RNA systems of hairpin, hairpin loop, stem, and H-type pseudoknot. The prediction results were not satisfactory. Consequently, to obtain a general set of electrostatic parameters for RNA force fields, the convergence behavior of the atom-atom electrostatic interactions in the pilot RNA systems was explored using high-rank atomic multipole moments. The pilot RNA systems were cut into four types of molecular fragments of different sizes, and the single nucleotide fragment and the nucleotide-paired fragment proved to be the most reasonable systems for investigating the convergence behavior of all types of atom-atom electrostatic interactions in base-unpairing regions and base-pairing regions, respectively. Transferability of the electrostatic properties drawn from the pilot RNA systems to the corresponding test systems was also investigated. Furthermore, the convergence behavior of atomic electrostatic interactions in other substructures, including the bulge loop, internal loop, multi-branch loop, and K-type pseudoknot, was expected to be modeled via the hairpin.


Subject(s)
RNA/chemistry; Models, Molecular; Molecular Dynamics Simulation; Nucleic Acid Conformation; Quantum Theory; Static Electricity; Thermodynamics
5.
Ecotoxicol Environ Saf ; 172: 373-379, 2019 May 15.
Article in English | MEDLINE | ID: mdl-30731268

ABSTRACT

Considering the large-scale production of diversified nanomaterials, it is of paramount importance to unravel the structural details of interactions between nanoparticles and biological systems, and thus to explore the potential adverse impacts of nanoparticles. The estrogen receptor (ER) is one of the most important receptors of the human reproductive system, and the binding of carbon nanotubes to estrogen receptors is a possible trigger of the reproductive toxicity of carbon nanotubes. Thus, with single-walled carbon nanotubes (SWCNT) treated as a model nanomaterial, a combination of in vivo experiments, spectroscopic assays and molecular dynamics modeling was applied to characterize the binding between SWCNT and the ligand binding domain (LBD) of ER alpha (ERα). The fluorescence assay and molecular dynamics simulations together validated the binding of SWCNT to ERα, suggesting a possible molecular initiating event. SWCNT binding led to a conformational change at the tertiary structure level, and hydrophobic interaction was recognized as the driving force governing the binding between SWCNT and the LBD of ERα. An in vivo experiment showed that exposure to SWCNT increased ERα expression from 26.43 pg/ml to 259.01 pg/ml, suggesting a potential estrogen interference effect of SWCNT. Our study offers insight into the binding of SWCNT and the ERα LBD at the atomic level, which helps to evaluate the potential health risks of SWCNT accurately.


Subject(s)
Molecular Dynamics Simulation; Nanotubes, Carbon/chemistry; Receptors, Estrogen/metabolism; Animals; Estradiol/blood; Female; Fluorescence; Molecular Structure; Protein Conformation; Rats; Rats, Sprague-Dawley
6.
J Chem Inf Model ; 58(11): 2239-2254, 2018 11 26.
Article in English | MEDLINE | ID: mdl-30362754

ABSTRACT

Computational investigations of RNA properties often rely on a molecular mechanical approach to define molecular potential energy. Force fields for RNA typically employ a point charge model of electrostatics, which does not provide a realistic quantum-mechanical picture. In reality, electron distributions around nuclei are not spherically symmetric and are geometry dependent. A multipole expansion method which allows for incorporation of polarizability and anisotropy in a force field is described, and its applicability to modeling the behavior of RNA molecules is investigated. Transferability of the model, critical for force field development, is also investigated.


Subject(s)
RNA/chemistry; Electrons; Hydrogen Bonding; Molecular Dynamics Simulation; Quantum Theory; Static Electricity; Thermodynamics
7.
J Comput Chem ; 35(5): 343-59, 2014 Feb 15.
Article in English | MEDLINE | ID: mdl-24449043

ABSTRACT

Accurate electrostatics necessitates the use of multipole moments centered on nuclei or extra point charges centered away from the nuclei. Here, we follow the former alternative and investigate the convergence behavior of atom-atom electrostatic interactions in the pilot protein crambin. Amino acids are cut out from a Protein Data Bank structure of crambin, as single amino acids, di-, or tripeptides, and are then capped with a peptide bond at each side. The atoms in the amino acids are defined through Quantum Chemical Topology (QCT) as finite-volume electron density fragments. Atom-atom electrostatic energies are computed by means of a multipole expansion with regular spherical harmonics, up to a total interaction rank of L = ℓA + ℓB + 1 = 10. The minimum internuclear distances in the convergent region for all 15 possible types of atom-atom interactions in crambin, calculated from single amino acids, are close to the values calculated from di- and tripeptides. Values obtained at the B3LYP/aug-cc-pVTZ and MP2/aug-cc-pVTZ levels are only slightly larger than those calculated at the HF/6-31G(d,p) level. This convergence behavior is transferable to the well-known amyloid beta polypeptide Aβ1-42. Moreover, for a selected central atom, the influence of its neighbors on its multipole moments is investigated, and the distance beyond which this influence can be ignored is determined. Finally, the convergence behavior of AMBER becomes closer to that of QCT with increasing internuclear distance.
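
To make the interaction rank defined in this abstract concrete, the sketch below lists which pairs of multipole ranks (ℓA, ℓB) fall under a given cutoff, using L = ℓA + ℓB + 1. The function name `rank_pairs` is hypothetical, and the sketch assumes no additional cap on the individual ranks, which the abstract does not state.

```python
def rank_pairs(max_rank: int):
    """Return all (l_a, l_b) pairs whose interaction rank l_a + l_b + 1 is <= max_rank."""
    return [
        (l_a, l_b)
        for l_a in range(max_rank)
        for l_b in range(max_rank)
        if l_a + l_b + 1 <= max_rank
    ]


# Group the pairs by interaction rank L for a cutoff of L = 10.
pairs = rank_pairs(10)
by_rank = {}
for l_a, l_b in pairs:
    by_rank.setdefault(l_a + l_b + 1, []).append((l_a, l_b))

# by_rank[1] holds the charge-charge term (0, 0);
# by_rank[2] holds the two charge-dipole terms (0, 1) and (1, 0).
```

Under these assumptions a cutoff of L = 10 admits 55 rank pairs per atom pair, which gives a feel for how quickly the number of terms grows with the expansion rank.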


Subject(s)
Plant Proteins/chemistry; Static Electricity; Models, Molecular; Quantum Theory
8.
J Phys Chem A ; 118(36): 7876-91, 2014 Sep 11.
Article in English | MEDLINE | ID: mdl-25084473

ABSTRACT

Energy minima of the 20 natural amino acids (capped by a peptide bond at both the N and the C termini, CH3-C(═O)-N(H)-(H)Cα(R)-C(═O)-N(H)-CH3) were obtained by ab initio geometry optimization. Starting with a large number of minima, quickly generated by MarvinView, geometry optimization at the HF/6-31G(d,p) level of theory reduced the number of minima. Further optimization at the B3LYP/apc-1 and MP2/cc-pVDZ levels caused some minima to disappear and some stable minima to migrate on the Ramachandran map. There is a relation between the number of minima and the size and flexibility of the side chain. The energy minima of the 20 amino acids are mainly located in the βL, γL, δL, and αL regions of the Ramachandran map. Multipole moments of atoms occurring in the fragment [-NH-Cα-C(═O)-] common to all 20 amino acids were calculated at the three levels of theory mentioned above. The near parallelism in behavior of these moments between levels of theory helps in estimating moments for the more expensive B3LYP and MP2 methods from data calculated with the cheaper HF method. Finally, we explored the transferability of properties between different amino acids: the bond lengths and angles of the common fragment [-NH-Cα(HαCβ)-C'(═O)-] in all amino acids except Gly and Pro. All bond lengths are highly transferable between different amino acids, and the standard deviations are small.


Subject(s)
Amino Acids/chemistry; Models, Molecular; Protein Conformation; Hydrogen Bonding; Quantum Theory; Thermodynamics
9.
Comput Biol Med ; 182: 109207, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39341115

ABSTRACT

Precise estimations of RNA secondary structures have the potential to reveal the various roles that non-coding RNAs play in regulating cellular activity. However, traditional RNA secondary structure prediction methods rely mainly on thermodynamic models via free energy minimization, a laborious process that requires a lot of prior knowledge. Here we propose Wfold, an end-to-end deep learning-based approach to RNA secondary structure prediction. Wfold is trained directly on annotated data and base-pairing criteria. It makes use of an image-like representation of RNA sequences, which an enhanced U-net incorporating a transformer encoder can process effectively. Wfold increases the accuracy of RNA secondary structure prediction by combining the self-attention mechanism's mining of long-range information with U-net's ability to gather local information. We compare Wfold's performance on RNA datasets that are within and across families. When trained and evaluated on different RNA families, it performs similarly to the traditional methods, and it dramatically outperforms the state-of-the-art methods on within-family datasets. Moreover, Wfold can also reliably predict pseudoknots. The findings imply that Wfold may be useful for improving sequence alignment, functional annotation, and RNA structure modeling.

10.
J Chem Theory Comput ; 20(7): 2947-2958, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38501645

ABSTRACT

The ordered assembly of Tau protein into filaments characterizes Alzheimer's and other neurodegenerative diseases, and thus stabilization of Tau protein is a promising avenue for tauopathy therapy. To dissect the mechanisms underlying Tau aggregation, we employ a set of molecular simulations and a Markov state model to determine the kinetics of the conformational ensemble of K18. K18 is the microtubule-binding domain of Tau protein and plays a vital role in microtubule assembly, recycling processes, and amyloid fibril formation. Here, we efficiently explore the conformations of K18 in silico, with lifetimes of about 150 µs. Our results show that all four repeat regions (R1-R4) are very dynamic, featuring frequent conformational conversion and lacking stable conformations, and that the R2 region is more flexible than the R1, R3, and R4 regions. Additionally, residues 300-310 in R2-R3 and residues 319-336 in R3 tend to form sheet structures, indicating that K18 has a broader functional role than individual repeat monomers. Finally, the simulations combined with Markov state models and deep learning reveal 5 key conformational states along the transition pathway and provide information on the interstate transition rates on the microsecond time scale. Overall, this study offers significant insights into the molecular mechanism of pathological Tau aggregation and may inform novel strategies for treating tauopathies and advancing drug discovery.
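
The core of a Markov state model is a transition matrix estimated from a trajectory of discrete state labels. The sketch below shows the simplest such estimate, counting transitions at a fixed lag and normalizing each row. The function name, the toy trajectory and the three-state labelling are assumptions made for illustration; the abstract does not describe the authors' estimator, and practical workflows also involve featurization, clustering and validation steps that are not shown.

```python
def estimate_transition_matrix(traj, n_states, lag=1):
    """Row-normalized transition counts between discrete states at a given lag.

    traj is a sequence of integer state labels in range(n_states).
    Rows with no observed outgoing transitions are left as zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for i in range(len(traj) - lag):
        counts[traj[i]][traj[i + lag]] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy trajectory over three hypothetical conformational states.
traj = [0, 0, 1, 1, 1, 2, 2, 0, 0, 1]
T = estimate_transition_matrix(traj, n_states=3, lag=1)
```

Each entry `T[i][j]` is then the estimated probability of being in state `j` one lag time after being in state `i`; interstate rates and pathways are derived from matrices of this kind.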


Subject(s)
Deep Learning; Melphalan; Tauopathies; gamma-Globulins; Humans; tau Proteins/metabolism; Amino Acid Sequence; Protein Structure, Secondary
11.
Comput Biol Med ; 177: 108612, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38838556

ABSTRACT

Alzheimer's disease (AD) is one of the most prevalent chronic neurodegenerative disorders globally, with a rapidly growing population of AD patients and currently no effective therapeutic interventions available. Consequently, the development of anti-AD drugs and the identification of AD targets are among the most urgent tasks. In this study, in addition to considering known drugs and targets, we explore compound-protein interactions (CPIs) between compounds and proteins relevant to AD. We propose a deep learning model called CKG-IMC to predict compound-protein interactions in Alzheimer's disease. CKG-IMC comprises three modules: a collaborative knowledge graph (CKG), a principal neighborhood aggregation graph neural network (PNA), and an inductive matrix completion (IMC). The collaborative knowledge graph is used to learn semantic associations between entities, PNA is employed to extract structural features of the relationship network, and IMC is used for CPI prediction. Compared with a total of 16 baseline models based on similarities, knowledge graphs, and graph neural networks, our model achieves state-of-the-art performance in 10-fold cross-validation and independent test experiments. Furthermore, we use CKG-IMC to predict compounds interacting with two confirmed AD targets, the 42-amino-acid β-amyloid (Aβ42) protein and microtubule-associated protein tau (tau protein), as well as proteins interacting with five FDA-approved anti-AD drugs. The results indicate that the majority of predictions are supported by the literature, and molecular docking experiments demonstrate a strong affinity between the predicted compounds and targets.


Subject(s)
Alzheimer Disease; Deep Learning; Alzheimer Disease/metabolism; Alzheimer Disease/drug therapy; Humans; Neural Networks, Computer; Protein Interaction Maps; Computational Biology/methods
12.
Interdiscip Sci ; 16(3): 741-754, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38710957

ABSTRACT

Molecular representation learning can preserve meaningful molecular structures as embedding vectors, which is a prerequisite for molecular property prediction. Yet learning how to represent molecules accurately remains challenging. Previous approaches that learn molecular representations in an end-to-end manner may suffer information loss and neglect molecular generative representations. To obtain rich molecular feature information, a pre-training molecular representation model can draw on several molecular representations, reducing the information loss caused by relying on a single one. We therefore propose MVGC, a multi-view generative contrastive learning pre-training model. Our pre-training framework learns three fundamental feature representations of molecules and integrates them to predict molecular properties on benchmark datasets. Comprehensive experiments on seven classification tasks and three regression tasks demonstrate that MVGC surpasses the majority of state-of-the-art approaches. Moreover, we explore the potential of the MVGC model to learn chemically meaningful molecular representations.


Subject(s)
Machine Learning; Algorithms; Models, Molecular
13.
J Mol Model ; 30(2): 26, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38191945

ABSTRACT

CONTEXT: The reaction between Na and HF is a typical harpooning reaction, which is of great interest for understanding elementary chemical reaction kinetics. This work investigates the detailed reaction mechanism of sodium with hydrogen fluoride and the adsorption of HF on the resultant NaF as well as on the (NaF)4 tetramer. The results suggest that the reaction between Na and HF leads to the formation of the sodium fluoride salt NaF and hydrogen gas. Na interacts with HF to form a complex HF···Na, and the approach of the F atom of HF to Na then results in a transition state H···F···Na. As the H-F bond breaks, a bond forms between the F and Na atoms, and the product NaF is obtained on removal of the H atom. The resultant NaF can further form the (NaF)4 tetramer. The interaction of NaF with HF leads to the complex NaF···HF; forms I and II of (NaF)4 can each interact with HF to produce two complexes (i.e., (NaF)4(I-1)···HF, (NaF)4(I-2)···HF, (NaF)4(II-1)···HF and (NaF)4(II-2)···HF), but form III of (NaF)4 can interact with HF to produce only one complex, (NaF)4(III)···HF. These complexes were explored in terms of noncovalent interaction (NCI) and quantum theory of atoms in molecules (QTAIM) analyses. NCI analyses confirm the existence of attractive interactions in the complexes HF···Na, NaF···HF, (NaF)4(I-1)···HF, (NaF)4(I-2)···HF, (NaF)4(II-1)···HF, (NaF)4(II-2)···HF, and (NaF)4(III)···HF. QTAIM analyses suggest that the F···Na interaction forms in the HF···Na complex, while F···H hydrogen bonds form in the NaF···HF, (NaF)4(I-1)···HF, (NaF)4(I-2)···HF, (NaF)4(II-1)···HF, (NaF)4(II-2)···HF, and (NaF)4(III)···HF complexes. Natural bond orbital (NBO) analyses were also applied to analyze the intermolecular donor-acceptor orbital interactions in these complexes. These results provide valuable insight into the chemical reaction of Na and HF and the adsorption interaction between the sodium fluoride salt and HF. METHODS: The calculations were carried out at the M06-L/6-311++G(2d,2p) level of theory using the Gaussian 16 program. Intrinsic reaction coordinate (IRC) calculations were carried out at the same level of theory to verify the obtained transition state. The molecular surface electrostatic potential (MSEP) was employed to understand how the complex forms. Quantum theory of atoms in molecules (QTAIM) and noncovalent interaction (NCI) analyses were used to obtain the topological parameters at bond critical points (BCPs) and to characterize the intermolecular interactions in the complex and intermediate. The topological parameters and the BCP plots were obtained with the Multiwfn software.

14.
J Comput Chem ; 34(21): 1850-61, 2013 Aug 05.
Article in English | MEDLINE | ID: mdl-23720381

ABSTRACT

We propose a generic method to model polarization in the context of high-rank multipolar electrostatics. The method uses the machine learning technique kriging to capture the response of an atomic multipole moment of a given atom to a change in the positions of the surrounding atoms. The atoms are malleable boxes with sharp boundaries; they do not overlap and they exhaust space. The method is applied to histidine, where it is able to predict atomic multipole moments (up to hexadecapole) for unseen configurations after training on 600 geometries, distorted using normal modes of each of its 24 local energy minima at the B3LYP/apc-1 level. The quality of the predictions is assessed by calculating the Coulomb energy between an atom for which the moments have been predicted and the surrounding atoms (having exact moments). Only interactions between atoms separated by three or more bonds ("1, 4 and higher" interactions) are included in this energy error. This energy is compared with that of a central atom with exact multipole moments interacting with the same environment. The resulting energy discrepancies are summed for 328 atom-atom interactions, with each of the 29 atoms of histidine taken as the central atom in turn. For 80% of the 539 test configurations (outside the training set), this summed energy deviates by less than 1 kcal mol⁻¹.


Subject(s)
Histidine/chemistry; Models, Chemical; Peptides/chemistry; Molecular Conformation; Static Electricity
15.
Comput Biol Chem ; 107: 107972, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37883905

ABSTRACT

Accurately predicting protein-ligand binding affinities is crucial for determining molecular properties and understanding their physical effects. Neural networks and transformers are the predominant methods for sequence modeling, and both have been successfully applied independently for protein-ligand binding affinity prediction. As local and global information of molecules are vital for protein-ligand binding affinity prediction, we aim to combine bi-directional gated recurrent unit (BiGRU) and convolutional neural network (CNN) to effectively capture both local and global molecular information. Additionally, attention mechanisms can be incorporated to automatically learn and adjust the level of attention given to local and global information, thereby enhancing the performance of the model. To achieve this, we propose the PLAsformer approach, which encodes local and global information of molecules using 3DCNN and BiGRU with attention mechanism, respectively. This approach enhances the model's ability to encode comprehensive local and global molecular information. PLAsformer achieved a Pearson's correlation coefficient of 0.812 and a Root Mean Square Error (RMSE) of 1.284 when comparing experimental and predicted affinity on the PDBBind-2016 dataset. These results surpass the current state-of-the-art methods for binding affinity prediction. The high accuracy of PLAsformer's predictive performance, along with its excellent generalization ability, is clearly demonstrated by these findings.
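
The two metrics quoted in this abstract, Pearson's correlation coefficient and RMSE, can be computed from paired experimental and predicted values as in the sketch below. The affinity values are made up for illustration and are not taken from PDBBind or from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(x, y):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up binding affinities, for illustration only.
experimental = [4.0, 5.5, 6.1, 7.3, 8.0]
predicted = [4.4, 5.0, 6.5, 7.0, 8.6]

r = pearson_r(experimental, predicted)
error = rmse(experimental, predicted)
```

The two numbers answer different questions: the correlation measures how well the predictions track the ranking and trend of the experimental values, while RMSE measures the typical size of the error in the units of the affinity itself.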


Subject(s)
Algorithms; Neural Networks, Computer; Ligands; Proteins/chemistry; Protein Binding
16.
Mar Pollut Bull ; 196: 115675, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37864859

ABSTRACT

Understanding the effects of pollution on reproductive performance and sexual selection is crucial for the conservation of biodiversity in an increasingly polluted world. The present study focused on the effect of environmental heavy metal pollution on sexually selected traits, including morphological characteristics and acoustic parameters, as well as mate choice in Strauchbufo raddei, an anuran species widely distributed in Northern China. The results showed that male courtship signals, including forelimb length, forelimb force, and advertisement calls, have evolved under the pressure of heavy metal pollution in young S. raddei. In addition, the breeding age was lower in the polluted areas, and younger individuals had more mating opportunities. However, males with heightened reproductive performance did not show the expected higher individual quality. The current study suggests that exposure to heavy metal pollution can induce stress in males, altering reproductive performance and further disrupting mate choice.


Subject(s)
Metals, Heavy; Humans; Animals; Male; Metals, Heavy/analysis; Bufonidae; Environmental Pollution; Reproduction; Phenotype
17.
J Mol Graph Model ; 121: 108454, 2023 06.
Article in English | MEDLINE | ID: mdl-36963306

ABSTRACT

Simplified Molecular-Input Line-Entry System (SMILES) is one of the most widely used molecular representations for molecular property prediction. We conjecture that all the characters in the SMILES string of a molecule are essential for making up the molecule, but that most of them contribute little to determining a particular property of the molecule. We verified this conjecture in a preliminary experiment. Motivated by the result, we propose to inject suitable noise into SMILES strings to augment the training data by increasing the diversity of the labeled molecules. To this end, we explore injecting perturbing noise into the original labeled SMILES strings to construct augmented data, alleviating the scarcity of labeled compound data and helping the model extract more useful molecular representations for molecular property prediction. Specifically, we apply mask, swap, deletion, and fusion operations to SMILES strings to randomly mask, swap, and delete atoms. The augmented data is then used with one of two strategies: alternating original and noisy molecules from epoch to epoch, or from batch to batch. We conduct experiments on both Transformer and BiGRU models, using widely used datasets from MoleculeNet and ZINC, to validate the effectiveness of the approach. Experimental results demonstrate that the proposed method outperforms strong baselines on all the datasets. Our method, NoiseMol, obtains the best performance on BBBP and FDA when compared with state-of-the-art methods, and achieves the best accuracy on LogP. Injecting perturbing noise into labeled SMILES strings is therefore an effective and efficient method that improves the prediction performance, generalization, and robustness of deep learning models.
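
The mask, swap and deletion operations named in this abstract can be sketched on tokenized SMILES strings as below. The tokenizer regex, the `[MASK]` placeholder and the function names are assumptions made for the example rather than details from the paper, and the fusion operation is omitted because the abstract does not describe it. Perturbed strings are not guaranteed to be valid SMILES.

```python
import random
import re

# Simplified SMILES tokenizer: bracket atoms, two-letter halogens, organic-subset
# atoms, aromatic atoms, ring-closure labels, and bond/branch symbols.
TOKEN_RE = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[()=#+\-\\/.:@*~$]"
)


def tokenize(smiles):
    """Split a SMILES string into tokens."""
    return TOKEN_RE.findall(smiles)


def atom_positions(tokens):
    """Indices of tokens that represent atoms (bracketed or alphabetic)."""
    return [i for i, t in enumerate(tokens) if t.startswith("[") or t.isalpha()]


def mask_atom(tokens, rng, mask_token="[MASK]"):
    """Replace one randomly chosen atom token with a placeholder."""
    out = list(tokens)
    out[rng.choice(atom_positions(out))] = mask_token
    return out


def swap_atoms(tokens, rng):
    """Exchange two randomly chosen atom tokens."""
    out = list(tokens)
    i, j = rng.sample(atom_positions(out), 2)
    out[i], out[j] = out[j], out[i]
    return out


def delete_atom(tokens, rng):
    """Remove one randomly chosen atom token."""
    out = list(tokens)
    del out[rng.choice(atom_positions(out))]
    return out


rng = random.Random(0)  # fixed seed so the perturbations are reproducible
tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
masked = "".join(mask_atom(tokens, rng))
swapped = "".join(swap_atoms(tokens, rng))
deleted = "".join(delete_atom(tokens, rng))
```

Restricting the perturbations to atom tokens leaves the ring-closure digits and parentheses untouched, so the overall bracket structure of the string is preserved even when the chemistry is not.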

18.
J Mol Graph Model ; 122: 108498, 2023 07.
Article in English | MEDLINE | ID: mdl-37126908

ABSTRACT

Innovations in drug-target interaction (DTI) prediction accelerate drug development. The introduction of deep learning models has had a dramatic impact on DTI prediction, notably by saving time and money in drug discovery. This study develops an end-to-end deep collaborative learning model for DTI prediction, called EDC-DTI, which identifies new targets for existing drugs from multiple sources of drug-target-related information, both homogeneous and heterogeneous. Our end-to-end model is composed of a feature builder and a classifier. The feature builder consists of two collaborative feature construction algorithms that extract the molecular properties and the topological properties of networks, and the classifier consists of a feature encoder and a feature decoder, which are designed for feature integration and DTI prediction, respectively. The feature encoder, mainly based on an improved graph attention network, incorporates heterogeneous information into drug features and target features separately. The feature decoder is composed of multiple neural networks for prediction. Compared with six popular baseline models, EDC-DTI achieves the highest predictive performance at low computational cost. Robustness tests demonstrate that EDC-DTI maintains strong predictive performance on sparse datasets. We also use the model to predict the targets most likely to interact with Simvastatin (DB00641), Nifedipine (DB01115) and Afatinib (DB08916) as examples. Results show that most of the predictions can be confirmed by the literature with clear evidence.


Subject(s)
Interdisciplinary Placement; Drug Development/methods; Drug Discovery/methods; Neural Networks, Computer; Algorithms
19.
Article in English | MEDLINE | ID: mdl-35886188

ABSTRACT

To achieve the long-term goals outlined in the Paris Agreement that address climate change, many countries have committed to carbon neutrality targets. The study of the characteristics and emissions trends of these economies is essential for the realistic formulation of accurate corresponding carbon neutral policies. In this study, we investigate the convergence characteristics of per capita carbon emissions (PCCEs) in 121 countries with carbon neutrality targets from 1990 to 2019 using a nonlinear time-varying factor model-based club convergence analysis, followed by an ordered logit model to explore the mechanism of convergence club formation. The results reveal three relevant findings. (1) Three convergence clubs for the PCCEs of countries with proposed carbon neutrality targets were evident, and the PCCEs of different convergence clubs converged in multiple steady-state levels along differing transition paths. (2) After the Kyoto Protocol came into effect, some developed countries were moved to the club with lower emissions levels, whereas some developing countries displayed elevated emissions, converging with the higher-level club. (3) It was shown that countries with higher initial emissions, energy intensity, industrial structure, and economic development levels are more likely to converge with higher-PCCEs clubs, whereas countries with higher urbanization levels are more likely to converge in clubs with lower PCCEs.


Subject(s)
Carbon Dioxide; Carbon; Carbon Dioxide/analysis; Economic Development; Organizations; Urbanization
20.
J Mol Model ; 27(5): 137, 2021 Apr 26.
Article in English | MEDLINE | ID: mdl-33903935

ABSTRACT

Force fields are actively used to study RNA. Development of accurate force fields relies on a knowledge of how the variation of properties of molecules depends on their structure. Detailed scrutiny of RNA's conformational preferences is needed to guide such development. Towards this end, minimum energy structures for each of a set of 16 small RNA-derived molecules were obtained by geometry optimization at the HF/6-31G(d,p), B3LYP/apc-1, and MP2/cc-pVDZ levels of theory. The number of minima computed for a given fragment was found to be related to both its size and flexibility. Atomic electrostatic multipole moments of atoms occurring in the [HO-P(O3)-CH2-] fragment of 30 sugar-phosphate-sugar geometries were calculated at the HF/6-31G(d,p) and B3LYP/apc-1 levels of theory, and the transferability of these properties between different conformations was investigated. The atomic multipole moments were found to be highly transferable between different conformations with small standard deviations. These results indicate necessary elements of the development of accurate RNA force fields.


Subject(s)
Models, Molecular; RNA/chemistry; Computational Chemistry; Nucleic Acid Conformation; Quantum Theory; RNA/metabolism