1.
J Chem Inf Model ; 64(14): 5480-5491, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38982757

ABSTRACT

Rapid and accurate calculation of the acid dissociation constant (pKa) is crucial for designing chemical synthesis routes, optimizing catalysts, and predicting chemical behavior. Despite recent progress in machine learning, predicting solvation acidity, especially in nonaqueous solvents, remains challenging due to limited experimental data. This challenge arises from treating experimental values in different solvents as distinct data domains and modeling them separately. In this work, we treat both the solutes and solvents equally from the perspective of molecular topology and propose a highly universal framework called AttenGpKa for predicting solvation acidity. AttenGpKa is trained using 26,522 experimental pKa values from 60 pure and mixed solvents in the iBonD database. As a result, our model can simultaneously predict the pKa values of a compound in various solvents, including pure water, pure nonaqueous solvents, and mixed solvents. AttenGpKa achieves universality by using graph neural networks and attention mechanisms to learn complex effects within solute and solvent molecules. Furthermore, encodings of both solute and solvent molecules are adaptively fused to simulate the influence of the solvent on acid dissociation. AttenGpKa demonstrates robust generalization in extensive validations. The interpretability studies further indicate that our model has effectively learnt electronic and solvent effects. Free-to-use software is provided to facilitate the use of AttenGpKa for pKa prediction.
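The abstract does not say how the solute and solvent encodings are "adaptively fused". One common way to realize such a fusion is a learned per-dimension gate, and the sketch below illustrates only that general idea. The function name `gated_fusion`, the gate parameters, and the toy vectors are all assumptions made for illustration; this is not the AttenGpKa implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(solute, solvent, gate_weights, gate_bias):
    """Blend two equal-length encodings with a per-dimension gate.

    Hypothetical helper: gate_weights[i] is a weight vector over the
    concatenation of both encodings, and gate_bias[i] is a scalar.
    A gate near 1 keeps the solute value, near 0 keeps the solvent value.
    """
    joint = list(solute) + list(solvent)  # concatenated encodings
    fused = []
    for i, (u, v) in enumerate(zip(solute, solvent)):
        z = sum(w * x for w, x in zip(gate_weights[i], joint)) + gate_bias[i]
        g = sigmoid(z)
        fused.append(g * u + (1.0 - g) * v)
    return fused


# Toy example (made-up numbers): zero gate parameters give an even blend.
solute_vec = [1.0, 0.0]
solvent_vec = [0.0, 1.0]
zero_weights = [[0.0] * 4, [0.0] * 4]
even_blend = gated_fusion(solute_vec, solvent_vec, zero_weights, [0.0, 0.0])
```

In a trained model the gate parameters would be learned, so the blend could shift toward the solvent encoding where the solvent matters more to the dissociation.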


Subject(s)
Neural Networks, Computer; Solvents; Solvents/chemistry; Solubility; Hydrogen-Ion Concentration; Machine Learning; Models, Chemical; Acids/chemistry
2.
J Chem Inf Model ; 64(7): 2383-2392, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37706462

ABSTRACT

The pKa of C-H acids is an important parameter in the fields of organic synthesis, drug discovery, and materials science. However, predicting pKa remains a great challenge due to limited experimental data and a lack of chemical insight. Here, a new model for predicting the pKa values of C-H acids is proposed on the basis of graph neural networks (GNNs) and data augmentation. A message passing unit (MPU) was used to extract the topological and target-related information from the molecular graph data, and a readout layer was utilized to retrieve the information on the C atom at the ionization site. The retrieved information was then used by a fully connected network to predict pKa. Furthermore, to increase the diversity of the training data, a knowledge-infused data augmentation technique was established by replacing the H atoms in a molecule with substituents exhibiting different electronic effects. The MPU was pretrained with the augmented data. The efficacy of data augmentation was confirmed by visualizing the distribution of compounds with different substituents and by classifying compounds. The explainability of the model was studied by examining the change in pKa values when a specific atom was masked. This explainability was used to identify the key substituents for pKa. The model was evaluated on two data sets from the iBonD database. Dataset 1 includes the experimental pKa values of C-H acids measured in DMSO, while Dataset 2 comprises the pKa values measured in water. The results show that the knowledge-infused data augmentation technique greatly improves the predictive accuracy of the model, especially when the number of samples is small.
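The pipeline described above (message passing over the molecular graph, a readout at the ionization-site atom, a predictor on the retrieved vector, and atom masking for explainability) can be illustrated with a toy sketch. It rests on strong assumptions: plain sum aggregation stands in for the learned MPU, a single linear map stands in for the fully connected network, and the three-atom graph, features, and weights are made up. It is not the authors' model.

```python
def message_pass(features, neighbors, rounds=2):
    """Toy message passing: each round, an atom's vector becomes its own
    vector plus the sum of its neighbors' vectors (no learned weights)."""
    for _ in range(rounds):
        updated = []
        for i, own in enumerate(features):
            agg = list(own)
            for j in neighbors[i]:
                agg = [a + b for a, b in zip(agg, features[j])]
            updated.append(agg)
        features = updated
    return features


def predict_at_site(features, neighbors, site, weights, bias):
    """Readout at the ionization-site atom, followed by a linear predictor."""
    hidden = message_pass(features, neighbors)
    return sum(w * h for w, h in zip(weights, hidden[site])) + bias


def masking_attribution(features, neighbors, site, weights, bias):
    """Zero out each non-site atom in turn and record the prediction change."""
    base = predict_at_site(features, neighbors, site, weights, bias)
    deltas = {}
    for k in range(len(features)):
        if k == site:
            continue
        masked = [f if i != k else [0.0] * len(f) for i, f in enumerate(features)]
        deltas[k] = predict_at_site(masked, neighbors, site, weights, bias) - base
    return deltas


# Hypothetical 3-atom chain 0-1-2 with one feature per atom; atom 0 is the site.
atom_features = [[1.0], [2.0], [3.0]]
bonds = {0: [1], 1: [0, 2], 2: [1]}
prediction = predict_at_site(atom_features, bonds, 0, [1.0], 0.0)
attribution = masking_attribution(atom_features, bonds, 0, [1.0], 0.0)
```

In this toy graph, masking the directly bonded atom 1 changes the prediction more than masking the more distant atom 2, which is the kind of signal a masking study uses to rank substituents.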


Subject(s)
Drug Discovery; Electronics; Databases, Factual; Materials Science; Naphthalenesulfonates; Neural Networks, Computer
3.
Spectrochim Acta A Mol Biomol Spectrosc ; 296: 122674, 2023 Aug 05.
Article in English | MEDLINE | ID: mdl-36996517

ABSTRACT

Investigating the structures of water on metal oxides is helpful for understanding adsorption mechanisms in the presence of water. In this work, the structures of adsorbed water molecules on anatase TiO2 (101) were studied by diffuse reflectance near-infrared spectroscopy (DR-NIRS). With the spectral resolution enhanced by continuous wavelet transform (CWT), the spectral features of adsorbed water at different sites were found. In the spectrum of dried TiO2 powder, there is only the spectral feature of the water adsorbed at 5-coordinated titanium atoms (Ti5c). As the amount of adsorbed water increases, the spectral feature of the water at 2-coordinated oxygen atoms (O2c) emerges first, and then that of the water interacting with the adsorbed water can be observed. When adenosine triphosphate (ATP) was adsorbed on TiO2, the intensity of the peaks related to the adsorbed water decreased, indicating that the adsorbed water is replaced by ATP due to its strong affinity for Ti5c. Therefore, there is a clear correlation between the peak intensity of the adsorbed water and the adsorbed quantity of ATP. Water can thus serve as a NIR spectroscopic probe to detect the quantity of adsorbed ATP. A partial least squares (PLS) model was established to predict the content of adsorbed ATP from the spectral peaks of water. The recoveries of validation samples are in the range of 92.00-114.96%, with relative standard deviations (RSDs) in the range of 2.13-5.82%.
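The two validation figures quoted above follow standard definitions: recovery is the predicted amount as a percentage of the reference amount, and RSD is the standard deviation as a percentage of the mean. The sketch below shows both calculations. The input numbers are invented for illustration and are not the paper's data, and the use of the sample (n-1) standard deviation is an assumption, since the abstract does not say which form was used.

```python
import statistics


def recovery_percent(predicted, reference):
    """Recovery: predicted amount as a percentage of the reference amount."""
    return 100.0 * predicted / reference


def rsd_percent(values):
    """Relative standard deviation: sample standard deviation over the mean,
    expressed as a percentage (n-1 denominator is an assumption here)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical validation sample: 10.0 units of ATP added, 9.2 predicted.
example_recovery = recovery_percent(9.2, 10.0)

# Hypothetical replicate predictions for one sample.
example_rsd = rsd_percent([9.0, 10.0, 11.0])
```

A recovery close to 100% indicates that the model prediction matches the known amount, while a small RSD indicates that replicate predictions agree with each other.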
