1.
Commun Chem ; 7(1): 102, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720065

ABSTRACT

Breakthroughs in efficient use of biogas fuel depend on successful separation of carbon dioxide/methane streams and identification of appropriate separation materials. In this work, machine learning models are trained to predict biogas separation properties of metal-organic frameworks (MOFs). Training data are obtained using grand canonical Monte Carlo simulations of experimental MOFs which have been carefully curated to ensure data quality and structural viability. The models show excellent performance in predicting gas uptake and classifying MOFs according to the trade-off between gas uptake and selectivity, with R2 values consistently above 0.9 for the validation set. We make prospective predictions on an independent external set of hypothetical MOFs, and examine these predictions in comparison to the results of grand canonical Monte Carlo calculations. The best-performing trained models correctly filter out over 90% of low-performing unseen MOFs, illustrating their applicability to other MOF datasets.

2.
J Cheminform ; 16(1): 60, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38807181

ABSTRACT

Selecting greener solvents during experiment design is imperative for greener chemistry. While many solvent selection guides are currently used in the pharmaceutical industry, these are often paper-based guides which can make it difficult to identify and compare specific solvents. This work presents a stand-alone version of the solvent flashcards that were developed as part of the AI4Green electronic laboratory notebook. The functionality is an intuitive and interactive interface for the visualisation of data from CHEM21, a pharmaceutical solvent selection guide that categorises solvents according to "greenness". This open-source software is written in Python, JavaScript, HTML and CSS and allows users to directly contrast and compare specific solvents by generating colour-coded flashcards. It can be installed locally using pip, or alternatively the source code is available on GitHub: https://github.com/AI4Green/solvent_flashcards . The documentation can also be found on GitHub or on the corresponding Python Package Index webpage: https://pypi.org/project/solvent-guide/ . SCIENTIFIC CONTRIBUTION: This simple and easy-to-use digital tool provides a visualisation of solvent greenness data through a novel intuitive interface and encourages green chemistry. It offers numerous advantages over traditional solvent selection guides, allowing users to directly customise the solvent list and generate side-by-side comparisons of only the most important solvents. The release as a standalone package will maximise the benefit of this software.

3.
Chem Sci ; 15(15): 5764-5774, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38638222

ABSTRACT

A principal component surfactant_map was developed for 91 commonly accessible surfactants for use in surfactant-enabled organic reactions in water, an important approach for sustainable chemical processes. This map was built using 22 experimental and theoretical descriptors relevant to the physicochemical nature of these surfactant-enabled reactions, and advanced principal component analysis algorithms. It comprises all classes of surfactants, i.e. cationic, anionic, zwitterionic and neutral surfactants, including designer surfactants. The value of this surfactant_map was demonstrated in activating simple inorganic fluoride salts as effective nucleophiles in water, with the right surfactant. This led to the rapid development (screening 13-15 surfactants) of two fluorination reactions for β-bromosulfides and sulfonyl chlorides in water. The latter was demonstrated in generating a sulfonyl fluoride with sufficient purity for direct use in labelling of chymotrypsin, under physiological conditions.

4.
J Cheminform ; 16(1): 43, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622648

ABSTRACT

Multiple metrics are used when assessing and validating the performance of quantitative structure-activity relationship (QSAR) models. In the case of binary classification, balanced accuracy is a metric to assess the global performance of such models. In contrast to accuracy, balanced accuracy does not depend on the respective prevalence of the two categories in the test set that is used to validate a QSAR classifier. As such, balanced accuracy is used to overcome the effect of imbalanced test sets on the model's perceived accuracy. Matthews' correlation coefficient (MCC), an alternative global performance metric, is also known to mitigate the imbalance of the test set. However, in contrast to the balanced accuracy, MCC remains dependent on the respective prevalence of the predicted categories. For simplicity, the rest of this work is based on the positive prevalence. The MCC value may be underestimated at high or extremely low positive prevalence. It contributes to more challenging comparisons between experiments using test sets with different positive prevalences and may lead to incorrect interpretations. The concept of balanced metrics beyond balanced accuracy is, to the best of our knowledge, not yet described in the cheminformatic literature. Therefore, after describing the relevant literature, this manuscript will first formally define a confusion matrix, sensitivity and specificity and then present, with synthetic data, the danger of comparing performance metrics under nonconstant prevalence. Second, it will demonstrate that balanced accuracy is the performance metric accuracy calibrated to a test set with a positive prevalence of 50% (i.e., balanced test set). This concept of balanced accuracy will then be extended to the MCC after showing its dependency on the positive prevalence. Applying the same concept to any other performance metric and widening it to the concept of calibrated metrics will then be briefly discussed. 
We will show that, like balanced accuracy, any balanced performance metric may be expressed as a function of the well-known values of sensitivity and specificity. Finally, a tale of two MCCs will exemplify the use of this concept of balanced MCC versus MCC with four use cases using synthetic data. SCIENTIFIC CONTRIBUTION: This work provides a formal, unified framework for understanding prevalence dependence in model validation metrics, deriving balanced metric expressions beyond balanced accuracy, and demonstrating their practical utility for common use cases. In contrast to prior literature, it introduces the derived confusion matrix to express metrics as functions of sensitivity, specificity and prevalence without needing additional coefficients. The manuscript extends the concept of balanced metrics to Matthews' correlation coefficient and other widely used performance indicators, enabling robust comparisons under prevalence shifts.
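The idea in this abstract, that a metric can be written as a function of sensitivity, specificity and positive prevalence and then "balanced" by fixing the prevalence at 50%, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, and the derived confusion matrix is expressed here as fractions of the test set, which is an assumption made for illustration.

```python
import math


def derived_confusion(sens, spec, prev):
    """Confusion-matrix cells as fractions of the test set.

    Assumed parameterisation (illustrative): cells are derived from
    sensitivity, specificity and positive prevalence only.
    """
    tp = sens * prev
    fn = (1.0 - sens) * prev
    tn = spec * (1.0 - prev)
    fp = (1.0 - spec) * (1.0 - prev)
    return tp, fn, tn, fp


def accuracy(sens, spec, prev):
    """Accuracy at a given positive prevalence."""
    tp, _, tn, _ = derived_confusion(sens, spec, prev)
    return tp + tn


def mcc(sens, spec, prev):
    """Matthews' correlation coefficient at a given positive prevalence."""
    tp, fn, tn, fp = derived_confusion(sens, spec, prev)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used in this sketch: return 0.0 when MCC is undefined.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def balanced_accuracy(sens, spec):
    """Accuracy calibrated to a test set with 50% positive prevalence."""
    return accuracy(sens, spec, 0.5)


def balanced_mcc(sens, spec):
    """MCC calibrated to a test set with 50% positive prevalence."""
    return mcc(sens, spec, 0.5)


if __name__ == "__main__":
    sens, spec = 0.9, 0.8  # synthetic values, for illustration only
    for prev in (0.01, 0.5, 0.99):
        print(prev, accuracy(sens, spec, prev), mcc(sens, spec, prev))
    print("balanced:", balanced_accuracy(sens, spec), balanced_mcc(sens, spec))
```

With sensitivity and specificity held constant, running the loop shows that both accuracy and MCC move as the prevalence changes, while the balanced variants depend only on sensitivity and specificity.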

5.
J Chem Inf Model ; 63(10): 2895-2901, 2023 May 22.
Article in English | MEDLINE | ID: mdl-37155346

ABSTRACT

An Electronic Laboratory Notebook (ELN) combining features, including data archival, collaboration tools, and green and sustainability metrics for organic chemistry, is presented. AI4Green is a web-based application, available as open-source code and free to use. It offers the core functionality of an ELN, namely, the ability to store reactions securely and share them among different members of a research team. As users plan their reactions and record them in the ELN, green and sustainable chemistry is encouraged by automatically calculating green metrics and color-coding hazards, solvents, and reaction conditions. The interface links a database constructed from data extracted from PubChem, enabling the automatic collation of information for reactions. The application's design facilitates the development of auxiliary sustainability applications, such as our Solvent Guide. As more reaction data are captured, subsequent work will include providing "intelligent" sustainability suggestions to the user.


Subject(s)
Laboratories , Software , Electronics , Databases, Factual

6.
J Chem Inf Model ; 61(10): 4890-4899, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34549957

ABSTRACT

Solvent-dependent reactivity is a key aspect of synthetic science, which controls reaction selectivity. The contemporary focus on new, sustainable solvents highlights a need for reactivity predictions in different solvents. Herein, we report the excellent machine learning prediction of the nucleophilicity parameter N in the four most-common solvents for nucleophiles in the Mayr's reactivity parameter database (R2 = 0.93 and 81.6% of predictions within ±2.0 of the experimental values with Extra Trees algorithm). A Causal Structure Property Relationship (CSPR) approach was utilized, with focus on the physicochemical relationships between the descriptors and the predicted parameters, and on rational improvements of the prediction models. The nucleophiles were represented with a series of electronic and steric descriptors and the solvents were represented with principal component analysis (PCA) descriptors based on the ACS Solvent Tool. The models indicated that steric factors do not contribute significantly, because of bias in the experimental database. The most important descriptors are solvent-dependent HOMO energy and Hirshfeld charge of the nucleophilic atom. Replacing DFT descriptors with Parameterization Method 6 (PM6) descriptors for the nucleophiles led to an 8.7-fold decrease in computational time, and an ∼10% decrease in the percentage of predictions within ±2.0 and ±1.0 of the experimental values.


Subject(s)
Algorithms , Principal Component Analysis , Solvents

7.
Nat Commun ; 11(1): 5753, 2020 Nov 13.
Article in English | MEDLINE | ID: mdl-33188226

ABSTRACT

Solubility prediction remains a critical challenge in drug development, synthetic route and chemical process design, extraction and crystallisation. Here we report a successful approach to solubility prediction in organic solvents and water using a combination of machine learning (ANN, SVM, RF, ExtraTrees, Bagging and GP) and computational chemistry. Rational interpretation of the dissolution process as a numerical problem led to a small set of selected descriptors and subsequent predictions which are independent of the applied machine learning method. These models gave significantly more accurate predictions compared to benchmarked open-access and commercial tools, achieving accuracy close to the expected level of noise in training data (LogS ± 0.7). Finally, they reproduced the physicochemical relationships between solubility and molecular properties in different solvents, which led to rational approaches to improve the accuracy of each model.

8.
J Cheminform ; 9(1): 63, 2017 Dec 13.
Article in English | MEDLINE | ID: mdl-29238891

ABSTRACT

In this study, we design and carry out a survey, asking human experts to predict the aqueous solubility of druglike organic compounds. We investigate whether these experts, drawn largely from the pharmaceutical industry and academia, can match or exceed the predictive power of algorithms. Alongside this, we implement 10 typical machine learning algorithms on the same dataset. The best algorithm, a variety of neural network known as a multi-layer perceptron, gave an RMSE of 0.985 log S units and an R2 of 0.706. We would not have predicted the relative success of this particular algorithm in advance. We found that the best individual human predictor generated an almost identical prediction quality with an RMSE of 0.942 log S units and an R2 of 0.723. The collection of algorithms contained a higher proportion of reasonably good predictors, nine out of ten compared with around half of the humans. We found that, for either humans or algorithms, combining individual predictions into a consensus predictor by taking their median generated excellent predictivity. While our consensus human predictor achieved very slightly better headline figures on various statistical measures, the difference between it and the consensus machine learning predictor was both small and statistically insignificant. We conclude that human experts can predict the aqueous solubility of druglike molecules essentially as well as machine learning algorithms. We find that, for either humans or algorithms, combining individual predictions into a consensus predictor by taking their median is a powerful way of benefitting from the wisdom of crowds.
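The consensus step described in this abstract, taking the median of the individual predictions for each compound, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical and the log S values below are synthetic, not data from the study.

```python
import math
from statistics import median


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def median_consensus(predictions):
    """Per-compound median across predictors.

    `predictions` is a list of per-predictor lists, all in the same
    compound order (an assumption of this sketch).
    """
    return [median(column) for column in zip(*predictions)]


# Synthetic log S values for four hypothetical compounds.
observed = [-1.0, -2.0, -3.0, -4.0]
predictors = [
    [-0.5, -2.5, -2.5, -5.0],  # predictor A
    [-1.5, -2.0, -2.0, -4.0],  # predictor B
    [-1.0, -1.0, -3.5, -3.5],  # predictor C
]

consensus = median_consensus(predictors)

if __name__ == "__main__":
    for name, preds in zip("ABC", predictors):
        print(name, rmse(preds, observed))
    print("consensus", rmse(consensus, observed))
```

In this toy data the consensus has a lower RMSE than any single predictor, because each predictor's larger errors fall on different compounds and the median discards them. That outcome is a property of the chosen numbers and is not guaranteed in general.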
