Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 25
Filtrar
1.
J Chem Inf Model ; 64(8): 3180-3191, 2024 Apr 22.
Artículo en Inglés | MEDLINE | ID: mdl-38533705

RESUMEN

In the pursuit of improved compound identification and database search tasks, this study explores heteronuclear single quantum coherence (HSQC) spectra simulation and matching methodologies. HSQC spectra serve as unique molecular fingerprints, enabling a valuable balance of data collection time and information richness. We conducted a comprehensive evaluation of the following four HSQC simulation techniques: ACD/Labs (ACD), MestReNova (MNova), Gaussian NMR calculations (DFT), and a graph-based neural network (ML). For the latter two techniques, we developed a reconstruction logic to combine proton and carbon 1D spectra into HSQC spectra. The methodology involved the implementation of three peak-matching strategies (minimum-sum, Euclidean-distance, and Hungarian distance) combined with three padding strategies (zero-padding, peak-truncated, and nearest-neighbor double assignment). We found that coupling these strategies with a robust simulation technique facilitates the accurate identification of correct molecules from similar analogues (regio- and stereoisomers) and allows for fast and accurate large database searches. Furthermore, we demonstrated the efficacy of the best-performing methodology by rectifying the structures of a set of previously misidentified molecules. This research indicates that effective HSQC spectral simulation and matching methodologies significantly facilitate molecular structure elucidation. Furthermore, we offer a Google Colab notebook for researchers to use our methods on their own data (https://github.com/AstraZeneca/hsqc_structure_elucidation.git).


Asunto(s)
Simulación por Computador , Redes Neurales de la Computación
2.
J Chem Inf Model ; 63(9): 2667-2678, 2023 05 08.
Artículo en Inglés | MEDLINE | ID: mdl-37058588

RESUMEN

High-throughput screening (HTS), as one of the key techniques in drug discovery, is frequently used to identify promising drug candidates in a largely automated and cost-effective way. One of the necessary conditions for successful HTS campaigns is a large and diverse compound library, enabling hundreds of thousands of activity measurements per project. Such collections of data hold great promise for computational and experimental drug discovery efforts, especially when leveraged in combination with modern deep learning techniques, and can potentially lead to improved drug activity predictions and cheaper and more effective experimental design. However, existing collections of machine-learning-ready public datasets do not exploit the multiple data modalities present in real-world HTS projects. Thus, the largest fraction of experimental measurements, corresponding to hundreds of thousands of "noisy" activity values from primary screening, are effectively ignored in the majority of machine learning models of HTS data. To address these limitations, we introduce Multifidelity PubChem BioAssay (MF-PCBA), a curated collection of 60 datasets that includes two data modalities for each dataset, corresponding to primary and confirmatory screening, an aspect that we call multifidelity. Multifidelity data accurately reflect real-world HTS conventions and present a new, challenging task for machine learning: the integration of low- and high-fidelity measurements through molecular representation learning, taking into account the orders-of-magnitude difference in size between the primary and confirmatory screens. Here we detail the steps taken to assemble MF-PCBA in terms of data acquisition from PubChem and the filtering steps required to curate the raw data. We also provide an evaluation of a recent deep-learning-based method for multifidelity integration across the introduced datasets, demonstrating the benefit of leveraging all HTS modalities, and a discussion in terms of the roughness of the molecular activity landscape. In total, MF-PCBA contains over 16.6 million unique molecule-protein interactions. The datasets can be easily assembled by using the source code available at https://github.com/davidbuterez/mf-pcba.


Asunto(s)
Benchmarking , Ensayos Analíticos de Alto Rendimiento , Ensayos Analíticos de Alto Rendimiento/métodos , Descubrimiento de Drogas/métodos , Aprendizaje Automático , Bioensayo
3.
Acc Chem Res ; 54(3): 532-545, 2021 02 02.
Artículo en Inglés | MEDLINE | ID: mdl-33480674

RESUMEN

The variability of chemical bonding in open-shell transition-metal complexes not only motivates their study as functional materials and catalysts but also challenges conventional computational modeling tools. Here, tailoring ligand chemistry can alter preferred spin or oxidation states as well as electronic structure properties and reactivity, creating vast regions of chemical space to explore when designing new materials atom by atom. Although first-principles density functional theory (DFT) remains the workhorse of computational chemistry in mechanism deduction and property prediction, it is of limited use here. DFT is both far too computationally costly for widespread exploration of transition-metal chemical space and also prone to inaccuracies that limit its predictive performance for localized d electrons in transition-metal complexes. These challenges starkly contrast with the well-trodden regions of small-organic-molecule chemical space, where the analytical forms of molecular mechanics force fields and semiempirical theories have for decades accelerated the discovery of new molecules, accurate DFT functional performance has been demonstrated, and gold-standard methods from correlated wavefunction theory can predict experimental results to chemical accuracy.The combined promise of transition-metal chemical space exploration and lack of established tools has mandated a distinct approach. In this Account, we outline the path we charted in exploration of transition-metal chemical space starting from the first machine learning (ML) models (i.e., artificial neural network and kernel ridge regression) and representations for the prediction of open-shell transition-metal complex properties. The distinct importance of the immediate coordination environment of the metal center as well as the lack of low-level methods to accurately predict structural properties in this coordination environment first motivated and then benefited from these ML models and representations. Once developed, the recipe for prediction of geometric, spin state, and redox potential properties was straightforwardly extended to a diverse range of other properties, including in catalysis, computational "feasibility", and the gas separation properties of periodic metal-organic frameworks. Interpretation of selected features most important for model prediction revealed new ways to encapsulate design rules and confirmed that models were robustly mapping essential structure-property relationships. Encountering the special challenge of ensuring that good model performance could generalize to new discovery targets motivated investigation of how to best carry out model uncertainty quantification. Distance-based approaches, whether in model latent space or in carefully engineered feature space, provided intuitive measures of the domain of applicability. With all of these pieces together, ML can be harnessed as an engine to tackle the large-scale exploration of transition-metal chemical space needed to satisfy multiple objectives using efficient global optimization methods. In practical terms, bringing these artificial intelligence tools to bear on the problems of transition-metal chemical space exploration has resulted in ML-model assessments of large, multimillion compound spaces in minutes and validated new design leads in weeks instead of decades.

4.
J Chem Phys ; 156(7): 074101, 2022 Feb 21.
Artículo en Inglés | MEDLINE | ID: mdl-35183086

RESUMEN

Strategies for machine-learning (ML)-accelerated discovery that are general across material composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets such as open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (∼1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the Periodic Table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension to the graph-based revised autocorrelation (RAC) representation (i.e., eRAC) that incorporates the group number alongside the nuclear charge heuristic that otherwise overestimates dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data are limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the Periodic Table with a small number of data points from the additional row. We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance. Analysis of these models highlights how the approach succeeds by reordering the distances between complexes to be more consistent with the Periodic Table, a property we expect to be broadly useful for other material domains.

5.
J Chem Phys ; 157(18): 184112, 2022 Nov 14.
Artículo en Inglés | MEDLINE | ID: mdl-36379790

RESUMEN

To accelerate the exploration of chemical space, it is necessary to identify the compounds that will provide the most additional information or value. A large-scale analysis of mononuclear octahedral transition metal complexes deposited in an experimental database confirms an under-representation of lower-symmetry complexes. From a set of around 1000 previously studied Fe(II) complexes, we show that the theoretical space of synthetically accessible complexes formed from the relatively small number of unique ligands is significantly (∼816k) larger. For the properties of these complexes, we validate the concept of ligand additivity by inferring heteroleptic properties from a stoichiometric combination of homoleptic complexes. An improved interpolation scheme that incorporates information about cis and trans isomer effects predicts the adiabatic spin-splitting energy to around 2 kcal/mol and the HOMO level to less than 0.2 eV. We demonstrate a multi-stage strategy to discover leads from the 816k Fe(II) complexes within a targeted property region. We carry out a coarse interpolation from homoleptic complexes that we refine over a subspace of ligands based on the likelihood of generating complexes with targeted properties. We validate our approach on nine new binary and ternary complexes predicted to be in a targeted zone of discovery, suggesting opportunities for efficient transition metal complex discovery.

6.
Bioorg Med Chem ; 44: 116308, 2021 08 15.
Artículo en Inglés | MEDLINE | ID: mdl-34280849

RESUMEN

We have demonstrated the utility of a 3D shape and pharmacophore similarity scoring component in molecular design with a deep generative model trained with reinforcement learning. Using Dopamine receptor type 2 (DRD2) as an example and its antagonist haloperidol 1 as a starting point in a ligand based design context, we have shown in a retrospective study that a 3D similarity enabled generative model can discover new leads in the absence of any other information. It can be efficiently used for scaffold hopping and generation of novel series. 3D similarity based models were compared against 2D QSAR based, indicating a significant degree of orthogonality of the generated outputs and with the former having a more diverse output. In addition, when the two scoring components are combined together for training of the generative model, it results in more efficient exploration of desirable chemical space compared to the individual components.


Asunto(s)
Diseño de Fármacos , Haloperidol/farmacología , Receptores de Dopamina D2/metabolismo , Haloperidol/síntesis química , Haloperidol/química , Humanos , Ligandos , Modelos Moleculares , Estructura Molecular , Relación Estructura-Actividad Cuantitativa , Relación Estructura-Actividad
7.
J Phys Chem A ; 124(16): 3286-3299, 2020 Apr 23.
Artículo en Inglés | MEDLINE | ID: mdl-32223165

RESUMEN

Determination of ground-state spins of open-shell transition-metal complexes is critical to understanding catalytic and materials properties but also challenging with approximate electronic structure methods. As an alternative approach, we demonstrate how structure alone can be used to guide assignment of ground-state spin from experimentally determined crystal structures of transition-metal complexes. We first identify the limits of distance-based heuristics from distributions of metal-ligand bond lengths of over 2000 unique mononuclear Fe(II)/Fe(III) transition-metal complexes. To overcome these limits, we employ artificial neural networks (ANNs) to predict spin-state-dependent metal-ligand bond lengths and classify experimental ground-state spins based on agreement of experimental structures with the ANN predictions. Although the ANN is trained on hybrid density functional theory data, we exploit the method-insensitivity of geometric properties to enable assignment of ground states for the majority (ca. 80-90%) of structures. We demonstrate the utility of the ANN by data-mining the literature for spin-crossover (SCO) complexes, which have experimentally observed temperature-dependent geometric structure changes, by correctly assigning almost all (>95%) spin states in the 46 Fe(II) SCO complex set. This approach represents a promising complement to more conventional energy-based spin-state assignment from electronic structure theory at the low cost of a machine learning model.

8.
Inorg Chem ; 58(16): 10592-10606, 2019 Aug 19.
Artículo en Inglés | MEDLINE | ID: mdl-30834738

RESUMEN

Recent transformative advances in computing power and algorithms have made computational chemistry central to the discovery and design of new molecules and materials. First-principles simulations are increasingly accurate and applicable to large systems with the speed needed for high-throughput computational screening. Despite these strides, the combinatorial challenges associated with the vastness of chemical space mean that more than just fast and accurate computational tools are needed for accelerated chemical discovery. In transition-metal chemistry and catalysis, unique challenges arise. The variable spin, oxidation state, and coordination environments favored by elements with well-localized d or f electrons provide great opportunity for tailoring properties in catalytic or functional (e.g., magnetic) materials but also add layers of uncertainty to any design strategy. We outline five key mandates for realizing computationally driven accelerated discovery in inorganic chemistry: (i) fully automated simulation of new compounds, (ii) knowledge of prediction sensitivity or accuracy, (iii) faster-than-fast property prediction methods, (iv) maps for rapid chemical space traversal, and (v) a means to reveal design rules on the kilocompound scale. Through case studies in open-shell transition-metal chemistry, we describe how advances in methodology and software in each of these areas bring about new chemical insights. We conclude with our outlook on the next steps in this process toward realizing fully autonomous discovery in inorganic chemistry using computational chemistry.

9.
J Phys Chem A ; 121(46): 8939-8954, 2017 Nov 22.
Artículo en Inglés | MEDLINE | ID: mdl-29095620

RESUMEN

Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-5 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.

10.
J Chem Phys ; 147(19): 191101, 2017 Nov 21.
Artículo en Inglés | MEDLINE | ID: mdl-29166114

RESUMEN

The flat-plane condition is the union of two exact constraints in electronic structure theory: (i) energetic piecewise linearity with fractional electron removal or addition and (ii) invariant energetics with change in electron spin in a half filled orbital. Semi-local density functional theory (DFT) fails to recover the flat plane, exhibiting convex fractional charge errors (FCE) and concave fractional spin errors (FSE) that are related to delocalization and static correlation errors. We previously showed that DFT+U eliminates FCE but now demonstrate that, like other widely employed corrections (i.e., Hartree-Fock exchange), it worsens FSE. To find an alternative strategy, we examine the shape of semi-local DFT deviations from the exact flat plane and we find this shape to be remarkably consistent across ions and molecules. We introduce the judiciously modified DFT (jmDFT) approach, wherein corrections are constructed from few-parameter, low-order functional forms that fit the shape of semi-local DFT errors. We select one such physically intuitive form and incorporate it self-consistently to correct semi-local DFT. We demonstrate on model systems that jmDFT represents the first easy-to-implement, no-overhead approach to recovering the flat plane from semi-local DFT.

12.
Nat Commun ; 15(1): 1517, 2024 Feb 26.
Artículo en Inglés | MEDLINE | ID: mdl-38409255

RESUMEN

We investigate the potential of graph neural networks for transfer learning and improving molecular property prediction on sparse and expensive to acquire high-fidelity data by leveraging low-fidelity measurements as an inexpensive proxy for a targeted property of interest. This problem arises in discovery processes that rely on screening funnels for trading off the overall costs against throughput and accuracy. Typically, individual stages in these processes are loosely connected and each one generates data at different scale and fidelity. We consider this setup holistically and demonstrate empirically that existing transfer learning techniques for graph neural networks are generally unable to harness the information from multi-fidelity cascades. Here, we propose several effective transfer learning strategies and study them in transductive and inductive settings. Our analysis involves a collection of more than 28 million unique experimental protein-ligand interactions across 37 targets from drug discovery by high-throughput screening and 12 quantum properties from the dataset QMugs. The results indicate that transfer learning can improve the performance on sparse tasks by up to eight times while using an order of magnitude less high-fidelity training data. Moreover, the proposed methods consistently outperform existing transfer learning strategies for graph-structured data on drug discovery and quantum mechanics datasets.


Asunto(s)
Descubrimiento de Drogas , Aprendizaje , Ensayos Analíticos de Alto Rendimiento , Redes Neurales de la Computación , Aprendizaje Automático
13.
Chem Sci ; 15(11): 4146-4160, 2024 Mar 13.
Artículo en Inglés | MEDLINE | ID: mdl-38487235

RESUMEN

Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample-efficiency of the optimization process. We identity and characterize unique challenges combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline-RL for simple ligand- and structure-based oracle functions, with a 5-66-fold increase in hits generated for a fixed oracle budget and a 4-64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity. This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.

14.
J Cheminform ; 16(1): 20, 2024 Feb 21.
Artículo en Inglés | MEDLINE | ID: mdl-38383444

RESUMEN

REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within the general machine learning optimization algorithms, transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command line tool which reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI based molecule generation. An additional goal with the release is to create a framework for education and future innovation in AI based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution. The software provides an open-source reference implementation for generative molecular design where the software is also being used in production to support in-house drug discovery projects. The publication of the most common machine learning algorithms in one code and full documentation thereof will increase transparency of AI and foster innovation, collaboration and education.

15.
Curr Opin Struct Biol ; 80: 102575, 2023 06.
Artículo en Inglés | MEDLINE | ID: mdl-36966692

RESUMEN

In this mini review, we capture the latest progress of applying artificial intelligence (AI) techniques based on deep learning architectures to molecular de novo design with a focus on integration with experimental validation. We will cover the progress and experimental validation of novel generative algorithms, the validation of QSAR models and how AI-based molecular de novo design is starting to become connected with chemistry automation. While progress has been made in the last few years, it is still early days. The experimental validations conducted thus far should be considered proof-of-principle, providing confidence that the field is moving in the right direction.


Asunto(s)
Algoritmos , Inteligencia Artificial , Automatización , Diseño de Fármacos
16.
Commun Chem ; 6(1): 262, 2023 Nov 29.
Artículo en Inglés | MEDLINE | ID: mdl-38030692

RESUMEN

Atom-centred neural networks represent the state-of-the-art for approximating the quantum chemical properties of molecules, such as internal energies. While the design of machine learning architectures that respect chemical principles has continued to advance, the final atom pooling operation that is necessary to convert from atomic to molecular representations in most models remains relatively undeveloped. The most common choices, sum and average pooling, compute molecular representations that are naturally a good fit for many physical properties, while satisfying properties such as permutation invariance which are desirable from a geometric deep learning perspective. However, there are growing concerns that such simplistic functions might have limited representational power, while also being suboptimal for physical properties that are highly localised or intensive. Based on recent advances in graph representation learning, we investigate the use of a learnable pooling function that leverages an attention mechanism to model interactions between atom representations. The proposed pooling operation is a drop-in replacement requiring no changes to any of the other architectural components. Using SchNet and DimeNet++ as starting models, we demonstrate consistent uplifts in performance compared to sum and mean pooling and a recent physics-aware pooling operation designed specifically for orbital energies, on several datasets, properties, and levels of theory, with up to 85% improvements depending on the specific task.

17.
Commun Chem ; 6(1): 82, 2023 Apr 27.
Artículo en Inglés | MEDLINE | ID: mdl-37106032

RESUMEN

In drug discovery, computational methods are a key part of making informed design decisions and prioritising experiments. In particular, optimizing compound affinity is a central concern during the early stages of development. In the last 10 years, alchemical free energy (FE) calculations have transformed our ability to incorporate accurate in silico potency predictions in design decisions, and represent the 'gold standard' for augmenting experiment-driven drug discovery. However, relative FE calculations are complex to set up, require significant expert intervention to prepare the calculation and analyse the results or are provided only as closed-source software, not allowing for fine-grained control over the underlying settings. In this work, we introduce an end-to-end relative FE workflow based on the non-equilibrium switching approach that facilitates calculation of binding free energies starting from SMILES strings. The workflow is implemented using fully modular steps, allowing various components to be exchanged depending on licence availability. We further investigate the dependence of the calculated free energy accuracy on the initial ligand pose generated by various docking algorithms. We show that both commercial and open-source docking engines can be used to generate poses that lead to good correlation of free energies with experimental reference data.

18.
Chem Sci ; 14(25): 7057-7067, 2023 Jun 28.
Artículo en Inglés | MEDLINE | ID: mdl-37389247

RESUMEN

Understanding allosteric regulation in biomolecules is of great interest to pharmaceutical research and computational methods emerged during the last decades to characterize allosteric coupling. However, the prediction of allosteric sites in a protein structure remains a challenging task. Here, we integrate local binding site information, coevolutionary information, and information on dynamic allostery into a structure-based three-parameter model to identify potentially hidden allosteric sites in ensembles of protein structures with orthosteric ligands. When tested on five allosteric proteins (LFA-1, p38-α, GR, MAT2A, and BCKDK), the model successfully ranked all known allosteric pockets in the top three positions. Finally, we identified a novel druggable site in MAT2A confirmed by X-ray crystallography and SPR and a hitherto unknown druggable allosteric site in BCKDK validated by biochemical and X-ray crystallography analyses. Our model can be applied in drug discovery to identify allosteric pockets.

19.
J Cheminform ; 13(1): 89, 2021 Nov 17.
Artículo en Inglés | MEDLINE | ID: mdl-34789335

RESUMEN

Recently, we have released the de novo design platform REINVENT in version 2.0. This improved and extended iteration supports far more features and scoring function components, which allows bespoke and tailor-made protocols to maximize impact in small molecule drug discovery projects. A major obstacle of generative models is producing active compounds, in which predictive (QSAR) models have been applied to enrich target activity. However, QSAR models are inherently limited by their applicability domains. To overcome these limitations, we introduce a structure-based scoring component for REINVENT. DockStream is a flexible, stand-alone molecular docking wrapper that provides access to a collection of ligand embedders and docking backends. Using the benchmarking and analysis workflow provided in DockStream, execution and subsequent analysis of a variety of docking configurations can be automated. Docking algorithms vary greatly in performance depending on the target and the benchmarking and analysis workflow provides a streamlined solution to identifying productive docking configurations. We show that an informative docking configuration can inform the REINVENT agent to optimize towards improving docking scores using public data. With docking activated, REINVENT is able to retain key interactions in the binding site, discard molecules which do not fit the binding cavity, harness unused (sub-)pockets, and improve overall performance in the scaffold-hopping scenario. The code is freely available at https://github.com/MolecularAI/DockStream .

20.
ACS Cent Sci ; 6(4): 513-524, 2020 Apr 22.
Artículo en Inglés | MEDLINE | ID: mdl-32342001

RESUMEN

The accelerated discovery of materials for real world applications requires the achievement of multiple design objectives. The multidimensional nature of the search necessitates exploration of multimillion compound libraries over which even density functional theory (DFT) screening is intractable. Machine learning (e.g., artificial neural network, ANN, or Gaussian process, GP) models for this task are limited by training data availability and predictive uncertainty quantification (UQ). We overcome such limitations by using efficient global optimization (EGO) with the multidimensional expected improvement (EI) criterion. EGO balances exploitation of a trained model with acquisition of new DFT data at the Pareto front, the region of chemical space that contains the optimal trade-off between multiple design criteria. We demonstrate this approach for the simultaneous optimization of redox potential and solubility in candidate M(II)/M(III) redox couples for redox flow batteries from a space of 2.8 M transition metal complexes designed for stability in practical redox flow battery (RFB) applications. We show that a multitask ANN with latent-distance-based UQ surpasses the generalization performance of a GP in this space. With this approach, ANN prediction and EI scoring of the full space are achieved in minutes. Starting from ca. 100 representative points, EGO improves both properties by over 3 standard deviations in only five generations. Analysis of lookahead errors confirms rapid ANN model improvement during the EGO process, achieving suitable accuracy for predictive design in the space of transition metal complexes. The ANN-driven EI approach achieves at least 500-fold acceleration over random search, identifying a Pareto-optimal design in around 5 weeks instead of 50 years.

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA