ABSTRACT
Artificial intelligence-guided closed-loop experimentation has emerged as a promising method for optimization of objective functions [1,2], but the substantial potential of this traditionally black-box approach to uncovering new chemical knowledge has remained largely untapped. Here we report the integration of closed-loop experiments with physics-based feature selection and supervised learning, denoted as closed-loop transfer (CLT), to yield chemical insights in parallel with optimization of objective functions. CLT was used to examine the factors dictating the photostability in solution of light-harvesting donor-acceptor molecules used in a variety of organic electronics applications, and showed fundamental insights including the importance of high-energy regions of the triplet state manifold. This was possible following automated modular synthesis and experimental characterization of only around 1.5% of the theoretical chemical space. This physics-informed model for photostability was strengthened using multiple experimental test sets and validated by tuning the triplet excited-state energy of the solvent to break out of the observed plateau in the closed-loop photostability optimization process. Further applications of CLT to additional materials systems support the generalizability of this strategy for augmenting closed-loop strategies. Broadly, these findings show that combining interpretable supervised learning models and physics-based features with closed-loop discovery processes can rapidly provide fundamental chemical insights.
ABSTRACT
Self-driving laboratories (SDLs) promise an accelerated application of the scientific method. Through the automation of experimental workflows, along with autonomous experimental planning, SDLs hold the potential to greatly accelerate research in chemistry and materials discovery. This review provides an in-depth analysis of the state-of-the-art in SDL technology, its applications across various scientific disciplines, and the potential implications for research and industry. This review additionally provides an overview of the enabling technologies for SDLs, including their hardware, software, and integration with laboratory infrastructure. Most importantly, this review explores the diverse range of scientific domains where SDLs have made significant contributions, from drug discovery and materials science to genomics and chemistry. We provide a comprehensive review of existing real-world examples of SDLs, their different levels of automation, and the challenges and limitations associated with each domain.
ABSTRACT
Rapid advancements in artificial intelligence (AI) have enabled breakthroughs across many scientific disciplines. In organic chemistry, the challenge of planning complex multistep chemical syntheses should conceptually be well-suited for AI. Yet, the development of AI synthesis planners trained solely on reaction-example data has stagnated and is not on par with the performance of "hybrid" algorithms combining AI with expert knowledge. This Perspective examines possible causes of these shortcomings, extending beyond the established reasoning of insufficient quantities of reaction data. Drawing attention to the intricacies and data biases that are specific to the domain of synthetic chemistry, we advocate augmenting the unique capabilities of AI with the knowledge base and the reasoning strategies of domain experts. By actively involving synthetic chemists, who are the end users of any synthesis planning software, in the development process, we envision bridging the gap between computer algorithms and the intricate nature of chemical synthesis.
ABSTRACT
Quantum computers are expected to outperform classical computers for specific problems in quantum chemistry. Such calculations remain expensive, but costs can be lowered through the partition of the molecular system. In the present study, partition was achieved with range-separated density functional theory (RS-DFT). The use of RS-DFT reduces both the basis set size and the active space size dependence of the ground state energy in comparison with the use of wave function theory (WFT) alone. The utilization of pair natural orbitals (PNOs) in place of canonical molecular orbitals (MOs) results in more compact qubit Hamiltonians. To test this strategy, a basis-set independent framework, known as multiresolution analysis (MRA), was employed to generate PNOs. Tests were conducted with the variational quantum eigensolver for a number of molecules. The results show that the proposed approach reduces the number of qubits needed to reach a target energy accuracy.
ABSTRACT
Machine learning has been pervasively touching many fields of science. Chemistry and materials science are no exception. While machine learning has been making a great impact, it is still not reaching its full potential or maturity. In this perspective, we first outline current applications across a diversity of problems in chemistry. Then, we discuss how machine learning researchers view and approach problems in the field. Finally, we provide our considerations for maximizing impact when researching machine learning for chemistry.
ABSTRACT
Molecules with an inverted energy gap between their first singlet and triplet excited states have promising applications in the next generation of organic light-emitting diode (OLED) materials. Unfortunately, such molecules are rare, and only a handful of examples are currently known. High-throughput virtual screening could assist in finding novel classes of these molecules, but current efforts are hampered by the high computational cost of the required quantum chemical methods. We present a method based on the semiempirical Pariser-Parr-Pople theory augmented by perturbation theory and show that it reproduces inverted gaps at a fraction of the cost of currently employed excited-state calculations. Our study paves the way for ultrahigh-throughput virtual screening and inverse design to accelerate the discovery and development of this new generation of OLED materials.
ABSTRACT
Structure determination is necessary to identify unknown organic molecules, such as those in natural products, forensic samples, the interstellar medium, and laboratory syntheses. Rotational spectroscopy enables structure determination by providing accurate 3D information about small organic molecules via their moments of inertia. Using these moments, Kraitchman analysis determines isotopic substitution coordinates, which are the unsigned |x|, |y|, |z| coordinates of all atoms with natural isotopic abundance, including carbon, nitrogen, and oxygen. While unsigned substitution coordinates can verify guesses of structures, the missing +/- signs make it challenging to determine the actual structure from the substitution coordinates alone. To tackle this inverse problem, we develop Kreed (Kraitchman REflection-Equivariant Diffusion), a generative diffusion model that infers a molecule's complete 3D structure from only its molecular formula, moments of inertia, and unsigned substitution coordinates of heavy atoms. Kreed's top-1 predictions identify the correct 3D structure with near-perfect accuracy on large simulated datasets when provided with substitution coordinates of all heavy atoms with natural isotopic abundance. Accuracy decreases as fewer substitution coordinates are provided, but is retained for smaller molecules. On a test set of experimentally measured substitution coordinates gathered from the literature, Kreed predicts the correct all-atom 3D structure in 25 of 33 cases, demonstrating experimental potential for de novo 3D structure determination with rotational spectroscopy.
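The origin of the missing signs can be illustrated with the simplest case of Kraitchman analysis, a single isotopic substitution in a linear molecule, where the change in the moment of inertia equals μ·z² with μ = MΔm/(M + Δm). The following is only a minimal sketch under stated assumptions: the masses and positions describe a hypothetical linear triatomic, the moments of inertia are computed from the model rather than measured, and the function names are illustrative and not part of Kreed.

```python
import math

def center_of_mass(masses, coords):
    """Centre of mass of point masses placed along the molecular axis."""
    return sum(m * z for m, z in zip(masses, coords)) / sum(masses)

def moment_of_inertia(masses, coords):
    """Moment of inertia of a linear molecule about its centre of mass."""
    z_cm = center_of_mass(masses, coords)
    return sum(m * (z - z_cm) ** 2 for m, z in zip(masses, coords))

def kraitchman_linear(masses, coords, atom, delta_m):
    """Unsigned coordinate |z| of one atom from a single isotopic substitution."""
    total = sum(masses)
    parent = moment_of_inertia(masses, coords)
    heavier = list(masses)
    heavier[atom] += delta_m
    substituted = moment_of_inertia(heavier, coords)
    mu = total * delta_m / (total + delta_m)  # reduced mass of the substitution
    # Only z**2 enters the equation, so the sign of z cannot be recovered.
    return math.sqrt((substituted - parent) / mu)

# Hypothetical linear triatomic: approximate masses in u, positions in angstrom.
masses = [15.995, 12.000, 31.972]
coords = [-1.68, -0.52, 1.04]

recovered = kraitchman_linear(masses, coords, atom=1, delta_m=1.00335)
true_signed = coords[1] - center_of_mass(masses, coords)
print(f"|z| from substitution: {recovered:.4f}, true z: {true_signed:.4f}")
```

Because only z² appears, each atom in a general molecule contributes an undetermined sign per axis, which is the ambiguity the generative model is meant to resolve.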
ABSTRACT
CNDOL is an a priori, approximate Fockian for molecular wave functions. In this study, we employ several modes of singly excited configuration interaction (CIS) to model molecular excitation properties by using four combinations of the one-electron operator terms. These options are compared with experimental and theoretical data for a carefully selected set of molecules. The resulting excitons are represented by CIS wave functions that encompass all valence electrons in the system for each excited-state energy. The Coulomb-exchange term associated with the calculated excitation energies is rationalized to evaluate theoretical exciton binding energies. This property is shown to be useful for discriminating the charge donation ability of molecular and supermolecular systems. Multielectronic 3D maps of exciton formal charges are showcased, demonstrating the applicability of these approximate wave functions for modeling properties of large molecules and clusters at nanoscales. This modeling proves useful in designing molecular photovoltaic devices. Our methodology holds potential applications in systematic evaluations of such systems and the development of fundamental artificial intelligence databases for predicting related properties.
ABSTRACT
A palladium-catalyzed domino C-N coupling/Cacchi reaction is reported. Design of photoluminescent bis-heterocycles, aided by density functional theory calculations, was performed with synthetic yields up to 98%. The photophysical properties of the products accessed via this strategy were part of a comprehensive study that led to broad emission spectra and quantum yields of up to 0.59. Mechanistic experiments confirmed bromoalkynes as competent intermediates, and a density functional theory investigation suggests a pathway involving initial oxidative addition into the cis C-Br bond of the gem-dihaloolefin.
ABSTRACT
We must accelerate the pace at which we make technological advancements to address climate change and disease risks worldwide. This swifter pace of discovery requires faster research and development cycles enabled by better integration between hypothesis generation, design, experimentation, and data analysis. Typical research cycles take months to years. However, data-driven automated laboratories, or self-driving laboratories, can significantly accelerate molecular and materials discovery. Recently, substantial advancements have been made in the areas of machine learning and optimization algorithms that have allowed researchers to extract valuable knowledge from multidimensional data sets. Machine learning models can be trained on large data sets from the literature or databases, but their performance can often be hampered by a lack of negative results or metadata. In contrast, data generated by self-driving laboratories can be information-rich, containing precise details of the experimental conditions and metadata. Consequently, much larger amounts of high-quality data are gathered in self-driving laboratories. When placed in open repositories, this data can be used by the research community to reproduce experiments, for more in-depth analysis, or as the basis for further investigation. Accordingly, high-quality open data sets will increase the accessibility and reproducibility of science, which is sorely needed.
In this Account, we describe our efforts to build a self-driving lab for the development of a new class of materials: organic semiconductor lasers (OSLs). Since they have only recently been demonstrated, little is known about the molecular and material design rules for thin-film, electrically-pumped OSL devices as compared to other technologies such as organic light-emitting diodes or organic photovoltaics. To realize high-performing OSL materials, we are developing a flexible system for automated synthesis via iterative Suzuki-Miyaura cross-coupling reactions. This automated synthesis platform is directly coupled to the analysis and purification capabilities. Subsequently, the molecules of interest can be transferred to an optical characterization setup. We are currently limited to optical measurements of the OSL molecules in solution. However, material properties are ultimately most important in the solid state (e.g., as a thin-film device). To that end and for a different scientific goal, we are developing a self-driving lab for inorganic thin-film materials focused on the oxygen evolution reaction.
While the future of self-driving laboratories is very promising, numerous challenges still need to be overcome. These challenges can be split into cognition and motor function. Generally, the cognitive challenges are related to optimization with constraints or unexpected outcomes for which general algorithmic solutions have yet to be developed. A more practical challenge that could be resolved in the near future is that of software control and integration because few instrument manufacturers design their products with self-driving laboratories in mind. Challenges in motor function are largely related to handling heterogeneous systems, such as dispensing solids or performing extractions. As a result, it is critical to understand that adapting experimental procedures that were designed for human experimenters is not as simple as transferring those same actions to an automated system, and there may be more efficient ways to achieve the same goal in an automated fashion. Accordingly, for self-driving laboratories, we need to carefully rethink the translation of manual experimental protocols.
Subjects
Algorithms, Laboratories, Humans, Reproducibility of Results
ABSTRACT
One of the biggest obstacles to successful polymer property prediction is an effective representation that accurately captures the sequence of repeat units in a polymer. Motivated by the success of data augmentation in computer vision and natural language processing, we explore augmenting polymer data by iteratively rearranging the molecular representation while preserving the correct connectivity, revealing additional substructural information that is not present in a single representation. We evaluate the effects of this technique on the performance of machine learning models trained on three polymer datasets and compare them to common molecular representations. Data augmentation does not yield significant improvements in machine learning property prediction performance compared to equivalent (non-augmented) representations. In datasets where the target property is primarily influenced by the polymer sequence rather than experimental parameters, this data augmentation technique provides molecular embedding with more information to improve property prediction accuracy.
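One way to picture augmentation by rearrangement is sketched below. The abstract does not specify the rearrangement procedure, so this assumes the simplest case: a periodic polymer described by a repeat unit written as an ordered list of monomer tokens, where every cyclic rotation of that list denotes the same chain with the same connectivity. The function name and the tokens "A", "B", "C" are hypothetical.

```python
def cyclic_rearrangements(repeat_unit):
    """Return the distinct cyclic rotations of a repeat-unit token sequence.

    Each rotation starts the repeat unit at a different monomer but keeps
    the neighbour relationships (connectivity) of the periodic chain intact.
    """
    units = list(repeat_unit)
    rotations = []
    for shift in range(len(units)):
        rotated = tuple(units[shift:] + units[:shift])
        if rotated not in rotations:  # skip duplicates from symmetric repeat units
            rotations.append(rotated)
    return rotations

# Hypothetical three-monomer repeat unit.
augmented = cyclic_rearrangements(["A", "B", "C"])
for sequence in augmented:
    print("-".join(sequence))
```

Each rotation would then be featurized separately and share the target property of the original entry, which is how a single polymer record could yield several training examples under this assumption.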
Subjects
Machine Learning, Polymers, Natural Language Processing
ABSTRACT
De novo drug design with desired biological activities is crucial for developing novel therapeutics for patients. The drug development process is time- and resource-consuming, and it has a low probability of success. Recent advances in machine learning and deep learning technology have reduced the time and cost of the discovery process and have therefore improved pharmaceutical research and development. In this paper, we explore the combination of two rapidly developing fields with lead candidate discovery in the drug development process. First, artificial intelligence has already been demonstrated to successfully accelerate conventional drug design approaches. Second, quantum computing has demonstrated promising potential in different applications, such as quantum chemistry, combinatorial optimizations, and machine learning. This article explores hybrid quantum-classical generative adversarial networks (GANs) for small-molecule discovery. We substituted each element of the GAN with a variational quantum circuit (VQC) and demonstrated the quantum advantages in small-molecule drug discovery. Utilizing a VQC in the noise generator of a GAN to generate small molecules achieves better physicochemical properties and performance in the goal-directed benchmark than the classical counterpart. Moreover, we demonstrate the potential of a VQC with only tens of learnable parameters in the generator of the GAN to generate small molecules. We also demonstrate the quantum advantage of a VQC in the discriminator of the GAN. In this hybrid model, the number of learnable parameters is significantly smaller than in the classical ones, and it can still generate valid molecules. The hybrid model with only tens of training parameters in the quantum discriminator outperforms the MLP-based one in terms of both generated molecule properties and the achieved KL divergence. However, the hybrid quantum-classical GANs still face challenges in generating unique and valid molecules compared to their classical counterparts.
Subjects
Artificial Intelligence, Neural Networks (Computer), Humans, Computing Methodologies, Quantum Theory, Pharmaceutical Preparations
ABSTRACT
Coherence phenomena arise from interference, or the addition, of wave-like amplitudes with fixed phase differences. Although coherence has been shown to yield transformative ways for improving function, advances have been confined to pristine matter and coherence was considered fragile. However, recent evidence of coherence in chemical and biological systems suggests that the phenomena are robust and can survive in the face of disorder and noise. Here we survey the state of recent discoveries, present viewpoints that suggest that coherence can be used in complex chemical systems, and discuss the role of coherence as a design element in realizing function.
Subjects
Biophysics, Biological Models, Chemical Models, Electrons, Energy Transfer, Metals/chemistry, Molecular Models, Motion, Quantum Theory, Spectrum Analysis, Time Factors, Vibration
ABSTRACT
Semiempirical quantum chemistry has recently seen a renaissance with applications in high-throughput virtual screening and machine learning. The simplest semiempirical model still in widespread use in chemistry is Hückel's π-electron molecular orbital theory. In this work, we implemented a Hückel program using differentiable programming with the JAX framework based on limited modifications of a pre-existing NumPy version. The auto-differentiable Hückel code enabled efficient gradient-based optimization of model parameters tuned for excitation energies and molecular polarizabilities, respectively, based on as few as 100 data points from density functional theory simulations. In particular, the facile computation of the polarizability, a second-order derivative, via auto-differentiation shows the potential of differentiable programming to bypass the need for numeric differentiation or derivation of analytical expressions. Finally, we employ gradient-based optimization of atom identity for inverse design of organic electronic materials with targeted orbital energy gaps and polarizabilities. Optimized structures are obtained after as few as 15 iterations using standard gradient-based optimization algorithms.
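For orientation, the forward model that such a program differentiates can be sketched in a few lines. This is a plain NumPy illustration of textbook Hückel theory, not the authors' code: the function names, the choice of butadiene, and the parameter values α = 0 and β = −1 (energies in units of |β|) are assumptions made for the example, and no automatic differentiation is performed here.

```python
import numpy as np

def huckel_matrix(n_atoms, bonds, alpha=0.0, beta=-1.0):
    """Hückel Hamiltonian: alpha on the diagonal, beta between bonded atoms."""
    h = np.zeros((n_atoms, n_atoms))
    np.fill_diagonal(h, alpha)
    for i, j in bonds:
        h[i, j] = h[j, i] = beta
    return h

def homo_lumo_gap(h, n_electrons):
    """Gap between the highest occupied and lowest unoccupied pi orbitals."""
    energies = np.linalg.eigvalsh(h)  # eigenvalues in ascending order
    homo = n_electrons // 2 - 1       # two electrons per orbital
    return energies[homo + 1] - energies[homo]

# Butadiene: four carbon atoms in a chain, four pi electrons.
butadiene = huckel_matrix(4, [(0, 1), (1, 2), (2, 3)])
gap = homo_lumo_gap(butadiene, n_electrons=4)
print(f"HOMO-LUMO gap: {gap:.4f} |beta|")
```

In a differentiable version, quantities such as this gap would be differentiated with respect to α and β to fit them to reference data, which is the role the abstract assigns to JAX.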
ABSTRACT
Redox biochemistry plays a key role in the transduction of chemical energy in living systems. However, the compounds observed in metabolic redox reactions are a minuscule fraction of chemical space. It is not clear whether compounds that ended up being selected as metabolites display specific properties that distinguish them from nonbiological compounds. Here, we introduce a systematic approach for comparing the chemical space of all possible redox states of linear-chain carbon molecules to the corresponding metabolites that appear in biology. Using cheminformatics and quantum chemistry, we analyze the physicochemical and thermodynamic properties of the biological and nonbiological compounds. We find that, among all compounds, aldose sugars have the highest possible number of redox connections to other molecules. Metabolites are enriched in carboxylic acid functional groups and depleted of ketones and aldehydes and have higher solubility than nonbiological compounds. Upon constructing the energy landscape for the full chemical space as a function of pH and electron-donor potential, we find that metabolites tend to have lower Gibbs energies than nonbiological molecules. Finally, we generate Pourbaix phase diagrams that serve as a thermodynamic atlas to indicate which compounds are energy minima in redox chemical space across a set of pH values and electron-donor potentials. While escape from thermodynamic equilibrium toward kinetically driven states is a hallmark of life and its origin, we envision that a deeper quantitative understanding of the environment-dependent thermodynamic landscape of putative prebiotic molecules will provide a crucial reference for future origins-of-life models.
Subjects
Cheminformatics/methods, Molecular Dynamics Simulation, Sugars/chemistry, Aldehydes/chemistry, Carbohydrate Conformation, Carboxylic Acids/chemistry, Ketones/chemistry, Oxidation-Reduction
ABSTRACT
We present a review of the Unitary Coupled Cluster (UCC) ansatz and related ansätze which are used to variationally solve the electronic structure problem on quantum computers. A brief history of coupled cluster (CC) methods is provided, followed by a broad discussion of the formulation of CC theory. This includes touching on the merits and difficulties of the method and several variants, UCC among them, in the classical context, to motivate their applications on quantum computers. In the core of the text, the UCC ansatz and its implementation on a quantum computer are discussed at length, in addition to a discussion on several derived and related ansätze specific to quantum computing. The review concludes with a unified perspective on the discussed ansätze, attempting to bring them under a common framework, as well as with a reflection upon open problems within the field.
ABSTRACT
The design of molecular catalysts typically involves reconciling multiple conflicting property requirements, largely relying on human intuition and local structural searches. However, the vast number of potential catalysts requires pruning of the candidate space by efficient property prediction with quantitative structure-property relationships. Data-driven workflows embedded in a library of potential catalysts can be used to build predictive models for catalyst performance and serve as a blueprint for novel catalyst designs. Herein we introduce kraken, a discovery platform covering monodentate organophosphorus(III) ligands providing comprehensive physicochemical descriptors based on representative conformer ensembles. Using quantum-mechanical methods, we calculated descriptors for 1558 ligands, including commercially available examples, and trained machine learning models to predict properties of over 300000 new ligands. We demonstrate the application of kraken to systematically explore the property space of organophosphorus ligands and how existing data sets in catalysis can be used to accelerate ligand selection during reaction optimization.
ABSTRACT
The choice of simulation methods in computational materials science is driven by a fundamental trade-off: bridging large time- and length-scales with highly accurate simulations at an affordable computational cost. Venturing the investigation of complex phenomena on large scales requires fast yet accurate computational methods. We review the emerging field of machine-learned potentials, which promises to reach the accuracy of quantum mechanical computations at a substantially reduced computational cost. This Review will summarize the basic principles of the underlying machine learning methods, the data acquisition process and active learning procedures. We highlight multiple recent applications of machine-learned potentials in various fields, ranging from organic chemistry and biomolecules to inorganic crystal structure predictions and surface science. We furthermore discuss the developments required to promote a broader use of ML potentials, and the possibility of using them to help solve open questions in materials science and facilitate fully computational materials design.
ABSTRACT
The ongoing revolution of the natural sciences by the advent of machine learning and artificial intelligence has sparked significant interest in the material science community in recent years. The intrinsically high dimensionality of the space of realizable materials makes traditional approaches ineffective for large-scale explorations. Modern data science and machine learning tools developed for increasingly complicated problems are an attractive alternative. An imminent climate catastrophe calls for a clean energy transformation by overhauling current technologies within the few years still available for action. Tackling this crisis requires the development of new materials at an unprecedented pace and scale. For example, organic photovoltaics have the potential to replace existing silicon-based materials to a large extent and open up new fields of application. In recent years, organic light-emitting diodes have emerged as state-of-the-art technology for digital screens and portable devices and are enabling new applications with flexible displays. Reticular frameworks allow the atom-precise synthesis of nanomaterials and promise to revolutionize the field by the potential to realize multifunctional nanoparticles with applications from gas storage, gas separation, and electrochemical energy storage to nanomedicine. In the recent decade, significant advances in all these fields have been facilitated by the comprehensive application of simulation and machine learning for property prediction, property optimization, and chemical space exploration enabled by considerable advances in computing power and algorithmic efficiency.
In this Account, we review the most recent contributions of our group in this thriving field of machine learning for material science. We start with a summary of the most important material classes our group has been involved in, focusing on small molecules as organic electronic materials and crystalline materials. Specifically, we highlight the data-driven approaches we employed to speed up discovery and derive material design strategies. Subsequently, our focus lies on the data-driven methodologies our group has developed and employed, elaborating on high-throughput virtual screening, inverse molecular design, Bayesian optimization, and supervised learning. We discuss the general ideas, their working principles, and their use cases with examples of successful implementations in data-driven material discovery and design efforts. Furthermore, we elaborate on potential pitfalls and remaining challenges of these methods. Finally, we provide a brief outlook for the field as we foresee increasing adaptation and implementation of large scale data-driven approaches in material discovery and design campaigns.
ABSTRACT
Nonpairwise multiqubit interactions present a useful resource for quantum information processors. Their implementation would facilitate more efficient quantum simulations of molecules and combinatorial optimization problems, and they could simplify error suppression and error correction schemes. Here, we present a superconducting circuit architecture in which a coupling module mediates two-local and three-local interactions between three flux qubits by design. The system Hamiltonian is estimated via multiqubit pulse sequences that implement Ramsey-type interferometry between all neighboring excitation manifolds in the system. The three-local interaction is coherently tunable over several MHz via the coupler flux biases and can be turned off, which is important for applications in quantum annealing, analog quantum simulation, and gate-model quantum computation.