ABSTRACT
Rapid advances in artificial intelligence (AI) have enabled breakthroughs across many scientific disciplines. In organic chemistry, the challenge of planning complex multistep syntheses should, conceptually, be well suited to AI. Yet the development of AI synthesis planners trained solely on reaction example data has stagnated and does not match the performance of "hybrid" algorithms that combine AI with expert knowledge. This Perspective examines possible causes of these shortcomings, extending beyond the established argument of insufficient quantities of reaction data. Drawing attention to the intricacies and data biases specific to the domain of synthetic chemistry, we advocate augmenting the unique capabilities of AI with the knowledge base and reasoning strategies of domain experts. By actively involving synthetic chemists, the end users of any synthesis planning software, in the development process, we envision bridging the gap between computer algorithms and the intricate nature of chemical synthesis.
ABSTRACT
Recent years have seen revived interest in computer-assisted organic synthesis1,2. The use of reaction- and neural-network algorithms that can plan multistep synthetic pathways has revolutionized this field1,3-7, including examples leading to advanced natural products6,7. Such methods typically operate on full, literature-derived 'substrate(s)-to-product' reaction rules and cannot be easily extended to the analysis of reaction mechanisms. Here we show that computers equipped with a comprehensive knowledge base of mechanistic steps, augmented by physical-organic chemistry rules as well as quantum-mechanical and kinetic calculations, can use a reaction-network approach to analyse the mechanisms of some of the most complex organic transformations, namely cationic rearrangements. Such rearrangements are a cornerstone of organic chemistry textbooks and entail notable changes in the molecule's carbon skeleton8-12. The algorithm we describe and deploy at https://HopCat.allchemy.net/ generates, within minutes, networks of possible mechanistic steps, traces plausible step sequences and calculates expected product distributions. We validate this algorithm with three sets of experiments whose analysis would probably prove challenging even to highly trained chemists: (1) predicting the outcomes of tail-to-head terpene (THT) cyclizations in which substantially different outcomes are encoded in modular precursors differing in minute structural details; (2) comparing the outcomes of THT cyclizations in solution and in a supramolecular capsule; and (3) analysing complex reaction mixtures. Our results support a vision in which computers no longer just manipulate known reaction types1-7 but help rationalize and discover new, mechanistically complex transformations.
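The HopCat code itself is not described in the abstract, but the underlying idea of tracing low-barrier sequences through a network of mechanistic steps can be illustrated with a minimal sketch. Everything below is hypothetical: the intermediate names, the barrier values, and the Boltzmann weighting used in place of the paper's quantum-mechanical and kinetic treatment.

```python
import heapq
import math

# Hypothetical network of cationic intermediates. Each edge is one mechanistic
# step (hydride shift, ring closure, elimination, ...) with an assumed
# activation barrier in kcal/mol. Names and numbers are illustrative only.
STEPS = {
    "geranyl_cation":      [("cyclized_cation_A", 8.0), ("cyclized_cation_B", 11.0)],
    "cyclized_cation_A":   [("rearranged_cation_C", 6.5), ("product_1", 4.0)],
    "cyclized_cation_B":   [("product_2", 5.0)],
    "rearranged_cation_C": [("product_3", 3.5)],
}
TERMINI = {"product_1", "product_2", "product_3"}

def lowest_barrier_paths(start):
    """Dijkstra-style search: for every reachable species, find the sequence of
    mechanistic steps with the smallest summed barrier (a crude kinetic proxy)."""
    best = {start: (0.0, [start])}
    queue = [(0.0, start, [start])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if cost > best.get(node, (math.inf,))[0]:
            continue
        for nxt, barrier in STEPS.get(node, []):
            new_cost = cost + barrier
            if new_cost < best.get(nxt, (math.inf,))[0]:
                best[nxt] = (new_cost, path + [nxt])
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return best

def product_distribution(best, temperature=298.15):
    """Boltzmann-weight the terminal products by their cumulative barriers --
    a gross simplification of a proper kinetic model, used only for illustration."""
    RT = 0.001987 * temperature  # kcal/mol
    weights = {p: math.exp(-best[p][0] / RT) for p in TERMINI if p in best}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

best = lowest_barrier_paths("geranyl_cation")
for product, fraction in product_distribution(best).items():
    print(" -> ".join(best[product][1]), f"{fraction:.2%}")
```

In the real system the step energetics come from quantum-mechanical and kinetic calculations rather than a fixed table, but the network-tracing pattern is the same.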
Subject(s)
Algorithms, Synthetic Chemistry Techniques, Cyclization, Neural Networks (Computer), Terpenes, Cations/chemistry, Knowledge Bases, Terpenes/chemistry, Synthetic Chemistry Techniques/methods, Biological Products/chemical synthesis, Biological Products/chemistry, Reproducibility of Results, Solutions
ABSTRACT
As the chemical industry continues to produce considerable quantities of waste chemicals1,2, it is essential to devise 'circular chemistry'3-8 schemes to productively back-convert at least a portion of these unwanted materials into useful products. Despite substantial progress in the degradation of some classes of harmful chemicals9, work on 'closing the circle' (transforming waste substrates into valuable products) remains fragmented and focused on well-known areas10-15. Comprehensive analyses of which valuable products are synthesizable from diverse chemical wastes are difficult because even small sets of waste substrates can, within a few steps, generate millions of putative products, each synthesizable by multiple routes forming densely connected networks. Tracing all such syntheses and selecting those that also meet the criteria of process and 'green' chemistries is, arguably, beyond the cognition of human chemists. Here we show how computers equipped with broad synthetic knowledge can help address this challenge. Using the forward-synthesis Allchemy platform16, we generate giant synthetic networks emanating from approximately 200 waste chemicals recycled on commercial scales, retrieve from these networks tens of thousands of routes leading to approximately 300 important drugs and agrochemicals, and algorithmically rank these syntheses according to accepted metrics of sustainable chemistry17-19. Several of these routes we validate by experiment, including an industrially realistic demonstration on a 'pharmacy on demand' flow-chemistry platform20. Wide adoption of computerized waste-to-valuable algorithms can accelerate the productive reuse of chemicals that would otherwise incur storage or disposal costs, or even pose environmental hazards.
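As a rough illustration of the kind of analysis described here, the sketch below expands a toy forward-synthesis network from a 'waste' substrate and ranks the resulting routes with a crude sustainability proxy. The reaction table, the molecule names, and the PMI-style score are hypothetical placeholders for Allchemy's rule base and the green-chemistry metrics cited in the paper.

```python
from collections import deque

# Toy forward-synthesis table. Each entry maps a substrate to a product together
# with an assumed process mass intensity (PMI, kg input per kg product).
# All names and numbers are invented for illustration.
REACTIONS = [
    ("glycerol", "glycidol", 3.2),
    ("glycidol", "drug_intermediate_X", 5.0),
    ("glycerol", "acrolein", 2.4),
    ("acrolein", "drug_intermediate_X", 7.5),
    ("drug_intermediate_X", "target_drug", 4.1),
]

def enumerate_routes(waste_substrates, target, max_steps=4):
    """Breadth-first enumeration of all routes (up to max_steps) from a set of
    waste substrates to a target, recording the reactions used along the way."""
    routes = []
    queue = deque([(s, []) for s in waste_substrates])
    while queue:
        molecule, path = queue.popleft()
        if molecule == target:
            routes.append(path)
            continue
        if len(path) >= max_steps:
            continue
        for substrate, product, pmi in REACTIONS:
            if substrate == molecule:
                queue.append((product, path + [(substrate, product, pmi)]))
    return routes

def green_score(route):
    """Rank routes by a crude sustainability proxy: cumulative PMI plus a small
    penalty per step. Real rankings would use full green-chemistry metrics."""
    return sum(pmi for _, _, pmi in route) + 0.5 * len(route)

routes = enumerate_routes({"glycerol"}, "target_drug")
for route in sorted(routes, key=green_score):
    print(f"score={green_score(route):5.1f}  " +
          " -> ".join([route[0][0]] + [step[1] for step in route]))
```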
Subject(s)
Chemical Industry, Drug Design, Drug Repositioning, Recycling
ABSTRACT
In terms of molecules and specific reaction examples, organic chemistry shows impressive, exponential growth. However, the new reaction classes/types that fuel this growth are being discovered at a much slower, only linear (or even sublinear) rate. The proportion of newly discovered reaction types among all reactions being performed keeps decreasing, suggesting that synthetic chemistry is becoming more reliant on reusing well-known methods. At the same time, the newly discovered chemistries are more complex than decades ago and allow for the rapid construction of complex scaffolds in fewer steps. We study these and other trends as a function of time, reaction-type popularity and complexity, based on an algorithm that extracts generalized reaction-class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models with sufficient reaction statistics), and the identification of erroneous entries in reaction databases.
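The trend analysis rests on counting how many reaction classes appear for the first time in a given period versus how many reactions are performed overall. A minimal sketch of that bookkeeping, over hypothetical (year, template) records rather than a real reaction database, might look as follows.

```python
from collections import defaultdict

# Hypothetical (year, reaction-class template) records standing in for the
# output of a template-extraction algorithm run over a reaction database.
RECORDS = [
    (1990, "ester_hydrolysis"), (1990, "amide_coupling"),
    (1995, "amide_coupling"),   (1995, "suzuki_coupling"),
    (2000, "amide_coupling"),   (2000, "suzuki_coupling"), (2000, "buchwald_amination"),
    (2005, "amide_coupling"),   (2005, "ch_activation"),
]

def growth_statistics(records):
    """For each year, count how many reactions were reported and how many
    reaction classes appeared for the first time (a proxy for 'new chemistry')."""
    first_seen = {}
    reactions_per_year = defaultdict(int)
    new_classes_per_year = defaultdict(int)
    for year, template in sorted(records):
        reactions_per_year[year] += 1
        if template not in first_seen:
            first_seen[template] = year
            new_classes_per_year[year] += 1
    return reactions_per_year, new_classes_per_year

reactions, new_classes = growth_statistics(RECORDS)
for year in sorted(reactions):
    share = new_classes[year] / reactions[year]
    print(f"{year}: {reactions[year]} reactions, "
          f"{new_classes[year]} new classes ({share:.0%} of entries)")
```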
ABSTRACT
This work describes a method to vectorize and machine-learn (ML) the non-covalent interactions responsible for scaffold-directed reactions important in synthetic chemistry. Models trained on this representation predict the correct face of approach in ca. 90% of Michael additions or Diels-Alder cycloadditions. These accuracies are significantly higher than those based on traditional ML descriptors, energetic calculations, or the intuition of experienced synthetic chemists. Our results also emphasize the importance of providing ML models with relevant mechanistic knowledge; without such knowledge, these models cannot easily "transfer-learn" and extrapolate to previously unseen reaction mechanisms.
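The abstract does not specify the model architecture, but the workflow it implies (featurize each reaction, train a classifier, report cross-validated accuracy on the face of approach) can be sketched generically. The random feature matrix and labels below merely stand in for the paper's interaction-based representation; a random forest is used only as a placeholder classifier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: each row is a vectorized description of the two faces of a
# scaffold (in the paper, an encoding of non-covalent contacts); the label says
# which face the reagent approaches. Random numbers here -- illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))                 # 200 hypothetical reactions, 32 features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # synthetic 'preferred face' label

model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5)    # 5-fold cross-validated accuracy
print(f"mean CV accuracy: {scores.mean():.2f}")
```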
ABSTRACT
Teaching computers to plan multistep syntheses of arbitrary target molecules, including natural products, has been one of the oldest challenges in chemistry, dating back to the 1960s. This Account recapitulates two decades of our group's work on the software platform called Chematica, which very recently achieved this long-sought objective and has been shown capable of planning synthetic routes to complex natural products, several of which were validated in the laboratory. For the machine to plan syntheses at an expert level, it must know the rules describing chemical reactions and use these rules to expand and search the networks of synthetic options. The rules must be of high quality: they must delineate accurately the scope of admissible substituents, capture all relevant stereochemical information, and detect potential reactivity conflicts and protection requirements. They should yield only those synthons that are chemically stable and energetically allowed (e.g., not too strained) and should be able to extrapolate beyond examples already published in the literature. In parallel, the network-search algorithms must be able to assign meaningful scores to the sets of synthons they encounter, and make judicious choices about which of the network's branches to expand and when to withdraw from unpromising ones. They must be able to strategize over multiple steps to resolve intermittent reactivity conflicts, exchange functional groups, or overcome local maxima of molecular complexity. Meeting all these requirements makes the problem of computer-driven retrosynthesis very multifaceted, combining expert and AI approaches further supplemented by quantum-mechanical and molecular-mechanics calculations. Development of Chematica has been a very long and gradual process because all of these components are needed. Any shortcuts, for example reliance on only expert or only data-based approaches, yield chemically naïve and often erroneous syntheses, especially for complex targets. On the bright side, once all the requisite algorithms are implemented, as they now are, they not only streamline conventional synthetic planning but also enable completely new modalities that would challenge any human chemist, for example synthesis with multiple constraints imposed simultaneously, or library-wide syntheses in which the machine constructs "global plans" leading to multiple targets and benefiting from the use of common intermediates. These types of analyses will have a profound impact on the practice of the chemical industry, enabling the design of more economical, greener, and less hazardous pathways.
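A core component described here is a network search that scores sets of synthons and decides which branches to expand. The sketch below shows a toy best-first retrosynthetic search over a handful of hypothetical retro-rules, with name length standing in (very crudely) for a molecular-complexity score; it is not Chematica's algorithm, only an illustration of the search pattern.

```python
import heapq

# Hypothetical retrosynthetic rules: target fragment -> list of precursor sets.
# Real rule bases encode scope, stereochemistry and conflicts; this is a sketch.
RETRO_RULES = {
    "biaryl_amide": [("biaryl_acid", "aniline_A"), ("aryl_bromide", "aryl_amide")],
    "biaryl_acid":  [("aryl_bromide", "boronic_acid")],
    "aryl_amide":   [("aryl_acid", "aniline_A")],
}
PURCHASABLE = {"aniline_A", "aryl_bromide", "boronic_acid", "aryl_acid"}

def complexity(molecule):
    """Crude complexity proxy (here just name length); Chematica-style scoring
    functions are far richer, combining structural and strategic terms."""
    return 0 if molecule in PURCHASABLE else len(molecule)

def best_first_retrosynthesis(target, max_expansions=50):
    """Expand the least 'complex' open set of intermediates first, stopping when
    every molecule on the frontier is purchasable."""
    counter = 0
    queue = [(complexity(target), counter, frozenset([target]), [])]
    while queue and max_expansions > 0:
        _, _, frontier, route = heapq.heappop(queue)
        open_mols = [m for m in frontier if m not in PURCHASABLE]
        if not open_mols:
            return route
        max_expansions -= 1
        mol = open_mols[0]
        for precursors in RETRO_RULES.get(mol, []):
            new_frontier = (frontier - {mol}) | set(precursors)
            counter += 1
            score = sum(complexity(m) for m in new_frontier)
            heapq.heappush(queue, (score, counter,
                                   frozenset(new_frontier),
                                   route + [(mol, precursors)]))
    return None

for step in best_first_retrosynthesis("biaryl_amide"):
    print(step[0], "<=", " + ".join(step[1]))
```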
ABSTRACT
Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years1-7. However, the field has progressed greatly since the development of early programs such as LHASA1,7, for which reaction choices at each step were made by human operators. Multiple software platforms6,8-14 are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary15,16 and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships17,18, allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.
Subject(s)
Artificial Intelligence, Biological Products/chemical synthesis, Synthetic Chemistry Techniques/methods, Organic Chemistry/methods, Software, Artificial Intelligence/standards, Automation/methods, Automation/standards, Benzylisoquinolines/chemical synthesis, Benzylisoquinolines/chemistry, Synthetic Chemistry Techniques/standards, Organic Chemistry/standards, Indans/chemical synthesis, Indans/chemistry, Indole Alkaloids/chemical synthesis, Indole Alkaloids/chemistry, Knowledge Bases, Lactones/chemical synthesis, Lactones/chemistry, Macrolides/chemical synthesis, Macrolides/chemistry, Reproducibility of Results, Sesquiterpenes/chemical synthesis, Sesquiterpenes/chemistry, Software/standards, Tetrahydroisoquinolines/chemical synthesis, Tetrahydroisoquinolines/chemistry
ABSTRACT
A computer program for retrosynthetic planning helps develop multiple "synthetic contingency" plans for hydroxychloroquine, as well as routes leading to remdesivir, both promising but as yet unproven medications against COVID-19. These plans are designed to navigate, as much as possible, around known and patented routes and to commence from inexpensive and diverse starting materials, so as to ensure supply in case of anticipated market shortages of commonly used substrates. Looking beyond the current COVID-19 pandemic, the development of similar contingency syntheses is advocated for other already-approved medications, in case such medications become urgently needed in mass quantities to face other public-health emergencies.
ABSTRACT
The challenge of prebiotic chemistry is to trace the syntheses of life's key building blocks from a handful of primordial substrates. Here we report a forward-synthesis algorithm that generates a full network of prebiotic chemical reactions accessible from these substrates under generally accepted conditions. This network contains both reported and previously unidentified routes to biotic targets, as well as plausible syntheses of abiotic molecules. It also exhibits three forms of nontrivial chemical emergence, as the molecules within the network can act as catalysts of downstream reaction types; form functional chemical systems, including self-regenerating cycles; and produce surfactants relevant to primitive forms of biological compartmentalization. To support these claims, computer-predicted, prebiotic syntheses of several biotic molecules as well as a multistep, self-regenerative cycle of iminodiacetic acid were validated by experiment.
Subject(s)
Organic Compounds/chemical synthesis, Origin of Life, Computer Simulation
ABSTRACT
Mapping atoms across chemical reactions is important for substructure searches, automatic extraction of reaction rules, identification of metabolic pathways, and more. Unfortunately, the existing mapping algorithms can deal adequately only with relatively simple reactions, not those for which expert chemists would benefit from a computer's help. Here we report how a combination of algorithmics and expert chemical knowledge significantly improves the performance of atom mapping, allowing the machine to deal with even the most mechanistically complex chemical and biochemical transformations. The key feature of our approach is the use of a few judiciously chosen reaction templates to generate plausible "intermediate" atom assignments, which then guide a graph-theoretical algorithm towards the chemically correct isomorphic mappings. The algorithm performs significantly better than the available state-of-the-art reaction mappers, suggesting its use in database curation, mechanism assignment, and, above all, machine extraction of the reaction rules underlying modern synthesis-planning programs.
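The key idea, template-assigned 'seed' atoms guiding a graph-matching step, can be illustrated with networkx's GraphMatcher on a tiny esterification example. The graphs, the seed map number and the template are hypothetical; the point is only that the seed on the carbonyl oxygen forces the chemically correct embedding rather than the spurious one onto the ester oxygen.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Sketch of template-seeded atom mapping (not the paper's actual code).
# The conserved fragment of acetic acid (methyl C - carbonyl C - carbonyl O) is
# to be located inside methyl acetate. A hypothetical esterification template
# has already assigned map number 3 to the carbonyl oxygen on both sides.
fragment = nx.Graph()
fragment.add_node("C_methyl", element="C")
fragment.add_node("C_carbonyl", element="C")
fragment.add_node("O_carbonyl", element="O", amap=3)
fragment.add_edges_from([("C_methyl", "C_carbonyl"), ("C_carbonyl", "O_carbonyl")])

product = nx.Graph()   # methyl acetate, heavy atoms only
product.add_node("C1", element="C")            # acetyl methyl carbon
product.add_node("C2", element="C")            # carbonyl carbon
product.add_node("O1", element="O", amap=3)    # carbonyl oxygen (template seed)
product.add_node("O2", element="O")            # ester oxygen (from methanol)
product.add_node("C3", element="C")            # methoxy carbon
product.add_edges_from([("C1", "C2"), ("C2", "O1"), ("C2", "O2"), ("O2", "C3")])

def node_match(a, b):
    """Atoms are compatible when elements agree and any template-assigned
    map numbers agree (unseeded atoms carry no 'amap' attribute)."""
    return a["element"] == b["element"] and a.get("amap") == b.get("amap")

matcher = isomorphism.GraphMatcher(product, fragment, node_match=node_match)
if matcher.subgraph_is_isomorphic():
    for product_atom, fragment_atom in matcher.mapping.items():
        print(f"{fragment_atom} -> {product_atom}")
```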
ABSTRACT
Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections ("corpora"), including those deposited on the Internet; indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting the most appropriate keywords from documents. This paper describes how such corpus-linguistic concepts can be extended to chemistry, based on characteristic "chemical words" that extend beyond traditional functional groups and instead capture the common structural fragments molecules share. Using these words, it is possible to quantify the diversity of chemical collections/databases in new ways and to define molecular "keywords" by which such collections are best characterized and annotated.
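The "chemical keywords" idea maps directly onto the TF-IDF weighting used in corpus linguistics. The sketch below applies plain TF-IDF to a few hypothetical collections described by made-up fragment "words"; the real analysis derives the fragments from the molecules themselves.

```python
import math
from collections import Counter

# Each 'document' is a chemical collection described by the structural-fragment
# 'words' its molecules contain. Fragment labels below are hypothetical stand-ins
# for the machine-derived fragments used in the paper.
COLLECTIONS = {
    "natural_products": ["pyranose", "lactone", "prenyl", "lactone", "pyranose"],
    "drug_like":        ["benzamide", "piperazine", "fluorophenyl", "benzamide"],
    "agrochemicals":    ["fluorophenyl", "triazole", "prenyl", "triazole"],
}

def tfidf_keywords(collections, top_n=2):
    """Score each fragment 'word' by term frequency x inverse document frequency,
    then report the top-scoring 'keywords' of every collection."""
    n_docs = len(collections)
    doc_freq = Counter()
    for words in collections.values():
        doc_freq.update(set(words))
    keywords = {}
    for name, words in collections.items():
        counts = Counter(words)
        scores = {w: (c / len(words)) * math.log(n_docs / doc_freq[w])
                  for w, c in counts.items()}
        keywords[name] = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return keywords

for collection, words in tfidf_keywords(COLLECTIONS).items():
    print(f"{collection}: {', '.join(words)}")
```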
ABSTRACT
Analysis of the chemical-organic knowledge represented as a giant network reveals that it contains millions of reaction sequences closing into cycles. Without realizing it, independent chemists working at different times have jointly created examples of cyclic sequences that allow for the recovery of useful reagents and the autoamplification of synthetically important molecules, sequences that mimic biological cycles, and sequences that can be operated one-pot.
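Finding such cyclic sequences amounts to enumerating directed cycles in the reaction network. A minimal sketch with networkx, run on a made-up five-molecule network rather than the real network of organic chemistry, is shown below.

```python
import networkx as nx

# Toy directed network in which an edge A -> B means 'a published reaction turns
# A into B'. Molecule names and connections are hypothetical; the real analysis
# runs on the full network of known organic chemistry.
network = nx.DiGraph([
    ("alcohol_A", "ketone_B"),
    ("ketone_B", "imine_C"),
    ("imine_C", "alcohol_A"),      # closes a three-step cycle regenerating A
    ("ketone_B", "ester_D"),
    ("ester_D", "acid_E"),
])

# Every simple cycle is a candidate 'self-regenerating' sequence: running the
# steps in order recovers the starting molecule.
for cycle in nx.simple_cycles(network):
    print(" -> ".join(cycle + [cycle[0]]))
```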
ABSTRACT
Exactly half a century has passed since the launch of the first documented research project on computer-assisted organic synthesis (Dendral, 1965). Many more programs were created in the 1970s and 1980s, but the enthusiasm of those pioneering days had largely dissipated by the 2000s, and the challenge of teaching the computer how to plan organic syntheses earned itself the reputation of a "mission impossible". This is quite curious given that, in the meantime, computers have "learned" many other skills once considered exclusive domains of human intellect and creativity: for example, machines can nowadays play chess better than human world champions and can compose classical music pleasant to the human ear. Although there have been no similar feats in organic synthesis, this Review argues that conceding defeat would be premature. Indeed, by bringing together modern computational power, algorithms from graph/network theory, chemical rules (with full stereo- and regiochemistry) coded in appropriate formats, and elements of quantum mechanics, the machine can finally be "taught" how to plan syntheses of non-trivial organic molecules in a matter of seconds to minutes. The Review begins with an overview of some basic theoretical concepts essential for the big-data analysis of chemical syntheses. It progresses to the problem of optimizing pathways involving known reactions. It culminates with a discussion of algorithms that allow for a completely de novo and fully automated design of syntheses leading to relatively complex targets, including those that have not been made before. Of course, there are still things to be improved, but computers are finally becoming relevant and helpful to the practice of organic-synthetic planning. Paraphrasing Churchill's famous words after the Allies' first major victory over the Axis forces in Africa, it is not the end, it is not even the beginning of the end, but it is the end of the beginning for computer-assisted synthesis planning. The machine is here to stay.
ABSTRACT
A thermodynamically guided calculation of the free energies of substrate and product molecules allows for the estimation of the yields of organic reactions. The non-ideality of the system and the solvent effects are taken into account through activity coefficients calculated at the molecular level by perturbed-chain statistical associating fluid theory (PC-SAFT). The model is iteratively trained on a diverse set of reactions with previously reported yields. The trained model can then estimate a priori the yields of reactions not included in the training set with an accuracy of ca. ±15%. This ability has the potential to translate into significant economic savings through the selection, and then execution, of only those reactions that can proceed in good yields.
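For the simplest case of an isomerization A <-> B, the yield estimate reduces to an equilibrium constant computed from the free-energy difference and corrected by activity coefficients. The sketch below shows that reduced form with invented numbers; the paper's PC-SAFT treatment covers general stoichiometries and computes the activity coefficients rather than assuming them.

```python
import math

R = 8.314  # gas constant, J / (mol K)

def equilibrium_yield(delta_g, gamma_reactant=1.0, gamma_product=1.0, T=298.15):
    """Estimate the equilibrium fraction of product for an isomerization-type
    reaction A <-> B from the standard free-energy difference (J/mol), corrected
    by activity coefficients. A deliberately simplified stand-in for the
    PC-SAFT-based treatment in the paper, which handles general stoichiometries."""
    K = math.exp(-delta_g / (R * T))           # thermodynamic equilibrium constant
    K_x = K * gamma_reactant / gamma_product   # mole-fraction-based constant
    return K_x / (1.0 + K_x)                   # equilibrium mole fraction of B

# Hypothetical numbers: product favoured by 5 kJ/mol, with the solvent
# stabilizing the reactant slightly (gamma_reactant < gamma_product).
print(f"estimated yield: {equilibrium_yield(-5000, 0.8, 1.1):.0%}")
```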