Search | BVS Regional Portal

1.

Fuelling the Digital Chemistry Revolution with Language Models.

Cardinale, Antonio; Castrogiovanni, Alessandro; Gaudin, Theophile; Geluykens, Joppe; Laino, Teodoro; Manica, Matteo; Probst, Daniel; Schwaller, Philippe; Sobczyk, Aleksandros; Toniato, Alessandra; Vaucher, Alain C; Wolf, Heiko; Zipoli, Federico.

Chimia (Aarau) ; 77(7-8): 484-488, 2023 Aug 09.

Article in English | MEDLINE | ID: mdl-38047789

ABSTRACT

The RXN for Chemistry project, initiated by IBM Research Europe - Zurich in 2017, aimed to develop a series of digital assets using machine learning techniques to promote the use of data-driven methodologies in synthetic organic chemistry. This research treats chemical reaction data as language records and casts the prediction of a synthetic organic chemistry reaction as a translation task between precursor and product languages. Over the years, the IBM Research team has successfully developed language models for various applications, including forward reaction prediction, retrosynthesis, reaction classification, atom-mapping, procedure extraction from text, and inference of experimental protocols, as well as the use of those protocols to program commercial automation hardware and implement an autonomous chemical laboratory. Furthermore, the project has recently incorporated biochemical data in training models for greener and more sustainable chemical reactions. The remarkable ease of constructing prediction models and continually enhancing them through data augmentation with minimal human intervention has led to the widespread adoption of language model technologies, facilitating the digitalization of chemistry in diverse industrial sectors such as pharmaceuticals and chemical manufacturing. This manuscript provides a concise overview of the scientific components that contributed to the prestigious Sandmeyer Award in 2022.

2.

Fast Customization of Chemical Language Models to Out-of-Distribution Data Sets.

Toniato, Alessandra; Vaucher, Alain C; Lehmann, Marzena Maria; Luksch, Torsten; Schwaller, Philippe; Stenta, Marco; Laino, Teodoro.

Chem Mater ; 35(21): 8806-8815, 2023 Nov 14.

Article in English | MEDLINE | ID: mdl-38027545

ABSTRACT

The world is on the verge of a new industrial revolution, and language models are poised to play a pivotal role in this transformative era. Their ability to offer intelligent insights and forecasts has made them a valuable asset for businesses seeking a competitive advantage. The chemical industry, in particular, can benefit significantly from harnessing their power. Language models have been applied since 2016 to tasks such as predicting reaction outcomes or retrosynthetic routes. While such models have demonstrated impressive abilities, the lack of publicly available data sets with universal coverage is often the limiting factor for achieving even higher accuracies. This makes it imperative for organizations to incorporate proprietary data sets into their model training processes to improve their performance. So far, however, these data sets frequently remain untapped, as there are no established criteria for model customization. In this work, we report a successful methodology for retraining language models on reaction outcome prediction and single-step retrosynthesis tasks, using proprietary, nonpublic data sets. We report a considerable boost in accuracy by combining patent and proprietary data in a multidomain learning formulation. This exercise, inspired by a real-world use case, enables us to formulate guidelines that can be adopted in different corporate settings to customize chemical language models easily.

3.

Unbiasing Retrosynthesis Language Models with Disconnection Prompts.

Thakkar, Amol; Vaucher, Alain C; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro.

ACS Cent Sci ; 9(7): 1488-1498, 2023 Jul 26.

Article in English | MEDLINE | ID: mdl-37529205

ABSTRACT

Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and achieving a 39% performance improvement over the baseline. For the first time, the use of a disconnection prompt empowers chemists by giving them greater control over the disconnection predictions, which results in more diverse and creative recommendations. In addition, in place of a human-in-the-loop strategy, we propose a two-stage schema consisting of automatic identification of disconnection sites, followed by prediction of reactant sets, thereby achieving a considerable improvement in class diversity compared with the baseline. The approach is effective in mitigating prediction biases derived from training data. This provides a wider variety of usable building blocks and improves the end user's digital experience. We demonstrate its application to different chemistry domains, from traditional to enzymatic reactions, in which substrate specificity is critical.

4.

Quantum chemical data generation as fill-in for reliability enhancement of machine-learning reaction and retrosynthesis planning.

Toniato, Alessandra; Unsleber, Jan P; Vaucher, Alain C; Weymuth, Thomas; Probst, Daniel; Laino, Teodoro; Reiher, Markus.

Digit Discov ; 2(3): 663-673, 2023 Jun 12.

Article in English | MEDLINE | ID: mdl-37312681

ABSTRACT

Data-driven synthesis planning has seen remarkable successes in recent years by virtue of modern approaches of artificial intelligence that efficiently exploit vast databases with experimental data on chemical reactions. However, this success story is intimately connected to the availability of existing experimental data. It may well occur in retrosynthetic and synthesis design tasks that predictions in individual steps of a reaction cascade are affected by large uncertainties. In such cases, it will, in general, not be easily possible to provide missing data from autonomously conducted experiments on demand. However, first-principles calculations can, in principle, provide missing data to enhance the confidence of an individual prediction or for model retraining. Here, we demonstrate the feasibility of such an ansatz and examine resource requirements for conducting autonomous first-principles calculations on demand.

5.

Enhancing diversity in language based models for single-step retrosynthesis.

Toniato, Alessandra; Vaucher, Alain C; Schwaller, Philippe; Laino, Teodoro.

Digit Discov ; 2(2): 489-501, 2023 Apr 11.

Article in English | MEDLINE | ID: mdl-37065677

ABSTRACT

Over the past four years, several research groups demonstrated the combination of domain-specific language representation with recent NLP architectures to accelerate innovation in a wide range of scientific fields. Chemistry is a great example. Among the various chemical challenges addressed with language models, retrosynthesis demonstrates some of the most distinctive successes and limitations. Single-step retrosynthesis, the task of identifying reactions able to decompose a complex molecule into simpler structures, can be cast as a translation problem, in which a text-based representation of the target molecule is converted into a sequence of possible precursors. A common issue is a lack of diversity in the proposed disconnection strategies. The suggested precursors typically fall in the same reaction family, which limits the exploration of the chemical space. We present a retrosynthesis Transformer model that increases the diversity of the predictions by prepending a classification token to the language representation of the target molecule. At inference, the use of these prompt tokens allows us to steer the model towards different kinds of disconnection strategies. We show that the diversity of the predictions improves consistently, which enables recursive synthesis tools to circumvent dead ends and, consequently, to suggest synthesis pathways for more complex molecules.
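
To illustrate the idea of prepending a classification token to the text representation of a target molecule, here is a minimal sketch in Python. It is not the authors' implementation: the token format `[3]`, the function names, and the simplified SMILES tokenization pattern are all assumptions made for this example.

```python
import re

# Simplified SMILES tokenizer pattern (an assumption for this sketch):
# bracket atoms, two-letter halogens, common organic-subset atoms,
# ring-closure labels, bonds, and branch symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFIbcnosp]|%\d{2}|\d|[=#\-\+\(\)\\/\.:~@\?>\*\$])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


def with_class_prompt(smiles, class_id):
    """Build a model input: hypothetical class token followed by SMILES tokens."""
    return " ".join([f"[{class_id}]"] + tokenize(smiles))


# Requesting several class tokens for the same target yields several distinct
# model inputs, one per (hypothetical) disconnection class.
prompts = [with_class_prompt("CC(=O)O", c) for c in range(3)]
```

Each prompt would then be passed to the retrosynthesis model separately, so that the predictions are conditioned on different class tokens rather than all drawn from the most likely reaction family.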

6.

Inferring experimental procedures from text-based representations of chemical reactions.

Vaucher, Alain C; Schwaller, Philippe; Geluykens, Joppe; Nair, Vishnu H; Iuliano, Anna; Laino, Teodoro.

Nat Commun ; 12(1): 2573, 2021 05 06.

Article in English | MEDLINE | ID: mdl-33958589

ABSTRACT

The experimental execution of chemical reactions is a context-dependent and time-consuming process, often solved using the experience collected over multiple decades of laboratory work or by searching for similar, already executed, experimental protocols. Although data-driven schemes, such as retrosynthetic models, are becoming established technologies in synthetic organic chemistry, the conversion of proposed synthetic routes to experimental procedures remains a burden on the shoulders of domain experts. In this work, we present data-driven models for predicting the entire sequence of synthesis steps starting from a textual representation of a chemical equation, for application in batch organic chemistry. We generated a data set of 693,517 chemical equations and associated action sequences by extracting and processing experimental procedure text from patents, using state-of-the-art natural language models. We used the attained data set to train three different models: a nearest-neighbor model based on recently introduced reaction fingerprints, and two deep-learning sequence-to-sequence models based on the Transformer and BART architectures. An analysis by a trained chemist revealed that the predicted action sequences are adequate for execution without human intervention in more than 50% of the cases.

7.

Automated extraction of chemical synthesis actions from experimental procedures.

Vaucher, Alain C; Zipoli, Federico; Geluykens, Joppe; Nair, Vishnu H; Schwaller, Philippe; Laino, Teodoro.

Nat Commun ; 11(1): 3601, 2020 07 17.

Article in English | MEDLINE | ID: mdl-32681088

ABSTRACT

Experimental procedures for chemical synthesis are commonly reported in prose in patents or in the scientific literature. The extraction of the details necessary to reproduce and validate a synthesis in a chemical laboratory is often a tedious task requiring extensive human intervention. We present a method to convert unstructured experimental procedures written in English to structured synthetic steps (action sequences) reflecting all the operations needed to successfully conduct the corresponding chemical reactions. To achieve this, we design a set of synthesis actions with predefined properties and a deep-learning sequence-to-sequence model based on the transformer architecture to convert experimental procedures to action sequences. The model is pretrained on vast amounts of data generated automatically with a custom rule-based natural language processing approach and refined on manually annotated samples. Predictions on our test set result in a perfect (100%) match of the action sequence for 60.8% of sentences, a 90% match for 71.3% of sentences, and a 75% match for 82.4% of sentences.
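
The abstract does not say how a partial match between a predicted and a reference action sequence is scored. As one plausible sketch, the Python snippet below uses the similarity ratio of the standard-library `difflib.SequenceMatcher` and reports the fraction of sentences that reach a given threshold. The scoring choice, the function names, and the toy action names are assumptions for this example, not the metric used in the paper.

```python
from difflib import SequenceMatcher


def action_similarity(predicted, reference):
    """Similarity in [0, 1] between two action sequences (assumed metric)."""
    return SequenceMatcher(None, predicted, reference).ratio()


def fraction_at_least(pairs, threshold):
    """Fraction of (predicted, reference) pairs with similarity >= threshold."""
    scores = [action_similarity(p, r) for p, r in pairs]
    return sum(score >= threshold for score in scores) / len(scores)


# Toy data: action names are illustrative only.
pairs = [
    (["ADD", "STIR", "FILTER", "DRY"], ["ADD", "STIR", "FILTER", "DRY"]),
    (["ADD", "STIR", "WASH", "DRY"], ["ADD", "STIR", "FILTER", "DRY"]),
    (["ADD"], ["STIR", "FILTER"]),
]

perfect = fraction_at_least(pairs, 1.0)    # share of exact matches
partial = fraction_at_least(pairs, 0.75)   # share with at least a 75% match
```

Reporting the same predictions at several thresholds, as the abstract does, shows how much of the remaining error comes from near misses rather than from entirely wrong sequences.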

8.

GuacaMol: Benchmarking Models for de Novo Molecular Design.

Brown, Nathan; Fiscato, Marco; Segler, Marwin H S; Vaucher, Alain C.

J Chem Inf Model ; 59(3): 1096-1108, 2019 03 25.

Article in English | MEDLINE | ID: mdl-30887799

ABSTRACT

De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multiobjective optimization tasks. The benchmarking open-source Python code and a leaderboard can be found on https://benevolent.ai/guacamol .

Subjects

Benchmarking/methods; Deep Learning; Pharmaceutical Preparations/chemistry; Drug Design; Isomerism; Models, Molecular; Molecular Structure; Monte Carlo Method; Quantitative Structure-Activity Relationship

9.

Training Neural Nets To Learn Reactive Potential Energy Surfaces Using Interactive Quantum Chemistry in Virtual Reality.

Amabilino, Silvia; Bratholm, Lars A; Bennie, Simon J; Vaucher, Alain C; Reiher, Markus; Glowacki, David R.

J Phys Chem A ; 123(20): 4486-4499, 2019 May 23.

Article in English | MEDLINE | ID: mdl-30892040

ABSTRACT

While the primary bottleneck of a number of computational workflows was, not so long ago, processing power, the rise of machine learning technologies has resulted in an interesting paradigm shift, which places increasing value on issues related to data curation, that is, data size, quality, bias, format, and coverage. Increasingly, data-related issues are as important as the algorithmic methods used to process and learn from the data. Here we introduce an open-source graphics processing unit-accelerated neural network (NN) framework for learning reactive potential energy surfaces (PESs). To obtain training data for this NN framework, we investigate the use of real-time interactive ab initio molecular dynamics in virtual reality (iMD-VR) as a new data curation strategy that enables human users to rapidly sample geometries along reaction pathways. Focusing on hydrogen abstraction reactions of CN radical with isopentane, we compare the performance of NNs trained using iMD-VR data versus NNs trained using a more traditional method, namely, molecular dynamics (MD) constrained to sample a predefined grid of points along the hydrogen abstraction reaction coordinate. Both the NN trained using iMD-VR data and the NN trained using the constrained MD data reproduce important qualitative features of the reactive PESs, such as a low and early barrier to abstraction. Quantitative analysis shows that NN learning is sensitive to the data set used for training. Our results show that user-sampled structures obtained with the quantum chemical iMD-VR machinery enable excellent sampling in the vicinity of the minimum energy path (MEP). As a result, the NN trained on the iMD-VR data does very well predicting energies that are close to the MEP but less well predicting energies for "off-path" structures. The NN trained on the constrained MD data does better predicting high-energy off-path structures, given that it included a number of such structures in its training set.

10.

Exploration of Reaction Pathways and Chemical Transformation Networks.

Simm, Gregor N; Vaucher, Alain C; Reiher, Markus.

J Phys Chem A ; 123(2): 385-399, 2019 Jan 17.

Article in English | MEDLINE | ID: mdl-30421924

ABSTRACT

For the investigation of chemical reaction networks, the identification of all relevant intermediates and elementary reactions is mandatory. Many algorithmic approaches exist that perform explorations efficiently and in an automated fashion. These approaches differ in their application range, the level of completeness of the exploration, and the amount of heuristics and human intervention required. Here, we describe and compare the different approaches based on these criteria. Future directions leveraging the strengths of chemical heuristics, human interaction, and physical rigor are discussed.

11.

Minimum Energy Paths and Transition States by Curve Optimization.

Vaucher, Alain C; Reiher, Markus.

J Chem Theory Comput ; 14(6): 3091-3099, 2018 Jun 12.

Article in English | MEDLINE | ID: mdl-29648812

ABSTRACT

Transition states and minimum energy paths are essential to understand and predict chemical reactivity. Double-ended methods represent a standard approach for their determination. We introduce a new double-ended method that optimizes reaction paths described by curves. Unlike other methods, our approach optimizes the curve parameters rather than distinct structures along the path. With molecular paths represented as continuous curves, the optimization can benefit from the advantages of an integral-based formulation. We call this approach ReaDuct and demonstrate its applicability for molecular paths parametrized by B-spline curves.

12.

Integrated Reaction Path Processing from Sampled Structure Sequences.

Heuer, Michael A; Vaucher, Alain C; Haag, Moritz P; Reiher, Markus.

J Chem Theory Comput ; 14(4): 2052-2062, 2018 Apr 10.

Article in English | MEDLINE | ID: mdl-29518323

ABSTRACT

Sampled structure sequences obtained, for instance, from real-time reactivity explorations or first-principles molecular dynamics simulations contain valuable information about chemical reactivity. Eventually, such sequences allow for the construction of reaction networks that are required for the kinetic analysis of chemical systems. For this purpose, however, the sampled information must be processed to obtain stable chemical structures and associated transition states. The manual extraction of valuable information from such reaction paths is straightforward but unfeasible for large and complex reaction networks. For real-time quantum chemistry, this implies automatization of the extraction and relaxation process while maintaining immersion in the virtual chemical environment. Here, we describe an efficient path processing scheme for the on-the-fly construction of an exploration network by approximating the explored paths as continuous basis-spline curves.

13.

Steering Orbital Optimization out of Local Minima and Saddle Points Toward Lower Energy.

Vaucher, Alain C; Reiher, Markus.

J Chem Theory Comput ; 13(3): 1219-1228, 2017 Mar 14.

Article in English | MEDLINE | ID: mdl-28207264

ABSTRACT

The general procedure underlying Hartree-Fock and Kohn-Sham density functional theory calculations consists in optimizing orbitals for a self-consistent solution of the Roothaan-Hall equations in an iterative process. It is often ignored that multiple self-consistent solutions can exist, several of which may correspond to minima of the energy functional. In addition to the difficulty sometimes encountered to converge the calculation to a self-consistent solution, one must ensure that the correct self-consistent solution was found, typically the one with the lowest electronic energy. Convergence to an unwanted solution is in general not trivial to detect and will deliver incorrect energy and molecular properties and accordingly a misleading description of chemical reactivity. Wrong conclusions based on incorrect self-consistent field convergence are particularly cumbersome in automated calculations met in high-throughput virtual screening, structure optimizations, ab initio molecular dynamics, and in real-time explorations of chemical reactivity, where the vast amount of data can hardly be manually inspected. Here, we introduce a fast and automated approach to detect and cure incorrect orbital convergence, which is especially suited for electronic structure calculations on sequences of molecular structures. Our approach consists of a randomized perturbation of the converged electron density (matrix) intended to push orbital convergence to solutions that correspond to another stationary point (of potentially lower electronic energy) in the variational parameter space of an electronic wave function approximation.

14.

Molecular Propensity as a Driver for Explorative Reactivity Studies.

Vaucher, Alain C; Reiher, Markus.

J Chem Inf Model ; 56(8): 1470-8, 2016 08 22.

Article in English | MEDLINE | ID: mdl-27447367

ABSTRACT

Quantum chemical studies of reactivity involve calculations on a large number of molecular structures and the comparison of their energies. Already the setup of these calculations limits the scope of the results that one will obtain, because several system-specific variables such as the charge and spin need to be set prior to the calculation. For a reliable exploration of reaction mechanisms, a considerable number of calculations with varying global parameters must be taken into account, or important facts about the reactivity of the system under consideration can remain undetected. For example, one could miss crossings of potential energy surfaces for different spin states or might not note that a molecule is prone to oxidation. Here, we introduce the concept of molecular propensity to account for the predisposition of a molecular system to react across different electronic states in certain nuclear configurations or with other reactants present in the reaction liquor. Within our real-time quantum chemistry framework, we developed an algorithm that automatically detects and flags such a propensity of a system under consideration.

Subjects

Models, Molecular; Quantum Theory; Cycloaddition Reaction; Epoxy Compounds/chemistry; Ferric Compounds/chemistry; Hydrogen/chemistry; Molecular Conformation; Oxidation-Reduction; Photochemical Processes; Protons; Thermodynamics

15.

Accelerating Wave Function Convergence in Interactive Quantum Chemical Reactivity Studies.

Mühlbach, Adrian H; Vaucher, Alain C; Reiher, Markus.

J Chem Theory Comput ; 12(3): 1228-35, 2016 Mar 08.

Article in English | MEDLINE | ID: mdl-26788887

ABSTRACT

The inherently high computational cost of iterative self-consistent field (SCF) methods proves to be a critical issue delaying visual and haptic feedback in real-time quantum chemistry. In this work, we introduce two schemes for SCF acceleration. They provide a guess for the initial density matrix of the SCF procedure generated by extrapolation techniques. SCF optimizations then converge in fewer iterations, which decreases the execution time of the SCF optimization procedure. To benchmark the proposed propagation schemes, we developed a test bed for performing quantum chemical calculations on sequences of molecular structures mimicking real-time quantum chemical explorations. Explorations of a set of six model reactions employing the semi-empirical methods PM6 and DFTB3 in this testing environment showed that the proposed propagation schemes achieved speedups of up to 30% as a consequence of a reduced number of SCF iterations.
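
The abstract does not spell out the two extrapolation schemes. As a minimal sketch of the general idea, the Python snippet below builds an initial density-matrix guess for the next structure by linearly extrapolating the converged density matrices of the two preceding structures. The linear formula, the function name, and the toy matrices are assumptions for illustration and are not taken from the paper.

```python
def extrapolate_guess(history):
    """Initial density-matrix guess from previously converged matrices.

    `history` is a list of square matrices (lists of lists), oldest first.
    With one matrix available the guess is a copy of it; with two or more,
    the guess is the linear extrapolation 2 * P[n-1] - P[n-2].
    """
    if not history:
        raise ValueError("at least one converged density matrix is required")
    last = history[-1]
    if len(history) == 1:
        return [row[:] for row in last]
    prev = history[-2]
    return [
        [2.0 * l - p for l, p in zip(last_row, prev_row)]
        for last_row, prev_row in zip(last, prev)
    ]


# Toy 2x2 matrices standing in for converged densities of two
# consecutive structures in an exploration sequence.
p_old = [[1.0, 0.2], [0.2, 1.0]]
p_new = [[1.1, 0.3], [0.3, 0.9]]
guess = extrapolate_guess([p_old, p_new])
```

An extrapolated matrix is in general only a starting point: it need not satisfy the constraints of a valid density matrix, and the SCF iterations that follow are what restore self-consistency. The potential gain is that a guess close to the solution needs fewer iterations than a guess built from scratch.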

16.

One Bronze Medal for Switzerland at the 48th International Chemistry Olympiad in Tbilisi, Georgia.

Vaucher, Alain C.

Chimia (Aarau) ; 70(12): 911-912, 2016 Dec 21.

Article in English | MEDLINE | ID: mdl-28661372

ABSTRACT

Four Swiss high school students participated in the 48th International Chemistry Olympiad (IChO), which took place from July 23 to August 1 in Tbilisi, Georgia. Dominic Egger, Nicolà Gantenbein, Simone Heimgartner and Diego Zenhäusern competed against 260 other students from 71 countries. Dominic Egger brought home a well-deserved bronze medal.

17.

Real-time feedback from iterative electronic structure calculations.

Vaucher, Alain C; Haag, Moritz P; Reiher, Markus.

J Comput Chem ; 37(9): 805-12, 2016 Apr 05.

Article in English | MEDLINE | ID: mdl-26678030

ABSTRACT

Real-time feedback from iterative electronic structure calculations requires mediating between the inherently unpredictable execution times of the iterative algorithm used and the necessity to provide data in fixed and short time intervals for real-time rendering. We introduce the concept of a mediator as a component able to deal with infrequent and unpredictable reference data to generate reliable feedback. In the context of real-time quantum chemistry, the mediator takes the form of a surrogate potential that has the same local shape as the first-principles potential and can be evaluated efficiently to deliver atomic forces as real-time feedback. The surrogate potential is updated continuously by electronic structure calculations and guarantees a reliable response to the operator for any molecular structure. To demonstrate the application of iterative electronic structure methods in real-time reactivity exploration, we implement self-consistent semiempirical methods as the data source and apply the surrogate-potential mediator to deliver reliable real-time feedback.

18.

Two Bronze Medals for Switzerland at the 46th International Chemistry Olympiad in Hanoi, Vietnam.

Ludwig, Peter E; Vaucher, Alain C; Lê, Thanh Phong; Suter, Yannick.

Chimia (Aarau) ; 69(1-2): 71-2, 2015.

Article in English | MEDLINE | ID: mdl-26507094

19.

Two Bronze Medals for Switzerland at the 46th International Chemistry Olympiad in Hanoi, Vietnam.

Ludwig, Peter E; Vaucher, Alain C; Lê, Thanh Phong; Suter, Yannick.

Chimia (Aarau) ; 69(1): 71-72, 2015 Feb 25.

Article in English | MEDLINE | ID: mdl-28982473

20.

Interactive chemical reactivity exploration.

Haag, Moritz P; Vaucher, Alain C; Bosson, Maël; Redon, Stéphane; Reiher, Markus.

Chemphyschem ; 15(15): 3301-19, 2014 Oct 20.

Article in English | MEDLINE | ID: mdl-25205397

ABSTRACT

Elucidating chemical reactivity in complex molecular assemblies of a few hundred atoms is, despite the remarkable progress in quantum chemistry, still a major challenge. Black-box search methods to find intermediates and transition-state structures might fail in such situations because of the high dimensionality of the potential energy surface. Here, we propose the concept of interactive chemical reactivity exploration to effectively introduce the chemist's intuition into the search process. We employ a haptic pointer device with force feedback to allow the operator to manipulate structures directly in three dimensions while simultaneously perceiving, as forces, the quantum mechanical response to structure modification. We elaborate on the details of how such an interactive exploration should proceed and which technical difficulties need to be overcome. All reactivity-exploration concepts developed for this purpose have been implemented in the SAMSON programming environment.
