Results 1 - 20 of 247
1.
Chem Sci ; 15(22): 8380-8389, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38846388

ABSTRACT

Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthesis, attributes prediction, and others. nach0 is a multi-domain and multi-task encoder-decoder LLM pre-trained on unlabeled text from scientific literature, patents, and molecule strings to incorporate a range of chemical and linguistic knowledge. We employed instruction tuning, where specific task-related instructions are utilized to fine-tune nach0 for the final set of tasks. To train nach0 effectively, we leverage the NeMo framework, enabling efficient parallel optimization of both base and large model versions. Extensive experiments demonstrate that our model outperforms state-of-the-art baselines on single-domain and cross-domain tasks. Furthermore, it can generate high-quality outputs in molecular and textual formats, showcasing its effectiveness in multi-domain setups.

2.
J Chem Phys ; 160(21)2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38828812

ABSTRACT

CNDOL is an a priori, approximate Fockian for molecular wave functions. In this study, we employ several modes of singly excited configuration interaction (CIS) to model molecular excitation properties by using four combinations of the one electron operator terms. Those options are compared to the experimental and theoretical data for a carefully selected set of molecules. The resulting excitons are represented by CIS wave functions that encompass all valence electrons in the system for each excited state energy. The Coulomb-exchange term associated with the calculated excitation energies is rationalized to evaluate theoretical exciton binding energies. This property is shown to be useful for discriminating the charge donation ability of molecular and supermolecular systems. Multielectronic 3D maps of exciton formal charges are showcased, demonstrating the applicability of these approximate wave functions for modeling properties of large molecules and clusters at nanoscales. This modeling proves useful in designing molecular photovoltaic devices. Our methodology holds potential applications in systematic evaluations of such systems and the development of fundamental artificial intelligence databases for predicting related properties.

3.
Adv Mater ; : e2402369, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38794859

ABSTRACT

Computational chemistry is an indispensable tool for understanding molecules and predicting chemical properties. However, traditional computational methods face significant challenges due to the difficulty of solving the Schrödinger equation and the increasing computational cost with the size of the molecular system. In response, there has been a surge of interest in applying artificial intelligence (AI) and machine learning (ML) techniques to in silico experiments. Integrating AI and ML into computational chemistry increases the scalability and speed of the exploration of chemical space. However, challenges remain, particularly regarding the reproducibility and transferability of ML models. This review highlights the evolution of ML in learning from, complementing, or replacing traditional computational chemistry for energy and property predictions. Starting from models trained entirely on numerical data, the journey leads toward the ideal model that incorporates or learns the physical laws of quantum mechanics. This paper also reviews existing computational methods and ML models and their intertwining, outlines a roadmap for future research, and identifies areas for improvement and innovation. Ultimately, the goal is to develop AI architectures capable of predicting accurate and transferable solutions to the Schrödinger equation, thereby revolutionizing in silico experiments within chemistry and materials science.

4.
Science ; 384(6697): eadk9227, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38753786

ABSTRACT

Contemporary materials discovery requires intricate sequences of synthesis, formulation, and characterization that often span multiple locations with specialized expertise or instrumentation. To accelerate these workflows, we present a cloud-based strategy that enabled delocalized and asynchronous design-make-test-analyze cycles. We showcased this approach through the exploration of molecular gain materials for organic solid-state lasers as a frontier application in molecular optoelectronics. Distributed robotic synthesis and in-line property characterization, orchestrated by a cloud-based artificial intelligence experiment planner, resulted in the discovery of 21 new state-of-the-art materials. Gram-scale synthesis ultimately allowed for the verification of best-in-class stimulated emission in a thin-film device. Demonstrating the asynchronous integration of five laboratories across the globe, this workflow provides a blueprint for delocalizing, and democratizing, scientific discovery.

5.
J Comput Chem ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38709143

ABSTRACT

Quantum computers are expected to outperform classical computers for specific problems in quantum chemistry. Such calculations remain expensive, but costs can be lowered through the partition of the molecular system. In the present study, partition was achieved with range-separated density functional theory (RS-DFT). The use of RS-DFT reduces both the basis set size and the active space size dependence of the ground state energy in comparison with the use of wave function theory (WFT) alone. The utilization of pair natural orbitals (PNOs) in place of canonical molecular orbitals (MOs) results in more compact qubit Hamiltonians. To test this strategy, a basis-set independent framework, known as multiresolution analysis (MRA), was employed to generate PNOs. Tests were conducted with the variational quantum eigensolver for a number of molecules. The results show that the proposed approach reduces the number of qubits needed to reach a target energy accuracy.

6.
Article in English | MEDLINE | ID: mdl-38728616

ABSTRACT

Inverted singlet-triplet gap (INVEST) materials have promising photophysical properties for optoelectronic applications due to an inversion of their lowest singlet (S1) and triplet (T1) excited states. This results in an exothermic reverse intersystem crossing (rISC) process that potentially enhances triplet harvesting, compared to thermally activated delayed fluorescence (TADF) emitters with endothermic rISCs. However, the processes and phenomena that facilitate conversion between excited states for INVEST materials are underexplored. We investigate the complex potential energy surfaces (PESs) of the excited states of three heavily studied azaphenalene INVEST compounds, namely, cyclazine, pentazine, and heptazine using two state-of-the-art computational methodologies, namely, RMS-CASPT2 and SCS-ADC(2) methods. Our findings suggest that ISC and rISC processes take place directly between the S1 and T1 electronic states in all three compounds through a minimum-energy crossing point (MECP) with an activation energy barrier between 0.11 and 0.58 eV above the S1 state for ISC and between 0.06 and 0.36 eV above the T1 state for rISC. We predict that higher-lying triplet states are not populated, since the crossing point structures to these states are not energetically accessible. Furthermore, the conical intersection (CI) between the ground and S1 states is high in energy for all compounds (between 0.4 and 2.0 eV), which makes nonradiative decay back to the ground state a relatively slow process. We demonstrate that the spin-orbit coupling (SOC) driving the S1-T1 conversion is enhanced by vibronic coupling with higher-lying singlet and triplet states possessing vibrational modes of proper symmetry. We also rationalize that the experimentally observed anti-Kasha emission of cyclazine is due to the energetically inaccessible CI between the bright S2 and the dark S1 states, hindering internal conversion. Finally, we show that SCS-ADC(2) is able to qualitatively reproduce excited state features, but consistently overpredicts relative energies of excited state structural minima compared to RMS-CASPT2. The identification of these excited state features elaborates design rules for new INVEST emitters with improved emission quantum yields.

7.
J Am Chem Soc ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38598363

ABSTRACT

Rapid advancements in artificial intelligence (AI) have enabled breakthroughs across many scientific disciplines. In organic chemistry, the challenge of planning complex multistep chemical syntheses should conceptually be well-suited for AI. Yet, the development of AI synthesis planners trained solely on reaction-example data has stagnated and is not on par with the performance of "hybrid" algorithms combining AI with expert knowledge. This Perspective examines possible causes of these shortcomings, extending beyond the established reasoning of insufficient quantities of reaction data. Drawing attention to the intricacies and data biases that are specific to the domain of synthetic chemistry, we advocate augmenting the unique capabilities of AI with the knowledge base and the reasoning strategies of domain experts. By actively involving synthetic chemists, who are the end users of any synthesis planning software, in the development process, we envision bridging the gap between computer algorithms and the intricate nature of chemical synthesis.

8.
J Phys Chem A ; 128(12): 2445-2456, 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38485448

ABSTRACT

Molecules with an inverted energy gap between their first singlet and triplet excited states have promising applications in the next generation of organic light-emitting diode (OLED) materials. Unfortunately, such molecules are rare, and only a handful of examples are currently known. High-throughput virtual screening could assist in finding novel classes of these molecules, but current efforts are hampered by the high computational cost of the required quantum chemical methods. We present a method based on the semiempirical Pariser-Parr-Pople theory augmented by perturbation theory and show that it reproduces inverted gaps at a fraction of the cost of currently employed excited-state calculations. Our study paves the way for ultrahigh-throughput virtual screening and inverse design to accelerate the discovery and development of this new generation of OLED materials.

9.
J Chem Phys ; 160(12)2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38545949

ABSTRACT

Structure determination is necessary to identify unknown organic molecules, such as those in natural products, forensic samples, the interstellar medium, and laboratory syntheses. Rotational spectroscopy enables structure determination by providing accurate 3D information about small organic molecules via their moments of inertia. Using these moments, Kraitchman analysis determines isotopic substitution coordinates, which are the unsigned |x|, |y|, |z| coordinates of all atoms with natural isotopic abundance, including carbon, nitrogen, and oxygen. While unsigned substitution coordinates can verify guesses of structures, the missing +/- signs make it challenging to determine the actual structure from the substitution coordinates alone. To tackle this inverse problem, we develop Kreed (Kraitchman REflection-Equivariant Diffusion), a generative diffusion model that infers a molecule's complete 3D structure from only its molecular formula, moments of inertia, and unsigned substitution coordinates of heavy atoms. Kreed's top-1 predictions identify the correct 3D structure with near-perfect accuracy on large simulated datasets when provided with substitution coordinates of all heavy atoms with natural isotopic abundance. Accuracy decreases as fewer substitution coordinates are provided, but is retained for smaller molecules. On a test set of experimentally measured substitution coordinates gathered from the literature, Kreed predicts the correct all-atom 3D structure in 25 of 33 cases, demonstrating experimental potential for de novo 3D structure determination with rotational spectroscopy.
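
To give a sense of the size of the inverse problem described in this abstract, the following minimal Python sketch enumerates every signed structure consistent with a set of unsigned substitution coordinates. The function name `signed_candidates` and the toy coordinates are illustrative assumptions and are not part of Kreed; the abstract describes a generative diffusion model, which does not enumerate candidates in this way.

```python
from itertools import product

def signed_candidates(unsigned_coords):
    """Enumerate all signed structures consistent with unsigned |x|, |y|, |z| values.

    unsigned_coords: list of (|x|, |y|, |z|) tuples, one per atom (hypothetical input).
    Returns a list of structures, each a tuple of signed (x, y, z) tuples.
    """
    per_atom = []
    for ax, ay, az in unsigned_coords:
        # A zero coordinate has only one sign choice; a nonzero one has two.
        choices = [sorted({v, -v}) for v in (ax, ay, az)]
        per_atom.append(list(product(*choices)))
    return list(product(*per_atom))

# Toy example: three heavy atoms with made-up unsigned coordinates (in angstroms).
toy = [(1.2, 0.4, 0.0), (0.7, 1.1, 0.3), (2.0, 0.5, 0.9)]
candidates = signed_candidates(toy)
print(len(candidates))  # 2**8 = 256: eight nonzero coordinates, two signs each
```

The count doubles with every additional nonzero coordinate, so brute-force enumeration quickly becomes impractical for larger molecules. Many candidates are also mirror images of one another under reflection of a whole axis, which is consistent with the reflection-equivariant design named in the abstract.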

10.
Chem Sci ; 15(12): 4489-4503, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38516092

ABSTRACT

Density functional theory (DFT) is the workhorse of computational quantum chemistry. One of its main limitations is that choosing the right functional is a non-trivial task left for human experts. The choice is particularly hard for excited state calculations when using its time-dependent formulation (TD-DFT). This is due to the approximations of the method, but also because the photophysical properties of a molecule are defined by a manifold of states that all need to be properly described. This includes not only the relative energy of the states, but also capturing the correct character, order, and intensity of the transitions. In this work, we developed a neural network to recommend functionals to be used on molecules for TD-DFT calculations, by simultaneously considering all these properties for a manifold of states. This was possible by developing a scoring system to define the accuracy of an excited state's calculation against a higher-accuracy reference. The scoring system is generalizable to any level of theory; we here applied it to evaluate the performance of common functionals of different rungs against a higher accuracy method on a large set of organic molecules. The results are collected in a database that we released and made open, providing four million data points to the community for future applications. The scoring system assigns a value between zero and one hundred to each functional for each molecule, transforming the complicated task of learning photophysical properties into a simpler regression task. We used the dataset to train a graph attention neural network to predict the scores for unseen molecules. We call this oracle DELFI (Data-driven EvaLuation of Functionals by Inference), which can be used to quickly screen and predict the ranking of functionals to calculate the optical properties of organic molecules. We validated DELFI in two in silico experiments: choosing a common functional for a series of spiropyran-merocyanine isomers and a unique functional to screen a large dataset of over 50 000 organic photovoltaic molecules, for which an extensive benchmark would be unfeasible. A corresponding web application allows DELFI to be easily run and the results to be analyzed, alleviating the hurdle of choosing the right functional for TD-DFT calculations.

11.
Nat Comput Sci ; 4(2): 89-91, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38388845

12.
Chem Sci ; 15(7): 2618-2639, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38362419

ABSTRACT

The design of molecules requires multi-objective optimizations in high-dimensional chemical space with often conflicting target properties. To navigate this space, classical workflows rely on the domain knowledge and creativity of human experts, which can be the bottleneck in high-throughput approaches. Herein, we present an artificial molecular design workflow relying on a genetic algorithm and a deep neural network to find a new family of organic emitters with inverted singlet-triplet gaps and appreciable fluorescence rates. We combine high-throughput virtual screening and inverse design infused with domain knowledge and artificial intelligence to accelerate molecular generation significantly. This enabled us to explore more than 800 000 potential emitter molecules and find more than 10 000 candidates estimated to have inverted singlet-triplet gaps (INVEST) and appreciable fluorescence rates, many of which likely emit blue light. This class of molecules has the potential to realize a new generation of organic light-emitting diodes.

13.
Cell Chem Biol ; 31(4): 760-775.e17, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38402621

ABSTRACT

Candida species are among the most prevalent causes of systemic fungal infections, which account for ∼1.5 million annual fatalities. Here, we build on a compound screen that identified the molecule N-pyrimidinyl-β-thiophenylacrylamide (NP-BTA), which strongly inhibits Candida albicans growth. NP-BTA was hypothesized to target C. albicans glutaminyl-tRNA synthetase, Gln4. Here, we confirmed through in vitro aminoacylation assays that NP-BTA is a potent inhibitor of Gln4, and we defined how NP-BTA arrests Gln4's transferase activity using co-crystallography. This analysis also uncovered Met496 as a critical residue for the compound's species-selective target engagement and potency. Structure-activity relationship (SAR) studies demonstrated that the NP-BTA scaffold is subject to oxidative and non-oxidative metabolism, making it unsuitable for systemic administration. In a mouse dermatomycosis model, however, topical application of the compound provided significant therapeutic benefit. This work expands the repertoire of antifungal protein synthesis target mechanisms and provides a path to develop Gln4 inhibitors.


Subjects
Amino Acyl-tRNA Synthetases , Antifungal Agents , Animals , Mice , Antifungal Agents/pharmacology , Amino Acyl-tRNA Synthetases/genetics , Candida albicans , Structure-Activity Relationship

14.
Digit Discov ; 3(1): 23-33, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38239898

ABSTRACT

In light of the pressing need for practical materials and molecular solutions to renewable energy and health problems, to name just two examples, one wonders how to accelerate research and development in the chemical sciences, so as to address the time it takes to bring materials from initial discovery to commercialization. Artificial intelligence (AI)-based techniques, in particular, are having a transformative and accelerating impact on many, if not most, technological domains. To shed light on these questions, the authors and participants gathered in person for the ASLLA Symposium on the theme of 'Accelerated Chemical Science with AI' at Gangneung, Republic of Korea. We present the findings, ideas, comments, and often contentious opinions expressed during four panel discussions related to the respective general topics: 'Data', 'New applications', 'Machine learning algorithms', and 'Education'. All discussions were recorded, transcribed into text using OpenAI's Whisper, and summarized using LG AI Research's EXAONE LLM, followed by revision by all authors. For the broader benefit of current researchers, educators in higher education, and academic bodies such as associations, publishers, librarians, and companies, we provide chemistry-specific recommendations and summarize the resulting conclusions.

15.
bioRxiv ; 2024 Jan 13.
Article in English | MEDLINE | ID: mdl-37873443

ABSTRACT

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has led to significant global morbidity and mortality. A crucial viral protein, the non-structural protein 14 (nsp14), catalyzes the methylation of viral RNA and plays a critical role in viral genome replication and transcription. Due to the low mutation rate in the nsp region among various SARS-CoV-2 variants, nsp14 has emerged as a promising therapeutic target. However, discovering potential inhibitors remains a challenge. In this work, we introduce a computational pipeline for the rapid and efficient identification of potential nsp14 inhibitors by leveraging virtual screening and the NCI open compound collection, which contains 250,000 freely available molecules for researchers worldwide. The introduced pipeline provides a cost-effective and efficient approach for early-stage drug discovery by allowing researchers to evaluate promising molecules without incurring synthesis expenses. Our pipeline successfully identified seven promising candidates after experimentally validating only 40 compounds. Notably, we discovered NSC620333, a compound that exhibits a strong binding affinity to nsp14 with a dissociation constant of 427 ± 84 nM. In addition, we gained new insights into the structure and function of this protein through molecular dynamics simulations. We identified new conformational states of the protein and determined that residues Phe367, Tyr368, and Gln354 within the binding pocket serve as stabilizing residues for novel ligand interactions. We also found that metal coordination complexes are crucial for the overall function of the binding pocket. Lastly, we present the solved crystal structure of the nsp14-MTase complexed with SS148 (PDB:8BWU), a potent inhibitor of methyltransferase activity at the nanomolar level (IC50 value of 70 ± 6 nM). Our computational pipeline accurately predicted the binding pose of SS148, demonstrating its effectiveness and potential in accelerating drug discovery efforts against SARS-CoV-2 and other emerging viruses.

16.
Drug Deliv Transl Res ; 14(7): 1872-1887, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38158474

ABSTRACT

Due to its cost-effectiveness, convenience, and high patient adherence, oral drug administration normally remains the preferred approach. Yet, the effective delivery of hydrophobic drugs via the oral route is often hindered by their limited water solubility and first-pass metabolism. To mitigate these challenges, advanced delivery systems such as solid lipid nanoparticles (SLNs) and nanostructured lipid carriers (NLCs) have been developed to encapsulate hydrophobic drugs and enhance their bioavailability. However, traditional design methodologies for these complex formulations often present intricate challenges because they are restricted to a relatively narrow design space. Here, we present a data-driven approach for the accelerated design of SLNs/NLCs encapsulating a model hydrophobic drug, cannabidiol, that combines experimental automation and machine learning. A small subset of formulations, comprising 10% of all formulations in the design space, was prepared in-house, leveraging miniaturized experimental automation to improve throughput and decrease the quantity of drug and materials required. Machine learning models were then trained on the data generated from these formulations and used to predict properties of all SLNs/NLCs within this design space (i.e., 1215 formulations). Notably, formulations predicted to be high-performers via this approach were confirmed to significantly enhance the solubility of the drug, by up to 3000-fold, and to prevent its degradation. Moreover, the high-performance formulations significantly enhanced the oral bioavailability of the drug compared to both its free form and an over-the-counter version. Furthermore, this bioavailability matched that of a formulation equivalent in composition to the FDA-approved product, Epidiolex®.


Subjects
Cannabidiol , Hydrophobic and Hydrophilic Interactions , Lipids , Nanoparticles , Nanoparticles/chemistry , Nanoparticles/administration & dosage , Administration, Oral , Lipids/chemistry , Lipids/administration & dosage , Cannabidiol/chemistry , Cannabidiol/administration & dosage , Cannabidiol/pharmacokinetics , Machine Learning , Drug Carriers/chemistry , Solubility , Biological Availability , Drug Compounding

17.
J Am Chem Soc ; 145(49): 26623-26631, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38039391

ABSTRACT

A palladium-catalyzed domino C-N coupling/Cacchi reaction is reported. Design of photoluminescent bis-heterocycles, aided by density functional theory calculations, was performed with synthetic yields up to 98%. The photophysical properties of the products accessed via this strategy were part of a comprehensive study that led to broad emission spectra and quantum yields of up to 0.59. Mechanistic experiments confirmed bromoalkynes as competent intermediates, and a density functional theory investigation suggests a pathway involving initial oxidative addition into the cis C-Br bond of the gem-dihaloolefin.

18.
Digit Discov ; 2(4): 897-908, 2023 Aug 08.
Article in English | MEDLINE | ID: mdl-38013816

ABSTRACT

String-based molecular representations play a crucial role in cheminformatics applications, and with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencing embedded strings (SELFIES), was proposed that is inherently 100% robust, alongside an accompanying open-source implementation called selfies. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints, and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of selfies, where we have also made major advances with respect to design, efficiency, and supported features. Hence, we present the current status of selfies (version 2.1.1) in this manuscript. Our library, selfies, is available at GitHub (https://github.com/aspuru-guzik-group/selfies).

19.
Adv Drug Deliv Rev ; 202: 115108, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37774977

ABSTRACT

Over the past few years, the adoption of machine learning (ML) techniques has rapidly expanded across many fields of research including formulation science. At the same time, the use of lipid nanoparticles to enable the successful delivery of mRNA vaccines in the recent COVID-19 pandemic demonstrated the impact of formulation science. Yet, the design of advanced pharmaceutical formulations is non-trivial and primarily relies on costly and time-consuming wet-lab experimentation. In 2021, our group published a review article focused on the use of ML as a means to accelerate drug formulation development. Since then, the field has witnessed significant growth and progress, reflected by an increasing number of studies published in this area. This updated review summarizes the current state of ML-directed drug formulation development, introduces advanced ML techniques that have been implemented in formulation design, and shares the progress on making self-driving laboratories a reality. Furthermore, this review highlights several future applications of ML yet to be fully exploited to advance drug formulation research and development.


Subjects
Machine Learning , Pandemics , Humans , Drug Compounding

20.
ACS Cent Sci ; 9(7): 1453-1465, 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37521801

ABSTRACT

Chemical and molecular-based computers may be promising alternatives to modern silicon-based computers. In particular, hybrid systems, where tasks are split between a chemical medium and traditional silicon components, may provide access to, and a demonstration of, chemical advantages such as scalability, low power dissipation, and genuine randomness. This work describes the development of a hybrid classical-molecular computer (HCMC) featuring an electrochemical reaction on top of an array of discrete electrodes with a fluorescent readout. The chemical medium, optical readout, and electrode interface combined with a classical computer generate a feedback loop to solve several canonical optimization problems in computer science such as number partitioning and prime factorization. Importantly, the HCMC makes constructive use of experimental noise in the optical readout, a milestone for molecular systems, to solve these optimization problems, as opposed to in silico random number generation. Specifically, we show calculations stranded in local minima can consistently converge on a global minimum in the presence of experimental noise. Scalability of the hybrid computer is demonstrated by expanding the number of variables from 4 to 7, increasing the number of possible solutions by 1 order of magnitude. This work provides a stepping stone to fully molecular approaches to solving complex computational problems using chemistry.
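
As a rough illustration of how noise can help an optimizer leave a local minimum in number partitioning, here is a purely classical Python sketch. It is not the HCMC: a seeded pseudo-random generator stands in for the experimental optical noise, and `partition_cost`, `noisy_search`, the noise level, and the example numbers are all assumptions made for illustration.

```python
import random

def partition_cost(numbers, signs):
    """Absolute difference between the two subset sums (0 means a perfect split)."""
    return abs(sum(n * s for n, s in zip(numbers, signs)))

def noisy_search(numbers, steps=2000, noise=0.0, seed=0):
    """Single-flip local search; `noise` is the probability of accepting a worse move."""
    rng = random.Random(seed)
    signs = [1] * len(numbers)          # start with every number in one subset
    cost = partition_cost(numbers, signs)
    best_signs, best_cost = signs[:], cost
    for _ in range(steps):
        i = rng.randrange(len(numbers))
        signs[i] = -signs[i]            # propose moving number i to the other subset
        new_cost = partition_cost(numbers, signs)
        if new_cost <= cost or rng.random() < noise:
            cost = new_cost             # accept: always if not worse, sometimes if worse
            if cost < best_cost:
                best_signs, best_cost = signs[:], cost
        else:
            signs[i] = -signs[i]        # reject: undo the flip
    return best_signs, best_cost

# Toy instance (assumed): total is 30 and a perfect split exists, 8 + 7 = 6 + 5 + 4.
numbers = [8, 7, 6, 5, 4]
signs, cost = noisy_search(numbers, noise=0.2)
print(signs, cost)
```

With `noise=0.0` the search only accepts moves that do not increase the cost, so it can stall in a configuration where no single flip helps; a nonzero `noise` occasionally accepts a worse move, which is the general mechanism the abstract attributes to experimental noise.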
