1.
Nature ; 573(7773): 251-255, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31511682

ABSTRACT

Most chemical experiments are planned by human scientists and therefore are subject to a variety of human cognitive biases [1], heuristics [2] and social influences [3]. These anthropogenic chemical reaction data are widely used to train machine-learning models [4] that are used to predict organic [5] and inorganic [6,7] syntheses. However, it is known that societal biases are encoded in datasets and are perpetuated in machine-learning models [8]. Here we identify as-yet-unacknowledged anthropogenic biases in both the reagent choices and reaction conditions of chemical reaction datasets using a combination of data mining and experiments. We find that the amine choices in the reported crystal structures of hydrothermal synthesis of amine-templated metal oxides [9] follow a power-law distribution in which 17% of amine reactants occur in 79% of reported compounds, consistent with distributions in social influence models [10-12]. An analysis of unpublished historical laboratory notebook records shows similarly biased distributions of reaction condition choices. By performing 548 randomly generated experiments, we demonstrate that the popularity of reactants or the choices of reaction conditions are uncorrelated to the success of the reaction. We show that randomly generated experiments better illustrate the range of parameter choices that are compatible with crystal formation. Machine-learning models that we train on a smaller randomized reaction dataset outperform models trained on larger human-selected reaction datasets, demonstrating the importance of identifying and addressing anthropogenic biases in scientific data.
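
The concentration statistic quoted in this abstract (a small share of amines accounting for most reported compounds) can be computed from a plain list of reactant choices. The following is a minimal Python sketch, assuming a hypothetical list with one amine label per reported compound; the function name and the toy data are illustrative and are not taken from the paper.

```python
from collections import Counter


def share_covered_by_top(reactant_per_compound, top_fraction):
    """Fraction of compounds that use one of the most popular reactants.

    reactant_per_compound: one reactant label per compound (hypothetical input).
    top_fraction: share of distinct reactants to treat as "popular", e.g. 0.17.
    """
    counts = Counter(reactant_per_compound)
    ranked = sorted(counts.values(), reverse=True)
    # Number of distinct reactants in the "popular" group (at least one).
    n_top = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:n_top]) / len(reactant_per_compound)


# Toy data (invented): 10 distinct amines used across 100 compounds.
usage = (
    ["A"] * 50 + ["B"] * 25 + ["C"] * 8 + ["D"] * 5 + ["E"] * 4
    + ["F"] * 3 + ["G"] * 2 + ["H", "I", "J"]
)

print(share_covered_by_top(usage, 0.2))
```

Applied to a real dataset, the same function would give the kind of "X% of reactants occur in Y% of compounds" figure reported above, although the paper's own analysis pipeline may differ.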


Subjects
Bias; Chemistry Techniques, Synthetic/statistics & numerical data; Laboratory Personnel/statistics & numerical data; Machine Learning; Humans; Laboratory Personnel/psychology
2.
J Am Chem Soc ; 145(40): 21699-21716, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37754929

ABSTRACT

Exceptional molecules and materials with one or more extraordinary properties are both technologically valuable and fundamentally interesting, because they often involve new physical phenomena or new compositions that defy expectations. Historically, exceptionality has been achieved through serendipity, but recently, machine learning (ML) and automated experimentation have been widely proposed to accelerate target identification and synthesis planning. In this Perspective, we argue that the data-driven methods commonly used today are well-suited for optimization but not for the realization of new exceptional materials or molecules. Finding such outliers should be possible using ML, but only by shifting away from using traditional ML approaches that tweak the composition, crystal structure, or reaction pathway. We highlight case studies of high-Tc oxide superconductors and superhard materials to demonstrate the challenges of ML-guided discovery and discuss the limitations of automation for this task. We then provide six recommendations for the development of ML methods capable of exceptional materials discovery: (i) Avoid the tyranny of the middle and focus on extrema; (ii) When data are limited, qualitative predictions that provide direction are more valuable than interpolative accuracy; (iii) Sample what can be made and how to make it and defer optimization; (iv) Create room (and look) for the unexpected while pursuing your goal; (v) Try to fill-in-the-blanks of input and output space; (vi) Do not confuse human understanding with model interpretability. We conclude with a description of how these recommendations can be integrated into automated discovery workflows, which should enable the discovery of exceptional molecules and materials.

3.
Nature ; 533(7601): 73-6, 2016 May 05.
Article in English | MEDLINE | ID: mdl-27147027

ABSTRACT

Inorganic-organic hybrid materials such as organically templated metal oxides, metal-organic frameworks (MOFs) and organohalide perovskites have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility, photovoltaic properties, gas adsorption capacity or lithium-ion intercalation) to identify promising target candidates for synthetic efforts; determination of the structure-property relationship from large bodies of experimental data, enabled by integration with high-throughput synthesis and measurement tools; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification or gas adsorption properties). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on 'dark' reactions--failed or unsuccessful hydrothermal syntheses--collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. 
When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions for new organically templated inorganic product formation with a success rate of 89 per cent. Inverting the machine-learning model reveals new hypotheses regarding the conditions for successful product formation.

4.
J Chem Phys ; 156(6): 064108, 2022 Feb 14.
Article in English | MEDLINE | ID: mdl-35168359

ABSTRACT

Autonomous experimentation systems use algorithms and data from prior experiments to select and perform new experiments in order to meet a specified objective. In most experimental chemistry situations, there is a limited set of prior historical data available, and acquiring new data may be expensive and time consuming, which places constraints on machine learning methods. Active learning methods prioritize new experiment selection by using machine learning model uncertainty and predicted outcomes. Meta-learning methods attempt to construct models that can learn quickly with a limited set of data for a new task. In this paper, we applied the model-agnostic meta-learning (MAML) model and the Probabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning (PLATIPUS) approach, which extends MAML to active learning, to the problem of halide perovskite growth by inverse temperature crystallization. Using a dataset of 1870 reactions conducted using 19 different organoammonium lead iodide systems, we determined the optimal strategies for incorporating historical data into active and meta-learning models to predict reaction compositions that result in crystals. We then evaluated the best three algorithms (PLATIPUS and active-learning k-nearest neighbor and decision tree algorithms) with four new chemical systems in experimental laboratory tests. With a fixed budget of 20 experiments, PLATIPUS makes superior predictions of reaction outcomes compared to other active-learning algorithms and a random baseline.
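
The idea of prioritizing experiments by model uncertainty can be illustrated with plain uncertainty sampling on a k-nearest-neighbor classifier. This is a generic sketch, not PLATIPUS or the authors' implementation: the data, the function names, and the choice of "closest to 0.5" as the uncertainty criterion are all assumptions made for illustration.

```python
import math


def knn_probability(x, labeled, k=3):
    """Estimate P(crystal) at x as the success fraction among the k nearest labeled points."""
    nearest = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def pick_most_uncertain(candidates, labeled, k=3):
    """Return the candidate whose predicted probability is closest to 0.5."""
    return min(candidates, key=lambda x: abs(knn_probability(x, labeled, k) - 0.5))


# Invented two-component compositions with outcomes (1 = crystal, 0 = no crystal).
labeled = [
    ((0.10, 0.10), 0), ((0.20, 0.10), 0), ((0.15, 0.20), 0),
    ((0.90, 0.90), 1), ((0.80, 0.90), 1), ((0.85, 0.80), 1),
]
candidates = [(0.10, 0.15), (0.50, 0.55), (0.90, 0.85)]

next_experiment = pick_most_uncertain(candidates, labeled)
print(next_experiment)
```

In a loop with a fixed experiment budget, the chosen point would be run, its outcome appended to `labeled`, and the selection repeated.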

5.
J Chem Inf Model ; 61(4): 1593-1602, 2021 Apr 26.
Article in English | MEDLINE | ID: mdl-33797887

ABSTRACT

Combinatorial fusion analysis (CFA) is an approach for combining multiple scoring systems using the rank-score characteristic function and cognitive diversity measure. One example is to combine diverse machine learning models to achieve better prediction quality. In this work, we apply CFA to the synthesis of metal halide perovskites containing organic ammonium cations via inverse temperature crystallization. Using a data set generated by high-throughput experimentation, four individual models (support vector machines, random forests, weighted logistic classifier, and gradient boosted trees) were developed. We characterize each of these scoring systems and explore 66 possible combinations of the models. When measured by the precision on predicting crystal formation, the majority of the combination models improves the individual model results. The best combination models outperform the best individual models by 3.9 percentage points in precision. In addition to improving prediction quality, we demonstrate how the fusion models can be used to identify mislabeled input data and address issues of data quality. In particular, we identify example cases where all single models and all fusion models do not give the correct prediction. Experimental replication of these syntheses reveals that these compositions are sensitive to modest temperature variations across the different locations of the heating element that can hinder or enhance the crystallization process. In summary, we demonstrate that model fusion using CFA can not only identify a previously unconsidered influence on reaction outcome but also be used as a form of quality control for high-throughput experimentation.
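
The two CFA ingredients named in this abstract, the rank-score characteristic function and cognitive diversity, can be sketched in a few lines. The version below follows one common formulation (min-max normalized scores sorted by rank, and a root-mean-square gap between two models' rank-score functions); whether the paper uses exactly this normalization is an assumption, and the scores are invented.

```python
import math


def rank_score_function(scores):
    """Normalized scores listed from rank 1 (best) downward.

    Assumes the scores are not all identical (otherwise min-max scaling is undefined).
    """
    lo, hi = min(scores), max(scores)
    normalized = [(s - lo) / (hi - lo) for s in scores]
    return sorted(normalized, reverse=True)


def cognitive_diversity(scores_a, scores_b):
    """Root-mean-square gap between two models' rank-score functions."""
    fa = rank_score_function(scores_a)
    fb = rank_score_function(scores_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fa, fb)) / len(fa))


def score_combination(*score_lists):
    """Average the (already normalized) scores each model assigns to the same item."""
    return [sum(items) / len(items) for items in zip(*score_lists)]


# Invented scores from two hypothetical models for the same three reactions.
model_a = [0.0, 0.5, 1.0]
model_b = [1.0, 0.0, 1.0]

print(cognitive_diversity(model_a, model_b))
print(score_combination(model_a, model_b))
```

Pairs of models with larger cognitive diversity are the usual candidates for fusion, since they rank items differently enough for the combination to add information.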


Subjects
Machine Learning; Support Vector Machine; Calcium Compounds; Oxides; Titanium
6.
Environ Sci Technol ; 55(19): 12741-12754, 2021 Oct 05.
Article in English | MEDLINE | ID: mdl-34403250

ABSTRACT

The rapid increase in both the quantity and complexity of data that are being generated daily in the field of environmental science and engineering (ESE) demands accompanied advancement in data analytics. Advanced data analysis approaches, such as machine learning (ML), have become indispensable tools for revealing hidden patterns or deducing correlations for which conventional analytical methods face limitations or challenges. However, ML concepts and practices have not been widely utilized by researchers in ESE. This feature explores the potential of ML to revolutionize data analysis and modeling in the ESE field, and covers the essential knowledge needed for such applications. First, we use five examples to illustrate how ML addresses complex ESE problems. We then summarize four major types of applications of ML in ESE: making predictions; extracting feature importance; detecting anomalies; and discovering new materials or chemicals. Next, we introduce the essential knowledge required and current shortcomings in ML applications in ESE, with a focus on three important but often overlooked components when applying ML: correct model development, proper model interpretation, and sound applicability analysis. Finally, we discuss challenges and future opportunities in the application of ML tools in ESE to highlight the potential of ML in this field.


Subjects
Environmental Science; Machine Learning
7.
J Chem Phys ; 154(18): 184708, 2021 May 14.
Article in English | MEDLINE | ID: mdl-34241022

ABSTRACT

Amine-templated metal oxides are a class of hybrid organic-inorganic compounds with great structural diversity; by varying the compositions, 0D, 1D, 2D, and 3D inorganic dimensionalities can be achieved. In this work, we created a dataset of 3725 amine-templated metal oxides (including some metalloid oxides), their composition, amine identity, and dimensionality, extracted from the Cambridge Structural Database (CSD), which spans 71 elements, 25 main group building units, and 349 amines. We characterize the diversity of this dataset over reactants and in time. Artificial neural network models trained on this dataset can predict the most and least probable outcome dimensionalities with 71% and 95% accuracies, respectively, using only information about reactant identities, without stoichiometric information. Surprisingly, the amine identity plays only a minor role in most cases, as omitting this information only reduces the accuracy by <2%. The generality of this model is demonstrated on a time held-out test set of 36 amine-templated lanthanide oxalates, vanadium tellurites, vanadium selenites, vanadates, molybdates, and molybdenum sulfates, whose syntheses and structural characterizations are reported here for the first time, and which contain two new element combinations and four amines that are not present in the CSD.

8.
J Am Chem Soc ; 142(16): 7555-7566, 2020 Apr 22.
Article in English | MEDLINE | ID: mdl-32233475

ABSTRACT

Racemates have recently received attention as nonlinear optical and piezoelectric materials. Here, a machine-learning-assisted composition space approach was applied to synthesize the missing M = Ti, Zr members of the Δ,Λ-[Cu(bpy)2(H2O)]2[MF6]2·3H2O (M = Ti, Zr, Hf; bpy = 2,2'-bipyridine) family (space group: Pna21). In each (CuO, MO2)/bpy/HF(aq) (M = Ti, Zr, Hf) system, the polar noncentrosymmetric racemate (M-NCS) forms in competition with a centrosymmetric one-dimensional chain compound (M-CS) based on alternating Cu(bpy)(H2O)2(2+) and MF6(2-) basic building units (space groups: Ti-CS (Pnma), Zr-CS (P1̅), Hf-CS (P2/n)). Machine learning models were trained on reaction parameters to gain unbiased insight into the underlying statistical trends in each composition space. A human-interpretable decision tree shows that phase selection is driven primarily by the bpy:CuO molar ratio for reactions containing Zr or Hf, and predicts that formation of the Ti-NCS compound requires that the amount of HF present be decreased to raise the pH, which we verified experimentally. Predictive leave-one-metal-out (LOO) models further confirm that behavior in the Ti system is distinct from that of the Zr and Hf systems. The chemical origin of this distinction was probed via fluorine K-edge X-ray absorption spectroscopy. Pre-edge features in the F 1s X-ray absorption spectra reveal the strong ligand-to-metal π bonding between Ti(3d - t2g) and F(2p) states that distinguishes the TiF6(2-) anion from the ZrF6(2-) and HfF6(2-) anions.

9.
J Chem Inf Model ; 60(8): 3804-3811, 2020 Aug 24.
Article in English | MEDLINE | ID: mdl-32668151

ABSTRACT

Coulomb matrix eigenvalues (CMEs) are global 3D representations of molecular structure, which have been previously used to predict atomization energies, prioritize geometry searches, and interpret rotational spectra. The properties of the CME representation and its relationship to molecular structure are established using the Gershgorin circle theorem. Numerical bounds are studied using a data set of 309 000 conformational samples of all constitutional isomers of acyclic alkanes, CnH2n+2, from methane (n = 1) to undecane (n = 11), to establish the extent to which the CME preserves chemical intuitions about isomer and conformer similarity and its ability to distinguish constitutional isomers. Neither supervised nor unsupervised machine-learning algorithms can perfectly distinguish constitutional isomers as the molecular size increases, but the misclassification rate can be kept below 1%.
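
To make the link between the Coulomb matrix and the Gershgorin circle theorem concrete, the sketch below builds a Coulomb matrix and computes the interval that must contain all of its eigenvalues. It assumes the commonly used Coulomb matrix definition (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j/|R_i − R_j| in atomic units); the abstract does not restate the definition, so treat that choice, the function names, and the H2 geometry as illustrative.

```python
import math


def coulomb_matrix(charges, coords):
    """Coulomb matrix for nuclear charges and Cartesian coordinates (atomic units assumed)."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                matrix[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return matrix


def gershgorin_interval(matrix):
    """Real interval guaranteed to contain every eigenvalue of a real symmetric matrix.

    Each eigenvalue lies in some disc centered at a diagonal entry with radius
    equal to the sum of absolute off-diagonal entries in that row.
    """
    lows, highs = [], []
    for i, row in enumerate(matrix):
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        lows.append(row[i] - radius)
        highs.append(row[i] + radius)
    return min(lows), max(highs)


# Illustrative H2 molecule with a 1.4 bohr bond length.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
low, high = gershgorin_interval(h2)
print(low, high)
```

For this 2×2 case the eigenvalues are exactly 0.5 ± 1/1.4, so the Gershgorin bounds are tight; for larger molecules the interval is generally looser than the true spectrum.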


Subjects
Algorithms; Unsupervised Machine Learning; Isomerism; Molecular Conformation; Molecular Structure
10.
Inorg Chem ; 54(2): 694-703, 2015 Jan 20.
Article in English | MEDLINE | ID: mdl-25569171

ABSTRACT

Structural differences in [V2Te2O10]n(2n-) chain metrics are directly ascribed to variations in noncovalent interactions in a series of organically templated vanadium tellurites, including [C6H17N3][V2Te2O10]·H2O, [C5H16N2][V2Te2O10], and [C4H14N2][V2Te2O10]. The noncovalent interaction (NCI) method was used to locate, quantify, and visualize intermolecular interactions in [C4H14N2][V2Te2O10] and [C5H16N2][V2Te2O10]. Variations in the van der Waals attractions between [1,4-diaminobutaneH2](2+) and [1,5-diaminopentaneH2](2+) result in divergent packing motifs for these cations, which causes a reorganization of N-H···O hydrogen bonding and variances in the [V2Te2O10]n(2n-) chain metrics. The application of the NCI method to this type of solid-state structure provides a direct method to elucidate the structural effects of weak noncovalent interactions.

11.
Inorg Chem ; 53(22): 12027-35, 2014 Nov 17.
Article in English | MEDLINE | ID: mdl-25365238

ABSTRACT

A series of organically templated vanadium selenites has been prepared under mild hydrothermal conditions. Single crystals of [C5H14N2][(VO)3(SeO3)2(HSeO3)4], [C5H14N2][VO(SeO3)2], [(R)-C5H14N2][(VO)3(SeO3)2(HSeO3)4], and [(S)-C5H14N2][(VO)3(SeO3)2(HSeO3)4] were grown from VOSO4, SeO2, and 2-methylpiperazine. Controlling the initial pH of the reaction mixture allows for one to select between the compounds found in the VOSO4/SeO2/2-methylpiperazine system, as the solution pH directly affects the relative ratio of the HSeO3(-) and SeO3(2-) concentrations. Moreover, partial resolution of racemic 2-methylpiperazine is observed in [C5H14N2][(VO)3(SeO3)2(HSeO3)4], which is understood through the use of a one-dimensional Ising model. The use of enantiomerically pure 2-methylpiperazine results in fully ordered and fully resolved structures.
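
The statement that solution pH sets the HSeO3(-)/SeO3(2-) ratio follows from the Henderson-Hasselbalch relation, [SeO3(2-)]/[HSeO3(-)] = 10^(pH − pKa2). A small sketch follows; the pKa2 value of about 8.3 is an approximate room-temperature figure used here as an assumption, and the effective value under hydrothermal conditions may differ.

```python
def selenite_ratio(ph, pka2):
    """[SeO3(2-)] / [HSeO3(-)] from the Henderson-Hasselbalch relation."""
    return 10.0 ** (ph - pka2)


# Assumed approximate second pKa of selenous acid near room temperature.
PKA2_ASSUMED = 8.3

for ph in (2.0, 5.0, 8.3):
    print(ph, selenite_ratio(ph, PKA2_ASSUMED))
```

Each unit decrease in pH lowers the ratio tenfold, which is consistent with acidic mixtures favoring hydrogen selenite-rich products.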


Subjects
Organometallic Compounds/chemistry; Organometallic Compounds/chemical synthesis; Piperidines/chemistry; Selenious Acid/chemistry; Vanadium Compounds/chemistry; Hydrogen Bonding; Hydrogen-Ion Concentration; Models, Molecular; Molecular Structure; Spectroscopy, Fourier Transform Infrared; X-Ray Diffraction
12.
J Phys Chem A ; 123(15): 3239-3240, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30995844
13.
J Phys Chem A ; 118(33): 6457-65, 2014 Aug 21.
Article in English | MEDLINE | ID: mdl-24854987

ABSTRACT

Graphene is impermeable to gases, but introducing subnanometer pores can allow for selective gas separation. Because graphene is only one atom thick, tunneling can play an important role, especially for low-mass gases such as helium, and this has been proposed as a means of separating (3)He from (4)He. In this paper, we consider the possibility of utilizing resonant tunneling of helium isotopes through nanoporous graphene bilayers. Using a model potential fit to previously reported DFT potential energy surfaces, we calculate the thermal rate constant as a function of interlayer separation using a recently described time-independent method for arbitrary multibarrier potentials. Resonant transmission allows for the total flux rate of (3)He to remain the same as the best-known single-barrier pores but doubles the selectivity with respect to (4)He when the optimal interlayer spacing of 4.6 Å is used. The high flux rate and selectivity are robust against variations of the interlayer spacing and asymmetries in the potential that may occur in experiment.

14.
ACS Nano ; 18(22): 14514-14522, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38776469

ABSTRACT

Ligands play a critical role in the optical properties and chemical stability of colloidal nanocrystals (NCs), but identifying ligands that can enhance NC properties is daunting, given the high dimensionality of chemical space. Here, we use machine learning (ML) and robotic screening to accelerate the discovery of ligands that enhance the photoluminescence quantum yield (PLQY) of CsPbBr3 perovskite NCs. We developed a ML model designed to predict the relative PL enhancement of perovskite NCs when coordinated with a ligand selected from a pool of 29,904 candidate molecules. Ligand candidates were selected using an active learning (AL) approach that accounted for uncertainty quantified by twin regressors. After eight experimental iterations of batch AL (corresponding to 21 initial and 72 model-recommended ligands), the uncertainty of the model decreased, demonstrating an increased confidence in the model predictions. Feature importance and counterfactual analyses of model predictions illustrate the potential use of ligand field strength in designing PL-enhancing ligands. Our versatile AL framework can be readily adapted to screen the effect of ligands on a wide range of colloidal nanomaterials.

15.
Digit Discov ; 3(1): 23-33, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38239898

ABSTRACT

In light of the pressing need for practical materials and molecular solutions to renewable energy and health problems, to name just two examples, one wonders how to accelerate research and development in the chemical sciences, so as to address the time it takes to bring materials from initial discovery to commercialization. Artificial intelligence (AI)-based techniques, in particular, are having a transformative and accelerating impact on many, if not most, technological domains. To shed light on these questions, the authors and participants gathered in person for the ASLLA Symposium on the theme of 'Accelerated Chemical Science with AI' at Gangneung, Republic of Korea. We present the findings, ideas, comments, and often contentious opinions expressed during four panel discussions related to the respective general topics: 'Data', 'New applications', 'Machine learning algorithms', and 'Education'. All discussions were recorded, transcribed into text using OpenAI's Whisper, and summarized using LG AI Research's EXAONE LLM, followed by revision by all authors. For the broader benefit of current researchers, educators in higher education, and academic bodies such as associations, publishers, librarians, and companies, we provide chemistry-specific recommendations and summarize the resulting conclusions.

16.
Acta Crystallogr C Struct Chem ; 79(Pt 1): 12-17, 2023 Jan 01.
Article in English | MEDLINE | ID: mdl-36602016

ABSTRACT

The title compound, [Al4(CH3)8(C2H7N)2H2], crystallizes as eight-membered rings with -(CH3)2Al-(CH3)2N-(CH3)2Al- moieties connected by single hydride bridges. In the X-ray structure, the ring has a chair conformation, with the hydride H atoms being close to the plane through the four Al atoms. An optimized structure was also calculated by all-electron density functional theory (DFT) methods, which agrees with the X-ray structure but gives a somewhat different geometry for the hydride bridge. Charges on the individual atoms were determined by valence shell occupancy refinements using MoPro and also by DFT calculations analyzed by several different methods. All methods agree in assigning a positive charge to the Al atoms, negative charges to the C, N, and hydride H atoms, and small positive charges to the methyl H atoms.

17.
J Phys Chem B ; 127(37): 7964-7973, 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37682958

ABSTRACT

Aqueous, two-phase systems (ATPSs) may form upon mixing two solutions of independently water-soluble compounds. Many separation, purification, and extraction processes rely on ATPSs. Predicting the miscibility of solutions can accelerate and reduce the cost of the discovery of new ATPSs for these applications. Whereas previous machine learning approaches to ATPS prediction used physicochemical properties of each solute as a descriptor, in this work, we show how to impute missing miscibility outcomes directly from an incomplete collection of pairwise miscibility experiments. We use graph-regularized logistic matrix factorization (GR-LMF) to learn a latent vector of each solution from (i) the observed entries in the pairwise miscibility matrix and (ii) a graph where each node is a solution and edges are relationships indicating the general category of the solute (i.e., polymer, surfactant, salt, protein). For an experimental data set of the pairwise miscibility of 68 solutions from Peacock et al. [ACS Appl. Mater. Interfaces 2021, 13, 11449-11460], we find that GR-LMF more accurately predicts missing (im)miscibility outcomes of pairs of solutions than ordinary logistic matrix factorization and random forest classifiers that use physicochemical features of the solutes. GR-LMF obviates the need for features of the solutes and solutions to impute missing miscibility outcomes, but it cannot predict the miscibility of a new solution without some observations of its miscibility with other solutions in the training data set.

18.
Inorg Chem ; 51(20): 11040-8, 2012 Oct 15.
Article in English | MEDLINE | ID: mdl-23003324

ABSTRACT

A series of organically templated vanadium selenites has been prepared under mild hydrothermal conditions. Single crystals were grown from mixtures of VOSO4, SeO2, and either 1,4-dimethylpiperazine, 2,5-dimethylpiperazine, or 2-methylpiperazine in H2O. Each compound contains one-dimensional [VO(SeO3)(HSeO3)]n(n-) secondary building units, which connect to form three-dimensional frameworks in the presence of 2,5-dimethylpiperazine or 2-methylpiperazine. Differences in composition and both intra-secondary building unit and organic-inorganic hydrogen-bonding between compounds dictate the dimensionality of the resulting inorganic structures. [1,4-dimethylpiperazineH2][VO(SeO3)(HSeO3)]2 contains one-dimensional [VO(SeO3)(HSeO3)]n(n-) chains, while [2,5-dimethylpiperazineH2][VO(SeO3)(HSeO3)]2·2H2O contains a three-dimensional [VO(SeO3)(HSeO3)]n(n-) framework. The use of racemic 2-methylpiperazine also results in a compound containing a three-dimensional [VO(SeO3)(HSeO3)]n(n-) framework, crystallizing in the noncentrosymmetric, polar, achiral space group Pca2(1) (no. 29), while analogous reactions containing either (R)-2-methylpiperazine or (S)-2-methylpiperazine result in noncentrosymmetric, nonpolar chiral frameworks that crystallize in P2(1)2(1)2 (no. 18). The formation of these noncentrosymmetric framework materials is dictated by the structure, symmetry, and hydrogen-bonding properties of the [2-methylpiperazineH2](2+) cations.

19.
HardwareX ; 12: e00319, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35677813

ABSTRACT

The Sidekick is a desktop liquid dispenser, compatible with standard SBS microplates and designed for accessible laboratory automation. It features an armature-based motion system and a fully 3D-printed chassis to reduce overall mechanical complexity and accommodate user modification. Liquid dispensing is achieved with four commercially available solenoid driven positive displacement pumps that deliver liquid in 10 µL increments. A Raspberry Pi Pico RP2040 processor programmed in MicroPython is used for control, and exposes a USB serial interface for users to submit commands using either a simple vocabulary of commands or a subset of G-Code. At a total cost of $710 USD, the Sidekick offers laboratories an easy to build, easily maintained, open-source liquid dispensing system for both research and pedagogical introductions to lab automation.
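
Because the pumps deliver liquid in fixed 10 µL increments, any requested volume has to be translated into a whole number of pump strokes. The helper below is a hypothetical illustration of that arithmetic only; it is not part of the Sidekick firmware or command vocabulary, and the function name is invented.

```python
def strokes_for_volume(volume_ul, increment_ul=10):
    """Return (whole pump strokes, leftover volume in µL that cannot be delivered).

    increment_ul defaults to the 10 µL step stated in the abstract.
    """
    strokes, remainder = divmod(volume_ul, increment_ul)
    return strokes, remainder


print(strokes_for_volume(250))
print(strokes_for_volume(55))
```

A non-zero remainder signals that the requested volume is not reachable at this resolution and would need rounding to the nearest 10 µL.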

20.
Nat Rev Chem ; 6(5): 357-370, 2022 May.
Article in English | MEDLINE | ID: mdl-37117931

ABSTRACT

The physical sciences community is increasingly taking advantage of the possibilities offered by modern data science to solve problems in experimental chemistry and potentially to change the way we design, conduct and understand results from experiments. Successfully exploiting these opportunities involves considerable challenges. In this Expert Recommendation, we focus on experimental co-design and its importance to experimental chemistry. We provide examples of how data science is changing the way we conduct experiments, and we outline opportunities for further integration of data science and experimental chemistry to advance these fields. Our recommendations include establishing stronger links between chemists and data scientists; developing chemistry-specific data science methods; integrating algorithms, software and hardware to 'co-design' chemistry experiments from inception; and combining diverse and disparate data sources into a data network for chemistry research.
