Results 1 - 13 of 13
2.
Microsc Microanal ; 29(2): 552-562, 2023 Apr 05.
Article in English | MEDLINE | ID: mdl-37749717

ABSTRACT

The information content of atomic-resolution scanning transmission electron microscopy (STEM) images can often be reduced to a handful of parameters describing each atomic column, chief among which is the column position. Neural networks (NNs) are high-performance, computationally efficient methods to automatically locate atomic columns in images, which has led to a profusion of NN models and associated training datasets. We have developed a benchmark dataset of simulated and experimental STEM images and used it to evaluate the performance of two sets of recent NN models for atom location in STEM images. Both models exhibit high performance for images of varying quality from several different crystal lattices. However, there are important differences in performance as a function of image quality, and both models perform poorly for images outside the training data, such as interfaces with a large difference in background intensity. Both the benchmark dataset and the models are available using the Foundry service for dissemination, discovery, and reuse of machine learning models.

4.
Sci Data ; 10(1): 356, 2023 06 05.
Article in English | MEDLINE | ID: mdl-37277408

ABSTRACT

The availability of materials data for impact-mitigating materials has lagged behind applications-based data. For example, data describing on-field helmeted impacts are available, whereas material behaviors for the constituent impact-mitigating materials used in helmet designs lack open datasets. Here, we describe a new FAIR (findable, accessible, interoperable, reusable) data framework with structural and mechanical response data for one example elastic impact protection foam. The continuum-scale behavior of foams emerges from the interplay of polymer properties, internal gas, and geometric structure. This behavior is rate- and temperature-sensitive; therefore, describing structure-property characteristics requires data collected across several types of instruments. Data included are from structure imaging via micro-computed tomography, finite deformation mechanical measurements from universal test systems with full-field displacement and strain, and visco-thermo-elastic properties from dynamic mechanical analysis. These data facilitate modeling and design efforts in foam mechanics, e.g., homogenization, direct numerical simulation, or phenomenological fitting. The data framework is implemented using data services and software from the Materials Data Facility of the Center for Hierarchical Materials Design.

5.
Sci Rep ; 13(1): 2105, 2023 02 06.
Article in English | MEDLINE | ID: mdl-36747041

ABSTRACT

Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50 k predictions per GPU second). We showcase a workflow for docking utilizing surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate of less than 0.01% in detecting the underlying best-scoring 0.1% of compounds. Our analysis of the speedup shows that another order-of-magnitude speedup must come from model accuracy rather than computing speed. In order to drive another order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100× or even 1000× faster than current techniques and reduce missing top hits. The technique outlined aims to be a fast drop-in replacement for docking for screening billion-scale molecular libraries.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/metabolism , Artificial Intelligence , Molecular Docking Simulation , Ligands , Proteins/metabolism
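The pre-filter idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' workflow: the function name `prefilter_recall`, the synthetic scores, the noise level, and the 10% keep fraction are all assumptions made for illustration. The sketch ranks a library with a cheap surrogate score, keeps only the best-ranked fraction for full docking, and measures how many of the true top 0.1% survive. It also reproduces the throughput arithmetic stated in the abstract (1 billion molecules at 50 k predictions per GPU second).

```python
import random


def prefilter_recall(true_scores, surrogate_scores, keep_fraction, top_fraction):
    """Fraction of the true top-scoring compounds that survive a surrogate pre-filter.

    Lower scores are treated as better, as is conventional for docking scores.
    """
    n = len(true_scores)
    n_keep = max(1, int(n * keep_fraction))
    n_top = max(1, int(n * top_fraction))
    # Indices the cheap surrogate ranks best; only these would go on to full docking.
    kept = set(sorted(range(n), key=lambda i: surrogate_scores[i])[:n_keep])
    # Indices that full docking would rank best if run on the entire library.
    true_top = sorted(range(n), key=lambda i: true_scores[i])[:n_top]
    return sum(1 for i in true_top if i in kept) / n_top


# Synthetic library (hypothetical data): surrogate = true score plus Gaussian noise.
rng = random.Random(0)
true_scores = [rng.gauss(-7.0, 1.5) for _ in range(20000)]
surrogate_scores = [s + rng.gauss(0.0, 0.5) for s in true_scores]

recall = prefilter_recall(true_scores, surrogate_scores,
                          keep_fraction=0.1, top_fraction=0.001)
print(f"recall of true top 0.1% after keeping 10%: {recall:.3f}")

# Throughput arithmetic from the abstract.
gpu_seconds = 1_000_000_000 / 50_000
print(f"{gpu_seconds:.0f} GPU seconds, about {gpu_seconds / 3600:.1f} GPU hours")
```

In this toy setting the miss rate is `1 - recall`. Lowering the surrogate noise raises recall at a fixed keep fraction, which is consistent with the abstract's point that further speedup depends on surrogate accuracy.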
6.
Sci Data ; 9(1): 657, 2022 Nov 10.
Article in English | MEDLINE | ID: mdl-36357431

ABSTRACT

A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state-of-practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data and AI models within a unified computational framework combining the following elements: the Advanced Photon Source at Argonne National Laboratory, the Materials Data Facility, the Data and Learning Hub for Science, funcX, and the Argonne Leadership Computing Facility (ALCF), in particular the ThetaGPU supercomputer and the SambaNova DataScale® system at the ALCF AI Testbed. We describe how this domain-agnostic computational framework may be harnessed to enable autonomous AI-driven discovery.

7.
Patterns (N Y) ; 3(10): 100606, 2022 Oct 14.
Article in English | MEDLINE | ID: mdl-36277824

ABSTRACT

Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Thus, methods are required for configuring and running distributed computing pipelines (what we call flows) that link instruments, computers (e.g., for analysis, simulation, artificial intelligence [AI] model training), edge computing (e.g., for analysis), data stores, metadata catalogs, and high-speed networks. We review common patterns associated with such flows and describe methods for instantiating these patterns. We present experiences with the application of these methods to the processing of data from five different scientific instruments, each of which engages powerful computers for data inversion, model training, or other purposes. We also discuss implications of such methods for operators and users of scientific facilities.

8.
Front Chem ; 10: 962161, 2022.
Article in English | MEDLINE | ID: mdl-36186597

ABSTRACT

Bioinspired photocatalysis has resulted in efficient solutions for many areas of science and technology spanning from solar cells to medicine. Here we show a new bioinspired semiconductor nanocomposite (nanoTiO2-DOPA-luciferase, TiDoL) capable of converting light energy within cancerous tissues into chemical species that are highly disruptive to cell metabolism and lead to cell death. This localized activity of semiconductor nanocomposites is triggered by cancer-generated activators. Adenosine triphosphate (ATP) is produced in excess in cancer tissues only and activates nearby immobilized TiDoL composites, thereby eliminating its off-target toxicity. The interaction of TiDoL with cancerous cells was probed in situ and in real-time to establish a detailed mechanism of nanoparticle activation, triggering of the apoptotic signaling cascade, and finally, cancer cell death. Activation of TiDoL with non-cancerous cells did not result in cell toxicity. Exploring the activation of antibody-targeted semiconductor conjugates using ATP is a step toward a universal approach to single-cell-targeted medical therapies with more precision, efficacy, and potentially fewer side effects.

9.
J Chem Inf Model ; 62(1): 116-128, 2022 01 10.
Article in English | MEDLINE | ID: mdl-34793155

ABSTRACT

Despite the recent availability of vaccines against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance, especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-CoV-2 main protease (Mpro) by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5 million molecules that could be readily ordered and purchased. Our HTVS framework leverages the U.S. supercomputing infrastructure, achieving nearly 91% resource utilization and nearly 126 million docking calculations per hour. Downstream biochemical assays validate this Mpro inhibitor with an inhibition constant (Ki) of 2.9 µM (95% CI 2.2, 4.0). Furthermore, using room-temperature X-ray crystallography, we show that MCULE-5948770040 binds to a cleft in the primary binding site of Mpro, forming stable hydrogen bond and hydrophobic interactions. We then used multiple µs-time scale molecular dynamics (MD) simulations and machine learning (ML) techniques to elucidate how the bound ligand alters the conformational states accessed by Mpro, involving motions both proximal and distal to the binding site. Together, our results demonstrate how MCULE-5948770040 inhibits Mpro and offer a springboard for further therapeutic design.


Subject(s)
COVID-19 , Protease Inhibitors , Antiviral Agents , Coronavirus 3C Proteases , Humans , Molecular Docking Simulation , Molecular Dynamics Simulation , Orotic Acid/analogs & derivatives , Piperazines , SARS-CoV-2
10.
J Chem Phys ; 155(15): 154702, 2021 Oct 21.
Article in English | MEDLINE | ID: mdl-34686040

ABSTRACT

Recent machine learning models for bandgap prediction that explicitly encode the structure information into the model feature set significantly improve the model accuracy compared to both traditional machine learning and non-graph-based deep learning methods. The ongoing rapid growth of open-access bandgap databases can benefit such model construction not only by expanding their domain of applicability but also by requiring constant updating of the model. Here, we build a new state-of-the-art multi-fidelity graph network model for bandgap prediction of crystalline compounds from a large bandgap database of experimental and density functional theory (DFT) computed bandgaps with over 806 600 entries (1500 experimental, 775 700 low-fidelity DFT, and 29 400 high-fidelity DFT). The model predicts bandgaps with a 0.23 eV mean absolute error in cross validation for high-fidelity data, and including the mixed data from all different fidelities improves the prediction of the high-fidelity data. The prediction error is smaller for high-symmetry crystals than for low-symmetry crystals. Our data are published through a new cloud-based computing environment, called the "Foundry," which supports easy creation and revision of standardized data structures and will enable cloud accessible containerized models, allowing for continuous model development and data accumulation in the future.

11.
Front Mol Biosci ; 8: 636077, 2021.
Article in English | MEDLINE | ID: mdl-34527701

ABSTRACT

Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A promising source of candidates for such studies is molecules that have been reported in the scientific literature to be drug-like in the context of viral research. However, this literature is too large for human review and features unusual vocabularies for which existing named entity recognition (NER) models are ineffective. We report here on a project that leverages both human and artificial intelligence to detect references to such molecules in free text. We present 1) an iterative model-in-the-loop method that makes judicious use of scarce human expertise in generating training data for a NER model, and 2) the application and evaluation of this method to the problem of identifying drug-like molecules in the COVID-19 Open Research Dataset Challenge (CORD-19) corpus of 198,875 papers. We show that by repeatedly presenting human labelers only with samples for which an evolving NER model is uncertain, our human-machine hybrid pipeline requires only modest amounts of non-expert human labeling time (tens of hours to label 1778 samples) to generate an NER model with an F-1 score of 80.5% (on par with that of non-expert humans) and, when applied to CORD-19, identifies 10,912 putative drug-like molecules. This enriched the computational screening team's targets by 3,591 molecules, of which 18 ranked in the top 0.1% of all 6.6 million molecules screened for docking against the 3CLPro protein.
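The model-in-the-loop labeling described above follows the general pattern of uncertainty sampling, which can be sketched as follows. This is a toy illustration only: the abstract does not give the authors' selection rule or model, so the "closest to 0.5" criterion, the one-dimensional threshold classifier standing in for an NER model, and every name here (`select_uncertain`, `model_in_the_loop`, `predict`, `train`, the oracle) are assumptions.

```python
import math


def select_uncertain(probabilities, batch_size):
    """Return positions of the samples whose predicted probability is closest to 0.5."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]


def model_in_the_loop(pool, predict, train, ask_human, rounds, batch_size):
    """Alternate between asking a human to label uncertain samples and retraining."""
    labeled = {}   # pool index -> human label
    model = None   # no model exists before the first round
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        probs = [predict(model, pool[i]) for i in unlabeled]
        for j in select_uncertain(probs, batch_size):
            idx = unlabeled[j]
            labeled[idx] = ask_human(pool[idx])
        model = train([(pool[i], y) for i, y in labeled.items()])
    return model, labeled


# Toy stand-ins (hypothetical): the "model" is a single decision threshold on a number.
def predict(model, x):
    if model is None:
        return 0.5  # an untrained model is maximally uncertain everywhere
    return 1.0 / (1.0 + math.exp(-10.0 * (x - model)))


def train(examples):
    pos = [x for x, y in examples if y]
    neg = [x for x, y in examples if not y]
    if not pos or not neg:
        return 0.5  # fall back to a default threshold until both classes are seen
    return (min(pos) + max(neg)) / 2.0


pool = [i / 100 for i in range(100)]
model, labeled = model_in_the_loop(pool, predict, train,
                                   ask_human=lambda x: x >= 0.52,
                                   rounds=5, batch_size=4)
print(f"learned threshold: {model:.3f} from {len(labeled)} human labels")
```

The design point is that human effort is spent only where the current model is unsure, so the decision boundary is located with a small fraction of the pool labeled.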

12.
J Phys Chem A ; 125(27): 5990-5998, 2021 Jul 15.
Article in English | MEDLINE | ID: mdl-34191512

ABSTRACT

The solvation properties of molecules, often estimated using quantum chemical simulations, are important in the synthesis of energy storage materials, drugs, and industrial chemicals. Here, we develop machine learning models of solvation energies to replace expensive quantum chemistry calculations with inexpensive-to-compute message-passing neural network models that require only the molecular graph as input. Our models are trained on a new database of solvation energies for 130,258 molecules taken from the QM9 dataset computed in five solvents (acetone, ethanol, acetonitrile, dimethyl sulfoxide, and water) via an implicit solvent model. Our best model achieves a mean absolute error of 0.5 kcal/mol for molecules with nine or fewer non-hydrogen atoms and 1 kcal/mol for molecules with between 10 and 14 non-hydrogen atoms. We make the entire dataset of 651,290 computed entries openly available and provide simple web and programmatic interfaces to enable others to run our solvation energy model on new molecules. This model calculates the solvation energies for molecules using only the SMILES string and also provides an estimate of whether each molecule is within the domain of applicability of our model. We envision that the dataset and models will provide the functionality needed for the rapid screening of large chemical spaces to discover improved molecules for many applications.
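As a quick consistency check on the figures in this abstract, the short sketch below confirms that 130,258 molecules in five solvents give the stated 651,290 entries. It also spells out the mean absolute error metric used to report accuracy. The function name `mean_absolute_error` and the example energies are illustrative assumptions, not data from the paper.

```python
# Figures taken from the abstract.
n_molecules = 130_258
solvents = ["acetone", "ethanol", "acetonitrile", "dimethyl sulfoxide", "water"]
total_entries = n_molecules * len(solvents)
print(total_entries)  # 651290, matching the stated dataset size


def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired values (e.g. in kcal/mol)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical solvation energies in kcal/mol, for illustration only.
example_mae = mean_absolute_error([-4.0, -6.5, -2.0], [-4.5, -6.0, -2.5])
print(example_mae)  # 0.5
```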

13.
J Phys Chem Lett ; 10(21): 6835-6841, 2019 Nov 07.
Article in English | MEDLINE | ID: mdl-31642678

ABSTRACT

This letter announces the Virtual Excited State Reference for the Discovery of Electronic Materials Database (VERDE materials DB), the first database to include downloadable excited-state structures (S0, S1, T1) and photophysical properties. VERDE materials DB is searchable, open-access via www.verdedb.org, and focused on light-responsive π-conjugated organic molecules with applications in green chemistry, organic solar cells, and organic redox flow batteries. It includes results of our active and past virtual screening studies; to date, more than 13 000 density functional theory (DFT) calculations have been performed on 1 500 molecules to obtain frontier molecular orbitals and photophysical properties, including excitation energies, dipole moments, and redox potentials. To improve community access, we have made VERDE materials DB available via an integration with the Materials Data Facility. We are leveraging VERDE materials DB to train machine learning algorithms to identify new materials and structure-property relationships between molecular ground and excited states. We present a case study involving photoaffinity labels, including predictions of new diazirine-based photoaffinity labels anticipated to have high photostabilities.
