Results 1 - 18 of 18
1.
Nat Chem ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862641

ABSTRACT

Conjugated organic photoredox catalysts (OPCs) can promote a wide range of chemical transformations. It is challenging to predict the catalytic activities of OPCs from first principles, either by expert knowledge or by using a priori calculations, as catalyst activity depends on a complex range of interrelated properties. Organic photocatalysts and other catalyst systems have often been discovered by a mixture of design and trial and error. Here we report a two-step data-driven approach to the targeted synthesis of OPCs and the subsequent reaction optimization for metallophotocatalysis, demonstrated for decarboxylative sp3-sp2 cross-coupling of amino acids with aryl halides. Our approach uses a Bayesian optimization strategy coupled with encoding of key physical properties using molecular descriptors to identify promising OPCs from a virtual library of 560 candidate molecules. This led to OPC formulations that are competitive with iridium catalysts by exploring just 2.4% of the available catalyst formulation space (107 of 4,500 possible reaction conditions).
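
As an illustration of how a Bayesian optimization loop can pick the next candidate to test, here is a minimal sketch. It assumes an expected-improvement acquisition function, which is a common choice but is not stated in the abstract, and the candidate names and surrogate predictions are hypothetical. The last lines only re-derive the 2.4% figure quoted above.

```python
import math

def normal_pdf(z):
    # Standard normal probability density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    # Standard normal cumulative distribution, via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` when maximizing (assumed acquisition)."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)

# Hypothetical surrogate predictions (mean, standard deviation) for untested candidates.
predictions = {
    "cand_A": (0.60, 0.05),
    "cand_B": (0.55, 0.20),
    "cand_C": (0.40, 0.01),
}
best_so_far = 0.58  # hypothetical best measured yield so far

next_candidate = max(
    predictions,
    key=lambda name: expected_improvement(*predictions[name], best_so_far),
)

# Fraction of the formulation space explored, from the numbers in the abstract.
explored_percent = 100.0 * 107 / 4500
```

In this toy case the uncertain candidate is preferred over the one with the highest predicted mean, which is the exploration behaviour that lets such a search cover only a small fraction of the space.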

2.
Curr Opin Struct Biol ; 87: 102826, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38733863

ABSTRACT

Biomolecular simulation can act as both a digital microscope and a crystal ball; offering the potential for a deeper understanding of experimental observations whilst also presenting a forward-looking avenue for the in silico design and evaluation of hitherto unsynthesized compounds. Indeed, as the intricacy of our scientific inquiries has grown, so too has the computational prowess we seek to deploy in our pursuit of answers. As we enter the Exascale era, this mini-review surveys the computational landscape from both the point of view of the development of new and ever more powerful systems, and the simulations that are run on them. Moreover, as we stand on the cusp of a transformative phase in computational biology, this article offers a contemplative glance into the future, speculating on the profound implications of artificial intelligence and quantum computing for large-scale biomolecular simulations.

3.
Angew Chem Int Ed Engl ; 62(3): e202214511, 2023 01 16.
Article in English | MEDLINE | ID: mdl-36346840

ABSTRACT

The optimization of multistep chemical syntheses is critical for the rapid development of new pharmaceuticals. However, concatenating individually optimized reactions can lead to inefficient multistep syntheses, owing to chemical interdependencies between the steps. Herein, we develop an automated continuous flow platform for the simultaneous optimization of telescoped reactions. Our approach is applied to a Heck cyclization-deprotection reaction sequence, used in the synthesis of a precursor for 1-methyltetrahydroisoquinoline C5 functionalization. A simple method for multipoint sampling with a single online HPLC instrument was designed, enabling accurate quantification of each reaction, and an in-depth understanding of the reaction pathways. Notably, integration of Bayesian optimization techniques identified an 81 % overall yield in just 14 h, and revealed a favorable competing pathway for formation of the desired product.


Subject(s)
Bayes Theorem; Cyclization
4.
J Chem Inf Model ; 62(19): 4660-4671, 2022 10 10.
Article in English | MEDLINE | ID: mdl-36112568

ABSTRACT

In molecular discovery and drug design, structure-property relationships and activity landscapes are often qualitatively or quantitatively analyzed to guide the navigation of chemical space. The roughness (or smoothness) of these molecular property landscapes is one of their most studied geometric attributes, as it can characterize the presence of activity cliffs, with rougher landscapes generally expected to pose tougher optimization challenges. Here, we introduce a general, quantitative measure for describing the roughness of molecular property landscapes. The proposed roughness index (ROGI) is loosely inspired by the concept of fractal dimension and strongly correlates with the out-of-sample error achieved by machine learning models on numerous regression tasks.


Subject(s)
Drug Design; Machine Learning
5.
J Chem Inf Model ; 62(16): 3854-3862, 2022 08 22.
Article in English | MEDLINE | ID: mdl-35938299

ABSTRACT

High-throughput virtual screening is an indispensable technique utilized in the discovery of small molecules. In cases where the library of molecules is exceedingly large, the cost of an exhaustive virtual screen may be prohibitive. Model-guided optimization has been employed to lower these costs through dramatic increases in sample efficiency compared to random selection. However, these techniques introduce new costs to the workflow through the surrogate model training and inference steps. In this study, we propose an extension to the framework of model-guided optimization that mitigates inference costs using a technique we refer to as design space pruning (DSP), which irreversibly removes poor-performing candidates from consideration. We study the application of DSP to a variety of optimization tasks and observe significant reductions in overhead costs while exhibiting similar performance to the baseline optimization. DSP represents an attractive extension of model-guided optimization that can limit overhead costs in optimization settings where these costs are non-negligible relative to objective costs, such as docking.


Subject(s)
High-Throughput Screening Assays; Workflow
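
The pruning idea in the abstract above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the objective, the 1-nearest-neighbour surrogate, the greedy acquisition, and all names and parameter values are hypothetical. It only shows candidates being removed irreversibly so that later surrogate inference runs over a shrinking pool.

```python
import random

def objective(x):
    # Hypothetical expensive objective (stand-in for e.g. a docking score); higher is better.
    return -(x - 0.7) ** 2

def surrogate_predict(x, observed):
    # Toy surrogate: reuse the score of the nearest already-evaluated point.
    nearest = min(observed, key=lambda p: abs(p - x))
    return observed[nearest]

def optimize_with_pruning(pool, n_iter=5, batch=5, prune_fraction=0.2, seed=0):
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Evaluate an initial random batch to seed the surrogate.
    observed = {x: objective(x) for x in pool[:batch]}
    pool = pool[batch:]
    for _ in range(n_iter):
        if not pool:
            break
        # Inference step: rank every remaining candidate by predicted score.
        pool.sort(key=lambda x: surrogate_predict(x, observed), reverse=True)
        # Pruning step: irreversibly drop the worst-predicted fraction.
        n_drop = int(len(pool) * prune_fraction)
        pool = pool[:len(pool) - n_drop]
        # Acquisition step (greedy): evaluate the top-ranked batch.
        chosen, pool = pool[:batch], pool[batch:]
        for x in chosen:
            observed[x] = objective(x)
    return observed, pool

candidates = [i / 100 for i in range(101)]
observed, remaining = optimize_with_pruning(candidates)
```

The trade-off is that a pruned candidate can never be revisited, so the pruning fraction governs how much inference cost is saved against the risk of discarding a good candidate that the surrogate mis-ranked.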
6.
PLoS One ; 17(2): e0263248, 2022.
Article in English | MEDLINE | ID: mdl-35196350

ABSTRACT

Inflammatory bowel diseases (IBDs), including ulcerative colitis and Crohn's disease, affect several million individuals worldwide. These diseases are heterogeneous at the clinical, immunological and genetic levels and result from complex host and environmental interactions. Investigating drug efficacy for IBD can improve our understanding of why treatment response can vary between patients. We propose an explainable machine learning (ML) approach that combines bioinformatics and domain insight, to integrate multi-modal data and predict inter-patient variation in drug response. Using explanation of our models, we interpret the ML models' predictions to infer unique combinations of important features associated with pharmacological responses obtained during preclinical testing of drug candidates in ex vivo patient-derived fresh tissues. Our inferred multi-modal features that are predictive of drug efficacy include multi-omic data (genomic and transcriptomic), demographic, medicinal and pharmacological data. Our aim is to understand variation in patient responses before a drug candidate moves forward to clinical trials. As a pharmacological measure of drug efficacy, we measured the reduction in the release of the inflammatory cytokine TNFα from the fresh IBD tissues in the presence/absence of test drugs. We initially explored the effects of a mitogen-activated protein kinase (MAPK) inhibitor; however, we later showed our approach can be applied to other targets, test drugs or mechanisms of interest. Our best model predicted TNFα levels from demographic, medicinal and genomic features with an error of only 4.98% on unseen patients. We incorporated transcriptomic data to validate insights from genomic features. 
Our results showed variations in drug effectiveness (measured by ex vivo assays) between patients that differed in gender, age or condition and linked new genetic polymorphisms to patient response variation to the anti-inflammatory treatment BIRB796 (Doramapimod). Our approach models IBD drug response while also identifying its most predictive features as part of a transparent ML precision medicine strategy.


Subject(s)
Colitis, Ulcerative/genetics; Colitis, Ulcerative/metabolism; Crohn Disease/genetics; Crohn Disease/metabolism; Genomics/methods; Machine Learning; Precision Medicine/methods; Adolescent; Adult; Aged; Anti-Inflammatory Agents, Non-Steroidal/pharmacology; Colitis, Ulcerative/pathology; Crohn Disease/pathology; Drug Evaluation, Preclinical/methods; Female; Humans; Male; Mesalamine/pharmacology; Middle Aged; Naphthalenes/pharmacology; Phenylurea Compounds/pharmacology; Prednisolone/pharmacology; Pyrazoles/pharmacology; Signal Transduction/drug effects; Transcriptome/genetics; Tumor Necrosis Factor-alpha/metabolism; Young Adult
7.
Proc Natl Acad Sci U S A ; 118(32), 2021 08 10.
Article in English | MEDLINE | ID: mdl-34353905

ABSTRACT

The circadian clock is an important adaptation to life on Earth. Here, we use machine learning to predict complex, temporal, and circadian gene expression patterns in Arabidopsis. Most significantly, we classify circadian genes using DNA sequence features generated de novo from public, genomic resources, facilitating downstream application of our methods with no experimental work or prior knowledge needed. We use local model explanation that is transcript specific to rank DNA sequence features, providing a detailed profile of the potential circadian regulatory mechanisms for each transcript. Furthermore, we can discriminate the temporal phase of transcript expression using the local, explanation-derived, and ranked DNA sequence features, revealing hidden subclasses within the circadian class. Model interpretation/explanation provides the backbone of our methodological advances, giving insight into biological processes and experimental design. Next, we use model interpretation to optimize sampling strategies when we predict circadian transcripts using reduced numbers of transcriptomic timepoints. Finally, we predict the circadian time from a single, transcriptomic timepoint, deriving marker transcripts that are most impactful for accurate prediction; this could facilitate the identification of altered clock function from existing datasets.


Subject(s)
Arabidopsis Proteins/genetics; Circadian Clocks/genetics; Circadian Rhythm/physiology; Machine Learning; Models, Biological; Apoproteins/genetics; Arabidopsis/genetics; Arabidopsis/physiology; Circadian Clocks/physiology; Circadian Rhythm/genetics; Ecotype; Gene Expression Profiling; Gene Expression Regulation, Plant; Phytochrome/genetics; Phytochrome A/genetics; Regulatory Sequences, Nucleic Acid
8.
Sci Adv ; 7(33), 2021 Aug.
Article in English | MEDLINE | ID: mdl-34389543

ABSTRACT

While energy-structure-function (ESF) maps are a powerful new tool for in silico materials design, the cost of acquiring an ESF map for many properties is too high for routine integration into high-throughput virtual screening workflows. Here, we propose the next evolution of the ESF map. This uses parallel Bayesian optimization to selectively acquire energy and property data, generating the same levels of insight at a fraction of the computational cost. We use this approach to obtain a two orders of magnitude speedup on an ESF study that focused on the discovery of molecular crystals for methane capture, saving more than 500,000 central processing unit hours from the original protocol. By accelerating the acquisition of insight from ESF maps, we pave the way for the use of these maps in automated ultrahigh-throughput screening pipelines by greatly reducing the opportunity risk associated with the choice of system to calculate.

9.
Sci Rep ; 11(1): 4565, 2021 02 25.
Article in English | MEDLINE | ID: mdl-33633172

ABSTRACT

Alterations in the human microbiome have been observed in a variety of conditions such as asthma, gingivitis, dermatitis and cancer, and much remains to be learned about the links between the microbiome and human health. The fusion of artificial intelligence with rich microbiome datasets can offer an improved understanding of the microbiome's role in human health. To gain actionable insights it is essential to consider both the predictive power and the transparency of the models by providing explanations for the predictions. We combine the collection of leg skin microbiome samples from two healthy cohorts of women with the application of an explainable artificial intelligence (EAI) approach that provides accurate predictions of phenotypes with explanations. The explanations are expressed in terms of variations in the relative abundance of key microbes that drive the predictions. We predict skin hydration, subject's age, pre/post-menopausal status and smoking status from the leg skin microbiome. The changes in microbial composition linked to skin hydration can accelerate the development of personalized treatments for healthy skin, while those associated with age may offer insights into the skin aging process. The leg microbiome signatures associated with smoking and menopausal status are consistent with previous findings from oral/respiratory tract microbiomes and vaginal/gut microbiomes respectively. This suggests that easily accessible microbiome samples could be used to investigate health-related phenotypes, offering potential for non-invasive diagnosis and condition monitoring. Our EAI approach sets the stage for new work focused on understanding the complex relationships between microbial communities and phenotypes. Our approach can be applied to predict any condition from microbiome samples and has the potential to accelerate the development of microbiome-based personalized therapeutics and non-invasive diagnostics.


Subject(s)
Artificial Intelligence; Biodiversity; Microbiota; Phenotype; Skin/microbiology; Adult; Aged; Aging; Computational Biology/methods; Data Analysis; Deep Learning; Female; Humans; Male; Menopause; Metagenome; Metagenomics/methods; Middle Aged; Smokers; Young Adult
10.
Phys Chem Chem Phys ; 22(23): 13041-13048, 2020 Jun 21.
Article in English | MEDLINE | ID: mdl-32478374

ABSTRACT

Chemical representations derived from deep learning are emerging as a powerful tool in areas such as drug discovery and materials innovation. Currently, this methodology has three major limitations - the cost of representation generation, risk of inherited bias, and the requirement for large amounts of data. We propose the use of multi-task learning in tandem with transfer learning to address these limitations directly. In order to avoid introducing unknown bias into multi-task learning through the task selection itself, we calculate task similarity through pairwise task affinity, and use this measure to programmatically select tasks. We test this methodology on several real-world data sets to demonstrate its potential for execution in complex and low-data environments. Finally, we utilise the task similarity to further probe the expressiveness of the learned representation through a comparison to a commonly used cheminformatics fingerprint, and show that the deep representation is able to capture more expressive task-based information.


Subject(s)
Deep Learning; Bromine/chemistry; Carbon/chemistry; Chlorine/chemistry; Fluorine/chemistry; Hydrogen/chemistry; Iodine/chemistry; Metals/chemistry; Nitrogen/chemistry; Oxygen/chemistry; Phosphorus/chemistry; Sulfur/chemistry
11.
Sci Rep ; 10(1): 9522, 2020 06 12.
Article in English | MEDLINE | ID: mdl-32533004

ABSTRACT

During the development of new drugs or compounds there is a requirement for preclinical trials, commonly involving animal tests, to ascertain the safety of the compound prior to human trials. Machine learning techniques could provide an in silico alternative to animal models for assessing drug toxicity, thus reducing expensive and invasive animal testing during clinical trials, for drugs that are most likely to fail safety tests. Here we present a machine learning model to predict kidney dysfunction, as a proxy for drug-induced renal toxicity, in rats. To achieve this, we use inexpensive transcriptomic profiles derived from human cell lines after chemical compound treatment to train our models combined with compound chemical structure information. Genomics data, owing to its sparse, high-dimensional and noisy nature, presents significant challenges in building trustworthy and transparent machine learning models. Here we address these issues by judiciously building feature sets from heterogeneous sources and coupling them with measures of model uncertainty achieved through Gaussian Process-based Bayesian models. We combine the use of insight into the feature-wise contributions to our predictions with the use of predictive uncertainties recovered from the Gaussian Process to improve the transparency and trustworthiness of the model.


Subject(s)
Drug-Related Side Effects and Adverse Reactions/genetics; Gene Expression Profiling; Machine Learning; Models, Theoretical; Animals; Humans; Quality Control; Uncertainty
12.
J Chem Inf Model ; 59(10): 4278-4288, 2019 10 28.
Article in English | MEDLINE | ID: mdl-31549507

ABSTRACT

We present a machine learning approach to automated force field development in dissipative particle dynamics (DPD). The approach employs Bayesian optimization to parametrize a DPD force field against experimentally determined partition coefficients. The optimization process covers a discrete space of over 40 000 000 points, where each point represents the set of potentials that jointly forms a force field. We find that Bayesian optimization is capable of reaching a force field of comparable performance to the current state-of-the-art within 40 iterations. The best iteration during the optimization achieves an R² of 0.78 and an RMSE of 0.63 log units on the training set of data; these metrics are maintained when a validation set is included, giving an R² of 0.8 and an RMSE of 0.65 log units. This work hence provides a proof-of-concept, expounding the utility of coupling automated and efficient global optimization with a top-down, data-driven approach to force field parametrization. Compared to commonly employed alternative methods, Bayesian optimization offers global parameter searching and a low time to solution.


Subject(s)
Machine Learning; Molecular Dynamics Simulation; Algorithms; Bayes Theorem; Chemical Engineering/methods; Thermodynamics
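
For readers unfamiliar with the two fit metrics quoted in the abstract above, the sketch below shows one common way to compute them. It assumes the coefficient-of-determination definition of R²; the abstract does not say whether that or the squared Pearson correlation was used, and the example values are made up.

```python
import math

def rmse(y_true, y_pred):
    # Root-mean-square error, in the same units as the data (e.g. log units).
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot (assumed definition).
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up reference values and predictions, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
example_rmse = rmse(measured, predicted)
example_r2 = r_squared(measured, predicted)
```

A constant offset like the one in this example leaves the correlation perfect but still lowers this R², which is one reason the choice of definition matters when comparing reported values.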
13.
Microbiome ; 7(1): 40, 2019 03 16.
Article in English | MEDLINE | ID: mdl-30878035

ABSTRACT

BACKGROUND: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. RESULTS: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed 'histosketch' that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using the pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality-sensitive hashing indexing scheme. Furthermore, we use a 'real life' example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. CONCLUSIONS: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. 
We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space (https://github.com/will-rowe/hulk).


Subject(s)
Bacteria/classification; Gastrointestinal Microbiome; Metagenomics/methods; Anti-Bacterial Agents/therapeutic use; Bacterial Infections/drug therapy; Cohort Studies; Humans; Infant, Newborn; Infant, Premature; Machine Learning; Sequence Analysis, DNA; Software
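
To make the quantities named in the abstract above concrete, the sketch below computes an exact k-mer spectrum and two Jaccard similarities between spectra. It is not HULK's algorithm: HULK estimates similarity from compact sketches of streaming data, whereas this toy version keeps every k-mer count in memory. The function names and example sequences are hypothetical.

```python
from collections import Counter

def kmer_spectrum(seq, k):
    # Count every overlapping substring of length k in the sequence.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def jaccard(spec_a, spec_b):
    # Set Jaccard: shared distinct k-mers over all distinct k-mers.
    set_a, set_b = set(spec_a), set(spec_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def weighted_jaccard(spec_a, spec_b):
    # Weighted Jaccard: sum of per-k-mer minimum counts over sum of maximum counts.
    keys = set(spec_a) | set(spec_b)
    if not keys:
        return 1.0
    num = sum(min(spec_a[key], spec_b[key]) for key in keys)
    den = sum(max(spec_a[key], spec_b[key]) for key in keys)
    return num / den

# Toy sequences, for illustration only.
spectrum_1 = kmer_spectrum("ACGTACGT", 3)
spectrum_2 = kmer_spectrum("ACGTTT", 3)
set_similarity = jaccard(spectrum_1, spectrum_2)
count_similarity = weighted_jaccard(spectrum_1, spectrum_2)
```

The weighted form takes k-mer abundance into account, which is the kind of information a histogram-based sketch is meant to preserve, while the set form only records presence or absence.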
14.
J Chem Phys ; 148(24): 241744, 2018 Jun 28.
Article in English | MEDLINE | ID: mdl-29960328

ABSTRACT

Simulation and data analysis have evolved into powerful methods for discovering and understanding molecular modes of action and designing new compounds to exploit these modes. The combination provides a strong impetus to create and exploit new tools and techniques at the interfaces between physics, biology, and data science as a pathway to new scientific insight and accelerated discovery. In this context, we explore the rational design of novel antimicrobial peptides (short protein sequences exhibiting broad activity against multiple species of bacteria). We show how datasets can be harvested to reveal features which inform new design concepts. We introduce new analysis and visualization tools: a graphical representation of the k-mer spectrum as a fundamental property encoded in antimicrobial peptide databases and a data-driven representation to illustrate membrane binding and permeation of helical peptides.


Subject(s)
Anti-Bacterial Agents/chemistry; Antimicrobial Cationic Peptides/chemistry; Data Mining; Databases, Protein; Membranes/chemistry; Natural Science Disciplines; Bacteria/metabolism; Drug Discovery; Membranes/metabolism
15.
Sci Data ; 3: 160086, 2016 Sep 27.
Article in English | MEDLINE | ID: mdl-27676312

ABSTRACT

The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

16.
Acta Crystallogr B Struct Sci Cryst Eng Mater ; 72(Pt 4): 477-87, 2016 08 01.
Article in English | MEDLINE | ID: mdl-27484370

ABSTRACT

We present a re-parameterization of a popular intermolecular force field for describing intermolecular interactions in the organic solid state. Specifically we optimize the performance of the exp-6 force field when used in conjunction with atomic multipole electrostatics. We also parameterize force fields that are optimized for use with multipoles derived from polarized molecular electron densities, to account for induction effects in molecular crystals. Parameterization is performed against a set of 186 experimentally determined, low-temperature crystal structures and 53 measured sublimation enthalpies of hydrogen-bonding organic molecules. The resulting force fields are tested on a validation set of 129 crystal structures and show improved reproduction of the structures and lattice energies of a range of organic molecular crystals compared with the original force field with atomic partial charge electrostatics. Unit-cell dimensions of the validation set are typically reproduced to within 3% with the re-parameterized force fields. Lattice energies, which were all included during parameterization, are systematically underestimated when compared with measured sublimation enthalpies, with mean absolute errors of between 7.4 and 9.0%.

17.
J Am Chem Soc ; 136(4): 1438-48, 2014 Jan 29.
Article in English | MEDLINE | ID: mdl-24410310

ABSTRACT

Small structural changes in organic molecules can have a large influence on solid-state crystal packing, and this often thwarts attempts to produce isostructural series of crystalline solids. For metal-organic frameworks and covalent organic frameworks, this has been addressed by using strong, directional intermolecular bonding to create families of isoreticular solids. Here, we show that an organic directing solvent, 1,4-dioxane, has a dominant effect on the lattice energy for a series of organic cage molecules. Inclusion of dioxane directs the crystal packing for these cages away from their lowest-energy polymorphs to form isostructural, 3-dimensional diamondoid pore channels. This is a unique function of the size, chemical function, and geometry of 1,4-dioxane, and hence, a noncovalent auxiliary interaction assumes the role of directional coordination bonding or covalent bonding in extended crystalline frameworks. For a new cage, CC13, a dual, interpenetrating pore structure is formed that doubles the gas uptake and the surface area in the resulting dioxane-directed crystals.

18.
J Am Chem Soc ; 135(25): 9307-10, 2013 Jun 26.
Article in English | MEDLINE | ID: mdl-23745577

ABSTRACT

We synthesize a series of imine cage molecules where increasing the chain length of the alkanediamine precursor results in an odd-even alternation between [2 + 3] and [4 + 6] cage macrocycles. A computational procedure is developed to predict the thermodynamically preferred product and the lowest energy conformer, hence rationalizing the observed alternation and the 3D cage structures, based on knowledge of the precursors alone.


Subject(s)
Imines/chemical synthesis; Crystallography, X-Ray; Cyclization; Imines/chemistry; Macromolecular Substances/chemical synthesis; Macromolecular Substances/chemistry; Models, Molecular; Molecular Structure; Thermodynamics