Search | BVS Colombia Search Portal

1.

The Societal and Scientific Importance of Inclusivity, Diversity, and Equity in Machine Learning for Chemistry.

Probst, Daniel.

Chimia (Aarau) ; 77(1-2): 56-61, 2023 Feb 22.

Article in English | MEDLINE | ID: mdl-38047854

ABSTRACT

While the introduction of practical deep learning has driven progress across scientific fields, recent research highlighted that the requirement of deep learning for ever-increasing computational resources and data has potential negative impacts on the scientific community and society as a whole. An ever-growing need for more computational resources may exacerbate the concentration of funding, the exclusiveness of research, and thus the inequality between countries, sectors, and institutions. Here, I introduce recent concerns and considerations of the machine learning research community that could affect chemistry and present potential solutions, including more detailed assessments of model performance, increased adherence to open science and open data practices, an increase in multinational and multi-institutional collaboration, and a focus on thematic and cultural diversity.

2.

Fuelling the Digital Chemistry Revolution with Language Models.

Cardinale, Antonio; Castrogiovanni, Alessandro; Gaudin, Theophile; Geluykens, Joppe; Laino, Teodoro; Manica, Matteo; Probst, Daniel; Schwaller, Philippe; Sobczyk, Aleksandros; Toniato, Alessandra; Vaucher, Alain C; Wolf, Heiko; Zipoli, Federico.

Chimia (Aarau) ; 77(7-8): 484-488, 2023 Aug 09.

Article in English | MEDLINE | ID: mdl-38047789

ABSTRACT

The RXN for Chemistry project, initiated by IBM Research Europe - Zurich in 2017, aimed to develop a series of digital assets using machine learning techniques to promote the use of data-driven methodologies in synthetic organic chemistry. This research adopts an innovative concept by treating chemical reaction data as language records, treating the prediction of a synthetic organic chemistry reaction as a translation task between precursor and product languages. Over the years, the IBM Research team has successfully developed language models for various applications including forward reaction prediction, retrosynthesis, reaction classification, atom-mapping, procedure extraction from text, inference of experimental protocols and its use in programming commercial automation hardware to implement an autonomous chemical laboratory. Furthermore, the project has recently incorporated biochemical data in training models for greener and more sustainable chemical reactions. The remarkable ease of constructing prediction models and continually enhancing them through data augmentation with minimal human intervention has led to the widespread adoption of language model technologies, facilitating the digitalization of chemistry in diverse industrial sectors such as pharmaceuticals and chemical manufacturing. This manuscript provides a concise overview of the scientific components that contributed to the prestigious Sandmeyer Award in 2022.

3.

FUn: a framework for interactive visualizations of large, high-dimensional datasets on the web.

Probst, Daniel; Reymond, Jean-Louis.

Bioinformatics ; 34(8): 1433-1435, 2018 Apr 15.

Article in English | MEDLINE | ID: mdl-29186333

ABSTRACT

Motivation: During the past decade, big data have become a major tool in scientific endeavors. Although statistical methods and algorithms are well-suited for analyzing and summarizing enormous amounts of data, the results do not allow for a visual inspection of the entire data. Current scientific software, including R packages and Python libraries such as ggplot2, matplotlib and plot.ly, does not support interactive visualizations of datasets exceeding 100 000 data points on the web. Other solutions enable the web-based visualization of big data only through data reduction or statistical representations. However, recent hardware developments, especially advancements in graphical processing units, allow for the rendering of millions of data points on a wide range of consumer hardware such as laptops, tablets and mobile phones. Similar to the challenges and opportunities brought to virtually every scientific field by big data, the visualization of and interaction with copious amounts of data are both demanding and hold great promise. Results: Here we present FUn, a framework consisting of a client (Faerun) and server (Underdark) module, facilitating the creation of web-based, interactive 3D visualizations of large datasets, enabling record level visual inspection. We also introduce a reference implementation providing access to SureChEMBL, a database containing patent information on more than 17 million chemical compounds. Availability and implementation: The source code and the most recent builds of Faerun and Underdark, Lore.js and the data preprocessing toolchain used in the reference implementation, are available on the project website (http://doc.gdb.tools/fun/). Contact: daniel.probst@dcb.unibe.ch or jean-louis.reymond@dcb.unibe.ch.

Subject(s)

Data Visualization; Software; Algorithms; Databases, Factual; Internet

4.

Exploring Chemical Space with Machine Learning.

Arús-Pous, Josep; Awale, Mahendra; Probst, Daniel; Reymond, Jean-Louis.

Chimia (Aarau) ; 73(12): 1018-1023, 2019 Dec 18.

Article in English | MEDLINE | ID: mdl-31883554

ABSTRACT

Chemical space is a concept to organize molecular diversity by postulating that different molecules occupy different regions of a mathematical space where the position of each molecule is defined by its properties. Our aim is to develop methods to explicitly explore chemical space in the area of drug discovery. Here we review our implementations of machine learning in this project, including our use of deep neural networks to enumerate the GDB13 database from a small sample set, to generate analogs of drugs and natural products after training with fragment-size molecules, and to predict the polypharmacology of molecules after training with known bioactive compounds from ChEMBL. We also discuss visualization methods for big data as a means to keep track of and learn from machine learning results. Computational tools discussed in this review are freely available at http://gdb.unibe.ch and https://github.com/reymond-group.

5.

SmilesDrawer: Parsing and Drawing SMILES-Encoded Molecular Structures Using Client-Side JavaScript.

Probst, Daniel; Reymond, Jean-Louis.

J Chem Inf Model ; 58(1): 1-7, 2018 Jan 22.

Article in English | MEDLINE | ID: mdl-29257869

ABSTRACT

Here we present SmilesDrawer, a dependency-free JavaScript component capable of both parsing and drawing SMILES-encoded molecular structures client-side, developed to be easily integrated into web projects and to display organic molecules in large numbers and fast succession. SmilesDrawer can draw structurally and stereochemically complex structures such as maitotoxin and C60 without using templates, yet has an exceptionally small computational footprint and low memory usage without the requirement for loading images or any other form of client-server communication, making it easy to integrate even in secure (intranet, firewalled) or offline applications. These features allow the rendering of thousands of molecular structure drawings on a single web page within seconds on a wide range of hardware supporting modern browsers. The source code as well as the most recent build of SmilesDrawer is available on GitHub (http://doc.gdb.tools/smilesDrawer/). Both yarn and npm packages are also available.

Subject(s)

Computer-Aided Design; Molecular Structure; Programming Languages; Software; Internet; Organic Chemicals/chemistry; Stereoisomerism; User-Computer Interface

6.

Exploring DrugBank in Virtual Reality Chemical Space.

Probst, Daniel; Reymond, Jean-Louis.

J Chem Inf Model ; 58(9): 1731-1735, 2018 Sep 24.

Article in English | MEDLINE | ID: mdl-30114367

ABSTRACT

The recent general availability of low-cost virtual reality headsets and accompanying three-dimensional (3D) engine support presents an opportunity to bring the concept of chemical space into virtual environments. While virtual reality applications represent a category of widespread tools in other fields, their use in the visualization and exploration of abstract data such as chemical spaces has been experimental. In our previous work, we established the concept of interactive two-dimensional (2D) maps of chemical spaces followed by interactive web-based 3D visualizations, culminating in the interactive web-based 3D visualization of extremely large chemical spaces. Virtual reality chemical spaces are a natural extension of these concepts. As 2D and 3D embeddings and projections of high-dimensional chemical fingerprint spaces have been shown to be valuable tools in chemical space visualization and exploration, existing pipelines of data mining and preparation can be extended to be used in virtual reality applications. Here we present an application based on the Unity engine and the Virtual Reality Toolkit, allowing for the interactive exploration of chemical space populated by DrugBank compounds in virtual reality. The source code of the application as well as the most recent build are available on GitHub (https://github.com/reymond-group/virtual-reality-chemical-space).

Subject(s)

Databases, Factual; Virtual Reality; Imaging, Three-Dimensional; Models, Molecular; Molecular Structure; Software; User-Computer Interface

7.

WebMolCS: A Web-Based Interface for Visualizing Molecules in Three-Dimensional Chemical Spaces.

Awale, Mahendra; Probst, Daniel; Reymond, Jean-Louis.

J Chem Inf Model ; 57(4): 643-649, 2017 Apr 24.

Article in English | MEDLINE | ID: mdl-28316236

ABSTRACT

The concept of chemical space provides a convenient framework to analyze large collections of molecules by placing them in property spaces where distances represent similarities. Here we report webMolCS, a new type of web-based interface visualizing up to 5000 user-defined molecules in six different three-dimensional (3D) chemical spaces obtained by principal component analysis or similarity mapping of multidimensional property spaces describing composition (MQN: 42D molecular quantum numbers, SMIfp: 34D SMILES fingerprint), shapes and pharmacophores (APfp: 20D atom pair fingerprint, Xfp: 55D category extended atom pair fingerprint), and substructures (Sfp: 1024D binary substructure fingerprint, ECfp4: 1024D extended connectivity fingerprint). Each molecule is shown as a sphere, and its structure appears on mouse over. The sphere is color-coded by similarity to the first compound in the list, by the list rank, or by a user-defined value, which reveals the relationship between any property encoded by these values and structural similarities. WebMolCS is freely available at www.gdb.unibe.ch.

Subject(s)

Internet; Models, Molecular; User-Computer Interface; Molecular Conformation

8.

Chemical Space: Big Data Challenge for Molecular Diversity.

Awale, Mahendra; Visini, Ricardo; Probst, Daniel; Arús-Pous, Josep; Reymond, Jean-Louis.

Chimia (Aarau) ; 71(10): 661-666, 2017 Oct 25.

Article in English | MEDLINE | ID: mdl-29070411

ABSTRACT

Chemical space describes all possible molecules as well as multi-dimensional conceptual spaces representing the structural diversity of these molecules. Part of this chemical space is available in public databases ranging from thousands to billions of compounds. Exploiting these databases for drug discovery represents a typical big data problem limited by computational power, data storage and data access capacity. Here we review recent developments of our laboratory, including progress in the chemical universe databases (GDB) and the fragment subset FDB-17, tools for ligand-based virtual screening by nearest neighbor searches, such as our multi-fingerprint browser for the ZINC database to select purchasable screening compounds, and their application to discover potent and selective inhibitors for calcium channel TRPV6 and Aurora A kinase, the polypharmacology browser (PPB) for predicting off-target effects, and finally interactive 3D-chemical space visualization using our online tools WebDrugCS and WebMolCS. All resources described in this paper are available for public use at www.gdb.unibe.ch.

Subject(s)

Databases, Chemical; Drug Discovery

9.

Communicating Near Real-Time Data During the COVID-19 Pandemic.

Probst, Daniel.

Chimia (Aarau) ; 74(7): 613-614, 2020 Aug 12.

Article in English | MEDLINE | ID: mdl-32778214

Subject(s)

Communication; Coronavirus Infections; Data Collection; Information Dissemination; Pandemics; Pneumonia, Viral; Betacoronavirus; COVID-19; Humans; SARS-CoV-2

10.

Multicenter implementation of a severe sepsis and septic shock treatment bundle.

Miller, Russell R; Dong, Li; Nelson, Nancy C; Brown, Samuel M; Kuttler, Kathryn G; Probst, Daniel R; Allen, Todd L; Clemmer, Terry P.

Am J Respir Crit Care Med ; 188(1): 77-82, 2013 Jul 01.

Article in English | MEDLINE | ID: mdl-23631750

ABSTRACT

RATIONALE: Severe sepsis and septic shock are leading causes of intensive care unit (ICU) admission, morbidity, and mortality. The effect of compliance with sepsis management guidelines on outcomes is unclear. OBJECTIVES: To assess the effect on mortality of compliance with a severe sepsis and septic shock management bundle. METHODS: Observational study of a severe sepsis and septic shock bundle as part of a quality improvement project in 18 ICUs in 11 hospitals in Utah and Idaho. MEASUREMENTS AND MAIN RESULTS: Among 4,329 adult subjects with severe sepsis or septic shock admitted to study ICUs from the emergency department between January 2004 and December 2010, hospital mortality was 12.1%, declining from 21.2% in 2004 to 8.7% in 2010. All-or-none total bundle compliance increased from 4.9% to 73.4% over the same period. Mortality declined from 21.7% in 2004 to 9.7% in 2010 among subjects noncompliant with one or more bundle elements. Regression models adjusting for age, severity of illness, and comorbidities identified an association between mortality and compliance with each of inotropes and red cell transfusions, glucocorticoids, and lung-protective ventilation. Compliance with early resuscitation elements during the first 3 hours after emergency department admission caused ineligibility, through lower subsequent severity of illness, for these later bundle elements. CONCLUSIONS: Total severe sepsis and septic shock bundle compliance increased substantially and was associated with a marked reduction in hospital mortality after adjustment for age, severity of illness, and comorbidities in a multicenter ICU cohort. Early resuscitation bundle element compliance predicted ineligibility for subsequent bundle elements.

Subject(s)

Guideline Adherence/statistics & numerical data; Sepsis/therapy; Shock, Septic/therapy; Aged; Cardiotonic Agents/therapeutic use; Erythrocyte Transfusion/methods; Erythrocyte Transfusion/statistics & numerical data; Female; Glucocorticoids/therapeutic use; Hospital Mortality; Humans; Idaho; Intensive Care Units/statistics & numerical data; Length of Stay/statistics & numerical data; Male; Middle Aged; Respiration, Artificial/statistics & numerical data; Resuscitation/methods; Resuscitation/statistics & numerical data; Treatment Outcome; Utah

11.

Language models can identify enzymatic binding sites in protein sequences.

Nana Teukam, Yves Gaetan; Kwate Dassi, Loïc; Manica, Matteo; Probst, Daniel; Schwaller, Philippe; Laino, Teodoro.

Comput Struct Biotechnol J ; 23: 1929-1937, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38736695

ABSTRACT

Recent advances in language modeling have had a tremendous impact on how we handle sequential data in science. Language architectures have emerged as a hotbed of innovation and creativity in natural language processing over the last decade, and have since gained prominence in modeling proteins and chemical processes, elucidating structural relationships from textual/sequential data. Surprisingly, some of these relationships refer to three-dimensional structural features, raising important questions on the dimensionality of the information encoded within sequential data. Here, we demonstrate that the unsupervised application of a language model architecture to a language representation of bio-catalyzed chemical reactions can capture the signal at the base of the substrate-binding site atomic interactions. This allows us to identify the three-dimensional binding site position in unknown protein sequences. The language representation comprises a reaction-simplified molecular-input line-entry system (SMILES) for substrate and products, and amino acid sequence information for the enzyme. This approach can recover, with no supervision, 52.13% of the binding site when considering co-crystallized substrate-enzyme structures as ground truth, vastly outperforming other attention-based models.

12.

Deep Learning Invades Drug Design and Synthesis.

Arús-Pous, Josep; Probst, Daniel; Reymond, Jean-Louis.

Chimia (Aarau) ; 72(1): 70-71, 2018 Feb 01.

Article in English | MEDLINE | ID: mdl-29490798

13.

An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification.

Probst, Daniel.

J Cheminform ; 15(1): 113, 2023 Nov 23.

Article in English | MEDLINE | ID: mdl-37996942

ABSTRACT

Assigning or proposing a catalysing enzyme given a chemical or biochemical reaction is of great interest to life sciences and chemistry alike. The exploration and design of metabolic pathways and the challenge of finding more sustainable enzyme-catalysed alternatives to traditional organic reactions are just two examples of tasks that require an association between reaction and enzyme. However, given the lack of large and balanced annotated data sets of enzyme-catalysed reactions, assigning an enzyme to a reaction still relies on expert-curated rules and databases. Here, we present a data-driven explainable human-in-the-loop machine learning approach to support and ultimately automate the association of a catalysing enzyme with a given biochemical reaction. In addition, the proposed method is capable of predicting enzymes as candidate catalysts for organic reactions amenable to biocatalysis. Finally, the introduced explainability and visualisation methods can easily be generalised to support other machine-learning approaches involving chemical and biochemical reactions.

14.

Alchemical analysis of FDA approved drugs.

Orsi, Markus; Probst, Daniel; Schwaller, Philippe; Reymond, Jean-Louis.

Digit Discov ; 2(5): 1289-1296, 2023 Oct 09.

Article in English | MEDLINE | ID: mdl-38013905

ABSTRACT

Chemical space maps help visualize similarities within molecular sets. However, there are many different molecular similarity measures resulting in a confusing number of possible comparisons. To overcome this limitation, we exploit the fact that tools designed for reaction informatics also work for alchemical processes that do not obey Lavoisier's principle, such as the transmutation of lead into gold. We start by using the differential reaction fingerprint (DRFP) to create tree-maps (TMAPs) representing the chemical space of pairs of drugs selected as being similar according to various molecular fingerprints. We then use the Transformer-based RXNMapper model to understand structural relationships between drugs, and its confidence score to distinguish between pairs related by chemically feasible transformations and pairs related by alchemical transmutations. This analysis reveals a diversity of structural similarity relationships that are otherwise difficult to analyze simultaneously. We exemplify this approach by visualizing FDA-approved drugs, EGFR inhibitors, and polymyxin B analogs.

15.

EnzymeMap: curation, validation and data-driven prediction of enzymatic reactions.

Heid, Esther; Probst, Daniel; Green, William H; Madsen, Georg K H.

Chem Sci ; 14(48): 14229-14242, 2023 Dec 13.

Article in English | MEDLINE | ID: mdl-38098707

ABSTRACT

Enzymatic reactions are an ecofriendly, selective, and versatile addition, and sometimes even an alternative, to organic reactions for the synthesis of chemical compounds such as pharmaceuticals or fine chemicals. To identify suitable reactions, computational models to predict the activity of enzymes on non-native substrates, to perform retrosynthetic pathway searches, or to predict the outcomes of reactions including regio- and stereoselectivity are becoming increasingly important. However, current approaches are substantially hindered by the limited amount of available data, especially if balanced and atom-mapped reactions are needed and if the models feature machine learning components. We therefore constructed a high-quality dataset (EnzymeMap) by developing a large set of correction and validation algorithms for recorded reactions in the literature and showcase its significant positive impact on machine learning models of retrosynthesis, forward prediction, and regioselectivity prediction, outperforming previous approaches by a large margin. Our dataset allows for deep learning models of enzymatic reactions with unprecedented accuracy, and is freely available online.

16.

Quantum chemical data generation as fill-in for reliability enhancement of machine-learning reaction and retrosynthesis planning.

Toniato, Alessandra; Unsleber, Jan P; Vaucher, Alain C; Weymuth, Thomas; Probst, Daniel; Laino, Teodoro; Reiher, Markus.

Digit Discov ; 2(3): 663-673, 2023 Jun 12.

Article in English | MEDLINE | ID: mdl-37312681

ABSTRACT

Data-driven synthesis planning has seen remarkable successes in recent years by virtue of modern approaches of artificial intelligence that efficiently exploit vast databases with experimental data on chemical reactions. However, this success story is intimately connected to the availability of existing experimental data. It may well occur in retrosynthetic and synthesis design tasks that predictions in individual steps of a reaction cascade are affected by large uncertainties. In such cases, it will, in general, not be easily possible to provide missing data from autonomously conducted experiments on demand. However, first-principles calculations can, in principle, provide missing data to enhance the confidence of an individual prediction or for model retraining. Here, we demonstrate the feasibility of such an ansatz and examine resource requirements for conducting autonomous first-principles calculations on demand.

17.

What is the Rate of Response to Nonoperative Treatment for Hip-Related Pain? A Systematic Review With Meta-analysis.

Probst, Daniel T; Sookochoff, Michael F; Harris-Hayes, Marcie; Prather, Heidi; Lipsey, Kim L; Cheng, Abby L.

J Orthop Sports Phys Ther ; 53(5): 286-306, 2023 May.

Article in English | MEDLINE | ID: mdl-36892224

ABSTRACT

OBJECTIVE: We aimed to (1) determine the rate of satisfactory response to nonoperative treatment for nonarthritic hip-related pain, and (2) evaluate the specific effect of various elements of physical therapy and nonoperative treatment options aside from physical therapy. DESIGN: Systematic review with meta-analysis. LITERATURE SEARCH: We searched 7 databases and reference lists of eligible studies from their inception to February 2022. STUDY SELECTION CRITERIA: We included randomized controlled trials and prospective cohort studies that compared a nonoperative management protocol to any other treatment for patients with femoroacetabular impingement syndrome, acetabular dysplasia, acetabular labral tear, and/or nonarthritic hip pain not otherwise specified. DATA SYNTHESIS: We used random-effects meta-analyses, as appropriate. Study quality was assessed using an adapted Downs and Black checklist. Certainty of evidence was assessed using the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) approach. RESULTS: Twenty-six studies (1153 patients) were eligible for qualitative synthesis, and 16 were included in the meta-analysis. Moderate certainty evidence suggests that the overall response rate to nonoperative treatment was 54% (95% confidence interval: 32%, 76%). The overall mean improvement after physical therapy treatment was 11.3 points (7.6-14.9) on 100-point patient-reported hip symptom measures (low to moderate certainty) and 22.2 points (4.6-39.9) on 100-point pain severity measures (low certainty). No definitive specific effect was observed regarding therapy duration or approach (ie, flexibility exercise, movement pattern training, and/or mobilization) (very low to low certainty). Very low to low certainty evidence supported viscosupplementation, corticosteroid injection, and a supportive brace. CONCLUSION: Over half of patients with nonarthritic hip-related pain reported satisfactory response to nonoperative treatment. However, the essential elements of comprehensive nonoperative treatment remain unclear. J Orthop Sports Phys Ther 2023;53(5):1-21. Epub 9 March 2023. doi:10.2519/jospt.2023.11666.

Subject(s)

Femoracetabular Impingement; Physical Therapy Modalities; Humans; Prospective Studies; Arthralgia/therapy; Exercise Therapy/methods; Femoracetabular Impingement/rehabilitation

18.

Reaction classification and yield prediction using the differential reaction fingerprint DRFP.

Probst, Daniel; Schwaller, Philippe; Reymond, Jean-Louis.

Digit Discov ; 1(2): 91-97, 2022 Apr 11.

Article in English | MEDLINE | ID: mdl-35515081

ABSTRACT

Predicting the nature and outcome of reactions using computational methods is a crucial tool to accelerate chemical research. The recent application of deep learning-based learned fingerprints to reaction classification and reaction yield prediction has shown an impressive increase in performance compared to previous methods such as DFT- and structure-based fingerprints. However, learned fingerprints require large training data sets, are inherently biased, and are based on complex deep learning architectures. Here we present the differential reaction fingerprint DRFP. The DRFP algorithm takes a reaction SMILES as an input and creates a binary fingerprint based on the symmetric difference of two sets containing the circular molecular n-grams generated from the molecules listed left and right from the reaction arrow, respectively, without the need for distinguishing between reactants and reagents. We show that DRFP performs better than DFT-based fingerprints in reaction yield prediction and other structure-based fingerprints in reaction classification, reaching the performance of state-of-the-art learned fingerprints in both tasks while being data-independent.
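The set construction described in this abstract can be illustrated with a minimal sketch. It is not the published DRFP implementation: the real method derives circular molecular n-grams from the molecular graph, whereas this toy version substitutes plain character n-grams of each SMILES string so that it runs on the standard library alone. The function names, the SHA-256 folding step, and the default sizes are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib


def char_ngrams(smiles: str, max_n: int = 3) -> set[str]:
    """Toy stand-in for circular molecular n-grams: all substrings of length 1..max_n."""
    return {
        smiles[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(smiles) - n + 1)
    }


def side_features(molecules: list[str], max_n: int) -> set[str]:
    """Union of the n-gram sets of every molecule on one side of the arrow."""
    features: set[str] = set()
    for mol in molecules:
        if mol:  # skip empty entries such as a missing reagent block
            features |= char_ngrams(mol, max_n)
    return features


def toy_drfp(reaction_smiles: str, n_bits: int = 2048, max_n: int = 3) -> list[int]:
    """Binary fingerprint from the symmetric difference of left-side and right-side features."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected reaction SMILES of the form 'reactants>reagents>products'")
    # Reactants and reagents are pooled: everything left of the final arrow is one set.
    left = parts[0].split(".") + parts[1].split(".")
    right = parts[2].split(".")
    difference = side_features(left, max_n) ^ side_features(right, max_n)
    bits = [0] * n_bits
    for feature in difference:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:8], "big") % n_bits] = 1
    return bits


if __name__ == "__main__":
    esterification = "CC(=O)O.CCO>>CC(=O)OCC.O"
    print(sum(toy_drfp(esterification)), "bits set")
```

Because features shared by both sides cancel in the symmetric difference, only what changes across the reaction arrow contributes bits, and moving a species between the reactant and reagent positions leaves the result unchanged.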

19.

Biocatalysed synthesis planning using data-driven learning.

Probst, Daniel; Manica, Matteo; Nana Teukam, Yves Gaetan; Castrogiovanni, Alessandro; Paratore, Federico; Laino, Teodoro.

Nat Commun ; 13(1): 964, 2022 Feb 18.

Article in English | MEDLINE | ID: mdl-35181654

ABSTRACT

Enzyme catalysts are an integral part of green chemistry strategies towards a more sustainable and resource-efficient chemical synthesis. However, the use of biocatalysed reactions in retrosynthetic planning clashes with the difficulties in predicting the enzymatic activity on unreported substrates and enzyme-specific stereo- and regioselectivity. As of now, only rule-based systems support retrosynthetic planning using biocatalysis, while initial data-driven approaches are limited to forward predictions. Here, we extend the data-driven forward reaction as well as retrosynthetic pathway prediction models based on the Molecular Transformer architecture to biocatalysis. The enzymatic knowledge is learned from an extensive data set of publicly available biochemical reactions with the aid of a new class token scheme based on the enzyme commission classification number, which captures catalysis patterns among different enzymes belonging to the same hierarchy. The forward reaction prediction model (top-1 accuracy of 49.6%), the retrosynthetic pathway (top-1 single-step round-trip accuracy of 39.6%) and the curated data set are made publicly available to facilitate the adoption of enzymatic catalysis in the design of greener chemistry processes.

Subject(s)

Biocatalysis; Bioreactors; Chemistry Techniques, Synthetic; Green Chemistry Technology/methods; Catalysis; Cheminformatics; Natural Resources
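The idea of a class token scheme built on the enzyme commission (EC) number can be sketched as follows. The abstract does not spell out the token format, so the bracketed `[EC:...]` strings, the function names, and the default of three hierarchy levels are hypothetical choices made only to show how enzymes in the same branch of the hierarchy end up sharing tokens.

```python
from __future__ import annotations


def ec_class_tokens(ec_number: str, levels: int = 3) -> list[str]:
    """Turn an EC number such as '1.1.1.1' into cumulative hierarchy tokens.

    The '[EC:...]' token format is hypothetical; it only illustrates the idea.
    """
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError("an EC number has four dot-separated fields")
    if not 1 <= levels <= 4:
        raise ValueError("levels must be between 1 and 4")
    return ["[EC:" + ".".join(parts[:depth]) + "]" for depth in range(1, levels + 1)]


def shared_hierarchy_depth(ec_a: str, ec_b: str) -> int:
    """Count how many leading hierarchy tokens two EC numbers have in common."""
    depth = 0
    for token_a, token_b in zip(ec_class_tokens(ec_a, 4), ec_class_tokens(ec_b, 4)):
        if token_a != token_b:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    print(ec_class_tokens("1.1.1.1"))
    print(shared_hierarchy_depth("1.1.1.1", "1.1.3.4"))
```

A sequence model that sees such tokens alongside the reaction can, in principle, share what it learns across enzymes that agree on the leading levels while still distinguishing them at deeper levels.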

20.

Visualization of very large high-dimensional data sets as minimum spanning trees.

Probst, Daniel; Reymond, Jean-Louis.

J Cheminform ; 12(1): 12, 2020 Feb 12.

Article in English | MEDLINE | ID: mdl-33431043

ABSTRACT

The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem with a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrary high dimensionality as a two-dimensional tree (http://tmap.gdb.tools). Visualizations based on TMAP are better suited than t-SNE or UMAP for the exploration and interpretation of large data sets due to their tree-like nature, increased local and global neighborhood and structure preservation, and the transparency of the methods the algorithm is based on. We apply TMAP to the most used chemistry data sets including databases of molecules such as ChEMBL, FDB17, the Natural Products Atlas, DSSTox, as well as to the MoleculeNet benchmark collection of data sets. We also show its broad applicability with further examples from biology, particle physics, and literature.
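The tree structure at the heart of this entry can be illustrated with a small minimum spanning tree sketch. It is an assumption-laden toy, not the TMAP algorithm: it runs Prim's algorithm over all pairwise Euclidean distances, which costs quadratic time and would not scale to the millions of points the abstract mentions, and it omits any layout step. The function name and the sample coordinates are illustrative.

```python
from __future__ import annotations

import math


def minimum_spanning_tree(points: list[tuple[float, ...]]) -> list[tuple[int, int, float]]:
    """Prim's algorithm on the complete Euclidean graph; returns (parent, child, distance) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection of each point to the tree
    best_parent = [-1] * n       # tree vertex that provides that connection
    best_dist[0] = 0.0           # start growing the tree from point 0
    edges: list[tuple[int, int, float]] = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=best_dist.__getitem__)
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u, best_dist[u]))
        # Relax the connection cost of every remaining point via the new vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (1.0, 1.0)]
    for parent, child, dist in minimum_spanning_tree(sample):
        print(parent, "->", child, round(dist, 3))
```

The resulting edge list connects every point through its nearest available neighbor without cycles, which is the property that lets a tree preserve local neighborhoods while still linking distant clusters.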
