Results 1 - 8 of 8
1.
J Chem Inf Model ; 64(7): 2331-2344, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37642660

ABSTRACT

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.


Subjects
Benchmarking , Quantitative Structure-Activity Relationship , Biological Assay , Machine Learning
2.
J Cheminform ; 13(1): 96, 2021 Dec 07.
Article in English | MEDLINE | ID: mdl-34876230

ABSTRACT

With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes more and more prominent. The goal is a realistic split of chemical structures (compounds) between training, validation and test sets, such that performance on the test set is meaningful for inferring performance in a prospective application. This challenge is interesting and relevant in its own right, but it is even more complex in a federated machine learning approach, where multiple partners jointly train a model under privacy-preserving conditions and chemical structures must not be shared between the participating parties. In this work we discuss three methods that split a data set and are applicable in a federated privacy-preserving setting: (a) locality-sensitive hashing (LSH), (b) sphere exclusion clustering, and (c) scaffold-based binning (scaffold network). To evaluate these splitting methods we consider the following quality criteria (compared to random splitting): bias in prediction performance, classification label and data imbalance, and similarity distance between the test and training set compounds. The main findings of the paper are that (a) both sphere exclusion clustering and scaffold-based binning result in high-quality splitting of the data sets, and (b) in terms of compute costs, sphere exclusion clustering is very expensive in a federated privacy-preserving setting.
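
To illustrate why a hashing-based split needs no exchange of structures, the following is a minimal sketch and not the implementation from the paper. It assumes that compounds are already encoded as binary fingerprints and that all partners agree on a seed, from which each derives the same bit positions locally. The function names `shared_bit_positions` and `lsh_fold`, the fingerprint length, the number of selected bits and the number of folds are all hypothetical choices made for the example.

```python
import hashlib
import random


def shared_bit_positions(n_bits, n_selected, seed):
    """Derive the same bit positions at every partner from an agreed seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_bits), n_selected))


def lsh_fold(fingerprint, bit_positions, n_folds=5):
    """Assign a fold using only the fingerprint bits at the shared positions.

    Compounds that agree on all selected bits get the same key and therefore
    the same fold, so near-duplicates tend to stay together.
    """
    key = "".join("1" if fingerprint[i] else "0" for i in bit_positions)
    digest = hashlib.sha256(key.encode("ascii")).hexdigest()
    return int(digest, 16) % n_folds


# Hypothetical 64-bit fingerprints; real fingerprints are usually longer.
positions = shared_bit_positions(n_bits=64, n_selected=8, seed=2021)

fp_a = [0] * 64
fp_b = list(fp_a)
# Flip one bit that is NOT among the selected positions.
free_bit = next(i for i in range(64) if i not in positions)
fp_b[free_bit] = 1

fold_a = lsh_fold(fp_a, positions)
fold_b = lsh_fold(fp_b, positions)
```

Because the fold depends only on the compound's own fingerprint and the shared seed, each partner can compute it without seeing any other partner's data, and the same compound held by two partners lands in the same fold.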

3.
Molecules ; 26(22), 2021 Nov 18.
Article in English | MEDLINE | ID: mdl-34834051

ABSTRACT

Machine learning models predicting the bioactivity of chemical compounds are now among the standard tools of cheminformaticians and computational medicinal chemists. Multi-task and federated learning are promising machine learning approaches that allow privacy-preserving usage of large amounts of data from diverse sources, which is crucial for achieving good generalization and high-performance results. Using large, real-world data sets from six pharmaceutical companies, here we investigate different strategies for averaging weighted task loss functions to train multi-task bioactivity classification models. The weighting strategies should be suitable for federated learning and ensure that learning efforts are well distributed even if data are diverse. Comparing several approaches using weights that depend on the number of sub-tasks per assay, task size, and class balance, respectively, we find that a simple sub-task weighting approach leads to robust model performance for all investigated data sets and is especially suited for federated learning.


Subjects
Drug Discovery/methods , Machine Learning , Drug Design , Humans , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology
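
The idea of weighting task losses by the number of sub-tasks per assay, mentioned in the abstract of record 3, can be sketched as follows. This is one plausible reading and not necessarily the scheme used in the paper: it assumes that each assay is split into several sub-tasks (for example, several activity thresholds) and that each sub-task is weighted by the inverse of its assay's sub-task count, so every assay contributes equally to the averaged loss. The names `subtask_weights` and `weighted_mean_loss` and the loss values are hypothetical.

```python
from collections import Counter


def subtask_weights(assay_of_task):
    """Weight each sub-task by 1 / (number of sub-tasks of its assay)."""
    counts = Counter(assay_of_task)
    return [1.0 / counts[assay] for assay in assay_of_task]


def weighted_mean_loss(task_losses, weights):
    """Average the per-task losses using the given weights."""
    total_weight = sum(weights)
    weighted_sum = sum(w * loss for w, loss in zip(weights, task_losses))
    return weighted_sum / total_weight


# Hypothetical example: assay_A has three sub-tasks, assay_B has one.
assay_of_task = ["assay_A", "assay_A", "assay_A", "assay_B"]
task_losses = [0.6, 0.6, 0.6, 0.2]

weights = subtask_weights(assay_of_task)
loss = weighted_mean_loss(task_losses, weights)
unweighted = sum(task_losses) / len(task_losses)
```

In this toy case the unweighted mean is dominated by the three sub-tasks of `assay_A`, while the weighted mean treats the two assays equally. The weights depend only on each partner's own task list, which is one reason such a scheme is easy to apply in a federated setting.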
4.
Rapid Commun Mass Spectrom ; : e9120, 2021 May 06.
Article in English | MEDLINE | ID: mdl-33955607

ABSTRACT

RATIONALE: Structure elucidation of small molecules has been one of the cornerstone applications of mass spectrometry for decades. Despite the increasing availability of software tools, structure elucidation from tandem mass spectrometry (MS/MS) data remains a challenging task, leaving many spectra unidentified. However, as an increasing number of reference MS/MS spectra are being curated at a repository scale and shared on public servers, there is an exciting opportunity to develop powerful new deep learning (DL) models for automated structure elucidation.
ARCHITECTURES: Recent early-stage DL frameworks mostly follow a "two-step approach" that translates MS/MS spectra to database structures after first predicting molecular descriptors. The related architectures could suffer from: (1) computational complexity because of the separate training of descriptor-specific classifiers, (2) the high-dimensional nature of mass spectral data and information loss due to data preprocessing, (3) the low substructure coverage and class imbalance problem of predefined molecular fingerprints. Inspired by successful DL frameworks employed in drug discovery fields, we have conceptualized and designed hypothetical DL architectures to tackle the above issues. For (1), we recommend multitask learning to achieve better performance with fewer classifiers by grouping structurally related descriptors. For (2) and (3), we introduce feature engineering to extract condensed and higher-order information from spectra and structure data. For instance, encoding spectra with subtrees and pre-calculated spectral patterns adds peak interactions to the model input. Encoding structures with graph convolutional networks incorporates connectivity within a molecule. The joint embedding of spectra and structures can enable simultaneous spectral library and molecular database search.
CONCLUSIONS: In principle, given enough training data, adapted DL architectures, optimal hyperparameters and computing power, DL frameworks can predict small molecule structures, completely or at least partially, from MS/MS spectra. However, their performance and general applicability should be fairly evaluated against classical machine learning frameworks.

5.
Chemistry ; 20(26): 7962-78, 2014 Jun 23.
Article in English | MEDLINE | ID: mdl-24895060

ABSTRACT

DFT calculations have been used to elucidate the chain termination mechanisms for neutral nickel ethylene oligo- and polymerization catalysts and to rationalize the kind of oligomers and polymers produced by each catalyst. The catalysts studied are the (κ²-O,O)-coordinated (1,1,1,5,5,5-hexafluoro-2,4-acetylacetonato)nickel catalyst I, the (κ²-P,O)-coordinated SHOP-type nickel catalyst II, the (κ²-N,O)-coordinated anilinotropone and salicylaldiminato nickel catalysts III and IV, respectively, and the (κ²-P,N)-coordinated phosphinosulfonamide nickel catalyst V. Numerous termination pathways involving β-H elimination and β-H transfer steps have been investigated, and the most probable routes identified. Despite the complexity and multitude of the possible termination pathways, the information most critical to chain termination is contained in only a few transition states. In addition, by consideration of the propagation pathway, we have been able to estimate chain lengths and discriminate between oligo- and polymerization catalysts. In agreement with experiment, we found the Gibbs free energy difference between the overall barrier for the most facile propagation and termination pathways to be close to 0 kcal mol⁻¹ for the ethylene oligomerization catalysts I and V, whereas values of at least 7 kcal mol⁻¹ in favor of propagation were determined for the polymerization catalysts III and IV. Because of the shared intermediates between the termination and branching pathways, we have been able to identify the preferred cis/trans regiochemistry of β-H elimination and show that a pronounced difference in σ donation of the two bridgehead atoms of the bidentate ligand can suppress hydride formation and thus branching. The degree of rationalization obtained here from a handful of key intermediates and transition states is promising for the use of computational methods in the screening and prediction of new catalysts of the title class.
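
The link between the free energy differences quoted above and the oligomer/polymer distinction can be illustrated with a rough transition-state-theory estimate. The sketch below rests on assumptions that are not stated in the abstract: propagation and termination are taken to share the same pre-exponential factor, so their rate ratio is exp(ΔΔG‡/RT), and the temperature is set to 298.15 K. The function name `propagation_termination_ratio` is hypothetical.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.98720425864083e-3


def propagation_termination_ratio(ddg_kcal_per_mol, temperature_k=298.15):
    """Estimate k_propagation / k_termination from a barrier difference.

    ddg_kcal_per_mol is the termination barrier minus the propagation
    barrier; positive values favor propagation. Equal pre-exponential
    factors are assumed, which is a simplification.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


# Barrier differences taken from the abstract: about 0 and at least 7 kcal/mol.
ratio_oligomerization = propagation_termination_ratio(0.0)
ratio_polymerization = propagation_termination_ratio(7.0)
```

Under these assumptions a difference near 0 kcal mol⁻¹ gives a ratio of 1, meaning termination competes with every insertion and chains stay short, while 7 kcal mol⁻¹ gives a ratio on the order of 10⁵, consistent with long polymer chains. The real chain-length estimates in the paper may rest on a more detailed kinetic treatment.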

6.
J Am Chem Soc ; 134(21): 8885-95, 2012 May 30.
Article in English | MEDLINE | ID: mdl-22524432

ABSTRACT

Development of functional inorganic and transition metal compounds is usually based on ad hoc qualified guesses, with computational methods playing a lesser role than in drug discovery. A de novo evolutionary algorithm (EA) is presented that automatically generates transition metal complexes using a search space constrained around chemically meaningful structures assembled from three kinds of fragments: a part shared by all structures and typically containing the metal center itself, one or several parts consisting of ligand skeletons, and unconstrained parts that may grow and vary freely. In EA optimizations, using a cost-efficient fitness function based on a linear quantitative structure-activity relationship model for catalytic activity, we demonstrate the capabilities of the method by retracing the transition from the first-generation, phosphine-based Grubbs olefin metathesis catalysts to second-generation catalysts containing N-heterocyclic carbene ligands instead of phosphines. Moreover, DFT calculations on selected high-fitness, last-generation structures from these evolutionary experiments suggest that, in terms of catalytic activity, the structures arrived at by virtual evolution alone compare favorably with existing, highly active catalysts. The structures from the evolution experiments are, however, complex and probably difficult to synthesize, but a set of manually simplified variations thereof might form the leads for a new generation of Grubbs catalysts.


Subjects
Algorithms , Drug Design , Transition Elements/chemistry , Alkenes/chemistry , Catalysis , Organometallic Compounds/chemistry , Ruthenium/chemistry
7.
Chemistry ; 17(51): 14628-42, 2011 Dec 16.
Article in English | MEDLINE | ID: mdl-22095527

ABSTRACT

An unconventional chain termination reaction has been explored for the SHOP (Shell higher olefin process)-type, anilinotropone, and salicylaldiminato nickel-based oligo- and polymerization catalysts by using density functional theory (DFT). Starting from the tetracoordinate alkyl phosphine complex, the termination reaction was found to involve a rearrangement of the alkyl chain to form a pentacoordinate β-agostic complex, β-hydride elimination, and olefinic chain dissociation and to compete with propagation at sufficiently high phosphine concentration and/or basicity. It provides the first complete and convincing mechanistic rationale for the decreasing chain lengths observed upon increasing phosphine concentration and basicity. The unconventional reaction was found to be a major termination pathway for the SHOP-type catalyst and is very unlikely to lead to branching and olefin isomerization, which is critical for explaining why the SHOP catalyst, in contrast to the anilinotropone and salicylaldiminato catalysts, tends to lead to the oligomerization of ethylene to form linear α-olefins. Based on our results we have proposed a new and extended catalytic cycle for the SHOP-type ethylene oligomerization catalyst. Finally, the importance of the new termination reaction for the SHOP-type catalyst suggests that this reaction may also operate with other ethylene oligomerization nickel catalysts. This prediction was confirmed for a pyrazolonatophosphine catalyst, for which the new termination route was found to be even more facile, which explains the short oligomers produced by this catalyst.

8.
J Comput Chem ; 32(3): 386-95, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20803487

ABSTRACT

Several definitions of an atom in a molecule (AIM) in three-dimensional (3D) space, including both fuzzy and disjoint domains, are used to calculate electron sharing indices (ESI) and related electronic aromaticity measures, namely, I(ring) and multicenter indices (MCI), for a wide set of cyclic planar aromatic and nonaromatic molecules of different ring size. The results obtained using the recent iterative Hirshfeld scheme are compared with those derived from the classical Hirshfeld method and from Bader's quantum theory of atoms in molecules. For bonded atoms, all methods yield ESI values in very good agreement, especially for C-C interactions. In the case of nonbonded interactions, there are relevant deviations, particularly between fuzzy and QTAIM schemes. These discrepancies directly translate into significant differences in the values and the trends of the aromaticity indices. In particular, the chemically expected trends are more consistently found when using disjoint domains. Careful examination of the underlying effects reveals the different reasons why the aromaticity indices investigated give the expected results for binary divisions of 3D space.
