Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 28
Filtrar
1.
J Chem Inf Model ; 64(7): 2539-2553, 2024 Apr 08.
Artigo em Inglês | MEDLINE | ID: mdl-38185877

RESUMO

A central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past few decades, various quantitative structure-activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR and PCM approaches have recently been improved by machine learning and deep neural networks, that allow the development of drug-target interaction prediction models from measurement data. However, deep neural networks typically require large amounts of training data and cannot robustly adapt to new tasks, such as predicting interaction for unseen protein targets at inference time. In this work, we propose to use HyperNetworks to efficiently transfer information between tasks during inference and thus to accurately predict drug-target interactions on unseen protein targets. Our HyperPCM method reaches state-of-the-art performance compared to previous methods on multiple well-known benchmarks, including Davis, DUD-E, and a ChEMBL derived data set, and particularly excels at zero-shot inference involving unseen protein targets. Our method, as well as reproducible data preparation, is available at https://github.com/ml-jku/hyper-dti.


Assuntos
Aprendizado de Máquina , Redes Neurais de Computação , Proteínas , Desenvolvimento de Medicamentos , Descoberta de Drogas
2.
Nat Commun ; 14(1): 7339, 2023 11 13.
Artigo em Inglês | MEDLINE | ID: mdl-37957207

RESUMO

The field of bioimage analysis is currently impacted by a profound transformation, driven by the advancements in imaging technologies and artificial intelligence. The emergence of multi-modal AI systems could allow extracting and utilizing knowledge from bioimaging databases based on information from other data modalities. We leverage the multi-modal contrastive learning paradigm, which enables the embedding of both bioimages and chemical structures into a unified space by means of bioimage and molecular structure encoders. This common embedding space unlocks the possibility of querying bioimaging databases with chemical structures that induce different phenotypic effects. Concretely, in this work we show that a retrieval system based on multi-modal contrastive learning is capable of identifying the correct bioimage corresponding to a given chemical structure from a database of ~2000 candidate images with a top-1 accuracy >70 times higher than a random baseline. Additionally, the bioimage encoder demonstrates remarkable transferability to various further prediction tasks within the domain of drug discovery, such as activity prediction, molecule classification, and mechanism of action identification. Thus, our approach not only addresses the current limitations of bioimaging databases but also paves the way towards foundation models for microscopy images.


Assuntos
Inteligência Artificial , Aprendizagem , Bases de Dados Factuais , Descoberta de Drogas , Conhecimento
5.
MAbs ; 14(1): 2031482, 2022.
Artigo em Inglês | MEDLINE | ID: mdl-35377271

RESUMO

Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D-structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of antibody design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one dimensional: 1D) data can be used to design conformational (three dimensional: 3D) epitope-specific antibodies, matching, or exceeding the training dataset in affinity and developability parameter value variety. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.


Assuntos
Reações Antígeno-Anticorpo , Aprendizado de Máquina , Anticorpos Monoclonais/química , Sítios de Ligação de Anticorpos , Epitopos
6.
J Med Syst ; 46(5): 23, 2022 Mar 29.
Artigo em Inglês | MEDLINE | ID: mdl-35348909

RESUMO

Many previous studies claim to have developed machine learning models that diagnose COVID-19 from blood tests. However, we hypothesize that changes in the underlying distribution of the data, so called domain shifts, affect the predictive performance and reliability and are a reason for the failure of such machine learning models in clinical application. Domain shifts can be caused, e.g., by changes in the disease prevalence (spreading or tested population), by refined RT-PCR testing procedures (way of taking samples, laboratory procedures), or by virus mutations. Therefore, machine learning models for diagnosing COVID-19 or other diseases may not be reliable and degrade in performance over time. We investigate whether domain shifts are present in COVID-19 datasets and how they affect machine learning methods. We further set out to estimate the mortality risk based on routinely acquired blood tests in a hospital setting throughout pandemics and under domain shifts. We reveal domain shifts by evaluating the models on a large-scale dataset with different assessment strategies, such as temporal validation. We present the novel finding that domain shifts strongly affect machine learning models for COVID-19 diagnosis and deteriorate their predictive performance and credibility. Therefore, frequent re-training and re-assessment are indispensable for robust models enabling clinical utility.


Assuntos
COVID-19 , COVID-19/diagnóstico , Teste para COVID-19 , Testes Hematológicos , Humanos , Aprendizado de Máquina , Reprodutibilidade dos Testes
7.
J Chem Inf Model ; 62(9): 2111-2120, 2022 05 09.
Artigo em Inglês | MEDLINE | ID: mdl-35034452

RESUMO

Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials. To find such routes, computer-assisted synthesis planning (CASP) methods are employed, which rely on a single-step model of chemical reactivity. In this study, we introduce a template-based single-step retrosynthesis model based on Modern Hopfield Networks, which learn an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule. The template representation allows generalization across different reactions and significantly improves the performance of template relevance prediction, especially for templates with few or zero training examples. With inference speed up to orders of magnitude faster than baseline methods, we improve or match the state-of-the-art performance for top-k exact match accuracy for k ≥ 3 in the retrosynthesis benchmark USPTO-50k. Code to reproduce the results is available at github.com/ml-jku/mhn-react.

8.
Nat Comput Sci ; 2(12): 845-865, 2022 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-38177393

RESUMO

Machine learning (ML) is a key technology for accurate prediction of antibody-antigen binding. Two orthogonal problems hinder the application of ML to antibody-specificity prediction and the benchmarking thereof: the lack of a unified ML formalization of immunological antibody-specificity prediction problems and the unavailability of large-scale synthetic datasets to benchmark real-world relevant ML methods and dataset design. Here we developed the Absolut! software suite that enables parameter-based unconstrained generation of synthetic lattice-based three-dimensional antibody-antigen-binding structures with ground-truth access to conformational paratope, epitope and affinity. We formalized common immunological antibody-specificity prediction problems as ML tasks and confirmed that for both sequence- and structure-based tasks, accuracy-based rankings of ML methods trained on experimental data hold for ML methods trained on Absolut!-generated data. The Absolut! framework has the potential to enable real-world relevant development and benchmarking of ML strategies for biotherapeutics design.


Assuntos
Anticorpos , Reações Antígeno-Anticorpo , Especificidade de Anticorpos , Epitopos/química , Aprendizado de Máquina
9.
Front Artif Intell ; 4: 638410, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-33937745

RESUMO

Drug-induced liver injury (DILI) is a common reason for the withdrawal of a drug from the market. Early assessment of DILI risk is an essential part of drug development, but it is rendered challenging prior to clinical trials by the complex factors that give rise to liver damage. Artificial intelligence (AI) approaches, particularly those building on machine learning, range from random forests to more recent techniques such as deep learning, and provide tools that can analyze chemical compounds and accurately predict some of their properties based purely on their structure. This article reviews existing AI approaches to predicting DILI and elaborates on the challenges that arise from the as yet limited availability of data. Future directions are discussed focusing on rich data modalities, such as 3D spheroids, and the slow but steady increase in drugs annotated with DILI risk labels.

10.
Nat Mach Intell ; 3(11): 936-944, 2021 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-37396030

RESUMO

Adaptive immune receptor repertoires (AIRR) are key targets for biomedical research as they record past and ongoing adaptive immune responses. The capacity of machine learning (ML) to identify complex discriminative sequence patterns renders it an ideal approach for AIRR-based diagnostic and therapeutic discovery. To date, widespread adoption of AIRR ML has been inhibited by a lack of reproducibility, transparency, and interoperability. immuneML (immuneml.uio.no) addresses these concerns by implementing each step of the AIRR ML process in an extensible, open-source software ecosystem that is based on fully specified and shareable workflows. To facilitate widespread user adoption, immuneML is available as a command-line tool and through an intuitive Galaxy web interface, and extensive documentation of workflows is provided. We demonstrate the broad applicability of immuneML by (i) reproducing a large-scale study on immune state prediction, (ii) developing, integrating, and applying a novel deep learning method for antigen specificity prediction, and (iii) showcasing streamlined interpretability-focused benchmarking of AIRR ML.

11.
J Cheminform ; 12(1): 26, 2020 Apr 19.
Artigo em Inglês | MEDLINE | ID: mdl-33430964

RESUMO

Artificial intelligence (AI) is undergoing a revolution thanks to the breakthroughs of machine learning algorithms in computer vision, speech recognition, natural language processing and generative modelling. Recent works on publicly available pharmaceutical data showed that AI methods are highly promising for Drug Target prediction. However, the quality of public data might be different than that of industry data due to different labs reporting measurements, different measurement techniques, fewer samples and less diverse and specialized assays. As part of a European funded project (ExCAPE), that brought together expertise from pharmaceutical industry, machine learning, and high-performance computing, we investigated how well machine learning models obtained from public data can be transferred to internal pharmaceutical industry data. Our results show that machine learning models trained on public data can indeed maintain their predictive power to a large degree when applied to industry data. Moreover, we observed that deep learning derived machine learning models outperformed comparable models, which were trained by other machine learning algorithms, when applied to internal pharmaceutical company datasets. To our knowledge, this is the first large-scale study evaluating the potential of machine learning and especially deep learning directly at the level of industry-scale settings and moreover investigating the transferability of publicly learned target prediction models towards industrial bioactivity prediction pipelines.

12.
J Lipid Res ; 60(11): 1922-1934, 2019 11.
Artigo em Inglês | MEDLINE | ID: mdl-31530576

RESUMO

During pregnancy, extravillous trophoblasts (EVTs) invade the maternal decidua and remodel the local vasculature to establish blood supply for the growing fetus. Compromised EVT function has been linked to aberrant pregnancy associated with maternal and fetal morbidity and mortality. However, metabolic features of this invasive trophoblast subtype are largely unknown. Using primary human trophoblasts isolated from first trimester placental tissues, we show that cellular cholesterol homeostasis is differentially regulated in EVTs compared with villous cytotrophoblasts. Utilizing RNA-sequencing, gene set-enrichment analysis, and functional validation, we provide evidence that EVTs display increased levels of free and esterified cholesterol. Accordingly, EVTs are characterized by increased expression of the HDL-receptor, scavenger receptor class B type I, and reduced expression of the LXR and its target genes. We further reveal that EVTs express elevated levels of hydroxy-delta-5-steroid dehydrogenase 3 beta- and steroid delta-isomerase 1 (HSD3B1) (a rate-limiting enzyme in progesterone synthesis) and are capable of secreting progesterone. Increasing cholesterol export by LXR activation reduced progesterone secretion in an ABCA1-dependent manner. Importantly, HSD3B1 expression was decreased in EVTs of idiopathic recurrent spontaneous abortions, pointing toward compromised progesterone metabolism in EVTs of early miscarriages. Here, we provide insights into the regulation of cholesterol and progesterone metabolism in trophoblastic subtypes and its putative relevance in human miscarriage.


Assuntos
Aborto Habitual/metabolismo , Colesterol/metabolismo , Progesterona/metabolismo , Trofoblastos/metabolismo , Biologia Computacional , Feminino , Homeostase , Humanos , Gravidez , Análise de Sequência de RNA
13.
J Chem Inf Model ; 59(3): 1163-1171, 2019 03 25.
Artigo em Inglês | MEDLINE | ID: mdl-30840449

RESUMO

Predicting the outcome of biological assays based on high-throughput imaging data is a highly promising task in drug discovery since it can tremendously increase hit rates and suggest novel chemical scaffolds. However, end-to-end learning with convolutional neural networks (CNNs) has not been assessed for the task biological assay prediction despite the success of these networks at visual recognition. We compared several CNNs trained directly on high-throughput imaging data to a) CNNs trained on cell-centric crops and to b) the current state-of-the-art: fully connected networks trained on precalculated morphological cell features. The comparison was performed on the Cell Painting data set, the largest publicly available data set of microscopic images of cells with approximately 30,000 compound treatments. We found that CNNs perform significantly better at predicting the outcome of assays than fully connected networks operating on precomputed morphological features of cells. Surprisingly, the best performing method could predict 32% of the 209 biological assays at high predictive performance (AUC > 0.9) indicating that the cell morphology changes contain a large amount of information about compound activities. Our results suggest that many biological assays could be replaced by high-throughput imaging together with convolutional neural networks and that the costly cell segmentation and feature extraction step can be replaced by convolutional neural networks.


Assuntos
Bioensaio , Biologia Computacional/métodos , Microscopia , Redes Neurais de Computação , Processamento de Imagem Assistida por Computador
14.
15.
J Chem Inf Model ; 59(3): 962-972, 2019 03 25.
Artigo em Inglês | MEDLINE | ID: mdl-30408959

RESUMO

The volume of high throughput screening data has considerably increased since the beginning of the automated biochemical and cell-based assays era. This information-rich data source provides tremendous repurposing opportunities for data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput fingerprints (HTSFPs) as a new type of descriptor describing molecular bioactivity profiles which can be applied in virtual screening, iterative screening, and target deconvolution. However, so far, studies around HTSFPs and machine learning have mainly focused on predicting the outcome of molecules in single high-throughput assays, and no one has reported the modeling of compounds' biochemical assay activities toward a panel of target proteins. In this article, we aim at comparing how our in-house HTSFPs perform at this when combined with multitask deep learning versus the single task support vector machine method both in terms of hit identification and of scaffold hopping potential. Performances obtained from the two HTSFP models were reported with respect to the performances of multitask deep learning and support vector machine models built with the structural descriptors ECFP. Moreover, we investigated the effect of high throughput screening false positives and negatives on the performance of the generated models. Our results showed that the two fingerprints yielded in similar performances and diverse hits with very little overlap, thus demonstrating the orthogonality of bioactivity profile-based descriptors with structural descriptors. Therefore, modeling compound activity data using ECFPs together with HTSFPs increases the scaffold hopping potential of the predictive models.


Assuntos
Avaliação Pré-Clínica de Medicamentos/métodos , Ensaios de Triagem em Larga Escala/métodos , Aprendizado de Máquina , Redes Neurais de Computação
16.
Drug Discov Today Technol ; 32-33: 55-63, 2019 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-33386095

RESUMO

There has been a wave of generative models for molecules triggered by advances in the field of Deep Learning. These generative models are often used to optimize chemical compounds towards particular properties or a desired biological activity. The evaluation of generative models remains challenging and suggested performance metrics or scoring functions often do not cover all relevant aspects of drug design projects. In this work, we highlight some unintended failure modes in molecular generation and optimization and how these evade detection by current performance metrics.


Assuntos
Descoberta de Drogas , Modelos Moleculares , Humanos
17.
Chem Sci ; 9(24): 5441-5451, 2018 Jun 28.
Artigo em Inglês | MEDLINE | ID: mdl-30155234

RESUMO

Deep learning is currently the most successful machine learning technique in a wide range of application areas and has recently been applied successfully in drug discovery research to predict potential drug targets and to screen for active molecules. However, due to (1) the lack of large-scale studies, (2) the compound series bias that is characteristic of drug discovery datasets and (3) the hyperparameter selection bias that comes with the high number of potential deep learning architectures, it remains unclear whether deep learning can indeed outperform existing computational methods in drug discovery tasks. We therefore assessed the performance of several deep learning methods on a large-scale drug discovery dataset and compared the results with those of other machine learning and target prediction methods. To avoid potential biases from hyperparameter selection or compound series, we used a nested cluster-cross-validation strategy. We found (1) that deep learning methods significantly outperform all competing methods and (2) that the predictive performance of deep learning is in many cases comparable to that of tests performed in wet labs (i.e., in vitro assays).

18.
J Chem Inf Model ; 58(9): 1736-1741, 2018 09 24.
Artigo em Inglês | MEDLINE | ID: mdl-30118593

RESUMO

The new wave of successful generative models in machine learning has increased the interest in deep learning driven de novo drug design. However, method comparison is difficult because of various flaws of the currently employed evaluation metrics. We propose an evaluation metric for generative models called Fréchet ChemNet distance (FCD). The advantage of the FCD over previous metrics is that it can detect whether generated molecules are diverse and have similar chemical and biological properties as real molecules.


Assuntos
Aprendizado Profundo , Descoberta de Drogas , Simulação por Computador , Bases de Dados Factuais , Modelos Moleculares , Software
19.
Cell Chem Biol ; 25(5): 611-618.e3, 2018 05 17.
Artigo em Inglês | MEDLINE | ID: mdl-29503208

RESUMO

In both academia and the pharmaceutical industry, large-scale assays for drug discovery are expensive and often impractical, particularly for the increasingly important physiologically relevant model systems that require primary cells, organoids, whole organisms, or expensive or rare reagents. We hypothesized that data from a single high-throughput imaging assay can be repurposed to predict the biological activity of compounds in other assays, even those targeting alternate pathways or biological processes. Indeed, quantitative information extracted from a three-channel microscopy-based screen for glucocorticoid receptor translocation was able to predict assay-specific biological activity in two ongoing drug discovery projects. In these projects, repurposing increased hit rates by 50- to 250-fold over that of the initial project assays while increasing the chemical structure diversity of the hits. Our results suggest that data from high-content screens are a rich source of information that can be used to predict and replace customized biological assays.


Assuntos
Reposicionamento de Medicamentos/métodos , Processamento de Imagem Assistida por Computador/métodos , Aprendizado de Máquina , Redes Neurais de Computação , Antineoplásicos/farmacologia , Linhagem Celular Tumoral , Ensaios de Triagem em Larga Escala/métodos , Humanos , Neoplasias/tratamento farmacológico
20.
Bioinformatics ; 34(9): 1538-1546, 2018 05 01.
Artigo em Inglês | MEDLINE | ID: mdl-29253077

RESUMO

Motivation: While drug combination therapies are a well-established concept in cancer treatment, identifying novel synergistic combinations is challenging due to the size of combinatorial space. However, computational approaches have emerged as a time- and cost-efficient way to prioritize combinations to test, based on recently available large-scale combination screening data. Recently, Deep Learning has had an impact in many research areas by achieving new state-of-the-art model performance. However, Deep Learning has not yet been applied to drug synergy prediction, which is the approach we present here, termed DeepSynergy. DeepSynergy uses chemical and genomic information as input information, a normalization strategy to account for input data heterogeneity, and conical layers to model drug synergies. Results: DeepSynergy was compared to other machine learning methods such as Gradient Boosting Machines, Random Forests, Support Vector Machines and Elastic Nets on the largest publicly available synergy dataset with respect to mean squared error. DeepSynergy significantly outperformed the other methods with an improvement of 7.2% over the second best method at the prediction of novel drug combinations within the space of explored drugs and cell lines. At this task, the mean Pearson correlation coefficient between the measured and the predicted values of DeepSynergy was 0.73. Applying DeepSynergy for classification of these novel drug combinations resulted in a high predictive performance of an AUC of 0.90. Furthermore, we found that all compared methods exhibit low predictive performance when extrapolating to unexplored drugs or cell lines, which we suggest is due to limitations in the size and diversity of the dataset. We envision that DeepSynergy could be a valuable tool for selecting novel synergistic drug combinations. Availability and implementation: DeepSynergy is available via www.bioinf.jku.at/software/DeepSynergy. Contact: klambauer@bioinf.jku.at. Supplementary information: Supplementary data are available at Bioinformatics online.


Assuntos
Protocolos de Quimioterapia Combinada Antineoplásica/uso terapêutico , Biologia Computacional/métodos , Aprendizado Profundo , Perfilação da Expressão Gênica/métodos , Neoplasias/tratamento farmacológico , Software , Protocolos de Quimioterapia Combinada Antineoplásica/farmacologia , Linhagem Celular Tumoral , Regulação Neoplásica da Expressão Gênica , Humanos , Neoplasias/genética , Máquina de Vetores de Suporte
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...