Results 1 - 20 of 28
1.
J Chem Inf Model ; 64(7): 2539-2553, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38185877

ABSTRACT

A central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past few decades, various quantitative structure-activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR and PCM approaches have recently been improved by machine learning and deep neural networks, which allow the development of drug-target interaction prediction models from measurement data. However, deep neural networks typically require large amounts of training data and cannot robustly adapt to new tasks, such as predicting interactions for unseen protein targets at inference time. In this work, we propose to use HyperNetworks to efficiently transfer information between tasks during inference and thus to accurately predict drug-target interactions on unseen protein targets. Our HyperPCM method reaches state-of-the-art performance compared to previous methods on multiple well-known benchmarks, including Davis, DUD-E, and a ChEMBL-derived data set, and particularly excels at zero-shot inference involving unseen protein targets. Our method, as well as reproducible data preparation, is available at https://github.com/ml-jku/hyper-dti.


Subjects
Machine Learning , Neural Networks, Computer , Proteins , Drug Development , Drug Discovery
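
The HyperNetwork idea in the abstract above can be illustrated with a minimal sketch: one network maps a protein representation to the parameters of a small compound-level model, so an unseen protein yields a new predictor without retraining. The abstract does not describe the actual HyperPCM architecture; the linear layers, the toy dimensions, and the names `hypernetwork` and `predict_interaction` below are assumptions made purely for illustration.

```python
import math

def hypernetwork(protein_embedding, hyper_weights, hyper_bias):
    """Map a protein embedding to the parameters of a linear compound model.

    Hypothetical linear hypernetwork: each row of hyper_weights produces one
    parameter of the task model; the last produced value is used as the bias.
    """
    params = [
        sum(w * x for w, x in zip(row, protein_embedding)) + b
        for row, b in zip(hyper_weights, hyper_bias)
    ]
    return params[:-1], params[-1]

def predict_interaction(compound_features, weights, bias):
    """Logistic prediction of the task model generated for one protein."""
    logit = sum(w * x for w, x in zip(weights, compound_features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Toy example: 2-d protein embedding, 3-d compound features (all values invented).
protein = [1.0, 0.0]
hyper_weights = [[0.5, 0.0], [0.0, 0.5], [-1.0, 0.0], [0.0, 0.0]]
hyper_bias = [0.0, 0.0, 0.0, 0.0]

task_weights, task_bias = hypernetwork(protein, hyper_weights, hyper_bias)
probability = predict_interaction([2.0, 1.0, 1.0], task_weights, task_bias)
```

Passing a different `protein` vector produces a different task model from the same hypernetwork parameters, which is the property that zero-shot prediction on unseen targets relies on.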
2.
J Chem Inf Model ; 62(9): 2111-2120, 2022 05 09.
Article in English | MEDLINE | ID: mdl-35034452

ABSTRACT

Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials. To find such routes, computer-assisted synthesis planning (CASP) methods are employed, which rely on a single-step model of chemical reactivity. In this study, we introduce a template-based single-step retrosynthesis model based on Modern Hopfield Networks, which learn an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule. The template representation allows generalization across different reactions and significantly improves the performance of template relevance prediction, especially for templates with few or zero training examples. With inference speed up to orders of magnitude faster than baseline methods, we improve or match the state-of-the-art performance for top-k exact match accuracy for k ≥ 3 in the retrosynthesis benchmark USPTO-50k. Code to reproduce the results is available at github.com/ml-jku/mhn-react.
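
As a rough illustration of template relevance prediction by association, the sketch below scores a molecule encoding against a set of template encodings and normalizes the scores with a softmax, which is the core retrieval step of a Modern Hopfield layer. The encoders themselves are omitted; the vectors, the scaling parameter `beta`, and the function names are hypothetical and not taken from the mhn-react code.

```python
import math

def template_relevance(molecule_vec, template_vecs, beta=1.0):
    """Softmax over scaled dot products between a molecule and each template."""
    scores = [
        beta * sum(m * t for m, t in zip(molecule_vec, template_vec))
        for template_vec in template_vecs
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probabilities, k):
    """Indices of the k most relevant templates, best first."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return order[:k]

# Toy encodings (invented): three templates in a 2-d embedding space.
probs = template_relevance([1.0, 0.0],
                           [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                           beta=2.0)
best_two = top_k(probs, 2)
```

Because templates are represented by encodings rather than by class indices, a template with few or no training examples can still receive a sensible score if its encoding is close to that of related templates.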

3.
J Med Syst ; 46(5): 23, 2022 Mar 29.
Article in English | MEDLINE | ID: mdl-35348909

ABSTRACT

Many previous studies claim to have developed machine learning models that diagnose COVID-19 from blood tests. However, we hypothesize that changes in the underlying distribution of the data, so-called domain shifts, affect the predictive performance and reliability and are a reason for the failure of such machine learning models in clinical application. Domain shifts can be caused, e.g., by changes in the disease prevalence (spreading or tested population), by refined RT-PCR testing procedures (way of taking samples, laboratory procedures), or by virus mutations. Therefore, machine learning models for diagnosing COVID-19 or other diseases may not be reliable and degrade in performance over time. We investigate whether domain shifts are present in COVID-19 datasets and how they affect machine learning methods. We further set out to estimate the mortality risk based on routinely acquired blood tests in a hospital setting throughout pandemics and under domain shifts. We reveal domain shifts by evaluating the models on a large-scale dataset with different assessment strategies, such as temporal validation. We present the novel finding that domain shifts strongly affect machine learning models for COVID-19 diagnosis and deteriorate their predictive performance and credibility. Therefore, frequent re-training and re-assessment are indispensable for robust models enabling clinical utility.


Subjects
COVID-19 , COVID-19/diagnosis , COVID-19 Testing , Hematologic Tests , Humans , Machine Learning , Reproducibility of Results
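
Temporal validation, as mentioned in the abstract above, can be sketched as a split by date rather than at random: the model is fitted on earlier samples and evaluated only on later ones, so any drift over time shows up as a drop in test performance. The record layout, the cutoff date, and the function name below are assumptions for illustration; the study's actual protocol is not given in the abstract.

```python
from datetime import date

def temporal_split(records, cutoff):
    """Train on samples strictly before the cutoff, test on the rest."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test

# Invented example records; "label" stands for a test outcome.
records = [
    {"date": date(2020, 3, 1), "label": 1},
    {"date": date(2020, 9, 15), "label": 0},
    {"date": date(2021, 2, 1), "label": 1},
]
train, test = temporal_split(records, date(2021, 1, 1))
```

A random split would mix early and late samples in both sets and could hide exactly the degradation that this kind of evaluation is meant to expose.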
4.
J Lipid Res ; 60(11): 1922-1934, 2019 11.
Article in English | MEDLINE | ID: mdl-31530576

ABSTRACT

During pregnancy, extravillous trophoblasts (EVTs) invade the maternal decidua and remodel the local vasculature to establish blood supply for the growing fetus. Compromised EVT function has been linked to aberrant pregnancy associated with maternal and fetal morbidity and mortality. However, metabolic features of this invasive trophoblast subtype are largely unknown. Using primary human trophoblasts isolated from first trimester placental tissues, we show that cellular cholesterol homeostasis is differentially regulated in EVTs compared with villous cytotrophoblasts. Utilizing RNA-sequencing, gene set-enrichment analysis, and functional validation, we provide evidence that EVTs display increased levels of free and esterified cholesterol. Accordingly, EVTs are characterized by increased expression of the HDL-receptor, scavenger receptor class B type I, and reduced expression of the LXR and its target genes. We further reveal that EVTs express elevated levels of hydroxy-delta-5-steroid dehydrogenase 3 beta- and steroid delta-isomerase 1 (HSD3B1) (a rate-limiting enzyme in progesterone synthesis) and are capable of secreting progesterone. Increasing cholesterol export by LXR activation reduced progesterone secretion in an ABCA1-dependent manner. Importantly, HSD3B1 expression was decreased in EVTs of idiopathic recurrent spontaneous abortions, pointing toward compromised progesterone metabolism in EVTs of early miscarriages. Here, we provide insights into the regulation of cholesterol and progesterone metabolism in trophoblastic subtypes and its putative relevance in human miscarriage.


Subjects
Abortion, Habitual/metabolism , Cholesterol/metabolism , Progesterone/metabolism , Trophoblasts/metabolism , Computational Biology , Female , Homeostasis , Humans , Pregnancy , Sequence Analysis, RNA
5.
Bioinformatics ; 34(9): 1538-1546, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29253077

ABSTRACT

Motivation: While drug combination therapies are a well-established concept in cancer treatment, identifying novel synergistic combinations is challenging due to the size of combinatorial space. However, computational approaches have emerged as a time- and cost-efficient way to prioritize combinations to test, based on recently available large-scale combination screening data. Recently, Deep Learning has had an impact in many research areas by achieving new state-of-the-art model performance. However, Deep Learning has not yet been applied to drug synergy prediction, which is the approach we present here, termed DeepSynergy. DeepSynergy uses chemical and genomic information as input information, a normalization strategy to account for input data heterogeneity, and conical layers to model drug synergies. Results: DeepSynergy was compared to other machine learning methods such as Gradient Boosting Machines, Random Forests, Support Vector Machines and Elastic Nets on the largest publicly available synergy dataset with respect to mean squared error. DeepSynergy significantly outperformed the other methods with an improvement of 7.2% over the second best method at the prediction of novel drug combinations within the space of explored drugs and cell lines. At this task, the mean Pearson correlation coefficient between the measured and the predicted values of DeepSynergy was 0.73. Applying DeepSynergy for classification of these novel drug combinations resulted in a high predictive performance of an AUC of 0.90. Furthermore, we found that all compared methods exhibit low predictive performance when extrapolating to unexplored drugs or cell lines, which we suggest is due to limitations in the size and diversity of the dataset. We envision that DeepSynergy could be a valuable tool for selecting novel synergistic drug combinations. Availability and implementation: DeepSynergy is available via www.bioinf.jku.at/software/DeepSynergy. Contact: klambauer@bioinf.jku.at. 
Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Computational Biology/methods , Deep Learning , Gene Expression Profiling/methods , Neoplasms/drug therapy , Software , Antineoplastic Combined Chemotherapy Protocols/pharmacology , Cell Line, Tumor , Gene Expression Regulation, Neoplastic , Humans , Neoplasms/genetics , Support Vector Machine
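
The abstract above compares methods by mean squared error and reports a Pearson correlation between measured and predicted synergy values. A minimal sketch of both metrics follows; the function names and the toy numbers are illustrative assumptions and have no connection to the DeepSynergy data.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Invented synergy scores, for illustration only.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
mse = mean_squared_error(measured, predicted)
r = pearson(measured, [2.0, 4.0, 6.0])
```

The two metrics answer different questions: mean squared error penalizes absolute deviations, while the correlation only measures whether predictions rank and scale with the measurements.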
6.
J Chem Inf Model ; 59(3): 1163-1171, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30840449

ABSTRACT

Predicting the outcome of biological assays based on high-throughput imaging data is a highly promising task in drug discovery since it can tremendously increase hit rates and suggest novel chemical scaffolds. However, end-to-end learning with convolutional neural networks (CNNs) has not been assessed for the task of biological assay prediction despite the success of these networks at visual recognition. We compared several CNNs trained directly on high-throughput imaging data to a) CNNs trained on cell-centric crops and to b) the current state-of-the-art: fully connected networks trained on precalculated morphological cell features. The comparison was performed on the Cell Painting data set, the largest publicly available data set of microscopic images of cells with approximately 30,000 compound treatments. We found that CNNs perform significantly better at predicting the outcome of assays than fully connected networks operating on precomputed morphological features of cells. Surprisingly, the best performing method could predict 32% of the 209 biological assays at high predictive performance (AUC > 0.9), indicating that the cell morphology changes contain a large amount of information about compound activities. Our results suggest that many biological assays could be replaced by high-throughput imaging together with convolutional neural networks and that the costly cell segmentation and feature extraction step can be replaced by convolutional neural networks.


Subjects
Biological Assay , Computational Biology/methods , Microscopy , Neural Networks, Computer , Image Processing, Computer-Assisted
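
The headline figure in the abstract above is a per-assay statistic: an AUC is computed for each assay, and the fraction of assays exceeding 0.9 is reported. A small sketch of that bookkeeping follows, using the pairwise-comparison definition of the ROC AUC. The function names and toy values are assumptions for illustration only.

```python
def roc_auc(labels, scores):
    """Probability that a random active is scored above a random inactive.

    Ties count as half a win. Labels are 1 (active) or 0 (inactive).
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

def fraction_above(aucs, threshold=0.9):
    """Share of assays whose AUC exceeds the threshold."""
    return sum(a > threshold for a in aucs) / len(aucs)

# Invented predictions for a single assay.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
share = fraction_above([0.95, 0.70, 0.92, 0.60])
```

Applied to the numbers in the abstract, 32% of 209 assays corresponds to roughly 67 assays with AUC above 0.9.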
7.
J Chem Inf Model ; 59(3): 962-972, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30408959

ABSTRACT

The volume of high throughput screening data has considerably increased since the beginning of the automated biochemical and cell-based assays era. This information-rich data source provides tremendous repurposing opportunities for data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput fingerprints (HTSFPs) as a new type of descriptor describing molecular bioactivity profiles which can be applied in virtual screening, iterative screening, and target deconvolution. However, so far, studies around HTSFPs and machine learning have mainly focused on predicting the outcome of molecules in single high-throughput assays, and no one has reported the modeling of compounds' biochemical assay activities toward a panel of target proteins. In this article, we aim at comparing how our in-house HTSFPs perform at this when combined with multitask deep learning versus the single task support vector machine method both in terms of hit identification and of scaffold hopping potential. Performances obtained from the two HTSFP models were reported with respect to the performances of multitask deep learning and support vector machine models built with the structural descriptors ECFP. Moreover, we investigated the effect of high throughput screening false positives and negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance and diverse hits with very little overlap, thus demonstrating the orthogonality of bioactivity profile-based descriptors with structural descriptors. Therefore, modeling compound activity data using ECFPs together with HTSFPs increases the scaffold hopping potential of the predictive models.


Subjects
Drug Evaluation, Preclinical/methods , High-Throughput Screening Assays/methods , Machine Learning , Neural Networks, Computer
8.
Drug Discov Today Technol ; 32-33: 55-63, 2019 Dec.
Article in English | MEDLINE | ID: mdl-33386095

ABSTRACT

There has been a wave of generative models for molecules triggered by advances in the field of Deep Learning. These generative models are often used to optimize chemical compounds towards particular properties or a desired biological activity. The evaluation of generative models remains challenging and suggested performance metrics or scoring functions often do not cover all relevant aspects of drug design projects. In this work, we highlight some unintended failure modes in molecular generation and optimization and how these evade detection by current performance metrics.


Subjects
Drug Discovery , Models, Molecular , Humans
10.
J Chem Inf Model ; 58(9): 1736-1741, 2018 09 24.
Article in English | MEDLINE | ID: mdl-30118593

ABSTRACT

The new wave of successful generative models in machine learning has increased the interest in deep learning driven de novo drug design. However, method comparison is difficult because of various flaws of the currently employed evaluation metrics. We propose an evaluation metric for generative models called Fréchet ChemNet distance (FCD). The advantage of the FCD over previous metrics is that it can detect whether generated molecules are diverse and have similar chemical and biological properties as real molecules.


Subjects
Deep Learning , Drug Discovery , Computer Simulation , Databases, Factual , Models, Molecular , Software
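
The abstract above does not state the formula behind the FCD. For orientation, the standard Fréchet distance between two Gaussian distributions is given below; the notation is introduced here and is an assumption: $\mu_g, C_g$ denote the mean and covariance of network activations for generated molecules, and $\mu_r, C_r$ those for real molecules.

```latex
d^2\big((\mu_g, C_g), (\mu_r, C_r)\big)
  = \lVert \mu_g - \mu_r \rVert_2^2
  + \operatorname{Tr}\!\left( C_g + C_r - 2\,(C_g C_r)^{1/2} \right)
```

The first term compares the average activations of the two sets, and the trace term compares their spread, which is why a metric of this form can penalize a generator that produces plausible but insufficiently diverse molecules.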
11.
Hum Mutat ; 38(7): 889-897, 2017 07.
Article in English | MEDLINE | ID: mdl-28449315

ABSTRACT

Targeted next-generation-sequencing (NGS) panels have largely replaced Sanger sequencing in clinical diagnostics. They allow for the detection of copy-number variations (CNVs) in addition to single-nucleotide variants and small insertions/deletions. However, existing computational CNV detection methods have shortcomings regarding accuracy, quality control (QC), incidental findings, and user-friendliness. We developed panelcn.MOPS, a novel pipeline for detecting CNVs in targeted NGS panel data. Using data from 180 samples, we compared panelcn.MOPS with five state-of-the-art methods. With panelcn.MOPS leading the field, most methods achieved comparably high accuracy. panelcn.MOPS reliably detected CNVs ranging in size from part of a region of interest (ROI), to whole genes, which may comprise all ROIs investigated in a given sample. The latter is enabled by analyzing reads from all ROIs of the panel, but presenting results exclusively for user-selected genes, thus avoiding incidental findings. Additionally, panelcn.MOPS offers QC criteria not only for samples, but also for individual ROIs within a sample, which increases the confidence in called CNVs. panelcn.MOPS is freely available both as R package and standalone software with graphical user interface that is easy to use for clinical geneticists without any programming experience. panelcn.MOPS combines high sensitivity and specificity with user-friendliness rendering it highly suitable for routine clinical diagnostics.


Subjects
Computational Biology , DNA Copy Number Variations , Databases, Genetic , Algorithms , Computer Graphics , Gene Library , High-Throughput Nucleotide Sequencing , Humans , Quality Control , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, DNA , Software , User-Computer Interface
13.
Bioinformatics ; 31(20): 3392-4, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26088801

ABSTRACT

We have developed Rchemcpp, a web service that identifies structurally similar compounds (structural analogs) in large-scale molecule databases. The service allows compounds to be queried in the widely used ChEMBL, DrugBank and the Connectivity Map databases. Rchemcpp utilizes the best performing similarity functions, i.e. molecule kernels, as measures for structural similarity. Molecule kernels have demonstrated superior performance over other similarity measures and are currently excelling at machine learning challenges. To considerably reduce computational time, and thereby make it feasible as a web service, a novel efficient prefiltering strategy has been developed, which maintains the sensitivity of the method. By exploiting information contained in public databases, the web service facilitates many applications crucial for the drug development process, such as prioritizing compounds after screening or reducing adverse side effects during late phases. Rchemcpp was used in the DeepTox pipeline that has won the Tox21 Data Challenge and is frequently used by researchers in pharmaceutical companies. AVAILABILITY AND IMPLEMENTATION: The web service and the R package are freely available via http://shiny.bioinf.jku.at/Analoging/ and via Bioconductor. CONTACT: hochreit@bioinf.jku.at SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Databases, Chemical , Drug Discovery , Software , Gene Expression/drug effects , Internet , Machine Learning
14.
Nucleic Acids Res ; 41(21): e198, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24049071

ABSTRACT

Detection of differential expression in RNA-Seq data is currently limited to studies in which two or more sample conditions are known a priori. However, these biological conditions are typically unknown in cohort, cross-sectional and nonrandomized controlled studies such as the HapMap, the ENCODE or the 1000 Genomes project. We present DEXUS for detecting differential expression in RNA-Seq data for which the sample conditions are unknown. DEXUS models read counts as a finite mixture of negative binomial distributions in which each mixture component corresponds to a condition. A transcript is considered differentially expressed if modeling of its read counts requires more than one condition. DEXUS decomposes read count variation into variation due to noise and variation due to differential expression. Evidence of differential expression is measured by the informative/noninformative (I/NI) value, which allows differentially expressed transcripts to be extracted at a desired specificity (significance level) or sensitivity (power). DEXUS performed excellently in identifying differentially expressed transcripts in data with unknown conditions. On 2400 simulated data sets, I/NI value thresholds of 0.025, 0.05 and 0.1 yielded average specificities of 92, 97 and 99% at sensitivities of 76, 61 and 38%, respectively. On real-world data sets, DEXUS was able to detect differentially expressed transcripts related to sex, species, tissue, structural variants or quantitative trait loci. The DEXUS R package is publicly available from Bioconductor and the scripts for all experiments are available at http://www.bioinf.jku.at/software/dexus/.


Subjects
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Software , Animals , HapMap Project , Humans , Liver/metabolism , Macaca mulatta , Pan troglodytes , Plant Leaves/genetics , Plant Leaves/metabolism , Zea mays/genetics , Zea mays/metabolism
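
The modeling idea in the abstract above, read counts as a finite mixture of negative binomial distributions with one component per condition, can be sketched as a likelihood comparison. This is only an illustration under stated assumptions: the mean/size parameterization, the fixed parameters, and the function names are chosen here, and the sketch compares plain likelihoods rather than computing the I/NI value used by DEXUS.

```python
import math

def neg_binomial_pmf(k, mean, size):
    """Negative binomial probability of count k, given mean and dispersion size."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mean))
        + k * math.log(mean / (size + mean))
    )
    return math.exp(log_p)

def mixture_log_likelihood(counts, weights, means, size):
    """Log-likelihood of counts under a mixture with one component per condition."""
    return sum(
        math.log(sum(w * neg_binomial_pmf(k, m, size)
                     for w, m in zip(weights, means)))
        for k in counts
    )

# Invented read counts of one transcript across eight samples.
counts = [10, 12, 9, 11, 100, 95, 105, 98]
one_condition = mixture_log_likelihood(counts, [1.0], [55.0], size=10.0)
two_conditions = mixture_log_likelihood(counts, [0.5, 0.5], [10.5, 99.5], size=10.0)
```

For these toy counts the two-component mixture fits far better than a single component, which mirrors the criterion in the abstract: a transcript is considered differentially expressed if modeling its read counts requires more than one condition.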
15.
16.
Nucleic Acids Res ; 40(9): e69, 2012 May.
Article in English | MEDLINE | ID: mdl-22302147

ABSTRACT

Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose 'Copy Number estimation by a Mixture Of PoissonS' (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1-FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.


Subjects
DNA Copy Number Variations , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Chromosomes, Human, X/chemistry , HapMap Project , Humans , Male , Poisson Distribution
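
The mixture-of-Poissons idea in the abstract above can be sketched for a single genomic position: each candidate integer copy number is one mixture component, and Bayes' rule gives a posterior over copy numbers for a sample's read count. The rate scaling (expected reads proportional to copy number divided by 2), the small rate used for copy number 0, the fixed priors, and the function names are assumptions made for this illustration; the fitting procedure of cn.MOPS itself is not reproduced.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing k reads at rate lam (lam > 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def copy_number_posterior(read_count, diploid_rate, priors, zero_factor=0.05):
    """Posterior over integer copy numbers for one sample at one position.

    priors maps copy number -> prior probability. A copy number c > 0 is
    assumed to give an expected read count of diploid_rate * c / 2; copy
    number 0 gets a small positive rate so that stray reads are possible.
    """
    weighted = {}
    for copy_number, prior in priors.items():
        factor = copy_number / 2 if copy_number > 0 else zero_factor
        weighted[copy_number] = prior * poisson_pmf(read_count, diploid_rate * factor)
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}

# Invented numbers: 100 reads are expected for a diploid sample.
priors = {1: 0.1, 2: 0.8, 3: 0.1}
normal = copy_number_posterior(100, 100.0, priors)
gain = copy_number_posterior(150, 100.0, priors)
```

Because the comparison is made across samples at the same position, a region that is uniformly hard to sequence changes `diploid_rate` for everyone rather than producing a spurious call, which matches the abstract's point about read count variations along chromosomes.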
17.
Nucleic Acids Res ; 39(12): e79, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21486749

ABSTRACT

Cost-effective oligonucleotide genotyping arrays like the Affymetrix SNP 6.0 are still the predominant technique to measure DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and, consequently, suffer from a high false discovery rate (FDR). A high FDR means that many CNVs are wrongly detected and therefore not associated with a disease in a clinical study, though correction for multiple testing takes them into account and thereby decreases the study's discovery power. For controlling the FDR, we propose a probabilistic latent variable model, 'cn.FARMS', which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples from which the posterior can only deviate by strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html.


Subjects
DNA Copy Number Variations , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , Software , Algorithms , Alleles , Computational Biology , Polymorphism, Single Nucleotide
18.
Nat Commun ; 14(1): 7339, 2023 11 13.
Article in English | MEDLINE | ID: mdl-37957207

ABSTRACT

The field of bioimage analysis is currently impacted by a profound transformation, driven by the advancements in imaging technologies and artificial intelligence. The emergence of multi-modal AI systems could allow extracting and utilizing knowledge from bioimaging databases based on information from other data modalities. We leverage the multi-modal contrastive learning paradigm, which enables the embedding of both bioimages and chemical structures into a unified space by means of bioimage and molecular structure encoders. This common embedding space unlocks the possibility of querying bioimaging databases with chemical structures that induce different phenotypic effects. Concretely, in this work we show that a retrieval system based on multi-modal contrastive learning is capable of identifying the correct bioimage corresponding to a given chemical structure from a database of ~2000 candidate images with a top-1 accuracy >70 times higher than a random baseline. Additionally, the bioimage encoder demonstrates remarkable transferability to various further prediction tasks within the domain of drug discovery, such as activity prediction, molecule classification, and mechanism of action identification. Thus, our approach not only addresses the current limitations of bioimaging databases but also paves the way towards foundation models for microscopy images.


Subjects
Artificial Intelligence , Learning , Databases, Factual , Drug Discovery , Knowledge
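
The retrieval evaluation in the abstract above can be sketched as follows: each chemical-structure embedding queries a set of candidate image embeddings in the shared space, and top-1 accuracy counts how often the nearest image is the correct one. The encoders are omitted, and the use of cosine similarity, the toy vectors, and the function names are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top1_accuracy(query_embeddings, image_embeddings):
    """Fraction of queries whose most similar image is the matching one.

    Query i is assumed to correspond to image i.
    """
    hits = 0
    for i, query in enumerate(query_embeddings):
        best = max(range(len(image_embeddings)),
                   key=lambda j: cosine(query, image_embeddings[j]))
        hits += best == i
    return hits / len(query_embeddings)

def random_baseline(num_candidates):
    """Expected top-1 accuracy when picking one candidate uniformly at random."""
    return 1.0 / num_candidates

# Invented 2-d embeddings for two structure/image pairs.
accuracy = top1_accuracy([[1.0, 0.0], [0.0, 1.0]],
                         [[0.9, 0.1], [0.2, 0.8]])
baseline = random_baseline(2000)
```

Assuming exactly 2000 candidates, the random baseline is 0.05%, so a top-1 accuracy more than 70 times higher corresponds to more than about 3.5%.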
19.
MAbs ; 14(1): 2031482, 2022.
Article in English | MEDLINE | ID: mdl-35377271

ABSTRACT

Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D-structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of antibody design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one dimensional: 1D) data can be used to design conformational (three dimensional: 3D) epitope-specific antibodies, matching, or exceeding the training dataset in affinity and developability parameter value variety. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.


Subjects
Antigen-Antibody Reactions , Machine Learning , Antibodies, Monoclonal/chemistry , Binding Sites, Antibody , Epitopes
20.
Nat Comput Sci ; 2(12): 845-865, 2022 Dec.
Article in English | MEDLINE | ID: mdl-38177393

ABSTRACT

Machine learning (ML) is a key technology for accurate prediction of antibody-antigen binding. Two orthogonal problems hinder the application of ML to antibody-specificity prediction and the benchmarking thereof: the lack of a unified ML formalization of immunological antibody-specificity prediction problems and the unavailability of large-scale synthetic datasets to benchmark real-world relevant ML methods and dataset design. Here we developed the Absolut! software suite that enables parameter-based unconstrained generation of synthetic lattice-based three-dimensional antibody-antigen-binding structures with ground-truth access to conformational paratope, epitope and affinity. We formalized common immunological antibody-specificity prediction problems as ML tasks and confirmed that for both sequence- and structure-based tasks, accuracy-based rankings of ML methods trained on experimental data hold for ML methods trained on Absolut!-generated data. The Absolut! framework has the potential to enable real-world relevant development and benchmarking of ML strategies for biotherapeutics design.


Subjects
Antibodies , Antigen-Antibody Reactions , Antibody Specificity , Epitopes/chemistry , Machine Learning