Results 1 - 20 of 29
1.
J Chem Inf Model ; 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39029090

ABSTRACT

Since the rise of generative AI models, many goal-directed molecule generators have been proposed as tools for discovering novel drug candidates. However, molecule generators often produce highly similar molecules and tend to overemphasize conformity to an imperfect scoring function rather than capturing the true underlying properties sought. We rectify these two shortcomings by offering diversity-based evaluations using the #Circles metric and considering constraints on scoring function calls or computation time. Our findings highlight the superior performance of SMILES-based autoregressive models in generating diverse sets of desired molecules compared to graph-based models or genetic algorithms.
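A minimal sketch of a diversity count in the spirit of the #Circles metric named above, assuming its common description: the size of the largest subset of generated molecules whose pairwise distances all exceed a threshold. The greedy pass below only gives a lower bound on that number. The fingerprints are hypothetical bit sets, and the function names and the threshold value are illustrative, not taken from the paper.

```python
from typing import FrozenSet, List


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Return 1 - |a & b| / |a | b| for two sets of 'on' fingerprint bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def greedy_circles(fingerprints: List[FrozenSet[int]], threshold: float) -> int:
    """Greedy lower bound on the number of mutually distant molecules.

    A molecule becomes a new 'circle centre' only if its distance to every
    centre chosen so far is strictly greater than the threshold.
    """
    centers: List[FrozenSet[int]] = []
    for fp in fingerprints:
        if all(tanimoto_distance(fp, c) > threshold for c in centers):
            centers.append(fp)
    return len(centers)


# Toy fingerprints (hypothetical bit sets, not real molecules).
toy = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9})]
count = greedy_circles(toy, threshold=0.75)
```

With this toy input the first two fingerprints are at distance 0.5 and collapse into one circle at a threshold of 0.75, so a generator that emits many near-duplicates is not rewarded for them.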

2.
J Chem Inf Model ; 64(7): 2539-2553, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38185877

ABSTRACT

A central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past few decades, various quantitative structure-activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR and PCM approaches have recently been improved by machine learning and deep neural networks, which allow the development of drug-target interaction prediction models from measurement data. However, deep neural networks typically require large amounts of training data and cannot robustly adapt to new tasks, such as predicting interactions for unseen protein targets at inference time. In this work, we propose to use HyperNetworks to efficiently transfer information between tasks during inference and thus to accurately predict drug-target interactions on unseen protein targets. Our HyperPCM method reaches state-of-the-art performance compared to previous methods on multiple well-known benchmarks, including Davis, DUD-E, and a ChEMBL-derived data set, and particularly excels at zero-shot inference involving unseen protein targets. Our method, as well as reproducible data preparation, is available at https://github.com/ml-jku/hyper-dti.
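The hypernetwork idea can be illustrated with a toy sketch: one network takes the protein embedding and outputs the parameters of a second, small predictor that is then applied to the molecule embedding. Everything below (the linear hypernetwork, the embedding sizes, the names `hyper_predict` and `hyper_weights`) is a hypothetical simplification and not the HyperPCM architecture.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def hyper_predict(protein_emb: Vector, molecule_emb: Vector, hyper_weights: Matrix) -> float:
    """Predict an interaction probability for one protein-molecule pair.

    The (here linear) hypernetwork maps the protein embedding to the weights
    and bias of a logistic classifier over the molecule embedding, so an
    unseen protein gets its own classifier without any retraining.
    """
    generated = matvec(hyper_weights, protein_emb)  # length = len(molecule_emb) + 1
    weights, bias = generated[:-1], generated[-1]
    logit = sum(w * x for w, x in zip(weights, molecule_emb)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 2-dimensional embeddings and hypernetwork parameters.
hyper_weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
molecule = [2.0, 1.0]
p_a = hyper_predict([1.0, -1.0], molecule, hyper_weights)   # protein A
p_b = hyper_predict([-1.0, 1.0], molecule, hyper_weights)   # protein B
```

The same molecule receives different predictions for the two proteins because each protein embedding generates a different classifier.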


Subject(s)
Machine Learning , Neural Networks, Computer , Proteins , Drug Development , Drug Discovery
3.
J Chem Inf Model ; 62(9): 2111-2120, 2022 May 09.
Article in English | MEDLINE | ID: mdl-35034452

ABSTRACT

Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials. To find such routes, computer-assisted synthesis planning (CASP) methods are employed, which rely on a single-step model of chemical reactivity. In this study, we introduce a template-based single-step retrosynthesis model based on Modern Hopfield Networks, which learn an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule. The template representation allows generalization across different reactions and significantly improves the performance of template relevance prediction, especially for templates with few or zero training examples. With inference speed up to orders of magnitude faster than baseline methods, we improve or match the state-of-the-art performance for top-k exact match accuracy for k ≥ 3 in the retrosynthesis benchmark USPTO-50k. Code to reproduce the results is available at github.com/ml-jku/mhn-react.
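The top-k exact match accuracy used for the benchmark comparison can be computed as in the sketch below. It assumes that each prediction is a ranked list of reactant strings and that predictions and ground truth have been canonicalized in the same way, so that string equality means an exact match. The example strings are placeholders, not USPTO-50k entries.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   targets: Sequence[str], k: int) -> float:
    """Fraction of samples whose true reactants appear among the first k predictions."""
    hits = sum(1 for preds, target in zip(ranked_predictions, targets)
               if target in preds[:k])
    return hits / len(targets)


# Placeholder strings standing in for canonical reactant SMILES.
predictions = [
    ["CCO.CC(=O)O", "CCO", "CC(=O)Cl.CCO"],
    ["c1ccccc1Br", "c1ccccc1I", "c1ccccc1Cl"],
    ["CN", "CNC", "CCN"],
]
truth = ["CCO.CC(=O)O", "c1ccccc1Cl", "CCCN"]

top1 = top_k_accuracy(predictions, truth, k=1)
top3 = top_k_accuracy(predictions, truth, k=3)
```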

4.
J Med Syst ; 46(5): 23, 2022 Mar 29.
Article in English | MEDLINE | ID: mdl-35348909

ABSTRACT

Many previous studies claim to have developed machine learning models that diagnose COVID-19 from blood tests. However, we hypothesize that changes in the underlying distribution of the data, so-called domain shifts, affect the predictive performance and reliability and are a reason for the failure of such machine learning models in clinical application. Domain shifts can be caused, e.g., by changes in the disease prevalence (spreading or tested population), by refined RT-PCR testing procedures (way of taking samples, laboratory procedures), or by virus mutations. Therefore, machine learning models for diagnosing COVID-19 or other diseases may not be reliable and degrade in performance over time. We investigate whether domain shifts are present in COVID-19 datasets and how they affect machine learning methods. We further set out to estimate the mortality risk based on routinely acquired blood tests in a hospital setting throughout pandemics and under domain shifts. We reveal domain shifts by evaluating the models on a large-scale dataset with different assessment strategies, such as temporal validation. We present the novel finding that domain shifts strongly affect machine learning models for COVID-19 diagnosis and deteriorate their predictive performance and credibility. Therefore, frequent re-training and re-assessment are indispensable for robust models enabling clinical utility.
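Temporal validation, as mentioned in the abstract, can be sketched as a split by collection date rather than at random: the model is trained on older samples and evaluated only on newer ones, which exposes a domain shift that a random split would hide. The record layout, the cutoff date, and the values below are hypothetical.

```python
from datetime import date
from typing import List, Tuple

# (collection date, blood-test features, label); a hypothetical record layout.
Sample = Tuple[date, List[float], int]


def temporal_split(samples: List[Sample], cutoff: date) -> Tuple[List[Sample], List[Sample]]:
    """Train on samples collected before the cutoff, test on the cutoff date and later."""
    train = [s for s in samples if s[0] < cutoff]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test


# Made-up records for illustration only.
records: List[Sample] = [
    (date(2020, 3, 15), [4.1, 120.0], 1),
    (date(2020, 6, 2), [5.3, 140.0], 0),
    (date(2020, 11, 20), [3.9, 118.0], 1),
    (date(2021, 1, 8), [5.0, 133.0], 0),
]
train_set, test_set = temporal_split(records, cutoff=date(2020, 10, 1))
```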


Subject(s)
COVID-19 , COVID-19/diagnosis , COVID-19 Testing , Hematologic Tests , Humans , Machine Learning , Reproducibility of Results
5.
J Lipid Res ; 60(11): 1922-1934, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31530576

ABSTRACT

During pregnancy, extravillous trophoblasts (EVTs) invade the maternal decidua and remodel the local vasculature to establish blood supply for the growing fetus. Compromised EVT function has been linked to aberrant pregnancy associated with maternal and fetal morbidity and mortality. However, metabolic features of this invasive trophoblast subtype are largely unknown. Using primary human trophoblasts isolated from first trimester placental tissues, we show that cellular cholesterol homeostasis is differentially regulated in EVTs compared with villous cytotrophoblasts. Utilizing RNA-sequencing, gene set-enrichment analysis, and functional validation, we provide evidence that EVTs display increased levels of free and esterified cholesterol. Accordingly, EVTs are characterized by increased expression of the HDL-receptor, scavenger receptor class B type I, and reduced expression of the LXR and its target genes. We further reveal that EVTs express elevated levels of hydroxy-delta-5-steroid dehydrogenase 3 beta- and steroid delta-isomerase 1 (HSD3B1) (a rate-limiting enzyme in progesterone synthesis) and are capable of secreting progesterone. Increasing cholesterol export by LXR activation reduced progesterone secretion in an ABCA1-dependent manner. Importantly, HSD3B1 expression was decreased in EVTs of idiopathic recurrent spontaneous abortions, pointing toward compromised progesterone metabolism in EVTs of early miscarriages. Here, we provide insights into the regulation of cholesterol and progesterone metabolism in trophoblastic subtypes and its putative relevance in human miscarriage.


Subject(s)
Abortion, Habitual/metabolism , Cholesterol/metabolism , Progesterone/metabolism , Trophoblasts/metabolism , Computational Biology , Female , Homeostasis , Humans , Pregnancy , Sequence Analysis, RNA
6.
Bioinformatics ; 34(9): 1538-1546, 2018 May 01.
Article in English | MEDLINE | ID: mdl-29253077

ABSTRACT

Motivation: While drug combination therapies are a well-established concept in cancer treatment, identifying novel synergistic combinations is challenging due to the size of combinatorial space. However, computational approaches have emerged as a time- and cost-efficient way to prioritize combinations to test, based on recently available large-scale combination screening data. Recently, Deep Learning has had an impact in many research areas by achieving new state-of-the-art model performance. However, Deep Learning has not yet been applied to drug synergy prediction, which is the approach we present here, termed DeepSynergy. DeepSynergy uses chemical and genomic information as input information, a normalization strategy to account for input data heterogeneity, and conical layers to model drug synergies. Results: DeepSynergy was compared to other machine learning methods such as Gradient Boosting Machines, Random Forests, Support Vector Machines and Elastic Nets on the largest publicly available synergy dataset with respect to mean squared error. DeepSynergy significantly outperformed the other methods with an improvement of 7.2% over the second best method at the prediction of novel drug combinations within the space of explored drugs and cell lines. At this task, the mean Pearson correlation coefficient between the measured and the predicted values of DeepSynergy was 0.73. Applying DeepSynergy for classification of these novel drug combinations resulted in a high predictive performance of an AUC of 0.90. Furthermore, we found that all compared methods exhibit low predictive performance when extrapolating to unexplored drugs or cell lines, which we suggest is due to limitations in the size and diversity of the dataset. We envision that DeepSynergy could be a valuable tool for selecting novel synergistic drug combinations. Availability and implementation: DeepSynergy is available via www.bioinf.jku.at/software/DeepSynergy. Contact: klambauer@bioinf.jku.at. 
Supplementary information: Supplementary data are available at Bioinformatics online.
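The two regression measures reported above, mean squared error and the Pearson correlation coefficient between measured and predicted synergy values, can be computed as in this sketch. The numbers are made up and are not taken from the DeepSynergy dataset.

```python
import math
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up synergy scores for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
mse = mean_squared_error(measured, predicted)
r = pearson(measured, predicted)
```

The toy values show why both measures are reported: the predictions are perfectly correlated with the measurements yet still have a large squared error, because correlation ignores scale and offset.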


Subject(s)
Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Computational Biology/methods , Deep Learning , Gene Expression Profiling/methods , Neoplasms/drug therapy , Software , Antineoplastic Combined Chemotherapy Protocols/pharmacology , Cell Line, Tumor , Gene Expression Regulation, Neoplastic , Humans , Neoplasms/genetics , Support Vector Machine
7.
J Chem Inf Model ; 59(3): 1163-1171, 2019 Mar 25.
Article in English | MEDLINE | ID: mdl-30840449

ABSTRACT

Predicting the outcome of biological assays based on high-throughput imaging data is a highly promising task in drug discovery since it can tremendously increase hit rates and suggest novel chemical scaffolds. However, end-to-end learning with convolutional neural networks (CNNs) has not been assessed for the task of biological assay prediction despite the success of these networks at visual recognition. We compared several CNNs trained directly on high-throughput imaging data to a) CNNs trained on cell-centric crops and to b) the current state-of-the-art: fully connected networks trained on precalculated morphological cell features. The comparison was performed on the Cell Painting data set, the largest publicly available data set of microscopic images of cells with approximately 30,000 compound treatments. We found that CNNs perform significantly better at predicting the outcome of assays than fully connected networks operating on precomputed morphological features of cells. Surprisingly, the best performing method could predict 32% of the 209 biological assays at high predictive performance (AUC > 0.9), indicating that the cell morphology changes contain a large amount of information about compound activities. Our results suggest that many biological assays could be replaced by high-throughput imaging together with convolutional neural networks and that the costly cell segmentation and feature extraction step can be replaced by convolutional neural networks.


Subject(s)
Biological Assay , Computational Biology/methods , Microscopy , Neural Networks, Computer , Image Processing, Computer-Assisted
8.
J Chem Inf Model ; 59(3): 962-972, 2019 Mar 25.
Article in English | MEDLINE | ID: mdl-30408959

ABSTRACT

The volume of high-throughput screening data has considerably increased since the beginning of the automated biochemical and cell-based assays era. This information-rich data source provides tremendous repurposing opportunities for data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput fingerprints (HTSFPs) as a new type of descriptor describing molecular bioactivity profiles which can be applied in virtual screening, iterative screening, and target deconvolution. However, so far, studies around HTSFPs and machine learning have mainly focused on predicting the outcome of molecules in single high-throughput assays, and no one has reported the modeling of compounds' biochemical assay activities toward a panel of target proteins. In this article, we aim at comparing how our in-house HTSFPs perform at this task when combined with multitask deep learning versus the single-task support vector machine method, both in terms of hit identification and of scaffold hopping potential. Performances obtained from the two HTSFP models were reported with respect to the performances of multitask deep learning and support vector machine models built with the structural descriptors ECFP. Moreover, we investigated the effect of high-throughput screening false positives and negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance and diverse hits with very little overlap, thus demonstrating the orthogonality of bioactivity profile-based descriptors with structural descriptors. Therefore, modeling compound activity data using ECFPs together with HTSFPs increases the scaffold hopping potential of the predictive models.


Subject(s)
Drug Evaluation, Preclinical/methods , High-Throughput Screening Assays/methods , Machine Learning , Neural Networks, Computer
9.
Drug Discov Today Technol ; 32-33: 55-63, 2019 Dec.
Article in English | MEDLINE | ID: mdl-33386095

ABSTRACT

There has been a wave of generative models for molecules triggered by advances in the field of Deep Learning. These generative models are often used to optimize chemical compounds towards particular properties or a desired biological activity. The evaluation of generative models remains challenging and suggested performance metrics or scoring functions often do not cover all relevant aspects of drug design projects. In this work, we highlight some unintended failure modes in molecular generation and optimization and how these evade detection by current performance metrics.


Subject(s)
Drug Discovery , Models, Molecular , Humans
11.
J Chem Inf Model ; 58(9): 1736-1741, 2018 Sep 24.
Article in English | MEDLINE | ID: mdl-30118593

ABSTRACT

The new wave of successful generative models in machine learning has increased the interest in deep learning driven de novo drug design. However, method comparison is difficult because of various flaws of the currently employed evaluation metrics. We propose an evaluation metric for generative models called Fréchet ChemNet distance (FCD). The advantage of the FCD over previous metrics is that it can detect whether generated molecules are diverse and have similar chemical and biological properties as real molecules.
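For orientation, the Fréchet distance between two Gaussian distributions has the closed form below. Reading the name of the metric, a reasonable assumption is that $m$ and $C$ are the mean and covariance of ChemNet activations for the generated molecules and $m_w$ and $C_w$ those for real molecules. The symbols are chosen here for illustration and are not quoted from the abstract.

```latex
d^2\big((m, C),\,(m_w, C_w)\big)
  = \lVert m - m_w \rVert_2^2
  + \operatorname{Tr}\!\big(C + C_w - 2\,(C\,C_w)^{1/2}\big)
```

The first term compares the average activation and the trace term compares the spread, which fits the abstract's statement that the metric reacts to a lack of diversity as well as to a shift in properties.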


Subject(s)
Deep Learning , Drug Discovery , Computer Simulation , Databases, Factual , Models, Molecular , Software
12.
Hum Mutat ; 38(7): 889-897, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28449315

ABSTRACT

Targeted next-generation-sequencing (NGS) panels have largely replaced Sanger sequencing in clinical diagnostics. They allow for the detection of copy-number variations (CNVs) in addition to single-nucleotide variants and small insertions/deletions. However, existing computational CNV detection methods have shortcomings regarding accuracy, quality control (QC), incidental findings, and user-friendliness. We developed panelcn.MOPS, a novel pipeline for detecting CNVs in targeted NGS panel data. Using data from 180 samples, we compared panelcn.MOPS with five state-of-the-art methods. With panelcn.MOPS leading the field, most methods achieved comparably high accuracy. panelcn.MOPS reliably detected CNVs ranging in size from part of a region of interest (ROI), to whole genes, which may comprise all ROIs investigated in a given sample. The latter is enabled by analyzing reads from all ROIs of the panel, but presenting results exclusively for user-selected genes, thus avoiding incidental findings. Additionally, panelcn.MOPS offers QC criteria not only for samples, but also for individual ROIs within a sample, which increases the confidence in called CNVs. panelcn.MOPS is freely available both as R package and standalone software with graphical user interface that is easy to use for clinical geneticists without any programming experience. panelcn.MOPS combines high sensitivity and specificity with user-friendliness rendering it highly suitable for routine clinical diagnostics.


Subject(s)
Computational Biology , DNA Copy Number Variations , Databases, Genetic , Algorithms , Computer Graphics , Gene Library , High-Throughput Nucleotide Sequencing , Humans , Quality Control , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, DNA , Software , User-Computer Interface
14.
Bioinformatics ; 31(20): 3392-4, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26088801

ABSTRACT

We have developed Rchemcpp, a web service that identifies structurally similar compounds (structural analogs) in large-scale molecule databases. The service allows compounds to be queried in the widely used ChEMBL, DrugBank and the Connectivity Map databases. Rchemcpp utilizes the best performing similarity functions, i.e. molecule kernels, as measures for structural similarity. Molecule kernels have shown superior performance over other similarity measures and are currently excelling at machine learning challenges. To considerably reduce computational time, and thereby make it feasible as a web service, a novel efficient prefiltering strategy has been developed, which maintains the sensitivity of the method. By exploiting information contained in public databases, the web service facilitates many applications crucial for the drug development process, such as prioritizing compounds after screening or reducing adverse side effects during late phases. Rchemcpp was used in the DeepTox pipeline that has won the Tox21 Data Challenge and is frequently used by researchers in pharmaceutical companies. AVAILABILITY AND IMPLEMENTATION: The web service and the R package are freely available via http://shiny.bioinf.jku.at/Analoging/ and via Bioconductor. CONTACT: hochreit@bioinf.jku.at SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Databases, Chemical , Drug Discovery , Software , Gene Expression/drug effects , Internet , Machine Learning
15.
Nucleic Acids Res ; 41(21): e198, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24049071

ABSTRACT

Detection of differential expression in RNA-Seq data is currently limited to studies in which two or more sample conditions are known a priori. However, these biological conditions are typically unknown in cohort, cross-sectional and nonrandomized controlled studies such as the HapMap, the ENCODE or the 1000 Genomes project. We present DEXUS for detecting differential expression in RNA-Seq data for which the sample conditions are unknown. DEXUS models read counts as a finite mixture of negative binomial distributions in which each mixture component corresponds to a condition. A transcript is considered differentially expressed if modeling of its read counts requires more than one condition. DEXUS decomposes read count variation into variation due to noise and variation due to differential expression. Evidence of differential expression is measured by the informative/noninformative (I/NI) value, which allows differentially expressed transcripts to be extracted at a desired specificity (significance level) or sensitivity (power). DEXUS performed excellently in identifying differentially expressed transcripts in data with unknown conditions. On 2400 simulated data sets, I/NI value thresholds of 0.025, 0.05 and 0.1 yielded average specificities of 92, 97 and 99% at sensitivities of 76, 61 and 38%, respectively. On real-world data sets, DEXUS was able to detect differentially expressed transcripts related to sex, species, tissue, structural variants or quantitative trait loci. The DEXUS R package is publicly available from Bioconductor and the scripts for all experiments are available at http://www.bioinf.jku.at/software/dexus/.
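The modeling idea (read counts as a finite mixture of negative binomials, with more than one component needed when a transcript is differentially expressed) can be illustrated by comparing the log-likelihood of a one-component and a two-component fit. This is only a sketch with hand-picked parameters: the component means, the shared size parameter, and the function names are hypothetical, and it does not reproduce the estimation procedure of DEXUS or the I/NI value.

```python
import math
from typing import Sequence


def nb_logpmf(x: int, mean: float, size: float) -> float:
    """Log-probability of count x under a negative binomial with given mean and size."""
    p = size / (size + mean)
    return (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
            + size * math.log(p) + x * math.log(1.0 - p))


def mixture_loglik(counts: Sequence[int], weights: Sequence[float],
                   means: Sequence[float], size: float) -> float:
    """Log-likelihood of the counts under a negative binomial mixture."""
    total = 0.0
    for x in counts:
        terms = [math.log(w) + nb_logpmf(x, m, size) for w, m in zip(weights, means)]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


# Made-up read counts of one transcript across eight samples.
counts = [10, 12, 9, 11, 95, 105, 100, 98]
one_condition = mixture_loglik(counts, [1.0], [55.0], size=20.0)
two_conditions = mixture_loglik(counts, [0.5, 0.5], [10.5, 99.5], size=20.0)
```

Here the two-component model explains the counts far better than a single component, which is the kind of evidence that would mark the transcript as differentially expressed between unknown conditions.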


Subject(s)
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Software , Animals , HapMap Project , Humans , Liver/metabolism , Macaca mulatta , Pan troglodytes , Plant Leaves/genetics , Plant Leaves/metabolism , Zea mays/genetics , Zea mays/metabolism
16.
17.
Nucleic Acids Res ; 40(9): e69, 2012 May.
Article in English | MEDLINE | ID: mdl-22302147

ABSTRACT

Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose 'Copy Number estimation by a Mixture Of PoissonS' (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1-FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
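A toy sketch of the mixture-of-Poissons idea at a single genomic position: each integer copy number corresponds to a Poisson component whose expected read count scales with the copy number, and a sample's read count yields a posterior over copy numbers. It assumes that the expected count for copy number 2 and the component weights are already known, whereas the method described above estimates its parameters from the data across samples. The small factor `eps` for copy number 0 and all numeric values are hypothetical.

```python
import math
from typing import List, Sequence


def poisson_logpmf(x: int, lam: float) -> float:
    """Log-probability of count x under a Poisson distribution with mean lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def copy_number_posterior(read_count: int, lam_cn2: float,
                          prior: Sequence[float], eps: float = 0.05) -> List[float]:
    """Posterior over copy numbers 0..len(prior)-1 for one sample at one position.

    The expected read count for copy number i is lam_cn2 * i / 2; copy number 0
    uses the small factor eps so that stray reads keep a non-zero probability.
    """
    log_post = []
    for cn, p in enumerate(prior):
        scale = cn / 2 if cn > 0 else eps
        log_post.append(math.log(p) + poisson_logpmf(read_count, lam_cn2 * scale))
    top = max(log_post)
    unnorm = [math.exp(v - top) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def most_likely_copy_number(posterior: Sequence[float]) -> int:
    """Index of the largest posterior probability."""
    return max(range(len(posterior)), key=lambda i: posterior[i])


prior = [0.2] * 5  # uniform over copy numbers 0..4
calls = [most_likely_copy_number(copy_number_posterior(x, 100.0, prior))
         for x in (52, 98, 151)]
```

Because the comparison is made across samples at the same position, a region that is uniformly hard to sequence lowers the expected count for every sample and does not by itself look like a deletion.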


Subject(s)
DNA Copy Number Variations , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Chromosomes, Human, X/chemistry , HapMap Project , Humans , Male , Poisson Distribution
18.
Nucleic Acids Res ; 39(12): e79, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21486749

ABSTRACT

Cost-effective oligonucleotide genotyping arrays like the Affymetrix SNP 6.0 are still the predominant technique to measure DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and, consequently, suffer from a high false discovery rate (FDR). A high FDR means that many CNVs are wrongly detected and therefore not associated with a disease in a clinical study, though correction for multiple testing takes them into account and thereby decreases the study's discovery power. For controlling the FDR, we propose a probabilistic latent variable model, 'cn.FARMS', which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples from which the posterior can only deviate by strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html.


Subject(s)
DNA Copy Number Variations , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , Software , Algorithms , Alleles , Computational Biology , Polymorphism, Single Nucleotide
19.
Nat Commun ; 14(1): 7339, 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-37957207

ABSTRACT

The field of bioimage analysis is currently impacted by a profound transformation, driven by the advancements in imaging technologies and artificial intelligence. The emergence of multi-modal AI systems could allow extracting and utilizing knowledge from bioimaging databases based on information from other data modalities. We leverage the multi-modal contrastive learning paradigm, which enables the embedding of both bioimages and chemical structures into a unified space by means of bioimage and molecular structure encoders. This common embedding space unlocks the possibility of querying bioimaging databases with chemical structures that induce different phenotypic effects. Concretely, in this work we show that a retrieval system based on multi-modal contrastive learning is capable of identifying the correct bioimage corresponding to a given chemical structure from a database of ~2000 candidate images with a top-1 accuracy >70 times higher than a random baseline. Additionally, the bioimage encoder demonstrates remarkable transferability to various further prediction tasks within the domain of drug discovery, such as activity prediction, molecule classification, and mechanism of action identification. Thus, our approach not only addresses the current limitations of bioimaging databases but also paves the way towards foundation models for microscopy images.
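The retrieval evaluation can be sketched as follows: every chemical-structure embedding queries the image embeddings by cosine similarity, and a query counts as correct if its own image ranks first. With N candidates a random guess is right with probability 1/N, so for about 2000 candidates the baseline is roughly 0.05% and a 70-fold improvement corresponds to roughly 3.5%. The embeddings below are made-up two-dimensional vectors; the encoders that would produce them are not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top1_retrieval_accuracy(query_embs: List[Vector], image_embs: List[Vector]) -> float:
    """Fraction of queries whose matching image (same index) is the nearest image."""
    hits = 0
    for i, query in enumerate(query_embs):
        best = max(range(len(image_embs)), key=lambda j: cosine(query, image_embs[j]))
        hits += int(best == i)
    return hits / len(query_embs)


# Made-up embeddings: structure i is supposed to match image i.
structures = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
images = [[0.9, 0.1], [0.2, 0.8], [1.0, -1.0]]

accuracy = top1_retrieval_accuracy(structures, images)
random_baseline = 1 / len(images)
```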


Subject(s)
Artificial Intelligence , Learning , Databases, Factual , Drug Discovery , Knowledge
20.
MAbs ; 14(1): 2031482, 2022.
Article in English | MEDLINE | ID: mdl-35377271

ABSTRACT

Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D-structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of antibody design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one dimensional: 1D) data, can be used to design conformational (three dimensional: 3D) epitope-specific antibodies, matching or exceeding the training dataset in affinity and developability parameter value variety. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.


Subject(s)
Antigen-Antibody Reactions , Machine Learning , Antibodies, Monoclonal/chemistry , Binding Sites, Antibody , Epitopes