Results 1 - 12 of 12
1.
J Chem Inf Model ; 64(7): 2331-2344, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37642660

ABSTRACT

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.


Subject(s)
Benchmarking , Quantitative Structure-Activity Relationship , Biological Assay , Machine Learning
2.
Genome Biol ; 24(1): 224, 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37798735

ABSTRACT

BACKGROUND: Despite clear evidence of nonlinear interactions in the molecular architecture of polygenic diseases, linear models have so far appeared optimal in genotype-to-phenotype modeling. A key bottleneck for such modeling is that genetic data intrinsically suffers from underdetermination ([Formula: see text]). Millions of variants are present in each individual while the collection of large, homogeneous cohorts is hindered by phenotype incidence, sequencing cost, and batch effects. RESULTS: We demonstrate that when we provide enough training data and control the complexity of nonlinear models, a neural network outperforms additive approaches in whole exome sequencing-based inflammatory bowel disease case-control prediction. To do so, we propose a biologically meaningful sparsified neural network architecture, providing empirical evidence for positive and negative epistatic effects present in the inflammatory bowel disease pathogenesis. CONCLUSIONS: In this paper, we show that underdetermination is likely a major driver for the apparent optimality of additive modeling in clinical genetics today.


Subject(s)
Inflammatory Bowel Diseases , Nonlinear Dynamics , Humans , Sample Size , Inflammatory Bowel Diseases/genetics , Neural Networks, Computer , Phenotype
3.
Pharmaceutics ; 15(3), 2023 Mar 10.
Article in English | MEDLINE | ID: mdl-36986760

ABSTRACT

In vitro non-cellular permeability models such as the parallel artificial membrane permeability assay (PAMPA) are widely applied tools for early-phase drug candidate screening. In addition to the commonly used porcine brain polar lipid extract for modeling the blood-brain barrier's permeability, the total and polar fractions of bovine heart and liver lipid extracts were investigated in the PAMPA model by measuring the permeability of 32 diverse drugs. The zeta potential of the lipid extracts and the net charge of their glycerophospholipid components were also determined. Physicochemical parameters of the 32 compounds were calculated using three independent software packages (Marvin Sketch, RDKit, and ACD/Percepta). The relationship between the lipid-specific permeabilities and the physicochemical descriptors of the compounds was investigated using linear correlation, Spearman correlation, and principal component analysis (PCA). While the results showed only subtle differences between total and polar lipids, permeability through liver lipids differed markedly from that of the heart or brain lipid-based models. Correlations between the in silico descriptors (e.g., number of amide bonds, heteroatoms, and aromatic heterocycles, accessible surface area, and H-bond acceptor-donor balance) of drug molecules and permeability values were also found, which provides support for understanding tissue-specific permeability.
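
The Spearman correlation mentioned in the abstract above can be sketched in a few lines. This is an illustrative sketch only, not the authors' analysis: the function names and the toy descriptor and permeability values are hypothetical placeholders, and the approach shown (average ranks for ties, then Pearson correlation of the ranks) is simply the textbook definition.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values (not from the study): one descriptor per compound
# and a permeability value that decreases as the descriptor grows.
descriptor = [1.0, 2.0, 3.0, 4.0]
permeability = [0.9, 0.7, 0.4, 0.1]
rho = spearman(descriptor, permeability)
print(rho)  # strictly decreasing relationship, so rho is -1
```

Because the statistic depends only on ranks, it captures any monotonic relationship between a descriptor and permeability, which is one reason it is often reported alongside linear correlation.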

4.
Bioinformatics ; 37(16): 2275-2281, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33560405

ABSTRACT

MOTIVATION: Modern bioinformatics faces increasingly complex problems, and we are rapidly approaching an era in which the ability to seamlessly integrate heterogeneous sources of information will be crucial for scientific progress. Here, we present a novel non-linear data fusion framework that generalizes the conventional matrix factorization paradigm, allowing inference over arbitrary entity-relation graphs, and we apply it to the prediction of protein-protein interactions (PPIs). Improving our knowledge of PPI networks at the proteome scale is crucial to understanding protein function, physiological and disease states, and cell life in general. RESULTS: We devised three data fusion-based models for the proteome-level prediction of PPIs, and we show that our method outperforms state-of-the-art approaches on common benchmarks. Moreover, we investigate its predictions on newly published PPIs, showing that these new data exhibit a clear shift in their underlying distributions; we therefore train and test our models on this extended dataset. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.
J Chem Inf Model ; 60(10): 4506-4517, 2020 Oct 26.
Article in English | MEDLINE | ID: mdl-32924466

ABSTRACT

In drug discovery, knowledge of the graph structure of chemical compounds is essential. Many thousands of scientific articles and patents in chemistry and pharmaceutical sciences have investigated chemical compounds, but in many cases, the details of the structure of these chemical compounds are published only as an image. A tool to analyze these images automatically and convert them into a chemical graph structure would be useful for many applications, such as drug discovery. A few such tools are available and they are mostly derived from optical character recognition. However, our evaluation of the performance of these tools reveals that they often make mistakes in recognizing the correct bond multiplicity and stereochemical information. In addition, errors sometimes even lead to missing atoms in the resulting graph. In our work, we address these issues by developing a compound recognition method based on machine learning. More specifically, we develop a deep neural network model for optical compound recognition. The deep learning solution presented here consists of a segmentation model, followed by three classification models that predict atom locations, bonds, and charges. Furthermore, this model not only predicts the graph structure of the molecule but also provides all information necessary to relate each component of the resulting graph to the source image. This solution is scalable and can rapidly process thousands of images. Finally, we empirically compare the proposed method with the well-established tool OSRA and observe significant error reduction.


Subject(s)
Deep Learning , Drug Discovery , Machine Learning , Neural Networks, Computer
6.
NAR Genom Bioinform ; 2(1): lqaa011, 2020 Mar.
Article in English | MEDLINE | ID: mdl-33575557

ABSTRACT

Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solving the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and technical problems need to be addressed. In particular, very few attempts at the in-silico diagnosis of oligo-to-polygenic disorders have been made so far, due to the complexity of the challenge, the relative scarcity of the data, and issues such as batch effects and data heterogeneity, which are confounding factors for machine learning (ML) methods. Here, we propose a method for the exome-based in-silico diagnosis of Crohn's disease (CD) patients which addresses many of the current methodological issues. First, we devise a rational ML-friendly feature representation for WES data based on the gene mutational burden concept, which is suitable for datasets with small sample sizes. Second, we propose a Neural Network (NN) with parameter tying and heavy regularization, in order to limit its complexity and thus the risk of over-fitting. We trained and tested our NN on three CD case-control datasets, comparing its performance with that of the participants in previous CAGI challenges. We show that, notwithstanding the limited NN complexity, it outperforms the previous approaches. Moreover, we interpret the NN predictions by analyzing the learned patterns at the variant and gene level and investigating the decision process leading to each prediction.
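
The gene mutational burden idea in the abstract above can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not specify how burden is computed, so the unweighted count of variants per gene used here is an assumption, and the function name, gene symbols, and variant labels are hypothetical placeholders.

```python
from collections import Counter


def gene_burden(variants, genes):
    """Map one sample's variants to a fixed-length per-gene count vector.

    `variants` is an iterable of (gene, variant_id) pairs for a single sample.
    `genes` fixes the feature order, so every sample yields a vector with the
    same layout regardless of which variants it carries. Variants in genes
    that are not listed in `genes` are ignored.
    """
    counts = Counter(gene for gene, _ in variants)
    return [counts.get(g, 0) for g in genes]


# Hypothetical toy input: placeholder gene symbols and variant labels.
genes = ["GENE_A", "GENE_B", "GENE_C"]
sample = [("GENE_A", "v1"), ("GENE_A", "v2"), ("GENE_C", "v3")]
burden = gene_burden(sample, genes)
print(burden)  # [2, 0, 1]
```

Collapsing variants to one feature per gene reduces millions of possible variant-level inputs to a vector whose length is the number of genes, which is the kind of dimensionality reduction that makes small cohorts more tractable.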

7.
Bioinformatics ; 34(13): i447-i456, 2018 Jul 01.
Article in English | MEDLINE | ID: mdl-29949967

ABSTRACT

Motivation: Most gene prioritization methods model each disease or phenotype individually, but this fails to capture patterns common to several diseases or phenotypes. To overcome this limitation, we formulate the gene prioritization task as the factorization of a sparsely filled gene-phenotype matrix, where the objective is to predict the unknown matrix entries. To deliver more accurate gene-phenotype matrix completion, we extend classical Bayesian matrix factorization to work with multiple side information sources. The availability of side information allows us to make non-trivial predictions for genes for which no previous disease association is known. Results: Our gene prioritization method can integrate not only data sources describing genes but also data sources describing Human Phenotype Ontology terms. Experimental results on our benchmarks show that our proposed model improves accuracy over the well-established gene prioritization method Endeavour. In particular, compared with Endeavour, our proposed method offers promising results on diseases of the nervous system; diseases of the eye and adnexa; endocrine, nutritional and metabolic diseases; and congenital malformations, deformations and chromosomal abnormalities. Availability and implementation: The Bayesian data fusion method is implemented as a Python/C++ package: https://github.com/jaak-s/macau. It is also available as a Julia package: https://github.com/jaak-s/BayesianDataFusion.jl. All data and benchmarks generated or analyzed during this study can be downloaded at https://owncloud.esat.kuleuven.be/index.php/s/UGb89WfkZwMYoTn. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Gene Ontology , Genetic Predisposition to Disease , Information Storage and Retrieval/methods , Software , Algorithms , Bayes Theorem , Humans
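
The matrix-completion formulation described in item 7 can be illustrated with a deliberately simplified sketch. It is not the authors' method and does not use the macau package: it swaps in plain (non-Bayesian) matrix factorization trained by stochastic gradient descent, without side information, purely to show how unknown entries of a sparsely filled matrix are predicted from learned factors. All names, hyperparameters, and the toy rank-1 matrix are assumptions for illustration.

```python
import random


def factorize(observed, n_rows, n_cols, rank=1, lr=0.02, reg=0.01,
              epochs=2000, seed=0):
    """Fit row factors P and column factors Q so that P[i] . Q[j] ~ observed[(i, j)].

    Plain SGD with L2 regularization; only observed entries contribute.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), r in observed.items():
            err = r - sum(P[i][k] * Q[j][k] for k in range(rank))
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, i, j):
    """Predicted score for row i and column j (observed or not)."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def rmse(observed, P, Q):
    """Root-mean-square error over the observed entries."""
    se = sum((r - predict(P, Q, i, j)) ** 2 for (i, j), r in observed.items())
    return (se / len(observed)) ** 0.5


# Hypothetical toy matrix: entries of the rank-1 product u * v with
# u = [1, 2, 3] and v = [1, 0.5, 2]; entry (2, 2), whose true value is 6.0,
# is held out as the "unknown" association to predict.
observed = {
    (0, 0): 1.0, (0, 1): 0.5, (0, 2): 2.0,
    (1, 0): 2.0, (1, 1): 1.0, (1, 2): 4.0,
    (2, 0): 3.0, (2, 1): 1.5,
}
P, Q = factorize(observed, n_rows=3, n_cols=3)
print("RMSE on observed entries:", rmse(observed, P, Q))
print("Prediction for held-out entry (2, 2):", predict(P, Q, 2, 2))
```

The held-out entry can be estimated because its row and column each appear in other observed entries. A row with no observed entries would have no learned factor, which is the gap that side information is meant to address according to the abstract.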
8.
Cell Chem Biol ; 25(5): 611-618.e3, 2018 May 17.
Article in English | MEDLINE | ID: mdl-29503208

ABSTRACT

In both academia and the pharmaceutical industry, large-scale assays for drug discovery are expensive and often impractical, particularly for the increasingly important physiologically relevant model systems that require primary cells, organoids, whole organisms, or expensive or rare reagents. We hypothesized that data from a single high-throughput imaging assay can be repurposed to predict the biological activity of compounds in other assays, even those targeting alternate pathways or biological processes. Indeed, quantitative information extracted from a three-channel microscopy-based screen for glucocorticoid receptor translocation was able to predict assay-specific biological activity in two ongoing drug discovery projects. In these projects, repurposing increased hit rates by 50- to 250-fold over that of the initial project assays while increasing the chemical structure diversity of the hits. Our results suggest that data from high-content screens are a rich source of information that can be used to predict and replace customized biological assays.


Subject(s)
Drug Repositioning/methods , Image Processing, Computer-Assisted/methods , Machine Learning , Neural Networks, Computer , Antineoplastic Agents/pharmacology , Cell Line, Tumor , High-Throughput Screening Assays/methods , Humans , Neoplasms/drug therapy
9.
Oncotarget ; 8(6): 9388-9398, 2017 Feb 07.
Article in English | MEDLINE | ID: mdl-27566582

ABSTRACT

Inter-individual differences in toxic symptoms and pharmacokinetics of high-dose methotrexate (MTX) treatment may be caused by genetic variants in the MTX pathway. Correlations between polymorphisms and pharmacokinetic parameters and the occurrence of hepato- and myelotoxicity were studied. Single nucleotide polymorphisms (SNPs) of the ABCB1, ABCC1, ABCC2, ABCC3, ABCC10, ABCG2, GGH, SLC19A1 and NR1I2 genes were analyzed in 59 patients with osteosarcoma. Univariate association analysis and Bayesian network-based Bayesian univariate and multilevel analysis of relevance (BN-BMLA) were applied. Rare alleles of 10 SNPs of the ABCB1, ABCC2, ABCC3, ABCG2 and NR1I2 genes showed a correlation with the pharmacokinetic values in univariate association analysis. The risk of toxicity was associated with five SNPs in the ABCC2 and NR1I2 genes. Pharmacokinetic parameters were associated with four SNPs of the ABCB1, ABCC3, NR1I2, and GGH genes, and toxicity was shown to be associated with ABCC1 rs246219 and ABCC2 rs717620 using the univariate and BN-BMLA methods. BN-BMLA analysis detected relevant effects on the AUC0-48 for the following SNPs: ABCB1 rs928256, ABCC3 rs4793665, and GGH rs3758149. In both univariate and multivariate analyses, the SNPs ABCB1 rs928256, ABCC3 rs4793665, GGH rs3758149, and NR1I2 rs3814058 were relevant. These SNPs should be considered in future dose individualization during treatment.


Subject(s)
Antimetabolites, Antineoplastic/administration & dosage , Bone Neoplasms/drug therapy , Methotrexate/administration & dosage , Osteosarcoma/drug therapy , Pharmacogenomic Variants , Polymorphism, Single Nucleotide , ATP Binding Cassette Transporter, Subfamily B/genetics , ATP Binding Cassette Transporter, Subfamily B/metabolism , Adolescent , Age Factors , Antimetabolites, Antineoplastic/adverse effects , Antimetabolites, Antineoplastic/pharmacokinetics , Area Under Curve , Bayes Theorem , Bone Neoplasms/genetics , Bone Neoplasms/metabolism , Bone Neoplasms/pathology , Chi-Square Distribution , Child , Female , Genotype , Half-Life , Humans , Male , Metabolic Clearance Rate , Methotrexate/adverse effects , Methotrexate/pharmacokinetics , Multidrug Resistance-Associated Protein 2 , Multidrug Resistance-Associated Proteins/genetics , Multidrug Resistance-Associated Proteins/metabolism , Multivariate Analysis , Odds Ratio , Osteosarcoma/genetics , Osteosarcoma/metabolism , Osteosarcoma/pathology , Pharmacogenetics , Phenotype , Pregnane X Receptor , Receptors, Steroid/genetics , Receptors, Steroid/metabolism , Risk Factors , Treatment Outcome , gamma-Glutamyl Hydrolase/genetics , gamma-Glutamyl Hydrolase/metabolism
10.
Future Med Chem ; 6(5): 563-75, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24649958

ABSTRACT

Despite famous serendipitous drug repositioning success stories, systematic projects have not yet delivered the expected results. However, repositioning technologies are gaining ground in different phases of routine drug development, together with new adaptive strategies. We demonstrate the power of the compound information pool, the ever-growing heterogeneous information repertoire of approved drugs and candidates, as an invaluable catalyst in this transition. Systematic, computational utilization of this information pool for candidates in early phases is an open research problem; we propose a novel application of the enrichment analysis statistical framework for fusion of this information pool, specifically for the prediction of indications. Pharmaceutical consequences are formulated for a systematic and continuous knowledge recycling strategy, utilizing this information pool throughout the drug-discovery pipeline.


Subject(s)
Drug Repositioning/trends , Databases, Chemical , Drug Discovery , Drug Repositioning/economics , High-Throughput Screening Assays , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Pharmacokinetics , Small Molecule Libraries/chemistry , Small Molecule Libraries/metabolism , Transcriptome
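
The enrichment analysis framework named in item 10 is commonly built on an over-representation test. The sketch below assumes a one-sided hypergeometric test, which is a standard choice but is not confirmed by the abstract; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, hits):
    """One-sided over-representation p-value, P(X >= hits).

    `population` items exist in total, of which `annotated` carry a property
    (for example, association with an indication). `drawn` items are selected
    without replacement, and `hits` of them carry the property.
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 10 compounds, 5 annotated, 5 selected, all 5 annotated.
p_value = hypergeom_enrichment_p(population=10, annotated=5, drawn=5, hits=5)
print(p_value)  # 1/252, i.e. about 0.004
```

A small p-value indicates that the selected set contains more annotated items than expected by chance. When many properties are tested, a multiple-testing correction would normally be applied on top of this.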
11.
Curr Top Med Chem ; 13(18): 2337-63, 2013.
Article in English | MEDLINE | ID: mdl-24059461

ABSTRACT

Movement disorders are a heterogeneous group of both common and rare neurological conditions characterized by abnormalities of motor functions and movement patterns. This work overviews recent successes and ongoing studies of repositioning relating to this disease group, which underscores the challenge of integrating the voluminous and heterogeneous findings required for making suitable drug repositioning decisions. In silico drug repositioning methods hold the promise of automated fusion of heterogeneous information sources, but the controllable, flexible and transparent incorporation of the expertise of medicinal chemists throughout the repositioning process remains an open challenge. In support of a more systematic approach toward repositioning, we summarize the application of a computational repurposing method based on statistically rooted knowledge fusion. To foster the spread of this technique, we provide a step-by-step guide to the complete workflow, together with a case study in Parkinson's disease.


Subject(s)
Drug Discovery , Drug Repositioning , Movement Disorders/drug therapy , Animals , Humans
12.
Neuropsychopharmacol Hung ; 13(3): 139-44, 2011 Sep.
Article in Hungarian | MEDLINE | ID: mdl-21876222

ABSTRACT

The polymorphisms of the oxytocin receptor (OXTR) gene and their relation to certain psychological traits and psychiatric disorders are under extensive study; however, the results are contradictory. One source of inconsistency could be that the OXTR gene contains more than 270 SNPs (single nucleotide polymorphisms) whose molecular effects have not been clarified. GOALS: The aim of this study was an in silico analysis of sequence variations between the human and dog OXTR genes. RESULTS: Comparative analysis of the human and dog OXTR amino acid sequences revealed that the most robust difference between the two proteins is a five-amino-acid fragment that is present in the human receptor but absent from the dog receptor. In silico addition of this sequence to the dog receptor resulted in a dramatic change in the conformation of the intracellular region. CONCLUSION: In silico comparative analysis of OXTR gene variants among species and individuals might serve as an important cue for predicting the functional effects of genetic variants.


Subject(s)
Polymorphism, Single Nucleotide , Receptors, Oxytocin/genetics , Amino Acid Sequence , Animals , Dogs , Genetic Variation , Humans , Molecular Sequence Data