Results 1 - 9 of 9
1.
Cell Rep Methods ; 2(2): 100171, 2022 02 28.
Article in English | MEDLINE | ID: mdl-35474966

ABSTRACT

We present deep link prediction (DLP), a method for the interpretation of loss-of-function screens. Our approach uses representation-based link prediction to reprioritize phenotypic readouts by integrating screening experiments with gene-gene interaction networks. We validate on 2 different loss-of-function technologies, RNAi and CRISPR, using datasets obtained from DepMap. Extensive benchmarking shows that DLP-DeepWalk outperforms other methods in recovering cell-specific dependencies, achieving an average precision well above 90% across 7 different cancer types and on both RNAi and CRISPR data. We show that the genes ranked highest by DLP-DeepWalk are appreciably more enriched in drug targets compared to the ranking based on original screening scores. Interestingly, this enrichment is more pronounced on RNAi data compared to CRISPR data, consistent with the greater inherent noise of RNAi screens. Finally, we demonstrate how DLP-DeepWalk can infer the molecular mechanism through which putative targets trigger cell line mortality.


Subjects
Neoplasms, Humans, Neoplasms/genetics, RNA Interference, Cell Line
2.
BMC Bioinformatics ; 12 Suppl 1: S37, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21342568

ABSTRACT

BACKGROUND: With the availability of large scale expression compendia it is now possible to view one's own findings in the light of what is already available and to retrieve genes with an expression profile similar to a set of genes of interest (i.e., a query or seed set) for a subset of conditions. To that end, a query-based strategy is needed that maximally exploits the coexpression behaviour of the seed genes to guide the biclustering, but that at the same time is robust against the presence of noisy genes in the seed set, as seed genes are often assumed, but not guaranteed, to be coexpressed in the queried compendium. Therefore, we developed ProBic, a query-based biclustering strategy based on Probabilistic Relational Models (PRMs) that exploits the use of prior distributions to extract the information contained within the seed set. RESULTS: We applied ProBic on a large scale Escherichia coli compendium to extend partially described regulons with potentially novel members. We compared ProBic's performance with previously published query-based biclustering algorithms, namely ISA and QDB, from the perspective of bicluster expression quality, robustness of the outcome against noisy seed sets and biological relevance. This comparison shows that ProBic is able to retrieve biologically relevant, high quality biclusters that retain their seed genes and that it is particularly strong in handling noisy seeds. CONCLUSIONS: ProBic is a query-based biclustering algorithm developed in a flexible framework, designed to detect biologically relevant, high quality biclusters that retain relevant seed genes even in the presence of noise or when dealing with low quality seed sets.


Subjects
Algorithms, Gene Expression Profiling/methods, Statistical Models, Cluster Analysis, Genetic Databases, Escherichia coli/genetics, Oligonucleotide Array Sequence Analysis, Regulon
3.
J Biomed Inform ; 44(2): 319-25, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21167313

ABSTRACT

Newborn screening programs for severe metabolic disorders using tandem mass spectrometry are widely used. Medium-Chain Acyl-CoA dehydrogenase deficiency (MCADD) is the most prevalent mitochondrial fatty acid oxidation defect (1:15,000 newborns) and it has been proven that early detection of this metabolic disease decreases mortality and improves the outcome. In previous studies, data mining methods on derivatized tandem MS datasets have shown high classification accuracies. However, no machine learning methods have yet been applied to datasets based on non-derivatized screening methods. A dataset with 44,159 blood samples was collected using a non-derivatized screening method as part of a systematic newborn screening by the PCMA screening center (Belgium). Twelve MCADD cases were present in this partially MCADD-enriched dataset. We extended three data mining methods, namely C4.5 decision trees, logistic regression and ridge logistic regression, with a parameter and threshold optimization method and evaluated their applicability as a diagnostic support tool. Within a stratified cross-validation setting, a grid search was performed for each model for a wide range of model parameters, included variables and classification thresholds. The best performing model used ridge logistic regression and achieved a sensitivity of 100%, a specificity of 99.987% and a positive predictive value of 32% (recalibrated for a real population), obtained in a stratified cross-validation setting. These results were further validated on an independent test set. Using a method that combines ridge logistic regression with variable selection and threshold optimization, a significantly improved performance was achieved compared to the current state-of-the-art for derivatized data, while retaining more interpretability and requiring fewer variables. The results indicate the potential value of data mining methods as a diagnostic support tool.
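
The phrase "recalibrated for a real population" can be made concrete with Bayes' rule. The sketch below is only an illustration: it assumes the textbook relation between sensitivity, specificity, prevalence and positive predictive value, and plugs in the figures quoted in the abstract (the function name is hypothetical, and the authors' own recalibration procedure is not described here).

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: P(disease | positive test) at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Figures quoted in the abstract; prevalence assumed to be 1 in 15,000 newborns.
ppv = positive_predictive_value(
    sensitivity=1.0,
    specificity=0.99987,
    prevalence=1 / 15000,
)
print(f"PPV at population prevalence: {ppv:.1%}")
```

With these rounded inputs the formula gives roughly one third, the same order as the reported 32%; the small gap presumably reflects unrounded values or details of the recalibration that the abstract does not give.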


Subjects
Data Mining/methods, Neonatal Screening/methods, Tandem Mass Spectrometry/methods, Acyl-CoA Dehydrogenase/classification, Acyl-CoA Dehydrogenase/deficiency, Artificial Intelligence, Belgium, Humans, Newborn Infant, Inborn Errors of Lipid Metabolism/classification
4.
Bioinformatics ; 25(18): 2450-1, 2009 Sep 15.
Article in English | MEDLINE | ID: mdl-19587224

ABSTRACT

MOTIVATION: We developed ViTraM, a tool that allows visualizing overlapping transcriptional modules in an intuitive way. By visualizing not only the genes and the experiments in which the genes are co-expressed, but also additional properties of the modules such as the regulators and regulatory motifs that are responsible for the observed co-expression, ViTraM can assist in the biological analysis and interpretation of the output of module detection tools. AVAILABILITY: The ViTraM software is platform-independent. The software and supplementary material are available at: http://homes.esat.kuleuven.be/~kmarchal/ViTraM/Index.html


Subjects
Computational Biology/methods, Software, Genetic Transcription, User-Computer Interface, Genetic Databases, Gene Expression Profiling/methods, DNA Sequence Analysis
5.
BMC Bioinformatics ; 8 Suppl 2: S5, 2007 May 03.
Article in English | MEDLINE | ID: mdl-17493254

ABSTRACT

BACKGROUND: In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Despite the demonstrated success of such algorithms in uncovering biologically relevant regulatory relations, further developments in the area are hampered by a lack of tools to compare the performance of alternative module network learning strategies. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. RESULTS: Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. 
CONCLUSION: We show that data simulators such as SynTReN are very well suited for the purpose of developing, testing and improving module network algorithms. We used SynTReN data to develop and test an alternative module network learning strategy, which is incorporated in the software package LeMoNe, and we provide evidence that this alternative strategy has several advantages with respect to existing methods.


Subjects
Algorithms, Artificial Intelligence, Biological Models, Proteome/metabolism, Signal Transduction/physiology, Software Validation, Software, Computer Simulation, Gene Expression Regulation/physiology, Automated Pattern Recognition/methods, Systems Biology/methods
6.
BMC Bioinformatics ; 7: 43, 2006 Jan 26.
Article in English | MEDLINE | ID: mdl-16438721

ABSTRACT

BACKGROUND: The development of algorithms to infer the structure of gene regulatory networks based on expression data is an important subject in bioinformatics research. Validation of these algorithms requires benchmark data sets for which the underlying network is known. Since experimental data sets of the appropriate size and design are usually not available, there is a clear need to generate well-characterized synthetic data sets that allow thorough testing of learning algorithms in a fast and reproducible manner. RESULTS: In this paper we describe a network generator that creates synthetic transcriptional regulatory networks and produces simulated gene expression data that approximates experimental data. Network topologies are generated by selecting subnetworks from previously described regulatory networks. Interaction kinetics are modeled by equations based on Michaelis-Menten and Hill kinetics. Our results show that the statistical properties of these topologies more closely approximate those of genuine biological networks than do those of different types of random graph models. Several user-definable parameters adjust the complexity of the resulting data set with respect to the structure learning algorithms. CONCLUSION: This network generation technique offers a valid alternative to existing methods. The topological characteristics of the generated networks more closely resemble the characteristics of real transcriptional networks. Simulation of the network scales well to large networks. The generator models different types of biological interactions and produces biologically plausible synthetic gene expression data.
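
As a rough illustration of the kinetics named in this abstract, the sketch below implements the generic Hill response function, of which Michaelis-Menten is the special case with exponent 1. It is not the generator's actual rate law: the function names and the parameters `k` (half-saturation level) and `n` (Hill coefficient) are assumptions chosen for this example.

```python
def hill_activation(regulator, k, n=1.0):
    """Fraction of maximal expression induced by an activator.

    n = 1 gives the Michaelis-Menten form; larger n gives a steeper switch.
    """
    r_n = regulator ** n
    return r_n / (k ** n + r_n)

def hill_repression(regulator, k, n=1.0):
    """Fraction of maximal expression remaining under a repressor."""
    return 1.0 - hill_activation(regulator, k, n)

# Response of a target gene at increasing regulator levels (arbitrary units).
for level in (0.0, 0.5, 1.0, 2.0, 4.0):
    mm = hill_activation(level, k=1.0, n=1.0)
    steep = hill_activation(level, k=1.0, n=4.0)
    print(f"regulator={level:>3}: Michaelis-Menten={mm:.3f}, Hill(n=4)={steep:.3f}")
```

At `regulator == k` both curves pass through one half; the exponent only changes how sharply the response switches around that point.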


Subjects
Algorithms, Gene Expression Regulation/physiology, Biological Models, Signal Transduction/physiology, Software Validation, Software, Transcription Factors/metabolism, Artificial Intelligence, Benchmarking/methods, Computer Simulation, Factual Databases
7.
J Am Med Inform Assoc ; 23(e1): e11-9, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26316458

ABSTRACT

OBJECTIVE: Enormous amounts of healthcare data are becoming increasingly accessible through the large-scale adoption of electronic health records. In this work, structured and unstructured (textual) data are combined to assign clinical diagnostic and procedural codes (specifically ICD-9-CM) to patient stays. We investigate whether integrating these heterogeneous data types improves prediction strength compared to using the data types in isolation. METHODS: Two separate data integration approaches were evaluated. Early data integration combines features of several sources within a single model, and late data integration learns a separate model per data source and combines these predictions with a meta-learner. This is evaluated on data sources and clinical codes from a broad set of medical specialties. RESULTS: When compared with the best individual prediction source, late data integration leads to improvements in predictive power (eg, overall F-measure increased from 30.6% to 38.3% for International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnostic codes), while early data integration is less consistent. The predictive strength strongly differs between medical specialties, both for ICD-9-CM diagnostic and procedural codes. DISCUSSION: Structured data provides complementary information to unstructured data (and vice versa) for predicting ICD-9-CM codes. This can be captured most effectively by the proposed late data integration approach. CONCLUSIONS: We demonstrated that models using multiple electronic health record data sources systematically outperform models using data sources in isolation in the task of predicting ICD-9-CM codes over a broad range of medical specialties.
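
The late data integration idea described above (one model per data source, combined by a meta-learner) can be sketched in a few lines. This is a minimal toy version, not the study's pipeline: the meta-learner is assumed to be a plain logistic regression fitted by gradient descent, and the per-source scores are made-up numbers standing in for the outputs of a hypothetical structured-data model and a hypothetical text model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_learner(base_scores, labels, lr=1.0, epochs=5000):
    """Fit logistic regression weights on the per-source model scores."""
    n_sources = len(base_scores[0])
    weights, bias = [0.0] * n_sources, 0.0
    m = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_sources, 0.0
        for scores, y in zip(base_scores, labels):
            err = sigmoid(bias + sum(w * s for w, s in zip(weights, scores))) - y
            for j, s in enumerate(scores):
                grad_w[j] += err * s
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

def predict(weights, bias, scores):
    """Combined probability that the clinical code applies to a patient stay."""
    return sigmoid(bias + sum(w * s for w, s in zip(weights, scores)))

# Toy scores per patient stay: (structured-data model, text model). Not study data.
base_scores = [(0.9, 0.2), (0.8, 0.7), (0.3, 0.9), (0.2, 0.1), (0.4, 0.3), (0.1, 0.4)]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_learner(base_scores, labels)
combined = [predict(weights, bias, s) for s in base_scores]
```

In this toy set neither source separates the classes on its own at a 0.5 cut-off, while the learned combination does. A realistic setup would train the meta-learner on held-out (for example cross-validated) base predictions to avoid overfitting; early integration would instead concatenate the raw features of all sources into a single model.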


Subjects
Clinical Coding/methods, Electronic Health Records/organization & administration, International Classification of Diseases, Data Mining, Datasets as Topic, Humans, Machine Learning
8.
Genome Med ; 6(10): 74, 2014.
Article in English | MEDLINE | ID: mdl-25352915

ABSTRACT

Interpretation of the multitude of variants obtained from next generation sequencing (NGS) is labor intensive and complex. Web-based interfaces such as Galaxy streamline the generation of variant lists but lack flexibility in the downstream annotation and filtering that are necessary to identify causative variants in medical genomics. To this end, we built VariantDB, a web-based interactive annotation and filtering platform that automatically annotates variants with allele frequencies, functional impact, pathogenicity predictions and pathway information. VariantDB allows filtering by all annotations, under dominant, recessive or de novo inheritance models and is freely available at http://www.biomina.be/app/variantdb/.

9.
Artif Life ; 14(1): 49-63, 2008.
Article in English | MEDLINE | ID: mdl-18171130

ABSTRACT

The development of structure-learning algorithms for gene regulatory networks depends heavily on the availability of synthetic data sets that contain both the original network and associated expression data. This article reports the application of SynTReN, an existing network generator that samples topologies from existing biological networks and uses Michaelis-Menten and Hill enzyme kinetics to simulate gene interactions. We illustrate the effects of different aspects of the expression data on the quality of the inferred network. The tested expression data parameters are network size, network topology, type and degree of noise, quantity of expression data, and interaction types between genes. This is done by applying three well-known inference algorithms to SynTReN data sets. The results show the power of synthetic data in revealing operational characteristics of inference algorithms that are unlikely to be discovered by means of biological microarray data only.


Subjects
Algorithms, Computer Simulation, Gene Regulatory Networks, Biological Models, Genetic Transcription, Artificial Intelligence, Genetic Databases, Software