Search | Biblioteca Virtual em Saúde (Virtual Health Library)

Wide-Open: Accelerating public data release by automating detection of overdue datasets.

Grechkin, Maxim; Poon, Hoifung; Howe, Bill.

PLoS Biol ; 15(6): e2002477, 2017 Jun.

Article in English | MEDLINE | ID: mdl-28594819

ABSTRACT

Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
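The two steps named in the abstract, finding dataset references in article text and checking whether the repository still holds them privately, can be sketched roughly as follows. This is a minimal illustration, not the Wide-Open implementation: the accession patterns (GEO series as `GSE` plus digits, SRA studies as `SRP` plus digits) are an assumption about which identifiers matter, and `is_public` is a hypothetical callable standing in for a real repository query.

```python
import re

# Assumed accession formats: GEO series ("GSE" + digits) and SRA studies ("SRP" + digits).
ACCESSION_RE = re.compile(r"\b(GSE\d+|SRP\d+)\b")


def find_accessions(text):
    """Return the unique accession IDs mentioned in text, in order of first appearance."""
    seen = []
    for match in ACCESSION_RE.finditer(text):
        accession = match.group(1)
        if accession not in seen:
            seen.append(accession)
    return seen


def overdue(text, is_public):
    """Return accessions cited in a published article that are still reported as private.

    is_public is a hypothetical callable (accession -> bool); a real system would
    replace it with a query against the repository and a parser for the response.
    """
    return [acc for acc in find_accessions(text) if not is_public(acc)]
```

Passing the repository check in as a callable keeps the text-mining step testable without network access; any real lookup would also need rate limiting and error handling.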

Subjects

Access to Information ; Biomedical Research/methods ; Databases, Genetic ; Animals ; Biomedical Research/trends ; Biotechnology/trends ; Computational Biology/trends ; Data Mining ; Databases, Bibliographic ; Databases, Genetic/standards ; Databases, Genetic/trends ; Gene Expression Regulation ; Humans ; Library Automation ; Molecular Sequence Data ; National Library of Medicine (U.S.) ; Periodicals as Topic ; Reproducibility of Results ; Time Factors ; United States

Identifying Network Perturbation in Cancer.

Grechkin, Maxim; Logsdon, Benjamin A; Gentles, Andrew J; Lee, Su-In.

PLoS Comput Biol ; 12(5): e1004888, 2016 05.

Article in English | MEDLINE | ID: mdl-27145341

ABSTRACT

We present a computational framework, called DISCERN (DIfferential SparsE Regulatory Network), to identify informative topological changes in gene-regulator dependence networks inferred on the basis of mRNA expression datasets within distinct biological states. DISCERN takes two expression datasets as input: an expression dataset of diseased tissues from patients with a disease of interest and another expression dataset from matching normal tissues. DISCERN estimates the extent to which each gene is perturbed, that is, has distinct regulator connectivity in the inferred gene-regulator dependencies between the disease and normal conditions. This approach has distinct advantages over existing methods. First, DISCERN infers conditional dependencies between candidate regulators and genes, where conditional dependence relationships discriminate the evidence for direct interactions from indirect interactions more precisely than pairwise correlation. Second, DISCERN uses a new likelihood-based scoring function to alleviate concerns about accuracy of the specific edges inferred in a particular network. On synthetic data, DISCERN identifies perturbed genes more accurately than existing methods for identifying perturbed genes between distinct states. In expression datasets from patients with acute myeloid leukemia (AML), breast cancer and lung cancer, genes with high DISCERN scores in each cancer are enriched for known tumor drivers, genes associated with the biological processes known to be important in the disease, and genes associated with patient prognosis, in the respective cancer. Finally, we show that DISCERN can uncover potential mechanisms underlying network perturbation by explaining observed epigenomic activity patterns in cancer and normal tissue types more accurately than alternative methods, based on the available epigenomic data from the ENCODE project.
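The idea of scoring how differently a gene depends on its regulators in two conditions can be illustrated with a deliberately simplified toy. The sketch below is not the DISCERN scoring function: it assumes a single candidate regulator per gene, uses ordinary least squares rather than sparse conditional-dependence estimation, and compares squared prediction error rather than a likelihood. All function names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mse(model, x, y):
    """Mean squared error of a fitted (intercept, slope) model on the data (x, y)."""
    intercept, slope = model
    return mean((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def toy_perturbation_score(reg_normal, gene_normal, reg_disease, gene_disease, eps=1e-12):
    """Ratio of cross-condition to within-condition prediction error.

    Values near 1 mean the regulator predicts the gene similarly in both
    conditions; large values mean the relationship differs between them.
    """
    model_normal = fit_line(reg_normal, gene_normal)
    model_disease = fit_line(reg_disease, gene_disease)
    within = mse(model_normal, reg_normal, gene_normal) + mse(model_disease, reg_disease, gene_disease)
    cross = mse(model_normal, reg_disease, gene_disease) + mse(model_disease, reg_normal, gene_normal)
    return cross / (within + eps)
```

A gene whose relationship to the regulator reverses sign between normal and disease samples receives a much larger score than one whose relationship is unchanged, which is the qualitative behavior the abstract describes for perturbed genes.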

Subjects

Gene Regulatory Networks ; Models, Genetic ; Neoplasms/genetics ; Breast Neoplasms/genetics ; Computational Biology ; Computer Simulation ; Databases, Genetic ; Epigenesis, Genetic ; Female ; Gene Expression Regulation, Neoplastic ; Humans ; Leukemia, Myeloid, Acute/genetics ; Likelihood Functions ; Lung Neoplasms/genetics ; Prognosis

Pathway Graphical Lasso.

Grechkin, Maxim; Fazel, Maryam; Witten, Daniela; Lee, Su-In.

Proc AAAI Conf Artif Intell ; 2015: 2617-2623, 2015 Jan.

Article in English | MEDLINE | ID: mdl-26167394

ABSTRACT

Graphical models provide a rich framework for summarizing the dependencies among variables. The graphical lasso approach attempts to learn the structure of a Gaussian graphical model (GGM) by maximizing the log likelihood of the data, subject to an ℓ1 penalty on the elements of the inverse covariance matrix. Most algorithms for solving the graphical lasso problem do not scale to a very large number of variables. Furthermore, the learned network structure is hard to interpret. To overcome these challenges, we propose a novel GGM structure learning method that exploits the fact that for many real-world problems we have prior knowledge that certain edges are unlikely to be present. For example, in gene regulatory networks, a pair of genes that does not participate together in any of the cellular processes, typically referred to as pathways, is less likely to be connected. In computer vision applications in which each variable corresponds to a pixel, each variable is likely to be connected to the nearby variables. In this paper, we propose the pathway graphical lasso, which learns the structure of a GGM subject to pathway-based constraints. To solve this problem, we decompose the network into smaller parts and use a message-passing algorithm to communicate among the subnetworks. Our algorithm has orders of magnitude improvement in run time compared to the state-of-the-art optimization methods for the graphical lasso problem that were modified to handle pathway-based constraints.
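The pathway prior described above can be pictured as a set of candidate edges: only pairs of variables that share at least one pathway are eligible to be connected. The sketch below builds that set from a pathway membership table. It covers only the constraint, not the optimization or the message-passing algorithm, and it assumes the constraint is a hard exclusion of all other pairs. The function names and the dictionary layout are hypothetical.

```python
from itertools import combinations


def allowed_edges(pathways):
    """Return the set of variable pairs that co-occur in at least one pathway.

    pathways maps a pathway name to an iterable of its member variables.
    Each pair is stored once, as a sorted tuple.
    """
    edges = set()
    for members in pathways.values():
        for a, b in combinations(sorted(set(members)), 2):
            edges.add((a, b))
    return edges


def is_allowed(edges, a, b):
    """Check whether an edge between a and b is permitted, in either order."""
    return (min(a, b), max(a, b)) in edges
```

Because variables in different pathways can only interact through shared members, the allowed-edge set also indicates where a network could be split into overlapping subproblems.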
