Results 1 - 5 of 5
1.
Mol Cell ; 59(1): 62-74, 2015 Jul 02.
Article in English | MEDLINE | ID: mdl-26073540

ABSTRACT

Thousands of cis-elements in genomes are predicted to have vital functions. Although conservation, activity in surrogate assays, polymorphisms, and disease mutations provide functional clues, deletion from endogenous loci constitutes the gold-standard test. A GATA-2-binding, Gata2 intronic cis-element (+9.5) required for hematopoietic stem cell genesis in mice is mutated in a human immunodeficiency syndrome. Because +9.5 is the only cis-element known to mediate stem cell genesis, we devised a strategy to identify functionally comparable enhancers ("+9.5-like") genome-wide. Gene editing revealed +9.5-like activity to mediate GATA-2 occupancy, chromatin opening, and transcriptional activation. A +9.5-like element resided in Samd14, which encodes a protein of unknown function. Samd14 increased hematopoietic progenitor levels/activity and promoted signaling by a pathway vital for hematopoietic stem/progenitor cell regulation (stem cell factor/c-Kit), and c-Kit rescued Samd14 loss-of-function phenotypes. Thus, the hematopoietic stem/progenitor cell cistrome revealed a mediator of a signaling pathway that has broad importance for stem/progenitor cell biology.


Subjects
GATA2 Transcription Factor/genetics , Hematopoietic Stem Cells/metabolism , Proteins/genetics , Proto-Oncogene Proteins c-kit/genetics , Transcriptional Activation/genetics , Amino Acid Sequence , Animals , Cell Differentiation/genetics , Cell Line , Mice , Molecular Sequence Data , Proteins/metabolism , RNA Interference , RNA, Small Interfering , Signal Transduction , Transcription, Genetic/genetics
2.
Bioinformatics ; 31(20): 3353-5, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26092860

ABSTRACT

MOTIVATION: Genome-wide association studies revealed that most disease-associated single nucleotide polymorphisms (SNPs) are located in regulatory regions within introns or in regions between genes. Regulatory SNPs (rSNPs) are SNPs that affect gene regulation by changing transcription factor (TF) binding affinities to genomic sequences. Identifying potential rSNPs is crucial for understanding disease mechanisms. In silico methods that evaluate the impact of SNPs on TF binding affinities are not scalable for large-scale analysis. RESULTS: We describe Affinity Testing for regulatory SNPs (atSNP), a computationally efficient R package for identifying rSNPs in silico. atSNP implements an importance sampling algorithm coupled with a first-order Markov model for the background nucleotide sequences to test the significance of affinity scores and SNP-driven changes in these scores. Application of atSNP with >20K SNPs indicates that atSNP is the only available tool for such a large-scale task. atSNP provides user-friendly output in the form of both tables and composite logo plots for visualizing SNP-motif interactions. Evaluations of atSNP with known rSNP-TF interactions indicate that atSNP is able to prioritize motifs for a given set of SNPs with high accuracy. AVAILABILITY AND IMPLEMENTATION: https://github.com/keleslab/atSNP. CONTACT: keles@stat.wisc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Gene Expression Regulation , Genomics/methods , Polymorphism, Single Nucleotide/genetics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Databases, Genetic , Genome, Human , Genome-Wide Association Study , Humans , Protein Binding
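
Illustrative note on record 2. The abstract describes testing the significance of motif affinity scores, and of SNP-driven changes in them, against a first-order Markov background. The Python sketch below is not the atSNP R package or its API, and it deliberately swaps the importance sampling step for plain Monte Carlo sampling, which is simpler but far less efficient for small p-values. The toy "GATA" position weight matrix, the uniform background, the two 7-mer alleles, and all function names are assumptions made up for illustration.

```python
import math
import random

BASES = "ACGT"

def window_score(window, pwm):
    """Log-likelihood of one motif-length window under the PWM."""
    return sum(math.log(pwm[i][base]) for i, base in enumerate(window))

def best_score(seq, pwm):
    """Highest window log-likelihood over all motif-length windows of seq."""
    m = len(pwm)
    return max(window_score(seq[i:i + m], pwm) for i in range(len(seq) - m + 1))

def sample_background(length, init, trans, rng):
    """Draw one sequence from a first-order Markov chain over A, C, G, T."""
    seq = rng.choices(BASES, weights=[init[b] for b in BASES])
    while len(seq) < length:
        prev = seq[-1]
        seq += rng.choices(BASES, weights=[trans[prev][b] for b in BASES])
    return "".join(seq)

def mc_pvalue(observed, pwm, length, init, trans, n_draws=2000, seed=0):
    """Plain Monte Carlo p-value: share of background draws scoring >= observed."""
    rng = random.Random(seed)
    hits = sum(
        best_score(sample_background(length, init, trans, rng), pwm) >= observed - 1e-12
        for _ in range(n_draws)
    )
    return (hits + 1) / (n_draws + 1)  # add-one correction keeps the estimate above 0

def column(consensus):
    """Hypothetical PWM column: 0.85 on the consensus base, 0.05 elsewhere."""
    return {b: 0.85 if b == consensus else 0.05 for b in BASES}

# Toy inputs (all assumed): a 4-bp motif and a uniform Markov background.
PWM = [column(b) for b in "GATA"]
UNIFORM = {b: 0.25 for b in BASES}
INIT = UNIFORM
TRANS = {b: UNIFORM for b in BASES}

# Reference and alternate alleles; the SNP sits at the centre of a 7-mer.
ref, alt = "CGATAGC", "CGACAGC"
ref_score = best_score(ref, PWM)
alt_score = best_score(alt, PWM)
p_ref = mc_pvalue(ref_score, PWM, len(ref), INIT, TRANS)
p_alt = mc_pvalue(alt_score, PWM, len(alt), INIT, TRANS)
print(f"score change (ref - alt): {ref_score - alt_score:.3f}")
print(f"p_ref = {p_ref:.4f}, p_alt = {p_alt:.4f}")
```

In this toy case the SNP destroys the best motif match, so the reference allele scores higher and receives the smaller p-value. Resolving very small p-values this way needs a very large number of draws, which is the cost that importance sampling is meant to avoid.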
3.
Bioinformatics ; 30(6): 753-60, 2014 Mar 15.
Article in English | MEDLINE | ID: mdl-23665773

ABSTRACT

MOTIVATION: ChIP-seq technology enables investigators to study genome-wide binding of transcription factors and mapping of epigenomic marks. Although the availability of basic analysis tools for ChIP-seq data is rapidly increasing, there has not been much progress on the related design issues. A challenging question for designing a ChIP-seq experiment is how deeply the ChIP and the control samples should be sequenced. The answer depends on multiple factors, some of which can be set by the experimenter based on pilot/preliminary data. The sequencing depth of a ChIP-seq experiment is one of the key factors that determine whether all the underlying targets (e.g. binding locations or epigenomic profiles) can be identified with a targeted power. RESULTS: We developed a statistical framework named CSSP (ChIP-seq Statistical Power) for power calculations in ChIP-seq experiments by considering a local Poisson model, which is commonly adopted by many peak callers. Evaluations with simulations and data-driven computational experiments demonstrate that this framework can reliably estimate the power of a ChIP-seq experiment at different sequencing depths based on pilot data. Furthermore, it provides an analytical approach for calculating the required depth for a targeted power while controlling the false discovery rate at a user-specified level. Hence, our results enable researchers to use their own or publicly available data for determining required sequencing depths of their ChIP-seq experiments and potentially make better use of the multiplexing functionality of the sequencers. Evaluation of power for multiple public ChIP-seq datasets indicates that, currently, typical ChIP-seq studies are powered well for detecting large fold changes of ChIP enrichment over the control sample, but they have considerably less power for detecting smaller fold changes. AVAILABILITY: Available at www.stat.wisc.edu/~zuo/CSSP. CONTACT: keles@stat.wisc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing/methods , Oligonucleotide Array Sequence Analysis/methods , Escherichia coli/genetics , Pilot Projects
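
Illustrative note on record 3. The abstract ties detection power to sequencing depth through a local Poisson model. The Python sketch below is a textbook single-bin power calculation under that kind of model, not the CSSP framework or its software: it controls a per-bin type I error rate rather than the false discovery rate, and it assumes that expected read counts scale linearly with depth. The function names, the background rate of 0.5 reads per bin per million reads, and the fold changes are hypothetical.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed by summing the pmf below k."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # pmf at 0
    cdf = term
    for i in range(1, k):
        term *= lam / i    # pmf at i from pmf at i - 1
        cdf += term
    return max(0.0, 1.0 - cdf)

def critical_count(lam_bg, alpha):
    """Smallest count c with P(X >= c) <= alpha under the background rate."""
    c = 0
    while poisson_sf(c, lam_bg) > alpha:
        c += 1
    return c

def power_at_depth(depth_millions, bg_per_million, fold, alpha=1e-3):
    """Chance that a bin enriched by `fold` reaches the critical count."""
    lam_bg = bg_per_million * depth_millions  # expected background reads in the bin
    c = critical_count(lam_bg, alpha)
    return poisson_sf(c, fold * lam_bg)

# Power for a weak (1.5x) and a strong (4x) enrichment at several depths.
for depth in (5, 20, 80):
    weak = power_at_depth(depth, 0.5, 1.5)
    strong = power_at_depth(depth, 0.5, 4.0)
    print(f"{depth:>3}M reads: power(1.5x) = {weak:.3f}, power(4x) = {strong:.3f}")
```

The loop shows the qualitative pattern the abstract reports: strong enrichment is detected at modest depth, while weak enrichment needs much deeper sequencing to reach comparable power.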
4.
J Comput Biol ; 24(6): 472-485, 2017 Jun.
Article in English | MEDLINE | ID: mdl-27835030

ABSTRACT

Current analytic approaches for querying large collections of chromatin immunoprecipitation followed by sequencing (ChIP-seq) data from multiple cell types rely on individual analysis of each data set (i.e., peak calling) independently. This approach discards the fact that functional elements are frequently shared among related cell types and leads to overestimation of the extent of divergence between different ChIP-seq samples. Methods geared toward multisample investigations have limited applicability in settings that aim to integrate hundreds to thousands of ChIP-seq data sets for query loci (e.g., thousands of genomic loci with a specific binding site). Recently, Zuo et al. developed a hierarchical framework for state-space matrix inference and clustering, named MBASIC, to enable joint analysis of user-specified loci across multiple ChIP-seq data sets. Although this versatile framework both estimates the underlying state-space (e.g., bound vs. unbound) and groups loci with similar patterns together, its Expectation-Maximization-based estimation structure hinders its applicability with large numbers of loci and samples. We address this limitation by developing a MAP-based asymptotic derivations from Bayes (MAD-Bayes) framework for MBASIC. This results in a K-means-like optimization algorithm that converges rapidly and hence enables exploring multiple initialization schemes and flexibility in tuning. Comparison with MBASIC indicates that this speed comes at a relatively insignificant loss in estimation accuracy. Although MAD-Bayes MBASIC is specifically designed for the analysis of user-specified loci, it is able to capture overall patterns of histone marks from multiple ChIP-seq data sets similar to those identified by genome-wide segmentation methods such as ChromHMM and Spectacle.


Subjects
Algorithms , Bayes Theorem , Chromatin Immunoprecipitation/methods , Cluster Analysis , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Genome, Human , Genomics , Humans , Software
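
Illustrative note on record 4. The abstract says the MAD-Bayes derivation yields a K-means-like optimization algorithm. The Python sketch below shows a generic DP-means-style procedure, a well-known example of that family: hard assignments to the nearest centre, with a new cluster opened whenever a point is farther than a penalty from every centre. It is not the MAD-Bayes MBASIC algorithm itself, it skips the state-space estimation step, and it assumes the binary state profiles (1 = bound, 0 = unbound) are already known. The toy profiles, the penalty value, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(rows):
    """Column-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def dp_means(rows, penalty, max_iter=50):
    """K-means-like clustering where opening a new cluster costs `penalty`."""
    centers = [mean_profile(rows)]  # start from one global cluster
    labels = [0] * len(rows)
    for _ in range(max_iter):
        changed = False
        for i, row in enumerate(rows):
            dists = [sq_dist(row, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > penalty:  # too far from every centre: open a cluster
                centers.append(list(row))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Drop clusters that ended up empty, then recompute the centres.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[label] for label in labels]
        centers = [
            mean_profile([r for r, label in zip(rows, labels) if label == k])
            for k in range(len(used))
        ]
        if not changed:
            break
    return labels, centers

# Six hypothetical loci observed across six ChIP-seq experiments.
profiles = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
labels, centers = dp_means(profiles, penalty=1.2)
print(labels)        # cluster index per locus
print(len(centers))  # number of clusters found
```

The penalty plays the role of the tuning parameter: a larger value makes new clusters more expensive and so yields fewer of them. Because each pass is only a round of distance computations, trying several penalties or initializations is cheap, which is the practical advantage the abstract attributes to the K-means-like formulation.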
5.
Ann Appl Stat ; 10(3): 1348-1372, 2016 Sep.
Article in English | MEDLINE | ID: mdl-29910842

ABSTRACT

In recent years, a large number of genomic and epigenomic studies have been focusing on the integrative analysis of multiple experimental datasets measured over a large number of observational units. The objectives of such studies include not only inferring a hidden state of activity for each unit over individual experiments, but also detecting highly associated clusters of units based on their inferred states. Although there are a number of methods tailored for specific datasets, there is currently no state-of-the-art modeling framework for this general class of problems. In this paper, we develop the MBASIC (Matrix Based Analysis for State-space Inference and Clustering) framework. MBASIC consists of two parts: state-space mapping and state-space clustering. In state-space mapping, it maps observations onto a finite state-space, representing the activation states of units across conditions. In state-space clustering, MBASIC incorporates a finite mixture model to cluster the units based on their inferred state-space profiles across all conditions. Both the state-space mapping and clustering can be simultaneously estimated through an Expectation-Maximization algorithm. MBASIC flexibly adapts to a large number of parametric distributions for the observed data, as well as the heterogeneity in replicate experiments. It allows for imposing structural assumptions on each cluster, and enables model selection using information criteria. In our data-driven simulation studies, MBASIC showed significant accuracy in recovering both the underlying state-space variables and clustering structures. We applied MBASIC to two genome research problems using large numbers of datasets from the ENCODE project. The first application grouped genes based on transcription factor occupancy profiles of their promoter regions in two different cell types. The second application focused on identifying groups of loci that are similar to a GATA2 binding site that is functional at its endogenous locus by utilizing transcription factor occupancy data and illustrated applicability of MBASIC in a wide variety of problems. In both studies, MBASIC showed higher levels of raw data fidelity than analyzing these data with a two-step approach using ENCODE results on transcription factor occupancy data.
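
Illustrative note on record 5. The abstract describes clustering units with a finite mixture model fitted by an Expectation-Maximization algorithm. The Python sketch below runs EM for a plain Bernoulli mixture over binary state profiles. It is a simplified stand-in and not the MBASIC model: it treats the states as already observed, so it omits the state-space mapping, the parametric distributions for raw data, and the handling of replicates. The toy profiles, the deterministic initialisation, the small pseudo-count `eps`, and the function name are assumptions.

```python
import math

def em_bernoulli_mixture(rows, k, n_iter=50, eps=1e-6):
    """EM for a k-component Bernoulli mixture over binary profiles."""
    n, d = len(rows), len(rows[0])
    # Deterministic start: softened copies of k evenly spaced rows.
    theta = [[0.25 + 0.5 * rows[(j * n) // k][c] for c in range(d)] for j in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each row.
        resp = []
        for row in rows:
            logp = [
                math.log(weights[j])
                + sum(math.log(theta[j][c] if x else 1.0 - theta[j][c])
                      for c, x in enumerate(row))
                for j in range(k)
            ]
            top = max(logp)  # subtract the max for numerical stability
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update weights and per-condition activation probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) + eps  # pseudo-count avoids 0 and 1
            weights[j] = nj / (n + k * eps)
            theta[j] = [
                (sum(r[j] * row[c] for r, row in zip(resp, rows)) + 0.5 * eps) / nj
                for c in range(d)
            ]
    return weights, theta, resp

# Six hypothetical units with known binary states across six conditions.
states = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
weights, theta, resp = em_bernoulli_mixture(states, k=2)
for row, r in zip(states, resp):
    print(row, [round(p, 3) for p in r])
```

Unlike a hard-assignment method, EM returns a posterior probability for every unit and cluster, at the price of a full pass over all units and clusters in each iteration.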
