Am J Hum Genet ; 107(4): 622-635, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32946763


Quantifying the functional effects of complex disease risk variants can provide insights into mechanisms underlying disease biology. Genome-wide association studies have identified 39 regions associated with risk of epithelial ovarian cancer (EOC). The vast majority of these variants lie in the non-coding genome, where they likely function through interaction with gene regulatory elements. In this study we first estimated the heritability explained by known common low-penetrance risk alleles for EOC. The narrow-sense heritability (hg2) of EOC overall and of high-grade serous ovarian cancer (HGSOC) was estimated to be 5%-6%. Partitioning SNP heritability across broad functional categories indicated a significant contribution of regulatory elements to EOC heritability. We collated epigenomic profiling data for 77 cell and tissue types from Roadmap Epigenomics and ENCODE, together with H3K27Ac ChIP-seq data generated in 26 ovarian cancer and precursor-related cell and tissue types. We identified significant enrichment of risk single-nucleotide polymorphisms (SNPs) in active regulatory elements marked by H3K27Ac in HGSOC. To further investigate how risk SNPs in active regulatory elements influence predisposition to ovarian cancer, we used motifbreakR to predict the disruption of transcription factor binding sites. We identified 469 candidate causal risk variants in H3K27Ac peaks that are predicted to significantly break transcription factor (TF) motifs. The most frequently broken motif was REST (p value = 0.0028), which has been reported as both a tumor suppressor and an oncogene. Overall, these systematic functional annotations with epigenomic data improve interpretation of EOC risk variants and shed light on likely cells of origin.
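
The abstract relies on motifbreakR to decide whether a risk allele "breaks" a TF motif. The sketch below illustrates only the general idea behind such tools, comparing position weight matrix (PWM) log-odds scores for the reference and alternate alleles; it is not motifbreakR's API or its exact scoring. The 4-bp matrix, uniform background, and example sequences are hypothetical, and only the forward strand is scanned.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (best match: ACGT).
# Each row is one motif position; values are base probabilities at that position.
PFM = [
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
# Assumed uniform genomic background.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_score(window):
    """Sum over positions of log2(p_motif(base) / p_background(base))."""
    return sum(math.log2(PFM[i][base] / BACKGROUND[base]) for i, base in enumerate(window))


def best_window_score(seq):
    """Highest log-odds score over all motif-length windows (forward strand only)."""
    width = len(PFM)
    return max(log_odds_score(seq[i:i + width]) for i in range(len(seq) - width + 1))


def allele_effect(left, ref, alt, right):
    """Change in best motif score when the SNP allele `ref` is replaced by `alt`.

    Negative values mean the alternate allele weakens the best motif match.
    """
    ref_score = best_window_score(left + ref + right)
    alt_score = best_window_score(left + alt + right)
    return alt_score - ref_score
```

For example, `allele_effect("TTA", "C", "T", "GTTT")` turns the perfect match ACGT into ATGT, so the score drops by log2(0.05 / 0.85), about -4.09 bits. A real pipeline would also scan the reverse complement and apply a significance threshold before calling a motif "broken".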

Carcinoma, Ovarian Epithelial/genetics, Co-Repressor Proteins/genetics, Cystadenocarcinoma, Serous/genetics, Enhancer Elements, Genetic, Histones/genetics, Nerve Tissue Proteins/genetics, Ovarian Neoplasms/genetics, Alleles, Binding Sites, Carcinoma, Ovarian Epithelial/diagnosis, Carcinoma, Ovarian Epithelial/pathology, Chromosome Mapping, Co-Repressor Proteins/metabolism, Cystadenocarcinoma, Serous/diagnosis, Cystadenocarcinoma, Serous/pathology, Female, Genetic Predisposition to Disease, Genome, Human, Genome-Wide Association Study, Histones/metabolism, Humans, Inheritance Patterns, Nerve Tissue Proteins/metabolism, Ovarian Neoplasms/diagnosis, Ovarian Neoplasms/pathology, Penetrance, Polymorphism, Single Nucleotide, Risk
Genome Biol Evol ; 11(7): 1813-1828, 2019 07 01.
Article in English | MEDLINE | ID: mdl-31114856


Transcription factor (TF) binding is determined by sequence as well as chromatin accessibility. Although the role of accessibility in shaping TF-binding landscapes is well documented, its role in the evolutionary divergence of TF binding, which in turn can alter cis-regulatory activities, is not well understood. In this work, we studied the evolution of genome-wide binding landscapes of five major TFs in the core network of mesoderm specification, between Drosophila melanogaster and Drosophila virilis, and examined how this evolution relates to accessibility and sequence-level changes. We generated chromatin accessibility data from three important stages of embryogenesis in both Drosophila melanogaster and Drosophila virilis and recorded conservation and divergence patterns. We then used multivariable models to correlate accessibility and sequence changes to TF-binding divergence. We found that in some cases, for example for the master regulator Twist and for earlier developmental stages, accessibility changes predict binding change more accurately than TF-binding motif changes between orthologous enhancers do. Accessibility changes also explain a significant portion of the codivergence of TF pairs. We noted that accessibility and motif changes offer complementary views of the evolution of TF binding and developed a combined model that captures the evolutionary data much more accurately than either view alone. Finally, we trained machine learning models to predict enhancer activity from TF binding and used these functional models to argue that motif and accessibility-based predictors of TF-binding change can substitute for experimentally measured binding change, for the purpose of predicting evolutionary changes in enhancer activity.

Chromatin/metabolism, Drosophila Proteins/metabolism, Transcription Factors/metabolism, Animals, Chromatin/genetics, Drosophila Proteins/genetics, Drosophila melanogaster, Evolution, Molecular, Protein Binding, Transcription Factors/genetics
Elife ; 6, 2017 08 09.
Article in English | MEDLINE | ID: mdl-28792889


Sequence variation within enhancers plays a major role in both evolution and disease, yet its functional impact on transcription factor (TF) occupancy and enhancer activity remains poorly understood. Here, we assayed the binding of five essential TFs over multiple stages of embryogenesis in two distant Drosophila species (with 1.4 substitutions per neutral site), identifying thousands of orthologous enhancers with conserved or diverged combinatorial occupancy. We used these binding signatures to dissect two properties of developmental enhancers: (1) potential TF cooperativity, using signatures of co-associations and co-divergence in TF occupancy. This revealed conserved combinatorial binding despite sequence divergence, suggesting protein-protein interactions sustain conserved collective occupancy. (2) Enhancer in-vivo activity, revealing orthologous enhancers with conserved activity despite divergence in TF occupancy. Taken together, we identify enhancers with diverged motifs yet conserved occupancy and others with diverged occupancy yet conserved activity, emphasising the need to functionally measure the effect of divergence on enhancer activity.

DNA/metabolism, Enhancer Elements, Genetic, Evolution, Molecular, Transcription Factors/metabolism, Animals, Drosophila/embryology, Drosophila/genetics, Protein Binding
Nucleic Acids Res ; 44(13): e120, 2016 07 27.
Article in English | MEDLINE | ID: mdl-27257066


Prediction of gene expression levels driven by regulatory sequences is pivotal in genomic biology. A major focus in transcriptional regulation is sequence-to-expression modeling, which interprets the enhancer sequence based on transcription factor concentrations and DNA binding specificities and predicts precise gene expression levels in varying cellular contexts. Such models largely rely on the position weight matrix (PWM) model for DNA binding, and the effect of alternative models based on DNA shape remains unexplored. Here, we propose a statistical thermodynamics model of gene expression using DNA shape features of binding sites. We used rigorous methods to evaluate the fits of expression readouts of 37 enhancers regulating spatial gene expression patterns in the Drosophila embryo, and show that DNA shape-based models perform arguably better than PWM-based models. We also observed that DNA shape captures information complementary to the PWM, in a way that is useful for expression modeling. Furthermore, we tested whether combining shape and PWM-based features provides better predictions than using either binding model alone. Our work demonstrates that the increasingly popular DNA-binding models based on local DNA shape can be useful in sequence-to-expression modeling. It also provides a framework for future studies to predict gene expression better than with PWM models alone.

DNA-Binding Proteins/genetics, Drosophila melanogaster/genetics, Embryonic Development/genetics, Gene Expression Regulation, Developmental/genetics, Animals, Binding Sites/genetics, Computational Biology, DNA/genetics, DNA-Binding Proteins/biosynthesis, Position-Specific Scoring Matrices, Regulatory Sequences, Nucleic Acid/genetics, Thermodynamics
Biophys J ; 108(5): 1257-67, 2015 Mar 10.
Article in English | MEDLINE | ID: mdl-25762337


Prediction of gene expression levels from regulatory sequences is one of the major challenges of genomic biology today. A particularly promising approach to this problem is that taken by thermodynamics-based models, which interpret an enhancer sequence in a given cellular context specified by transcription factor concentration levels and predict precise expression levels driven by that enhancer. Such models have so far not accounted for the effect of chromatin accessibility on interactions between transcription factors and DNA, and consequently on gene expression levels. Here, we extend a thermodynamics-based model of gene expression, called GEMSTAT (Gene Expression Modeling Based on Statistical Thermodynamics), to incorporate chromatin accessibility data and quantify its effect on the accuracy of expression prediction. In the new model, called GEMSTAT-A, accessibility at a binding site is assumed to affect the transcription factor's binding strength at that site, whereas all other aspects are identical to the GEMSTAT model. We show that this modification results in significantly better fits in a data set of over 30 enhancers regulating spatial expression patterns in the blastoderm-stage Drosophila embryo. Notably, the improved fits result not from an overall elevated accessibility in active enhancers but from the variation of accessibility levels within an enhancer. With whole-genome DNA accessibility measurements becoming increasingly popular, our work demonstrates how such data may be useful for sequence-to-expression models. It also calls for future advances in modeling accessibility levels from sequence and the trans-regulatory context, so as to accurately predict the effect of cis and trans perturbations on gene expression.
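
The central GEMSTAT-A assumption, that accessibility at a site modulates the TF's binding strength there, can be illustrated with a single-site statistical-weight calculation. This is a minimal sketch, not the published model: GEMSTAT considers many binding configurations, TF interactions, and a transcription readout, whereas here sites are treated as independent and accessibility simply multiplies the binding constant. The `Site` class, function names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    k: float              # hypothetical relative binding constant of the site
    accessibility: float  # hypothetical accessibility at the site, in [0, 1]


def occupancy(site: Site, tf_conc: float) -> float:
    """Fractional occupancy of one site; accessibility scales its binding strength."""
    weight = site.accessibility * site.k * tf_conc  # statistical weight of the bound state
    return weight / (1.0 + weight)                  # P(bound) = w / (1 + w)


def expected_bound(sites: list[Site], tf_conc: float) -> float:
    """Expected number of bound sites, assuming sites bind independently."""
    return sum(occupancy(s, tf_conc) for s in sites)
```

In this toy setting, two sites with k = 2 at concentration 1 give an expected 7/6 bound sites when their accessibilities are 1.0 and 0.5, but 1.2 when both are 0.75: the same mean accessibility yields a different prediction once its variation within the enhancer is taken into account.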

Chromatin Assembly and Disassembly, Chromatin/genetics, Models, Genetic, Animals, Chromatin/metabolism, Drosophila/genetics, Drosophila/growth & development, Gene Expression Regulation, Developmental, Thermodynamics
Article in English | MEDLINE | ID: mdl-24091394


The amount of microarray gene expression data has grown exponentially. To exploit these data in larger studies, integrated analysis of cross-laboratory (cross-lab) data has become common, which makes choosing an appropriate feature selection method essential. This paper focuses on feature selection for Affymetrix (Affy) microarray studies across different labs. We investigate four feature selection methods: t-test, significance analysis of microarrays (SAM), rank products (RP), and random forest (RF). The four methods are applied to acute lymphoblastic leukemia, acute myeloid leukemia, breast cancer, and lung cancer Affy data, each consisting of three cross-lab data sets. We use a rank-based normalization method to reduce the bias between cross-lab data sets. We consider both training on one data set and training on two combined data sets, testing on the remaining data set(s) in each case. Balanced accuracy is used to evaluate prediction. This study provides comprehensive comparisons of the four feature selection methods in cross-lab microarray analysis. Results show that SAM has the best classification performance. RF also achieves high classification accuracy, but it is not as stable as SAM. The t-test is the most naive method and performs worst of the four. We further discuss the influence of the number of training samples, the number of selected genes, and unbalanced data sets.
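
The abstract names a rank-based normalization and balanced accuracy without giving formulas. The sketch below assumes one common form of each: every sample's expression values are replaced by within-sample ranks scaled to (0, 1] (ties get the average rank), which removes lab-specific scale differences, and balanced accuracy is taken as the mean of sensitivity and specificity for a binary task. Function names are hypothetical, and the paper's exact normalization may differ.

```python
def rank_normalize(values):
    """Replace one sample's expression values by within-sample ranks scaled to (0, 1].

    Tied values receive the average of the ranks they span.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity (assumes both classes occur in y_true)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

For instance, `rank_normalize([10.0, 3.0, 3.0, 7.0])` returns `[1.0, 0.375, 0.375, 0.75]`. With four positives and two negatives, predictions `[1, 1, 1, 0, 0, 1]` against truth `[1, 1, 1, 1, 0, 0]` give plain accuracy 4/6 but balanced accuracy 0.625, which is why the latter is preferred when class sizes are unbalanced.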

Gene Expression Profiling/methods, Models, Statistical, Neoplasms/genetics, Neoplasms/metabolism, Oligonucleotide Array Sequence Analysis/methods, Databases, Factual, Gene Expression Regulation, Neoplastic/genetics, Humans