Results 1-5 of 5
1.
Proc Natl Acad Sci U S A ; 113(45): E7020-E7029, 2016 Nov 08.
Article in English | MEDLINE | ID: mdl-27791097

ABSTRACT

Eukaryotic genomes are organized into domains of differing structure and activity. There is evidence that the domain organization of the genome regulates its activity, yet our understanding of domain properties and the factors that influence their formation is poor. Here, we use chromatin state analyses in early embryos and third larval stage (L3) animals to investigate genome domain organization and its regulation in Caenorhabditis elegans. At both stages we find that the genome is organized into extended chromatin domains of high or low gene activity defined by different subsets of states, and enriched for H3K36me3 or H3K27me3, respectively. The border regions between domains contain large intergenic regions and a high density of transcription factor binding, suggesting a role for transcription regulation in separating chromatin domains. Despite the differences in cell types, overall domain organization is remarkably similar in early embryos and L3 larvae, with conservation of 85% of domain border positions. Most genes in high-activity domains are expressed in the germ line and broadly across cell types, whereas low-activity domains are enriched for genes that are developmentally regulated. We find that domains are regulated by the germ-line H3K36 methyltransferase MES-4 and that border regions show striking remodeling of H3K27me1, supporting roles for H3K36 and H3K27 methylation in regulating domain structure. Our analyses of C. elegans chromatin domain structure show that genes are organized by type into domains that have differing modes of regulation.
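
The idea of comparing domain borders between two stages can be illustrated with a toy sketch. It assumes each genomic bin has already been labelled high ('H') or low ('L') activity, merges runs of identical labels into domains, and reports the fraction of one stage's borders matched by a border in the other stage within a tolerance. The bin labels, `tolerance` and all function names are hypothetical; this is not the authors' chromatin-state pipeline.

```python
from itertools import groupby

def call_domains(labels):
    """Merge runs of identical bin labels into (label, start, end) domains; end is exclusive."""
    domains, pos = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        domains.append((label, pos, pos + length))
        pos += length
    return domains

def borders(domains):
    """A border is the start bin of every domain except the first."""
    return [start for _, start, _ in domains[1:]]

def border_conservation(borders_a, borders_b, tolerance=0):
    """Fraction of borders in A that have a border in B within `tolerance` bins."""
    if not borders_a:
        return 0.0
    hits = sum(1 for a in borders_a if any(abs(a - b) <= tolerance for b in borders_b))
    return hits / len(borders_a)

# Hypothetical per-bin activity labels for two developmental stages.
embryo = borders(call_domains("HHHLLLLHHLL"))
larva = borders(call_domains("HHHLLLHHHLL"))
print(embryo, larva)
print(border_conservation(embryo, larva))
print(border_conservation(embryo, larva, tolerance=1))
```

With these toy labels, exact matching conserves 2 of the 3 embryo borders, while a one-bin tolerance conserves all of them; in practice the tolerance would depend on bin size and border resolution.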

2.
BMC Genomics ; 11: 286, 2010 May 06.
Article in English | MEDLINE | ID: mdl-20459624

ABSTRACT

BACKGROUND: Many algorithms for finding transcription factor binding sites have concentrated on the characterisation of the binding site itself, and these algorithms lead to a large number of false positive sites. The DNA sequence which does not bind has been modeled only to the extent necessary to complement this formulation. RESULTS: We find that the human genome may be described by 19 pairs of mosaic classes, each defined by its base frequencies (or, more precisely, by the frequencies of doublets), so that typically a run of 10 to 100 bases belongs to the same class. Most experimentally verified binding sites are in the same four pairs of classes. In our sample of seventeen transcription factors, taken from different families of transcription factors, the average proportion of sites in this subset of classes was 75%, with values for individual factors ranging from 48% to 98%. By contrast, these same classes contain only 26% of the bases of the genome and only 31% of occurrences of the motifs of these factors, that is, places where one might expect the factors to bind. These results are not a consequence of the class composition in promoter regions. CONCLUSIONS: This method of analysis will help to find transcription factor binding sites and assist with the problem of false positives. These results also imply a profound difference between the mosaic classes.
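
The abstract characterises short runs of sequence by their doublet (dinucleotide) frequencies. A minimal sketch of that feature computation follows, assuming overlapping doublets counted on one strand within non-overlapping windows. The window size and function names are illustrative, and the sketch does not reproduce how the 19 class pairs were fitted or how windows were assigned to them.

```python
from collections import Counter
from itertools import product

# The 16 possible doublets (dinucleotides) on one strand.
DOUBLETS = ["".join(p) for p in product("ACGT", repeat=2)]

def doublet_frequencies(seq):
    """Relative frequencies of overlapping doublets; doublets containing N etc. are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DOUBLETS)
    if total == 0:
        return {d: 0.0 for d in DOUBLETS}
    return {d: counts[d] / total for d in DOUBLETS}

def windows(seq, size):
    """Yield (start, subsequence) for consecutive non-overlapping windows."""
    for start in range(0, len(seq) - size + 1, size):
        yield start, seq[start:start + size]

# Hypothetical input: profile 50-base windows, inside the 10-100 base run lengths reported.
genome_piece = "ACGT" * 50
profiles = [(start, doublet_frequencies(w)) for start, w in windows(genome_piece, 50)]
print(len(profiles), profiles[0][1]["CG"])
```

Each profile is a 16-dimensional frequency vector; assigning it to a mosaic class would additionally require the class definitions from the paper, which are not given here.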


Subjects
Computational Biology/methods; Genome, Human; Transcription Factors/genetics; Binding Sites; Humans
3.
BMC Genomics ; 11: 30, 2010 Jan 14.
Article in English | MEDLINE | ID: mdl-20074339

ABSTRACT

BACKGROUND: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets. RESULTS: We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition, our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance. CONCLUSIONS: We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1.
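
As a rough illustration of what a gapped PWM can mean, the sketch below scores a candidate site as two PWM blocks separated by a gap whose length is chosen from an allowed set, with gap bases treated as background (zero log-odds). This is an assumed simplification for exposition: the two-block layout, the uniform background, the pseudocount and the function names are hypothetical, and the published model may parameterise gaps quite differently.

```python
import math

def log_odds_pwm(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col[b] + pseudocount for b in "ACGT")
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background) for b in "ACGT"})
    return pwm

def block_score(pwm, seq, start):
    """Sum of log-odds for one PWM block; assumes an uppercase A/C/G/T sequence."""
    return sum(col[seq[start + k]] for k, col in enumerate(pwm))

def gapped_score(left, right, gaps, seq, start):
    """Best (score, gap) for a site at `start`, trying each allowed gap length; None if none fit."""
    best = None
    for g in gaps:
        if start + len(left) + g + len(right) > len(seq):
            continue  # this gap length would run off the end of the sequence
        s = block_score(left, seq, start) + block_score(right, seq, start + len(left) + g)
        if best is None or s > best[0]:
            best = (s, g)
    return best

# Hypothetical motif: a GG block and a CC block separated by 0-2 unconstrained bases.
left = log_odds_pwm([{"A": 0, "C": 0, "G": 10, "T": 0} for _ in range(2)])
right = log_odds_pwm([{"A": 0, "C": 10, "G": 0, "T": 0} for _ in range(2)])
best = gapped_score(left, right, gaps=(0, 1, 2), seq="GGACC", start=0)
print(best)
```

In this toy case the best alignment places one gap base between the GG and CC blocks. A full method would learn the block columns and gap-length preferences from ChIP-seq data rather than fixing them by hand.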


Subjects
Computational Biology/methods; Models, Genetic; Transcription Factors/metabolism; Algorithms; Amino Acid Motifs; Binding Sites; Markov Chains
4.
BMC Genomics ; 9: 43, 2008 Jan 25.
Article in English | MEDLINE | ID: mdl-18221531

ABSTRACT

BACKGROUND: For eukaryotes, there is almost no strand bias with regard to base composition, with exceptions at origins of replication, transcription start sites and transcribed regions. This paper revisits the question for subsequences of DNA taken at random from the genome. RESULTS: For a typical mammal, for example mouse or human, there is a small strand bias throughout the genomic DNA: there is a correlation between (G - C) and (A - T) on the same strand (that is, between the difference in the number of guanine and cytosine bases and the difference in the number of adenine and thymine bases). For small subsequences (up to 1 kb) this correlation is weak but positive, but for large windows (around 50 kb to 2 Mb) it is strong and negative. This effect is largely independent of GC%. Transcribed and untranscribed regions give similar correlations both for small and large subsequences, but these regions differ for intermediate-sized subsequences. An analysis of the human genome showed that position within the isochore structure did not affect these correlations. An analysis of available genomes of different species shows that this contrast between large and small windows is a general feature of mammals and birds. Further down the evolutionary tree, other organisms show a similar but smaller effect. Except for the nematode, all the animals analysed showed at least a small effect. CONCLUSION: The correlations on the large scale may be explained by DNA replication. Transcription may be a modifier of these effects but is not the fundamental cause. These results cast light on how DNA mutations affect the genome over evolutionary time. At least for vertebrates, there is a broad relationship between body temperature and the size of the correlation. The genome of mammals and birds has a structure marked by strand bias segments.
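
The window-size dependence of the (G - C) versus (A - T) correlation can be explored with a short sketch: count both differences in consecutive non-overlapping windows on one strand, then take the Pearson correlation across windows. The window sizes discussed in the abstract (up to 1 kb, and around 50 kb to 2 Mb) would be passed as `size`; the function names are illustrative and the toy sequence is not real genomic data.

```python
import math

def skew_differences(seq, size):
    """(G - C, A - T) counts for consecutive non-overlapping windows of `size` bases."""
    seq = seq.upper()
    pairs = []
    for start in range(0, len(seq) - size + 1, size):
        w = seq[start:start + size]
        pairs.append((w.count("G") - w.count("C"), w.count("A") - w.count("T")))
    return pairs

def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def strand_bias_correlation(seq, size):
    """Correlation between (G - C) and (A - T) across windows; needs at least one full window."""
    gc, at = zip(*skew_differences(seq, size))
    return pearson(gc, at)

# Toy sequence whose three 4-base windows make the two differences move in opposite directions.
print(strand_bias_correlation("GGTTCCAAGCAT", 4))
```

Running the same function on one chromosome with a small and a large `size` is one way to look for the sign change the abstract reports, assuming the sequence is long enough to give many windows at the larger scale.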


Subjects
Base Composition/genetics; DNA/genetics; Genome/genetics; Animals; Base Sequence; Birds/genetics; DNA/chemistry; DNA Replication; Humans; Isochores/genetics; Mammals/genetics; Transcription, Genetic
5.
BMC Genomics ; 9: 16, 2008 Jan 14.
Article in English | MEDLINE | ID: mdl-18194530

ABSTRACT

BACKGROUND: On a single strand of genomic DNA the number of As is usually about equal to the number of Ts (and similarly for Gs and Cs), but deviations have been noted for transcribed regions and origins of replication. RESULTS: The mouse genome is shown to have a segmented structure defined by strand bias. Transcription is known to cause a strand bias and numerous analyses are presented to show that the strand bias in question is not caused by transcription. However, these strand bias segments influence the position of genes and their unspliced length. The position of genes within the strand bias structure affects the probability that a gene is switched on and its expression level. Transcription has a highly directional flow within this structure and the peak volume of transcription is around 20 kb from the A-rich/T-rich segment boundary on the T-rich side, directed away from the boundary. The A-rich/T-rich boundaries are SATB1 binding regions, whereas the T-rich/A-rich boundary regions are not. CONCLUSION: The direct cause of the strand bias structure may be DNA replication. The strand bias segments represent a further biological feature, the chromatin structure, which in turn influences the ease of transcription.
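
A naive way to locate A-rich/T-rich and T-rich/A-rich boundaries is to take the sign of (A - T) in consecutive windows and record where it flips. This is a simplified sketch that assumes non-overlapping windows and skips balanced windows. Real data at the kilobase scale would need smoothing or a proper segmentation method, and the function names are hypothetical rather than taken from the paper.

```python
def at_sign(window):
    """+1 if the window is A-rich, -1 if T-rich, 0 if balanced (single strand)."""
    d = window.count("A") - window.count("T")
    return (d > 0) - (d < 0)

def strand_bias_boundaries(seq, size):
    """Window starts where the (A - T) sign flips, labelled by the direction of the flip."""
    seq = seq.upper()
    signs = [(start, at_sign(seq[start:start + size]))
             for start in range(0, len(seq) - size + 1, size)]
    signs = [(s, v) for s, v in signs if v != 0]  # ignore balanced windows
    out = []
    for (_, v0), (s1, v1) in zip(signs, signs[1:]):
        if v0 > 0 and v1 < 0:
            out.append((s1, "A-rich/T-rich"))
        elif v0 < 0 and v1 > 0:
            out.append((s1, "T-rich/A-rich"))
    return out

# Toy input with 4-base windows: two A-rich, two T-rich, then one A-rich window.
print(strand_bias_boundaries("AAACAATGTTTCTTAGAAAG", 4))
```

Each reported position is the start of the first window on the far side of the flip, so boundary resolution is limited to the window size.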


Subjects
AT Rich Sequence; Chromatin/genetics; DNA/chemistry; DNA/genetics; Gene Expression; Animals; DNA Replication; Data Interpretation, Statistical; Genome; Genomics/statistics & numerical data; Mice; Models, Genetic; Transcription Initiation Site; Transcription, Genetic