1.
Res Sq ; 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37503119

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

2.
bioRxiv ; 2023 Apr 06.
Article in English | MEDLINE | ID: mdl-37066421

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

3.
Blood ; 140(15): 1674-1685, 2022 10 13.
Article in English | MEDLINE | ID: mdl-35960871

ABSTRACT

The randomized, placebo-controlled, phase 3 QUAZAR AML-001 trial (ClinicalTrials.gov identifier: NCT01757535) evaluated oral azacitidine (Oral-AZA) in patients with acute myeloid leukemia (AML) in first remission after intensive chemotherapy (IC) who were not candidates for hematopoietic stem cell transplantation. Eligible patients were randomized 1:1 to Oral-AZA 300 mg or placebo for 14 days per 28-day cycle. We evaluated relapse-free survival (RFS) and overall survival (OS) in patient subgroups defined by NPM1 and FLT3 mutational status at AML diagnosis and whether survival outcomes in these subgroups were influenced by presence of post-IC measurable residual disease (MRD). Gene mutations at diagnosis were collected from patient case report forms; MRD was determined centrally by multiparameter flow cytometry. Overall, 469 of 472 randomized patients (99.4%) had available mutational data; 137 patients (29.2%) had NPM1 mutations (NPM1mut), 66 patients (14.1%) had FLT3 mutations (FLT3mut; with internal tandem duplications [ITD], tyrosine kinase domain mutations [TKDmut], or both), and 30 patients (6.4%) had NPM1mut and FLT3-ITD at diagnosis. Among patients with NPM1mut, OS and RFS were improved with Oral-AZA by 37% (hazard ratio [HR], 0.63; 95% confidence interval [CI], 0.41-0.98) and 45% (HR, 0.55; 95% CI, 0.35-0.84), respectively, vs placebo. Median OS was improved numerically with Oral-AZA among patients with NPM1mut whether without MRD (48.6 months vs 31.4 months with placebo) or with MRD (46.1 months vs 10.0 months with placebo) post-IC. Among patients with FLT3mut, Oral-AZA improved OS and RFS by 37% (HR, 0.63; 95% CI, 0.35-1.12) and 49% (HR, 0.51; 95% CI, 0.27-0.95), respectively, vs placebo. Median OS with Oral-AZA vs placebo was 28.2 months vs 16.2 months, respectively, for patients with FLT3mut and without MRD and 24.0 months vs 8.0 months for patients with FLT3mut and MRD. In multivariate analyses, Oral-AZA significantly improved survival independent of NPM1 or FLT3 mutational status, cytogenetic risk, or post-IC MRD status.
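The percentage improvements above are relative hazard reductions, 1 - HR, expressed as a percentage (for example, HR 0.55 corresponds to a 45% lower hazard). A minimal Python sketch of that arithmetic using the hazard ratios reported in this abstract; the function name is hypothetical:

```python
def relative_risk_reduction(hazard_ratio: float) -> int:
    """Percent reduction in hazard implied by a hazard ratio, i.e. (1 - HR) * 100, rounded."""
    return round((1 - hazard_ratio) * 100)


# Hazard ratios reported in the abstract for Oral-AZA vs placebo
reported = {
    "NPM1mut OS": 0.63,
    "NPM1mut RFS": 0.55,
    "FLT3mut OS": 0.63,
    "FLT3mut RFS": 0.51,
}

for label, hr in reported.items():
    print(f"{label}: HR {hr} -> {relative_risk_reduction(hr)}% reduction")
```

The point estimate alone says nothing about precision: for FLT3mut, the 95% CI for OS (0.35-1.12) includes 1, whereas the RFS interval (0.27-0.95) does not.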


Subjects
Leukemia, Myeloid, Acute, Nuclear Proteins, Azacitidine/therapeutic use, Humans, Leukemia, Myeloid, Acute/drug therapy, Leukemia, Myeloid, Acute/genetics, Mutation, Neoplasm, Residual, Nuclear Proteins/genetics, Nucleophosmin, Prognosis, Protein-Tyrosine Kinases/genetics, Recurrence, Remission Induction, fms-Like Tyrosine Kinase 3/genetics
5.
Nucleic Acids Res ; 46(D1): D794-D801, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29126249

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to (meta)data from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.


Subjects
DNA/genetics, Databases, Genetic, Gene Components, Genomics, High-Throughput Nucleotide Sequencing, Metadata, Animals, Caenorhabditis elegans/genetics, Data Display, Datasets as Topic, Drosophila melanogaster/genetics, Forecasting, Genome, Human, Humans, Mice/genetics, User-Computer Interface
6.
PLoS One ; 12(4): e0175310, 2017.
Article in English | MEDLINE | ID: mdl-28403240

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open source; code and installation instructions can be found at http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ (to store genomic data in the manner of ENCODE). The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data), has been released as a separate Python package.


Subjects
Databases, Genetic, Genomics/methods, Metadata, Software, Animals, DNA/genetics, Genome, Humans, Mice
7.
Article in English | MEDLINE | ID: mdl-26980513

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org.


Subjects
Computational Biology/methods, DNA/genetics, Databases, Genetic, Algorithms, Animals, Caenorhabditis elegans, Computational Biology/standards, Data Collection, Drosophila melanogaster, High-Throughput Nucleotide Sequencing, Humans, Mice, Nucleic Acids/genetics, Quality Control, Reproducibility of Results, Sequence Alignment
8.
Nucleic Acids Res ; 44(D1): D726-32, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527727

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.
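Because the Portal's searches are driven by metadata and addressed by URL, they can also be scripted. A minimal Python sketch, assuming the Portal accepts metadata fields such as `type` and `assay_title` as query parameters together with `format=json` and `limit`, and returns hits as a list under an `@graph` key; treat these parameter names and the response shape as assumptions to verify against the Portal's own documentation:

```python
from urllib.parse import urlencode

PORTAL = "https://www.encodeproject.org"  # Portal URL given in the abstract


def build_search_url(filters: dict[str, str], limit: int = 25) -> str:
    """Compose a Portal search URL that requests JSON instead of HTML.

    Assumption: metadata fields are accepted as query parameters,
    alongside format=json and limit=N.
    """
    params = {**filters, "format": "json", "limit": str(limit)}
    return f"{PORTAL}/search/?{urlencode(params)}"


def accessions(search_response: dict) -> list[str]:
    """Extract accessions from an already-parsed search response.

    Assumption: hits are listed under "@graph", each carrying an "accession".
    """
    return [hit["accession"] for hit in search_response.get("@graph", []) if "accession" in hit]


url = build_search_url({"type": "Experiment", "assay_title": "TF ChIP-seq"})
print(url)
```

The URL can be fetched with `urllib.request.urlopen` and the body parsed with `json.load` before being passed to `accessions`; keeping URL construction and parsing separate lets both be checked without network access.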


Subjects
Databases, Genetic, Genome, Human, Genomics, Animals, DNA/metabolism, Genes, Humans, Mice, Proteins/metabolism, RNA/metabolism
9.
Article in English | MEDLINE | ID: mdl-25776021

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC's use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects.
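The search benefit of ontology-annotated metadata is that a query for a general term can also return experiments annotated with more specific terms beneath it. A minimal Python sketch with a hypothetical five-term `is_a` hierarchy and hypothetical experiment IDs (not ENCODE's actual ontology files or search implementation):

```python
# Toy ontology (hypothetical): child term -> parent terms via is_a edges.
IS_A = {
    "hepatocyte": ["epithelial cell"],
    "keratinocyte": ["epithelial cell"],
    "epithelial cell": ["cell"],
    "T cell": ["leukocyte"],
    "leukocyte": ["cell"],
}

# Hypothetical experiments, each annotated with its most specific term.
EXPERIMENTS = {"exp1": "hepatocyte", "exp2": "keratinocyte", "exp3": "T cell"}


def ancestors(term: str) -> set[str]:
    """All terms reachable from `term` by following is_a edges upward."""
    seen: set[str] = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def search(query: str) -> list[str]:
    """Experiments annotated with `query` itself or with any term below it."""
    return sorted(exp for exp, term in EXPERIMENTS.items()
                  if term == query or query in ancestors(term))


print(search("epithelial cell"))  # ['exp1', 'exp2']
```

A plain string match on "epithelial cell" would return nothing here, because no experiment is annotated with that term directly; walking the `is_a` edges is what connects the query to exp1 and exp2.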


Subjects
Data Curation/methods, Databases, Genetic, Gene Ontology, Gene Regulatory Networks/physiology, Molecular Sequence Annotation/methods, Transcription, Genetic/physiology, Animals, Humans, Mice
10.
Nucleic Acids Res ; 42(Database issue): D717-25, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24265222

ABSTRACT

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the community resource for genomic, gene and protein information about the budding yeast Saccharomyces cerevisiae, containing a variety of functional information about each yeast gene and gene product. We have recently added regulatory information to SGD and present it on a new tabbed section of the Locus Summary entitled 'Regulation'. We are compiling transcriptional regulator-target gene relationships, which are curated from the literature at SGD or imported, with permission, from the YEASTRACT database. For nearly every S. cerevisiae gene, the Regulation page displays a table of annotations showing the regulators of that gene, and a graphical visualization of its regulatory network. For genes whose products act as transcription factors, the Regulation page also shows a table of their target genes, accompanied by a Gene Ontology enrichment analysis of the biological processes in which those genes participate. We additionally synthesize information from the literature for each transcription factor in a free-text Regulation Summary, and provide other information relevant to its regulatory function, such as DNA binding site motifs and protein domains. All of the regulation data are available for querying, analysis and download via YeastMine, the InterMine-based data warehouse system in use at SGD.


Subjects
Databases, Genetic, Gene Expression Regulation, Fungal, Genome, Fungal, Saccharomyces cerevisiae/genetics, Binding Sites, Gene Regulatory Networks, Internet, Protein Structure, Tertiary, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae Proteins/metabolism, Transcription Factors/chemistry, Transcription Factors/metabolism, Transcription, Genetic
11.
Database (Oxford) ; 2012: bar057, 2012.
Article in English | MEDLINE | ID: mdl-22434826

ABSTRACT

The Saccharomyces Genome Database (SGD) is compiling and annotating a comprehensive catalogue of functional sequence elements identified in the budding yeast genome. Recent advances in deep sequencing technologies have enabled, for example, global analyses of transcription profiling and assembly of maps of transcription factor occupancy and higher order chromatin organization, at nucleotide level resolution. With this growing influx of published genome-scale data come new challenges for their storage, display, analysis and integration. Here, we describe SGD's progress in the creation of a consolidated resource for genome sequence elements in the budding yeast, the considerations taken in its design and the lessons learned thus far. The data within this collection can be accessed at http://browse.yeastgenome.org and downloaded from http://downloads.yeastgenome.org. DATABASE URL: http://www.yeastgenome.org.


Subjects
Database Management Systems, Databases, Genetic, Genome, Fungal, Molecular Sequence Annotation, Saccharomycetales/genetics, Chromosome Mapping, Saccharomyces/genetics
12.
Nucleic Acids Res ; 40(Database issue): D700-5, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22110037

ABSTRACT

The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use.


Subjects
Databases, Genetic, Genome, Fungal, Saccharomyces cerevisiae/genetics, Genes, Fungal, Genomics, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, Phenotype, Software, Terminology as Topic
13.
PLoS Comput Biol ; 6: e1000832, 2010 Jul 01.
Article in English | MEDLINE | ID: mdl-20617199

ABSTRACT

Metazoan genomes encode hundreds of RNA-binding proteins (RBPs). These proteins regulate post-transcriptional gene expression and have critical roles in numerous cellular processes including mRNA splicing, export, stability and translation. Despite their ubiquity and importance, the binding preferences for most RBPs are not well characterized. In vitro and in vivo studies, using affinity selection-based approaches, have successfully identified RNA sequences associated with specific RBPs; however, it is difficult to infer RBP sequence and structural preferences without specifically designed motif finding methods. In this study, we introduce a new motif-finding method, RNAcontext, designed to elucidate RBP-specific sequence and structural preferences with greater accuracy than existing approaches. We evaluated RNAcontext on recently published in vitro and in vivo RNA affinity selected data and demonstrate that RNAcontext identifies known binding preferences for several control proteins including HuR, PTB, and Vts1p and predicts new RNA structure preferences for SF2/ASF, RBM4, FUSIP1 and SLM2. The predicted preferences for SF2/ASF are consistent with its recently reported in vivo binding sites. RNAcontext is an accurate and efficient motif finding method ideally suited for using large-scale RNA-binding affinity datasets to determine the relative binding preferences of RBPs for a wide range of RNA sequences and structures.


Subjects
Amino Acid Sequence, Binding Sites, Protein Conformation, RNA-Binding Proteins, Algorithms, Amino Acid Motifs, Area Under Curve, Base Sequence, Databases, Protein, Models, Genetic, Models, Statistical, Nucleic Acid Conformation, Protein Binding, RNA, Messenger/chemistry, RNA, Messenger/metabolism, RNA-Binding Proteins/chemistry, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism
14.
Genomics ; 95(4): 185-95, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20079828

ABSTRACT

Sequence-specific binding by transcription factors (TFs) interprets regulatory information encoded in the genome. Using recently published universal protein binding microarray (PBM) data on the in vitro DNA binding preferences of these proteins for all possible 8-base-pair sequences, we examined the evolutionary conservation and enrichment within putative regulatory regions of the binding sequences of a diverse library of 104 nonredundant mouse TFs spanning 22 different DNA-binding domain structural classes. We found that not only high affinity binding sites, but also numerous moderate and low affinity binding sites, are under negative selection in the mouse genome. These 8-mers occur preferentially in putative regulatory regions of the mouse genome, including CpG islands and non-exonic ultraconserved elements (UCEs). Of TFs whose PBM "bound" 8-mers are enriched within sets of tissue-specific UCEs, many are expressed in the same tissue(s) as the UCE-driven gene expression. Phylogenetically conserved motif occurrences of various TFs were also enriched in the noncoding sequence surrounding numerous gene sets corresponding to Gene Ontology categories and tissue-specific gene expression clusters, suggesting involvement in transcriptional regulation of those genes. Altogether, our results indicate that many of the sequences bound by these proteins in vitro, including lower affinity DNA sequences, are likely to be functionally important in vivo. This study not only provides an initial analysis of the potential regulatory associations of 104 mouse TFs, but also presents an approach for the functional analysis of TFs from any other metazoan genome as their DNA binding preferences are determined by PBMs or other technologies.
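Because DNA is double-stranded, an 8-mer and its reverse complement describe the same site, and PBM-style 8-mer analyses commonly collapse the two into one entry. A minimal Python sketch of that bookkeeping (an illustration of the convention, not the authors' code; function names are hypothetical):

```python
from itertools import product

# Base-pairing complement for uppercase DNA
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> set[str]:
    """Every k-mer paired with its reverse complement; the lexicographically smaller one is kept."""
    kmers = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


print(len(canonical_kmers(8)))  # 32896
```

The count agrees with the closed form for even k, (4^k + 4^(k/2)) / 2: the 4^4 = 256 reverse-complement palindromes are counted once and every other 8-mer is paired with its partner, giving (65,536 + 256) / 2 = 32,896.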


Subjects
Gene Expression Regulation, Transcription Factors/metabolism, Animals, Base Sequence/genetics, Binding Sites/genetics, CpG Islands/genetics, Humans, Mice, Promoter Regions, Genetic/genetics, Protein Array Analysis, Regulatory Sequences, Nucleic Acid/genetics, Sequence Analysis, DNA
15.
Nat Biotechnol ; 27(7): 667-70, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19561594

ABSTRACT

Metazoan genomes encode hundreds of RNA-binding proteins (RBPs) but RNA-binding preferences for relatively few RBPs have been well defined. Current techniques for determining RNA targets, including in vitro selection and RNA co-immunoprecipitation, require significant time and labor investment. Here we introduce RNAcompete, a method for the systematic analysis of RNA binding specificities that uses a single binding reaction to determine the relative preferences of RBPs for short RNAs that contain a complete range of k-mers in structured and unstructured RNA contexts. We tested RNAcompete by analyzing nine diverse RBPs (HuR, Vts1, FUSIP1, PTB, U1A, SF2/ASF, SLM2, RBM4 and YB1). RNAcompete identified expected and previously unknown RNA binding preferences. Using in vitro and in vivo binding data, we demonstrate that preferences for individual 7-mers identified by RNAcompete are a more accurate representation of binding activity than are conventional motif models. We anticipate that RNAcompete will be a valuable tool for the study of RNA-protein interactions.
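RNAcompete reports preferences for individual 7-mers. A simplified Python sketch of how per-7-mer scores can be derived from probe intensities, scoring each 7-mer by the mean intensity of the probes that contain it; the probe sequences and intensities are hypothetical, and the published analysis may normalize and summarize intensities differently:

```python
from collections import defaultdict
from statistics import mean


def kmers_in(seq: str, k: int) -> set[str]:
    """Distinct k-mers in one probe, so each probe counts once per k-mer."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score_kmers(probes: dict[str, float], k: int = 7) -> dict[str, float]:
    """Simplified scoring: mean intensity of the probes containing each k-mer."""
    hits: dict[str, list[float]] = defaultdict(list)
    for seq, intensity in probes.items():
        for kmer in kmers_in(seq, k):
            hits[kmer].append(intensity)
    return {kmer: mean(values) for kmer, values in hits.items()}


# Hypothetical probe sequences (RNA alphabet) with hypothetical intensities
probes = {"UUUUUUUAG": 9.0, "GAUUUUUUU": 7.0, "GCGCGCGCA": 1.0}
scores = score_kmers(probes)
print(scores["UUUUUUU"])  # 8.0
```

Averaging over every probe that contains a 7-mer is what lets a single pooled binding reaction yield a preference for each short sequence rather than a single consensus motif.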


Subjects
Oligonucleotide Array Sequence Analysis/methods, RNA-Binding Proteins/metabolism, RNA/metabolism, Animals, Base Sequence, Binding Sites/genetics, Databases, Nucleic Acid, Genome, Molecular Sequence Data, RNA/chemistry, RNA/genetics, RNA-Binding Proteins/chemistry, RNA-Binding Proteins/genetics, ROC Curve, Substrate Specificity
16.
Science ; 324(5935): 1720-3, 2009 Jun 26.
Article in English | MEDLINE | ID: mdl-19443739

ABSTRACT

Sequence preferences of DNA binding proteins are a primary mechanism by which cells interpret the genome. Despite the central importance of these proteins in physiology, development, and evolution, comprehensive DNA binding specificities have been determined experimentally for only a few proteins. Here, we used microarrays containing all 10-base pair sequences to examine the binding specificities of 104 distinct mouse DNA binding proteins representing 22 structural classes. Our results reveal a complex landscape of binding, with virtually every protein analyzed possessing unique preferences. Roughly half of the proteins each recognized multiple distinctly different sequence motifs, challenging our molecular understanding of how proteins interact with their DNA binding sites. This complexity in DNA recognition may be important in gene regulation and in the evolution of transcriptional regulatory networks.


Subjects
DNA/metabolism, Transcription Factors/chemistry, Transcription Factors/metabolism, Amino Acid Motifs, Amino Acid Sequence, Animals, Base Sequence, Binding Sites, DNA/chemistry, Electrophoretic Mobility Shift Assay, Gene Expression Regulation, Gene Regulatory Networks, Humans, Mice, Protein Array Analysis, Protein Binding, Protein Structure, Tertiary, Recombinant Fusion Proteins/chemistry, Recombinant Fusion Proteins/metabolism
17.
J Biol ; 8(3): 33, 2009.
Article in English | MEDLINE | ID: mdl-19371447

ABSTRACT

BACKGROUND: Vertebrates share the same general body plan and organs, possess related sets of genes, and rely on similar physiological mechanisms, yet show great diversity in morphology, habitat and behavior. Alteration of gene regulation is thought to be a major mechanism in phenotypic variation and evolution, but relatively little is known about the broad patterns of conservation in gene expression in non-mammalian vertebrates. RESULTS: We measured expression of all known and predicted genes across twenty tissues in chicken, frog and pufferfish. By combining the results with human and mouse data and considering only ten common tissues, we have found evidence of conserved expression for more than a third of unique orthologous genes. We find that, on average, transcription factor gene expression is neither more nor less conserved than that of other genes. Strikingly, conservation of expression correlates poorly with the amount of conserved nonexonic sequence, even using a sequence alignment technique that accounts for non-collinearity in conserved elements. Many genes show conserved human/fish expression despite having almost no nonexonic conserved primary sequence. CONCLUSIONS: There are clearly strong evolutionary constraints on tissue-specific gene expression. A major challenge will be to understand the precise mechanisms by which many gene expression patterns remain similar despite extensive cis-regulatory restructuring.


Subjects
Gene Expression Regulation, Vertebrates, Animals, Anura, Base Sequence, Chickens, Conserved Sequence/genetics, DNA/analysis, DNA/genetics, Evolution, Molecular, Gene Expression Profiling, Humans, Mice, Sequence Alignment, Sequence Analysis, DNA, Tetraodontiformes, Transcription Factors/biosynthesis, Transcription Factors/genetics, Vertebrates/genetics, Vertebrates/metabolism
18.
Mol Cell ; 32(6): 878-87, 2008 Dec 26.
Article in English | MEDLINE | ID: mdl-19111667

ABSTRACT

The sequence specificity of DNA-binding proteins is the primary mechanism by which the cell recognizes genomic features. Here, we describe systematic determination of yeast transcription factor DNA-binding specificities. We obtained binding specificities for 112 DNA-binding proteins representing 19 distinct structural classes. One-third of the binding specificities have not been previously reported. Several binding sequences have striking genomic distributions relative to transcription start sites, supporting their biological relevance and suggesting a role in promoter architecture. Among these are Rsc3 binding sequences, containing the core CGCG, which are found preferentially approximately 100 bp upstream of transcription start sites. Mutation of RSC3 results in a dramatic increase in nucleosome occupancy in hundreds of proximal promoters containing a Rsc3 binding element, but has little impact on promoters lacking Rsc3 binding sequences, indicating that Rsc3 plays a broad role in targeting nucleosome exclusion at yeast promoters.
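The Rsc3 observation rests on locating binding sequences relative to transcription start sites. A minimal Python sketch that reports offsets of the core CGCG relative to a TSS (negative values are upstream); the promoter string and `tss_index` are hypothetical inputs, with the sequence written 5' to 3' in the gene's orientation:

```python
def motif_offsets(promoter: str, tss_index: int, motif: str = "CGCG") -> list[int]:
    """Offsets of motif starts relative to the TSS index (negative = upstream).

    CGCG is its own reverse complement, so scanning one strand finds both
    orientations. Overlapping matches are all reported.
    """
    return [i - tss_index
            for i in range(len(promoter) - len(motif) + 1)
            if promoter[i:i + len(motif)] == motif]


# Hypothetical promoter: one CGCG starting 100 bp before a TSS at index 200
example = "A" * 100 + "CGCG" + "A" * 100
print(motif_offsets(example, tss_index=200))  # [-100]
```

Collecting these offsets across many promoters and histogramming them is one way to see a positional preference such as the roughly 100 bp upstream enrichment described above.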


Subjects
DNA-Binding Proteins/metabolism, Nucleosomes/metabolism, Promoter Regions, Genetic, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/genetics, Transcription Factors/genetics, Base Sequence, Binding Sites, Genes, Fungal, Molecular Sequence Data, Mutation/genetics, Phylogeny, Reproducibility of Results, Sequence Homology, Amino Acid, Transcription Factors/metabolism
19.
Cell ; 133(7): 1266-76, 2008 Jun 27.
Article in English | MEDLINE | ID: mdl-18585359

ABSTRACT

Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.


Subjects
DNA/chemistry, Homeodomain Proteins/chemistry, Animals, Base Sequence, Computational Biology, Conserved Sequence, DNA/metabolism, Evolution, Molecular, Homeodomain Proteins/metabolism, Mice, Models, Molecular, Protein Binding, Transcription Factors/chemistry, Transcription Factors/metabolism
20.
Proc Natl Acad Sci U S A ; 103(32): 12045-50, 2006 Aug 08.
Article in English | MEDLINE | ID: mdl-16880382

ABSTRACT

Mapping transcriptional regulatory networks is difficult because many transcription factors (TFs) are activated only under specific conditions. We describe a generic strategy for identifying genes and pathways induced by individual TFs that does not require knowledge of their normal activation cues. Microarray analysis of 55 yeast TFs that caused a growth phenotype when overexpressed showed that the majority caused increased transcript levels of genes in specific physiological categories, suggesting a mechanism for growth inhibition. Induced genes typically included established targets and genes with consensus promoter motifs, if known, indicating that these data are useful for identifying potential new target genes and binding sites. We identified the sequence 5'-TCACGCAA as a binding sequence for Hms1p, a TF that positively regulates pseudohyphal growth and previously had no known motif. The general strategy outlined here presents a straightforward approach to discovery of TF activities and mapping targets that could be adapted to any organism with transgenic technology.
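Following up a candidate motif such as 5'-TCACGCAA starts with finding where it occurs; since a transcription factor can bind either strand, both the motif and its reverse complement (TTGCGTGA) need to be searched. A minimal exact-match Python sketch (real promoter scans usually score a position weight matrix instead; function names are hypothetical):

```python
# Base-pairing complement, tolerant of lowercase input
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str = "TCACGCAA") -> list[tuple[int, str]]:
    """Exact motif matches on either strand as (start on the given strand, strand)."""
    seq = seq.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        # A palindromic motif would match both tests; report it once.
        if window == rc and rc != motif:
            hits.append((i, "-"))
    return hits


print(find_sites("GGTCACGCAAGG"))  # [(2, '+')]
```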


Subjects
Gene Expression Regulation, Fungal, Genetics, Oligonucleotide Array Sequence Analysis/methods, Saccharomyces cerevisiae Proteins/chemistry, Transcription Factors/genetics, Amino Acid Motifs, Binding Sites, Genetic Techniques, Models, Genetic, Phenotype, Promoter Regions, Genetic, Protein Binding, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/metabolism, Transgenes