1.
Nature ; 515(7527): 371-375, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409826

ABSTRACT

To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.


Subject(s)
Conserved Sequence/genetics , Genome/genetics , Genomics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Animals , Cell Line , Chromatin/genetics , Chromatin/metabolism , Enhancer Elements, Genetic/genetics , Humans , Mice , Polymorphism, Single Nucleotide/genetics
2.
Nature ; 512(7515): 453-6, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164757

ABSTRACT

Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Evolution, Molecular , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Transcription Factors/metabolism , Animals , Binding Sites , Caenorhabditis elegans/growth & development , Chromatin Immunoprecipitation , Conserved Sequence/genetics , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Genome/genetics , Humans , Molecular Sequence Annotation , Nucleotide Motifs/genetics , Organ Specificity/genetics , Transcription Factors/genetics
3.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
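A feed-forward loop, one of the network motifs named in this abstract, is a three-node pattern in which a regulator X controls a target Z both directly and through an intermediate regulator Y (X→Y, Y→Z, X→Z). The minimal Python sketch below counts such loops in a directed network. The edge-list input and the function name `count_feed_forward_loops` are hypothetical. The sketch only counts occurrences and omits the comparison against randomized networks that an enrichment analysis would typically need.

```python
from collections import defaultdict


def count_feed_forward_loops(edges):
    """Count feed-forward loops X->Y, Y->Z, X->Z in a directed regulatory network.

    `edges` is an iterable of (regulator, target) pairs; self-loops are ignored.
    """
    targets = defaultdict(set)
    for regulator, target in edges:
        if regulator != target:
            targets[regulator].add(target)

    count = 0
    for x in list(targets):
        x_targets = targets[x]
        for y in x_targets:
            # Y acts as the intermediate regulator; .get avoids creating new keys.
            for z in targets.get(y, ()):
                # Z must differ from X and also be a direct target of X.
                if z != x and z in x_targets:
                    count += 1
    return count


# Usage: A regulates B and C directly, and B also regulates C -> one loop (A, B, C).
network = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(count_feed_forward_loops(network))  # 1
```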


Subject(s)
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Selection, Genetic/genetics , Transcription Initiation Site
4.
Genome Res ; 22(9): 1813-31, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955991

ABSTRACT

Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.


Subject(s)
Chromatin Immunoprecipitation/methods , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Animals , Genome/genetics , Genomics/methods , Guidelines as Topic , Histones/metabolism , Humans , Internet , Transcription Factors/metabolism
5.
Nucleic Acids Res ; 38(20): 6997-7007, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20615899

ABSTRACT

Duplicated pseudogenes in the human genome are disabled copies of functioning parent genes. They result from block duplication events occurring throughout evolutionary history. Relatively recent duplications (with sequence similarity ≥90% and length ≥1 kb) are termed segmental duplications (SDs); here, we analyze the interrelationship of SDs and pseudogenes. We present a decision-tree approach to classify pseudogenes based on their (and their parents') characteristics in relation to SDs. The classification identifies 140 novel pseudogenes and makes possible improved annotation for the 3172 pseudogenes located in SDs. In particular, it reveals that many pseudogenes in SDs likely did not arise directly from parent genes, but are the result of a multi-step process. In these cases, the initial duplication or retrotransposition of a parent gene gives rise to a 'parent pseudogene', followed by further duplication creating duplicated-duplicated or duplicated-processed pseudogenes, respectively. Moreover, we can precisely identify these parent pseudogenes by overlap with ancestral SD loci. Finally, a comparison of nucleotide substitutions per site in a pseudogene with its surrounding SD region allows us to estimate the time difference between duplication and disablement events, and this suggests that most duplicated pseudogenes in SDs were likely disabled around the time of the original duplication.


Subject(s)
Genome, Human , Pseudogenes , Segmental Duplications, Genomic , Evolution, Molecular , Gene Duplication , Genetic Loci , Humans
6.
Nucleic Acids Res ; 37(Database issue): D738-43, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18957444

ABSTRACT

Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families from the Pfam database. It provides resources for analyzing the family structure of pseudogenes including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125,000 pseudogenes identified from 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of the total families in PfamA). Pseudofam uses a large-scale parallelized homology search algorithm (implemented as an extension of the PseudoPipe pipeline) to identify pseudogenes. Each identified pseudogene is assigned to its parent protein family and subsequently aligned to each other by transferring the parent domain alignments from the Pfam family. Pseudogenes are also given additional annotation based on an ontology, reflecting their mode of creation and subsequent history. In particular, our annotation highlights the association of pseudogene families with genomic features, such as segmental duplications. In addition, pseudogene families are associated with key statistics, which identify outlier families with an unusual degree of pseudogenization. The statistics also show how the number of genes and pseudogenes in families correlates across different species. Overall, they highlight the fact that housekeeping families tend to be enriched with a large number of pseudogenes.


Subject(s)
Databases, Genetic , Pseudogenes , Animals , Data Interpretation, Statistical , Genomics , Humans , Internet , Proteins/classification , Proteins/genetics , Sequence Alignment
7.
Mol Biol Evol ; 25(1): 131-43, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18065488

ABSTRACT

Transcription factor pseudogenes have not been systematically studied before. Nuclear receptors (NRs) constitute one of the largest groups of transcription factors in animals (e.g., 48 NRs in human). The availability of whole-genome sequences enables a global inventory of the NR pseudogenes in a number of vertebrate model organisms. Here we identify the NR pseudogenes in 8 vertebrate organisms and make our results available online at http://www.pseudogene.org/nr. The assignments reveal that NR pseudogenes as a group have characteristics related to generation and distribution contrary to expectations derived from previous large-scale pseudogene studies. In particular, 1) despite its large size, the NR gene family has only a very small number of pseudogenes in each of the vertebrate genomes examined; 2) despite the low transcription levels of NR genes, except for one, all other NR pseudogenes identified in this study are retropseudogenes; and 3) no duplicated NR pseudogenes are found, contrary to the fact that the NR gene family was expanded through several waves of gene duplication events. Our analyses further reveal a number of interesting aspects of NR pseudogenes. Specifically, through careful sequence analysis, we identify remnant introns in 2 mouse retropseudogenes, ψRev-erbβ and ψLRH1. Generated from partially processed pre-mRNAs, they appear to be rare examples of highly unusual "semiprocessed" pseudogenes. Second, by comparing the genomic sequences, we uncover a pseudogene that is unique to the human lineage relative to chimpanzee. Generated by a recent duplication of a segment in the human genome, this pseudogene is a "duplicated-processed" pseudogene, belonging to a new pseudogene species. Finally, FXRβ was nonfunctionalized in the human lineage and thus appears to be an example of a rare unitary pseudogene. By comparing orthologous sequences, we dated the FXR-FXRβ duplication and the nonfunctionalization of FXRβ in primates.


Subject(s)
Evolution, Molecular , Gene Duplication , Multigene Family/genetics , Pseudogenes/genetics , Receptors, Cytoplasmic and Nuclear/genetics , Vertebrates/genetics , Animals , Genome/physiology , Humans , Mice
8.
Nucleic Acids Res ; 35(Database issue): D55-60, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17099229

ABSTRACT

The Pseudogene.org knowledgebase serves as a comprehensive repository for pseudogene annotation. The definition of a pseudogene varies within the literature, resulting in significantly different approaches to the problem of identification. Consequently, it is difficult to maintain a consistent collection of pseudogenes in the detail necessary for their effective use. Our database is designed to address this issue. It integrates a variety of heterogeneous resources and supports a subset structure that highlights specific groups of pseudogenes that are of interest to the research community. Tools are provided for the comparison of sets and the creation of layered set unions, enabling researchers to derive a current 'consensus' set of pseudogenes. Additional features include versatile search, the capacity for robust interaction with other databases, the ability to reconstruct older versions of the database (accounting for changing genome builds) and an underlying object-oriented interface designed for researchers with a minimal knowledge of programming. At the present time, the database contains more than 100,000 pseudogenes spanning 64 prokaryote and 11 eukaryote genomes, including a collection of human annotations compiled from 16 sources.


Subject(s)
Databases, Genetic , Pseudogenes , Humans , Internet , Software , User-Computer Interface
10.
Nat Biotechnol ; 28(1): 47-55, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20037582

ABSTRACT

Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.
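A toy example can illustrate the idea of scanning short reads against a library of breakpoint junctions. The Python sketch below assumes a hypothetical library that maps each breakpoint ID to a junction sequence and the index of the breakpoint within that sequence. A read counts as supporting a breakpoint when it contains, on either strand, an exact stretch of `min_flank` bases on each side of the junction. Exact substring matching is a simplification chosen for illustration and is not the read-alignment procedure used by BreakSeq itself. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (non-ACGT characters kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def reads_supporting_breakpoints(reads, library, min_flank=10):
    """Map each breakpoint ID to the reads that span its junction.

    `library` maps an ID to (junction_sequence, breakpoint_index). A read supports
    a breakpoint if it contains the `min_flank` bases on each side of the junction,
    on either strand.
    """
    hits = {}
    for bp_id, (junction, bp_index) in library.items():
        if bp_index < min_flank or bp_index + min_flank > len(junction):
            raise ValueError(f"{bp_id}: not enough flanking sequence around the breakpoint")
        core = junction[bp_index - min_flank:bp_index + min_flank]
        core_rc = reverse_complement(core)  # matches reads from the opposite strand
        hits[bp_id] = [read for read in reads if core in read or core_rc in read]
    return hits


# Usage with a made-up junction whose breakpoint lies at index 14.
library = {"del1": ("GATTACAGATTACA" + "CCGGTTAACCGGTT", 14)}
reads = ["GGTACACCGGAA", "CCGGTGTA", "GATTACAGATTACA"]
print(reads_supporting_breakpoints(reads, library, min_flank=4))
# {'del1': ['GGTACACCGGAA', 'CCGGTGTA']}
```

The second read matches only through its reverse complement, which is why both strands are checked. The third read lies entirely on one side of the breakpoint and therefore does not support the junction.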


Subject(s)
Chromosome Breakpoints , Gene Library , Genetic Variation , Nucleotides/genetics , Sequence Analysis, DNA/methods , Animals , Bias , Chromosome Mapping , Genetic Loci/genetics , Humans , Phylogeny , Primates/genetics
11.
Genome Biol ; 10(2): R23, 2009 Feb 23.
Article in English | MEDLINE | ID: mdl-19236709

ABSTRACT

Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.
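Paired-end mapping detects structural variants from read pairs whose mates map at an unexpected distance on the reference. A minimal Python sketch of that core step follows. The `ReadPair` class, the single cutoff of mean ± k standard deviations, and the default k = 3 are illustrative assumptions. The sketch does not reproduce PEMer's coverage-adjusted multi-cutoff scoring, nor any clustering of pairs into variant calls.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class ReadPair:
    left_start: int  # leftmost reference coordinate of the first mate
    right_end: int   # rightmost reference coordinate of the second mate

    @property
    def span(self) -> int:
        return self.right_end - self.left_start


def classify_pairs(pairs, library_spans, k=3.0):
    """Label pairs by comparing their mapped span with the library's insert-size
    distribution, using mean +/- k * standard deviation as cutoffs."""
    mu, sd = mean(library_spans), stdev(library_spans)
    labels = []
    for pair in pairs:
        if pair.span > mu + k * sd:
            labels.append("deletion")    # mates map farther apart than expected
        elif pair.span < mu - k * sd:
            labels.append("insertion")   # mates map closer together than expected
        else:
            labels.append("concordant")
    return labels


# Usage with a toy library whose spans cluster around 500 bp.
library_spans = [490, 500, 510, 500, 500]
pairs = [ReadPair(1000, 1500), ReadPair(1000, 3500), ReadPair(1000, 1100)]
print(classify_pairs(pairs, library_spans))
# ['concordant', 'deletion', 'insertion']
```

A single fixed cutoff keeps the example short. Real pipelines have to balance sensitivity against false positives, and that trade-off is what a multi-cutoff, coverage-aware strategy addresses.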


Subject(s)
Computational Biology/methods , Genomic Structural Variation , Models, Genetic , Base Sequence , Computer Simulation , Genome , Genomics/methods , Internet , Software
12.
Genome Biol ; 10(1): R2, 2009.
Article in English | MEDLINE | ID: mdl-19123937

ABSTRACT

BACKGROUND: The availability of genome sequences of numerous organisms allows comparative study of pseudogenes in syntenic regions. Conservation of pseudogenes suggests that they might have a functional role in some instances. RESULTS: We report the first large-scale comparative analysis of ribosomal protein pseudogenes in four mammalian genomes (human, chimpanzee, mouse and rat). To this end, we have assigned these pseudogenes in the four organisms using an automated pipeline and make the results available online. Each organism has a large number of ribosomal protein pseudogenes (approximately 1,400 to 2,800). The majority of them are processed (generated by retrotransposition). However, we do not see a correlation between the number of pseudogenes associated with a ribosomal protein gene and its mRNA abundance. Analysis of pseudogenes in syntenic regions between species shows that most are conserved between human and chimpanzee, but very few are conserved between primates and rodents. Interestingly, syntenic pseudogenes have a lower rate of nucleotide substitution than their surrounding intergenic DNA. Moreover, evidence from expressed sequence tags indicates that two pseudogenes conserved between human and mouse are transcribed. Detailed analysis shows that one of them, the pseudogene of RPS27, is likely to be a protein-coding gene. This is significant as previous reports indicated there are exactly 80 ribosomal protein genes encoded by the human genome. CONCLUSIONS: Our analysis indicates that processed ribosomal protein pseudogenes abound in mammalian genomes, but few of these are conserved between primates and rodents. This highlights the large amount of recent retrotranspositional activity in mammals and a relatively larger amount of it in the rodent lineage.


Subject(s)
Genome/genetics , Pseudogenes , Ribosomal Proteins/genetics , Animals , Expressed Sequence Tags , Humans , Internet , Mammals/genetics , Mice , Pan troglodytes , Phylogeny , RNA, Messenger/analysis , Rats , Retroelements/genetics , Synteny
13.
Genome Biol ; 9(1): 401, 2008 Jan 31.
Article in English | MEDLINE | ID: mdl-18254929

ABSTRACT

We take stock of current genetic nomenclature and attempt to organize strange and notable gene names. We categorize, for instance, those that involve a naming system transferred from another context (for example, Pavlov's dogs). We hope this analysis provides clues to better steer gene naming in the future.


Subject(s)
Genes , Terminology as Topic