Results 1 - 11 of 11
1.
Nature ; 444(7118): 499-502, 2006 Nov 23.
Article in English | MEDLINE | ID: mdl-17086198

ABSTRACT

Identifying the sequences that direct the spatial and temporal expression of genes and defining their function in vivo remain a significant challenge in the annotation of vertebrate genomes. One major obstacle is the lack of experimentally validated training sets. In this study, we made use of extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements, and characterized the in vivo enhancer activity of a large group of non-coding elements in the human genome that are conserved between human and pufferfish, Takifugu (Fugu) rubripes, or ultraconserved among human, mouse and rat. We tested 167 of these extremely conserved sequences in a transgenic mouse enhancer assay. Here we report that 45% of these sequences functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While directing expression in a broad range of anatomical structures in the embryo, the majority of the 75 enhancers directed expression to various regions of the developing nervous system. We identified sequence signatures enriched in a subset of these elements that targeted forebrain expression, and used these features to rank all approximately 3,100 non-coding elements in the human genome that are conserved between human and Fugu. The testing of the top predictions in transgenic mice resulted in a threefold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of human gene enhancers that have been characterized in vivo, and illustrate the utility of such training sets for a variety of biological applications, including decoding the regulatory vocabulary of the human genome.


Subject(s)
Enhancer Elements, Genetic; Genome, Human; Animals; Base Sequence; Chromosomes, Human, Pair 16; Conserved Sequence; Embryo, Mammalian/metabolism; Embryo, Nonmammalian; Gene Expression; Genomics/methods; Humans; Mice; Mice, Transgenic; Nervous System/embryology; Nervous System/metabolism; Prosencephalon/embryology; Prosencephalon/metabolism; Takifugu/genetics; Transcription Factors/genetics
2.
Nature ; 432(7020): 988-94, 2004 Dec 23.
Article in English | MEDLINE | ID: mdl-15616553

ABSTRACT

Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,670 aligned transcripts, 19 transfer RNA genes, 341 pseudogenes and three RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukaemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. Whereas the segmental duplications of chromosome 16 are enriched in the relatively gene-poor pericentromere of the p arm, some are involved in recent gene duplication and conversion events that are likely to have had an impact on the evolution of primates and human disease susceptibility.


Subject(s)
Chromosomes, Human, Pair 16/genetics; Gene Duplication; Physical Chromosome Mapping; Animals; Genes/genetics; Genomics; Heterochromatin/genetics; Humans; Molecular Sequence Data; Polymorphism, Genetic/genetics; Sequence Analysis, DNA; Synteny/genetics
3.
Nature ; 431(7006): 268-74, 2004 Sep 16.
Article in English | MEDLINE | ID: mdl-15372022

ABSTRACT

Chromosome 5 is one of the largest human chromosomes and contains numerous intrachromosomal duplications, yet it has one of the lowest gene densities. This is partially explained by numerous gene-poor regions that display a remarkable degree of noncoding conservation with non-mammalian vertebrates, suggesting that they are functionally constrained. In total, we compiled 177.7 million base pairs of highly accurate finished sequence containing 923 manually curated protein-coding genes including the protocadherin and interleukin gene families. We also completely sequenced versions of the large chromosome-5-specific internal duplications. These duplications are very recent evolutionary events and probably have a mechanistic role in human physiological variation, as deletions in these regions are the cause of debilitating disorders including spinal muscular atrophy.


Subject(s)
Chromosomes, Human, Pair 5/genetics; Sequence Analysis, DNA; Animals; Base Composition; Cadherins/genetics; Conserved Sequence/genetics; Gene Duplication; Genes/genetics; Genetic Diseases, Inborn/genetics; Genomics; Humans; Interleukins/genetics; Molecular Sequence Data; Muscular Atrophy, Spinal/genetics; Pan troglodytes/genetics; Physical Chromosome Mapping; Pseudogenes/genetics; Synteny/genetics; Vertebrates/genetics
4.
Nature ; 428(6982): 529-35, 2004 Apr 01.
Article in English | MEDLINE | ID: mdl-15057824

ABSTRACT

Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high G + C content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in mendelian disorders, including familial hypercholesterolaemia and insulin-resistant diabetes. Nearly one-quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.


Subject(s)
Chromosomes, Human, Pair 19/genetics; Genes/genetics; Physical Chromosome Mapping; Alternative Splicing/genetics; Animals; Base Composition; Conserved Sequence/genetics; CpG Islands/genetics; Evolution, Molecular; Gene Duplication; Genetics, Medical; Humans; Mice; Molecular Sequence Data; Multigene Family/genetics; Pseudogenes/genetics; Sequence Analysis, DNA
5.
Genome Res ; 16(7): 855-63, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16769978

ABSTRACT

Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons. To address this problem, we identified evolutionarily conserved noncoding regions in primate, mammalian, and more distant comparisons using a uniform approach (Gumby) that facilitates unbiased assessment of the impact of evolutionary distance on predictive power. We benchmarked computational predictions against previously identified cis-regulatory elements at diverse genomic loci and also tested numerous extremely conserved human-rodent sequences for transcriptional enhancer activity using an in vivo enhancer assay in transgenic mice. Human regulatory elements were identified with acceptable sensitivity (53%-80%) and true-positive rate (27%-67%) by comparison with one to five other eutherian mammals or six other simian primates. More distant comparisons (marsupial, avian, amphibian, and fish) failed to identify many of the empirically defined functional noncoding elements. Our results highlight the practical utility of close sequence comparisons, and the loss of sensitivity entailed by more distant comparisons. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole-genome comparative analysis that explains most of the observations from empirical benchmarking. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for in vivo testing at embryonic time points.


Subject(s)
Enhancer Elements, Genetic; Genome, Human; Regulatory Sequences, Nucleic Acid; Animals; Base Sequence; Chromosomes, Human, Pair 16; Computational Biology; Conserved Sequence; DNA/genetics; Evolution, Molecular; Eye Proteins/chemistry; Eye Proteins/genetics; Humans; Mice; Mice, Transgenic; Predictive Value of Tests; Protein Structure, Tertiary; Rats; Sensitivity and Specificity; Sequence Analysis, DNA; Transcription Factors/chemistry; Transcription Factors/genetics
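
The benchmarking described in the abstract above (entry 5) scores computational predictions against previously identified cis-regulatory elements. The following is a minimal sketch of that kind of calculation, not the authors' Gumby pipeline. It assumes that elements can be compared by exact identifier, and that the second figure of merit is the fraction of predictions that coincide with a known element; both are simplifying assumptions, and the function name and element IDs are hypothetical.

```python
def benchmark(predicted, known):
    """Return (sensitivity, fraction_of_predictions_confirmed).

    predicted: iterable of element IDs called conserved by a method (hypothetical IDs).
    known:     iterable of element IDs with experimentally defined function.
    """
    predicted, known = set(predicted), set(known)
    hits = predicted & known  # predictions that coincide with a known element
    # Sensitivity: share of known functional elements that were recovered.
    sensitivity = len(hits) / len(known) if known else 0.0
    # Share of predictions that are confirmed (assumed meaning of the second metric).
    confirmed = len(hits) / len(predicted) if predicted else 0.0
    return sensitivity, confirmed


if __name__ == "__main__":
    known = {"e1", "e2", "e3", "e4", "e5"}
    predicted = {"e1", "e2", "e3", "x1", "x2", "x3"}
    sens, conf = benchmark(predicted, known)
    print(f"sensitivity={sens:.2f} confirmed={conf:.2f}")
```

In a real comparison, elements are genomic intervals, so the set intersection would be replaced by an overlap test between coordinates.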
6.
Hum Mol Genet ; 14(20): 3057-63, 2005 Oct 15.
Article in English | MEDLINE | ID: mdl-16155111

ABSTRACT

Our inability to associate distant regulatory elements with the genes they regulate has largely precluded their examination for sequence alterations contributing to human disease. One major obstacle is the large genomic space surrounding targeted genes in which such elements could potentially reside. In order to delineate gene regulatory boundaries, we used whole-genome human-mouse-chicken (HMC) and human-mouse-frog (HMF) multiple alignments to compile conserved blocks of synteny (CBSs), under the hypothesis that these blocks have been kept intact throughout evolution at least in part by the requirement of regulatory elements to stay linked to the genes they regulate. A total of 2116 and 1942 CBSs >200 kb were assembled for HMC and HMF, respectively, encompassing 1.53 and 0.86 Gb of human sequence. To support the existence of complex long-range regulatory domains within these CBSs, we analyzed the prevalence and distribution of chromosomal aberrations leading to position effects (disruption of a gene's regulatory environment), observing a clear bias not only for mapping onto CBS but also for longer CBS size. Our results provide an extensive data set characterizing the regulatory domains of genes and the conserved regulatory elements within them.


Subject(s)
Conserved Sequence/genetics; Genome, Human; Physical Chromosome Mapping/methods; Regulatory Sequences, Nucleic Acid/genetics; Synteny/genetics; Animals; Chickens/genetics; Chromosomes, Human/genetics; Evolution, Molecular; Humans; Mice; Ranidae/genetics; Sequence Deletion
7.
Genome Res ; 15(1): 1-18, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15632085

ABSTRACT

We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25-55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement and high coadaptation of both male genes and cis-regulatory sequences emerge as important themes of genome divergence between these species of Drosophila.


Subject(s)
Chromosomes/genetics; Drosophila/genetics; Evolution, Molecular; Genes, Insect/genetics; Genome; Sequence Analysis, DNA/methods; Animals; Chromosome Breakage/genetics; Chromosome Inversion/genetics; Chromosome Mapping/methods; Conserved Sequence/genetics; Drosophila melanogaster/genetics; Enhancer Elements, Genetic; Gene Rearrangement/genetics; Genetic Variation/genetics; Molecular Sequence Data; Predictive Value of Tests; Repetitive Sequences, Nucleic Acid/genetics
8.
Genome Res ; 13(1): 73-80, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12529308

ABSTRACT

The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for assemblies of different quality. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. We obtained such coverage while preserving specificity. With a view towards the end user, we developed a suite of tools and Web sites for automatically aligning and subsequently browsing and working with whole-genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.


Subject(s)
Genome, Human; Genome; Research Design; Sequence Alignment/instrumentation; Sequence Alignment/methods; Algorithms; Animals; Chromosomes/genetics; Chromosomes, Human/genetics; Computer Communication Networks/instrumentation; Databases, Genetic; Humans; Internet/instrumentation; Mice; Software
9.
Bioinformatics ; 19 Suppl 1: i54-62, 2003.
Article in English | MEDLINE | ID: mdl-12855437

ABSTRACT

MOTIVATION: To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, where all locations of similarity between the two strings are returned. Global alignments are less prone to demonstrating false homology as each letter of one sequence is constrained to being aligned to only one letter of the other. Local alignments, on the other hand, can cope with rearrangements between non-syntenic, orthologous sequences by identifying similar regions in sequences; this, however, comes at the expense of a higher false positive rate due to the inability of local aligners to take into account overall conservation maps.

RESULTS: In this paper we introduce the notion of glocal alignment, a combination of global and local methods, where one creates a map that transforms one sequence into the other while allowing for rearrangement events. We present Shuffle-LAGAN, a glocal alignment algorithm that is based on the CHAOS local alignment algorithm and the LAGAN global aligner, and is able to align long genomic sequences. To test Shuffle-LAGAN we split the mouse genome into BAC-sized pieces, and aligned these pieces to the human genome. We demonstrate that Shuffle-LAGAN compares favorably in terms of sensitivity and specificity with standard local and global aligners. From the alignments we conclude that about 9% of human/mouse homology may be attributed to small rearrangements, 63% of which are duplications.


Subject(s)
Chromosome Mapping/methods; DNA/analysis; DNA/chemistry; Gene Expression Profiling/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Algorithms; Animals; Base Sequence; DNA/genetics; Genome, Human; Humans; Mice; Molecular Sequence Data; Sequence Homology, Nucleic Acid
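
The contrast between global and local alignment drawn in the abstract above (entry 9) can be illustrated with textbook dynamic programming. This sketch is not Shuffle-LAGAN, CHAOS, or LAGAN; it uses the standard Needleman-Wunsch and Smith-Waterman recurrences with an assumed scoring scheme (match +1, mismatch -1, linear gap -1), and the function names are illustrative only.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score: every letter of both sequences is aligned or gapped."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b prefix against gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a prefix against gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: best-scoring pair of substrings; cells never drop below 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # Two toy sequences whose halves are swapped, mimicking a rearrangement.
    x, y = "AAAACCCC", "CCCCAAAA"
    print("global:", global_score(x, y))
    print("local: ", local_score(x, y))
```

On the swapped-halves example, the local score rewards one conserved block in full, while the global score is penalized by the gaps needed to force both sequences into a single end-to-end alignment. A glocal method, as described in the abstract, aims to chain such local blocks into one map while allowing for the rearrangement.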
10.
Bioinformatics ; 20(5): 636-43, 2004 Mar 22.
Article in English | MEDLINE | ID: mdl-15033870

ABSTRACT

MOTIVATION: The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient, these visualization algorithms must consistently accommodate a wide range of evolutionary distances in a comparison framework based upon phylogenetic relationships.

RESULTS: We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing a similarity measure for multiple DNA sequences. The complexity of visual presentation is effectively organized using a framework based upon interspecies phylogenetic relationships. The phylogenetic organization supports rapid, user-guided interspecies comparison. To aid in navigation through large sequence datasets, Phylo-VISTA leverages concepts from VISTA that provide a user with the ability to select and view data at varying resolutions. Multiresolution data visualization and analysis, combined with the phylogenetic framework for interspecies comparison, produce a highly flexible and powerful tool for visual data analysis of multiple sequence alignments.

AVAILABILITY: Phylo-VISTA is available at http://www-gsd.lbl.gov/phylovista. It requires an Internet browser with Java Plug-in 1.4.2 and it is integrated into the global alignment program LAGAN at http://lagan.stanford.edu.


Subject(s)
Algorithms; Computer Graphics; Gene Expression Profiling/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; User-Computer Interface; Animals; Base Sequence; Humans; Leukemia/genetics; Molecular Sequence Data; Phylogeny; Sequence Homology, Nucleic Acid
11.
Nature ; 420(6915): 520-62, 2002 Dec 05.
Article in English | MEDLINE | ID: mdl-12466850

ABSTRACT

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.


Subject(s)
Chromosomes, Mammalian/genetics; Evolution, Molecular; Genome; Mice/genetics; Physical Chromosome Mapping; Animals; Base Composition; Conserved Sequence/genetics; CpG Islands/genetics; Gene Expression Regulation; Genes/genetics; Genetic Variation/genetics; Genome, Human; Genomics; Humans; Mice/classification; Mice, Knockout; Mice, Transgenic; Models, Animal; Multigene Family/genetics; Mutagenesis; Neoplasms/genetics; Proteome/genetics; Pseudogenes/genetics; Quantitative Trait Loci/genetics; RNA, Untranslated/genetics; Repetitive Sequences, Nucleic Acid/genetics; Selection, Genetic; Sequence Analysis, DNA; Sex Chromosomes/genetics; Species Specificity; Synteny