ABSTRACT
The ENCODE project is an international consortium with a goal of cataloguing all the functional elements in the human genome. The ENCODE Data Coordination Center (DCC) at the University of California, Santa Cruz serves as the central repository for ENCODE data. In this role, the DCC offers a collection of high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion and others. This data helps illuminate transcription factor-binding sites, histone marks, chromatin accessibility, DNA methylation, RNA expression, RNA binding and other cell-state indicators. It includes sequences with quality scores, alignments, signals calculated from the alignments, and in most cases, element or peak calls calculated from the signal data. Each data set is available for visualization and download via the UCSC Genome Browser (http://genome.ucsc.edu/). ENCODE data can also be retrieved using a metadata system that captures the experimental parameters of each assay. The ENCODE web portal at UCSC (http://encodeproject.org/) provides information about the ENCODE data and links for access.
Subject(s)
Genetic Databases, Human Genome, Gene Expression Regulation, Genomics, Humans, Internet, Software, User-Computer Interface
ABSTRACT
We formalize the problem of recovering the evolutionary history of a set of genomes that are related to an unseen common ancestor genome by operations of speciation, deletion, insertion, duplication, and rearrangement of segments of bases. The problem is examined in the limit as the number of bases in each genome goes to infinity. In this limit, the chromosomes are represented by continuous circles or line segments. For such an infinite-sites model, we present a polynomial-time algorithm to find the most parsimonious evolutionary history of any set of related present-day genomes.
Subject(s)
Molecular Evolution, Genome, Genetic Models, Algorithms, Animals, Computer Simulation, Humans, Mice, Mutation/genetics, X Chromosome
ABSTRACT
Accurately reconstructing the large-scale gene order in an ancestral genome is a critical step to better understand genome evolution. In this paper, we propose a heuristic algorithm, called DUPCAR, for reconstructing ancestral genomic orders with duplications. The method starts from the order of genes in modern genomes and predicts predecessor and successor relationships in the ancestor. Then a greedy algorithm is used to reconstruct the ancestral orders by connecting genes into contiguous regions based on predicted adjacencies. Computer simulation was used to validate the algorithm. We also applied the method to reconstruct the ancestral chromosome X of placental mammals and the ancestral genomes of the ciliate Paramecium tetraurelia.
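The greedy step described above (connecting genes into contiguous regions from predicted adjacencies) can be illustrated with a minimal Python sketch. It is not DUPCAR's code: the function name `greedy_cars`, the gene-end encoding, and the numeric scores are hypothetical, and duplications, which DUPCAR is designed to handle, are ignored here. Adjacencies are accepted in order of decreasing score, provided each gene end is used at most once and no circle is closed.

```python
from typing import Dict, List, Tuple

# A gene extremity: (gene name, "t" for tail or "h" for head).
End = Tuple[str, str]


def greedy_cars(genes: List[str],
                adjacencies: List[Tuple[End, End, float]]) -> List[List[str]]:
    """Greedily join genes into contiguous ancestral regions (CARs).

    Hypothetical sketch: each adjacency is (end_a, end_b, score) and higher
    scores are accepted first. Duplicated genes are not modelled.
    """
    parent = {g: g for g in genes}  # union-find over genes, to detect cycles

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    partner: Dict[End, End] = {}  # the accepted neighbour of each gene end
    for a, b, _score in sorted(adjacencies, key=lambda x: -x[2]):
        if a in partner or b in partner:
            continue  # each gene end may join at most one neighbour
        ra, rb = find(a[0]), find(b[0])
        if ra == rb:
            continue  # would close a circle (or join a gene to itself)
        parent[ra] = rb
        partner[a], partner[b] = b, a

    cars: List[List[str]] = []
    seen = set()
    for g in genes:
        free = [e for e in ("t", "h") if (g, e) not in partner]
        if g in seen or not free:
            continue  # already placed, or an interior gene reached from a chain end
        gene, enter = g, free[0]
        car: List[str] = []
        while True:
            seen.add(gene)
            # Entering through the tail means the gene is read forward (+).
            car.append(("+" if enter == "t" else "-") + gene)
            exit_end = (gene, "h" if enter == "t" else "t")
            if exit_end not in partner:
                break  # reached the other end of this region
            gene, enter = partner[exit_end]
        cars.append(car)
    return cars


if __name__ == "__main__":
    print(greedy_cars(
        ["A", "B", "C", "D"],
        [(("A", "h"), ("B", "t"), 0.9),
         (("B", "h"), ("C", "t"), 0.8),
         (("C", "h"), ("A", "t"), 0.5),   # rejected: would close a circle
         (("A", "h"), ("D", "t"), 0.4)],  # rejected: A's head is already used
    ))
```

Run as a script, the example prints `[['+A', '+B', '+C'], ['+D']]`: the two highest-scoring adjacencies chain A, B and C, while the weaker, conflicting ones are discarded, leaving D as its own region. Processing adjacencies in score order is what makes the procedure greedy; an early mistake is never revisited.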
Subject(s)
Algorithms, Gene Duplication, Genome, Genetic Models, Animals, Computer Simulation, Molecular Evolution, Humans, Paramecium tetraurelia/genetics, Phylogeny
ABSTRACT
This article analyzes mammalian genome rearrangements at higher resolution than has been published to date. We identify 3171 intervals, covering approximately 92% of the human genome, within which we find no rearrangements larger than 50 kilobases (kb) in the lineages leading to human, mouse, rat, and dog from their most recent common ancestor. Combining intervals that are adjacent in all contemporary species produces 1338 segments that may contain large insertions or deletions but that are free of chromosome fissions or fusions as well as inversions or translocations >50 kb in length. We describe a new method for predicting the ancestral order and orientation of those intervals from their observed adjacencies in modern species. We combine the results from this method with data from chromosome painting experiments to produce a map of an early mammalian genome that accounts for 96.8% of the available human genome sequence data. The precision is further increased by mapping inversions as small as 31 bp. Analysis of the predicted evolutionary breakpoints in the human lineage confirms certain published observations but disagrees with others. Although only a few mammalian genomes are currently sequenced to high precision, our theoretical analyses and computer simulations indicate that our results are reasonably accurate and that they will become highly accurate in the foreseeable future. Our methods were developed as part of a project to reconstruct the genome sequence of the last common ancestor of humans, dogs, and most other placental mammals.
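The step of combining intervals that are adjacent in all contemporary species can be sketched as an intersection of per-species adjacency sets followed by a union-find merge. This is an illustrative sketch only, assuming linear chromosomes and that each interval occurs exactly once per genome; the interval IDs, species layouts, and function names are hypothetical, and the article's additional criteria (such as the 50 kb rearrangement threshold) are not modelled.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

End = Tuple[str, str]  # (interval id, "t" = left end, "h" = right end)


def adjacencies(chromosomes: List[List[str]]) -> Set[FrozenSet[End]]:
    """Adjacencies between interval ends along linear chromosomes.

    Intervals are signed strings such as "+a" or "-b"; "-" means the interval
    is read in reverse orientation in this species.
    """
    adj: Set[FrozenSet[End]] = set()
    for chrom in chromosomes:
        for left, right in zip(chrom, chrom[1:]):
            # Trailing end of `left` (as read) touches leading end of `right`.
            a = (left[1:], "h" if left[0] == "+" else "t")
            b = (right[1:], "t" if right[0] == "+" else "h")
            adj.add(frozenset((a, b)))
    return adj


def conserved_segments(genomes: Dict[str, List[List[str]]]) -> List[List[str]]:
    """Merge intervals that are adjacent in every species into segments."""
    shared = set.intersection(*(adjacencies(ch) for ch in genomes.values()))
    intervals = {iv[1:] for chroms in genomes.values() for c in chroms for iv in c}

    parent = {iv: iv for iv in intervals}  # union-find over interval ids

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for adj in shared:
        (a, _), (b, _) = tuple(adj)
        parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for iv in intervals:
        groups.setdefault(find(iv), []).append(iv)
    return sorted(sorted(members) for members in groups.values())
```

Because an adjacency is stored as an unordered pair of interval ends, reading a chromosome in the opposite direction (for example `["-b", "-a"]` instead of `["+a", "+b"]`) yields the same adjacency, so the intersection is insensitive to strand conventions. With the hypothetical layouts human `[["+a","+b","+c"],["+d"]]`, mouse `[["-b","-a"],["+c","+d"]]` and dog `[["+a","+b","-d"],["+c"]]`, only the a/b junction is shared, giving the segments `[['a', 'b'], ['c'], ['d']]`.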
Subject(s)
Molecular Evolution, Human Genome, Genome, Algorithms, Animals, Base Composition, Base Pairing, Chromosome Breakage, Chromosome Inversion, Chromosome Mapping, Chromosome Painting, Chromosomes, Computer Simulation, Dogs, Gene Deletion, Gene Rearrangement, Humans, Mice, Genetic Models, Rats, Sequence Alignment/methods, Nucleic Acid Sequence Homology
ABSTRACT
Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its approximately 20-megabase genome, which contains approximately 6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.
Subject(s)
Cryptococcus neoformans/genetics, Fungal Genome, Alternative Splicing, Cell Wall/metabolism, Fungal Chromosomes/genetics, Computational Biology, Cryptococcus neoformans/pathogenicity, Cryptococcus neoformans/physiology, DNA Transposable Elements, Fungal Proteins/metabolism, Gene Library, Fungal Genes, Humans, Introns, Molecular Sequence Data, Phenotype, Genetic Polymorphism, Single Nucleotide Polymorphism, Polysaccharides/metabolism, Antisense RNA, DNA Sequence Analysis, Genetic Transcription, Virulence, Virulence Factors/metabolism
ABSTRACT
The study of genetic variation in malaria parasites has practical significance for developing strategies to control the disease. Vaccines based on highly polymorphic antigens may be confounded by allelic restriction of the host immune response. In response to drug pressure, a highly plastic genome may generate resistant mutants more easily than a monomorphic one. Additionally, the study of the distribution of genomic polymorphisms may provide information leading to the identification of genes associated with traits such as parasite development and drug resistance. Indeed, the age and diversity of the human malaria parasite Plasmodium falciparum have been the subject of recent debate, because an ancient parasite with a complex genome is expected to present greater challenges for drug and vaccine development. The genome diversity of the important human pathogen Plasmodium vivax, however, remains essentially unknown. Here we analyze an approximately 100-kb contiguous chromosome segment from five isolates, revealing 191 single-nucleotide polymorphisms (SNPs) and 44 size polymorphisms. The SNPs are not evenly distributed across the segment, which contains blocks of high and low diversity. Most of the SNPs (approximately 63%) lie in intergenic regions, and introns contain significantly fewer SNPs than intergenic sequences. Polymorphic tandem repeats are abundant and more uniformly distributed, at a frequency of about one polymorphic tandem repeat per 3 kb. These data show that P. vivax has a highly diverse genome and provide useful information for further understanding the genome diversity of the parasite.
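One simple way to make "blocks of high and low diversity" concrete is to count SNPs in fixed, non-overlapping windows along the segment and compare each window with the mean. For scale, 191 SNPs over roughly 100 kb averages about 1.9 SNPs per kb. The sketch below is illustrative only: the window size, the 1.5x threshold in `label_blocks`, and the example positions are hypothetical, not the study's data or its statistical method.

```python
from typing import List


def snp_counts_per_window(positions: List[int], length: int, window: int) -> List[int]:
    """Count SNPs in consecutive, non-overlapping windows along a segment."""
    if length <= 0 or window <= 0:
        raise ValueError("length and window must be positive")
    counts = [0] * ((length + window - 1) // window)  # ceil(length / window)
    for pos in positions:  # 0-based SNP coordinates within the segment
        if not 0 <= pos < length:
            raise ValueError(f"SNP position {pos} lies outside the segment")
        counts[pos // window] += 1
    return counts


def label_blocks(counts: List[int], factor: float = 1.5) -> List[str]:
    """Label each window 'high', 'low' or 'mid' relative to the mean count.

    Hypothetical rule: 'high' if at least factor x mean, 'low' if at most
    mean / factor, otherwise 'mid'.
    """
    if not counts:
        return []
    mean = sum(counts) / len(counts)
    if mean == 0:
        return ["low"] * len(counts)  # no SNPs anywhere in the segment
    labels = []
    for c in counts:
        if c >= factor * mean:
            labels.append("high")
        elif c <= mean / factor:
            labels.append("low")
        else:
            labels.append("mid")
    return labels
```

For example, with hypothetical SNPs at positions 100, 150, 900, 2100, 2200, 2300, 2400 and 2500 in a 3,000-bp segment and 1,000-bp windows, the counts are `[3, 0, 5]` and the labels are `['mid', 'low', 'high']`. A real analysis would test whether such variation exceeds what a uniform distribution of SNPs would produce, rather than rely on a fixed ratio.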
Subject(s)
Protozoan Genes, Protozoan Genome, Plasmodium vivax/genetics, Single Nucleotide Polymorphism, Animals, Chromosome Mapping, Protozoan DNA/genetics, Genetic Variation, Haplotypes/genetics, Introns/genetics, Molecular Sequence Data, Plasmodium falciparum/genetics, Polymerase Chain Reaction, Protozoan Proteins/genetics, Sequence Alignment, DNA Sequence Analysis, Species Specificity, Tandem Repeat Sequences
ABSTRACT
Species of malaria parasite that infect rodents have long been used as models for malaria disease research. Here we report the whole-genome shotgun sequence of one species, Plasmodium yoelii yoelii, and comparative studies with the genome of the human malaria parasite Plasmodium falciparum clone 3D7. A synteny map of 2,212 P. y. yoelii contiguous DNA sequences (contigs) aligned to 14 P. falciparum chromosomes reveals marked conservation of gene synteny within the body of each chromosome. Of about 5,300 P. falciparum genes, more than 3,300 P. y. yoelii orthologues of predominantly metabolic function were identified. Over 800 copies of a variant antigen gene located in subtelomeric regions were found. This is the first genome sequence of a model eukaryotic parasite, and it provides insight into the use of such systems in the modelling of Plasmodium biology and disease.