2.
J Comput Biol ; 18(3): 469-81, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21385048

ABSTRACT

We introduce a data structure, analysis, and visualization scheme called a cactus graph for comparing sets of related genomes. In common with multi-break point graphs and A-Bruijn graphs, cactus graphs can represent duplications and general genomic rearrangements, but additionally, they naturally decompose the common substructures in a set of related genomes into a hierarchy of chains that can be visualized as two-dimensional multiple alignments and nets that can be visualized in circular genome plots. Supplementary Material is available at www.liebertonline.com/cmb .


Subject(s)
Computer Graphics , Genome , Genomics/methods , Sequence Alignment/methods , Algorithms , Animals , Base Sequence , DNA/genetics , Evolution, Molecular , Humans , Molecular Sequence Data
3.
Nature ; 469(7331): 529-33, 2011 Jan 27.
Article in English | MEDLINE | ID: mdl-21270892

ABSTRACT

'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.


Subject(s)
Genetic Variation , Genome/genetics , Pongo abelii/genetics , Pongo pygmaeus/genetics , Animals , Centromere/genetics , Cerebrosides/metabolism , Chromosomes , Evolution, Molecular , Female , Gene Rearrangement/genetics , Genetic Speciation , Genetics, Population , Humans , Male , Phylogeny , Population Density , Population Dynamics , Species Specificity
4.
Nucleic Acids Res ; 39(Database issue): D871-5, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21037257

ABSTRACT

The ENCODE project is an international consortium with a goal of cataloguing all the functional elements in the human genome. The ENCODE Data Coordination Center (DCC) at the University of California, Santa Cruz serves as the central repository for ENCODE data. In this role, the DCC offers a collection of high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion and others. This data helps illuminate transcription factor-binding sites, histone marks, chromatin accessibility, DNA methylation, RNA expression, RNA binding and other cell-state indicators. It includes sequences with quality scores, alignments, signals calculated from the alignments, and in most cases, element or peak calls calculated from the signal data. Each data set is available for visualization and download via the UCSC Genome Browser (http://genome.ucsc.edu/). ENCODE data can also be retrieved using a metadata system that captures the experimental parameters of each assay. The ENCODE web portal at UCSC (http://encodeproject.org/) provides information about the ENCODE data and links for access.


Subject(s)
Databases, Genetic , Genome, Human , Gene Expression Regulation , Genomics , Humans , Internet , Software , User-Computer Interface
5.
Proc Natl Acad Sci U S A ; 105(38): 14254-61, 2008 Sep 23.
Article in English | MEDLINE | ID: mdl-18787111

ABSTRACT

We formalize the problem of recovering the evolutionary history of a set of genomes that are related to an unseen common ancestor genome by operations of speciation, deletion, insertion, duplication, and rearrangement of segments of bases. The problem is examined in the limit as the number of bases in each genome goes to infinity. In this limit, the chromosomes are represented by continuous circles or line segments. For such an infinite-sites model, we present a polynomial-time algorithm to find the most parsimonious evolutionary history of any set of related present-day genomes.


Subject(s)
Evolution, Molecular , Genome , Models, Genetic , Algorithms , Animals , Computer Simulation , Humans , Mice , Mutation/genetics , X Chromosome
6.
J Comput Biol ; 15(8): 1007-27, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18774902

ABSTRACT

Accurately reconstructing the large-scale gene order in an ancestral genome is a critical step to better understand genome evolution. In this paper, we propose a heuristic algorithm, called DUPCAR, for reconstructing ancestral genomic orders with duplications. The method starts from the order of genes in modern genomes and predicts predecessor and successor relationships in the ancestor. Then a greedy algorithm is used to reconstruct the ancestral orders by connecting genes into contiguous regions based on predicted adjacencies. Computer simulation was used to validate the algorithm. We also applied the method to reconstruct the ancestral chromosome X of placental mammals and the ancestral genomes of the ciliate Paramecium tetraurelia.
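The greedy step described above can be illustrated with a minimal sketch. This is not the DUPCAR implementation: it ignores duplications and gene orientation, and the function name `chain_adjacencies` and the (predecessor, successor, score) input format are assumptions made only for illustration. It shows how predicted adjacencies, taken from strongest to weakest, can be chained into contiguous regions.

```python
def chain_adjacencies(scored_adjacencies):
    """Greedily join genes into contiguous regions.

    scored_adjacencies: iterable of (predecessor, successor, score) tuples
    (a hypothetical input format, not taken from the paper).
    Returns a list of regions, each a list of genes in order.
    """
    successor_of = {}
    predecessor_of = {}
    # Consider the highest-scoring adjacencies first.
    for pred, succ, _score in sorted(scored_adjacencies, key=lambda a: -a[2]):
        # Each gene may have at most one successor and one predecessor.
        if pred in successor_of or succ in predecessor_of:
            continue
        # Walk forward from succ; if the chain ends at pred, adding this
        # adjacency would close a cycle, so skip it.
        node = succ
        while node in successor_of:
            node = successor_of[node]
        if node == pred:
            continue
        successor_of[pred] = succ
        predecessor_of[succ] = pred

    # Every gene without a predecessor starts one region.
    genes = set(successor_of) | set(predecessor_of)
    regions = []
    for start in sorted(g for g in genes if g not in predecessor_of):
        region = [start]
        while region[-1] in successor_of:
            region.append(successor_of[region[-1]])
        regions.append(region)
    return regions
```

With the adjacencies a→b (0.9), b→c (0.8), a→c (0.7), c→a (0.6) and d→e (0.5), the sketch keeps a→b, b→c and d→e, and rejects a→c (a already has a successor) and c→a (it would close a cycle).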


Subject(s)
Algorithms , Gene Duplication , Genome , Models, Genetic , Animals , Computer Simulation , Evolution, Molecular , Humans , Paramecium tetraurelia/genetics , Phylogeny
7.
Genome Res ; 16(12): 1557-65, 2006 Dec.
Article in English | MEDLINE | ID: mdl-16983148

ABSTRACT

This article analyzes mammalian genome rearrangements at higher resolution than has been published to date. We identify 3171 intervals, covering approximately 92% of the human genome, within which we find no rearrangements larger than 50 kilobases (kb) in the lineages leading to human, mouse, rat, and dog from their most recent common ancestor. Combining intervals that are adjacent in all contemporary species produces 1338 segments that may contain large insertions or deletions but that are free of chromosome fissions or fusions as well as inversions or translocations >50 kb in length. We describe a new method for predicting the ancestral order and orientation of those intervals from their observed adjacencies in modern species. We combine the results from this method with data from chromosome painting experiments to produce a map of an early mammalian genome that accounts for 96.8% of the available human genome sequence data. The precision is further increased by mapping inversions as small as 31 bp. Analysis of the predicted evolutionary breakpoints in the human lineage confirms certain published observations but disagrees with others. Although only a few mammalian genomes are currently sequenced to high precision, our theoretical analyses and computer simulations indicate that our results are reasonably accurate and that they will become highly accurate in the foreseeable future. Our methods were developed as part of a project to reconstruct the genome sequence of the last ancestor of human, dogs, and most other placental mammals.


Subject(s)
Evolution, Molecular , Genome, Human , Genome , Algorithms , Animals , Base Composition , Base Pairing , Chromosome Breakage , Chromosome Inversion , Chromosome Mapping , Chromosome Painting , Chromosomes , Computer Simulation , Dogs , Gene Deletion , Gene Rearrangement , Humans , Mice , Models, Genetic , Rats , Sequence Alignment/methods , Sequence Homology, Nucleic Acid
8.
Science ; 309(5731): 134-7, 2005 Jul 01.
Article in English | MEDLINE | ID: mdl-15994558

ABSTRACT

We report the genome sequence of Theileria parva, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa. The parasite chromosomes exhibit limited conservation of gene synteny with Plasmodium falciparum, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand. We tentatively identify proteins that facilitate parasite segregation during host cell cytokinesis and contribute to persistent infection of transformed host cells. Several biosynthetic pathways are incomplete or absent, suggesting substantial metabolic dependence on the host cell. One protein family that may generate parasite antigenic diversity is not telomere-associated.


Subject(s)
Genome, Protozoan , Lymphocytes/parasitology , Protozoan Proteins/genetics , Theileria parva/genetics , Algorithms , Animals , Antigens, Protozoan/genetics , Cattle , Cell Proliferation , Chromosomes/genetics , Conserved Sequence , Enzymes/genetics , Enzymes/metabolism , Genes, Protozoan , Lymphocytes/cytology , Mitochondria/metabolism , Molecular Sequence Data , Organelles/genetics , Organelles/physiology , Plasmodium falciparum/genetics , Protein Structure, Tertiary , Protozoan Proteins/chemistry , Protozoan Proteins/metabolism , Sequence Analysis, DNA , Synteny , Telomere/genetics , Theileria parva/growth & development , Theileria parva/pathogenicity , Theileria parva/physiology
9.
Nature ; 433(7028): 865-8, 2005 Feb 24.
Article in English | MEDLINE | ID: mdl-15729342

ABSTRACT

Entamoeba histolytica is an intestinal parasite and the causative agent of amoebiasis, which is a significant source of morbidity and mortality in developing countries. Here we present the genome of E. histolytica, which reveals a variety of metabolic adaptations shared with two other amitochondrial protist pathogens: Giardia lamblia and Trichomonas vaginalis. These adaptations include reduction or elimination of most mitochondrial metabolic pathways and the use of oxidative stress enzymes generally associated with anaerobic prokaryotes. Phylogenomic analysis identifies evidence for lateral gene transfer of bacterial genes into the E. histolytica genome, the effects of which centre on expanding aspects of E. histolytica's metabolic repertoire. The presence of these genes and the potential for novel metabolic pathways in E. histolytica may allow for the development of new chemotherapeutic agents. The genome encodes a large number of novel receptor kinases and contains expansions of a variety of gene families, including those associated with virulence. Additional genome features include an abundance of tandemly repeated transfer-RNA-containing arrays, which may have a structural function in the genome. Analysis of the genome provides new insights into the workings and genome evolution of a major human pathogen.


Subject(s)
Entamoeba histolytica/genetics , Genome, Protozoan , Parasites/genetics , Animals , Entamoeba histolytica/metabolism , Entamoeba histolytica/pathogenicity , Evolution, Molecular , Fermentation , Gene Transfer, Horizontal/genetics , Glycolysis , Oxidative Stress/genetics , Parasites/metabolism , Parasites/pathogenicity , Phylogeny , Signal Transduction , Virulence/genetics
10.
Science ; 307(5713): 1321-4, 2005 Feb 25.
Article in English | MEDLINE | ID: mdl-15653466

ABSTRACT

Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its approximately 20-megabase genome, which contains approximately 6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.


Subject(s)
Cryptococcus neoformans/genetics , Genome, Fungal , Alternative Splicing , Cell Wall/metabolism , Chromosomes, Fungal/genetics , Computational Biology , Cryptococcus neoformans/pathogenicity , Cryptococcus neoformans/physiology , DNA Transposable Elements , Fungal Proteins/metabolism , Gene Library , Genes, Fungal , Humans , Introns , Molecular Sequence Data , Phenotype , Polymorphism, Genetic , Polymorphism, Single Nucleotide , Polysaccharides/metabolism , RNA, Antisense , Sequence Analysis, DNA , Transcription, Genetic , Virulence , Virulence Factors/metabolism
11.
Nucleic Acids Res ; 31(16): 4856-63, 2003 Aug 15.
Article in English | MEDLINE | ID: mdl-12907728

ABSTRACT

We report here the sequence of chromosome II from Trypanosoma brucei, the causative agent of African sleeping sickness. The 1.2-Mb pairs encode about 470 predicted genes organised in 17 directional clusters on either strand, the largest cluster of which has 92 genes lined up over a 284-kb region. An analysis of the GC skew reveals strand compositional asymmetries that coincide with the distribution of protein-coding genes, suggesting these asymmetries may be the result of transcription-coupled repair on coding versus non-coding strand. A 5-cM genetic map of the chromosome reveals recombinational 'hot' and 'cold' regions, the latter of which is predicted to include the putative centromere. One end of the chromosome consists of a 250-kb region almost exclusively composed of RHS (pseudo)genes that belong to a newly characterised multigene family containing a hot spot of insertion for retroelements. Interspersed with the RHS genes are a few copies of truncated RNA polymerase pseudogenes as well as expression site associated (pseudo)genes (ESAGs) 3 and 4, and 76 bp repeats. These features are reminiscent of a vestigial variant surface glycoprotein (VSG) gene expression site. The other end of the chromosome contains a 30-kb array of VSG genes, the majority of which are pseudogenes, suggesting that this region may be a site for modular de novo construction of VSG gene diversity during transposition/gene conversion events.
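As a point of reference for the GC skew analysis mentioned above, GC skew is conventionally defined as (G - C) / (G + C) over a stretch of one strand. The sketch below computes it in fixed, non-overlapping windows; the windowing scheme, the default window size and the function name `gc_skew` are assumptions for illustration, not details given in the abstract.

```python
def gc_skew(sequence, window=1000):
    """Return the GC skew, (G - C) / (G + C), for each window of the sequence.

    Windows are non-overlapping; a window with no G or C gets a skew of 0.0.
    """
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews
```

A sign change in the skew along a chromosome is the kind of strand compositional asymmetry the abstract relates to the distribution of protein-coding genes.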


Subject(s)
Chromosomes/genetics , DNA, Protozoan/genetics , Trypanosoma brucei brucei/genetics , Animals , Antigens, Protozoan/genetics , Chromosome Mapping , DNA, Protozoan/chemistry , Gene Duplication , Genes, Protozoan/genetics , Molecular Sequence Data , Pseudogenes/genetics , Recombination, Genetic , Sequence Analysis, DNA
12.
Proc Natl Acad Sci U S A ; 100(14): 8502-7, 2003 Jul 08.
Article in English | MEDLINE | ID: mdl-12799466

ABSTRACT

The study of genetic variation in malaria parasites has practical significance for developing strategies to control the disease. Vaccines based on highly polymorphic antigens may be confounded by allelic restriction of the host immune response. In response to drug pressure, a highly plastic genome may generate resistant mutants more easily than a monomorphic one. Additionally, the study of the distribution of genomic polymorphisms may provide information leading to the identification of genes associated with traits such as parasite development and drug resistance. Indeed, the age and diversity of the human malaria parasite Plasmodium falciparum have been the subject of recent debate, because an ancient parasite with a complex genome is expected to present greater challenges for drug and vaccine development. The genome diversity of the important human pathogen Plasmodium vivax, however, remains essentially unknown. Here we analyze an approximately 100-kb contiguous chromosome segment from five isolates, revealing 191 single-nucleotide polymorphisms (SNPs) and 44 size polymorphisms. The SNPs are not evenly distributed across the segment, which contains blocks of high and low diversity. Whereas the majority (approximately 63%) of the SNPs are in intergenic regions, introns contain significantly fewer SNPs than intergenic sequences. Polymorphic tandem repeats are abundant and are more uniformly distributed at a frequency of about one polymorphic tandem repeat per 3 kb. These data show that P. vivax has a highly diverse genome, and provide useful information for further understanding the genome diversity of the parasite.


Subject(s)
Genes, Protozoan , Genome, Protozoan , Plasmodium vivax/genetics , Polymorphism, Single Nucleotide , Animals , Chromosome Mapping , DNA, Protozoan/genetics , Genetic Variation , Haplotypes/genetics , Introns/genetics , Molecular Sequence Data , Plasmodium falciparum/genetics , Polymerase Chain Reaction , Protozoan Proteins/genetics , Sequence Alignment , Sequence Analysis, DNA , Species Specificity , Tandem Repeat Sequences
13.
Nucleic Acids Res ; 31(1): 229-33, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12519988

ABSTRACT

Rice is not only a major food staple for the world's population but also a model species for a major group of flowering plants, the monocotyledonous plants. Draft genomic sequences of two subspecies of rice, Oryza sativa ssp. japonica and ssp. indica, are publicly available. To provide the community with a resource to data-mine the rice genome, we have constructed an annotation resource for rice (http://www.tigr.org/tdb/e2k1/osa1/). In this resource, we have annotated the rice genome for gene content, identified motifs/domains within the predicted genes, constructed a rice repeat database, identified related sequences in other plant species, and identified syntenic sequences between rice and maize. All of the data is available through web-based interfaces, FTP downloads, and a Distributed Annotation System.


Subject(s)
Databases, Genetic , Genome, Plant , Oryza/genetics , Chromosomes, Artificial , Chromosomes, Plant , Computational Biology , Plant Proteins/chemistry , Plants/genetics , Repetitive Sequences, Nucleic Acid , Sequence Alignment , Sequence Homology , Synteny , Zea mays/genetics
14.
Nature ; 419(6906): 512-9, 2002 Oct 03.
Article in English | MEDLINE | ID: mdl-12368865

ABSTRACT

Species of malaria parasite that infect rodents have long been used as models for malaria disease research. Here we report the whole-genome shotgun sequence of one species, Plasmodium yoelii yoelii, and comparative studies with the genome of the human malaria parasite Plasmodium falciparum clone 3D7. A synteny map of 2,212 P. y. yoelii contiguous DNA sequences (contigs) aligned to 14 P. falciparum chromosomes reveals marked conservation of gene synteny within the body of each chromosome. Of about 5,300 P. falciparum genes, more than 3,300 P. y. yoelii orthologues of predominantly metabolic function were identified. Over 800 copies of a variant antigen gene located in subtelomeric regions were found. This is the first genome sequence of a model eukaryotic parasite, and it provides insight into the use of such systems in the modelling of Plasmodium biology and disease.


Subject(s)
Genome, Protozoan , Plasmodium yoelii/genetics , Animals , DNA, Protozoan , Disease Models, Animal , Humans , Malaria/parasitology , Multigene Family , Plasmodium falciparum/genetics , Recombination, Genetic , Rodentia , Sequence Alignment , Sequence Analysis, DNA , Species Specificity , Synteny , Telomere
15.
Nature ; 419(6906): 531-4, 2002 Oct 03.
Article in English | MEDLINE | ID: mdl-12368868

ABSTRACT

The mosquito-borne malaria parasite Plasmodium falciparum kills an estimated 0.7-2.7 million people every year, primarily children in sub-Saharan Africa. Without effective interventions, a variety of factors, including the spread of parasites resistant to antimalarial drugs and the increasing insecticide resistance of mosquitoes, may cause the number of malaria cases to double over the next two decades. To stimulate basic research and facilitate the development of new drugs and vaccines, the genome of Plasmodium falciparum clone 3D7 has been sequenced using a chromosome-by-chromosome shotgun strategy. We report here the nucleotide sequences of chromosomes 10, 11 and 14, and a re-analysis of the chromosome 2 sequence. These chromosomes represent about 35% of the 23-megabase P. falciparum genome.


Subject(s)
DNA, Protozoan , Plasmodium falciparum/genetics , Animals , Chromosomes , Genome, Protozoan , Proteome , Protozoan Proteins/genetics , Sequence Analysis, DNA
16.
Nature ; 419(6906): 498-511, 2002 Oct 03.
Article in English | MEDLINE | ID: mdl-12368864

ABSTRACT

The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.


Subject(s)
Genome, Protozoan , Plasmodium falciparum/genetics , Animals , Chromosome Structures , DNA Repair , DNA Replication , DNA, Protozoan/biosynthesis , DNA, Protozoan/genetics , Evolution, Molecular , Humans , Malaria Vaccines , Malaria, Falciparum/immunology , Malaria, Falciparum/parasitology , Malaria, Falciparum/prevention & control , Membrane Transport Proteins/genetics , Membrane Transport Proteins/metabolism , Molecular Sequence Data , Plasmodium falciparum/immunology , Plasmodium falciparum/metabolism , Plastids/genetics , Proteome , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Protozoan Proteins/physiology , Recombination, Genetic , Sequence Analysis, DNA/methods