ABSTRACT
The genetic changes underlying the initial steps of animal domestication are still poorly understood. We generated a high-quality reference genome for the rabbit and compared it to resequencing data from populations of wild and domestic rabbits. We identified more than 100 selective sweeps specific to domestic rabbits but only a relatively small number of fixed (or nearly fixed) single-nucleotide polymorphisms (SNPs) for derived alleles. SNPs with marked allele frequency differences between wild and domestic rabbits were enriched for conserved noncoding sites. Enrichment analyses suggest that genes affecting brain and neuronal development have often been targeted during domestication. We propose that because of a truly complex genetic background, tame behavior in rabbits and other domestic animals evolved by shifts in allele frequencies at many loci, rather than by critical changes at only a few domestication loci.
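The following is a minimal sketch of the allele-frequency comparison described above. It assumes that derived-allele counts per SNP are already available for one wild and one domestic sample. The `Snp` record and both thresholds are hypothetical values chosen for illustration: a 0.5 frequency difference and a 0.95 near-fixation cutoff. They are not the criteria used in the study.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Derived-allele counts for one SNP (hypothetical record layout)."""
    snp_id: str
    wild_derived: int
    wild_total: int
    domestic_derived: int
    domestic_total: int


def derived_freqs(snp: Snp) -> tuple:
    """Return (wild, domestic) derived-allele frequencies."""
    return (snp.wild_derived / snp.wild_total,
            snp.domestic_derived / snp.domestic_total)


def delta_af(snp: Snp) -> float:
    """Domestic minus wild derived-allele frequency."""
    wild, domestic = derived_freqs(snp)
    return domestic - wild


def classify_snps(snps, min_delta=0.5, fixed_cutoff=0.95):
    """Split SNPs into those with a marked frequency difference (either
    direction) and those whose derived allele is (nearly) fixed in the
    domestic sample but not in the wild one. Thresholds are illustrative."""
    marked, nearly_fixed = [], []
    for snp in snps:
        wild, domestic = derived_freqs(snp)
        if abs(domestic - wild) >= min_delta:
            marked.append(snp.snp_id)
        if domestic >= fixed_cutoff and wild < fixed_cutoff:
            nearly_fixed.append(snp.snp_id)
    return marked, nearly_fixed
```

With these defaults, a SNP whose derived allele is at 0.10 in wild and 0.98 in domestic rabbits appears in both lists. A drop from 0.80 to 0.10 counts only as a marked difference.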
Subject(s)
Animals, Domestic/genetics , Animals, Wild/genetics , Rabbits/genetics , Animals , Animals, Domestic/anatomy & histology , Animals, Domestic/psychology , Animals, Wild/anatomy & histology , Animals, Wild/psychology , Base Sequence , Behavior, Animal , Breeding , Evolution, Molecular , Gene Frequency , Genetic Loci , Genome/genetics , Molecular Sequence Data , Phenotype , Polymorphism, Single Nucleotide , Rabbits/anatomy & histology , Rabbits/psychology , Selection, Genetic , Sequence Analysis, DNA
ABSTRACT
Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
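Pairwise sequence divergence is often summarized as a substitution distance. The sketch below computes the uncorrected p-distance between two aligned sequences and then applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). Jukes-Cantor is used here only as the simplest standard model. It is not claimed to be the model used in the 12-genome analyses.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of compared sites that differ; gaps and ambiguity codes are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3), in substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For a 10-site alignment with one mismatch, p = 0.1 and d is about 0.107. The correction grows as divergence increases, which matters when comparing distant species such as D. melanogaster and D. grimshawi.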
Subject(s)
Drosophila/classification , Drosophila/genetics , Evolution, Molecular , Genes, Insect/genetics , Genome, Insect/genetics , Genomics , Phylogeny , Animals , Codon/genetics , DNA Transposable Elements/genetics , Drosophila/immunology , Drosophila/metabolism , Drosophila Proteins/genetics , Gene Order/genetics , Genome, Mitochondrial/genetics , Immunity/genetics , Multigene Family/genetics , RNA, Untranslated/genetics , Reproduction/genetics , Sequence Alignment , Sequence Analysis, DNA , Synteny/genetics
ABSTRACT
We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.
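A commonly used signature of GC-biased gene conversion is an excess of weak-to-strong (A/T to G/C) substitutions over strong-to-weak ones. This is often summarized as the equilibrium GC content GC* = u / (u + v), where u and v are the per-site W→S and S→W rates. The sketch below assumes an inferred ancestral sequence already aligned to a descendant; how that ancestor is reconstructed is outside its scope. It does not reproduce the opossum paper's analysis.

```python
WEAK, STRONG = set("AT"), set("GC")


def ws_sw_counts(ancestral: str, derived: str):
    """Count weak->strong and strong->weak substitutions plus the number of
    ancestral W and S sites, using only columns where both bases are A/C/G/T."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    ws = sw = w_sites = s_sites = 0
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a not in "ACGT" or d not in "ACGT":
            continue  # skip gaps and ambiguity codes
        if a in WEAK:
            w_sites += 1
            ws += d in STRONG
        else:
            s_sites += 1
            sw += d in WEAK
    return ws, sw, w_sites, s_sites


def equilibrium_gc(ancestral: str, derived: str) -> float:
    """GC* = u / (u + v), with u and v the per-site W->S and S->W rates."""
    ws, sw, w_sites, s_sites = ws_sw_counts(ancestral, derived)
    if w_sites == 0 or s_sites == 0:
        raise ValueError("need both weak and strong ancestral sites")
    u, v = ws / w_sites, sw / s_sites
    if u + v == 0:
        raise ValueError("no W<->S substitutions observed")
    return u / (u + v)
```

A GC* value above the current GC content of a region would be consistent with ongoing GC enrichment, one of the patterns that biased gene conversion is expected to produce.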
Subject(s)
Evolution, Molecular , Genome/genetics , Genomics , Opossums/genetics , Animals , Base Composition , Conserved Sequence/genetics , DNA Transposable Elements/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Protein Biosynthesis , Synteny/genetics , X Chromosome Inactivation/genetics
ABSTRACT
Chromosome 17 is unusual among the human chromosomes in many respects. It is the largest human autosome with orthology to only a single mouse chromosome, mapping entirely to the distal half of mouse chromosome 11. Chromosome 17 is rich in protein-coding genes, having the second highest gene density in the genome. It is also enriched in segmental duplications, ranking third in density among the autosomes. Here we report a finished sequence for human chromosome 17, as well as a structural comparison with the finished sequence for mouse chromosome 11, the first finished mouse chromosome. Comparison of the orthologous regions reveals striking differences. In contrast to the typical pattern seen in mammalian evolution, the human sequence has undergone extensive intrachromosomal rearrangement, whereas the mouse sequence has been remarkably stable. Moreover, although the human sequence has a high density of segmental duplication, the mouse sequence has a very low density. Notably, these segmental duplications correspond closely to the sites of structural rearrangement, demonstrating a link between duplication and rearrangement. Examination of the main classes of duplicated segments provides insight into the dynamics underlying expansion of chromosome-specific, low-copy repeats in the human genome.
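One simple way to quantify the link between duplication and rearrangement is to ask what fraction of rearrangement breakpoints fall within, or near, segmentally duplicated intervals. This sketch assumes breakpoints and duplications are given as coordinates on one chromosome. The `slop` tolerance is a hypothetical parameter. A real analysis would also compare the observed fraction with breakpoints placed at random.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping closed intervals [start, end]; returns sorted, disjoint lists."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_near_duplications(breakpoints, duplications, slop=0):
    """Fraction of breakpoint coordinates lying within `slop` bp of a duplicated segment."""
    if not breakpoints:
        return 0.0
    merged = merge_intervals(duplications)
    starts = [start for start, _ in merged]
    hits = 0
    for bp in breakpoints:
        # Last merged interval starting at or before bp + slop; because the
        # intervals are disjoint and sorted, it also has the largest end.
        i = bisect_right(starts, bp + slop) - 1
        if i >= 0 and merged[i][1] >= bp - slop:
            hits += 1
    return hits / len(breakpoints)
```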
Subject(s)
Chromosomes, Human, Pair 17/genetics , Evolution, Molecular , Animals , Base Composition , Gene Duplication , Humans , Long Interspersed Nucleotide Elements/genetics , Mice , Sequence Analysis, DNA , Short Interspersed Nucleotide Elements/genetics , Synteny/genetics
ABSTRACT
Chromosome 11, although average in size, is one of the most gene- and disease-rich chromosomes in the human genome. Initial gene annotation indicates an average gene density of 11.6 genes per megabase, including 1,524 protein-coding genes (some identified using novel methods) and 765 pseudogenes. One-quarter of the protein-coding genes overlap with other genes. Of the 856 olfactory receptor genes in the human genome, more than 40% are located in 28 single- and multi-gene clusters along this chromosome. Of the 171 disorders currently attributed to the chromosome, 86 still have no known molecular basis, including several mendelian traits, cancer and susceptibility loci. The high-quality data presented here (nearly 134.5 million base pairs, representing 99.8% coverage of the euchromatic sequence) provide scientists with a solid foundation for understanding the genetic basis of these disorders and other biological phenomena.
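Grouping genes such as olfactory receptors into clusters can be sketched as single-linkage grouping along the chromosome. A new cluster starts whenever the gap to the previous gene exceeds a distance cutoff. The `max_gap` value is a hypothetical parameter, because the abstract does not state the clustering criterion used. Isolated genes count as single-gene clusters.

```python
def cluster_genes(positions, max_gap):
    """Group gene coordinates into clusters; a new cluster starts whenever the
    distance to the previous gene exceeds max_gap (an assumed cutoff)."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def summarize(clusters):
    """Count clusters and split them into single-gene and multi-gene groups."""
    single = sum(1 for c in clusters if len(c) == 1)
    return {"clusters": len(clusters), "single_gene": single,
            "multi_gene": len(clusters) - single}
```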
Subject(s)
Chromosomes, Human, Pair 11 , Sequence Analysis, DNA , DNA , Gene Expression , Genes , Humans , Molecular Sequence Data , Physical Chromosome Mapping , Receptors, Odorant/genetics
ABSTRACT
Here we present a finished sequence of human chromosome 15, together with a high-quality gene catalogue. As chromosome 15 is one of seven human chromosomes with a high rate of segmental duplication, we have carried out a detailed analysis of the duplication structure of the chromosome. Segmental duplications in chromosome 15 are largely clustered in two regions, on proximal and distal 15q; the proximal region is notable because recombination among the segmental duplications can result in deletions causing Prader-Willi and Angelman syndromes. Sequence analysis shows that the proximal and distal regions of 15q share extensive ancient similarity. Using a simple approach, we have been able to reconstruct many of the events by which the current duplication structure arose. We find that most of the intrachromosomal duplications seem to share a common ancestry. Finally, we demonstrate that some remaining gaps in the genome sequence are probably due to structural polymorphisms between haplotypes; this may explain a significant fraction of the gaps remaining in the human genome.
Subject(s)
Chromosomes, Human, Pair 15/genetics , Evolution, Molecular , Gene Duplication , Animals , Conserved Sequence/genetics , Genes , Genome, Human , Haplotypes/genetics , Humans , Macaca mulatta/genetics , Molecular Sequence Data , Multigene Family/genetics , Phylogeny , Polymorphism, Genetic/genetics , Sequence Analysis, DNA , Synteny/genetics
ABSTRACT
The International Human Genome Sequencing Consortium (IHGSC) recently completed a sequence of the human genome. As part of this project, we have focused on chromosome 8. Although some chromosomes exhibit extreme characteristics in terms of length, gene content, repeat content and fraction segmentally duplicated, chromosome 8 is distinctly typical in character, being very close to the genome median in each of these aspects. This work describes a finished sequence and gene catalogue for the chromosome, which represents just over 5% of the euchromatic human genome. A unique feature of the chromosome is a vast region of approximately 15 megabases on distal 8p that appears to have a strikingly high mutation rate, which has accelerated in the hominids relative to other sequenced mammals. This fast-evolving region contains a number of genes related to innate immunity and the nervous system, including loci that appear to be under positive selection, such as the major defensin (DEF) gene cluster and MCPH1, a gene that may have contributed to the evolution of expanded brain size in the great apes. The data from chromosome 8 should allow a better understanding of both normal and disease biology and genome evolution.
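Locating a fast-evolving region like the one on distal 8p can be sketched as a sliding-window scan. The sketch computes a substitution rate per window, flags windows that exceed a multiple of the median rate, and merges consecutive flagged windows into regions. The window layout, the median baseline and `factor=2.0` are all illustrative assumptions. The sketch does not reproduce the paper's comparison of rates between hominids and other mammals.

```python
from statistics import median


def flag_fast_windows(substitutions, sites, factor=2.0):
    """Indices of windows whose substitution rate exceeds `factor` times the
    median window rate (baseline and factor are assumed, not from the paper)."""
    rates = [s / n for s, n in zip(substitutions, sites)]
    baseline = median(rates)
    return [i for i, r in enumerate(rates) if r > factor * baseline]


def merge_runs(indices):
    """Collapse consecutive window indices into (first, last) runs."""
    runs = []
    for i in sorted(indices):
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs
```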
Subject(s)
Chromosomes, Human, Pair 8/genetics , Evolution, Molecular , Animals , Contig Mapping , DNA, Satellite/genetics , Defensins/genetics , Euchromatin/genetics , Female , Humans , Immunity, Innate/genetics , Male , Molecular Sequence Data , Multigene Family/genetics , Sequence Analysis, DNA
ABSTRACT
Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
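The basic unit of a genome-wide association study is a per-SNP test comparing allele counts in cases and controls. The sketch below computes the Pearson chi-square statistic for a 2x2 allele-count table. It uses the identity that, for one degree of freedom, the tail probability is erfc(sqrt(x/2)). It is only the simplest allelic test. Association studies across dog breeds would also need to account for population structure and long-range linkage disequilibrium, which this sketch ignores.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts, cases vs.
    controls. Returns (statistic, p_value)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    if 0 in rows or 0 in cols:
        raise ValueError("a row or column total is zero; the test is undefined")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, 30 of 100 case alleles versus 10 of 100 control alleles gives a statistic of 12.5 and a p-value of about 4e-4.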
Subject(s)
Dogs/genetics , Evolution, Molecular , Genome/genetics , Genomics , Haplotypes/genetics , Animals , Conserved Sequence/genetics , Dog Diseases/genetics , Dogs/classification , Female , Humans , Hybridization, Genetic , Male , Mice , Mutagenesis/genetics , Polymorphism, Single Nucleotide/genetics , Rats , Short Interspersed Nucleotide Elements/genetics , Synteny/genetics
ABSTRACT
Chromosome 18 appears to have the lowest gene density of any human chromosome and is one of only three chromosomes for which trisomic individuals survive to term. There are also a number of genetic disorders stemming from chromosome 18 trisomy and aneuploidy. Here we report the finished sequence and gene annotation of human chromosome 18, which will allow a better understanding of the normal and disease biology of this chromosome. Despite the low density of protein-coding genes on chromosome 18, we find that the proportion of non-protein-coding sequences evolutionarily conserved among mammals is close to the genome-wide average. Extending this analysis to the entire human genome, we find that the density of conserved non-protein-coding sequences is largely uncorrelated with gene density. This has important implications for the nature and roles of non-protein-coding sequence elements.
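Whether conserved non-coding density tracks gene density can be checked by converting per-window counts into densities and correlating the two series. This sketch uses Pearson's r. The abstract does not say which statistic or window size was used, so both are assumptions here; a rank correlation such as Spearman's would be a reasonable alternative.

```python
import math


def densities(counts, window_bp):
    """Convert per-window feature counts into features per megabase."""
    return [c / (window_bp / 1_000_000) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A coefficient near zero across windows would correspond to the "largely uncorrelated" pattern reported above.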
Subject(s)
Chromosomes, Human, Pair 18/genetics , DNA/genetics , Aneuploidy , Animals , Conserved Sequence/genetics , CpG Islands/genetics , Exons/genetics , Expressed Sequence Tags , Genes/genetics , Genome, Human , Humans , Introns/genetics , Molecular Sequence Data , Sequence Analysis, DNA , Synteny
ABSTRACT
With the recent completion of a high-quality sequence of the human genome, the challenge is now to understand the functional elements that it encodes. Comparative genomic analysis offers a powerful approach for finding such elements by identifying sequences that have been highly conserved during evolution. Here, we propose an initial strategy for detecting such regions by generating low-redundancy sequence from a collection of 16 eutherian mammals, beyond the 7 for which genome sequence data are already available. We show that such sequence can be accurately aligned to the human genome and used to identify most of the highly conserved regions. Although not a long-term substitute for generating high-quality genomic sequences from many mammalian species, this strategy represents a practical initial approach for rapidly annotating the most evolutionarily conserved sequences in the human genome, providing a key resource for the systematic study of human genome function.
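One way to estimate how much of a genome low-redundancy sequencing samples is the Lander-Waterman approximation. With redundancy c = NL/G (N reads of length L over a genome of size G), the expected fraction of bases covered by at least one read is 1 - e^(-c). The sketch simply evaluates these formulas. The read numbers in the test are illustrative assumptions, and the approximation ignores repeats, sequencing bias and alignment failures.

```python
import math


def redundancy(num_reads: int, read_len: int, genome_size: int) -> float:
    """Sequence redundancy c = N * L / G."""
    return num_reads * read_len / genome_size


def fraction_covered(c: float) -> float:
    """Lander-Waterman expected fraction of bases covered by at least one read: 1 - e^(-c)."""
    return 1 - math.exp(-c)
```

At an illustrative redundancy of 2x, about 86% of a genome's bases are expected to be sampled. This shows why low-redundancy data from many species can still capture most conserved regions.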
Subject(s)
Conserved Sequence/genetics , Genome, Human , Genomics/methods , Mammals/genetics , Sequence Analysis, DNA/methods , Animals , Base Sequence , Computational Biology , Humans , Phylogeny , Sequence Alignment
ABSTRACT
Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in this data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.
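The final step of the family-based mining protocol separates EST hits into known and new genes. ESTs that hit the protein family and already map to characterized human genes are counted as known. The remaining ESTs become new gene candidates, grouped by the family member they resemble most. This is a hypothetical sketch. The input shapes, the identity cutoff and the best-hit assignment are all assumptions, and the similarity search that produces the hits is not shown. Real candidates would still need clustering into genes and manual review.

```python
from collections import defaultdict


def classify_est_hits(est_hits, known_gene_of_est, min_identity=0.4):
    """
    est_hits: iterable of (est_id, family_protein_id, identity) tuples from a
        similarity search of the protein family against human ESTs.
    known_gene_of_est: dict mapping EST ids to already characterized human genes.
    Returns (known_genes_found, candidates), where candidates maps each family
    member to the ESTs it hits best that match no characterized gene.
    """
    best = {}
    for est, protein, identity in est_hits:
        if identity < min_identity:
            continue  # discard weak hits (threshold is illustrative)
        if est not in best or identity > best[est][1]:
            best[est] = (protein, identity)

    known_found = set()
    candidates = defaultdict(list)
    for est, (protein, _) in best.items():
        gene = known_gene_of_est.get(est)
        if gene is not None:
            known_found.add(gene)
        else:
            candidates[protein].append(est)
    return known_found, dict(candidates)
```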
Subject(s)
Carrier Proteins/genetics , Computational Biology/methods , Sequence Tagged Sites , Biological Transport, Active/genetics , Biotechnology/methods , Biotechnology/trends , Databases, Genetic , Humans , Internet , Multigene Family/genetics
ABSTRACT
MOTIVATION: During the process of high-throughput genome sequencing there are opportunities for mixups of reagents and data associated with particular projects. The sequencing templates or sequence data generated for an assembly may become contaminated with reagents or sequences from another project, resulting in poorer quality and inaccurate assemblies. RESULTS: We have developed a system to assess sequence assemblies and monitor for laboratory mixups. We describe several methods for testing the consistency of assemblies and resolving mixed ones. We use statistical tests to evaluate the distribution of sequencing reads from different plates into contigs, and a graph-based approach to resolve situations where data has been inappropriately combined. While these methods have been designed for use in a high-throughput DNA sequencing environment processing thousands of clones, they can be applied in any situation where distinct sequencing projects are performed at redundant coverage.
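The two kinds of check described above can be illustrated with a small sketch. It assumes each read has already been assigned to a sequencing plate and a contig. The chi-square statistic on the plates x contigs table measures whether plates spread their reads across contigs independently; a mixed-in plate tends to concentrate in contigs of its own. A union-find pass then groups plates that share at least one contig, so disjoint groups flag data that may come from different projects. The data layout and the union-find grouping are illustrative choices, not the published implementation. A p-value could be obtained from a chi-square distribution with `df` degrees of freedom, for example with SciPy.

```python
from collections import defaultdict


def chi2_plate_contig(table):
    """Pearson chi-square statistic and degrees of freedom for a plates x contigs
    table of read counts; a large value means plates are not spread evenly."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def plate_components(placements):
    """Join plates that contribute reads to a shared contig (union-find) and
    return the resulting groups of plates, sorted for stable output."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_plate = {}
    for plate, contig in placements:
        root = find(plate)
        if contig in first_plate:
            parent[root] = find(first_plate[contig])
        else:
            first_plate[contig] = plate

    groups = defaultdict(set)
    for plate in list(parent):
        groups[find(plate)].add(plate)
    return sorted(groups.values(), key=sorted)
```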