Results 1 - 20 of 27
1.
Genome Res ; 28(4): 448-459, 2018 04.
Article in English | MEDLINE | ID: mdl-29563166

ABSTRACT

Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.


Subject(s)
Evolution, Molecular , Genome/genetics , Muridae/genetics , Phylogeny , Animals , Binding Sites , CCCTC-Binding Factor/genetics , Chromosomes/genetics , Karyotyping/methods , Long Interspersed Nucleotide Elements/genetics , Mice , Retroelements/genetics , Species Specificity
2.
Nature ; 477(7364): 326-9, 2011 Sep 14.
Article in English | MEDLINE | ID: mdl-21921916

ABSTRACT

Structural variation is widespread in mammalian genomes and is an important cause of disease, but just how abundant and important structural variants (SVs) are in shaping phenotypic variation remains unclear. Without knowing how many SVs there are, and how they arise, it is difficult to discover what they do. Combining experimental with automated analyses, we identified 711,920 SVs at 281,243 sites in the genomes of thirteen classical and four wild-derived inbred mouse strains. The majority of SVs are less than 1 kilobase in size and 98% are deletions or insertions. The breakpoints of 160,000 SVs were mapped to base pair resolution, allowing us to infer that insertion of retrotransposons causes more than half of SVs. Yet, despite their prevalence, SVs are less likely than other sequence variants to cause gene expression or quantitative phenotypic variation. We identified 24 SVs that disrupt coding exons, acting as rare variants of large effect on gene function. One-third of the genes so affected have immunological functions.


Subject(s)
Genetic Variation/genetics , Genome/genetics , Mice, Inbred Strains/genetics , Phenotype , Animals , Chromosome Breakpoints , Exons/genetics , Female , Gene Expression , Genomics , Genotype , Male , Mice , Mice, Inbred Strains/immunology , Mutagenesis, Insertional/genetics , Quantitative Trait Loci/genetics , Rats , Retroelements/genetics , Sequence Deletion/genetics
3.
Nature ; 477(7364): 289-94, 2011 Sep 14.
Article in English | MEDLINE | ID: mdl-21921910

ABSTRACT

We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.


Subject(s)
Gene Expression Regulation/genetics , Genetic Variation/genetics , Genome/genetics , Mice, Inbred Strains/genetics , Mice/genetics , Phenotype , Alleles , Animals , Animals, Laboratory/genetics , Genomics , Mice/classification , Mice, Inbred C57BL/genetics , Phylogeny , Quantitative Trait Loci/genetics
4.
PLoS Genet ; 9(6): e1003570, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23785304

ABSTRACT

Variation at regulatory elements, identified through hypersensitivity to digestion by DNase I, is believed to contribute to variation in complex traits, but the extent and consequences of this variation are poorly characterized. Analysis of terminally differentiated erythroblasts in eight inbred strains of mice identified reproducible variation at approximately 6% of DNase I hypersensitive sites (DHS). Only 30% of such variable DHS contain a sequence variant predictive of site variation. Nevertheless, sequence variants within variable DHS are more likely to be associated with complex traits than those in non-variant DHS, and variants associated with complex traits preferentially occur in variable DHS. Changes at a small proportion (less than 10%) of variable DHS are associated with changes in nearby transcriptional activity. Our results show that whilst DNA sequence variation is not the major determinant of variation in open chromatin, where such variants exist they are likely to be causal for complex traits.


Subject(s)
Chromatin/genetics , Deoxyribonuclease I/genetics , Regulatory Sequences, Nucleic Acid/genetics , Animals , Deoxyribonuclease I/metabolism , Mice , Phenotype
5.
Genome Res ; 21(8): 1239-48, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21493779

ABSTRACT

The Collaborative Cross (CC) is a genetic reference panel of recombinant inbred lines of mice, designed for the dissection of complex traits and gene networks. Each line is independently descended from eight genetically diverse founder strains such that the genomes of the CC lines, once fully inbred, are fine-grained homozygous mosaics of the founder haplotypes. We present an analysis of 120 CC lines, from a cohort of the CC bred at Tel Aviv University in collaboration with the University of Oxford, which at the time of this study were between the sixth and 12th generations of inbreeding and substantially homozygous at 170,000 SNPs. We show how CC genomes decompose into mosaics, and we identify loci that carry a deficiency or excess of a founder, many being deficient for the wild-derived strains WSB/EiJ and PWK/PhJ. We phenotyped 371 mice from 66 CC lines for a susceptibility to Aspergillus fumigatus infection. The survival time after infection varied significantly between CC lines. Quantitative trait locus (QTL) mapping identified genome-wide significant QTLs on chromosomes 2, 3, 8, 10 (two QTLs), 15, and 18. Simulations show that QTL mapping resolution (the median distance between the QTL peak and true location) varied between 0.47 and 1.18 Mb. Most of the QTLs involved contrasts between wild-derived founder strains and therefore would not segregate between classical inbred strains. Use of variation data from the genomes of the CC founder strains refined these QTLs further and suggested several candidate genes. These results support the use of the CC for dissecting complex traits.


Subject(s)
Aspergillosis/genetics , Aspergillus fumigatus/physiology , Crosses, Genetic , Animals , Aspergillosis/microbiology , Chromosome Mapping/methods , Genetic Predisposition to Disease , Haplotypes , Mice , Mice, Inbred Strains , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci
6.
Nature ; 447(7141): 167-77, 2007 May 10.
Article in English | MEDLINE | ID: mdl-17495919

ABSTRACT

We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.


Subject(s)
Evolution, Molecular , Genome/genetics , Genomics , Opossums/genetics , Animals , Base Composition , Conserved Sequence/genetics , DNA Transposable Elements/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Protein Biosynthesis , Synteny/genetics , X Chromosome Inactivation/genetics
7.
PLoS Biol ; 7(5): e1000112, 2009 May 05.
Article in English | MEDLINE | ID: mdl-19468303

ABSTRACT

The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.


Subject(s)
Computational Biology/methods , Genome/genetics , Animals , Databases, Genetic , Gene Duplication , Genome/physiology , Humans , Mice
8.
PLoS Genet ; 5(12): e1000753, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19997497

ABSTRACT

The onset of prezygotic and postzygotic barriers to gene flow between populations is a hallmark of speciation. One of the earliest postzygotic isolating barriers to arise between incipient species is the sterility of the heterogametic sex in interspecies hybrids. Four genes that underlie hybrid sterility have been identified in animals: Odysseus, JYalpha, and Overdrive in Drosophila and Prdm9 (Meisetz) in mice. Mouse Prdm9 encodes a protein with a KRAB motif, a histone methyltransferase domain and several zinc fingers. The difference of a single zinc finger distinguishes Prdm9 alleles that cause hybrid sterility from those that do not. We find that concerted evolution and positive selection have rapidly altered the number and sequence of Prdm9 zinc fingers across 13 rodent genomes. The patterns of positive selection in Prdm9 zinc fingers imply that rapid evolution has acted on the interface between the Prdm9 protein and the DNA sequences to which it binds. Similar patterns are apparent for Prdm9 zinc fingers for diverse metazoans, including primates. Indeed, allelic variation at the DNA-binding positions of human PRDM9 zinc fingers shows significant association with decreased risk of infertility. Prdm9 thus plays a role in determining male sterility both between species (mouse) and within species (human). The recurrent episodes of positive selection acting on Prdm9 suggest that the DNA sequences to which it binds must also be evolving rapidly. Our findings do not identify the nature of the underlying DNA sequences, but argue against the proposed role of Prdm9 as an essential transcription factor in mouse meiosis. We propose a hypothetical model in which incompatibilities between Prdm9-binding specificity and satellite DNAs provide the molecular basis for Prdm9-mediated hybrid sterility. We suggest that Prdm9 should be investigated as a candidate gene in other instances of hybrid sterility in metazoans.


Subject(s)
Evolution, Molecular , Genetic Speciation , Histone-Lysine N-Methyltransferase/genetics , Amino Acid Sequence , Animals , Base Sequence , DNA, Satellite/genetics , Histone-Lysine N-Methyltransferase/chemistry , Humans , Models, Biological , Molecular Sequence Data , Phylogeny , Primates/genetics , Rodentia/genetics , Selection, Genetic , Zinc Fingers/genetics
9.
Bioinformatics ; 26(21): 2778-9, 2010 Nov 01.
Article in English | MEDLINE | ID: mdl-20847218

ABSTRACT

SUMMARY: Computational pipelines are commonplace in scientific research. However, most of the resources for constructing pipelines are heavyweight systems with graphical user interfaces. Ruffus is a library for the creation of computational pipelines. Its lightweight and unobtrusive design recommends it for use even for the most trivial of analyses. At the same time, it is powerful enough to have been used for complex workflows involving more than 50 interdependent stages. AVAILABILITY AND IMPLEMENTATION: Ruffus is written in Python. Source code, a short tutorial, examples and a comprehensive user manual are freely available at http://www.ruffus.org.uk. The example program is available at http://www.ruffus.org.uk/examples/bioinformatics.


Subject(s)
Computational Biology/methods , Software , Databases, Factual , Internet
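
Note on entry 9: the abstract describes pipelines made of interdependent stages. As a rough illustration of that idea only, the sketch below runs stages in dependency order using Python's standard-library `graphlib`. It does not use or reproduce the Ruffus API; the function `run_pipeline`, the stage names and the toy sequences are all hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(stages, dependencies):
    """Run every stage once, after all stages it depends on.

    stages:       dict of stage name -> callable taking a dict of upstream results
    dependencies: dict of stage name -> set of upstream stage names
    (Hypothetical helper for illustration; not part of Ruffus.)
    """
    results = {}
    # TopologicalSorter expects a mapping of node -> its predecessors.
    graph = {name: dependencies.get(name, set()) for name in stages}
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = stages[name](upstream)
    return results


# Hypothetical three-stage workflow on toy sequences.
stages = {
    "load": lambda up: ["ACGT", "GGCC", "ATAT"],
    "gc_content": lambda up: [
        sum(base in "GC" for base in seq) / len(seq) for seq in up["load"]
    ],
    "summary": lambda up: max(up["gc_content"]),
}
dependencies = {"gc_content": {"load"}, "summary": {"gc_content"}}

results = run_pipeline(stages, dependencies)
```

Declaring dependencies as data, rather than hard-coding call order, is what lets a workflow grow to many stages without manual sequencing; a dedicated pipeline library would typically add features such as skipping up-to-date outputs, which this sketch omits.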
10.
Nature ; 438(7069): 803-19, 2005 Dec 08.
Article in English | MEDLINE | ID: mdl-16341006

ABSTRACT

Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.


Subject(s)
Dogs/genetics , Evolution, Molecular , Genome/genetics , Genomics , Haplotypes/genetics , Animals , Conserved Sequence/genetics , Dog Diseases/genetics , Dogs/classification , Female , Humans , Hybridization, Genetic , Male , Mice , Mutagenesis/genetics , Polymorphism, Single Nucleotide/genetics , Rats , Short Interspersed Nucleotide Elements/genetics , Synteny/genetics
11.
Biochem Soc Trans ; 37(Pt 4): 734-9, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19614585

ABSTRACT

To take full advantage of the mouse as a model organism, it is essential to distinguish lineage-specific biology from what is shared between human and mouse. Investigations into shared genetic elements common to both have been well served by the draft human and mouse genome sequences. More recently, the virtually complete euchromatic sequences of the two reference genomes have been finished. These reveal a high (approximately 5%) level of sequence duplications that had previously been recalcitrant to sequencing and assembly. Within these duplications lie large numbers of rodent- or primate-specific genes. In the present paper, we review the sequence properties of the two genomes, dwelling most on the duplications, deletions and insertions that separate each of them from their most recent common ancestor, approximately 90 million years ago. We consider the differences in gene numbers and repertoires between the two species, and speculate on their contributions to lineage-specific biology. Losses of ancient single-copy genes are rare, as are gains of new functional genes through retrotransposition. Instead, most changes to the gene repertoire have occurred in large multicopy families. It has been proposed that numbers of such 'environmental genes' rise and fall, and their sequences change, as adaptive responses to infection and other environmental pressures, including conspecific competition. Nevertheless, many such genes may be under little or no selection.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Animals , Gene Duplication , Humans , Mice , Sequence Analysis, DNA
12.
Trends Biochem Sci ; 29(6): 289-92, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15276181

ABSTRACT

Vitamin K epoxide reductase (VKOR) recycles reduced vitamin K, which is used subsequently as a co-factor in the gamma-carboxylation of glutamic acid residues in blood coagulation enzymes. VKORC1, a subunit of the VKOR complex, has recently been shown to possess this activity. Here, we show that VKORC1 is a member of a large family of predicted enzymes that are present in vertebrates, Drosophila, plants, bacteria and archaea. Four cysteine residues and one residue, which is either serine or threonine, are identified as likely active-site residues. In some plant and bacterial homologues the VKORC1 homologous domain is fused with domains of the thioredoxin family of oxidoreductases. These might reduce disulfide bonds of VKORC1-like enzymes as a prerequisite for their catalytic activities.


Subject(s)
Mixed Function Oxygenases/metabolism , Amino Acid Sequence , Animals , Binding Sites , Catalysis , Humans , Molecular Sequence Data , Oxidoreductases/metabolism , Sequence Alignment , Sequence Homology, Nucleic Acid , Thioredoxins/chemistry , Vitamin K/metabolism , Vitamin K Epoxide Reductases
13.
Curr Opin Genet Dev ; 13(6): 623-8, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14638325

ABSTRACT

Comparative analyses of eukaryotic genomes are providing insights into the mode and tempo of domain family evolution. Gene duplication, the source of family expansion, far exceeds the rate of emergence of domains from non-coding sequence, and the rate of recruitment of domains into novel architectures. Domain families that appear to be restricted to certain lineages are likely to be the result of gene duplication, coupled with rapid sequence diversification. If such families are evidence of past adaptation, then their functions must relate to the underlying mechanism of selection: competition among organisms.


Subject(s)
Evolution, Molecular , Gene Duplication , Genome , Animals , Eukaryotic Cells , Humans , Protein Structure, Tertiary , Selection, Genetic
14.
Nat Genet ; 50(11): 1574-1583, 2018 11.
Article in English | MEDLINE | ID: mdl-30275530

ABSTRACT

We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.


Subject(s)
Chromosome Mapping , Genetic Loci , Genome , Haplotypes , Mice, Inbred Strains/genetics , Animals , Animals, Laboratory , Chromosome Mapping/veterinary , Haplotypes/genetics , Mice , Mice, Inbred BALB C/genetics , Mice, Inbred C3H/genetics , Mice, Inbred C57BL/genetics , Mice, Inbred CBA/genetics , Mice, Inbred DBA/genetics , Mice, Inbred NOD/genetics , Mice, Inbred Strains/classification , Molecular Sequence Annotation , Phylogeny , Polymorphism, Single Nucleotide , Species Specificity
15.
PLoS Comput Biol ; 2(9): e133, 2006 Sep 29.
Article in English | MEDLINE | ID: mdl-17009864

ABSTRACT

Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or "in-paralogues," are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms. 
PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.


Subject(s)
Dogs/genetics , Phylogeny , Synteny/genetics , Animals , Chromosomes, Mammalian/genetics , Computational Biology , Female , Genome/genetics , Humans , Male , Pedigree , Transcription, Genetic/genetics
16.
Nucleic Acids Res ; 30(1): 242-4, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752305

ABSTRACT

SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain. We have implemented new analysis methods and updated others. New advanced queries provide direct access to the SMART relational database using SQL. This database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats. SMART output can now be easily included in users' documents. A SMART mirror has been created at http://smart.ox.ac.uk.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Proteins/chemistry , Animals , Genome, Human , Humans , Information Storage and Retrieval , Internet , Protein Sorting Signals , Proteins/genetics , Proteins/physiology , Quality Control , Repetitive Sequences, Amino Acid , Sequence Alignment , Sequence Homology, Amino Acid
17.
Nat Genet ; 48(8): 912-8, 2016 08.
Article in English | MEDLINE | ID: mdl-27376238

ABSTRACT

Two bottlenecks impeding the genetic analysis of complex traits in rodents are access to mapping populations able to deliver gene-level mapping resolution and the need for population-specific genotyping arrays and haplotype reference panels. Here we combine low-coverage (0.15×) sequencing with a new method to impute the ancestral haplotype space in 1,887 commercially available outbred mice. We mapped 156 unique quantitative trait loci for 92 phenotypes at a 5% false discovery rate. Gene-level mapping resolution was achieved at about one-fifth of the loci, implicating Unc13c and Pgc1a at loci for the quality of sleep, Adarb2 for home cage activity, Rtkn2 for intensity of reaction to startle, Bmp2 for wound healing, Il15 and Id2 for several T cell measures and Prkca for bone mineral content. These findings have implications for diverse areas of mammalian biology and demonstrate how genome-wide association studies can be extended via low-coverage sequencing to species with highly recombinant outbred populations.


Subject(s)
Animals, Outbred Strains/genetics , Chromosome Mapping , Genetic Markers/genetics , Genome-Wide Association Study , Haplotypes/genetics , Multifactorial Inheritance/genetics , Quantitative Trait Loci/genetics , Animals , Genotype , Mice , Phenotype , Polymorphism, Single Nucleotide/genetics
18.
BMC Genomics ; 6: 120, 2005 Sep 13.
Article in English | MEDLINE | ID: mdl-16159394

ABSTRACT

BACKGROUND: The physiological and phenotypic differences between human and chimpanzee are largely specified by our genomic differences. We have been particularly interested in recent duplications in the human genome as examples of relatively large-scale changes to our genome. We performed an in-depth evolutionary analysis of a region of chromosome 1, which is copy number polymorphic among humans, and that contains at least 32 PRAME (Preferentially expressed antigen of melanoma) genes and pseudogenes. PRAME-like genes are expressed in the testis and in a large number of tumours, and are thought to possess roles in spermatogenesis and oogenesis. RESULTS: Using nucleotide substitution rate estimates for exons and introns, we show that two large segmental duplications, of six and seven human PRAME genes respectively, occurred in the last 3 million years. These duplicated genes are thus hominin-specific, having arisen in our genome since the divergence from chimpanzee. This cluster of PRAME genes appears to have arisen initially from a translocation approximately 95-85 million years ago. We identified multiple sites within human or mouse PRAME sequences which exhibit strong evidence of positive selection. These form a pronounced cluster on one face of the predicted PRAME protein structure. CONCLUSION: We predict that PRAME genes evolved adaptively due to strong competition between rapidly-dividing cells during spermatogenesis and oogenesis. We suggest that as PRAME gene copy number is polymorphic among individuals, positive selection of PRAME alleles may still prevail within the human population.


Subject(s)
Chromosomes, Human, Pair 1 , Gene Expression Regulation, Neoplastic , Alleles , Animals , Cluster Analysis , Evolution, Molecular , Exons , Gene Duplication , Genome , Genome, Human , Humans , Introns , Male , Melanoma/metabolism , Models, Genetic , Models, Molecular , Multigene Family , Pan troglodytes , Phenotype , Phylogeny , Polymorphism, Genetic , Pseudogenes , Selection, Genetic , Testis/metabolism , Translocation, Genetic
19.
Genome Biol ; 11(7): R72, 2010.
Article in English | MEDLINE | ID: mdl-20624288

ABSTRACT

BACKGROUND: Long considered to be the building block of life, it is now apparent that protein is only one of many functional products generated by the eukaryotic genome. Indeed, more of the human genome is transcribed into noncoding sequence than into protein-coding sequence. Nevertheless, whilst we have developed a deep understanding of the relationships between evolutionary constraint and function for protein-coding sequence, little is known about these relationships for non-coding transcribed sequence. This dearth of information is partially attributable to a lack of established non-protein-coding RNA (ncRNA) orthologs among birds and mammals within sequence and expression databases. RESULTS: Here, we performed a multi-disciplinary study of four highly conserved and brain-expressed transcripts selected from a list of mouse long intergenic noncoding RNA (lncRNA) loci that generally show pronounced evolutionary constraint within their putative promoter regions and across exon-intron boundaries. We identify some of the first lncRNA orthologs present in birds (chicken), marsupial (opossum), and eutherian mammals (mouse), and investigate whether they exhibit conservation of brain expression. In contrast to conventional protein-coding genes, the sequences, transcriptional start sites, exon structures, and lengths for these non-coding genes are all highly variable. CONCLUSIONS: The biological relevance of lncRNAs would be highly questionable if they were limited to closely related phyla. Instead, their preservation across diverse amniotes, their apparent conservation in exon structure, and similarities in their pattern of brain expression during embryonic and early postnatal stages together indicate that these are functional RNA molecules, of which some have roles in vertebrate brain development.


Subject(s)
Brain/metabolism , Conserved Sequence/genetics , RNA, Untranslated/genetics , Vertebrates/genetics , Animals , Brain/embryology , Chickens/genetics , Evolution, Molecular , Gene Expression Regulation, Developmental , Genetic Loci/genetics , Mice , Sequence Homology, Nucleic Acid
20.
Nat Genet ; 40(11): 1285-7, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18836446

ABSTRACT

Using a positional cloning approach supported by comparative genomics, we have identified a previously unreported gene, EYS, at the RP25 locus on chromosome 6q12 commonly mutated in autosomal recessive retinitis pigmentosa. Spanning over 2 Mb, this is the largest eye-specific gene identified so far. EYS is independently disrupted in four other mammalian lineages, including that of rodents, but is well conserved from Drosophila to man and is likely to have a role in the modeling of retinal architecture.


Subject(s)
Drosophila Proteins/chemistry , Drosophila melanogaster/chemistry , Eye Proteins/genetics , Genes, Recessive , Mutation/genetics , Retinitis Pigmentosa/genetics , Sequence Homology, Amino Acid , Animals , Cell Line , Chromosomes, Human, Pair 6/genetics , Eye Proteins/chemistry , Eye Proteins/metabolism , Gene Expression Profiling , Gene Expression Regulation , Humans , Protein Structure, Tertiary , Protein Transport