Search | VHL Regional Portal

1.

Extension of human lncRNA transcripts by RACE coupled with long-read high-throughput sequencing (RACE-Seq).

Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Santoyo-Lopez, Javier; Gonzalez, Jose Manuel; Tapanari, Electra; Mudge, Jonathan M; Steward, Charles A; Wilming, Laurens; Tanzer, Andrea; Howald, Cédric; Chrast, Jacqueline; Vela-Boza, Alicia; Rueda, Antonio; Lopez-Domingo, Francisco J; Dopazo, Joaquin; Reymond, Alexandre; Guigó, Roderic; Harrow, Jennifer.

Nat Commun ; 7: 12339, 2016 Aug 17.

Article in English | MEDLINE | ID: mdl-27531712

ABSTRACT

Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. Here we describe RACE-Seq, an experimental workflow designed to address this based on RACE (rapid amplification of cDNA ends) and long-read RNA sequencing. We apply RACE-Seq to 398 human lncRNA genes in seven tissues, leading to the discovery of 2,556 on-target, novel transcripts. About 60% of the targeted loci are extended in either 5' or 3', often reaching genomic hallmarks of gene boundaries. Analysis of the novel transcripts suggests that lncRNAs are as long, have as many exons and undergo as much alternative splicing as protein-coding genes, contrary to current assumptions. Overall, we show that RACE-Seq is an effective tool to annotate an organism's deep transcriptome, and compares favourably to other targeted sequencing techniques.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , Polymerase Chain Reaction/methods , RNA, Long Noncoding/genetics , Sequence Analysis, RNA/methods , Exons/genetics , Genetic Loci , Humans , Molecular Sequence Annotation , Organ Specificity/genetics , Proof of Concept Study , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA Splice Sites/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcriptome/genetics

2.

Comprehensive comparative homeobox gene annotation in human and mouse.

Wilming, Laurens G; Boychenko, Veronika; Harrow, Jennifer L.

Database (Oxford) ; 2015, 2015.

Article in English | MEDLINE | ID: mdl-26412852

ABSTRACT

Homeobox genes are a group of genes coding for transcription factors with a DNA-binding helix-turn-helix structure called a homeodomain; they play a crucial role in pattern formation during embryogenesis. Many homeobox genes are located in clusters, and some of these, most notably the HOX genes, are known to have antisense or opposite-strand long non-coding RNA (lncRNA) genes that play a regulatory role. Because automated annotation of both gene clusters and non-coding genes is fraught with difficulty (over-prediction, under-prediction, inaccurate transcript structures), we set out to manually annotate all homeobox genes in the mouse and human genomes. This includes all supported splice variants, pseudogenes and both antisense and flanking lncRNAs. One of the areas where manual annotation has a significant advantage is the annotation of duplicated gene clusters. After comprehensive annotation of all homeobox genes and their antisense genes in human and in mouse, we found some discrepancies with the current gene set in RefSeq regarding exact gene structures and coding versus pseudogene locus biotype. We also identified previously un-annotated pseudogenes in the DUX, Rhox and Obox gene clusters, which helped us re-evaluate and update the gene nomenclature in these regions. We found that, compared with their mouse orthologues, human homeobox genes are enriched in antisense lncRNA loci, some of which are known to play a role in gene or gene cluster regulation. Of the annotated set of 241 human protein-coding homeobox genes, 98 have an antisense locus (41%), while of the 277 orthologous mouse genes, only 62 protein-coding genes have an antisense locus (22%), based on publicly available transcriptional evidence.

Subject(s)

Databases, Nucleic Acid , Genome, Human , Homeodomain Proteins/genetics , Molecular Sequence Annotation/methods , Multigene Family , Pseudogenes , Animals , Helix-Turn-Helix Motifs , Humans , Mice , RNA, Long Noncoding/genetics

3.

The Vertebrate Genome Annotation browser 10 years on.

Harrow, Jennifer L; Steward, Charles A; Frankish, Adam; Gilbert, James G; Gonzalez, Jose M; Loveland, Jane E; Mudge, Jonathan; Sheppard, Dan; Thomas, Mark; Trevanion, Stephen; Wilming, Laurens G.

Nucleic Acids Res ; 42(Database issue): D771-9, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24316575

ABSTRACT

The Vertebrate Genome Annotation (VEGA) database (http://vega.sanger.ac.uk), initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).

Subject(s)

Databases, Genetic , Genome , Molecular Sequence Annotation , Animals , Genome, Human , Genomics , Humans , Internet , Mice , Mice, Inbred NOD , Mice, Knockout , Rats , Swine/genetics , Zebrafish/genetics

4.

Current status and new features of the Consensus Coding Sequence database.

Farrell, Catherine M; O'Leary, Nuala A; Harte, Rachel A; Loveland, Jane E; Wilming, Laurens G; Wallin, Craig; Diekhans, Mark; Barrell, Daniel; Searle, Stephen M J; Aken, Bronwen; Hiatt, Susan M; Frankish, Adam; Suner, Marie-Marthe; Rajput, Bhanu; Steward, Charles A; Brown, Garth R; Bennett, Ruth; Murphy, Michael; Wu, Wendy; Kay, Mike P; Hart, Jennifer; Rajan, Jeena; Weber, Janet; Snow, Catherine; Riddick, Lillian D; Hunt, Toby; Webb, David; Thomas, Mark; Tamez, Pamela; Rangwala, Sanjida H; McGarvey, Kelly M; Pujar, Shashikant; Shkeda, Andrei; Mudge, Jonathan M; Gonzalez, Jose M; Gilbert, James G R; Trevanion, Stephen J; Baertsch, Robert; Harrow, Jennifer L; Hubbard, Tim; Ostell, James M; Haussler, David; Pruitt, Kim D.

Nucleic Acids Res ; 42(Database issue): D865-72, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24217909

ABSTRACT

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which supporting evidence is incomplete. We also present a summary of recent and future curation targets.

Subject(s)

Databases, Genetic , Proteins/genetics , Animals , Exons , Genomics , Humans , Internet , Mice , Molecular Sequence Annotation , Sequence Analysis

5.

Structural and functional annotation of the porcine immunome.

Dawson, Harry D; Loveland, Jane E; Pascal, Géraldine; Gilbert, James G R; Uenishi, Hirohide; Mann, Katherine M; Sang, Yongming; Zhang, Jie; Carvalho-Silva, Denise; Hunt, Toby; Hardy, Matthew; Hu, Zhiliang; Zhao, Shu-Hong; Anselmo, Anna; Shinkai, Hiroki; Chen, Celine; Badaoui, Bouabid; Berman, Daniel; Amid, Clara; Kay, Mike; Lloyd, David; Snow, Catherine; Morozumi, Takeya; Cheng, Ryan Pei-Yen; Bystrom, Megan; Kapetanovic, Ronan; Schwartz, John C; Kataria, Ranjit; Astley, Matthew; Fritz, Eric; Steward, Charles; Thomas, Mark; Wilming, Laurens; Toki, Daisuke; Archibald, Alan L; Bed'Hom, Bertrand; Beraldi, Dario; Huang, Ting-Hua; Ait-Ali, Tahar; Blecha, Frank; Botti, Sara; Freeman, Tom C; Giuffra, Elisabetta; Hume, David A; Lunney, Joan K; Murtaugh, Michael P; Reecy, James M; Harrow, Jennifer L; Rogel-Gaillard, Claire; Tuggle, Christopher K.

BMC Genomics ; 14: 332, 2013 May 15.

Article in English | MEDLINE | ID: mdl-23676093

ABSTRACT

BACKGROUND: The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and to compare and contrast pig and human immune systems. RESULTS: The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited accelerated evolution, compared with 4.1% across the entire genome. CONCLUSIONS: This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig's adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response.

Subject(s)

Genomics , Immunity/genetics , Molecular Sequence Annotation , Swine/genetics , Swine/immunology , Animals , Cattle , Evolution, Molecular , Gene Duplication , Humans , Immunoglobulins/genetics , Mice , Models, Molecular , Protein Conformation , Receptors, Antigen, T-Cell/genetics , Receptors, KIR/genetics , Selection, Genetic , Species Specificity

6.

Sequencing and comparative analysis of the gorilla MHC genomic sequence.

Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L.

Database (Oxford) ; 2013: bat011, 2013.

Article in English | MEDLINE | ID: mdl-23589541

ABSTRACT

Major histocompatibility complex (MHC) genes play a critical role in the vertebrate immune response and, because the MHC is linked to a significant number of autoimmune and other diseases, it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structurally and at the sequence level, it is not readily amenable to study in whole-genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest, and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence, we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

Subject(s)

Genome/genetics , Gorilla gorilla/genetics , Gorilla gorilla/immunology , Major Histocompatibility Complex/genetics , Sequence Analysis, DNA , Animals , Base Sequence , Chromosome Mapping , Humans , Multigene Family/genetics , Pan troglodytes/genetics

7.

Sequencing and characterization of the FVB/NJ mouse genome.

Wong, Kim; Bumpstead, Suzannah; Van Der Weyden, Louise; Reinholdt, Laura G; Wilming, Laurens G; Adams, David J; Keane, Thomas M.

Genome Biol ; 13(8): R72, 2012 Aug 23.

Article in English | MEDLINE | ID: mdl-22916792

ABSTRACT

BACKGROUND: The FVB/NJ mouse strain has its origins in a colony of outbred Swiss mice established in 1935 at the National Institutes of Health. Mice derived from this source were selectively bred for sensitivity to histamine diphosphate and the B strain of Friend leukemia virus. This led to the establishment of the FVB/N inbred strain, which was subsequently imported to the Jackson Laboratory and designated FVB/NJ. The FVB/NJ mouse has several distinct characteristics, such as large pronuclear morphology, vigorous reproductive performance, and consistently large litters, that make it highly desirable for transgenic strain production and general purpose use. RESULTS: Using next-generation sequencing technology, we have sequenced the genome of FVB/NJ to approximately 50-fold coverage, and have generated a comprehensive catalog of single nucleotide polymorphisms, small insertion/deletion polymorphisms, and structural variants, relative to the reference C57BL/6J genome. We have examined a previously identified quantitative trait locus for atherosclerosis susceptibility on chromosome 10 and identified several previously unknown candidate causal variants. CONCLUSION: The sequencing of the FVB/NJ genome and generation of this catalog has increased the number of known variant sites in FVB/NJ by a factor of four, and will help accelerate the identification of the precise molecular variants that are responsible for phenotypes observed in this widely used strain.

Subject(s)

Genome , Mice, Inbred C57BL/genetics , Mice, Inbred Strains/genetics , Sequence Analysis, DNA/methods , Animals , Female , Mice , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Sequence Alignment

8.

Tracking and coordinating an international curation effort for the CCDS Project.

Harte, Rachel A; Farrell, Catherine M; Loveland, Jane E; Suner, Marie-Marthe; Wilming, Laurens; Aken, Bronwen; Barrell, Daniel; Frankish, Adam; Wallin, Craig; Searle, Steve; Diekhans, Mark; Harrow, Jennifer; Pruitt, Kim D.

Database (Oxford) ; 2012: bas008, 2012.

Article in English | MEDLINE | ID: mdl-22434842

ABSTRACT

The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a 'gold standard' definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines. DATABASE URL: http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi.

Subject(s)

Consensus Sequence , Database Management Systems , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , Animals , Humans , Mice

9.

Meeting report: a workshop on Best Practices in Genome Annotation.

Madupu, Ramana; Brinkac, Lauren M; Harrow, Jennifer; Wilming, Laurens G; Böhme, Ulrike; Lamesch, Philippe; Hannick, Linda I.

Database (Oxford) ; 2010: baq001, 2010.

Article in English | MEDLINE | ID: mdl-20428316

ABSTRACT

Efforts to annotate the genomes of a wide variety of model organisms are currently carried out by sequencing centers, model organism databases and academic/institutional laboratories around the world. Different annotation methods and tools have been developed over time to meet the needs of biologists faced with the task of annotating biological data. While standardized methods are essential for consistent curation within each annotation group, methods and tools can differ between groups, especially when the groups are curating different organisms. Biocurators from several institutes met at the Third International Biocuration Conference in Berlin, Germany, April 2009 and hosted the 'Best Practices in Genome Annotation: Inference from Evidence' workshop to share their strategies, pipelines, standards and tools. This article documents the material presented in the workshop.

10.

Discovery of candidate disease genes in ENU-induced mouse mutants by large-scale sequencing, including a splice-site mutation in nucleoredoxin.

Boles, Melissa K; Wilkinson, Bonney M; Wilming, Laurens G; Liu, Bin; Probst, Frank J; Harrow, Jennifer; Grafham, Darren; Hentges, Kathryn E; Woodward, Lanette P; Maxwell, Andrea; Mitchell, Karen; Risley, Michael D; Johnson, Randy; Hirschi, Karen; Lupski, James R; Funato, Yosuke; Miki, Hiroaki; Marin-Garcia, Pablo; Matthews, Lucy; Coffey, Alison J; Parker, Anne; Hubbard, Tim J; Rogers, Jane; Bradley, Allan; Adams, David J; Justice, Monica J.

PLoS Genet ; 5(12): e1000759, 2009 Dec.

Article in English | MEDLINE | ID: mdl-20011118

ABSTRACT

An accurate and precisely annotated genome assembly is a fundamental requirement for functional genomic analysis. Here, the complete DNA sequence and gene annotation of mouse Chromosome 11 was used to test the efficacy of large-scale sequencing for mutation identification. We re-sequenced the 14,000 annotated exons and boundaries from over 900 genes in 41 recessive mutant mouse lines that were isolated in an N-ethyl-N-nitrosourea (ENU) mutation screen targeted to mouse Chromosome 11. Fifty-nine sequence variants were identified in 55 genes from 31 mutant lines. 39% of the lesions lie in coding sequences and create primarily missense mutations. The other 61% lie in noncoding regions, many of them in highly conserved sequences. A lesion in the perinatal lethal line l11Jus13 alters a consensus splice site of nucleoredoxin (Nxn), inserting 10 amino acids into the resulting protein. We conclude that point mutations can be accurately and sensitively recovered by large-scale sequencing, and that conserved noncoding regions should be included for disease mutation identification. Only seven of the candidate genes we report have been previously targeted by mutation in mice or rats, showing that despite ongoing efforts to functionally annotate genes in the mammalian genome, an enormous gap remains between phenotype and function. Our data show that the classical positional mapping approach of disease mutation identification can be extended to large target regions using high-throughput sequencing.

Subject(s)

Ethylnitrosourea/pharmacology , Gene Expression Profiling , Mutation , Nuclear Proteins/genetics , Oxidoreductases/genetics , Animals , Chromosome Mapping , Exons , Genes, Lethal , Mice , Mice, Mutant Strains

11.

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.

Pruitt, Kim D; Harrow, Jennifer; Harte, Rachel A; Wallin, Craig; Diekhans, Mark; Maglott, Donna R; Searle, Steve; Farrell, Catherine M; Loveland, Jane E; Ruef, Barbara J; Hart, Elizabeth; Suner, Marie-Marthe; Landrum, Melissa J; Aken, Bronwen; Ayling, Sarah; Baertsch, Robert; Fernandez-Banet, Julio; Cherry, Joshua L; Curwen, Val; Dicuccio, Michael; Kellis, Manolis; Lee, Jennifer; Lin, Michael F; Schuster, Michael; Shkeda, Andrew; Amid, Clara; Brown, Garth; Dukhanina, Oksana; Frankish, Adam; Hart, Jennifer; Maidak, Bonnie L; Mudge, Jonathan; Murphy, Michael R; Murphy, Terence; Rajan, Jeena; Rajput, Bhanu; Riddick, Lillian D; Snow, Catherine; Steward, Charles; Webb, David; Weber, Janet A; Wilming, Laurens; Wu, Wenyu; Birney, Ewan; Haussler, David; Hubbard, Tim; Ostell, James; Durbin, Richard; Lipman, David.

Genome Res ; 19(7): 1316-23, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19498102

ABSTRACT

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, they use different methods that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates the manual review of protein annotations that are inconsistent between sites, as well as of annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically annotated, protein-coding regions.

Subject(s)

Consensus Sequence , Genome , Open Reading Frames/genetics , Animals , Humans , Mice , Sequence Alignment

12.

Dynamic instability of the major urinary protein gene family revealed by genomic and phenotypic comparisons between C57 and 129 strain mice.

Mudge, Jonathan M; Armstrong, Stuart D; McLaren, Karen; Beynon, Robert J; Hurst, Jane L; Nicholson, Christine; Robertson, Duncan H; Wilming, Laurens G; Harrow, Jennifer L.

Genome Biol ; 9(5): R91, 2008.

Article in English | MEDLINE | ID: mdl-18507838

ABSTRACT

BACKGROUND: The major urinary proteins (MUPs) of Mus musculus domesticus are deposited in urine in large quantities, where they bind and release pheromones and also provide an individual 'recognition signal' via their phenotypic polymorphism. Whilst important information about MUP functionality has been gained in recent years, the gene cluster is poorly studied in terms of structure, genic polymorphism and evolution. RESULTS: We combine targeted sequencing, manual genome annotation and phylogenetic analysis to compare the Mup clusters of C57BL/6J and 129 strains of mice. We describe organizational heterogeneity within both clusters: a central array of cassettes containing Mup genes highly similar at the protein level, flanked by regions containing Mup genes displaying significantly elevated divergence. Observed genomic rearrangements in all regions have likely been mediated by endogenous retroviral elements. Mup loci with coding sequences that differ between the strains are identified--including a gene/pseudogene pair--suggesting that these inbred lineages exhibit variation that exists in wild populations. We have characterized the distinct MUP profiles in the urine of both strains by mass spectrometry. The total MUP phenotype data is reconciled with our genomic sequence data, matching all proteins identified in urine to annotated genes. CONCLUSION: Our observations indicate that the MUP phenotypic polymorphism observed in wild populations results from a combination of Mup gene turnover coupled with currently unidentified mechanisms regulating gene expression patterns. We propose that the structural heterogeneity described within the cluster reflects functional divergence within the Mup gene family.

Subject(s)

Mice/genetics , Proteins/genetics , Animals , Evolution, Molecular , Female , Male , Mass Spectrometry , Mice, Inbred C57BL , Mice, Inbred Strains , Molecular Weight , Proteins/chemistry , Species Specificity

13.

Variation analysis and gene annotation of eight MHC haplotypes: the MHC Haplotype Project.

Horton, Roger; Gibson, Richard; Coggill, Penny; Miretti, Marcos; Allcock, Richard J; Almeida, Jeff; Forbes, Simon; Gilbert, James G R; Halls, Karen; Harrow, Jennifer L; Hart, Elizabeth; Howe, Kevin; Jackson, David K; Palmer, Sophie; Roberts, Anne N; Sims, Sarah; Stewart, C Andrew; Traherne, James A; Trevanion, Steve; Wilming, Laurens; Rogers, Jane; de Jong, Pieter J; Elliott, John F; Sawcer, Stephen; Todd, John A; Trowsdale, John; Beck, Stephan.

Immunogenetics ; 60(1): 1-18, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18193213

ABSTRACT

The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions, of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line), designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.

Subject(s)

Databases, Genetic , Genetic Variation/immunology , HLA Antigens/genetics , Haplotypes/genetics , Terminology as Topic , Computational Biology/methods , Computational Biology/trends , Genome, Human , Humans

14.

Pseudo-messenger RNA: phantoms of the transcriptome.

Frith, Martin C; Wilming, Laurens G; Forrest, Alistair; Kawaji, Hideya; Tan, Sin Lam; Wahlestedt, Claes; Bajic, Vladimir B; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Bailey, Timothy L; Huminiecki, Lukasz.

PLoS Genet ; 2(4): e23, 2006 Apr.

Article in English | MEDLINE | ID: mdl-16683022

ABSTRACT

The mammalian transcriptome harbours shadowy entities that resist classification and analysis. In analogy with pseudogenes, we define pseudo-messenger RNA to be RNA molecules that resemble protein-coding mRNA, but cannot encode full-length proteins owing to disruptions of the reading frame. Using a rigorous computational pipeline, which rules out sequencing errors, we identify 10,679 pseudo-messenger RNAs (approximately half of which are transposon-associated) among the 102,801 FANTOM3 mouse cDNAs: just over 10% of the FANTOM3 transcriptome. These comprise not only transcribed pseudogenes, but also disrupted splice variants of otherwise protein-coding genes. Some may encode truncated proteins, only a minority of which appear subject to nonsense-mediated decay. The presence of an excess of transcripts whose only disruptions are opal stop codons suggests that there are more selenoproteins than currently estimated. We also describe compensatory frameshifts, where a segment of the gene has changed frame but remains translatable. In summary, we survey a large class of non-standard but potentially functional transcripts that are likely to encode genetic information and effect biological processes in novel ways. Many of these transcripts do not correspond cleanly to any identifiable object in the genome, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level.

Subject(s)

RNA, Messenger/genetics , Transcription, Genetic , Animals , DNA Transposable Elements , Evolution, Molecular , Humans , Mice , Promoter Regions, Genetic , Proteins/genetics , Pseudogenes , Reproducibility of Results , Sequence Alignment

15.

DNA sequence of human chromosome 17 and analysis of rearrangement in the human lineage.

Zody, Michael C; Garber, Manuel; Adams, David J; Sharpe, Ted; Harrow, Jennifer; Lupski, James R; Nicholson, Christine; Searle, Steven M; Wilming, Laurens; Young, Sarah K; Abouelleil, Amr; Allen, Nicole R; Bi, Weimin; Bloom, Toby; Borowsky, Mark L; Bugalter, Boris E; Butler, Jonathan; Chang, Jean L; Chen, Chao-Kung; Cook, April; Corum, Benjamin; Cuomo, Christina A; de Jong, Pieter J; DeCaprio, David; Dewar, Ken; FitzGerald, Michael; Gilbert, James; Gibson, Richard; Gnerre, Sante; Goldstein, Steven; Grafham, Darren V; Grocock, Russell; Hafez, Nabil; Hagopian, Daniel S; Hart, Elizabeth; Norman, Catherine Hosage; Humphray, Sean; Jaffe, David B; Jones, Matt; Kamal, Michael; Khodiyar, Varsha K; LaButti, Kurt; Laird, Gavin; Lehoczky, Jessica; Liu, Xiaohong; Lokyitsang, Tashi; Loveland, Jane; Lui, Annie; Macdonald, Pendexter; Major, John E.

Nature ; 440(7087): 1045-9, 2006 Apr 20.

Article in English | MEDLINE | ID: mdl-16625196

ABSTRACT

Chromosome 17 is unusual among the human chromosomes in many respects. It is the largest human autosome with orthology to only a single mouse chromosome, mapping entirely to the distal half of mouse chromosome 11. Chromosome 17 is rich in protein-coding genes, having the second highest gene density in the genome. It is also enriched in segmental duplications, ranking third in density among the autosomes. Here we report a finished sequence for human chromosome 17, as well as a structural comparison with the finished sequence for mouse chromosome 11, the first finished mouse chromosome. Comparison of the orthologous regions reveals striking differences. In contrast to the typical pattern seen in mammalian evolution, the human sequence has undergone extensive intrachromosomal rearrangement, whereas the mouse sequence has been remarkably stable. Moreover, although the human sequence has a high density of segmental duplication, the mouse sequence has a very low density. Notably, these segmental duplications correspond closely to the sites of structural rearrangement, demonstrating a link between duplication and rearrangement. Examination of the main classes of duplicated segments provides insight into the dynamics underlying expansion of chromosome-specific, low-copy repeats in the human genome.

Subject(s)

Chromosomes, Human, Pair 17/genetics , Evolution, Molecular , Animals , Base Composition , Gene Duplication , Humans , Long Interspersed Nucleotide Elements/genetics , Mice , Sequence Analysis, DNA , Short Interspersed Nucleotide Elements/genetics , Synteny/genetics

16.

Genomic anatomy of the Tyrp1 (brown) deletion complex.

Smyth, Ian M; Wilming, Laurens; Lee, Angela W; Taylor, Martin S; Gautier, Phillipe; Barlow, Karen; Wallis, Justine; Martin, Sancha; Glithero, Rebecca; Phillimore, Ben; Pelan, Sarah; Andrew, Rob; Holt, Karen; Taylor, Ruth; McLaren, Stuart; Burton, John; Bailey, Jonathon; Sims, Sarah; Squares, Jan; Plumb, Bob; Joy, Ann; Gibson, Richard; Gilbert, James; Hart, Elizabeth; Laird, Gavin; Loveland, Jane; Mudge, Jonathan; Steward, Charlie; Swarbreck, David; Harrow, Jennifer; North, Philip; Leaves, Nicholas; Greystrong, John; Coppola, Maria; Manjunath, Shilpa; Campbell, Mark; Smith, Mark; Strachan, Gregory; Tofts, Calli; Boal, Esther; Cobley, Victoria; Hunter, Giselle; Kimberley, Christopher; Thomas, Daniel; Cave-Berry, Lee; Weston, Paul; Botcherby, Marc R M; White, Sharon; Edgar, Ruth; Cross, Sally H.

Proc Natl Acad Sci U S A ; 103(10): 3704-9, 2006 Mar 07.

Article in English | MEDLINE | ID: mdl-16505357

ABSTRACT

Chromosome deletions in the mouse have proven invaluable in the dissection of gene function. The brown deletion complex comprises >28 independent genome rearrangements, which have been used to identify several functional loci on chromosome 4 required for normal embryonic and postnatal development. We have constructed a 172-bacterial artificial chromosome contig that spans this 22-megabase (Mb) interval and have produced a contiguous, finished, and manually annotated sequence from these clones. The deletion complex is strikingly gene-poor, containing only 52 protein-coding genes (of which only 39 are supported by human homologues) and has several further notable genomic features, including several segments of >1 Mb, apparently devoid of a coding sequence. We have used sequence polymorphisms to finely map the deletion breakpoints and identify strong candidate genes for the known phenotypes that map to this region, including three lethal loci (l4Rn1, l4Rn2, and l4Rn3) and the fitness mutant brown-associated fitness (baf). We have also characterized misexpression of the basonuclin homologue, Bnc2, associated with the inversion-mediated coat color mutant white-based brown (B(w)). This study provides a molecular insight into the basis of several characterized mouse mutants, which will allow further dissection of this region by targeted or chemical mutagenesis.

Subject(s)

Chromosome Deletion , Membrane Glycoproteins/genetics , Oxidoreductases/genetics , Animals , Base Sequence , Biological Evolution , Chromosome Mapping , Chromosomes, Artificial, Bacterial/genetics , Female , Fetal Death/genetics , Genes, Lethal , Hair Color/genetics , Mice , Mice, Inbred C57BL , Mice, Mutant Strains , Phenotype , Polymorphism, Single Nucleotide , Pregnancy

17.

Genetic analysis of completely sequenced disease-associated MHC haplotypes identifies shuffling of segments in recent human history.

Traherne, James A; Horton, Roger; Roberts, Anne N; Miretti, Marcos M; Hurles, Matthew E; Stewart, C Andrew; Ashurst, Jennifer L; Atrazhev, Alexey M; Coggill, Penny; Palmer, Sophie; Almeida, Jeff; Sims, Sarah; Wilming, Laurens G; Rogers, Jane; de Jong, Pieter J; Carrington, Mary; Elliott, John F; Sawcer, Stephen; Todd, John A; Trowsdale, John; Beck, Stephan.

PLoS Genet ; 2(1): e9, 2006 Jan.

Article in English | MEDLINE | ID: mdl-16440057

ABSTRACT

The major histocompatibility complex (MHC) is recognised as one of the most important genetic regions in relation to common human disease. Advancement in identification of MHC genes that confer susceptibility to disease requires greater knowledge of sequence variation across the complex. Highly duplicated and polymorphic regions of the human genome such as the MHC are, however, somewhat refractory to some whole-genome analysis methods. To address this issue, we are employing a bacterial artificial chromosome (BAC) cloning strategy to sequence entire MHC haplotypes from consanguineous cell lines as part of the MHC Haplotype Project. Here we present 4.25 Mb of the human haplotype QBL (HLA-A26-B18-Cw5-DR3-DQ2) and compare it with the MHC reference haplotype and with a second haplotype, COX (HLA-A1-B8-Cw7-DR3-DQ2), that shares the same HLA-DRB1, -DQA1, and -DQB1 alleles. We have defined the complete gene, splice variant, and sequence variation contents of all three haplotypes, comprising over 259 annotated loci and over 20,000 single nucleotide polymorphisms (SNPs). Certain coding sequences vary significantly between different haplotypes, making them candidates for functional and disease-association studies. Analysis of the two DR3 haplotypes allowed delineation of the shared sequence between two HLA class II-related haplotypes differing in disease associations and the identification of at least one of the sites that mediated the original recombination event. The levels of variation across the MHC were similar to those seen for other HLA-disparate haplotypes, except for a 158-kb segment that contained the HLA-DRB1, -DQA1, and -DQB1 genes and showed very limited polymorphism compatible with identity-by-descent and relatively recent common ancestry (<3,400 generations). These results indicate that the differential disease associations of these two DR3 haplotypes are due to sequence variation outside this central 158-kb segment, and that shuffling of ancestral blocks via recombination is a potential mechanism whereby certain DR-DQ allelic combinations, which presumably have favoured immunological functions, can spread across haplotypes and populations.

Subject(s)

Evolution, Molecular , Haplotypes/genetics , Major Histocompatibility Complex , Chromosome Mapping , Chromosomes, Artificial, Bacterial , Cloning, Molecular , Genetic Variation , HLA-DR Antigens/genetics , Humans , Polymorphism, Genetic , Polymorphism, Single Nucleotide , Recombination, Genetic , Sequence Analysis, DNA

18.

Gene map of the extended human MHC.

Horton, Roger; Wilming, Laurens; Rand, Vikki; Lovering, Ruth C; Bruford, Elspeth A; Khodiyar, Varsha K; Lush, Michael J; Povey, Sue; Talbot, C Conover; Wright, Mathew W; Wain, Hester M; Trowsdale, John; Ziegler, Andreas; Beck, Stephan.

Nat Rev Genet ; 5(12): 889-99, 2004 Dec.

Article in English | MEDLINE | ID: mdl-15573121

ABSTRACT

The major histocompatibility complex (MHC) is the most important region in the vertebrate genome with respect to infection and autoimmunity, and is crucial in adaptive and innate immunity. Decades of biomedical research have revealed many MHC genes that are duplicated, polymorphic and associated with more diseases than any other region of the human genome. The recent completion of several large-scale studies offers the opportunity to assimilate the latest data into an integrated gene map of the extended human MHC. Here, we present this map and review its content in relation to paralogy, polymorphism, immune function and disease.

Subject(s)

Genome, Human , Major Histocompatibility Complex , Autoimmune Diseases/genetics , Chromosome Mapping , Chromosomes, Human, Pair 6 , Humans , Immunity , Multigene Family , Polymorphism, Genetic , RNA, Transfer/genetics

19.

Organization and evolution of a gene-rich region of the mouse genome: a 12.7-Mb region deleted in the Del(13)Svea36H mouse.

Mallon, Ann-Marie; Wilming, Laurens; Weekes, Joseph; Gilbert, James G R; Ashurst, Jennifer; Peyrefitte, Sandrine; Matthews, Lucy; Cadman, Matthew; McKeone, Richard; Sellick, Chris A; Arkell, Ruth; Botcherby, Marc R M; Strivens, Mark A; Campbell, R Duncan; Gregory, Simon; Denny, Paul; Hancock, John M; Rogers, Jane; Brown, Steve D M.

Genome Res ; 14(10A): 1888-901, 2004 Oct.

Article in English | MEDLINE | ID: mdl-15364904

ABSTRACT

Del(13)Svea36H (Del36H) is a deletion of approximately 20% of mouse chromosome 13 showing conserved synteny with human chromosome 6p22.1-6p22.3/6p25. The human region is lost in some deletion syndromes and is the site of several disease loci. Heterozygous Del36H mice show numerous phenotypes and may model aspects of human genetic disease. We describe 12.7 Mb of finished, annotated sequence from Del36H. Del36H has a higher gene density than the draft mouse genome, reflecting high local densities of three gene families (vomeronasal receptors, serpins, and prolactins) which are greatly expanded relative to human. Transposable elements are concentrated near these gene families. We therefore suggest that their neighborhoods are gene factories, regions of frequent recombination in which gene duplication is more frequent. The gene families show different proportions of pseudogenes, likely reflecting different strengths of purifying selection and/or gene conversion. They are also associated with relatively low simple sequence concentrations, which vary across the region with a periodicity of approximately 5 Mb. Del36H contains numerous evolutionarily conserved regions (ECRs). Many lie in noncoding regions, are detectable in species as distant as Ciona intestinalis, and therefore are candidate regulatory sequences. This analysis will facilitate functional genomic analysis of Del36H and provides insights into mouse genome evolution.

Subject(s)

Evolution, Molecular , Genome , Sequence Deletion , Animals , Mice , Multigene Family

20.

Complete MHC haplotype sequencing for common disease gene mapping.

Stewart, C Andrew; Horton, Roger; Allcock, Richard J N; Ashurst, Jennifer L; Atrazhev, Alexey M; Coggill, Penny; Dunham, Ian; Forbes, Simon; Halls, Karen; Howson, Joanna M M; Humphray, Sean J; Hunt, Sarah; Mungall, Andrew J; Osoegawa, Kazutoyo; Palmer, Sophie; Roberts, Anne N; Rogers, Jane; Sims, Sarah; Wang, Yu; Wilming, Laurens G; Elliott, John F; de Jong, Pieter J; Sawcer, Stephen; Todd, John A; Trowsdale, John; Beck, Stephan.

Genome Res ; 14(6): 1176-87, 2004 Jun.

Article in English | MEDLINE | ID: mdl-15140828

ABSTRACT

The future systematic mapping of variants that confer susceptibility to common diseases requires the construction of a fully informative polymorphism map. Ideally, every base pair of the genome would be sequenced in many individuals. Here, we report 4.75 Mb of contiguous sequence for each of two common haplotypes of the major histocompatibility complex (MHC), to which susceptibility to >100 diseases has been mapped. The autoimmune disease-associated haplotypes HLA-A3-B7-Cw7-DR15 and HLA-A1-B8-Cw7-DR3 were sequenced in their entirety through a bacterial artificial chromosome (BAC) cloning strategy using the consanguineous cell lines PGF and COX, respectively. The two sequences were annotated to encompass all described splice variants of expressed genes. We defined the complete variation content of the two haplotypes, revealing >18,000 variations between them. Average SNP densities ranged from less than one SNP per kilobase to >60 SNPs per kilobase. Acquisition of complete and accurate sequence data over polymorphic regions such as the MHC from large-insert cloned DNA provides a definitive resource for the construction of informative genetic maps, and avoids the limitation of chromosome regions that are refractory to PCR amplification.

Subject(s)

Autoimmune Diseases/genetics , Chromosome Mapping/methods , Genetic Predisposition to Disease/genetics , Haplotypes/genetics , Major Histocompatibility Complex/genetics , Cell Line , Chromosome Mapping/statistics & numerical data , Chromosomes, Artificial, Bacterial/genetics , Consanguinity , Genes/genetics , Genetic Variation , Genome, Human , HLA-A1 Antigen/genetics , HLA-A3 Antigen/genetics , HLA-B8 Antigen/genetics , HLA-C Antigens/genetics , HLA-DR3 Antigen/genetics , Humans , Linkage Disequilibrium/genetics , Polymorphism, Genetic/genetics , White People/genetics
