Search | Virtual Health Library

A joint NCBI and EMBL-EBI transcript set for clinical genomics and research.

Morales, Joannella; Pujar, Shashikant; Loveland, Jane E; Astashyn, Alex; Bennett, Ruth; Berry, Andrew; Cox, Eric; Davidson, Claire; Ermolaeva, Olga; Farrell, Catherine M; Fatima, Reham; Gil, Laurent; Goldfarb, Tamara; Gonzalez, Jose M; Haddad, Diana; Hardy, Matthew; Hunt, Toby; Jackson, John; Joardar, Vinita S; Kay, Michael; Kodali, Vamsi K; McGarvey, Kelly M; McMahon, Aoife; Mudge, Jonathan M; Murphy, Daniel N; Murphy, Michael R; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Thibaud-Nissen, Françoise; Threadgold, Glen; Vatsan, Anjana R; Wallin, Craig; Webb, David; Flicek, Paul; Birney, Ewan; Pruitt, Kim D; Frankish, Adam; Cunningham, Fiona; Murphy, Terence D.

Nature ; 604(7905): 310-315, 2022 04.

Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.

Subject(s)

Computational Biology , Databases, Genetic , Genomics , Genome , Humans , Information Dissemination , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States

Ensembl 2019.

Cunningham, Fiona; Achuthan, Premanand; Akanni, Wasiu; Allen, James; Amode, M Ridwan; Armean, Irina M; Bennett, Ruth; Bhai, Jyothish; Billis, Konstantinos; Boddu, Sanjay; Cummins, Carla; Davidson, Claire; Dodiya, Kamalkumar Jayantilal; Gall, Astrid; Girón, Carlos García; Gil, Laurent; Grego, Tiago; Haggerty, Leanne; Haskell, Erin; Hourlier, Thibaut; Izuogu, Osagie G; Janacek, Sophie H; Juettemann, Thomas; Kay, Mike; Laird, Matthew R; Lavidas, Ilias; Liu, Zhicheng; Loveland, Jane E; Marugán, José C; Maurel, Thomas; McMahon, Aoife C; Moore, Benjamin; Morales, Joannella; Mudge, Jonathan M; Nuhn, Michael; Ogeh, Denye; Parker, Anne; Parton, Andrew; Patricio, Mateus; Abdul Salam, Ahamed Imran; Schmitt, Bianca M; Schuilenburg, Helen; Sheppard, Dan; Sparrow, Helen; Stapleton, Eloise; Szuba, Marek; Taylor, Kieron; Threadgold, Glen; Thormann, Anja; Vullo, Alessandro.

Nucleic Acids Res ; 47(D1): D745-D751, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30407521

ABSTRACT

The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse cell types and developmental stages. Our data sets and tools are available via the Ensembl website as well as through a RESTful web service, Perl application programming interface and as data files for download.

Subject(s)

Databases, Genetic , Genome/genetics , Genomics , Vertebrates/genetics , Animals , Computational Biology/trends , Humans , Mice , Molecular Sequence Annotation , Software

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

Schneider, Valerie A; Graves-Lindsay, Tina; Howe, Kerstin; Bouk, Nathan; Chen, Hsiu-Chuan; Kitts, Paul A; Murphy, Terence D; Pruitt, Kim D; Thibaud-Nissen, Françoise; Albracht, Derek; Fulton, Robert S; Kremitzki, Milinn; Magrini, Vincent; Markovic, Chris; McGrath, Sean; Steinberg, Karyn Meltz; Auger, Kate; Chow, William; Collins, Joanna; Harden, Glenn; Hubbard, Timothy; Pelan, Sarah; Simpson, Jared T; Threadgold, Glen; Torrance, James; Wood, Jonathan M; Clarke, Laura; Koren, Sergey; Boitano, Matthew; Peluso, Paul; Li, Heng; Chin, Chen-Shan; Phillippy, Adam M; Durbin, Richard; Wilson, Richard K; Flicek, Paul; Eichler, Evan E; Church, Deanna M.

Genome Res ; 27(5): 849-864, 2017 05.

Article in English | MEDLINE | ID: mdl-28396521

ABSTRACT

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

Subject(s)

Contig Mapping/methods , Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods , Software , Contig Mapping/standards , Genomics/standards , Haploidy , Haplotypes , Humans , Polymorphism, Genetic , Reference Standards , Sequence Analysis, DNA/standards

The pig X and Y Chromosomes: structure, sequence, and evolution.

Skinner, Benjamin M; Sargent, Carole A; Churcher, Carol; Hunt, Toby; Herrero, Javier; Loveland, Jane E; Dunn, Matt; Louzada, Sandra; Fu, Beiyuan; Chow, William; Gilbert, James; Austin-Guest, Siobhan; Beal, Kathryn; Carvalho-Silva, Denise; Cheng, William; Gordon, Daria; Grafham, Darren; Hardy, Matt; Harley, Jo; Hauser, Heidi; Howden, Philip; Howe, Kerstin; Lachani, Kim; Ellis, Peter J I; Kelly, Daniel; Kerry, Giselle; Kerwin, James; Ng, Bee Ling; Threadgold, Glen; Wileman, Thomas; Wood, Jonathan M D; Yang, Fengtang; Harrow, Jen; Affara, Nabeel A; Tyler-Smith, Chris.

Genome Res ; 26(1): 130-9, 2016 Jan.

Article in English | MEDLINE | ID: mdl-26560630

ABSTRACT

We have generated an improved assembly and gene annotation of the pig X Chromosome, and a first draft assembly of the pig Y Chromosome, by sequencing BAC and fosmid clones from Duroc animals and incorporating information from optical mapping and fiber-FISH. The X Chromosome carries 1033 annotated genes, 690 of which are protein coding. Gene order closely matches that found in primates (including humans) and carnivores (including cats and dogs), which is inferred to be ancestral. Nevertheless, several protein-coding genes present on the human X Chromosome were absent from the pig, and 38 pig-specific X-chromosomal genes were annotated, 22 of which were olfactory receptors. The pig Y-specific Chromosome sequence generated here comprises 30 megabases (Mb). A 15-Mb subset of this sequence was assembled, revealing two clusters of male-specific low copy number genes, separated by an ampliconic region including the HSFY gene family, which together make up most of the short arm. Both clusters contain palindromes with high sequence identity, presumably maintained by gene conversion. Many of the ancestral X-related genes previously reported in at least one mammalian Y Chromosome are represented either as active genes or partial sequences. This sequencing project has allowed us to identify genes (both single copy and amplified) on the pig Y Chromosome, to compare the pig X and Y Chromosomes for homologous sequences, and thereby to reveal mechanisms underlying pig X and Y Chromosome evolution.

Subject(s)

Chromosomes, Mammalian/genetics , Evolution, Molecular , Swine/genetics , X Chromosome/genetics , Y Chromosome/genetics , Animals , Base Sequence , Cats/genetics , Dogs/genetics , Female , Gene Conversion , Gene Expression , Gene Library , Gene Order , Humans , Male , Molecular Sequence Data , Sequence Alignment , Sequence Analysis, DNA

Modernizing reference genome assemblies.

Church, Deanna M; Schneider, Valerie A; Graves, Tina; Auger, Katherine; Cunningham, Fiona; Bouk, Nathan; Chen, Hsiu-Chuan; Agarwala, Richa; McLaren, William M; Ritchie, Graham R S; Albracht, Derek; Kremitzki, Milinn; Rock, Susan; Kotkiewicz, Holland; Kremitzki, Colin; Wollam, Aye; Trani, Lee; Fulton, Lucinda; Fulton, Robert; Matthews, Lucy; Whitehead, Siobhan; Chow, Will; Torrance, James; Dunn, Matthew; Harden, Glenn; Threadgold, Glen; Wood, Jonathan; Collins, Joanna; Heath, Paul; Griffiths, Guy; Pelan, Sarah; Grafham, Darren; Eichler, Evan E; Weinstock, George; Mardis, Elaine R; Wilson, Richard K; Howe, Kerstin; Flicek, Paul; Hubbard, Tim.

PLoS Biol ; 9(7): e1001091, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21750661

Subject(s)

Databases, Genetic , Genome, Human , Humans , International Cooperation

Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci.

Lilue, Jingtao; Doran, Anthony G; Fiddes, Ian T; Abrudan, Monica; Armstrong, Joel; Bennett, Ruth; Chow, William; Collins, Joanna; Collins, Stephan; Czechanski, Anne; Danecek, Petr; Diekhans, Mark; Dolle, Dirk-Dominik; Dunn, Matt; Durbin, Richard; Earl, Dent; Ferguson-Smith, Anne; Flicek, Paul; Flint, Jonathan; Frankish, Adam; Fu, Beiyuan; Gerstein, Mark; Gilbert, James; Goodstadt, Leo; Harrow, Jennifer; Howe, Kerstin; Ibarra-Soria, Ximena; Kolmogorov, Mikhail; Lelliott, Chris J; Logan, Darren W; Loveland, Jane; Mathews, Clayton E; Mott, Richard; Muir, Paul; Nachtweide, Stefanie; Navarro, Fabio C P; Odom, Duncan T; Park, Naomi; Pelan, Sarah; Pham, Son K; Quail, Mike; Reinholdt, Laura; Romoth, Lars; Shirley, Lesley; Sisu, Cristina; Sjoberg-Herrera, Marcela; Stanke, Mario; Steward, Charles; Thomas, Mark; Threadgold, Glen.

Nat Genet ; 50(11): 1574-1583, 2018 11.

Article in English | MEDLINE | ID: mdl-30275530

ABSTRACT

We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.

Subject(s)

Chromosome Mapping , Genetic Loci , Genome , Haplotypes , Mice, Inbred Strains/genetics , Animals , Animals, Laboratory , Chromosome Mapping/veterinary , Haplotypes/genetics , Mice , Mice, Inbred BALB C/genetics , Mice, Inbred C3H/genetics , Mice, Inbred C57BL/genetics , Mice, Inbred CBA/genetics , Mice, Inbred DBA/genetics , Mice, Inbred NOD/genetics , Mice, Inbred Strains/classification , Molecular Sequence Annotation , Phylogeny , Polymorphism, Single Nucleotide , Species Specificity

A physical map of the mouse genome.

Gregory, Simon G; Sekhon, Mandeep; Schein, Jacqueline; Zhao, Shaying; Osoegawa, Kazutoyo; Scott, Carol E; Evans, Richard S; Burridge, Paul W; Cox, Tony V; Fox, Christopher A; Hutton, Richard D; Mullenger, Ian R; Phillips, Kimbly J; Smith, James; Stalker, Jim; Threadgold, Glen J; Birney, Ewan; Wylie, Kristine; Chinwalla, Asif; Wallis, John; Hillier, LaDeana; Carter, Jason; Gaige, Tony; Jaeger, Sara; Kremitzki, Colin; Layman, Dan; Maas, Jason; McGrane, Rebecca; Mead, Kelly; Walker, Rebecca; Jones, Steven; Smith, Michael; Asano, Jennifer; Bosdet, Ian; Chan, Susanna; Chittaranjan, Suganthi; Chiu, Readman; Fjell, Chris; Fuhrmann, Dan; Girn, Noreen; Gray, Catharine; Guin, Ran; Hsiao, Letticia; Krzywinski, Martin; Kutsche, Reta; Lee, Soo Sen; Mathewson, Carrie; McLeavy, Candice; Messervier, Steve; Ness, Steven.

Nature ; 418(6899): 743-50, 2002 Aug 15.

Article in English | MEDLINE | ID: mdl-12181558

ABSTRACT

A physical map of a genome is an essential guide for navigation, allowing the location of any gene or other landmark in the chromosomal DNA. We have constructed a physical map of the mouse genome that contains 296 contigs of overlapping bacterial clones and 16,992 unique markers. The mouse contigs were aligned to the human genome sequence on the basis of 51,486 homology matches, thus enabling use of the conserved synteny (correspondence between chromosome blocks) of the two genomes to accelerate construction of the mouse map. The map provides a framework for assembly of whole-genome shotgun sequence data, and a tile path of clones for generation of the reference sequence. Definition of the human-mouse alignment at this level of resolution enables identification of a mouse clone that corresponds to almost any position in the human genome. The human sequence may be used to facilitate construction of other mammalian genome maps using the same strategy.

Subject(s)

Genome , Mice/genetics , Physical Chromosome Mapping/methods , Animals , Chromosomes/genetics , Chromosomes, Human, Pair 6/genetics , Cloning, Molecular , Conserved Sequence/genetics , Contig Mapping/methods , Genome, Human , Humans , Molecular Sequence Data , Radiation Hybrid Mapping , Sequence Alignment , Sequence Homology, Nucleic Acid , Species Specificity , Synteny
