Results 1 - 20 of 33
1.
Cell Genom ; 4(4): 100527, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38537634

ABSTRACT

The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors by approximately 9-fold, and increases contiguity by 290-fold compared with its predecessor. Gene annotations are now more complete, improving the mapping precision of genomic, transcriptomic, and proteomics datasets. We jointly analyzed 163 short-read whole-genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ∼20.0 million sequence variations, of which 18,700 are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.


Subject(s)
Genome , Genomics , Rats , Animals , Genome/genetics , Molecular Sequence Annotation , Whole Genome Sequencing , Genetic Variation/genetics
2.
Nat Ecol Evol ; 8(3): 519-535, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38216617

ABSTRACT

Polyploidy or whole-genome duplication (WGD) is a major event that drastically reshapes genome architecture and is often assumed to be causally associated with organismal innovations and radiations. The 2R hypothesis suggests that two WGD events (1R and 2R) occurred during early vertebrate evolution. However, the timing of the 2R event relative to the divergence of gnathostomes (jawed vertebrates) and cyclostomes (jawless hagfishes and lampreys) is unresolved and whether these WGD events underlie vertebrate phenotypic diversification remains elusive. Here we present the genome of the inshore hagfish, Eptatretus burgeri. Through comparative analysis with lamprey and gnathostome genomes, we reconstruct the early events in cyclostome genome evolution, leveraging insights into the ancestral vertebrate genome. Genome-wide synteny and phylogenetic analyses support a scenario in which 1R occurred in the vertebrate stem-lineage during the early Cambrian, and 2R occurred in the gnathostome stem-lineage, maximally in the late Cambrian-earliest Ordovician, after its divergence from cyclostomes. We find that the genome of stem-cyclostomes experienced an additional independent genome triplication. Functional genomic and morphospace analyses demonstrate that WGD events generally contribute to developmental evolution with similar changes in the regulatory genome of both vertebrate groups. However, appreciable morphological diversification occurred only in the gnathostome but not in the cyclostome lineage, calling into question the general expectation that WGDs lead to leaps of bodyplan complexity.


Subject(s)
Hagfishes , Animals , Phylogeny , Hagfishes/genetics , Gene Duplication , Vertebrates/genetics , Genome , Lampreys/genetics
3.
Nature ; 621(7978): 344-354, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Subject(s)
Chromosomes, Human, Y , Genomics , Sequence Analysis, DNA , Humans , Base Sequence , Chromosomes, Human, Y/genetics , DNA, Satellite/genetics , Genetic Variation/genetics , Genetics, Population , Genomics/methods , Genomics/standards , Heterochromatin/genetics , Multigene Family/genetics , Reference Standards , Segmental Duplications, Genomic/genetics , Sequence Analysis, DNA/standards , Tandem Repeat Sequences/genetics , Telomere/genetics
4.
BMC Genomics ; 24(1): 408, 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37468834

ABSTRACT

BACKGROUND: The group of >40 cryptic whitefly species called Bemisia tabaci sensu lato are amongst the world's worst agricultural pests and plant-virus vectors. Outbreaks of B. tabaci s.l. and the associated plant-virus diseases continue to contribute to global food insecurity and social instability, particularly in sub-Saharan Africa and Asia. Published B. tabaci s.l. genomes have limited use for studying African cassava B. tabaci SSA1 species, due to the high genetic divergences between them. Genomic annotations presented here were performed using the 'Ensembl gene annotation system', to ensure that comparative analyses and conclusions reflect biological differences, as opposed to arising from different methodologies underpinning transcript model identification. RESULTS: We present here six new B. tabaci s.l. genomes from Africa and Asia, and two re-annotated previously published genomes, to provide evolutionary insights into these globally distributed pests. Genome sizes ranged between 616-658 Mb and exhibited some of the highest coverage of transposable elements reported within Arthropoda. Many fewer total protein coding genes (PCG) were recovered compared to the previously published B. tabaci s.l. genomes and structural annotations generated via the uniform methodology strongly supported a repertoire of between 12.8-13.2 × 10³ PCG. An integrative systematics approach incorporating phylogenomic analysis of nuclear and mitochondrial markers supported a monophyletic Aleyrodidae and the basal positioning of B. tabaci Uganda-1 to the sub-Saharan group of species. Reciprocal cross-mating data and the co-cladogenesis pattern of the primary obligate endosymbiont 'Candidatus Portiera aleyrodidarum' from 11 Bemisia genomes further supported the phylogenetic reconstruction to show that African cassava B. tabaci populations consist of just three biological species. We include comparative analyses of gene families related to detoxification, sugar metabolism, vector competency and evaluate the presence and function of horizontally transferred genes, essential for understanding the evolution and unique biology of constituent B. tabaci s.l. species. CONCLUSIONS: These genomic resources have provided new and critical insights into the genetics underlying B. tabaci s.l. biology. They also provide a rich foundation for post-genomic research, including the selection of candidate gene-targets for innovative whitefly and virus-control strategies.


Subject(s)
Hemiptera , Plant Viruses , Animals , Phylogeny , Africa , Asia
5.
Nature ; 617(7960): 312-324, 2023 05.
Article in English | MEDLINE | ID: mdl-37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.


Subject(s)
Genome, Human , Genomics , Humans , Diploidy , Genome, Human/genetics , Haplotypes/genetics , Sequence Analysis, DNA , Genomics/standards , Reference Standards , Cohort Studies , Alleles , Genetic Variation
6.
bioRxiv ; 2023 Sep 28.
Article in English | MEDLINE | ID: mdl-37214860

ABSTRACT

The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors by approximately 9-fold, and increases contiguity by 290-fold compared to its predecessor. Gene annotations are now more complete, significantly improving the mapping precision of genomic, transcriptomic, and proteomics data sets. We jointly analyzed 163 short-read whole genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ~20.0 million sequence variations, of which 18.7 thousand are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.

7.
Science ; 380(6642): 283-293, 2023 04 21.
Article in English | MEDLINE | ID: mdl-37079675

ABSTRACT

Tasmanian devils have spawned two transmissible cancer lineages, named devil facial tumor 1 (DFT1) and devil facial tumor 2 (DFT2). We investigated the genetic diversity and evolution of these clones by analyzing 78 DFT1 and 41 DFT2 genomes relative to a newly assembled, chromosome-level reference. Time-resolved phylogenetic trees reveal that DFT1 first emerged in 1986 (1982 to 1989) and DFT2 in 2011 (2009 to 2012). Subclone analysis documents transmission of heterogeneous cell populations. DFT2 has faster mutation rates than DFT1 across all variant classes, including substitutions, indels, rearrangements, transposable element insertions, and copy number alterations, and we identify a hypermutated DFT1 lineage with defective DNA mismatch repair. Several loci show plausible evidence of positive selection in DFT1 or DFT2, including loss of chromosome Y and inactivation of MGA, but none are common to both cancers. This study reveals the parallel long-term evolution of two transmissible cancers inhabiting a common niche in Tasmanian devils.


Subject(s)
Evolution, Molecular , Facial Neoplasms , Marsupialia , Selection, Genetic , Animals , Facial Neoplasms/classification , Facial Neoplasms/genetics , Facial Neoplasms/veterinary , Genome , Marsupialia/genetics , Phylogeny
8.
Genome Biol ; 24(1): 13, 2023 Jan 23.
Article in English | MEDLINE | ID: mdl-36683094

ABSTRACT

BACKGROUND: The Australian black swan (Cygnus atratus) is an iconic species with contrasting plumage to that of the closely related northern hemisphere white swans. The relative geographic isolation of the black swan may have resulted in a limited immune repertoire and increased susceptibility to infectious diseases, notably infectious diseases from which Australia has been largely shielded. Unlike mallard ducks and the mute swan (Cygnus olor), the black swan is extremely sensitive to highly pathogenic avian influenza. Understanding this susceptibility has been impaired by the absence of any available swan genome and transcriptome information. RESULTS: Here, we generate the first chromosome-length black and mute swan genomes annotated with transcriptome data, all using long-read based pipelines generated for vertebrate species. We use these genomes and transcriptomes to show that unlike other wild waterfowl, black swans lack an expanded immune gene repertoire, lack a key viral pattern-recognition receptor in endothelial cells and mount a poorly controlled inflammatory response to highly pathogenic avian influenza. We also implicate genetic differences in the SLC45A2 gene in the iconic plumage of the black swan. CONCLUSION: Together, these data suggest that the immune system of the black swan is such that should any avian viral infection become established in its native habitat, the black swan would be in significant peril.


Subject(s)
Anseriformes , Influenza in Birds , Animals , Transcriptome , Endothelial Cells , Australia
9.
Nucleic Acids Res ; 51(D1): D942-D949, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36420896

ABSTRACT

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
Computational Biology , Genome, Human , Humans , Animals , Mice , Molecular Sequence Annotation , Computational Biology/methods , Genome, Human/genetics , Transcriptome/genetics , Gene Expression Profiling , Databases, Genetic
10.
Nucleic Acids Res ; 51(D1): D933-D941, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36318249

ABSTRACT

Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research and applications that utilise the Ensembl platform. In recent years we have witnessed a dramatic shift in the genomic landscape. There has been a large increase in the number of high-quality reference genomes through global biodiversity initiatives. In parallel, there have been major advances towards pangenome representations of higher species, where many alternative genome assemblies representing different breeds, cultivars, strains and haplotypes are now available. In order to support these efforts and accelerate downstream research, it is our goal at Ensembl to create high-quality annotations, tools and services for species across the tree of life. Here, we report our resources for popular reference genomes, the dramatic growth of our annotations (including haplotypes from the first human pangenome graphs), updates to the Ensembl Variant Effect Predictor (VEP), interactive protein structure predictions from AlphaFold DB, and the beta release of our new website.


Subject(s)
Databases, Genetic , Software , Animals , Humans , Molecular Sequence Annotation , Genomics , Genome
11.
BMC Genomics ; 23(1): 344, 2022 May 04.
Article in English | MEDLINE | ID: mdl-35508966

ABSTRACT

BACKGROUND: The gaur (Bos gaurus) is the largest extant wild bovine species, native to South and Southeast Asia, with unique traits, and is listed as vulnerable by the International Union for Conservation of Nature (IUCN). RESULTS: We report the first gaur reference genome and identify three biological pathways including lysozyme activity, proton transmembrane transporter activity, and oxygen transport with significant changes in gene copy number in gaur compared to other mammals. These may reflect adaptation to challenges related to climate and nutrition. Comparative analyses with domesticated indicine (Bos indicus) and taurine (Bos taurus) cattle revealed genomic signatures of artificial selection, including the expansion of sperm odorant receptor genes in domesticated cattle, which may have important implications for understanding selection for male fertility. CONCLUSIONS: Apart from aiding dissection of economically important traits, the gaur genome will also provide the foundation to conserve the species.


Subject(s)
Receptors, Odorant , Animals , Cattle/genetics , Genome , Genomics , Male , Mammals , Receptors, Odorant/genetics , Spermatozoa , Zona Pellucida Glycoproteins
12.
Nucleic Acids Res ; 50(D1): D765-D770, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34634797

ABSTRACT

The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser; the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.


Subject(s)
COVID-19/virology , Databases, Genetic , SARS-CoV-2/genetics , Web Browser , Coronaviridae/genetics , Genetic Variation , Genome, Viral , Humans , Molecular Sequence Annotation
13.
Front Genet ; 12: 650228, 2021.
Article in English | MEDLINE | ID: mdl-33995484

ABSTRACT

Genome assembly is cheaper, more accurate and more automated than it has ever been. This is due to a combination of more cost-efficient chemistries, new sequencing technologies and better algorithms. The livestock community has been at the forefront of this new wave of genome assembly, generating some of the highest quality vertebrate genome sequences. Ensembl's goal is to add functional and comparative annotation to these genomes, through our gene annotation, genomic alignments, gene trees, regulatory, and variation data. We run computationally complex analyses in a high throughput and consistent manner to help accelerate downstream science. Our livestock resources are continuously growing in both breadth and depth. We annotate reference genome assemblies for newly sequenced species and regularly update annotation for existing genomes. We are the only major resource to support the annotation of breeds and other non-reference assemblies. We currently provide resources for 13 pig breeds, maternal and paternal haplotypes for hybrid cattle and various other non-reference or wild type assemblies for livestock species. Here, we describe the livestock data present in Ensembl and provide protocols for how to view data in our genome browser, download it via our FTP site, manipulate it via our tools and interact with it programmatically via our REST API.

14.
Cell Genom ; 1(1): 100002, 2021 Oct 13.
Article in English | MEDLINE | ID: mdl-36777713

ABSTRACT

The kakapo is a flightless parrot endemic to New Zealand. Once common in the archipelago, only 201 individuals remain today, most of them descending from an isolated island population. We report the first genome-wide analyses of the species, including a high-quality genome assembly for kakapo, one of the first chromosome-level reference genomes sequenced by the Vertebrate Genomes Project (VGP). We also sequenced and analyzed 35 modern genomes from the sole surviving island population and 14 genomes from the extinct mainland population. While theory suggests that such a small population is likely to have accumulated deleterious mutations through genetic drift, our analyses on the impact of the long-term small population size in kakapo indicate that present-day island kakapo have a reduced number of harmful mutations compared to mainland individuals. We hypothesize that this reduced mutational load is due to the island population having been subjected to a combination of genetic drift and purging of deleterious mutations, through increased inbreeding and purifying selection, since its isolation from the mainland ∼10,000 years ago. Our results provide evidence that small populations can survive even when isolated for hundreds of generations. This work provides key insights into kakapo breeding and recovery and more generally into the application of genetic tools in conservation efforts for endangered species.

15.
Haematologica ; 106(10): 2613-2623, 2021 10 01.
Article in English | MEDLINE | ID: mdl-32703790

ABSTRACT

Transcriptional profiling of hematopoietic cell subpopulations has helped to characterize the developmental stages of the hematopoietic system and the molecular bases of malignant and non-malignant blood diseases. Previously, only the genes targeted by expression microarrays could be profiled genome-wide. High-throughput RNA sequencing, however, encompasses a broader repertoire of RNA molecules, without restriction to previously annotated genes. We analyzed the BLUEPRINT consortium RNA-sequencing data for mature hematopoietic cell types. The data comprised 90 total RNA-sequencing samples, each composed of one of 27 cell types, and 32 small RNA-sequencing samples, each composed of one of 11 cell types. We estimated gene and isoform expression levels for each cell type using existing annotations from Ensembl. We then used guided transcriptome assembly to discover unannotated transcripts. We identified hundreds of novel non-coding RNA genes and showed that the majority have cell type-dependent expression. We also characterized the expression of circular RNA and found that these are also cell type-specific. These analyses refine the active transcriptional landscape of mature hematopoietic cells, highlight abundant genes and transcriptional isoforms for each blood cell type, and provide a valuable resource for researchers of hematologic development and diseases. Finally, we made the data accessible via a web-based interface: https://blueprint.haem.cam.ac.uk/bloodatlas/.


Subject(s)
RNA, Long Noncoding , Transcriptome , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , RNA, Circular , RNA, Long Noncoding/genetics , Sequence Analysis, RNA
16.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics
17.
Nature ; 584(7821): 403-409, 2020 08.
Article in English | MEDLINE | ID: mdl-32760000

ABSTRACT

The tuatara (Sphenodon punctatus), the only living member of the reptilian order Rhynchocephalia (Sphenodontia), once widespread across Gondwana, is an iconic species that is endemic to New Zealand. A key link to the now-extinct stem reptiles (from which dinosaurs, modern reptiles, birds and mammals evolved), the tuatara provides key insights into the ancestral amniotes. Here we analyse the genome of the tuatara, which, at approximately 5 Gb, is among the largest of the vertebrate genomes yet assembled. Our analyses of this genome, along with comparisons with other vertebrate genomes, reinforce the uniqueness of the tuatara. Phylogenetic analyses indicate that the tuatara lineage diverged from that of snakes and lizards around 250 million years ago. This lineage also shows moderate rates of molecular evolution, with instances of punctuated evolution. Our genome sequence analysis identifies expansions of proteins, non-protein-coding RNA families and repeat elements, the latter of which show an amalgam of reptilian and mammalian features. The sequencing of the tuatara genome provides a valuable resource for deep comparative analyses of tetrapods, as well as for tuatara biology and conservation. Our study also provides important insights into both the technical challenges and the cultural obligations that are associated with genome sequencing.


Subject(s)
Evolution, Molecular , Genome/genetics , Phylogeny , Reptiles/genetics , Animals , Conservation of Natural Resources/trends , Female , Genetics, Population , Lizards/genetics , Male , Molecular Sequence Annotation , New Zealand , Sex Characteristics , Snakes/genetics , Synteny
19.
Gigascience ; 9(6), 2020 06 01.
Article in English | MEDLINE | ID: mdl-32543654

ABSTRACT

BACKGROUND: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility. RESULTS: We present 2 annotated highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, 1 for the same Duroc female (Sscrofa11.1) and 1 for an outbred, composite-breed male (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy than Sscrofa10.2. CONCLUSIONS: These highly contiguous assemblies plus annotation of a further 11 short-read assemblies provide an unprecedented view of the genetic make-up of this important agricultural and biomedical model species. We propose that the improved Duroc assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.


Subject(s)
Computational Biology/methods , Genome , Genomics/methods , Sequence Analysis, DNA/methods , Sus scrofa/immunology , Animals , Molecular Sequence Annotation , Reproducibility of Results , Research , Swine
20.
Nat Commun ; 11(1): 2071, 2020 04 29.
Article in English | MEDLINE | ID: mdl-32350247

ABSTRACT

Inbred animals were historically chosen for genome analysis to circumvent assembly issues caused by haplotype variation but this resulted in a composite of the two genomes. Here we report a haplotype-aware scaffolding and polishing pipeline which was used to create haplotype-resolved, chromosome-level genome assemblies of Angus (taurine) and Brahman (indicine) cattle subspecies from contigs generated by the trio binning method. These assemblies reveal structural and copy number variants that differentiate the subspecies and that variant detection is sensitive to the specific reference genome chosen. Six genes with immune related functions have additional copies in the indicine compared with taurine lineage and an indicus-specific extra copy of fatty acid desaturase is under positive selection. The haplotyped genomes also enable transcripts to be phased to detect allele-specific expression. This work exemplifies the value of haplotype-resolved genomes to better explore evolutionary and functional variations.


Subject(s)
Cattle/genetics , Genetic Variation , Genome , Haplotypes/genetics , Alleles , Allelic Imbalance , Animals , Base Sequence , Chromosomes, Mammalian/genetics , Female , Genetic Loci , INDEL Mutation/genetics , Male , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Repetitive Sequences, Nucleic Acid/genetics