ABSTRACT
Mammalian AIM-2-like receptor (ALR) proteins bind nucleic acids and initiate production of type I interferons or inflammasome assembly, thereby contributing to host innate immunity. In mice, the Alr locus is highly polymorphic at the sequence and copy number level, and we show here that it is one of the most dynamic regions of the genome. One rapidly evolving gene within this region, Ifi207, was introduced to the Mus genome by gene conversion or an unequal recombination event a few million years ago. Ifi207 has a large, distinctive repeat region that differs in sequence and length among Mus species and even closely related inbred Mus musculus strains. We show that IFI207 controls murine leukemia virus (MLV) infection in vivo and that it plays a role in the STING-mediated response to cGAMP, dsDNA, DMXAA, and MLV. IFI207 binds to STING, and inclusion of its repeat region appears to stabilize STING protein. The Alr locus and Ifi207 provide a clear example of the evolutionary innovation of gene function, possibly as a result of host-pathogen co-evolution. IMPORTANCE: The Red Queen hypothesis predicts that the arms race between pathogens and the host may accelerate evolution of both sides, and therefore causes higher diversity in virulence factors and immune-related proteins, respectively. The Alr gene family in mice has undergone rapid evolution in the last few million years and includes the creation of two novel members, MndaL and Ifi207. Ifi207, in particular, became highly divergent, with significant genetic changes between highly related inbred mice. IFI207 protein acts in the STING pathway and contributes to anti-retroviral resistance via a novel mechanism. The data show that under the pressure of host-pathogen coevolution in a dynamic locus, gene conversion and recombination between gene family members create new genes with novel and essential functions that play diverse roles in biological processes.
Subject(s)
Membrane Proteins , Virus Replication , Animals , Mice , Molecular Evolution , Host-Pathogen Interactions/genetics , Innate Immunity , Murine Leukemia Virus/genetics , Murine Leukemia Virus/physiology , Membrane Proteins/genetics , Membrane Proteins/metabolism , Inbred C57BL Mice , Nuclear Proteins/genetics , Nuclear Proteins/metabolism
ABSTRACT
Summary: Semantic ontology mapping of clinical descriptors with disease outcome is essential. ClinVar is a key resource for human variation with known clinical significance. We present CMAT, a software toolkit and curation protocol for accurately enriching ClinVar releases with disease ontology associations and complex functional consequences. Availability and implementation: The software and ontology mappings can be obtained from: https://github.com/EBIvariation/CMAT.
ABSTRACT
Associations between human genetic variation and clinical phenotypes have become a foundation of biomedical research. Most repositories of these data seek to be disease-agnostic and therefore lack disease-focused views. The Type 2 Diabetes Knowledge Portal (T2DKP) is a public resource of genetic datasets and genomic annotations dedicated to type 2 diabetes (T2D) and related traits. Here, we seek to make the T2DKP more accessible to prospective users and more useful to existing users. First, we evaluate the T2DKP's comprehensiveness by comparing its datasets with those of other repositories. Second, we describe how researchers unfamiliar with human genetic data can begin using and correctly interpreting them via the T2DKP. Third, we describe how existing users can extend their current workflows to use the full suite of tools offered by the T2DKP. We finally discuss the lessons offered by the T2DKP toward the goal of democratizing access to complex disease genetic results.
Subject(s)
Type 2 Diabetes Mellitus , Humans , Type 2 Diabetes Mellitus/genetics , Access to Information , Prospective Studies , Genomics/methods , Phenotype
ABSTRACT
Mammalian ALR proteins bind nucleic acids and initiate production of type I interferons or inflammasome assembly, thereby contributing to host innate immunity. ALRs are encoded at a single genetic locus. In mice, the Alr locus is highly polymorphic at the sequence and copy number level. We suggest that one rapidly evolving member of the Alr family, Ifi207, was introduced to the Mus genome by a recent recombination event. Ifi207 has a large, distinctive repeat region that differs in sequence and length in different Mus strains. We show that IFI207 plays a key role in the STING-mediated response to cGAMP, DNA, and MLV, and that IFI207 controls MLV infection in vivo. Uniquely, IFI207 acts by stabilizing STING protein via its repeat region. Our studies suggest that under the pressure of host-pathogen coevolution, in a dynamic locus such as the Alr locus, recombination between gene family members creates new genes with novel and essential functions that play diverse roles in biological processes.
ABSTRACT
The European Variation Archive (EVA; https://www.ebi.ac.uk/eva/) is a resource for sharing all types of genetic variation data (SNPs, indels, and structural variants) for all species. The EVA was created in 2014 to provide FAIR access to genetic variation data and has since grown to be a primary resource for genomic variants, hosting >3 billion records. The EVA and dbSNP have established a compatible global system to assign unique identifiers to all submitted genetic variants. The EVA is active within the Global Alliance for Genomics and Health (GA4GH), maintaining, contributing to and implementing standards such as VCF, Refget and the Variant Representation Specification (VRS). In this article, we describe the submission and permanent accessioning services along with the different ways the data can be retrieved by the scientific community.
Subject(s)
Computational Biology , Genetic Databases , Genetic Variation/genetics , Software , Animals , Genomic Structural Variation/genetics , Genomics , Humans , INDEL Mutation/genetics , Molecular Sequence Annotation , Single Nucleotide Polymorphism/genetics
Subject(s)
Computational Biology/methods , Metabolomics/methods , Proteomics/methods , Animals , Humans
ABSTRACT
MOTIVATION: Reference sequences are essential in creating a baseline of knowledge for many common bioinformatics methods, especially those using genomic sequencing. RESULTS: We have created refget, a Global Alliance for Genomics and Health API specification to access reference sequences and sub-sequences using an identifier derived from the sequence itself. We present four reference implementations across in-house and cloud infrastructure, a compliance suite and a web report used to ensure specification conformity across implementations. AVAILABILITY AND IMPLEMENTATION: The refget specification can be found at: https://w3id.org/ga4gh/refget. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
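The idea of "an identifier derived from the sequence itself" can be illustrated with a minimal Python sketch. It assumes the two checksum schemes commonly associated with refget: an MD5 hex digest, and a SHA-512 digest truncated to its first 24 bytes, shown both as hex and as the base64url form prefixed with "SQ.". The function name `refget_digests` and the uppercase normalisation step are illustrative assumptions, not taken from the abstract; the specification itself remains the authority on normalisation and naming.

```python
import base64
import hashlib


def refget_digests(sequence: str) -> dict:
    """Return content-derived identifiers for a sequence (illustrative helper)."""
    # Assumption: the sequence is normalised to uppercase ASCII before hashing.
    data = sequence.upper().encode("ascii")

    # MD5 hex digest of the sequence bytes.
    md5_hex = hashlib.md5(data).hexdigest()

    # First 24 bytes of the SHA-512 digest.
    truncated = hashlib.sha512(data).digest()[:24]

    return {
        "md5": md5_hex,
        # Hex rendering of the truncated digest (48 hex characters).
        "trunc512": truncated.hex(),
        # base64url rendering; 24 bytes encode to 32 characters without padding.
        "ga4gh": "SQ." + base64.urlsafe_b64encode(truncated).decode("ascii"),
    }


if __name__ == "__main__":
    for name, value in refget_digests("ACGT").items():
        print(name, value)
```

Because the identifier depends only on the sequence content, two servers holding the same sequence would be expected to expose it under the same checksum, whatever name or accession each uses locally.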
Subject(s)
Genomics , Software
ABSTRACT
MOTIVATION: The majority of genome analysis tools and pipelines require data to be decrypted for access. This potentially leaves sensitive genetic data exposed, either because the unencrypted data is not removed after analysis, or because the data leaves traces on the permanent storage medium. RESULTS: We defined a file container specification enabling direct byte-level compatible random access to encrypted genetic data stored in community standards such as SAM/BAM/CRAM/VCF/BCF. By standardizing this format, we show how it can be added as a native file format to genomic libraries, enabling direct analysis of encrypted data without the need to create a decrypted copy. AVAILABILITY AND IMPLEMENTATION: The Crypt4GH specification can be found at: http://samtools.github.io/hts-specs/crypt4gh.pdf. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
Pseudogenes are ideal markers of genome remodelling. In turn, the mouse is an ideal platform for studying them, particularly with the recent availability of strain-sequencing and transcriptional data. Here, combining both manual curation and automatic pipelines, we present a genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains (available via the mouse.pseudogene.org resource). We also annotate 165 unitary pseudogenes in mouse and 303 in human. The overall pseudogene repertoire in mouse is similar to that in human in terms of size, biotype distribution, and family composition (e.g. with GAPDH and ribosomal proteins being the largest families). Notable differences arise in the pseudogene age distribution, with multiple retro-transpositional bursts in mouse evolutionary history and only one in human. Furthermore, in each strain about a fifth of all pseudogenes are unique, reflecting strain-specific evolution. Finally, we find that ~15% of the mouse pseudogenes are transcribed, and that highly transcribed parent genes tend to give rise to many processed pseudogenes.
Subject(s)
Pseudogenes/genetics , Genetic Transcription , Animals , Conserved Sequence/genetics , Molecular Evolution , Gene Ontology , Genome , Humans , Inbred C57BL Mice , Molecular Sequence Annotation , Species Specificity
ABSTRACT
BACKGROUND: Colorectal cancer (CRC) is a multifactorial disease resulting from both genetic predisposition and environmental factors including the gut microbiota (GM), but deciphering the influence of genetic variants, environmental variables, and interactions with the GM is exceedingly difficult. We previously observed significant differences in intestinal adenoma multiplicity between C57BL/6J-ApcMin (B6-Min/J) from The Jackson Laboratory (JAX), and original founder strain C57BL/6JD-ApcMin (B6-Min/D) from the University of Wisconsin. METHODS: To resolve genetic and environmental interactions and determine their contributions, we utilized two genetically inbred, independently isolated ApcMin mouse colonies that have been separated for over 20 generations. Whole genome sequencing (WGS) was used to identify genetic variants unique to the two substrains. To determine the influence of genetic variants and the impact of differences in the GM on phenotypic variability, we used complex microbiota targeted rederivation to generate two Apc mutant mouse colonies harboring complex GMs from two different sources (GMJAX originally from JAX or GMHSD originally from Envigo), creating four ApcMin groups. Untargeted metabolomics were used to characterize shifts in the fecal metabolite profile based on genetic variation and differences in the GM. RESULTS: WGS revealed several thousand high quality variants unique to the two substrains. No homozygous variants were present in coding regions, with the vast majority of variants residing in noncoding regions. Host genetic divergence between Min/J and Min/D and the complex GM additively determined differential adenoma susceptibility. Untargeted metabolomics revealed that both genetic lineage and the GM collectively determined the fecal metabolite profile, and that each differentially regulates bile acid (BA) metabolism. Metabolomics pathway analysis facilitated identification of a functionally relevant private noncoding variant associated with the bile acid transporter Fatty acid binding protein 6 (Fabp6). Expression studies demonstrated differential expression of Fabp6 between Min/J and Min/D, and the variant correlates with adenoma multiplicity in backcrossed mice. CONCLUSIONS: We found that both genetic variation and differences in microbiota influence the quantitative adenoma phenotype in ApcMin mice. These findings demonstrate how the use of metabolomics datasets can aid as a functional genomic tool, and furthermore illustrate the power of a multi-omics approach to dissect complex disease susceptibility of noncoding variants.
Subject(s)
Adenoma/genetics , Colorectal Neoplasms/genetics , Gastrointestinal Microbiome/physiology , Genetic Predisposition to Disease , Adenoma/metabolism , Adenoma/microbiology , Adenomatous Polyposis Coli Protein/genetics , Alleles , Animals , Colorectal Neoplasms/metabolism , Colorectal Neoplasms/microbiology , Animal Disease Models , Female , Humans , Male , Metabolomics , Metagenomics , Mice , Mutation
ABSTRACT
For over a century, mice have been used to model human disease, leading to many fundamental discoveries about mammalian biology and the development of new therapies. Mouse genetics research has been further catalysed by a plethora of genomic resources developed in the last 20 years, including the genome sequence of C57BL/6J and more recently the first draft reference genomes for 16 additional laboratory strains. Collectively, the comparison of these genomes highlights the extreme diversity that exists at loci associated with the immune system, pathogen response, and key sensory functions, which form the foundation for dissecting phenotypic traits in vivo. We review the current status of the mouse genome across the diversity of the mouse lineage and discuss the value of mice to understanding human disease.
Subject(s)
Inbred Animal Strains/genetics , Genome/genetics , Genomics , Animals , Chromosome Mapping , Haplotypes , Humans , Inbreeding , Mice , Phenotype
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.
Subject(s)
Biomedical Research , Human Genome , Human Genome Project , Europe , Humans
ABSTRACT
Summary: Standardized interfaces for efficiently accessing high-throughput sequencing data are a fundamental requirement for large-scale genomic data sharing. We have developed htsget, a protocol for secure, efficient and reliable access to sequencing read and variation data. We demonstrate four independent client and server implementations, and the results of a comprehensive interoperability demonstration. Availability and implementation: http://samtools.github.io/hts-specs/htsget.html. Supplementary information: Supplementary data are available at Bioinformatics online.
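As a rough illustration of how a client might consume such a protocol, the Python sketch below assumes the commonly described htsget pattern: the server answers a request with a JSON "ticket" whose `htsget.urls` list names data blocks, either inline as `data:` URIs or as remote URLs with optional headers, and the client concatenates the blocks in order. The function name `assemble_htsget_ticket` and the injected `fetch` callable are hypothetical and stand in for a real HTTP client; nothing here is taken from the abstract itself.

```python
import base64
from typing import Callable, Dict
from urllib.parse import unquote_to_bytes


def assemble_htsget_ticket(
    ticket: dict,
    fetch: Callable[[str, Dict[str, str]], bytes],
) -> bytes:
    """Concatenate the data blocks listed in a ticket, preserving their order."""
    chunks = []
    for entry in ticket["htsget"]["urls"]:
        url = entry["url"]
        if url.startswith("data:"):
            # Inline block: "data:<media type>[;base64],<payload>"
            meta, _, payload = url.partition(",")
            if meta.endswith(";base64"):
                chunks.append(base64.b64decode(payload))
            else:
                chunks.append(unquote_to_bytes(payload))
        else:
            # Remote block: delegate to the caller-supplied fetcher,
            # passing along any headers the ticket asks for.
            chunks.append(fetch(url, entry.get("headers", {})))
    return b"".join(chunks)
```

Passing the fetcher in as a parameter keeps the sketch free of network code, so the ordering logic can be exercised offline with a stub.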
Subject(s)
Computational Biology , Genomics , High-Throughput Nucleotide Sequencing , Software , Genome
ABSTRACT
Despite the rapid development of sequencing technologies, the assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout 2, a reference-assisted assembly tool that works for large and complex genomes. By taking one or more target assemblies (generated from an NGS assembler) and one or multiple related reference genomes, Ragout 2 infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. By using Ragout 2, we transformed NGS assemblies of 16 laboratory mouse strains into sets of complete chromosomes, leaving <5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realigning of long Pacific Biosciences (PacBio) reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. We applied Ragout 2 to the Mus caroli and Mus pahari genomes, which exhibit karyotype-scale variations compared with other genomes from the Muridae family. Chromosome painting maps confirmed most large-scale rearrangements that Ragout 2 detected. We applied Ragout 2 to improve draft sequences of three ape genomes that have recently been published. Ragout 2 transformed three sets of contigs (generated using PacBio reads only) into chromosome-scale assemblies with accuracy comparable to chromosome assemblies generated in the original study using BioNano maps, Hi-C, BAC clones, and FISH.
Subject(s)
Contig Mapping/methods , Whole Genome Sequencing/methods , Animals , Contig Mapping/standards , Mice , Reference Standards , Whole Genome Sequencing/standards
ABSTRACT
ZIC2 mutation is known to cause holoprosencephaly (HPE). A subset of ZIC2 HPE probands harbour cardiovascular and visceral anomalies suggestive of laterality defects. 3D-imaging of novel mouse Zic2 mutants uncovers, in addition to HPE, laterality defects in lungs, heart, vasculature and viscera. A strong bias towards right isomerism indicates a failure to establish left identity in the lateral plate mesoderm (LPM), a phenotype that cannot be explained simply by the defective ciliogenesis previously noted in Zic2 mutants. Gene expression analysis showed that the left-determining NODAL-dependent signalling cascade fails to be activated in the LPM, and that the expression of Nodal at the node, which normally triggers this event, is itself defective in these embryos. Analysis of ChIP-seq data, in vitro transcriptional assays and mutagenesis reveals a requirement for a low-affinity ZIC2 binding site for the activation of the Nodal enhancer HBE, which is normally active in node precursor cells. These data show that ZIC2 is required for correct Nodal expression at the node and suggest a model in which ZIC2 acts at different levels to establish LR asymmetry, promoting both the production of the signal that induces left side identity and the morphogenesis of the cilia that bias its distribution.
Subject(s)
Mesoderm/embryology , Morphogenesis , Nodal Protein/metabolism , Nuclear Proteins/physiology , Transcription Factors/physiology , Animals , Body Patterning , Cilia , Holoprosencephaly/genetics , Mice , Mutation , Nuclear Proteins/genetics , Phenotype , Signal Transduction , Transcription Factors/genetics
ABSTRACT
In the supplementary information PDF originally posted, there were discrepancies from the integrated supplementary information that appeared in the HTML; the former has been corrected as follows. In the legend to Supplementary Fig. 2c, "major organs of the mouse" has been changed to "major organs of the adult mouse." In the legend to Supplementary Fig. 6d,h, "At E14.5 Mbe/Mbe mutants have a smaller percentage of Brdu positive cells in bin 3" has been changed to "At E14.5 Mbe/Mbe mutants have a higher percentage of Brdu positive cells in bin 3."
ABSTRACT
Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.
Subject(s)
Molecular Evolution , Genome/genetics , Muridae/genetics , Phylogeny , Animals , Binding Sites , CCCTC-Binding Factor/genetics , Chromosomes/genetics , Karyotyping/methods , Long Interspersed Nucleotide Elements/genetics , Mice , Retroelements/genetics , Species Specificity
ABSTRACT
The formation of the vertebrate brain requires the generation, migration, differentiation and survival of neurons. Genetic mutations that perturb these critical cellular events can result in malformations of the telencephalon, providing a molecular window into brain development. Here we report the identification of an N-ethyl-N-nitrosourea-induced mouse mutant characterized by a fractured hippocampal pyramidal cell layer, attributable to defects in neuronal migration. We show that this is caused by a hypomorphic mutation in Vps15 that perturbs endosomal-lysosomal trafficking and autophagy, resulting in an upregulation of Nischarin, which inhibits Pak1 signaling. The complete ablation of Vps15 results in the accumulation of autophagic substrates, the induction of apoptosis and severe cortical atrophy. Finally, we report that mutations in VPS15 are associated with cortical atrophy and epilepsy in humans. These data highlight the importance of the Vps15-Vps34 complex and the Nischarin-Pak1 signaling hub in the development of the telencephalon.
Subject(s)
Cell Movement/genetics , Developmental Gene Expression Regulation/drug effects , Mutation/drug effects , Neurodevelopmental Disorders , Neurons/pathology , Vacuolar Proton-Translocating ATPases/genetics , Alkylating Agents/toxicity , Animals , Newborn Animals , Atrophy/chemically induced , Atrophy/genetics , Atrophy/pathology , Autophagy/drug effects , Autophagy/genetics , Brain/drug effects , Brain/pathology , Cell Movement/drug effects , Animal Disease Models , Mammalian Embryo , Ethylnitrosourea/toxicity , Female , Developmental Gene Expression Regulation/genetics , Humans , Male , Mice , Inbred C57BL Mice , Transgenic Mice , Neurodevelopmental Disorders/chemically induced , Neurodevelopmental Disorders/diagnostic imaging , Neurodevelopmental Disorders/genetics , Neurodevelopmental Disorders/pathology , Neurons/drug effects , Neurons/ultrastructure , Signal Transduction/drug effects , Signal Transduction/genetics , Vacuolar Proton-Translocating ATPases/drug effects
ABSTRACT
Armillaria mellea is a major plant pathogen. Yet the strategies the organism uses to infect susceptible species, degrade lignocellulose and other plant material, and protect itself against plant defences and its own glycodegradative arsenal are largely unknown. Here, we use a combination of gel and MS-based proteomics to profile A. mellea under conditions of oxidative stress and changes in growth matrix. 2-DE and LC-MS/MS were used to investigate the response of A. mellea to H2O2 and menadione/FeCl3 exposure, respectively. Several proteins were detected with altered abundance in response to H2O2, but not menadione/FeCl3 (e.g., valosin-containing protein), indicating distinct responses to these different forms of oxidative stress. One protein, cobalamin-independent methionine synthase, demonstrated a common response in both conditions, which may be a marker for a more general stress response mechanism. Further changes to the A. mellea proteome were investigated using MS-based proteomics, which identified changes to putative secondary metabolism (SM) enzymes upon growth in agar compared to liquid cultures. Metabolomic analyses revealed distinct profiles, highlighting the effect of growth matrix on SM production. This establishes robust methods by which to utilize comparative proteomics to characterize this important phytopathogen.