ABSTRACT
Hundreds of inbred mouse strains and intercross populations have been used to characterize the function of genetic variants that contribute to disease. Thousands of disease-relevant traits have been characterized in mice and made publicly available. New strains and populations including consomics, the collaborative cross, expanded BXD, and inbred wild-derived strains add to existing complex disease mouse models, mapping populations, and sensitized backgrounds for engineered mutations. The genome sequences of inbred strains, along with dense genotypes from others, enable integrated analysis of trait-variant associations across populations, but these analyses are hampered by the sparsity of genotypes available. Moreover, the data are not readily interoperable with other resources. To address these limitations, we created a uniformly dense variant resource by harmonizing multiple data sets. Missing genotypes were imputed using the Viterbi algorithm with a data-driven technique that incorporates local phylogenetic information, an approach that is extendable to other model organisms. The result is a web- and programmatically accessible data service called GenomeMUSter, comprising single-nucleotide variants covering 657 strains at 106.8 million segregating sites. Interoperation with phenotype databases, analytic tools, and other resources enables a wealth of applications, including multitrait, multipopulation meta-analysis. We show this in cross-species comparisons of type 2 diabetes and substance use disorder meta-analyses, leveraging mouse data to characterize the likely role of human variant effects in disease. Other applications include refinement of mapped loci and prioritization of strain backgrounds for disease modeling to further unlock extant mouse diversity for genetic and genomic studies in health and disease.
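The abstract names the Viterbi algorithm as the imputation engine but gives no further detail. The following is only a minimal illustrative sketch of how Viterbi decoding can fill missing genotype calls by copying from fully genotyped donor strains. It is not the GenomeMUSter implementation: the function name `impute_viterbi`, the parameters `p_match` and `p_stay`, the uniform start distribution, and the toy strains are all assumptions made for illustration, and the local phylogenetic weighting described in the abstract is not modeled.

```python
import math

def impute_viterbi(target, donors, p_match=0.99, p_stay=0.9):
    """Fill missing calls (None) in `target` by copying from the most likely
    sequence of donor haplotypes, found with the Viterbi algorithm.

    target: non-empty list of alleles, None where the genotype is missing.
    donors: dict mapping donor name -> fully genotyped allele list.
    p_match, p_stay: assumed emission and transition parameters.
    """
    names = list(donors)
    k = len(names)
    log_stay = math.log(p_stay)
    # The switching probability is split evenly over the other donors.
    log_switch = math.log((1.0 - p_stay) / (k - 1)) if k > 1 else float("-inf")

    def log_emit(name, i):
        if target[i] is None:  # missing call: no evidence at this site
            return 0.0
        same = donors[name][i] == target[i]
        return math.log(p_match if same else 1.0 - p_match)

    def log_trans(s, t):
        return log_stay if s == t else log_switch

    # score[s] = best log-probability of any path ending in donor s at this site
    score = {s: -math.log(k) + log_emit(s, 0) for s in names}
    backptrs = []
    for i in range(1, len(target)):
        new_score, ptr = {}, {}
        for t in names:
            prev = max(names, key=lambda s: score[s] + log_trans(s, t))
            new_score[t] = score[prev] + log_trans(prev, t) + log_emit(t, i)
            ptr[t] = prev
        score = new_score
        backptrs.append(ptr)

    # Trace the best path backwards from the best final state.
    state = max(names, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()

    # Keep observed calls; copy the donor allele only where the call is missing.
    imputed = [donors[s][i] if a is None else a
               for i, (a, s) in enumerate(zip(target, path))]
    return imputed, path

# Toy data (hypothetical strains and alleles).
donors = {"strainA": list("ACGTAC"), "strainB": list("ATGCGC")}
target = ["A", "C", None, "T", None, "C"]
imputed, path = impute_viterbi(target, donors)
```

In this toy case the observed calls at the second and fourth sites match only `strainA`, so the decoded path stays on `strainA` and the two missing calls are copied from it.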
Subject(s)
Diabetes Mellitus, Type 2 , Humans , Mice , Animals , Phylogeny , Genotype , Mice, Inbred Strains , Phenotype , Mutation , Genetic Variation
ABSTRACT
Motivation: Allele-specific expression (ASE) refers to the differential abundance of the allelic copies of a transcript. RNA sequencing (RNA-seq) can provide quantitative estimates of ASE for genes with transcribed polymorphisms. When short-read sequences are aligned to a diploid transcriptome, read-mapping ambiguities confound our ability to directly count reads. Multi-mapping reads aligning equally well to multiple genomic locations, isoforms or alleles can comprise the majority (>85%) of reads. Discarding them can result in biases and substantial loss of information. Methods have been developed that use weighted allocation of read counts but these methods treat the different types of multi-reads equivalently. We propose a hierarchical approach to allocation of read counts that first resolves ambiguities among genes, then among isoforms, and lastly between alleles. We have implemented our model in EMASE software (Expectation-Maximization for Allele Specific Expression) to estimate total gene expression, isoform usage and ASE based on this hierarchical allocation. Results: Methods that align RNA-seq reads to a diploid transcriptome incorporating known genetic variants improve estimates of ASE and total gene expression compared to methods that use reference genome alignments. Weighted allocation methods outperform methods that discard multi-reads. Hierarchical allocation of reads improves estimation of ASE even when data are simulated from a non-hierarchical model. Analysis of RNA-seq data from F1 hybrid mice using EMASE reveals widespread ASE associated with cis-acting polymorphisms and a small number of parent-of-origin effects. Availability and implementation: EMASE software is available at https://github.com/churchill-lab/emase. Supplementary information: Supplementary data are available at Bioinformatics online.
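The hierarchical allocation described above (genes first, then isoforms, then alleles) can be illustrated for a single multi-mapping read. This is a sketch of one plausible reading of that description, not EMASE's actual model or API: the function `allocate_read`, the abundance dictionaries, and the gene, isoform, and allele labels are hypothetical. It shows only one allocation step given fixed abundance estimates; a full expectation-maximization procedure would alternate this step with re-estimating the abundances.

```python
def allocate_read(alignments, gene_w, iso_w, allele_w):
    """Split one read's unit count over its alignments hierarchically.

    alignments: list of unique (gene, isoform, allele) tuples the read maps to.
    gene_w, iso_w, allele_w: current abundance estimates (assumed positive),
        keyed by gene, (gene, isoform), and (gene, isoform, allele).
    Returns {alignment: fraction}; the fractions sum to 1.
    """
    def normalise(keys, weights):
        total = sum(weights[key] for key in keys)
        return {key: weights[key] / total for key in keys}

    # Level 1: resolve ambiguity among the genes the read touches.
    genes = sorted({g for g, _, _ in alignments})
    p_gene = normalise(genes, gene_w)

    out = {}
    for g in genes:
        # Level 2: among the isoforms of this gene that the read touches.
        isoforms = sorted({(gg, i) for gg, i, _ in alignments if gg == g})
        p_iso = normalise(isoforms, iso_w)
        for gi in isoforms:
            # Level 3: between the alleles of this isoform that the read touches.
            alleles = [a for a in alignments if a[:2] == gi]
            p_allele = normalise(alleles, allele_w)
            for a in alleles:
                out[a] = p_gene[g] * p_iso[gi] * p_allele[a]
    return out

# Toy abundances (hypothetical labels and values).
gene_w = {"G1": 3.0, "G2": 1.0}
iso_w = {("G1", "T1"): 1.0, ("G1", "T2"): 1.0, ("G2", "T3"): 2.0}
allele_w = {
    ("G1", "T1", "B6"): 1.0, ("G1", "T1", "CAST"): 3.0,
    ("G1", "T2", "B6"): 1.0,
    ("G2", "T3", "B6"): 1.0, ("G2", "T3", "CAST"): 1.0,
}
alignments = list(allele_w)
fractions = allocate_read(alignments, gene_w, iso_w, allele_w)
```

Because each level is normalized only over the candidates the read is compatible with, a read that is ambiguous between genes is not treated the same way as one that is ambiguous only between two alleles of the same isoform, which is the distinction the abstract draws against non-hierarchical weighting.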
Subject(s)
Alleles , Alternative Splicing , Sequence Analysis, RNA/methods , Software , Transcriptome , Animals , Genomics/methods , Male , Mice
ABSTRACT
The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species: humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/.
Subject(s)
Algorithms , Evolution, Molecular , High-Throughput Nucleotide Sequencing/methods , Machine Learning , Proteins/genetics , Sequence Homology, Amino Acid , Animals , Genetic Speciation , Genetic Variation/genetics , Humans , Pattern Recognition, Automated/methods , Software
ABSTRACT
Nanoporous anodic aluminum oxide layers were fabricated on aluminum substrates with systematically varied pore diameters (20-80 nm) and oxide thicknesses (150-500 nm) by controlling the anodizing voltage and time and subsequent pore-widening process conditions. The porous nanostructures were then coated with a thin (only a couple of nanometers thick) Teflon film to make the surface hydrophobic and trap air in the pores. The corrosion resistance of the aluminum substrate was evaluated by a potentiodynamic polarization measurement in 3.5 wt % NaCl solution (saltwater). Results showed that the hydrophobic nanoporous anodic aluminum oxide layer significantly enhanced the corrosion resistance of the aluminum substrate compared to a hydrophilic oxide layer of the same nanostructures, to bare (nonanodized) aluminum with only a natural oxide layer on top, and to the latter coated with a thin Teflon film. The hydrophobic nanoporous anodic aluminum oxide layer with the largest pore diameter and the thickest oxide layer (i.e., the maximized air fraction) resulted in the best corrosion resistance with a corrosion inhibition efficiency of up to 99% for up to 7 days. The results demonstrate that the air impregnating the hydrophobic nanopores can effectively inhibit the penetration of corrosive media into the pores, leading to a significant improvement in corrosion resistance.
Subject(s)
Air , Aluminum Oxide/chemistry , Aluminum/chemistry , Corrosion , Electrodes , Nanopores , Hydrophobic and Hydrophilic Interactions , Microscopy, Electron, Scanning
ABSTRACT
Control of the morphology and hierarchy of the nanopore structures of anodic alumina is investigated by employing stepwise anodizing processes, alternating the two different anodizing modes, including mild anodization (MA) and hard anodization (HA), which are further mediated by a pore-widening (PW) step in between. For the experiment, the MA and HA are applied at the anodizing voltages of 40 and 100 V, respectively, in 0.3 M oxalic acid, at 1 °C, for fixed durations (30 min for MA and 0.5 min for HA), while the intermediate PW is applied in 0.1 M phosphoric acid at 30 °C for different durations. In particular, to examine the effects of the anodizing sequence and the PW time on the morphology and hierarchy of the nanopore structures formed, the stepwise anodization is conducted in two different ways: one with no PW step, such as MA→HA and HA→MA, and the other with the timed PW in between, such as MA→PW→MA, MA→PW→HA, HA→PW→HA, and HA→PW→MA. The results show that both the sequence of the voltage-modulated anodizing modes and the application of the intermediate PW step led to unique three-dimensional morphology and hierarchy of the nanopore structures of the anodic alumina beyond the conventional two-dimensional cylindrical pore geometry. It suggests that the stepwise anodizing process regulated by the sequence of the anodizing modes and the intermediate PW step can allow the design and fabrication of various types of nanopore structures, which can broaden the applications of the nanoporous anodic alumina with greater efficacy and versatility.
ABSTRACT
Hundreds of inbred laboratory mouse strains and intercross populations have been used to functionalize genetic variants that contribute to disease. Thousands of disease-relevant traits have been characterized in mice and made publicly available. New strains and populations including the Collaborative Cross, expanded BXD and inbred wild-derived strains add to the set of complex disease mouse models, genetic mapping resources and sensitized backgrounds against which to evaluate engineered mutations. The genome sequences of many inbred strains, along with dense genotypes from others, could allow integrated analysis of trait-variant associations across populations, but these analyses are not feasible due to the sparsity of genotypes available. Moreover, the data are not readily interoperable with other resources. To address these limitations, we created a uniformly dense data resource by harmonizing multiple variant datasets. Missing genotypes were imputed using the Viterbi algorithm with a data-driven technique that incorporates local phylogenetic information, an approach that is extensible to other model organism species. The result is a web- and programmatically accessible data service called GenomeMUSter (https://muster.jax.org), comprising allelic data covering 657 strains at 106.8M segregating sites. Interoperation with phenotype databases, analytic tools and other resources enables a wealth of applications including multi-trait, multi-population meta-analysis. We demonstrate this in a cross-species comparison of the meta-analysis of Type 2 Diabetes and of substance use disorders, resulting in the more specific characterization of the role of human variant effects in light of mouse phenotype data. Other applications include refinement of mapped loci and prioritization of strain backgrounds for disease modeling to further unlock extant mouse diversity for genetic and genomic studies in health and disease.
ABSTRACT
BACKGROUND: High-density genotyping arrays that measure hybridization of genomic DNA fragments to allele-specific oligonucleotide probes are widely used to genotype single nucleotide polymorphisms (SNPs) in genetic studies, including human genome-wide association studies. Hybridization intensities are converted to genotype calls by clustering algorithms that assign each sample to a genotype class at each SNP. Data for SNP probes that do not conform to the expected pattern of clustering are often discarded, contributing to ascertainment bias and resulting in lost information, as much as 50% in a recent genome-wide association study in dogs. RESULTS: We identified atypical patterns of hybridization intensities that were highly reproducible and demonstrated that these patterns represent genetic variants that were not accounted for in the design of the array platform. We characterized variable intensity oligonucleotide (VINO) probes that display such patterns and are found in all hybridization-based genotyping platforms, including those developed for human, dog, cattle, and mouse. When recognized and properly interpreted, VINOs recovered a substantial fraction of discarded probes and counteracted SNP ascertainment bias. We developed software (MouseDivGeno) that identifies VINOs and improves the accuracy of genotype calling. MouseDivGeno produced highly concordant genotype calls when compared with other methods, but it uniquely identified more than 786,000 VINOs in 351 mouse samples. We used whole-genome sequence from 14 mouse strains to confirm the presence of novel variants explaining 28,000 VINOs in those strains. We also identified VINOs in human HapMap 3 samples, many of which were specific to an African population. Incorporating VINOs in phylogenetic analyses substantially improved the accuracy of a Mus species tree and local haplotype assignment in laboratory mouse strains.
CONCLUSION: The problems of ascertainment bias and missing information due to genotyping errors are widely recognized as limiting factors in genetic studies. We have conducted the first formal analysis of the effect of novel variants on genotyping arrays, and we have shown that these variants account for a large portion of miscalled and uncalled genotypes. Genetic studies will benefit from substantial improvements in the accuracy of their results by incorporating VINOs in their analyses.
Subject(s)
Genome-Wide Association Study , Nucleic Acid Hybridization , Oligonucleotide Probes/chemistry , Algorithms , Animals , Cattle , Cluster Analysis , Dogs , Genotype , Haplotypes , Humans , Mice , Polymorphism, Single Nucleotide , Software
ABSTRACT
Gait and posture are often perturbed in many neurological, neuromuscular, and neuropsychiatric conditions. Rodents provide a tractable model for elucidating disease mechanisms and interventions. Here, we develop a neural-network-based assay that adopts the commonly used open field apparatus for mouse gait and posture analysis. We quantitate both with high precision across 62 strains of mice. We characterize four mutants with known gait deficits and demonstrate that multiple autism spectrum disorder (ASD) models show gait and posture deficits, implying this is a general feature of ASD. Mouse gait and posture measures are highly heritable and fall into three distinct classes. We conduct a genome-wide association study to define the genetic architecture of stride-level mouse movement in the open field. We provide a method for gait and posture extraction from the open field and one of the largest laboratory mouse gait and posture data resources for the research community.
Subject(s)
Gait/genetics , Gait/physiology , Postural Balance/physiology , Animals , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/physiopathology , Deep Learning , Exploratory Behavior , Genome-Wide Association Study/methods , Mice , Movement/physiology , Nerve Net/physiology , Open Field Test/physiology , Postural Balance/genetics
ABSTRACT
Noninvasive live imaging has been used extensively for ocular phenotyping in mouse vision research. Bright-field imaging and optical coherence tomography (OCT) are two methods that are particularly useful for assessing the posterior mouse eye (fundus), including the retina, retinal pigment epithelium, and choroid, and are widely applied due to the commercial availability of sophisticated instruments and software. Here, we provide a guide to using these approaches with an emphasis on post-acquisition image processing using Fiji, a bundled version of the Java-based public domain software ImageJ. A bright-field fundus imaging protocol is described for acquisition of multi-frame videos, followed by image registration to reduce motion artifacts, averaging to reduce noise, shading correction to compensate for uneven illumination, filtering to improve image detail, and rotation to adjust orientation. An OCT imaging protocol is described for acquiring replicate volume scans, with subsequent registration and averaging to yield three-dimensional datasets that show reduced motion artifacts and enhanced detail. The Fiji algorithms used in these protocols are designed for batch processing and are freely available. The image acquisition and processing approaches described here may facilitate quantitative phenotyping of the mouse eye in drug discovery, mutagenesis screening, and the functional cataloging of mouse genes by individual laboratories and large-scale projects, such as the Knockout Mouse Phenotyping Project and International Mouse Phenotyping Consortium.
Subject(s)
Image Processing, Computer-Assisted/instrumentation , Retina/diagnostic imaging , Tomography, Optical Coherence/methods , Algorithms , Animals , Fundus Oculi , Mice , Mice, Knockout , Models, Animal , Software , Tomography, Optical Coherence/instrumentation
ABSTRACT
Asthma is a common chronic respiratory disease characterized by airway hyperresponsiveness (AHR). The genetics of asthma have been widely studied in mouse and human, and homologous genomic regions have been associated with mouse AHR and human asthma-related phenotypes. Our goal was to identify asthma-related genes by integrating AHR associations in mouse with human genome-wide association study (GWAS) data. We used Efficient Mixed Model Association (EMMA) analysis to conduct a GWAS of baseline AHR measures from males and females of 31 mouse strains. Genes near or containing SNPs with EMMA p-values <0.001 were selected for further study in human GWAS. The results of the previously reported EVE consortium asthma GWAS meta-analysis consisting of 12,958 diverse North American subjects from 9 study centers were used to select a subset of homologous genes with evidence of association with asthma in humans. Following validation attempts in three human asthma GWAS (i.e., Sepracor/LOCCS/LODO/Illumina, GABRIEL, DAG) and two human AHR GWAS (i.e., SHARP, DAG), the Kv channel interacting protein 4 (KCNIP4) gene was identified as nominally associated with both asthma and AHR at both the gene and SNP levels. In EVE, the smallest KCNIP4 association P-value was 2.9e-04 at rs6833065, while the smallest P-values for Sepracor/LOCCS/LODO/Illumina, GABRIEL, DAG were 1.5e-03, 1.0e-03, 3.1e-03 at rs7664617, rs4697177, rs4696975, respectively. At a SNP level, the strongest association across all asthma GWAS was at rs4697177 (P-value 1.1e-04). The smallest P-values for association with AHR were 2.3e-03 at rs11947661 in SHARP and 2.1e-03 at rs402802 in DAG. Functional studies are required to validate the potential involvement of KCNIP4 in modulating asthma susceptibility and/or AHR. Our results suggest that a useful approach to identify genes associated with human asthma is to leverage mouse AHR association data.
Subject(s)
Asthma/genetics , Kv Channel-Interacting Proteins/genetics , Polymorphism, Single Nucleotide , Animals , Base Sequence , Female , Genome-Wide Association Study , Genotype , Humans , Male , Mice , Phenotype
ABSTRACT
Quantitative trait locus (QTL) analysis is a statistical method to link phenotypes with regions of the genome that affect the phenotypes in a mapping population. R/qtl is a powerful statistical program commonly used for analyzing rodent QTL crosses, but R/qtl is a command line program that can be difficult for novice users to run. J/qtl was developed as an R/qtl graphical user interface that enables even novice users to utilize R/qtl for QTL analyses. In this chapter, we describe the process for analyzing rodent cross data with J/qtl, including data formatting, data quality control, main scan QTL analysis, pair scan QTL analysis, and multiple regression modeling; this information should enable new users to identify QTL affecting phenotypes of interest within their rodent cross datasets.
Subject(s)
Chromosome Mapping/methods , Quantitative Trait Loci/genetics , Software , Algorithms , Animals , Humans , Quality Control , Regression Analysis
ABSTRACT
The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics content of each state's biology standards was analyzed and categorized into nine areas: Human Genome Project/genomics, forensics, evolution, classification, nucleotide variations, medicine, computer use, agriculture/food technology, and science technology and society/socioscientific issues. Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas, with Human Genome Project/genomics and computer use being the lowest (8%), and evolution being the highest (64%) among states' science frameworks. This essay concludes with recommendations for reworking/rewording existing standards to facilitate the goal of promoting science literacy among secondary school students.
Subject(s)
Biology/education , Computational Biology/education , Curriculum/standards , Schools/standards , Biology/standards , Humans
ABSTRACT
This essay describes how in the 1890s the Committee of Ten arrived at their recommendations about the organization of the high school biological sciences and seeks to correct the frequently held, but erroneous view that the Committee of Ten was the initiator of the Biology-Chemistry-Physics order of teaching sciences prevalent in high schools today. The essay details the factors underlying the changing views of high school biology from its "natural history" origins, through its "zoology, botany, physiology" disciplinary phase to its eventual integration into a "general biology" course. The simultaneous parallel development of the "Carnegie Unit" for measuring coursework is highlighted as a significant contributor in the evolution of the present day high school biology course. The essay concludes with a discussion of the implications of the grade placement of the sciences for the future development of high school biology.