ABSTRACT
The strength of the stalk rind, measured as rind penetrometer resistance (RPR), is an important contributor to stalk lodging resistance. To better understand the genetic architecture of RPR, we combined selection mapping on populations developed by 15 cycles of divergent selection for high and low RPR with time-course transcriptomic and metabolic analyses of the stalks. Divergent selection significantly altered allele frequencies of 3,656 and 3,412 single-nucleotide polymorphisms (SNPs) in the high and low RPR populations, respectively. Surprisingly, only 110 (1.56%) SNPs under selection were common to both populations, while the majority (98.4%) were unique to each population. This result indicated that high and low RPR phenotypes are produced by biologically distinct mechanisms. Remarkably, regions harboring lignin and polysaccharide genes were preferentially selected in high and low RPR populations, respectively. The preferential selection was manifested as higher lignification and increased saccharification of the high and low RPR stalks, respectively. The evolution of distinct gene classes according to the direction of selection was unexpected in the context of parallel evolution and demonstrated that selection for a trait, albeit in different directions, does not necessarily act on the same genes. Tricin, a grass-specific monolignol that initiates the incorporation of lignin in the cell walls, emerged as a key determinant of RPR. Integration of selection mapping and transcriptomic analyses with published genetic studies of RPR identified several candidate genes including ZmMYB31, ZmNAC25, ZmMADS1, ZmEXPA2, ZmIAA41 and hk5. These findings provide a foundation for an enhanced understanding of RPR and the improvement of stalk lodging resistance.
Subject(s)
Zea mays/genetics , Cell Wall/metabolism , Molecular Evolution , Gene Expression Profiling , Gene Frequency , Metabolomics , Single Nucleotide Polymorphism/genetics , Heritable Quantitative Trait , Zea mays/anatomy & histology
ABSTRACT
Genome-wide association studies (GWAS) have identified loci linked to hundreds of traits in many different species. Yet, because linkage disequilibrium implicates a broad region surrounding each identified locus, the causal genes often remain unknown. This problem is especially pronounced in nonhuman, nonmodel species, where functional annotations are sparse and there is frequently little information available for prioritizing candidate genes. We developed a computational approach, Camoco, that integrates loci identified by GWAS with functional information derived from gene coexpression networks. Using Camoco, we prioritized candidate genes from a large-scale GWAS examining the accumulation of 17 different elements in maize (Zea mays) seeds. Strikingly, the performance of our approach depended strongly on the type of coexpression network used: expression variation across genetically diverse individuals in a relevant tissue context (in our case, roots, which are the primary elemental uptake and delivery system) outperformed the alternative networks. Two candidate genes identified by our approach were validated using mutants. Our study demonstrates that coexpression networks provide a powerful basis for prioritizing candidate causal genes from GWAS loci but suggests that the success of such strategies can depend strongly on the context of the gene expression data. Both the software and the lessons on integrating GWAS data with coexpression networks generalize to species beyond maize.
Subject(s)
Genome-Wide Association Study/methods , Zea mays/genetics , Linkage Disequilibrium/genetics , Software
ABSTRACT
High-quality genomic tools have been integral in understanding genomic architecture and function in the modern-day horse. The equine genetics community has a long tradition of pooling resources to develop genomic tools. Since the equine genome was sequenced in 2006, several iterations of high throughput genotyping arrays have been developed and released, enabling rapid and cost-effective genotyping. This review highlights the design considerations of each iteration, focusing on data available during development and outlining considerations in selecting the genetic variants included on each array. Additionally, we outline recent applications of equine genotyping arrays as well as future prospects and applications.
Subject(s)
Genotyping Techniques/veterinary , Horses/genetics , Animals , Genomics , Genotype , Horse Diseases/genetics , Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: Severe equine asthma, also known as recurrent airway obstruction (RAO), is a debilitating, performance-limiting, obstructive respiratory condition in horses that is phenotypically similar to human asthma. Past genome-wide association studies (GWAS) have not discovered coding variants associated with RAO, leading to the hypothesis that the causative variant(s) underlying the signals are likely non-coding, regulatory variant(s). Regions of the genome containing variants that influence the number of expressed RNA molecules are expression quantitative trait loci (eQTLs). Variation associated with RAO that also regulates a gene's expression in a disease-relevant tissue could help identify candidate genes that influence RAO if that gene's expression is also associated with RAO disease status. RESULTS: We searched for eQTLs by analyzing peripheral blood mononuclear cells (PBMCs) from two half-sib families and one unrelated cohort of 82 European Warmblood horses that were previously treated in vitro with: no stimulation (MCK), lipopolysaccharides (LPS), recombinant cyathostomin antigen (RCA), and hay-dust extract (HDE). We identified high confidence eQTLs that did not violate linear modeling assumptions and were not significant due to single outlier individuals. We identified a mean of 4347 high confidence eQTLs in four treatments of PBMCs, and discovered two trans regulatory hotspots regulating genes involved in related biological pathways. We corroborated previous RAO-associated single nucleotide polymorphisms (SNPs), and increased the resolution of past GWAS by analyzing 1,056,195 SNPs in 361 individuals. We identified four RAO-associated SNPs that only regulate gene expression of dexamethasone-induced protein (DEXI); however, we found no significant association between DEXI gene expression and presence of RAO. CONCLUSIONS: Thousands of genetic variants regulate gene expression in PBMCs of European Warmblood horses in cis and trans. Most high confidence eSNPs are significantly enriched near the transcription start sites of their target genes. Two trans regulatory hotspots on chromosomes 11 and 13 regulate many genes involved in transmembrane cell signaling and neurological development, respectively, when PBMCs are treated with HDE. None of the top fifteen RAO-associated SNPs strongly influence disease status through gene expression regulation.
Subject(s)
Asthma/veterinary , Gene Expression Profiling/veterinary , Horse Diseases/genetics , Single Nucleotide Polymorphism , Quantitative Trait Loci , Animals , Asthma/chemically induced , Asthma/genetics , Dust , Gene Expression Regulation , Gene Regulatory Networks/drug effects , Genetic Predisposition to Disease , Genome-Wide Association Study/veterinary , Horse Diseases/chemically induced , Horses , Mononuclear Leukocytes/drug effects , Lipopolysaccharides/adverse effects
ABSTRACT
BACKGROUND: To date, genome-scale analyses in the domestic horse have been limited by suboptimal single nucleotide polymorphism (SNP) density and uneven genomic coverage of the current SNP genotyping arrays. The recent availability of whole genome sequences has created the opportunity to develop a next-generation, high-density equine SNP array. RESULTS: Using whole genome sequence from 153 individuals representing 24 distinct breeds collated by the equine genomics community, we cataloged over 23 million de novo discovered genetic variants. Leveraging data from individuals with both whole genome sequence and genotypes from lower-density, legacy SNP arrays, we selected a subset of ~5 million high-quality, high-density array candidate SNPs based on breed representation and uniform spacing across the genome. Considering probe design recommendations from a commercial vendor (Affymetrix, now Thermo Fisher Scientific), a set of ~2 million SNPs was selected for a next-generation high-density SNP chip (MNEc2M). Genotype data were generated using the MNEc2M array from a cohort of 332 horses from 20 breeds, and a lower-density array, consisting of ~670 thousand SNPs (MNEc670k), was designed for genotype imputation. CONCLUSIONS: Here, we document the steps taken to design both the MNEc2M and MNEc670k arrays, report genomic and technical properties of these genotyping platforms, and demonstrate the imputation capabilities of these tools for the domestic horse.
Subject(s)
Genotyping Techniques/methods , Horses/genetics , Oligonucleotide Array Sequence Analysis/methods , Single Nucleotide Polymorphism , Animals , Gene Frequency , Genotyping Techniques/standards , Linkage Disequilibrium , Oligonucleotide Array Sequence Analysis/standards , Reference Standards , Whole Genome Sequencing
ABSTRACT
A gene's response to an environment is tightly bound to the underlying genetic variation present in an individual's genome and varies greatly depending on the tissue in which it is expressed. Gene co-expression networks provide a mechanism to understand and interpret the collective transcriptional responses of genes. Here, we use the Camoco co-expression network framework to characterize the transcriptional landscape of adipose and gluteal muscle tissue in 83 domestic horses (Equus caballus) representing 5 different breeds. In each tissue, gene expression profiles, capturing transcriptional response due to variation across individuals, were used to build two separate, tissue-focused, genotypically diverse gene co-expression networks. The aim of our study was to identify significantly co-expressed clusters of genes in each tissue, then compare the clusters across networks to quantify the extent to which clusters were found in both networks as well as to identify clusters found in a single network. The known and unknown functions for each network were quantified using complementary supervised and unsupervised approaches. First, supervised ontological enrichment was used to quantify biological functions represented by each network. Curated ontologies (GO and KEGG) were used to measure the known biological functions present in each tissue. Overall, a large percentage of terms (40.3% of GO and 41% of KEGG) were co-expressed in at least one tissue. Many terms were co-expressed in both tissues; however, a small proportion of terms exhibited single-tissue co-expression, suggesting functional differentiation based on curated functional annotation. To complement this, an unsupervised approach not relying on ontologies was employed. Strongly co-expressed sets of genes defined by Markov clustering identified sets of unannotated genes showing similar patterns of co-expression within a tissue. We compared gene sets across tissues and identified clusters of genes that either segregate in co-expression by tissue or exhibit high levels of co-expression in both tissues. Clusters were also integrated with GO and KEGG ontologies to identify gene sets containing previously curated annotations versus unannotated gene sets indicating potentially novel biological function. Coupling together these transcriptional datasets, we mapped the transcriptional landscape of muscle and adipose, setting up a generalizable framework for interpreting gene function for additional tissues in the horse and other species.
ABSTRACT
Landscape genetics is an emerging discipline that utilizes environmental and historical data to understand geographic patterns of genetic diversity. Niche modelling has added a new dimension to such efforts by allowing species-environmental associations to be projected into the past so that hypotheses about historical vicariance can be generated and tested independently with genetic data. However, previous approaches have primarily utilized DNA sequence data to test inferences about historical isolation and may have missed very recent episodes of environmentally mediated divergence. We type 15 microsatellite loci in California mule deer and identify five genetic groupings through a Structure analysis that are also well predicted by environmental data. We project the niches of these five deer ecotypes to the last glacial maximum (LGM) and show they overlap to a much greater extent than today, suggesting that vicariance associated with the LGM cannot explain the present-day genetic patterns. Further, we analyse mitochondrial DNA (mtDNA) sequence trees to search for evidence of historical vicariance and find only two well-supported clades. A coalescence-based analysis of mtDNA data shows that the genetic divergence of the mule deer genetic clusters in California is recent and appears to be mediated by ecological factors. The importance of environmental factors in explaining the genetic diversity of California mule deer is unexpected given that they are a highly mobile species with a broad habitat distribution. Geographic differences in the timing of reproduction and peak vegetation as well as habitat choice reflecting natal origin may explain the persistence of genetic subdivision.
Subject(s)
Deer/genetics , Genetic Variation , Population Genetics , Animals , California , Cluster Analysis , Mitochondrial DNA/genetics , Ecosystem , Environment , Molecular Evolution , Female , Geography , Haplotypes , Male , Microsatellite Repeats , Genetic Models , Population Dynamics , Sequence Alignment , DNA Sequence Analysis
ABSTRACT
Selective breeding for athletic performance in various disciplines has resulted in population stratification within the American Quarter Horse (QH) breed. The goals of this study were to utilize high-density genotype data to: (1) identify genomic regions undergoing positive selection within and among QH subpopulations; (2) investigate haplotype structure within each QH subpopulation; and (3) identify candidate genes within genomic regions of interest (ROI), as well as biological pathways, predicted to play a role in elite performance in each group. To this end, 65K SNP genotyping data on 143 elite individuals from 6 QH subpopulations (cutting, halter, racing, reining, western pleasure, and working cow) were imputed to 2M SNPs. Signatures of selection were identified using FST-based (di) and haplotype-based (hapFLK) analyses, accompanied by identification of local haplotype structure and sharing within subpopulations (hapQTL). Regions undergoing positive selection were identified on all 31 autosomes, and ROI on 2 chromosomes were identified by all 3 methods combined. Genes within each ROI were retrieved and used to identify pathways and genes that might contribute to performance in each subpopulation. These included, among others, candidate genes associated with skeletal muscle development, metabolism, and central nervous system development. This work improves our understanding of equine breed development, and provides breeders with a better understanding of how selective breeding impacts the performance of QH populations.
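As a rough illustration of the quantity underlying FST-based selection scans, the following minimal sketch computes Wright's FST for a single biallelic SNP in two equally weighted populations. It is not the di or hapFLK statistic used in the study; the function name and the allele frequencies are hypothetical and chosen only for illustration.

```python
def fst_biallelic(p1, p2):
    """Wright's F_ST for one biallelic SNP, two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        # Allele fixed (or absent) in both populations: no differentiation
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two subpopulations
print(fst_biallelic(0.8, 0.2))
```

Values near 0 indicate similar allele frequencies in both populations, while values near 1 indicate near-complete differentiation at that SNP.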
ABSTRACT
Co-expression networks have been shown to be a powerful tool for inferring a gene's function when little is known about it. With the advent of next generation sequencing technologies, the construction and analysis of co-expression networks is now possible in non-model species, including those with agricultural importance. Here, we review fundamental concepts in the construction and application of co-expression networks with a focus on agricultural crops. We survey past and current applications of co-expression network analysis in several agricultural species and provide perspective on important considerations that arise when analyzing network relationships. We conclude with a perspective on future directions and potential challenges of utilizing this powerful approach in crops. This article is part of a Special Issue entitled: Plant Gene Regulatory Mechanisms and Networks, edited by Dr. Erich Grotewold and Dr. Nathan Springer.
Subject(s)
Agricultural Crops/genetics , Plant Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Agriculture/methods
ABSTRACT
Nonsyndromic cleft lip with or without cleft palate (NSCL/P) is one of the most common congenital birth defects. NSCL/P is a complex multifactorial disease caused by interactions between multiple environmental and genetic factors. However, the causal single nucleotide polymorphism (SNP) signature profile underlying the risk of familial NSCL/P remains unknown. We previously reported a 5.7-Mb genomic region at the chromosome 18q21.1 locus that potentially contributes to autosomal dominant, low-penetrance inheritance of NSCL/P. In the current study, we performed exome sequencing on 12 familial genomes (six affected individuals, two obligate carriers, and four seemingly unaffected individuals) of a six-generation family to identify candidate SNPs associated with NSCL/P risk. Subsequently, targeted bidirectional DNA re-sequencing of polymerase chain reaction (PCR)-amplified high-risk regions of the MYO5B gene and Sequenom iPLEX genotyping of 29 candidate SNPs were performed on a larger set of 33 members of this NSCL/P family (10 affected + 4 obligate carriers + 19 unaffected relatives) to find SNPs significantly associated with the NSCL/P trait. SNP vs. NSCL/P association analysis showed that the MYO5B SNP rs183559995 GA genotype had an odds ratio of 18.09 (95% Confidence Interval = 1.86-176.34; gender-adjusted P = 0.0019) compared to the reference GG genotype. Additionally, the following SNPs were found to be significantly associated with NSCL/P risk: rs1450425 (LOXHD1), rs6507992 (SKA1), rs78950893 (SMAD7), rs8097060, rs17713847 (SCARNA17), rs6507872 (CTIF), rs8091995 (CTIF), and rs17715416 (MYO5B). We thus identified SNPs in several genes as key candidates associated with the risk of NSCL/P in this large multi-generation family.
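For readers unfamiliar with how an odds ratio and its confidence interval are obtained, the following minimal sketch computes an unadjusted odds ratio with a Wald 95% interval from a 2x2 genotype-by-status table. The counts are hypothetical and are not the study's data, and the reported estimate was gender-adjusted, which this simple calculation does not attempt.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: affected carriers of the risk genotype
    b: unaffected carriers of the risk genotype
    c: affected individuals with the reference genotype
    d: unaffected individuals with the reference genotype
    All four cells must be non-zero for this formula to apply.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio_wald_ci(a=6, b=2, c=4, d=12)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

With small cell counts the interval becomes very wide, as it does for the interval reported in the abstract above.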
ABSTRACT
Tools that provide improved ability to relate genotype to phenotype have the potential to accelerate breeding for desired traits and to improve our understanding of the molecular variants that underlie phenotypes. The availability of large-scale gene expression profiles in maize provides an opportunity to advance our understanding of complex traits in this agronomically important species. We built co-expression networks based on genome-wide expression data from a variety of maize accessions as well as an atlas of different tissues and developmental stages. We demonstrate that these networks reveal clusters of genes that are enriched for known biological function and contain extensive structure which has yet to be characterized. Furthermore, we found that co-expression networks derived from developmental or tissue atlases as compared to expression variation across diverse accessions capture unique functions. To provide convenient access to these networks, we developed a public, web-based Co-expression Browser (COB), which enables interactive queries of the genome-wide networks. We illustrate the utility of this system through two specific use cases: one in which gene-centric queries are used to provide functional context for previously characterized metabolic pathways, and a second where lists of genes produced by mapping studies are further resolved and validated using co-expression networks.
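The general idea of a co-expression network can be sketched in a few lines: genes are linked when their expression profiles across samples are strongly correlated. This is a minimal illustration under stated assumptions, not the method used to build the networks described above; the gene names, expression values, and the 0.8 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # A constant profile carries no co-expression signal
        return 0.0
    return cov / (sd_x * sd_y)


def coexpression_edges(expr, threshold=0.8):
    """Return {(gene1, gene2): r} for gene pairs with |r| >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Hypothetical expression values for three genes across four samples
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
for pair, r in coexpression_edges(expr).items():
    print(pair, round(r, 3))
```

The choice of samples matters: profiles measured across diverse accessions and profiles measured across tissues or developmental stages can produce different edges for the same genes, which is the contrast the abstract draws.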