ABSTRACT
BACKGROUND: Mitochondrial genomes differ from the nuclear genome, and in humans mitochondrial variants are known to contribute to genetic disorders. Prior to genomics, some livestock studies assessed the role of the mitochondrial genome but these were limited and inconclusive. Modern genome sequencing provides an opportunity to re-evaluate the potential impact of mitochondrial variation on livestock traits. This study first evaluated the empirical accuracy of mitochondrial sequence imputation and then used real and imputed mitochondrial sequence genotypes to study the role of mitochondrial variants in milk production traits of dairy cattle. RESULTS: The empirical accuracy of imputation from single nucleotide polymorphism (SNP) panels to mitochondrial sequence genotypes was assessed in 516 test animals of Holstein, Jersey and Red breeds using Beagle software and a sequence reference of 1883 animals. The overall accuracy, estimated as the squared Pearson correlation (R2) between all imputed and real genotypes across all animals, was 0.454. The low accuracy was attributed partly to the majority of variants having low minor allele frequency (MAF < 0.005) and partly to poor imputation accuracy of variants in the hypervariable D-loop region. Beagle software provides an internal estimate of imputation accuracy (DR2), and 10 percent of the total 1927 imputed positions showed DR2 greater than 0.9 (N = 201). There were 151 sites with empirical R2 > 0.9 (of 954 variants segregating in the test animals) and 138 of these overlapped the sites with DR2 > 0.9. This suggests that the DR2 statistic is a reasonable proxy to select sites that are imputed with higher accuracy for downstream analyses. Accordingly, in the second part of the study mitochondrial sequence variants were imputed from real mitochondrial SNP panel genotypes of 9515 Australian Holstein, Jersey and Red dairy cattle.
Then, using only sites with DR2 > 0.900 and real genotypes, we undertook a genome-wide association study (GWAS) for milk, fat and protein yields. The GWAS mitochondrial SNP effects were not significant. CONCLUSION: The accuracy of imputation of mitochondrial genotypes from the SNP panel to sequence was generally low. The Beagle DR2 statistic enabled selection of sites imputed with higher empirical accuracy. We recommend building larger reference populations with mitochondrial sequence to improve the accuracy of imputing less common variants and ensuring that SNP panels include common variants in the D-loop region.
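The comparison between empirical accuracy and the DR2 estimate described in this abstract can be illustrated with a minimal sketch. It assumes haploid genotypes coded 0/1 and uses toy data; the site names, genotype values, DR2 values and helper functions are hypothetical and are not taken from the study or from Beagle output.

```python
def squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # site not segregating: correlation is undefined
    return (sxy * sxy) / (sxx * syy)


def sites_passing(values, threshold=0.9):
    """Return the set of sites whose statistic exceeds the threshold."""
    return {site for site, v in values.items() if v is not None and v > threshold}


# Hypothetical toy genotypes for six test animals at three sites.
real = {
    "site_a": [0, 0, 1, 1, 0, 1],
    "site_b": [0, 1, 0, 1, 0, 1],
    "site_c": [0, 0, 0, 0, 0, 0],  # not segregating in the test animals
}
imputed = {
    "site_a": [0, 0, 1, 1, 0, 1],
    "site_b": [0, 0, 0, 1, 1, 1],
    "site_c": [0, 0, 0, 0, 0, 0],
}
# Hypothetical software-reported accuracy estimates (stand-ins for DR2).
dr2 = {"site_a": 0.97, "site_b": 0.41, "site_c": 0.95}

empirical_r2 = {site: squared_pearson(imputed[site], real[site]) for site in real}
high_empirical = sites_passing(empirical_r2)
high_dr2 = sites_passing(dr2)
overlap = high_empirical & high_dr2

print("empirical R2 > 0.9:", sorted(high_empirical))
print("DR2 > 0.9:", sorted(high_dr2))
print("overlap:", sorted(overlap))
```

In this toy example a non-segregating site can still carry a high reported estimate, which is one reason the empirical count in the abstract is expressed relative to the variants segregating in the test animals.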
Subjects
Milk, Single Nucleotide Polymorphism, Animals, Cattle/genetics, Milk/metabolism, Genotype, Mitochondrial Genome, Gene Frequency, Female, Mitochondrial DNA/genetics, Genome-Wide Association Study/methods, Software
ABSTRACT
BACKGROUND: Female fertility is an important trait in dairy cattle. Identifying putative causal variants associated with fertility may help to improve the accuracy of genomic prediction of fertility. Combining expression data (eQTL) of genes, exons, gene splicing and allele specific expression is a promising approach to fine map QTL to get closer to the causal mutations. Another approach is to identify genomic differences between cows selected for high and low fertility and a selection experiment in New Zealand has created exactly this resource. Our objective was to combine multiple types of expression data, fertility traits and allele frequency in high- (POS) and low-fertility (NEG) cows with a genome-wide association study (GWAS) on calving interval in Australian cows to fine-map QTL associated with fertility in both Australia and New Zealand dairy cattle populations. RESULTS: Variants that were significantly associated with calving interval (CI) were strongly enriched for variants associated with gene, exon, gene splicing and allele-specific expression, indicating that there is substantial overlap between QTL associated with CI and eQTL. We identified 671 genes with significant differential expression between POS and NEG cows, with the largest fold change detected for the CCDC196 gene on chromosome 10. Our results provide numerous candidate genes associated with female fertility in dairy cattle, including GYS2 and TIGAR on chromosome 5 and SYT3 and HSD17B14 on chromosome 18. Multiple QTL regions were located in regions with large numbers of copy number variants (CNV). To identify the causal mutations for these variants, long read sequencing may be useful. CONCLUSIONS: Variants that were significantly associated with CI were highly enriched for eQTL. We detected 671 genes that were differentially expressed between POS and NEG cows. Several QTL detected for CI overlapped with eQTL, providing candidate genes for fertility in dairy cattle.
Subjects
Fertility, Genome-Wide Association Study, Quantitative Trait Loci, Animals, Cattle/genetics, Fertility/genetics, Female, Genome-Wide Association Study/veterinary, Single Nucleotide Polymorphism, Chromosome Mapping, Gene Frequency
ABSTRACT
BACKGROUND: Mastitis is a disease that incurs significant costs in the dairy industry. A promising approach to mitigate its negative effects is to genetically improve the resistance of dairy cattle to mastitis. A meta-analysis of genome-wide association studies (GWAS) across multiple breeds for clinical mastitis (CM) and its indicator trait, somatic cell score (SCS), is a powerful method to identify functional genetic variants that impact mastitis resistance. RESULTS: We conducted meta-analyses of eight and fourteen GWAS on CM and SCS, respectively, using 30,689 and 119,438 animals from six dairy cattle breeds. Methods for the meta-analyses were selected to properly account for the multi-breed structure of the GWAS data. Our study revealed 58 lead markers that were associated with mastitis incidence, including 16 loci that did not overlap with previously identified quantitative trait loci (QTL), as curated at the Animal QTLdb. Post-GWAS analysis techniques such as gene-based analysis and genomic feature enrichment analysis enabled prioritization of 31 candidate genes and 14 credible candidate causal variants that affect mastitis. CONCLUSIONS: Our list of candidate genes can help to elucidate the genetic architecture underlying mastitis resistance and provide better tools for the prevention or treatment of mastitis, ultimately contributing to more sustainable animal production.
Subjects
Disease Resistance, Genome-Wide Association Study, Bovine Mastitis, Quantitative Trait Loci, Animals, Cattle/genetics, Bovine Mastitis/genetics, Female, Genome-Wide Association Study/methods, Genome-Wide Association Study/veterinary, Disease Resistance/genetics, Single Nucleotide Polymorphism, Breeding/methods
ABSTRACT
Studies have demonstrated that structural variants (SV) play a substantial role in the evolution of species and have an impact on Mendelian traits in the genome. However, unlike small variants (< 50 bp), it has been challenging to accurately identify and genotype SV at the population scale using short-read sequencing. Long-read sequencing technologies are becoming competitively priced and can address several of the disadvantages of short-read sequencing for the discovery and genotyping of SV. In livestock species, analysis of SV at the population scale still faces challenges due to the lack of resources, high costs, technological barriers, and computational limitations. In this review, we summarize recent progress in the characterization of SV in the major livestock species, the obstacles that still need to be overcome, as well as the future directions in this growing field. It seems timely that research communities pool resources to build global population-scale long-read sequencing consortiums for the major livestock species for which the application of genomic tools has become cost-effective.
Subjects
Genomics, Livestock, Animals, Livestock/genetics, Genotype, Phenotype
ABSTRACT
BACKGROUND: Heat tolerance is a trait of economic importance in the context of warm climates and the effects of global warming on livestock production, reproduction, health, and well-being. This study investigated the improvement in prediction accuracy for heat tolerance when selected sets of sequence variants from a large genome-wide association study (GWAS) were combined with a standard 50k single nucleotide polymorphism (SNP) panel used by the dairy industry. METHODS: Over 40,000 dairy cattle with genotype and phenotype data were analysed. The phenotypes used to measure an individual's heat tolerance were defined as the rate of decline in milk production traits with rising temperature and humidity. We used Holstein and Jersey cows to select sequence variants linked to heat tolerance. The prioritised sequence variants were the most significant SNPs passing a GWAS p-value threshold selected based on sliding 100-kb windows along each chromosome. We used a bull reference set to develop the genomic prediction equations, which were then validated in an independent set of Holstein, Jersey, and crossbred cows. Prediction analyses were performed using the BayesR, BayesRC, and GBLUP methods. RESULTS: The accuracy of genomic prediction for heat tolerance improved by up to 0.07, 0.05, and 0.10 units in Holstein, Jersey, and crossbred cows, respectively, when sets of selected sequence markers from Holstein cows were added to the 50k SNP panel. However, in some scenarios, the prediction accuracy decreased unexpectedly with the largest drop of - 0.10 units for the heat tolerance fat yield trait observed in Jersey cows when 50k plus pre-selected SNPs from Holstein cows were used. Using pre-selected SNPs discovered on a combined set of Holstein and Jersey cows generally improved the accuracy, especially in the Jersey validation. 
In addition, combining Holstein and Jersey bulls in the reference set generally improved prediction accuracy in most scenarios compared to using only Holstein bulls as the reference set. CONCLUSIONS: Informative sequence markers can be prioritised to improve the genomic prediction of heat tolerance in different breeds. In addition to providing biological insight, these variants could also have a direct application for developing customized SNP arrays or can be used via imputation in current industry SNP panels.
Subjects
Genome-Wide Association Study, Thermotolerance, Animals, Cattle/genetics, Female, Genome, Genomics/methods, Genotype, Male, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: Urinary nitrogen leakage is an environmental concern in dairy cattle. Selection for reduced urinary nitrogen leakage may be done using indicator traits such as milk urea nitrogen (MUN). The result of a previous study indicated that the genetic correlation between MUN in Australia (AUS) and MUN in New Zealand (NZL) was only low to moderate (between 0.14 and 0.58). In this context, an alternative is to select sequence variants based on genome-wide association studies (GWAS) with a view to improve genomic prediction accuracies. A GWAS can also be used to detect quantitative trait loci (QTL) associated with MUN. Therefore, our objectives were to perform within-country GWAS and a meta-GWAS for MUN using records from up to 33,873 dairy cows and imputed whole-genome sequence data, to compare QTL detected in the GWAS for MUN in AUS and NZL, and to use sequence variants selected from the meta-GWAS to improve the prediction accuracy for MUN based on a joint AUS-NZL reference set. RESULTS: Using the meta-GWAS, we detected 14 QTL for MUN, located on chromosomes 1, 6, 11, 14, 19, 22, 26 and the X chromosome. The three most significant QTL encompassed the casein genes on chromosome 6, PAEP on chromosome 11 and DGAT1 on chromosome 14. We selected 50,000 sequence variants that had the same direction of effect for MUN in AUS and MUN in NZL and that were most significant in the meta-analysis for the GWAS. The selected sequence variants yielded a genetic correlation between MUN in AUS and MUN in NZL of 0.95 and substantially increased prediction accuracy in both countries. CONCLUSIONS: Our results demonstrate how the sharing of data between two countries can increase the power of a GWAS and increase the accuracy of genomic prediction using a multi-country reference population and sequence variants selected based on a meta-GWAS.
Subjects
Genome-Wide Association Study, Milk, Animals, Australia, Cattle/genetics, Female, Genomics, Lactation/genetics, Milk/chemistry, New Zealand, Nitrogen, Urea/analysis
ABSTRACT
BACKGROUND: Sharing individual phenotype and genotype data between countries is complex and fraught with potential errors, while sharing summary statistics of genome-wide association studies (GWAS) is relatively straightforward, and thus would be especially useful for traits that are expensive or difficult-to-measure, such as feed efficiency. Here we examined: (1) the sharing of individual cow data from international partners; and (2) the use of sequence variants selected from GWAS of international cow data to evaluate the accuracy of genomic estimated breeding values (GEBV) for residual feed intake (RFI) in Australian cows. RESULTS: GEBV for RFI were estimated using genomic best linear unbiased prediction (GBLUP) with 50k or high-density single nucleotide polymorphisms (SNPs), from a training population of 3797 individuals in univariate to trivariate analyses where the three traits were RFI phenotypes calculated using 584 Australian lactating cows (AUSc), 824 growing heifers (AUSh), and 2526 international lactating cows (OVE). Accuracies of GEBV in AUSc were evaluated by either cohort-by-birth-year or fourfold random cross-validations. GEBV of AUSc were also predicted using only the AUS training population with a weighted genomic relationship matrix constructed with SNPs from the 50k array and sequence variants selected from a meta-GWAS that included only international datasets. The genomic heritabilities estimated using the AUSc, OVE and AUSh datasets were moderate, ranging from 0.20 to 0.36. The genetic correlations (rg) of traits between heifers and cows ranged from 0.30 to 0.95 but were associated with large standard errors. The mean accuracies of GEBV in Australian cows were up to 0.32 and almost doubled when either overseas cows, or both overseas cows and AUS heifers were included in the training population. They also increased when selected sequence variants were combined with 50k SNPs, but with a smaller relative increase. 
CONCLUSIONS: The accuracy of RFI GEBV increased when international data were used or when selected sequence variants were combined with 50k SNP array data. This suggests that if direct sharing of data is not feasible, a meta-analysis of summary GWAS statistics could provide selected SNPs for custom panels to use in genomic selection programs. However, since this finding is based on a small cross-validation study, confirmation through a larger study is recommended.
Subjects
Cattle, Lactation, Animals, Australia, Cattle/genetics, Female, Genome-Wide Association Study, Genomics, Genotype, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
Many genome variants shaping mammalian phenotype are hypothesized to regulate gene transcription and/or to be under selection. However, most of the evidence to support this hypothesis comes from human studies. Systematic evidence for regulatory and evolutionary signals contributing to complex traits in a different mammalian model is needed. Sequence variants associated with gene expression (expression quantitative trait loci [eQTLs]) and concentration of metabolites (metabolic quantitative trait loci [mQTLs]) and under histone-modification marks in several tissues were discovered from multiomics data of over 400 cattle. Variants under selection and evolutionary constraint were identified using genome databases of multiple species. These analyses defined 30 sets of variants, and for each set, we estimated the genetic variance the set explained across 34 complex traits in 11,923 bulls and 32,347 cows with 17,669,372 imputed variants. The per-variant trait heritability of these sets across traits was highly consistent (r > 0.94) between bulls and cows. Based on the per-variant heritability, conserved sites across 100 vertebrate species and mQTLs ranked the highest, followed by eQTLs, young variants, those under histone-modification marks, and selection signatures. From these results, we defined a Functional-And-Evolutionary Trait Heritability (FAETH) score indicating the functionality and predicted heritability of each variant. In an additional 7,551 cattle, the high FAETH-ranking variants had significantly increased genetic variances and genomic prediction accuracies in 3 production traits compared to the low FAETH-ranking variants. The FAETH framework combines the information of gene regulation, evolution, and trait heritability to rank variants, and the publicly available FAETH data provide a set of biological priors for cattle genomic selection worldwide.
Subjects
Biological Evolution, Cattle/genetics, Gene Expression Regulation/genetics, Multifactorial Inheritance/genetics, Animals, Breeding, Genetic Databases, Female, Genetic Variation, Genome/genetics, Genome-Wide Association Study, Male, Phenotype, Quantitative Trait Loci/genetics, Genetic Selection
ABSTRACT
BACKGROUND: Imputation to whole-genome sequence is now possible in large sheep populations. It is therefore of interest to use this data in genome-wide association studies (GWAS) to investigate putative causal variants and genes that underpin economically important traits. Merino wool is globally sought after for luxury fabrics, but some key wool quality attributes are unfavourably correlated with the characteristic skin wrinkle of Merinos. In turn, skin wrinkle is strongly linked to susceptibility to "fly strike" (Cutaneous myiasis), which is a major welfare issue. Here, we use whole-genome sequence data in a multi-trait GWAS to identify pleiotropic putative causal variants and genes associated with changes in key wool traits and skin wrinkle. RESULTS: A stepwise conditional multi-trait GWAS (CM-GWAS) identified putative causal variants and related genes from 178 independent quantitative trait loci (QTL) of 16 wool and skin wrinkle traits, measured on up to 7218 Merino sheep with 31 million imputed whole-genome sequence (WGS) genotypes. Novel candidate gene findings included the MAT1A gene that encodes an enzyme involved in the sulphur metabolism pathway critical to production of wool proteins, and the ESRP1 gene. We also discovered a significant wrinkle variant upstream of the HAS2 gene, which in dogs is associated with the exaggerated skin folds in the Shar-Pei breed. CONCLUSIONS: The wool and skin wrinkle traits studied here appear to be highly polygenic with many putative candidate variants showing considerable pleiotropy. Our CM-GWAS identified many highly plausible candidate genes for wool traits as well as breech wrinkle and breech area wool cover.
Subjects
Genetic Pleiotropy, Single Nucleotide Polymorphism, Quantitative Trait Loci, Sheep/genetics, Animals, Hyaluronan Synthases/genetics, Methionine Adenosyltransferase/genetics, Multifactorial Inheritance, RNA-Binding Proteins/genetics, Skin Physiological Phenomena/genetics, Wool Fiber/standards
ABSTRACT
BACKGROUND: Variants that regulate transcription, such as expression quantitative trait loci (eQTL), have shown enrichment in genome-wide association studies (GWAS) for mammalian complex traits. However, no study has reported eQTL in sheep, although it is an important agricultural species for which many GWAS of complex meat traits have been conducted. Using RNA sequence data produced from liver and muscle from 149 sheep and imputed whole-genome single nucleotide polymorphisms (SNPs), our aim was to dissect the genetic architecture of the transcriptome by associating sheep genotypes with three major molecular phenotypes including gene expression (geQTL), exon expression (eeQTL) and RNA splicing (sQTL). We also examined these three types of eQTL for their enrichment in GWAS of multi-meat traits and fatty acid profiles. RESULTS: Whereas a relatively small number of molecular phenotypes were significantly heritable (h2 > 0, P < 0.05), their mean heritability ranged from 0.67 to 0.73 for liver and from 0.71 to 0.77 for muscle. Association analysis between molecular phenotypes and SNPs within ± 1 Mb identified many significant cis-eQTL (false discovery rate, FDR < 0.01). The median distance between the eQTL and transcription start sites (TSS) ranged from 68 to 153 kb across the three eQTL types. The number of common variants between geQTL, eeQTL and sQTL within each tissue, and the number of common variants between liver and muscle within each eQTL type were all significantly (P < 0.05) larger than expected by chance. The identified eQTL were significantly (P < 0.05) enriched in GWAS hits associated with 56 carcass traits and fatty acid profiles. For example, several geQTL in muscle mapped to the FAM184B gene, hundreds of sQTL in liver and muscle mapped to the CAST gene, and hundreds of sQTL in liver mapped to the C6 gene. These three genes are associated with body composition or fatty acid profiles. 
CONCLUSIONS: We detected a large number of significant eQTL and found that the overlap of variants between eQTL types and tissues was prevalent. Many eQTL were also QTL for meat traits. Our study fills a gap in the knowledge on the regulatory variants and their role in complex traits for the sheep model.
Subjects
Liver/metabolism, Skeletal Muscle/metabolism, Genetic Polymorphism, Quantitative Trait Loci, Red Meat/standards, Sheep/genetics, Animals, Fatty Acids/metabolism, Female, Male, Heritable Quantitative Trait, Transcriptome
ABSTRACT
Feed efficiency and energy balance are important traits underpinning profitability and environmental sustainability in animal production. They are complex traits, and our understanding of their underlying biology is currently limited. One measure of feed efficiency is residual feed intake (RFI), which is the difference between actual and predicted intake. Variation in RFI among individuals is attributable to the metabolic efficiency of energy utilization. High RFI (H_RFI) animals require more energy per unit of weight gain or milk produced compared with low RFI (L_RFI) animals. Energy balance (EB) is a closely related trait calculated very similarly to RFI. Cellular energy metabolism in mitochondria involves mitochondrial protein (MiP) encoded by both nuclear (NuMiP) and mitochondrial (MtMiP) genomes. We hypothesized that MiP genes are differentially expressed (DE) between H_RFI and L_RFI animal groups and similarly between negative and positive EB groups. Our study aimed to characterize MiP gene expression in white blood cells of H_RFI and L_RFI cows using RNA sequencing to identify genes and biological pathways associated with feed efficiency in dairy cattle. We used the top and bottom 14 cows ranked for RFI and EB out of 109 animals as H_RFI and L_RFI, and positive and negative EB groups, respectively. The gene expression counts across all nuclear and mitochondrial genes for animals in each group were used for differential gene expression analyses, weighted gene correlation network analysis, functional enrichment, and identification of hub genes. Out of 244 DE genes between RFI groups, 38 were MiP genes. The DE genes were enriched for the oxidative phosphorylation (OXPHOS) and ribosome pathways. The DE MiP genes were underexpressed in L_RFI (and negative EB) compared with the H_RFI (and positive EB) groups, suggestive of reduced mitochondrial activity in the L_RFI group. 
None of the MtMiP genes were among the DE MiP genes between the groups, which suggests a non-rate limiting role of MtMiP genes in feed efficiency and warrants further investigation. The role of MiP, particularly the NuMiP and OXPHOS pathways in RFI, was also supported by our gene correlation network analysis and the hub gene identification. We validated the findings in an independent data set. Overall, our study suggested that differences in feed efficiency in dairy cows may be linked to differences in cellular energy demand. This study broadens our knowledge of the biology of feed efficiency in dairy cattle.
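The definition of RFI given in this abstract (actual minus predicted intake) and the selection of extreme groups can be sketched as follows. This is a minimal illustration only: it assumes predicted intake comes from a least-squares regression on a single hypothetical predictor, whereas the abstract does not state how predicted intake was modelled, and the data and function names are invented for the example.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x


def residual_feed_intake(actual_intake, predictor):
    """RFI = actual intake minus the intake predicted from the regression."""
    slope, intercept = fit_line(predictor, actual_intake)
    return [a - (intercept + slope * p) for a, p in zip(actual_intake, predictor)]


def extreme_groups(rfi, n):
    """Indices of the n lowest-RFI and the n highest-RFI animals."""
    order = sorted(range(len(rfi)), key=rfi.__getitem__)
    return order[:n], order[-n:]


# Hypothetical toy data: one energy-sink predictor and daily intake per animal.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
actual_intake = [11.5, 11.0, 13.5, 14.5, 14.0, 16.5]

rfi = residual_feed_intake(actual_intake, predictor)
low_rfi, high_rfi = extreme_groups(rfi, n=2)

print("RFI:", [round(v, 3) for v in rfi])
print("low-RFI (more efficient) animals:", low_rfi)
print("high-RFI (less efficient) animals:", high_rfi)
```

The study ranked 109 cows and took the top and bottom 14 as the contrasting groups; the toy example uses 2 of 6 animals purely to keep the output short.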
Subjects
Animal Feed, Cattle/genetics, Mitochondrial Proteins/genetics, Oxidative Phosphorylation, Animals, Cattle/metabolism, Eating/genetics, Energy Metabolism, Female, Gene Expression, Genome, Lactation, Milk, Phenotype, RNA Sequence Analysis/veterinary
ABSTRACT
BACKGROUND: Mutations in the mitochondrial genome have been implicated in mitochondrial disease, often characterized by impaired cellular energy metabolism. Cellular energy metabolism in mitochondria involves mitochondrial proteins (MP) from both the nuclear (NuMP) and mitochondrial (MtMP) genomes. The expression of MP genes may be tissue specific to meet the varying energy demands across tissues. Currently, the characteristics of MP gene expression in tissues of dairy cattle are not well understood. In this study, we profile the expression of MP genes in 29 adult and six foetal tissues in dairy cattle using RNA sequencing and gene expression analyses, particularly differential gene expression and co-expression network analyses. RESULTS: MP genes were differentially expressed (DE; over-expressed or under-expressed) across tissues in cattle. All 29 tissues showed DE NuMP genes in varying proportions of over-expression and under-expression. On the other hand, DE of MtMP genes was observed in < 50% of tissues and, notably, MtMP genes within a tissue were either all over-expressed or all under-expressed. A high proportion of NuMP (up to 60%) and MtMP (up to 100%) genes were over-expressed in tissues with expected high metabolic demand (heart, skeletal muscles and tongue) and under-expressed (up to 45% of NuMP and 77% of MtMP genes) in tissues with expected low metabolic rates (leukocytes, thymus and lymph nodes). In these tissues, all MtMP genes were invariably expressed in the same direction as the dominant NuMP gene expression. The NuMP and MtMP genes were highly co-expressed across tissues, and co-expression of genes in a cluster was non-random and functionally enriched for energy generation pathways. The differential gene expression and co-expression patterns were validated in independent cow and sheep datasets.
CONCLUSIONS: The results of this study support the concept that there is biological interaction between MP genes from the mitochondrial and nuclear genomes, given their over-expression in tissues with high energy demand and their co-expression across tissues. This highlights the importance of considering MP genes from both genomes in future studies related to mitochondrial functions and traits related to energy metabolism.
Subjects
Mitochondrial Genome, Mitochondrial Proteins, Animals, Cattle/genetics, Energy Metabolism/genetics, Female, Gene Expression, Gene Expression Profiling, Sheep
ABSTRACT
BACKGROUND: Whole-genome sequence (WGS) data could contain information on genetic variants at or in high linkage disequilibrium with causative mutations that underlie the genetic variation of polygenic traits. Thus far, genomic prediction accuracy has shown limited increase when using such information in dairy cattle studies, in which one or few breeds with limited diversity predominate. The objective of our study was to evaluate the accuracy of genomic prediction in a multi-breed Australian sheep population of relatively less related target individuals, when using information on imputed WGS genotypes. METHODS: Between 9626 and 26,657 animals with phenotypes were available for nine economically important sheep production traits and all had WGS imputed genotypes. About 30% of the data were used to discover predictive single nucleotide polymorphisms (SNPs) based on a genome-wide association study (GWAS) and the remaining data were used for training and validation of genomic prediction. Prediction accuracy using selected variants from imputed sequence data was compared to that using a standard array of 50k SNP genotypes, using both genomic best linear unbiased prediction (GBLUP) and Bayesian methods (BayesR/BayesRC). Accuracy of genomic prediction was evaluated in two independent populations that were each lowly related to the training set, one being purebred Merino and the other crossbred Border Leicester x Merino sheep. RESULTS: A substantial improvement in prediction accuracy was observed when selected sequence variants were fitted alongside 50k genotypes as a separate variance component in GBLUP (2GBLUP) or in Bayesian analysis as a separate category of SNPs (BayesRC). From an average accuracy of 0.27 in both validation sets for the 50k array, the average absolute increase in accuracy across traits with 2GBLUP was 0.083 and 0.073 for purebred and crossbred animals, respectively, whereas with BayesRC it was 0.102 and 0.087.
The average gain in accuracy was smaller when selected sequence variants were treated in the same category as 50k SNPs. Very little improvement over 50k prediction was observed when using all WGS variants. CONCLUSIONS: Accuracy of genomic prediction in diverse sheep populations increased substantially by using variants selected from whole-genome sequence data based on an independent multi-breed GWAS, when compared to genomic prediction using standard 50K genotypes.
Subjects
Genomics/methods, Sheep/genetics, Whole Genome Sequencing, Animals, Australia, Bayes Theorem, Breeding, Genome-Wide Association Study, Genotype, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: The use of whole-genome sequence (WGS) data for genomic prediction and association studies is highly desirable because the causal mutations should be present in the data. The sequencing of 935 sheep from a range of breeds provides the opportunity to impute sheep genotyped with single nucleotide polymorphism (SNP) arrays to WGS. This study evaluated the accuracy of imputation from SNP genotypes to WGS using this reference population of 935 sequenced sheep. RESULTS: The accuracy of imputation from the Ovine Infinium® HD BeadChip SNP (~ 500 k) to WGS was assessed for three target breeds: Merino, Poll Dorset and F1 Border Leicester × Merino. Imputation accuracy was highest for the Poll Dorset breed, although there were more Merino individuals in the sequenced reference population than Poll Dorset individuals. In addition, empirical imputation accuracies were higher (by up to 1.7%) when using larger multi-breed reference populations compared to using a smaller single-breed reference population. The mean accuracy of imputation across target breeds using the Minimac3 or the FImpute software was 0.94. The empirical imputation accuracy varied considerably across the genome; six chromosomes carried regions of one or more Mb with a mean imputation accuracy of < 0.7. Imputation accuracy in five variant annotation classes ranged from 0.87 (missense) up to 0.94 (intronic variants), where lower accuracy corresponded to higher proportions of rare alleles. The imputation quality statistic reported from Minimac3 (R2) had a clear positive relationship with the empirical imputation accuracy. Therefore, by first discarding imputed variants with an R2 below 0.4, the mean empirical accuracy across target breeds increased to 0.97. Although accuracy of genomic prediction was less affected by filtering on R2 in a multi-breed population of sheep with imputed WGS, the genomic heritability clearly tended to be lower when using variants with an R2 ≤ 0.4. 
CONCLUSIONS: The mean imputation accuracy was high for all target breeds and was increased by combining smaller breed sets into a multi-breed reference. We found that the Minimac3 software imputation quality statistic (R2) was a useful indicator of empirical imputation accuracy, enabling removal of very poorly imputed variants before downstream analyses.
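The filtering step described in this abstract can be illustrated with a minimal sketch. It assumes that empirical accuracy is measured per variant as the Pearson correlation between imputed dosages and true genotypes, and that each variant carries a software-reported quality statistic (such as the Minimac3 R2). The function names, the record layout and the toy numbers are hypothetical and are not taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: correlation is undefined
    return sxy / sqrt(sxx * syy)

def mean_accuracy(variants, min_quality=None):
    """Mean empirical accuracy over variants whose quality is >= min_quality.

    Returns (mean accuracy, number of variants kept).
    """
    kept = [v for v in variants
            if min_quality is None or v["quality"] >= min_quality]
    if not kept:
        raise ValueError("no variants pass the quality threshold")
    accs = [pearson(v["true"], v["imputed"]) for v in kept]
    return sum(accs) / len(accs), len(kept)

# Hypothetical toy data: true genotypes (0/1/2) and imputed dosages.
variants = [
    {"quality": 0.95, "true": [0, 1, 2, 1, 0], "imputed": [0.1, 0.9, 1.9, 1.1, 0.0]},
    {"quality": 0.80, "true": [0, 0, 1, 2, 2], "imputed": [0.2, 0.1, 1.2, 1.8, 2.0]},
    {"quality": 0.20, "true": [0, 1, 0, 1, 0], "imputed": [0.5, 0.4, 0.6, 0.5, 0.5]},
]

all_mean, all_n = mean_accuracy(variants)
filt_mean, filt_n = mean_accuracy(variants, min_quality=0.4)
print(f"unfiltered: {all_n} variants, mean accuracy {all_mean:.3f}")
print(f"filtered:   {filt_n} variants, mean accuracy {filt_mean:.3f}")
```

In this toy example the poorly imputed variant is removed by the 0.4 threshold, so the mean accuracy of the retained variants rises, which mirrors the direction of the effect reported above.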
Subjects
Genome-Wide Association Study/standards, Sheep/genetics, Software/standards, Whole Genome Sequencing/standards, Animals, Genome-Wide Association Study/veterinary, Single Nucleotide Polymorphism, Whole Genome Sequencing/veterinary
ABSTRACT
BACKGROUND: Mammalian phenotypes are shaped by numerous genome variants, many of which may regulate gene transcription or RNA splicing. To identify variants with regulatory functions in cattle, an important economic and model species, we used sequence variants to map a type of expression quantitative trait loci (expression QTLs) that are associated with variation in RNA splicing, i.e., sQTLs. To further the understanding of regulatory variants, sQTLs were compared with two other types of expression QTLs in different tissues: 1) variants associated with variation in gene expression, i.e., geQTLs, and 2) variants associated with variation in exon expression, i.e., eeQTLs. RESULTS: Using whole genome and RNA sequence data from four tissues of over 200 cattle, sQTLs identified using exon inclusion ratios were verified by matching their effects on adjacent intron excision ratios. Compared to eeQTLs and geQTLs, sQTLs contained the highest percentage of variants within intronic regions of genes and the lowest percentage of variants within intergenic regions. Many geQTLs and sQTLs were also detected as eeQTLs. Many expression QTLs, including sQTLs, were significant in all four tissues and had a similar effect in each tissue. To verify such expression QTL sharing between tissues, variants surrounding (±1 Mb) the exon or gene were used to build local genomic relationship matrices (LGRM) and to estimate genetic correlations between tissues. For many exons, the splicing and expression levels were determined by the same cis additive genetic variance in different tissues. Thus, an effective but simple-to-implement meta-analysis combining information from three tissues is introduced to increase power to detect and validate sQTLs. sQTLs and eeQTLs together were more enriched for variants associated with cattle complex traits, compared to geQTLs. 
Several putative causal mutations were identified, including an sQTL at Chr6:87392580 within the 5th exon of kappa casein (CSN3) associated with milk production traits. CONCLUSIONS: Using novel analytical approaches, we report the first identification of numerous bovine sQTLs which are extensively shared between multiple tissue types. The significant overlaps between bovine sQTLs and complex traits QTL highlight the contribution of regulatory mutations to phenotypic variations.
Subjects
Genetic Variation, RNA Splicing, Animals, Blood Cells/metabolism, Caseins/genetics, Cattle, Exons, Female, Liver/metabolism, Animal Mammary Glands/metabolism, Muscles/metabolism, Single Nucleotide Polymorphism, Quantitative Trait Loci, Transcriptome
ABSTRACT
Genetic prediction based on either identity by state (IBS) sharing or pedigree information has been investigated extensively with best linear unbiased prediction (BLUP) methods. Such methods were pioneered in plant and animal-breeding literature and have since been applied to predict human traits, with the aim of eventual clinical utility. However, methods to combine IBS sharing and pedigree information for genetic prediction in humans have not been explored. We introduce a two-variance-component model for genetic prediction: one component for IBS sharing and one for approximate pedigree structure, both estimated with genetic markers. In simulations using real genotypes from the Candidate-gene Association Resource (CARe) and Framingham Heart Study (FHS) family cohorts, we demonstrate that the two-variance-component model achieves gains in prediction r² over standard BLUP at current sample sizes, and we project, based on simulations, that these gains will continue to hold at larger sample sizes. Accordingly, in analyses of four quantitative phenotypes from CARe and two quantitative phenotypes from FHS, the two-variance-component model significantly improves prediction r² in each case, with up to a 20% relative improvement. We also find that standard mixed-model association tests can produce inflated test statistics in datasets with related individuals, whereas the two-variance-component model corrects for inflation.
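As a hedged illustration of what a two-variance-component model looks like in general form, the sketch below writes the phenotype as the sum of an IBS-based genetic effect, a pedigree-like genetic effect and residual noise. The symbols are assumptions introduced here for illustration: $K_{\mathrm{IBS}}$ and $K_{\mathrm{ped}}$ denote the two marker-derived relationship matrices, $\sigma_g^2$, $\sigma_h^2$ and $\sigma_e^2$ the variance components, and phenotypes are taken to be mean-centred. The abstract does not state the exact parameterisation used by the authors.

```latex
\begin{align}
  y &= g + h + \varepsilon, \\
  V = \operatorname{Var}(y_{\mathrm{train}})
    &= \sigma_g^2 K_{\mathrm{IBS}} + \sigma_h^2 K_{\mathrm{ped}} + \sigma_e^2 I, \\
  \hat{y}_{\mathrm{test}}
    &= \left( \sigma_g^2 K_{\mathrm{IBS}}^{\mathrm{test,train}}
            + \sigma_h^2 K_{\mathrm{ped}}^{\mathrm{test,train}} \right)
       V^{-1} y_{\mathrm{train}}.
\end{align}
```

Setting $\sigma_h^2 = 0$ recovers a standard single-component BLUP predictor, which is the baseline the abstract compares against.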
Subjects
Cardiovascular Diseases/diagnosis, Genetic Markers, Genome-Wide Association Study, Genetic Models, Statistical Models, Quantitative Trait Loci, Cardiovascular Diseases/genetics, Computer Simulation, Datasets as Topic, Family, Genetic Association Studies, Genomics/methods, Humans, Phenotype, Single Nucleotide Polymorphism/genetics, Principal Component Analysis, Genetic Selection/genetics
ABSTRACT
BACKGROUND: Using whole genome sequence data might improve genomic prediction accuracy, when compared with high-density SNP arrays, and could lead to identification of causal mutations affecting complex traits. For some traits, the most accurate genomic predictions are achieved with non-linear Bayesian methods. However, as the number of variants and the size of the reference population increase, the computational time required to implement these Bayesian methods (typically with Markov Chain Monte Carlo sampling) becomes infeasibly long. RESULTS: Here, we applied a new method, HyB_BR (for Hybrid BayesR), which implements a mixture model of normal distributions and hybridizes an Expectation-Maximization (EM) algorithm followed by Markov Chain Monte Carlo (MCMC) sampling, to genomic prediction in a large dairy cattle population with imputed whole genome sequence data. The imputed whole genome sequence data included 994,019 variant genotypes of 16,214 Holstein and Jersey bulls and cows. Traits included fat yield, milk volume, protein yield (kg), fat% and protein% in milk, as well as fertility and heat tolerance. HyB_BR achieved genomic prediction accuracies as high as the full MCMC implementation of BayesR, both for predicting a validation set of Holstein and Jersey bulls (multi-breed prediction) and a validation set of Australian Red bulls (across-breed prediction). HyB_BR had a more than tenfold reduction in compute time, compared with the MCMC implementation of BayesR (48 hours versus 594 hours). We also demonstrate that in many cases HyB_BR identified sequence variants with a high posterior probability of affecting the milk production or fertility traits that were similar to those identified in BayesR. For heat tolerance, both HyB_BR and BayesR found variants in or close to promising candidate genes associated with this trait and not detected by previous studies. 
CONCLUSIONS: The results demonstrate that HyB_BR is a feasible method for simultaneous genomic prediction and QTL mapping with whole genome sequence in large reference populations.
Subjects
Chromosome Mapping, Genomics, Nonlinear Dynamics, Quantitative Trait Loci/genetics, Whole Genome Sequencing, Algorithms, Animals, Bayes Theorem, Cattle, Female, Fertility/genetics, Genotype, Markov Chains, Milk/metabolism, Monte Carlo Method, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: Enhancers are non-coding DNA sequences that, when bound by specific proteins, increase the level of gene transcription. Enhancers activate unique gene expression patterns within cells of different types or under different conditions. Enhancers are key contributors to gene regulation, and causative variants that affect quantitative traits in humans and mice have been located in enhancer regions. However, in the bovine genome, enhancers as well as other regulatory elements are not yet well defined. In this paper, we sought to improve the annotation of bovine enhancer regions by using publicly available mammalian enhancer information. To test if the identified putative bovine enhancer regions are enriched with functional variants that affect milk production traits, we performed genome-wide association studies using imputed whole-genome sequence data followed by meta-analysis and enrichment analysis. RESULTS: We produced a library of candidate bovine enhancer regions by using publicly available bovine ChIP-Seq enhancer data in combination with enhancer data that were identified based on sequence homology with human and mouse enhancer databases. Both permutation tests and gene set enrichment analysis showed that imputed whole-genome sequence variants associated with milk production traits in 16,581 dairy cattle were enriched in enhancer regions marked by bovine-liver H3K4me3 and H3K27ac histone modifications. Enhancer regions that were identified based on sequence homology with human and mouse enhancer regions were not as strongly enriched with trait-associated sequence variants as the bovine ChIP-Seq candidate enhancer regions. The bovine ChIP-Seq enriched enhancer regions were located near genes and quantitative trait loci that are associated with pregnancy, growth, disease resistance, meat quality and quantity, and milk quality and quantity traits in dairy and beef cattle. 
CONCLUSIONS: Our results suggest that sequence variants within enhancer regions that are located in bovine non-coding genomic regions contribute to the variation in complex traits. The level of enrichment was higher in bovine-specific enhancer regions that were identified by detecting histone modifications H3K4me3 and H3K27ac in bovine liver tissues than in enhancer regions identified by sequence homology with human and mouse data. These results highlight the need to use bovine-specific experimental data for the identification of enhancer regions.
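The permutation-based enrichment idea mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the authors' procedure: it shuffles trait-association labels freely across variants, whereas a real analysis would usually preserve linkage disequilibrium and genomic position (for example by shifting annotations along the genome). The function name, its parameters and the toy data are hypothetical.

```python
import random

def permutation_enrichment(in_region, associated, n_perm=1000, seed=0):
    """Test whether trait-associated variants are over-represented in a region set.

    in_region  : list of bools, True if the variant lies in a candidate enhancer
    associated : list of bools, True if the variant is trait-associated
    Returns (observed overlap count, one-sided empirical p-value).
    """
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    observed = sum(1 for r, a in zip(in_region, associated) if r and a)
    labels = list(associated)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between association and annotation
        overlap = sum(1 for r, a in zip(in_region, labels) if r and a)
        if overlap >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical toy data: 200 variants, 20 in enhancers, 20 trait-associated,
# with 15 variants that are both.
in_region = [i < 20 for i in range(200)]
associated = [i < 15 or 100 <= i < 105 for i in range(200)]

observed, p_value = permutation_enrichment(in_region, associated)
print(f"observed overlap = {observed}, empirical p = {p_value:.4f}")
```

The add-one correction is a common convention for empirical p-values; with 1000 permutations the smallest reportable value is about 0.001.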