Search | BVS Colombia Search Portal

1.

The rate of de novo structural variation is increased in in vitro-produced offspring and preferentially affects the paternal genome.

Lee, Young-Lim; Bouwman, Aniek C; Harland, Chad; Bosse, Mirte; Costa Monteiro Moreira, Gabriel; Veerkamp, Roel F; Mullaart, Erik; Cambisano, Nadine; Groenen, Martien A M; Karim, Latifa; Coppieters, Wouter; Georges, Michel; Charlier, Carole.

Genome Res ; 33(9): 1455-1464, 2023 09.

Article in English | MEDLINE | ID: mdl-37793781

ABSTRACT

Assisted reproductive technologies (ARTs), including in vitro maturation and fertilization (IVF), are increasingly used in human and animal reproduction. Whether these technologies directly affect the rate of de novo mutation (DNM), and to what extent, has been a matter of debate. Here we take advantage of domestic cattle, characterized by complex pedigrees that are ideally suited to detect DNMs and by the systematic use of ART, to study the rate of de novo structural variation (dnSV) in this species and how it is impacted by IVF. By exploiting features of associated de novo point mutations (dnPMs) and dnSVs in clustered DNMs, we provide strong evidence that (1) IVF increases the rate of dnSV approximately fivefold, and (2) the corresponding mutations occur during the very early stages of embryonic development (one- and two-cell stage), yet primarily affect the paternal genome.

Subject(s)

Embryonic Development , Family , Pregnancy , Female , Animals , Cattle , Humans , Mutation , Pedigree , Human Genome

2.

Meta-analysis of six dairy cattle breeds reveals biologically relevant candidate genes for mastitis resistance.

Cai, Zexi; Iso-Touru, Terhi; Sanchez, Marie-Pierre; Kadri, Naveen; Bouwman, Aniek C; Chitneedi, Praveen Krishna; MacLeod, Iona M; Vander Jagt, Christy J; Chamberlain, Amanda J; Gredler-Grandl, Birgit; Spengeler, Mirjam; Lund, Mogens Sandø; Boichard, Didier; Kühn, Christa; Pausch, Hubert; Vilkki, Johanna; Sahana, Goutam.

Genet Sel Evol ; 56(1): 54, 2024 Jul 15.

Article in English | MEDLINE | ID: mdl-39009986

ABSTRACT

BACKGROUND: Mastitis is a disease that incurs significant costs in the dairy industry. A promising approach to mitigate its negative effects is to genetically improve the resistance of dairy cattle to mastitis. A meta-analysis of genome-wide association studies (GWAS) across multiple breeds for clinical mastitis (CM) and its indicator trait, somatic cell score (SCS), is a powerful method to identify functional genetic variants that impact mastitis resistance. RESULTS: We conducted meta-analyses of eight and fourteen GWAS on CM and SCS, respectively, using 30,689 and 119,438 animals from six dairy cattle breeds. Methods for the meta-analyses were selected to properly account for the multi-breed structure of the GWAS data. Our study revealed 58 lead markers that were associated with mastitis incidence, including 16 loci that did not overlap with previously identified quantitative trait loci (QTL), as curated at the Animal QTLdb. Post-GWAS analysis techniques such as gene-based analysis and genomic feature enrichment analysis enabled prioritization of 31 candidate genes and 14 credible candidate causal variants that affect mastitis. CONCLUSIONS: Our list of candidate genes can help to elucidate the genetic architecture underlying mastitis resistance and provide better tools for the prevention or treatment of mastitis, ultimately contributing to more sustainable animal production.

Subject(s)

Disease Resistance , Genome-Wide Association Study , Bovine Mastitis , Quantitative Trait Loci , Animals , Cattle/genetics , Bovine Mastitis/genetics , Female , Genome-Wide Association Study/methods , Genome-Wide Association Study/veterinary , Disease Resistance/genetics , Single Nucleotide Polymorphism , Breeding/methods

3.

genomeprofile: Unveiling the genomic profile for livestock breeding through comprehensive SNP array-based genotyping.

Hulsegge, Ina; Bouwman, Aniek C; Derks, Martijn F L.

Anim Genet ; 2024 Jul 17.

Article in English | MEDLINE | ID: mdl-39021305

ABSTRACT

Single nucleotide polymorphism (SNP) arrays have become a cornerstone of modern livestock breeding. SNP arrays facilitate the identification of genetic markers linked to economically important traits and provide a powerful tool for predicting breeding values. However, conventional breeding programs often overlook additional genomic features contained in the SNP array data that can provide valuable insights into the genetic diversity, copy number variation, inbreeding levels and potential challenges in breeding lines. Here we present genomeprofile, a tool using SNP array-based genomic data, offering a comprehensive profile of breeding animals including the identification of copy number variants and runs of homozygosity, and screening for aneuploidy. By integrating these features into the breeding landscape, genomeprofile enables a more comprehensive picture of genomic variation, ultimately enhancing precision breeding strategies. To illustrate the practicality and efficacy of genomeprofile, we applied the tool to a dataset of four pig breeding lines. The genomeprofile tool is user-friendly and efficiently processes genotype data in finalreport or plink ped format into useful output. The output contains copy number variations, runs of homozygosity, selection signatures, aneuploidy and inbreeding per individual and across populations. This allows breeding companies and researchers to identify unique individuals or genomic regions of interest based on routinely collected data.
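The runs of homozygosity mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the algorithm used by genomeprofile, which the abstract does not describe. The genotype coding (0, 1, 2 allele copies, with 1 heterozygous), the function name `find_roh` and the `min_snps` threshold are assumptions made for illustration only.

```python
def find_roh(genotypes, min_snps=3):
    """Return (start, end) index pairs of runs of homozygous calls.

    genotypes: allele counts per SNP along one chromosome, coded
               0, 1 or 2 (1 = heterozygous). Hypothetical coding.
    min_snps:  minimum number of consecutive homozygous SNPs to
               report a run (hypothetical threshold).
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            # Homozygous call: open a run if none is open.
            if start is None:
                start = i
        else:
            # Heterozygous call closes any open run.
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that reaches the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


toy = [0, 2, 2, 1, 0, 0, 1, 2, 2, 2, 2]
print(find_roh(toy))
```

Real tools usually also constrain physical length, SNP density and the number of tolerated heterozygous or missing calls. Those refinements are left out here.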

4.

A 12 kb multi-allelic copy number variation encompassing a GC gene enhancer is associated with mastitis resistance in dairy cattle.

Lee, Young-Lim; Takeda, Haruko; Costa Monteiro Moreira, Gabriel; Karim, Latifa; Mullaart, Erik; Coppieters, Wouter; Appeltant, Ruth; Veerkamp, Roel F; Groenen, Martien A M; Georges, Michel; Bosse, Mirte; Druet, Tom; Bouwman, Aniek C; Charlier, Carole.

PLoS Genet ; 17(7): e1009331, 2021 07.

Article in English | MEDLINE | ID: mdl-34288907

ABSTRACT

Clinical mastitis (CM) is an inflammatory disease occurring in the mammary glands of lactating cows. CM is under genetic control, and a prominent CM resistance QTL located on chromosome 6 was reported in various dairy cattle breeds. Nevertheless, the biological mechanism underpinning this QTL has remained unknown. Herein, we mapped, fine-mapped, and discovered the putative causal variant underlying this CM resistance QTL in the Dutch dairy cattle population. We identified a ~12 kb multi-allelic copy number variant (CNV) that is in perfect linkage disequilibrium with a lead SNP, as a promising candidate variant. Through fine-mapping and expression QTL mapping, we showed that the group-specific component gene (GC), a gene encoding a vitamin D binding protein, is an excellent candidate causal gene for the QTL. The multiplicated alleles are associated with increased GC expression and low CM resistance. Ample evidence from functional genomics data supports the presence of an enhancer within this CNV, which would exert a cis-regulatory effect on GC. We observed that strong positive selection swept the region near the CNV, and haplotypes associated with the multiplicated allele were strongly selected for. Moreover, the multiplicated allele showed pleiotropic effects for increased milk yield and reduced fertility, hinting that a shared underlying biology for these effects may revolve around the vitamin D pathway. These findings together suggest a putative causal variant of a CM resistance QTL, where a cis-regulatory element located within a CNV can alter gene expression and affect multiple economically important traits.

Subject(s)

Genetic Enhancer Elements , Bovine Mastitis/genetics , Vitamin D-Binding Protein/genetics , Animals , Cattle , DNA Copy Number Variations , Female , Genetic Predisposition to Disease , Haplotypes , Linkage Disequilibrium , Single Nucleotide Polymorphism , Quantitative Trait Loci , Whole Genome Sequencing

5.

High-resolution structural variants catalogue in a large-scale whole genome sequenced bovine family cohort data.

Lee, Young-Lim; Bosse, Mirte; Takeda, Haruko; Moreira, Gabriel Costa Monteiro; Karim, Latifa; Druet, Tom; Oget-Ebrad, Claire; Coppieters, Wouter; Veerkamp, Roel F; Groenen, Martien A M; Georges, Michel; Bouwman, Aniek C; Charlier, Carole.

BMC Genomics ; 24(1): 225, 2023 May 01.

Article in English | MEDLINE | ID: mdl-37127590

ABSTRACT

BACKGROUND: Structural variants (SVs) are chromosomal segments that differ between genomes, such as deletions, duplications, insertions, inversions and translocations. The genomics revolution enabled the discovery of sub-microscopic SVs via array and whole-genome sequencing (WGS) data, paving the way to unravel the functional impact of SVs. Recent human expression QTL mapping studies demonstrated that SVs play a disproportionally large role in altering gene expression, underlining the importance of including SVs in genetic analyses. Therefore, this study aimed to generate and explore a high-quality bovine SV catalogue exploiting a unique cattle family cohort data (total 266 samples, forming 127 trios). RESULTS: We curated 13,731 SVs segregating in the population, consisting of 12,201 deletions, 1,509 duplications, and 21 multi-allelic CNVs (> 50-bp). Of these, we validated a subset of copy number variants (CNVs) utilising a direct genotyping approach in an independent cohort, indicating that at least 62% of the CNVs are true variants, segregating in the population. Among gene-disrupting SVs, we prioritised two likely high impact duplications, encompassing ORM1 and POPDC3 genes, respectively. Liver expression QTL mapping results revealed that these duplications are likely causing altered gene expression, confirming the functional importance of SVs. Although most of the accurately genotyped CNVs are tagged by single nucleotide polymorphisms (SNPs) ascertained in WGS data, most CNVs were not captured by individual SNPs obtained from a 50K genotyping array. CONCLUSION: We generated a high-quality SV catalogue exploiting unique whole genome sequenced bovine family cohort data. Two high impact duplications upregulating the ORM1 and POPDC3 are putative candidates for postpartum feed intake and hoof health traits, thus warranting further investigation. Generally, CNVs were in low LD with SNPs on the 50K array. 
Hence, it remains crucial to incorporate CNVs via means other than tagging SNPs, such as investigation of tagging haplotypes, direct imputation of CNVs, or direct genotyping as done in the current study. The SV catalogue and the custom genotyping array generated in the current study will serve as valuable resources accelerating utilisation of full spectrum of genetic variants in bovine genomes.

Subject(s)

Genome , Genomics , Female , Humans , Cattle , Animals , Genomics/methods , Genotype , DNA Copy Number Variations , Haplotypes , Single Nucleotide Polymorphism , Muscle Proteins/genetics , Cell Adhesion Molecules/genetics

6.

Using prior information from humans to prioritize genes and gene-associated variants for complex traits in livestock.

Raymond, Biaty; Yengo, Loic; Costilla, Roy; Schrooten, Chris; Bouwman, Aniek C; Hayes, Ben J; Veerkamp, Roel F; Visscher, Peter M.

PLoS Genet ; 16(9): e1008780, 2020 09.

Article in English | MEDLINE | ID: mdl-32925905

ABSTRACT

Genome-Wide Association Studies (GWAS) in large human cohorts have identified thousands of loci associated with complex traits and diseases. For identifying the genes and gene-associated variants that underlie complex traits in livestock, especially where sample sizes are limiting, it may help to integrate the results of GWAS for equivalent traits in humans as prior information. In this study, we sought to investigate the usefulness of results from a GWAS on human height as prior information for identifying the genes and gene-associated variants that affect stature in cattle, using GWAS summary data on sample sizes of 700,000 and 58,265 for humans and cattle, respectively. Using Fisher's exact test, we observed a significant proportion of cattle stature-associated genes (30/77) that are also associated with human height (odds ratio = 5.1, p = 3.1e-10). Results of randomized sampling tests showed that cattle orthologs of human height-associated genes, hereafter referred to as candidate genes (C-genes), were more enriched for cattle stature GWAS signals than random samples of genes in the cattle genome (p = 0.01). Randomly sampled SNPs within the C-genes also tend to explain more genetic variance for cattle stature (up to 13.2%) than randomly sampled SNPs within random cattle genes (p = 0.09). The most significant SNPs from a cattle GWAS for stature within the C-genes did not explain more genetic variance for cattle stature than the most significant SNPs within random cattle genes (p = 0.87). Altogether, our findings support previous studies that suggest a similarity in the genetic regulation of height across mammalian species. However, with the availability of a powerful GWAS for stature that combined data from 8 cattle breeds, prior information from human-height GWAS does not seem to provide any additional benefit with respect to the identification of genes and gene-associated variants that affect stature in cattle.

Subject(s)

Body Height/genetics , Cattle/genetics , Genome-Wide Association Study/methods , Animals , Breeding/methods , Genetic Databases , Genetic Variation/genetics , Humans , Livestock/genetics , Multifactorial Inheritance/genetics , Phenotype , Single Nucleotide Polymorphism/genetics , Quantitative Trait Loci/genetics
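The gene-overlap test reported in this abstract (Fisher's exact test with an odds ratio) can be sketched in plain Python. Only the 30 of 77 overlap comes from the abstract. The remaining two cells of the 2x2 table are not reported there, so the values 100 and 900 below are toy numbers chosen for illustration, and the result is not expected to reproduce the published odds ratio or p-value. The function names and the one-sided form of the test are also assumptions.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test: P(overlap >= a).

    Rows: stature-associated genes vs. all other genes.
    Columns: height-associated vs. not height-associated.
    Uses the hypergeometric distribution with fixed margins.
    """
    n_total = a + b + c + d
    n_hits = a + c    # height-associated genes (column total)
    n_drawn = a + b   # stature-associated genes (row total)
    denom = comb(n_total, n_drawn)
    upper = min(n_hits, n_drawn)
    tail = sum(
        comb(n_hits, k) * comb(n_total - n_hits, n_drawn - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# a and b follow the abstract (30 of 77); c and d are toy values.
a, b, c, d = 30, 47, 100, 900
print(odds_ratio(a, b, c, d))
print(fisher_one_sided(a, b, c, d))
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow in the tail sum. Libraries such as SciPy provide the same test with two-sided options.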

7.

Classifying aneuploidy in genotype intensity data using deep learning.

Bouwman, Aniek C; Hulsegge, Ina; Hawken, Rachel J; Henshall, John M; Veerkamp, Roel F; Schokker, Dirkjan; Kamphuis, Claudia.

J Anim Breed Genet ; 140(3): 304-315, 2023 May.

Article in English | MEDLINE | ID: mdl-36806175

ABSTRACT

Aneuploidy is the loss or gain of one or more chromosomes. Although it is a rare phenomenon in liveborn individuals, it is observed in livestock breeding populations. These breeding populations are often routinely genotyped and the genotype intensity data from single nucleotide polymorphism (SNP) arrays can be exploited to identify aneuploidy cases. This identification is a time-consuming and costly task, because it is often performed by visual inspection of the data per chromosome, usually done in plots of the intensity data by an expert. Therefore, we wanted to explore the feasibility of automated image classification to replace (part of) the visual detection procedure for any diploid species. The aim of this study was to develop a deep learning Convolutional Neural Network (CNN) classification model based on chromosome level plots of SNP array intensity data that can classify the images into disomic, monosomic and trisomic cases. A multispecies dataset enriched for aneuploidy cases was collected containing genotype intensity data of 3321 disomic, 1759 monosomic and 164 trisomic chromosomes. The final CNN model had an accuracy of 99.9%, overall precision was 1, recall was 0.98 and the F1 score was 0.99 for classifying images from intensity data. The high precision assures that cases detected are most likely true cases; however, some trisomy cases may be missed (the recall of the class trisomic was 0.94). This supervised CNN model performed much better than an unsupervised k-means clustering, which reached an accuracy of 0.73 and had particular difficulty classifying trisomic cases correctly. The developed CNN classification model provides high accuracy to classify aneuploidy cases based on images of plotted X and Y genotype intensity values. 
The classification model can be used as a tool for routine screening in large diploid populations that are genotyped to get a better understanding of the incidence and inheritance, and in addition, avoid anomalies in breeding candidates.

Subject(s)

Deep Learning , Animals , Aneuploidy , Computer Neural Networks , Genotype
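The relationship between the precision, recall and F1 values quoted in this abstract can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and only the precision of 1 and recall of 0.98 are taken from the abstract.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual positives that were detected."""
    return tp / (tp + fn)


def f1_score(prec, rec):
    """Harmonic mean of precision and recall."""
    if prec + rec == 0:
        return 0.0
    return 2 * prec * rec / (prec + rec)


# Overall precision and recall as reported in the abstract.
print(round(f1_score(1.0, 0.98), 2))  # 0.99
```

Because F1 is a harmonic mean, it sits between the two inputs and closer to the lower one. That is why a recall of 0.98 with perfect precision yields 0.99 after rounding.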

8.

Functional and population genetic features of copy number variations in two dairy cattle populations.

Lee, Young-Lim; Bosse, Mirte; Mullaart, Erik; Groenen, Martien A M; Veerkamp, Roel F; Bouwman, Aniek C.

BMC Genomics ; 21(1): 89, 2020 Jan 28.

Article in English | MEDLINE | ID: mdl-31992181

ABSTRACT

BACKGROUND: Copy Number Variations (CNVs) are gains or losses of DNA segments that are known to play a role in shaping a wide range of phenotypes. In this study, we used two dairy cattle populations, Holstein Friesian and Jersey, to discover CNVs using the Illumina BovineHD Genotyping BeadChip aligned to the ARS-UCD1.2 assembly. The discovered CNVs were investigated for their functional impact and their population genetics features. RESULTS: We discovered 14,272 autosomal CNVs, which were aggregated into 1755 CNV regions (CNVR) from 451 animals. These CNVRs together cover 2.8% of the bovine autosomes. The assessment of the functional impact of CNVRs showed that rare CNVRs (MAF < 0.01) are more likely to overlap with genes than common CNVRs (MAF ≥ 0.05). The population differentiation index (Fst) based on CNVRs revealed multiple highly diverged CNVRs between the two breeds. Some of these CNVRs overlapped with candidate genes such as MGAM and ADAMTS17 genes, which are related to starch digestion and body size, respectively. Lastly, linkage disequilibrium (LD) between CNVRs and BovineHD BeadChip SNPs was generally low, close to 0, although common deletions (MAF ≥ 0.05) showed slightly higher LD (r² ≈ 0.1 at 10 kb distance) than the rest. Nevertheless, this LD is still lower than SNP-SNP LD (r² ≈ 0.5 at 10 kb distance). CONCLUSIONS: Our analyses showed that CNVRs detected using BovineHD BeadChip arrays are likely to be functional. This finding indicates that CNVs can potentially disrupt the function of genes and thus might alter phenotypes. Also, the population differentiation index revealed two candidate genes, MGAM and ADAMTS17, which hint at adaptive evolution between the two populations. Lastly, low CNVR-SNP LD implies that genetic variation from CNVs might not be fully captured in routine animal genetic evaluation, which relies solely on SNP markers.

Subject(s)

DNA Copy Number Variations , Population Genetics , Animals , Breeding , Cattle , Genome , Linkage Disequilibrium , Quantitative Trait Loci
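The LD measure discussed in this abstract can be illustrated with a small Python sketch. It computes r² as the squared Pearson correlation between two vectors of per-animal allele counts, a common genotype-based approximation. The abstract does not state how the authors computed r², so this choice, the function name `ld_r2` and the toy data are assumptions.

```python
def ld_r2(x, y):
    """Squared Pearson correlation between two genotype vectors.

    x, y: per-animal allele counts (0, 1, 2) for a SNP, or
          copy-number states for a CNV, in the same animal order.
    Returns NaN when either marker is monomorphic.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Toy data: a SNP that perfectly tags a deletion, and one that does not.
deletion = [0, 0, 2, 2]
tag_snp = [0, 0, 2, 2]
unlinked_snp = [0, 2, 0, 2]
print(ld_r2(deletion, tag_snp))       # 1.0
print(ld_r2(deletion, unlinked_snp))  # 0.0
```

An r² near 0, as the abstract reports for most CNVR-SNP pairs, means that SNP genotypes carry almost no information about the CNV state.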

9.

Using short read sequencing to characterise balanced reciprocal translocations in pigs.

Bouwman, Aniek C; Derks, Martijn F L; Broekhuijse, Marleen L W J; Harlizius, Barbara; Veerkamp, Roel F.

BMC Genomics ; 21(1): 576, 2020 Aug 24.

Article in English | MEDLINE | ID: mdl-32831014

ABSTRACT

BACKGROUND: A balanced constitutional reciprocal translocation (RT) is a mutual exchange of terminal segments of two non-homologous chromosomes without any loss or gain of DNA in germline cells. Carriers of balanced RTs are viable individuals with no apparent phenotypical consequences. These animals, however, produce unbalanced gametes and therefore show reduced fertility and offspring with congenital abnormalities. This cytogenetic abnormality is usually detected using chromosome staining techniques. The aim of this study was to test the possibilities of using paired-end short read sequencing for detection of balanced RTs in boars and investigate their breakpoints and junctions. RESULTS: Balanced RTs were recovered in a blinded analysis, using structural variant calling software DELLY, in 6 of the 7 carriers with 30-fold short read paired-end sequencing. In 15 non-carriers we did not detect any RTs. Reducing the coverage to 20-fold, 15-fold and 10-fold showed that at least 20-fold coverage is required to obtain good results. One RT was not detected using the blind screening; however, a highly likely RT was discovered after unblinding. This RT was located in a repetitive region, showing the limitations of short read sequence data. The detailed analysis of the breakpoints and junctions suggested three junctions showing microhomology, three junctions with blunt-end ligation, and three micro-insertions at the breakpoint junctions. The detected RTs were also shown to disrupt genes. CONCLUSIONS: We conclude that paired-end short read sequence data can be used to detect and characterize balanced reciprocal translocations, if sequencing depth is at least 20-fold coverage. However, translocations in repetitive areas may require large fragments or even long read sequence data.

Subject(s)

Chromosome Aberrations , Genetic Translocation , Animals , DNA , Heterozygote , Male , Swine/genetics

10.

A deterministic equation to predict the accuracy of multi-population genomic prediction with multiple genomic relationship matrices.

Raymond, Biaty; Wientjes, Yvonne C J; Bouwman, Aniek C; Schrooten, Chris; Veerkamp, Roel F.

Genet Sel Evol ; 52(1): 21, 2020 Apr 28.

Article in English | MEDLINE | ID: mdl-32345213

ABSTRACT

BACKGROUND: A multi-population genomic prediction (GP) model in which important pre-selected single nucleotide polymorphisms (SNPs) are differentially weighted (MPMG) has been shown to result in better prediction accuracy than a multi-population, single genomic relationship matrix ([Formula: see text]) GP model (MPSG) in which all SNPs are weighted equally. Our objective was to underpin theoretically the advantages and limits of the MPMG model over the MPSG model, by deriving and validating a deterministic prediction equation for its accuracy. METHODS: Using selection index theory, we derived an equation to predict the accuracy of estimated total genomic values of selection candidates from population [Formula: see text] ([Formula: see text]), when individuals from two populations, [Formula: see text] and [Formula: see text], are combined in the training population and two [Formula: see text], made respectively from pre-selected and remaining SNPs, are fitted simultaneously in MPMG. We used simulations to validate the prediction equation in scenarios that differed in the level of genetic correlation between populations, heritability, and proportion of genetic variance explained by the pre-selected SNPs. Empirical accuracy of the MPMG model in each scenario was calculated and compared to the predicted accuracy from the equation. RESULTS: In general, the derived prediction equation resulted in accurate predictions of [Formula: see text] for the scenarios evaluated. Using the prediction equation, we showed that an important advantage of the MPMG model over the MPSG model is its ability to benefit from the small number of independent chromosome segments ([Formula: see text]) due to the pre-selected SNPs, both within and across populations, whereas for the MPSG model, there is only a single value for [Formula: see text], calculated based on all SNPs, which is very large. 
However, this advantage is dependent on the pre-selected SNPs that explain some proportion of the total genetic variance for the trait. CONCLUSIONS: We developed an equation that gives insight into why, and under which conditions the MPMG outperforms the MPSG model for GP. The equation can be used as a deterministic tool to assess the potential benefit of combining information from different populations, e.g., different breeds or lines for GP in livestock or plants, or different groups of people based on their ethnic background for prediction of disease risk scores.

Subject(s)

Breeding , Metagenomics , Genetic Models , Animals , Phenotype , Single Nucleotide Polymorphism
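The equation derived in this paper is elided in the record ("[Formula: see text]") and is not reconstructed here. As a stand-in, the sketch below uses a widely cited single-population approximation for genomic prediction accuracy, usually attributed to Daetwyler and colleagues, r = sqrt(N h² / (N h² + Me)). It only illustrates the abstract's point that a smaller number of independent chromosome segments (Me) raises expected accuracy. The function name and all input values are hypothetical.

```python
from math import sqrt


def expected_accuracy(n_train, h2, m_e):
    """Single-population approximation of prediction accuracy.

    n_train: number of animals in the training population
    h2:      heritability of the trait
    m_e:     number of independent chromosome segments
    This is NOT the multi-population equation derived in the paper.
    """
    return sqrt(n_train * h2 / (n_train * h2 + m_e))


# Hypothetical inputs: same training set, two values of m_e.
print(expected_accuracy(5000, 0.5, 50000))  # all SNPs, large m_e
print(expected_accuracy(5000, 0.5, 500))    # few pre-selected SNPs
```

With the training set held constant, reducing `m_e` by a factor of 100 moves the expected accuracy from roughly 0.22 to roughly 0.91 in this toy setting.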

11.

Imputation to whole-genome sequence using multiple pig populations and its use in genome-wide association studies.

van den Berg, Sanne; Vandenplas, Jérémie; van Eeuwijk, Fred A; Bouwman, Aniek C; Lopes, Marcos S; Veerkamp, Roel F.

Genet Sel Evol ; 51(1): 2, 2019 Jan 24.

Article in English | MEDLINE | ID: mdl-30678638

ABSTRACT

BACKGROUND: Use of whole-genome sequence data (WGS) is expected to improve identification of quantitative trait loci (QTL). However, this requires imputation to WGS, often with a limited number of sequenced animals for the target population. The objective of this study was to investigate imputation to WGS in two pig lines using a multi-line reference population and, subsequently, to investigate the effect of using these imputed WGS (iWGS) for GWAS. METHODS: Phenotypes and genotypes were available on 12,184 Large White pigs (LW-line) and 4943 Dutch Landrace pigs (DL-line). Imputed 660 K and 80 K genotypes for the LW-line and DL-line, respectively, were imputed to iWGS using Beagle v.4.1. Since only 32 LW-line and 12 DL-line boars were sequenced, 142 animals from eight commercial lines were added. GWAS were performed for each line using the 80 K and 660 K SNPs, the genotype scores of iWGS SNPs that had an imputation accuracy (Beagle R²) higher than 0.6, and the dosage scores of all iWGS SNPs. RESULTS: For the DL-line (LW-line), imputation of 80 K genotypes to iWGS resulted in an average Beagle R² of 0.39 (0.49). After quality control, 2.5 × 10^6 (3.5 × 10^6) SNPs had a Beagle R² higher than 0.6, resulting in an average Beagle R² of 0.83 (0.93). Compared to the 80 K and 660 K genotypes, using iWGS led to the identification of 48.9 and 64.4% more QTL regions, for the DL-line and LW-line, respectively, and the most significant SNPs in the QTL regions explained a higher proportion of phenotypic variance. Using dosage instead of genotype scores improved the identification of QTL, because the model accounted for uncertainty of imputation, and all SNPs were used in the analysis. CONCLUSIONS: Imputation to WGS using the multi-line reference population resulted in relatively poor imputation, especially when imputing from 80 K (DL-line). 
In spite of the poor imputation accuracies, using iWGS instead of a lower density SNP chip increased the number of detected QTL and the estimated proportion of phenotypic variance explained by these QTL, especially when dosage scores were used instead of genotype scores. Thus, iWGS, even with poor imputation accuracy, can be used to identify possible interesting regions for fine mapping.

Subject(s)

Genome-Wide Association Study/methods , Swine/genetics , Whole Genome Sequencing/methods , Animals , Genome-Wide Association Study/standards , Genome-Wide Association Study/veterinary , Single Nucleotide Polymorphism , Quantitative Trait Loci , Whole Genome Sequencing/standards , Whole Genome Sequencing/veterinary
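The distinction this abstract draws between dosage scores and genotype scores can be illustrated in Python. The sketch assumes that a "genotype score" is the most likely genotype and that a "dosage score" is the expected allele count given the imputed genotype probabilities. The abstract does not define either term, so both readings, the function names and the toy probabilities are assumptions.

```python
def dosage(probs):
    """Expected allele count from genotype probabilities.

    probs: (P(AA), P(AB), P(BB)) for one animal at one SNP,
           counting copies of allele B.
    """
    p_aa, p_ab, p_bb = probs
    return p_ab + 2 * p_bb


def best_guess(probs):
    """Most likely genotype, coded 0, 1 or 2 copies of allele B."""
    return max(range(3), key=lambda g: probs[g])


# Toy, poorly imputed SNP: no genotype is clearly favoured.
uncertain = (0.5, 0.4, 0.1)
print(best_guess(uncertain))  # 0
print(dosage(uncertain))      # approximately 0.6
```

The best-guess call discards the 50% chance that the animal carries allele B, whereas the dosage keeps that uncertainty in the covariate. This matches the abstract's explanation of why dosage scores helped QTL detection.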

12.

Utility of whole-genome sequence data for across-breed genomic prediction.

Raymond, Biaty; Bouwman, Aniek C; Schrooten, Chris; Houwing-Duistermaat, Jeanine; Veerkamp, Roel F.

Genet Sel Evol ; 50(1): 27, 2018 05 18.

Article in English | MEDLINE | ID: mdl-29776327

ABSTRACT

BACKGROUND: Genomic prediction (GP) across breeds has so far resulted in low accuracies of the predicted genomic breeding values. Our objective was to evaluate whether using whole-genome sequence (WGS) instead of low-density markers can improve GP across breeds, especially when markers are pre-selected from a genome-wide association study (GWAS), and to test our hypothesis that many non-causal markers in WGS data have a diluting effect on accuracy of across-breed prediction. METHODS: Estimated breeding values for stature and bovine high-density (HD) genotypes were available for 595 Jersey bulls from New Zealand, 957 Holstein bulls from New Zealand and 5553 Holstein bulls from the Netherlands. BovineHD genotypes for all bulls were imputed to WGS using Beagle4 and Minimac2. Genomic prediction across the three populations was performed with ASReml4, with each population used as single reference and as single validation sets. In addition to the 50k, HD and WGS, markers that were significantly associated with stature in a large meta-GWAS analysis were selected and used for prediction, resulting in 10 prediction scenarios. Furthermore, we estimated the proportion of genetic variance captured by markers in each scenario. RESULTS: Across breeds, 50k, HD and WGS markers resulted in very low accuracies of prediction ranging from -0.04 to 0.13. Accuracies were higher in scenarios with pre-selected markers from a meta-GWAS. For example, using only the 133 most significant markers in 133 QTL regions from the meta-GWAS yielded accuracies ranging from 0.08 to 0.23, while 23,125 markers with a -log10(p) higher than 7 resulted in accuracies of up to 0.35. Using WGS data did not significantly improve the proportion of genetic variance captured across breeds compared to scenarios with few but pre-selected markers. 
CONCLUSIONS: Our results demonstrated that the accuracy of across-breed GP can be improved by using markers that are pre-selected from WGS based on their potential causal effect. We also showed that simply increasing the number of markers up to the WGS level does not increase the accuracy of across-breed prediction, even when markers that are expected to have a causal effect are included.

Subject(s)

Breeding , Cattle/anatomy & histology , Cattle/classification , Genome-Wide Association Study/veterinary , Quantitative Trait Loci , Animals , Biometry , Cattle/genetics , Computational Biology , Genetic Variation , Male , Genetic Models , Pedigree , Single Nucleotide Polymorphism
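The marker pre-selection rule quoted in this abstract, keeping markers whose -log10(p) from the meta-GWAS exceeds 7, can be sketched in Python. The function name, the dictionary layout and the marker names are hypothetical. Only the threshold of 7 comes from the abstract.

```python
from math import log10


def select_markers(p_values, threshold=7.0):
    """Return marker names whose -log10(p) exceeds the threshold.

    p_values: dict mapping marker name to GWAS p-value.
    A p-value of exactly 0 (numerical underflow) is kept, since it
    is more significant than any finite threshold.
    """
    return [
        marker
        for marker, p in p_values.items()
        if p == 0 or -log10(p) > threshold
    ]


# Toy p-values for three hypothetical markers.
toy = {"snp_a": 1e-6, "snp_b": 1e-8, "snp_c": 0.03}
print(select_markers(toy))  # ['snp_b']
```

Picking one lead marker per QTL region, as in the abstract's 133-marker scenario, would need an extra grouping step by region that is not shown here.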

13.

Genomic prediction for numerically small breeds, using models with pre-selected and differentially weighted markers.

Raymond, Biaty; Bouwman, Aniek C; Wientjes, Yvonne C J; Schrooten, Chris; Houwing-Duistermaat, Jeanine; Veerkamp, Roel F.

Genet Sel Evol ; 50(1): 49, 2018 Oct 10.

Article in English | MEDLINE | ID: mdl-30314431

ABSTRACT

BACKGROUND: Genomic prediction (GP) accuracy in numerically small breeds is limited by the small size of the reference population. Our objective was to test a multi-breed multiple genomic relationship matrices (GRM) GP model (MBMG) that weighs pre-selected markers separately, uses the remaining markers to explain the remaining genetic variance that can be explained by markers, and weighs information of breeds in the reference population by their genetic correlation with the validation breed. METHODS: Genotype and phenotype data were used on 595 Jersey bulls from New Zealand and 5503 Holstein bulls from the Netherlands, all with deregressed proofs for stature. Different sets of markers were used, containing either pre-selected markers from a meta-genome-wide association analysis on stature, remaining markers or both. We implemented a multi-breed bivariate GREML model in which we fitted either a single multi-breed GRM (MBSG), or two distinct multi-breed GRM (MBMG), one made with pre-selected markers and the other with remaining markers. Accuracies of predicting stature for Jersey individuals using the multi-breed models (Holstein and Jersey combined reference population) were compared to those obtained using either the Jersey (within-breed) or Holstein (across-breed) reference population. All the models were subsequently fitted in the analysis of simulated phenotypes, with a simulated genetic correlation between breeds of 1, 0.5, and 0.25. RESULTS: The MBMG model always gave better prediction accuracies for stature compared to MBSG, within-, and across-breed GP models. For example, with MBSG, accuracies obtained by fitting 48,912 unselected markers (0.43), 357 pre-selected markers (0.38) or a combination of both (0.43), were lower than accuracies obtained by fitting pre-selected and unselected markers in separate GRM in MBMG (0.49). 
This improvement was further confirmed by results from a simulation study, with MBMG performing on average 23% better than MBSG with all markers fitted. CONCLUSIONS: With the MBMG model, it is possible to use information from numerically large breeds to improve prediction accuracy of numerically small breeds. The superiority of MBMG is mainly due to its ability to use information on pre-selected markers, explain the remaining genetic variance and weigh information from a different breed by the genetic correlation between breeds.

Subject(s)

Breeding/methods , Models, Genetic , Polymorphism, Genetic , Animals , Breeding/standards , Cattle/genetics , Genetic Markers , Sample Size , Selection, Genetic

14.

Estimated allele substitution effects underlying genomic evaluation models depend on the scaling of allele counts.

Bouwman, Aniek C; Hayes, Ben J; Calus, Mario P L.

Genet Sel Evol ; 49(1): 79, 2017 10 30.

Article in English | MEDLINE | ID: mdl-29084514

ABSTRACT

BACKGROUND: Genomic evaluation is used to predict direct genomic values (DGV) for selection candidates in breeding programs, but also to estimate allele substitution effects (ASE) of single nucleotide polymorphisms (SNPs). Scaling of allele counts influences the estimated ASE, because scaling of allele counts results in less shrinkage towards the mean for low minor allele frequency (MAF) variants. Scaling may become relevant for estimating ASE as more low MAF variants will be used in genomic evaluations. We show the impact of scaling on estimates of ASE using real data and a theoretical framework, and in terms of power, model fit and predictive performance. RESULTS: In a dairy cattle dataset with 630 K SNP genotypes, the correlation between DGV for stature from a random regression model using centered allele counts (RRc) and centered and scaled allele counts (RRcs) was 0.9988, whereas the overall correlation between ASE using RRc and RRcs was 0.27. The main difference in ASE between both methods was found for SNPs with a MAF lower than 0.01. Both the ratio (ASE from RRcs/ASE from RRc) and the regression coefficient (regression of ASE from RRcs on ASE from RRc) were much higher than 1 for low MAF SNPs. Derived equations showed that scenarios with a high heritability, a large number of individuals and a small number of variants have lower ratios between ASE from RRc and RRcs. We also investigated the optimal scaling parameter [from -1 (RRcs) to 0 (RRc) in steps of 0.1] in the bovine stature dataset. We found that the log-likelihood was maximized with a scaling parameter of -0.8, while the mean squared error of prediction was minimized with a scaling parameter of -1, i.e., RRcs. CONCLUSIONS: Large differences in estimated ASE were observed for low MAF SNPs when allele counts were scaled or not scaled because there is less shrinkage towards the mean for scaled allele counts. 
We derived a theoretical framework that shows that the difference in ASE due to shrinkage is heavily influenced by the power of the data. Increasing the power results in smaller differences in ASE whether allele counts are scaled or not.
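
The contrast between centered and centered-and-scaled allele counts can be illustrated with a small sketch. It assumes a common parameterization in which centered counts are multiplied by [2p(1-p)]^(alpha/2), so that alpha = 0 corresponds to RRc and alpha = -1 to RRcs; the abstract does not state this formula, and the function name and toy genotypes are hypothetical.

```python
import math

def scale_allele_counts(counts, alpha):
    """Center allele counts (0/1/2) and scale them by [2p(1-p)]**(alpha/2).

    Assumed parameterization (not given in the abstract):
    alpha = 0 -> centered only (RRc); alpha = -1 -> centered and scaled (RRcs).
    """
    n = len(counts)
    p = sum(counts) / (2 * n)      # allele frequency
    het = 2 * p * (1 - p)          # expected heterozygosity, 2p(1-p)
    if het == 0:
        raise ValueError("monomorphic SNP cannot be scaled")
    factor = het ** (alpha / 2)
    return [(x - 2 * p) * factor for x in counts]

# Toy SNP with a rare allele: one heterozygote among eight animals.
rare = [0, 0, 0, 0, 0, 0, 0, 1]
centered = scale_allele_counts(rare, alpha=0)
scaled = scale_allele_counts(rare, alpha=-1)

# The scaled covariate is the centered one times 1/sqrt(2p(1-p)),
# a factor that grows as the minor allele frequency falls.
ratio = scaled[-1] / centered[-1]
```

Under this assumed form, a regression coefficient fitted on the scaled covariate corresponds to a per-allele effect that is larger by the same factor, which is consistent with the abstract's statement that scaling leads to less shrinkage for low MAF variants.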

Subject(s)

Algorithms , Gene Frequency , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Animals , Cattle/genetics , Female , Genome-Wide Association Study/standards , Male , Models, Genetic

15.

Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection.

Calus, Mario P L; Bouwman, Aniek C; Schrooten, Chris; Veerkamp, Roel F.

Genet Sel Evol ; 48(1): 49, 2016 06 29.

Article in English | MEDLINE | ID: mdl-27357580

ABSTRACT

BACKGROUND: Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. In this study, we investigated whether the split-and-merge Bayesian stochastic search variable selection (BSSVS) model could overcome these issues. BSSVS is performed first on subsets of sequence-based variants and then on a merged dataset containing variants selected in the first step. RESULTS: We used a dataset that included 4,154,064 variants after editing and de-regressed proofs for 3415 reference and 2138 validation bulls for somatic cell score, protein yield and interval first to last insemination. In the first step, BSSVS was performed on 106 subsets each containing ~39,189 variants. In the second step, 1060 up to 472,492 variants, selected from the first step, were included to estimate the accuracy of genomic prediction. Accuracies were at best equal to those achieved with the commonly used Bovine 50k-SNP chip, although the number of variants within a few well-known quantitative trait loci regions was considerably enriched. When variant selection and the final genomic prediction were performed on the same data, predictions were biased. Predictions computed as the average of the predictions computed for each subset achieved the highest accuracies, i.e. 0.5 to 1.1 % higher than the accuracies obtained with the 50k-SNP chip, and yielded the least biased predictions. Finally, the accuracy of genomic predictions obtained when all sequence-based variants were included was similar or up to 1.4 % lower compared to that based on the average predictions across the subsets. By applying parallelization, the split-and-merge procedure was completed in 5 days, while the standard analysis including all sequence-based variants took more than three months. 
CONCLUSIONS: The split-and-merge approach splits one large computational task into many much smaller ones, which allows the use of parallel processing and thus efficient genomic prediction based on whole-genome sequence data. The split-and-merge approach did not improve prediction accuracy, probably because we used data on a single breed for which relationships between individuals were high. Nevertheless, the split-and-merge approach may have potential for applications on data from multiple breeds.
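
The two-step structure described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the subsets are assumed to be contiguous and near-equal in size, and a per-variant `scores` list with a fixed `top_k` per subset stands in for the BSSVS selection step, which is not implemented here. All function names are hypothetical.

```python
def split_bounds(n_items, n_subsets):
    """Return (start, stop) index pairs for near-equal contiguous subsets."""
    base, extra = divmod(n_items, n_subsets)
    bounds, start = [], 0
    for i in range(n_subsets):
        # The first `extra` subsets receive one additional item.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def split_and_merge(scores, n_subsets, top_k):
    """Keep the top_k highest-scoring variants per subset, then merge them.

    `scores` is a hypothetical stand-in for the per-variant evidence
    produced by the first-step model.
    """
    merged = []
    for start, stop in split_bounds(len(scores), n_subsets):
        ranked = sorted(range(start, stop), key=lambda j: scores[j], reverse=True)
        merged.extend(sorted(ranked[:top_k]))
    return merged

# Subset sizes implied by the numbers in the abstract.
bounds = split_bounds(4_154_064, 106)
sizes = [stop - start for start, stop in bounds]

# Tiny worked example: two subsets, one variant kept from each.
selected = split_and_merge([0.1, 0.9, 0.3, 0.8, 0.2, 0.7], n_subsets=2, top_k=1)
```

Dividing 4,154,064 variants over 106 subsets gives 39,189 or 39,190 per subset, in line with the ~39,189 quoted. Because each subset is processed independently in the first step, the loop over subsets is the part that can be run in parallel.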

Subject(s)

Cattle/genetics , Computational Biology , Genomics/methods , Models, Genetic , Animals , Bayes Theorem , Genotype , Male , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide

16.

Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle.

Veerkamp, Roel F; Bouwman, Aniek C; Schrooten, Chris; Calus, Mario P L.

Genet Sel Evol ; 48(1): 95, 2016 12 01.

Article in English | MEDLINE | ID: mdl-27905878

ABSTRACT

BACKGROUND: Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. METHODS: Phenotypes were available for 5503 Holstein-Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. RESULTS: The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. 
CONCLUSIONS: Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.

Subject(s)

Genetic Variation , Genome-Wide Association Study , Genome , Genomics , Animals , Breeding , Cattle , Genomics/methods , Genotype , High-Throughput Nucleotide Sequencing , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide , Selection, Genetic

17.

Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy.

Bouwman, Aniek C; Veerkamp, Roel F.

BMC Genet ; 15: 105, 2014 Oct 03.

Article in English | MEDLINE | ID: mdl-25277486

ABSTRACT

BACKGROUND: The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken breeders, who have to choose wisely how to spend their sequencing efforts over all the breeds or lines they evaluate. Sequence data from cattle breeds was used, because there are currently relatively many individuals from several breeds sequenced within the 1,000 Bull Genomes project. The advantage of whole-genome sequence data is that it carries the causal mutations, but the question is whether it is possible to impute the causal variants accurately. This study therefore focussed on imputation accuracy of variants with low minor allele frequency and breed specific variants. RESULTS: Imputation accuracy was assessed for chromosomes 1 and 29 as the correlation between observed and imputed genotypes. For chromosome 1, the average imputation accuracy was 0.70 with a reference population of 20 Holstein, and increased to 0.83 when the reference population was increased by including 3 other dairy breeds with 20 animals each. When the same number of animals from the Holstein breed were added, the accuracy improved to 0.88, while adding the 3 other breeds to the reference population of 80 Holstein improved the average imputation accuracy marginally to 0.89. For chromosome 29, the average imputation accuracy was lower. Some variants benefitted from the inclusion of other breeds in the reference population, initially determined by the MAF of the variant in each breed, but even Holstein specific variants did gain imputation accuracy from the multi-breed reference population. 
CONCLUSIONS: This study shows that splitting sequencing effort over multiple breeds and combining the reference populations is a good strategy for imputation from high-density SNP panels towards whole-genome sequence when reference populations are small and sequencing effort is limiting. When sequencing effort is limiting and interest lies in multiple breeds or lines, this strategy enables imputation for each breed.

Subject(s)

Cattle/genetics , Polymorphism, Single Nucleotide , Animals , Base Sequence , Breeding , Gene Frequency , Genome , High-Throughput Nucleotide Sequencing , Male , Sequence Analysis, DNA , Species Specificity

18.

Exploring causal networks of bovine milk fatty acids in a multivariate mixed model context.

Bouwman, Aniek C; Valente, Bruno D; Janss, Luc L G; Bovenhuis, Henk; Rosa, Guilherme J M.

Genet Sel Evol ; 46: 2, 2014 Jan 17.

Article in English | MEDLINE | ID: mdl-24438068

ABSTRACT

BACKGROUND: Knowledge regarding causal relationships among traits is important to understand complex biological systems. Structural equation models (SEM) can be used to quantify the causal relations between traits, which allow prediction of outcomes to interventions applied to such a network. Such models are fitted conditionally on a causal structure among traits, represented by a directed acyclic graph and an Inductive Causation (IC) algorithm can be used to search for causal structures. The aim of this study was to explore the space of causal structures involving bovine milk fatty acids and to select a network supported by data as the structure of a SEM. RESULTS: The IC algorithm adapted to mixed models settings was applied to study 14 correlated bovine milk fatty acids, resulting in an undirected network. The undirected pathway from C4:0 to C12:0 resembled the de novo synthesis pathway of short and medium chain saturated fatty acids. By using prior knowledge, directions were assigned to that part of the network and the resulting structure was used to fit a SEM that led to structural coefficients ranging from 0.85 to 1.05. The deviance information criterion indicated that the SEM was more plausible than the multi-trait model. CONCLUSIONS: The IC algorithm output pointed towards causal relations between the studied traits. This changed the focus from marginal associations between traits to direct relationships, thus towards relationships that may result in changes when external interventions are applied. The causal structure can give more insight into underlying mechanisms and the SEM can predict conditional changes due to such interventions.

Subject(s)

Algorithms , Fatty Acids/analysis , Milk/chemistry , Animals , Cattle , Fatty Acids/genetics , Models, Genetic , Phenotype

19.

Imputation of non-genotyped individuals based on genotyped relatives: assessing the imputation accuracy of a real case scenario in dairy cattle.

Bouwman, Aniek C; Hickey, John M; Calus, Mario P L; Veerkamp, Roel F.

Genet Sel Evol ; 46: 6, 2014 Feb 03.

Article in English | MEDLINE | ID: mdl-24490796

ABSTRACT

BACKGROUND: Imputation of genotypes for ungenotyped individuals could enable the use of valuable phenotypes created before the genomic era in analyses that require genotypes. The objective of this study was to investigate the accuracy of imputation of non-genotyped individuals using genotype information from relatives. METHODS: Genotypes were simulated for all individuals in the pedigree of a real (historical) dataset of phenotyped dairy cows and with part of the pedigree genotyped. The software AlphaImpute was used for imputation in its standard settings but also without phasing, i.e. using basic inheritance rules and segregation analysis only. Different scenarios were evaluated i.e.: (1) the real data scenario, (2) addition of genotypes of sires and maternal grandsires of the ungenotyped individuals, and (3) addition of one, two, or four genotyped offspring of the ungenotyped individuals to the reference population. RESULTS: The imputation accuracy using AlphaImpute in its standard settings was lower than without phasing. Including genotypes of sires and maternal grandsires in the reference population improved imputation accuracy, i.e. the correlation of the true genotypes with the imputed genotype dosages, corrected for mean gene content, across all animals increased from 0.47 (real situation) to 0.60. Including one, two and four genotyped offspring increased the accuracy of imputation across all animals from 0.57 (no offspring) to 0.73, 0.82, and 0.92, respectively. CONCLUSIONS: At present, the use of basic inheritance rules and segregation analysis appears to be the best imputation method for ungenotyped individuals. Comparison of our empirical animal-specific imputation accuracies to predictions based on selection index theory suggested that not correcting for mean gene content considerably overestimates the true accuracy. 
Imputation of ungenotyped individuals can help to include valuable phenotypes for genome-wide association studies or for genomic prediction, especially when the ungenotyped individuals have genotyped offspring.
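
Why correcting for mean gene content matters can be shown with a toy calculation. The sketch below assumes that the correction consists of subtracting each SNP's mean gene content (twice the allele frequency) from both the true genotypes and the imputed dosages before computing an animal-specific correlation across SNPs; the abstract does not spell out the procedure, and the function names and numbers are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def animal_accuracy(true_geno, imputed_dosage, allele_freq, correct=True):
    """Correlation across SNPs for one animal, optionally centered per SNP."""
    if correct:
        # Remove mean gene content (2p) so that only animal-specific
        # deviations from the population mean contribute.
        true_geno = [g - 2 * p for g, p in zip(true_geno, allele_freq)]
        imputed_dosage = [d - 2 * p for d, p in zip(imputed_dosage, allele_freq)]
    return pearson(true_geno, imputed_dosage)

allele_freq = [0.9, 0.1, 0.5, 0.8]
true_geno = [2, 0, 1, 2]
# An uninformative "imputation" that simply returns the population mean.
imputed = [2 * p for p in allele_freq]

uncorrected = animal_accuracy(true_geno, imputed, allele_freq, correct=False)
corrected = animal_accuracy(true_geno, imputed, allele_freq, correct=True)
```

In this example the uncorrected correlation exceeds 0.99 even though the imputed dosages carry no animal-specific information, whereas the corrected correlation is undefined because the centered dosages have no variance. This is in line with the abstract's point that not correcting for mean gene content overestimates accuracy.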

Subject(s)

Cattle/genetics , Genotype , Phenotype , Algorithms , Animals , Breeding , Genome , Models, Genetic , Software

20.

Fine mapping of a quantitative trait locus for bovine milk fat composition on Bos taurus autosome 19.

Bouwman, Aniek C; Visker, Marleen H P W; van Arendonk, Johan A M; Bovenhuis, Henk.

J Dairy Sci ; 97(2): 1139-49, 2014 Feb.

Article in English | MEDLINE | ID: mdl-24315323

ABSTRACT

A major quantitative trait locus (QTL) for milk fat content and fatty acids in both milk and adipose tissue has been detected on Bos taurus autosome 19 (BTA19) in several cattle breeds. The objective of this study was to refine the location of the QTL on BTA19 for bovine milk fat composition using a denser set of markers. Opportunities for fine mapping were provided by imputation from 50,000 genotyped single nucleotide polymorphisms (SNP) toward a high-density SNP panel with up to 777,000 SNP. The QTL region was narrowed down to a linkage disequilibrium block formed by 22 SNP covering 85,007 bp, from 51,303,322 to 51,388,329 bp on BTA19. This linkage disequilibrium block contained 2 genes: coiled-coil domain containing 57 (CCDC57) and fatty acid synthase (FASN). The gene CCDC57 is minimally characterized and has not been associated with bovine milk fat previously, but is expressed in the mammary gland. The gene FASN has been associated with bovine milk fat and fat in adipose tissue before. This gene is a likely candidate for the QTL on BTA19 because of its involvement in de novo fat synthesis. Future studies using sequence data of both CCDC57 and FASN, and eventually functional studies, will have to be pursued to assign the causal variant(s).

Subject(s)

Cattle/genetics , Chromosomes/genetics , Fatty Acids/metabolism , Milk/chemistry , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Animals , Breeding , Cattle/physiology , Chromosome Mapping , Fatty Acid Synthases/genetics , Female , Genotype , Haplotypes , Linkage Disequilibrium , Phenotype
