Search | VHL Regional Portal

1.

Evaluation of breeding strategies to reduce the inbreeding rate in the Friesian horse population: Looking back and moving forward.

Steensma, Marije J; Doekes, Harmen P; Pook, Torsten; Derks, Martijn F L; Bakker, Nynke; Ducro, Bart J.

J Anim Breed Genet ; 2024 May 15.

Article in English | MEDLINE | ID: mdl-38745529

ABSTRACT

In the past, small population sizes and unequal ancestor contributions have resulted in high inbreeding rates (ΔF) in the Friesian horse. Two decades ago, the studbook implemented a mating quota and started publishing individual kinships and reduced ΔF below 1% per generation. However, since then, the breeding population size has decreased and this raises the question whether current breeding strategies are sufficient to keep ΔF below desired rates. The aim of this study was to (1) reflect on past inbreeding trends and their main determinants, using pedigree analysis and (2) evaluate the effectiveness of the current and additional breeding strategies using stochastic simulations. We estimated the current ΔF (2013-2022) at 0.72% per generation. While the total contribution of the top 10 sires to the number of offspring per year has decreased from 75% in 1980 to 35% in 2022, this was mainly due to an increased number of approved studbook sires, and not due to more equalized contributions among sires. Of the simulated breeding strategies, selecting only breeding stallions with a below average mean kinship (i.e., "mean kinship selection") was most effective to decrease ΔF (from 0.66% to 0.33%). Increasing the number of breeding sires only had an effect when also a mating quota was applied. However, its effect remained limited. For example, a ~1.5 fold increase, combined with a mating quota of 80 offspring per sire per year, reduced ΔF from 0.55% to 0.51%. When increasing the number of breeding mares, a practically unfeasible large increase was needed for a meaningful reduction in ΔF (e.g. twice as many mares were needed to reduce ΔF from 0.66% to 0.56%). Stratified mating quotas, a novel approach in which we assigned each sire a mating quota (of 60, 80, 100 or 120 offspring per year) based on its mean kinship to recently born foals, resulted in a lower ΔF (0.43%) than a general mating quota of 90 offspring per sire per year (0.55%). 
Overall, while the current ΔF is below 1%, we recommend implementing additional strategies to further reduce ΔF below 0.5% in the Friesian horse population. For this breed and similar populations, we recommend focusing on breeding strategies based on kinship levels to effectively reduce ΔF.
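
To put the rates quoted in this abstract in perspective, ΔF can be related to effective population size through the standard approximation Ne = 1/(2ΔF). The following is a minimal sketch of that relationship, not the authors' pedigree analysis; the function names are illustrative assumptions, and only the ΔF values come from the abstract.

```python
def inbreeding_rate(f_prev: float, f_curr: float) -> float:
    """Rate of inbreeding between two successive generations.

    delta_F = (F_t - F_{t-1}) / (1 - F_{t-1}), with F as proportions.
    """
    return (f_curr - f_prev) / (1.0 - f_prev)


def effective_population_size(delta_f: float) -> float:
    """Standard approximation Ne = 1 / (2 * delta_F)."""
    if delta_f <= 0:
        raise ValueError("delta_f must be positive")
    return 1.0 / (2.0 * delta_f)


# Delta-F values quoted in the abstract, as proportions per generation.
rates = {
    "estimated 2013-2022": 0.0072,
    "1% threshold": 0.01,
    "0.5% recommended target": 0.005,
}
for label, rate in rates.items():
    print(f"{label}: Ne is roughly {effective_population_size(rate):.1f}")
```

Under this approximation, a ΔF of 1% per generation corresponds to an effective population size of 50, and the recommended 0.5% to an effective population size of 100.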

2.

Improving selection decisions with mating information by accounting for Mendelian sampling variances looking two generations ahead.

Niehoff, Tobias A M; Ten Napel, Jan; Bijma, Piter; Pook, Torsten; Wientjes, Yvonne C J; Hegedus, Bernadett; Calus, Mario P L.

Genet Sel Evol ; 56(1): 41, 2024 May 21.

Article in English | MEDLINE | ID: mdl-38773363

ABSTRACT

BACKGROUND: Breeding programs are judged by the genetic level of animals that are used to disseminate genetic progress. These animals are typically the best ones of the population. To maximise the genetic level of very good animals in the next generation, parents that are more likely to produce top performing offspring need to be selected. The ability of individuals to produce high-performing progeny differs because of differences in their breeding values and gametic variances. Differences in gametic variances among individuals are caused by differences in heterozygosity and linkage. The use of the gametic Mendelian sampling variance has been proposed before, for use in the usefulness criterion or Index5, and in this work, we extend existing approaches by not only considering the gametic Mendelian sampling variance of individuals, but also of their potential offspring. Thus, the criteria developed in this study plan one additional generation ahead. For simplicity, we assumed that the true quantitative trait loci (QTL) effects, genetic map and the haplotypes of all animals are known. RESULTS: In this study, we propose a new selection criterion, ExpBVSelGrOff, which describes the genetic level of selected grand-offspring that are produced by selected offspring of a particular mating. We compare our criterion with other published criteria in a stochastic simulation of an ongoing breeding program for 21 generations for proof of concept. ExpBVSelGrOff performed better than all other tested criteria, like the usefulness criterion or Index5 which have been proposed in the literature, without compromising short-term gains. After only five generations, when selection is strong (1%), selection based on ExpBVSelGrOff achieved 5.8% more commercial genetic gain and retained 25% more genetic variance without compromising inbreeding rate compared to selection based only on breeding values. 
CONCLUSIONS: Our proposed selection criterion offers a new tool to accelerate genetic progress for contemporary genomic breeding programs. It retains more genetic variance than previously published criteria that plan less far ahead. Considering future gametic Mendelian sampling variances in the selection process also seems promising for maintaining more genetic variance.

Subjects

Genetic Models, Quantitative Trait Loci, Genetic Selection, Animals, Breeding/methods, Female, Male, Selective Breeding
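
The role of gametic variance in this abstract can be illustrated with the usefulness criterion in its textbook form: the expected mean of the selected fraction of a mating's offspring is the offspring mean plus the selection intensity times the within-family standard deviation. This is a minimal sketch, assuming normally distributed offspring breeding values and known true values (the same simplification the abstract states). It is not the ExpBVSelGrOff criterion, and the two matings and their values are invented for illustration.

```python
from statistics import NormalDist


def selection_intensity(selected_proportion: float) -> float:
    """Mean of the top fraction of a standard normal distribution."""
    nd = NormalDist()
    threshold = nd.inv_cdf(1.0 - selected_proportion)
    return nd.pdf(threshold) / selected_proportion


def usefulness(offspring_mean: float, offspring_sd: float,
               selected_proportion: float) -> float:
    """Expected breeding value of the selected offspring of one mating."""
    return offspring_mean + selection_intensity(selected_proportion) * offspring_sd


# Two hypothetical matings: B has the higher mean, A the larger
# Mendelian sampling standard deviation.
matings = {"A": (10.0, 2.0), "B": (10.5, 1.0)}
for name, (mean, sd) in matings.items():
    print(name, round(usefulness(mean, sd, selected_proportion=0.01), 2))
```

With strong selection (1%), mating A ranks above mating B despite its lower mean, which is the kind of re-ranking that criteria based on gametic variance are meant to capture.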

3.

Genomic prediction within and across maize landrace derived populations using haplotypes.

Lin, Yan-Cheng; Mayer, Manfred; Valle Torres, Daniel; Pook, Torsten; Hölker, Armin C; Presterl, Thomas; Ouzunova, Milena; Schön, Chris-Carolin.

Front Plant Sci ; 15: 1351466, 2024.

Article in English | MEDLINE | ID: mdl-38584949

ABSTRACT

Genomic prediction (GP) using haplotypes is considered advantageous compared to GP solely reliant on single nucleotide polymorphisms (SNPs), owing to haplotypes' enhanced ability to capture ancestral information and their higher linkage disequilibrium with quantitative trait loci (QTL). Many empirical studies supported the advantages of haplotype-based GP over SNP-based approaches. Nevertheless, the performance of haplotype-based GP can vary significantly depending on multiple factors, including the traits being studied, the genetic structure of the population under investigation, and the particular method employed for haplotype construction. In this study, we compared haplotype and SNP based prediction accuracies in four populations derived from European maize landraces. Populations comprised either doubled haploid lines (DH) derived directly from landraces, or gamete capture lines (GC) derived from crosses of the landraces with an inbred line. For two different landraces, both types of populations were generated, genotyped with 600k SNPs and phenotyped as lines per se for five traits. Our study explores three prediction scenarios: (i) within each of the four populations, (ii) across DH and GC populations from the same landrace, and (iii) across landraces using either DH or GC populations. Three haplotype construction methods were evaluated: 1. fixed-window blocks (FixedHB), 2. LD-based blocks (HaploView), and 3. IBD-based blocks (HaploBlocker). In within population predictions, FixedHB and HaploView methods performed as well as or slightly better than SNPs for all traits. HaploBlocker improved accuracy for certain traits but exhibited inferior performance for others. In prediction across populations, the parameter setting from HaploBlocker which controls the construction of shared haplotypes between populations played a crucial role for obtaining optimal results. 
When predicting across landraces, accuracies were low for both SNP and haplotype approaches, but for specific traits substantial improvement was observed with HaploBlocker. This study provides recommendations for optimal haplotype construction and identifies relevant parameters for constructing haplotypes in the context of genomic prediction.

4.

The pig pangenome provides insights into the roles of coding structural variations in genetic diversity and adaptation.

Li, Zhengcao; Liu, Xiaohong; Wang, Chen; Li, Zhenyang; Jiang, Bo; Zhang, Ruifeng; Tong, Lu; Qu, Youping; He, Sheng; Chen, Haifan; Mao, Yafei; Li, Qingnan; Pook, Torsten; Wu, Yu; Zan, Yanjun; Zhang, Hui; Li, Lu; Wen, Keying; Chen, Yaosheng.

Genome Res ; 33(10): 1833-1847, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37914227

ABSTRACT

Structural variations have emerged as an important driving force for genome evolution and phenotypic variation in various organisms, yet their contributions to genetic diversity and adaptation in domesticated animals remain largely unknown. Here we constructed a pangenome based on 250 sequenced individuals from 32 pig breeds in Eurasia and systematically characterized coding sequence presence/absence variations (PAVs) within pigs. We identified 308.3-Mb nonreference sequences and 3438 novel genes absent from the current reference genome. Gene PAV analysis showed that 16.8% of the genes in the pangene catalog undergo PAV. A number of newly identified dispensable genes showed close associations with adaptation. For instance, several novel swine leukocyte antigen (SLA) genes discovered in nonreference sequences potentially participate in immune responses to porcine reproductive and respiratory syndrome virus (PRRSV) infection. We delineated previously unidentified features of the pig mobilome that contained 490,480 transposable element insertion polymorphisms (TIPs) resulting from recent mobilization of 970 TE families, and investigated their population dynamics along with influences on population differentiation and gene expression. In addition, several candidate adaptive TE insertions were detected to be co-opted into genes responsible for responses to hypoxia, skeletal development, regulation of heart contraction, and neuronal cell development, likely contributing to local adaptation of Tibetan wild boars. These findings enhance our understanding of hidden layers of the genetic diversity in pigs and provide novel insights into the role of SVs in the evolutionary adaptation of mammals.

Subjects

Breeding, Genome, Humans, Animals, Swine, Genetic Variation, Mammals

5.

Optimization Strategies to Adapt Sheep Breeding Programs to Pasture-Based Production Environments: A Simulation Study.

Martin, Rebecca; Pook, Torsten; Bennewitz, Jörn; Schmid, Markus.

Animals (Basel) ; 13(22), 2023 Nov 10.

Article in English | MEDLINE | ID: mdl-38003094

ABSTRACT

Strong differences between the selection (indoor fattening) and production environment (pasture fattening) are expected to reduce genetic gain due to possible genotype-by-environment interactions (G × E). To investigate how to adapt a sheep breeding program to a pasture-based production environment, different scenarios were simulated for the German Merino sheep population using the R package Modular Breeding Program Simulator (MoBPS). All relevant selection steps and a multivariate pedigree-based BLUP breeding value estimation were included. The reference scenario included progeny testing at stations to evaluate the fattening performance and carcass traits. It was compared to alternative scenarios varying in the progeny testing scheme for fattening traits (station and/or field). The total merit index (TMI) set pasture-based lamb fattening as a breeding goal, i.e., field fattening traits were weighted. Regarding the TMI, the scenario with progeny testing both in the field and on station led to a significant increase in genetic gain compared with the reference scenario. Regarding fattening traits, genetic gain was significantly increased in the alternative scenarios in which field progeny testing was performed. In the presence of G × E, the study showed that the selection environment should match the production environment (pasture) to avoid losses in genetic gain. As most breeding goals also contain traits not recordable in field testing, the combination of both field and station testing is required to maximize genetic gain.

6.

Accelerated matrix-vector multiplications for matrices involving genotype covariates with applications in genomic prediction.

Freudenberg, Alexander; Vandenplas, Jeremie; Schlather, Martin; Pook, Torsten; Evans, Ross; Ten Napel, Jan.

Front Genet ; 14: 1220408, 2023.

Article in English | MEDLINE | ID: mdl-37662837

ABSTRACT

In the last decade, a number of methods have been suggested to deal with large amounts of genetic data in genomic predictions. Yet, steadily growing population sizes and the suboptimal use of computational resources are pushing the practical application of these approaches to their limits. As an extension to the C/CUDA library miraculix, we have developed tailored solutions for the computation of genotype matrix multiplications, which are a critical bottleneck in the empirical evaluation of many statistical models. We demonstrate the benefits of our solutions using the example of single-step models, which make repeated use of this kind of multiplication. Targeting modern Nvidia® GPUs as well as a broad range of CPU architectures, our implementation significantly reduces the time required for the estimation of breeding values in large population sizes. miraculix is released under the Apache 2.0 license and is freely available at https://github.com/alexfreudenberg/miraculix.
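
To make concrete what a genotype matrix-vector multiplication involves, the sketch below multiplies a centered genotype matrix by a vector without ever building the centered matrix, using the identity (M - 1(2p)^T)v = Mv - (2p·v)1. This is a plain-Python illustration of the operation only; it does not use or reproduce the miraculix API, its data layout, or its GPU kernels, and all names and the toy data are assumptions.

```python
def allele_frequencies(genotypes):
    """Per-marker allele frequency from genotypes coded 0/1/2."""
    n = len(genotypes)
    return [sum(column) / (2.0 * n) for column in zip(*genotypes)]


def centered_matvec(genotypes, freqs, v):
    """Compute (M - 1 * (2p)^T) v while keeping M in its integer coding.

    The centering term is the same scalar for every individual, so it is
    computed once and subtracted from each row product.
    """
    shift = sum(2.0 * p * x for p, x in zip(freqs, v))
    return [sum(g * x for g, x in zip(row, v)) - shift for row in genotypes]


# Toy data: 3 individuals (rows) by 3 markers (columns).
genotypes = [
    [0, 1, 2],
    [2, 1, 0],
    [1, 1, 1],
]
freqs = allele_frequencies(genotypes)
print(centered_matvec(genotypes, freqs, [1.0, 2.0, 3.0]))
```

Keeping the matrix in its small integer coding is what makes compact storage possible in the first place; the centering then costs one extra dot product per multiplication.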

7.

Optimization of breeding program design through stochastic simulation with kernel regression.

Hassanpour, Azadeh; Geibel, Johannes; Simianer, Henner; Pook, Torsten.

G3 (Bethesda) ; 13(12), 2023 Dec 06.

Article in English | MEDLINE | ID: mdl-37742059

ABSTRACT

In recent years, breeding programs have increased significantly in size and complexity, with various highly interdependent parameters and many contrasting breeding goals. As a result, resource allocation in these programs has become more complex, and deriving an optimal breeding strategy has become increasingly challenging. To address this, a common practice is to reduce the optimization problem to a set of scenarios that differ only in a few parameters and can therefore be analyzed in detail. The goal of this article is to provide a framework for the numerical optimization of breeding programs that goes beyond the simple comparison of scenarios. For this, we first determine the space of potential breeding programs only limited by basic constraints like the budget and housing capacities. Subsequently, the goal is to identify the optimal breeding program by finding the parametrization that maximizes the target function by combining different breeding goals. To assess the value of the target function for a parametrization, we propose using stochastic simulations and the subsequent use of a kernel regression method to cope with the stochasticity of simulation outcomes. This procedure is performed iteratively to narrow down the most promising areas of the search space and perform more and more simulations in these areas of interest. In a simplified example applied to a dairy cattle program, our proposed framework has shown its ability to identify an optimal breeding strategy that aligns with a target function aiming at genetic gain and genetic diversity conservation limited by budget constraints.

Subjects

Inbreeding, Genetic Selection, Animals, Cattle, Computer Simulation
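
The idea of smoothing noisy simulation outcomes with kernel regression can be sketched in one dimension. The abstract does not say which kernel or estimator is used, so the Nadaraya-Watson estimator with a Gaussian kernel below is an assumption, and `simulate_target` is a hypothetical stand-in for a full stochastic breeding program simulation.

```python
import math
import random


def kernel_regression(x_obs, y_obs, x_query, bandwidth):
    """Nadaraya-Watson estimate at x_query with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_obs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query point is too far from all observations")
    return sum(w * y for w, y in zip(weights, y_obs)) / total


def simulate_target(x, rng):
    """Hypothetical noisy target function of one design parameter."""
    return -(x - 0.6) ** 2 + rng.gauss(0.0, 0.05)


rng = random.Random(1)
settings = [i / 50 for i in range(51)]  # candidate parameter values in [0, 1]
outcomes = [simulate_target(x, rng) for x in settings]

# Smooth the noisy outcomes, then pick the setting with the best estimate.
smoothed = [kernel_regression(settings, outcomes, x, bandwidth=0.1) for x in settings]
best = max(zip(smoothed, settings))[1]
print(f"most promising setting after smoothing: {best:.2f}")
```

In an iterative scheme like the one described, further simulations would then be concentrated around the most promising settings and the smoothing repeated.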

8.

How economic weights translate into genetic and phenotypic progress, and vice versa.

Simianer, Henner; Heise, Johannes; Rensing, Stefan; Pook, Torsten; Geibel, Johannes; Reimer, Christian.

Genet Sel Evol ; 55(1): 38, 2023 Jun 08.

Article in English | MEDLINE | ID: mdl-37291496

ABSTRACT

BACKGROUND: This paper highlights the relationships between economic weights, genetic progress, and phenotypic progress in genomic breeding programs that aim at generating genetic progress in complex, i.e., multi-trait, breeding objectives via a combination of estimated breeding values for different trait complexes. RESULTS: Based on classical selection index theory in combination with quantitative genetic models, we provide a methodological framework for calculating expected genetic and phenotypic progress for all components of a complex breeding objective. We further provide an approach to study the sensitivity of the system to modifications, e.g. to changes in the economic weights. We propose a novel approach to derive the covariance structure of the stochastic errors of estimated breeding values from the observed correlations of estimated breeding values. We define 'realized economic weights' as those weights that would coincide with the observed composition of the genetic trend and show, how they can be calculated. The suggested methodology is illustrated with an index that aims at achieving a breeding goal composed of six trait complexes, that was applied in German Holstein cattle breeding until 2021. CONCLUSIONS: Based on the presented results, the main conclusions are (i) the composition of the observed genetic progress matches the expectations well, with predictions being slightly better when the covariance of estimation errors is taken into account; (ii) the composition of the expected phenotypic trend deviates significantly from the expected genetic trend due to the differences in trait heritabilities; and (iii) the realized economic weights derived from the observed genetic trend deviate substantially from the predefined ones, in one case even with a reversed sign. 
Further results highlight the implications of the change to a modified breeding goal based on the example of a new index comprising eight, partly new, trait complexes, which has been used since 2021 in the German Holstein breeding program. The proposed framework and the analytical tools and software provided will be useful to define more rational and generally accepted breeding objectives in the future.

Subjects

Genome, Genetic Selection, Animals, Cattle/genetics, Phenotype, Genomics, Genetic Models

9.

Genomic prediction using information across years with epistatic models and dimension reduction via haplotype blocks.

Vojgani, Elaheh; Hölker, Armin C; Mayer, Manfred; Schön, Chris-Carolin; Simianer, Henner; Pook, Torsten.

PLoS One ; 18(3): e0282288, 2023.

Article in English | MEDLINE | ID: mdl-37000811

ABSTRACT

The importance of accurate genomic prediction of phenotypes in plant breeding is undeniable, as higher prediction accuracy can increase selection responses. In this regard, epistasis models have shown to be capable of increasing the prediction accuracy while their high computational load is challenging. In this study, we investigated the predictive ability obtained in additive and epistasis models when utilizing haplotype blocks versus pruned sets of SNPs by including phenotypic information from the last growing season. This was done by considering a single biological trait in two growing seasons (2017 and 2018) as separate traits in a multi-trait model. Thus, bivariate variants of the Genomic Best Linear Unbiased Prediction (GBLUP) as an additive model, Epistatic Random Regression BLUP (ERRBLUP) and selective Epistatic Random Regression BLUP (sERRBLUP) as epistasis models were compared with respect to their prediction accuracies for the second year. The prediction accuracies of bivariate GBLUP, ERRBLUP and sERRBLUP were assessed with eight phenotypic traits for 471/402 doubled haploid lines in the European maize landrace Kemater Landmais Gelb/Petkuser Ferdinand Rot. The results indicate that the obtained prediction accuracies are similar when utilizing a pruned set of SNPs or haplotype blocks, while utilizing haplotype blocks reduces the computational load significantly compared to the pruned sets of SNPs. The number of interactions considered in the model was reduced from 323.5/456.4 million for the pruned SNP panel to 4.4/5.5 million in the haplotype block dataset for Kemater and Petkuser landraces, respectively. Since the computational load scales linearly with the number of parameters in the model, this leads to a reduction in computational time of 98.9% from 13.5 hours for the pruned set of markers to 9 minutes for the haplotype block dataset. 
We further investigated the impact of genomic correlation, phenotypic correlation and trait heritability as factors affecting the bivariate models' prediction accuracy, identifying the genomic correlation between years as the most influential one. As computational load is substantially reduced, while the accuracy of genomic prediction is unchanged, the framework proposed here for using haplotype blocks in sERRBLUP provides a solution for the practical implementation of sERRBLUP in real breeding programs. Furthermore, our results indicate that sERRBLUP is not only suitable for prediction across different locations, but also for the prediction across growing seasons.

Subjects

Genetic Models, Plant Breeding, Haplotypes, Genome, Genomics/methods, Phenotype, Single Nucleotide Polymorphism, Genotype
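
The reductions reported in this abstract can be checked with simple arithmetic. The sketch below only recomputes percentages from the figures quoted in the abstract (interaction counts in millions, run times in hours and minutes); the helper name is illustrative.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (1.0 - after / before)


# Run time: 13.5 hours (810 minutes) down to 9 minutes.
print(round(percent_reduction(13.5 * 60, 9), 1))  # 98.9

# Number of interactions in millions, pruned SNPs versus haplotype blocks.
print(round(percent_reduction(323.5, 4.4), 1))    # Kemater: 98.6
print(round(percent_reduction(456.4, 5.5), 1))    # Petkuser: 98.8
```

The reduction in interaction counts (about 98.6% to 98.8%) is of the same order as the reported 98.9% reduction in computing time, in line with the stated linear scaling of computational load with the number of parameters.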

10.

Imputation of low-density marker chip data in plant breeding: Evaluation of methods based on sugar beet.

Niehoff, Tobias; Pook, Torsten; Gholami, Mahmood; Beissinger, Timothy.

Plant Genome ; 15(4): e20257, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36258672

ABSTRACT

Low-density genotyping followed by imputation reduces genotyping costs while still providing high-density marker information. An increased marker density has the potential to improve the outcome of all applications that are based on genomic data. This study investigates techniques for 1k to 20k genomic marker imputation for plant breeding programs with sugar beet (Beta vulgaris L. ssp. vulgaris) as an example crop, where these are realistic marker numbers for modern breeding applications. The generally accepted 'gold standard' for imputation, Beagle 5.1, was compared with the recently developed software AlphaPlantImpute2 which is designed specifically for plant breeding. For Beagle 5.1 and AlphaPlantImpute2, the imputation strategy as well as the imputation parameters were optimized in this study. We found that the imputation accuracy of Beagle could be tremendously improved (0.22 to 0.67) by tuning parameters, mainly by lowering the values for the parameter for the effective population size and increasing the number of iterations performed. Separating the phasing and imputation steps also improved accuracies when optimized parameters were used (0.67 to 0.82). We also found that the imputation accuracy of Beagle decreased when more low-density lines were included for imputation. AlphaPlantImpute2 produced very high accuracies without optimization (0.89) and was generally less responsive to optimization. Overall, AlphaPlantImpute2 performed relatively better for imputation whereas Beagle was better for phasing. Combining both tools yielded the highest accuracies.

Subjects

Beta vulgaris, Dogs, Animals, Beta vulgaris/genetics, Genotype, Plant Breeding, Single Nucleotide Polymorphism, Sugars

11.

Development and validation of a horse reference panel for genotype imputation.

Reich, Paula; Falker-Gieske, Clemens; Pook, Torsten; Tetens, Jens.

Genet Sel Evol ; 54(1): 49, 2022 Jul 04.

Article in English | MEDLINE | ID: mdl-35787788

ABSTRACT

BACKGROUND: Genotype imputation is a cost-effective method to generate sequence-level genotypes for a large number of animals. Its application can improve the power of genomic studies, provided that the accuracy of imputation is sufficiently high. The purpose of this study was to develop an optimal strategy for genotype imputation from genotyping array data to sequence level in German warmblood horses, and to investigate the effect of different factors on the accuracy of imputation. Publicly available whole-genome sequence data from 317 horses of 46 breeds was used to conduct the analyses. RESULTS: Depending on the size and composition of the reference panel, the accuracy of imputation from medium marker density (60K) to sequence level using the software Beagle 5.1 ranged from 0.64 to 0.70 for horse chromosome 3. Generally, imputation accuracy increased as the size of the reference panel increased, but if genetically distant individuals were included in the panel, the accuracy dropped. Imputation was most precise when using a reference panel of multiple but related breeds and the software Beagle 5.1, which outperformed the other two tested computer programs, Impute 5 and Minimac 4. Genome-wide imputation for this scenario resulted in a mean accuracy of 0.66. Stepwise imputation from 60K to 670K markers and subsequently to sequence level did not improve the accuracy of imputation. However, imputation from higher density (670K) was considerably more accurate (about 0.90) than from medium density. Likewise, imputation in genomic regions with a low marker coverage resulted in a reduced accuracy of imputation. CONCLUSIONS: The accuracy of imputation in horses was influenced by the size and composition of the reference panel, the marker density of the genotyping array, and the imputation software. 
Genotype imputation can be used to extend the limited amount of available sequence-level data from horses in order to boost the power of downstream analyses, such as genome-wide association studies, or the detection of embryonic lethal variants.

Subjects

Genome-Wide Association Study, Genomics, Animals, Dogs, Genotype, Horses/genetics, Records, Software

12.

Newly Developed MAGIC Population Allows Identification of Strong Associations and Candidate Genes for Anthocyanin Pigmentation in Eggplant.

Mangino, Giulio; Arrones, Andrea; Plazas, Mariola; Pook, Torsten; Prohens, Jaime; Gramazio, Pietro; Vilanova, Santiago.

Front Plant Sci ; 13: 847789, 2022.

Article in English | MEDLINE | ID: mdl-35330873

ABSTRACT

Multi-parent advanced generation inter-cross (MAGIC) populations facilitate the genetic dissection of complex quantitative traits in plants and are valuable breeding materials. We report the development of the first eggplant MAGIC population (S3 Magic EGGplant InCanum, S3MEGGIC; 8-way), constituted by the 420 S3 individuals developed from the intercrossing of seven cultivated eggplant (Solanum melongena) and one wild relative (S. incanum) parents. The S3MEGGIC recombinant population was genotyped with the eggplant 5k probes SPET platform and phenotyped for anthocyanin presence in vegetative plant tissues (PA) and fruit epidermis (FA), and for the light-insensitive anthocyanic pigmentation under the calyx (PUC). The 7,724 filtered high-confidence single-nucleotide polymorphisms (SNPs) confirmed a low residual heterozygosity (6.87%), a lack of genetic structure in the S3MEGGIC population, and no differentiation among subpopulations carrying a cultivated or wild cytoplasm. Inference of haplotype blocks of the nuclear genome revealed an unbalanced representation of the founder genomes, suggesting a cryptic selection in favour or against specific parental genomes. Genome-wide association study (GWAS) analysis for PA, FA, and PUC detected strong associations with two myeloblastosis (MYB) genes similar to MYB113 involved in the anthocyanin biosynthesis pathway, and with a COP1 gene which encodes for a photo-regulatory protein and may be responsible for the PUC trait. Evidence was found of a duplication of an ancestral MYB113 gene with a translocation from chromosome 10 to chromosome 1 compared with the tomato genome. Parental genotypes for the three genes were in agreement with the identification of the candidate genes performed in the S3MEGGIC population. Our new eggplant MAGIC population is the largest recombinant population in eggplant and is a powerful tool for eggplant genetics and breeding studies.

13.

Increasing calling accuracy, coverage, and read-depth in sequence data by the use of haplotype blocks.

Pook, Torsten; Nemri, Adnane; Gonzalez Segovia, Eric Gerardo; Valle Torres, Daniel; Simianer, Henner; Schoen, Chris-Carolin.

PLoS Genet ; 17(12): e1009944, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34941872

ABSTRACT

High-throughput genotyping of large numbers of lines remains a key challenge in plant genetics, requiring geneticists and breeders to find a balance between data quality and the number of genotyped lines under a variety of different existing genotyping technologies when resources are limited. In this work, we are proposing a new imputation pipeline ("HBimpute") that can be used to generate high-quality genomic data from low read-depth whole-genome-sequence data. The key idea of the pipeline is the use of haplotype blocks from the software HaploBlocker to identify locally similar lines and subsequently use the reads of all locally similar lines in the variant calling for a specific line. The effectiveness of the pipeline is showcased on a dataset of 321 doubled haploid lines of a European maize landrace, which were sequenced at 0.5X read-depth. The overall imputing error rates are cut in half compared to state-of-the-art software like BEAGLE and STITCH, while the average read-depth is increased to 83X, thus enabling the calling of copy number variation. The usefulness of the obtained imputed data panel is further evaluated by comparing the performance of sequence data in common breeding applications to that of genomic data generated with a genotyping array. For both genome-wide association studies and genomic prediction, results are on par or even slightly better than results obtained with high-density array data (600k). In particular for genomic prediction, we observe slightly higher data quality for the sequence data compared to the 600k array in the form of higher prediction accuracies. This occurred specifically when reducing the data panel to the set of overlapping markers between sequence and array, indicating that sequencing data can benefit from the same marker ascertainment as used in the array process to increase the quality and usability of genomic data.

Subjects

Genome-Wide Association Study/standards, Genotyping Techniques, Haplotypes/genetics, Software, DNA Copy Number Variations/genetics, Genome/genetics, Genomics/methods, Genotype, Single Nucleotide Polymorphism/genetics, Whole Genome Sequencing, Zea mays/genetics

14.

Accounting for epistasis improves genomic prediction of phenotypes with univariate and bivariate models across environments.

Vojgani, Elaheh; Pook, Torsten; Martini, Johannes W R; Hölker, Armin C; Mayer, Manfred; Schön, Chris-Carolin; Simianer, Henner.

Theor Appl Genet ; 134(9): 2913-2930, 2021 Sep.

Article in English | MEDLINE | ID: mdl-34115154

ABSTRACT

KEY MESSAGE: The accuracy of genomic prediction of phenotypes can be increased by including the top-ranked pairwise SNP interactions into the prediction model. We compared the predictive ability of various prediction models for a maize dataset derived from 910 doubled haploid lines from two European landraces (Kemater Landmais Gelb and Petkuser Ferdinand Rot), which were tested at six locations in Germany and Spain. The compared models were Genomic Best Linear Unbiased Prediction (GBLUP) as an additive model, Epistatic Random Regression BLUP (ERRBLUP) accounting for all pairwise SNP interactions, and selective Epistatic Random Regression BLUP (sERRBLUP) accounting for a selected subset of pairwise SNP interactions. These models have been compared in both univariate and bivariate statistical settings for predictions within and across environments. Our results indicate that modeling all pairwise SNP interactions into the univariate/bivariate model (ERRBLUP) is not superior in predictive ability to the respective additive model (GBLUP). However, incorporating only a selected subset of interactions with the highest effect variances in univariate/bivariate sERRBLUP can increase predictive ability significantly compared to the univariate/bivariate GBLUP. Overall, bivariate models consistently outperform univariate models in predictive ability. Across all studied traits, locations and landraces, the increase in prediction accuracy from univariate GBLUP to univariate sERRBLUP ranged from 5.9 to 112.4 percent, with an average increase of 47 percent. For bivariate models, the change ranged from -0.3 to + 27.9 percent comparing the bivariate sERRBLUP to the bivariate GBLUP, with an average increase of 11 percent. This considerable increase in predictive ability achieved by sERRBLUP may be of interest for "sparse testing" approaches in which only a subset of the lines/hybrids of interest is observed at each location.

Subjects

Plant Chromosomes/genetics , Environment , Genetic Epistasis , Genetic Models , Phenotype , Quantitative Trait Loci , Zea mays/genetics , Chromosome Mapping/methods , Single Nucleotide Polymorphism

15.

How imputation can mitigate SNP ascertainment Bias.

Geibel, Johannes; Reimer, Christian; Pook, Torsten; Weigend, Steffen; Weigend, Annett; Simianer, Henner.

BMC Genomics ; 22(1): 340, 2021 May 12.

Article in English | MEDLINE | ID: mdl-33980139

ABSTRACT

BACKGROUND: Population genetic studies based on genotyped single nucleotide polymorphisms (SNPs) are influenced by a non-random selection of the SNPs included in the used genotyping arrays. The resulting bias in the estimation of allele frequency spectra and population genetics parameters like heterozygosity and genetic distances relative to whole genome sequencing (WGS) data is known as SNP ascertainment bias. Full correction for this bias requires detailed knowledge of the array design process, which is often not available in practice. This study suggests an alternative approach to mitigate ascertainment bias of a large set of genotyped individuals by using information of a small set of sequenced individuals via imputation without the need for prior knowledge on the array design. RESULTS: The strategy was first tested by simulating additional ascertainment bias with a set of 1566 chickens from 74 populations that were genotyped for the positions of the Affymetrix Axiom™ 580 k Genome-Wide Chicken Array. Imputation accuracy was shown to be consistently higher for populations used for SNP discovery during the simulated array design process. Reference sets of at least one individual per population in the study set led to a strong correction of ascertainment bias for estimates of expected and observed heterozygosity, Wright's Fixation Index and Nei's Standard Genetic Distance. In contrast, unbalanced reference sets (overrepresentation of populations compared to the study set) introduced a new bias towards the reference populations. Finally, the array genotypes were imputed to WGS by utilization of reference sets of 74 individuals (one per population) to 98 individuals (additional commercial chickens) and compared with a mixture of individually and pooled sequenced populations. The imputation reduced the slope between heterozygosity estimates of array data and WGS data from 1.94 to 1.26 when using the smaller balanced reference panel and to 1.44 when using the larger but unbalanced reference panel. This generally supported the results from simulation but was less favorable, advocating for a larger reference panel when imputing to WGS. CONCLUSIONS: The results highlight the potential of using imputation for mitigation of SNP ascertainment bias but also underline the need for unbiased reference sets.

Subjects

Genome-Wide Association Study , Single Nucleotide Polymorphism , Animals , Chickens/genetics , Gene Frequency , Genotype

16.

How array design creates SNP ascertainment bias.

Geibel, Johannes; Reimer, Christian; Weigend, Steffen; Weigend, Annett; Pook, Torsten; Simianer, Henner.

PLoS One ; 16(3): e0245178, 2021.

Article in English | MEDLINE | ID: mdl-33784304

ABSTRACT

Single nucleotide polymorphisms (SNPs), genotyped with arrays, have become a widely used marker type in population genetic analyses over the last 10 years. However, compared to whole genome re-sequencing data, arrays are known to lack a substantial proportion of globally rare variants and tend to be biased towards variants present in populations involved in the development process of the respective array. This affects population genetic estimators and is known as SNP ascertainment bias. We investigated factors contributing to ascertainment bias in array development by redesigning the Axiom™ Genome-Wide Chicken Array in silico and evaluating changes in allele frequency spectra and heterozygosity estimates in a stepwise manner. A sequential reduction of rare alleles during the development process was shown. This was mainly caused by the identification of SNPs in a limited set of populations and a within-population selection of common SNPs when aiming for equidistant spacing. These effects were shown to be less severe with a larger discovery panel. Additionally, a generally massive overestimation of expected heterozygosity for the ascertained SNP sets was shown. This overestimation was 24% higher for populations involved in the discovery process than for populations not involved in it, in the case of the original array. The same was observed after the SNP discovery step in the redesign. However, an unequal contribution of populations during the SNP selection can mask this effect but also adds uncertainty. Finally, we make suggestions for the design of specialized arrays for large scale projects where whole genome re-sequencing techniques are still too expensive.

Subjects

Chickens/genetics , Single Nucleotide Polymorphism , Algorithms , Animals , Genetic Databases , Gene Frequency , Population Genetics
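
The overestimation of expected heterozygosity described in this abstract can be illustrated with a small sketch. It assumes biallelic SNPs and the standard per-locus expectation 2p(1-p), and it uses a hypothetical minor-allele-frequency cutoff (`min_maf`) as a crude stand-in for array ascertainment. The allele frequencies are invented for illustration and are not data from the study, and the function names are not taken from any published software.

```python
from statistics import mean

def expected_heterozygosity(freqs):
    """Mean expected heterozygosity, 2p(1-p) per biallelic locus."""
    return mean(2 * p * (1 - p) for p in freqs)

def ascertain(freqs, min_maf):
    """Keep loci whose minor allele frequency is at least min_maf.

    Hypothetical stand-in for array SNP selection, which favours
    common variants over rare ones.
    """
    return [p for p in freqs if min(p, 1 - p) >= min_maf]

# Invented allele frequencies: many rare variants, a few common ones.
wgs_freqs = [0.01, 0.02, 0.02, 0.05, 0.05, 0.10, 0.25, 0.40, 0.50]
array_freqs = ascertain(wgs_freqs, min_maf=0.10)

he_wgs = expected_heterozygosity(wgs_freqs)
he_array = expected_heterozygosity(array_freqs)
print(f"He (all loci): {he_wgs:.3f}")
print(f"He (ascertained loci): {he_array:.3f}")
```

Because the rare variants contribute little heterozygosity, dropping them raises the mean over the remaining loci, which is the direction of bias the abstract reports for ascertained SNP sets.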

17.

MoBPSweb: A web-based framework to simulate and compare breeding programs.

Pook, Torsten; Büttgen, Lisa; Ganesan, Amudha; Ha, Ngoc-Thuy; Simianer, Henner.

G3 (Bethesda) ; 11(2), 2021 Feb 09.

Article in English | MEDLINE | ID: mdl-33712818

ABSTRACT

In this study, we introduce a new web-based simulation framework ("MoBPSweb") that combines a unified language to describe breeding programs with the simulation software MoBPS, standing for "Modular Breeding Program Simulator." Thereby, MoBPSweb provides a flexible environment to log, simulate, evaluate, and compare breeding programs. Inputs can be provided via modules ranging from a Vis.js-based environment for "drawing" the breeding program to a variety of modules to provide phenotype information, economic parameters, and other relevant information. Similarly, results of the simulation study can be extracted and compared to other scenarios via output modules (e.g., observed phenotypes, the accuracy of breeding value estimation, inbreeding rates), while all simulations and downstream analysis are executed in the highly efficient R-package MoBPS.

Subjects

Inbreeding , Software , Computer Simulation , Internet , Genetic Models , Phenotype

18.

Phenotype Prediction Under Epistasis.

Vojgani, Elaheh; Pook, Torsten; Simianer, Henner.

Methods Mol Biol ; 2212: 105-120, 2021.

Article in English | MEDLINE | ID: mdl-33733353

ABSTRACT

Reliable methods of phenotype prediction from genomic data play an increasingly important role in many areas of plant and animal breeding. Thus, developing methods that enhance prediction accuracy is of major interest. Here, we provide three methods for this purpose: (1) Genomic Best Linear Unbiased Prediction (GBLUP) as a model just accounting for additive SNP effects; (2) Epistatic Random Regression BLUP (ERRBLUP) as a full epistatic model which incorporates all pairwise SNP interactions, and (3) selective Epistatic Random Regression BLUP (sERRBLUP) as an epistatic model which incorporates a subset of pairwise SNP interactions selected based on their absolute effect sizes or the effect variances, which are computed based on solutions from the ERRBLUP model. We compared the predictive ability obtained from GBLUP, ERRBLUP, and sERRBLUP with genotypes from a publicly available wheat dataset and respective simulated phenotypes. Results showed that sERRBLUP provides a substantial increase in prediction accuracy compared to the other methods when the optimal proportion of SNP interactions is kept in the model, especially when an optimal proportion of SNP interactions is selected based on the SNP interaction effect sizes. All methods described here are implemented in the R-package EpiGP, which is able to process large-scale genomic data in a computationally efficient way.

Subjects

Genetic Epistasis , Genetic Models , Statistical Models , Phenotype , Heritable Quantitative Trait , Triticum/genetics , Datasets as Topic , Genetic Association Studies , Genotype , Heterozygote , Plant Breeding/methods , Plant Tumors/genetics , Plant Tumors/microbiology , Single Nucleotide Polymorphism , Quantitative Trait Loci , Triticum/anatomy & histology , Triticum/metabolism
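
The selection step that distinguishes sERRBLUP from ERRBLUP in this abstract (keeping only a proportion of pairwise SNP interactions, ranked by absolute effect size) might be sketched as follows. This is a minimal illustration, not the EpiGP implementation: the function name `select_interactions`, the dictionary layout, and the effect estimates are all hypothetical, and the estimates would in practice come from a fitted ERRBLUP model.

```python
import math

def select_interactions(effects, proportion):
    """Return the SNP pairs with the largest absolute estimated effects.

    effects: dict mapping (snp_i, snp_j) -> estimated interaction effect
    proportion: fraction of interactions to keep, 0 < proportion <= 1
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    n_keep = max(1, math.ceil(proportion * len(effects)))
    # Rank pairs from largest to smallest absolute effect.
    ranked = sorted(effects, key=lambda pair: abs(effects[pair]), reverse=True)
    return ranked[:n_keep]

# Invented effect estimates, for illustration only.
effects = {
    ("snp1", "snp2"): 0.02,
    ("snp1", "snp3"): -0.45,
    ("snp2", "snp3"): 0.10,
    ("snp2", "snp4"): -0.01,
}
top = select_interactions(effects, proportion=0.5)
print(top)
```

The abstract notes that the gain depends on choosing a suitable proportion, so in practice `proportion` would be treated as a tuning parameter rather than fixed in advance.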

19.

A unifying concept of animal breeding programmes.

Simianer, Henner; Büttgen, Lisa; Ganesan, Amudha; Ha, Ngoc Thuy; Pook, Torsten.

J Anim Breed Genet ; 138(2): 137-150, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33486850

ABSTRACT

Modern animal breeding programmes are constantly evolving with advances in breeding theory, biotechnology and genetics. Surprisingly, there seems to be no generally accepted succinct definition of what exactly a breeding programme is, neither is there a unified language to describe breeding programmes in a comprehensive, unambiguous and reproducible way. In this work, we try to fill this gap by suggesting a general definition of breeding programmes that also pertains to cases where genetic progress is not achieved through selection, but, for example, through transgenic technologies, or the aim is not to generate genetic progress, but, for example, to maintain genetic diversity. The key idea of the underlying concept is to represent a breeding programme in modular form as a directed graph that is composed of nodes and edges, where nodes represent cohorts of breeding units, usually individuals, and edges represent breeding activities, like "selection" or "reproduction." We claim that, by defining a comprehensive set of nodes and edges, it is possible to represent any breeding programme of arbitrary complexity by such a graph, which thus comprises a full description of the breeding programme. This concept is implemented in a web-based tool (MoBPSweb, available at www.mobps.de) and has a link to the R-package MoBPS (Modular Breeding Program Simulator) to simulate the described breeding programmes. The approach is illustrated by showcasing three different breeding programmes of increasing complexity. The concept allows a formal description of breeding programmes, which is requested, for example, in legal regulations of the European Union, but so far cannot be provided in a standardized format. In the discussion, we point out potential limitations of the concept and argue that the general approach can be easily extended to account for novel breeding technologies, to breeding of crops or experimental species, but also to modelling diversity dynamics in natural populations.

Subjects

Plant Breeding , Reproduction , Animals , Biotechnology , Agricultural Crops
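
The directed-graph representation described in this abstract, with cohorts as nodes and breeding activities as edges, can be sketched in a few lines. The class below is a hypothetical illustration only: it does not reproduce the MoBPSweb data format or the MoBPS API, and the cohort names are invented.

```python
from dataclasses import dataclass, field

@dataclass
class BreedingProgramme:
    """Directed graph: nodes are cohorts, edges are breeding activities."""
    cohorts: set = field(default_factory=set)
    activities: list = field(default_factory=list)  # (source, target, activity)

    def add_activity(self, source, target, activity):
        """Add an edge and register both of its end cohorts as nodes."""
        self.cohorts.update((source, target))
        self.activities.append((source, target, activity))

    def descendants(self, cohort):
        """All cohorts reachable from `cohort` by following edges."""
        seen, stack = set(), [cohort]
        while stack:
            current = stack.pop()
            for source, target, _ in self.activities:
                if source == current and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

# Invented two-step programme: select parents, then reproduce.
programme = BreedingProgramme()
programme.add_activity("founders", "selected_sires", "selection")
programme.add_activity("founders", "selected_dams", "selection")
programme.add_activity("selected_sires", "offspring", "reproduction")
programme.add_activity("selected_dams", "offspring", "reproduction")

print(sorted(programme.descendants("founders")))
```

Storing activities as labelled edges keeps the description modular: a more complex programme is expressed by adding cohorts and edges, not by changing the structure.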

20.

ANOVA-HD: Analysis of variance when both input and output layers are high-dimensional.

de Los Campos, Gustavo; Pook, Torsten; Gonzalez-Reymundez, Agustin; Simianer, Henner; Mias, George; Vazquez, Ana I.

PLoS One ; 15(12): e0243251, 2020.

Article in English | MEDLINE | ID: mdl-33315963

ABSTRACT

Modern genomic data sets often involve multiple data-layers (e.g., DNA-sequence, gene expression), each of which itself can be high-dimensional. The biological processes underlying these data-layers can lead to intricate multivariate association patterns. We propose and evaluate two methods to determine the proportion of variance of an output data set that can be explained by an input data set when both data panels are high dimensional. Our approach uses random-effects models to estimate the proportion of variance of vectors in the linear span of the output set that can be explained by regression on the input set. We consider a method based on an orthogonal basis (Eigen-ANOVA) and one that uses random vectors (Monte Carlo ANOVA, MC-ANOVA) in the linear span of the output set. Using simulations, we show that the MC-ANOVA method gave nearly unbiased estimates. Estimates produced by Eigen-ANOVA were also nearly unbiased, except when the shared variance was very high (e.g., >0.9). We demonstrate the potential insight that can be obtained from the use of MC-ANOVA and Eigen-ANOVA by applying these two methods to the study of multi-locus linkage disequilibrium in chicken (Gallus gallus) genomes and to the assessment of inter-dependencies between gene expression, methylation, and copy-number-variants in data from breast cancer tumors from humans (Homo sapiens). Our analyses reveal that in chicken breeding populations ~50,000 evenly-spaced SNPs are enough to fully capture the span of whole-genome-sequencing genomes. In the study of multi-omic breast cancer data, we found that the span of copy-number-variants can be fully explained using either methylation or gene expression data and that roughly 74% of the variance in gene expression can be predicted from methylation data.

Subjects

Genomics/methods , Analysis of Variance , Animals , Breast Neoplasms/genetics , Chickens/genetics , DNA Copy Number Variations , DNA Methylation , Female , Neoplastic Gene Expression Regulation , Humans , Linkage Disequilibrium , Monte Carlo Method , Single Nucleotide Polymorphism , Whole Genome Sequencing
