
1.

Estimating genomic relationships of metafounders across and within breeds using maximum likelihood, pseudo-expectation-maximization maximum likelihood and increase of relationships.

Legarra, Andres; Bermann, Matias; Mei, Quanshun; Christensen, Ole F.

Genet Sel Evol ; 56(1): 35, 2024 May 02.

Article En | MEDLINE | ID: mdl-38698347

BACKGROUND: The theory of "metafounders" proposes a unified framework for relationships across base populations within breeds (e.g. unknown parent groups) and across breeds (crosses), together with a sensible compatibility with genomic relationships. Considering metafounders might be advantageous in pedigree best linear unbiased prediction (BLUP) or single-step genomic BLUP. Existing methods to estimate the relationships across metafounders, Γ, are not well adapted to highly unbalanced data, to genotyped individuals far from the base populations, or to many unknown parent groups (within breed per year of birth). METHODS: We derive likelihood methods to estimate Γ. For a single metafounder, summary statistics of pedigree and genomic relationships allow a cubic equation to be derived, whose real root is the maximum likelihood (ML) estimate of Γ. This equation is tested with Lacaune sheep data. For several metafounders, we split the first derivative of the complete likelihood into a term related to Γ and a second term related to Mendelian sampling variances. Approximating the first derivative by its first term results in a pseudo-EM algorithm that iteratively updates the estimate of Γ with the corresponding block of the H-matrix. The method extends to complex situations with groups defined by year of birth by modelling the increase of Γ using estimates of the rate of increase of inbreeding (ΔF), resulting in an expanded Γ and a pseudo-EM+ΔF algorithm. We compare these methods with the generalized least squares (GLS) method using simulated data: complex crosses of two breeds in equal or unequal proportions, and two breeds with 10 groups per year of birth within breed. We simulate genotyping in all generations or only in the last ones. RESULTS: For a single metafounder, the ML estimate for the Lacaune data corresponded to the maximum of the likelihood. For simulated data, when genotypes were spread across all generations, both the GLS and pseudo-EM(+ΔF) methods were accurate.
With genotypes only available in the most recent generations, the GLS method was biased, whereas the pseudo-EM(+ΔF) approach yielded more accurate and unbiased estimates. CONCLUSIONS: We derived ML, pseudo-EM and pseudo-EM+ΔF methods to estimate Γ in many realistic settings. Estimates are accurate in real and simulated data and have a low computational cost.

Breeding , Models, Genetic , Pedigree , Animals , Likelihood Functions , Breeding/methods , Algorithms , Sheep/genetics , Genomics/methods , Computer Simulation , Male , Female , Genotype

2.

Redefining and interpreting genomic relationships of metafounders.

Legarra, Andres; Bermann, Matias; Mei, Quanshun; Christensen, Ole F.

Genet Sel Evol ; 56(1): 34, 2024 May 02.

Article En | MEDLINE | ID: mdl-38698373

Metafounders are a useful concept to characterize relationships within and across populations, and they aid genetic evaluations by helping to model the means and variances of unknown base population animals. Current definitions of metafounder relationships are sensitive to the choice of reference alleles and have not been compared with their counterparts in population genetics, namely heterozygosities, FST coefficients, and genetic distances. We redefine the relationships across populations with an arbitrary base consisting of a maximum-heterozygosity population in Hardy-Weinberg equilibrium. The relationship between or within populations is then a cross-product of the form $\Gamma_{b,b'} = \frac{2}{n}(2\mathbf{p}_b - \mathbf{1})'(2\mathbf{p}_{b'} - \mathbf{1})$, where $\mathbf{p}_b$ and $\mathbf{p}_{b'}$ are vectors of allele frequencies at $n$ markers in populations $b$ and $b'$. This is simply the genomic relationship of two pseudo-individuals whose genotypes are equal to twice the allele frequencies. We also show that this coding is invariant to the choice of reference alleles. In addition, standard population genetics metrics (inbreeding coefficients of various forms, FST differentiation coefficients, segregation variance, and Nei's genetic distance) can be obtained from elements of the matrix Γ.
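As an illustration of the cross-product above, here is a minimal Python sketch, assuming NumPy and a matrix `P` (populations × markers) of frequencies of an arbitrarily chosen reference allele. The helper `metafounder_gamma` is a hypothetical name, not code from the paper. Each population is treated as a pseudo-individual with genotype 2p, centred on the base population with p = 0.5 at every marker.

```python
import numpy as np


def metafounder_gamma(P):
    """Hypothetical helper: Gamma_{b,b'} = (2/n) (2p_b - 1)'(2p_b' - 1).

    P: array (n_populations, n_markers) of reference-allele frequencies.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[1]
    Z = 2.0 * P - 1.0          # centred pseudo-genotypes, each entry in [-1, 1]
    return (2.0 / n) * Z @ Z.T  # symmetric matrix of relationships between populations
```

If the reference allele is flipped at a marker, the centred entries of every population change sign at that marker, so their products and therefore Γ are unchanged. For example, a population fixed for the reference allele at every marker gets $\Gamma_{b,b} = 2$, and two populations fixed for opposite alleles get $\Gamma_{b,b'} = -2$.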

Gene Frequency , Genetics, Population , Models, Genetic , Animals , Genetics, Population/methods , Heterozygote , Alleles , Genomics/methods , Genotype , Genome

3.

Confidence intervals for validation statistics with data truncation in genomic prediction.

Bermann, Matias; Legarra, Andres; Munera, Alejandra Alvarez; Misztal, Ignacy; Lourenco, Daniela.

Genet Sel Evol ; 56(1): 18, 2024 Mar 08.

Article En | MEDLINE | ID: mdl-38459504

BACKGROUND: Validation by data truncation is a common practice in genetic evaluations because of the interest in predicting the genetic merit of a set of young selection candidates. Two of the most widely used validation methods in genetic evaluations use a single data partition: predictivity or predictive ability (the correlation between pre-adjusted phenotypes and estimated breeding values (EBV) divided by the square root of the heritability) and the linear regression (LR) method (comparison of "early" and "late" EBV). Both methods rely on the whole dataset and on a partial dataset obtained by removing the information related to a set of validation individuals. EBV obtained with the partial dataset are compared against adjusted phenotypes for predictivity, or against EBV obtained with the whole dataset in the LR method. Confidence intervals for predictivity and the LR method can be obtained by replicating the validation for different samples (or folds) or by bootstrapping. Analytical confidence intervals would be beneficial to avoid running several validations and to test the quality of the bootstrap intervals. However, analytical confidence intervals are unavailable for predictivity and the LR method. RESULTS: We derived standard errors and Wald confidence intervals for predictivity and for the statistics included in the LR method (bias, dispersion, ratio of accuracies, and reliability). The confidence intervals for the bias, dispersion, and reliability depend on the relationships and on the prediction error variances and covariances across the individuals in the validation set. We developed approximations for large datasets that only need the reliabilities of the individuals in the validation set. The confidence intervals for the ratio of accuracies and predictivity were obtained through the Fisher transformation.
We show the adequacy of both the analytical and approximated analytical confidence intervals and compare them versus bootstrap confidence intervals using two simulated examples. The analytical confidence intervals were closer to the simulated ones for both examples. Bootstrap confidence intervals tend to be narrower than the simulated ones. The approximated analytical confidence intervals were similar to those obtained by bootstrapping. CONCLUSIONS: Estimating the sampling variation of predictivity and the statistics in the LR method without replication or bootstrap is possible for any dataset with the formulas presented in this study.
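The point estimates around which these intervals are built can be sketched in a few lines of Python. The sketch assumes the usual LR-method definitions: bias is the difference between the means of partial and whole EBV, dispersion is the slope of the regression of whole on partial EBV, and the ratio of accuracies is their correlation. Reliability is omitted because it also needs the average inbreeding and the additive variance. The interval shown is the textbook Fisher-z interval with standard error $1/\sqrt{n-3}$, which ignores the relationships and prediction error covariances that the study's analytical intervals account for, so treat it only as an illustration of the transformation. Function names are hypothetical.

```python
import math
from statistics import mean


def lr_statistics(ebv_partial, ebv_whole):
    """Hypothetical helper: LR-method point estimates for a validation set."""
    n = len(ebv_partial)
    mp, mw = mean(ebv_partial), mean(ebv_whole)
    cov = sum((p - mp) * (w - mw) for p, w in zip(ebv_partial, ebv_whole)) / (n - 1)
    var_p = sum((p - mp) ** 2 for p in ebv_partial) / (n - 1)
    var_w = sum((w - mw) ** 2 for w in ebv_whole) / (n - 1)
    return {
        "bias": mp - mw,                                     # mean(partial) - mean(whole)
        "dispersion": cov / var_p,                           # slope of whole on partial
        "ratio_accuracies": cov / math.sqrt(var_p * var_w),  # corr(partial, whole)
    }


def fisher_z_interval(r, n, z_crit=1.96):
    """Textbook Fisher-z interval for a correlation r from n pairs."""
    z = math.atanh(r)                    # variance-stabilizing transform
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

Working on the z scale keeps the back-transformed limits inside (−1, 1), which is the reason the transformation is used for correlation-type statistics.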

Genomics , Models, Genetic , Humans , Genotype , Reproducibility of Results , Confidence Intervals , Pedigree , Genomics/methods , Phenotype

4.

Derivation of indirect predictions using genomic recursions across generations in a broiler population.

Hidalgo, Jorge; Lourenco, Daniela; Tsuruta, Shogo; Bermann, Matias; Breen, Vivian; Misztal, Ignacy.

J Anim Sci ; 101, 2023 Jan 03.

Article En | MEDLINE | ID: mdl-37837636

Genomic estimated breeding values (GEBV) of animals without phenotypes can be indirectly predicted using recursions on the GEBV of a subset of animals. To maximize the predictive ability of indirect predictions (IP), the subset must represent the independent chromosome segments segregating in the population. We aimed to 1) determine the number of animals needed in recursions to maximize predictive ability, 2) evaluate the equivalence between IP and GEBV, and 3) investigate trends in the predictive ability of IP derived from recent vs. distant generations or from accumulating phenotypes from recent to past generations. Data comprised a pedigree of 825K birds hatched over 12 overlapping generations and phenotypes for body weight (BW; 820K), residual feed intake (RF; 200K), weight gain during a trial period (WG; 200K), and breast meat percent (BP; 43K). A total of 154K birds (last six generations) had genotypes. The number of animals that maximizes predictive ability was assessed based on the number of largest eigenvalues explaining 99% of the variation in the genomic relationship matrix (1Me = 7,131), twice this number (2Me), or a fraction of it (i.e., 0.75, 0.50, or 0.25Me). Equivalence between IP and GEBV was measured by correlating these two sets of predictions. GEBV were obtained as if generation 12 (validation animals) was part of the evaluation. IP were derived from GEBV of animals from generations 8 to 11 or from generation 11, 10, 9, or 8. IP predictive ability was defined as the correlation between IP and adjusted phenotypes. IP predictive ability increased from 0.25Me to 1Me (11%, on average); the change from 1Me to 2Me was negligible (0.6%). The IP-GEBV correlation was the same when IP were derived from a subset of 1Me animals chosen randomly across generations (8 to 11) or from generation 11 (0.98 for BW; 0.99 for RF, WG, and BP). A marginal decline in the correlation was observed when IP were based on GEBV of animals from generation 8 (0.95 for BW; 0.98 for RF, WG, and BP).
Predictive ability showed a similar trend: from generation 11 to 8, it changed from 0.32 to 0.31 for BW and from 0.39 to 0.38 for BP, and it was constant at 0.33 for RF and 0.22 for WG. Predictive ability increased slightly to moderately when up to four generations of phenotypes were accumulated. Using 1Me animals provides accurate IP, equivalent to GEBV. A minimal decay in predictive ability is observed when IP are derived from GEBV of animals from four generations back, possibly because of strong selection or because the model is not completely additive.
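The 1Me criterion can be sketched as follows, assuming a VanRaden-type genomic relationship matrix computed from a 0/1/2 genotype matrix with NumPy. Both helper names are hypothetical, and the study's own software may compute G and its eigenvalues differently (for example, with other centring or scaling choices).

```python
import numpy as np


def vanraden_g(M):
    """Hypothetical helper: VanRaden-type G from an (animals x SNP) 0/1/2 matrix."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                  # observed allele frequencies
    Z = M - 2.0 * p                           # centred genotypes
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def n_eigen_for_fraction(G, fraction=0.99):
    """Smallest number of largest eigenvalues of symmetric G reaching `fraction` of their sum."""
    eigvals = np.linalg.eigvalsh(G)[::-1]     # eigvalsh returns ascending order; reverse it
    eigvals = np.clip(eigvals, 0.0, None)     # drop tiny negative round-off values
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, fraction) + 1)
```

A usage example on simulated genotypes: `rng = np.random.default_rng(1)`, then `n_eigen_for_fraction(vanraden_g(rng.integers(0, 3, size=(50, 200))))`. Because centring removes one dimension, the result for 50 animals is at most 49.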

Genomic estimated breeding values (GEBV) of genotyped animals without phenotypes can be obtained by indirect predictions (IP) using recursions on the GEBV of a subset of animals. Our objectives were to 1) evaluate the number of animals needed in recursions to maximize predictive ability, 2) assess the equivalence between IP and GEBV, and 3) investigate trends in the predictive ability of IP derived from recent vs. distant generations or from accumulating phenotypes from recent to past generations. The number of animals in the recursions that provided high predictive ability (7,131) was equal to the number of largest eigenvalues explaining 99% of the variation in the genomic relationship matrix. IP and GEBV were equivalent (correlation ≥ 0.98). IP predictive ability was similar when recursions were based on animals from recent or distant generations; it decayed marginally with animals from four generations back. The decline in predictive ability can be explained by strong selection or by the model not being fully additive. IP predictive ability increased slightly to moderately when up to four generations of phenotypes were accumulated. If the GEBV of animals in the subset chosen for recursions are estimated using sufficient data, these animals can be from up to four generations back without significant loss in predictive ability.
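A minimal NumPy sketch of such a recursion follows. It assumes the APY-style form $\hat{\mathbf{u}}_n = \mathbf{G}_{nc}\mathbf{G}_{cc}^{-1}\hat{\mathbf{u}}_c$, in which the GEBV of the subset (c) are projected onto the remaining animals (n) through the genomic relationship matrix. The function name and interface are hypothetical and are not taken from the study.

```python
import numpy as np


def indirect_predictions(G, subset, gebv_subset):
    """Hypothetical helper: IP for animals outside `subset` as G_nc G_cc^{-1} u_c."""
    G = np.asarray(G, dtype=float)
    subset = np.asarray(subset)
    others = np.setdiff1d(np.arange(G.shape[0]), subset)
    G_cc = G[np.ix_(subset, subset)]          # relationships within the subset
    G_nc = G[np.ix_(others, subset)]          # relationships of other animals to the subset
    weights = np.linalg.solve(G_cc, np.asarray(gebv_subset, dtype=float))
    return others, G_nc @ weights
```

Solving the linear system instead of forming $\mathbf{G}_{cc}^{-1}$ explicitly is the usual numerically safer choice. The subset needs to be large enough (for example, about 1Me animals) for $\mathbf{G}_{cc}$ to capture most of the variation in G.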

Chickens , Models, Genetic , Animals , Chickens/genetics , Genome , Genomics , Genotype , Phenotype , Pedigree

5.

Boundaries for genotype, phenotype, and pedigree truncation in genomic evaluations in pigs.

Bussiman, Fernando; Chen, Ching-Yi; Holl, Justin; Bermann, Matias; Legarra, Andres; Misztal, Ignacy; Lourenco, Daniela.

J Anim Sci ; 101, 2023 Jan 03.

Article En | MEDLINE | ID: mdl-37584978

Historical data collection for genetic evaluation purposes is a common practice in animal populations; however, the larger the dataset, the more computing power is needed to perform the analyses. Also, fitting the same model to historical and recent data may be inappropriate. Data truncation can reduce the number of equations to solve, consequently decreasing computing costs; however, the large volume of genotypes is responsible for most of the increase in computations. This study aimed to assess the impact of removing genotypes along with phenotypes and pedigree on the computing performance, reliability, and inflation of genomic estimated breeding values (GEBV) from single-step genomic best linear unbiased prediction for selection candidates. Data from two pig lines, a terminal sire line (L1) and a maternal line (L2), were analyzed in this study. Four analyses were implemented: growth and "weaning to finish" mortality in L1, and pre-weaning and reproductive traits in L2. Four genotype removal scenarios were proposed: removing genotyped animals without phenotypes and progeny (noInfo), removing genotyped animals based on birth year (Age), the combination of the noInfo and Age scenarios (noInfo + Age), and no genotype removal (AllGen). In all scenarios, phenotypes were removed based on birth year, and three pedigree depths were tested: tracing two or three generations back, or using the entire pedigree. The full dataset contained 1,452,257 phenotypes for growth traits, 324,397 for weaning to finish mortality, 517,446 for pre-weaning traits, and 7,853,629 for reproductive traits in purebred and crossbred pigs. Pedigree files for lines L1 and L2 comprised 3,601,369 and 11,240,865 animals, of which 168,734 and 170,121 were genotyped, respectively. In each truncation scenario, the linear regression method was used to assess the reliability and dispersion of GEBV for genotyped parents (born after 2019).
The number of years of data that could be removed without harming reliability depended on the number of records, the type of analysis (multitrait vs. single trait), the heritability of the trait, and the data structure. All scenarios had similar reliabilities, except for noInfo, which performed better in the growth analysis. Based on the data used in this study, using the last ten years of phenotypes, tracing the pedigree three generations back, and removing genotyped animals that contribute neither own nor progeny phenotypes increases computing efficiency with no change in the ability to predict breeding values.
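Tracing the pedigree a fixed number of generations back from the animals that are kept can be sketched as a breadth-first walk. The sketch assumes the pedigree is a dictionary mapping each animal to its (sire, dam), with `None` for unknown parents; this data layout and the function name are assumptions for illustration, not the study's pipeline.

```python
def trace_back(pedigree, animals, generations):
    """Hypothetical helper: return `animals` plus their ancestors up to `generations` back."""
    keep = set(animals)
    frontier = set(animals)
    for _ in range(generations):
        parents = set()
        for animal in frontier:
            for parent in pedigree.get(animal, (None, None)):
                # Only expand ancestors not seen before; anything already kept
                # was reached at the same or a shallower depth.
                if parent is not None and parent not in keep:
                    parents.add(parent)
        keep |= parents
        frontier = parents
    return keep
```

With `generations=3`, this corresponds to the three-generation pedigree depth discussed above. Animals outside the returned set would be dropped from the pedigree file, and their descendants in the kept set would then have unknown parents.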

Recording data over many years is common in animal breeding and genetics. However, the larger the dataset, the higher the computing cost of the analysis, especially with genomic information. This study aimed to investigate the impact of removing data, namely genotypes, phenotypes, and pedigree, on the computing performance and prediction ability of genomic breeding values. We tested four scenarios for removing genotyped individuals in pig populations. For each scenario, phenotypes were removed according to birth year, and the pedigree was either kept complete or traced back two or three generations. Reliabilities for young, genotyped animals did not differ after removing genotypes of older or less important animals. However, using only two generations of data slightly reduced the reliability for young, genotyped animals. The dispersion did not change across the studied scenarios, and its worst value was observed when using only one generation in the pedigree. Keeping the last ten years of phenotypes and a pedigree depth of three generations, and removing genotyped animals that contribute neither own nor progeny phenotypes, reduces computing cost with no change in the ability to predict breeding values.

Genomics , Models, Genetic , Animals , Swine/genetics , Pedigree , Reproducibility of Results , Phenotype , Genomics/methods

6.

Efficient ways to combine data from broiler and layer chickens to account for sequential genomic selection.

Hidalgo, Jorge; Lourenco, Daniela; Tsuruta, Shogo; Bermann, Matias; Breen, Vivian; Herring, William; Misztal, Ignacy.

J Anim Sci ; 101, 2023 Jan 03.

Article En | MEDLINE | ID: mdl-37249185

In broiler breeding, superior individuals for growth become parents and are later evaluated for reproduction in an independent evaluation; however, ignoring broiler data can produce inaccurate and biased predictions. This research aimed to determine the most accurate, unbiased, and time-efficient approach for jointly evaluating reproductive and broiler traits. The data comprised a pedigree with 577K birds, 146K genotypes, and phenotypes for three reproductive traits (egg production [EP], fertility [FE], hatch of fertile eggs [HF]; 9K each) and four broiler traits (body weight [BW], breast meat percent [BP], fat percent [FP], residual feed intake [RF]; up to 467K). Broiler data were added sequentially to assess their impact on the quality of predictions for reproductive traits. The baseline scenario (RE) included pedigrees, genotypes, and phenotypes for reproductive traits of selected animals; in RE2, we added their broiler phenotypes; in RE_BR, the broiler phenotypes of nonselected animals; and in RE_BR_GE, their genotypes. We computed the accuracy, bias, and dispersion of predictions for hens from the last two breeding cycles and their sires. We tested three core definitions for the algorithm for proven and young to find the most time-efficient approach: two random cores with 7K and 12K animals and one with 19K animals containing parents and young animals. From RE to RE_BR_GE, changes in accuracy were null or minimal for EP (0.51 in hens, 0.59 in roosters) and HF (0.47 in hens, 0.49 in roosters); for FE in hens (roosters), accuracy changed from 0.40 (0.49) to 0.47 (0.53). In hens (roosters), bias (in additive SD units) decreased from 0.69 (0.70) to 0.04 (0.05) for EP, from 1.48 (1.44) to 0.11 (0.03) for FE, and from 1.06 (0.96) to 0.09 (0.02) for HF. Dispersion remained stable in hens (roosters) at ~0.93 (~1.03) for EP, and it improved from 0.57 (0.72) to 0.87 (1.0) for FE and from 0.80 (0.79) to 0.88 (0.87) for HF. Ignoring broiler data deteriorated the quality of the predictions.
The impact was substantial for the low-heritability trait (FE; 0.02): bias (up to 1.5) and dispersion (as low as 0.57) were farther from their ideal values, and accuracy losses were up to 17.5%. Accuracy was maintained in traits with moderate heritability (~0.3; EP and HF), and bias and dispersion issues were less pronounced. Adding information from the broiler phase maximized accuracy and yielded unbiased predictions. The most time-efficient approach is a random core with 7K animals in the algorithm for proven and young.

In breeding programs with sequential selection, the estimation of breeding values becomes biased and inaccurate if information from past selection is ignored. We investigated the impact of incorporating broiler data (traits used in past selection) into the evaluation of broiler reproductive traits. Including all the information increased computing demands; therefore, we tested three core definitions for the algorithm for proven and young to determine the most accurate, unbiased, and time-efficient approach for jointly evaluating broiler and reproductive traits. When broiler data were ignored, the estimated breeding values for reproductive traits were biased (up to ~1.5 additive standard deviations). For low-heritability traits, accuracy was reduced by up to 17.5%, and breeding values were inflated (dispersion ~0.6). In contrast, incorporating broiler data eliminated bias and inflation and maximized accuracy. A random core definition for the algorithm for proven and young, with a number of animals equal to the number of largest eigenvalues explaining 99% of the variation in the genomic relationship matrix, is the most time-efficient and keeps predictions accurate and unbiased in the joint evaluation of broiler and reproductive traits.

Chickens , Ovum , Animals , Female , Male , Chickens/genetics , Genome , Genomics , Genotype , Phenotype , Pedigree , Models, Genetic

7.

Reliabilities of estimated breeding values in models with metafounders.

Bermann, Matias; Aguilar, Ignacio; Lourenco, Daniela; Misztal, Ignacy; Legarra, Andres.

Genet Sel Evol ; 55(1): 6, 2023 Jan 23.

Article En | MEDLINE | ID: mdl-36690938

BACKGROUND: Reliabilities of best linear unbiased predictions (BLUP) of breeding values are defined as the squared correlation between true and estimated breeding values and are helpful in assessing risk and genetic gain. Reliabilities can be computed from prediction error variances for models with a single base population, but they are undefined for models that include several base populations and when unknown parent groups are modeled as fixed effects. In such cases, the use of metafounders in principle enables reliabilities to be derived. METHODS: We propose to compute the reliability of the contrast between an individual's estimated breeding value and that of a metafounder, based on the prediction error variances of the individual and the metafounder, their prediction error covariance, and their genetic relationship. Computing the required terms demands little extra work once the sparse inverse of the mixed model equations is obtained, or the terms can be approximated. This also allows the reliabilities of the metafounders to be obtained. We studied the reliabilities for both BLUP and single-step genomic BLUP (ssGBLUP), using several definitions of reliability, in a large dataset with 1,961,687 dairy sheep and rams, most of which had phenotypes and among which 27,000 rams were genotyped with a 50K single nucleotide polymorphism (SNP) chip. There were 23 metafounders with progeny sizes between 2,000 and 100,000 individuals. RESULTS: In models with metafounders, directly using the prediction error variance instead of the contrast with a metafounder leads to artificially low reliabilities because they refer to a population with maximum heterozygosity. When only one metafounder is fitted in the model, the reliability of the contrast is shown to be equivalent to the reliability of the individual in a model without metafounders.
When there are several metafounders in the model, using a contrast with the oldest metafounder yields reliabilities that are on a meaningful scale and very close to reliabilities obtained from models without metafounders. The reliabilities using contrasts with ssGBLUP also resulted in meaningful values. CONCLUSIONS: This work provides a general method to obtain reliabilities for both BLUP and ssGBLUP when several base populations are included through metafounders.
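A minimal sketch of a contrast reliability, assuming it is computed as one minus the ratio of the prediction error variance of the contrast $u_i - u_m$ to its genetic variance, with $\mathrm{PEV}(u_i - u_m) = \mathrm{PEV}_i + \mathrm{PEV}_m - 2\,\mathrm{PEC}_{im}$ and $\mathrm{Var}(u_i - u_m) = \sigma_a^2(a_{ii} + a_{mm} - 2a_{im})$. Here the $a$ terms are taken from the relationship matrix that includes the metafounders. The paper considers several definitions of reliability, so this only shows the general shape of the calculation, and the argument names are hypothetical.

```python
def contrast_reliability(pev_i, pev_m, pec_im, a_ii, a_mm, a_im, sigma2_a):
    """Hypothetical helper: reliability of the contrast u_i - u_m (individual minus metafounder)."""
    # Prediction error variance of the difference of two predictions
    pev_contrast = pev_i + pev_m - 2.0 * pec_im
    # Genetic variance of the same difference from the relationship coefficients
    var_contrast = sigma2_a * (a_ii + a_mm - 2.0 * a_im)
    return 1.0 - pev_contrast / var_contrast
```

The inputs are the diagonal and off-diagonal elements of the sparse inverse of the mixed model equations (scaled to prediction error (co)variances), which is why the abstract notes that little extra work is needed once that inverse is available.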

Genome , Models, Genetic , Animals , Male , Sheep , Reproducibility of Results , Genotype , Genomics/methods , Phenotype , Pedigree

8.

Implication of the order of blending and tuning when computing the genomic relationship matrix in single-step GBLUP.

McWhorter, Taylor M; Bermann, Matias; Garcia, Andre L S; Legarra, Andrés; Aguilar, Ignacio; Misztal, Ignacy; Lourenco, Daniela.

J Anim Breed Genet ; 140(1): 60-78, 2023 Jan.

Article En | MEDLINE | ID: mdl-35946919

Single-step genomic BLUP (ssGBLUP) relies on combining the genomic relationship matrix ($\mathbf{G}$) with the pedigree relationship matrices for all animals ($\mathbf{A}$) and for genotyped animals ($\mathbf{A}_{22}$). The procedure ensures that $\mathbf{G}$ and $\mathbf{A}_{22}$ are compatible, so that both matrices refer to the same genetic base ('tuning'). Then $\mathbf{G}$ is combined with a proportion of $\mathbf{A}_{22}$ ('blending') to avoid singularity problems and to account for the polygenic component not captured by markers. This computational procedure has been implemented in the reverse order (blending before tuning), following the order of the research developments. However, blending before tuning may result in less optimal tuning because the blended matrix already contains a proportion of $\mathbf{A}_{22}$. In this study, 'tuning before blending' was compared with 'blending before tuning' in terms of their impact on genomic estimated breeding values (GEBV), single nucleotide polymorphism (SNP) effects, and indirect predictions (IP) from ssGBLUP, using American Angus Association and Holstein Association USA, Inc. data. Two slightly different tuning methods were used: one that adjusts the mean diagonal and off-diagonal elements of $\mathbf{G}$ to be similar to those of $\mathbf{A}_{22}$, and another that adjusts based on the average difference between all elements of $\mathbf{G}$ and $\mathbf{A}_{22}$. Over 6 million Angus growth records and 5.9 million Holstein udder depth records were available. Genomic information was available for 51,478 Angus and 105,116 Holstein animals. Average realized relationship estimates among groups of animals were similar across scenarios. Scatterplots show that GEBV, SNP effects, and IP did not change noticeably for any animal in the evaluation, regardless of the order of computations, when using a blending parameter of 0.05.
Formulas were derived to determine the blending parameter that maximizes changes in the genomic relationship matrix and GEBV when changing the order of blending and tuning. Algebraically, the change is maximized when the blending parameter is equal to 0.5. Overall, tuning $\mathbf{G}$ before blending, regardless of the blending parameter used, had a negligible impact on genomic predictions and SNP effects in this study.

Genomics , Animals

9.

Impact of blending the genomic relationship matrix with different levels of pedigree relationships or the identity matrix on genetic evaluations.

Hollifield, Mary Kate; Bermann, Matias; Lourenco, Daniela; Misztal, Ignacy.

JDS Commun ; 3(5): 343-347, 2022 Sep.

Article En | MEDLINE | ID: mdl-36340904

Evaluations using single-step genomic BLUP require blending the genomic relationship matrix (G) with a positive definite matrix to ensure nonsingularity when solving the mixed model equations. Many organizations blend G with a proportion of the numerator relationship matrix for genotyped animals (A22) to improve stability and possibly add a residual polygenic effect. However, when nearly all the polygenic variance is explained by G, blending with A22 may cause inflation and add excess computing time; thus, blending with an identity matrix (I) multiplied by a small value may be a better solution. The objectives of this study were to evaluate changes in the reliability and inflation of genomic estimated breeding values, the convergence rate, and the elapsed wall-clock time when blending G with different levels of A22 or I, and to develop a more time-efficient blending method. A US Holstein cattle dataset was used, with 9.7 million animals in the pedigree, 569,404 genotyped animals, and 10.1 million stature phenotypes. Blending G by adding a small value to its diagonal elements performed comparably to blending with A22 and required fewer rounds to converge when solving the system of equations. Reliability and inflation of genomic estimated breeding values ranged from 0.63 to 0.68 and from 0.86 to 0.89, respectively, across all blending scenarios tested. The current blending default in the BLUPF90 software is to replace G with (1 − β)G + βA22, where β equals 0.05. In this study, β values of 0.30, 0.20, 0.05, 0.01, 0.005, and 0.001 were evaluated with A22 and I. Negligible differences in elapsed computing time were observed between blending types and levels. Subsequently, the blending algorithm used in the BLUPF90 family of programs was optimized, reducing the blending time from approximately 2 h to 5 min for A22 and to less than 1 s for I. The remaining time difference between blending with A22 or I is negligible and not computationally critical.
The results indicate that blending G with A22 has no clear advantage over blending with a small proportion of I.
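Why blending is needed, and why I is enough to fix singularity, can be seen in a small NumPy sketch. It assumes blending has the same $(1-\beta)\mathbf{G} + \beta\mathbf{X}$ form for both $\mathbf{X} = \mathbf{A}_{22}$ and $\mathbf{X} = \mathbf{I}$, and it uses a Cholesky factorization as the positive-definiteness check. The matrices are toy values, with two animals sharing identical genotypes so that G is singular. Function names are hypothetical.

```python
import numpy as np


def blend_with(G, other, beta=0.05):
    """Hypothetical helper: (1 - beta) * G + beta * other, with other = A22 or I."""
    return (1.0 - beta) * G + beta * other


def is_positive_definite(M):
    """Cholesky succeeds only for (numerically) positive definite matrices."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False


# Toy G: animals 0 and 1 have identical genotypes, so G is singular.
G = np.array([[1.0, 1.0, 0.2], [1.0, 1.0, 0.2], [0.2, 0.2, 0.9]])
A22 = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]])
G_with_I = blend_with(G, np.eye(3))
G_with_A22 = blend_with(G, A22)
```

Blending with I needs only the diagonal of G to be scaled and shifted, which is consistent with the very short blending time reported for I. Blending with A22 requires the pedigree relationships among all genotyped animals.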

10.

Accounting for population structure in genomic predictions of Eucalyptus globulus.

Callister, Andrew N; Bermann, Matias; Elms, Stephen; Bradshaw, Ben P; Lourenco, Daniela; Brawner, Jeremy T.

G3 (Bethesda) ; 12(9), 2022 Aug 25.

Article En | MEDLINE | ID: mdl-35920792

Genetic groups have been widely adopted in tree breeding to account for provenance effects within pedigree-derived relationship matrices. However, provenances or genetic groups have not yet been incorporated into single-step genomic BLUP ("HBLUP") analyses of tree populations. To quantify the impact of accounting for population structure in Eucalyptus globulus, we used HBLUP to compare breeding value predictions from models excluding base population effects and models including either fixed genetic groups or the marker-derived proxies, also known as metafounders. Full-sib families from 2 separate breeding populations were evaluated across 13 sites in the "Green Triangle" region of Australia. Gamma matrices (Γ) describing similarities among metafounders reflected the geographic distribution of populations and the origins of 2 land races were identified. Diagonal elements of Γ provided population diversity or allelic covariation estimates between 0.24 and 0.56. Genetic group solutions were strongly correlated with metafounder solutions across models and metafounder effects influenced the genetic solutions of base population parents. The accuracy, stability, dispersion, and bias of model solutions were compared using the linear regression method. Addition of genomic information increased accuracy from 0.41 to 0.47 and stability from 0.68 to 0.71, while increasing bias slightly. Dispersion was within 0.10 of the ideal value (1.0) for all models. Although inclusion of metafounders did not strongly affect accuracy or stability and had mixed effects on bias, we nevertheless recommend the incorporation of metafounders in prediction models to represent the hierarchical genetic population structure of recently domesticated populations.

Eucalyptus , Eucalyptus/genetics , Genome , Genomics/methods , Genotype , Humans , Models, Genetic , Phenotype , Plant Breeding

11.

On the equivalence between marker effect models and breeding value models and direct genomic values with the Algorithm for Proven and Young.

Bermann, Matias; Lourenco, Daniela; Forneris, Natalia S; Legarra, Andres; Misztal, Ignacy.

Genet Sel Evol ; 54(1): 52, 2022 Jul 16.

Article En | MEDLINE | ID: mdl-35842585

BACKGROUND: Single-step genomic predictions obtained from a breeding value model require calculating the inverse of the genomic relationship matrix. The Algorithm for Proven and Young (APY) creates a sparse representation of this inverse at a low computational cost. APY consists of selecting a group of core animals and expressing the breeding values of the remaining animals as a linear combination of those of the core animals plus an error term. The objectives of this study were to: (1) extend APY to marker effects models; (2) derive equations for marker effect estimates when APY is used for breeding value models; and (3) show the implication of selecting a specific group of core animals in terms of a marker effects model. RESULTS: We derived a family of marker effects models called APY-SNP-BLUP. It differs from the classic marker effects model in that the row space of the genotype matrix is reduced and an error term is fitted for non-core animals. We derived formulas for marker effect estimates that take this error term into account. The prediction error variance (PEV) of the marker effect estimates depends on the PEV for core animals but not directly on the PEV of the non-core animals. We extended APY-SNP-BLUP to include a residual polygenic effect and to accommodate non-genotyped animals. We show that selecting a specific group of core animals is equivalent to selecting a subspace of the row space of the genotype matrix. As the number of core animals increases, subspaces corresponding to different sets of core animals tend to overlap, showing that random selection of core animals is algebraically justified. CONCLUSIONS: The APY-(ss)GBLUP models can be expressed in terms of marker effects models. When the number of core animals is equal to the rank of the genotype matrix, APY-SNP-BLUP is identical to the classic marker effects model.
If the number of core animals is less than the rank of the genotype matrix, genotypes for non-core animals are imputed as a linear combination of the genotypes of the core animals. For estimating SNP effects, only relationships and estimated breeding values for core animals are needed.
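The APY idea described above, regressing each non-core animal on the core animals, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes uncentered 0/1/2 genotypes and a hypothetical scaling constant k in G = ZZ'/k, whereas real analyses center genotypes and blend G so that it is invertible. For each non-core animal it returns the regression coefficients on the core animals (one row of G_nc G_cc^-1), the conditional variance m that forms the diagonal of the APY error term, and the implied imputed genotype, which is a linear combination of core genotypes.

```python
def mat_inv(a):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("core block of G is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def apy_noncore(z_core, z_noncore, k=1.0):
    """Return (coefficients, m, imputed genotype) for each non-core animal.

    z_core, z_noncore: lists of genotype vectors; G = Z Z' / k (k is hypothetical).
    """
    n_core = len(z_core)
    g_cc_inv = mat_inv([[dot(a, b) / k for b in z_core] for a in z_core])
    results = []
    for z in z_noncore:
        g_nc = [dot(z, c) / k for c in z_core]  # relationships with core animals
        # coefficients p = g_nc * G_cc^-1: this animal regressed on the core animals
        p = [dot(g_nc, [g_cc_inv[i][j] for i in range(n_core)]) for j in range(n_core)]
        m = dot(z, z) / k - dot(p, g_nc)  # conditional variance (APY error term)
        # genotype implied by the regression: linear combination of core genotypes
        imputed = [dot(p, [c[s] for c in z_core]) for s in range(len(z))]
        results.append((p, m, imputed))
    return results
```

With core genotypes [1, 0, 1] and [0, 1, 1], a non-core animal with genotype [1, 1, 2] lies in the row space of the core genotypes: its coefficients are [1, 1], m is 0 and the imputed genotype is exact. A non-core animal with [1, 0, 0] gets coefficients [2/3, -1/3], m = 1/3 and the imputed genotype [2/3, -1/3, 1/3], which is only its projection onto the subspace spanned by the core genotypes.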

Genome , Models, Genetic , Algorithms , Animals , Genomics , Genotype , Pedigree , Phenotype

12.

Efficient approximation of reliabilities for single-step genomic best linear unbiased predictor models with the Algorithm for Proven and Young.

Bermann, Matias; Lourenco, Daniela; Misztal, Ignacy.

J Anim Sci ; 100(1)2022 Jan 01.

Article En | MEDLINE | ID: mdl-34877603

The objectives of this study were to develop an efficient algorithm for calculating prediction error variances (PEVs) for genomic best linear unbiased prediction (GBLUP) models using the Algorithm for Proven and Young (APY), extend it to single-step GBLUP (ssGBLUP), and apply this algorithm for approximating the theoretical reliabilities for single- and multiple-trait models in ssGBLUP. The PEV with APY was calculated by block sparse inversion, efficiently exploiting the sparse structure of the inverse of the genomic relationship matrix with APY. Single-step GBLUP reliabilities were approximated by combining reliabilities with and without genomic information in terms of effective record contributions. Multi-trait reliabilities relied on single-trait results adjusted using the genetic and residual covariance matrices among traits. Tests involved two datasets provided by the American Angus Association. A small dataset (Data1) was used for comparing the approximated reliabilities with the reliabilities obtained by the inversion of the left-hand side of the mixed model equations. A large dataset (Data2) was used for evaluating the computational performance of the algorithm. Analyses with both datasets used single-trait and three-trait models. The number of animals in the pedigree ranged from 167,951 in Data1 to 10,213,401 in Data2, with 50,000 and 20,000 genotyped animals for single-trait and multiple-trait analysis, respectively, in Data1 and 335,325 in Data2. Correlations between estimated and exact reliabilities obtained by inversion ranged from 0.97 to 0.99, whereas the intercept and slope of the regression of the exact on the approximated reliabilities ranged from 0.00 to 0.04 and from 0.93 to 1.05, respectively. For the three-trait model with the largest dataset (Data2), the elapsed time for the reliability estimation was 11 min. 
The computational complexity of the proposed algorithm increased linearly with the number of genotyped animals and with the number of traits in the model. This algorithm can efficiently approximate the theoretical reliability of genomic estimated breeding values in ssGBLUP with APY for large numbers of genotyped animals at a low cost.
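The conversions behind the effective-record-contribution (ERC) approach can be sketched as below. This is a hedged illustration: the PEV-to-reliability and ERC-to-reliability formulas are the standard ones for a single-trait animal model with lambda = sigma_e^2 / sigma_a^2, but the combination rule used here (summing ERCs from sources assumed independent) is an assumption; the abstract does not give the exact rule used for ssGBLUP with APY, which must also avoid counting pedigree information twice.

```python
def lambda_from_h2(h2):
    """Variance ratio sigma_e^2 / sigma_a^2 for a single-trait model."""
    return (1.0 - h2) / h2


def reliability_from_pev(pev, var_a, inbreeding=0.0):
    """Reliability = 1 - PEV / ((1 + F) * sigma_a^2)."""
    return 1.0 - pev / ((1.0 + inbreeding) * var_a)


def erc_from_reliability(rel, lam):
    """Number of effective records that would yield reliability `rel`."""
    return lam * rel / (1.0 - rel)


def reliability_from_erc(erc, lam):
    """Inverse of erc_from_reliability."""
    return erc / (erc + lam)


def combine_reliabilities(rels, lam):
    """Hypothetical rule: treat sources as independent and add their ERCs."""
    total_erc = sum(erc_from_reliability(r, lam) for r in rels)
    return reliability_from_erc(total_erc, lam)
```

For a trait with h2 = 0.5 (lambda = 1), a reliability of 0.5 corresponds to one effective record; two such independent sources give two effective records and a combined reliability of 2/3. Working on the ERC scale is what makes information from different sources additive, whereas reliabilities themselves do not add.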

The estimated breeding value (EBV) of an animal measures its genetic merit. To calculate EBVs, pedigree and genomic information are used jointly in a procedure called single-step genomic best linear unbiased prediction (ssGBLUP). Genetic evaluations report each EBV with its reliability, which measures how accurately the breeding value was estimated. Calculating EBVs with ssGBLUP for large datasets is computationally expensive; therefore, the Algorithm for Proven and Young (APY) was developed to reduce its computational cost. However, obtaining the reliabilities of EBVs remains computationally infeasible. Thus, this study aimed to develop a new method for approximating reliabilities for ssGBLUP with APY for large datasets. We required this new method to be accurate and to need less computation than the estimation of breeding values itself. The method we developed accumulates pedigree and genomic information in successive steps, which makes it computationally efficient. Using a dataset with more than 300,000 genotyped animals in a pedigree of 10,000,000 animals provided by the American Angus Association, we showed that the proposed method is accurate and computationally efficient: the correlation between approximated and target values was 0.98, and it ran in less than 12 min.

Genome , Models, Genetic , Algorithms , Animals , Genomics , Genotype , Pedigree , Phenotype , Reproducibility of Results

13.

Impact of including the cause of missing records on genetic evaluations for growth in commercial pigs.

Hollifield, Mary Kate; Lourenco, Daniela; Tsuruta, Shogo; Bermann, Matias; Howard, Jeremy T; Misztal, Ignacy.

J Anim Sci ; 99(8)2021 Aug 01.

Article En | MEDLINE | ID: mdl-34343280

It is of interest to evaluate crossbred pigs for hot carcass weight (HCW) and birth weight (BW); however, obtaining an HCW record depends on livability (LIV) and retained tag (RT). The purpose of this study was to analyze how HCW evaluations are affected when herd removal and missing identification are included in the model, and to examine whether accounting for the reasons for missing records improves the accuracy of predicting breeding values. Pedigree information was available for 1,965,077 purebred and crossbred animals. Records for 503,716 commercial three-way crossbred terminal animals from 2014 to 2019 were provided by Smithfield Premium Genetics. Two pedigree-based models were compared: model 1 (M1) was a threshold-linear model with all four traits (BW, HCW, RT, and LIV), and model 2 (M2) was a linear model including only BW and HCW. The fixed effects in the models were contemporary group, sex, age at harvest (for HCW only), and dam parity. The random effects were the direct additive genetic and litter effects. Accuracy, dispersion, bias, and Pearson correlations were estimated using the linear regression method. The heritabilities were 0.11, 0.07, 0.02, and 0.04 for BW, HCW, RT, and LIV, respectively, with standard errors less than 0.01. No difference was observed in heritabilities or accuracies for BW and HCW between M1 and M2. Accuracies were 0.33, 0.37, 0.19, and 0.23 for BW, HCW, RT, and LIV, respectively. The genetic correlation between BW and RT was 0.34 ± 0.03, and between BW and LIV was 0.56 ± 0.03. The genetic correlations of HCW with RT and LIV were 0.26 ± 0.04 and 0.09 ± 0.05, respectively. The positive and moderate genetic correlations between BW and the other traits imply that a heavier BW results in a higher probability of surviving to harvest. Genetic correlations between HCW and the other traits were lower due to the large quantity of missing records. 
Despite the heritable and correlated aspects of RT and LIV, results imply no major differences between M1 and M2; hence, it is unnecessary to include these traits in classical models for BW and HCW.
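In a threshold-linear model such as M1, a binary trait is treated as the observed category of an underlying normally distributed liability, and here the HCW record exists only for pigs that survive and keep their tag. The sketch below simulates that recording mechanism. It is illustrative only: the incidences, liability heritability, and HCW mean and SD are hypothetical inputs rather than values from the study, livability is coded 1 for pigs that survive, and tag loss is drawn independently of genetics for simplicity.

```python
import random
from statistics import NormalDist


def threshold_for_incidence(incidence):
    """Threshold t on a N(0, 1) liability so that P(liability > t) = incidence."""
    return NormalDist().inv_cdf(1.0 - incidence)


def simulate_pig(rng, h2_liab, p_dead, p_tag_lost, hcw_mean, hcw_sd):
    """Return (liv, rt, hcw) for one pig; hcw is None when it cannot be recorded."""
    # Liability to die = additive genetic + residual, scaled to total variance 1
    a = rng.gauss(0.0, h2_liab ** 0.5)
    e = rng.gauss(0.0, (1.0 - h2_liab) ** 0.5)
    liv = 0 if a + e > threshold_for_incidence(p_dead) else 1
    rt = 0 if rng.random() < p_tag_lost else 1  # tag loss: simple Bernoulli draw
    # HCW is observed only for pigs that reach harvest with an identifiable tag
    hcw = rng.gauss(hcw_mean, hcw_sd) if liv == 1 and rt == 1 else None
    return liv, rt, hcw


# Hypothetical parameters: 7% mortality, 5% tag loss
rng = random.Random(2021)
pigs = [simulate_pig(rng, 0.1, 0.07, 0.05, 95.0, 10.0) for _ in range(20000)]
missing = sum(1 for _, _, hcw in pigs if hcw is None) / len(pigs)
```

With these inputs roughly 1 - 0.93 x 0.95, or about 12% of HCW records, are missing, and which ones are missing depends on the LIV and RT outcomes; modelling LIV and RT jointly with HCW, as in M1, is one way to account for that dependence.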

Hybridization, Genetic , Models, Genetic , Animals , Birth Weight , Body Weight , Female , Parity , Pedigree , Phenotype , Pregnancy , Swine/genetics

14.

Investigating the persistence of accuracy of genomic predictions over time in broilers.

Hidalgo, Jorge; Lourenco, Daniela; Tsuruta, Shogo; Masuda, Yutaka; Breen, Vivian; Hawken, Rachel; Bermann, Matias; Misztal, Ignacy.

J Anim Sci ; 99(9)2021 Sep 01.

Article En | MEDLINE | ID: mdl-34378776

Accuracy of genomic predictions is an important component of the selection response. The objectives of this research were: 1) to investigate trends for prediction accuracies over time in a broiler population of accumulated phenotypes, genotypes, and pedigrees, and 2) to test whether data from distant generations are useful for maintaining prediction accuracies in selection candidates. The data contained 820K phenotypes for a growth trait (GT), 200K for two feed efficiency traits (FE1 and FE2), and 42K for a carcass yield trait (CY). The pedigree included 1,252,619 birds hatched over 7 years, of which 154,318 from the last 4 years were genotyped. Training populations were constructed by sequentially adding 1 year of data, and persistency of accuracy over time was evaluated using predictions for birds hatched in the three generations following the training populations or in the subsequent years. In the first generation, before genotypes became available for the training populations (first 3 years of data), accuracies remained almost stable with successive additions of phenotypes and pedigree to the accumulated dataset. The inclusion of 1 year of genotypes in addition to 4 years of phenotypes and pedigree in the training population led to increases in accuracy of 54% for GT, 76% for FE1, 110% for CY, and 38% for FE2; on average, 74% of the increase was due to genomics. Prediction accuracies declined faster without than with genomic information in the training populations. When genotypes were unavailable, the average decline in prediction accuracy across traits was 41% from the first to the second generation of validation, and 51% from the second to the third generation of validation. When genotypes were available, the average decline across traits was 14% from the first to the second generation of validation, and 3% from the second to the third generation of validation. 
Prediction accuracies in the last three generations were the same when the training population included 5 or 2 years of data, and a decrease of ~7% was observed when the training population included only 1 year of data. Training sets including genomic information provided an increase in accuracy and persistence of genomic predictions compared with training sets without genomic data. The two most recent years of pedigree, phenotypic, and genomic data were sufficient to maintain prediction accuracies in selection candidates. Similar conclusions were obtained using validation populations per year.
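The design described above, where training sets grow by one period at a time and are validated on the periods that follow, and the alternative of keeping only the most recent years of data, can be expressed as a small splitting helper. This sketch assumes hypothetical integer period labels (years or generations) and only builds the index sets; fitting ssGBLUP on each training set is outside its scope. The relative_change helper shows how relative changes in accuracy, such as the percentage increases and declines reported above, are computed.

```python
def training_validation_splits(periods, span=None, n_val=3, min_train=1):
    """Return (training periods, validation periods) pairs.

    span=None -> cumulative training set (all data up to the cut-off);
    span=k    -> sliding training set with only the k most recent periods.
    n_val     -> number of following periods used for validation.
    """
    periods = sorted(periods)
    splits = []
    for cut in range(min_train, len(periods)):
        start = 0 if span is None else max(0, cut - span)
        splits.append((periods[start:cut], periods[cut:cut + n_val]))
    return splits


def relative_change(old, new):
    """Relative change, e.g. of accuracy between two validation scenarios."""
    return (new - old) / old
```

For periods 1 to 5, the cumulative splits start with ([1], [2, 3, 4]) and end with ([1, 2, 3, 4], [5]); with span=2 the last split becomes ([3, 4], [5]). A drop in accuracy from 0.50 to 0.25 gives relative_change(0.50, 0.25) = -0.5, i.e. a 50% decline.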

Chickens , Models, Genetic , Animals , Chickens/genetics , Genome , Genomics , Genotype , Phenotype , Polymorphism, Single Nucleotide

15.

Accounting for Population Structure and Phenotypes From Relatives in Association Mapping for Farm Animals: A Simulation Study.

Mancin, Enrico; Lourenco, Daniela; Bermann, Matias; Mantovani, Roberto; Misztal, Ignacy.

Front Genet ; 12: 642065, 2021.

Article En | MEDLINE | ID: mdl-33995481

Population structure or genetic relatedness should be considered in genome-wide association studies to avoid spurious associations. The most widely used methods for genome-wide association studies (GWAS) account for population structure but are limited to genotyped individuals with phenotypes. Single-step GWAS (ssGWAS) can use phenotypes from non-genotyped relatives; however, its ability to account for population structure has not been explored. Here we investigate the equivalence among ssGWAS, efficient mixed-model association expedited (EMMAX), and genomic best linear unbiased prediction GWAS (GBLUP-GWAS), and how they differ from single-SNP analysis without correction for population structure (SSA-NoCor). We used simulated, structured populations that mimicked fish, beef cattle, and dairy cattle populations with 1,040, 5,525, and 1,400 genotyped individuals, respectively. Larger populations with up to 10-fold more genotyped animals were also simulated. The genomes were composed of 29 chromosomes, each harboring one QTN, and the number of simulated SNPs was 35,000 for the fish and 65,000 for the beef and dairy cattle populations. Males and females were genotyped in the fish and beef cattle populations, whereas only males had genotypes in the dairy population. Phenotypes for a trait with heritability varying from 0.25 to 0.35 were available for both sexes in the fish population, but only for females in the beef and dairy cattle populations. In the latter, phenotypes of daughters were projected onto genotyped sires (i.e., deregressed proofs) before applying EMMAX and SSA-NoCor. Although SSA-NoCor had the largest number of true positive SNPs among the four methods, the number of false negatives was two to five times that of true positives. GBLUP-GWAS and EMMAX had a similar number of true positives, which was slightly smaller than in ssGWAS, although the difference was not significant. 
Additionally, no significant differences were observed when deregressed proofs were used as pseudo-phenotypes in EMMAX compared to daughter phenotypes in ssGWAS for the dairy cattle population. Single-step GWAS accounts for population structure and is a straightforward method for association analysis when only a fraction of the population is genotyped and/or when phenotypes are available on non-genotyped relatives.
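For contrast with the corrected methods, single-SNP analysis without correction (SSA-NoCor) amounts to an ordinary least-squares regression of the phenotype on each SNP in turn. The minimal sketch below assumes genotypes coded as 0/1/2 allele counts and one phenotype (or pseudo-phenotype) per individual. It deliberately omits any relationship matrix, which is why such a scan can report spurious associations in structured populations, whereas EMMAX, GBLUP-GWAS and ssGWAS account for relatedness.

```python
import math


def single_snp_scan(genotypes, phenotypes):
    """Fit y = mu + b * x_j + e separately for every SNP j (no structure correction).

    genotypes: one list of 0/1/2 allele counts per individual.
    Returns a (b_hat, t_stat) tuple per SNP, or None for monomorphic SNPs.
    """
    n = len(phenotypes)
    y_bar = sum(phenotypes) / n
    results = []
    for j in range(len(genotypes[0])):
        x = [g[j] for g in genotypes]
        x_bar = sum(x) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0:  # no variation: effect not estimable
            results.append(None)
            continue
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, phenotypes))
        b = sxy / sxx
        rss = sum((yi - y_bar - b * (xi - x_bar)) ** 2 for xi, yi in zip(x, phenotypes))
        se = math.sqrt(rss / (n - 2) / sxx)
        t = b / se if se > 0 else math.copysign(math.inf, b)
        results.append((b, t))
    return results
```

Each SNP is tested in isolation, so any SNP whose allele frequency differs between subpopulations that also differ in mean phenotype can appear associated even without a causal effect; the mixed-model methods absorb that shared background through the relationship matrix.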

16.

Modeling genetic differences of combined broiler chicken populations in single-step GBLUP.

Bermann, Matias; Lourenco, Daniela; Breen, Vivian; Hawken, Rachel; Brito Lopes, Fernando; Misztal, Ignacy.

J Anim Sci ; 99(4)2021 Apr 01.

Article En | MEDLINE | ID: mdl-33649764

The introduction of animals from a different environment or population is a common practice in commercial livestock populations. In this study, we modeled the inclusion of a group of external birds into a local broiler chicken population for the purpose of genomic evaluations. The pedigree was composed of 242,413 birds, and genotypes were available for 107,216 birds. A five-trait model that included one growth, two yield, and two efficiency traits was used for the analyses. The strategies to model the introduction of external birds were to include a fixed effect representing the origin of parents and to use unknown parent groups (UPG) or metafounders (MF). Genomic estimated breeding values (GEBV) were obtained with single-step GBLUP using the Algorithm for Proven and Young. Bias, dispersion, and accuracy of GEBV for the validation birds, that is, birds from the most recent generation, were computed. The bias and dispersion were estimated with the linear regression (LR) method, whereas accuracy was estimated by the LR method and by predictive ability. When fixed UPG were fit without estimated inbreeding, the model did not converge. In contrast, models with fixed UPG and estimated inbreeding or random UPG converged and resulted in similar GEBV. The inclusion of an extra fixed effect in the model made the GEBV unbiased and reduced the inflation. Genomic predictions with MF were slightly biased and inflated due to the unbalanced number of observations assigned to each metafounder. When combining local and external populations, the greatest accuracy can be obtained by adding an extra fixed effect to account for the origin of parents plus UPG with estimated inbreeding or random UPG. For estimating accuracy, the LR method is more consistent across scenarios, whereas predictive ability depends greatly on the model specification.

Chickens , Models, Genetic , Animals , Chickens/genetics , Genome , Genotype , Pedigree , Phenotype

17.

Determining the stability of accuracy of genomic estimated breeding values in future generations in commercial pig populations.

Hollifield, Mary Kate; Lourenco, Daniela; Bermann, Matias; Howard, Jeremy T; Misztal, Ignacy.

J Anim Sci ; 99(4)2021 Apr 01.

Article En | MEDLINE | ID: mdl-33733277

Genomic information has a limited dimensionality (number of independent chromosome segments [Me]) related to the effective population size. Under the additive model, the persistence of genomic accuracies over generations should be high when the nongenomic information (pedigree and phenotypes) is equivalent to Me animals with high accuracy. The objective of this study was to evaluate the decay in accuracy over time and to compare the magnitude of decay with varying quantities of data and with traits of low and moderate heritability. The dataset included 161,897 phenotypic records for a growth trait (GT) and 27,669 phenotypic records for a fitness trait (FT) related to prolificacy in a population with dimensionality around 5,000. The pedigree included 404,979 animals from 2008 to 2020, of which 55,118 were genotyped. Two single-trait models were used with all ancestral data and sliding subsets of 3-, 2-, and 1-generation intervals. Single-step genomic best linear unbiased prediction (ssGBLUP) was used to compute genomic estimated breeding values (GEBV). Estimated accuracies were calculated by the linear regression (LR) method. The validation population consisted of single generations succeeding the training population and continued forward for all generations available. The average accuracy for the first generation after training with all ancestral data was 0.69 and 0.46 for GT and FT, respectively. The average decay in accuracy from the first generation after training to generation 9 was -0.13 and -0.19 for GT and FT, respectively. The persistence of accuracy improves with more data. Old data have a limited impact on the predictions for young animals for a trait with a large amount of information but a bigger impact for a trait with less information.

Genome , Models, Genetic , Animals , Genomics , Genotype , Pedigree , Phenotype , Polymorphism, Single Nucleotide , Social Responsibility , Swine/genetics

18.

Changes in genomic predictions when new information is added.

Hidalgo, Jorge; Lourenco, Daniela; Tsuruta, Shogo; Masuda, Yutaka; Miller, Stephen; Bermann, Matias; Garcia, Andre L S; Misztal, Ignacy.

J Anim Sci ; 99(2)2021 Feb 01.

Article En | MEDLINE | ID: mdl-33544869

The stability of genomic evaluations depends on the amount of data and population parameters. When the dataset is large enough to estimate the value of nearly all independent chromosome segments (~10K in American Angus cattle), the accuracy and persistency of breeding values will be high. The objective of this study was to investigate changes in estimated breeding values (EBV) and genomic EBV (GEBV) across monthly evaluations for 1 yr in a large genotyped population of beef cattle. The American Angus data used included 8.2 million records for birth weight, 8.9 million for weaning weight, and 4.4 million for postweaning gain. A total of 10.1 million animals born until December 2017 had pedigree information, and 484,074 were genotyped. A truncated dataset included animals born until December 2016. To mimic a scenario with monthly evaluations, 2017 data were added 1 mo at a time to estimate EBV using best linear unbiased prediction (BLUP) and GEBV using single-step genomic BLUP with the algorithm for proven and young (APY) with core group fixed for 1 yr or updated monthly. Predictions from monthly evaluations in 2017 were contrasted with the predictions of the evaluation in December 2016 or the previous month for all genotyped animals born until December 2016 with or without their own phenotypes or progeny phenotypes. Changes in EBV and GEBV were similar across traits, and only results for weaning weight are presented. Correlations between evaluations from December 2016 and the 12 consecutive evaluations were ≥0.97 for EBV and ≥0.99 for GEBV. Average absolute changes for EBV were about two times smaller than for GEBV, except for animals with new progeny phenotypes (≤0.12 and ≤0.11 additive genetic SD [SDa] for EBV and GEBV, respectively). The maximum absolute changes for EBV (≤2.95 SDa) were greater than for GEBV (≤1.59 SDa). The average(maximum) absolute GEBV changes for young animals from December 2016 to January and December 2017 ranged from 0.05(0.25) to 0.10(0.53) SDa. 
Corresponding ranges for animals with new progeny phenotypes were from 0.05(0.88) to 0.11(1.59) SDa for GEBV changes. The average absolute change in EBV(GEBV) from December 2016 to December 2017 for sires with ≤50 progeny phenotypes was 0.26(0.14) and for sires with >50 progeny phenotypes was 0.25(0.16) SDa. Updating the core group in APY without adding data created an average absolute change of 0.07 SDa in GEBV. Genomic evaluations in large genotyped populations are as stable and persistent as the traditional genetic evaluations, with less extreme changes.
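The stability statistics above (correlations between evaluations, and average and maximum absolute changes expressed in additive genetic SD units) are straightforward to compute once two sets of predictions for the same animals are available. A minimal sketch, assuming the two evaluations are given as lists aligned by animal and that sd_a, the additive genetic standard deviation, is supplied by the user:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def evaluation_changes(old, new, sd_a):
    """Compare two evaluations of the same animals (lists aligned by animal)."""
    changes = [abs(b - a) / sd_a for a, b in zip(old, new)]  # |change| in SDa units
    return {
        "correlation": pearson(old, new),
        "mean_abs_change_sda": sum(changes) / len(changes),
        "max_abs_change_sda": max(changes),
    }
```

A correlation close to 1 can coexist with large changes for a few animals, which is why the average and maximum absolute changes are reported alongside the correlation.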

Genome , Models, Genetic , Animals , Cattle/genetics , Female , Genomics , Genotype , Pedigree , Phenotype , Pregnancy

19.

Comparison of models for missing pedigree in single-step genomic prediction.

Masuda, Yutaka; Tsuruta, Shogo; Bermann, Matias; Bradford, Heather L; Misztal, Ignacy.

J Anim Sci ; 99(2)2021 Feb 01.

Article En | MEDLINE | ID: mdl-33493284

Pedigree information is often missing for some animals in a breeding program. Unknown-parent groups (UPGs) are assigned to the missing parents to avoid biased genetic evaluations. Although the use of UPGs is well established for the pedigree model, it is unclear how UPGs are integrated into the inverse of the unified relationship matrix (H-inverse) required for single-step genomic best linear unbiased prediction. A generalization of the UPG model is the metafounder (MF) model. The objectives of this study were to derive 3 H-inverses and to compare genetic trends among models with UPG and MF H-inverses using a simulated purebred population. All inverses were derived using the joint density function of the random breeding values and genetic groups. The breeding values of genotyped animals ($u_2$) were assumed to be adjusted for UPG effects ($g$) using matrix $Q_2$ as $u_2^{*} = u_2 + Q_2 g$ before incorporating genomic information. The Quaas-Pollak-transformed (QP) H-inverse was derived using a joint density function of $u_2^{*}$ and $g$ updated with genomic information and assuming nonzero $\mathrm{cov}(u_2^{*}, g')$. The modified QP (altered) H-inverse also assumes that the genomic information updates $u_2^{*}$ and $g$, but $\mathrm{cov}(u_2^{*}, g') = 0$. The UPG-encapsulated (EUPG) H-inverse assumed that genomic information updates the distribution of $u_2^{*}$. The EUPG H-inverse had the same structure as the MF H-inverse. Fifty percent of the genotyped females in the simulation had a missing dam, and missing parents were replaced with UPGs by generation. The simulation study indicated that $u_2^{*}$ and $g$ in models using the QP and altered H-inverses may be inseparable, leading to potential biases in genetic trends. Models using the EUPG and MF H-inverses showed no genetic trend biases. These 2 H-inverses yielded the same genomic EBV (GEBV). The predictive ability and inflation of GEBVs from young genotyped animals were nearly identical among models using the QP, altered, EUPG, and MF H-inverses. 
Although the choice of H-inverse in real applications with enough data may not result in biased genetic trends, the EUPG and MF H-inverses are to be preferred because of theoretical justification and possibility to reduce biases.
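The adjustment $u_2^{*} = u_2 + Q_2 g$ relies on a matrix Q whose row for each animal gives the expected fraction of its genes originating from each unknown-parent group. The sketch below uses the usual pedigree recursion: each animal receives half of each parent's row, and a missing parent replaced by group k contributes 0.5 to column k. It is a hedged illustration with hypothetical animal and group labels, assumes the pedigree is sorted so that parents precede their offspring, and does not build any of the H-inverses compared in the study.

```python
def group_contributions(pedigree, groups):
    """Rows of Q: fraction of each animal's genes coming from each unknown-parent group.

    pedigree: (animal, sire, dam) tuples sorted so parents precede offspring;
              a missing parent is given as the label of its unknown-parent group.
    """
    col = {g: k for k, g in enumerate(groups)}
    q = {}
    for animal, sire, dam in pedigree:
        row = [0.0] * len(groups)
        for parent in (sire, dam):
            if parent in col:  # missing parent replaced by a group
                row[col[parent]] += 0.5
            else:  # known parent: inherit half of its row
                row = [r + 0.5 * p for r, p in zip(row, q[parent])]
        q[animal] = row
    return q


def add_group_effects(u, q, g):
    """u*_i = u_i + q_i' g for every animal in u (g holds one effect per group)."""
    return {a: u[a] + sum(qk * gk for qk, gk in zip(q[a], g)) for a in u}
```

For a pedigree in which A has parents from groups G1 and G2, B has both parents from G1, and C is the offspring of A and B, the rows are [0.5, 0.5] for A, [1, 0] for B and [0.75, 0.25] for C; every row sums to 1 because each animal's genes trace back entirely to the groups.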

Genome , Models, Genetic , Animals , Female , Genomics , Genotype , Pedigree , Phenotype

20.

Validation of single-step GBLUP genomic predictions from threshold models using the linear regression method: An application in chicken mortality.

Bermann, Matias; Legarra, Andres; Hollifield, Mary Kate; Masuda, Yutaka; Lourenco, Daniela; Misztal, Ignacy.

J Anim Breed Genet ; 138(1): 4-13, 2021 Jan.

Article En | MEDLINE | ID: mdl-32985749

The objective of this study was to determine whether the linear regression (LR) method could be used to validate genomic threshold models. Statistics for the LR method were computed from estimated breeding values (EBVs) using the whole and truncated data sets with variances from the reference and validation populations. The method was tested using simulated and real chicken data sets. The simulated data set included 10 generations of 4,500 birds each; genotypes were available for the last three generations. Each animal was assigned a continuous trait, which was converted to a binary score assuming an incidence of failure of 7%. The real data set included the survival status of 186,596 broilers (mortality rate equal to 7.2%) and genotypes of 18,047 birds. Both data sets were analysed using best linear unbiased predictor (BLUP) or single-step GBLUP (ssGBLUP). The whole data set included all phenotypes available, whereas in the partial data set, phenotypes of the most recent generation were removed. In the simulated data set, the accuracies based on the LR formulas were 0.45 for BLUP and 0.76 for ssGBLUP, whereas the correlations between true breeding values and EBVs (i.e. true accuracies) were 0.37 and 0.65, respectively. The gain in accuracy by adding genomic information was overestimated by 0.09 when using the LR method compared to the true increase in accuracy. However, when the estimated ratio between the additive variance computed based on pedigree only and on pedigree and genomic information was considered, the difference between true and estimated gain was <0.02. Accuracies of BLUP and ssGBLUP with the real data set were 0.41 and 0.47, respectively. This small improvement in accuracy when using ssGBLUP with the real data set was due to population structure and lower heritability. 
The LR method is a useful tool for estimating improvements in accuracy of EBVs due to the inclusion of genomic information when traditional validation methods such as k-fold validation and predictive ability are not applicable.
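The LR statistics compare EBVs of the validation animals obtained from the partial (truncated) data set, u_p, with those obtained from the whole data set, u_w. A minimal sketch of the usual statistics follows: bias as the difference of means, dispersion as the slope of the regression of u_w on u_p, the ratio of accuracies as their correlation, and accuracy as the square root of cov(u_p, u_w) divided by the additive genetic variance of the validation animals. That variance is left as an input (var_a_validation, a hypothetical parameter name) because, as discussed above, whether it is based on pedigree only or on pedigree and genomic information changes the estimated accuracy.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _cov(x, y):
    mx, my = _mean(x), _mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def lr_statistics(ebv_partial, ebv_whole, var_a_validation):
    """LR-method statistics for validation animals (lists aligned by animal)."""
    cov_pw = _cov(ebv_partial, ebv_whole)
    var_p = _cov(ebv_partial, ebv_partial)
    var_w = _cov(ebv_whole, ebv_whole)
    return {
        "bias": _mean(ebv_partial) - _mean(ebv_whole),            # expected 0
        "dispersion": cov_pw / var_p,                              # slope of w on p, expected 1
        "ratio_of_accuracies": cov_pw / math.sqrt(var_p * var_w),  # corr(p, w)
        "accuracy": math.sqrt(max(cov_pw, 0.0) / var_a_validation),
    }
```

Because the statistics only need EBVs from two runs of the same model, they apply equally to threshold models, where k-fold validation against binary phenotypes is awkward; a dispersion below 1 indicates inflated partial-data predictions.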

Chickens , Genome , Animals , Genomics , Genotype , Linear Models , Models, Genetic , Pedigree , Phenotype