ABSTRACT
The validation of estimated breeding values from single-step genomic BLUP (ssGBLUP) is an important topic, as more and more countries and animal populations are currently changing their genomic prediction to single-step. The objective of this work was to compare different methods to validate single-step genomic breeding values (GEBV). The investigations were carried out using a simulation study based on the German-Austrian-Czech Fleckvieh population. To test the validation methods under different conditions, several biased and unbiased scenarios were simulated. The application of the widely used Interbull GEBV test to the single-step method is only possible to a limited extent, partly because of genomic preselection, which biases conventional estimated breeding values. Alternative validation methods considered in the study are the linear regression method proposed by Legarra and Reverter, the improved genomic validation including additional regressions as suggested by VanRaden, and an adaptation of the Interbull GEBV test using daughter yield deviations (DYD) from ssGBLUP instead of pedigree BLUP. The comparison of the different methods for the different scenarios showed that for males the methods based on GEBV estimate the dispersion more accurately and with less bias than the GEBV test using DYD from ssGBLUP, whereas the standard Interbull GEBV test is highly affected by genomic preselection for males. For females, the GEBV test using yield deviations from ssGBLUP gives better estimates of the true dispersion.
Subject(s)
Genome, Genomics, Female, Male, Cattle/genetics, Animals, Genotype, Genomics/methods, Regression Analysis, Linear Models, Pedigree, Genetic Models, Phenotype
ABSTRACT
The physiological stress caused by excessive heat affects dairy cattle health and production. This study sought to investigate the effect of heat stress on test-day yields in US Holstein and Jersey cows and develop single-step genomic predictions to identify heat tolerant animals. Data included 12.8 million and 2.1 million test-day records, respectively, for 923,026 Holstein and 153,710 Jersey cows in 27 US states. From 2015 through 2021, test-day records from the first 5 lactations included milk, fat, and protein yields (kg). Cow records were included if they had at least 5 test-day records per lactation. Heat stress was quantified by analyzing the effect of a 5-d hourly average temperature-humidity index (THI5d) on observed test-day yields. Using a multiple trait repeatability model, a heat threshold (THI threshold) was determined for each breed based on the point at which the average adjusted yields started to decrease, which was 69 for Holsteins and 72 for Jerseys. Additive genetic components of general production and heat tolerance production were estimated using a multiple trait reaction norm model and single-step genomic BLUP methodology. Random effects were regressed on a function of the 5-d hourly average (THI5d) and the THI threshold. The proportion of test-day records that occurred on or above the respective heat thresholds was 15% for Holstein and 10% for Jersey. Heritability of milk, fat, and protein yields under heat stress for Holsteins increased, with a small standard error, indicating that the additive genetic component for heat tolerance of these traits was observed. This was not as evident in Jersey traits. For Jersey, the permanent environment explained the same or more of the variation in fat and protein yield under heat stress, indicating that nongenetic factors may determine heat tolerance for these Jersey traits.
Correlations between the general genetic merit of production (in the absence of heat stress) and heat tolerance genetic merit of production traits were moderate in strength and negative. This indicated that selecting for general genetic merit without consideration of heat tolerance genetic merit of production may result in less favorable performance in hot and humid climates. A general genomic estimated breeding value for genetic merit and a heat tolerance genomic estimated breeding value were calculated for each animal. This study contributes to the investigation of the impact of heat stress on US dairy cattle production yields and offers a basis for the implementation of genomic selection. The results indicate that genomic selection for heat tolerance of production yields is possible for US Holsteins and Jerseys, but a study to validate the genomic predictions should be explored.
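The abstract above does not state the exact function of THI5d and the THI threshold on which the random effects were regressed. The sketch below shows one commonly used form, assumed here for illustration only: a heat-load covariate that is zero below the breed threshold and increases linearly above it. The function names and the 120-reading window check are hypothetical; only the thresholds (69 and 72) come from the abstract.

```python
# Illustrative sketch only: the "max(0, THI - threshold)" form is an assumption,
# not a formula given in the abstract.

# Breed-specific heat thresholds reported in the abstract.
THRESHOLDS = {"Holstein": 69.0, "Jersey": 72.0}


def five_day_hourly_mean(hourly_thi):
    """Average hourly THI readings over a 5-day window (120 values expected)."""
    if len(hourly_thi) != 5 * 24:
        raise ValueError("expected 120 hourly THI readings (5 days x 24 hours)")
    return sum(hourly_thi) / len(hourly_thi)


def heat_load(thi5d, breed):
    """Assumed covariate: THI units above the breed threshold, zero below it."""
    return max(0.0, thi5d - THRESHOLDS[breed])


if __name__ == "__main__":
    thi5d = five_day_hourly_mean([75.0] * 120)
    print(heat_load(thi5d, "Holstein"))  # heat load for a Holstein record
    print(heat_load(thi5d, "Jersey"))    # heat load for a Jersey record
```

Under this assumed form, a record with THI5d at or below the threshold contributes nothing to the heat tolerance term, so only the general production component is expressed.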
ABSTRACT
The objectives of this study were to investigate the computational performance and the predictive ability and bias of a single-step SNP BLUP model (ssSNPBLUP) in genotyped young animals with unknown-parent groups (UPG) for type traits, using national genetic evaluation data from the Japanese Holstein population. The phenotype, genotype, and pedigree data were the same as those used in a national genetic evaluation of linear type traits classified between April 1984 and December 2020. In the current study, 2 data sets were prepared: the full data set containing all entries up to December 2020 and a truncated data set ending with December 2016. Genotyped animals were classified into 3 types: sires with classified daughters (S), cows with records (C), and young animals (Y). The computing performance and prediction accuracy of ssSNPBLUP were compared for the following 3 groups of genotyped animals: sires with classified daughters and young animals (SY); cows with records and young animals (CY); and sires with classified daughters, cows with records, and young animals (SCY). In addition, we tested 3 parameters of residual polygenic variance in ssSNPBLUP (0.1, 0.2, or 0.3). Daughter yield deviations (DYD) for the validation bulls and phenotypes adjusted for all fixed effects and random effects other than animal and residual (Yadj) for the validation cows were obtained using the full data set from the pedigree-based BLUP model. The regression coefficients of DYD for bulls (or Yadj for cows) on the genomic estimated breeding value (GEBV) using the truncated data set were used to measure the inflation of the predictions of young animals. The coefficient of determination of DYD on GEBV was used to measure the predictive ability of the predictions for the validation bulls. The reliability of the predictions for the validation cows was calculated as the square of the correlation between Yadj and GEBV divided by heritability. 
The predictive ability was highest in the SCY group and lowest in the CY group. However, minimal difference was found in predictive abilities with or without UPG models using different parameters of residual polygenic variance. The regression coefficients approached 1.0 as the parameter of residual polygenic variance increased, but regression coefficients were mostly similar regardless of the use of UPG across the groups of genotyped animals. The ssSNPBLUP model, including UPG, was demonstrated as feasible for implementation in the national evaluation of type traits in Japanese Holsteins.
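The validation statistics described in this abstract (regression coefficient of DYD or Yadj on GEBV, coefficient of determination, and reliability as squared correlation divided by heritability) can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up numbers, not the authors' code.

```python
# Minimal sketch of the validation statistics named in the abstract.
# Function names and the toy data are hypothetical.


def mean(values):
    return sum(values) / len(values)


def slope_and_r2(y, x):
    """Regress y on x; return (regression coefficient, coefficient of determination)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx            # values below 1 suggest inflated GEBV
    r2 = sxy ** 2 / (sxx * syy)  # squared correlation between y and x
    return slope, r2


def cow_reliability(yadj, gebv, h2):
    """Squared correlation between Yadj and GEBV, divided by heritability."""
    _, r2 = slope_and_r2(yadj, gebv)
    return r2 / h2


if __name__ == "__main__":
    gebv = [1.0, 2.0, 3.0, 4.0]   # predictions from the truncated data set
    dyd = [1.0, 3.0, 2.0, 4.0]    # deviations from the full data set
    print(slope_and_r2(dyd, gebv))
    print(cow_reliability(dyd, gebv, h2=0.8))
```

A regression coefficient close to 1.0 indicates little inflation, which is why the abstract reports how the coefficient moves toward 1.0 as the residual polygenic variance parameter increases.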
Subject(s)
Cattle, Single Nucleotide Polymorphism, Animals, Cattle/genetics, Female, Male, Genotype, Genetic Models, Pedigree, Phenotype, Reproducibility of Results
ABSTRACT
Genomic evaluation based on a single-step model uses all available data of phenotype, genotype, and pedigree; therefore, it should provide unbiased genomic breeding values with a higher correlation of prediction than the current multistep genomic model. Since 2019, a mixed reference population of cows and bulls has been applied to the routine multistep genomic evaluation in German Holsteins. For a fair comparison between the single-step and multistep genomic models, the same phenotype, genotype, and pedigree data were used. Because of the simple structure of the standard multitrait animal model used for German Holstein conventional evaluation, conformation traits were chosen as the first trait group to test a single-step SNP BLUP model for the large, genotyped population of German Holsteins. Genotype, phenotype, and pedigree data were taken from the official August 2020 conventional and genomic evaluation. Because the trait definition is the same in the national and the multiple across-country evaluation for the conformation traits, deregressed multiple across-country evaluation estimated breeding values (EBV) of foreign bulls were treated as a new source of data for the same trait in the genomic evaluations. Due to the short history of female genotyping in Germany, records of the youngest cows and bulls from the last 3 yr, instead of 4 yr, were deleted to perform a genomic validation. In comparison to the multistep genomic model, the single-step SNP BLUP model resulted in a higher correlation and greater variance of genomic EBV according to 798 national validation bulls. The regression of genomic prediction of the current, full evaluation on the earlier, truncated evaluation was slightly closer to 1 for the single-step model than for the multistep model. For the validation bulls or youngest genomic artificial insemination bulls, correlation of genomic EBV between the 2 models was, on average, 0.95 across all the conformation traits.
We did not find overprediction of young animals by the single-step SNP BLUP model for the conformation traits in German Holsteins.
Subject(s)
Genome, Single Nucleotide Polymorphism, Animals, Cattle/genetics, Female, Genomics/methods, Genotype, Male, Genetic Models, Pedigree, Phenotype
ABSTRACT
The effects of climate change together with the projected future demand represent a huge challenge for wheat production systems worldwide. Wheat breeding can contribute to global food security through the creation of genotypes exhibiting stress tolerance and higher yield potential. The objectives of our study were to (i) estimate the annual grain yield (GY) genetic gain of High Rainfall Wheat Yield Trials (HRWYT) grown from 2007 (15th HRWYT) to 2016 (24th HRWYT) across international environments, and (ii) determine the changes in physiological traits associated with GY genetic improvement. The GY genetic gains were estimated as genetic progress per se (GYP) and in terms of local checks (GYLC). In total, 239 international locations were classified into two groups, high- and low-rainfall environments, based on climate variables and trial management practices. In the high-rainfall environment, the annual genetic gains for GYP and GYLC were 3.8 and 1.17 % (160 and 65.1 kg ha⁻¹ yr⁻¹), respectively. In the low-rainfall environment, the genetic gains were 0.93 and 0.73 % (40 and 33.1 kg ha⁻¹ yr⁻¹) for GYP and GYLC, respectively. The GY of the lines included in each nursery showed a significant phenotypic correlation between high- and low-rainfall environments in all the examined years, and several of the five best performing lines were common to both environments. The GY progress was mainly associated with increased grain weight (R² = 0.35, p < 0.001), days to maturity (R² = 0.20, p < 0.001) and grain filling period (R² = 0.06, p < 0.05). These results indicate continuous GY genetic progress and yield stability in the HRWYT germplasm developed and distributed by CIMMYT.
ABSTRACT
The approximated non-linear least squares (ALS) tunes or calibrates the computer model by minimizing the squared error between the computer output and real observations by using an emulator such as a Gaussian process (GP) model. A potential defect of the ALS method is that the emulator is constructed once and it is no longer re-built. An iterative method is proposed in this study to address this difficulty. In the proposed method, the tuning parameters of the simulation model are calculated by the conditional expectation (E-step), whereas the GP parameters are updated by the maximum likelihood estimation (M-step). These EM-steps are alternately repeated until convergence by using both computer and experimental data. For comparative purposes, another iterative method (the max-min algorithm) and a likelihood-based method are considered. Five toy models are tested for a comparative analysis of these methods. According to the toy model study, both the variance and bias of the estimates obtained from the proposed EM algorithm are smaller than those from the existing calibration methods. Finally, the application to a nuclear fusion simulator is demonstrated.
ABSTRACT
The approximated nonlinear least squares (ALS) method has been used to estimate unknown parameters in complex computer codes that are very time-consuming to execute. The ALS calibrates or tunes the computer code by minimizing the squared difference between real observations and computer output using a surrogate such as a Gaussian process model. When the differences (residuals) are correlated or heteroscedastic, the ALS may result in a distorted code tuning with a large variance of estimation. Another potential drawback of the ALS is that it does not take into account the uncertainty in the approximation of the computer model by a surrogate. To address these problems, we propose a generalized ALS (GALS) by constructing the covariance matrix of residuals. The residuals are weighted by the inverse of the covariance matrix, and the resulting criterion is minimized with respect to the tuning parameters. In addition, we consider an iterative version of the GALS, which is called the max-minG algorithm. In this algorithm, the parameters are re-estimated and updated by the maximum likelihood estimation and the GALS, by using both computer and experimental data repeatedly until convergence. Moreover, the iteratively re-weighted ALS method (IRWALS) was considered for comparison. Five test functions in different conditions are examined for a comparative analysis of the four methods. Based on the test function study, we find that both the bias and variance of estimates obtained from the proposed methods (the GALS and the max-minG) are smaller than those from the ALS and the IRWALS methods. In particular, the max-minG works better than the others, including the GALS, for the relatively complex test functions. Lastly, an application to a nuclear fusion simulator is illustrated and it is shown that the abnormal pattern of residuals in the ALS can be resolved by the proposed methods.
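The difference between the ALS and GALS criteria described above can be written compactly. The notation below is assumed for illustration and is not taken from the abstract: y is the vector of real observations, \hat{\eta}(\theta) is the surrogate (for example, Gaussian process) prediction of the computer output at tuning parameters \theta, and \Sigma is the constructed covariance matrix of the residuals. How \Sigma is built is not specified in the abstract.

```latex
% Residuals between observations and the surrogate output (assumed notation)
r(\theta) = y - \hat{\eta}(\theta)

% ALS: ordinary least squares on the residuals
\hat{\theta}_{\mathrm{ALS}} = \arg\min_{\theta} \; r(\theta)^{\top} r(\theta)

% GALS: residuals weighted by the inverse of their covariance matrix
\hat{\theta}_{\mathrm{GALS}} = \arg\min_{\theta} \; r(\theta)^{\top} \Sigma^{-1} r(\theta)
```

When \Sigma is proportional to the identity matrix, the two criteria have the same minimizer, so the generalization matters only when the residuals are correlated or heteroscedastic.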
ABSTRACT
Genomic evaluation of French dairy goats is routinely conducted using the single-step genomic BLUP (ssGBLUP) method. This method has the advantage of simultaneously using all phenotypes, pedigrees, and genotypes. However, ssGBLUP assumes that all SNP explain the same amount of genetic variance, which is unlikely in the case of traits whose major genes or QTL are segregating. In this study, we investigated the effect of weighted ssGBLUP and its alternatives, which give more weight to SNP associated with the trait, on the accuracy of genomic evaluation of milk production, udder type traits, and somatic cell scores. The data set included 2,955 genotyped animals and 2,543,680 pedigree animals. The number of phenotypes varied with the trait. The accuracy of genomic evaluation was assessed on 205 genotyped Alpine and 146 genotyped Saanen goats born between 2009 and 2012. For traits with unknown QTL, weighted ssGBLUP was less accurate than, or as accurate as, ssGBLUP. For traits with identified QTL (i.e., QTL only present in the Saanen breed), weighted ssGBLUP outperformed ssGBLUP by between 2 and 14%.
Subject(s)
Genome, Goats/genetics, Goats/metabolism, Milk/metabolism, Animals, Breeding, Female, France, Genomics, Genotype, Animal Mammary Glands/metabolism, Pedigree, Phenotype, Single Nucleotide Polymorphism, Quantitative Trait Loci
ABSTRACT
The use of multi-trait across-country evaluation (MACE) and the exchange of genomic information among countries allows national breeding programs to combine foreign and national data to increase the size of the training populations and potentially increase accuracy of genomic prediction of breeding values. By including genotyped and nongenotyped animals simultaneously in the evaluation, the single-step genomic BLUP (GBLUP) approach has the potential to deliver more accurate and less biased genomic evaluations. A single-step genomic BLUP approach, which enables integration of data from MACE evaluations, can be used to obtain genomic predictions while avoiding double-counting of information. The objectives of this study were to apply a single-step approach that simultaneously includes domestic and MACE information for genomic evaluation of workability traits in Canadian Holstein cattle, and compare the results obtained with this methodology with those obtained using a multi-step approach (msGBLUP). By including MACE bulls in the training population, msGBLUP led to an increase in reliability of genomic predictions of 4.8 and 15.4% for milking temperament and milking speed, respectively, compared with a traditional evaluation using only pedigree and phenotypic information. Integration of MACE data through a single-step approach (ssGBLUPIM) yielded the highest reliabilities compared with other considered methods. Integration of MACE data also helped reduce bias of genomic predictions. When using ssGBLUPIM, the bias of genomic predictions decreased by half compared with msGBLUP using domestic and MACE information. Therefore, the reliability and bias of genomic predictions for both traits improved substantially when a single-step approach was used for evaluation compared with a multi-step approach. The use of a single-step approach with integration of MACE information provides an alternative to the current method used in Canadian genomic evaluations.
Subject(s)
Cattle/genetics, Genome/genetics, Genomics, Milk/metabolism, Animals, Breeding, Genotype, Male, Pedigree, Phenotype, Reproducibility of Results, Temperament
ABSTRACT
Genotypes, phenotypes and pedigrees of 6 breeds of dairy sheep (including subdivisions of Latxa, Manech, and Basco-Béarnaise) from the Spanish and French Western Pyrenees were used to estimate genetic relationships across breeds (together with genotypes from the Lacaune dairy sheep) and to verify single-breed or multiple-breed genetic evaluations by forward cross-validation. The number of rams genotyped fluctuated between 100 and 1,300 but generally represented the last 10 cohorts of progeny-tested rams within each breed. Genetic relationships were assessed by principal components analysis of the genomic relationship matrices and also by the conservation of linkage disequilibrium patterns at given physical distances in the genome. Genomic and pedigree-based evaluations used daughter yield performances of all rams, although some of them were not genotyped. A pseudo-single step method was used in this case for genomic predictions. Results showed a clear structure in blond and black breeds for Manech and Latxa, reflecting historical exchanges, and isolation of Basco-Béarnaise and Lacaune. Relatedness between any 2 breeds was, however, lower than expected. Single-breed genomic predictions had accuracies comparable with other breeds of dairy sheep or small breeds of dairy cattle. They were more accurate than pedigree predictions for 5 out of 6 breeds, with absolute increases in accuracy ranging from 0.05 to 0.30 points. They were significantly better, as assessed by bootstrapping of candidates, for 2 of the breeds. Predictions using multiple populations only marginally increased the accuracy for a couple of breeds. Pooling populations does not increase the accuracy of genomic evaluations in dairy sheep; however, single-breed genomic predictions are more accurate, even for small breeds, and make the consideration of genomic schemes in dairy sheep interesting.
Subject(s)
Breeding, Sheep/genetics, Animals, Female, France, Genome, Genomics/methods, Genotype, Linkage Disequilibrium/genetics, Male, Pedigree, Phenotype, Single Nucleotide Polymorphism, Principal Component Analysis, Spain
ABSTRACT
Although transmission disequilibrium tests (TDT) and the FBAT statistic are robust against population substructure, they have reduced statistical power, as compared with fully efficient tests that are not guarded against confounding because of population substructure. This has often limited the application of transmission disequilibrium tests/FBATs to candidate gene analysis, because, in a genome-wide association study, population substructure can be adjusted by approaches such as genomic control and EIGENSTRAT. Here, we provide new statistical methods for the analysis of quantitative and dichotomous phenotypes in extended families. Although the approach utilizes the polygenic model to maximize the efficiency, it still preserves the robustness to non-normality and misspecified covariance structures. In addition, the proposed method performs better than the existing methods for dichotomous phenotype, and the new transmission disequilibrium test for candidate gene analysis is more efficient than FBAT statistics.
Subject(s)
Family, Genetic Predisposition to Disease, Genome-Wide Association Study, Genetic Models, Heritable Quantitative Trait, Bias, Computer Simulation, Humans, Multifactorial Inheritance, Phenotype
ABSTRACT
Effect modification (EM) may cause bias in network meta-analysis (NMA). Existing population adjustment NMA methods use individual patient data to adjust for EM but disregard available subgroup information from aggregated data in the evidence network. Additionally, these methods often rely on the shared effect modification (SEM) assumption. In this paper, we propose Network Meta-Interpolation (NMI): a method using subgroup analyses to adjust for EM that does not assume SEM. NMI balances effect modifiers across studies by turning treatment effect (TE) estimates at the subgroup- and study level into TE and standard errors at EM values common to all studies. In an extensive simulation study, we simulate two evidence networks consisting of four treatments, and assess the impact of departure from the SEM assumption, variable EM correlation across trials, trial sample size and network size. NMI was compared to standard NMA, network meta-regression (NMR) and Multilevel NMR (ML-NMR) in terms of estimation accuracy and credible interval (CrI) coverage. In the base case non-SEM dataset, NMI achieved the highest estimation accuracy with root mean squared error (RMSE) of 0.228, followed by standard NMA (0.241), ML-NMR (0.447) and NMR (0.541). In the SEM dataset, NMI was again the most accurate method with RMSE of 0.222, followed by ML-NMR (0.255). CrI coverage followed a similar pattern. NMI's dominance in terms of estimation accuracy and CrI coverage appeared to be consistent across all scenarios. NMI represents an effective option for NMA in the presence of study imbalance and available subgroup data.
Subject(s)
Network Meta-Analysis, Humans, Bias, Sample Size
ABSTRACT
Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of Wollack and Eckerly (2017) and Sinharay (2018) and suggests a new aggregate-level EDI by incorporating the empirical best linear unbiased predictor from the literature of linear mixed-effects models (e.g., McCulloch et al., 2008). A simulation study shows that the new EDI has larger power than the indices of Wollack and Eckerly (2017) and Sinharay (2018). In addition, the new index has satisfactory Type I error rates. A real data example is also included.
ABSTRACT
Many important traits in plants, animals, and microbes are polygenic and challenging to improve through traditional marker-assisted selection. Genomic prediction addresses this by incorporating all genetic data in a mixed model framework. The primary method for predicting breeding values is genomic best linear unbiased prediction, which uses the realized genomic relationship or kinship matrix (K) to connect genotype to phenotype. Genomic relationship matrices share information among entries to estimate the observed entries' genetic values and predict unobserved entries' genetic values. One of the main parameters of such models is genomic variance (σ_g²), or the variance of a trait associated with a genome-wide sample of DNA polymorphisms, and genomic heritability (h_g²); however, the seminal papers introducing different forms of K often do not discuss their effects on the model estimated variance components despite their importance in genetic research and breeding. Here, we discuss the effect of several standard methods for calculating the genomic relationship matrix on estimates of σ_g² and h_g². With current approaches, we found that the genomic variance tends to be either overestimated or underestimated depending on the scaling and centering applied to the marker matrix (Z), the value of the average diagonal element of K, and the assortment of alleles and heterozygosity (H) in the observed population. Using the average semivariance, we propose a new matrix, K_ASV, that directly yields accurate estimates of σ_g² and h_g² in the observed population and produces best linear unbiased predictors equivalent to routine methods in plants and animals.
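To make the role of centering, scaling, and the average diagonal of K concrete, the sketch below computes one standard genomic relationship matrix, VanRaden's first method, from a small 0/1/2 marker matrix. It is an illustration of a routine approach of the kind the abstract discusses, not the K_ASV matrix proposed by the authors; the function names and the toy genotypes are hypothetical.

```python
# Sketch of a standard genomic relationship matrix (VanRaden method 1).
# This is NOT the proposed K_ASV; names and data are hypothetical.


def vanraden_k(markers):
    """markers: list of rows (individuals) with allele counts coded 0/1/2."""
    n, m = len(markers), len(markers[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    # Centering: subtract twice the allele frequency from each genotype.
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in markers]
    # Scaling: sum of expected heterozygosities, 2 * sum(p * (1 - p)).
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return [[sum(z[i][k] * z[j][k] for k in range(m)) / scale
             for j in range(n)] for i in range(n)]


def mean_diagonal(k):
    """Average diagonal element of K, one quantity the abstract highlights."""
    return sum(k[i][i] for i in range(len(k))) / len(k)


if __name__ == "__main__":
    genotypes = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]
    k = vanraden_k(genotypes)
    print(k)
    print(mean_diagonal(k))
```

In this toy example the average diagonal of K is below 1, which illustrates why the scaling of K can shift the estimated genomic variance even when the predictions themselves are unaffected.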
Subject(s)
Genetic Models, Multifactorial Inheritance, Alleles, Animals, Genomics/methods, Genotype, Phenotype, Plant Breeding, Single Nucleotide Polymorphism
ABSTRACT
Genome-Wide Association Studies (GWAS) explain only a small fraction of heritability for most complex human phenotypes. Genomic heritability estimates the variance explained by the SNPs on the whole genome using mixed models and accounts for the many small contributions of SNPs in the explanation of a phenotype. This paper approaches heritability from a machine learning perspective, and examines the close link between mixed models and ridge regression. Our contribution is two-fold. First, we propose estimating genomic heritability using a predictive approach via ridge regression and Generalized Cross Validation (GCV). We show that this is consistent with classical mixed model based estimation. Second, we derive simple formulae that express prediction accuracy as a function of the ratio n/p, where n is the population size and p the total number of SNPs. These formulae clearly show that a high heritability does not imply an accurate prediction when p > n. Both the estimation of heritability via GCV and the prediction accuracy formulae are validated using simulated data and real data from UK Biobank.
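The link between mixed models and ridge regression mentioned above can be stated in a few lines. The notation is assumed here for illustration: y is a centered phenotype vector, Z is the n x p SNP matrix, and \sigma_u^2 and \sigma_e^2 are the SNP-effect and residual variances. The last two expressions additionally assume that the columns of Z are standardized to unit variance; the paper's own parameterization may differ.

```latex
% Linear mixed model with random SNP effects (assumed notation)
y = Z u + e, \qquad u \sim N(0, \sigma_u^2 I_p), \qquad e \sim N(0, \sigma_e^2 I_n)

% The BLUP of u coincides with the ridge estimator for a specific penalty
\hat{u} = (Z^{\top} Z + \lambda I_p)^{-1} Z^{\top} y, \qquad \lambda = \frac{\sigma_e^2}{\sigma_u^2}

% With standardized columns of Z, heritability and penalty map onto each other
h^2 = \frac{p\,\sigma_u^2}{p\,\sigma_u^2 + \sigma_e^2}
\quad\Longleftrightarrow\quad
\lambda = \frac{p\,(1 - h^2)}{h^2}
```

Under these assumptions, choosing the ridge penalty by a predictive criterion such as GCV implies a value of h^2, which is the sense in which a predictive approach can estimate heritability.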
ABSTRACT
The performance of genomic prediction (GP) on genetically correlated traits can be improved through an interdependence multi-trait model under a multi-environment context. In this study, a panel of 237 soft facultative wheat (Triticum aestivum L.) lines was evaluated to compare single- and multi-trait models for predicting grain yield (GY), harvest index (HI), spike fertility (SF), and thousand grain weight (TGW). The panel was phenotyped in two locations and two years in Florida under drought and moderate drought stress conditions, while the genotyping was performed using 27,957 genotyping-by-sequencing (GBS) single nucleotide polymorphism (SNP) markers. Five predictive models including Multi-environment Genomic Best Linear Unbiased Predictor (MGBLUP), Bayesian Multi-trait Multi-environment (BMTME), Bayesian Multi-output Regressor Stacking (BMORS), Single-trait Multi-environment Deep Learning (SMDL), and Multi-trait Multi-environment Deep Learning (MMDL) were compared. Across environments, the multi-trait statistical model (BMTME) was superior to the multi-trait DL model for prediction accuracy in most scenarios, but the DL models were comparable to the statistical models for response to selection. The multi-trait model also showed 5 to 22% more genetic gain than the single-trait model across environments, as reflected by the response to selection. Overall, these results suggest that multi-trait genomic prediction can be an efficient strategy for economically important yield component related traits in soft wheat.
Subject(s)
Computational Biology/methods, Gene-Environment Interaction, Plant Breeding/methods, Quantitative Trait Loci/genetics, Triticum/genetics, Agriculture/methods, Algorithms, Bayes Theorem, Edible Grain/genetics, Plant Genome/genetics, Genomics/methods, Genetic Models, Single Nucleotide Polymorphism/genetics, Genetic Selection/genetics
ABSTRACT
Serially correlated binomial data with random cluster sizes occur frequently in environmental and health studies. Such data series have traditionally been analyzed using binomial state-space or hidden Markov models without appropriately accounting for the randomness in the cluster sizes. To properly characterize the correlation and extra variation arising from the random cluster sizes, we introduce a joint Poisson state-space modelling approach to the analysis of binomial series with random cluster sizes. This approach enables us to model the marginal counts and binomial proportions simultaneously. An optimal estimation of our model has been developed using the orthodox best linear unbiased predictors. This estimation method is computationally efficient and robust since it depends only on the first- and second-moment assumptions of unobserved random effects. Our proposed approach is illustrated with an analysis of birth delivery data.
Subject(s)
Binomial Distribution, Statistical Data Interpretation, Poisson Distribution, Bias, Cesarean Section/statistics & numerical data, Markov Chains, Statistical Models, Research Design
ABSTRACT
Medical research is often designed to investigate changes in a collection of response variables that are measured repeatedly on the same subjects. The multivariate generalized linear mixed model (MGLMM) can be used to evaluate random coefficient associations (e.g. simple correlations, partial regression coefficients) among outcomes that may be non-normal and differently distributed by specifying a multivariate normal distribution for their random effects and then evaluating the latent relationship between them. Empirical Bayes predictors are readily available for each subject from any mixed model and are observable and, hence, plottable. Here, we evaluate whether second-stage association analyses of empirical Bayes predictors from an MGLMM provide a good approximation and visual representation of these latent association analyses, using medical examples and simulations. Additionally, we compare these results with association analyses of empirical Bayes predictors generated from separate mixed models for each outcome, a procedure that could circumvent computational problems that arise when the dimension of the joint covariance matrix of random effects is large and prohibits estimation of latent associations. As has been shown in other analytic contexts, the p-values for all second-stage coefficients that were determined by naively assuming normality of empirical Bayes predictors provide a good approximation to p-values determined via permutation analysis. Analyzing outcomes that are interrelated with separate models in the first stage and then associating the resulting empirical Bayes predictors in a second stage results in different mean and covariance parameter estimates from the maximum likelihood estimates generated by an MGLMM. The potential for erroneous inference from using results from these separate models increases as the magnitude of the association among the outcomes increases.
Thus, if computable, scatterplots of the conditionally independent empirical Bayes predictors from an MGLMM are always preferable to scatterplots of empirical Bayes predictors generated by separate models, unless the true association between outcomes is zero.
Subject(s)
Bayes Theorem, Linear Models, Computer Simulation, Humans, Likelihood Functions, Research Design
ABSTRACT
Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel, the Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP data sets), having different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and the MDs models fitted with the Gaussian kernel (MDe-GK and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied.
Subject(s)
Gene-Environment Interaction, Plant Genome, Genomics, Genetic Models, Zea mays/genetics, Algorithms, Environment, Genomics/methods, Genotype, Statistical Models, Phenotype, Reproducibility of Results, Genetic Selection
ABSTRACT
Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environment variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally-demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma (PPARG) gene associated with diabetes.
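The equivalence between the BLUP of random SNP coefficients and the ridge regression estimator can be checked numerically in the simplest setting. The sketch below is restricted to a linear (Gaussian) model without fixed effects, which is an assumption made for illustration; the abstract concerns the more general GLMM. All function names and the toy data are hypothetical, and the variance components are treated as known.

```python
# Numerical check that BLUP equals ridge regression when the penalty is
# var_e / var_u. Linear Gaussian case only; names and data are hypothetical.


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(Z, y, lam):
    """Ridge estimate: solve (Z'Z + lam * I) u = Z'y."""
    n, p = len(Z), len(Z[0])
    ztz = [[sum(Z[i][a] * Z[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    zty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(ztz, zty)


def blup(Z, y, var_u, var_e):
    """BLUP of u: var_u * Z' V^{-1} y, with V = var_u * ZZ' + var_e * I."""
    n, p = len(Z), len(Z[0])
    V = [[var_u * sum(Z[i][k] * Z[j][k] for k in range(p))
          + (var_e if i == j else 0.0) for j in range(n)] for i in range(n)]
    w = solve(V, y)  # w = V^{-1} y
    return [var_u * sum(Z[i][a] * w[i] for i in range(n)) for a in range(p)]


if __name__ == "__main__":
    Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]
    y = [1.0, 2.0, 2.5, 0.5]
    var_u, var_e = 0.5, 1.0
    print(ridge(Z, y, var_e / var_u))
    print(blup(Z, y, var_u, var_e))
```

Because the penalty equals the ratio of residual to SNP-effect variance, estimating the variance components of the mixed model yields the ridge penalty directly, without a cross-validation search.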