ABSTRACT
When multitrait data are available, the preferred models are those that can account for correlations between phenotypic traits, because a moderate or large degree of correlation increases genomic prediction accuracy. For this reason, in this article we explore Bayesian multitrait kernel methods for genomic prediction and illustrate the power of these models with three real datasets. The kernels under study were the linear, Gaussian, polynomial, and sigmoid kernels; they were compared with the conventional Ridge regression and GBLUP multitrait models. The results show that, in general, the Gaussian kernel method outperformed the conventional Bayesian Ridge and GBLUP multitrait linear models by 2.2-17.45% (datasets 1-3) in terms of prediction performance based on the mean square error of prediction. This improvement in the prediction performance of the Bayesian multitrait kernel method can be attributed to the fact that the proposed model captures nonlinear patterns more efficiently than linear multitrait models. However, not all kernels performed well in the datasets used for evaluation, so more than one kernel should be evaluated in order to choose the best one.
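As a rough illustration of the four kernels named in the abstract above, the following minimal sketch builds a kernel (relationship) matrix from marker codes using only the Python standard library. The scaling by the number of markers, the hyperparameters `gamma`, `coef0` and `degree`, the function names and the toy marker data are all assumptions made for illustration; the abstract does not state the exact parametrization used by the authors.

```python
import math

def dot(x, z):
    """Inner product of two marker vectors."""
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    # Assumption: inner product scaled by the number of markers.
    return dot(x, z) / len(x)

def gaussian_kernel(x, z, gamma=1.0):
    # Assumption: squared Euclidean distance scaled by the number of markers.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist / len(x))

def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=2):
    return (gamma * dot(x, z) / len(x) + coef0) ** degree

def sigmoid_kernel(x, z, gamma=1.0, coef0=1.0):
    return math.tanh(gamma * dot(x, z) / len(x) + coef0)

def kernel_matrix(markers, kernel, **params):
    """Evaluate the kernel for every pair of lines (rows of `markers`)."""
    n = len(markers)
    return [[kernel(markers[i], markers[j], **params) for j in range(n)]
            for i in range(n)]

# Hypothetical toy data: three lines, four markers coded -1, 0, 1.
markers = [
    [-1, 0, 1, 1],
    [-1, 1, 1, 0],
    [1, 0, -1, -1],
]
K_gauss = kernel_matrix(markers, gaussian_kernel, gamma=1.0)
K_lin = kernel_matrix(markers, linear_kernel)
```

Any of these matrices could stand in for the covariance structure of the genetic effects in a mixed model; as the abstract notes, no single kernel is best everywhere, so it is worth comparing several on the same cross-validation folds.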
Subject(s)
Genome , Genetic Models , Bayes Theorem , Genotype , Phenotype
ABSTRACT
Genomic selection uses whole-genome marker models to predict phenotypes or genetic values for complex traits. Some of these models fit interaction terms between markers, and are therefore called epistatic. The biological interpretation of the corresponding fitted effects is not straightforward and there is the threat of overinterpreting their functional meaning. Here we show that the predictive ability of epistatic models relative to additive models can change with the density of the marker panel. In more detail, we show that for publicly available Arabidopsis and rice datasets, an initial superiority of epistatic models over additive models, which can be observed at a lower marker density, vanishes when the number of markers increases. We relate these observations to earlier results reported in the context of association studies which showed that detecting statistical epistatic effects may not only be related to interactions in the underlying genetic architecture, but also to incomplete linkage disequilibrium at low marker density ("Phantom Epistasis"). Finally, we illustrate in a simulation study that due to phantom epistasis, epistatic models may also predict the genetic value of an underlying purely additive genetic architecture better than additive models, when the marker density is low. Our observations can encourage the use of genomic epistatic models with low density panels, and discourage their biological over-interpretation.
Subject(s)
Genetic Epistasis , Genetic Models , Genome , Genomics , Linkage Disequilibrium
ABSTRACT
The constrained linear genomic selection index (CLGSI) is a linear combination of genomic estimated breeding values useful for predicting the net genetic merit, which in turn is a linear combination of true unobservable breeding values of the traits weighted by their respective economic values. The CLGSI is the most general genomic index and allows imposing constraints on the expected genetic gain per trait to make some traits change their mean values based on a predetermined level, while the rest of them remain without restrictions. In addition, it includes the unconstrained linear genomic index as a particular case. Using two real datasets and simulated data for seven selection cycles, we compared the theoretical results of the CLGSI with the theoretical results of the constrained linear phenotypic selection index (CLPSI). The criteria used to compare CLGSI vs. CLPSI efficiency were the estimated expected genetic gain per trait values, the selection response, and the interval between selection cycles. The results indicated that because the interval between selection cycles is shorter for the CLGSI than for the CLPSI, CLGSI is more efficient than CLPSI per unit of time, but its efficiency could be lower per selection cycle. Thus, CLGSI is a good option for performing genomic selection when there are genotyped candidates for selection.
Subject(s)
Genomics , Genetic Selection , Zea mays/genetics , Computer Simulation , Genetic Crosses , Genetic Databases , Plant Genome , Phenotype , Plant Breeding , Heritable Quantitative Trait
ABSTRACT
The genetic merit of individuals can be estimated using models with dense markers and pedigree information. Early genomic models accounted only for additive effects. However, the prediction of non-additive effects is important for different forest breeding systems where the whole genotypic value can be captured through clonal propagation. In this study, we evaluated the integration of marker data with pedigree information in models that included or ignored non-additive effects. We tested the Reproducing Kernel Hilbert Spaces (RKHS) and BayesA models, with additive and additive-dominance frameworks. Model performance was assessed for the traits tree height (HT), diameter at breast height (DBH) and rust resistance, measured in 923 pine individuals from a structured population of 71 full-sib families. We also simulated a population with similar genetic properties and evaluated the performance of the models for six simulated traits with distinct genetic architectures. Different cross-validation strategies were evaluated, and the highest accuracies were achieved using within-family cross-validation. The inclusion of pedigree information in genomic prediction models did not yield higher accuracies. The different RKHS models resulted in similar prediction accuracies, and RKHS and BayesA generated substantially better predictions than pedigree-only models. The additive BayesA model resulted in higher accuracies than RKHS for rust incidence and for simulated additive-oligogenic traits. For DBH, HT and additive-dominance polygenic traits, the RKHS-based models showed slightly higher accuracies than BayesA. Our results indicate that BayesA performs best for traits with few genes with major effects, while RKHS-based models can best predict genotypic effects for clonal selection of complex traits.
Subject(s)
Genetic Markers , Genome , Genomics , Genetic Models , Pedigree , Algorithms , Breeding , Population Genetics , Genomics/methods , Genotype , Phenotype , Plant Breeding , Reproducibility of Results
ABSTRACT
Genomic selection is an efficient approach to shorten breeding cycles in recurrent selection programs and to achieve greater genetic gains through the selection of superior individuals. Despite advances in genotyping techniques, genetic studies for polyploid species have been limited to a rough approximation of studies in diploid species. The major challenge is to distinguish the different types of heterozygotes present in polyploid populations. In this work, we evaluated different genomic prediction models applied to a recurrent selection population of 530 genotypes of Panicum maximum, an autotetraploid forage grass. We also investigated the effect of allele dosage on prediction, i.e., considering tetraploid (GS-TD) or diploid (GS-DD) allele dosage. A longitudinal linear mixed model was fitted for each of the six phenotypic traits, considering different covariance matrices for genetic and residual effects. A total of 41,424 genotyping-by-sequencing markers were obtained using 96-plex and Pst1 restriction enzyme, and quantitative genotype calling was performed. Six predictive models were generalized to tetraploid species, and predictive ability was estimated by a replicated fivefold cross-validation process. GS-TD and GS-DD models were fitted using 1,223 informative markers. Overall, GS-TD data yielded higher predictive abilities than GS-DD data. However, the different predictive models had similar predictive abilities. In this work, we provide bioinformatic and modeling guidelines for considering tetraploid dosage, and we observed that genomic selection may lead to additional gains in recurrent selection programs of P. maximum.
Subject(s)
Alleles , Gene Dosage , Plant Genome , Genomics , Panicum/genetics , Algorithms , Genomics/methods , Phenotype , Plant Breeding , Polyploidy , Genetic Selection
ABSTRACT
Hyperspectral reflectance phenotyping and genomic selection are two emerging technologies that have the potential to increase plant breeding efficiency by improving prediction accuracy for grain yield. Hyperspectral cameras quantify canopy reflectance across a wide range of wavelengths that are associated with numerous biophysical and biochemical processes in plants. Genomic selection models utilize genome-wide marker or pedigree information to predict the genetic values of breeding lines. In this study, we propose a multi-kernel GBLUP approach to genomic selection that uses genomic marker-, pedigree-, and hyperspectral reflectance-derived relationship matrices to model the genetic main effects and genotype × environment (G × E) interactions across environments within a bread wheat (Triticum aestivum L.) breeding program. We utilized an airplane equipped with a hyperspectral camera to phenotype five differentially managed treatments of the yield trials conducted by the Bread Wheat Improvement Program of the International Maize and Wheat Improvement Center (CIMMYT) at Ciudad Obregón, México over four breeding cycles. We observed that single-kernel models using hyperspectral reflectance-derived relationship matrices performed similarly to or better than marker- and pedigree-based genomic selection models when predicting within and across environments. Multi-kernel models combining marker/pedigree information with hyperspectral reflectance phenotypes had the highest prediction accuracies; however, improvements in accuracy over marker- and pedigree-based models were marginal when correcting for days to heading. Our results demonstrate the potential of using hyperspectral imaging to predict grain yield within a multi-environment context and also support further studies on the integration of hyperspectral reflectance phenotyping into breeding programs.
Subject(s)
Plant Breeding/methods , Triticum/genetics , Gene-Environment Interaction , Genetic Markers , Plant Genome , Genotype , Mexico , Phenotype , Genetic Selection , Triticum/growth & development
ABSTRACT
One of the major issues in plant breeding is the occurrence of genotype × environment (GE) interaction. Several models have been created to understand and exploit this phenomenon. In the genomic era, several models have been employed to improve selection by using markers and accounting for GE interaction simultaneously. Some of these models use special genetic covariance matrices. In addition, the scale of multi-environment trials is getting larger, which increases the computational challenges. In this context, we propose an R package that, in general, allows building GE genomic covariance matrices and fitting linear mixed models, in particular a few genomic GE models. Here we propose two functions: one to prepare the genomic kernels accounting for the genomic GE and another to perform genomic prediction using a Bayesian linear mixed model. A specific treatment is given to sparse covariance matrices, in particular to the block diagonal matrices that are present in some GE models, in order to decrease the computational demand. In empirical comparisons with Bayesian Generalized Linear Regression (BGLR), accuracies and the mean squared error were similar; however, the computational time was up to five times lower than when using the classic approach. Bayesian Genomic Genotype × Environment Interaction (BGGE) is a fast, efficient option for creating genomic GE kernels and making genomic predictions.
Subject(s)
Gene-Environment Interaction , Genotype , Genetic Models , Bayes Theorem , Predictive Value of Tests
ABSTRACT
Infection by Piscirickettsia salmonis is one of the main infectious diseases affecting coho salmon (Oncorhynchus kisutch) farming, and current treatments have been ineffective for the control of this disease. Genetic improvement for P. salmonis resistance has been proposed as a feasible alternative for the control of this infectious disease in farmed fish. Genotyping by sequencing (GBS) strategies allow genotyping of hundreds of individuals with thousands of single nucleotide polymorphisms (SNPs), which can be used to perform genome-wide association studies (GWAS) and predict genetic values using genome-wide information. We used double-digest restriction-site associated DNA (ddRAD) sequencing to dissect the genetic architecture of resistance against P. salmonis in a farmed coho salmon population and to identify molecular markers associated with the trait. We also evaluated genomic selection (GS) models in order to determine the potential to accelerate the genetic improvement of this trait by using genome-wide molecular information. A total of 764 individuals from 33 full-sib families (17 highly resistant and 16 highly susceptible) were experimentally challenged with P. salmonis and their genotypes were assayed using ddRAD sequencing. A total of 9,389 SNP markers were identified in the population. These markers were used to test genomic selection models and compare different GWAS methodologies for resistance measured as day of death (DD) and binary survival (BIN). Genomic selection models showed higher accuracies than the traditional pedigree-based best linear unbiased prediction (PBLUP) method for both DD and BIN. The models showed an improvement of up to 95% and 155%, respectively, over PBLUP. One SNP related to B-cell development was identified as a potential functional candidate associated with resistance to P. salmonis defined as DD.
Subject(s)
DNA/genetics , Disease Resistance/genetics , Genome-Wide Association Study , Genomics , Oncorhynchus kisutch/genetics , Oncorhynchus kisutch/microbiology , Piscirickettsia/physiology , Restriction Mapping/methods , Animals , Breeding , Female , Fish Diseases/genetics , Fish Diseases/microbiology , Genetic Markers , Kaplan-Meier Estimate , Male , Pedigree
ABSTRACT
Salmonid rickettsial syndrome (SRS), caused by the intracellular bacterium Piscirickettsia salmonis, is one of the main diseases affecting rainbow trout (Oncorhynchus mykiss) farming. To accelerate genetic progress, genomic selection methods can be used as an effective approach to control the disease. The aims of this study were: (i) to compare the accuracy of estimated breeding values using pedigree-based best linear unbiased prediction (PBLUP) with genomic BLUP (GBLUP), single-step GBLUP (ssGBLUP), Bayes C, and Bayesian Lasso (LASSO); and (ii) to test the accuracy of genomic prediction and PBLUP using different marker densities (0.5, 3, 10, 20, and 27 K) for resistance against P. salmonis in rainbow trout. Phenotypes were recorded as number of days to death (DD) and binary survival (BS) from 2416 fish challenged with P. salmonis. A total of 1934 fish were genotyped using a 57 K single-nucleotide polymorphism (SNP) array. All genomic prediction methods achieved higher accuracies than PBLUP. The relative increase in accuracy for different genomic models ranged from 28 to 41% for both DD and BS at 27 K SNP. Among the genomic models, the highest relative increase in accuracy was obtained with Bayes C (∼40%), where 3 K SNP was enough to achieve an accuracy similar to that of the 27 K SNP for both traits. For resistance against P. salmonis in rainbow trout, we showed that genomic predictions using GBLUP, ssGBLUP, Bayes C, and LASSO can increase accuracy compared with PBLUP. Moreover, it is possible to use relatively low-density SNP panels for genomic prediction without compromising prediction accuracy for resistance against P. salmonis in rainbow trout.
Subject(s)
Disease Resistance/genetics , Fish Diseases/genetics , Genomics/methods , Oncorhynchus mykiss/genetics , Piscirickettsiaceae Infections/genetics , Animals , Bayes Theorem , Fish Diseases/microbiology , Genome-Wide Association Study , Genotype , Oncorhynchus mykiss/microbiology , Phenotype , Piscirickettsia/physiology , Piscirickettsiaceae Infections/microbiology , Single Nucleotide Polymorphism
ABSTRACT
Nelore is the most economically important cattle breed in Brazil, and the use of genetically improved animals has contributed to increased beef production efficiency. The Brazilian beef feedlot industry has grown considerably in the last decade, so the selection of animals with higher growth rates on feedlot has become quite important. Genomic selection (GS) could be used to reduce generation intervals and improve the rate of genetic gains. The aim of this study was to evaluate the prediction of genomic-estimated breeding values (GEBV) for average daily weight gain (ADG) in 718 feedlot-finished Nelore steers. Analyses of three Bayesian model specifications [Bayesian GBLUP (BGBLUP), BayesA, and BayesCπ] were performed with four genotype panels [Illumina BovineHD BeadChip, TagSNPs, and GeneSeek High- and Low-density indicus (HDi and LDi, respectively)]. Estimates of Pearson correlations, regression coefficients, and mean squared errors were used to assess accuracy and bias of predictions. Overall, the BayesCπ model resulted in less biased predictions. Accuracies ranged from 0.18 to 0.27, which are reasonable values given the heritability estimates (from 0.40 to 0.44) and sample size (568 animals in the training population). Furthermore, results from Bos taurus indicus panels were as informative as those from Illumina BovineHD, indicating that they could be used to implement GS at lower costs.
Subject(s)
Breeding , Genome-Wide Association Study , Genome , Genomics/methods , Weight Gain/genetics , Animals , Brazil , Cattle , Genotype , Genetic Models , Phenotype , Reproducibility of Results
ABSTRACT
Developing genomic selection (GS) models is an important step in applying GS to accelerate the rate of genetic gain in grain yield in plant breeding. In this study, seven genomic prediction models under two cross-validation (CV) scenarios were tested on 287 advanced elite spring wheat lines phenotyped for grain yield (GY), thousand-grain weight (GW), grain number (GN), and thermal time for flowering (TTF) in 18 international environments (year-location combinations) in major wheat-producing countries in 2010 and 2011. Prediction models with genomic and pedigree information included main effects and interaction with environments. Two random CV schemes were applied to predict a subset of lines that were not observed in any of the 18 environments (CV1), and a subset of lines that were not observed in a set of the environments, but were observed in other environments (CV2). Genomic prediction models, including genotype × environment (G×E) interaction, had the highest average prediction ability under the CV1 scenario for GY (0.31), GN (0.32), GW (0.45), and TTF (0.27). For CV2, the average prediction ability of the model including the interaction terms was generally high for GY (0.38), GN (0.43), GW (0.63), and TTF (0.53). Wheat lines in site-year combinations in Mexico and India had relatively high prediction ability for GY and GW. Results indicated that prediction ability of lines not observed in certain environments could be relatively high for genomic selection when predicting G×E interaction in multi-environment trials.
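The two cross-validation schemes described in the abstract above differ only in what is held out together. The following minimal sketch shows one way the fold assignment could be done; the function names, the number of folds and the rule used to spread each line across folds in CV2 are assumptions made for illustration and are not taken from the study.

```python
import random

def cv1_folds(lines, environments, n_folds=5, seed=0):
    """CV1: a held-out line is masked in every environment (new, untested lines)."""
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    fold_of_line = {line: i % n_folds for i, line in enumerate(shuffled)}
    return {(line, env): fold_of_line[line]
            for line in lines for env in environments}

def cv2_folds(lines, environments, n_folds=5, seed=0):
    """CV2: a line is masked in some environments but observed in others.

    Assumption: each line's environments are shuffled and then assigned to
    consecutive folds from a random start, so that with at least two
    environments and two folds every line has records in more than one fold.
    """
    rng = random.Random(seed)
    folds = {}
    for line in lines:
        envs = list(environments)
        rng.shuffle(envs)
        start = rng.randrange(n_folds)
        for k, env in enumerate(envs):
            folds[(line, env)] = (start + k) % n_folds
    return folds

# Hypothetical toy layout: 10 lines evaluated in 3 environments.
lines = [f"line{i}" for i in range(10)]
environments = ["env1", "env2", "env3"]
cv1 = cv1_folds(lines, environments)
cv2 = cv2_folds(lines, environments)
```

For a given fold, the records assigned to it are masked and predicted from the rest. Under CV1 the model has no phenotypes at all for the test lines, whereas under CV2 it can borrow information from the same line in other environments, which is consistent with the higher prediction abilities reported for CV2.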
Subject(s)
Gene-Environment Interaction , Genomics , Genetic Selection , Triticum/genetics , Northern Africa , Asia , Breeding , Plant Genome , Genotype , Mexico , Pedigree , Phenotype , Heritable Quantitative Trait , Seasons , Triticum/growth & development
ABSTRACT
This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials.
Subject(s)
Plant Genome , Statistical Models , Heritable Quantitative Trait , Triticum/genetics , Physiological Adaptation/genetics , Droughts , Gene-Environment Interaction , Genotype , Hot Temperature , Iran , Mexico , Genetic Models , Phenotype , Genetic Selection , Physiological Stress , Triticum/classification
ABSTRACT
In crop breeding, interest in predicting the performance of candidate cultivars in the field has increased due to recent advances in molecular breeding technologies. However, the complexity of the wheat genome presents some challenges for applying new technologies in molecular marker identification with next-generation sequencing. We applied genotyping-by-sequencing, a recently developed method to identify single-nucleotide polymorphisms, in the genomes of 384 wheat (Triticum aestivum) genotypes that were field tested under three different water regimes in Mediterranean climatic conditions: rain-fed only, mild water stress, and fully irrigated. We identified 102,324 single-nucleotide polymorphisms in these genotypes, and the phenotypic data were used to train and test genomic selection models intended to predict yield, thousand-kernel weight, number of kernels per spike, and heading date. Phenotypic data showed marked spatial variation. Therefore, different models were tested to correct the trends observed in the field. A mixed model using moving means as a covariate was found to best fit the data. When we applied the genomic selection models, the accuracy of predicted traits increased with spatial adjustment. Multiple genomic selection models were tested, and a Gaussian kernel model was determined to give the highest accuracy. The best predictions between environments were obtained when data from different years were used to train the model. Our results confirm that genotyping-by-sequencing is an effective tool to obtain genome-wide information for crops with complex genomes, that these data are efficient for predicting traits, and that correction of spatial variation is a crucial ingredient to increase prediction accuracy in genomic selection models.
Subject(s)
Breeding/methods , Genetic Models , Triticum/genetics , Phenotype , Phylogeny , Single Nucleotide Polymorphism , Quantitative Trait Loci , Genetic Selection , Sequence Alignment
ABSTRACT
Genotyping-by-sequencing (GBS) technologies have proven capacity for delivering large numbers of marker genotypes with potentially less ascertainment bias than standard single nucleotide polymorphism (SNP) arrays. Therefore, GBS has become an attractive alternative technology for genomic selection. However, the use of GBS data poses important challenges, and the accuracy of genomic prediction using GBS is currently undergoing investigation in several crops, including maize, wheat, and cassava. The main objective of this study was to evaluate various methods for incorporating GBS information and compare them with pedigree models for predicting genetic values of lines from two maize populations evaluated for different traits measured in different environments (experiments 1 and 2). Given that GBS data come with a large percentage of uncalled genotypes, we evaluated methods using nonimputed, imputed, and GBS-inferred haplotypes of different lengths (short or long). GBS and pedigree data were incorporated into statistical models using either the genomic best linear unbiased predictors (GBLUP) or the reproducing kernel Hilbert spaces (RKHS) regressions, and prediction accuracy was quantified using cross-validation methods. 
The following results were found: relative to pedigree or marker-only models, there were consistent gains in prediction accuracy by combining pedigree and GBS data; there was increased predictive ability when using imputed or nonimputed GBS data over inferred haplotype in experiment 1, or nonimputed GBS and information-based imputed short and long haplotypes, as compared to the other methods in experiment 2; the level of prediction accuracy achieved using GBS data in experiment 2 is comparable to those reported by previous authors who analyzed this data set using SNP arrays; and GBLUP and RKHS models with pedigree with nonimputed and imputed GBS data provided the best prediction correlations for the three traits in experiment 1, whereas for experiment 2 RKHS provided slightly better prediction than GBLUP for drought-stressed environments, and both models provided similar predictions in well-watered environments.