ABSTRACT
Whole-genome epistasis models with interactions between different loci can be approximated by genomic relationship models based on Hadamard powers of the additive genomic relationship. We illustrate that the quality of this approximation decreases as the degree of interaction d increases. Moreover, considering relationship models defined as weighted sums of interactions of different degrees, we investigate how the decreasing quality of approximation of the individual summands affects the approximation of the weighted sum. Our results indicate that these approximations remain reliable, but their quality decreases when the weights of higher-degree interactions do not decrease quickly.
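The approximation can be checked numerically. The sketch below makes several assumptions. Markers are coded 0/1/2 and column-centered (Z). The exact degree-d relationship is defined as the sum, over all sets of d distinct loci, of products of centered genotypes. It is compared with the Hadamard power of G = ZZ' scaled by 1/d!, so that both share the same leading term. The exact matrix equals the elementary symmetric polynomial of the per-locus products z_ik z_jk, computed here with Newton's identities. The Hadamard power, by contrast, also counts products in which a locus is repeated. All function names are illustrative, and the paper's scaling of the relationship matrices may differ.

```python
import math
from itertools import combinations

import numpy as np


def exact_interaction(Z, d):
    """Degree-d relationship over sets of d distinct loci (illustrative name).

    Entry (i, j) is the elementary symmetric polynomial e_d of the per-locus
    products u_k = Z[i, k] * Z[j, k], built from power sums via Newton's identities.
    """
    # p[m][i, j] = sum_k (Z[i, k] * Z[j, k]) ** m, for m = 1..d
    p = [None] + [(Z ** m) @ (Z ** m).T for m in range(1, d + 1)]
    e = [np.ones_like(p[1])]  # e_0 = 1
    for k in range(1, d + 1):
        acc = np.zeros_like(p[1])
        for i in range(1, k + 1):
            acc += (-1) ** (i - 1) * e[k - i] * p[i]
        e.append(acc / k)  # k * e_k = sum_i (-1)^(i-1) * e_(k-i) * p_i
    return e[d]


def hadamard_approx(Z, d):
    """Hadamard (elementwise) power of G = ZZ', scaled by 1/d!."""
    G = Z @ Z.T
    return G ** d / math.factorial(d)


def brute_force(Z, d):
    """Direct sum over all d-subsets of loci; only feasible for tiny data."""
    n, m = Z.shape
    out = np.zeros((n, n))
    for loci in combinations(range(m), d):
        w = np.prod(Z[:, list(loci)], axis=1)  # one interaction covariate
        out += np.outer(w, w)
    return out


rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(20, 300)).astype(float)  # 20 individuals, 300 SNPs (simulated)
Z = M - M.mean(axis=0)                                 # column-centered markers

for d in (1, 2, 3, 4):
    exact = exact_interaction(Z, d)
    approx = hadamard_approx(Z, d)
    rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"d={d}: relative Frobenius error {rel_err:.3f}")
```

The printed relative errors show how far the scaled Hadamard power departs from the distinct-loci sum on these simulated markers; d = 1 is exact by construction. brute_force is included only to verify exact_interaction on a small subset.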
Subjects
Genetic Epistasis, Genetic Models, Genome, Genomics
ABSTRACT
Genome-environment associations (GEA), or environmental genome-wide association scans (EnvGWAS), have rarely been applied to study the genomics of adaptive traits in bread wheat landraces (Triticum aestivum L.). We analyzed 990 landraces and seven climatic variables (mean temperature, maximum temperature, precipitation, precipitation seasonality, heat index of mean temperature, heat index of maximum temperature, and drought index) in a GEA using the FarmCPU method implemented in GAPIT. Historical temperature and precipitation values were obtained as monthly averages from 1970 to 2000. Based on 26,064 high-quality SNP loci, the landraces were classified into ten subpopulations exhibiting high genetic differentiation. The GEA identified 59 SNPs and nearly 89 protein-coding genes involved in responses to abiotic stress. These genes are mainly related to biosynthesis and signaling pathways mediated by auxins, abscisic acid (ABA), ethylene (ET), salicylic acid (SA), and jasmonates (JA), which are known to act together in modulating plant responses to heat and drought stress. In addition, we identified proteins associated with the response and tolerance to high-temperature and water-deficit stress, and with cell wall functions. The results provide candidate regions for selection aimed at improving drought and heat tolerance in bread wheat and offer insights into the genetic mechanisms involved in adaptation to extreme environments.
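A GEA treats a climatic variable at each landrace's collection site as the response and tests each SNP against it. That basic logic can be sketched as a marker-by-marker simple regression. This is deliberately not FarmCPU. FarmCPU additionally selects pseudo-QTNs and alternates between fixed- and random-effect models to control population structure and kinship, and none of that is done here. The data, the causal locus, and the function name gea_scan are hypothetical.

```python
import numpy as np


def gea_scan(genotypes, env):
    """t-statistic of a simple regression of env on each SNP (illustrative).

    genotypes: n x m dosages (0/1/2); env: length-n climatic variable.
    Monomorphic SNPs (zero variance) must be removed beforehand.
    """
    n = genotypes.shape[0]
    X = genotypes - genotypes.mean(axis=0)
    y = env - env.mean()
    sxx = (X ** 2).sum(axis=0)
    beta = X.T @ y / sxx                      # per-SNP slope
    rss = (y ** 2).sum() - beta ** 2 * sxx    # residual sum of squares per SNP
    se = np.sqrt(rss / (n - 2) / sxx)         # standard error of each slope
    return beta / se


rng = np.random.default_rng(42)
geno = rng.binomial(2, 0.5, size=(300, 50)).astype(float)  # hypothetical landraces x SNPs
causal = 17                                                  # hypothetical adaptive SNP
temperature = 2.0 * geno[:, causal] + rng.normal(size=300)   # simulated climatic variable
t = gea_scan(geno, temperature)
print(int(np.argmax(np.abs(t))))  # index of the strongest association
```

Without a structure correction, a scan like this would be expected to yield spurious associations in strongly differentiated material, such as the ten subpopulations described above. That is the motivation for mixed or iterative methods such as FarmCPU.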
ABSTRACT
When genotype × environment interactions (G × E) are included in genomic prediction models, Hadamard or Kronecker products have been used to model the covariance structure of the interactions. The relation between these two types of modeling has not been made clear in the genomic prediction literature. Here, we demonstrate that a certain model based on a Hadamard formulation and another using the Kronecker product lead to exactly the same statistical model. Moreover, we illustrate how multiplying entries of covariance matrices is related to modeling locus × environmental-variable interactions explicitly. Finally, we use a wheat and a maize data set to illustrate that the environmental covariance E can be specified easily, even if no information on environmental variables such as temperature or precipitation is available. Provided that lines have been tested in different environments, the environmental covariance can simply be estimated from the training set as the phenotypic covariance between environments. To achieve a substantial increase in predictive ability, the environmental covariance has to be defined appropriately, and records on the performance of the test-set lines under different environmental conditions have to be included in the training set.
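The equivalence can be checked on a small, unbalanced example. Assume observations are indexed by (line, environment) and the G × E effects are ordered environment-major. Under those assumptions, the Hadamard product of the line and environment covariances, expanded to observation level, equals the Kronecker covariance E ⊗ G mapped through a cell incidence matrix. G, E, and the phenotypes are random placeholders. The last lines show the phenotypic-covariance estimate of E described above. With numpy.cov, that estimate requires complete lines × environments data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_lines, n_env = 4, 3

# Placeholder covariances: G among lines, E among environments
A = rng.normal(size=(n_lines, n_lines))
G = A @ A.T
B = rng.normal(size=(n_env, n_env))
E = B @ B.T

# Unbalanced records: (line, environment) of each observation (hypothetical)
obs = np.array([(0, 0), (1, 0), (2, 1), (0, 1), (3, 2), (1, 2), (2, 2)])
line, env = obs[:, 0], obs[:, 1]

# Hadamard formulation: expand G and E to observations, multiply entrywise
ZG = np.eye(n_lines)[line]
ZE = np.eye(n_env)[env]
K_hadamard = (ZG @ G @ ZG.T) * (ZE @ E @ ZE.T)

# Kronecker formulation: one G x E effect per cell (e, g), placed at
# column e * n_lines + g so that np.kron(E, G) holds E[e, e'] * G[g, g']
Z = np.eye(n_env * n_lines)[env * n_lines + line]
K_kronecker = Z @ np.kron(E, G) @ Z.T

print(np.allclose(K_hadamard, K_kronecker))  # True

# Estimating E as the phenotypic covariance between environments
Y_train = rng.normal(size=(30, n_env))   # placeholder phenotypes: lines x environments
E_hat = np.cov(Y_train, rowvar=False)    # n_env x n_env covariance between columns
```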
Subjects
Gene-Environment Interaction, Genetic Models, Genomics, Genotype, Triticum/genetics
ABSTRACT
Previous studies have evaluated the theory of the unconstrained optimum and decorrelated multistage linear phenotypic selection indices (OMLPSI and DMLPSI, respectively). We extended this index theory to the constrained multistage linear phenotypic selection index context, where we denote the constrained versions of OMLPSI and DMLPSI as OCMLPSI and DCMLPSI, respectively. The OCMLPSI (DCMLPSI) is the most general multistage index and includes the OMLPSI (DMLPSI) as a particular case. The OCMLPSI (DCMLPSI) predicts the individual net genetic merit at different individual ages. It also allows constraints to be imposed on the genetic gains, so that some traits change their mean values according to predetermined levels while the rest remain unrestricted. The OCMLPSI takes into account the index correlations among stages, whereas the DCMLPSI imposes the restriction that the index correlations among stages be null. Two criteria were used to compare OCMLPSI and DCMLPSI efficiency. First, the total response of each index must be lower than or equal to the single-stage constrained linear phenotypic selection index response. Second, the expected genetic gains per trait should be similar to the constraints imposed by the breeder. We used one real and one simulated dataset to validate the efficiency of the indices. The results indicated that the OCMLPSI predicted the selection response and the expected genetic gain per trait more accurately than the DCMLPSI. Thus, breeders should use the OCMLPSI when making phenotypic selection.
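For reference, the sketch below computes the classical null-restricted linear phenotypic selection index (Kempthorne and Nordskog type) for a single stage. It illustrates one kind of single-stage constrained index: the unrestricted index b = P⁻¹Gw is projected so that the expected gains of the restricted traits are zero. It does not implement the multistage OCMLPSI or DCMLPSI, nor predetermined non-zero gains. P, G, w, and the restriction matrix U are hypothetical values.

```python
import numpy as np


def restricted_index(P, G, w, U):
    """Single-stage null-restricted index coefficients (illustrative).

    P: phenotypic covariance, G: genetic covariance, w: economic weights,
    U: columns select the traits whose expected gain is constrained to zero.
    """
    b = np.linalg.solve(P, G @ w)                # unrestricted index P^-1 G w
    PinvG = np.linalg.solve(P, G)                # P^-1 G
    M = U.T @ G @ PinvG @ U                      # U' G P^-1 G U
    Q = PinvG @ U @ np.linalg.solve(M, U.T @ G)  # P^-1 G U M^-1 U' G
    return b - Q @ b                             # (I - Q) b, so U' G b_r = 0


def expected_gains(P, G, b, k=1.755):
    """Expected gain per trait, k * G b / sd(index); k = 1.755 for 10% selected."""
    return k * (G @ b) / np.sqrt(b @ P @ b)


rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
G = A @ A.T + np.eye(3)          # hypothetical genetic covariance
R = np.diag([2.0, 1.5, 1.0])     # hypothetical residual covariance
P = G + R                        # phenotypic covariance
w = np.array([1.0, 1.0, 1.0])    # hypothetical economic weights
U = np.eye(3)[:, [2]]            # constrain trait 3 to zero gain

b_r = restricted_index(P, G, w, U)
gains = expected_gains(P, G, b_r)
print(np.round(gains, 4))
```

Because U'Gb is forced to zero, the third expected gain is zero up to rounding error. The intensity k = 1.755 corresponds to selecting the top 10% under normality.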
ABSTRACT
DNA is the fundamental basis of genetic information, just as bits are for computers. Whenever computers are used to represent genetic data, the computational encoding must be efficient enough to represent the processes driving inheritance and variability. This is especially important in simulations, given the increasing complexity and dimensionality brought by genomics. This paper introduces a new binary representation of genetic information. Algorithms based on bitwise operations that mimic the inheritance of a wide range of polymorphisms are also presented. Different kinds and mixtures of polymorphisms are discussed and exemplified. The proposed algorithms and data structures were implemented in the C++ programming language and are available to end users in the R package "isqg", which is available from the Comprehensive R Archive Network (CRAN). Supplementary data are available online.
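The isqg data structures are not reproduced here, but the general idea of bitwise inheritance can be sketched with Python integers as bit vectors. Each haplotype stores one bit per biallelic locus. A crossover mask marks which loci a gamete inherits from each parental haplotype, and a single AND/OR/NOT expression builds the gamete. The function names and the uniform placement of crossovers are hypothetical simplifications, not the package's algorithms.

```python
import random


def crossover_mask(n_loci, positions):
    """Bit i is 1 where the gamete copies the first haplotype (hypothetical helper).

    A crossover at position c switches the parental strand between loci c-1 and c.
    """
    mask, from_first, start = 0, True, 0
    for pos in sorted(positions) + [n_loci]:
        if from_first:
            mask |= ((1 << (pos - start)) - 1) << start  # set bits start..pos-1
        from_first = not from_first
        start = pos
    return mask


def recombine(h1, h2, mask):
    """Gamete = bits of h1 where mask is 1, bits of h2 elsewhere."""
    return (h1 & mask) | (h2 & ~mask)


def gamete(h1, h2, n_loci, n_crossovers, rng):
    """Hypothetical meiosis: distinct random crossover positions, random starting strand."""
    positions = rng.sample(range(1, n_loci), n_crossovers)
    if rng.random() < 0.5:
        h1, h2 = h2, h1
    return recombine(h1, h2, crossover_mask(n_loci, positions))


mask = crossover_mask(8, [3, 6])
print(f"{recombine(0b11111111, 0b00000000, mask):08b}")  # 11000111
```

Bit 0 is locus 0, so the printed string shows the last locus on the left. Multi-allelic loci would need several bits per locus.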
Subjects
Computational Biology/methods, Genomics/methods, Software, Algorithms
ABSTRACT
Evidence continues to grow that genomic selection (GS) is a technology revolutionizing plant breeding. However, it is well documented that its success depends strongly on the statistical models GS uses to predict candidate genotypes that have not been phenotyped. No model is universally best for prediction, and models are needed for each type of response variable (continuous, binary, ordinal, count, etc.). An active area of research therefore aims to develop statistical models for predicting univariate and multivariate traits in GS. However, most of the models developed so far are for univariate, continuous (Gaussian) traits. To address the lack of multivariate statistical models for genome-based prediction, we present an improved version of the Bayesian multi-trait and multi-environment (BMTME) R package for analyzing breeding data with multiple traits and multiple environments. We also introduce Bayesian multi-output regressor stacking (BMORS) functions that are considerably more efficient in terms of computational resources. The package allows parameter estimation and evaluates the prediction performance of multi-trait and multi-environment data in a reliable, efficient, and user-friendly way. We illustrate the use of BMTME with small real (toy) datasets to show all the facilities the software offers the user. For large datasets, however, the BME() and BMTME() functions of the BMTME R package are very computationally intensive. By contrast, the BMORS functions BMORS() and BMORS_Env(), also included in the package, require much less computing time.
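Regressor stacking, the idea behind BMORS, fits each trait separately in a first stage. It then regresses every trait on the first-stage predictions of all traits, so that information from correlated traits enters the second stage. The sketch below makes two simplifications. It uses closed-form ridge regression as a non-Bayesian stand-in for the Bayesian regressions in the package, and it uses in-sample first-stage predictions for brevity. It does not reproduce the arguments or behavior of BMORS() or BMORS_Env(), and all names and data are hypothetical.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form ridge coefficients; no intercept, so inputs are centered first."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def stack_fit(X, Y, lam1=50.0, lam2=1.0):
    """Two-stage multi-output regressor stacking (hypothetical helper)."""
    n_traits = Y.shape[1]
    # Stage 1: one marker-based ridge model per trait
    B1 = np.column_stack([ridge(X, Y[:, t], lam1) for t in range(n_traits)])
    Y1 = X @ B1  # in-sample stage-1 predictions for all traits
    # Stage 2: each trait regressed on the stage-1 predictions of all traits
    B2 = np.column_stack([ridge(Y1, Y[:, t], lam2) for t in range(n_traits)])
    return B1, B2


def stack_predict(X_new, B1, B2):
    return (X_new @ B1) @ B2


rng = np.random.default_rng(5)
X = rng.normal(size=(120, 200))           # hypothetical centered markers
effects = rng.normal(scale=0.1, size=200)
g = X @ effects
Y = np.column_stack([g, 0.8 * g]) + rng.normal(size=(120, 2))  # two correlated traits

train, test = slice(0, 100), slice(100, 120)
mu = Y[train].mean(axis=0)
B1, B2 = stack_fit(X[train], Y[train] - mu)
Y_pred = stack_predict(X[test], B1, B2) + mu
print(Y_pred.shape)  # (20, 2)
```

Stacking implementations often use cross-validated first-stage predictions instead, because in-sample predictions from an overparameterized first stage can nearly reproduce the training phenotypes.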
Subjects
Bayes Theorem, Computational Biology/methods, Gene-Environment Interaction, Genomics/methods, Heritable Quantitative Trait, Software, Algorithms, Statistical Models, Zea mays/genetics
ABSTRACT
Plant and animal breeders are interested in selecting the best individuals from a candidate set for the next breeding cycle. In this paper, we propose a formal method within the Bayesian decision theory framework to tackle the selection problem based on genomic selection (GS) in single- and multi-trait settings. We proposed and tested three univariate loss functions (Kullback-Leibler, KL; Continuous Ranked Probability Score, CRPS; Linear-Linear loss, LinLin) and their corresponding multivariate generalizations (Kullback-Leibler, KL; Energy Score, EnergyS; and the Multivariate Asymmetric Loss Function, MALF). We derived and expressed all the loss functions in terms of heritability and tested them on a real wheat dataset for one cycle of selection and in a simulated selection program. The performance of each univariate loss function was compared with the standard method of selection (Std), which does not use loss functions. We compared performance in terms of the selection response and the decrease in the population's genetic variance over recurrent breeding cycles. The results suggest that, in the single-trait scheme, better performance can be obtained in a long-term breeding program by selecting the best 30% of individuals in each cycle, but not by selecting the best 10%. For the multi-trait approach, the population means of all traits under consideration showed positive gains, even though two of the traits were negatively correlated. The corresponding population variances did not differ significantly among the loss functions in the 10th selection cycle. Loss functions should be a useful criterion for choosing the candidates to advance to the next breeding cycle.
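Of the univariate losses, the CRPS has a well-known closed form when the predictive distribution is normal, as it is for a Gaussian genomic prediction model. The sketch below computes that closed form and cross-checks it against the sample (energy) form E|X − y| − ½E|X − X'|. It does not reproduce how the paper uses the loss to rank candidates, nor the loss's expression in terms of heritability. The function names are illustrative.

```python
import math

import numpy as np


def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) predictive distribution at y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def crps_sample(samples, y):
    """Sample estimate of E|X - y| - 0.5 E|X - X'| using two independent halves."""
    x = np.asarray(samples, dtype=float)
    half = len(x) // 2
    return np.mean(np.abs(x - y)) - 0.5 * np.mean(np.abs(x[:half] - x[half:2 * half]))


print(round(crps_normal(0.0, 1.0, 0.0), 4))  # 0.2337
draws = np.random.default_rng(0).normal(2.0, 1.5, size=400_000)
print(crps_normal(2.0, 1.5, 3.0), crps_sample(draws, 3.0))  # closed form vs. sample estimate
```

Lower CRPS means the predictive distribution is more concentrated around the observed or target value.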
Subjects
Genome, Genetic Models, Heritable Quantitative Trait, Genetic Selection, Bayes Theorem
ABSTRACT
Plant scientists often wish to make genomic-enabled predictions of multiple traits measured in multiple individuals in multiple environments. The most common strategy is to analyze one trait at a time while accounting for genotype × environment interaction (G × E), because comprehensive models that simultaneously account for correlated count traits and G × E are lacking. For this reason, in this study we propose a multiple-trait and multiple-environment model for count data. The model was developed under the Bayesian paradigm, for which we developed a Markov chain Monte Carlo (MCMC) sampler with noninformative priors. This allowed us to obtain all the required full conditional distributions of the parameters, leading to an exact Gibbs sampler for the posterior distribution. The model was tested with simulated data and a real data set. The results show that the proposed multi-trait, multi-environment model is an attractive alternative for modeling multiple count traits measured in multiple environments.
Subjects
Gene-Environment Interaction, Genetic Models, Plant Breeding/methods, Heritable Quantitative Trait, Bayes Theorem, Poisson Distribution, Triticum/genetics
ABSTRACT
When information on multiple genotypes evaluated in multiple environments is recorded, a multi-environment single-trait model for assessing genotype × environment interaction (G × E) is usually employed. Comprehensive models that simultaneously take into account correlated traits and trait × genotype × environment interaction (T × G × E) are lacking. In this research, we propose a Bayesian model for analyzing multiple traits and multiple environments for whole-genome prediction (WGP). For this model, we used Half-[Formula: see text] priors on each standard deviation term and uniform priors on each correlation of the covariance matrix. These priors were noninformative and led to posterior inferences that were insensitive to the choice of hyperparameters. We also developed a computationally efficient Markov chain Monte Carlo (MCMC) sampler under these priors. This allowed us to obtain all the required full conditional distributions of the parameters, leading to an exact Gibbs sampler for the posterior distribution. We used two real data sets to implement and evaluate the proposed Bayesian method. When the correlation between traits was high (>0.5), the proposed model (with an unstructured variance-covariance matrix) improved prediction accuracy compared with models with diagonal and standard variance-covariance structures. The R package Bayesian Multi-Trait and Multi-Environment (BMTME) offers optimized C++ routines to perform the analyses efficiently.
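The hierarchical inverse-Wishart prior of Huang and Wand (2013) gives Half-t standard deviations together with marginally uniform correlations. It is defined by a_k ~ IG(1/2, 1/A_k²) and Σ | a ~ IW(ν + p − 1, 2ν diag(1/a)), and ν = 2 yields Uniform(−1, 1) marginals for every correlation. Whether the model above uses exactly this construction is an assumption, since the prior's name is garbled in the abstract. The sketch draws from this prior with NumPy and checks the correlation marginal against the mean 0 and variance 1/3 of Uniform(−1, 1). The function name and settings are hypothetical.

```python
import numpy as np


def sample_hw_prior(p, A, nu=2.0, n_draws=20_000, seed=0):
    """Draws of Sigma from a_k ~ IG(1/2, 1/A_k^2), Sigma | a ~ IW(nu + p - 1, 2 nu diag(1/a)).

    Hypothetical helper; assumes nu + p - 1 is an integer so the Wishart
    can be sampled as a sum of outer products of normal vectors.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    # a_k = 1 / Gamma(shape=1/2, rate=1/A_k^2); NumPy's gamma takes scale = 1/rate
    a = 1.0 / rng.gamma(0.5, A ** 2, size=(n_draws, p))
    df = int(nu + p - 1)
    # Sigma^{-1} ~ Wishart(df, diag(a) / (2 nu)): sum of df outer products
    x = rng.normal(size=(n_draws, df, p)) * np.sqrt(a / (2.0 * nu))[:, None, :]
    W = np.einsum("ndi,ndj->nij", x, x)
    return np.linalg.inv(W)  # batched inverse -> Sigma draws


Sigma = sample_hw_prior(p=3, A=[1.0, 1.0, 1.0])
sd = np.sqrt(np.diagonal(Sigma, axis1=1, axis2=2))
rho12 = Sigma[:, 0, 1] / (sd[:, 0] * sd[:, 1])
print(rho12.mean(), rho12.var())  # near 0 and 1/3 for a Uniform(-1, 1) marginal
```

Conditionally on a, Σ keeps an inverse-Wishart form, which is what makes conjugate Gibbs updates possible in models of this kind.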