*Plant Genome ; 13(3): e20034, 2020 Nov.*

##### Abstract

Wheat quality improvement is an important objective in all wheat breeding programs. However, because of the cost, time and quantity of seed required, wheat quality is typically analyzed only in the last stages of the breeding cycle and on a limited number of samples. Genomic prediction could greatly help to select for wheat quality more efficiently by reducing the cost and time required for this analysis. Here, we evaluated the prediction performance for 13 wheat quality traits under two multi-trait models (Bayesian multi-trait multi-environment [BMTME] and multi-trait ridge regression [MTR]) using five data sets of wheat lines evaluated in the field during two consecutive years. Lines in the second year (testing) were predicted using the quality information obtained in the first year (training). For most quality traits, we found moderate to high prediction accuracies, suggesting that genomic selection could be feasible. The BMTME model gave the best predictions for all traits, and the MTR model gave the worst. Under the mean arctangent absolute percentage error (MAAPE), the best BMTME predictions were for test weight across the five data sets, whereas the worst were for the alveograph trait ALVPL. In contrast, under Pearson's correlation, the best predictions depended on the data set. These results suggest that the BMTME model should be preferred for multi-trait prediction analyses. This model estimates not only the correlation among traits but also the correlation among environments, which helps increase prediction accuracy.
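MAAPE averages arctan(|(y - y_hat) / y|) over the lines in the testing set, so each error is bounded between 0 and pi/2 and lower values indicate better predictions. A minimal Python sketch, assuming that definition; the function name `maape` and the example values are hypothetical, not data from the study:

```python
import math


def maape(observed, predicted):
    """Mean arctangent absolute percentage error (lower is better).

    Each term is arctan(|(y - y_hat) / y|), which stays in [0, pi/2]
    even when an observed value y is zero.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(observed, predicted):
        if y == 0:
            # The ratio diverges; its arctangent tends to pi/2 (0 if both are 0).
            total += 0.0 if y_hat == 0 else math.pi / 2
        else:
            total += math.atan(abs((y - y_hat) / y))
    return total / len(observed)


# Hypothetical observed and predicted test-weight values for three testing lines.
obs = [78.0, 80.5, 76.2]
pred = [77.1, 81.0, 75.0]
print(round(maape(obs, pred), 4))
```

Because arctan saturates, a single huge relative error cannot dominate the average the way it can under the ordinary mean absolute percentage error.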

*G3 (Bethesda) ; 10(11): 4083-4102, 2020 Nov 05.*

##### Abstract

Due to the ever-increasing amount of data collected in genomic breeding programs, there is a need for genomic prediction models that can better handle big data. For this reason, we propose a Maximum a posteriori Threshold Genomic Prediction (MAPT) model for ordinal traits that is more efficient than the conventional Bayesian Threshold Genomic Prediction model for ordinal traits. MAPT performs the predictions of the Threshold Genomic Prediction model using the maximum a posteriori estimates of the parameters, that is, the parameter values that maximize the joint posterior density. We compared the prediction performance of the proposed MAPT with that of the conventional Bayesian Threshold Genomic Prediction model, multinomial Ridge regression and support vector machine on 8 real data sets. We found that the proposed MAPT was competitive with the multinomial and support vector machine models in terms of prediction performance, and slightly better than the conventional Bayesian Threshold Genomic Prediction model. With regard to implementation time, MAPT and support vector machine were in general the fastest, while multinomial Ridge regression was the slowest. However, it is important to point out that the successful implementation of the proposed MAPT model depends on using informative priors to avoid underestimating the variance components.
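In a probit threshold model, a line with linear predictor eta falls in ordinal category c with probability Phi(gamma_c - eta) - Phi(gamma_{c-1} - eta), where gamma_1 < ... < gamma_{C-1} are the thresholds (gamma_0 = -inf, gamma_C = +inf). MAPT plugs maximum a posteriori estimates into this kind of formula. The sketch below shows only that prediction step, with hypothetical point estimates; the probit link is an assumption, and the MAP optimization and informative priors are not reproduced:

```python
import math


def std_normal_cdf(z):
    # Phi(z) written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probabilities(eta, thresholds):
    """P(Y = c), c = 1..C, under a probit threshold model.

    eta        : linear predictor of one line (e.g. environment + genomic effect)
    thresholds : strictly increasing cut points gamma_1 < ... < gamma_{C-1}
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cdf = [0.0] + [std_normal_cdf(g - eta) for g in thresholds] + [1.0]
    return [cdf[c + 1] - cdf[c] for c in range(len(cdf) - 1)]


def predict_category(eta, thresholds):
    """Most probable category, numbered from 1."""
    probs = category_probabilities(eta, thresholds)
    return max(range(len(probs)), key=probs.__getitem__) + 1


gamma_hat = [-1.0, 0.5]  # hypothetical point estimates for a 3-category trait
for eta_hat in (-2.0, 0.0, 2.0):
    probs = category_probabilities(eta_hat, gamma_hat)
    print(eta_hat, [round(p, 3) for p in probs], predict_category(eta_hat, gamma_hat))
```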

*G3 (Bethesda) ; 10(11): 4177-4190, 2020 Nov 05.*

##### Abstract

The paradigm called genomic selection (GS) is a revolutionary way of developing new plants and animals. It is a predictive methodology, since it uses learning methods to perform its task. Unfortunately, there is no universal model that can be used for all types of predictions; for this reason, specific methodologies are required for each type of output (response variable). Since efficient methodologies for multivariate count outcomes are lacking, in this paper we propose a multivariate Poisson deep neural network (MPDN) model for the genomic prediction of several count outcomes simultaneously. The MPDN model uses the negative log-likelihood of a Poisson distribution as the loss function, the rectified linear unit (ReLU) activation function in the hidden layers to capture nonlinear patterns, and the exponential activation function in the output layer to produce outputs on the count scale. The proposed MPDN model was compared with conventional generalized Poisson regression models and univariate Poisson deep learning models on two experimental count data sets. We found that the proposed MPDN outperformed the univariate Poisson deep neural network models but, in terms of prediction, did not outperform the univariate generalized Poisson regression models. All deep learning models were implemented in Keras as the front-end with TensorFlow as the back-end, which makes it possible to fit these models to moderate and large data sets, a significant advantage over previous GS models for multivariate count data.
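The ingredients described above (ReLU hidden layer, exponential output giving one Poisson mean per trait, Poisson negative log-likelihood as loss) can be sketched in plain Python. This is a forward pass and loss only, with no training loop; summing the per-trait losses, the layer sizes and all weights are hypothetical assumptions for illustration:

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def dense(W, b, x):
    """Affine layer: returns W @ x + b, with W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(x, W1, b1, W2, b2):
    """One ReLU hidden layer shared by all traits; exp output = one Poisson mean per trait."""
    hidden = relu(dense(W1, b1, x))
    return [math.exp(z) for z in dense(W2, b2, hidden)]


def poisson_nll(counts, means):
    """Negative Poisson log-likelihood summed over traits:
    sum_t [mu_t - y_t * log(mu_t) + log(y_t!)]."""
    return sum(mu - y * math.log(mu) + math.lgamma(y + 1.0)
               for y, mu in zip(counts, means))


# Hypothetical example: 3 marker covariates, 2 hidden units, 2 count traits.
x = [1.0, 0.0, 2.0]
W1 = [[0.2, -0.1, 0.3], [0.0, 0.4, -0.2]]
b1 = [0.1, 0.0]
W2 = [[0.5, -0.3], [0.2, 0.1]]
b2 = [0.0, 0.5]
mu = forward(x, W1, b1, W2, b2)
print(mu, poisson_nll([3, 1], mu))
```

The exponential output guarantees strictly positive means, which the logarithm in the loss requires. Keras ships a built-in Poisson loss of the form `y_pred - y_true * log(y_pred)`, which differs from `poisson_nll` only by the constant log(y!) term.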

*Front Plant Sci ; 10: 1311, 2019.*

##### Abstract

Although durum wheat (Triticum turgidum var. durum Desf.) is a minor cereal crop representing just 5-7% of the world's total wheat crop, it is a staple food in Mediterranean countries, where it is used to produce pasta, couscous, bulgur and bread. In this paper, we address multi-trait prediction of grain yield (GY), days to heading (DH) and plant height (PH) of 270 durum wheat lines evaluated in 43 environments (country-location-year combinations) across a broad range of water regimes in the Mediterranean Basin and other locations. Multi-trait prediction analyses were performed with a multi-trait deep learning model (MTDL) using a feed-forward network topology and the rectified linear unit activation function, with a grid search to select the hyperparameters. The results of the multi-trait deep learning method were compared with univariate predictions of the genomic best linear unbiased predictor (GBLUP) method and of the univariate counterpart of the multi-trait deep learning method (UDL). All models were implemented with and without the genotype × environment interaction term. Under the UDL and MTDL methods, the best predictions were obtained without the genotype × environment interaction term, whereas under the GBLUP method the best predictions were obtained when the genotype × environment interaction term was taken into account. In general, the best predictions were observed under the GBLUP model; however, the predictions of the MTDL were very similar to those of the GBLUP model. This result provides more evidence that the GBLUP model is a powerful approach for genomic prediction, but also that deep learning is a practical approach for predicting univariate and multivariate traits in the context of genomic selection.

*G3 (Bethesda) ; 9(10): 3381-3393, 2019 10 07.*

##### Abstract

In this paper we propose a Bayesian multi-output regressor stacking (BMORS) model that is a generalization of the multi-trait regressor stacking method. The proposed BMORS model consists of two stages: in the first stage, a univariate genomic best linear unbiased prediction model (GBLUP, including the genotype × environment interaction, GE) is implemented for each of the L traits under study; in the second stage, the predictions of all traits are included as covariates in a Ridge regression model. The main objectives of this research were to study alternatives to the existing Bayesian multi-trait multi-environment (BMTME) model with respect to (1) genomic-enabled prediction accuracy and (2) potential advantages in terms of computing resources and implementation. We compared the predictions of the BMORS model with those of the univariate GBLUP model using 7 maize and wheat datasets. We found that the proposed BMORS produced predictions similar in accuracy to those of the univariate GBLUP and BMTME models; however, the best predictions were obtained under the BMTME model. In terms of computing resources, the BMORS is at least 9 times faster than the BMTME method. Based on these empirical findings, the proposed BMORS model is an alternative for predicting multi-trait and multi-environment data, which are very common in genomic-enabled prediction in plant and animal breeding programs.
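The two-stage structure can be sketched with NumPy under several simplifying assumptions. Stage 1 uses ridge regression on markers as a stand-in for univariate GBLUP (the two coincide for a linear marker kernel with matching shrinkage), without the environment and GE terms. Closed-form ridge replaces the Bayesian fit. Stage 2 is trained on in-sample stage-1 fits, whereas cross-validated stage-1 predictions are a common alternative. The function names, penalties and simulated data are hypothetical:

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_new, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    A = Xc.T @ Xc + lam * np.eye(Xc.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y_train - y_mean))
    return y_mean + (X_new - x_mean) @ beta


def two_stage_stacking(M_train, Y_train, M_test, lam1=10.0, lam2=1.0):
    n_traits = Y_train.shape[1]
    # Stage 1: one univariate marker model per trait (stand-in for GBLUP).
    stage1_train = np.column_stack([
        ridge_fit_predict(M_train, Y_train[:, t], M_train, lam1) for t in range(n_traits)])
    stage1_test = np.column_stack([
        ridge_fit_predict(M_train, Y_train[:, t], M_test, lam1) for t in range(n_traits)])
    # Stage 2: each trait regressed (ridge) on the stage-1 predictions of ALL traits.
    return np.column_stack([
        ridge_fit_predict(stage1_train, Y_train[:, t], stage1_test, lam2)
        for t in range(n_traits)])


# Hypothetical simulated data: 60 lines, 100 markers coded 0/1/2, 3 correlated traits.
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(60, 100)).astype(float)
Y = M[:, :5] @ rng.normal(size=(5, 3)) + rng.normal(size=(60, 3))
pred = two_stage_stacking(M[:50], Y[:50], M[50:])
print(pred.shape)
```

Stage 2 only involves an L-column design matrix, which is one reason stacking can be much cheaper than sampling from a full multi-trait posterior.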

##### Subjects

Bayes Theorem , Environment , Gene-Environment Interaction , Genomics , Models, Genetic , Plant Breeding , Algorithms , Genomics/methods , Models, Theoretical , Phenotype , Triticum/genetics , Zea mays/genetics

*G3 (Bethesda) ; 9(9): 2913-2924, 2019 09 04.*

##### Abstract

Kernel methods are flexible and easy to interpret and have been successfully used in genomic-enabled prediction of various plant species. Kernel methods used in genomic prediction include the linear genomic best linear unbiased predictor (GBLUP or GB) kernel and the Gaussian kernel (GK). In general, these kernels have been used with two statistical models: single-environment and genomic × environment (GE) models. Recently, near-infrared spectroscopy (NIR) has been used as an inexpensive and non-destructive high-throughput phenotyping method for predicting unobserved line performance in plant breeding trials. In this study, we used a non-linear arc-cosine kernel (AK) that emulates deep learning artificial neural networks. We compared the prediction accuracy of AK with that of the GB and GK kernel methods in four genomic data sets, one of which also includes pedigree and NIR information. For all four data sets, the AK and GK kernels achieved higher prediction accuracy than the linear GB kernel under both the single-environment and the GE multi-environment models. In addition, AK achieved similar or slightly higher prediction accuracy than GK. For all data sets, the GE model achieved higher prediction accuracy than the single-environment model. For the data set that includes pedigree, markers and NIR, the NIR wavelengths alone achieved lower prediction accuracy than the genomic information alone; however, pedigree plus NIR achieved only slightly lower prediction accuracy than markers plus NIR.
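A common form of the arc-cosine kernel is the degree-1 kernel of Cho and Saul, k(x, z) = (1/pi) ||x|| ||z|| J1(theta) with J1(theta) = sin(theta) + (pi - theta) cos(theta), where theta is the angle between x and z. Applying the map repeatedly emulates additional ReLU layers. The sketch below assumes that form and starts from plain inner products of the marker rows; any centring or scaling of markers, and the number of layers, are assumptions rather than the settings used in the study:

```python
import math


def _j1(theta):
    # Angular factor of the degree-1 arc-cosine kernel: sin(t) + (pi - t) cos(t).
    return math.sin(theta) + (math.pi - theta) * math.cos(theta)


def arc_cosine_kernel(X, layers=1):
    """Degree-1 arc-cosine kernel matrix between the rows of X.

    Layer 0 is the linear kernel (inner products); each extra layer maps
    k -> (1/pi) * sqrt(k_ii * k_jj) * J1(theta_ij), mimicking one more ReLU layer.
    """
    n = len(X)
    K = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    for _ in range(layers):
        new_K = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                norm = math.sqrt(K[i][i] * K[j][j])
                if norm == 0.0:
                    continue  # an all-zero row keeps kernel value 0
                cos_theta = max(-1.0, min(1.0, K[i][j] / norm))  # guard rounding
                new_K[i][j] = norm * _j1(math.acos(cos_theta)) / math.pi
        K = new_K
    return K


# Hypothetical (already centred) marker rows for three lines.
markers = [[1.0, -1.0, 0.5], [0.8, -0.9, 0.1], [-1.0, 1.0, 0.0]]
for row in arc_cosine_kernel(markers, layers=2):
    print([round(v, 3) for v in row])
```

Since J1(0) = pi, the diagonal ||x||^2 is preserved at every layer, while off-diagonal values shrink nonlinearly with the angle between lines.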

##### Subjects

Genomics/methods , Models, Genetic , Plant Breeding/methods , Spectrophotometry/methods , Databases, Genetic , Deep Learning , Genomics/statistics & numerical data , Phenotype , Spectrophotometry/statistics & numerical data , Triticum/genetics , Zea mays/genetics

*Biomed Res Int ; 2019: 3613679, 2019.*

##### Abstract

Conservation of eelgrass relies on transplants, and evaluating their success depends on nondestructive measurements of average leaf biomass in shoots, among other variables. Allometric proxies offer a convenient means of assessment. Identifying surrogates via log transformation and linear regression can produce biased results. Some views nonetheless consider this approach meaningful, asserting that curvature in geometrical space explains the bias. An inappropriate correction factor for retransformation bias could also explain inconsistencies. To account for nonlinearity of the log-transformed response, we relied on a generalized allometric model whose scaling parameters depend continuously on the descriptor. The accompanying correction factor is conceived as a partial sum of the series expansion of the mean retransformed residuals, which led to the highest reproducibility strength. Fits of particular forms of the generalized curvature model reproduced average eelgrass leaf biomass in shoots outstandingly well. Although nonlinear heteroscedastic regression also proved suitable, only log-transformation approaches can unmask a size-related differentiation in the growth form of the leaf. In general, whenever the structure of the regression error is undetermined, choosing a suitable form of the retransformation correction factor becomes elusive. Compared with customary nonparametric forms of this correction factor, the present form proved more efficient. We expect the proposed generalized allometric model, along with the proposed form of the correction factor, to provide a suitable analytical framework for allometric examination in general.
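For reference, the conventional approach that the abstract contrasts with can be sketched: fit log(y) = log(a) + b log(x) by least squares, back-transform to y = a x^b, and multiply by a retransformation correction factor. This sketch uses Duan's smearing estimate, one customary nonparametric correction factor. It is not the authors' generalized allometric model or their series-expansion correction factor, and the function names and data are hypothetical:

```python
import math


def fit_loglog(x, y):
    """OLS fit of log(y) = log(a) + b * log(x); returns (log_a, b, residuals)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (w - my) for u, w in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    log_a = my - b * mx
    residuals = [w - (log_a + b * u) for u, w in zip(lx, ly)]
    return log_a, b, residuals


def smearing_factor(residuals):
    """Duan's nonparametric smearing estimate: the mean of exp(residual)."""
    return sum(math.exp(r) for r in residuals) / len(residuals)


def back_transform(x_new, log_a, b, correction=1.0):
    """Naive retransformation a * x^b, multiplied by a bias-correction factor."""
    return correction * math.exp(log_a) * x_new ** b


# Hypothetical leaf lengths (cm) and leaf biomass (g); not data from the study.
length = [12.0, 18.5, 25.0, 31.2, 40.8, 47.5]
biomass = [0.021, 0.048, 0.090, 0.135, 0.240, 0.310]
log_a, b, res = fit_loglog(length, biomass)
cf = smearing_factor(res)
print(b, cf, back_transform(30.0, log_a, b), back_transform(30.0, log_a, b, cf))
```

Because exp is convex, the mean of exp(residuals) is at least 1 whenever the residuals average zero, so the naive back-transform tends to underestimate mean biomass and the factor scales it up.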

##### Subjects

Biomass , Models, Biological , Plant Leaves/growth & development , Plant Shoots/growth & development , Zosteraceae/growth & development

*G3 (Bethesda) ; 9(5): 1545-1556, 2019 05 07.*

##### Abstract

Multiple-trait experiments with mixed phenotypes (binary, ordinal and continuous) are not rare in animal and plant breeding programs. However, there is a lack of statistical models that can exploit the correlation between traits with mixed phenotypes to improve prediction accuracy in the context of genomic selection (GS). For this reason, breeders with mixed phenotypes usually analyze them using univariate models and thus cannot exploit the correlation between traits, which often helps improve prediction accuracy. In this paper we propose applying deep learning to analyze multiple traits with mixed phenotype data and assess it in terms of prediction accuracy. The prediction performance of multiple-trait deep learning with mixed phenotypes (MTDLMP) models was compared with that of univariate deep learning (UDL) models. Both models were evaluated using predictors with and without the genotype × environment (G×E) interaction term (I and WI, respectively). Prediction accuracy was measured with Pearson's correlation for continuous traits and with the percentage of cases correctly classified (PCCC) for binary and ordinal traits. We found a modest gain in prediction accuracy under the MTDLMP model compared with the UDL model only for the continuous trait, whereas for the other traits (1 binary and 2 ordinal) we found no difference between the two models. In both models, prediction performance was better for WI than for I. The MTDLMP model is a good alternative for performing simultaneous predictions of mixed phenotypes (binary, ordinal and continuous) in the context of GS.
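The two accuracy metrics named above are simple to compute on a testing set. A minimal sketch using their standard definitions; the function names and example values are hypothetical:

```python
import math


def pearson(observed, predicted):
    """Pearson's correlation between observed and predicted values (continuous traits)."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


def pccc(observed, predicted):
    """Percentage of cases correctly classified (binary or ordinal traits)."""
    hits = sum(1 for o, p in zip(observed, predicted) if o == p)
    return 100.0 * hits / len(observed)


# Hypothetical testing-set values.
print(pearson([5.1, 6.3, 4.8, 7.0], [5.4, 6.0, 5.0, 6.6]))  # continuous trait
print(pccc([1, 3, 2, 2, 1], [1, 2, 2, 2, 1]))               # ordinal trait, 1-3 scale
```

Note that PCCC treats every misclassification equally, so predicting category 2 for an observed 3 counts the same as predicting 1.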

##### Subjects

Deep Learning , Genetic Association Studies , Genome , Genomics , Models, Genetic , Phenotype , Quantitative Trait, Heritable , Algorithms , Genome, Plant , Genomics/methods , Genotype , Plant Breeding , Reproducibility of Results , Selection, Genetic

*G3 (Bethesda) ; 9(5): 1355-1369, 2019 05 07.*

##### Abstract

Evidence continues to grow that genomic selection (GS) is a technology that is revolutionizing plant breeding. However, it is well documented that its success strongly depends on the statistical models that GS uses to predict candidate genotypes that were not phenotyped. Because there is no universally best model for prediction, and models are needed for each type of response variable (continuous, binary, ordinal, count, etc.), an active area of research aims to develop statistical models for the prediction of univariate and multivariate traits in GS. However, most of the models developed so far are for univariate, continuous (Gaussian) traits. Therefore, to help overcome the lack of multivariate statistical models for genome-based prediction, we propose an improved version of the Bayesian multi-trait and multi-environment (BMTME) R package for analyzing breeding data with multiple traits and multiple environments. We also introduce Bayesian multi-output regressor stacking (BMORS) functions that are considerably more efficient in terms of computational resources. The package estimates parameters and evaluates the prediction performance of multi-trait and multi-environment data in a reliable, efficient and user-friendly way. We illustrate the use of the BMTME package with real toy datasets to show all the facilities the software offers the user. However, for large datasets, the BME() and BMTME() functions of the BMTME R package are very intensive in terms of computing time; in contrast, the BMORS functions BMORS() and BMORS_Env(), also included in the package, require much less computing.

##### Subjects

Bayes Theorem , Computational Biology/methods , Gene-Environment Interaction , Genomics/methods , Quantitative Trait, Heritable , Software , Algorithms , Models, Statistical , Zea mays/genetics

*Theor Appl Genet ; 132(5): 1587-1606, 2019 May.*

##### Abstract

KEY MESSAGE: Current genome-enabled prediction models assume normally distributed errors, which makes them sensitive to outliers. We propose a model whose errors are assumed to follow a Laplace distribution, to deal better with outliers. Current genome-enabled prediction models use regressions that fit the expected value (mean) of a response variable with normally distributed errors, and are therefore often sensitive to outliers, either genetic or environmental. For this reason, we propose a robust Bayesian genome median regression (BGMR) model that fits regressions to the medians of a distribution, with errors assumed to follow a Laplace distribution to deal better with outliers. The BGMR model was evaluated under a Bayesian framework with Markov Chain Monte Carlo sampling using a location-scale mixture representation of the Laplace distribution. The BGMR was applied to two simulated and two real genomic data sets, and we compared its prediction performance with that of a conventional genomic best linear unbiased prediction (GBLUP) model and the Laplace maximum a posteriori (LMAP) method. The prediction accuracies of BGMR were higher than those of the GBLUP and LMAP methods when there were outliers. The BGMR model could be useful to breeders who need to predict and select genotypes based on data with unknown outliers.
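Why Laplace errors target the median: the Laplace negative log-likelihood is proportional to the sum of absolute deviations, which is minimized at the median, whereas the Gaussian one is proportional to the sum of squared deviations, minimized at the mean. The intercept-only sketch below shows this with a grid search; it is not the BGMR model itself (no genomic regression, no MCMC), and the data are hypothetical:

```python
import statistics


def laplace_nll(y, mu, scale=1.0):
    """Laplace negative log-likelihood up to the constant n*log(2*scale)."""
    return sum(abs(v - mu) for v in y) / scale


def gaussian_nll(y, mu, sd=1.0):
    """Gaussian negative log-likelihood up to an additive constant."""
    return sum((v - mu) ** 2 for v in y) / (2.0 * sd ** 2)


# Hypothetical replicate phenotypes of one line; 30.0 acts as an outlier.
y = [4.8, 5.1, 5.0, 4.9, 30.0]
grid = [k / 100 for k in range(0, 3101)]  # candidate locations 0.00 ... 31.00
best_laplace = min(grid, key=lambda m: laplace_nll(y, m))
best_gauss = min(grid, key=lambda m: gaussian_nll(y, m))
print("Laplace fit:", best_laplace, "median:", statistics.median(y))
print("Gaussian fit:", best_gauss, "mean:", statistics.mean(y))
```

The single outlier drags the Gaussian location estimate far from the bulk of the data, while the Laplace estimate stays at the median.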

##### Subjects

Breeding , Genome, Plant , Models, Theoretical , Plants/genetics , Bayes Theorem , Computer Simulation , Markov Chains , Monte Carlo Method , Regression Analysis

*G3 (Bethesda) ; 9(2): 601-618, 2019 02 07.*

##### Abstract

Genomic selection is revolutionizing plant breeding. However, better statistical models for ordinal phenotypes are still needed to improve the accuracy of selecting candidate genotypes. For this reason, in this paper we explore the genomic-based prediction performance of two popular machine learning methods, the Multi Layer Perceptron (MLP) and support vector machine (SVM), vs. the Bayesian threshold genomic best linear unbiased prediction (TGBLUP) model. We used the percentage of cases correctly classified (PCCC) as the metric of prediction performance and seven real data sets to evaluate prediction accuracy. The best predictions in terms of PCCC (in four of the seven data sets) occurred under the TGBLUP model, while the worst occurred under the SVM method. Also, in general we found no statistical differences between using 1, 2 and 3 layers in the MLP models, which means that a conventional neural network with a single layer is often enough. Although the TGBLUP model was better, the predictions of MLP and SVM were very competitive, and SVM had the advantage of being the most efficient in terms of the computational time required.

##### Subjects

Plant Breeding/methods , Support Vector Machine , Bayes Theorem , Quantitative Trait, Heritable , Selection, Artificial

*Heredity (Edinb) ; 122(4): 381-401, 2019 04.*

##### Abstract

Today, breeders use genomic-assisted breeding to improve more than one trait. However, several traits are frequently studied at once, and implementing current genomic multiple-trait and multiple-environment models is challenging. Consequently, in this paper we propose a four-stage analysis for multiple-trait data. In the first stage, we perform singular value decomposition (SVD) on the matrix of trait responses; in the second stage, we perform multiple-trait analysis on the transformed responses. In stages three and four, we collect the results, transform the traits back to their original scale, and obtain the parameter estimates and predictions on the scale of the variables prior to transformation. The results of the proposed method are compared, in terms of parameter estimation and prediction accuracy, with those of the Bayesian multiple-trait and multiple-environment model (BMTME) previously described in the literature. We found that the proposed method based on SVD produced similar results, in terms of parameter estimation and prediction accuracy, to those obtained with the BMTME model. Moreover, the proposed multiple-trait method is attractive because it can be implemented using current single-trait genomic prediction software, which yields a computationally more efficient algorithm.
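The four stages can be sketched with NumPy. Assumptions: the responses are centred before the SVD, there are at least as many lines as traits, and `make_ridge` (ridge regression on markers) is a hypothetical stand-in for whatever single-trait genomic prediction software is used; all names and data are illustrative:

```python
import numpy as np


def svd_multitrait_predict(Y_train, fit_predict_single):
    means = Y_train.mean(axis=0)
    Yc = Y_train - means
    # Stage 1: SVD of the centred n x t response matrix, Yc = U S V^T (needs n >= t).
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    V = Vt.T
    # Stage 2: rotated responses Z = Yc V have mutually uncorrelated columns,
    # so each column can be analysed with a single-trait model.
    Z = Yc @ V
    Z_hat = np.column_stack([fit_predict_single(Z[:, k]) for k in range(Z.shape[1])])
    # Stages 3-4: rotate predictions back to the original trait scale.
    return Z_hat @ V.T + means


def make_ridge(M_train, M_new, lam=10.0):
    """Hypothetical single-trait model: ridge regression on markers."""
    m_mean = M_train.mean(axis=0)
    Mc = M_train - m_mean
    A = Mc.T @ Mc + lam * np.eye(Mc.shape[1])

    def fit_predict(z):
        beta = np.linalg.solve(A, Mc.T @ (z - z.mean()))
        return z.mean() + (M_new - m_mean) @ beta

    return fit_predict


# Hypothetical simulated data: 50 lines, 80 markers, 3 correlated traits.
rng = np.random.default_rng(7)
M = rng.integers(0, 3, size=(50, 80)).astype(float)
Y = M[:, :6] @ rng.normal(size=(6, 3)) + rng.normal(size=(50, 3))
Y_tr = Y[:40]
preds = svd_multitrait_predict(Y_tr, make_ridge(M[:40], M[40:]))
print(preds.shape)
```

Because V is orthogonal, the back-rotation is exact: if the single-trait step returned its input unchanged, the original responses would be recovered.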

*Plant Genome ; 11(3)2018 11.*

##### Abstract

The Item-Based Collaborative Filtering for Multitrait and Multienvironment Data (IBCF.MTME) package was developed to implement the item-based collaborative filtering (IBCF) algorithm for continuous phenotypic data in the context of plant breeding, where data are collected for various traits and environments. The main difference between this package and other available packages that implement IBCF is that this one was developed for continuous phenotypic data, which current packages cannot handle because they implement IBCF only for binary and ordinal phenotypes. In this article, we show how to install the package and how to use it to study the prediction accuracy of multitrait and multienvironment data under phenotypic and genomic selection. We illustrate its use with seven examples (with information from two datasets, Wheat_IBCF and Year_IBCF, which are included in the package) comprising multienvironment data, multitrait data, and both multitrait and multienvironment data, covering scenarios of interest to breeding scientists. The package offers many advantages for studying the genomic-enabled prediction accuracy of multitrait and multienvironment data, ultimately helping plant breeders make better decisions.

##### Subjects

Algorithms , Gene-Environment Interaction , Datasets as Topic , Genotype , Phenotype

*G3 (Bethesda) ; 8(12): 3813-3828, 2018 12 10.*

##### Abstract

Genomic selection is revolutionizing plant breeding and therefore methods that improve prediction accuracy are useful. For this reason, active research is being conducted to build and test methods from other areas and adapt them to the context of genomic selection. In this paper we explore the novel deep learning (DL) methodology in the context of genomic selection. We compared DL methods with densely connected network architecture to one of the most often used genome-enabled prediction models: Genomic Best Linear Unbiased Prediction (GBLUP). We used nine published real genomic data sets to compare a fraction of all possible deep learning models to obtain a "meta picture" of the performance of DL methods with densely connected network architecture. In general, the best predictions were obtained with the GBLUP model when genotype×environment interaction (G×E) was taken into account (8 out of 9 data sets); when the interactions were ignored, the DL method was better than the GBLUP in terms of prediction accuracy in 6 out of the 9 data sets. For this reason, we believe that DL should be added to the data science toolkit of scientists working on animal and plant breeding. This study corroborates the view that there are no universally best prediction machines.

##### Subjects

Gene-Environment Interaction , Machine Learning , Models, Genetic , Quantitative Trait, Heritable , Sequence Analysis, DNA/methods , Triticum/genetics , Zea mays/genetics , Predictive Value of Tests

*G3 (Bethesda) ; 8(12): 3829-3840, 2018 12 10.*

##### Abstract

Multi-trait and multi-environment data are common in animal and plant breeding programs. However, more powerful statistical models that can exploit the correlation between traits to improve prediction accuracy in the context of genomic selection (GS) are still lacking. Multi-trait models are more complex than univariate models and usually require more computational resources, but they are preferred because they can exploit the correlation between traits, which often helps improve prediction accuracy. For this reason, in this paper we explore the power of multi-trait deep learning (MTDL) models in terms of prediction accuracy. The prediction performance of MTDL models was compared with that of the Bayesian multi-trait and multi-environment (BMTME) model proposed by Montesinos-López et al. (2016), which is a multi-trait version of the univariate genomic best linear unbiased prediction (GBLUP) model. Both models were evaluated with predictors with and without the genotype×environment interaction term. Prediction performance was evaluated in terms of Pearson's correlation using cross-validation. We found that the best predictions in two of the three data sets were obtained under the BMTME model, but in general the predictions of the two models, BMTME and MTDL, were similar. Among models without the genotype×environment interaction, the MTDL model was the best, while among models with the genotype×environment interaction, the BMTME model was superior. These results indicate that the MTDL model is very competitive for prediction in the context of GS, with the important practical advantage that it requires fewer computational resources than the BMTME model.

##### Subjects

Genome, Plant , Machine Learning , Models, Genetic , Sequence Analysis, DNA/methods , Triticum/genetics , Zea mays/genetics , Gene-Environment Interaction

*Plant Methods ; 14: 46, 2018.*

##### Abstract

Background: Modern agriculture uses hyperspectral cameras that provide hundreds of reflectance measurements at discrete narrow bands in several environments. Recently, Montesinos-López et al. (Plant Methods 13(4):1-23, 2017a. 10.1186/s13007-016-0154-2; Plant Methods 13(62):1-29, 2017b. 10.1186/s13007-017-0212-4) proposed using functional regression analysis (a form of functional data analysis) to reduce the dimensionality of the bands and thus the computational cost. The purpose of this paper is to discuss the advantages and disadvantages of functional regression analysis for hyperspectral image data. We provide a brief review of functional regression analysis and examples that illustrate the methodology. We highlight critical elements of model specification: (i) the type and number of basis functions, (ii) the degree of the polynomial, and (iii) the methods used to estimate the regression coefficients. We also show how functional data analyses can be integrated into Bayesian models. Finally, we include an in-depth discussion of the challenges and opportunities presented by functional regression analysis. Results: We used seven model-methods: one conventional model (M1), three methods using the B-splines model (M2, M4 and M6) and three methods using the Fourier basis model (M3, M5 and M7). The data set comprises 976 wheat lines under irrigated environments with 250 wavelengths. Under a Bayesian Ridge Regression (BRR), we compared the prediction accuracy of the proposed model-methods under different numbers of basis functions, and compared the implementation time (in seconds) of the seven model-methods for different numbers of basis functions. Our results, as well as previously analyzed data (Montesinos-López et al. 2017a, 2017b), indicate that around 23 basis functions are enough. Concerning the degree of the polynomial for B-splines, degree 3 approximates most of the curves very well.
Two satisfactory types of basis are the Fourier basis for periodic curves and B-splines for non-periodic curves. Under nine different numbers of basis functions, the seven model-methods showed similar prediction accuracy. Regarding implementation time, the lower the number of basis functions, the lower the implementation time required. Methods M2, M3, M6 and M7 were around 3.4 times faster than methods M1, M4 and M5. Conclusions: In this study, we promote the use of functional regression modeling for analyzing high-throughput phenotypic data and indicate the advantages and disadvantages of its implementation. In addition, many key elements needed to understand and implement this statistical technique appropriately are provided using a real data set. We provide details for implementing Bayesian functional regression using the developed genomic functional regression (GFR) package. In summary, we believe this paper is a good guide for breeders and scientists interested in using functional regression models for prediction when their data are curves.
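The dimension-reduction step can be sketched with a cubic B-spline basis built by the Cox-de Boor recursion: evaluating K basis functions at the m band wavelengths gives an m x K matrix Phi, and the n x m reflectance matrix X is replaced by the n x K matrix X Phi, i.e. a discrete approximation of the integral of x(lambda) phi_k(lambda) (the band spacing is a constant absorbed into the coefficients). Equally spaced bands and interior knots, 23 basis functions and the reflectance curves are assumptions for illustration; the GFR package may construct its basis differently:

```python
import math


def clamped_knots(a, b, n_basis, degree):
    """Knot vector with equally spaced interior knots and end knots repeated degree+1 times."""
    n_inner = n_basis - degree - 1
    if n_inner < 0:
        raise ValueError("n_basis must be at least degree + 1")
    inner = [a + (b - a) * k / (n_inner + 1) for k in range(1, n_inner + 1)]
    return [a] * (degree + 1) + inner + [b] * (degree + 1)


def bspline_basis(t, knots, degree):
    """All B-spline basis values at t via the Cox-de Boor recursion."""
    # Degree 0: indicator of the knot interval containing t (right end included).
    vals = []
    for i in range(len(knots) - 1):
        inside = knots[i] <= t < knots[i + 1]
        right_end = t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]
        vals.append(1.0 if inside or right_end else 0.0)
    for p in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - p - 1):
            d1 = knots[i + p] - knots[i]
            d2 = knots[i + p + 1] - knots[i + 1]
            left = (t - knots[i]) / d1 * vals[i] if d1 > 0 else 0.0
            right = (knots[i + p + 1] - t) / d2 * vals[i + 1] if d2 > 0 else 0.0
            nxt.append(left + right)
        vals = nxt
    return vals  # length len(knots) - degree - 1 == n_basis


def basis_matrix(wavelengths, n_basis, degree=3):
    """m x K matrix Phi with Phi[j][k] = phi_k(wavelength_j)."""
    knots = clamped_knots(min(wavelengths), max(wavelengths), n_basis, degree)
    return [bspline_basis(w, knots, degree) for w in wavelengths]


def reduce_reflectance(X, Phi):
    """n x m reflectance matrix -> n x K functional covariates (X @ Phi)."""
    K = len(Phi[0])
    return [[sum(x[j] * Phi[j][k] for j in range(len(x))) for k in range(K)] for x in X]


# Assumed: 250 equally spaced bands between 392 and 851 nm.
wl = [392.0 + (851.0 - 392.0) * j / 249 for j in range(250)]
Phi = basis_matrix(wl, n_basis=23, degree=3)
# Hypothetical smooth reflectance curves for three lines.
X = [[0.3 + 0.05 * i + 0.1 * math.sin(w / 60.0) for w in wl] for i in range(3)]
Z = reduce_reflectance(X, Phi)
print(len(Z), len(Z[0]))  # 3 lines x 23 covariates instead of 250 bands
```

The regression (for example, Bayesian Ridge Regression) is then fitted to Z, so only 23 coefficients are estimated instead of 250.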

*Plant Methods ; 14: 57, 2018.*

##### Abstract

[This corrects the article DOI: 10.1186/s13007-018-0314-7.]

*G3 (Bethesda) ; 8(1): 131-147, 2018 01 04.*

##### Abstract

In genomic-enabled prediction, improving the accuracy of predicting lines in environments is difficult because the available information is generally sparse and usually shows low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, computing efficiency remains limited. Although some statistical models are mathematically elegant, many of them are also computationally inefficient and impractical for many traits, lines, environments and years, because they need to sample from huge multivariate normal distributions. For these reasons, this study explores two recommender systems, item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF), in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results on the simulated and real data sets show that, when the correlation was moderately high, the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment-trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets.
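In IBCF, lines play the role of users and environment-trait combinations the role of items: a missing value for line u on item i is predicted as a similarity-weighted average of line u's observed values on the other items. A minimal sketch, assuming cosine similarity computed over the lines observed in both items; standardizing each item beforehand is a common preprocessing step that is omitted here, and all names and values are hypothetical:

```python
import math


def cosine_item_similarity(R, i, j):
    """Cosine similarity between item columns i and j over lines observed in both."""
    pairs = [(row[i], row[j]) for row in R if row[i] is not None and row[j] is not None]
    num = sum(a * b for a, b in pairs)
    den = math.sqrt(sum(a * a for a, _ in pairs)) * math.sqrt(sum(b * b for _, b in pairs))
    return num / den if den > 0 else 0.0


def ibcf_predict(R, u, i, k=None):
    """Predict missing R[u][i] from line u's observed items.

    Weighted average of R[u][j] with weights sim(i, j); if k is given,
    only the k items most similar to i (in absolute value) are used.
    """
    cands = [(cosine_item_similarity(R, i, j), R[u][j])
             for j in range(len(R[u])) if j != i and R[u][j] is not None]
    cands.sort(key=lambda c: abs(c[0]), reverse=True)
    if k is not None:
        cands = cands[:k]
    den = sum(abs(s) for s, _ in cands)
    if den == 0:
        return None  # no informative neighbours
    return sum(s * r for s, r in cands) / den


# Hypothetical standardized phenotypes: 4 lines x 3 items (trait-environment
# combinations); None marks a line not yet observed for that item.
R = [
    [0.5, 0.7, -0.2],
    [-1.0, -0.8, 0.4],
    [1.2, 1.0, -0.6],
    [-0.3, None, 0.1],
]
print(ibcf_predict(R, u=3, i=1))
```

Only item-item similarities are needed, which is why the method scales well: no large multivariate normal distribution has to be sampled.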

##### Subjects

Gene-Environment Interaction , Genome, Plant , Models, Statistical , Plant Breeding/methods , Quantitative Trait, Heritable , Triticum/genetics , Zea mays/genetics , Algorithms , Crops, Agricultural , Genotype , Models, Genetic , Phenotype , Ploidies , Polymorphism, Single Nucleotide , Selection, Genetic

*Plant Methods ; 13: 62, 2017.*

##### Abstract

BACKGROUND: Modern agriculture uses hyperspectral cameras that provide hundreds of reflectance measurements at discrete narrow bands in many environments. These bands often cover the whole visible light spectrum and part of the infrared and ultraviolet spectra. From these bands, vegetation indices are constructed for predicting agronomically important traits such as grain yield and biomass. However, since vegetation indices use only some wavelengths (referred to as bands), we propose using all bands simultaneously as predictor variables for the primary trait, grain yield; results of several multi-environment maize (Aguate et al. in Crop Sci 57(5):1-8, 2017) and wheat (Montesinos-López et al. in Plant Methods 13(4):1-23, 2017) breeding trials indicated that using all bands produced better prediction accuracy than using vegetation indices. However, until now, these prediction models have not accounted for the effects of genotype × environment (G × E) and band × environment (B × E) interactions while incorporating genomic or pedigree information. RESULTS: In this study, we propose Bayesian functional regression models that take into account all available bands, genomic or pedigree information, the main effects of lines and environments, and the G × E and B × E interaction effects. The data set comprises 976 wheat lines evaluated for grain yield in three environments (Drought, Irrigated and Reduced Irrigation). The reflectance data were measured in 250 discrete narrow bands ranging from 392 to 851 nanometers (nm). The proposed Bayesian functional regression models were implemented using two types of basis: B-splines and Fourier. Results of the proposed Bayesian functional regression models, which include all the wavelengths for predicting grain yield, were compared with results from conventional models with and without bands.
CONCLUSIONS: We observed that the models with B × E interaction terms were the most accurate, whereas the functional regression models (with B-splines and Fourier basis) and the conventional models performed similarly in terms of prediction accuracy. However, the functional regression models are more parsimonious and computationally more efficient because only 21 beta coefficients (the number of basis functions) need to be estimated, rather than 250 regression coefficients, one for each band. In this study, adding pedigree or genomic information did not increase prediction accuracy.
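The parsimony argument can be made concrete with a Fourier basis: with 21 basis functions (a constant plus 10 sine/cosine pairs), each line's 250-band reflectance curve is summarized by 21 functional covariates, so the regression estimates 21 rather than 250 coefficients. A sketch assuming equally spaced bands over 392 to 851 nm and a period equal to that range; the B × E terms and the Bayesian fitting are not shown, and the curves are hypothetical:

```python
import math


def fourier_basis(wavelengths, n_basis):
    """m x K Fourier basis matrix: 1, sin(2*pi*m*s), cos(2*pi*m*s), m = 1..(K-1)/2.

    s rescales each wavelength to [0, 1], so the period equals the observed range.
    """
    if n_basis % 2 == 0:
        raise ValueError("n_basis must be odd (1 constant + sine/cosine pairs)")
    lo, hi = min(wavelengths), max(wavelengths)
    rows = []
    for w in wavelengths:
        s = (w - lo) / (hi - lo)
        row = [1.0]
        for m in range(1, (n_basis - 1) // 2 + 1):
            row += [math.sin(2 * math.pi * m * s), math.cos(2 * math.pi * m * s)]
        rows.append(row)
    return rows


def functional_covariates(X, Phi):
    """Sum over bands of x(lambda_j) * phi_k(lambda_j), i.e. X @ Phi (band spacing omitted)."""
    return [[sum(xj * phi_row[k] for xj, phi_row in zip(x, Phi))
             for k in range(len(Phi[0]))] for x in X]


# Assumed: 250 equally spaced bands from 392 to 851 nm, 21 Fourier basis functions.
wl = [392.0 + (851.0 - 392.0) * j / 249 for j in range(250)]
Phi = fourier_basis(wl, 21)
# Hypothetical reflectance curves for three lines.
X = [[0.25 + 0.02 * i + 0.1 * math.cos(w / 80.0) for w in wl] for i in range(3)]
Z = functional_covariates(X, Phi)
print(len(Z[0]), "coefficients per line instead of", len(wl))
```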

*G3 (Bethesda) ; 7(5): 1595-1606, 2017 05 05.*

##### Abstract

When a plant scientist wishes to make genomic-enabled predictions of multiple traits measured in multiple individuals in multiple environments, the most common strategy is to analyze a single trait at a time while accounting for genotype × environment interaction (G × E), because comprehensive models that simultaneously account for correlated count traits and G × E are lacking. For this reason, in this study we propose a multiple-trait and multiple-environment model for count data. The proposed model was developed under the Bayesian paradigm, for which we developed a Markov Chain Monte Carlo (MCMC) sampler with noninformative priors. This makes it possible to obtain all required full conditional distributions of the parameters, leading to an exact Gibbs sampler for the posterior distribution. Our model was tested with simulated data and a real data set. Results show that the proposed multi-trait, multi-environment model is an attractive alternative for modeling multiple count traits measured in multiple environments.